What are the responsibilities and job description for the Site Reliability Engineer with ML platform - Only W2 position at Saransh Inc?
Title: Site Reliability Engineer (SRE) – ML Platform
Location: Austin, TX or Sunnyvale, CA
ONLY W2
Responsibilities
- Continuous deployment using GitHub Actions, Flux, and Kustomize
- Design and implement cloud solutions and build MLOps pipelines on AWS
- Containerize and deploy data science models using Docker, vLLM, and Kubernetes
- Communicate with a team of data scientists, data engineers, and architects, and document the processes
- Develop and deploy scalable tools and services for our clients to handle machine learning training and inference.
Requirements
- 6 years of experience in MLOps with strong knowledge of Kubernetes, Python, MongoDB, and AWS.
- Good understanding of Apache Solr.
- Proficient with Linux administration.
- Knowledge of ML models and LLMs.
- Ability to understand tools used by data scientists and experience with software development and test automation
- Ability to design and implement cloud solutions and build MLOps pipelines on AWS
- Experience working with cloud computing and database systems
- Experience building custom integrations between cloud-based systems using APIs
- Experience developing and maintaining ML systems built with open-source tools
- Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow, and with Docker and Kubernetes
- Experience developing containers and working with Kubernetes in cloud computing environments
- Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
- Ability to translate business needs into technical requirements
- Strong understanding of software testing, benchmarking, and continuous integration
- Exposure to machine learning methodology and best practices
- Good communication skills and ability to work in a team