Design and develop AI/ML infrastructure and MLOps on AWS, partner with data scientists, and operationalize ML models at scale.
Job Description
Job Title: AI/ML Ops Engineer with AWS experience - W2 Only - We can provide sponsorship as well
Duration: Long Term
Location: Durham, NC/Westlake, TX - Hybrid
Must Have:
- Deep experience designing MLOps platforms on SageMaker
- Experience building feature stores
- Strong automation experience with Step Functions and EventBridge
- AWS experience
- Python experience
The Expertise You Have
- Bachelor’s or Master’s degree in a technology-related field (e.g., Engineering, Computer Science).
- Experience in object-oriented programming (Java, Scala, Python), SQL, Unix scripting, or related programming languages, and exposure to Python’s ML ecosystem (NumPy, pandas, scikit-learn, TensorFlow, etc.).
- Experience building cloud-native applications using AWS services such as S3, RDS, CloudFormation (CFT), SNS, SQS, Step Functions, EventBridge, and CloudWatch.
- Experience building data pipelines that supply the data required to build, deploy, and evaluate ML models, using tools such as Apache Spark, AWS Glue, or other distributed data processing frameworks.
- Experience with data movement technologies (ETL/ELT), messaging/streaming technologies (AWS SQS, Kinesis/Kafka), relational and NoSQL databases (DynamoDB, graph databases), Kubernetes (EKS), APIs, and in-memory technologies.
- Strong knowledge of developing highly scalable distributed systems using open-source technologies.
- 5+ years of proven experience implementing big data solutions in the data analytics space.
- Experience developing ML infrastructure and MLOps in the cloud using Amazon SageMaker.
- Extensive experience deploying, serving inference from, tuning, and measuring machine learning models.
- Experience with CI/CD tools (e.g., Jenkins or equivalent), version control (Git), and orchestration/DAG tools (AWS Step Functions, Airflow, Luigi, Kubeflow, or equivalent).
- Solid experience with Agile methodologies (Kanban and Scrum).
The Skills You Bring
- You have strong technical design and analysis skills.
- You have the ability to deal with ambiguity and work in a fast-paced environment.
- You have experience supporting critical applications.
- You are familiar with applied data science methods, feature engineering and machine learning algorithms.
- You have data wrangling experience with structured, semi-structured, and unstructured data.
- You have experience building ML infrastructure, with an eye towards software engineering.
- You have excellent written and verbal communication skills.
- You have excellent collaboration skills to work with multiple teams in the organization.
- You can understand and adapt to changing business priorities and technology advancements in the big data and data science ecosystem.
The Value You Deliver
- Designing & developing a feature generation & store framework that promotes sharing of data/features among different ML models.
- Partnering with data scientists to help them use the foundational platform upon which models can be built and trained.
- Operationalizing ML models at scale (e.g., serving predictions for tens of millions of customers).
- Building tools that detect shifts in the data and features used by ML models so that issues are identified before prediction quality deteriorates, that monitor the uncertainty of model outputs, and that automate prediction explanations for model diagnostics.
- Exploring new technology trends and leveraging them to simplify our data and ML ecosystem.
- Driving innovation and implementing forward-thinking solutions.
- Guiding teams to improve development agility and productivity.
- Resolving technical roadblocks and mitigating potential risks.
- Delivering system automation by setting up continuous integration/continuous delivery pipelines.