Staff Software Engineer, ML Ops at EvolutionIQ

Summary

Join EvolutionIQ, a rapidly growing company named one of Inc.’s Best Workplaces, as a Staff Software Engineer - ML Platform. You will lead the architecture, deployment, and scaling of our machine learning (ML) and artificial intelligence (AI) infrastructure. This role requires deep technical expertise and strategic thinking to drive innovation in the ML pipeline and optimize workflows. You will collaborate with machine learning engineers and senior leadership to streamline experimentation and enhance system performance. You'll play a key role in setting MLOps standards and supporting the next phase of ML pipeline development to handle significant data volume increases. EvolutionIQ offers competitive compensation, comprehensive benefits, and a supportive work environment.

Requirements

8+ years of software development experience with a focus on platform development for AI/ML applications at scale
Experience in providing technical leadership to ML Infra / ML Platform teams
Experience in shipping products at scale
Expertise in clean and efficient coding with a focus on Python
Experience with orchestration frameworks such as Dagster/Airflow
Expertise in one or more Cloud platforms (GCP preferred but not required)
Bachelor’s Degree or higher in Computer Science, Mathematics, or related field
Excellent document writing skills (in addition to presenting results through Jupyter notebooks)

Responsibilities

Design, build, and launch scalable ML and data processing systems supporting multi-machine data processing (e.g., MapReduce), GPU/TPU model training, and automated model monitoring systems on cloud platforms
Automate model lifecycle management, including training, evaluation, and deployment, to enable fast, safe, and consistent updates across environments
Introduce modern, scalable frameworks for model monitoring, feature engineering, hyperparameter tuning, and continuous re-training, ensuring robust model performance over time
Lead the deployment of models through REST and gRPC APIs, enabling smooth integration with application frontends and real-time user interaction
Continuously research, evaluate, and implement the latest MLOps tools, frameworks, and platforms to improve efficiency, scalability, and reliability of ML operations
Implement and manage monitoring systems to track model and data performance, proactively identifying and mitigating issues using tools like Prometheus and Grafana
Apply best practices in secure data handling and model integrity within ML workflows, ensuring compliance with regulatory and security norms
Share MLOps knowledge and improvements in ML engineering workflows through internal training sessions and presentations
Support the next phase of ML pipeline development, focusing on building, maintaining, and monitoring pipelines to handle 10-100x increases in data volume
Utilize data and metrics to drive decision-making, ensuring that the ML platform is optimized for performance, reliability, and scalability
Proactively identify and address bottlenecks in the ML pipeline, leveraging your expertise to develop and implement innovative solutions that enhance productivity and performance

Preferred Qualifications

Extreme creativity and resourcefulness, appetite to solve previously unsolved problems
Work-life balance
Open to giving and receiving critical feedback
Belief in the mission of the company and care for fundamental fairness
Enthusiasm for team work and pair work
Kind, empathetic, polite, and professional
Remain agile and move between rapid prototyping and stable production development
Write design documents, perform code reviews, and maintain state-of-the-art engineering practices

Benefits

Compensation: The range is $225K-$250K, with flexibility depending on a candidate’s background and experience, plus meaningful equity (stock options)
Well-Being: Full medical, dental, vision, short- & long-term disability, and 401(k) matching: 100% of the employee contribution up to 3% and 50% of the next 2%
Work/Life Balance: For this role, we hope you can work regularly, with flexibility, out of the NYC office alongside much of our leadership. We also have a flexible vacation/PTO policy
Home & Family: 100% paid parental leave (4 months for primary caregivers and 3 months for secondary caregivers), sick days, paid time off. For new parents returning to work we offer a flexible schedule. We also offer sleep training to help you and your family navigate life schedules with a newborn
Office Life: Catered lunches, happy hours, and pet-friendly office space. $500 for your in-home office setup and $200/year for upgrades after your initial setup
Growth & Training: $1,000/year for each employee for professional development, as well as upskilling opportunities internally
Sponsorship: We are open to sponsoring candidates currently in the U.S. who need to transfer their active H-1B visa

Remote

Software Development

Senior
