Performance Engineer

Writer
Summary
Join Writer as a Principal Performance Engineer and lead the performance optimization of our cutting-edge Generative AI technology stack. This critical role ensures the scalability, efficiency, and reliability of our Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) systems. You will identify and resolve performance bottlenecks, optimize resource utilization, and ensure a seamless user experience. Collaboration with AI research, software engineering, and infrastructure teams is key to delivering world-class AI solutions. The position requires extensive experience in performance engineering, particularly with large-scale distributed systems and AI/ML technologies. A Bachelor's degree in a related field is required, with a Master's preferred.
Requirements
- Hold a Bachelor's degree in Computer Science, Engineering, or a related field
- Have 10+ years of experience in performance engineering, with a focus on large-scale distributed systems
- Have 2+ years of experience working with AI/ML technologies
- Possess proven experience in performance testing, profiling, and analysis of complex software systems
- Demonstrate a deep understanding of NLP architectures, training, and inference
- Have experience with vector databases and search technologies
- Have experience with cloud computing platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes)
- Possess strong programming skills in Python
- Have experience with performance analysis tools (e.g., profilers, debuggers, monitoring tools)
- Possess strong analytical and problem-solving skills
- Possess excellent communication and collaboration skills
- Have the ability to work in a fast-paced and dynamic environment
- Demonstrate a passion for AI and a desire to push the boundaries of performance engineering
Responsibilities
- Define and implement performance engineering strategies for our full Generative AI stack, including services, applications, LLMs, RAG pipelines, and related infrastructure
- Lead performance testing, profiling, and analysis efforts to identify and resolve performance bottlenecks
- Establish and maintain performance benchmarks and SLAs for critical AI services
- Provide technical leadership and mentorship to performance engineering team members
- Analyze and improve LLM inference performance, including latency, throughput, and resource utilization
- Develop and implement strategies for LLM capacity planning and scaling
- Collaborate with AI researchers to evaluate and improve LLM model architectures and training techniques for performance
- Optimize LLM inference through techniques such as quantization, distillation, and optimized kernel implementation
- Design and implement performance tests for RAG pipelines, including retrieval, ranking, and generation components
- Identify and optimize performance bottlenecks in RAG systems, such as database queries, vector search, and document processing
- Evaluate and optimize RAG system architectures for scalability and efficiency
- Tune vector databases for optimal recall and latency
- Collaborate with infrastructure teams to optimize hardware and software configurations for AI workloads
- Evaluate and recommend new technologies and tools for performance monitoring and analysis
- Develop and maintain performance dashboards and reports to track key metrics
- Optimize GPU utilization and memory management for LLM inference
- Work closely with AI researchers, software engineers, and product managers to ensure performance requirements are met
- Communicate performance findings and recommendations to stakeholders at all levels
- Stay up to date with the latest developments in Generative AI and performance engineering
Preferred Qualifications
- Hold a Master's degree in Computer Science, Engineering, or a related field
Benefits
- Generous PTO, plus company holidays
- Medical, dental, and vision coverage for you and your family
- Paid parental leave for all parents (12 weeks)
- Fertility and family planning support
- Early-detection cancer testing through Galleri
- Flexible spending account and dependent FSA options
- Health savings account for eligible plans with company contribution
- Annual work-life stipends for home office setup, cell phone, and internet
- Wellness stipend for gym, massage/chiropractor, personal training, etc.
- Learning and development stipend
- Company-wide off-sites and team off-sites
- Competitive compensation, company stock options, and 401(k)
