Senior Software Engineer - Data Engineering
closed
Teikametrics
Summary
Join Teikametrics as a Senior Software Engineer - Data Engineering and design, develop, and scale robust data pipelines for processing massive volumes of structured and unstructured data. Collaborate with data scientists, analysts, and product engineers to deliver high-performance, scalable solutions using technologies like Databricks, Spark, Kafka, and AWS S3. You will build and manage ETL processes, implement data validation and governance, and integrate AI/ML models into production environments. The role requires 4+ years of experience in software/data engineering, with expertise in large-scale distributed data processing systems and specific technologies like Apache Spark and Kafka. You will optimize data workflows, document technical designs, and contribute to a collaborative development environment. Teikametrics offers competitive benefits including remote work flexibility, broadband reimbursement, group medical insurance, and a training and development allowance.
Requirements
- 4+ years of experience as a professional software/data engineer, with a strong background in building large-scale distributed data processing systems
- Experience with AI, machine learning, or data science concepts, including working on ML feature engineering, model training pipelines, or AI-driven data analytics
- Hands-on experience with Apache Spark (Scala or Python) and Databricks
- Experience with real-time data streaming technologies such as Kafka, Flink, Kinesis, or Dataflow
- Proficiency in Java, Scala, or Python for building scalable data engineering solutions
- Deep understanding of cloud-based architectures (AWS, GCP, or Azure) and experience with S3, Lambda, EMR, Glue, or Redshift
- Experience in writing well-designed, testable, and scalable AI/ML data pipelines that can be efficiently reused and maintained with effective unit and integration testing
- Strong understanding of data warehousing principles and best practices for optimizing large-scale ETL workflows
- Experience with ML frameworks such as TensorFlow, PyTorch, or Scikit-learn
- Ability to optimize ML feature engineering and model training pipelines for scalability and efficiency
- Knowledge of SQL and NoSQL databases for structured and unstructured data storage
- Passion for collaborative development, continuous learning, and mentoring junior engineers
Responsibilities
- Design and implement highly scalable, fault-tolerant data pipelines for real-time and batch processing
- Develop and optimize end-to-end Databricks Spark pipelines for ingesting, processing, and transforming large volumes of structured and unstructured data
- Build and manage ETL (Extract, Transform, Load) processes to integrate data from diverse sources into our data ecosystem
- Implement data validation, governance, and quality assurance mechanisms to ensure accuracy, completeness, and reliability
- Collaborate with data scientists, ML engineers, and analysts to integrate AI/ML models into production environments, ensuring efficient data pipelines for training, deployment, and monitoring
- Work with real-time data streaming solutions such as Kafka, Kinesis, or Flink to process and analyze event-driven data
- Improve and optimize performance, scalability, and efficiency of data workflows and storage solutions
- Document technical designs, workflows, and best practices to facilitate knowledge sharing and maintain system documentation
Preferred Qualifications
- Exposure to MLOps or Feature Stores for managing machine learning model data
- Experience with data governance, compliance, and security best practices
- Experience working in a fast-paced startup environment
Benefits
- Every Teikametrics employee is eligible for company equity
- Remote Work: flexibility to work from home or from our offices, plus a remote working allowance
- Broadband reimbursement
- Group Medical Insurance: coverage of INR 7,50,000 per annum for a family
- Crèche benefit
- Training and development allowance