Software Engineer

Tekmetric
Summary
Join Tekmetric, a cloud-based auto-repair shop management system company, as a Software Engineer specializing in web scraping, data processing, and search technologies. You will be responsible for building a large-scale data ingestion and classification system, extracting data from diverse sources, cleaning and normalizing it, and building search capabilities using ElasticSearch/OpenSearch. You will work with Python, Scrapy, Airflow, Kubernetes, AWS, and Spark to create scalable, high-performance data pipelines. Tekmetric offers a dynamic work environment, flexible and remote work opportunities, generous PTO, exceptional leave programs, excellent medical, dental, vision, and prescription drug coverage, a 401(k) retirement savings plan with a 6% match, employer-covered STD, LTD, Life and AD&D Insurance Programs, up to $60 monthly for wellness expenses and activities, and education assistance.
Requirements
- 3+ years of Python experience building crawling/scraping solutions at scale
- Experience working with REST APIs and PDF processing (OCR, Tesseract, PyMuPDF, etc.)
- Proficiency in data processing & search technologies (ElasticSearch/OpenSearch, NoSQL/SQL databases)
- Hands-on experience with Airflow and Spark (EMR) or similar distributed systems
- Strong problem-solving skills in handling anti-scraping mechanisms and data scaling challenges
- Hands-on experience with AWS or GCP
Responsibilities
- Design and build large-scale, distributed crawling bots (possibly AI agents) and infrastructure that operate in an adversarial environment with low operational overhead
- Develop and maintain data pipelines to extract data from large volumes of web pages, documents, PDFs (OCR), and APIs
- Help unify heterogeneous documents into a coherent data schema across varied source formats
- Preprocess and normalize raw data for downstream classification, ML/NLP, and search indexing
- Build APIs to expose structured, classified data via ElasticSearch/OpenSearch
- Collaborate with ML/NLP teams to integrate classification models into the pipeline
- Automate workflows using Apache Airflow and deploy solutions in Kubernetes on AWS
- Optimize and scale data pipelines using Spark (EMR) for processing large datasets
Preferred Qualifications
- Familiarity with NLP and Machine Learning (a plus but not required)
- Experience with LLMs, NLP models, or ML frameworks (e.g., Hugging Face, spaCy, TensorFlow, PyTorch)
- Prior experience in automated document classification
- Experience working in high-scale, production environments with petabytes of data
- Hands-on experience with Kubernetes
Benefits
- Flexible and remote work opportunities
- Generous PTO
- Exceptional leave programs for all of life's moments: maternity, paternity and parental bonding, as well as medical leave to care for yourself or loved ones
- Excellent Medical, Dental, Vision and Prescription Drug Coverage
- 401(k) Retirement Savings Plan with a 6% Match
- Employer-covered STD, LTD, Life and AD&D Insurance Programs
- Up to $60 monthly for wellness expenses and activities
- Education Assistance: includes undergraduate/graduate courses and continuing education

