Senior Data Collection Engineer


Centric Software

πŸ“Remote - Worldwide

Summary

Centric Software is hiring a Senior Data Collection Engineer to build scalable, high-quality data collection systems. You will design and maintain robust web crawlers using Scrapy, with an emphasis on modularity and maintainability; enhance supporting infrastructure, including CI/CD pipelines and Scrapyd; and integrate performance monitoring systems backed by regular spider audits. The role also involves building data validation mechanisms and collaborating with internal consumers to ensure data quality, as well as upholding coding standards, conducting code reviews, and mentoring junior engineers. Working cross-functionally, you will help promote a culture of knowledge sharing and continuous improvement. The role requires expertise in web technologies, cloud infrastructure, and data pipeline development; the ideal candidate also brings strong communication and problem-solving skills.

Requirements

  • Comfort with Git workflows, code reviews, and CI/CD pipelines
  • Experience with cloud infrastructure like AWS
  • Experience with monitoring/observability systems like Grafana and Sentry
  • Knowledge of the Web environment (model, standards, DOM, Request-Response, Cookies, JavaScript, Browsers, Headers, XHR, etc.)
  • Excellent communication skills in English, both written and spoken
  • A collaborative mindset with a proactive approach to knowledge sharing
  • Strong analytical thinking and problem-solving abilities
  • Commitment to continuous improvement, mentoring, and agile team dynamics
  • Commitment to staying up to date with technology trends to keep our software as innovative as possible

Responsibilities

Design and Build Robust Web Crawlers
  • Develop and maintain spiders for high-scale data extraction using Scrapy
  • Ensure spiders are modular, reusable, and easy to maintain with components such as loaders, middlewares, and pipelines
  • Apply advanced techniques to bypass anti-bot mechanisms, including rotating proxies, captcha-solving strategies, and fingerprinting

Enhance and Maintain Infrastructure
  • Build scalable CI/CD pipelines for automated testing, deployment, and monitoring of spiders
  • Leverage tools like Scrapyd for centralized spider scheduling and lifecycle management
  • Ensure efficient parallelization and cloud deployment for high-throughput crawling

Code Quality and Consistency
  • Uphold coding standards and implement consistent practices across teams
  • Conduct thorough code reviews and mentor junior engineers on clean code principles
  • Maintain version control and detailed change logs for spider development

Monitoring, Maintenance & Reliability
  • Integrate performance monitoring systems to provide real-time alerts and health checks
  • Schedule periodic spider audits to handle site structure changes and improve reliability
  • Troubleshoot failures and optimize resource usage (CPU/network) for crawling efficiency

Data Integrity and Accuracy
  • Build robust data validation mechanisms to guarantee quality outputs
  • Collaborate with internal consumers to ensure the data collected aligns with business requirements
  • Continuously track data anomalies and automate recovery strategies

Collaboration and Knowledge Sharing
  • Work cross-functionally with product, engineering, and other data teams
  • Promote a culture of documentation, onboarding tools, and internal knowledge bases
  • Contribute to training initiatives, helping the team stay current on scraping techniques and technologies
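As a flavor of the data-validation work described above, a Scrapy-style item pipeline that rejects incomplete or malformed records is a common pattern. The sketch below is a framework-agnostic illustration, not Centric's actual implementation: the field names and checks are assumptions, and the local DropItem class stands in for scrapy.exceptions.DropItem so the example runs standalone.

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem so this sketch runs standalone."""


class ValidationPipeline:
    """Illustrative item pipeline: rejects records that are missing
    required fields or carry an unparseable price (field names assumed)."""

    required_fields = ("title", "price", "url")

    def __init__(self):
        self.dropped = 0  # simple counter, useful for anomaly tracking

    def process_item(self, item, spider=None):
        # Reject items missing any required field.
        missing = [f for f in self.required_fields if not item.get(f)]
        if missing:
            self.dropped += 1
            raise DropItem(f"missing fields: {missing}")
        # Reject items whose price cannot be parsed as a number.
        try:
            price = float(str(item["price"]).lstrip("$"))
        except ValueError:
            self.dropped += 1
            raise DropItem(f"unparseable price: {item['price']!r}")
        if price < 0:
            self.dropped += 1
            raise DropItem(f"negative price: {price}")
        return item
```

In a real Scrapy project, the same class (minus the local exception) would be enabled through the ITEM_PIPELINES setting, and the drop counter could feed the monitoring and alerting mentioned above.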

Preferred Qualifications

  • Familiarity with TLS/SSL, the TCP/IP stack, and low-level web networking (a strong plus)
  • Proficiency in designing fault-tolerant systems and deploying them at scale
  • Familiarity with containerized deployments
  • Proficiency in developing scalable web crawlers and data pipelines using Python and Scrapy
  • Experience building resilient scraping systems across diverse web architectures
  • Prior experience mentoring or leading junior developers
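One concrete building block behind the fault-tolerance and resilience themes above is retrying transient fetch failures with exponential backoff and jitter. The sketch below is a generic illustration under assumed names and defaults, not a description of Centric's stack; the fetch callable and sleep parameter are injectable so the behavior can be tested without network access or real waiting.

```python
import random
import time


def fetch_with_retry(fetch, url, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url), retrying transient failures with exponential backoff.

    fetch is any callable that raises on error; sleep is injectable so
    tests can avoid real waiting. Names and defaults are illustrative.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff with jitter to avoid synchronized retries
            # hammering the target site in lockstep.
            delay = base_delay * (2 ** (attempt - 1)) * (1 + random.random())
            sleep(delay)
```

In a Scrapy deployment this concern is usually handled by the built-in RetryMiddleware, but the same backoff-with-jitter idea applies wherever a crawler talks to flaky upstreams.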

