Principal Data Engineer

iHerb
Summary
Join iHerb's Global Data Engineering team as a Principal Data Engineer, leading the design, development, and maintenance of the infrastructure supporting company-wide reporting and analytics. Collaborate with engineers, architects, and analysts to provide insights and drive data-informed decision-making. Champion best development practices and maintain a strong understanding of iHerb's data platform tools. This role involves designing and developing data pipelines, building data architecture and applications, managing data pipeline jobs, and ensuring site reliability. You will also build strong cross-functional partnerships, continuously improve data understanding, and mentor other engineers. The position requires extensive experience in data engineering, strong technical skills, and excellent communication abilities.
Requirements
- 7+ years of programming skills with Python
- 3+ years of experience working with APIs
- Have experience with Docker and/or Kubernetes
- Proven experience working with large datasets
- Proficient with shell scripting
- Proficient in building automated testing within CI/CD
- Experienced in Agile methodologies and a DevOps approach to maintaining pipelines and databases
- Excellent knowledge of software engineering fundamentals
- Deep understanding of data lifecycles, data computation principles, and data stores, and a solid understanding of CI/CD principles
- Proficiency with Databricks (DLT, Medallion Architecture, Lakehouse concepts, etc.)
- Proven experience building scalable data platforms professionally
- Ability to evaluate and approve data analysis tools and processes
- Experience building data pipelines and ETL using PySpark on semi-structured data (merge, delete, combine, wrangling)
- Excellent ability to communicate large-scale projects and their impact on other decisions
- Experience with large-scale messaging systems like Kafka
- Ability to prioritize workload, handle multiple tasks, and at times meet tight deadlines
- Advanced working SQL experience
- Comprehensive understanding of data modeling principles and patterns (star and snowflake DM, ER) and a history of implementing them professionally
- Knowledge of relational and non-relational data structures, theories, principles, and best practices
- Knowledge of data privacy regulations (GDPR, CCPA, CRPA) and the impact these regulations have on data engineering framework
- Experience with data encryption and secure transmission practices (SSL/TLS, SSH, SFTP, certificates, PKI, OAuth 2.0)
- Experience with data quality improvement projects such as Master Data Management
- Strong problem-solving and analytical skills
- Strong facilitation and consensus building skills. Strong oral and written communication skills; Ability to communicate by simplifying complexity
- Ability to understand and apply customer requirements, including drawing out unforeseen implications and making design recommendations
- Passion for data engineering and for enabling others by making their data easier to access
- Proactive, requiring minimal supervision, with strong time management and organization skills
- Proven experience leading large scale projects in an engineering team
- Must be an inquisitive learner and have a thirst for improvement
- Ability to mentor Data and Analytics team members in best practices, processes and technologies in Data platforms
- Excellent verbal and written communication skills
- Databricks Certified Data Engineer Professional certification
- Generally requires a minimum of 10 years as a Data Engineer, including at least 5 years as a Data Engineer within a Data and Analytics team
- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field preferred, or a combination of education and equivalent work experience
Responsibilities
- Designs and develops pipelines that support data ingestion, curation, and provisioning of complex enterprise data to support analytics and reporting in our current technology stack
- Ensures successful deployment and provisioning of data solutions to required environments
- Designs and builds data architecture and applications that successfully enable speed, quality, and efficient pipelines
- Responsible for the data pipeline continuous integration and continuous delivery (CI/CD) processes
- Manages data pipeline jobs throughout their lifecycle
- Assists in designing and building efficient data models for robust business intelligence, analytics, and engineering needs
- Demonstrates initiative by identifying potential business issues and proactively solving them
- Analyzes and translates business needs into data models to support long-term, scalable, and reliable solutions
- Interacts with cross-functional customers and development team to gather and define requirements
- Reviews discrepancies in requirements and resolves with stakeholders in a timely manner
- Builds strong cross-functional partnerships with Data Scientists, Analysts, Product Managers, and Software Engineers to understand data needs and deliver on those needs
- Continuously improves understanding of the data and applications across the business
- Leads processes that ensure site reliability for our data stack
- Optimizes and tunes code performance
- Develops best practices for standard naming conventions and coding practices to ensure consistency of data models and tracking
- Actively engages with other technical teams to make recommendations on cohesive infrastructure guidelines
- Champions the use of the latest innovations
- Partners with IT and Legal to design secure and automated processes and implement practices that enable data democracy and agility
- Identifies and recommends appropriate data quality validations and ensures integrations are automated and have proper exception handling
- Leads pipeline code and metadata framework changes
- Engages with other development teams upstream to proactively understand downstream impacts
- Actively pursues industry developments and makes suggestions on best practices across the architecture
- Runs, guides, and implements database administration responsibilities and continuously automates relevant processes
- Seeks out opportunities to elevate fellow engineers’ abilities and experience and mentors them to upgrade their skills
Preferred Qualifications
- One of the following AWS certifications preferred:
- AWS Certified Solutions Architect – Associate/Professional
- AWS Certified Developer – Associate
- AWS Certified DevOps Engineer – Professional
- AWS Certified Data Analytics – Specialty
- AWS Certified Cloud Practitioner
- Experience with Microsoft Office Suite (Word, Excel, PowerPoint)
- Experience with Google Business Suite (Gmail, Drive, Docs, Sheets, Forms) preferred
- Experience with technologies such as Python, Kafka, Airflow, and SQL
Benefits
- Employees (and their families) that meet eligibility criteria as outlined in applicable plan documents are eligible to participate in our medical, dental, vision, and basic life insurance programs and may enroll in our company’s 401(k) plan
- Employees will also be eligible for Time Off and Paid Sick Leave pursuant to the company’s policies
- Employees will enjoy paid holidays throughout the calendar year
- Hired applicants may be awarded Restricted Stock Units and receive annual bonuses pursuant to eligibility and performance criteria defined in the respective plan documents and policies