Principal Data Engineer

iHerb
Summary
Join iHerb's Global Data Engineering team as a Principal Data Engineer, leading the design, development, and maintenance of the infrastructure supporting company-wide reporting and analytics. Collaborate with engineers, architects, and analysts to provide insights and drive data-informed decision-making. Champion best development practices and maintain a strong understanding of iHerb's data platform tools. This role involves designing and developing data pipelines, building data architecture and applications, managing data pipeline jobs, and ensuring site reliability. You will also build strong cross-functional partnerships, continuously improve data understanding, and mentor other engineers. The position requires extensive experience in data engineering, strong technical skills, and excellent communication abilities.
Requirements
- 7+ years of programming skills with Python
- 3+ years of experience working with APIs
- Have experience with Docker and/or Kubernetes
- Proven experience working with large datasets
- Proficient with shell scripting
- Proficient in building automated testing within CI/CD
- Experienced in Agile methodologies and a DevOps approach to maintaining pipelines and databases
- Excellent knowledge of software engineering fundamentals
- Deep understanding of data lifecycles, data computation principles, and data stores, and a solid understanding of CI/CD principles
- Proficiency with Databricks (DLT, Medallion Architecture, Lakehouse concepts, etc.)
- Proven experience building scalable data platforms professionally
- Ability to evaluate and approve data analysis tools and processes
- Experience building data pipelines and ETL using PySpark on semi-structured data (merge, delete, combine, wrangling)
- Excellent ability to communicate large-scale projects and their impact on other decisions
- Experience with large-scale messaging systems like Kafka
- Ability to prioritize workload, handle multiple tasks, and at times meet tight deadlines
- Advanced working SQL experience
- Comprehensive understanding of data modeling principles and patterns (star and snowflake DM, ER) and a history of implementing them professionally
- Knowledge of relational and non-relational data structures, theories, principles, and best practices
- Knowledge of data privacy regulations (GDPR, CCPA, CRPA) and the impact these regulations have on data engineering framework
- Experience with data encryption and secure transmission practices (SSL/TLS, SSH, SFTP, certificates, PKI, OAuth 2.0)
- Experience with data quality improvement projects such as Master Data Management
- Strong problem-solving and analytical skills
- Strong facilitation and consensus building skills. Strong oral and written communication skills; Ability to communicate by simplifying complexity
- Ability to understand and apply customer requirements, including drawing out unforeseen implications and making design recommendations
- Passion for data engineering and for enabling others by making their data easier to access
- Proactive, requiring minimal supervision, with strong time management and organization skills
- Proven experience leading large scale projects in an engineering team
- Must be an inquisitive learner and have a thirst for improvement
- Ability to mentor Data and Analytics team members in best practices, processes and technologies in Data platforms
- Excellent verbal and written communication skills
- Databricks Certified Data Engineer Professional certification
- Generally requires a minimum of 10 years as a Data Engineer, including at least 5 years as a Data Engineer within a Data and Analytics team
- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field preferred, or a combination of education and equivalent work experience
Responsibilities
- Designs and develops pipelines that support data ingestion, curation, and provisioning of complex enterprise data to support analytics and reporting in our current technology stack
- Ensures successful deployment and provisioning of data solutions to required environments
- Designs and builds data architecture and applications that successfully enable speed, quality, and efficient pipelines
- Responsible for the data pipeline continuous integration and continuous delivery (CI/CD) processes
- Manages data pipeline jobs throughout their lifecycle
- Assists in designing and building efficient data models for robust business intelligence, analytics, and engineering needs
- Demonstrates initiative by identifying potential business issues and proactively solving them
- Analyzes and translates business needs into data models to support long-term, scalable, and reliable solutions
- Interacts with cross-functional customers and development team to gather and define requirements
- Reviews discrepancies in requirements and resolves with stakeholders in a timely manner
- Builds strong cross-functional partnerships with Data Scientists, Analysts, Product Managers, and Software Engineers to understand data needs and deliver on those needs
- Continuously improves understanding of the data and applications across the business
- Leads processes that ensure site reliability for our data stack
- Optimizes and tunes code performance
- Develops best practices for standard naming conventions and coding practices to ensure consistency of data models and tracking
- Actively engages with other technical teams to make recommendations on cohesive infrastructure guidelines
- Champions the use of the latest innovations
- Partners with IT and Legal to design secure and automated processes and implement practices that enable data democracy and agility
- Identifies and recommends appropriate data quality validations and ensures integrations are automated and have proper exception handling
- Leads pipeline code and metadata framework changes
- Engages with other development teams upstream to proactively understand downstream impacts
- Actively pursues industry developments and makes suggestions on best practices across the architecture
- Runs, guides, and implements database administration responsibilities and continuously automates relevant processes
- Seeks out opportunities to elevate fellow engineers’ abilities and experience and mentors them to upgrade their skills
Preferred Qualifications
- One of the following AWS certifications preferred:
- AWS Certified Solutions Architect – Associate/Professional
- AWS Certified Developer – Associate
- AWS Certified DevOps Engineer – Professional
- AWS Certified Data Analytics – Specialty
- AWS Certified Cloud Practitioner
- Experience with Microsoft Office Suite (Word, Excel, PowerPoint)
- Experience with Google Business Suite (Gmail, Drive, Docs, Sheets, Forms) preferred
- Experience with technologies such as Python, Kafka, Airflow, and SQL
Benefits
- Employees (and their families) that meet eligibility criteria as outlined in applicable plan documents are eligible to participate in our medical, dental, vision, and basic life insurance programs and may enroll in our company’s 401(k) plan
- Employees will also be eligible for Time Off and Paid Sick Leave pursuant to the company’s policies
- Employees will enjoy paid holidays throughout the calendar year
- Hired applicants may be awarded Restricted Stock Units and receive annual bonuses pursuant to eligibility and performance criteria defined in the respective plan documents and policies