Senior Data Engineer

Nerdery
Summary
Join Nerdery as a Senior Data Engineer (GCP) and become a vital part of our team, architecting and implementing data solutions from the ground up. You will leverage your expertise in GCP data services, SQL, Python, and ETL processes to extract, transform, and load data from various sources. As a Senior Data Engineer, you will build and optimize data pipelines, integrate with tools like BigQuery and Vertex AI, and ensure data security and compliance. You will collaborate with cross-functional teams, including clients, to deliver actionable insights and support their data infrastructure needs. This role requires a Bachelor's degree in Computer Science or a related field, 6+ years of relevant experience, and proficiency in GCP data services and programming languages. The ideal candidate will be a proactive collaborator with excellent problem-solving and communication skills.
Requirements
- Bachelor's degree in Computer Science or a related field, or equivalent experience
- 6+ years of relevant experience
- In-depth knowledge of Google Cloud Platform (GCP) data services such as BigQuery, Dataflow, Dataproc, and Pub/Sub, with proven experience in designing and implementing data pipelines, data storage, and analytics solutions in GCP
- Ability to take technical requirements and produce functional code
- Experience with Git and the other technologies specified in this posting
- Proficiency in Python and SQL
- Experience with migrating data pipelines and infrastructure to GCP from multiple infrastructure stacks
- Deep understanding of data modeling, ETL processes, and data warehousing principles
- Familiarity with data pipeline orchestration tools and practices, such as Pub/Sub, streaming pipelines, and Cloud Functions
- Excellent problem-solving and analytical skills
- Ability to communicate with technical and non-technical client stakeholders
- Must be legally authorized to work within the country of employment without sponsorship for employment visa status
Responsibilities
- Apply deep SQL knowledge and experience to work with relational databases and BigQuery, author complex queries, and maintain working familiarity with a variety of databases
- Leverage BigQuery SQL for implementing ELT (Extract, Load, Transform) processes and performing complex data aggregations (a brief illustrative sketch follows this list)
- Support and optimize Looker dashboards based on BigQuery datasets, including tuning for optimal performance with complex visualizations
- Integrate BigQuery and Vertex AI for advanced analytics and machine learning applications
- Apply appropriate IAM (Identity and Access Management) roles within Google Cloud Platform to ensure fine-grained data access and security
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other critical business performance metrics
- Design and implement scalable and reliable data pipelines on GCP
- Implement Change Data Capture (CDC) techniques and manage Delta Live Tables for real-time data integration and analytics, ensuring data consistency and enabling incremental data updates in cloud-based data platforms
- Design, configure, and manage Data Lakes in GCP, utilizing services like Google Cloud Storage, BigQuery, and Dataproc, to support diverse data types and formats for scalable storage, processing, and analytics
- Design API architecture, including RESTful services and microservices, integrating Machine Learning models into production systems to enhance data-driven applications and services
- Build, using infrastructure as code (IaC), the infrastructure required for extraction, transformation, and loading (ETL) of data from a wide variety of data sources using SQL and GCP
- Migrate and create data pipelines and infrastructure from AWS or Azure to GCP
- Write and maintain robust, efficient, scalable Python scripts for data processing and automation
- Apply a strong understanding of data pipeline design patterns and determine the best fit for each use case
- Work with unstructured datasets
- Build processes supporting data transformation, data structures, metadata, dependency, and workload management
- Manipulate, process, and extract value from large, disconnected datasets
- Work with stakeholders, including the Executive, Product, Data, and Design teams, to assist with data-related technical issues and support their data infrastructure needs
- Assume responsibility for the stability of the data in transit and at rest
- Collaborate directly with the client to identify and implement data security and compliance requirements. Keep client data secure using best practices
- Foster cross-functional collaboration as a technical liaison between engineering and other project disciplines (Design, Quality, Project Management, Strategy, Product, etc.)
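As a minimal sketch of the kind of BigQuery ELT work named above, the example below runs an aggregation query in BigQuery using the google-cloud-bigquery Python client. The project, dataset, and table names are hypothetical placeholders, not part of this posting.

```python
# Illustrative sketch only: an ELT-style aggregation run in BigQuery.
# All project/dataset/table names below are hypothetical placeholders.
from google.cloud import bigquery


def build_daily_revenue_summary(project_id: str = "example-project") -> None:
    """Transform already-loaded raw orders into an aggregated reporting table."""
    client = bigquery.Client(project=project_id)

    # The "T" in ELT: transform data in place with SQL after it has been loaded.
    sql = """
    CREATE OR REPLACE TABLE `example-project.reporting.daily_revenue` AS
    SELECT
      DATE(event_timestamp) AS event_date,
      customer_id,
      SUM(order_amount)     AS total_revenue,
      COUNT(*)              AS order_count
    FROM `example-project.raw.orders`
    GROUP BY event_date, customer_id
    """

    # Submit the query job and block until it completes.
    client.query(sql).result()


if __name__ == "__main__":
    build_daily_revenue_summary()
```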
Preferred Qualifications
- Proactive collaborator who works with colleagues to improve their technical aptitude