Data Engineer, Cloud Consultant

Pythian
Remote - Canada, United States
Summary
Join Pythian, a global leader in data and cloud solutions, as a Data Engineer/Cloud Consultant! Work remotely within a team of experts to design and build impactful software for enterprise data platforms, primarily on cloud platforms. You will contribute to real customer projects and automate data platform implementations and data migrations. This role requires proficiency in programming languages like Python or Java, experience with big data cloud technologies, and strong SQL skills. Pythian offers a competitive compensation package, flexible remote work, substantial training allowances, and a focus on employee well-being.
Requirements
- Proficiency in a programming language such as Python, Java, Go or Scala
- Experience with big data cloud technologies like EMR, Athena, Glue, BigQuery, Dataproc, and Dataflow
- Understand the fundamentals of Spark (PySpark or Spark SQL), including using the DataFrame API and analyzing and performance-tuning Spark queries (see the PySpark sketch after this list)
- Have experience developing and supporting robust, automated, and reliable data pipelines
- Develop frameworks and solutions that enable us to acquire, process, monitor, and extract value from large datasets
- Have strong SQL skills
- Bring good knowledge of popular database and data warehouse technologies and concepts from Google, Amazon, or Microsoft (cloud and conventional RDBMS), such as BigQuery, Redshift, Microsoft Azure SQL Data Warehouse, Snowflake, etc.
- Have strong knowledge of data orchestration solutions like Airflow, Oozie, Luigi, or Talend
- Have strong knowledge of dbt (data build tool) or Dataform
- Experience working with software engineering best practices for development, including source control systems, automated deployment pipelines like Jenkins, and DevOps tools like Terraform
- Experience in data modeling, data design and persistence (e.g. warehousing, data marts, data lakes)
- Experience performing DevOps activities such as IaC using Terraform, provisioning infrastructure in GCP/AWS/Azure, defining data security layers, etc.
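To give a flavor of the Spark DataFrame work referenced above, here is a minimal, hypothetical PySpark sketch; the input path, column names, and aggregation are illustrative assumptions, not part of the role description.

```python
# Minimal PySpark sketch (illustrative only): a small DataFrame API aggregation.
# The Parquet paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-purchase-counts").getOrCreate()

# Read raw event data (hypothetical location).
events = spark.read.parquet("s3://example-bucket/events/")

# Count purchase events per day using the DataFrame API.
daily_purchases = (
    events
    .filter(F.col("event_type") == "purchase")
    .groupBy(F.to_date("event_ts").alias("event_date"))
    .agg(F.count("*").alias("purchases"))
)

# Inspect the physical plan when analyzing or tuning query performance.
daily_purchases.explain()

daily_purchases.write.mode("overwrite").parquet("s3://example-bucket/daily_purchases/")
```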
Responsibilities
- Design and develop end-to-end cloud-based solutions with a heavy focus on application and data, backed by a good understanding of infrastructure
- Translate complex functional and technical requirements into detailed designs
- Write high-performance, reliable and maintainable code
- Develop test automation and associated tooling needed for the project
- Work on complex and varied cloud-based projects, including tasks such as collecting, parsing, managing, analyzing, and visualizing very large datasets
- Maintain and execute DataOps tasks such as performance optimization of ETL/ELT pipelines, diagnosis and troubleshooting of pipeline issues, interpreting data observability dashboards, and implementing enhancements
- Perform data pipeline specific DevOps activities such as provisioning infrastructure, writing IaC code, and implementing data security
- Analyze potential issues, complete root cause analysis, and assign issues to be resolved
- Follow up with Data Engineering team members to see fixes through to completion
- Review bug descriptions, functional requirements and design documents, incorporating this information into test plans and test cases
- Performance tuning for batch and real-time data processing
- Secure components of clients' cloud data platforms
- Health-checks and configuration reviews
- Data pipeline development: ingestion, transformation, and cleansing (see the orchestration sketch after this list)
- Data flow integration with external systems
- Integration with data access tools and products
- Foundational CI/CD for all infrastructure components, data pipelines, and custom data apps
- Provide common operational visibility of the data platform, from the platform infrastructure to data pipelines and machine learning apps
- Assist client application developers and advise on efficient data access and manipulation
- Define and implement efficient operational processes
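As a rough illustration of the pipeline development and orchestration responsibilities above, here is a minimal Airflow DAG sketch. It assumes Airflow 2.x; the DAG id, task names, and callables are hypothetical placeholders rather than actual project code.

```python
# Minimal Airflow 2.x DAG sketch (illustrative only): a three-step
# ingest -> transform -> cleanse pipeline with hypothetical task logic.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    # Placeholder: pull raw data from a source system into staging storage.
    print("ingesting raw data")


def transform():
    # Placeholder: apply business transformations to the staged data.
    print("transforming data")


def cleanse():
    # Placeholder: deduplicate and validate the transformed data.
    print("cleansing data")


with DAG(
    dag_id="example_ingest_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    cleanse_task = PythonOperator(task_id="cleanse", python_callable=cleanse)

    # Run the tasks sequentially: ingest, then transform, then cleanse.
    ingest_task >> transform_task >> cleanse_task
```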
Preferred Qualifications
- Ideally, you will have strong hands-on experience working with Google Cloud Platform data technologies: Google BigQuery, Google Dataflow, and executing PySpark and Spark SQL code on Dataproc (see the BigQuery sketch after this list)
- Experience with Apache Iceberg, Hudi, and query engines like Presto (Trino) is a plus
- Knowledge of data catalogs (AWS Glue, Google Dataplex, etc.), data governance, and data quality solutions (e.g., Great Expectations) is an added advantage
- Have knowledge of how to design distributed systems and the trade-offs involved
- Knowledge of GenAI tools and frameworks such as Vertex AI and LangChain, along with proficiency in prompt engineering, is good to have
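For the BigQuery experience mentioned above, a minimal sketch using the google-cloud-bigquery Python client is shown below. The project, dataset, and table names are hypothetical, and credentials are assumed to be configured via Application Default Credentials.

```python
# Minimal BigQuery client sketch (illustrative only): run a query and read rows.
# Assumes `pip install google-cloud-bigquery` and Application Default Credentials.
from google.cloud import bigquery

client = bigquery.Client()  # picks up the default project from the environment

query = """
    SELECT user_id, COUNT(*) AS orders
    FROM `example-project.analytics.orders`   -- hypothetical table
    GROUP BY user_id
    ORDER BY orders DESC
    LIMIT 10
"""

# query() returns a QueryJob; result() blocks until the job completes.
for row in client.query(query).result():
    print(row.user_id, row.orders)
```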
Benefits
- Competitive total rewards package
- Flexibly work remotely from your home; there's no daily travel requirement to an office!
- Hone your skills or learn new ones with our substantial training allowance; participate in professional development days, attend training, become certified, whatever you like!
- We give you all the equipment you need to work from home including a laptop with your choice of OS, and an annual budget to personalize your work environment!
- You will have an annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness and more)
- You will receive a generous amount of paid vacation and sick days, as well as a day off to volunteer for your favorite charity