Senior Software Engineer - Data Lakehouse Infrastructure

TRM Labs
Summary
Join TRM Labs, a blockchain intelligence company fighting crime and building a safer world, as a Senior Data Engineer. You will design, implement, and scale core components of our lakehouse architecture, owning data modeling, ingestion, query performance optimization, and metadata management. Leveraging cutting-edge tools like Apache Spark, Trino, Hudi, Iceberg, and Snowflake, you'll architect a high-performance data lakehouse on GCP. You will build and optimize distributed query engines, implement metadata management, and develop robust ETL/ELT pipelines. Collaborate with data scientists, engineers, and product managers to design and implement solutions. This role offers the opportunity to make a significant impact on a mission-driven team tackling complex global challenges.
Requirements
- 5+ years of experience in data or software engineering, with a focus on distributed data systems and cloud-native architectures
- Proven experience building and scaling data platforms on GCP, including storage, compute, orchestration, and monitoring
- Strong command of one or more query engines such as Trino, Presto, Spark, or Snowflake
- Experience with modern table formats like Apache Hudi, Iceberg, or Delta Lake
- Exceptional programming skills in Python, along with strong proficiency in SQL or Spark SQL
- Hands-on experience orchestrating workflows with Airflow and building streaming/batch pipelines using GCP-native services
Responsibilities
- Architect and scale a high-performance data lakehouse on GCP, leveraging technologies like StarRocks, Apache Iceberg, GCS, BigQuery, Dataproc, and Kafka
- Design, build, and optimize distributed query engines such as Trino, Spark, or Snowflake to support complex analytical workloads
- Implement metadata management in open table formats such as Iceberg, along with data discovery frameworks for governance and observability, using Iceberg-compatible catalogs
- Develop and orchestrate robust ETL/ELT pipelines using Apache Airflow, Spark, and GCP-native tools (e.g., Dataflow, Composer); see the illustrative sketch after this list
- Collaborate across departments, partnering with data scientists, backend engineers, and product managers to design and implement data solutions
- Build scalable automation to streamline routine scaling and maintenance tasks, such as self-serve tooling for provisioning new PgBouncer instances, scaling disks, and scaling or upgrading clusters
- Make recurring tasks faster to perform each time and reduce dependency on any single person
- Identify ways to compress timelines using the 80/20 principle. For instance, what does it take to be operational in a new environment? Identify the must-haves and nice-to-haves needed to deploy our stack so it is fully operational. Focus on the must-haves first to get us operational, then use future milestones to harden for customer readiness. We think in terms of weeks, not months
- Identify a first version, a.k.a. a "skateboard," for each project. For instance, build an observability dashboard within a week, then gather feedback from stakeholders to identify additional needs or bells and whistles to add to the dashboard
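
As a rough illustration of the pipeline-orchestration work described above, here is a minimal Airflow DAG sketch. It assumes a simple daily batch job; the DAG id, task name, and `load_to_lakehouse` callable are hypothetical placeholders, not TRM's actual pipeline code.

```python
# Minimal, illustrative Airflow DAG of the kind this role would own.
# The DAG id, schedule, and load_to_lakehouse callable are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_to_lakehouse(ds: str, **_) -> None:
    # Placeholder ETL step: in practice this might submit a Spark job on
    # Dataproc that writes an Iceberg table on GCS for the given partition.
    print(f"Loading lakehouse partition for {ds}")


with DAG(
    dag_id="example_lakehouse_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="load_to_lakehouse",
        python_callable=load_to_lakehouse,
    )
```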
Benefits
- The estimated base salary range for this role is $190,000 - $220,000
- Additionally, this role may be eligible to participate in TRM’s equity plan
- PTO, Holidays, and Parental Leave for full-time employees
- Remote-first