Summary
Join TRM Labs, a blockchain intelligence company, as a Senior or Staff Engineer on the Data Platform team. You will focus on data incrementalization: efficiently processing and updating massive datasets. You will design and build internal tools that empower data scientists and machine learning engineers to transform raw blockchain data into real-time intelligence, collaborate with cross-functional teams to design and implement new data models and tools, and continuously monitor and optimize the Data Platform's performance for cost efficiency, scalability, and reliability. This role offers the opportunity to make a meaningful impact on a mission-driven team tackling complex global challenges.
Requirements
- Bachelor's degree (or equivalent) in Computer Science or a related field
- 5+ years of experience building distributed systems architecture, with a particular focus on taking incremental update systems from inception to production
- Strong programming skills in Python and SQL
- Deep technical expertise in advanced data structures and algorithms for incremental updating of data stores (e.g., Graphs, Trees, Hash Maps)
- Comprehensive knowledge across all facets of data engineering, including:
  - Implementing and managing incremental updates in data stores such as BigQuery, Snowflake, Redshift, Athena, Hive, and Postgres (a minimal illustration follows this list)
  - Orchestrating data pipelines and workflows focused on incremental processing using tools such as Airflow, dbt, Luigi, Azkaban, and Storm
  - Developing and optimizing data processing technologies and streaming workflows for incremental updates (e.g., Spark, Kafka, Flink)
  - Deploying and monitoring scalable, incremental update systems in public cloud environments (e.g., Docker, Terraform, Kubernetes, Datadog)
- Expertise in loading, querying, and transforming large datasets with a focus on efficiency and incremental growth
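As a minimal illustration of the incremental-update pattern referenced above (not TRM's actual stack), the sketch below performs a MERGE-based upsert in Python that folds a staging batch into a warehouse table instead of rebuilding it. The google-cloud-bigquery client and the tables analytics.wallet_activity and staging.wallet_activity_delta are hypothetical assumptions for the example.

```python
# Illustrative sketch only: hypothetical tables and schema, not TRM's pipeline.
from google.cloud import bigquery

MERGE_SQL = """
MERGE `analytics.wallet_activity` AS target
USING `staging.wallet_activity_delta` AS source
ON target.wallet_id = source.wallet_id
   AND target.activity_date = source.activity_date
WHEN MATCHED THEN
  -- Update rows that already exist instead of rewriting the whole table.
  UPDATE SET tx_count = source.tx_count,
             volume_usd = source.volume_usd,
             updated_at = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
  -- Insert only the genuinely new rows from the latest batch.
  INSERT (wallet_id, activity_date, tx_count, volume_usd, updated_at)
  VALUES (source.wallet_id, source.activity_date, source.tx_count,
          source.volume_usd, CURRENT_TIMESTAMP())
"""

def apply_incremental_update() -> None:
    """Fold the latest staging delta into the warehouse table in place."""
    client = bigquery.Client()        # relies on default project credentials
    client.query(MERGE_SQL).result()  # block until the MERGE job completes

if __name__ == "__main__":
    apply_incremental_update()
```

The same upsert shape carries over to Snowflake and Redshift (MERGE) or Postgres (INSERT ... ON CONFLICT), which is what keeps per-run cost proportional to the size of the delta rather than the full table.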
Responsibilities
- Design and build our Cloud Data Warehouse with a focus on incremental updates to improve cost efficiency and scalability
- Research innovative methods to incrementally optimize data processing, storage, and retrieval to support efficient data analytics and insights
- Develop and maintain ETL pipelines that transform and incrementally process petabytes of structured and unstructured data to enable data-driven decision-making (see the orchestration sketch after this list)
- Collaborate with cross-functional teams to design and implement new data models and tools focused on accelerating innovation through incremental updates
- Continuously monitor and optimize the Data Platform's performance, focusing on enhancing cost efficiency, scalability, and reliability
- Build scalable tooling to streamline routine scaling and maintenance tasks, such as self-serve automation for provisioning new PgBouncer instances, scaling disks, and scaling or updating clusters
- Enable tasks to be completed faster the next time and reduce dependency on any single person
- Identify ways to compress timelines using the 80/20 principle. For instance, what does it take to be operational in a new environment? Identify the must-haves and nice-to-haves needed to deploy our stack so it is fully operational. Focus on the must-haves first to get us operational, then use future milestones to harden for customer readiness. We think in terms of weeks, not months
- Identify the first version, a.k.a. the "skateboard," for each project. For instance, build an observability dashboard within a week, then gather feedback from stakeholders to identify further needs or bells and whistles to add to the dashboard
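To make the incremental ETL and orchestration responsibilities above concrete, here is a minimal sketch of a daily Airflow DAG that processes only the slice of data covered by each run's interval instead of reprocessing history. It assumes Airflow 2.4+ with the TaskFlow API; the DAG name, task bodies, and partition naming are hypothetical placeholders, not TRM's actual pipeline.

```python
# Illustrative sketch only: hypothetical tasks, not TRM's actual DAG.
from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.python import get_current_context


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def incremental_blockchain_etl():
    @task
    def extract_partition() -> str:
        # Pull only the rows for this run's data interval so each run does
        # incremental work instead of a full rescan.
        ctx = get_current_context()
        start, end = ctx["data_interval_start"], ctx["data_interval_end"]
        partition = f"raw_events/{start:%Y-%m-%d}"
        print(f"extracting rows between {start} and {end} into {partition}")
        return partition

    @task
    def transform_and_load(partition: str) -> None:
        # Placeholder: transform the partition and MERGE it into the warehouse,
        # as in the upsert sketch under Requirements.
        print(f"transforming and upserting {partition}")

    transform_and_load(extract_partition())


incremental_blockchain_etl()
```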
Benefits
- The estimated base salary range for this role is $230,000 - $255,000
- Additionally, this role may be eligible to participate in TRM's equity plan
- PTO, Holidays, and Parental Leave for full-time employees
- Remote-first