Senior Infrastructure Software Engineer - Cloud

Dremio
Summary
Join Dremio's Observability and Core Services team and play a key role in building a next-generation observability platform. You will design and develop this platform to stream telemetry data, enabling real-time insights for better customer support. Collaborate with DevOps and SRE teams to provide a simplified and secure experience for enterprise customers. Work with core product development teams to ensure safe product launches and partner with field teams for effective user support. You will also contribute to modernizing the software stack by delivering shared services and core components. This role involves solving complex challenges related to distributed systems and contributing to a high-quality, scalable platform. Grow through collaboration and ownership of complex problems.
Requirements
- Bachelor's, Master's, or higher in Computer Science or a related technical field
- 5+ years of relevant work experience
- 2+ years of hands-on infrastructure-as-code, DevOps, or SRE experience
- Expertise with cloud platforms such as AWS, Azure, or GCP
- Hands-on experience with OpenTelemetry or related observability standards and platforms
- Proficient in Java, Python, Bash, and Node.js
- Experience with containers and container orchestration tools, such as Docker and Kubernetes
- Strong understanding of networking and common protocols such as TCP/IP, DNS, and HTTP
Responsibilities
- Deliver an observability platform to ingest customer telemetry data—including events, logs, and traces—leveraging OpenTelemetry and industry best practices
- Provide dashboarding and visualization capabilities to enable Support/Field teams to monitor systems, identify potential issues, detect trends, and automatically scale to increase resiliency for Enterprise customers
- Design and implement a robust notification and alerting system driven by telemetry data and heuristics
- Develop infrastructure-as-code solutions using Helm and Terraform to simplify deployment and monitoring within Dremio Software
- Collaborate with data analytics engineers to contribute to and maintain application frameworks that standardize development, accelerate time-to-delivery, and address complex challenges such as dependency injection management and role-based access control at scale
- Collaborate with Cloud SRE and DevOps teams to define best practices and deliver standardized solutions for common engineering workflows
- Engage with the developer community to gain deep insights into their pain points, use cases, and technical challenges
Preferred Qualifications
- In-depth knowledge of SaaS, microservices, and distributed systems development
- Hands-on experience with multi-threaded and asynchronous programming models
- Proficient with NoSQL databases such as MongoDB or RocksDB
- Knowledge of SQL and RDBMS systems
- Experience with Redis caching and queuing systems
- Familiar with CI/CD tools like Jenkins and ArgoCD
- Extensive experience in query processing and optimization, distributed systems, concurrency control, data replication, code generation, networking, and storage systems
- Practical experience with Java GC/heap management, Apache Arrow, SQL operators, caching techniques, and disk spilling
Benefits
- Workplace Wednesdays, to break down silos, build relationships, and improve cross-team communication. Lunch catering or meal credits are provided in the office, and local socials are aligned to Workplace Wednesdays
- In general, Dremio will remain a hybrid work environment. We will not be implementing a 100% (5 days a week) return to office policy for all roles
