Lead Systems Support Engineer
Thoughtworks
Summary
Join Thoughtworks as a Lead Systems Support Engineer and play a critical role in ensuring the operational efficiency and stability of complex application systems. Lead the team to operational success, strengthening incident management and DevOps proficiency. Engage directly with clients to solve problems and provide strategic solutions. Apply your expertise in debugging, application monitoring, and logging techniques. Lead system upgrades and migrations while minimizing downtime. Mentor less-experienced peers and apply the latest technology thinking to solve client problems. This role requires strong technical and professional skills, including experience with programming languages, cloud platforms, and DevOps tools.
Requirements
- Have experience working in languages such as Java and .NET, and a good understanding of a scripting language such as Python or PowerShell
- Have a good understanding of cloud platforms such as AWS, Azure, or GCP
- Have experience working with application monitoring tools such as Datadog, Prometheus, and Grafana, understand the metrics they expose, and be able to generate reports and take corrective actions
- Have experience working with a relational or non-relational database
- Have experience working with CI/CD tools like Jenkins, GitHub Actions, Buildkite, or Azure Pipelines
- Possess strong debugging/triaging skills to troubleshoot code effectively
- Ensure deliverables (bug fixes and enhancements) are of high quality and well-tested
- Conduct system performance analysis, identify bottlenecks, and implement optimization strategies
- Perform predictive analysis and work with development teams to proactively identify issues
- Have a high-level understanding of various architectures such as monolithic, N-tier, layered, microservices, and serverless
- Enjoy influencing others and advocate for technical excellence while being open to change
- Have good communication and articulation skills
- Have a presence in the external tech community and willingly share expertise
- Be resilient in ambiguous situations and approach challenges from multiple perspectives
- Influence clients on processes (incident management, support levels, scope of work) and communicate details to justify changes
- Advocate for and implement cloud best practices in resource optimization, monitoring, and alerting
- Advocate for and implement security best practices
- Be comfortable with Agile methods, such as Scrum or Kanban
Responsibilities
- Understand complex application systems and debug business-impacting issues
- Apply incident management processes and tools, along with application monitoring metrics and tooling, to generate reports and take corrective actions
- Leverage knowledge of different logging techniques for alerting, monitoring, and identifying the root cause of incidents
- Follow standards and best practices to ensure operational efficiencies, stability, and availability of the system
- Lead the planning and execution of system upgrades, migrations, and maintenance activities, minimizing downtime and disruption to operations
- Use continuous delivery practices to evolve and support high-quality software, bringing value to end customers while working in collaborative, value-driven teams to build innovative customer experiences for clients
- Efficiently use DevOps tools and practices to deploy and run software
- Act as a mentor for less-experienced peers through technical knowledge and leadership skills
- Apply the latest technology thinking from the Technology Radar to solve client problems
Benefits
- Learning & Development opportunities: interactive tools, numerous development programs, and supportive teammates
- Remote work