Production Support Engineer

Makpar Corporation
Summary
Join Makpar, a leading professional and technical solutions provider for the Federal government, as a Production Support Engineer. This exciting opportunity is ideal for experienced professionals with Site Reliability and incident management expertise. You will be the first line of defense for production and test environment issues, collaborating with various teams to resolve incidents and apply Site Reliability Engineering principles. Responsibilities include troubleshooting, system monitoring, service ticket creation, and providing weekend/after-hours support. You will also participate in performance analysis and improvement, and provide regular status reports. This role requires a Bachelor's degree, AWS Cloud Certification, and 3+ years of relevant IT experience. Strong analytical, problem-solving, and communication skills are essential.
Requirements
- Bachelor's Degree in computer science, engineering, or related field
- AWS Cloud Certification
- 3+ years of relevant IT experience; experience with site reliability engineering in a cloud environment is preferred
- Knowledge of Java and microservice development and deployment
- Experience with Splunk is essential; experience with AppDynamics is highly desirable
- Understanding of the business processes behind applications
- Strong analytical, problem-solving, negotiation, task and project management, and organizational skills
- Strong oral and written communication skills, including process documentation
- Proficiency in Microsoft Office applications (Word, PowerPoint, Excel, and Project)
- Proficient knowledge of computer systems, databases, and SharePoint
- A passion for improving the customer experience
Responsibilities
- Be the first line of defense for production and test environment issues
- Work collaboratively with various teams to identify, manage and resolve ongoing incidents
- Troubleshoot and connect with appropriate teams to effectively triage issues impacting production and test environments
- Apply Site Reliability Engineering principles to create proactive alerts and to determine the root cause of production issues
- Understand the system architecture and its upstream and downstream dependencies to enable effective participation in triage and restoration activities
- Set up and perform systems monitoring of applications within the application domain after service restoration and after patching, maintenance, and upgrades
- Create necessary service tickets and ensure tickets are routed to the appropriate technical teams
- Provide weekend and after-hours support for planned maintenance and unplanned incident management activities
- Plan upcoming maintenance events by working closely with all necessary stakeholders
- Participate in analysis and improvement of system performance
- Provide regular reports to management on system status, uptime, and performance
Preferred Qualifications
Candidates must be US Citizens or Legal Permanent Residents (Green Card status) for 3 years and must be Federal Tax compliant
Benefits
- Flexible work schedules
- Excellent training and career development opportunities
- Generous compensation package