Site Reliability Engineer

Slingshot Aerospace
Summary
Join Slingshot Aerospace and play a key role in the reliable and efficient operation of its global sensor network, which produces high-quality space situational awareness data and information products. You will support the growing program in space systems and networks and assist with network and office infrastructure. Responsibilities include managing telescope network IT infrastructure, automating processes, and developing scalable systems. You will also monitor system health, security, and compliance, and work with engineers to resolve critical issues. This role requires on-call responsibilities and strong communication and collaboration skills. The position offers a competitive salary and the opportunity to contribute to a company with a strong commitment to its values and mission.
Requirements
- Must be eligible to obtain or maintain a US Government security clearance
- Bachelor's degree in Information Technology, Computer Networking, Computer Engineering, Electrical Engineering, Computer Science, or a related field, or equivalent experience
- Proficiency with administering Linux and Windows servers
- Experience with developing scripts in at least one of the following: Bash, shell, Python
- Ability to learn new technologies
- Customer-success-driven, hands-on, action-oriented
Responsibilities
- Manage telescope network IT infrastructure, including Windows PCs, virtualized hosts, observatory control software, and Linux servers
- Pursue automation, process optimization, and innovative solutions to streamline operations and drive efficiency. This involves identifying inefficiencies, automating workflows, and delivering improvements that align with business goals and minimize system downtime
- Demonstrate strong communication and collaboration abilities
- Develop and deploy scalable systems that align with organizational growth, ensuring reliability, efficiency, and seamless integration across platforms
- Deploy tools to improve monitoring of critical telescope network computing system resources, including CPU capacity, disk capacity, operational Linux and Windows processes, and network activity
- Apply knowledge and troubleshooting experience across various development environments
- Serve as an escalation point for security and VPN tools, such as Palo Alto products or similar
- Deploy additional system recovery processes and ensure that the recovery process documentation is defined and up to date for each system
- Help monitor the health of remote telescope observatory sites, including sensor health, site network health, and sensor control hardware/software health
- Work closely with Slingshot’s engineers to determine and implement resolutions to critical issues
- Act as liaison between Slingshot’s field engineers and space teams and remote observatory hosts, as needed
- Monitor the security of critical telescope network components, e.g., by reviewing system logs and intrusion detection system logs, defining alerting rules, and implementing those rules in log server automation scripts
- Implement security compliance policies, including applying patches and fixes, and updating vulnerability and virus detection software
- Liaise with Slingshot’s business operations department regarding supply chain management of the telescope network hardware
- Document critical systems administration workflows and processes
- Ensure our systems are secure and compliant
- This role includes on-call responsibilities as business needs require
- Monitor and maintain remote observatory sites, including sensor health, network connectivity, and control systems
- Implement and maintain system monitoring tools for critical resources (CPU, disk capacity, processes, network activity)
- Develop and deploy scalable systems that ensure reliability and seamless integration across platforms
- Monitor security of telescope network components through system logs and intrusion detection
- Implement and maintain security compliance policies, including patch management and vulnerability detection
- Create and implement Role-Based Access Control (RBAC) across IT and Space IT systems
- Deploy and maintain system recovery processes
- Create and maintain comprehensive recovery documentation for all systems
- Establish and maintain change management procedures
- Work directly with Slingshot engineers to resolve critical issues
- Serve as liaison between field engineers, space teams, and remote observatory hosts
- Identify and implement automation opportunities to streamline operations
- Develop innovative solutions to improve efficiency and minimize system downtime
- Optimize workflows to align with business goals and organizational growth
- Maintain strong communication and collaborative relationships with cross-functional stakeholders
- Provide troubleshooting support across various development environments
- Participate in on-call rotation as business needs require
- Perform other duties as assigned (not to exceed 10% of primary responsibilities)
Preferred Qualifications
- Experience with network design and operations, including Palo Alto and Cisco equipment
- Experience with network security, including firewalls, IPS, and IDS systems
- Experience with cloud infrastructure (preferably AWS) and virtualization (XenServer, ESXi) technologies
- Experience with, or a keen interest in learning about, various Linux distributions (AlmaLinux, etc.) and data storage and search systems (e.g., Ceph, Solr, MongoDB)
- Experience with developing applications in at least one of the following: Python, Java, C++
- Experience with, or a keen interest in learning about, telescope hardware and software systems
Benefits
- Salary: $130,000 - $180,000
- Full time Exempt (learned professional exemption)
- Remote, USA