Site Reliability Engineer (SRE)

This is a remote, US-based role. As a Site Reliability Engineer (SRE), you will work across our managed and professional services offerings, with the primary goal of providing exceptional value to our clients.

Site Reliability Engineer - Engineering
Fairwinds Ops, Inc. is a Managed Services Provider specializing in Kubernetes, catering to businesses across North America. Our dynamic team of experts is committed to empowering organizations to thrive and expand. With nearly a decade of experience in Kubernetes cloud architectures and security services, we've supported diverse sectors including SMB, Enterprise, SaaS, Healthcare, Financial, and Not-for-Profit, assisting our partners in meeting their daily IT and business objectives. Our diverse clientele and their mission-critical applications demand a deep understanding of how to build robust architectures and highly secure environments.
Fairwinds is seeking an intellectually curious, collaborative, enthusiastic, and flexible engineer to join our team as a Site Reliability Engineer. As an SRE at Fairwinds, you’ll work directly with clients to ensure their goals are met through automation, analysis, and infrastructure configuration. Working collaboratively with the other SREs at Fairwinds, your expertise will help our clients succeed with Kubernetes through building robust infrastructure, solving complex problems, maintaining reliable, secure environments, and standing behind our work as a part of our on-call rotation. We work with a diverse set of technologies in and around Kubernetes depending on what our clients need. Commonly this includes technologies like:

- EKS/GKE/AKS and their associated cloud primitives
- Common Kubernetes add-ons such as cert-manager, external-dns, Karpenter, and KEDA
- Load balancing and ingress
- CI/CD tools (GitLab, CircleCI)
- Configuration tools such as Terraform and Helm
- Golang
- Monitoring tools (Datadog, Prometheus)

What we offer:

Competitive salary complemented by a performance-based bonus structure.
Compensation: $120,000 - $150,000 base salary, depending on experience.
Career growth opportunities in a fast-growing industry.
100% of insurance premiums paid by the company (medical, dental, vision).
401(k) plan.

Qualifications:

2-4 years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles
Hands-on experience with Kubernetes (EKS, GKE, AKS, or self-managed clusters)
Familiarity with Infrastructure as Code (Terraform, Helm, GitOps workflows)
Practical knowledge of AWS (or other major cloud providers)
Experience with observability tools (Prometheus, Grafana, Loki, ELK, Datadog, etc.)
Experience building and managing CI/CD pipelines with tools such as CircleCI, Jenkins, or GitLab
Strong knowledge of Linux systems and basic networking concepts
Familiarity with cloud-native security practices (RBAC, IAM, network policies, secrets management)
Programming/scripting experience in Go or Bash for automation and tooling
Good communication skills and willingness to collaborate with and learn from senior engineers
Experience working with external customers

Responsibilities:

Deploy, operate, and maintain Kubernetes clusters for customer workloads across multiple cloud providers
Support infrastructure automation using Terraform, Helm, and GitOps workflows
Contribute to monitoring, logging, and alerting solutions to ensure system reliability and stability
Participate in incident response, troubleshooting issues and driving fixes
Collaborate with senior engineers and customers to deliver scalable infrastructure solutions
Assist in performance tuning and cost optimization of Kubernetes clusters and cloud resources
Follow and help improve security best practices (IAM, RBAC, network policies)
Participate in on-call rotations to ensure uptime and responsiveness
Learn and grow by contributing to tooling and automation projects, with mentorship from senior engineers
Attend regular customer syncs to support customers with their Kubernetes questions

Benefits:

Health, dental, vision, and life insurance.
Unmetered PTO.

Schedule: Monday to Friday

Supplemental pay types: Bonus

Experience: 5 years of relevant experience preferred

Required travel: 10%-20%

Work Location: Fully Remote
