kaamvaam.com - 81 Site Reliability Engineer jobs in San Francisco

Zoox

Platform/Site Reliability Engineer Foster City

Skills & Focus: site reliability engineer, uptime, autonomous vehicles, fault-tolerant systems, deployment, operation, data-processing pipelines, compute-intensive tasks, CPUs, GPUs

About the Company: Zoox is a robotics company focused on developing autonomous vehicles with an ethos of automation throughout the infrastructure components they build.

Replit

Site Reliability Engineer Foster City

Skills & Focus: Site Reliability Engineering, SRE, Infrastructure Automation, Monitoring Solutions, Infrastructure as Code, CI/CD Pipelines, Incident Management, Performance Optimization, Distributed Systems, Cloud-native Technologies

About the Company: Replit is the fastest way to turn ideas into software. With our powerful AI-powered Agent and Assistant, anyone can create and launch apps from natural languag…

Experience: 3+ years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering)

Type: Full-time

Benefits: Flexible Work Hours, Competitive Salary & Equity, Home Office Set-Up Stipend, Health, Dental, Vision and Life Insurance…

Zoox

Staff Technical Operations Engineer Foster City

Skills & Focus: IT Technical Operations, real-time command center, monitoring services, Site Reliability Engineering (SRE), Technical Operations Engineering, stability, live robot missions, strategic initiatives, innovative solutions, reliability and performance

Senior Site Reliability Engineer Foster City

Skills & Focus: Site Reliability Engineering, Autonomous Vehicles, Microservice Architecture, Kubernetes, Data Pipelines, Performance Metrics, Linux, Python, C/C++, AWS

About the Company: Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the…

Experience: 2+ years

Salary: $210,000 to $250,000

Type: Full-time

Benefits: A comprehensive package including paid time off, health insurance, long-term and short-term disability insurance, life …

Neuralink

Infrastructure Team Member Fremont

Skills & Focus: software engineering, cloud architecture, infrastructure, networking protocols, Linux systems, hybrid cloud, security fundamentals, IaC tools, cryptographic protocols, systems administration

About the Company: We are creating devices that enable a bi-directional interface with the brain. These devices allow us to restore movement to the paralyzed, restore sight to th…

Experience: Experience building hybrid cloud/on-prem infrastructure, software engineering skills, and system administration experience.

Salary: $35/Hr USD

Type: Full-time

Benefits: An opportunity to change the world, growth potential, excellent medical/dental/vision insurance, paid holidays, commute…

Personalis, Inc

Senior Software Engineer Fremont

Skills & Focus: software engineering, LIMS, CI/CD pipelines, Python, Java, PostgreSQL, MySQL, Flask, Django, site reliability engineering

About the Company: Personalis is transforming the active management of cancer through breakthrough personalized testing, focusing on cancer management and patient care.

Experience: 5+ years of experience in software engineering, site reliability engineering, and/or devops.

Salary: $147,000 to $180,000 per year

Type: Full-time

Benefits: Competitive compensation package and benefits including medical, dental, vision, 401(k) match, ESPP, tuition reimbursem…

Neuralink

Infrastructure Engineer Fremont

Skills & Focus: software engineering, networking protocols, Linux systems, cloud infrastructure, system administration, DevOps, automating processes, cryptographic protocols, production environments, Brain-Computer Interface (BCI)

About the Company: We are creating devices that enable a bi-directional interface with the brain. These devices allow us to restore movement to the paralyzed, restore sight to th…

Experience: Robust software engineering skills, experience in Linux systems, cloud/on-prem infrastructure.

Salary: $116,000 - $235,000 USD

Type: Full-time

Benefits: Medical, dental, and vision insurance, paid holidays, commuter benefits, meals provided, equity + 401(k) plan, parental…

Robinhood Markets

Staff Software Engineer - Reliability Engineering Menlo Park

Skills & Focus: reliability, scalability, performance, security, distributed systems, programming languages, Linux, networking, incident metrics, monitoring

About the Company: Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…

Experience: 8+ years

Salary: $217,000 - $255,000 USD

Type: Full-time

Benefits: 100% paid health insurance for employees with 90% coverage for dependents; Annual lifestyle wallet for personal wellnes…

Staff Software Engineer - Reliability Engineering Menlo Park

Skills & Focus: reliability, software engineering, systems operations, incident metrics, production readiness, black box monitoring, infrastructure, Kubernetes, cloud computing, system resilience

About the Company: Join a leading fintech company that’s democratizing finance for all. With customers at the heart of our decisions, Robinhood is lowering barriers and providing…

Experience: 8+ years experience

Salary: $217,000 - $255,000 USD

Type: Full-time

Benefits: 100% paid health insurance for employees with 90% coverage for dependents, annual lifestyle wallet for personal wellnes…

Staff Software Engineer - Reliability Menlo Park

Skills & Focus: reliability, scalability, performance, security, software engineering, distributed systems, incident metrics, operational excellence, mentoring, infrastructure

About the Company: Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…

Experience: 8+ years experience in designing, building, and maintaining large-scale, distributed systems

Salary: $217,000 - $255,000 USD (Zone 1); $190,000 - $224,000 USD (Zone 2); $169,000 - $199,000 USD (Zone 3)

Type: Full-time

Benefits: 100% paid health insurance for employees with 90% coverage for dependents; Annual lifestyle wallet for personal wellnes…

Aerospike

Performance & Reliability Engineer Mountain View

Skills & Focus: performance engineering, reliability, distributed systems, database concepts, performance tuning, Linux/Unix, observability tools, problem-solving, collaboration, communication

About the Company: Aerospike, a leader in next-generation, always-on, hyperscale data solutions, enables extreme-scale, real-time applications for various industry leaders.

Experience: Experience with distributed systems or large-scale services, preferably in a production setting.

Salary: $140,000 - $175,000

Type: Full-time

Benefits: Equal Opportunity Employer, commitment to a non-discriminatory environment.

Coupang

Site Reliability Engineer (SRE) Mountain View

Skills & Focus: Site Reliability Engineering, Automation, Infrastructure Automation, Cloud-based Infrastructure, DevOps, CI/CD, Kubernetes, Observability, Large-Scale Systems, E-commerce

About the Company: Coupang is a large-scale e-commerce company, operating complex systems to deliver mission-critical services.

Experience: 10+ years of industry experience building and operating large-scale distributed systems.

Type: Full-time

NewsBreak

Software Engineer in Reliability & Availability Mountain View

Skills & Focus: AWS, Kubernetes (EKS), EMR (Elastic MapReduce), service reliability, fault-tolerant architectures, Infrastructure-as-Code (IaC), CI/CD pipelines, monitoring tools (Prometheus, Grafana), high-availability strategies, incident response

About the Company: NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, ou…

Experience: 2+ years in SRE, DevOps, or Infrastructure Engineering roles

Salary: $130,000 – $260,000 USD

Type: Full-time

Benefits: Discretionary bonus and options may also be available; overall rewards package designed to attract top talents.

Intuit

Staff Software Engineer Mountain View

Skills & Focus: Kubernetes, AWS, DevOps, Platform Engineering, Reliability Engineering, Cloud Architecture, Automation, Observability, Incident Management, Data Analysis

About the Company: Intuit is the global financial technology platform that powers prosperity for the people and communities we serve. With approximately 100 million customers wor…

Experience: 7+ years

Salary: $184,500 - $250,000

Type: Full-time

Benefits: Cash bonus, equity rewards and benefits

Coupang

Observability Engineer Mountain View

Skills & Focus: observability solutions, monitoring, alerting, logging, tracing, Kubernetes, DevOps, SRE practices, cloud-based infrastructure, performance indicators

About the Company: Coupang is a leading force in South Korean commerce, known for its exceptional customer service and innovative approach to retail and e-commerce. The company b…

Experience: Strong experience in implementing and managing observability solutions in large-scale, complex environments.

Salary: $159,000 - $324,000/year

Type: Full-time

Benefits: Medical/Dental/Vision/Life insurance, Flexible Spending Accounts, Long-term/Short-term Disability, Employee Assistance …

Technical Program Manager - Site Reliability Engineering (SRE) and Performance Mountain View

Skills & Focus: site reliability engineering, performance, distributed systems, large-scale systems, project management, security, privacy, compliance, stakeholders, scalability

About the Company: A fast-growing retail company from South Korea that is disrupting the commerce industry, combining startup culture with large global resources.

Experience: Minimum 12 years managing large-scale cross-functional projects

Salary: $159,000 - $324,000 per year

Type: Full-time

Benefits: Medical/Dental/Vision/Life, AD&D insurance, FSA & HSA, Disability insurance, EAP, 401K with match, PTO, public holidays…

Moody's Shared Services, Inc.

Senior Systems Engineer Newark

Skills & Focus: Design, Build, Operate, System operation, Monitoring, Hardware upgrades, Disaster recovery, Vendor communication, Big data Spark clusters, Kubernetes

Experience: At least two (2) years as a Systems Engineer or related role

Salary: $110,032 - $220,250/yr

Type: Full-time

Benefits: Medical, dental, vision, parental leave, paid time off, 401(k), life, disability, and accident insurance, stock purchas…

2K Games

Sr Manager of DevOps and Observability Novato

Skills & Focus: DevOps, Observability, SRE, cloud services, infrastructure management, CI/CD, security, performance, stakeholder management, microservices

About the Company: 2K is a global video game company, publishing titles developed by some of the most influential game development studios in the world. Our portfolio of titles i…

Experience: 5+ years in SRE, DevOps, or system engineering fields; 3+ years coaching and mentoring senior technical talent.

Salary: $155,800 - $230,560 per Year

Type: Full-time

Benefits: Full range of medical, financial, and/or other benefits, including a bonus and/or equity awards.

Hippocratic AI

Senior Site Reliability Engineer (GCP / Kubernetes) Palo Alto

Skills & Focus: infrastructure automation, deployment pipelines, monitoring, scalable systems, cloud platforms, Kubernetes, Terraform, Ansible, Jenkins, security compliance

About the Company: Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthca…

Experience: At least 5 years of professional experience in DevOps engineering or a related field

Type: Full-time

Senior ML Infrastructure Engineer Palo Alto

Skills & Focus: ML Infrastructure, Kubernetes, Terraform, multi-cloud environments, orchestration platform, cloud platforms, resource optimization, automation, system health monitoring, capacity planning

About the Company: Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthca…

Experience: 3-5 years

Type: Full-time

Senior Site Reliability Engineer (GCP / Kubernetes) Palo Alto

Skills & Focus: infrastructure automation, Kubernetes, DevOps, monitoring, scalability, cloud platforms, security compliance, deployment pipelines, disaster recovery, mentorship

About the Company: Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare, aiming to improve accessibility and outcomes by applying deep healthca…

Experience: At least 5 years of professional experience in DevOps engineering or a related field

Type: Full-time

Luma AI

Site Reliability Engineer (SRE) Palo Alto

Skills & Focus: Site Reliability Engineer, SRE, Infrastructure, GPU clusters, H100 GPUs, Monitoring tools, Management tools, Performance problems, Maintenance problems, Data Processing

Glean

Senior Site Reliability Engineer (SRE) Palo Alto

Skills & Focus: SRE, cloud infrastructure, automation, monitoring, incident management, performance optimization, scalability, security compliance, software development, cloud platforms

About the Company: We’re on a mission to make knowledge work faster and more humane. We believe that AI will fundamentally transform how people work.

Experience: 8+ years of experience in a senior-level role within Site Reliability Engineering or similar role

Salary: $155,000 - $250,000 annually

Type: Full-time

Benefits: Competitive compensation, Medical, Vision and Dental coverage, Flexible work environment and time-off policy, 401k, Com…

Luma AI

Senior Software Engineer - Reliability Palo Alto

Skills & Focus: SRE, GPU, infrastructure, monitoring, cloud providers, automation, scalability, containerization, observability, problem-solving

Experience: 5+ years

Type: Full-time

Site Reliability Engineer (SRE) Palo Alto

Skills & Focus: SRE, Infrastructure, GPU clusters, H100 GPUs, Training, Data Processing, Monitoring, Management tools, Performance, Maintenance

Celonis

Site Reliability Engineer Redwood City

Skills & Focus: Site Reliability Engineering, SRE principles, observability, automation, incident prevention, cloud platforms, Java, Python, Kubernetes, error budgets

About the Company: Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies, and the planet. With over 5,000 enterprise custom…

Experience: 8+ years of experience in software engineering or SRE roles.

Salary: $195,000 - $235,000 USD

Type: Full-time

Benefits: Great compensation and benefits packages (equity, life insurance, time off, generous leave for new parents from day one…

Box

Site Reliability Engineer Redwood City

Skills & Focus: SRE, reliability, scalability, cloud-native, Kubernetes, AWS, GCP, observability, automation, distributed systems

About the Company: Box (NYSE:BOX) is the leader in Intelligent Content Management. Our platform enables organizations to fuel collaboration, manage the entire content lifecycle, …

Experience: 5+ years of working experience designing, developing, and operating large-scale, customer-facing products or services

Type: Full-time

Benefits: Equity and benefits including healthcare benefits.

Celonis

Site Reliability Engineer Redwood City

Skills & Focus: Site Reliability Engineering, Microservices, Kubernetes, Automation, Incident management, Cloud computing, Java, Python, Observability, CI/CD

About the Company: Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies and the planet. With over 5,000 enterprise custome…

Experience: Minimum of 5 years of experience building and maintaining cloud-based software applications.

Salary: $160,000 - $210,000 USD

Type: Full-time

Benefits: Great compensation and benefits packages (equity, life insurance, time off, generous leave for new parents, etc.).

Astronomer

Director of Reliability Engineering San Francisco

Skills & Focus: Reliability Engineering, SRE, Cloud-native, Automation, Observability, Scalability, Incident Management, Service Uptime, Distributed Systems, Team Leadership

About the Company: Astronomer empowers data teams to bring mission-critical software, analytics, and AI to life and is the company behind Astro, the industry-leading unified Data…

Experience: 10+ years in software engineering, SRE, or DevOps roles; 5+ years in technical leadership

Salary: $260,000 - $290,000 plus equity

Type: Full-time

Cisco ThousandEyes

Lead Site Reliability Engineer II, Production Engineering San Francisco

Skills & Focus: DevSecOps, SRE, cloud-native, Kubernetes, Docker, AWS, security architecture, CI/CD pipelines, vulnerability management, observability

About the Company: Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even t…

Experience: 8+ years of experience in SRE, DevSecOps, or similar roles, with a strong focus on security.

Salary: 198,600-282,900 USD

Type: Full-time

Benefits: Medical, dental and vision insurance, 401(k) with Cisco matching, short and long-term disability coverage, life insuran…

Gusto

Storage Infrastructure Engineer San Francisco

Skills & Focus: storage infrastructure, MySQL, Postgres, data streaming, Kafka, cloud platforms, AWS, Terraform, resiliency, automation

About the Company: Gusto is a modern, online people platform that helps small businesses take care of their teams. On top of full-service payroll, Gusto offers health insurance, …

Experience: 4+ years of experience with software development and architecture; 2+ years of experience with database technologies like MySQL or Postgres; 2+ years of experience with data streaming technologies, particularly Kafka

Salary: $164,000-$237,000 in Denver & most remote locations, $235,000-$265,000 for San Francisco & New York

Type: Full-time

Benefits: Health insurance, 401(k), expert HR, Total Rewards philosophy

Astranis

Senior Site Reliability Engineer - Ground Software San Francisco

Skills & Focus: Kubernetes, site reliability engineer, DevOps, Linux, monitoring, deployment practices, software systems, automation, mission control, shell programming

About the Company: Astranis is a telecommunications company that operates satellites from geostationary orbit (GEO) to connect millions of people worldwide, currently expanding i…

Experience: 7+ years of experience as a Site Reliability Engineer, DevOps or DevSecOps; 7+ years of experience on Linux

Salary: $150,000 - $215,000 USD

Type: Full-time

Benefits: Equity, high-quality company-subsidized healthcare, disability and life insurance benefits, flexible PTO, 401(k) retire…

Anthropic

Staff Software Engineer, AI Reliability Engineering San Francisco

Skills & Focus: Software Engineering, Reliability Engineering, Service Level Objectives, Monitoring Systems, High-Availability Infrastructure, Incident Response, Cost Optimization, Distributed Systems, AI Infrastructure, Chaos Engineering

About the Company: Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a who…

Salary: $320,000 - $485,000 USD

Type: Full-time

Benefits: competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexibl…

Crusoe

Staff Site Reliability Engineer, Compute San Francisco

Skills & Focus: AI infrastructure, compute infrastructure, Linux kernel, virtualization, KVM, QEMU, hypervisors, SmartNICs, kernel tuning, performance optimization

About the Company: Crusoe is building the World’s Favorite AI-first Cloud infrastructure company, providing purpose-built AI infrastructure solutions trusted by Fortune 500 compa…

Experience: 8+ years of professional experience in SRE, Linux system engineering, or compute infrastructure roles

Salary: up to $250,000 per year + Bonus

Type: Full-time

Benefits: Hybrid work schedule, industry competitive pay, restricted stock units, health insurance, employer HSA contributions, p…

Sesame

Backend Infrastructure Engineer San Francisco

Skills & Focus: backend, infrastructure, systems, reliability engineering, monitoring, deployments, Terraform, Kubernetes, automation, data engineering

About the Company: Sesame believes in a future where computers are lifelike - with the ability to see, hear, and collaborate with us in ways that feel natural and human. With thi…

Salary: $175K - $280K

Type: Full-time

Benefits: 401k matching, 100% employer-paid health, vision, and dental benefits, Unlimited PTO and sick time, Flexible spending a…

Crusoe Energy Systems

Staff Site Reliability Engineer San Francisco

Skills & Focus: Site Reliability Engineering, AI infrastructure, automation, monitoring, incident response, system performance, network programming, security best practices, CI/CD, cloud infrastructure

About the Company: Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solution…

Experience: 8+ years of professional SRE experience

Salary: up to $250,000 per year + Bonus

Type: Full-time

Benefits: Hybrid work schedule, industry competitive pay, restricted stock units, health insurance package options, paid parental…

Abridge

Site Reliability Engineer (SRE) San Francisco

Skills & Focus: SRE, Kubernetes, CI/CD pipelines, cloud security, observability, GCP, distributed systems, engineering enablement, scalability, incident response

About the Company: Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversation…

Experience: 6+ years of software engineering experience, with at least 2 years as a back-end engineer.

Salary: $180K – $265K

Type: Full-time

Benefits: Generous Time Off, Comprehensive Health Plans, Paid Parental Leave, 401k Matching, Learning and Development Budget, Sab…

Focal Systems

Sr. DevOps/Site Reliability Engineer (SRE) San Francisco

Skills & Focus: DevOps, Site Reliability Engineer, GCP, Kubernetes, CI/CD, Infrastructure Automation, Cloud Services, Docker, Monitoring/Alerting, Python

About the Company: Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley based startup that has more than doubled in size every year since inceptio…

Experience: Solid experience in an infrastructure or Site Reliability Engineer (SRE) role

Salary: $170-190k + stock

Type: Full-time

Benefits: Competitive Salary & Attractive Stock, Paid Time Off, Quarterly Team Retreats, Education grants

OpenAI

Infrastructure Engineer, Public Sector San Francisco

Skills & Focus: infrastructure, engineering, Kubernetes, Python, FastAPI, Cosmos DB, Postgres, Terraform, reliable systems, cloud

About the Company: OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the bounda…

Experience: 8+ years in engineering, including 4+ years in infrastructure

Salary: $220.5K – $385K

Type: Full-time

Benefits: Medical, dental, and vision insurance, mental health and wellness support, 401(k) plan with 50% matching, generous time…

Site Reliability Engineer, Public Sector San Francisco

Skills & Focus: Site Reliability Engineer, Infrastructure, Systems, Cloud, Public Sector, Kubernetes, Docker, Security Clearance, Automation, Troubleshooting

About the Company: OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the bounda…

Experience: 5+ years

Salary: $279K – $385K

Type: Full-time

Benefits: Medical, dental, and vision insurance, mental health and wellness support, 401(k) plan with 50% matching, generous time…

Crusoe

Senior Site Reliability Engineer San Francisco

Skills & Focus: Site Reliability Engineering, AI infrastructure, production systems, system reliability, automation, monitoring, Unix/Linux, Cloud, Kubernetes, CI/CD

About the Company: Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solution…

Experience: 5+ years of professional SRE experience and 5+ years of experience contributing to the architecture and design of new and current systems.

Salary: $183,000 - $210,000 per year + Bonus

Type: Full-time

Benefits: Hybrid work schedule, Industry competitive pay, Restricted Stock Units, Health insurance package options, Employer cont…

Orb

Infrastructure Engineer San Francisco

Skills & Focus: infrastructure, reliability, observability, scalability, performance-critical, event processing, cloud, AWS, resiliency, mentorship

About the Company: Orb is on a mission to revolutionize billing infrastructure for the modern era of AI and software. We empower businesses to align their monetization with produ…

Experience: 5+ years in software engineering, 4+ years in infrastructure domain

Type: Full-time

Benefits: Excellent medical, dental, and vision insurance - 100% coverage for you and dependents; Unlimited PTO (with 15 days min…

Cisco ThousandEyes

Principal Site Reliability Engineer, Datastores San Francisco

Skills & Focus: datastores, operational excellence, automation, reliability, high availability, cloud services, Infrastructure as Code, Terraform, Kubernetes, collaboration

About the Company: Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network.

Salary: 176,000 USD - 314,200 USD

Type: Full-time

Benefits: U.S. employees have access to quality medical, dental and vision insurance, a 401(k) plan, short and long-term disabili…

Salesforce

SRE (Site Reliability Engineer) - Production Support LMTS San Francisco

Skills & Focus: site reliability engineering, incident management, cloud environments, monitoring, automation, capacity planning, CI/CD, security compliance, API fundamentals, root cause analysis

About the Company: Salesforce is a leading cloud-based software company specializing in customer relationship management (CRM). They aim to improve the state of the world through…

Experience: 8+ years in a SRE role or related field

Salary: $200,800 - $276,100

Type: Full-time

Benefits: Inclusive benefits including those related to equal pay and employee resource groups.

Twitter

Site Reliability Engineering Team Lead San Francisco

Skills & Focus: site reliability engineering, team leadership, engineering collaboration, technical design, reliability practices, coaching, team empowerment, personal development, cross-team communication, system scalability

About the Company: Twitter is a social media platform that allows users to post and interact with messages known as tweets.

Experience: 5+ years in a leadership role within engineering

Type: Full-time

Crusoe

Senior Site Reliability Engineer, Compute San Francisco

Skills & Focus: Linux kernel internals, virtualization, KVM, hypervisor, performance optimization, AI workloads, HPC workloads, kernel tuning, support for compute hardware, performance bottlenecks

About the Company: Crusoe is building the world’s favorite AI-first cloud infrastructure company, providing sustainable, purpose-built AI infrastructure solutions trusted by Fort…

Experience: 5+ years of professional experience in SRE, Linux system engineering, or compute infrastructure roles

Salary: $183,000 - $210,000 per year + Bonus

Type: Full-time

Benefits: Hybrid work schedule, industry competitive pay, restricted stock units, health insurance, employer contributions to HSA…

Checkr

Site Reliability Engineer II San Francisco

Skills & Focus: Site Reliability Engineer, AWS, Azure, containers, micro-service architecture, REST APIs, incident commander, production issues, data quality, collaboration

About the Company: Checkr is building the data platform to power safe and fair decisions. Established in 2014, Checkr’s innovative technology and robust data platform help custom…

Experience: 3+ years

Salary: $135,000 to $159,000

Type: Full-time

Benefits: 100% medical, dental, and vision coverage; Up to $25K reimbursement for fertility, adoption, and parental planning serv…

Crusoe Energy Systems

Site Reliability Engineer (SRE) San Francisco

Skills & Focus: observability, monitoring, telemetry, automation, collaboration, SRE, infrastructure, Python, Docker, Kubernetes

About the Company: Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solution…

Experience: 5+ years of professional SRE experience

Salary: $135,000 - $158,000

Type: Full-time

Benefits: Hybrid work schedule, Industry competitive pay, Restricted Stock Units, Health insurance package options, Employer cont…

Crusoe

Director of Engineering San Francisco

Skills & Focus: AI infrastructure, Cloud infrastructure, SRE organization, Incident Management, Operational Excellence, Reliability best practices, Observability standards, Mentorship, Incident Management program, Reliability engineering

About the Company: Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solution…

Salary: $320,000 - $360,000

Type: Full-time

Benefits: Industry competitive pay, Restricted Stock Units, Health insurance package options, Employer contributions to HSA accou…

OpenAI

Stream Infrastructure Engineer San Francisco

Skills & Focus: stream infrastructure, Kafka, Azure EventHub, AWS Kinesis, infrastructure tooling, Terraform, Kubernetes, data platform, scalability, reliability

About the Company: OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the bounda…

Experience: 4+ years in stream infrastructure engineering

Salary: $200K – $385K

Type: Full-time

Benefits: Medical, dental, and vision insurance, mental health support, 401(k) with 50% matching, generous time off, paid parenta…

Alchemy

Infrastructure Engineer (Reliability Focus) San Francisco

Skills & Focus: Reliability, Observability, Infrastructure Engineer, Production Systems, AWS, Docker, Kubernetes, CI/CD, Infrastructure-as-Code, Engineering Excellence

About the Company: Alchemy is the only complete developer platform that offers the powerful APIs, SDKs, and tools necessary to build and scale onchain apps and rollups. Our infra…

Experience: 5+ years of experience as an Infrastructure Engineer focused on Reliability (e.g., Site Reliability Engineer, Production Engineer, Platform Engineer)

Salary: $135,000 - $350,000 annually

Type: Full-time

Benefits: Comprehensive medical, dental, and vision coverage, 401k, unlimited flexible time off, equity options

Abridge

Software Engineer, SRE San Francisco

Skills & Focus: platform engineering, SRE, cloud-native, Kubernetes, CI/CD, Google Cloud Platform, Terraform, GCP, security, automation

About the Company: Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare, transforming patient-clinician conversations into structured clini…

Experience: 5+ years of platform/devops experience in a cloud-native software company

Salary: $162K – $234K

Type: Full-time

Benefits: Generous Time Off, Comprehensive Health Plans, Paid Parental Leave, 401k and Matching, Pre-tax Benefits, Learning and D…

Baseten

Site Reliability Engineer San Francisco

Skills & Focus: Site Reliability Engineer, Kubernetes, Scalable Infrastructure, Infrastructure-as-Code, CI/CD Tools, Project Management, Collaboration, Mentorship, Performance Optimization, Machine Learning

About the Company: Join our dynamic team at Baseten, where we’re revolutionizing AI deployment with cutting-edge inference infrastructure. Backed by premier investors such as IVP…

Experience: 3+ years of professional work experience in a fast-paced, high-growth environment

Type: Full-time

Benefits: Competitive compensation package (Unlimited PTO, 401k, covered healthcare premiums), A unique opportunity to be part of…

Speak

SRE Engineer, Lead San Francisco

Skills & Focus: reliability, infrastructure, Kubernetes, GCP, Node.js, PostgreSQL, Redis, observability, incident response, scalability

About the Company: Speak is on a journey to fix the language learning experience by creating AI-powered conversational tools to help billions gain fluency.

Experience: 7+ years in SRE, DevOps, or infrastructure-focused engineering roles

Loft Orbital

Senior Site Reliability Engineer San Francisco

Skills & Focus: Site Reliability Engineering, Cloud Infrastructure, DevOps, satellites, space operations, integration, delivery, reliability, automated infrastructure, SatDevOps

About the Company: Loft Orbital is revolutionizing access to space by building reliable, shareable satellites that drastically reduce the time and complexity traditionally requir…

OpenAI

Software Engineer, Reliability San Francisco

Skills & Focus: reliability, scalability, performance, monitoring, automation, Infrastructure as Code, containerization, cloud infrastructure, observability, microservices

About the Company: OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the bounda…

Experience: Proven experience as a reliability engineer or a similar role in a fast-paced, rapidly scaling company.

Salary: $255K – $405K

Type: Full time

Benefits: Medical, dental, and vision insurance, mental health and wellness support, 401(k) plan with 50% matching, generous time…

Foundry Technologies, Inc.

Senior Site Reliability Engineer, Supply San Francisco

Skills & Focus: Site Reliability Engineer, Cloud Infrastructure, GPU Management, Incident Response, Monitoring Systems, Ansible, Scripting, Data Center Operations, Technical Documentation, AI Workloads

About the Company: Foundry is actively seeking talented candidates at the Senior to Principal level, with a goal to transform how AI companies access compute power. They are buil…

Experience: Experience working with Linux systems administration and command-line interfaces, experience leading incident response and root cause analysis.

Salary: $170,000 - $230,000

Type: Full-time

Benefits: Health, dental, and vision coverage for you and your dependents, 401k Plan with 4% company match, 21 days of PTO & 14 c…

Sigma Computing

Senior Software Engineer - Observability and Reliability San Francisco

Skills & Focus: observability, distributed tracing, application performance management, cloud security, GCP, AWS, Azure, data analytics, Kubernetes, best practices

About the Company: Sigma is the only cloud analytics and business intelligence tool empowering business teams to break free from the confines of the dashboard, explore data for t…

Experience: 5+ years industry experience building and maintaining high-quality software

Salary: $150k - $220k annually

Type: Full-time

Benefits: Equity, generous health benefits, flexible time off policy, paid bonding time for all new parents, traditional and Roth…

Hive

DevOps and Systems Engineer San Francisco

Skills & Focus: cloud-based AI solutions, machine learning, DevOps, Site Reliability, automation, enterprise SaaS, distributed computing, high performance computing, hybrid infrastructure, GPU integration

About the Company: Hive is the leading provider of cloud-based AI solutions to understand, search, and generate content, and is trusted by hundreds of the world's largest and mos…

Type: Full-time

GoodLeap

Site Reliability Engineer San Francisco

Skills & Focus: Site Reliability Engineer, software engineering, system engineering, automation, monitoring, incident response, infrastructure management, DevOps, observability, AWS

About the Company: GoodLeap is a technology company delivering best-in-class financing and software products for sustainable solutions, from solar panels and batteries to energy-…

Salary: $97,000 - $141,000 a year

Type: Full Time

Cisco ThousandEyes

Senior Site Reliability Engineer, Infrastructure San Francisco

Skills & Focus: AWS, Terraform, Infrastructure-as-Code, Python, Go, Docker, Networking, Security, Site Reliability Engineering, Distributed Systems

About the Company: Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even t…

Experience: 5+ years

Salary: $174,300 - $203,100 USD

Type: Full-time

Benefits: medical, dental and vision insurance, 401(k) plan, short and long-term disability coverage, life insurance, paid holida…

Crusoe Energy Systems

Staff Site Reliability Engineer San Francisco

Skills & Focus: Site Reliability Engineering, infrastructure design, automation, monitoring, incident response, cloud infrastructure, AI applications, network programming, Unix/Linux, programming languages

About the Company: Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solution…

Experience: 8+ years of professional SRE experience

Salary: $250,000

Type: Full time

Benefits: Hybrid work schedule, restricted stock units, health insurance, paid parental leave, 401(k) with a 100% match, generous…

Site Reliability Engineer (SRE) - Observability San Francisco

Skills & Focus: Observability, Site Reliability Engineering, Infrastructure, Telemetry, Monitoring, Analytics, Collaboration, Automation, CI/CD, Security

About the Company: Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solution…

Experience: 7+ years of professional SRE experience.

Salary: $155,000 - $183,000

Type: Full-time

Benefits: Hybrid work schedule, industry competitive pay, restricted stock units, health insurance options, HSA contributions, pa…

Benchling

Infrastructure Engineer San Francisco

Skills & Focus: infrastructure, AWS, security, monitoring, automation, site reliability, cloud computing, Kubernetes, Terraform, CI/CD

About the Company: Benchling’s mission is to unlock the power of biotechnology. The world’s most innovative biotech companies use Benchling’s R&D Cloud to power the development o…

Experience: 5 or more years in DevOps, SRE, or platform engineering

Salary: $157,150 to $212,750

Type: Full-time

Benefits: Medical, dental, vision, fertility benefits, parental leave, 401(k), commuter benefits, mental health support, wellness…

Writer

Site Reliability Engineer (SRE) San Francisco

Skills & Focus: Site Reliability Engineering, cloud infrastructure, Terraform, Python, AWS, GCP, Docker, Kubernetes, monitoring tools, system optimization

About the Company: Writer is the full-stack generative AI platform delivering transformative ROI for the world’s leading enterprises. Named one of the top 50 companies in AI by F…

Experience: Minimum of 7 years of hands-on experience in Site Reliability Engineering

Type: Full-time

Benefits: Generous PTO, medical, dental, vision coverage, paid parental leave, fertility and family planning support, flexible sp…

Autify, Inc.

Infrastructure Engineer San Francisco

Skills & Focus: automation, reliable, secure, cost-efficient, cloud infrastructure, software reliability, engineering, SRE, GenAI, test automation

About the Company: Autify, Inc. is a San Francisco-based startup that was founded by the first Japanese team to graduate from Alchemist Accelerator, one of the top accelerators i…

Abridge

Site Reliability Engineer (SRE) San Francisco

Skills & Focus: Site Reliability Engineering, Cloud Security, Kubernetes, CI/CD, Distributed Systems, Observability, Infrastructure as Code, Google Cloud Platform, Performance Optimization, Incident Response

About the Company: Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversation…

Experience: 6+ years of software engineering experience focused on distributed systems or tooling, with an interest in engineering enablement and software scaling.

Salary: $180K – $265K

Type: Full time

Benefits: Generous Time Off, Comprehensive Health Plans, Paid Parental Leave, 401k matching, Learning and Development Budget, Sab…

Altana

Senior Manager, Technical Operations & Observability San Francisco

Skills & Focus: Observability, SRE, Incident Management, IT Operations, FinOps, Automation, Reliability, Cloud Platforms, Monitoring, Alerting

About the Company: Altana applies AI to the world's largest organized body of supply chain data to power a more resilient, secure, and sustainable model of global commerce, focus…

Salary: $185,000 - $220,000 USD

Benefits: Flexible Time Off, Paid Parental Leave, Health Benefits, Supplemental Benefits, 401(k), Commuter Benefits, Wellness pro…

GoodLeap

Site Reliability Engineer San Mateo

Skills & Focus: Site Reliability Engineer, software engineering, system engineering, automation, monitoring, incident response, infrastructure management, DevOps, observability, AWS

About the Company: GoodLeap is a technology company delivering best-in-class financing and software products for sustainable solutions, from solar panels and batteries to energy-…

Salary: $97,000 - $141,000 a year

Type: Full Time

Arkose Labs

Senior Director of Engineering San Mateo

Skills & Focus: Platform Engineering, Infrastructure, Site Reliability, Cloud Infrastructure, Incident Response, AWS, Azure, Distributed Systems, CI/CD, Infrastructure-as-Code

About the Company: Arkose Labs protects enterprises from cybercrime and abuse, offering the world's first $1M warranties for credential stuffing and SMS toll fraud. They have a s…

Experience: 5+ years of leadership experience in Platform, Infrastructure, SRE, or related fields; 10+ years of experience in software engineering.

Salary: $270,000.00-$350,000.00

Type: Full-time

Benefits: Competitive salary + Equity; 401k plan; Robust benefits package (85% medical, dental, vision for employees; 75% for dep…

Xero

Team Lead of Product SRE San Mateo

Skills & Focus: Product SRE, SRE engineers, reliability, Observability, high performing services, Engineering, high performing teams, Product SRE strategy, transformation, expert communicator

About the Company: Xero helps businesses by automating routine tasks and connecting them with the right data, advisors, and apps, ultimately contributing to a stronger economy.

Experience: Strong Engineering background, deep experience in SRE

Sustainable Talent

Platform Reliability & Lab Support Engineer Santa Clara

Skills & Focus: Infrastructure, Data Centers, Hardware, Software, Networking, Troubleshooting, DevOps, Maintenance, Collaboration, Testing

About the Company: Sustainable Talent is a staffing agency partnered with Nvidia, focusing on providing talent for tech roles in infrastructure and data centers.

Experience: 4+ years of equivalent experience in a Lab or Datacenter environment.

Salary: $70/hr - $80/hr

Type: Full-time

Benefits: Full benefits, PTO, and amazing company culture.

Palo Alto Networks

Manager, Site Reliability Engineering (Cortex, Tools and Platforms) Santa Clara

Skills & Focus: DevOps, Site Reliability Engineering, Cortex, Security, Engineering Management, Cloud, Platforms, Production Operations, AI, Software Development

About the Company: Palo Alto Networks is a cybersecurity company that offers advanced firewalls and cloud-based security services to secure the digital transformation.

Type: Full-time

Sr Site Reliability Engineer (App Service Team) Santa Clara

Skills & Focus: Site Reliability Engineering, DevOps, cloud-native applications, AWS, GCP, Terraform, Kubernetes, automation, programming languages, CI/CD

About the Company: Palo Alto Networks is a cybersecurity company that aims to redefine protection and security in the digital age. Their mission is to be the cybersecurity partne…

Experience: 4+ years as an engineer in Infrastructure, Operations, DevOps, or System Engineering; 2+ years building high availability, scalable cloud-native applications on AWS and GCP

Type: Full-time

Benefits: FLEXBenefits wellbeing spending account, mental and financial health resources, personalized learning opportunities

ServiceNow

Senior Staff Machine Learning Engineer - Site Reliability Engineer Santa Clara

Skills & Focus: Machine Learning, AI, infrastructure, platform, deployment, observability, GPU, scalable, code reviews, SRE

About the Company: PLATO (Platform Engineering and AI Technology Organization) at ServiceNow is a customer-focused innovative group building intelligent software using a variety …

Palo Alto Networks

Senior Staff DevOps Engineer Santa Clara

Skills & Focus: DevOps, SRE, Cloud infrastructure, Automation, Terraform, Kubernetes, GitLab CI/CD, Monitoring, Security, Reliability

Principal Site Reliability Engineer (WildFire Cloud Infrastructure) Santa Clara

Skills & Focus: Site Reliability Engineer, DevOps, Cloud infrastructure, Automation, Kubernetes, GCP, AWS, Python, Docker, Terraform

About the Company: Palo Alto Networks is a cybersecurity company committed to protecting our digital way of life. The company aims to redefine cybersecurity standards and focuses…

Experience: BS or MS in Computer Science, a related field, or equivalent professional experience

Salary: $160,000 - $225,000/YR

Type: Full-time

Benefits: FLEXBenefits wellbeing spending account, mental and financial health resources, personalized learning opportunities

Apple

Database SRE - Postgres SQL Sunnyvale

Skills & Focus: Postgres, Database, AWS, Kubernetes, High Availability, Replication, Performance Tuning, Disaster Recovery, Backup, Cloud Infrastructure

About the Company: Apple Inc. is a leading technology company known for its innovative products and services.

Experience: 5-15 years supporting Postgres databases in a high volume environment

Type: Full-time

Google

Software Developer II, Site Reliability Development, Google Cloud Sunnyvale

Skills & Focus: software development, site reliability development, coding, algorithms, complexity analysis, large-scale systems, automation, system capacity, performance optimization, team collaboration

About the Company: Google is a global technology company that specializes in Internet-related services and products, which include search engines, online advertising technologies…

Experience: Experience with data structures/algorithms and software development in one or more programming languages.

Salary: $118,000-$170,000

Type: Full-time

Benefits: bonus + equity + benefits


