DoubleCloud is looking for a Senior Site Reliability Engineer who’s passionate about working with data and gets excited about solving large, complex challenges. You’ll be doing this for some of the most innovative companies in the world, engineering solutions that use data in ways never tried before.

As we’re a startup, the successful candidate will help shape and grow our newly formed SRE team.

About DoubleCloud and our product:

DoubleCloud, the creator of the first managed ClickHouse service, offers an open-source data management platform with a single point of access to the services that help build modern data applications. We expertly manage backups, monitoring, shard configurations, replicas, and updates, so our clients can concentrate on their projects rather than routine tasks. Our engineers contribute significantly to leading open-source technologies such as ClickHouse, PostgreSQL, Odyssey, WAL-G, and others.

Since 2021, we’ve worked with more than 100 companies crunching analytics with various data tools, including ClickHouse, BigQuery, Redshift, MySQL, Postgres, and Kafka.

As a result, we created a data platform specifically to help businesses build an end-to-end modern data stack and real-time analytics with fully managed open-source technologies such as ClickHouse and Kafka.

With our platform, data engineers can focus on building what they love instead of spending time scaling up or down, installing updates, deploying additional software, and handling the other admin work that open-source technologies require.

DoubleCloud is an early-stage startup with more than 45 people today, and we want you to be part of our team!

What you’ll be doing:

  • Improving reliability, performance, monitoring, emergency response, and capacity planning for DoubleCloud
  • Rolling out our cutting-edge cloud technologies to meet infrastructure needs
  • Implementing and improving continuous delivery (CD) processes
  • Growing L3 support competency within the team
  • Partnering with the development and L1/L2 support teams, as well as other leaders at DoubleCloud

What we expect from you:

  • Experience operating large, distributed systems
  • 3+ years of experience with Kubernetes
  • Experience implementing pipelines with Spinnaker
  • Software engineering experience in Python and/or Go
  • Experience designing and deploying infrastructure, with strong "infrastructure as code" skills using tools such as Terraform and Packer
  • Practical experience designing and improving monitoring and alerting systems using Prometheus, Thanos and Grafana
  • Expertise in Linux networking and in container technologies such as Docker or Podman
  • A good understanding of at least one DBMS, e.g. PostgreSQL, MySQL, ClickHouse, or Redis

Some nice-to-haves include:

  • In-depth knowledge of multiple database management systems
  • Kubernetes certifications (CKAD, etc.)

DoubleCloud’s Culture

As a team, we work in a startup-like agile rhythm. We help and inspire each other, experiment with new ideas and learn from the outcomes. We’re here for each other and we ensure each individual has everything they need to reach their goals. We’re here to build the best possible product and want our customers to get the most value from it.

DoubleCloud is proud to be an equal opportunity employer. Simply put, we don’t discriminate: we treat everyone with respect. Diversity, equity, and inclusion are fundamental principles at DoubleCloud.

We’re a global and diverse team full of positive vibes and we love it that way.
To reward our employees for the great work they do, we offer several perks and benefits, including:

  • Flexible working hours
  • Paid parental leave
  • For those working from home: home office reimbursement options
  • For those working remotely from a shared space: office or coworking space reimbursement
  • Flexible vacation and paid sick leave
  • And plenty more...
Get in touch, or share this job description with someone you think might be a good fit.