Senior Site Reliability Engineer



Software Engineering
Posted on Friday, September 1, 2023

About Us:

Zepto is a rapidly scaling provider of real-time, account-to-account payments solutions for merchants, and is reimagining the way money moves through the always-on, digital-first economy.

Our growing team across Australia brings together like-minded, talented, passionate people motivated to help us deliver on our brand promise to #LevelThePayingField.

At Zepto we believe in the power of positive human experiences, and that a deep sense of belonging creates cohesion in our culture. So, even in a remote-first team, you will be a contributor to and custodian of that culture. You will also enjoy solving complex problems, and play a key role in creating something truly special as we focus on delivering ‘a better way to pay’ at Zepto.


About the Role:

As a Senior Site Reliability Engineer, you will have the opportunity to architect, build and support Zepto’s real-time payments platform.

Reporting to an Engineering Manager, your focus will be automating and supporting production-grade hybrid cloud infrastructure. You will make crucial decisions about how to handle and scale complex, high-performance, distributed and secure systems. You will have visibility and impact across all of Engineering, and will balance the needs of the business by weighing cost implications against the layers of security, performance and availability being implemented.

Working in a scale-up means you get the opportunity to flex your skills in a variety of ways. We are agile and always willing to roll up our sleeves to get things done. You can, however, expect your day-to-day to involve the following:

  • Design and build solutions which support the execution of our business strategy.
  • Introduce best practices into the teams around observability, SLOs and reliability.
  • Identify areas for improvement across the organisation and drive Engineering-wide technical change in the field of Site Reliability.
  • Analyse patterns in incidents and identify improvements needed in how we operate and design software.
  • Lead the development and rollout of new tools, technologies and processes that improve reliability and velocity, have high business impact and are used by multiple teams.
  • Identify and implement cost-saving initiatives across Cloud infrastructure, relevant tooling and work processes.
  • Contribute to the architectural roadmap for Zepto’s services.
  • Share your knowledge and experiences with other members of the SRE & Engineering team.
  • Participate in an on-call rotation.
  • Build relationships and influence both internally (across the business) and externally (third-party partners/vendors).

About You:

You will have proven experience in designing, securing and operating Kubernetes clusters at scale, in production. You are known for your ability to use GitOps tooling to deploy services onto Kubernetes environments and have experience working with and implementing CI/CD pipelines for cloud infrastructure.

You are adaptive and thrive in an evolving environment. You are comfortable working in a busy, high-volume, competitive commercial environment. Your collaboration skills make you a valued member of the team and fundamentally a good colleague to work with.

What we’re looking for:

  • Experience designing, securing and operating Kubernetes clusters at scale, in production.
  • Demonstrated use of GitOps tooling to deploy services onto Kubernetes environments.
  • Strong capabilities automating complex environments using IaC tools (preferably Ansible and Terraform).
  • Experience working with and implementing CI/CD pipelines for cloud infrastructure.
  • Deep knowledge of architecting and administering solutions in a Public Cloud (preferably AWS):
    • VPC, networking, routing, Transit Gateway, Site-to-Site VPN, Direct Connect
    • EC2, Auto Scaling, security groups, ALB/NLB, EFS
    • IAM, SSO, Shield/WAF, CloudFront
    • Lambda, API Gateway
  • Knowledge of data pipelines.
  • Experience with event-driven architectures using queuing subsystems such as NATS, SQS or Kafka.
  • An ability to write and debug code in Bash, Go, Python or Ruby.
  • Demonstrated ability to troubleshoot issues across a full stack (CDN, front-end, back-end, orchestration, OS, database, caching).
  • Knowledge of how to optimise database performance (NoSQL or RDBMS).
  • Familiarity with VMware vSphere (nice to have).
  • Experience working with physical switches and routers (Cisco, Juniper, etc.) (nice to have).
  • Able to adapt and thrive in an evolving environment.
  • Able to be effective in a collaborative team environment.
  • Ability to work productively in a fast-paced and agile environment.

What’s on Offer:

Headquartered in beautiful Byron Bay, Zepto has an inspiring Founder story and is a customer-focused, culture-first organisation.

We’re all striving to achieve our mission of enabling a better way to pay for consumers and merchants, and we do so while fostering an inclusive culture where you will work with and learn from people who are world-class in their areas of expertise.

This role can be based anywhere in Australia, as we work remotely, but you’ll feel connected through our various initiatives and be supported by great leadership to learn and grow. If your preference is to work hybrid or in an office, we have hub locations in Byron, Sydney and Melbourne.

We have a supportive learning environment, with access to an individual learning benefit to ensure your curiosity and learning are a priority. You will have access to an Employee Assistance Program and paid parental leave, and be eligible for inclusion in our Employee Share Option Plan.