610 Site Reliability jobs in South Africa

Site Reliability Engineer

Boksburg, Gauteng Datacentrix Human Capital

Posted today

Job Description

Are you a Site Reliability Engineer with solid Datadog experience? Our client in the Warehousing and Logistics sector is looking to employ an Engineer to support the design, implementation, and optimization of Datadog monitoring solutions across infrastructure, applications, and services.

Qualifications and Experience:

  • Datadog Certified Fundamentals – Must have
  • Degree in Information Technology or Computer Science
  • Management of operations on virtualized and distributed infrastructures
  • Management of operations in environments with clustering, replication, and load balancing
  • ITIL Practitioner (V3) / ITIL Specialist (V4)
  • Windows Server: Advantage
  • 1–3 years of experience working with a modern monitoring/observability tool, ideally Datadog (or alternatives like Prometheus, Grafana, New Relic, or Dynatrace).

Experience in:

  • Deploying and configuring monitoring agents
  • Creating dashboards and monitors
  • Parameterizing tags and labels for proper data correlation
  • Basic familiarity with cloud platforms (AWS, Azure, or GCP) and container environments (Docker/Kubernetes)
  • Experience working with Centreon - Advantage
  • Strong interest in monitoring, DevOps, SRE, or cloud infrastructure
  • Knowledge of basic scripting (e.g., Bash, Python) is a plus

Duties:

  • Support the design, implementation, and optimization of Datadog monitoring solutions across infrastructure, applications, and services.
  • Work alongside DevOps, infrastructure, and application teams to ensure complete observability using custom dashboards, alerts, and tagging strategies.
  • Assist in the deployment and onboarding of new systems into the monitoring ecosystem.
  • Serve as the go-to person for building visualizations, improving signal-to-noise ratios in alerting, and aligning monitoring with business objectives.
  • Ideal for a young and motivated engineer looking to grow within observability and cloud-native monitoring.
  • Deploy and configure Datadog agents across various environments (cloud and on-prem).
  • Create and customize dashboards, monitors, and alerts for systems, services, containers, and applications.
  • Implement tagging strategies to organize, filter, and correlate metrics and logs effectively.
  • Integrate Datadog with various platforms (AWS, Azure, GCP, Kubernetes, Docker, etc.) to collect telemetry data.
  • Collaborate with developers, DevOps, and infrastructure teams to identify key business and system metrics to monitor.
  • Continuously tune and optimize monitors to reduce false positives and improve actionable alerting.
  • Document dashboards, alert logic, best practices, and knowledge for cross-team enablement.
  • Analyze incidents and outages post-mortem to identify monitoring gaps and enhance visibility.
  • Assist in evangelizing observability practices within the organization and contribute to monitoring as code efforts (e.g., Terraform for Datadog resources).
  • Stay up to date with new Datadog features and industry trends in observability and monitoring.

Site Reliability Engineer

Robin AI

Posted today

Job Description

Work from home

Robin AI City of Cape Town, Western Cape, South Africa

About Robin

Robin is on a mission to rebuild the legal industry — starting with making contracts simple for everyone. We are a pioneer in Legal AI, built on proprietary models, licensed data, and deep partnerships with Anthropic and AWS. Since 2019, we’ve expanded our footprint to 4 continents and have been supporting many of the world’s most successful businesses, including GE, Pfizer, KPMG, and UBS.

What will you do as an SRE?

As an SRE at Robin AI, you'll help build and maintain the cloud infrastructure and applications that power our cutting-edge Legal AI platform. You'll collaborate with engineering teams to establish robust monitoring, incident response, and deployment strategies that ensure high availability and reliability of our proprietary models and services, maintaining optimal SLOs for our global customer base.

Your Day-to-day Responsibilities

  • Ensure Robin's systems are highly available and scalable.
  • Standardise and implement observability practices in our service-based architecture through logging, traces, metrics and monitors.
  • Design, deploy, and operate infrastructure to support Robin's product teams as we expand into new regions.
  • Automate manual operational tasks.
  • Collaborate with development team leads to optimise build, test, and deployment processes.
  • Participate in and improve our on-call and incident handling processes to ensure 24/7 system reliability.

Ideally, You Should Have The Following Qualifications

  • 3+ years of experience in DevOps or Site Reliability Engineering roles
  • Proficiency in at least one backend programming language (We use Python)
  • Strong knowledge of AWS services (ECS, S3, RDS, Lambda, etc.), managed by Terraform
  • Comfortable troubleshooting across the full stack, starting from the browser, through the networking components, into the containerised applications and then onto data stores.
  • Knowledge of observability frameworks and tools (We use OpenTelemetry, CloudWatch & Datadog)
  • Excellent problem-solving and communication skills
  • Experience with AI/ML infrastructure deployments is a plus

What’s In It For You

  • Salary: Competitive
  • Hybrid schedule: We offer a flexible working schedule.
  • Equity package: Generous equity scheme - everyone gets to be an owner of Robin AI!
  • Annual leave: 20 days PTO, in addition to the public holidays observed in South Africa.
  • Growth opportunities: We prioritise promotions for high performers and help you to progress your career.

What’s it like working at Robin?

Our culture and values attract people who are creative, resourceful, and share our passion for excellence. At Robin, you're encouraged to push yourself and empowered to take risks. We support each other to think big, try new ideas, and navigate uncertainty. Whether you're at our headquarters or one of our worldwide offices, you'll find a world of opportunities to grow, thrive, and make a meaningful impact. See what life is like at Robin.

Diversity, Equity and Inclusion at Robin

We are committed to building one of the most diverse technology companies in the world. As of 2024, more than 30% of our employees come from ethnic minority backgrounds, and 51% of roles are held by women. We know that transforming the legal industry requires diverse perspectives, so we're creating an environment where innovation thrives through inclusion.

Robin operates a direct hiring model and any speculative CVs shared via agencies will be treated as a gift.

  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Software Development

Site Reliability Engineer

Western Cape, Western Cape LexisNexis

Posted 6 days ago

Job Description

Work from home

About our Team

LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model for each individual legal use case.

About the role:

Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.

Responsibilities

  • Build and improve CI/CD pipelines
  • Manage AWS cloud infrastructure
  • Ensure high availability, observability, and performance
  • Automate systems for efficiency and cost optimization
  • Collaborate across engineering, ops, and leadership teams

Requirements

  • Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, or Engineering, with a minimum of 5 years of experience in a software/technology environment.
  • Certifications in AWS or Kubernetes are advantageous.
  • At least 5 years of experience in a DevOps or SRE role.
  • AWS Expertise: Preferably 5 years of comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
  • Kubernetes Proficiency: Preferably 2 years of hands-on experience with deploying, managing, and scaling applications in Kubernetes environments. Practical experience with Helm and ArgoCD.
  • Understanding of containerization concepts and tools like Docker/Podman.
  • Infrastructure as Code (IaC): Preferably 2 years of experience with IaC tools, ideally Terraform, to manage and automate cloud resources effectively.

Work in a way that works for you

We promote a healthy work/life balance across the organization. We offer numerous wellbeing initiatives, shared parental leave, study assistance, and sabbaticals to help you meet your immediate responsibilities and long-term goals.

  • Working flexible hours — adjusting your work times to fit your productivity peaks.


Working for you

We value your well-being and happiness. Benefits include:

  • Medical Aid
  • Retirement Plan including Risk Benefits (Disability, Critical Illness, Life, and Funeral Cover)
  • Modern family benefits, including adoption and surrogacy
  • Study Leave


About the Business

LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase productivity, improve decision-making, achieve better outcomes, and promote the rule of law globally. As a pioneer in digital services, the company was the first to bring legal and business information online with Lexis and Nexis.

We are committed to a fair and accessible hiring process. If you require accommodation or adjustments, please complete our Applicant Request Support Form or contact 1- .

Warning: Be aware of scams where criminals pose as recruiters asking for money or personal information. We never request money or banking details from applicants. Learn more about spotting and avoiding scams here.

Please read our Candidate Privacy Policy.

We are an equal opportunity employer: qualified applicants will be considered and treated without regard to race, color, creed, religion, sex, national origin, citizenship status, disability, veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other protected characteristic.

USA Job Seekers:

EEO Know Your Rights.


Site Reliability Engineer

Johannesburg, Gauteng LexisNexis

Posted 6 days ago

Job Description

About our Team

LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model for each individual legal use case.

About the role:

Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.

Responsibilities

  • Build and improve CI/CD pipelines
  • Manage AWS cloud infrastructure
  • Ensure high availability, observability, and performance
  • Automate systems for efficiency and cost optimization
  • Collaborate across engineering, ops, and leadership teams

Requirements

  • Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, or Engineering, and a minimum of 5 years of experience in a software/technology environment.
  • Certifications in AWS or Kubernetes are advantageous.
  • At least 5 years of experience in a DevOps or SRE role.
  • AWS Expertise: Preferably 5 years of comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
  • Kubernetes Proficiency: Preferably 2 years of hands-on experience with deploying, managing, and scaling applications in Kubernetes environments. Practical experience with Helm and ArgoCD.
  • Understanding of containerization concepts and tools like Docker/Podman.
  • Infrastructure as Code (IaC): Preferably 2 years of experience with IaC tools, ideally Terraform, to manage and automate cloud resources effectively.

Work in a way that works for you

We promote a healthy work/life balance across the organization. We offer an appealing working prospect for our people. With numerous wellbeing initiatives, shared parental leave, study assistance, and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.

  • Working flexible hours - flexing the times when you work in the day to help you fit everything in and work when you are most productive


Working for you

We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:

  • Medical Aid
  • Retirement Plan including Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
  • Modern family benefits, including adoption and surrogacy
  • Study Leave


About the Business

LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase their productivity, improve decision-making, achieve better outcomes, and advance the rule of law worldwide. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or contact 1- .

Warning: Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.

Please read our Candidate Privacy Policy.

We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.

USA Job Seekers:

EEO Know Your Rights.


Site Reliability Engineer

Durban, KwaZulu Natal LexisNexis

Posted 8 days ago

Job Description

About Our Team

LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model from today’s top model creators for each individual legal use case.

About the role:

Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.

Responsibilities

  • Build and improve CI/CD pipelines.
  • Manage AWS cloud infrastructure.
  • Ensure high availability, observability, and performance.
  • Automate systems for efficiency and cost optimization.
  • Collaborate across engineering, operations, and leadership teams.

Requirements

  • Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, or Engineering, with a minimum of 5 years of experience in a software/technology environment.
  • Certifications in AWS or Kubernetes are advantageous.
  • At least 5 years of experience in a DevOps or SRE role.
  • AWS Expertise: Preferably 5 years of comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
  • Kubernetes Proficiency: Preferably 2 years of hands-on experience with deploying, managing, and scaling applications in Kubernetes environments, including practical experience with Helm and ArgoCD.
  • Understanding of containerization concepts and tools like Docker/Podman.
  • Infrastructure as Code (IaC): Preferably 2 years of experience with IaC tools like Terraform to manage and automate cloud resources effectively.

Work in a way that works for you

We promote a healthy work/life balance across the organization. We offer an appealing working prospect for our people, with numerous wellbeing initiatives, shared parental leave, study assistance, and sabbaticals to help you meet both your immediate responsibilities and your long-term goals.

  • Working flexible hours — adjusting your schedule to fit your productivity peaks.


Working for you

We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:

  • Medical Aid
  • Retirement Plan including Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
  • Modern family benefits, including adoption and surrogacy
  • Study Leave


About the Business

LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase productivity, improve decision-making, and achieve better outcomes worldwide. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.

We are committed to providing a fair and accessible hiring process. If you require accommodation or adjustments due to a disability or other need, please let us know by completing our Applicant Request Support Form or contacting 1- .

Warning: Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.

Please read our Candidate Privacy Policy.

We are an equal opportunity employer: qualified applicants are considered without regard to race, color, creed, religion, sex, national origin, citizenship, disability, veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other legally protected characteristic.

USA Job Seekers:

EEO Know Your Rights


Site Reliability Engineer

Western Cape, Western Cape Robin App AS

Posted 12 days ago

Job Description

Work from home
About Robin

Robin is on a mission to rebuild the legal industry, starting with making contracts simple for everyone. We are a pioneer in Legal AI, built on proprietary models, licensed data, and deep partnerships with Anthropic and AWS. Since 2019, we’ve expanded our footprint to 4 continents and have been supporting many of the world’s most successful businesses, including GE, Pfizer, KPMG, and UBS.

What will you do as an SRE?

As an SRE at Robin AI, you'll help build and maintain the cloud infrastructure and applications that power our cutting-edge Legal AI platform. You'll collaborate with engineering teams to establish robust monitoring, incident response, and deployment strategies that ensure high availability and reliability of our proprietary models and services, maintaining optimal SLOs for our global customer base.

Your day-to-day responsibilities:

  • Ensure Robin's systems are highly available and scalable.

  • Standardise and implement observability practices in our service-based architecture through logging, traces, metrics and monitors.

  • Design, deploy, and operate infrastructure to support Robin's product teams as we expand into new regions.

  • Automate manual operational tasks.

  • Collaborate with development team leads to optimise build, test, and deployment processes.

  • Participate in and improve our on-call and incident handling processes to ensure 24/7 system reliability.

Ideally, you should have the following qualifications:

  • 3+ years of experience in DevOps or Site Reliability Engineering roles

  • Proficiency in at least one backend programming language (We use Python)

  • Strong knowledge of AWS services (ECS, S3, RDS, Lambda, etc.), managed by Terraform

  • Comfortable troubleshooting across the full stack, starting from the browser, through the networking components, into the containerised applications and then onto data stores.

  • Knowledge of observability frameworks and tools (We use OpenTelemetry, CloudWatch & Datadog)

  • Excellent problem-solving and communication skills

  • Experience with AI/ML infrastructure deployments is a plus

What’s in it for you

  • Salary: Competitive

  • Hybrid schedule: We offer a flexible working schedule.

  • Equity package: Generous equity scheme - everyone gets to be an owner of Robin AI!

  • Annual leave: 20 days PTO, in addition to the public holidays observed in South Africa.

  • Growth opportunities: We prioritise promotions for high performers and help you to progress your career.

What’s it like working at Robin?

Our culture and values attract people who are creative, resourceful, and share our passion for excellence. At Robin, you're encouraged to push yourself and empowered to take risks. We support each other to think big, try new ideas, and navigate uncertainty. Whether you're at our headquarters or one of our worldwide offices, you'll find a world of opportunities to grow, thrive, and make a meaningful impact. See what life is like at Robin.

Diversity, Equity and Inclusion at Robin

We are committed to building one of the most diverse technology companies in the world. As of 2024, more than 30% of our employees come from ethnic minority backgrounds, and 51% of roles are held by women. We know that transforming the legal industry requires diverse perspectives, so we're creating an environment where innovation thrives through inclusion.

Robin operates a direct hiring model and any speculative CVs shared via agencies will be treated as a gift.


Site Reliability Engineer

Gauteng, Gauteng Asuer (Pty)

Posted 14 days ago

Job Description

Work from home

Location: North Riding, Johannesburg, South Africa

Type: Full-time

Office: Hybrid, 3 days in office a week

ABOUT YOU

We are looking for.


A committed and capable Site Reliability Engineer (SRE) to take ownership of the uptime, performance, and scalability of our production and development systems. You will be responsible for managing the hosting environments of our ERP, customer platforms, internal applications, databases, and websites, ensuring they are secure, available, and optimised across all stages of deployment. This position is based in Johannesburg, offers a competitive salary, and provides an opportunity to build the foundations of infrastructure excellence for one of South Africa’s most promising fintech ventures.

What you'll get to do and why we need you.


As a Site Reliability Engineer, you will be the guardian of our technical stability and infrastructure performance. You will manage and optimise hosting environments across production and development instances, covering platforms like Odoo ERP, WhatsApp chatbot systems, APIs, internal tools, external facing websites and reporting databases. Your work ensures that the systems powering over 50 000 Sales Force members and thousands of end users remain resilient, scalable, and secure.

You will collaborate with engineers, product managers, and business teams to design infrastructure strategies, improve observability, manage deployments, respond to incidents, and drive continuous improvement. This is a rare opportunity to shape the infrastructure blueprint of a high-growth, impact-focused business from the ground up.

  • Infrastructure Management
  • Security & Uptime
  • Automation & CI/CD
  • Collaboration with Engineers

ABOUT US

Who we are and what we do.

Asuer is a fintech company committed to making life simpler and more secure for African communities through innovative financial and technology solutions. We operate across insurance and telecommunications, with plans to expand into digital payments. Our focus is on removing barriers and helping people achieve their goals.

Born from the ongoing digital transformation of Botle Buhle Brands (BBB), one of Africa’s leading direct-selling businesses, Asuer has grown into an independent company centred on financial inclusion and accessible technology. Everything we build is guided by our core values: Impact, Innovation, and Integrity.

  • Managing and monitoring the infrastructure of our ERP systems, applications, APIs, and databases.
  • Ensuring high availability and scalability of production environments and development pipelines.
  • Administering cloud environments including deployments, rollbacks, and updates.
  • Establishing and maintaining CI/CD workflows for rapid and safe deployments.
  • Setting up monitoring, logging, and alerting systems to track system health and performance.
  • Investigating and resolving production incidents in a timely and thorough manner.
  • Implementing backup, recovery, and failover processes to ensure data integrity.
  • Improving observability and reporting across environments and services.
  • Hardening infrastructure security and enforcing access controls and best practices.
  • Supporting development teams with staging, test, and release environments.
  • Automating routine tasks to improve system efficiency and reduce human error.
Our requirements include. Technical skills in:
  • Experience managing Linux-based production environments, preferably on Ubuntu
  • Strong proficiency in cloud hosting platforms such as AWS or Google Cloud
  • Solid understanding of containerisation using Docker and orchestration tools
  • Experience with CI/CD tools and pipeline automation
  • Familiarity with infrastructure as code tools such as Terraform or Ansible
  • Comfortable working with PostgreSQL and database administration best practices
  • Networking, DNS, and load balancing
  • Monitoring and alerting using tools like Grafana, Prometheus, or cloud native solutions
  • Understanding of secure deployment practices including firewalls, SSL, and API rate limiting
Must be able to:
  • Set up and manage reliable and scalable hosting environments
  • Diagnose and resolve incidents efficiently with minimal downtime
  • Collaborate with software teams to enable faster and safer deployments
  • Document infrastructure processes and maintain infrastructure knowledge bases
  • Implement DevOps and SRE practices tailored to a fast-moving startup context
  • Build processes that are robust and scale as the company grows
  • Balance performance, security, and simplicity in all infrastructure decisions
Knowledge & experience:
  • Odoo hosting and maintenance workflows
  • Hosting ERP systems, databases, and API driven platforms
  • Securing web infrastructure and access credentials
  • Optimising costs and performance in cloud environments
  • Scripting and automation using Bash, Python, or similar
  • Logging and system observability tools
  • Fast recovery planning and disaster mitigation
Prerequisites:
  • A tertiary qualification in Computer Science, Information Technology, or a related field
  • Minimum of 3 years of experience in a systems administration, DevOps, or SRE role
  • Strong problem solving, troubleshooting, and communication skills
  • Proficiency in English reading, writing, and speaking

A BIT MORE ABOUT US

What we offer.

At Asuer, you’ll join a mission with real meaning, where your work empowers thousands of people across Africa. You’ll collaborate with smart, curious teammates who move fast and build with purpose, without the drag of legacy systems. We offer competitive pay, a flexible environment, and the autonomy to shape systems from the ground up. This is a place for real growth, where you scale products that matter and make a tangible impact every day.


Site Reliability Engineer

Zepz

Posted 14 days ago


Job Description

Work from home

Zepz powers two leading global remittance brands, WorldRemit and Sendwave, to build the next generation of cross-border payments. Serving over 9 million customers across 4,000 corridors, Zepz is transforming how money moves across borders by making it faster, safer and more convenient. Its innovative digital solutions are designed to break down financial barriers and expand access to better financial tools. Zepz operates across a broad global footprint, connecting the global north and south and enabling migrants to support loved ones, fuel local economies and build better futures.

  • We act like owners - We are relentlessly delivering for our users and spending money thoughtfully.
  • We embrace embarrassing honesty - We function best when we're open and honest with one another — especially about our challenges and doubts.
  • We have a bias to action - We get to first outcomes quickly, iterate and learn.
  • We strive to be better - We may make mistakes, but always learn from them.
  • We are inclusive - to better reflect and serve our users.
About the role

Working in the Site Reliability Engineering team, you’ll be helping ensure the stability, resilience and scale of our services through automation, observability and infrastructure engineering. The work is varied: from helping engineering teams deploy monitoring to designing and implementing new SRE tools and techniques, our team is proactive and always involved. We are a fast-moving team operating in a growing Fintech company, supporting engineers on three continents. We use a modern DevOps and SRE tech stack (GitHub Actions, K8s, ArgoCD, Grafana, AWS, Terraform) and Agile working practices to get the job done. As a member of Zepz’s SRE team you will aim high, embrace challenges and always do what’s right, acting with integrity and building trust as you contribute to the company’s technical direction and long-term decision making.

Reporting to the SRE Manager you will:
  • Use code to solve problems. Configuration, infrastructure, tooling, and automation: everything must be solved by writing high-quality code that performs and scales.
  • Use best practices and standards with regard to observability, monitoring, alerting, capacity planning, availability, performance/latency, change, and troubleshooting for all our Tech services.
  • Work closely with feature teams to ensure that services are correctly monitored, change is delivered in a safe and secure way, resilience is built into our product and our standards and best practices adopted.
  • Lead or be involved in the troubleshooting of complex incidents and problems.
  • Have visibility of the end-to-end service to our customers and ensure their journey is stable and consistent across all the microservices and third-party dependencies, using the observability tool you will have implemented with the Engineering teams.
  • Help the team meet its strategic goals: to maintain the highest level of observability, maximize developer velocity while keeping our product reliable, and ensure that we can deliver the highest quality experience to our customers.
  • Growing together. You’ll review others' work and happily seek feedback on yours to ensure we build a better codebase and sharpen each other's skills.
What we’re looking for from you
  • A skilled Engineer. At least 5 years in an SRE, DevOps or engineering role with a keen interest in solving problems using automation.
  • Understand SRE and DevOps methodologies. You understand the build and deployment cycle of an application, and how to operate a resilient system.
  • A focus on observability. Observability is key to operating a truly reliable and scalable system. We are looking for engineers who can "Monitor Everything & Measure Everything", driving a culture of observability. Experience with Grafana, Loki and Prometheus.
  • Holistic view on application delivery. You understand how to use many systems (monitoring, logging, alerting, and scaling) to build a robust platform which can respond to varying demands from both external sources (traffic) and internal sources (feature team delivery) in a safe and controlled manner. You have experience supporting or developing applications written in Java, Python or Node.js.
  • Systematic problem-solving approach. You should have an understanding of how to analyze and troubleshoot large-scale distributed systems.
  • Happy in the Clouds. Our Cloud Native platform is hosted on AWS. You’ll be comfortable working with a system that supports users from around the world, at scale.
  • Bias for action. You see a problem, you fix a problem. You get buy-in for your solutions and keep tickets moving. We’re always looking for ways to ship at pace.
  • Growth mindset. A willingness to use your skills and experience to mentor less-experienced engineers. A desire to learn from others and make yourself better every day.
  • Agile outlook. You need to be excited about working in a fast-changing environment. Products, tools, frameworks and processes change; we evolve and take the best bits with us. The teams drive the evolution.
  • Disciplined and self-managed. You need to own your role and be disciplined about adhering to protocols and processes. As a senior you will always ensure you are bringing value to the team and driving tasks to completion without being actively managed.
  • Have experience working in a FinTech space
  • Have experience working in a distributed team across different geographies and timezones

What you’ll get from us

Please note that the benefits below will apply to permanent roles.

We have five core benefits for our talent in the US, UK, Philippines, Poland, and South Africa, specifically:

  • Unlimited Annual Leave: Feel free to make the most of your time off and maintain a healthy work-life balance!
  • Private Medical Cover: You can opt-in to a Private Medical Insurance scheme. This provides you with access to thorough medical coverage, so you can feel confident in your health and well-being.
  • Retirement: We offer pension schemes to help you plan for and secure your future.
  • Life Assurance: Life assurance is available to give you peace of mind and protect your loved ones in case of the unexpected.
  • Parental Leave: We offer competitive parental leave schemes to ensure you are spending as much quality time with your new bundle of joy as possible.

We are also remote-first as an organisation, offering flexibility for you to work where you need to be most productive. In addition to the above, you will discover that we have a range of secondary perks (such as the cycle-to-work scheme and employee discounts) depending on your location, to help you thrive at Zepz!

Why choose Zepz?
  • Our team of over 1,000 employees is fully distributed across the world. We work from coffee shops, homes, and co-working spaces, making us one of the larger fully distributed growth-stage startups in the world. We also offer workspace in our talent cluster locations: spaces where we can meet, collaborate and connect.
  • We are proud parents, community organizers, farmers, band members, yoga teachers, YouTube influencers, former Olympians, and serial entrepreneurs.
  • We collectively speak over twenty languages, including Akuapem, Amharic, Bengali, Ewe, Fante, Ga, Igbo, Kalenjin, Luganda, Oromo, Somali, Swahili, Wolof, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish.
  • At Zepz, embodying our commitments binds us together. We are collectively passionate about striving to achieve our vision and purpose - to continue to provide the best service to our users.
Ready to Apply?

Applications will be reviewed on a rolling basis. If interested, please submit your resume along with a cover letter (optional), highlighting why your experience demonstrates you meet the requirements of the role. Please also indicate the countries in which you have work authorization.

Confidence can sometimes hold us back from applying for a job. But we'll let you in on a secret: there's no such thing as a 'perfect' candidate. Zepz is a place where everyone can thrive.

So however you identify and whatever background you bring with you, give us a shot by applying. If you need any form of support to make the process as comfortable as possible, please let us know. We want you to be excited to wake up to make an impact every day.


Site Reliability Engineer

Durban, KwaZulu Natal RELX

Posted 14 days ago


Job Description

About our Team

LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model for each individual legal use case.

About the role:

Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.

Responsibilities

  • Build and improve CI/CD pipelines
  • Manage AWS cloud infrastructure
  • Ensure high availability, observability, and performance
  • Automate systems for efficiency and cost optimization
  • Collaborate across engineering, ops, and leadership teams

Requirements

  • Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, Engineering, and a minimum of 5 years experience in a software/technology environment.
  • Certifications in AWS or Kubernetes are advantageous.
  • At least 5 years of experience in a DevOps or SRE role.
  • AWS Expertise: Preferably 5 years comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
  • Kubernetes Proficiency: Preferably 2 years hands-on experience with deploying, managing, and scaling applications in Kubernetes environments. Practical experience with Helm and ArgoCD.
  • Understanding of containerization concepts and tools like Docker/Podman.
  • Infrastructure as Code (IaC): Preferably 2 years experience with IaC tools like Terraform to manage and automate cloud resources effectively. Terraform is preferred.

Work in a way that works for you

We promote a healthy work/life balance across the organization. We offer an appealing working prospect for our people. With numerous wellbeing initiatives, shared parental leave, study assistance, and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.

  • Working flexible hours - flexing the times when you work in the day to help you fit everything in and work when you are most productive


Working for you

We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:

  • Medical Aid
  • Retirement Plan inclusive of Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
  • Modern family benefits, including adoption and surrogacy
  • Study Leave


About the Business

LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase their productivity, improve decision-making, achieve better outcomes, and advance the rule of law around the world. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or contact 1- .

Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.

Please read our Candidate Privacy Policy.

We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.

USA Job Seekers:

EEO Know Your Rights.


Site Reliability Engineer

Johannesburg, Gauteng RELX

Posted 14 days ago


Job Description

About our Team

LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model from today’s top model creators for each individual legal use case.

About the role:

Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.

Responsibilities

  • Build and improve CI/CD pipelines
  • Manage AWS cloud infrastructure
  • Ensure high availability, observability, and performance
  • Automate systems for efficiency and cost optimization
  • Collaborate across engineering, ops, and leadership teams

Requirements

  • Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, Engineering and a minimum of 5 years experience in a software/technology environment is required.
  • Certifications in AWS or Kubernetes are advantageous.
  • At least 5 years of experience in a DevOps or SRE role.
  • AWS Expertise: Preferably 5 years comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
  • Kubernetes Proficiency: Preferably 2 years hands-on experience with deploying, managing, and scaling applications in Kubernetes environments. Practical experience with Helm and ArgoCD.
  • Understanding of containerization concepts and tools like Docker/Podman.
  • Infrastructure as Code (IaC): Preferably 2 years experience with IaC tools like Terraform to manage and automate cloud resources effectively. Terraform is preferred.

Work in a way that works for you

We promote a healthy work/life balance across the organization. We offer an appealing working prospect for our people. With numerous wellbeing initiatives, shared parental leave, study assistance and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.

  • Working flexible hours - flexing the times when you work in the day to help you fit everything in and work when you are the most productive


Working for you

We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:

  • Medical Aid
  • Retirement Plan inclusive of Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
  • Modern family benefits, including adoption and surrogacy
  • Study Leave


About the Business

LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase their productivity, improve decision-making, achieve better outcomes, and advance the rule of law around the world. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or please contact 1- .

Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.

Please read our Candidate Privacy Policy.

We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.

USA Job Seekers:

EEO Know Your Rights.
