610 Site Reliability jobs in South Africa
Site Reliability Engineer
Posted today
Job Description
Are you a Site Reliability Engineer with solid Datadog experience? Our client in the Warehousing and Logistics sector is looking to employ an Engineer to support the design, implementation, and optimization of Datadog monitoring solutions across infrastructure, applications, and services.
Qualifications and Experience:
- Datadog Certified Fundamentals – Must have
- Degree in Information Technology or Computer Science
- Management of operations on virtualized and distributed infrastructures
- Management of operations in environments with clustering, replication, and load balancing
- ITIL Practitioner (V3) / ITIL Specialist (V4)
- Windows Server: Advantage
- 1–3 years of experience working with a modern monitoring/observability tool, ideally Datadog (or alternatives like Prometheus, Grafana, New Relic, or Dynatrace).
Experience in:
- Deploying and configuring monitoring agents
- Creating dashboards and monitors
- Parameterizing tags and labels for proper data correlation
- Basic familiarity with cloud platforms (AWS, Azure, or GCP) and container environments (Docker/Kubernetes)
- Experience working with Centreon - Advantage
- Strong interest in monitoring, DevOps, SRE, or cloud infrastructure
- Knowledge of basic scripting (e.g., Bash, Python) is a plus
Duties:
- Support the design, implementation, and optimization of Datadog monitoring solutions across infrastructure, applications, and services.
- Work alongside DevOps, infrastructure, and application teams to ensure complete observability using custom dashboards, alerts, and tagging strategies.
- Assist in the deployment and onboarding of new systems into the monitoring ecosystem.
- Serve as the go-to person for building visualizations, improving signal-to-noise ratios in alerting, and aligning monitoring with business objectives.
- Ideal for a young and motivated engineer looking to grow within observability and cloud-native monitoring.
- Deploy and configure Datadog agents across various environments (cloud and on-prem).
- Create and customize dashboards, monitors, and alerts for systems, services, containers, and applications.
- Implement tagging strategies to organize, filter, and correlate metrics and logs effectively.
- Integrate Datadog with various platforms (AWS, Azure, GCP, Kubernetes, Docker, etc.) to collect telemetry data.
- Collaborate with developers, DevOps, and infrastructure teams to identify key business and system metrics to monitor.
- Continuously tune and optimize monitors to reduce false positives and improve actionable alerting.
- Document dashboards, alert logic, best practices, and knowledge for cross-team enablement.
- Analyze incidents and outages post-mortem to identify monitoring gaps and enhance visibility.
- Assist in evangelizing observability practices within the organization and contribute to monitoring as code efforts (e.g., Terraform for Datadog resources).
- Stay up to date with new Datadog features and industry trends in observability and monitoring.
Site Reliability Engineer
Posted today
Job Description
Robin AI City of Cape Town, Western Cape, South Africa
About Robin
Robin is on a mission to rebuild the legal industry — starting with making contracts simple for everyone. We are a pioneer in Legal AI, built on proprietary models, licensed data, and deep partnerships with Anthropic and AWS. Since 2019, we’ve expanded our footprint to 4 continents and have been supporting many of the world’s most successful businesses, including GE, Pfizer, KPMG, and UBS.
What will you do as an SRE?
As an SRE at Robin AI, you'll help build and maintain the cloud infrastructure and applications that power our cutting-edge Legal AI platform. You'll collaborate with engineering teams to establish robust monitoring, incident response, and deployment strategies that ensure high availability and reliability of our proprietary models and services, maintaining optimal SLOs for our global customer base.
Your Day-to-day Responsibilities
- You will be responsible for ensuring the Robin systems are highly available and scalable.
- Standardise and implement observability practices in our service-based architecture through logging, traces, metrics and monitors
- Design, deploy, and operate infrastructure to support Robin's product teams as we expand into new regions.
- Adding automation around manual operational tasks
- Collaborate with development team leads to optimise build, test, and deployment processes
- Participating in and improving our on-call and incident handling processes to ensure 24/7 system reliability
Ideally, you should have the following qualifications:
- 3+ years of experience in DevOps or Site Reliability Engineering roles
- Proficiency in at least one backend programming language (We use Python)
- Strong knowledge of AWS services (ECS, S3, RDS, Lambda, etc.), managed by Terraform
- Comfortable troubleshooting across the full stack, starting from the browser, through the networking components, into the containerised applications and then onto data stores.
- Knowledge of observability frameworks and tools (We use OpenTelemetry, Cloudwatch & DataDog)
- Excellent problem-solving and communication skills
- Experience with AI/ML infrastructure deployments is a plus
What’s in it for you
- Salary: Competitive
- Hybrid schedule: We offer a flexible working schedule.
- Equity package: Generous equity scheme - everyone gets to be an owner of Robin AI!
- Annual leave: 20 days PTO, in addition to the public holidays observed in South Africa.
- Growth opportunities: We prioritise promotions for high performers and help you to progress your career.
Our culture and values attract people who are creative, resourceful, and share our passion for excellence. At Robin, you're encouraged to push yourself and empowered to take risks. We support each other to think big, try new ideas, and navigate uncertainty. Whether you're at our headquarters or one of our worldwide offices, you'll find a world of opportunities to grow, thrive, and make a meaningful impact. See what life is like at Robin.
Diversity, Equity and Inclusion at Robin
We are committed to building one of the most diverse technology companies in the world. As of 2024, more than 30% of our employees come from ethnic minority backgrounds, and 51% of roles are held by women. We know that transforming the legal industry requires diverse perspectives, so we're creating an environment where innovation thrives through inclusion.
Robin operates a direct hiring model and any speculative CVs shared via agencies will be treated as a gift.
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Engineering and Information Technology
- Industries: Software Development
Site Reliability Engineer
Posted 6 days ago
Job Description
About our Team
LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX , a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model for each individual legal use case.
About the role:
Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.
Responsibilities
- Build and improve CI/CD pipelines
- Manage AWS cloud infrastructure
- Ensure high availability, observability, and performance
- Automate systems for efficiency and cost optimization
- Collaborate across engineering, ops, and leadership teams
Requirements
- Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, or Engineering, with a minimum of 5 years of experience in a software/technology environment.
- Certifications in AWS or Kubernetes are advantageous.
- At least 5 years of experience in a DevOps or SRE role.
- AWS Expertise: Preferably 5 years of comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
- Kubernetes Proficiency: Preferably 2 years of hands-on experience with deploying, managing, and scaling applications in Kubernetes environments. Practical experience with Helm and ArgoCD.
- Understanding of containerization concepts and tools like Docker/Podman.
- Infrastructure as Code (IaC): Preferably 2 years of experience with IaC tools (Terraform preferred) to manage and automate cloud resources effectively.
Work in a way that works for you
We promote a healthy work/life balance across the organization. We offer numerous wellbeing initiatives, shared parental leave, study assistance, and sabbaticals to help you meet your immediate responsibilities and long-term goals.
- Working flexible hours — adjusting your work times to fit your productivity peaks.
Working for you
We value your well-being and happiness. Benefits include:
- Medical Aid
- Retirement Plan including Risk Benefits (Disability, Critical Illness, Life, and Funeral Cover)
- Modern family benefits, including adoption and surrogacy
- Study Leave
About the Business
LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase productivity, improve decision-making, achieve better outcomes, and promote the rule of law globally. As a pioneer in digital services, the company was the first to bring legal and business information online with Lexis and Nexis.
We are committed to a fair and accessible hiring process. If you require accommodation or adjustments, please complete our Applicant Request Support Form or contact 1- .
Warning: Be aware of scams where criminals pose as recruiters asking for money or personal information. We never request money or banking details from applicants. Learn more about spotting and avoiding scams here .
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants will be considered and treated without regard to race, color, creed, religion, sex, national origin, citizenship status, disability, veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other protected characteristic.
Site Reliability Engineer
Posted 6 days ago
Job Description
About our Team
LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX , a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model for each individual legal use case.
About the role:
Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.
Responsibilities
- Build and improve CI/CD pipelines
- Manage AWS cloud infrastructure
- Ensure high availability, observability, and performance
- Automate systems for efficiency and cost optimization
- Collaborate across engineering, ops, and leadership teams
Requirements
- Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, or Engineering, and a minimum of 5 years of experience in a software/technology environment.
- Certifications in AWS or Kubernetes are advantageous.
- At least 5 years of experience in a DevOps or SRE role.
- AWS Expertise: Preferably 5 years of comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
- Kubernetes Proficiency: Preferably 2 years of hands-on experience with deploying, managing, and scaling applications in Kubernetes environments. Practical experience with Helm and ArgoCD.
- Understanding of containerization concepts and tools like Docker/Podman.
- Infrastructure as Code (IaC): Preferably 2 years of experience with IaC tools (Terraform preferred) to manage and automate cloud resources effectively.
Work in a way that works for you
We promote a healthy work/life balance across the organization. We offer an appealing working prospect for our people. With numerous wellbeing initiatives, shared parental leave, study assistance, and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.
- Working flexible hours - flexing the times when you work in the day to help you fit everything in and work when you are most productive
Working for you
We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:
- Medical Aid
- Retirement Plan including Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
- Modern family benefits, including adoption and surrogacy
- Study Leave
About the Business
LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase their productivity, improve decision-making, achieve better outcomes, and advance the rule of law worldwide. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.
We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or contact 1- .
Warning: Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here .
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.
Site Reliability Engineer
Posted 8 days ago
Job Description
About Our Team
LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX , a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model from today’s top model creators for each individual legal use case.
About the role:
Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.
Responsibilities
- Build and improve CI/CD pipelines.
- Manage AWS cloud infrastructure.
- Ensure high availability, observability, and performance.
- Automate systems for efficiency and cost optimization.
- Collaborate across engineering, operations, and leadership teams.
Requirements
- Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, or Engineering, with a minimum of 5 years of experience in a software/technology environment.
- Certifications in AWS or Kubernetes are advantageous.
- At least 5 years of experience in a DevOps or SRE role.
- AWS Expertise: Preferably 5 years of comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
- Kubernetes Proficiency: Preferably 2 years of hands-on experience with deploying, managing, and scaling applications in Kubernetes environments, including practical experience with Helm and ArgoCD.
- Understanding of containerization concepts and tools like Docker/Podman.
- Infrastructure as Code (IaC): Preferably 2 years of experience with IaC tools like Terraform to manage and automate cloud resources effectively.
Work in a way that works for you
We promote a healthy work/life balance across the organization. We offer an appealing working prospect for our people, with numerous wellbeing initiatives, shared parental leave, study assistance, and sabbaticals to help you meet both your immediate responsibilities and your long-term goals.
- Working flexible hours — adjusting your schedule to fit your productivity peaks.
Working for you
We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:
- Medical Aid
- Retirement Plan including Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
- Modern family benefits, including adoption and surrogacy
- Study Leave
About the Business
LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase productivity, improve decision-making, and achieve better outcomes worldwide. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.
We are committed to providing a fair and accessible hiring process. If you require accommodation or adjustments due to a disability or other need, please let us know by completing our Applicant Request Support Form or contacting 1- .
Warning: Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here .
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants are considered without regard to race, color, creed, religion, sex, national origin, citizenship, disability, veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other legally protected characteristic.
Site Reliability Engineer
Posted 12 days ago
Job Description
Robin is on a mission to rebuild the legal industry, starting with making contracts simple for everyone. We are a pioneer in Legal AI, built on proprietary models, licensed data, and deep partnerships with Anthropic and AWS. Since 2019, we’ve expanded our footprint to 4 continents and have been supporting many of the world’s most successful businesses, including GE, Pfizer, KPMG, and UBS.
What will you do as an SRE?
As an SRE at Robin AI, you'll help build and maintain the cloud infrastructure and applications that power our cutting-edge Legal AI platform. You'll collaborate with engineering teams to establish robust monitoring, incident response, and deployment strategies that ensure high availability and reliability of our proprietary models and services, maintaining optimal SLOs for our global customer base.
Your day-to-day responsibilities:
- You will be responsible for ensuring the Robin systems are highly available and scalable.
- Standardise and implement observability practices in our service-based architecture through logging, traces, metrics and monitors
- Design, deploy, and operate infrastructure to support Robin's product teams as we expand into new regions.
- Adding automation around manual operational tasks
- Collaborate with development team leads to optimise build, test, and deployment processes
- Participating in and improving our on-call and incident handling processes to ensure 24/7 system reliability
Ideally, you should have the following qualifications:
- 3+ years of experience in DevOps or Site Reliability Engineering roles
- Proficiency in at least one backend programming language (We use Python)
- Strong knowledge of AWS services (ECS, S3, RDS, Lambda, etc.), managed by Terraform
- Comfortable troubleshooting across the full stack, starting from the browser, through the networking components, into the containerised applications and then onto data stores.
- Knowledge of observability frameworks and tools (We use OpenTelemetry, Cloudwatch & DataDog)
- Excellent problem-solving and communication skills
- Experience with AI/ML infrastructure deployments is a plus
What’s in it for you
- Salary: Competitive
- Hybrid schedule: We offer a flexible working schedule.
- Equity package: Generous equity scheme - everyone gets to be an owner of Robin AI!
- Annual leave: 20 days PTO, in addition to the public holidays observed in South Africa.
- Growth opportunities: We prioritise promotions for high performers and help you to progress your career.
Our culture and values attract people who are creative, resourceful, and share our passion for excellence. At Robin, you're encouraged to push yourself and empowered to take risks. We support each other to think big, try new ideas, and navigate uncertainty. Whether you're at our headquarters or one of our worldwide offices, you'll find a world of opportunities to grow, thrive, and make a meaningful impact. See what life is like at Robin.
Diversity, Equity and Inclusion at Robin
We are committed to building one of the most diverse technology companies in the world. As of 2024, more than 30% of our employees come from ethnic minority backgrounds, and 51% of roles are held by women. We know that transforming the legal industry requires diverse perspectives, so we're creating an environment where innovation thrives through inclusion.
Robin operates a direct hiring model and any speculative CVs shared via agencies will be treated as a gift.
Site Reliability Engineer
Posted 14 days ago
Job Description
Location: North Riding, Johannesburg, South Africa
Type: Full-time
Office: Hybrid, 3 days in office a week
ABOUT YOU
A committed and capable Site Reliability Engineer (SRE) to take ownership of the uptime, performance, and scalability of our production and development systems. You will be responsible for managing the hosting environments of our ERP, customer platforms, internal applications, databases, and websites, ensuring they are secure, available, and optimised across all stages of deployment. This position is based in Johannesburg, offers a competitive salary, and provides an opportunity to build the foundations of infrastructure excellence for one of South Africa’s most promising fintech ventures.
As a Site Reliability Engineer, you will be the guardian of our technical stability and infrastructure performance. You will manage and optimise hosting environments across production and development instances, covering platforms like Odoo ERP, WhatsApp chatbot systems, APIs, internal tools, external facing websites and reporting databases. Your work ensures that the systems powering over 50 000 Sales Force members and thousands of end users remain resilient, scalable, and secure.
You will collaborate with engineers, product managers, and business teams to design infrastructure strategies, improve observability, manage deployments, respond to incidents, and drive continuous improvement. This is a rare opportunity to shape the infrastructure blueprint of a high growth, impact focused business from the ground up.
ABOUT US
Who we are and what we do.
Asuer is a fintech company committed to making life simpler and more secure for African communities through innovative financial and technology solutions. We operate across insurance and telecommunications, with plans to expand into digital payments. Our focus is on removing barriers and helping people achieve their goals.
Born from the ongoing digital transformation of Botle Buhle Brands (BBB), one of Africa’s leading direct-selling businesses, Asuer has grown into an independent company centred on financial inclusion and accessible technology. Everything we build is guided by our core values: Impact, Innovation, and Integrity.
- Managing and monitoring the infrastructure of our ERP systems, applications, APIs, and databases.
- Ensuring high availability and scalability of production environments and development pipelines.
- Administering cloud environments including deployments, rollbacks, and updates.
- Establishing and maintaining CI/CD workflows for rapid and safe deployments.
- Setting up monitoring, logging, and alerting systems to track system health and performance.
- Investigating and resolving production incidents in a timely and thorough manner.
- Implementing backup, recovery, and failover processes to ensure data integrity.
- Improving observability and reporting across environments and services.
- Hardening infrastructure security and enforcing access controls and best practices.
- Supporting development teams with staging, test, and release environments.
- Automating routine tasks to improve system efficiency and reduce human error.
- Experience managing Linux based production environments preferably on Ubuntu
- Strong proficiency in cloud hosting platforms such as AWS or Google Cloud
- Solid understanding of containerisation using Docker and orchestration tools
- Experience with CI/CD tools and pipeline automation
- Familiarity with infrastructure as code tools such as Terraform or Ansible
- Comfortable working with PostgreSQL and database administration best practices
- Networking, DNS, and load balancing
- Monitoring and alerting using tools like Grafana, Prometheus, or cloud native solutions
- Understanding of secure deployment practices including firewalls, SSL, and API rate limiting
- Set up and manage reliable and scalable hosting environments
- Diagnose and resolve incidents efficiently with minimal downtime
- Collaborate with software teams to enable faster and safer deployments
- Document infrastructure processes and maintain infrastructure knowledge bases
- Implement DevOps and SRE practices tailored to a fast moving startup context
- Build processes that are robust and scale as the company grows
- Balance performance, security, and simplicity in all infrastructure decisions
- Odoo hosting and maintenance workflows
- Hosting ERP systems, databases, and API driven platforms
- Securing web infrastructure and access credentials
- Optimising costs and performance in cloud environments
- Scripting and automation using Bash, Python, or similar
- Logging and system observability tools
- Fast recovery planning and disaster mitigation
- A tertiary qualification in Computer Science, Information Technology, or a related field
- Minimum of 3 years of experience in a systems administration, DevOps, or SRE role
- Strong problem solving, troubleshooting, and communication skills
- Proficiency in English reading, writing, and speaking
A BIT MORE ABOUT US
At Asuer, you’ll join a mission with real meaning, where your work empowers thousands of people across Africa. You’ll collaborate with smart, curious teammates who move fast and build with purpose, without the drag of legacy systems. We offer competitive pay, a flexible environment, and the autonomy to shape systems from the ground up. This is a place for real growth, where you scale products that matter and make a tangible impact every day.
Site Reliability Engineer
Posted 14 days ago
Job Description
Zepz powers two leading global remittance brands, WorldRemit and Sendwave, to build the next generation of cross-border payments. Serving over 9 million customers across 4,000 corridors, Zepz is transforming how money moves across borders by making it faster, safer and more convenient. Its innovative digital solutions are designed to break down financial barriers and expand access to better financial tools. Zepz operates across a broad global footprint, connecting the global north and south and enabling migrants to support loved ones, fuel local economies and build better futures.
- We act like owners - We are relentlessly delivering for our users and spending money thoughtfully.
- We embrace embarrassing honesty - We function best when we're open and honest with one another — especially about our challenges and doubts.
- We have a bias to action - We get to first outcomes quickly, iterate and learn.
- We strive to be better - We may make mistakes, but always learn from them.
- We are inclusive - to better reflect and serve our users.
Working in the Site Reliability Engineering team, you’ll be helping ensure the stability, resilience and scale of our services through automation, observability and infrastructure engineering. The work is varied; from helping engineering teams deploy monitoring, to designing and implementing new SRE tools and techniques, our team is proactive and always involved.
We are a fast moving team operating in a growing Fintech company, supporting engineers on three continents.
We use a modern DevOps and SRE tech stack (GitHub Actions, K8s, ArgoCD, Grafana, AWS, Terraform) and Agile working practices to get the job done.
As a member of Zepz’s SRE team you will aim high, embrace challenges and always do what’s right; acting with integrity and building trust as you contribute to the company’s technical direction and long term decision making.
Reporting to the SRE Manager you will:
- Use code to solve problems. Configuration, infrastructure, tooling, and automation: everything must be solved by writing high-quality code that performs and scales.
- Use best practices and standards with regard to observability, monitoring, alerting, capacity planning, availability, performance/latency, change, and troubleshooting for all our Tech services.
- Work closely with feature teams to ensure that services are correctly monitored, change is delivered in a safe and secure way, resilience is built into our product, and our standards and best practices are adopted.
- Lead or be involved in the troubleshooting of complex incidents and problems.
- Have visibility on end to end service to our customers and ensure their journey is stable and consistent across all the microservices and 3rd party dependencies with the observability tool you will have implemented with the Engineering teams.
- Help the team meet its strategic goals: to maintain the highest level of observability, maximize developer velocity while keeping our product reliable, and ensure that we can deliver the highest quality experience to our customers.
- Growing together. You’ll review others' work and happily seek feedback on yours to ensure we build a better codebase and sharpen each other's skills.
- A skilled Engineer. At least 5 years in an SRE, DevOps or engineering role, with a keen interest in solving problems using automation.
- Understand SRE and DevOps methodologies. You understand the build and deployment cycle of an application, and how to operate a resilient system.
- A focus on observability. Observability is key to operating a truly reliable and scalable system. We are looking for engineers who can "Monitor Everything & Measure Everything", driving a culture of observability. Experience with Grafana, Loki and Prometheus.
- Holistic view on application delivery. You understand how to use many systems (monitoring, logging, alerting, and scaling) to build a robust platform that can respond to varying demands from both external sources (traffic) and internal sources (feature team delivery) in a safe and controlled manner. You have experience supporting or developing applications written in Java, Python or Node.js.
- Systematic problem-solving approach. You should understand how to analyze and troubleshoot large-scale distributed systems.
- Happy in the Clouds. Our Cloud Native platform is hosted on AWS. You’ll be comfortable working with a system that supports users from around the world, at scale.
- Bias for action. You see a problem, you fix a problem. You get buy-in for your solutions and keep tickets moving. We’re always looking for ways to ship at pace.
- Growth mindset. A willingness to use your skills and experience to mentor less-experienced engineers. A desire to learn from others and make yourself better every day.
- Agile outlook. You need to be excited about working in a fast-changing environment. Products, tools, frameworks and processes change, we evolve and take the best bits with us. The teams drive the evolution.
- Disciplined and self managed. You need to own your role and be disciplined about adhering to protocols and processes. As a senior you will always ensure you are bringing value to the team and driving tasks to completion without being actively managed.
- Have experience working in a FinTech space
- Have experience working in a distributed team across different geographies and timezones
What you’ll get from us
Please note that the benefits below apply to permanent roles. We have five core benefits for our talent in the US, UK, Philippines, Poland, and South Africa, specifically:
- Unlimited Annual Leave: Feel free to make the most of your time off and maintain a healthy work-life balance!
- Private Medical Cover: You can opt-in to a Private Medical Insurance scheme. This provides you with access to thorough medical coverage, so you can feel confident in your health and well-being.
- Retirement: We offer pension schemes to help you plan for and secure your future.
- Life Assurance: Life assurance is available to give you peace of mind and protect your loved ones in case of the unexpected.
- Parental Leave: We offer competitive parental leave schemes to ensure you are spending as much quality time with your new bundle of joy as possible.
We are also remote-first as an organisation, offering flexibility for you to work where you need to be most productive. In addition to the above, you will discover that we have a range of secondary perks (such as the cycle-to-work scheme and employee discounts) depending on your location, to help you thrive at Zepz!
Why choose Zepz?
- Our team of over 1,000 employees is fully distributed across the world. We work from coffee shops, homes, and co-working spaces, making us one of the larger fully distributed growth-stage startups in the world. We also offer workspace in our talent cluster locations: spaces where we can meet, collaborate and connect.
- We are proud parents, community organizers, farmers, band members, yoga teachers, YouTube influencers, former Olympians, and serial entrepreneurs.
- We collectively speak over twenty languages, including Akuapem, Amharic, Bengali, Ewe, Fante, Ga, Igbo, Kalenjin, Luganda, Oromo, Somali, Swahili, Wolof, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish.
- At Zepz, embodying our commitments binds us together. We are collectively passionate about striving to achieve our vision and purpose - to continue to provide the best service to our users.
Applications will be reviewed on a rolling basis. If interested, please submit your resume along with a cover letter (optional), highlighting why your experience demonstrates you meet the requirements of the role. Please also indicate the countries in which you have work authorization.
Confidence can sometimes hold us back from applying for a job. But we'll let you in on a secret: there's no such thing as a 'perfect' candidate. Zepz is a place where everyone can thrive.
So however you identify and whatever background you bring with you, please give us a shot by applying, and let us know if you need any form of support to make the process as comfortable as possible. We want you to be excited to wake up to make an impact every day.
Site Reliability Engineer
Posted 14 days ago
Job Viewed
Job Description
About our Team
LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model for each individual legal use case.
About the role:
Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.
Responsibilities
- Build and improve CI/CD pipelines
- Manage AWS cloud infrastructure
- Ensure high availability, observability, and performance
- Automate systems for efficiency and cost optimization
- Collaborate across engineering, ops, and leadership teams
Requirements
- Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, Engineering, and a minimum of 5 years experience in a software/technology environment.
- Certifications in AWS or Kubernetes are advantageous.
- At least 5 years of experience in a DevOps or SRE role.
- AWS Expertise: Preferably 5 years comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
- Kubernetes Proficiency: Preferably 2 years hands-on experience with deploying, managing, and scaling applications in Kubernetes environments. Practical experience with Helm and ArgoCD.
- Understanding of containerization concepts and tools like Docker/Podman.
- Infrastructure as Code (IaC): Preferably 2 years experience with IaC tools like Terraform to manage and automate cloud resources effectively. Terraform is preferred.
Work in a way that works for you
We promote a healthy work/life balance across the organization. We offer an appealing working prospect for our people. With numerous wellbeing initiatives, shared parental leave, study assistance, and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.
- Working flexible hours - flexing the times when you work in the day to help you fit everything in and work when you are most productive
Working for you
We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:
- Medical Aid
- Retirement Plan inclusive of Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
- Modern family benefits, including adoption and surrogacy
- Study Leave
About the Business
LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase their productivity, improve decision-making, achieve better outcomes, and advance the rule of law around the world. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.
We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or contact 1- .
Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.
USA Job Seekers:
Site Reliability Engineer
Posted 14 days ago
Job Viewed
Job Description
About our Team
LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model from today’s top model creators for each individual legal use case.
About the role:
Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.
Responsibilities
- Build and improve CI/CD pipelines
- Manage AWS cloud infrastructure
- Ensure high availability, observability, and performance
- Automate systems for efficiency and cost optimization
- Collaborate across engineering, ops, and leadership teams
Requirements
- Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, Engineering and a minimum of 5 years experience in a software/technology environment is required.
- Certifications in AWS or Kubernetes are advantageous.
- At least 5 years of experience in a DevOps or SRE role.
- AWS Expertise: Preferably 5 years comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
- Kubernetes Proficiency: Preferably 2 years hands-on experience with deploying, managing, and scaling applications in Kubernetes environments. Practical experience with Helm and ArgoCD.
- Understanding of containerization concepts and tools like Docker/Podman.
- Infrastructure as Code (IaC): Preferably 2 years experience with IaC tools like Terraform to manage and automate cloud resources effectively. Terraform is preferred.
Work in a way that works for you
We promote a healthy work/life balance across the organization. We offer an appealing working prospect for our people. With numerous wellbeing initiatives, shared parental leave, study assistance and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.
- Working flexible hours - flexing the times when you work in the day to help you fit everything in and work when you are most productive
Working for you
We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:
- Medical Aid
- Retirement Plan inclusive of Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
- Modern family benefits, including adoption and surrogacy
- Study Leave
About the Business
LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase their productivity, improve decision-making, achieve better outcomes, and advance the rule of law around the world. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.
We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or please contact 1- .
Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.
USA Job Seekers: