Service Reliability Engineering Head

Johannesburg, Gauteng Mindworx Consulting

Job Description

Key Responsibilities:
  • Set vision and direction for SRE teams, including the usage of platforms and tools
  • Responsible for overall operational resilience and continuous business system functionality by using a systematic approach to delivering products and services
  • Prioritize optimization, reliability, and scalability deliverables related to software delivery processes and securing user data
  • Develop service-level objectives (SLOs), which measure how consistently a service meets its goals over a given period, such as how often it is available and how quickly it responds to requests made by users or customers, and establish goals around those SLOs
  • Achieve objectives that help organizations build customer trust in their products and services, since they ensure stable and reliable day-to-day operations
  • Maintain uptime while meeting these requirements, applying techniques such as automation and continuous deployment to ensure high availability
  • Contribute to the efficient management of incidents in the production environment
  • Oversee all Site Reliability Engineering activities as defined in partnership with the technology leadership teams
  • Develop the strategies and tactics that enhance automation for repeatability and predictability, increase scalability for growth that doesn't interrupt service, and ensure reliability
  • Grow out, build, and drive the engineering culture
  • Deliver service availability and performance to meet (and exceed) business and customer needs
  • Work with broader technology, operational, and business teams to continuously improve the end-to-end service experience and associated cost of delivery
  • Deliver continuous improvement of both core products/platform ecosystems and operational processes, tools, and capabilities based on operational experiences and application of broader industry standard methodology
Qualification:
  • Bachelor of Science with Honours in Computer and Information Science

Job No Longer Available

This position is no longer listed on WhatJobs. The employer may be reviewing applications, may have filled the role, or may have removed the listing.

However, we have similar jobs available for you below.

Site Reliability Engineer

Johannesburg, Gauteng Ziyasiza Consulting (Pty) Ltd

Posted 20 days ago

Job Description

Key Responsibilities:

Monitoring and Alerting: Implementing and maintaining monitoring systems to track system health and performance, alerting on symptoms rather than just outages.

Incident Response: Responding to and resolving production incidents, troubleshooting across the entire stack, and providing support for product teams.

Automation: Developing and implementing automation to streamline operational tasks, improve efficiency, and reduce manual effort.

Infrastructure Management: Managing and maintaining infrastructure, including platforms

Performance Optimization: Identifying and addressing performance bottlenecks, optimizing existing systems, and contributing to system design and capacity planning.

Collaboration: Working closely with development, operations, and other teams to ensure smooth deployments and efficient operations.

Continuous Improvement: Continuously improving systems and processes through post-incident reviews, documentation, and knowledge sharing.

Proactive Problem Solving: Identifying potential problems before they occur and developing solutions to prevent future issues.

Capacity Planning: Ensuring that systems can handle current and future demands.

Mentoring and Coaching: Sharing knowledge and providing guidance to junior engineers.

Skills and Qualifications:
  • Strong understanding of system architecture, automation, and infrastructure tools.
  • Proficiency in programming languages like Python, Go, or Java.
  • Experience with cloud platforms like AWS, Azure or GCP.
  • Familiarity with containerization technologies like Docker and Kubernetes.
  • Experience with monitoring and alerting tools like Prometheus, Grafana, or New Relic.
  • Strong problem-solving and troubleshooting skills.
  • Excellent communication and collaboration skills.
  • Ability to work independently and as part of a team.

Site Reliability Engineer

Randburg, Gauteng Asuer (Pty)

Posted today

Job Description

Location: North Riding, Johannesburg, South Africa

Type: Full-time

Office: Hybrid, 3 days in office a week

ABOUT YOU

We are looking for.


A committed and capable Site Reliability Engineer (SRE) to take ownership of the uptime, performance, and scalability of our production and development systems. You will be responsible for managing the hosting environments of our ERP, customer platforms, internal applications, databases, and websites, ensuring they are secure, available, and optimised across all stages of deployment. This position is based in Johannesburg, offers a competitive salary, and provides an opportunity to build the foundations of infrastructure excellence for one of South Africa’s most promising fintech ventures.

What you'll get to do and why we need you.


As a Site Reliability Engineer, you will be the guardian of our technical stability and infrastructure performance. You will manage and optimise hosting environments across production and development instances, covering platforms like Odoo ERP, WhatsApp chatbot systems, APIs, internal tools, external facing websites and reporting databases. Your work ensures that the systems powering over 50 000 Sales Force members and thousands of end users remain resilient, scalable, and secure.

You will collaborate with engineers, product managers, and business teams to design infrastructure strategies, improve observability, manage deployments, respond to incidents, and drive continuous improvement. This is a rare opportunity to shape the infrastructure blueprint of a high growth, impact focused business from the ground up.

  • Infrastructure Management
  • Security & Uptime
  • Automation & CI/CD
  • Collaboration with Engineers

ABOUT US

Who we are and what we do.

Asuer is a fintech company committed to making life simpler and more secure for African communities through innovative financial and technology solutions. We operate across insurance and telecommunications, with plans to expand into digital payments. Our focus is on removing barriers and helping people achieve their goals.

Born from the ongoing digital transformation of Botle Buhle Brands (BBB), one of Africa’s leading direct-selling businesses, Asuer has grown into an independent company centred on financial inclusion and accessible technology. Everything we build is guided by our core values: Impact, Innovation, and Integrity.

  • Managing and monitoring the infrastructure of our ERP systems, applications, APIs, and databases.
  • Ensuring high availability and scalability of production environments and development pipelines.
  • Administering cloud environments including deployments, rollbacks, and updates.
  • Establishing and maintaining CI/CD workflows for rapid and safe deployments.
  • Setting up monitoring, logging, and alerting systems to track system health and performance.
  • Investigating and resolving production incidents in a timely and thorough manner.
  • Implementing backup, recovery, and failover processes to ensure data integrity.
  • Improving observability and reporting across environments and services.
  • Hardening infrastructure security and enforcing access controls and best practices.
  • Supporting development teams with staging, test, and release environments.
  • Automating routine tasks to improve system efficiency and reduce human error.
Our requirements include.

Technical skills in:
  • Experience managing Linux based production environments preferably on Ubuntu
  • Strong proficiency in cloud hosting platforms such as AWS or Google Cloud
  • Solid understanding of containerisation using Docker and orchestration tools
  • Experience with CI/CD tools and pipeline automation
  • Familiarity with infrastructure as code tools such as Terraform or Ansible
  • Comfortable working with PostgreSQL and database administration best practices
  • Networking, DNS, and load balancing
  • Monitoring and alerting using tools like Grafana, Prometheus, or cloud native solutions
  • Understanding of secure deployment practices including firewalls, SSL, and API rate limiting
Must be able to:
  • Set up and manage reliable and scalable hosting environments
  • Diagnose and resolve incidents efficiently with minimal downtime
  • Collaborate with software teams to enable faster and safer deployments
  • Document infrastructure processes and maintain infrastructure knowledge bases
  • Implement DevOps and SRE practices tailored to a fast-moving startup context
  • Build processes that are robust and scale as the company grows
  • Balance performance, security, and simplicity in all infrastructure decisions
Knowledge & experience:
  • Odoo hosting and maintenance workflows
  • Hosting ERP systems, databases, and API driven platforms
  • Securing web infrastructure and access credentials
  • Optimising costs and performance in cloud environments
  • Scripting and automation using Bash, Python, or similar
  • Logging and system observability tools
  • Fast recovery planning and disaster mitigation
Prerequisites:
  • A tertiary qualification in Computer Science, Information Technology, or a related field
  • Minimum of 3 years of experience in a systems administration, DevOps, or SRE role
  • Strong problem solving, troubleshooting, and communication skills
  • Proficiency in English reading, writing, and speaking

A BIT MORE ABOUT US

What we offer.

At Asuer, you’ll join a mission with real meaning, where your work empowers thousands of people across Africa. You’ll collaborate with smart, curious teammates who move fast and build with purpose, without the drag of legacy systems. We offer competitive pay, a flexible environment, and the autonomy to shape systems from the ground up. This is a place for real growth, where you scale products that matter and make a tangible impact every day.

Site Reliability Engineer

Midrand, Gauteng e-Merge IT Recruitment

Posted 8 days ago

Job Description

contract

We are looking for a proactive and detail-oriented Site Reliability Engineer to help bridge the gap between software development and IT operations, ensuring our systems are not only fast and reliable, but continuously improving.

As a Site Reliability Engineer (SRE), you’ll play a key role in ensuring the scalability, performance, and uptime of systems that support the company's global digital ecosystem.

Requirements:

  • IT Degree and/or relevant qualifications
  • 10+ years of experience as an SRE, DevOps engineer, or in a similar role, preferably in a technology-driven environment
  • Strong understanding of networking fundamentals
  • Skilled with AWS
  • Proficiency in at least one programming language: Python, Go, or JavaScript/TypeScript
  • Understanding of containerization (Docker) and orchestration principles
  • Experience with monitoring and alerting systems
  • Understanding of CI/CD principles
  • Version control with Git
  • Any additional responsibilities assigned in the Agile Working Model (AWM) Charter
  • Advanced Kubernetes knowledge and certification (CKA/CKAD)
  • Experience with the complete Grafana stack (Grafana, Loki, Tempo)
  • Proficiency with GitOps tools (Flux, ArgoCD)
  • Advanced programming skills in Go or TypeScript
  • Knowledge of Terraform
  • Database experience with PostgreSQL, MongoDB

The reference number for this position is GZ60580. It is a contract position based in Midrand/Centurion/Semi-Remote, offering a cost-to-company salary of R750 per hour, negotiable on experience and ability. Contact Garth to discuss this and other opportunities.

Are you ready for a change of scenery? e-Merge IT Recruitment is a specialist niche recruitment agency. We offer our candidates options so that we can successfully place the right developers with the right companies in the right roles. Check out the e-Merge website for more great positions.

Do you have a friend who is a developer or technology specialist? We pay cash for successful referrals!

Senior Site Reliability Engineer

Midrand, Gauteng E-Merge IT Recruitment

Posted 8 days ago

Job Description

contract

Our client in the manufacturing industry specialises in building premium vehicles and engineers the future of mobility.

They are currently in search of a Senior Site Reliability Engineer. You will play a critical role in ensuring the availability, performance, and resilience of the company's digital platforms and connected services across the globe.

If you're passionate about automation, cloud infrastructure, and high-scale systems, join us in shaping what’s next — on the road and in technology.

Requirements:

  • At least six years’ worth of experience using C# or similar MS technologies
  • Experience in testing (manual or automated testing)
  • Agile working experience advantageous
  • Infrastructure Management (Cloud and on-Prem)
  • High level of skill in Kubernetes, automation, and infrastructure
  • Solid understanding of infrastructure as code principles and practical experience with Terraform or similar tools.
  • Hands on experience with Docker, containerisation and microservices architecture
  • Solid understanding of monitoring and alerting practices (tools such as Grafana, Prometheus, Elasticsearch) and the ability to develop new application metrics.
  • Any additional responsibilities assigned in the Agile Working Model (AWM) Charter
  • Familiarity with CI/CD concepts and experience with GitOps tools like ArgoCD
  • Experience with Unix/Linux operating systems internals and administration or in-depth knowledge of the Unix networking stack
  • Problem Management and Incident Management – Proactive and Reactive
  • Defect Management
  • Change Management
  • Optimise application performance.
  • Strong understanding and troubleshooting skills of distributed services.
  • Service Delivery Management
  • Excellent communication skills and team-oriented work behaviour in a distributed team
  • Software development background (C# experience)
  • Strong ability to understand and interpret Business needs and requirements with the ability to move concepts through to proposal and finally successful implementation
  • Confluence / Jira, DevOps

The reference number for this position is GZ60549. It is a contract position based in Midrand/Centurion/Semi-Remote, offering a contract rate of R650 per hour, negotiable on experience and ability. Contact Garth to discuss this and other opportunities.

Engineer, Site Reliability

Johannesburg, Gauteng Standard Bank of South Africa Limited

Posted 5 days ago

Job Description

Business Segment: Business & Commercial Banking

Location: ZA, GP, Johannesburg, 3 Simmonds Street

Responsible for the resilience of Group Information Technology across the entire ecosystem of the bank by improving availability, reliability and performance of business-critical customer-facing systems, whilst building sustainable capability. This complex task is delivered in conjunction with the CIO and CTO communities.

Qualifications

Type of Qualification: Post Graduate Degree
Field of Study: Information Studies
Type of Qualification: Post Graduate Degree
Field of Study: Information Technology

Experience Required
Software Engineering
Technology
8-10 years
Experience as a software engineer or operations engineer, using large-scale production systems and technologies. Experience in the design and execution of small- to medium-scale systems automation projects with strong autonomy. Broad experience in translating business and functional requirements into technical specifications. Experience in engaging with delivery partners both internal and external to the organisation with a focus on optimising partner performance.

More than 10 years
Experience in transformational projects with a strong technology platform component, demonstrating the realisation of business objectives and affecting client experience. Experience in working with cross-functional business stakeholder groups in order to facilitate ideation and solution design, ensuring that initiatives have client and business relevance. Experience in ensuring the commercial viability of solutions and creating value for clients, shareholders and business.

Additional Information

  • Adopting Practical Approaches
  • Articulating Information
  • Checking Things
  • Developing Expertise
  • Documenting Facts
  • Examining Information
  • Interpreting Data
  • Managing Tasks
  • Producing Output
  • Taking Action
  • Team Working
  • Benefits Management
  • IT Applications
  • IT Systems
  • Technical Analysis
  • Use of Build and Test Automation
  • Use of Version Control
  • Splunk, Appdynamics, Dynatrace
  • Python
  • IaC (AWS CDK or Terraform)

Please note: All our recruitment processes comply with the applicable local laws and regulations.

We will never ask for money or any form of payment as part of our recruitment process. If you experience this, please contact our Fraud line on +27 800222050

Engineer, Site Reliability

Johannesburg, Gauteng Standard Bank of South Africa Limited

Posted today

Job Description

Business Segment: Business & Commercial Banking

Location: ZA, GP, Johannesburg, 3 Simmonds Street

Responsible for the resilience of Group Information Technology across the entire ecosystem of the bank by improving availability, reliability and performance of business-critical customer-facing systems, whilst building sustainable capability. This complex task is delivered in conjunction with the CIO and CTO communities.

Qualifications

Type of Qualification: Post Graduate Degree
Field of Study: Information Studies
Type of Qualification: Post Graduate Degree
Field of Study: Information Technology

Experience Required
Software Engineering
Technology
8-10 years
Experience as a software engineer or operations engineer, using large-scale production systems and technologies. Experience in the design and execution of small- to medium-scale systems automation projects with strong autonomy. Broad experience in translating business and functional requirements into technical specifications. Experience in engaging with delivery partners both internal and external to the organisation with a focus on optimising partner performance.

More than 10 years
Experience in transformational projects with a strong technology platform component, demonstrating the realisation of business objectives and affecting client experience. Experience in working with cross-functional business stakeholder groups in order to facilitate ideation and solution design, ensuring that initiatives have client and business relevance. Experience in ensuring the commercial viability of solutions and creating value for clients, shareholders and business.

Additional Information

  • Adopting Practical Approaches
  • Articulating Information
  • Checking Things
  • Developing Expertise
  • Documenting Facts
  • Examining Information
  • Interpreting Data
  • Managing Tasks
  • Producing Output
  • Taking Action
  • Team Working
  • Benefits Management
  • IT Applications
  • IT Systems
  • Technical Analysis
  • Use of Build and Test Automation
  • Use of Version Control
  • Splunk, Appdynamics, Dynatrace
  • Python
  • IaC (AWS CDK or Terraform)

Please note: All our recruitment processes comply with the applicable local laws and regulations.

We will never ask for money or any form of payment as part of our recruitment process. If you experience this, please contact our Fraud line on +27 800222050

Platform / DevOps / Site Reliability Engineer

Johannesburg, Gauteng Elite Search & Selection

Posted 5 days ago

Job Description

Role:

Platform / DevOps / Site Reliability Engineer
Location: Remote but ideally based in Johannesburg, Cape Town, Durban
Company: Part of a large ICT group, this company offers globally available cloud services, solutions, and platforms for all. Their expertise empowers clients to adopt and migrate to any cloud, wherever they choose.

The purpose of the role is to create and manage platforms to guarantee the smooth operation of application systems. This involves aligning the planning, execution, and management of cloud infrastructure and software services with the overall business strategy. The role collaborates with other teams to ensure the infrastructure remains dependable, scalable, and equipped to meet the evolving requirements of the applications.

Duties & Responsibilities

Requirements:

  1. 3 to 5+ years of DevOps / Site Reliability / Platform Engineer or System Administration experience in a software environment.
  2. Experience working with IaaS and public cloud platforms.
  3. Managing and provisioning of infrastructure through code (IaC).
  4. Solid experience with and knowledge of Docker.
  5. Experience with VMware Cloud Services, Amazon Web Services, Microsoft Azure, or Google Cloud Platform.
  6. Solid experience with one or both of the following: HashiCorp and/or Kubernetes.
  7. Infrastructure-as-Code tools Terraform, Ansible – essential.
  8. CircleCI, GitHub Actions, GitLab CI, etc.
  9. Experience using version control tools such as Git and GitHub.
  10. Strong knowledge of security risks and mitigation thereof.
  11. MySQL, Postgres database administration.

Strong skills in the following:

  1. Network routing and core principles.
  2. Solid experience and good knowledge working with Linux containers and virtual machines.
  3. Knowledge of cloud platform environments.
  4. Experience within a software development environment including a good understanding of software development principles.
  5. Good knowledge of infrastructure-as-code and automation.
  6. Solid experience in Unix/Linux administration.
  7. Experience in Linux container orchestration.
Package & Remuneration

R 35 000 - R 65 000 - Monthly

Site Reliability Engineer (Expert) 0630

Midrand, Gauteng Open Source (Pty) Ltd

Posted 4 days ago

Job Description

Why Join Us?
  • Work on high-availability, multi-region deployments
  • Shape our observability strategy and implement automation at scale
  • Collaborate with development teams to enhance service reliability
  • Lead incident response and drive systematic improvements
Essential Skills & Experience
  • 10+ years in SRE, DevOps, or similar roles
  • Strong networking fundamentals
  • Skilled with AWS and cloud-native technologies
  • Proficiency in Python, Go, or JavaScript/TypeScript
  • Experience with Docker, Kubernetes, CI/CD, and GitOps (Flux/ArgoCD)
  • Knowledge of monitoring tools (Grafana, Prometheus, Loki, Tempo)
Bonus Skills
  • Advanced Kubernetes certification (CKA/CKAD)
  • Experience with Terraform, PostgreSQL, MongoDB
  • Expertise in performance optimization & cost management
  • Security hardening & compliance implementation
Tech Stack You'll Work With
  • Containerization: Kubernetes, Docker
  • Observability: Grafana Stack, Prometheus
  • Infrastructure: Cloud-native technologies
  • Programming: Go, Python, TypeScript/JavaScript
  • CI/CD: Modern pipeline tools
  • Multi-region deployments & microservices architecture
Key Responsibilities
  • System Reliability: Design and implement scalable infrastructure solutions
  • Observability: Architect and maintain monitoring & alerting systems
  • Automation: Develop automated workflows to reduce manual effort
  • Incident Management: Lead major incident response and drive improvements
  • Technical Leadership: Mentor team members and influence engineering decisions
  • Tool Development: Build internal tools to enhance operational efficiency
  • Best Practices: Establish and enforce SRE methodologies

📩 Ready to take on this challenge? Apply now with your latest and detailed CV!

Technology/Domain Specialist II (Site Reliability Engineer)

Johannesburg, Gauteng nedbank

Posted 17 days ago

Job Description

Technology/Domain Specialist II (Site Reliability Engineer)

Details

Location:

Johannesburg, ZA

Reference: 140754

Job Classification

140754 - Technology Domain Specialist (Site Reliability Engineer)

Closing date - 10 July 2025

Job Family

Information Technology

Application Development

Manage Self: Technical

Job Purpose

To actively own and participate in the overall evolution of the Technology or Domain asset while influencing and maintaining the health of the asset. Play a leadership role in the associated COEs.

Job Responsibilities
  • Collaborating with stakeholders, engineers, and operational SMEs to ensure all relevant parties are up to date with what is top of mind within the reliability service offerings
  • Evolve services based on customer needs and technology to ensure we remain competitive in the market
  • Influence and collaborate with squads during service or platform design to proactively prevent system failures and enhance performance
  • Engage with Asset/Journey squads to adopt SRE practices with a core focus to contribute towards incident management and advocate for blameless post mortems.
  • Engage and influence squads with regard to observability and high availability, utilising new or existing technology, and improve disaster recovery plans.
  • Implement automation-based solutions to achieve high availability and efficiency, reduce cost, and improve system performance.
  • Coach squads on best practices within the organisation via internal forums to position SRE fundamental knowledge and promote enterprise-wide knowledge sharing
  • Assist with creating and maintaining system health and performance metrics reflecting real-time data, enabling proactive resolution and faster troubleshooting.
  • Collaborate and partner with the DevOps engineer/coach to ensure efficient CI/CD pipelines, resolve any failures, and make improvements.
  • Take charge of technical leadership, engage with squads to identify the best solutions, and support and guide junior SREs.
  • Assist in defining and implementing metrics related to the performance of services, such as SLOs and SLIs.
  • Defining and delivering Site Reliability Engineering technical standards in partnership with all disciplines of software engineering.
  • Participate in and work closely with relevant COEs to improve the release of new features and shorten time to market.
  • Ability to build and maintain strategic relationships with the business units and vendors in order to be in sync on current ways of work and business decisions that are being embraced
  • Conduct assessments within squads to measure SRE maturity, provide a report, and outline a plan to assist in moving to the next level, with continuous feedback.
  • Adhere and comply with Nedbank group information management, data integrity and security policies and best practices.
  • Participate and support corporate responsibility initiatives for the achievement of business strategy.
  • Manage multiple concurrent objectives, projects, groups, or activities, making effective judgements as to prioritisation and time allocation

Technical Skills
  • Working Experience of Operating System (Linux or Windows)
  • Knowledgeable with microservices and containerization; K8s or Docker
  • Troubleshooting and root cause analysis
  • SRE Best practices
  • In-depth knowledge of DevOps framework
  • Experience and knowledge of programming languages (C#, Java, Python, Bash)
  • Proactivity in seeking Improvement opportunities
  • Experience with troubleshooting production systems/applications

Essential Qualifications - NQF Level
  • Advanced Diplomas/National 1st Degrees
  • Professional Qualifications/Honour’s Degree
Preferred Qualification

Degree or Diploma in IT

Preferred Certifications

Certificate in relevant Technology or Domain

Minimum Experience Level

Min 5 years IT experience with 3 years in relevant technology or domain

Technical / Professional Knowledge
  • Asset management
  • Data Warehousing
  • Information Technology (IT) Architecture
  • Decision Making
  • Courage
  • Stress Tolerance
  • Quality Orientation
  • Technical/Professional Knowledge and Skills
  • Resolving Conflict

---

Please contact the Nedbank Recruiting Team at +27 860 555 566

If you can't find the job you're looking for, activate job alerts to be one of the first to know when new positions open up.

Nedbank Ltd Reg No 1951/0009/06.
Authorised financial services and registered credit provider (NCRCP16).

Quality & Reliability Engineer (QRE)

Johannesburg, Gauteng WatersEdge Solutions

Posted 4 days ago

Job Description

WatersEdge Solutions is hiring a Quality & Reliability Engineer (QRE) to lead the charge on software stability, release quality, and development tooling for a global incentive technology platform. If you thrive in the space between engineering, QA, and operations—this is your opportunity to own quality at scale.

About the Role

Reporting to the Tech Lead, you’ll play a central role in building, maintaining, and improving CI/CD pipelines, test automation, observability, and reliability metrics. You’ll collaborate across functions to ensure stable, secure, and high-quality releases, with a strong focus on developer experience and platform integrity.

Key Responsibilities

  • Maintain and enhance CI/CD pipelines (GitHub Actions, Heroku)
  • Automate quality gates: tests, linting, coverage, type-checks
  • Own release processes: staging, production deployments, rollbacks, feature flags
  • Monitor production health via logs, Sentry, performance dashboards
  • Track SLIs/SLOs and report on metrics like MTTR, CFR, and defect escape rates
  • Support incident response and prepare operational runbooks
  • Maintain test automation infrastructure and manage the flaky-test backlog
  • Collaborate with devs to ensure testable, regression-resistant code
  • Integrate security checks into pipelines and uphold compliance standards (e.g., SOC 2)

What You’ll Bring

  • Strong CI/CD experience (GitHub Actions, CircleCI, GitLab CI)
  • Familiarity with cloud platform pipelines (e.g., Heroku)
  • Proficient in Python and shell scripting (Django a bonus)
  • Experience managing automated test frameworks
  • Comfort with observability tools like Sentry and log monitors
  • Proven track record in delivering secure, reliable SaaS or FinTech systems

Nice to Have

  • Experience with feature flags (e.g., LaunchDarkly, Unleash)
  • Knowledge of SOC 2 / ISO 27001 controls in CI/CD
  • Exposure to data privacy and multi-tenant architecture
  • Experience conducting post-mortems and tracking incident actions

What’s On Offer

  • Competitive compensation
  • Full ownership of the CI/CD and reliability function within a fast-scaling SaaS product
  • A high-autonomy, low-red-tape environment
  • A mission-driven platform enabling financial equity globally

Company Culture

We value pragmatism, systems thinking, and a passion for enabling others through great tooling. You’ll join a smart, humble, and impact-driven team working in a no-blame culture where quality is everyone’s responsibility. If you believe tests are leverage, incidents are learning opportunities, and performance is about velocity with stability—this is your place.

If you have not been contacted within 10 working days, please consider your application unsuccessful.

