What jobs are available for SRE Managers in South Africa?

Showing 38 SRE Manager jobs in South Africa

Site Reliability Engineering

R70000 - R120000 Y Reach Digital Health

Posted today

Job Description

About Reach

Reach Digital Health is transforming how public healthcare is delivered. Using innovative digital tools, we connect people, especially those who cannot easily access traditional care, to the information and support they need to live healthier lives. From maternal and child health to HIV/AIDS support and immunisation, our work helps close critical gaps in healthcare and ensures that underserved communities are not left behind.

With more than 16 years of experience, we know that technology alone is not enough. Real impact comes from combining our scalable, multi-channel technology with the partnerships, systems and expertise needed to drive meaningful change. By joining Reach, you will be part of a mission-driven team tackling some of the world's toughest health challenges, making healthcare more inclusive and helping save lives every day.

Why Work With Us

Our team is guided by our values: grit, empathy, collaboration, simplicity, and curiosity.

At Reach, you will do work that matters while enjoying the balance you deserve. We are proud to be one of the first South African companies to embrace a four-day work week, giving our team more time for life outside of work. Alongside competitive salaries, we invest in your growth through ongoing training and career development, creating opportunities to thrive in a supportive and innovative environment.

We put people at the centre of everything we do, both internally and in our work. We are creating an inclusive, diverse environment where everyone feels welcome, accepted, and supported. We are a progressive and equal-opportunity employer.

About the role

Join Reach as our Site Reliability Engineering Lead and play a central role in designing and maintaining the secure infrastructure that powers vital health services. You'll lead the SRE team, automate processes, and improve system reliability while ensuring adherence to data privacy regulations and security best practices. The projects you work on directly impact communities in need, so your ideas and innovations will have real-world effects on healthcare access and outcomes.

The role requires advanced infrastructure engineering and security expertise, along with a passion for healthcare technology and data compliance.

Key Focus Areas

You will primarily be responsible for:

Team Management and Growth:
Foster the professional development of the SRE team through mentorship, one-on-one sessions, and skill-building opportunities.
Collaboration:
Work closely with cross-functional teams, including development and operations, to implement best practices and foster a culture of collaboration and innovation.
Infrastructure Reliability and Performance:
Monitor, measure, and improve the reliability and performance of our systems.
Identify and address bottlenecks, optimize system performance, and implement strategies for scaling infrastructure to meet growing demands.
Carry out maintenance, upgrades, and security updates.
Automation and Tooling:
Design and develop software and scripts that automate and streamline infrastructure and operations.
Assist other teams with deploying and updating their applications and services.
Administration:
Administer our infrastructure accounts and critical services, providing strategic oversight of our hosting infrastructure and vendor relationships. Own the hosting and billing lifecycle, from monitoring and analysis to implementing cost-optimization strategies, ensuring financial efficiency and predictability across our platforms.
Data Management and Security:
Lead Information Security Management System (ISMS) compliance initiatives, including policy development, risk assessment processes, and security framework implementation. Manage security tools (antivirus, password management, security awareness training) and ensure that data, security, and infrastructure policies and best practices are adhered to. Work with the Legal and Projects teams to develop and enforce policies and procedures for data collection, storage, and access that comply with data privacy regulations. Implement and monitor security measures to protect sensitive health information, and manage data backups and disaster recovery.
Innovation:
Research and evaluate new technologies and methodologies that could enhance our systems and processes, and build proofs of concept and prototypes to demonstrate their feasibility and value.

Responsibilities and Duties

Lead a team of Site Reliability Engineers, providing mentorship, guidance, and technical expertise.
Establish and enforce SRE best practices to improve system reliability and operational efficiency.
Collaborate with development teams to design, implement, and maintain scalable and reliable infrastructure.
Develop and implement incident response plans, ensuring timely resolution of system outages and performance issues.
Conduct performance reviews, set goals, and facilitate professional development for team members.
Drive the adoption of automation tools, software, and processes that improve the infrastructure and operational efficiency of our systems, and ensure they follow best practices.
Monitor system health, analyze trends, and implement proactive measures to prevent incidents.
Advise on and/or contribute to new or emerging technologies that might be relevant to Reach.
Work closely with the Head of Engineering and other Engineering Leads to ensure alignment within the engineering department.
Design and develop tools and software that automate and improve the infrastructure and operation of our systems and ensure they follow best practices.
Review, test, debug, and troubleshoot the software and tools developed by the SRE team, and help other engineering teams do the same.
Design and implement security features, conduct security audits and risk assessments, manage enterprise security tools, and coordinate penetration testing exercises while serving as the technical point of contact for external security audits.
Develop and enforce Information Security Management System (ISMS) compliance policies aligned with POPIA and ISO 27001, including risk treatment processes, data protection policies, and business continuity frameworks. Lead security awareness programs, manage security training and phishing campaigns, and collaborate with teams to ensure alignment between technical and regulatory requirements across organisational systems.
Suggest and implement improvements to current ways of working and processes (including gaps in those processes) that are relevant to the current and future success of the SRE team and Reach as a whole.

Qualifications

An honours degree in Computer Science or Engineering, or equivalent experience.
8+ years of experience as a senior site reliability engineer, senior software engineer, or system administrator, working with large-scale, distributed, and cloud-based systems.
4+ years of experience as a team lead, manager, or mentor, leading and developing site reliability engineers or software engineers.

Skills and Experience Required

Proficient in one or more programming languages, such as Python, Go, Java, or C++.
Proficient in one or more scripting languages, such as Bash, Perl, or Ruby.
Proficient in one or more cloud platforms, such as AWS, Azure, or GCP.
Proficient in one or more UNIX-like operating systems.
Proficient in one or more configuration management and deployment tools, such as Ansible, Chef, Puppet, or Terraform.
Proficient in one or more monitoring and alerting tools, such as Prometheus, Grafana, Datadog, or Splunk.
Proficient in one or more container and orchestration tools, such as Docker or Kubernetes.
Proficient in one or more web servers and proxies, such as Apache, Nginx, or Envoy.
Proficient in one or more databases and data stores, such as MySQL, PostgreSQL, MongoDB, or Redis.
Proficient in one or more version control and collaboration tools, such as Git.
Knowledgeable in the concepts and principles of site reliability engineering, such as SLIs, SLOs, error budgets, incident management, postmortems, and blameless culture.
Knowledgeable in the concepts and principles of software engineering, such as design patterns, code quality, testing, debugging, and documentation.
Knowledgeable in the concepts and principles of performance engineering, such as profiling, benchmarking, load testing, and capacity planning.
Knowledgeable in the concepts and principles of distributed computing, such as concurrency, parallelism, synchronisation, and consensus.
Excellent communication and collaboration skills, and ability to work effectively in a cross-functional and remote team environment.
Excellent problem-solving and analytical skills, and ability to troubleshoot and resolve complex issues in a timely and efficient manner.
Excellent learning and innovation skills, and ability to research and evaluate new technologies and methodologies.
Experience implementing ISO 27001 or achieving POPIA compliance, with expertise in security audits, policy development, and regulatory compliance.
Proficiency in managing enterprise security tools (antivirus, password management, SIEM), overseeing penetration testing, and running incident response procedures.
Experience leading security awareness programs, developing security frameworks, and implementing organization-wide security policies and training initiatives.

How to Apply

Ready to make a difference in public health? We welcome applicants from all backgrounds and encourage candidates of all genders, races, ages, religions, sexual orientations, and abilities to apply. Reach Digital Health is an equal opportunity and affirmative action employer, committed to creating a diverse and inclusive workplace.

Submit your application today and join our mission-driven team to make a real impact in public health.

Cloud Infrastructure Engineer

R480000 - R960000 Y Job Crystal

Posted today

Job Description

Company:

We're building the future with robotics. Our work focuses on integrating robotics and automation into industries such as construction and mining.

If you're passionate about cutting-edge technology, solving complex problems, and building systems that push the limits of what is possible in robotics and automation, here is where your skills can make a global impact.

We are a US-based company with strong South African roots. Our business started in Joburg in 2005 and later moved to the USA. We are now building out our engineering offices in Cape Town (Century City) and Joburg (Greenstone Hill) to support our US operations and customers.

Key Responsibilities

Kubernetes Cluster Management:
Design, deploy, and manage Kubernetes clusters in private cloud environments, ensuring high availability, scalability, and performance.
Configure and optimize cluster components, including control plane, worker nodes, and networking with Antrea as the CNI plugin and Project Contour for ingress management.
Implement and maintain Kubernetes RBAC, network policies, and resource quotas to ensure security and efficiency.
Deployment Automation:
Develop and maintain Infrastructure as Code (IaC) using OpenTofu, integrated with GitLab CI/CD pipelines, to automate the provisioning and management of Kubernetes clusters and related infrastructure.
Create reusable OpenTofu modules to streamline deployment processes across multiple environments (dev, staging, production).
Leverage GitLab CI/CD to enable automated, repeatable, and auditable infrastructure deployments.
Maintain the OpenTofu repository responsible for managing Proxmox hosts and virtual machines (VMs) that host the Kubernetes nodes, ensuring consistent and automated provisioning of underlying infrastructure.
Secrets Management:
Implement and manage secure secrets storage and access using OpenBao.
Configure OpenBao policies, roles, and dynamic secrets for secure integration with Kubernetes workloads.
Monitoring and Troubleshooting:
Set up monitoring, logging, and alerting for Kubernetes clusters using tools such as Prometheus and Grafana.
Troubleshoot and resolve issues related to cluster performance, application deployments, and infrastructure automation.
Collaboration and Documentation:
Collaborate with development, DevOps, and security teams to align infrastructure with application requirements.
Document infrastructure configurations, processes, and best practices to ensure knowledge sharing and maintainability.

Skills & Qualifications

Experience:
Bachelor's degree or equivalent
5+ years of experience in infrastructure engineering, with at least 3 years focused on Kubernetes deployment and management.
Proven experience setting up and managing Kubernetes clusters in private cloud environments (e.g., bare-metal with Proxmox).
Hands-on experience with OpenTofu (or Terraform) for Infrastructure as Code
Expertise in secrets management using OpenBao in production environments.
Experience with other cloud-native tools like Helm, ArgoCD, or Flux for GitOps workflows.
Knowledge of security frameworks and compliance standards
Familiarity with hybrid or multi-cloud environments.
Technical Skills:
Deep understanding of Kubernetes architecture, including pods, services, ingress, and operators.
Proficiency in container runtimes, specifically CRI-O, and orchestration.
Strong experience with GitLab CI/CD for automating infrastructure and application deployments.
Familiarity with networking concepts (e.g., VPCs, load balancers, DNS) and expertise in configuring Antrea as the Kubernetes CNI plugin and Project Contour for ingress management.
Experience with Proxmox for virtual machine management in private cloud setups.
General:
Excellent problem-solving and analytical skills.
Strong communication and collaboration skills to work effectively in cross-functional teams.
Ability to document complex systems clearly and concisely.
Preferred Qualifications:
Bachelor's degree or equivalent
Certifications such as Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD).

Job Type: Full-time

Pay: Up to R80 000,00 per month

Work Location: In person

Cloud Infrastructure Engineer

Century City, Western Cape R720000 - R960000 Y DataTech Recruitment

Posted today

Job Description

Cloud Infrastructure Engineer

Century City, Cape Town – Onsite

R720,000 – R960,000 CTC per annum

5+ years' experience

About the role

We're looking for an experienced Cloud Infrastructure Engineer to design, build, and manage private cloud environments that support high-performance systems. This role will suit someone with strong Kubernetes expertise, a solid grasp of automation, and an interest in working with modern infrastructure technologies.

Key responsibilities

Build, configure, and manage Kubernetes clusters in private cloud environments.
Automate deployments and infrastructure using Infrastructure as Code (OpenTofu/Terraform) and CI/CD pipelines.
Manage secrets securely across environments.
Set up monitoring, logging, and alerting with tools like Prometheus and Grafana.
Troubleshoot cluster and infrastructure issues.
Work closely with development, DevOps, and security teams to align systems with business needs.

Requirements

5+ years' experience in infrastructure engineering, with at least 3 years focused on Kubernetes.
Strong skills in Kubernetes, Proxmox, OpenTofu/Terraform, and CI/CD (GitLab).
Experience with secrets management tools such as OpenBao.
Good knowledge of networking, containers, and private cloud environments.
Excellent problem-solving skills and the ability to work well in a team.
Relevant degree or equivalent qualification.

What's on offer

R720k – R960k annual CTC
Medical aid contribution
In-office role, Century City

Job Types: Full-time, Permanent

Pay: R720 000,00 - R960 000,00 per year

Application Question(s):

Do you have strong skills in Kubernetes, Proxmox, OpenTofu/Terraform, and CI/CD (GitLab)?
Do you have good knowledge of networking, containers, and private cloud environments?

Education:

Bachelor's (Required)

Experience:

Infrastructure Engineering: 5 years (Required)
Kubernetes: 3 years (Required)
Cloud Engineering: 3 years (Required)

Work Location: In person

Infrastructure Engineer – Cloud Infrastructure

R900000 - R1200000 Y MagicOrange

Posted today

Job Description

MagicOrange is a globally recognized leader in the IT Financial Management Software market, as acknowledged by Gartner. With customers and a strong presence on four continents, we are a Software as a Service (SaaS) provider in a high-growth phase. Our mission is to empower individuals and organizations, enhancing their value through our innovative software solutions.

Location: Durban, KwaZulu-Natal, South Africa

Position Summary:

Execute tasks related to MagicOrange's IT infrastructure, assist in the operational layer of our SaaS platform estate, and ensure security governance. Provide operational resilience, secure cloud usage, and consistent IT support across a globally distributed, hybrid workforce.

Key Responsibilities

Cloud Infrastructure & SaaS Operations

Oversee and optimize the suite of SaaS platforms and operational tools used across the business to ensure seamless day-to-day functionality, efficiency, and support for all users.
Maintain end-user computing, network operations, and endpoint protection to ensure operational continuity and user satisfaction.
Implement tooling, automation, and remote support models for globally distributed and hybrid workforces.
Administer Microsoft Entra ID, including provisioning, monitoring, and performance optimization.

Azure Governance & Compliance

Configure and maintain Azure Policy, Blueprints, and Compliance Manager to align with ISO 27001, GDPR, POPIA, SOC 2, and other regulatory and compliance frameworks.
Create dashboards and evidence packs for auditors, executives, and clients, automating compliance reporting and remediation tasks.

ISMS & Security Framework Ownership

Collaborate with the CISO Office on maintaining and enhancing the ISO 27001 ISMS, including contributing technical input for the Statement of Applicability and participating in internal/external audits as needed.
Ensure IT infrastructure and cloud tooling align with the compliance goals set for ISO 27001, SOC 2, POPIA, and GDPR.
Support the CISO in conducting risk assessments related to infrastructure and SaaS operations and contribute to the IT-specific entries in the centralized risk register.

Identity & Access Management

Implement and maintain identity services and access controls (e.g. RBAC, MFA, PIM) in accordance with IAM policies and governance.
Enforce least-privilege access via automated policy enforcement and periodic reviews.

SaaS Licensing & Software Asset Management

Manage Microsoft EA/CSP and other enterprise SaaS subscriptions.
Develop and maintain a licensing compliance framework and optimize cost forecasting.

Incident Response & Resilience

Execute incident response procedures under the direction of the CISO Office, coordinate with partners on infrastructure-related aspects, and implement technical corrective actions post-incident.
Conduct post-incident investigations and implement corrective actions.

People & Vendor Management

Collaborate with MSPs, security partners, and SaaS vendors.
Ensure cost-efficiency across infrastructure and tooling.

Cross-Functional Collaboration

Work with Product, Finance, HR, and Engineering teams to align IT operations with strategic business goals.
Present technical KPIs, risks, and compliance status to non-technical executives and client stakeholders.

IT Operations & Support Responsibilities

Resolve issues with screens, docking stations, laptops, and printers, and supply the necessary cables.
Set up workstations, perform hardware upgrades (RAM, batteries), and manage cables.
Manage the employee onboarding process, conduct access audits, and update general user information in systems like Microsoft Teams and email.

Required Skills & Experience

6–8 years in IT infrastructure, SaaS operations, or IT security.
Diploma/Degree in IT.
3+ years working with Microsoft Azure environments.
Proven experience with distributed teams, SaaS platforms, and remote support operations.
Demonstrated involvement in ISO 27001, SOC 2, or GDPR/POPIA compliance programs.
Strong knowledge of firewall/router technologies and network security concepts
Proficiency in administering Windows operating systems and servers.
Experience with Active Directory and Microsoft Entra ID, DNS, and DHCP.
Experience with Microsoft Exchange, Office 365, Microsoft Intune, and Microsoft Defender365 is desirable.
Hands-on experience with monitoring and diagnostic tools
Experience supporting on-prem and Azure-hosted environments
Ability to troubleshoot and resolve complex infrastructure issues independently

Preferred Qualifications

Relevant IT certifications (e.g., CompTIA, Microsoft, Cisco).
Familiarity with ITSM and ISMS platforms.
Multi-cloud awareness (Azure, AWS, GCP).

Join us at MagicOrange and help shape the future of IT Financial Management and FinOps Software by ensuring our customers achieve the highest levels of satisfaction and success.

MagicOrange is an equal opportunity employer, committed to promoting diversity and inclusion in the workplace. We value and appreciate the diverse contributions and perspectives of all our employees.

Senior Cloud Infrastructure Specialist

R80000 - R200000 Y Optimal Growth Technologies

Posted today

Job Description

Senior Cloud Infrastructure Specialist – Azure

Location: Remote

Job Summary:

We are seeking a highly skilled and experienced Senior Cloud Infrastructure Specialist with a focus on Microsoft Azure to join our IT team.
The ideal candidate will have a solid technical background in designing, implementing, and managing Azure cloud infrastructure.
With over 7 years of experience and a degree in Information Technology or a related field, you will be responsible for driving cloud infrastructure strategies, ensuring optimal performance, scalability, and security of our cloud environments.

Key Responsibilities:

Cloud Infrastructure Design and Implementation:

Lead the design, implementation, and management of scalable and secure Azure cloud infrastructure solutions.
Collaborate with architects, developers, and other IT teams to design cloud-based solutions that meet business requirements.
Oversee the deployment of Azure IaaS and PaaS services, including virtual machines, storage accounts, and networking components.

Cloud Management and Optimization:

Monitor and manage cloud infrastructure to ensure high availability, performance, and security.
Implement and maintain Azure governance frameworks, including policies, role-based access control (RBAC), and resource tagging.
Optimize cloud resources for cost efficiency and operational effectiveness, including regular assessments and right-sizing of resources.

Security and Compliance:

Ensure cloud infrastructure is secure and compliant with industry standards and regulations.
Implement security best practices, including identity and access management, encryption, network security groups, and security monitoring.
Conduct regular security assessments and audits to identify and mitigate potential risks.

Automation and Scripting:

Develop and maintain automation scripts using PowerShell, Azure CLI, or other scripting languages to automate cloud operations and deployments.
Implement Infrastructure as Code (IaC) practices using tools like ARM templates, Terraform, or Azure DevOps.
Continuously improve cloud automation processes to enhance efficiency and reduce manual intervention.

Disaster Recovery and Backup:

Design and implement disaster recovery (DR) solutions and backup strategies to ensure business continuity.
Regularly test and validate DR plans to ensure they meet organizational requirements and recovery objectives.
Manage data protection services, including Azure Backup and Site Recovery, to safeguard critical data and workloads.

Collaboration and Support:

Provide expert guidance and support to development teams, IT operations, and business units regarding Azure cloud infrastructure.
Collaborate with cross-functional teams to troubleshoot and resolve cloud infrastructure issues.
Lead cloud infrastructure-related projects and initiatives, ensuring they are delivered on time and within budget.

Documentation and Reporting:

Create and maintain comprehensive documentation for cloud infrastructure designs, configurations, and processes.
Generate regular reports on cloud usage, performance, and cost, providing insights and recommendations to management.
Maintain up-to-date knowledge of Azure services, features, and best practices.

Educational Background:

Bachelor's Degree in Information Technology, Computer Science, or a related field.
Azure certifications (e.g., Microsoft Certified: Azure Solutions Architect Expert, Azure Administrator Associate).

Experience:

At least 10 years of experience in IT infrastructure, with at least 5 years focused on cloud infrastructure.
Extensive experience with Microsoft Azure, including design, implementation, and management of cloud solutions.
Strong background in networking, virtualization, storage, and security within cloud environments.
Expertise in Azure services such as Virtual Machines, Azure Active Directory, Virtual Networks, Storage Accounts, and Azure Security Center.
Proficiency in scripting and automation using PowerShell, Azure CLI, or similar tools.
Experience with Infrastructure as Code (IaC) using ARM templates, Terraform, or Azure DevOps.
Experience with multi-cloud environments, particularly AWS or Google Cloud, is a plus.
Familiarity with DevOps practices and tools, including CI/CD pipelines and containerization.

Software Engineer – Cloud Infrastructure

R900000 - R1200000 Y KPMG South Africa

Posted today

Job Description

Software Engineer (Cloud Infrastructure)
Job Title: Software Engineer – Cloud Infrastructure
Location: Johannesburg
Job Level: Software Engineer
Experience Required: 5+ years
Job Description
The Credit Risk Team at KPMG is looking for a Software Engineer with a strong background in cloud infrastructure to help implement and maintain the technical architecture for the productionalisation of credit risk models and applications onto cloud platforms. This role will involve designing scalable and secure systems to host the credit risk solutions, ensuring seamless integration with various systems, and optimizing performance for real-time access to financial risk models.

Key Responsibilities

Design and implement cloud-based solutions to host financial risk models and applications.
Ensure the scalability, security, and reliability of cloud-hosted solutions (AWS, Azure, or GCP).
Collaborate with Python model developers to integrate the credit risk models with cloud infrastructure and client ERP systems.
Work with the development team to build APIs and backend services to expose model outputs to UI components and visualization tools (e.g., Power BI).
Automate deployment and monitoring processes using cloud-native tools and DevOps practices.
Provide ongoing support and enhancements to the cloud infrastructure as required.
Work closely with the IT team to ensure that the hosting environment adheres to company security policies and compliance standards.

Skills & Qualifications

A degree in Computer Science, Software Engineering, or a related field.
3-5 years of experience in software engineering and cloud infrastructure development.
Proficiency in cloud platforms (AWS, Azure, or GCP) and knowledge of related services such as EC2, Lambda, S3, RDS, etc.
Experience with containerization technologies (Docker, Kubernetes) and CI/CD pipelines.
Strong knowledge of programming languages (Python, Java, etc.) and frameworks.
Familiarity with database technologies (SQL, NoSQL).
Experience integrating APIs and managing data flow between different systems.
Solid understanding of cloud security best practices and compliance standards.
Ability to work collaboratively in cross-functional teams and provide technical leadership where necessary.

Cloud and Infrastructure Automation Engineer

R500000 - R1200000 Y PBT Group

Posted today

Job Description

Employment Type

Contract

Experience

5 to 25 years

Salary

Negotiable

Job Published

08 October 2025

Job Reference No.
Job Description

PBT Group has an exciting opportunity for a Cloud and Infrastructure Automation Engineer. This role will be responsible for managing and optimising server and storage infrastructure across both on-premise and AWS environments, with a strong focus on automation and reliability.

The ideal candidate is hands-on, proactive, and passionate about using automation to streamline infrastructure management — not just maintain it. You'll play a key role in ensuring performance, scalability, and stability across our systems, while collaborating closely with cross-functional engineering and DevOps teams.

Key Responsibilities

Manage and maintain server and storage infrastructure across on-prem and cloud (AWS) environments.
Design, implement, and optimise infrastructure automation using Ansible and Terraform.
Build and manage EC2 instances and related AWS resources.
Enhance system efficiency, performance, and reliability through continuous improvement initiatives.
Collaborate with development and DevOps teams to integrate infrastructure-as-code principles.
Support CI/CD pipelines and maintain Git-based workflows for configuration management.
Monitor system performance and proactively identify potential issues or improvements.
Ensure compliance with security and operational standards.
Provide technical guidance, documentation, and mentorship where required.

Key Skills & Experience

3–5+ years' experience in cloud and infrastructure engineering or related roles.
Proven experience with AWS (EC2, IAM, networking, storage, etc.).
Strong proficiency in Linux administration.
Advanced experience in Ansible and Terraform for automation and configuration management.
Working knowledge of DevOps tools such as Git, Jenkins, or GitLab CI/CD.
Familiarity with monitoring tools and performance optimisation.
Experience managing hybrid environments (on-prem + cloud).
Strong troubleshooting and problem-solving skills.
Excellent communication and teamwork abilities.

Desirable / Advantageous

Certifications in AWS, Linux, or Terraform/Ansible.
Exposure to containerisation (Docker, Kubernetes) or Infrastructure as Code (IaC) best practices.
Experience within a high-availability or large-scale enterprise environment.

Personal Attributes

Ownership-driven, proactive, and solution-oriented.
Strong attention to detail with a focus on automation and efficiency.
Adaptable in dynamic environments with shifting priorities.
Passionate about continuous learning and staying updated with emerging technologies.
In order to comply with the POPI Act, for future career opportunities, we require your permission to maintain your personal details on our database. By completing and returning this form, you give PBT your consent.
If you have not received any feedback after 2 weeks, please consider your application unsuccessful.

Skills

Infrastructure, AWS, Linux, Cloud Architecture, DevOps

Industries

Banking, Financial Services

Senior Infrastructure & Cloud Engineer

Gauteng, Gauteng Discovery Limited

Posted today

Job Description

full-time

Job title: Senior Infrastructure & Cloud Engineer
Job Location: Gauteng
Deadline: November 16, 2025

Job Purpose

The Infrastructure and Cloud Engineer will optimize the company's on-premises network as well as its cloud services. The incumbent will deploy virtual applications in on-premises and cloud environments and provide VMware and cloud services support. They should have sound knowledge of VMware ESX, cloud services (Azure), and supporting technologies. An accomplished Infrastructure and Cloud Engineer will be someone whose expertise results in the successful integration of on-premises and virtual/cloud products across multiple data centers.

Areas of responsibility may include but are not limited to:

Administer and optimize VMware ESX environments.

Virtualize Windows and RHEL servers and manage clustering.

Deploy and support Citrix, MS IIS, and related infrastructure.

Implement patches, service packs, and security updates.

Troubleshoot SAN storage and related infrastructure.

Support backup and disaster recovery using tools like NetBackup.

Document infrastructure processes and maintain technical documentation.

Participate in incident, problem and change management procedures.

Provide technical support and document VMware processes.

Keep informed of developments in VMware technologies and products.

Cloud Services

Design and implement scalable cloud architecture using Azure services.

Develop Infrastructure as Code (IaC) using Terraform, Bicep, and ARM templates.

Manage CI/CD pipelines and DevOps workflows using Azure DevOps.

Ensure cloud security and compliance with best practices.

Lead cloud migration and modernization projects.

Mentor junior engineers and collaborate with cross-functional teams.

Monitor and optimize cloud performance using Azure Monitor and Log Analytics.

Interact with stakeholders and present technical solutions.

Stay up to date with new public cloud technologies and vendors.

Technical Skills:

Technical support strategies and approaches.

Technical documentation creation and maintenance.

Incident Management and Problem Management procedures.

Change Management procedures.

Troubleshooting and analytical skills.

Excellent communication and collaboration skills.

Extensive knowledge of VMware associated programs.

Extensive knowledge of Cloud infrastructure and related technologies.

Proficiency in server operating systems such as Windows Server and Red Hat Enterprise Linux.

Proficient in SAN and network architecture.

Exceptional analytical and technical aptitude.

Great organizational, time management, and problem-solving skills.

Education and Experience:

Bachelor’s degree in Computer Science, Information Technology, Computer Programming, or similar.

VMware Certified Professional (VCP) preferred.

At least 5 years’ experience as a VMware Administrator at enterprise level.

Azure Solutions Architect Expert certification preferred.

At least 5 years’ experience in Cloud computing.

Experience with Kubernetes, Docker, and container orchestration.

Site Reliability Engineer

Rosebank, Gauteng R250000 - R450000 Y Cartrack

Posted today

Job Description

We're a world-leading smart mobility SaaS tech company with over 2,000,000 active users. Our teams are collaborative, vibrant and fast-growing, and all team members are empowered with the freedom to influence our products and technology.

Are you curious, innovative and passionate? Do you take ownership, embrace challenges, and love problem-solving?

We're looking for a Site Reliability Engineer (SRE) who will enable us to build industry-disrupting tech products and revolutionize the way our customers use technology.

The Site Reliability Engineer (SRE) will be responsible for ensuring the reliability, performance, and scalability of Cartrack's Linux-based systems and services. This role combines software engineering with operations, focusing on automation, monitoring, and incident response. The position requires working in shifts and rotations to support 24/7 operations.

You want to

Maintain and improve the reliability, scalability, and performance of Cartrack's infrastructure and applications.
Implement automation for deployments, monitoring, and system management.
Troubleshoot production issues, perform root cause analysis, and implement permanent fixes.
Develop and manage monitoring, alerting, and incident response processes.
Work with development teams to design resilient and scalable systems.
Participate in on-call shifts and rotation schedules to manage incidents and ensure uptime.
Optimize system efficiency and cost-effectiveness in an open-source environment.

You have

Strong background in Linux/Unix system administration (open-source stack).
Familiarity with monitoring and logging tools (Prometheus, Grafana, etc.).
Knowledge of networking, load balancing, and system security best practices.
Strong problem-solving and debugging skills in a production environment.
Proven experience in automation and scripting (Python, Bash, Go, or similar).
Ability to design and maintain automation frameworks for deployments, monitoring, and system recovery.
Hands-on experience with CI/CD pipelines and configuration management tools (e.g., GitLab CI, Ansible, Puppet, Terraform).
Experience building self-healing and auto-remediation solutions for production environments.

Nice to Have

Experience with containerization and orchestration (Docker, Kubernetes).
Exposure to microservices and service mesh environments.
Knowledge of database reliability and performance tuning (PostgreSQL).

Qualifications

Bachelor's degree in Computer Science, Information Systems, or equivalent practical experience.
3+ years of experience in SRE, DevOps, or related infrastructure/operations roles.
Ability to work flexible hours, including shift rotations and on-call duties.

Job Type: Full-time

Ability to commute/relocate:

Rosebank, Gauteng: Reliably commute or planning to relocate before starting work (Preferred)

Experience:

Linux: 4 years (Preferred)
SRE: 3 years (Preferred)
Network monitoring: 3 years (Preferred)

Work Location: In person
