311 Reliability Engineer jobs in South Africa
Reliability Engineer
Posted 5 days ago
Job Description
A great mining company in Aggeneys, Northern Cape is seeking the expertise of a Reliability Engineer to join their team.
Responsibilities:
- Asset Management - Ensure that the entire fleet of equipment is properly recorded/accounted for on the Asset Management System/Asset Register. Ensure that the Asset management register and maintenance systems are kept up to date with a high level of data integrity. Ensure that all services, maintenance and repairs are properly scheduled, timeously executed and accurately recorded.
- Fleet Reliability and performance - Monitor Equipment reliability trends per equipment type to ensure that 85% fleet availability is maintained. Monitor Equipment MTBF and MTTR tempos/ratios to ensure that these are within industry norm and at acceptable standards. Develop and implement strategies to address and/or optimize/improve equipment performance when required. Develop and implement equipment lifecycle.
- Services and Repairs - Ensure that all engineering activities are pre-planned, scheduled, properly and timeously executed and conducted as per OEM guidelines and intervals for each machine type and model. Study and keep abreast of OEM manuals and specifications to ensure that all work is performed correctly. Assist Engineering team to perform diagnostics and troubleshooting to identify problems and resolve equipment malfunctions. Ensure that Artisans attend to all breakdowns or equipment malfunctions / failures reported with urgency and continuously update the Maintenance and production managers on repair progress and expected work completion timelines. Implement systems to ensure frequent over-inspections are executed on work done by artisans and assistants to ensure that work is executed to high standards. Implement systems to ensure that no equipment leaves before re-starting of equipment.
- Legal duties and obligations - Perform legal duties as Appointee in terms of Mine Health and Safety Act. Ensure that all subordinates are authorized to work in writing by the applicable Engineer. Ensure that all activities are conducted in a safe and responsible manner and promote a safe work culture. Ensure compliance with all client/company SOP's, Policies, procedures, provisions and stipulations of the MHSA.
- Administration and Housekeeping - Ensure that all designated areas of work are neat and well kept. Implement and administer tool equipment issue control registers and exercise proper control of all assets, including tools, equipment and LDV's, allocated to the section/Superintendents. Ensure that all artisans and employees within the section adhere to stores and procurement procedures when spares or other stock are required. Ensure that all work performed by subordinates is duly captured and accounted for on job cards, signed off and submitted to the maintenance administration department. Ensure that engineering stock levels and items on site match with counts and reports on the maintenance system.
- Budgeting, reporting and continuous improvement - Ensure that allocated section budget is adhered to and behave in a cost-conscious manner at all times. Investigate premature component failures in conjunction with the Engineering Manager to perform root cause analysis. Compile and present a monthly report for the Section. Develop, propose and implement cost saving initiatives.
- Engineering HR Function (To be performed in conjunction with the HR Dept) - Act as mentor for allocated section. Identify and propose training interventions and programs to Engineering Manager to ensure that subordinates are developed in terms of set IDP's. Develop and implement a succession plan for Dept. Manage discipline and initiate disciplinary action. Manage time and attendance, sick leave and leave rosters for Dept. Ensure that overtime is planned, properly managed and pre-approved as per overtime procedures. Monitor time spent by subordinates on the mine and ensure compliance with legal attendance parameters.
- Manage and lead the planning department - Ensure that all maintenance strategies are planned according to the mining operating model. Provide coaching, support and leadership to all subordinates within Dept. Ensure that all the maintenance strategies are automated on the computer-based maintenance system.
Requirements:
- Qualified Mechanical Engineer (N Dip or Degree) with at least 5 years related post-graduation experience.
- Additional course or qualification completed related to Maintenance System Management / Reliability Engineering (Heavy Earthmoving Equipment).
- Valid Driver's License and clean criminal record.
- Excellent knowledge of heavy earthmoving equipment fleet maintenance, repairs and diagnostics within an open cast mining environment.
- High level of energy and ability to work under pressure to meet deadlines.
- Job ownership and accountability.
- Ability to schedule and prioritise work, multi-tasking ability.
- Excellent people (supervisory) and administration skills (including computer skills, MS Office, Pragma/Asset management systems).
- Good knowledge of MHSA and Applicable Labour Legislation (BCEA, LRA).
- Financial/Numerical acumen and presentation skills.
- Sober habits and professional.
- Willing to work extended hours.
Benefits:
Hire Resolve is a top-tier recruitment firm that focuses on placing skilled professionals in permanent employment.
Hire Resolve focuses on working with senior-level executives and we pride ourselves on delivering excellent service to our candidates and clients.
- Salary: negotiable.
- Our client is offering a highly competitive salary for this role based on experience.
- Apply for this role today, visit the Hire Resolve website: hireresolve.us or email us:
- Alternatively, you are welcome to connect with Chandre Cordier on LinkedIn.
Reliability Engineer
Posted today
Job Description
Development of a maintenance plan for all assets on site. The maintenance plan and reliability work need to ensure that the integrity, obsolescence and end of life cycle of each asset are addressed timeously, and to actively inform of changes, replacements and improvements that need to be effected.
REQUIREMENTS :
National Diploma / Degree / BTech in Mechanical or Electrical Engineering
At least 4 years' experience in Improving Equipment Reliability and Asset Care / Maintenance
In Possession of or in progress of obtaining Government Certificate of Competency is essential for GMR2.1 appointment.
CORE FUNCTIONS
- PLC System Integration and maintenance
- Development of Plant Asset Care Plan using Reliability Centered Maintenance processes
- Maintenance and updating of all asset care plans
- Development, implementation and optimizing asset life cycle strategies
- Focus on electrical instrumentation, SCADA development and automation
- Troubleshooting and fault diagnostics on PLC controlled equipment
Reliability Engineer
Posted 3 days ago
Job Description
REQUIREMENTS:
National Diploma/Degree/BTech in Mechanical or Electrical Engineering
At least 4 years' experience in Improving Equipment Reliability and Asset Care/Maintenance
In Possession of or in progress of obtaining Government Certificate of Competency is essential for GMR2.1 appointment.
CORE FUNCTIONS
- PLC System Integration and maintenance
- Development of Plant Asset Care Plan using Reliability Centered Maintenance processes
- Maintenance and updating of all asset care plans
- Development, implementation and optimizing asset life cycle strategies
- Focus on electrical instrumentation, SCADA development and automation
- Troubleshooting and fault diagnostics on PLC controlled equipment
Site Reliability Engineer
Posted today
Job Description
We are looking for a skilled Site Reliability Engineer (SRE) with expertise in Ansible and Linux to join our dynamic team. The successful candidate will play a critical role in maintaining the reliability, scalability, and performance of our infrastructure, driving automation, and collaborating with development teams to optimize system efficiency.
Key Responsibilities
- Infrastructure Automation
- Automate and maintain IT infrastructure using Ansible to streamline operations.
- System Administration (Linux and Windows)
- Manage virtual and physical Windows and Linux servers.
- Automate server patching and updates to ensure systems remain current.
- Implement automated security measures for all servers.
- Monitor server performance and health.
- Maintain comprehensive system documentation, including configuration and troubleshooting guides.
- Conduct troubleshooting and root cause analysis as needed.
- Ensure robust backup, disaster recovery, and business continuity plans are in place and followed.
- Azure Cloud Management
- Collaborate with DevOps to deploy, configure, and manage Azure virtual machines and resources.
- Monitor cloud services for availability, performance, and security.
- Work with the networking team to implement, monitor, and secure cloud networking infrastructure.
- Ensure backup, disaster recovery, and business continuity plans are maintained for cloud systems.
- System Monitoring and Optimization
- Deploy and maintain monitoring tools for proactive system oversight and alerting.
- Analyze performance data to identify and resolve bottlenecks.
- Conduct capacity planning to support scalability and meet business needs.
- Partner with development teams to enhance application performance on infrastructure.
- Documentation and Collaboration
- Create and update technical documentation, including system configurations and procedures.
- Work with cross-functional teams to provide technical support and solutions.
- Participate in on-call rotations and respond promptly to system emergencies.
- Stay informed on industry trends, emerging technologies, and best practices in system administration, cloud computing, and virtualization.
Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
- Relevant certifications (e.g., Linux Professional Institute (LPIC), Microsoft Certified: Azure Administrator Associate) are a plus.
Experience & Technical Skills
- Minimum of 8 years in an Enterprise IT environment, with at least 3 years in a DevOps or SRE role.
- Strong expertise in Ansible for automation and configuration management.
- Proficient in Linux system administration (installation, configuration, troubleshooting).
- Hands-on experience with hypervisor technologies (e.g., VMware, Hyper-V, Proxmox).
- Knowledge of containerization technologies (e.g., Docker, Kubernetes).
- Experience managing Azure cloud services, including VMs, storage, networking, and security.
- Proficiency in scripting languages (e.g., Bash, PowerShell, Python) for automation.
Skills & Competencies
- Excellent problem-solving skills and ability to work independently or in a high-performance team.
- Strong sense of ownership over tasks, projects, and issues.
- Effective communication and interpersonal skills to collaborate with stakeholders at all levels.
Site Reliability Engineer
Posted today
Job Description
Location: North Riding, Johannesburg, South Africa
Type: Full-time
Office: Hybrid, 3 days in office a week
ABOUT YOU
A committed and capable Site Reliability Engineer (SRE) to take ownership of the uptime, performance, and scalability of our production and development systems. You will be responsible for managing the hosting environments of our ERP, customer platforms, internal applications, databases, and websites, ensuring they are secure, available, and optimised across all stages of deployment. This position is based in Johannesburg, offers a competitive salary, and provides an opportunity to build the foundations of infrastructure excellence for one of South Africa’s most promising fintech ventures.
As a Site Reliability Engineer, you will be the guardian of our technical stability and infrastructure performance. You will manage and optimise hosting environments across production and development instances, covering platforms like Odoo ERP, WhatsApp chatbot systems, APIs, internal tools, external facing websites and reporting databases. Your work ensures that the systems powering over 50 000 Sales Force members and thousands of end users remain resilient, scalable, and secure.
You will collaborate with engineers, product managers, and business teams to design infrastructure strategies, improve observability, manage deployments, respond to incidents, and drive continuous improvement. This is a rare opportunity to shape the infrastructure blueprint of a high growth, impact focused business from the ground up.
ABOUT US
Who we are and what we do. Asuer is a fintech company committed to making life simpler and more secure for African communities through innovative financial and technology solutions. We operate across insurance and telecommunications, with plans to expand into digital payments. Our focus is on removing barriers and helping people achieve their goals.
Born from the ongoing digital transformation of Botle Buhle Brands (BBB), one of Africa’s leading direct-selling businesses, Asuer has grown into an independent company centred on financial inclusion and accessible technology. Everything we build is guided by our core values: Impact, Innovation, and Integrity.
- Managing and monitoring the infrastructure of our ERP systems, applications, APIs, and databases.
- Ensuring high availability and scalability of production environments and development pipelines.
- Administering cloud environments including deployments, rollbacks, and updates.
- Establishing and maintaining CI/CD workflows for rapid and safe deployments.
- Setting up monitoring, logging, and alerting systems to track system health and performance.
- Investigating and resolving production incidents in a timely and thorough manner.
- Implementing backup, recovery, and failover processes to ensure data integrity.
- Improving observability and reporting across environments and services.
- Hardening infrastructure security and enforcing access controls and best practices.
- Supporting development teams with staging, test, and release environments.
- Automating routine tasks to improve system efficiency and reduce human error.
- Experience managing Linux based production environments preferably on Ubuntu
- Strong proficiency in cloud hosting platforms such as AWS or Google Cloud
- Solid understanding of containerisation using Docker and orchestration tools
- Experience with CI/CD tools and pipeline automation
- Familiarity with infrastructure as code tools such as Terraform or Ansible
- Comfortable working with PostgreSQL and database administration best practices
- Networking, DNS, and load balancing
- Monitoring and alerting using tools like Grafana, Prometheus, or cloud native solutions
- Understanding of secure deployment practices including firewalls, SSL, and API rate limiting
- Set up and manage reliable and scalable hosting environments
- Diagnose and resolve incidents efficiently with minimal downtime
- Collaborate with software teams to enable faster and safer deployments
- Document infrastructure processes and maintain infrastructure knowledge bases
- Implement DevOps and SRE practices tailored to a fast moving startup context
- Build processes that are robust and scale as the company grows
- Balance performance, security, and simplicity in all infrastructure decisions
- Odoo hosting and maintenance workflows
- Hosting ERP systems, databases, and API driven platforms
- Securing web infrastructure and access credentials
- Optimising costs and performance in cloud environments
- Scripting and automation using Bash, Python, or similar
- Logging and system observability tools
- Fast recovery planning and disaster mitigation
- A tertiary qualification in Computer Science, Information Technology, or a related field
- Minimum of 3 years of experience in a systems administration, DevOps, or SRE role
- Strong problem solving, troubleshooting, and communication skills
- Proficiency in English reading, writing, and speaking
A BIT MORE ABOUT US
At Asuer, you’ll join a mission with real meaning, where your work empowers thousands of people across Africa. You’ll collaborate with smart, curious teammates who move fast and build with purpose, without the drag of legacy systems. We offer competitive pay, a flexible environment, and the autonomy to shape systems from the ground up. This is a place for real growth, where you scale products that matter and make a tangible impact every day.
Site Reliability Engineer
Posted today
Job Description
Zepz powers two leading global remittance brands, WorldRemit and Sendwave, to build the next generation of cross-border payments. Serving over 9 million customers across 4,000 corridors, Zepz is transforming how money moves across borders by making it faster, safer and more convenient. Its innovative digital solutions are designed to break down financial barriers and expand access to better financial tools. Zepz operates across a broad global footprint, connecting the global north and south and enabling migrants to support loved ones, fuel local economies and build better futures.
- We act like owners - We are relentlessly delivering for our users and spending money thoughtfully.
- We embrace embarrassing honesty - We function best when we're open and honest with one another — especially about our challenges and doubts.
- We have a bias to action - We get to first outcomes quickly, iterate and learn.
- We strive to be better - We may make mistakes, but always learn from them.
- We are inclusive - to better reflect and serve our users.
Working in the Site Reliability Engineering team, you’ll be helping ensure the stability, resilience and scale of our services through automation, observability and infrastructure engineering. The work is varied; from helping engineering teams deploy monitoring, to designing and implementing new SRE tools and techniques, our team is proactive and always involved.
We are a fast moving team operating in a growing Fintech company, supporting engineers on three continents.
We use a modern DevOps and SRE tech stack (GitHub Actions, K8s, ArgoCD, Grafana, AWS, Terraform) and Agile working practices to get the job done.
As a member of Zepz’s SRE team you will aim high, embrace challenges and always do what’s right, acting with integrity and building trust as you contribute to the company’s technical direction and long term decision making.
Reporting to the SRE Manager you will:
- Use code to solve problems. Configuration, infrastructure, tooling, and automation: everything must be solved by writing high quality code that performs and scales.
- Use best practices and standards with regard to observability, monitoring, alerting, capacity planning, availability, performance/latency, change and troubleshooting for all our Tech services.
- Work closely with feature teams to ensure that services are correctly monitored, change is delivered in a safe and secure way, resilience is built into our product and our standards and best practices adopted.
- Lead or be involved in the troubleshooting of complex incidents and problems.
- Have visibility on end to end service to our customers and ensure their journey is stable and consistent across all the microservices and 3rd party dependencies with the observability tool you will have implemented with the Engineering teams.
- Help the team meet its strategic goals: to maintain the highest level of observability, maximize developer velocity while keeping our product reliable, and ensure that we can deliver the highest quality experience to our customers.
- Growing together. You’ll review others' work and happily seek feedback on yours to ensure we build a better codebase and sharpen each other's skills.
- A skilled Engineer. At least 5 years in SRE, DevOps or Engineer role with a keen interest in solving problems using automation.
- Understand SRE and DevOps methodologies. You understand the build and deployment cycle of an application, and how to operate a resilient system.
- A focus on observability. Observability is key to operating a truly reliable and scalable system. We are looking for engineers who can "Monitor Everything & Measure Everything", driving a culture of observability. Experience with Grafana, Loki and Prometheus.
- Holistic view on application delivery. You understand the use of many systems; monitoring, logging, alerting, and scaling. To build a robust platform which can respond to varying demands from both external sources (traffic) and internal sources (feature team delivery) in a safe and controlled manner. You have experience supporting or developing applications written in Java, Python or node.js.
- Systematic problem-solving approach. You should have an understanding of how to analyze, and troubleshoot large-scale distributed systems.
- Happy in the Clouds. Our Cloud Native platform is hosted on AWS. You’ll be comfortable working with a system that supports users from around the world, at scale.
- Bias for action. You see a problem, you fix a problem. You get buy-in for your solutions and keep tickets moving. We’re always looking for ways to ship at pace.
- Growth mindset. A willingness to use your skills and experience to mentor less-experienced engineers. A desire to learn from others and make yourself better every day.
- Agile outlook. You need to be excited about working in a fast-changing environment. Products, tools, frameworks and processes change, we evolve and take the best bits with us. The teams drive the evolution.
- Disciplined and self managed. You need to own your role and be disciplined about adhering to protocols and processes. As a senior you will always ensure you are bringing value to the team and driving tasks to completion without being actively managed.
- Have experience working in a FinTech space
- Have experience working in a distributed team across different geographies and timezones
What you’ll get from us
Please note that the benefits below will apply to permanent roles.
We have five core benefits for our talent in the US, UK, Philippines, Poland, and South Africa specifically:
- Unlimited Annual Leave: Feel free to make the most of your time off and maintain a healthy work-life balance!
- Private Medical Cover: You can opt-in to a Private Medical Insurance scheme. This provides you with access to thorough medical coverage, so you can feel confident in your health and well-being.
- Retirement: We offer pension schemes to help you plan for and secure your future.
- Life Assurance: Life assurance is available to give you peace of mind and protect your loved ones in case of the unexpected.
- Parental Leave: We offer competitive parental leave schemes to ensure you are spending as much quality time with your new bundle of joy as possible.
We are also remote-first as an organisation, offering flexibility for you to work where you need to be most productive. In addition to the above, you will discover that we have a range of secondary perks (such as the cycle-to-work scheme and employee discounts) depending on your location, to help you thrive at Zepz!
Why choose Zepz?
- Our team of over 1,000 employees is fully distributed across the world. We are working from coffee shops, homes, and co-working spaces, making us one of the larger fully distributed growth-stage startups in the world, but we also offer workspace in our talent cluster locations: spaces where we can meet, collaborate and connect.
- We are proud parents, community organizers, farmers, band members, yoga teachers, YouTube influencers, former Olympians, and serial entrepreneurs.
- We collectively speak over twenty languages, including Akuapem, Amharic, Bengali, Ewe, Fante, Ga, Igbo, Kalenjin, Luganda, Oromo, Somali, Swahili, Wolof, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish.
- At Zepz, embodying our commitments binds us together. We are collectively passionate about striving to achieve our vision and purpose - to continue to provide the best service to our users.
Applications will be reviewed on a rolling basis. If interested, please submit your resume along with a cover letter (optional), highlighting why your experience demonstrates you meet the requirements of the role. Please also indicate the countries in which you have work authorization.
Confidence can sometimes hold us back from applying for a job. But we'll let you in on a secret: there's no such thing as a 'perfect' candidate. Zepz is a place where everyone can thrive.
So however you identify and whatever background you bring with you, and if at all you might need any form of support to make the process as comfortable as possible, please let us know and give us a shot by applying. We want you to be excited to wake up to make an impact every day.
Site Reliability Engineer
Posted today
Job Description
About our Team
LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model for each individual legal use case.
About the role:
Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.
Responsibilities
- Build and improve CI/CD pipelines
- Manage AWS cloud infrastructure
- Ensure high availability, observability, and performance
- Automate systems for efficiency and cost optimization
- Collaborate across engineering, ops, and leadership teams
Requirements
- Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, Engineering, and a minimum of 5 years experience in a software/technology environment.
- Certifications in AWS or Kubernetes are advantageous.
- At least 5 years of experience in a DevOps or SRE role.
- AWS Expertise: Preferably 5 years comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
- Kubernetes Proficiency: Preferably 2 years hands-on experience with deploying, managing, and scaling applications in Kubernetes environments. Practical experience with Helm and ArgoCD.
- Understanding of containerization concepts and tools like Docker/Podman.
- Infrastructure as Code (IaC): Preferably 2 years experience with IaC tools like Terraform to manage and automate cloud resources effectively. Terraform is preferred.
Work in a way that works for you
We promote a healthy work/life balance across the organization. We offer an appealing working prospect for our people. With numerous wellbeing initiatives, shared parental leave, study assistance, and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.
- Working flexible hours - flexing the times when you work in the day to help you fit everything in and work when you are most productive
Working for you
We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:
- Medical Aid
- Retirement Plan inclusive of Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
- Modern family benefits, including adoption and surrogacy
- Study Leave
About the Business
LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase their productivity, improve decision-making, achieve better outcomes, and advance the rule of law around the world. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.
We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or contact 1- .
Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.
Site Reliability Engineer
Posted today
Job Description
Who are Tyk, and what do we do?
The Tyk API Management platform is helping to drive the connected world and power new products and services. We’re changing the way that organisations connect any number of their systems and services. Whether internal, external, public or highly encrypted systems, Tyk helps businesses drive value across the retail, finance, telecoms, healthcare, or media industries (to name just a few!)
If you’ve banked online, used an app to check the news, or perhaps even driven a connected car, APIs, and by extension, Tyk, make that possible. Founded in 2015 with offices in London – UK, London – Ontario, Atlanta and Singapore, we have many thousands of users of our B2B platform across the globe. Brands using Tyk range from Lotte, Bell, Dominos, Starbucks, to RBS and Societe Generale. We have a varied user base hailing from every continent – even Antarctica.
Our Mission
Tyk is on a mission to connect every system in the world. We’ve started by building an API Management platform.
Total flexibility, default remote, radical responsibility
We offer unlimited paid holidays and remote working from anywhere in the world, for everyone. Why? Tyk was founded on the principle of offering flexibility and autonomy to our employees; we believe this allows our employees to achieve their best results. It also means we can build the best possible team: location and working hours are no barrier.
If this sounds like an environment that you believe could work for you then read on to find out more.
The role:
At Tyk, we’re obsessed with building software that solves problems. We count on our Site Reliability Engineers (SREs) to empower users with a rich feature set, high availability, and stellar performance level to pursue their missions.
Our customer base is growing, so we’re seeking an experienced SRE to optimise, automate, and improve our performance, using insights from massive-scale data in real time. We want an original thinker, a challenger, a technical legend, an opinionated collaborator who wants to make things better.
Here’s what you’ll be getting up to:
- Proactive Monitoring: Ensure our production Cloud environment operates within defined SLAs through vigilant monitoring and proactive issue resolution.
- Alerting and Monitoring: Collaborate with Senior SRE to identify opportunities for building proactive alerting and monitoring systems; implement solutions to enhance system reliability.
- Performance Metrics: Contribute to defining key performance metrics for Cloud services, enabling performance improvements and success measurement.
- Solutions Development: Propose and develop solutions to maintain and enhance key performance indicators (KPIs) across our Cloud infrastructure.
- Data Analysis: Gather and analyse metrics from operating systems and applications to optimise system performance and expedite fault resolution.
- Innovation: Drive innovation by optimising system and infrastructure performance, anticipating customer needs, and proactively addressing scaling demands.
- Scalability: Work closely with commercial functions to optimise our platform for scalability and meet growing customer demands.
- Cloud Infrastructure: Analyse and ensure the automation, scalability, and efficient management of our Cloud infrastructure.
- Automation: Execute automation for known cloud operations tasks and create new automation solutions to streamline processes.
- Software Development: Design, write, and deliver software and automation solutions to enhance the availability, scalability, latency, and efficiency of our PaaS services.
- Root Cause Analysis: Participate in blame-free root cause analysis meetings to promote learning and continuous system improvement in the event of production system incidents.
- Documentation: Create and contribute to policies and runbooks to ensure that operational processes are well-documented and consistently followed.
- On-call Support: Provide on-call support, ensuring our Cloud services follow a 24/7 model by promptly responding to alerts, meeting SLAs, and automating root cause analysis.
- Upgrades and Migrations: Plan and execute software upgrades, including Kubernetes versions. Manage and communicate migrations from Classic Cloud to the new Cloud platform.
Here’s what we’re looking for:
- Strong collaboration skills
- Launching and operating production Kubernetes clusters
- Designing and operating infrastructure on AWS and other providers
- Operating MongoDB (or other document database) clusters
- Operating Redis (or other key-value storage) clusters
- Administering Linux servers
- Maintaining distributed software
- Operating Prometheus and Grafana
- Operating log collection and analysis systems
Skills:
- Kubernetes & containers (proficient)
- Go and/or Python (advanced)
- AWS (proficient)
- Linux (proficient)
- Terraform and IaC in general (proficient)
- Helm (familiar)
- MongoDB (or similar)
- Redis (or similar)
- Monitoring & logging
- Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.)
- Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP)
Benefits
Here’s why you should join us:
- Everyone has unlimited paid holidays.
- We have total flexibility in hours, as we believe creativity flows better when our people are given freedom to decide when they are most productive. Everyone is unique after all.
- Employee share scheme
- Generous maternity and paternity leave
- Volunteering Days
- Company retreats
- Employee Wellbeing platform
We all share the same vision – we value authenticity, respect, responsibility, independence, honesty, diversity and inclusion and, most importantly, treating others how you wish to be treated. We look for like-minded people who bring their personalities to work every day, strive to achieve their personal goals and are willing to challenge the way we do things. Why? To make what we do even better!
Our values tell the story of Tyk – here’s how:
- It’s ok to screw up!
We’ve found that it’s often the ‘stupid’ or unexpected ideas that turn out to be the successful ones – so try it, at least we can say we have!
- The only stupid idea is the untested one!
It’s in our DNA – starting a business with founders 12 hours apart, giving our gateway away for free – sure, we did that, and we’d do it again!
- Trust starts with you – make it count!
Trust is a two-way street – instil it from day one!
- Assume best intent!
We have each other’s back – we’re all on the same team. Think before you speak or act.
- Make things better!
Always try to leave things better than when you found them – change is constant, inevitable and embraced! Be that change we want to see.
What’s it like to work here? Check it out:
Tyk is an equal opportunities employer and we are determined to ensure that no applicant or employee receives less favourable treatment on the grounds of gender, age, disability, religion, belief, sexual orientation, marital status, or race, or is disadvantaged by conditions or requirements which cannot be shown to be justifiable.
Site Reliability Engineer
Posted today
Job Description
About our Team
LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to improve productivity and transform the overall business and practice of law, deploying ethical and powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model from today’s top model creators for each individual legal use case.
About the role:
Our CEMEA Cloud/SRE team is looking for an experienced DevOps Engineer to help build scalable, secure, and reliable systems. Our team specializes in cloud and DevOps technologies, with members possessing varying levels of expertise in areas such as Kubernetes, development, and database administration. Cedric emphasizes a collaborative working style and values team members who are proactive in communication and knowledge sharing.
Responsibilities
- Build and improve CI/CD pipelines
- Manage AWS cloud infrastructure
- Ensure high availability, observability, and performance
- Automate systems for efficiency and cost optimization
- Collaborate across engineering, ops, and leadership teams
Requirements
- Bachelor’s Degree or Advanced Diploma in Information Systems, Computer Science, Mathematics, Engineering and a minimum of 5 years experience in a software/technology environment is required.
- Certifications in AWS or Kubernetes are advantageous.
- At least 5 years of experience in a DevOps or SRE role.
- AWS Expertise: Preferably 5 years' comprehensive experience with AWS services, including EC2, Lambda, DynamoDB, Aurora RDS PostgreSQL, and AWS OpenSearch.
- Kubernetes Proficiency: Preferably 2 years' hands-on experience deploying, managing, and scaling applications in Kubernetes environments. Practical experience with Helm and ArgoCD.
- Understanding of containerization concepts and tools like Docker/Podman.
- Infrastructure as Code (IaC): Preferably 2 years' experience with IaC tools like Terraform to manage and automate cloud resources effectively. Terraform is preferred.
Work in a way that works for you
We promote a healthy work/life balance across the organization. We offer an appealing working prospect for our people. With numerous wellbeing initiatives, shared parental leave, study assistance and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.
Working flexible hours - flexing the times when you work in the day to help you fit everything in and work when you are the most productive
Working for you
We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:
Medical Aid
Retirement Plan inclusive of Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
Modern family benefits, including adoption and surrogacy
Study Leave
About the Business
LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase their productivity, improve decision-making, achieve better outcomes, and advance the rule of law around the world. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.
We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or please contact 1- .
Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here .
Please read our Candidate Privacy Policy .
We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.
Reliability Engineer Aggeneys
Posted 5 days ago
Job Description
A great mining company in Aggeneys, Northern Cape is seeking the expertise of a Reliability Engineer to join their team.
Responsibilities:
- Asset Management - Ensure that the entire fleet of equipment is properly recorded/accounted for on the Asset Management System/Asset Register. Ensure that the Asset management register and maintenance systems are kept up to date with a high level of data integrity. Ensure that all services, maintenance and repairs are properly scheduled, timeously executed and accurately recorded.
- Fleet Reliability and Performance - Monitor Equipment reliability trends per equipment type to ensure that 85% fleet availability is maintained. Monitor Equipment MTBF and MTTR tempos/ratios to ensure that these are within industry norms and at acceptable standards. Develop and implement strategies to address and/or optimize/improve equipment performance when required. Develop and implement equipment lifecycle.
- Services and Repairs - Ensure that all engineering activities are pre-planned, scheduled, properly and timeously executed and conducted as per OEM guidelines and intervals for each machine type and model. Study and keep abreast of OEM manuals and specifications to ensure that all work is performed correctly. Assist Engineering team to perform diagnostics and troubleshooting to identify problems and resolve equipment malfunctions. Ensure that Artisans attend to all breakdowns or equipment malfunctions/failures reported with urgency and continuously update the Maintenance and production managers on repair progress and expected work completion timelines. Implement systems to ensure frequent over-inspections are executed on work done by artisans and assistants to ensure that work is executed to high standards. Implement systems to ensure that no equipment leaves before restarting of equipment (Implement a high level of control/control mechanisms to avoid "dry starts" at all times).
- Legal Duties and Obligations - Perform legal duties as Appointee in terms of Mine Health and Safety Act. Ensure that all subordinates are authorized to work in writing by the applicable Engineer. Ensure that all activities are conducted in a safe and responsible manner and promote a safe work culture. Ensure compliance with all client/company SOPs, Policies, procedures, provisions and stipulations of the MHSA.
- Administration and Housekeeping - Ensure that all designated areas of work are neat and well kept. Implement and administer tool equipment issue control registers and exercise proper control of all assets, including tools, equipment and LDVs, allocated to the section/Superintendents. Ensure that all artisans and employees within the section adhere to stores and procurement procedures when spares or other stock are required. Ensure that all work performed by subordinates is duly captured and accounted for on job cards, signed off and submitted to the maintenance administration department. Ensure that engineering stock levels and items on site match with counts and reports on the maintenance system.
- Budgeting, Reporting and Continuous Improvement - Ensure that allocated section budget is adhered to and behave in a cost-conscious manner at all times. Investigate premature component failures in conjunction with the Engineering Manager to perform root cause analysis. Compile and present a monthly report for the Section. Develop, propose and implement cost-saving initiatives.
- Engineering HR Function (To be performed in conjunction with the HR Dept) - Act as mentor for allocated section. Identify and propose training interventions and programs to Engineering Manager to ensure that subordinates are developed in terms of set IDPs. Develop and implement a succession plan for Dept. Manage discipline and initiate disciplinary action. Manage time and attendance, sick leave and leave rosters for Dept. Ensure that overtime is planned, properly managed and pre-approved as per overtime procedures. Monitor time spent by subordinates on the mine and ensure compliance with legal attendance parameters.
- Manage and Lead the Planning Department - Ensure that all maintenance strategies are planned according to the mining operating model. Provide coaching, support and leadership to all subordinates within Dept. Ensure that all the maintenance strategies are automated on the computer-based maintenance system.
Requirements:
- Qualified Mechanical Engineer (N Dip or Degree) with at least 5 years related post-graduation experience.
- Additional course or qualification completed related to Maintenance System Management / Reliability Engineering (Heavy Earthmoving Equipment).
- Valid Driver's License and clean criminal record.
- Excellent knowledge of heavy earthmoving equipment fleet maintenance, repairs and diagnostics within an open cast mining environment.
- High level of energy and ability to work under pressure to meet deadlines.
- Job ownership and accountability.
- Ability to schedule and prioritize work, multi-tasking ability.
- Excellent people (supervisory) and administration skills (including computer skills, MS Office, Pragma/Asset management systems).
- Good knowledge of MHSA and Applicable Labour Legislation (BCEA, LRA).
- Financial/Numerical acumen and presentation skills.
- Sober habits and professionalism.
- Willing to work extended hours.
Benefits:
Hire Resolve is a top-tier recruitment firm that focuses on placing skilled professionals in permanent employment.
Hire Resolve focuses on working with senior-level executives and we pride ourselves on delivering excellent service to our candidates and clients.
- Salary: negotiable.
- Our client is offering a highly competitive salary for this role based on experience.
- Apply for this role today, visit the Hire Resolve website: hireresolve.us or email us:
- Alternatively, you are welcome to connect with Chandre Cordier on LinkedIn.