2,179 Senior Data Engineer jobs in South Africa
Senior Data Engineer
Job Description
Overview
Our client is seeking a highly skilled Senior Data Engineer with deep expertise in Snowflake and Matillion to design, build, and optimize scalable data pipelines and integration solutions. The ideal candidate will have a strong background in cloud data warehousing, ETL / ELT processes, and modern data architecture.
Responsibilities
- Data Management and API integration.
- Design, develop, and maintain robust ETL/ELT pipelines using Matillion and Snowflake (see the illustrative sketch at the end of this listing).
- Optimize Snowflake data warehouse performance and manage data models.
- Collaborate with data scientists, analysts, and business stakeholders to understand data requirements.
- Ensure data quality, governance, and security best practices are implemented.
- Automate data validation, reconciliation, and monitoring processes.
- Document data architecture, pipelines, and integration processes.
- Mentor junior engineers and contribute to best practices and code reviews.
Requirements
- Matric and a tertiary qualification.
- 8+ years of experience in data engineering or related roles.
- Strong hands-on experience with Matillion ETL and Snowflake.
- Proficiency in SQL and at least one programming language (e.g., Python, Java, or Scala).
- Experience with cloud platforms (AWS, Azure, or GCP).
- Familiarity with data modeling, data warehousing, and ELT best practices.
- Experience with CI / CD pipelines, version control (Git), and DevOps practices.
- Knowledge of data governance, security, and compliance (e.g., GDPR, SOX).
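For illustration of what the Snowflake side of such an ELT pipeline typically involves, here is a minimal, hedged sketch of a load-and-merge step of the kind a Matillion job might orchestrate. The connection details, stage, and table names are hypothetical, and the single-VARIANT-column staging layout is an assumption, not something stated in the advert.

```python
# Hypothetical Snowflake ELT step: land raw JSON into a staging table, then
# MERGE it into a curated model. Assumes STAGING.ORDERS_RAW has a single
# VARIANT column named V; all object names are illustrative only.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="TRANSFORM_WH",
    database="ANALYTICS",
    schema="STAGING",
)

LOAD_SQL = """
COPY INTO STAGING.ORDERS_RAW
FROM @LANDING_STAGE/orders/
FILE_FORMAT = (TYPE = 'JSON')
ON_ERROR = 'ABORT_STATEMENT'
"""

MERGE_SQL = """
MERGE INTO ANALYTICS.CORE.ORDERS AS tgt
USING (
    SELECT v:order_id::STRING          AS order_id,
           v:amount::NUMBER(18,2)      AS amount,
           v:updated_at::TIMESTAMP_NTZ AS updated_at
    FROM STAGING.ORDERS_RAW
) AS src
ON tgt.order_id = src.order_id
WHEN MATCHED AND src.updated_at > tgt.updated_at THEN
    UPDATE SET amount = src.amount, updated_at = src.updated_at
WHEN NOT MATCHED THEN
    INSERT (order_id, amount, updated_at)
    VALUES (src.order_id, src.amount, src.updated_at)
"""

cur = conn.cursor()
try:
    cur.execute(LOAD_SQL)   # extract/load: land raw JSON into staging
    cur.execute(MERGE_SQL)  # transform: upsert into the curated model in-warehouse
finally:
    cur.close()
    conn.close()
```

In practice a Matillion job would generate and schedule SQL of this shape; the sketch only shows where the load step and the in-warehouse transform sit relative to each other.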
Job No Longer Available
This position is no longer listed on WhatJobs. The employer may be reviewing applications, filled the role, or has removed the listing.
However, we have similar jobs available for you below.
Big Data Data Engineer
Posted 17 days ago
Job Description
Big Data Data Engineer job vacancy in Johannesburg.
We are seeking a skilled Data Engineer to design and develop scalable data pipelines that ingest raw, unstructured JSON data from source systems and transform it into clean, structured datasets within our Hadoop-based data platform.
The ideal candidate will play a critical role in enabling data availability, quality, and usability by engineering the movement of data from the Raw Layer to the Published and Functional Layers.
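To make the Raw-to-Published hop above concrete, here is a minimal PySpark sketch under stated assumptions: the HDFS paths, event schema, and deduplication key are hypothetical rather than taken from the advert.

```python
# Raw -> Published: read raw JSON from HDFS, enforce a schema, cleanse and
# deduplicate, then write date-partitioned Parquet. All names are illustrative.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("raw_to_published").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

raw = spark.read.schema(schema).json("hdfs:///data/raw/events/")

published = (
    raw.filter(F.col("event_id").isNotNull())            # basic cleansing
       .dropDuplicates(["event_id"])                     # deduplication
       .withColumn("event_date", F.to_date("event_ts"))  # partition column
)

(published.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("hdfs:///data/published/events/"))
```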
Key Responsibilities:
- Design, build, and maintain robust data pipelines to ingest raw JSON data from source systems into the Hadoop Distributed File System (HDFS).
- Transform and enrich unstructured data into structured formats (e.g., Parquet, ORC) for the Published Layer using tools like PySpark, Hive, or Spark SQL.
- Develop workflows to further process and organize data into Functional Layers optimized for business reporting and analytics.
- Implement data validation, cleansing, schema enforcement, and deduplication as part of the transformation process.
- Collaborate with Data Analysts, BI Developers, and Business Users to understand data requirements and ensure datasets are production-ready.
- Optimize ETL/ELT processes for performance and reliability in a large-scale distributed environment.
- Maintain metadata, lineage, and documentation for transparency and governance.
- Monitor pipeline performance and implement error handling and alerting mechanisms.
Technical Skills & Experience:
- 3+ years of experience in data engineering or ETL development within a big data environment.
- Strong experience with Hadoop ecosystem tools: HDFS, Hive, Spark, YARN, and Sqoop.
- Proficiency in PySpark, Spark SQL, and HQL (Hive Query Language).
- Experience working with unstructured JSON data and transforming it into structured formats.
- Solid understanding of data lake architectures: Raw, Published, and Functional layers.
- Familiarity with workflow orchestration tools like Airflow, Oozie, or NiFi.
- Experience with schema design, data modeling, and partitioning strategies.
- Comfortable with version control tools (e.g., Git) and CI/CD processes.
Nice to Have:
- Experience with data cataloging and governance tools (e.g., Apache Atlas, Alation).
- Exposure to cloud-based Hadoop platforms like AWS EMR, Azure HDInsight, or GCP Dataproc.
- Experience with containerization (e.g., Docker) and/or Kubernetes for pipeline deployment.
- Familiarity with data quality frameworks (e.g., Deequ, Great Expectations).
Qualifications:
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field.
- Relevant certifications (e.g., Cloudera, Databricks, AWS Big Data) are a plus.
Big Data Data Engineer
Posted 18 days ago
Job Description
We are seeking a skilled Data Engineer to design and develop scalable data pipelines that ingest raw, unstructured JSON data from source systems and transform it into clean, structured datasets within our Hadoop-based data platform. The ideal candidate will play a critical role in enabling data availability, quality, and usability by engineering the movement of data from the Raw Layer to the Published and Functional Layers.
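As a complement to the Raw-to-Published sketch shown under the previous listing, the example below illustrates the Published-to-Functional hop: aggregating published Parquet into a reporting-friendly Hive table with Spark SQL. The database, table, and column names are hypothetical.

```python
# Published -> Functional: aggregate published Parquet into a partitioned
# Hive table optimised for reporting. All names are illustrative only.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("published_to_functional")
         .enableHiveSupport()
         .getOrCreate())

spark.read.parquet("hdfs:///data/published/events/").createOrReplaceTempView("events")

daily_spend = spark.sql("""
    SELECT customer_id,
           event_date,
           SUM(amount) AS total_spend,
           COUNT(*)    AS txn_count
    FROM events
    GROUP BY customer_id, event_date
""")

spark.sql("CREATE DATABASE IF NOT EXISTS functional")

(daily_spend.write
    .mode("overwrite")
    .partitionBy("event_date")
    .format("parquet")
    .saveAsTable("functional.daily_customer_spend"))
```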
Key Responsibilities:
- Design, build, and maintain robust data pipelines to ingest raw JSON data from source systems into the Hadoop Distributed File System (HDFS).
- Transform and enrich unstructured data into structured formats (e.g., Parquet, ORC) for the Published Layer using tools like PySpark, Hive, or Spark SQL.
- Develop workflows to further process and organize data into Functional Layers optimized for business reporting and analytics.
- Implement data validation, cleansing, schema enforcement, and deduplication as part of the transformation process.
- Collaborate with Data Analysts, BI Developers, and Business Users to understand data requirements and ensure datasets are production-ready.
- Optimize ETL/ELT processes for performance and reliability in a large-scale distributed environment.
- Maintain metadata, lineage, and documentation for transparency and governance.
- Monitor pipeline performance and implement error handling and alerting mechanisms.
Technical Skills & Experience:
- 3+ years of experience in data engineering or ETL development within a big data environment.
- Strong experience with Hadoop ecosystem tools: HDFS, Hive, Spark, YARN, and Sqoop.
- Proficiency in PySpark, Spark SQL, and HQL (Hive Query Language).
- Experience working with unstructured JSON data and transforming it into structured formats.
- Solid understanding of data lake architectures: Raw, Published, and Functional layers.
- Familiarity with workflow orchestration tools like Airflow, Oozie, or NiFi.
- Experience with schema design, data modeling, and partitioning strategies.
- Comfortable with version control tools (e.g., Git) and CI/CD processes.
Nice to Have:
- Experience with data cataloging and governance tools (e.g., Apache Atlas, Alation).
- Exposure to cloud-based Hadoop platforms like AWS EMR, Azure HDInsight, or GCP Dataproc.
- Experience with containerization (e.g., Docker) and/or Kubernetes for pipeline deployment.
- Familiarity with data quality frameworks (e.g., Deequ, Great Expectations).
Qualifications:
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field.
- Relevant certifications (e.g., Cloudera, Databricks, AWS Big Data) are a plus.
* In order to comply with the POPI Act, for future career opportunities, we require your permission to maintain your personal details on our database. By completing and returning this form, you give PBT your consent.
* If you have not received any feedback after 2 weeks, please consider your application unsuccessful.
Big Data Data Engineer
Posted today
Job Description
Big Data Data Engineer
Posted today
Job Description
Data Engineer
Posted today
Job Description
About Unifi
Unifi is redefining credit in Africa with simple, fast personal loans delivered through online, mobile and branch channels. We make life easy for thousands of clients across Zambia, Kenya, Uganda and South Africa. Unifi has conviction in the African continent and its people, and our products enable our clients to achieve even more. As one of the fastest-growing lenders in East Africa, we combine exceptional client service with the very best tech and data analytics.
About The Role
Unifi is on the lookout for a talented Data Engineer with strong expertise in Google Cloud Platform (GCP) to join our fast-growing team. In this role, you’ll design, build, and maintain scalable data pipelines and architectures that power our business. You’ll collaborate closely with data scientists and analysts to ensure seamless data flow across the organisation, enabling smarter decisions and impactful solutions.
We’re looking for someone who is analytically sharp, self-motivated, and thrives in an unstructured environment. A genuine passion for African business is a must—along with a healthy sense of adventure and a good sense of humour to match our dynamic culture.
Responsibilities
- Design and build scalable data pipelines and architectures using GCP technologies such as Dataflow, BigQuery, Pub/Sub, and Cloud Storage (see the illustrative sketch after this list).
- Develop and manage ETL processes to transform diverse data sources into clean, structured formats for analysis and reporting.
- Partner with data scientists and analysts to understand their needs and deliver solutions that enable insights and decision-making.
- Create and maintain documentation for data pipelines, architecture, and data models to ensure clarity and consistency.
- Troubleshoot and resolve data-related issues quickly to minimise disruption.
- Continuously optimise data pipelines for performance, scalability, and cost efficiency.
- Automate workflows and processes through scripts and tools that streamline operations.
- Safeguard data quality and integrity across all sources, pipelines, and platforms.
- Stay ahead of the curve by keeping up with new GCP tools, best practices, and data engineering trends.
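For illustration of the GCP pattern referenced in the responsibilities above, here is a hedged sketch using the google-cloud-bigquery client: load newline-delimited JSON from Cloud Storage into BigQuery, then derive a reporting table with SQL. The project, bucket, dataset, table, and column names are all hypothetical.

```python
# Hypothetical GCP pipeline step: GCS -> BigQuery load, then an in-warehouse
# transform into a reporting table. All resource names are illustrative.
from google.cloud import bigquery

client = bigquery.Client(project="example-data-project")  # hypothetical project

load_job = client.load_table_from_uri(
    "gs://example-raw-data/loans/2025-01-01/*.json",       # hypothetical bucket
    "example-data-project.raw.loan_events",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition="WRITE_APPEND",
        autodetect=True,
    ),
)
load_job.result()  # block until the load completes

client.query("""
    CREATE OR REPLACE TABLE reporting.daily_disbursements AS
    SELECT DATE(event_ts) AS disbursement_date,
           country,
           SUM(principal_amount) AS total_disbursed
    FROM raw.loan_events
    WHERE event_type = 'DISBURSEMENT'
    GROUP BY disbursement_date, country
""").result()
```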
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field.
- 5+ years’ experience as a Data Engineer or in a similar role.
- Strong programming skills in Python and SQL, with hands-on BigQuery and broader GCP experience.
- Proven expertise in ETL development and data modeling.
- Familiarity with data lakehouse concepts and techniques.
- Excellent problem-solving, analytical, and critical-thinking skills.
- Strong communication and collaboration abilities.
- Experience with Google Cloud Platform (GCP) technologies—especially BigQuery, with additional exposure to Dataflow, Pub/Sub, and Cloud Storage—considered highly beneficial.
- Background in financial services would be an added advantage.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Airlines and Aviation
Data Engineer
Posted 1 day ago
Job Description
InfyStrat is looking for a proficient Data Engineer to strengthen our data team. In this role, you will be integral in designing and implementing data pipelines that facilitate the efficient extraction, transformation, and loading (ETL) of data across various platforms. You will collaborate with data analysts, scientists, and business units to provide reliable and accurate datasets to drive decision-making processes. The ideal candidate should possess a combination of technical proficiency and creativity to solve complex data challenges. We foster a culture of innovation at InfyStrat, where the contributions of our team members are essential to transforming our data into valuable insights. Join us to help build data solutions that empower our business growth.
Key Responsibilities
- Design, implement, and manage ETL processes to collect and transform data from diverse sources.
- Develop and maintain data models, ensuring they meet business needs and performance requirements.
- Optimize database performance and troubleshoot data-related issues.
- Collaborate with stakeholders to identify data needs and develop solutions accordingly.
- Implement data quality monitoring and validation to maintain data integrity (see the sketch after this list).
- Keep up with industry trends and emerging technologies to continually enhance data engineering practices.
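As a hedged illustration of the data quality monitoring mentioned above, the sketch below runs simple SQL checks (row counts, nulls, duplicates) through a generic DB-API connection. The table and column names are hypothetical, and get_connection stands in for whichever database driver the stack actually uses.

```python
# Illustrative data-quality checks expressed as plain SQL; the connection
# factory and table names are hypothetical placeholders.
from typing import Callable, List, Tuple

CHECKS: List[Tuple[str, str]] = [
    ("orders_not_empty",
     "SELECT COUNT(*) FROM analytics.orders"),
    ("no_null_order_ids",
     "SELECT COUNT(*) FROM analytics.orders WHERE order_id IS NULL"),
    ("no_duplicate_order_ids",
     "SELECT COUNT(*) FROM (SELECT order_id FROM analytics.orders "
     "GROUP BY order_id HAVING COUNT(*) > 1) d"),
]

def run_checks(get_connection: Callable) -> List[str]:
    """Return the names of failed checks; an empty list means all passed."""
    failures = []
    conn = get_connection()
    try:
        cur = conn.cursor()
        for name, sql in CHECKS:
            cur.execute(sql)
            value = cur.fetchone()[0]
            # The first check expects rows; the others expect zero offending rows.
            ok = value > 0 if name == "orders_not_empty" else value == 0
            if not ok:
                failures.append(name)
    finally:
        conn.close()
    return failures
```

A scheduler or alerting hook would then act on any non-empty list of failures.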
Requirements
- Bachelor's degree in Computer Science, Data Science, or a related field.
- 5-12 years of experience in data engineering or a related role.
- Strong knowledge of Snowflake and Matillion.
- Strong proficiency in SQL and experience with relational databases.
- Experience with data integration and ETL tools (such as Talend, Apache NiFi, or similar).
- Familiarity with big data frameworks (like Hadoop, Spark) and cloud computing platforms (AWS, Azure).
- Proficient in programming languages for data processing (Python, Scala, or Java).
- Problem-solving skills with a keen attention to detail.
- Ability to work independently and collaborate effectively within teams.
Data Engineer
Posted 1 day ago
Job Description
Overview
We are seeking a Data Engineer with expertise in data architecture and modelling to drive the design and optimisation of our enterprise data platform in a dynamic logistics environment. This role blends hands-on engineering with the shaping of the data platform architecture, ensuring scalability, data quality, and strategic alignment with business objectives.
The incumbent will be responsible for building and refining enterprise data models that unify information from diverse sources, including internal systems, ERP platforms, IoT sensor streams, and third-party data, ensuring that our platform supports high-quality, accessible, and trusted data.
The focus will span data modelling, integration, and architecture, ensuring that the platform can support real-time analytics, predictive insights, and machine learning at scale. This includes designing efficient schemas, implementing semantic layers, centralising business logic, and building data pipelines that adhere to a medallion architecture within Azure Synapse Analytics.
Furthermore, the incumbent will collaborate with analytics and business stakeholders, translate complex requirements into performant, robust, and future-proof data solutions, as well as develop indexing and partitioning strategies, optimise data models for query efficiency, and contribute to the creation of feature stores that power analytics and machine learning applications.
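To illustrate the medallion pattern described above, here is a minimal sketch of one bronze-to-silver upsert, assuming Delta-format tables on an Azure Synapse Spark pool. The storage account, paths, keys, and columns are hypothetical rather than taken from the role, and the silver table is assumed to already exist.

```python
# Hypothetical bronze -> silver step in a medallion layout on a Synapse Spark
# pool with Delta tables in ADLS Gen2. All names are illustrative only.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("bronze_to_silver_shipments").getOrCreate()
LAKE = "abfss://lake@examplelogisticsdl.dfs.core.windows.net"

# Latest raw increment landed by the ingestion pipeline.
increment = (
    spark.read.format("delta").load(f"{LAKE}/bronze/shipments")
         .filter(F.col("ingest_date") == F.current_date())
         .dropDuplicates(["shipment_id", "event_ts"])
         .withColumn("event_date", F.to_date("event_ts"))
)

# Assumes the partitioned silver Delta table has already been created.
silver = DeltaTable.forPath(spark, f"{LAKE}/silver/shipments")

# Upsert the increment so the silver layer stays consistent and query-ready.
(silver.alias("t")
    .merge(increment.alias("s"),
           "t.shipment_id = s.shipment_id AND t.event_ts = s.event_ts")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```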
Duties & Responsibilities
- Monitor, maintain, and improve data pipelines, data models, and schemas in Azure Synapse Analytics.
- Design, develop, and optimise data models in a medallion architecture to support analytics and reporting needs.
- Work with third party consultants to implement data lake architecture.
- Build and maintain data pipelines to facilitate the smooth flow of data between various systems and databases.
- Ensure consistency in data model design, including proper use of keys, indexing, and semantic layers to improve query performance.
- Provision datasets for consumption-side users, focusing on consistency and data integrity.
- Migrate business logic from Qlik Sense to Azure Synapse Analytics.
- Migrate on premises data systems to the Azure Cloud.
- Test and maintain new and existing API connections.
- Contribute to and expand a data taxonomy to ensure consistency and clarity in data organisation.
- Implement data security protocols across the entire data environment.
- Focused improvement initiatives of existing processes, systems, data, and reports.
- Focused improvement initiatives of cross-functional interactions within the organisation.
Experience and Qualification
- Bachelor's degree in Computer Science, Information Systems, or a related field, OR proven capability in working with enterprise data systems.
- Experience with cloud data platforms (Azure Preferred).
- Strong understanding of cloud data solutions and architectures, and data modelling techniques, including dimensional and relational modelling.
- Azure Data Engineer Associate (DP-203) (Preferred).
Skills and Attributes
- Highly SQL proficient with experience in building and optimising complex queries for data modeling.
- Highly proficient in Python / PySpark and scripting for data operations.
- Knowledge of open source orchestration tools such as Apache Airflow.
- Process design and implementation.
- Proficiency in designing and maintaining data schemas and keys including primary foreign and composite keys.
- Familiarity with data modeling best practices in cloud data platforms and indexing strategies.
- Familiarity with on-premises SQL Server databases and familiarity with migration processes to cloud platforms.
- Familiarity with Qlik Sense or similar reporting and analytics tools.
- Familiarity with enterprise systems and working with multiple databases.
- Strong problem-solving skills and a demonstrated ability to learn new technologies independently.
- Proactive approach to problem solving.
- Ability to function within various teams and environments but also work independently.
- Excellent communication skills.
Apache Hive, S3, Hadoop, Redshift, Spark, AWS, Apache Pig, NoSQL, Big Data, Data Warehouse, Kafka, Scala
Employment Type: Full-Time
Department / Functional Area: Administration
Experience: years
Vacancy: 1
Data Engineer
Posted 1 day ago
Job Description
Job title: Data Engineer
Job Location: Western Cape, Cape Town
Deadline: October 28, 2025
Job Advert Summary
- We are seeking a Data Engineer with expertise in data architecture and modelling to drive the design and optimisation of our enterprise data platform in a dynamic logistics environment. This role blends hands-on engineering with the shaping of the data platform architecture, ensuring scalability, data quality, and strategic alignment with business objectives.
- The incumbent will be responsible for building and refining enterprise data models that unify information from diverse sources including internal systems, ERP platforms, IoT sensor streams, and third-party data, ensuring that our platform supports high quality, accessible and trusted data.
- The focus will span across data modelling, integration and architecture, ensuring that the platform can support real-time analytics, predictive insights and machine learning at scale. This includes designing efficient schemas, implementing semantic layers, centralising business logic, and building data pipelines that adhere to a medallion architecture within Azure Synapse Analytics.
- Furthermore, the incumbent will collaborate with analytics and business stakeholders, translate complex requirements into performant, robust, and future proof data solutions as well as develop indexing and partitioning strategies, optimise data models for query efficiency and contribute to the creation of feature stores that power analytics and machine learning applications.
Minimum Requirements
Experience and Qualification
- Bachelor's degree in Computer Science, Information Systems, or a related field OR proven capability in working with enterprise data systems.
- Experience with cloud data platforms (Azure Preferred).
- Strong understanding of cloud data solutions and architectures, and data modelling techniques, including dimensional and relational modelling.
- Azure Data Engineer Associate (DP-203) (Preferred).
Skills and Attributes
- Highly SQL proficient, with experience in building and optimising complex queries for data modeling.
- Highly proficient in Python / PySpark and scripting for data operations.
- Knowledge of open source orchestration tools such as Apache Airflow.
- Process design and implementation.
- Proficiency in designing and maintaining data schemas and keys, including primary, foreign, and composite keys.
- Familiarity with data modeling best practices in cloud data platforms and indexing strategies.
- Familiarity with on-premises SQL Server databases and familiarity with migration processes to cloud platforms.
- Familiarity with Qlik Sense or similar reporting and analytics tools.
- Familiarity with enterprise systems and working with multiple databases.
- Strong problem-solving skills and a demonstrated ability to learn new technologies independently.
- Proactive approach to problem solving.
- Ability to function within various teams and environments, but also work independently.
- Excellent communication skills.
Duties & Responsibilities
- Monitor, maintain, and improve data pipelines, data models, and schemas in Azure Synapse Analytics.
- Design, develop, and optimise data models in a medallion architecture to support analytics and reporting needs.
- Work with third party consultants to implement data lake architecture.
- Build and maintain data pipelines to facilitate the smooth flow of data between various systems and databases.
- Ensure consistency in data model design, including proper use of keys, indexing, and semantic layers to improve query performance.
- Provision datasets for consumption side users, focusing on consistency and data integrity.
- Migrate business logic from Qlik Sense to Azure Synapse Analytics.
- Migrate on premises data systems to the Azure Cloud.
- Test and maintain new and existing API connections.
- Contribute to and expand a data taxonomy to ensure consistency and clarity in data organisation.
- Implement data security protocols across the entire data environment.
- Focused improvement initiatives of existing processes, systems, data, reports.
- Focused improvement initiatives of cross-functional interactions within the organisation.
2025 / 09 / 07
Data Engineer
Posted 2 days ago
Job Description
Job Family: Information Technology
Data
Manager of Self Professional
Job Purpose
The purpose of the Data Engineer is to leverage their data expertise and data-related technologies, in line with the Nedbank Data Architecture Roadmap, to advance technical thought leadership for the Enterprise, deliver fit-for-purpose data products, and support data initiatives. In addition, Data Engineers enhance the data infrastructure of the bank to enable advanced analytics, machine learning, and artificial intelligence by providing clean, usable data to stakeholders. They also create data pipelines, ingestion, provisioning, streaming, self-service, API, and big data solutions that support the Bank's strategy to become a data-driven organisation.
Job Responsibilities
- Responsible for the maintenance, improvement, cleaning, and manipulation of data in the bank's operational and analytics databases.
- Data Infrastructure: Build and manage scalable, optimised, supported, tested, secure, and reliable data infrastructure, e.g. using infrastructure and databases (DB2, PostgreSQL, MSSQL, HBase, NoSQL, etc.), data lake storage (Azure Data Lake Gen 2), cloud-based solutions (SAS, Azure Databricks, Azure Data Factory, HDInsight), and data platforms (SAS, Ab Initio, Denodo, Netezza, Azure Cloud). Ensure data security and privacy in collaboration with Information Security, the CISO, and Data Governance.
- Data Pipeline Build (Ingestion, Provisioning, Streaming and API): Build and maintain data pipelines to:
- create data pipelines for data integration (Data Ingestion, Data Provisioning and Data Streaming) utilising both On Premise tool sets and Cloud Data Engineering tool sets
- efficiently extract data (Data Acquisition) from Golden Sources, Trusted sources and Writebacks with data integration from multiple sources, formats and structures
- load the Nedbank Data Warehouse (Data Reservoir, Atomic Data Warehouse, Enterprise Data Mart)
- provide data to the respective Lines of Business Marts, Regulatory Marts and Compliance Marts through self service data virtualisation
- provide data to applications or Nedbank Data consumers
- transform data to a common data model for reporting and data analysis, and to provide data in a consistent, useable format to Nedbank data stakeholders
- handle big data technologies (Hadoop), streaming (Kafka), and data replication (IBM InfoSphere Data Replication) (see the sketch after this list)
- drive utilisation of data integration tools (Ab Initio) and cloud data integration tools (Azure Data Factory and Azure Databricks)
- Data Modelling and Schema Build: In collaboration with Data Modellers, create data models and database schemas on the Data Reservoir, Data Lake, Atomic Data Warehouse and Enterprise Data Marts.
- Nedbank Data Warehouse Automation: Automate, monitor and improve the performance of data pipelines.
- Collaboration: Collaborate with Data Analysts, Software Engineers, Data Modellers, Data Scientists, Scrum Masters, and Data Warehouse teams as part of a squad to contribute to the data architecture detail designs, take ownership of Epics end-to-end, and ensure that data solutions deliver business value.
- Data Quality and Data Governance: Ensure that reasonable data quality checks are implemented in the data pipelines to maintain a high level of data accuracy, consistency and security.
- Performance and Optimisation: Ensure the performance of the Nedbank data warehouse, integration patterns, batch and real-time jobs, streaming, and APIs.
- API Development: Build APIs that enable the Data Driven Organisation, ensuring that the data warehouse is optimised for APIs by collaborating with Software Engineers.
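As a hedged illustration of the streaming ingestion responsibilities above, the sketch below consumes a Kafka topic with the kafka-python client and appends events to a local landing file. The topic, brokers, consumer group, and paths are hypothetical; a production pipeline would batch the events and write them to the data lake or reservoir rather than a local file.

```python
# Hypothetical Kafka streaming ingestion: consume JSON events and append them
# to a date-agnostic landing file for a downstream load. Names are illustrative.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transactions.raw",                          # hypothetical topic
    bootstrap_servers=["broker1:9092"],          # hypothetical brokers
    group_id="data-reservoir-ingest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
    enable_auto_commit=True,
)

with open("landing/transactions_raw.jsonl", "a", encoding="utf-8") as out:
    for message in consumer:
        # Each record becomes one JSON line for the next pipeline stage.
        out.write(json.dumps(message.value) + "\n")
```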
Qualifications and Experience
- Matric / Grade 12 / National Senior Certificate
- Advanced Diplomas/National 1st Degrees
- Field of Study: BCom, BSc, BEng
- Any Data Science certification will be an added advantage, Coursera, Udemy, SAS Data Scientist certification, Microsoft Data Scientist.
- Total number of years of experience: 3 - 6 years
- Type of experience: Experienced at working independently within a squad, with the demonstrated knowledge and skills to deliver data outcomes without supervision.
- Experience designing, building, and maintaining data warehouses and data lakes.
- Experience with big data technologies such as Hadoop, Spark, and Hive.
- Experience with programming languages such as Python, Java, and SQL.
- Experience with relational databases and NoSQL databases.
- Experience with cloud computing platforms such as AWS, Azure, and GCP.
- Experience with data visualization tools.
- Result-driven, analytical creative thinker, with demonstrated ability for innovative problem solving.
- Data Warehousing
- Programming (Python, Java, SQL)
- Data Analysis and Data Modelling
- Data Pipelines and ETL tools (Ab Initio, ADB, ADF, SAS ETL)
- Agile Delivery
- Decision Making
- Communication
- Technical/Professional Knowledge and Skills
- Building Partnerships
For assistance, please contact the Nedbank Recruiting Team at .
Nedbank Ltd Reg No 1951/ /06. Authorised financial services and registered credit provider (NCRCP16).
Data Engineer
Posted 2 days ago
Job Description
We’re seeking a meticulous Data Engineer to join our client’s finance/operations function. In this role, you’ll design and run reliable data pipelines that ingest, clean, and transform transaction information coming from banks, payment partners, and internal applications.
This position is key to ensuring their financial datasets are complete, consistent, and fully prepared for analysis and reporting. It combines strong technical skills with an emphasis on data accuracy, control, and reconciliation.
Core Responsibilities
- Develop, enhance, and manage automated pipelines to load daily transaction files from spreadsheets, APIs, and other sources.
- Reconcile records between banks, acquirers, and payment gateways to identify and resolve discrepancies (see the sketch after this list).
- Create automated checks to catch missing, duplicate, or misaligned data.
- Transform raw inputs into standardized, finance-ready formats suitable for reporting.
- Work closely with the finance team to deliver reliable settlement and transaction summaries.
- Manage elements such as currency conversions, time zone adjustments, fees, and totals.
- Maintain detailed logs and audit trails to ensure traceability of all processing steps.
- Optionally help build internal dashboards or reporting tools to give finance better visibility.
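To illustrate the reconciliation work described in the responsibilities above, here is a minimal Pandas sketch that compares bank settlements with gateway transactions and surfaces discrepancies. The file names, transaction reference key, and column names are hypothetical.

```python
# Hypothetical bank-vs-gateway reconciliation on a common transaction reference.
import pandas as pd

bank = pd.read_csv("bank_settlements.csv", parse_dates=["value_date"])
gateway = pd.read_csv("gateway_transactions.csv", parse_dates=["captured_at"])

# Normalise both sources to one row per transaction reference.
bank_amts = bank.groupby("txn_ref", as_index=False)["amount"].sum()
gw_amts = gateway.groupby("txn_ref", as_index=False)["amount"].sum()

recon = bank_amts.merge(
    gw_amts, on="txn_ref", how="outer",
    suffixes=("_bank", "_gateway"), indicator=True,
)

missing_at_bank = recon[recon["_merge"] == "right_only"]      # gateway only
missing_at_gateway = recon[recon["_merge"] == "left_only"]    # bank only
amount_mismatches = recon[
    (recon["_merge"] == "both")
    & (recon["amount_bank"].round(2) != recon["amount_gateway"].round(2))
]

print(f"{len(missing_at_bank)} gateway transactions not settled by the bank")
print(f"{len(missing_at_gateway)} bank entries with no gateway record")
print(f"{len(amount_mismatches)} amount discrepancies to investigate")
```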
Requirements
- Minimum of 3 years’ experience in ETL, data engineering, or equivalent roles.
- Advanced Python skills (especially working with tabular/array libraries like Pandas or NumPy) and strong SQL knowledge.
- Practical background in data preparation, transformation, and quality control.
- Experience handling financial or high-volume transactional data preferred.
- Ability to deal with messy inputs, multiple file types, and varied data sources.
- High attention to detail and strong documentation habits to support audits and compliance.
- Familiarity with payment ecosystems, e-commerce settlements, or acquirer reconciliation processes.
- Exposure to cloud storage services (such as Google Drive or S3) and API-based integrations.
- Understanding of finance operations, settlement cycles, or related workflows.
- Experience with scheduling and orchestration tools (like cron or Airflow) to automate processes (see the sketch below).
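As a hedged sketch of how the daily loading, checking, and reconciliation steps above might be orchestrated with Airflow (one of the tools the last bullet mentions), the DAG below chains three placeholder tasks. The DAG id and task callables are hypothetical, and it assumes Airflow 2.4 or later.

```python
# Hypothetical daily settlement pipeline orchestrated with Airflow.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def load_transaction_files(**context):
    ...  # pull daily files from banks and gateways via API, SFTP, or cloud storage

def run_quality_checks(**context):
    ...  # flag missing, duplicate, or misaligned records

def reconcile_and_publish(**context):
    ...  # reconcile sources and publish finance-ready summaries

with DAG(
    dag_id="daily_settlement_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    load = PythonOperator(task_id="load_files",
                          python_callable=load_transaction_files)
    checks = PythonOperator(task_id="quality_checks",
                            python_callable=run_quality_checks)
    publish = PythonOperator(task_id="reconcile_and_publish",
                             python_callable=reconcile_and_publish)

    load >> checks >> publish  # enforce ordering: load, validate, then publish
```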