316 Data Scientist jobs in Gauteng
Big Data Data Engineer
Posted 5 days ago
Job Description
Big Data Data Engineer job vacancy in Johannesburg.
We are seeking a skilled Data Engineer to design and develop scalable data pipelines that ingest raw, unstructured JSON data from source systems and transform it into clean, structured datasets within our Hadoop-based data platform.
The ideal candidate will play a critical role in enabling data availability, quality, and usability by engineering the movement of data from the Raw Layer to the Published and Functional Layers.
Key Responsibilities:
- Design, build, and maintain robust data pipelines to ingest raw JSON data from source systems into the Hadoop Distributed File System (HDFS).
- Transform and enrich unstructured data into structured formats (e.g., Parquet, ORC) for the Published Layer using tools like PySpark, Hive, or Spark SQL.
- Develop workflows to further process and organize data into Functional Layers optimized for business reporting and analytics.
- Implement data validation, cleansing, schema enforcement, and deduplication as part of the transformation process.
- Collaborate with Data Analysts, BI Developers, and Business Users to understand data requirements and ensure datasets are production-ready.
- Optimize ETL/ELT processes for performance and reliability in a large-scale distributed environment.
- Maintain metadata, lineage, and documentation for transparency and governance.
- Monitor pipeline performance and implement error handling and alerting mechanisms.
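To illustrate the kind of Raw-to-Published step these responsibilities describe, here is a minimal PySpark sketch; the HDFS paths, column names, and JSON structure are hypothetical placeholders, not the actual platform layout.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("raw_to_published").getOrCreate()

# Read semi-structured JSON from the Raw Layer (path is illustrative).
raw = spark.read.json("hdfs:///data/raw/events/")

# Light standardisation: parse a timestamp and flatten a nested field
# (event_time, payload.customer.id, and event_id are hypothetical columns).
published = (
    raw.withColumn("event_ts", F.to_timestamp("event_time"))
       .withColumn("customer_id", F.col("payload.customer.id"))
)

# Persist as columnar Parquet for the Published Layer.
published.write.mode("overwrite").parquet("hdfs:///data/published/events/")
```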
Technical Skills & Experience:
- 3+ years of experience in data engineering or ETL development within a big data environment.
- Strong experience with Hadoop ecosystem tools: HDFS, Hive, Spark, YARN, and Sqoop.
- Proficiency in PySpark, Spark SQL, and HQL (Hive Query Language).
- Experience working with unstructured JSON data and transforming it into structured formats.
- Solid understanding of data lake architectures: Raw, Published, and Functional layers.
- Familiarity with workflow orchestration tools like Airflow, Oozie, or NiFi.
- Experience with schema design, data modeling, and partitioning strategies.
- Comfortable with version control tools (e.g., Git) and CI/CD processes.
Nice to Have:
- Experience with data cataloging and governance tools (e.g., Apache Atlas, Alation).
- Exposure to cloud-based Hadoop platforms like AWS EMR, Azure HDInsight, or GCP Dataproc.
- Experience with containerization (e.g., Docker) and/or Kubernetes for pipeline deployment.
- Familiarity with data quality frameworks (e.g., Deequ, Great Expectations).
Qualifications:
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field.
- Relevant certifications (e.g., Cloudera, Databricks, AWS Big Data) are a plus.
Big Data Data Engineer
Posted 7 days ago
Job Description
We are seeking a skilled Data Engineer to design and develop scalable data pipelines that ingest raw, unstructured JSON data from source systems and transform it into clean, structured datasets within our Hadoop-based data platform. The ideal candidate will play a critical role in enabling data availability, quality, and usability by engineering the movement of data from the Raw Layer to the Published and Functional Layers.
Key Responsibilities:
- Design, build, and maintain robust data pipelines to ingest raw JSON data from source systems into the Hadoop Distributed File System (HDFS).
- Transform and enrich unstructured data into structured formats (e.g., Parquet, ORC) for the Published Layer using tools like PySpark, Hive, or Spark SQL.
- Develop workflows to further process and organize data into Functional Layers optimized for business reporting and analytics.
- Implement data validation, cleansing, schema enforcement, and deduplication as part of the transformation process.
- Collaborate with Data Analysts, BI Developers, and Business Users to understand data requirements and ensure datasets are production-ready.
- Optimize ETL/ELT processes for performance and reliability in a large-scale distributed environment.
- Maintain metadata, lineage, and documentation for transparency and governance.
- Monitor pipeline performance and implement error handling and alerting mechanisms.
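As a rough illustration of the validation, schema enforcement, and deduplication work listed above, the following PySpark sketch enforces an explicit schema on read, deduplicates on a business key, and filters out records that fail a basic rule; the schema and field names are invented for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("published_quality_checks").getOrCreate()

# Explicit schema: fields that cannot be parsed are nulled out rather than
# silently inferred (schema and field names are hypothetical).
schema = StructType([
    StructField("event_id", StringType(), True),
    StructField("account_id", StringType(), True),
    StructField("amount", DoubleType(), True),
])

df = spark.read.schema(schema).json("hdfs:///data/raw/transactions/")

clean = (
    df.dropDuplicates(["event_id"])           # deduplication on a business key
      .filter(F.col("event_id").isNotNull())  # basic validation rules
      .filter(F.col("amount") >= 0)
)

clean.write.mode("append").parquet("hdfs:///data/published/transactions/")
```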
Technical Skills & Experience:
- 3+ years of experience in data engineering or ETL development within a big data environment.
- Strong experience with Hadoop ecosystem tools: HDFS, Hive, Spark, YARN, and Sqoop.
- Proficiency in PySpark, Spark SQL, and HQL (Hive Query Language).
- Experience working with unstructured JSON data and transforming it into structured formats.
- Solid understanding of data lake architectures: Raw, Published, and Functional layers.
- Familiarity with workflow orchestration tools like Airflow, Oozie, or NiFi.
- Experience with schema design, data modeling, and partitioning strategies.
- Comfortable with version control tools (e.g., Git) and CI/CD processes.
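For context on the orchestration tools mentioned above, a minimal Airflow (2.x-style) DAG along these lines might chain the ingestion and quality-check jobs; the DAG id, schedule, and spark-submit commands are placeholders, not an actual production workflow.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="raw_to_published_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_raw_json",
        bash_command="spark-submit /jobs/raw_to_published.py",
    )
    quality_checks = BashOperator(
        task_id="quality_checks",
        bash_command="spark-submit /jobs/published_quality_checks.py",
    )

    # Run the quality checks only after ingestion succeeds.
    ingest >> quality_checks
```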
Nice to Have:
- Experience with data cataloging and governance tools (e.g., Apache Atlas, Alation).
- Exposure to cloud-based Hadoop platforms like AWS EMR, Azure HDInsight, or GCP Dataproc.
- Experience with containerization (e.g., Docker) and/or Kubernetes for pipeline deployment.
- Familiarity with data quality frameworks (e.g., Deequ, Great Expectations).
Qualifications:
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field.
- Relevant certifications (e.g., Cloudera, Databricks, AWS Big Data) are a plus.
* In order to comply with the POPI Act, we require your permission to maintain your personal details on our database for future career opportunities. By completing and returning this form, you give PBT your consent.
* If you have not received any feedback after 2 weeks, please consider your application unsuccessful.
Research Assistant (Administrative tax data | Big Data)
Posted 2 days ago
Job Description
UNU-WIDER is seeking exceptional candidates for the position of Research Assistant, based in Pretoria, South Africa, to support the SA-TIED programme. This role involves managing and enhancing tax datasets, assisting researchers, and ensuring high standards of data confidentiality.
For the full job description and application details, please click here.
UNU offers three types of contracts: fixed-term staff positions (General Service, National Officer and Professional), Personnel Service Agreement positions (PSA), and consultant positions (CTC). For more information, see the Contract Types page.
Product Manager, Big Data & AI Engineer (Cisco & Mobile Technologies)
Posted 5 days ago
Job Description
Overview
We are seeking an experienced Big Data & AI Specialist who will drive the design, development, and deployment of intelligent data solutions. The ideal candidate will combine deep technical expertise in Big Data platforms, Artificial Intelligence, and Machine Learning with practical knowledge of Cisco networking technologies and mobile communication systems. You will work across teams to build data-driven architectures, ensure secure and scalable infrastructure, and enable actionable insights through advanced analytics.
Responsibilities:
- Design and implement robust Big Data architectures and AI-driven solutions for advanced data processing, analytics, and automation.
- Develop and deploy machine learning models, predictive analytics and scalable data pipelines.
- Collaborate closely with network engineering teams to seamlessly integrate AI solutions within Cisco networking environments.
- Optimize mobile technology platforms for real-time data collection and transmission, enabling responsive AI-driven applications.
- Manage and process large-scale datasets from diverse sources (structured, semi-structured and unstructured) ensuring data quality, security and governance.
- Deploy and maintain scalable big data platforms on cloud or on-premises infrastructure leveraging technologies such as Hadoop, Spark, Kafka and others.
- Build and deploy APIs and microservices to enable seamless delivery of AI models across mobile and network environments.
- Perform system performance tuning, troubleshooting and proactive monitoring of big data and AI platforms.
- Continuously research and adopt emerging technologies and best practices in AI, Big Data, Cisco networking solutions and mobile networks.
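As a hedged sketch of the streaming ingestion these responsibilities describe, the snippet below reads events from Kafka with Spark Structured Streaming and lands them in HDFS; the broker address, topic name, and paths are assumptions for illustration, and the job would need the spark-sql-kafka connector on its classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_telemetry_ingest").getOrCreate()

# Subscribe to a Kafka topic (broker and topic names are placeholders).
events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "network-telemetry")
         .load()
)

# Kafka delivers the payload as bytes; cast to string for downstream parsing.
parsed = events.select(F.col("value").cast("string").alias("json_payload"))

# Land the raw stream in HDFS; the checkpoint directory makes the query restartable.
query = (
    parsed.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/raw/telemetry/")
          .option("checkpointLocation", "hdfs:///checkpoints/telemetry/")
          .start()
)
query.awaitTermination()
```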
Technical Skills & Experience:
- Big Data & AI Technologies: Proficiency in big data ecosystems: Hadoop, Spark, Hive, Kafka, Flink, or equivalent technologies. Expertise in machine learning frameworks such as TensorFlow, PyTorch, and Scikit-learn. Strong experience with data science tools: Python, R, SQL, Scala. Knowledge of ETL processes and workflow orchestration tools: Airflow, NiFi.
- Cisco Networking: Cisco Certified Network Professional (CCNP) certification or equivalent practical experience. Hands-on knowledge of Cisco SD-WAN, ACI, ISE and advanced security solutions. Experience with network automation and monitoring using Cisco DNA Center, NetFlow and SNMP protocols.
- Mobile Technologies: Solid understanding of 3G, 4G LTE and 5G mobile network technologies. Experience with mobile device management (MDM), edge computing and IoT platforms. Familiarity with mobile application ecosystems and their integration with AI platforms.
- Cloud Platforms (Advantageous): Experience with cloud providers such as AWS, Azure or Google Cloud Platform specifically in Big Data and AI services. Proficiency with Kubernetes, Docker and container orchestration for scalable deployments.
- Other Competencies: Strong problem-solving and analytical skills. Excellent communication, collaboration and stakeholder management abilities. Proven ability to thrive in cross-functional agile teams.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, Telecommunications, or a related field.
- A minimum of 5 years' hands-on experience in the development and deployment of Big Data and AI/ML solutions.
- At least 3 years of proven experience working with Cisco network infrastructure.
- Prior experience in mobile technology environments or the telecommunications industry is highly advantageous.
- Relevant professional certifications are preferred including: AI/Big Data certifications (e.g., TensorFlow, Azure AI Engineer, Google Professional Data Engineer); Cisco certifications (CCNA, CCNP or higher); Mobile technology certifications (MDM, 5G, IoT platforms); Experience with Huawei Mobile Cloud is a distinct advantage.
Please send your CV to or contact.
Key Details:
- Employment Type: Full-Time
- Experience: years
- Vacancy: 1
Machine Learning AI Data Scientist
Posted today
Job Description
- Build, fine-tune, and deploy Large Language Models (LLMs) across domains.
- Develop and implement chatbots, transcription systems, and RAG pipelines.
- Design and optimise end-to-end ML lifecycles, from data preprocessing to production deployment.
- Leverage vector databases (Pinecone, FAISS, Weaviate, Milvus) for retrieval-augmented generation.
- Drive innovation by applying GenAI, NLP, and advanced ML techniques to real-world business problems.
- Education: Degree in Computer Science, Data Science, AI/ML, or related field (postgrad a bonus).
- Experience: Hands-on with LLMs, transformer architectures, and NLP frameworks.
- Technical Skills: Python, PyTorch/TensorFlow, Hugging Face, LangChain/LlamaIndex, OpenAI & Anthropic APIs, Docker, Kubernetes, AWS/Azure/GCP ML platforms, MLOps pipelines.
- Other: Strong track record in AI model training, deployment, and optimisation. Bonus if you've presented papers or talks at AI/ML conferences.
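To give a concrete flavour of the retrieval step in a RAG pipeline such as this role describes, here is a minimal sketch using FAISS and sentence-transformers, two of the tools named above; the documents, model name, and query are invented examples.

```python
import faiss
from sentence_transformers import SentenceTransformer

# Toy document store (the real corpus, model, and query are assumptions).
docs = [
    "Refunds are processed within 7 business days.",
    "Premium accounts include 24/7 support.",
    "Passwords can be reset from the login page.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, convert_to_numpy=True).astype("float32")

# Exact L2 index over the document embeddings.
index = faiss.IndexFlatL2(doc_vecs.shape[1])
index.add(doc_vecs)

query_vec = model.encode(["How do I reset my password?"], convert_to_numpy=True).astype("float32")
distances, ids = index.search(query_vec, 2)  # top-2 nearest documents

# The retrieved passages would be injected into the LLM prompt as grounding context.
context = [docs[i] for i in ids[0]]
print(context)
```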
For more exciting AI & Data vacancies, please visit:
Senior Data Scientist - Machine Learning
Posted 16 days ago
Job Description
About the Role:
You'll work across the full data science lifecycle, from understanding business needs and refining problem statements to data preparation, model development, testing, deployment, and performance monitoring. Your work will support data-led strategies, operational efficiency, and intelligent decision-making through automation and advanced analytics.
Key Responsibilities:
- Develop and deploy predictive modelling and machine learning solutions.
- Collaborate with stakeholders to define business challenges and identify analytical opportunities.
- Conduct data cleaning, transformation, feature engineering, and rigorous hypothesis testing.
- Work with engineering teams to build data pipelines and access large-scale structured and unstructured data.
- Translate data science solutions into practical business applications.
- Create impactful visualisations and reports to communicate insights to both technical and non-technical stakeholders.
- Monitor and refine models to ensure performance and business relevance over time.
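As an illustrative sketch (not taken from the role itself) of the cleaning, feature-engineering, and hypothesis-testing work described above, the snippet below uses the pandas/SciPy stack named in the requirements; the input file and column names are hypothetical.

```python
import pandas as pd
from scipy import stats

# Hypothetical input file and columns, purely for illustration.
df = pd.read_csv("customers.csv")

# Basic cleaning plus a simple engineered feature.
df = df.dropna(subset=["monthly_spend", "tenure_months", "churned"])
df["spend_per_tenure_month"] = df["monthly_spend"] / df["tenure_months"].clip(lower=1)

# Hypothesis test: do churned and retained customers differ in monthly spend?
churned = df.loc[df["churned"] == 1, "monthly_spend"]
retained = df.loc[df["churned"] == 0, "monthly_spend"]
t_stat, p_value = stats.ttest_ind(churned, retained, equal_var=False)
print(f"Welch t-test: t={t_stat:.2f}, p={p_value:.4f}")
```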
Requirements:
- Minimum:
- Honours, Master's, or PhD in a quantitative discipline (e.g. Actuarial Science, Mathematics, Computer Science, Engineering, or Statistics).
- 5–6 years of hands-on experience in data science and analytics roles.
- Strong Python and SQL proficiency, including libraries such as NumPy, Pandas, SciPy, and Matplotlib.
- Experience working with Jupyter Notebooks and in Agile development environments.
- Proven ability to operationalise machine learning models at scale.
- Preferred:
- Familiarity with cloud platforms (e.g. Azure and AWS), Spark, and big data tools.
- Professional certifications in data science and analytics technologies.
- Experience with data visualisation tools such as Power BI, Tableau, or Kibana.
- Key Competencies:
- Advanced analytical and problem-solving skills.
- Strong communication skills, with the ability to explain technical concepts to diverse audiences.
- Business acumen and the ability to align data solutions with strategic objectives.
- Initiative, adaptability, and a strong sense of accountability.
- Ability to work independently as well as collaboratively within cross-functional teams.
Ready to use data to make an impact?
Join a dynamic, forward-thinking environment where your expertise will directly contribute to innovation and strategic growth. Apply now and be part of the data-driven future.
Note: This position is open to candidates with strong technical capabilities and a proven track record in solving real-world problems with data.
Machine Learning Engineer
Posted today
Job Description
We are looking for a talented Machine Learning Engineer to join our team, responsible for developing and deploying machine learning models and algorithms that drive business growth and innovation. The successful candidate will have a strong background in machine learning, deep learning, and software engineering, with a proven track record of delivering high-quality machine learning models and algorithms. The Machine Learning Engineer will work closely with cross-functional teams, including data science, product, and engineering, to identify opportunities for machine learning-driven innovation and develop strategic plans to execute on these opportunities.
Responsibilities:
- Design, develop, and deploy machine learning models and algorithms that drive business growth and innovation
- Collaborate with data scientists to develop and implement machine learning models and algorithms
- Work with software engineers to integrate machine learning models and algorithms into production-ready software applications
- Develop and maintain large-scale machine learning systems, including data pipelines, model training, and model serving
- Optimize machine learning models and algorithms for performance, scalability, and reliability
- Stay up-to-date with the latest advancements in machine learning, deep learning, and AI, applying this knowledge to drive innovation and improvement in machine learning models and algorithms
- Collaborate with product managers to develop product roadmaps and prioritize features and requirements
- Develop and maintain relationships with key stakeholders, including business leaders, product managers, and engineering teams
- Communicate complex machine learning concepts and results to non-technical stakeholders, including business leaders and product managers
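A minimal sketch of the train-persist-serve loop these responsibilities describe, using scikit-learn, joblib, and Flask; the dataset is a toy example and the endpoint is purely illustrative of how a model might be exposed to an application.

```python
import joblib
import numpy as np
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Train and evaluate a model on a toy dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# Persist the trained artefact for the serving layer.
joblib.dump(model, "model.joblib")

# Minimal HTTP endpoint wrapping the persisted model.
app = Flask(__name__)
model = joblib.load("model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    features = np.array(request.json["features"], dtype=float).reshape(1, -1)
    return jsonify({"prediction": int(model.predict(features)[0])})

if __name__ == "__main__":
    app.run(port=8080)
```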
Requirements:
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field
- 3+ years of experience in machine learning, deep learning, or software engineering, with a focus on machine learning model development and deployment
- Strong background in machine learning, deep learning, and software engineering, with expertise in areas such as natural language processing, computer vision, or recommender systems
- Experience with machine learning frameworks and tools, such as TensorFlow, PyTorch, or Scikit-learn
- Strong programming skills in languages such as Python, Java, or C++
- Experience with cloud-based technologies, such as AWS or Google Cloud
- Strong understanding of software engineering principles, including design patterns, testing, and version control
- Strong communication and collaboration skills, with the ability to work effectively with cross-functional teams
Technical Skills:
- Machine learning frameworks: TensorFlow, PyTorch, Scikit-learn, etc.
- Deep learning frameworks: Keras, TensorFlow, PyTorch, etc.
- Cloud-based technologies: AWS, Google Cloud, Azure, etc.
Full time
Johannesburg
AI Machine Learning
Posted today
Job Description
Join to apply for the AI Machine Learning role at Blue Pearl
Job Description
Standard Bank is seeking a highly skilled AI and Machine Learning Specialist to join our innovative team. In this role, you will leverage your expertise in artificial intelligence and machine learning to develop and implement cutting-edge solutions that drive business value and enhance customer experiences.
Responsibilities
- Model Development:
- Design, develop, and train machine learning models.
- Implement AI algorithms and frameworks.
- Conduct exploratory data analysis to inform model development.
- Model Deployment and Integration:
- Deploy machine learning models into production environments.
- Integrate models with existing systems and data pipelines.
- Ensure seamless operation of deployed models.
- Data Preparation and Feature Engineering:
- Prepare and clean data for model training and evaluation.
- Perform feature engineering to enhance model performance.
- Implement data preprocessing and transformation pipelines.
- Model Evaluation and Tuning:
- Evaluate model performance using appropriate metrics.
- Tune hyperparameters and optimize model accuracy.
- Conduct A/B testing and validation of models.
- Collaboration and Stakeholder Engagement:
- Work with data scientists, engineers, and business stakeholders to understand requirements.
- Translate business problems into technical solutions.
- Communicate findings and model performance to non-technical stakeholders.
- Research and Innovation:
- Stay updated with the latest advancements in AI and ML.
- Experiment with new algorithms and techniques.
- Propose innovative solutions to business problems.
- Documentation and Reporting:
- Document model development processes and methodologies.
- Create user guides and technical documentation.
- Report on model performance and project progress.
Deliverables:
- Machine Learning Models:
- Trained and validated ML models.
- Model deployment scripts and integration guidelines.
- Documentation of model architecture and training processes.
- Data Pipelines:
- Data preprocessing and transformation pipelines.
- Feature engineering scripts.
- Documentation of data preparation steps.
- Performance Reports:
- Model performance metrics and evaluation reports.
- Hyperparameter tuning and optimization logs.
- A/B testing and validation results.
- Technical Documentation:
- User guides for deployed models.
- Technical documentation for model development and deployment.
- Maintenance and monitoring procedures.
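For illustration of the "Model Evaluation and Tuning" work listed above, a short scikit-learn sketch of hyperparameter search and evaluation might look like this; the dataset and parameter grid are toy examples, not the bank's actual models.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features, then fit a regularised classifier.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=5000))])

# Grid-search the regularisation strength with cross-validation.
search = GridSearchCV(pipe,
                      param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
                      scoring="roc_auc",
                      cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print(classification_report(y_test, search.predict(X_test)))
```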
Qualifications:
- Bachelor's degree in Computer Science, Engineering, Mathematics, or a related field. Advanced degree (e.g., Master's or PhD) preferred.
- Proven experience in developing and deploying machine learning models in a commercial or academic environment.
- Proficiency in programming languages such as Python, R, or Java.
- Strong understanding of statistical methods and data analysis techniques.
- Excellent communication skills with the ability to collaborate effectively with technical and non-technical stakeholders.
- Seniority level: Entry level
- Employment type: Full-time
- Job function: Engineering and Information Technology
- Industries: IT Services and IT Consulting