Site Reliability Engineer job at Vodacom

Join Us

At Vodafone, we’re not just shaping the future of connectivity for our customers – we’re shaping the future for everyone who joins our team. When you work with us, you’re part of a global mission to connect people, solve complex challenges, and create a sustainable and more inclusive world. If you want to grow your career whilst finding the perfect balance between work and life, Vodafone offers the opportunities to help you belong and make a real impact.

What you’ll do

The Site Reliability Engineer ensures the scalability, performance, and reliability of big data platforms (Hadoop, Spark, Flink, Kafka, etc.) by bridging software engineering and operations. The role focuses on automation, monitoring, incident management, fault tolerance, and disaster recovery to maintain high availability across data clusters. Additionally, it involves proactively resolving bottlenecks, enforcing SLAs, optimizing resources, and securing data pipelines, enabling efficient, continuous, and reliable delivery of analytics and data-driven services at scale.

Key accountabilities and decision ownership

  • Platform Reliability and Performance: Ensure high availability, scalability, and optimal performance of big data platforms (e.g., Hadoop, Spark, Kafka, HDFS, Iceberg) through proactive monitoring, tuning, and capacity management.
  • Automation and Infrastructure as Code: Design and implement automated deployment, configuration, and recovery processes using tools like Ansible, Terraform, or Kubernetes to improve operational efficiency and reduce human error.
  • Incident Management and Root Cause Analysis: Lead incident response for critical big data systems, perform detailed post-incident reviews, and implement corrective actions to prevent recurrence.
  • Observability and Monitoring: Develop and maintain comprehensive observability frameworks (metrics, logging, alerting) using tools such as Prometheus, Grafana, or ELK to ensure early detection of anomalies and service degradations.
  • Security, Compliance, and Change Governance: Enforce data platform security controls, manage configuration changes, and ensure compliance with organizational and regulatory standards for data protection and access.

Who you are

Core competencies, knowledge, and experience

  • Technical Expertise: Strong hands-on experience with big data ecosystems (Hadoop, Spark, Kafka, HDFS, Hive, Flink, Iceberg, Trino) and distributed systems performance tuning, troubleshooting, and optimization.
  • Reliability Engineering: Proficient in applying SRE principles—automation, monitoring, incident response, fault tolerance, and resilience engineering—to maintain system uptime and reliability.
  • DevOps & Automation: Skilled in CI/CD pipelines, Infrastructure as Code (e.g. Terraform, Ansible), and container orchestration platforms such as Kubernetes for scalable data workloads.
  • Monitoring & Observability: Deep understanding of metrics collection, alerting, and visualization tools (Prometheus, Grafana) to ensure proactive system health management.
  • Security & Governance: Knowledge of authentication and authorization frameworks (Kerberos, Ranger, OAuth2), encryption standards, and compliance best practices for big data environments.
  • Collaboration & Communication: Strong problem-solving, documentation, and cross-functional communication skills, enabling effective collaboration with data engineers, platform teams, security teams, and other key stakeholders.

Must-have technical/professional qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, Computer Engineering, or equivalent.
  • 2+ years of experience in Site Reliability Engineering, DevOps, or Big Data Platform Engineering within large-scale, distributed environments.
  • Proven experience managing and optimizing Hadoop ecosystem components (HDFS, Hive, Spark, Flink, Iceberg, Trino, etc.).
  • Hands-on experience with Linux systems administration, network troubleshooting, and performance optimization in production clusters.

Not a perfect fit?

Worried that you don’t meet all the desired criteria exactly? At Vodafone we are passionate about empowering people and creating a workplace where everyone can thrive, whatever their personal or professional background. If you’re excited about this role but your experience doesn’t align exactly with every part of the job description, we encourage you to apply anyway, as you may be the right candidate for this role or another opportunity.

Vacancy title:
Site Reliability Engineer

[Type: FULL_TIME, Industry: Telecommunications, Category: Computer & IT, Science & Engineering]

Jobs at:
Vodacom

Deadline of this Job:
Sunday, February 15, 2026

Duty Station:
Dar es Salaam | Dar es Salaam

Summary
Date Posted: Tuesday, February 3, 2026; Base Salary: Not Disclosed

Work Hours: 8

Experience in Months: 24

Level of Education: Bachelor’s degree

Job application procedure

Application Link: Click Here to Apply Now

Job Info
Job Category: Engineering jobs in Tanzania
Job Type: Full-time
No of Jobs: 1
Caution: Never Pay Money in a Recruitment Process.

Some smart scams can trick you into paying for Psychometric Tests.