Senior Data Engineer job at CRDB

The Senior Data Engineer is responsible for designing, building, and maintaining the scalable data pipelines and ingestion frameworks that power the Digital Banking department. This role focuses on translating disparate, high-volume raw data streams from mobile apps, internet banking portals, and payment gateways into structured, clean, and highly optimized data stores.

The Senior Data Engineer ensures that data is consistently available, accurate, and structured to support real-time reporting, advanced business intelligence, and production-ready machine learning models.

Principal Responsibilities

  • Design, implement, and optimize scalable batch and real-time data ingestion pipelines using distributed computing frameworks like PySpark.
  • Build and maintain resilient data lakes and warehousing environments, managing storage formats (e.g., Parquet, Delta) and metadata catalogs such as a Hive Metastore backed by PostgreSQL, with data files held in object storage.
  • Structure and partition large datasets to ensure low-latency query performance for downstream consumers (BI Analysts and Data Scientists).
  • Implement strict data contract definitions, schema registries, and quality validation checks within pipelines to catch upstream system changes before they break downstream models or reports.
  • Ensure data pipelines adhere to strict banking data privacy regulations, masking sensitive customer details, managing access control levels, and archiving historical logs securely.
  • Maintain a clear, transparent data catalog that maps data lineage from core systems to final analytics tables.
  • Deploy data pipelines in containerized environments (Docker/Kubernetes).
  • Serve as the primary technical point of contact for the BI Team and Data Science Team, translating their business analytics requirements into optimized backend data assets.
  • Enforce clean, modular, and optimized SQL and Python coding standards for data engineering, ensuring thorough version control (Git) and documentation.

Qualifications Required

  • Bachelor’s degree in Computer Science, Software Engineering, Information Systems, Data Science, Statistics, Mathematics, or a related field.
  • Minimum of 3 years of professional experience as a Data Engineer or Core Database Developer, with a proven track record of managing production-grade data pipelines.
  • Advanced, hands-on experience using PySpark/Spark to extract, transform, and load massive, complex datasets.
  • Deep understanding of managing decoupled data environments, file storage formats (e.g., Parquet), and metadata catalogs (e.g., Hive Metastore).
  • Expert-level proficiency in writing and optimizing complex queries, indexing, and modeling data structures within relational engines (e.g., PostgreSQL, Oracle).
  • Strong familiarity with container tools (Docker) and modern data orchestration workflows (e.g., Apache Airflow or cron-based job scheduling).
  • Dedication to automation, building resilient architectures that can recover from network timeouts, API failures, or source data spikes without manual intervention.
  • An obsessive eye for identifying performance bottlenecks in queries and pipeline steps to minimize computing costs and execution time.
  • Excellent technical communication skills, allowing for seamless collaboration with data consumers to understand exactly how the data needs to be shaped.
  • Flexible and adaptive to market dynamics and experimentation.
  • Customer‑centric mindset.
  • Self-driven, with strong problem-solving skills.

Key Skills

  • PySpark/Spark
  • Data Lakes
  • Data Warehousing
  • Parquet
  • Delta Lake
  • Hive Metastore
  • PostgreSQL
  • Object Storage
  • SQL
  • Python
  • Docker
  • Kubernetes
  • Apache Airflow
  • Git
  • Data Privacy Regulations
  • Schema Registries
  • Data Lineage
  • Performance Optimization
  • Technical Communication

Vacancy title:
Senior Data Engineer

[Type: Full-time, Industry: Finance, Category: Computer & IT, Science & Engineering]

Jobs at:
CRDB

Deadline of this Job:
Thursday, July 9 2026

Duty Station:
Tanzania Head Office | Tanzania

Summary
Date Posted: Tuesday, June 23 2026, Base Salary: Not Disclosed

Work Hours: 8

Experience in Months: 36

Level of Education: Bachelor’s degree

Job application procedure

Interested and qualified? Click here to apply

Number of Positions: 1
