ML Data Engineer

Cairo, Egypt

Posted 15 hours ago
1 open position

Job Description

The ML Data Engineer is responsible for designing, implementing, and maintaining a centralized feature repository for scalable machine learning development. This includes building PySpark pipelines, maintaining feature lineage and metadata, ensuring governance and consistency across training and inference, and aligning the MLOps architecture through Cloudera and Hopsworks integration.

Overall Responsibilities:
• Design, build, and maintain robust data pipelines and centralized feature stores.
• Enable consistent, reusable, and governed features for ML development and inference.
• Collaborate with data scientists to transform raw data into model-ready features.
• Ensure data validation, versioning, and lineage to support explainability and trust.
• Streamline data workflows to reduce model development cycle time.
• Contribute to feature documentation, reusability frameworks, and metadata tracking.
• Support experimentation through scalable access to pre-processed and curated features.


Technical Responsibilities:
• Develop and orchestrate batch and streaming pipelines using Cloudera, Hadoop, Hive, and Spark.
• Build and manage centralized feature stores to ensure training-serving consistency.
• Implement data validation checks using tools like Great Expectations or custom scripts.
• Maintain feature lineage, version control, and data governance protocols.
• Integrate feature engineering processes with MLflow and other experiment tracking tools.
• Optimize feature pipelines for low latency and high throughput in real-time applications.
• Work with Data Scientists to improve data quality, resolve inconsistencies, and enable faster experimentation.
• Monitor feature drift, feature availability, and quality over time.


Tools & Technologies:
• Big Data & Storage: Cloudera, Hadoop, Hive, Spark, HDFS, Azure Data Lake
• Feature Store: Feast, Hopsworks, or custom implementations
• ETL Pipelines: PySpark, SQL, Airflow, Azure Pipelines
• Validation & Quality: Great Expectations, PyDeequ
• Versioning: DVC, Delta Lake
• Experiment Tracking: MLflow
• Programming Languages: Python, SQL, PySpark
• Governance & Compliance: Audit Logs, Access Control, Metadata Tracking

Job Requirements

Preferred Experience:
• At least 7-8 years of experience as a Data Engineer or ML Data Engineer.
• Experience building and managing large-scale ETL workflows for ML use cases.
• Hands-on exposure to building and using feature stores in production.
• Strong knowledge of feature governance, versioning, and schema management.


Education & Certifications:
• Bachelor’s or Master’s degree in Data Engineering, Computer Science, or related discipline.

Certifications preferred:
• Microsoft Azure Data Engineer Associate
• Cloudera Data Engineer Certification
• Databricks Data Engineer Associate
