Lead / Staff Data Engineer - Data Platform

Apna

Software Engineering, Data Science

Bengaluru, Karnataka, India

Posted on Jun 1, 2026

Company: Apna

Team: Data Platform / Engineering

Location: Bangalore

Experience: 5-7 years

Why Join Apna

At Apna, data is central to how we build products, understand users, improve employer outcomes, power recommendations, and scale decision-making. This role gives you the opportunity to build the backbone of Apna’s data platform and influence how data is used across the company.

You will work on real-world, high-scale problems across jobs, users, employers, communities, matching, growth, and AI-driven systems.

About the Role

Apna is looking for a Lead / Staff Data Engineer to build and scale our core data platform. This role will work on large-scale data pipelines, lakehouse architecture, query platforms, workflow orchestration, and data reliability systems that power analytics, product intelligence, machine learning, business dashboards, experimentation, and operational decision-making across Apna.

We are looking for someone who can think deeply about data architecture, design reliable pipelines, improve data quality, and help build a platform that can scale with Apna’s growth.

What You’ll Own

You will be responsible for designing, building, and operating critical parts of Apna’s data platform, including:

  • Building scalable batch and near-real-time data pipelines across product, business, growth, and ML use cases.
  • Designing and improving our lakehouse architecture using technologies like Apache Hudi.
  • Working with query engines such as Presto / Trino for large-scale analytical workloads.
  • Building and maintaining orchestration workflows using Apache Airflow.
  • Creating reusable data models, curated datasets, and reliable data marts for analytics and product teams.
  • Improving data platform reliability, observability, SLA tracking, lineage, and data quality checks.
  • Optimizing storage, compute, query performance, and pipeline costs.
  • Partnering with product, analytics, ML, and backend engineering teams to understand data needs and convert them into scalable platform solutions.
  • Driving engineering standards around data modeling, schema evolution, partitioning, deduplication, backfills, replayability, and pipeline ownership.
  • Mentoring data engineers and influencing architecture decisions across teams.

What We’re Looking For

Must Have

  • Strong experience in data engineering, preferably at scale.
  • Hands-on experience with Apache Airflow or similar orchestration systems.
  • Strong knowledge of Presto / Trino or other distributed query engines.
  • Good understanding of Apache Hudi concepts such as:
    • Copy-on-write vs merge-on-read
    • Upserts and deletes
    • Incremental reads
    • Compaction
    • Clustering
    • Timeline and commits
    • Schema evolution
    • Partitioning strategy
  • Strong knowledge of distributed data processing and storage systems.
  • Ability to design and build reliable ETL / ELT pipelines.
  • Strong SQL skills and ability to debug complex data issues.
  • Good understanding of different data architectures, including:
    • Data warehouse
    • Data lake
    • Lakehouse
    • Lambda architecture
    • Kappa architecture
    • Medallion architecture
    • Event-driven data architecture
  • Experience with data modeling for analytics and reporting.
  • Strong programming skills in at least one language such as Python, Java, or Scala.
  • Ability to reason about trade-offs between freshness, cost, reliability, latency, and complexity.
  • Strong debugging and production ownership mindset.

Good to Have

  • Experience with Kafka, Spark, Flink, Hive, Iceberg, Delta Lake, or BigQuery.
  • Experience building internal data platforms or self-serve data infrastructure.
  • Experience with data quality frameworks such as Great Expectations, Deequ, Soda, or custom validation systems.
  • Exposure to ML feature pipelines or feature stores.
  • Experience with metadata management, data catalogs, lineage, and governance.
  • Experience with cloud infrastructure such as AWS, GCP, or Azure.
  • Understanding of privacy, compliance, PII handling, and access control in data systems.

What Success Looks Like

In this role, success means:

  • Critical business and product datasets are reliable, discoverable, and trusted.
  • Pipelines are observable, recoverable, and have clear SLAs.
  • Query performance improves across major analytical workloads.
  • Data freshness and quality issues decrease significantly.
  • Teams can build on top of the data platform faster without reinventing pipelines.
  • The platform can scale with Apna’s user, job, employer, and engagement data.