Data Engineer


Who we are:

We are the world’s leading ground transportation marketplace. We give travellers seamless access to ground transportation online, from search to ticket purchase. We have built a cutting-edge B2B technology platform that connects bus, rail and ferry operators in 70+ countries with the biggest online retailers including Google Maps and Booking.com. 

We are shaping the future of travel and building the largest global network of transport providers and retailers. Having grown 10x in the past year, we are one of the fastest-growing startups in travel. Backed by three leading VCs (Creandum, Northzone and Lightrock) and following our recent €30m Series B, we are ready to push beyond.

Do you want to work in an advanced tech environment and have an impact on millions of travellers around the globe? Come join us!


Our tech stack

  • Big Data: Data Lake on GCP (BigQuery and Cloud Storage, with Airflow for batch data processing, Kafka for data streaming, and GDS for visualization)
  • Web Services: FastAPI for the backend and Vue.js for the frontend
  • Machine Learning: plain old statistical models, black-box models, Pyro, PyTorch

Who you are:

  • Degree in Computer Science or a related field
  • 2+ years of data engineering experience (data ingestion and data processing pipelines)
  • Experience in the data management domain: data quality, data lineage, and data security
  • Excellent knowledge of Python and SQL
  • Experience with Airflow, Kafka and Kafka Connect, FastAPI, GitLab
  • Ability to break down complex problems and projects into manageable goals
  • Problem-solving, process improvement, and analytical skills


What you will do:

  • Take full ownership of our Data Lake: from building new ELT and ETL pipelines and streaming ingestion to evolving the infrastructure and expanding the Data Lake toolset [approx. 60% of your time]

  • Partner with the Platform, Integrations and White Label engineering teams to improve data visibility and transparency; evaluate information gathered from multiple sources, identify gaps, reconcile conflicts, and break high-level information down into details to drive decision-making [approx. 20% of your time]

  • Drive data quality analysis, reporting, and monitoring across core processes and data assets [approx. 10% of your time]

  • Establish and maintain a metadata dictionary for the core metadata elements [approx. 10% of your time]


Our impact on society and the environment:

There are still 100m tickets sold offline every day. We are making ground transportation more accessible online and more convenient for millions of travellers. Transportation accounts for nearly 20% of global CO2 emissions, and each trip booked via Distribusion’s platform that replaces a more polluting and congested mode of transport, such as a taxi, private car, or plane, directly contributes to reducing global carbon emissions.

Apply here