Data Pipelines using Airflow on Kubernetes
With ML being the new shiny object, no one really talks about the data pipelines that make the data consumable in the first place. A single, streamlined data pipeline may sound cozy, but nobody wants to sit through 10 hours of ingesting 2 million records, do you? Trust me, you don't. The solution? Distributed data pipelines.
Airflow is an open-source platform to create, schedule, and monitor workflows. In this talk we will explore:
- What is Airflow?
- How to create data pipelines
- How to exploit Kubernetes to achieve performance
- Pros and cons of the approach
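To make the Kubernetes angle concrete, here is a minimal sketch of the kind of DAG the talk is about: each task runs in its own Kubernetes pod, so ingest work can be sharded and executed in parallel rather than as one long serial job. This assumes Apache Airflow with the `cncf.kubernetes` provider installed; the DAG id, container image, and `--shard` argument are hypothetical placeholders, not anything prescribed by the talk.

```python
# Sketch only: assumes apache-airflow plus the cncf.kubernetes provider.
# The image name and shard arguments below are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="distributed_ingest",       # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule=None,                     # trigger manually for this sketch
    catchup=False,
) as dag:
    # Launch one pod per shard; Kubernetes schedules them concurrently,
    # which is where the speedup over a single streamlined pipeline comes from.
    ingest_tasks = [
        KubernetesPodOperator(
            task_id=f"ingest_shard_{i}",
            name=f"ingest-shard-{i}",
            image="my-registry/ingest:latest",  # hypothetical ingest image
            arguments=["--shard", str(i), "--total-shards", "4"],
        )
        for i in range(4)
    ]
```

Splitting the 2-million-record ingest across pods like this is the basic pattern; the talk's pros and cons (pod startup overhead, cluster resource limits) apply directly to this design.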