Airflow Vs. Dagster: The Full Breakdown!

5,990 views

The Data Guy

1 year ago

In this video I'll give you a full breakdown of the differences between Airflow and Dagster so that you can make an informed decision on which solution is best for you! Hope this helps anyone out there who is trying to decide between the two!

Comments: 27
@jarredthedataengineer 3 months ago
This is an interesting video, but it is fairly inaccurate about Dagster. I'm sure that's not out of malice, but probably because OP is more familiar with Airflow. For example, Dagster is open-source, it is super extensible and modular, etc. I'd also point out a pretty important difference between Dagster and Airflow: Dagster enables a local-to-production test, build, and deploy cycle, which is not really possible with Airflow. Also, Dagster comes with a ton of automation capabilities that just aren't possible with an imperative orchestrator like Airflow. This is a pretty deep subject that requires a fair amount of knowledge from the author to really give a fair comparison, and that's somewhat lacking in this video.
@thanhbinh24 11 months ago
This video is just right on point! I had my first job as a DE recently and was tasked with migrating all the cron jobs to an orchestration tool. I was looking for the best option, and now I'm pretty sure that we'll be better off with Airflow. Thank you and keep up the good work my man
@thedataguygeorge 10 months ago
Thank you so much, happy this helped you make a decision!!
@simondelorean 9 months ago
Thank you, that was very helpful.
@thedataguygeorge 9 months ago
You're very welcome, glad it was helpful!
@luiztauffer8513 1 year ago
Hey, thanks a lot for the insightful overview! And your channel is awesome for Airflow content. I'd love to see similar comparisons with Flyte and Kestra
@thedataguygeorge 1 year ago
Thanks Luiz! Really appreciate the love! And will put them in the schedule, thanks for the idea!
@baja 10 months ago
Coming here as someone who uses Dagster daily and wants to know if Airflow is worth it, so I appreciate this comparison.
A few things on the Dagster side: for the first example, you can do exactly what you have in Airflow in Dagster. You can create branching logic by having an Op with multiple outputs (not all required) and only emitting the single one for the day of the week. You can wrap this branching Op and the specific day-of-the-week Ops in a graph and build this graph into one of the assets shown. If guitar lessons, family dinner, etc. produce assets, you can just make them their own assets and have a similar not-required feature where they only fire on their specific day of the week. In the UI you can expand the assets to their Ops and Graphs to see the branching logic.
I use this, for example, by training an ML model every Monday and then running predictions using it after. Every other day of the week, we just use the previous model for predictions without retraining.
I don't really understand the point about testing in Dagster? You can add assertions or raise errors in the Dagster Assets. There are also hooks, which are separate functions that run after the completion of an asset (these can send messages to Slack, do any quality checks, etc.; it's just a Python function), which is just nicer for keeping things separate. Most of those logs you're seeing in Dagster will be user specified, as the logger gets passed into the Asset function. I log debug info, errors, warnings, etc.
I don't really understand the last point about the Dagster API? You can run anything in Dagster. For example, if you want to trigger something in Fivetran or dbt Cloud, the Dagster code is just hitting the endpoint and polling while the computations are done elsewhere. You can set up your own APIs to do a similar thing. I don't really like how Dagster couples compute and orchestration so much, but it seems like Airflow is doing a similar thing, and you don't have to use Dagster this way.
There are IO managers to manage the data passing between assets. This doesn't have to be JSON data from an API; it can be any Python variable. I run Dagster on Kubernetes where each asset is run in its own pod, so I'll use S3 or GCS, etc. to pickle the Python objects and pass them between pods. My understanding is that this is an advantage Dagster has because it type checks the data going between pods. There are other tasks where my assets just run CLI commands, one example being running scripts in R
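A minimal, framework-free sketch of the Monday-retraining pattern described in the comment above. It is plain Python rather than Dagster's API, and `train_model`, `select_model`, and `predict` are hypothetical names chosen for illustration. In Dagster the branch would instead live in an Op with optional outputs, as the comment describes.

```python
from datetime import date
from typing import Optional

MONDAY = 0  # date.weekday() returns 0 for Monday


def train_model(day: date) -> str:
    # Stand-in for real training: returns a label naming the training day.
    return f"model-{day.isoformat()}"


def select_model(day: date, previous_model: Optional[str]) -> str:
    # Branch on the weekday: retrain on Mondays (or when no model exists yet),
    # otherwise reuse the model from the previous run.
    if day.weekday() == MONDAY or previous_model is None:
        return train_model(day)
    return previous_model


def predict(model: str, rows: list) -> list:
    # Stand-in for inference: tags each row with the model that scored it.
    return [(row, model) for row in rows]


# 2024-01-01 is a Monday, so a fresh model is trained.
monday_model = select_model(date(2024, 1, 1), previous_model=None)
# 2024-01-02 is a Tuesday, so the Monday model is reused.
tuesday_model = select_model(date(2024, 1, 2), previous_model=monday_model)
```

Only the model-selection step branches here. Prediction runs every day regardless of which path produced the model, which matches the "retrain on Monday, reuse otherwise" description.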
@thedataguygeorge 10 months ago
Wow thank you so much for that breakdown, really really appreciate it! Am planning on a revised version of this video to give Dagster more credibility after learning all these things, made the video when I was still relatively new to Dagster
@baja 10 months ago
@thedataguygeorge All good, and looking forward to the new video! It did take me a lot of time using Dagster to learn a lot of these things
@nixbruh 10 months ago
@baja one thing i have to say that sucks is let's say you want to have two ops in a job, and have them run in parallel - dagster won't let you do that if your io managers are in memory. it will force one to wait for the other. for me that defeats the whole purpose honestly. maybe i'm clueless?
@baja 10 months ago
@nixbruh This shouldn't depend on the IO manager but on the executor you're using. Are you using a multiprocess executor or in-process? I don't have an issue using multiprocess locally, and I typically use a k8s_executor when deploying. I typically use the fs_io_manager instead of in-memory locally, but again that shouldn't matter
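A rough sketch of why a filesystem-backed handoff is convenient when steps run in separate processes or pods: each output is pickled to shared storage and loaded by the downstream step. This is plain Python, not Dagster's `fs_io_manager` implementation. `PickleStore`, `save`, and `load` are hypothetical names, and the type check on load is an assumption about how such a handoff could be validated.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


class PickleStore:
    """Hypothetical file-backed store for passing values between steps."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = Path(base_dir)

    def save(self, key: str, value: Any) -> Path:
        # Persist the upstream step's output so another process can read it.
        path = self.base_dir / f"{key}.pkl"
        with path.open("wb") as f:
            pickle.dump(value, f)
        return path

    def load(self, key: str, expected_type: type) -> Any:
        # Load the value and check its type before handing it downstream.
        with (self.base_dir / f"{key}.pkl").open("rb") as f:
            value = pickle.load(f)
        if not isinstance(value, expected_type):
            raise TypeError(
                f"{key}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
        return value


with tempfile.TemporaryDirectory() as tmp:
    store = PickleStore(tmp)
    store.save("features", {"rows": [1, 2, 3]})
    features = store.load("features", dict)  # type matches, value is returned
    try:
        store.load("features", list)  # wrong expected type
        type_error_raised = False
    except TypeError:
        type_error_raised = True
```

A value held only in one process's memory cannot be read by another process, which is why a persisted handoff like this pairs naturally with multi-process or per-pod execution.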
@ofnotandi 8 months ago
Dagster is open source according to the homepage
@thedataguygeorge 8 months ago
Sorry, you're right. I think it's more of an open-core project since there's not much development outside of the Dagster company, but that's definitely up for debate!
@ricardomalla6533 4 months ago
Would Airflow be a good fit to orchestrate a couple of Python scripts to send marketing emails to our customers based on certain criteria? Is there something better for this application?
@thedataguygeorge 4 months ago
That's a great use case for Airflow! MailChimp might also be a good option for that particular use case!
@joshuasmith2814 1 year ago
Great content... (horrid audio, was your landlady vacuuming?)
@thedataguygeorge 1 year ago
Thanks Josh! And apologies, I had facade construction going on outside my window from 8-6 for the past couple of months that was really screwing me up. All done now though, hopefully it's better in recent videos!
@nixbruh 10 months ago
awesome stuff bro. question, is there any reason why not just to use these things as schedulers and just have them spin up containers that hold the code? i feel like you get tied to a specific framework and it turns into a nightmare...
@nixbruh 10 months ago
i guess the only downside is that you can stop and start parts of the code that might fail or just to run things manually? but idk if that trade off is worth it...hoping people who know what they're doing can share opinions
@thedataguygeorge 10 months ago
That is a totally valid approach, honestly one that I think Airflow excels at. A lot of the teams I see using Airflow in production are just using it to call out to other containers/services to run those jobs there, with Airflow as a centralized error-handling/monitoring layer on top, in addition to its scheduling capabilities
@datalearningsihan 1 year ago
Thank you. I feel privileged for making the video on my request. I know I know, I will take the whole of the credits :D
@thedataguygeorge 1 year ago
hahahaha no worries man, doing it all for you!
@ricardomalla6533 4 months ago
genius.
@thedataguygeorge 4 months ago
Thanks man!
@StefanoMessina-ux2mj 1 month ago
I'm pretty sure Dagster is open source
@thedataguygeorge 1 month ago
It technically is but 90% of the dev work is from the on-staff Dagster team