Check out my boot camp / course at DataExpert.io where you can learn all this in much more detail! Use code PROMOTION15 at checkout by April 7th to get 15% off! #dataengineering #netflix
Comments: 358
@sevrantw8931 3 months ago
I’m so glad I found this video, I was just sitting here with 60 million gigabytes and was figuring out what joins to use so this was perfect timing.
@aripapas1098 3 months ago
if all u registered was 60 mil gb & joins ur not flowing
@smackastan5697 3 months ago
You're kidding, but somehow I just started a data analysis project of two terabytes and this video shows up.
@hi-mn5rg 3 months ago
@@aripapas1098 if you think comments must indicate a user registered every aspect of a video, ur not following
@derickd6150 3 months ago
@@aripapas1098 this is a sad comment
@00Tenrai00 3 months ago
Sarcasm ???? 😂
@bilbobeutlin3405 3 months ago
Can't wait to build hyperscale pipelines for my startup with 0 users
@92kosta 3 months ago
But it sounds powerful when you say it, like you mean business.
@npc-drew 3 months ago
Based
@vikingthedude 3 months ago
1 user (me)
@JGComments 3 months ago
If you build it, they will come.
@abhilashpatel6852 3 months ago
I have 1k TB data just sitting around in my backyard. Glad your video came up to get me started on at least something.
@subhasishsarkar5106 3 months ago
What I absolutely love about your videos is that as a beginner in the data engineering field, you often talk about things that I had no conception of. In this video for example, I have never heard of SMBs or broadcast joins. This gives me an opportunity to learn these things, even just hearing them mentioned by someone as widely experienced as you. You don't necessarily even have to go into detail, but these short form videos act as beacons of knowledge that I can throw myself into learning about. Thanks a lot, and keep these coming Zach!
@EcZachly_ 3 months ago
Really appreciate this comment! It reminds me that the value I'm putting out there is important!
@vasudevreddy3527 3 months ago
@@EcZachly_ ✌
@eric.batdorff 3 months ago
Great summation! I was thinking the exact same thing while watching. It's nice hearing even the specialized lingo from technical experts in their fields; it piques my curiosity.
@MrAmitkr007 3 months ago
@@EcZachly_ thanks
@prawtism 3 months ago
@@EcZachly_ did you already know the importance of these two before Netflix or did you learn that while working at Netflix?
@supercompooper 3 months ago
In the future a wrist watch will have a little blinking light that will have 60 million gigabytes of data in it
@dhillaz 3 months ago
You mean an Electron app?
@aripapas1098 3 months ago
yeah okay crack smoker
@mrevilducky 3 months ago
And it will still lag and hit 99% singularities
@Ivan-Bagrintsev 3 months ago
@@dhillaz that will just show current time
@supercompooper 2 months ago
@@Ivan-Bagrintsev Yes it will show the time, but with full DRM. Unless you have a license to view certain minutes it will be denied.
@lucas.p.f 3 months ago
Boyfriend simulator: you sit with your bf and he starts talking about this nerdy stuff you have no idea about but need to keep listening because you love him
@EcZachly_ 2 months ago
This is exactly correct
@CU.SpaceCowboy 2 months ago
aww 🥰
@heykike 1 month ago
After marriage they no longer pretend to listen
@rajns8643 1 month ago
If only a girl would fall for me when I speak nerdy stuff 🫠
@lucas.p.f 1 month ago
@@rajns8643 are you kidding me? This is what most people like the most! Intelligent people are extremely attractive
@supafiyalaito 3 months ago
Thanks Zach, hopefully one day I will understand what all of that means
@Bostonaholic 3 months ago
I love that you kept it short and to the point.
@tobiastho9639 3 months ago
He sure wanted to save some data… 😅
@RichardOles 3 months ago
Holy crap. I’m currently learning about data science, the various roles, etc., with the hope of one day switching careers. But the current state of learning is all about the languages and software used etc, not about the infrastructure and what to do with massive datasets. So this just 🤯
@samuelisaacs7557 1 month ago
It's really about math but no one talks about it. Get at least 1 year of university math comprehension and then get into the Python and tech tools. The most competent and successful data engineers are always people with a good STEM background. For example, Zach has a Bachelor's Degree in Applied Mathematics and a Bachelor's Degree in Computer Science, so he is a heavy numbers guy. That's what most Data Science / Engineering YouTubers don't tell their viewers, because that would cause them to lose viewers.
@byRoyalty 1 month ago
learning the tools can be very different from solving real world problems.
@rajns8643 1 month ago
@@samuelisaacs7557 True asf
@stevess7777 1 month ago
@@samuelisaacs7557 Yep, even a business administration bachelor's will have a lot of maths and it's nowhere near data science, which is 3x that.
@WM-eg4gh 3 months ago
Thank you Zach for taking the time to give us the hard truth and hand down your experience. It helps a lot of enthusiastic students/people to know how we can in some way support or help others in the subjects we like. I don't imagine myself processing 2000TBs per day, but it helps give a bigger picture. Once again, appreciate the short video and thank you for sharing
@mohammedaamer4201 3 months ago
Just started following you. Really appreciate you for sharing your knowledge with the community.
@rembautimes8808 3 months ago
Great content, an honour to be able to listen to someone who has handled that volume of data.
@stifflery 3 months ago
literally 🎉
@codecaine 2 months ago
Have ChatGPT or some other LLM explain it to you.
@Adhanks91 3 months ago
Informative and straight to the point, great stuff as usual
@JT-zb6vi 3 months ago
instant subscribe - really appreciate the concise explanation and clear examples
@LambOverSpicyRice 3 months ago
Excellent video, thanks Zach!
@rohanbhakat2922 3 months ago
Thanks for the info Zach. Could you please make a detailed video on the SMB join?
@jacobp8294 3 months ago
I am a regional IT installer who runs Cat6 Ethernet pipelines for managing 1gb loads on HP laptops, this video is really awesome and breaks down your workflow and mindset in a complicated field really efficiently. I would love to get more short videos about the industry like this.
@EcZachly_ 2 months ago
I'll keep them coming. I make much more on Tiktok and Instagram since I like making vertical content!
@jacobp8294 2 months ago
@@EcZachly_ I'll check it out! Keep it up!
@tanujkhochare3498 3 months ago
Hey Zach, your content is consistently amazing! As a newcomer to the field, I'm considering diving into data engineering. What roadmap would you recommend, and are there any certifications that could enhance my journey? I already have a solid grasp of Python and SQL in data analysis.
@sharpsrain8302 3 months ago
I just found ur stuff but thanks for the content mang keep it up 🙏
@SahilKashyap64 3 months ago
I've never heard of these terms, thank you for sharing your real case scenarios (the FB notification example).
@oakleyorbit 22 days ago
Half of what you said I had no idea what you were talking about but I was very engaged and now I’m gonna look all this stuff up for centering my div!
@souravghosh358 3 months ago
Very important concept in such short time.. thank u so very much ❤
@vinit.khandelwal 3 months ago
Thanks, looking forward to more such content
@ArjunRajaS 2 months ago
If you come across a scenario where you need to join 2 large datasets, you could do an iterative broadcast join. Basically, you break one of the DataFrames into multiple smaller DataFrames and join them against the other DataFrame in a loop until all the pieces are joined.
@jordanmessec5332 2 months ago
You’ll require a lot of memory and have long start times, no?
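Editor's note: the iterative broadcast join described in this thread can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark code and not anyone's production method: rows are dicts, and the names `chunked` and `iterative_broadcast_join` are made up for the example. Only one chunk's hash table is held in memory per pass, at the cost of scanning the big side once per chunk.

```python
from collections import defaultdict
from typing import Dict, Iterator, List

Row = Dict[str, object]


def chunked(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of `rows` with at most `size` rows each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def iterative_broadcast_join(
    big: List[Row], other: List[Row], key: str, chunk_size: int
) -> List[Row]:
    """Inner-join `big` with `other`, broadcasting `other` one chunk at a time."""
    joined: List[Row] = []
    for chunk in chunked(other, chunk_size):
        # Build an in-memory hash table for this chunk only (the "broadcast" piece).
        lookup = defaultdict(list)
        for row in chunk:
            lookup[row[key]].append(row)
        # Scan the big side once per chunk and probe the hash table.
        for row in big:
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined


# Hypothetical example data.
events = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
users = [{"id": 1, "b": 10}, {"id": 3, "b": 30}, {"id": 4, "b": 40}]
result = iterative_broadcast_join(events, users, key="id", chunk_size=2)
```

The trade-off raised in the reply shows up directly in the loop structure: a smaller `chunk_size` lowers peak memory but adds more passes over the big table.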
@dazzassti 3 months ago
In the 37 years I’ve been working in data, I’ve never heard anyone call it Peter 😂. PETA
@anotherguy9402 2 months ago
What's wrong with a Peter bite?
@divinecomedian2 2 months ago
Heya Peeda
@Starmast3rmusic 2 months ago
Could be an accent or a slip 😂
@ChrisMPerry 3 months ago
Insightful as always.💯
@EcZachly_ 3 months ago
Appreciate that!
@RyanSaplanPT 3 months ago
Please more data stuff!!! I hardly understood what you said, but it sounds interesting
@nikolagrkovic8769 2 months ago
The amount of knowledge you shared here is astonishing
@arbol41 2 months ago
Thanks Zach, but I have a question: a broadcast join is used when we have a small dimension table joined with a big table. Is this your case? Or are you using a hash join with two large tables?
@Jc12x06 3 months ago
Dude has beef with Bezos😂
@theAnupamAnandoriginal 3 months ago
you can make a BIOS optimized for throughput and without interrupts, to speed things up 67x and more
@maggiejetson7904 20 days ago
Honestly, 2000 TB per day isn't the problem. The problem is the cost and how much of the data is burst. If it is not burst it is pretty much always cheaper to do it in-house with your own hardware than to pay and rent the cloud to do it.
@Llanowyn 3 months ago
I would be interested in the architecture and content delivery for pre and post cdn from a network design perspective. Are there any examples or presentations regarding networking at netflix?
@solitary2001 4 days ago
Great points to remember! There are a lot more underlying abstraction layers you can add at these different points to further optimize the second network hop. Caching is a simple one. Can you implement an efficient snapshot system with delta encoding of entities and compress the message? Would be a cool video for you to implement!
@dungenwalkerr619 1 month ago
Thanks for sharing, now I can finally put some good numbers on my resume 🎉
@ATX_Engineer 23 days ago
Ah yes, data structures and sorting… but with the “can you even scale bro” tick enabled.
@JGComments 3 months ago
2 pita bites a day, the same as me when I’m on a diet.😊
@theactualslimshady 3 months ago
Please keep up the great content!
@explosivecl 3 months ago
Thanks for the video
@internetcancer1672 3 months ago
My problem is how do people even find out about the careers that they go into?
@joshi1q2w3e 1 month ago
Did Facebook use Databricks or did they have HPC Clusters for you to run Spark on?
@remo 1 month ago
Damn I just wanted to shuffle like there’s no tomorrow and then I found this video.
@earthling_parth 3 months ago
Imma wait for Primeagen to confirm this as well when he reacts to this video inevitably 😁
@vikrampandit2174 3 months ago
Never thought broadcast join is a Netflix saviour
@john_paul 2 months ago
I love how you acronym Sorted Bucket Merge as SMB. Think you may have had Super Mario Bros on the mind 😂
@IAmAlpharius14 1 month ago
Sir this is a Wendy's.
@OurNewestMember 1 month ago
Interesting! I would have thought something like sharding (or partitioning and clustering) so data processing and access can scale horizontally.
@EcZachly_ 1 month ago
Bucketing and clustering are similar
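Editor's note: a rough sketch of why bucketing helps, in plain Python. It assumes both tables are written into the same number of buckets by hashing the join key and sorted within each bucket, so matching keys always land in the same bucket index and each pair of buckets can be merge-joined independently. All function names here are illustrative, and `zlib.crc32` stands in for whatever hash a real engine uses.

```python
import zlib
from typing import Dict, List

Row = Dict[str, object]


def bucket_of(key_value: object, num_buckets: int) -> int:
    """Deterministic bucket index for a key (crc32 is an arbitrary choice here)."""
    return zlib.crc32(str(key_value).encode("utf-8")) % num_buckets


def write_buckets(rows: List[Row], key: str, num_buckets: int) -> List[List[Row]]:
    """Split rows into hash buckets and sort each bucket by the join key."""
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key], num_buckets)].append(row)
    for bucket in buckets:
        bucket.sort(key=lambda r: r[key])
    return buckets


def merge_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner merge join of two lists that are already sorted by `key`."""
    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row that carries the same key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def bucketed_join(left_buckets: List[List[Row]], right_buckets: List[List[Row]], key: str) -> List[Row]:
    """Join bucket i of the left side only with bucket i of the right side."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both sides must use the same number of buckets")
    out: List[Row] = []
    for lb, rb in zip(left_buckets, right_buckets):
        out.extend(merge_join(lb, rb, key))
    return out
```

Because the bucket layout is decided once at write time, the join itself never has to move rows between buckets, which is the shuffle that this style of join avoids.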
@aamadmi5848 3 months ago
Thanks Zach for the video
@seegreen6484 2 months ago
I love that I’m only a software engineer but I can understand all of this
@rashshawn779 3 months ago
Very nice. Short and sweet.
@EcZachly_ 3 months ago
Glad you enjoyed it
@TheInterestingInformer 2 months ago
I’m trying to get into data analytics and most of this went over my head but this still sounds lit 🔥
@hearhaw 23 days ago
I'd like to learn more about these pitabytes. What are they? What do they taste like?
@TLOGhx 3 months ago
Insanely valuable content
@uwize5897 2 months ago
optimizing selling personal data to minimize cost is something i never thought about
@MFsyrup 2 months ago
Thank you Tony Hawk, very cool!
@liamvstech 3 months ago
When I was hired to do data engineering, it was always data that could fit on a single hard drive and it was boring af. I hated it. This sounds way more challenging and interesting.
@ChuckNorris-lf6vo 3 months ago
Hi, what about replacing torrents with IPFS? That's data pipelining, right ?
@TheDa6781 28 days ago
Managing retention, storage and flow is always important. Im sitting on a toilet as im writing this.
@narbwow8168 3 months ago
Pretty interesting, even though I had no idea about most of what he was talking about.
@user-op5vc9qw6o 3 months ago
That's cool bro. Will it fix the Netflix app where it shows the title of one show but the preview and description of another?
@EcZachly_ 3 months ago
It was to look at network traffic to keep your credit card data secure
@SamCyanide 3 months ago
My medical science clients called, they need an 800tb imaging data set parsed by end of day (thank you kubernetes)
@dark_lord98 3 months ago
Are those joins available in MySQL or specific to the DBMS you worked with at Meta?
@juanbrekesgregoris4405 3 months ago
I think they're not available on MySQL because it's an OLTP database. Those joins are used for analytics
@jordanmessec5332 2 months ago
These are not database joins, they are processing joins. Frameworks such as Flink and Spark would leverage broadcasts. It basically boils down to a single coordinator instance that publishes a small, often changing dataset to all parallel processors. Usually used to enrich, prune, or map the main dataset.
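Editor's note: the broadcast idea in this reply can be illustrated without Spark or Flink. The sketch below is plain Python under stated assumptions: the big dataset is already split into partitions (lists of dict rows), the small table has unique keys, and `broadcast_join` is a made-up name. Each partition is enriched against its own copy of the small lookup table, so the big side never has to be repartitioned.

```python
from typing import Dict, List

Row = Dict[str, object]


def broadcast_join(partitions: List[List[Row]], small_table: List[Row], key: str) -> List[List[Row]]:
    """Inner-join every partition of the big dataset against a small table.

    The small table is turned into a hash map once; in a real cluster that map
    would be shipped ("broadcast") to every worker. Assumes unique keys in
    `small_table`.
    """
    lookup = {row[key]: row for row in small_table}
    joined_partitions: List[List[Row]] = []
    for partition in partitions:
        joined: List[Row] = []
        for row in partition:
            dim = lookup.get(row[key])
            if dim is not None:  # inner join: drop rows with no match
                joined.append({**row, **dim})
        joined_partitions.append(joined)
    return joined_partitions


# Hypothetical example: two partitions of events enriched with a tiny country table.
partitions = [
    [{"country_id": 1, "bytes": 500}, {"country_id": 2, "bytes": 70}],
    [{"country_id": 1, "bytes": 20}, {"country_id": 7, "bytes": 5}],
]
countries = [{"country_id": 1, "name": "US"}, {"country_id": 2, "name": "BR"}]
enriched = broadcast_join(partitions, countries, key="country_id")
```

In PySpark the same intent is usually expressed with the `broadcast` hint from `pyspark.sql.functions`, for example `big_df.join(broadcast(small_df), "country_id")`; the pure-Python version above only mirrors the mechanics.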
@bacfjib9874 3 months ago
Very informative. I want to ask you, which certification can help me as a fresh graduate? Is the AWS data engineer certification worth it or not? And thanks a lot Zach
@EcZachly_ 3 months ago
It’s pretty great!
@_sonicfive 2 months ago
Whenever I hold on to more than 60 petabytes I just call the assistant to the regional manager and he runs a fix from his mainframe.
@iloos7457 3 months ago
Hey are you familiar with cosmosDB from azure? It's a db like mongo but claims to be able to scale infinitely... What are your thoughts on that?
@orppranator5230 3 months ago
Bro can figure out how to send my entire homework folder in 1/500th of a second but can’t flip the camera sideways
@sneakybutpirate 2 months ago
Oh yeah that’s really great and insightful, now what’s a join?
@schwarzie2478 3 months ago
I just felt like drinking from the fountain of knowledge and instantly drowning. Definitely haven't had to deal with these kind of volumes yet...
@GnomeEU 25 days ago
Now I just need a billion dollar company to have these kinda problems. My question would be, why do you have tables that big? Can't you distribute or cluster your data? I'm thinking like 10000 users per server. Only stuff around those 10k users gets stored. No magic needed to query stuff.
@EcZachly_ 25 days ago
Gotta analyze it all together though
@theAnupamAnandoriginal 3 months ago
: multiple streams across entire ddrs directly accessible
@GameCyborgCh 1 month ago
gotta love a good pita byte
@xasm83 2 months ago
my data pipeline usually processes one pitabyte every other day and one shawarmabyte every week
@emerald42481 1 month ago
Very useful and interesting, even to a layman
@GeneralKenobi69420 3 months ago
The Venn diagram of people who use TikTok and data scientists is two circles my dude lol
@EcZachly_ 3 months ago
I have 66k followers on TikTok and this video did 375k views there.
@TheGoodContent37 1 month ago
Love the way you tried to make it sound more complicated than it actually is and failed.
@LucTaylor 2 months ago
I might get 5 users on my site this month so this will come in handy
@3dilson 3 months ago
"FNA developer" I'm sorry, my brain couldn't let go of it
@phitsf5475 2 months ago
The internet is not something you just dump something on, it's not a big truck. It's a series of tubes.
@picdu2891 1 month ago
I love technology and I know more than your average user, yet I have no IT qualifications and I am light years away from this knowledge, but for some reason, I love watching these videos as if I was ever going to use the information 😂
@49erman2 3 months ago
Quality content!
@bandanaboii3136 2 months ago
Interviewer: name 5 data types Me:
@cry2love 2 months ago
I still bite my gigas when my man hustling meta in peta
@nat.serrano 1 month ago
This guy earned his half a million salary. I tried to do this myself and failed
@ungeschaut 3 months ago
I use just a database with just value as field (long string) and nothing else
@Hishamhh93 27 days ago
Bro is the PewDiePie of data Engineering
@Kusagrass 3 months ago
People don’t know the data they collect is very volatile, unless you are paying for it.
@chrism3790 1 month ago
What engine were you using to do these massive joins? Spark?
@EcZachly_ 1 month ago
Yep!
@DxWangZ 3 months ago
I don't quite understand why Netflix needs data pipelines.
@tschaderdstrom2145 2 months ago
I love pita bites as much as the next guy, but I don't think I can take more than 35 before I'm full
@AkhilSharmaTech 3 months ago
Yes but why does he look like a French model
@manh9105 3 months ago
ok, so how to do that ...can you make a screencast and show us how to do it!
@mikishwagg 3 months ago
Me watching this not knowing anything hes talking about makes me feel like starting a big tech company 😀
@Manhunternew 3 months ago
How do you deal with log data
@YishuaiLiu 3 months ago
Short and informative
@EcZachly_ 3 months ago
Thank you! What other videos would you like to see from me?
@PySnek 3 months ago
That's around 185 Gbit/s. Enough for 30K 1080p streams or 10K in 4K.
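Editor's note: a quick check of the unit conversion, assuming decimal units (1 TB = 10^12 bytes, 1 Gbit = 10^9 bits) and the 2000 TB per day figure mentioned elsewhere in the thread. The function name is made up for the example.

```python
SECONDS_PER_DAY = 86_400


def tb_per_day_to_gbit_per_s(tb_per_day: float) -> float:
    """Convert a daily volume in terabytes to an average rate in gigabits per second."""
    bytes_per_day = tb_per_day * 10**12
    bits_per_second = bytes_per_day * 8 / SECONDS_PER_DAY
    return bits_per_second / 10**9


rate = tb_per_day_to_gbit_per_s(2000)
print(round(rate, 1))  # 185.2
```

This is an average over the whole day; real traffic is bursty, so peak rates would be higher.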
@Kvltklassik 1 month ago
I built data pipelines at Netflix that ran 2000000000 MBs per day
@dexnow 26 days ago
I suddenly feel like pita bread...
@aarjunpp 2 months ago
1. Are you a data engineer? 2. What tech is this? AWS, Snowflake?
@sergeikulikov4412 3 months ago
You shouldn't write "s" in terabytes per hour, just TB/hr. "TBs/hr" looks like "terabyte*second / hour" 😅
@user-to4md9xm2d 3 months ago
Hey, absolutely curious about the content you are doing. In my company we are working with dbt and Snowflake. I can't find a way to work with broadcast joins there. Do you see a possibility to replicate this process?
@EcZachly_ 3 months ago
Snowflake isn’t suitable for volumes >100 TB in my opinion. Clustering is an option in Snowflake that helps though
@tlalepm 3 months ago
My tech lead keeps talking about bucketing as our integration solution tends to get overloaded sometimes. This kinda puts things into perspective. Definitely don't need most of what he’s talking about, but it's good to know the terms and how to implement them