Apache Spark - Computerphile

  244,851 views

Computerphile

5 years ago

Analysing big data stored on a cluster is not easy. Spark allows you to do so much more than just MapReduce. Rebecca Tickle takes us through some code.
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments: 71
@notangryjustdismayed 5 years ago
note to the editor: please stop cutting away from the code so quickly. we're trying to follow along in the code based on what she's saying. at that moment, we don't need to cut back to the shot of her face. we can still hear her voice in the voiceover.
@SilentScream321 5 years ago
I think the time the code was displayed when she went through each line was quite sufficient. The code is very readable (except for the typo where "words" suddenly became "splitLines") and reading the code while she explains would most likely only distract you from the explanation she is giving IMHO. If you are looking for a more practical solution I would recommend you just pause the video and read the code before she explains it step by step.
@foorack 5 years ago
Fully agree. The quick switching was very annoying when trying to read the code. Also would be helpful if the editor could highlight the active line she is talking about.
@trotterdotpoulpe 5 years ago
Yeah thank you.
@Hourai 5 years ago
The RDD API is outmoded as of Spark 2.0 and in almost every use case you should be using the Dataset API. You lose out on a lot of improvements and optimizations using RDDs instead of Datasets.
@Technomancr 5 years ago
Can you do Apache Kafka next? How do they compare?
@Gooberpatrol66 5 years ago
I understood some of those words.
@0xIAMROOT 5 years ago
ahh.. so refreshing after taking a week break from dev work and staying away from non dev topics. Lol, I love our field. Like music to my ears
@Bolt6265 5 years ago
pretty sure there's a typo in that code. "splitLines" doesn't exist and is probably supposed to be words.map(...) instead
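The typo the comments describe sits in an ordinary word-count pipeline. A minimal sketch of that pipeline in plain Python (not the Spark code shown in the video; `word_count` and the sample lines are made up for illustration), with each step named after the Spark operation it stands in for:

```python
from collections import Counter


def word_count(lines):
    """Count words in a list of text lines, step by step."""
    # flatMap step: split every line into words and flatten into one list.
    words = [word for line in lines for word in line.split()]
    # map step: pair each word with a count of 1.
    # This is the step where the variable should be `words`.
    pairs = [(word, 1) for word in words]
    # reduceByKey step: add up the counts that share the same word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


counts = word_count(["to be or not", "to be"])
print(counts)
```

Everything here runs on one machine; the point is only the order of the steps, not how Spark distributes them.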
@recklessroges 5 years ago
Is there any meta analysis on the usefulness of bigdata analysis? How often do jobs get run that either produce no meaningful data or don't produce any statistically significant data?
@tablit. 3 years ago
Wow congrats on the content. You were able to explain it in a concise, yet logical and detailed way. nice
@alexkompos1735 5 years ago
These data ones are really good! Keep them coming!
@xakkep9000 5 years ago
It's so clear and easy after the explanation! I will be waiting for more vids about clustering and distributed computing)
@tackline 5 years ago
A great example of how programming languages are a reasonably efficient mechanism to communicate sections of program and how natural language really is not.
@williamwurthmann1573 5 years ago
Thank you for teaching an old man new things.
@christernilsson1 5 years ago
Please give time measurements comparing single node with multi node execution. What is the overhead?
@michael-h95 5 years ago
Really interesting video! I have done some MapReduce before, but I haven't come across Apache Spark
@PaulSukys 5 years ago
typo in line 32 for using `splitLines` instead of `word`?
@adriansrealm 5 years ago
Where are the extra bits?
@tolgakarahan 3 years ago
Great explanations. Of course there are many things going on behind the scenes, but good overview.
@MJ-em_jay 5 years ago
More of these, please. More big data.
@KurtSchwind 5 years ago
She refers to an earlier example. Did I miss that video? Otherwise, nicely done. Love learning about distributed computing.
@king4aday4aday 5 years ago
Search for MapReduce on Computerphile
@Alex55555 5 years ago
I wish she also talked a little about Spark's ability to deal with data streams
@m13m 5 years ago
Brady, please make a video on Kubernetes
@Jlr297 5 years ago
Thank you for the great summary.
@jimmycheong7970 5 years ago
Thank you so much. This was an incredible explanation
@RonaldSVM 5 years ago
Sorry for the redundancy, just verifying my understanding. Do I understand correctly that (when running this example in a cluster) collect runs the 'reduceByKey' against the results on each node, and then reduces to a final result? Say on Node 1 I have a count of the word 'something' = 5, and on Node 2 I have a count of the word 'something' = 3; then collect combines those two nodes into a count of 'something' = 8, and so on...?
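A small sketch of the arithmetic in that question, in plain Python rather than Spark, assuming two hypothetical nodes that each hold part of the data. In Spark itself, the per-node combining and the cross-node merge are both done by reduceByKey; collect only brings the finished result back to the driver.

```python
from collections import Counter
from functools import reduce


def combine_locally(words):
    """Per-node step: count the words held on one node."""
    return Counter(words)


def merge(left, right):
    """Cross-node step: add together the counts for matching words."""
    return left + right


# Hypothetical data: 'something' appears 5 times on node 1 and 3 times on node 2.
node1 = ["something"] * 5 + ["else"]
node2 = ["something"] * 3

partials = [combine_locally(node1), combine_locally(node2)]
total = reduce(merge, partials, Counter())

print(total["something"])  # 5 + 3 = 8
```

Because addition is associative and commutative, the partial counts can be merged in any order and give the same total, which is what lets the work be split across nodes.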
@p.z.8355 3 years ago
What is the architectural difference between Spark and MapReduce?
@mohamedthi0une198 5 years ago
I really love your videos. I would like to know if it is possible to watch them in French, or at least with subtitles, so that we can follow
@zugletsmith5082 5 years ago
really good summary, thank you!
@MisterPotatoHands 5 years ago
What programming language is she using??
@mathematicalninja2756 5 years ago
Great video
@billoddy5637 5 years ago
Do a video explaining AES!
@Jarza 5 years ago
Interesting video!
@LucasZawacki 5 years ago
Good video :)
@sameerakhatoon9508 2 months ago
can anyone please suggest books to learn about distributed systems?
@Mmouse_ 5 years ago
She's damn good at explaining and easy to listen to, any plans of having her host other episodes? (sorry for "her" I don't know her name).
@Xakriss 5 years ago
feels like this video is four years too late ... :-/
@draakisback 5 years ago
Good old Scala.
@Kadderin 5 years ago
Was so excited to see this posted :) I'm a Cassandra professional.
@aimanal-fatih386 5 years ago
It's a bit silly, but I can't understand 100% because English isn't my first language. I hope someone can add English subs to every video on this channel, because I find Computerphile videos easy to understand thanks to the excellent explanations
@michaelebbs6035 1 year ago
Computerphile will be excited to learn that tripods exist.
@nO_d3N1AL 5 years ago
For anyone interested, although the documentation is awful for Apache Flink and it doesn't support Java versions beyond 8, it at least lets you do setup on each node. Spark does not have any functionality for running one-time setup on each node, which makes it infeasible for many use cases. These distributed processing frameworks are quite opinionated and if you're not doing word count or streaming data from one input stream to another with very simple stateless transformations in between you'll find little in the documentation or functionality. They're not really designed for use cases where you have a parallel program with a fixed size data source known in advance and want to scale it up as you would by adding more threads, but more for continuous data processing.
@WaqasAliAbbasi10 2 years ago
This was very helpful
@ZachBora 5 years ago
woohooo rebecca is back
@sillybuttons925 5 years ago
More like this!!!!!!
@oldbootz 5 years ago
Thanks, nice vid.
@knowntoache 1 month ago
yeah, vertical scaling and module-based data handling, similar to Hadoop, Hive. framework library.
@gajiodea 5 years ago
Apache Flink next please
@fluffyfloof9267 5 years ago
1:19 Floppy drives? xD LOL
@hanelyp1 5 years ago
Looks like you could do a search engine in that.
@DroisKargva 1 year ago
"RDD is basically an array distributed across the cluster" - genius
@M3t4lstorm 5 years ago
Would have liked it to be a bit more in-depth and technical, was too high level.
@christernilsson1 5 years ago
Please show some drawings or animations of data going back and forth between the nodes.
@jameslawson1 5 years ago
The first time I learned about Apache Spark, I was looking up documentation for another framework named Spark.
@BigDataLogin 2 years ago
thanks
@BeCurieUs 5 years ago
Ohhh, she is using VSCode! I love VS Code :D
@LeJalapenos 5 years ago
Hi friends!
@SlackWi 5 years ago
I study bioinformatics handling txt files many gigabytes in size and this could be so handy
@lztverygood 3 years ago
content is nice, well explained. BUT the camera and editor are so bad. We are not here for a documentary, the computer shot from her shoulder is completely useless and distracting, if you want to use your cuts, use something like the picture in picture but please let us focus on the code!!
@DmitryShultz 5 years ago
@3:16 line 12 is wrong. Great review 👍 otherwise!
@christianlamprecht9860 5 years ago
First? Does this matter? No. Go build a cluster and be happier!
@gegdim9307 5 years ago
00000001
@UKFbass 5 years ago
21st!!!
@veggiet2009 5 years ago
First? sorry, I've never watched a video when it said it was posted "25 seconds" ago, and so it would be weird if I were actually first. Good Video, I feel like I stink at data analysis, but I'm more experienced than most in my organization so...
@vijeenroshponmaniwalson490 1 year ago
What a useless video: slow down, explain slowly, assume the audience doesn't know much
@undifini 5 years ago
first!
@oussemamhiri9713 5 years ago
First 😂
@benjaminmellingen5340 5 years ago
totally lost me 3 min into this video.
@CJWest08 5 years ago
She's mumbling in the beginning... can't really hear her (American-born English speaker)