How to Clean Data Like a Pro: Pandas for Data Scientists and Analysts

2,207 views

TrentDoesMath

1 day ago

Comments: 22
@newenglandnomad9405 6 days ago
Fantastic, easy-to-follow data cleaning video. I also appreciate you saying outright that yes, it's 50 or so rows, but it could be 10k and the same techniques apply.
@ImJordanHubbard-qg9qt 14 days ago
Actual, actionable real-life skills: not fluffy, fun Python skills, but valuable stuff we need to know!
@souravbarua3991 12 days ago
Very helpful and super simple explanation. Looking forward to your next advanced pandas videos with larger datasets. Thank you for this video.
@Carlos-wv4zk 27 days ago
Dude, I cannot explain how helpful this was, man! Seriously, you allowed me to pick up any dataset I download and immediately gave me practical guidelines to clean and analyze it. Thank you!!
@trentdoesmath 26 days ago
You're very welcome!😎
@mapletech_22 6 days ago
This is great. ❤❤🎉
@israsuazo3345 1 month ago
This is the first video I've watched that actually shows the Python libraries in action. Thank you for this.
@trentdoesmath 1 month ago
You're very welcome! I'm excited to hear about what you will build with them 🙂
@ChukwuemekaAmblessedchinenye 1 month ago
Wow, you are the real GOAT. The best video so far. Please make more videos like this.
@dogsapparatus7504 19 days ago
nice tutorial
@LivingG6170 1 month ago
Keep doing good work. Big help
@trentdoesmath 1 month ago
I appreciate the kind words 🙏 thanks for the support!
@totoarifiyanto8679 1 month ago
Just like Thor said: "Another"
@CaribouDataScience 28 days ago
You misspelled Tidyverse 😮
@trentdoesmath 28 days ago
🤣
@tmb8807 28 days ago
Cool, thanks. Is Polars making much of an impact in your world? I've used it a bit and I think I prefer the more explicit syntax - besides the potential for enormous performance gains it brings.
@trentdoesmath 28 days ago
Hi tmb8807 :) I have followed a couple of tutorials on Polars, but have never used it in a professional setting yet 🤔 I'll test it out more extensively. Any good tutorials you'd recommend? Typically, when I've worked on projects that needed high performance I've used Apache Spark, but Polars could be a nice in-between for pandas and Spark? Thanks for the support!
@tmb8807 27 days ago
@trentdoesmath thanks for the reply. There are a few tutorials on KZfaq; the one from Rob Mulla is what got me onto it. Because Polars can work with larger-than-memory data via the streaming API, I've seen it suggested that it could replace Spark on a single node for some jobs, although I've not done that first hand! But it could potentially expand the 'in-between' area, as you say. The main reason I like it is that I find the syntax much more consistent and readable (and easier to write as a result). Your mileage may vary on that, though, especially if you're extremely comfortable with pandas (it's a bit less "Pythonic", with more explicit methods for everything). Lazy evaluation and the query optimisation engine are a big selling point as well, and they can greatly improve memory usage.
@trentdoesmath 26 days ago
Awesome! I'll check out the Rob Mulla stuff, thanks for the recommendation 👍 For sure! It actually reminds me a bit of Scala 🤔... Very 'to the point'. Not sure if you have tried Dask before, but it's yet another performance option out there.
@trentdoesmath 1 month ago
What are some data cleaning techniques that you have used? 🤔
@kikiboy2545 1 month ago
Hi! Thanks for this video. I wanted to know, as a data scientist/analyst, why did you choose to use Jupyter and a .ipynb cleaning file? Why not use PyCharm and a .py file, for example? Is that just a matter of personal preference? Sorry, I am new to Python. I'm proficient in Stata but trying to make a shift.
@trentdoesmath 1 month ago
Hi @kikiboy2545 🙂 thank you for your question. TL;DR: I chose Jupyter because it is easier for me to demo and record the video with. To your point on creating a .py file: I would recommend this if you are creating cleaning logic that is going to be reused and shipped to 'production', as it is easier to test and maintain a straight Python script IMO. That being said, there is increasing support for notebooks as the preferred environment. As examples, Snowflake, Databricks, Azure Synapse and more all support reusable notebooks that contain all of your logic. I've worked in teams where notebooks are preferred for all data pipeline code because of how intuitive and approachable they are, but as I say, my personal preference is: use notebooks for exploration, and .py scripts for your production code 🙂 No need to apologize! I am glad to be part of your learning journey. Keep pushing, man! 😎
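Editor's sketch: to make the "notebooks for exploration, .py scripts for production" point concrete, here is one way cleaning logic might look once it is moved out of a notebook into a plain Python module. This is an assumption-laden illustration rather than the code from the video: the function name `clean_customers`, the `name` and `age` columns, and the sample rows are all hypothetical.

```python
import pandas as pd


def clean_customers(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a hypothetical customers table."""
    df = raw.copy()
    # Normalise column names: strip whitespace, lower-case, use underscores.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    # Trim stray whitespace in the text column.
    df["name"] = df["name"].str.strip()
    # Coerce age to numeric; values that cannot be parsed become NaN.
    df["age"] = pd.to_numeric(df["age"], errors="coerce")
    # Drop rows with no name, then drop exact duplicate rows.
    df = df.dropna(subset=["name"]).drop_duplicates()
    return df.reset_index(drop=True)


# Made-up sample input with messy headers, padding, a bad age and a duplicate.
raw = pd.DataFrame(
    {
        " Name ": [" Ann", "Bob ", "Bob ", None],
        "Age": ["31", "n/a", "n/a", "40"],
    }
)
cleaned = clean_customers(raw)
print(cleaned)
```

Because the logic lives in an ordinary function, a test runner such as pytest can call it with a small hand-made DataFrame and assert on the result, which is harder to do with cells scattered through a notebook.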