How to use groupby() to group categories in a pandas DataFrame

122,389 views

Chart Explorers

A day ago

In this video we go over how to group categories of data using the groupby() operation in pandas. We use the popular Titanic data set commonly used when learning data science. We look at how to group by a single criterion on a single column. Then we move on to grouping with multiple columns, then multiple groups, then multiple groups and multiple columns, and finally how to apply multiple groupby functions in a single command. As a bonus, I'll show you a trick for reducing the number of groups, which can improve the interpretability of your data.
Link to Data
bit.ly/3kGDuKx
0:00 Intro
0:18 Data Overview
1:08 Groupby single col & function
1:55 Multiple cols grouped
2:40 Multiple grouping cols
3:20 Multiple functions flat df
3:43 Multiple functions
4:35 Quick tip

Comments: 145
@ShiladityaBiswasNow 2 years ago
Thanks a lot! You saved me days! I'm literally crying rn. So precise and to the point. Love the content
@ChartExplorers 2 years ago
I'm glad it helped! Groupby was always a sore spot for me when I was learning, but now that I know it I use it all the time.
@imad_uddin 3 years ago
I have seen three of your videos so far, all were very well thought out. Really helpful. You deserve many more subscribers!
@ChartExplorers 3 years ago
Thanks for your kind words Imad Uddin!
@rashadm.sadigov4366 1 year ago
Dude thank you sooo much. Finally someone with proper English explained things properly
@aishwaryapattnaik3082 1 year ago
Just what we needed. Awesome content 🙌🏼
@Aleqsie 7 months ago
OK, this is madly comprehensive information, explained amazingly briefly and clearly within just 7 min.
@tonianibal7585 1 year ago
Thank you very much for sharing! It really helped me, it was exactly what I was looking for. People like you are blessed and good people helping to develop this world! I just subscribed and followed, and will share in my groups!
@sgerodes 3 years ago
Brilliant. It had exactly what I needed. Multiple groups and the splitting trick
@ChartExplorers 3 years ago
Perfect! I'm glad it had what you needed.
@crystalchaung1576 1 year ago
I had to watch this a couple of times to hear that part around 4:18 about why groupby will only return those who survived. It is good you added that. Now that I understand that, I can take a shot at age groups for the Titanic.
@jackfarah7494 6 months ago
Simple and informative. I love this video and am saving it for future reference! Thank you!
@DuniyaJahan1 1 year ago
🙏🙏🚩🚩🙏🙏 Truly, sir, a great lecture. I had been trying to understand groupby in pandas for the last 25 days, but no one was able to clear up my confusion. But you, sir, explained it brilliantly and I am really obliged to you. Thanks; I subscribed and shared on my Facebook page, from Banaras City, India 😄😄😄🙏🙏🙏🙏🙏🙏
@InteligenciadeNegocios 2 years ago
This is one of the best videos EVER! Really helpful! Thanks a LOT!
@Monkeysal07 2 years ago
THANK YOU!!! that last tip is a life saver
@skye5107 7 months ago
Thanks a lot, I have been searching for this in articles for weeks.
@carolinamalosabastos2648 8 months ago
Great video! So clear... It helps me a lot! Tks from Brazil!)
@afonsoosorio2099 2 years ago
Awesome 👌. Crystal clear 🔮. I especially like the bin trick, straightforward. That is really amazing 👏 😍. I had to break the data into intervals using numpy.select() or a user-defined function with apply() to get the same result as the bin method. Keep it up.
@vitorribeirosa 1 year ago
Neat and objective!!! Thanks for sharing. I do appreciate your content.
@zebramc3693 1 year ago
Thank you for your detailed demonstrations.
@lawngreenlyp 2 years ago
This is a very good video for explanation. Thanks so much from Hong Kong.
@ssrwarrior7978 2 years ago
wow, u made it easy for me and saved a lot of time.. THANK YOU
@blueciel_03 7 months ago
Thanks a lot, it's really informative for my upcoming exam.
@mrb7931 1 year ago
Thanks a lot! You saved my day; now I can calculate means by category.
@rohitekka2674 3 years ago
Concise, short, illustrious!! Thanks a lot!!!
@ChartExplorers 3 years ago
You're welcome!
@andrenevares7543 1 year ago
Great explanation! Good JOB! Thumbs up!
@VRUNO 2 years ago
You got a new follower, sir! Really clear, really well explained. God, finally I understand :D thanks so much!
@ThanhVo-zs7ns 2 years ago
Very good and funny videos bring a great sense of entertainment!
@mohamedfawzy5453 1 year ago
Great explanation! Thank you.
@mohamedkhaled902 1 year ago
Very helpful, keep it up ❤
@Jitendrakumar-du1ng 2 years ago
Thanks for the great video, it really helped me.
@saisarath623 2 years ago
Really helpful tricks. Thank you!
@ChartExplorers 2 years ago
You're welcome!
@gabriellopes0 1 year ago
Great explanation!
@nivviyer_ 1 year ago
Thank you so much sir !!
@ZirothTech 2 years ago
Great video, thanks!
@rajibroy1170 1 year ago
You are a savior
@sebastianperalta4775 2 years ago
Thanks for the video.
@pramishprakash 1 year ago
Great video sir
@lightningmi 2 years ago
Good step-by-step tutorial. But one thing you missed: grouping by multiple columns and applying a different aggregate function to each column. Example: [column A, column B], A=sum, B=average. Something like that.
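A minimal sketch of what this comment asks for, assuming a small made-up DataFrame in place of the Titanic file (the column names mirror the video; the values are invented). Passing a dict to agg() maps each column to its own aggregation function:

```python
import pandas as pd

# Hypothetical mini data set standing in for the Titanic data from the video.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3],
    "sex": ["female", "male", "female", "male", "female", "male"],
    "survived": [1, 0, 1, 0, 1, 0],
    "fare": [80.0, 60.0, 30.0, 20.0, 10.0, 8.0],
})

# Group by two columns; sum one column and average another.
result = df.groupby(["pclass", "sex"], as_index=False).agg(
    {"survived": "sum", "fare": "mean"}
)
print(result)
```

With as_index=False the grouping columns stay as ordinary columns, so the result is a flat DataFrame.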
@yili6498 2 years ago
very clear, thxxx
@czr372 1 year ago
Saved me looots of hours haha! thanx!
@MagnusAnand 3 years ago
excellent tutorial
@MachineLearningPro 8 months ago
Great video
@MatthieuKhairallah 1 year ago
Thanks a lot!
@onurkoc6869 2 years ago
You explain very well, professor :))
@paar6128 1 year ago
Wow, you're amazing man :))
@varshakamble2095 2 years ago
Thanks from the heart
@crunchnos 2 years ago
Thank you so f much!
@isaacenobun6370 2 years ago
Thanks man
@denisml42 2 years ago
Thanks for the great video. I'm wondering how you could group the ages in intervals of 10 years. I feel like you probably wouldn't use cut for that, since you would need to know the highest / lowest age in order to determine how many cuts you need. Do you have a recommendation on how to do that?
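One possible answer, sketched under the assumption that ages are non-negative numbers: pd.cut can still be used, because the bin edges can be computed from the maximum age instead of being typed in by hand. The data below is invented for illustration.

```python
import pandas as pd

# Made-up ages and survival flags, not the real Titanic values.
ages = pd.DataFrame({
    "age": [2, 9, 10, 29, 35, 50, 61, 80],
    "survived": [1, 1, 0, 1, 0, 0, 1, 0],
})

# Edges 0, 10, 20, ... ending at the first multiple of 10 above the oldest age.
upper = (int(ages["age"].max()) // 10 + 1) * 10
edges = list(range(0, upper + 1, 10))

# right=False gives intervals like [0, 10), [10, 20), so age 10 lands in [10, 20).
ages["age_decade"] = pd.cut(ages["age"], bins=edges, right=False)

# Survival rate per decade; observed=True skips decades with no passengers.
rates = ages.groupby("age_decade", observed=True)["survived"].mean()
print(rates)
```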
@athief 2 years ago
It's great to have a 5-min quick & dirty dive, but it would help to spend a couple more seconds here and there to say that "agg" means "aggregate", that if we want more than one column summarised we must provide a list (hence the double brackets), etc. A simple explanation like that makes things easier to remember.
@richarda1630 3 years ago
nice ! thanks :)
@jaskaransingh3200 1 year ago
Nice. helpful
@TheShrikhande 3 years ago
What if I have a dataframe with two date columns (start-date, end-date) along with other attributes and I wish to create bins for each year incorporating both those date columns. How do you think I can manage to do that?
@souravde2283 3 years ago
Awesome.
@marchanselthomas 1 year ago
to the point!
@bnadir3930 2 years ago
Great video! How can I get the max() value grouped by a column and still have all of the dataframe's columns in the result?
@jakobstigsson9687 2 years ago
Hey, thanks for the video. I have a dataframe that has a column with 0-4 in value, but I wish to group it by 0 and then 1-4. How would that be possible? Is it a big difference?
@laychansethaaerd 3 years ago
Perfect
@tinayesibanda3070 9 months ago
How can I combine groupby with a distinct count on one of the categorical columns and a sum on some of the numeric columns?
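A minimal sketch of one way to do this, using pandas named aggregation. The orders table and its column names are hypothetical; "nunique" gives the distinct count and "sum" the total.

```python
import pandas as pd

# Hypothetical data: one categorical column to count, one numeric column to sum.
orders = pd.DataFrame({
    "region": ["east", "east", "east", "west", "west"],
    "product": ["a", "a", "b", "c", "c"],
    "amount": [10, 20, 30, 5, 5],
})

# Each keyword names an output column: (input column, aggregation function).
summary = orders.groupby("region", as_index=False).agg(
    distinct_products=("product", "nunique"),
    total_amount=("amount", "sum"),
)
print(summary)
```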
@ericc1317 2 years ago
The as_index=0 tip is great! When doing this with .count() instead of sum, for example in a project where my code has the format df.groupby(['x', 'y'], as_index=False)['y'].count(), is there any way to keep the original y column along with the new y "count" column in the resulting data frame? With this method it replaces the original y with the count of y.
@maxons.e4643 2 years ago
How do you sort the data when different conditions are involved in the groupby?
@youknownothing_ 1 year ago
Great video. It would be great if you also provided the link to the notebook.
@fashaikh5339 3 years ago
Very clear. Could you please explain how to do an intersection when we have a one-to-many relational database? Thanks.
@rohanbangash5827 2 years ago
How would we put the result of a groupby function as a column in our dataframe?
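A short sketch of one common approach, assuming invented data: transform() returns a result aligned with the original rows, so it can be assigned straight back as a new column.

```python
import pandas as pd

# Made-up fares per passenger class.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3],
    "fare": [80.0, 60.0, 30.0, 20.0, 10.0],
})

# Every row receives the mean fare of its own class.
df["class_mean_fare"] = df.groupby("pclass")["fare"].transform("mean")
print(df)
```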
@govindrajput8503 2 years ago
Hi, thanks for this. How do I show groupby results for more than one variable with more than one aggregate function, without the index? So basically multiple groups as columns, aggregated with more than one function.
@ahovebismark4001 1 year ago
So please, I need a personal favor: I need to make labels for a plot I generated from a groupby method. Any help with that?
@hansrc4469 1 year ago
When I use groupby for multiple columns like you did, it shows me a message saying to use a list instead of square brackets.
@nurshibumi 2 years ago
Thank you for your time and effort! I have a question: I have a dataset with a few columns, including "Fuel_Type". The fuel types are petrol, diesel and CNG. All I want is to group by Fuel_Type and store copies of the petrol and diesel subsets in separate variables. How can I do that? I have been searching for hours :))) Please answer.
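A minimal sketch of one way to split a DataFrame by category, assuming a made-up cars table (only the Fuel_Type column name comes from the comment). Iterating over a groupby yields (key, sub-DataFrame) pairs, which can be collected into a dict:

```python
import pandas as pd

# Hypothetical data; the Price column is invented for illustration.
cars = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Price": [5.0, 7.5, 3.2, 6.1],
})

# One independent copy of the rows for each fuel type.
by_fuel = {fuel: group.copy() for fuel, group in cars.groupby("Fuel_Type")}

petrol = by_fuel["Petrol"]
diesel = by_fuel["Diesel"]
print(petrol)
print(diesel)
```

For just one or two categories, a boolean filter such as cars[cars["Fuel_Type"] == "Petrol"] gives the same rows without groupby.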
@kiko1955 2 years ago
How do I make a graph with the result of a groupby?
@mohammadmfd682 3 years ago
very good
@ChartExplorers 3 years ago
Thanks!
@premprakash6863 2 years ago
I want to group by mobile number and merge the messages received. How can I do that?
@danielrico3352 2 years ago
Thanks for the video! I have a question. If you want to select one specific biological sex, how could I write that code? For example, just females. df.groupby(["pclass", [sex] == female])["survived].sum() Would it be right to write it like this? Thanks in advance!
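The expression in this question is not valid Python as written. One way to get the intended result, sketched here on invented data, is to filter the rows first and group afterwards:

```python
import pandas as pd

# Made-up passengers; column names follow the video.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3],
    "sex": ["female", "male", "female", "female", "male"],
    "survived": [1, 0, 1, 0, 1],
})

# Keep only the female passengers, then count survivors per class.
females = df[df["sex"] == "female"]
female_survivors = females.groupby("pclass")["survived"].sum()
print(female_survivors)
```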
@javierclement3047 1 year ago
It seems to me like this function doesn't really need to exist. I feel like I could make all of these manipulations relatively easily with Boolean operations. Can someone explain the advantage of using groupby()? Because it's easier? Or is there something I'm missing?
@brainwaves2389 2 years ago
thanks
@ChartExplorers 2 years ago
You're welcome! 😀
@MohsinAli-yd9js 2 years ago
At 5:39, when setting labels for 'age_bins', how did it know which age group is young, which one is middle and which is old? You did not set the parameters, like 0 to 20 for young, 21 to 60 for middle and above 60 for old. Or does it do it implicitly?
@JopieSchaft 2 years ago
Using bins=3 as a parameter to the pd.cut() function automatically divides the age range into 3 equal-width bins. See my comment to Xuan Tran for an explanation of how you can find out what it does or what you could do differently.
@XuanTran-ri1hn 2 years ago
Hi. Thank you for your video. May I ask how you know exactly which ages go into which bin? The ages are put into 3 bins, but it is unclear to me which ages each bin contains. For example: what is the age range for 'young' in this case?
@JopieSchaft 2 years ago
@Adeel Khan I can think of 3 approaches to this:
- Group by age_bins, then take the minimum and maximum age: df.groupby(['age_bins'])['age'].agg(['min', 'max'])
- Use retbins=True in the pd.cut() function; I think retbins returns the bounds of your bins.
- Define the bins yourself, i.e. bins=[0, 20, 60, 120] (instead of bins=3 as in the video) will divide the passengers into 0-20, 20-60 and 60-120 bins.
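A small sketch of the retbins approach mentioned above, on invented ages. With retbins=True, pd.cut returns the binned values together with the array of bin edges, which shows where each label starts and ends:

```python
import pandas as pd

# Made-up ages; the labels are the ones used in the video.
ages = pd.Series([2, 29, 50, 80], name="age")

# bins=3 splits the range from min to max into three equal-width intervals.
binned, edges = pd.cut(
    ages, bins=3, labels=["young", "middle_age", "old"], retbins=True
)
print(binned.tolist())
print(edges)
```

The first edge comes out slightly below the minimum age because pd.cut widens the range a little so that the smallest value is included.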
@ericzheng4815 2 years ago
When trying out this example: df['age_bins'] = pd.cut(df['age'], 3, labels=('young','middle_age', 'old')), I got an error: TypeError: can only concatenate str (not "float") to str. I don't know why. I looked at the manual, and the code seems good to me.
@pazenriqueguillermo 2 years ago
Great video! One question... Let's say you do the first example, group survivors by class and sum(), but I want the result sorted in descending order (from the class with the most survivors to the least...). How would you do that?
@coledd9487 1 year ago
.sort_values(ascending=False)
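Spelled out on a made-up DataFrame, the suggestion above would look roughly like this (the values are invented, only the column names follow the video):

```python
import pandas as pd

# Invented data: class 3 has the most survivors, class 2 the fewest.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3, 3],
    "survived": [1, 1, 0, 1, 1, 1],
})

# Sum survivors per class, then sort the resulting Series from high to low.
survivors = df.groupby("pclass")["survived"].sum().sort_values(ascending=False)
print(survivors)
```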
@Abdullah_Alhathloul 5 months ago
nice
@coledd9487 1 year ago
Hey there, for some reason when I try doing Single Group, Multiple Columns (like at 2:19), I keep getting an error basically stating that it thinks my 'fare' column is filled with strings, as opposed to floats. As such, I can't use sum/mean/numeric methods on that data. I can't seem to get around it.
@ChartExplorers 1 year ago
Hey Cole DD, sometimes when you read in your data pandas thinks the data is a string even though it should be integers or floats. This video here kzfaq.info/get/bejne/m9x7jNyEsbneqZ8.html discusses how to convert datatypes of columns and some common problems that you may run into when doing so. Let me know if that works.
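A minimal sketch of the conversion, assuming the fare column holds numbers stored as text plus a '?' placeholder for missing values (as mentioned elsewhere in this thread). pd.to_numeric with errors="coerce" turns anything unparseable into NaN:

```python
import pandas as pd

# Made-up data where 'fare' was read in as strings.
df = pd.DataFrame({
    "pclass": [1, 1, 2],
    "fare": ["80.5", "?", "20"],
})

# Convert to floats; the '?' entry becomes NaN instead of raising an error.
df["fare"] = pd.to_numeric(df["fare"], errors="coerce")

# Numeric aggregations now work; NaN values are skipped by mean().
mean_fare = df.groupby("pclass")["fare"].mean()
print(mean_fare)
```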
@AIdevel 1 year ago
I have a problem: it keeps giving me a KeyError and doesn't recognize the names of the columns. How can I solve it? Please help me.
@AimarZayyan 2 years ago
Hi, how do I get the sum for a specific value of the pclass column, e.g. 1 only?
@ChartExplorers 2 years ago
I'm not sure I understand your question. Are you looking to filter the dataframe so that only pclass = 1 is contained in the dataframe? You could use a boolean mask pclass1 = df[df['pclass'] == 1]. If that's what you are looking for you can check out this video on filtering which I think you will find helpful kzfaq.info/get/bejne/pM9pocplr9-Ximw.html
@osoriomatucurane9511 10 months ago
Hi Bradon, awesome tutorial. At 4:41, survived by class, mean and sum: a proportion would have been more meaningful. How do I get a percentage there, I mean the proportion of survivors (survival rate) by class? Using transform? Aggregation seems to allow only sum, mean, count, ...
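One observation that may help here: when survived is coded as 0/1, its mean within each class already is the survival rate. A sketch on invented data, using named aggregation to show the count, the total and the rate side by side:

```python
import pandas as pd

# Made-up passengers; survived is 1 for yes and 0 for no.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3, 3, 3],
    "survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

rates = df.groupby("pclass")["survived"].agg(
    survivors="sum", passengers="count", rate="mean"
)
# The mean of a 0/1 column is a proportion; multiply by 100 for a percentage.
rates["rate_pct"] = rates["rate"] * 100
print(rates)
```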
@fashaikh5339 3 years ago
I have a data frame containing three columns: one for restaurant_id, the second for its categories (one or more categories) and the third for its zone. For each restaurant, I need to calculate how many restaurants in its zone share at least one category with it, and put the result in a new column.
@ChartExplorers 3 years ago
Hi F Ashaikh, is it possible for you to email me your data (or provide me with some made up data that is similar to the data you have)? That will help me see what is going on a little better. My email is bradonvalgardson@gmail.com
@fashaikh5339 3 years ago
I did, thank you very much for your help.
@russellmubaya2662 2 years ago
Can we then plot a graph of any sort using the generated table we've just grouped? @Chat Explorers
@russellmubaya2662 2 years ago
@Chart Explorers*
@aliyananwar3727 2 years ago
I came here to understand the concept of groupby but left with emotions about what we men sacrificed. 🥺
@shekharmandal4569 1 year ago
goat
@xowp. 1 year ago
i love u
@michaelcruz1322 3 years ago
How did Python determine which age_bin to place each individual into? You never specified the age ranges associated with the categories.
@ChartExplorers 3 years ago
Hi Michael, good question. The age bins were created with the pandas cut method, which turns continuous data into categorical data. In the video I passed 3 as the number of bins, so cut split the range of ages into three equal-width intervals (the number of bins has to be given; there is no default). Note that the bins have equal width, not equal counts: if you have 12 values, they do not necessarily end up with 4 values in each bin (pd.qcut is the function that makes equal-count bins). pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html
@Monkeysal07 2 years ago
Maybe this will allow you to specify the ranges of the bins. The labels list must have one fewer element than the bins list: df['age_cat'] = pd.cut(df['age'], bins=[x for x in range(0,100, 5)], labels=[x for x in range(5,100, 5)], right=True)
@pritisingh2432 3 years ago
Hey, I'm having a problem with groupby: it is giving me a DataError, "No numeric types to aggregate". Could you please help?
@ChartExplorers 3 years ago
Hi Priti, will you run df.dtypes and let me know if there are any numeric (float or int) datatypes in your dataframe? If they are all objects, check out this video on how to convert objects into numeric values kzfaq.info/get/bejne/m9x7jNyEsbneqZ8.html (hopefully that will solve your problem). If this doesn't solve your problem, will you copy and paste your groupby statement and send it to me please?
@pritisingh2432 3 years ago
@@ChartExplorers
# Visualize Churn Rate by Gender
plot_by_gender = churn_dataset.groupby('gender').Churn.mean().reset_index()
plot_data = [
    go.Bar(
        x=plot_by_gender['gender'],
        y=plot_by_gender['Churn'],
        width=[0.3, 0.3],
        marker=dict(color=['orange', 'green'])
    )
]
plot_layout = go.Layout(
    xaxis={"type": "category"},
    yaxis={"title": "Churn Rate"},
    title='Churn Rate by Gender',
    plot_bgcolor='rgb(243,243,243)',
    paper_bgcolor='rgb(243,243,243)',
)
fig = go.Figure(data=plot_data, layout=plot_layout)
po.iplot(fig)
This is giving me the error. Can you suggest an alternative?
@houndofjustice5 3 years ago
Hello, is there any way to put all the values for a group into one cell? If the value I'm trying to group by is, let's say, Switzerland, and it has multiple Happiness ratings (one for each year), how do I put all the ratings in the same column, separated by commas, without summing them up?
@ChartExplorers 3 years ago
Great question Ivan. Try this out and see if it works for you. First I create a dictionary of data with 3 different countries and some happiness scores. Then I create a DataFrame with this data. Then I use the groupby function to group each country and use apply(list) to create a list of all the values in each group.
data_dict = {'country': ['country_1','country_2','country_3','country_1','country_1',
                         'country_2','country_3','country_2','country_3','country_1'],
             'happiness': [3,1,3,5,7,4,1,2,3,4]}
df = pd.DataFrame(data_dict)
df_grouped = df.groupby('country')['happiness'].apply(list)
@houndofjustice5 3 years ago
@@ChartExplorers Thank you for the swift answer. I managed to do it for one column, but I'm trying to do it for multiple columns: basically just uniting rows with the same country value and separating their values with commas. It works when I do it for the happiness score, but if I try to add the happiness rank it just outputs the strings "Happiness Score" and "Happiness Rank" rather than the values. I tried as a list but it's still not working. This is the code that works for Happiness Score: frame.groupby(['Country'])['Happiness Score'].apply(lambda x:' , '.join(x.astype(str))).reset_index()
@ChartExplorers 3 years ago
@@houndofjustice5 I think I see what you are asking. So you want to groupby country and then list out all the values for that country in the happiness and rank columns. Let me know if this works. If not, I am setting up a discord server for Chart Explorers. That might be a better medium for problem solving.
# Example Data
data_dict = {'country': ['country_1','country_2','country_3','country_1','country_1',
                         'country_2','country_3','country_2','country_3','country_1'],
             'happiness': [3,1,3,5,7,4,1,2,3,4],
             'rank': [1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(data_dict)
# groupby with list for multiple columns
df_grouped = df.groupby('country')[['happiness','rank']].agg(lambda x: list(x))
@SudhirKumar-ry4gk 3 years ago
Please help. I have data on employees who each made multiple sales. If an employee's sales total more than 50000, I want to print Excellent against every row for that employee ID, and Low for the rest. Like:
Emp ID    Sale    Status
Emp1001   5000    Excellent
Emp1001   45000   Excellent
Emp1001   2000    Excellent
Emp1002   5000    Low
Emp1003   2500    Low
@ChartExplorers 3 years ago
Hi @@SudhirKumar-ry4gk, so you want to group by employee ID, and mark employees whose sales are greater than $50,000 as excellent and everyone else as low? Is that correct?
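Assuming the answer to that question is yes and the threshold applies to each employee's total sales, a sketch could look like the following. The emp_id, sale and status column names are hypothetical; the rows are the ones from the comment above.

```python
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "emp_id": ["Emp1001", "Emp1001", "Emp1001", "Emp1002", "Emp1003"],
    "sale": [5000, 45000, 2000, 5000, 2500],
})

# transform("sum") repeats each employee's total on every one of their rows.
total = sales.groupby("emp_id")["sale"].transform("sum")

# Label every row from the employee-level total.
sales["status"] = np.where(total > 50000, "Excellent", "Low")
print(sales)
```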
@srideviponmalarp 9 months ago
Can you send the dataset?
@ainahannani4489 2 years ago
How do I make a Poisson distribution of a groupby column?
@ChartExplorers 2 years ago
I'm not sure. I would need to see your data and know more context to better understand what you are trying to accomplish.
@pursh2002 3 years ago
How do we make a function for this?
# function that groups data by attribute1 and calculates per-group statistics (mean and count) for attribute2
def get(data, attr1, attr2, statistic):
@ChartExplorers 3 years ago
Hi Pursh, I'm not sure if I understand exactly what you are trying to accomplish. Are you trying to obtain the mean and count on groups based on multiple columns/attributes? df.groupby(['pclass','sex'], as_index=False)['survived'].agg(['mean','count']) If this is the case I'm not sure what the purpose of creating a function would be.
@shaikhjunaid8693 1 year ago
Sir, how would you solve the problem of determining the top 5 highest-rated players for every position in the FIFA dataset?
@YoungerLei 1 year ago
Hi, it might be fifa.groupby(by='position').apply(lambda group: group.sort_values(by='rate', ascending=False).head(n=5))
@ibrar6121 10 months ago
In the Quick Tip section, how did the program know that 29 is middle_age, 2 is young and 50 is old???
@shoaibsoomro 2 years ago
At 5:54, pd.cut did not work for me; it gave the error TypeError: can only concatenate str (not "float") to str. Solution: these two lines solved the issue.
df['age'] = df['age'].replace('?', 0)  # clean data
df['age'] = df.age.astype('float64')  # convert data type to float
@apz9022 3 years ago
I have a dataframe that has around 20 columns and 800 rows. One column contains multiple duplicate values, which I am using as the group, and based on one of the other columns I want to filter the dataframe to show unique values based on the highest number in this column using max(). I still want to retain all of the other columns and end up with a dataframe that contains these unique values including the original columns.
group = df_UE5_Compatability_info.groupby('lookup')['Function Count'].max()
Here "lookup" is the column I want to group by (containing multiples of the same value), and I want to filter to show the rows with the highest number for "Function Count". How do I make the dataframe contain the other remaining columns associated with the resultant rows determined by the groupby? I am struggling. Difficult to describe in words.. sorry
@ChartExplorers 3 years ago
Hi Alan, you did a great job explaining, thanks for providing me an example of what you have done. 😀 If I'm understanding correctly (please correct me if I'm wrong), you have 1 column that contains categories and you want to get the max value for each of those categories in every column that you have (using groupby). Here is a simple example I made that will get the max value for every column in the dataframe based on the groups in Col_4.
import pandas as pd
# Create practice df
df = pd.DataFrame({'Col_1': [1,2,3,4,5],
                   'Col_2': [6,7,8,9,10],
                   'Col_3': [11,12,13,14,15],
                   'Col_4': ['Group_1','Group_2','Group_1','Group_1','Group_2']})
# groupby Col_4 (in your case use lookup)
group = df.groupby('Col_4').max()
group.head()
You will notice here that instead of adding a list of columns to perform the groupby function on, I left it out. This will perform the operation on all the columns. In your example, you should be able to do the following to get your answer:
group = df_UE5_Compatability_info.groupby('lookup').max()
@apz9022 3 years ago
@@ChartExplorers Thanks for the reply. Below is a sample dataset (made up) to try and better explain, and one that is more representative of my actual dataset.
df = pd.DataFrame({'lookup': ['abc123','abc124','abc123','abc125','abc125'],
                   'Supported': ['no','yes','no','yes','yes'],
                   'Percentage': [0.9,0.6,0.6,0.7,0.6],
                   'Number of features': [1,6,10,8,11],
                   'Platform': ['Release 1.0','Release 1.0','Release 2.0','Release 1.0','Release 2.0']})
The output should look like the following:
   lookup  Supported  Percentage  Number of features  Platform
0  abc123  no         0.9         1                   Release 1.0
1  abc124  yes        0.6         6                   Release 1.0
2  abc123  no         0.6         10                  Release 2.0
3  abc125  yes        0.7         8                   Release 1.0
4  abc125  yes        0.6         11                  Release 2.0
In column "lookup", rows 0 and 2 are common values, as are rows 3 and 4. My goal is to have one row per value in column "lookup", filtered on the highest value in column "Number of features", with all other column values for the selected row shown in the output data frame. Using group = df.groupby('lookup').max() creates:
lookup  Supported  Percentage  Number of features  Platform
abc123  no         0.9         10                  Release 2.0
abc124  yes        0.6         6                   Release 1.0
abc125  yes        0.7         11                  Release 2.0
But the percentage is wrong for rows abc123 and abc125, as it has included the highest percentage in each of the groups. My desired result is as follows:
abc123  no         0.6         10                  Release 2.0
abc124  yes        0.6         6                   Release 1.0
abc125  yes        0.6         11                  Release 2.0
where the values for columns "Supported" and "Percentage" are taken as-is from the dataframe row that has the highest "Number of features". In my script I am using group = df.groupby('lookup')['Number of features'].max(), which returns the following, but I am missing the other columns, in this example Supported, Percentage and Platform.
lookup
abc123    10
abc124    6
abc125    11
Also, if I try to save the dataframe to csv, I only get the following:
Number of features
10
6
11
I would have expected to have this csv output?
lookup  Number of features
abc123  10
abc124  6
abc125  11
Thanks again.. and I hope this is more descriptive?
@ChartExplorers 3 years ago
@@apz9022 thanks for providing the example, that clarifies things a lot. If you use the same dataframe you created in your example you should be able to use the following code:
new_df = pd.DataFrame(columns=df.columns)
for item in df['lookup'].unique():
    temp_df = df[df['lookup'] == item]
    row = temp_df[temp_df['Number of features'] == temp_df['Number of features'].max()]
    alist.append(row)
    new_df = pd.concat([new_df, row], ignore_index=True)
new_df
Sadly, this uses a for loop. There might be another way to do this that would avoid the for loop (I need to work on it a little more to get it to work - I'll let you know if I get it to work). I'm also going to look into groupby a little more. There are some cool things you can do with groupby, but this has several constraints that I do not think groupby will support. With 800 rows and 20 columns performance should not be an issue (but it's always nice to squeeze as much performance out as possible just for fun!). Hope this works. Let me know.
@apz9022 3 years ago
@@ChartExplorers Thanks.. what is "alist.append"? I get an error stating "alist" is not defined.
@apz9022 3 years ago
@@ChartExplorers Thanks.. I updated my code and it's working like a charm! One point: alist.append(row) did not work for me. I have left it out and it still seems to work. What does this do?
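For comparison, a loop-free sketch of the same selection on the sample data from this thread. It swaps in a different technique from the loop above: idxmax() returns, for each group, the index label of the row holding the largest value, and .loc then pulls those complete rows. The alist.append(row) line in the loop is not needed for the result, which matches what apz9022 found when leaving it out.

```python
import pandas as pd

# Sample data copied from the comment above.
df = pd.DataFrame({
    "lookup": ["abc123", "abc124", "abc123", "abc125", "abc125"],
    "Supported": ["no", "yes", "no", "yes", "yes"],
    "Percentage": [0.9, 0.6, 0.6, 0.7, 0.6],
    "Number of features": [1, 6, 10, 8, 11],
    "Platform": ["Release 1.0", "Release 1.0", "Release 2.0",
                 "Release 1.0", "Release 2.0"],
})

# Index label of the row with the most features within each lookup group.
best_idx = df.groupby("lookup")["Number of features"].idxmax()

# Select those rows whole, so every other column is taken as-is.
best_rows = df.loc[best_idx]
print(best_rows)
```

If two rows in a group tie for the maximum, idxmax() keeps only the first one, whereas the loop version keeps all tied rows.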
@Tropical188 2 years ago
wow
@jha6783 10 months ago
How do you know what is young, middle_age or old? This is not defined.
@azrflourish9032 3 years ago
Why is '?' needed when reading the csv file??
@ChartExplorers 3 years ago
Good question, I should have explained this in the video. In the csv file missing data is represented with '?'. When we read the data into pandas we can tell it that missing data is represented by '?'; then pandas will treat it as a missing value rather than getting confused.
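A minimal sketch of this, assuming a tiny in-memory CSV in place of the real file (the contents are invented). The na_values parameter of pd.read_csv tells pandas which strings to treat as missing:

```python
import io

import pandas as pd

# Stand-in for the CSV file; '?' marks the missing entries.
csv_text = "age,fare\n29,211.3\n?,151.5\n2,?\n"

# With na_values='?', the placeholders become NaN and the columns stay numeric.
df = pd.read_csv(io.StringIO(csv_text), na_values="?")
print(df.dtypes)
print(df)
```

Without na_values, both columns would be read as strings, which is one way to end up with the TypeError from pd.cut reported in other comments.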
@azrflourish9032 3 years ago
@@ChartExplorers oh, thank you (^ ^)
@ericfayhuynh 1 year ago
Looks like the data set is outdated