14. explode(), split(), array() & array_contains() functions in PySpark

33,593 views

WafaStudies

1 year ago

In this video, I explain how to use the explode(), split(), array() and array_contains() functions with ArrayType columns in PySpark.
Link for PySpark Playlist:
• 1. What is PySpark?
Link for PySpark Real Time Scenarios Playlist:
• 1. Remove double quote...
Link for Azure Synapse Analytics Playlist:
• 1. Introduction to Azu...
Link to Azure Synapse Real Time scenarios Playlist:
• Azure Synapse Analytic...
Link for Azure Databricks Playlist:
• 1. Introduction to Az...
Link for Azure Functions Playlist:
• 1. Introduction to Azu...
Link for Azure Basics Playlist:
• 1. What is Azure and C...
Link for Azure Data Factory Playlist:
• 1. Introduction to Azu...
Link for Azure Data Factory Real Time Scenarios Playlist:
• 1. Handle Error Rows i...
Link for Azure Logic Apps Playlist:
• 1. Introduction to Azu...
#PySpark #Spark #databricks #azuresynapse #synapse #notebook #azuredatabricks #PySparkcode #dataframe #WafaStudies #maheer #azure

Comments: 29
@raghunathpanse3258 1 year ago
This is worth watching. The speed I picked up after following you is unbelievable. Thank you so much for this amazing content; your explanation is without doubt the finest I have seen.
@WafaStudies 1 year ago
Thank you for your kind words ☺️
@VivekKBangaru 1 year ago
Awesome video. I could understand it thoroughly.
@WafaStudies 1 year ago
Thank you 😊
@deepjyotimitra1340 1 year ago
You are doing an amazing job, brother. Keep it up. Thanks for all your contributions to data engineering tutorials.
@WafaStudies 1 year ago
Thank you ☺️
@tarigopulaayyappa 1 year ago
@WafaStudies Brother, could you upload the videos as quickly as you can, if you don't mind?
@WafaStudies 1 year ago
@tarigopulaayyappa Will try to upload them faster 😇
@tarigopulaayyappa 1 year ago
@WafaStudies Thank you very much.
@polakigowtam183 1 year ago
Good video. Thanks, Maheer.
@WafaStudies 1 year ago
Welcome 🤗
@tarun007 1 year ago
Thank you, Wafa 😁😊
@WafaStudies 1 year ago
Welcome 🤗
@SonuKumar-fn1gn 1 year ago
Thank you ❤️
@WafaStudies 1 year ago
Welcome 🤗
@vutv5742 6 months ago
Completed
@deepakk8758 1 year ago
Thanks, sir
@WafaStudies 1 year ago
Welcome
@Aelmasri-ht5sv 1 year ago
Thank you, Maheer. You are doing very nice work. Have you prepared notes for these videos, I mean slides or something similar?
@shreyaspatil4861 6 months ago
Thanks very much for the tutorial :) I have a query about reading JSON files. I have an array of structs where each struct has a different structure/schema. Based on a certain property value of the struct, I apply a filter to get that nested struct. However, when I display it using printSchema(), it contains fields that do not belong to that object but are somehow associated with it from the schema of the other structs. How can I fix this?
@sahilgarg7383 5 months ago
In the case of split(), what will happen if we give the delimiter as | instead of , ?
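A short sketch relevant to the question above. PySpark's split() takes a regular expression (Java regex syntax) as its pattern argument, so a bare | is read as alternation rather than as a literal pipe. The example below uses Python's standard re module as a stand-in to show the escaping; it is not PySpark code, and the variable names are made up for illustration. In PySpark the analogous call would presumably look like split('skills', r'\|').

```python
import re

skills = "dotnet|azure"

# "|" is a regex metacharacter (alternation), so it has to be escaped
# or placed in a character class to act as a literal delimiter.
escaped = re.split(r"\|", skills)
char_class = re.split(r"[|]", skills)

print(escaped)     # ['dotnet', 'azure']
print(char_class)  # ['dotnet', 'azure']
```

A comma needs no escaping because it has no special meaning in a regular expression, which is why split('skills', ',') works as written in the video.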
@yosaki-fv9yy 8 months ago
When you used array()... what if the number of skills differs from row to row?
@phanidivi3613 1 year ago
Thanks a lot for sharing, Maheer. Can we create a trial account for practice? As of now, I think Microsoft does not provide a free community trial subscription.
@RakeshGandu-wb7eu 1 year ago
Nice video. How can we remove duplicates from an array column?
@mohitpande2006 1 year ago
Sir, how can we explode more than 2 columns, say 150?
@julianalilian 1 year ago
@WafaStudies Are there any other ways to explode the array without the explode command? I ask because I made a script with the explode command, but the performance is really bad and I'm looking for another way to do this. Thank you!
@vasanthasworld2948 1 year ago
Please share the notebook details in the description so that it is easy for us to refer to, or you could share it in a GitHub repository.
@DataWithNagar 1 year ago
Code from the video for explode(), split(), array() and array_contains() with an ArrayType column in PySpark.
----------------------------------------
data = [(1, 'Maheer', ['dotnet', 'azure']), (2, 'Wafa', ['java', 'aws'])]
schema = ['id', 'name', 'skills']
df = spark.createDataFrame(data=data, schema=schema)
df.display()  # notebook helper (e.g. Databricks); use df.show() elsewhere
df.printSchema()
-----
# explode(): one output row per element of the array column
from pyspark.sql.functions import explode, col
df.show()
df1 = df.withColumn('skill', explode(col('skills')))
df1.show()
-------------------------------------------
data = [(1, 'Maheer', 'dotnet,azure'), (2, 'Wafa', 'java,aws')]
schema = ['id', 'name', 'skills']
df = spark.createDataFrame(data=data, schema=schema)
df.display()
df.printSchema()
-----
# split(): turn a delimited string column into an array column
from pyspark.sql.functions import split, col
df.show()
df1 = df.withColumn('skills_array', split('skills', ','))
df1.show()
--------------------------------------------
data = [(1, 'Maheer', 'dotnet', 'azure'), (2, 'Wafa', 'java', 'aws')]
schema = ['id', 'name', 'primaryskill', 'secondaryskill']
df = spark.createDataFrame(data=data, schema=schema)
df.display()
df.printSchema()
------
# array(): combine several columns into one array column
from pyspark.sql.functions import array, col
df.show()
df1 = df.withColumn('skillsArray', array(col('primaryskill'), col('secondaryskill')))
df1.show()
---------------------------------------------
data = [(1, 'Maheer', ['dotnet', 'azure']), (2, 'Wafa', ['java', 'aws'])]
schema = ['id', 'name', 'skills']
df = spark.createDataFrame(data=data, schema=schema)
df.display()
df.printSchema()
------
# array_contains(): boolean column, True where the array holds the value
from pyspark.sql.functions import array_contains, col
df.show()
df1 = df.withColumn('HasJavaSkill', array_contains('skills', value='java'))
df1.show()
-------------------------------------------------
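For readers without a Spark cluster at hand, the row-level effect of the four functions in the comment above can be modeled in plain Python. This is only a rough sketch, not PySpark: rows are dictionaries, all helper names (explode_rows, split_column, array_column, array_contains_column) are hypothetical, and the split here uses a literal delimiter, whereas Spark's split() takes a regex pattern.

```python
# Plain-Python model of explode(), split(), array() and array_contains().
# NOT PySpark: rows are dicts and every helper name below is hypothetical.

def explode_rows(rows, array_col, out_col):
    """One output row per array element; rows with a null/empty array vanish."""
    result = []
    for row in rows:
        for item in row[array_col] or []:
            result.append({**row, out_col: item})
    return result

def split_column(rows, src_col, out_col, delimiter):
    """Split a delimited string into a list (literal delimiter, not a regex)."""
    return [{**row, out_col: row[src_col].split(delimiter)} for row in rows]

def array_column(rows, src_cols, out_col):
    """Collect the values of several columns into one list-valued column."""
    return [{**row, out_col: [row[c] for c in src_cols]} for row in rows]

def array_contains_column(rows, array_col, out_col, value):
    """Add a boolean column: True where the list holds the given value."""
    return [{**row, out_col: value in row[array_col]} for row in rows]

rows = [
    {"id": 1, "name": "Maheer", "skills": "dotnet,azure"},
    {"id": 2, "name": "Wafa", "skills": "java,aws"},
]

with_arrays = split_column(rows, "skills", "skills_array", ",")
exploded = explode_rows(with_arrays, "skills_array", "skill")
flagged = array_contains_column(with_arrays, "skills_array", "HasJavaSkill", "java")
combined = array_column(
    [{"id": 1, "primaryskill": "dotnet", "secondaryskill": "azure"}],
    ["primaryskill", "secondaryskill"],
    "skillsArray",
)

for row in exploded:
    print(row["id"], row["name"], row["skill"])
```

Note that the exploded rows keep every original column and gain one new column, which mirrors what withColumn() does in the PySpark version.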
@abhishekstatus_7 1 year ago
For me it was not working, and I am not sure why. I changed the script and then got both the skills and skill columns:
from pyspark.sql.functions import explode, col
# Sample data
data = [(1, 'abhishek', ['dotnet', 'azure']), (2, 'abhi', ['java', 'aws'])]
schema = ['id', 'name', 'skills']
# Create DataFrame
df = spark.createDataFrame(data, schema)
df.show()
# Apply explode() on the "skills" column and name the exploded column "skill"
df1 = df.withColumn('skill', explode(col('skills'))).select('id', 'name', 'skills', 'skill')
df1.show()