Transform data with ADF Data Flows Part 3: Derive columns, bring it all together

Views: 14,271

Data Factory

Comments: 25

@Anroth98 2 years ago

Thanks for the tutorial! Very clear and concise. I wish the video were in Full HD, though.

@jayakrishna9153 4 years ago

Very good. Please upload more videos.

@MSDataFactory 4 years ago

Anything in particular you'd like to see next?

@liftfundus6468 2 years ago

I am trying to sink a JSON file to an Azure SQL database. What data flow activity should I use if I am dealing with a particular field of the JSON objects where sometimes that field holds an empty array and other times that field is an array with one or multiple objects? I am unsure of how to deal with that variability.

@MSDataFactory 2 years ago

You should flatten that array using the Flatten transformation.
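
To make the idea concrete, here is a minimal Python sketch of what unrolling an array field means. It is plain Python, not ADF's implementation, and the names `flatten_rows`, `orders`, and `items` are made up for illustration. It assumes that a record with an empty array should still produce one row with a null child value; whether the Flatten transformation keeps or drops such rows is worth checking against your own data.

```python
from typing import Any


def flatten_rows(records: list[dict[str, Any]], array_field: str) -> list[dict[str, Any]]:
    """Unroll `array_field` so that each array element becomes its own row."""
    rows: list[dict[str, Any]] = []
    for record in records:
        # Columns that are repeated on every unrolled row.
        parent = {k: v for k, v in record.items() if k != array_field}
        items = record.get(array_field) or []
        if not items:
            # Assumption of this sketch: keep the parent row with a null child.
            rows.append({**parent, array_field: None})
            continue
        for item in items:
            rows.append({**parent, array_field: item})
    return rows


# Hypothetical input covering the three cases from the question:
# an empty array, one object, and multiple objects.
orders = [
    {"id": 1, "items": []},
    {"id": 2, "items": [{"sku": "A"}]},
    {"id": 3, "items": [{"sku": "B"}, {"sku": "C"}]},
]
flat = flatten_rows(orders, "items")
```

After unrolling, every row has the same shape (one parent plus at most one child object), which is what a relational sink such as a SQL table needs.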

@hkiamzon 3 years ago

Hi. Thanks for the very helpful videos. Regarding derived columns, is there a way to pull in a value from a separate dataset? This is my use case: I have an Excel workbook in which the first tab contains a field with the file's creation date. The rest of the tabs contain tabular data, which I'll be creating separate datasets for, but this data does not contain a date indicating the age of the data. The user has to refer to the first tab for the creation date. When creating the datasets for the tabular data, I'd like to add a column and populate each row with the Excel file's creation date. Can this be done with a data flow using a derived column? Thanks.

@MSDataFactory 3 years ago

If the created date is part of the Excel file's data then you can do this using either a Lookup or a Cached Lookup. However, if the file create date is not present in the file columns/rows, then you'll need to fetch that from the pipeline using GetMetadata, then pass that date to the data flow as a parameter.
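
As a rough illustration of the second option, the Python sketch below stamps every row with a single date value that was obtained once, outside the row data, much like a parameter handed to the data flow would be used in a derived column. It is not ADF code; `stamp_rows`, `file_created_date`, and the sample values are hypothetical.

```python
from datetime import date
from typing import Any


def stamp_rows(
    rows: list[dict[str, Any]],
    created: date,
    column: str = "file_created_date",
) -> list[dict[str, Any]]:
    """Return copies of the rows with the same date added to each one."""
    return [{**row, column: created.isoformat()} for row in rows]


# Hypothetical value standing in for what the pipeline would pass in.
created_param = date(2021, 3, 15)
sheet_rows = [{"region": "East", "sales": 120}, {"region": "West", "sales": 95}]
stamped = stamp_rows(sheet_rows, created_param)
```

The point of the design is that the date is resolved once and applied to every row, so the tabular data itself never has to contain it.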

@joshiabhinav 3 years ago

Hi. I have a question. We are currently using Azure Data Factory and Azure Databricks. I am planning to do away with Databricks and use Azure Data Flows. I understand that Azure Data Flows also runs on Spark under the covers. How would I differentiate Azure Data Flows from Azure Databricks in that regard?

@MSDataFactory 3 years ago

Data Flows is built as an ETL engine in ADF that happens to use Spark as the compute engine for executing your data transformation logic. You can take the same data flow script from ADF and paste it into an Azure Synapse data flow and it will work just fine on both flavors of Spark.

@joshiabhinav 3 years ago

@MSDataFactory Thanks for the reply. Could you comment on the performance of Azure Databricks versus Azure Data Flows, considering a scenario that is exactly the same but built once in Azure Databricks and once in Azure Data Flows? I am at a point in my organization where I need to justify the use of Azure Data Flows over Azure Databricks.

@MSDataFactory 3 years ago

@joshiabhinav This is really difficult to answer in general terms. With Databricks, the coding is your task, and some folks can write terrifically optimized ETL code, while others write very poor-performing code. What I can share is a couple of examples of baseline timings in ADF Data Flows that we've documented. I have a video and a slide deck to give you some examples. I'm hoping this will help you with your decision: kzfaq.info/get/bejne/qt-XadOKmK28Xas.html www2.slideshare.net/kromerm/azure-data-factory-data-flow-performance-tuning-101

@joshiabhinav 3 years ago

@MSDataFactory Thank you so much, sir.

@trishachatterjee1126 3 years ago

Hi. What if I want to remove some redundant strings from the column names dynamically? Is there any way to do that? Thanks in advance.

@MSDataFactory 3 years ago

Do you want to remove redundant strings from the column values or from the column names?

@vinayvarmadandu1619 4 years ago

Hi. In ADF, the cluster startup time is 4 to 5 minutes, and the conditional split activity runs for a long time. How can we optimize the performance and the cluster startup time? I used a TTL and created the cluster with 16*16 driver cores.

@MSDataFactory 4 years ago

If you set a TTL on the Azure IR, then ADF will utilize the Databricks warm pool feature, and your subsequent activities using that same Azure IR will start up in 1-2 minutes.

@vinayvarmadandu1619 4 years ago

@MSDataFactory Thank you for the immediate response. I tried it and it is working fine. Are there any videos that cover performance tuning for activities like the conditional split operator and some other activities? They were taking a long time to process.

@MSDataFactory 4 years ago

@vinayvarmadandu1619 We do have some videos here on the YT channel about performance; you can search for them and see if they help. Or, if you'd like to discuss your specific case, post a new topic on our public ADF Q&A site and we can help there: docs.microsoft.com/en-us/answers/topics/azure-data-factory.html

@abdulwahabo 4 years ago

Hi, what's an efficient way to create a new derived column by combining all columns from the input stream? I'd prefer not to hard-code the column names or their position/index.

@MSDataFactory 4 years ago

Do you mean all string columns aggregated into a single string?

@abdulwahabo 4 years ago

@MSDataFactory I have columns with different data types (string, date time, complex object, array) in my source data. I want to create a new column with the whole data of my source stream. Something like this: newColumn = [{string, complex object, date time }, {string, complex object, date time }]

@MSDataFactory 4 years ago

@@abdulwahabo Add a Derived Column and create a new hierarchical structure with those columns.
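
The shape being asked for can be sketched in plain Python as follows. This is only an illustration of nesting every input column into one new structure without naming the columns; it is not data flow expression syntax, and `add_combined_column`, `allColumns`, and the sample rows are hypothetical.

```python
from datetime import datetime
from typing import Any


def add_combined_column(
    rows: list[dict[str, Any]],
    new_column: str = "allColumns",
) -> list[dict[str, Any]]:
    """Copy every existing column of each row into one nested structure."""
    # dict(row) snapshots the row's columns, whatever their names or types,
    # so nothing is hard-coded by name or position.
    return [{**row, new_column: dict(row)} for row in rows]


# Hypothetical rows mixing a string, a complex object, and a date time.
source = [
    {"name": "a", "details": {"tags": ["x", "y"]}, "seen": datetime(2020, 1, 1)},
    {"name": "b", "details": {"tags": []}, "seen": datetime(2020, 1, 2)},
]
combined = add_combined_column(source)
```

Each output row keeps its original columns and gains one structured column holding all of them, which matches the `newColumn` layout described in the question.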

@joshiabhinav 3 years ago

Hey, I have a question. I am picking up files that match the same pattern and processing them. I want to get the file name of each file as a separate column. How can I do this?

@MSDataFactory 3 years ago

Use the "Column to Store File Name" property in Source Options: docs.microsoft.com/en-us/azure/data-factory/format-delimited-text#source-properties
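
For readers who want to see what that property amounts to, here is a small Python sketch of the same idea: read every file matching a pattern and record which file each row came from. It uses only the standard library and is not ADF code; `read_with_file_name` and `source_file` are hypothetical names, and the sketch stores just the file name, which may differ from what the ADF property writes.

```python
import csv
from pathlib import Path


def read_with_file_name(
    folder: Path,
    pattern: str,
    name_column: str = "source_file",
) -> list[dict[str, str]]:
    """Read all CSV files matching `pattern` and tag each row with its file name."""
    rows: list[dict[str, str]] = []
    # Sort so the output order does not depend on the file system.
    for path in sorted(folder.glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row[name_column] = path.name
                rows.append(row)
    return rows
```

Calling `read_with_file_name(Path("data"), "sales_*.csv")` on a hypothetical `data` folder would return one dictionary per row, each with an extra `source_file` entry.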

@joshiabhinav 3 years ago

@MSDataFactory Hey, thanks. Just used it and it works fab.