Pipeline Builder is the most accessible way to build production-grade data pipelines in Palantir Foundry. This video is an intro tutorial that covers the basics of how to use Pipeline Builder and some essential data engineering concepts. It also showcases the first few AI features of Pipeline Builder.
Ontologize
Founded by Taylor Gregoire-Wright, a former Palantir implementation engineer, Ontologize offers courses & live trainings for Palantir Foundry. Visit ontologize.com or connect on LinkedIn: / tgregoirewright
You can follow along with the same data I used for this tutorial
The data used in this tutorial is notional data from a fictional set of grocery stores in the US. The datasets include:
Transactions - A customer makes a transaction when they buy groceries
Baskets - Each basket records which items, and how many of each, were bought in a single transaction
Customers - Each row is information about a single customer
Products - Descriptions of the products customers can buy, including product name, brand, store department, etc.
Stores - The stores that the parent company owns
Download the data from Ontologize's GitHub: github.com/ontologize/fake-gr...
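If you prefer to see the shape of the data in code before opening Pipeline Builder, here is a rough Python sketch of how two of the datasets relate. It is only an illustration, not how Pipeline Builder works internally: the field names (transaction_id, customer_id, store_id, product_id, quantity) and the sample rows are assumptions made up for this example, and the real columns in the download may differ. It shows the two steps the tutorial spends the most time on: aggregating basket rows per transaction, then joining that total back onto transactions.

```python
from collections import defaultdict
from dataclasses import dataclass

# NOTE: every field name here is hypothetical; check the real CSV headers.


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    customer_id: str
    store_id: str


@dataclass(frozen=True)
class BasketLine:
    transaction_id: str
    product_id: str
    quantity: int


def items_per_transaction(lines):
    """Aggregate: sum the quantity of all basket rows for each transaction."""
    totals = defaultdict(int)
    for line in lines:
        totals[line.transaction_id] += line.quantity
    return dict(totals)


def join_totals(transactions, totals):
    """Left join: keep every transaction, using 0 when it has no basket rows."""
    return [
        (t.transaction_id, t.store_id, totals.get(t.transaction_id, 0))
        for t in transactions
    ]


# Made-up sample rows, not taken from the real datasets.
transactions = [
    Transaction("t1", "c1", "s1"),
    Transaction("t2", "c2", "s1"),
]
baskets = [
    BasketLine("t1", "p1", 2),
    BasketLine("t1", "p2", 1),
]

totals = items_per_transaction(baskets)
joined = join_totals(transactions, totals)
```

A left join is used here so that a transaction with no matching basket rows still appears in the result, which makes missing data easy to spot.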
0:00 Creating a pipeline
1:30 Reference pipeline
3:29 Adding input data
4:40 Cleaning transactions data
7:30 Aggregating data
9:00 Joining data
10:35 Nesting expressions
12:12 Completing the transactions pipeline
15:23 Changing column order
16:15 Editing upstream parts of the pipeline
17:34 AI Feature: auto-naming
17:56 Drag and drop pipeline nodes
18:39 Creating an output dataset
19:29 Saving, Builds, and Jobs
20:38 Saving vs. Deploying
23:20 Organizing with colors
24:16 Creating synthetic primary keys
26:46 Wrapping up the rest of the pipeline
28:59 Reusables: parameters aka variables
30:43 Reusables: functions
32:37 AI Feature: auto-generating regular expressions
34:04 Version Control in Pipeline Builder
38:24 Pipeline organization
41:43 Course announcement