Data Lakehouse Deep Dive: Hudi, Iceberg, and Delta Lake

  3,056 views

OnehouseHQ


On Tuesday, August 22nd, Onehouse presented a LinkedIn Live webinar on Apache Hudi, Apache Iceberg, and Delta Lake. You can see the video here or on LinkedIn:
/ 7095484265877950465
The video was based on Kyle Weller's popular blog post on the three major lakehouse projects:
www.onehouse.ai/blog/apache-h...
You can access Kyle's slides here:
uploads-ssl.webflow.com/64d1d...

Comments: 8
@padam_discussion, a month ago
Interesting video... great
@HoorayforOranges, 4 months ago
Thank you so much for this. This is the only video I could find that takes a real deep dive into the data without promoting any one candidate.
@JG-zu6nq, 10 months ago
Mistake at 22:41: there is no limitation that you "can't cross over the boundary" in a query after you do partition evolution in Iceberg.
@kjweller, 10 months ago
You can cross the boundary, but the query predicates need to be right to get the same performance across both partition schemes.
@JG-zu6nq, 10 months ago
@kjweller What exactly does that mean? One just has to write select * from table where ts > timestamp '2023-08-21 00:00:00', and even if the partitioning was evolved from, say, daily to hourly on 08/25, that will work and prune the partitions.
@kjweller, 10 months ago
@JG-zu6nq Take an example: say you were partitioning by date (daily) and you want to evolve this to partition by userId, or vice versa. A query with only one of the predicates will be efficient just for the section of the data partitioned on that column. Partition evolution works great when moving between aggregation levels of the same value, but struggles across different values.
@paulfunigga, 10 months ago
@kjweller What about schema evolution? Your article says that Hudi's schema evolution is good only on Spark SQL. What if I use Hudi with Trino? Is schema evolution going to be bad? Also, is Hudi good with Trino at all? In Trino's Slack channel they said that they prioritize Iceberg.
@paulfunigga, 10 months ago
@kjweller Also, in your "which format to choose", why didn't you add another point: Hudi's table services are managed, unlike those of Iceberg and Delta Lake. I think that's a big thing.
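
Editor's sketch: the partition-evolution exchange between @JG-zu6nq and @kjweller above can be illustrated with a toy model of file pruning. This is not the Iceberg API. DataFile, plan_scan, and the helper constructors are hypothetical names made up for illustration, and the model assumes the planner can prune only on the partition value a file was written with. Real engines may also use per-file column statistics, which this sketch ignores.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class DataFile:
    """Toy stand-in for a data file and the partition value it was written with."""
    name: str
    ts_max: Optional[datetime] = None  # exclusive upper bound; None if not time-partitioned
    user_id: Optional[int] = None      # None if not partitioned by userId


def daily(day: str) -> DataFile:
    start = datetime.strptime(day, "%Y-%m-%d")
    return DataFile(f"day={day}", ts_max=start + timedelta(days=1))


def hourly(hour: str) -> DataFile:
    start = datetime.strptime(hour, "%Y-%m-%dT%H")
    return DataFile(f"hour={hour}", ts_max=start + timedelta(hours=1))


def by_user(user_id: int) -> DataFile:
    return DataFile(f"user={user_id}", user_id=user_id)


def plan_scan(files: List[DataFile],
              ts_after: Optional[datetime] = None,
              user_id: Optional[int] = None) -> List[str]:
    """Return the names of files that cannot be skipped for the given predicates."""
    selected = []
    for f in files:
        # Prune on time only if the file carries a time partition value.
        if ts_after is not None and f.ts_max is not None and f.ts_max <= ts_after:
            continue
        # Prune on userId only if the file carries a userId partition value.
        if user_id is not None and f.user_id is not None and f.user_id != user_id:
            continue
        selected.append(f.name)
    return selected


# Case 1: daily evolved to hourly (same column, different granularity).
DAILY_THEN_HOURLY = [daily("2023-08-20"), daily("2023-08-22"),
                     hourly("2023-08-25T10"), hourly("2023-08-25T11")]
# Case 2: daily evolved to userId (different columns).
DAILY_THEN_USER = [daily("2023-08-20"), daily("2023-08-22"), by_user(1), by_user(2)]

cutoff = datetime(2023, 8, 21)
print(plan_scan(DAILY_THEN_HOURLY, ts_after=cutoff))
print(plan_scan(DAILY_THEN_USER, ts_after=cutoff))
print(plan_scan(DAILY_THEN_USER, user_id=1))
print(plan_scan(DAILY_THEN_USER, ts_after=cutoff, user_id=1))
```

In this model, the ts-only query prunes in both sections when daily is evolved to hourly, which matches @JG-zu6nq's point. When daily is evolved to userId, the ts-only query has to read every userId-partitioned file and the userId-only query has to read every daily file; only a query carrying both predicates prunes on both sides, which matches @kjweller's point.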