A catalog provides unified, secure access to structured data. Apache Iceberg is a new open source table format that provides ACID compliance and scales to petabyte-sized datasets. While the same Hive metastore can be used for cataloging when migrating from Hive to Iceberg tables, its convenience is quickly outweighed by intermittent lock issues and maintenance overhead. The Iceberg open source community has proposed a new catalog implementation that can be hosted as a REST server. Defined by an open API specification, the Iceberg REST catalog is a language-agnostic choice that keeps all server-side logic hidden from clients while also providing better security and extensibility.
This talk highlights how the REST catalog was built and adopted at Apple and the improvements we are bringing back to the open source community. We will cover the benefits of the new REST catalog, our migration journey from the Hive metastore to the REST catalog, and lessons learned. We will also dive into some of the ways that we extended the catalog functionality to clients outside of Apache Spark and Apache Flink, introduced a new set of APIs to expose the files behind an Iceberg table, and integrated with an authentication and authorization plugin for access control. Attendees will leave ready to start their migration to the REST catalog.
Speaker: Hongyue Zhang
Hongyue started his career developing microservices at AWS to help schedule serverless containers at scale. His journey with data started in 2021, and he immediately became a fan of the Apache Iceberg project. At Apple, Hongyue builds tools and systems around Apache Iceberg to support data-driven decisions.