7 Comments

Great comparison.

A few things I have experienced:

- Delta streaming is good, although we don't have a lot of streaming use cases.

- An equivalent of Delta's liquid clustering (incremental) is something Iceberg is missing, iirc.

- When working with Iceberg, I noticed that you cannot create a table without a catalog. In Delta, you can just write directly to S3 and it will be a Delta table, independent of any catalog.

Hi Zach, great article thank you!

Iceberg fanboy here too… Regarding the drawback you mention with hidden partitioning: Trino just released version 472, and you can now query partitions via a hidden metadata column.

https://github.com/trinodb/trino/issues/24301

Hi, Zach. Interesting article 👍🏽

I’m keen to test Iceberg asap.

The table comparing Delta, Iceberg, and Hudi says that Delta doesn’t allow renaming or dropping columns, but it does, via column mapping:

https://docs.delta.io/latest/delta-column-mapping.html#delta-column-mapping

Makes sense

Hello 👋 Zach, Thanks for the great article. It's really helpful to compare Delta, Iceberg, and Hudi!

Quick question: I noticed in the Delta example that ZORDER is applied on a single column (event_time). Based on the Delta Lake documentation, Z-Ordering tends to be most beneficial when used on multiple columns, especially for queries with varying filters. For a single column, it seems a simple sort or clustering might suffice.

I would love to hear your thoughts. Is there an added benefit to using ZORDER in this specific case?

Thanks again for sharing this!
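To make the single-column point concrete, here is a toy sketch in plain Python. It assumes a textbook Morton (bit-interleaving) key and small non-negative integers; the function name `interleave_bits` and the sample rows are made up for illustration, and this is not how Delta implements ZORDER internally. With one column the interleaved key is just the value itself, so the resulting order matches a plain sort; the ordering only starts to differ once a second column is interleaved.

```python
def interleave_bits(values, bits=8):
    """Build a Morton (Z-order) key by interleaving the bits of each column.

    Hypothetical helper for illustration; assumes every value is a
    non-negative integer smaller than 2**bits.
    """
    key = 0
    n_cols = len(values)
    for bit in range(bits):
        for col, value in enumerate(values):
            # Take bit number `bit` of this column and place it in the
            # slot reserved for (bit, col) in the combined key.
            key |= ((value >> bit) & 1) << (bit * n_cols + col)
    return key


# One column: the key equals the value, so Z-order equals a plain sort.
rows_1d = [(5,), (1,), (7,), (3,)]
single_column_same = sorted(rows_1d, key=interleave_bits) == sorted(rows_1d)

# Two columns: rows that are close in both dimensions end up adjacent.
rows_2d = [(x, y) for x in range(4) for y in range(4)]
z_sorted_2d = sorted(rows_2d, key=interleave_bits)
first_block = z_sorted_2d[:4]
```

In this sketch `single_column_same` is true, while `first_block` groups the four rows of the lower-left 2x2 square together, which is the locality a multi-column filter benefits from.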

In this paragraph, you say 'If you use Copy-on-Write strategy, files are compacted when data is written. This makes for slower reads.' Is this a typographical error? Did you mean this makes for 'slower writes', as compaction needs to be done at write time? Amazing blog, very educational and instructive at the same time. The relational database analogy is brilliant.
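A toy sketch of the trade-off being asked about, in plain Python. The class names, the dict-as-file model, and the counters are all assumptions made for illustration; real table formats work on data files and delete or log files, not dictionaries. The point is only where the work lands: copy-on-write pays at write time, merge-on-read pays at read time.

```python
class CopyOnWriteTable:
    """Toy model: every upsert rewrites the whole base 'file'."""

    def __init__(self, rows):
        self.base = dict(rows)
        self.files_rewritten = 0

    def upsert(self, key, value):
        # The cost is paid here: copy the base and apply the change.
        new_base = dict(self.base)
        new_base[key] = value
        self.base = new_base
        self.files_rewritten += 1

    def read(self):
        # Reads are cheap: the base is already fully merged.
        return dict(self.base)


class MergeOnReadTable:
    """Toy model: upserts append to a delta log, reads merge it in."""

    def __init__(self, rows):
        self.base = dict(rows)
        self.deltas = []
        self.delta_rows_merged = 0

    def upsert(self, key, value):
        # Writes are cheap: just append the change.
        self.deltas.append((key, value))

    def read(self):
        # The cost is paid here: replay every pending change.
        merged = dict(self.base)
        for key, value in self.deltas:
            merged[key] = value
            self.delta_rows_merged += 1
        return merged


initial_rows = {"a": 1, "b": 2}
cow = CopyOnWriteTable(initial_rows)
mor = MergeOnReadTable(initial_rows)
for table in (cow, mor):
    table.upsert("a", 10)
    table.upsert("c", 3)

cow_result = cow.read()
mor_result = mor.read()
```

Both tables return the same rows; the copy-on-write one rewrote its base twice during the writes, and the merge-on-read one replayed two changes during the read.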

Yes it’s a typo. Lemme fix
