Support for Parquet file format in Snowflake connector (S3/Blob upload)
It would be very helpful for us if DV supported a more efficient file format than CSV, such as Parquet, for the AWS S3 or Azure Blob upload during the staging process, especially for large views. In the same context, it would make sense to make the local export of the file optional (parameter tempFolder). It would then also be possible to create the file directly on the cloud storage without the need to write it to local disk first.
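Just to illustrate the idea (this is not how DV implements staging internally): a minimal Python sketch, assuming pandas, pyarrow and boto3 are available, with placeholder bucket and key names, that serializes a result set to Parquet in memory and uploads it straight to S3, so no local tempFolder swap is needed.

```python
import io

import boto3
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


def stage_dataframe_to_s3(df: pd.DataFrame, bucket: str, key: str) -> None:
    """Serialize a DataFrame to Parquet in an in-memory buffer and upload it
    directly to S3, skipping the local temp file entirely."""
    buffer = io.BytesIO()
    pq.write_table(pa.Table.from_pandas(df), buffer, compression="snappy")
    buffer.seek(0)
    boto3.client("s3").upload_fileobj(buffer, bucket, key)


# Hypothetical usage with placeholder names:
# stage_dataframe_to_s3(view_df, "my-staging-bucket", "staging/large_view.parquet")
```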
-
Official comment
Hi Sebastian,
Thanks for your request. That should be something we can indeed support. There already is a Parquet connector, but it is currently not integrated into the Snowflake/Redshift load mechanism. We will give it a try; as a first step we will find out what difference it makes in terms of performance and file size, and report our findings back here.
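For anyone who wants a rough first impression of the file-size difference themselves, a quick sketch along these lines (pandas with pyarrow installed; the column names and row count are made up) should do:

```python
import os

import numpy as np
import pandas as pd

# Synthetic fact-table-like data: mostly floating-point measures.
rows = 1_000_000
df = pd.DataFrame({
    "id": np.arange(rows),
    "amount": np.random.rand(rows) * 1_000,
    "price": np.random.rand(rows) * 100,
})

df.to_csv("fact.csv", index=False)                   # uncompressed CSV
df.to_parquet("fact.parquet", compression="snappy")  # columnar + compressed

csv_mb = os.path.getsize("fact.csv") / 1e6
parquet_mb = os.path.getsize("fact.parquet") / 1e6
print(f"CSV: {csv_mb:.1f} MB  Parquet: {parquet_mb:.1f} MB  "
      f"ratio: {csv_mb / parquet_mb:.1f}x")
```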
Best
Niklas
-
Hi Niklas
We didn't do a statistically representative analysis, but even for large fact tables containing floating-point data we got a file 10 times smaller with Parquet compared to uncompressed CSV.
The other point Sebastian mentioned is also substantial: writing large tables/CSV files to local disk first, then transmitting them to Azure (in our case) and writing them to disk a second time (blob storage) generates considerable overhead.
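To make the double-write point concrete, here is a sketch (assuming azure-storage-blob and pandas with pyarrow; the connection string, container and blob name are placeholders) that writes the Parquet output into an in-memory buffer and uploads it once, instead of exporting to local disk and then copying the file into blob storage.

```python
import io

import pandas as pd
from azure.storage.blob import BlobClient


def stage_dataframe_to_blob(df: pd.DataFrame, conn_str: str,
                            container: str, blob_name: str) -> None:
    """Serialize to Parquet in memory and write the result only once,
    directly into Azure Blob storage (no local temp file)."""
    buffer = io.BytesIO()
    df.to_parquet(buffer, compression="snappy")  # uses pyarrow under the hood
    buffer.seek(0)
    blob = BlobClient.from_connection_string(
        conn_str, container_name=container, blob_name=blob_name)
    blob.upload_blob(buffer, overwrite=True)
```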
Best,
David