Support for Parquet file format in Snowflake connector (S3/Blob upload)

Comments (2)

  • Official comment
    Niklas Schmidtmer

    Hi Sebastian,

    Thanks for your request. That should be something we can indeed support. There is already a Parquet connector, but it is currently not integrated into the Snowflake/Redshift load mechanism. We will give it a try; as a first step we want to find out what difference it makes in terms of performance and file size, and we will report our findings back here. (A rough sketch of such a Parquet load is shown below this comment.)

    Best

    Niklas
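
For illustration, a minimal sketch of how Parquet files staged in Blob storage could be loaded into Snowflake via COPY INTO using the Python connector; the table fact_sales, the stage @azure_stage, and the connection settings are placeholder assumptions, not details of the connector discussed above.

    # Illustrative only: names and credentials below are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",          # placeholder
        user="my_user",                # placeholder
        password="***",                # placeholder
        warehouse="LOAD_WH",           # placeholder
        database="ANALYTICS",          # placeholder
        schema="PUBLIC",               # placeholder
    )

    # COPY INTO can read Parquet directly from a stage; MATCH_BY_COLUMN_NAME
    # maps the Parquet columns onto the target table's columns by name.
    copy_sql = """
        COPY INTO fact_sales
        FROM @azure_stage/exports/fact_sales/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """

    cur = conn.cursor()
    try:
        cur.execute(copy_sql)
        for row in cur:
            print(row)   # one status row per loaded file
    finally:
        cur.close()
        conn.close()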

  • David Bosshard

    Hi Niklas,

    We didn't do a statistically representative analysis, but even for a large fact table containing floating-point data we got a roughly 10 times smaller file with Parquet compared to uncompressed CSV.

    The other point Sebastian mentioned is also substantial: writing large tables to disk as CSV files first, before transmitting them to Azure (in our case) and then writing them to disk a second time (blob storage), generates quite some overhead. (A rough sketch of both points follows below this comment.)

    Best,
    David
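
For illustration, a minimal sketch of the two points above, assuming pandas with pyarrow and the azure-storage-blob SDK: it compares uncompressed CSV against snappy-compressed Parquet for a made-up fact-table-like frame, then uploads the Parquet bytes directly from memory to Blob storage, skipping the intermediate local CSV write. All names, paths, and the connection string are placeholders, and the actual size ratio depends on the data.

    # Illustrative only: table shape, container, and connection string are
    # placeholders; the real compression ratio depends on the data.
    import io

    import numpy as np
    import pandas as pd
    from azure.storage.blob import BlobClient

    # 1) Compare uncompressed CSV with snappy-compressed Parquet for a
    #    fact-table-like frame (keys plus numeric measures).
    n = 500_000
    df = pd.DataFrame({
        "customer_id": np.random.randint(0, 10_000, n),
        "product_id": np.random.randint(0, 500, n),
        "quantity": np.random.randint(1, 20, n),
        "amount": np.round(np.random.exponential(50.0, n), 2),
    })

    csv_bytes = df.to_csv(index=False).encode("utf-8")

    parquet_buf = io.BytesIO()
    df.to_parquet(parquet_buf, engine="pyarrow", compression="snappy", index=False)
    parquet_bytes = parquet_buf.getvalue()

    print(f"CSV:     {len(csv_bytes) / 1e6:.1f} MB")
    print(f"Parquet: {len(parquet_bytes) / 1e6:.1f} MB")

    # 2) Upload the Parquet bytes straight from memory to Azure Blob Storage,
    #    so the data never has to be written to local disk as a CSV first.
    blob = BlobClient.from_connection_string(
        conn_str="<azure-storage-connection-string>",   # placeholder
        container_name="snowflake-staging",             # placeholder
        blob_name="fact_sales/part-0001.parquet",       # placeholder
    )
    blob.upload_blob(parquet_bytes, overwrite=True)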
