Drop any file to identify it
No upload. No signup. No sending your file halfway across the internet.
We tell you what it is, right here in your browser.
Drop it!
Let go to identify this file.
Couldn't identify this file
Need to convert it? fwip it →
Parquet stores data by column instead of by row. This sounds like a minor detail, but it transforms analytical performance. When you query "average salary by department" across a million-row dataset, a row-based format such as CSV reads every column of every row. Parquet reads only the salary and department columns, skipping everything else. The result: 10-100x faster queries on large datasets.
The format also compresses dramatically better than CSV because similar values in a column compress together (all the dates, all the prices, all the names). A 10 GB CSV file commonly becomes a 1-2 GB Parquet file. Add predicate pushdown (skipping row groups that can't contain matching data) and you're reading a tiny fraction of the file for most queries.
Parquet is the standard for data lakes, analytics warehouses, and any dataset too large for a spreadsheet. Spark, Pandas, DuckDB, BigQuery, Snowflake, and Athena all read Parquet natively. For small datasets, CSV is simpler. For anything over a few hundred MB, Parquet pays for itself immediately.