.parquet

What is a .parquet file?

Parquet is a columnar storage format for big-data analytics, dramatically faster than CSV for analytical queries.

Safe format
Type Data
By Apache Software Foundation (Twitter + Cloudera)
MIME application/vnd.apache.parquet

What is it

Parquet stores data by column instead of by row. This sounds like a minor detail, but it transforms analytical performance. When you query "average salary by department" across a million-row dataset, a row-based format (CSV) reads every column of every row. Parquet reads only the salary and department columns, skipping everything else. The result: 10-100x faster queries on large datasets.

The format also compresses dramatically better than CSV because similar values in a column compress together (all the dates, all the prices, all the names). A 10 GB CSV file commonly becomes a 1-2 GB Parquet file. Add predicate pushdown (skipping row groups that can't contain matching data) and you're reading a tiny fraction of the file for most queries.

Parquet is the standard for data lakes, analytics warehouses, and any dataset too large for a spreadsheet. Spark, Pandas, DuckDB, BigQuery, Snowflake, and Athena all read Parquet natively. For small datasets, CSV is simpler. For anything over a few hundred MB, Parquet pays for itself immediately.

Technical details
Full Name
Apache Parquet
MIME Type
application/vnd.apache.parquet
Developer
Apache Software Foundation (Twitter + Cloudera)
Magic Bytes
50 41 52 31
Safety
.parquet is a known, safe format: a binary data container with no executable content.
What opens it
DuckDB
FREE Windows / Mac / Linux
Python (pandas/pyarrow)
FREE Windows / Mac / Linux
Tad Viewer
FREE Windows / Mac / Linux
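The magic bytes listed above (50 41 52 31, i.e. the ASCII string "PAR1", which appears at both the start and the end of every Parquet file) can be checked with a few lines of standard-library Python. A sketch; the helper name is our own:

```python
import os

def looks_like_parquet(path):
    """Check for the 4-byte 'PAR1' magic at both ends of the file."""
    if os.path.getsize(path) < 8:
        return False  # too small to hold both magic markers
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"
```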
FAQ
Why is Parquet faster than CSV?
Parquet stores data by column, so queries only read the columns they need. It also uses efficient compression and supports predicate pushdown (skipping irrelevant data blocks). CSV stores data by row, forcing full-file reads for most queries.
How do I open a Parquet file?
DuckDB (free CLI) reads Parquet directly: `SELECT * FROM 'file.parquet' LIMIT 10`. Python with pandas: `pd.read_parquet('file.parquet')`. Tad Viewer (free) provides a spreadsheet-like GUI.
Related formats