.hdf5

What is a .hdf5 file?

HDF5 is a format for storing large, complex scientific datasets with hierarchical organisation and metadata.

Safe format
Type Data
By The HDF Group
MIME application/x-hdf5

What is it

HDF5 is the scientific community's answer to "how do I store a petabyte of telescope data with full metadata?" The format organises data hierarchically (like a filesystem within a file), supports datasets of arbitrary size, includes self-describing metadata, and handles multidimensional arrays natively. It's widely used in astronomy, genomics, climate science (NetCDF-4 is built on HDF5), and machine learning, where it often holds model weights.

A single HDF5 file can contain thousands of datasets organised in groups (like folders), each with attached metadata (units, dimensions, provenance). The format supports partial I/O: you can read one variable from a 100 GB file without loading the rest. Compression (for example gzip) is built in, and HDF5 builds compiled with MPI support parallel I/O, so multiple processes can read and write the same file simultaneously.
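Here's a minimal h5py sketch of groups, attributes, compression and partial reads, assuming h5py and NumPy are installed. The file name `example.hdf5`, the group `climate`, the dataset `temperature` and the attribute values are made up for illustration. It writes a chunked, gzip-compressed dataset with metadata attached, then reads back just two rows.

```python
import h5py
import numpy as np

# Hypothetical names used only for this illustration.
PATH = "example.hdf5"


def write_example(path: str) -> None:
    """Create a file with one group, some attributes and a compressed, chunked dataset."""
    with h5py.File(path, "w") as f:
        grp = f.create_group("climate")          # groups act like folders
        grp.attrs["source"] = "synthetic demo"   # metadata attached to the group
        temps = np.arange(1000 * 365, dtype="float32").reshape(1000, 365)
        dset = grp.create_dataset(
            "temperature",
            data=temps,
            chunks=(100, 365),    # stored (and compressed) in blocks of 100 rows
            compression="gzip",   # deflate filter
        )
        dset.attrs["units"] = "K"  # metadata attached to the dataset


def read_rows(path: str, start: int, stop: int) -> np.ndarray:
    """Read rows [start, stop) only; chunks outside that range are not read."""
    with h5py.File(path, "r") as f:
        dset = f["climate/temperature"]
        return dset[start:stop]  # slicing reads just the chunks it needs


if __name__ == "__main__":
    write_example(PATH)
    rows = read_rows(PATH, 10, 12)
    print(rows.shape)  # (2, 365)
```

The chunk shape is a tuning choice: chunks that match how you usually slice the data (here, blocks of whole rows) keep partial reads cheap, because HDF5 reads and decompresses whole chunks.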

Python's h5py and HDFView (free GUI) are the most common tools. h5py maps HDF5 datasets directly to NumPy arrays. Keras (bundled with TensorFlow) historically saved models and weights as HDF5 `.h5` files, though newer versions default to the `.keras` format; PyTorch uses its own serialisation format rather than HDF5. For ML practitioners, HDF5 is a format your pre-trained model weights might arrive in, especially from older Keras projects.

Technical details
Full Name
Hierarchical Data Format 5
MIME Type
application/x-hdf5
Developer
The HDF Group
Magic Bytes
89 48 44 46 0D 0A 1A 0A (usually at offset 0; a file with a user block has it at 512, 1024, 2048, …)
Safety
.hdf5 is a known, safe format. Scientific data file. No executable content.
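To check the signature yourself, here's a small standard-library Python sketch. The full HDF5 signature is the 8 bytes `89 48 44 46 0D 0A 1A 0A` (`\x89HDF\r\n\x1a\n`). It normally sits at offset 0, but a file with a user block places it at 512, 1024, 2048 and so on, so the sketch probes those offsets. The function name and the 1 MiB search cap are arbitrary choices for this example.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 89 48 44 46 0D 0A 1A 0A


def find_hdf5_signature(path, max_offset=1 << 20):
    """Return the byte offset of the HDF5 signature, or None if it isn't found.

    Probes offset 0, then 512, 1024, 2048, ... (doubling) up to max_offset,
    which is an arbitrary cap chosen for this sketch.
    """
    with open(path, "rb") as fh:
        offset = 0
        while offset <= max_offset:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return offset
            offset = 512 if offset == 0 else offset * 2
    return None
```

For a typical file, `find_hdf5_signature("data.hdf5")` returns `0`; `None` means the file doesn't carry an HDF5 superblock at any of the probed offsets.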
What opens it
HDFView
FREE Windows / Mac / Linux
Python (h5py)
FREE Windows / Mac / Linux
MATLAB
PAID Windows / Mac / Linux
FAQ
How do I open an HDF5 file?
HDFView (free from The HDF Group) provides a visual browser. Python with h5py: `import h5py; f = h5py.File('data.hdf5', 'r')`. MATLAB reads HDF5 natively with h5read().
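If you don't know what's inside a file, h5py can walk the whole hierarchy for you. A sketch, assuming h5py is installed; `list_datasets` is a hypothetical helper name, and it relies on h5py's `visititems`, which calls a function for every group and dataset below the root.

```python
import h5py


def list_datasets(path):
    """Return (name, shape, dtype) for every dataset in the file, across all groups."""
    found = []

    def visit(name, obj):
        # Called once per group and dataset; groups are skipped here.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))
        # Returning None tells visititems to keep walking.

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Calling `list_datasets('data.hdf5')` returns one `(path, shape, dtype)` tuple per dataset, which is a quick way to find the name to pass to `f[...]` in h5py or to MATLAB's `h5read()`.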
Why do scientists use HDF5 instead of CSV?
HDF5 handles multidimensional arrays, hierarchical organisation, metadata, and partial reads efficiently. A 50 GB climate dataset in CSV would be unmanageably large and slow to parse, since reading one variable means scanning the whole file. HDF5 handles it with random access to specific variables.
Related formats