to_parquet(self, path: 'FilePath | WriteBuffer[bytes] | None' = None, engine: 'str' = 'auto', compression: 'str | None' = 'snappy', index: 'bool | None' = None, partition_cols: 'list[str] | None' = None, storage_options: 'StorageOptions' = None, **kwargs) -> 'bytes | None'
Write a DataFrame to the binary parquet format.

This function writes the dataframe as a `parquet file
<https://parquet.apache.org/>`_. You can choose different parquet backends,
and have the option of compression. See :ref:`the user guide <io.parquet>`
for more details.

This function requires either the `fastparquet
<https://pypi.org/project/fastparquet>`_ or `pyarrow
<https://arrow.apache.org/docs/python/>`_ library.
Parameters
----------
path : str, path object, file-like object, or None, default None
    String, path object (implementing ``os.PathLike[str]``), or file-like
    object implementing a binary ``write()`` function. If None, the result is
    returned as bytes. If a string or path, it will be used as the root
    directory path when writing a partitioned dataset.

    Previously this parameter was named ``fname``.
engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
    Parquet library to use. If 'auto', then the option ``io.parquet.engine``
    is used. The default ``io.parquet.engine`` behavior is to try 'pyarrow',
    falling back to 'fastparquet' if 'pyarrow' is unavailable.
compression : str or None, default 'snappy'
    Name of the compression to use. Use ``None`` for no compression.
index : bool or None, default None
    If ``True``, include the dataframe's index(es) in the file output. If
    ``False``, they will not be written to the file. If ``None``, similar to
    ``True`` the dataframe's index(es) will be saved. However, instead of
    being saved as values, a ``RangeIndex`` will be stored as a range in the
    metadata so it doesn't require much space and is faster. Other indexes
    will be included as columns in the file output.
partition_cols : list of str, optional, default None
    Column names by which to partition the dataset. Columns are partitioned
    in the order they are given. Must be None if ``path`` is not a string
    (see the sketch after this parameter list).
storage_options : dict, optional
    Extra options that make sense for a particular storage connection, e.g.
    host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
    are forwarded to ``urllib`` as header options. For other URLs (e.g.
    starting with "s3://", and "gcs://") the key-value pairs are forwarded to
    ``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
**kwargs
    Additional arguments passed to the parquet library. See
    :ref:`pandas io <io.parquet>` for more details.
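A minimal sketch of the ``path``, ``index`` and ``partition_cols`` behavior
described above, assuming a parquet engine such as pyarrow is installed;
``'dataset_dir'`` is a placeholder directory name, and the partitioned call
writes one sub-directory per distinct value of ``col1`` under it:

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> raw = df.to_parquet()  # path=None, so the parquet content comes back as bytes
>>> isinstance(raw, bytes)
True
>>> df.to_parquet('dataset_dir', index=False,
...               partition_cols=['col1'])  # doctest: +SKIP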
See Also
--------
DataFrame.to_csv : Write a csv file.
DataFrame.to_hdf : Write to hdf.
DataFrame.to_sql : Write to a sql table.
read_parquet : Read a parquet file.
Examples
--------
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
...               compression='gzip')  # doctest: +SKIP
>>> pd.read_parquet('df.parquet.gzip')  # doctest: +SKIP
   col1  col2
0     1     3
1     2     4
If you want to get a buffer to the parquet content you can use an io.BytesIO
object, as long as you don't use partition_cols, which creates multiple files.

>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()
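To read the parquet content back from the buffer, rewind it and pass it to
``read_parquet``; a short continuation of the example above, assuming the
same environment:

>>> f.seek(0)
0
>>> pd.read_parquet(f)
   col1  col2
0     1     3
1     2     4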