to_parquet(df: 'DataFrame', path: 'FilePath | WriteBuffer[bytes] | None' = None, engine: 'str' = 'auto', compression: 'str | None' = 'snappy', index: 'bool | None' = None, storage_options: 'StorageOptions' = None, partition_cols: 'list[str] | None' = None, **kwargs) -> 'bytes | None'
String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function. If None, the result is returned as bytes. If a string, it will be used as the root directory path when writing a partitioned dataset. The fastparquet engine does not accept file-like objects.
Parquet library to use. If 'auto', then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try 'pyarrow', falling back to 'fastparquet' if 'pyarrow' is unavailable.
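The fallback order described above can be sketched in plain Python. This is an illustration only, not the pandas implementation: resolve_engine, its configured argument (standing in for the io.parquet.engine option), and its is_installed hook are hypothetical names introduced here.

```python
from importlib.util import find_spec


def resolve_engine(engine="auto", configured="auto", is_installed=None):
    """Hypothetical helper that mirrors the documented engine fallback order."""
    if is_installed is None:
        # Default check: is the package importable in this environment?
        is_installed = lambda name: find_spec(name) is not None

    if engine == "auto":
        # 'auto' defers to the configured option (io.parquet.engine in pandas).
        engine = configured
    if engine != "auto":
        # An explicit engine name is used as given.
        return engine

    # Option is also 'auto': try pyarrow first, then fall back to fastparquet.
    for candidate in ("pyarrow", "fastparquet"):
        if is_installed(candidate):
            return candidate
    raise ImportError("neither 'pyarrow' nor 'fastparquet' is installed")


# Simulate an environment where only fastparquet is available.
print(resolve_engine("auto", is_installed=lambda name: name == "fastparquet"))
```

In pandas itself the option can be read with pd.get_option("io.parquet.engine") and changed with pd.set_option.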
Name of the compression to use; the default is 'snappy'. Use None for no compression. The supported compression methods depend on which engine is used. For 'pyarrow', 'snappy', 'gzip', 'brotli', 'lz4', and 'zstd' are all supported. For 'fastparquet', only 'gzip' and 'snappy' are supported.
If True, include the dataframe's index(es) in the file output. If False, they will not be written to the file. If None, the dataframe's index(es) will be saved, as with True. However, a RangeIndex is stored as a range in the metadata rather than as values, which takes less space and is faster. Other indexes are included as columns in the file output.
Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string.
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib as header options. For other URLs (e.g. starting with "s3://" and "gcs://") the key-value pairs are forwarded to fsspec. Please see fsspec and urllib for more details.
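The routing rule above can be summarized with a small standard-library sketch. The function storage_options_target is hypothetical and only restates the documented behavior; it is not how pandas decides internally. The option dictionaries use placeholder values, and "anon" is an s3fs option shown as an example of a backend-specific key.

```python
from urllib.parse import urlparse


def storage_options_target(url):
    """Hypothetical helper: name the library that would receive storage_options."""
    scheme = urlparse(url).scheme.lower()
    if scheme in ("http", "https"):
        return "urllib"  # key-value pairs are sent as request headers
    if scheme:
        return "fsspec"  # e.g. s3:// or gcs:// URLs
    return None  # no URL scheme: a plain local path


# Placeholder option dictionaries for the two cases.
http_options = {"User-Agent": "my-client/1.0"}
s3_options = {"anon": True}

print(storage_options_target("https://example.com/data.parquet"))
print(storage_options_target("s3://bucket/data.parquet"))
```

The keys that are valid for a non-HTTP URL depend on the fsspec backend that handles its scheme.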
Additional keyword arguments passed to the engine.
Write a DataFrame to the parquet format.