read_stata(filepath_or_buffer: 'FilePath | ReadBuffer[bytes]', convert_dates: 'bool' = True, convert_categoricals: 'bool' = True, index_col: 'str | None' = None, convert_missing: 'bool' = False, preserve_dtypes: 'bool' = True, columns: 'Sequence[str] | None' = None, order_categoricals: 'bool' = True, chunksize: 'int | None' = None, iterator: 'bool' = False, compression: 'CompressionOptions' = 'infer', storage_options: 'StorageOptions' = None) -> 'DataFrame | StataReader'
Categorical variables read through an iterator may not have the same categories and dtype. This occurs when a variable stored in a DTA file is associated with an incomplete set of value labels that label only a strict subset of the values.
Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.dta.
If you want to pass in a path object, pandas accepts any os.PathLike.
By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
Convert date variables to DataFrame time values.
Read value labels and convert columns to Categorical/Factor variables.
Column to set as index.
Flag indicating whether to convert missing values to their Stata representations. If False, missing values are replaced with nan. If True, columns containing missing values are returned with object data types and missing values are represented by StataMissingValue objects.
Preserve Stata datatypes. If False, numeric data are upcast to pandas default types for foreign data (float64 or int64).
Columns to retain. Columns will be returned in the given order. None returns all columns.
Flag indicating whether converted categorical data are ordered.
Return a StataReader object for iteration; each chunk contains the given number of lines.
Return StataReader object.
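For illustration, a minimal sketch of these options (assuming the 'animals.dta' file created in the Examples below): keep only selected columns, return Stata missing values as StataMissingValue objects, and upcast numeric data to pandas default types:

>>> df = pd.read_stata(
...     'animals.dta',
...     columns=['animal', 'speed'],
...     convert_missing=True,
...     preserve_dtypes=False,
... )  # doctest: +SKIP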
For on-the-fly decompression of on-disk data. If 'infer' and 'filepath_or_buffer' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with the key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd'}; other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, or zstandard.ZstdDecompressor, respectively. As an example, the following could be passed for Zstandard decompression using a custom compression dictionary: compression={'method': 'zstd', 'dict_data': my_compression_dict}.
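For example, a sketch (assuming a DataFrame df as in the Examples below) that writes and reads back a gzip-compressed Stata file; with the '.gz' extension, compression='infer' would also detect it:

>>> df.to_stata('animals.dta.gz', compression='gzip')  # doctest: +SKIP
>>> df = pd.read_stata('animals.dta.gz', compression='gzip')  # doctest: +SKIP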
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib as header options. For other URLs (e.g. starting with "s3://" and "gcs://") the key-value pairs are forwarded to fsspec. Please see fsspec and urllib for more details.
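As a hedged sketch only, reading directly from S3 (the bucket name is hypothetical and s3fs must be installed), with anonymous access requested via storage_options:

>>> df = pd.read_stata(
...     's3://my-bucket/animals.dta',
...     storage_options={'anon': True},
... )  # doctest: +SKIP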
Read Stata file into DataFrame.
DataFrame.to_stata
Export Stata data files.
io.stata.StataReader
Low-level reader for Stata data files.
Creating a dummy Stata file for this example:

>>> df = pd.DataFrame({'animal': ['falcon', 'parrot', 'falcon', 'parrot'],
...                    'speed': [350, 18, 361, 15]})  # doctest: +SKIP
>>> df.to_stata('animals.dta')  # doctest: +SKIP
Read a Stata dta file:
>>> df = pd.read_stata('animals.dta')  # doctest: +SKIP
Read a Stata dta file in 10,000-line chunks:

>>> values = np.random.randint(0, 10, size=(20_000, 1), dtype="uint8")  # doctest: +SKIP
>>> df = pd.DataFrame(values, columns=["i"])  # doctest: +SKIP
>>> df.to_stata('filename.dta')  # doctest: +SKIP
>>> itr = pd.read_stata('filename.dta', chunksize=10000)  # doctest: +SKIP
>>> for chunk in itr:
...     # Operate on a single chunk, e.g., chunk.mean()
...     pass  # doctest: +SKIP
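As noted above, categorical columns read through an iterator may end up with different category sets per chunk. A sketch of one way to reconcile them before concatenating (the labeled column 'animal' and the file 'labeled.dta' are hypothetical), using pandas.api.types.union_categoricals:

>>> from pandas.api.types import union_categoricals  # doctest: +SKIP
>>> chunks = list(pd.read_stata('labeled.dta', chunksize=10000))  # doctest: +SKIP
>>> # Build the union of the category sets observed across all chunks
>>> cats = union_categoricals([c['animal'] for c in chunks]).categories  # doctest: +SKIP
>>> # Re-apply the common categories to each chunk, then concatenate
>>> df = pd.concat(
...     [c.assign(animal=c['animal'].cat.set_categories(cats)) for c in chunks],
...     ignore_index=True,
... )  # doctest: +SKIP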
The following pages refer to this document explicitly or contain code examples that use it.
pandas.core.frame.DataFrame.to_stata