pandas 1.4.2

split(self, pat: 'str | re.Pattern | None' = None, n=-1, expand=False, *, regex: 'bool | None' = None)

Splits the string in the Series/Index from the beginning, at the specified delimiter string.

Notes

The handling of the n keyword depends on the number of found splits:

If found splits > n, make first n splits only
If found splits <= n, make all splits
If for a certain row the number of found splits < n, append `None` for padding up to n if expand=True
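
As a minimal sketch of the rules above (the sample data is illustrative and not taken from the pandas documentation), the following shows how n caps the number of splits and how shorter rows are padded when expand=True:

```python
import pandas as pd

# Illustrative data: the first row has three delimiters, the second has one.
s = pd.Series(["a-b-c-d", "e-f"])

# Row 0: found splits (3) > n (2), so only the first 2 splits are made.
# Row 1: found splits (1) <= n (2), so all splits are made.
limited = s.str.split("-", n=2)

# With expand=True, the shorter row is padded with a missing value
# so that every row has n + 1 columns.
expanded = s.str.split("-", n=2, expand=True)

print(limited.tolist())
print(expanded)
```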

If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.
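
The change of return type can be checked directly. This is a small sketch with made-up values; the variable names are only for illustration:

```python
import pandas as pd

idx = pd.Index(["a_b", "c_d"])

# Without expand, an Index caller gets back an Index whose elements are lists.
as_index = idx.str.split("_")

# With expand=True, an Index caller gets back a MultiIndex ...
as_multi = idx.str.split("_", expand=True)

# ... and a Series caller gets back a DataFrame.
as_frame = pd.Series(["a_b", "c_d"]).str.split("_", expand=True)

print(type(as_index).__name__)
print(type(as_multi).__name__)
print(type(as_frame).__name__)
```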

Passing `regex=False` together with a compiled regex as `pat` raises a ValueError.

Parameters

pat : str or compiled regex, optional

String or regular expression to split on. If not specified, split on whitespace.

n : int, default -1 (all)

Limit the number of splits in the output. None, 0 and -1 are all interpreted as "return all splits".
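
A short sketch of the equivalence described above, using an illustrative sentence and whitespace splitting:

```python
import pandas as pd

s = pd.Series(["one two three four"])

# n=-1 (the default), n=0 and n=None all mean "no limit on splits".
all_default = s.str.split()
all_zero = s.str.split(n=0)
all_none = s.str.split(n=None)

# A positive n limits the number of splits, counted from the left.
one_split = s.str.split(n=1)

print(all_default[0])
print(one_split[0])
```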

expand : bool, default False

Expand the split strings into separate columns.

If True, return DataFrame/MultiIndex expanding dimensionality.
If False, return Series/Index, containing lists of strings.

regex : bool, default None

Determines if the passed-in pattern is a regular expression:

If True, assumes the passed-in pattern is a regular expression.
If False, treats the pattern as a literal string.
If None and `pat` length is 1, treats `pat` as a literal string.
If None and `pat` length is not 1, treats `pat` as a regular expression.
Cannot be set to False if `pat` is a compiled regex.

versionadded
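
The rules for the regex parameter can be seen with a pattern that contains a regex metacharacter. This is a sketch with illustrative data; "+" is chosen because it means "one or more" in a regular expression:

```python
import pandas as pd

s = pd.Series(["1+2+3"])

# regex=None and len(pat) == 1: "+" is treated as a literal string.
single_char = s.str.split("+")

# regex=None and len(pat) != 1: "2+" is treated as a regular expression
# meaning "one or more 2s", so the "+" characters stay in the pieces.
multi_char_default = s.str.split("2+")

# regex=False: "2+" is treated as the literal two-character string.
multi_char_literal = s.str.split("2+", regex=False)

print(single_char[0])
print(multi_char_default[0])
print(multi_char_literal[0])
```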

Raises

ValueError

If regex is False and `pat` is a compiled regex.
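
A minimal sketch of triggering and handling this error; the exact wording of the message is not asserted here, since it may differ between pandas versions:

```python
import re

import pandas as pd

s = pd.Series(["foojpgbar.jpg"])
compiled = re.compile(r"\.jpg")

# A compiled regex cannot be combined with regex=False.
try:
    s.str.split(compiled, regex=False)
except ValueError as err:
    message = str(err)
    print(f"ValueError: {message}")
else:
    message = None
```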

Returns

Series, Index, DataFrame or MultiIndex: Type matches caller unless expand=True (see Notes).

Split strings around given separator/delimiter.

See Also

Series.str.join: Join lists contained as elements in the Series/Index with passed delimiter.

Series.str.rsplit: Splits string around given separator/delimiter, starting from the right.

Series.str.split: Split strings around given separator/delimiter.

str.rsplit: Standard library version for rsplit.

str.split: Standard library version for split.

Examples

This example is valid syntax, but we were not able to check execution

>>> s = pd.Series(
...     [
...         "this is a regular sentence",
...         "https://docs.python.org/3/tutorial/index.html",
...         np.nan
...     ]
... )
>>> s
0                       this is a regular sentence
1    https://docs.python.org/3/tutorial/index.html
2                                              NaN
dtype: object

In the default setting, the string is split by whitespace.

>>> s.str.split()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

Without the n parameter, the outputs of rsplit and split are identical.

>>> s.str.rsplit()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

The n parameter can be used to limit the number of splits on the delimiter. The outputs of split and rsplit are different.

>>> s.str.split(n=2)
0                     [this, is, a regular sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

>>> s.str.rsplit(n=2)
0                     [this is a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

The `pat` parameter can be used to split by other characters.

>>> s.str.split(pat="/")
0                         [this is a regular sentence]
1    [https:, , docs.python.org, 3, tutorial, index...
2                                                  NaN
dtype: object

When using expand=True, the split elements will expand out into separate columns. If NaN is present, it is propagated throughout the columns during the split.

>>> s.str.split(expand=True)
                                               0     1     2        3         4
0                                           this    is     a  regular  sentence
1  https://docs.python.org/3/tutorial/index.html  None  None     None      None
2                                            NaN   NaN   NaN      NaN       NaN

For slightly more complex use cases, like splitting the HTML document name from a URL, a combination of parameter settings can be used.

>>> s.str.rsplit("/", n=1, expand=True)
                                    0           1
0          this is a regular sentence        None
1  https://docs.python.org/3/tutorial  index.html
2                                 NaN         NaN

Remember to escape special characters when explicitly using regular expressions.

>>> s = pd.Series(["foo and bar plus baz"])
>>> s.str.split(r"and|plus", expand=True)
    0   1   2
0 foo bar baz

Regular expressions can be used to handle URLs or file names. When `pat` is a string and regex=None (the default), the given `pat` is compiled as a regex only if len(pat) != 1.

>>> s = pd.Series(['foojpgbar.jpg'])
>>> s.str.split(r".", expand=True)
           0    1
0  foojpgbar  jpg

>>> s.str.split(r"\.jpg", expand=True)
           0 1
0  foojpgbar

When regex=True, `pat` is interpreted as a regex.

>>> s.str.split(r"\.jpg", regex=True, expand=True)
           0 1
0  foojpgbar

A compiled regex can be passed as `pat`.

>>> import re
>>> s.str.split(re.compile(r"\.jpg"), expand=True)
           0 1
0  foojpgbar

When regex=False, `pat` is interpreted as the string itself.

>>> s.str.split(r"\.jpg", regex=False, expand=True)
               0
0  foojpgbar.jpg

Back References

The following pages refer to this document, either explicitly or through code examples that use it.

pandas.core.strings.accessor.StringMethods.split
pandas.core.strings.accessor.StringMethods.rsplit
pandas.core.strings.accessor.StringMethods.cat

File: /pandas/core/strings/accessor.py#834
type: <class 'function'>
Commit: