split(self, pat: 'str | re.Pattern | None' = None, n=-1, expand=False, *, regex: 'bool | None' = None)
Splits the string in the Series/Index from the beginning, at the specified delimiter string.
The handling of the `n` keyword depends on the number of found splits:
- If found splits > `n`, make first `n` splits only.
- If found splits <= `n`, make all splits.
- If for a certain row the number of found splits < `n`, append `None` for padding up to `n` if `expand=True`.
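The rules for `n` can be illustrated with a small sketch. The three-row Series and the variable names below are made up for illustration, and `dtype=object` is set explicitly so the behavior does not depend on the default string dtype of the installed pandas version.

```python
import pandas as pd

# Illustrative data: rows containing 3, 1, and 0 whitespace separators.
s = pd.Series(["a b c d", "a b", "a"], dtype=object)

# Row 0 has more than n=2 separators, so only the first two splits are made.
# Rows 1 and 2 have fewer separators and are split completely.
limited = s.str.split(n=2)
print(limited.tolist())

# With expand=True, shorter rows are padded with missing values
# so that every row fills the same number of columns.
padded = s.str.split(n=2, expand=True)
print(padded)
```

Here the widest row produces three pieces, so the expanded frame has three columns and the shorter rows carry missing values in the trailing ones.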
If using `expand=True`, Series and Index callers return DataFrame and MultiIndex objects, respectively.
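A brief sketch of how the return type follows the caller, using made-up underscore-delimited values:

```python
import pandas as pd

values = ["a_b", "c_d"]  # illustrative data

# A Series caller with expand=True returns a DataFrame.
frame = pd.Series(values).str.split("_", expand=True)

# An Index caller with expand=True returns a MultiIndex.
multi = pd.Index(values).str.split("_", expand=True)

# Without expand, the Index caller gets an Index of lists back.
lists = pd.Index(values).str.split("_")

print(type(frame).__name__)
print(type(multi).__name__)
print(type(lists).__name__)
```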
Use of `regex=False` with a `pat` as a compiled regex will raise an error.
pat : str or compiled regex, optional
    String or regular expression to split on. If not specified, split on whitespace.
n : int, default -1 (all)
    Limit number of splits in output. `None`, 0 and -1 will be interpreted as return all splits.
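The sketch below checks that the three "no limit" values give the same result on a made-up one-row Series, and contrasts this with the standard library, where `maxsplit=0` means "do not split". `dtype=object` is an assumption made here to keep the example independent of the default string dtype.

```python
import pandas as pd

s = pd.Series(["a b c"], dtype=object)  # illustrative data

# None, 0 and -1 are all documented as "return all splits".
results = [s.str.split(n=limit).iloc[0] for limit in (None, 0, -1)]
print(results)

# The built-in str.split treats maxsplit=0 differently: no split is made.
builtin = "a b c".split(maxsplit=0)
print(builtin)
```

This difference matters when porting code from plain `str.split` to the `.str` accessor.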
expand : bool, default False
    Expand the split strings into separate columns.
    - If `True`, return DataFrame/MultiIndex expanding dimensionality.
    - If `False`, return Series/Index, containing lists of strings.
regex : bool, default None
    Determines if the passed-in pattern is a regular expression:
    - If `True`, assumes the passed-in pattern is a regular expression.
    - If `False`, treats the pattern as a literal string.
    - If `None` and `pat` length is 1, treats `pat` as a literal string.
    - If `None` and `pat` length is not 1, treats `pat` as a regular expression.
    - Cannot be set to `False` if `pat` is a compiled regex.
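The decision rules above can be restated as a small pure-Python function. `pattern_is_regex` is a hypothetical helper written for this illustration; it is not part of pandas and does not claim to mirror the library's internal code. It assumes `pat` is either a `str` or a compiled pattern.

```python
import re


def pattern_is_regex(pat, regex=None):
    """Hypothetical helper restating the documented rules; not pandas code."""
    if isinstance(pat, re.Pattern):
        if regex is False:
            # Documented as an error: a compiled regex cannot be a literal.
            raise ValueError("regex=False cannot be used with a compiled pattern")
        return True
    if regex is None:
        # A single character is taken literally; anything longer is a regex.
        return len(pat) != 1
    return bool(regex)


print(pattern_is_regex("."))                     # single character
print(pattern_is_regex(r"\.jpg"))                # longer string
print(pattern_is_regex(r"\.jpg", regex=False))   # explicit literal
print(pattern_is_regex(re.compile(r"\.jpg")))    # compiled pattern
```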
ValueError
    If `regex` is False and `pat` is a compiled regex.
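A short sketch of this failure case, assuming the exception type is `ValueError`. The variable names are illustrative, and the exact error message may vary between pandas versions, so it is printed rather than compared.

```python
import re

import pandas as pd

s = pd.Series(["foojpgbar.jpg"])
compiled = re.compile(r"\.jpg")

try:
    # A compiled pattern cannot be treated as a literal string.
    s.str.split(compiled, regex=False)
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")
else:
    raised = False
```

Dropping `regex=False`, or passing the pattern as a plain string, avoids the error.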
Type matches caller unless `expand=True` (see Notes).
Split strings around given separator/delimiter.
Series.str.join
Join lists contained as elements in the Series/Index with passed delimiter.
Series.str.rsplit
Splits string around given separator/delimiter, starting from the right.
Series.str.split
Split strings around given separator/delimiter.
str.rsplit
Standard library version for rsplit.
str.split
Standard library version for split.
>>> s = pd.Series(
...     [
...         "this is a regular sentence",
...         "https://docs.python.org/3/tutorial/index.html",
...         np.nan
...     ]
... )
>>> s
0                       this is a regular sentence
1    https://docs.python.org/3/tutorial/index.html
2                                              NaN
dtype: object
In the default setting, the string is split by whitespace.
>>> s.str.split()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object
Without the `n` parameter, the outputs of `rsplit` and `split` are identical.
>>> s.str.rsplit()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object
The `n` parameter can be used to limit the number of splits on the delimiter. The outputs of `split` and `rsplit` are different.
>>> s.str.split(n=2)
0                     [this, is, a regular sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object
>>> s.str.rsplit(n=2)
0                     [this is a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object
The `pat` parameter can be used to split by other characters.
>>> s.str.split(pat="/")
0                         [this is a regular sentence]
1    [https:, , docs.python.org, 3, tutorial, index...
2                                                  NaN
dtype: object
When using `expand=True`, the split elements will expand out into separate columns. If NaN is present, it is propagated throughout the columns during the split.
>>> s.str.split(expand=True)
                                               0     1     2        3         4
0                                           this    is     a  regular  sentence
1  https://docs.python.org/3/tutorial/index.html  None  None     None      None
2                                            NaN   NaN   NaN      NaN       NaN
For slightly more complex use cases, like splitting the HTML document name from a URL, a combination of parameter settings can be used.
>>> s.str.rsplit("/", n=1, expand=True)
                                    0           1
0          this is a regular sentence        None
1  https://docs.python.org/3/tutorial  index.html
2                                 NaN         NaN
Remember to escape special characters when explicitly using regular expressions.
>>> s = pd.Series(["foo and bar plus baz"])
>>> s.str.split(r"and|plus", expand=True)
      0      1     2
0  foo    bar    baz
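When the delimiter itself contains regex metacharacters, one option is to escape it with the standard-library `re.escape` before passing it as a pattern. The sketch below uses a made-up delimiter `".*"` and illustrative variable names.

```python
import re

import pandas as pd

# Illustrative data: the delimiter ".*" consists only of regex metacharacters.
s = pd.Series(["a.*b.*c"], dtype=object)

# re.escape produces a pattern that matches the delimiter literally.
escaped = re.escape(".*")
parts = s.str.split(escaped, regex=True).iloc[0]
print(parts)

# Passing the raw delimiter with regex=False is an equivalent alternative.
literal_parts = s.str.split(".*", regex=False).iloc[0]
print(literal_parts)
```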
Regular expressions can be used to handle URLs or file names. When `pat` is a string and `regex=None` (the default), the given `pat` is compiled as a regex only if `len(pat) != 1`.
>>> s = pd.Series(['foojpgbar.jpg'])
>>> s.str.split(r".", expand=True)
           0    1
0  foojpgbar  jpg
>>> s.str.split(r"\.jpg", expand=True)
           0 1
0  foojpgbar
When `regex=True`, `pat` is interpreted as a regex.
>>> s.str.split(r"\.jpg", regex=True, expand=True)
           0 1
0  foojpgbar
A compiled regex can be passed as `pat`.
>>> import re
>>> s.str.split(re.compile(r"\.jpg"), expand=True)
           0 1
0  foojpgbar
When `regex=False`, `pat` is interpreted as the string itself.
>>> s.str.split(r"\.jpg", regex=False, expand=True)
               0
0  foojpgbar.jpg
The following pages refer to this document, either explicitly or through code examples that use it.
pandas.core.strings.accessor.StringMethods.split
pandas.core.strings.accessor.StringMethods.rsplit
pandas.core.strings.accessor.StringMethods.cat