Python RegEx

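A RegEx, or regular expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check whether a string contains the specified search pattern.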


RegEx Module

Python has a built-in package called `re`, which can be used to work with regular expressions.

Import the `re` module:

import re


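RegEx in Python

Once you have imported the `re` module, you can start using regular expressions:

Example

Search the string to see if it starts with "The" and ends with "Spain":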

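import re

# Check whether the string starts with "The" and ends with "Spain"
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
print("Match found" if x else "No match")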

RegEx Functions

The `re` module offers a set of functions that allows us to search a string for a match:

| Function | Description |
|---|---|
| findall | Returns a list containing all matches |
| search | Returns a Match object if there is a match anywhere in the string |
| split | Returns a list where the string has been split at each match |
| sub | Replaces one or many matches with a string |



Metacharacters

Metacharacters are characters with a special meaning:

| Character | Description | Example |
|---|---|---|
| [] | A set of characters | "[a-m]" |
| \ | Signals a special sequence (can also be used to escape special characters) | "\d" |
| . | Any character (except newline character) | "he..o" |
| ^ | Starts with | "^hello" |
| $ | Ends with | "world$" |
| * | Zero or more occurrences | "aix*" |
| + | One or more occurrences | "aix+" |
| {} | Exactly the specified number of occurrences | "al{2}" |
| \| | Either or | "falls\|stays" |
| () | Capture and group | |
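As a rough illustration of the table above, the sketch below runs a few of these metacharacters through `re.findall()` and `re.search()`. The sample string and the variable names are made up for this example.

```python
import re

# Sample text chosen only for illustration
txt = "hello planet, hello world"

# [] : every lowercase character between a and m
in_a_to_m = re.findall("[a-m]", txt)

# .  : "he", then any two characters, then "o"
hello_like = re.findall("he..o", txt)      # ['hello', 'hello']

# ^ and $ : anchor the pattern to the start / end of the string
starts = re.search("^hello", txt) is not None   # True
ends = re.search("world$", txt) is not None     # True

# |  : either "planet" or "world"
either = re.findall("planet|world", txt)   # ['planet', 'world']

# {} : "he" followed by exactly two "l" characters
double_l = re.findall("hel{2}", txt)       # ['hell', 'hell']

print(in_a_to_m, hello_like, starts, ends, either, double_l)
```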


Special Sequences

A special sequence is a `\` followed by one of the characters in the list below, and has a special meaning:

| Character | Description | Example |
|---|---|---|
| \A | Returns a match if the specified characters are at the beginning of the string | "\AThe" |
| \b | Returns a match where the specified characters are at the beginning or at the end of a word (the "r" prefix makes sure that the string is treated as a "raw string") | r"\bain" r"ain\b" |
| \B | Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the "r" prefix makes sure that the string is treated as a "raw string") | r"\Bain" r"ain\B" |
| \d | Returns a match where the string contains digits (numbers from 0-9) | "\d" |
| \D | Returns a match where the string contains a character that is NOT a digit | "\D" |
| \s | Returns a match where the string contains a white-space character | "\s" |
| \S | Returns a match where the string contains a character that is NOT white space | "\S" |
| \w | Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character) | "\w" |
| \W | Returns a match where the string contains a character that is NOT a word character | "\W" |
| \Z | Returns a match if the specified characters are at the end of the string | "Spain\Z" |
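To make the table more concrete, here is a minimal sketch that tries several special sequences on one sample sentence. The sentence and the variable names are assumptions made for illustration; every pattern is written as a raw string so that the backslash reaches the regex engine unchanged.

```python
import re

# Sample text chosen only for illustration
txt = "The rain in Spain costs 25 euros"

digits = re.findall(r"\d", txt)                    # ['2', '5']
starts_with_the = re.search(r"\AThe", txt) is not None    # True
ends_with_euros = re.search(r"euros\Z", txt) is not None  # True

# \b : "ain" at the start of a word (none here) vs. at the end of a word
word_start = re.findall(r"\bain", txt)             # []
word_end = re.findall(r"ain\b", txt)               # ['ain', 'ain']

# \B : "ain" that is NOT at the start of a word ("rain", "Spain")
not_word_start = re.findall(r"\Bain", txt)         # ['ain', 'ain']

# \s : each white-space character between the seven words
spaces = re.findall(r"\s", txt)

print(digits, word_start, word_end, not_word_start, len(spaces))
```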


Sets

A set is a set of characters inside a pair of square brackets `[]` with a special meaning:

| Set | Description |
|---|---|
| [arn] | Returns a match where one of the specified characters (a, r, or n) is present |
| [a-n] | Returns a match for any lower case character, alphabetically between a and n |
| [^arn] | Returns a match for any character EXCEPT a, r, and n |
| [0123] | Returns a match where any of the specified digits (0, 1, 2, or 3) is present |
| [0-9] | Returns a match for any digit between 0 and 9 |
| [0-5][0-9] | Returns a match for any two-digit number from 00 to 59 |
| [a-zA-Z] | Returns a match for any character alphabetically between a and z, lower case OR upper case |
| [+] | In sets, +, *, ., \|, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string |
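The following sketch shows how a few of the sets above behave in practice. The sample strings and variable names are invented for this example, and the time pattern is only an illustration of combining `[0-5][0-9]` with other sets.

```python
import re

# Sample text chosen only for illustration
txt = "The rain in Spain at 08:45"

# [arn] : every a, r, or n in the string, in order of appearance
arn = re.findall("[arn]", txt)      # ['r', 'a', 'n', 'n', 'a', 'n', 'a']

# [^arn] : every character of "rain" except a, r, and n
not_arn = re.findall("[^arn]", "rain")   # ['i']

# [0-2][0-9]:[0-5][0-9] : a simple hour:minute shape built from sets
times = re.findall("[0-2][0-9]:[0-5][0-9]", txt)   # ['08:45']

# [+] : inside a set, + is a literal plus sign
plus = re.findall("[+]", "1+1=2")   # ['+']

print(arn, not_arn, times, plus)
```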


 

The findall() Function

The `findall()` function returns a list containing all matches.

Example

Print a list of all matches:

import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)  # ['ai', 'ai']

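The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

Example

Return an empty list if no match was found: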

import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)  # []

 

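The search() Function

The `search()` function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned:

Example

Search for the first white-space character in the string: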

import re

txt = "The rain in Spain"
# Raw string, so "\s" is passed to the regex engine unchanged
x = re.search(r"\s", txt)

print("The first white-space character is located in position:", x.start())

If no matches are found, the value `None` is returned:

Example

Make a search that returns no match:

import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)  # None
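Because `search()` can return `None`, calling a method such as `.start()` directly on the result raises an `AttributeError` when nothing matches. A minimal sketch of guarding against that is shown below; `first_position` is a hypothetical helper written for this example, not part of the `re` module.

```python
import re

txt = "The rain in Spain"

def first_position(pattern, text):
    """Hypothetical helper: start index of the first match, or -1 if there is none."""
    match = re.search(pattern, text)
    if match is None:
        return -1
    return match.start()

print(first_position(r"\s", txt))       # 3
print(first_position("Portugal", txt))  # -1
```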

 

The split() Function

The `split()` function returns a list where the string has been split at each match:

Example

Split at each white-space character:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)  # ['The', 'rain', 'in', 'Spain']

You can control the number of splits by specifying the `maxsplit` parameter:

Example

Split the string only at the first occurrence:

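import re

txt = "The rain in Spain"
# Pass maxsplit by keyword; recent Python versions deprecate passing it positionally
x = re.split(r"\s", txt, maxsplit=1)
print(x)  # ['The', 'rain in Spain']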
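One detail the examples above do not show is what happens when two matches are adjacent. The sketch below, using a made-up string with a double space, suggests that splitting on a single `\s` leaves an empty string in the result, while `\s+` treats the whole run of white space as one separator.

```python
import re

# Made-up sample with two spaces between the words
txt = "The  rain"

single = re.split(r"\s", txt)    # ['The', '', 'rain']
run = re.split(r"\s+", txt)      # ['The', 'rain']

print(single)
print(run)
```

Which form is appropriate depends on whether empty fields are meaningful in the input.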

 

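The sub() Function

The `sub()` function replaces the matches with the text of your choice:

Example

Replace every white-space character with the number 9: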

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)  # The9rain9in9Spain

You can control the number of replacements by specifying the `count` parameter:

Example

Replace the first 2 occurrences:

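import re

txt = "The rain in Spain"
# Pass count by keyword; recent Python versions deprecate passing it positionally
x = re.sub(r"\s", "9", txt, count=2)
print(x)  # The9rain9in Spain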

 

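Match Object

A Match Object is an object containing information about the search and the result.

**Note:** If there is no match, the value `None` will be returned, instead of the Match Object.

Example

Do a search that will return a Match Object: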

import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x)  # prints the Match object: <re.Match object; span=(5, 7), match='ai'>

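The Match object has properties and methods used to retrieve information about the search and the result:

`.span()` returns a tuple containing the start and end positions of the match.

`.string` returns the string passed into the function.

`.group()` returns the part of the string where there was a match.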

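Example

Print the position (start and end position) of the first match occurrence.

The regular expression looks for any word that starts with an upper case "S":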

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())  # (12, 17)

Example

Print the string passed into the function:

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)  # The rain in Spain

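Example

Print the part of the string where there was a match.

The regular expression looks for any word that starts with an upper case "S":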

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())  # Spain
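The three members used above are related: slicing `.string` with the positions from `.span()` gives the same text as `.group()`. A small sketch of that relationship follows; the variable names `start`, `end`, and `sliced` are chosen for this example only.

```python
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

if x is not None:
    start, end = x.span()
    # Slice the original string with the span of the match
    sliced = x.string[start:end]
    print(start, end, sliced)        # 12 17 Spain
    print(sliced == x.group())       # True
```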