Basic Usage of Regular Expressions in Python

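A regular expression is a syntax for describing the structure of strings. By combining ordinary characters with a small set of special symbols, it makes matching, searching, and replacing text straightforward. In Python, regular expressions are handled by the re module.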

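I. Basic Metacharacters

1. The dot (.): matches any single character except a newline (\n)

Example code:

```python
import re

pattern = r"hello.world"
string = "hello\nworld"  # the two words are separated by a newline

match = re.search(pattern, string)
if match:
    print("Match found: " + match.group())
else:
    print("No match")
```

Output: No match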

2. The caret (^): matches the start of the string

Example code:

```python
import re

pattern = r"^hello"
string = "hello world"

match = re.search(pattern, string)
if match:
    print("Match found: " + match.group())
else:
    print("No match")
```

Output: Match found: hello

3. The dollar sign ($): matches the end of the string (or the position just before a trailing newline)

Example code:

```python
import re

pattern = r"world$"
string = "hello world"

match = re.search(pattern, string)
if match:
    print("Match found: " + match.group())
else:
    print("No match")
```

Output: Match found: world
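To make the anchoring behaviour concrete, the minimal sketch below checks a word that is present in the string but not at the anchored position. The helper name describe and the sample string are illustrative assumptions, not part of the examples above.

```python
import re

text = "hello world"  # sample string, assumed for illustration


def describe(pattern, string):
    """Return the matched text, or None when the pattern does not match."""
    m = re.search(pattern, string)
    return m.group() if m else None


# "world" occurs in the text, but not at the start, so ^world fails
start_result = describe(r"^world", text)
# "hello" occurs in the text, but not at the end, so hello$ fails
end_result = describe(r"hello$", text)
# With both anchors the pattern has to cover the whole string
whole_result = describe(r"^hello world$", text)

print(start_result, end_result, whole_result)  # None None hello world
```

Without the anchors, both words would be found by re.search, which is why ^ and $ are useful for validating whole inputs rather than locating fragments.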

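II. Character Sets

A character set is written with square brackets [] and matches any single character listed inside the brackets.

Example code:

```python
import re

pattern = r"[aeiou]"  # any one lowercase vowel
string = "hello world"

match = re.search(pattern, string)
if match:
    print("Match found: " + match.group())
else:
    print("No match")
```

Output: Match found: e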
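Character sets also accept ranges such as [a-z] and, although it is not covered in this article, a leading ^ inside the brackets negates the set. The sketch below is a small illustration of both; the sample string is an assumption chosen for the example.

```python
import re

text = "Room 42B"  # sample string, assumed for illustration

# Range: first lowercase letter
first_lower = re.search(r"[a-z]", text).group()
# Range: first digit
first_digit = re.search(r"[0-9]", text).group()
# Negated set: first character that is NOT an ASCII letter
first_non_letter = re.search(r"[^A-Za-z]", text).group()

print(repr(first_lower), repr(first_digit), repr(first_non_letter))
# 'o' '4' ' '
```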

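Besides common forms such as the range [a-z], the following shorthand classes are available:

  • \d: matches any single digit
  • \D: matches any single non-digit character
  • \s: matches any single whitespace character, including space, tab, and newline
  • \S: matches any single non-whitespace character
  • \w: matches any single letter, digit, or underscore
  • \W: matches any single character that is not a letter, digit, or underscore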

Example code:

```python
import re

pattern1 = r"\d"  # a digit
pattern2 = r"\s"  # a whitespace character
pattern3 = r"\w"  # a letter, digit, or underscore
string = "hello 123 world"

match1 = re.search(pattern1, string)
if match1:
    print("Match found: " + match1.group())
else:
    print("No match")

match2 = re.search(pattern2, string)
if match2:
    print("Match found: " + match2.group())
else:
    print("No match")

match3 = re.search(pattern3, string)
if match3:
    print("Match found: " + match3.group())
else:
    print("No match")
```

Output:

```
Match found: 1
Match found:  
Match found: h
```

The second line ends with the matched space character, which is why nothing visible follows the colon.
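The uppercase classes are the complements of the lowercase ones. As a rough illustration, the sketch below lists every character that each uppercase class picks out of a short sample string (the string is an assumption made for this example).

```python
import re

text = "a1 b2"  # sample string, assumed for illustration

non_digits = re.findall(r"\D", text)      # everything that is not a digit
non_spaces = re.findall(r"\S", text)      # everything that is not whitespace
non_word_chars = re.findall(r"\W", text)  # everything that is not a letter, digit, or underscore

print(non_digits)      # ['a', ' ', 'b']
print(non_spaces)      # ['a', '1', 'b', '2']
print(non_word_chars)  # [' ']
```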

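III. Quantifiers

A quantifier controls how many times the preceding element (a character, character set, or group) may repeat:

  • *: matches the preceding element zero or more times
  • +: matches the preceding element one or more times
  • ?: matches the preceding element zero or one time
  • {n}: matches the preceding element exactly n times
  • {m,n}: matches the preceding element at least m and at most n times (both m and n are included)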

Example code:

```python
import re

pattern1 = r"o*l"
pattern2 = r"o+l"
pattern3 = r"o?l"
pattern4 = r"o{2}l"
pattern5 = r"o{1,2}l"

# Sample strings (only string1, string4 and string5 are searched below)
string1 = "hello world"
string2 = "hollo world"
string3 = "hllo world"
string4 = "hool world"
string5 = "hoool world"

match1 = re.search(pattern1, string1)
if match1:
    print("Match found: " + match1.group())
else:
    print("No match")

match2 = re.search(pattern2, string1)
if match2:
    print("Match found: " + match2.group())
else:
    print("No match")

match3 = re.search(pattern3, string1)
if match3:
    print("Match found: " + match3.group())
else:
    print("No match")

match4 = re.search(pattern4, string1)
if match4:
    print("Match found: " + match4.group())
else:
    print("No match")

match5 = re.search(pattern5, string1)
if match5:
    print("Match found: " + match5.group())
else:
    print("No match")

match6 = re.search(pattern5, string4)
if match6:
    print("Match found: " + match6.group())
else:
    print("No match")

match7 = re.search(pattern5, string5)
if match7:
    print("Match found: " + match7.group())
else:
    print("No match")
```

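Output:

```
Match found: l
No match
Match found: l
No match
No match
Match found: ool
Match found: ool
```

re.search returns the leftmost match, so o*l and o?l both match the bare l in "hello", where the o is allowed to occur zero times. "hello world" contains no o that is immediately followed by l, so o+l, o{2}l and o{1,2}l all fail on it. In "hoool world", o{1,2}l may consume at most two o characters, so the match is ool rather than oool.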
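The bounds of {m,n} include both m and n. One way to check this is to require the whole string to match, as in the sketch below. It uses re.fullmatch, a real re function that this article does not otherwise cover, and the pattern and test strings are assumptions made for illustration.

```python
import re

pattern = r"a{2,3}"  # two or three consecutive "a" characters

# fullmatch succeeds only when the entire string matches the pattern
results = {s: bool(re.fullmatch(pattern, s)) for s in ["a", "aa", "aaa", "aaaa"]}

print(results)
# {'a': False, 'aa': True, 'aaa': True, 'aaaa': False}
```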

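IV. Grouping

Grouping is written with parentheses () and lets several characters be treated as a single unit, for example so that a quantifier applies to all of them.

Example code:

```python
import re

pattern = r"(ab)+"  # one or more repetitions of "ab"
string1 = "ababab"
string2 = "ab"

match1 = re.search(pattern, string1)
if match1:
    print("Match found: " + match1.group())
else:
    print("No match")

match2 = re.search(pattern, string2)
if match2:
    print("Match found: " + match2.group())
else:
    print("No match")
```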

Output:

```
Match found: ababab
Match found: ab
```

Because + means one or more repetitions, a single ab also satisfies (ab)+.
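Parentheses also capture the text they match, so that each part can be read back from the match object. The sketch below is a minimal illustration of that; the pattern and the sample string are assumptions, not taken from the example above.

```python
import re

# Two capturing groups separated by a hyphen
m = re.search(r"(\d+)-(\d+)", "pages 10-25")

whole = m.group(0)   # the entire match
first = m.group(1)   # text captured by the first group
second = m.group(2)  # text captured by the second group
both = m.groups()    # all captured groups as a tuple

print(whole, first, second, both)  # 10-25 10 25 ('10', '25')
```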

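V. Escape Characters

To match a character that has a special meaning in regular expressions literally, put the escape character \ (backslash) in front of it.

Example code:

```python
import re

pattern = r"\."  # a literal dot, not "any character"
string = "hello.world"

match = re.search(pattern, string)
if match:
    print("Match found: " + match.group())
else:
    print("No match")
```

Output:

```
Match found: .
```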
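When the text to match comes from a variable, escaping each special character by hand is error-prone. The re module provides re.escape for this purpose; the sketch below shows one possible use, with a sample literal and sentence that are assumptions made for illustration.

```python
import re

literal = "1+1=2"              # contains the quantifier character +
text = "We know that 1+1=2."   # sample sentence, assumed for illustration

# Used as-is, "1+1=2" means: one or more "1", then "1=2", which is not in the text
unescaped = re.search(literal, text)

# re.escape adds the backslashes needed to match the string literally
escaped = re.search(re.escape(literal), text)

print(unescaped)        # None
print(escaped.group())  # 1+1=2
```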

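VI. Common Functions in the re Module

In Python, regular expression matching is usually done through the functions of the re module.

  • re.search(pattern, string, flags=0): scans string for the first position where pattern matches and returns a match object, or None if there is no match
  • re.match(pattern, string, flags=0): tries to match pattern only at the beginning of string and returns a match object, or None if the beginning does not match
  • re.findall(pattern, string, flags=0): finds all non-overlapping matches of pattern in string and returns them as a list of strings (a list of groups if the pattern contains capturing groups)
  • re.finditer(pattern, string, flags=0): finds all non-overlapping matches of pattern in string and returns an iterator of match objects
  • re.split(pattern, string, maxsplit=0, flags=0): splits string at every match of pattern and returns the resulting list; maxsplit limits the number of splits
  • re.sub(pattern, repl, string, count=0, flags=0): replaces the substrings of string that match pattern with repl; count limits the number of replacements (0 means replace all)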

Example code:

```python
import re

pattern = r"\d+"
string = "hello 123 world 456"

# search: first match only
match = re.search(pattern, string)
if match:
    print("Match found: " + match.group())
else:
    print("No match")

# findall: list of all matching substrings
match_all = re.findall(pattern, string)
if match_all:
    print(match_all)

# finditer: iterator of match objects
match_iter = re.finditer(pattern, string)
for match in match_iter:
    print(match.group())

# split: cut the string wherever the pattern matches
split_list = re.split(pattern, string)
print(split_list)

# sub: replace every match, or only the first `count` matches
sub_str = re.sub(pattern, "X", string)
print(sub_str)

sub_str_limit = re.sub(pattern, "X", string, count=1)
print(sub_str_limit)
```

Output:

```
Match found: 123
['123', '456']
123
456
['hello ', ' world ', '']
hello X world X
hello X world 456
```
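The difference between re.match and re.search is easy to overlook, as is the way re.findall changes its return value when the pattern contains groups. The sketch below illustrates both points; the sample strings are assumptions made for this example.

```python
import re

text = "hello 123"  # sample string, assumed for illustration

# match only looks at the beginning of the string, where there is no digit
at_start = re.match(r"\d+", text)
# search scans the whole string and finds the digits later on
anywhere = re.search(r"\d+", text)
# match succeeds when the pattern does fit the beginning
greeting = re.match(r"hello", text)

# With capturing groups, findall returns tuples of the groups
pairs = re.findall(r"(\w+)=(\d+)", "a=1, b=2")

print(at_start)          # None
print(anywhere.group())  # 123
print(greeting.group())  # hello
print(pairs)             # [('a', '1'), ('b', '2')]
```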

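VII. The flags Parameter

The functions of the re module accept a flags parameter that selects matching options. Commonly used options include:

  • re.I / re.IGNORECASE: performs case-insensitive matching
  • re.S / re.DOTALL: makes . match any character, including a newline
  • re.M / re.MULTILINE: makes ^ and $ match at the start and end of every line, not only of the whole string
  • re.X / re.VERBOSE: ignores whitespace in the pattern and allows # comments, which makes the expression easier to read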

Example code:

```python
import re

# Inline flags such as (?i) have the same effect as passing flags=re.I
pattern1 = r"(?i)hello"                  # IGNORECASE
pattern2 = r"(?s)hello.world"            # DOTALL
pattern3 = r"(?im)^world$"               # MULTILINE, plus IGNORECASE so that the line WORLD matches
pattern4 = r"(?x)h e l l o . w o r l d"  # VERBOSE: spaces in the pattern are ignored

string = "Hello\nWORLD\nhello.world"

match1 = re.search(pattern1, string)
if match1:
    print("Match found: " + match1.group())
else:
    print("No match")

match2 = re.search(pattern2, string)
if match2:
    print("Match found: " + match2.group())
else:
    print("No match")

match3 = re.search(pattern3, string)
if match3:
    print("Match found: " + match3.group())
else:
    print("No match")

match4 = re.search(pattern4, string)
if match4:
    print("Match found: " + match4.group())
else:
    print("No match")
```

Output:

```
Match found: Hello
Match found: hello.world
Match found: WORLD
Match found: hello.world
```
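The same options can also be passed through the flags parameter itself, and several options can be combined with the | operator. The sketch below is one way to write this; the patterns are assumptions chosen to make the effect of each flag visible.

```python
import re

text = "Hello\nWORLD\nhello.world"

# IGNORECASE: "hello" matches the capitalised first line
ci = re.search(r"hello", text, flags=re.I).group()

# Without DOTALL the dot cannot cross the newline between the first two lines
no_dotall = re.search(r"Hello.WORLD", text)
# With DOTALL the dot matches the newline as well
dotall = re.search(r"Hello.WORLD", text, flags=re.S).group()

# Flags combined with |: per-line anchors plus case-insensitive matching
combined = re.search(r"^world$", text, flags=re.I | re.M).group()

# VERBOSE: whitespace is ignored and # starts a comment inside the pattern
verbose = re.search(
    r"""
    hello   # first word
    \.      # a literal dot
    world   # second word
    """,
    text,
    flags=re.X,
).group()

print(ci)            # Hello
print(no_dotall)     # None
print(repr(dotall))  # 'Hello\nWORLD'
print(combined)      # WORLD
print(verbose)       # hello.world
```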

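VIII. Summary

Regular expressions are a powerful text-processing tool, and knowing the basic syntax makes string handling both more flexible and more efficient. In Python, the re module provides a convenient interface for matching, searching, and replacing text with regular expressions.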