【The Road to Python】Special Edition

highoo 2019-03-20


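Regular Expression Basics

　　Regular expressions are not part of Python itself. A regular expression is a powerful tool for processing strings, with its own syntax and its own matching engine. For simple tasks it may be slower than the built-in str methods, but it is far more powerful. Because the syntax is independent of any one language, it is largely the same in every language that supports regular expressions; implementations differ mainly in how many features they support. In essence, a regular expression (RE) is a small, highly specialized programming language. Python embeds it and exposes it through the re module. The examples below assume import re has already been run.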

Metacharacters:　　. ^ $ * + ? { } [ ] \ | ( )

.　matches any character except a newline (wildcard)

import re

content = 'Abcdefghijklmnopq'
test = re.findall(r"b.d", content)
print(test)
# ['bcd']

^　matches at the start of the string ("starts with ...")

content = 'Abcdefghijklmnopq'
test = re.findall(r"^Abcd", content)
print(test)
# ['Abcd']

$　matches at the end of the string ("ends with ...")

content = 'Abcdefghijklmnopq'
test = re.findall(r"nopq$", content)
print(test)
# ['nopq']

*　matches the preceding character (or group) 0 or more times, same as {0,}

content = 'Abcdefghijklmnopq'
test = re.findall(r"A.*e", content)
print(test)
# ['Abcde']

+　matches the preceding item 1 or more times, same as {1,}

content = 'abcdefab111111'
test = re.findall(r"ab1+", content)
print(test)
# ['ab111111']

?　matches the preceding item 0 or 1 time, same as {0,1}

content = 'abcdefab111111'
test = re.findall(r"ab1?", content)
print(test)
# ['ab', 'ab1']

* + ? all match greedily (as much as possible); appending ? makes them non-greedy (as little as possible)

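content = 'abcdefab111111'
test = re.findall(r"ab1+?", content)
print(test)
# ['ab1']

print(re.search(r"a(\d+?)", "a2345").group())    # a2
print(re.search(r"a(\d*?)", "a2345").group())    # a
# When the lazy part must be followed by something else that also has to match,
# it still expands as far as needed, so here the result equals the greedy one:
print(re.search(r"a(\d*?)b", "a2345b").group())  # a2345b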
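A common place where greedy vs. non-greedy matching matters is pulling tags out of markup. The sketch below uses a made-up HTML snippet (the string is illustrative only) to compare .* with .*?:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the end of the string, then backtracks only as far as the LAST '>'.
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the FIRST '>' that lets the overall match succeed.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```

For real HTML a proper parser is usually the better tool; the snippet only illustrates how the quantifiers behave.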

( )　group: treats the enclosed pattern as one unit and captures what it matches

content = 'abcdefab111111'
test = re.findall(r"(ab1)", content)
print(test)
# ['ab1']

{ }　custom repetition count: {m} exactly m times, {m,} at least m times, {m,n} between m and n times

content = 'abcdefab111111'
test = re.findall(r"ab1{3,9}", content)
print(test)
# ['ab111111']
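The forms of { } can be compared side by side on one string of digits. This is a small sketch (the input string is arbitrary). It also shows that {m,n} is greedy and that {m,n}? is its non-greedy counterpart:

```python
import re

digits = "1234567"

exactly_3 = re.findall(r"\d{3}", digits)    # exactly 3 digits
at_least_2 = re.findall(r"\d{2,}", digits)  # 2 or more digits
greedy = re.findall(r"\d{2,4}", digits)     # 2 to 4 digits, as many as possible
lazy = re.findall(r"\d{2,4}?", digits)      # 2 to 4 digits, as few as possible

print(exactly_3)   # ['123', '456']
print(at_least_2)  # ['1234567']
print(greedy)      # ['1234', '567']
print(lazy)        # ['12', '34', '56']
```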

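[ ]　character set: matches any one of the listed characters (an "or")

　　Inside a character set, metacharacters lose their special meaning, except for - \ ^ (^ negates the set only when it is the first character)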

content1 = 'wwwwwabdxxxxx'
test1 = re.findall(r"a[bc]d", content1)
print(test1)
# ['abd']

content2 = 'wwwwwacdxxxxx'
test2 = re.findall(r"a[bc]d", content2)
print(test2)
# ['acd']

***********************************************************************

content = 'wwwwwa.xxxxx'
test = re.findall(r"a[.]x", content)   # inside [ ], . is a literal dot
print(test)
# ['a.x']

content = 'wwwww1234xxxxx'
test = re.findall(r"[1-9]", content)   # any digit from 1 to 9
print(test)
# ['1', '2', '3', '4']

content = 'wwwww1234xxxxx'
test = re.findall(r"[^1-9]", content)  # any character that is NOT a digit from 1 to 9
print(test)
# ['w', 'w', 'w', 'w', 'w', 'x', 'x', 'x', 'x', 'x']
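The exceptions inside [ ] can be checked directly. This is a short sketch with arbitrary sample strings. It shows that . * + ? are literal inside a set, that ^ is literal when it is not the first character, and that an escaped \- is a literal hyphen rather than a range:

```python
import re

# Inside [ ], . * + ? are ordinary characters.
literals = re.findall(r"[.*+?]", "a.b*c+d?")

# ^ negates the set only when it comes first; anywhere else it is a literal caret.
caret = re.findall(r"[a^]", "a^b")

# - forms a range; escaped as \- it is a literal hyphen.
hyphen = re.findall(r"[a\-z]", "a-m-z")
a_to_z = re.findall(r"[a-z]", "a-m-z")

print(literals)  # ['.', '*', '+', '?']
print(caret)     # ['a', '^']
print(hyphen)    # ['a', '-', '-', 'z']
print(a_to_z)    # ['a', 'm', 'z']
```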

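\　(backslash) does three things:

followed by a metacharacter, it removes the special meaning (e.g. \. matches a literal dot)
followed by certain ordinary characters, it gives them a special meaning (e.g. \d matches a digit)
followed by a group number, it refers back to the string matched by that group (e.g. \2)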

test = re.search(r"(alex)(eric)com\2", "alexericcomeric")   # \2 repeats what group 2 matched: "eric"
print(test.group())
# alexericcomeric
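Backreferences are handy for spotting repeated text, and Python also supports named groups. The sketch below uses made-up sentences. Its last line shows why the r prefix matters: without it, Python turns "\1" into the control character chr(1) before the regex engine sees it.

```python
import re

# \1 reuses whatever group 1 matched: here it finds an accidentally doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a typo").group()

# Named groups: (?P<name>...) captures, (?P=name) refers back to it.
m = re.search(r"(?P<word>\w+) (?P=word)", "hello hello world")
word = m.group("word")

# Without the r prefix, "\1" is the character chr(1), so nothing matches.
no_raw = re.search("(a)\1", "aa")

print(doubled)  # is is
print(word)     # hello
print(no_raw)   # None
```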

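\d matches any decimal digit, [0-9] (for str patterns, Python 3 also matches other Unicode digits unless re.ASCII is used)

\D matches any non-digit character, [^0-9]

\s matches any whitespace character, [ \t\n\r\f\v]

\S matches any non-whitespace character, [^ \t\n\r\f\v]

\w matches any word character (letter, digit or underscore), [a-zA-Z0-9_]

\W matches any non-word character, [^a-zA-Z0-9_]

\b matches a word boundary: the zero-width position between a word character and a non-word character (or the start/end of the string). The non-word side can be any special character, not just a space.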

content = 'wwwww1234xxxxx'
test = re.findall(r"\d", content)
print(test)
# ['1', '2', '3', '4']

content = 'ww&*#$%ww1234xx'
test = re.findall(r"\D", content)
print(test)
# ['w', 'w', '&', '*', '#', '$', '%', 'w', 'w', 'x', 'x']

content = 'asd asd asd '
test = re.findall(r"\s", content)
print(test)
# [' ', ' ', ' ']

content = ' asdasd '
test = re.findall(r"\S", content)
print(test)
# ['a', 's', 'd', 'a', 's', 'd']

content = r'abc123^&*lm-\_'
test = re.findall(r"\w", content)
print(test)
# ['a', 'b', 'c', '1', '2', '3', 'l', 'm', '_']

content = r'abc123^&*lm-\_'
test = re.findall(r"\W", content)
print(test)
# ['^', '&', '*', '-', '\\']

content = 'I like Sooooo'
test = re.findall(r"like\b", content)
print(test)
# ['like']

*******************************************

test = re.findall(r"abc\b", "asdasd abc ")   # boundary between "c" and a space
print(test)
# ['abc']
test = re.findall(r"abc\b", "asdasd abc*")   # boundary between "c" and "*"
print(test)
# ['abc']
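Two Python 3 details about these predefined classes are easy to miss. "\b" in an ordinary string literal is a backspace character, not a word boundary. For str patterns, \d matches any Unicode decimal digit unless re.ASCII is given. The sketch below uses arbitrary text and two Arabic-Indic digits (U+0664, U+0662) as sample input:

```python
import re

text = "cat concat cat."

# With a raw string, \b is the regex word boundary: the "cat" inside "concat" is skipped.
boundary = re.findall(r"\bcat\b", text)

# Without r"", Python turns "\b" into a backspace (\x08) before re sees it.
backspace = re.findall("\bcat\b", text)

# str patterns: \d matches Unicode digits; re.ASCII restricts it to [0-9].
mixed = "3\u0664\u0662"
unicode_digits = re.findall(r"\d", mixed)
ascii_digits = re.findall(r"\d", mixed, re.ASCII)

print(boundary)        # ['cat', 'cat']
print(backspace)       # []
print(unicode_digits)  # ['3', '٤', '٢']
print(ascii_digits)    # ['3']
```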

match()


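# match: matches only at the start of the string; returns a match object on success, None otherwise

re.match(pattern, string, flags=0)

# pattern: the regular expression
# string : the string to match against
# flags  : matching flags
#   re.I (re.IGNORECASE): ignore case (full name in parentheses; same below)
#   re.M (re.MULTILINE): multiline mode; ^ and $ also match at the start and end of each line
#   re.S (re.DOTALL): makes . match any character, including a newline
#   re.L (re.LOCALE): makes \w \W \b \B and case-insensitive matching depend on the current locale (in Python 3, bytes patterns only)
#   re.U (re.UNICODE): makes \w \W \b \B \s \S \d \D follow Unicode character properties (already the default for str patterns in Python 3)
#   re.X (re.VERBOSE): verbose mode; the pattern may span several lines, whitespace is ignored, and # comments are allowed

*************************************

# match object methods

.group()      returns the whole matched string; pass a group number, e.g. group(1), to get one group
.groups()     returns a tuple of all captured groups
.groupdict()  returns a dict of all named groups, i.e. those defined with (?P<name>...)
.start()      returns the position where the match starts
.end()        returns the position where the match ends
.span()       returns a tuple (start, end)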
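The flags and match-object methods listed above can be exercised in one short script. This is a sketch with made-up input strings. The two compiled patterns a and b are an assumed illustration of re.X equivalence, not taken from the original:

```python
import re

# re.I ignores case; re.match only tries position 0 of the string.
m1 = re.match(r"hello", "Hello world", re.I)
m2 = re.match(r"world", "Hello world")      # None: match() does not scan ahead

# re.S lets . match a newline; re.M lets ^ match at the start of every line.
dotall = re.findall(r"a.b", "a\nb", re.S)
multiline = re.findall(r"^\w", "one\ntwo", re.M)

# re.X: whitespace is ignored and # starts a comment, so a and b are equivalent.
a = re.compile(r"""\d+   # integer part
                   \.    # decimal point
                   \d*   # fractional part""", re.X)
b = re.compile(r"\d+\.\d*")

# Match-object methods, using one named group and one unnamed group.
m = re.match(r"(?P<name>\w+)@(\w+)", "alex@example.com")

print(m1.group(), m2)                                    # Hello None
print(dotall, multiline)                                 # ['a\nb'] ['o', 't']
print(a.match("3.14").group(), b.match("3.14").group())  # 3.14 3.14
print(m.group())                                         # alex@example
print(m.group(1), m.group(2))                            # alex example
print(m.groups())                                        # ('alex', 'example')
print(m.groupdict())                                     # {'name': 'alex'}
print(m.start(), m.end(), m.span())                      # 0 12 (0, 12)
```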


search()

# search: scans the whole string and returns a match object for the first match; returns None if nothing matches
# call .group() on the match object to see the matched text
# re.search(pattern, string, flags=0)
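The practical difference between search() and match(), and the usual None check, can be sketched as follows (the sample string is arbitrary):

```python
import re

s = "abc123def"

not_at_start = re.match(r"\d+", s)   # None: position 0 holds 'a', not a digit
found = re.search(r"\d+", s)         # scans ahead and finds the first run of digits
missing = re.search(r"xyz", s)       # None: no match anywhere

print(not_at_start)                  # None
print(found.group(), found.span())   # 123 (3, 6)

# Calling .group() on None raises AttributeError, so check the result first.
if missing:
    print(missing.group())
else:
    print("no match")                # no match
```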


findall()

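# Groups take priority in the result!
# findall: returns a list of all non-overlapping matches. With no group, each item is the whole match;
# with one group, each item is that group's string; with several groups, each item is a tuple.
# Empty matches are included in the result.
# re.findall(pattern, string, flags=0)

data = re.findall(r"\d+\w\d+", 'a2b3c4d5')
# ['2b3', '4d5']
# After a successful match, re.findall() resumes searching where that match ended, so matches never overlap.

# Empty matches are included too: "" matches at every position, including the end of the string.
data = re.findall("", 'a2')
print(data)
# ['', '', '']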

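***********************************

# Each pair of parentheses yields one value per match, however many times the group repeated.
data = re.findall(r'(\dasd)*', '1asd2asdp3asd3434')
print(data)
# ['2asd', '', '3asd', '', '', '', '', '']
# The first match greedily consumes "1asd2asd", but a repeated group keeps only its LAST capture, so "2asd" is returned.
# The '' items are empty matches where * matched zero times.

For example:

a = "alex"
data = re.findall(r'(\w)(\w)(\w)(\w)', a)
print(data)
# [('a', 'l', 'e', 'x')]

data = re.findall(r'(\w){4}', a)
print(data)
# ['x'] => the group ran 4 times but still counts as one group, so only its last capture is returned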

***********************************

test = re.findall(r"www\.(baidu|laonanhai)\.com", "asdsa www.baidu.com")
print(test)
# ['baidu']

Adding ?: makes the group non-capturing, which removes its priority:

test = re.findall(r"www\.(?:baidu|laonanhai)\.com", "asdsa www.baidu.com")
print(test)
# ['www.baidu.com']
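The rule that the number of groups decides the shape of findall's result can be seen on one made-up key=value string:

```python
import re

s = "alex=18, eric=20"

no_group = re.findall(r"\w+=\d+", s)                 # whole matches
one_group = re.findall(r"\w+=(\d+)", s)              # only the group's text
two_groups = re.findall(r"(\w+)=(\d+)", s)           # tuples, one item per group
non_capturing = re.findall(r"(?:\w+)=(?:\d+)", s)    # (?:...) behaves like no group

print(no_group)       # ['alex=18', 'eric=20']
print(one_group)      # ['18', '20']
print(two_groups)     # [('alex', '18'), ('eric', '20')]
print(non_capturing)  # ['alex=18', 'eric=20']
```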


sub()

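# sub: replaces every part of the string that matches the pattern

re.sub(pattern, repl, string, count=0, flags=0)

# pattern: the regular expression
# repl   : the replacement string, or a callable that receives each match object
# string : the string to search
# count  : maximum number of replacements (0 means replace all)
# flags  : matching flags

test = re.sub("g.t", "have", "I get A, I got B , I gut C")
print(test)
# I have A, I have B , I have C

********************************************

# subn also returns the number of replacements made
origin = "ale4 xc 19"
data, counts = re.subn(r"\d+", "KKK", origin)
print(data, counts)
# aleKKK xc KKK 2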
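The repl argument can reference groups or be a function, and count limits how many replacements happen. The sketch below uses arbitrary sample strings. The helper double is a hypothetical name chosen for the example:

```python
import re

# Backreferences in repl: swap "first last" into "last, first".
swapped = re.sub(r"(\w+) (\w+)", r"\2, \1", "alex li")

# repl may be a function: it receives each match object and returns the replacement text.
def double(m):
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, "a1 b22 c333")

# count limits the number of replacements (passed by keyword).
limited = re.sub(r"\d+", "N", "1 2 3", count=2)

print(swapped)  # li, alex
print(doubled)  # a2 b44 c666
print(limited)  # N N 3
```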

compile()

regex = re.compile(r"\w*oo\w*")
text = " JGood is ,he is cool"
data = regex.findall(text)
print(data)
# ['JGood', 'cool']
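A compiled pattern is most useful when the same regex, flags included, is applied to many strings. A minimal sketch, assuming a case-insensitive variant of the pattern above and a few made-up inputs:

```python
import re

# Compile once, flags included, then reuse the pattern object.
word_oo = re.compile(r"\w*oo\w*", re.I)

results = [word_oo.findall(text) for text in ["JGood is cool", "BOOK and FOOD", "nothing here"]]

print(results)          # [['JGood', 'cool'], ['BOOK', 'FOOD'], []]
print(word_oo.pattern)  # \w*oo\w*
```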

split()

# split: splits the string at every match of the pattern

re.split(pattern, string, maxsplit=0, flags=0)

# pattern : the regular expression
# string  : the string to split
# maxsplit: maximum number of splits (0 means no limit)
# flags   : matching flags

*****************************************

# With a capturing group, the matched separators are included in the result
origin = "hello alex bcd alex lge alex acd 19"
r1 = re.split("(alex)", origin, maxsplit=1)
print(r1)
# ['hello ', 'alex', ' bcd alex lge alex acd 19']

r2 = re.split("(al(ex))", origin, maxsplit=1)
print(r2)
# ['hello ', 'alex', 'ex', ' bcd alex lge alex acd 19']

*****************************************

p = re.compile(r"\d+")
test = p.split("one1two2three3four4")
print(test)
# ['one', 'two', 'three', 'four', '']
# Note the trailing empty string: "one" is split off, leaving "two2three3four4", and so on;
# the final "4" is followed by nothing, which produces ''.

test = re.split('[bc]', 'abcd')
print(test)
# ['a', '', 'd']  ('b' and 'c' are adjacent, so the piece between them is '')
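A couple of everyday split patterns follow, with arbitrary sample strings: splitting on several separators with optional spaces, limiting splits with maxsplit, and keeping or dropping the separators depending on whether the group captures.

```python
import re

# Split on commas or semicolons, swallowing surrounding spaces.
parts = re.split(r"\s*[,;]\s*", "a , b;c ;d")

# maxsplit (by keyword): only the first separator is used.
first_only = re.split(r"\s*[,;]\s*", "a , b;c ;d", maxsplit=1)

# A capturing group keeps the separators; a non-capturing group (?:...) drops them.
kept = re.split(r"(\d)", "a1b2c")
dropped = re.split(r"(?:\d)", "a1b2c")

print(parts)       # ['a', 'b', 'c', 'd']
print(first_only)  # ['a', 'b;c ;d']
print(kept)        # ['a', '1', 'b', '2', 'c']
print(dropped)     # ['a', 'b', 'c']
```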


finditer()

# finditer returns an iterator of match objects
p = re.compile(r"\d+")
w = p.finditer(' 1 drum44ers druming , 11 ... 10 ...')
for match in w:
    print(match.group(), match.span())
# 1 (1, 2)
# 44 (7, 9)
# 11 (23, 25)
# 10 (30, 32)
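Because finditer yields full match objects, it combines well with named groups. A minimal sketch, assuming a made-up "key=value" configuration string, that collects the pairs into a dict:

```python
import re

config = "host=localhost port=8080 debug=true"

pairs = {}
for m in re.finditer(r"(?P<key>\w+)=(?P<value>\S+)", config):
    pairs[m.group("key")] = m.group("value")

print(pairs)  # {'host': 'localhost', 'port': '8080', 'debug': 'true'}

# finditer is lazy: it produces match objects one at a time, on demand.
it = re.finditer(r"\d+", "a1b22")
first = next(it).group()
print(first)  # 1
```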

The backslash problem

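Like most programming languages, regular expressions use "\" as the escape character, which can cause backslash trouble.

To match a literal "\" in the text, a regex written as an ordinary string literal needs four backslashes: "\\\\".

The language's string escaping turns the first two and the last two into one backslash each; the regex engine then reads the resulting two backslashes as one escaped, literal backslash.

Python's raw strings solve this neatly: the regex in this example can be written r"\\".

Likewise, "\\d" for a digit can be written r"\d". With raw strings you no longer have to worry about missing a backslash, and the patterns are easier to read.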
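The two levels of escaping can be checked directly. This sketch uses a made-up Windows-style path as sample text. It shows that "\\\\" and r"\\" are the same two-character string and therefore the same pattern:

```python
import re

text = r"C:\temp"   # raw string: "\t" stays a backslash followed by "t"

print(len("\\"), len(r"\\"))   # 1 2
print("\\\\" == r"\\")         # True: both are the two characters \\
print(r"\d" == "\\d")          # True

# Either spelling hands the engine \\, which it reads as one literal backslash.
four = re.findall("\\\\", text)
raw = re.findall(r"\\", text)

print(four, raw)   # ['\\'] ['\\']
```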

Common regular expressions

# IP address:
^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$

# Mobile number (11 digits starting with 13, 14, 15 or 18):
^1[3458][0-9]\d{8}$

# Email address (simplified):
[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+
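A quick way to sanity-check such patterns is re.fullmatch, which requires the whole string to match, like wrapping the pattern in ^...$. The helper name check and the sample inputs below are hypothetical. The phone pattern is written as 1[3458]\d{9}, which is the same 11-digit shape as above:

```python
import re

IP = re.compile(r"(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}")
PHONE = re.compile(r"1[3458]\d{9}")
EMAIL = re.compile(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+")


def check(pattern, s):
    # fullmatch succeeds only if the entire string matches the pattern.
    return pattern.fullmatch(s) is not None


print(check(IP, "192.168.1.1"), check(IP, "256.1.1.1"))        # True False
print(check(PHONE, "13812345678"), check(PHONE, "12812345678"))  # True False
print(check(EMAIL, "alex@example.com"), check(EMAIL, "alex@example"))  # True False
```

These patterns are deliberately simple: the email one, for instance, rejects some valid addresses (such as a dot in the local part).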

Exercise: calculator
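The author's solution sat in a collapsed block that did not survive extraction, so what follows is only one possible approach, not the original. It assumes the task is to evaluate an expression containing numbers, + - * / and parentheses using regular expressions rather than eval(). All names (calculate, eval_flat, normalize_signs, and the pattern constants) are hypothetical. The idea is to repeatedly replace the innermost parentheses with their value, and inside them to apply * and / before + and -, left to right, writing each intermediate result back into the string:

```python
import math
import re

# Unsigned number, optionally with a fraction or exponent (repr() of a float may use e-notation).
NUM = r"\d+(?:\.\d+)?(?:e[+-]?\d+)?"
INNER = re.compile(r"\(([^()]+)\)")                  # innermost (...) with no parens inside
MUL_DIV = re.compile(rf"({NUM})([*/])([+-]?{NUM})")  # left operand unsigned, right may be signed
ADD_SUB = re.compile(rf"([+-]?{NUM})([+-])({NUM})")  # the leftmost term may carry a sign


def normalize_signs(expr):
    """Collapse runs of signs: -- and ++ become +, +- and -+ become -."""
    while True:
        new = re.sub(r"\+\+|--", "+", expr)
        new = re.sub(r"\+-|-\+", "-", new)
        if new == expr:
            return expr
        expr = new


def eval_flat(expr):
    """Evaluate an expression without parentheses: * and / first, then + and -, left to right."""
    expr = normalize_signs(expr)
    while m := MUL_DIV.search(expr):
        a, op, b = float(m.group(1)), m.group(2), float(m.group(3))
        value = a * b if op == "*" else a / b
        expr = normalize_signs(expr[:m.start()] + repr(value) + expr[m.end():])
    while m := ADD_SUB.search(expr):
        a, op, b = float(m.group(1)), m.group(2), float(m.group(3))
        value = a + b if op == "+" else a - b
        expr = normalize_signs(expr[:m.start()] + repr(value) + expr[m.end():])
    return float(expr)


def calculate(expr):
    """Replace the innermost parentheses with their value until none are left."""
    expr = re.sub(r"\s+", "", expr)
    while m := INNER.search(expr):
        expr = normalize_signs(expr[:m.start()] + repr(eval_flat(m.group(1))) + expr[m.end():])
    return eval_flat(expr)


print(calculate("1 + 2 * 3"))     # 7.0
print(calculate("2 * (3 - 5)"))   # -4.0
print(calculate("10 - 2 * -3"))   # 16.0
```

Writing each intermediate value back into the string keeps the approach purely regex-driven, at the cost of re-scanning the string after every step. The walrus operator requires Python 3.8 or later.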

