In this tutorial you will learn how regular expressions work and how to use them for efficient pattern matching in PHP.

## What Is a Regular Expression

A regular expression, commonly called a "regex" or "RegExp", is a specially formatted text string used to find patterns in text. Regular expressions are among the most powerful tools available today for text processing and manipulation. For example, they can be used to verify whether user-entered data such as a name, email address, or phone number is in the correct format, to find or replace matching strings within text content, and so on.

PHP (version 5.3 and above) supports Perl-style regular expressions through its `preg_` family of functions. Why Perl-style regular expressions? Because Perl (Practical Extraction and Report Language) was the first mainstream programming language to provide integrated support for regular expressions, and it is well known for its strong regular expression support and its exceptional text processing and manipulation capabilities.

Let's begin with a brief overview of PHP's commonly used built-in pattern-matching functions before delving into the world of regular expressions.

| Function | What it Does |
| -------- | ------------ |
| `preg_match()` | Performs a regular expression match. |
| `preg_match_all()` | Performs a global regular expression match. |
| `preg_replace()` | Performs a regular expression search and replace. |
| `preg_grep()` | Returns the elements of the input array that match the pattern. |
| `preg_split()` | Splits a string into substrings using a regular expression. |
| `preg_quote()` | Quotes regular expression characters found within a string. |

> Note: The PHP `preg_match()` function stops searching after it finds the first match, whereas the `preg_match_all()` function continues searching until the end of the string and finds all possible matches.

## Regular Expression Syntax

Regular expression syntax includes the use of special characters (not to be confused with HTML special characters). The characters that are given special meaning within a regular expression are: `. * ? + [ ] ( ) { } ^ $ | \`. You need to put a backslash in front of these characters whenever you want to use them literally. For example, to match a literal ".", you have to write `\.`. All other characters automatically assume their literal meanings.
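
As a rough illustration of the escaping rule above, the following sketch matches a literal dot, first by escaping it by hand and then by letting `preg_quote()` add the backslashes. The sample strings and variable names are invented for this example.

```php
<?php
// Hypothetical sample data, chosen only to illustrate escaping.
$price = "Total: 3.50 USD";

// An unescaped dot matches ANY character, so "3x50" also matches.
$loose = preg_match('/3.50/', "3x50");

// An escaped dot matches only a literal "."
$strict = preg_match('/3\.50/', "3x50");
$found  = preg_match('/3\.50/', $price);

// preg_quote() escapes the special characters in arbitrary text.
// The second argument names the delimiter so that it is escaped as well.
$quoted = preg_quote("3.50 (approx.)", '/');

echo $loose . " " . $strict . " " . $found . "<br>"; // 1 0 1
echo $quoted;                                        // 3\.50 \(approx\.\)
```

Passing text through `preg_quote()` is usually the safer choice when the text to be matched comes from a variable rather than being typed directly into the pattern.
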
The following sections describe the various options available for formulating patterns:

## Character Classes

Square brackets surrounding a pattern of characters are called a character class, e.g. `[abc]`. A character class always matches a single character out of a list of specified characters, which means the expression `[abc]` matches only the character a, b, or c.

Negated character classes, which match any character except those contained within the brackets, can also be defined. A negated character class is defined by placing a caret (`^`) symbol immediately after the opening bracket, like this: `[^abc]`.

You can also define a range of characters by using the hyphen (`-`) character inside a character class, like `[0-9]`. Let's look at some examples of character classes:

| RegExp | What it Does |
| ------ | ------------ |
| `[abc]` | Matches any one of the characters a, b, or c. |
| `[^abc]` | Matches any one character other than a, b, or c. |
| `[a-z]` | Matches any one character from lowercase a to lowercase z. |
| `[A-Z]` | Matches any one character from uppercase A to uppercase Z. |
| `[a-zA-Z]` | Matches any one letter, lowercase or uppercase. (A range such as `[a-Z]` is invalid, because `a` comes after `Z` in character order.) |
| `[0-9]` | Matches a single digit between 0 and 9. |
| `[a-z0-9]` | Matches a single character between a and z or between 0 and 9. |

The following example shows how to find out whether a pattern exists in a string using a regular expression and the PHP `preg_match()` function:

```php
<?php
$pattern = "/ca[kf]e/";
$text = "He was eating cake in the cafe.";

// preg_match() returns 1 as soon as the first match is found
if(preg_match($pattern, $text)){
    echo "Match found!";
} else{
    echo "Match not found.";
}
?>
```

Similarly, you can use the `preg_match_all()` function to find all matches within a string:

```php
<?php
$pattern = "/ca[kf]e/";
$text = "He was eating cake in the cafe.";

// preg_match_all() returns the number of matches and fills $array with them
$matches = preg_match_all($pattern, $text, $array);
echo $matches . " matches were found.";
?>
```

> Tip: Regular expressions aren't exclusive to PHP.
> Languages such as Java, Perl, and Python use a very similar notation for finding patterns in text.

## Predefined Character Classes

Some character classes, such as digits, letters, and whitespace, are used so frequently that there are shortcut names for them. The following table lists those predefined character classes:

| Shortcut | What it Does |
| -------- | ------------ |
| `.` | Matches any single character except the newline character `\n`. |
| `\d` | Matches any digit character. Same as `[0-9]`. |
| `\D` | Matches any non-digit character. Same as `[^0-9]`. |
| `\s` | Matches any whitespace character (space, tab, newline, or carriage return). Roughly the same as `[ \t\n\r]`; form feed is also included. |
| `\S` | Matches any non-whitespace character. Roughly the same as `[^ \t\n\r]`. |
| `\w` | Matches any word character (defined as a to z, A to Z, 0 to 9, and the underscore). Same as `[a-zA-Z_0-9]`. |
| `\W` | Matches any non-word character. Same as `[^a-zA-Z_0-9]`. |

The following example shows how to replace every whitespace character in a string with a hyphen using a regular expression and the PHP `preg_replace()` function:

```php
<?php
$pattern = "/\s/";
$replacement = "-";
$text = "Earth revolves around\nthe\tSun";

// Replace spaces, newlines and tabs
echo preg_replace($pattern, $replacement, $text);
echo "<br>";

// Replace only spaces
echo str_replace(" ", "-", $text);
?>
```

## Repetition Quantifiers

In the previous section we learned how to match a single character in a variety of ways. But what if you want to match more than one character? For example, let's say you want to find words containing one or more instances of the letter p, or words containing at least two p's in a row, and so on.

This is where quantifiers come into play. With quantifiers you can specify how many times a character in a regular expression should match.
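
To make the two cases just mentioned concrete, here is a minimal sketch that filters a word list with `p+` (one or more p's) and `p{2,}` (two or more p's in a row). The word list and variable names are invented for this example.

```php
<?php
// Hypothetical word list, used only for illustration.
$words = array("apple", "pear", "kiwi", "hippo", "plum");

// Words containing one or more p's.
$onePlus = preg_grep('/p+/', $words);

// Words containing at least two consecutive p's.
$twoPlus = preg_grep('/p{2,}/', $words);

echo implode(", ", $onePlus) . "<br>"; // apple, pear, hippo, plum
echo implode(", ", $twoPlus);          // apple, hippo
```

Note that a quantifier applies to the item immediately before it, so `p{2,}` only counts p's that appear next to each other.
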
The following table lists the various ways to quantify a particular pattern:

| RegExp | What it Does |
| ------ | ------------ |
| `p+` | Matches one or more occurrences of the letter p. |
| `p*` | Matches zero or more occurrences of the letter p. |
| `p?` | Matches zero or one occurrence of the letter p. |
| `p{2}` | Matches exactly two occurrences of the letter p. |
| `p{2,3}` | Matches at least two, but not more than three, occurrences of the letter p. |
| `p{2,}` | Matches two or more occurrences of the letter p. |
| `p{0,3}` | Matches at most three occurrences of the letter p. (Write the lower bound explicitly; older PCRE versions treat `p{,3}` as literal text.) |

The regular expression in the following example splits the string at a comma, a sequence of commas, whitespace, or any combination thereof using the PHP `preg_split()` function:

```php
<?php
$pattern = "/[\s,]+/";
$text = "My favourite colors are red, green and blue";
$parts = preg_split($pattern, $text);

// Loop through parts array and display substrings
foreach($parts as $part){
    echo $part . "<br>";
}
?>
```

## Position Anchors

There are certain situations where you want to match at the beginning or end of a line, word, or string. To do this you can use anchors. Two common anchors are the caret (`^`), which represents the start of the string, and the dollar sign (`$`), which represents the end of the string.

| RegExp | What it Does |
| ------ | ------------ |
| `^p` | Matches the letter p at the beginning of the string (or of each line when the `m` modifier is used). |
| `p$` | Matches the letter p at the end of the string (or of each line when the `m` modifier is used). |

The regular expression in the following example displays only those names from the names array that start with the letter "J" using the PHP `preg_grep()` function:

```php
<?php
$pattern = "/^J/";
$names = array("John Carter", "Clark Kent", "John Rambo");
$matches = preg_grep($pattern, $names);

// Loop through matches array and display matched names
foreach($matches as $match){
    echo $match . "<br>";
}
?>
```

## Pattern Modifiers

A pattern modifier allows you to control the way a pattern match is handled. Pattern modifiers are placed directly after the regular expression; for example, if you want to search for a pattern in a case-insensitive manner, you can use the `i` modifier, like this: `/pattern/i`. The following table lists some of the most commonly used pattern modifiers.

| Modifier | What it Does |
| -------- | ------------ |
| `i` | Makes the match case-insensitive. |
| `m` | Changes the behavior of `^` and `$` to match at newline boundaries (i.e. the start or end of each line within a multi-line string), instead of at string boundaries. |
| `s` | Changes the behavior of `.` (dot) to match all characters, including newlines. |
| `x` | Allows you to use whitespace and comments within a regular expression to make it more readable. |

> Note: The `g` (global match) and `o` (evaluate once) modifiers known from Perl are not available in PHP's `preg_` functions. To find all occurrences, use `preg_match_all()` instead.

The following example shows how to perform a global case-insensitive search using the `i` modifier and the PHP `preg_match_all()` function.

```php
<?php
$pattern = "/color/i";
$text = "Color red is more visible than color blue in daylight.";
$matches = preg_match_all($pattern, $text, $array);
echo $matches . " matches were found.";
?>
```

Similarly, the following example shows how to match at the beginning of every line in a multi-line string using the `^` anchor and the `m` modifier with the PHP `preg_match_all()` function.

```php
<?php
$pattern = "/^color/im";
$text = "Color red is more visible than \ncolor blue in daylight.";
$matches = preg_match_all($pattern, $text, $array);
echo $matches . " matches were found.";
?>
```

## Word Boundaries

The word boundary character (`\b`) helps you search for words that begin and/or end with a pattern. For example, the regexp `/\bcar/` matches words beginning with car; it matches cart, carrot, or cartoon, but not oscar.

Similarly, the regexp `/car\b/` matches words ending with car; it matches scar, oscar, or supercar, but not cart. Likewise, `/\bcar\b/` matches words that both begin and end with car, so it matches only the word car.

The following example highlights in bold the words beginning with car:

```php
<?php
$pattern = '/\bcar\w*/';
$replacement = '<b>$0</b>';
$text = 'Words beginning with car: cart, carrot, cartoon. Words ending with car: scar, oscar, supercar.';
echo preg_replace($pattern, $replacement, $text);

// Output:
// Words beginning with <b>car</b>: <b>cart</b>, <b>carrot</b>, <b>cartoon</b>. Words ending with <b>car</b>: scar, oscar, supercar.
?>
```

We hope you now understand the basics of regular expressions. To learn how to validate form data using regular expressions, please check out the tutorial on [PHP Form Validation](http://www.bixiaguangnian.com/manual/php7/3991.html "PHP Form Validation").

Last modified: September 12, 2024