Writing Regular Expressions with PHP
Consider how you search for files on your computer. You most likely use the ? and * characters to help find the files you're looking for. The ? character matches a single character in a file name, while the * character matches zero or more characters. A pattern such as 'file?.txt' would find the following files:
file1.txt
filer.txt
files.txt
Using the * character instead of the ? character expands the number
of files found. 'file*.txt' matches all of the following:
file1.txt
file2.txt
file12.txt
filer.txt
filedce.txt
While this method of searching for files can certainly be useful,
it is also very limited. The limited ability of the ? and * wildcard
characters gives you an idea of what regular expressions can do, but
regular expressions are much more powerful and flexible.
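To make the comparison concrete, here is a minimal sketch that runs the two wildcard patterns above through PHP's fnmatch() and through regular expressions that we assume to be equivalent ("." for ?, ".*" for *). The $files array reuses the file names from the lists above; the variable names are only illustrative.

```php
<?php
// File names reused from the lists above.
$files = ['file1.txt', 'file2.txt', 'file12.txt', 'filer.txt', 'files.txt', 'filedce.txt'];

// Each wildcard pattern paired with a regular expression of the same meaning.
$patterns = [
    'file?.txt' => '/^file.\.txt$/',   // ? becomes .  (exactly one character)
    'file*.txt' => '/^file.*\.txt$/',  // * becomes .* (zero or more characters)
];

$wildcardHits = [];
$regexHits = [];
foreach ($patterns as $wildcard => $regex) {
    $wildcardHits[$wildcard] = [];
    $regexHits[$wildcard] = [];
    foreach ($files as $name) {
        if (fnmatch($wildcard, $name)) {
            $wildcardHits[$wildcard][] = $name;
        }
        if (preg_match($regex, $name) === 1) {
            $regexHits[$wildcard][] = $name;
        }
    }
    echo $wildcard, ' => ', implode(', ', $regexHits[$wildcard]), "\n";
}
```

Note that the dot in ".txt" has to be escaped in the regular expression, because an unescaped dot is itself a metacharacter.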
Let Us Start on RegEx
A regular expression is a pattern of text that consists of
ordinary characters (for example, letters a through z) and special
characters, known as metacharacters. The pattern describes
one or more strings to match when searching a body of text. The
regular expression serves as a template for matching a character
pattern to the string being searched.
The following table lists some metacharacters and their behavior in the context of regular expressions:
|
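As a quick illustration of metacharacters in action, here is a minimal sketch that pairs a few common ones with one string that should match and one that should not. The choice of metacharacters and the sample words are our own picks for illustration.

```php
<?php
// pattern => [sample that should match, sample that should not match]
$examples = [
    '/^cat/'     => ['catalog', 'concat'],   // ^ anchors the match at the start
    '/cat$/'     => ['concat', 'catalog'],   // $ anchors the match at the end
    '/c.t/'      => ['cut', 'ct'],           // . matches any single character
    '/ca*t/'     => ['ct', 'cbt'],           // * zero or more of the preceding item
    '/ca+t/'     => ['caat', 'ct'],          // + one or more of the preceding item
    '/colou?r/'  => ['color', 'colouur'],    // ? zero or one of the preceding item
    '/[0-9]{3}/' => ['abc123', 'ab12c'],     // [] character class, {n} exact count
    '/cat|dog/'  => ['hotdog', 'bird'],      // | alternation
];

$allAsExpected = true;
foreach ($examples as $pattern => [$hit, $miss]) {
    $hitOk  = preg_match($pattern, $hit) === 1;
    $missOk = preg_match($pattern, $miss) === 0;
    $allAsExpected = $allAsExpected && $hitOk && $missOk;
    echo $pattern, ': ', $hit, ($hitOk ? ' matches' : ' does not match'),
         ', ', $miss, ($missOk ? ' does not match' : ' matches'), "\n";
}
```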
RegEx functions in PHP
PHP has functions to work on complex string manipulation using
RegEx. The following are the RegEx functions provided in PHP.
|
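To give a feel for how such functions are called, here is a minimal sketch of four of PHP's Perl-compatible (PCRE) functions. The sample text and the variable names are made up for illustration.

```php
<?php
$text = 'Call 555-1234 or 555-9876 today';

// preg_match(): stops at the first match; returns 1 on a match, 0 otherwise.
preg_match('/\d{3}-\d{4}/', $text, $first);

// preg_match_all(): collects every match; returns the number of matches.
$count = preg_match_all('/\d{3}-\d{4}/', $text, $all);

// preg_replace(): replaces every match with the replacement string.
$masked = preg_replace('/\d{4}/', 'XXXX', $text);

// preg_split(): splits the string wherever the pattern matches.
$words = preg_split('/\s+/', $text);

echo $first[0], "\n";
echo $count, ' matches: ', implode(', ', $all[0]), "\n";
echo $masked, "\n";
echo count($words), " words\n";
```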
Finding US Zip Code
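Now let us see a simple example that matches a US 5-digit zip code in a string:
<?php
// ereg() was removed in PHP 7, so this uses its PCRE counterpart preg_match().
// PCRE patterns need delimiters, here the two slashes.
$zip_pattern = "/[0-9]{5}/";
$str = "Mission Viejo, CA 92692";
// $regs[0] receives the text that matched the whole pattern.
preg_match($zip_pattern, $str, $regs);
echo $regs[0];
?>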
This script outputs the following:
92692
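One thing to keep in mind is that a pattern like [0-9]{5} matches any five consecutive digits, including the first five digits of a longer number. A minimal sketch of one possible way to tighten it, assuming word boundaries (\b) are an acceptable rule for your data; the second sample string is made up:

```php
<?php
$loose  = '/[0-9]{5}/';      // any five consecutive digits
$strict = '/\b[0-9]{5}\b/';  // five digits not touching other word characters

$samples = ['Mission Viejo, CA 92692', 'Order number 1234567'];

$results = [];
foreach ($samples as $s) {
    $results[$s] = [
        'loose'  => preg_match($loose, $s, $m) ? $m[0] : null,
        'strict' => preg_match($strict, $s, $m) ? $m[0] : null,
    ];
}
print_r($results);
```

The loose pattern picks "12345" out of the order number, while the strict one finds no zip code there.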
RegEx for US Phone Numbers
Now let us try to create a RegEx pattern to match a US telephone
number. US telephone numbers are 10-digit numbers usually written in
three parts, like XXX XXX XXXX. The parts are normally separated or
grouped with hyphens, parentheses, and blank spaces. The most common
patterns are as follows:
XXX XXX XXXX
(XXX) XXX XXXX
XXX-XXX-XXXX
(XXX) XXX-XXXX
In some cases, the US ISD code is added at the front, like +1 XXX XXX XXXX.
Let us create a Perl-Compatible RegEx pattern to match the above patterns. First we need to match the single-digit ISD code (let us not restrict it to 1). Since it may or may not be present in a phone number, we write it as follows:
$Phone_Pattern = "/(\d)?/";
Here \d is equivalent to [0-9], and the '?' that follows indicates that the digit may appear once or not at all.
Now what would appear next in the sequence? The possibilities are a blank space or a hyphen. So we add the pattern "(\s|-)?" to the above RegEx. This pattern indicates that either a blank space or a hyphen may or may not appear. So our RegEx becomes:
$Phone_Pattern = "/(\d)?(\s|-)?/";
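At this stage every part of the pattern is optional, so it will also happily match nothing at all. A minimal sketch showing what the two-part prefix pattern picks up from a few made-up inputs:

```php
<?php
$prefix = '/(\d)?(\s|-)?/';

$samples = ['1 650', '-650', 'abc'];

$matched = [];
foreach ($samples as $s) {
    preg_match($prefix, $s, $m);
    $matched[$s] = $m[0];  // text matched by the whole pattern
    echo var_export($s, true), ' => ', var_export($m[0], true), "\n";
}
```

For "abc" the match succeeds with an empty string. The pattern only becomes selective once the mandatory digit groups are added in the next steps.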
The next sequence would be either XXX or (XXX). To match this
sequence, we first need to match the opening parenthesis with the pattern "(\()?".
Because parentheses are used to group patterns in RegEx, they are
metacharacters, and to match them literally we need to put the
escape character "\" in front of them. Hence we
use "\(" in our RegEx pattern. Next we need to match the three digits
and an optional closing parenthesis, which can be written as "(\d){3}(\))?".
Adding these patterns, our RegEx becomes:
$Phone_Pattern = "/(\d)?(\s|-)?(\()?(\d){3}(\))?/";
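The effect of escaping the parentheses can be tried on the area-code part alone. This is a minimal sketch with made-up inputs; it also shows that, because each parenthesis is optional on its own, an unbalanced "(650" is accepted too. The preg_quote() call at the end is an aside: it is a standard PHP function that adds the backslashes for you when a whole string should be matched literally.

```php
<?php
// Optional "(", exactly three digits, optional ")".
$areaCode = '/(\()?(\d){3}(\))?/';

$samples = ['(650)', '650', '(650'];

$matched = [];
foreach ($samples as $s) {
    preg_match($areaCode, $s, $m);
    $matched[$s] = $m[0];
    echo $s, ' => ', $m[0], "\n";
}

// preg_quote() escapes RegEx metacharacters in a literal string.
$literal = preg_quote('(650)');
$found = preg_match('/' . $literal . '/', 'Call (650) now');
echo $literal, "\n";
```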
After the first part XXX, there should be either a blank space or a hyphen. So we add "(\s|-){1}" to the phone pattern:
$Phone_Pattern = "/(\d)?(\s|-)?(\()?(\d){3}(\))?(\s|-){1}/";
The rest of the RegEx is much simpler, as we only need
to match either XXX-XXXX or XXX XXXX. This can be written as
"(\d){3}(\s|-){1}(\d){4}". Adding this part to our RegEx:
$Phone_Pattern = "/(\d)?(\s|-)?(\()?(\d){3}(\))?(\s|-){1}(\d){3}(\s|-){1}(\d){4}/";
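Before putting the finished pattern to work, it can be checked against the formats listed earlier. This is a minimal sketch; the sample numbers are placeholders we made up, and the last one is a seven-digit number that should be rejected.

```php
<?php
$Phone_Pattern = '/(\d)?(\s|-)?(\()?(\d){3}(\))?(\s|-){1}(\d){3}(\s|-){1}(\d){4}/';

$samples = [
    '650 253 0000',
    '(650) 253 0000',
    '650-253-0000',
    '(650) 253-0000',
    '+1 650 253 0000',
    '253-0000',          // only seven digits, should not match
];

$results = [];
foreach ($samples as $s) {
    // Store the matched text, or null when the pattern does not match.
    $results[$s] = preg_match($Phone_Pattern, $s, $m) === 1 ? $m[0] : null;
    echo str_pad($s, 18), ' => ', var_export($results[$s], true), "\n";
}
```

Since the pattern has no rule for the "+" sign, the match for the ISD form starts at the digit 1, not at the plus.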
Yippee!!! We have created a RegEx to match US phone numbers. Now let us put this RegEx to work, so that we can better understand its significance. Let us write a script to fetch the phone numbers from Google's contact us page. First we need to fetch the HTML content of that page:
$str = implode("",file("http://www.google.com/intl/en/contact/index.html"));
Then we need to search for the phone number pattern with the help of
our "Just Created" RegEx. preg_match() returns only the first match,
so to get every match we use preg_match_all():
preg_match_all($Phone_Pattern,$str,$phone);
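It helps to know what preg_match_all() puts into its third argument. With the default ordering, index 0 holds the full matches and each further index holds what one pair of grouping parentheses captured. A minimal sketch, using a made-up HTML fragment in place of the real page:

```php
<?php
// Made-up HTML fragment standing in for the fetched page.
$str = '<td>(650) 253-0000</td><td>(650) 253-0001</td>';
$Phone_Pattern = '/(\d)?(\s|-)?(\()?(\d){3}(\))?(\s|-){1}(\d){3}(\s|-){1}(\d){4}/';

$total = preg_match_all($Phone_Pattern, $str, $phone);

echo $total, " matches\n";
echo 'Full matches: ', implode(', ', $phone[0]), "\n";  // $phone[0]: whole pattern
echo 'Group 3:      ', implode(', ', $phone[3]), "\n";  // $phone[3]: the optional "("
echo 'Array size:   ', count($phone), "\n";             // 1 + nine capture groups
```

This is why a script that only wants the phone numbers themselves loops over $phone[0].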
Now putting all these pieces into a single script:
<?php
// Fetch the HTML of the contact page as one string.
$str = implode("", file("http://www.google.com/intl/en/contact/index.html"));
// Optional ISD digit, optional separator, area code with optional
// parentheses, then XXX and XXXX separated by a space or a hyphen.
$Phone_Pattern = "/(\d)?(\s|-)?(\()?(\d){3}(\))?(\s|-){1}(\d){3}(\s|-){1}(\d){4}/";
preg_match_all($Phone_Pattern, $str, $phone);
// $phone[0] holds every full match.
for ($i = 0; $i < count($phone[0]); $i++) {
    echo $phone[0][$i] . "<br>";
}
?>
This script outputs the following:
(650) 253-0000
(650) 253-0001
(650) 253-0001
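The output lists one number twice, presumably because it appears twice on the page. If only distinct numbers are wanted, the matches can be tidied up afterwards. A minimal sketch, assuming a made-up page text in place of the fetched HTML; the trim() step is there because the optional "(\s|-)?" at the front of the pattern can pull a leading space into a match.

```php
<?php
// Made-up page text; the real page content is not reproduced here.
$str = 'Phone: (650) 253-0000 Fax: (650) 253-0001 Fax: (650) 253-0001';
$Phone_Pattern = '/(\d)?(\s|-)?(\()?(\d){3}(\))?(\s|-){1}(\d){3}(\s|-){1}(\d){4}/';

preg_match_all($Phone_Pattern, $str, $phone);

// Strip surrounding whitespace, drop duplicates, and reindex from 0.
$numbers = array_values(array_unique(array_map('trim', $phone[0])));

foreach ($numbers as $number) {
    echo $number, "\n";
}
```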
Wrap Up