Greedy and Non-Greedy Regex Pattern Matching


Greedy and Non-Greedy Regular Expressions

Regular expressions let you tailor matches to exactly what you need. One metacharacter that makes pattern matching easier for large blocks of repetitive code is a '?' placed after a greedy quantifier such as '*'. Adding the '?' makes the quantifier non-greedy (lazy): it matches as few times as possible. Without the '?', the quantifier matches as many times as possible, so you end up with just one match, because the expression grabs everything between the first opening delimiter and the last closing one.
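As a minimal sketch of the difference, the snippet below runs a greedy and a non-greedy pattern against the same short string. The sample text and the variable names ($text, $greedy, $lazy) are made up for illustration; only preg_match itself is standard PHP.

```php
<?php
// Hypothetical sample text: any string with a repeated delimiter pair works.
$text = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the end of the string, then backtracks to the LAST </b>.
preg_match('/<b>(.*)<\/b>/', $text, $greedy);

// Non-greedy: .*? expands one character at a time and stops at the FIRST </b>.
preg_match('/<b>(.*?)<\/b>/', $text, $lazy);

echo $greedy[1], "\n"; // one</b> and <b>two
echo $lazy[1], "\n";   // one
```

The greedy capture swallows the intermediate closing and opening tags, while the non-greedy capture ends at the first closing tag.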

The simplest way to show this is with the following code samples.

The example below matches from the first '<table>' tag through the last '</table>' tag in the document, including everything in between.

    $data = file_get_contents('http://localhost/ebay-pi.html');
    $regular_expression = '/<table\s*listingId(.*)<\/table>/si';

    // Make an array of all matches
    preg_match_all($regular_expression, $data, $posts, PREG_SET_ORDER);

The example below matches each opening '<table>' tag to its corresponding closing '</table>' tag. This is usually the right choice when a pattern occurs repeatedly throughout a string (or file), and it is probably what you want for web scraping.

    $data = file_get_contents('http://localhost/ebay-pi.html');
    $regular_expression = '/<table\s*listingId(.*?)<\/table>/si';

    // Make an array of all matches
    preg_match_all($regular_expression, $data, $posts, PREG_SET_ORDER);
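To see both patterns side by side without fetching a page, here is a rough sketch that runs them against an inline string. The HTML in $data is a made-up stand-in for the real document, and the variable names other than $data are hypothetical. It also shows how the PREG_SET_ORDER result is laid out.

```php
<?php
// Hypothetical stand-in for the fetched page: two listing tables.
$data = '<table listingId="1"><tr><td>First</td></tr></table>'
      . '<p>between</p>'
      . '<table listingId="2"><tr><td>Second</td></tr></table>';

$greedy = '/<table\s*listingId(.*)<\/table>/si';
$lazy   = '/<table\s*listingId(.*?)<\/table>/si';

// preg_match_all returns the number of full pattern matches.
$greedyCount = preg_match_all($greedy, $data, $greedyPosts, PREG_SET_ORDER);
$lazyCount   = preg_match_all($lazy, $data, $lazyPosts, PREG_SET_ORDER);

echo $greedyCount, "\n"; // 1
echo $lazyCount, "\n";   // 2

// With PREG_SET_ORDER, each element of $lazyPosts is one match:
// index 0 holds the full match, index 1 holds the captured group.
foreach ($lazyPosts as $post) {
    echo $post[0], "\n";
}
```

The greedy pattern returns a single match spanning both tables and the paragraph between them, while the non-greedy pattern returns one match per table.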