Using PHP DOM Functions to Parse HTML and Find Links

When developing websites, there are a million and one reasons that you will find yourself needing to parse some HTML to find snippets of information. On the face of it, most of the time a simple regular expression will do the trick, particularly when you are in control of the HTML you are fetching.

When parsing other people's HTML, you soon find that the tag soup that makes up the World Wide Web throws up situations and code segments your regular expression was never built to accommodate. The result is false positives, false negatives… and generally the unexpected.
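To make those failure modes concrete, here is a small sketch. The pattern and the sample markup are made up for illustration and are not taken from any real site; the point is only that a pattern written for tidy markup both misses valid links and picks up one that is commented out.

```php
<?php
// Hypothetical naive pattern: expects lowercase tags, href as the first
// attribute, and double quotes around the value.
$pattern = '/<a href="([^"]*)">/';

// Made-up tag soup: an uppercase tag with single quotes, an unquoted href
// that is not the first attribute, and a link inside an HTML comment.
$html = <<<'HTML'
<a href="/home">Home</a>
<A HREF='/about'>About</A>
<a class="nav" href=/contact>Contact</a>
<!-- <a href="/retired">Retired page</a> -->
HTML;

preg_match_all($pattern, $html, $matches);
$found = $matches[1];

// Only /home and /retired are captured: /about and /contact are false
// negatives, and /retired (commented out) is a false positive.
print_r($found);
```

Each of these cases can be patched into the pattern one at a time, but every patch makes the expression harder to read and the next surprise is never far away.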

PHP’s DOM functions are built specifically for parsing XML and (X)HTML. So when you need to parse SGML-style markup, turn to these functions and stay away from regular expressions. The DOM library is comprehensive: its suite of functions can add, edit and delete any attribute, any tag, or the HTML within tags.
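As a rough sketch of the add, edit and delete operations mentioned above, the snippet below changes a single link. The markup and the attribute values are invented for the example; only the DOM method calls are PHP's own.

```php
<?php
// Made-up markup, used only for illustration.
$html = '<p>Visit <a href="http://example.com/" class="old">Example</a></p>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$link = $dom->getElementsByTagName('a')->item(0);

$link->setAttribute('rel', 'nofollow');              // add a new attribute
$link->setAttribute('href', 'https://example.com/'); // edit an existing one
$link->removeAttribute('class');                     // delete an attribute
$link->nodeValue = 'Example site';                   // replace the text inside the tag

// Serialize just the modified element back to HTML.
$result = $dom->saveHTML($link);
echo $result, "\n";
```

Passing the node to saveHTML() returns only that element rather than the whole document, which is handy when you are rewriting fragments.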

The following example shows how easy it is to collect hyperlinks from a page or file without worrying about broken HTML, attributes with missing quotes, or any other hassle that may impede the collection of links:

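/*
  Using PHP's DOM functions to
  fetch hyperlinks and their anchor text
*/
libxml_use_internal_errors(true); // Collect warnings about malformed HTML instead of printing them

$html = file_get_contents('https://pcx3.com/'); // Fetch pcx3.com's home page
if ($html === false) {
	exit('Could not fetch the page');
}

$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

// Echo links and their anchor text
echo '<pre>';
echo "Link\tAnchor\n";
foreach ($dom->getElementsByTagName('a') as $link) {
	$href = $link->getAttribute('href');
	$anchor = $link->nodeValue;
	// Escape both values so markup from the fetched page cannot break the output
	echo htmlspecialchars($href), "\t", htmlspecialchars($anchor), "\n";
}
echo '</pre>';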
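To try the same approach without a network request, the sketch below runs it against a deliberately sloppy fragment. The helper name collectLinks() and the sample markup are assumptions made for this example and are not part of PHP or of the script above.

```php
<?php
/**
 * Hypothetical helper: returns every <a> that has an href attribute,
 * as a list of ['href' => ..., 'anchor' => ...] arrays.
 */
function collectLinks(string $html): array
{
    // Silence warnings about malformed markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        if (!$link->hasAttribute('href')) {
            continue; // named anchors and other <a> tags without a target
        }
        $links[] = [
            'href'   => $link->getAttribute('href'),
            'anchor' => trim($link->textContent),
        ];
    }
    return $links;
}

// Made-up tag soup: unquoted and single-quoted attributes, an uppercase tag,
// unclosed <li> and <p> elements, and one anchor without an href.
$soup = "<ul><li><a href=/about>About us</a>"
      . "<li><A HREF='contact.html' >Contact</A>"
      . "<li><a name=top>No href here</a></ul><p>Unclosed paragraph";

$links = collectLinks($soup);
foreach ($links as $row) {
    printf("%s\t%s\n", $row['href'], $row['anchor']);
}
```

Returning an array instead of echoing keeps the parsing separate from the presentation, so the same helper could feed a crawler, a link checker or a report.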

whoami
Stefan Pejcic