Using PHP DOM Functions to Parse HTML and Find Links - PC✗3

When developing websites, there are a million and one reasons that you will find yourself needing to parse some HTML to find snippets of information. On the face of it, most of the time a simple regular expression will do the trick, particularly when you are in control of the HTML you are fetching.

When parsing other people's HTML, you soon find that the tag soup that makes up the World Wide Web produces situations and code segments your regular expression was never built to accommodate, resulting in false positives, false negatives… and generally the unexpected.
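To make the false positive and false negative concrete, here is a small sketch. The markup and the pattern are both made up for illustration; the pattern assumes every `href` is double-quoted, which is the kind of assumption that holds for your own HTML and fails on someone else's.

```php
<?php
// Hypothetical markup: the first link leaves its href unquoted,
// the second sits inside an HTML comment and is not a real link.
$html = '<a href=/pricing>Pricing</a> <!-- <a href="/old">Old</a> -->';

// A naive pattern that only understands double-quoted href values
// (an assumption made for this illustration).
preg_match_all('/<a\s+href="([^"]*)"/i', $html, $matches);

// $matches[1] holds the captured href values
$found = $matches[1];
print_r($found);
```

The pattern misses `/pricing` (a false negative) and reports `/old`, which is commented out (a false positive).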

PHP’s DOM functions are made specifically for parsing XML and (X)HTML. So when you need to parse an SGML-style markup language, turn to these functions and stay away from regular expressions. The DOM library is comprehensive: its suite of functions can add, edit and delete any attribute, tag or HTML within tags.
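As a minimal sketch of the add, edit and delete operations mentioned above, the snippet below changes the attributes of a single link. The HTML fragment and the attribute values are hypothetical and exist only for this example.

```php
<?php
// Hypothetical input fragment, used only for illustration.
$html = '<p>Visit <a href="/about" class="old">our about page</a>.</p>';

libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$dom = new DOMDocument;
$dom->loadHTML($html);

// Grab the first (and only) link in the fragment
$link = $dom->getElementsByTagName('a')->item(0);

$link->setAttribute('rel', 'nofollow');   // add an attribute
$link->setAttribute('href', '/about-us'); // edit an existing attribute
$link->removeAttribute('class');          // delete an attribute

// Serialize just the modified element back to HTML
$result = $dom->saveHTML($link);
echo $result, "\n";
```

Passing a node to `saveHTML()` returns only that element, which is handy when you want the edited snippet and not the whole document that `loadHTML()` builds around it.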

The following example shows how easy it is to collect hyperlinks from a page or file without the problem of broken HTML, attributes with missing/no quotes, or any other hassle that may impede the collection of links:

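/*
  Using PHP's DOM functions to
  fetch hyperlinks and their anchor text
*/
libxml_use_internal_errors(true); // Collect warnings about malformed HTML instead of printing them

$html = file_get_contents('https://pcx3.com/'); // Fetch pcx3.com's home page
if ($html === false) {
	exit('Could not fetch the page.');
}

$dom = new DOMDocument;
$dom->loadHTML($html);

// echo links and their anchor text
echo '<pre>';
echo "Link\tAnchor\n";
foreach ($dom->getElementsByTagName('a') as $link) {
	$href = $link->getAttribute('href');
	$anchor = $link->nodeValue;
	// Escape both values because they are printed inside an HTML page
	echo htmlspecialchars($href), "\t", htmlspecialchars($anchor), "\n";
}
echo '</pre>';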
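If you want to try the same loop without a network request, the sketch below runs it on a string. The markup is hypothetical and deliberately sloppy: one `href` has no quotes, one tag is written in upper case, and the `<li>` elements are never closed.

```php
<?php
// Hypothetical, deliberately sloppy markup used only for illustration.
$html = '<ul><li><a href=/docs/start>the docs</a>'
      . '<li><A HREF="https://example.com/">an example</A></ul>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();            // discard the collected warnings

// Gather [href, anchor text] pairs, exactly as the loop above does
$links = [];
foreach ($dom->getElementsByTagName('a') as $link) {
	$links[] = [$link->getAttribute('href'), trim($link->nodeValue)];
}

foreach ($links as [$href, $anchor]) {
	echo $href, "\t", $anchor, "\n";
}
```

Both links are collected, including the one with the unquoted `href` and the one written in upper case.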

whoami
Stefan Pejcic