Forking with PHP from the command line 👨‍💻

Forking new processes is an extremely handy technique that lets you run tasks in parallel with one another from a single invocation of a program.

You may be interested in forking if:

  • You have a multi-core or multi-processor machine and want to utilise it more effectively
  • You want something to run in the background while your main thread of execution continues
  • You have a set of tasks that take an appreciable time to complete but do not rely on one another's results

As ever, an introduction to the concept is available in the PHP manual.
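
To make the idea concrete, here is a minimal sketch of a single fork. It assumes the pcntl extension is loaded and that the script runs under the command-line PHP binary. The messages it prints are only illustrative.

```php
<?php
// Minimal fork sketch: one parent, one child.
$pid = pcntl_fork();

if ($pid === -1) {
    // The fork failed, for example because a process limit was reached
    fwrite(STDERR, "Could not fork\n");
    exit(1);
} elseif ($pid === 0) {
    // Child: do the background work, then exit so it never runs the parent's code
    echo "Child " . getmypid() . " working\n";
    exit(0);
}

// Parent: $pid holds the child's process ID
echo "Parent " . getmypid() . " started child $pid\n";
pcntl_waitpid($pid, $status);            // Block until the child has finished
$exitCode = pcntl_wexitstatus($status);  // Exit code the child passed to exit()
echo "Child exited with code $exitCode\n";
```

The return value of pcntl_fork() is what tells the two processes apart: the child sees 0, the parent sees the child's process ID, and -1 means no child was created.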

It is worth noting early on that forking is slightly different from threading, as described in more detail in this StackOverflow question. A fork is a separate process with its own copy of memory, whereas threads share the memory of a single process. Historically, threading has not been available in PHP, though there have been developments to remedy that.

One popular use case is HTTP fetching. Fetching is relatively slow because of the latency involved in talking to servers across the world. If you have a queue of 1000 URLs and each URL takes 3 seconds to fetch, fetching them one after another takes 3000 seconds. Slow or unresponsive servers push that average higher, and URLs later in the queue have to wait for every slower URL in front of them.

With forking (or threading), you can split the workload between instances of the script. In the URL fetching example, you could create 10 forks of the fetching script that fetch 100 URLs each, which brings the total down to roughly 300 seconds if every URL takes the same 3 seconds. It also means that if one particular URL is slow, your 9 other forks will still be working through the URLs in their own queues.
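
The split described above could be sketched roughly as follows. This is an illustration under assumptions, not a finished fetcher: fetch_url() and split_queue() are hypothetical helpers, the example.com URLs are placeholders, and the "fetch" is a short sleep so the sketch does not touch the network. It needs the pcntl extension.

```php
<?php
// Hypothetical stand-in for a real HTTP fetch; it only sleeps for a millisecond
function fetch_url(string $url): void {
    usleep(1000);
}

// Hypothetical helper: split the queue into at most $forks roughly equal batches
function split_queue(array $urls, int $forks): array {
    if ($urls === []) {
        return [];
    }
    return array_chunk($urls, (int) ceil(count($urls) / $forks));
}

// Build a placeholder queue of 1000 URLs
$urls = [];
for ($i = 1; $i <= 1000; $i++) {
    $urls[] = "http://example.com/page/$i";
}

$batches  = split_queue($urls, 10);   // 10 batches of 100 URLs
$children = [];

foreach ($batches as $batch) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        fwrite(STDERR, "Could not fork\n");
        break;
    }
    if ($pid === 0) {
        // Child: work through its own batch only, then exit
        foreach ($batch as $url) {
            fetch_url($url);
        }
        exit(0);
    }
    $children[] = $pid;               // Parent: remember the child and carry on
}

// Parent: wait for every child it started
$finished = 0;
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
    $finished++;
}
echo "$finished of " . count($batches) . " batches finished\n";
```

A fixed split like this is simple, but a batch that happens to contain many slow URLs will finish last while the other forks sit idle. Handing out jobs on demand avoids that.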

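I have provided skeleton code below to give you an idea of how it can work for you. It needs the pcntl extension, must be run from the command line, and reads its jobs from input.tsv, one per line.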

<?php
declare(ticks = 1);

class socket {
    static private $sockfile = 'server.sock'; // UNIX sockets require a socket file to bind to
    static $numforks = 5; // Number of children to create
    static $numlines = 1; // Number of jobs to give each child
    static $inputfile = 'input.tsv'; // Input file containing jobs to do
    
    static function init() { 
        file_exists(self::$sockfile) && unlink(self::$sockfile);
        // Create a counter for children to indicate when they're done
        file_put_contents("counter.txt",str_repeat("0",self::$numforks)); // One "0" per child
        // Server
        $pid = pcntl_fork();
            if($pid != -1 && !$pid)
                socket::server(); // Line server to children
        usleep(250000);
        // Children
            for($i = 0;$i < socket::$numforks;$i++) {
                $pid = pcntl_fork();
                if($pid != -1 && !$pid)
                    socket::child($i); // New child
            }
        // Wait for the counter to indicate completion (every child has flipped its "0" to "1")
            while(1) {
                $count = substr_count(file_get_contents('counter.txt'),'1');
                echo $count.' / '.self::$numforks.' complete'."\n";
                    if($count == self::$numforks)
                        break;
                sleep(1);
            }
        exit(0);
    }
    
    static function server() {
        // This socket distributes tasks to the children by giving the children $numlines from input.tsv. 
        // The children communicate with this via a unix socket
        $s = stream_socket_server("unix://".self::$sockfile,$errno,$errstr,STREAM_SERVER_BIND | STREAM_SERVER_LISTEN);
        $fp = fopen(self::$inputfile,"rb");
            while(1) {
                $lines_sent = 0;
                $conn = stream_socket_accept($s);
                // Send up to $numlines lines, stopping early at the end of the file
                while(++$lines_sent <= self::$numlines && ($line = fgets($fp)) !== false)
                    fwrite($conn,$line);
                fclose($conn);
                if(feof($fp))
                    break;
            }
        // It ends when the end of the file is reached
        fclose($fp);
        fclose($s);
        exit(0);
    }
    
    static function child($forkid) { 
        // The child will continually fetch lines and do its job until there's nothing more to do
            while(@$s = stream_socket_client("unix://".self::$sockfile,$errno,$errstr)) {
                    
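                // Read until the server closes the connection; an empty reply means no jobs are left
                $line = trim(stream_get_contents($s));
                if($line === '') {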
                    fclose($s);
                    break;
                }    
                echo "Fork $forkid has |$line|\n";
                /*
                    Let the child do its work here
                */    
                fclose($s);
                usleep(mt_rand(90000,150000)); // This lets you see that jobs are distributed on demand to each fork by delaying requests slightly
            }
        // Tell the counter that this child is finished working
        echo "Fork $forkid complete\n";
        $fp = fopen("counter.txt","rb+");
        fseek($fp,$forkid);
        fwrite($fp,"1");
        fclose($fp);
        exit(0);
    }
}
 
// Perform the task
socket::init();

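Two important things to watch out for when forking scripts are the nastiness of a fork bomb (children that go on to fork children of their own without limit) and the unpredictability of a race condition (forks competing over the same resource, so that the outcome depends on timing). Bear these concepts in mind as you delve into the usefulness of multi-tasking with forks or threads.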
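
As a rough sketch of guarding against both, the snippet below caps the number of forks, makes every child exit as soon as its work is done, and uses flock() so that concurrent updates to a shared file cannot overwrite one another. MAX_FORKS, increment_counter() and the temporary counter file are all hypothetical names chosen for this illustration. It assumes the pcntl extension.

```php
<?php
const MAX_FORKS = 5;   // Hypothetical hard cap on the number of children

$counterFile = tempnam(sys_get_temp_dir(), 'cnt');
file_put_contents($counterFile, "0");

// Read-modify-write under an exclusive lock, so no increment is lost
function increment_counter(string $file): void {
    $fp = fopen($file, 'c+');          // Read/write without truncating
    flock($fp, LOCK_EX);               // Other forks block here until we unlock
    $value = (int) stream_get_contents($fp);
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, (string) ($value + 1));
    fflush($fp);
    flock($fp, LOCK_UN);
    fclose($fp);
}

$children = [];
for ($i = 0; $i < MAX_FORKS; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        break;                         // Could not fork; stop creating children
    }
    if ($pid === 0) {
        increment_counter($counterFile);
        exit(0);                       // Without this exit the child would keep looping and fork too
    }
    $children[] = $pid;
}

// Parent: wait for all children, then read the final value
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}
$total = (int) file_get_contents($counterFile);
echo "Counter reads $total after " . count($children) . " children\n";
unlink($counterFile);
```

The exit(0) inside the child branch is the part that prevents runaway forking: a child that falls through would continue the for loop and start forking children of its own.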

The race you are most likely to hit is two forks picking up the same job, and workarounds for this problem are quite easy. In a text file for instance, with 10 forks you would want each script instance to grab every 10th line, so the 1st fork would grab the 1st line, the 11th line, the 21st line etc. Alternatively, you can have one fork that “serves” lines to the other forks (like in the example above), so that each line is only issued once. If you’re using a database as input and it has an auto-increment field, simply use a modulus of the auto-increment as a quick’n’dirty way to delegate an equal number of rows to each fork. Essentially, you’re looking to keep each fork busy and avoid allocating the same job twice.
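
The every-10th-line rule comes down to a modulus on the line index. Here is a small sketch of it, where lines_for_fork() is a hypothetical helper and the 25 "line N" strings stand in for the contents of a real input file:

```php
<?php
// Hypothetical helper: return the lines that belong to fork $forkId (0-based) out of $numForks
function lines_for_fork(array $lines, int $forkId, int $numForks): array {
    $mine = [];
    foreach ($lines as $index => $line) {
        // Line indexes are 0-based, so fork 0 gets the 1st, 11th, 21st line and so on
        if ($index % $numForks === $forkId) {
            $mine[] = $line;
        }
    }
    return $mine;
}

// Placeholder input: 25 lines
$lines = [];
for ($n = 1; $n <= 25; $n++) {
    $lines[] = "line $n";
}

$first = lines_for_fork($lines, 0, 10);
echo implode(", ", $first) . "\n";   // line 1, line 11, line 21

// Every line is assigned to exactly one fork
$assigned = 0;
for ($fork = 0; $fork < 10; $fork++) {
    $assigned += count(lines_for_fork($lines, $fork, 10));
}
echo "$assigned lines assigned\n";   // 25 lines assigned
```

The same test works on an auto-increment ID: each fork takes only the rows whose ID modulo the number of forks equals its own fork number.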

whoami
Stefan Pejcic