Multiprocessing with pcntl_fork() to speed up image resizing

The automobile industry involves a lot of file uploads and image display. As such, I wanted to try resizing and cropping the original image to different sizes when the user uploads the file, before sending the response back to the browser. I could enqueue an async task to handle that, but I was wondering whether it was possible to do it synchronously as a proof of concept. I had read about PHP Fibers and gave them a failed attempt; that's when I discovered pcntl_fork() and the cool realm of Process Control functions in PHP.

The PHP documentation states that Process Control should not be enabled within a web server environment, since unexpected results may happen. So use the code snippets below in production at your own risk.
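Before running the snippets below, it's worth guarding against environments where forking isn't available. Here is a minimal check of my own (not from the PHP docs): pcntl is typically only enabled for the CLI, so we bail out for any other SAPI or when the extension is missing.

// Forking is only reliable from the command line and requires the pcntl extension
if (PHP_SAPI !== 'cli' || !function_exists('pcntl_fork')) {
  exit('This script needs the CLI SAPI and the pcntl extension' . PHP_EOL);
}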

Consider the following code, which iterates over a directory containing 31 pictures. For each original picture, I create four different sizes using the ffmpeg executable.

$start = microtime(true);
foreach (new DirectoryIterator('/path/to/your/directory') as $item) {
  /** @var SplFileInfo $item */
  if ($item->isFile() && $item->getExtension() === 'jpg') {
    $input = $item->getPathname();
    $baseName = $item->getBasename('.' . $item->getExtension());
    foreach ([800, 600, 400, 200] as $size) {
      $destination = $item->getPath() . "/resized/$baseName-$size.jpg";
      // scale=WIDTH:-1 resizes to the given width and keeps the aspect ratio
      exec("ffmpeg -loglevel error -i $input -vf scale=$size:-1 $destination -y");
    }
  }
}
  }
}
$end = microtime(true);
echo 'Directory processed in ' . round($end - $start, 1) . ' seconds' . PHP_EOL;

// => Directory processed in 24.3 seconds

As we can see, it takes about 24 seconds on my MacBook M1 to handle the complete folder.

With one child process per image size

Now let's put the multiprocessing at our disposal to work and fork a new child process for each image size.

$start = microtime(true);
$pids = [];
foreach (new DirectoryIterator('/path/to/your/directory') as $item) {
  /** @var SplFileInfo $item */
  if ($item->isFile() && $item->getExtension() === 'jpg') {
    $input = $item->getPathname();
    $baseName = $item->getBasename('.' . $item->getExtension());
    foreach ([800, 600, 400, 200] as $size) {
      $pid = pcntl_fork();
      if ($pid == -1) {
        die('could not fork');
      } elseif ($pid) {
        // parent
        $pids[] = $pid;
      } else {
        // child
        $destination = $item->getPath() . "/resized/$baseName-$size.jpg";
        exec("ffmpeg -loglevel error -i $input -vf scale=$size:-1 $destination -y");
        exit(0);
      }
    }
  }
}
// Wait for all child processes to finish
foreach ($pids as $pid) {
  pcntl_waitpid($pid, $status);
}
$end = microtime(true);
echo 'Directory processed in ' . round($end - $start, 1) . ' seconds' . PHP_EOL;

// => Directory processed in 5.7 seconds

Well, that's almost a 20-second performance gain, which is encouraging despite a humongous CPU spike. Let's see if we can keep going.

With one child process per file

Instead of spawning a child process for each size, which creates a ton of child processes and an unmanageable (in PHP) CPU spike, what about forking one process per file instead?

$start = microtime(true);
$pids = [];
foreach (new DirectoryIterator('/path/to/your/directory') as $item) {
  /** @var SplFileInfo $item */
  if ($item->isFile() && $item->getExtension() === 'jpg') {
    $input = $item->getPathname();
    $baseName = $item->getBasename('.' . $item->getExtension());
    $pid = pcntl_fork();
    if ($pid == -1) {
      die('could not fork');
    } elseif ($pid) {
      // parent
      $pids[] = $pid;
    } else {
      // child
      foreach ([800, 600, 400, 200] as $size) {
        $destination = $item->getPath() . "/resized/$baseName-$size.jpg";
        exec("ffmpeg -loglevel error -i $input -vf scale=$size:-1 $destination -y");
      }
      exit(0);
    }
  }
}
// Wait for all child processes to finish
foreach ($pids as $pid) {
  pcntl_waitpid($pid, $status);
}
$end = microtime(true);
echo 'Directory processed in ' . round($end - $start, 1) . ' seconds' . PHP_EOL;

// => Directory processed in 3.3 seconds

As we can see, it's faster this way. Even though the "one child process per size" script theoretically allows for more parallelism, it forks 124 processes (31 files × 4 sizes) instead of 31, and the overhead of creating and managing that many child processes makes it slower than "one child process per file".

So there is a balance to be found between the number of CPU cores, the amount of available memory, the overhead of creating a new process, and the time to execute the task in each process.
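One way to strike that balance is to cap the number of concurrent children. The sketch below is my own variation on the per-file script, not part of the original experiment: it blocks with pcntl_wait() once a hypothetical $maxChildren limit is reached, so at most that many ffmpeg jobs run at once.

$maxChildren = 8; // hypothetical cap, e.g. the number of logical cores
$pids = [];
foreach (new DirectoryIterator('/path/to/your/directory') as $item) {
  /** @var SplFileInfo $item */
  if (!$item->isFile() || $item->getExtension() !== 'jpg') {
    continue;
  }
  // At the cap: block until one child exits before forking the next one
  if (count($pids) >= $maxChildren) {
    $finished = pcntl_wait($status);
    unset($pids[$finished]);
  }
  $input = $item->getPathname();
  $baseName = $item->getBasename('.' . $item->getExtension());
  $pid = pcntl_fork();
  if ($pid == -1) {
    die('could not fork');
  } elseif ($pid) {
    // parent: remember the child, keyed by its pid
    $pids[$pid] = true;
  } else {
    // child: create all four sizes, then exit
    foreach ([800, 600, 400, 200] as $size) {
      $destination = $item->getPath() . "/resized/$baseName-$size.jpg";
      exec("ffmpeg -loglevel error -i $input -vf scale=$size:-1 $destination -y");
    }
    exit(0);
  }
}
// Reap the remaining children
while ($pids !== []) {
  $finished = pcntl_wait($status);
  unset($pids[$finished]);
}

Setting $maxChildren close to the number of logical cores should keep the CPU busy without the spike caused by forking everything at once.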

A safer alternative to exec()

On many production servers, exec() is disabled for security reasons (hosts often list it in the disable_functions directive). But there is another way to execute commands from PHP: proc_open() provides more control over the executed process, and it's often still allowed when exec() is disabled. It only adds a few lines of code.

We can replace

exec("ffmpeg -loglevel error -i $input -vf scale=$size:-1 $destination -y");

by

$descriptorspec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
$process = proc_open("ffmpeg -loglevel error -i $input -vf scale=$size:-1 $destination -y", $descriptorspec, $pipes);
if (is_resource($process)) {
  fclose($pipes[0]);
  fclose($pipes[1]);
  fclose($pipes[2]);
  proc_close($process);
}
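If all we need is exec()-style fire-and-forget, that dance can be wrapped in a small helper. The runCommand() function below is a hypothetical sketch of mine, not something from the original scripts; draining stdout and stderr before proc_close() keeps the child from blocking on a full pipe buffer.

function runCommand(string $command): int
{
  $descriptorspec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
  $process = proc_open($command, $descriptorspec, $pipes);
  if (!is_resource($process)) {
    return -1;
  }
  fclose($pipes[0]); // ffmpeg reads nothing from stdin here
  stream_get_contents($pipes[1]); // drain stdout
  stream_get_contents($pipes[2]); // drain stderr
  fclose($pipes[1]);
  fclose($pipes[2]);
  return proc_close($process); // exit code of the command
}

// Usage inside the loops above:
runCommand("ffmpeg -loglevel error -i $input -vf scale=$size:-1 $destination -y");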

Here you go. I want to point out again that this is a proof of concept I used to familiarize myself with PHP Process Control functions. I have not used it in production, though I might one day, for a small controlled number of child processes. If I can create four different sizes of 31 pictures in about 3 seconds, then creating two sizes of a single file would be extremely fast on a server or even a VPS.

Category:
Languages
Tags:
php