Replace query argument pagination with WordPress paged pretty URLs

This post is more than 10 years old.

When integrating non-WordPress PHP software into WordPress, sometimes the two butt heads over little things; pagination is one such thing. WordPress likes to move pagination into the pretty URL and out of query parameters. If your non-WordPress software generates content with URLs that have page= in query parameters, that means a redirect each time such a URL is fetched. A little regular expression magic can help fix that, with some assembly required.

A lot of non-WordPress packages generate URLs with pagination as part of the query string, like this:

http://www.example.com/products/?page=2&sort=title

When your permalink structure is set to a pretty URL scheme, WordPress will intercept that and redirect the browser to load the page here instead:

http://www.example.com/products/2/?sort=title

Pretty, but the redirect adds another round trip to the server, and anything that adds round trips can slow things down, potentially losing customers along the way! Always best to avoid that if we can.

Here’s a little snippet that takes the output HTML from something that generates the first type of URL, and replaces all instances with the second type of URL. It assumes the generated HTML is already held in a string, $output, and it only picks up absolute http or https URLs wrapped in double quotes. It uses a regular expression to find those URLs in the HTML, and passes each one to a function which converts it to a WordPress pretty URL. The magic is performed by the following PHP and WordPress functions:

  • preg_replace_callback — regular expression search/replace using a callback function
  • parse_url — break a URL down into its components
  • parse_str — break a URL query string down into an associative array of arguments
  • trailingslashit — add a trailing slash to a URL if it doesn’t have one already
  • user_trailingslashit — same as trailingslashit, but only if the website’s permalink structure wants it
  • add_query_arg — add an array of arguments to a URL as a query string
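
To make the two plain-PHP functions in that list a little more concrete, here is a minimal sketch of what they return for the example URL from earlier. It uses no WordPress functions, and the variable names are only illustrative.

```php
<?php
// Example URL from earlier in the post; any URL with a page= argument would do.
$url = 'http://www.example.com/products/?page=2&sort=title';

// parse_url splits the URL into its components:
// scheme 'http', host 'www.example.com', path '/products/',
// query 'page=2&sort=title'
$parts = parse_url($url);

// parse_str fills $args with the query arguments as an associative array;
// the values arrive as strings, so 'page' is '2' rather than 2
parse_str($parts['query'], $args);

// pull the page number out as an integer, leaving the other arguments alone
$page = (int) $args['page'];
unset($args['page']);
```

After this runs, $page holds the page number and $args holds only the arguments that need to go back onto the rebuilt URL.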
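
/**
* replace query-paged URLs with canonical paged URLs
*/
$output = preg_replace_callback('@"(https?://[^"]+[&?]page=[^"]+)"@', function ($matches) {

    // break URL down into parts
    $parts = parse_url($matches[1]);

    // leave the URL untouched if it can't be parsed or has no query string
    if (empty($parts['host']) || empty($parts['query'])) {
        return $matches[0];
    }

    // compose base URL again, without query (keep any port; the path may be absent)
    $port = isset($parts['port']) ? ":{$parts['port']}" : '';
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $canonical = trailingslashit("{$parts['scheme']}://{$parts['host']}{$port}{$path}");

    // extract / remove page argument from query, as an integer
    parse_str($parts['query'], $args);
    $page = isset($args['page']) ? (int) $args['page'] : 1;
    unset($args['page']);

    // add page component to URL, if greater than 1
    if ($page > 1) {
        $canonical .= user_trailingslashit($page, 'single_paged');
    }

    // recombine as URL with the remaining query args (any #fragment is dropped)
    $canonical = add_query_arg($args, $canonical);

    // add the enclosing quotes back, and return
    return '"' . $canonical . '"';

}, $output);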

There, job is done a little more directly.
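
For the "some assembly required" part, here is one way the snippet might be wrapped in a function so it can be tried outside WordPress. This is only a sketch: the function name example_prettify_paged_urls is made up for illustration, and the three stand-in helpers are simplified assumptions that only exist when WordPress isn't loaded. They are not the real WordPress implementations (for instance, the stand-in add_query_arg only handles a URL with no existing query string, and the stand-in user_trailingslashit assumes a permalink structure that ends with a slash).

```php
<?php
// Stand-ins for the WordPress helpers, defined only when WordPress is not loaded.
// They are simplified assumptions for experimenting, not the real implementations.
if (!function_exists('trailingslashit')) {
    function trailingslashit($string) {
        return rtrim((string) $string, '/\\') . '/';
    }
}
if (!function_exists('user_trailingslashit')) {
    // assumes a permalink structure that ends with a slash
    function user_trailingslashit($string, $type_of_url = '') {
        return trailingslashit($string);
    }
}
if (!function_exists('add_query_arg')) {
    // simplified: expects (array $args, string $url) and a URL without a query
    function add_query_arg($args, $url) {
        return $args ? $url . '?' . http_build_query($args) : $url;
    }
}

// Hypothetical wrapper: takes HTML, returns it with paged URLs rewritten.
function example_prettify_paged_urls($html) {
    return preg_replace_callback('@"(https?://[^"]+[&?]page=[^"]+)"@', function ($matches) {
        // break URL down into parts
        $parts = parse_url($matches[1]);
        if (empty($parts['host']) || empty($parts['query'])) {
            return $matches[0]; // not something we can rewrite
        }

        // compose base URL again, without query
        $path = isset($parts['path']) ? $parts['path'] : '/';
        $canonical = trailingslashit("{$parts['scheme']}://{$parts['host']}{$path}");

        // extract / remove page argument from query
        parse_str($parts['query'], $args);
        $page = isset($args['page']) ? (int) $args['page'] : 1;
        unset($args['page']);

        // add page component to URL, if greater than 1
        if ($page > 1) {
            $canonical .= user_trailingslashit($page, 'single_paged');
        }

        // recombine with the remaining query args and restore the quotes
        return '"' . add_query_arg($args, $canonical) . '"';
    }, $html);
}

// Try it on a link like the one at the top of the post.
$html = '<a href="http://www.example.com/products/?page=2&sort=title">Next</a>';
$result = example_prettify_paged_urls($html);
```

Inside WordPress the stand-ins are skipped because the real functions already exist. If the other software's HTML happens to pass through a filter such as the_content, the wrapper could be attached there with add_filter; otherwise it can simply be called on the HTML before it is echoed.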