Published on

Sitecore Headless Next.js ISR Unpublish Not Working

Authors

Unpublishing an item in Sitecore is usually a non-trivial operation. You navigate to the page item, open publishing restrictions, set the entire item to unpublishable, and publish the item. Boom. That item and it's descendants are wiped out from the target database. You go to view the page... and it's still there? You've checked the target database and the item is gone. An index rebuild doesn't sound necessary, but you do it anyways, only to see the page still there on the rendering host's browser. It seems like the only thing that fixes the issue is deleting the physical generated html/json files on the server. Once the files are deleted, the application returns the expected 404 Not Found.

Sitecore Support was able to reproduce the issue on their edge environment. Luckily, if you're on XM Cloud, the current workaround is to publish the "parent" of the item expected to be unpublished. A bug with a reference number 604866 was registered for this. If you're self-hosting or want to learn more about more features of self-hosting, read further for my workaround.

Self Hosting Background

For background, this issue occurs when the following criteria is met:

  • Standalone build output mode is configured via nextConfig.output = 'standalone'
  • nextConfig.experimental.isrMemoryCacheSize: 0 is disabled
  • The Next.js application is using SSG w/ ISR
  • The Next.js application is using page router
  • The Next.js application is either hosted using a Linux App Service or Kubernetes

Your mileage may vary with the above cases. I haven't had a chance to thoroughly test whether the issue still occurs without standalone built output mode, but it's definitely a side-effect of disabling isrMemoryCacheSize. Let's go through these one by one. If you'd like to skip to the actual workaround, click here.

Standalone build output mode who?

This is a Next.js build feature that helps reduce the size of deployments drastically. If you want to use this feature, make sure that you handle copying the static assets over to the Next.js build afterwards.

I'll report back here with a post on handling standalone output.

Why did I disable isrMemoryCacheSize?

I first came across this property when I was troubleshooting our environments for an issue where refreshing the page would result in sporadic behavior of the content being shown. The page would switch back and forth between different component datasources, showing the old datasource content randomly. This was occuring on our Next.js App Service whether or not it was scaled out past 1 instance. Next.js's Self-hosting ISR page led me to this workaround since it mentions caching issues, which resolved the sporadic behavior.

SSG w/ ISR - Doesn't that revalidate pages?!

I know what you're thinking - "Incremental Static Regeneration revalidates every N seconds". While this is true, the Next.js code revalidates differently based on the notFound property that is returned from getStaticProps on the dynamic page route. It revalidates correctly if the provider used is Memory Cache, but stale data doesn't get checked for the File System based cache. You can read more about the issue on Vercel's github at Issue 37945 where I've also commented asking for a timline and the PR that supposedly fixes it.

Believe it or not, Next.js with SSG/ISR doesn't actually remove the generated static page files with a revalidate. Go ahead, try it yourself. The revalidate will purge the cache provider and only update the file system when a regeneration occurs.

Using Page Router

This one's pretty easy. If you're using Sitecore's JSS Next.js template, you're using page router. App Router is on Sitecore's radar for 2024.

Self Hosting Workaround

Since we know that deleting the file-based files resolves the issue, that's the route we're going to take. When the notFound flag is true within the dynamic catch-all page route, we can purge the current request's html and json files. Let's programatically delete some files.

purge-files.ts

First we can set up the purgeNextjsStaticFiles method. This abstraction will use node to purge static files based on the current slug. A sample generated path would be: /site/wwwroot/.next/server/pages/en/_mysite_/landing-page/detail-page.

import { readdir, rmdir, stat, unlink } from 'node:fs/promises';

export const purgeNextjsStaticFiles = async (pathSlugs: string[]) => {
  if (!process.env.PWD) {
    console.log('purgeFiles: process.env.PWD is undefined');
    return;
  }
  if (!Array.isArray(pathSlugs) || pathSlugs.length < 1) {
    return;
  }

  const fullSlugPath = `${process.env.PWD}/.next/server/pages/${pathSlugs.join('/')}`;
  const parentFolder = `${process.env.PWD}/.next/server/pages/${pathSlugs.slice(0, pathSlugs.length - 1).join('/')}`;

  const htmlPath = `${fullSlugPath}.html`;
  const jsonPath = `${fullSlugPath}.json`;

  try {
    const htmlStat = await stat(htmlPath);
    const jsonStat = await stat(jsonPath);

    if (htmlStat && jsonStat) {
      try {
        await unlink(htmlPath);
        await unlink(jsonPath);

        if (pathSlugs.length > 1) {
          const readParent = await readdir(parentFolder);

          if (readParent.length === 0) {
            // the folder is now empty, let's delete it
            await rmdir(parentFolder);
          }
        }
      } catch (unlinkError) {
        console.error(`Error trying to purge nextjs static files with base (ending in .json or .html) ${fullSlugPath}`);
        console.error(unlinkError);
      }
    }
  } catch (e) {
    // expected in certain scenarios, stating non existant files etc
    console.log(e);
  }
};

[[...path]].tsx

The context from getStaticProps will provide us with the context.params.path and the context.locale. We can further abstract this as a method, getSlugsFromPageContext, to return us the fully built slug, which are the paths split between the url slashes.

// This function gets called at build time on server-side.
// It may be called again, on a serverless function, if
// revalidation (or fallback) is enabled and a new request comes in.
export const getStaticProps: GetStaticProps = async (context) => {
  const props = await sitecorePagePropsFactory.create(context);
  props.appUrl = process.env['PUBLIC_URL'] || '';

  // if the page isn't found in CMS, purge the next.js static files (fix for issue: https://github.com/vercel/next.js/issues/37945)
  if (props.notFound) {
    const slugs = getSlugsFromPageContext(context);
    await purgeNextjsStaticFiles(slugs);
  }

  return {
    props,
    // Next.js will attempt to re-generate the page:
    // - When a request comes in
    // - At most once every 5 seconds
    revalidate: 5, // In seconds
    notFound: props.notFound, // Returns custom 404 page with a status code of 404 when true
  };
};

function getSlugsFromPageContext(
  context: GetStaticPropsContext<ParsedUrlQuery, PreviewData>,
  keepSiteInSlug = true
): string[] {
  const pathParts = context.params ? (context.params['path'] as string[]) : [];
  if (!pathParts.length) return [];

  if (context.locale?.length) {
    pathParts.unshift(context.locale);
  }

  if (!keepSiteInSlug) {
    return pathParts.filter((path, index) => {
      // first slug contains the multisite part "_site_upmchealthplan" that needs to be stripped out when keepSiteInSlug = false
      if (index === 0 && path.includes('_site_')) return false;
      return true;
    });
  }

  return pathParts;
}

After deploying to your self-hosting environment, we can now see that after an unpublish operation, the static page is now deleted from the file system. For performance reasons, I would recommend increasing the revalidate to 60 seconds or greater when notFound is true AND the page is deleted.