Puppeteer: getting the number of pages in a PDF

At Okticket, we are building a serverless service that generates PDFs from HTML. This feature was previously implemented with wkhtmltopdf, but we are migrating to Puppeteer for better performance and stability, as the previous engine is severely outdated and clogs up the API under heavy load.

The first step of this transition is using Puppeteer to render the PDF from the HTML content that the API generates with Blade templates. This process requires some specific features, such as the ability to get the number of pages in the generated PDF.

After a quick search, the most straightforward way to get the number of pages in a PDF generated by Puppeteer seems to be loading the file with a PDF library and counting its pages. However, this approach is far from ideal, as we would need to render the PDF twice: once to get the number of pages and once to generate the final PDF with the page count printed in it.

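// Assumption: PDFDocument comes from the pdf-lib package.
import { PDFDocument } from 'pdf-lib';

// First render, done only to count the pages.
const pdfBuffer = await page.pdf({ ... });

const pdfDoc = await PDFDocument.load(pdfBuffer);
const pageCount = pdfDoc.getPages().length;
this.logger.debug(`PDF has ${pageCount} pages`);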

However, Puppeteer has built-in CSS classes that, when used in its header and footer templates, inject certain information into the PDF, including the page count:

pageNumber: current page number.
totalPages: total number of pages in the PDF.
date: current date.
title: title of the document.
url: URL of the document.
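
As a rough illustration of how these classes are used, here is a minimal sketch of the options that could be passed to page.pdf() to print a "Page X / Y" footer. The helper name buildFooterOptions, the label, the styling and the margins are hypothetical and not taken from our service; displayHeaderFooter, headerTemplate, footerTemplate and margin are regular Puppeteer PDF options.

```typescript
// Hypothetical shape covering only the page.pdf() options used in this sketch.
interface FooterPdfOptions {
  displayHeaderFooter: boolean;
  headerTemplate: string;
  footerTemplate: string;
  margin: { top: string; bottom: string };
}

// Hypothetical helper: builds options for a centered "Page X / Y" footer.
function buildFooterOptions(label: string = "Page"): FooterPdfOptions {
  return {
    // Header and footer templates are ignored unless this flag is set.
    displayHeaderFooter: true,
    // An empty header keeps a default header from being printed.
    headerTemplate: "<span></span>",
    // Puppeteer fills in the elements carrying the pageNumber and
    // totalPages classes while it renders the PDF.
    footerTemplate:
      `<div style="font-size: 10px; width: 100%; text-align: center;">` +
      `${label} <span class="pageNumber"></span> / <span class="totalPages"></span>` +
      `</div>`,
    // The footer is drawn inside the page margin, so leave room for it.
    margin: { top: "20mm", bottom: "20mm" },
  };
}
```

With a Puppeteer page at hand, this could be used as `await page.pdf({ format: "A4", ...buildFooterOptions() })`. Templates usually need an explicit font size, since the default header and footer text may be too small to read.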

Replacing wkhtmltopdf's built-in classes with Puppeteer's classes in the Blade templates lets us print the page count directly in the PDF without rendering it twice, cutting the processing time roughly in half!
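
The class replacement described above can also be sketched as a small string transformation. This is only an illustration, not how our templates were actually migrated: it assumes the old footer markup uses the `page` and `topage` classes from wkhtmltopdf's documented header/footer example, and the helper name migrateFooterClasses is hypothetical.

```typescript
// Assumed mapping from wkhtmltopdf footer classes to Puppeteer's equivalents.
const CLASS_MAP = new Map<string, string>([
  ["page", "pageNumber"],
  ["topage", "totalPages"],
]);

// Hypothetical helper: rewrites class names inside double-quoted class
// attributes and leaves every other class untouched.
function migrateFooterClasses(html: string): string {
  return html.replace(/class="([^"]*)"/g, (_match: string, classes: string) => {
    const mapped = classes
      .split(/\s+/)
      .filter((name) => name.length > 0)
      .map((name) => CLASS_MAP.get(name) ?? name)
      .join(" ");
    return `class="${mapped}"`;
  });
}
```

For example, `migrateFooterClasses('<span class="page"></span> of <span class="topage"></span>')` returns `'<span class="pageNumber"></span> of <span class="totalPages"></span>'`.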

created: 2024-08-15

source: /content/collections/til/pdf-count