CaRusthus: Puppeteer on Rust

Puppeteer is a powerful tool for controlling web applications from the server side using headless Chromium.
It is mainly used for automated testing, but another useful feature is generating PDF from HTML.

One of the main problems with generating a PDF from HTML in the client-side browser is that the layout may change when the browser is updated (I am only considering Chrome/Chromium here).

If the PDF generation is done on the server side instead, we can pin a specific version of Chromium on the server, and clients can update their browsers without losing the correct PDF layout.

Puppeteer can be installed with 'npm i puppeteer'.
The following is sample code for an HTTP server that converts HTML into PDF.
The service takes parameters, generates an HTML string from them, and creates a PDF from that string, so the file system is not used for either the HTML or the PDF, which is efficient.

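const http = require('http');
const puppeteer = require('puppeteer');

// generate_html(ctx) is the application's own function (not shown here);
// it builds the HTML string from the request parameters.

http.createServer((request, response) => {
  if (request.method === 'POST' && request.url === '/nodejs/template-pdf-gen') {
    // The body may arrive in several chunks, so collect them all before parsing.
    const chunks = [];
    request.on('data', (chunk) => {
      chunks.push(chunk);
    }).on('end', async () => {
      let browser = null;
      try {
        const ctx = JSON.parse(Buffer.concat(chunks).toString('utf8'));
        browser = await puppeteer.launch({ headless: true });
        const page = await browser.newPage();
        // Web root as a file URL (Windows backslashes become forward slashes).
        const url = 'file://' + ctx.webRootPath.replace(/\\/g, '/');
        const html = generate_html(ctx);

        // Navigate first so that links to local files in the HTML can be resolved.
        await page.goto(url);
        await page.setContent(html);

        const pdf = await page.pdf({
          format: 'A4',
          margin: {
            top: "20px",
            left: "20px",
            right: "20px",
            bottom: "20px"
          }
        });
        response.setHeader('Content-Type', 'application/pdf');
        response.end(pdf);
      } catch (err) {
        // Invalid JSON or a rendering failure: report an error instead of hanging.
        response.statusCode = 500;
        response.end();
      } finally {
        // Always close the browser so that Chromium processes do not pile up.
        if (browser) await browser.close();
      }
    });
  } else {
    response.statusCode = 404;
    response.end();
  }
}).listen(8989);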
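
The server code calls generate_html(ctx), which is not shown in this post. As a rough idea of what such a function could look like, here is a minimal sketch. The field names title and items, the stylesheet path css/report.css, and the escapeHtml helper are all assumptions made up for illustration; the real function depends on the application.

```javascript
// Hypothetical sketch: `title`, `items` and `css/report.css` are assumptions,
// not part of the original service.

// Escape text so that parameter values cannot inject markup into the page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the whole page as a string; nothing is written to disk.
function generate_html(ctx) {
  const rows = (ctx.items || [])
    .map((item) => `    <li>${escapeHtml(item)}</li>`)
    .join('\n');
  return `<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>${escapeHtml(ctx.title)}</title>
  <link rel="stylesheet" href="css/report.css">
</head>
<body>
  <h1>${escapeHtml(ctx.title)}</h1>
  <ul>
${rows}
  </ul>
</body>
</html>`;
}
```

The relative href in the link element is the kind of reference to a local file that the page.goto(url) call in the server code is there to support.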

There are a few tricks in the server code above.
1) In order to process link elements in the HTML that refer to local files, we need to call page.goto(url) before calling page.setContent(html).

2) The web root location must be provided by the client. (We could hard-code it, but if the application is deployed as a war file, the folder location is not known statically.)

3) In the case of a Java servlet, the method getServletContext().getRealPath("/") provides this information. webRootPath in the server code holds this value.
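
The server code turns webRootPath into a file URL by plain string concatenation. A slightly more defensive version might look like the sketch below. The helper name toFileUrl and the sample paths are assumptions for illustration only and are not part of the original service.

```javascript
// Hypothetical helper: turn the web root path sent by the client into a file URL.
function toFileUrl(webRootPath) {
  // Windows paths use backslashes; URLs always use forward slashes.
  let path = webRootPath.replace(/\\/g, '/');
  // A Windows path such as C:/app has no leading slash; add one so that
  // the drive letter is not read as a host name.
  if (!path.startsWith('/')) path = '/' + path;
  // End with a slash so that the URL names the directory itself.
  if (!path.endsWith('/')) path += '/';
  // Escape spaces and similar characters. encodeURI leaves ? and # alone,
  // so those two are escaped separately.
  const escaped = encodeURI(path).replace(/[?#]/g, (c) => encodeURIComponent(c));
  return 'file://' + escaped;
}

console.log(toFileUrl('C:\\tomcat\\webapps\\my app\\'));
// prints file:///C:/tomcat/webapps/my%20app/
console.log(toFileUrl('/var/lib/tomcat/webapps/app'));
// prints file:///var/lib/tomcat/webapps/app/
```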

----

If we are only interested in generating PDFs, there is a way to use a Rust library to do the same job:
https://github.com/atroche/rust-headless-chrome

There is another similar tool that is not restricted to Chrome: https://github.com/jonhoo/fantoccini

Friday, January 31, 2020
