- Sun, 2011-09-11 20:36
PDF documents are everywhere and are one of the best ways to print documents from the web. Imagine being able to convert any web page to a PDF document! I found an excellent tool for converting HTML web pages to PDF. I started using it with our largest clients, and I now use it wherever else I need the functionality, including on my own web sites.
The tool I use to convert HTML to PDF is wkhtmltopdf. It renders a PDF document by running the HTML, CSS, and JavaScript code through the open-source WebKit rendering engine, the engine behind the popular browsers Safari and Google Chrome.
How To Install
On Windows, installation is simple. Download the Windows installer from the project's Google Code website and run it on the computer where you want to use wkhtmltopdf. Once the install is complete, the software is ready to use.
On Linux, you can either download wkhtmltopdf to the web server yourself or install it with your distribution's package manager. On Ubuntu, wkhtmltopdf is included in the package repository, so it is as simple to install and update as any other software.
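To check that the installation worked, you can ask wkhtmltopdf for its version with its `--version` option. The following is a minimal Node.js sketch of that check. It assumes the executable is on the PATH; if it is not, pass the full path instead. `wkhtmltopdfVersion` is a hypothetical helper name.

```javascript
const { execFileSync } = require("child_process");

// Hypothetical helper: returns the line printed by `wkhtmltopdf --version`,
// or null if the executable cannot be found or fails to run.
function wkhtmltopdfVersion(binary = "wkhtmltopdf") {
  try {
    return execFileSync(binary, ["--version"], { encoding: "utf8" }).trim();
  } catch (err) {
    return null;
  }
}

console.log(wkhtmltopdfVersion() ?? "wkhtmltopdf not found");
```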
How To Create PDF Files
The easiest way to jump in and create PDF files is using the command line. This first example demonstrates how to convert a popular website to a PDF file. Navigate to the directory where the HTML to PDF conversion software was installed, then type the following:
wkhtmltopdf www.google.com google.pdf
The command above starts by specifying the name of the program to run, then adds a couple of parameters. The first parameter is the web address to load and convert. The second parameter is the name of the PDF file to save to disk. The full list of parameters can be found in the readme file in the install package.
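The same command can also be run from a program. Below is a minimal Node.js sketch, assuming wkhtmltopdf is on the PATH. `wkhtmltopdfArgs` and `convert` are hypothetical names. Options such as `--disable-javascript` go before the input and the output file name.

```javascript
const { execFile } = require("child_process");

// Hypothetical helper: builds the argument list in the order wkhtmltopdf expects,
// [options...] <input URL or file> <output PDF file>.
function wkhtmltopdfArgs(input, output, options = []) {
  return [...options, input, output];
}

// Runs wkhtmltopdf (assumed to be on the PATH) and reports failure through the callback.
function convert(input, output, options, callback) {
  execFile("wkhtmltopdf", wkhtmltopdfArgs(input, output, options), (err) => callback(err));
}

// Equivalent of the command-line example, with JavaScript disabled:
// convert("www.google.com", "google.pdf", ["--disable-javascript"], (err) => {
//   if (err) console.error("conversion failed:", err.message);
// });
```

`execFile` passes each argument straight to the program without going through a shell. File names containing spaces or shell characters therefore need no extra quoting.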
Example Integration With PHP
The wkhtmltopdf software can do much more than the simple command-line example above. Let's look at a way to convert a dynamic web page to PDF. This first piece of code handles the HTML part of the process.
The "a" tag creates a link to run the JavaScript function "printPDF()". The printPDF function is detailed below.
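function printPDF() {
    // Copy the article markup (uses jQuery); the empty <style> element is a placeholder for print CSS.
    var h = '<style></style>' + $('#article').html();

    // Build a page whose script posts that markup to the server. The script tags are
    // split ('<scr'+'ipt>') so the browser does not end the enclosing script early.
    var html = '<html><body><scr' + 'ipt>';
    html += 'var form = document.createElement("form");';
    html += 'form.setAttribute("method","post");';
    html += 'form.setAttribute("action","/ajax/pdf/");';
    html += 'var h = document.createElement("input");';
    html += 'h.setAttribute("type","hidden");';
    html += 'h.setAttribute("name","html");';
    // encodeURIComponent leaves no double quotes or backslashes in the value, so it is
    // safe inside the generated string; the server must decode it (e.g. PHP's rawurldecode()).
    html += 'h.setAttribute("value","' + encodeURIComponent(h) + '");';
    html += 'form.appendChild(h);';
    html += 'document.body.appendChild(form);';
    html += 'form.submit();';
    html += '</sc' + 'ript></body></html>';

    // Open a blank window and write the page into it; its script submits the form,
    // so the PDF returned by the server loads in the new window.
    var w = window.open('', '');
    w.document.write(html);
    w.document.close();
}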
The function above copies the HTML inside the #article element into a local variable. It then builds a small page in the variable "html". That page's script creates a form, puts the article HTML in a hidden field, and posts it to the web server. Finally, the function opens a new window and writes that page into it. The HTML to be converted is then posted to the server from the new window, where the resulting PDF is displayed. The PHP code to handle the conversion request is detailed below.
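Before looking at the PHP side, note that the string-building part of printPDF() can be pulled out into a pure function. That makes the escaping easy to check outside the browser. The sketch below uses a hypothetical helper, `buildPostPage`, and uses `encodeURIComponent` rather than the deprecated `escape()`. With that encoding, the posted value cannot contain a double quote, backslash, line break, or angle bracket that would break the generated `setAttribute` call.

```javascript
// Hypothetical helper: returns the self-submitting page that printPDF() writes
// into the new window. It touches no DOM, so it can be tested outside a browser.
function buildPostPage(action, articleHtml) {
  // encodeURIComponent output has no double quotes, backslashes, line breaks or
  // angle brackets, so it can sit inside the double-quoted string of the script.
  const value = encodeURIComponent(articleHtml);
  // The script tags are split so this code can also live inside an inline <script>.
  return '<html><body><scr' + 'ipt>' +
    'var form = document.createElement("form");' +
    'form.setAttribute("method","post");' +
    // action is assumed to be a plain path such as "/ajax/pdf/".
    'form.setAttribute("action",' + JSON.stringify(action) + ');' +
    'var h = document.createElement("input");' +
    'h.setAttribute("type","hidden");' +
    'h.setAttribute("name","html");' +
    'h.setAttribute("value","' + value + '");' +
    'form.appendChild(h);' +
    'document.body.appendChild(form);' +
    'form.submit();' +
    '</scr' + 'ipt></body></html>';
}

// In printPDF(): w.document.write(buildPostPage('/ajax/pdf/', h));
```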
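class AjaxController extends Zend_Controller_Action {

    public function pdfAction() {
        // Send only the PDF: no site layout and no view script.
        $this->_helper->layout()->disableLayout();
        $this->_helper->viewRenderer->setNoRender();

        if ($this->getRequest()->isPost()) {
            $filepath = '/www/html/';
            $pdfpath = '/www/pdf/';
            // Assumed naming scheme: a unique base name per request, so simultaneous
            // requests do not overwrite each other's files.
            $filename = uniqid('pdf_');

            // Save the posted HTML (URI-encoded by the browser) to a file on the server.
            $html = rawurldecode($this->getRequest()->getPost('html', ''));
            file_put_contents($filepath.$filename.'.html', $html);

            // Convert it with JavaScript disabled; options go before the input and output files.
            $pdfcmd = 'wkhtmltopdf.sh --disable-javascript '.escapeshellarg($filepath.$filename.'.html').' '.escapeshellarg($pdfpath.$filename.'.pdf');
            exec($pdfcmd);

            // Read the PDF into a buffer and send it to the browser as a PDF.
            $buffer = file_get_contents($pdfpath.$filename.'.pdf');
            $this->getResponse()->setHeader('Content-Type', 'application/pdf', true);
            echo $buffer;

            // Delete the HTML and PDF files created on the server.
            unlink($filepath.$filename.'.html');
            unlink($pdfpath.$filename.'.pdf');
        }
    }
}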
The PHP code above handles the HTML posted from the web page and converts it to a PDF file. You may notice the unusual structure of this code. I use the Zend Framework to serve pages on my web site, so the handler is written as a controller action.
The standard layouts are disabled so the PDF file can be output without the template code being rendered. Then, the HTML code from the web page is saved to an HTML file on the server. In the middle of this code block, wkhtmltopdf is called with the same two file parameters as in the command-line example, plus one more, --disable-javascript. This option disables any JavaScript in the posted code, just in case someone manages to add some extra code to the request. Next, the PDF file is read into a buffer and sent to the browser with the PDF content type in the header. Last, the HTML and PDF files created on the server are deleted.
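The same steps can be sketched outside Zend. The Node.js version below is for illustration only and is not the code used on this site. `htmlToPdfBuffer` and `runWkhtmltopdf` are hypothetical names, and wkhtmltopdf is assumed to be on the PATH. The sketch follows the same order: save, convert, read, delete. It differs in two ways. It creates a separate temporary directory per request. It also deletes the files in a `finally` block, so they are removed even when the conversion fails.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const { execFileSync } = require("child_process");

// Default converter: runs wkhtmltopdf with JavaScript disabled (assumed to be on the PATH).
function runWkhtmltopdf(htmlFile, pdfFile) {
  execFileSync("wkhtmltopdf", ["--disable-javascript", htmlFile, pdfFile]);
}

// Hypothetical helper: save the HTML to a temporary file, convert it,
// read the PDF into a buffer, and always delete both files.
function htmlToPdfBuffer(html, convert = runWkhtmltopdf) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "pdf-"));
  const htmlFile = path.join(dir, "page.html");
  const pdfFile = path.join(dir, "page.pdf");
  try {
    fs.writeFileSync(htmlFile, html);
    convert(htmlFile, pdfFile);
    return fs.readFileSync(pdfFile);
  } finally {
    // Runs even if the conversion throws, so no temporary files are left behind.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

Passing the converter in as a parameter also lets you exercise the flow without wkhtmltopdf installed. You can supply a stand-in that simply writes a dummy file.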
That's the whole process! I have found this software to be incredibly useful, and I hope you do as well.




