Wednesday, June 22, 2011

Extracting text / reading text from a secured PDF in ColdFusion

So you want to extract text from a PDF that has some form of security on it. For example, the PDF has the following properties:

Author someone
CenterWindowOnScreen [empty string]
ChangingDocument Not Allowed
Commenting Not Allowed
ContentExtraction Not Allowed
CopyContent Not Allowed

As you have probably already found, this is not possible with the standard cfpdf tag, which was introduced in ColdFusion 8. The attempt fails with the following error:

An error occurred during EXTRACTTEXT operation in .
Error: The password provided is either wrong or does not have sufficient permission to perform this action.
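
For reference, here is a minimal sketch of the kind of call that runs into this error. It assumes a ColdFusion version whose cfpdf tag supports the extracttext action, and the file name secured.pdf and the variable names are hypothetical placeholders, not taken from any real project.

```cfml
<!--- Hypothetical path: replace with the location of your secured PDF --->
<cfset pdfPath = expandPath("./secured.pdf")>

<!--- Read the document properties, including the permission flags listed above --->
<cfpdf action="getinfo" source="#pdfPath#" name="pdfInfo">
<cfoutput>ContentExtraction: #pdfInfo.ContentExtraction#<br></cfoutput>

<cftry>
    <!--- On a PDF where ContentExtraction is "Not Allowed", this call is expected to throw --->
    <cfpdf action="extracttext" source="#pdfPath#" type="string" name="pdfText">
    <cfoutput>#htmlEditFormat(pdfText)#</cfoutput>
    <cfcatch type="any">
        <!--- Show the error instead of letting the request fail --->
        <cfoutput>Extraction failed: #htmlEditFormat(cfcatch.message)# #htmlEditFormat(cfcatch.detail)#</cfoutput>
    </cfcatch>
</cftry>
```

Checking the permission flags with getinfo first is a cheap way to tell whether a given PDF will hit the restriction before you attempt the extraction.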

However, it is possible. A ColdFusion programmer called Raymond Camden has written a ColdFusion component that bypasses the standard security restrictions on a PDF. This component is called pdfutils and you can find the source code here or, if you would like to read more about the author, you can go here. The pdfutils component uses DDX to bypass the security restrictions. The installation is very straightforward, so it should not take you long before you can see the results yourself.