Keywords: Open Office, Ghostscript, Do It Yourself Convert Microsoft Word, Excel, Powerpoint to PDF
I’ve written a small program that uses Open Office to open and save different kinds of Microsoft Office files to PDF, and optionally merge them into a single output PDF file using GPL Ghostscript. I posted the code and article at the Code Project: http://www.codeproject.com/KB/java/PDFCM.aspx.
It’s a command-line program, and we’re using a simplified version of it in production for back-office conversions and merges of office files. Some of these we produce internally by filling out forms, and others we get from customers. There are potentially many documents, and they can vary in size, so it is very cumbersome to cut, paste, print, and scan everything to PDF (which is what our staff were doing when I started this project).
Fortunately, it turns out that (1) one can use PRNADMIN.DLL with a Postscript Printer driver and an ActiveX IE browser to render a web page to Postscript, (2) Open Office can batch convert Microsoft Office files (and many more) to PDF, and (3) Ghostscript will merge Postscript and PDF on the command line.
The printer setup part of step 1 was detailed in a post here a few months ago, and there are a gazillion Code Project articles on using the ActiveX IE web browser to navigate and print. Steps 2 and 3 are detailed in the new Code Project article. I wrote it mainly because although there were many good sources on how to get started with Open Office, I couldn’t find a whole example anywhere on converting files to PDF. In fact, just connecting to Open Office was proving to be a major headache for me until I found the BootstrapSocketConnector. But still, there seemed to be many more questions on the forums about how to do PDF conversion using Open Office than there were answers, so when I got it working (actually, only an hour or so after getting the BootstrapSocketConnector) I decided to pass it along. The intended audience is people who are just getting started (like me) with Open Office. I hope you find it useful!
A few words on the design: the program connects to Open Office and loops over the non-PDF input files. Each file is opened in Open Office and exported to PDF using a filter appropriate to its file extension. Then, optionally, all results (plus any PDF originals among the inputs) are merged into a single PDF file by invoking Ghostscript as a separate process.
A few constraints made this a logical choice. First, since we have to handle arbitrary files from clients, it is really not feasible to expect this to live in a web application. (I.e., our users have happy fingers.) Instead, we use remoting with asynchronous user notification, and firing up a process is really no big deal for us. If you were going to run this in a web application, you might want to refactor it and/or find a Ghostscript alternative.
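To make the structure described above a bit more concrete, here is a minimal Java sketch of the two decisions the loop has to make: which PDF export filter to use for a given file, and what the Ghostscript merge command looks like. This is not the code from the article. The class and method names are made up for illustration, and the list of extensions is an assumption. The filter names (writer_pdf_Export, calc_pdf_Export, impress_pdf_Export) are the standard Open Office PDF export filters, and the Ghostscript switches are the usual ones for writing a merged PDF with the pdfwrite device.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: names and the extension list are assumptions,
// not taken from the Code Project article.
class ConvertPlanSketch {

    /** Returns the Open Office PDF export filter for a file name, or null if the extension is not handled here. */
    public static String exportFilterFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension, so we cannot guess a filter
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "doc": case "rtf": case "txt": case "odt":
                return "writer_pdf_Export";   // text documents
            case "xls": case "ods":
                return "calc_pdf_Export";     // spreadsheets
            case "ppt": case "odp":
                return "impress_pdf_Export";  // presentations
            default:
                return null;
        }
    }

    /** PDF originals skip the Open Office step and go straight to the merge. */
    public static boolean isPdf(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    /** Builds the argument list for merging PDFs with Ghostscript's pdfwrite device. */
    public static List<String> ghostscriptMergeCommand(String gsExecutable,
                                                       String outputPdf,
                                                       List<String> inputPdfs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);              // e.g. "gs" on Linux; the name differs on Windows
        cmd.add("-dBATCH");                 // exit when the last input has been processed
        cmd.add("-dNOPAUSE");               // do not prompt between pages
        cmd.add("-q");                      // suppress startup messages
        cmd.add("-sDEVICE=pdfwrite");       // write PDF output
        cmd.add("-sOutputFile=" + outputPdf);
        cmd.addAll(inputPdfs);              // inputs are merged in the order given
        return cmd;
    }
}
```

In the real program the filter name would be passed to Open Office as the FilterName property when storing the document, and the command list would be handed to a process launcher. Both steps are left out here so the sketch stays free of Open Office dependencies.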
Second, we added an option [-d] that would cause deletion of input and intermediate files on successful operation. Our company deals with sensitive information, so this was a big selling point.
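The important detail with an option like this is that nothing gets deleted unless the whole run succeeded. A rough sketch of that rule in Java follows. The class and method names are hypothetical, and this is not necessarily how the article's code does it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative sketch only: a hypothetical helper for a "-d" style option.
class CleanupSketch {

    /**
     * Deletes the given input and intermediate files, but only when the
     * conversion and merge succeeded. Returns the number of files removed.
     */
    public static int deleteOnSuccess(boolean succeeded, List<Path> files) {
        if (!succeeded) {
            return 0; // keep everything so a failed run can be inspected or retried
        }
        int removed = 0;
        for (Path file : files) {
            try {
                if (Files.deleteIfExists(file)) {
                    removed++;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return removed;
    }
}
```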
Third, I was surprised at the sheer amount of goofiness that it took to get this working on both Linux and Windows. While I originally developed this for Windows, I felt a moral pull to validate the program on Linux before writing an article about it. I ran into three problems. Open Office had trouble with LF/CR/CRLF line endings in text files. The ESP Ghostscript that came with my Fedora Core 7 flat out didn’t work for this application (use GPL Ghostscript instead). And the two platforms handled spaces in the command string slightly differently when launching a process from a command line. All of these problems were solvable, however. (Actually, the LF/CR/CRLF problem may be a bug, but you can work around it with sed.)
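For what it's worth, two of these cross-platform problems can often be sidestepped from Java itself. The sketch below shows one possible approach, not the fix used in the article: it normalizes line endings in Java rather than with sed, and it passes the command to ProcessBuilder as a list of arguments instead of a single string, so that an argument containing spaces stays in one piece. The class and method names are hypothetical.

```java
import java.util.List;

// Illustrative sketch only: an alternative to the sed workaround and to
// building a single command string by hand.
class PortableLaunchSketch {

    /** Converts CRLF and lone CR line endings to LF. */
    public static String normalizeLineEndings(String text) {
        // Replace CRLF first so each pair becomes one LF rather than two.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /**
     * Prepares a process from a list of arguments. Each list element is
     * passed as one argument, so no manual quoting of spaces is needed.
     */
    public static ProcessBuilder processFor(List<String> args) {
        ProcessBuilder builder = new ProcessBuilder(args);
        builder.redirectErrorStream(true); // merge stderr into stdout for simpler logging
        return builder;
    }
}
```

Calling start() on the returned builder and then waitFor() on the resulting process runs the command and yields its exit code. Remember to read the process output while it runs so a chatty tool cannot block on a full pipe.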