The Solarium

A Sunny Place for Ideas to Grow

Converting and Merging Word, Excel and Powerpoint Files to PDF on Windows

June 2, 2008 by ggraham412

Keywords: Open Office, Ghostscript, Do It Yourself Convert Microsoft Word, Excel, Powerpoint to PDF

I’ve written a small program that uses Open Office to convert different kinds of Microsoft Office files to PDF and, optionally, merges the results into a single output PDF file using GPL Ghostscript. I posted the code and article at the Code Project: http://www.codeproject.com/KB/java/PDFCM.aspx.

It’s a command line program, and we’re using a simplified version of it in production for back-office conversions and merges of office files, some produced by filling out forms internally and others received from customers. There are potentially many documents, and they can vary in size, so it is very cumbersome to cut, paste, print and scan everything to PDF (which is what our staff were doing when I started this project).

Fortunately, it turns out that (1) one can use PRNADMIN.DLL with a PostScript printer driver and the ActiveX IE browser control to render a web page to PostScript, (2) Open Office can batch convert Microsoft Office files (and many more) to PDF, and (3) Ghostscript will merge PostScript and PDF files on the command line.

The printer setup part of step 1 was detailed in a post here a few months ago, and there are a gazillion Code Project articles on using the ActiveX IE web browser to navigate and print. Steps 2 and 3 are detailed in the new Code Project article. I wrote it mainly because although there were many good sources on how to get started with Open Office, I couldn’t find a whole example anywhere on converting files to PDF. In fact, just connecting to Open Office was proving to be a major headache for me until I found the BootstrapSocketConnector. But still, there seemed to be many more questions on the forums about how to do PDF conversion using Open Office than there were answers, so when I got it working (actually, only an hour or so after getting the BootstrapSocketConnector) I decided to pass it along. The intended audience is people who are just getting started (like me) with Open Office. I hope you find it useful!

A few words: the structure of the program is to connect to Open Office and loop over the non-PDF input files. Each file is opened using Open Office and exported to PDF using a filter appropriate to its file extension. Then, optionally, all results (plus any PDF originals among the inputs) are merged into a single PDF file by invoking Ghostscript as a separate process. There were constraints which made this a logical choice. First, since we have to handle arbitrary files from clients, it is really not feasible to expect this to live in a web application (i.e., our users have happy fingers). Instead, we use remoting with asynchronous user notification, and firing up a process is really no big deal for us. If one were going to run this in a web application, one might want to refactor this and/or find a Ghostscript alternative.
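The per-file decision described above can be sketched roughly as follows. This is only an illustration, not the code from the Code Project article: the class `ConversionPlan`, its method names and the extension table are assumptions made for the example. The filter names `writer_pdf_Export`, `calc_pdf_Export` and `impress_pdf_Export` are the PDF export filters that Open Office registers for text, spreadsheet and presentation documents. The actual export call through the Open Office API is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: decides what happens to each input before the merge.
final class ConversionPlan {
    // Open Office PDF export filter names, keyed by file extension.
    // The extension list is an assumption for illustration only.
    private static final Map<String, String> FILTERS = Map.of(
            "doc", "writer_pdf_Export",
            "txt", "writer_pdf_Export",
            "xls", "calc_pdf_Export",
            "ppt", "impress_pdf_Export");

    // Position of the last path separator ('/' or '\'), or -1 if there is none.
    private static int lastSeparator(String path) {
        return Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
    }

    // Lower-cased extension of the file name, or "" when it has none.
    static String extensionOf(String path) {
        int dot = path.lastIndexOf('.');
        return dot > lastSeparator(path)
                ? path.substring(dot + 1).toLowerCase(Locale.ROOT)
                : "";
    }

    static boolean isPdf(String path) {
        return extensionOf(path).equals("pdf");
    }

    // Export filter for this file, or null when the extension is not in the table.
    static String filterFor(String path) {
        return FILTERS.get(extensionOf(path));
    }

    // Name of the intermediate PDF produced for a non-PDF input.
    static String pdfNameFor(String path) {
        int dot = path.lastIndexOf('.');
        String stem = dot > lastSeparator(path) ? path.substring(0, dot) : path;
        return stem + ".pdf";
    }

    // Files handed to the merge step, in input order: PDF originals pass
    // through unchanged, convertible files are replaced by their intermediate
    // PDF, and files with an unknown extension are skipped.
    static List<String> mergeInputs(List<String> inputs) {
        List<String> result = new ArrayList<>();
        for (String input : inputs) {
            if (isPdf(input)) {
                result.add(input);
            } else if (filterFor(input) != null) {
                result.add(pdfNameFor(input));
            }
        }
        return result;
    }
}
```

Keeping this decision separate from the Open Office connection makes it easy to check which files would be converted, and in what order they would be merged, without starting Open Office at all.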

Second, we added an option [-d] that deletes the input and intermediate files after a successful run. Our company deals with sensitive information, so this was a big selling point.
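A minimal sketch of how such a delete-on-success step might look is given below. The class `CleanupOnSuccess` and the method `cleanUp` are hypothetical names for this example; the post does not show how the [-d] option is actually implemented. The point is simply that nothing is removed unless the option was given and the whole run succeeded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper illustrating the behaviour of a -d style option.
final class CleanupOnSuccess {
    // Deletes the given input and intermediate files, but only when deletion
    // was requested and the conversion/merge succeeded.
    // Returns the number of files actually removed.
    static int cleanUp(boolean succeeded, boolean deleteRequested, List<Path> files) {
        if (!succeeded || !deleteRequested) {
            return 0; // keep everything so a failed run can be inspected or retried
        }
        int deleted = 0;
        for (Path file : files) {
            try {
                if (Files.deleteIfExists(file)) {
                    deleted++;
                }
            } catch (IOException e) {
                // Surface the failure rather than silently leaving a file behind.
                throw new UncheckedIOException("Could not delete " + file, e);
            }
        }
        return deleted;
    }
}
```

A caller would pass the original inputs together with the intermediate PDFs, and leave the merged output file out of the list.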

Third, I was surprised at the sheer amount of goofiness that it took to get this working on both Linux and Windows. While I originally developed this for Windows, I felt a moral pull to validate the program on Linux before writing an article about it. I ran into LF/CR/CRLF line ending problems with Open Office and text files, the ESP Ghostscript that came with my Fedora Core 7 flat out didn’t work for this application (use GPL Ghostscript instead), and the two platforms differed slightly in how they handled spaces in the command string when firing up a process. All of these problems were solvable, however. (Actually, the LF/CR/CRLF problem may be a bug, but you can work around it with sed.)
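One way to sidestep the spaces problem in Java is to hand the process an argument list instead of a single command string, so each file name stays one argument on either platform. The sketch below assumes that approach; it is not necessarily what the article's code does, and the class `GhostscriptMerge` and its method names are made up for the example. The Ghostscript switches themselves (`-dBATCH`, `-dNOPAUSE`, `-q`, `-sDEVICE=pdfwrite`, `-sOutputFile=`) are standard ones for writing several inputs into one PDF.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: merges PDF/PostScript files by running Ghostscript.
final class GhostscriptMerge {
    // Builds the argument list. Each path is a separate list element, so
    // spaces in file names need no quoting or escaping.
    static List<String> mergeCommand(String gsExecutable, String outputPdf, List<String> inputs) {
        List<String> command = new ArrayList<>();
        command.add(gsExecutable);        // typically "gs" on Linux, "gswin32c" on Windows
        command.add("-dBATCH");           // exit after the last input file
        command.add("-dNOPAUSE");         // do not prompt between pages
        command.add("-q");                // suppress startup messages
        command.add("-sDEVICE=pdfwrite"); // write PDF output
        command.add("-sOutputFile=" + outputPdf);
        command.addAll(inputs);           // merged in the order given
        return command;
    }

    // Runs the command and returns Ghostscript's exit code (0 means success).
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Prints the argument list without running Ghostscript.
        System.out.println(mergeCommand("gs", "merged output.pdf",
                List.of("cover letter.pdf", "figures.ps")));
    }
}
```

The exit code returned by `run` is a natural signal for deciding whether the run counts as successful.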

Posted in ASP.Net, C#, Open Office, Programming | 3 Comments

3 Responses

  1. on June 4, 2008 at 3:38 pm ggraham412

    As always, if you like the above article, please give me a good vote on the Code Project. If not, then please leave a comment why. Thanks in advance!


  2. on October 17, 2008 at 9:41 pm Roy Chrisop

    Hi Gregory,
    I saw your posting on the Code Project regarding the usage of prnadmin.dll and thought you may be able to recommend something that I am having a difficult time with currently.

    I need to create a C# app that will get the driver file name for all the printer mfg & model combinations. I would like to display everything that the Add Printer Wizard does for mfg/model just to get the associated inf file so I can store in a db record but not do anything else like setup the printer port, ip address, etc.

    Thank you in advance,
    Roy


  3. on April 15, 2009 at 2:52 pm How to Get Six Pack Fast

    If you ever want to see a reader’s feedback :) , I rate this article for four from five. Decent info, but I just have to go to that damn yahoo to find the missed parts. Thanks, anyway!


