Is there a common solution to generate MS Office, LibreOffice and PDF documents on the fly? (Ruby)

I would like to generate office documents (MS Office, OpenOffice) and PDF on the fly from one source document. Currently I am considering OpenDocument files as templates and headless LibreOffice as the converter.
Does anybody have experience with this topic, and is there a ready-to-use (possibly commercial) solution?

A commercial solution is Docmosis, which is available both as a downloadable product and as a cloud service. It uses MS Word/OpenOffice documents as templates and provides template-population features, load balancing, doc/docx/odt/pdf/rtf/html output and quite a few other features. One of its key features is generating point-in-time output in multiple formats from the same template and data, as you mentioned. There is at least one Ruby example showing the population features. Please note I work for the company that created Docmosis.
Another option is the open-source JOD Reports.
I hope that helps.
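For the headless LibreOffice route from the question, the conversion step can be scripted. The sketch below is in Python and only wraps the `soffice --headless --convert-to <format> --outdir <dir>` command line; it assumes LibreOffice is installed with `soffice` (or `libreoffice`) on PATH and that the OpenDocument template has already been filled with data by some other means. `convert_all` and its default format list are illustrative choices, not part of Docmosis or JOD Reports.

```python
import shutil
import subprocess
from pathlib import Path


def soffice_command(source: Path, fmt: str, outdir: Path, binary: str = "soffice") -> list[str]:
    """Build a headless LibreOffice conversion command for one target format."""
    return [binary, "--headless", "--convert-to", fmt, "--outdir", str(outdir), str(source)]


def convert_all(source: Path, outdir: Path, formats=("pdf", "docx", "doc")) -> list[Path]:
    """Convert one (already populated) OpenDocument file into several formats."""
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    produced = []
    for fmt in formats:
        # One process per format; check=True raises if soffice exits non-zero.
        subprocess.run(soffice_command(source, fmt, outdir, binary), check=True, timeout=300)
        # LibreOffice names the output after the source file's stem.
        produced.append(outdir / f"{source.stem}.{fmt}")
    return produced
```

For example, `convert_all(Path("invoice.odt"), Path("out"))` (hypothetical file names) would be expected to leave `invoice.pdf`, `invoice.docx` and `invoice.doc` in `out/`. Running one process per file and format keeps the sketch simple but is slow for large batches.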

Related

Preparing PDFs for use as Prawn templates

We've got a system that takes in a large variety of PDFs from unknown sources, and then uses them as templates for new PDFs generated by Prawn.
Occasionally some PDFs don't work as templates for Prawn: they either trigger a generic Prawn error ("Prawn::Errors::TemplateError => Error reading template file. If you are sure it's a valid PDF, it may be a bug.") or the resulting PDF comes out malformed.
(It's a known issue that some PDFs don't work as templates in Prawn [1] [2], so I'm not trying to address that here.)
If I take any of the problematic PDFs, and manually re-save them on my Mac using Preview > Save As [new PDF], I can then always use them as Prawn templates without any problem.
My question is: is there some (open source) server-side utility that can do the same thing, i.e. process problematic PDFs into something Prawn can use?
Yarin, it at least partially depends on why the PDFs don't work in the first place. If you can use them after re-saving with Apple's (quite bad) preview PDF code, you should be able to get the same result using a number of different tactics:
- Use an actual PDF library to open and save the PDF files (libraries from Adobe and Global Graphics come to mind). These are typically commercial products but (I know the Adobe library the best) they do allow you to open a file and save it, performing a number of optimisations in the process. The Adobe libraries are currently licensed through a company called DataLogics (http://www.datalogics.com).
- Use a commercial product that embeds these libraries. callas pdfToolbox comes to mind (warning: I'm affiliated with this product). This basically gives you the same possibilities as the previous point, but in a somewhat easier-to-use package (command-line use, for example).
- Use an open source product. I'm not very well positioned to provide useful links for that.
There is another approach that may work depending on your workflow and files. In graphic arts, bad files are sometimes "made better" by a process called re-distilling: you convert the PDF file to PostScript and re-distill the PostScript into PDF again. Because this rewrites the whole file structure, it often fixes fundamental problems. However, it also comes with risks, as you're going through a different file format. Libraries such as Ghostscript (watch the licensing conditions) may allow you to do this.
Given that your files seem to be fixed simply by using preview, I would think a redistilling approach would be overly dangerous and overkill. I would look into finding a good PDF library that can automatically open and save your files.
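If you do want to try the Ghostscript route, one way to script it is Ghostscript's `pdfwrite` device, which reads the input PDF and writes a completely new file, so the whole structure is rewritten much as re-distilling does (without an intermediate PostScript file on disk). This is a minimal Python sketch assuming a `gs` binary on PATH. It carries the same risks mentioned above (the output can differ visually from the input), so compare results on a sample of your problem files before relying on it, and check Ghostscript's license terms first.

```python
import shutil
import subprocess
from pathlib import Path


def gs_rewrite_command(src: Path, dst: Path, binary: str = "gs") -> list[str]:
    """Build a Ghostscript command that rewrites src into a fresh PDF at dst."""
    return [
        binary,
        "-dSAFER",    # restrict file access while interpreting the input
        "-dBATCH",    # exit after processing instead of waiting for input
        "-dNOPAUSE",  # do not pause between pages
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        str(src),
    ]


def rewrite_pdf(src: Path, dst: Path) -> Path:
    """Run Ghostscript on one file; raises if gs is missing or fails."""
    binary = shutil.which("gs")
    if binary is None:
        raise RuntimeError("Ghostscript (gs) was not found on PATH")
    subprocess.run(gs_rewrite_command(src, dst, binary), check=True, timeout=300)
    return dst
```

The rewritten file (for example `rewrite_pdf(Path("upload.pdf"), Path("clean.pdf"))`, hypothetical names) would then be handed to Prawn as the template instead of the original upload.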

I need a wrapper (or alternative) for Open Office XML Presentations / Powerpoints

I recently automated the creation of PowerPoint presentations in a site I'm making. I found the Office Interop libraries extremely simple to use.
Office isn't built for this kind of thing in a web server environment, so I'm looking at creating the PowerPoint files using Open Office XML, but it's extremely complex. For example, I downloaded some code to create a blank presentation with some text, and it was around 300 lines! Using the Office Interop libraries I could do the same thing in just a couple of lines of code.
I don't have time, nor do I want to learn how to work with the Open Office XML libraries directly, so I'm hoping someone has made a wrapper for them. So far all my searching has turned up only one result, Aspose Slides for .NET. It looks really promising, but also rather expensive.
Has anyone ever used a decent wrapper or alternative before?
If you are looking at automating the creation of PowerPoint presentation files, I'd say continue with Open XML; there's nothing better. Everything else is either paid or doesn't offer the entire gamut of functionality that Open XML provides.
If you find creating a blank file tedious, you could save an empty file somewhere and use that as a template for performing further operations on it.
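To illustrate the blank-file-as-template idea: a .pptx file is a ZIP package of XML parts, and slide text lives in parts such as `ppt/slides/slide1.xml`. The Python sketch below copies a template package and replaces placeholder tokens in the slide XML. It is not the Open XML SDK (a .NET library); it works at the raw package level, and the `{{TITLE}}`-style token convention is a hypothetical choice. One caveat: PowerPoint may split text you type into several runs in the XML, so check that each token survives intact in the saved template.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_pptx_template(template: bytes, replacements: dict[str, str]) -> bytes:
    """Copy a .pptx package, replacing placeholder tokens in the slide XML parts."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            # Slide content lives in ppt/slides/slideN.xml inside the package.
            if item.filename.startswith("ppt/slides/slide") and item.filename.endswith(".xml"):
                text = data.decode("utf-8")
                for token, value in replacements.items():
                    # Escape &, < and > so the slide XML stays well-formed.
                    text = text.replace(token, escape(value))
                data = text.encode("utf-8")
            # Reusing the ZipInfo keeps the original entry name, order and compression.
            dst.writestr(item, data)
    return out_buf.getvalue()
```

For example, `fill_pptx_template(Path("blank.pptx").read_bytes(), {"{{TITLE}}": "Q3 report"})` (hypothetical file and token) returns the bytes of the new presentation, which can be written to disk or streamed from the web server.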
The only thing close to a wrapper for PowerPoint I've found is the Open XML PowerTools. It includes a PresentationBuilder class which can be used for some specific tasks, like combining slides from multiple PowerPoint documents into a new document. Although it's pretty limited in functionality, you could extend the class.
However, I've come to the conclusion that there just is not a good wrapper out there so I've had to do what everybody pretty much recommends and that is using the Open XML SDK Productivity Tool and the Reflect code button.
I put together a basic presentation, then used Reflect Code and put the result into a class. Yes, it's a lot of lines of code and it's not the most elegant solution, but it does work. From there I can extend or modify that class to do the specific things I need with each slide. The Productivity Tool is a big help for figuring out the code needed to do specific things. I try to keep it simple and do just one or two things at a time, Reflect Code, then look at the code to see what it does.
You could try SoftArtisans PowerPointWriter. It has a template mode that allows you to start with an existing PowerPoint file with a few placeholders and merge your data into the presentation with as little as 5 lines of code.
Disclaimer: I work for SoftArtisans

How to create an index for a text file?

Can anyone point me to a good resource on how to build an index for a large text file?
I am using an OLE Driver on Windows to execute SQL queries against the text file and I don't want to have it read through the entire file.
I've tried googling the topic but I can't find a good resource.
Thank you!
In case anyone else comes across this post looking for a resource on how to index a text file, I found a great tutorial at:
Indexing Random Access Files in Visual Basic
Hat-tip to Rick Meyer for writing a clear and educational article on the subject.
Originally, my plan was to use an OLE driver to execute SQL queries on text files. After reading Rick's article above along with this tutorial: Open for Random in Visual Basic, I decided to abandon the SQL approach and perform binary searches on random access files instead.
Choosing random access instead of a SQL solution (like the ones suggested in the comments above) is certainly more involved; however, I am very happy with the speed and lightweight nature of the random-access-based application.
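The random-access approach boils down to fixed-length records stored sorted by key, so record number n starts at byte n × record size and a binary search only reads about log2(n) records per lookup. The sketch below is a Python translation of that idea, not the VB code from the articles; the field widths, the space padding and the ASCII-only keys are hypothetical choices.

```python
import io
from typing import BinaryIO, Optional

KEY_WIDTH = 10    # hypothetical fixed key field width in bytes
VALUE_WIDTH = 22  # hypothetical fixed value field width in bytes
RECORD_SIZE = KEY_WIDTH + VALUE_WIDTH


def _field(text: str, width: int) -> bytes:
    raw = text.encode("ascii")
    if len(raw) > width:
        raise ValueError(f"{text!r} does not fit in {width} bytes")
    return raw.ljust(width)  # pad with spaces to the fixed width


def pack(key: str, value: str) -> bytes:
    """Encode one fixed-length record; the file must be written sorted by key."""
    return _field(key, KEY_WIDTH) + _field(value, VALUE_WIDTH)


def lookup(f: BinaryIO, key: str) -> Optional[str]:
    """Binary search a file of sorted fixed-length records, reading one record per probe."""
    f.seek(0, io.SEEK_END)
    count = f.tell() // RECORD_SIZE
    target = _field(key, KEY_WIDTH)
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD_SIZE)  # jump straight to record number `mid`
        record = f.read(RECORD_SIZE)
        current = record[:KEY_WIDTH]
        if current == target:
            return record[KEY_WIDTH:].rstrip(b" ").decode("ascii")
        if current < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

With a real file you would open it in binary mode, e.g. `with open("records.dat", "rb") as f: lookup(f, "cherry")` (hypothetical file name). Because the records must stay sorted by the padded key, new records have to be inserted in order (or the file re-sorted), which is the main cost of this design compared with a database.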

How to generate a PDF within application with no reporting framework

I need to create PDF reports in my app, which uses ASP.NET MVC 3. What's the best way to do this? I'd rather avoid a reporting framework if I can; it's just a few reports with table layout, groupings, possibly pagination, totals, and the ability to merge PDFs into one PDF. Any ideas? Ideally I could simply convert my HTML view into a PDF.
There is nothing built into .NET for creating PDF files, so you have two possibilities: write a PDF generator yourself from scratch or use an existing one.
In case you decide to go with the second you may take a look at flying-saucer which along with ikvmc.exe could be used to convert XHTML files into PDF. I have blogged about some of the required steps in order to get this working.
Some possibilities:
- I think you can do this with SQL Server Reporting Services (part of SQL Server rather than a third-party reporting framework).
- Low-level PDF libraries that can be used: PDFSharp, iTextSharp.
- You could print an HTML file to a PostScript printer driver using Word automation, then convert the PS to PDF via Ghostscript.
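To give an idea of what writing a PDF "from scratch" involves, here is a minimal Python sketch that produces a one-page PDF with a single line of ASCII text in the built-in Helvetica font. It hand-builds the objects, the cross-reference table of byte offsets and the trailer. Tables, pagination, groupings and merging (the actual requirements in the question) are not covered, which is why the libraries listed above are usually the better choice.

```python
def _pdf_string(text: str) -> str:
    # Backslash and parentheses must be escaped inside a PDF literal string.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF showing `text` (ASCII only) in the built-in Helvetica font."""
    content = f"BT /F1 24 Tf 72 720 Td ({_pdf_string(text)}) Tj ET".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the free object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

For example, `Path("hello.pdf").write_bytes(minimal_pdf("Hello"))` (hypothetical file name) writes a file that standard PDF viewers should open.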

Libraries/Tools for Website Parsing

I would like to start working with parsing large numbers of raw HTML pages into semantic data structures.
Just interested in the community opinion on various available tools for such a task, particularly various useful libraries in any language.
So far, planning on using Hadoop to manage a lot of the processing, but curious about alternatives.
First you need to download your page source and then create a DOM tree.
If you are coding in C#, you can use the following tools to create your DOM tree:
1) http://htmlagilitypack.codeplex.com/
2) http://www.majestic12.co.uk/projects/html_parser.php
The first one is easy to use, but the second one is much faster and more memory-friendly, so I suggest the second one if you want to build a robust application.
Then you can extract useful content from the web page using:
http://www.chrisspen.com/blog/how-to-extract-a-webpages-main-article-content.html
You can find many other articles on extracting content from a web page by googling "extract main content from web page".
Hope it helps
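For comparison, the same parse-then-extract steps can be sketched with Python's standard-library `html.parser` instead of the C# libraries above. This is an illustrative swap-in, not how those libraries or the linked article work: it treats the text of `<p>` elements as the content and ignores `<script>` and `<style>`, which is only a crude stand-in for real main-content extraction. It also does not build a full DOM tree; it reacts to tags as they stream past.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect whitespace-normalised text from <p> elements, ignoring scripts and styles."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._buf: list[str] = []
        self._in_p = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "p":
            self._flush()  # an unclosed <p> ends when the next one starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)

    def _flush(self) -> None:
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False

    def close(self) -> None:
        super().close()
        self._flush()  # emit a trailing paragraph that was never closed


def extract_paragraphs(html: str) -> list[str]:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

At the scale mentioned in the question, a function like `extract_paragraphs` would run once per downloaded page inside whatever job framework (e.g. Hadoop) distributes the work.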
