Modifying existing pdf elements (particularly images)

I am reading in template PDFs, customizing them, and appending pages before outputting the final document. What I want to do is modify the elements in the template I load before I append it to the output.
In particular, I want to hide or remove images (and potentially other elements). I'm not even sure whether elements in the imported page can be modified directly, or whether I can only add images (I haven't seen any sign of a removeImage() function).
A little guidance would be greatly appreciated.

You should get hold of the book accompanying iText: iText in Action (1st or 2nd edition). It has some great examples of most things that can be done in iText.
First edition
Second edition
I believe you need to iterate through the PDF object references in the reader to identify the images. I am not sure how one would replace them, but it's probably possible.
There are other libraries that do this better, pdfnet being one of them, but it is commercial.
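To illustrate what "identifying images among the PDF objects" means, here is a minimal sketch in plain Python (not iText) that scans a PDF for indirect objects whose dictionary contains `/Subtype /Image`. It assumes the objects are stored uncompressed; in many modern files (PDF 1.5+) they sit inside compressed object streams, which this sketch cannot see, so a real PDF library is still the right tool. The function name `find_image_objects` is hypothetical.

```python
import re

# An indirect object: "<num> <gen> obj ... endobj" (non-greedy, spans lines).
# Assumption: objects are stored uncompressed, not inside /ObjStm object streams.
_OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# Image XObjects carry "/Subtype /Image" in their dictionary.
_IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def find_image_objects(pdf_bytes: bytes) -> list[tuple[int, int]]:
    """Return (object number, generation) for each image XObject found."""
    found = []
    for match in _OBJ_RE.finditer(pdf_bytes):
        # Look only at the dictionary: raw stream data after the "stream"
        # keyword is binary and could contain anything.
        dictionary = match.group(3).split(b"stream", 1)[0]
        if _IMAGE_RE.search(dictionary):
            found.append((int(match.group(1)), int(match.group(2))))
    return found
```

The object numbers it returns are the same references a library like iText would let you look up and then modify or drop.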

Related

exist-db: how to access a PDF

I am sure it is very simple; I just cannot get my head around it.
The exist-db documentation is a bit fuzzy on content extraction:
http://exist-db.org/exist/apps/doc/contentextraction.
I have a PDF file containing about 162 high-res images (the PDF is quite big) and I do not know how to access any of the resources that are presumably created.
Please do not destroy me! I am just starting to build a database (for an edition at university). I'd love to have a facsimile edition (one tab with the image file and one tab with the transcribed text).
I aim to do something similar to what Heidelberg University did with the "Welsche Gast Digital": http://digi.ub.uni-heidelberg.de/diglit/cpg389/0190/image
(the chosen image is just an example!)
Clicking "Faksimile" opens the scan, and clicking "Transkription" opens the transcribed text.
I am quite new to XQuery, XPath and most X-related stuff. I have a "working design" put together in exist-db and am looking at TEI for marking up the transcription etc. I fear I'll have to spend quite some time on this issue.
(This is not about doing my job for me, just about pointing me in the right direction.)
I'm afraid the short answer is simply: don't.
Storing a pdf in your db, and then trying to extract images from it, is kind of a recipe for disaster. Instead you should use the source images (not necessarily extracted from the pdf), and store these individually in a collection (e.g. resources/img). Those image files are then the binary resources that the documentation is actually talking about.
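If you follow that advice, getting the scans into a collection can be scripted. Below is a minimal Python sketch. It makes these assumptions:

- A local eXist-db is running, with its REST interface reachable at http://localhost:8080/exist/rest. This is a common default, but check your installation.
- An HTTP PUT to a resource path stores the request body there as a binary resource.
- The target collection already exists.

The collection path /db/apps/edition/resources/img and the function name are hypothetical. The sketch only builds the request, so you can inspect it before anything is sent.

```python
import base64
import mimetypes
import urllib.parse
import urllib.request
from pathlib import Path

# Assumption: a local eXist-db with its REST interface at this base URL.
EXIST_REST = "http://localhost:8080/exist/rest"
# Hypothetical collection for the facsimile scans.
IMG_COLLECTION = "/db/apps/edition/resources/img"


def build_upload_request(image: Path, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT storing one image as a binary resource."""
    url = f"{EXIST_REST}{IMG_COLLECTION}/{urllib.parse.quote(image.name)}"
    content_type = mimetypes.guess_type(image.name)[0] or "application/octet-stream"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=image.read_bytes(),
        method="PUT",
        headers={"Content-Type": content_type, "Authorization": f"Basic {token}"},
    )


# Sending it would be: urllib.request.urlopen(build_upload_request(path, user, pw))
```

Storing one file per page this way keeps each scan addressable on its own, which is what the facsimile tab needs.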
You might want to take a look at tei-publisher for creating a digital edition in exist-db, especially this demo app showing how to present high-res facsimiles alongside transcribed portions of text. I'm afraid it's all a bit more involved than just opening a PDF in a browser, but so is the Welsche Gast Digital.

MediaWiki API: size at which images were embedded/dropping unrelated icons

I use the MediaWiki API to find the images in Wikipedia articles. However, I also get all the useless icons, like the broom shown when an article needs to be cleaned up, or the Creative Commons logo marking something as released under a Creative Commons license.
Is there a way to detect which images are such icons so I can drop them? E.g. is there a way to query the size at which the image was embedded (rather than the size of the original image, which might be huge even for icons) so that I can drop all the small ones? I'm not really interested in very small images anyway.
As far as I know, no. That information is simply not stored in the database, and is therefore also not available via the API.
Some things you could perhaps do include:
Load the HTML markup of the article (via the API action=parse, or simply via index.php with action=render) and extract the image sizes from it.
Simply build a list of images that should be excluded. You could do this programmatically (e.g. find all images used on all templates included in Category:Wikipedia maintenance templates and all its subcategories) or just add any unwanted images to the exclusion list as you come across them.
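A minimal Python sketch combining both suggestions. It builds the action=parse URL for an article, then scans the returned HTML for <img> tags and keeps only those that are large enough and not on an exclusion list. It makes these assumptions:

- The rendered <img> tags carry width/height attributes giving the displayed size. This is typical for thumbnails but not guaranteed for every image.
- The 100-pixel threshold and the exclusion entries are placeholders.
- Exclusion is a crude substring match on the image URL.

Fetching the URL and pulling the HTML out of the JSON response are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def parse_url(title: str) -> str:
    """URL returning the rendered article HTML via action=parse (JSON wrapper)."""
    return API + "?" + urlencode({"action": "parse", "page": title, "prop": "text", "format": "json"})


class _ImgCollector(HTMLParser):
    """Collects (src, width, height) for every <img> with numeric size attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.images: list[tuple[str, int, int]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        try:
            # Assumption: width/height reflect the embedded (displayed) size.
            w, h = int(a.get("width")), int(a.get("height"))
        except (TypeError, ValueError):
            return  # no usable size given; skip this image
        self.images.append((a.get("src", ""), w, h))


def content_images(html: str, min_side: int = 100, excluded=frozenset()) -> list[str]:
    """Return the src of each image at least min_side px on both sides and not excluded."""
    collector = _ImgCollector()
    collector.feed(html)
    return [src for src, w, h in collector.images
            if min(w, h) >= min_side and not any(name in src for name in excluded)]
```

Checking the size filter first means most maintenance icons disappear without ever being listed. The exclusion list then only has to catch the few larger ones.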

Image/form to Pascal/Delphi code converter?

Does anyone know of an editor that lets you visually design a form (by form I do not mean a DFM or Delphi form, but a "paper form", like those pre-printed forms that you fill in with some info) and that generates Pascal commands to draw that form on a Printer (or Image) canvas?
What I want is an easy way to draw/design this form visually, composed only of lines and text, and a way to convert it into Pascal commands that, when run, will draw the form on a Canvas (Image or Printer), respecting the original layout and scale regardless of the DPI of the canvas it is drawn on.
Update: Maybe I wasn't clear enough about what I need and why. I developed an open-source component called TFreeBoleto (freeboleto.sf.net). It is used to generate and print bank billets (a common billing method in Brazil). Right now, the component uses a TBitmap image containing the billet "mask", and TextOut calls for the dynamic areas (i.e. billet number, customer name, etc.). It looks fine on screen, but some people complain that the printed quality is not good. The component uses a BltTBitmapAsDib procedure to maximize printing quality, but some people still think it is not good enough. So my idea was to avoid using a bitmap image as the form layout and draw everything directly on the canvas (both the on-screen form and the printer). Check here for a sample of what a bank billet looks like.
Of course ReportBuilder and/or FastReport could solve the problem, but they are not free, so I cannot include them in the component. I need a "native" solution that any standard Delphi installation would be able to compile.
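A minimal sketch of the kind of generator the question describes, written in Python for illustration. It takes a form layout described in millimetres and emits a Delphi procedure. The layout format (Line and Text records), the procedure name and the sample coordinates are all hypothetical. The generated procedure scales every coordinate by the canvas DPI, so the layout keeps its physical size on screen and on paper. It uses only the standard TCanvas methods MoveTo, LineTo and TextOut. Obtaining the DPI (for a printer, e.g. via GetDeviceCaps with LOGPIXELSX/LOGPIXELSY) is left to the caller, and font scaling is not handled.

```python
from dataclasses import dataclass


@dataclass
class Line:
    x1: float  # all coordinates in millimetres from the top-left corner
    y1: float
    x2: float
    y2: float


@dataclass
class Text:
    x: float  # millimetres from the top-left corner
    y: float
    caption: str


def to_pascal(name: str, items: list) -> str:
    """Emit a Delphi procedure that draws `items` on any TCanvas at its own DPI."""
    out = [
        f"procedure {name}(ACanvas: TCanvas; DpiX, DpiY: Integer);",
        # Nested helpers convert millimetres to device pixels (25.4 mm per inch).
        "  function X(Mm: Double): Integer; begin Result := Round(Mm * DpiX / 25.4); end;",
        "  function Y(Mm: Double): Integer; begin Result := Round(Mm * DpiY / 25.4); end;",
        "begin",
    ]
    for item in items:
        if isinstance(item, Line):
            out.append(f"  ACanvas.MoveTo(X({item.x1}), Y({item.y1}));")
            out.append(f"  ACanvas.LineTo(X({item.x2}), Y({item.y2}));")
        elif isinstance(item, Text):
            caption = item.caption.replace("'", "''")  # Pascal string-literal escaping
            out.append(f"  ACanvas.TextOut(X({item.x}), Y({item.y}), '{caption}');")
    out.append("end;")
    return "\n".join(out)
```

Keeping the layout as data and generating the drawing code from it means the same description could also drive a preview. The Pascal side then needs nothing beyond what ships with Delphi.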
You might get what you want out of the Fast Reports Report Designer which is a commercial reporting system for Delphi. Remember that a report is just a page. That page can be shown on the screen or printed on the printer.
You also might find that something like TRichView helps you.
Whether using TRichView in particular or not, I would look into using HTML to do what you want. I would use HTML+CSS to do both a screen and printer layout, that can also be viewed on the web. For simple text layout plus text boxes I think even bare HTML and HTML tables might be sufficient. To visually design simple text pages, using a Delphi application, I would use TRichView.
In both cases, you would be creating documents, not code. To create code that creates a page, without using any document system, would be very difficult indeed, and I am not sure what you would really do with that code, since you would need a compiler or interpreter to convert that code into something that you could use. Please clarify what you mean by "creating code", and what syntax you would want that code to be using. If HTML is code in your definition of "code" then maybe HTML is the best kind of "code" for your problem.
I do my form-work with WPTools. It is also a commercial product. The core is a very good wordprocessor and form-designer. The engine can render text and forms to any canvas (screen, printer, also create pdf) and is highly flexible. Output is mainly rtf and html.
I also see no advantage in creating Pascal code to redraw the form. What you need, I think, is a good WYSIWYG editor which creates a document that fits your needs.
Check out ReportBuilder: http://www.digital-metaphors.com/
It is a commercial reporting tool for Delphi: it has been around a long time, is very high quality, and comes with all of its native Delphi source code. I am using it for an important commercial project right now and I recommend it highly (I don't work for them). I've used MANY Delphi reporting tools over the years and this one is the best IMO.
RBuilder also has extensive support for paper-form emulation; see:
http://www.digital-metaphors.com/products/report_design/form_emulation.html
I haven't worked with that feature, but you can download a full-featured demo and try it.
You can use Adobe Acrobat (full version) to create forms.
Then you can use the free Acrobat Reader, or another COM object, to display and print the forms in your application.
I think it is best solution for you.
PS
All the reporting tools included with Delphi are free for you to use for designing forms, and free to distribute as long as users only preview and print already-designed reports.
The same is true of Adobe Acrobat (you may distribute forms), but you have added that you need to print the form with some text over it. It may be easier to use reports, but the same can be done with PDF.
Most report engines are not open source but are free to distribute. There are many components for creating PDFs: paid (one-time fee), free, and open source.
PPS
I have read your update a second time. Since you are already using a TBitmap and TextOut, you can use a TMetafile instead. There are many editors for metafiles, and metafiles are free to distribute.

PDF generation under Ruby: block should not be cut by a page break

My PDF consists of a number of blocks (actually, a list of quotations) that follow one another until the end of the document. If the text of a quotation does not fit on the page, the whole quotation should start at the top of the next page instead of being torn apart. How can I implement that with any Ruby library?
Try PrinceXML: a standalone executable that generates PDF from HTML or XML. It supports a lot of special CSS properties, including ones that help you control page breaks. Refer to http://www.princexml.com/doc/6.0/page-breaks/
It is available for Windows and Linux. I used it to generate some pretty complicated PDF documents with headers and footers on every page except the first one. And since you don't need to output a PDF with precise positioning of elements, it might be a perfect solution for you.
I haven't tried it, but in Prawn I would try using either the Document#text_box method or looking up the table methods and putting your text in cells with invisible borders. The documentation's unclear on how page break functionality fits in with the bounding box models, but it's worth a shot.
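Whichever tool you choose, the rule itself is a check made before placing each block. If the block does not fit in the space left on the current page but would fit on an empty one, start a new page first. In CSS this is what `page-break-inside: avoid` expresses. Below is a minimal Python sketch of just that arithmetic. It assumes the block heights have already been measured by the PDF library, all in the same unit. Blocks taller than a page cannot be kept together, so the sketch rejects them rather than guessing how to split them. The function name is hypothetical.

```python
def paginate(heights: list[float], page_height: float) -> list[tuple[int, float]]:
    """Place blocks top to bottom, never splitting a block across pages.

    Returns one (page_index, y_offset_from_top) pair per block.
    """
    placements = []
    page, used = 0, 0.0
    for h in heights:
        if h > page_height:
            raise ValueError(f"block of height {h} cannot be kept on one page")
        if used + h > page_height:
            # Doesn't fit in what's left: move the whole block to a fresh page.
            page, used = page + 1, 0.0
        placements.append((page, used))
        used += h
    return placements
```

With Prawn, the measured height of each quotation could come from a dry-run text box. The resulting page and offset then decide where to call start_new_page.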
HTMLDoc which converts HTML to PDF has a page break facility.

Printing Reports and invoices with Ruby?

I am just learning Ruby, and I wonder how to generate reports and invoices (with a logo, an address field, a footer, a variable number of invoice items (sometimes resulting in more than one page), carry-over of the amount to pay from one page to the next, and free-floating two-column text (left-and-right-justified) below the resulting payment information).
Currently I get a canvas to print and draw on from the operating system (matching the printer specifications), use some draw, move, line, text and form-feed API functions, and do some heavy calculations for moving text blocks (a bit TeX-like).
How will this be done in Ruby?
Build an .odt and throw it at OpenOffice, or a .tex and throw it at LaTeX?
Or are there any free libraries that do all this kind of thing for me, so I only have to feed in the relevant parts and let Ruby do the text formatting?
EDIT:
To be more specific: I want to put a corporate logo on the first page (DIN A4 format, but it may also be Letter) at a specific position, plus the footer on every page and the address box on the first page. All the rest should be free-floating text blocks with left-right justification and bold words in the middle of the text.
something like
pdf.column.blocktext("Hello Mr. P\nwe have [b]good news[/b] for you. bla bla bla and so on. Please keep this text together (no page break)...");
pdf.column.floatingblock("This is another block, that should be printed, and can be broken over more than one column...");
which should render the text in the corporate font on the paper, justified, and wrapping neatly to the next column/page if it reaches the bottom of the page.
Thinking about it, this is exactly, what LaTeX is for.
I suggest you consider PDF generation. In Rails, it's pretty simple with the Prawn library.
There is also a fresh new Railscast about that.
Official web site.
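The `[b]...[/b]` markup in the question's wished-for API is not a format any library reads directly, so it has to be turned into styled runs first. A minimal Python sketch of that step, assuming only non-nested `[b]` tags (the tag syntax is the question's own; the function name is hypothetical). Each (text, bold) pair maps naturally onto Prawn's `formatted_text`, which takes an array of hashes with `:text` and `:styles` keys. Alternatively, the tags could be rewritten to `<b>` and passed to Prawn's `text` method with `inline_format: true`.

```python
import re

# "[b]" opens bold, "[/b]" closes it; the captured group is "/" for closing tags.
_TAG = re.compile(r"\[(/?)b\]")


def bold_runs(markup: str) -> list[tuple[str, bool]]:
    """Split text containing [b]...[/b] tags into (text, is_bold) runs."""
    runs = []
    bold = False
    pos = 0
    for m in _TAG.finditer(markup):
        if m.start() > pos:
            runs.append((markup[pos:m.start()], bold))
        bold = m.group(1) == ""
        pos = m.end()
    if pos < len(markup):
        runs.append((markup[pos:], bold))
    return runs
```

For example, `bold_runs("we have [b]good news[/b] for you")` yields the runs `("we have ", False)`, `("good news", True)` and `(" for you", False)`.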
You could also check out HtmlDoc for generating PDFs, it just takes in HTML and generates a PDF from it. This approach is nice because it lets you very easily reuse a partial for an on-screen and hard copy invoice.
http://blog.adsdevshop.com/2007/11/20/easy-pdf-generation-with-ruby-rails-and-htmldoc/
The Ruport library (Ruby Reports) makes it pretty easy to spit out report tables in multiple formats, including PDF. There's also an ActiveRecord hook, acts_as_reportable, that gives your models a reporting interface.
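Whichever library ends up drawing the invoice, the carry-over of the amount from one page to the next is bookkeeping that can be done before rendering. A minimal Python sketch, assuming a fixed number of item rows per page (a real layout would measure row heights instead); the class and function names are hypothetical. It uses Decimal so that money sums don't pick up binary floating-point error.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class InvoicePage:
    brought_forward: Decimal  # subtotal carried over from the previous page
    items: list               # (description, amount) rows printed on this page
    carried_over: Decimal     # running total printed at the foot of this page


def paginate_invoice(items: list, rows_per_page: int) -> list[InvoicePage]:
    """Split (description, amount) rows into pages with running subtotals."""
    pages = []
    running = Decimal("0")
    for start in range(0, len(items), rows_per_page):
        chunk = items[start:start + rows_per_page]
        brought_forward = running
        running += sum((amount for _, amount in chunk), Decimal("0"))
        pages.append(InvoicePage(brought_forward, chunk, running))
    return pages
```

Each page's carried_over equals the next page's brought_forward, and the last page's carried_over is the invoice total.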
