exist-db how to access a pdf - exist-db

I am sure it is very simple ... I just cannot get my head around this...
the exist-db Documentation is a bit fuzzy on content extraction...
http://exist-db.org/exist/apps/doc/contentextraction.
I have a pdf-file, containing of about 162 high-res images (the pdf is quite big ...) and I do not know how to access any of the that are presumably created ...
please do not destroy me! I am just starting to build a database (for an Edition at Uni)I'd love to have a facsimile edition (so one Tab with the image-file and one tab with the transcribed texts)
I aim at doing something similar to what Heidelberg Universitdy did with the "Welsche Gast Digital" http://digi.ub.uni-heidelberg.de/diglit/cpg389/0190/image
(the choosen image is just an example! )
This pic
When clicking on faksimile the Scan opens and when clicking on Transkription the transcribed texts open!
I am quite new to Xquery, Xpath and most X-related stuff. I have a "working design" put together in exist-db and am looking at TEI for marking up the transcritpion etc, I fear I'll have to spend quite some time on this issue ...
(it is not about doing my job for me, it's just about pointing me in the right direction)

I m afraid the short answer is simply don't.
Storing a pdf in your db, and then trying to extract images from it, is kind of a recipe for disaster. Instead you should use the source images (not necessarily extracted from the pdf), and store these individually in a collection (e.g. resources/img). Those image files are then the binary resources that the documentation is actually talking about.
You might want to take a look at tei-publisher for creating digital edition in exist, especially this demo app for how to present high-res facsimiles with transcribed portions of text. I m afraid its all a bit more involved then just opening a pdf in a browser, but so is the Welsche Gast Digital

Related

Is there a way to link yet to be created images into an indesign layout?

I'm creating a multipage publication with many ads that haven't been built yet. I know their size, and filename, but the image/pdf doesn't exist yet.
Is there an existing script or a possible way to link an image that doesn't exist? Another way to look at this would be kind of the reverse of how the missing links (relink) button works. Where I know what the file path will be, but the file is missing.
Publishing industry standard practice:
Rather than a script, just create a blank image at the exact size. Make it florescent magenta with the letters "FPO" huge and dead-centre so no one can mistake it for the real thing.
Importantly: give this FPO image the exact file name of the file which will eventually be used/placed.
When your production image is finalized and approved, cut-and-paste the exact FPO file name into the new file. Drop the production file into your working directory overwriting the FPO file, and refresh it in InDesign. Bob's your uncle.
If this is being done to hundreds of images, you can develop your own batch process to handle this with some time-saving automation. However, this is a good example of an issue that can be solved at the production-management level, rather than at the coding level.
Hoping this helps!

markdown or markup to powerpoint?

I need to maintain some slides in both latex beamer and in powerpoint. (This is to make slides available for instructors elsewhere, too, 90% of which do not know how to use latex and are unwilling to learn it. and I am a latex guy on linux.)
I have tried the route via Libreoffice (and opendocument), but this did not come out well. right now, the best method that I have found is to author pdf in beamer, then run it through a nuance OCR program to get MS Word...and not even go all the way to Powerpoint (which is where I really need to be).
If I only had a markup language that produced nice Powerpoint, I could probably code a perl translator from markdown to this intermediate markup language. (going from markdown to latex beamer is relatively easy.)
I don't think this exists, but hope springs eternal. after all, it is almost 2014 now. does anyone know of a solution?
One solution is to use odpdown: It converts markdown to the OpenOffice Presenter format, which can be imported into PowerPoint.
It is not yet complete, i.e. table support is missing and possibly not running on certain Windows setups, but nevertheless it could be a start. Possibly, you have Linux running, where it seems to work.
Steve Rindsberg's answer in the comments works on PP 2007 works! Let me repeat it here:
I suspect that PowerPoint is the likeliest solution. ;-) But what sort
of slides are you creating? If they're simple heading and bullet point
slides, all you need to produce is a simple text file. Any text that
starts in the left column will be the heading of a new slide. Indent
one tab and it becomes a first-level bullet point under the current
heading; indent two tabs, it becomes a second level bullet point and
so on. Simply use File | Open on the text file to pull it into PPT.
Steve: Is this all that PP converts? Or is there a reference of other "sneaky" markup that PP knows about?
(pandoc: unfortunately, the conversion from libreoffice to powerpoint is pretty poor when I tried it last. I also tried to save and understand the powerpoint xml format, but that was REAL bad.)
The easiest way to handle this is to work with:
RStudio (and R if not already installed)
RMarkdown
Pandoc 2.0.5 (minimum)
Install those 3 (or 4) items, then read: https://bookdown.org/yihui/rmarkdown/powerpoint-presentation.html
The installation time is worth the time saved copy-pasting everything from scratch.
I also am a Linux guy and I also use LateX engines to create nice documents. Based on my experience, here's what you should do :
Stop writing directly in LaTeX and start using org-mode to write documents instead (I spent years writing in LaTeX and now it's over (except when I use modernv package))
Org supports latex math formulas and .org files are easily exported in .tex files
Org can also be easily exported in markdown
Once you have your markdown, there are several tools that will allow you to create a PowerPoint. Two of them are pandoc and md2pptx

Generating Powerpoint or Keynote from XML (or via a Ruby gem?)

I'm looking for a nice way to generate either a Keynote file from XML or a Powerpoint file that I can then import to Keynote. Basically, I'm looking for a simple human-writable markup format (for easy scripting) that can be exported into slides.
I volunteer with a local nonprofit, where anything remotely technical falls to me. On a fairly regular basis, I'm sent information for events and produce a nice looking printed program in Word, though much of the same material also goes into slides in Keynote. (Keynote is used rather than PowerPoint so that Keynote Remote can be used.)
Anyway, there's a large volume of text I work with that I'm sent via email, and it has to go in both a Keynote presentation and a Word document, and requires all sorts of odd manual formatting to not break pages or slides at odd times, also requiring a good deal of manual restyling, since I'm not going to allow something I do to come out looking like something sloppy from the 1990s.
My hope is to write up a Ruby script that I can feed the source text to, and it'll go do all the processing for me, at least for Powerpoint or Keynote. I've normally had fantastic luck finding a gem for just about any format or service I've wanted to work with, but I haven't found anything that works with Powerpoint or Keynote.
My next thought was to have the Ruby code generate appropriate XML since both Office and I Work allegedly open the Office XML format, but I couldn't find any actual friendly documentation for human-writable XML code.
Is it wishful thinking to want to be able to do something like the following?
<SLIDE FORMAT="Title & Bullets">
<SLIDE_TITLE>
Lorem Ipsum
</SLIDE_TITLE>
<PARAGRAPH>
[etc.]
All I can find as far as converter scripts is all related to charts and tables and such which is of zero use here), usually revolves around opening or converting FROM Powerpoint or Keynote rather than creating, and furthermore generally seems to be for Windows using OLE or VBScript. This needs to run on the Macs they have there, so no Visual Studio stuff, Windows related scripting, etc will work. I don't HAVE to do it in Ruby, but that's what I'd be most comfortable with on the Mac end of things.
So is there documentation out there on a marginally friendly XML format for Powerpoint or Keynote, or even better, a Ruby gem for either?
If all you need to do is title + bullet point slides, you simply need to create an ascii text file. Each line of text will become the title of a new slide. But if the first character in a line of text is a tab, the line will become a first level bullet point on the same slide as the previous title. If two tabs, it indents the text to a second level bullet point and so on.
This becomes the title on slide one
This becomes the title on slide two
<tab>This is a bullet point, first level
<tab><tab>And this is a bullet point, second level
<tab>Back to first level bullet point
And another new slide
Once you have the text file, you can do File Open in PPT and force files of type to all files . and select your .TXT file. Or you can use Insert Slide From File to bring the .TXT file into an existing presentation.
There's a limit to the number of slides you can create at one go like this; 100 perhaps?
Note also that VBA disappeared in Mac Ofice 2008 but is back in Mac Office 2011, so if you can find examples of VB/VBA code that do what you want, you can use them on Mac, so long as it doesn't have to happen in Office 2008.

Image/form to Pascal/Delphi code converter?

Does anyone knows about any editor allowing to visually design a form (by form I do not mean DFM or Delphi form, but a "paper form", like those pre-printed forms that you fill with some info) and that generates pascal commands to draw that form in a Printer (or Image) canvas?
What I want is an easy way to draw/design this form visually, composed just by lines and text, and a way to convert this to Pascal commands that when run, will draw that form in a Canvas (Image or Printer), respecting the original layout and scale, doesn't matter the Canvas DPI where it is being drawn.
Update: Maybe I wasn't clear enough about what I need and why I need it. I developed an Open Source component called TFreeBoleto (freeboleto.sf.net). It is used to generate and print bank billets (a common method for billing people in Brazil). Right now, the component uses a TBitmap image containing the "billet" mask, and TextOut methods for the dynamic areas (ie: billet number, customer name, etc). It is fine when looked in the screen, but some people complains that the quality of the printed image is not good. The component uses a BltTBitmapAsDib procedure to maximize the quality of printing, but some people still think it is not good enough. So, my idea was to avoid using a bitmap image as the form layout, and draw everything direct in the canvas (both form and printer). Check here for a sample of what a bank billet looks like.
Of course ReportBuilder and/or FastReport could solve the problem, but they are not free, so I cannot include it in the component. I need "native" solution that any standard Delphi install would be able to compile.
You might get what you want out of the Fast Reports Report Designer which is a commercial reporting system for Delphi. Remember that a report is just a page. That page can be shown on the screen or printed on the printer.
You also might find that something like TRichView helps you.
Whether using TRichView in particular or not, I would look into using HTML to do what you want. I would use HTML+CSS to do both a screen and printer layout, that can also be viewed on the web. For simple text layout plus text boxes I think even bare HTML and HTML tables might be sufficient. To visually design simple text pages, using a Delphi application, I would use TRichView.
In both cases, you would be creating documents, not code. To create code that creates a page, without using any document system, would be very difficult indeed, and I am not sure what you would really do with that code, since you would need a compiler or interpreter to convert that code into something that you could use. Please clarify what you mean by "creating code", and what syntax you would want that code to be using. If HTML is code in your definition of "code" then maybe HTML is the best kind of "code" for your problem.
I do my form-work with WPTools. It is also a commercial product. The core is a very good wordprocessor and form-designer. The engine can render text and forms to any canvas (screen, printer, also create pdf) and is highly flexible. Output is mainly rtf and html.
I also see no advantage in creating pascal code to redraw the form. What you need, i think, is a good WYSIWYG-editor which creates a document that fits your needs.
Check out ReportBuilder # http://www.digital-metaphors.com/
It is a commercial reporting tool for Delphi - around a long time, very high quality, with all native Delphi source code packaged with it. I am using it for an important commercial project right now and I recommend it highly (I'm not working for them.) I've used MANY Delphi reporting tools over the years and this one is the best IMO.
RBuilder also has extensive support for paper form emulation see:
http://www.digital-metaphors.com/products/report_design/form_emulation.html
I haven't worked with that feature, but you can download a full-featured demo and try it.
Yoy can use Adobe Acrobat (full version) to create forms.
Then you can use free Acrobat Reader to display and print forms or other COM object in your application.
I think it is best solution for you.
PS
All tools for reports that are included in Delphi are free for you to design form and are free to distribute if user only preview and print already designed reports.
The same is valid for Adobe Acrobat (you may distribute forms) but you have added that you need to print form and some text over form. Maybe it is easier if you use reports but it is possible to do the same using PDF.
Most report engines are not open source but are free to distribute. There is many components for creating PDF - paid (one time), free, as well as open source.
PPS
I have read your updete for second time. Since you are using TBitmap and you can to TextOut so: You can use TMetafile. There is many editors for metafiles and it is free to distribute metafiles.

Modifying existing pdf elements (particularly images)

I am reading in template PDFs, customizing them, and appending pages before outputting the final document. What I want to do is modify the elements in the template I load before I append it to the output.
In particular I want to hide or remove images (and potentially other elements). I'm not even sure if elements in the imported page can be modified directly, if I can only add images (I haven't seen any sign of a removeImage() function) or what.
A little guidance would be greatly appreciated.
You should get hold of the book accompanying iText: iText in Action (1st or 2nd edition). It has some great examples of most things that can be done in iText.
Fist edition
Second edition
I believe you need to iterate through the pdf references in the reader to be able to identify images. I am not sure how one would replace them, but it's probably possible.
There are other libraries that do this better, pdfnet being one of them, but this is commercial.

Resources