Create an image of a specific page of a PostScript file using Ghostscript

Could someone please help me with Ghostscript?
I want to create an image of one particular page of a PostScript file using Ghostscript. I would specify the page number (or something similar) and get the corresponding image as output.
Is this possible with Ghostscript?

In current versions of Ghostscript you have two choices:
1) Render every page to an image file, using the '%d' file name format so that you can tell which file is the page you want, then delete the others.
2) PostScript is a programming language. Write a custom EndPage procedure which returns true when the page is the one you want, and false for all others.
In a yet-to-be-released version of Ghostscript, the FirstPage and LastPage parameters can be used to do this.
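Both the "render everything" route and the FirstPage/LastPage route can be sketched as a small Python helper that only builds the Ghostscript argument list. This is a rough sketch, not a tested recipe: the function name gs_render_command, the png16m device and the 150 dpi default are my own choices, and whether -dFirstPage/-dLastPage are honored for PostScript input depends on the Ghostscript release you have installed.

```python
from typing import List, Optional


def gs_render_command(ps_path: str, out_pattern: str,
                      page: Optional[int] = None,
                      dpi: int = 150,
                      device: str = "png16m") -> List[str]:
    """Build (but do not run) a Ghostscript command line.

    With page=None every page is rendered, so out_pattern should contain
    a '%d'-style placeholder such as 'page-%03d.png'.
    With page=N only that page is requested via FirstPage/LastPage
    (assumption: the installed Ghostscript supports this for PostScript).
    """
    cmd = ["gs", "-dBATCH", "-dNOPAUSE", "-dSAFER",
           f"-sDEVICE={device}", f"-r{dpi}"]
    if page is not None:
        if page < 1:
            raise ValueError("page numbers start at 1")
        cmd += [f"-dFirstPage={page}", f"-dLastPage={page}"]
    cmd += [f"-sOutputFile={out_pattern}", ps_path]
    return cmd


# Possible usage (requires Ghostscript on the PATH):
#   import subprocess
#   subprocess.run(gs_render_command("input.ps", "page3.png", page=3), check=True)
```

If the page selection turns out not to work on your version, calling the helper without page and keeping only the numbered file you need is the fallback described in option 1.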

Related

Issue with Oracle Reports 6i to PDF

Hello everyone.
I am working with Oracle Reports 6i to generate a report that includes text in the form of paragraphs. Everything looks fine in the Real Time Viewer; however, when the report is run to generate a PDF, some of the paragraphs change from Justified to Filled.
This doesn't happen for every text container. On a full page I will have two paragraphs that are filled instead of justified.
Here are the details:
Each paragraph is within its own container.
The alignment for all containers is set to Justified (Flush).
All paragraphs have the same font type and font size.
I have already tried changing the size of the output, but it didn't make a difference. Is there any configuration parameter or any format function I can use to fix this?
Thank you all!!

If some paragraphs are OK and some are not, I'd suggest the good old copy/paste approach:
delete wrong ones
copy correct one
paste it
edit its contents - hopefully, it'll look OK (as all properties the "correct" one had are now "inherited")

How to detect multiple barcodes/QR codes in a TIFF image and return their value + position?

I'm currently trying to achieve this:
I have a very large TIFF image which contains scanned documents. The image contains invoices with barcodes/QR codes, each followed by multiple other scanned documents related to the invoice that precedes them. This can be repeated multiple times (the TIFF image may look like [invoice] + [documents] + [invoice] + [documents] ...).
I need a program (it doesn't really matter in which language, but I'd prefer Java, JavaScript, PHP, C++ or Python) that takes said TIFF image, scans all the barcodes and returns their values and their positions in the image (either the page each one is on or its absolute position; the page is preferable, and I know for certain that there won't be multiple barcodes on one page). The goal is to split this TIFF image into multiple PDF files, each containing only one invoice and all of the documents that belong to that invoice.
I have the latter part done already. I intend to use ImageMagick to split the TIFF file into multiple files (tested, works). I have also tried multiple barcode scanning methods, but ran into critical problems with every one. And that's the point of my question:
Are any of my assumptions wrong? Is there a better way/library/software that you know of that could work?
Libraries/software I have tried so far:
ZXing port for PHP (ZXing GitHub): can't work with TIFF files.
Quagga for JavaScript (Quagga GitHub): can't work with TIFF files either.
ZBar code reader: the best-looking one by far. I managed to scan multiple QR codes in one TIFF image from CMD (Windows), but didn't find a way to get their positions. I also found out that C++ and Python versions exist, but haven't tried them yet.
Thanks for any ideas/corrections.

The best one I've heard of (that is subjective, of course) is Barcode Rendering Framework.
I'm not sure if it can detect multiple barcodes on a page, but it can detect many different types of barcodes.
It's also open source.
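Whichever decoder ends up being used, the splitting step from the question only needs one value per page: the decoded barcode, or nothing. A minimal sketch of that grouping, assuming the decoding has already been done per page (for example with one of the ZBar bindings mentioned in the question) and that page_codes holds one entry per TIFF page; the function name and the dictionary layout are made up for illustration.

```python
from typing import Dict, List, Optional, Sequence


def group_pages_by_invoice(
        page_codes: Sequence[Optional[str]]) -> List[Dict[str, object]]:
    """Group page indices into invoice bundles.

    page_codes[i] is the barcode value found on page i, or None when the
    page carries no barcode. A page with a barcode starts a new bundle;
    pages without one are attached to the most recent bundle.
    Pages that appear before the first barcode are skipped.
    """
    groups: List[Dict[str, object]] = []
    for page_index, code in enumerate(page_codes):
        if code is not None:
            groups.append({"code": code, "pages": [page_index]})
        elif groups:
            groups[-1]["pages"].append(page_index)
    return groups


if __name__ == "__main__":
    # Hypothetical scan: invoice, two attachments, invoice, one attachment.
    for bundle in group_pages_by_invoice(["INV-1", None, None, "INV-2", None]):
        print(bundle["code"], bundle["pages"])
```

Each resulting page list can then be handed to whatever tool does the actual splitting, such as the ImageMagick step that already works.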

Why is only 1 image out of 2 read correctly by Tesseract?

This is my first experience with Tesseract. I'm trying to read the digits contained in these TIFF images:
http://imageshack.us/g/703/64553021.png/
As you can see, they are in the same format and have the same width and height. I don't know why Tesseract returns the correct output only for the second image ("150"), while for the first one it returns blank output.
Maybe I should modify them to better suit Tesseract? How? I can use ImageMagick if needed.
Thanks in advance.

In the readme they say:
In the executable, page layout analysis is enabled by default. You may need to turn it off to process small images. No command-line control for this yet. Sorry. See tesseractmain.cpp.
I think your images are too small. Try editing the code to turn page layout analysis off, then recompile.
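The readme quoted above is from an older release. Later Tesseract versions accept a page segmentation mode on the command line, so recompiling may not be needed. A minimal sketch that only builds the command, assuming a build that understands --psm (some older 3.x builds spell it -psm) and the tessedit_char_whitelist variable; the helper name is hypothetical, and whether mode 7 (single text line) actually fixes the first image is something you would have to try.

```python
from typing import List


def tesseract_digits_command(image_path: str, psm: int = 7) -> List[str]:
    """Build (but do not run) a Tesseract command for a small digit image.

    psm=7 asks Tesseract to treat the image as a single line of text
    instead of running full page layout analysis.
    'stdout' as the output base makes Tesseract print the result.
    """
    return [
        "tesseract", image_path, "stdout",
        "--psm", str(psm),
        "-c", "tessedit_char_whitelist=0123456789",
    ]


# Possible usage (requires Tesseract on the PATH):
#   import subprocess
#   result = subprocess.run(tesseract_digits_command("digits.tif"),
#                           capture_output=True, text=True, check=True)
#   print(result.stdout.strip())
```

Enlarging the images with ImageMagick before running this is another thing worth trying, since very small glyphs tend to be recognized poorly.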

Extract Images and Words with coordinates and sizes from PDF

I've read a lot about PDF extraction and libraries (such as iText), but I just haven't found a solution for extracting images and text (with coordinates) from a PDF.
The task is to scan a PDF containing a catalog of products and extract each image. There is an image code printed next to each image, and also a list of product codes for the products that are shown in the image.
I know that there is no way to extract structured info from a PDF like this but with coordinates of all image and text objects I could write code to identify linked text by its distance from the image. Then I could split text using a RegExp and find out what is a product code, what is an image code etc.
Could you recommend a good and working solution for the task?
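The distance-based linking described above can be sketched independently of whichever library supplies the coordinates. This is only an illustration under my own assumptions: boxes are (x_min, y_min, x_max, y_max) tuples, distance is measured between box centers, and max_distance is a cutoff you would have to tune for the catalog layout.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a bounding box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def link_text_to_images(images: Dict[str, Box],
                        words: List[Tuple[str, Box]],
                        max_distance: float) -> Dict[str, List[str]]:
    """Attach each word to the nearest image within max_distance.

    Words farther than max_distance from every image are left out.
    """
    linked: Dict[str, List[str]] = {name: [] for name in images}
    for text, box in words:
        wx, wy = center(box)
        best_name, best_dist = None, max_distance
        for name, image_box in images.items():
            ix, iy = center(image_box)
            dist = math.hypot(wx - ix, wy - iy)
            if dist <= best_dist:
                best_name, best_dist = name, dist
        if best_name is not None:
            linked[best_name].append(text)
    return linked
```

The text collected per image could then be split with a regular expression to separate image codes from product codes, as the question suggests.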

Use XPDF (http://www.foolabs.com/xpdf/)
It can extract all the characters in the PDF with co-ordinates (pdftotext -bbox [sourcefile] [outputfile]) and also all the images and SVGs in the PDF.
It's open source (GPLv2) and supports a lot of additional extraction features as well.
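Once the -bbox output is on disk, the word coordinates can be read back with a few lines of Python. A minimal sketch, assuming the output is XHTML with page elements containing word elements that carry xMin, yMin, xMax and yMax attributes; the SAMPLE string below is hand-written in that shape, not real tool output, so check it against what your pdftotext build actually writes.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Word(NamedTuple):
    page: int      # 1-based page number, counted from <page> elements
    text: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def _local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}word'."""
    return tag.rsplit("}", 1)[-1]


def parse_bbox_words(xhtml: str) -> List[Word]:
    """Collect every <word> element with its page number and box."""
    words: List[Word] = []
    page_no = 0
    for elem in ET.fromstring(xhtml).iter():  # document order
        name = _local(elem.tag)
        if name == "page":
            page_no += 1
        elif name == "word":
            words.append(Word(page_no, (elem.text or "").strip(),
                              float(elem.get("xMin")), float(elem.get("yMin")),
                              float(elem.get("xMax")), float(elem.get("yMax"))))
    return words


# Hand-written sample in the assumed format (illustrative values only).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><doc>
<page width="612.000000" height="792.000000">
<word xMin="72.0" yMin="90.5" xMax="110.2" yMax="102.5">AB-1001</word>
<word xMin="115.0" yMin="90.5" xMax="150.0" yMax="102.5">Lamp</word>
</page>
</doc></body></html>"""

if __name__ == "__main__":
    for word in parse_bbox_words(SAMPLE):
        print(word)
```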

Several Java libraries can do this. Have you looked at JPedal or PDFBox?

If a commercial library is an option for you, you could try Amyuni PDF Creator .Net or Amyuni PDF Creator ActiveX. You could use the method IacDocument.GetObjectsInRectangle to retrieve all the "graphic objects" of your interest, then use the ObjectType attribute to separate images from text. The library already provides an algorithm for putting close text together. From the documentation:
IacDocument.GetObjectsInRectangle Method
The GetObjectsInRectangle method gets all the objects that are in the specified rectangle.
Usual disclaimer applies.

PDFTK - and the ability to change the default view

I have been merging PDFs using PDFTK with great success. The source PDFs are set to 'click to show one page at a time' (basically, the whole of the first page is displayed when the PDF opens, fitted to the height of the page).
However, the generated PDF defaults back to filling the reader based on its width (so not all of the first page shows).
Do you know a way of controlling the view of the generated PDF? I would prefer the whole page to be displayed, fitted to its height.
Best regards
Daniel

Daniel,
Thank you for your message. When using pdftk to assemble a new PDF from PDF pages or documents (via the cat operation), the new PDF does not have display settings. So the resulting PDF is displayed using the defaults set in your viewer's preferences. Pdftk doesn't have a means of setting the display mode, but I will add that to the feature wishlist. Meanwhile, you can change your Reader/Acrobat preferences to your preferred view mode as a workaround.
Regards-
Sid Steward
Pdftk Maintainer
