Ghostscript produces a PDF file with a different page size - ghostscript

I am using Ghostscript version "ghostscript-8.71" to produce PDF files in my application.
On Windows it produces the PDF file with size "8.26x11.69 (A4)", but on Linux it produces 8.5x11 (for the India locale).
I want the output to be consistent with the Windows environment, so that Linux also produces the same PDF page size, i.e. "8.26x11.69 (A4)" (for the India locale only).
While googling, I found that the file named "gs_init.ps"
under the directory "Resource\Init" contains the setting.
The advice I found was:
Find this line in "gs_init.ps":
% /DEFAULTPAPERSIZE (a4) def
Then, to make A4 the default paper size, uncomment the line so that it reads:
/DEFAULTPAPERSIZE (a4) def
But this has no effect on Linux. How can we make Linux produce the PDF file with the same size as Windows for the India locale?
Please help me in this regard.
Thanks & Regards
Vikas

Firstly, use an up-to-date version. The current version of Ghostscript is 9.10.
If you read the Ghostscript documentation you will find the same information you googled, in /ghostpdl/gs/doc/Use.htm#Paper_size, section 3.4, where it also says:
"Sometimes the initialization files are compiled into Ghostscript and cannot be changed."
Did you try simply setting -sPAPERSIZE=a4?
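For illustration, here is a minimal sketch of passing that switch from an application, so that the same call is used on Windows and Linux. The helper names build_gs_command and run_gs and the file names are hypothetical, and the executable name "gs" is an assumption (on Windows it is typically something like gswin32c.exe).

```python
import subprocess


def build_gs_command(input_path, output_path, papersize="a4", gs_executable="gs"):
    """Return the argument list for a pdfwrite run with an explicit paper size."""
    return [
        gs_executable,                   # assumption: "gs" is on PATH
        "-dBATCH",                       # exit after processing the input
        "-dNOPAUSE",                     # do not pause between pages
        "-sDEVICE=pdfwrite",
        f"-sPAPERSIZE={papersize}",      # the switch suggested above
        f"-sOutputFile={output_path}",
        input_path,
    ]


def run_gs(input_path, output_path, papersize="a4"):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(input_path, output_path, papersize), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_gs() to actually invoke Ghostscript.
    print(" ".join(build_gs_command("input.ps", "output.pdf")))
```

Because the paper size is given on the command line, this does not depend on whether gs_init.ps can be edited on the target machine.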

Related

Windows 10: Simple way to print a PDF-file without the save-dialog

I'm trying to print the PDF file(s) in a certain folder (or alternatively just print the files one by one) using, for example, Microsoft Print to PDF in order to create flattened versions. However, when using Microsoft Print to PDF I need to specify the output file's name and path. Is there any way to circumvent this, or an alternative virtual printer specialized for such a job?
What I've already tried:
Windows 10 Print to PDF from command-line and Printing PDFs from Windows Command Line
These approaches try to use the command prompt (which I personally favour as well, as it allows creating a batch file and automating the process completely), but unfortunately the programs/printers listed in those posts are either not free or show a save-file dialog as well. Furthermore, they are quite slow (even though this is not my main focus). So far, PDFtoPrinter has been the best solution, though it shows the save-file dialog as well...
Another idea I got from this post is to create a (VBA/PowerShell) script, but I'm not very experienced at that.
Any way to print just one PDF via the console and then loop, or maybe even hard-code the names, would suffice as well. I can easily rename the files, for example to 1.pdf, 2.pdf, 3.pdf, ...
At this point I've tried so much but there has to be a way to get this running. Any help would be greatly appreciated!
Microsoft Print to PDF on Windows is not "free", simply "leased". That said, you can change its designed behaviour by using a different "port" than the one that prompts, or use the driver to print to a file name of your choice.
To use ONE fixed output filename like %TEMP%\OUT.PDF, you are best served by cloning (duplicating) "Microsoft Print to PDF" under a printer name of your choice. I call mine "My Print to PDF" as it is shorter to type, and the automatically printed file goes to the MyData folder. For a visual guide see https://stackoverflow.com/a/69169728/10802527 and upvote there if that helps.
The alternative is to use a structure like
CliPdfApp /PrintTo file.pdf "Microsoft Print to PDF" "Microsoft Print to PDF" "C:\MyFavourite Places\FileName.pdf"
However, few apps follow the required convention: WordPad will convert DOCX or RTF via the command line but cannot handle a PDF, and Edge, AFAIK, was not designed to make PDF printing CLI friendly :-). The links you have in the question suggest acrord32 /p or /t filename printer printdriver filename, and that is probably the best method for flattening AcroForms.
Disclaimer: I support SumatraPDF, so I can suggest it to "Print As Image ONLY". It is perfect as one single 32- or 64-bit portable .exe (https://www.sumatrapdfreader.org/prerelease), so all you need is:
SumatraPDF -print-to "My Print to PDF" filename.pdf (or other types supported)
There are other print methods/options, but BE AWARE that this is NOT flattening forms: "flattening" means converting the form to plain readable text, and SumatraPDF ONLY prints a PDF as imagery.
So combining SumatraPDF with a promptless port gives you a single command that builds a known output file. You then need to monitor that output and rename it to a name of your choice. That can be tricky if you are submerged and "running silent and deep" without GDI feedback (that the print is spooling or erroring), and the time taken is as variable as the complexity of the input PDFs.
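The "monitor that output and rename" step could be sketched roughly as below. This is only one possible approach, not something the tools above provide: it treats the file as finished once its size has stopped changing for a few polls, which is a heuristic. The function names, polling values and file names are all hypothetical.

```python
import os
import tempfile
import time


def wait_until_stable(path, poll_seconds=0.5, stable_checks=3, timeout_seconds=120.0):
    """Return True once path exists and its size has stopped changing, else False."""
    deadline = time.monotonic() + timeout_seconds
    last_size = -1
    unchanged = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:          # file not created yet
            size = -1
        if size > 0 and size == last_size:
            unchanged += 1
            if unchanged >= stable_checks:
                return True
        else:
            unchanged = 0
        last_size = size
        time.sleep(poll_seconds)
    return False


def claim_output(fixed_output, final_name, **wait_options):
    """Wait for the fixed-name printer output, then move it to final_name."""
    if not wait_until_stable(fixed_output, **wait_options):
        raise TimeoutError(f"{fixed_output} did not appear or never settled")
    os.replace(fixed_output, final_name)   # may fail while the spooler still holds the file
    return final_name


if __name__ == "__main__":
    # Demo with a stand-in file instead of a real print job.
    with tempfile.TemporaryDirectory() as folder:
        out = os.path.join(folder, "OUT.PDF")
        with open(out, "wb") as handle:
            handle.write(b"%PDF-1.4 stand-in")
        print(claim_output(out, os.path.join(folder, "1-flat.pdf"),
                           poll_seconds=0.01, stable_checks=2, timeout_seconds=5))
```

In a batch run you would issue the print command for one input, call claim_output, and only then move on to the next file, so that two jobs never write to the fixed output name at the same time.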
You use the word "slow", but that is an innate feature of PDF: "slow and steady" output is its design aim.
As an alternative to SumatraPDF, two other viewers are more geared towards command-line PDF printing. One is Acrobat Reader, as per the above ("/Terminate and Stay Resident"); it does that exceptionally well, so it should be preferred. A good lightweight but powerful alternative is Tracker PDF-XChange, which has both command-line printing and its own programmable printer drivers.
Win2PDF has a command line to create image only (flattened) PDF files.

Ghostscript undefined glyph

I'm using gs 9.20 to merge some pdf documents into a single document
/usr/bin/gs9/bin/gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dRENDERTTNOTDEF=true -sOutputFile=/docs/merged.pdf
And I'm getting this error and have no idea how to resolve it. Has anyone come across these types of errors?
GPL Ghostscript 9.20: ERROR: Page 5 used undefined glyph 'g2' from
type 3 font 'PDFType3Untitled'
Without seeing the original file it's not possible to be certain, but I would guess from the error that the file calls for a particular glyph in a font (PDFType3Untitled), and that font does not contain that glyph.
The result is that you get an error message (messages from the PDF interpreter which begin with ERROR, as opposed to WARNING, mean that the output is very likely to be incorrect).
You will still get a PDF file, and it may be visually identical with the original because, obviously, the original file didn't have the glyph either.
As for 'resolving' it, you need to fix the original PDF file; that's almost certainly where the problem is.
Please note that you are not 'merging' PDF files. As I keep on saying to people, the original file is torn down to graphics primitives, and then a new file is built from those primitives. You cannot depend on any constructs in the original file being present in the final file. A truly 'merged' file would preserve them; Ghostscript's pdfwrite device does not.
See here for an explanation.

Maximum number of input file for Ghostscript (gs)

I simply want to combine multiple EPS files into one big file using the gs command.
The command works flawlessly, except when I specify more than 20 input files.
Somehow the command ignores the input files from the 21st onwards.
Has anyone experienced the same behavior? Is there a cap on the number of input files specified anywhere?
I looked through the site and couldn't find one.
sample command
gs -o output.eps -sDEVICE=eps2write file1.eps file2.eps .... file21.eps
Thank you.
Edit: add sample command
Almost certainly you have simply reached the maximum length of the command line for your operating system. You can use Ghostscript's @ syntax (@filename) to supply a file containing the command line instead.
https://www.ghostscript.com/doc/current/Use.htm#Input_control
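A rough sketch of generating such an argument file is shown below. The helper name write_gs_argfile and the file names are hypothetical, and it assumes none of the file names contain spaces (those would need quoting inside the argument file). It writes -sOutputFile together with -dBATCH and -dNOPAUSE rather than the -o shorthand, purely to keep one self-contained switch per line.

```python
import os
import tempfile


def write_gs_argfile(argfile_path, output_path, input_files, device="eps2write"):
    """Write one Ghostscript argument per line and return the short command to run."""
    args = [
        "-dBATCH",
        "-dNOPAUSE",
        f"-sDEVICE={device}",
        f"-sOutputFile={output_path}",
        *input_files,
    ]
    with open(argfile_path, "w", encoding="ascii") as handle:
        handle.write("\n".join(args) + "\n")
    # The real command line now stays short no matter how many inputs there are.
    return ["gs", "@" + argfile_path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        inputs = [f"file{i}.eps" for i in range(1, 26)]
        argfile = os.path.join(folder, "gs_args.txt")
        print(" ".join(write_gs_argfile(argfile, "output.eps", inputs)))
```

The returned list can be passed to subprocess.run once Ghostscript is installed and the input files exist.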
Note that the EPS files will not be placed appropriately using that command, and this does not actually combine EPS files, it creates a new EPS file whose marking content should be the same as the input(s).
If you actually want to combine the EPS files, it's easy enough, but it will require a small amount of programming to parse the EPS file headers and produce appropriate scale/translate operations, as well as stripping off any bitmap previews (which will also happen when you run them through Ghostscript).
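As a rough idea of the header-parsing part only, the sketch below reads the %%BoundingBox comment from each EPS and computes the translate offsets needed to stack the figures top to bottom on one page. It is an illustration under assumptions: the function names are hypothetical, no scaling is applied, preview stripping is not handled, and writing the combined file (wrapping each EPS in save/translate/restore) is left out.

```python
import re

# Matches a numeric bounding box; "%%BoundingBox: (atend)" is skipped so the
# real values in the trailer are picked up instead.
_BBOX = re.compile(
    r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)", re.MULTILINE
)


def read_bounding_box(eps_text):
    """Return (llx, lly, urx, ury) from the first numeric %%BoundingBox comment."""
    match = _BBOX.search(eps_text)
    if match is None:
        raise ValueError("no numeric %%BoundingBox comment found")
    return tuple(int(value) for value in match.groups())


def stack_vertically(bboxes):
    """Return (placements, combined_bbox) with the first figure placed on top.

    Each placement is the (tx, ty) to pass to 'translate' before running that EPS.
    """
    total_height = sum(ury - lly for (_, lly, _, ury) in bboxes)
    max_width = max(urx - llx for (llx, _, urx, _) in bboxes)
    placements = []
    cursor = total_height
    for llx, lly, urx, ury in bboxes:
        cursor -= ury - lly                       # bottom edge of this figure
        placements.append((-llx, cursor - lly))   # move its lower-left to (0, cursor)
    return placements, (0, 0, max_width, total_height)


if __name__ == "__main__":
    header = "%!PS-Adobe-3.0 EPSF-3.0\n%%BoundingBox: 10 20 110 60\n%%EndComments\n"
    print(read_bounding_box(header))
    print(stack_vertically([(0, 0, 100, 50), (10, 20, 110, 60)]))
```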

How do I include ggplots using pander's live report generation

I am using pander to create docx reports on Windows 7, following the examples at http://rapporter.github.io/pander/#live-report-generation.
myReport <- Pandoc$new(author="Jerubaal",title="Plot Anything", format="docx")
I've tried the example
myReport$add(plot(1:10))
and that doesn't work on Windows, but it does on Linux.
Previously I got plots to appear using brew files and <%=plot(1:10)%>, but I am trying out Live Report Generation because that method seems best suited to me.
I've also tried saving a plot to a file first and then creating an image link, which again works on Linux, but not on Windows:
myReport$add("![](plots/myplot.png)")
I want to include ggplot2 plots - the code works on its own in R but doesn't appear in the docx (although I do get a blank line).
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
pandoc.exe 1.12.2.1
What am I missing? Thanks.
Edit: I'm back home and this works on Ubuntu:
library(pander)
library(ggplot2)
setwd("/home/jerubaal/R/Projects/reports")
attach(movies)
m=movies[sample(nrow(movies), 1000),]
myReport=Pandoc$new(author="Jerubaal",title="Testing plots in reports",format="docx")
myReport$add.paragraph("There should be a plot after this")
p=ggplot(data=m, aes(x=rating,fill=mpaa)) + geom_density(alpha=0.25)
ggsave(filename=paste(getwd(),"plots/movies.png",sep="/"),plot=p,width=6,height=3)
myReport$add(paste("![](",paste(getwd(),"plots/movies.png",sep="/"),")",sep=""))
myReport$add.paragraph("There should be a plot before this")
myReport$export(tempfile())
Q: Would it cause a problem if the png took too long to create or save?

Methods of Parsing Large PDF Files

I have a very large PDF File (200,000 KB or more) which contains a series of pages containing nothing but tables. I'd like to somehow parse this information using Ruby, and import the resultant data into a MySQL database.
Does anyone know of any methods for pulling this data out of the PDF? The data is formatted in the following manner:
Name | Address | Cash Reported | Year Reported | Holder Name
Sometimes the Name field overflows into the address field, in which case the remaining columns are displayed on the following line.
Due to the irregular format, I've been stuck on figuring this out. At the very least, could anyone point me to a Ruby PDF library for this task?
UPDATE: I accidentally provided incorrect information! The actual size of the file is 300 MB, or 300,000 KB. I made the change above to reflect this.
I assume you can copy'n'paste text snippets without problems when your PDF is opened in Acrobat Reader or some other PDF Viewer?
Before trying to parse and extract text from such monster files programmatically (even if it's 200 MByte only -- for simple text in tables that's huuuuge, unless you have 200000 pages...), I would proceed like this:
1. Try to sanitize the file first by re-distilling it.
2. Try with different CLI tools to extract the text into a .txt file.
This is a matter of minutes. Writing a Ruby program to do this certainly is a matter of hours, days or weeks (depending on your knowledge about the PDF fileformat internals... I suspect you don't have much experience of that yet).
If "2." works, you may halfway be done already. If it works, you also know that doing it programmatically with Ruby is a job that can in principle be solved. If "2." doesn't work, you know it may be extremely hard to achieve programmatically.
Sanitize the 'Monster.pdf':
I suggest using Ghostscript. You can also use Adobe Acrobat Distiller if you have access to it.
gswin32c.exe ^
-o Monster-PDF-sanitized.pdf ^
-sDEVICE=pdfwrite ^
-f Monster.pdf
(I'm curious how much that single command will make your output PDF shrink if compared to the input.)
Extract text from PDF:
I suggest to first try pdftotext.exe (from the XPDF folks). There are other, a bit more inconvenient methods available too, but this might do the job already:
pdftotext.exe ^
-f 1 ^
-l 10 ^
-layout ^
-eol dos ^
-enc Latin1 ^
-nopgbrk ^
Monster-PDF-sanitized.pdf ^
first-10-pages-from-Monster-PDF-sanitized.txt
This will not extract all pages but only pages 1-10 (as a proof of concept, to see if it works at all). To extract from every page, just leave off the -f 1 -l 10 parameters. You may need to tweak the encoding by changing the parameter to -enc ASCII7 (or UTF-8, UCS-2).
If this doesn't work the quick'n'easy way (because, as sometimes happens, some font in the original PDF uses a "custom encoding vector"), you should ask a new question, describing the details of your findings so far. Then you need to resort to bigger calibres to shoot down the problem.
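If the extraction does work, a first pass over the resulting .txt file might look like the sketch below. It assumes that the -layout output keeps the five columns from the question separated by runs of two or more spaces, which may not hold for your file. The function names and the sample lines are made up for illustration, and rows where the Name overflows are only flagged, not repaired.

```python
import re

COLUMNS = ["name", "address", "cash_reported", "year_reported", "holder_name"]
_GAP = re.compile(r"\s{2,}")   # assumption: columns are separated by 2+ spaces


def split_layout_line(line):
    """Split one line of layout-preserved text into its non-empty cells."""
    return [cell for cell in _GAP.split(line.strip()) if cell]


def parse_rows(lines):
    """Return a dict per line; lines without exactly five cells are flagged."""
    rows = []
    for line in lines:
        cells = split_layout_line(line)
        if len(cells) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, cells)))
        elif cells:
            # Probably an overflowing Name field or a page header: review by hand.
            rows.append({"unparsed": line.rstrip("\n")})
    return rows


if __name__ == "__main__":
    sample = [
        "John Smith    12 Main St    100.00    2009    Acme Bank",   # made-up data
        "A very long name that runs into the address column",
        "",
    ]
    for row in parse_rows(sample):
        print(row)
```

The same logic ports directly to Ruby with String#split and a regular expression; Python is used here only to keep the sketch short.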
"At the very least, could anyone point me to a Ruby PDF library for this task?"
If you haven't done so, you should check out the two previous questions: "Ruby: Reading PDF files," and "ruby pdf parsing gem/library." PDF::Reader, PDF::Toolkit, and Docsplit are some of the relatively popular suggested libraries. There is even a suggestion of using JRuby and some Java PDF library parser.
I'm not sure if any of these solutions is actually suitable for your problem, especially that you are dealing with such huge PDF files. So unless someone offers a more informative answer, perhaps you should select a library or two and take them for a test drive.
This will be a difficult task, as rendered PDFs have no concept of tabular layout, just lines and text in predetermined locations. It may not be possible to determine what are rows and what are columns, but it may depend on the PDF itself.
The Java libraries are the most robust, and may do more than just extract text. So I would look into JRuby and iText or PDFBox.
Check whether there is any structured content in the PDF. I wrote a blog article explaining this at http://www.jpedal.org/PDFblog/?p=410
If not, you will need to build it.
Maybe the Prawn Ruby library?

Resources