Image processing scripts are mysteriously killed. How do I determine why? - bash

I have an account on Bluehost, it's a shared machine. I have been able to run most custom scripts with no problem, but image processing scripts are killed mysteriously after about 20 seconds. No output file is created. Sometimes I can get the command line below to run if I restrict it to monochrome.
I tried ulimit and nice, but I feel I am just guessing. Is there a more methodical way to look into this? Yes, I am also contacting Bluehost support.
~]# gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT \
> -sDEVICE=png48 -sOutputFile=11006.png 11006.pdf
Killed
~]# echo $?
137
~]#

Problem 1 is that there is no png48 device in standard Ghostscript. The most likely problem after that, I'd guess, is that it's using too much memory, so you need controls that limit memory usage and make Ghostscript use the clist (a display list implementation) instead. This takes longer but reduces the memory footprint.
Available PNG output devices are png16, png16m, png256, pngalpha, pnggray, pngmono, pngmonod
Use the -dMaxBitmap switch to control the maximum page buffer size, if the page exceeds this then it will use the clist, which will result in 'n' bands. This is slower to process but uses much less memory. You can also use -dNumRenderingThreads if your system has multiple cores.
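As a rough sketch of the switches described above, the function below prints a command line that swaps in png16m and caps the page buffer. The 10 MB default for -dMaxBitmap and the thread count of 2 are illustrative assumptions, not tuned recommendations, and the function name is my own. It prints the command rather than running it so the result can be inspected first; -q is left out so that Ghostscript can report any messages.

```shell
#!/usr/bin/env bash
# Print a lower-memory Ghostscript command line for one PDF.
# The default buffer size and thread count are illustrative assumptions.
gs_lowmem_cmd() {
    local pdf=$1 png=$2
    local max_bitmap=${3:-10000000}   # bytes; larger pages fall back to the clist
    local threads=${4:-2}             # only useful on a multi-core machine
    printf 'gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -dMaxBitmap=%s -dNumRenderingThreads=%s -sOutputFile=%s %s\n' \
        "$max_bitmap" "$threads" "$png" "$pdf"
}

gs_lowmem_cmd 11006.pdf 11006.png
```

If the job still gets killed, lowering the third argument further is the obvious next experiment.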
What version of Linux is this, what version of Ghostscript, and where did the installed Ghostscript come from (e.g. did you build it yourself from source)?
If it's a very old version of Ghostscript, it may simply have bugs.
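On the "more methodical" part of the question: the exit status of 137 in the transcript already carries some information. Shells report a process that was killed by a signal as 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL). That means something outside the process killed it, which is consistent with (though not proof of) a resource limit enforced by the host. A minimal sketch for decoding such a status, with a function name of my own choosing:

```shell
#!/usr/bin/env bash
# Translate a shell exit status into a short description.
# Statuses above 128 mean "terminated by signal (status - 128)".
describe_exit_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        local sig=$(( status - 128 ))
        printf 'killed by signal %d (SIG%s)\n' "$sig" "$(kill -l "$sig")"
    else
        printf 'exited with status %d\n' "$status"
    fi
}

describe_exit_status 137
```

Running ulimit -a in the same shell lists the per-process limits currently in effect, which gives a starting point for comparing against what the script needs.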

Related

Batch Convert Sony raw ".ARW" image files to .jpg with raw image settings on the command line

I am looking to convert 15 million 12.8 MB Sony .ARW files to .jpg.
I have figured out how to do it using sips on the Command line BUT what I need is to make adjustments to the raw image settings: Contrast, Highlights, Blacks, Saturation, Vibrance, and most importantly Dehaze. I would be applying the same settings to every single photo.
It seems like ImageMagick should work if I can make adjustments for how to incorporate Dehaze but I can't seem to get ImageMagick to work.
I have done benchmark testing comparing Lightroom Classic, Photoshop, Bridge, RAW Power, and a few other programs. RAW Power is fastest by far (on an M1 Mac Mini with 16 GB RAM), but it doesn't allow me to process multiple folders at once.
I do a lot of scripting and actions with Photoshop, but in this case Photoshop is by far the slowest option. I believe this is because it opens each photo.
That's 200TB of input images, without even allowing any storage space for output images. It's also 173 solid days of 24 hr/day processing, assuming you can do 1 image per second - which I doubt.
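The arithmetic behind those figures can be checked in a few lines. This sketch uses only the numbers from the question plus the assumed rate of one image per second; it works in decimal units and integer arithmetic, so the exact input size comes out as 192 TB, which rounds to the 200 TB quoted above.

```shell
#!/usr/bin/env bash
# Back-of-envelope estimate using the figures from the question.
images=15000000          # number of ARW files
size_tenths_mb=128       # 12.8 MB per file, in tenths of a MB to stay in integers
secs_per_image=1         # assumed throughput of one image per second

# Total input size in decimal TB (1 TB = 1,000,000 MB).
total_tb() { echo $(( $1 * $2 / 10 / 1000000 )); }

# Whole days of continuous processing (86,400 seconds per day).
total_days() { echo $(( $1 * $2 / 86400 )); }

echo "input size: about $(total_tb "$images" "$size_tenths_mb") TB"
echo "processing time: about $(total_days "$images" "$secs_per_image") days"
```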
You may want to speak to Fred Weinhaus @fmw42 about his Retinex script (search for "hazy" on that page), which does a rather wonderful job of haze removal. Your project sounds distinctly commercial.
© Fred Weinhaus - Fred's ImageMagick scripts
If/when you get a script that does what you want, I would suggest using GNU Parallel to get decent performance. I would also think you may want to consider porting, or having ported, Fred's algorithm to C++ or Python to run with OpenCV rather than ImageMagick.
So, say you have a 24-core MacPro, and a bash script called ProcessOne that takes the name of a Sony ARW image as parameter, you could run:
find . -iname \*.arw -print0 | parallel --progress -0 ProcessOne {}
and that will recurse in the current directory, finding all Sony ARW files and passing them to GNU Parallel, which will then keep all 24 cores busy until the whole lot are done. You can specify fewer or more jobs in parallel with, say, parallel -j 8 ...
Note 1: You could also list the names of additional servers in your network and it will spread the load across them too. GNU Parallel is capable of transferring the images to remote servers along with the jobs, but I'd have to question whether it makes sense to do that for this task - you'd probably want to put a subset of the images on each server with its own local disk I/O and run the servers independently yourself rather than distributing from a single point globally.
Note 2: You will want your disks well configured to handle multiple, parallel I/O streams.
Note 3: If you do write a script to process an image, write it so that it accepts multiple filenames as parameters, then you can run parallel -X and it will pass as many filenames as your sysctl parameter kern.argmax allows. That way you won't need a whole bash or OpenCV C/C++ process per image.
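A minimal sketch of what Note 3 describes: a script that loops over every filename it is given. The convert_one function is a hypothetical stand-in that only prints what it would do; the real raw conversion and adjustments would go in its body.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real conversion of one image.
# It only prints the source and the .jpg name it would write.
convert_one() {
    local src=$1
    local dst="${src%.*}.jpg"       # same name with a .jpg extension
    printf '%s -> %s\n' "$src" "$dst"
}

# Handle every filename passed in, so one process serves many images.
process_many() {
    local f
    for f in "$@"; do
        convert_one "$f"
    done
}

process_many "$@"
```

Saved as an executable script (say ProcessMany, a name I made up), it would be driven with find . -iname \*.arw -print0 | parallel -0 -X ./ProcessMany, so that each invocation receives a batch of filenames instead of one.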

Why doesn't Linux cache object and/or ".so" files when using GNU Linker?

When linking executables (more than 200) in a large project, I get a link rate of 0.5 executables per second, even if I have run the link stage a minute before. vmstat shows a disk read rate of more than 20MB/s.
But if I pre-cache the build directory using "tar cf /dev/null build-dir" once, I get consistent link rate of 4.8 executables per second and the disk read rate is basically zero.
Why doesn't Linux cache the object files and/or ".so" files when they are read by GNU Linker, but does so when they are read by tar? There is plenty of RAM (16GB). Kernel version is 4.4.146. CentOS 7.5.
It looks like an incorrect setting of vm.vfs_cache_pressure = 1000 was causing this misbehaviour. Setting it to 70 fixed the problem and restored good cache performance.
And the documentation explicitly recommends against increasing the value beyond 100. Unfortunately, the Internet is full of examples with insane values like 1000.
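A small sketch for checking the setting on a given machine. It reads the value from /proc on Linux and compares it with 100, which is the kernel default; the function name is my own.

```shell
#!/usr/bin/env bash
# Classify a vm.vfs_cache_pressure value relative to the kernel default of 100.
check_cache_pressure() {
    local value=$1
    if [ "$value" -gt 100 ]; then
        echo "vfs_cache_pressure=$value is above the default of 100"
    else
        echo "vfs_cache_pressure=$value is at or below the default of 100"
    fi
}

# The /proc entry only exists on Linux, so skip the check elsewhere.
if [ -r /proc/sys/vm/vfs_cache_pressure ]; then
    check_cache_pressure "$(cat /proc/sys/vm/vfs_cache_pressure)"
fi
```

The value can be changed at runtime as root with sysctl -w vm.vfs_cache_pressure=70, and made persistent through /etc/sysctl.conf.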

Pdftk heap sections error when over 512 MB memory

I use pdftk server to automate various tasks. Recently I ran into a problem where pdftk crashed while merging a large number of pdfs with the error window:
Fatal error in gc: Too many heap sections
After receiving this error I ran some tests and confirmed that the same error happens, regardless of what task pdftk is performing on PDFs, whenever its memory usage exceeds 512 MB.
I was hoping someone could help me understand what this error means, and if there's a way to set up pdftk to handle these larger jobs?
If it's just a limitation of the program, does anyone have a suggestion for a similarly functioning program without this limitation?
I used pdftk for dumping data such as bookmarks and contents, and it is very useful.
However, I ran into similar problems.
Ghostscript may help.
Ghostscript can rewrite the original PDF file as a new one and reduce its size.
Ghostscript can also split one large PDF file into smaller PDF files.
My commands are:
gswin64c -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dBATCH -dQUIET -dNOPAUSE -dDOPDFMARKS -dFirstPage=(some number) -dLastPage=(some number) -sOutputFile=newfile.pdf originalfile.pdf
Parameters such as -dFirstPage=(some number) -dLastPage=(some number) may be omitted if the converted PDF file is small enough.
I hope the above helps.
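To split a large file with the command above, the page ranges have to be worked out first. The sketch below prints one trimmed-down command per chunk rather than running anything; the page count of 250 and chunk size of 100 are made-up example values, and the output file naming is my own. On Linux or macOS the executable is gs rather than gswin64c.

```shell
#!/usr/bin/env bash
# Print "first last" page pairs covering 1..total in chunks of at most `size` pages.
page_ranges() {
    local total=$1 size=$2 first=1 last
    while [ "$first" -le "$total" ]; do
        last=$(( first + size - 1 ))
        if [ "$last" -gt "$total" ]; then
            last=$total             # the final chunk may be shorter
        fi
        echo "$first $last"
        first=$(( last + 1 ))
    done
}

# Print one Ghostscript command per chunk (example values: 250 pages, 100 per file).
page_ranges 250 100 | while read -r first last; do
    echo "gswin64c -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dFirstPage=$first -dLastPage=$last -sOutputFile=part_${first}-${last}.pdf originalfile.pdf"
done
```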
So I know it's not perfect, but I just wrote my own implementation using iTextSharp to fulfill my requirements. If anyone else runs into this post and needs it, it's available at this github link under the AGPL license.

Slow ghostscript conversion and slow printing with large ps files

I have a service which produces pdf files. I have PS-compatible printers. For printing the pdf files, I use ghostscript to convert them to ps and copy them to a shared (windows) print queue. Most of the pdf files contain just a few pages (<10) and don't cause any trouble.
From time to time I have to print large files (100+, 500+, 5000+ pages) and there I observe the following:
converting to ps is fast for the first couple of pages, then slows down. The further the progress, the longer the time for a single page.
after conversion, copying to the print queue works without problems
when copying is finished and it comes to sending the document to the printer, I observe more or less the same phenomenon: the further the progress, the slower the transfer.
Here is how I convert pdf to ps:
"C:\Program Files\gs\gs9.07\bin\gswin64c.exe" \
-dSAFER -dNOPAUSE -DBATCH \
-sOutputFile=D:\temp\testGS\test.ps \
-sDEVICE=ps2write \
D:\temp\testGS\test.pdf
After this conversion I simply copy it to the print queue
copy /B test.ps \\printserever\myPSQueue
What possibilities do I have to print large files this way?
My first idea was to do the following:
"C:\Program Files\gs\gs9.07\bin\gswin64c.exe" \
-dSAFER -dNOPAUSE -DBATCH \
-sOutputFile=D:\temp\testGS\test%05d.ps \
-sDEVICE=ps2write \
D:\temp\testGS\test.pdf
Working with single pages speeds up the conversion: it doesn't slow down after every page, and printing is also fast when I copy every page to the printer as its own ps file. But there is one problem I will encounter sooner or later: when I copy the single ps files, they become separate print jobs. Even if they are sent in the correct order, if someone else starts a print job on the same printer in between, the printouts will get mixed up.
The other idea was using gsPrint, which is considerably faster, but with gsPrint I need the printer to be installed locally, which is not manageable in my environment with 300+ printers at different locations.
Can anyone tell me what exactly happens? Is this a bad way to print? Does anyone have a suggestion for how to print such documents in such an environment?
Without seeing an example PDF file it's difficult to say much about why it should print slowly. However, the most likely explanation is that the PDF is being rendered to an image, probably because it contains transparency.
This will result in a large image, created at the default resolution of the device (720 dpi), which is almost certainly higher than required for your printer(s). This means that a large amount of time is spent transmitting extra data to the printer, which the PostScript interpreter in the printer then has to discard.
Using gsprint renders the file to the resolution of the device. Assuming this is less than 720 dpi, the resulting PostScript will be smaller and will therefore require less time to transmit, less time to decompress on the printer, and less time spent throwing away extra data.
One reason the speed decreases is the way ps2write works: it maintains much of the final content in temporary files, and stitches the main file back together from those files. It also maintains a cross reference table which grows as the number of objects in the file does. Unless you need the files to be continuous, you could create a number of print files by using the -dFirstPage and -dLastPage options so that only a subset of the final printout is created each time. This might improve the performance.
Note that ps2write does not render the incoming file to an image, while gsprint definitely does: the PostScript emerging from gsprint will simply define a big bitmap. This doesn't maintain colours (everything goes to RGB) and doesn't maintain vector objects as vectors, so it doesn't scale well. However, if you want to use gsprint to print to a remote printer, you can set up a 'virtual printer' using RedMon. You can have RedMon send the output from a port to a totally different printer, even a remote one. So you use gsprint to print to (e.g.) 'local instance of MyPrinter' on RedMon1: and have the RedMon port set up to capture the print stream to disk and then send the PostScript file to 'MyPrinter on another PC'. Though I'd guess that's probably not going to be any faster.
My suggestion would be to set the resolution of ps2write lower; -r300 should be enough for any printer, and lower may be possible. The resolution will only affect rendered output, everything else remains as vectors and so scales nicely. Rendered images will print perfectly well at half the resolution of the printer, in general.
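A sketch of the conversion from the question with the resolution set explicitly. It only builds and prints the argument list, one argument per line, so nothing is executed; the function name is my own, and 300 is simply the value suggested above.

```shell
#!/usr/bin/env bash
# Build the ps2write argument list with an explicit resolution.
# The third argument overrides the 300 dpi suggested in the answer.
ps2write_args() {
    local pdf=$1 ps=$2 dpi=${3:-300}
    printf '%s\n' -dSAFER -dNOPAUSE -dBATCH -sDEVICE=ps2write "-r$dpi" "-sOutputFile=$ps" "$pdf"
}

ps2write_args test.pdf test.ps
```

The same arguments can be passed to gswin64c.exe on Windows; trying a lower value such as 150 as the third argument is a cheap way to see how much the resolution affects file size and print time.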
I can't say why the printer becomes so slow with the Ghostscript generated PostScript, but you might want to give other converters a try, like pdftops from the Poppler utils (I found a Windows download here as you seem to be using Windows).

How to convert multiple, different-sized PostScript files to a single PDF?

I'm using a command similar to this:
gswin32c.exe -dNOPAUSE -dBATCH -q -dSAFER -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile="path/output.pdf" <PSfiles>
This gives me a single pdf document with each PS document represented as a page. However, the page sizes do not translate well. The original PS files are all different sizes and each resulting pdf page is cut off to the same size, which looks like landscape A4.
When I convert a single PS file with the exact same command, the page size is preserved. So it seems that because all the PS files are being sent to the same pdf, they must all have the same page size, and I lose content. Is there any way to preserve the document sizes while still using a single command?
Update: I was originally using GS 8.63, but I downloaded 9.06 and have the same issue.
Additionally, I've narrowed the problem down. It seems like there is one specific PS file (call it problemFile.ps) that causes the problem, as I can run the command successfully as long as I exclude problemFile.ps. And it only causes a problem if it is the last file included on the command line. I can't post the entire file, but are there any potential problem areas I should look at?
Update2: Okay, I was wrong in saying there is one specific problem file. It appears that the page size of the last file included on the command line sets the maximum page size for all the resultant pages.
As long as each PostScript file (or indeed each page) actually requests a different media size then the resulting PDF file will honour the requests. I know this at least used to work, I've tested it.
However there are some things in your command line which you might want to reconsider:
1) When investigating problems with GS, don't use -q, this will prevent Ghostscript telling you potentially useful things.
2) DON'T use -dPDFSETTINGS unless you have read the relevant documentation and understand the implications of each parameter setting.
3) You may want to turn off AutoRotatePages, or at least set it to /PageByPage
My guess is that your PostScript files don't request a media size and therefore use the default media. Of course I can't tell without seeing an example.
NB you also don't say what version of Ghostscript you are using.
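One way to test the guess about missing media requests is to search each PostScript file for one. The sketch below is only a heuristic text search for /PageSize or setpagedevice; files can request media in other ways, so a "no" is a hint rather than a verdict. The function name is my own.

```shell
#!/usr/bin/env bash
# Heuristic: does a PostScript file contain an explicit media size request?
# A plain text search can miss requests made indirectly, so "no" is only a hint.
has_media_request() {
    if grep -q -E '/PageSize|setpagedevice' -- "$1"; then
        echo yes
    else
        echo no
    fi
}

# Report on every file named on the command line.
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(has_media_request "$f")"
done
```

Applying the three points above to the original command would mean dropping -q and -dPDFSETTINGS=/prepress and adding -dAutoRotatePages=/PageByPage.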

Resources