acroread pdf to postscript conversion too slow - bash

I am converting a PDF file to PostScript using the acroread command.
The conversion succeeds, but it is very slow and uses almost 100% of the CPU;
because of this my application hangs for a while and no user can do
anything.
The code i am using is:-
processBuilder = new ProcessBuilder("bash","-c","acroread -toPostScript -size "+width+"x"+height+" -optimizeForSpeed sample.pdf");
pp = processBuilder.start();
pp.waitFor();
Is there a way to speed up the process and make it use a smaller percentage of CPU?

I'd suggest you start by running the command under strace to diagnose the problem:
strace -tt -f acroread -toPostScript -size 1000x2500 -optimizeForSpeed sample.pdf
I suspect you may find it spends a lot of time reading font files.
If you have a choice, then Poppler or Xpdf or even Ghostscript would be better-supported and more performant options, especially considering that acroread is now unsupported on Linux.
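If the conversion itself can't be made faster, you can at least stop it starving interactive users by lowering its scheduling priority. A minimal sketch, assuming Linux and the flags from the question (nice is POSIX; ionice is Linux-specific):

```shell
# Run the conversion at the lowest CPU priority (nice 19) so interactive
# processes get the CPU first; ionice -c 3 ("idle" class) additionally
# deprioritises its disk I/O on Linux.
nice -n 19 ionice -c 3 \
  acroread -toPostScript -size 1000x2500 -optimizeForSpeed sample.pdf
```

From Java you can prepend the same words ("nice", "-n", "19", ...) to the ProcessBuilder argument list instead of wrapping everything in bash -c.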

How to convert images with parallel from one directory to another

I am trying to use the following command:
ls -1d a/*.jpg | parallel convert -resize 300x300 {}'{=s/\..*//=}'.png
However, one problem I didn't succeed in solving is getting the output files into folder b rather than the same folder.
I spent quite some time looking for an answer but didn't find one where the files are piped through ls (there are thousands of pictures). I would like to keep the same tools (ls, a pipe, parallel and convert - or mogrify if better).
First, with mogrify:
mkdir -p b # ensure output directory exists
magick mogrify -path b -resize 300x300 a/*.jpg
This creates a single mogrify process that does all the files without the overhead of creating a new process for each image. It is likely to be faster if you have a smallish number of images. The advantage of this method is that it doesn't require you to install GNU Parallel. The disadvantage is that there is no parallelism.
Second, with GNU Parallel:
mkdir -p b # ensure output directory exists
parallel --dry-run magick {} -resize 300x300 b/{/} ::: a/*.jpg
Here {/} means "the filename with the directory part removed". Remove --dry-run once the printed commands look right; GNU Parallel does it all nicely and simply for you.
If your images are large, say 8-100 megapixels, it will definitely be worth using the JPEG "shrink-on-load" feature to reduce disk i/o and memory pressure like this:
magick -define jpeg:size=512x512 ...
in the above command.
This creates a new process for each image, and is likely to be faster if you have lots of CPU cores and lots of images. If you have 12 CPU cores it will keep all 12 busy until all your images are done; you can change the number or percentage of cores used with the -j parameter. The slight performance hit is that a new magick process is created for each image.
Probably the most performant option is to use GNU Parallel for parallelism along with mogrify to amortize process creation across more images, say 32, like this:
mkdir -p b
parallel -n 32 magick mogrify -path b -resize 300x300 ::: a/*.jpg
Note: You should avoid parsing the output of ls; it is error-prone. That is, avoid this:
ls file*.jpg | parallel
You should prefer feeding in filenames like this:
parallel ... ::: file*.jpg
Note: There is a -X option for GNU Parallel which is a bit esoteric and likely to only come into its own with hundreds/thousands/millions of images. It passes as many filenames as possible (in view of command-line length limitations) to each mogrify process, amortizing the process startup costs across even more files. For 99% of use cases the answers I have given should be performant enough.
Note: If your machine doesn't have multiple cores, or your images are very large compared to the installed RAM, or your disk subsystem is slow, your mileage will vary and it may not be worth parallelising your code. Measure and see!
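Since the winner depends on your core count, image sizes and disk, a quick benchmark along these lines settles it for your data (a sketch, assuming the a/ and b/ directories from the question):

```shell
mkdir -p b
# Variant 1: a single mogrify process handles every file
time magick mogrify -path b -resize 300x300 a/*.jpg
# Variant 2: one process per image, spread across all CPU cores
time parallel magick {} -resize 300x300 b/{/} ::: a/*.jpg
```

Run each variant on the same input set and compare the wall-clock ("real") times.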

How can I make bash execute an ELF binary from stdin?

For some obscure reason I have written a bash script which generates some source code, then compiles it, using
... whatever ... | gcc -x c -o /dev/stdout -
Now I want to execute the result of the compilation. How can I make that happen? No use of files, please.
As Charles Duffy said, to execute a binary, you'd have to tell your operating system (which seems to be a Unix variant) to load and execute something – and Unix systems only take files to execute them directly.
What you could do is have a process that prepares a memory region containing the ELF binary, forks and jumps into that region - but even that is questionable, considering that there's CPU support to suppress exactly that operation (W^X). Basically, what you need is a runtime linker, and shells do not (and also: should not) include something like that.
Let's drop the Bash requirement (which really just sounds like you're trying to find an obvious hole in an application that is older than I am grumpy):
Generally, requiring ELF (which is a file format) and avoiding files at the same time is a tad complicated. GCC generates machine code. If you just want to execute known machine code, put it into some buffer, build a function pointer to it and call it. Simple as that. However, you obviously wouldn't get all the nice relocation and dynamic linking that executing an ELF binary or loading a shared object (dlopen) would give you.
If you want that, I'd look in the direction of things like LLVM – I know for a fact that there are people building "I compile C++ at runtime and execute it" systems with LLVM as the executing instance and clang as the compiler. In the end, what your gcc | something pipeline amounts to is really just JIT – an old technology :)
If your goal is to not write to the filesystem at all, then neither bash nor any other UNIX program will be able to help you execute an ELF from a pipe - execve only takes a path to a regular file as its filename and will fail (setting errno to EACCES) if you pass it a special file (device or named pipe) or a directory.
However, if your goal is to keep the executable entirely in RAM and never touch the hard disk (perhaps because the disk is read-only), you can get the same effect on your machine by using tmpfs, which comes with many UNIX-like systems (and is used on Linux to implement POSIX shared memory) and lets you create a full-permissions filesystem that resides entirely in RAM:
$ sudo mount -t tmpfs -o size=10M tmpfs /mnt/mytmpfs
You can then write your binary to that:
... whatever ... | gcc -x c -o /mnt/mytmpfs/program.out -
/mnt/mytmpfs/program.out
and bash will load it for you as if it were on disk.
Note, however, that you do still need enough RAM onboard the device to store and execute the program - though due to the nature of most executable binaries, you would need that anyway.
If you don't want to leave the program behind on your ramdisk (or normal disk, if that is acceptable) for others to find, you can also delete the file immediately after starting to execute it:
/mnt/mytmpfs/program.out &
rm /mnt/mytmpfs/program.out
The name will disappear immediately, but the process internally holds a reference to the file and releases it when it terminates, at which point the storage is actually freed. (Until then the space remains in use, and the program will not be able to re-exec itself by name either.)

Ignore user-input when running a unix command from within Matlab

I have a Matlab program that runs different unix commands fairly often. For this question let's assume that what I'm doing is:
unix('ls test')
It happens to me quite frequently that I accidentally press a key (like Enter or an arrow key), e.g. when I'm waking my display from standby. In theory this shouldn't interfere with the unix command. Unfortunately, though, Matlab takes this input and forwards it right into the execution of the command. The above command then becomes something like this:
unix('ls te^[0Ast')
(Side note: 0A is the hex code of the linefeed character)
Obviously, this will produce an error.
Does anyone have an idea how to work around this issue?
I was thinking that there might be a way to start Matlab with my script in a way that doesn't forward any user input from within the unix shell.
#!/bin/bash
matlab -nodisplay -nosplash -r "runMyScript();"
Can I somehow pipe the user-input somewhere else and isolate Matlab from any sort of input?
That is not a very specific question, but let me try; I can see several options. I am assuming that Matlab runs as a text-terminal application.
There is the nohup(1) command. Since you use Linux, chances are you have the non-POSIX version of it, whose man page says: "If standard input is a terminal, redirect it from /dev/null."
$ nohup matlab -nodisplay -nosplash -r "runMyScript();"
You can redirect from /dev/null yourself:
$ matlab -nodisplay -nosplash -r "runMyScript();" < /dev/null
But matlab can actually re-open its stdin, ignoring what you piped into it (ssh does that, for example; you can't use echo password | ssh somewhere).
If you are running in a graphical environment you may want to minimise the window, so that it does not receive any input. Probably not your case, but you would know :)
You may also try to wake the display by hitting "Ctrl", a similar modifier key, or the mouse, rather than a key that produces input.
You may run matlab under the screen(1) command, then disconnect from the screen session or switch to a different window. Screen is a program that lets you create virtual terminals (similar to virtual desktops in a GUI). If you haven't heard of screen, I suggest you look at some tutorials; googling for "gnu screen tutorial" turns up quite a few.
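As a minimal sketch of the screen approach (the session name mlrun is arbitrary, and the matlab invocation is taken from the question):

```shell
# Start matlab in a detached screen session; keystrokes at your normal
# terminal can no longer reach its stdin.
screen -dmS mlrun matlab -nodisplay -nosplash -r "runMyScript();"

screen -ls          # list sessions to see whether it is still running
screen -r mlrun     # reattach; press Ctrl-a then d to detach again
```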

gdalinfo - how to pause the outputting data

I am using GDAL. In the command prompt, I am doing
$ gdalinfo (my file location)
It works, but because it is a huge file the command prints a lot of information. I am only interested in what's near the beginning. The command prompt only allows scrolling back through the last 1000 or so lines (the command must print about 100,000 lines). How can I do this?
This will depend on the OS and the utilities it provides. I am assuming you are using a POSIX OS which supports pipes and provides utilities such as less/more. The command in this case would be:
$ gdalinfo file.tif | less
If less is unavailable, you may have the more command installed. You can also save the output from gdalinfo into a file and look at the file later.
$ gdalinfo file.tif > output.txt
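Since you only care about the beginning of the report, you can also cut it off with head instead of paging through it (a generic pipeline; 40 is an arbitrary line count):

```shell
# head exits after printing 40 lines; gdalinfo then receives SIGPIPE
# and stops, so the huge remainder of the report is never generated.
gdalinfo file.tif | head -n 40
```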
On Windows, I can get a paginated response like this:
C:\Users\elijah>gdalinfo "C:\xData\GreeneCountyMo\ortho_1-1_1n_s_mo077_2010_1.sid" | more
(Use ENTER/RETURN to advance one line, SPACE to advance a page, and CTRL+C to quit when you're finished.)
Or I can do the outfile as well:
C:\Users\elijah>gdalinfo "C:\xData\ortho_1-1_1n_s_mo077_2010_1.sid" > "C:\xData\gdalinfo.txt"
If you are on a Windows machine... what type of file are you using? Perhaps it contains a lot of ground control points, which you can skip with the -nogcp flag, or skip the metadata with the -nomd flag (see http://www.gdal.org/gdalinfo.html). Also see --help-general; you might have the --debug flag set to on.

How do I call Matlab in a script on Windows?

I'm working on a project that uses several languages:
SQL for querying a database
Perl/Ruby for quick-and-dirty processing of the data from the database and some other bookkeeping
Matlab for matrix-oriented computations
Various statistics languages (SAS/R/SPSS) for processing the Matlab output
Each language fits its niche well and we already have a fair amount of code in each. Right now, there's a lot of manual work to run all these steps that would be much better scripted. I've already done this on Linux, and it works relatively well.
On Linux:
matlab -nosplash -nodesktop -r "command"
or
echo "command" | matlab -nosplash -nodesktop
...opens Matlab in a "command line" mode. (That is, no windows are created -- it just reads from STDIN, executes, and outputs to STDOUT/STDERR.) My problem is that on Windows (XP and 7), this same code opens up a window and doesn't read from / write to the command line. It just stares me blankly in the face, totally ignoring STDIN and STDOUT.
How can I script running Matlab commands on Windows? I basically want something that will do:
ruby database_query.rb
perl legacy_code.pl
ruby other_stuff.rb
matlab processing_step_1.m
matlab processing_step_2.m
# etc, etc.
I've found out that Matlab has an -automation flag on Windows to start an "automation server". That sounds like overkill for my purposes, and I'd like something that works on both platforms.
What options do I have for automating Matlab in this workflow?
matlab -nosplash -nodesktop -r "command"
works on Windows as well. Yes, it opens another window, but that is not a problem; I run it in batch mode from a Java wrapper on a Tomcat server and there have been no issues. Put all the commands into a script file, don't forget to close the session with the exit command, and use the -r flag. You may also find the -noFigureWindows and -wait parameters useful. This works on both Windows and Unix; you can use platform-specific flags, and any that are not supported will be ignored.
See "Start MATLAB program (Windows platforms)" in the MATLAB documentation.
There is also a way to hide the Matlab window using the C Engine library; see engSetVisible.
