GhostScript output files with original file name

I'm converting a PDF file to separate images using this command:
-sDEVICE=jpeg -o page-%02d.png X.pdf
This outputs the files as:
page-01.jpeg, page-02.jpeg, and so on.
However, I want to output the files with this file name:
X-page-01.jpeg, X-page-02.jpeg, and so on.
Is it possible to do this with GhostScript?

Well yes, clearly it's possible:
-sDEVICE=jpeg -o X-page%02d.jpeg X.pdf
(I presume that you actually meant page-%02d.jpeg, rather than .png, since you are specifying the jpeg device).
No, Ghostscript won't automagically prepend the input filename to the output filename, you have to do that yourself.
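If you are converting many PDFs, a short shell loop can do the prepending for you. A minimal sketch, assuming gs is on your PATH and the PDFs sit in the current directory:
for f in *.pdf; do
    gs -sDEVICE=jpeg -o "${f%.pdf}-page-%02d.jpeg" "$f"
done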

Related

pdftk update_info command raising a warning which I don't understand

I'm trying to use the update_info command in order to add some bookmarks to an existing PDF's metadata, using pdftk and PowerShell.
I first dump the metadata into a file as follows:
pdftk .\test.pdf dump_data > test.info
Then, I edit the test.info file by adding the bookmarks, I believe I am using the right syntax. I save the test.info file and attempt to write the metadata to a new pdf file using update_info:
pdftk test.pdf update_info test.info output out.pdf
Unfortunately, I get a warning as follows:
pdftk Warning: unexpected case 1 in LoadDataFile(); continuing
out.pdf is generated, but contains no bookmarks. Just to be sure it is not a syntax problem, I also ran it without editing the metadata file, by simply overwriting the same metadata. I still got the same warning.
Why is this warning occurring? Why are no bookmarks getting written to my resulting pdf?
Using redirection in that fashion,
pdftk .\test.pdf dump_data > test.info
is a known cause of this problem: the shell's redirection re-encodes the dumped data (PowerShell's > writes UTF-16 by default), producing a file pdftk cannot parse. Change it to
pdftk .\test.pdf dump_data output test.info
In addition, check that your alterations are correctly balanced (and contain no unusual characters), then save the edited output file in the same encoding.
Note: you may need to use dump_data_utf8 and update_info_utf8 in order to properly handle characters in scripts other than Latin (e.g. CJK).
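The UTF-8 variants are drop-in replacements for the commands above:
pdftk test.pdf dump_data_utf8 output test.info
pdftk test.pdf update_info_utf8 test.info output out.pdf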
I used pdftk --help > pdftk-help.txt to find the answer.
With credit to the previous answer, the following creates a text file of the information parameters:
pdftk aaa.pdf dump_data output info.txt
Edit the info.txt file as needed.
The pdftk update_info option creates a new PDF file, leaving the original PDF untouched. Use:
pdftk aaa.pdf update_info info.txt output bbb.pdf
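For reference, each bookmark in the dumped data file is a block like this (the title, level, and page number here are illustrative):
BookmarkBegin
BookmarkTitle: Chapter 1
BookmarkLevel: 1
BookmarkPageNumber: 5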

Batch Convert Files Different Output Folder

Goal is to convert all .wav files to .mp3 in a different location.
The following code works, but creates output files in the same directory.
All the newly created .mp3's are right alongside the .wav's.
for file in /path/to/*.wav; do lame --preset insane "$file" "${file%.wav}".mp3; done
How can I use the terminal to convert a drive full of .wav's with lame and output the .mp3's to a different drive? I've tried changing lame's output path, but this syntax grabs the entire filename. Looking for the simplest solution.
From the lame manual, the synopsis is very straightforward:
lame [options] <infile> <outfile>
Assuming that the output files should be placed in /output, it is possible to extend the loop to compute the output file name with a 'basename'-style parameter expansion:
OUT=/output
for file in /path/to/*.wav; do
    # Replace the .wav suffix with .mp3
    out=${file%.wav}.mp3
    # Remove the directory part (everything up to the last '/')
    out=${out##*/}
    lame --preset insane "$file" "$OUT/$out"
done

Extracting specific files with file extension from a .tar.xz archive using MacOS terminal

I have a number of compressed archives with the extension .tar.xz. I am advised that, when decompressed, the total size required is around 2TB.
Within the archives are a number of images, which are all I am after.
Is there a method to extract only the files with certain extensions, for example .jpeg and .gif, from the compressed archives without having to extract every file?
Thanks
It's trivial to just extract one of the file types; for example:
tar -xJf archive.tar.xz '*.jpeg'
will extract all files with the .jpeg extension. It's important to quote the *, as otherwise the shell would attempt to expand it before tar runs, and would try to match only files in the current directory (or fail because no files matched the pattern).
You can similarly use other patterns like '*.gif', or both together:
tar -xJf archive.tar.xz '*.jpeg' '*.gif'
Because you tagged the question OSX, I'll skip the --wildcards option, which GNU tar needs for this kind of pattern matching under Linux; the BSD tar that ships with macOS matches wildcards by default.
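Since you have a number of archives, you could wrap this in a loop. A minimal sketch, assuming the archives sit in the current directory:
for a in *.tar.xz; do
    tar -xJf "$a" '*.jpeg' '*.gif'
done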

shellscript to convert .TIF to a .PDF

I want to progress through a directory's subdirectories and either convert or place the .TIF images into a PDF. I have a directory structure like this:
folder
    item_one
        file1.TIF
        file2.TIF
        ...
        fileN.TIF
    item_two
        file1.TIF
        file2.TIF
        ...
    ...
I'm working on a Mac and considered using sips to change my .TIF files to .PNG files and then use pdfjoin to join all the .PNG files into a single .PDF file per folder.
I have used:
for filename in *; do sips -s format png "$filename" --out "$filename.png"; done
but this only works for the .TIF files in a single directory. How would one write a shellscript to progress through a series of directories as well?
Once the .PNG files were created, I'd do essentially the same thing but using:
pdfjoin --a4paper --fitpaper false --rotateoversize false *.png
Is this a valid way of doing this? Is there a better, more efficient way of performing such an action? Or am I being an idiot and should be doing this with some sort of software, like ImageMagick or something?
Try using the find command with the -exec option to call your image conversion tool. Alternatively, instead of -exec, you could pipe the output of find to xargs. There is lots of information online about using find, including many worked examples on Stack Overflow.
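For instance, here is a minimal sketch of that approach, reusing the sips and pdfjoin commands from the question (the top-level directory name folder is taken from the layout above; adjust paths to taste):
# Convert every .TIF under folder/ to a .png alongside it
find folder -type f -name '*.TIF' -exec sh -c 'sips -s format png "$1" --out "${1%.TIF}.png"' _ {} \;
# Then join the .png files in each subdirectory into one PDF
for dir in folder/*/; do
    (cd "$dir" && pdfjoin --a4paper --fitpaper false --rotateoversize false *.png)
done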
As far as the image conversion, I think that really depends on your requirements for speed and efficiency. If you've verified the process you described, and this is a one-time process, and it only takes seconds or minutes to run, then you're probably fine. On the other hand, if you need to do this frequently, then it might be worth investing the time to find a one-step conversion solution that takes less time than your current, two-pass solution.
Note that, instead of two passes, you may be able to pipe the output of sips to pdfjoin; however, that would require some investigation to verify.

Listing the contents of a LZMA compressed file?

Is it possible to list the contents of an LZMA-compressed file (.7z) without uncompressing the whole file? Also, can I extract a single file from it?
My problem: I have a 30GB .7z file that uncompresses to >5TB. I would like to manipulate the original .7z file without needing to do a full uncompress.
Yes. Start with XZ Utils. There are Perl and Python APIs.
You can find the file you want from the headers. Each file is compressed separately (unless the archive was created in solid mode), so you can extract just the one you want.
Download lzma922.tar.bz2 from the LZMA SDK files page on SourceForge, then extract the files and open C/Util/7z/7zMain.c. There you will find routines to extract a specific file from a .7z archive. You don't need to extract all the data from all the entries; the example code shows how to extract just the one you are interested in. The same code has logic to list the entries without extracting all the compressed data.
I solved this problem by installing 7-Zip (https://www.7-zip.org/) and using the l (list) command. For example:
7z l file.7z
The output has some descriptive information and the list of files inside the archive. I then call this from Python using the subprocess library:
import subprocess

# Run "7z l" and read the listing that 7z prints to stdout
proc = subprocess.Popen(["7z", "l", "file.7z"], stdout=subprocess.PIPE)
output = proc.stdout.read().decode("utf-8")
Don't forget to make sure the 7z program is accessible via your PATH variable; I had to set this up manually on Windows.
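To answer the second part of the question, 7z can also extract a single file by name without unpacking the rest of the archive; the inner path here is illustrative:
7z x file.7z "images/photo001.jpg"
The x command preserves the file's directory structure from inside the archive; use e instead to extract it flat into the current directory.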
