I want to walk through a directory's subdirectories and either convert or place the .TIF images into a PDF. I have a directory structure like this:
folder
    item_one
        file1.TIF
        file2.TIF
        ...
        fileN.TIF
    item_two
        file1.TIF
        file2.TIF
        ...
    ...
I'm working on a Mac and considered using sips to change my .TIF files to .PNG files and then use pdfjoin to join all the .PNG files into a single .PDF file per folder.
I have used:
for filename in *; do sips -s format png $filename --out $filename.png; done
but this only works for the .TIF files in a single directory. How would one write a shell script that walks through a series of directories as well?
Once the .PNG files were created, I'd do essentially the same thing but using:
pdfjoin --a4paper --fitpaper false --rotateoversize false *.png
Is this a valid way of doing this? Is there a better, more efficient way of performing such an action? Or am I being an idiot and should be doing this with some sort of software, like ImageMagick or something?
Try using the find command with its -exec switch to call your image conversion solution. Alternatively, instead of -exec, you could pipe the output of find to xargs. There is lots of information online about using find. Here's one example from StackOverflow.
As far as the image conversion, I think that really depends on your requirements for speed and efficiency. If you've verified the process you described, and this is a one-time process, and it only takes seconds or minutes to run, then you're probably fine. On the other hand, if you need to do this frequently, then it might be worth investing the time to find a one-step conversion solution that takes less time than your current, two-pass solution.
Note that, instead of two passes, you may be able to pipe the output of sips to pdfjoin; however, that would require some investigation to verify.
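For example, here is a minimal sketch of that approach, assuming the directory layout from the question and reusing the same sips and pdfjoin invocations the asker describes (treat it as a starting point rather than a verified script):

#!/bin/bash
# Pass 1: convert every .TIF under "folder" to .png, leaving the output next to the original
find folder -type f -name '*.TIF' | while read -r f; do
    sips -s format png "$f" --out "$f.png"
done

# Pass 2: join the .png files in each subdirectory into one PDF
for dir in folder/*/; do
    ( cd "$dir" && pdfjoin --a4paper --fitpaper false --rotateoversize false *.png )
done

As a one-step alternative, ImageMagick's convert can usually turn a directory's .TIF files directly into a single PDF (convert *.TIF out.pdf), which would skip the intermediate PNG stage entirely.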
Related
Goal is to convert all .wav files to .mp3 in a different location.
The following code works, but creates output files in the same directory.
All the newly created .mp3 files end up right alongside the .wav files.
for file in /path/to/*.wav; do lame --preset insane "$file" "${file%.wav}".mp3; done
How can I use the terminal to convert a drive full of .wav files with lame and output the .mp3 files to a different drive? I've tried changing lame's output, but this syntax grabs the entire filename (including the path). Looking for the simplest solution.
From the lame manual, the synopsis is very straightforward:
lame [options] <infile> <outfile>
Found the basic concept here
Assuming that the output files should be placed in /output, it's possible to extend the loop to compute the output file name using the basename:
OUT=/output
for file in /path/to/*.wav; do
    # Replace the .wav extension with .mp3
    out=${file%.wav}.mp3
    # Remove the directory part (everything up to the last '/')
    out=${out##*/}
    lame --preset insane "$file" "$OUT/$out"
done
I regularly scan my homework for class. My scanner exports raw .jpg files to USB, and from there I can use GIMP to edit and save the files as a PDF. One time-saver I've found is to export my multi-page homework as a .mng file and then use the convert command to turn it into a PDF. I do it this way because GIMP automatically merges all layers when exporting to a PDF.
convert HW.mng HW.pdf
This works well for individual files, but at the end of every week I can have dozens of files to convert.
I have tried using wildcards in the filenames for convert:
convert *.mng *.pdf
This always runs successfully and never throws an error, but never produces any pdfs.
Both
convert HW*.mng HW*.pdf
and
convert "HW*.mng" "HW*.pdf"
yield the error
convert: unable to open image `HW*.pdf': Invalid argument # error/blob.c/OpenBlob/2712.
which I think means the error lies in exporting with a wildcard.
Is there any way to convert all of a specific file type to another using convert? Or should I try using a different program?
You can see this StackExchange post. The accepted answer basically does what you want.
for file in *.mng; do convert "$file" "${file%.mng}.pdf"; done
For convert in particular, use mogrify (which is part of ImageMagick as well) as suggested by Mark Setchell in a comment. mogrify can be used to edit/convert files in batches. The command for your case would be
mogrify -format pdf -- *.mng
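If you would rather keep the generated PDFs out of the directory with the scans, mogrify also accepts a -path option; a small, untested sketch (the output directory name is arbitrary):

mkdir -p pdf_out
mogrify -path pdf_out -format pdf -- *.mng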
I have an app that has decided to die; it had a library of images that it stored on my hard drive in a series of GUID-like folders. The files themselves have no file extensions; there must have been an internal database (unrecoverable/corrupt) that associated each file with its name/extension/MIME type. So to get my stuff back out, I'd like to search the disk to at least identify which of the files are images (JPEG and PNG files). I know that both JPEG and PNG have particular byte sequences in the first few bytes of the file. Is there a grep-like command that can match these known byte sequences in the first few bytes of each file in the massively nested file system structure that I have (e.g. folders 0 through f, each containing folders 0 through f, nested several levels deep, with files with UID filenames)?
Starting at the current directory .:
find . -type f -print0 | xargs -J fname -0 -P 4 identify -ping fname 2>|/dev/null
This will print the files that ImageMagick can identify, which are mostly images, though there are exceptions (such as txt files). ImageMagick is not particularly fast at this task either, so depending on what you have available there may be faster alternatives. For instance, the PIL package for Python would make this faster simply because it supports fewer image formats, which might still be enough for your task.
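If ImageMagick turns out to be too slow, another lightweight sketch is to let file(1) inspect the magic bytes and filter on the reported MIME type (this assumes your file supports --mime-type, which it does on most modern systems):

# List every file under the current directory whose detected type is JPEG or PNG
find . -type f -exec file --mime-type {} + | grep -E ': image/(jpeg|png)$'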
I have a shell script. A cron job runs it once a day. At the moment it just downloads a file from the web using wget, appends a timestamp to the filename, then compresses it. Basic stuff.
This file doesn't change very frequently though, so I want to discard the downloaded file if it already exists.
Easiest way to do this?
Thanks!
Do you really need to compress the file?
wget provides -N, --timestamping, which turns on time-stamping. To see what that does, say your file is located at www.example.com/file.txt.
The first time you do:
$ wget -N www.example.com/file.txt
[...]
[...] file.txt saved [..size..]
The next time it'll be like this:
$ wget -N www.example.com/file.txt
Server file no newer than local file “file.txt” -- not retrieving.
Except if the file on the server was updated.
That would solve your problem, if you didn't compress the file.
If you really need to compress it, then I guess I'd go with comparing the hash of the new file/archive against the old one. What matters in that case is: how big is the downloaded file? Is it worth compressing it first and then checking the hashes? Is it worth decompressing the old archive and comparing the hashes? Is it better to store the old hash in a text file? Do any of these have an advantage over simply overwriting the old file?
Only you can know that; run some tests.
So if you go the hash way, consider sha256 and xz (lzma2 algorithm) compression.
I would do something like this (in Bash):
newfilesum="$(wget -q www.example.com/file.txt -O- | tee file.txt | sha256sum)"
oldfilesum="$(xzcat file.txt.xz | sha256sum)"
if [[ $newfilesum != $oldfilesum ]]; then
    xz -f file.txt # overwrite with the new compressed data
else
    rm file.txt
fi
and that's it.
Calculate a hash of the content of the file and check it against the new one. Use, for instance, md5sum; you only have to save the last MD5 sum to check whether the file changed.
Also, take into account that the web is evolving to give more information on pages, that is, metadata. A well-designed web site should include the file version and/or the date of modification (or a valid Expires header) as part of the response headers. This, among other things, is part of what makes Web 2.0 scalable.
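For instance, a quick way to peek at that metadata before deciding whether to download at all (assuming the server actually sends a Last-Modified header and that curl is available):

# HEAD request only; prints the Last-Modified header if the server provides one
curl -sI http://www.example.com/file.txt | grep -i '^last-modified:'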
How about downloading the file, and checking it against a "last saved" file?
For example, the first time it downloads myfile, it saves it as myfile-[date] and compresses it. It also adds a symbolic link, such as lastfile, pointing to myfile-[date]. The next time the script runs, it can check whether the contents of whatever lastfile points to are the same as the newly downloaded file.
Don't know if this would work well, but it's what I could think of.
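A rough, untested sketch of that idea (the file names follow the description above; the date format and the use of gzip are my own assumptions):

#!/bin/bash
# Download to a temporary name, then compare against whatever lastfile points to
wget -q www.example.com/file.txt -O newfile

if [ -e lastfile ] && cmp -s newfile lastfile; then
    rm newfile                      # unchanged: discard the download
else
    name="myfile-$(date +%F)"       # myfile-[date], as described above
    mv newfile "$name"
    gzip -c "$name" > "$name.gz"    # compressed copy for archiving
    ln -sfn "$name" lastfile        # remember what was saved last
fi

Note that this keeps the uncompressed copy around so the next run has something to compare against; if disk space matters, you could instead store just a checksum, as the other answers suggest.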
You can compare the new file with the last one using the sum command, which computes a checksum of the file. If both files have the same checksum, they are very, very likely to be exactly the same. There's another command called md5 that computes an MD5 fingerprint, but sum is available on practically all systems.
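For example (reading via stdin so the file names themselves don't end up in the checksum output; the file names here are placeholders):

# Compare checksums of the previous copy and the new download
if [ "$(sum < oldfile)" = "$(sum < newfile)" ]; then
    echo "file unchanged"
fi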
I would like to scan text files in Matlab with the textscan function. Before I can open a text file with fid = fopen('C:\path'), I need to unzip it first. The files have the extension *.gz.
There are thousands of files which I need to analyze and high performance is important.
I have two ideas:
(1) Use an external program an call it from the command line in Matlab
(2) Use a Matlab 'zip' toolbox. I have heard of gunzip, but don't know about its performance.
Does anyone know a way to unzip these files as quickly as possible from within Matlab?
Thanks!
You could always try the Matlab unzip() function:
unzip
Extract contents of zip file
Syntax
unzip(zipfilename)
unzip(zipfilename, outputdir)
unzip(url, ...)
filenames = unzip(...)
Description
unzip(zipfilename) extracts the archived contents of zipfilename into the current folder and sets the files' attributes, preserving the timestamps. It overwrites any existing files with the same names as those in the archive if the existing files' attributes and ownerships permit it. For example, files from rerunning unzip on the same zip filename do not overwrite any of those files that have a read-only attribute; instead, unzip issues a warning for such files.
Internally, this uses Java's zip library org.apache.tools.zip. If your zip archives each contain many text files, it might be faster to drop down into Java and extract them entry by entry, without explicitly creating unzipped files on disk. Look at the source of unzip.m to get some ideas, and also at the Java documentation.
I've found 7zip-commandline (Windows) / p7zip (Unix) to be somewhat speedier for this.
[edit] From some quick testing, it seems making a system call to gunzip is faster than using MATLAB's native gunzip. You could give that a try as well.
Just write a new function that imitates basic MATLAB gunzip functionality:
function [] = sunzip(fullfilename,output_dir)
if ~exist('output_dir','var'), output_dir = fileparts(fullfilename); end
app_path = '/usr/bin/7za';
switches = ' e'; %extract files ignoring directory structure
options = [' -o' output_dir];
system([app_path switches options '_' fullfilename]);
Then use it as you would use gunzip:
sunzip('/data/time_1000.out.gz',tmp_dir);
With MATLAB's toc timer, I get the following extraction times with 6 uncompressed 114MB ASCII files:
gunzip: 10.15s
sunzip: 7.84s
This worked well; I just needed a minor change to Max's syntax for calling the executable:
system([app_path switches ' ' fullfilename options ]);