Bash foreach on cronjob - bash

I am trying to create a "watch" folder into which I can copy 2 sets of files that share the same name but have different file extensions. I have a program that needs to reference both files, and since they have the same name, differing only by extension, I figured I might be able to do something like this with a cron job:
cronjob.sh:
#!/bin/bash
ls *.txt > processlist.txt
for filename in `cat processlist.txt`; do
/usr/local/bin/runcommand -input1=/home/user/process/$filename \
-input2=/home/user/process/strsub($filename, -4)_2.stl \
-output /home/user/process/done/strsub($filename, -4)_2.final;
echo "$filename finished processing"
done
But that is PHP's substr function, not Bash. What would be the right way of doing this?

strsub($filename, -4)
in Bash is
${filename:(-4)}
See Shell Parameter Expansion.
Your command could then look like this:
/usr/local/bin/runcommand "-input1=/home/user/process/$filename" \
"-input2=/home/user/process/${filename:(-4)}_2.stl" \
-output "/home/user/process/done/${filename:(-4)}_2.final"
Note: Prefer wrapping arguments that contain variables in double quotes to prevent word splitting and pathname expansion. This matters for filenames with spaces.
It would also be better to pass your glob pattern directly to for, so that each filename becomes exactly one token instead of being broken up by word splitting:
for filename in *.txt; do
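A caveat worth checking before relying on this: like PHP's substr($filename, -4), ${filename:(-4)} keeps the last four characters rather than removing them. Here is a quick check with a made-up filename, alongside a substring form that drops the last four instead:

```bash
#!/bin/bash
filename='scan01.txt'   # hypothetical example name

# Substring starting 4 characters from the end: keeps the last 4.
echo "${filename:(-4)}"                  # .txt

# Substring of length (total length - 4) from the start: drops the last 4.
echo "${filename:0:${#filename}-4}"      # scan01
```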

So Konsolebox's solution was almost right, but ${filename:(-4)} only returns the last 4 characters of the variable instead of trimming the last 4 off. What I did was change it to ${filename%.txt}, where %.txt matches the text I want to find and remove from the end, and then just tacked .mp3 on at the end to change the extension.
His other suggestion of using this for loop was also much better than mine:
for filename in *.txt; do
The only other modification was putting the full command all on one line in the end. I divided it up here to make sure it was all easily visible.
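Putting the pieces of this thread together, a cron-friendly version might look like the sketch below. Assumptions, since the thread doesn't pin them down: the partner file is named <name>_2.stl and the output goes to done/<name>_2.final (the question's naming, whereas this answer mentions .mp3), and process_dir with its optional command argument is a made-up wrapper so the loop can be tried with echo in place of runcommand. It also skips pairs whose output already exists, on the assumption that cron reruns the script regularly.

```bash
#!/bin/bash
# Sketch of cronjob.sh built on a glob loop and ${txt%.txt}.
# Assumptions: partner file is <name>_2.stl, output is done/<name>_2.final;
# runcommand and its flags are copied from the question.
shopt -s nullglob   # if no .txt files exist, the loop runs zero times

process_dir() {
  local dir=$1 cmd=${2:-/usr/local/bin/runcommand}
  local txt base
  for txt in "$dir"/*.txt; do
    base=${txt%.txt}                                    # full path minus ".txt"
    [[ -e ${base}_2.stl ]] || continue                  # partner not copied in yet
    [[ -e $dir/done/${base##*/}_2.final ]] && continue  # done on an earlier run
    "$cmd" "-input1=$txt" "-input2=${base}_2.stl" \
      -output "$dir/done/${base##*/}_2.final"
    echo "${txt##*/} finished processing"
  done
}
```

In cronjob.sh the last line would then be process_dir /home/user/process. Globbing "$dir"/*.txt rather than relying on the current directory matters under cron, which usually starts jobs in the user's home directory.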

Related

How to remove all file extensions in bash?

x=./gandalf.tar.gz
noext=${x%.*}
echo $noext
This prints ./gandalf.tar, but I need just ./gandalf.
I might have even files like ./gandalf.tar.a.b.c which have many more extensions.
I just need the part before the first .
If you want to give sed a chance then:
x='./gandalf.tar.a.b.c'
sed -E 's~(.)\..*~\1~g' <<< "$x"
./gandalf
Or a two-step process in pure Bash:
y="${x#./}"
echo "./${y%%.*}"
./gandalf
Using extglob shell option of bash:
shopt -s extglob
x=./gandalf.tar.a.b.c
noext=${x%%.*([!/])}
echo "$noext"
This deletes the longest trailing substring that starts with a . and contains no / character, i.e. everything from the first . in the last path component onward. It also works for x=/pq.12/r/gandalf.tar.a.b.c
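The same expansion can be wrapped in a small function for reuse. This is a sketch, and the name strip_all_ext is made up. As with the original, a hidden file such as /home/u/.bashrc is stripped down to /home/u/, because its leading dot counts as the first . in the name.

```bash
#!/bin/bash
shopt -s extglob    # enable *(...) patterns before the function is defined and used

# Remove every extension: the longest suffix that starts with "." and has no "/".
strip_all_ext() {
  local x=$1
  printf '%s\n' "${x%%.*([!/])}"
}
```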
Perhaps a regexp is the best way to go if your bash version supports it, as it doesn't fork new processes.
This regexp works with any prefix path and takes into account files with a dot as first char in the name (hidden files):
[[ "$x" =~ ^(.*/|)(.[^.]*).*$ ]] && \
noext="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
Regexp explained
The first group captures everything up to and including the last / (regexes in bash are greedy), or nothing if there is no / in the string.
The second group then captures everything up to, but not including, the next . after the filename's first character.
The rest of the string is not captured, as we want to get rid of it.
Finally, we concatenate the path and the stripped name.
Note
It's not clear what you want to do with files beginning with a . (hidden files). I modified the regexp to preserve that . if present, as it seemed the most reasonable thing to do. E.g.
x="/foo/bar/.myinitfile.sh"
becomes /foo/bar/.myinitfile.
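A function wrapper for the regex approach, as a sketch (strip_ext_keep_hidden is a hypothetical name). It swaps the answer's (.*/|) for the optional group (.*/)?, which captures the same optional directory prefix, and keeps the regex in a variable, the usual way to avoid quoting trouble with =~.

```bash
#!/bin/bash
# Keep the directory prefix and the name up to its first "." (a leading "." is kept).
strip_ext_keep_hidden() {
  local re='^(.*/)?(.[^.]*).*$'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$1"   # only the empty string fails to match
  fi
}
```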
If performance is not an issue, something like this also works:
fil=$(basename "$x")
noext="$(dirname "$x")"/${fil%%.*}

Downloading a list of files with WGET - rename files up to .jpg ie. get rid of extraneous text

My problem is pretty straightforward to understand.
I have images.txt, which is a list of newline-separated URLs pointing to .jpg files, formatted as follows:
https://region.URL.com/files/2/2f/dir/2533x1946_IMG.jpg?Tag=2&Policy=BLAH__&Signature=BLAH7-BLAH-BLAH__&Key-Pair-Id=BLAH
I can download them successfully with wget -i, but they end up named like 2533x1946_IMG.jpg?BLAH_BLAH_BLAH_BLAH when I need them named like this instead: 2533x1946_IMG.jpg
Note that I've already tried the popular solutions to no avail (see below), so I'm thinking more along the lines of a solution that would involve sed, grep, and awk.
wget --content-disposition -i images.txt
wget --trust-server-names -i images.txt
wget --metalink-over-http --trust-server-names --content-disposition -i images.txt
wget --trust-server-names --content-disposition -i images.txt
and more iterations like this based on those three flags....
I'd ideally like to do it with one command, but downloading the files as-is and later running a recursive command that renames them to the 2533x1946_IMG.jpg format is acceptable too.
1) You can use rename (the Perl version) as a one-liner to rename all the files:
rename -n 's/[?].*//' *_BLAH
rename takes an expression of the form 's/pattern/replacement/'.
rename applies that Perl regular expression to each file name you give it. Because your names are very specific, the part to remove is easy to select: it starts at the ? character, and because ? has a special meaning in regular expressions, you put it in brackets [ ]. The end result is [?].
The -n option only shows what would change, without renaming anything. Remove -n to apply the changes.
.* selects everything after the ?, so BLAH_BLAH_BLAH_BLAH.
The empty replacement between the last two slashes (//) deletes whatever was selected.
*_BLAH selects all files whose names end with _BLAH. You could use * instead, but you may have other files or folders in the same place, so this is safer.
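If the Perl rename is not installed, a plain Bash loop can make the same edit to the files in one directory. A sketch, with strip_query_names as a hypothetical name:

```bash
#!/bin/bash
# Strip everything from the first "?" in each file name in directory $1.
# The body runs in a subshell, so the cd does not affect the caller.
strip_query_names() (
  cd -- "$1" || exit 1
  for f in *\?*; do                  # \? makes the glob match a literal question mark
    [[ -e $f ]] || continue          # no match: the unexpanded pattern comes back, skip it
    [[ -e ${f%%\?*} ]] || mv -- "$f" "${f%%\?*}"   # never overwrite an existing file
  done
)
```

Usage: strip_query_names /path/to/downloads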
find . \
-name '*[?]*' \
-exec bash -c $'for f; do mv -- "$f" "${f%%\'?\'*}"; done' _ {} +
Why *[?]*? That prevents the ? from being treated as a single-character wildcard, and instead ensures that it only matches itself.
Why $'...\'?\'...'? The $'...' ANSI-C quoting form processes backslash escapes, so literal ' characters can be written inside what is otherwise a single-quoted string.
Why bash -c '...' _ {} +? Unlike approaches that substitute the found filenames into the code to be executed, this keeps those names out-of-band from the code, preventing shell injection attacks via hostile filenames. The _ placeholder fills in $0, so subsequent arguments become $1 and onward, and the for loop iterates over them (for f; do is the same as for f in "$@"; do).
What does ${f%%'?'*} do? This parameter expansion expands to $f with the longest suffix matching the glob pattern '?'* removed, i.e. everything from the first ? onward.
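Alternatively, the names can be fixed at download time by deriving each one from its URL and passing it to wget -O. A sketch; url_to_name and download_all are made-up helper names, and it assumes images.txt holds one URL per line:

```bash
#!/bin/bash
# Name a download after the last path segment of its URL, minus the query string.
url_to_name() {
  local name=${1%%\?*}          # drop ?Tag=...&Key-Pair-Id=... first
  printf '%s\n' "${name##*/}"   # then keep only the part after the last /
}

# Download every URL listed in file $1, one per line.
download_all() {
  local list=$1 url
  while IFS= read -r url || [[ -n $url ]]; do   # also handle a last line without newline
    [[ -n $url ]] || continue                   # skip blank lines
    wget -O "$(url_to_name "$url")" "$url"
  done < "$list"
}
```

Call it as download_all images.txt. The query string is stripped before taking the last path segment because a query may itself contain a /.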

bash - mass renaming files with many special characters

I have a lot of files (in single directory) like:
[a]File-. abc'.d -001[xxx].txt
so there are many spaces, apostrophes, brackets, and full stops. The only differences between them are numbers in place of 001, and letters in place of xxx.
How can I remove the middle part so that all that remains is
[a]File-001[xxx].txt
I'd like an explanation how such code would work, so I could adapt it for other uses, and hopefully help answer others similar questions.
Here is a simple script in pure bash:
for f in *; do # for all entries in the current directory
if [ -f "$f" ]; then # if the entry is a regular file (i.e. not a directory)
mv "$f" "${f/-*-/-}" # rename it by removing everything between two dashes
# and the dashes, and replace the removed part
# with a single dash
fi
done
The magic in the "${f/-*-/-}" expression is described in the Bash manual (run info bash), in section 3.5.3 Shell Parameter Expansion: ${parameter/pattern/string} replaces the longest match of pattern with string, so here -*- spans from the first dash to the last one.
The * pattern in the first line of the script can be replaced with anything that narrows down the list of files you want to rename, e.g. *.txt, *File*.txt, etc.
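To see what the loop would do before it renames anything, the mv can be swapped for a printf. A dry-run sketch (preview_renames is a made-up name); names that don't contain two dashes come back unchanged from ${f/-*-/-}, so they are skipped rather than handed to mv:

```bash
#!/bin/bash
# Print "old -> new" for each regular file in directory $1 that would be renamed.
preview_renames() (
  cd -- "$1" || exit 1
  for f in *; do
    [[ -f $f ]] || continue           # regular files only
    new=${f/-*-/-}                    # same expansion as the loop above
    if [[ $new != "$f" ]]; then
      printf '%s -> %s\n' "$f" "$new"
    fi
  done
)
```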
If you have the rename (aka prename) utility that's part of the Perl distribution, you could say:
rename -n 's/([^-]*-).*-(.*)/$1$2/' *.txt
to rename all txt files in your desired format. The -n above would not perform the actual rename, it'd only tell you what it would do had you not specified it. (In order to perform the actual rename, remove -n from the above command.)
For example, this would rename the file
[a]File-. abc'.d -001[xxx].txt
as
[a]File-001[xxx].txt
Regarding the explanation: this captures the part up to and including the first - into one group, and the part after the last - into another, and joins the two. The greedy .* between them swallows everything in between.
Read about Regular Expressions. If you have perl docs available on your system, saying perldoc perlre should help.
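For comparison, the same two capture groups work with Bash's own =~ operator, without Perl. A sketch, where rename_target is a hypothetical helper that only prints the new name:

```bash
#!/bin/bash
# Same groups as 's/([^-]*-).*-(.*)/$1$2/': up to the first dash, and after the last one.
rename_target() {
  local re='^([^-]*-).*-(.*)$'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$1"     # fewer than two dashes: leave the name unchanged
  fi
}
```

A loop could then call mv -- "$f" "$(rename_target "$f")" for each file whose target differs.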

Bash escaping spaces in filename, in variable

I'm quite new to Bash so this might be something trivial, but I'm just not getting it. I'm trying to escape the spaces inside filenames. Have a look. Note that this is a 'working example' - I get that interleaving files with blank pages might be accomplished easier, but I'm here about the space.
#! /bin/sh
first=true
i=combined.pdf
o=combined2.pdf
for f in test/*.pdf
do
if $first; then
first=false
ifile=\"$f\"
else
ifile=$i\ \"$f\"
fi
pdftk $ifile blank.pdf cat output $o
t=$i
i=$o
o=$t
break
done
Say I have a file called my file.pdf (with a space). I want the ifile variable to contain the string combined.pdf "my file.pdf", such that pdftk is able to use it as two file arguments - the first one being combined.pdf, and the second being my file.pdf.
I've tried various ways of escaping (with or without first escaping the quotes themselves, etc.), but it keeps splitting my and file.pdf when executing pdftk.
EDIT: To clarify: I'm trying to pass multiple file names (as multiple arguments) in one variable to the pdftk command. I would like it to recognise the difference between two file names, but not tear one file name apart at the spaces.
Putting multiple arguments into a single variable doesn't make sense. Instead, put them into an array:
args=(combined.pdf "my file.pdf");
Notice that "my file.pdf" is quoted to preserve whitespace.
You can use the array like this:
pdftk "${args[@]}" ...
This will pass two separate arguments to pdftk. The quotes in "${args[@]}" are required because they tell the shell to treat each array element as a separate "word" (i.e. do not split array elements, even if they contain whitespace).
As a side note, if you use bashisms like arrays, change your shebang to
#!/bin/bash
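To see the difference the array makes, printf can stand in for pdftk, since it prints one line per argument it receives. A small sketch:

```bash
#!/bin/bash
# "my file.pdf" stays a single element because it is quoted.
args=(combined.pdf "my file.pdf")

# Quoted "${args[@]}" expands to exactly one word per element.
printf '<%s>\n' "${args[@]}"            # <combined.pdf> then <my file.pdf>

# Elements can be appended the same way, e.g. inside the question's loop.
args+=(blank.pdf)
printf '%d arguments\n' "${#args[@]}"   # 3 arguments
```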
Try:
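find test/*.pdf -print0 | xargs -0 sh -c 'pdftk "$@" cat output all.pdf' sh
As I said in my comments on other answers, xargs is the most efficient way to do this. With -print0 and xargs -0, names containing spaces arrive intact, and the sh -c wrapper hands the whole list to a single pdftk call (xargs only splits it if the list is extremely long).
EDIT: I did not see you needed a blank page, but I suppose you could pipe the find above to some command that puts the blank page between the files (similar to a list->string join). I prefer this way as it's more FP-like.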

bash for loop on directories with symbols

I'm trying to create a for loop over folders whose names contain spaces, commas and parentheses. For example:
Italy - Rimini (Feb 09, 2013)
First it scans a parent folder /albums for sub-folders that look like the example above. Then it runs a curl action on the files in those sub-folders. It works fine if the sub-folder names do not contain spaces, commas or other symbols.
for dir in `ls /albums`;
do
for file in /albums/$dir/*
do
curl http://upload.com/up.php -F uploadfile[]=@"$file" > out.txt
php process.php
done
php match.php
done
But if there are such symbols, the curl part seems to get stuck: it can't find $file (probably because $dir is incorrect).
I could replace all the symbols in the sub-dirs or remove them or rename the folders to 001, 002 and it works flawlessly. But before resorting to that I'd like to know if it can be solved using bash tricks while keeping the sub-folder name intact?
Familiarize yourself with the concept of word splitting of your shell. Then realize that using ls to get a list of files with spaces is asking for trouble. Instead, use shell globbing and then quote expansions:
cd /albums
for dir in *; do
for file in /albums/"$dir"/*; do
echo x"$dir"x"$file"x
done
php match.php
done
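The answer swaps the curl call for an echo while debugging. Here is a sketch with the question's commands put back, using the same globbing-and-quoting approach. upload_one and after_album are hypothetical hook names, and the inner double quotes in the -F value follow curl's -F documentation for paths that contain , or ; (which the example folder name does):

```bash
#!/bin/bash
# Hypothetical hook: the question's per-file commands.
upload_one() {
  # Whole -F value quoted so the shell neither splits it nor globs "[]";
  # inner double quotes protect the , in the path from curl's -F parser.
  curl http://upload.com/up.php -F "uploadfile[]=@\"$1\"" > out.txt
  php process.php
}

# Hypothetical hook: the question's per-folder command.
after_album() {
  php match.php
}

upload_albums() {
  local root=$1 dir file
  for dir in "$root"/*/; do            # trailing / matches directories only
    [[ -d $dir ]] || continue          # no sub-folders: skip the unexpanded pattern
    for file in "$dir"*; do
      [[ -f $file ]] || continue       # skip nested folders and empty-folder patterns
      upload_one "$file"
    done
    after_album "$dir"
  done
}
```

Call it as upload_albums /albums.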
For problems with spaces in filenames, you can change IFS to
IFS='
'
which tells the shell that only newlines separate words. By default, IFS contains space, tab and newline.
So just put this before the loop begins, and it will work with filenames that contain spaces (though not with names that contain newlines).
And of course, put quotes around your variable names.
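A sketch of the IFS approach, using Bash's $'\n' spelling of the newline and restoring IFS afterwards. Globbing is switched off as well, because unquoted expansions still undergo pathname expansion; the folder names are made up:

```bash
#!/bin/bash
# Hypothetical newline-separated list of folder names.
list=$'Italy - Rimini (Feb 09, 2013)\nSpain - Madrid'

old_ifs=$IFS
IFS=$'\n'     # split only at newlines (same as the literal newline in quotes)
set -f        # unquoted $list would otherwise still be globbed
count=0
for dir in $list; do
  printf '[%s]\n' "$dir"
  count=$((count + 1))
done
set +f
IFS=$old_ifs  # back to the default: space, tab, newline
```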
