Bash Script which recursively makes all text in files lowercase - bash

I'm trying to write a shell script which recursively goes through a directory, then in each file converts all Uppercase letters to lowercase ones. To be clear, I'm not trying to change the file names but the text in the files.
Considerations:
This is an old Fortran project which I am trying to make more accessible
I do not want to create a new file but rather write over the old one with the changes
There are several different file extensions in this directory, including .par .f .txt and others
What would be the best way to go about this?

To convert a file from upper case to lower case you can use ex (a good friend of ed, the standard editor):
ex -s file <<EOF
%s/[[:upper:]]\+/\L&/g
wq
EOF
or, if you like stuff on one line:
ex -s file <<< $'%s/[[:upper:]]\+/\L&/g\nwq'
Combining with find, you can then do:
find . -type f -exec bash -c "ex -s -- \"\$0\" <<< $'%s/[[:upper:]]\+/\L&/g\nwq'" {} \;
This method is 100% safe regarding spaces and funny symbols in the file names. No auxiliary files are created, copied or moved; files are only edited.
Edit.
Using glenn jackman's suggestion, you can also write:
find . -type f -exec bash -c 'printf "%s\n" "%s/[[:upper:]]\+/\L&/g" "wq" | ex -- -s "$0"' {} \;
(the pro is that it avoids awkward escapes; the con is that it's longer).
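If you want to sanity-check the substitution before running it over the whole tree, a minimal test on a scratch file (assuming an ex that understands vim's \L, which these commands rely on) could look like:
$ printf 'Hello FORTRAN World\n' > /tmp/demo.txt
$ ex -s /tmp/demo.txt <<< $'%s/[[:upper:]]\+/\L&/g\nwq'
$ cat /tmp/demo.txt
hello fortran world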

You can translate all uppercase characters (A–Z) to lowercase (a–z) using the tr command
and specifying a range of characters, as in:
$ tr 'A-Z' 'a-z' <be.fore >af.ter
There is also special syntax in tr for specifying this sort of range for upper- and lowercase
conversions:
$ tr '[:upper:]' '[:lower:]' <be.fore >af.ter
The tr utility copies the given input to the output, substituting or deleting selected characters. Its name is short for translate or transliterate. It takes two sets of characters as parameters and replaces occurrences of the characters in the first set with the corresponding characters from the second set, i.e. it translates characters.
tr "set1" "set2" < input.txt > output.txt
Although tr doesn't support regular expressions, it does support ranges of characters.
Just make sure that both arguments end up with the same number of characters.
If the second argument is shorter, its last character will be repeated to match the
length of the first argument. If the first argument is shorter, the second argument will
be truncated to match the length of the first.
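A quick illustration of the padding behaviour (output from GNU tr; other implementations may differ):
$ echo 'abc' | tr 'abc' 'xy'
xyy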

sed -e 's/\(.*\)/\L\1/g' *
or you could pipe the files in from find

Expanding on @nullrevolution's solution:
find /path_to_files -type f -exec sed --in-place -e 's/\(.*\)/\L\1/g' '{}' \;
This one-liner will look for all files in all sub-directories starting with /path_to_files as a base directory.
WARNING: This will change the case in ALL files in EVERY directory under /path_to_files, so make sure you want to do that before you execute this script. You can limit the scope of the find based on file extensions by utilizing the following:
find /path_to_files -type f -name \*.txt -exec sed --in-place -e 's/\(.*\)/\L\1/g' '{}' \;
You may also want to make a backup of the original file before modifying the original:
find /path_to_files -type f -name \*.txt -exec sed --in-place=-orig -e 's/\(.*\)/\L\1/g' '{}' \;
This keeps the original file name (now containing the modified text) while saving an unmodified copy with "-orig" appended to its name (i.e. file.txt would become file.txt-orig).
An explanation of each piece:
find /path_to_files This will set the base directory to the path provided.
-type f This will search the directory hierarchy for files only.
-exec COMMAND '{}' \; This executes the provided command once for each matched file. The '{}' is replaced by the current file name. The \; indicates the end of the command.
sed --in-place -e 's/\(.*\)/\L\1/g' The --in-place will make the changes to the file without backing it up. The regular expression uses a backreference \1 to refer to the entire line and \L to convert it to lower case.
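To see the \L conversion in isolation (note that \L is a GNU sed extension):
$ echo 'Old FORTRAN Code' | sed -e 's/\(.*\)/\L\1/g'
old fortran code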
Optional
(For a more archaic solution.)
find /path_to_files -type f -exec dd if='{}' of='{}'-lc conv=lcase \;
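Note that unlike the sed versions above, this does not edit in place: the lowercased contents of each file end up in a new file with -lc appended to its name.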

Identifying text files can be a bit tricky in Unix-like environments. You can do something like this:
set -e -o noclobber
while read f; do
tr 'A-Z' 'a-z' <"$f" >"f.$$"
mv "$f.$$" "$f"
done < <(find "$start_directory" -type f -exec file {} + | grep text | cut -d: -f1)
This will fail on filenames with embedded colons or newlines, but should work on others, including those with spaces.

Related

I'm writing a Bash script and it is printing file names that contain spaces across several lines.

for f in $( find tasks/1_uniq/ -type f -follow -print | sed -r 's/[[:blank:]]+/ /g' ); do
md5sum $f
done
And this is what it prints where it finds a space.
tasks/1_uniq/two/6/66/test/me
&
my
friends
I can't manage to escape the spaces properly.
This is due to word splitting: without quotes, the result of an expansion is split on the characters in $IFS (space, tab and newline); with double quotes, the whole expansion is taken as a single argument. It seems you want to split by newlines, which can be done easily with read:
while read filepath; do
md5sum "$filepath" # note the double quotes to avoid word splitting
done < <( find tasks/1_uniq/ -type f -follow -print | sed -r 's/[[:blank:]]+/ /g' )
I don't understand the sed command, which modifies the file names; it will give wrong file names. Another option:
find ... -exec md5sum {} +
where ... is replaced with options
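For example, with the directory from the question (using the same find options):
find tasks/1_uniq/ -type f -follow -exec md5sum {} +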
From your original code, it seems that your goal is to print out the md5 checksums for all files under a directory. In that case, you can simply use rhash
rhash -r -M dir/
-r for recursive and -M for md5 hash sum

Add suffix to all files in the directory with an extension

How to add a suffix to all files in the current directory in bash?
Here is what I've tried, but it keeps adding an extra .png to the filename.
for file in *.png; do mv "$file" "${file}_3.6.14.png"; done
for file in *.png; do
mv "$file" "${file%.png}_3.6.14.png"
done
${file%.png} expands to ${file} with the .png suffix removed.
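A quick way to see what that expansion does (with a made-up file name):
$ file=logo.png
$ echo "${file%.png}_3.6.14.png"
logo_3.6.14.png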
You could do this through the rename command:
rename 's/\.png/_3.6.14.png/' *.png
Through bash,
for i in *.png; do mv "$i" "${i%.*}_3.6.14.png"; done
It replaces .png in all the .png files with _3.6.14.png.
${i%.*} Everything after the last dot is cut off, so the .png part is removed from the filename.
mv "$i" "${i%.*}_3.6.14.png" Renames the original .png file to filename_3.6.14.png.
If you are familiar with regular expressions, sed is quite nice.
a) modify the regular expression to your liking and inspect the output
ls | sed -E "s/(.*)\.png$/\1_foo\.png/
b) add the p flag, so that sed prints both the old and the new path. Feed this to xargs with -n2, meaning that it should keep the pairing of 2 arguments.
ls | sed -E "p;s/(.*)\.png/\1_foo\.png/" | xargs -n2 mv
If you know how to rename a single file to your liking programmatically
fname=myfile.png
mv $fname ${fname%.png}_extended.png
you can batch apply this command with xargs:
find -name "*.png" | xargs -n1 bash -c 'mv $0 ${0%.png}_extended.png'
Explanation
We pipe the list of files to xargs and tell it to process one line at a time with the -n1 flag. We then tell xargs to call bash on each instance and provide it with the code to execute via the -c flag.
The $0 references the first input argument that bash receives.
If you need other string substitutions than ${0%.png} there are many cheat sheets such as https://devhints.io/bash.
For more complex substitutions you provide multiple arguments using -n2; these can be collected with $0, $1, etc.
This use of piping + xargs + bash -c is fairly general.
In the short example above, beware that I assumed proper file names (without special characters).
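As a hypothetical illustration of that -n2 variant, feeding old/new name pairs on stdin:
printf '%s\n' old.png old_renamed.png | xargs -n2 bash -c 'mv "$0" "$1"'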

In bash, how to batch show the text of certain line in files?

I want to batch-show the text of a certain line in files in a certain directory; usually this can be done with the following commands:
for file in `find ./ -name "results.txt"`;
do
sed -n '12p' < ${file};
done
In the 12th line of each file named "results.txt" is the text I want to output.
But I wonder if we can use a pipeline to do this operation. I have tried the following commands:
find ./ -name "results.txt" | xargs sed -n '12p'
or
find ./ -name "results.txt" | xargs sed -n '12p' < {} \;
But neither works fine.
Could you give some advice or recommend some references, please?
All are welcome. Thanks in advance!
This should do it
find ./ -name results.txt -exec sed '12!d' {} ';'
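(12!d deletes every line except line 12, so only line 12 of each file is printed; since -exec runs sed once per file, the line count restarts for every file.)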
@Steven Penny's answer is the most elegant and best-performing solution, but to shed some light on why your solution didn't work:
find ./ -name "results.txt" | xargs sed -n '12p'
causes all filenames(1) to be passed at once(2) to sed. Since sed counts lines cumulatively, across input files, only 1 line will be printed for all input files, namely line 12 from the first input file.
Keeping in mind that find's -exec action is the best solution, if you still wanted to solve this problem with xargs, you'd have to use xargs's -I option as follows, so as to ensure that sed is called once per input line (filename) (% is a self-chosen placeholder):
find ./ -name "results.txt" | xargs -I % sed -n '12q;d' %
Footnotes:
(1) with word splitting applied, which would break with paths with embedded spaces, but that's a separate issue.
(2) assuming they don't make the entire command exceed the max. length of a command line; either way, multiple filenames are passed at once.
As an aside: parsing command output with for as in your first snippet is NEVER a good idea - see http://mywiki.wooledge.org/ParsingLs and http://mywiki.wooledge.org/BashFAQ/001
Your use of xargs results in running sed with multiple file arguments. But as you can see, sed doesn't reset the record number to 1 when it starts reading a new file. For example, try running the following command against files with more than 12 lines each.
sed -n '12p' x.txt y.txt
If you want to use xargs, you might consider using awk:
find . -name 'results.txt' | xargs awk 'FNR==12'
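Unlike sed's cumulative line counter, awk's FNR restarts at 1 for each input file, which is why this works across multiple files. A quick way to convince yourself (reusing x.txt and y.txt from above):
$ awk 'FNR == 1 { print FILENAME }' x.txt y.txt
x.txt
y.txt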
P.S: I personally like using the for loop.

Find all files with text "example.html" and replace with "example.php" works only if no spaces are in file name

I have used the following to do a recursive find and replace within files, to update hrefs to point to a new page correctly:
#!/bin/bash
oldstring='features.html'
newstring='features.php'
grep -rl $oldstring public_html/ | xargs sed -i s#"$oldstring"#"$newstring"#g
It worked, except for a few files that had spaces in the name.
This isn't an issue, as the files with spaces in their names are backups/duplicates I created while testing new things. But I'd like to understand how I could properly pass paths with spaces to the sed command here. Would anybody know how this could be corrected in this one-liner?
find public_html/ -type f -exec grep -q "$oldstring" {} \; -print0 |
xargs -0 sed -i '' s#"$oldstring"#"$newstring"#g
find will print all the filenames for which the grep command is successful. I use the -print0 option to print them with the NUL character as the delimiter. This goes with the -0 option to xargs, which treats NUL as the argument delimiter on its input, rather than breaking the input at whitespace.
Actually, you don't even need grep and xargs, just run sed from find:
find public_html/ -type f -exec sed -i '' s#"$oldstring"#"$newstring"#g {} +
Here's a lazy approach:
grep -rl $oldstring public_html/ | xargs -d'\n' sed -i "s#$oldstring#$newstring#g"
By default, xargs uses whitespace as the delimiter of arguments coming from the input. So for example if you have two files, a b and c, then it will execute the command:
sed -i 's/.../.../' a b c
By telling xargs explicitly to use newline as the delimiter with -d '\n' it will correctly handle a b as a single argument and quote it when running the command:
sed -i 's/.../.../' 'a b' c
I called it a lazy approach because, as @Barmar pointed out, this won't work if your files have newline characters in their names. If you need to take care of such cases, then use @Barmar's method with find ... -print0 and xargs -0 ...
PS: I also changed s#"$oldstring"#"$newstring"#g to "s#$oldstring#$newstring#g", which is equivalent, but more readable.

Rename Files to original extensions

Need help on writing a bash script that will rename files that are being output as name.suffix.date. I need these files to be renamed to name.date.suffix instead.
Edited:
Changed suffix from date to ~
Here's what I have so far:
find . -type f -name "*.~" -print0 | while read -d $'\0' f
do
new=`echo "$f" | sed -e "s/~//"`
mv "$f" "$new"
done
This changes the suffix back to the original, but I can't figure out how to get the date placed before the extension (fname??).
You can use regular expression matching to pull apart the original file name:
find . -type f -name "*.~" -print0 | while read -d $'\0' f
do
dir=${f%/*}
fname=${f##*/}
[[ $fname =~ (.+)\.([^.]+)\.([^.]+)\.~$ ]] || continue
name=${BASH_REMATCH[1]}
suffix=${BASH_REMATCH[2]}
d=${BASH_REMATCH[3]}
mv "$f" "$dir/$name.$d.$suffix"
done
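To see what the pattern captures, here is a quick test with a made-up file name:
$ fname='report.txt.2013-06-14.~'
$ [[ $fname =~ (.+)\.([^.]+)\.([^.]+)\.~$ ]] && echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[3]}"
report txt 2013-06-14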
Bash-only solution:
while IFS=. read -r -u 9 -d '' name suffix date tilde
do
mv "${name}.${suffix}.${date}.~" "${name}.${date}.${suffix}"
done 9< <(find . -type f -name "*.~" -print0)
Notes:
-d '' gives you the same result as -d $'\0'
Splits file names by the dots while reading them. Of course this means it would break if there are dots anywhere else.
Should otherwise work with pretty much any filenames, including those containing space, newlines and other funny business.
Create a list of the files first and redirect it to a file:
ls > fileList.txt
Open the file and read it line by line in Perl. Use a regex to match the parts of the file names and capture them like this:
my ($fileName,$suffix,$date)=($WholeFileName=~/(.*)\.(.*)\.(.*)/);
This should capture the three separate variables for you. Now all you need to do is move the old file to the new file name. The new file name will be a concatenation of the three variables you have got: $newFileName = $fileName . "." . $date . "." . $suffix;. If you have a sample file name, post a comment and I can reply with a short script. Perl is not the only way; you could just use bash or awk and find alternate ways to do this.
cut each part of your filenames:
FIN=$(echo test.12345.ABCDEF | sed -e 's/[a-zA-Z0-9]*[\\.][a-zA-Z0-9]*[\\.]//')
DEBUT=$(echo test.12345.ABCDEF | sed -e 's/[\\.][a-zA-Z0-9]*[\\.][a-zA-Z0-9]*//')
MILIEU=$(echo test.12345.ABCDEF | sed -e 's/'${FIN}'//' -e 's/'${DEBUT}'//' -e 's/[\.]*//g')
paste each part as expected:
echo ${DEBUT}.${FIN}.${MILIEU}
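With the sample name test.12345.ABCDEF this prints test.ABCDEF.12345, i.e. the last two parts swapped into the wanted name.date.suffix order.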
rename --no-act 's/(name-regex)\.(suffix-regex)\.(date-regex)/$1.$3.$2/' *
Tweak the three regexes to fit your file names, and remove --no-act when you're happy with the result to actually rename the files.
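For instance, if each of the three parts is a simple dot-free run of characters, a concrete (hypothetical) version of the above would be:
rename --no-act 's/^([^.]+)\.([^.]+)\.([^.]+)$/$1.$3.$2/' *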
