Why is this bash for loop slow? - bash

I am trying to run this code:
for f in jobs/UPDTEST/apples* ; do
nf=`echo $f | sed s:jobs\/::g`
echo $nf | tr '_' ' '
done > jobs
There are 750 apples* text files. Since I am only manipulating the file names, I would have thought it should be quick, but it takes about 5 minutes.
Is there an alternative way to do this?

You can use parameter expansions like ${parameter/pattern/string} to get rid of the calls to sed and tr. In your case it could look like:
for f in jobs/UPDTEST/apples*; do
f=${f//jobs\//}
echo ${f//_/ }
done > jobs
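A minimal sketch of the two expansions applied to a single value. The path is a made-up sample standing in for one match of the glob, and the names stripped and spaced are only for illustration:

```shell
# Hypothetical sample path, standing in for one glob match.
f='jobs/UPDTEST/apples_red_01'

stripped=${f//jobs\//}     # remove every "jobs/" occurrence (replaces the sed call)
spaced=${stripped//_/ }    # replace every underscore with a space (replaces tr)

printf '%s\n' "$spaced"
```

Both steps run inside the shell, so no process is forked per file, which is presumably where most of the time went in the original loop.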

First, cd jobs would remove the need for the sed
Second, you don't need tr to substitute characters in the value of a bash variable.
Third, with find you don't need a loop at all.
f=$(cd jobs; find UPDTEST -name 'apples*' -depth 1)
echo "${f//_/ }" > jobs.log
By the way, you can't have a jobs directory and a jobs file in the same directory.

Related

Bash capturing in brace expansion

What would be the best way to use something like a capturing group in regex for brace expansion. For example:
touch {1,2,3,4,5}myfile{1,2,3,4,5}.txt
results in all permutations of the numbers and 25 different files. But in case I just want to have files like 1myfile1.txt, 2myfile2.txt,... with the first and second number the same, this obviously doesn't work. Therefore I'm wondering what would be the best way to do this?
I'm thinking about something like capturing the first number, and using it a second time. Ideally without a trivial loop.
Thanks!
Not using a regex but a for loop and sequence (seq) you get the same result:
for i in $(seq 1 5); do touch ${i}myfile${i}.txt; done
Or tidier:
for i in $(seq 1 5);
do
touch ${i}myfile${i}.txt;
done
As an example, using echo instead of touch:
➜ for i in $(seq 1 5); do echo ${i}myfile${i}.txt; done
1myfile1.txt
2myfile2.txt
3myfile3.txt
4myfile4.txt
5myfile5.txt
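A variation on the loop above, as a sketch: bash brace expansion can supply the sequence, so the external seq call is not needed. The temporary directory (tmpdir) is an assumption added here so that the example does not create files in the current directory:

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)

for i in {1..5}; do
    touch "$tmpdir/${i}myfile${i}.txt"
done

# Count the created files with a glob instead of parsing ls output.
set -- "$tmpdir"/*myfile*.txt
created=$#
ls "$tmpdir"
```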
Variation on MTwarog's answer with one less pipe/subprocess:
$ echo {1..5} | tr ' ' '\n' | xargs -I '{}' touch {}myfile{}.txt
$ ls -1 *myfile*
1myfile1.txt
2myfile2.txt
3myfile3.txt
4myfile4.txt
5myfile5.txt
You can use AWK to do that:
echo {1..5} | tr ' ' '\n' | awk '{print $1"filename"$1".txt"}' | xargs touch
Explanation:
echo {1..5} - prints range of numbers
tr ' ' '\n' - splits numbers to separate lines
awk '{print $1"filename"$1".txt"}' - formats the output, reusing each number on both sides of the name
xargs touch - passes filenames to touch command (creates files)
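Closer to the "capture once, use twice" idea from the question, here is a sketch that uses sed, where & in the replacement stands for the whole matched text. It only prints the names (stored in a variable called names, chosen for illustration); piping them to xargs touch as above would create the files:

```shell
# printf repeats its format for each argument, so this emits one number per line;
# sed then reuses the matched number twice through "&".
names=$(printf '%s\n' {1..5} | sed 's/.*/&myfile&.txt/')

printf '%s\n' "$names"
```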

Finding the file name in a directory with a pattern

I need to find the latest file - filename_YYYYMMDD in the directory DIR.
The command below is not working: the field position shifts each time because the number of spaces between columns varies (mostly around the file size field, which differs every time).
Please suggest whether there is another way.
report=`ls -ltr $DIR/filename_* 2>/dev/null | tail -1 | cut -d " " -f9`
You can use awk to print the last field, like below:
report=`ls -ltr $DIR/filename_* 2>/dev/null | tail -1 | awk '{print $NF}'`
cut may not be an option here, because the number of spaces between the columns varies.
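Another way to sidestep the column problem, sketched under the assumption that only the name is needed: drop -l so that ls prints nothing but the path, and sort newest first. DIR and the two files here are stand-ins created only for the demonstration, and the approach assumes file names without newlines:

```shell
DIR=$(mktemp -d)    # hypothetical stand-in for the real directory
touch -t 202001010000 "$DIR/filename_20200101"
touch -t 202106150000 "$DIR/filename_20210615"

# Without -l there are no size or date columns, so each line is just the path.
report=$(ls -t "$DIR"/filename_* 2>/dev/null | head -n 1)

printf '%s\n' "$report"
```

Note that this picks the most recently modified file, which matches the date in the name only if the files were written in date order.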
If I understand correctly, you want to loop through each file in the directory, find the largest 'YYYYMMDD' value, and get the filename associated with it. You can use simple POSIX parameter expansion with substring removal to isolate the 'YYYYMMDD' and compare it against a value initialized to zero, updating the latest variable to hold the largest 'YYYYMMDD' as you loop over all files in the directory. Each time you find a larger 'YYYYMMDD', you store the name of that file.
For example, you could do something like:
#!/bin/sh
name=
latest=0
for i in *; do
test "${i##*_}" -gt "$latest" && { latest="${i##*_}"; name="$i"; }
done
printf "%s\n" "$name"
Example Directory
$ ls -1rt
filename_20120615
filename_20120612
filename_20120115
filename_20120112
filename_20110615
filename_20110612
filename_20110115
filename_20110112
filename_20100615
filename_20100612
filename_20100115
filename_20100112
Example Use/Output
$ name=; latest=0; \
> for i in *; do \
> test "${i##*_}" -gt "$latest" && { latest="${i##*_}"; name="$i"; }; \
> done; \
> printf "%s\n" "$name"
filename_20120615
Where the script selects filename_20120615 as the file with the greatest 'YYYYMMDD' of all files in the directory.
Since you are using only tools provided by the shell itself, it doesn't need to spawn subshells for each pipe or utility it calls.
Give it a test and let me know if that is what you intended, let me know if your intent was different, or if you have any further questions.
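A slightly more defensive sketch of the same loop, assuming the directory may also hold files that do not end in a numeric date. The temporary directory, the sample files, and the stamp variable are illustrative additions, not part of the answer above:

```shell
dir=$(mktemp -d)    # hypothetical directory with sample files
touch "$dir/filename_20110615" "$dir/filename_20120615" "$dir/filename_draft" "$dir/notes.txt"

name=
latest=0
for i in "$dir"/filename_*; do
    stamp=${i##*_}                  # strip everything through the last underscore
    case $stamp in
        ''|*[!0-9]*) continue ;;    # skip suffixes that are not purely digits
    esac
    if [ "$stamp" -gt "$latest" ]; then
        latest=$stamp
        name=${i##*/}               # keep only the file name, without the directory
    fi
done

printf '%s\n' "$name"
```

The case guard keeps test from failing with an "integer expression expected" error on names such as filename_draft.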

How to process tr across all files in a directory and output to a different name in another directory?

mpu3$ echo * | xargs -n 1 -I {} | tr "|" "/n"
which outputs:
#.txt
ag.txt
bg.txt
bh.txt
bi.txt
bid.txt
dh.txt
dw.txt
er.txt
ha.txt
jo.txt
kc.txt
lfr.txt
lg.txt
ng.txt
pb.txt
r-c.txt
rj.txt
rw.txt
se.txt
sh.txt
vr.txt
wa.txt
is what I have so far. What is missing is the output; I get none. What I really want is to get a list of txt files, use their name up to the extension, process out the "|" and replace it with a LF/CR and put the new file in another directory as [old-name].ics. HALP. THX in advance. - Idiot me.
You can loop over the files and use sed to process the file:
for i in *.txt; do
sed -e 's/|/\n/g' "$i" > other_directory/"${i%.txt}".ics
done
There is no need for xargs, especially combined with echo, which risks the filenames being word-split and glob-expanded, so it could well do the wrong thing.
The sed command uses s to substitute | with \n (a newline; GNU sed interprets \n in the replacement), and g makes the replacement global. We redirect the result to the other directory and use bash's parameter expansion to strip the .txt from the end of the name.
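Since the question started from tr, here is a sketch of the same loop with tr doing the substitution; tr translates single characters, so it fits this case and does not depend on how a particular sed treats \n. The src and dst directories and the sample file are made up for the demonstration:

```shell
src=$(mktemp -d)    # hypothetical input directory
dst=$(mktemp -d)    # hypothetical output directory
printf 'BEGIN|SUMMARY|END\n' > "$src/ag.txt"

for i in "$src"/*.txt; do
    base=${i##*/}                                   # file name without the directory
    tr '|' '\n' < "$i" > "$dst/${base%.txt}.ics"    # swap each | for a newline
done

cat "$dst/ag.ics"
```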
Here's an awk solution:
$ awk '
FNR==1 { # for first record of every file
close(f) # close previous file f
f="path_to_dir/" FILENAME # new filename with path
sub(/txt$/,"ics",f) } # replace txt with ics
{
gsub(/\|/,"\n") # replace | with \n
print > f }' *.txt # print to new file

Bash - Search and Replace operation with reporting the files and lines that got changed

I have an input file "test.txt" as below -
hostname=abc.com hostname=xyz.com
db-host=abc.com db-host=xyz.com
In each line, the value before space is the old value which needs to be replaced by the new value after the space recursively in a folder named "test". I am able to do this using below shell script.
#!/bin/bash
IFS=$'\n'
for f in `cat test.txt`
do
OLD=$(echo $f| cut -d ' ' -f 1)
echo "Old = $OLD"
NEW=$(echo $f| cut -d ' ' -f 2)
echo "New = $NEW"
find test -type f | xargs sed -i.bak "s/$OLD/$NEW/g"
done
"sed" replaces the strings on the fly in 100s of files.
Is there a trick or an alternative way by which I can get a report of the changed files, such as the absolute path of each file and the exact lines that got changed?
PS - I understand that sed and other stream editors don't support this functionality out of the box. I don't want to use version control, as it would be overkill for this task.
Let's start with a simple rewrite of your script, to make it a little bit more robust at handling a wider range of replacement values, but also faster:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -exec sed "s/$(escapeRegex "$old")/$(escapeSubst "$new")/g" -i '{}' \;
done <test.txt
So, we loop over pairs of whitespace-separated fields (old, new) in lines from test.txt and run a standard sed in-place replace on all files found with find.
Pretty similar to your script, but we properly read lines from test.txt (no word splitting, pathname/variable expansion, etc.), we use Bash builtins whenever possible (no need to call external tools like cat, cut, xargs); and we escape sed metacharacters in old/new values for proper use as sed's regexp and replacement expressions.
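To see what the escaping buys, here is a small sketch that runs the same two sed expressions on sample values. The helpers are fed through printf rather than a here-string only to keep the sketch portable, and the old and new values are made up:

```shell
# Same sed expressions as the helpers in the script, fed through a pipe.
escapeRegex() { printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'; }
escapeSubst() { printf '%s\n' "$1" | sed 's/[&/\]/\\&/g'; }

old='abc.com'    # hypothetical value; the dot is a regex metacharacter
new='x/y&z'      # hypothetical value; / and & are special in the replacement

# Only the literal "abc.com" is replaced; "abcXcom" is left alone.
result=$(printf 'host=abc.com\nhost=abcXcom\n' |
    sed "s/$(escapeRegex "$old")/$(escapeSubst "$new")/g")

printf '%s\n' "$result"
```

Without the escaping, the dot in abc.com would also match the X in abcXcom, and the / in the new value would end the s command early.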
Now let's add logging from sed:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -printf '\n[%p]\n' -exec sed "/$(escapeRegex "$old")/{
h
s//$(escapeSubst "$new")/g
H
x
s/\n/ --> /
w /dev/stdout
x
}" -i '{}' > >(tee -a change.log) \;
done <test.txt
The sed script above changes each old to new, but it also writes an old --> new line to /dev/stdout (a file name that GNU sed handles specially), which we in turn append to the change.log file. The -printf action in find outputs a "header" line with the file name for each file processed.
With this, your "change log" will look something like:
[file1]
hostname=abc.com --> hostname=xyz.com
[file2]
[file1]
db-host=abc.com --> db-host=xyz.com
[file2]
db-host=abc.com --> db-host=xyz.com
Just for completeness, a quick walk-through of the sed script. We act only on lines containing the old value. For each such line, we store it in the hold space (h), change it to new, and append that new value to the hold space (joined with a newline, H), which now holds old\nnew. We swap the hold space with the pattern space (x), so we can run the s command that converts it to old --> new. After writing that to stdout with w, we move new back from the hold space to the pattern space, so it gets written (in-place) to the file being processed.
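The hold-space steps can be tried on a plain stream, without find or -i. This sketch uses -n and p in place of w /dev/stdout so that only the report line is printed; the sample input and the literal abc.com and xyz.com values are illustrative:

```shell
log=$(printf 'hostname=abc.com\nport=80\n' | sed -n '/abc\.com/{
h
s//xyz.com/g
H
x
s/\n/ --> /
p
}')

printf '%s\n' "$log"
```

The empty regex in s//xyz.com/g reuses the last regex that was applied, here the address /abc\.com/, which is the same trick the script above relies on.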
From man sed:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
This can be used to create a backup file when replacing. Note that sed writes a backup for every file it processes, so to see which files actually changed, diff each backup against its edited original. Once you're done inspecting the diffs, simply remove the backup files.
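A rough sketch of that workflow. Since sed -i.bak leaves a backup for each file it processes, the loop uses cmp to pick out the ones that really differ. The work directory, file names, and the changed variable are assumptions made for the example:

```shell
work=$(mktemp -d)    # hypothetical directory with two sample files
printf 'hostname=abc.com\nport=80\n' > "$work/a.conf"
printf 'port=443\n' > "$work/b.conf"

sed -i.bak 's/abc\.com/xyz.com/g' "$work"/*.conf

changed=
for bak in "$work"/*.bak; do
    orig=${bak%.bak}
    if ! cmp -s "$bak" "$orig"; then
        changed="$changed${orig##*/} "    # remember the names of changed files
        diff "$bak" "$orig" || true       # diff exits 1 when the files differ
    fi
done

printf 'changed: %s\n' "$changed"
rm -f "$work"/*.bak                       # clean up once the diffs are inspected
```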
If you formulate your replacements as sed statements rather than a custom format you can go one further, and use either a sed shebang line or pass the file to -f/--file to do all the replacements in one operation.
There are several problems with your script; just replace it all with the following (using GNU awk instead of GNU sed for in-place editing):
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; print; next }  # print so that test.txt itself is rewritten unchanged
{ for (old in map) gsub(old,map[old]) }
' test.txt "${files[@]}"
You'll find that is orders of magnitude faster than what you were doing.
That still has the issue your existing script does of failing when the "test.txt" strings contain regexp or backreference metacharacters and modifying previously-modified strings and handling partial matches - if that's an issue let us know as it's easy to work around with awk (and extremely difficult with sed!).
To get whatever kind of report you want you just tweak the { for ... } line to print them, e.g. to print a record of the changes to stderr:
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; print; next }  # print so that test.txt itself is rewritten unchanged
{
orig = $0
for (old in map) {
gsub(old,map[old])
}
if ($0 != orig) {
printf "File %s, line %d: \"%s\" became \"%s\"\n", FILENAME, FNR, orig, $0 | "cat>&2"
}
}
' test.txt "${files[@]}"

Bash script 'sed: first RE may not be empty' error

I have written the following bash script, it is not finished yet so it is still a little messy. The script looks for directories at the same level as the script, it then searches for a particular file within the directory which it makes some changes to.
When I run the script it returns the following error:
sed: first RE may not be empty
sed: first RE may not be empty
sed: first RE may not be empty
sed: first RE may not be empty
sed: first RE may not be empty
sed: first RE may not be empty
sed: first RE may not be empty
My research tells me that it may be something to do with the '/'s in the directory name strings but I have not been able to solve the issue.
Despite the error messages the script seems to be working fine and is making the changes to the files correctly. Can anyone help explain why I am getting the error message above?
#!/bin/bash
FIND_DIRECTORIES=$(find . -type d -maxdepth 1 -mindepth 1)
FIND_IN_DIRECTORIES=$(find $FIND_DIRECTORIES"/app/design/adminhtml" -name "login.phtml")
for i in $FIND_IN_DIRECTORIES
do
# Generate Random Number
RANDOM=$[ ( $RANDOM % 1000 ) + 1 ]
# Find the line where password is printed out on the page
# Grep for the whole line, then remove all but the numbers
# This will leave the old password number
OLD_NUM_HOLDER=$(cat $i | grep "<?php echo Mage::helper('adminhtml')->__('Password: ')" )
OLD_NUM="${OLD_NUM_HOLDER//[!0-9]}"
# Add old and new number to the end of text string
# Beginning text string is used so that sed can find
# Replace old number with new number
OLD_NUM_FULL="password\" ?><?php echo \""$OLD_NUM
NEW_NUM_FULL="password\" ?><?php echo \""$RANDOM
sed -ie "s/$OLD_NUM_FULL/$NEW_NUM_FULL/g" $i
# GREP for the setNewPassword function line
# GREP for new password that has just been set above
SET_NEW_GREP=$(cat $i | grep "setNewPassword(" )
NEW_NUM_GREP=$(cat $i | grep "<?php echo \"(password\" ?><?php echo" )
NEW_NUM_GREPP="${NEW_NUM_GREP//[!0-9]}"
# Add new password to string for sed
# Find and replace old password for setNewPassword function
FULL_NEW_PASS="\$user->setNewPassword(password"$NEW_NUM_GREPP")"
sed -ie "s/$SET_NEW_GREP/$FULL_NEW_PASS/g" $i
done
Thanks in advance for any help with this.
UPDATE -- ANSWER
The issue here was that the for loop was not working as expected. I thought that it was doing /first/directory"/app/design/adminhtml" looping through and then doing /second/directory"/app/design/adminhtml" and then looping through. It was actually doing /first/directory looping through and then doing /second/directory"/app/design/adminhtml" and then looping through. So it was actually attaching the full directory path to the last item in the iteration. I have fixed the issue in the script below:
#!/bin/bash
for i in $(find . -type d -maxdepth 1 -mindepth 1); do
FIND_IN_DIRECTORIES=$i"/app/design/adminhtml/default"
FIND_IN_DIRECTORIES=$(find $FIND_IN_DIRECTORIES -name "login.phtml")
# Generate Random Number
RANDOM=$[ ( $RANDOM % 1000 ) + 1 ]
# Find the line where password is printed out on the page
# Grep for the whole line, then remove all but the numbers
# This will leave the old password number
OLD_NUM_HOLDER=$(cat $FIND_IN_DIRECTORIES | grep "<?php echo Mage::helper('adminhtml')->__('Password: ')" )
OLD_NUM="${OLD_NUM_HOLDER//[!0-9]}"
# Add old and new number to the end of text string
# Beginning text string is used so that sed can find
# Replace old number with new number
OLD_NUM_FULL="password\" ?><?php echo \""$OLD_NUM
NEW_NUM_FULL="password\" ?><?php echo \""$RANDOM
sed -ie "s/$OLD_NUM_FULL/$NEW_NUM_FULL/g" $FIND_IN_DIRECTORIES
# GREP for the setNewPassword function line
# GREP for new password that has just been set above
SET_NEW_GREP=$(cat $FIND_IN_DIRECTORIES | grep "setNewPassword(" )
NEW_NUM_GREP=$(cat $FIND_IN_DIRECTORIES | grep "<?php echo \"(password\" ?><?php echo" )
NEW_NUM_GREPP="${NEW_NUM_GREP//[!0-9]}"
# Add new password to string for sed
# Find and replace old password for setNewPassword function
FULL_NEW_PASS="\$user->setNewPassword(password"$NEW_NUM_GREPP")"
sed -ie "s/$SET_NEW_GREP/$FULL_NEW_PASS/g" $FIND_IN_DIRECTORIES
done
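The concatenation behaviour described in the update can be reproduced in isolation. In this sketch, dirs stands in for the multi-line output of find (the two paths are invented), and the loop shows that the suffix ends up on the last entry only:

```shell
dirs=$(printf './site1\n./site2\n')    # hypothetical find output, one path per line

# Appending a suffix to the whole variable only touches its last line.
joined=$dirs"/app/design/adminhtml"

count=0
for p in $joined; do                   # unquoted on purpose: split on whitespace
    count=$((count + 1))
    printf '%s\n' "$p"
done
```

The loop prints ./site1 unchanged and ./site2/app/design/adminhtml, which is why the suffix has to be appended inside the loop, one directory at a time.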
Without debugging your whole setup, note that you can use an alternate character to delimit the sed regex and replacement values, e.g.
sed -i "s#$OLD_NUM_FULL#$NEW_NUM_FULL#g" $i
and
sed -i "s#$SET_NEW_GREP#$FULL_NEW_PASS#g" $i
You don't need the -e, so I have removed it.
The s command takes whatever character follows it as the delimiter, so no leading '\' is needed before the #. The \# form is only required when an address pattern uses an alternate delimiter.
You should also turn on shell debugging to see exactly which sed call (and which values) are causing the problem. Add a line with set -vx near the top of your script to turn on debugging.
I hope this helps.
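One likely way to hit that message is an empty pattern variable, for example when grep finds no matching line in a file, so that sed receives s//.../ with no earlier regex to reuse. A sketch of a guard, assuming that is the cause; the function name replace_if_set and the sample values are made up for illustration:

```shell
# $1 = pattern, $2 = replacement, text on stdin.
replace_if_set() {
    if [ -z "$1" ]; then
        echo "skipping: empty pattern" >&2
        cat                    # pass the text through unchanged
        return 0
    fi
    sed "s#$1#$2#g"
}

out_empty=$(printf 'password123\n' | replace_if_set '' 'password456' 2>/dev/null)
out_set=$(printf 'password123\n' | replace_if_set 'password123' 'password456')

printf '%s\n%s\n' "$out_empty" "$out_set"
```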
