Removing strings from multiple files - bash

I'm trying to organise and rename some roms I got. I've already used the command line to remove regions like " (USA)" and " (Japan)", including the leading space, from the filenames. Now I need to update my .cue files. I've tried the following, but something is missing...
grep --include={*.cue} -rnl './' -e " (USA)" | xargs -i# sed -i 's/ (USA)//g' #
grep --include={*.cue} -rnl './' -e " (Europe)" | xargs -i# sed -i 's/ (Europe)//g' #
grep --include={*.cue} -rnl './' -e " (Japan)" | xargs -i# sed -i 's/ (Japan)//g' #
I got it to work on one occasion but can't seem to get it right again...

Awesome thanks, I used:
sed -i 's/ (Japan)//g;s/ (Europe)//g;s/ (USA)//g' *.cue
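For anyone whose .cue files sit in subdirectories (the grep -r in the attempts above suggests they might), a find-based variant of the same sed should work; a minimal sketch, assuming GNU sed's -i:
find . -name '*.cue' -exec sed -i 's/ (Japan)//g;s/ (Europe)//g;s/ (USA)//g' {} +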

Refining bash script with multiple find regex sed awk to array and functions that build a report

The following code is working, but it takes too long, and everything I've tried to reduce it bombs, either due to whitespace, inconsistent access.log syntax, or something else.
Any suggestions to help cut the multiple finds down to a single find $LOGS -mtime -30 -type f -print0 and one grep/sed/awk/sort pass, compared to multiple finds like this, would be appreciated:
find $LOGS -mtime -30 -type f -print0 | xargs -0 grep -B 2 -w "RESULT err=0 tag=97" | grep -w "BIND" | sed '/uid=/!d;s//&\n/;s/.*\n//;:a;/,/bb;$!{n;ba};:b;s//\n&/;P;D' | sed 's/ //g' | sed s/$/,/g |awk '{a[$1]++}END{for(i in a)print i a[i]}' |sort -t , -k 2 -g > $OUTPUT1;
find $LOGS -mtime -30 -type f -print0 | xargs -0 grep -B 2 -w "RESULT err=0 tag=97" | grep -E 'BIND|LDAP connection from*' | sed '/from /!d;s//&\n/;s/.*\n//;:a;/:/bb;$!{n;ba};:b;s//\n&/;P;D' | sed 's/ //g' | sed s/$/,/g |awk '{a[$1]++}END{for(i in a)print i a[i]}' |sort -t , -k 2 -g > $IPAUTH0;
find $LOGS -mtime -30 -type f -print0 | xargs -0 grep -B 2 -w "RESULT err=49 tag=97" | grep -w "BIND" | sed '/uid=/!d;s//&\n/;s/.*\n//;:a;/,/bb;$!{n;ba};:b;s//\n&/;P;D' | sed 's/ //g' | sed s/$/,/g |awk '{a[$1]++}END{for(i in a)print i a[i]}' |sort -t , -k 2 -g > $OUTPUT2;
I've tried variations of find | while read -r file; do grep1 >output1; grep2 >output2; grep3 >output3; done and a few others, but cannot seem to get the syntax right, and am hoping to cut down the repeats here.
The full script (stripped of some content) can be found here and runs against a Java program I wrote for an email report. NOTE: This runs against access logs in about 60GB of combined text.
I haven't looked closely at the sed/awk/etc section (and they'll be hard to work on without some example data), but you should be able to share the initial scans by grepping once for lines matching any of the patterns, storing that in a temp file, and then searching just that file for the individual patterns. I'd also use find ... -exec instead of find ... | xargs:
tempfile=$(mktemp "${TMPDIR:-/tmp}/logextract.XXXXXX") || {
echo "Error creating temp file" >&2
exit 1
}
find $LOGS -mtime -30 -type f -exec grep -B 2 -Ew "RESULT err=(0|49) tag=97" {} + >"$tempfile"
grep -B 2 -w "RESULT err=0 tag=97" "$tempfile" | grep -w "BIND" | ...
grep -B 2 -w "RESULT err=0 tag=97" "$tempfile" | grep -E 'BIND|LDAP connection from*' | ...
grep -B 2 -w "RESULT err=49 tag=97" "$tempfile" | grep -w "BIND" | ...
rm "$tempfile"
BTW, you probably don't mean to search for LDAP connection from* -- the from* at the end means "fro" followed by 0 or more "m" characters.
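To see that concretely (this is plain regex semantics, nothing specific to your logs):
echo "LDAP connection fro" | grep -E 'LDAP connection from*'   # still matches and prints the line
If a literal match was intended, drop the * (or use grep -F for a fixed string).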
A couple of general scripting recommendations: use lower- or mixed-case variables to avoid accidental conflicts with the various all-caps names that have special meanings. (Except when you want the special meaning, e.g. setting PATH.)
Also, putting double-quotes around variable references is generally a good idea to prevent unexpected word splitting and wildcard expansion... except that in some places your script depends on this, like setting LOGS="/log_dump/ldap/c*", and then counting on wildcard expansion happening when the variable is used. In these cases, it's usually better to use a bash array to store each item (e.g. filename) as a separate element:
logs=(/log_dump/ldap/c*) # Wildcard gets expanded when it's defined
...
find "${logs[#]}" -mtime ... # All that syntax gets all array elements in unmangled form
Note that this isn't really needed in cases like this where you know there aren't going to be any unexpected wildcards or spaces in the variable, but when you're dealing with unconstrained data this method is safer. (I work mostly on macOS, where spaces in filenames are just a fact of life, and I've learned the hard way to use script idioms that aren't confused by them.)

Bash line breaks

I am using Git Bash to recursively find all of the file extensions in our legacy web site. When I pipe it to a file I would like to add line-breaks and a period in front of the file extension.
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{1,12}" | awk '{print tolower($0)}' | sort -u
There are a few different ways.
If you do not want to change your existing commands, you might be tempted to use
printf ".%s\n" $(find . -type f -name "*\.*" | grep -o -E "\.[^\.]+$" |
grep -o -E "[[:alpha:]]{1,12}" | awk '{print tolower($0)}' | sort -u ) # Wrong
This is wrong: when a file extension has a space in it (like example.with space), the unquoted command substitution splits it across different lines.
Your command already outputs everything on separate lines, so you can just put a dot before each line with | sed 's/^/./'
You can also skip commands in the pipeline. For example, let awk put the dot in front of each line:
find . -type f -name "*\.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{1,12}" | awk '{print "." tolower($0)}' | sort -u
Or you can let sed add the dot; with GNU sed you can also convert to lowercase:
find . -type f -name "*.*" | sed -r 's/.*\.([^.]*)$/.\L\1/' | sort -u
In the last command I skipped the grep that limits matches to 12 characters; I think it works differently than you'd like:
echo test.qqqwwweeerrrtttyyyuuuiiioooppp | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{1,12}"
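Since grep -o prints every non-overlapping match, that 30-character extension comes out as three chunks (if I'm reading the {1,12} quantifier right) rather than one cleanly truncated string:
qqqwwweeerrr
tttyyyuuuiii
oooppp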
Adding a second line break after each line can be done in different ways.
If you have the awk command, switch the awk and sort (so the blank lines are added after sorting) and use
awk '{print "." tolower($0) "\n"}'
Or add newlines at the end of the pipeline: sed 's/$/\n/'.
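Both tweaks also fit in a single sed call if you prefer, e.g. sed 's/^/./;s/$/\n/' (the \n in the replacement being a GNU sed extension).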

Using a file's content in sed's replacement string

I've spent hours searching and can't find a solution to this. I have a directory with over 1,000 PHP files. I need to replace some code in these files as follows:
Find:
session_register("CurWebsiteID");
Replace with (saved in replacement.txt):
if(!function_exists ("session_register") && isset($_SERVER["DOCUMENT_ROOT"])){require_once($_SERVER["DOCUMENT_ROOT"]."/libraries/phpruntime/php_legacy_session_functions.php");} session_register("CurWebsiteID");
Using the command below, the pattern gets replaced with the literal text $(cat replacement.txt), whereas I want it replaced with the content of the text file.
Command being used:
find . -name "*.xml" | xargs -n 1 sed -i -e 's/mercy/$(cat replacement.txt)/g'
I've also tried using a variable instead (replacement=code_above;) and running an adjusted version with $(echo $replacement), but that doesn't help either.
What is the correct way to achieve this?
You don't need command substitution here. You can use the sed r command to insert file content and d to delete the line matching the pattern:
find . -name "*.xml" | xargs -n 1 sed -i -e '/mercy/r replacement.txt' -e '//d'
$(...) is not interpreted inside single quotes. Use double quotes:
find . -name "*.xml" | xargs -n 1 sed -i -e "s/mercy/$(cat replacement.txt)/g"
You can also do away with cat:
find . -name "*.xml" | xargs -n 1 sed -i -e "s/mercy/$(< replacement.txt)/g"
In case replacement.txt has a / in it, use a different delimiter in the sed expression, for example #:
find . -name "*.xml" | xargs -n 1 sed -i -e "s#mercy#$(< replacement.txt)#g"
See also:
Use slashes in sed replace

obtain md5sum on every linked library

I've got an issue where a program suddenly doesn't want to start; no error, no nothing. To ensure the integrity of the code and its linked libraries, I wanted to compare the md5sum of every (dynamically) linked library. From other posts in this forum I found it easy to list all the linked libraries and show them nicely:
ldd myProgram | grep so | sed -e '/^[^\t]/ d' \
| sed -e 's/\t//' | sed -e 's/.*=..//' \
| sed -e 's/ (0.*)//'
How can I add the md5sum or sha1sum so it will add a column with the checksum next to the filename? Simply adding md5sum only produces one line and doesn't seem to do the job:
ldd myProgram | grep so | sed -e '/^[^\t]/ d' \
| sed -e 's/\t//' | sed -e 's/.*=..//' \
| sed -e 's/ (0.*)//' | md5sum
yields
3baf2fafbce4dc8a313ded067c0fccab -
leaving md5sum out produces the nice list of linked libraries:
/lib/i386-linux-gnu/i686/cmov/libpthread.so.0
/lib/i386-linux-gnu/i686/cmov/librt.so.1
/lib/i386-linux-gnu/i686/cmov/libdl.so.2
/lib/i386-linux-gnu/libz.so.1
/usr/lib/i386-linux-gnu/libodbc.so.1
/usr/lib/libcrypto++.so.9
/lib/i386-linux-gnu/libpng12.so.0
/usr/lib/i386-linux-gnu/libstdc++.so.6
/lib/i386-linux-gnu/i686/cmov/libm.so.6
/lib/i386-linux-gnu/libgcc_s.so.1
/lib/i386-linux-gnu/i686/cmov/libc.so.6
/lib/ld-linux.so.2
/usr/lib/i386-linux-gnu/libltdl.so.7
Any hint is much appreciated!
What your script is doing is piping the literal text "/lib/i386-linux-gnu/i686/cmov/libpthread.so.0..." etc. into md5sum and calculating the checksum of that text, not of the files.
You can use xargs to repeat a command on every line of input. The -I{} isn't strictly necessary, but I'd recommend it as it makes your script more readable and easier to understand.
For example
adam@brimstone:~$ ldd $(which bash) \
  | grep so | sed -e '/^[^\t]/ d' \
  | sed -e 's/\t//' | sed -e 's/.*=..//' \
  | sed -e 's/ (0.*)//' \
  | xargs -I{} md5sum {}
6a0cb513f136f5c40332e3882e603a02 /lib/x86_64-linux-gnu/libtinfo.so.5
c60bb4f3ae0157644b993cc3c0d2d11e /lib/x86_64-linux-gnu/libdl.so.2
365459887779aa8a0d3148714d464cc4 /lib/x86_64-linux-gnu/libc.so.6
578a20e00cb67c5041a78a5e9281b70c /lib64/ld-linux-x86-64.so.2
A for loop can also be used (note the quotes around $FILE, in case a path contains spaces):
for FILE in `<your command>`; do md5sum "$FILE"; done
For example:
for FILE in `ldd /usr/bin/gcc | grep so | sed -e '/^[^\t]/ d' | sed -e 's/\t//' | sed -e 's/.*=..//' | sed -e 's/ (0.*)//'`; do md5sum "$FILE"; done
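As an aside, the chain of seds can be replaced with a single awk that picks out the path field directly; a sketch, assuming the two usual shapes of ldd output lines (lib => /path (0x...) and /path (0x...)):
ldd myProgram | awk '/=> \//{print $3} /^\t\//{print $1}' | xargs md5sum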

Sed with Xargs cannot open passed file (Cygwin)

Trying to use the beauty of Sed so I don't have to manually update a few hundred files. I'll note my employer only allows use of Win8 (joy), so I use Cygwin all day until I can use my Linux boxes at home.
The following works on a Linux (bash) command line, but not in Cygwin:
> grep -lrZ "/somefile.js" . | xargs -0 -l sed -i -e 's|/somefile.js|/newLib.js|g'
sed: can't read ./testTarget.jsp: No such file or directory
# works
> sed -i 's|/somefile.js|/newLib.js|g' ./testTarget.jsp
So the command by itself works, but not passed through xargs. And, before you say to use Perl instead of sed, the Perl equivalent throws the very same error:
> grep -lrZ "/somefile.js" . | xargs -0 perl -i -pe 's|/somefile.js|/newLib.js|g'
Can't open ./testTarget.jsp
: No such file or directory.
Use the xargs -n option to split up the arguments and force separate calls to sed.
On Windows, using GnuWin tools (not Cygwin), I found that I needed to split up the input to sed. By default xargs will pass ALL of the files from grep to one call to sed.
Let's say you have 4 files that match your grep call; the sed command will run through xargs like this:
sed -i -e 's|/somefile.js|/newLib.js|g' ./file1 ./file2 ./subdir/file3 ./subdir/file4
If the number of files is too large, sed will give you this error.
Use the -n option to have xargs call sed repeatedly until it exhausts all of the arguments.
grep -lrZ "/somefile.js" . | xargs -0 -l -n 2 sed -i -e 's|/somefile.js|/newLib.js|g'
In my small example using -n 2 will internally do this:
sed -i -e 's|/somefile.js|/newLib.js|g' ./file1 ./file2
sed -i -e 's|/somefile.js|/newLib.js|g' ./subdir/file3 ./subdir/file4
I had a large set of files and directories (around 3000 files), and using xargs -n 5 worked great.
When I tried -n 10 I got errors. Using xargs --verbose I could see some of the command-line calls were getting cut off at around 500 characters. So you may need to make -n smaller depending on the path length of the files you are working with.
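If you want to see where the cutoff sits on your system, GNU xargs can report the limits it thinks it has, and -s caps the command-line length explicitly instead of guessing at a file count; a sketch (the 8000 is an arbitrary example value):
xargs --show-limits < /dev/null
grep -lrZ "/somefile.js" . | xargs -0 -s 8000 sed -i -e 's|/somefile.js|/newLib.js|g'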
