Optimize BASH Code to Delete First and Last Line of XML Files - shell

How can this line from a BASH script be optimized to work faster in removing the first and last lines of a directory full of XML files?
sed -s -i -e 1d ./files/to/edit/*.xml && sed -s -i -e '$d' ./files/to/edit/*.xml
The sed command does not have to be used. Any BASH code will work; python3 would also be nice.

Try this:
sed -i '1d;$d' ./files/to/edit/*.xml
It's faster; see:
time find /usr/share/doc/x* | xargs -I% sed '1d' % && sed '$d' %
real 0m0.611s
user 0m0.033s
sys 0m0.120s
time find /usr/share/doc/x* | xargs -I% sed -e '1d' -e '$d' %
real 0m0.613s
user 0m0.027s
sys 0m0.140s
time find /usr/share/doc/x* | xargs -I% sed '1d;$d' %
real 0m0.565s
user 0m0.023s
sys 0m0.140s
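If the directory holds more files than a single glob can comfortably pass, a find-based sketch also works (assuming GNU sed; with -i each file is edited in place separately, so $d removes the last line of every file):
find ./files/to/edit -name '*.xml' -exec sed -i '1d;$d' {} +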

Related

Bash line breaks

I am using Git Bash to recursively find all of the file extensions in our legacy web site. When I pipe it to a file I would like to add line-breaks and a period in front of the file extension.
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{1,12}" | awk '{print tolower($0)}' | sort -u
You have different ways.
When you do not want to change your existing commands I am tempted to use
printf ".%s\n" $(find . -type f -name "*\.*" | grep -o -E "\.[^\.]+$" |
grep -o -E "[[:alpha:]]{1,12}" | awk '{print tolower($0)}' | sort -u ) # Wrong
This is incorrect: when a file extension contains a space (like example.with space), it will be split across multiple lines.
Your command already outputs everything on separate lines, so you can just put a dot before each line with | sed 's/^/./'
You can skip commands in the pipeline. You can let awk put a dot in front of a line with
find . -type f -name "*\.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{1,12}" | awk '{print "." tolower($0)}' | sort -u
Or you can let sed add the dot, and with GNU sed also convert to lowercase:
find . -type f -name "*.*" | sed -r 's/.*\.([^.]*)$/.\L\1/' | sort -u
In the last command I skipped the grep limiting to 12 characters; I think it works differently than you would like:
echo test.qqqwwweeerrrtttyyyuuuiiioooppp | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{1,12}"
Adding a second line break after each line can be done in different ways.
When you have the awk command, switch the awk and the sort and use
awk '{print "." tolower($0) "\n"}'
Or add newlines at the end of the pipeline: sed 's/$/\n/'.
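Putting the pieces together, a sketch of the full pipeline, with the dot added in awk and the blank line appended at the end (the \n in the replacement relies on GNU sed):
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{1,12}" | awk '{print "." tolower($0)}' | sort -u | sed 's/$/\n/'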

Removing strings from multiple files

I'm trying to organise and rename some roms I got. I've already used the command line to remove regions like " (USA)" and " (Japan)", including the space in front, from the filenames. Now I need to update my .cue files; I've tried the following but something is missing...
grep --include={*.cue} -rnl './' -e " (USA)" | xargs -i# sed -i 's/ (USA)//g' #
grep --include={*.cue} -rnl './' -e " (Europe)" | xargs -i# sed -i 's/ (Europe)//g' #
grep --include={*.cue} -rnl './' -e " (Japan)" | xargs -i# sed -i 's/ (Japan)//g' #
I got it to work on one occasion but can't seem to get it right again...
Awesome thanks, I used:
sed -i 's/ (Japan)//g;s/ (Europe)//g;s/ (USA)//g' *.cue
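If the .cue files sit in subdirectories (the grep commands above searched recursively), a recursive sketch with find:
find . -type f -name '*.cue' -exec sed -i 's/ (Japan)//g;s/ (Europe)//g;s/ (USA)//g' {} +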

obtain md5sum on every linked library

I've got an issue where a program suddenly doesn't want to start, no error, no nothing. To ensure the integrity of the code and its linked libraries I wanted to compare the md5sum of every (dynamically) linked library. From other posts in this forum I found it easy to list all the linked libraries and show them nicely:
ldd myProgram | grep so | sed -e '/^[^\t]/ d' \
| sed -e 's/\t//' | sed -e 's/.*=..//' \
| sed -e 's/ (0.*)//'
How can I add the md5sum or sha1sum so it will add a column with the checksum next to the filename? Simply adding md5sum only produces one line and doesn't seem to do the job:
ldd myProgram | grep so | sed -e '/^[^\t]/ d' \
| sed -e 's/\t//' | sed -e 's/.*=..//' \
| sed -e 's/ (0.*)//' | md5sum
yields
3baf2fafbce4dc8a313ded067c0fccab -
leaving md5sum out produces the nice list of linked libraries:
/lib/i386-linux-gnu/i686/cmov/libpthread.so.0
/lib/i386-linux-gnu/i686/cmov/librt.so.1
/lib/i386-linux-gnu/i686/cmov/libdl.so.2
/lib/i386-linux-gnu/libz.so.1
/usr/lib/i386-linux-gnu/libodbc.so.1
/usr/lib/libcrypto++.so.9
/lib/i386-linux-gnu/libpng12.so.0
/usr/lib/i386-linux-gnu/libstdc++.so.6
/lib/i386-linux-gnu/i686/cmov/libm.so.6
/lib/i386-linux-gnu/libgcc_s.so.1
/lib/i386-linux-gnu/i686/cmov/libc.so.6
/lib/ld-linux.so.2
/usr/lib/i386-linux-gnu/libltdl.so.7
Any hint is much appreciated!
What your script is doing is piping the literal text "/lib/i386-linux-gnu/i686/cmov/libpthread.so.0..." etc. into md5sum and calculating the md5sum of that text...
You can use xargs to repeat any command on every line of input. The -I{} isn't strictly necessary, but I'd recommend it as it makes your script more readable and easier to understand.
For example
adam@brimstone:~$ ldd $(which bash) \
| grep so | sed -e '/^[^\t]/ d' \
| sed -e 's/\t//' | sed -e 's/.*=..//' \
| sed -e 's/ (0.*)//' \
| xargs -I{} md5sum {}
6a0cb513f136f5c40332e3882e603a02 /lib/x86_64-linux-gnu/libtinfo.so.5
c60bb4f3ae0157644b993cc3c0d2d11e /lib/x86_64-linux-gnu/libdl.so.2
365459887779aa8a0d3148714d464cc4 /lib/x86_64-linux-gnu/libc.so.6
578a20e00cb67c5041a78a5e9281b70c /lib64/ld-linux-x86-64.so.2
A for loop can also be used:
for FILE in `<your command>`;do md5sum $FILE;done
For example:
for FILE in `ldd /usr/bin/gcc | grep so | sed -e '/^[^\t]/ d' | sed -e 's/\t//' | sed -e 's/.*=..//' | sed -e 's/ (0.*)//'`;do md5sum $FILE;done
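A while read loop does the same thing without relying on word splitting of the backtick output; a sketch reusing the same sed pipeline:
ldd myProgram | grep so | sed -e '/^[^\t]/ d' -e 's/\t//' -e 's/.*=..//' -e 's/ (0.*)//' | while read -r lib; do md5sum "$lib"; done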

Sed with Xargs cannot open passed file (Cygwin)

Trying to use the beauty of Sed so I don't have to manually update a few hundred files. I'll note my employer only allows use of Win8 (joy), so I use Cygwin all day until I can use my Linux boxes at home.
The following works on a Linux (bash) command line, but not Cygwin
> grep -lrZ "/somefile.js" . | xargs -0 -l sed -i -e 's|/somefile.js|/newLib.js|g'
sed: can't read ./testTarget.jsp: No such file or directory
# works
> sed -i 's|/somefile.js|/newLib.js|g' ./testTarget.jsp
So the command by itself works, but not passed through Xargs. And, before you say to use Perl instead of Sed, the Perl equivalent throws the very same error
> grep -lrZ "/somefile.js" . | xargs -0 perl -i -pe 's|/somefile.js|/newLib.js|g'
Can't open ./testTarget.jsp
: No such file or directory.
Use the xargs -n option to split up the arguments and force separate calls to sed.
On Windows using GnuWin tools (not Cygwin) I found that I needed to split up the input to sed. By default xargs will pass ALL of the files from grep to one call to sed.
Let's say you have 4 files that match your grep call; the sed command will run through xargs like this:
sed -i -e 's|/somefile.js|/newLib.js|g' ./file1 ./file2 ./subdir/file3 ./subdir/file4
If the number of files is too large, sed will give you this error.
Use the -n option to have xargs call sed repeatedly until it exhausts all of the arguments.
grep -lrZ "/somefile.js" . | xargs -0 -l -n 2 sed -i -e 's|/somefile.js|/newLib.js|g'
In my small example using -n 2 will internally do this:
sed -i -e 's|/somefile.js|/newLib.js|g' ./file1 ./file2
sed -i -e 's|/somefile.js|/newLib.js|g' ./subdir/file3 ./subdir/file4
I had a large set of files and directories (around 3000 files), and using xargs -n 5 worked great.
When I tried -n 10 I got errors. Using xargs --verbose I could see some of the command-line calls were getting cut off at around 500 characters. So you may need to make -n smaller depending on the path length of the files you are working with.
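If the limiting factor is command-line length rather than file count, xargs can also cap the characters per invocation directly; a sketch using -s (--max-chars), with 400 as an arbitrary example limit:
grep -lrZ "/somefile.js" . | xargs -0 -s 400 sed -i -e 's|/somefile.js|/newLib.js|g'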

xargs to execute a string - what am I doing wrong?

I'm trying to rename all files in current directory such that upper case name is converted to lower. I'm trying to do it like this:
ls -1|gawk '{print "`mv "$0" "tolower($0)"`"}'|xargs -i -t eval {}
I have two files in the directory, Y and YY
-t added for debugging, and output is:
eval `mv Y y`
xargs: eval: No such file or directory
if I execute the eval on its own, it works and moves Y to y.
I know there are other ways to achieve this, but I'd like to get this working if I can!
Cheers
eval is a shell builtin command, not a standalone executable. Thus, xargs cannot run it directly. You probably want:
ls -1 | gawk '{print "`mv "$0" "tolower($0)"`"}' | xargs -i -t sh -c "{}"
Although you're looking at an xargs solution, another method to perform the same thing can be done with tr (assuming sh/bash/ksh syntax):
for i in *; do mv $i `echo $i | tr '[A-Z]' '[a-z]'`; done
If your files are created by creative users, you will see files like:
My brother's 12" records
The solutions so far do not work on that kind of files. If you have GNU Parallel installed this will work (even on the files with creative names):
ls | parallel 'mv {} "$(echo {} | tr "[:upper:]" "[:lower:]")"'
Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ
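A bash-only alternative that also copes with such filenames is the ${var,,} lowercase expansion (bash 4+); a minimal sketch:
for f in *; do mv -- "$f" "${f,,}"; done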
You can use eval with xargs like the one below.
Note: I only tested this in bash shell
ls -1| gawk '{print "mv "$0" /tmp/"toupper($0)""}'| xargs -I {} sh -c "eval {}"
or
ls -1| gawk '{print "mv "$0" /tmp/"toupper($0)""}'| xargs -I random_var_name sh -c "eval random_var_name"
I generally use this approach when I want to avoid a one-liner for loop.
e.g.
for file in $(find /some/path | grep "pattern");do somecmd $file; done
The same can be written like below
find /some/path | grep "pattern"| xargs -I {} sh -c "somecmd {}"
