How can this line from a BASH script be optimized to work faster in removing the first and last lines of a directory full of XML files?
sed -s -i -e 1d ./files/to/edit/*.xml && sed -s -i -e '$d' ./files/to/edit/*.xml
The sed command does not have to be used. Any BASH code will work; python3 would also be nice.
Try this:
sed -i '1d;$d' ./files/to/edit/*.xml
It's faster, see:
time find /usr/share/doc/x* | xargs -I% sed '1d' % && sed '$d' %
real 0m0.611s
user 0m0.033s
sys 0m0.120s
time find /usr/share/doc/x* | xargs -I% sed -e '1d' -e '$d' %
real 0m0.613s
user 0m0.027s
sys 0m0.140s
time find /usr/share/doc/x* | xargs -I% sed '1d;$d' %
real 0m0.565s
user 0m0.023s
sys 0m0.140s
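The timings above start one sed process per file, so they are dominated by process startup. Here is a minimal sketch of a batched variant, assuming GNU sed (where -i handles each file separately) and a find that supports -exec ... +. The temporary directory and file names are made up for the demo:

```shell
# Demo in a throwaway directory (hypothetical file names and contents).
workdir=$(mktemp -d)
printf '<root>\n<a/>\n<b/>\n</root>\n' > "$workdir/one.xml"
printf 'first\nmiddle\nlast\n' > "$workdir/two.xml"

# find passes many files to each sed invocation; with -i, GNU sed
# applies '1d;$d' to the first and last line of every file on its own.
find "$workdir" -maxdepth 1 -type f -name '*.xml' -exec sed -i '1d;$d' {} +

cat "$workdir/one.xml" "$workdir/two.xml"
```

For the original question the find argument would be ./files/to/edit instead of the temporary directory.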
I am using Git Bash to recursively find all of the file extensions in our legacy web site. When I pipe it to a file I would like to add line-breaks and a period in front of the file extension.
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{1,12}" | awk '{print tolower($0)}' | sort -u
There are several ways to do this.
If you do not want to change your existing commands, it is tempting to use
printf ".%s\n" $(find . -type f -name "*\.*" | grep -o -E "\.[^\.]+$" |
grep -o -E "[[:alpha:]]{1,12}" | awk '{print tolower($0)}' | sort -u ) # Wrong
This is incorrect: when a file extension contains a space (like example.with space), it is split across separate lines.
Your command already prints everything on separate lines, so you can just put a dot in front of each line with | sed 's/^/./'
You can skip commands in the pipeline. You can let awk put a dot in front of a line with
find . -type f -name "*\.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{1,12}" | awk '{print "." tolower($0)}' | sort -u
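Or you can let sed add the dot; GNU sed can also convert to lowercase.
find . -type f -name "*.*" | sed -r 's/.*\.([^.]*)$/.\L\1/' | sort -u
In the last command I skipped the grep that limits extensions to 12 characters, because I think it works differently than you expect. It does not filter out long extensions, it splits them into pieces of at most 12 letters: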
echo test.qqqwwweeerrrtttyyyuuuiiioooppp | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{1,12}"
Adding a second line break after each line can be done in different ways.
If you use the awk command, swap the awk and sort steps and use
awk '{print "." tolower($0) "\n"}'
Or add newlines at the end of the pipeline: sed 's/$/\n/'.
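Putting the pieces above together, here is a rough sketch that keeps extensions containing spaces intact. The function name list_extensions and the demo files are made up for illustration, and it assumes file names contain no newlines:

```shell
# Print the unique, lowercased extensions under a directory, one per line,
# each with a leading dot.
list_extensions() {
    find "$1" -type f -name '*.*' |
        sed 's/.*\.//' |                 # keep only the text after the last dot
        tr '[:upper:]' '[:lower:]' |
        LC_ALL=C sort -u |
        sed 's/^/./'                     # put the dot back in front
}

# Demo with made-up files.
demo=$(mktemp -d)
touch "$demo/a.TXT" "$demo/b.txt" "$demo/c.Html" "$demo/example.with space"
list_extensions "$demo"
```

Because nothing is passed through word splitting, "with space" stays on one line. Appending | sed 's/$/\n/' (GNU sed) would add the extra blank line after each entry.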
I'm trying to organise and rename some ROMs I got. I've already used the command line to remove regions like " (USA)" and " (Japan)", including the space in front, from the filenames. Now I need to update my .cue files. I've tried the following, but something is missing...
grep --include={*.cue} -rnl './' -e " (USA)" | xargs -i# sed -i 's/ (USA)//g' #
grep --include={*.cue} -rnl './' -e " (Europe)" | xargs -i# sed -i 's/ (Europe)//g' #
grep --include={*.cue} -rnl './' -e " (Japan)" | xargs -i# sed -i 's/ (Japan)//g' #
I got it to work on one occasion but can't seem to get it right again...
Awesome thanks, I used:
sed -i 's/ (Japan)//g;s/ (Europe)//g;s/ (USA)//g' *.cue
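For .cue files spread over subdirectories, the three substitutions could also be folded into one extended regular expression. This is only a sketch, assuming GNU sed (-i and -E) and a find with -exec ... +; the demo files and their contents are invented:

```shell
# Demo .cue files in a throwaway directory (made-up contents).
demo=$(mktemp -d)
printf 'FILE "Some Game (USA).bin" BINARY\n' > "$demo/a.cue"
printf 'FILE "Other Game (Europe) (Japan).bin" BINARY\n' > "$demo/b.cue"

# One alternation covers all three regions; the leading space is removed too.
find "$demo" -type f -name '*.cue' \
    -exec sed -i -E 's/ \((USA|Europe|Japan)\)//g' {} +

cat "$demo/a.cue" "$demo/b.cue"
```

Replacing "$demo" with . would process the current directory and everything below it.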
I've got an issue where a program suddenly doesn't want to start: no error, no nothing. To ensure the integrity of the code and its linked libraries I wanted to compare the md5sum of every (dynamically) linked library. From other posts in this forum I found it easy to list all the linked libraries and show them nicely:
ldd myProgram | grep so | sed -e '/^[^\t]/ d' \
| sed -e 's/\t//' | sed -e 's/.*=..//' \
| sed -e 's/ (0.*)//'
How can I add the md5sum or sha1sum so it will add a column with the checksum next to the filename? Simply adding md5sum only produces one line and doesn't seem to do the job:
ldd myProgram | grep so | sed -e '/^[^\t]/ d' \
| sed -e 's/\t//' | sed -e 's/.*=..//' \
| sed -e 's/ (0.*)//' | md5sum
yields
3baf2fafbce4dc8a313ded067c0fccab -
leaving md5sum out produces the nice list of linked libraries:
/lib/i386-linux-gnu/i686/cmov/libpthread.so.0
/lib/i386-linux-gnu/i686/cmov/librt.so.1
/lib/i386-linux-gnu/i686/cmov/libdl.so.2
/lib/i386-linux-gnu/libz.so.1
/usr/lib/i386-linux-gnu/libodbc.so.1
/usr/lib/libcrypto++.so.9
/lib/i386-linux-gnu/libpng12.so.0
/usr/lib/i386-linux-gnu/libstdc++.so.6
/lib/i386-linux-gnu/i686/cmov/libm.so.6
/lib/i386-linux-gnu/libgcc_s.so.1
/lib/i386-linux-gnu/i686/cmov/libc.so.6
/lib/ld-linux.so.2
/usr/lib/i386-linux-gnu/libltdl.so.7
Any hint is much appreciated!
Your script is piping the literal text "/lib/i386-linux-gnu/i686/cmov/libpthread.so.0..." etc. into md5sum, which calculates the checksum of that text rather than of the files it names.
You can use xargs to repeat any command on every line of input. The -I{} isn't strictly necessary, but I'd recommend it as it makes your script more readable and easier to understand.
For example:
adam#brimstone:~$ ldd $(which bash) \
| grep so | sed -e '/^[^\t]/ d' \
| sed -e 's/\t//' | sed -e 's/.*=..//' \
| sed -e 's/ (0.*)//' \
| xargs -I{} md5sum {}
6a0cb513f136f5c40332e3882e603a02 /lib/x86_64-linux-gnu/libtinfo.so.5
c60bb4f3ae0157644b993cc3c0d2d11e /lib/x86_64-linux-gnu/libdl.so.2
365459887779aa8a0d3148714d464cc4 /lib/x86_64-linux-gnu/libc.so.6
578a20e00cb67c5041a78a5e9281b70c /lib64/ld-linux-x86-64.so.2
A for loop can also be used:
for FILE in $(<your command>); do md5sum "$FILE"; done
For example:
for FILE in $(ldd /usr/bin/gcc | grep so | sed -e '/^[^\t]/ d' | sed -e 's/\t//' | sed -e 's/.*=..//' | sed -e 's/ (0.*)//'); do md5sum "$FILE"; done
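The chain of sed calls could probably be collapsed into one awk step. Below is a sketch under the assumption that ldd prints at most one absolute path per line; the helper name ldd_paths and the sample text (including its addresses) are invented for the demo:

```shell
# Print the first field that looks like an absolute path on each line of
# ldd-style input; lines without one (e.g. linux-vdso) are skipped.
ldd_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }'
}

# Canned sample in ldd's usual layout, so the demo needs no real binary.
sample=$(printf '\tlinux-vdso.so.1 (0x00007ffd)\n\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f00)\n\t/lib64/ld-linux-x86-64.so.2 (0x00007f11)\n')
printf '%s\n' "$sample" | ldd_paths
```

With a real binary, something like ldd myProgram | ldd_paths | while IFS= read -r lib; do md5sum "$lib"; done should print one checksum line per library.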
Trying to use the beauty of Sed so I don't have to manually update a few hundred files. I'll note my employer only allows use of Win8 (joy), so I use Cygwin all day until I can use my Linux boxes at home.
The following works on a Linux (bash) command line, but not Cygwin
> grep -lrZ "/somefile.js" . | xargs -0 -l sed -i -e 's|/somefile.js|/newLib.js|g'
sed: can't read ./testTarget.jsp: No such file or directory
# works
> sed -i 's|/somefile.js|/newLib.js|g' ./testTarget.jsp
So the command works by itself, but not when passed through xargs. And, before you say to use Perl instead of sed, the Perl equivalent throws the very same error:
> grep -lrZ "/somefile.js" . | xargs -0 perl -i -pe 's|/somefile.js|/newLib.js|g'
Can't open ./testTarget.jsp
: No such file or directory.
Use the xargs -n option to split up the arguments and force separate calls to sed.
On Windows, using GnuWin tools (not Cygwin), I found that I need to split up the input to sed. By default xargs will pass ALL of the files from grep to one call to sed.
Let's say you have 4 files that match your grep call, the sed command will run through xargs like this:
sed -i -e 's|/somefile.js|/newLib.js|g' ./file1 ./file2 ./subdir/file3 ./subdir/file4
If the number of files is too large sed will give you this error.
Use the -n option to have xargs call sed repeatedly until it exhausts all of the arguments.
grep -lrZ "/somefile.js" . | xargs -0 -l -n 2 sed -i -e 's|/somefile.js|/newLib.js|g'
In my small example using -n 2 will internally do this:
sed -i -e 's|/somefile.js|/newLib.js|g' ./file1 ./file2
sed -i -e 's|/somefile.js|/newLib.js|g' ./subdir/file3 ./subdir/file4
I had a large set of files and directories (around 3000 files), and using xargs -n 5 worked great.
When I tried -n 10 I got errors. Using xargs --verbose I could see that some of the command-line calls were getting cut off at around 500 characters. So you may need to make -n smaller depending on the path length of the files you are working with.
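To see how -n groups the arguments without touching any files, the command can be prefixed with echo. A small sketch, assuming an xargs that supports -0; the file names are placeholders and batch_demo is just a made-up wrapper:

```shell
# Feed five NUL-separated placeholder names to xargs and let it print
# (rather than run) the sed command it would start for each batch of two.
batch_demo() {
    printf '%s\0' ./file1 ./file2 ./subdir/file3 ./subdir/file4 ./file5 |
        xargs -0 -n 2 echo sed -i -e 's|/somefile.js|/newLib.js|g'
}

batch_demo
```

This prints three command lines: two with two files each and a last one with the remaining file. Note that echo drops the quotes around the sed expression in what it shows.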
I'm trying to rename all files in current directory such that upper case name is converted to lower. I'm trying to do it like this:
ls -1|gawk '{print "`mv "$0" "tolower($0)"`"}'|xargs -i -t eval {}
I have two files in the directory, Y and YY
-t added for debugging, and output is:
eval `mv Y y`
xargs: eval: No such file or directory
If I execute the eval on its own, it works and moves Y to y.
I know there are other ways to achieve this, but I'd like to get this working if I can!
Cheers
eval is a shell builtin command, not a standalone executable. Thus, xargs cannot run it directly. You probably want:
ls -1 | gawk '{print "`mv "$0" "tolower($0)"`"}' | xargs -i -t sh -c "{}"
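A quick way to check this for yourself, as a sketch: the echo line below is only a harmless stand-in for the generated mv commands.

```shell
# 'type' reports how the shell resolves a name; eval is a builtin,
# so there is no eval executable for xargs to start.
type eval

# xargs can start sh, and sh -c then interprets the generated line.
printf 'echo renamed Y to y\n' | xargs -I{} sh -c '{}'
```

Keep in mind that the input line is interpreted as shell code, so this is only reasonable when you control the file names.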
Although you're looking for an xargs solution, the same thing can be done with a loop and tr (assuming sh/bash/ksh syntax):
for i in *; do mv $i `echo $i | tr '[A-Z]' '[a-z]'`; done
If your files are created by creative users, you will see files like:
My brother's 12" records
The solutions so far do not work on files like that. If you have GNU Parallel installed, this will work (even on files with creative names):
ls | parallel 'mv {} "$(echo {} | tr "[:upper:]" "[:lower:]")"'
Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ
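If GNU Parallel is not available, a plain loop with consistent quoting should cope with such names as well. A minimal sketch; the function name lowercase_names and the demo directory are made up, and it does not handle names ending in a newline:

```shell
# Rename every entry directly inside directory $1 to its lowercase form.
lowercase_names() {
    for path in "$1"/*; do
        [ -e "$path" ] || continue            # directory was empty
        name=${path##*/}
        lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        # Skip names that are already lowercase; -- guards names starting with -.
        [ "$name" = "$lower" ] || mv -- "$path" "$1/$lower"
    done
}

# Demo with made-up files, including one with a quote and a double quote.
demo=$(mktemp -d)
touch "$demo/Y" "$demo/YY" "$demo/My brother's 12\" records"
lowercase_names "$demo"
ls "$demo"
```

The quotes around every expansion are what keep spaces and quote characters in the names from being reinterpreted by the shell.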
You can use eval with xargs as shown below.
Note: I only tested this in the bash shell.
ls -1| gawk '{print "mv "$0" /tmp/"toupper($0)""}'| xargs -I {} sh -c "eval {}"
or
ls -1| gawk '{print "mv "$0" /tmp/"toupper($0)""}'| xargs -I random_var_name sh -c "eval random_var_name"
I generally use this approach when I want to avoid a one-liner for loop.
e.g.
for file in $(find /some/path | grep "pattern");do somecmd $file; done
The same can be written like below
find /some/path | grep "pattern"| xargs -I {} sh -c "somecmd {}"
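One caveat with pasting {} into the sh -c string is that file names containing spaces or quotes get re-parsed as shell code. A sketch of a variant that hands the name over as a positional parameter instead; the printf is a stand-in for somecmd, and run_on_matches and the demo files are invented:

```shell
# Demo files, including names with a space and a single quote.
demo=$(mktemp -d)
touch "$demo/pattern one.txt" "$demo/pattern's two.txt" "$demo/other.txt"

# The file name arrives in the inner shell as $1, so it is never
# interpreted as code. Here the stand-in command prints the base name.
run_on_matches() {
    find "$1" -type f -name 'pattern*' \
        -exec sh -c 'printf "%s\n" "${1##*/}"' sh {} \; | LC_ALL=C sort
}

run_on_matches "$demo"
```

The extra sh after the quoted script fills $0, so the file name ends up in $1.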