How to escape special characters? - shell
I'm trying to remove songs via a bash shell loop, yet removing a file like this
while read item; do rm "$item"; done < duplicates
keeps getting caught up on the song names. Is it possible to get around this? My song titles might look like this:
/home/user/Music/Master List's Music/iTunes/iTunes\ Music/John\ Mayer/Room\ for\ Squares\ \[Aware\]/07\ 83.m4a
/home/user/Music/Master List's Music/bsg\ season\ 1\ \(Case\ Conflict\ 1\)/06\ A\ Good\ Lighter.mp3
/home/user/Music/Master List's Music/Nino\ Rota/The\ Godfather\ Pt.\ 3/14\ A\ Casa\ Amiche.m4a
As you can see, in order to remove an item I can have no %, ., (, ), [, ] or anything else unescaped (except, obviously, the . before the file extension). Is there a way I can escape special characters like this?
For instance, I used sed to turn each %20 into an escaped space:
sed 's/%20/\\ /g' duplicates > clean_duplicates
The output I'm looking for looks like this:
/home/user/Music/Master\ List\'s\ Music/iTunes/iTunes Music/John\ Mayer/Room\ for\ Squares\ \[Aware\]/07\ 83.m4a
/home/user/Music/Master\ List\'s\ Music/bsg\ season\ 1\ \(Case\ Conflict\ 1\)/06\ A\ Good\ Lighter.mp3
/home/user/Music/Master\ List\'s\ Music/Nino\ Rota/The Godfather\ Pt\.\ 3\/14\ A\ Casa\ Amiche.m4a
Update: to address the actual URL decoding (I missed it before):
while IFS= read -r line; do printf "$(echo -n "$line" | sed 's/\\/\\\\/g;s/\(%\)\([0-9a-fA-F][0-9a-fA-F]\)/\\x\2/g')\n"; done < input
Output:
/home/user/Music/Master List's Music/iTunes/iTunes Music/John Mayer/Room for Squares [Aware]/07 83.m4a
/home/user/Music/Master List's Music/bsg season 1 (Case Conflict 1)/06 A Good Lighter.mp3
/home/user/Music/Master List's Music/Nino Rota/The Godfather Pt. 3/14 A Casa Amiche.m4a
So, in order to delete those files, first redirect the cleaned output to a file:
while IFS= read -r line
do
    printf "$(echo -n "$line" | sed 's/\\/\\\\/g;s/\(%\)\([0-9a-fA-F][0-9a-fA-F]\)/\\x\2/g')\n"
done < duplicates > cleaned_duplicates
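As a variation on that decode loop, the decoded text can be kept out of the printf format string entirely by using the %b directive (a bash printf feature that expands backslash escapes in its argument); this avoids surprises if a name contains a % that is not part of a valid %XX escape. A hedged sketch under the same duplicates/cleaned_duplicates file names:

```shell
#!/bin/bash
# Same URL decode, but the filename is never used as a printf format string.
while IFS= read -r line; do
    # protect literal backslashes, then turn each %XX into \xXX
    esc=$(printf '%s' "$line" | sed 's/\\/\\\\/g; s/%\([0-9a-fA-F][0-9a-fA-F]\)/\\x\1/g')
    printf '%b\n' "$esc"   # %b expands the \xXX escapes (bash-specific)
done < duplicates > cleaned_duplicates
```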
while IFS= read -r file; do rm -v "$file"; done < cleaned_duplicates
If you prefer to store the names in a script file using explicit shell character escaping, you could do
while IFS= read -r file; do printf "rm -v %q\n" "$file"; done < cleaned_duplicates > script.sh
which should result in script.sh containing:
rm -v /home/user/Music/Master\ List\'s\ Music/iTunes/iTunes\ Music/John\ Mayer/R
rm -v /home/user/Music/Master\ List\'s\ Music/bsg\ season\ 1\ \(Case\ Conflict\
rm -v /home/user/Music/Master\ List\'s\ Music/Nino\ Rota/The\ Godfather\ Pt.\ 3/
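The %q directive is what produces that escaping: it quotes its argument so the result can be reused as shell input, and re-parsing the quoted form gives back the original string exactly. A small illustration (the path is made up for the example):

```shell
#!/bin/bash
# printf %q escapes spaces, quotes, parentheses and brackets for re-use as shell input
s="/home/user/Music/Master List's Music/06 A Good Lighter.mp3"
q=$(printf '%q' "$s")
echo "$q"      # the backslash-escaped form, as in script.sh
eval "r=$q"    # re-parsing the escaped form restores the original
[ "$r" = "$s" ] && echo "round-trip OK"
```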
Related
Speed up bash for loop which contains multiple sed commands
my bash for loop looks like:

for i in read_* ; do
    cut -f1 $i | sponge $i
    sed -i '1 s/^/>/g' $i
    sed -i '3 s/^/>ref\n/g' $i
    sed -i '4d' $i
    sed -i '1h;2H;1,2d;4G' $i
    mv $i $i.fasta
done

Are there any methods of speeding up this process, perhaps using GNU parallel?

EDIT: Added input and expected output.

Input:

sampleid 97 stuff 2086 42 213M = 3322 1431
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA

Hopeful output:

>ref
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
>sampleid
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA

I used the sed -i '1h;2H;1,2d;4G' $i command to swap lines 2 and 4.
If I read it right, this should create the same result, though it would probably help a LOT if I could see what your input and expected output look like...

awk '{$0=$1}
     FNR==1{hd=">"$0; next}
     FNR==2{hd=hd"\n"$0; next}
     FNR==3{print ">ref\n"$0 > FILENAME".fasta"}
     FNR==4{next}
     FNR==5{print hd"\n"$0 > FILENAME".fasta"}
' read_*

My input files:

$: cat read_x
foo x
bar x
baz x
last x
curiosity x

$: cat read_y
FOO y
BAR y
BAZ y
LAST y
CURIOSITY y

and the resulting output files:

$: cat read_x.fasta
>ref
baz
>foo
bar
curiosity

$: cat read_y.fasta
>ref
BAZ
>FOO
BAR
CURIOSITY

This runs in one pass with no loop aside from awk's usual internals, and leaves the originals in place so you can check it first. If all is good, all that's left is to remove the originals. For that, I would use extended globbing.

$: shopt -s extglob; rm read_!(*.fasta)

That will clean up the original inputs but not the new outputs. Same results, three commands, no loops. I am, of course, making some assumptions about what you are meaning to do that might not be accurate.

To get this format in a single sed call -

$: sed -e 's/[[:space:]].*//' -e '1{s/^/>/;h;d}' -e '2{H;s/.*/>ref/}' -e '4x' read_x
>ref
baz
>foo
bar
curiosity

but that's not the same commands you used, so maybe I'm misreading it.

To use this to in-place edit multiple files at a time (instead of calling it in a loop on each file), use -si so that the line numbers apply to each file rather than to the stream of records they collectively produce. DON'T use -is, though you could use -i -s.

$: sed -s -i -e 's/[[:space:]].*//' -e '1{s/^/>/;h;d}' -e '2{H;s/.*/>ref/}' -e '4x' read_*

This still leaves you with the issue of renaming each, but xargs makes that pretty easy in the given example.
printf "%s\n" read_* | xargs -I# mv # #.fasta

addendum

Using the file you gave in the OP, assuming every file is the same general structure and exactly 4 lines -

$: cat file_0    # I made files 0 through 7, but with same data
sampleid 97 stuff 2086 42 213M = 3322 1431
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA

$: sed -Esi '1{s/^([^[:space:]]+).*/>\1/;h;s/.*/>ref/}; 3x;' file_?
$: cat file_0    # used a diff on each, worked on all at once
>ref
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
>sampleid
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA

Breakout:

-Esi          Extended pattern matching, separate file linecounts, in-place edits
1{...};       Collectively do these commands, in order, only on line 1
s/^([^[:space:]]+).*/>\1/    add a leading > but strip everything after any whitespace
h             store the resulting >\1 line in the hold buffer
s/.*/>ref/    then replace the whole line with a literal >ref
3x            swap line 3 with the value in the hold buffer from line 1
file_?        I used a glob to supply the appropriate list of files all at once

Doing the same with awk:

$: awk 'FNR==1{id=">"$1; print ">ref" > FILENAME".fasta"; next}
        FNR==3{print id > FILENAME".fasta"; next}
        {print $0 > FILENAME".fasta"}' file_?

Then you can do file management as above with the xargs/mv for the sed or the shopt/rm for the awk - or we could add a little organizational work in awk if you like. Consider this:

awk 'BEGIN { system(" mkdir -p done ") }
     FNR==1 { id=">"$1; print ">ref" > FILENAME".fasta"; next }  # skip printing original
     FNR==3 { print id > FILENAME".fasta"; next }                # skip printing original
     { print $0 > FILENAME".fasta" }                             # every line NOT skipped
     FNR==4 { close(FILENAME); close(FILENAME".fasta")
              system("mv " FILENAME " done/") }' file_?
Then if there are any problems, it's easy to delete the fasta's, move the originals back, adjust the code, and try again. If everything is ok, it's fast and easy to rm -fr done, yes? Note that I really only added the mkdir inside a system call in the awk to show that you can, and to keep from having to manually do it separately if you have to run a few iterations or move it all into a wrapper script, etc.
The code in the question runs multiple subprocesses (cut, sponge, sed four times, and mv) for each file that is processed. Running subprocesses is relatively slow, so you can speed up the code significantly by reducing the number of them. This Shellcheck-clean code is one way to do it:

#! /bin/bash -p

old_files=()
for f in read_* ; do
    readarray -t lines <"$f"
    printf '>ref\n%s\n>%s\n%s\n' \
        "${lines[3]}" "${lines[0]%%[[:space:]]*}" "${lines[1]}" >"$f.fasta"
    old_files+=( "$f" )
done
rm -- "${old_files[@]}"

This runs no subprocesses when processing individual files. It just reads the lines of the old file into an array using the built-in readarray command and writes to the new file using the built-in printf. See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of the %% in ${lines[0]%%[[:space:]]*}. To avoid running rm for each file, the code keeps a list of files to be deleted and removes all of them at the end. If you try the code, consider commenting out the rm line until you are very confident that the rest of the code is doing what you want.
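To see what that printf call produces, here is the reshaping applied to a shortened copy of the 4-line sample from the question (read_demo is a throwaway filename for the illustration):

```shell
#!/bin/bash
# shortened sample from the question: header, read, match bars, reference
printf '%s\n' 'sampleid 97 stuff 2086 42' 'TATTTAGGG' '|||||||||' 'TTTTTAGGG' > read_demo

readarray -t lines < read_demo
# line 4 first, then ">" + first word of line 1, then line 2
printf '>ref\n%s\n>%s\n%s\n' \
    "${lines[3]}" "${lines[0]%%[[:space:]]*}" "${lines[1]}"
```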
Is there a way to add a suffix to files where the suffix comes from a list in a text file?
So currently the searches are coming up with single-word renaming solutions, where you define the (static) suffix within the code. I need to rename based on a text-based file list.

I have a list of files in /home/linux/test/:

1000.ext
1001.ext
1002.ext
1003.ext
1004.ext

Then I have a txt file (labels.txt) containing the labels I want to use:

Alpha
Beta
Charlie
Delta
Echo

I want to rename the files to look like (example1):

1000 - Alpha.ext
1001 - Beta.ext
1002 - Charlie.ext
1003 - Delta.ext
1004 - Echo.ext

How would you write a script which renames all the files in /home/linux/test/ to match the list in example1?
Loop through the two lists in parallel by redirecting labels.txt into the loop, so each read -r pulls the next label. Split each filename into prefix and extension, then combine everything to make the new name.

dir=/home/linux/test
for file in "$dir"/*.ext
do
    read -r label
    prefix=${file%.*}    # remove everything from the last . on
    ext=${file##*.}      # remove everything before the last .
    mv "$file" "$prefix - $label.$ext"
done < labels.txt
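The same pairing can also be done with paste, which glues the file list and the label list together line by line; the loop then splits each pair on the tab. A sketch under the same assumptions (glob order matches label order, one label per line):

```shell
#!/bin/bash
dir=/home/linux/test
# pair each filename with the label on the same line of labels.txt
paste <(printf '%s\n' "$dir"/*.ext) labels.txt |
while IFS=$'\t' read -r file label; do
    mv -- "$file" "${file%.*} - $label.${file##*.}"
done
```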
I originally partly got the request wrong, although this step is still useful, because it gives you the filenames you need.

#!/bin/sh

count=1000
cp labels.txt stack

cat > ed1 <<EOF
1p
q
EOF

cat > ed2 <<EOF
1d
wq
EOF

next () {
    [ -s stack ] && main
}

main () {
    line="$(ed -s stack < ed1)"
    echo "${count} - ${line}.ext" >> newfile
    ed -s stack < ed2
    count=$(($count+1))
    next
}

next

Now we just need to move the files:-

cp newfile stack
for i in *.ext
do
    newname="$(ed -s stack < ed1)"
    mv -v "${i}" "${newname}"
    ed -s stack < ed2
done
rm -v ./ed1
rm -v ./ed2
rm -v ./stack
rm -v ./newfile
On the possibility that you don't have exactly the same number of files as labels, I set it up to cycle a couple of arrays in pseudo-parallel.

$: cat script
#!/bin/env bash
lst=( *.ext )                     # array of files to rename
mapfile -t labels < labels.txt    # array of labels to attach
for ndx in "${!lst[@]}"           # for each filename's numeric index
do  # assign the new name
    new="${lst[ndx]/.ext/ - ${labels[ndx%${#labels[@]}]}.ext}"
    # show the command to rename the file
    echo "mv \"${lst[ndx]}\" \"$new\""
done

$: ls -1 *ext    # I added an extra file
1000.ext
1001.ext
1002.ext
1003.ext
1004.ext
1005.ext

$: ./script    # loops back if more files than labels
mv "1000.ext" "1000 - Alpha.ext"
mv "1001.ext" "1001 - Beta.ext"
mv "1002.ext" "1002 - Charlie.ext"
mv "1003.ext" "1003 - Delta.ext"
mv "1004.ext" "1004 - Echo.ext"
mv "1005.ext" "1005 - Alpha.ext"

$: ./script > do    # use ./script to write ./do
$: ./do             # use ./do to change the names
$: ls -1
'1000 - Alpha.ext'
'1001 - Beta.ext'
'1002 - Charlie.ext'
'1003 - Delta.ext'
'1004 - Echo.ext'
'1005 - Alpha.ext'
do
labels.txt
script

You can just remove the echo to have ./script rename the files there. I renamed labels to labels.txt to match your example.

If you aren't using bash this will need a call to something like sed or awk. Here's a short awk-based script that will do the same.

$: cat script2
#!/bin/env sh
printf "%s\n" *.ext > files.txt
awk 'NR==FNR{label[i++]=$0}
     NR>FNR{ if (! label[i] ) { i=0 }
             cmd="mv \""$0"\" \""gensub(/[.]ext/, " - "label[i++]".ext", 1)"\"";
             print cmd;
             # system(cmd);
     }' labels.txt files.txt

Uncomment the system line to make it actually do the renames as well. It does assume your filenames don't have embedded newlines. Let us know if that's a problem.
looping with grep over several files
I have multiple files /text-1.txt, /text-2.txt ... /text-20.txt and what I want to do is to grep for two patterns and stitch them into one file. For example, I have:

grep "Int_dogs" /text-1.txt > /text-1-dogs.txt
grep "Int_cats" /text-1.txt > /text-1-cats.txt
cat /text-1-dogs.txt /text-1-cats.txt > /text-1-output.txt

I want to repeat this for all 20 files above. Is there an efficient way in bash/awk, etc. to do this?
#!/bin/sh

count=1

next () {
    [ "${count}" -lt 21 ] && main
    [ "${count}" -eq 21 ] && exit 0
}

main () {
    file="text-${count}"
    grep "Int_dogs" "${file}.txt" > "${file}-dogs.txt"
    grep "Int_cats" "${file}.txt" > "${file}-cats.txt"
    cat "${file}-dogs.txt" "${file}-cats.txt" > "${file}-output.txt"
    count=$((count+1))
    next
}

next
grep has some features you seem not to be aware of.

grep can be launched on lists of files, but the output will be different. For a single file, the output will only contain the filtered line, like in this example:

cat text-1.txt
I have a cat.
I have a dog.
I have a canary.

grep "cat" text-1.txt
I have a cat.

For multiple files, the filename will also be shown in the output. Let's add another text file:

cat text-2.txt
I don't have a dog.
I don't have a cat.
I don't have a canary.

grep "cat" text-*.txt
text-1.txt:I have a cat.
text-2.txt:I don't have a cat.

grep can be extended to search for multiple patterns in files, using the -E switch. The patterns need to be separated using a pipe symbol:

grep -E "cat|dog" text-1.txt
I have a cat.
I have a dog.

(summary of the previous two points + the remark that grep -E equals egrep):

egrep "cat|dog" text-*.txt
text-1.txt:I have a cat.
text-1.txt:I have a dog.
text-2.txt:I don't have a dog.
text-2.txt:I don't have a cat.

So, in order to redirect this to an output file, you can simply say:

egrep "cat|dog" text-*.txt > text-1-output.txt
Assuming you're using bash, try this:

for i in $(seq 1 20); do
    rm -f text-${i}-output.txt
    grep -E "Int_dogs|Int_cats" text-${i}.txt >> text-${i}-output.txt
done

Details. This script does the following:

- Original files are expected to have the name syntax text-<INTEGER_NUMBER>.txt - for example text-1.txt, text-2.txt, ... text-100.txt.
- It creates a loop running from 1 to <N>, where <N> is the number of files you want to process.
- Warning: the rm -f text-${i}-output.txt command runs first and removes any existing output file, to ensure that a fresh output file is available at the end of the process.
- grep -E "Int_dogs|Int_cats" text-${i}.txt will try to match both strings in the original file, and via >> text-${i}-output.txt all matched lines are redirected to a newly created output file named after the original file's number. For example, if the integer in the original filename is 5 (text-5.txt), then text-5-output.txt will be created and will contain the matched lines (if any).
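Since both patterns go to the same output file anyway, the intermediate -dogs/-cats files can be skipped entirely, and a glob avoids hard-coding the count of 20. A minimal sketch (note the matches come out in the order they appear in each file, rather than all dogs before all cats):

```shell
#!/bin/sh
# one grep per file; -E alternation matches either pattern
for f in text-*.txt; do
    case $f in *-output.txt) continue ;; esac   # skip results of earlier runs
    grep -E "Int_dogs|Int_cats" "$f" > "${f%.txt}-output.txt"
done
```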
Using sed to find a string with wildcards and then replacing with same wildcards
So I am trying to remove new lines using sed, because it the only way I can think of to do it. I'm completely self taught so there may be a more efficient way that I just don't know. The string I am searching for is \HF=-[0-9](newline character). The problem is the data it is searching through can look like (Note: there are actual new line characters in this data, which I think is causing a bit of the problem) 1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc- pVDZ\\Squish3_Slide0\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.,-1. 3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.6974 ,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,0.\H ,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,0.,1 .3948,3.\C,0,0.,-1.3948,3.\C,0,1.2079,0.6974,3.\C,0,-1.2079,0.6974,3.\ C,0,-1.2079,-0.6974,3.\C,0,1.2079,-0.6974,3.\H,0,0.,2.4822,3.\H,0,2.14 97,1.2411,3.\H,0,-2.1497,1.2411,3.\H,0,-2.1497,-1.2411,3.\H,0,2.1497,- 1.2411,3.\H,0,0.,-2.4822,3.\\Version=ES64L-G09RevD.01\State=1-AG\HF=-4 61.3998608\MP2=-463.0005321\RMSD=3.490e-09\PG=D02H [SG"(C4H4),X(C8H8)] \\# OR 1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc- pVDZ\\Squish3.1_Slide0\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.,- 1.3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.69 74,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,0. \H,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,0. 
,1.3948,3.1\C,0,0.,-1.3948,3.1\C,0,1.2079,0.6974,3.1\C,0,-1.2079,0.697 4,3.1\C,0,-1.2079,-0.6974,3.1\C,0,1.2079,-0.6974,3.1\H,0,0.,2.4822,3.1 \H,0,2.1497,1.2411,3.1\H,0,-2.1497,1.2411,3.1\H,0,-2.1497,-1.2411,3.1\ H,0,2.1497,-1.2411,3.1\H,0,0.,-2.4822,3.1\\Version=ES64L-G09RevD.01\St ate=1-AG\HF=-461.4104442\MP2=-463.0062587\RMSD=3.651e-09\PG=D02H [SG"( C4H4),X(C8H8)]\\# OR 1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc- pVDZ\\Squish3.3_Slide1.7\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0. ,-1.3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0. 6974,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411, 0.\H,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0, 0.,-0.3052,3.3\C,0,0.,-3.0948,3.3\C,0,1.2079,-1.0026,3.3\C,0,-1.2079,- 1.0026,3.3\C,0,-1.2079,-2.3974,3.3\C,0,1.2079,-2.3974,3.3\H,0,0.,0.782 2,3.3\H,0,2.1497,-0.4589,3.3\H,0,-2.1497,-0.4589,3.3\H,0,-2.1497,-2.94 11,3.3\H,0,2.1497,-2.9411,3.3\H,0,0.,-4.1822,3.3\\Version=ES64L-G09Rev D.01\State=1-AG\HF=-461.436061\MP2=-463.0177441\RMSD=7.859e-09\PG=C02H [SGH(C4H4),X(C8H8)]\\# OR 1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc- pVDZ\\Squish3.6_Slide0.9\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0. ,-1.3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0. 6974,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411, 0.\H,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0, 0.,0.4948,3.6\C,0,0.,-2.2948,3.6\C,0,1.2079,-0.2026,3.6\C,0,-1.2079,-0 .2026,3.6\C,0,-1.2079,-1.5974,3.6\C,0,1.2079,-1.5974,3.6\H,0,0.,1.5822 ,3.6\H,0,2.1497,0.3411,3.6\H,0,-2.1497,0.3411,3.6\H,0,-2.1497,-2.1411, 3.6\H,0,2.1497,-2.1411,3.6\H,0,0.,-3.3822,3.6\\Version=ES64L-G09RevD.0 1\State=1-AG\HF=-461.4376969\MP2=-463.0163868\RMSD=7.263e-09\PG=C02H [ SGH(C4H4),X(C8H8)]\\# Basically the number I am looking for can be broken up into two lines at any point based on character count. 
I need to get rid of the newline breaking up the number so that I can extract the entire value into a separate file. (I have no problems with the extraction to a new file, hence why it isn't included in the code.) Currently I am using this code:

sed -i ':a;N;$!ba;s/HF=-*[0-9]*\n/HF=-*[0-9]*/g' $i &&

which ALMOST works, except it doesn't replace the wildcard values with the same values. It replaces them with the actual text [0-9] instead, and doesn't always remove the newline character. Important to this is that THERE ARE ACTUAL NEW LINE CHARACTERS in the output file, and there is no way to change that without messing up the other 30 lines I am extracting from this output file. What I want is to just get rid of the newline characters that occur when that string is found, regardless of how many digits there are in between the - sign and the newline character. So the expected output would be something like

1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3_Slide0\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.,-1.
3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.6974
,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,0.\H
,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,0.,1
.3948,3.\C,0,0.,-1.3948,3.\C,0,1.2079,0.6974,3.\C,0,-1.2079,0.6974,3.\
C,0,-1.2079,-0.6974,3.\C,0,1.2079,-0.6974,3.\H,0,0.,2.4822,3.\H,0,2.14
97,1.2411,3.\H,0,-2.1497,1.2411,3.\H,0,-2.1497,-1.2411,3.\H,0,2.1497,-
1.2411,3.\H,0,0.,-2.4822,3.\\Version=ES64L-G09RevD.01\State=1-AG\HF=-461.3998608\MP2=-463.0005321\RMSD=3.490e-09\PG=D02H [SG"(C4H4),X(C8H8)]
\\#

These files are rather large and have over 1500 executions of this line of code, so the more efficient the better. Everything else in the script this is in uses a combination of grep, awk, sed, and basic UNIX commands.

EDIT: After trying

sed -i -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n/\1/' $i &&

I still had no luck getting rid of those pesky newline characters.
If it has any effect on the answers at all, here is the rest of the code to go with the one line that is causing problems:

echo name HF MP2 mpdiff | cat > allE
for i in *.out
do
echo name HF MP2 mpdiff | cat > $i.allE
grep "Slide" $i | cut -d "\\" -f2 | cat | tr -d '\n' > $i.name &&
grep "EUMP2" $i | cut -d "=" -f3 | cut -c 1-25 | tr '\n' ' ' | tr -s ' ' >> $i.mp &&
grep "EUMP2" $i | cut -d "=" -f2 | cut -c 1-25 | tr '\n' ' ' | tr -s ' ' >> $i.mpdiff &&
sed -i -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n/\1/' $i &&
grep '\\HF' $i | awk -F 'HF' '{print substr($2,2,14)}' | tr '\n' ' ' >> $i.hf &&
paste $i.name >> $i.energies &&
sed -i 's/ /0 /g' $i.hf &&
sed -i 's/\\/0/g' $i.hf &&
sed -i 's/[A-Z]/0/g' $i.hf &&
paste $i.hf >> $i.energies &&
sed -i 's/[ABCEFGHIJKLMNOPQRSTUVWXYZ]//g' $i.mp &&
paste $i.mp >> $i.energies &&
sed -i 's/[ABCEFGHIJKLMNOPQRSTUVWXYZ]//g' $i.mpdiff &&
paste $i.mpdiff >> $i.energies &&
transpose $i.energies >> $i.allE #temp.txt &&
#cat temp.txt > $i.energies
#echo $i is finished
done
echo see allE for energies
#rm *.energies #temp.txt
rm *.name
rm *.mp
rm *.hf
rm *.mpdiff
Here is how you can fix your current attempt:

sed -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n/\1/'

Add the -i flag if you want to make the changes to the file itself, add && to chain it with the next command as in your script, etc. The -E flag is needed because backreferences (see below) are part of extended regular expressions.

I made the following changes:

- I changed -* to -?, as there should be at most one dash (if I understand correctly that it is in fact a minus sign, not a dash).
- I added the period to the bracket expression, so that the decimal point would be matched too. (Note that in a bracket expression, the dot is a regular character.)
- I wrapped the whole thing except the newline in parentheses, making it a subexpression which you can refer to with a backreference - which is what I did in the replacement part.

A few notes though: this will join the lines even if the entire number is at the end of one line but not followed by the closing \. If the entire number is on one line but the closing \ is on the next line, you can change the sed command slightly to leave those alone. On the other hand, this does not handle situations where, for example, one line ends in \H and the next line begins with F=304.222\. You only mentioned a "split number" in your problem statement; shouldn't you, though, also handle such cases, where the newline splits the \HF=...\ token, just not in the "number" portion of the token?
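As a quick sanity check, here is that command joining a two-line sample cut down from the first data block in the question (GNU sed assumed, as in the original attempt):

```shell
#!/bin/sh
# a \HF= number split across a line break, as in the question's data
printf '%s\n' 'State=1-AG\HF=-4' '61.3998608\MP2=-463.0005321' |
sed -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n/\1/'
```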
It looks like your input lines start with a space; I have ignored that in this solution. The -z option (GNU sed) reads NUL-separated records, so the whole file becomes one pattern space and the embedded newlines can be matched directly:

sed -rz 's/(AG\\HF=-[0-9]*)\n/\1/g' "$i"
Remove spaces between patterns
I have a log file where data is separated by spaces. Unfortunately one of the data fields contains spaces as well. I would like to replace those spaces with "%20". It looks like this:

2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right

The expected result is:

2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right

It is unpredictable how many spaces there are between the IP address and ".doc", so I would like to change them between these two patterns, using pure bash if possible. Thanks for the help.
$ cat file
2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right

Using Perl:

$ perl -lne 'if (/(.*([0-9]{1,3}\.){3}[0-9]{1,3} )(.*)(\.doc.*)/){($a,$b,$c)=($1,$3,$4);$b=~s/ /%20/g;print $a.$b.$c;}' file
2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right
This might work for you (GNU sed):

sed 's/\S*\s/&\n/4;s/\(\s\S*\)\{3\}$/\n&/;h;s/ /%20/g;H;g;s/\(\n.*\n\)\(.*\)\n.*\n\(.*\)\n.*/\3\2/' file

This splits the line into three, copies the line, replaces the spaces with %20s in one of the copies, and reassembles the line, discarding the unwanted pieces.

EDIT: With reference to the comment below, the above solution can be improved to:

sed -r 's/\S*\s/&\n/4;s/.*\.doc/&\n/;h;s/ /%20/g;H;g;s/(\n.*\n)(.*)\n.*\n(.*)\n.*/\3\2/' file
Untested as of yet, but in Bash 4 it's possible to do this:

if [[ $line =~ (.*([0-9]+\.){3}[0-9]+ +)([^ ].*\.doc)(.*) ]]; then
    nospace=${BASH_REMATCH[3]// /%20}
    printf "%s%s%s\n" "${BASH_REMATCH[1]}" "$nospace" "${BASH_REMATCH[4]}"
fi
Here's one way with GNU sed:

echo "2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right" | sed -r 's/(([0-9]+\.){3}[0-9]+\s+)(.*\.doc)/\1\n\3\n/; h; s/[^\n]+\n([^\n]+)\n.*$/\1/; s/\s/%20/g; G; s/([^\n]+)\n([^\n]+)\n([^\n]+)\n(.*)$/\2\1\4/'

Output:

2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right

Explanation:

s/(([0-9]+\.){3}[0-9]+\s+)(.*\.doc)/\1\n\3\n/    # Separate the interesting bit on its own line
h                                                # Store the rest in the hold space for later
s/[^\n]+\n([^\n]+)\n.*$/\1/                      # Isolate the interesting bit
s/\s/%20/g                                       # Do the replacement
G                                                # Fetch the stored bits back
s/([^\n]+)\n([^\n]+)\n([^\n]+)\n(.*)$/\2\1\4/    # Reorganize into the correct order
Just bash. Assuming 4 fields appear before the space-separated string and 3 fields appear after:

reformat_line() {
    local sep i new=""
    for ((i=1; i<=$#; i++)); do
        if (( i==1 )); then
            sep=""
        elif (( (1<i && i<=5) || ($#-3<i && i<=$#) )); then
            sep=" "
        else
            sep="%20"
        fi
        new+="$sep${!i}"
    done
    echo "$new"
}

while IFS= read -r line; do
    reformat_line $line    # unquoted variable here
done < filename

outputs

2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right
A variation on Thor's answer, but using 3 processes (4 with the cat below, though you can get rid of it by putting your_file as the last argument of the 1st sed):

cat your_file |
sed -r -e 's/ (([0-9]+\.){3}[0-9]+) +(.*\.doc) / \1\n\3\n/' |
sed -e '2~3s/ /%20/g' |
paste -s -d " \n"

As Thor explained, the 1st sed (s/ (([0-9]+\.){3}[0-9]+) +(.*\.doc) / \1\n\3\n/) separates the interesting bit onto its own line. Then the 2nd sed replaces the spaces with %20 on the 2nd line and every 3 lines after that. Finally, paste puts it back together.

It must be noted that the 2~3 part is a GNU sed extension. If you do not have GNU sed, you can do:

cat your_file |
sed -r -e 's/ (([0-9]+\.){3}[0-9]+) +(.*\.doc) / \1\n\3\n/' |
sed -e 'N;P;s/.*\n//;s/ /%20/g;N' |
paste -s -d " \n"
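For comparison, the field arithmetic from the pure-bash answer can also be done in a single awk process: a %20 separator is emitted only between the name fields (field 5 through the 4th-from-last), and a plain space elsewhere. A sketch assuming the same 4-fields-before / 3-fields-after layout (your_file as above):

```shell
awk '{
    out = $1
    for (i = 2; i <= NF; i++) {
        # %20 only when the separator falls inside the name span (fields 5 .. NF-3)
        sep = (i >= 6 && i <= NF - 3) ? "%20" : " "
        out = out sep $i
    }
    print out
}' your_file
```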