Remove spaces between patterns - bash

I have a log file where data is separated by spaces. Unfortunately one of the datafields contains spaces as well. I would like to replace those spaces with "%20". It looks like this:
2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right
the expected result is
2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right
It is unpredictable how many spaces there are between the IP address and ".doc", so I would like to replace the spaces between these two patterns, using pure bash if possible.
thanks for the help

$ cat file
2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right
Using Perl:
$ perl -lne 'if (/(.*([0-9]{1,3}\.){3}[0-9]{1,3} )(.*)(\.doc.*)/){($a,$b,$c)=($1,$3,$4);$b=~s/ /%20/g;print $a.$b.$c;}' file
2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right

This might work for you (GNU sed):
sed 's/\S*\s/&\n/4;s/\(\s\S*\)\{3\}$/\n&/;h;s/ /%20/g;H;g;s/\(\n.*\n\)\(.*\)\n.*\n\(.*\)\n.*/\3\2/' file
This splits the line into three, copies the line, replaces spaces with %20 in one of the copies and reassembles the line, discarding the unwanted pieces.
EDIT:
With reference to the comment below, the above solution can be ameliorated to:
sed -r 's/\S*\s/&\n/4;s/.*\.doc/&\n/;h;s/ /%20/g;H;g;s/(\n.*\n)(.*)\n.*\n(.*)\n.*/\3\2/' file

Untested as of yet, but in Bash 4 it's possible to do this
re='(.*([0-9]+\.){3}[0-9]+ +)([^ ].*\.doc)(.*)'
if [[ $line =~ $re ]]; then
    nospace=${BASH_REMATCH[3]// /%20}
    # quote the expansions so the captured spaces are not word-split
    printf '%s%s%s\n' "${BASH_REMATCH[1]}" "$nospace" "${BASH_REMATCH[4]}"
fi

Here's one way with GNU sed:
echo "2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right" |
sed -r 's/(([0-9]+\.){3}[0-9]+\s+)(.*\.doc)/\1\n\3\n/; h; s/[^\n]+\n([^\n]+)\n.*$/\1/; s/\s/%20/g; G; s/([^\n]+)\n([^\n]+)\n([^\n]+)\n(.*)$/\2\1\4/'
Output:
2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right
Explanation
s/(([0-9]+\.){3}[0-9]+\s+)(.*\.doc)/\1\n\3\n/ # Separate the interesting bit on its own line
h # Save a copy of the whole line in hold space (HS) for later
s/[^\n]+\n([^\n]+)\n.*$/\1/ # Isolate the interesting bit
s/\s/%20/g # Do the replacement
G # Append the saved copy back from HS
s/([^\n]+)\n([^\n]+)\n([^\n]+)\n(.*)$/\2\1\4/ # Reorganize into the correct order

Just bash. Assuming 4 fields appear before the space separated string and 3 fields appear after:
reformat_line() {
    local sep i new=""
    for ((i=1; i<=$#; i++)); do
        if (( i==1 )); then
            sep=""
        elif (( (1<i && i<=5) || ($#-3<i && i<=$#) )); then
            sep=" "
        else
            sep="%20"
        fi
        new+="$sep${!i}"
    done
    echo "$new"
}
while IFS= read -r line; do
    reformat_line $line # unquoted variable here
done < filename
outputs
2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right
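For comparison, here is a minimal sketch of the same idea in awk, called from the shell. It relies on the same assumption as the function above (exactly 4 fields before the file name and 3 after), and the function name encode_middle is made up for this example. Like the bash version, it splits on whitespace, so a run of several spaces inside the file name would come out as a single %20.

```shell
# encode_middle: read lines on stdin and join the "middle" fields with %20.
# Assumption: 4 fields come before the file name and 3 fields come after it.
encode_middle() {
  awk '{
    out = $1
    for (i = 2; i <= NF; i++) {
      # fields 6 .. NF-3 continue the file name, so glue them with %20
      sep = (i > 5 && i <= NF - 3) ? "%20" : " "
      out = out sep $i
    }
    print out
  }'
}

printf '%s\n' '2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right' |
  encode_middle
```

To process a whole log, redirect the file into the function: encode_middle < file.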

A variation on Thor's answer, but using 3 processes (4 with the cat below, which you can drop by passing your_file as the last argument of the 1st sed):
cat your_file |
sed -r -e 's/ (([0-9]+\.){3}[0-9]+) +(.*\.doc) / \1\n\3\n/' |
sed -e '2~3s/ /%20/g' |
paste -s -d " \n"
As Thor explained:
The 1st sed (s/ (([0-9]+\.){3}[0-9]+) +(.*\.doc) / \1\n\3\n/) separates the interesting bit on its own line.
And then:
The 2nd sed replaces all spaces with %20 on line 2 and on every third line after it.
Finally, paste it back together.
It must be noted that the 2~3 part is a GNU sed extension. If you do not have GNU sed, you can do:
cat your_file |
sed -r -e 's/ (([0-9]+\.){3}[0-9]+) +(.*\.doc) / \1\n\3\n/' |
sed -e 'N;P;s/.*\n//;s/ /%20/g;N' |
paste -s -d " \n"

Related

String manipulation via script

I am trying to get a substring between &DEST= and the next & or a line break.
For example :
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
In this I need to extract "SFO"
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
In this I need to extract "SANFRANSISCO"
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
In this I need to extract "SANJOSE"
I am reading a file line by line, and I need to update the text after &DEST= and put it back in the file. The modification of the text is to mask the dest value with X character.
So, SFO should be replaced with XXX.
SANJOSE should be replaced with XXXXXXX.
Output :
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
Please let me know how to achieve this in script (Preferably shell or bash script).
Thanks.
$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=PORTORICA
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
$ sed -E 's/^.*&DEST=([^&]*)[&]*.*$/\1/' file
SFO
PORTORICA
SANFRANSISCO
SANJOSE
should do it
Replacing airports with an equal number of Xs
Let's consider this test file:
$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
To replace the strings after &DEST= with an equal length of X and using GNU sed:
$ sed -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
To replace the file in-place:
sed -i -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
The above was tested with GNU sed. For BSD (OSX) sed, try:
sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
Or, to change in-place with BSD(OSX) sed, try:
sed -i '' -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
If there is some reason why it is important to use the shell to read the file line-by-line:
while IFS= read -r line
do
echo "$line" | sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta
done <file
How it works
Let's consider this code:
search_str="&DEST="
newfile=chart.txt
sed -E ':a; s/('"$search_str"'X*)[^X&]/\1X/; ta' "$newfile"
-E
This tells sed to use Extended Regular Expressions (ERE). This has the advantage of requiring fewer backslashes to escape things.
:a
This creates a label a.
s/('"$search_str"'X*)[^X&]/\1X/
This looks for $search_str followed by any number of X followed by any character that is not X or &. Because of the parens, everything except that last character is saved into group 1. This string is replaced by group 1, denoted \1 and an X.
ta
In sed, t is a test command. If the substitution was made (meaning that some character needed to be replaced by X), then the test evaluates to true and, in that case, ta tells sed to jump to label a.
This test-and-jump causes the substitution to be repeated as many times as necessary.
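The loop can be wrapped in a small function that takes the tag name as an argument. This is only a sketch: mask_param is a made-up name, and the tag is pasted straight into the regular expression, so it is assumed to contain nothing but letters and digits. The separate -e options follow the BSD form shown earlier, which GNU sed accepts as well.

```shell
# mask_param NAME: read lines on stdin and replace the value that follows
# &NAME= with the same number of X characters.
# Assumption: NAME contains no regex metacharacters.
mask_param() {
  sed -E -e :a -e 's/(&'"$1"'=X*)[^X&]/\1X/' -e ta
}

printf '%s\n' 'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546' | mask_param DEST
# prints MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
```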
Replacing multiple tags with one sed command
$ name='DEST|ORIG'; sed -E ':a; s/(&('"$name"')=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=XXXX
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=XXXX
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
Answer for original question
Using shell
$ s='MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546'
$ s=${s#*&DEST=}
$ echo ${s%%&*}
SFO
How it works:
${s#*&DEST=} is prefix removal. This removes all text up to and including the first occurrence of &DEST=.
${s%%&*} is suffix removal. It removes all text from the first & to the end of the string.
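The two expansions can be combined in a small function. This is only a sketch: get_dest is a made-up name, and the case guard is an addition so that a line without &DEST= prints nothing instead of an unrelated piece of the line. Both expansions are standard, so this should not need bash specifically.

```shell
# get_dest STRING: print the value that follows &DEST=, up to the next & or
# the end of the string. Returns 1 if STRING has no &DEST= in it.
# Note: _s is a plain (global) helper variable, chosen for this example.
get_dest() {
  _s=$1
  case $_s in
    *'&DEST='*) ;;          # tag present, carry on
    *) return 1 ;;          # tag missing, print nothing
  esac
  _s=${_s#*&DEST=}          # drop everything up to and including &DEST=
  printf '%s\n' "${_s%%&*}" # drop everything from the first & onwards
}

get_dest 'MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546'
# prints SANFRANSISCO
```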
Using awk
$ echo 'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546' | awk -F'[=\n]' '$1=="DEST"{print $2}' RS='&'
SFO
How it works:
-F'[=\n]'
This tells awk to treat either an equal sign or a newline as the field separator
$1=="DEST"{print $2}
If the first field is DEST, then print the second field.
RS='&'
This sets the record separator to &.
With GNU bash:
re='(.*&DEST=)([^&]*)(.*)'   # group 2 stops at the next &, group 3 keeps the rest
while IFS= read -r line; do
    [[ $line =~ $re ]] && echo "${BASH_REMATCH[1]}fooooo${BASH_REMATCH[3]}"
done < file
Output:
MYREQUESTISTO8764GETTHIS&DEST=fooooo&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=fooooo&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=fooooo
Replace the characters between &DEST= and the next & (or end of line) with X's:
awk -F'&DEST=' '{
    printf("%s&DEST=", $1);
    xlen=index($2,"&");                       # position of the next &, 0 if there is none
    if ( xlen == 0) xlen=length($2)+1;
    for (i=1;i<xlen;i++) printf("%s", "X");   # one X per character before that position
    endstr=substr($2,xlen);
    printf("%s\n", endstr);
}' file

sed from pattern till end of file inside a for loop

I'm writing a bash script that would allow me to take a certain amount of text from a file and add some other text before that for a list of files.
directory=$(pwd)
for f in *test.txt
do
filename=$(basename $f .txt)
printf "%%sum=4 \n"> input.temp
printf "file=$directory"/"$filename".txt" \n">> input.temp
printf "some commands \n">> input.temp
printf "\n" >> input.temp
printf "description \n">> input.temp
sed -n "/0 1/,$p" "$f" >> input.temp;
mv input.temp $filename.temp
done
I have a problem with the sed command inside the for loop. I looked around and people suggest adding double quotes which I did but to no avail. I think it might be the $p.
I hope this is clear enough. If it's not, I'll try to explain better.
sed -n "/0 1/,$p" "$f" >> input.temp; does not work
sed -n '/0 1/,$p' "$f" >> input.temp; does not work
sed -n "/0 1/,\$p" "$f" >> input.temp; does not work
FYI I'm not trying to find something else that works. I want to fix this exact input. I sound like an asshole I'm sure.
Sample input
%sum=8
file=otherpath/filename.txt
some other commands
another description
0 1
0.36920852 -0.56246512
0.77541848 0.05756533
2.05409026 0.62333039
2.92655258 0.56906375
2.52034254 -0.05096652
1.24167014 -0.61673008
-0.60708600 -0.99443872
0.10927459 0.09899803
3.90284624 1.00103940
3.18648588 -0.09239788
0.93151968 -1.09013674
2.50047427 1.30468389
2.19361322 2.54108378
3.18742399 0.34152442
3.38679424 1.11276220
1.56936488 3.27250306
1.81754180 4.19564055
1 2 1.5 6
2 3 1.5
3 4
4 5 1.5
5 6 1.5
6 11 1.0
7
8
9
10
11
12
13 16
14
15
16 17
17
My desired output is basically this file from "0 1" till the end preceded by the stuff I put inside the printf.
UPDATE: If you're interested, the two scripts tripleee and Ed Morton provided work perfectly well. The problem in my script was me leaving out the -i option from the sed line (for inplace).
sed -n "/0 1/,$p" "$f" >> input.temp
should be replaced by
sed -ni '/0 1/,$p' "$f"
I see you updated your question and provided some additional information in your comments, so try this. It uses GNU awk 4.1 or later for -i inplace:
awk -i inplace -v directory="$(pwd)" '
FNR==1 {
    print "%sum=4 "
    print "file=" directory "/" FILENAME
    print "some commands "
    print ""
    print "description "
    found = 0
}
/0 1/ { found = 1 }
found
' *test.txt
If you don't have GNU awk then the technically correct way to do it is using xargs but it's simpler using a shell loop for the file manipulation (moving) part:
for file in *test.txt
do
    awk -v directory="$(pwd)" '
    FNR==1 {
        print "%sum=4 "
        print "file=" directory "/" FILENAME
        print "some commands "
        print ""
        print "description "
        found = 0
    }
    /0 1/ { found = 1 }
    found
    ' "$file" > tmp && mv tmp "$file"
done
Like others have already commented, you basically just need to use single quotes instead of double, because $p in double quotes gets replaced with the value of the shell variable p by the shell, before sed executes (in practice, probably an empty string).
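A quick way to see the difference is to compare what each quoting style hands to sed. The sketch below uses a throwaway three-line file made with mktemp, and assumes the shell variable p is unset, as it probably is in the asker's shell.

```shell
# Throwaway sample file; its name and contents are arbitrary.
tmp=$(mktemp)
printf '%s\n' 'header' '0 1' 'data' > "$tmp"

unset p   # assumption: p is not set in the calling shell

# Single quotes: sed receives the script /0 1/,$p verbatim.
single=$(sed -n '/0 1/,$p' "$tmp")

# Double quotes with an escaped dollar: sed receives /0 1/,$p as well.
escaped=$(sed -n "/0 1/,\$p" "$tmp")

# Unescaped double quotes: the shell expands $p first. Show the result
# instead of running it, because it is no longer a complete sed script.
expanded="/0 1/,$p"
printf 'sed would receive: %s\n' "$expanded"
# prints: sed would receive: /0 1/,

rm -f "$tmp"
```

Both $single and $escaped end up holding the lines from "0 1" to the end of the file.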
However, you might also want to investigate doing it all in sed. You might then instead stick with the double quotes (because there are other variables you do want to substitute) and instead escape the dollar sign in $p with a backslash to protect it from the shell.
directory=$(pwd) # just do this once before the loop; the value doesn't change
for f in *test.txt; do
# no braces
filename=$(basename "$f" .txt)
sed -n "1i\\
%sum=4\\
file=$directory/$filename.txt\\
some commands\\
\\
description
/0 1/,\$p" "$f" >inputout.temp2 # no pointless separate temp file
done
In practice, I imagine you would like for the output file to be different in each iteration (maybe "$filename.temp" instead?) but what you do about that is up to you, obviously. As it is now, the file will contain the output from the last iteration.

Convert data from a simple JSON format to a DSV format

I have a file in Unix, with data sample like the following:
{"ID":"123", "Region":"Asia", "Location":"India"}
{"ID":"234", "Region":"APAC", "Location":"Australia"}
{"ID":"345", "Region":"Americas", "Location":"Mexio"}
{"ID":"456", "Region":"Americas", "Location":"Canada"}
{"ID":"567", "Region":"APAC", "Location":"Japan"}
The desired output is
ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexico
456|Americas|Canada
567|APAC|Japan
I tried with a few sed commands. I could remove the following: '{', '}', ' " ', ':'
There are 2 issues with the output file
All rows from input appear in single line in the output.
Adding the pipe ('|') as delimiter.
Any pointers are highly appreciated.
I recommend the tool jq (http://stedolan.github.io/jq/); jq is a lightweight and flexible command-line JSON processor.
jq -r '"\(.ID)|\(.Region)|\(.Location)"' < infile
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan
Explanation
-r is --raw-output
Through awk,
awk -F'"' -v OFS="|" 'BEGIN{print "ID|Region|Location"}{print $4,$8,$12}' file
Example:
$ cat file
{"ID":"123", "Region":"Asia", "Location":"India"}
{"ID":"234", "Region":"APAC", "Location":"Australia"}
{"ID":"345", "Region":"Americas", "Location":"Mexio"}
{"ID":"456", "Region":"Americas", "Location":"Canada"}
{"ID":"567", "Region":"APAC", "Location":"Japan"}
$ awk -F'"' -v OFS="|" 'BEGIN{print "ID|Region|Location"}{print $4,$8,$12}' file
ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan
Explanation:
-F'"' sets " as the field separator.
OFS="|" sets | as the output field separator.
First, awk executes the BEGIN block, which prints the header line. Then, for every input line, it prints fields 4, 8 and 12, which hold the three values.
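To see why the values land in fields 4, 8 and 12, it can help to print every field awk produces when it splits one sample line on the double quote. A small sketch (show_fields is a made-up helper name):

```shell
# show_fields: split each input line on " and print every field with its index.
show_fields() {
  awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' '{"ID":"123", "Region":"Asia", "Location":"India"}' | show_fields
```

The even-numbered fields are the pieces that sat between quotes: 2, 6 and 10 are the key names, and 4, 8 and 12 are the values. This only holds while every line has the same three keys in the same order and no value contains an escaped quote.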
This sed one-liner does what you want. It's capturing the field values using parenthesized expressions, and then putting them into the output using \1, \2, and \3.
s/^{"ID":"\([^"]*\)", "Region":"\([^"]*\)", "Location":"\([^"]*\)"}$/\1|\2|\3/
Invoke it like:
$ sed -f one-liner.sed input.txt
Or you can invoke it within a Bash script, producing the header:
echo 'ID|Region|Location'
sed -e 's/^{"ID":"\([^"]*\)", "Region":"\([^"]*\)", "Location":"\([^"]*\)"}$/\1|\2|\3/' "$input"
It is a JSON file so it is best to use a JSON parser. Here is a perl implementation of it.
#!/usr/bin/perl
use strict;
use warnings;
use JSON;
open my $fh, '<', 'path/to/your/file' or die "Cannot open file: $!";
# keys of your structure
my @key = qw(ID Region Location);
print join ("|", @key), "\n";
# iterate over your file, decode it and print in order of your key structure
while (my $json = <$fh>) {
    my $text = decode_json($json);
    print join ("|", map { $$text{$_} } @key ),"\n";
}
Output:
ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan
Using sed as follows
Command line
echo "my_string" |
sed -e 's#[,:"{}]##g' -e 's#ID##g' -e "s#Region##g" -e 's#Location##g' \
-e '1 s#^.*$#ID Region Location\n&#' -e 's# #|#g'
or
sed -e 's#[,:"{}]##g' -e 's#ID##g' -e "s#Region##g" -e 's#Location##g' \
-e '1 s#^.*$#ID Region Location\n&#' -e 's# #|#g' my_file
I tried this in a terminal as follows:
echo '{"ID":"123", "Region":"Asia", "Location":"India"}
{"ID":"234", "Region":"APAC", "Location":"Australia"}
{"ID":"345", "Region":"Americas", "Location":"Mexio"}
{"ID":"456", "Region":"Americas", "Location":"Canada"}
{"ID":"567", "Region":"APAC", "Location":"Japan"}' |
sed -e 's#[,:"{}]##g' -e 's#ID##g' -e "s#Region##g" -e 's#Location##g' \
-e '1 s#^.*$#ID Region Location\n&#' -e 's# #|#g'
Output
ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan
Many thanks for your responses; the pointers and solutions helped a lot.
For some mysterious reason, I couldn't get any of the sed commands to work, so I devised my own solution. Although it's not elegant, it works.
Here is the script I prepared which resolved the issue.
#!/bin/bash
# source file path.
infile=/home/exfile.txt
# remove these temp files if they already exist.
rm -f ./efile.txt ./xfile.txt ./yfile.txt ./zfile.txt
# removing the curly braces from input file.
cut -d "{" -f2 "$infile" | cut -d "}" -f1 >> ./efile.txt
# setting input file name to different value.
infile=./efile.txt
# remove double quotes from the file.
while IFS= read -r line
do
echo $line | sed 's/\"//g' >> ./xfile.txt
done < "$infile"
# creating another temp file.
infile2=./xfile.txt
# remove colon from file.
while IFS= read -r line
do
echo $line | sed 's/\:/,/g' >> ./yfile.txt
done < "$infile2"
# set input file path to new temp file.
infile3=yfile.txt
# initialize variables to hold header column values.
t1=0
t3=0
t5=0
# read each of the line to extract header row. Exit loop after reading 1st row.
once=1
while IFS=',' read -r f1 f2 f3 f4 f5 f6
do
echo "$f1 $f2 $f3 $f4 $f5 $f6"
t1=$f1
t3=$f3
t5=$f5
if [ "$once" -eq 1 ]; then
break
fi
done < "$infile3"
# Read each of the line from input file. Write only the value to another output file.
while IFS=',' read -r f1 f2 f3 f4 f5 f6
do
echo "$f2|$f4|$f6" >> ./zfile.txt
done < "$infile3"
# insert the header column row into the file generated in the step above.
frstline="$t1|$t3|$t5"
sed -i '1i ID|Region|Location' ./zfile.txt

How to escape special characters?

I'm trying to remove songs via a bash shell for loop yet removing a file like this
while read item; do rm "$item"; done < duplicates
keeps getting caught up on song name. Is it possible to get around this? My song titles might look like this:
/home/user/Music/Master List's Music/iTunes/iTunes\ Music/John\ Mayer/Room\ for\ Squares\ \[Aware\]/07\ 83.m4a
/home/user/Music/Master List's Music/bsg\ season\ 1\ \(Case\ Conflict\ 1\)/06\ A\ Good\ Lighter.mp3
/home/user/Music/Master List's Music/Nino\ Rota/The\ Godfather\ Pt.\ 3/14\ A\ Casa\ Amiche.m4a
as you can see, in order to remove an item I can have no %.()[] or anything else without being escaped unless it's the . before the file extension obviously. Is there a way I can escape special characters like this?
For instance, I used sed to turn the %20 into spaces:
cat duplicates | sed 's/%20/\\ /g' > clean_duplicates
The output I'm looking for looks like this:
/home/user/Music/Master\ List\'s\ Music/iTunes/iTunes Music/John\ Mayer/Room\ for\ Squares\ \[Aware\]/07\ 83.m4a
/home/user/Music/Master\ List\'s\ Music/bsg\ season\ 1\ \(Case\ Conflict\ 1\)/06\ A\ Good\ Lighter.mp3
/home/user/Music/Master\ List\'s\ Music/Nino\ Rota/The Godfather\ Pt\.\ 3\/14\ A\ Casa\ Amiche.m4a
Update: to address the actual URL-decoding (I missed it before):
while read line; do printf "$(echo -n $line | sed 's/\\/\\\\/g;s/\(%\)\([0-9a-fA-F][0-9a-fA-F]\)/\\x\2/g')\n"; done < input
Output:
/home/user/Music/Master List's Music/iTunes/iTunes Music/John Mayer/Room for Squares [Aware]/07 83.m4a
/home/user/Music/Master List's Music/bsg season 1 (Case Conflict 1)/06 A Good Lighter.mp3
/home/user/Music/Master List's Music/Nino Rota/The Godfather Pt. 3/14 A Casa Amiche.m4a
So in order to delete those files, e.g. redirect the cleaned output to a file:
while read line
do
printf "$(echo -n $line | sed 's/\\/\\\\/g;s/\(%\)\([0-9a-fA-F][0-9a-fA-F]\)/\\x\2/g')\n"
done < duplicates > cleaned_duplicates
while read file; do rm -v "$file"; done < cleaned_duplicates
If you prefer to store the names in a script file using explicit shell character escaping, you could do
while read file; do printf "rm -v %q\n" "$file"; done < cleaned_duplicates > script.sh
Which should result in script.sh containing:
rm -v /home/user/Music/Master\ List\'s\ Music/iTunes/iTunes\ Music/John\ Mayer/R
rm -v /home/user/Music/Master\ List\'s\ Music/bsg\ season\ 1\ \(Case\ Conflict\
rm -v /home/user/Music/Master\ List\'s\ Music/Nino\ Rota/The\ Godfather\ Pt.\ 3/
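If the list file holds the raw paths (spaces, quotes and brackets left as they are, nothing backslash-escaped), quoting the variable is already enough and no escaping step is needed. Here is a small self-contained sketch that uses a throwaway directory and a made-up file name:

```shell
# Work in a temporary directory so nothing real gets deleted.
workdir=$(mktemp -d)
name="$workdir/Master List's Music [Aware] (1).m4a"   # hypothetical file name
: > "$name"                                           # create the empty file
printf '%s\n' "$name" > "$workdir/duplicates"         # one raw path per line

# IFS= keeps leading and trailing blanks, -r keeps backslashes literal,
# and the double quotes stop the shell from splitting or globbing the path.
while IFS= read -r item; do
  rm -- "$item"
done < "$workdir/duplicates"

if [ -e "$name" ]; then remaining=yes; else remaining=no; fi
printf 'file still there: %s\n' "$remaining"
rm -rf "$workdir"
```

The -- tells rm that what follows is a file name even if it starts with a dash.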

help on sorting a file using sort

I have this file:
100: pattern1
++++++++++++++++++++
1:pattern2
9:pattern2
+++++++++++++++++++
79: pattern1
61: pattern1
+++++++++++++++++++
and I want to sort it like this:
++++++++++++++++++++
1:pattern2
9:pattern2
+++++++++++++++++++
61:pattern1
79:pattern1
100:pattern1
+++++++++++++++++++
Is it possible using only the Linux sort command?
If I had :
4:pat1
3:pat2
2:pat2
1:pat1
The output should be:
1:pat1
++++++++++++
2:pat2
3:pat2
++++++++++++
4:pat1
So I want to sort on the first group, but group on the pattern in the second group.
Please note, the thing after : is a regex pattern not a literal.
Best you can do is to sort it according to the numerical values. But you cannot do anything with the "+"-string.
$ sort -n input
+++++++++++++++++++
+++++++++++++++++++
++++++++++++++++++++
1:wow
9:wow
61: this is it
79: this is it
100: this is it
I don't believe sort alone can do what you need.
Create a new shell script and put this in its contents (ie mysort.sh):
#!/bin/bash
IFS=$'\n' # This makes the for loop below split on newline instead of whitespace.
delim=+++++++++++++++++++
for l in `grep -v '^+' | sort -g` # Ignore all + lines and sort by number
do
    current=`echo "$l" | sed 's/^[0-9]*://'` # Get what comes after the number
    if [ ! -z "$prev" ] && [ "$prev" != "$current" ] # If it has changed...
    then # then output a ++++ delimiter line.
        echo "$delim"
    fi
    prev=$current
    echo "$l" # Output this line.
done
To use it, pipe in the contents of your file like so:
bash mysort.sh < input
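The same approach can also be sketched as a single pipeline, with awk inserting the delimiter whenever the text after the number changes. The function name group_sorted and the 12-character delimiter are choices made for this example. Blanks right after the colon are ignored when comparing, which is an assumption based on the sample data.

```shell
# group_sorted: drop old + lines, sort numerically, then print a delimiter
# line every time the part after "NUMBER:" differs from the previous line.
group_sorted() {
  grep -v '^+' | sort -n | awk -v delim='++++++++++++' '
    { key = $0; sub(/^[0-9]+:[ \t]*/, "", key) }  # text after the number
    NR > 1 && key != prev { print delim }         # the group changed
    { print; prev = key }
  '
}

printf '%s\n' '4:pat1' '3:pat2' '2:pat2' '1:pat1' | group_sorted
```

For the four-line sample from the question this prints 1:pat1, a delimiter, 2:pat2 and 3:pat2, another delimiter, and then 4:pat1.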
Probably not -- it's not in the sort of format sort(1) expects. And if you did it would be one of those amazing hacks, not easily used. If you have some sort of rule for what goes between the lines of plus signs, you can do it readily enough with an AWK or Perl or Python script.
If your input was space delimited, not ':' delimited:
sort -rk2 | uniq -D -f1
will do the grouping;
I guess you'd need to sort the 'subsections' later (unfortunately my sort(1) doesn't do composite key ordering; I believe there are versions that allow you to do sort -k2,1n, and you'd be done at once).
use --all-repeated=separate instead of -D to get blank separators between groups. Look at man uniq for more ideas!
However, since your input is colon delimited, a hack is required:
sed 's/\([0123456789]\+\):/\1 /' t | sort -rk2 | uniq -D -f1
HTH

Resources