sort string with delimiter as string in unix - shell

I have some data in the following format:
Info-programNumber!/TvSource/11100001_233a_32c0/13130^Info-channelName!5 USA^Info-Duration!1575190^Info-programName!CSI: ab cd
Delimiter = Info-
I tried to sort the string based on the delimiter in ascending order, but none of my solutions are working.
Expected Result:
Info-channelName!5 USA^Info-Duration!1575190^Info-programName!CSI: ab cd^Info-programNumber!/TvSource/11100001_233a_32c0/13130
Is there any command that will allow me to do this, or do I need to write an awk script to iterate over the string and sort it?

Temporarily split the info into multiple lines so you can sort:
tr '^' '\n' | sort | tr '\n' '^'
Note: if you have multiple records, you have to write a loop that processes them line by line. With huge datasets this is probably not a good idea (too slow), in which case pick a programming language, but you were asking about the shell...
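For a single record, a minimal sketch of the whole pipeline could look like this (sort -f is added here so upper- and lower-case field names collate together regardless of locale, and sed trims the trailing ^ left over from the final newline):
line='Info-programNumber!/TvSource/11100001_233a_32c0/13130^Info-channelName!5 USA^Info-Duration!1575190^Info-programName!CSI: ab cd'
printf '%s\n' "$line" | tr '^' '\n' | sort -f | tr '\n' '^' | sed 's/\^$//'
# Info-channelName!5 USA^Info-Duration!1575190^Info-programName!CSI: ab cd^Info-programNumber!/TvSource/11100001_233a_32c0/13130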

Can be done in awk itself (asort is a GNU awk extension, so this needs gawk):
gawk -F '^' '{ for (i=1; i<=NF; i++) a[i]=$i }
END { n=asort(a, b); for (i=1; i<=n; i++) printf("%s%s", b[i], (i<n ? FS : "\n")) }' file

Related

How to sort ROW in a line in BASH

Most sorting available in bash or Linux terminal commands is about sorting a field (column). I couldn't figure out how to sort a row of three numbers, e.g. "1, 3, 2". I want it ordered from left to right, small to large, like "1,2,3", or vice versa.
So input would be like line="5, 3, 10". After being sorted, the output will be sorted_line="3,5,10".
Any tips? Thanks.
Note that asort works in gawk, not general awk. So here is another solution for a file, a.txt:
gawk -F, '{ split($0, w); n = asort(w); s = ""; for (i = 1; i <= n; i++) s = s w[i] ","; print s }' a.txt | sed 's/,$//'
sample file, a.txt is
1,5,7,2
8,1,3,4
9,7,8,2
result:
1,2,5,7
1,3,4,8
2,7,8,9
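If gawk is not available, a portable sketch of the same idea splits each row onto separate lines, sorts them numerically, and joins them again (tr, sort and paste are all standard POSIX tools):
while IFS= read -r line; do
  printf '%s\n' "$line" | tr ',' '\n' | sort -n | paste -sd, -
done < a.txt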
This is one way:
echo "6 5,4,9 1,3 2,10,7 8" | gawk '{ split($0, arr, "(,| )"); asort(arr); exit } END { for (i = 1; i <= length(arr); i++) print arr[i] }'
I am using a regex as the delimiter so the input can be comma- or space-separated.
Hope it helps!

Align around a given character in bash

Is there an easy way to align multiple rows of text around a single character in bash?
Also open to zsh solutions.
What I have:
aaa:aaaaaaaa
bb:bbb
cccccccccccc:cc
d:d
What I want:
         aaa:aaaaaaaa
          bb:bbb
cccccccccccc:cc
           d:d
Preferably the output can be piped out and retain its layout too.
You can try with column and GNU sed:
column -t -s':' infile | sed -E 's/(\S+)(\s{0,})( )(.*)/\2\1:\4/'
The shell itself does not seem like a particularly suitable tool for this task. Using an external tool makes for a solution which is portable between shells. Here is a simple Awk solution.
awk -F ':' '{ a[++n] = $1; b[n] = $2; if (length($1) > max) max = length($1) }
END { for (i=1; i<=n; ++i) printf "%" max "s:%s\n", a[i], b[i] }'
Demo: https://ideone.com/Eaebhh
This stores the input file in memory; if you need to process large amounts of text, it would probably be better to split this into a two-pass script (first pass, just read all the lines to get max; then change the END block to actually print output from the second pass), which requires the input to be seekable.
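A two-pass version along those lines might look like this (a sketch; the same file is simply named twice on the command line, so the input has to be a regular file rather than a pipe):
awk -F ':' 'NR == FNR { if (length($1) > max) max = length($1); next }
            { printf "%" max "s:%s\n", $1, $2 }' infile infile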

Sorting groups of lines

Say I have this list:
sharpest
    tool
    in
    the
    shed
im
    not
    the
How can I order alphabetically by the non-indented lines and preserve groups of lines? The above should become:
im
    not
    the
sharpest
    tool
    in
    the
    shed
Similar questions have been asked before, but I can't seem to make their answers work for my example.
Hopeful ideas so far
Maybe I could use grep -n somehow, as it gives me the line numbers? I was thinking to first get the line numbers, then order. I guess I'd somehow need to calculate a line range before ordering, and then from there fetch the range of lines somehow. Can't even think how to do this however!
sed ranges look promising too, but same deal; sed -n '1,2p' and similar.
If perl is okay:
$ perl -0777 -ne 'print sort split /\n\K(?=\S)/' ip.txt
im
    not
    the
sharpest
    tool
    in
    the
    shed
-0777 slurps the entire file, so this solution is not suitable if the input is too big
split /\n\K(?=\S)/ builds an array of groups by splitting wherever a newline is followed by a non-whitespace (i.e. non-indented) character
sort then sorts that array of groups as strings
You can use the asort function in a single GNU awk command:
awk '{if (/^[^[:blank:]]/) {k=$1; keys[++i]=k} else arr[k] = arr[k] $0 RS}
END{n=asort(keys); for (i=1; i<=n; i++) printf "%s\n%s", keys[i], arr[keys[i]]}' file
im
    not
    the
sharpest
    tool
    in
    the
    shed
Alternative solution using awk + sort:
awk 'FNR==NR{if (/^[^[:blank:]]/) k=$1; else arr[k] = arr[k] $0 RS; next}
{printf "%s\n%s", $1, arr[$1]}' file <(grep '^[^[:blank:]]' file | sort)
im
    not
    the
sharpest
    tool
    in
    the
    shed
Edit: POSIX compliancy (process substitution is a bashism, so the sorted keys are fed to awk on standard input instead, which awk reads via the extra file operand -):
#!/bin/sh
grep '^[^[:blank:]]' file | sort |
awk 'FNR==NR{if (/^[^[:blank:]]/) k=$1; else arr[k] = arr[k] $0 RS; next}
{printf "%s\n%s", $1, arr[$1]}' file -
With a single GNU awk command:
awk 'BEGIN{ PROCINFO["sorted_in"] = "@ind_str_asc" }
/^[^[:space:]]+/{ k = $1; a[k]; next }
{ a[k] = (a[k]? a[k] ORS : "")$0 }
END{ for(i in a) print i ORS a[i] }' file
The output:
im
    not
    the
sharpest
    tool
    in
    the
    shed
GNU awk one-liner (asorti is a gawk extension):
$ awk '/^\w/{k=$1; a[k]=k; next} {a[k]=a[k] RS $0} END{ n=asorti(a,b); for(i=1; i<=n; i++) print a[b[i]] }' file
im
    not
    the
sharpest
    tool
    in
    the
    shed

Is it possible to use grep for the space character

I have a text file (example.txt) like this:
100 this is a string
50 word
10
(Note that there are trailing space characters on the last line.)
When I do the following in my shell script:
cat example.txt | sed '1!d' | awk '{for (i=2; i < NF; i++) printf $i " "; print $NF}' - returns this is a string
cat example.txt | sed '2!d' | awk '{for (i=2; i < NF; i++) printf $i " "; print $NF}' - returns word
cat example.txt | sed '3!d' | awk '{for (i=2; i < NF; i++) printf $i " "; print $NF}' - returns 10 (incorrect, should be a space character instead)
Is there any method to use grep in bash to return the result I am looking for?
Well, grep can match space characters. You have to quote them to avoid the shell interpreting them as delimiters. But grep will output either the whole line or the part of it that matches, depending on the options given to it, and I don't think that will satisfy your output requirement.
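For illustration (neither of these strips the leading number the way the awk pipeline does):
grep ' ' example.txt       # selects whole lines containing at least one space
grep -o ' .*' example.txt  # GNU grep: prints only the matching part, from the first space to the end of the line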
It looks like your input format may employ fixed field widths, or at least a fixed-width first field, and that you're trying to remove that first field. In that case, why not use sed? For example,
cat example.txt | sed 's/^....//'
will remove the first four characters from each line. You can also spell that
sed 's/^....//' example.txt
If you want instead to cut a variable-length head of the line consisting of decimal digits up to the first space, then that would be
sed 's/^[0-9]* //' example.txt
Note that although that's what you said in comments you want, it will produce different output than your awk example in the case of your second input line: it will output a leading space:
 word
Note also that your awk-based approach will replace runs of adjacent whitespace in the retained part of your lines with single spaces. That behavior could be obtained from sed, too, but I'm inclined to think that it's not actually wanted.
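For example, with a hypothetical line containing a doubled space:
printf '100 this  is\n' | awk '{for (i=2; i < NF; i++) printf $i " "; print $NF}'   # prints "this is"
printf '100 this  is\n' | sed 's/^[0-9]* //'                                        # prints "this  is"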

AWK array parsing issue

My two input files are pipe separated.
File 1 :
a|b|c|d|1|44
File 2 :
44|ab|cd|1
I want to store all the values of the first file in an array.
awk -F\| 'FNR==NR {a[$6]=$0;next}'
So if I store it the above way, is it possible to interrogate the array; say I want to know $3 of File 1. How can I get that from a[]?
Also will I be able to access array values if I come out of that awk?
Thanks
I'll answer the question as it is stated, but I have to wonder whether it is complete. You state that you have a second input file, but it doesn't play a role in your actual question.
1) It would probably be most sensible to store the fields individually, as in
awk -F \| '{ for(i = 1; i < NF; ++i) a[$NF,i] = $i } END { print a[44,3] }' filename
See the GNU awk manual for details on multidimensional arrays in awk. You could also use the split function:
awk -F \| '{ a[$NF] = $0 } END { split(a[44], fields); print fields[3] }'
but I don't see the sense in it here.
2) No. At most you can print the data in a way that the surrounding shell understands and use command substitution to build a shell array from it, but POSIX shell doesn't know arrays at all, and bash only knows one-dimensional arrays. If you require that sort of functionality, you should probably use a more powerful scripting language such as Perl or Python.
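A sketch of that workaround in bash (using mapfile, available in bash 4+, rather than a bare command substitution, so that fields containing spaces survive intact):
# Print one field per line from awk, then collect the lines into a bash array.
mapfile -t fields < <(awk -F '|' '{ for (i = 1; i <= NF; ++i) print $i }' file1)
echo "${fields[2]}"   # third field, i.e. $3 of the first record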
If, and I'm wildly guessing here, you want to use the array built from the first file while processing the second, you don't have to quit awk for this. A common pattern is
awk -F \| 'FNR == NR { for(i = 1; i < NF; ++i) { a[$NF,i] = $i }; next } { code for the second file here }' file1 file2
Here FNR == NR is a condition that is only true when the first file is processed (the number of the record in the current file is the same as the number of the record overall; this is only true in the first file).
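For instance, a hypothetical completion of that pattern, which looks up $3 of File 1 by the key in File 2's first field:
awk -F \| 'FNR == NR { for(i = 1; i < NF; ++i) a[$NF,i] = $i; next }
           { print a[$1,3] }' file1 file2
# prints "c" for the sample data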
To keep it simple, you can reach your goal of storing (and accessing) values in array without using awk:
arr=($(tr '|' ' ' < yourFilename)) # store in array named arr
# accessing individual elements
echo ${arr[0]}
echo ${arr[4]}
# ...or accessing all elements
for n in ${arr[*]}
do
echo "$n"
done
...even though I wonder if that's what you are looking for. The initial question is not really clear.
