How to sort ROW in a line in BASH - bash

Most sorting examples for bash or Linux terminal commands are about sorting by a field (column). I couldn't figure out how to sort a row of three numbers, e.g. "1, 3, 2". I want the values ordered from left to right, smallest to largest, like "1,2,3", or vice versa.
So the input would be something like line="5, 3, 10". After sorting, the output should be sorted_line="3,5,10".
Any tips? Thanks.

Note that asort works in gawk, not in awk in general. So here is another solution for a file, a.txt:
gawk -F, '{split($0, w); n=asort(w); s=w[1]; for(i=2; i<=n; i++) s=s "," w[i]; print s }' a.txt
sample file, a.txt is
1,5,7,2
8,1,3,4
9,7,8,2
result,
1,2,5,7
1,3,4,8
2,7,8,9

This is one way :
echo "6 5,4,9 1,3 2,10,7 8" | awk '{ split($0,arr,"(,| )") ; asort(arr); exit; } END{ for ( i=1; i <= length(arr) ; i++ ) { print arr[i]} }'
I am using a regex as a delimiter so it can be comma or space separated.
Hope it helps!
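If gawk is not available, the same row sort can be sketched with standard tools only. This is a minimal sketch rather than one of the answers above: sort_row is a hypothetical helper name, and it assumes the row holds plain numbers separated by commas with optional spaces.

```shell
# sort_row: hypothetical helper; sorts the comma-separated numbers in $1.
# Pass "-r" as a second argument to sort from large to small instead.
sort_row() {
    printf '%s\n' "$1" |
        tr -d ' ' |          # drop the spaces after the commas
        tr ',' '\n' |        # one number per line, so sort can see them
        sort -n $2 |         # numeric sort (add -r for descending)
        paste -sd, -         # join the lines back with commas
}

line="5, 3, 10"
sorted_line=$(sort_row "$line")
echo "$sorted_line"          # 3,5,10
```

The idea is simply to turn the row into a column, because that is the shape sort already understands, and then turn it back into a row.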

Related

How to reverse order of columns in tabular data using bash

Given tabular output from some program in bash, I would like to change the order of the columns printed. Assume the number of columns might vary.
Sample input
Name Surname Age
Oli Aaa 15
Boa Bbb 25
Expected output
Age Surname Name
15 Aaa Oli
25 Bbb Boa
What I tried
This seems like an easy task when the number of columns is known, but I don't know what to do when the number of columns is just N. For 3 columns a simple AWK script would do:
awk '{print $3, $2, $1}' table.txt > reversed_table.txt
It would be good to achieve this using only POSIX-compliant tools.
Regarding "using only POSIX-compliant tools": awk is POSIX.
Regarding "but I don't know what to do when number of columns is just N": that's easy. First, awk is really flexible: awk '{ i=5; print $i; }' will print the 5th column, just like that.
Second, you can get the number of columns with NF.
Now it's just a matter of writing a simple for loop that iterates from NF down to the first field, and voilà!
awk '{ for(i = NF; i >= 1; --i) printf "%s", $i "\t"; printf "\n" }'
A bit better version without a trailing tabulator:
awk '{ for(i = NF; i >= 1; --i) printf "%s", $i (i==1 ? "" : OFS); print ""; }'
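To check the loop against the sample table from the question, it can be wrapped in a small function. This is only a sketch: reverse_columns is a hypothetical name, and the extra NF == 0 branch is an addition so that empty input lines stay empty lines.

```shell
# reverse_columns: hypothetical wrapper around the loop shown above.
reverse_columns() {
    awk '{
        # Walk the fields from last to first; end the line after field 1.
        for (i = NF; i >= 1; --i)
            printf "%s%s", $i, (i == 1 ? ORS : OFS)
        if (NF == 0) print ""    # keep empty lines as empty lines
    }' "$@"
}

printf 'Name Surname Age\nOli Aaa 15\nBoa Bbb 25\n' | reverse_columns
# Age Surname Name
# 15 Aaa Oli
# 25 Bbb Boa
```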
Here is a generic solution. It takes two variables, swap1 and swap2, each holding a comma-separated list of field numbers; the n-th field listed in swap1 is exchanged with the n-th field listed in swap2. In this example we exchange the 3rd field with the 5th and the 4th field with the 6th; more field numbers can be added to both lists in the same way. Note that the first line of the input is printed unchanged.
swap1 --> 3   4
          |   |
          |   |
          |   |
swap2 --> 5   6
Following is the code:
awk -v swap1="3,4" -v swap2="5,6" '
BEGIN{
  num=split(swap1,field1,",")
  num1=split(swap2,field2,",")
  for(i=1;i<=num;i++){
    array1[field1[i]]=i
  }
}
FNR==1{
  print
  next
}
{
  for(i=1;i<=NF;i++){
    if(i in array1){
      tmp=$field1[array1[i]]
      $field1[array1[i]]=$field2[array1[i]]
      $field2[array1[i]]=tmp
    }
  }
}
1
' Input_file | column -t
This might work for you (GNU sed and rev):
sed 's/.*/echo "&" | rev/e;s/\S\+/$(echo "&"|rev)/g;s/.*/echo "&"/e' file
Reverse each line and re-reverse each separate word within the line.
awk '{use = $NF;$NF = "";print use,$2,$1}' OFS="\t" file
Age Surname Name
15 Aaa Oli
25 Bbb Boa
I looked into this one:
Printing everything except the first field with awk

Sorting groups of lines

Say I have this list:
sharpest
tool
in
the
shed
im
not
the
How can I order alphabetically by the non-indented lines and preserve groups of lines? The above should become:
im
not
the
sharpest
tool
in
the
shed
Similar questions exist here and here but I can't seem to make them work for my example.
Hopeful ideas so far
Maybe I could use grep -n somehow, as it gives me the line numbers? I was thinking to first get the line numbers, then order. I guess I'd somehow need to calculate a line range before ordering, and then from there fetch the range of lines somehow. Can't even think how to do this however!
sed ranges look promising too, but same deal; sed 1,2p and further examples here.
If perl is okay:
$ perl -0777 -ne 'print sort split /\n\K(?=\S)/' ip.txt
im
not
the
sharpest
tool
in
the
shed
-0777 slurp entire file, so solution not suitable if input is too big
split /\n\K(?=\S)/ gives array using newline character followed by non-whitespace character as split indication
sort to sort the array
You can use this asort function in a single gnu awk command:
awk '{if (/^[^[:blank:]]/) {k=$1; keys[++i]=k} else arr[k] = arr[k] $0 RS}
END{n=asort(keys); for (i=1; i<=n; i++) printf "%s\n%s", keys[i], arr[keys[i]]}' file
im
not
the
sharpest
tool
in
the
shed
Alternative solution using awk + sort:
awk 'FNR==NR{if (/^[^[:blank:]]/) k=$1; else arr[k] = arr[k] $0 RS; next}
{printf "%s\n%s", $1, arr[$1]}' file <(grep '^[^[:blank:]]' file | sort)
im
not
the
sharpest
tool
in
the
shed
Edit: a POSIX-compliant version (no process substitution; awk reads the sorted keys from standard input via -):
#!/bin/sh
grep '^[^[:blank:]]' file |
sort |
awk 'FNR==NR{if (/^[^[:blank:]]/) k=$1; else arr[k] = arr[k] $0 RS; next} {printf "%s\n%s", $1, arr[$1]}' file -
With single GNU awk command:
awk 'BEGIN{ PROCINFO["sorted_in"] = "@ind_str_asc" }
/^[^[:space:]]+/{ k = $1; a[k]; next }
{ a[k] = (a[k]? a[k] ORS : "")$0 }
END{ for(i in a) print i ORS a[i] }' file
The output:
im
not
the
sharpest
tool
in
the
shed
awk one-liner
$ awk '/^\w/{k=$1; a[k]=k; next} {a[k]=a[k] RS $0} END{ n=asorti(a,b); for(i=1; i<=n; i++) print a[b[i]] }' file
im
not
the
sharpest
tool
in
the
shed
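The line-number idea from the question can also be made to work with POSIX tools only, as a decorate-sort-undecorate pipeline. The following is a sketch under stated assumptions, not one of the answers above: sort_groups is a hypothetical name, the sample data is made up, group headers are assumed to start in column one, and member lines are assumed to start with a space or a tab.

```shell
# sort_groups: hypothetical helper, a decorate-sort-undecorate sketch.
sort_groups() {
    tab=$(printf '\t')
    # Decorate: prefix every line with its group key and its line number.
    awk '/^[^ \t]/ { key = $1 }
         { printf "%s\t%d\t%s\n", key, NR, $0 }' "$@" |
        # Sort by key, then by line number so each group keeps its order.
        LC_ALL=C sort -t "$tab" -k1,1 -k2,2n |
        # Undecorate: drop the two helper columns again.
        cut -f3-
}

printf 'pear\n  green\napple\n  red\n  sweet\n' | sort_groups
# apple
#   red
#   sweet
# pear
#   green
```

Unlike the slurping approaches, nothing is held in memory by awk here; sort does the heavy lifting, so this should also cope with large inputs.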

Print a comma except on the last line in Awk

I have the following script
awk '{printf "%s", $1"-"$2", "}' $a >> positions;
where $a stores the name of the file. I am actually writing multiple column values into one row. However, I would like to print a comma only if I am not on the last line.
Single pass approach:
cat "$a" | # look, I can use this in a pipeline!
awk 'NR > 1 { printf(", ") } { printf("%s-%s", $1, $2) }'
Note that I've also simplified the string formatting.
Enjoy this one:
awk '{printf t $1"-"$2} {t=", "}' $a >> positions
Yeah, it looks a bit tricky at first sight, so I'll explain. First of all, let's change printf to print for clarity:
awk '{print t $1"-"$2} {t=", "}' file
and have a look at what it does, for example, for a file with this simple content:
1 A
2 B
3 C
4 D
so it will produce the following:
1-A
, 2-B
, 3-C
, 4-D
The trick is the leading variable t, which is empty at the beginning. It is only set ({t=...}) after the first record has been printed ({print t ...}), so every later record is preceded by the separator. As awk keeps iterating over the input, we get the desired sequence.
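One caveat with passing the data itself as the first argument of printf is that a percent sign in the input may be interpreted as a format specifier. A minimal sketch of the same separator trick with a fixed format string follows; join_pairs is a hypothetical name, and the sample input with percent signs is made up for illustration.

```shell
# join_pairs: hypothetical helper; prints "a-b, c-d, ..." on one line.
join_pairs() {
    awk '{
        printf "%s%s-%s", t, $1, $2   # the data never acts as the format
        t = ", "                      # from the second record on, emit a separator first
    }
    END { if (NR) print "" }' "$@"    # finish with one newline if anything was printed
}

printf '1 A\n2 B\n3 C\n4 D\n' | join_pairs      # 1-A, 2-B, 3-C, 4-D
printf '100%% x\n5%%d y\n' | join_pairs         # 100%-x, 5%d-y
```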
I would do it by finding the number of lines before running the script, e.g. with coreutils and bash:
awk -v nlines=$(wc -l < $a) '{printf "%s", $1"-"$2} NR != nlines { printf ", " }' $a >>positions
If your file only has 2 columns, the following coreutils alternative also works. Example data:
paste <(seq 5) <(seq 5 -1 1) | tee testfile
Output:
1 5
2 4
3 3
4 2
5 1
Now, after replacing tabs with newlines, paste easily assembles the data into the desired format:
<testfile tr '\t' '\n' | paste -sd-,
Output:
1-5,2-4,3-3,4-2,5-1
You might think that awk's ORS and OFS would be a reasonable way to handle this:
$ awk '{print $1,$2}' OFS="-" ORS=", " input.txt
But this results in a trailing ORS, because print appends ORS after every record, including the last one. You can work around this with a bit of hackery, but the resultant complexity eliminates the elegance of the one-liner.
So here's my take on this. Since you say you're "writing multiple column values", it's possible that mucking with ORS and OFS would cause problems. So we can achieve the desired output entirely with formatting.
$ cat input.txt
3 2
5 4
1 8
$ awk '{printf "%s%d-%d",t,$1,$2; t=", "} END{print ""}' input.txt
3-2, 5-4, 1-8
This is similar to Michael's and rook's single-pass approaches, but it uses a single printf and correctly uses the format string for formatting.
This will likely perform negligibly better than Michael's solution because an assignment should take less CPU than a test, and noticeably better than any of the multi-pass solutions because the file only needs to be read once.
Here's a better way, without resorting to coreutils:
awk 'FNR==NR { c++; next } { ORS = (FNR==c ? "\n" : ", "); print $1, $2 }' OFS="-" file file
awk '{a[NR]=$1"-"$2} END{for(i=1;i<NR;i++) printf "%s, ", a[i]; print a[NR]}' "$a" > positions

Filter a file using shell script tools

I have a file whose contents are
E006:Jane:HR:9800:Asst
E005:Bob:HR:5600:Exe
E002:Barney:Purc:2300:PSE
E009:Miffy:Purc:3600:Mngr
E001:Franny:Accts:7670:Mngr
E003:Ostwald:Mrktg:4800:Trainee
E004:Pearl:Accts:1800:SSE
E009:Lala:Mrktg:6566:SE
E018:Popoye:Sales:6400:QAE
E007:Olan:Sales:5800:Asst
I want to list all employees whose emp codes are between E001 and E018, using a command that includes pipes. Is that possible?
Use sed:
sed -n -e '/^E001:/,/^E018:/p' data.txt
That is, print the lines that are literally between those lines that start with E001 and E018.
If you want to get the employees that are numerically between those, one way to do that would be to do comparisons inline using something like awk (as suggested by hochl). Or, you could take this approach preceded by a sort (if the lines are not already sorted).
sort data.txt | sed -n -e '/^E001:/,/^E018:/p'
You can use awk for such cases:
$ gawk 'BEGIN { FS=":" } /^E([0-9]+)/ { n=substr($1, 2)+0; if (n >= 6 && n <= 18) { print } }' < data.txt
E006:Jane:HR:9800:Asst
E009:Miffy:Purc:3600:Mngr
E009:Lala:Mrktg:6566:SE
E018:Popoye:Sales:6400:QAE
E007:Olan:Sales:5800:Asst
Is that the result you want? This example intentionally only prints employees between 6 and 18 to show that it filters out records. You may print some fields only using $1 or $2 as in print $1 " " $2.
You can try something like this: cut -b2- data.txt | awk -F: '$1 >= 1 && $1 <= 18 { print "E" $0 }'
Just do string comparison: Since all your sample data matches, I changed the boundaries for illustration
awk -F: '"E004" <= $1 && $1 <= "E009" {print}'
output
E006:Jane:HR:9800:Asst
E005:Bob:HR:5600:Exe
E009:Miffy:Purc:3600:Mngr
E004:Pearl:Accts:1800:SSE
E009:Lala:Mrktg:6566:SE
E007:Olan:Sales:5800:Asst
You can pass the strings as variables if you don't want to hard-code them in the awk script
awk -F: -v start=E004 -v stop=E009 'start <= $1 && $1 <= stop {print}'
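The numeric comparison, the variable passing, and the optional sort can be combined into one pipeline. The following is a sketch only: emp_between is a hypothetical name, it assumes every code is a single letter followed by digits, and it prints just the code and the name.

```shell
# emp_between: hypothetical helper.
#   $1 = lowest code number, $2 = highest code number, rest = input files
emp_between() {
    lo=$1 hi=$2
    shift 2
    awk -F: -v lo="$lo" -v hi="$hi" '
        { n = substr($1, 2) + 0 }                      # "E018" -> 18
        n >= lo + 0 && n <= hi + 0 { print $1, $2 }    # code and name only
    ' "$@" | LC_ALL=C sort
}

printf '%s\n' E006:Jane:HR:9800:Asst E002:Barney:Purc:2300:PSE \
    E018:Popoye:Sales:6400:QAE E005:Bob:HR:5600:Exe | emp_between 5 18
# E005 Bob
# E006 Jane
# E018 Popoye
```

Comparing numbers rather than strings keeps working if the codes ever stop being zero-padded to the same width, which plain string comparison would not.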

Get next field/column width awk

I have a dataset of the following structure:
1234 4334 8677 3753 3453 4554
4564 4834 3244 3656 2644 0474
...
I would like to:
1) search for a specific value, eg 4834
2) return the following field (3244)
I'm quite new to awk, but realize it is a simple operation. I have created a bash-script that asks the user for the input, and attempts to return the following field.
But I can't seem to get around scoping in AWK. How do I pass the input value to awk?
#!/bin/bash
read input
cat data.txt | awk '{
  for (i=1;i<=NF;i++) {
    if ($i==input) {
      print $(i+1)
    }
  }
}'
Cheers and thanks in advance!
UPDATE Sept. 8th 2011
Thanks for all the replies.
1) It will never happen that the last number of a row is picked - still I appreciate you pointing this out.
2) I have a more general problem with awk. Often I want to "do something" with the result found. In this case I would like to output it to xclip, an application which reads from standard input and copies it to the clipboard. E.g.:
$ echo Hi | xclip
Unfortunately, echo doesn't exist for awk, so I need to return the value and echo it. How would you go about this?
#!/bin/bash
read input
cat data.txt | awk '{
for (i=1;i<=NF;i++) {
if ($i=='$input') {
print $(i+1)
}
}
}'
Don't over think it!
You can create an array in awk with the split command:
split($0, ary)
This will split the line $0 into an array called ary. Now, you can use array syntax to find the particular fields:
awk '{
  size = split($0, ary)
  for (i=1; i <= size ;i++) {
    print ary[i]
  }
  print "---"
}' data.txt
Now, when you find ary[x] as the field, you can print out ary[x+1].
In your example:
awk -v input="$input" '{
  size = split($0, ary)
  for (i=1; i<= size ;i++) {
    if (ary[i] == input) {
      print ary[i+1]
    }
  }
}' data.txt
There is a way of doing this without creating an array, but it's simply much easier to work with arrays in situations like this.
By the way, you can eliminate the cat command by putting the file name after the awk statement and save creating an extraneous process. Everyone knows creating an extraneous process kills a kitten. Please don't kill a kitten.
You pass a shell variable to awk using the -v option. It's cleaner and nicer than having to juggle quotes.
awk -v input="$input" '{
  for(i=1;i<=NF;i++){
    if ($i == input ){
      print "Next value: " $(i+1)
    }
  }
}' data.txt
And lose the useless cat.
Here is my solution: delete everything up to (and including) the search field, then the field you want to print out is field #1 ($1):
awk '/4834/ {sub(/^.* * 4834 /, ""); print $1}' data.txt
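On the follow-up about echo: whatever awk prints goes to standard output, so the result can be captured with command substitution or piped directly into another program. A minimal sketch follows; next_field is a hypothetical name, and the string comparison is a deliberate choice so that a value with a leading zero such as 0474 is not treated as equal to 474.

```shell
# next_field: hypothetical helper.
#   $1 = value to look for, rest = input files (standard input if none)
next_field() {
    needle=$1
    shift
    awk -v needle="$needle" '{
        # Stop one short of NF: the last field has no successor.
        for (i = 1; i < NF; i++)
            if (($i "") == (needle ""))   # compare as strings, so 0474 != 474
                print $(i + 1)
    }' "$@"
}

result=$(printf '1234 4334 8677\n4564 4834 3244\n' | next_field 4834)
echo "$result"                            # 3244
# next_field 4834 data.txt | xclip        # or pipe the value straight into xclip
```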

Resources