Sorting only a single column in a CSV table

Is there a single-step way, using awk, sort, or something similar, to sort or reverse a single column of a multi-column CSV table while keeping the rest of the columns in their original order?
For example, I have:
6, 45, 9
5, 47, 6
4, 46, 7
3, 48, 4
2, 10, 5
1, 11, 1
and would like to have:
1, 45, 9
2, 47, 6
3, 46, 7
4, 48, 4
5, 10, 5
6, 11, 1
So, only the first column is sorted and the rest are in their previous order.

This might work for you (note the process substitution syntax requires bash, ksh, or zsh):
paste -d, <(cut -d, -f1 file | sort) <(cut -d, -f2- file)
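Since the question also asks about reversing: to reverse column 1 instead of sorting it, the same trick should work with sort -r (or tac) in the first process substitution, e.g.:
paste -d, <(cut -d, -f1 file | sort -r) <(cut -d, -f2- file)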

awk one-liner:
awk -F, '{c[NR]=$1;l[NR]=$2", "$3}END{for(i=1;i<=NR;i++) print c[NR-i+1]", "l[i]}' file
Note that this prints column 1 in reverse input order rather than sorting it; it matches a sort here only because the sample's first column is already in descending order.
test
kent$ echo "6, 45, 9
5, 47, 6
4, 46, 7
3, 48, 4
2, 10, 5
1, 11, 1"|awk -F, '{c[NR]=$1;l[NR]=$2", "$3}END{for(i=1;i<=NR;i++) print c[NR-i+1]", "l[i]}'
1, 45, 9
2, 47, 6
3, 46, 7
4, 48, 4
5, 10, 5
6, 11, 1

If you have GNU awk, here's a one-liner:
$ gawk '{s[NR]=$1;c[NR]=$2 $3}END{for(i=0;++i<=asort(s);)print s[i] c[i]}' file
1,45,9
2,47,6
3,46,7
4,48,4
5,10,5
6,11,1
If not, here's an awk script that implements a simple bubble sort:
{   # read col1 into the sort array and the remaining columns into the cols array
    sort[NR] = $1
    cols[NR] = $2 $3
}
END {   # sort it with bubble sort
    do {
        haschanged = 0
        for (i=1; i < NR; i++) {
            if ( sort[i] > sort[i+1] ) {
                t = sort[i]
                sort[i] = sort[i+1]
                sort[i+1] = t
                haschanged = 1
            }
        }
    } while ( haschanged == 1 )
    # print it
    for (i=1; i <= NR; i++) {
        print sort[i] cols[i]
    }
}
Save it to a file sort.awk and do awk -f sort.awk file:
$ awk -f sort.awk file
1,45,9
2,47,6
3,46,7
4,48,4
5,10,5
6,11,1
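One caveat for both the asort one-liner and the bubble sort above: with the default field separator the comma stays attached to each field (the values are "6,", "5," and so on), so the comparisons are string comparisons. That works for the single-digit sample, but with multi-digit numbers "10," would sort before "9,". For general numeric data you would force a numeric comparison in the bubble sort, e.g.:
if ( sort[i]+0 > sort[i+1]+0 )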

Bash Sorting a List by Columns

I'm reverse-sorting column 2.
As for column 1, if multiple lines have the same $2 value, I want them sorted in reverse order as well. I have the list stored in a variable in a bash script at the moment. Is there a sed or awk way to do this?
My output right now, for example, is:
123, 3
124, 3
12345, 2
898, 1
1010, 1
what I want is:
124, 3
123, 3
12345, 2
1010, 1
898, 1
Use a combination of Perl one-liners and sort. The one-liners convert the , delimiter into a tab (and back). sort uses the -r option for reverse order and the -g option for numeric sort. The -kN,N option restricts the sort to field N; here we sort by the 2nd field, then the 1st.
perl -pe 's/, /\t/' in_file | sort -k2,2gr -k1,1gr | perl -pe 's/\t/, /' > out_file
For example:
Create example input file:
cat > foo <<EOF
123, 3
124, 3
12345, 2
898, 1
1010, 1
EOF
Run the command:
cat foo | perl -pe 's/, /\t/' | sort -k2,2gr -k1,1gr | perl -pe 's/\t/, /'
Output:
124, 3
123, 3
12345, 2
1010, 1
898, 1
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
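Alternatively, if the fields are plain integers, you may not need Perl at all: GNU sort can split on the comma directly, and numeric sort skips the leading blank that follows each comma. A minimal sketch on the same data (-n suffices for integers; the -g used above works too):
sort -t, -k2,2nr -k1,1nr foo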
It's not a trivial awk script, but it's not hard either. You simply use the array a[] below to store the first-field values for equal values of the second field. If last is set (i.e. not the first record) and the second field changes, you output the current array and reset it (that is Rule 1).
In Rule 2, you scan through the existing array and insert the current first field in descending order. You keep the last value of the second field so you know when it changes, and you use the END rule to output the last group of values:
awk -F, '
last && $2 != last {
    for (i=1; i<=n; i++)
        print a[i]", "last
    delete a
    n = 0
}
{
    swapped = 0
    for (i=1; i<=n; i++)
        if ($1 > a[i]) {
            swapped = 1
            for (j=n+1; j>i; j--)
                a[j] = a[j-1]
            a[i] = $1
            break
        }
    if (!swapped)
        a[++n] = $1
    else
        n++
    last = $2
}
END {
    for (i=1; i<=n; i++)
        print a[i]", "last
}
' file
The swapped flag just tells you whether the current first field was inserted into the array before an existing element (swapped == 1) or simply appended at the end (swapped == 0).
Example Use/Output
With your sample data in the file named file, you can simply change to the directory that contains it, select the script above with the mouse (changing the filename to match yours), and middle-mouse-paste it into the terminal, e.g.
$ awk -F, '
> last && $2 != last {
>     for (i=1; i<=n; i++)
>         print a[i]", "last
>     delete a
>     n = 0
> }
> {
>     swapped = 0
>     for (i=1; i<=n; i++)
>         if ($1 > a[i]) {
>             swapped = 1
>             for (j=n+1; j>i; j--)
>                 a[j] = a[j-1]
>             a[i] = $1
>             break
>         }
>     if (!swapped)
>         a[++n] = $1
>     else
>         n++
>     last = $2
> }
> END {
>     for (i=1; i<=n; i++)
>         print a[i]", "last
> }
> ' file
124, 3
123, 3
12345, 2
1010, 1
898, 1
Look things over and let me know if you have questions.
Also with awk, you can try this (note: it reads the whole block in paragraph mode via RS="" and simply swaps lines 1 and 2 and lines 4 and 5, so it only suits this exact five-line input):
awk 'BEGIN{RS=""; OFS=FS="\n"} {tmp2 = $2; $2 = $1; $1 = tmp2; tmp5=$5; $5=$4; $4=tmp5}1' file
124, 3
123, 3
12345, 2
1010, 1
898, 1

utilizing awk and making an awk script - beginner question

Input fields are in this order: lastname, firstname, section, assignment, grade.
With a list of grades from the input file grades.txt, make an awk script
that prints output matching the format below:
the matching lines for the section, then the average grade and the number of failing grades (<= 60) in the section specified on the command line.
grades.txt:
Alicia, Joseph, 2, 1, 40
Alfonzo, Gary, 3, 3, 85
Albert, Tom, 2, 1, 90
Bailey, Linda, 3, 2, 76
Butcher, Stewie, 3, 1, 80
Buser, Gary, 1, 3, 59
Canyon, Nicole, 2, 5, 90
EXAMPLE OUTPUT for section 1 (the section number specified on the command line):
Gary Buser - 59 - 3
Fails = 1
Average Grade = 59
BEGIN {
    # first last - grade - assignment
    # Fails = 1
    # Avg Grade = grade
}
{
    for (i=0; i<=NF; i++)
        print($2, $1, $5, "-", $4);
    total += $5;
    if ( $5 <= 60 )
        fails = fails++;
    else {
        fails = 0;
    }
    avg = total/i;
}
END {
    print("Fails =", fails);
    print("Average Grade =", avg);
}
Here is the command line I'm trying to use, but it is not working:
awk '$3==1' -f name.awk grades.txt
What am I doing wrong to achieve the desired output?
You can't combine a command-line program with -f: once awk sees the operand '$3==1' it treats it as the whole program, and then tries to read -f and name.awk as input files. Pass the section number in with -v instead:
$ cat tst.awk
BEGIN { FS = "[[:space:]]*,[[:space:]]*" }
$3 == sect {
    print $2, $1, "-", $5, "-", $4
    if ( $5 <= 60 ) {
        fails++
    }
    sum += $5
    cnt++
}
END {
    print "Fails =", fails+0
    print "Average Grade =", (cnt ? sum / cnt : 0)
}
$ awk -v sect=1 -f tst.awk grades.txt
Gary Buser - 59 - 3
Fails = 1
Average Grade = 59
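For any other section, you would just change the -v sect= value. For example, with the sample grades.txt above, section 3 should give (output derived from the sample data, assuming the script is unchanged):
$ awk -v sect=3 -f tst.awk grades.txt
Gary Alfonzo - 85 - 3
Linda Bailey - 76 - 2
Stewie Butcher - 80 - 1
Fails = 0
Average Grade = 80.3333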

Switch column if value is found in an array

If column two of the csv contains a string from the .txt file, I need to replace it with a '-' and then change column four to whatever column two had.
I have the following .txt file:
0
1
2
and I have a csv:
carrot, 0, cat, r
orange, 2, cat, m
banana, 4, robin, d
Desired output:
carrot, -, cat, 0
orange, -, cat, 2
banana, 4, robin, d
What I currently have is a for loop that goes through the csv file line by line and uses grep to check whether the line contains the word, replacing it with a dash when it does. I think this method is very inefficient and was wondering if there is a better one.
This is a classic case for awk (NR==FNR is true only while the first file is being read, so the .txt values are collected into the array a before the csv is processed):
awk 'BEGIN{ FS = OFS = ", " }
NR == FNR{ a[$1]; next }
{ if ($2 in a) { $4 = $2; $2 = "-" } }1' file.txt file.csv
The output:
carrot, -, cat, 0
orange, -, cat, 2
banana, 4, robin, d
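If the spacing around the commas in the csv is not guaranteed to be consistent, you could borrow the more tolerant separator used in the grades answer above; a sketch with the same logic:
awk 'BEGIN{ FS = "[[:space:]]*,[[:space:]]*"; OFS = ", " }
     NR == FNR { a[$1]; next }
     $2 in a { $4 = $2; $2 = "-" } 1' file.txt file.csv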

Adding columns to a csv table with AWK from multiple files

I'm looking to build a csv table by getting values from several files with AWK. I have it working with two files, but I can't scale it beyond that. I'm currently taking the output of the second file, and appending the third, and so on.
Here are example files:
#file1 #file2 #file3 #file4
100 45 1 5
200 23 1 2
300 29 2 1
400 0 1 2
500 74 4 5
This is the goal:
#data.csv
1,100,45,1,5
2,200,23,1,2
3,300,29,2,1
4,400,0,1,2
5,500,74,4,5
This is what I have working:
awk 'FNR==NR { a[FNR""] = NR", " $0","; next } { print a[FNR""], $0}' $file1 $file2
With the result:
1, 100, 45
2, 200, 23
3, 300, 29
4, 400, 0
5, 500, 74
But when I try to get it to work on 3 or more files, like so:
awk 'FNR==NR { a[FNR""] = NR", " $0","; next } { print a[FNR""], $0; next } { print a[FNR""], $0}' $file1 $file2 $file3
I get this output:
1, 100, 45
2, 200, 23
3, 300, 29
4, 400, 0
5, 500, 74
1, 100, 1
2, 200, 1
3, 300, 2
4, 400, 1
5, 500, 4
In the first column the line count restarts, and in the second column the first file is repeated. The third and subsequent files get added as new rows, where I would expect them to be added as columns; no new rows should be created.
Any help would be greatly appreciated. I have learned most of my AWK from Stack Exchange, and I know I'm missing something fundamental here. Thanks!
As already answered, you can use paste. To get the exact output, with comma-delimited line numbering, you can do this:
paste -d, file{1..4} | nl -s, -w1
-s, sets the number separator to a comma (the default is a tab).
-w1 sets the number width to 1, so there is no leading padding (the default width is 6).
Another solution with awk:
awk '{a[FNR]=a[FNR] "," $0}
END {for (i=1;i<=length(a);i++) print i a[i]}' file{1..4}
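Note that length() on an array is a GNU awk extension (some other awks support it too); a POSIX-safe variant tracks the row count itself:
awk '{ a[FNR] = a[FNR] "," $0; if (FNR > max) max = FNR }
     END { for (i=1; i<=max; i++) print i a[i] }' file{1..4}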
Why don't you use paste and then simply number each row:
paste -d"," file1 file2 file3 file4
100,45,1,5
200,23,1,2
300,29,2,1
400,0,1,2
500,74,4,5
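To add the row numbers the question asks for, you can pipe that through nl as shown above, or through a small awk (a sketch):
paste -d"," file1 file2 file3 file4 | awk '{ print NR "," $0 }'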
An awk solution for a variable number of files (the !line[FNR] && line[FNR]=FNR part seeds each row with its line number the first time that row index is seen):
awk '{ !line[FNR] && line[FNR]=FNR; line[FNR]=line[FNR]","$0 }
END { for (i=1; i<=length(line); i++) print line[i] }' file1 file2 ... fileN
For example:
$ awk '{ !line[FNR] && line[FNR]=FNR; line[FNR]=line[FNR]","$0 }
END { for (i=1; i<=length(line); i++) print line[i] }' \
<(seq 1 5) <(seq 11 15) <(seq 21 25) <(seq 31 35)
1,1,11,21,31
2,2,12,22,32
3,3,13,23,33
4,4,14,24,34
5,5,15,25,35
Here is a beginner-friendly solution. If you need to manipulate the data on the way in, you can clearly see which file is being read.
ARGIND is gawk-specific; it tells us which file we are processing. We fill two arrays a and b from file1 and file2, then print your desired output while processing file3.
awk '
ARGIND == 1 { a[FNR] = $0 ; next }
ARGIND == 2 { b[FNR] = $0 ; next }
ARGIND == 3 { print FNR "," a[FNR] "," b[FNR] "," $0 }
' file1 file2 file3
Output:
1,100,45,1
2,200,23,1
3,300,29,2
4,400,0,1
5,500,74,4
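ARGIND is gawk-only; on other awks you can get the same effect by bumping a counter yourself whenever FNR resets to 1. A portable sketch of the same idea (assuming no empty input files):
awk '
FNR == 1 { argind++ }                 # a new file has started
argind == 1 { a[FNR] = $0 ; next }
argind == 2 { b[FNR] = $0 ; next }
argind == 3 { print FNR "," a[FNR] "," b[FNR] "," $0 }
' file1 file2 file3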

match a row value to a column value, and rename the row

I have a file with the following header:
File 1:
location, nameA, nameB, nameC
and a second file with the format:
File2:
ID_number, names
101, nameA
102, nameB
103, nameC
I would like to match the names in the header of File1 to those in column 2 of File2, and where they match, replace the name in the header with the ID number, so that the resulting file would look like:
File 1:
location, 101, 102, 103
I've mostly been trying to do this with awk, but I can't get it to produce anything, and I'm not sure how to ask it to do the last part of what I want.
awk -F "," '{print $2}' file2.csv | while read i; do awk 'NR=1;{for (j=0;j<=NF;j++) {if ($j == $i) printF $j; }}' file1.csv;done > test.csv
It's a really large file with thousands of columns and rows, so I just put up a simplified version of the files in my question here.
Thanks!
This should work if your csv fields have no embedded commas. It also assumes that both files have a header line. One caveat: the space-compression step strips all spaces, including those inside values, which is why 'The Moon' comes out as 'TheMoon' below.
awk '
BEGIN { FS=","; OFS=", " }
FNR == 1 {          # if it is the header line
    if (NR != 1)    # if it is the second file
        print       # print it
    next            # go to next line of file
}
{ gsub(/ +/, "") }  # compress spaces
NR == FNR {         # if it is the first file
    a[$2] = $1      # save the info
    next            # go to next line of file
}
{
    $2=a[$2]; $3=a[$3]; $4=a[$4]  # swap names
    print                         # print line
}
' file2.csv file1.csv
Test files:
file1.csv
location, nameA, nameB, nameC
Earth, Chuck, Edward, Bob
The Moon, Bob, Doris, Al
file2.csv
ID_number, names
101, Al
102, Bob
103, Chuck
104, Doris
105, Edward
Output:
location, nameA, nameB, nameC
Earth, 103, 105, 102
TheMoon, 102, 104, 101
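Since the real file reportedly has thousands of columns, hard-coding $2 through $4 won't scale. Here is a sketch of the same script with the swap generalized to every field (same assumptions: no embedded commas, both files have a header line):
awk '
BEGIN { FS=","; OFS=", " }
FNR == 1 { if (NR != 1) print; next }  # pass the file1 header through untouched
{ gsub(/ +/, "") }                     # compress spaces
NR == FNR { a[$2] = $1; next }         # save the ID for each name from file2
{
    for (i = 2; i <= NF; i++)          # replace every name that has an ID
        if ($i in a) $i = a[$i]
    print
}
' file2.csv file1.csv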
