Extracting field from last row of given table using sed - bash

I would like to write a bash script to extract a field in the last row of a table. I will illustrate by example. I have a text file containing tables with space delimited fields like ...
Table 1 (foobar)
num flag name comments
1 ON Frank this guy is frank
2 OFF Sarah she is tall
3 ON Ahmed who knows him
Table 2 (foobar)
num flag name comments
1 ON Mike he is short
2 OFF Ahmed his name is listed twice
I want to extract the first field in the last row of Table1, which is 3. Ideally I would like to be able to use any given table's title to do this. There are guaranteed carriage returns between each table. What would be the best way to accomplish this, preferably using sed and grep?

Awk is perfect for this, print the first field in the last row for each record:
$ awk '!$1{print a}{a=$1}END{print a}' file
3
2
Just from the first record:
$ awk '!$1{print a;exit}{a=$1}' file
3
Edit:
For a given table title:
$ awk -v t="Table 1" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
3
$ awk -v t="Table 2" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
2

This sed line seems to work for your sample.
table='Table 2'
sed -n "/$table"'/{n;n;:next;h;n;/^$/b last;$b last;b next;:last;g;s/^\s*\(\S*\).*/\1/p;}' file
Explanation: When we find a line matching the table name in $table, we skip that line, and the next (the field labels). Starting at :next we push the current line into the hold space, get the next line and see if it is blank or the end of the file, if not we go back to :next, push the current line into hold and get another. If it is blank or EOF, we skip to :last, pull the hold space (the last line of the table) into pattern space, chop out all but the first field and print it.

Just read each block as a record with each line as a field and then print the first sub-field of the last field of whichever record you care about:
$ awk -v RS= -F'\n' '/^Table 1/{split($NF,a," "); print a[1]}' file
3
$ awk -v RS= -F'\n' '/^Table 2/{split($NF,a," "); print a[1]}' file
2

Better tool to that is awk!
Here is a kind legible code:
awk '{
if(NR==1) {
row=$0;
next;
}
if($0=="") {
$0=row;
print $1;
} else {
row=$0;
}
} END {
if(row!="") {
$0=row;
print $1;
}
}' input.txt

Related

Transpose rows to column after nth column in bash

I have a file like below format:
$ cat file_in.csv
1308123;28/01/2019;28/01/2019;22/01/2019
1308456;20/11/2018;27/11/2018;09/11/2018;15/11/2018;10/11/2018;02/12/2018
1308789;06/12/2018;04/12/2018
1308012;unknown
How can i transpose as below, starting from second column:
1308123;28/01/2019
1308123;28/01/2019
1308123;22/01/2019
1308456;20/11/2018
1308456;27/11/2018
1308456;09/11/2018
1308456;15/11/2018
1308456;10/11/2018
1308456;02/12/2018
1308789;06/12/2018
1308789;04/12/2018
1308012;unknown
I'm testing my script, but obtain a wrong result
echo "123;23/05/2018;24/05/2018" | awk -F";" 'NR==3{a=$1";";next}{a=a$1";"}END{print a}'
Thanks in advance
1st Solution: Eaisest solution will be, loop through all fields(off course have set field separator as ;) and then print $1 along with all fields in new line. Also note that loop is running from i=2 to till value of NF leaving first field since we need to print in new line from column 2nd onwards.
awk 'BEGIN{FS=OFS=";"} {for(i=2;i<=NF;i++){print $1,$i}}' Input_file
2nd Solution: Using 1 time substitution(sub) and global substitutions(gsub) functionality of awk. Here I am changing very first occurence of ; with ###(assumed that your Input_file will NOT have this characters together, in case it is there choose any unique character(s) which are NOT in one's Input_file on place of ###), then globally subsituting ;(all occurences) with ORS val(a variable which has value of $1) and ; so make values in new column. Now finally remove ### from first field. Why we have done this approch if we DO NOT substitute very first occurence of ; with any other character then it will place a NEW LINE before substituion which we DO NOT want to have. (Also as per Ed sir's comment this solution was tested in 1 Input_file and may have issues while reading multiple Input_files)
awk 'BEGIN{FS=OFS=";"} {val=$1;sub(";","###");gsub(";",ORS val ";");sub("###",";",$1)} 1' Input_file
Another awk
awk -F";" '{ OFS="\n" $1 ";"; $1=$1;$1=""; printf("%s",$0) } ' file

Formatting output using awk

I've a file with following content:
A 28713.64 27736.1000
B 9835.32
C 38548.96
Now, i need to check if the last row in the first column is 'C', then the value of first row in third column should be printed in the third column against 'C'.
Expected Output:
A 28713.64 27736.1000
B 9835.32
C 38548.96 27736.1000
I tried below, but it's not working:
awk '{if ($1 == "C") ; print $1,$2,$3}' file_name
Any help is most welcome!!!
This works for the given example:
awk 'NR==1{v=$3}$1=="C"{$0=$0 FS v}7' file|column -t
If you want to append the 3rd column value from A row to C row, change NR==1 into $1=="A"
The column -t part is just for making output pretty. :-)
EDIT: As per OP's comment OP is looking for very first line and looking to match C string at very last line of Input_file, if this is the case then one should try following.
awk '
FNR==1{
value=$NF
print
next
}
prev{
print prev
}
{
prev=$0
prev_first=$1
}
END{
if(prev_first=="C"){
print prev,value
}
else{
print
}
}' file | column -t
Assuming that your actual Input_file is same as shown samples and you want to pick value from 1st column whose value is A.
awk '$1=="A" && FNR==1{value=$NF} $1=="C"{print $0,value;next} 1' Input_file| column -t
Output will be as follows.
A 28713.64 27736.1000
B 9835.32
C 38548.96 27736.1000
POSIX dictates that "assigning to a nonexistent field (for example, $(NF+2)=5) shall increase the value of NF; create any intervening fields with the uninitialized value; and cause the value of $0 to be recomputed, with the fields being separated by the value of OFS."
So...
awk 'NR==1{x=$3} $1=="C"{$3=x} 1' input.txt
Note that the output is not formatted well, but that's likely the case with most of the solutions here. You could pipe the output through column, as Ravinder suggested. Or you could control things precisely by printing your data with printf.
awk 'NR==1{x=$3} $1=="C"{$3=x} {printf "%-2s%-26s%s\n",$1,$2,$3}' input.txt
If your lines can be expressed in a printf format, you'll be able to avoid the unpredictability of column -t and save the overhead of a pipe.

Bash vlookup kind of solution

I have two files,
File 1
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
3,3,0,0,Test3,1540591243,36
File 2
2,1,0,2,Test1,1540584051,52
6,5,0,2,Test2,1540579206,54
i want to look up column 7 value from File 1 to check if it matches with column 7 value from File 2 and when matched, replace the that line in file 2 with corresponding line in file 1
So the output would be
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
Thanks in advance.
You can do that with the following script:
BEGIN { FS="," }
NR==FNR {
lookup[$7] = $0
next
}
{
if (lookup[$7] != "") {
$0 = lookup[$7]
}
print
}
END {
print ""
print "Lookup table used was:"
for (i in lookup) {
print " Key '"i"', Value '"lookup[i]"'"
}
}
The BEGIN section simply sets the field separator to , so individual fields can be easily processed.
The NR and FNR variables are, respectively, the line number of the full input stream (all files) and the line number of the current file in the input stream. When you are processing the first (or only) file, these will be equal, so we use this as a means to simply store the lines from the first file, keyed on field seven.
When NR and FNR are not equal, it's because you've started the second file and this is where we want to replace lines if their key exists in the first file.
This is done by simply checking if a line exists in the lookup table with the desired key and, if it does, replacing the current line the lookup table line. Then we print the (original or replaced) line.
The END section is there just for debugging purposes, it outputs the lookup table that was created and used, and you can remove it once you're satisfied the script works as expected.
You'll see the output in the following transcript, illustrating hopefully that it is working correctly:
pax$ cat file1
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
3,3,0,0,Test3,1540591243,36
pax$ cat file2
2,1,0,2,Test1,1540584051,52
6,5,0,2,Test2,1540579206,54
pax$ awk -f sudarshan.awk file1 file2
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
Lookup table used was:
Key '36', Value '3,3,0,0,Test3,1540591243,36'
Key '52', Value '2,1,1,1,Test1,1540584051,52'
Key '54', Value '6,5,1,1,Test2,1540579206,54'
If you need it as a "short as possible" one-liner to use from your script, just use:
awk -F, 'NR==FNR{x[$7]=$0;next}{if(x[$7]!=""){$0=x[$7]};print}' file1 file2
though I prefer the readable version myself.
This might work for you (GNU sed):
sed -r 's|^([^,]*,){6}([^,]*).*|/^([^,]*,){6}\2/s/.*/&/p|' file1 | sed -rnf - file2
Turn file1 into a sed script and using the 7th field as a key lookup replace any line in file2 that matches.
In your example the 7th field is the last one, so a short version of the above solution is:
sed -r 's|.*,(.*)|/.*,\1/s/.*/&/p|' file1 | sed -nf - file2

Bash/Shell: How to remove duplicates from csv file by columns?

I have a csv separated with ;. I need to remove lines where content of 2nd and 3rd column is not unique, and deliver the material to the standard output.
Example input:
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data3;data4;irrelevant;irrelevant
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data3;data4;irrelevant;irrelevant
Desired output
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant
I have found solutions where only first line is printed to the output:
sort -u -t ";" -k2,1 file
but this is not enough.
I have tried to use uniq -u but I can't find a way to check only a few columns.
Using awk:
awk -F';' '!seen[$2,$3]++{data[$2,$3]=$0}
END{for (i in seen) if (seen[i]==1) print data[i]}' file
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant
Explanation: If $2,$3 combination doesn't exist in seen array then a new entry with key of $2,$3 is stored in data array with whole record. Every time $2,$3 entry is found a counter for $2,$3 is incremented. Then in the end those entries with counter==1 are printed.
If order is important and if you can use perl then:
perl -F";" -lane '
$key = #F[1,2];
$uniq{$key}++ or push #rec, [$key, $_]
}{
print $_->[1] for grep { $uniq{$_->[0]} == 1 } #rec' file
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant
We use column2 and column3 to create composite key. We create array of array by pushing the key and the line to array rec for the first occurrence of the line.
In the END block, we check if that occurrence is the only occurrence. If so, we go ahead and print the line.
awk '!a[$0]++' file_input > file_output
This worked for me. It compares whole lines.

awk: print first line of file before reading lines

How would I go about printing the first line of given input before I start stepping through each of the lines with awk?
Say I wanted to run the command ps aux and return the column headings and a particular pattern I'm searching for. In the past I've done this:
ps aux | ggrep -Pi 'CPU|foo'
Where CPU is a value I know will be in the first line of input as it's one of the column headings and foo is the particular pattern I'm actually searching for.
I found an awk pattern that will pull the first line:
awk 'NR > 1 { exit }; 1'
Which makes sense, but I can't seem to figure out how to fire this before I do my pattern matching on the rest of the input. I thought I could put it in the BEGIN section of the awk command but that doesn't seem to work.
Any suggestions?
Use the following awk script:
ps aux | awk 'NR == 1 || /PATTERN/'
it prints the current line either if it is the first line in output or if it contains the pattern.
Btw, the same result could be achieved using sed:
ps aux | sed -n '1p;/PATTERN/p'
If you want to read in the first line in the BEGIN action, you can read it in with getline, process it, and discard that line before moving on to the rest of your awk command. This is "stepping in", but may be helpful if you're parsing a header or something first.
#input.txt
Name City
Megan Detroit
Jackson Phoenix
Pablo Charlotte
awk 'BEGIN { getline; col1=$1; col2=$2; } { print col1, $1; print col2, $2 }' input.txt
# output
Name Megan
City Detroit
Name Jackson
City Phoenix
Name Pablo
City Charlotte
Explaining awk BEGIN
I thought I could put it in the BEGIN section ...
In awk, you can have more than one BEGIN clause. These are executed in order before awk starts to read from stdin.

Resources