How to use AWK to grab a row in a file by a certain column - bash

1|1001|399.00|123
1|1001|29.99|234
2|1002|98.00|345
2|1002|29.98|456
3|1003|399.00|567
4|1004|234.56|456
How would I use awk to grab all the rows with '1002' in column 2?
If I wanted to grab all the rows with '2' in the first column, I could use grep ^2, but how do I search by different columns?

The typical solution is:
awk '$2 == 1002' FS=\| input-file
You get a slightly different result with $2 ~ 1002, which technically satisfies your query but is probably not what you want: it does a regex match, and so will also match if the second column is, say, "341002994".
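If you really do want a regex test, anchoring it avoids the partial-match problem. A quick sketch against the same pipe-delimited input (using -F instead of an FS= operand):
awk -F'|' '$2 == 1002' input-file       # same equality test as above
awk -F'|' '$2 ~ /^1002$/' input-file    # anchored regex, so "341002994" no longer matches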

Related

Add prefix to all rows and columns efficiently

My aim is to add a prefix to all rows and columns returned from an SQL query (all rows of the same column should take the same prefix). The way I am doing it at the moment is
echo "$(<my_sql_query> | awk '$0="prefixA_"$0' |
awk '$2="prefixB_"$2' |
awk '$3="prefixC_"$3' |
awk '$4="prefixD_"$4')"
The script above does exactly what I want but what I would like to know is whether there is faster way of doing it.
If you are willing to stay with the echo + awk approach, you could do it in a single awk call that adds all the prefixes in one pass. I am not sure about your query, but this assumes the fields are separated by spaces only.
echo "$<my_sql-query>" |
awk '{$0="prefixA_"$0;$2="prefixB_"$2;$3="prefixC_"$3;$4="prefixD_"$4} 1'
EDIT: Adding a generic solution here, in which the field numbers and their respective prefix values are passed to a function that adds them to the fields. Fair warning: it is not tested much, because no samples were given.
echo "$<my_sql-query>" |
awk '
function addPrefix(fieldNumbers, fieldValues) {
  num = split(fieldNumbers, arr1, "#")
  split(fieldValues, arr2, "#")
  for (i = 1; i <= num; i++) {
    $arr1[i] = arr2[i] $arr1[i]
  }
}
addPrefix("1#2#3#4", "prefixA_#prefixB_#prefixC_#prefixD_")
1'
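Because the field numbers and prefixes are just #-separated lists, the same function can be pointed at any subset of fields. For example (untested, values chosen only for illustration), replacing the call above with the following would prefix only the second and fourth fields:
addPrefix("2#4", "prefixB_#prefixD_")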

AWK write to a file based on number of fields in csv

I want to iterate over a csv file and, while writing to an output file, discard any row that does not have all of its columns.
I have an input file mtest.csv like this
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP2##TestProcess2##TestDevice2
TestIP3##TestProcess3##TestDevice3##TestID3
But I am trying to write only those records where all 4 columns are present. The output should not contain the TestIP2 row at all, since it has only 3 columns.
Sample output should look like this:
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP3##TestProcess3##TestDevice3##TestID3
I used to do it like this to get all the columns, but it writes the TestIP2 row as well, which has only 3 columns:
awk -F "\##" '{print $1"\##"substr($2,1,50)"\##"substr($3,1,50)"\##"substr($4,1,50)}' mtest.csv >output2.csv
But when I try to ensure that it only writes rows where all 4 columns are present, it doesn't work:
awk -F "\##", 'NF >3 {print $1"\##"substr($2,1,50)"\##"substr($3,1,50)"\##"substr($4,1,50); exit}' mtest.csv >output2.csv
You are making things harder than they need to be. All you need to do is check NF==4 to output only the records containing four fields. Your whole awk expression would be:
awk -F'##' NF==4 < mtest.csv
(note: the default action by awk is print so there is no explicit print required.)
Example Use/Output
With your sample input in mtest.csv, you would receive:
$ awk -F'##' NF==4 < mtest.csv
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP3##TestProcess3##TestDevice3##TestID3
Thanks David and vukung.
Both your solutions are okay. I want to write to a file so that I can trim the length of each field as well.
I think the statement below works:
awk -F "##" 'NF>3 {print $1"\##"substr($2,1,50)"\##"substr($3,1,2)"\##"substr($4,1,3)}' mtest.csv >output2.csv

Taking x number of last columns from a file in awk

I have a txt file with columns such as name, position, department, etc. I need to get certain characters from each column to generate a username and password for each listing.
I'm approaching this problem by first separating the columns with:
awk 'NR>10{print $1, $2, $3, ...etc}' $filename
#This skips the header information and simply prints out each specified column.
And then moving on to grab each character that I need from their respective fields.
The problem now is that each row is not uniform. For example the listings are organized as such:
(lastName, firstName, middleInitial, position, departmentName)
Some of the entries in the text file do not have certain fields such as middle initial. So when I try to list fields 3, 4, and 5 the entries without their middle initials return:
(position, departmentName, empty/null (I'm not sure how awk handles this))
The good news is that I do not need the middle initial, so I can ignore those listings. How can I grab the last 2 (in this example) columns from the file, so I can isolate the fields that every entry has and cut the necessary characters out of them?
You can get them with $(NF-1) and $NF; they are the second-to-last column and the last column.
echo "1,2,3,4" | awk -F, 'BEGIN{OFS=","}{print $(NF-1), $NF}'
NF means number of fields. If you have 4 fields, (NF-1) would be 3, and $(NF-1) is the value of column 3.
Output would be
3,4
Another example with different length fields in file:
sample.csv
1,2,3,4,5
a,b,c,d
Run
awk -F, 'BEGIN{OFS=","}{print $(NF-1), $NF}' sample.csv
Output:
4,5
c,d
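From there, building the username is just a matter of slicing those two fields. The rule below is purely illustrative (the question does not say which characters are needed); it assumes whitespace-separated fields and the 10-line header mentioned above:
# illustrative only: first letter of position + first 7 letters of department, lowercased
awk 'NR>10 {print tolower(substr($(NF-1),1,1) substr($NF,1,7))}' "$filename"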

Insert column delimiters before pattern in a sorted file on a mac

Have a resulting file which contains values from different XML files.
The file has 5 columns separated by ";" when all patterns matched.
First column = neutral Index
Second column = specific Index1
Third column = file that contains Index1
Fourth column = specific Index2
Fifth column = file that contains Index2
Lines where one of the index patterns did not match (the shorter lines below) should also end up with 5 columns, with the missing fields left empty so that the matched fields sit in the same columns as in the first two lines.
The sorted files looks like:
AAA;AAA.1D1;file_X;AAA.2D1;file_Y
AAA;AAA.1E1;file_A;AAA.2E1;file_B
AAA;AAA.2F1;file_C
BBB;BBB.2G1;file_D
CCC;CCC.1B1;file_H
YYY;YYY.2M1;file_N
The desired result would be:
AAA;AAA.1D1;file_X;AAA.2D1;file_Y
AAA;AAA.1E1;file_A;AAA.2E1;file_B
AAA;;;AAA.2F1;file_C
BBB;;;BBB.2G1;file_D
CCC;CCC.1B1;file_H;;
YYY;;;YYY.2M1;file_N
If you have any idea/hint, your help is appreciated! Thanks in advance!
Updated Answer
In the light of the updated requirement, I think you want something like this:
awk -F';' 'NF==3 && $2~/\.1/{$0=$0 ";;"}
NF==3 && $2~/\.2/{$0=$1 ";;;" $2 ";" $3} 1' file
which can be written as a one-liner:
awk -F';' 'NF==3 && $2~/\.1/{$0=$0 ";;"} NF==3 && $2~/\.2/{$0=$1 ";;;" $2 ";" $3} 1' YourFile
Original Answer
I would do that with awk:
awk -F';' 'NF==3{$0=$1 ";;;" $2 ";" $3}1' YourFile
AAA;AAA.1D1;file_X;AAA.2D1;file_Y
AAA;AAA.1E1;file_A;AAA.2E1;file_B
AAA;;;AAA.2F1;file_C
BBB;;;BBB.2G1;file_D
YYY;;;YYY.2M1;file_N
That says..."run awk on YourFile using ';' as field separator. If there are only 3 fields on any line, recreate the line using the existing first field, three semi-colons and then the other two fields. The 1 at the end, means print the current line`".
If you don't use awk much, NF refers to the number of fields, $0 refers to the entire current line, $1 refers to the first field on the line, $2 refers to the second field etc.
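A tiny made-up example of those variables in action:
echo 'a;b;c' | awk -F';' '{print NF, $0, $1, $2}'
3 a;b;c a b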

Bash/Shell: How to remove duplicates from csv file by columns?

I have a csv separated with ;. I need to remove the lines where the combination of the 2nd and 3rd columns is not unique, and deliver the material to standard output.
Example input:
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data3;data4;irrelevant;irrelevant
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data3;data4;irrelevant;irrelevant
Desired output
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant
I have found solutions that only keep the first occurrence in the output:
sort -u -t ";" -k2,1 file
but this is not enough.
I have tried to use uniq -u but I can't find a way to check only a few columns.
Using awk:
awk -F';' '!seen[$2,$3]++{data[$2,$3]=$0}
END{for (i in seen) if (seen[i]==1) print data[i]}' file
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant
Explanation: If the $2,$3 combination has not been seen before, a new entry keyed by $2,$3 is stored in the data array with the whole record. Every time a $2,$3 combination is found, its counter in seen is incremented. At the end, only the entries whose counter is 1 are printed. Note that the for (i in seen) loop does not guarantee the original line order.
If order is important and if you can use perl then:
perl -F";" -lane '
$key = #F[1,2];
$uniq{$key}++ or push #rec, [$key, $_]
}{
print $_->[1] for grep { $uniq{$_->[0]} == 1 } #rec' file
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant
We use column 2 and column 3 to create a composite key. We build an array of arrays by pushing the key and the line onto @rec the first time a key is seen.
In the block at the end, we check whether that occurrence is the only one; if so, we print the line.
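If order matters but perl is not available, a two-pass awk keeps the original order as well; a sketch (note that the file is read twice):
awk -F';' 'NR==FNR {cnt[$2,$3]++; next} cnt[$2,$3]==1' file file
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant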
awk '!a[$0]++' file_input > file_output
This worked for me. It compares whole lines, and it keeps the first occurrence of each duplicate rather than removing all of them.
