I have a txt file with columns such as name, position, department, etc. I need to get certain characters from each column to generate a username and password for each listing.
I'm approaching this problem by first separating the columns with:
awk 'NR>10{print $1, $2, $3, ...etc}' "$filename"
#This skips the header information and simply prints out each specified column.
And then moving on to grab each character that I need from their respective fields.
The problem now is that the rows are not uniform. For example, the listings are organized as:
(lastName, firstName, middleInitial, position, departmentName)
Some of the entries in the text file do not have certain fields, such as the middle initial. So when I try to list fields 3, 4, and 5, the entries without a middle initial return:
(position, departmentName, empty/null (I'm not sure how awk handles this))
The good news is that I do not need the middle initial, so I can ignore it. How can I grab the last 2 columns (in this example) so that I can isolate the fields that every entry has, and then cut the necessary characters out of them?
You can get them with $(NF-1) and $NF; they are the second-to-last and the last column.
echo "1,2,3,4" | awk -F, 'BEGIN{OFS=","}{print $(NF-1), $NF}'
NF means the number of fields. If you have 4 fields, NF-1 is 3, and $(NF-1) is the value of column 3.
Output would be
3,4
Another example, with rows of different lengths in a file:
sample.csv
1,2,3,4,5
a,b,c,d
Run
awk -F, 'BEGIN{OFS=","}{print $(NF-1), $NF}' sample.csv
Output:
4,5
c,d
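Tying this back to the original question: here is a hedged sketch that skips the header and builds a username from characters in the always-present fields. The sample data, the field layout, and the username scheme (first initial + last name + 3-letter department prefix) are all assumptions, not the asker's actual format.

```shell
# Hypothetical staff file; the real file has ~10 header lines (here just one).
cat > staff.txt <<'EOF'
lastName firstName position department
Smith John Manager Sales
Jones Mary T. Analyst Finance
EOF

# First initial + last name + first 3 letters of the last field (department).
# $NF works whether or not the middle initial is present.
awk 'NR>1 { print tolower(substr($2,1,1) $1 "." substr($NF,1,3)) }' staff.txt
```

This prints jsmith.sal and mjones.fin: the optional middle initial shifts the middle fields, but $1, $2 and $NF always point at last name, first name and department.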
I have this small file small.csv:
STATE|STAGE|SUBCAT_ID|TOTAL TIMING|FAMA_COEFF_TIMING|DB_IMPORT_TIMING|COMMENT
SUCCEEDED|fe|L3-002559|110|7|15
SUCCEEDED|staging|L3-002241|46||24
And this bigger file big.csv:
STATE|STAGE|SUBCAT_ID|TOTAL TIMING|FAMA_COEFF_TIMING|DB_IMPORT_TIMING|COMMENT
SUCCEEDED|fe|L3-004082|16|0|8
SUCCEEDED|staging|L3-002730|85||57
SUCCEEDED|staging|L3-002722|83||56
SUCCEEDED|fe|L3-002559|100|7|15
I need a command (probably awk) that loops over small.csv, checks whether the 1st, 2nd and 3rd columns match a record in big.csv, and then computes the difference small-big on the 4th column. So in the example above, since the first record's first 3 columns match the 4th record in big.csv, the output would be:
SUCCEEDED|fe|L3-002559|10
where 10 is 110-100
Thank you
Assuming that lines with the same first three fields do not occur more than twice across the two files taken together, this works:
awk -F '|' 'FNR!=1 { key = $1 "|" $2 "|" $3; if(a[key]) print key "|" a[key]-$4; else a[key]=$4 }' small.csv big.csv
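A slightly more defensive variant, offered as a sketch rather than a drop-in replacement: testing with the `in` operator avoids misreading a stored total of 0 as "key not seen" (the one-liner's `if(a[key])` would), and guarding the first rule with NR==FNR makes the small-first file order explicit.

```shell
# Recreate the sample files from the question.
cat > small.csv <<'EOF'
STATE|STAGE|SUBCAT_ID|TOTAL TIMING|FAMA_COEFF_TIMING|DB_IMPORT_TIMING|COMMENT
SUCCEEDED|fe|L3-002559|110|7|15
SUCCEEDED|staging|L3-002241|46||24
EOF
cat > big.csv <<'EOF'
STATE|STAGE|SUBCAT_ID|TOTAL TIMING|FAMA_COEFF_TIMING|DB_IMPORT_TIMING|COMMENT
SUCCEEDED|fe|L3-004082|16|0|8
SUCCEEDED|staging|L3-002730|85||57
SUCCEEDED|staging|L3-002722|83||56
SUCCEEDED|fe|L3-002559|100|7|15
EOF

# Store small.csv's 4th column keyed on its first three fields, then
# print small-big for every matching record in big.csv.
awk -F'|' 'NR==FNR && FNR!=1 { a[$1 FS $2 FS $3] = $4; next }
           FNR!=1 { key = $1 FS $2 FS $3; if (key in a) print key "|" a[key]-$4 }' small.csv big.csv
```

This prints SUCCEEDED|fe|L3-002559|10 for the sample. Note it prints once per matching big.csv line, so duplicate keys in big.csv would produce duplicate output lines.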
I have a file that looks like so:
file1.txt
rs13339951:45007956:T:C 45007956
rs2838331 45026728
rs5647 12335
rs4687576 5353566
file2.txt
rs13339951 45007956
rs2838331 45026728
rs5647 12335
rs4687576:ATCFHF 5353566
More descriptions:
Some of the values in column1 are identical between the 2 files but not all
The values in column2 are all identical between the 2 files
I want to identify the rows for which the value in column 1 differs between the 2 files, i.e. rows 1 and 4 in my example. I can find them with diff file1.txt file2.txt.
However, I would like to obtain an end file like the one below. Indeed, I aim to use sed to replace the names in one file with those from the other so that both files match completely.
rs13339951:45007956:T:C rs13339951
rs4687576 rs4687576:ATCFHF
awk is perfect for this
awk 'FNR==NR {a[$2]=$1; next} a[$2]!=$1 {print a[$2] " " $1}' file1 file2
outputs
rs13339951:45007956:T:C rs13339951
rs4687576 rs4687576:ATCFHF
We are passing two files to awk; it processes them one after the other.
FNR==NR {.... next} { ... }
With this "trick" the first action is executed for the first file and the second action is executed for the second file.
a[$2]=$1
A key-value lookup table: the second column is the key, the first column is the value. We build this lookup table while reading the first file.
a[$2]!=$1 {print a[$2] " " $1}
While iterating over the second file, compare the current first column with the value in the lookup table. If they do not match print the desired output.
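Since the stated goal was to use sed to make the files match, the same awk pass can emit sed substitutions instead of the bare mapping. This is a sketch: the `rename.sed` file name is made up, it assumes single-space separators and names containing no sed metacharacters (true for the rsIDs shown, since `:` is not special to sed).

```shell
# Recreate the sample files from the question.
cat > file1.txt <<'EOF'
rs13339951:45007956:T:C 45007956
rs2838331 45026728
rs5647 12335
rs4687576:ATCFHF 5353566
EOF
cat > file2.txt <<'EOF'
rs13339951 45007956
rs2838331 45026728
rs5647 12335
rs4687576:ATCFHF 5353566
EOF

# Emit one s/old/new/ command per differing name, anchored at line start,
# then rewrite file2 so its names match file1's.
awk 'FNR==NR {a[$2]=$1; next}
     a[$2]!=$1 {print "s/^" $1 " /" a[$2] " /"}' file1.txt file2.txt > rename.sed
sed -f rename.sed file2.txt > file2.renamed.txt
```

After this, file2.renamed.txt is identical to file1.txt.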
I have a resulting file which contains values from different XML files.
The file has 5 columns separated by ";" when all patterns matched:
First column = neutral index
Second column = specific Index1
Third column = file containing Index1
Fourth column = specific Index2
Fifth column = file containing Index2
Lines that matched only Index2 (like three of the last four lines) should also end up with 5 columns, with Index2 and its file in the last two columns as in the first two lines; a line that matched only Index1 should keep its last two columns empty.
The sorted files looks like:
AAA;AAA.1D1;file_X;AAA.2D1;file_Y
AAA;AAA.1E1;file_A;AAA.2E1;file_B
AAA;AAA.2F1;file_C
BBB;BBB.2G1;file_D
CCC;CCC.1B1;file_H
YYY;YYY.2M1;file_N
The desired result would be:
AAA;AAA.1D1;file_X;AAA.2D1;file_Y
AAA;AAA.1E1;file_A;AAA.2E1;file_B
AAA;;;AAA.2F1;file_C
BBB;;;BBB.2G1;file_D
CCC;CCC.1B1;file_H;;
YYY;;;YYY.2M1;file_N
If you have any idea/hint, your help is appreciated! Thanks in advance!
Updated Answer
In the light of the updated requirement, I think you want something like this:
awk -F';' 'NF==3 && $2~/\.1/{$0=$0 ";;"}
NF==3 && $2~/\.2/{$0=$1 ";;;" $2 ";" $3} 1' file
which can be written as a one-liner:
awk -F';' 'NF==3 && $2~/\.1/{$0=$0 ";;"} NF==3 && $2~/\.2/{$0=$1 ";;;" $2 ";" $3} 1' YourFile
Original Answer
I would do that with awk:
awk -F';' 'NF==3{$0=$1 ";;;" $2 ";" $3}1' YourFile
AAA;AAA.1D1;file_X;AAA.2D1;file_Y
AAA;AAA.1E1;file_A;AAA.2E1;file_B
AAA;;;AAA.2F1;file_C
BBB;;;BBB.2G1;file_D
YYY;;;YYY.2M1;file_N
That says: "run awk on YourFile using ';' as the field separator. If there are only 3 fields on a line, recreate the line using the existing first field, three semicolons, and then the other two fields. The 1 at the end means print the current line."
If you don't use awk much, NF refers to the number of fields, $0 refers to the entire current line, $1 refers to the first field on the line, $2 refers to the second field etc.
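To sanity-check the updated one-liner, it can be run against the sample from the question (using the answer's `YourFile` name):

```shell
# The sorted sample from the question.
cat > YourFile <<'EOF'
AAA;AAA.1D1;file_X;AAA.2D1;file_Y
AAA;AAA.1E1;file_A;AAA.2E1;file_B
AAA;AAA.2F1;file_C
BBB;BBB.2G1;file_D
CCC;CCC.1B1;file_H
YYY;YYY.2M1;file_N
EOF

# 3-field lines whose index contains ".1" get two empty trailing fields;
# 3-field lines whose index contains ".2" get two empty fields after field 1.
awk -F';' 'NF==3 && $2~/\.1/{$0=$0 ";;"} NF==3 && $2~/\.2/{$0=$1 ";;;" $2 ";" $3} 1' YourFile
```

Note the rules cannot both fire on one line: once the first rule rewrites $0, the line has 5 fields, so the second NF==3 test fails.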
In bash I use grep -w -f list1.txt list2.txt to search list1's names in list2; they are one-column files.
Now I need to search list1's names in a multi-column file (a matrix, tab-delimited or CSV). How do I get each name together with the header of the column it appears in?
List1 is:
SERPINA3
ADRA1D
BDNF
ADSS
Matrix is:
CLUST1 CLUST2 CLUST3
AAMP A1BG ACACB
ACADSB A2M ADRA1D
ACO1 SERPINA3 AK4
ACP5 ACADM ALDH1A3
PLIN2 ACR AMD1
ADORA2B ACO2 ARSB
ADSL ALAS1 BDNF
ADSS ALB OSGIN2
output should be
SERPINA3 CLUST2
ADRA1D CLUST3
BDNF CLUST3
ADSS CLUST1
Thanks.
awk to the rescue!
$ awk 'NR==FNR{a[$0];next}
FNR==1{split($0,h);next}
{for(i=1;i<=NF;i++) if($i in a) print $i, h[i]}' file{1,2}
ADRA1D CLUST3
SERPINA3 CLUST2
BDNF CLUST3
ADSS CLUST1
You lose the order of file1; there are other ways to handle that, but I'm not sure it's important.
Explanation
NR==FNR{a[$0];next} store records of first file in array a, skip the rest while processing first file
FNR==1{split($0,h);next} now we know it's the second file, split the header to array h for reference of column names (first row), skip rest
for(i=1;i<=NF;i++) main loop for the second file: for each record (line), iterate over all fields
if($i in a) if the field is in array a (that is, in the first file)
print $i, h[i] print the field and the column name (indexed by the field number)
file{1,2} shorthand for file1 file2, your case will be List1 Matrix
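If the order of file1 does matter, one way (a sketch, using the question's List1 and Matrix names) is to remember the order while reading the first file, collect matches while scanning the matrix, and print from an END block:

```shell
# Recreate the sample inputs from the question.
cat > List1 <<'EOF'
SERPINA3
ADRA1D
BDNF
ADSS
EOF
cat > Matrix <<'EOF'
CLUST1 CLUST2 CLUST3
AAMP A1BG ACACB
ACADSB A2M ADRA1D
ACO1 SERPINA3 AK4
ACP5 ACADM ALDH1A3
PLIN2 ACR AMD1
ADORA2B ACO2 ARSB
ADSL ALAS1 BDNF
ADSS ALB OSGIN2
EOF

# Remember List1's order, record each name's column header as it is found,
# then print in List1 order.
awk 'NR==FNR {order[++n]=$0; a[$0]; next}
     FNR==1  {split($0,h); next}
     {for(i=1;i<=NF;i++) if($i in a) res[$i]=h[i]}
     END     {for(j=1;j<=n;j++) if(order[j] in res) print order[j], res[order[j]]}' List1 Matrix
```

This prints SERPINA3, ADRA1D, BDNF, ADSS in that order, each with its cluster header. If a name occurs in several columns, only the last match is kept.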
I have a tab-separated text file that has 4 columns of data:
StudentId Student Name GPA Major
I have to write a shell command that stores the names of students who are CS majors to another file. I used grep cs students.txt, which works to display just the CS students, but I do not know how to take just the students' names and save them to a file.
Assuming that your input file is tab-separated (so you can have spaces in names):
awk -F'\t' '$4 == "cs" { print $2 }' <infile >outfile
This matches column 4 (major) against "cs", and prints column 2 when it is an exact match.
Got it:
grep cs students.txt | cut -f2 >file1
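One caveat, shown on hypothetical extra rows: a bare `grep cs` matches the substring "cs" anywhere on the line, including inside names or other majors, so the field-exact awk match above is safer when the data can contain such values.

```shell
# Tab-separated sample; only Ann Lee is actually a "cs" major.
printf '1\tAnn Lee\t3.9\tcs\n2\tBo Macsen\t3.1\tmath\n3\tCal Ray\t3.5\tcsci\n' > students.txt

grep cs students.txt | cut -f2                     # matches all three lines
awk -F'\t' '$4 == "cs" { print $2 }' students.txt  # only Ann Lee
```

If you want to stay with grep, `grep -w cs` at least restricts the match to the whole word "cs", though it would still hit a name that is exactly "cs".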