Extracting one column of a text file to another when a pattern is matched - shell

I have a tab-separated text file that has 4 columns of data:
StudentId Student Name GPA Major
I have to write a shell command that stores the names of the students who are CS majors to another file. I used grep cs students.txt, which displays just the students that are CS majors, but I do not know how to then take just the students' names and save them to a file.

Assuming that your input file is tab-separated (so you can have spaces in names):
awk -F'\t' '$4 == "cs" { print $2 }' <infile >outfile
This matches column 4 (major) against "cs", and prints column 2 when it is an exact match.
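If the major column might not always be lowercase (e.g. CS instead of cs), a case-insensitive variant is easy; a sketch, writing to a hypothetical output file name:
awk -F'\t' 'tolower($4) == "cs" { print $2 }' students.txt > cs_names.txt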

Got it:
grep cs students.txt | cut -f2 >file1
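One caveat: grep cs matches "cs" anywhere on the line, including inside a name, so the awk answer above is more precise. If you want to stay with grep, GNU grep's -P option lets you anchor the match to the major field; a sketch, assuming the major is the last tab-separated column:
grep -P '\tcs$' students.txt | cut -f2 > file1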

Related

bash q4u - how to append value in different file matching on userid in column1

Thanks for looking. I've tried various awk, sed, paste, cut commands - arg.
Looking for a bash solution.
I have two files - file1 is a csv of attributes for each user, beginning with userid in col1.
file2 lists the userid and another unique attribute, matching each userid in file1.
How can I write a 'for i in' or 'while read line' loop over file1 that takes the userid from col1 of each line, searches file2 for that userid, grabs the 2nd value (listed like this: userid,uniqueAttribute), and appends that uniqueAttribute to the end of the stored line from file1 (or writes a new file containing the entire line from file1 plus ,uniqueAttribute)?
file1.csv:
userid,first_name
bob,Robert
jane,Janice
file2.csv:
userid,unicorn
bob,yes
jane,no
and here is a super simple script that correlates the two:
while IFS=, read userid first_name
do
echo "$userid,$first_name,$(grep ^$userid file2.csv | cut -d, -f2)"
done < file1.csv
and the output would be:
userid,first_name,unicorn
bob,Robert,yes
jane,Janice,no
You asked for a loop but join would give you the same result:
join -t, file1.csv file2.csv
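One caveat: join expects both inputs to be sorted on the join field. The sample files above happen to be in matching order, but for real data you may want something like this sketch (bash process substitution; note the userid header line will sort in among the data rows):
join -t, <(sort file1.csv) <(sort file2.csv)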

Counting the number of names in a category in a .csv with bash

I would like to count the number of students in a .csv file depending on the category
Category 1 is the name, Category 2 is the country, Category 3 is the city
The .csv file looks like this:
michael_s;jpa;NYC
john_d;chn;TXS
jim_h;usa;POP
I have tried this in my .sh script, but it didn't work:
sort -k3 -t; students.csv
edit:
I am trying to make a bash script that counts students by city, and that can also handle a single city passed to the script, for example:
cat students.csv | ./script.sh NYC
The terminal will only display the students from NYC
If I've understood you correctly, something like this?
cut -d";" -f3 mike.txt | sort | uniq -c
(Sorry, incorrect solution first time - updated now)
To count only one city:
cut -d";" -f3 mike.txt | grep "NYC" | wc -l
Depending on the size of the file, how often you'll be doing this etc. it may be sensible to look at other solutions, eg. awk. But this solution will work just fine.
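If you do want counts for every city at once, a one-pass awk version might look like this (a sketch; the order of the output lines is not guaranteed):
awk -F';' '{count[$3]++} END {for (city in count) print city, count[city]}' students.csv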
The reason for the error message "sort: multi-character tab 'students.csv'" is that you haven't given the -t option a separator character. If you add a semicolon after -t, the sort will work as expected:
sort -k3 -t';' students.csv
There is always awk:
$ awk -F\; 'a[$1]++==0{c++}END{print c}' file
3
Once you describe your requirements more thoroughly (you say you want to count the names, but you sort on -k3; please update the OP), we can help you better.
Edited to match your update:
$ awk -F\; -v col=3 -v val=NYC '
(length(val) && $col==val) || length(val)==0 && a[$col]++==0 {
c++
}
END { print c }
' file
1
If you set -v val= to the value you are looking for and -v col= to the column number, it counts the occurrences of val in col. If you set col but not val, it counts the distinct values in col.
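If you then want this packaged as the script.sh invocation from the question (cat students.csv | ./script.sh NYC), a minimal sketch might look like the following; the exact behaviour with and without an argument is my assumption:
#!/bin/bash
# Sketch only: reads the semicolon-separated student list on stdin.
# With a city argument, prints the students from that city (pipe to wc -l to count them);
# without one, prints a count per city.
city="$1"
if [ -n "$city" ]; then
    awk -F';' -v c="$city" '$3 == c'
else
    awk -F';' '{n[$3]++} END {for (k in n) print k, n[k]}'
fi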

Parsing a key value in a csv file using shell script

Given csv input file
Id Name Address Phone
---------------------
100 Abc NewYork 1234567890
101 Def San Antonio 9876543210
102 ghi Chicago 7412589630
103 GHJ Los Angeles 7896541259
How do we grep/command for the value using the key?
If the key is 100, the expected output is NewYork.
You can try this:
grep 100 filename.csv | cut -d, -f3
Output:
NewYork
This will search the whole file for the value 100, and return all the values in the 3rd column of the matching rows.
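Note that grep 100 will also match 100 inside the phone-number column. If the file really is comma-separated, anchoring the key to the start of the line is safer; a sketch:
grep '^100,' filename.csv | cut -d, -f3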
With GNU grep:
grep -Po '^100.....\K...........' file
or shorter:
grep -Po '^100.{5}\K.{11}' file
Output:
NewYork
Awk splits lines by whitespace sequences (by default).
You could use that to write a condition on the first column.
Your example input looks like it is not CSV but columns of fixed width (except the header). If that's the case, then you can extract the name of the city as a substring:
awk '$1 == 100 { print substr($0, 9, 11); }' input.csv
Here 9 is the starting position of the city column, and 11 is its length.
If, on the other hand, your input file is not what you pasted but really CSV (comma-separated values), and there are no embedded commas or newline characters in the input, then you can write it like this:
awk -F, '$1 == 100 { print $3 }' input.csv
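To avoid hard-coding 100, the key can be passed in as an awk variable; a sketch, assuming the comma-separated layout:
key=100
awk -F, -v k="$key" '$1 == k { print $3 }' input.csv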

Taking x number of last columns from a file in awk

I have a txt file with columns such as name, position, department, etc. I need to get certain characters from each column to generate a username and password for each listing.
I'm approaching this problem by first separating the columns with:
awk 'NR>10{print $1, $2, $3, ...etc}' $filename
#This skips the header information and simply prints out each specified column.
And then moving on to grab each character that I need from their respective fields.
The problem now is that each row is not uniform. For example the listings are organized as such:
(lastName, firstName, middleInitial, position, departmentName)
Some of the entries in the text file do not have certain fields such as middle initial. So when I try to list fields 3, 4, and 5 the entries without their middle initials return:
(position, departmentName, empty/null (I'm not sure how awk handles this))
The good news is that I do not need the middle initial, so I can ignore these listings. How can I go about grabbing the last 2 (in this example) columns from the file so I can isolate the fields that every entry has, in order to cut the necessary characters out of them?
You can get them with $(NF-1) and $NF; they are the second-to-last column and the last column.
echo "1,2,3,4" | awk -F, 'BEGIN{OFS=","}{print $(NF-1), $NF}'
NF means the number of fields. If you have 4 fields, NF-1 would be 3, and $(NF-1) is the value of column 3.
Output would be
3,4
Another example with different length fields in file:
sample.csv
1,2,3,4,5
a,b,c,d
Run
awk -F, 'BEGIN{OFS=","}{print $(NF-1), $NF}' sample.csv
Output:
4,5
c,d
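If you ever need more than the last two columns, you can loop over the trailing fields; a sketch assuming comma-separated input and that every line has at least n fields:
awk -F, -v n=2 'BEGIN{OFS=","} {
    out = $(NF-n+1)                          # first of the last n fields
    for (i = NF-n+2; i <= NF; i++)
        out = out OFS $i                     # append the remaining trailing fields
    print out
}' sample.csv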

Extracting field from last row of given table using sed

I would like to write a bash script to extract a field in the last row of a table. I will illustrate by example. I have a text file containing tables with space delimited fields like ...
Table 1 (foobar)
num flag name comments
1 ON Frank this guy is frank
2 OFF Sarah she is tall
3 ON Ahmed who knows him
Table 2 (foobar)
num flag name comments
1 ON Mike he is short
2 OFF Ahmed his name is listed twice
I want to extract the first field in the last row of Table1, which is 3. Ideally I would like to be able to use any given table's title to do this. There are guaranteed carriage returns between each table. What would be the best way to accomplish this, preferably using sed and grep?
Awk is perfect for this. This prints the first field from the last row of each table:
$ awk '!$1{print a}{a=$1}END{print a}' file
3
2
Just from the first record:
$ awk '!$1{print a;exit}{a=$1}' file
3
Edit:
For a given table title:
$ awk -v t="Table 1" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
3
$ awk -v t="Table 2" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
2
This sed line seems to work for your sample.
table='Table 2'
sed -n "/$table"'/{n;n;:next;h;n;/^$/b last;$b last;b next;:last;g;s/^\s*\(\S*\).*/\1/p;}' file
Explanation: When we find a line matching the table name in $table, we skip that line, and the next (the field labels). Starting at :next we push the current line into the hold space, get the next line and see if it is blank or the end of the file, if not we go back to :next, push the current line into hold and get another. If it is blank or EOF, we skip to :last, pull the hold space (the last line of the table) into pattern space, chop out all but the first field and print it.
Just read each block as a record with each line as a field and then print the first sub-field of the last field of whichever record you care about:
$ awk -v RS= -F'\n' '/^Table 1/{split($NF,a," "); print a[1]}' file
3
$ awk -v RS= -F'\n' '/^Table 2/{split($NF,a," "); print a[1]}' file
2
A better tool for that is awk!
Here is a more legible version:
awk '{
    if (NR == 1) {              # remember the first line and move on
        row = $0;
        next;
    }
    if ($0 == "") {             # blank line: the previous line was the last row of a table
        if (row != "") {
            $0 = row;
            print $1;           # print its first field
            row = "";           # reset so repeated blanks do not reprint it
        }
    } else {
        row = $0;               # keep tracking the most recent non-blank line
    }
} END {
    if (row != "") {            # file ended without a blank line after the last table
        $0 = row;
        print $1;
    }
}' input.txt
