cat from file.csv with grep data - bash

I have data in file.csv:
(...)
0000046;0000046;04688;29;1;52.1683;20.5567
0000046;0000046;04688;2A;1;52.1818;20.5639
0000046;0000046;04688;3;1;52.1785;20.5629
0000046;0000046;04688;4;1;52.1815;20.5638
0000046;0000046;04688;5;;52.1779;20.5635
0000046;0000046;04688;6;1;52.1813;20.5636
0000046;0000046;04688;7;;52.1777;20.5634
0000046;0000046;04688;8;;52.1810;20.5635
0000046;0000046;04688;9;1;52.1775;20.5631
0000046;0000046;05027;2;;52.1908;20.5660
0000046;0000046;05027;4;1;52.1907;20.5649
0000046;0000046;05527;1;1;52.1824;20.5636
(...)
I need to extract lines where the third field matches a given value. I tried
cat file.csv |grep 05027
Unfortunately, this matches any line containing 05027 anywhere. How can I restrict the match to the third field only?

First of all, you don't need cat for grep; you can just run grep pattern file.
awk is the easier tool for handling column-based data.
What you can try is:
awk -F';' '$3=="05027"' file
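If you need to reuse this with different values, a small variation (the variable name want is just an illustration) passes the value in with -v instead of hard-coding it:
awk -F';' -v want="05027" '$3 == want' file.csv
Because the default action for a true condition is to print the whole record, no explicit print is needed.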

Related

Using the result of grep to sort files

I need to sort a file based on the results of grep. Example:
cat cuts.txt | grep -P '(?<=[+]).*(?=[+])'
text +124+ text
text +034+ text
text +334+ text
How do I sort the lines in ascending order based on what grep found?
Could you please try the following, written and tested with the shown samples. It assumes you want to sort by the increasing values of the 2nd field (the +digits+ token); since the OP mentioned that the +digits+ value could appear anywhere in the line, this is a generic solution.
grep -P '(?<=[+]).*(?=[+])' Input_file |
awk '
  match($0, /\+[0-9]+\+/){
    print substr($0, RSTART, RLENGTH), $0
  }
' | sort -k1.2 | cut -d' ' -f2-
Output will be as follows.
text +034+ text
text +124+ text
text +334+ text
Logical explanation: the output of the grep command is piped to awk, which uses a regex to find the +digits+ value in each line and prints that matched value followed by the whole line. This makes sorting easy, because sort now always operates on the 1st field. After sorting on the first field, cut keeps everything from the 2nd field onwards, because the 1st field was only added by awk to make sorting easier and is not wanted in the actual output.
Also note that we do not need a separate cat command here; grep can read Input_file directly.
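If every line contains exactly one +digits+ token and no other + characters, a shorter sketch is to let sort itself split on the + sign (assuming the cuts.txt file from the question):
sort -t'+' -k2,2n cuts.txt
Here -t'+' makes + the field separator, so the digits become field 2, and -k2,2n sorts that field numerically.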

Can I use grep to extract a single column of a CSV file?

I'm trying to solve a problem I have to finish as soon as possible.
I have a CSV file with fields separated by ;.
I'm asked to write a shell command using grep that lists only the third column, using a regex. I can't use cut; it is an exercise.
My file is like this:
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
2;Wayne;Watkins;22;Lanme Place;Cotoiwi;NC;86578
3;Danny;Vega;25;Fofci Center;Momahbih;MS;21027
4;Larry;Robinson;23;Bammek Boulevard;Gaizatoh;NE;27517
5;Myrtie;Black;20;Savon Square;Gokubpat;PA;92219
6;Nellie;Greene;23;Utebu Plaza;Rotvezri;VA;17526
7;Clyde;Reynolds;19;Lupow Ridge;Kedkuha;WI;29749
8;Calvin;Reyes;47;Paad Loop;Beejdij;KS;29247
9;Douglas;Graves;43;Gouk Square;Sekolim;NY;13226
10;Josephine;Estrada;48;Ocgig Pike;Beheho;WI;87305
11;Eugene;Matthews;26;Daew Drive;Riftemij;ME;93302
12;Stanley;Tucker;54;Cure View;Woocabu;OH;45475
13;Lina;Holloway;41;Sajric River;Furutwe;ME;62184
14;Hettie;Carlson;57;Zuheho Pike;Gokrobo;PA;89098
15;Maud;Phelps;57;Lafni Drive;Gokemu;MD;87066
16;Della;Roberson;53;Zafe Glen;Celoshuv;WV;56749
17;Cory;Roberson;56;Riltav Manor;Uwsupep;LA;07983
18;Stella;Hayes;30;Omki Square;Figjitu;GA;35813
19;Robert;Griffin;22;Kiroc Road;Wiregu;OH;39594
20;Clyde;Reynolds;19;Lupow Ridge;Kedkuha;WI;29749
21;Calvin;Reyes;47;Paad Loop;Beejdij;KS;29247
22;Douglas;Graves;43;Gouk Square;Sekolim;NY;13226
23;Josephine;Estrada;48;Ocgig Pike;Beheho;WI;87305
24;Eugene;Matthews;26;Daew Drive;Riftemij;ME;93302
I think I should use something like: cat < test.csv | grep 'regex'.
Thanks.
Right Tools For The Job: Using awk or cut
Assuming you want to match the third column against a specific value:
awk -F';' '$3 ~ /Foo/ { print $0 }' file.txt
...will print any line where the third field contains Foo. (Changing print $0 to print $3 would print only that third field).
If you just want to print the third column regardless, use cut: cut -d';' -f3 <file.txt
Wrong Tool For The Job: Using GNU grep
On a system where grep has the -o option, you can chain two instances together -- one to trim everything after the fourth column (and remove lines with fewer than four columns), another to take only the last remaining column (thus, the fourth):
str='foo;bar;baz;qux;meh;whatever'
grep -Eo '^[^;]*[;][^;]*[;][^;]*[;][^;]*' <<<"$str" \
| grep -Eo '[^;]+$'
To explain how that works:
^, outside of square brackets, matches only at the beginning of a line.
[^;]* matches any character except ; zero-or-more times.
[;] matches only the character ;.
...thus, each [^;]*[;] in the regex matches a single field plus its separator, whether or not that field contains text. Three of those followed by a final [^;]* match the first four fields, and grep -o tells grep to emit only the content it was successfully able to match.
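Applied to the third column of the test.csv in the question, the same idea needs only three field groups in the first stage (a sketch, assuming every line has at least three fields):
grep -Eo '^[^;]*[;][^;]*[;][^;]*' test.csv | grep -Eo '[^;]+$'
The first grep keeps fields one through three; the second keeps only whatever follows the last remaining ;, i.e. the third field.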
If you just need the 3rd field and it's always properly delimited with ';' why not use 'cut'?
cut -d';' -f3 <filename>
UPDATED:
The OP wasn't clear; maybe they only want to look at the 3rd line?
head -3 <filename> | tail -1
Or maybe they just want a list of the values that appear in the 3rd field?
It's not clear what the intended use of 'grep' would be.
cut -d';' -f3 <filename> | sort -u
As the other answers have said, using grep is a bad/unfortunate idea.
The only way I can think of using grep is to pull out a specific row where the 3rd column == some value. E.g.,
grep '^\([^;]*;\)\{2\}Bell;' test.txt
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
Or if the first column is the index (not counting it as a column):
grep '^\([^;]*;\)\{3\}39;' test.txt
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
Even using grep in this case leads to a pretty ugly solution.
Edit: Didn't see Charles Duffy's answer... that's pretty clever.
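For completeness, and not part of either answer above: if your grep is GNU grep built with PCRE support (-P), there is also a single-grep sketch that prints just the third column by discarding the first two fields with \K (note that -P is not available in every grep build):
grep -oP '^([^;]*;){2}\K[^;]*' test.csv
\K resets the start of the reported match, so -o emits only the third field of each line.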

Bash extract parts from string and create csv

I have a document with over a million of the following strings, and I would like to create some new structures by extracting some parts of each one and writing them to a CSV file. What's the quickest way to do this?
document/0006-291X(85)91157-X
I would like to have a file with, on each line, the original string followed by the extracted parts:
document/0006-291X(85)91157-X;0006-291X;85
You can try this awk one-liner:
awk -F "[/()]" -v OFS=';' '{print $0,$(NF-2),$(NF-1)}' your-file
It splits each line into fields, taking /, ( and ) as delimiters. Then it prints the whole line, the third field from the end, and the second field from the end. The option -v OFS=';' sets a semicolon as the output field separator.
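An equivalent sed sketch, assuming every line has the same document/prefix(digits)suffix shape as the example:
sed -E 's|^(.*/([^(]+)\(([0-9]+)\).*)$|\1;\2;\3|' your-file
Group 1 captures the whole original string, group 2 the part between the / and the (, and group 3 the digits inside the parentheses; either this or the awk command can be redirected into a file with > output.csv.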

Bash script compare values from 2 files and print output values from one file

I have two files like this:
File1
114.4.21.198,cl_id=1J3W7P7H0S3L6g85900g736h6_101ps
114.4.21.205,cl_id=1O3M7A7Q0S3C6h85902g7b3h7_101pf
114.4.21.205,cl_id=1W3C7Z7W0U3J6795197g177j9_117p1
114.4.21.213,cl_id=1I3A7J7N0M3W6e950i7g2g2i0_1020h
File2
cl_id=1B3O7M6C8T4O1b559i2g930m0_1165d
cl_id=1X3J7M6J0W5S9535180h90302_101p5
cl_id=1G3D7X6V6A7R81356e3g527m9_101nl
cl_id=1L3J7R7O0F0L74954h2g495h8_117qk
cl_id=1L3J7R7O0F0L74954h2g495h8_117qk
cl_id=1J3W7P7H0S3L6g85900g736h6_101ps
cl_id=1W3C7Z7W0U3J6795197g177j9_117p1
cl_id=1I3A7J7N0M3W6e950i7g2g2i0_1020h
cl_id=1Q3Y7Q7J0M3E62953e5g3g5k0_117p6
I want to find the cl_id values that exist in file1 but do not exist in file2, and print the first value from the matching file1 lines (the IP address).
It should look like this:
114.4.21.198
114.4.21.205
114.4.21.205
114.4.21.213
114.4.23.70
114.4.21.201
114.4.21.211
120.172.168.36
I have tried awk, grep, diff and comm, but nothing comes close. Please tell me the correct command to do this.
Thanks.
One proper way to do that is this:
grep -vFf file2 file1 | sed 's|,cl_id.*$||'
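If you prefer cut over sed for the trimming step, this sketch should be equivalent, assuming the IP address is always the first comma-separated field:
grep -vFf file2 file1 | cut -d, -f1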
I do not see how you get your output. Where does 120.172.168.36 come from?
Here is one awk solution to compare the two files:
awk -F, 'NR==FNR {a[$0]++; next} !a[$2] {print $1}' file2 file1
114.4.21.205
Feed both files into awk or perl with the field separator set to ','. If a line has two fields, add them to a dictionary/map ("file1Lines"). If it has just one field (this is file2), add it to a set ("file2Lines"). After reading all input:
Loop over file1Lines. For each element, check whether the key part is present in file2Lines. If not, print the value part.
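A minimal awk sketch of that description, assuming both files are passed on one command line and are distinguished by their field count (the array names ip and seen are just illustrations):
awk -F, '
  NF == 2 { ip[$2] = $1; next }   # file1 line: remember the IP, keyed by cl_id
  NF == 1 { seen[$1] = 1 }        # file2 line: record that this cl_id exists
  END {
    for (id in ip)
      if (!(id in seen))
        print ip[id]              # cl_id from file1 that never appeared in file2
  }
' file1 file2
Note that duplicate cl_id values in file1 collapse to a single entry here, and the output order follows awk's array iteration order rather than the original file order.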
This seems like what you want to do, and it should work efficiently:
grep -Ff file2.txt file1.txt | cut -f1 -d,
First the grep takes the lines from file2.txt to use as patterns, and finds the matching lines in file1.txt. The -F is to use the patterns as literal strings rather than regular expressions, though it doesn't really matter with your sample.
Finally the cut takes the first column from the output, using , as the column delimiter, resulting in a list of IP addresses.
The output is not exactly the same as your sample, but the sample didn't make sense anyway, as it contains text that was not in any of the input files. Not sure if this is what you wanted or something more.

Save content between delimiters and compare

I'm having a lot of difficulty extracting the content delimited between '[ ]' and comparing it in a shell script. After that I need to erase the other [] fields.
I receive files whose filenames contain several [xxxx] patterns; one of them is useful, and it is what I use to classify them.
One example:
Input: sample[t.225][lb.445][21042013][0913605].extension
Output (pattern lb.445): sample[lb.445].extension
I know I can use grep with the pattern, but after that I don't know how to erase the other fields in the filename. I don't think grep is the best strategy here, and looping over the fields for the pattern comparison feels really awkward in a shell script.
awk may help in this case:
awk -F'[][]' '{print $1"["$4"]"$NF}' input
The above line keeps the prefix, the text from the 2nd [...] block, and the extension.
Taking the example from your question:
kent$ echo "sample[t.225][lb.445][21042013][0913605].extension"|awk -F'[][]' '{print $1"["$4"]"$NF}'
sample[lb.445].extension
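If the goal is to actually rename the files, a loop can reuse the same awk on each filename (a sketch; the *.extension glob and the echo safety net are just illustrations, adjust them to your real files):
for f in *.extension; do
  new=$(printf '%s\n' "$f" | awk -F'[][]' '{print $1"["$4"]"$NF}')
  echo mv -- "$f" "$new"    # drop the echo once the new names look right
done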
