Using the result of grep to sort files - sorting

I need to sort a file based on the results of grep. Example:
cat cuts.txt | grep -P '(?<=[+]).*(?=[+])'
text +124+ text
text +034+ text
text +334+ text
How do I sort the lines in ascending order based on what grep found?

Could you please try the following, written and tested with the shown samples. This assumes you need to sort by the 2nd field's increasing values. Since the OP mentioned the +digits+ values could appear anywhere in the line, here is a generic solution.
grep -P '(?<=[+]).*(?=[+])' Input_file |
awk '
match($0,/\+[0-9]+\+/){
  print substr($0,RSTART,RLENGTH),$0   # prepend the matched +digits+ value as a sort key
}
' | sort -k1.2 | cut -d' ' -f2-
Output will be as follows.
text +034+ text
text +124+ text
text +334+ text
Logical explanation: the grep output is piped to awk, which uses match() to find the first +digits+ value in each line and prints that matched value followed by the whole line. This makes the sort easy, because we now always sort on the 1st field (sort -k1.2 starts the comparison at that field's 2nd character, skipping the leading +). Once the lines are sorted on the first field, cut keeps everything from the 2nd field onwards, because the 1st field is an extra field added by awk purely to make the sort easier and is not needed in the actual output.
Also, we do NOT need a separate cat command here; grep can read Input_file directly.
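For this particular input, where the digits sit between the first pair of + signs, a shorter pipeline is possible by treating + itself as the field separator; a sketch, assuming the sample layout shown above:
# sort numerically on the 2nd +-delimited field, i.e. the digits between the + signs
grep -P '(?<=[+]).*(?=[+])' cuts.txt | sort -t'+' -k2,2n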


Passing parameter as control number and get table name

I have a scenario where there is a file with a control number and a table name; here is an example:
1145|report_product|N|N|
1156|property_report|N|N
I need to pass the control number 1156 and get the table name abbreviated as PR; once I have PR, I need to add some text to that line.
Please help
Assuming the control file is:
# cat controlfile.txt
1145|report_product|N|N
1156|property_report|N|N
To find a line you can use:
grep 1156 controlfile.txt
If needed you can save it to a variable: result=$(grep 1156 controlfile.txt)
Assuming you need to append something to this line, you can use:
sed '/^1156/s/$/ 123/' controlfile.txt
This example will add "123" at the end of each line that starts with 1156.
If needed, add more details like what output you want or anything else to help us better understand your need.
You need to work in two stages:
You need to find the line containing 1156.
You need to get the information from that line.
In order to find the line (as already indicated by Juranir), you can use grep:
Prompt> grep "1156" control.txt
1156|property_report|N|N
In order to get the information from that line, you need to get the second column, based on the vertical line (often referred to as a "pipe" character), for which there are different approaches. I'll give you two:
The cut approach: you can cut a line into different parts and take a character, a byte, a column, .... In this case, this is what you need:
grep "1156" control.txt | cut -d '|' -f 2
-d '|' : use the vertical line as a column separator
-f 2 : show the second field (column)
The awk approach: awk is a general "text modifier" with multiple features (showing parts of text, performing basic calculations, ...). For this case, it can be used as follows:
grep "1156" control.txt | awk -F '|' '{print $2}'
-F '|' : use the vertical line as a column separator
'{print $2}' : the awk script for showing the second field.
Oh, by the way, I've edited your question. You might press the edit button in order to learn how I did this :-)
For getting only the first letters, separated by the underscores:
grep "1156" control.txt | awk -F '|' '{print $2}' | awk -F '_' '{print substr($1,1,1) substr($2,1,1)}'
(something like that)
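If you want the initials in upper case (PR rather than pr) and everything in one pass, a single awk call could do the lookup, the abbreviation, and the printing together; a minimal sketch, where the ctl variable is an assumption for illustration:
awk -F'|' -v ctl=1156 '$1 == ctl {
  split($2, a, "_")                                  # split the table name on "_"
  print toupper(substr(a[1],1,1) substr(a[2],1,1))   # initials -> "PR"
}' controlfile.txt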

sed/Awk/cut... How to decide which to use to parse Docker output?

My output:
docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
jenkins/jenkins     lts                 806f56c84444        8 days ago          703MB
mongo               latest              0da05d84b1fe        2 weeks ago         394MB
I would like to just cut the image ID alone from the output.
I tried using cut:
docker images | cut -d " " -f1
REPOSITORY
jenkins/jenkins
The -f1 just gives me the repository names; if I use -f3 it tends to be empty. Since the delimiter is not a single space, I don't see how to get the desired output.
Can we cut based on field names?
I read the documentation and did not see anything relevant. I also saw that there is a way to achieve this using sed/AWK which i'm still figuring out.
In the meantime, is there an easier way to achieve this using the cut command?
I'm new to Unix/Linux; how can I determine which of Sed/AWK/Cut to prefer?
Your input seems to have a fixed width of 20 chars for each field, so you can make use of gawk's FIELDWIDTHS feature.
$ awk -v FIELDWIDTHS="20 20 20 20 20" '{ print $3 }' file
IMAGE ID
806f56c84444
0da05d84b1fe
$
$ awk -v FIELDWIDTHS="20 20 20 20 20" '{ printf "%20s%20s\n", $1, $3 }' file
REPOSITORY          IMAGE ID
jenkins/jenkins     806f56c84444
mongo               0da05d84b1fe
From man gawk:
If the FIELDWIDTHS variable is set to a space-separated list of numbers, each field is expected to have fixed width, and gawk splits up the record using the specified widths. Each field width may optionally be preceded by a colon-separated value specifying the number of characters to skip before the field starts. The value of FS is ignored. Assigning a new value to FS or FPAT overrides the use of FIELDWIDTHS.
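As a small illustration of the colon-skip syntax mentioned above (a sketch; the skip prefix needs a reasonably recent gawk, 4.2 or later):
# skip the first 40 characters, then read one 20-character field (the IMAGE ID column)
docker images | awk -v FIELDWIDTHS="40:20" '{ print $1 }'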
You have to "squeeze" the space padding in the default output down to a single space.
Consider the line "1  2" (1-space-space-2): field 1 is before the 1st space, field 2 is the empty string between the 1st and 2nd spaces, and field 3 is after the 2nd space.
cut -d' ' -f1 ==> '1'
cut -d' ' -f2 ==> '' empty field between 1st and 2nd delimiter
cut -d' ' -f3 ==> '2'
So, in your case, use sed to replace each run of consecutive spaces with a single one:
docker images | sed 's/  */ /g' | cut -d " " -f1,3
If the output has fixed column widths, then you can use this variant of cut:
docker images | cut -c1-20,41-60
This will keep characters 1 to 20 (the REPOSITORY column) and 41 to 60, where we find the Image ID.
If ever the output uses TAB for padding, you should use expand -t n to make the output consistently space padded then apply the appropriate cut -cx,y, e.g. (numbers may need adjusting):
docker images | expand -t 4 | cut -c1-20,41-60
Try this:
docker images | tr -s ' ' | cut -f3 -d' '
The command tr -s ' ' squeezes multiple spaces into a single one, and afterwards cut can grab your field. This works fine as long as the values in the preceding fields don't contain spaces.
With Procedural Text Edit it's:
forEach line {
    if (contains ci "REPOSITORY") { remove }
    keepRange word 2 1
}
removeEmptyLines // <- optional
In the general case, avoid parsing output meant for human consumption. Many modern utilities offer an option to produce output in some standard format like JSON or XML, or even CSV (though that is less strictly specified, and exists in multiple "dialects").
docker in particular has a generalized --format option which allows you to specify your own output format:
docker images --format "{{.ID}}"
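The --format option also understands a table directive (documented in Docker's CLI formatting guide) when you want headers and more than one column:
docker images --format "table {{.Repository}}\t{{.ID}}"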
If you cannot avoid writing your own parser (are you really sure!? Look again!), cut is suitable for output with a specific single-character delimiter, or otherwise fairly regular output. For everything else, I would go with Awk. Out of the box, it parses columns from sequences of whitespace, so it does precisely what you specifically ask for:
docker images | awk 'NR>1 { print $3 }'
(NR>1 skips the first line, which contains the column headers.)
In the case of fixed-width columns, it allows you to pull out a string by index:
docker images | awk 'NR>1 { print substr($0, 41, 12) }'
... though you could do that with cut, too:
docker images | cut -c41-53
... but notice that Docker might adjust column widths depending on your screen size!
Awk lets you write regular expression extractions, too:
awk 'NR>1 { sub(/^([^[:space:]]*[[:space:]]+){2}/, ""); sub(/[[:space:]].*/, ""); print }'
This is where it overlaps with sed:
sed -n '2,$s/^[^ ]\+[ ]\+[^ ]\+[ ]\+\([^ ]\+\)[ ].*/\1/p'
though sed is significantly less human-readable, especially for nontrivial scripts. (This is still pretty trivial.)
If you haven't used regex before, the above will seem cryptic, but it really isn't very hard to pick apart. We are looking for sequences of non-spaces (a field in a column) followed by sequences of spaces (a column separator) - two before the ID field and whatever comes after it, starting from the first space after the ID column.
If you want to learn shell scripting, you should probably also learn at least the basics of Awk (and a passing familiarity with sed). If you just want to get the job done, and perhaps aren't specifically interested in learning U*x tools (though you probably should be anyway!), perhaps instead learn a modern scripting language like Python or Ruby.
... Here's a Python docker library:
import docker
client = docker.from_env()
for image in client.images.list():
    print(image.id)
Can we cut based on field names? No.
How can I determine which of Sed/AWK/Cut to prefer? YMMV. For this particular input, where fields are separated by two or more spaces, using awk you could set the field separator to "  +" (two or more spaces), look for the desired field name (IMAGE ID below) and print only that particular field:
$ awk -F"  +" '                    # set field separator
{
    if(f=="")                      # while we have not determined the desired field
        for(i=1;i<=NF;i++)         # ... keep looking
            if($i=="IMAGE ID")
                f=i
    if(f!="")                      # once found
        print $f                   # start printing it
}' file
Output:
IMAGE ID
806f56c84444
0da05d84b1fe
As one-liner:
$ awk -F"  +" '{if(f=="")for(i=1;i<=NF;i++)if($i=="IMAGE ID")f=i;if(f!="")print $f}' file

Can I use grep to extract a single column of a CSV file?

I'm trying to solve a problem I have to do as soon as possible.
I have a csv file, fields separated by ;.
I'm asked to make a shell command using grep to list only the third column, using regex. I can't use cut. It is an exercise.
My file is like this:
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
2;Wayne;Watkins;22;Lanme Place;Cotoiwi;NC;86578
3;Danny;Vega;25;Fofci Center;Momahbih;MS;21027
4;Larry;Robinson;23;Bammek Boulevard;Gaizatoh;NE;27517
5;Myrtie;Black;20;Savon Square;Gokubpat;PA;92219
6;Nellie;Greene;23;Utebu Plaza;Rotvezri;VA;17526
7;Clyde;Reynolds;19;Lupow Ridge;Kedkuha;WI;29749
8;Calvin;Reyes;47;Paad Loop;Beejdij;KS;29247
9;Douglas;Graves;43;Gouk Square;Sekolim;NY;13226
10;Josephine;Estrada;48;Ocgig Pike;Beheho;WI;87305
11;Eugene;Matthews;26;Daew Drive;Riftemij;ME;93302
12;Stanley;Tucker;54;Cure View;Woocabu;OH;45475
13;Lina;Holloway;41;Sajric River;Furutwe;ME;62184
14;Hettie;Carlson;57;Zuheho Pike;Gokrobo;PA;89098
15;Maud;Phelps;57;Lafni Drive;Gokemu;MD;87066
16;Della;Roberson;53;Zafe Glen;Celoshuv;WV;56749
17;Cory;Roberson;56;Riltav Manor;Uwsupep;LA;07983
18;Stella;Hayes;30;Omki Square;Figjitu;GA;35813
19;Robert;Griffin;22;Kiroc Road;Wiregu;OH;39594
20;Clyde;Reynolds;19;Lupow Ridge;Kedkuha;WI;29749
21;Calvin;Reyes;47;Paad Loop;Beejdij;KS;29247
22;Douglas;Graves;43;Gouk Square;Sekolim;NY;13226
23;Josephine;Estrada;48;Ocgig Pike;Beheho;WI;87305
24;Eugene;Matthews;26;Daew Drive;Riftemij;ME;93302
I think I should use something like: cat < test.csv | grep 'regex'.
Thanks.
Right Tools For The Job: Using awk or cut
Assuming you want to match the third column against a specific field:
awk -F';' '$3 ~ /Foo/ { print $0 }' file.txt
...will print any line where the third field contains Foo. (Changing print $0 to print $3 would print only that third field).
If you just want to print the third column regardless, use cut: cut -d';' -f3 <file.txt
Wrong Tool For The Job: Using GNU grep
On a system where grep has the -o option, you can chain two instances together -- one to trim everything after the fourth column (and remove lines with less than four columns), another to take only the last remaining column (thus, the fourth):
str='foo;bar;baz;qux;meh;whatever'
grep -Eo '^[^;]*[;][^;]*[;][^;]*[;][^;]*' <<<"$str" \
| grep -Eo '[^;]+$'
To explain how that works:
^, outside of square brackets, matches only at the beginning of a line.
[^;]* matches any character except ; zero-or-more times.
[;] matches only the character ;.
...thus, each [^;]*[;] in the regex matches a single field, whether or not that field contains text. Putting four field-matches in the first stage (the last one without its trailing [;]) means we're matching only whole fields, and grep -o tells grep to emit only the content it was successfully able to match. (To grab the third column rather than the fourth, drop one of them: grep -Eo '^[^;]*[;][^;]*[;][^;]*' in the first stage, and the second stage then yields the third field.)
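For the record, if the exercise permits GNU grep with PCRE support, a single grep can also extract the third column by using \K to discard the first two fields from the reported match (a sketch; \K requires GNU grep's -P mode):
# match two fields plus separators, reset the match with \K, keep only the third field
grep -oP '^([^;]*;){2}\K[^;]*' test.csv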
If you just need the 3rd field and it's always properly delimited with ';' why not use 'cut'?
cut -d';' -f3 <filename>
UPDATED:
The OP wasn't clear; maybe they only want to look at the 3rd line?
head -3 <filename> | tail -1
Or maybe they just want a list of the things that appear in the 3rd field?
It's not clear what the intended use of 'grep' would be.
cut -d';' -f3 <filename> | sort -u
As the other answers have said, using grep is a bad/unfortunate idea.
The only way I can think of using grep is to pull out a specific row where the 3rd column == some value. E.g.,
grep '^\([^;]*;\)\{2\}Bell;' test.txt
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
Or if the first column is the index (not counting it as a column):
grep '^\([^;]*;\)\{3\}39;' test.txt
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
Even using grep in this case leads to a pretty ugly solution.
Edit: Didn't see Charles Duffy's answer... that's pretty clever.

Searching the first field and getting the output of all records where the first field is the same.

I have a pretty big text file. This file contains words and a number of definitions given for each word. There are 60 words, and the set of 60 is repeated 17 times. The words are always in the first field, with the definitions in the fields that follow.
Example:
hand;extremity of the body;that which is commonly used to write with
paper;thin sheet made of wood pulp;material used to write things on;some other def's
book;collection of pages on a topic;publication of knowledge;concatenated paper with text
ham;that which comes from pork;a tasty meat;a type of food
anotherword;defs;defs;defs;defs
It continues until it reaches the 60th word, then restarts with the same 60 words and different definitions. The order isn't always the same, so the next 60 might be:
book;defs;defs;defs
television;defs;defs;defs
ham;defs;defs;defs;defs;defs
paper;defs;defs
The field separator for this file is ";" and there is an empty record between each pair of records, as shown in the examples.
What I want to do is look at the first field and output the records with the same first field.
Example:
ham;defs;defs;defs;defs;defs
ham;defs;defs;defs
ham;defs;defs;defs;defs
ham;defs;defs;
ham;defs;defs;defs
ham;defs;
ham;defs;defs
ham;defs;defs;defs;defs
paper;defs;defs;defs;defs
paper;defs;defs;defs
paper;defs;defs;
and so on.
I apologize if this isn't clear. Please help!
A simple grep and sort pipeline can do that for you... try the following:
Explanation:
# ^$ matches blank lines and -v inverts the match ... so you get all lines which have data
# passing that data to the sort command will sort it...
# sort's -t option sets the delimiter and its -k option picks which column to sort on
grep -v '^$' yourfile.txt | sort -t";" -k1
# And if you also expect duplicate lines, meaning the same line multiple times but needed only once... then pipe to the uniq command as below
grep -v '^$' yourfile.txt | sort -t";" -k1 | uniq
For your sample data I get the output below:
$ grep -v ^$ mysamplefile.txt | sort -t";" -k1 | uniq
anotherword;defs;defs;defs;defs
book;collection of pages on a topic;publication of knowledge;concatenated paper with text
book;defs;defs;defs
ham;defs;defs;defs;defs;defs
ham;that which comes from pork;a tasty meat;a type of food
hand;extremity of the body;that which is commonly used to write with
paper;defs;defs
paper;thin sheet made of wood pulp;material used to write things on;some other def's
television;defs;defs;defs
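If, instead of sorting the whole file, you only want the records for one particular word, awk can filter on the first field directly; a sketch, where "ham" stands in for whichever word you are after:
# print every record whose first ;-separated field is exactly "ham"
awk -F';' '$1 == "ham"' yourfile.txt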

Bash script compare values from 2 files and print output values from one file

I have two files like this:
File1
114.4.21.198,cl_id=1J3W7P7H0S3L6g85900g736h6_101ps
114.4.21.205,cl_id=1O3M7A7Q0S3C6h85902g7b3h7_101pf
114.4.21.205,cl_id=1W3C7Z7W0U3J6795197g177j9_117p1
114.4.21.213,cl_id=1I3A7J7N0M3W6e950i7g2g2i0_1020h
File2
cl_id=1B3O7M6C8T4O1b559i2g930m0_1165d
cl_id=1X3J7M6J0W5S9535180h90302_101p5
cl_id=1G3D7X6V6A7R81356e3g527m9_101nl
cl_id=1L3J7R7O0F0L74954h2g495h8_117qk
cl_id=1L3J7R7O0F0L74954h2g495h8_117qk
cl_id=1J3W7P7H0S3L6g85900g736h6_101ps
cl_id=1W3C7Z7W0U3J6795197g177j9_117p1
cl_id=1I3A7J7N0M3W6e950i7g2g2i0_1020h
cl_id=1Q3Y7Q7J0M3E62953e5g3g5k0_117p6
I want to compare the cl_id values and find those that exist in file1 but not in file2, then print the first value from file1 (the IP address) for those lines.
It should be like this:
114.4.21.198
114.4.21.205
114.4.21.205
114.4.21.213
114.4.23.70
114.4.21.201
114.4.21.211
120.172.168.36
I have tried awk, grep, diff, comm, but nothing comes close. Please tell me the correct command to do this.
Thanks
One proper way to do that is this:
grep -vFf file2 file1 | sed 's|,cl_id.*$||'
I do not see how you get your output. Where does 120.172.168.36 come from?
Here is one solution that compares the cl_id fields (note the membership test uses $2, the cl_id field of file1):
awk -F, 'NR==FNR {a[$0]++;next} !a[$2] {print $1}' file2 file1
114.4.21.205
For the shown samples, only the cl_id on the second line of file1 is absent from file2, so only that IP is printed.
Feed both files into AWK or perl with field separator=",". If there are two fields, add the fields to a dictionary/map/two arrays/whatever ("file1Lines"). If there is just one field (this is file 2), add it to a set/list/array/whatever ("file2Lines"). After reading all input:
Loop over the file1Lines. For each element, check whether the key part is present in file2Lines. If not, print the value part.
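A minimal awk sketch of that idea, assuming the file names file1 and file2 from the question (note that awk's "in" loop does not preserve file1's original line order):
awk -F',' '
  NF == 2 { ip[$2] = $1; next }   # a file1 line: remember the IP under its cl_id key
  { seen[$1] }                    # a file2 line: record the cl_id in a set
  END {
    for (k in ip)                 # after reading all input ...
      if (!(k in seen))           # ... print IPs whose cl_id never appeared in file2
        print ip[k]
  }
' file1 file2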
This seems like what you want to do, and it might work efficiently:
grep -Ff file2.txt file1.txt | cut -f1 -d,
First the grep takes the lines from file2.txt to use as patterns, and finds the matching lines in file1.txt. The -F is to use the patterns as literal strings rather than regular expressions, though it doesn't really matter with your sample.
Finally the cut takes the first column from the output, using , as the column delimiter, resulting in a list of IP addresses.
The output is not exactly the same as your sample, but the sample didn't make sense anyway, as it contains text that was not in any of the input files. Not sure if this is what you wanted or something more.
