Passing a parameter as a control number and getting the table name - bash

I have a scenario where there is a file with a control number and a table name; here is an example:
1145|report_product|N|N
1156|property_report|N|N
I need to pass the control number 1156 and get the table name abbreviated as PR (the initials of property_report). Once I have PR, I need to append some text to that line.
Please help

Assuming the control file is:
# cat controlfile.txt
1145|report_product|N|N
1156|property_report|N|N
To find a specific line you can use:
grep 1156 controlfile.txt
If needed you can save it to a variable: result=$(grep 1156 controlfile.txt)
Assuming you need to append something to this line, you can use:
sed '/^1156/s/$/ 123/' controlfile.txt
This example will add "123" at the end of the line that starts with 1156.
If needed, add more details like what output you want or anything else to help us better understand your need.

You need to work in two stages:
You need to find the line containing 1156.
You need to get the information from that line.
In order to find the line (as already indicated by Juranir), you can use grep:
Prompt> grep "1156" control.txt
1156|property_report|N|N
In order to get the information from that line, you need to get the second column, based on the vertical line (often referred to as a "pipe" character), for which there are different approaches. I'll give you two:
The cut approach: you can cut a line into different parts and take a character, a byte, a field, and so on. In this case, this is what you need:
grep "1156" control.txt | cut -d '|' -f 2
-d '|' : use the vertical line as a column separator
-f 2 : show the second field (column)
The awk approach: awk is a general "text modifier" with multiple features (showing parts of text, performing basic calculations, ...). For this case, it can be used as follows:
grep "1156" control.txt | awk -F '|' '{print $2}'
-F '|' : use the vertical line as a column separator
'{print $2}' : the awk script for showing the second field.
Oh, by the way, I've edited your question. You might press the edit button in order to learn how I did this :-)
For getting only the first letters of the underscore-separated words (uppercased, so you get PR rather than pr):
grep "1156" control.txt | awk -F '|' '{print $2}' | awk -F '_' '{print toupper(substr($1,1,1) substr($2,1,1))}'
(something like that)
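Putting the pieces together, here is a minimal sketch of a script that takes the control number as a parameter (untested; the script name, file name, and appended text are placeholders, and sed -i assumes GNU sed):
#!/bin/bash
# Usage: ./abbrev.sh 1156
ctrl="$1"

# Anchor the pattern to the start of the line to avoid partial matches
line=$(grep "^${ctrl}|" controlfile.txt)

# Second pipe-separated field: the table name (property_report)
table=$(echo "$line" | cut -d '|' -f 2)

# First letters of the underscore-separated words, uppercased: PR
abbr=$(echo "$table" | awk -F '_' '{print toupper(substr($1,1,1) substr($2,1,1))}')

# Append the abbreviation (or any other text) to the matching line in place
sed -i "/^${ctrl}|/s/\$/ ${abbr}/" controlfile.txt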

Related

Using the result of grep to sort files

I need to sort a file based on the results of grep. Example:
cat cuts.txt | grep -P '(?<=[+]).*(?=[+])'
text +124+ text
text +034+ text
text +334+ text
How do I sort the lines in ascending order based on what grep found?
Could you please try the following, written and tested with the shown samples. It assumes you need to sort by the +digits+ values in increasing order. Since the OP mentioned the +digits+ values could be present anywhere in the line, this is a generic solution.
grep -P '(?<=[+]).*(?=[+])' Input_file |
awk '
  match($0,/\+[0-9]+\+/){
    print substr($0,RSTART,RLENGTH), $0
  }
' | sort -k1.2 | cut -d' ' -f2-
Output will be as follows.
text +034+ text
text +124+ text
text +334+ text
Logical explanation: after passing the grep command's output to awk, we use a regex in awk to find the +digits+ value in each line and print that matched value first, followed by the whole line. This makes sorting easy, because sort can now always work on the 1st field (sort -k1.2 starts at the 2nd character of that field, skipping the leading +). Once the lines are sorted on the first field, cut keeps everything from the 2nd field onwards, because the 1st field is an extra field added by awk only to make sorting easier and is not needed in the actual output.
Also, there is no need for a separate cat command here; grep can read Input_file directly.
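As a side note: if every line is guaranteed to contain exactly one +digits+ token, a shorter alternative (a sketch under that assumption, not part of the answer above) is to let sort itself split on + and compare the second field numerically:
sort -t'+' -k2,2n cuts.txt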

sed/Awk/cut... How to decide which to use to parse Docker output?

My output:
docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
jenkins/jenkins     lts                 806f56c84444        8 days ago          703MB
mongo               latest              0da05d84b1fe        2 weeks ago         394MB
I would like to just cut the image ID alone from the output.
I tried using cut:
docker images | cut -d " " -f1
REPOSITORY
jenkins/jenkins
The -f1 just gives me the repository names; if I use -f3 it tends to be empty. Since the delimiter is not a single space, I don't see how to get the desired output.
Can we cut based on field names?
I read the documentation and did not see anything relevant. I also saw that there is a way to achieve this using sed/awk, which I'm still figuring out.
In the meanwhile, is there an easier way to achieve this using the cut command?
I'm new to Unix/Linux; how can I determine which of sed/awk/cut to prefer?
Your input seems to have a fixed width of 20 chars for each field, so you can make use of gawk's FIELDWIDTHS feature.
$ awk -v FIELDWIDTHS="20 20 20 20 20" '{ print $3 }' file
IMAGE ID
806f56c84444
0da05d84b1fe
$
$ awk -v FIELDWIDTHS="20 20 20 20 20" '{ printf "%20s%20s\n", $1, $3 }' file
REPOSITORY          IMAGE ID
jenkins/jenkins     806f56c84444
mongo               0da05d84b1fe
From man gawk:
If the FIELDWIDTHS variable is set to a space-separated list of numbers, each field is expected to have fixed width, and gawk splits up the record using the specified widths. Each field width may optionally be preceded by a colon-separated value specifying the number of characters to skip before the field starts. The value of FS is ignored. Assigning a new value to FS or FPAT overrides the use of FIELDWIDTHS.
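The colon-prefixed skip count can be handy when you only care about a few columns. A tiny illustrative sketch (my own example, not from the answer; requires a reasonably recent gawk):
$ echo 'xxABCD yZZ' | gawk -v FIELDWIDTHS="2:4 1:3" '{ print $1, $2 }'
ABCD yZZ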
You have to "squeeze" the space padding in the default output to single space.
1 2 == 1-space-space-2 == Field 1 before 1st space, Field between 1st and 2nd space, Field 3 after 2nd space.
cut -d' ' -f1 ==> '1'
cut -d' ' -f2 ==> '' empty field between 1st and 2nd delimiter
cut -d' ' -f3 ==> '2'
So, in your case, use sed to squeeze runs of spaces down to a single one:
docker images | sed 's/  */ /g' | cut -d " " -f1,3
If the output has fixed column widths, then you can use this variant of cut:
docker images | cut -c1-20,41-60
This keeps characters 1 to 20 (the repository) and 41 to 60, where we find the image ID.
If the output ever uses TABs for padding, you should use expand -t n to make it consistently space-padded, then apply the appropriate cut -cX,Y, e.g. (numbers may need adjusting):
docker images | expand -t 4 | cut -c1-20,41-60
Try this:
docker images | tr -s ' ' | cut -f3 -d' '
The command tr -s ' ' squeezes multiple spaces into a single one, after which cut can grab your field. This works fine as long as the values in your fields don't contain spaces.
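If you also want to drop the IMAGE ID header line, one option (my addition to this answer) is to append tail:
docker images | tr -s ' ' | cut -f3 -d' ' | tail -n +2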
With Procedural Text Edit it's :
forEach line {
    if (contains ci "REPOSITORY") { remove }
    keepRange word 2 1
}
removeEmptyLines // <- optional
In the general case, avoid parsing output meant for human consumption. Many modern utilities offer an option to produce output in some standard format like JSON or XML, or even CSV (though that is less strictly specified, and exists in multiple "dialects").
docker in particular has a generalized --format option which allows you to specify your own output format:
docker images --format "{{.ID}}"
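The same mechanism covers other columns too, for example (the placeholder names are Docker's documented template fields):
docker images --format "{{.Repository}}:{{.Tag}} {{.ID}}"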
If you cannot avoid writing your own parser (are you really sure!? Look again!), cut is suitable for output with a specific single-character delimiter, or otherwise fairly regular output. For everything else, I would go with Awk. Out of the box, it parses columns from sequences of whitespace, so it does precisely what you specifically ask for:
docker images | awk 'NR>1 { print $3 }'
(NR>1 skips the first line, which contains the column headers.)
In the case of fixed-width columns, it allows you to pull out a string by index:
docker images | awk 'NR>1 { print substr($0, 41, 12) }'
... though you could do that with cut, too:
docker images | cut -c41-52
... but notice that Docker might adjust column widths depending on your screen size!
Awk lets you write regular expression extractions, too:
awk 'NR>1 { sub(/^([^[:space:]]*[[:space:]]+){2}/, ""); sub(/[[:space:]].*/, ""); print }'
This is where it overlaps with sed:
sed -n '2,$s/^[^ ]\+[ ]\+[^ ]\+[ ]\+\([^ ]\+\)[ ].*/\1/p'
though sed is significantly less human-readable, especially for nontrivial scripts. (This is still pretty trivial.)
If you haven't used regex before, the above will seem cryptic, but it really isn't very hard to pick apart. We are looking for sequences of non-spaces (a field in a column) followed by sequences of spaces (a column separator): two of those before the ID field, and then everything that comes after the ID, starting from the first space after the ID column.
If you want to learn shell scripting, you should probably also learn at least the basics of Awk (and a passing familiarity with sed). If you just want to get the job done, and perhaps aren't specifically interested in learning U*x tools (though you probably should be anyway!), perhaps instead learn a modern scripting language like Python or Ruby.
... Here's a Python docker library:
import docker

client = docker.from_env()
for image in client.images.list():
    print(image.id)
Can we cut based on field names? No.
How can I determine which of Sed/AWK/Cut to prefer? YMMV. For this particular input where fields are separated by two or more spaces, using awk you could set the field separator to "  +" (two or more spaces), look for the desired field name (IMAGE ID below) and print only that particular field:
$ awk -F" +" ' # set field separator
{
if(f=="") # while we have not determined the desired field
for(i=1;i<=NF;i++) # ... keep looking
if($i=="IMAGE ID")
f=i
if(f!="") # once found
print $f # start printing it
}' file
Output:
IMAGE ID
806f56c84444
0da05d84b1fe
As a one-liner:
$ awk -F"  +" '{if(f=="")for(i=1;i<=NF;i++)if($i=="IMAGE ID")f=i;if(f!="")print $f}' file

Unix: Find duplicate occurrences in column in csv file, omit one possible value

I am hoping for a line or two of code for a bash script to find and print repeated items in a column in a 2.5G csv file, except for an item that I know is commonly repeated.
The data file has a header, but it is not duplicated, so I'm not worried about code that accounts for the header being present.
Here is an illustration of what the data look like:
header,cat,Everquest,mermaid
1f,2r,7g,8c
xc,7f,66,rp
Kf,87,gH,||
hy,7f,&&,--
rr,2r,89,))
v6,2r,^&,!c
92,#r,hd,m
2r,2r,2r,2r
7f,7f,7f,7f
9,10,11,12
7f,2r,7f,7f
76,#r,88,u|
I am seeking the output:
7f
#r
as both of these are duplicated in column two. As you can see, 2r is also duplicated, but it is commonly duplicated and I know it, so I just want to ignore it.
To be clear, I can't know the values of the duplicates other than the common one, which, in my real data files, is actually the word 'none'. It's '2r' above.
I read here that I can do something like
awk -F, ' ++A[$2] > 1 { print $2; exit 1 } ' input.file
However, I cannot figure out how to skip '2r' nor what ++A means.
I have read the awk manual, but I am afraid I find it a little confusing with respect to the question I am asking.
Additionally,
uniq -d
looks promising based on a few other questions and answers, but I am still unsure how to skip over the value that I want to ignore.
Thank you in advance for your help.
how to skip '2r':
$ awk -F, ' ++a[$2] == 2 && $2 != "2r" { print $2 } ' file
7f
#r
++a[$2] uses the value of the second column as a key into an associative array and increments its count by 1, i.e. it counts how many occurrences of each value exist in the second column. Comparing the count with == 2 (rather than > 1) ensures each duplicated value is printed only once, no matter how many times it repeats.
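If you later need to ignore more than one common value, a natural extension (my sketch, not part of the original answer) is an exclusion array:
$ awk -F, 'BEGIN { skip["2r"]=1; skip["none"]=1 }   # values to ignore
           ++a[$2] == 2 && !($2 in skip) { print $2 }' file
7f
#r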
Get only the second column using cut -d, -f2
sort
uniq -d to get repeated lines
grep -Fv 2r to exclude a value, or grep -Fv -e foo -e bar … to exclude multiple values
In other words something like this:
cut -d, -f2 input.csv | sort | uniq -d | grep -Fv 2r
Depending on the data it might be faster if you move grep earlier in the pipeline, but you should verify that with some benchmarking.
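For example (my sketch; note the added -x so that only fields exactly equal to 2r are excluded, not fields that merely contain it):
cut -d, -f2 input.csv | grep -Fxv 2r | sort | uniq -d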

Can I use grep to extract a single column of a CSV file?

I'm trying to solve a problem, and I have to do it as soon as possible.
I have a csv file, fields separated by ;.
I'm asked to make a shell command using grep to list only the third column, using regex. I can't use cut. It is an exercise.
My file is like this:
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
2;Wayne;Watkins;22;Lanme Place;Cotoiwi;NC;86578
3;Danny;Vega;25;Fofci Center;Momahbih;MS;21027
4;Larry;Robinson;23;Bammek Boulevard;Gaizatoh;NE;27517
5;Myrtie;Black;20;Savon Square;Gokubpat;PA;92219
6;Nellie;Greene;23;Utebu Plaza;Rotvezri;VA;17526
7;Clyde;Reynolds;19;Lupow Ridge;Kedkuha;WI;29749
8;Calvin;Reyes;47;Paad Loop;Beejdij;KS;29247
9;Douglas;Graves;43;Gouk Square;Sekolim;NY;13226
10;Josephine;Estrada;48;Ocgig Pike;Beheho;WI;87305
11;Eugene;Matthews;26;Daew Drive;Riftemij;ME;93302
12;Stanley;Tucker;54;Cure View;Woocabu;OH;45475
13;Lina;Holloway;41;Sajric River;Furutwe;ME;62184
14;Hettie;Carlson;57;Zuheho Pike;Gokrobo;PA;89098
15;Maud;Phelps;57;Lafni Drive;Gokemu;MD;87066
16;Della;Roberson;53;Zafe Glen;Celoshuv;WV;56749
17;Cory;Roberson;56;Riltav Manor;Uwsupep;LA;07983
18;Stella;Hayes;30;Omki Square;Figjitu;GA;35813
19;Robert;Griffin;22;Kiroc Road;Wiregu;OH;39594
20;Clyde;Reynolds;19;Lupow Ridge;Kedkuha;WI;29749
21;Calvin;Reyes;47;Paad Loop;Beejdij;KS;29247
22;Douglas;Graves;43;Gouk Square;Sekolim;NY;13226
23;Josephine;Estrada;48;Ocgig Pike;Beheho;WI;87305
24;Eugene;Matthews;26;Daew Drive;Riftemij;ME;93302
I think I should use something like: cat < test.csv | grep 'regex'.
Thanks.
Right Tools For The Job: Using awk or cut
Assuming you want to match the third column against a specific field:
awk -F';' '$3 ~ /Foo/ { print $0 }' file.txt
...will print any line where the third field contains Foo. (Changing print $0 to print $3 would print only that third field).
If you just want to print the third column regardless, use cut: cut -d';' -f3 <file.txt
Wrong Tool For The Job: Using GNU grep
On a system where grep has the -o option, you can chain two instances together -- one to trim everything after the fourth column (and remove lines with fewer than four columns), another to take only the last remaining column (thus, the fourth):
str='foo;bar;baz;qux;meh;whatever'
grep -Eo '^[^;]*[;][^;]*[;][^;]*[;][^;]*' <<<"$str" \
| grep -Eo '[^;]+$'
To explain how that works:
^, outside of square brackets, matches only at the beginning of a line.
[^;]* matches any character except ; zero-or-more times.
[;] matches only the character ;.
...thus, each [^;]*[;] in the regex matches a single field, whether or not that field contains text. Putting four of those in the first stage means we're matching only fields, and grep -o tells grep to only emit content it was successfully able to match.
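Applied to the question's third column, the same chain would simply drop one repetition (a sketch along the same lines):
grep -Eo '^[^;]*[;][^;]*[;][^;]*' test.csv | grep -Eo '[^;]+$'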
If you just need the 3rd field and it's always properly delimited with ';', why not use 'cut'?
cut -d';' -f3 <filename>
UPDATED:
The OP wasn't clear; maybe they only want to look at the 3rd line?
head -3 <filename> | tail -1
Or maybe just a list of the distinct values that appear in the 3rd field? It's not clear what the intended use of 'grep' would be:
cut -d';' -f3 <filename> | sort -u
As the other answers have said, using grep is a bad/unfortunate idea.
The only way I can think of using grep is to pull out a specific row where the 3rd column == some value. E.g.,
grep '^\([^;]*;\)\{2\}Bell;' test.txt
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
Or if the first column is the index (not counting it as a column):
grep '^\([^;]*;\)\{3\}39;' test.txt
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
Even using grep in this case leads to a pretty ugly solution.
Edit: Didn't see Charles Duffy's answer... that's pretty clever.

Unix scripting: Writing to another file with ":" is failing

I have the below record (and many other such records) in one file:
9460 xyz abc (lmn):1027739543798. Taxpayer's identification number (INN): 123. For all IIB. 2016/02/03
I need to search for the keyword IIB. If it matches, then I need to take that entire record and write it to another file.
Below is the code which already exists. This code is not working: when it takes the full matched record, it ignores the text which falls after ":" when writing to the other file.
cat keyword.cfg | while read KwdName
do
    echo "KEYWORD:"${KwdName}    # This prints IIB
    grep "^${KwdName}\|${KwdName}\|~${KwdName}~\|:${KwdName}$\|:${KwdName}~" ${mainFileWithListOfRecords} | awk -F ":" '{print $1}' >> ${destinationFile}
done
So, instead of writing the below record to the destination file:
9460 xyz abc (lmn):1027739543798. Taxpayer's identification number (INN): 123. For all IIB. 2016/02/03
it is only writing:
9460 xyz abc (lmn)
cat -vte mainFileWithListOfRecords gives the below output:
9460^IMEZHPROMBANK^I^ICJSC ;IIB;~ Moscow, (lmn): 1027739543798. Taxpayer's identification number (INN): 123. For all IIB. 2016/02/031#msid=s1448434872350^IC1^I2000/12/28^I2015/11/26^I^I$
The short fix is replacing
awk -F ":" '{print $1}'
with
cut -d ":" -f2-
But what are you cutting? Maybe ${mainFileWithListOfRecords} is a variable holding a list of files. In that case grep will show the matching file name in front of each match; you can suppress that with the -h option.
The result is that you do not need cut or awk at all:
grep -h "${KwdName}" ${mainFileWithListOfRecords} >> ${destinationFile}
(I changed the search string as well: your pattern already contains a bare \|${KwdName}\| alternative, which matches KwdName anywhere in the line, so the other alternatives are redundant.)
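Putting it together, a minimal corrected version of the loop could look like this (a sketch; variable names as in the question, with read -r added so backslashes are kept intact):
while read -r KwdName
do
    echo "KEYWORD: ${KwdName}"
    # ${mainFileWithListOfRecords} is left unquoted on purpose: it may expand to several file names
    grep -h "${KwdName}" ${mainFileWithListOfRecords} >> "${destinationFile}"
done < keyword.cfg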
Of course, it cuts on the colon - you programmed it that way. In your code, you have | awk -F ":" '{print $1}', which basically means "throw away everything starting from the first colon".
If you don't want to do this, why do you explicitly request it? What was your original intention when writing the awk command?
