Unix scripting : Writing to another file with ":" is failing - shell

I have below record (and many other such records) in one file
9460 xyz abc (lmn):1027739543798. Taxpayer's identification number (INN): 123. For all IIB. 2016/02/03
I need to search for the keyword IIB. If it matches, then I need to take that entire record and write to another file.
Below is the code which already exists. This code is not working. Problem with this code is when it takes the full matched
record, it is ignoring the text which falls after ":" and writing to another file.
cat keyword.cfg | while read KwdName
do
echo "KEYWORD:"${KwdName} //This prints IIB
grep "^${KwdName}\|${KwdName}\|~${KwdName}~\|:${KwdName}$\|:${KwdName}~" ${mainFileWithListOfRecords} | awk -F ":" '{print $1}' >> ${destinationFile}
done
So, instead of writing below record to destination file
9460 xyz abc (lmn):1027739543798. Taxpayer's identification number (INN): 123. For all IIB. 2016/02/03
It is only writing,
9460 xyz abc (lmn)
cat -vte mainFileWithListOfRecords gives below output
9460^IMEZHPROMBANK^I^ICJSC ;IIB;~ Moscow, (lmn): 1027739543798. Taxpayer's identification number (INN): 123. For all IIB. 2016/02/031#msid=s1448434872350^IC1^I2000/12/28^I2015/11/26^I^I$

The short fix is replacing
awk -F ":" '{print $1}'
with
cut -d ":" -f2-
But what are you cutting? Maybe ${mainFileWithListOfRecords} is a variabele with a list of files. In that case grep will show the matching file in front of its matches. You can change that with the -h option.
The result is that you do not need to cut or awk:
grep -h "${KwdName}" ${mainFileWithListOfRecords} >> ${destinationFile}
(I changed the searchstring as well, with \|${KwdName}\| in your searchstring you will match KwdName in all combinations)

Of course, it cuts on the colon - you programmed it that way. In your code, you have | awk -F ":" '{print $1}', which basically means "throw away everything starting from the first colon".
If you don't want to do this, why do you explicitly request it? What was your original intention, when writing the awk command?

Related

Passing parameter as control number and get table name

I have a scenario where there is a file with control number and table name, hereby an example:
1145|report_product|N|N|
1156|property_report|N|N
I need to pass the control number as 1156 and have to get table name as PR once I get the table name as PR then I need to add some text on that.
Please help
Assuming the controll file is:
# cat controlfile.txt
1145|report_product|N|N
1156|property_report|N|N
To fine some line you can use:
grep 1156 controlfile.txt
If needed you can save it to a variable: result=$(grep 1156 file.txt)
Assuming you need to add append something on this line.... you can use:
sed '/^1156/s/$/ 123/' controlfile.txt
This example will add "123" at the end of line that start with 1156
If needed, add more details like what output you want or anything else to help us better understand your need.
You need to work in two stages:
You need to find the line, containing 1156.
You need to get the information from that line.
In order to find the line (as already indicated by Juranir), you can use grep:
Prompt> grep "1156" control.txt
1156|property_report|N|N
In order to get the information from that line, you need to get the second column, based on the vertical line (often referred as a "pipe" character), for which there are different approaches. I'll give you two:
The cut approach: you can cut a line into different parts and take a character, a byte, a column, .... In this case, this is what you need:
grep "1156" control.txt | cut -d '|' -f 2
-d '|' : use the vertical line as a column separator
-f 2 : show the second field (column)
The awk approach: awk is a general "text modifier" with multiple features (showing parts of text, performing basic calculations, ...). For this case, it can be used as follows:
grep "1156" control.txt | awk -F '|' '{print $2}'
-F '|' : use the vertical line as a column separator
'{print $2}' : the awk script for showing the second field.
Oh, by the way, I've edited your question. You might press the edit button in order to learn how I did this :-)
For getting only the first letters, separated by the underscores:
grep "1156" control.txt | awk -F '|' '{print $2}' | awk -F '_' '{print substr($1,1,1) substr($2,1,1)}'
(something like that)

Shell script - remove all before and after

Find the next link if the Link header contains rel=next..
Getting the link header can result in different strings.. I need to find the next link.
e.g.
Link: <http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;
would be http://mygithub.com/api/v3/organizations/20/repos?page=3
Link: <http://mygithub.com/api/v3/organizations/4/repos?page=2>; rel="next", <http://mygithub.com/api/v3/organizations/4/repos?page=2>; rel="last"
would be http://mygithub.com/api/v3/organizations/4/repos?page=2
Played with sed and parameter expansion - not that experienced so got stuck :)
Please be aware that parsing HTML with non-html tools it fraught with peril; you will see that this works, and assume you can get away with it always. You'll spend hours trying to get the next level of complexity to work, when you should be studying how to use html-aware tools. Don't say we didn't warn you (-;, but
printf "<http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;\n" \
| awk -F" " '{
for(i=1;i<=NF;i++){
if ($i == "rel=next,") {
gsub(/[<>]/,"",$(i-1);sub(/;$/,"",$(i-1))
print $(i-1)
}
}
}'
produces required output:
http://mygithub.com/api/v3/organizations/20/repos?page=3
To save the output of a script section into a variable, you wrap the code for command-substitution, in this case
nextReposLink=$( printf .... | awk '....' )
#-------------^^--------------------------^
The ^ pointed items are modern syntax for command-substitution. The code inside of $( ... ) is executed and the standard output is passed as a argument to the invoking command line. (The original syntax for command substitution is/was `cmds` and works the same in the simple case var=`cmds` . You can nest modern cmd-substitution easily, whereas the old version requires a lot of escape character fiddling. Avoid it if you can.
Note that about any s/str/rep/ that sed can do, awk can do the same, but requires the use of the sub(/regx/, "repl", "str") or gsub(sameArgs) functions. In this particular case, you may need to escape the <> like \<\>.
Be sure to always dbl-quote the use of variables, i.e. echo "$nextReposLink".
IHTH
Well - I put one of your URL strings in a text file and was able to pull out the first URL with two cuts.
[root#oelinux2 ~]# cat test
Link: <http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;
Then with using cut:
cat test | cut -d "<" -f2 | cut -d ">" -f1
[root#oelinux2 ~]# cat test | cut -d "<" -f2 | cut -d ">" -f1
http://mygithub.com/api/v3/organizations/20/repos?page=1
That's one option - if you are just looking to get the first URL in the string. Basically - that's just grabbing what's between the two delimiters "<" and ">"
With Cut:
-d is the 'delimiter'
-f is the field you want to get.
If you wanted to get a later URL in that string, you could change the fields (-f #) and see what you get :)

Can I use grep to extract a single column of a CSV file?

I'm trying to solve o problem I have to do as soon as possible.
I have a csv file, fields separated by ;.
I'm asked to make a shell command using grep to list only the third column, using regex. I can't use cut. It is an exercise.
My file is like this:
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
2;Wayne;Watkins;22;Lanme Place;Cotoiwi;NC;86578
3;Danny;Vega;25;Fofci Center;Momahbih;MS;21027
4;Larry;Robinson;23;Bammek Boulevard;Gaizatoh;NE;27517
5;Myrtie;Black;20;Savon Square;Gokubpat;PA;92219
6;Nellie;Greene;23;Utebu Plaza;Rotvezri;VA;17526
7;Clyde;Reynolds;19;Lupow Ridge;Kedkuha;WI;29749
8;Calvin;Reyes;47;Paad Loop;Beejdij;KS;29247
9;Douglas;Graves;43;Gouk Square;Sekolim;NY;13226
10;Josephine;Estrada;48;Ocgig Pike;Beheho;WI;87305
11;Eugene;Matthews;26;Daew Drive;Riftemij;ME;93302
12;Stanley;Tucker;54;Cure View;Woocabu;OH;45475
13;Lina;Holloway;41;Sajric River;Furutwe;ME;62184
14;Hettie;Carlson;57;Zuheho Pike;Gokrobo;PA;89098
15;Maud;Phelps;57;Lafni Drive;Gokemu;MD;87066
16;Della;Roberson;53;Zafe Glen;Celoshuv;WV;56749
17;Cory;Roberson;56;Riltav Manor;Uwsupep;LA;07983
18;Stella;Hayes;30;Omki Square;Figjitu;GA;35813
19;Robert;Griffin;22;Kiroc Road;Wiregu;OH;39594
20;Clyde;Reynolds;19;Lupow Ridge;Kedkuha;WI;29749
21;Calvin;Reyes;47;Paad Loop;Beejdij;KS;29247
22;Douglas;Graves;43;Gouk Square;Sekolim;NY;13226
23;Josephine;Estrada;48;Ocgig Pike;Beheho;WI;87305
24;Eugene;Matthews;26;Daew Drive;Riftemij;ME;93302
I think I should use something like: cat < test.csv | grep 'regex'.
Thanks.
Right Tools For The Job: Using awk or cut
Assuming you want to match the third column against a specific field:
awk -F';' '$3 ~ /Foo/ { print $0 }' file.txt
...will print any line where the third field contains Foo. (Changing print $0 to print $3 would print only that third field).
If you just want to print the third column regardless, use cut: cut -d';' -f3 <file.txt
Wrong Tool For The Job: Using GNU grep
On a system where grep has the -o option, you can chain two instances together -- one to trim everything after the fourth column (and remove lines with less than four columns), another to take only the last remaining column (thus, the fourth):
str='foo;bar;baz;qux;meh;whatever'
grep -Eo '^[^;]*[;][^;]*[;][^;]*[;][^;]*' <<<"$str" \
| grep -Eo '[^;]+$'
To explain how that works:
^, outside of square brackets, matches only at the beginning of a line.
[^;]* matches any character except ; zero-or-more times.
[;] matches only the character ;.
...thus, each [^;]*[;] in the regex matches a single field, whether or not that field contains text. Putting four of those in the first stage means we're matching only fields, and grep -o tells grep to only emit content it was successfully able to match.
If you just need the 3rd field and it's always properly delimited with ';' why not use 'cut'?
cut -d';' -f3 <filename>
UPDATED:
OP wasn't clear, maybe only want to look at the 3rd line?
head -3 <filename> | tail -1
OR.. Maybe just getting of list of the things that appear in the 3rd field?
Not clear what the intended use of 'grep' would be??
cut -d';' -f3 <filename> | sort -u
As the other answers have said, using grep is a bad/unfortunate idea.
The only way I can think of using grep is to pull out a specific row where the 3rd column == some value. E.g.,
grep '^\([^;]*;\)\{2\}Bell;' test.txt
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
Or if the first column is the index (not counting it as a column):
grep '^\([^;]*;\)\{3\}39;' test.txt
1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
Even using grep in this case leads to a pretty ugly solution.
Edit: Didn't see Charles Duffy's answer... that's pretty clever.

Unix Shell scripting in AIX(Sed command)

I have a text file which consists of jobname,business name and time in min seperated with '-'(SfdcDataGovSeq-IntegraterJob-43).There are many jobs in this text file. I want to search with the jobname and change the time from 43 to 0 only for that particular row and update the same text file. Kindly advise what needs to be done.
Query that i am using : (cat test.txt | grep "SfdcDataGovSeq" | sed -e 's/43/0/' > test.txt) but the whole file is getting replaced with only one line.
sed -e '/SfdcDataGovSeq/ s/43/0/' test.txt
This will only replace if the search is positive.
Agreed with Ed, Here is a workaround to put word boundaries Although Equality with awk is robust.
sed -e '/SfdcDataGovSeq/ s/\<43\>/0/g' test.txt
You should be using awk instead of sed:
awk 'BEGIN{FS=OFS="-"} $1=="SfdcDataGovSeq" && $3==43{$3=0} 1' file
Since it does full string or numeric (not regexp) matches on specific fields, the above is far more robust than the currently accepted sed answer which would wreak havoc on your input file given various possible input values.

Grep outputs multiple lines, need while loop

I have a script which uses grep to find lines in a text file (ics calendar to be specific)
My script finds a date match, then goes up and down a few lines to copy the summary and start time of the appointment into a separate variable. The problem I have is that I'm going to have multiple appointments at the same time, and I need to run through the whole process for each result in grep.
Example:
LINE=`grep -F -n 20130304T232200 /path/to/calendar.ics | cut -f1 d:`
And it outputs only the lines, such as
86 89
Then it goes on to capture my other variables, as such:
SUMMARYLINE=$(( $LINE + 5 ))
SUMMARY:`sed -n "$SUMMARYLINE"p /path/to/calendar.ics
my script runs fine with one output, but it obviously won't work with more than 1 and I need for it to. should I send the grep results into an array? a separate text file to read from? I'm sure I'll need a while loop in here somehow. Need some help please.
You can call grep from a loop quite easily:
while IFS=':' read -r LINE notused # avoids the use of cut
do
# First field is now in $LINE
# Further processing
done < <(grep -F -n 20130304T232200 /path/to/calendar.ics)
However, if the file is not too large then it might be easier to read the whole file into an array and more around that.
With your proposed solution, you are reading through the file several times. Using awk, you can do it in one pass:
awk -F: -v time=20130304T232200 '
$1 == "SUMMARY" {summary = substr($0,9)}
/^DTSTART/ {start = $2}
/^END:VEVENT/ && start == time {print summary}
' calendar.ics

Resources