I have this file:
16492674422392|Alberto|Parra|female|1985-09-22|2012-09-01T01:30:59.228+0000|190.96.12.239|Chrome
16492674424948|Peng|Chen|female|1984-07-26|2012-09-23T00:51:52.900+0000|1.4.10.198|Internet Explorer
16492674425075|Changpeng|Xu|female|1984-03-27|2012-10-02T03:55:00.946+0000|1.50.15.119|Firefox
16492674425398|Prince|Kobayashi|male|1989-08-07|2012-09-30T03:30:41.772+0000|14.101.89.18|Chrome
16492674426410|Yang|Wei|male|1980-07-01|2012-10-01T13:11:48.528+0000|27.144.204.193|Firefox
I want the user to:
choose an id (the id is the first column),
choose a column, and
change that column's value to one chosen by the user.
I use:
./tool.sh 16492674426410 3 replacement
as the inputs, and the code I run is:
awk -v antik1=$1 -v antik2=$2 '
{
sub(antik1, antik2);
print;
}' persons.dat.txt
This script doesn't let the user choose the column and id. How can I modify it so it works as I want?
Give this tested version a try:
#!/bin/bash --
awk -v anid="${1}" -v antik1="${2}" -v antik2="${3}" '
BEGIN {
FS="|";
OFS="|";
}
{
if ($1 == anid) {
$antik1=antik2;
}
print;
}' persons.dat.txt
The test:
$ ./tool.sh 16492674426410 3 replacement
16492674422392|Alberto|Parra|female|1985-09-22|2012-09-01T01:30:59.228+0000|190.96.12.239|Chrome
16492674424948|Peng|Chen|female|1984-07-26|2012-09-23T00:51:52.900+0000|1.4.10.198|Internet Explorer
16492674425075|Changpeng|Xu|female|1984-03-27|2012-10-02T03:55:00.946+0000|1.50.15.119|Firefox
16492674425398|Prince|Kobayashi|male|1989-08-07|2012-09-30T03:30:41.772+0000|14.101.89.18|Chrome
16492674426410|Yang|replacement|male|1980-07-01|2012-10-01T13:11:48.528+0000|27.144.204.193|Firefox
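The script prints the modified data to stdout. If the change should be written back to the file, a minimal sketch (temp-file approach; mktemp assumed available) could look like this:
#!/bin/bash --
# a sketch: run the same awk program, then replace the original file
tmp=$(mktemp) &&
awk -v anid="${1}" -v antik1="${2}" -v antik2="${3}" '
BEGIN { FS = OFS = "|" }
$1 == anid { $antik1 = antik2 }   # only touch the row whose first field matches the id
{ print }
' persons.dat.txt > "$tmp" &&
mv "$tmp" persons.dat.txt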
I am processing text files with thousands of records per file. Each record is made up of two lines: a header that starts with ">", followed by a line with a long string of the characters "-AGTCNR".
Here is what a simple file looks like:
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-----TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTAGGAAATTGATTAGTACCTTTAATATT----CCGAAT---
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTTTCCT----TAAATAAT-----
Now I am trying to search through the second field (line) of each record and extract only those records which have up to a certain maximum number of "-" characters (referred to as gaps) at the beginning, $start_gaps, and at the end, $end_gaps, of the line (field $2).
I have tried a few approaches and the following one works well:
read -p "Please enter the muximum number of gaps allowed at start position: " start_gaps &&
read -p "Please enter the maximum number of gaps allowed at the end position: " end_gaps &&
awk -v start_g=$start_gaps -v end_g=$end_gaps 'BEGIN{
RS="\n>"; FS="\n"; ORS="\n"; OFS="\n"; }; (x=start_g+1)(y=end_g+1) {
if ( match($2, "^-{5,}") && match($2, "-{6,}$") ) {
next} else {print x y ">"$0}}' infile > outfile
But I need to keep using variable numbers without explicitly editing the script every time I conduct the regex pattern matching. So I tried the following, but the regex does not accept variables. What is the best workaround for this?
read -p "Please enter the muximum number of gaps allowed at start position: " start_gaps &&
read -p "Please enter the maximum number of gaps allowed at the end position: " end_gaps &&
awk -v start_g=$start_gaps -v end_g=$end_gaps 'BEGIN{
RS="\n>"; FS="\n"; ORS="\n"; OFS="\n"; }; (x=start_g+1)(y=end_g+1) {
if ( match($2, "^-{x,}") && match($2, "-{y,}$") ) {
next} else {print x y ">"$0}}' infile > outfile
Expected results:
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-----TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTAGGAAATTGATTAGTACCTTTAATATT----CCGAAT---
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
match() sets the variable RLENGTH to the length of the matched substring; make use of it. Also, you don't need a multi-character RS for this.
awk -v start_g="$start_gaps" -v end_g="$end_gaps" '
/^>/ { hdr=$0; next }
match($0,/^-*/) && RLENGTH<=start_g && match($0,/-*$/) && RLENGTH<=end_g { print hdr; print }
' file
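Alternatively, to answer the literal question of getting variables into the regex: awk can build a dynamic regex by string concatenation and match it with !~. A minimal sketch, assuming the same two-line record layout and an awk that supports {n,} interval expressions:
read -p "Please enter the maximum number of gaps allowed at the start position: " start_gaps &&
read -p "Please enter the maximum number of gaps allowed at the end position: " end_gaps &&
awk -v start_g="$start_gaps" -v end_g="$end_gaps" '
BEGIN { start_re = "^-{" (start_g+1) ",}"; end_re = "-{" (end_g+1) ",}$" }  # "too many gaps" patterns
/^>/ { hdr = $0; next }                                # remember the header line
$0 !~ start_re && $0 !~ end_re { print hdr; print }    # keep records within both limits
' infile > outfile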
I have the following data located in a .csv file that changes as new data is downloaded.
The syntax of the data is always YYYY-MM-DDTHHMMSS, examples below:
2017-12-08T194949
2017-12-08T194952
2017-12-08T195000
2017-12-08T195007
2017-12-08T195007
2017-12-08T195014
2017-12-08T195016
2017-12-08T195016
2017-12-08T195016
2017-12-08T195016
2017-12-08T195021
2017-12-08T195026
2017-12-08T195029
2017-12-08T195030
2017-12-08T195030
2017-12-08T195034
2017-12-08T195051
2017-12-08T195101
2017-12-08T195105
2017-12-08T195135
2017-12-08T195138
2017-12-08T195140
2017-12-08T195144
2017-12-08T195148
2017-12-08T195154
2017-12-08T195204
2017-12-08T195205
2017-12-08T195219
2017-12-08T195223
2017-12-08T195224
2017-12-08T195225
Currently, I define my datestrings using:
lower_bound=`date -d '1 day ago' "+%Y-%m-%dT%H%M%S"`
upper_bound=`date -d '12 hours ago' "+%Y-%m-%dT%H%M%S"`
Where the amount of time I look back into the file depends on the system time. I can set the lookback amount to be arbitrary.
I think I have gotten close with sed/awk as follows:
sed -n "/$lower_bound/,/$upper_bound/p" data.csv
awk -v a="$lower_bound" -v b="$upper_bound" '/a/{flag=1;next}/b/{flag=0}flag' data.csv
Given those lookback strings, the commands above should print out the range of dates between the two variables, $lower_bound and $upper_bound. Obviously, I have experimented with different lookback times in these variables.
Any ideas as to why the range of dates isn't printing? Any help would be greatly appreciated; thank you in advance.
This: /a/ will match the literal string "a". This: $0 ~ a will match the string you have stored in the variable a, so your command should be:
awk -v a="$lower_bound" -v b="$upper_bound" '$0 ~ a {flag=1;next} $0 ~ b {flag=0} flag' data.csv
But these awk/sed commands will not give you what you want, because they would match lines only by accident, i.e. only if the exact datetime bounds happen to exist in your logs. More likely, the exact lower bound will not exist, so flag will never be set.
If you want to print that date range, then you should do a string comparison of these dates, that is, $0 > a and $0 < b. This works because the fixed-width YYYY-MM-DDTHHMMSS format sorts lexicographically in the same order as chronologically:
awk -v a="$lower_bound" -v b="$upper_bound" '$0 > a && $0 < b' data.csv
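Putting it together with the bounds defined earlier (GNU date assumed for the -d option):
lower_bound=$(date -d '1 day ago' "+%Y-%m-%dT%H%M%S")
upper_bound=$(date -d '12 hours ago' "+%Y-%m-%dT%H%M%S")
# print every line strictly between the two bounds
awk -v a="$lower_bound" -v b="$upper_bound" '$0 > a && $0 < b' data.csv
Use >= and <= instead if the bounds themselves should be included.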
I'm trying to split a file that contains multiple SSL certificates with awk, but it is showing an error message:
awk: too many output files 10
The command I'm using is the following:
cat ${SSL_CERTIFICATES_PATH} | awk '/BEGIN/ { i++; } /BEGIN/, /END/ { print > i ".extracted.crt" }'
Error Message:
awk: too many output files 10
record number 735
Do you know how I could solve this issue?
You have to close() each file; a traditional awk limits how many output files may be open at once, which is what the "too many output files" error is about:
awk '/BEGIN/ {f=i++".extracted.crt"}/BEGIN/,/END/{print > f;if(/END/)close(f)}'
The best solution, as suggested by Ed Morton, is to not use range expressions at all:
awk '/BEGIN/{f=(++i)".extracted.crt"} f{print>f} /END/{close(f);f=""}'
Here is a sample (not a certificate).
Input
$ cat file
BEGIN
1
END
BEGIN
2
END
BEGIN
3
END
Execution
$ awk '/BEGIN/{f=i++".extracted.crt"}/BEGIN/,/END/{print > f;if(/END/)close(f)}' file
$ awk '/BEGIN/{f=(++i)".extracted.crt"} f{print>f} /END/{close(f);f=""}' file
Output files
$ ls *.crt
0.extracted.crt 1.extracted.crt 2.extracted.crt
File contents of each
$ for i in *.crt; do echo $i; cat $i; done
0.extracted.crt
BEGIN
1
END
1.extracted.crt
BEGIN
2
END
2.extracted.crt
BEGIN
3
END
We have to close the file each time the value of variable i increases by 1, so try the following and let me know if this helps you.
awk '/BEGIN/ {close(i".extracted.crt");i++} /BEGIN/, /END/ { print > i".extracted.crt" }' ${SSL_CERTIFICATES_PATH}
EDIT: Xavier, I have checked with a friend who has SunOS 5, and the following worked well without any error. You could adjust the variables as per your need.
/usr/xpg4/bin/awk '/BEGIN/ {close(i".extracted.crt");i++} /BEGIN/, /END/ { print > i".extracted.crt" }' *.crt
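As a quick sanity check (a sketch, assuming the file names used above), the number of extracted files should equal the number of BEGIN markers in the input:
# count "BEGIN" lines in the source vs. extracted certificate files
[ "$(grep -c 'BEGIN' "${SSL_CERTIFICATES_PATH}")" -eq "$(ls -1 *.extracted.crt | wc -l)" ] \
    && echo "all certificates extracted" \
    || echo "count mismatch" >&2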
I need to query the information of about 1000 files at once.
For example
My filenames are
Test_001_20150517
Test_001_20150530
Information inside the file
{
1=2015
2=8
3=4
4=98888
5=123456
}
{
1=2014
2=456
3=5588
4=95858
5=67889
}
I want to query these 2 files with the condition that 1=2015 and show only the value of 5.
cat *201505*|awk -F '=' '{if ($1=="5"){print $2}}'
I can print the values, but there is no condition that 1=2015; I don't know what I should do, because the keys 1 and 5 both end up as $1.
Sorry for my poor English if there is something wrong or misunderstood in my question.
Is this what you want?
$ awk -F'=' '{a[$1]=$2} /}/ && (a[1] == 2015) {print a[5]}' file
123456
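If a block could be missing one of the keys, a slightly more defensive variant of the same idea (assuming every block ends with a "}" line, and an awk that supports delete on a whole array) resets the state after each block:
awk -F'=' '
{ a[$1] = $2 }    # remember the key=value pairs of the current block
/}/ { if (a[1] == 2015) print a[5]; delete a }  # end of block: test key 1, print key 5, reset
' *201505*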
Okay, I have two files: one is a baseline and the other is a generated report. I have to validate that a specific string in both files matches; it is not just a single word, see the example below:
.
.
name os ksd
56633223223
some text..................
some text..................
My search criterion here is to find a unique number such as "56633223223" and retrieve 1 line above and 3 lines below it; I can do that on both the base file and the report, and then compare whether they match. On the whole, I need a shell script for this.
Since the strings above and below are unique but the line count varies, I have put them in a file called "actlist":
56633223223 1 5
56633223224 1 6
56633223225 1 3
.
.
Now from "Rcount" below I get how many iterations are to be performed, and in each iteration I have to get the ith row and check whether its word count is 3; if it is, I take those values into variables and use something like this.
I'm stuck at the step below, not knowing which command to use. I'm thinking of using awk, but if there is anything better please advise. Here's some pseudo-code showing what I'm trying to do:
xxxxx=/root/xxx/xxxxxxx
Rcount=`wc -l $xxxxx | awk -F " " '{print $1}'`
i=1
while ((i <= Rcount))
do
record=_________________    # (awk command to retrieve the ith record of $xxxxx)
wcount=_________________    # (awk command to count the number of words in $record)
(( i=i+1 ))
done
Note: record, wcount values are later printed to a log file.
Sounds like you're looking for something like this:
#!/bin/bash
while read -r word1 word2 word3 junk; do
if [[ -n "$word1" && -n "$word2" && -n "$word3" && -z "$junk" ]]; then
echo "all good"
else
echo "error"
fi
done < /root/shravan/actlist
This will go through each line of your input file, assigning the three columns to word1, word2 and word3. The -n tests check that read hasn't assigned an empty value to any of the variables. The -z check verifies that there are only three columns, so $junk is empty.
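A sketch of how the same loop might then drive the extraction step described in the question (GNU grep assumed for -A/-B; the report file name is hypothetical):
#!/bin/bash
while read -r num above below junk; do
    if [[ -n "$num" && -n "$above" && -n "$below" && -z "$junk" ]]; then
        # pull $above lines before and $below lines after each number from the report
        grep -B "$above" -A "$below" -- "$num" report.txt
    else
        echo "malformed actlist line: $num $above $below $junk" >&2
    fi
done < /root/shravan/actlist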
I PROMISE you you are going about this all wrong. To find words in file1 and search for those words in file2 and file3 is just:
awk '
NR==FNR{ for (i=1;i<=NF;i++) words[$i]; next }
{ for (word in words) if ($0 ~ word) print FILENAME, word }
' file1 file2 file3
or similar (assuming a simple grep -f file1 file2 file3 isn't adequate). It DOES NOT involve shell loops to call awk to pull out strings to save in shell variables to pass to other shell commands, etc, etc.
So far all you're doing is asking us to help you implement part of what you think is the solution to your problem, but we're struggling to do that because what you're asking for doesn't make sense as part of any kind of reasonable solution to what it sounds like your problem is, so it's hard to suggest anything sensible.
If you tell us what you are trying to do AS A WHOLE, with sample input and expected output for your whole process, then we can help you.
We don't seem to be getting anywhere, so let's take a stab at the kind of solution I think you might want and then take it from there.
Look at these 2 files "old" and "new" side by side (line numbers added by the cat -n):
$ paste old new | cat -n
1 a b
2 b 56633223223
3 56633223223 c
4 c d
5 d h
6 e 56633223225
7 f i
8 g Z
9 h k
10 56633223225 l
11 i
12 j
13 k
14 l
Now let's take this "actlist":
$ cat actlist
56633223223 1 2
56633223225 1 3
and run this awk command on all 3 of the above files (yes, I know it could be briefer, more efficient, etc. but favoring simplicity and clarity for now):
$ cat tst.awk
ARGIND==1 {                     # first file: the actlist
    numPre[$1] = $2             # lines before the number that must match
    numSuc[$1] = $3             # lines after the number that must match
}
ARGIND==2 {                     # second file: "old"
    oldLine[FNR] = $0
    if ($0 in numPre) {
        oldHitFnr[$0] = FNR     # line number where this number was found
    }
}
ARGIND==3 {                     # third file: "new"
    newLine[FNR] = $0
    if ($0 in numPre) {
        newHitFnr[$0] = FNR
    }
}
END {
    for (str in numPre) {
        if ( str in oldHitFnr ) {
            if ( str in newHitFnr ) {
                for (i=-numPre[str]; i<=numSuc[str]; i++) {
                    oldFnr = oldHitFnr[str] + i
                    newFnr = newHitFnr[str] + i
                    if (oldLine[oldFnr] != newLine[newFnr]) {
                        print str, "mismatch at old line", oldFnr, "new line", newFnr
                        print "\t" oldLine[oldFnr], "vs", newLine[newFnr]
                    }
                }
            }
            else {
                print str, "is present in old file but not new file"
            }
        }
        else if (str in newHitFnr) {
            print str, "is present in new file but not old file"
        }
    }
}
$ awk -f tst.awk actlist old new
56633223225 mismatch at old line 12 new line 8
j vs Z
It's outputting that result because the 2nd line after 56633223225 is j in file "old" but Z in file "new", and the file "actlist" said the 2 files had to be common from one line before until 3 lines after that pattern.
Is that what you're trying to do? The above uses GNU awk for ARGIND but the workaround is trivial for other awks.
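For reference, a sketch of that workaround: count input files manually on each file's first line and use the counter in place of ARGIND:
awk '
FNR==1 { argind++ }   # FNR resets per file, so this increments once per input file
argind==1 { numPre[$1]=$2; numSuc[$1]=$3 }
argind==2 { oldLine[FNR]=$0; if ($0 in numPre) oldHitFnr[$0]=FNR }
argind==3 { newLine[FNR]=$0; if ($0 in numPre) newHitFnr[$0]=FNR }
# ... END block as above ...
' actlist old new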
Use the below code:
awk '{if (NF == 3) { word1=$1; word2=$2; word3=$3; print "Words are:" word1, word2, word3} else {print "Line", NR, "is having", NF, "Words" }}' filename.txt
I have given the solution as per the requirement.
awk '{ # awk starts from here and reads the file line by line
if (NF == 3) # checks whether the current line has 3 fields; NF represents the number of fields in the current line
{ word1=$1; # if the current line has exactly 3 fields, the 1st field is assigned to the variable word1
word2=$2; # the 2nd field is assigned to the variable word2
word3=$3; # the 3rd field is assigned to the variable word3
print word1, word2, word3} # prints all 3 fields
}' filename.txt >> output.txt # these 3 fields are redirected to a file which can be used for further processing.
This is as per the requirement; there are many other ways of doing this, but awk is what was asked for.