Replacing a field with another value using sed or awk - bash

I have a text file with the following lines, say:
100 200 300
50 120 200
60 500 340
I am writing a script to change the value of any field based on user input. If the user gives the column number ($colno) and row number ($rowno) in bash, I am using the following code to locate the exact field to be replaced.
awk -v i="$colno" -v j="$rowno" 'NR == j {print $i}' file
So, if the user gives $colno as 1 and $rowno as 3, I can locate the field $1 in row 3 (60).
Now I want to permanently replace this value (60) with some other user given value ($new_val).
How can I do this using either sed or awk? The code should let me replace any field the user specifies with the new value the user supplies, without changing any other field, even one holding the same value.
Please help me. Thanks in advance.

Just replace the column with the value required,
colno=1
rowno=3
awk -v i="$colno" -v j="$rowno" -v newvalue=20 'NR == j {$i = newvalue} 1' file
100 200 300
50 120 200
20 500 340
The part {$i=newvalue}1 sets the value of $i (with your example, the 1st field of the 3rd row) to the value in newvalue. The trailing 1 is an always-true condition whose default action is to print, so awk rebuilds and prints each line with the replacement applied.
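Note that this prints the modified data to stdout rather than changing the file. To make the replacement permanent, as asked, a common pattern is to write to a temporary file and move it back; GNU awk 4.1+ also offers an inplace extension:
awk -v i="$colno" -v j="$rowno" -v newvalue="$new_val" 'NR == j {$i = newvalue} 1' file > file.tmp && mv file.tmp file
# or, GNU awk only:
gawk -i inplace -v i="$colno" -v j="$rowno" -v newvalue="$new_val" 'NR == j {$i = newvalue} 1' file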

With GNU sed, to replace, on line $rowno, the $colno-th run of non-space characters with the value of $newtext:
sed -E "${rowno}s/[^ ]*( *)/${newtext}\1/${colno}" file

Related

Take mean of columns in text file by every 10-row blocks in bash

I have a tab-delimited text file with two columns and no header. Now, I want to take the mean of each column within blocks of 10 rows. That means: take the first 10 rows, compute the mean of the 10 numbers in each column, and output the means to another text file. Then take the next 10 rows and do the same again, until the end of the file. If fewer than 10 rows are left at the end, just take the mean of the remaining rows.
Input file:
0.32832977 3.50941E-10
0.31647876 3.38274E-10
0.31482627 3.36508E-10
0.31447645 3.36134E-10
0.31447645 3.36134E-10
0.31396809 3.35591E-10
0.31281157 3.34354E-10
0.312004 3.33491E-10
0.31102326 3.32443E-10
0.30771822 3.2891E-10
0.30560062 3.26647E-10
0.30413213 3.25077E-10
0.30373717 3.24655E-10
0.29636685 3.16777E-10
0.29622422 3.16625E-10
0.29590765 3.16286E-10
0.2949896 3.15305E-10
0.29414582 3.14403E-10
0.28841901 3.08282E-10
0.28820667 3.08055E-10
0.28291832 3.02403E-10
0.28243792 3.01889E-10
0.28156429 3.00955E-10
0.28043638 2.9975E-10
0.27872239 2.97918E-10
0.27833349 2.97502E-10
0.27825573 2.97419E-10
0.27669023 2.95746E-10
0.27645657 2.95496E-10
Expected output text file:
0.314611284 3.36278E-10
0.296772974 3.172112E-10
0.279535036 2.987864E-10
I tried this code, but I don't know how to include the loop for every 10th row:
awk '{x+=$1;next}END{print x/NR}' file
Here is an awk to do this:
awk -v m=10 -v OFS="\t" '
FNR%m==1{sum1=0;sum2=0}                       # first line of a block: reset the sums
{sum1+=$1;sum2+=$2}                           # accumulate both columns
FNR%m==0{print sum1/m,sum2/m; lfnr=FNR; next} # full block of m lines: print the means
END{if(FNR>lfnr)print sum1/(FNR-lfnr),sum2/(FNR-lfnr)}' file # leftover partial block; the if() avoids dividing by zero when the row count is an exact multiple of m
Prints:
0.314611 3.36278e-10
0.296773 3.17211e-10
0.279535 2.98786e-10
Or, if you want the same number of decimals as the input, you can use printf:
awk -v m=10 -v OFS="\t" '
FNR%m==1{sum1=0;sum2=0}
{sum1+=$1;sum2+=$2}
FNR%m==0{printf("%0.9G%s%0.9G\n",sum1/m,OFS,sum2/m); lfnr=FNR; next}
END{if(FNR>lfnr)printf("%0.9G%s%0.9G\n",sum1/(FNR-lfnr),OFS,sum2/(FNR-lfnr))}' file
Prints:
0.314611284 3.36278E-10
0.296772974 3.172112E-10
0.279535036 2.98786444E-10
Your superpower here is the % modulo operator, which lets you detect every m-th step, in this case every 10th line. Your x-ray vision is the FNR awk special variable, which is the current line number of the file you are reading.
FNR%10 is always less than 10; when it is 0 you are on the 10th line of a block and it is time to print, and when it is 1 you are on the first line of a block and it is time to reset the sums.
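To see the pattern at a glance, here is a quick throwaway demonstration (mine, not part of the solution):
seq 1 12 | awk -v m=10 '{print FNR, FNR%m}'
1 1      <- FNR%m==1: reset the sums
2 2
...
10 0     <- FNR%m==0: print the block means
11 1     <- next block starts
12 2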

Append value to line in input file based on column number

I have a function append_val_to_line which appends $append_val to the line in Input.txt and writes it back. I want to insert 45 at position 0; since its length is two characters, it should occupy positions 0 and 1. But my code below is adding it at positions 1 and 2 instead.
I am not sure why that is. Can someone help me achieve the goal described above?
My working attempt is below, but it does not add the 45 at positions 0 and 1. I want a generic solution: the value may be inserted at any position N, and it should occupy positions N, N+1, ... depending on the length of append_val.
As you can see in the sample input before and after the call to append_val_to_line, the value 45 is added at positions 1 and 2, but I wanted it to start at position 0 and end at position 1, since the value 45 has length two.
A space also counts as a valid position, but I will not be inserting any values into the spaces in a line.
#! /bin/bash
function append_val_to_line {
  # insert $2 into the line after the first $1 characters
  sed -i 's/\(.\{'"$1"'\}\)/\1'"$2"'/' "input.txt"
}
column_num=1
append_val=45
append_val_to_line "$column_num" "$append_val"
Input.txt BEFORE the call to append_val_to_line:
1200 5600 775000 34555
Input.txt AFTER the call to append_val_to_line:
145200 5600 775000 34555
Note that 45 has been added at positions 1 and 2 instead of 0 and 1.
Since the OP wants positions counted from 0, and since I believe it is really a character's position number rather than a column that we are talking about here, based on that and the shown samples the following may help.
awk -v after="0" -v value="45" '{print substr($0,1,after+1) value substr($0,after+2)}' Input_file
A non-one-liner form of the above:
awk -v after="0" -v value="45" '
{
print substr($0,1,after+1) value substr($0,after+2)
}
' Input_file
Explanation: Adding a detailed explanation for the above.
awk -v after="0" -v value="45" ' ##Start the awk program here, setting the after variable to 0 as per OP and value to 45.
{
print substr($0,1,after+1) value substr($0,after+2) ##Print the sub-string from 1 up to after+1; since OP wants to insert value at the 2nd character, this prints the 1st character. Then print value, then print the rest of the current line.
}
' Input_file ##Mention the Input_file name here.
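For comparison, here is a minimal pure-bash sketch of the same idea (my own illustration, not from the answers above), inserting a value at a 0-indexed character position with parameter expansion. It assumes input.txt holds a single line, as in the samples:
#!/bin/bash
# Insert $2 into the line of input.txt starting at 0-indexed position $1.
append_val_to_line() {
  local pos=$1 val=$2 line
  line=$(<input.txt)                                           # read the single line
  printf '%s\n' "${line:0:pos}${val}${line:pos}" > input.txt   # head + value + tail
}
append_val_to_line 0 45   # "1200 5600 775000 34555" becomes "451200 5600 775000 34555"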

Multiplication of two variables containing tuples in BASH script

I have two variables containing tuples of the same length, generated from a PostgreSQL database and several successful follow-on calculations, which I would like to multiply to generate a third variable containing the answer tuple. Each tuple contains 100 numeric records. Variable 1 is called rev_p_client_pa and variable 2 is called lawn_p_client. I tried the following, which gives me a third tuple, but the answer rows are not calculated correctly:
rev_p_client_pa data is:
0.018183
0.0202814
0.013676
0.0134083
0.0108168
0.014197
0.0202814
lawn_p_client data is:
52.17
45
30.43
50
40
35
50
The command I used in the script:
awk -v var3="$rev_p_client_pa" 'BEGIN{print var3}' | awk -v var4="$lawnp_p_client" -F ',' '{print $(1)*var4}'
The command gives the following output:
0.948607
1.05808
0.713477
0.699511
0.564312
0.740657
1.05808
However, when calculated manually in LibreOffice Calc, I get:
0.94860711
0.912663
0.41616068
0.670415
0.432672
0.496895
1.01407
I used this awk structure to multiply a tuple variable by a variable holding a single numeric value in a previous calculation, and it calculated correctly. Does someone know how the correct awk statement should be written, or maybe you have some other ideas that might be useful? Thanks for your help.
The reason your pipeline fails is that awk converts the multi-line string in var4 to a number by reading up to the first newline, so every row gets multiplied by 52.17; that is exactly the output you show.
Instead, use paste to join the two data sets together, forming a list of pairs, each separated by a tab.
Then pipe the result to awk to multiply each pair of numbers, resulting in a list of products.
#!/bin/bash
rev_p_client_pa='0.018183
0.0202814
0.013676
0.0134083
0.0108168
0.014197
0.0202814'
lawn_p_client='52.17
45
30.43
50
40
35
50'
paste <(echo "$rev_p_client_pa") <(echo "$lawn_p_client") | awk '{print $1*$2}'
Output:
0.948607
0.912663
0.416161
0.670415
0.432672
0.496895
1.01407
All awk:
$ awk -v rev_p_client_pa="$rev_p_client_pa" \
-v lawn_p_client="$lawn_p_client" ' # "tuples" in as vars
BEGIN {
split(lawn_p_client,l,/\n/) # split the "tuples" by \n
n=split(rev_p_client_pa,r,/\n/) # get count of the other
for(i=1;i<=n;i++) # loop the elements
print r[i]*l[i] # multiply and output
}'
Output:
0.948607
0.912663
0.416161
0.670415
0.432672
0.496895
1.01407
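A side note (my addition, not part of the answers above): awk's print formats non-integer numbers with OFMT, which defaults to %.6g, hence the six significant digits in the output. If you need more decimals, use printf explicitly, for example:
paste <(echo "$rev_p_client_pa") <(echo "$lawn_p_client") | awk '{printf "%.8f\n", $1*$2}'
This prints 0.94860711 for the first pair, matching the manually calculated value.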

Compare two timestamp columns and if difference is greater than 1 hour, trigger email alert (bash)

I have a file that looks like this:
user1,135.4,MATLAB,server1,14:53:59,15:54:28
user2,3432,Solver_HF+,server1,14:52:01,14:54:28
user3,3432,Solver_HF+,server1,14:52:01,15:54:14
user4,3432,Solver_HF+,server1,14:52:01,14:54:36
I want to run a comparison between the last two columns and if the difference is greater than an hour(such as lines 1 and 3) it will trigger something like this:
echo "individual line from file" | mail -s "subject" email#site.com
I was trying to come up with a possible solution using awk, but I'm still fairly new to linux and couldn't quite figure out something that worked.
The following GNU awk script may be what you want (gensub and mktime are gawk extensions):
awk 'BEGIN{FS=","}
{a="2019 01 01 " gensub(":"," ","g",$5);
b="2019 01 01 " gensub(":"," ","g",$6);
c = int((mktime(b)-mktime(a))/60)}
{if (c >= 60){system("echo \"" $0 "\" | mail -s \"subject\" email#site.com")}}' your_filename
Then run the script from crontab or some other trigger, for example:
*/5 * * * * awk_scripts.sh
If you just want to check newly added lines, reading the file with tail may be more useful than cat.
Here you go (using GNU awk, because of mktime):
awk -F, '{
split($(NF-1),t1,":");
split($NF,t2,":");
d1=mktime("0 0 0 "t1[1]" "t1[2]" "t1[3]" 0");
d2=mktime("0 0 0 "t2[1]" "t2[2]" "t2[3]" 0");
if (d2-d1>3600) print $0}' file
user1,135.4,MATLAB,server1,14:53:59,15:54:28
user3,3432,Solver_HF+,server1,14:52:01,15:54:14
Using comma as the field separator to get the second-to-last and last fields.
Then split the two fields into the arrays t1 and t2 to get hour, min, sec.
mktime converts these to seconds.
Do the math and print only the lines whose times differ by more than 3600 seconds.
This can then be piped to other commands.
See how the time functions are used in GNU awk: https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html
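To wire this up to the alert from the question, one sketch (assuming a working mail setup and the address exactly as written in the question) is to pipe each matching line to its own mail command:
awk -F, '{
split($(NF-1),t1,":");
split($NF,t2,":");
d1=mktime("0 0 0 "t1[1]" "t1[2]" "t1[3]" 0");
d2=mktime("0 0 0 "t2[1]" "t2[2]" "t2[3]" 0");
if (d2-d1>3600) print $0}' file |
while IFS= read -r line; do
  echo "$line" | mail -s "subject" email#site.com
done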

Awk: how to compare two strings in one line

I have a dataset with 20,000 probes in two columns, 21 nt each. From this file I need to extract the lines in which the last nucleotide in the Probe 1 column matches the last nucleotide in the Probe 2 column. So far I have tried awk's substr function, but didn't get the expected outcome. Here is the one-liner I tried:
awk '{if (substr($2,21,1)==substr($4,21,1)){print $0}}'
Another option would be to anchor the last character in columns 2 and 4 (awk '$2 ~ /[A-Z]$/), but I can't find a way to match the probes across the two columns using a regex. All suggestions and comments will be very much appreciated.
Example of dataset:
Probe 1 Probe 2
4736 GGAGGAAGAGGAGGCGGAGGA A GGAGGACGAGGAGGAGGAGGA
4737 GGAGGAAGAGGAGGGAGAGGG B GGAGGACGAGGAGGAGGAGGG
4738 GGAGGATTTGGCCGGAGAGGC C GGAGGAGGAGGAGGACGAGGT
4739 GGAGGAAGAGGAGGGGGAGGT D GGAGGACGAGGAGGAGGAGGC
4740 GGAGGAAGAGGAGGGGGAGGC E GGAGGAGGAGGACGAGGAGGC
Desired output:
4736 GGAGGAAGAGGAGGCGGAGGA A GGAGGACGAGGAGGAGGAGGA
4737 GGAGGAAGAGGAGGGAGAGGG B GGAGGACGAGGAGGAGGAGGG
4740 GGAGGAAGAGGAGGGGGAGGC E GGAGGAGGAGGACGAGGAGGC
This will filter the input, matching lines where the last character of the 2nd column is equal to the last character of the 4th column:
awk 'substr($2, length($2), 1) == substr($4, length($4), 1)'
What I changed compared to your sample script:
Move the if statement out of the { ... } block into a filter
Use length($2) and length($4) instead of hardcoding the value 21
The { print $0 } is not needed, as that is the default action for the matched lines
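A small standard-awk variant (my addition): substr with only two arguments returns the suffix of the string starting at that position, so the same filter can be written slightly shorter:
awk 'substr($2, length($2)) == substr($4, length($4))' file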
