Append value to line in input file based on column number - shell

I have a function append_val_to_line which inserts $append_val into the line in Input.txt and writes it back. I want to insert 45 at column 0, and since its length is two characters, it should occupy columns 0 and 1. But my code below is adding it at columns 2 and 3.
I am not sure why that is. Can someone help me achieve the goal described above?
My working solution is below, but it does not add the 45 at columns 0 and 1. I want a generic solution, so that I can insert the value at any column and the value is added starting at column N and continuing through columns N+1, N+2, ... based on the length of append_val.
As you can see in the sample input before and after the call to the append_val_to_line function, the value 45 is added at columns 1 and 2, but I wanted it to start at column 0 and end at column 1, since the value 45 has length two. My code adds it starting at columns 2 and 3 instead.
A space is also a valid column, but I will not be adding any values at the spaces in a line.
#!/bin/bash
function append_val_to_line {
    # Insert $2 into Input.txt after the first $1 characters of each line.
    sed -i 's/\(.\{'"$1"'\}\)/\1'"$2"'/' "Input.txt"
}
column_num=1
append_val=45
append_val_to_line "$column_num" "$append_val"
Input.txt BEFORE the call to the append_val_to_line function:
1200 5600 775000 34555
Input.txt AFTER the call to the append_val_to_line function:
145200 5600 775000 34555
Note that 45 has been added at columns 2 and 3.

Since the OP said they want positions to start from 0, and I believe it is really character positions rather than columns that we are talking about here, the following may help, based on that and the samples shown.
awk -v after="0" -v value="45" '{print substr($0,1,after+1) value substr($0,after+2)}' Input_file
A non-one-liner form of the above:
awk -v after="0" -v value="45" '
{
print substr($0,1,after+1) value substr($0,after+2)
}
' Input_file
Explanation: adding a detailed explanation of the above.
awk -v after="0" -v value="45" ' ##Start the awk program here, setting the after variable to 0 as per the OP and value to 45.
{
print substr($0,1,after+1) value substr($0,after+2) ##Print the substring from 1 through after+1; since the OP wants to insert the value at the 2nd character, this prints the 1st character. Then print value, then the rest of the current line.
}
' Input_file ##Mention the Input_file name here.
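If the value really should start at column 0 itself, here is a minimal generic sketch (assuming col is the 0-indexed column where the first character of value should land):
awk -v col="0" -v value="45" '{print substr($0,1,col) value substr($0,col+1)}' Input_file ##Prints everything before col, then value, then the rest; with col=0 this yields 451200 5600 775000 34555.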

Related

Take mean of columns in text file by every 10-row blocks in bash

I have a tab-delimited text file with two columns and no header. I want to take the mean of each column within blocks of 10 rows. That is, I take the first 10 rows, compute the mean of the 10 numbers in each column, and output the means to another text file. Then I take the next 10 rows and do the same again, until the end of the file. If there are fewer than 10 rows left at the end, I just take the mean of the remaining rows.
Input file:
0.32832977 3.50941E-10
0.31647876 3.38274E-10
0.31482627 3.36508E-10
0.31447645 3.36134E-10
0.31447645 3.36134E-10
0.31396809 3.35591E-10
0.31281157 3.34354E-10
0.312004 3.33491E-10
0.31102326 3.32443E-10
0.30771822 3.2891E-10
0.30560062 3.26647E-10
0.30413213 3.25077E-10
0.30373717 3.24655E-10
0.29636685 3.16777E-10
0.29622422 3.16625E-10
0.29590765 3.16286E-10
0.2949896 3.15305E-10
0.29414582 3.14403E-10
0.28841901 3.08282E-10
0.28820667 3.08055E-10
0.28291832 3.02403E-10
0.28243792 3.01889E-10
0.28156429 3.00955E-10
0.28043638 2.9975E-10
0.27872239 2.97918E-10
0.27833349 2.97502E-10
0.27825573 2.97419E-10
0.27669023 2.95746E-10
0.27645657 2.95496E-10
Expected output text file:
0.314611284 3.36278E-10
0.296772974 3.172112E-10
0.279535036 2.987864E-10
I tried this code, but I don't know how to include the loop for every 10th row:
awk '{x+=$1;next}END{print x/NR}' file
Here is an awk to do this:
awk -v m=10 -v OFS="\t" '
FNR%m==1{sum1=0;sum2=0}
{sum1+=$1;sum2+=$2}
FNR%m==0{print sum1/m,sum2/m; lfnr=FNR; next}
END{if(FNR%m) print sum1/(FNR-lfnr),sum2/(FNR-lfnr)}' file
Prints:
0.314611 3.36278e-10
0.296773 3.17211e-10
0.279535 2.98786e-10
Or if you want the same number of decimals you have, you can use printf:
awk -v m=10 -v OFS="\t" '
FNR%m==1{sum1=0;sum2=0}
{sum1+=$1;sum2+=$2}
FNR%m==0{printf("%0.9G%s%0.9G\n",sum1/m,OFS,sum2/m); lfnr=FNR; next}
END{if(FNR%m) printf("%0.9G%s%0.9G\n",sum1/(FNR-lfnr),OFS,sum2/(FNR-lfnr))}' file
Prints:
0.314611284 3.36278E-10
0.296772974 3.172112E-10
0.279535036 2.98786444E-10
Your superpower here is the % modulo operator, which lets you detect every mth step -- in this case every 10th. Your x-ray vision is the FNR awk special variable, which is the number of the line you are reading in the current file.
FNR%10 is always less than 10; when it is 0 you are on the 10th row of a block and it is time to print, and when it is 1 you are on the first row of a block and it is time to reset the sums.
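To see the pattern concretely, here is a quick illustration with a hypothetical 4-line input and m=2:
printf '1\n2\n3\n4\n' | awk -v m=2 '{print FNR, FNR%m}'
Prints:
1 1
2 0
3 1
4 0
Rows where FNR%m is 1 open a block (reset the sums); rows where it is 0 close one (print the means).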

Sum of a column till certain count

I have a file ABC.txt which contains two columns. The first column is a count and the second column is a subscriber number, as below:
1852 919474214491
1558 919475591746
1149 919475594574
1 919466423350
I have a variable in a script holding some numeric value, e.g. Count is 3500.
I want to compare the variable against a running sum of the first column of ABC.txt. If the running sum is less than the variable, write the value in the second column to a separate file (123.txt) and go to the next row; now add 1852 and 1558 and compare against the variable again, and if it is still less, write that row's second column to 123.txt as well. But as soon as the running sum exceeds the variable, stop.
Really easy to do with awk:
$ awk -v count=3500 '{ total += $1 } total >= count { exit } { print $2 }' ABC.txt
919474214491
919475591746
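To write the matching subscribers into 123.txt as the question asks, just redirect the output of the same command:
awk -v count=3500 '{ total += $1 } total >= count { exit } { print $2 }' ABC.txt > 123.txt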

Replacing a field with another value using sed or awk

I have a text file with the following lines, say:
100 200 300
50 120 200
60 500 340
I am writing a script to change the value of any field based on user input. If the user gives the column number ($colno) and the row number ($rowno) in bash, I am using the following code to locate the exact field to be replaced.
awk -v i="$colno" -v j="$rowno" 'NR==j{print $i}' file
So, if the user gives $colno as 1 and $rowno as 3, I can locate the field $1 in row 3 (60).
Now I want to permanently replace this value (60) with some other user-given value ($new_val).
How can I do this using either sed or awk? This bit of code should allow the user to change any field to the new value they specify. The code should not change any other field, even one holding the same value.
Please help me. Thank you in advance.
Just replace the column with the required value:
colno=1
rowno=3
awk -v i="$colno" -v j="$rowno" -v newvalue=20 'NR==j{$i=newvalue}1' file
100 200 300
50 120 200
20 500 340
The part {$i=newvalue}1 sets the value of $i (here $1, i.e. in your current example the 1st column of the 3rd row) to the value held in newvalue. The trailing 1 is an always-true condition whose default action prints each line, reconstructed after the individual replacements.
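To make the change permanent (plain awk cannot edit in place, though GNU awk 4.1+ offers -i inplace), a common sketch is to write to a temporary file and move it back:
awk -v i="$colno" -v j="$rowno" -v newvalue="$new_val" 'NR==j{$i=newvalue}1' file > tmp && mv tmp file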
With GNU sed, to replace the $colno-th run of non-space characters on line $rowno with the value of $newtext:
sed -E "${rowno}s/[^ ]*( *)/${newtext}\1/${colno}" file
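For example, with the values from the question (row 3, column 1, new value 20):
rowno=3
colno=1
newtext=20
sed -E "${rowno}s/[^ ]*( *)/${newtext}\1/${colno}" file
Prints:
100 200 300
50 120 200
20 500 340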

Awk: how to compare two strings in one line

I have a dataset with 20,000 probes in two columns, 21 nt each. From this file I need to extract the lines in which the last nucleotide in the Probe 1 column matches the last nucleotide in the Probe 2 column. So far I have tried awk's substr function, but didn't get the expected outcome. Here is the one-liner I tried:
awk '{if (substr($2,21,1)==substr($4,21,1)){print $0}}'
Another option would be to anchor the last character in columns 2 and 4 (awk '$2~/[A-Z]$/'), but I can't find a way to match the probes in the two columns using a regex. All suggestions and comments will be very much appreciated.
Example of dataset:
Probe 1 Probe 2
4736 GGAGGAAGAGGAGGCGGAGGA A GGAGGACGAGGAGGAGGAGGA
4737 GGAGGAAGAGGAGGGAGAGGG B GGAGGACGAGGAGGAGGAGGG
4738 GGAGGATTTGGCCGGAGAGGC C GGAGGAGGAGGAGGACGAGGT
4739 GGAGGAAGAGGAGGGGGAGGT D GGAGGACGAGGAGGAGGAGGC
4740 GGAGGAAGAGGAGGGGGAGGC E GGAGGAGGAGGACGAGGAGGC
Desired output:
4736 GGAGGAAGAGGAGGCGGAGGA A GGAGGACGAGGAGGAGGAGGA
4737 GGAGGAAGAGGAGGGAGAGGG B GGAGGACGAGGAGGAGGAGGG
4740 GGAGGAAGAGGAGGGGGAGGC E GGAGGAGGAGGACGAGGAGGC
This will filter the input, matching lines where the last character of the 2nd column is equal to the last character of the 4th column:
awk 'substr($2, length($2), 1) == substr($4, length($4), 1)'
What I changed compared to your sample script:
- Moved the if statement out of the { ... } block into a filter (pattern) expression
- Used length($2) and length($4) instead of hardcoding the value 21
- Dropped { print $0 }, as that is the default action for matched lines
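As a side note, awk's substr also has a two-argument form that returns the suffix starting at the given position, so the same filter can be written as:
awk 'substr($2, length($2)) == substr($4, length($4))' file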

Print rows whose first field appears exactly twice in the file

I have a file like this:
91052011868;Export Equi_Fort Postal;EXPORT;23/02/2015;1;0;0
91052011868;Sof_equi_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;19/02/2015;1;0;0
91052011868;Sof_trav_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;19/02/2015;1;0;0
91052151371;Export Trav_faible temoin;EXPORT;12/02/2015;1;0;0
91052182019;Export Deme_fort temoin;EXPORT;24/02/2015;1;0;0
91052199517;Sof_voya_Faible_Email_pm;EMAIL;22/01/2015;1;0;0
91052199517;Sof_voya_Faible_Email_Relance_pm;EMAIL;26/01/2015;1;0;0
91052262558;Sof_deme_faible_Email_am;EMAIL;26/01/2015;1;0;1
91052265940;Sof_trav_Faible_Email_am_%yyyy%%mm%%dd%;EMAIL;13/02/2015;1;0;0
91052265940;Sof_trav_Faible_Email_Relance_am_%yyyy%%mm%%dd%;EMAIL;17/02/2015;1;0;0
91052265940;Sof_voya_Faible_Email_am_%yyyy%%mm%%dd%;EMAIL;13/02/2015;1;0;0
91052265940;Sof_voya_Faible_Email_Relance_am_%yyyy%%mm%%dd%;EMAIL;16/02/2015;1;0;0
91052531428;Export Trav_faible temoin;EXPORT;11/02/2015;1;0;0
91052547697;Export Deme_Faible Postal;EXPORT;27/02/2015;1;0;0
91052562398;Export Deme_faible temoin;EXPORT;18/02/2015;1;0;0
I want to find all the lines where the number of occurrences of the first column's value is greater than 1 but strictly less than 3, i.e. exactly 2:
91052199517;Sof_voya_Faible_Email_pm;EMAIL;22/01/2015;1;0;0
91052199517;Sof_voya_Faible_Email_Relance_pm;EMAIL;26/01/2015;1;0;0
I did the part below but it doesn't work...
sort file | awk 'NR==FNR{a[$1]++;next;}{ if (a[$1] > 0 && a[$1] <1 )print $0;}' file file
Why?
If what you want is to print all those lines whose first field appears twice, you can use this:
$ awk -F";" 'FNR==NR{a[$1]++; next} a[$1]==2' file file
91052199517;Sof_voya_Faible_Email_pm;EMAIL;22/01/2015;1;0;0
91052199517;Sof_voya_Faible_Email_Relance_pm;EMAIL;26/01/2015;1;0;0
This sets the field separator to the semicolon and then reads the file twice:
- the first time to count how many times the 1st field appears (a[$1]++)
- the second time to print the lines matching the condition a[$1]==2, that is, those whose first field appears exactly twice throughout the file.
If you wanted those indexes appearing between 2 and 4 times, you could use the following syntax on the second block:
a[$1]>=2 && a[$1]<=4
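For instance, the complete command for the 2-to-4-occurrences variant would be:
awk -F";" 'FNR==NR{a[$1]++; next} a[$1]>=2 && a[$1]<=4' file file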
Why wasn't your approach working?
Because your condition says:
if (a[$1] > 0 && a[$1] <1 )
which of course will never happen, since a[$1] is an integer and no integer is bigger than 0 and smaller than 1.
Note that my proposed solution uses the same idea, only in a slightly more idiomatic way: there is no need for an explicit if, nor for print $0; printing the matching line is exactly what awk does when a condition evaluates as true.
