I am new to bash programming. I have a file with the following contents:
A B C D E
1 X 0 X 0 0
2 0 X X 0 0
3 0 0 0 0 0
4 X X X X X
Here X means the cell has a value and 0 means it is empty.
Say the user enters B3, which is currently 0; I then need to replace it with X. What is the best way to do this? I need to update this file constantly.
FYI: This is a homework question, so please don't give a direct code answer. Sample code (how to use a particular function, etc.) would be very much appreciated.
Regards,
Newbie Bash Scripter
EDIT:
If I am not mistaken, Bash can read or update a specific column directly. Can the same be done with a row plus a column?
If you can use sed I'll throw out this tidbit:
sed -i "/^2/s/. /X /4" /path/to/matrix_file
Input
A B C D E
1 0 0 0 0 0
2 X 0 0 X 0
3 X X 0 0 X
Output
A B C D E
1 0 0 0 0 0
2 X 0 X X 0
3 X X 0 0 X
Explanation
^2: This restricts sed to only work on lines which begin with 2, i.e. the 2nd row
s: This is the replacement command
Note for the next two, '_' represents a white space
._: This is the pattern to match. The . is a regular expression that matches any character, thus ._ matches "any character followed by a space". Note that this could also be [0X]_ if you are guaranteed that the only two characters you can have are '0' and 'X'
X_: This is the replacement text. In this case we are replacing 'any character followed by a space' as described above with 'X followed by a space'
4: This replaces only the 4th occurrence of the pattern text above, i.e. the 4th column counting the row index as the first.
What would be left for you to do is use variables in place of ^2 and 4, such as ^$row and $col, and then map the letters A - E to 1 - 5 (adding one, because the row index counts as the first occurrence).
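The letter-to-number mapping hinted at above could be sketched as follows. The function name set_cell and the temporary file are hypothetical, chosen only for illustration, and the command prints the result instead of editing in place. It assumes single-character cells and row labels; because the pattern needs a trailing space, it cannot reach the last column (E) as written.

```shell
#!/bin/bash
# Sketch only: set_cell and the sample file are illustrative names, not part of the answer above.
set_cell() {
    local file=$1 col=$2 row=$3
    # printf '%d' "'A" prints the character code 65, so A..E become 1..5.
    local n=$(( $(printf '%d' "'$col") - 64 ))
    # The row label is itself the first "character + space" match, hence n + 1.
    sed "/^$row /s/. /X /$(( n + 1 ))" "$file"
}

matrix=$(mktemp)
printf '%s\n' 'A B C D E' '1 0 0 0 0 0' '2 X 0 0 X 0' '3 X X 0 0 X' > "$matrix"
result=$(set_cell "$matrix" C 2)   # set column C of row 2
echo "$result"
rm -f "$matrix"
```

With the sample input from the answer, the row starting with 2 comes out as "2 X 0 X X 0" and the other lines are unchanged.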
something to get you started
#!/bin/bash
# call it using ./script B 1
# make it executable using "chmod 755 script"
# read input parameters
col=$1
row=$2
# construct below splits on whitespace
while read -r -a line
do
    for i in "${line[@]}"; do
        array=( "${array[@]}" "$i" )
    done
done < m
# Now you have the matrix in a one-dimensional array that can be indexed.
# Let's print it
for i in "${array[@]}"; do
    echo "$i"
done
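One way the flattened array might then be indexed by column letter and row number is sketched below. The offset arithmetic is an assumption based on the sample layout in the question (a header line with 5 cells, then 6 cells per row including the row label); the here-document stands in for the file m.

```shell
#!/bin/bash
# Flatten the sample matrix from the question into one array.
array=()
while read -r -a line; do
    array+=( "${line[@]}" )
done <<'EOF'
A B C D E
1 X 0 X 0 0
2 0 X X 0 0
3 0 0 0 0 0
4 X X X X X
EOF

col=B
row=3
c=$(( $(printf '%d' "'$col") - 64 ))   # A..E -> 1..5
idx=$(( 5 + (row - 1) * 6 + c ))       # skip 5 header cells, then 6 cells per row
cell=${array[idx]}
echo "$col$row = $cell"
```

Assigning to array[idx] would update the cell in memory; writing the array back out in rows of the right width is left open here.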
Here's a starter for you using AWK:
awk -v col=B -v row=3 'BEGIN{getline; for (i=1;i<=NF;i++) cols[$i]=i+1} NR==row+1{print $cols[col]}'
The i+1 and row+1 account for the row heading and column heading, respectively.
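As a quick check of the starter above, it can be fed the sample matrix from the question on standard input. The variable names matrix and cell are only for this illustration.

```shell
#!/bin/bash
matrix='A B C D E
1 X 0 X 0 0
2 0 X X 0 0
3 0 0 0 0 0
4 X X X X X'

# getline in BEGIN consumes the header line, so NR is already 1 when the
# first data row arrives; that is why the row test is NR == row + 1.
cell=$(printf '%s\n' "$matrix" | awk -v col=B -v row=2 '
    BEGIN { getline; for (i = 1; i <= NF; i++) cols[$i] = i + 1 }
    NR == row + 1 { print $cols[col] }')
echo "$cell"
```

For col=B and row=2 this prints X, the second data cell of the row labelled 2.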
Related
We have a simple exercise in which we have to read some values from a file and print the result. The file contains a matrix, and we have to compute f=3*x^2 +4*y+5*z, where x y z are the numbers in each row of a 3x3 array that is supplied through a file.
Let's say that I name the matrix file f1.
How do I read the values of this file into the bash script?
This is what I have done:
#!bin/bash/
while read x y z
do
let f=3*x*x+4*y+*z;
echo -n "f"
done < f1
exit 0
This script shows how to encapsulate an input file definition into the calling script, allowing a variable to be expanded into the resulting file (${MINE}).
If you wanted the contents to include non-expanded variables, you would put quotes around the first "EnDoFiNpUt", on the "cat" line, not around the second instance. That would deposit the exact text into the file.
#!/bin/bash
MINE=8
cat >f1 <<EnDoFiNpUt
9 3 6
7 ${MINE} 2
2 1 4
EnDoFiNpUt
while read x y z
do
let "f=3*x*x+4*y+5*z"
echo -e "\t[${x} ${y} ${z}] f = ${f}"
done <f1
exit 0
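The difference between an unquoted and a quoted here-document delimiter can be seen in a minimal sketch like the following. The delimiter EOF and the variable names expanded and literal are arbitrary choices for the illustration.

```shell
#!/bin/bash
MINE=8

# Unquoted delimiter: ${MINE} is expanded before the text is passed on.
expanded=$(cat <<EOF
7 ${MINE} 2
EOF
)

# Quoted delimiter: the text is passed through exactly as written.
literal=$(cat <<'EOF'
7 ${MINE} 2
EOF
)

echo "$expanded"
echo "$literal"
```

The first echo prints "7 8 2"; the second prints the line with ${MINE} left untouched.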
Suppose I have the following data:
# all the numbers are their own number. I want to reshape exactly as below
0 a
1 b
2 c
0 d
1 e
2 f
0 g
1 h
2 i
...
And I would like to reshape the data such that it is:
0 a d g ...
1 b e h ...
2 c f i ...
Without writing a complex composition. Is this possible using the unix/bash toolkit?
Yes, trivially I can do this inside a language. The idea is NOT TO "just" do that. So if some cat X.csv | rs [magic options] sort of solution (and rs, or the bash reshape command, would be great, except it isn't working here on debian stretch) exists, that is what I am looking for.
Otherwise, an equivalent answer that involves a composition of commands or script is out of scope: already got that, but would rather not have it.
Using GNU datamash:
$ datamash -s -W -g 1 collapse 2 < file
0 a,d,g
1 b,e,h
2 c,f,i
Options:
-s sort
-W use whitespace (spaces or tabs) as delimiters
-g 1 group on the first field
collapse 2 print comma-separated list of values of the second field
To convert the tabs and commas to space characters, pipe the output to tr:
$ datamash -s -W -g 1 collapse 2 < file | tr '\t,' ' '
0 a d g
1 b e h
2 c f i
bash version:
function reshape {
    local index number key
    declare -A result
    while read -r index number; do
        result[$index]+=" $number"
    done
    for key in "${!result[@]}"; do
        echo "$key${result[$key]}"
    done
}
reshape < input
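We just need to make sure the input is in unix format (no carriage returns). Note that bash does not guarantee the iteration order of associative array keys, so the output rows may need sorting.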
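If the order in which keys first appear matters, a comparable awk sketch can record that order explicitly. This is an alternative under the same assumptions (two whitespace-separated fields per line), not the function above; the array names order and row are made up for the example.

```shell
#!/bin/bash
reshaped=$(printf '%s\n' '0 a' '1 b' '2 c' '0 d' '1 e' '2 f' '0 g' '1 h' '2 i' |
    awk '!($1 in row) { order[++n] = $1 }     # remember keys in first-seen order
         { row[$1] = row[$1] " " $2 }          # append the value to its key
         END { for (i = 1; i <= n; i++) print order[i] row[order[i]] }')
echo "$reshaped"
```

On the sample data this prints the three lines "0 a d g", "1 b e h" and "2 c f i".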
I have the following matrix:
0.380451 0.381955 0 0.237594
0.317293 0.362406 0 0.320301
0.261654 0.38797 0 0.350376
0 0 0 1
0 1 0 0
0 0 0 1
0 0.001504 0 0.998496
0.270677 0.35188 0.018045 0.359398
0.36391 0.305263 0 0.330827
0.359398 0.291729 0.037594 0.311278
0.359398 0.276692 0.061654 0.302256
I want to replace only the standalone zeros (not the zeros that are followed by a decimal point) with 0.001. How can I do that with sed or gsub?
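This is not elegant, and not super portable, but it handles your specific example:
sed -e 's=^0 =X =g
s= 0$= X=g
s= 0 = X =g
s= 0 = X =g' data.txt
It assumes that the fields in the input file are separated by exactly one space. The first part looks for "0" at the beginning of the line, the second at the end of the line, and the third finds "0" with spaces on both sides. The third part is applied twice because matches cannot overlap: in a run such as "0 0 0", the space after one zero is consumed by the match, so the following zero is only caught on the second pass. X is a placeholder here; substitute 0.001 for it.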
Any particular reason to use only sed for this? I am sure that a simple awk script could do a better job, and also be more robust.
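Match the whitespace after the zero as part of your pattern (note that this also hits any number that ends in 0, and misses a 0 at the end of a line):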
echo 0 0.001504 0 0.998496 | sed 's/0[\t ]/Z /g'
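A field-based awk variant, along the lines suggested above, sidesteps the separator and end-of-line cases. This is a sketch that assumes a standalone zero is always written exactly as 0 (not 0.0 or 00); the comparison against the string "0" is what keeps values such as 0.001504 untouched.

```shell
#!/bin/bash
fixed=$(printf '%s\n' '0 0 0 1' '0 0.001504 0 0.998496' '0.270677 0.35188 0.018045 0.359398' |
    awk '{ for (i = 1; i <= NF; i++) if ($i == "0") $i = "0.001" } 1')
echo "$fixed"
```

The first two sample lines become "0.001 0.001 0.001 1" and "0.001 0.001504 0.001 0.998496"; the third has no standalone zero and passes through unchanged.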
I have the input file below and need to compute col1*0 + col2*1 + col3*2 for every triplet of 3 columns.
input.txt - All positive numbers, can be decimals, real file has 1000s of columns.
0 0 0 1 0 0
0 1 0 0 0 1
0 0 1 0 0 0
I have the below gawk line that does that:
gawk '{for(i=1;i<=NF;i+=3)x=(x?x FS:"")(($(i+1))+($(i+2)*2));print x;x=y}' input.txt
0 0
1 2
2 0
Additionally, I need to check if 3 numbers are all zeros, if they are all zeros then the conversion should be -9.
Pseudo code:
if($i==0 & $(i+1)==0 & $(i+2)==0) {-9} else {$(i+1)+$(i+2)*2}
#or as all numbers are positive.
if(($i+$(i+1)+$(i+2))==0) {-9} else {$(i+1)+$(i+2)*2}
Expected output:
-9 0
1 2
2 -9
Data description:
This data is output from IMPUTE2 software - a genotype imputation and haplotype phasing program. Rows are SNPs, columns are samples. Every SNP is represented by 3 columns. 3 numbers per SNP with range 0-1 (probability of allele AA AB BB). So in above example we have 3 SNPs and 2 samples. Imputation can also be represented as dosage value, 1 number per SNP with range 0-2. We are trying to convert probability format into dosage format. When IMPUTE2 can't give any probabilities to any of the alleles, it outputs as 0 0 0, then we should convert as no call -9.
You want the sum to be different if the three given columns are 0. For this, you can expand the ternary operator to something like:
gawk '{ for(i=1;i<=NF;i+=3) {
x=$(i+1) + $(i+2)*2; # the sum
res=res (res ? FS : "") ($i==0 && $(i+1)==0 && $(i+2)==0 ?-9:x)
}
print res; res="" # print stored line and empty for next loop
}' file
That is, append the value -9 if all the elements are 0. Otherwise, the calculated x:
res=res (res ? FS : "") ($i==0 && $(i+1)==0 && $(i+2)==0 ?-9:x)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^
if three columns are 0..........|
If all values are positive, the check can be reformatted to just compare if the sum is 0 or not.
($i + $(i+1) + $(i+2)) ? x : -9
Testing with your file apparently works:
$ gawk '{for(i=1;i<=NF;i+=3) {x=$(i+1) + $(i+2)*2; res=res (res ? FS : "") ($i==0 && $(i+1)==0 && $(i+2)==0 ?-9:x)} print res; res=""}' file
-9 0
1 2
2 -9
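For completeness, the sum-based check could be dropped into the same loop roughly as follows. This is a sketch that relies on the stated assumption that all values are non-negative; the separator test uses i > 1 here, which is a small change from the command above.

```shell
#!/bin/bash
dosage=$(printf '%s\n' '0 0 0 1 0 0' '0 1 0 0 0 1' '0 0 1 0 0 0' |
    awk '{ res = ""
           for (i = 1; i <= NF; i += 3) {
               x = $(i+1) + $(i+2) * 2                       # dosage for this triplet
               res = res (i > 1 ? FS : "") (($i + $(i+1) + $(i+2)) ? x : -9)
           }
           print res }')
echo "$dosage"
```

On the three sample lines this gives "-9 0", "1 2" and "2 -9", matching the expected output in the question.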
Another awk one-liner (assuming non-negative input values and exactly two samples, i.e. six columns):
$ awk '{c1=$2+2*$3;c2=$5+2*$6; print c1||$1?c1:-9,c2||$4?c2:-9}' lop
-9 0
1 2
2 -9
I have a file like this
file.txt
0 1 a
1 1 b
2 1 d
3 1 d
4 2 g
5 2 a
6 3 b
7 3 d
8 4 d
9 5 g
10 5 g
.
.
.
I want to reset the row counter in the first column ($1) to 0 whenever the value in the second column ($2) changes, using awk or a bash script.
result
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g
.
.
.
As long as you don't mind a bit of excess memory usage, and the second column is sorted, I think this is the most fun:
awk '{$1=a[$2]+++0;print}' input.txt
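The expression a[$2]+++0 reads as a[$2]++ + 0: the post-increment yields the old per-key count, and adding 0 forces it to be numeric on first use. A spaced-out version of the same idea, run on a few of the sample lines, might look like this (the variable name out is only for the illustration):

```shell
#!/bin/bash
out=$(printf '%s\n' '0 1 a' '1 1 b' '4 2 g' '5 2 a' '6 3 b' |
    awk '{ $1 = a[$2]++ + 0; print }')
echo "$out"
```

Each distinct value of the second column gets its own counter starting at 0, so the first column comes out as 0, 1, 0, 1, 0 for these five lines.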
This awk one-liner seems to work for me:
[ghoti@pc ~]$ awk 'prev!=$2{first=0;prev=$2} {$1=first;first++} 1' input.txt
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g
Let's break apart the script and see what it does.
prev!=$2 {first=0;prev=$2} -- This is what resets your counter. Since the initial state of prev is empty, we reset on the first line of input, which is fine.
{$1=first;first++} -- For every line, set the first field, then increment variable we're using to set the first field.
1 -- this is awk short-hand for "print the line". It's really a condition that always evaluates to "true", and when a condition/statement pair is missing a statement, the statement defaults to "print".
Pretty basic, really.
The one catch of course is that when you change the value of any field in awk, it rewrites the line using whatever field separators are set, which by default is just a space. If you want to adjust this, you can set your OFS variable:
[ghoti@pc ~]$ awk -vOFS=" " 'p!=$2{f=0;p=$2}{$1=f;f++}1' input.txt | head -2
0 1 a
1 1 b
Salt to taste.
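To make the effect of OFS visible, here is a sketch with a tab as the output separator. The tab is an arbitrary choice for the illustration, and the three input lines are taken from the sample data.

```shell
#!/bin/bash
out=$(printf '%s\n' '0 1 a' '1 1 b' '4 2 g' |
    awk -v OFS='\t' 'p != $2 { f = 0; p = $2 } { $1 = f++ } 1')
printf '%s\n' "$out"
```

Because assigning to $1 makes awk rebuild the record, every field on every line is now joined with a tab instead of a space.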
A pure bash solution:
file="/PATH/TO/YOUR/OWN/INPUT/FILE"
count=0
old_trigger=0
while read -r a b c; do
    if ((b == old_trigger)); then
        echo "$((count++)) $b $c"
    else
        count=0
        echo "$((count++)) $b $c"
        old_trigger=$b
    fi
done < "$file"
This solution (IMHO) has the advantage of using a readable algorithm. I like the other answers, but they are not that easy for beginners to follow.
NOTE:
((...)) is an arithmetic command, which returns an exit status of 0 if the expression is nonzero, or 1 if the expression is zero. Also used as a synonym for let, if side effects (assignments) are needed. See http://mywiki.wooledge.org/ArithmeticExpression
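A small sketch of both points in the note: the exit status of ((...)) used as a condition, and the side effect of count++ inside an arithmetic expansion. The variable names verdict and printed are only for the illustration.

```shell
#!/bin/bash
# (( 5 - 5 )) evaluates to 0, so its exit status is 1 and the else branch runs.
if (( 5 - 5 )); then verdict="nonzero"; else verdict="zero"; fi

count=0
printed=$(( count++ ))   # expands to the old value (0), then count becomes 1
echo "$verdict $printed $count"
```

This prints "zero 0 1", which is why echo "$((count++)) ..." in the loop above shows the counter before it is incremented.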
Perl solution:
perl -naE '
$dec = $F[0] if defined $old and $F[1] != $old;
$F[0] -= $dec;
$old = $F[1];
say join "\t", @F[0,1,2];'
$dec is subtracted from the first column each time. When the second column changes (its previous value is stored in $old), $dec increases to set the first column to zero again. The defined condition is needed for the first line to work.