Let's say I have a data set of {10, 20, 30}. Its mean is 20 and its (population) variance is 66.667. Is there a formula that lets me calculate the new variance if I were to remove 10 from the data set, turning it into {20, 30}?
This is similar to https://math.stackexchange.com/questions/3112650/formula-to-recalculate-variance-after-removing-a-value-and-adding-another-one-gi, which deals with the case where a value is replaced. https://math.stackexchange.com/questions/775391/can-i-calculate-the-new-standard-deviation-when-adding-a-value-without-knowing-t is also similar, except that it deals with adding a value instead of removing one. "Removing a prior sample while using Welford's method for computing single pass variance" deals with removing a sample, but I cannot figure out how to modify it for the population variance.
To compute the mean and variance we need three quantities:
N - number of items
Sx - sum of items
Sxx - sum of items squared
From these values we can compute the mean and the (population) variance:
Mean = Sx / N
Variance = Sxx / N - Sx * Sx / N / N
In your case
items = {10, 20, 30}
N = 3
Sx = 60 = 10 + 20 + 30
Sxx = 1400 = 100 + 400 + 900 = 10 * 10 + 20 * 20 + 30 * 30
Mean = 60 / 3 = 20
Variance = 1400 / 3 - 60 * 60 / 3 / 3 = 66.666667
If you want to remove an item, just update N, Sx, Sxx values and compute a new variance:
item = 10
N' = N - 1 = 3 - 1 = 2
Sx' = Sx - item = 60 - 10 = 50
Sxx' = Sxx - item * item = 1400 - 10 * 10 = 1300
Mean' = Sx' / N' = 50 / 2 = 25
Variance' = Sxx' / N' - Sx' * Sx' / N' / N' = 1300 / 2 - 50 * 50 / 2 / 2 = 25
So if you remove item = 10 the new mean and variance will be
Mean' = 25
Variance' = 25
Let's say I have the following 3D discretized space, in which the indices of the samples/nodes are sequential, as shown in the picture.
Now consider only the horizontal middle layer.
My objective is to find a programmatic, iterative rule (or rules) that lets me traverse a spiral over the middle layer, starting from node 254, as shown in the image. The spiral can look like the one in the image or be similar, and it can start in any direction:
As you can see in the picture, the yellow crosses show the nodes to be explored. In the first lap these nodes are consecutive, while in the second they are separated by one node, and so on.
I started to solve the problem as follows (pseudocode):
I considered size(y) = y = 13
size(z) = z = 3
Lap 1:
254 - z * y = 215
254 - z * (y + 1) = 212
254 - z = 251
254 + z * (y - 1) = 290
254 + z * y = 293
254 + z * (y + 1) = 296
254 + z = 257
254 - z * (y - 1) = 218
Lap 2:
254 - 3 * z * y = 137
254 - 3 * z * (y + 2/3) = 131
…
But I think there may be a simpler, more general rule.
Each direction has a constant index increment:
const int dx = 39;
const int dy = 3;
const int dz = 1;
So to make a spiral, you start from the start index and step i times in the current direction, rotate by 90 degrees and do the same again, then increment i and repeat until the desired size is reached.
You should also add range checking so the spiral does not go outside your array, as that would break things. Check the actual x,y,z coordinates: either compute them in parallel with ix, or infer them from ix using modular arithmetic, for example (C++):
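#include <cstdio>

int main()
{
    const int dx = 39;  // index step along x (13*3)
    const int dy = 3;   // index step along y (3)
    const int dz = 1;   // index step along z (not needed within one layer)
    int cw[4]={-dx,-dy,+dx,+dy}; // CW rotation
    int ix=254;  // start point (center of spiral), 1-based index
    int dir=0;   // direction cw[dir]
    int n=5;     // size
    int i,j,k,x,y,z,a; // temp
    for (k=0,i=1;i<=n;i+=k,k^=1,dir++,dir&=3)  // i grows by 1 after every 2nd turn
     for (j=1;j<=i;j++)
        {
        a=ix-1;                  // convert 1-based ix to 0-based
        z=a% 3; a/= 3;           // 3 is z-resolution
        y=a%13; a/=13;           // 13 is y-resolution
        x=a;
        // note: decoding from ix cannot detect stepping off a y edge (it wraps
        // into the next x row), so for large n track x,y in parallel instead
        if ((x>=0)&&(x<13)&&(y>=0)&&(y<13)&&(z>=0)&&(z<3))
            {
            // here use point ix
            std::printf("%i (%i,%i,%i) %i\n",ix,x,y,z,i);
            }
        ix+=cw[dir];
        }
    return 0;
}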
producing this output
ix x,y,z i
254 (6,6,1) 1
215 (5,6,1) 1
212 (5,5,1) 2
251 (6,5,1) 2
290 (7,5,1) 2
293 (7,6,1) 2
296 (7,7,1) 3
257 (6,7,1) 3
218 (5,7,1) 3
179 (4,7,1) 3
176 (4,6,1) 3
173 (4,5,1) 3
170 (4,4,1) 4
209 (5,4,1) 4
248 (6,4,1) 4
287 (7,4,1) 4
326 (8,4,1) 4
329 (8,5,1) 4
332 (8,6,1) 4
335 (8,7,1) 4
338 (8,8,1) 5
299 (7,8,1) 5
260 (6,8,1) 5
221 (5,8,1) 5
182 (4,8,1) 5
143 (3,8,1) 5
140 (3,7,1) 5
137 (3,6,1) 5
134 (3,5,1) 5
131 (3,4,1) 5
If you want a CCW spiral, either reverse the order of cw[] or use dir-- instead of dir++.
If you want an adjustable screw width (spacing between laps), increment i by that width instead of by one.
Based on @Spektre's answer, this code worked for me:
#include <iostream>

int main()
{
    const int x_res = 13;
    const int y_res = 13;
    const int z_res = 3;
    const int dx = 39;  // x step = y_res * z_res
    const int dy = 3;   // y step = z_res
    int cw[4]={-dx,-dy,+dx,+dy}; // CW rotation
    int ix=254;  // start point (center of spiral)
    int dir=0;   // direction cw[dir]
    int n=30;    // size (maximum leg length)
    int i,j,k;
    std::cout << ix << std::endl;
    // first "lap" (consecutive nodes)
    for (k=0,i=1;i<=2;i+=k,k^=1,dir++,dir&=3)
        for (j=1;j<=i;j++)
        {
            ix+=cw[dir];
            std::cout << ix << std::endl;
        }
    i-=1;
    int width = 2;       // screw width: leg length grows by this every 2nd turn
    i+=width;
    int dist = 1;        // node separation: print every dist-th node
    int node_count = 0;  // nodes walked since the last printed node
    for (;i<=n;i+=k,k^=width,dir++,dir&=3)
    {
        if (dir==1)      // a new lap starts: spread the nodes further apart
        {
            dist+=1;
        }
        for (j=1;j<=i;j++)
        {
            ix+=cw[dir];
            node_count +=1;
            // note: this checks only the overall index range, not each axis
            if ((0 < ix) && (ix <= x_res*y_res*z_res))
            {
                if (node_count == dist)
                {
                    std::cout << ix << std::endl;
                    node_count = 0;
                }
            }
            else return 0;  // left the index range: stop
        }
    }
    return 0;
}
with this output:
254 215 212 251 290 293 296 257 218 179 140 134 128 206 284 362 368 374 380 302
224 146 68 59 50 83 200 317 434 443 452 461 386 269 152 35
I'm getting up to speed on a high-level language for mixed-integer linear programs (MILPs). The language is AMPL (A Mathematical Programming Language).
Chapter 4, page 65, Figure 4-7 shows the following syntax:
set PROD := bands coils plate ;
However, Chapter 5, page 74, shows the following syntax:
set PROD = {"bands", "coils", "plate"};
Can anyone please explain this difference in syntax?
I put the latter into a *.dat file, and AMPL complains "expected ; ( : or symbol" at the {. Is this just a mistake in the manual?
Thanks.
The syntax in Chapter 4 --
set PROD := bands coils plate;
-- is used in data files, while the syntax in Chapter 5 --
set PROD = {"bands", "coils", "plate"};
-- is used in model files. It's a little weird (IMO) that the syntax for sets is different in model and data files, but it is. For another example of this difference, see this question and answer.
Complete working example code modified from AMPL manual
Added by the original poster of the question.
dietu.mod:
# dietu.mod
#----------
# set MINREQ; # nutrients with minimum requirements
# set MAXREQ; # nutrients with maximum requirements
set MINREQ = {"A", "B1", "B2", "C", "CAL"};
set MAXREQ = {"A", "NA", "CAL"};
set NUTR = MINREQ union MAXREQ; # nutrients
set FOOD; # foods
param cost {FOOD} > 0;
param f_min {FOOD} >= 0;
param f_max {j in FOOD} >= f_min[j];
param n_min {MINREQ} >= 0;
param n_max {MAXREQ} >= 0;
param amt {NUTR,FOOD} >= 0;
var Buy {j in FOOD} >= f_min[j], <= f_max[j];
minimize Total_Cost: sum {j in FOOD} cost[j] * Buy[j];
subject to Diet_Min {i in MINREQ}:
sum {j in FOOD} amt[i,j] * Buy[j] >= n_min[i];
subject to Diet_Max {i in MAXREQ}:
sum {j in FOOD} amt[i,j] * Buy[j] <= n_max[i];
The explicit definitions of the sets MINREQ and MAXREQ and their members are taken from the *.dat file below (where those definitions have been commented out). Matlab users, note above that in a model file you need commas between the members of a set.
dietu.dat:
# dietu.dat
#----------
data;
# set MINREQ := A B1 B2 C CAL ;
# set MAXREQ := A NA CAL ;
set FOOD := BEEF CHK FISH HAM MCH MTL SPG TUR ;
param: cost f_min f_max :=
BEEF 3.19 2 10
CHK 2.59 2 10
FISH 2.29 2 10
HAM 2.89 2 10
MCH 1.89 2 10
MTL 1.99 2 10
SPG 1.99 2 10
TUR 2.49 2 10 ;
param: n_min n_max :=
A 700 20000
C 700 .
B1 0 .
B2 0 .
NA . 50000
CAL 16000 24000 ;
param amt (tr): A C B1 B2 NA CAL :=
BEEF 60 20 10 15 938 295
CHK 8 0 20 20 2180 770
FISH 8 10 15 10 945 440
HAM 40 40 35 10 278 430
MCH 15 35 15 15 1182 315
MTL 70 30 15 15 896 400
SPG 25 50 25 15 1329 370
TUR 60 20 15 10 1397 450 ;
Solve the model using the following at the AMPL prompt:
reset data;
reset;
model dietu.mod;
data dietu.dat;
solve;
I have a vector A, that is
A = [300; 165; 150; 150; 400; 300; 80; 250; 165; 80; 200]
I am trying to split the elements of A into disjoint vectors, so that the elements of each vector sum to a value as close as possible to 400 and every element of A is included in exactly one vector.
For example, 400 on its own is already 400, so it is the first vector, with no slack.
Another would be the vector [250 150]; its sum is 400.
Another two can be two copies of the vector [300 80]; each sums to 380, leaving a slack of 20.
Another would be [165 165], which sums to 330, a slack of 70. The last one would be [200 150], with a slack of 50. The total slack is 20+20+70+50=160.
I'm trying to find a heuristic or an algorithm (not a programming model) that would minimize the slack. I'm coding in Matlab.
You could try something like this:
v = [300; 165; 150; 150; 400; 300; 80; 250; 165; 80; 200];
binarystr = dec2bin(1:(2^(length(v))-1));
bincell = mat2cell(binarystr,ones(size(binarystr ,1),1),ones(size(binarystr ,2),1));
bin = cellfun(@(x) str2double(x),bincell);
Now you can multiply to find all the combinations:
comb = bin*v;
Find the combination whose sum is closest to the target:
target = 400;
[val,index] = min(abs(comb-target));
If you want to know which combination it was, you can look up its indices:
idxs = find(bin(index,:));
and the values are:
disp(idxs)
disp(v(idxs))
Hope this helps.
So I thought this was a very interesting problem and started it at work (I hope my boss won't find out), but I am missing a part. The code is pretty much horrible, but I wanted to show the concept I guess.
A = [300; 165; 150; 150; 400; 300; 80; 250; 165; 80; 200];
P = (1 - (sum(A)/400 - floor(sum(A)/400))) * 400; % minimum slack to be achieved
nGroups = floor(sum(A)/400) + 1;
% Round 1
G1 = zeros(nGroups, 3);
for t = 1:nGroups
    if size(A,1) > 1
        % single element
        [F, indF] = min(abs(A - 400));
        % double combination
        D = nchoosek(A, 2);
        [F2, indF2] = min(abs(sum(D,2) - 400));
        % triple combination (skipped when fewer than 3 elements remain)
        F3 = Inf;
        if size(A,1) > 2
            T = nchoosek(A, 3);
            [F3, indF3] = min(abs(sum(T,2) - 400));
        end
        % remove the best single, pair or triple from A
        [R, removeInd] = min([F, F2, F3]);
        if removeInd == 1
            G1(t,1) = A(indF);
            A(indF) = [];
        elseif removeInd == 2
            G1(t,1:2) = D(indF2,:);
            % intersect returns unique values, so a pair of equal values
            % (e.g. 165 165) would remove only one copy from A
            [~, idx] = intersect(A, G1(t,:));
            A(idx) = [];
        else
            G1(t,:) = T(indF3,:);
            [~, idx] = intersect(A, G1(t,:));
            A(idx) = [];
        end
    elseif size(A,1) == 1
        G1(t,1) = A;
        A = [];
    end
end
The results:
400   0   0
150 250   0
165 150  80
300  80   0
165 200   0
300   0   0
The reason the results were wrong is that I searched for subsets of length 1, 2 and 3. Length 4 is not useful here, since any four elements already sum to at least 80 + 80 + 150 + 150 = 460 (but you can include it anyway). If I switch to subsets of length 1 and 2, I get the right answer. So I think the step I am missing is deciding how long my subsets can be.
Results when the maximum subset length is set to 2:
400   0   0
150 250   0
300  80   0
300  80   0
165 200   0
165 150   0
All you have to do is comment out the triple combination and change the line [R removeInd] = min([F,F2,F3]); to [R removeInd] = min([F,F2]); (i.e. without F3).
I am attempting to reclassify continuous data into categorical data using Matlab. The following script takes a 4-band (red, green, blue, NIR) aerial image and calculates the normalized difference vegetation index (NDVI, an index that highlights healthy green vegetation). The script then rescales the values from [-1, 1] to [0, 255]. This rescaled matrix is what I am trying to reclassify in the third section of the script (%% Reclassify Imag1 matrix). I am using conditional statements to perform the reclassification, although this may be the wrong approach: the reclassification step has no apparent effect.
How can I reclassify continuous values (0 - 255) to categorical values (1, 2, 3, 4) on a cell by cell basis?
file = 'F:\path\to\naip\image\4112107_ne.tif';
[Z R] = geotiffread(file);
outputdir = 'F:\temp\';
%% Make NDVI calculations
NIR = im2single(Z(:,:,4));
red = im2single(Z(:,:,3));
ndvi = (NIR - red) ./ (NIR + red);
ndvi = double(ndvi);
%% Stretch NDVI to 0-255 and convert to 8-bit unsigned integer
ndvi = floor((ndvi + 1) * 128); % [-1 1] -> [0 256]
ndvi(ndvi < 0) = 0; % not really necessary, just in case & for symmetry
ndvi(ndvi > 255) = 255; % in case the original value was exactly 1
Imag1 = uint8(ndvi);
%% Reclassify Imag1 matrix
if (150 <= Imag1)
Imag1 = 1;
elseif (150 > Imag1) & (140 < Imag1)
Imag1 = 2;
elseif (140 > Imag1) & (130 < Imag1)
Imag1 = 3;
elseif (130 >= Imag1)
Imag1 = 4;
end
%% Write the results to disk
tiffdata = geotiffinfo(file);
outfilename = [outputdir 'reclass_ndvi' '.tif'];
geotiffwrite(outfilename, Imag1, R, 'GeoKeyDirectoryTag', tiffdata.GeoTIFFTags.GeoKeyDirectoryTag)
disp('Processing complete')
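Try this. In Matlab, an if condition on an array is true only if every element is true, so for an image with mixed values none of your if/elseif branches run (and if one did, Imag1 = 1 would replace the whole matrix with a scalar). Logical indexing applies each rule cell by cell instead: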
Imag1 = [ 62 41 169 118 210;
133 158 96 149 110;
211 200 84 194 29;
209 16 15 146 28;
95 144 13 249 170];
Imag1(find(Imag1 <= 130)) = 4;
Imag1(find(Imag1 >= 150)) = 1;
Imag1(find(Imag1 > 140)) = 2;
Imag1(find(Imag1 > 130)) = 3;
Result (the input matrix, then the reclassified one):
Imag1 =
62 41 169 118 210
133 158 96 149 110
211 200 84 194 29
209 16 15 146 28
95 144 13 249 170
Imag1 =
4 4 1 4 1
3 1 4 2 4
1 1 4 1 4
1 4 4 2 4
4 2 4 1 1
I can go into the logic in detail if you like, but I wanted to confirm that this gives your expected results first.
Update: based on comments on the follow-up question, here is a version that drops the unnecessary find and is more robust, because it no longer depends on the order in which the rules are applied:
Imag2 = zeros(size(Imag1));
Imag2(Imag1 >= 150) = 1;
Imag2((Imag1 > 140) & (Imag1 < 150)) = 2;
Imag2((Imag1 > 130) & (Imag1 < 141)) = 3;
Imag2(Imag1 <= 130) = 4;
Note that the results are now in Imag2 instead of overwriting Imag1.