How to overcome an ODE simulation issue in R?

I'm trying to solve, and then optimise the parameters and initial-condition values of, a system of differential equations. However, I can't run the code due to error messages (the code worked fine when FS was given as an equation). Because of the limited data points, I want to optimise the whole system against the E values (calculated using equation F). I've only got the code working up to the LLode function, so the optimisation doesn't take place at all.
I've managed to get past the original error thanks to a forum reply (Lyngbakr); the call below now produces a warning instead:
initSim<-ode(initCond, tspan, hormonesode, params, method="ode45",atol = 1e-10, rtol = 1e-10,hmax=NULL,maxsteps=100000000000000000)
Warning message:
In rk(y, times, func, parms, method = "ode45", ...) :
Number of time steps 1 exceeded maxsteps at t = 0
I would appreciate any hints.
Malgosia
My code looks as follows:
tspan <- c(0,1,2,3,4,5)#,6,7,8,9,16,17,18,19)
E <- c(0.303,0.205,0.205,0.381,0.272,0.188)#,0.317,0.274,0.106,0.161,0.947,2.722,4.701,0.24)
df <- data.frame(tspan, E)  # collect the data in a data frame
names(df) <- c("tspan","E")
require(deSolve)
#initial parameter values
#k1<-1.062169#0.370 7.754
k2<-1.908891#-0.00727284 0.022
#k3<-0.321
k4<-2.14
k5<-10.7
#k6<-1.07
A0<-1.38 #15.47
B0<-0.61 #0.298
C0<-0.28
#F0<-0.303#3.28803757 3.434
#define a weight vector outside LLODE function
wts<-sqrt(sqrt(E))
#combine parameters into a vector
params<-c(k2,k4,k5,A0,B0,C0)
names(params)<-c("k2","k4","k5","A0","B0","C0")
#ode function
hormonesode <- function(t, x, params){
  A <- x[1]
  B <- x[2]
  C <- x[3]
  D <- x[4]
  E <- x[5]
  F <- x[6]
  # k1 <- params[1]
  k2 <- params[1]
  # k3 <- params[3]
  k4 <- params[2]
  k5 <- params[3]
  A0 <- params[4]
  B0 <- params[5]
  C0 <- params[6]
  # P and FS are periodic forcing terms given as truncated Fourier series
  P <- 3.02796 - 3.1867*cos(0.314159*t) - 0.55332*cos(2*0.314159*t) +
    0.362678*cos(3*0.314159*t) + 0.486708*cos(4*0.314159*t) -
    0.10133*cos(5*0.314159*t) - 0.21977*cos(6*0.314159*t) -
    0.08926*cos(7*0.314159*t) + 0.222292*cos(8*0.314159*t) -
    1.05119*sin(0.314159*t) + 0.855633*sin(2*0.314159*t) +
    0.176677*sin(3*0.314159*t) - 0.05658*sin(4*0.314159*t) -
    0.34108*sin(5*0.314159*t) - 0.15718*sin(6*0.314159*t) +
    0.397642*sin(7*0.314159*t) - 0.0986*sin(8*0.314159*t)
  FS <- 0.1944 + 0.002017*cos(0.261799*t) + 0.009603*cos(2*0.261799*t) +
    0.01754*cos(3*0.261799*t) + 0.106208*cos(4*0.261799*t) +
    0.020423*cos(5*0.261799*t) + 0.015417*cos(6*0.261799*t) +
    0.01079*cos(7*0.261799*t) + 0.115042*cos(8*0.261799*t) +
    0.008853*sin(0.261799*t) + 0.013523*sin(2*0.261799*t) +
    0.012254*sin(3*0.261799*t) + 0.026053*sin(4*0.261799*t) +
    0.000957*sin(5*0.261799*t) - 0.001*sin(6*0.261799*t) +
    0.002374*sin(7*0.261799*t) + 0.026775*sin(8*0.261799*t)
  dA <- 1.0621*1/(1 + (1/5*P)^5)*(1/3*(F^10)/(1 + (F)*1/3)^10) - 2.14*A
  dB <- 75*(((A/5)^10)/(1 + (A/5)^10)) - 8.56*B
  dC <- 1.909*FS + 0.321*FS*C - 0.749*C
  dD <- 0.749*C - 0.749*D + 0.214*FS*D^2
  dE <- 0.749*D - 0.749*E + 0.214*B*E^2
  dF <- k2 + k4*D + k5*E - 1.07*F
  output <- c(dA, dB, dC, dD, dE, dF)
  list(output)
}
#Initial conditions
A0<-2500#1.0038#2.794
B0<-105#25.0061#6.13
C0<-0.018#0.02#0.06126
D0<-0
E0<-0
F0<-0.303#3.28803757 3.434
initCond<-c(A0, B0, C0, D0, E0, F0)
initCond
#run ode with initial guesses
initSim<-ode(initCond, tspan, hormonesode, params, method="ode45",atol = 1e-1, rtol = 1e-1,hmax=NULL, maxsteps=100000)
plot(tspan,initSim[,7], type="l", lty="dashed")
points(tspan,E)
initSim
LLode <- function(params){
  A0 <- params[4]
  B0 <- params[5]
  C0 <- params[6]
  D0 <- 0
  E0 <- 0
  F0 <- 0.303
  initCond <- c(A0, B0, C0, D0, E0, F0)
  # Run the ODE
  odeOut <- ode(initCond, tspan, hormonesode, params, method="ode45",
                atol = 1e-1, rtol = 1e-1, hmax = NULL, maxsteps = 100000)
  # Check whether the integration succeeded: "istate" is 2 for a successful
  # 'lsoda' run and 0 for 'ode45'; other integrators may use different codes
  if(attr(odeOut, "istate")[1] != 0){
    cat("Integration possibly failed\n")
    LL <- .Machine$double.xmax*1e-06  # large value to signal failure
  } else {
    y <- odeOut[, 7]                      # measurement variable (simulated F)
    wtDiff1 <- (y - E)*wts                # weighted difference
    LL <- as.numeric(crossprod(wtDiff1))  # sum of squares
  }
  LL
}
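For reference, the objective LLode returns is the weighted sum of squares computed by crossprod(wtDiff1): with weights $w_i = E_i^{1/4}$ (the wts vector defined above),

$$LL = \sum_i \big( w_i \, (y_i - E_i) \big)^2,$$

where $y_i$ is the simulated F at time $t_i$ and $E_i$ the corresponding measured value. This is what optim() and optimx() minimise below.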
# optimise using optim()
MLoptres <- optim(params, LLode, method="Nelder-Mead",
                  control=list(trace=0, maxit=500))
MLoptres
require(optimx)
optxres <- optimx(params, LLode, lower=rep(0,6), upper=rep(200,6),
                  method=c("nmkb","bobyqa"),
                  control=list(usenumDeriv=TRUE, maxit=500, trace=0))
summary(optxres,order=value)
bestpar<-coef(summary(optxres,order=value)[1,])
cat("best parameters:")
print(bestpar)
#dput(optxres,file='includes/C20bestpar.dput')
bpSim<-ode(initCond,tspan,hormonesode,bestpar,method="ode45")
X11()
plot(tspan,initSim[,7],type="l",lty="dashed")
points(tspan,E)
#points(tspan,MLoptres[,]/k,type="l",lty="twodash")
points(tspan,bpSim[,7],type="l")
title(main="Improved fit using optimx")

Related

TensorFlow - directly calling tf.function much faster than calling tf.function returned from wrapper

I am training a VAE (using federated learning, but that is not so important) and wanted to keep the loss and train functions easy to swap out. The initial approach was to have a tf.function as the loss function and a tf.function as the train function, as follows:
@tf.function
def kl_reconstruction_loss(model, model_input, beta):
    x, y = model_input
    mean, logvar = model.encode(x, y)
    z = model.reparameterize(mean, logvar)
    x_logit = model.decode(z, y)
    cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
    reconstruction_loss = tf.reduce_mean(tf.reduce_sum(cross_ent, axis=[1, 2, 3]), axis=0)
    kl_loss = tf.reduce_mean(0.5 * tf.reduce_sum(tf.exp(logvar) + tf.square(mean) - 1. - logvar, axis=-1), axis=0)
    loss = reconstruction_loss + beta * kl_loss
    return loss, kl_loss, reconstruction_loss
@tf.function
def train_fn(model: tf.keras.Model, batch, optimizer, kl_beta):
    """Trains the model on a single batch.

    Args:
        model: The VAE model.
        batch: A batch of inputs [images, labels] for the VAE.
        optimizer: The optimizer to train the model.
        kl_beta: Weighting of the KL loss.

    Returns:
        The loss.
    """
    def vae_loss():
        """Does the forward pass and computes losses for the generator."""
        # N.B. The complete pass must be inside loss() for gradient tracing.
        return kl_reconstruction_loss(model, batch, kl_beta)

    with tf.GradientTape() as tape:
        loss, kl_loss, rc_loss = vae_loss()
    grads = tape.gradient(loss, model.trainable_variables)
    grads_and_vars = zip(grads, model.trainable_variables)
    optimizer.apply_gradients(grads_and_vars)
    return loss
For my dataset this results in an epoch duration of approx. 25 seconds. However, since I call these functions directly in my code, I would have to substitute different ones whenever I want to try out different loss/train functions.
So, alternatively, I followed https://github.com/google-research/federated/tree/master/gans and wrapped the loss function in a class and the train function in another function. Now I have:
class VaeKlReconstructionLossFns(AbstractVaeLossFns):

    @tf.function
    def vae_loss(self, model, model_input, labels, global_round):
        # KL reconstruction loss
        mean, logvar = model.encode(model_input, labels)
        z = model.reparameterize(mean, logvar)
        x_logit = model.decode(z, labels)
        cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=model_input)
        reconstruction_loss = tf.reduce_mean(tf.reduce_sum(cross_ent, axis=[1, 2, 3]), axis=0)
        kl_loss = tf.reduce_mean(0.5 * tf.reduce_sum(tf.exp(logvar) + tf.square(mean) - 1. - logvar, axis=-1), axis=0)
        loss = reconstruction_loss + self._get_beta(global_round) * kl_loss
        if model.losses:
            loss += tf.add_n(model.losses)
        return loss, kl_loss, reconstruction_loss
def create_train_vae_fn(
        vae_loss_fns: vae_losses.AbstractVaeLossFns,
        vae_optimizer: tf.keras.optimizers.Optimizer):
    """Create a function that trains VAE, binding loss and optimizer.

    Args:
        vae_loss_fns: Instance of gan_losses.AbstractVAELossFns interface,
            specifying the VAE training loss.
        vae_optimizer: Optimizer for training the VAE.

    Returns:
        Function that executes one step of VAE training.
    """
    # We check that the optimizer has not been used previously, which ensures
    # that when it is bound the train fn isn't holding onto a different copy
    # of the optimizer variables than the copy that is being exchanged
    # between server and clients.
    if vae_optimizer.variables():
        raise ValueError(
            'Expected vae_optimizer to not have been used previously, but '
            'variables were already initialized.')

    @tf.function
    def train_vae_fn(model: tf.keras.Model,
                     model_inputs,
                     labels,
                     global_round,
                     new_optimizer_state=None):
        """Trains the model on a single batch.

        Args:
            model: The VAE model.
            model_inputs: A batch of inputs (usually images) for the VAE.
            labels: A batch of labels corresponding to the inputs.
            global_round: The current global FL round, for beta calculation.
            new_optimizer_state: A possible optimizer state to overwrite the
                current one with.

        Returns:
            The number of examples trained on.
            The loss.
            The updated optimizer state.
        """
        def vae_loss():
            """Does the forward pass and computes losses for the generator."""
            # N.B. The complete pass must be inside loss() for gradient tracing.
            return vae_loss_fns.vae_loss(model, model_inputs, labels, global_round)

        # Set optimizer vars
        optimizer_state = get_optimizer_state(vae_optimizer)
        if new_optimizer_state is not None:
            # If the optimizer is uninitialised, initialise its variables
            try:
                tf.nest.assert_same_structure(optimizer_state, new_optimizer_state)
            except ValueError:
                initialize_optimizer_vars(vae_optimizer, model)
                optimizer_state = get_optimizer_state(vae_optimizer)
                tf.nest.assert_same_structure(optimizer_state, new_optimizer_state)
            tf.nest.map_structure(lambda a, b: a.assign(b), optimizer_state, new_optimizer_state)

        with tf.GradientTape() as tape:
            loss, kl_loss, rc_loss = vae_loss()
        grads = tape.gradient(loss, model.trainable_variables)
        grads_and_vars = zip(grads, model.trainable_variables)
        vae_optimizer.apply_gradients(grads_and_vars)
        return tf.shape(model_inputs)[0], loss, optimizer_state

    return train_vae_fn
This new formulation takes about 86 seconds per epoch.
I am struggling to understand why the second version performs so much worse than the first one. Does anyone have a good explanation for this?
Thanks in advance!
EDIT: My TensorFlow version is 2.5.0.
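One thing worth ruling out is tf.function retracing: if a wrapped function is called with changing Python values (such as a round counter) rather than Tensors, every new value triggers a fresh trace, which can substantially slow training. A minimal sketch of how to check this, assuming TF >= 2.3 so that experimental_get_tracing_count is available:

import tensorflow as tf

@tf.function
def step(x, round_num):
    return x * tf.cast(round_num, x.dtype)

for r in range(3):
    # Passing the Python int r makes each new value part of the trace
    # signature, so the function is retraced on every call; passing a
    # Tensor (e.g. tf.constant(r)) would reuse a single trace.
    step(tf.constant(1.0), r)
    print("traces so far:", step.experimental_get_tracing_count())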

Robust Standard Errors in lm() using stargazer()

I have read a lot about the pain of replicating Stata's easy robust option in R. I replicated the following approaches: StackExchange and Economic Theory Blog. They work, but the problem I face is that I want to print my results using the stargazer function (which prints the .tex code for LaTeX files).
Here is the illustration to my problem:
reg1 <- lm(rev ~ id + source + listed + country, data = data2_rev)
stargazer(reg1)
This prints the R output as .tex code (with non-robust SEs). If I want to use robust SEs, I can do it with the sandwich package as follows:
vcov <- vcovHC(reg1, "HC1")
If I now use stargazer(vcov), only the output of the vcovHC function is printed, not the regression output itself.
With the lmtest package it is possible to print at least the estimates, but not the observations, R2, adjusted R2, residual standard error, or the F-statistic.
lmtest::coeftest(reg1, vcov. = sandwich::vcovHC(reg1, type = 'HC1'))
This gives the following output:
t test of coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.54923    6.85521 -0.3719 0.710611
id           0.39634    0.12376  3.2026 0.001722 **
source       1.48164    4.20183  0.3526 0.724960
country     -4.00398    4.00256 -1.0004 0.319041
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How can I get an output that includes the following statistics as well?
Residual standard error: 17.43 on 127 degrees of freedom
Multiple R-squared: 0.09676, Adjusted R-squared: 0.07543
F-statistic: 4.535 on 3 and 127 DF, p-value: 0.00469
Did anybody face the same problem and can help me out?
How can I use robust standard errors in the lm function and apply the stargazer function?
You already calculated robust standard errors, and there's an easy way to include them in the stargazer output:
library("sandwich")
library("plm")
library("stargazer")
data("Produc", package = "plm")
# Regression
model <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
             data = Produc,
             index = c("state", "year"),
             method = "pooling")
# Adjust standard errors
cov1 <- vcovHC(model, type = "HC1")
robust_se <- sqrt(diag(cov1))
# Stargazer output (with and without RSE)
stargazer(model, model, type = "text",
          se = list(NULL, robust_se))
Solution found here: https://www.jakeruss.com/cheatsheets/stargazer/#robust-standard-errors-replicating-statas-robust-option
Update: I'm not so much into F-tests. People are discussing those issues, e.g. https://stats.stackexchange.com/questions/93787/f-test-formula-under-robust-standard-error
Following http://www3.grips.ac.jp/~yamanota/Lecture_Note_9_Heteroskedasticity :
"A heteroskedasticity-robust t statistic can be obtained by dividing an OLS estimator by its robust standard error (for zero null hypotheses). The usual F-statistic, however, is invalid. Instead, we need to use the heteroskedasticity-robust Wald statistic."
So perhaps use a Wald statistic here?
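A minimal sketch of such a robust Wald test with lmtest::waldtest, reusing reg1 and data2_rev from the question (the intercept-only model reg0 is introduced here purely for illustration):

library(lmtest)
library(sandwich)
# Robust Wald test of the full model against an intercept-only model;
# this plays the role of the overall F-statistic, which is invalid
# under heteroskedasticity.
reg0 <- lm(rev ~ 1, data = data2_rev)
waldtest(reg0, reg1, vcov = function(x) vcovHC(x, type = "HC1"))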
This is a fairly simple solution using coeftest:
reg1 <- lm(rev ~ id + source + listed + country, data = data2_rev)
cl_robust <- coeftest(reg1, vcov = vcovCL, type = "HC1", cluster = ~country)
se_robust <- cl_robust[, 2]
stargazer(reg1, reg1, cl_robust, se = list(NULL, se_robust, NULL))
Note that I only included cl_robust in the output as a verification that the results are identical.

Error in setting max features parameter in Isolation Forest algorithm using sklearn

I'm trying to train on a dataset with 357 features using the Isolation Forest implementation in sklearn. I can successfully train and get results when the max_features variable is set to 1.0 (the default value).
However, when max_features is set to 2, it gives the following error:
ValueError: Number of features of the model must match the input.
Model n_features is 2 and input n_features is 357
It also gives the same error when the feature count is 1 (int) and not 1.0 (float).
My understanding was that when the feature count is 2 (int), two features should be considered in creating each tree. Is this wrong? How can I change the max_features parameter?
The code is as follows:
from sklearn.ensemble.iforest import IsolationForest

def isolation_forest_imp(dataset):
    estimators = 10
    samples = 100
    features = 2
    contamination = 0.1
    bootstrap = False
    random_state = None
    verbosity = 0
    estimator = IsolationForest(n_estimators=estimators, max_samples=samples,
                                contamination=contamination,
                                max_features=features,
                                bootstrap=bootstrap, random_state=random_state,
                                verbose=verbosity)
    model = estimator.fit(dataset)
In the documentation it states:
max_features : int or float, optional (default=1.0)
    The number of features to draw from X to train each base estimator.
    - If int, then draw `max_features` features.
    - If float, then draw `max_features * X.shape[1]` features.
So, 2 should mean take two features and 1.0 should mean take all of the features, 0.5 take half and so on, from what I understand.
I think this could be a bug, since, taking a look at IsolationForest's fit:
# Isolation Forest inherits from BaseBagging and, when _fit is called,
# BaseBagging takes care of the features correctly
super(IsolationForest, self)._fit(X, y, max_samples,
                                  max_depth=max_depth,
                                  sample_weight=sample_weight)
# However, after _fit, the decision_function is called using X -- the whole
# sample -- not taking max_features into account
self.threshold_ = -sp.stats.scoreatpercentile(
    -self.decision_function(X), 100. * (1. - self.contamination))
then:
# When the decision function calls _validate_X_predict with X unmodified,
# it calls the base estimator's (the decision tree's) _validate_X_predict
# with the whole X
X = self.estimators_[0]._validate_X_predict(X, check_input=True)
...
# from tree.py:
def _validate_X_predict(self, X, check_input):
    """Validate X whenever one tries to predict, apply, predict_proba"""
    if self.tree_ is None:
        raise NotFittedError("Estimator not fitted, "
                             "call `fit` before exploiting the model.")
    if check_input:
        X = check_array(X, dtype=DTYPE, accept_sparse="csr")
        if issparse(X) and (X.indices.dtype != np.intc or
                            X.indptr.dtype != np.intc):
            raise ValueError("No support for np.int64 index based "
                             "sparse matrices")
    # This check fails because X is the original X, not X with max_features applied
    n_features = X.shape[1]
    if self.n_features_ != n_features:
        raise ValueError("Number of features of the model must "
                         "match the input. Model n_features is %s and "
                         "input n_features is %s "
                         % (self.n_features_, n_features))
    return X
So, I am not sure how you can handle this. Maybe figure out the fraction that leads to just the two features you need, even though I am not sure it will work as expected.
Note: I am using scikit-learn v.0.18
Edit: as @Vivek Kumar commented, this is an issue and upgrading to 0.20 should do the trick.
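For completeness, here is a minimal sketch of the two equivalent forms on a fixed version (scikit-learn >= 0.20), using a hypothetical 357-column array standing in for the dataset from the question:

import numpy as np
from sklearn.ensemble import IsolationForest

X = np.random.rand(100, 357)  # hypothetical stand-in for the 357-feature dataset

# int form: draw exactly 2 features for each base estimator
clf_int = IsolationForest(max_features=2).fit(X)

# float form: draw int(max_features * X.shape[1]) features, here also 2
clf_frac = IsolationForest(max_features=2.0 / X.shape[1]).fit(X)

print(clf_int.predict(X)[:5], clf_frac.predict(X)[:5])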

mpi4py: Internal Error: invalid error code 409e0e (Ring ids do not match)

I am coding in Python and using mpi4py to do some optimization in parallel. I am using Ordinary Least Squares, and my data is too large to fit on one processor, so I have a master process that spawns other processes. These child processes each import the section of the data that they work with throughout the optimization.
I am using scipy.optimize.minimize for the optimization: the child processes receive a coefficient guess from the parent process and report the sum of squared errors (SSE) back to the parent, and scipy.optimize.minimize iterates, trying to find a minimum of the SSE. After each iteration of the minimize function, the parent broadcasts the new coefficient guesses to the child processes, which then calculate the SSE again. In the child processes, this algorithm is set up in a while loop. In the parent process, I simply call scipy.optimize.minimize.
On the part that is giving me a problem, I am doing a nested optimization, or an optimization within an optimization. The inner optimization is an OLS regression as described above, and then the outer optimization is minimizing another function that uses the coefficient of the inner optimization (the OLS regression).
So in my parent process, I have two functions that I minimize, and the second function calls on the first and does a new optimization for every iteration of the second function's optimization. The child processes have a nested while loop for those two optimizations.
Hopefully that all makes sense. If more information is needed, please let me know.
Here is the relevant code for the parent process:
comm = MPI.COMM_SELF.Spawn(sys.executable,
                           args=['IVQTparallelSlave_cdf.py'],
                           maxprocs=processes)
# First stage: reg D on Z, X
def OLS(betaguess):
    comm.Bcast([betaguess, MPI.DOUBLE], root=MPI.ROOT)
    SSE = np.array([0], dtype='d')
    comm.Reduce(None, [SSE, MPI.DOUBLE], op=MPI.SUM, root=MPI.ROOT)
    comm.Bcast([np.array([1], 'i'), MPI.INT], root=MPI.ROOT)
    return SSE
# Here is the CDF function.
def CDF(yguess, delta_FS, tau):
    # Calculate W(y) in the slave process
    # Solving the reduced form after every iteration: reg W(y) on Z, X
    comm.Bcast([yguess, MPI.DOUBLE], root=MPI.ROOT)
    betaguess = np.zeros(94).astype('d')
    ###########
    # This calculates the reduced form coefficient
    coeffs_RF = scipy.optimize.minimize(OLS, betaguess, method='Powell')
    # This little block is to get the slave processes to stop
    comm.Bcast([betaguess, MPI.DOUBLE], root=MPI.ROOT)
    SSE = np.array([0], dtype='d')
    comm.Reduce(None, [SSE, MPI.DOUBLE], op=MPI.SUM, root=MPI.ROOT)
    cont = np.array([0], 'i')
    comm.Bcast([cont, MPI.INT], root=MPI.ROOT)
    ###########
    contCDF = np.array([1], 'i')
    comm.Bcast([contCDF, MPI.INT], root=MPI.ROOT)  # This keeps the outer while loop going
    delta_RF = coeffs_RF.x[1]
    return abs(delta_RF/delta_FS - tau)
########### This one finds Y(1) ##############
betaguess = np.zeros(94).astype('d')
######### First Stage: reg D on Z, X #########
coeffs_FS = scipy.optimize.minimize(OLS, betaguess, method='Powell')
print coeffs_FS
# This little block is to get the slave processes' while loops to stop
comm.Bcast([betaguess, MPI.DOUBLE], root=MPI.ROOT)
SSE = np.array([0], dtype='d')
comm.Reduce(None, [SSE, MPI.DOUBLE], op=MPI.SUM, root=MPI.ROOT)
cont = np.array([0], 'i')
comm.Bcast([cont, MPI.INT], root=MPI.ROOT)
delta_FS = coeffs_FS.x[1]
######### CDF Function #########
yguess = np.array([3340], 'd')
CDF1 = lambda yguess: CDF(yguess, delta_FS, tau)
y_minned_1 = scipy.optimize.minimize(CDF1, yguess, method='Powell')
Here is the relevant code for the child processes:
# IVQTparallelSlave_cdf.py
comm = MPI.Comm.Get_parent()
.
.
.
# Importing data. The data is the matrices D and ZX
.
.
.
########### This one finds Y(1) ##############
######### First Stage: reg D on Z, X #########
cont = np.array([1], 'i')
betaguess = np.zeros(94).astype('d')
# This corresponds to 'coeffs_FS = scipy.optimize.minimize(OLS, betaguess, method='Powell')' in the parent process
while cont[0]:
    comm.Bcast([betaguess, MPI.DOUBLE], root=0)
    SSE = np.array(((D - np.dot(ZX, betaguess).reshape(local_n, 1))**2).sum(), 'd')
    comm.Reduce([SSE, MPI.DOUBLE], None, op=MPI.SUM, root=0)
    comm.Bcast([cont, MPI.INT], root=0)
if rank == 0: print '1st Stage OLS regression done'
######### CDF Function #########
cont = np.array([1], 'i')
betaguess = np.zeros(94).astype('d')
contCDF = np.array([1], 'i')
yguess = np.array([0], 'd')
# This corresponds to 'y_minned_1 = scipy.optimize.minimize(CDF1, yguess, method='Powell')'
while contCDF[0]:
    comm.Bcast([yguess, MPI.DOUBLE], root=0)
    # This calculates the reduced form coefficient
    while cont[0]:
        comm.Bcast([betaguess, MPI.DOUBLE], root=0)
        W = 1*(Y <= yguess)*D
        SSE = np.array(((W - np.dot(ZX, betaguess).reshape(local_n, 1))**2).sum(), 'd')
        comm.Reduce([SSE, MPI.DOUBLE], None, op=MPI.SUM, root=0)
        comm.Bcast([cont, MPI.INT], root=0)
        # if rank == 0: print cont
    comm.Bcast([contCDF, MPI.INT], root=0)
My problem is that after one iteration through the outer minimization, it spits out the following error:
Internal Error: invalid error code 409e0e (Ring ids do not match) in MPIR_Bcast_impl:1328
Traceback (most recent call last):
File "IVQTparallelSlave_cdf.py", line 100, in <module>
if rank==0: print 'CDF iteration'
File "Comm.pyx", line 406, in mpi4py.MPI.Comm.Bcast (src/mpi4py.MPI.c:62117)
mpi4py.MPI.Exception: Other MPI error, error stack:
PMPI_Bcast(1478).....: MPI_Bcast(buf=0x2409f50, count=1, MPI_INT, root=0, comm=0x84000005) failed
MPIR_Bcast_impl(1328):
I haven't been able to find any information about this "ring id" error or how to fix it. Help would be much appreciated. Thanks!

How to normalize an image using Octave?

In their paper describing the Viola-Jones object detection framework (Robust Real-Time Face Detection by Viola and Jones), it is said:
All example sub-windows used for training were variance normalized to minimize the effect of different lighting conditions.
My question is "How to implement image normalization in Octave?"
I'm NOT looking for the specific implementation that Viola & Jones used, but a similar one that produces almost the same output. I've been following a lot of Haar-training tutorials (trying to detect a hand) but have not yet been able to output a good detector (XML).
I've tried contacting the authors, but have had no response yet.
I already answered how to do it in general guidelines in this thread.
Here is how to do method 1 (normalizing to zero mean and unit standard deviation) in Octave, demonstrated on a random matrix A; of course it can be applied to any matrix, which is how the picture is represented:
>>A = rand(5,5)
A =
0.078558 0.856690 0.077673 0.038482 0.125593
0.272183 0.091885 0.495691 0.313981 0.198931
0.287203 0.779104 0.301254 0.118286 0.252514
0.508187 0.893055 0.797877 0.668184 0.402121
0.319055 0.245784 0.324384 0.519099 0.352954
>>s = std(A(:))
s = 0.25628
>>u = mean(A(:))
u = 0.37275
>>A_norm = (A - u) / s
A_norm =
-1.147939 1.888350 -1.151395 -1.304320 -0.964411
-0.392411 -1.095939 0.479722 -0.229316 -0.678241
-0.333804 1.585607 -0.278976 -0.992922 -0.469159
0.528481 2.030247 1.658861 1.152795 0.114610
-0.209517 -0.495419 -0.188723 0.571062 -0.077241
In the above you use:
To get the standard deviation of the matrix: s = std(A(:))
To get the mean value of the matrix: u = mean(A(:))
And then the formula A'[i][j] = (A[i][j] - u)/s is applied in its vectorized version: A_norm = (A - u) / s
Normalizing it with vector normalization is also simple:
>>abs = sqrt((A(:))' * (A(:)))
abs = 2.2472
>>A_norm = A / abs
A_norm =
0.034959 0.381229 0.034565 0.017124 0.055889
0.121122 0.040889 0.220583 0.139722 0.088525
0.127806 0.346703 0.134059 0.052637 0.112369
0.226144 0.397411 0.355057 0.297343 0.178945
0.141980 0.109375 0.144351 0.231000 0.157065
In the above:
abs is the length of the vectorized matrix (its Euclidean norm), which is calculated with a vectorized multiplication (A(:)' * A(:) is actually sum(A[i][j]^2)).
Then we use it to normalize the vector so it will be of length 1.
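Equivalently, the length can be obtained with Octave's built-in norm function (the 2-norm of a vector), which collapses the normalization to a one-liner:

>>A_norm = A / norm(A(:))

This matches sqrt(A(:)' * A(:)) above, and also avoids shadowing the built-in abs function.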
