Is there any optimization function in Rcpp - performance

The following is my Rcpp code. I want to minimize the objective function logtpoi(x, theta) with respect to theta in R using 'nlminb', and I have found it to be slow.
I have two questions:
Can anyone improve my Rcpp code? Thank you very much.
Are there any optimization functions in Rcpp? If so, maybe I can use them directly in Rcpp. How would I use them? Thank you very much.
My code:
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends("RcppArmadillo")]]

// [[Rcpp::export]]
List dtpoi0(const IntegerVector& x, const NumericVector& theta){
  // x is a 3-dim vector; theta is a 6-dim parameter vector.
  // Be careful with the order of theta1, ..., theta6.
  double theta1 = theta[0]; double theta2 = theta[1];
  double theta3 = theta[2]; double theta4 = theta[3];
  double theta5 = theta[4]; double theta6 = theta[5];
  int x1 = x[0]; int x2 = x[1]; int x3 = x[2];
  IntegerVector z1 = IntegerVector::create(x1, x2);
  IntegerVector z2 = IntegerVector::create(x1, x3);
  IntegerVector z3 = IntegerVector::create(x2, x3);
  int s1 = min(z1); int s2 = min(z2); int s3 = min(z3);
  arma::imat missy(1, 3, fill::zeros); arma::irowvec ijk = {0, 0, 0};
  for (int i = 0; i <= s1; ++i) {
    for (int j = 0; j <= s2; ++j) {
      for (int k = 0; k <= s3; ++k) {
        if ((i + j <= s1) & (i + k <= s2) & (j + k <= s3)) {
          ijk = {i, j, k};
          missy = join_cols(missy, ijk);
        }
      }
    }
  }
  IntegerMatrix misy = as<IntegerMatrix>(wrap(missy));
  IntegerVector u1 = IntegerVector::create(0);
  IntegerVector u2 = IntegerVector::create(0);
  IntegerVector u3 = IntegerVector::create(0);
  IntegerVector u4 = IntegerVector::create(0);
  IntegerVector u5 = IntegerVector::create(0);
  IntegerVector u6 = IntegerVector::create(0);
  int total = misy.nrow();
  double fvalue = 0;
  NumericVector part1(1); NumericVector part2(1);
  NumericVector part3(1); NumericVector part4(1);
  NumericVector part5(1); NumericVector part6(1);
  for (int l = 1; l < total; ++l) {
    u1 = IntegerVector::create(x1 - misy(l, 0) - misy(l, 1));
    u2 = IntegerVector::create(x2 - misy(l, 0) - misy(l, 2));
    u3 = IntegerVector::create(x3 - misy(l, 1) - misy(l, 2));
    u4 = IntegerVector::create(misy(l, 0));
    u5 = IntegerVector::create(misy(l, 1));
    u6 = IntegerVector::create(misy(l, 2));
    part1 = dpois(u1, theta1);
    part2 = dpois(u2, theta2);
    part3 = dpois(u3, theta3);
    part4 = dpois(u4, theta4);
    part5 = dpois(u5, theta5);
    part6 = dpois(u6, theta6);
    fvalue = fvalue + (part1 * part2 * part3 * part4 * part5 * part6)[0];
  }
  return(List::create(Named("misy") = misy, Named("fvalue") = fvalue));
}

// [[Rcpp::export]]
NumericVector dtpoi(const IntegerMatrix& x, const NumericVector& theta){
  // x is an n*3 matrix, n is the number of observations.
  int n = x.nrow();
  NumericVector density(n);
  for (int i = 0; i < n; ++i){
    density(i) = dtpoi0(x.row(i), theta)["fvalue"];
  }
  return(density);
}

// [[Rcpp::export]]
double logtpoi0(const IntegerMatrix& x, const NumericVector theta){
  // theta must be a 6-dimension parameter.
  double nln = -sum(log(dtpoi(x, theta) + 1e-60));
  if (arma::is_finite(nln)) { nln = nln; } else { nln = -1e10; }
  return(nln);
}

Huge caveat ahead: I don’t really know Armadillo. But I’ve had a stab at it because the code looks interesting.
A few general things:
You don’t need to declare things before you assign them for the first time. In particular, it’s generally not necessary to declare vectors outside a loop if they’re only used inside the loop. Doing so is probably no more efficient than declaring them inside the loop. However, if your code is too slow it makes sense to profile this carefully and test whether that assumption holds.
Many of your declarations are just aliases for vector elements and don’t seem necessary.
Your z{1…3} vectors aren’t necessary. C++ has a std::min function to find the minimum of two elements.
dtpoi0 contains two main loops. Both of these have been heavily modified in my code:
The first loop iterates over many ks that can never be used, due to the internal if that tests whether i + j exceeds s1. By pulling this check into the loop condition of j, we perform fewer k iterations.
Your if uses & instead of &&. Like in R, using && rather than & causes short-circuiting. While this is probably not more efficient in this case, using && is idiomatic, whereas & causes head-scratching (my code uses and which is an alternative way of spelling && in C++; I prefer its readability).
The second loop effectively performs a matrix operation manually. I feel that there should be a way of expressing this purely with matrix operations — but as mentioned I’m not an Armadillo user. Still, my changes attempt to vectorise as much of this operation as possible (if nothing else this makes the code shorter). The dpois inner product is unfortunately still inside a loop.
The logic of logtpoi0 can be made more idiomatic and (IMHO) more readable by using the conditional operator instead of if.
const-correctness is a big deal in C++, since it weeds out accidental modifications. Use const liberally when declaring variables that are not supposed to change.
In terms of efficiency, the biggest hit when calling dtpoi or logtpoi0 is probably the conversion of missy to misy, which causes allocations and memory copies. Only convert to IntegerMatrix when necessary, i.e. when actually returning that value to R. For that reason, I’ve split dtpoi0 into two parts.
Another inefficiency is the fact that the first loop in dtpoi0 grows a matrix by appending rows. That’s a big no-no. However, rewriting the code to avoid this isn’t trivial; one possible approach is sketched below.
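For what it’s worth, here is one hedged, untested idea for avoiding that growth: collect the admissible (i, j, k) triples in a std::vector first and copy them into the imat in a single allocation. The helper name dtpoi0_mat_prealloc is made up and this is only a sketch; the full refactor below keeps the incremental join_cols.
#include <algorithm>
#include <array>
#include <vector>
#include <RcppArmadillo.h>
// [[Rcpp::depends("RcppArmadillo")]]

arma::imat dtpoi0_mat_prealloc(const Rcpp::IntegerVector& x) {
  const int s1 = std::min(x[0], x[1]);
  const int s2 = std::min(x[0], x[2]);
  const int s3 = std::min(x[1], x[2]);
  // gather all admissible triples; (0, 0, 0) is produced by the loop itself
  std::vector<std::array<int, 3>> triples;
  for (int i = 0; i <= s1; ++i)
    for (int j = 0; j <= s2 && i + j <= s1; ++j)
      for (int k = 0; k <= s3 && i + k <= s2 && j + k <= s3; ++k)
        triples.push_back({i, j, k});
  // one allocation instead of repeated join_cols
  arma::imat missy(triples.size(), 3);
  for (arma::uword r = 0; r < missy.n_rows; ++r)
    for (arma::uword c = 0; c < 3; ++c)
      missy(r, c) = triples[r][c];
  return missy;
}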
#include <algorithm>
#include <RcppArmadillo.h>
// [[Rcpp::depends("RcppArmadillo")]]
using namespace Rcpp;
using namespace arma;

imat dtpoi0_mat(const IntegerVector& x) {
  const int s1 = std::min(x[0], x[1]);
  const int s2 = std::min(x[0], x[2]);
  const int s3 = std::min(x[1], x[2]);
  // Start with zero rows: the (0, 0, 0) triple is generated by the loop itself,
  // so it isn't counted twice later when summing over the rows.
  imat missy(0, 3);
  for (int i = 0; i <= s1; ++i) {
    for (int j = 0; j <= s2 and i + j <= s1; ++j) {
      for (int k = 0; k <= s3 and i + k <= s2 and j + k <= s3; ++k) {
        missy = join_cols(missy, irowvec{i, j, k});
      }
    }
  }
  return missy;
}

double dtpoi0_fvalue(const IntegerVector& x, const NumericVector& theta, imat& missy) {
  double fvalue = 0.0;
  const ivec xx = as<ivec>(x);
  missy.each_row([&](irowvec& v) {
    // u = (x1 - m0 - m1, x2 - m0 - m2, x3 - m1 - m2, m0, m1, m2)
    const ivec u(join_cols(xx - v(uvec{0, 0, 1}) - v(uvec{1, 2, 2}), v.t()));
    double prod = 1;
    for (int i = 0; i < u.n_elem; ++i) {
      prod *= R::dpois(u[i], theta[i], 0);
    }
    fvalue += prod;
  });
  return fvalue;
}
double dtpoi0_fvalue(const IntegerVector& x, const NumericVector& theta) {
  imat missy = dtpoi0_mat(x);
  return dtpoi0_fvalue(x, theta, missy);
}

// [[Rcpp::export]]
List dtpoi0(const IntegerVector& x, const NumericVector& theta) {
  imat missy = dtpoi0_mat(x);
  const double fvalue = dtpoi0_fvalue(x, theta, missy);
  return List::create(Named("misy") = as<IntegerMatrix>(wrap(missy)),
                      Named("fvalue") = fvalue);
}

// [[Rcpp::export]]
NumericVector dtpoi(const IntegerMatrix& x, const NumericVector& theta) {
  // x is an n*3 matrix, n is the number of observations.
  int n = x.nrow();
  NumericVector density(n);
  for (int i = 0; i < n; ++i) {
    density(i) = dtpoi0_fvalue(x.row(i), theta);
  }
  return density;
}

// [[Rcpp::export]]
double logtpoi0(const IntegerMatrix& x, const NumericVector theta) {
  // theta must be a 6-dimension parameter.
  const double nln = -sum(log(dtpoi(x, theta) + 1e-60));
  return is_finite(nln) ? nln : -1e10;
}
Important: This compiles, but I can’t test its correctness. It’s entirely possible (even likely!) that my refactor introduced errors. It should therefore only be viewed as a solution sketch, and should by no means be copied and pasted into an application.
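As for the second question: as far as I know, Rcpp itself doesn’t bundle an optimizer, but you can call R’s optimizers (e.g. stats::nlminb or optim) from C++ through Rcpp::Function. Here is a minimal, untested sketch of that pattern; the wrapper name minimize_with_nlminb is made up, and the objective is assumed to be passed in as an R function (e.g. function(theta) logtpoi0(x, theta)):
// standalone sketch, separate source file
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List minimize_with_nlminb(NumericVector start, Function objective) {
  Function nlminb("nlminb");            // looks up stats::nlminb on the R search path
  List result = nlminb(_["start"] = start,
                       _["objective"] = objective);
  return result;
}
Whether this is actually faster than calling nlminb from R directly is doubtful, since the objective evaluation remains the expensive part; packages such as RcppNumerical also expose optimizers usable entirely from C++, but I haven’t tried them here.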

Related

Is this implementation of a feedforward comb filter (FFCF) correct?

I'm trying to implement a feedforward comb filter (for use in a reverb) as described here: https://ccrma.stanford.edu/~jos/pasp/Feedforward_Comb_Filters.html
This is my code:
int delay = 1051;
int arraySize = delay + 1;
int n = 0;
double gain = 0.7;
double* buffer = new double[arraySize];

double doDelay(double x) {
    buffer[n] = x;
    double y = buffer[(n + delay) % arraySize];
    y += x * gain;
    n--;
    if (n < 0) n += arraySize;
    return y;
}

// per-sample processing function (called for every sample)
void processSample(double& sample) {
    sample = doDelay(sample);
}
Aside from it not being the most elegant code, is the application of a feedforward comb filter correct? I suspect I might be missing something.
Thank you.
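For comparison, here is an untested reference sketch (not from the original post) that transcribes the CCRMA difference equation y(n) = b0*x(n) + bM*x(n - M) directly, using a forward-moving write index into a circular buffer; all names are illustrative:
#include <vector>

struct FeedforwardComb {
    std::vector<double> buffer;   // the last M input samples
    std::size_t writePos = 0;
    double b0 = 1.0;              // gain on the current sample
    double bM = 0.7;              // feedforward gain on the delayed sample

    explicit FeedforwardComb(std::size_t delay) : buffer(delay, 0.0) {}

    double process(double x) {
        const double delayed = buffer[writePos];   // x(n - M), written M calls ago
        buffer[writePos] = x;                      // store the current input
        writePos = (writePos + 1) % buffer.size();
        return b0 * x + bM * delayed;              // y(n) = b0*x(n) + bM*x(n - M)
    }
};
In this form the feedforward gain multiplies the delayed sample while the current sample passes through with gain b0, which is how the linked CCRMA page writes the filter.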

how to calculate Otsu threshold in 1D

I'm trying to identify bimodal distributions in my analytical chemistry data. Each data set is a list of 3~70 retention times for a particular compound from the GC-MS. RTs for some compounds are bimodally distributed, where the library searches have assigned the same identity to two or more different features in the data with different RTs. This is quite common for isomers and other compound pairs with very similar mass spectra.
E.g., here's a histogram of RTs for one compound showing a bimodal distribution.
I want to calculate the Otsu threshold to try and define bimodal data (there's also multimodal distributions but one step at a time). I'm struggling to understand the Wikipedia article on the calculations but the text indicates that the threshold can be found by finding the minimum intraclass variance. So I've tried computing this from a list of the RTs as follows:
import statistics

a = list(d['Component RT'])   # d['Component RT'] holds the retention times for one compound
n = len(a)
b = [a.pop(0)]
varA = []
varB = []
for i in range(1, n - 2):
    b.append(a.pop(0))
    varA.append(statistics.stdev(a) ** 2)
    varB.append(statistics.stdev(b) ** 2)
Am I right in thinking that if I plot the sum of the variances for the above data I should be able to identify the Otsu threshold as the minimum?
In this example the threshold is obvious and there are about 35 values to work from. For most compounds there are fewer values (typically <15) and the data may be less well defined. Is this even the right threshold to use? The Wikipedia article on modality indicates a whole bunch of other tests for multimodality.
The result is similar to OpenCV's Otsu thresholding (cv::threshold with THRESH_OTSU).
#include <cmath>
#include <vector>

typedef unsigned char uchar;   // as in OpenCV

uchar OTSU(const std::vector<uchar>& input_vec) {
    // histogram of the input values (assumed to lie in 0-255)
    int count[256];
    for (int i = 0; i < 256; i++)
        count[i] = 0;
    for (std::size_t i = 0; i < input_vec.size(); i++) {
        count[int(input_vec[i])]++;
    }

    int bestThreshold = 0;
    double maxVariance = 0;
    for (int threshold = 0; threshold < 256; threshold++) {
        int n0 = 0, n1 = 0;
        double pixelSum0 = 0, pixelSum1 = 0;
        for (int i = 0; i < threshold; i++) {
            n0 += count[i];
            pixelSum0 += i * count[i];
        }
        for (int i = threshold; i < 256; i++) {
            n1 += count[i];
            pixelSum1 += i * count[i];
        }
        if (n0 == 0 || n1 == 0)        // skip thresholds that leave a class empty
            continue;
        double w0 = double(n0) / input_vec.size();   // class weights
        double w1 = double(n1) / input_vec.size();
        double u0 = pixelSum0 / n0;                  // class means
        double u1 = pixelSum1 / n1;
        double u = u0 * w0 + u1 * w1;                // overall mean
        // between-class variance; Otsu's threshold maximizes this
        double variance = w0 * std::pow(u0 - u, 2) + w1 * std::pow(u1 - u, 2);
        if (variance > maxVariance) {
            maxVariance = variance;
            bestThreshold = threshold;
        }
    }
    return bestThreshold;
}
Ref: https://github.com/1124418652/edge_extract/blob/master/edge_extract/OTSU.cpp
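A note on the two formulations (standard facts about Otsu's method, not from the original answer): the intraclass variance described on Wikipedia is the weighted sum
sigma_w^2(t) = w0(t) * sigma0^2(t) + w1(t) * sigma1^2(t),
where w0 and w1 are the fractions of values below and above the candidate threshold t, not the plain sum of the two class variances. Because the total variance sigma^2 = sigma_w^2(t) + sigma_b^2(t) does not depend on t, minimizing the within-class variance is equivalent to maximizing the between-class variance
sigma_b^2(t) = w0(t) * w1(t) * (mu0(t) - mu1(t))^2,
which is the quantity the code above computes as w0*(u0 - u)^2 + w1*(u1 - u)^2.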

Broken Merge Sort

Good morning, Stack Overflow. You guys helped me out on an earlier assignment, and I'm hoping to get a little help on this one.
It's a programming assignment relating to sorts, one part of which is to write a working implementation of merge sort.
I adapted my solution from the pseudocode the professor used in class, but I'm getting an annoying segfault at the indicated location.
This method is sorting an array of structs, with data_t defined as a pointer to the struct.
The struct definition:
typedef struct {
    int id;
    int salary;
} employee_t;

typedef employee_t* data_t;
They're being sorted by salary, which is a randomly generated number from 40,000 to 90,000.
Here's the actual method
void merge_sort(data_t items[], size_t n)
{
    if (n < 2)
        return;

    size_t mid = (n / 2);
    data_t *left = malloc(sizeof(data_t) * mid);
    data_t *right = malloc(sizeof(data_t) * (n - mid));

    for (int y = 0; y < mid; y++)
    {
        left[y] = items[y];
    }
    for (int z = mid; z < n; z++)
    {
        right[z] = items[z];
    }

    merge_sort(left, mid);
    merge_sort(right, (n - mid));

    size_t l, r, i;
    l = 0;
    r = 0;
    for (i = 0; i < (n - 1); i++)
    {
        if ((l < mid) && ((r >= (n - mid)) || ((left[l]->salary) <= (right[r]->salary))))
        {
            items[i] = left[l++];
        }
        else
        {
            items[i] = right[r++];
        }
    }
    free(left);
    free(right);
}
Note that I haven't made it as far as the end, so the array frees might be incorrectly located.
The segfault always occurs when I try to access right[r]->salary, so I'm assuming this is related to a null pointer, or similar. However, I'm extremely new to sorting, and I don't know exactly where to properly implement a check.
Any advice is appreciated greatly.
At first glance there's this fix: in the loop that copies into right, z runs from mid to n - 1, but right only has n - mid elements, so right[z] writes past the end of that buffer and leaves its first elements uninitialized. That uninitialized pointer is what blows up later when right[r]->salary is dereferenced. Index right relative to mid instead:
for (int z = mid; z < n; z++)
{
    right[z - mid] = items[z];
}

Problems with MPFIT and user-defined derivatives

I am trying to use the optimization library MPFIT to fit a Gaussian function to my data. Actually this code is part of the example code that comes with the MPFIT library. The original code automatically calculates internally the derivatives of the function numerically and it works perfectly. The MPFIT library also allows the user to provide the function derivatives. This is where the problem starts. Here is the function used to calculate the residuals and the function's first order partial derivatives.
int gaussfunc(int m, int n, double *p, double *dy, double **derivs, void *vars)
{
    int i, j;
    struct vars_struct *v = (struct vars_struct *) vars;
    double *x, *y, *ey;
    double a = p[1];
    double b = p[2];
    double c = p[3];
    double d = p[0];

    x = v->x;
    y = v->y;
    ey = v->ey;

    for (i = 0; i < m; i++)
    {
        dy[i] = (y[i] - (a*exp(-(x[i]-b)*(x[i]-b)/(2*c*c)) + d)) / ey[i];
    }

    // the code below this point is the code I added to calculate the derivatives.
    if (derivs)
    {
        for (j = 0; j < n; j++)
        {
            if (derivs[j])
            {
                for (i = 0; i < m; i++)
                {
                    double da = exp(-(x[i]-b)*(x[i]-b)/(2*c*c));
                    double db = a * exp(-(x[i]-b)*(x[i]-b)/(2*c*c)) * (x[i]-b)/(c*c);
                    double dc = a * exp(-(x[i]-b)*(x[i]-b)/(2*c*c)) * (x[i]-b)*(x[i]-b)/(c*c*c);
                    double dd = 1;
                    double foo;
                    if (j == 0) foo = dd;
                    else if (j == 1) foo = da;
                    else if (j == 2) foo = db;
                    else if (j == 3) foo = dc;
                    derivs[j][i] = foo;
                }
            }
        }
    }
    return 0;
}
The code above the line 'if (derivs)' is the original one but refactored and the code below that is my code for computing the derivatives. I believe my maths are correct, and they are verified by https://math.stackexchange.com/questions/716545/calculating-the-first-order-partial-derivatives-of-the-gaussian-function/716553
Has anyone encountered the same problem while using MPFIT with user-defined derivatives?
Thank you.
Because the calculation of the residuals is (DATA-MODEL)/SIGMA, the derivatives should be:
[-d(MODEL)/d(PARAM)]/sigma.
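Spelled out (this is just the chain rule, not something stated in the original answer): with the residuals defined as r_i = (y_i - f(x_i; p)) / ey_i, each Jacobian entry is
d r_i / d p_j = -(d f(x_i; p) / d p_j) / ey_i,
so the unscaled partials da, db, dc, dd computed in the question need both a minus sign and a division by ey[i].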
So, this line:
derivs[j][i] = foo;
becomes:
derivs[j][i] = -foo/ey[i];
Problem solved! Thanks!

SSE floating point dot product for dummies

I have read many SO questions about SSE/SIMD (e.g., Getting started with SSE), but I'm still confused by all of it. All I want is a dot product between two double precision floating-point vectors, in C (C99 FWIW). I'm using GCC.
Can someone post a simple and complete example, including how to convert double vectors to the SSE types and back again?
[Edit 2012-10-08]
Here's some SSE2 code I managed to cobble together, critiques?
#include <emmintrin.h>
#include <stdlib.h>   /* calloc, free */

double dotprod(double *restrict a, double *restrict b, int n)
{
    __m128d aa, bb, cc, ss;
    int i, n1 = n - 1;
    double *s = calloc(2, sizeof(double));
    double s2 = 0;

    ss = _mm_set1_pd(0);
    for (i = 0; i < n1; i += 2)
    {
        aa = _mm_load_pd(a + i);
        bb = _mm_load_pd(b + i);
        cc = _mm_mul_pd(aa, bb);
        ss = _mm_add_pd(ss, cc);
    }
    _mm_store_pd(s, ss);
    s2 = s[0] + s[1];
    if (i < n)
        s2 += a[i] * b[i];
    free(s);
    return s2;
}
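For reference, here is an untested variant (not from the original post, function name made up) that uses the unaligned load _mm_loadu_pd, so the caller need not guarantee 16-byte alignment, and sums the two lanes in registers instead of going through a heap-allocated temporary:
#include <emmintrin.h>

double dotprod_u(const double *a, const double *b, int n)
{
    __m128d ss = _mm_setzero_pd();
    int i;
    for (i = 0; i + 1 < n; i += 2)
    {
        __m128d aa = _mm_loadu_pd(a + i);   /* no 16-byte alignment required */
        __m128d bb = _mm_loadu_pd(b + i);
        ss = _mm_add_pd(ss, _mm_mul_pd(aa, bb));
    }
    /* horizontal sum: lower lane + upper lane */
    double s2 = _mm_cvtsd_f64(_mm_add_sd(ss, _mm_unpackhi_pd(ss, ss)));
    if (i < n)                              /* odd-length tail */
        s2 += a[i] * b[i];
    return s2;
}
If the arrays are known to be 16-byte aligned, the original _mm_load_pd is fine; the main functional difference here is the relaxed alignment requirement and avoiding calloc/free on every call.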
