I need some quick how-to advice. The following scenario is based on the C API already available in my 64-bit MonetDBLite build; the intention is to use it with some ad hoc C functions.
Short: how can I achieve or simulate the following scenario:
update aTable set a,b,c = func(x,y,z,…)
Long version: many algorithms return more than one result; multiple regression, for instance:
bool m_regression(IN const double **data, IN const int cols, IN const int rows, OUT double *fit_values, OUT double *residuals, OUT double *std_residuals, OUT double &p_value);
To minimize data transfer between MonetDB and a computationally heavy function, all of those results are generated in one step. The question is how I can transfer them back at once, minimizing computation time and memory traffic between MonetDB and the external C/C++ (/R/Python) function.
My first thought is something like this:
1. update aTable set dummy = func_compute(x,y,z,…)
where dummy is a temporary __int64 column and func_compute computes all the necessary outputs and stores them behind a pointer returned through dummy. To make sure there is no issue with constant estimation, the first returned value in the array will be the real pointer and the rest just incremented values, dummy + i.
2. update aTable set a = func_ret(dummy, 1), b= func_ret (dummy, 2), c= func_ret (dummy, 3) [, dummy=func_free(dummy)];
Assuming func_ret receives dummy in the same order it was returned by the first call, I would just copy the prepared result into the provided storage. In case the order is not preserved, I will need an extra step to get the minimum (the real pointer) and then use the offset of the current value to look up into my array.
// Case 1: order preserved, the column index is passed as the second argument
__int64 real_dummy = __inputs[0][0];
double *my_pointer_data = (double *) (real_dummy + __inputs[1][0] * sizeof(double) * row_count);
memcpy(__outputs[0], my_pointer_data, sizeof(double) * row_count);
// Case 2: order not preserved, recover the base pointer from the minimum first
__int64 real_dummy = minimum(__inputs[0]);
double *my_pointer_data = (double *) (real_dummy + __inputs[0][1] * sizeof(double) * row_count);
for (int i = 0; i < row_count; i++)
    __outputs[0][i] = my_pointer_data[__inputs[0][i] - real_dummy];
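Going back to step 1, a hypothetical func_compute could be sketched roughly as below; the names (m_regression, row_count, the dummy output column) mirror the pseudocode above rather than any actual MonetDBLite API, so treat it purely as an illustration.
// Hypothetical sketch only: compute everything once, park the results in one
// heap buffer, and smuggle its address out through the dummy column.
void func_compute(const double **data, int cols, int row_count, __int64 *dummy_out)
{
    double *buf = new double[3 * (size_t) row_count];   // fit, residuals, std_residuals back to back
    double p_value = 0.0;
    m_regression(data, cols, row_count,
                 buf,                    // fit_values
                 buf + row_count,        // residuals
                 buf + 2 * row_count,    // std_residuals
                 p_value);
    __int64 base = (__int64) buf;
    for (int i = 0; i < row_count; i++)
        dummy_out[i] = base + i;         // first row is the real pointer, the rest base + i
}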
How I am going to free the temporary memory is less relevant; it can be done in the last assignment of the update or in a new fake update statement using func_free.
The problem is that, even if I save some (big) computational time, the dummy column still appears to be passed three times (is there any chance that memory is actually not copied?).
Is there any better way of achieving this?
I am not aware of a good way of doing this, sorry. You could retrieve the table, add your columns as BATs in whichever way you like and write it back.
Related
I wrote a function that acts on each combination of columns in an input matrix. It uses multiple for loops and is very slow, so I am trying to parallelize it to use the maximum number of threads on my computer.
I am having difficulty finding the correct syntax to set this up. I'm using the Parallel package in octave, and have tried several ways to set up the calls. Here are two of them, in a simplified form, as well as a non-parallel version that I believe works:
function A = parallelExample(M)
pkg load parallel;
# Get total count of columns
ct = columns(M);
# Generate column pairs
I = nchoosek([1:ct],2);
ops = rows(I);
slice = ones(1, ops);
Ic = mat2cell(I, slice, 2);
## # Non-parallel
## A = zeros(1, ops);
## for i = 1:ops
## A(i) = cmbtest(Ic{i}, M);
## endfor
# Parallelized call v1
A = parcellfun(nproc, @cmbtest, Ic, {M});
## # Parallelized call v2
## afun = @(x) cmbtest(x, M);
## A = parcellfun(nproc, afun, Ic);
endfunction
# function to apply
function P = cmbtest(indices, matrix)
colset = matrix(:,indices);
product = colset(:,1) .* colset(:,2);
P = sum(product);
endfunction
For both of these examples I generate every combination of two columns and convert those pairs into a cell array that the parcellfun function should split up. In the first, I attempt to convert the input matrix M into a 1x1 cell array so it goes to each parallel instance in the same form. I get the error 'C must be a cell array' but this must be internal to the parcellfun function. In the second, I attempt to define an anonymous function that includes the matrix. The error I get here specifies that 'cmbtest' is undefined.
(Naturally, the actual function I'm trying to apply is far more complex than cmbtest here)
Other things I have tried:
Put M into a global variable so it doesn't need to be passed. Seemed to be impossible to put a global variable in a function file, though I may just be having syntax issues.
Make cmbtest a nested function so it can access M (parcellfun doesn't support that)
I'm out of ideas at this point and could use help figuring out how to get this to work.
Converting my comments above to an answer.
When performing parallel operations, it is useful to think of each resulting parallel worker as a separate, independent Octave instance, which needs appropriate access to all functions and variables it will require in order to do its independent work.
Therefore, do not rely on subfunctions when calling parcellfun from a main function, since this can lead to errors if the worker is unable to access the subfunction under the hood.
In this case, separating the subfunction into its own file fixed the problem.
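For example, with cmbtest saved in its own file on the load path, the anonymous-function form from "v2" works; a minimal sketch:
# cmbtest.m -- its own file, so every worker can resolve it
function P = cmbtest(indices, matrix)
  colset = matrix(:, indices);
  P = sum(colset(:,1) .* colset(:,2));
endfunction

# inside parallelExample.m, the call then becomes:
A = parcellfun(nproc, @(x) cmbtest(x, M), Ic);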
Code as below:
// Generate the returns matrix
boost::shared_ptr<Eigen::MatrixXd> returns_m = boost::make_shared<Eigen::MatrixXd>(factor_size, num_of_obs_per_simulation);
//Generate covariance matrix
boost::shared_ptr<MatrixXd> corMatrix = boost::make_shared<MatrixXd>(factor_size, factor_size);
(*corMatrix) = (*returns_m) * (*returns_m).transpose() / (num_of_obs_per_simulation - 1);
The point is that I want to return corMatrix as a pointer, not as an object, to be stored as a member of a result class for later use. The above code seems to make a copy of the big matrix; is there any better way to do it?
Thank you and best wishes...
Slight improvement to @ggael's answer: you can directly construct your corMatrix shared pointer from the expression:
boost::shared_ptr<MatrixXd> corMatrix
    = boost::make_shared<MatrixXd>((*returns_m) * (*returns_m).transpose() * (1./(num_of_obs_per_simulation - 1)));
Or, you can exploit the symmetry of the product using rankUpdate:
boost::shared_ptr<MatrixXd> corMatrix = boost::make_shared<MatrixXd>(MatrixXd::Zero(factor_size, factor_size));
corMatrix->selfadjointView<Eigen::Upper>().rankUpdate(*returns_m, 1.0 / (num_of_obs_per_simulation - 1));
// optionally copy upper half to lower half as well:
corMatrix->triangularView<Eigen::StrictlyLower>() = corMatrix->adjoint();
I don't understand your question as returning corMatrix as a shared_ptr will do exactly what you want, but regarding your product, you can save one temporary using noalias and * (1./x):
(*corMatrix).noalias() = (*returns_m) * (*returns_m).transpose() * (1./(num_of_obs_per_simulation - 1));
The whole expression will be turned as a single call to a gemm-like routine.
To complete the explanation:
Recall that Eigen is an expression template library, so when you write A = 2*B + C.transpose(); with A, B, C matrices, everything happens in operator=, that is, the right-hand-side expression is evaluated directly into A. For products the story is slightly different because 1) to be efficient the product needs to be evaluated directly into something, and 2) it is not possible to write directly to the destination if there is aliasing, e.g. A = A*B. The noalias tells Eigen that the destination does not alias the operands, so the product can be evaluated directly into it.
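Putting the pieces together, here is a minimal self-contained sketch; the sizes and the random returns matrix are illustrative assumptions, while the variable names follow the question:
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#include <Eigen/Dense>

using Eigen::MatrixXd;

int main()
{
    const int factor_size = 100;                 // assumed dimensions, for illustration only
    const int num_of_obs_per_simulation = 500;

    boost::shared_ptr<MatrixXd> returns_m =
        boost::make_shared<MatrixXd>(MatrixXd::Random(factor_size, num_of_obs_per_simulation));

    // Allocate the covariance matrix once, then evaluate the product directly into it.
    boost::shared_ptr<MatrixXd> corMatrix =
        boost::make_shared<MatrixXd>(factor_size, factor_size);
    corMatrix->noalias() =
        (*returns_m) * returns_m->transpose() * (1.0 / (num_of_obs_per_simulation - 1));

    // Storing corMatrix in a result class copies only the shared_ptr, not the matrix.
    return 0;
}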
I want to find out the best way to perform a group-by in SAS so I can run some benchmarks. The two simplest ways I can think of are PROC SQL and PROC MEANS. Here is the example in PROC SQL:
proc sql noprint; /* took 6 mins */
create table summ as select
id,
sum(val)
from
randint
group by
id
;
quit;
I think there are ways to make this run faster:
use sasfile command to load the data into memory first
create an index on id
Are there any other options I can use? Any SAS options I should turn on to make this run as fast as possible? I am not tied to proc sql nor proc means, so if there are faster ways then I would love to know about it!!!
My set up code is as below
options macrogen;
options obs=max sortsize=max source2 FULLSTIMER;
options minoperator SASTRACE=',,,d' SASTRACELOC=SASLOG;
options compress = binary NOSTSUFFIX;
options noxwait noxsync;
options LRECL=32767;
proc fcmp outlib=work.myfunc.sample;
function RandBetween(min, max);
return (min + floor((1 + max - min) * rand("uniform")));
endsub;
run;
options cmplib=work.myfunc;
data RandInt;
do i = 1 to 250000000;
id = RandBetween(1, 2500000);
val = rand("uniform");
output;
end;
drop i;
run;
My SAS comparison macros are as below
%macro sasbench(dosql = N); %macro _; %mend;
%if &dosql. = Y %then %do;
proc sql noprint; /* took 6 mins */
create table summ as select
id,
sum(val)
from
randint
group by
id
;
quit;
%end;
proc means data=randint sum noprint;
var val ;
class id;
output out = summmeans(drop=_type_ _freq_) sum = /autoname;
run;
%mend;
%sasbench();
/**/
/*sasfile randint load;*/
/*%sasbench();*/
/*sasfile randint close;*/
proc datasets lib=work;
modify randint;
INDEX CREATE id / nomiss;
run;
%sasbench();
sasfile is only a benefit if the entire data set can fit into session ram limits and if the data set is going to be used more than once. I suppose this would make sense if your benchmark includes multiple runs / different techniques on the same sasfile.
An index on id would help if the data is unsorted by id. When the data set is presorted by id, the id column metadata will have the sortedby flag set, which a procedure can use for its own internal optimization; however, there is no guarantee. As for indexes, use option msglevel=i to get informational messages in the log about index selection during processing.
The fastest way is direct addressing, but it requires enough RAM to handle the largest id value as an array index (2,500,000 in your setup):
array ids(2500000) _temporary_;
ids(id) + value;
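A complete sketch of that approach, assuming the question's randint data set (id in 1..2,500,000, value variable val):
data sum_value_over_id_direct(keep=id total);
  array ids(2500000) _temporary_;              * one slot per possible id value;
  do until (end);
    set randint end=end;
    ids(id) = sum(ids(id), val);               * accumulate the group sum;
  end;
  do id = 1 to dim(ids);
    if not missing(ids(id)) then do;
      total = ids(id);
      output;
    end;
  end;
  stop;
run;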
The next fastest way is probably hand coded array based hashing:
search SAS conference proceedings for papers by Paul Dorfman
The next fastest hash way is probably the hash component object with key suminc.
DATA Step was edited to align with the comments
data demo_data;
do rownum = 1 to 1000;
id = ceil(100*ranuni(123)); * NOTE: 100 different groups, disordered;
value = ceil(1000*ranuni(123)); * NOTE: want to sum value over group, for demonstration individual values integers from 1..1000;
output;
end;
run;
data _null_;
if 0 then set demo_data(keep=id value); %* prep pdv ;
length total 8; %* prep keysum variable ;
call missing (total); %* prevent warnings ;
declare hash ids (ordered:'a', suminc:'value', keysum:'total'); %* ordered ensures keys will be sorted ascending upon output ;
ids.defineKey('id');
*ids.defineData('id'); %* not having a defineData is an implicit way of adding only the keys as data, only the data + keysum variables are written by .output() ;
ids.defineDone();
* read all records and touch each hash key in order to perform tacit total+value summation;
do until (end);
set demo_data end=end;
if ids.find() ne 0 then ids.add();
end;
ids.output(dataset:'sum_value_over_id'); * save the summation of each key combination;
stop;
run;
Note: There can be only one keysum variable.
If the suminc variable was set to be always 1 instead of value, then the keysum would be the count instead of the total.
Obtaining both sum and count over group via hash would require an explicit defineData for a count and sum variable and slightly different statements, such as:
declare hash ids (ordered:'a');
...
ids.defineData('id', 'count', 'total');
...
if ids.find() ne 0 then do; count=0; total=0; end;
count+1;
total+value;
ids.replace();
...
However, if value is known to always be a natural number, and the group size is known to be < 10^(group-size limit), you could numerically encode the count by using a suminc variable of value + 10^-(group-size limit) and numerically decode the count by processing the output data with count = (total - int(total)) * 10^(group-size limit).
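As a hypothetical sketch with a group-size limit of 10^6 (that is, assuming fewer than a million rows per group):
data demo_data_enc;
  set demo_data;
  encoded = value + 1e-6;    * natural-number value plus a one-millionth count increment;
run;

* run the same hash step as above, but on demo_data_enc with suminc:'encoded',
  still writing sum_value_over_id, then decode the sum and the count: ;
data sum_and_count_over_id;
  set sum_value_over_id;
  count     = round((total - int(total)) * 1e6);
  sum_value = int(total);
run;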
For sorted data the fastest way is most likely a DOW loop with accumulation.
proc sort data=foo;
  by id;
run;

data sum_value_over_id_v2(keep=id total);
  do until (last.id);
    set foo;
    by id;
    total = sum(total, value);
  end;
run;
You will likely find that I/O is largest component of performance.
The best answer varies dramatically by the application. In your example, PROC SQL at least on my machine significantly outperforms PROC MEANS, but there are plenty of cases where it will not do so. It's able to in this case because it's building hash tables behind the scenes, more than likely, which are quite fast - a single pass through the data is all that's needed.
You certainly could speed things up by putting your full dataset into memory with SASFILE, if you have room to store the whole thing. You would have to have it in memory to begin with, though, more than likely; just reading it into memory for this purpose alone wouldn't really help since you're doing that read anyway.
As Richard notes, there are a bunch of ways to do this. I think PROC SQL will often be the fastest or similar to the fastest in simple cases, both because it's multithreaded (as opposed to data step being single threaded) and because it's got a fast hash table backend.
PROC MEANS is also usually going to be competitive; the case you show in the example is almost a worst case for it, since it has a huge number of class variable levels, so I think it may be creating a temporary table on disk. It's also multithreaded. Reduce the class variable levels to 2,500 instead of 2,500,000 and PROC MEANS comes out a bit faster than PROC SQL (but within the margin of error).
Data step accumulation, either in a hash table or a DoW loop, will sometimes outperform both of the above, and sometimes not, again depending on the data. Here it does outperform slightly. The code for data step accumulation tends to be a bit more complex, which is why I'd usually discourage it unless the savings is substantial (having more code to maintain is worse, typically). PROC MEANS and PROC SQL require less maintenance and less to understand. But in applications where performance is critical and these solutions happen to be superior, it may be worth it to go this route, especially if the data step is helpful. Of course, the hash table method is limited to fitting the results in memory, though usually that's manageable.
Ultimately, I would encourage you to use whatever method is easiest to maintain but still gives sufficient performance; and when possible try to be self consistent with other code. If most of your code is in SQL, that is probably fine. SASFILE and indexes probably won't be needed, unless you're doing more complicated things than you present above. Summation is actually more work than I/O in many cases. Don't overcomplicate it, ultimately: programmer hours and difficulty of QA is something that should trump basic performance, unless you're talking several hours' difference. And if you are, then just run tests on your actual use case and see what works best.
If you assume the data is sorted, then this is another solution:
data sum_value_over_id_v2(keep=id total);
set a.randint(keep=id val);
by id;
total + val;
if last.id then do;
output;
total = 0;
end;
drop val;
run;
I am doing a problem and I need to do the following task.
I want to add pairs (p1,q1), (p2,q2), ... (pn,qn) in such a way that:
(i) a duplicate pair is added only once (like in a set);
(ii) I store a count of how many times each pair was added. For example, the pair (7,2) will be present in the set only once, but if I add it 3 times the count will be 3.
Which container is efficient for this problem in C++?
A little example would be great!
Please ask if you can't understand my problem, and sorry for my bad English.
How about a std::map<Key, Value> that maps your pairs (the key) to their count (the value), incrementing the counter as you insert.
using pairs_to_count = std::map<std::pair<T1, T2>, size_t>;
std::pair<T1, T2> p1 = // some value;
std::pair<T1, T2> p2 = // some other value;
pairs_to_count[p1]++;
pairs_to_count[p1]++;
pairs_to_count[p2]++;
pairs_to_count[p2]++;
pairs_to_count[p2]++;
In this code, the operator[] will automatically add a key in the map if it does not exist yet. At that moment, it will initialize the key's corresponding value to zero. But as you insert, even the first time, that value is incremented.
Already after the first insertion, the count of 1 correctly reflects the number of insertions. That value gets incremented as you insert more.
Later, retrieving the count is a matter of calling operator[] again to get the value associated with a given key.
size_t const p2_count = pairs_to_count[p2]; // equals 3
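For instance, with concrete int pairs like the (7,2) from the question, a minimal self-contained example might look like this:
#include <iostream>
#include <map>
#include <utility>

int main()
{
    std::map<std::pair<int, int>, std::size_t> pairs_to_count;

    pairs_to_count[{7, 2}]++;   // first insertion, count becomes 1
    pairs_to_count[{7, 2}]++;
    pairs_to_count[{7, 2}]++;   // the pair is stored once, its count is now 3
    pairs_to_count[{1, 5}]++;

    for (const auto &kv : pairs_to_count)
        std::cout << "(" << kv.first.first << "," << kv.first.second
                  << ") added " << kv.second << " times\n";
}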
I need to create a syntax loop that runs a series of transformations.
This is a simplified example of what I need to do
I would like to create five fruit variables
apple_variable
banana_variable
mango_variable
papaya_variable
orange_variable
In V1 (and V2, V3) the fruits are coded as:
apple=1
banana=2
mango=3
papaya=4
orange=5
First loop
IF (V1={number}) {fruit}_variable = VX.
IF (V2={number}) {fruit}_variable = VY.
IF (V3={number}) {fruit}_variable = VZ.
Run loop for next fruit
So what I would like is for the script to check whether V1, V2 or V3 contains the fruit number. If one of them does (only one can), the new {fruit}_variable should get the value from VX, VY or VZ respectively.
Is this possible? The script needs to create over 200 variables, so it is a bit too time-consuming to do manually.
The first loop can be put within a DO REPEAT command. Essentially you define your two lists of variables and you can loop over the set of if statements.
DO REPEAT V# = V1 V2 V3
/VA = VX VY VZ.
if V# = 1 apple_variable = VA.
END REPEAT.
Now 1 and apple_variable are hard coded in the example above, but we can roll this up into a simple macro statement to take arbitrary parameters.
DEFINE !fruit (!POSITIONAL = !TOKENS(1)
/!POSITIONAL = !TOKENS(1)).
DO REPEAT V# = V1 V2 V3
/VA = VX VY VZ.
if V# = !1 !2 = VA.
END REPEAT.
!ENDDEFINE.
!fruit 1 apple_variable.
Now this will still be a bit tedious for over 200 variables, but it should greatly simplify the task. After I get this far I typically just do text editing on my list to call the macro 200 times, which in this instance takes nothing more than inserting !fruit before the number and the resulting variable name, as shown below. This works well, especially if the list is static.
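For the five fruits in the question, the edited list of macro calls would look like this:
!fruit 1 apple_variable.
!fruit 2 banana_variable.
!fruit 3 mango_variable.
!fruit 4 papaya_variable.
!fruit 5 orange_variable.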
Other approaches using the built-in SPSS facilities (mainly looping within the defined MACRO) IMO tend to be ugly, can greatly complicate the code, and are frequently not worth the time (although certainly doable). That is somewhat mitigated if you are willing to accept a solution utilizing python commands.
DO REPEAT is a good solution here, but I'm wondering what the ultimate goal is. This smells like a problem that might be solved by using the multiple response facilities in Statistics without the need to go through these transformations. Multiple response functionality is available in the old MULTIPLE RESPONSE procedure and in the newer CTABLES and Chart Builder facilities.
HTH,
Jon Peck
A combination of loop statements (for, while, do-while) with nested if..else and switch-case will do the trick. Just make sure you have your initial value and final value for the loop to run,
let's say:
for (initial; condition; increment)
{
    if (x == value) {
        statements;
    } else {
        ...
    }
}