Alea GPU for loop cannot get field - aleagpu

I am just starting with Alea GPU and I am curious how you can access other types and references inside a given GPU parallel for loop. When I do the following, I get a runtime error that states "Cannot get field random. Possible reasons: 1) Static field is not supported. 2) The field type is not supported. 3) In closure class, the field doesn't have [GpuParam] attribute."
This error makes sense, but I am not sure what the correct implementation would be.
[GpuManaged]
public void InitPoints()
{
var gp = Gpu.Default;
gp.For(1, (10), (i) =>
{
int pointStart = random.Next(totalPoints) + 1;
Pt point = new Pt(pointStart, ptAt[i]);
point.Process();
});
}

You are calling System.Random.Next. This is .NET library code and cannot be compiled for the GPU: there is no MSIL behind that function that could be accessed and compiled to run on the GPU. In addition, System.Random is a random number generator implemented for serial applications. You should use the parallel random number generators provided by cuRAND, which are also exposed in Alea GPU.


Removing a std::function<()> from a vector c++

I'm building a publish-subscribe class (called SystemInterface), which is responsible for receiving updates from its instances and publishing them to subscribers.
Adding a subscriber callback function is trivial and has no issues, but removing it yields an error, because std::function<()> is not comparable in C++.
std::vector<std::function<void()>> subs;
void subscribe(std::function<void()> f)
{
subs.push_back(f);
}
void unsubscribe(std::function<void()> f)
{
std::remove(subs.begin(), subs.end(), f); // Error
}
I've come up with five possible solutions to this error:
1. Register the function using a weak_ptr, where the subscriber must keep the returned shared_ptr alive. Solution example at this link.
2. Instead of registering in a vector, map the callback function by a custom key, unique per callback function. Solution example at this link.
3. Use a vector of function pointers. Example.
4. Make the callback function comparable by using its address.
5. Use an interface class (parent class) to call a virtual function. In my design, all intended classes inherit from a parent class called ServiceCore, so instead of registering a callback function, I would just register a ServiceCore reference in the vector, given that the SystemInterface class has a per-instance ID field (which is managed by ServiceCore and supplied to SystemInterface by constructing a ServiceCore child instance).
From my perspective, the first solution is neat and would work, but it requires extra handling on the subscriber side, which is something I would prefer to avoid.
The second solution would make my implementation more complex. My implementation currently looks like this:
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>
using namespace std;
enum INFO_SUB_IMPORTANCE : uint8_t
{
INFO_SUB_PRIMARY, // Only gets the important updates.
INFO_SUB_COMPLEMENTARY, // Gets more.
INFO_SUB_ALL // Gets all updates
};
using CBF = function<void(string,string)>;
using INFO_SUBTREE = map<INFO_SUB_IMPORTANCE, vector<CBF>>;
using REQINF_SUBS = map<string, INFO_SUBTREE>; // It's keyed by an iterator; explaining that is outside the scope of this question.
using INFSRC_SUBS = map<string, INFO_SUBTREE>;
using WILD_SUBS = INFO_SUBTREE;
REQINF_SUBS infoSubrs;
INFSRC_SUBS sourceSubrs;
WILD_SUBS wildSubrs;
void subscribeInfo(string info, INFO_SUB_IMPORTANCE imp, CBF f) {
infoSubrs[info][imp].push_back(f);
}
void subscribeSource(string source, INFO_SUB_IMPORTANCE imp, CBF f) {
sourceSubrs[source][imp].push_back(f);
}
void subscribeWild(INFO_SUB_IMPORTANCE imp, CBF f) {
wildSubrs[imp].push_back(f);
}
The second solution would require INFO_SUBTREE to be an extended map, but can be keyed by an ID:
using KEY_T = uint32_t; // or string...
using INFO_SUBTREE = map<INFO_SUB_IMPORTANCE, map<KEY_T,CBF>>;
For the third solution, I'm not aware of the limitations imposed by using function pointers, and for the fourth solution I'm not aware of the consequences.
The fifth solution would eliminate the need to deal with CBFs, but it would be more complex on the subscriber side: a subscriber is required to override the virtual function and so receives all updates in one place. That in turn requires filtering on the message ID and directing the payload to the intended routines using multiple if/else blocks, which will grow as subscriptions are added.
What I'm looking for is advice on the best available option.
Regarding your proposed solutions:
1. That would work. It can be made easy for the caller: have subscribe() create the shared_ptr and corresponding weak_ptr objects, and let it return the shared_ptr.
2. Then the caller must not lose the key. In a way this is similar to the above.
3. This of course is less generic, and then you can no longer have (the equivalent of) captures.
4. You can't: there is no way to get the address of the function stored inside a std::function. You can do &f inside subscribe() but that will only give you the address of the local variable f, which will go out of scope as soon as you return.
5. That works, and is in a way similar to 1 and 2, although now the "key" is provided by the caller.
Options 1, 2 and 5 are similar in that there is some other data stored in subs that refers to the actual std::function: either a std::shared_ptr, a key or a pointer to a base class. I'll present option 6 here, which is kind of similar in spirit but avoids storing any extra data:
Store a std::function<void()> directly, and return the index in the vector where it was stored. When removing an item, don't std::remove() it, but just set it to nullptr. Next time subscribe() is called, it checks if there is an empty element in the vector and reuses it:
std::vector<std::function<void()>> subs;
std::size_t subscribe(std::function<void()> f) {
if (auto it = std::find(subs.begin(), subs.end(), nullptr); it != subs.end()) {
*it = f;
return std::distance(subs.begin(), it);
} else {
subs.push_back(f);
return subs.size() - 1;
}
}
void unsubscribe(std::size_t index) {
subs[index] = nullptr;
}
The code that actually calls the functions stored in subs must now of course first check for nullptr entries. The above works because nullptr is treated as the "empty" function, and there is an operator==() overload that can compare a std::function against nullptr, thus making std::find() work.
One drawback of option 6 as shown above is that a std::size_t is a rather generic type. To make it safer, you might wrap it in a class SubscriptionHandle or something like that.
As for the best solution: option 1 is quite heavy-weight. Options 2 and 5 are very reasonable, but 6 is, I think, the most efficient.
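To make option 6 and the handle idea a bit more concrete, here is a minimal sketch. The names Publisher, SubscriptionHandle, publish() and slotCount() are made up for illustration and are not part of the question's code; the slot reuse follows the description above, written as a plain loop instead of std::find().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical wrapper so that a raw index cannot be mixed up with other integers.
class SubscriptionHandle {
public:
    explicit SubscriptionHandle(std::size_t index) : index_(index) {}
    std::size_t index() const { return index_; }
private:
    std::size_t index_;
};

// Hypothetical publisher that stores callbacks directly in a vector.
class Publisher {
public:
    SubscriptionHandle subscribe(std::function<void()> f) {
        // Reuse the first empty slot left behind by an earlier unsubscribe().
        for (std::size_t i = 0; i < subs_.size(); ++i) {
            if (!subs_[i]) {
                subs_[i] = std::move(f);
                return SubscriptionHandle{i};
            }
        }
        subs_.push_back(std::move(f));
        return SubscriptionHandle{subs_.size() - 1};
    }

    void unsubscribe(SubscriptionHandle h) {
        assert(h.index() < subs_.size());
        subs_[h.index()] = nullptr;  // mark the slot as empty, keep other indices stable
    }

    void publish() const {
        for (const auto& f : subs_) {
            if (f) f();  // skip empty slots
        }
    }

    std::size_t slotCount() const { return subs_.size(); }

private:
    std::vector<std::function<void()>> subs_;
};
```

Because slots are only ever emptied and never erased, handles given to other subscribers stay valid after an unsubscribe(). The sketch does not guard against a stale handle being used after its slot was reused; a generation counter in the handle would be one way to add that.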

SystemVerilog - How to get the number of enumerated types at compile time

I am trying to find a way to get the number of possible enumerations in an enum type at compile time. I need this for initializing a templated class that uses enumerated types.
I am curious if there is a utility function (or system task) that gives this. It would be similar to $size() but for enumerated types. However, I can't seem to find a function for that. After doing a lot of research, it doesn't seem to be possible.
Here is an example I am trying to do:
typedef enum {RANDOM, STICKY, SWEEP} bias_t;
// can be parameterized to pick another enum type at random
class enum_picker #(type T = bias_t); //type must be an enumerated type
local T current_type;
local const int weights[$size(T)]; //<--- How do I get the number of enumerated types?
function T pick_type();
... some code ...
endfunction
endclass
So the variable weights is an array of weights whose size should be the number of enumerated values. Right now its size is 32 because of the $size() call, but that is wrong; in this particular code example, the array size should be 3.
Is there a way to do this? Or is this simply not allowed in SystemVerilog?
You probably don't want to set up weights as a const; you would not be able to set values into it. You can use the num() method to get the number of enumerations.
class enum_picker #(type T = bias_t); //type must be an enumerated type
local T current_type;
local int weights[];
function new;
weights = new[current_type.num()];
foreach (weights[i]) weights[i] = $urandom_range(10);
endfunction
function T pick_type();
endfunction
endclass

Type mismatch in CUDA when invoking Kernel

I'm trying to port some code to a GPU using CUDA 9.0.
I ran into the problem that the kernel appears to expect a different argument type at the call site than the one declared in its definition.
I have boiled the problem down to the following lines, which should be enough to expose the source of the error.
I definitely do not have a second kernel with the same or a similar name. All streams are defined, and for testing purposes I commented out the body of the kernel.
Real is a typedef that is set to float here. As an experiment I replaced Real with float, which leads to the same result.
// Kernel definition
__global__ void doStuff(Real *masses)
{
int i = blockIdx.x*blockDim.x + threadIdx.x;
// no inner implementation, yet
}
// prepare the loop
for(...)
{
Real *masses, *d_masses;
masses = getMasses();
cudaMalloc(&d_masses, numActiveParticles * sizeof(Real));
cudaMemcpyAsync(d_masses, masses, numActiveParticles * sizeof(Real), cudaMemcpyHostToDevice, dataStream1);
cudaStreamSynchronize(dataStream1);
doStuff<<<256, 256, 0, executionStream>>>(d_masses);
// ....
}
The error message that I am getting now is:
error: argument of type "Real *" is incompatible with parameter of type
"unsigned int"
and when I replace everything with float:
error: argument of type "float *" is incompatible with parameter of type
"unsigned int"
Help would be much appreciated, and thank you all in advance.
Update:
I found the error. My class inherited from another class that has a member function with the same name as the kernel. Instead of launching the kernel, the call always resolved to the parent class's member function.
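What the update describes is ordinary C++ name hiding, so it can be reproduced without CUDA. The sketch below is host-only C++: Base, Simulation and the string return values are invented for illustration, and the free function doStuff merely stands in for the kernel.

```cpp
#include <cassert>
#include <string>

// Free function standing in for the __global__ kernel (hypothetical).
inline std::string doStuff(float*) { return "free function"; }

struct Base {
    // Member with the same name as the free function but a different parameter type.
    std::string doStuff(unsigned int) { return "Base::doStuff"; }
};

struct Simulation : Base {
    std::string run(float* d_masses) {
        // An unqualified call, doStuff(d_masses), would not compile here:
        // lookup finds Base::doStuff first and stops, so float* would have
        // to convert to unsigned int.
        return ::doStuff(d_masses);  // the qualified name reaches the free function
    }

    std::string runMember() {
        return doStuff(0u);  // unqualified lookup picks the inherited member
    }
};

inline void nameHidingDemo() {
    Simulation s;
    float mass = 0.0f;
    assert(s.run(&mass) == "free function");
    assert(s.runMember() == "Base::doStuff");
}
```

In the CUDA code, renaming either function, or qualifying the launch as ::doStuff<<<...>>>(d_masses), should make the call resolve to the kernel again; I have not checked the qualified launch against CUDA 9.0 specifically, so treat that part as an assumption.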

JNI - Converting jobject representing Basic Java Objects (Boolean) to native basic types (bool)

I think I managed to fit most of the question in to the title on this one!
I'm pulling back an Object from Java in my native C++ code:
jobject valueObject = env->CallObjectMethod(hashMapObject, hashMapGetMID, keyObject);
It's possible for me to check whether the returned object is one of the boxed primitive types using something like:
jclass boolClass = env->FindClass("java/lang/Boolean");
if(env->IsInstanceOf(valueObject, boolClass) == JNI_TRUE) { }
So, I now have a jobject which I know is a Boolean (note the upper case B) - The question is, what is the most efficient way (considering I already have the jobject in my native code) to convert this to a bool. Typecasting doesn't work which makes sense.
Although the above example is a Boolean I also want to convert Character->char, Short->short, Integer->int, Float->float, Double->double.
(Once I've implemented it I will post an answer to this which uses Boolean.booleanValue().)
You have two choices.
Option #1 is what you wrote in your self-answer: use the public method defined for each class to extract the primitive value.
Option #2 is faster but not strictly legal: access the internal field directly. For Boolean, that would be Boolean.value. For each primitive box class you have a fieldID for the "value" field, and you just read the field directly. (JNI cheerfully ignores the fact that it's declared private. You can also write to "final" fields and do other stuff that falls into the "really bad idea" category.)
The name of the "value" field is unlikely to change since that would break serialization. So officially this is not recommended, but in practice you can get away with it if you need to.
Either way, you should be caching the jmethodID / jfieldID values, not looking them up every time (the lookups are relatively expensive).
You could also use the less expensive IsSameObject function rather than IsInstanceOf, because the box classes are "final". That requires making an extra GetObjectClass call to get valueObject's class, but you only have to do that once before your various comparisons.
BTW, be careful with your use of "char". In your example above you're casting the result of CallCharMethod (a 16-bit UTF-16 value) to a char (an 8-bit value). Remember, char != jchar (unless you're somehow configured for wide chars), long != jlong (unless you're compiling with 64-bit longs).
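To illustrate the char versus jchar point without needing the JNI headers, here is a small sketch. JCharLike is a stand-in I made up for a 16-bit UTF-16 code unit, and fitsInAscii() and toAsciiOr() are hypothetical helpers, not JNI functions.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a 16-bit UTF-16 code unit (assumption: same width as jchar).
using JCharLike = std::uint16_t;

// True when the code unit is plain 7-bit ASCII and survives narrowing unchanged.
inline bool fitsInAscii(JCharLike c) { return c <= 0x7F; }

// Narrow only when it is lossless; otherwise return the caller's fallback.
inline char toAsciiOr(JCharLike c, char fallback) {
    return fitsInAscii(c) ? static_cast<char>(c) : fallback;
}

inline void narrowingDemo() {
    const JCharLike euro = 0x20AC;  // U+20AC EURO SIGN as one UTF-16 code unit
    // A blind cast keeps only the low byte, so the character is silently changed.
    assert(static_cast<unsigned char>(euro) == 0xAC);
    assert(toAsciiOr(euro, '?') == '?');
    assert(toAsciiOr(0x41, '?') == 'A');
}
```

If the native side really needs non-ASCII text, keeping the value in a 16-bit type (or converting to UTF-8 explicitly) avoids the truncation altogether.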
This is the solution I'm going to use if I get no more input. Hopefully it isn't this difficult, but knowing JNI I'm thinking it might be:
if (env->IsInstanceOf(valueObject, boolClass) == JNI_TRUE)
{
jmethodID booleanValueMID = env->GetMethodID(boolClass, "booleanValue", "()Z");
bool booleanValue = (bool) env->CallBooleanMethod(valueObject, booleanValueMID);
addBoolean(key, booleanValue);
}
else if(env->IsInstanceOf(valueObject, charClass) == JNI_TRUE)
{
jmethodID characterValueMID = env->GetMethodID(charClass, "charValue", "()C");
char characterValue = (char) env->CallCharMethod(valueObject, characterValueMID);
addChar (key, characterValue);
}
In general, I write JNI code for better performance.
How do you get better performance? By using assembly, primitive types, and few method calls.
I suggest designing your methods so that their return types can be used directly in C/C++, such as
jint, jlong, jboolean, jbyte, and jchar.
Redundant function calls and conversions make the implementation inefficient and hard to maintain.

Fuzzy/approximate checking of solutions from algorithms

We have people who run code for simulations, testing, etc. on some supercomputers that we have. It would be nice if, as part of the build process, we could check not only that the code compiles but also that the output matches some pattern indicating that we are getting meaningful results.
For example, the researcher may know that the value of x must be within some bounds. If it is not, then a logical error has been made in the code (assuming it compiles and there is no compile-time error).
Are there any pre-written packages for this kind of thing? The code is written in FORTRAN, C, C++, etc.
Any specific or general advice would be appreciated.
I expect most unit testing frameworks could do this; supply a toy test data set and see that the answer is sane in various different ways.
A good way to ensure that the resulting value of any computation (whether final or intermediate) meets certain constraints, is to use an object oriented programming language like C++, and define data-types that internally enforce the conditions that you are checking for. You can then use those data-types as the return value of any computation to ensure that said conditions are met for the value returned.
Let's look at a simple example. Assume that you have a member function inside an Airplane class, as part of a flight control system, that estimates the mass of the airplane instance as a function of the number of passengers and the amount of fuel that the plane has at that moment. One way to declare the Airplane class and an airplaneMass() member function is the following:
class Airplane {
public:
...
int airplaneMass() const; // note the plain int return type
...
private:
...
};
However, a better way to implement the above would be to define a type AirplaneMass that can be used as the function's return type instead of int. AirplaneMass can internally ensure (in its constructor and any overloaded operators) that the value it encapsulates meets certain constraints. An example implementation of the AirplaneMass datatype could be the following:
class AirplaneMass {
public:
// AirplaneMass constructor
AirplaneMass(int m) {
if (m < MIN || m > MAX) {
// throw exception or log constraint violation
}
// if the value of m meets the constraints,
// assign it to the internal value.
mass_ = m;
}
...
/* range checking should also be done in the implementation
of overloaded operators. For instance, you may want to
make sure that the resultant of the ++ operation for
any instance of AirplaneMass also lies within the
specified constraints. */
private:
int mass_;
};
Thereafter, you can redeclare class Airplane and its airplaneMass() member function as follows:
class Airplane {
public:
...
AirplaneMass airplaneMass() const;
// note the more specific AirplaneMass return type
...
private:
...
};
The above will ensure that the value returned by airplaneMass() is between MIN and MAX. Otherwise, an exception will be thrown, or the error condition will be logged.
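For completeness, here is one way the AirplaneMass outline could be filled in so that it compiles. The bounds are placeholder values I picked for illustration, and throwing std::out_of_range is just one of the two reactions mentioned above (logging would be the other); value() and operator+= are additions of mine to show the operator check.

```cpp
#include <cassert>
#include <stdexcept>

class AirplaneMass {
public:
    // Hypothetical bounds in kilograms; replace with values from the real model.
    static constexpr int MIN = 40000;
    static constexpr int MAX = 80000;

    // explicit, so a plain int is never silently treated as a checked mass.
    explicit AirplaneMass(int m) : mass_(checked(m)) {}

    int value() const { return mass_; }

    // Operators re-check the constraint; on failure the old value is kept.
    AirplaneMass& operator+=(int delta) {
        mass_ = checked(mass_ + delta);
        return *this;
    }

private:
    // Returns m unchanged when it is within [MIN, MAX], otherwise throws.
    static int checked(int m) {
        if (m < MIN || m > MAX) {
            throw std::out_of_range("AirplaneMass outside [MIN, MAX]");
        }
        return m;
    }

    int mass_;
};

inline void airplaneMassDemo() {
    AirplaneMass mass(50000);
    mass += 1000;  // still within bounds
    assert(mass.value() == 51000);
}
```

A test run on a toy data set can then simply let the exception propagate, so an out-of-range intermediate value fails the build step instead of producing a plausible-looking but wrong result.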
I had to do that for conversions this month. I don't know if that might help you, but it appeared quite simple a solution to me.
First, I defined a tolerance level. (Java-ish example code...)
private static final double TOLERANCE = 0.000000000001D;
Then I defined a new "areEqual" method which checks if the difference between both values is lower than the tolerance level or not.
private static boolean areEqual(double a, double b) {
return (Math.abs(a - b) < TOLERANCE);
}
If I get a false somewhere, it means the check has probably failed. I can adjust the tolerance to see if it's just a precision problem or really a bad result. Works quite well in my situation.
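Since the code in question is C/C++/Fortran, the same idea might look like this in C++. One caveat with a single fixed absolute tolerance is that it becomes too strict for large magnitudes, so this sketch combines an absolute and a relative tolerance. The function name areClose and the default tolerances are my own choices, not taken from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// True when a and b differ by at most absTol, or by at most relTol
// times the larger of the two magnitudes.
inline bool areClose(double a, double b,
                     double relTol = 1e-9, double absTol = 1e-12) {
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(absTol, relTol * scale);
}

inline void toleranceDemo() {
    assert(areClose(0.1 + 0.2, 0.3));     // rounding noise is accepted
    assert(!areClose(1.0, 1.001));        // a real difference is rejected
    assert(areClose(1e12, 1e12 + 1.0));   // relative tolerance handles large values
}
```

The tolerances usually need to be tuned per quantity, based on how much numerical noise the simulation is expected to produce.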
