Let's say I'm using raymarching to render a field function. (This on the CPU, not the GPU.) I have an algorithm like this crudely-written pseudocode:
pixelColour = arbitrary;
pixelTransmittance = 1.0;
t = 0;
while (t < max_view_distance) {
point = rayStart + t*rayDirection;
emission, opacity = sampleFieldAt(point);
pixelColour, pixelTransmittance =
integrate(pixelColour, pixelTransmittance, emission, absorption);
t = t * stepFactor;
}
return pixelColour;
The logic is all really simple... but how does integrate() work?
Each sample actually represents a volume in my field, not a point, even though the sample is taken at a point; therefore the effect on the final pixel colour will vary according to the size of the volume.
I don't know how to do this. I've had a look around, but while I've found lots of code which does it (usually on Shadertoy), it all does it differently and I can't find any explanations of why. How does this work, and more importantly, what magic search terms will let me look it up on Google?
It's the Beer-Lambert law, which governs extinction through participating homogenous media. No wonder I was unable to find any keywords which worked.
There's a good writeup here, which tells me almost everything I need to know, although it does rather gloss over the calculation of the phase functions. But at least now I know what to read up on.
Related
I am struggling a bit with the API of the Eigen Library, namely the SimplicialLLT class for Cholesky factorization of sparse matrices.
I have three matrices that I need to factor and later use to solve many equation systems (changing only the right side) - therefore I would like to factor these matrices only once and then just re-use them. Moreover, they all have the same sparcity pattern, so I would like to do the symbolic decomposition only once and then use it for the numerical decomposition for all three matrices. According to the documentation, this is exactly what the SimplicialLLT::analyzePattern and SimplicialLLT::factor methods are for. However, I can't seem to find a way to keep all three factors in the memory.
This is my code:
I have these member variables in my class I would like to fill with the factors:
Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> choleskyA;
Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> choleskyB;
Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> choleskyC;
Then I create the three sparse matrices A, B and C and want to factor them:
choleskyA.analyzePattern(A);
choleskyA.factorize(A);
choleskyB.analyzePattern(B); // this has already been done!
choleskyB.factorize(B);
choleskyC.analyzePattern(C); // this has already been done!
choleskyC.factorize(C);
And later I can use them for solutions over and over again, changing just the b vectors of right sides:
xA = choleskyA.solve(bA);
xB = choleskyB.solve(bB);
xC = choleskyC.solve(bC);
This works (I think), but the second and third call to analyzePattern are redundant. What I would like to do is something like:
choleskyA.analyzePattern(A);
choleskyA.factorize(A);
choleskyB = choleskyA.factorize(B);
choleskyC = choleskyA.factorize(C);
But that is not an option with the current API (we use Eigen 3.2.3, but if I see correctly there is no change in this regard in 3.3.2). The problem here is that the subsequent calls to factorize on the same instance of SimplicialLLT will overwrite the previously computed factor and at the same time, I can't find a way to make a copy of it to keep. I took a look at the sources but I have to admit that didn't help much as I can't see any simple way to copy the underlying data structures. It seems to me like a rather common usage, so I feel like I am missing something obvious, please help.
What I have tried:
I tried using simply choleskyB = choleskyA hoping that the default copy constructor will get things done, but I have found out that the base classes are designed to be non-copyable.
I can get the L and U matrices (there's a getter for them) from choleskyA, make a copy of them and store only those and then basically copy-paste the content of SimplicialCholeskyBase::_solve_impl() (copy-pasted below) to write the method for solving myself using the previously stored L and U directly.
template<typename Rhs,typename Dest>
void _solve_impl(const MatrixBase<Rhs> &b, MatrixBase<Dest> &dest) const
{
eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for solving, you must first call either compute() or symbolic()/numeric()");
eigen_assert(m_matrix.rows()==b.rows());
if(m_info!=Success)
return;
if(m_P.size()>0)
dest = m_P * b;
else
dest = b;
if(m_matrix.nonZeros()>0) // otherwise L==I
derived().matrixL().solveInPlace(dest);
if(m_diag.size()>0)
dest = m_diag.asDiagonal().inverse() * dest;
if (m_matrix.nonZeros()>0) // otherwise U==I
derived().matrixU().solveInPlace(dest);
if(m_P.size()>0)
dest = m_Pinv * dest;
}
...but that's quite an ugly solution plus I would probably screw it up since I don't have that good understanding of the process (I don't need the m_diag from the above code since I am doing LLT, right? that would be relevant only if I was using LDLT?). I hope this is not what I need to do...
A final note - adding the necessary getters/setters to the Eigen classes and compiling "my own" Eigen is not an option (well, not a good one) as this code will (hopefully) be further redistributed as open source, so it would be troublesome.
This is a quite unusual pattern. In practice the symbolic factorization is very cheap compared to the numerical factorization, so I'm not sure it's worth bothering much. The cleanest solution to address this pattern would be to let SimplicialL?LT to be copiable.
I'd like to know if someone has experience in writing a HAL AudioUnit rendering callback taking benefits of multi-core processors and/or symmetric multiprocessing?
My scenario is the following:
A single audio component of sub-type kAudioUnitSubType_HALOutput (together with its rendering callback) takes care of additively synthesizing n sinusoid partials with independent individually varying and live-updated amplitude and phase values. In itself it is a rather straightforward brute-force nested loop method (per partial, per frame, per channel).
However, upon reaching a certain upper limit for the number of partials "n", the processor gets overloaded and starts producing drop-outs, while three other processors remain idle.
Aside from general discussion about additive synthesis being "processor expensive" in comparison to let's say "wavetable", I need to know if this can be resolved right way, which involves taking advantage of multiprocessing on a multi-processor or multi-core machine? Breaking the rendering thread into sub-threads does not seem the right way, since the render callback is already a time-constraint thread in itself, and the final output has to be sample-acurate in terms of latency. Has someone had positive experience and valid methods in resolving such an issue?
System: 10.7.x
CPU: quad-core i7
Thanks in advance,
CA
This is challenging because OS X is not designed for something like this. There is a single audio thread - it's the highest priority thread in the OS, and there's no way to create user threads at this priority (much less get the support of a team of systems engineers who tune it for performance, as with the audio render thread). I don't claim to understand the particulars of your algorithm, but if it's possible to break it up such that some tasks can be performed in parallel on larger blocks of samples (enabling absorption of periods of occasional thread starvation), you certainly could spawn other high priority threads that process in parallel. You'd need to use some kind of lock-free data structure to exchange samples between these threads and the audio thread. Convolution reverbs often do this to allow reasonable latency while still operating on huge block sizes. I'd look into how those are implemented...
Have you looked into the Accelerate.framework? You should be able to improve the efficiency by performing operations on vectors instead of using nested for-loops.
If you have vectors (of length n) for the sinusoidal partials, the amplitude values, and the phase values, you could apply a vDSP_vadd or vDSP_vmul operation, then vDSP_sve.
As far as I know, AU threading is handled by the host. A while back, I tried a few ways to multithread an AU render using various methods, (GCD, openCL, etc) and they were all either a no-go OR unpredictable. There is (or at leas WAS... i have not checked recently) a built in AU called 'deferred renderer' I believe, and it threads the input and output separately, but I seem to remember that there was latency involved, so that might not help.
Also, If you are testing in AULab, I believe that it is set up specifically to only call on a single thread (I think that is still the case), so you might need to tinker with another test host to see if it still chokes when the load is distributed.
Sorry I couldn't help more, but I thought those few bits of info might be helpful.
Sorry for replying my own question, I don't know the way of adding some relevant information otherwise. Edit doesn't seem to work, comment is way too short.
First of all, sincere thanks to jtomschroeder for pointing me to the Accelerate.framework.
This would perfectly work for so called overlap/add resynthesis based on IFFT. Yet I haven't found a key to vectorizing the kind of process I'm using which is called "oscillator-bank resynthesis", and is notorious for its processor taxing (F.R. Moore: Elements of Computer Music). Each momentary phase and amplitude has to be interpolated "on the fly" and last value stored into the control struct for further interpolation. Direction of time and time stretch depend on live input. All partials don't exist all the time, placement of breakpoints is arbitrary and possibly irregular. Of course, my primary concern is organizing data in a way to minimize the number of math operations...
If someone could point me at an example of positive practice, I'd be very grateful.
// Here's the simplified code snippet:
OSStatus AdditiveRenderProc(
void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData)
{
// local variables' declaration and behaviour-setting conditional statements
// some local variables are here for debugging convenience
// {... ... ...}
// Get the time-breakpoint parameters out of the gen struct
AdditiveGenerator *gen = (AdditiveGenerator*)inRefCon;
// compute interpolated values for each partial's each frame
// {deltaf[p]... ampf[p][frame]... ...}
//here comes the brute-force "processor eater" (single channel only!)
Float32 *buf = (Float32 *)ioData->mBuffers[channel].mData;
for (UInt32 frame = 0; frame < inNumberFrames; frame++)
{
buf[frame] = 0.;
for(UInt32 p = 0; p < candidates; p++){
if(gen->partialFrequencyf[p] < NYQUISTF)
buf[frame] += sinf(phasef[p]) * ampf[p][frame];
phasef[p] += (gen->previousPartialPhaseIncrementf[p] + deltaf[p]*frame);
if (phasef[p] > TWO_PI) phasef[p] -= TWO_PI;
}
buf[frame] *= ovampf[frame];
}
for(UInt32 p = 0; p < candidates; p++){
//store the updated parameters back to the gen struct
//{... ... ...}
;
}
return noErr;
}
This bug is due to Matlab being too smart for its own good.
I have something like
for k=1:N
stats = subfun(E,k,stats);
end
where statsis a 1xNarray, N=5000 say, and subfun calculates stats(k)from E, and fills it into stats
function stats = subfun(E,k,stats)
s = mean(E);
stats(k) = s;
end
Of course, there is some overhead in passing a large array back and forth, only to fill in one of its elements. In my case, however, the overhead is negligable, and I prefer this code instead of
for k=1:N
s = subfun(E,k);
stats(k) = s;
end
My preference is because I actually have a lot more assignments than just stats.
Also some of the assignments are actually a good deal more complicated.
As mentioned, the overhead is negligable. But, if I do something trivial, like this inconsequential if-statement
for k=1:N
i = k;
if i>=1
stats = subfun(E,i,stats);
end
end
the assignments that take place inside subfun then suddenly takes "forever" (it increases much faster than linearly with N). And it's the assignment, not the calculation that takes forever. In fact, it is even worse than the following nonsensical subfun
function stats = subfun(E,k,stats)
s = calculation_on_E(E);
clear stats
stats(k) = s;
end
which requires re-allocation of stats every time.
Does anybody have the faintest idea why this happens?
This might be due to some obscure detail of Matlab's JIT. The JIT of recent versions of Matlab knows not to create a new array, but to do modifications in-place in some limited cases. One of the requirements is that the function is defined as
function x = modify_big_matrix(x, i, j)
x(i, j) = 123;
and not as
function x_out = modify_big_matrix(x_in, i, j)
x_out = x_in;
x_out(i, j) = 123;
Your examples seem to follow this rule, so, as Praetorian mentioned, your if statement might prevent the JIT from recognizing that it is an in-place operation.
If you really need to speed up your algorithm, it is possible to modify arrays in-place using your own mex-functions. I have successfully used this trick to gain a factor of 4 speedup on some medium sized arrays (order 100x100x100 IIRC). This is however not recommended, could segfault Matlab if you are not careful and might stop working in future versions.
As discussed by others, the problem almost certainly lies with JIT and its relatively fragile ability to modify in place.
As mentioned, I really prefer the first form of the function call and assignments, although other workable solutions have been suggested. Without relying on JIT, the only way this can be efficient (as far as I can see) is some form of passing by reference.
Therefore I made a class Stats that inherits from handle, and which contains the data array for k=1:N. It is then passed by reference.
For future reference, this seems to work very well, with good performance, and I'm currently using it as my working solution.
I have a several series of data points that need to be graphed. For each graph, some points may need to be thrown out due to error. An example is the following:
The circled areas are errors in the data.
What I need is an algorithm to filter this data so that it eliminates the error by replacing the bad points with flat lines, like so:
Are there any algorithms out there that are especially good at detecting error points? Do you have any tips that could point me in the right direction?
EDIT: Error points are any points that don't look consistent with the data on both sides. There can be large jumps, as long as the data after the jump still looks consistent. If it's on the edge of the graph, large jumps should probably be considered error.
This is a problem that is hard to solve generically; your final solution will end up being very process-dependent, and unique to your situation.
That being said, you need to start by understanding your data: from one sample to the next, what kind of variation is possible? Using that, you can use previous data samples (and maybe future data samples) to decide if the current sample is bogus or not. Then, you'll end up with a filter that looks something like:
const int MaxQueueLength = 100; // adjust these two values as necessary
const double MaxProjectionError = 5;
List<double> FilterData(List<double> rawData)
{
List<double> toRet = new List<double>(rawData.Count);
Queue<double> history = new Queue<double>(MaxQueueLength); // adjust queue length as necessary
foreach (double raw_Sample in rawData)
{
while (history.Count > MaxQueueLength)
history.Dequeue();
double ProjectedSample = GuessNext(history, raw_Sample);
double CurrentSample = (Math.Abs(ProjectedSample - raw_Sample) > MaxProjectionError) ? ProjectedSample : raw_Sample;
toRet.Add(CurrentSample);
history.Enqueue(CurrentSample);
}
return toRet;
}
The magic, then, is coming up with your GuessNext function. Here, you'll be getting into stuff that is specific to your situation, and should take into account everything you know about the process that is gathering data. Are there physical limits to how quickly the input can change? Does your data have known bad values you can easily filter?
Here is a simple example for a GuessNext function that works off of the first derivative of your data (i.e. it assumes that your data is a roughly a straight line when you only look at a small section of it)
double lastSample = double.NaN;
double GuessNext(Queue<double> history, double nextSample)
{
lastSample = double.IsNaN(lastSample) ? nextSample : lastSample;
//ignore the history for simple first derivative. Assume that input will always approximate a straight line
double toRet = (nextSample + (nextSample - lastSample));
lastSample = nextSample;
return toRet;
}
If your data is particularly noisy, you may want to apply a smoothing filter to it before you pass it to GuessNext. You'll just have to spend some time with the algorithm to come up with something that makes sense for your data.
Your example data appears to be parametric in that each sample defines both a X and a Y value. You might be able to apply the above logic to each dimension independently, which would be appropriate if only one dimension is the one giving you bad numbers. This can be particularly successful in cases where one dimension is a timestamp, for instance, and the timestamp is occasionally bogus.
If removing the outliers by eye is not possible, try kriging (with error terms) as in http://www.ipf.tuwien.ac.at/cb/publications/pipeline.pdf . This seems to work quite well to automatically deal with occasional extreme noise. I know that French meteorologists use such an approach to remove outliers in their data (like a fire next to a temperature sensor or something kicking a wind sensor for instance).
Please note that it is a difficult problem in general. Any information about the errors is precious. Did someone kick the measuring device ? Then you cannot do much except removing the offending data by hand. Is your noise systematic ? You can do a lot of things then by making (reasonable) hypotheses about it.
So lets say there's a game that has a 'life bar' that consists of theoretical levels. As the user performs specific actions, depending on accuracy of their actions, the life bar grows at a corresponding speed. As it grows and goes into next levels, the criteria for desirable actions change and so the user now has to figure out what those new actions are, to keep the bar growing instead of shrinking. And while the user tries to learn which actions/patterns result in growth, things like 'time' along with undesirable actions slowly bring them back down.
I'm wondering if anyone knows of any open-source games that may have similar logic.
Or perhaps if there's a name for this type of logic so I can try and find some algorithms that may help me set something like this up.
TIA
-added
As it seems there's probably no technical term for something like this, perhaps someone can suggest some pseudo top level logic. I've never built a game before and would like to raise my chances of heading in the optimum direction.
That sounds suspiciously like my Stack Overflow reputation score.
For the purpose of this code, let's pretend that the bar holds the score for the player.
Score = Max score that can be received from the action without modifier
Accuracy = [0..1] Where 0 is total miss on the action and 1 is a perfect hit.
Example: The score for a headshot
LevelModifier = [0..1] Where 0 means that in this level, it doesn't give any
scores and 1 means that the player receives the max bonus.
You can also refer to this as a difficulty modifier.
The higher the level, the more bonus you get.
ScoreDelta = (Score * Accuracy) * LevelModifier
ScoreBar += ScoreDelta
For the timer, you can lower their ScoreBar every second.
ScoreBar -= TimePenalty
For gameplay reasons, you can reset the timer whenever the player does an action. This would reward players who kept moving.
It sounds like you're trying to model karma... I think there's a few web sites that have karma-like systems (SO's rep system is arguably something like that).
I'd start with something simple... If the user does "good" things, it goes up. If they do bad things it goes down. If they do nothing (sloth?), it goes down slowly.
That sounds a lot like an experience bar.
It sounds like you would be best-served by using a state machine.
State A:
* Walk Forward : ++Points
* Jump : Points += 100
* Points < 100 : Go to State A
* Points > 100 : Points = 0; Go to State B
* Points > 150 : Points = 0; Go to State C
State B:
* Kill Bad Guy : ++Points
* Get Hurt : --Points
* Points < -50 : Points = 0; Go to State A
* Points < 100 : Go to State C
* Points > 100 : Points = 0; Go to State D
...etc...
That 'Points == 150' condition is just something I made up to demonstrate the power of the state machine. If the player does something especially good to jump from less than 100 to above 150, then he gets to skip a level. You could even have bonus levels that are only accessible in this way.
Edit: Wow, I got so engrossed in my typing, that I kinda forgot what the initial problem was. Hopefully my answer makes more sense now.
(I think most of the other answerers are interpreting your description as logarithmic growth.)
To be honest, the best way to really do this might just be trial and error. Invent a formula, not too complicated. Play with it in game. If it feels a bit chunky or stiff or floppy or whatever, adjust it, add terms, just experiment.
eventually, it will feel aesthetically pleasing. Thats what you want. It should feel like the response of the health bar follows the effort you're putting in.
Also, just by writing the game, you'll know it pretty well. Be sure and give your friends/coworkers/any random victim, a chance to try it too and determine if they feel it is aesthetically right as you do.