Streaming Stores Segmentation Fault on Intel MIC

I want to use streaming stores in my code on Intel MIC. I have a force_array and three variables tempx, tempy and tempz. I do some computation on them and store the results in another array that won't be used in the near future, so streaming stores seemed like a good way to improve performance. However, I get a segmentation fault, and I am not sure whether the load or the store causes it. This code is preceded and followed by a few more lines, and the whole piece sits inside two for loops preceded by OpenMP directives. Because it is a parallel program, I am not able to debug it well. Can anyone help me find the mistake(s)?
Thanks in advance! The code is given below:
for(k=0;k<np;k++) //np is the number of particles.
{
    for(j=k+1;j<np;j++)
    {
        __m512d y1, y2, y3, y4, y5, y6;
        y1 = _mm512_load_pd(force_array + k*nd + 0);
        y4 = _mm512_load_pd(&tempx);
        y1 = _mm512_sub_pd(y1,y4);
        y2 = _mm512_load_pd(force_array + k*nd + 1);
        y5 = _mm512_load_pd(&tempy);
        y2 = _mm512_sub_pd(y2,y5);
        y3 = _mm512_load_pd(force_array + k*nd + 2);
        y6 = _mm512_load_pd(&tempz);
        y3 = _mm512_sub_pd(y3,y6);
        _mm512_storenr_pd((f+k*nd+0), y1);
        _mm512_storenr_pd((f+k*nd+1), y2);
        _mm512_storenr_pd((f+k*nd+2), y3);
    }
}

_mm512_load_pd() requires the address you are loading from to be 64-byte aligned.
The arrays f and force_array need 64-byte-aligned starting addresses: allocate them with _mm_malloc(size, 64), or declare them with __attribute__((aligned(64))) for stack objects, as you have done.
I think the problem here is not the starting addresses but the addresses computed in your inner loop. If nd=3, then when k=1 the offset from the beginning of force_array will be 3 doubles, i.e. 24 bytes, which is not a multiple of 64.
You will need to pad each of these force objects out to 8 doubles (64 bytes) to use aligned loads; otherwise you will need to use unaligned loads.
Best regards,
Alastair
P.S. y1 and y2 each load 8 doubles from addresses that are only 8 bytes apart. Are you sure this is what you meant to achieve?
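To make the alignment arithmetic above concrete, here is a minimal sketch in plain C (no MIC intrinsics). It assumes 8-byte doubles and nd = 3 doubles per particle, as in the example in the answer. The helper names is_aligned64, byte_offset and padded_stride are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Alignment required by _mm512_load_pd: 64 bytes, i.e. 8 doubles
   (assuming sizeof(double) == 8). */
#define VEC_ALIGN 64

/* Returns 1 if the pointer is 64-byte aligned, 0 otherwise. */
static int is_aligned64(const void *p) {
    return ((uintptr_t)p % VEC_ALIGN) == 0;
}

/* Byte offset of particle k when each particle occupies `stride` doubles. */
static size_t byte_offset(size_t k, size_t stride) {
    return k * stride * sizeof(double);
}

/* Smallest stride (in doubles) >= nd that keeps every particle
   on a 64-byte boundary, given a 64-byte-aligned base address. */
static size_t padded_stride(size_t nd) {
    size_t per_vec = VEC_ALIGN / sizeof(double);   /* 8 doubles per vector */
    return (nd + per_vec - 1) / per_vec * per_vec;
}
```

With nd = 3 and no padding, byte_offset(1, 3) is 24, which is not a multiple of 64, so particle 1 is not a valid address for an aligned load. With a stride of padded_stride(3) = 8 doubles, every particle starts on a 64-byte boundary, at the cost of 5 unused doubles per particle.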

Related

How to plot a float number in Microchip Data Visualizer

I am having a problem sending a float through the UART to be plotted in a graph in the Microchip Data Visualizer.
I could plot int numbers without problem, but float ones are driving me crazy.
I designed a sine wave generator with the Laplace transform, mapped it to the z-plane with the bilinear transform, and put the resulting equation in the main routine of a dsPIC33FJ128GP802. It is working OK: in the terminal I can see the values, and if I copy and paste those values into Gnumeric and make a graph, it shows my discrete sine wave.
The problem comes when I try to plot the float number 'yn' in the Data Visualizer of MPLABX. There is something I am missing in the middle.
I am using MPLABX v5.45 and XC16 v1.61 on Debian Bullseye. The communication with the microcontroller is transparent, at 9600-8N1.
Here is my main code:
int main(void)
{
InitClock(); // This is the PLL settings
Init_UART1();// This is the UART Init values for 9600-8-N-1
float states[6] = {0,0,0,0,0,0};
// states [xn-2 xn-1 xn yn yn-1 yn-2]
xn = 1.0; //the initial value
while (1)
{
yn = 1.9842*yn1-yn2+0.0013*xn1+0.0013*xn2; // equation for the sine wave
yn2 = yn1;
yn1 = yn;
xn2 = xn1;
xn1 = xn;
putc(0x03,stdout);
//Here I want to send the xn to plot in MDV
putc(0xFC,stdout);
}
}
The variables in the equation
yn = 1.9842*yn1-yn2+0.0013*xn1+0.0013*xn2;
are with #define like this
#define xn states[2]
#define xn1 states[1]
#define xn2 states[0]
#define yn states[3]
#define yn1 states[4]
#define yn2 states[5]
The WriteUART1(0x03); and the WriteUART1(0xFC); calls mark the first and the last byte of each frame for Data Visualizer, as in the example in the Microchip video.
The question is: how can I get the float yn plotted by the Microchip Data Visualizer?
Thanks in advance.
OK, here is the answer.
A float is 32 bits long, but you can't handle it bit by bit the way you can an int. The way around this is to treat it as a sequence of chars.
You make a pointer to char and assign it the address of the float (casting the address, because a char pointer isn't the same type as a float pointer). Then you send 4 bytes, incrementing the char pointer after each one.
Here is the code:
while (1)
{
    yn = 1.9842 * yn1 - yn2 + 0.0013 * xn1 + 0.0013 * xn2; // sine recursive equation
    yn2 = yn1;
    yn1 = yn;
    xn2 = xn1;
    xn1 = xn;
    ptr = (char *) & yn; // This is the char pointer ptr saving the address of yn by casting it to char*, because &yn is float*
    putc(0x03,stdout); // the start frame for MPLABX Data Visualizer
    for (x = 0; x < sizeof(yn) ; x++) // with the for we go around the four bytes of the float
        putc(*ptr++,stdout); // we send every byte to the UART
    putc(0xFC,stdout); // the end frame for MPLABX Data Visualizer.
}
With this working, you have to configure the Data Visualizer: set your baud rate, then select new streaming variable. Choose a name, set Framing Mode to One's complement, and enter the start frame (0x03 in this case) and the end frame (0xFC). Then name the variable, set its type to float32, press next, plot variable, finish, and you have the variable in the MPLABX time plotter.
Here is the image of the plot
Hope, this helps someone.
Regards.-
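The framing described in this answer can also be sketched as a small buffer-building function. This is a minimal illustration that uses memcpy instead of the char pointer cast; it assumes a 4-byte float, and it assumes the receiver expects the bytes in the sender's native byte order. The names build_frame, FRAME_START, FRAME_END and FRAME_LEN are hypothetical, not part of any Microchip library.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_START 0x03u   /* start byte used in the answer above */
#define FRAME_END   0xFCu   /* end byte: the one's complement of 0x03 */
#define FRAME_LEN   (1 + sizeof(float) + 1)

/* Writes the start byte, the raw bytes of `value` and the end byte into
   `out`. Returns the number of bytes written. The float bytes are copied
   in the host's own byte order. */
static size_t build_frame(float value, uint8_t out[FRAME_LEN]) {
    out[0] = FRAME_START;
    memcpy(&out[1], &value, sizeof value);   /* copy the float byte by byte */
    out[1 + sizeof value] = FRAME_END;
    return FRAME_LEN;
}
```

Each byte of the resulting buffer would then be sent in order, for example with putc as in the loop above. Building the frame in a buffer first keeps the byte layout easy to inspect in a debugger.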

Sorting an array of objects based on one attribute only in Processing

I have a series of randomly plotted lines from a class called Line.
I have put all the objects into an array. I would like to connect any lines that are near each other with a dotted line. The simplest way I can think of doing this is to say if the x1 co-ordinate is <5 pixels from the x1 of another line, then draw a dotted line connecting the two x1 co-ordinates.
The problem I have is how to compare all the x1 co-ordinates with all the other x1 co-ordinates. I think this should involve (1) sorting the array and then (2) comparing consecutive array elements. However, I want to sort only on x1 and I don't know how to do this.
Here is my code so far:
class Line{
public float x1;
public float y1;
public float x2;
public float y2;
public color cB;
public float rot;
public float fat;
public Line(float x1, float y1, float x2, float y2, color tempcB, float rot, float fat){
this.x1 = x1;
this.y1 = y1;
this.x2 = x2;
this.y2 = y2;
this.cB = tempcB;
this.rot = rot;
this.fat = fat;
};
void draw(){
line(x1, y1, x2, y2);
//float rot = random(360);
float fat = random(5);
strokeWeight(fat);
////stroke (red,green,blue,opacity)
stroke(fat*100, 0, 0);
rotate(rot);
}
}
//Create array of objects
ArrayList<Line> lines = new ArrayList<Line>();
void setup(){
background(204);
size(600, 600);
for(int i = 0; i < 200; i++){
float r = random(500);
float s = random(500);
lines.add(new Line(r,s,r+10,s+10,color(255,255,255),random(360),random(5)));
}
//Draw out all the lines from the array
for(Line line : lines){
line.draw();
//Print them all out
println(line.x1,line.y1,line.x2,line.y2,line.cB,line.rot,line.fat);
}
}
//Now create connections between the elements
//If the x1 of the line is <5 pixels from another line then create a dotted line between the x1 points.
Like the other answer said, you need to compare both end points for this to make any sense. You also don't have to sort anything.
You should be using the dist() function instead of trying to compare only the x coordinate. The dist() function takes 2 points and gives you their distance. You can use this to check whether two points are close to each other or not:
float x1 = 75;
float y1 = 100;
float x2 = 25;
float y2 = 125;
float distance = dist(x1, y1, x2, y2);
if(distance < 100){
println("close");
}
You can use this function in your Line class to loop through other Lines and check for close points, or find the closest points, whatever you want.
As always, I recommend you try something out and ask another question if you get stuck.
The problem lies in the fact that a Line is composed of two points, and despite being tied together (pun intended), you need to check the points of each Line independently. The only point you really don't need to check is the other point in the same Line instance.
In this case, it might be in your best interest to have a Point class. Line would then use Point instances to define both ends rather than the raw float coordinates. In this way, you can have both a list of Lines and a list of Points.
You can then sort Points by x coordinate or y coordinate and grab all points within 5 pixels of your point (excluding the point itself and the other end of the same Line, of course).
Being able to split handling into Points and Lines is important in that you're using multiple views to handle the same data. As a general rule, you should rearrange said data whenever it becomes cumbersome to deal with in its current form. However if I may make a recommendation, the sorting is not strictly necessary. If you're checking a single point with all other points, you'd have to sort repeatedly according to the current point which is more work than simply making a pass in a list to deal with all other points that are close enough.
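The single pass over all other points described above can be sketched as follows. The sketch is written in C rather than Processing, and the function names is_close and count_close_pairs are hypothetical. It compares squared distances, which gives the same answer as comparing dist() against the limit but avoids the square root.

```c
#include <stddef.h>

/* Returns 1 if (x1, y1) and (x2, y2) are closer than `limit` pixels.
   Compares squared distances, so no square root is needed. */
static int is_close(float x1, float y1, float x2, float y2, float limit) {
    float dx = x2 - x1;
    float dy = y2 - y1;
    return dx * dx + dy * dy < limit * limit;
}

/* Counts the pairs (i, j) with i < j whose points lie within `limit`
   of each other. No sorting is involved: every pair is tested once. */
static size_t count_close_pairs(const float *xs, const float *ys,
                                size_t n, float limit) {
    size_t pairs = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (is_close(xs[i], ys[i], xs[j], ys[j], limit)) {
                pairs++;   /* a drawing program would connect the two points here */
            }
        }
    }
    return pairs;
}
```

Starting the inner loop at i + 1 tests each pair once and never compares a point with itself. For the 200 lines in the question this is at most a few tens of thousands of comparisons per end point set, which is why sorting is unlikely to be worth the extra code.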

iOS: How to determine whether three CGPoints lie in a straight line

I have two given CGPoints A and B and one another CGPoint C as obtained from within touchesEnded event.
I want to determine whether the three points lie in a straight line for which I have used the below given formula:
For point A(x1, y1), B(x2, y2) and C(x3, y3)
x1(y2 - y3) + x2 (y3-y1) + x3(y1-y2) = 0
But the formula isn't helpful at all.
Is there any other way of determining the collinearity of three points in iOS?
Thanks
arnieterm
This is not really an iOS question, or even a programming question -- it's an algorithm question.
I'm not sure if the algorithm you were given is correct -- see the cross-product answer given in comments for what I think of as the correct answer. In what way do you find your formula not helpful? Here it is in code, btw. (code typed in browser, not checked):
CGPoint p1;
CGPoint p2;
CGPoint p3;
const float closeEnough = 0.00001; // because floats rarely == 0
float v1 = p1.x * (p2.y - p3.y);
float v2 = p2.x * (p3.y - p1.y);
float v3 = p3.x * (p1.y - p2.y);
if (ABS(v1 + v2 + v3) < closeEnough)
{
// your algorithm is true
}
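For comparison, here is a minimal sketch of the cross-product form of the same test. Expanding (b - a) x (c - a) gives exactly the expression x1(y2 - y3) + x2(y3 - y1) + x3(y1 - y2) from the question, which is twice the signed area of the triangle. The Point2D type is a hypothetical stand-in for CGPoint so that the sketch compiles without CoreGraphics, and the tolerance is an assumption to be tuned.

```c
/* Hypothetical stand-in for CGPoint. */
typedef struct { double x, y; } Point2D;

/* Cross product of (b - a) and (c - a): twice the signed area of
   triangle abc. It is zero exactly when the three points are collinear. */
static double cross(Point2D a, Point2D b, Point2D c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Returns 1 if the absolute cross product is within `tolerance`, else 0. */
static int are_collinear(Point2D a, Point2D b, Point2D c, double tolerance) {
    double area2 = cross(a, b, c);
    if (area2 < 0) {
        area2 = -area2;
    }
    return area2 <= tolerance;
}
```

Note that the cross product is an area, so its magnitude grows with the distances between the points. A touch location rarely lands exactly on the line through A and B, so a tolerance as small as 0.00001 will almost never pass for touch input; a larger, distance-dependent tolerance is likely to be needed.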

OpenCV running slow

I have got OpenCV installed on vs2010/win7 however I'm seeing some behaviour that I can't figure out.
I am new to OpenCV so just have a basic program to pull frames from an avi file - it then splits that frame into single channel images and generates histograms for each of those (taken from an internet example). It actually all works fine, it's just extremely slow. It turns out that cvFillConvexPoly is actually taking 10-15 seconds (sometimes longer) to complete - but when it eventually returns it is correct.
This is the code snippet where I call the culprit function and as you can see I also tried cvFillPoly which took the same amount of time to complete.
IplImage* DrawHistogram(CvHistogram *hist, float sX)
{
float histMax = 0;
cvGetMinMaxHistValue(hist, 0, &histMax, 0, 0);
IplImage *imgHist = cvCreateImage(cvSize(256, 64), IPL_DEPTH_8U, 1);
cvZero(imgHist);
float histValue = 0;
float nextValue = 0;
for (int i = 0; i < ((BINS - 1)*sX); i++)
{
histValue = cvQueryHistValue_1D(hist, i);
nextValue = cvQueryHistValue_1D(hist, i + 1);
CvPoint p1 = cvPoint(i * sX, 64);
CvPoint p2 = cvPoint((i + 1) * sX, 64);
CvPoint p3 = cvPoint((i + 1) * sX, 64 - histValue*(64/histMax));
CvPoint p4 = cvPoint(i * sX, 64 - histValue*(64/histMax));
int n = 5;
CvPoint pts[] = {p1, p2, p3, p4};
cvFillConvexPoly(imgHist, pts, n, cvScalar(255));
//cvFillPoly(imgHist, pts, &n, 1,cvScalar(255));
}
return imgHist;
}
Any help is appreciated.
Compiled on Win7 x64 with CMake 2.8.2/VS2010 as 32 bit. Same behaviour when debugging and when running as a standalone.
Also have it running on Ubuntu 10.10 32 bit, compiled with gcc 4.4.5 where there isn't a problem.
Edit
I've tried recompiling with VS2008 and it still does the same thing. I don't understand what would cause it to run so slowly - unless it's the way 64bit windows "emulates" 32 bit which is causing the problem.
I can spot 2 possible bugs in your code, both of which have to do with bounds. Reading/writing outside array boundaries may result in all kinds of unexpected behavior, so it's a wonder your programs don't crash. Maybe GCC and/or the OpenCV library behaves differently on Ubuntu and Windows causing it not to crash on Ubuntu, but you should definitely take a look at the following 2 points.
I assume sX is a scaling factor? Your for loop should run from 0 to (BINS-1) regardless of this scaling, since you're using i to index your histogram and there are BINS bins, not BINS*sX. As long as sX == 1 you won't run into trouble, but any larger value makes the loop read past the end of the histogram. You are already using sX correctly in the cvPoint declarations.
According to the docs of the cvFillConvexPoly function, n should be the number of points, which is 4 in your case, not 5.
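Both points can be illustrated without OpenCV. The following is a minimal sketch in plain C: Pt is a hypothetical stand-in for CvPoint, IMG_H assumes the 64-pixel image height from the question, and bar_corners is an invented helper. The scale factor sX is applied only to the x coordinates, and the function returns the real number of corners so the caller does not have to hard-code it.

```c
/* Hypothetical stand-in for CvPoint. */
typedef struct { int x, y; } Pt;

enum { IMG_H = 64 };   /* assumed image height, as in the question */

/* Fills pts[] with the rectangle for one histogram bin and returns the
   number of points written. `bin` indexes the histogram directly;
   sX only stretches the bar horizontally. */
static int bar_corners(int bin, float sX, float value, float histMax,
                       Pt pts[4]) {
    int left  = (int)(bin * sX);
    int right = (int)((bin + 1) * sX);
    int top   = IMG_H - (int)(value * (IMG_H / histMax));
    pts[0] = (Pt){ left,  IMG_H };
    pts[1] = (Pt){ right, IMG_H };
    pts[2] = (Pt){ right, top };
    pts[3] = (Pt){ left,  top };
    return 4;   /* matches the array size; 5 would read past the end */
}
```

In the original function, the loop would then run over bin indices only (for example while i < BINS - 1, since the code also reads bin i + 1), and the value returned here would be passed as the point count.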

How do I visualize audio data?

I would like to have something that looks like this. Two different colors are not necessary.
(source: sourceforge.net)
I already have the audio data (one sample/millisecond) from a stereo wav in two int arrays, one each for the left and right channels. I have made a few attempts, but they don't look anywhere near as clear as this: they come out either too spiky or as a compact lump.
Any good suggestions? I'm working in C#, but pseudocode is OK.
Assume we have
a function DrawLine(color, x1, y1, x2, y2)
two int arrays with data right[] and left[] of length L
data values between 32767 and -32768
If you make any other assumptions just write them down in your answer.
for(i = 0; i < L - 1; i++) {
// What magic goes here?
}
This is how it turned out when I applied the solution Han provided. (only one channel)
alt text http://www.imagechicken.com/uploads/1245877759099921200.jpg
You'll likely have more than 1 sample for each pixel. For each group of samples mapped to a single pixel, you could draw a (vertical) line segment from the minimum value in the sample group to the maximum value. If you zoom in to 1 sample per pixel or less, this doesn't work anymore, and the 'nice' solution would be to display the sinc interpolated values.
Because DrawLine cannot paint a single pixel, there is a small problem when the minimum and maximum are the same. In that case you could copy a single pixel image in the desired position, as in the code below:
double samplesPerPixel = (double)L / _width;
double firstSample = 0;
int endSample = firstSample + L - 1;
for (short pixel = 0; pixel < _width; pixel++)
{
int lastSample = __min(endSample, (int)(firstSample + samplesPerPixel));
double Y = _data[channel][(int)firstSample];
double minY = Y;
double maxY = Y;
for (int sample = (int)firstSample + 1; sample <= lastSample; sample++)
{
Y = _data[channel][sample];
minY = __min(Y, minY);
maxY = __max(Y, maxY);
}
x = pixel + _offsetx;
y1 = Value2Pixel(minY);
y2 = Value2Pixel(maxY);
if (y1 == y2)
{
g->DrawImageUnscaled(bm, x, y1);
}
else
{
g->DrawLine(pen, x, y1, x, y2);
}
firstSample += samplesPerPixel;
}
Note that Value2Pixel scales a sample value to a pixel value (in the y-direction).
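The core of this approach, reducing each pixel column to a minimum and a maximum, can be isolated in a small function. This is a minimal sketch in C with a hypothetical name, column_min_max. Unlike the code above, it uses integer index arithmetic and half-open sample ranges, so neighbouring columns do not share a sample; that is a deliberate variation, not the answerer's exact method. It assumes count > 0, width > 0 and pixel < width.

```c
#include <stddef.h>

/* Finds the minimum and maximum of the samples that map onto one pixel
   column. Column `pixel` covers the half-open index range
   [pixel * count / width, (pixel + 1) * count / width). */
static void column_min_max(const int *samples, size_t count, size_t width,
                           size_t pixel, int *min_out, int *max_out) {
    size_t first = pixel * count / width;
    size_t last  = (pixel + 1) * count / width;   /* one past the end */
    if (last <= first) {
        last = first + 1;   /* fewer samples than pixels: use at least one */
    }
    if (last > count) {
        last = count;
    }
    int lo = samples[first];
    int hi = samples[first];
    for (size_t i = first + 1; i < last; i++) {
        if (samples[i] < lo) lo = samples[i];
        if (samples[i] > hi) hi = samples[i];
    }
    *min_out = lo;
    *max_out = hi;
}
```

The caller would map both values to y coordinates and draw one vertical line per column, falling back to a single pixel when the two values coincide, as described above.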
You might want to look into the R language for this. I don't have very much experience with it, but it's used largely in statistical analysis/visualization scenarios. I would be surprised if they didn't have some smoothing function to get rid of the extremes like you mentioned.
And you should have no trouble importing your data into it. Not only can you read flat text files, but it's also designed to be easily extensible with C, so there is probably some kind of C# interface as well.
