Scenario:
I'm experimenting with a thermocouple amplifier module (SN-6675, based on the MAX6675) and an Arduino Due.
After including the MAX6675 library, the Arduino can measure room temperature.
However, the temperature measured by the Arduino has two issues:
1) an offset compared to a Fluke thermometer;
2) lots of noise; it keeps fluctuating even after averaging every 5 samples.
E.g., the Fluke thermometer reads 28.9 °C at room temperature, while the Arduino reads anywhere from 19.75 to 45.75 °C.
Question: is there any method/filter to reduce the measured noise and give a steady output?
Code is attached for reference.
#include <MAX6675.h>

// Thermocouple amplifier pins and settings
int CS = 7;        // CS pin on MAX6675
int SO = 8;        // SO pin of MAX6675
int SCKpin = 6;    // SCK pin of MAX6675
int units = 1;     // Units to read out temp (0 = °F, 1 = °C)
float error = 0.0; // Temperature compensation error
float tmp = 0.0;   // Temperature output variable

// Sample counter
int no = 0;

MAX6675 temp0(CS, SO, SCKpin, units, error); // Initialize the MAX6675 library for our chip

void setup() {
  Serial.begin(9600); // initialize serial communications at 9600 bps
}

void loop() {
  no = no + 1;
  tmp = temp0.read_temp(5); // Read the temp 5 times and return the average value
  Serial.print(tmp);
  Serial.print("\t");
  Serial.println(no);
  delay(1000);
}
Any method/filter to reduce the measured noise and give a steady output?
The Kalman filter is pretty much the standard method for this:
Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing noise (random variations) and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those based on a single measurement alone.
If your background isn't maths, don't be put off by the formulas that you come across. In the single-variable case like yours, the filter is remarkably easy to implement, and I am sure googling will find a few implementations.
The filter will give you an estimate of the temperature as well as an estimate of its variance (the latter gives you an idea of how confident the filter is about its current estimate).
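To make that concrete, here is a minimal single-variable sketch (a constant-temperature model; the process and measurement variances kfQ and kfR are assumed values you would tune for your sensor, not anything taken from the MAX6675 library):

float kfEstimate = 25.0;  // initial state estimate (deg C)
float kfP = 100.0;        // initial estimate variance (deliberately large)
const float kfQ = 0.01;   // process noise variance (assumption: temperature drifts slowly)
const float kfR = 4.0;    // measurement noise variance (assumption: tune to your sensor)

float kalmanUpdate(float measurement) {
  kfP += kfQ;                                   // predict: state assumed constant, variance grows
  float k = kfP / (kfP + kfR);                  // Kalman gain: how much to trust the measurement
  kfEstimate += k * (measurement - kfEstimate); // blend prediction and measurement
  kfP *= (1.0 - k);                             // shrink the estimate variance
  return kfEstimate;
}

In loop() you would then print kalmanUpdate(temp0.read_temp(5)) instead of the raw reading.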
You may want to go for a simpler averaging algorithm instead. It is not as elegant as a proper filter, but may be adequate for your case. These algorithms are plentiful on the web.
You can play with the number of samples you take to balance the trade-off between latency and stability. You may want to start with 10 samples and work from there.
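As a sketch (a circular buffer holding the 10 samples suggested above; the window size N is the knob to turn):

const int N = 10;     // window size: larger = steadier output, but slower to react
float samples[N];
int sampleIdx = 0;    // next slot to overwrite
int sampleCount = 0;  // how many slots hold valid data so far

float movingAverage(float newSample) {
  samples[sampleIdx] = newSample;       // overwrite the oldest sample
  sampleIdx = (sampleIdx + 1) % N;
  if (sampleCount < N) sampleCount++;   // until the buffer fills, average what we have
  float sum = 0;
  for (int i = 0; i < sampleCount; i++) sum += samples[i];
  return sum / sampleCount;
}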
When I print my acceleration and velocity, they both start out (seemingly) normal. Shortly after, they start getting very big, then become -Infinity, then NaN. I have tried my best with the math/physics aspect, but my knowledge is limited, so be gentle. Any help would be appreciated.
float ang1, ang2, vel1, vel2, acc1, acc2, l1, l2, m1, m2, g;

void setup() {
  background(255);
  size(600, 600);
  stroke(0);
  strokeWeight(3);
  g = 9.81;
  m1 = 10;
  m2 = 10;
  l1 = 100;
  l2 = 100;
  vel1 = 0;
  vel2 = 0;
  acc1 = 0;
  acc2 = 0;
  ang1 = random(0, TWO_PI);
  ang2 = random(0, TWO_PI);
}
void draw() {
  pushMatrix();
  background(255);
  translate(width/2, height/2); // move origin
  rotate(PI/2);                 // make 0 degrees face downward
  ellipse(0, 0, 5, 5);          // dot at origin
  ellipse(l1*cos(ang1), l1*sin(ang1), 10, 10); // circle at m1
  ellipse(l2*cos(ang2) + l1*cos(ang1), l2*sin(ang2) + l1*sin(ang1), 10, 10); // circle at m2
  line(0, 0, l1*cos(ang1), l1*sin(ang1)); // arm 1
  line(l1*cos(ang1), l1*sin(ang1), l2*cos(ang2) + l1*cos(ang1), l2*sin(ang2) + l1*sin(ang1)); // arm 2

  float mu = 1 + (m1/m2);
  acc1 = (g*(sin(ang2)*cos(ang1-ang2) - mu*sin(ang1)) - (l2*vel2*vel2 + l1*vel1*vel1*cos(ang1-ang2))*sin(ang1-ang2)) / (l1*(mu - cos(ang1-ang2)*cos(ang1-ang2)));
  acc2 = (mu*g*(sin(ang1)*cos(ang1-ang2) - sin(ang2)) + (mu*l1*vel1*vel1 + l2*vel2*vel2*cos(ang1-ang2))*sin(ang1-ang2)) / (l2*(mu - cos(ang1-ang2)*cos(ang1-ang2)));
  vel1 += acc1;
  vel2 += acc2;
  ang1 += vel1;
  ang2 += vel2;
  println(acc1, acc2, vel1, vel2);
  popMatrix();
}
You haven't done anything wrong in your code, but the application of this mathematical technique is tricky.
This is a general problem with using numerical "solutions" to differential equations. Similar things happen if you try to simulate a bouncing ball:
//physics variables:
float g = 9.81;
float x = 200;
float y = 200;
float yvel = 0;
float radius = 10;

//graphing variables:
float[] yHist;
int iterator;

void setup() {
  size(800, 400);
  iterator = 0;
  yHist = new float[width];
}

void draw() {
  background(255);
  y += yvel;
  yvel += g;
  if (y + radius > height) {
    yvel = -yvel;
  }
  ellipse(x, y, radius*2, radius*2);

  //graphing:
  yHist[iterator] = height - y;
  beginShape();
  for (int i = 0; i < yHist.length; i++) {
    vertex(i, height - 0.1*yHist[i]);
  }
  endShape();
  iterator = (iterator + 1) % width;
}
If you run that code, you'll notice that the ball seems to bounce higher every single time. Obviously this does not happen in real life, nor should it happen even in ideal, lossless scenarios. So what happened here?
If you've ever used Euler's method for solving differential equations, you might recognize what's happening here. Really, when we code simulations of differential equations like this, we are applying Euler's method. In the case of the bouncing ball, the real curve is concave down (except at the points where it bounces), and Euler's method always overestimates when the real solution is concave down: the tangent line to a concave-down curve lies above the curve, so each step lands slightly above the true trajectory. That means that every frame, the computer guesses a little bit too high. These errors add up, and the ball bounces higher and higher.
Similarly, with your pendulum, it's getting a little bit more energy almost every single frame. This is a general problem with using numerical solutions. They are simply inaccurate. So what do we do?
In the case of the ball, we can avoid using a numerical solution altogether and go to an analytical solution. I won't go through how I got the solution, but here is the section that differs:
float h0;
float t = 0;
float pd;

void setup() {
  size(400, 400);
  iterator = 0;
  yHist = new float[width];
  noFill();
  h0 = height - y;
  pd = 2*sqrt(h0/g);
}

void draw() {
  background(255);
  y = g*sq((t - pd/2) % pd - pd/2) + height - h0;
  t += 0.5;
  ellipse(x, y, radius*2, radius*2);
  ... etc.
This is all well and good for a bouncing ball, but a double pendulum is a much more complex system. There is no fully analytical solution to the double pendulum problem. So how do we minimize error in a numerical solution?
One strategy is to take smaller steps. The smaller the steps you take, the closer you are to the real solution. You can do this by reducing g (this might feel like cheating, but think for a minute about the units you're using: g = 9.81 m/s^2. How does that translate to pixels and frames?). This will also make the pendulum move slower on the screen. If you want to increase accuracy without changing the viewing pace, you can take many small steps before rendering the frame. Consider changing the physics update in draw() (the acc1/acc2/vel/ang lines) to
int substepCount = 1000;
for (int i = 0; i < substepCount; i++) {
  acc1 = (g*(sin(ang2)*cos(ang1-ang2) - mu*sin(ang1)) - (l2*vel2*vel2 + l1*vel1*vel1*cos(ang1-ang2))*sin(ang1-ang2)) / (l1*(mu - cos(ang1-ang2)*cos(ang1-ang2)));
  acc2 = (mu*g*(sin(ang1)*cos(ang1-ang2) - sin(ang2)) + (mu*l1*vel1*vel1 + l2*vel2*vel2*cos(ang1-ang2))*sin(ang1-ang2)) / (l2*(mu - cos(ang1-ang2)*cos(ang1-ang2)));
  vel1 += acc1/substepCount;
  vel2 += acc2/substepCount;
  ang1 += vel1/substepCount;
  ang2 += vel2/substepCount;
}
This changes your one big step to 1000 smaller steps, making it much more accurate. I tested that part out and it continued for over 20000 frames multiple times with no NaN errors. It might devolve into NaN at some point, but this allows it to last much longer.
EDIT:
I also highly recommend using % TWO_PI when incrementing the angles:
ang1 = (ang1 + vel1/substepCount) % TWO_PI;
ang2 = (ang2 + vel2/substepCount) % TWO_PI;
because it makes the angle measurements MUCH more accurate at later times.
When you don't do this, if vel1 stays positive for a long time, ang1 gets bigger and bigger. Once ang1 is greater than 1, the computer needs a bit of the significand for the ones place, at the expense of a bit of precision at the small end. Since numbers are stored in binary, this happens again when ang1 > 2, again when ang1 > 4, and so on.
If you keep the angle's magnitude small (which is what % TWO_PI does in this case), only a couple of bits are needed for the sign and the integer part, and all the remaining bits can be used to measure the angle to the highest possible precision. This actually matters: if vel1/substepCount < 1/32768 and ang1 doesn't have enough bits to resolve the 1/32768 place, then ang1 will not register the change at all.
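For concreteness, a rough estimate assuming 32-bit floats (24-bit significand): near ang1 ≈ 318 (the 101.1*PI example below), adjacent representable values are spaced 2^-15 = 1/32768 apart, so any per-substep change much smaller than that is rounded away entirely; wrapped down below TWO_PI, the spacing is about 2^-21, roughly 64 times finer.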
To see the effects of this difference, give ang1 and ang2 really high initial values:
g = 0.0981;
ang1 = 101.1*PI;
ang2 = 101.1*PI;
If you don't use % TWO_PI, it approximates low velocities to zero, resulting in a bunch of stopping and starting.
END EDIT
If you need it to go for a ridiculously long time, so long that it isn't feasible to increase substepCount sufficiently, there is another thing you can do. This all comes about because vel increases to an extreme degree. You can constrain vel1 and vel2 so that they don't get too big.
In this case, I would recommend limiting the velocities based on conservation of energy. There is a maximum amount of mechanical energy allowed in the system based on the initial conditions. You cannot have more mechanical energy than the initial potential energy. Potential energy can be calculated based on the angles:
U(ang1, ang2) = -g*((m1+m2)*l1*cos(ang1) + m2*l2*cos(ang2))
Therefore we can determine exactly how much kinetic energy is in the system at any moment: The initial values of ang1 and ang2 give us the total mechanical energy. The current values of ang1 and ang2 give us the current potential energy. Then we can simply take the difference in order to find the current kinetic energy.
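As a sketch of that bookkeeping (same variables as above; E0 is a new variable you would set once in setup(), where the velocities start at zero, so the initial energy is all potential):

float potential(float a1, float a2) {
  // the potential energy formula above
  return -g*((m1 + m2)*l1*cos(a1) + m2*l2*cos(a2));
}

// in setup(), after the angles are chosen:
//   E0 = potential(ang1, ang2);
// at any later moment:
//   float kinetic = E0 - potential(ang1, ang2);  // kinetic energy currently available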
The way that pendulum motion is typically described (angles and angular velocities) does not lend itself directly to computing each arm's kinetic energy. It is possible, but I'm not going to do it here. My recommendation for constraining the velocities of the two pendulums is as follows:
1) Calculate the kinetic energy of the two arms separately.
2) Take the ratio between them.
3) Calculate the total kinetic energy currently in the two arms (total mechanical energy minus current potential energy).
4) Distribute that kinetic energy in the same proportions as in step 2), e.g. if you calculate that there is twice as much kinetic energy in the farther mass as in the closer mass, put 1/3 of the kinetic energy in the closer mass and 2/3 in the farther one.
I hope this helps, let me know if you have any questions.
I need to transform coordinates from spherical to Cartesian space using the Eigen C++ library. The following code serves the purpose:
const int size = 1000;
Eigen::Array<std::pair<float, float>, Eigen::Dynamic, 1> direction(size);
for (int i = 0; i < direction.size(); i++)
{
    direction(i).first = (i+10) % 360;  // some value for this example (denoting the azimuth angle)
    direction(i).second = (i+20) % 360; // some value for this example (denoting the elevation angle)
}

SSPL::MatrixX<T1> transformedMatrix(3, direction.size());
for (int i = 0; i < transformedMatrix.cols(); i++)
{
    const T1 azimuthAngle = direction(i).first*M_PI/180;    // converting to radians
    const T1 elevationAngle = direction(i).second*M_PI/180; // converting to radians
    transformedMatrix(0,i) = std::cos(azimuthAngle)*std::cos(elevationAngle);
    transformedMatrix(1,i) = std::sin(azimuthAngle)*std::cos(elevationAngle);
    transformedMatrix(2,i) = std::sin(elevationAngle);
}
I would like to know whether a better implementation is possible to improve the speed.
I know that Eigen has supporting functions for geometric transformations, but I have yet to see a clear example of using them for this.
Is it also possible to vectorize the code to improve the performance?
You could at least use the vectorized versions of sine/cosine:
void dir2vector2(Eigen::Matrix3Xf& out, const Eigen::Array2Xf& in) {
    Eigen::Array2Xf sine = sin(in * (M_PI/180));
    Eigen::Array2Xf cosi = cos(in * (M_PI/180));
    out.resize(3, in.cols());
    out << cosi.row(0) * cosi.row(1),
           sine.row(0) * cosi.row(1),
           sine.row(1);
}
There would still be a lot of optimization potential, e.g., calculating both the sine and the cosine of the same angle could share a lot of computation. And it is technically not necessary to store sine and cosi explicitly in temporaries (but Eigen is currently not able to automatically reuse common subexpressions).
Also, the multiplication at the end could be vectorized better if you store your input and output in row-major format (though the Eigen comma-initializer currently does not play well with vectorization, it seems).
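For completeness, here is a sketch of how the pair-based input from the question could be adapted to this signature (my assumption: row 0 holds the azimuth and row 1 the elevation, both in degrees):

const int size = 1000;
Eigen::Array2Xf in(2, size);
for (int i = 0; i < size; i++) {
    in(0, i) = (i+10) % 360; // azimuth angle in degrees
    in(1, i) = (i+20) % 360; // elevation angle in degrees
}
Eigen::Matrix3Xf out;
dir2vector2(out, in); // the function above resizes out to 3 x size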
I wrote some sample code to measure the brightness of an LED by controlling the duty cycle of the LED connected to an Arduino. I want to get the range from the least bright light to the maximum brightness over a specific period of time. When I set desired_brightness = 1, the LED emits light at 93 lux, which is not the least bright light. Any suggestion on how to get the least bright light?
int led = 3;                  // the pin that the LED is attached to
int brightness = 0;           // how bright the LED is
int incrementfactor = 10;     // how many points to fade the LED by
int desired_brightness = 255;
int extra_delay = 1000;

void setup() {
  pinMode(led, OUTPUT); // declare the LED pin to be an output
  analogWrite(led, desired_brightness);
}

void loop() {
  analogWrite(led, desired_brightness);
  brightness = brightness + incrementfactor;
  if (brightness == desired_brightness) {
    delay(extra_delay);
  }
}
I've tailored your code a bit. The main problem was that you were going to maximum brightness right away and never decreasing it. analogWrite() only takes values from 0 to 255, and you were writing 255 from the start, so the LED just stayed bright. Try this instead; it's the "breathing" effect you see on so many electronics these days, and it loops forever:
int ledPin = 3;         // the pin that the LED is attached to
int brightness = 0;     // how bright the LED is
int extra_delay = 1000; // the extra delay at max
int super_delay = 5000; // super delay at min
int direction = 1;      // the dimmer-brighter direction

void setup() {
  pinMode(ledPin, OUTPUT);
  analogWrite(ledPin, 0);
}

void loop() {
  analogWrite(ledPin, brightness); // light at certain brightness
  //delay(5); // wait a bit, stay at this level so we can see the effect!
  delay(50);  // longer delay means much slower change from bright to dim

  if (direction == 1) // determine whether to go brighter or dimmer
    brightness += 1;
  else
    brightness -= 1;

  if (brightness == 255) // switch direction for next time
  {
    direction = -1;
    delay(extra_delay); // extra delay at maximum brightness
  }
  if (brightness == 0) // switch direction, go brighter next time
  {
    direction = 1;
    delay(super_delay); // super long delay at minimum brightness
  }
}
This will go brighter, then dimmer, and repeat. The delay is very important: you can shorten or lengthen it, but without it the change happens so fast you can't see it by eye, only on an oscilloscope. Hope this helps you!
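As a rough check on the timing: each ramp is 255 steps of 50 ms, i.e. about 255 × 0.05 s ≈ 12.8 s from dim to bright, so with the 1 s pause at the top and the 5 s pause at the bottom a full cycle takes roughly 31 s.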
EDIT:
Added a "super delay" at minimum brightness for measurement of that level.
One thing to keep in mind is that pulse-width modulation like this still delivers the full drive voltage from the output pin. PWM just alters the ratio of the time the pin spends high vs. low; on this Arduino it is still a voltage swinging instantly and quite rapidly between 0 V and 3.3 V. To get a true analog voltage from this output, you need some circuitry to filter the highs and lows of the PWM into a smoother average DC voltage; if you want to pursue that, search for "RC low-pass filter".
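For a feel for the numbers (using the 3.3 V supply mentioned above): after filtering, the average voltage is roughly 3.3 V × duty/255, so analogWrite(led, 64) would give about 3.3 × 64/255 ≈ 0.83 V.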
In my virtual reality program I am heavily bound by memory bandwidth:
#version 320 es
precision lowp float;

const int n_pool = 30;

layout(local_size_x = 8, local_size_y = 16, local_size_z = 1) in;
layout(rgba8, binding = 0) writeonly uniform lowp image2D image;
layout(rgba8, binding = 1) readonly uniform lowp image2DArray pool;

uniform mat3 RT[n_pool]; // <- this is a rotation-translation matrix

void main() {
    uint u = gl_GlobalInvocationID.y;
    uint v = gl_GlobalInvocationID.x;
    vec4 Ir = imageLoad(pool, ivec3(u,v,29));
    float cost = 1.0/0.0;
    for (int j = 0; j < 16; j++) {
        float C = 0.0;
        for (int i = 0; i < n_pool; i++) {
            vec3 w = RT[i]*vec3(u,v,j);
            C += length(imageLoad(pool, ivec3(w[0],w[1],i)) - Ir);
        }
        cost = C < cost ? C : cost;
    }
    imageStore(image, ivec2(u,v), vec4(cost, cost, cost, 1.0));
}
You can see that I have a lot of random accesses into a TEXTURE_2D_ARRAY (width = 320, height = 240, layers = 30). However, the access is not entirely random, because it will be in the proximity of (u, v).
Here are my thoughts:
another texture format instead of RGBA floats (RGBA unsigned byte, maybe?).
the shared memory is too small to even store one grayscale image.
changing the loop order. Strangely, this ordering is faster, although the other should have better caching behaviour.
resizing the work groups to fit the textures better.
using compressed images (an unlikely scenario for a performance boost; in theory, however, it should help with the bandwidth).
What are your thoughts?
Do you have any actual data showing that your issue is texture bandwidth, or is that just an assumption?
I can see a number of issues which mean that may well not actually be your problem. For example:
vec3 w = RT[i]*vec3(u,v,j);
... you have a mat3 array load inside your inner loop, so on most architectures I know of, you are probably uniform-fetch bound, not texture bound. This should cache well in the GPU data cache, but it is probably still being refetched per loop iteration, which smells a lot more expensive than a single imageLoad(), unless your texture format is exceptionally wide ...
If you are using fp16 or fp32 RGBA texture inputs, then narrower 8-bit unorm formats are always going to be faster (fp32 is particularly expensive).
For the following:
cost = C < cost ? C : cost;
... it's probably more reliable, in terms of code generation, to use the min() built-in function, i.e. cost = min(C, cost);
UPDATE 1
Moving from the conventional pixel pipeline to a compute shader brought a 3x speedup.
UPDATE 2
Using compressed formats increased the FPS by 5%.
UPDATE 3
In this simplified version I didn't show that I was indeed creating many temporary vectors on the fly (in both the outer and the inner loop). Removing the mat3/vec4/vec3 creation inside the loops brought a 2x speedup. Very surprising to me that creating vectors in loops is that expensive.
Now I am well into real-time and have fulfilled my goal...
I am doing several steps of reprojections of a point cloud (around 40 million points initially, ~20 million during processing). The program crashes at seemingly random points in one of these two loops. If I run it with a smaller subset (~10 million points), everything works fine.
//Projection of the point cloud onto a sphere
pcl::PointCloud<pcl::PointXYZ>::Ptr projSphere(pcl::PointCloud<pcl::PointXYZ>::Ptr cloud, int radius)
{
    //output cloud
    pcl::PointCloud<pcl::PointXYZ>::Ptr output(new pcl::PointCloud<pcl::PointXYZ>);
    //time marker
    int startTime = time(NULL);
    cout << "Start Sphere Projection" << endl;
    //factor by which each point vector is multiplied to get a distance of radius to the origin
    float scalar;
    for (int i = 0; i < cloud->size(); i++)
    {
        if (i % 1000000 == 0) cout << i << endl;
        //P
        pcl::PointXYZ tmpin = cloud->points.at(i);
        //P'
        pcl::PointXYZ tmpout;
        scalar = radius/(sqrt(pow(tmpin.x, 2) + pow(tmpin.y, 2) + pow(tmpin.z, 2)));
        tmpout.x = tmpin.x*scalar;
        tmpout.y = tmpin.y*scalar;
        tmpout.z = tmpin.z*scalar;
        //Adding P' to the output cloud
        output->push_back(tmpout);
    }
    cout << "Finished projection of " << output->size() << " points in " << time(NULL)-startTime << " seconds" << endl;
    return(output);
}
//Stereographic projection
pcl::PointCloud<pcl::PointXYZ>::Ptr projStereo(pcl::PointCloud<pcl::PointXYZ>::Ptr cloud)
{
    //output cloud
    pcl::PointCloud<pcl::PointXYZ>::Ptr outputSt(new pcl::PointCloud<pcl::PointXYZ>);
    //time marker
    int startTime = time(NULL);
    cout << "Start Stereographic Projection" << endl;
    for (int i = 0; i < cloud->size(); i++)
    {
        //P
        if (i % 1000000 == 0) cout << i << endl;
        pcl::PointXYZ tmpin = cloud->points.at(i);
        //P'
        pcl::PointXYZ tmpout;
        //equation
        tmpout.x = tmpin.x/(1.0 + tmpin.z);
        tmpout.y = tmpin.y/(1.0 + tmpin.z);
        tmpout.z = 0;
        //Adding P' to the output cloud
        outputSt->push_back(tmpout);
    }
    cout << "Finished projection of " << outputSt->size() << " points in " << time(NULL)-startTime << " seconds" << endl;
    return(outputSt);
}
If I do all the steps independently, by saving/loading the point clouds to/from the hard disk and rerunning the program for each step, it also works fine. I'd like to provide the entire source files, but I'm not sure how, or whether it's necessary.
Thanks in advance
Edit 1:
After about a week I still have no idea what the issue might be, since the crashes are somewhat random (but not really?). I tried testing the program under different system workloads (freshly rebooted, with heavy-duty programs loaded, etc.); it makes no apparent difference. Since I thought it might be a memory issue, I tried moving the large objects from the stack to the heap (initialising them with new), which also made no difference. By far the largest object is the raw input file, which I open and close by:
ifstream file;
file.open(infile);
/*......*/
file.close();
delete file;
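// (note: as written, 'file' is an ifstream object rather than a pointer, so this 'delete' should not even compile)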
Is that properly done, so that the memory is released once the method completes?
Edit again:
So I kept trying, and finally I managed to put all the steps into one function, like this:
void stereoTiffI(string infile, string outfile, int length)
{
    //set up file input
    cout << "Opening file: " << infile << endl;
    ifstream file;
    file.open(infile);
    string line;
    //skip the first lines
    for (int i = 0; i < 9; i++)
    {
        getline(file, line);
    }
    //output cloud
    pcl::PointCloud<pcl::PointXYZ> cloud;
    getline(file, line);
    //indexes for string parsing, coordinates, and the start time
    int i = 0;
    int j = 0;
    int k = 0;
    float x = 0;
    float y = 0;
    float z = 0;
    float intensity = 0;
    float scalar = 0;
    int startTime = time(NULL);
    pcl::PointXYZ tmp;
    //begin loop
    cout << "Begin reading and projecting " << infile << endl;
    while (!file.eof())
    {
        getline(file, line);
        i = 0;
        j = line.find(" ");
        x = atof(line.substr(i, j).c_str());
        i = line.find(" ", i) + 1;
        j = line.find(" ", i) - i;
        y = atof(line.substr(i, j).c_str());
        i = line.find(" ", i) + 1;
        j = line.find(" ", i) - i;
        z = atof(line.substr(i, j).c_str());
        //i = line.find(" ", i) + 1;
        //j = line.find(" ", i) - i;
        //intensity = atof(line.substr(i, j).c_str());
        //leave out points below scanner height
        if (z > 0)
        {
            //projection onto a hemisphere with radius 1
            scalar = 1/(sqrt(pow(x, 2) + pow(y, 2) + pow(z, 2)));
            x = x*scalar;
            y = y*scalar;
            z = z*scalar;
            //stereographic projection
            x = x/(1.0 + z);
            y = y/(1.0 + z);
            z = 0;
            tmp.x = x;
            tmp.y = y;
            tmp.z = z;
            //tmp.intensity = intensity;
            cloud.push_back(tmp);
            k++;
            if (k % 1000000 == 0) cout << k << endl;
        }
    }
    cout << "Finished producing the projected cloud in " << time(NULL)-startTime << " seconds, with " << cloud.size() << " points." << endl;
And this actually works quite nicely and quickly. As a next step I tried to use the point type pcl::PointXYZI, because I also need the intensity of the scanned points. And guess what: the program crashes at around 17000000 again, and again I have no idea why. Please help.
OK, I solved it. Dr. Memory gave me the right hint with a heap allocation error. After a bit of googling, I enabled Large Addresses in Visual Studio (Properties -> Linker -> System, i.e. the /LARGEADDRESSAWARE linker flag).
Everything works like a charm.
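(For scale, a rough estimate: a pcl::PointXYZ occupies 16 bytes including its SSE padding, so a 40-million-point cloud is on the order of 640 MB; with the raw input file and intermediate copies alongside it, a 32-bit process easily runs past the 2 GB of address space it gets by default, which is exactly the limit that /LARGEADDRESSAWARE relaxes.)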