I have some data I'm trying to fit an ExponentialSmoothing on:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
fit1 = ExponentialSmoothing(df1, trend='add').fit()
For some reason the prediction will just predict a straight line:
import matplotlib.pyplot as plt
plt.figure(figsize=[15, 5])
plt.plot(fit1.fittedvalues)
plt.plot(fit1.forecast(5000))
plt.show()
Why can this happen? What am I missing?
The model you have selected (exponential smoothing with a trend term) can only produce two types of forecasts:
A straight line that is trending up or down (if there is an upwards or downwards trend in your data)
A horizontal line (if the trend is estimated to be zero)
Although your data has irregular, explosive behavior, it has no overall trend. To me, it seems reasonable that the trend term would be estimated to be (near) zero, and so the forecast is a straight line.
If you want to model those irregular explosive periods, you would need a more complicated model.
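For intuition, here is a minimal pure-NumPy sketch of Holt's additive-trend smoothing (the method family that `trend='add'` selects; the function name and smoothing parameters here are illustrative, not statsmodels' fitted values). Notice the forecast is literally `level + h * trend`, i.e. a straight line:

```python
import numpy as np

def holt_forecast(y, alpha=0.5, beta=0.1, steps=10):
    """Holt's linear (additive-trend) exponential smoothing, sketched."""
    level, trend = y[0], y[1] - y[0]          # simple initialisation
    for t in range(1, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # every h-step-ahead forecast lies on the line level + h * trend
    return np.array([level + h * trend for h in range(1, steps + 1)])
```

If the fitted trend is (near) zero, that line is flat, which is exactly the behavior in the plot above.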
I took the data from here (Nasa - Topography) and here (Nasa - Bathymetric) and used it to create a 3D model of the Earth's entire surface (both above and below water).
Here's what I got:
As you can see, it is super jagged.
The problem is that because I'm using greyscale images, I only have 512 distinct levels to work with (2 × 256). Going from the ocean floor to the highest peak obviously crosses more than 512 distinct elevations, so the data has effectively been passed through an unwanted step function.
Had they used all RGB channels, this wouldn't be a problem, but then the image wouldn't be very "human-readable".
Smoothing in general is an option, but not a great one, because it will drastically degrade cliffs, peaks, canyons, etc.
Here's the thing: we know that each pixel is within (maxHeight-minHeight)/512 (=maxoffset) of the actually correct value: as stated it has pretty much gone through an unwanted step function. Of course, mathematically a step function is irreversible - however, that doesn't stop us from trying!
Here are some of my thoughts on how this might work:
Find the average height of surrounding pixels, for some radius. Calculate the difference between this pixel's current value and the calculated average. Do nothing with this value yet.
While calculating this, store which pixel has the greatest difference.
Then, "normalize" all values such that this greatest difference is (maxHeight-minHeight)/512: the maxoffset: the max that a pixel could be off. Due to outliers, this "normalization" shouldn't be linear, but such that the average is 85% (or something) of this maxoffset.
Peaks (pixels that are higher than all surrounding pixels) and Basins (same idea except lower) get excluded from this process, as they'll be outliers and shouldn't change much anyhow (or undergo a process of their own).
That might not work. I could still use basic "average smoothing" except with the following rules:
No smoothing of peaks (pixels that are higher than all surrounding pixels), basins (same idea except lower), cliffs (this is way more difficult and may not happen - but the idea is to check if pixels have a drop on one side and roughly the same height pixels on the other side for some distance).
If the pixel has significantly more pixels around the same height than not, give greater weight to those nearly-same-height pixels.
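As a rough sketch of that rule set (peaks and basins left untouched, everything else averaged), assuming a NumPy height map and a 3×3 neighbourhood standing in for "some radius"; all names here are hypothetical:

```python
import numpy as np

def constrained_smooth(height):
    """Average smoothing that skips local peaks and basins.

    A 'peak' is a pixel strictly higher than all 8 neighbours,
    a 'basin' strictly lower; both are excluded from smoothing.
    """
    out = height.astype(float).copy()
    for i in range(1, height.shape[0] - 1):
        for j in range(1, height.shape[1] - 1):
            window = height[i - 1:i + 2, j - 1:j + 2].astype(float)
            centre = window[1, 1]
            neighbours = np.delete(window.ravel(), 4)
            if centre > neighbours.max() or centre < neighbours.min():
                continue  # peak or basin: leave as-is
            out[i, j] = window.mean()
    return out
```

The second rule (weighting nearly-same-height neighbours more) could be added by replacing `window.mean()` with a weighted mean.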
I'm also looking into finding better data, but I'm not confident I will, because I require bathymetric data and most GPS/elevation APIs are topography-only. In any case, this is an interesting problem, and I'm curious whether there are already some good algorithms for it.
I'm working on a project in which a rod is attached at one end to a rotating shaft. So, as the shaft rotates from 0 to ~100 degrees back-and-forth (in the xy plane), so does the rod. I mounted a 3-axis accelerometer at the end of the moving rod, and I measured the distance of the accelerometer from the center of rotation (i.e., the length of the rod) to be about 38 cm. I have collected a lot of data, but I'm in need of help to find the best method to filter it. First, here's a plot of the raw data:
I think the data makes sense: when it's ramping up, the acceleration should be linearly increasing at that point, and when it's ramping down, it should linearly decrease. If it's moving at constant speed, the acceleration will be ~zero. Keep in mind, though, that the speed sometimes changes (is higher) from one "trial" to the other. In this case there were ~120 "trials" (movements/sweeps), with data sampled at 148 Hz.
For filtering, I've tried a low pass filter and then an exponentially decreasing moving average, and both plots weren't too hot. And although I'm not good at interpreting these: here is what I got when coding a power frequency plot:
What I was hoping to get help with here is finding a really good method to filter this data. The one thing that keeps coming up time and time again (especially on this site) is the Kalman filter. While there's lots of code online that helps implement these in MATLAB, I haven't been able to understand it well, so I've left my attempts out of this post. So: is a Kalman filter appropriate here, for rotational acceleration? If so, can someone help me implement one in MATLAB and interpret it? Or is there something I'm not seeing that may be just as good or better, yet relatively simple?
Here's the data I'm talking about. Looking at it more closely/zooming in gives a better appreciation for what's going on in the movement, I think:
http://cl.ly/433B1h3m1L0t?_ga=1.81885205.2093327149.1426657579
Edit: OK, here is the plot of both relevant dimensions collected from the accelerometer. I am not including the up-and-down dimension, as the accelerometer shows a near-constant ~1 G there, so I think it's safe to say it's not capturing much rotational motion. Red is what I believe is the centripetal component, and blue is tangential. I have no idea how to combine them, though, which is why I (maybe wrongly?) ignored that in my post.
And here is the data for the other dimension:
http://cl.ly/1u133033182V?_ga=1.74069905.2093327149.1426657579
Forget the Kalman filter, see the note at the end of the answer for the reason why.
Using a simple moving average filter (like I showed you in an earlier reply, if I recall), which is in essence a low-pass filter:
n = 30 ; %// length of the filter
kernel = ones(1,n)./n ;
ysm = filter( kernel , 1 , flipud(filter( kernel , 1 , flipud(y) )) ) ;
%// assuming your data "y" are in COLUMN (otherwise change 'flipud' to 'fliplr')
Note: if you have access to the Curve Fitting Toolbox, you can simply use ys = smooth(y,30); to get nearly the same result.
I get:
which once zoomed look like:
You can play with the parameter n to increase or decrease the smoothing.
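For readers without MATLAB, a rough NumPy equivalent of the same forward-backward moving average (a sketch; the function name is made up):

```python
import numpy as np

def zero_phase_moving_average(y, n=30):
    """Apply an n-point moving average forward and backward, so the
    phase delay of the two passes cancels (same idea as the
    filter/flipud pair above, or MATLAB's filtfilt)."""
    kernel = np.ones(n) / n

    def causal(x):                      # MATLAB: filter(kernel, 1, x)
        return np.convolve(x, kernel)[:len(x)]

    return causal(causal(y[::-1])[::-1])
```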
The gray signal is your original signal. I strongly suspect that the noise spikes you are getting are just due to vibrations of your rod: depending on its length-to-cross-section ratio, you can get significant vibrations at the end of a 38 cm rod. These vibrations take the shape of oscillations around the main carrier signal, which looks very much like what I am seeing in your data.
Note:
The Kalman filter is way overkill for simple filtering of noisy data. A Kalman filter is used when you want to estimate a value (a position, to follow your example) from noisy measurements, but to refine the estimate it also uses a prediction of the state based on the previous state and the dynamics (how fast you were rotating, for example). For that prediction you need a "model" of your system's behavior, which you do not seem to have.
In your case, you would need to compute the acceleration seen by the accelerometer from the (known or theoretical) rotation speed of the shaft at each point in time, the distance of the accelerometer from the center of rotation, and, probably, to make it more precise, a dynamic model of the main vibration modes of your rod. Then, at each step, compare that prediction to the actual measurement... which seems a bit heavy for your case.
Look at the quick figure explaining the Kalman filter process in this Wikipedia entry: Kalman filter, and read on if you want to understand it more.
Instead of a Kalman filter, I propose an ordinary first-order low-pass (inertial) filter. I designed the filter with a pass-band up to 10 Hz (~0.1 of your sample frequency). The discrete model has the following equation:
y[k] = 0.9418*y[k-1] + 0.05824*u[k-1]
where u is your measured vector and y is the vector after filtering. The recursion starts at sample number 1, so you can simply assign 0 to sample number 0.
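As a sketch, the recursion is straightforward to apply in code (Python shown here for illustration; the coefficients are the ones given above):

```python
import numpy as np

def first_order_lowpass(u, a=0.9418, b=0.05824):
    """First-order low-pass: y[k] = a*y[k-1] + b*u[k-1], with y[0] = 0."""
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        y[k] = a * y[k - 1] + b * u[k - 1]
    return y
```

Note that the DC gain is b / (1 - a) ≈ 1, so slowly varying signals pass through almost unchanged.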
I have a set of (slightly noisy) GPS coordinates that I want to turn into a path. How do I go about this?
I find it similar to this question, except my points are ordered. Also, the path does not need to go through the points, but just follow their general direction.
It seems that Bezier curves might be the answer, but is there a way to use Bezier curves on hundreds of points?
Q&A:
How are your points ordered? They are ordered by time and attached to a travelling car. There might be data that specifies that the car is travelling backwards, but I can remove that data by requiring that all points move in a "forward" direction. So then I should have a list of points that all go forwards in space and don't intersect with themselves.
What if we connect all the points with straight lines? It won't look pretty. I'd like the lines to be continuous and curvy.
What about using a spline between all the points? This too will carry too much noise; the path will be very "jumpy". It would be better if we didn't have to go through the points, just near them.
It is a bit of heavy machinery, but you can model your GPS observations as points following a Gaussian process with Gaussian noise. The main Gaussian process model specifies that the underlying unknown true x and y coordinates of two measurements close in time should be close, while the noise term allows the observed x and y GPS values to deviate a bit from the true values predicted by the process. If you're interested, read the book "Gaussian Processes for Machine Learning", available online. I think it's a really elegant, flexible and powerful solution, but it would take far too much space to explain in enough detail here, so you really do need to read about it in the book.
Once you've learned the most likely Gaussian process model solution, you can make predictions of x and y locations for any time point, and it will be a smooth curve, which you can then plot. It won't pass through the observed GPS locations exactly.
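A minimal, self-contained sketch of that idea for one coordinate (posterior mean of a zero-mean GP with an RBF kernel plus Gaussian noise; the hyperparameters here are placeholders you would normally learn from the data):

```python
import numpy as np

def gp_smooth(t, x, t_pred, length=5.0, signal=1.0, noise=0.5):
    """Posterior mean of a zero-mean GP: smooth estimate of x at t_pred."""
    def k(a, b):
        d = a[:, None] - b[None, :]
        return signal ** 2 * np.exp(-0.5 * d ** 2 / length ** 2)

    K = k(t, t) + noise ** 2 * np.eye(len(t))    # noisy-observation covariance
    return k(t_pred, t) @ np.linalg.solve(K, x)  # predictive mean
```

Run it once for the x coordinates and once for the y coordinates against the timestamps, then plot the two predictions as the path.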
So I’m trying to find the rotational angle for stripe lines in images like the attached photo.
The only assumption is that the lines are parallel, and their orientation is approximately 90 degrees [say, 5 degrees of tolerance].
I have to make sure the stripe lines in the result image are 100% vertical. The quality of the images varies, as do their histogram/greyscale values, so methods based on non-adaptive thresholding have already failed for my cases [I'm not interested in thresholding-based methods unless I can make them adaptive]. Also, there are sometimes random black clusters on top of the stripe lines.
What I did so far:
1) Of course HoughLines is the first option, but I couldn't make it work for all my images, though I had some partial success following this great article:
http://felix.abecassis.me/2011/09/opencv-detect-skew-angle/.
The main reason for failure, to my understanding, was that I needed to fine-tune the parameters for different images: parameters for Canny/BW/morphological edge detection (if needed), and parameters such as minLineLength/maxLineGap/etc. There's surely a way to hack at this and make it work, but to me it's a fragile solution!
2) What I'm working on right now is to divide the image into a top slice and a bottom slice, find the peaks and valleys of each slice, and then compute the angle from the width of the image and the translation of the peaks. I'm currently working out which peak of the top slice belongs to which of the bottom slice, since there will be some false-positive peaks in my computation due to the black/white clusters on top of the stripe lines.
Example: Location of peaks for slices:
Top slice = {1, 33, 67, 90, 110}
Bottom slice = {3, 14, 35, 63, 90, 104}
I am getting similar vectors when extracting peaks. As you can see, the lengths of the vectors can differ. Any idea how I can get a grouping like:
{{1,3},{33,35},{67,63},{90,90},{110,104}}
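One way to sketch that grouping is a greedy nearest-neighbour match with a distance tolerance, so false-positive peaks (like 14 here) simply stay unmatched; `match_peaks` and `tol` are made-up names:

```python
def match_peaks(top, bottom, tol=8):
    """Pair each top-slice peak with the closest unused bottom-slice peak,
    dropping pairs that are farther apart than tol."""
    pairs, used = [], set()
    for t in top:
        best = None
        for i, b in enumerate(bottom):
            if i in used:
                continue
            if best is None or abs(b - t) < abs(bottom[best] - t):
                best = i
        if best is not None and abs(bottom[best] - t) <= tol:
            pairs.append((t, bottom[best]))
            used.add(best)
    return pairs

print(match_peaks([1, 33, 67, 90, 110], [3, 14, 35, 63, 90, 104]))
# [(1, 3), (33, 35), (67, 63), (90, 90), (110, 104)]
```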
I’m open to any idea about improving any of these algorithms or a completely new approach. If needed, I can upload more images.
If you can get a list of points for a single line, a linear regression will give you a formula for the straight line that best fits the points. A simple trig operation will convert the line formula to an angle.
You can probably use some line thinning operation to turn the stripes into a list of points.
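A sketch of that regression-plus-trig step (assuming the thinned stripe gives (x, y) point lists; `line_angle` is a hypothetical helper):

```python
import math
import numpy as np

def line_angle(xs, ys):
    """Least-squares fit y = slope*x + intercept, then convert the
    slope to an angle in degrees from the x-axis."""
    slope, _intercept = np.polyfit(xs, ys, 1)
    return math.degrees(math.atan(slope))
```

For near-vertical stripes it is safer to regress x on y instead, since the slope of y on x blows up near 90 degrees.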
You can run an accumulator of spatial derivatives along different angles. If you want half-degree precision and a sample of 5 lines, you have at most 10·5·1500 = 75,000 iterations. You can safely reduce the sampling rate along the line tenfold, giving a sample size of 150 points per line and cutting the iterations to 7,500. Somewhere around that point, the operation of straightening the image ought to become the bottleneck.
My problem is that I have a large set of GPS tracks from different GPS loggers used in cars. When not turned off, these cheap devices log phantom movements even when standing still:
As you can see in the image above, about a thousand points get visualized in a kind of congestion. Now I want to remove all of these points so that the red track coming from the left ends before the jitter starts.
My approach is to "draw" two or three circles around each point in the track, check how many other points are located within these circles and check the ratio:
(#points / covered area) > threshold?
If the ratio exceeds a certain threshold (purple circles), I could delete all the points within. So: an easy method, but it has huge disadvantages, e.g. computation time, deleting "innocent" tracks that merely pass through the circle, and it doesn't detect outliers like the single points at the bottom of the picture.
I am looking for a better way to detect large heaps of points like the one in the picture. It should not remove false positives (aggregations of perhaps 5 or 10 points don't matter to me). Also, it should not simplify the rest of the track!
Edit: The result in given example should look like this:
My first step would be to investigate the speeds implied by the 'movements' of your stationary car and the changes in altitude. If either of these changes too quickly or too slowly (you'll have to decide the thresholds here) then you can probably conclude that they are due to the GPS jitter.
What information, other than position at time, does your GPS device report?
EDIT (after OP's comment)
The problem is to characterise part of the log as 'car moving' and part of the log as 'car not moving but GPS location jittering'. I suggested one approach, Benjamin suggested another. If speed doesn't discriminate accurately enough, try acceleration. Try rate of change of heading. If none of these simple approaches work, I think it's time for you to break out your stats textbooks and start figuring out autocorrelation of random processes and the like. At this point I quietly slink away ...
Similarly to High Performance Mark's answer, you could look for line intersections that happen within a short number of points. When driving on a road, the route of the last n points rarely intersects with itself, but it does in your stationary situation because of the jitter. A single intersection could be a person doubling-back or circling around a block, but multiple intersections should be rarer. The angle of intersection will also be sharper for the jitter case.
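A sketch of that intersection count over a sliding window (plain Python; the names and the window size are illustrative):

```python
def _ccw(a, b, c):
    """Signed area test: > 0 if a-b-c turns counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _segments_intersect(p, q, r, s):
    """Proper (non-touching) intersection of segments pq and rs."""
    return (_ccw(p, q, r) * _ccw(p, q, s) < 0
            and _ccw(r, s, p) * _ccw(r, s, q) < 0)

def count_recent_intersections(points, window=20):
    """Self-intersections of the polyline through the last `window`
    points; many of them suggest stationary jitter."""
    pts = points[-window:]
    count = 0
    for i in range(len(pts) - 1):
        for j in range(i + 2, len(pts) - 1):
            if _segments_intersect(pts[i], pts[i + 1], pts[j], pts[j + 1]):
                count += 1
    return count
```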
What is the data interval of the GPS points? It seems these are in seconds. There may be one other way to add to the logic previously mentioned:
distance(d0, dn) >= 80% of (distance(d0,d1) + distance(d1,d2) + ... + distance(d(n-1),dn))

That is, the straight-line displacement from the first to the last point of a window should be at least 80% of the summed point-to-point path length; where this fails, the points are likely jitter.
You can iterate over this 0-to-n window in larger and smaller chunks, since the net distance travelled within a jitter cluster will not be much. For example, iterate over maybe 60 points of data at a time, and within that window step through 10 points per iteration.
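Put together, the window check might look like this (hypothetical helper, assuming 2-D projected coordinates; raw lat/lon would first need projecting to metres):

```python
import math

def is_stationary(points, ratio=0.8):
    """True when the straight-line displacement over the window is much
    shorter than the summed point-to-point path length, i.e. jitter."""
    path = sum(math.dist(points[i], points[i + 1])
               for i in range(len(points) - 1))
    direct = math.dist(points[0], points[-1])
    return path > 0 and direct < ratio * path
```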