Consider a timeline that visually summarises a signal of data. Let the length of this timeline be fixed at, say, 5,000 pixels. Data streams in blocks and fills up our 5,000-pixel window. OK so far. We then receive another block of 500 values which we want to merge into our 5,000-pixel timeline window, giving the user a real-time visualisation of the overall signal to date. Does anyone know an algorithm that supports this?
What I implemented (which doesn't work) is this: when my 5,000 window grows by 500 and becomes 5,500 long, I interpolate it back down to the fixed 5,000 window and update the view, repeating the process as blocks continue to arrive. However, I have found this gradually shifts the overall picture of the signal from right to left, crunching up the data on the left-hand side.
Because the data streams in blocks and is too large to store in full and summarise at the end, I need to continually update the overall view in real time as the data arrives.
If anyone knows of an algorithm, it would be much appreciated. I code in Java, but a solution in any language or a technical paper would be fine. Thanks.
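One direction that avoids repeated re-interpolation (and hence the leftward drift) is to bin raw samples into a fixed number of min/max buckets and, when the buckets are all used, merge adjacent pairs so each bucket simply covers twice as many samples. The sketch below assumes a plain float signal and an even width; the class and method names are made up for illustration, not an established API.

```java
// A sketch, not production code: the timeline keeps a fixed number of min/max
// buckets and, when they are all used, merges adjacent pairs 2:1 so that each
// bucket simply covers twice as many raw samples. Raw values are only ever
// binned once, so there is no cumulative re-interpolation drift.
public class StreamingTimeline {
    private final int width;          // number of on-screen buckets, e.g. 5000 (assumed even)
    private final float[] min, max;   // one min/max pair per bucket
    private int samplesPerBucket = 1; // raw samples represented by one bucket
    private int bucketCount = 0;      // buckets used so far
    private int fill = 0;             // raw samples already in the current bucket

    public StreamingTimeline(int width) {
        this.width = width;
        this.min = new float[width];
        this.max = new float[width];
    }

    // Merge an incoming block (e.g. 500 values) into the view.
    public void append(float[] block) {
        for (float v : block) {
            if (bucketCount == width && fill == 0) {
                compact(); // out of buckets: halve the resolution
            }
            int i = (fill == 0) ? bucketCount++ : bucketCount - 1;
            if (fill == 0) {
                min[i] = v;
                max[i] = v;
            } else {
                min[i] = Math.min(min[i], v);
                max[i] = Math.max(max[i], v);
            }
            fill = (fill + 1) % samplesPerBucket;
        }
    }

    // Merge adjacent buckets 2:1; each bucket now covers twice as many samples.
    private void compact() {
        for (int i = 0; i < width / 2; i++) {
            min[i] = Math.min(min[2 * i], min[2 * i + 1]);
            max[i] = Math.max(max[2 * i], max[2 * i + 1]);
        }
        bucketCount = width / 2;
        samplesPerBucket *= 2;
    }

    // Each bucket can be drawn as a vertical min-to-max line at pixel i.
    public int bucketsUsed()      { return bucketCount; }
    public float bucketMin(int i) { return min[i]; }
    public float bucketMax(int i) { return max[i]; }
}
```

One trait of this scheme is that immediately after a merge the drawn timeline only occupies half the width until new data fills it back up; if that is unacceptable, the alternative is to derive each bucket's boundaries from the total sample count seen so far rather than from the previous, already-resampled buckets, so errors still never accumulate.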
I'm gathering data from load sensors at about 50 Hz. I might have 2-10 sensors running at a time. This data is stored locally, but after a period of about a month it needs to be uploaded to the cloud. The data within any one second can vary quite significantly and is quite dynamic.
It's too much data to send because it's going over GSM and the signal will not always be great.
The most simplistic approach I can think of is to look at the 50 data points in one second and reduce them to just enough data to make a box-and-whisker plot. Then the data stored in the cloud could be used to create dashboards that look similar to stock charts. This would at least show me the max, min and average, and give some idea of the distribution of the load during that second.
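As a rough illustration of that reduction, here is a sketch that collapses one second of readings into box-and-whisker style statistics; the class name and the interpolated-percentile helper are illustrative choices, not an established convention.

```java
import java.util.Arrays;

// A sketch: collapse one second of readings (about 50 values) into
// box-and-whisker style summary statistics. Names are illustrative.
public final class SecondSummary {
    public final double min, q1, median, q3, max, mean;

    public SecondSummary(double[] window) {
        double[] sorted = window.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (double v : sorted) sum += v;
        this.min = sorted[0];
        this.max = sorted[sorted.length - 1];
        this.mean = sum / sorted.length;
        this.q1 = percentile(sorted, 0.25);
        this.median = percentile(sorted, 0.50);
        this.q3 = percentile(sorted, 0.75);
    }

    // Percentile with linear interpolation between neighbouring sorted samples.
    private static double percentile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        double frac = pos - lo;
        return sorted[lo] * (1 - frac) + sorted[hi] * frac;
    }
}
```

Six doubles per sensor per second instead of 50 raw readings is roughly an 8x reduction before any compression; a count of readings outside the whiskers could also be kept if outliers matter.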
This is probably oversimplified, though, so I was wondering whether there is a common approach to this problem in data science... take a dense set of data and reduce it so that it still captures the highlights and doesn't lose its meaning.
Any help or ideas appreciated
I am currently toying around with the Scroll API of Elasticsearch, and want to use it to obtain a large set of data and do some manual processing on it. The processing is performed by an external library and is not of the type that can easily be included as a script.
While this seems to work nicely at the moment, I was wondering what considerations I should take into account when fine-tuning the scroll size for this form of processing. A quick observation seems to indicate that increasing the scroll size reduces the latency of the operation. While I suspect that larger scroll sizes will generally reduce throughput, I have no idea whether this hypothesis is correct. Also, I have no idea whether there are other consequences that I am not envisioning right now.
So to summarize, my question is: what impact does changing Elasticsearch's scroll size have, especially on performance, in a scenario where the results are processed for each batch that is obtained?
Thanks in advance!
The one consideration (and the only one I know of) is being able to process each batch fast enough that the scroll context is not released (the keep-alive is controlled by the ?scroll=X parameter).
Assuming that you will consume all the data from the query, the scroll size should be tuned based on the network and the third-party app's performance. That is:
- if your app can process data in a stream-like manner, bigger chunks are better
- if your app processes data in batches (waiting for the full ES response first), the upper limit on batch size should guarantee that processing time < scroll keep-alive time
- if you work in a poor network environment, a smaller batch size is better for handling the overhead of dropped connections/retries
- generally, a bigger batch is better, as it eliminates some network/ES CPU overhead
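To make those knobs concrete, here is a sketch of a scroll loop using the Java High Level REST Client (the package and method names shown are from the 7.x client and vary slightly between versions); batchSize is the scroll size discussed above, keepAlive is the ?scroll=X keep-alive, and processBatch stands in for the external library.

```java
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class ScrollProcessor {
    // batchSize is the scroll size; keepAlive must outlive the processing of one batch.
    public static void processAll(RestHighLevelClient client, String index,
                                  int batchSize, TimeValue keepAlive) throws Exception {
        SearchRequest request = new SearchRequest(index);
        request.source(new SearchSourceBuilder().size(batchSize));
        request.scroll(keepAlive);

        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        String scrollId = response.getScrollId();
        SearchHit[] hits = response.getHits().getHits();

        while (hits.length > 0) {
            processBatch(hits);                          // hand the batch to the external library

            SearchScrollRequest scroll = new SearchScrollRequest(scrollId);
            scroll.scroll(keepAlive);                    // renew the context for the next batch
            response = client.scroll(scroll, RequestOptions.DEFAULT);
            scrollId = response.getScrollId();
            hits = response.getHits().getHits();
        }

        ClearScrollRequest clear = new ClearScrollRequest();
        clear.addScrollId(scrollId);                     // free the server-side scroll context
        client.clearScroll(clear, RequestOptions.DEFAULT);
    }

    private static void processBatch(SearchHit[] hits) {
        // placeholder for the external processing library
    }
}
```

Note that the keep-alive is renewed on every scroll call, so it only has to outlive the processing of a single batch, not the whole export.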
Background on the data: it is a single variable from a machine like a bulldozer (the hydraulic pressure responsible for the movement of its bucket), which performs actions like loading its bucket, moving the vehicle to a place to dump the loaded material, and then dumping the material.
I have marked the Load Event (loading the bucket), Haul Event (machine moving to dump), and Dump Event (dumping the load).
So one Load Event, one Haul Event and one Dump Event together constitute a Complete Cycle. In the image provided I see 12 such cycles.
Problem Statement: Detect the count of such cycles in the data provided, and also eliminate the noise (I have marked the noise in red in the image). Also calculate the time taken by each event: how much time did the load event, the haul event and the dump event each take? Combining these three gives the complete cycle time.
I tried to detect the events using a moving average, but it doesn't fit well.
Can anyone suggest a machine learning/ANN/better approach that can accurately detect the events?
Looking at the image, one can see that the initial peaks are the Load, then the Haul, and the last spike would be the Dump. So we need to detect peaks based on a dynamic threshold.
Any viable approach to solve this problem is appreciated.
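One simple, non-ML direction along the dynamic-threshold line above is to flag samples whose pressure rises well above a rolling baseline, drop activations that are too short to be real events, and then group the surviving segments into Load/Haul/Dump by their order and duration. The sketch below is only illustrative: the class name, the threshold factors and the window sizes are invented and would need tuning against the real signal.

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of dynamic-threshold segmentation with hysteresis: a segment opens
// when the pressure rises well above a rolling mean and closes when it drops
// back; segments shorter than minDurationSamples are treated as noise.
// All thresholds and window sizes here are invented and need tuning.
public class PressureSegmenter {

    // Returns {startIndex, endIndex} pairs for the detected activity segments.
    public static List<int[]> detectSegments(double[] pressure, int baselineWindow,
                                             double enterFactor, double exitFactor,
                                             int minDurationSamples) {
        List<int[]> segments = new ArrayList<>();
        double sum = 0;
        boolean active = false;
        int start = -1;

        for (int i = 0; i < pressure.length; i++) {
            // rolling mean over the last baselineWindow samples as a dynamic baseline
            sum += pressure[i];
            if (i >= baselineWindow) sum -= pressure[i - baselineWindow];
            double baseline = sum / Math.min(i + 1, baselineWindow);

            if (!active && pressure[i] > baseline * enterFactor) {
                active = true;                            // segment opens on a strong rise
                start = i;
            } else if (active && pressure[i] < baseline * exitFactor) {
                active = false;                           // segment closes when pressure falls back
                if (i - start >= minDurationSamples) {    // reject short noise spikes
                    segments.add(new int[] { start, i });
                }
            }
        }
        return segments;
    }
}
```

In practice a rolling median is usually more robust than a rolling mean as the baseline, because the mean gets pulled up by the very peaks being detected. The segment boundaries then give the per-event durations, and counting Load/Haul/Dump groups gives the cycle count.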
In my application I receive measurement data from an external library: a dataset of 20 3D points, 30 times per second. I've noticed some jitter in the data and I'm looking for a way to suppress or, better, flatten peaks out of the data stream without significantly slowing down the whole system. Unfortunately I have to rely on the last dataset I receive, so I can't record the data (let's say in a queue) and filter out "bad" values after the fact. But I could record data and try fitting an incoming value to the recorded data.
Is there any proven and reliable algorithm for this problem?
Thanks in advance
FUX
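For completeness, here is a sketch of the cheapest common option, a one-pole low-pass (exponential moving average) applied per coordinate of each point; it only needs the previous filtered frame, so it fits the constraint of not buffering a queue of datasets. The class name and the choice of alpha are illustrative.

```java
// A sketch of a one-pole low-pass (exponential moving average) per coordinate.
// Only the previous filtered frame is kept, so there is no queue of datasets
// and essentially no extra latency. alpha close to 1 means little smoothing,
// alpha close to 0 means heavy smoothing; the value has to be tuned.
public class PointSmoother {
    private final double alpha;
    private double[][] state;                    // last filtered frame: [point][x, y, z]

    public PointSmoother(double alpha) {
        this.alpha = alpha;
    }

    // frame: e.g. 20 points x 3 coordinates, arriving 30 times per second.
    public double[][] filter(double[][] frame) {
        if (state == null) {
            state = new double[frame.length][];
            for (int p = 0; p < frame.length; p++) {
                state[p] = frame[p].clone();     // first frame passes through unfiltered
            }
            return state;
        }
        for (int p = 0; p < frame.length; p++) {
            for (int c = 0; c < 3; c++) {
                state[p][c] = alpha * frame[p][c] + (1 - alpha) * state[p][c];
            }
        }
        return state;
    }
}
```

A small median filter over the last three frames rejects single-frame spikes better than an EMA, at the cost of one frame of delay and a tiny buffer, so it may be worth comparing both against the actual jitter.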
Let's say I want to speed up networking in a real-time game by sending only changes in position instead of absolute position. How might I deal with packet loss? If one packet is dropped the position of the object will be wrong until the next update.
Reflecting on #casperOne's comment, this is one reason why some games go "glitchy" over poor connections.
A possible solution to this is as follows:
Decide on the longest time you can tolerate an object/player being displayed in the wrong location - say xx ms. Put a watchdog timer in place that sends the location of an object at least every xx ms, or whenever a new position is calculated.
Depending on the quality of the link and the complexity of your scene, you can shorten the value of xx. Basically, if you are not using the available bandwidth, start sending the current position of the object that has gone the longest without an update.
To do this you need to maintain a list of items in the order you have updated them, and rotate through it.
That means that fast changes are reflected immediately (if an object updates every ms, you will probably get a packet through quite often, so there is hardly any lag), but it never takes more than xx ms before you get another chance at an updated state.
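A sketch of that rotation is below. Objects live in a deque ordered by when they were last sent; each network tick resends anything older than xx ms and spends any spare bandwidth on whatever has gone longest without a refresh. GameObject, sendAbsolutePosition and the tick parameters are placeholders, not a real networking API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A sketch of the watchdog rotation described above. Objects are kept in the
// order they were last sent; every network tick resends the absolute position
// of anything older than maxAgeMillis ("xx ms") and spends any spare bandwidth
// on whatever has gone longest without a refresh. GameObject and
// sendAbsolutePosition() are placeholders, not a real networking API.
public class PositionWatchdog {
    private final Deque<GameObject> byLastSent = new ArrayDeque<>();
    private final long maxAgeMillis;

    public PositionWatchdog(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    public void track(GameObject obj) {
        byLastSent.addLast(obj);
    }

    // Call once per network tick, after the normal delta updates have been queued.
    public void tick(long nowMillis, int spareBandwidthSlots) {
        int sent = 0;
        while (!byLastSent.isEmpty()) {
            GameObject oldest = byLastSent.peekFirst();
            boolean stale = nowMillis - oldest.lastSentMillis >= maxAgeMillis;
            boolean spare = sent < spareBandwidthSlots;
            if (!stale && !spare) {
                break;                           // nothing forced, no bandwidth left
            }
            byLastSent.pollFirst();
            sendAbsolutePosition(oldest);        // full position, not a delta
            oldest.lastSentMillis = nowMillis;
            byLastSent.addLast(oldest);          // rotate to the back of the list
            sent++;
        }
    }

    private void sendAbsolutePosition(GameObject obj) {
        // placeholder: put the object's absolute position into the next outgoing packet
    }

    static class GameObject {
        long lastSentMillis;                     // placeholder for your own game object type
    }
}
```

Because a resent object rotates to the back of the deque, every object gets an absolute-position refresh within maxAgeMillis plus one tick, even if all of its delta packets are lost.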