Radioactively depleting timer list (data structures)

I have a dataset of about 1-10k records, each with a ticking timer associated with it.
I'd like to be able to query it so that it surfaces the records whose timers have passed a certain time limit.
Here's an example:
[obj1] [2sec]
[obj2] [3sec]
[obj3] [2sec]
[obj4] [5sec]
[obj5] [7sec]
[obj6] [3sec]
[obj7] [2sec]
After 2 seconds, I'd like this data structure to surface obj1, obj3 and obj7.
After 1 more second, the data structure should surface obj2 and obj6, and so on...
Side note: is there a way to do this via Redis?
Thank you for your help in advance...
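Not part of the original post, but a sketch of the common approach may help: keep the records in a min-heap (priority queue) ordered by expiry time, so the record due next is always on top, and surfacing expired records is just popping while the top's timer has passed. In Redis the equivalent is a sorted set: ZADD each record with its expiry timestamp as the score, then poll with ZRANGEBYSCORE from -inf to the current time. A minimal Go sketch, assuming each record is identified by a string ID:

package main

import (
    "container/heap"
    "fmt"
    "time"
)

// item pairs a record ID with the time its timer expires.
type item struct {
    id     string
    expiry time.Time
}

// expiryHeap is a min-heap ordered by expiry time (soonest on top).
type expiryHeap []item

func (h expiryHeap) Len() int            { return len(h) }
func (h expiryHeap) Less(i, j int) bool  { return h[i].expiry.Before(h[j].expiry) }
func (h expiryHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *expiryHeap) Push(x interface{}) { *h = append(*h, x.(item)) }
func (h *expiryHeap) Pop() interface{} {
    old := *h
    it := old[len(old)-1]
    *h = old[:len(old)-1]
    return it
}

// popExpired removes and returns every record whose timer has passed.
func popExpired(h *expiryHeap, now time.Time) []string {
    var due []string
    for h.Len() > 0 && !(*h)[0].expiry.After(now) {
        due = append(due, heap.Pop(h).(item).id)
    }
    return due
}

func main() {
    h := &expiryHeap{}
    now := time.Now()
    heap.Push(h, item{"obj1", now.Add(2 * time.Second)})
    heap.Push(h, item{"obj2", now.Add(3 * time.Second)})
    heap.Push(h, item{"obj3", now.Add(2 * time.Second)})

    time.Sleep(2 * time.Second)
    fmt.Println(popExpired(h, time.Now())) // surfaces obj1 and obj3 after ~2s
}

At 1-10k records both approaches are comfortably fast; the in-process heap is simplest, while the Redis sorted set also survives restarts and can be shared between processes.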

Related

Using a Grafana counter to visualize weather data

I'm trying to visualize my weather data using Grafana. I've already done the Prometheus part, and now I face an issue that has been haunting me for quite a while.
I created a counter that adds the indoor temperature every five minutes.
var tempIn = prometheus.NewCounter(prometheus.CounterOpts{
    Name: "tempin",
    Help: "Temperature indoor",
})

for {
    tempIn.Add(station.Body.Devices[0].DashboardData.Temperature)
    time.Sleep(time.Second * 300)
}
How can I visualize this data so that it shows the current temperature, and store it indefinitely so I can look at it even a year later like a normal graph?
tempin{instance="localhost:9999"} will only display the added-up temperature, so it's useless for me. I need the current temperature, not the accumulated one. I also tried rate(tempin{instance="localhost:9999"}[5m]).
How can I solve this?
Although a counter is not the best solution for this use case, you can use the increase() function:
increase(tempin{instance="localhost:9999"}[5m])
This will tell you how much the counter increased in the last five minutes.
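Since the root problem is that a Prometheus Counter only ever accumulates, the more direct fix (not part of the original answer, but standard Prometheus practice) is a Gauge, which is set to the current reading instead of added to. A minimal Go sketch; readStation() is a hypothetical stand-in for the station.Body.Devices[0].DashboardData.Temperature call in the question:

package main

import (
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// A Gauge can go up and down, which matches a temperature reading;
// Set replaces the stored value instead of accumulating it.
var tempIn = promauto.NewGauge(prometheus.GaugeOpts{
    Name: "tempin",
    Help: "Temperature indoor",
})

// readStation is a hypothetical stand-in for the weather-station call
// from the question (station.Body.Devices[0].DashboardData.Temperature).
func readStation() float64 { return 21.5 }

func main() {
    go func() {
        for {
            tempIn.Set(readStation()) // current value, not a running sum
            time.Sleep(5 * time.Minute)
        }
    }()
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":9999", nil)
}

With a Gauge, tempin{instance="localhost:9999"} graphs the current temperature directly; how far back you can look is then governed by Prometheus's retention settings, not by the metric type.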

Get current no from prooph event store

I'm trying to update a projection from the event store. The following line will load all events:
$events = $this->eventStore->load(new StreamName('mystream'));
Currently I'm trying to load only unhandled events by passing the fromNumber parameter:
$events = $this->eventStore->load(new StreamName('mystream'), 10);
This will load all events, e.g. from 15 to 40. But I have found no way to figure out the current/highest "no" of the results, which I need in order to load only from that entry onward next time.
If the database is truncated (with restarted sequences) this is not a real problem, because I know the events will start at 1. But if the primary key starts with a number higher than 1, I cannot figure out which event has which number in the event store.
When you are using pdo-event-store, you have a key _position in the event metadata after loading, so your read model can track which position was the last one you were working on. Beyond that, if you are working with prooph's event-store projections, you don't need to take care of that at all. The projector tracks the current event position for all needed streams internally; you just need to provide callbacks for each event where you need to do something.

Session Window preventing GroupByKey from working

I have an incoming stream of events, each of which already has an associated sessionId from another process.
All I wish to do is combine these events into a single session object using a custom CombineFn.
During development, I'm using a bounded dataset that reads from a file and the following code seems to work:
input.apply(ParDo.named("ParseEvent").of(new ParseEventFn()))
    .setCoder(KvCoder.of(StringUtf8Coder.of(), AvroCoder.of(Event.class)))
    .apply(GroupByKey.<String, Event>create())
    .apply(Combine.groupedValues(new SessionAccumulator()))
The above code (with input/output handling) will output a series of sessions with multiple events in each.
{sessionId: 1, events: [event1,event2,event3]}
{sessionId: 2, events: [event4,event5]}
But in order for this to work on an unbounded dataset, I need to apply a Windowing function, which in this case is a SessionWindow.
input.apply(ParDo.named("ParseEvent").of(new ParseEventFn()))
    .setCoder(KvCoder.of(StringUtf8Coder.of(), AvroCoder.of(Event.class)))
    .apply(Window.<KV<String, Event>>into(Sessions.withGapDuration(Duration.standardMinutes(30))))
    .apply(GroupByKey.<String, Event>create())
    .apply(Combine.groupedValues(new SessionAccumulator()))
In that case the only new code is the windowing function, and rather than rolling up the events, I get each event in its own session, like this:
{sessionId: 1, events: [event1]}
{sessionId: 1, events: [event2]}
{sessionId: 1, events: [event3]}
{sessionId: 2, events: [event4]}
{sessionId: 2, events: [event5]}
Any idea why this is happening?
EDIT: I should add, the ParseEventFn is applying a timestamp to the PCollection using context.outputWithTimestamp(), and that timestamp seems to be correct.
Digging into it further, the issue in my case was that my core assumption, that the timestamps were correct, was wrong.
The timestamps I was applying before the windowing were wrong.
The windowing was doing exactly what it should, but I had set my timestamps too far apart, so it was creating separate sessions for each event.
Oops.
In your case, you could possibly write your own WindowFn. If you set the keys to be the session IDs then a large gap duration also works, but it doesn't reflect the nature of your data and computation quite as well.
The ingredients of your WindowFn would be:
your own subclass of BoundedWindow, in this case you would make a window type that contained the session ID in a field
assignWindows, where you would assign each element to a window identified by the session ID. The length of the window still matters, as it controls when the window expires and is garbage collected.
mergeWindows, where you would merge all windows that have the same session ID. They wouldn't have to fall within any particular gap duration.
Another thing you'll need to be careful of is that the watermark that governs the garbage collection of these windows is determined by the source of your unbounded stream of events. So setting the timestamps in your ParDo.of(new ParseEventFn()) will be too late to influence the watermark. You may have data dropped that you'd like to keep.

How do we improve a MongoDB MapReduce function that takes too long to retrieve data and gives out of memory errors?

Retrieving data from Mongo takes too long, even for small datasets. For bigger datasets we get out-of-memory errors from the JavaScript engine. We've tried several schema designs and several ways to retrieve data. How do we optimize MongoDB, our mapReduce function, and MongoWire to retrieve more data more quickly?
We're not very experienced with MongoDB yet and are therefore not sure whether we're missing optimization steps or if we're just using the wrong tools.
1. Background
For graphing and playback purposes we want to store changes for several objects over time. Currently we have tens of objects per project, but we expect to need to store thousands of objects. The objects may change every second or not change for long periods of time. A Delphi backend writes to and reads from MongoDB through MongoWire and SuperObject; the data is displayed in a web frontend.
2. Schema design
We're storing the object changes in minute-second-millisecond objects in one record per hour. The schema design is as described here. Sample:
{
    o: object1,
    dt: $date,
    v: {0: {0: {0: {speed: 8, rate: 0.8}}}, 1: {0: {0: {speed: 9}}}, …}
}
We've put indexes on {dt: -1, o: 1} and {o:1}.
3. Retrieving data
We use a mapReduce to construct a new date based on the minute-second-millisecond objects and to put the object back in v:
{
    o: object1,
    dt: $date,
    v: {speed: 8, rate: 0.8}
}
An average document is about 525 kB before the mapReduce function and has had ~29000 updates. After mapReduce of such a document, the result is about 746 kB.
3.1 Retrieving data through the mongo shell with mapReduce
We're using the following map function:
function mapF() {
    for (var i = 0; i < 3600; i++) {
        var imin = Math.floor(i / 60);
        var isec = (i % 60);
        var min = '' + imin;
        var sec = '' + isec;
        if (this.v.hasOwnProperty(min) && this.v[min].hasOwnProperty(sec)) {
            for (var ms in this.v[min][sec]) {
                if (imin !== 0 && isec !== 0 && ms !== '0' && this.v[min][sec].hasOwnProperty(ms)) { // is our keyframe
                    var currentV = this.v[min][sec][ms];
                    // newT is the new date computed from the min, sec, ms above
                    if (toDate > newT && newT > fromDate) {
                        if (fields && fields.length > 0) {
                            for (var p = 0, length = fields.length; p < length; p++) {
                                // check if field is present and put it in newV
                            }
                            if (newV) {
                                emit(this.o, {vs: [{o: this.o, dt: newT, v: newV}]});
                            }
                        } else {
                            emit(this.o, {vs: [{o: this.o, dt: newT, v: currentV}]});
                        }
                    }
                }
            }
        }
    }
}
The reduce function basically just passes the data on. The call to mapReduce:
db.collection.mapReduce(mapF, reduceF, {
    out: {inline: 1},
    query: {o: {$in: objectNames}, dt: {$gte: keyframeFromDate, $lt: keyframeToDate}},
    sort: {dt: 1},
    scope: {toDate: toDateWithinKeyframe, fromDate: fromDateWithinKeyframe, fields: []},
    jsMode: true
});
Retrieving 2 objects over 1 hour: 2.4 seconds.
Retrieving 2 objects over 5 hours: 8.3 seconds.
For this method we would have to write js and bat files at runtime and read the JSON data back in. We have not measured times for this yet because, frankly, we don't like the idea very much.
Another problem with this method is that we get out-of-memory errors from the V8 JavaScript engine when we try to retrieve data for longer periods and/or more objects. Using a PC with more RAM works to some extent in preventing out-of-memory errors, but it doesn't make retrieving data faster.
This article mentions splitVector, which we might use to divide the workload. But we're not sure how to use the keyPattern and maxChunkSizeBytes options. Can we use a keyPattern for both o and dt?
We might use multiple collections, but our dataset isn't that big to start with at the moment, so we're worried about how many collections we'd need.
3.2 Retrieving data through mongoWire with mapReduce
For retrieving data through MongoWire with mapReduce, we use the same mapReduce functions as above. We use the following Delphi code to start the query:
FMongoWire.Get('$cmd', BSON([
    'mapreduce', 'collection',
    'map', bsonJavaScriptCodePrefix + FMapVCRFunction.Text,
    'reduce', bsonJavaScriptCodePrefix + FReduceVCRFunction.Text,
    'out', BSON(['inline', 1]),
    'query', mapquery,
    'sort', BSON(['dt', -1]),
    'scope', scope
]));
Retrieving data with this method is about 3-4 times (!) slower. And then the data has to be translated from BSON (IBSONDocument) to JSON (SuperObject), which is a major time-consuming part of this method. For retrieving raw data we use TMongoWireQuery, which translates the BSON document in parts, while this mapReduce call uses TMongoWire directly and tries to translate the complete result. This might explain why it takes so long, while normally it's quite fast. If we can reduce the time it takes for the mapReduce to return results, this might be a next step for us to focus on.
3.3 Retrieving raw data and parsing in Delphi
Retrieving raw data to Delphi takes a bit longer than the previous method, but probably because of the use of TMongoWireQuery, the translation from BSON to JSON is much quicker.
4. Questions
Can we do further optimizations on our schema design?
How can we make the mapReduce function faster?
How can we prevent the out-of-memory errors of the V8 engine? Can someone give more information on the splitVector function?
How can we best make use of mapReduce from Delphi? Can we use MongoWireQuery instead of MongoWire?
5. Specs
MongoDB 3.0.3
MongoWire from 2015 (recently updated)
Delphi 2010 (got XE5 as well)
4GB RAM (tried on 8GB RAM as well, less out of memory, but reading times are about the same)
Phew, what a question! First up: I'm not an expert at MongoDB. I wrote TMongoWire as a way to get to know MongoDB a little. Also, I really (really) dislike it when wrappers have a plethora of overloads to do the same thing for all kinds of specific types. A long time ago programmers didn't have generics, but we did have Variant. So I built a MongoDB wrapper (and IBSONDocument) based around variants. That said, I apparently made something people like to use, and by keeping it simple it performs quite well. (I haven't been putting much time into it lately, but at the top of the list is catering for the new authentication schemes since version 3.)
Now, about your specific setup. You say you use mapReduce to get from 500 KB to 700 KB? I think there's a hint there that you're using the wrong tool for the job. I'm not sure what the default mongo shell does differently from doing the same over TMongoWire.Get, but if I assume mapReduce assembles the full response before sending it over the wire, that's where the performance gets lost.
So here's my advice: you're right to think about using TMongoWireQuery. It offers a way to process data faster, as the server will be streaming it in, but there's more.
I strongly suggest using an array to store the list of seconds. Even if not all seconds have data, store null for the seconds without data so that each minute array has 60 items. This is why:
One nicety that turned up in designing TMongoWireQuery is the assumption that you'll be processing a single (BSON) document at a time, and that the contents of the documents will be roughly similar, at least in the value names. So by using the same IBSONDocument instance when enumerating the response, you actually save a lot of time by not having to de-allocate and re-allocate all those variants.
That goes for simple documents, but it would actually be nice to have on arrays as well. That's why I created IBSONDocumentEnumerator. You pre-load an IBSONDocument instance with an IBSONDocumentEnumerator in the place where you're expecting the array of documents, and you process the array in roughly the same way as with TMongoWireQuery: enumerate it using the same IBSONDocument instance, so that when subsequent documents have the same keys, time is saved by not having to re-allocate them.
In your case, though, you would still need to pull an entire hour's worth of data through the wire just to select the seconds you need. As I said before, I'm not a MongoDB expert, but I suspect there could be a better way to store data like this. Either with a separate document per second (I guess this would let the indexes do more of the work, and MongoDB can take that insert rate), or with a specific query construction so that MongoDB knows to shorten the seconds array to just the data you're requesting (is that what $slice does?).
Here's an example of how to use IBSONDocumentEnumerator on documents like {name:"fruit",items:[{name:"apple"},{name:"pear"}]}
q := TMongoWireQuery.Create(db);
try
  q.Query('test', BSON([]));
  // Pre-load the parent document with an enumerator in the place
  // where the embedded array of documents is expected.
  e := BSONEnum;
  d := BSON(['items', e]);
  d1 := BSON;
  while q.Next(d) do
  begin
    i := 0;
    // Re-use the same IBSONDocument (d1) for each array element to
    // avoid re-allocating the variants on every iteration.
    while e.Next(d1) do
    begin
      Memo1.Lines.Add(d['name'] + '#' + IntToStr(i) + d1['name']);
      inc(i);
    end;
  end;
finally
  q.Free;
end;

D3 ticks() does not return a value if the provided scale has only 1 result

I have an x-axis that displays the days that my data occurs on. The data is dynamic and sometimes I have data for only 1 day, 2 days, n days, etc.
Here is my code for displaying the days on the x-axis:
chart.x = d3.time.scale()
    .range([0, chart.w]);

chart.xAxis = d3.svg.axis()
    .scale(chart.x)
    .orient("bottom")
    .ticks(d3.time.day) // --- TODO: this is not showing the current day, for some reason...
    .tickFormat(d3.time.format("%b %-d %p"));
If my data is spread over 2 days (e.g. Tuesday and Wednesday), this will only display a tick for the second day (Wednesday), i.e. when the day "changes" from one to the next.
I want to also display a tick for the first day (Tuesday).
Even if there is only data on 1 day, I still want to display a tick for it.
Thanks, you guys.
To extend the domain so that the scale starts and ends at a tick mark, you use the .nice() method, as @meetamit suggested -- but "nicing" only works if you call that method after you set the domain, which is why you might not have noticed any change. The API doesn't really make that clear, although since the method alters the domain, I suppose it makes sense that changing the domain later would override the effect of a previous nice() call.
Also, be sure to use the time-scale version of the method: .nice(d3.time.day) to get a domain rounded off to the nearest day as opposed to just the nearest hour.
Here's a fiddle:
http://fiddle.jshell.net/4rGQq/
The key code is simply:
xScale.domain(d3.extent(d)) // d3.extent() returns the max and min of the array, which become the basic domain
    .nice(d3.time.day);     // nice() extends the domain to the nearest start/end of a day
Compare what happens if you comment out the .nice() call after setting the domain, even with the other .nice() call during initialization of the scale. Also compare what happens if you don't specify the day-interval as a parameter to the nice method.
Can you show how chart.x is set up? Hard to tell without seeing it, but you may be able to fix it by calling chart.x.nice() (see documentation).
Otherwise, seems like you'll need to manually check the extents of its domain, and adjust them in the case of single day.
Clarification
Your code shows how you call range() but not how you call domain(), which is the important one.
It seems to me that if you do
var domain = chart.x.domain();
console.log(domain[0] == domain[1]);
you'll see true getting logged whenever the data is for only one day. If so, it means you're dealing with a single point in time rather than a time range. In that case, you'll need to adjust the domain to be a longer range.
Really hard to know without even seeing an image of what you're working on.
.ticks() should be used to set the number of ticks you'd like to have on your axis, not the kind of data that should be in them. So try setting it to something like .ticks(3) and it should show a couple of ticks.
From the wiki:
.ticks([count])
Returns approximately count representative values from the scale's input domain. If count is not specified, it defaults to 10. The returned tick values are uniformly spaced, have human-readable values (such as multiples of powers of 10), and are guaranteed to be within the extent of the input domain. Ticks are often used to display reference lines, or tick marks, in conjunction with the visualized data. The specified count is only a hint; the scale may return more or fewer values depending on the input domain.
