How to capture non-event-driven data in Keen.io - business-intelligence

Capturing data in Keen.io is pretty straightforward when it's driven by an event (user-logged-in, customer-bought-stuff, ...). That's explained very nicely in the docs.
But what is the best approach when I want insight into the state of my app at a given time in the past?
For instance: how many active licenses were there this time last year?
Our first thought is to run a cronjob every hour to get that data from the DB and store it in a custom collection. Is this the right approach, or are there better solutions?
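That cronjob approach can be quite small in practice. Here is a rough Go sketch of an hourly snapshot job, assuming Keen's HTTP event-collection endpoint and a write key; the URL shape, collection name, and countActiveLicenses helper are placeholders to adapt to your setup:

package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// countActiveLicenses is a placeholder for your own DB query.
func countActiveLicenses() (int, error) { return 42, nil }

func main() {
	n, err := countActiveLicenses()
	if err != nil {
		log.Fatal(err)
	}

	// One document per snapshot; the explicit timestamp is what lets you ask
	// "how many were there this time last year?" later.
	body, _ := json.Marshal(map[string]interface{}{
		"active_licenses": n,
		"snapshot_at":     time.Now().UTC().Format(time.RFC3339),
	})

	// Assumed endpoint shape; substitute your project id, collection name, and write key.
	url := "https://api.keen.io/3.0/projects/PROJECT_ID/events/license_snapshots"
	req, _ := http.NewRequest("POST", url, bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "WRITE_KEY")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("keen response:", resp.Status)
}

With one document per snapshot, answering the "active licenses a year ago" question becomes a simple query over the custom collection.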

Related

What is a good way to store arbitrary schedule data?

I am working on a project where the objective is to keep track of whether a client has uploaded data within their expected time window or not. These windows can occur daily, weekly, bi-weekly, bi-monthly, quarterly, etc. I'm currently using cron syntax to describe these schedules, which kind of works, but it falls short in some scenarios, for example the bi-weekly one.
I was wondering if there is some industry-standard way to efficiently describe and store schedules that I don't know of. I tried to educate myself via Google, but as soon as I put the word "schedule" into the mix, it assumes I'm trying to schedule some task, whereas I only want to calculate dates based on these schedule definitions and compare the incoming file dates against them.
Thank you
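Not a full answer, but one way to sidestep cron syntax for intervals like bi-weekly is to store each schedule as an anchor date plus a period and derive the expected windows from it. A minimal Go sketch of that idea (the type and field names are made up):

package main

import (
	"fmt"
	"time"
)

// Schedule describes "every Every, starting from Anchor, with a grace Window".
type Schedule struct {
	Anchor time.Time     // first expected window start
	Every  time.Duration // e.g. 14 * 24h for bi-weekly
	Window time.Duration // how long after the start an upload still counts as on time
}

// OnTime reports whether an upload at t falls inside an expected window.
func (s Schedule) OnTime(t time.Time) bool {
	if t.Before(s.Anchor) {
		return false
	}
	offsetInCycle := t.Sub(s.Anchor) % s.Every
	return offsetInCycle <= s.Window
}

func main() {
	biweekly := Schedule{
		Anchor: time.Date(2023, 1, 2, 0, 0, 0, 0, time.UTC),
		Every:  14 * 24 * time.Hour,
		Window: 24 * time.Hour,
	}
	fmt.Println(biweekly.OnTime(time.Date(2023, 1, 16, 10, 0, 0, 0, time.UTC))) // true
	fmt.Println(biweekly.OnTime(time.Date(2023, 1, 20, 10, 0, 0, 0, time.UTC))) // false
}

Calendar-based periods (monthly, quarterly) don't have a fixed length, so they would need their own rule, for example stepping with AddDate instead of a fixed Duration.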

How do I create a time-based delay that starts and ends by a clock time?

I'm trying to simulate a movie theater. Movies start and end at specified times, no matter when a customer arrives and sits down. I want to be able to start and end the delay by time-based units, kicking everyone out at the same time (once the movie is over).
I've tried googling; as a student, this was probably way too ambitious to attempt, but I really wanted to try. I'm hoping for any kind of insight.
My current flow: a Select Output into 4 different services (not ped), which the peds go to, then all into the same sink.
I want it to work, and it doesn't.
Try thinking of your movie as an agent. Each movie can have its own event, which you could set to go off at a specific calendar date/time. In the action of the event, you could send a message to all your customers that they can now leave the theater...or, if your customers are in a queue with a hold block for that particular movie, you can just unblock the hold.
With movies as agents, you could have a start/end time and then create your population of movies from a database linked to Excel. That is, you would be able to set up your movie schedule outside of AnyLogic, which would probably be easier for whoever your stakeholders are.
Your question is probably too broad to answer fully on Stack Overflow. There are a million different ways to model a movie theater, and the approach would depend on what you are trying to answer, animation requirements, your comfort level with programming, user interface requirements, etc.
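Outside of AnyLogic, the "one timed event releases everyone at once" idea behind the hold-block suggestion can be sketched in plain Go as a broadcast via closing a channel. This is only an analogy, not AnyLogic code:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	movieOver := make(chan struct{}) // closed exactly once, when the movie ends
	var wg sync.WaitGroup

	// Customers arrive at different times but all wait on the same signal,
	// like agents parked behind a hold block.
	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-movieOver // wait until the movie ends
			fmt.Printf("customer %d leaves the theater\n", id)
		}(i)
	}

	// The movie's "end event": in a real model this would fire at the
	// scheduled end time rather than after a fixed delay.
	time.AfterFunc(2*time.Second, func() { close(movieOver) })

	wg.Wait()
}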

Scheduling tasks/messages for later processing/delivery

I'm creating a new service, and for it I have database entries (Mongo) that have a state field which I need to update based on the current time. So, for instance, if the start time was set to two hours from now, I need to change the state from CREATED -> STARTED in the database, and there can be multiple such states.
Approaches I've thought of:
Keep querying for database entries whose time is <= the current time and change their states accordingly. This causes extra reads for no reason (half the time they come back empty), and it will get complicated fast as more states come in.
Write a job scheduler (I am using Go, so that wouldn't be too hard) and schedule all the jobs, but I might lose queue data in case of a panic/crash.
Use a product like Celery; I have found a Go implementation of it: https://github.com/gocelery/gocelery
Another task scheduler I've found is on Google Cloud (https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine), but I don't want to get locked into proprietary technologies.
I wanted to use some pub/sub service for this, but I couldn't find one that has delayed messages (if that's a thing). My problem is mainly not knowing the actual name for this problem, so that I can search for it properly; I've even tried searching the Microsoft docs. If someone can point me in the right direction, or if any of the approaches I've listed is the one I should use, please let me know; that would be a great help!
UPDATE:
Found one more solution by Netflix, for the same problem
https://medium.com/netflix-techblog/distributed-delay-queues-based-on-dynomite-6b31eca37fbc
I think you are right in that the problem you are trying to solve is the job or task scheduling problem.
One approach that many companies use is the system you are proposing: jobs are inserted into a datastore with a time to execute at, and that datastore is then polled for jobs that are due to run. There are optimizations that prevent extra reads, like polling the database at a regular interval and using exponential back-off when nothing is found. The advantage of this system is that it is tolerant of node failure; the disadvantage is added complexity.
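As a rough illustration of that polling loop with exponential back-off (dueJobs and markStarted are hypothetical stand-ins for your own Mongo queries):

package main

import (
	"log"
	"time"
)

type Job struct{ ID string }

// dueJobs and markStarted are placeholders for your own datastore calls,
// e.g. a Mongo find on {startAt: {$lte: now}, state: "CREATED"} and an
// update that sets the state to "STARTED".
func dueJobs(now time.Time) ([]Job, error) { return nil, nil }
func markStarted(id string) error          { return nil }

func main() {
	const (
		minInterval = 1 * time.Second
		maxInterval = 1 * time.Minute
	)
	interval := minInterval
	for {
		jobs, err := dueJobs(time.Now())
		if err != nil {
			log.Println("poll failed:", err)
		}
		if len(jobs) == 0 {
			// Nothing due: back off exponentially so idle periods cost fewer reads.
			interval *= 2
			if interval > maxInterval {
				interval = maxInterval
			}
		} else {
			for _, j := range jobs {
				// ...do the work, then persist the state change...
				if err := markStarted(j.ID); err != nil {
					log.Println("update failed:", err)
				}
			}
			interval = minInterval // found work, poll again soon
		}
		time.Sleep(interval)
	}
}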
Looking around, in addition to the one you linked (https://github.com/gocelery/gocelery) there are other implementations of this model (https://github.com/ajvb/kala or https://github.com/rakanalh/scheduler were ones I found after a quick search).
The other approach you described, scheduling jobs in process, is very simple in Go because parked goroutines are extremely cheap, so you can just spawn a goroutine per piece of work. The downside is that if the process dies, the job is lost.
go func() {
	<-time.After(time.Until(expirationTime))
	// do work here.
}()
A final approach that I have seen but wouldn't recommend is the callback model (something like https://gitlab.com/andreynech/dsched). This is where your service calls another service (over HTTP, gRPC, etc.) and schedules a callback for a specific time. The advantage is that if you have multiple services in different languages, they can use the same scheduler.
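A toy version of that callback model could look like the sketch below; this is not dsched's actual API, just an illustration of a scheduler service that accepts a callback URL and a time, then POSTs back when the time arrives:

package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// scheduleRequest is a made-up wire format: "call me back at this URL at this time".
type scheduleRequest struct {
	CallbackURL string    `json:"callback_url"`
	At          time.Time `json:"at"`
}

func scheduleHandler(w http.ResponseWriter, r *http.Request) {
	var req scheduleRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// In-memory only: a crash loses the callback, just like the in-process approach above.
	time.AfterFunc(time.Until(req.At), func() {
		resp, err := http.Post(req.CallbackURL, "application/json", nil)
		if err != nil {
			log.Println("callback failed:", err)
			return
		}
		resp.Body.Close()
	})
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	http.HandleFunc("/schedule", scheduleHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}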
Overall, before you decide on a solution, I would consider some trade-offs:
How acceptable is job loss? If it's ok that some jobs are lost a small percentage of the time, maybe an in-process solution is acceptable.
How long will jobs be waiting? If it's longer than the shutdown period of your host, maybe a datastore based solution is better.
Will you need to distribute job load across multiple machines? If you need to distribute the load, sharding and scheduling are tricky things and you might want to consider using a more off-the-shelf solution.
Good luck! Hope that helps.

How to update/migrate data when using CQRS and an EventStore?

So I'm currently diving into the CQRS architecture along with the EventStore "pattern".
It opens applications up to a new dimension of scalability and flexibility, as well as testability.
However, I'm still stuck on how to properly handle data migration.
Here is a concrete use case:
Let's say I want to manage a blog with articles and comments.
On the write side I'm using MySQL, and on the read side Elasticsearch. Every time I process a Command, I persist the data on the write side and dispatch an Event to persist the data on the read side.
Now let's say I have some sort of ViewModel called ArticleSummary which contains an id and a title.
I have a new feature request to include the article tags in my ArticleSummary, so I would add some dictionary to my model to hold the tags.
Given that the tags already exist in my write layer, I would need to update the existing "table" or use a new one to properly make use of the newly included data.
I'm aware of the event-log replay strategy, which consists of replaying all the events to "update" all the ViewModels, but, seriously, is it viable when we have a billion rows?
Are there any proven strategies? Any feedback?
I'm aware of the event-log replay strategy, which consists of replaying all the events to "update" all the ViewModels, but, seriously, is it viable when we have a billion rows?
I would say "yes" :)
You are going to write a handler for the new summary feature that updates your query side anyway, so you already have the code. Writing special once-off migration code may not buy you all that much. I would go with migration code when you have to do an initial load of, say, a new system that requires some one-off data transformation, but in this case your infrastructure already exists.
You would only need to send the relevant events to the new handler, so you wouldn't replay everything either.
In any event, if you have a billion rows of data your servers would probably be able to handle the load :)
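To make the "write the handler and feed it only the relevant events" suggestion concrete, here is a rough Go sketch; the event and view types are invented for illustration and are not from any particular framework:

package main

import "fmt"

// Made-up events; in a real system these would be deserialized from the event log.
type ArticlePublished struct{ ID, Title string }
type TagAdded struct{ ArticleID, Tag string }

// ArticleSummary is the read-side view model, now including tags.
type ArticleSummary struct {
	ID    string
	Title string
	Tags  []string
}

// ArticleSummaryProjection rebuilds summaries from events; during a replay you
// would only feed it the event types it cares about (ArticlePublished, TagAdded).
type ArticleSummaryProjection struct {
	Summaries map[string]*ArticleSummary
}

func (p *ArticleSummaryProjection) Apply(event interface{}) {
	switch e := event.(type) {
	case ArticlePublished:
		p.Summaries[e.ID] = &ArticleSummary{ID: e.ID, Title: e.Title}
	case TagAdded:
		if s, ok := p.Summaries[e.ArticleID]; ok {
			s.Tags = append(s.Tags, e.Tag)
		}
	}
	// Any other event types are simply ignored by this projection.
}

func main() {
	p := &ArticleSummaryProjection{Summaries: map[string]*ArticleSummary{}}
	for _, e := range []interface{}{
		ArticlePublished{ID: "1", Title: "CQRS and migrations"},
		TagAdded{ArticleID: "1", Tag: "cqrs"},
		TagAdded{ArticleID: "1", Tag: "event-sourcing"},
	} {
		p.Apply(e)
	}
	fmt.Printf("%+v\n", *p.Summaries["1"])
}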
I'm currently using NEventStore by JOliver.
When we started, we were replaying our entire store back through our denormalizers/event handlers when the application started up.
We were initially keeping all our data in memory but knew this approach wouldn't be viable in the long term.
The approach we use currently is that we can replay an individual denormalizer, which makes things a lot faster since you aren't unnecessarily replaying events through denormalizers that haven't changed.
The trick we found, though, was that we needed another representation of our commits so we could query all the events we handle by event type - a query that cannot be performed against the normal store.

Solution for graphing application event metrics in real time

We have an application that parses tweets and we want to see the activity in real time. We have tried several solutions without success. Our main problem is that the graphing solution (for example, Graphite) needs a continuous flow of metrics, and when the DB aggregates the metrics it computes an average, not a sum.
We recently saw Cube from Square, which would fit our requirements, but it's too new.
Any alternatives?
I found the solution in the latest version of Graphite:
http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-aggregation-conf
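For reference, the linked storage-aggregation.conf lets you switch the roll-up for matching metrics from the default average to a sum, roughly like this (the section name and pattern are just examples):

[tweet_counters]
pattern = ^tweets\.
xFilesFactor = 0
aggregationMethod = sum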
If I understood correctly, you cannot feed Graphite in real time, for instance as soon as you discover a new tweet?
If that's the case, it looks like you can specify a Unix timestamp when updating a Graphite metric (the plaintext format is metric_path value timestamp\n), so you could pass in the time of discovery/publication/whatever, regardless of when you process it.
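In Go, sending a back-dated point could look roughly like this, assuming carbon's plaintext listener on its default port 2003; the host and metric path are placeholders:

package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

func main() {
	// Connect to carbon's plaintext receiver (default port 2003).
	conn, err := net.Dial("tcp", "graphite.example.com:2003")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Plaintext protocol: "<metric path> <value> <unix timestamp>\n".
	// Use the tweet's publication time instead of time.Now() to backfill.
	publishedAt := time.Now().Add(-10 * time.Minute)
	if _, err := fmt.Fprintf(conn, "tweets.discovered %d %d\n", 1, publishedAt.Unix()); err != nil {
		log.Fatal(err)
	}
}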
