Given that I have a metric (stats.counters.PROD.FailedBuildMessages) and I want to extend its resolution (stats.counters.PROD.FailedBuildMessages.1, stats.counters.PROD.FailedBuildMessages.2, stats.counters.PROD.FailedBuildMessages.3, etc.), but I don't want to lose my previous data and want to keep using it for as long as I can, how can I overlay the data? It seems that once I add the sub-counters to the FailedBuildMessages node, it loses its data.
Is there a way to do it?
You should send the old metric alongside the new one. They are completely different metrics even if they share the same hierarchy.
I do that in order to have coarser/finer details on a metric and it works quite well.
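For example, here is a minimal Python sketch of emitting both series (assuming a StatsD daemon on localhost:8125 in front of Graphite; the address and the sub-counter naming are only placeholders):

    import socket

    # StatsD listens on UDP and prefixes counters with "stats.counters."
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    STATSD_ADDR = ("localhost", 8125)  # placeholder address

    def count_failed_build(sub_counter=None):
        # Keep feeding the original, coarse metric so its history stays useful...
        sock.sendto(b"PROD.FailedBuildMessages:1|c", STATSD_ADDR)
        # ...and also feed the new, finer-grained metric alongside it.
        if sub_counter is not None:
            msg = "PROD.FailedBuildMessages.{}:1|c".format(sub_counter)
            sock.sendto(msg.encode("ascii"), STATSD_ADDR)

    count_failed_build(sub_counter=1)

The old series keeps accumulating for as long as you emit it, while the finer-grained series builds up its own history in parallel.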
I am confused about using Storm. I am going to measure the status of a data source using its streamed data. The status will be calculated by combining several fields, and those fields can arrive at different time intervals. That's why I need to save the fields in order to measure the status of the data source.
Can I use BaseStatefulBolt, or is Trident the only solution for this scenario?
What is the difference between them? There is a StateFactory inside Trident too.
Thank you.
I think the difference is that Trident is higher level than BaseStatefulBolt; it has options for counting such as groupBy, persistentAggregate, and aggregate.
I have used Trident for counting total views per user. If we only care about the current total count, I think we can use Trident with MemoryMapState.Factory() and a class that implements the counting or summing action.
In your case you need to manage the status of some current fields, so I think implementing BaseStatefulBolt is a good choice; it has KeyValueState for saving the current state.
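This is not the Storm API, just a small Python illustration of the kind of per-key state such a bolt would keep: buffer the fields that have arrived for each data source and compute the status once everything needed is there (the field names and the status rule are made up):

    # Conceptual sketch of per-source state, analogous to what a
    # BaseStatefulBolt would keep in its KeyValueState.
    state = {}  # source_id -> {field_name: latest_value}

    REQUIRED_FIELDS = {"latency", "error_rate", "throughput"}  # hypothetical fields

    def on_tuple(source_id, field_name, value):
        fields = state.setdefault(source_id, {})
        fields[field_name] = value
        # Only compute a status once every required field has arrived at least once.
        if REQUIRED_FIELDS.issubset(fields):
            return compute_status(fields)
        return None  # still waiting for some fields

    def compute_status(fields):
        # Hypothetical status rule combining the saved fields.
        return "DOWN" if fields["error_rate"] > 0.5 else "UP"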
For every processor there is a way to configure the processor and there is a context menu to view data provenance.
Is there a good explanation of what data provenance is?
Data provenance is all about understanding the origin and attribution of data. In a typical system you get 'logs'. When you consider data flowing through a series of processes and queues, you end up with a lot of logs, of course. If you want to follow the path a given piece of data took, how long it took to take that path, or what happened to an object that got split up into different objects, and so on, all of that is really time consuming and tough. The provenance that NiFi supports is like logging on steroids: it is all about keeping and tracking the relationships between data and the events that shaped and impacted what happened to it. NiFi keeps track of where each piece of data comes from and what it learned about the data, maintains the trail across splits, joins, and transformations, records where it sends the data, and ultimately when it drops it. Think of it like a chain of custody for data.
This is really valuable for a few reasons. First, understanding and debugging. Having this provenance capture means that from a given event you can go forwards or backwards in the flow to see where data came from and where it went. Given that NiFi also has an immutable, versioned content store under the covers, you can also use this to click directly to the content at each stage of the flow. You can also replay the content and context of a given event against the latest flow. This in turn means much faster iteration to the configuration and results you want. This provenance model is also valuable for compliance reasons. You can prove whether or not you sent data to the correct systems. If you learn that you didn't, you then have data with which you can address the issue, or create a powerful audit trail for follow-up.
The provenance model in Apache NiFi is really powerful, and it is being extended to Apache MiNiFi, which is a subproject of Apache NiFi, as well. More systems producing more provenance means a far stronger ability to track data end-to-end. Of course, this becomes even more powerful when it can be combined with other lineage systems or centralized lineage stores. Apache Atlas may be a great system to integrate with to bring a centralized view. NiFi is able not only to do what I described above but also to send these events to such a central store. So, exciting times ahead for this.
Hope that helps.
I'm using logstash 1.5 to analyze logs.
I want to track two events which occur one after the other.
So I would like to set a flag/field/tag when first event occurs and retain the value across events.
I looked at this link, but it looks like grep and drop are not supported in logstash 1.5.
Is there a way of achieving this?
The closest you can get with logstash is the elapsed{} filter. You could use that code as a basis for your own filter if it doesn't meet your needs. I also run some external (python) post-processing to do more than elapsed{} can (or should) do.
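As a rough idea of what such external post-processing can look like, here is a Python sketch under assumed conditions: events arrive as JSON lines with hypothetical type and request_id fields, and the goal is to flag the second event once the first one has been seen:

    import json
    import sys

    # Remember which correlation ids have already produced the first event,
    # so the second event can be flagged when it shows up later in the stream.
    seen_first = {}

    for line in sys.stdin:
        event = json.loads(line)
        key = event.get("request_id")  # hypothetical correlation field
        if event.get("type") == "first_event":
            seen_first[key] = event.get("@timestamp")
        elif event.get("type") == "second_event" and key in seen_first:
            event["followed_first_event"] = True
            event["first_event_time"] = seen_first.pop(key)
        print(json.dumps(event))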
So I'm currently diving into the CQRS architecture along with the EventStore "pattern".
It opens applications to a new dimension of scalability and flexibility as well as testing.
However I'm still stuck on how to properly handle data migration.
Here is a concrete use case:
Let's say I want to manage a blog with articles and comments.
On the write side I'm using MySQL, and on the read side ElasticSearch; now, every time I process a Command, I persist the data on the write side and dispatch an Event to persist the data on the read side.
Now let's say I have some sort of ViewModel called ArticleSummary which contains an id and a title.
I have a new feature request to include the article tags in my ArticleSummary, so I would add a dictionary to my model to hold the tags.
Given that the tags already exist in my write layer, I would need to update or use a new "table" to properly use the newly included data.
I'm aware of the EventLog Replay strategy, which consists of replaying all the events to "update" all the ViewModels, but, seriously, is it viable when we have a billion rows?
Are there any proven strategies? Any feedback?
I'm aware of the EventLog Replay strategy, which consists of replaying all the events to "update" all the ViewModels, but, seriously, is it viable when we have a billion rows?
I would say "yes" :)
You are going to write a handler for the new summary feature that would update your query side anyway, so you already have the code. Writing special once-off migration code may not buy you all that much. I would go with migration code when you have to do an initial update of, say, a new system that requires some once-off data transformation, but in this case your infrastructure already exists.
You would need to send only the relevant events to the new handler, so you also wouldn't replay everything.
In any event, if you have a billion rows of data your servers would probably be able to handle the load :)
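A rough Python sketch of that approach (the event store and read store APIs here are hypothetical, as are the event names; the point is that only the event types the new handler cares about get streamed through it):

    # Hypothetical APIs: only the event types relevant to the new projection
    # are replayed through its handler, not the whole event log.
    RELEVANT_EVENTS = ("ArticleTagged", "ArticleUntagged")  # made-up event names

    def rebuild_article_summary_tags(event_store, read_store):
        for event in event_store.stream(event_types=RELEVANT_EVENTS):  # assumed API
            summary = read_store.get("article_summary", event["article_id"])
            tags = set(summary.get("tags", []))
            if event["type"] == "ArticleTagged":
                tags.add(event["tag"])
            else:
                tags.discard(event["tag"])
            summary["tags"] = sorted(tags)
            read_store.put("article_summary", event["article_id"], summary)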
I'm currently using NEventStore by JOliver.
When we started, we were replaying our entire store back through our denormalizers/event handlers when the application started up.
We were initially keeping all our data in memory but knew this approach wouldn't be viable in the long term.
The approach we use currently is that we can replay an individual denormalizer, which makes things a lot faster since you aren't unnecessarily replaying events through denormalizers that haven't changed.
The trick we found, though, was that we needed another representation of our commits so we could query all the events we handled by event type - a query that cannot be performed against the normal store.
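That extra representation can be as simple as a secondary index from event type to the commits containing that type. A hedged Python sketch of the idea (none of this is the NEventStore API; the names are made up):

    from collections import defaultdict

    # Secondary index: event type -> commit ids containing at least one such event.
    # Built while commits are handled, then used for selective replay later.
    events_by_type = defaultdict(list)

    def index_commit(commit_id, events):
        for event in events:
            events_by_type[type(event).__name__].append(commit_id)

    def commits_for(event_types):
        # Commit ids touching any of the given event types.
        wanted = set()
        for event_type in event_types:
            wanted.update(events_by_type[event_type])
        return sorted(wanted)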
We have an application that parses tweets and we want to see the activity in real time. We have tried several solutions without success. Our main problem is that the graphing solution (for example, Graphite) needs a continuous flow of metrics. When the DB aggregates the metrics, it performs an average, not a sum.
We recently saw Cube from Square, which would fit our requirements, but it's too new.
Any alternatives?
I found the solution in the latest version of Graphite:
http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-aggregation-conf
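If it helps, the relevant setting lets Carbon roll counters up with a sum instead of an average; something along these lines in storage-aggregation.conf (the section name and pattern are only an example):

    [tweet_counters]
    pattern = ^stats\.tweets\.
    xFilesFactor = 0
    aggregationMethod = sum

As far as I know this only applies to Whisper files created after the change; existing files keep their aggregation method unless you change them with the Whisper tools.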
If I understood correctly, you cannot feed Graphite in real time, for instance as soon as you discover a new tweet?
If that's the case, it looks like you can specify a Unix timestamp when updating Graphite (metric_path value timestamp\n), so you could pass in the time of discovery/publication/whatever, regardless of when you process it.
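Something like this minimal Python sketch, assuming Carbon's plaintext listener on localhost:2003 and a made-up metric name, where the timestamp sent is the tweet's own discovery time rather than the processing time:

    import socket
    import time

    CARBON_ADDR = ("localhost", 2003)  # Carbon plaintext listener (assumed)

    def record_tweet(discovered_at=None):
        # Plaintext protocol: "metric_path value timestamp\n" -- the timestamp can
        # be in the past, e.g. when the tweet was discovered, not when we process it.
        ts = int(discovered_at if discovered_at is not None else time.time())
        line = "tweets.count 1 {}\n".format(ts)
        with socket.create_connection(CARBON_ADDR) as sock:
            sock.sendall(line.encode("ascii"))

    record_tweet(discovered_at=time.time() - 300)  # backfill a point 5 minutes old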