We have an application that parses tweets, and we want to see the activity in real time. We have tried several solutions without success. Our main problem is that the graphing solution (for example, Graphite) needs a continuous flow of metrics, and when the database aggregates the metrics it performs an average, not a sum.
We recently saw Cube from Square, which would fit our requirements, but it's too new.
Any alternatives?
I found the solution in the latest version of Graphite:
http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-aggregation-conf
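For reference, a minimal storage-aggregation.conf entry along those lines might look like this (the section name and pattern are made up; match them to your own metric names):

[tweet_counts]
pattern = ^tweets\.
xFilesFactor = 0
aggregationMethod = sum

With aggregationMethod = sum, carbon sums the datapoints when rolling them up into coarser retention archives instead of averaging them, which is what a counting use case needs.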
If I understood correctly, you cannot feed Graphite in real time, for instance as soon as you discover a new tweet?
If that's the case, it looks like you can specify a Unix timestamp when updating a Graphite metric (the plaintext format is "metric_path value timestamp\n"), so you could pass in the time of discovery/publication/whatever, regardless of when you process it.
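As a minimal sketch in Go (assuming carbon's plaintext listener on its default port 2003; the host name is a placeholder):

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Connect to carbon's plaintext listener (default port 2003).
	conn, err := net.Dial("tcp", "graphite.example.com:2003")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// The plaintext format is "metric_path value timestamp\n".
	// Use the tweet's discovery time as the timestamp, not the send time.
	discoveredAt := time.Now().Add(-5 * time.Minute)
	fmt.Fprintf(conn, "tweets.count 1 %d\n", discoveredAt.Unix())
}

Graphite will slot the datapoint at the timestamp you supply, so late-arriving tweets land in the right bucket.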
I am using Dynatrace to help orient my efforts as I'm optimizing an endpoint of our service.
Looking at the Controller's PurePath, I am currently wondering: what does each individual yellow bar mean exactly?
It seems to be some sort of aggregate, since I don't think we have any kind of batching activated. Yet we repeatedly see the same statement aggregated into one bar, and then, right after it, the same statement aggregated again into another single bar within the same timeframe (for example, we see an 89x, then a 90x, one after the other).
As per company policy, I had to hide a bunch of things with black rectangles: sorry for that!
We have been using Dynatrace for a long time now. These yellow boxes show the time taken to execute the respective query. The query can be seen at the start of that row.
Looking at your diagram, it seems you are executing a few queries in parallel; e.g. the last 4 queries started at the same time, and each took a different amount of time to complete depending on its complexity.
The multiplying factor shown as 90x or 89x is the number of times that query was executed. This is what the documentation says.
I truly do not agree with that, though. Why would a developer or the DB server run the same query that many times? Maybe the agent installed on that DB server is getting confused because the same query is executed across different requests. This is just my guess.
Regards,
Vikrant Korde
I'm creating a new service, and for that I have database entries (Mongo) with a state field, which I need to update based on the current time. So, for instance, if the start time was set to two hours from now, when that time arrives I need to change the state from CREATED -> STARTED in the database, and there can be multiple such states.
Approaches I've thought of:
Keep querying for database entries whose time is <= the current time and then change their states accordingly. This causes extra reads for no reason, half of them empty, and it will get complicated fast as more states come in.
Write a job scheduler myself (I am using Go, so that wouldn't be too hard) and schedule all the jobs, but I might lose the queue data in case of a panic/crash.
Use an existing product like Celery; I have found a Go implementation of it: https://github.com/gocelery/gocelery
Another task scheduler I've found is on Google Cloud (https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine), but I don't want to get locked into proprietary technologies.
I wanted to use some Pub/Sub service for this, but I couldn't find one that has delayed messages (if that's a thing). My problem is mainly that I can't find an actual name for this problem, so I can't search for it properly; I've even tried searching the Microsoft docs. If someone can point me in the right direction, or if any of the approaches I've described is the one I should use, please let me know; that would be a great help!
UPDATE:
Found one more solution by Netflix, for the same problem
https://medium.com/netflix-techblog/distributed-delay-queues-based-on-dynomite-6b31eca37fbc
I think you are right in that the problem you are trying to solve is the job or task scheduling problem.
One approach that many companies use is the system you are proposing: jobs are inserted into a datastore with a time to execute at, and that datastore is then polled for jobs that are due. There are optimizations that prevent extra reads, like polling the database at a regular interval and backing off exponentially when nothing is found. The advantage of this system is that it tolerates node failure; the disadvantage is the added complexity.
Looking around, in addition to the one you linked (https://github.com/gocelery/gocelery), there are other implementations of this model (https://github.com/ajvb/kala and https://github.com/rakanalh/scheduler were ones I found after a quick search); a sketch of the polling loop follows below.
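As a rough sketch of that polling model in Go (the Store interface, Job type, and method names are all hypothetical; substitute your own Mongo queries):

package scheduler

import "time"

// Job is a minimal stand-in for one of your Mongo entries.
type Job struct{ ID string }

// Store is a hypothetical interface; back it with your Mongo collection.
type Store interface {
	DueJobs(now time.Time) ([]Job, error) // entries whose start time is <= now
	MarkStarted(id string) error          // flips state CREATED -> STARTED
}

// Poll checks the datastore at a regular interval, backing off
// exponentially while there is nothing to do.
func Poll(s Store) {
	interval := time.Second
	const maxInterval = time.Minute
	for {
		jobs, err := s.DueJobs(time.Now())
		if err != nil || len(jobs) == 0 {
			// Empty read or transient error: widen the interval so
			// idle periods don't hammer the database.
			interval *= 2
			if interval > maxInterval {
				interval = maxInterval
			}
		} else {
			for _, j := range jobs {
				_ = s.MarkStarted(j.ID) // then kick off the actual work
			}
			interval = time.Second // reset once jobs are due again
		}
		time.Sleep(interval)
	}
}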
The other approach you described, scheduling jobs in-process, is very simple in Go because parked goroutines are extremely cheap, so you can spawn a goroutine per job at almost no cost. The downside is that if the process dies, the job is lost.
go func() {
	// Park this goroutine until the job's start time; parked
	// goroutines cost almost nothing.
	<-time.After(time.Until(expirationTime))
	// do work here.
}()
A final approach that I have seen, but wouldn't recommend, is the callback model (something like https://gitlab.com/andreynech/dsched). This is where your service calls out to another service (over HTTP, gRPC, etc.) and schedules a callback for a specific time. The advantage is that if you have multiple services in different languages, they can share the same scheduler.
Overall, before you decide on a solution, I would consider some trade-offs:
How acceptable is job loss? If it's ok that some jobs are lost a small percentage of the time, maybe an in-process solution is acceptable.
How long will jobs be waiting? If it's longer than the shutdown period of your host, maybe a datastore based solution is better.
Will you need to distribute job load across multiple machines? If you need to distribute the load, sharding and scheduling are tricky things and you might want to consider using a more off-the-shelf solution.
Good luck! Hope that helps.
Capturing data in Keen.io is pretty straightforward when it's driven by an event (User-logged-in, Customer-bought-stuff, ...). That's very nicely explained in the docs.
But what is the best approach when I want insight into the state of my app at a given time in the past?
For instance: how many active licenses were there this time last year?
Our first thought was to run a cron job every hour, get that data from the DB, and store it in a custom collection. Is this the right approach, or are there better solutions?
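For concreteness, here is a rough sketch of that hourly job in Go. Everything in it is a placeholder: countActiveLicenses stands for your DB query, recordEvent for a call to Keen's event-recording HTTP API, and the collection name is made up:

package snapshot

import "time"

// Placeholders: wire these to your database and to Keen's HTTP API.
func countActiveLicenses() (int, error) { return 0, nil }

func recordEvent(collection string, event map[string]interface{}) error { return nil }

// Snapshot is meant to be run once per hour from cron. It writes the
// current count into a custom collection so it can be charted later.
func Snapshot() error {
	n, err := countActiveLicenses()
	if err != nil {
		return err
	}
	return recordEvent("license_snapshots", map[string]interface{}{
		"active_licenses": n,
		"captured_at":     time.Now().UTC(),
	})
}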
I want to display a graphical, time-based report (weekly/daily) that shows the status of static code analysis over time. E.g. the vertical bars would denote the number of issues, and the horizontal axis the day/week/month. This would make it easy to keep watch on code quality over time (something like a Scrum burn-down chart). Can someone help me with this?
The 5.1.2 issues search web service includes parameters which let you query for issues by creation date. Your best bet is to use AJAX requests to get the data you need and build your widget from there.
Note that you can query iteratively across a date range using &p=1&ps=1 (page=1 and page size=1) to limit the volume of data flying around, and just mine the total value at the top level of the response to get your answer.
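For example, a single request of this shape returns just the total for one bucket (the hostname and project key are placeholders; double-check the parameter names against your server's web service documentation):

https://sonarqube.example.com/api/issues/search?componentRoots=my:project&createdAfter=2015-06-01&createdBefore=2015-06-08&p=1&ps=1

Issue one such request per day/week/month bucket, and the total field in each response gives you the height of that bar.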
Here's an example on Nemo
We've got an application developed in Java, with GWT providing the frontend. The application is used on a variety of hardware specifications, including older machines, and of course users complain about performance.
We'd like to collect profiling data from real-world users. So far we can measure the pure server-side duration (that's easy) and the duration of the network roundtrip (not so easy, but we managed that).
The hardest part for us is measuring the time elapsed between "user clicks the search button" and "the first xxx rows of the grid have been displayed".
Any idea?
Thanks
Holger
I would play around with creating a timestamp at the start of the page load and another at the end. I believe "the beginning" would be onModuleLoad and the end would be right after your last element/widget is added. That should give you a good idea of where to start; you can play with moving these timestamps around to maximize the time difference you capture. Once you feel confident that you are getting the rendering time, you can save the time difference to a database, so that whenever anyone uses your page you collect more user data.
Try using Speed Tracer; it's a Google Chrome extension developed by Google itself.
There is no full-stack solution at the moment, as far as I know. What you could build internally is a combination of remote logging, GWT's lightweight metrics, and deferred binding magic.
The first part is understanding that all RPC events and the initialization sequence are already measured, and learning how to plug into that: http://code.google.com/webtoolkit/doc/latest/DevGuideLightweightMetrics.html
The second part is adding deferred binding magic to measure the onSuccess() execution time of your application's callbacks. Inspiration (but not a complete solution) can be found here: http://josephmarques.wordpress.com/2010/11/29/performance-monitoring-using-gwt/
The final part is delivering the measurements back from the client. Here you may use gwt-log or the new GWT logging facilities (not sure whether they implemented remote delivery in the JDK-style logging, though).
I was thinking of creating an embeddable open-source library today, as we solved exactly the same problem recently and are in the process of porting it to GWT 2.0 :)
But I guess it will take some time to get from idea to implementation...
Hope it helps.
Dmitry
You can use the Duration class available on the GWT client side: com.google.gwt.core.client.Duration.
It is a utility class for measuring elapsed time on the client side.
Example usage:
Duration duration = new Duration();
doSomething();
// elapsedMillis() returns the number of milliseconds that have
// elapsed since the Duration object was created.
GWT.log("time taken for doSomething() to complete: " + duration.elapsedMillis());
Documentation
More examples