Is there a web-based generic metric monitoring service? - performance

First of all... I'm not looking for New Relic :-)
I'm looking for something very similar to Munin but hosted, and accessible (at least for pushing data) via an HTTP API. I want to monitor some custom metrics on a web-application and I'm looking for nice graphs, historical data, ease of setup and obviously the ability to use custom metrics that I'll measure and report myself. I'll be using it to monitor aspects of a NodeJS app, but the source of the data shouldn't matter much.

Try AlertGrid. It has an extremely simple HTTP API with only one method, which is used to push any custom data. You then build rules in a simple editor to handle the incoming data (e.g. if metric1>10 and metric2 not in ['a','b','c'] then send email to X and sms to Y) or to handle situations where an expected event did not occur within a timeframe (e.g. when no data is received from X for 15 minutes, then email Y and sms Z). It can also automatically draw simple graphs from the received data (for integer and float fields). Everything is web-based.
Unlike Nagios, AlertGrid is extremely simple to use and integrate, and requires no installation. If you know how to make an HTTP request, you can have a working solution in 5 minutes (examples and wrapper classes are available). I'm on the dev team, so if you have any questions, feel free to ask.

You can try Nagios or write a plugin for Munin.

I really like DataDog. I think it would check the boxes on all of your requirements. We've been using it to set up dashboards for a number of services at Mobify and so far it's been a pleasure to use.
I've recently released a NodeJS library that might be helpful: datadog-metrics.
Here's some example code:
var metrics = require('datadog-metrics');
// The prefix is prepended to every metric name reported below.
metrics.init({ host: 'myhost', prefix: 'myapp.' });

// Report the process's current memory usage as gauges.
function collectMemoryStats() {
    var memUsage = process.memoryUsage();
    metrics.gauge('memory.rss', memUsage.rss);
    metrics.gauge('memory.heapTotal', memUsage.heapTotal);
    metrics.gauge('memory.heapUsed', memUsage.heapUsed);
}

// Collect the stats every 5 seconds.
setInterval(collectMemoryStats, 5000);

Related

Parallel Req/Rep via Pub/Sub

I have multiple servers. At any point, one and only one of them is the leader, which can respond to a request; all the others just drop the request. The issue is that the client does not know which server is the leader.
I have tried using a pub socket on the client to send the request out in parallel; however, I can't work out the right semantics for the response, that is, how to get the server to respond to that specific client.
A hacky solution which I have tried is to have a sub socket on the client connected to pub sockets on all the servers, with the leader responding by publishing a message with a filter such that it only goes to the client.
However, I am unable to receive any responses this way. The server believes that it sent the message and the client believes it subscribed to "", but then it doesn't receive anything...
So I am wondering whether there is a more proper way of doing this. I have thought that a dealer/router pair, sending to a specific client, might work; however, I am unsure how to do that.
Essentially I am trying to do a standard Req/Rep, but sending the request in parallel to all the nodes rather than round robin.
UPDATE: By sending the routing id of the dealer in the pub request, making the remote call idempotent (just returning pre-computed results on repeated attempts), and then sending the result back via a router, with message filtering on the receiving side, it now works.
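The approach in the update can be sketched without any sockets at all. The following is a minimal in-process simulation, not ZeroMQ code: `makeServer`, `handle`, and `broadcast` are hypothetical names, and the "work" (upper-casing a string) is a placeholder. It only illustrates the three ingredients mentioned: the request carries the client's routing id, the leader's handler is idempotent, and the client filters what it receives.

```javascript
// In-process simulation only: no real sockets are used here.
// makeServer, handle, and broadcast are hypothetical names for this sketch.
function makeServer(name, isLeader) {
  const cache = new Map(); // requestId -> precomputed result (idempotency)
  return {
    name,
    // Called for every broadcast request; only the leader answers.
    handle(request) {
      if (!isLeader()) return null; // non-leaders silently drop the request
      if (!cache.has(request.requestId)) {
        cache.set(request.requestId, request.payload.toUpperCase());
      }
      // The reply is addressed using the routing id carried in the request.
      return {
        to: request.routingId,
        requestId: request.requestId,
        result: cache.get(request.requestId),
      };
    },
  };
}

// Client side: send the same request to every server, then keep only the
// replies addressed to this client that match the outstanding request id.
function broadcast(servers, routingId, requestId, payload) {
  const request = { routingId, requestId, payload };
  return servers
    .map((server) => server.handle(request))
    .filter((reply) => reply && reply.to === routingId && reply.requestId === requestId);
}

let leader = 'b';
const servers = ['a', 'b', 'c'].map((n) => makeServer(n, () => leader === n));
const first = broadcast(servers, 'client-1', 1, 'ping');
const retry = broadcast(servers, 'client-1', 1, 'ping'); // same id: cached result
console.log(first.length, first[0].result, retry[0].result);
```

In a real deployment the `handle` call would be a message arriving on the server's socket and the returned object would travel back through the router, but the filtering and caching logic would keep the same shape.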
Q: "Is there a more proper way of doing this?"
Yes.
Start by applying the Maslow's Hammer rule:
“When the only tool you have is a hammer, every problem begins to resemble a nail.”
In other words, do not try to use one hammer to solve every problem. The PUB/SUB archetype was designed to serve those, and only those, multi-party Formal-Communications-Pattern archetypes where many SUB-scribers .recv() the messages that some PUB-lisher(s) .send() as a broadcast, and nothing else.
Similarly, the REQ/REP archetype was defined and implemented so as to serve one and only one multi-party distributed Formal-Communications-Pattern (and will obviously not fit any use case that has even a single additional or slightly different requirement).
Users often require some special, non-trivial features that obviously were not part of these trivial Formal-Communications-Pattern archetype primitives (the ready-made blocks made available in the ZeroMQ toolbox).
It is the architects' / designers' role to define and analyse any more complex, user-specific definition of distributed behaviour (a protocol) and to implement it, most often using a layered combination of the ready-made ZeroMQ primitives.
If in doubt, take a sheet of paper and a pencil, draw a small crowd of kids on a playground and sketch their "shouts", their "listening", their "silence", "waiting" and "doubts", their many or few "replies", their "voting" and "anger" at not being voted for by friends, their fight for a place in the sun, and their "persistence" in not letting others take their turn on the "swing" once they have enjoyed swinging themselves.
All this is the part of finding the right mix of ( protocol-orchestrated ) levels of control and levels of freedom to act.
There we get the new, distributed-behaviour, tailor-made for your specific use-case.
The probability of finding a ready-made primitive tool that matches and fulfils any user-specific use case is vanishingly close to zero (unless, of course, one's own requirements match all those of the primitive archetype; but then it is not a user-specific use case, it is a re-use of an already implemented archetype for the very situation that was foreseen by the ZeroMQ fathers, isn't it?).
Again, welcome to the art of Zen-of-Zero.
You may like to read this and this and this.

How to get data without polling?

This is more of a theoretical question.
Imagine that I have two programs that run simultaneously. The main one only does something when it receives a flag set to true from a secondary program. So the main program has a function that keeps asking the secondary one for the value of the flag, and when it gets true, it does something.
What I learned at college is that polling is the simplest way of doing that. But when I started working as a developer, coworkers told me that this method generates some overhead, or that it is a waste of computation to ask for a value at fixed intervals.
I tried to come up with some ideas for doing this in a different way and searched the internet for something like this, but didn't find a useful way to do it.
I read about interrupts and passive approaches in which the main program gets the data only when it is informed by the secondary program. But how does this happen? The main program will need a function to check for the interrupt, right? So won't it end up the same way as before?
What could I do differently?
There is no magic...
No program can guess when there is new information to be read. What you can do is decide between two approaches:
A -> asks -> B
A <- is informed <- B
When should you use each? It depends on many other factors, such as:
1- How quickly must the data be delivered from the moment it is generated? As fast as possible, or can it be held for a while and accumulated?
2- How fast is the data generated?
3- How many simultaneous clients are requesting data from the same server?
4- What type of data do you deal with? Persistent? Fast-changing?
If you are building something like a stock analyzer, where you need to ask for the price of stocks every second (and it also changes every second), the approach you mentioned may be the best.
If you are writing a chat-based app like WhatsApp, where you need to check whether there is a new message for the client and most of the time there won't be, publish/subscribe may be the best.
But all of this is a very superficial look at a high-impact architecture decision; it is not possible to find the best option by looking at just one factor.
What I want to show is that the statement
"coworkers told me that this method generate some overhead or it's waste of computation"
is not right in general. It may be true in some particular scenario, but overhead will always exist in distributed systems.
The typical way to prevent polling is by using the Publish/Subscribe pattern.
Your client program will subscribe to the server program and when an event occurs, the server program will publish to all its subscribers for them to handle however they need to.
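As a rough illustration of that subscribe-and-publish flow inside a single Node.js process, here is a minimal sketch built on the standard `events` module. `FlagSource` and `setFlag` are made-up names standing in for the secondary program; across separate processes the same idea would need some transport (sockets, a message broker, and so on).

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the "secondary program" that owns the flag.
class FlagSource extends EventEmitter {
  setFlag(value) {
    // Subscribers are notified only when something actually happens.
    this.emit('flag', value);
  }
}

const secondary = new FlagSource();
const received = [];

// The "main program" subscribes once instead of asking repeatedly.
secondary.on('flag', (value) => {
  if (value === true) received.push('did something');
});

secondary.setFlag(false); // handler runs, but there is nothing to do yet
secondary.setFlag(true);  // handler runs and acts
console.log(received);
```

Note that the main program never loops: its handler is simply called by the emitter when the event occurs.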
If you flip the order of the requests you end up with something more similar to a standard web API. Your main program (left in your example) would be a server listening for requests. The secondary program would be a client hitting an endpoint on the server to trigger an event.
There are many ways to accomplish this in every language, and it doesn't have to be tied to TCP/IP requests.
I'll add a few links for you shortly.
Well, in most languages you won't implement this at such a low level. But theoretically speaking, there are different waiting strategies, and what you are describing is active waiting. Done carelessly, it can easily eat all your CPU time.
Most languages provide libraries that let you start a process as a service that waits passively and is triggered when a request comes in.

The ways to make a scraper bot look more like a human

Due to a limitation of the API of a website I use for searching some products, I have to scrape the HTML of its Products page. There's no other way, because it offers only a free API with that limitation. I need 10 or 100 times more items than its API returns; even if I call it 5 times, it returns the same set of products as a single call.
I don't need to scrape a lot of pages in a short period of time. Normally a scraper bot would scrape all that data in a few minutes. For me a few hours is acceptable, so my scraper can behave more like a human.
The question is: what are the ways to make my scraper look like a normal user?
First, make fewer calls in a short period of time.
Use a headless browser, maybe?
Use vpn? or proxy? or both?
What are other pointers?
Note: in my case scraping is the only way to achieve what I want because the API doesn't work. So there's no question whether I should use the API or scraping. I simply can only use scraping.
You are basically heading in the right direction.
Yet I suspect that you haven't fully mastered the API (or it's a weird one) if calling it 5 times returns the same set of products as 1 call. An API should let users access all possible data (though with a frequency limit).
The items you've asked about:
Make fewer calls in a short period of time. - Kind of true, but you should still find out what request frequency is acceptable for the particular site (so that you are neither detected nor bandwidth-throttled).
Use a headless browser. - Yes. Discard cookies, be anonymous.
Use vpn? or proxy? - Proxy yes; use an appropriate proxy service that gives you enough flexibility to avoid being detected. A VPN does not help, since the network nodes (where you scrape from) are limited in number and (basically) have static IPs.
I think this post might be of help to you.
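For the first point, spacing requests out, one possible sketch is to add random jitter to a base delay so that pages are fetched at irregular intervals over hours rather than minutes. The function names and the numbers (30 seconds base, up to 20 seconds jitter) are assumptions for illustration only; the acceptable rate depends on the site and its terms of use.

```javascript
// Returns a wait time in milliseconds between baseMs and baseMs + jitterMs.
// `random` is injectable so the function can be tested deterministically.
function nextDelayMs(baseMs, jitterMs, random = Math.random) {
  return baseMs + Math.floor(random() * jitterMs);
}

// Plans when each of `pageCount` requests should start, as offsets in
// milliseconds from the beginning of the run.
function planSchedule(pageCount, baseMs, jitterMs, random = Math.random) {
  const offsets = [];
  let t = 0;
  for (let i = 0; i < pageCount; i++) {
    offsets.push(t);
    t += nextDelayMs(baseMs, jitterMs, random);
  }
  return offsets;
}

// Fixed "random" value here only to make the example reproducible.
const plan = planSchedule(3, 30000, 20000, () => 0.5);
console.log(plan); // [ 0, 40000, 80000 ]
```

In real use the default `Math.random` would be left in place and each offset passed to `setTimeout` before fetching the next page.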

AJAX Real Time and collaborative

I am trying to create a real-time, collaborative application, like Google Wave for example.
When user1 writes something, it shows up on user2's screen at the same time.
I did a little research and found some ways to do this with Ajax:
1. Every X seconds, send a request to the server to check what is "happening".
2. Timeout: a long request. Problem: from what I saw, I can do this only with IE8.
Are there other options? What is the best way to do this?
And regarding way number 2, is it true that I can do this only with IE8?
Yosy
The whole point of AJAX is that the server can wait for notifications from each client, and notify all the other clients when something happens. There's no need for polling. Look up keywords like Comet and Bayeux. Dojo has a good implementation.
I'm not sure what you are referring to in 2, but if I were going to implement something like this, I'd do what you explain in 1. Basically your server will be keeping track of the conversation, and the clients will constantly ask for updates.
Another possible option would be Flash, but I don't know much about that other than that it would be capable, so you're on your own for researching that.
Some notes on keeping things running quickly in option 1:
Remember you only have 2 "ajax" calls to work with on the client side (you can only have 2 calls out at once). So keep track of the calls that are out. Make use of abort() if a call takes too long or its response is not going to be valid anymore.
Get the most out of your calls: if you need to send text to the server, use the response to get an update on the current "conversation".
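One way to keep track of the calls that are out is a small tracker object. The sketch below is an assumption-laden illustration: `CallTracker` is a made-up class, the limit of 2 simply mirrors the figure given above, and it uses `AbortController` (the cancellation mechanism used with `fetch`) in place of the `abort()` method on an XMLHttpRequest object.

```javascript
// Tracks in-flight requests and aborts those that are no longer useful.
// MAX_IN_FLIGHT = 2 mirrors the limit mentioned above; it is an assumption.
const MAX_IN_FLIGHT = 2;

class CallTracker {
  constructor() {
    this.inFlight = new Map(); // key -> AbortController
  }

  // Returns an AbortSignal to pass to the request, or null if at the limit.
  start(key) {
    if (this.inFlight.has(key)) this.abort(key); // superseded: drop old call
    if (this.inFlight.size >= MAX_IN_FLIGHT) return null;
    const controller = new AbortController();
    this.inFlight.set(key, controller);
    return controller.signal;
  }

  // Call when a request completes normally.
  finish(key) {
    this.inFlight.delete(key);
  }

  // Cancel a request whose response is no longer wanted.
  abort(key) {
    const controller = this.inFlight.get(key);
    if (controller) {
      controller.abort();
      this.inFlight.delete(key);
    }
  }
}

const tracker = new CallTracker();
const pollSignal = tracker.start('poll');
const sendSignal = tracker.start('send');
const rejected = tracker.start('extra');     // null: two calls are already out
const newPollSignal = tracker.start('poll'); // aborts the stale poll first
console.log(rejected, pollSignal.aborted, newPollSignal.aborted);
```

The returned signal would be passed as `fetch(url, { signal })`, and a timer calling `tracker.abort(key)` would cover the "takes too long" case.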

Java EE servlet to create a file and show progress while creating it

I need to write a servlet that will return to the user a csv that holds some statistics.
I know how to return just the file, but how can I do it while showing a progress bar of the file creation process?
I am having trouble understanding how can I do something ajaxy to show the progress of the file creation, while creating the file at the same time - if I create a servlet that will return the completion percentage, how can it keep the same file it is creating while returning a response every x seconds to the browser to show the progress.
There are two fundamentally different approaches. One is true asynchronous delivery using an approach such as Comet. You can see some descriptions in articles such as this. I would use this approach where the data you are delivering is naturally incremental, for example live measurements from instrumentation. Some Java App Servers have nice integration between their JMS message systems and Comet delivery to the browser.
The other approach is that you have a polling mechanism. The JavaScript in the browser makes periodic calls to the server to get status (and maybe the next chunk of data). The advantage of this approach is that you are using a very standard programming model, less new stuff to learn. For many cases, such as "are there new answers for the Stack Overflow question I'm working on?" this is quite sufficient.
Your challenge may be to determine any useful progress information. How would you know how far through the generation of the CSV file you are?
If you are firing off a long-running request from a servlet, it's quite likely that you will effectively spin off a worker thread to do that work (maybe using JMS, maybe using async workers) and immediately return a response to the browser saying "Understood, I'm thinking". This ensures that you are not vulnerable to any HTTP response timeouts. The problem then is how to determine the current progress. Unless the "worker" doing the work has some way to communicate its partial progress, you have nothing useful to say. This kind of thing tends to be very application-specific. Some tasks very naturally have progress points (consider printing: we know how many pages there are to do and how many have been printed); others don't (consider determining whether a number is prime: yes or no, with perhaps no useful intermediate stages).
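The shape of that worker-plus-status arrangement can be sketched as follows. This is written in JavaScript rather than as a Java servlet, purely to show the data flow; `startJob`, `writeRow`, and `getProgress` are hypothetical names, and the sketch assumes the total number of CSV rows is known up front, which is exactly the application-specific part.

```javascript
// In-memory job table; a real server would keep this per session or in a store.
const jobs = new Map();
let nextId = 1;

// "Start" request: register the job and return its id immediately.
function startJob(totalRows) {
  const id = nextId++;
  jobs.set(id, { done: 0, total: totalRows, lines: [] });
  return id;
}

// Called by the worker for each CSV row it produces; records progress.
function writeRow(id, row) {
  const job = jobs.get(id);
  job.lines.push(row.join(','));
  job.done += 1;
}

// "Status" request: what the browser polls every few seconds.
function getProgress(id) {
  const job = jobs.get(id);
  return {
    percent: Math.round((job.done / job.total) * 100),
    finished: job.done === job.total,
  };
}

const id = startJob(4);
writeRow(id, ['a', 1]);
const early = getProgress(id); // polled while the worker is still running
[['b', 2], ['c', 3], ['d', 4]].forEach((row) => writeRow(id, row));
const late = getProgress(id);  // polled after the worker has finished
console.log(early, late);
```

Once `finished` is true, a third request can download the accumulated file, so the file being built and the progress responses never have to share a single HTTP response.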
