How can I get performance metrics (CPU usage, network bandwidth, I/O bandwidth, and so on) for MapReduce applications that I run from the command line, in a form that I can save and visualize? Is there a web UI for this? The YARN web UI doesn't give much detail about the applications I run.
Have you tried Cloudera Manager? As far as I know it does not expose many metrics, but it does show cluster information, so you may be able to extract some performance information from there.
You can find the documentation for it here: Cloudera Manager documentation
I have a requirement to show management and the client that the executor memory, number of cores, default parallelism, number of shuffle partitions, and other configuration properties used to run a Spark job are not excessive. I need a monitoring tool (with visualization) that lets me justify the memory usage of the Spark job. It should also tell me when memory is not being used properly or when a certain job requires more memory.
Please suggest an application or tool.
LinkedIn has created a tool that sounds very similar to what you're looking for.
See this presentation for an overview of the product:
https://youtu.be/7KjnjwgZN7A?t=480
The LinkedIn team has open-sourced Dr. Elephant here:
https://github.com/linkedin/dr-elephant
Give it a try.
Note that this setup may require manual tweaking of the Spark History Server during the initial integration so that Dr. Elephant gets the information it requires.
As part of performance testing Cloud Foundry applications, I am now focusing more on the server side (i.e. the containers where the applications are stored) and am interested in pulling out metrics that are useful for finding bottlenecks, such as:
1) CPU consumption,
2) disk usage,
3) memory usage
4) Logs
I searched around the internet but only ended up more confused. Can anyone suggest a framework or tool that can achieve this from a Windows OS?
The proper way to get metrics & logs would be through the firehose.
https://docs.cloudfoundry.org/loggregator/architecture.html#firehose
You use a Nozzle to get the information from the firehose.
https://docs.cloudfoundry.org/loggregator/architecture.html#nozzles
If you just want to experiment and see what information is available, you can use the firehose-plugin for the cf cli.
https://github.com/cloudfoundry-community/firehose-plugin
Ideally, you'd end up finding or writing a nozzle to integrate with your metrics and log capturing platform. For example, there is a DataDog nozzle for sending metrics off to DataDog.
https://github.com/cloudfoundry-incubator/datadog-firehose-nozzle
There's also a nozzle for sending logs to a syslog server (like ELK).
https://github.com/cloudfoundry-community/firehose-to-syslog
And there's one for Splunk too.
https://github.com/cloudfoundry-community/splunk-firehose-nozzle
Hope that helps!
Context:
I'm seeing a null pointer in one of our integration tests, which runs in a locally spawned Storm cluster. I increased the log level but still could not figure out what is really happening. Any help would be appreciated.
Your question doesn't quite match your title. If you're looking for better access to logs for scalable apps (whether on Hadoop or Storm), then check out tools that collect and aggregate logs from multiple nodes and systems. I'm familiar with Papertrail and Graylog, but I'm sure there are others. These tools, in conjunction with judicious use of log levels, can help you quickly find errors in your scalable apps.
If you're looking to get a better idea of how your system is performing (this is what I think of when I hear "visualization"), then check out distributed monitoring tools. We've had very good success with both the visualization of Storm bolt/spout performance and alert processing with CopperEgg, for example.
We are starting to integrate Yammer Metrics into our applications, and I would like to visualize the metrics.
Yammer Metrics has a collection process that can send metrics to Ganglia or Graphite, but those are rather heavy to install on my computer.
Do you know of a simple reporting tool for this use case, for example one with in-memory storage?
There is a JavaScript library that graphs the output of the MetricsServlet: https://github.com/benbertola/metrics-watcher
I was looking at the Metrics project (I assume it is this one: http://metrics.codahale.com/) and found that it can export the metrics to a CSV file, which can be used with many reporting tools, including DBxtra. I recommend DBxtra because it is very ad hoc: you can design and view a report in less than 10 minutes, mostly by drag and drop.
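If you only want a quick look at an exported CSV file before loading it into a reporting tool, a few lines of Python may be enough. This is a minimal sketch, not the Metrics library's own tooling: the column names `t` and `count`, the sample data, and the helper `summarize_column` are assumptions made up for illustration, so check the header of the file your reporter actually writes.

```python
import csv
import io
import statistics


def summarize_column(csv_text, column):
    """Return min, max and mean of one numeric column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


# Hypothetical export: elapsed seconds and a counter value (assumed layout).
sample_csv = "t,count\n0,10\n60,25\n120,40\n"

summary = summarize_column(sample_csv, "count")
print(summary)
```

For a real file, read its contents with `open(path).read()` and pass the name of the column you care about.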
If you just want to report periodically on your console you could use:
com.yammer.metrics.reporting.ConsoleReporter.enable(5, TimeUnit.MINUTES)
As our servers get busier, I'm increasingly interested in monitoring what's going on over time. Our host offers some crappy graphs that show CPU usage and memory over time, but they're not really telling me much.
What sort of high performance tools are available to accurately monitor Apache?
You can set up rrd (RRDtool) to store data and create graphs from the data you send it. I suggest parsing data from the Apache logs and from the ps command.
You can also use the Cacti package as an interface to rrd.
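As a rough illustration of the log-parsing step, here is a sketch that counts requests per minute from Apache access-log lines, which is the kind of time series you could then feed to rrd. It assumes the default Common Log Format timestamp; the sample lines and the helper name `requests_per_minute` are made up for the example, and sending the values to rrd is left out.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format timestamp, e.g. [10/Oct/2000:13:55:36 -0700].
# The capture group keeps the value down to the minute; seconds and the
# timezone offset are ignored in this sketch.
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\]")


def requests_per_minute(lines):
    """Count access-log entries per minute, ordered by time."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        minute = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M")
        counts[minute] += 1
    return dict(sorted(counts.items()))


# Made-up sample lines in Common Log Format (assumed log layout).
sample_log = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:59 -0700] "GET /about.html HTTP/1.0" 200 1024',
    '127.0.0.1 - - [10/Oct/2000:13:56:02 -0700] "GET /missing HTTP/1.0" 404 209',
    "not a log line",
]

per_minute = requests_per_minute(sample_log)
for minute, count in per_minute.items():
    print(minute.strftime("%Y-%m-%d %H:%M"), count)
```

In practice you would pass an open file handle for the access log instead of `sample_log`, and adjust the regular expression if your `LogFormat` differs from the default.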