The web-based Ganglia dashboard is nice, but I'm looking for a simpler dashboard that I can run in a text-only terminal. I'm mainly interested in the load on each cluster node. Before writing my own script: does this already exist?
I wrote a very simple text-based dashboard myself:
https://github.com/hannorein/ganglia_txt_dashboard
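For anyone who wants to roll their own instead: the core of such a script can be tiny, because gmond dumps its whole cluster state as XML to any client that connects to its TCP port. Here is a minimal sketch of that approach in Python (this is not the code from the repository above, and it assumes gmond is listening on the default port 8649 and reporting the standard load_one metric):

    #!/usr/bin/env python3
    # Poll gmond's XML port and print each host's one-minute load.
    import socket
    import xml.etree.ElementTree as ET

    def read_gmond_xml(host="localhost", port=8649):
        # gmond sends its full XML state to any client, then closes the connection.
        with socket.create_connection((host, port)) as sock:
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)

    root = ET.fromstring(read_gmond_xml())
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                print(f"{host.get('NAME'):30} {metric.get('VAL')}")

Wrap that in watch -n 10 and you have a text-only load dashboard.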
Assume I have many Python processes running on an automation server such as Jenkins. I want to use Python's native logging module and, rather than only writing to the Jenkins console or to a log file, store and centralize the logs somewhere.
I thought of using ELK for that, but then I realized that I could just as well create a dedicated log table in an existing database (I'm using Redshift), use something like Grafana for log dashboards/visualization, and save myself the trouble of deploying a new system (most of the people on my team are familiar with Redshift but not with Elasticsearch).
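For concreteness, this is roughly what I have in mind; the handler, table name, and columns below are made up for illustration, and since Redshift speaks the PostgreSQL wire protocol I would connect with psycopg2:

    import datetime
    import logging
    import psycopg2

    class DbLogHandler(logging.Handler):
        # Hypothetical handler: INSERT one row per log record into a log table.
        def __init__(self, dsn, table="job_logs"):
            super().__init__()
            self.conn = psycopg2.connect(dsn)
            self.table = table

        def emit(self, record):
            try:
                with self.conn.cursor() as cur:
                    cur.execute(
                        "INSERT INTO " + self.table +
                        " (logged_at, level, logger, message) VALUES (%s, %s, %s, %s)",
                        (datetime.datetime.utcfromtimestamp(record.created),
                         record.levelname, record.name, record.getMessage()),
                    )
                self.conn.commit()
            except Exception:
                self.handleError(record)

    log = logging.getLogger("jenkins.job")
    log.addHandler(DbLogHandler("host=... dbname=... user=... password=..."))
    log.warning("build step failed")

(I'm aware that per-row INSERTs are not ideal for a columnar store like Redshift, so I would probably batch them.)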
Although it sounds straightforward, I feel like I'm not seeing the big picture and that I would be missing some powerful capabilities that components like Logstash were written for in the first place. What would these capabilities be, and how would it be advantageous to use ELK instead of my solution?
Thank you!
I implemented a full ELK stack at my company over the past year.
The project was huge and took a lot of time to implement properly. The advantages of using ELK over implementing our own centralized logging solution are:
Not needing to reinvent the wheel: there is already a product that does exactly this (and the installation part is extremely easy).
It is battle-tested and can stand up to huge volumes of logs in a short time.
As your business and product grow and shift, you will need to parse more logs with different structures, which would mean schema changes in a self-built system. Logstash gives you almost endless possibilities for filtering and parsing these new log formats (see the sketch below).
It has clustering and high-availability (HA) capabilities, and you can scale your logging system both vertically and horizontally.
Very easy to maintain and change over time.
It can send its output to a variety of products, including Zabbix, Grafana, Elasticsearch, and many more.
Kibana gives you the ability to view the logs and to build graphs, dashboards, alerts, and more...
The options with ELK are really endless, and the more I work with it, the more I find new ways it can help me: not just viewing logs from distributed remote systems, but also security alerts, SLA graphs, and many other insights.
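To make the Logstash point concrete for the Python side of the question: once Logstash is the collection point, the producing processes stay trivial. A minimal sketch, assuming a Logstash tcp input with a json_lines codec listening on port 5000 (the handler class and field names here are made up):

    import json
    import logging
    import socket

    class LogstashTcpHandler(logging.Handler):
        # Hypothetical handler: ship each log record as one JSON line to Logstash.
        def __init__(self, host="localhost", port=5000):
            super().__init__()
            self.sock = socket.create_connection((host, port))

        def emit(self, record):
            try:
                doc = {
                    "@timestamp": record.created,
                    "level": record.levelname,
                    "logger": record.name,
                    "message": record.getMessage(),
                }
                self.sock.sendall((json.dumps(doc) + "\n").encode("utf-8"))
            except Exception:
                self.handleError(record)

    log = logging.getLogger("jenkins.job")
    log.addHandler(LogstashTcpHandler())
    log.error("deployment failed")

Any parsing, enrichment, or re-routing then lives in the Logstash pipeline rather than in every producer, which is exactly the schema-change problem with a self-built system mentioned above.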
The question is about ways to create a Windows desktop-based and/or web-based application client that can connect and talk to a server running a Spark application (either a local or an on-premises cloud distribution) at run time.
Any language/architecture may work. So far, I've seen two things that may help with this, but I'm not sure whether they are the best alternative, and I don't yet know how they work:
Spark Job Server - https://github.com/spark-jobserver/spark-jobserver - defines a REST API for Spark
Hue - http://gethue.com/get-started-with-spark-deploy-spark-server-and-compute-pi-from-your-web-browser/ - uses item 1)
Any advice would be appreciated. A simple toy example program (or steps) showing, e.g., how to build such a client that creates a Spark context on a local machine, reads a text file, and returns basic stats would be the ideal answer!
You may want to have a look at how the folks at Adobe Research built their Spindle platform. Personally, I haven't investigated it in detail, but they also provide "Spark query results as a service".
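As a starting point with option 1 from the question, the client side can be just a couple of HTTP calls. Below is a hedged sketch in Python against the spark-jobserver REST API; it assumes the server is running on its default port 8090, and the jar name, app name, and job class are made up for illustration:

    import requests

    BASE = "http://localhost:8090"  # spark-jobserver's default port

    # 1. Upload the application jar under an app name.
    with open("text-stats.jar", "rb") as jar:
        requests.post(BASE + "/jars/textstats", data=jar.read()).raise_for_status()

    # 2. Run a job synchronously; the body is Typesafe-config-style job input.
    resp = requests.post(
        BASE + "/jobs",
        params={"appName": "textstats",
                "classPath": "com.example.TextStatsJob",
                "sync": "true"},
        data="input.file = /tmp/sample.txt",
    )
    resp.raise_for_status()
    print(resp.json())  # the job's result (e.g. line/word counts) as JSON

A desktop or web client then only needs an HTTP stack; all the Spark dependencies stay on the server.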
Context:
Seeing a null pointer exception in one of the integration tests, which runs in a locally spawned Storm cluster. I increased the log level but still could not figure out what is really happening. Any help would be appreciated.
Your question doesn't quite match your title. If you're looking for better access to logs for scalable apps (whether on Hadoop or Storm), then check out tools that collect and aggregate logs from multiple nodes and systems. I'm familiar with Papertrail and Graylog, but I'm sure there are others. These tools, in conjunction with judicious use of log levels (see the snippet below), can help you quickly find errors in your scalable apps.
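As an illustration of the log-level point (in Python here purely for brevity; the logger names are made up), the idea is to keep the global level quiet and turn up verbosity only for the component under suspicion:

    import logging

    # Keep everything quiet by default...
    logging.basicConfig(level=logging.WARNING)
    # ...and turn up verbosity only where you suspect the problem lives.
    logging.getLogger("myapp.suspect_component").setLevel(logging.DEBUG)

    logging.getLogger("myapp.noisy_neighbor").debug("filtered out")
    logging.getLogger("myapp.suspect_component").debug("this one is shown")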
If you're looking to get a better idea of how your system is performing (this is what I think of when I hear "visualization") then check out distributed monitoring tools. We've had very good success with the both the visualization of Storm bolt/spout performance and alert processing with CopperEgg, for example.
I have an academic course, "Middleware", which covers different aspects of distributed software systems, including an introduction to topics like [tag:Distributed File system]. It also involves introductions to HBase, Hadoop, MapReduce, HiveQL, and Pig Latin.
I want to know whether I can do a small project that integrates the above technologies. For starters, I am aware of the VM provided by Cloudera for getting a feel for Hadoop and playing around with it using Eclipse.
I was thinking along the lines of implementing an application that accepts a stream of events as input, analyses it, and gives an output.
I have both Windows and Linux on my machine, with an i7 processor and 4 GB of RAM.
Please let me know how to get started with everything; any suggestions for a simple example application are welcome.
Here is a blog post on analyzing Tweets using Hive/HDFS, and here is another on performing clickstream analytics using Pig and Hive.
Check out some of the Big Data use cases here and try to solve an interesting problem.
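Since the course already covers Hadoop and MapReduce, one concrete way to start on the "stream of events in, analysis out" idea with the Cloudera VM is Hadoop Streaming, which lets you write the mapper and reducer as plain Python scripts. A minimal sketch that counts events per type; the tab-separated input with the event type in the second field is an assumption for illustration:

    # --- mapper.py: emit one "event_type<TAB>1" pair per input event line ---
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:          # second field is assumed to hold the event type
            print(fields[1] + "\t1")

    # --- reducer.py (a separate file): input arrives sorted by key ---
    import sys

    current, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print(current + "\t" + str(count))
            current, count = key, 0
        count += int(value)
    if current is not None:
        print(current + "\t" + str(count))

You would run it with the distribution's hadoop-streaming jar, along the lines of: hadoop jar hadoop-streaming.jar -input events -output counts -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py.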
We are starting to integrate Yammer Metrics into our applications, and I would like to visualize the metrics.
Yammer Metrics has reporters that can send metrics to Ganglia or Graphite, but those are a bit too heavyweight to install on my computer.
Do you know of a simple reporting tool, ideally with in-memory storage, for this use case?
There is a JavaScript library that graphs the output of the MetricsServlet: https://github.com/benbertola/metrics-watcher
I was looking at the Metrics project (I assume it is this one: http://metrics.codahale.com/) and found that it can export metrics to a CSV file, which can be used with many reporting tools, including DBxtra. The reason I recommend DBxtra is that it is very ad hoc: you can design and view a report in less than 10 minutes, mostly by drag and drop.
If you just want to report periodically on your console, you could use:

    // metrics 2.x: print all registered metrics to stdout every 5 minutes
    com.yammer.metrics.reporting.ConsoleReporter.enable(5, TimeUnit.MINUTES);