I am new to Elasticsearch, Logstash, and Kibana; using this tutorial I just played around with them. Now I want to know how to build a real production application for log analysis, including a login and security model. Is there any way to achieve this with Kibana?
A fairly vague question, but...
In Kibana 3, you set up dashboards that are made up of "panels". Each panel can be a different type (histogram, pie chart, etc.), and each panel can chart one or more queries.
In Kibana 4 (still in beta), it's a more explicit multi-step process. You create "visualizations", which, like panels, are components that you'll use in dashboards.
Assuming I have many Python processes running on an automation server such as Jenkins, let's say I want to use Python's native logging module and, other than writing to the Jenkins console or to a log file, I want to store & centralize the logs somewhere.
I thought of using ELK for that, but then I realized that I can just as well create a dedicated log table in an existing database (I'm using Redshift), use something like Grafana for log dashboards/visualization and save myself the trouble of deploying a new system (most of the people in my team are familiar with Redshift but not with ElasticSearch).
Although it sounds straightforward, I feel like I'm not looking at the big picture and that I would be missing some powerful capabilities that components like Logstash were written for in the first place. What would these capabilities be, and how would it be advantageous to use ELK instead of my solution?
Thank you!
I have implemented a full ELK stack in my company in the past year.
The project was huge and took a lot of time to properly implement. The advantages of using ELK and not implementing our own centralized logging solution would be:
Not needing to re-invent the wheel: there is already a product that does just that (and the installation part is extremely easy).
It is battle-tested and can handle a huge volume of logs in a short time.
As your business and product grow and shift, you will need to parse more logs with different structures, which would mean schema changes in a self-built system. Logstash gives you endless possibilities for filtering and parsing those newly formatted logs.
It has cluster and HA capabilities, and you can scale your logging system vertically and horizontally.
It is very easy to maintain and change over time.
It can send the needed output to a variety of products, including Zabbix, Grafana, Elasticsearch, and many more.
Kibana gives you the ability to view the logs, build graphs and dashboards, set up alerts, and more.
The options with ELK are really endless, and the more I work with it, the more I find new ways it can help me: not just viewing logs from distributed remote server systems, but also security alerts, SLA graphs, and many other insights.
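To make the Python/Jenkins question above a bit more concrete, here is a minimal sketch (my own, not from this answer; the node address, index name, and field layout are all assumptions) of a logging handler that ships each record straight to an Elasticsearch index over HTTP. In a real ELK deployment you would more likely point your handlers at Logstash or Beats and let them do the parsing and buffering.

```python
import datetime
import json
import logging

import requests  # assumes the requests package is installed


class ElasticsearchHandler(logging.Handler):
    """Ship each log record to an Elasticsearch index over its HTTP API."""

    def __init__(self, url="http://localhost:9200", index="app-logs"):
        super().__init__()
        self.endpoint = f"{url}/{index}/_doc"  # assumed local ES node and index

    def emit(self, record):
        doc = {
            "@timestamp": datetime.datetime.utcnow().isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        try:
            requests.post(self.endpoint, json=doc, timeout=2)
        except requests.RequestException:
            self.handleError(record)  # don't let logging failures kill the job


logger = logging.getLogger("jenkins-job")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(ElasticsearchHandler())
logger.info("build started")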
I have an Elasticsearch index which logs my scraper statistics, like response status and headers used. How can I do something like machine learning to generate a guess as to which combination of headers would succeed best in future scrapes? Is it possible with plain Elasticsearch, and if not, what plugins would you suggest?
From what I found out, ELK only provides machine learning functionality in Kibana's X-Pack extension, e.g. anomaly detection and forecasts (link). For me it's useless because my model would need advanced data filtering, and I want to visualize all my predictions on a dashboard. If you want to make custom predictions, then the only way is to write your own prediction script or use an out-of-the-box ML solution such as Amazon Machine Learning.
You can treat Elasticsearch as an ordinary NoSQL database, periodically extract raw data from it using REST requests, and feed it to your own ML script or ML web service. Then you can save the predictions back to Elasticsearch as a new index, which can later be visualized in Kibana.
Elasticsearch --(HTTP GET)--> Script (filtering and predictions) --(HTTP PUT)--> Elasticsearch
I'm still looking for the best solution to produce predictions but for now custom script seems like the only option and I'm currently developing it.
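As an illustration of that GET -> script -> PUT loop, here is a rough Python sketch. The index names (scraper-stats, scraper-predictions), the field names, and the naive success-rate "model" are placeholders I made up, not something the answer prescribes:

```python
import requests  # assumes the requests package is installed

ES = "http://localhost:9200"  # assumed Elasticsearch address

# 1. Pull the raw scraper statistics out of Elasticsearch.
resp = requests.get(
    f"{ES}/scraper-stats/_search",
    json={"size": 1000, "query": {"match_all": {}}},
    timeout=10,
)
hits = resp.json()["hits"]["hits"]

# 2. "Predict": here just a naive success rate per header set,
#    standing in for a real ML model.
stats = {}
for hit in hits:
    src = hit["_source"]
    key = src.get("headers_id", "unknown")
    ok, total = stats.get(key, (0, 0))
    stats[key] = (ok + (1 if src.get("status") == 200 else 0), total + 1)

# 3. Write the predictions back as a new index for Kibana dashboards.
for key, (ok, total) in stats.items():
    requests.post(
        f"{ES}/scraper-predictions/_doc",
        json={"headers_id": key, "predicted_success_rate": ok / total},
        timeout=10,
    )
```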
I am building an application with a microservice architecture, so my different business models run on different microservices.
The microservices use graph and document databases.
What I have to do is keep audit logs for the objects whenever they are changed. There are a couple of ways to do this; two that I thought of:
Store audit logs in each database whenever something changes in an object.
Instead of keeping it localized, send everything to a central repository where we can see all the audits for the whole application, since behind the scenes the application is served by microservices, but at the front it is just one app for the users and also for us. Would Elasticsearch be suitable for this kind of long-term storage, or are there other solutions?
What other approaches are best practice that I should follow? My objective in the end is to know when an object was changed, what was changed, and by whom.
Cheers!
The general recommendation is not to use ES as your authoritative data store. If you want 99.99% reliability for the audit data, store it somewhere else and index it in ES when you need its search capabilities.
In my experience ES is quite resilient, but I keep in mind that its storage is not as polished as well-known relational DBs or Cassandra/HDFS, and I would not store important data there.
Also keep in mind that an ES index is not very flexible: if you want to heavily rescale your cluster or change a field mapping, you may have to reindex everything. Newer versions of ES offer a Reindex API, but it is still a weak point.
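A tiny sketch of that recommendation, with sqlite3 standing in for whatever authoritative store you actually pick and an invented audit-logs index; the point is only the ordering, a durable write first and best-effort ES indexing second:

```python
import json
import sqlite3

import requests  # assumes the requests package is installed

ES_URL = "http://localhost:9200/audit-logs/_doc"  # assumed ES endpoint

# Authoritative store first (sqlite3 stands in for your real database).
db = sqlite3.connect("audit.db")
db.execute("CREATE TABLE IF NOT EXISTS audit (id INTEGER PRIMARY KEY, payload TEXT)")


def record_audit(event: dict) -> None:
    """Persist the audit event durably, then index it in ES for searching."""
    # 1. Durable write: if this fails, the event is not recorded at all.
    cur = db.execute("INSERT INTO audit (payload) VALUES (?)", (json.dumps(event),))
    db.commit()

    # 2. Best-effort index into Elasticsearch; a failure here can be
    #    retried later from the authoritative table.
    try:
        requests.post(ES_URL, json={"audit_id": cur.lastrowid, **event}, timeout=2)
    except requests.RequestException:
        pass  # e.g. queue for a reindex job


record_audit({"object": "order-42", "changed_by": "alice", "change": "status -> shipped"})
```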
I have recently installed Kibana 4, but I am beginning to understand that dashboards are designed differently from Kibana 3, i.e., multiple visualizations that are designed individually are embedded into each dashboard. I already have a lot of dashboards designed in Kibana 3, so I would like to know if there is a way to load them into Kibana 4 instead of creating everything from scratch.
To the best of my knowledge, there is no way to do that. Not just the formats, but the queries sent to the ES backend are quite different: Kibana 3 used facets heavily for segmentation, which is a deprecated feature, and Kibana 4 got rid of that.
We are currently using elasticsearch to index and perform searches on about 10M documents. It works fine and we are happy with its performance. My colleague who initiated the use of elasticsearch is convinced that it can be used as the central data repository and other data systems (e.g. SQL Server, Hadoop/Hive) can have data pushed to them. I didn't have any arguments against it because my knowledge of both is too limited. However, I am concerned.
I do know that data in Elasticsearch is stored in a manner that is efficient for text searching. Hadoop stores data just as a file system would, but in a manner that is efficient for scaling/replicating blocks over multiple data nodes. Therefore, in my mind it seems more beneficial to use Hadoop (as it is more agnostic w.r.t. its view on data) as a central data repository, and then push data from Hadoop to SQL, Elasticsearch, etc.
I've read a few articles on Hadoop and elasticsearch use cases and it seems conventional to use Hadoop as the central data repository. However, I can't find anything that would suggest that elasticsearch wouldn't be a decent alternative.
Please Help!
As is the case with all database deployments, it really depends on your specific application.
Elasticsearch is a great open source search engine built on top of Apache Lucene. Its features and upgrades allow it to basically function just like a schema-less JSON datastore that can be accessed using both search-specific methods and regular database CRUD-like commands.
Notwithstanding all the advantages Elasticsearch brings, there are still some main disadvantages:
Security - Elasticsearch does not provide any authentication or access control functionality out of the box. This has been addressed since they introduced Shield.
Transactions - There is no support for transactions or processing on data manipulation. Nowadays much of the data-manipulation work can be handled by Logstash.
Durability - ES is distributed and fairly stable but backups and durability are not as high priority as in other data stores.
Maturity of tools - ES is still relatively new and has not had time to develop mature client libraries and third-party tools, which can make development much harder. We can consider it quite mature now, with a variety of connectors and tools around it such as Kibana.
Large computations - Commands for searching data are not suited to "large" scans of data or advanced computation on the database side.
Data Availability - ES makes data available in "near real-time", which may require additional considerations in your application (e.g. on a comments page where a user adds a new comment, refreshing the page might not actually show the new post because the index is still updating).
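The near-real-time point is easy to see with a couple of requests against a local node (the index name and addresses here are just examples I chose, not part of the answer):

```python
import requests  # assumes the requests package is installed

ES = "http://localhost:9200"  # assumed local Elasticsearch node

# Index a new document (a comment, say).
requests.post(f"{ES}/comments/_doc", json={"user": "bob", "text": "hi"}, timeout=5)

# An immediate count may not include it yet: documents only become
# searchable after the next index refresh (about 1 second by default).
before = requests.get(f"{ES}/comments/_count", timeout=5).json()["count"]

# Forcing a refresh makes it visible (fine for a demo, too costly to
# do on every write in production).
requests.post(f"{ES}/comments/_refresh", timeout=5)
after = requests.get(f"{ES}/comments/_count", timeout=5).json()["count"]

print(before, after)  # 'before' may lag behind 'after'
```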
If you can deal with these issues then there's certainly no reason why you can't use Elasticsearch as your primary data store. It can actually lower complexity and improve performance by not having to duplicate your data but again this depends on your specific use case.
As always, weigh the benefits, do some experimentation and see what works best for you.
DISCLAIMER: This answer was written a while ago for the Elasticsearch 1.x series. These criticisms still largely stand for the 2.x series, but Elastic is working on them: the 2.x series comes with more mature tools, APIs, and plugins, for example Shield on the security side, or ingestion tools like Logstash and Beats, etc.
I'd highly discourage most users from using Elasticsearch as their primary datastore. It will work great until your cluster melts down due to a network partition. Even settings such as minimum_master_nodes, which the ES pros always tell you to set, won't save you. See this excellent analysis by Aphyr in his Call Me Maybe series:
http://aphyr.com/posts/317-call-me-maybe-elasticsearch
eliasah is right, it depends on your use case, but if your data (and job) is important to you, stay away.
Keep the golden record of your data in something really focused on persistence, and sync your data out to search from there. It adds extra complexity and resources, but will result in a better night's rest :)
There are plenty of ways to go about this, and if Elasticsearch does everything you need, you can look into Kafka for persisting all the events going into the cluster, which would allow replaying if things go wrong. I like this approach as it provides an async ingestion pipeline into Elasticsearch that also handles the persistence.
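A rough sketch of that Kafka-in-front-of-Elasticsearch idea using the kafka-python client; the broker address, topic, and index names are my own assumptions, and in practice you would likely use Logstash's Kafka input or Kafka Connect rather than a hand-rolled consumer:

```python
import json

from kafka import KafkaConsumer, KafkaProducer  # assumes kafka-python is installed
import requests  # assumes the requests package is installed

BROKER = "localhost:9092"          # assumed Kafka broker
ES_URL = "http://localhost:9200"   # assumed Elasticsearch node

# Producer side: every event is appended to a durable Kafka topic first.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda e: json.dumps(e).encode("utf-8"),
)
producer.send("events", {"type": "user_signup", "user": "alice"})
producer.flush()

# Consumer side: an async ingestion process reads the topic and indexes
# into Elasticsearch; if the cluster is lost, replay from the topic.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    requests.post(f"{ES_URL}/events/_doc", json=message.value, timeout=5)
```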