How to monitor queries in Elasticsearch? - elasticsearch

We are using Elastic APM to monitor our APIs. It shows the status of each query and other useful information about it. I want the same information for the queries that are sent to our Elasticsearch server.
I want information about the queries themselves, their duration, status codes, etc. Is there a plugin in the Elastic Stack that I can use for this purpose?

For a high-level overview, have a look at Elastic Stack Monitoring.
If you want more detailed monitoring data, have a look at the monitoring APIs themselves.
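As one example of those APIs, the node stats API exposes cumulative search counters for every node. The sketch below assumes an unsecured cluster at http://localhost:9200 (adjust the URL and add authentication for your setup) and derives an average query-phase latency per node from `query_total` and `query_time_in_millis`. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: an unsecured cluster reachable at this URL; adjust for your setup.
ES_URL = "http://localhost:9200"


def fetch_node_stats(base_url=ES_URL):
    """Fetch per-node search statistics via GET /_nodes/stats/indices/search."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/indices/search") as resp:
        return json.load(resp)


def average_query_latency_ms(stats):
    """Return {node_name: average query-phase time in ms} from a _nodes/stats response."""
    result = {}
    for node in stats["nodes"].values():
        search = node["indices"]["search"]
        total = search["query_total"]
        # The counters are cumulative since node start, so this is a lifetime average.
        result[node["name"]] = search["query_time_in_millis"] / total if total else 0.0
    return result
```

Call `average_query_latency_ms(fetch_node_stats())` to get a snapshot. Because the counters only grow, sampling twice and diffing the values gives a figure for a recent window rather than for the node's whole lifetime.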
If you want to log this sort of information, you should set thresholds for your Elasticsearch slow log.
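Slow log thresholds are dynamic index settings, so they can be set through the update index settings API (`PUT /<index>/_settings`). The sketch below assumes an unsecured cluster at http://localhost:9200; the threshold values are arbitrary examples, and the function names are hypothetical. A threshold of `-1` disables that level.

```python
import json
import urllib.request


def slowlog_settings(query_warn="10s", query_info="5s", fetch_warn="1s", index_warn="10s"):
    """Build index settings that enable the search and indexing slow logs."""
    return {
        "index.search.slowlog.threshold.query.warn": query_warn,
        "index.search.slowlog.threshold.query.info": query_info,
        "index.search.slowlog.threshold.fetch.warn": fetch_warn,
        "index.indexing.slowlog.threshold.index.warn": index_warn,
    }


def apply_settings(index, settings, base_url="http://localhost:9200"):
    """PUT the settings to /<index>/_settings and return Elasticsearch's JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `apply_settings("my-index", slowlog_settings(query_info="500ms"))` would log every query on `my-index` that takes longer than 500 ms at INFO level. Lower thresholds catch more queries but produce much larger logs.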
If you want to index and then view data from the slow log, you can always use Filebeat to ingest that slow log data back into Elasticsearch.

Related

How do I instrument my code for Splunk metrics?

I'm brand new to Splunk, having worked exclusively with Prometheus before. The one obvious thing I can't find on the Splunk website is how to create or expose a metric in my code: whether I must provide an HTTP endpoint for consumption, call into some API to push values, etc. Further, I cannot see which languages Splunk provides instrumentation libraries for. I cannot find where all this low-level material is documented!
Can anyone help me understand how Splunk works, particularly how it compares to Prometheus?
Usually, programs write their normal log files and Splunk ingests those files so they can be searched and data extracted.
There are other ways to get data into Splunk, though. See https://dev.splunk.com/enterprise/reference for the SDKs available in a few languages.
You could write your metrics to collectd and then send them to Splunk. See https://splunkonbigdata.com/2020/05/09/metrics-data-collection-via-collectd-part-2/
You could write your metrics directly to Splunk using their HTTP Event Collector (HEC). See https://dev.splunk.com/enterprise/docs/devtools/httpeventcollector/
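Unlike Prometheus scraping an endpoint, HEC is push-based: your code POSTs events to Splunk. The sketch below builds events in HEC's single-metric JSON format (`"event": "metric"` with `metric_name` and `_value` inside `fields`) and sends them in one batch. It assumes a HEC token whose default index is a metrics index; the URL, token, metric name, and dimension names are hypothetical placeholders.

```python
import json
import time
import urllib.request


def hec_metric_event(name, value, host, dimensions=None, timestamp=None):
    """Build one HEC event in Splunk's single-metric JSON format."""
    fields = dict(dimensions or {})  # extra keys become metric dimensions
    fields["metric_name"] = name
    fields["_value"] = value
    return {
        "time": timestamp if timestamp is not None else time.time(),
        "event": "metric",
        "host": host,
        "fields": fields,
    }


def send_to_hec(events, url, token):
    """POST events to HEC, e.g. url='https://splunk.example.com:8088/services/collector'."""
    # HEC accepts several JSON event objects concatenated in a single request body.
    body = "".join(json.dumps(e) for e in events).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A call such as `send_to_hec([hec_metric_event("api.latency_ms", 12.5, "web-01", {"endpoint": "/orders"})], url, token)` would record one latency sample tagged with its endpoint.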

ElasticSearch deep learning

I have an Elasticsearch index that logs my scraper statistics, such as the response status and the headers used. How can I use something like machine learning to predict which combination of headers would succeed best in future scrapes? Is it possible to do this with plain Elasticsearch? If not, what plugins would you suggest?
From what I found, ELK only provides machine learning functionality in Kibana's X-Pack extension, e.g. anomaly detection and forecasting. For me it's useless, because my model would need advanced data filtering and I want to visualize all my predictions on a dashboard. If you want to make custom predictions, the only way is to write your own prediction script or use an out-of-the-box ML solution, for example Amazon Machine Learning.
You can treat Elasticsearch as an ordinary NoSQL database: periodically extract raw data from it using REST requests and pass that data to an ML script or ML web service you have built. Then you can save the predictions back to Elasticsearch as a new index, which can later be visualized in Kibana.
Elasticsearch ==(HTTP GET)==> Script (filtering and predictions) ==(HTTP PUT)==> Elasticsearch
I'm still looking for the best way to produce predictions, but for now a custom script seems like the only option, and I'm currently developing one.
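A minimal sketch of that read, predict, write-back loop, assuming an unsecured cluster at http://localhost:9200 and documents with hypothetical `headers` (an object of header name to string value) and `status` fields. The index names are hypothetical too. The "prediction" here is deliberately only a baseline: it ranks header combinations by their observed 2xx rate, which is not machine learning, but a real model could replace `success_rates` without changing the plumbing.

```python
import json
import urllib.request
from collections import defaultdict

ES_URL = "http://localhost:9200"  # assumption: unsecured local cluster


def _request(method, path, body=None, ndjson=False):
    """Send a request to Elasticsearch and decode the JSON reply."""
    data, headers = None, {}
    if body is not None:
        data = (body if isinstance(body, str) else json.dumps(body)).encode("utf-8")
        headers["Content-Type"] = "application/x-ndjson" if ndjson else "application/json"
    req = urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def success_rates(docs):
    """Rank header combinations by the share of requests that returned a 2xx status."""
    counts = defaultdict(lambda: [0, 0])  # combination -> [successes, attempts]
    for doc in docs:
        combo = tuple(sorted(doc["headers"].items()))
        counts[combo][1] += 1
        if 200 <= doc["status"] < 300:
            counts[combo][0] += 1
    ranked = [
        {"headers": dict(combo), "success_rate": ok / total, "attempts": total}
        for combo, (ok, total) in counts.items()
    ]
    return sorted(ranked, key=lambda r: r["success_rate"], reverse=True)


def run(source_index="scraper-stats", target_index="header-predictions", size=1000):
    # Read side: fetch raw documents (only the first `size` hits; page with PIT/scroll for more).
    reply = _request("POST", f"/{source_index}/_search", {"size": size, "query": {"match_all": {}}})
    docs = [hit["_source"] for hit in reply["hits"]["hits"]]
    rows = success_rates(docs)
    if not rows:
        return None
    # Write side: store one document per header combination via the _bulk API.
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": target_index}}))
        lines.append(json.dumps(row))
    return _request("POST", "/_bulk", "\n".join(lines) + "\n", ndjson=True)
```

Running `run()` on a schedule (cron or similar) keeps the `header-predictions` index fresh, and a Kibana dashboard can then be built on top of it.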

How to view "Events per second" in Logstash?

I am building an alert using Elasticsearch, and I need to get the number of events per second that one of the Logstash nodes is receiving. On the Monitoring (formerly Marvel) tab, this information is readily available as a graph. Is there any way to get the same information through an ELK API, aside from scripting something?
Take a look at the Logstash metrics filter plugin.
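As an alternative to the metrics filter (a different technique), Logstash's own monitoring API reports a cumulative `events.in` counter at `GET /_node/stats/events`, and events per second is the difference between two samples divided by the time between them. The sketch assumes the API is listening on its default http://localhost:9600; the function names are hypothetical.

```python
import json
import time
import urllib.request

LOGSTASH_API = "http://localhost:9600"  # assumption: default Logstash API host and port


def events_in(base_url=LOGSTASH_API):
    """Return the cumulative events.in counter from GET /_node/stats/events."""
    with urllib.request.urlopen(f"{base_url}/_node/stats/events") as resp:
        return json.load(resp)["events"]["in"]


def rate(count_before, count_after, seconds):
    """Events per second between two samples of a cumulative counter."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (count_after - count_before) / seconds


def sample_events_per_second(interval=10.0, base_url=LOGSTASH_API):
    """Sample the counter twice, `interval` seconds apart, and return the rate."""
    first = events_in(base_url)
    start = time.monotonic()
    time.sleep(interval)
    second = events_in(base_url)
    return rate(first, second, time.monotonic() - start)
```

The counter resets when Logstash restarts, so an alerting script should treat a negative difference as a restart rather than as a real rate.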

What's the new Watson Discovery service?

I just went to Bluemix and saw that there is a new experimental service called Discovery. Apparently, it can ingest PDFs, Word Documents, and HTML pages among other file types.
What's the difference between that service and Document Conversion (DC)? Before, I used to convert my documents using DC and then index them in Retrieve and Rank. Is Discovery a merger of Retrieve and Rank and Document Conversion?
The IBM Watson™ Discovery Service uses data analysis combined with cognitive intuition to take your unstructured data and enrich it so you can query it for the information you need. The service enables you to ingest and index content so that you can subsequently use that information to answer queries.
The service is experimental now but the idea is that you will be able to do something similar to what you currently do with Document Conversion and Retrieve and Rank. One of the main benefits is that ingestion and indexing are now managed by the service.
For detailed information, see the documentation.
Note: I work for IBM Watson

Elasticsearch: security concerns

We are using Elasticsearch as the back end for our in-house logging and monitoring system. We have multiple sites pouring data into one ES cluster, each into a different index, e.g. abc-us holds data from the US site and abc-india holds data from the India site.
Our concern is that we need some security checks before data is pushed into the cluster:
data coming into an index must come from the right IP address;
an incoming JSON request must only insert new data, not delete or update existing data;
when reading, certain IPs should not be able to read data from other sites' indices.
Kindly let me know if it's possible to achieve this using Elasticsearch.
The elasticsearch-jetty plugin brings full power of Jetty and adds several new features to elasticsearch. With this plugin elasticsearch can now handle SSL connections, support basic authentication, and log all or some incoming requests in plain text or json formats.
The idea is to add a Jetty wrapper to Elasticsearch as a plugin.
What remains is to restrict certain URLs and some methods (e.g. DELETE) to specific users.
You can find elasticsearch-jetty on GitHub, with detailed documentation of its usage, configuration and, of course, its limitations.
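To make the three requirements concrete, here is a sketch of the decision logic a wrapper or reverse proxy in front of Elasticsearch would have to enforce. The policy tables, networks, and function names are hypothetical. "Insert only" is interpreted as allowing `POST /<index>/_doc` (auto-generated ID) and `_create`, because `PUT /<index>/_doc/<id>` can overwrite an existing document. Recent Elasticsearch versions also ship built-in security with index-level privileges, which is usually a sturdier option than hand-rolled checks.

```python
import ipaddress

# Hypothetical policy: which client networks may write to / read from each index.
WRITE_POLICY = {"abc-us": ["10.1.0.0/16"], "abc-india": ["10.2.0.0/16"]}
READ_POLICY = {
    "abc-us": ["10.1.0.0/16", "10.9.0.0/24"],
    "abc-india": ["10.2.0.0/16", "10.9.0.0/24"],
}


def _in_any(ip, networks):
    """True if the client IP falls inside any of the given CIDR networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(n) for n in networks)


def is_allowed(client_ip, method, path):
    """Decide whether a request should be forwarded to Elasticsearch."""
    parts = [p for p in path.split("?")[0].split("/") if p]
    if not parts or method == "DELETE":
        return False  # never forward deletes or bare cluster-level paths
    index = parts[0]
    endpoint = parts[1] if len(parts) > 1 else ""
    # Requirement 3: reads are allowed only from networks listed for that index.
    if method in ("GET", "HEAD") or endpoint in ("_search", "_count"):
        return _in_any(client_ip, READ_POLICY.get(index, []))
    # Requirements 1 and 2: create-only writes, and only from the site's own network.
    is_create = (method == "POST" and endpoint == "_doc" and len(parts) == 2) or (
        method in ("PUT", "POST") and endpoint == "_create"
    )
    if is_create:
        return _in_any(client_ip, WRITE_POLICY.get(index, []))
    return False  # everything else (_update, _bulk, settings changes, ...) is denied
```

Denying by default keeps unknown endpoints closed; each new operation has to be allowed explicitly.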
