I have an Elasticsearch index that logs my scraper statistics, such as response status and the headers used. How can I do something like machine learning to generate a guess about which combination of headers would succeed best in future scrapes? Is this possible with plain Elasticsearch, and if not, what plugins would you suggest?
From what I found out, the ELK stack only provides machine learning functionality through Kibana's X-Pack extension, e.g. anomaly detection and forecasts. For me that's not enough, because my model needs advanced data filtering and I want to visualize all my predictions on a dashboard. If you want to make custom predictions, the only way is to write your own prediction script or use an out-of-the-box ML solution such as Amazon Machine Learning.
You can treat Elasticsearch as an ordinary NoSQL database: periodically extract raw data from it using REST requests and feed it to your ML script or ML web service. Then you can save the predictions back to Elasticsearch as a new index, which can later be visualized in Kibana.
              HTTP GET                                             HTTP PUT
Elasticsearch =========> Script (filtering and predictions) ==========> Elasticsearch
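To make that loop concrete, here is a minimal sketch of such a script in Java (11+ for HttpClient, 15+ for text blocks). The index names scraper-stats and header-predictions, the fields headers and status, and the success criterion (status == 200) are all assumptions for illustration, not anything Elasticsearch prescribes:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Instant;

// Sketch of the diagram above: pull per-header success statistics out of Elasticsearch
// and write a "prediction" document back into a separate index.
public class HeaderPredictionJob {

    private static final HttpClient HTTP = HttpClient.newHttpClient();
    private static final String ES = "http://localhost:9200";

    public static void main(String[] args) throws Exception {
        // 1. HTTP GET side: aggregate the success rate per header combination in ES.
        //    Assumes a keyword sub-field on "headers" and a numeric "status" field.
        String query = """
            {
              "size": 0,
              "aggs": {
                "by_headers": {
                  "terms": { "field": "headers.keyword", "size": 50 },
                  "aggs": {
                    "success_rate": {
                      "avg": { "script": { "source": "doc['status'].value == 200 ? 1 : 0" } }
                    }
                  }
                }
              }
            }
            """;
        HttpRequest search = HttpRequest.newBuilder()
                .uri(URI.create(ES + "/scraper-stats/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();
        String stats = HTTP.send(search, HttpResponse.BodyHandlers.ofString()).body();

        // 2. Filtering and predictions: a real job would parse `stats` and apply a model;
        //    here the raw aggregation result is stored as the "prediction".
        String prediction = """
            { "generated_at": "%s", "model": "success-rate-v0", "raw_aggregations": %s }
            """.formatted(Instant.now(), stats);

        // 3. Write-back side: index the prediction into its own index for Kibana dashboards.
        HttpRequest index = HttpRequest.newBuilder()
                .uri(URI.create(ES + "/header-predictions/_doc"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(prediction))
                .build();
        System.out.println(HTTP.send(index, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

A real job would parse the aggregation response and score each header combination with an actual model; storing the raw per-combination success rates is just the simplest thing that can already be charted in Kibana.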
I'm still looking for the best solution for producing predictions, but for now a custom script seems like the only option, and I'm currently developing one.
I'm building a simple web application that will list/search retail items for sale.
The design is like this:
MySQL database -> Elasticsearch deployment -> Spring Boot REST service -> Web UI (JSP/Bootstrap or Angular)
I am planning to write Java client code to read the database and post records to Elasticsearch for indexing.
From googling, it looks like Logstash is used for this sort of thing. I'm not familiar with Logstash, but I am very familiar with Java.
QUESTION: Is a Java client considered a "deprecated" or "legacy" way to submit data to Elasticsearch for indexing?
Given that I'm very familiar with Java, should I use Java or Logstash?
Adding to @chris's answer: Logstash will add complexity and another piece of infrastructure to maintain in your stack, and Logstash is known for getting stuck and is not as resilient as Elasticsearch.
You are already using Java for your application code, and by the way, Elasticsearch now officially has a Java client known as the Java High Level REST Client (JHLRC), which is very popular and provides an exhaustive set of APIs for indexing, searching, and building a modern search system.
IMHO you should use the JHLRC (a minimal indexing sketch follows after this list), which will spare you the pain points of Logstash:
you don't have to learn another tool
simpler infrastructure
simpler deployment
and, last but not least, a simpler and easier-to-maintain codebase.
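For reference, indexing a single document with the high-level client looks roughly like this. It is a minimal sketch assuming an Elasticsearch 7.x cluster on localhost and the elasticsearch-rest-high-level-client dependency; the index name retail-items and the document body are placeholders:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class ItemIndexer {
    public static void main(String[] args) throws Exception {
        // Connect to a local Elasticsearch node.
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // One document per retail item read from MySQL; index name and JSON shape
            // are placeholders, not part of the original question.
            IndexRequest request = new IndexRequest("retail-items")
                    .id("sku-1001")
                    .source("{\"name\":\"blue t-shirt\",\"price\":9.99,\"inStock\":true}",
                            XContentType.JSON);

            IndexResponse response = client.index(request, RequestOptions.DEFAULT);
            System.out.println(response.getResult()); // CREATED or UPDATED
        }
    }
}
```

For a full table load from MySQL you would normally batch documents into a BulkRequest rather than indexing them one at a time.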
Logstash is a good tool for migrating data from many sources to Elasticsearch, and it is itself built in Java.
You can use Logstash. It also has options to mutate or filter the data. It's a ready-to-use tool that will save you a lot of development time and effort.
But if you require a lot of customisation and need a lot of control over your data before pushing it to Elasticsearch, then you can build your own application for that.
Coming back to your question: Java is not deprecated for indexing data into Elasticsearch. It is still a preferred option.
I am trying to create a local index for my notes, which consist mainly of Markdown files, text files, and code in Python, JavaScript, and Dart.
I came across Solr and Elasticsearch.
But the main differences I found are focused on online use and distributed deployment.
Which would be the better choice if I need good integration with JavaScript through Electron?
Keep in mind that the files are in local storage, and the focus is not on distribution but on integration with a JavaScript frontend and efficiency on a local system.
Elasticsearch is more popular among newer developers due to its ease of use. But if you are already used to working with Solr, stay with it because there is no specific advantage of migrating to Elasticsearch.
I believe for your use case either of them would work.
However, if you need it to handle analytical queries in addition to searching text, Elasticsearch is the better choice.
In terms of popularity, community size, and documentation, I would say Elasticsearch is the winner; you can compare the two on Google Trends.
You can use Solr along with Apache Tika.
Apache Tika helps in extracting the content/text of many different file formats.
Using these, you can index both the metadata and the content of the files into Apache Solr.
You also get Solr's admin UI for analysing the index and its fields, to determine whether you are able to achieve the desired result.
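As a rough illustration of that setup, here is a SolrJ sketch that sends one note file to Solr's extracting request handler (the Tika-backed /update/extract endpoint). It assumes a local core named notes with the extraction contrib enabled; the file name and field mappings are placeholders:

```java
import java.io.File;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class NoteIndexer {
    public static void main(String[] args) throws Exception {
        // Local single-node Solr core called "notes" (an assumption for this sketch).
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/notes").build();

        // Send one file to the extracting handler; Tika parses it and Solr indexes
        // the extracted text plus whatever metadata it finds.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("todo.md"), "text/markdown");
        req.setParam("literal.id", "todo.md");        // use the file name as the document id
        req.setParam("uprefix", "attr_");             // prefix for metadata fields not in the schema
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        solr.request(req);
        solr.close();
    }
}
```

From an Electron frontend you could also skip SolrJ entirely and POST the same files to /update/extract over plain HTTP, since Solr (like Elasticsearch) exposes everything as an HTTP API.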
In v1.0 of a .NET data crawler, I created a Windows service that would read URLs from a database and, based on some logic, select what to crawl at a specified interval.
This was single-threaded and worked well for a low number of endpoints, but scaling is obviously an issue.
I'm trying to find out how to do this using the Elasticsearch (ELK) stack and came across HTTPBeat, "a Beat to poll HTTP endpoints in a regular interval and ship the result to the configured output channel, e.g. Logstash and Elasticsearch."
Looking at the documentation, you have to add URLs to the config.yaml file. That's not what I'm looking for, as the list of URLs could change and we may not want all URLs crawled at the same time.
Then there's RSS for Logstash, which is a command-line tool - again, not what I'm looking for.
Is there a way to make use of the Beats daemon to read from the Elasticsearch database and do work based on database values, such as crawls, etc.?
To take this to the enterprise level, do Beats or any other components of the Elasticsearch ecosystem use message queuing or a spooler (like Filebeat does; is this built into Beats?)?
I have recently installed Kibana 4, but I am beginning to understand that dashboards are designed differently than in Kibana 3, i.e., multiple individually designed visualizations are embedded into each dashboard. I already have a lot of dashboards designed in Kibana 3, so I would like to know if there is a way to load them into Kibana 4 instead of creating everything from scratch.
As far as I know, there is no way to do that. It's not just the formats; the queries sent to the ES backend are also quite different. Kibana 3 relied heavily on facets for segmentation, which is a deprecated feature that Kibana 4 got rid of.
We are using Elasticsearch as the back end for our in-house logging and monitoring system. We have multiple sites pouring data into one ES cluster, but into different indices, e.g. abc-us holds data from the US site and abc-india holds data from the India site.
Our concern is that we need some security checks before data is pushed into the cluster:
data coming into an index must come from the right IP address
an incoming JSON request must only insert new data, not delete/update existing data
when reading, certain IPs should not be able to read data from other indices.
Kindly let me know if it's possible to achieve this with Elasticsearch.
The elasticsearch-jetty plugin brings the full power of Jetty to Elasticsearch and adds several new features. With this plugin, Elasticsearch can handle SSL connections, support basic authentication, and log all or some incoming requests in plain-text or JSON format.
The idea is to add a Jetty wrapper to ElasticSearch, as a plugin.
What remains is only to restrict certain URLs and some methods (e.g. DELETE) to certain users.
You can find elasticsearch-jetty on GitHub, with a detailed specification of its usage, configuration and, of course, limitations.
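Once basic authentication is enforced in front of the cluster, every client has to present credentials; the per-index read restriction itself still has to be configured on the Jetty/plugin side. Here is a minimal sketch of what a restricted read then looks like from a Java client; the host, credentials, and index name are illustrative only:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class AuthenticatedSearch {
    public static void main(String[] args) throws Exception {
        // Illustrative credentials; the real ones come from the Jetty realm configuration.
        String credentials = Base64.getEncoder()
                .encodeToString("us-reader:secret".getBytes());

        // Query only the abc-us index. The server-side configuration is what must deny
        // this user access to abc-india; the client merely authenticates itself.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://es.example.internal:9200/abc-us/_search?q=*"))
                .header("Authorization", "Basic " + credentials)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```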