We are starting with Elasticsearch for full-text search and haven't written any code yet. We are familiar with both Java and Python. Could someone suggest which one is preferable for building an ES application - Java or Python?
Language is not really the issue here, especially when you are choosing between Java and Python. Both have official clients, and if something is not implemented there you can always talk to Elasticsearch directly over its REST API. Elasticsearch itself is written in Java, so the Java clients tend to pick up new features a bit sooner, but that is not a hard rule.
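For example, indexing a document is just an HTTP call that either language can make directly; the official Java and Python clients essentially wrap requests like this and add connection handling and helpers on top. A minimal Java sketch (assuming an unsecured local node on port 9200 and a made-up index called "articles"):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RawRestExample {
    public static void main(String[] args) throws Exception {
        // One document as JSON; in a real app you would build this with a JSON library.
        String json = "{\"title\": \"hello\", \"body\": \"full text search\"}";

        // POST it to the _doc endpoint of a (hypothetical) "articles" index.
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/articles/_doc"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```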
I'm building a simple web application that will list/search retail items for sale.
The design is like this:
MySQL database -> Elasticsearch deployment -> Spring Boot REST service -> Web UI (JSP/Bootstrap or Angular)
I am planning to write Java client code to read the database and post records to Elasticsearch for indexing.
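Roughly, I'm picturing something like the sketch below (table, column, and index names are just placeholders, and a real version would flush the bulk request in batches):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class ItemIndexer {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient es = new RestHighLevelClient(
                     RestClient.builder(new HttpHost("localhost", 9200, "http")));
             Connection db = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/shop", "user", "password");
             Statement stmt = db.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name, description, price FROM items")) {

            // Collect one IndexRequest per row and send them in a single bulk call.
            BulkRequest bulk = new BulkRequest();
            while (rs.next()) {
                Map<String, Object> doc = new HashMap<>();
                doc.put("name", rs.getString("name"));
                doc.put("description", rs.getString("description"));
                doc.put("price", rs.getBigDecimal("price"));
                bulk.add(new IndexRequest("items").id(rs.getString("id")).source(doc));
            }
            if (bulk.numberOfActions() > 0) {
                es.bulk(bulk, RequestOptions.DEFAULT);
            }
        }
    }
}
```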
Googling around, it looks like Logstash is used for this sort of thing. I'm not familiar with Logstash, but I am very familiar with Java.
QUESTION: Is a Java client considered a "deprecated" or "legacy" way to submit data to Elasticsearch for indexing?
Since I'm very familiar with Java, should I use Java or Logstash?
Adding to @chris's answer: Logstash will add complexity and another piece of infrastructure to maintain in your stack, and Logstash is known for getting stuck and is not as resilient as Elasticsearch.
You are already using Java for your application code, and by the way Elasticsearch now has an official Java client known as the Java High Level REST Client (JHLRC), which is very popular and provides an exhaustive set of APIs for indexing, searching, and building a modern search system.
IMHO you should use the JHLRC, which will spare you the pain points of Logstash:
you don't have to learn another tool
simple infrastructure
simple deployment
last but not least, a simple and easy-to-maintain codebase.
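For example, a basic full-text search with the JHLRC looks roughly like the sketch below (the "items" index and "description" field are made up, and it assumes the elasticsearch-rest-high-level-client dependency plus an unsecured local node):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class ItemSearch {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Full-text match query against the "description" field of the "items" index.
            SearchSourceBuilder source = new SearchSourceBuilder()
                    .query(QueryBuilders.matchQuery("description", "red running shoes"))
                    .size(10);
            SearchRequest request = new SearchRequest("items").source(source);

            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            for (SearchHit hit : response.getHits().getHits()) {
                System.out.println(hit.getId() + " -> " + hit.getSourceAsString());
            }
        }
    }
}
```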
Logstash is a good tool for migrating data from many sources into Elasticsearch. It is itself built in Java.
You can use Logstash. It also has options to mutate or filter the data. It's a ready-to-use tool that will save a lot of your development time and effort.
But if you need a lot of customisation and a lot of control over your data before pushing it to Elasticsearch, then you can build your own application for that.
Coming back to your question: Java is not deprecated for indexing data into Elasticsearch. It is still a preferred option.
I am trying to create a local index for my notes, which consist mainly of Markdown files, text files, and code in Python, JavaScript, and Dart.
I came across Solr and Elasticsearch.
But the main differences seem to be focused on online use and distributed deployment.
Which would be the better choice if I need good integration with JavaScript through Electron?
Keep in mind that the files are on local storage, and the focus is not on distribution but on integration with a JavaScript frontend and efficiency on a local system.
Elasticsearch is more popular among newer developers due to its ease of use. But if you are already used to working with Solr, stay with it because there is no specific advantage of migrating to Elasticsearch.
I believe for your use case either of them would work.
However, if you need it to handle analytical queries in addition to searching text, Elasticsearch is the better choice.
In terms of popularity, community size, and documentation, I would say Elasticsearch is the winner; you can compare the two on Google Trends.
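To give a rough idea of what "analytical queries" means here: Elasticsearch can combine a full-text query with aggregations in one request. The sketch below uses the Java high-level REST client purely for illustration (the "notes" index, the "content" and "file_type" fields, and the query text are all made up, and "file_type" would need to be a keyword field); the same request can just as well be sent as plain JSON over REST from JavaScript.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class NotesByType {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Count notes per file type that mention "todo": a search plus an aggregation in one request.
            SearchSourceBuilder source = new SearchSourceBuilder()
                    .query(QueryBuilders.matchQuery("content", "todo"))
                    .aggregation(AggregationBuilders.terms("by_type").field("file_type"))
                    .size(0);

            SearchResponse response = client.search(
                    new SearchRequest("notes").source(source), RequestOptions.DEFAULT);
            Terms byType = response.getAggregations().get("by_type");
            for (Terms.Bucket bucket : byType.getBuckets()) {
                System.out.println(bucket.getKeyAsString() + ": " + bucket.getDocCount());
            }
        }
    }
}
```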
You can use Solr along with Apache Tika.
Apache Tika helps extract the content/text from many different file formats.
Using it, you can index both the metadata and the content of the files into Apache Solr.
You also get the Solr admin UI for analysing the index and its fields, to determine whether you are achieving the desired results.
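A rough sketch of that approach with SolrJ and Tika (the "notes" core, the field names, and the directory path are placeholders; your Solr schema would need matching fields, and you would add the solr-solrj, tika-core, and tika-parsers dependencies):

```java
import java.io.File;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.Tika;

public class NotesIndexer {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/notes").build()) {

            File[] files = new File("/path/to/notes").listFiles();
            if (files == null) {
                return;
            }
            for (File file : files) {
                if (!file.isFile()) {
                    continue;
                }
                // Tika detects the file type and extracts plain text from it.
                String text = tika.parseToString(file);

                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", file.getAbsolutePath());
                doc.addField("filename", file.getName());
                doc.addField("content", text);
                solr.add(doc);
            }
            solr.commit();
        }
    }
}
```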
I'm trying to get some suggestions as I set up my data system. I'd like to set up a system for web crawling. It'll probably crawl a few hundred to a few thousand sites on a regular basis.
I'm aware of Nutch and have used it; however, I'd like to know if others know of a better crawler than Nutch.
I'm also using Elasticsearch as the indexer, and it's quite hard to get Nutch to work with newer versions of ES.
You can take a look at StormCrawler: it is based on Apache Storm and is not only a full-featured crawler but also focuses on near-real-time crawling. Its Elasticsearch module is usually kept very up to date; at the moment of writing it supports ES v6.1.1 (https://github.com/DigitalPebble/storm-crawler/blob/master/external/elasticsearch/pom.xml#L20), so this could work for you. Keep in mind that it is a different approach and technology than Nutch, although it reuses some of the ideas behind Apache Nutch.
Also, https://github.com/BruceDone/awesome-crawler has a list of many crawlers written in many different languages.
Can we monitor the Elastic Stack 6.0 and above (Elasticsearch, etc.) without using X-Pack? As we know, many of the features like security, machine learning, and the graph APIs are not supported under the BASIC (free) licence.
So I want to know whether there are any APIs, free of licence limitations, that can be used to implement the functionality mentioned above.
All the information should be in the cluster APIs; you'll just lack the visualizations.
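For example, cluster health, cluster stats, node stats, and the _cat APIs are available without any licence; a minimal sketch that polls a few of them (assuming an unsecured node on localhost:9200):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClusterCheck {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        // A few of the stock monitoring endpoints; none of them require a licence.
        String[] endpoints = {
                "/_cluster/health",
                "/_cluster/stats",
                "/_nodes/stats/jvm,os",
                "/_cat/indices?v"
        };
        for (String endpoint : endpoints) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9200" + endpoint))
                    .GET()
                    .build();
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(endpoint + "\n" + response.body() + "\n");
        }
    }
}
```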
Monitoring (of the local cluster) is actually included in X-Pack Basic unlike the other features. Any reason you don't want to use it?
Alternatives include Kopf, Cerebro,... though you'll need to run them as a separate process and watch out for version compatibilities.
We've had success with ElasticHQ for monitoring (it requires Python):
https://github.com/ElasticHQ/elasticsearch-HQ
And Sentinl for setting up alerts/watchers (it is a plugin for Kibana):
https://github.com/sirensolutions/sentinl/wiki
We have set up a reverse proxy to enable SSL/TLS and use Ubuntu user management to create logins; however, we do not limit access within Kibana itself.
We have little need for graph/machine learning, so I am not aware of free alternatives for those.
The company I work for is heavily Open Source, so these projects suit us.
I'm exploring building an Electron desktop app that would be powered by Elasticsearch running on the client. Is this possible, and if so, how can it be done?
Depending on interpretation, there are two possible answers to your question:
If you want to use Elasticsearch itself (i.e., talk to an Elasticsearch server) from within Electron, try elasticsearch.js.
If you want to implement local offline search within a client, try using either lunr.js, or its weird but loveable cousin, elasticlunr.