ElasticSearch - Index a large file using Java API - elasticsearch

We have a requirement to use Elasticsearch for full-text search. Our application is Spring-based, and for the integration with ES we can use either the Elasticsearch Java API or Spring Data Elasticsearch.
The input will be a file of around 5 MB.
I went through the examples for both the ES Java API and Spring Data; they have tutorials for inserting a JSON document.
But I could not find any help on using a file as the input to create documents in an index.
I am new to Elasticsearch, so any guidance on this would be much appreciated.
EDIT:
I can see that there is an Ingest Attachment Processor plugin available in ES (https://www.elastic.co/guide/en/elasticsearch/plugins/master/ingest-attachment.html).
Can anybody point me to a sample curl request, or any Java code, that uses this plugin?

1. You may use the Elasticsearch mapper attachments plugin. This plugin uses Apache Tika to ingest almost any well-known document type and make it searchable by Elasticsearch. Note that the link below is for ES 2.3; the plugin was deprecated in ES 5.0 in favour of the ingest attachment plugin.
https://www.elastic.co/guide/en/elasticsearch/plugins/2.3/mapper-attachments.html
2. You can use Apache Tika yourself to extract the useful content from the file, then use the Elasticsearch Bulk API to index it into ES.
Hope that helps.
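For the ingest attachment plugin mentioned in the question's edit, the request has two parts: a pipeline containing an attachment processor, and a document whose source field holds the file as a Base64 string. Below is a minimal sketch, using only the JDK, that builds both JSON bodies. The class name, the field name "data", the pipeline id "attachment" and the index name "my_index" are assumptions for illustration (they mirror the example in the plugin documentation), and the bodies still have to be sent with whichever HTTP or Elasticsearch client you use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the JSON bodies for the ingest attachment plugin.
public class AttachmentRequestSketch {

    // Body for: PUT _ingest/pipeline/attachment
    // Declares a pipeline with one attachment processor reading from sourceField.
    public static String buildPipelineBody(String sourceField) {
        return "{\"description\":\"Extract attachment information\","
             + "\"processors\":[{\"attachment\":{\"field\":\"" + sourceField + "\"}}]}";
    }

    // Body for: PUT my_index/_doc/1?pipeline=attachment
    // The file content must be Base64-encoded; Base64 output needs no JSON escaping.
    public static String buildDocumentBody(String sourceField, byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"" + sourceField + "\":\"" + encoded + "\"}";
    }

    public static void main(String[] args) {
        // In a real application the bytes would come from the file,
        // e.g. java.nio.file.Files.readAllBytes(path).
        byte[] fileBytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(buildPipelineBody("data"));
        System.out.println(buildDocumentBody("data", fileBytes));
    }
}
```

With the processor's default settings, the extracted text is stored under attachment.content, which is the field to run full-text queries against. A 5 MB file grows by roughly a third when Base64-encoded, so check that the request stays under the cluster's HTTP request size limit.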

Related

hapi fhir elastic search how to configure

I am using HAPI FHIR v5.1.0 with the JPA server (hapi-fhir-jpa-server-starter). According to the description, this version includes the Elasticsearch library for text search.
How can I configure the Elasticsearch server here?
I see some entries in the properties file and have configured the Elasticsearch REST URL, but nothing works and I always get the following error: HSEARCH000222 - the search factory was not initialized.
Could someone please let me know the configuration steps required to activate Elasticsearch inside the HAPI FHIR JPA server?

Most useful plugins for ElasticSearch

What are some of the most widely used Elasticsearch plugins?
For example, plugins for monitoring data, mapping, or analysis.
The OP didn't mention which ES version is in use. I am suggesting the plugins below because they are easy to set up, free, and provide an admin interface for an Elasticsearch cluster.
For ES versions below 2.x, I would recommend the kopf plugin; for the latest versions of ES, use Cerebro, which is by the same author as kopf.
It offers an easy way of performing common tasks on an elasticsearch cluster. Not every single API is covered by this plugin, but it does offer a REST client which allows you to explore the full potential of the ElasticSearch API.

What is the best way to use Spring and ElasticSearch?

I have to implement an application using the Spring Framework.
All I have to do is select from a repository (no RDBMS; maybe Lucene or the Elasticsearch core) and display some view pages for customers. There is no save or update, only read.
What is the best way to query repositories in the Spring Framework?
You can use spring-data-elasticsearch which is the Spring Data implementation for ElasticSearch.
To get started, you may like to refer to https://www.mkyong.com/spring-boot/spring-boot-spring-data-elasticsearch-example/ which explains the integration with an example. Although it is a bit old, it provides enough information to get it working.

Couchbase plugin for ElasticSearch deprecated?

I was reading https://www.elastic.co/blog/deprecating-rivers which states that ES rivers (plugins) are being deprecated, i.e. any plugin integrated directly into the Elasticsearch server will no longer work from ES 3.x onwards.
The Couchbase plugin is one of that kind.
I searched all the documentation for the Couchbase plugin at http://developer.couchbase.com/documentation/server/4.5/connectors/elasticsearch-2.1/elastic-intro.html but could not find out whether it uses the deprecated approach or not.
Does anyone know? Should we keep using the Couchbase plugin, or should we start planning to write data directly to ES from our application?
We have couchbase data getting replicated to ES using couchbase plugin and XDCR.
I'm the maintainer of the Couchbase ES transport plugin. As Roi mentions in his answer, the plugin doesn't use rivers, so it won't be deprecated. It currently supports any version of ES from 1.3 to 2.x, and I'm working on adding support for 5.x. It's taking a bit longer, because ES 5.x broke some configuration sharing features in unexpected ways.
I'd suggest always looking at our GitHub repo for the latest plugin releases:
https://github.com/couchbaselabs/elasticsearch-transport-couchbase
The Couchbase plugin does not use rivers; there is a separate river plugin, which is no longer valid.
Take a look here: https://github.com/couchbaselabs/elasticsearch-transport-couchbase

How to combine neo4j and elasticsearch

I am developing a question answering application, and for that I need to use Neo4j and Elasticsearch in the same Maven project. I am using Elasticsearch to make my application more robust.
Neo4j and Elasticsearch depend on different versions of Lucene, so whichever version I include as a dependency, I get an error.
Here is what I am doing:
First, Elasticsearch will index the data, and the data and relationships will be stored as a graph database using Neo4j. Then the user will enter a query, through which the data will be retrieved with the help of the indexes. This data will be triggered in the graph database using a trigger score, which will then be propagated along the graph to find results relevant to the user's query.
Is there any way to integrate Neo4j and Elasticsearch in the same Maven project, or is there any other way for these two modules to interact separately?
Thanks
Please check out our integration page:
http://neo4j.com/developer/elastic-search/
Which has some discussion and also an example project to get you started.
http://github.com/neo4j-contrib/neo4j-elasticsearch
