Elasticsearch error while sorting and paginating

I am using the Java API for Elasticsearch and realized that it was not applying sorting and pagination correctly. My logic works, but only because I currently retrieve all the indexed documents, apply the ordering logic, and then paginate manually. This is a problem because resource consumption will be high under heavy demand.
My question is: is there any way to tell the API that I want it to sort first and then apply the pagination logic? (I have seen how to do the pagination part, but it is not being applied correctly.)
I am trying to use the Java API for Elasticsearch to apply the sorting and filtering correctly. I expect the API to do this rather than implementing it all myself.
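A minimal sketch, assuming the search body is sent to the _search endpoint by whatever Java client or HTTP library you use. Elasticsearch applies the "sort" clause to the whole matching result set and only then skips "from" hits and returns "size" hits, so both steps happen on the server. The class and method names and the created_at field are hypothetical.

```java
import java.util.Locale;

// Sketch: build the JSON body of a _search request that asks Elasticsearch
// to sort all matches first and then return a single page of them.
final class SortedPageRequest {

    // Hypothetical helper: converts a zero-based page number into the "from" offset.
    static int fromOffset(int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        return page * pageSize;
    }

    // Produces {"query":{"match_all":{}},"sort":[{field:{"order":...}}],"from":...,"size":...}.
    // Assumes sortField is a sortable field (keyword, numeric or date) and needs no JSON escaping.
    static String body(String sortField, boolean ascending, int page, int pageSize) {
        String order = ascending ? "asc" : "desc";
        return String.format(Locale.ROOT,
            "{\"query\":{\"match_all\":{}},"
                + "\"sort\":[{\"%s\":{\"order\":\"%s\"}}],"
                + "\"from\":%d,\"size\":%d}",
            sortField, order, fromOffset(page, pageSize), pageSize);
    }

    public static void main(String[] args) {
        // Third page (page index 2) of 20 hits, newest first.
        System.out.println(body("created_at", false, 2, 20));
    }
}
```

With the (now deprecated) Java High Level REST Client, the same request is built with SearchSourceBuilder, e.g. `.sort("created_at", SortOrder.DESC).from(40).size(20)`. Keep in mind that from + size is capped by index.max_result_window (10,000 by default), so search_after is the usual choice for deep pagination. Sorting on an analyzed text field also typically fails, so sort on a keyword, numeric or date field instead.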

Related

Using Elasticsearch for a UI dashboard behind a proxy

I am working on a search dashboard with full-text search capabilities, backed by ES. The search will initially be consumed by a UI dashboard. I am planning to have an application web service (WS) API layer between the UI dashboard and ES, which will route business searches to ES.
There can be multiple clients of the WS going forward, each with its own business use cases and complex data requirements (basically response fields). There are many entities and a huge number of fields across them. Each client would need to specify which entities it wants returned, and with which fields.
To support these dynamically changing requirements, one approach could be to make the WS a pass-through to ES (with pre-validation such as access control, and post-transformation of the response from ES). The WS APIs would look exactly like the ES APIs: the UI would build ES queries through the JS client and send them to the WS, which, after access control, would get the data from ES.
I am new to ES and skeptical of this approach. Are there any particular challenges with it? One of my colleagues has worked with ES before, but always with a backend Java client, so he's not too sure.
I looked up an ES JS client, and there's an official one here.
Some context:
We have around 4 different entities (this may increase in the future) with both full-text and keyword fields. A typical search could have multiple filters and search terms and would want to specify the result fields. Also, some searches would span entities and some would target individual ones. We are maintaining a separate index for each entity.
What I understand from your post is that, at a high level, this is what you want to achieve:
"There can be multiple clients to WS going forward, each with its own business use cases, and complex data requirements (basically response fields)"
Because you are not sure how to do this, you are thinking of building Elasticsearch queries in JavaScript in your front end only. I am not a big fan of this approach, as it exposes how you build your queries, and a hacker who learns crucial information like the following could bring your entire ES cluster to its knees:
What types of wildcard queries you allow.
Your index names and ES cluster details (even if you have access control, you are still exposing crucial information).
How you build your search queries.
These are just a few examples; I will add more info.
Right approach
Since you already have a backend where you check access, build the Elasticsearch queries there; you also have the advantage of a teammate who knows how.
To build complex response fields, you can use source filtering, which lets you specify in your search request which fields should be returned in the search results.
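A minimal sketch of how the backend WS could combine its access check with source filtering: each client has an allow-list of fields (the map, client names and field names are hypothetical), and only requested fields that are on that list end up in the "_source" clause of the query the backend sends to ES.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class SourceFilter {

    // Hypothetical per-client allow-list of response fields, enforced in the backend.
    static final Map<String, Set<String>> ALLOWED_FIELDS = Map.of(
        "dashboard", Set.of("name", "status", "created_at"),
        "reporting", Set.of("name", "amount"));

    // Returns the "_source" fragment of a search body, keeping only the requested
    // fields this client may see, in the order they were requested.
    static String sourceClause(String client, List<String> requested) {
        Set<String> allowed = ALLOWED_FIELDS.getOrDefault(client, Set.of());
        List<String> kept = requested.stream()
            .filter(allowed::contains)
            .distinct()
            .collect(Collectors.toList());
        if (kept.isEmpty()) {
            // To be safe, disable _source entirely rather than send an empty includes list.
            return "\"_source\":false";
        }
        String includes = kept.stream()
            .map(f -> "\"" + f + "\"")
            .collect(Collectors.joining(","));
        return "\"_source\":{\"includes\":[" + includes + "]}";
    }

    public static void main(String[] args) {
        // "ssn" is dropped because the dashboard client is not allowed to read it.
        System.out.println(sourceClause("dashboard", List.of("name", "ssn", "status")));
    }
}
```

Because the UI only names the fields it wants and never sees the query itself, index names and query shapes stay on the server.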

Elasticsearch: filter results by role

I have a requirement to filter the results of an Elasticsearch query according to the user's role and the combination of permissions within that role.
However, filtering the results after they have been retrieved from ES breaks pagination. Building the pagination in the application layer may not work properly either, because it affects the results when we use faceted search.
So my idea was to include all the authorisation logic in the ES query, or to build a custom plugin to deal with it, but I am unsure which is the better approach, as I am not really an ES expert and I have never written a plugin before.
Is it possible to write a plugin to do this work for us? Is that considered a good approach?
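A minimal sketch of the first idea, putting the authorisation logic in the query itself. It assumes each document carries a keyword field listing the permissions allowed to see it (the field names allowed_permissions and content are hypothetical). The permissions go into a non-scoring bool "filter" clause, so hits, from/size pages and the facet aggregation are all computed over permitted documents only.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PermissionFilteredQuery {

    // Builds a search body whose "filter" clause restricts results to documents whose
    // (hypothetical) allowed_permissions field contains at least one of the user's
    // permissions. Assumes the values need no JSON escaping.
    static String body(String userQuery, List<String> permissions, String facetField,
                       int from, int size) {
        String perms = permissions.stream()
            .map(p -> "\"" + p + "\"")
            .collect(Collectors.joining(","));
        return "{\"from\":" + from + ",\"size\":" + size + ","
            + "\"query\":{\"bool\":{"
            + "\"must\":[{\"match\":{\"content\":\"" + userQuery + "\"}}],"
            + "\"filter\":[{\"terms\":{\"allowed_permissions\":[" + perms + "]}}]}},"
            // Facet counts come from the same filtered result set as the hits.
            + "\"aggs\":{\"facet\":{\"terms\":{\"field\":\"" + facetField + "\"}}}}";
    }

    public static void main(String[] args) {
        System.out.println(body("invoice", List.of("finance:read", "sales:read"), "category", 0, 10));
    }
}
```

Because ES applies the filter before paginating and aggregating, this approach avoids the broken pagination described above without writing a plugin. Elastic's commercial security features also offer document-level security, which may be worth evaluating before building a custom plugin.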

Working with NLP tags in Elasticsearch

I am working on a large, data-oriented search product powered by Elasticsearch. We've built a lot of machine learning functionality on top of this app, but we're currently having some difficulty deciding how to integrate fairly standard NLP-based word tags into our ES index.
We currently have a tagging service that can annotate a word with its type (or types, but one may be enough for now). This function could be abstracted to: type = getWordType(word). I imagine there must be a way to integrate this tagging service into the analysis chain applied at index time, where we might tell the index which type a particular word belongs to. However, this kind of advanced analysis is a bit beyond my Elasticsearch knowledge. Does anyone have pointers on doing this kind of advanced analysis in Elasticsearch?
Thanks!
You might want to take a look at the ingest node functionality introduced in Elasticsearch 5.0. It allows you to preprocess your documents and add fields to the JSON before the document is indexed in Elasticsearch.
I wrote an ingest processor that uses OpenNLP to enrich documents. You could take a look at it and adapt it to your needs (pull requests are also very welcome).
Check it out at https://github.com/spinscale/elasticsearch-ingest-opennlp
This can be achieved in Elasticsearch 6.5 and later with the annotated_text field type: https://www.elastic.co/guide/en/elasticsearch/plugins/6.x/mapper-annotated-text-usage.html
Essentially, somewhat like synonyms, the tags (or named entity IDs, etc.) can exist at the same position as the word you're tagging.
It requires the Mapper Annotated Text plugin to be installed.
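A minimal sketch of feeding the tagging service's output into an annotated_text field. Before indexing, each tagged word is wrapped in the markdown-like [word](annotation) syntax that field type parses, with the annotation value URL-encoded. Here getWordType is modelled as a Function standing in for the question's tagging service (assumed to return null for untagged words), and the text is split naively on spaces, so punctuation attached to a word would prevent a match.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class AnnotatedTextMarkup {

    // Wraps each tagged word as [word](type). The type is URL-encoded so characters
    // such as spaces, '&' or parentheses cannot break the markup.
    static String annotate(String text, Function<String, String> getWordType) {
        return Arrays.stream(text.split(" "))
            .map(word -> {
                String type = getWordType.apply(word);
                return type == null
                    ? word
                    : "[" + word + "](" + URLEncoder.encode(type, StandardCharsets.UTF_8) + ")";
            })
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Toy stand-in for the tagging service.
        Map<String, String> tags = Map.of("Paris", "LOCATION", "Marie", "PERSON");
        System.out.println(annotate("Marie moved to Paris", tags::get));
    }
}
```

The resulting string is indexed into a field mapped as annotated_text, so a term query for the annotation (e.g. PERSON) matches at the same position as the word. If the tagging service returns several types for one word, the annotated_text syntax separates multiple annotations with "&".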

Solr Slice v Page

Is it possible to use Slice via solrTemplate?
Actually, I am struggling to see whether it would even make a difference, because even without Spring there doesn't appear to be any way of telling Solr to exclude its "numFound" (total results) from a query.
And when I use a normal Spring Data Page<..> query and look under the hood, I see only one query issued to Solr, i.e. no extra one for the count. Or is the count simply computed inside Solr in an extra step?
Confused.
Total document count is part of the Solr query. No additional query is required. Therefore, there is no advantage to Slice vs. Page.
The only related concept is exporting a large amount of data, in which case built-in paging becomes slower the deeper into the result set the requested page is. For that, Solr has dedicated export functionality.

When do I query with Searchable and when with Hibernate?

I use Hibernate as well as the Grails Searchable plugin, which is based on Lucene and Compass. I was wondering when I should use which for querying objects from the database.
Is there a rule of thumb for when to use Hibernate and when to use Searchable?
The Searchable plugin is most useful when you need free-form text search throughout your application.
For example, suppose you are working on a banking application and building a portal with a search feature, and you want free-form search over all the key elements such as customer name, SSN, phone number and/or email ID. You would index those elements using Searchable and have the search talk to Searchable to get immediate results. For this to work you would have to index at least those key elements, and the indices would grow as and when you add more key search elements.
On the other hand, Hibernate will help you provide the detailed information if you do not want to index a lot of elements. To extend the example above: once you have searched on an SSN and got a hit, selecting that entry can use Hibernate to fetch the detailed information from the underlying persistence layer.
Inference:
For speedy, high-performance, free-form search, Searchable is an option.
For gathering detailed information after the search, I think Hibernate is the way to go, unless you want to use Searchable for the detail info as well, in which case the indices will grow to gigabytes.
Follow here for Elasticsearch, which might help you understand this.
My point is to keep Elasticsearch/Searchable lightweight and let Hibernate take care of the heavy lifting.
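A minimal sketch of that split, using plain in-memory maps as hypothetical stand-ins: one map plays the lightweight search index (key search terms to customer ID only), the other plays the Hibernate-backed database (ID to full record). A search first resolves IDs from the index and then loads the detail records by primary key, so the index never has to hold the full data.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

final class SearchThenLoad {

    // Full record as it would live in the database (hypothetical fields).
    record Customer(long id, String name, String ssn, String address) {}

    // Stand-in for the search index: only key search terms -> customer id.
    static final Map<String, Long> INDEX = Map.of(
        "alice", 1L, "123-45-6789", 1L, "bob", 2L);

    // Stand-in for Hibernate / the persistence layer: primary key -> full record.
    static final Map<Long, Customer> DATABASE = Map.of(
        1L, new Customer(1L, "Alice", "123-45-6789", "1 Main St"),
        2L, new Customer(2L, "Bob", "987-65-4321", "2 Side St"));

    // Step 1: ask the index for matching ids. Step 2: load details by id.
    static List<Customer> search(List<String> terms) {
        return terms.stream()
            .map(t -> INDEX.get(t.toLowerCase()))
            .filter(Objects::nonNull)
            .distinct()
            .map(DATABASE::get)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(search(List.of("Alice", "123-45-6789")));
    }
}
```

In a real Grails application the second step would be a Hibernate lookup such as a get by ID, and the index would be maintained by the search plugin rather than by hand.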
NOTE
As a side note, I would suggest using the Elasticsearch plugin instead of Searchable. It also has a Groovy API, which is useful. Note that the Elasticsearch plugin currently uses Elasticsearch v0.20.0, while the latest version is v0.90.2, I believe. If required, you can use Elasticsearch directly as a dependency to get the latest features.
