I have a requirement where I need to filter the results of an Elasticsearch query according to the user's role and the combination of permissions within that role.
However, filtering after the results have been retrieved from ES breaks pagination, and rebuilding pagination in the application layer may not work properly, since it would affect the results when we use facet search.
So my idea was to include all the authorisation logic in the ES query itself, or to build a custom plugin to deal with it, but I am unsure which is the better approach, as I am not really an expert in ES and I have never written a plugin before.
Is it possible to write a plugin to do this work for us? Is that considered a good approach?
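What I have in mind for the query-level option is roughly the following sketch (using the official @elastic/elasticsearch Node client; the index name and the allowed_roles permission field are made up for illustration):

```typescript
// Sketch: wrap whatever the user searched for in a bool filter that enforces
// the user's permissions at query time, so paging and facets are computed on
// the already-filtered result set.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// userQuery: the business query; userRoles: roles resolved for the current user.
// 'documents' and 'allowed_roles' are hypothetical names.
async function authorisedSearch(userQuery: object, userRoles: string[], from = 0, size = 10) {
  return client.search({
    index: 'documents',
    body: {
      from,
      size,
      query: {
        bool: {
          must: [userQuery],
          // the filter runs inside ES, so pagination and facet counts stay correct
          filter: [{ terms: { allowed_roles: userRoles } }]
        }
      }
    }
  });
}
```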
Related
I am working on a search dashboard with full-text search capabilities, backed by ES. The search would initially be consumed by a UI dashboard. I am planning to have an application web service (WS) API layer between the UI dashboard and ES, which will route the business searches to ES.
There can be multiple clients of the WS going forward, each with its own business use cases and complex data requirements (basically response fields). There are many entities and a huge number of fields across them. Each client would need to specify which entities it wants returned and with which fields.
To support this dynamically changing requirement, one approach could be to have the WS act as a pass-through to ES (with pre-validations such as access control and post-transformations of the ES response). The WS APIs would look exactly like the ES APIs; the UI would build ES queries through a JS client and send them to the WS, which, after access control, would fetch the data from ES.
I am new to ES and skeptical of this approach. Are there any particular challenges with it? One of my colleagues has worked with ES before, but always with a backend Java client, so he is not too sure either.
I looked up ES JS clients and there is an official one here.
Some context:
We have around 4 different entities (which can increase in the future) with both full-text and keyword-type fields. A typical search could have multiple filters and search terms and would want to specify the result fields. Also, some searches would be across entities and some against individual ones. We are maintaining a separate index for each entity.
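For illustration, this is roughly the kind of request I picture the UI (or the WS) sending for a cross-entity search; the index and field names below are made up:

```typescript
// Sketch of a cross-entity search with one index per entity and
// result-field selection via _source filtering. All names are hypothetical.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function crossEntitySearch(term: string) {
  return client.search({
    index: ['orders', 'customers', 'invoices'],      // one index per entity
    body: {
      query: { simple_query_string: { query: term } },
      _source: ['id', 'name', 'status']              // only the fields the client asked for
    }
  });
}
```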
What I understand from your post is that, at a high level, you want to achieve the following.
There can be multiple clients to WS going forward, each with its own business use cases, and complex data requirements (basically response fields)
And as you are not sure how to do this, you are thinking of building the Elasticsearch queries from JavaScript in your front end only. I am not a big fan of this approach, as it exposes how you build your queries, and if an attacker learns crucial information like the following, they can bring your entire ES cluster to its knees:
What types of wildcard queries you run.
Your index names and ES cluster details (you may have access control, but you are still exposing crucial info).
How you are building your search queries.
The above are just a few examples, and more could be added.
Right approach
As you already have a backend where you will be checking access, build the Elasticsearch queries there; you also have the advantage of a teammate who already knows how to do this.
For building complex response fields, you can use source filtering, which lets you specify in your search request exactly which fields you want returned in your search results.
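A minimal sketch of source filtering from the backend, assuming the official @elastic/elasticsearch Node client; the index name, whitelist, and fields are hypothetical and would come from your own validated client request:

```typescript
// Sketch: the WS validates the requested fields against a whitelist and
// applies them to the search request as _source filtering.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

const ALLOWED_FIELDS = ['title', 'status', 'customer.name'];   // hypothetical whitelist

async function search(term: string, requestedFields: string[]) {
  const fields = requestedFields.filter(f => ALLOWED_FIELDS.includes(f));
  return client.search({
    index: 'entities',                                   // hypothetical index
    body: {
      _source: fields.length ? fields : ALLOWED_FIELDS,  // never return everything blindly
      query: { simple_query_string: { query: term } }
    }
  });
}
```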
I am using Elasticsearch for product filtering. We have complex product-availability logic. I can see two options:
Use Elasticsearch to store only product-specific data and keep the availability logic in the web server: first filter the data in Elasticsearch, then check those results against the availability logic.
Or flatten the data and store it in Elasticsearch, although in that case there will be duplicated data.
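For the flattened option, I imagine indexing documents roughly like this (a sketch with the official @elastic/elasticsearch Node client; the index and fields are made up):

```typescript
// Sketch: availability is denormalised onto each product document so it can
// be filtered directly in Elasticsearch, at the cost of duplicated data.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function indexProduct() {
  await client.index({
    index: 'products',          // hypothetical index
    id: 'sku-123',
    body: {
      name: 'Example product',
      price: 19.99,
      // duplicated/flattened availability data:
      available_regions: ['de', 'fr'],
      in_stock: true
    }
  });
}
```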
My concern is whether it is good practice to call the Elasticsearch endpoint from the browser, since it has no auth system by default and every query and response is visible in the network log. I believe the call should be made from the web server to Elasticsearch, with the front end talking only to the web server and unaware of Elasticsearch's existence.
Any best-practice insight would be helpful.
Simply create an authenticated endpoint in your backend and send the queries to that endpoint. Do make sure there are some enforced limits, such as the ones below (a rough sketch follows the list):
size -- you don't want to let anybody download your whole index, and
aggregation depth -- you don't want anyone to perform summaries on your whole index/indices to get a competitive advantage.
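A minimal sketch of such an endpoint, assuming an Express backend, some requireAuth middleware, and a v7-style official @elastic/elasticsearch Node client; the limits, route, and index name are illustrative, not prescriptive:

```typescript
// Sketch: authenticated search endpoint with enforced limits on page size,
// paging depth, and aggregations before the query is forwarded to ES.
import express from 'express';
import { Client } from '@elastic/elasticsearch';

const app = express();
app.use(express.json());

const es = new Client({ node: 'http://localhost:9200' });
const MAX_SIZE = 50;   // cap the page size so nobody can dump the index

app.post('/api/products/search', /* requireAuth, */ async (req, res) => {
  const { query, size = 10, from = 0 } = req.body;

  if (size > MAX_SIZE || from + size > 10000) {   // 10,000 is ES's default result window
    res.status(400).json({ error: 'size/paging limit exceeded' });
    return;
  }
  if (req.body.aggs || req.body.aggregations) {
    res.status(400).json({ error: 'aggregations are not allowed on this endpoint' });
    return;
  }

  try {
    const result = await es.search({
      index: 'products',                 // hypothetical index
      body: { query, size, from }
    });
    res.json(result.body.hits);          // v7-style client: payload lives under .body
  } catch (err) {
    res.status(502).json({ error: 'search failed' });
  }
});

app.listen(3000);
```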
Regarding the duplicates: I wouldn't worry too much about the storage aspect (many NoSQL approaches will probably have some duplication to facilitate fast queries) but keep in mind that aggregations might yield "wrong" counts and sums. You'd typically perform those aggregations to get, say, the totals in your product categories and you want to make sure they are representative of your warehouse state.
More cannot really be said right now based on the limited information you've provided.
I'm currently designing the architecture of my project, or at least trying to figure out what will be useful in my case.
Simple use case
I will have several thousand profiles in a backend and I need to implement a fast search engine, so Elasticsearch looks perfect for that case. Every time a profile is updated, the index will be updated by an asynchronous task.
My question now is: if I want to implement a cache for the details of a profile, should I stick with Elasticsearch and put that data in my index? Or use Redis and do something like profil_id => data?
I think both sound good; the problem is that whenever a profile is updated, I will have to flush the cache after reindexing in Elasticsearch if I want to see the change in my backend.
So what can I do? Thank you so much!
You should consider using RediSearch. It can provide a solution for your needs, giving you both Redis performance and full-text support.
Elasticsearch and Redis are basically meant to solve two different problems: one does indexing while the other does caching.
Redis is meant to return already-requested data as fast as possible, whereas Elasticsearch is a search and analytics engine. It would perfectly fit a use case where you have to implement a fast search engine, and it will be more performant than an in-memory data structure store or cache such as Redis, assuming your searches are complex and involve aggregations/filters.
The problem comes with profile updates. Since your profile updates are not that frequent, you could actually do partial updates to the ES index rather than reindexing: whenever a person updates their profile, take the change set (the changed data) and do a partial update on that particular document in the ES index. You can see how it's done here: partial update.
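A minimal sketch of such a partial update with the official @elastic/elasticsearch Node client; the index name, id, and fields are hypothetical:

```typescript
// Sketch: send only the changed fields; ES merges them into the stored document
// instead of reindexing the whole profile.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function updateProfile(profileId: string, changedFields: Record<string, unknown>) {
  await client.update({
    index: 'profiles',             // hypothetical index
    id: profileId,
    body: { doc: changedFields }   // partial update: merged into the existing doc
  });
}

// e.g. the asynchronous task triggered by a profile edit:
// await updateProfile('42', { city: 'Paris', headline: 'Senior engineer' });
```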
This particular Stack Overflow answer will also help you: cache vs indexing.
I'm just getting to know elasticsearch and I'm wondering if it suits my case at all:
Consider a system where companies (with multiple employees) can register and administer their clients, and send documents to those clients.
Now, I want to enable companies to search their documents - but ONLY theirs, not the documents of other companies. In other words: how to separate the data of those companies for searches? How can this be implemented with elasticsearch?
Should this separation be handled by Elasticsearch itself? I.e., there would be some mapping between the companies in my system and a related Elasticsearch user.
Or should this be handled by the backend of my system? I.e., the backend somehow decides (how?) to show only the search results for that particular company. There would then be just one user, namely the backend of my system, that accesses and filters the Elasticsearch results. But is this sensible?
I'm sure there is a wealth of information about this out there. Please just give me a hint, because I don't know what to search for. Searches for elasticsearch authentication/authorization, for example, only yield results about who gains access to the search system in general - not about a pattern to solve this separation.
Thanks in advance!
Elasticsearch on its own does not support authorization and authentication; you need to add this via plugins, of which there are two that I know of. Shield is the official solution, which is part of X-Pack, and you need to pay Elastic if you want to use it. SearchGuard is an open-source alternative with enterprise upgrades that you can buy.
Both of these enable you to define fine grained access rights for different users. What you'd probably want to do is give every company an index of their own for their documents and then restrict their user to only be able to read/write that index. Or if you absolutely want all documents in one index, you can add document level restrictions as well, so that everybody queries the same index but only gets results returned for their company. Depending on how many companies you expect to service this might make more sense in order to not have too many indices and shards, but I'd suspect that an index per company would be the best way to go.
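If you go with an index per company, the important part is that the backend derives the index name from the authenticated company rather than trusting anything from the client. A minimal sketch, assuming the official @elastic/elasticsearch Node client and a hypothetical naming scheme and field:

```typescript
// Sketch: the backend picks the index from the authenticated company id,
// so a client can never search another company's documents.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function searchCompanyDocuments(companyId: string, term: string) {
  // companyId comes from the authenticated session, never from user input
  return client.search({
    index: `documents-${companyId}`,           // hypothetical per-company index
    body: {
      query: { match: { content: term } }      // hypothetical text field
    }
  });
}
```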
Without these plugins you would need to resort to something on the HTTP layer, for example an nginx reverse proxy that filters requests based on the index names contained in the URLs, but I'd strongly advise against this: lots of pain lies that way!
I use Hibernate as well as the Grails Searchable plugin, which is based on Lucene and Compass. I was wondering when I should use which for querying objects from the database.
Is there a rule of thumb for when to use Hibernate and when to use Searchable?
The Searchable plugin is highly useful when you need free-form text search throughout your application.
To cite an example: say you are working on a banking application and building a portal with a search feature, and you want free-form search over all the key elements like customer name, SSN, phone number and/or email id. You would index those elements using Searchable and have the search talk to Searchable to get immediate results. For this to happen you would have to index at least those key elements; the indices grow as and when you add more key search elements.
On the other hand, Hibernate will help you provide the detail information if you do not want to index a lot of elements. To extend the above example: once you have searched on SSN and got a hit, on selecting that entry you can use Hibernate to fetch the detail information from the underlying persistence layer.
Inference:
For speedy, high-performance, free-form search, Searchable is the option.
For gathering detailed information after the search, I think Hibernate is the way to go, unless you want to use Searchable for the detail info as well, in which case the size of the indices will run into gigabytes.
Follow the link here on Elasticsearch, which might help you understand.
My point is to keep Elastic/Searchable light and let Hibernate take care of the heavy-lifting part.
NOTE
On a side note, I would suggest using the elasticsearch plugin instead of Searchable. It also has a Groovy API, which is useful. Note that the elasticsearch plugin currently uses Elasticsearch v0.20.0, the latest being v0.90.2 I guess. If required, you can use Elasticsearch directly as a dependency and get the latest features.