Creating a security filter in Elasticsearch

I'm attempting to create a security filter of sorts to exclude certain users from seeing certain documents in elasticsearch. As an example, if a document contains "ABC:123" and "ABC:XYZ", the user must have both of those in their profile to see the document. We are creating this on the fly using mustache templates. My first attempt was along these lines:
"bool": {
"filter": {
"bool": {
"minimum_should_match": 1,
"should": {
"bool": [{
"must_not": {
"prefix": {
"controlSet": "ABC:"
}
}
},{
"must": {
"terms": {
"controlSet": ["ABC:123","ABC:XYZ"]
}
}
}]
}
}
}
}
However, I quickly realized that this will allow a user with one control to view a document that has multiple. A document must have a subset of the controls that the user has to be matched. So if the user has "ABC:XYZ" only they should not be able to see a document that has "ABC:123" even if the document also contains "ABC:XYZ".
Is there a way to accomplish this that I am missing? Currently we enumerate every control in the system and add them to a must_not, but controls change periodically and I'd rather not maintain that listing manually.
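For context, a rough sketch of the enumeration workaround described above; the values ABC:456 and ABC:789 are hypothetical stand-ins for every control the user does not hold:
"bool": {
  "filter": {
    "bool": {
      "must_not": {
        "terms": {
          "controlSet": ["ABC:456", "ABC:789"]
        }
      }
    }
  }
}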

Assuming your document looks like the following:
{
...
"controlSet": ["ABC:123", "ABC:XYZ"],
...
}
and controlSet is a keyword field, the following query should do the trick:
{
"bool": {
"filter": {
"terms": {
"controlSet": ["ABC:123", "ABC:XYZ"]
}
}
}
}
It will match documents whose controlSet contains at least one of ABC:123 or ABC:XYZ.

I may have found a solution...
{
  "bool": {
    "must_not": {
      "regexp": {
        "controlSet": {
          "value": "ABC:~(XYZ|123)",
          "flags": "COMPLEMENT"
        }
      }
    }
  }
}
That should allow documents with either ABC:XYZ or ABC:123 (or both) while excluding documents with ABC:[anything else].
Of course I am worried about the speed of the regexp, but I think the lack of wildcards will make it relatively fast.
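As a complete request it would look something like this (a sketch; the index name is a placeholder):
GET /documents/_search
{
  "query": {
    "bool": {
      "must_not": {
        "regexp": {
          "controlSet": {
            "value": "ABC:~(XYZ|123)",
            "flags": "COMPLEMENT"
          }
        }
      }
    }
  }
}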

Related

Elasticsearch "boost" not working when inside "filter"

I'm trying to boost matches on a certain field over another.
This works fine:
{
"query": {
"bool": {
"should": [
{
"terms": {
"boost": 2,
"mainField": "foo"
}
},
{
"terms": {
"otherField": "foo"
}
}
]
}
}
}
When I look at the documents matched on mainField, I see they have a _score of 2.0, as expected.
But when I wrap this same query in a filter:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"terms": {
"boost": 2,
"mainField": "foo"
}
},
{
"terms": {
"otherField": "foo"
}
}
]
}
}
]
}
}
}
The _score for all documents is 0.0.
The same thing happens for multi_match. By itself (e.g. inside a query) it works fine, but inside a bool + filter it doesn't work.
Can someone explain why this is the case? I need to wrap the query in a filter due to the way my app composes queries.
Some context might also help: I'm trying to return documents that match on either mainField or otherField, but sort the ones matching on mainField first, so I figured boost would be the most appropriate choice here. But let me know if there is a better way.
Queries inside filter are always executed in filter context. A clause in filter context always returns a score of zero and only contributes to filtering documents, not to scoring.
Refer to the documentation on query and filter context to know more about filter context.
This is why you are not getting a _score of 2.0 in the second query, even after applying boost.
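For what it's worth, one possible workaround (a sketch, not taken from the answer above): if the app can wrap the inner bool in must instead of filter, it runs in query context and the boost contributes to _score again:
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "terms": {
                  "boost": 2,
                  "mainField": ["foo"]
                }
              },
              {
                "terms": {
                  "otherField": ["foo"]
                }
              }
            ]
          }
        }
      ]
    }
  }
}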

Create API keys on ElasticSearch with limited search capabilities

I want to create API keys on Elasticsearch via the POST _security/api_key API. I am able to create these, but I want to limit the search capability of the generated key, which I am unable to do.
Essentially, what I want to achieve is this: let's say all my records have a field like "username":"username1" or "username":"username2", i.e. every record has a valid value for the username field.
Now I want to be able to create a key where I specify something like "username":"username2", which then gets appended to every search query made using that key as an AND condition. For example, if this key searches (/index_name/_search) for
{
"query": {
"match": {
"key_zz":"value_aa"
}
}
}
this query would actually be an AND of both of the queries below (i.e. if a record exists with "key_zz":"value_aa" but its username value is not username2, the API would not return that record)
{
"query": {
"match": {
"key_zz":"value_aa"
}
}
}
AND
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "username": "username2"
        }
      }
    }
  }
}
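In other words, the effective query the key should enforce is a single bool combining both conditions, roughly like this (a sketch):
{
  "query": {
    "bool": {
      "must": [
        { "match": { "key_zz": "value_aa" } },
        { "match": { "username": "username2" } }
      ]
    }
  }
}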
I have tried creating the key with a request body like the following:
{
  "name": "key-name",
  "role_descriptors": {
    "role_name": {
      "indices": [
        {
          "names": [
            "index_name"
          ],
          "privileges": [
            "read"
          ],
          "query": {
            "match": {
              "username": "value_custom"
            }
          }
        }
      ]
    }
  }
}
In the query field, I have tried all the following combinations:
"query": {
"and": {
"username": "value_custom"
}
}
"query": {
"bool": {
"must": {
"match": {
"username": "value_custom"
}
}
}
}
"query": {
"bool": {
"must": {
"bool": {
"must": {
"match": {
"username": "value_custom"
}
}
}
}
}
}
But none of the above worked. Also, the mapping type of the username field was originally text, but I have since updated it to keyword.
In a nutshell, what I am trying to achieve is some kind of document-level security. I know Elasticsearch has Document Level Security (https://www.elastic.co/guide/en/elastic-stack-overview/current/document-level-security.html), but in our architecture we want to achieve this using API keys with restricted search capabilities. We are currently using Algolia, where we achieve this with exactly the implementation described above. The Elasticsearch documentation has references for how to limit a role, but not how to limit API keys. I need help achieving this.
Also, I am using Elasticsearch v7.
Some reference links:
https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-role.html
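For reference, the complete create call being attempted looks roughly like this (a sketch; the index name, key name, and role name are placeholders, and the term query reflects the username field now being mapped as keyword):
POST /_security/api_key
{
  "name": "key-name",
  "role_descriptors": {
    "role_name": {
      "indices": [
        {
          "names": ["index_name"],
          "privileges": ["read"],
          "query": {
            "term": {
              "username": "username2"
            }
          }
        }
      ]
    }
  }
}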

Search in every field with a fixed parameter

Perhaps it's a basic question, but I need to search across every indexed field while also requiring a specific fixed value for another field.
How can I do it?
Currently I have a simple call: query("aValue", array_of_models)
I tried many options without success, for example:
query({
"query": {
"bool": {
"query": "aValue",
"filter": {
"term": {
"published": "true"
}
}
}
}
})
I would prefer to avoid specifying the fields to search in, because I use the same search params for different models.
I found a solution; perhaps it's not optimized, but it works:
{
"query": {
"bool": {
"should": [
{
"match": {
"_all": "aValue"
}
}
],
"filter": {
"term": {
"published": true
}
}
}
}
}
Not sure if I understood your intention correctly.
The _all field is enabled by default. So if you have no special mapping, every indexed field value is added as a text string to the _all field.
You can use either the
Query String Query, https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
or the Simple Query String Query, https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
A simple query like the following should work for you.
GET my_index/_search
{
"query": {
"simple_query_string": {
"query": "aValue",
"fields": []
}
}
}
Both query types contain parameters that should cover your use case, IMHO.
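To also keep the fixed published value from the question, the simple_query_string can be combined with the term filter in a bool query, roughly like this (a sketch; fields is left at its default so every indexed field is searched):
GET my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "aValue"
        }
      },
      "filter": {
        "term": {
          "published": true
        }
      }
    }
  }
}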

Hide a single record in Elastic Search on a per user basis

As a logged in user, I want to be able to hide a single record that I never want to see again if I perform the same search. Is this possible with ElasticSearch?
I've read about multitenancy and filters, but I'm not quite sure what a top-level implementation might look like.
One of my ideas is to store a reference to the unwanted record in an RDB and then add those references to a filter query, but I'm not sure what reference to use, since Elasticsearch generates its own IDs, which may not stay the same when a re-index happens.
It depends. If you don't have many users and your documents aren't too big, you can go with a field on the document: add a dismissedBy field and, when a user dismisses a document, write an update to it:
POST test/type1/1/_update
{
"script" : {
"inline": "ctx._source.dismissedBy.add(params.userId)",
"lang": "painless",
"params" : {
"userId" : "1"
}
}
}
And query:
POST /index/documents/_search
{
"query": {
"bool": {
"must_not": {
"term": {
"dismissedBy": 1
}
}
}
}
}
The problem with this approach is that if you re-index the document, all of these settings will be overwritten, so you must keep a copy somewhere else too.
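One detail the update script above glosses over: if dismissedBy does not already exist on the document, the script will fail with a null pointer error. A more defensive variant might look like this (a sketch):
POST test/type1/1/_update
{
  "script": {
    "inline": "if (ctx._source.dismissedBy == null) { ctx._source.dismissedBy = [params.userId] } else if (!ctx._source.dismissedBy.contains(params.userId)) { ctx._source.dismissedBy.add(params.userId) }",
    "lang": "painless",
    "params": {
      "userId": "1"
    }
  }
}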
The other option, if documents are large or you have lots of users, is to go with a parent/child approach.
If the user hits dismiss, you index a child document:
PUT /indexname/dismiss/1?parent=dismissforid
{
"userId": 1
}
Then when you search you do
POST /index/documents/_search
{
"query": {
"bool": {
"must_not": {
"has_child": {
"type": "dissmiss",
"query": {
"term": {
"userId": 1
}
}
}
}
}
}
}
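For this to work, the child type has to be mapped with a _parent pointing at the parent type, roughly like this (a sketch assuming a pre-6.x cluster where multiple mapping types and _parent are still available; the names are placeholders matching the examples above):
PUT /indexname
{
  "mappings": {
    "documents": {},
    "dismiss": {
      "_parent": {
        "type": "documents"
      }
    }
  }
}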

Elasticsearch terms query on array of values

I have data in an Elasticsearch index that looks like this:
{
"title": "cubilia",
"people": [
"Ling Deponte",
"Dana Madin",
"Shameka Woodard",
"Bennie Craddock",
"Sandie Bakker"
]
}
Is there a way for me to search for all the people whose name starts with
"ling" (it should be case insensitive) and get the distinct terms properly cased, i.e. "Ling Deponte", not "ling deponte"?
I am fine with changing the mappings on the index in any way.
Edit: this does what I want, but it is a really bad query:
{
"size": 0,
"aggs": {
"person": {
"filter": {
"bool":{
"should":[
{"regexp":{
"people.raw":"(.* )?[lL][iI][nN][gG].*"
}}
]}
},
"aggs": {
"top-colors": {
"terms": {
"size":10,
"field": "people.raw",
"include":
{
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
}
}
people.raw is not_analyzed
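For reference, people.raw here is a not_analyzed sub-field, mapped roughly like this (a sketch in pre-5.x string-mapping syntax; the index and type names are illustrative):
PUT /test
{
  "mappings": {
    "doc": {
      "properties": {
        "people": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}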
Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.
GET /test/_search
{
"query": {
"match_phrase": {
"people": "Ling"
}
}
}
Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the terms in the text, while match simply looks for any of the terms. Since you are only searching for one term, it makes little difference here.
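For the "starts with ling" requirement specifically, the match_phrase_prefix variant mentioned above would look roughly like this (a sketch):
GET /test/_search
{
  "query": {
    "match_phrase_prefix": {
      "people": "Lin"
    }
  }
}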
The problem is that you cannot limit the document responses to just that name because the search API returns documents. With that said, you can use nested documents and get the desired behavior via inner_hits.
You want to avoid wildcard prefixing whenever possible because it simply does not work at scale. To put it in SQL terms, that's like doing a full table scan; you effectively lose the benefit of the inverted index because it has to be walked entirely to find where the match actually starts.
Combining the two should work pretty well though. Here, I use the query to whittle down results to what you are interested in, then I use your terms aggregation's include pattern to restrict the buckets to matching values.
{
"size": 0,
"query": {
"match_phrase": {
"people": "Ling"
}
},
"aggs": {
"person": {
"terms": {
"size":10,
"field": "people.raw",
"include": {
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
Hi, please find the query below; it may help with your request.
GET skills/skill/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"wildcard": {
"skillNames.raw": "jav*"
}
}
]
}
}
}
}
}
My intention is to find documents where skillNames starts with "jav".
