Query documents with access control filter - elasticsearch

Each document in my Elasticsearch index has two access control lists containing user ids. One is an allow list, the other is a deny list. I am trying to add a filter to a given query that considers these ACLs. I thought I could use a bool query with a must clause for the given query, a filter clause for the allow list, and a must_not clause for the deny list. What I have so far (example for user 1):
{
"bool" : {
"must" : {
[given query]
},
"filter" : [ {
"match" : {
"acl.allow" : {
"query" : "/user/1",
"type" : "boolean"
}
}
}],
"must_not" : [ {
"match" : {
"acl.deny" : {
"query" : "/user/1",
"type" : "boolean"
}
}
}]
}
}
Unfortunately, this query does not return the desired result. It returns objects that have not listed user 1 in their allow list (a behavior I don't understand). Also, it (obviously) ignores objects with empty access control lists (which should be visible to anyone). Any suggestions to fix that?

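I figured it out. First of all, match is not a good fit for this kind of query because its input is analyzed: assuming the default standard analyzer, "/user/1" is split into the tokens "user" and "1", so any document that lists some user in its allow list matches. Switching to term at first left me puzzled, because I got no results at all. A term query looks up the exact, unanalyzed value, and that value only exists in the index if the field is set to not_analyzed. Thus I changed my mapping: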
"acl": {
"properties": {
"allow": {
"type": "string",
"index": "not_analyzed"
},
"deny": {
"type": "string",
"index": "not_analyzed"
}
}
}
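The difference between the two query types can be illustrated with a small sketch. It assumes the field originally used the default standard analyzer; the regular expression below is only a rough stand-in for that analyzer, and all three function names are hypothetical helpers, not part of Elasticsearch or any client library.

```python
import re

def approx_standard_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_any_token(query, field_values):
    """Mimic a match query with the default OR operator: one shared token is enough."""
    query_tokens = set(approx_standard_tokens(query))
    indexed_tokens = set()
    for value in field_values:
        indexed_tokens.update(approx_standard_tokens(value))
    return bool(query_tokens & indexed_tokens)

def term_exact(query, field_values):
    """Mimic a term query on a not_analyzed field: the whole value must be equal."""
    return query in field_values

print(approx_standard_tokens("/user/1"))        # ['user', '1']
print(match_any_token("/user/1", ["/user/2"]))  # True: both share the token "user"
print(term_exact("/user/1", ["/user/2"]))       # False: the full values differ
```

Under these assumptions, a document that only allows "/user/2" still matches the match query for "/user/1", which is the behavior described in the question.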
My second problem, treating objects with empty ACLs as visible to anyone, was solved using exists nested in must_not nested in bool. This is the recommended substitute for the deprecated missing query. My final query looks like this, and it passed all ACL-related tests I could think of.
{
"bool" : {
"must" : {
[given query]
},
"filter" : {
"bool" : {
"should" : [ {
"terms" : {
"acl.allow" : [ "/user/1" ]
}
}, {
"bool" : {
"must_not" : {
"exists" : {
"field" : "acl.allow"
}
}
}
} ]
}
},
"must_not" : {
"terms" : {
"acl.deny" : [ "/user/1" ]
}
}
}
}
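If the query is built in application code, the structure above could be generated per user. The following is a minimal sketch that only builds the request body as a plain dictionary. The names `acl_filtered_query` and `is_visible` are hypothetical, and `is_visible` is just a plain-Python restatement of the intended semantics that can be used to cross-check test documents.

```python
import json

def acl_filtered_query(user_id, given_query):
    """Wrap given_query so that only documents visible to user_id match."""
    return {
        "bool": {
            "must": given_query,
            "filter": {
                "bool": {
                    "should": [
                        # user is explicitly allowed ...
                        {"terms": {"acl.allow": [user_id]}},
                        # ... or the document has no allow list at all
                        {"bool": {"must_not": {"exists": {"field": "acl.allow"}}}},
                    ]
                }
            },
            # an entry in the deny list always wins
            "must_not": {"terms": {"acl.deny": [user_id]}},
        }
    }

def is_visible(doc, user_id):
    """Reference check of the same ACL rules on a plain dict (for tests only)."""
    acl = doc.get("acl", {})
    allow = acl.get("allow") or []
    deny = acl.get("deny") or []
    if user_id in deny:
        return False
    return not allow or user_id in allow

query = acl_filtered_query("/user/1", {"match_all": {}})
print(json.dumps(query, indent=2))
```

Here `{"match_all": {}}` merely stands in for the given query.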

Related

ElasticSearch - Fuzzy search in list elements

I've got some documents stored in ElasticSearch like this:
{
"tag" : ["tag1", "tag2", "tag3"]
...
}
I want to search through the "tag" field. I know that it should work with a query like:
{
"query":
{
"match" : {"tag" : "tag1"}
}
}
But, I don't want to use a match, I want to use a fuzzy search through the list, for example, something like:
{
"query":
{
"fuzzy" : {"tag" : "tagg1"}
}
}
The problem is, the above query doesn't return anything. What should I use instead?
What is the type of the tag field in your Elasticsearch mapping?
I tried the following type for the tag field on Elasticsearch version 7.2:
"tag" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
It works well for me.
The fuzzy query is then:
{
"query":
{
"fuzzy": {"tag" : "tagg1"}
}
}
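Why "tagg1" can find "tag1" may be easier to see with a small sketch of the edit-distance check behind a fuzzy query. It assumes the default fuzziness of AUTO (no edits for terms of up to 2 characters, one edit for 3 to 5 characters, two edits for longer terms) and uses plain Levenshtein distance, so it ignores the transposition handling of the real query. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def auto_fuzziness(term):
    """Allowed edits under fuzziness AUTO, based on the query term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

def fuzzy_hits(query_term, indexed_terms):
    """Return the indexed terms within the AUTO edit budget of query_term."""
    budget = auto_fuzziness(query_term)
    return [t for t in indexed_terms if edit_distance(query_term, t) <= budget]

print(fuzzy_hits("tagg1", ["tag1", "tag2", "tag3"]))  # ['tag1']
```

"tagg1" is one deletion away from "tag1" but two edits away from "tag2" and "tag3", so only the first tag falls within the budget.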

sorting/scoring parent documents based on child field values

Types:
1) Parent type: "product"
2) Child type: "ratings"
Description of the problem: I have an ES query (query-1) that works fine and fetches results from the parent type (product). I have now added a new type (ratings) and made it a child of "product". I need to extend the existing query so that the results are sorted by the field values of the matched child (ratings) documents.
query-1:
{
"bool" : {
"must" : [ {
"has_parent" : {
"query" : {
"bool" : {
"must" : {
"term" : {
"searchable" : "true"
}
}
}
},
"parent_type" : "supplierparent",
"inner_hits" : { }
}
}, {
"bool" : {
"must" : {
"query" : {
"simple_query_string" : {
"query" : "rice"
}
}
}
}
}, {
"nested" : {
"query" : {
"term" : {
"productStatusList.status" : "Approved"
}
},
"path" : "productStatusList"
}
} ]
}
}
I think there should be a better way to do this, similar to sorting on an attribute of nested documents. With a parent-child relation, however, sorting can only be done on an attribute of the current type.
Since parent and child are now completely separate documents, it is not possible to use a child property to sort the parents. The same parent document can have multiple children, which would make the sort order ambiguous.

Elastic(search): How to structure nested queries correctly?

I'm currently quite confused about how queries are structured in Elasticsearch. Let me explain what I mean using the following template, which works fine for me:
{
"template" : {
"query" : {
"filtered" : {
"query" : {
"bool" : {
"must" : [
{ "match" : {
"user" : "{{param_user}}"
} },
{ "match" : {
"session" : "{{param_session}}"
} },
{ "range" : {
"date" : {
"gte" : "{{param_from}}",
"lte" : "{{param_to}}"
}
} }
]
}
}
}
}
}
}
OK, so I want to get the entries of a specific session of a user in a certain time period. Now if you take a look at this link http://www.elastic.co/guide/en/elasticsearch/guide/current/combining-filters.html you can find the following query:
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"should" : [
{ "term" : {"price" : 20}},
{ "term" : {"productID" : "XHDK-A-1293-#fJ3"}}
],
"must_not" : {
"term" : {"price" : 30}
}
}
}
}
}
}
In this example, the "filter" keyword comes right after "filtered". However, if I exchange my second "query" with a "filter" as in the example, my template no longer works. This is really counterintuitive, and I spent a lot of time figuring it out. A̶l̶s̶o̶ ̶I̶ ̶d̶o̶n̶'̶t̶ ̶u̶n̶d̶e̶r̶s̶t̶a̶n̶d̶ ̶w̶h̶y̶ ̶w̶e̶ ̶n̶e̶e̶d̶ ̶t̶o̶ ̶p̶u̶t̶ ̶e̶v̶e̶r̶y̶ ̶f̶i̶l̶t̶e̶r̶ ̶i̶n̶ ̶s̶e̶p̶a̶r̶a̶t̶e̶ ̶̶{̶ ̶}̶̶ ̶e̶v̶e̶n̶ ̶t̶h̶o̶u̶g̶h̶ ̶t̶h̶e̶y̶ ̶a̶r̶e̶ ̶a̶l̶r̶e̶a̶d̶y̶ ̶s̶e̶p̶a̶r̶a̶t̶e̶d̶ ̶b̶y̶ ̶t̶h̶e̶ ̶a̶r̶r̶a̶y̶ ̶s̶y̶n̶t̶a̶x̶.̶
Another issue I had was that I assumed that, to match several fields, I could just type something like:
{
"query" : {
"match" : {
"user" : "{{param_user}}",
"session" : "{{param_session}}"
}
}
}
but it turned out that I have to use a bool query, which I didn't know about, so I searched for 'elastic multi match' but got something completely different.
My question: where can I find how to structure a query properly (something like a PEG)? The documentation only gives basic examples but doesn't state what we can actually do and how.
Best regards,
Jan
Edit: OK, I just found out by accident that I cannot exchange "query" with "filter" because "match" is a query and not a filter. But then again, what about "range"? It seems to be a query as well as a filter... Is there a summary of keywords specifying in which context they can be used?
Is there a summary of keywords specifying in which context they can be used?
I wouldn't think of them as keywords. Some queries and filters simply share the same name (but not all of them).
Here is everything you need. For example, there is both a range query and a range filter. All you need is to understand the difference between filters and queries.
For example, if you want to move the range section from the query to a filter, you can do it as shown in the code below (not tested). Since your code already contains a filtered query, you can just add a filter section right after the query section.
{
"template": {
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"user": "{{param_user}}"
}
},
{
"match": {
"session": "{{param_session}}"
}
}
]
}
},
"filter": {
"range": {
"date": {
"gte": "{{param_from}}",
"lte": "{{param_to}}"
}
}
}
}
}
}
}
Just remember that term-level filters do not analyze their input, so they only match reliably on not_analyzed fields.
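On the question's attempt to put two fields into one match: a match clause takes a single field, so matching several fields means one match clause per field inside a bool query. A minimal sketch of building such a body as a plain dictionary follows; `match_all_fields` is a hypothetical helper name, and the placeholder values are taken from the template above.

```python
import json

def match_all_fields(field_values):
    """Build a bool query that requires one match clause per field."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {field: value}}
                    for field, value in field_values.items()
                ]
            }
        }
    }

body = match_all_fields({
    "user": "{{param_user}}",
    "session": "{{param_session}}",
})
print(json.dumps(body, indent=2))
```

Each array element is its own object because every clause is a complete query in its own right, with the query type as its single key.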

match or term query on a long property for exact match?

My document has the following mapping property:
"sid" : {"type" : "long", "store": "yes", "index": "not_analyzed"},
This property has only one long value for each record. I would like to query this property. I tried the following two queries:
{
"query" : {
"term" : {
"sid" : 10
}
}
}
{
"query" : {
"match" : {
"sid" : 10
}
}
}
Both queries work and return the target document. My question: which one is more efficient? And why?
You want to use a term query, and if you want to be even more efficient, use a filtered query so your results get cached.
GET index1/test/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"sid": 10
}
}
}
}
}
Both work the same way here, as you mentioned. Unlike the match query, the term query does not analyze its input; it matches documents whose field contains the exact term. So in my opinion the term query is more efficient in your case, because no analysis has to be done. See: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
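Note that the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the same non-scoring behavior is usually expressed with a bool query and a filter clause. The sketch below shows that rewrite on plain dictionaries; `filtered_to_bool` is a hypothetical helper and only handles the simple query/filter shape used in these examples.

```python
def filtered_to_bool(filtered_body):
    """Rewrite {"filtered": {"query": q, "filter": f}} as an equivalent bool query."""
    inner = filtered_body["filtered"]
    bool_query = {}
    if "query" in inner:
        bool_query["must"] = inner["query"]     # scoring part
    if "filter" in inner:
        bool_query["filter"] = inner["filter"]  # non-scoring part
    return {"bool": bool_query}

old_style = {"filtered": {"filter": {"term": {"sid": 10}}}}
new_style = filtered_to_bool(old_style)
print(new_style)  # {'bool': {'filter': {'term': {'sid': 10}}}}
```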

ElasticSearch using wildcard and term queries

I'm new to Elasticsearch, and I have never used Lucene either.
I built this query:
{
"query" : {
"wildcard" : { "referer" : "*.domain.com*" }
},
"filter" : {
"query" : {
"term" : { "first" : "1" }
}
},
"facets" : {
"site_id" : {
"terms" : {
"field" : "site",
"size" : "70"
}
}
}
}
The wildcard is working great, but the term filter is ignored. What did I do wrong?
I need to filter the results with both the wildcard and the term.
Thanks!
Assuming you are trying to apply the filter to the results of the wildcard query, you can use a FilteredQuery. However, your case might be better served by filters alone.
You are using a query filter. Instead of building a filter out of a TermQuery, you can use a TermFilter directly inside a FilteredQuery. TermFilter should be faster because it uses the TermsEnum directly.
Note that filter results are cached in a filter cache, and filters are faster because they do not score documents. In your case, even though the filter part of the FilteredQuery will be fast, the wildcard query will still score documents unnecessarily. You could try an AND filter that combines a query filter (wrapping the wildcard query) with the term filter instead of a FilteredQuery.
To make just the filter work as you require, try something like the following (I have not tried it myself).
{
"query" : {
"filtered" : {
"query" : {
"wildcard" : { "referer" : "*.domain.com*" }
},
"filter" : {
"term" : { "first" : "1" }
}
}
},
"facets" : {
"site_id" : {
"terms" : {
"field" : "site",
"size" : "70"
}
}
}
}

Resources