Query with terms does not work properly - elasticsearch

I have this document in Elasticsearch (1.6):
{
"_index": "onkopedia",
"_type": "document_",
"_id": "0afa26afc2d1440a8ed03dac0e8511fc",
"_version": 1,
"_score": null,
"_source": {
"description": "",
"contributors": [ ],
"metaTypeName": "Connector",
"sortableTitle": "mammakarzinom der frau",
"subject": [ ],
"authorizedUsers": [
"Anonymous"
],
"language": "",
"title": "Mammakarzinom der Frau",
"url": "http://dev1.veit-schiele.de:9080/onkopedia/de/onkopedia/guidelines/mammakarzinom-der-frau",
"author": "ajung",
"modified": "2015-05-11T05:21:14",
"metaType": "xmldirector.plonecore.connector",
"content": " Mammakarzinom der Frau Stand: Januar 2013 Autoren der aktuellen .....",
"authorName": "ajung",
"created": "2015-05-11T05:21:14",
"review_state": "published"
},
"sort": [
null
]
}
It contains the key
'authorizedUsers': ['Anonymous']
The following query is supposed to return the document above, but it does not:
{
"sort": [
"_score"
],
"from": 0,
"fields": [
"url",
"title",
"description",
"metaType",
"metaTypeName",
"author",
"authorName",
"contributors",
"modified",
"subject",
"review_state",
"language",
"content"
],
"query": {
"filtered": {
"filter": {
"and": [
{
"terms": {
"execution": "or",
"metaType": [
"Document",
"FormFolder",
"Collection",
"Discussion Item",
"News Item",
"xmldirector.plonecore.connector",
"CaptchaField"
]
}
},
{
"terms": {
"execution": "or",
"authorizedUsers": [
"Manager",
"Authenticated",
"Anonymous",
"user:ajung"
]
}
}
]
},
"query": {
"query_string": {
"query": "mammakarzinom",
"default_operator": "AND",
"fields": [
"title^3",
"contributors^2",
"subject^2",
"description",
"content"
]
}
}
}
},
"highlight": {
"fields": {
"content": {
"fragment_size": 250,
"number_of_fragments": 3
},
"description": {
"fragment_size": 250,
"number_of_fragments": 2
},
"title": {
"number_of_fragments": 0
}
}
},
"size": 15
}
Why? 'Anonymous' is one of the values listed for 'authorizedUsers' in the query, so I would expect the first query to find the document.
The same query without the filter on 'authorizedUsers' does return the document:
{
"sort": [
"_score"
],
"from": 0,
"fields": [
"url",
"title",
"description",
"metaType",
"metaTypeName",
"author",
"authorName",
"contributors",
"modified",
"subject",
"review_state",
"language",
"content"
],
"query": {
"filtered": {
"filter": {
"and": [
{
"terms": {
"execution": "or",
"metaType": [
"Document",
"FormFolder",
"Collection",
"Discussion Item",
"News Item",
"xmldirector.plonecore.connector",
"CaptchaField"
]
}
}
]
},
"query": {
"query_string": {
"query": "mammakarzinom",
"default_operator": "AND",
"fields": [
"title^3",
"contributors^2",
"subject^2",
"description",
"content"
]
}
}
}
},
"highlight": {
"fields": {
"content": {
"fragment_size": 250,
"number_of_fragments": 3
},
"description": {
"fragment_size": 250,
"number_of_fragments": 2
},
"title": {
"number_of_fragments": 0
}
}
},
"size": 15
}

Your analyzer for the authorizedUsers field is probably lowercasing the value, so the term actually stored in the index is anonymous (lowercase a).
Try this filter:
{
"terms": {
"execution": "or",
"authorizedUsers": [
"manager",
"authenticated",
"anonymous",
"user:ajung"
]
}
}
In other words, search the index with the values that are actually stored there.
One more thing: terms does not analyze the input text. If you search for Anonymous, that exact term is what gets looked up in the index. Since the index holds anonymous, it will not match.
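The mismatch can be reproduced outside Elasticsearch with a toy inverted index. This is only a minimal sketch under stated assumptions: `analyze` is a hypothetical stand-in for an analyzer that lowercases and splits on whitespace (the real analyzer does more), and `terms_filter` mimics a lookup that does not analyze its input.

```python
# Toy model of an analyzed field: values are lowercased at index time,
# while a terms-style lookup compares the query values verbatim.

def analyze(text):
    """Hypothetical stand-in for a lowercasing analyzer (not the real one)."""
    return [token.lower() for token in text.split()]

def build_index(docs, field):
    """Map each indexed token to the set of document ids containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for value in doc.get(field, []):
            for token in analyze(value):
                index.setdefault(token, set()).add(doc_id)
    return index

def terms_filter(index, values):
    """Look up each value exactly as given, with no analysis (OR semantics)."""
    hits = set()
    for value in values:
        hits |= index.get(value, set())
    return hits

docs = {"0afa26af": {"authorizedUsers": ["Anonymous"]}}
index = build_index(docs, "authorizedUsers")

as_sent = terms_filter(index, ["Manager", "Authenticated", "Anonymous"])
lowercased = terms_filter(index, ["manager", "authenticated", "anonymous"])
print(as_sent)     # empty: the index holds "anonymous", not "Anonymous"
print(lowercased)  # contains the document id
```

If the field should keep its original casing, the usual alternative is to map it as a non-analyzed field and reindex, so that the stored term and the filter value are the same string.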

Related

Order documents by multiple geolocations

I am new to Elasticsearch and I am trying to create an index for companies that have multiple branches in a city.
Each branch has its own geolocation point.
My company document looks like this:
{
"company_name": "Company X",
"branch": [
{
"address": {
// ... other fields
"location": "0.0000,1.1111"
}
}
]
}
The index has the following mapping:
{
"companies": {
"mappings": {
"dynamic_templates": [
{
"ids": {
"match": "id",
"match_mapping_type": "long",
"mapping": {
"type": "long"
}
}
},
{
"company_locations": {
"match": "location",
"match_mapping_type": "string",
"mapping": {
"type": "geo_point"
}
}
}
],
"properties": {
"branch": {
"properties": {
"address": {
"properties": {
// ...
"location": {
"type": "geo_point"
},
// ...
}
},
}
}
}
}
}
}
I have indexed the following documents in Elasticsearch:
{
"company_name": "Company #1",
"branch": [
{
"address": {
"location": "39.615,19.8948"
}
}
]
}
and
{
"company_name": "Company #2",
"branch": [
{
"address": {
"location": "39.586,19.9028"
}
},
{
"address": {
"location": "39.612,19.9134"
}
},
{
"address": {
"location": "39.607,19.8946"
}
}
]
}
Here is my problem: when I run the following search query, Company #2 is listed first, although the geo-distance sort uses the location of Company #1:
GET companies/_search
{
"fields": [
"company_name",
"branch.address.location"
],
"_source": false,
"sort": [
{
"_geo_distance": {
"branch.address.location": {
"lon": 39.615,
"lat": 19.8948
},
"order": "asc",
"unit": "km"
}
}
]
}
Am I doing something wrong? Is there a way to sort the search results using this method?
Please keep in mind that if I search with a geolocation that is closer to one of the branches of Company #2, then Company #2 should come first.
Finally, if my setup isn't right for what I need and there is another way to achieve the same result with a different document structure, please let me know. I am still at the beginning of the project, so it is easy to adapt to whatever is more appropriate.
The documentation says: "Geopoint expressed as a string with the format: "lat,lon"."
Your location is "location": "39.615,19.8948", so the lat and lon values in your sort appear to be swapped. The query should probably be:
"branch.address.location": {
"lat": 39.615,
"lon": 19.8948
}
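The effect of swapping the two values can be checked with a few lines of arithmetic. The sketch below is not Elasticsearch code: it uses a plain haversine formula with an approximate Earth radius, and it assumes that an ascending distance sort on a multi-valued field ranks each company by its nearest branch. The helper names are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean Earth radius (assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Branch locations from the question, as (lat, lon) pairs.
companies = {
    "Company #1": [(39.615, 19.8948)],
    "Company #2": [(39.586, 19.9028), (39.612, 19.9134), (39.607, 19.8946)],
}

def rank(lat, lon):
    """Order companies by the distance to their nearest branch."""
    nearest = {
        name: min(haversine_km(lat, lon, b_lat, b_lon)
                  for b_lat, b_lon in branches)
        for name, branches in companies.items()
    }
    return sorted(nearest, key=nearest.get)

swapped = rank(lat=19.8948, lon=39.615)  # values as sent in the question
correct = rank(lat=39.615, lon=19.8948)  # values matching the "lat,lon" string
print(swapped)
print(correct)
```

With the swapped values the origin lies thousands of kilometres away from every branch, so the ordering is decided by small differences and Company #2 happens to come first; with the corrected values Company #1 is at distance zero.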
My Tests:
PUT idx_test
{
"mappings": {
"properties": {
"branch": {
"properties": {
"address": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}
}
POST idx_test/_doc/1
{
"company_name": "Company #1",
"branch": [
{
"address": {
"location": "39.615,19.8948"
}
}
]
}
POST idx_test/_doc/2
{
"company_name": "Company #2",
"branch": [
{
"address": {
"location": "39.586,19.9028"
}
},
{
"address": {
"location": "39.612,19.9134"
}
},
{
"address": {
"location": "39.607,19.8946"
}
}
]
}
Search by location "39.607,19.8946" (a branch of Company #2):
GET idx_test/_search?
{
"fields": [
"company_name",
"branch.address.location"
],
"_source": false,
"sort": [
{
"_geo_distance": {
"branch.address.location": {
"lat": 39.607,
"lon": 19.8946
},
"order": "asc",
"unit": "km"
}
}
]
}
Response:
"hits": [
{
"_index": "idx_test",
"_id": "2",
"_score": null,
"fields": {
"branch.address.location": [
{
"coordinates": [
19.9028,
39.586
],
"type": "Point"
},
{
"coordinates": [
19.9134,
39.612
],
"type": "Point"
},
{
"coordinates": [
19.8946,
39.607
],
"type": "Point"
}
],
"company_name": [
"Company #2"
]
},
"sort": [
0
]
},
{
"_index": "idx_test",
"_id": "1",
"_score": null,
"fields": {
"branch.address.location": [
{
"coordinates": [
19.8948,
39.615
],
"type": "Point"
}
],
"company_name": [
"Company #1"
]
},
"sort": [
0.8897252783915647
]
}
]
Search by location "39.615,19.8948" (the branch of Company #1):
GET idx_test/_search?
{
"fields": [
"company_name",
"branch.address.location"
],
"_source": false,
"sort": [
{
"_geo_distance": {
"branch.address.location": {
"lat": 39.615,
"lon": 19.8948
},
"order": "asc",
"unit": "km"
}
}
]
}
Response:
"hits": [
{
"_index": "idx_test",
"_id": "1",
"_score": null,
"fields": {
"branch.address.location": [
{
"coordinates": [
19.8948,
39.615
],
"type": "Point"
}
],
"company_name": [
"Company #1"
]
},
"sort": [
0
]
},
{
"_index": "idx_test",
"_id": "2",
"_score": null,
"fields": {
"branch.address.location": [
{
"coordinates": [
19.9028,
39.586
],
"type": "Point"
},
{
"coordinates": [
19.9134,
39.612
],
"type": "Point"
},
{
"coordinates": [
19.8946,
39.607
],
"type": "Point"
}
],
"company_name": [
"Company #2"
]
},
"sort": [
0.8897285575578558
]
}
]

ElasticSearch DSL: query on custom tags too

I'm new to ES, but I already have a basic query that I need to extend.
The query currently searches on keywords and on geo-distance.
I have now added custom tags to my index, and I want to take them into account too.
I want to filter and score my results based on imageTags (tag + score).
PS: I have seen other similar posts, but I can't figure out how to adapt my query (none of my attempts work).
Here is my query:
GET /posts_index/_search
{
"track_total_hits": true,
"from":0,
"size":10,
"_source": [
"id",
"tags",
"taxonomies",
"storesNames",
"geoLocations",
"storesIds",
"imageUri"
],
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"distance_feature": {
"field": "geoLocations",
"pivot": "10km",
"origin": [
-71.3,
41.15
]
}
},
{
"distance_feature": {
"field": "creationTime",
"pivot": "14d",
"origin": "now"
}
},
{
"query_string": {
"fields": [
"tags^3",
"taxonomies^5"
],
"query": "",
"fuzziness": "auto"
}
}
]
}
},
"functions": [
{
"script_score": {
"script": {
"source": "Math.sqrt(doc['commentsCount'].value)"
}
}
},
{
"script_score": {
"script": {
"source": "Math.log(2 + doc['likesCount'].value)"
}
}
},
{
"script_score": {
"script": {
"source": "Math.log(2 + doc['viewsCount'].value)"
}
}
}
],
"score_mode": "avg"
}
}
}
Here is one of my attempts to modify it:
{
"query": {
"nested": {
"path": "imageTags",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"match": {
"imageTags.tag.keyword": "tripod"
}
},
"field_value_factor": {
"field": "imageTags.score",
"factor": 1,
"missing": 0
}
}
}
}
}
}
For example, here is a sample document from the index:
{
"id": "4a9afd93-62bc-e8b2-29b4-39f5b073336d",
"tags": [
"fashion",
"mode",
"summer"
],
"imageTags": [
{
"score": 0.95150965,
"tag": "four-poster"
},
{
"score": 0.014835004,
"tag": "window"
},
{
"score": 0.014835004,
"tag": "shade"
},
{
"score": 0.009375425,
"tag": "sliding"
},
{
"score": 0.009375425,
"tag": "door"
}
],
"taxonomies": [],
"categories": [],
"qualityScore": 0.0,
"geoLocations": [
{
"lat": 50.4651156,
"lon": 4.865208
}
],
"storesIds": [
"ba9b3f59-50aa-8774-11a7-39f5ad58ae1a"
],
"storesNames": [
"Zara Namur"
],
"creationTime": "2020-06-10T12:48:30.5710000Z",
"updateTime": "2020-06-10T12:48:30.5710000Z",
"imageUri": "https://localhost:44359/cdn/e_76372856-f7a0-49cc-d3d9-39f5ad58ad6d/653d147084637b2af68b39f5b0733359.jpg",
"description": "",
"likesCount": 0,
"viewsCount": 0,
"commentsCount": 0,
"ImageTags": [
{
"score": 0.95150965,
"tag": "four-poster"
},
{
"score": 0.014835004,
"tag": "window"
},
{
"score": 0.014835004,
"tag": "shade"
},
{
"score": 0.009375425,
"tag": "sliding"
},
{
"score": 0.009375425,
"tag": "door"
}
]
}
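The nested attempt above, with "score_mode": "sum" and a field_value_factor on imageTags.score, reads as "add up the score of every imageTags entry whose tag matches". The plain-Python sketch below shows only that arithmetic so the intended ranking is explicit. It is not Elasticsearch scoring, the function names are made up, and the second post is invented for the comparison.

```python
# Plain-Python sketch of "sum the score of every matching image tag".
# It mirrors the intent of the nested attempt; it is not Elasticsearch scoring.

def image_tag_score(post, wanted_tag):
    """Sum the scores of the imageTags entries whose tag equals wanted_tag."""
    return sum(
        entry["score"]
        for entry in post.get("imageTags", [])
        if entry["tag"] == wanted_tag
    )

posts = [
    {   # values taken from the sample document in the question
        "id": "4a9afd93-62bc-e8b2-29b4-39f5b073336d",
        "imageTags": [
            {"score": 0.95150965, "tag": "four-poster"},
            {"score": 0.014835004, "tag": "window"},
        ],
    },
    {   # hypothetical second post, invented for the comparison
        "id": "hypothetical-post",
        "imageTags": [{"score": 0.30, "tag": "window"}],
    },
]

def rank_by_tag(posts, wanted_tag):
    """Keep posts that carry the tag (filter) and order them by score (rank)."""
    scored = [(image_tag_score(post, wanted_tag), post["id"]) for post in posts]
    matching = [item for item in scored if item[0] > 0]
    return sorted(matching, reverse=True)

ranking = rank_by_tag(posts, "window")
print(ranking)
```

Searching for "tripod", as in the attempt, returns an empty list here because neither post carries that tag.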

Elasticsearch - how to apply sort on nested field

I'm restructuring the nested field in my documents, and I can't get the query right to sort on the nested fields.
Previously the nested field looked like this:
"my_nested_obj": {
"project-type": [
{
"name": "Table",
"value": "159841"
}
],
"cost": [
{
"name": "Under $50",
"value": "426503"
}
],
"skill-level": [
{
"name": "Intermediate",
"value": "63897"
}
],
"room": [
{
"name": "Outdoor",
"value": "19246"
}
]
}....
And I was able to write queries like this one, where I can boost and also sort on 'my_nested_obj', for example:
{
"from": 0,
"size": 50,
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "something",
"fields": [
"content",
"name",
"my_nested_obj.skill-level.name^3"
]
}
},
"filter": {
"bool": {
"must": [
{
"match_all": [
]
},
{
"term": {
"retired": false
}
}
]
}
}
}
},
"sort": {
"my_nested_obj.skill-level.name": "desc"
},
"timeout": "1800ms"
}
Now I need to reformat the nested field like this:
"my_nested_obj": [
{
"name": "Table",
"type": "project-type",
"value": "159841"
},
{
"name": "Under $50",
"type": "cost",
"value": "426503"
},
{
"name": "Intermediate",
"type": "skill-level",
"value": "63897"
},
{
"name": "Outdoor",
"type": "room",
"value": "19246"
}
]....
I can do a generic sort on my_nested_obj.name like this:
....
"sort": {
"my_nested_obj.name": "desc"
},
...
How do I sort specifically on the skill-level name rather than on every my_nested_obj.name? Also, is there a way to specify the boost?
Thanks!
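To make the difference between the two orderings concrete, here is a small plain-Python sketch. It does not use Elasticsearch: it only illustrates the desired semantics, under the assumption that a descending sort picks the largest matching name per document. The three documents and the helper names are hypothetical.

```python
# Contrast a generic sort on every name with a sort restricted to one type.

docs = [  # hypothetical documents in the new, flattened layout
    {"id": "A", "my_nested_obj": [
        {"name": "Intermediate", "type": "skill-level"},
        {"name": "Outdoor", "type": "room"},
    ]},
    {"id": "B", "my_nested_obj": [
        {"name": "Beginner", "type": "skill-level"},
        {"name": "Workshop", "type": "room"},
    ]},
    {"id": "C", "my_nested_obj": [
        {"name": "Advanced", "type": "skill-level"},
        {"name": "Zen garden", "type": "room"},
    ]},
]

def names_of_type(doc, wanted_type):
    """Names of the nested entries whose type equals wanted_type."""
    return [e["name"] for e in doc["my_nested_obj"] if e["type"] == wanted_type]

def sort_desc_by_type(docs, wanted_type):
    """Sort descending on the names of one type; documents without it go last."""
    return sorted(
        docs,
        key=lambda d: max(names_of_type(d, wanted_type), default=""),
        reverse=True,
    )

def sort_desc_generic(docs):
    """Sort descending on all names, regardless of type."""
    return sorted(
        docs,
        key=lambda d: max((e["name"] for e in d["my_nested_obj"]), default=""),
        reverse=True,
    )

by_skill = [d["id"] for d in sort_desc_by_type(docs, "skill-level")]
generic = [d["id"] for d in sort_desc_generic(docs)]
print(by_skill)  # ['A', 'B', 'C']
print(generic)   # ['C', 'B', 'A']
```

The generic sort is dominated by the room names here, which is the behaviour the question wants to avoid.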

GROUP BY in elasticsearch

I am trying to write a GROUP BY query in Elasticsearch 5.2.
I want to query the data and limit the results to those that have a particular tag. In the case below, I want to select items that contain the word "FIY" in the title or content fields, and then narrow that down to only the documents that have the tags "FIY" and "Competition".
The query part is fine, but I am struggling to limit it to the given tag.
This is what I have so far, but I am getting the error:
"reason": "[bool] query does not support [terms]",
GET advice-articles/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "FIY",
"fields": ["title", "content"]
}
}
], "filter": {
"bool": {
"terms": {
"tags.tagName": [
"competition"
]
}
}
}
}
}
}
An example document from the index:
"_index": "advice-articles",
"_type": "article",
"_id": "1460",
"_score": 4.3167734,
"_source": {
"id": "1460",
"title": "Your top FIY tips",
"content": "Fix It Yourself in April 2012.",
"tags": [
{
"tagName": "Fix it yourself"
},
{
"tagName": "customer tips"
},
{
"tagName": "competition"
}
]
The mappings I have are as follows:
{
"advice-articles": {
"mappings": {
"article": {
"properties": {
"content": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"tags": {
"type": "nested",
"properties": {
"tagName": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
}
A bool query is built from one or more boolean clauses, each with a typed occurrence. The occurrence types are:
must, must_not, filter, should
GET _search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "FIY",
"fields": [
"title",
"content"
]
}
},
{
"nested": {
"path": "tags",
"query": {
"terms": {
"tags.tagName": [
"competition"
]
}
}
}
}
]
}
}
}
Here is how you can use a must clause for your query requirements.
Inside the filter you don't need to put bool.
POST newindex/test/1460333
{
"title": "Your top FIY tips",
"content": "Fix It Yourself in April 2012.",
"tags": [
{
"tagName": "Fix it yourself"
},
{
"tagName": "customer tips"
},
{
"tagName": "shoud not return"
}
]
}
POST newindex/test/1460
{
"title": "Your top FIY tips",
"content": "Fix It Yourself in April 2012.",
"tags": [
{
"tagName": "Fix it yourself"
},
{
"tagName": "customer tips"
},
{
"tagName": "competition"
}
]
}
Query:
GET newindex/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "FIY",
"fields": [
"title",
"content"
]
}
}
],
"filter": {
"terms": {
"tags.tagName": [
"competition"
]
}
}
}
}
}
Result:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "newindex",
"_type": "test",
"_id": "1460",
"_score": 0.2876821,
"_source": {
"title": "Your top FIY tips",
"content": "Fix It Yourself in April 2012.",
"tags": [
{
"tagName": "Fix it yourself"
},
{
"tagName": "customer tips"
},
{
"tagName": "competition"
}
]
}
}
]
}
}
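The error from the question follows directly from the list of occurrence types: the body of a bool query may only hold those clauses, so a bare terms key inside it is rejected. The sketch below is a hypothetical checker written for illustration, not part of any Elasticsearch client. `OTHER_BOOL_KEYS` is a non-exhaustive assumption about additional parameters a bool body can carry.

```python
# Hypothetical checker: which keys of a bool body are not valid clauses?

OCCURRENCE_TYPES = {"must", "must_not", "filter", "should"}
# Non-exhaustive assumption about other keys a bool body may carry.
OTHER_BOOL_KEYS = {"minimum_should_match", "boost"}

def unsupported_keys(bool_body):
    """Return the keys of a bool body that are neither clauses nor parameters."""
    allowed = OCCURRENCE_TYPES | OTHER_BOOL_KEYS
    return sorted(key for key in bool_body if key not in allowed)

terms_clause = {"terms": {"tags.tagName": ["competition"]}}

from_question = {"bool": terms_clause}       # filter body used in the question
fixed = {"bool": {"filter": terms_clause}}   # terms wrapped in a real clause

print(unsupported_keys(from_question["bool"]))  # ['terms']
print(unsupported_keys(fixed["bool"]))          # []
```

Dropping the inner bool wrapper entirely, as the query above does, avoids the problem in the same way.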

Kibana sometimes returns completely irrelevant results

We are using Logstash, Elasticsearch and Kibana for handling and searching our logs.
Often when we search, Kibana returns results that do not contain the searched-for item.
For example, we search for an exact phrase. Here is the query:
curl -XGET 'http://logs.magick.nu/kibana2/logstash-2014.10.17,logstash-2014.10.16/_search?pretty' -d '{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "COND_30892c7a490e154e01490e2dcf7a0008(2)"
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"from": 1413471279957,
"to": 1413557679958
}
}
}
]
}
}
}
},
"highlight": {
"fields": {},
"fragment_size": 2147483647,
"pre_tags": [
"@start-highlight@"
],
"post_tags": [
"@end-highlight@"
]
},
"size": 500,
"sort": [
{
"@timestamp": {
"order": "desc",
"ignore_unmapped": true
}
},
{
"@timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}'
And Kibana would return results such as:
{
"_index": "logstash-2014.10.17",
"_type": "app SwitchYard",
"_id": "unti1lWJRTelQd4N5_LVjA",
"_score": null,
"_source": {
"message": "2014/10/17 13:50:43,739 [com.domain.Connector.service.ent.BasicJMSTickListener] (NJ4X-63) Sending market info for product symbol to JMS topic. Broker Server: broker.Demo. Account Number: 1235. StrategyId: 4028e49447ac4296147af921d5f00b. OrderCount: 2",
"@version": "1",
"@timestamp": "2014-10-17T14:24:32.193Z",
"type": "app SwitchYard",
"tags": [
"node"
],
"domain": "trading1-magickdev.amakitu.com",
"env": "DEV",
"host": "nodelarge.amakitu.com",
"path": "/var/lib/openshift/541723389821cc77c2000167/jbosseap/logs/server.log"
},
"sort": [
1413555872193,
1413555872193
]
}
This happens a lot!
Any ideas what is wrong?
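One possible explanation, not confirmed anywhere in this thread: query_string treats parentheses as grouping operators, so the unquoted input may be parsed as two clauses, the COND_... token and 2, combined with the default OR. The returned message does end in "OrderCount: 2". Under that assumption, sending the text as a quoted phrase would keep it together. The helper below is a hypothetical sketch that only builds the request body; `as_phrase` is not part of any client library.

```python
import json

def as_phrase(text):
    """Wrap text in double quotes so query_string treats it as one phrase.

    Backslashes and double quotes inside the text are escaped first.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

search_text = "COND_30892c7a490e154e01490e2dcf7a0008(2)"

# Request body fragment with the phrase-quoted query (a sketch, not verified
# against the cluster from the question).
body = {"query": {"query_string": {"query": as_phrase(search_text)}}}
print(json.dumps(body, indent=2))
```

Whether the phrase then matches still depends on how the field is analyzed, so this narrows the results rather than guaranteeing an exact match.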
