Elasticsearch Parse Exception for boolean queries - elasticsearch

I'm trying to create queries similar to kibana queries in elasticsearch lucene queries. What I'm basically trying to do is matching some phrases. For example; my kibana query looks like this:(+"anna smith") AND ( (+"university"), (+"chairman"), (+"women rights")) It searches "anna smith" as must and one of the other phrases as should(there should be at least one of them exist in the text). I wrote a query to do this but it gives "elasticsearch parse exception:expected field name but got start_object". How can I solve this. Here is my query;
{
"query": {
"bool": {
"must": {
"match": {
"text": {
"query": "anna smith",
"operator": "and"
}
}
}
},
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"text": {
"query": "university",
"boost": 2
}
}
},
{
"match": {
"text": {
"query": "chairman",
"boost": 2
}
}
}
]
}
}]
}}}}

Your second query at the bottom cannot be there, it needs to be inside the first bool/must like this
{
"query": {
"bool": {
"must": [
{
"match": {
"text": {
"query": "anna smith",
"operator": "and"
}
}
},
{
"bool": {
"should": [
{
"match": {
"text": {
"query": "university",
"boost": 2
}
}
},
{
"match": {
"text": {
"query": "chairman",
"boost": 2
}
}
}
]
}
}
]
}
}
}

Related

Elasticsearch return exact match first then other matches

I have some PageDocuments which I would like to search based on the title, excluding PageDocuments with a path starting with some particular text. This field is analyzed. I would like some fuzziness to help users with spelling mistakes. I need to be able to do partial matches so some would match some text and this is some text.
If I use the following query I don't get an exact match back as the first result because of tf-idf
{
"size": 20,
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "myterm",
"fuzziness": 1
}
}
}
],
"must_not": [
{
"wildcard": {
"path": {
"value": "/test/*"
}
}
}
]
}
}
}
So then I added a not_analyzed version of the title field at title.not_analyzed and tried adding a function score to increase the weighting of an exact match using term.
{
"query": {
"function_score": {
"functions": [
{
"weight": 2,
"filter": {
"fquery": {
"query": {
"term": {
"title.not_analyzed": {
"value": "myterm"
}
}
}
}
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "myterm",
"fuzziness": 1
}
}
}
],
"must_not": [
{
"wildcard": {
"path": {
"value": "/path/*"
}
}
}
]
}
},
"boost_mode": "multiply"
}
}
}
But this gives me the same results. How can I get the exact matches returned first?
We found a solution to this by adding a combination of should and boost.
{
"size": 20,
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "myterm",
"fuzziness": 1
}
}
}
],
"must_not": [
{
"wildcard": {
"path": {
"value": "/path/*"
}
}
}
],
"should": [
{
"term": {
"title": {
"value": "myterm",
"boost": 10
}
}
}
]
}
}
}

Elasticsearch query + filter

This is my original query dsl, and total of hits was 8,981.
GET /{index}/{document}/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "blue shoes",
"boost": 2
}
}
},
{
"match": {
"description": {
"query": "blue shoes",
"operator": "and",
"boost": 1
}
}
}
]
}
}
}
I want to add filter to this query.
GET /{index}/{document}/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "blue shoes",
"boost": 2
}
}
},
{
"match": {
"description": {
"query": "blue shoes",
"operator": "and",
"boost": 1
}
}
}
],
"filter": {
"terms": {
"store.id": [ "store_a.com", "store_b.com" ]
}
}
}
}
}
Now its total of hits is 15,989(increased).
And I sort the result by score in asc(I don't know why it's asc not desc), there are documents which is scored 0.
I think there is no more filtering by query because it is already filtered.
Can I remove 0 scored documents from the result?
To add a filter, use a must clause in your bool query to add a mandatory value. Try :
GET /{index}/{document}/_search
{
"query": {
"bool": {
"must": [
"terms": {
"store.id": [ "store_a.com", "store_b.com" ]
}
],
"should": [
{
"match": {
"title": {
"query": "blue shoes",
"boost": 2
}
}
},
{
"match": {
"description": {
"query": "blue shoes",
"operator": "and",
"boost": 1
}
}
}
]
}
}
}

ElasticSearch How to AND a nested query

I am trying to figure out how to AND my Elastic Search query. I've tried a few different variations but I am always hitting a parser error.
What I have is a structure like this:
{
"title": "my title",
"details": [
{ "name": "one", "value": 100 },
{ "name": "two", "value": 21 }
]
}
I have defined details as a nested type in my mappings. What I'm trying to achieve is a query where it matches a part of the title and it matches various details by the detail's name and value.
I have the following query which gets me nearly there but I haven't been able to figure out how to AND the details. As an example I'd like to find anything that has:
detail of one with value less than or equal to 100
AND detail of two with value less than or equal to 25
The following query only allows me to search by one detail name/value:
"query" : {
"bool": {
"must": [
{ "match": {"title": {"query": titleQuery, "operator": "and" } } },
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{ "match": {"details.name" : "one"} },
{ "range": {"details.value" : { "lte": 100 } } }
]
}
}
} // nested
}
] // must
}
}
As a second question, would it be better to query the title and then move the nested part of the query into a filter?
You were so close! Just add another "nested" clause in your outer "must":
POST /test_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "title",
"operator": "and"
}
}
},
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{"match": {"details.name": "one" } },
{ "range": { "details.value": { "lte": 100 } } }
]
}
}
}
},
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{"match": {"details.name": "two" } },
{ "range": { "details.value": { "lte": 25 } } }
]
}
}
}
}
]
}
}
}
Here is some code I used to test it:
http://sense.qbox.io/gist/1fc30d49a810d22e85fa68d781114c2865a7c92e
EDIT: Oh, the answer to your second question is "yes", though if you're using 2.0 things have changed a little.

"boost" not working for "term" query

I'm running Elasticsearch 1.5.2 and trying the following query:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"gender": "male"
}
}
]
}
},
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"should": [
{
"term": {
"top_users": 1,
"boost": 2
}
}
]
}
}
}
}
}
Everything is fine until I add the "boost": 2 to the should -> term part. The complete query is much more complex, that's why I need to boost, but the remaining queries don't make any difference: ES returns an error 400 if a term query gets a boost argument:
QueryParsingException[[index_name] [_na] query malformed, must start with start_object]
Any suggestions?
It should be like this:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"gender": "male"
}
}
]
}
},
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"should": [
{
"term": {
"top_users": {
"value": "1",
"boost": 2
}
}
}
]
}
}
}
}
}

Elasticsearch boost score with nested query

I have the following query in Elasticsearch version 1.3.4:
{
"filtered": {
"query": {
"bool": {
"should": [
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "java"
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "adobe creative suite"
}
}
]
}
}
]
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "skills",
"query": {
"bool": {
"must": [
{
"term": {
"skills.name.original": "java"
}
},
{
"bool": {
"should": [
{
"match": {
"skills.source": {
"query": "linkedin",
"boost": 5
}
}
},
{
"match": {
"skills.source": {
"query": "meetup",
"boost": 5
}
}
}
]
}
}
],
"minimum_should_match": "100%"
}
}
}
}
]
}
}
],
"minimum_should_match": "100%"
}
},
"filter": {
"and": [
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "java"
}
}
]
}
},
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "ajax"
}
},
{
"term": {
"skills.name.original": "html"
}
}
]
}
}
]
}
}
}
Mappings look like this:
skills: {
type: "nested",
include_in_parent: true,
properties: {
name: {
type: "multi_field",
fields: {
name: {type: "string"},
original: {type : "string", analyzer : "string_lowercase"}
}
}
}
}
and finally the document structure, for skills (excluded other parts), looks like this:
"skills":
[
{
"name": "java",
"source": [
"linkedin",
"facebook"
]
},
{
"name": "html",
"source": [
"meetup"
]
}
]
My goal with this query is to, first filter out some irrelevant hits with the filters (bottom of the query), then score a person by searching the whole document for the match_phrase "java", extra boosting if it also contains the match_phrase "adobe creative suit", then check the nested value where we get a hit in "skills" to see what kind of "source(s)" the skill came from. Then give the query a boost based on what source, or sources the nested object has.
This kinda of works, at least I don't get any errors, but the final score is odd and its hard to see if its working. If I give a small boost, lets say 2, the score goes DOWN slightly, my top hit at the moment has a score of 32.176407 with boost = 1. With a boost of 5 it goes down to 31.637703. I would expect it to go up, not down? With a boost of 1000, the score goes down to 2.433376.
Is this the right way to do this, or is there a better/easier way? I could change the structure and mappings etc. And why is my score decreasing?
Edit: I have simplified the query a little, only dealing with one "skill":
{
"filtered": {
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "java"
}
}
],
"minimum_should_match": 1
}
}
]
}
}
],
"should": [
{
"nested": {
"path": "skills",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"term": {
"skills.name.original": "java"
}
}
],
"should": [
{
"match": {
"skills.source": {
"query": "linkedin",
"boost": 1.2
}
}
},
{
"match": {
"skills.source": {
"query": "meetup",
"boost": 1.2
}
}
}
]
}
}
}
}
]
}
},
"filter": {
"and": [
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "java"
}
}
]
}
}
]
}
}
}
The problem now is that I expect two similar documents, where the only difference is the "source" value on the skill "java". They are "linkedin" and "meetup" respectively. In my new query, they both get the same boost, but the final _score is very different for the two documents.
From the query explanation for doc 1:
"value": 3.82485,
"description": "Score based on child doc range from 0 to 125"
and for doc two:
"value": 2.1993546,
"description": "Score based on child doc range from 0 to 125"
These values are the only ones that differ, and I cant see why.
I can't answer the question regarding the boost, but how many shards do you have on index?
TF and IDF are calculated per shard not per index and this could be creating the difference in score.
https://groups.google.com/forum/#!topic/elasticsearch/FK-PYb43zcQ.
If you reindex with only 1 shard does change the outcome?
Edit: Also, the doc range is the range of docs for each document in the shard and you can use this to calculate IDF for each doc to verify scores.

Resources