Score up when picture not null - elasticsearch-1.7 - elasticsearch

I have a picture field and I am implementing a user search.
The goal is to show the people with a picture first, then the people without one.
I am maintaining an Elasticsearch 1.7 project and I can't upgrade the version.
The mapping:
"user": {
"_all": {
"auto_boost": true
},
"properties": {
"id": {
"type": "string",
"store": true
},
"picture": {
"type": "string",
"store": true
}
It seems that the Exists / Missing queries do not exist in Elasticsearch 1.7 (doc).
When a user does not have a picture, the field is stored as null. When they have one, it is stored with the filename: xxx.jpg or yyyy.PNG
I tried a query like this:
{
"track_scores": true,
"query": {
"bool": {
"must": [
{
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"user.enabled": true
}
}
]
}
}
}
}
],
"should": [
{
"match": {
"picture": {
"query": ".jpg .png .JPG .PNG",
"operator": "or"
}
}
}
]
}
}
}
I've also tried:
/* ... */
"should": [
{
"terms": {
"minimum_match": 1,
"teacher.picture": [
".jpg",
".png",
".JPG",
".PNG"
]
}
}
]
I still get results with a picture mixed in with the ones without a picture...
Do you know how I can achieve this?

You mentioned that your goal is simply to show the people with a picture first and then the people without one, so you can just use the missing option in sort.
The change below would not let you retrieve the people who don't have a picture, so don't chase an exists query:
/* ... */
"should": [
{
"terms": {
"minimum_match": 1,
"teacher.picture": [
".jpg",
".png",
".JPG",
".PNG"
]
}
}
]
Instead, use the missing option in sort:
{
"track_scores": true,
"query": {
"bool": {
"must": [
{
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"user.enabled": true
}
}
]
}
}
}
}
],
"should": [
{
"match": {
"picture": {
"query": ".jpg .png .JPG .PNG",
"operator": "or"
}
}
}
]
}
},
"sort" : [
{ "picture" : {"missing" : "_last"} }
]
}
Hope this helps.
Thanks
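A complementary sketch, written in Python so the request bodies can be built and inspected. Sorting on picture alone orders the hits by filename rather than by relevance, so one option is to add _score as a secondary sort key. Another option, assuming the exists filter and the constant_score query behave on a 1.7 cluster as documented for the 1.x line, is to boost documents that have a picture instead of matching on file extensions. The helper names and the boost value of 10 are illustrative, and neither body was run against the asker's index.

```python
import json

# Filter from the question: only enabled users (no scoring involved).
ENABLED_USERS = {
    "filtered": {
        "filter": {"bool": {"must": [{"term": {"user.enabled": True}}]}}
    }
}


def sorted_by_picture():
    """Request body that sorts users without a picture last, then by score."""
    return {
        "track_scores": True,
        "query": ENABLED_USERS,
        "sort": [
            {"picture": {"missing": "_last"}},
            "_score",
        ],
    }


def boosted_by_picture(boost=10):
    """Request body that adds a fixed boost when the picture field has a value.

    Assumption: uses the 1.x `exists` filter wrapped in `constant_score`.
    """
    return {
        "query": {
            "bool": {
                "must": [ENABLED_USERS],
                "should": [
                    {
                        "constant_score": {
                            "filter": {"exists": {"field": "picture"}},
                            "boost": boost,
                        }
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(sorted_by_picture(), indent=2))
    print(json.dumps(boosted_by_picture(), indent=2))
```

With the second body the results stay ordered by _score, so other should clauses can still influence the ranking within the "has a picture" and "no picture" groups.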

Related

ElasticSearch should with nested and bool must_not exists

With the following mapping:
"categories": {
"type": "nested",
"properties": {
"category": {
"type": "integer"
},
"score": {
"type": "float"
}
}
},
I want to use the categories field to return documents that either:
have a score above a threshold in a given category, or
do not have the categories field
This is my query:
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
<id>
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "categories"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
It correctly returns documents both with and without the categories field, and orders the results so the ones I want come first, but it doesn't filter out the results with a score below the 0.5 threshold.
Great question.
That is because, from Elasticsearch's point of view, categories is not really a field (a field for which an inverted index is built and used for searching), whereas categories.category and categories.score are.
As a result, the exists clause on categories matches no document, so the must_not is true for every document, which explains the results you see.
Change the query as below, wrapping the exists clauses in a nested query inside must_not, and your use case should work correctly.
POST <your_index_name>/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
"100"
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "categories.category"
}
},
{
"exists": {
"field": "categories.score"
}
}
]
}
}
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
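To make the intended logic concrete, here is a small Python model of what the corrected query is meant to select. It is a sketch, not Elasticsearch code: the sample documents, the function name wanted and the category id 100 are assumptions chosen for illustration.

```python
def wanted(doc, category_id, threshold=0.5):
    """Return True if the document should be returned.

    Mirrors the two should clauses:
    1. some nested category has the given id AND a score >= threshold;
    2. the document has no nested categories at all.
    """
    categories = doc.get("categories") or []
    if not categories:
        return True  # must_not + nested exists: nothing nested to find
    return any(
        c.get("category") == category_id and c.get("score", 0.0) >= threshold
        for c in categories
    )


docs = [
    {"id": "a", "categories": [{"category": 100, "score": 0.9}]},
    {"id": "b", "categories": [{"category": 100, "score": 0.2}]},
    {"id": "c"},  # no categories field
    {"id": "d", "categories": [{"category": 7, "score": 0.8}]},
]

selected = [d["id"] for d in docs if wanted(d, 100)]
print(selected)  # ['a', 'c']
```

Document b is the case the original query let through: it has categories, but none reaches the threshold, so neither clause should match it.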

Using multiple Should queries

I want to get docs that are similar to multiple "groups", but separately. Each group has its own rules (terms).
When I use more than one should query inside a "bool", I get items that are a mix of both should clauses' terms.
I want to use a single query in total, not msearch for example.
Can someone please help me with that?
{
"explain": true,
"query": {
"filtered": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"p_id": "123"
}
},
{
"term": {
"p_id": "124"
}
}
]
}
},
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"cat": "1"
}
},
{
"term": {
"cat": "2"
}
},
{
"term": {
"keys": "a"
}
},
{
"term": {
"keys": "b"
}
}
]
}
},
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"cat": "6"
}
},
{
"term": {
"cat": "7"
}
},
{
"term": {
"keys": "r"
}
},
{
"term": {
"keys": "u"
}
}
]
}
}
]
}
}
}
},
"from": 0,
"size": 3
}
You can try using a terms aggregation on multiple fields with scripting, and add a top hits aggregation as a sub-aggregation. Be warned that this will be pretty slow. Add this after the query/filter and adjust the size parameter as needed:
"aggs": {
"Cat_and_Keys": {
"terms": {
"script": "doc['cat'].values + doc['keys'].values"
},
"aggs":{ "separate_docs": {"top_hits":{"size":1 }} }
}
}
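As a rough illustration of what that aggregation returns, the Python sketch below buckets documents by every value found in cat and keys and keeps the best-scoring hit per bucket, which is roughly what terms plus top_hits with size 1 amounts to. The sample documents and their _score values are made up, and the function name is hypothetical.

```python
from collections import defaultdict


def top_hit_per_term(docs, fields=("cat", "keys"), size=1):
    """Group docs by each value of the given fields; keep the top `size` by score."""
    buckets = defaultdict(list)
    for doc in docs:
        for field in fields:
            for value in doc.get(field, []):
                buckets[value].append(doc)
    return {
        term: sorted(hits, key=lambda d: d["_score"], reverse=True)[:size]
        for term, hits in buckets.items()
    }


docs = [
    {"_id": "1", "_score": 2.0, "cat": ["1"], "keys": ["a"]},
    {"_id": "2", "_score": 3.5, "cat": ["1"], "keys": ["r"]},
    {"_id": "3", "_score": 1.0, "cat": ["6"], "keys": ["r", "u"]},
]

result = top_hit_per_term(docs)
for term in sorted(result):
    print(term, [d["_id"] for d in result[term]])
```

Note that the buckets are per term, not per group, so the client still has to merge the buckets that belong to one group (for example 1, 2, a and b for the first group in the question).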

Match multiple properties on the same nested document in ElasticSearch

I'm trying to accomplish what boils down to a boolean AND on nested documents in ElasticSearch. Let's say I have the following two documents.
{
"id": 1,
"secondLevels": [
{
"thirdLevels": [
{
"isActive": true,
"user": "anotheruser#domain.com"
}
]
},
{
"thirdLevels": [
{
"isActive": false,
"user": "user#domain.com"
}
]
}
]
}
{
"id": 2,
"secondLevels": [
{
"thirdLevels": [
{
"isActive": true,
"user": "user#domain.com"
}
]
}
]
}
In this case, I want to only match documents (in this case ID: 2) that have a nested document with both isActive: true AND user: user#domain.com.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "secondLevels.thirdLevels",
"query": {
"bool": {
"must": [
{
"term": {
"secondLevels.thirdLevels.isActive": true
}
},
{
"term": {
"secondLevels.thirdLevels.user": "user#domain.com"
}
}
]
}
}
}
}
]
}
}
}
However, what seems to be happening is that my query turns up both documents because the first document has one thirdLevel that has isActive: true and another thirdLevel that has the appropriate user.
Is there any way to enforce this strictly at query/filter time or do I have to do this in a script?
With nested objects and a nested query, you are most of the way there.
All you have to do now is add the inner hits flag and use source filtering to keep the entire secondLevels documents out of the way:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "secondLevels.thirdLevels",
"query": {
"bool": {
"must": [
{
"term": {
"secondLevels.thirdLevels.isActive": true
}
},
{
"term": {
"secondLevels.thirdLevels.user": "user#domain.com"
}
}
]
}
},
"inner_hits": {
"size": 100
}
}
}
]
}
}
}
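If both documents still match, one thing worth checking (an assumption, since the mapping is not shown) is whether secondLevels and thirdLevels are actually mapped as nested. The Python sketch below contrasts the two behaviours on the sample documents from the question: with plain object fields the values are flattened into per-field arrays, so the two term clauses can be satisfied by different thirdLevels entries, while nested matching evaluates each entry on its own. The function names are illustrative.

```python
DOCS = [
    {"id": 1, "secondLevels": [
        {"thirdLevels": [{"isActive": True, "user": "anotheruser#domain.com"}]},
        {"thirdLevels": [{"isActive": False, "user": "user#domain.com"}]},
    ]},
    {"id": 2, "secondLevels": [
        {"thirdLevels": [{"isActive": True, "user": "user#domain.com"}]},
    ]},
]


def third_levels(doc):
    """Yield every thirdLevels object of a document."""
    for second in doc["secondLevels"]:
        yield from second["thirdLevels"]


def matches_flattened(doc, user):
    """Object mapping: each condition may be met by a different entry."""
    entries = list(third_levels(doc))
    return (any(e["isActive"] for e in entries)
            and any(e["user"] == user for e in entries))


def matches_nested(doc, user):
    """Nested mapping: one single entry must meet both conditions."""
    return any(e["isActive"] and e["user"] == user for e in third_levels(doc))


user = "user#domain.com"
print([d["id"] for d in DOCS if matches_flattened(d, user)])  # [1, 2]
print([d["id"] for d in DOCS if matches_nested(d, user)])     # [2]
```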

Elasticsearch boost score with nested query

I have the following query in Elasticsearch version 1.3.4:
{
"filtered": {
"query": {
"bool": {
"should": [
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "java"
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "adobe creative suite"
}
}
]
}
}
]
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "skills",
"query": {
"bool": {
"must": [
{
"term": {
"skills.name.original": "java"
}
},
{
"bool": {
"should": [
{
"match": {
"skills.source": {
"query": "linkedin",
"boost": 5
}
}
},
{
"match": {
"skills.source": {
"query": "meetup",
"boost": 5
}
}
}
]
}
}
],
"minimum_should_match": "100%"
}
}
}
}
]
}
}
],
"minimum_should_match": "100%"
}
},
"filter": {
"and": [
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "java"
}
}
]
}
},
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "ajax"
}
},
{
"term": {
"skills.name.original": "html"
}
}
]
}
}
]
}
}
}
Mappings look like this:
skills: {
type: "nested",
include_in_parent: true,
properties: {
name: {
type: "multi_field",
fields: {
name: {type: "string"},
original: {type : "string", analyzer : "string_lowercase"}
}
}
}
}
and finally the document structure, for skills (excluded other parts), looks like this:
"skills":
[
{
"name": "java",
"source": [
"linkedin",
"facebook"
]
},
{
"name": "html",
"source": [
"meetup"
]
}
]
My goal with this query is to first filter out some irrelevant hits with the filters (bottom of the query), then score a person by searching the whole document for the match_phrase "java", with extra boosting if it also contains the match_phrase "adobe creative suite", and then check the nested value where we get a hit in "skills" to see what kind of "source(s)" the skill came from. The query should then get a boost based on which source or sources the nested object has.
This kind of works, at least I don't get any errors, but the final score is odd and it's hard to see whether it's working. If I give a small boost, let's say 2, the score goes DOWN slightly: my top hit at the moment has a score of 32.176407 with boost = 1. With a boost of 5 it goes down to 31.637703. I would expect it to go up, not down. With a boost of 1000, the score goes down to 2.433376.
Is this the right way to do this, or is there a better/easier way? I could change the structure, mappings, etc. And why is my score decreasing?
Edit: I have simplified the query a little, only dealing with one "skill":
{
"filtered": {
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "java"
}
}
],
"minimum_should_match": 1
}
}
]
}
}
],
"should": [
{
"nested": {
"path": "skills",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"term": {
"skills.name.original": "java"
}
}
],
"should": [
{
"match": {
"skills.source": {
"query": "linkedin",
"boost": 1.2
}
}
},
{
"match": {
"skills.source": {
"query": "meetup",
"boost": 1.2
}
}
}
]
}
}
}
}
]
}
},
"filter": {
"and": [
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "java"
}
}
]
}
}
]
}
}
}
The problem now is that I have two similar documents whose only difference is the "source" value on the skill "java": "linkedin" and "meetup" respectively. In my new query both get the same boost, yet the final _score is very different for the two documents.
From the query explanation for doc 1:
"value": 3.82485,
"description": "Score based on child doc range from 0 to 125"
and for doc two:
"value": 2.1993546,
"description": "Score based on child doc range from 0 to 125"
These values are the only ones that differ, and I can't see why.
I can't answer the question regarding the boost, but how many shards do you have on the index?
TF and IDF are calculated per shard, not per index, and this could be creating the difference in score.
https://groups.google.com/forum/#!topic/elasticsearch/FK-PYb43zcQ.
If you reindex with only 1 shard, does that change the outcome?
Edit: Also, the doc range is the range of docs for each document in the shard and you can use this to calculate IDF for each doc to verify scores.
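To see how much the shard a document lands on can matter, the Python sketch below evaluates the inverse document frequency formula used by Lucene's classic TF/IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)), for the same term on two shards. The shard sizes and document frequencies are invented for illustration; the real ones have to be read from the explain output.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as computed by Lucene's classic TF/IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for the same term.
shards = {
    "shard_0": {"doc_freq": 3, "num_docs": 126},
    "shard_1": {"doc_freq": 40, "num_docs": 126},
}

for name, stats in shards.items():
    print(name, round(classic_idf(**stats), 4))
```

A term that is rare on one shard gets a higher IDF there than on a shard where it is common, so two otherwise identical documents can score differently. Besides reindexing into a single shard, running the search with search_type=dfs_query_then_fetch is another way to check this, since it uses index-wide term statistics.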

Use partial_fields in elasticsearch kibana query

I am trying to add the partial_fields directive to an Elasticsearch query (generated from Kibana's table widget).
Where exactly would I have to place this statement in the ES query below?
I already tried adding it right after the first "query" node, which produces valid JSON but still doesn't exclude xmz_Data:
"partial_fields": {
"partial1": {
"exclude": "xmz_Data"
}
},
ES query:
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "*"
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"match_all": {}
},
{
"bool": {
"must": [
{
"match_all": {}
}
]
}
}
]
}
}
}
},
"highlight": {
"fields": {},
"fragment_size": 2147483647,
"pre_tags": [
"#start-highlight#"
],
"post_tags": [
"#end-highlight#"
]
},
"size": 250,
"sort": [
{
"timestamp": {
"order": "desc"
}
}
]
}
You can place the partial_fields directive anywhere in your query; I tested it successfully both before and after the query node. However, the format of your excluded fields value is incorrect: the exclude value needs to be an array. Try this instead:
"partial_fields": {
"partial1": {
"exclude": ["xmz_Data"]
}
},
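Put together, the corrected directive sits at the top level of the request body, next to query, highlight and sort. The Python sketch below adds it to an existing body and normalises a single field name to a list; the helper name with_partial_exclude is hypothetical, and the trimmed-down body stands in for the Kibana-generated one.

```python
import copy
import json


def with_partial_exclude(body, name, exclude):
    """Return a copy of `body` with a partial_fields entry excluding fields."""
    if isinstance(exclude, str):
        exclude = [exclude]  # the answer notes the value must be an array
    result = copy.deepcopy(body)
    result.setdefault("partial_fields", {})[name] = {"exclude": list(exclude)}
    return result


body = {
    "query": {"filtered": {"query": {"query_string": {"query": "*"}}}},
    "size": 250,
    "sort": [{"timestamp": {"order": "desc"}}],
}

request = with_partial_exclude(body, "partial1", "xmz_Data")
print(json.dumps(request["partial_fields"]))
# {"partial1": {"exclude": ["xmz_Data"]}}
```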

Resources