ElasticSearch Script Score Using Field Value

ElasticSearch 1.2.1
I am trying to query documents using weighted tags.
curl -X PUT 'http://localhost:9200/test'
curl -X PUT 'http://localhost:9200/test/thing/_mapping' -d '{
"thing": {
"properties": {
"tags": {
"type": "nested",
"properties": {
"name": { "type": "string" },
"weight": { "type": "integer" }
}
}
}
}}'
Adding a document:
curl -X PUT 'http://localhost:9200/test/thing/1' -d '{
"tags": [
{ "name": "a", "weight": 2 }
]
}'
Now I am searching for documents having the tag a and boosting the score based on the weight.
Note: to run these examples you have to enable dynamic scripting in ElasticSearch: add script.disable_dynamic: false to elasticsearch.yml
curl -X GET 'http://localhost:9200/test/thing/_search?pretty' -d '{
"query": {
"function_score": {
"boost_mode": "replace",
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"nested": {
"path": "tags",
"filter": {
"term": {
"tags.name": "a"
}
}
}
},
"script_score": {
"script": "doc.weight.value"
}
}
]
}
}
}'
The document is found, as expected, but the score is 0. It seems as if the property doc.weight were empty.
Let's test this by replacing the script with doc.weight.empty ? 50 : 100. The hit now has a score of 50, indicating that the field doc.weight is indeed empty. The field itself does exist, though, because using a non-existent field name (e.g. doc.foobar) gives an error.
Background: the match_all part would be replaced by a real query. I want to use the tags to boost results matching the tags above the ones that do not match them.
What am I missing?
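For what it's worth, the nested-scoring answer further down this page suggests one likely fix: wrap the function_score in a nested query so that the script runs against the nested tag document, and address the field by its full path. This is a sketch only, untested against 1.2.1 (score_mode: sum mirrors that answer):
curl -X GET 'http://localhost:9200/test/thing/_search?pretty' -d '{
  "query": {
    "nested": {
      "path": "tags",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "boost_mode": "replace",
          "query": {
            "term": { "tags.name": "a" }
          },
          "script_score": {
            "script": "doc[\"tags.weight\"].value"
          }
        }
      }
    }
  }
}'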

Related

Search in two different types with different mappings in Elasticsearch

I have the following mapping for the index tester with two types, items and items_two:
curl -XPUT 'localhost:9200/tester?pretty=true' -d '{
"mappings": {
"items": {
"properties" : {
"body" : { "type": "string" }
}},
"items_two": {
"properties" : {
"body" : { "type": "string" },
"publised" : { "type": "integer"}
}}}}'
I put three documents into it:
curl -XPUT 'localhost:9200/tester/items/1?pretty=true' -d '{
"body" : "Hey there im reading a book"
}'
curl -XPUT 'localhost:9200/tester/items_two/1?pretty=true' -d '{
"body" : "I love the new book of my brother",
"publised" : 0
}'
curl -XPUT 'localhost:9200/tester/items_two/2?pretty=true' -d '{
"body" : "Stephen kings book is very nice",
"publised" : 1
}'
I need to make a query that matches the word book and publised = 1, AND also matches the documents that have no publised field in their mapping but do contain book (i.e. the only document of type items).
With the following query I only get a match for the "Stephen kings book is very nice" item (obviously).
curl -XGET 'localhost:9200/tester/_search?pretty=true' -d '{
"query": {
"bool": {
"must": [
{
"match": { "body": "book" }
},
{
"match": { "publised": "1" }
}]
}}}'
My desired output if I search for the string book should match item #1 from the type items ("Hey there im reading a book") and item #2 from the type items_two ("Stephen kings book is very nice").
I don't want to change the mapping or anything else; I need to achieve this via one query, so how can I build my query?
Thanks in advance.
You can use the _type field for this kind of search. Try the following query:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"body": "text"
}
},
{
"match": {
"publised": "1"
}
}
],
"filter": {
"term": {
"_type": "items_two"
}
}
}
},
{
"bool": {
"must": [
{
"match": {
"body": "text"
}
}
],
"filter": {
"term": {
"_type": "items"
}
}
}
}
]
}
}
}

query must match 2 fields exactly, don't analyze

I tried a few different ways of doing a simple GET request, filtering on two different attributes. Example:
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"email": "erik.landvall#example.com"
}
},
{
"term": {
"password": "bb3810356e9b60cf6..."
}
}
]
}
},
"query": {
"match_all": []
}
}
}
The problem is that I get nothing back in return. As I understand it, this is because Elasticsearch analyzes the email field, making the term query fail. If I instead use the term erik.landvall rather than the complete email address, it does match the document, which confirms that's what's going on.
I can define the attribute as type: string and index: not_analyzed when I create the index. But what if I want to be able to search on the email attribute in a different context? There should, to my mind, be a way to specify in a query that I want to filter on the actual value of the attribute. I cannot, however, find what such a query would look like.
Is it possible to force Elasticsearch to treat a field as not_analyzed when querying? If so, how?
You can use scripting for this purpose. You would have to directly access the JSON you have stored via _source. Try the following query:
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"inline" : "_source.email==param1 && _source.password==param2",
"params" : {
"param1" : "erik.landvall#example.com",
"param2" : "bb3810356e9b60cf6"
}
}
}
}
}
}
}
You would need to enable dynamic scripting. Add script.inline: on to your yml file and restart the node.
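For reference, the relevant elasticsearch.yml lines look roughly like this (the 1.x setting name is the one from the note near the top of this page; the 2.x name is the one used in this answer):
# elasticsearch.yml
# ES 1.x:
script.disable_dynamic: false
# ES 2.x:
script.inline: on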
If this kind of query is fairly regular, then it would be much better to reindex the data, as others have suggested in the comments.
It's not possible to turn analysis on or off at query time; the way to do it is to "transform" your field into the analysis you need by using fields (multi-fields):
curl -XPUT 'localhost:9200/my_index?pretty' -d'
{
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
curl -XPUT 'localhost:9200/my_index/my_type/1?pretty' -d'
{
"city": "New York"
}'
curl -XPUT 'localhost:9200/my_index/my_type/2?pretty' -d'
{
"city": "York"
}'
curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
"query": {
"match": {
"city": "york"
}
},
"sort": {
"city.raw": "asc"
},
"aggs": {
"Cities": {
"terms": {
"field": "city.raw"
}
}
}
}'
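Applied back to the original question, a sketch of what the exact-match filter could look like (assuming email were mapped with a not_analyzed raw sub-field in the same way as city above; the index name comes from the mapping example, and the values mirror the question):
curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "email.raw": "erik.landvall@example.com" } },
            { "term": { "password": "bb3810356e9b60cf6..." } }
          ]
        }
      }
    }
  }
}'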

Find documents with empty string value on elasticsearch

I've been trying to filter with elasticsearch only those documents that contain an empty string in their body. So far I'm having no luck.
Before I go on, I should mention that I've already tried the many "solutions" spread around the Interwebz and StackOverflow.
So, below is the query that I'm trying to run, followed by the variants of it that I also tried:
{
"query": {
"filtered":{
"filter": {
"bool": {
"must_not": [
{
"missing":{
"field":"_textContent"
}
}
]
}
}
}
}
}
I've also tried the following:
{
"query": {
"filtered":{
"filter": {
"bool": {
"must_not": [
{
"missing":{
"field":"_textContent",
"existence":true,
"null_value":true
}
}
]
}
}
}
}
}
And the following:
{
"query": {
"filtered":{
"filter": {
"missing": {"field": "_textContent"}
}
}
}
}
None of the above worked. I get an empty result set when I know for sure that there are records containing an empty string field.
If anyone can provide me with any help at all, I'll be very grateful.
Thanks!
If you are using the default analyzer (standard), there is nothing for it to analyze if the value is an empty string. So you need to index the field verbatim (not analyzed). Here is an example:
Add a mapping that will index the field untokenized. If you need a tokenized copy of the field indexed as well, you can use a multi-field (a sketch of that variant follows the plain mapping below).
PUT http://localhost:9200/test/_mapping/demo
{
"demo": {
"properties": {
"_content": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
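For completeness, the multi-field variant mentioned above might look roughly like this (a sketch; raw is just a conventional name for the untokenized sub-field, and the term filter below would then target _content.raw instead of _content):
PUT http://localhost:9200/test/_mapping/demo
{
  "demo": {
    "properties": {
      "_content": {
        "type": "string",
        "fields": {
          "raw": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}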
Next, index a couple of documents.
POST http://localhost:9200/test/demo/1
{
"_content": ""
}
POST http://localhost:9200/test/demo/2
{
"_content": "some content"
}
Execute a search:
POST http://localhost:9200/test/demo/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"_content": ""
}
}
}
}
}
This returns the document with the empty string:
{
took: 2,
timed_out: false,
_shards: {
total: 5,
successful: 5,
failed: 0
},
hits: {
total: 1,
max_score: 0.30685282,
hits: [
{
_index: test,
_type: demo,
_id: 1,
_score: 0.30685282,
_source: {
_content: ""
}
}
]
}
}
I found a solution here: https://github.com/elastic/elasticsearch/issues/7515
It works without reindexing.
PUT t/t/1
{
"textContent": ""
}
PUT t/t/2
{
"textContent": "foo"
}
GET t/t/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "textContent"
}
}
],
"must_not": [
{
"wildcard": {
"textContent": "*"
}
}
]
}
}
}
Even with the default analyzer you can do this kind of search: use a script filter, which is slower but can handle the empty string:
curl -XPOST 'http://localhost:9200/test/demo/_search' -d '
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "_source._content.length() == 0"
}
}
}
}
}'
It will return the document with an empty string as _content, without a special mapping.
As pointed out by @js_gandalf, this is deprecated for ES > 5.0. Instead you should use query -> bool -> filter -> script, as in https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
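For reference, a sketch of that restructuring, reusing the empty-check idea from further down this page. Note that in 5.x+ the default script language is Painless, which works on doc values rather than _source, so the field needs doc values available (e.g. a keyword sub-field or fielddata as discussed below); doc['_content'].empty matches documents where the field produced no value, i.e. empty or missing:
POST http://localhost:9200/test/demo/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "inline": "doc['_content'].empty",
            "lang": "painless"
          }
        }
      }
    }
  }
}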
For those of you using Elasticsearch 5.2 or above and still stuck: the easiest way is to reindex your data correctly with the keyword type. Then all the searches for empty values work, like this:
"query": {
"term": {"MY_FIELD_TO_SEARCH": ""}
}
Actually, when I reindexed my database and reran the query, it worked =)
The problem was that my field was of type text and NOT keyword. I changed the mapping to keyword and reindexed:
curl -X PUT https://username:password@host.io:9200/mycoolindex
curl -X PUT https://user:pass@host.io:9200/mycoolindex/_mapping/mycooltype -d '{
"properties": {
"MY_FIELD_TO_SEARCH": {
"type": "keyword"
}
}
}'
curl -X POST https://username:password@host.io:9200/_reindex -d '{
"source": {
"index": "oldindex"
},
"dest": {
"index": "mycoolindex"
}
}'
I hope this helps someone who was as stuck as I was finding those empty values.
Or, using the Lucene query string syntax:
q=yourfield.keyword:""
See the Elasticsearch reference: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-query-string-query.html#query-string-syntax
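The same thing can be sent through the request body with a query_string query; a sketch (my_index is a placeholder, and the keyword sub-field must actually exist under that name):
GET /my_index/_search
{
  "query": {
    "query_string": {
      "query": "yourfield.keyword:\"\""
    }
  }
}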
In order to find the empty string in a field of your documents, the field's mapping matters a great deal, in other words its index/analyzer setting.
If the field is not_analyzed, which means the token is just the empty string, you can simply use a term query to find it, as follows:
{"from": 0, "size": 100, "query":{"term": {"name":""}}}
Otherwise, if the field is analyzed, most analyzers will treat the empty string as a null value, so you can use a missing filter to find it:
{"filter": {"missing": {"existence": true, "field": "name", "null_value": true}}, "query": {"match_all": {}}}
here is the gist script you can reference: https://gist.github.com/hxuanji/35b982b86b3601cb5571
BTW, I checked the commands you provided; it seems you DON'T want the empty-string documents. All my commands above are just to find those, so putting them into the must_not part of a bool query would be fine.
My ES is 1.0.1.
For ES 1.3.0, the gist I provided currently cannot find the empty string. It seems it has been reported: https://github.com/elasticsearch/elasticsearch/issues/7348 . Let's wait and see how it goes.
Anyway, that issue also provides another command to find them:
{ "query": {
"filtered": {
"filter": {
"not": {
"filter": {
"range": {
"name": {
}
}
}
}
}
} } }
name is the field in which to find the empty string. I've tested it on ES 1.3.2.
I'm using Elasticsearch 5.3 and was having trouble with some of the above answers.
The following body worked for me.
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline": "doc['city'].empty",
"lang": "painless"
}
}
}
}
}
}
Note: you might need to enable fielddata for text fields; it is disabled by default. I would read this before doing so: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
To enable fielddata for a field, e.g. 'city', on the index 'business' with type name 'record', you need:
PUT business/_mapping/record
{
"properties": {
"city": {
"type": "text",
"fielddata": true
}
}
}
If you don't want to or can't reindex, there is another way. :-)
You can use the negation operator and a wildcard (*, which matches any non-blank string):
GET /my_index/_search?q=!(fieldToLookFor:*)
For nested fields use:
curl -XGET "http://localhost:9200/city/_search?pretty=true" -d '{
"query" : {
"nested" : {
"path" : "country",
"score_mode" : "avg",
"query" : {
"bool": {
"must_not": {
"exists": {
"field": "country.name"
}
}
}
}
}
}
}'
NOTE: the path and field together make up the search target. Change them as required for your mapping.
For regular fields:
curl -XGET 'http://localhost:9200/city/_search?pretty=true' -d'{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "name"
}
}
}
}
}'
I didn't manage to search for empty strings in a text field. However, it seems to work with a field of type keyword. So I suggest the following:
DELETE /test_idx
PUT /test_idx
{
"mappings" : {
"testMapping": {
"properties" : {
"tag" : {"type":"text"},
"content" : {"type":"text",
"fields" : {
"x" : {"type" : "keyword"}
}
}
}
}
}
}
PUT /test_idx/testMapping/1
{
"tag": "null"
}
PUT /test_idx/testMapping/2
{
"tag": "empty",
"content": ""
}
GET /test_idx/testMapping/_search
{
"query" : {
"match" : { "content.x" : "" }
}
}
You need to trigger the keyword indexer by adding .content to your field name. Depending on how the original index was set up, the following "just works" for me using AWS ElasticSearch v6.x.
GET /my_idx/_search?q=my_field.content:""
I am trying to find the empty fields (in indices with dynamic mapping) and set them to a default value, and the below worked for me.
Note: this is on Elastic 7.x.
POST <index_name|pattern>/_update_by_query
{
"script": {
"lang": "painless",
"source": """
if (ctx._source.<field_name> == "") {
ctx._source.<field_name> = "0";
} else {
ctx.op = "noop";
}
"""
}
}
I followed one of the responses from the thread and came up with the below; it will do the same:
POST index_pattern*/_update_by_query
{
"script": {
"source": "ctx._source.field_name='0'",
"lang": "painless"
},
"query": {
"bool": {
"must": [
{
"exists": {
"field": "field_name"
}
}
],
"must_not": [
{
"wildcard": {
"field_name": "*"
}
}
]
}
}
}
I am also trying to find the documents in the index that don't have the field at all and add it to them with a default value.
One of the responses from this thread helped me come up with the below:
POST index_pattern*/_update_by_query
{
"script": {
"source": "ctx._source.field_name='0'",
"lang": "painless"
},
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "field_name"
}
}
]
}
}
}
Thanks to everyone who contributed to this thread; I was able to solve my problem.

Elasticsearch: Influence scoring with custom score field in document

I have a set of words extracted out of text through NLP algorithms, with an associated score for each word in every document.
For example :
document 1: { "vocab": [ {"wtag":"James Bond", "rscore": 2.14 },
{"wtag":"world", "rscore": 0.86 },
....,
{"wtag":"somemore", "rscore": 3.15 }
]
}
document 2: { "vocab": [ {"wtag":"hiii", "rscore": 1.34 },
{"wtag":"world", "rscore": 0.94 },
....,
{"wtag":"somemore", "rscore": 3.23 }
]
}
I want the rscore of each matched wtag in a document to affect the _score given to it by ES, perhaps multiplied by or added to the _score, to influence the final _score (and, in turn, the order) of the resulting documents. Is there any way to achieve this?
Another way of approaching this would be to use nested documents.
First, set up the mapping to make vocab a nested document, meaning that each wtag/rscore pair would be indexed internally as a separate document:
curl -XPUT "http://localhost:9200/myindex/" -d'
{
"settings": {"number_of_shards": 1},
"mappings": {
"mytype": {
"properties": {
"vocab": {
"type": "nested",
"fields": {
"wtag": {
"type": "string"
},
"rscore": {
"type": "float"
}
}
}
}
}
}
}'
Then index your docs:
curl -XPUT "http://localhost:9200/myindex/mytype/1" -d'
{
"vocab": [
{
"wtag": "James Bond",
"rscore": 2.14
},
{
"wtag": "world",
"rscore": 0.86
},
{
"wtag": "somemore",
"rscore": 3.15
}
]
}'
curl -XPUT "http://localhost:9200/myindex/mytype/2" -d'
{
"vocab": [
{
"wtag": "hiii",
"rscore": 1.34
},
{
"wtag": "world",
"rscore": 0.94
},
{
"wtag": "somemore",
"rscore": 3.23
}
]
}'
And run a nested query to match all the nested documents and add up the values of rscore for each nested document which matches:
curl -XGET "http://localhost:9200/myindex/mytype/_search" -d'
{
"query": {
"nested": {
"path": "vocab",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"match": {
"vocab.wtag": "james bond world"
}
},
"script_score": {
"script": "doc[\"rscore\"].value"
}
}
}
}
}
}'
Have a look at the delimited payload token filter which you can use to store the scores as payloads, and at text scoring in scripts which gives you access to the payloads.
UPDATED TO INCLUDE EXAMPLE
First you need to set up an analyzer which will take the number after | and store that value as a payload with each token:
curl -XPUT "http://localhost:9200/myindex/" -d'
{
"settings": {
"analysis": {
"analyzer": {
"payloads": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
" delimited_payload_filter"
]
}
}
}
},
"mappings": {
"mytype": {
"properties": {
"text": {
"type": "string",
"analyzer": "payloads",
"term_vector": "with_positions_offsets_payloads"
}
}
}
}
}'
Then index your document:
curl -XPUT "http://localhost:9200/myindex/mytype/1" -d'
{
"text": "James|2.14 Bond|2.14 world|0.86 somemore|3.15"
}'
And finally, search with a function_score query that iterates over each term, retrieves the payload and incorporates it with the _score:
curl -XGET "http://localhost:9200/myindex/mytype/_search" -d'
{
"query": {
"function_score": {
"query": {
"match": {
"text": "james bond"
}
},
"script_score": {
"script": "score=0; for (term: my_terms) { termInfo = _index[\"text\"].get(term,_PAYLOADS ); for (pos : termInfo) { score = score + pos.payloadAsFloat(0);} } return score;",
"params": {
"my_terms": [
"james",
"bond"
]
}
}
}
}
}'
The script itself, when not compressed into one line, looks like this:
score=0;
for (term: my_terms) {
termInfo = _index['text'].get(term,_PAYLOADS );
for (pos : termInfo) {
score = score + pos.payloadAsFloat(0);
}
}
return score;
Warning: accessing payloads has a significant performance cost, and running scripts also has a performance cost. You may want to experiment with it using dynamic scripts as above, then rewrite the script as a native Java script when you're satisfied with the result.
I think that script_score function is what you need (doc).
Function score queries were introduced in 0.90.4; if you are using an older version, check custom score queries.
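For reference, a rough sketch of the older custom_score shape (pre-0.90.4 syntax, recalled from the old docs, so treat it as an approximation; my_score_field stands in for a flat numeric field, not the nested rscore from the question):
{
  "query": {
    "custom_score": {
      "query": { "match": { "text": "james bond" } },
      "script": "_score * doc['my_score_field'].value"
    }
  }
}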
You can use the field_value_factor function: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor
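A minimal sketch of what that can look like (assumptions: a flat numeric field, here called rscore at the top level of the document rather than nested under vocab, and a version where field_value_factor is available):
{
  "query": {
    "function_score": {
      "query": { "match": { "text": "james bond" } },
      "field_value_factor": {
        "field": "rscore",
        "factor": 1.2,
        "modifier": "sqrt"
      },
      "boost_mode": "multiply"
    }
  }
}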

Sort an elasticsearch resultset based on a filter term

For an ecommerce site I am implementing elasticsearch in order to get a sorted and paginated result set of product IDs for a category.
I have a product document which looks like this:
PUT /products_test/product/1
{
"id": "1",
"title": "foobar",
"sort": 102,
"categories": [
"28554568",
"28554577",
"28554578"
]
}
To get the resultset I filter and sort like this:
POST /products/_search
{
"filter": {
"term": {
"categories": "28554666"
}
},
"sort" : [
{ "sort" : {"order" : "asc"}}
]
}
However, as I have now learned, the requirement is that the product sorting depends on the category. Looking at the example above, this means that I need to add a different sort value for each value in the categories array, and depending on the category that I filter by I want to sort by the corresponding sort value.
The document should look something like this:
PUT /products_test/product/1
{
"id": "1",
"title": "foobar",
"categories": [
{ "id": "28554568", "sort": "102" },
{ "id": "28554577", "sort": "482" },
{ "id": "28554578", "sort": "2" }
]
}
My query should then be able to sort something like this:
POST /products/_search
{
"filter": {
"term": {
"categories.id": "28554666"
}
},
"sort" : [
{ "categories.{filtered_category_id}.sort" : {"order" : "asc"}}
]
}
Is it somehow possible to accomplish this?
To achieve this, you will have to store your categories as nested documents. If not, Elasticsearch will not know what sort is associated with what category ID.
Then, you will have to sort on the nested documents, by also filtering to choose the right one.
Here's a runnable example you can play with: https://www.found.no/play/gist/47282a07414e1432de6d
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"mappings": {
"type": {
"properties": {
"categories": {
"type": "nested"
}
}
}
}
}'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"id":1,"title":"foobar","categories":[{"id":"28554568","sort":102},{"id":"28554577","sort":482},{"id":"28554578","sort":2}]}
{"index":{"_index":"play","_type":"type"}}
{"id":2,"title":"barbaz","categories":[{"id":"28554577","sort":0}]}
'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"nested": {
"path": "categories",
"query": {
"term": {
"categories.id": {
"value": 28554577
}
}
}
}
},
"sort": {
"categories.sort": {
"order": "asc",
"nested_filter": {
"term": {
"categories.id": 28554577
}
}
}
}
}
'
