How to add suggestion inside term query in DSL - elasticsearch

My doc is below (the # in the addresses appears to stand for @ after email obfuscation):
[
  {'id': 1, 'name': 'sachin messi', 'description': 'football@football.com', 'type': 'football', 'var': 'sports'},
  {'id': 2, 'name': 'lionel messi', 'description': 'messi@fifa.com', 'type': 'soccer', 'var': 'sports'},
  {'id': 3, 'name': 'sachin', 'description': 'was', 'type': 'cricket', 'var': 'sports'}
]
I need the suggestion to be returned only when the term query matches.
My DSL query is below:
query = {
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "var.keyword": [
                            "notsports"
                        ]
                    }
                },
                {
                    "query_string": {
                        "query": "schin",
                        "fields": [
                            "name^128",
                            "description^64"
                        ]
                    }
                }
            ]
        }
    },
    "suggest": {
        "my-suggestion": {
            "text": "schin",
            "term": {
                "field": "name",
                "prefix_length": 0,
                "min_word_length": 3,
                "string_distance": "ngram"
            }
        }
    }
}
My var.keyword value is notsports, so the query matches nothing, but I still get a suggestion back:
'suggest': {'my-suggestion': [{'text': 'schin', 'offset': 0, 'length': 5, 'options': [{'text': 'sachin', 'score': 0.75, 'freq': 1}]}]}
When I tried to put suggest inside the terms list I got RequestError: RequestError(400, 'x_content_parse_exception', 'unknown query [suggest]').
I need to get the suggestion only if var.keyword matches sports.
I have also asked this question on the Elastic forum: https://discuss.elastic.co/t/how-to-add-suggestion-inside-term-query-in-dsl/309893
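The suggest section runs independently of the query, which is why a suggestion comes back even when the terms filter matches nothing. One workaround (a sketch, not from the original thread) is to issue the suggest request only after checking that the filtered query returned hits; the helper below shows that client-side check against a response dict:

```python
def should_run_suggest(response):
    """Return True only if the filtered query matched at least one doc.

    Handles both response shapes: ES 7+ ({'total': {'value': N}})
    and earlier versions ({'total': N}).
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):  # ES 7+ shape
        total = total["value"]
    return total > 0

# Example: a response where the terms filter matched nothing
empty = {"hits": {"total": {"value": 0}, "hits": []}}
# Example: a response with one match
matched = {"hits": {"total": {"value": 1}, "hits": [{"_id": "3"}]}}

print(should_run_suggest(empty))    # False -> skip the suggest request
print(should_run_suggest(matched))  # True  -> now send the suggest body
```

In practice you would run the bool query first without the suggest section, and only send a second request containing suggest when this check passes.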


How to sort with case insensitive without changing the settings

My index name is data_new
Below is the code to insert the data into the index:
test = [
    {'id': 1, 'name': 'A', 'professor': ['Bill Cage', 'accounting']},
    {'id': 2, 'name': 'AB', 'professor': ['Gregg Payne', 'engineering']},
    {'id': 3, 'name': 'a', 'professor': ['Bill Cage', 'accounting']},
    {'id': 4, 'name': 'Tax Accounting 200', 'professor': ['Thomas Baszo', 'finance']},
    {'id': 5, 'name': 'Capital Markets 350', 'professor': ['Thomas Baszo', 'finance']},
    {'id': 6, 'name': 'Theatre 410', 'professor': ['Sebastian Hern', 'art']},
    {'id': 7, 'name': 'Accounting 101', 'professor': ['Thomas Baszo', 'finance']},
    {'id': 8, 'name': 'Marketing 101', 'professor': ['William Smith', 'finance']},
    {'id': 8, 'name': 'Anthropology 230', 'professor': ['Devin Cranford', 'history']},
    {'id': 10, 'name': 'Computer Science 101', 'professor': ['Gregg Payne', 'engineering']}
]

from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.create(index='data_new', ignore=400)
for e in test:
    es.index(index="data_new", body=e, id=e['id'])

search = es.search(index="data_new", body={"from": 0, "size": 2, "query": {"match_all": {}}})
search['hits']['hits']
Right now I get:
[{'id': 1, 'name': 'A'},
 {'id': 2, 'name': 'AB'},
 {'id': 3, 'name': 'a'}]
Expected is the below order:
[{'id': 1, 'name': 'A'},
 {'id': 3, 'name': 'a'},
 {'id': 2, 'name': 'AB'}]
For the input ["a", "b", "B", "C", "c", "A"]
the result is: ["A", "B", "C", "a", "b", "c"]
I want the output as ["A", "a", "B", "b", "C", "c"]
My expected output: I need to sort the results by name only, case-insensitively. I need to normalise the name keyword and sort on it.
How do I modify search = es.search(index="data_new", body={"from": 0, "size": 2, "query": {"match_all": {}}}) to do this?
I have updated the code as below, adding "normalizer": "case_insensitive" to the sort:
search = es.search(index="data_new", body={"sort": [{"name.keyword": {"order": "asc", "normalizer": "case_insensitive"}}], "size": 1000, "query": {"query_string": {"query": "A"}}})
I got the error
RequestError: RequestError(400, 'x_content_parse_exception', '[1:41] [field_sort] unknown field [normalizer]')
In order to do this you will have to use a script-based sort that calls toLowerCase() on the field value:
https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-sort-context.html
You can find another post which talks about it:
Script-based sorting on Elasticsearch date field
And a good article with an example here:
https://qbox.io/blog/how-to-painless-scripting-in-elasticsearch
The code will look like this (not tested):
{
    "query": {
        "match_all": {}
    },
    "sort": {
        "_script": {
            "type": "string",
            "order": "asc",
            "script": {
                "lang": "painless",
                "inline": "doc['name.keyword'].value.toLowerCase()"
            }
        }
    }
}
Note: this is bad practice and you should do it only for a one-shot query. If you want your application to stay healthy, you should implement the solution suggested by saeednasehi. You can also use index sorting to be more performant.
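The script sort above orders documents by the lowercased value of name.keyword; the effect is the same as sorting the names in Python with a lowercase key (a client-side illustration only, not a replacement for the server-side sort):

```python
# Case-insensitive ordering, equivalent in effect to the painless
# script sort on doc['name.keyword'].value.toLowerCase()
names = ["a", "b", "B", "C", "c", "A"]

# Plain sort is code-point based: all uppercase sorts before lowercase
print(sorted(names))                 # ['A', 'B', 'C', 'a', 'b', 'c']

# Sorting on the lowercased key groups letters case-insensitively;
# Python's sort is stable, so equal keys keep their input order
print(sorted(names, key=str.lower))  # ['a', 'A', 'b', 'B', 'C', 'c']
```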
In order to use a normalizer, you need to define it in your mapping; you cannot pass it as an argument in your search. In your case, you need two fields to sort on. I did this by copying the data to another field: the first field has a lowercase normalizer and the other one does not.
PUT /test_index/
{
    "settings": {
        "analysis": {
            "normalizer": {
                "myLowercase": {
                    "type": "custom",
                    "filter": ["lowercase"]
                }
            }
        }
    },
    "mappings": {
        "post": {
            "properties": {
                "name": {
                    "type": "keyword",
                    "normalizer": "myLowercase",
                    "copy_to": ["name2"]
                },
                "name2": {
                    "type": "keyword"
                }
            }
        }
    }
}
And your query would be something like this:
GET test_index/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "name": {
                "order": "asc"
            }
        },
        {
            "name2": {
                "order": "asc"
            }
        }
    ]
}
This is the mapping and settings you need for the name field in your indices; add your other fields to the mapping as well. Note that this is for Elasticsearch versions below 7: if you use Elasticsearch 7, remove the doc type (named post here) from the mapping.
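The same index definition can be created from the Python client. Below is a sketch for Elasticsearch 7+ (so without the post doc type), assuming the index name test_index:

```python
# Index body matching the mapping above, in ES 7+ form (no doc type)
body = {
    "settings": {
        "analysis": {
            "normalizer": {
                "myLowercase": {"type": "custom", "filter": ["lowercase"]}
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "keyword",
                "normalizer": "myLowercase",
                "copy_to": ["name2"],
            },
            "name2": {"type": "keyword"},
        }
    },
}

# Creating the index requires a running cluster:
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# es.indices.create(index="test_index", body=body)
print(body["mappings"]["properties"]["name"]["normalizer"])  # myLowercase
```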

Aggregation value inside array of array elasticsearch

I have a JSON structure like this:
[
    {
        'id': 1,
        'result': [{
            "score": 0.0,
            "result_rules": [
                {"rule_id": "sr-1"},
                {"rule_id": "sr-2"}
            ]
        }]
    },
    {
        'id': 2,
        'result': [{
            "score": 0.0,
            "result_rules": [
                {"rule_id": "sr-1"},
                {"rule_id": "sr-4"}
            ]
        }]
    }
]
I want to count occurrences of each rule_id, so the result would be:
[
    {'rule_id': 'sr-1', 'doc_count': 2},
    {'rule_id': 'sr-2', 'doc_count': 1},
    {'rule_id': 'sr-4', 'doc_count': 1}
]
I've tried the aggregation below, but it returns an empty aggregation:
{
    "aggs": {
        "group_by_rule_id": {
            "terms": {
                "field": "result.result_rules.rule_id.keyword"
            }
        }
    }
}
For an aggregation on a nested structure you have to map the field as nested and use a nested aggregation.
See the example in the ES docs.
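For reference, the counts that the nested terms aggregation should produce can be reproduced client-side. A sketch over the sample documents above:

```python
from collections import Counter

docs = [
    {'id': 1, 'result': [{'score': 0.0, 'result_rules': [{'rule_id': 'sr-1'}, {'rule_id': 'sr-2'}]}]},
    {'id': 2, 'result': [{'score': 0.0, 'result_rules': [{'rule_id': 'sr-1'}, {'rule_id': 'sr-4'}]}]},
]

# Walk the nested arrays and count each rule_id occurrence
counts = Counter(
    rule['rule_id']
    for doc in docs
    for result in doc['result']
    for rule in result['result_rules']
)

buckets = [{'rule_id': k, 'doc_count': v} for k, v in sorted(counts.items())]
print(buckets)
# [{'rule_id': 'sr-1', 'doc_count': 2}, {'rule_id': 'sr-2', 'doc_count': 1}, {'rule_id': 'sr-4', 'doc_count': 1}]
```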

Date math in elastic watcher email

I would like to find the datetime for 1 day ago so that I can create a link to Kibana in an email sent from the watcher. I am using Elasticsearch 5.0.2.
I've tried the watch below but it returns this error:
ScriptException[runtime error]; nested: IllegalArgumentException[Unable to find dynamic method [minusDays] with [1] arguments for class [org.joda.time.DateTime].];
minusDays does exist in the Joda DateTime spec, but it doesn't exist in the Elastic codebase.
Here's the watch:
PUT /_xpack/watcher/watch/errors-prod
{
    "trigger": {
        "schedule": {
            "daily": {
                "at": ["08:36"]
            }
        }
    },
    "input": {
        "search": {
            "request": {
                "search_type": "query_then_fetch",
                "indices": [
                    "<das-logstash-{now}>",
                    "<das-logstash-{now-1d}>"
                ],
                "types": ["redis-input"],
                "body": {
                    "size": 0,
                    "query": {
                        "match_all": {}
                    }
                }
            }
        }
    },
    "actions": {
        "send_email": {
            "transform": {
                "script": "return [ 'from' : ctx.trigger.scheduled_time.minusDays(1) ]"
            },
            "email": {
                "profile": "standard",
                "from": "noreply@email.com",
                "to": ["me@email.com"],
                "subject": "errors",
                "body": {
                    "html": "<html><body><p>from {{ctx.payload.from}}</p><p>to {{ctx.trigger.scheduled_time}}</p></body></html>"
                }
            }
        }
    }
}
I needed something similar and was able to hack this together by modifying a comment that almost worked from an elastic forum.
"transform": {
"script" : {
"source" : "def payload = ctx.payload; DateFormat df = new SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\"); ctx.payload.from = df.format(Date.from(Instant.ofEpochMilli(ctx.execution_time.getMillis() - (24 * 60 * 60 * 1000) ))); return payload"
}
},
Hope that helps!
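That transform simply subtracts 24 hours' worth of milliseconds from the execution time and formats the result. The same arithmetic in Python, as a quick check of what the script produces (second-precision input assumed, so the millisecond part is fixed at .000):

```python
from datetime import datetime, timedelta

def one_day_before(execution_time):
    """Mirror the painless transform: subtract one day and format as ISO-8601 with 'Z'."""
    return (execution_time - timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%S.000Z")

# Example: a watch firing at 08:36 UTC on 2017-01-02
print(one_day_before(datetime(2017, 1, 2, 8, 36, 0)))  # 2017-01-01T08:36:00.000Z
```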

How to get latest values for each group with an Elasticsearch query?

I have some documents indexed on Elasticsearch, looking like these samples:
{'country': 'France', 'collected': '2015-03-12', 'value': 20}
{'country': 'Canada', 'collected': '2015-03-12', 'value': 21}
{'country': 'Brazil', 'collected': '2015-03-12', 'value': 33}
{'country': 'France', 'collected': '2015-02-01', 'value': 10}
{'country': 'Canada', 'collected': '2015-02-01', 'value': 11}
{'country': 'Mexico', 'collected': '2015-02-01', 'value': 9}
...
I want to build a query that gets one result per country, getting only the ones with max(collected).
So, for the examples shown above, the results would be something like:
{'country': 'France', 'collected': '2015-03-12', 'value': 20}
{'country': 'Canada', 'collected': '2015-03-12', 'value': 21}
{'country': 'Brazil', 'collected': '2015-03-12', 'value': 33}
{'country': 'Mexico', 'collected': '2015-02-01', 'value': 9}
I realized I need to do aggregation on country, but I'm failing to understand how to limit the results on max(collected).
Any ideas?
You can use a top_hits aggregation that groups on the country field, returns 1 doc per group, and orders the docs by the collected date descending:
POST /test/_search?search_type=count
{
    "aggs": {
        "group": {
            "terms": {
                "field": "country"
            },
            "aggs": {
                "group_docs": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "collected": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}
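As a sanity check, the result this aggregation returns (the latest document per country) can be reproduced client-side over the sample documents from the question. A sketch:

```python
docs = [
    {'country': 'France', 'collected': '2015-03-12', 'value': 20},
    {'country': 'Canada', 'collected': '2015-03-12', 'value': 21},
    {'country': 'Brazil', 'collected': '2015-03-12', 'value': 33},
    {'country': 'France', 'collected': '2015-02-01', 'value': 10},
    {'country': 'Canada', 'collected': '2015-02-01', 'value': 11},
    {'country': 'Mexico', 'collected': '2015-02-01', 'value': 9},
]

# Keep, per country, the doc with the max 'collected' date
# (ISO yyyy-mm-dd dates compare correctly as strings)
latest = {}
for doc in docs:
    current = latest.get(doc['country'])
    if current is None or doc['collected'] > current['collected']:
        latest[doc['country']] = doc

for doc in latest.values():
    print(doc)
```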
For those like user1892775 who run into "Fielddata is disabled on text fields by default...", you can create a multi-field (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html). So you might have a mapping like:
"mapping": {
    "properties": {
        "country": {
            "type": "string",
            "fields": {
                "raw": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}
Then your query would look like:
POST /test/_search?search_type=count
{
    "aggs": {
        "group": {
            "terms": {
                "field": "country.raw"
            },
            "aggs": {
                "group_docs": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "collected": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}
(Note the use of country.raw.)
The answer marked correct worked great for me. Here is how I added some extra filters. This is version 7.4 on AWS.
The field I'm grouping by is a keyword field named tags.
For each group (tag), get the top 3 documents sorted by date_uploaded descending.
Also show the total number of documents within each group (tag).
Only consider non-deleted documents belonging to user 22.
Only return 10 groups (tags), sorted alphabetically.
For each document, return its ID (book_id) and date_uploaded. (The default is that all fields are returned.)
"size": 0 keeps the query from returning info about all the individual documents.
{
    "query": {
        "bool": {
            "filter": [
                {"terms": {"user_id": [22]}},
                {"terms": {"deleted": ["false"]}}
            ]
        }
    },
    "size": 0,
    "aggs": {
        "group": {
            "terms": {
                "field": "tags.keyword",
                "size": 10,
                "order": {"_key": "asc"}
            },
            "aggs": {
                "group_docs": {
                    "top_hits": {
                        "size": 3,
                        "_source": ["book_id", "date_uploaded"],
                        "sort": [{"date_uploaded": {"order": "desc"}}]
                    }
                }
            }
        }
    }
}
Here is how to get each group (tag in my case) and the document matches for each group:
query_results = ...  # result of the query
buckets = query_results["aggregations"]["group"]["buckets"]
for bucket in buckets:
    tag = bucket["key"]
    tag_doc_count = bucket["doc_count"]
    print(tag, tag_doc_count)
    tag_hits = bucket["group_docs"]["hits"]["hits"]
    for hit in tag_hits:
        source = hit["_source"]
        print(source["book_id"], source["date_uploaded"])
FYI, the "group" term can be named anything. Just make sure to use the same name when getting buckets from your query results.

Using multi-match query in a multi-field does not work

Our system stores accounts in the following format: acct:username@domain
But for many searches we only need the username, so for the user-created memos I've decided to make the user field a multi_field like this:
{
    'text': {
        'type': 'string'
    },
    'user': {
        'type': 'multi_field',
        'path': 'just_name',
        'fields': {
            'user': {
                'type': 'string',
                'index': 'analyzed',
                'analyzer': 'lower_keyword'
            },
            'username': {
                'type': 'string',
                'index': 'analyzed',
                'analyzer': 'username'
            }
        }
    }
}
and other settings:
__settings__ = {
    'analysis': {
        'tokenizer': {
            'username': {
                'type': 'pattern',
                'group': 1,
                'pattern': '^acct:(.+)@.*$'
            }
        },
        'analyzer': {
            'lower_keyword': {
                'type': 'custom',
                'tokenizer': 'keyword',
                'filter': 'lowercase'
            },
            'username': {
                'tokenizer': 'username',
                'filter': 'lowercase'
            }
        }
    }
}
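The pattern tokenizer with group: 1 emits only the capture group, i.e. the username between acct: and the domain separator. A quick client-side sketch of what that capture does, using Python's re (assuming @ as the separator in the stored format):

```python
import re

# Same capture as the 'username' tokenizer pattern: grab everything
# between 'acct:' and the separator before the domain
PATTERN = re.compile(r'^acct:(.+)@.*$')

def extract_username(account):
    match = PATTERN.match(account)
    # .lower() mirrors the analyzer's lowercase filter
    return match.group(1).lower() if match else None

print(extract_username('acct:TestUser@testdomain'))  # testuser
print(extract_username('testuser'))                  # None: no token emitted
```

The None case for a bare string like 'testuser' is exactly what causes the problem described below: analyzed query text that does not match the pattern produces zero tokens.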
Now, if I make a query for the username it works. I.e. if I have the user acct:testuser@testdomain
and I make a query like this:
{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "username": ["testuser"]
                    }
                }
            ],
            "minimum_number_should_match": 1
        }
    },
    "size": 50
}
It works (I know it could be done more simply, but this is a system-generated query).
But I need to make searches which look for a string in both the text and the username fields.
I've decided to use a multi-match query for this:
{
    "query": {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "operator": "and",
                        "query": "testuser",
                        "type": "cross_fields",
                        "fields": [
                            "text",
                            "username"
                        ]
                    }
                }
            ],
            "minimum_number_should_match": 1
        }
    },
    "size": 50
}
Now the problem is that this query does not work for the username field. It does for the text field, and for other fields if I include them, but it does not bring back any results for the username field.
Can you help me figure out what I am doing wrong?
I had forgotten that the username analyzer would also tokenize my search text for match/multi_match queries. That way the string 'testuser' was analyzed and generated zero tokens.
So the solution is to change the username field's mapping to:
'username': {
    'type': 'string',
    'index': 'analyzed',
    'index_analyzer': 'username',
    'search_analyzer': 'lower_keyword'
}
and now both queries work.
