Elasticsearch suggestion scoring not working with fuzzy search - elasticsearch

When I run the following Elasticsearch query to fetch autocomplete data, the results I receive are not relevant and scoring does not seem to work:
GET quick_search/_search
{
"suggest": {
"name-suggest": {
"text": "Clic",
"completion": {
"field": "Name",
"size": 25,
"skip_duplicates": true,
"fuzzy" : {
"fuzziness": 1,
"prefix_length": 1,
"min_length": 4,
"unicode_aware": true
}
}
}
}
}
The search text is "Clic", but the fuzzy search does not return the most relevant data first. How can I boost my results so that terms such as "CLIC7000" rank highest, since they are more relevant to my query than "CLI36"?
{
"took" : 706,
"timed_out" : false,
"_shards" : {
"total" : 15,
"successful" : 15,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : 0.0,
"hits" : [ ]
},
"suggest" : {
"name-suggest" : [
{
"text" : "Clic",
"offset" : 0,
"length" : 4,
"options" : [
{
"text" : "CLI36",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "330719",
"_score" : 3.0,
"_source" : {
"ID" : "330719",
"Name" : "CLI36"
}
},
{
"text" : "CLI361511B001",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "330717",
"_score" : 3.0,
"_source" : {
"ID" : "330717",
"Name" : "CLI361511B001"
}
},
{
"text" : "CLI42C6385B001",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185340",
"_score" : 3.0,
"_source" : {
"ID" : "185340",
"Name" : "CLI42C6385B001"
}
},
{
"text" : "CLI42PM",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185345",
"_score" : 3.0,
"_source" : {
"ID" : "185345",
"Name" : "CLI42PM"
}
},
{
"text" : "CLI42PM6389B001",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185343",
"_score" : 3.0,
"_source" : {
"ID" : "185343",
"Name" : "CLI42PM6389B001"
}
},
{
"text" : "CLI441",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "233554",
"_score" : 3.0,
"_source" : {
"ID" : "233554",
"Name" : "CLI441"
}
},
{
"text" : "CLI451BK",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185334",
"_score" : 3.0,
"_source" : {
"ID" : "185334",
"Name" : "CLI451BK"
}
},
{
"text" : "CLI451BK6523B001",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185332",
"_score" : 3.0,
"_source" : {
"ID" : "185332",
"Name" : "CLI451BK6523B001"
}
},
{
"text" : "CLI451C",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185331",
"_score" : 3.0,
"_source" : {
"ID" : "185331",
"Name" : "CLI451C"
}
}
]
}
]
}
}
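When every option comes back with the same _score, as in the response above, one possible workaround is to re-rank the options on the client. The following is a minimal sketch, not an Elasticsearch feature: the helper names are hypothetical, and the "CLIC7000" option with a score of 3.0 is an assumed example based on the term mentioned in the question.

```python
def common_prefix_len(a, b):
    """Length of the shared, case-insensitive prefix of a and b."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n


def rerank(query, options):
    """Order suggester options by shared prefix length with the query,
    then by the score returned by Elasticsearch, then by shorter text."""
    return sorted(
        options,
        key=lambda o: (
            -common_prefix_len(query, o["text"]),
            -o.get("_score", 0.0),
            len(o["text"]),
        ),
    )


if __name__ == "__main__":
    # Hypothetical options shaped like the entries under "options" in the response.
    options = [
        {"text": "CLI36", "_score": 3.0},
        {"text": "CLI441", "_score": 3.0},
        {"text": "CLIC7000", "_score": 3.0},
    ]
    for option in rerank("Clic", options):
        print(option["text"])
```

This only reorders the options that were returned, so "size" in the suggester still has to be large enough for the wanted terms to be in the response at all.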

Related

I need help for a query elasticsearch

I need help with a query.
Here is my query and a sample of the results:
GET /product/_search
{
"query": {
"bool" : {
"must" : {
"multi_match" : {
"query": "Torsades",
"fields": [ "ean^10", "name^4", "brand" ]
}
}
}
}
}
[
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "1",
"_score" : 13.78764,
"_source" : {
"country" : 1,
"ean" : "3250391967858",
"name" : "Torsades Semi-complètes BIO - 500G",
"brand" : "Fiorini"
}
},
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "74",
"_score" : 13.78764,
"_source" : {
"country" : null,
"ean" : "3564700009826",
"name" : "Pâtes Torsades - Turini - 500 g",
"brand" : "Turini"
}
},
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "78",
"_score" : 11.964245,
"_source" : {
"country" : null,
"ean" : "3250391967858",
"name" : "Torsades Semi-complètes BIO - 500G - ITM BENCHMARK",
"brand" : "Fiorini"
}
}
]
I need a specific condition and I can't find a solution.
I want:
ALL products for country=1 AND (ALL products for country=null MINUS products whose EAN already exists for country=1)
In my sample, I want to get 2 hits.
This hit should be removed because its EAN already exists for country=1:
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "78",
"_score" : 11.964245,
"_source" : {
"country" : null,
"ean" : "3250391967858",
"name" : "Torsades Semi-complètes BIO - 500G - ITM BENCHMARK",
"brand" : "Fiorini"
}
}
Does anyone have a solution?
EDIT:
I want this result:
[
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "1",
"_score" : 13.78764,
"_source" : {
"country" : 1,
"ean" : "3250391967858",
"name" : "Torsades Semi-complètes BIO - 500G",
"brand" : "Fiorini"
}
},
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "74",
"_score" : 13.78764,
"_source" : {
"country" : null,
"ean" : "3564700009826",
"name" : "Pâtes Torsades - Turini - 500 g",
"brand" : "Turini"
}
}
]
Have you tried field collapsing? It returns only the top-ranked hit for each distinct value of the collapse field (here ean.keyword):
GET test/_search
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "Torsades",
"fields": [
"ean^10",
"name^4",
"brand"
]
}
}
}
},
"collapse": {
"field": "ean.keyword"
}
}
Response:
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5611319,
"_source" : {
"country" : 1,
"ean" : "3250391967858",
"name" : "Torsades Semi-complètes BIO - 500G",
"brand" : "Fiorini"
},
"fields" : {
"ean.keyword" : [
"3250391967858"
]
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.5611319,
"_source" : {
"country" : null,
"ean" : "3564700009826",
"name" : "Pâtes Torsades - Turini - 500 g",
"brand" : "Turini"
},
"fields" : {
"ean.keyword" : [
"3564700009826"
]
}
}
]
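Note that collapsing keeps whichever hit ranks first for each EAN, so it does not by itself guarantee that the country=1 document wins, and it would also merge two country=1 products that share an EAN. As a rough illustration of the rule stated in the question, here is a client-side sketch that applies it to the hits of a search response. The function name is hypothetical and the approach assumes the whole result set fits in one response.

```python
def prefer_country(hits, country=1):
    """Keep every hit for `country`, plus hits with country=None
    whose EAN is not already present among the `country` hits."""
    covered = {
        h["_source"]["ean"]
        for h in hits
        if h["_source"].get("country") == country
    }
    kept = []
    for h in hits:
        src = h["_source"]
        if src.get("country") == country:
            kept.append(h)
        elif src.get("country") is None and src["ean"] not in covered:
            kept.append(h)
    return kept


if __name__ == "__main__":
    # The three hits from the question, reduced to the relevant fields.
    hits = [
        {"_id": "1", "_source": {"country": 1, "ean": "3250391967858"}},
        {"_id": "74", "_source": {"country": None, "ean": "3564700009826"}},
        {"_id": "78", "_source": {"country": None, "ean": "3250391967858"}},
    ]
    print([h["_id"] for h in prefer_country(hits)])
```

The original order of the hits is preserved, so the relevance ranking from Elasticsearch is not disturbed.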

How do I apply reindex to new data values through filters?

This is the search output for my basic data (example):
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"@version" : "1",
"message" : "hello",
"@timestamp" : "2021-05-13T02:50:05.962Z"
}
},
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"@version" : "1",
"message" : "python",
"@timestamp" : "2021-05-13T02:50:05.947Z"
}
First, out of the various fields, I extracted only the message values (code example below):
GET 0513_final_test_instgram/_search?_source=message&filter_path=hits.hits._source
{
"hits" : {
"hits" : [
{
"_source" : {
"message" : "hello"
}
},
{
"_source" : {
"message" : "python"
}
I learned that the reindex API can copy documents into a new index:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
However, I still don't understand how to do this, even after reading the documentation.
The code I tried (0513):
POST _reindex
{
"source": {
"index": "0513_final_test_instgram"
},
"dest": {
"index": "new_data_index"
}
}
How do I use reindex to store only the extracted message values in a new index?
Update: I tried the suggestion from the comments.
Output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"message" : "hello"
}
},
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"message" : "python"
}
}
You simply need to specify which fields you want to reindex into the new index:
POST _reindex
{
"source": {
"index": "0513_final_test_instgram",
"_source": ["message"]
},
"dest": {
"index": "new_data_index"
}
}
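To make the effect of "_source": ["message"] concrete, here is a small sketch that builds the same request body and mimics, on a plain dictionary, the projection the reindex request applies to each document. Both function names are hypothetical, and the projection is only a local approximation for top-level field names.

```python
import json


def build_reindex_body(source_index, dest_index, fields):
    """Build a _reindex request body that copies only `fields`."""
    return {
        "source": {"index": source_index, "_source": list(fields)},
        "dest": {"index": dest_index},
    }


def project(doc, fields):
    """Local approximation of source filtering for top-level fields."""
    return {k: v for k, v in doc.items() if k in fields}


if __name__ == "__main__":
    body = build_reindex_body("0513_final_test_instgram", "new_data_index", ["message"])
    print(json.dumps(body, indent=2))

    doc = {"host": "DESKTOP-7MDCA36", "message": "hello"}
    print(project(doc, ["message"]))
```

The printed body can be sent with POST _reindex as in the request above.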

How do I extract the "message" value from elasticsearch? (DSL)

Here is what I tried. I want to extract only the "message" values:
GET 0503instgram_csv/_search
{
"query" : {
"query_string": {
"default_field": "message",
"query": ?????????????
}
}
}
I want to save all of the "message" values that are printed out as new field values so that I can process the data.
I'd really appreciate your help.
@ESCoder
This is the result of the attempt you suggested:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 281,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "0503instgram_csv",
"_type" : "_doc",
"_id" : "zbKQMXkB98wUkKJOL8dT",
"_score" : 1.0,
"_source" : {
"message" : "\"lovablepoetree"
}
},
{
"_index" : "0503instgram_csv",
"_type" : "_doc",
"_id" : "zrKQMXkB98wUkKJOL8dU",
"_score" : 1.0,
"_source" : {
"message" : """내가 아는 사람 중에 최고 셀럽(#hanstar.kim)과 맥주 마심🍻셀럽과 술이라니....! 성공해따 나자신!!!!!! 그래놓고 사진은 나 혼자 찍어따^.^ 다음번엔 투샷을 찍어보쟈......🥰"""
}
},
{
"_index" : "0503instgram_csv",
"_type" : "_doc",
"_id" : "z7KQMXkB98wUkKJOL8dU",
"_score" : 1.0,
"_source" : {
"message" : """🔥🔥..열정을 응원 합니다. 도대현 드림❤️❤️❤️"""
}
},
{
"_index" : "0503instgram_csv",
"_type" : "_doc",
"_id" : "0LKQMXkB98wUkKJOL8dU",
"_score" : 1.0,
"_source" : {
"message" : "lovablepoetree"
}
},
{
"_index" : "0503instgram_csv",
"_type" : "_doc",
"_id" : "0bKQMXkB98wUkKJOL8dj",
"_score" : 1.0,
"_source" : {
"message" : """좋아요 69개
"""
}
},
{
"_index" : "0503instgram_csv",
"_type" : "_doc",
"_id" : "0rKQMXkB98wUkKJOL8dj",
"_score" : 1.0,
"_source" : {
"message" : """['paulchang1103,Paul Chang (장준성)🇰🇷,passionated_man,도대현,luv_____juju,쥬쥬,koonge01,영이,p.s.j___5959,또둔,panchitoyoon,Ye Suk Yoon,hyeriiing__,혤,sunny.gibbab,써니네식탁(sunny Gib-bab),_wjstn_ry_02,전수교(20),sungwoon_jinsik,윤성운🇰🇷,t_a_k_3014,🎀케이,팔로우']
"""
}
},
{
"_index" : "0503instgram_csv",
"_type" : "_doc",
"_id" : "07KQMXkB98wUkKJOL8dj",
"_score" : 1.0,
"_source" : {
"message" : "passionated_man"
}
},
{
"_index" : "0503instgram_csv",
"_type" : "_doc",
"_id" : "1LKQMXkB98wUkKJOL8dj",
"_score" : 1.0,
"_source" : {
"message" : "1일"
}
},
{
"_index" : "0503instgram_csv",
"_type" : "_doc",
"_id" : "1bKQMXkB98wUkKJOL8dj",
"_score" : 1.0,
"_source" : {
"message" : """🔥🔥..열정을 응원 합니다. 도대현 드림❤️❤️❤️"""
}
},
{
"_index" : "0503instgram_csv",
"_type" : "_doc",
"_id" : "1rKQMXkB98wUkKJOL8dj",
"_score" : 1.0,
"_source" : {
"message" : """1일답글 달기","lovablepoetree"""
}
}
]
}
}
You can use source filtering to return only the message field values:
GET /_search
{
"_source": [
"message"
]
}
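Once the response contains only the message field, pulling the values out on the client is a short loop. A minimal sketch, assuming the response has already been parsed into a Python dictionary (the function name is hypothetical):

```python
def extract_messages(response):
    """Return the message value of every hit that has one."""
    hits = response.get("hits", {}).get("hits", [])
    return [
        hit["_source"]["message"]
        for hit in hits
        if "message" in hit.get("_source", {})
    ]


if __name__ == "__main__":
    # Two hits taken from the output shown in the question.
    response = {
        "hits": {
            "hits": [
                {"_id": "0LKQMXkB98wUkKJOL8dU", "_source": {"message": "lovablepoetree"}},
                {"_id": "07KQMXkB98wUkKJOL8dj", "_source": {"message": "passionated_man"}},
            ]
        }
    }
    print(extract_messages(response))
```

A search returns 10 hits by default, so collecting all 281 values would need a larger "size" or paging.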
Update 1:
You can use the reindex API to store only the values of the message field in a different index:
POST /_reindex
{
"source": {
"index": "old-index",
"_source": ["message"]
},
"dest": {
"index": "new-index"
}
}

How to fire a single query to fetch the following count of users by month till date in Elasticsearch

How can I run a single query that fetches the following user counts by month, up to the current date?
Index: user (list of all users in a company)
users who joined this month (month to date, i.e. in Nov)
users who joined the previous month (say Oct)
...
users who joined in the second month (say Feb)
users who joined in the first month (say Jan)
Is there a quick way to fetch all of this with a single query? I would like a single response that contains all of the information.
If I understood your issue correctly, I suggest using a date histogram aggregation together with a range query.
gte stands for greater than or equal to, while lte stands for less than or equal to.
I first restrict the documents to the range I need (the current month and the 11 months before it, which covers Jan 2020 to Nov 2020 in this case). On that result, I run an aggregation with an interval of one month.
I am assuming that one document is created in the index for each user.
I indexed the following data in my index:
"hits" : [
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "UPv_l3UBrH4n7Et0xLpD",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-01-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "qPv_l3UBrH4n7Et0-7r9",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-01-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "r_sAmHUBrH4n7Et0e7s-",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-09-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "s_sAmHUBrH4n7Et0gLsh",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-09-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "TfsAmHUBrH4n7Et0nrwS",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-02-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "-vsAmHUBrH4n7Et07Ly-",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-03-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "_vsAmHUBrH4n7Et09LyD",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-03-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "APsAmHUBrH4n7Et0972w",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-03-01"
}
}
]
The query I used:
GET month-count/_search
{
"query": {
"range": {
"timestamp": {
"gte": "now-11M/M",
"lte": "now/M"
}
}
},
"aggs": {
"get_Month": {
"date_histogram": {
"field": "timestamp",
"interval": "month"
}
}
}
}
The response:
"hits" : [
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "UPv_l3UBrH4n7Et0xLpD",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-01-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "qPv_l3UBrH4n7Et0-7r9",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-01-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "r_sAmHUBrH4n7Et0e7s-",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-09-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "s_sAmHUBrH4n7Et0gLsh",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-09-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "TfsAmHUBrH4n7Et0nrwS",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-02-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "-vsAmHUBrH4n7Et07Ly-",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-03-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "_vsAmHUBrH4n7Et09LyD",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-03-01"
}
},
{
"_index" : "month-count",
"_type" : "_doc",
"_id" : "APsAmHUBrH4n7Et0972w",
"_score" : 1.0,
"_source" : {
"timestamp" : "2020-03-01"
}
}
]
},
"aggregations" : {
"get_Month" : {
"buckets" : [
{
"key_as_string" : "2020-01-01T00:00:00.000Z",
"key" : 1577836800000,
"doc_count" : 2
},
{
"key_as_string" : "2020-02-01T00:00:00.000Z",
"key" : 1580515200000,
"doc_count" : 1
},
{
"key_as_string" : "2020-03-01T00:00:00.000Z",
"key" : 1583020800000,
"doc_count" : 3
},
{
"key_as_string" : "2020-04-01T00:00:00.000Z",
"key" : 1585699200000,
"doc_count" : 0
},
{
"key_as_string" : "2020-05-01T00:00:00.000Z",
"key" : 1588291200000,
"doc_count" : 0
},
{
"key_as_string" : "2020-06-01T00:00:00.000Z",
"key" : 1590969600000,
"doc_count" : 0
},
{
"key_as_string" : "2020-07-01T00:00:00.000Z",
"key" : 1593561600000,
"doc_count" : 0
},
{
"key_as_string" : "2020-08-01T00:00:00.000Z",
"key" : 1596240000000,
"doc_count" : 0
},
{
"key_as_string" : "2020-09-01T00:00:00.000Z",
"key" : 1598918400000,
"doc_count" : 2
}
]
}
}
I think the doc_count of each bucket is what you need.
Let me know if you need more help, I will be glad to assist.
Links:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
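For reading the result on the client, the buckets can be turned into a month-to-count mapping. A minimal sketch, assuming the response has been parsed into a dictionary and the aggregation is named get_Month as in the query above (the function name is hypothetical):

```python
def monthly_counts(response, agg_name="get_Month"):
    """Map 'YYYY-MM' to doc_count for each date_histogram bucket."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key_as_string"][:7]: b["doc_count"] for b in buckets}


if __name__ == "__main__":
    # The first three buckets from the response above.
    response = {
        "aggregations": {
            "get_Month": {
                "buckets": [
                    {"key_as_string": "2020-01-01T00:00:00.000Z", "key": 1577836800000, "doc_count": 2},
                    {"key_as_string": "2020-02-01T00:00:00.000Z", "key": 1580515200000, "doc_count": 1},
                    {"key_as_string": "2020-03-01T00:00:00.000Z", "key": 1583020800000, "doc_count": 3},
                ]
            }
        }
    }
    print(monthly_counts(response))
```

If only the counts are needed, adding "size": 0 to the search request leaves out the hits and returns just the aggregation.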

How to perform range searches on Height in elasticsearch

How should I represent Height and HeightRange in Elasticsearch so that it is easier to do range searches?
Height.java: int feet, int inches;
HeightRange.java: Height from, Height to
I want to search for users who fall in a certain range (say 5ft - 6ft)
If I understood your issue correctly, you may use a range query. I ran a local test in which I ingested the following data:
"hits" : [
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "OfCHdXUB1QlsTOLdRJgd",
"_score" : 1.0,
"_source" : {
"user" : "user1",
"height" : {
"feet" : 5,
"inch" : 8
}
}
},
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "CfCJdXUB1QlsTOLdEZxS",
"_score" : 1.0,
"_source" : {
"user" : "user2",
"height" : {
"feet" : 7,
"inch" : 9
}
}
},
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "CvCJdXUB1QlsTOLdEpx5",
"_score" : 1.0,
"_source" : {
"user" : "user3",
"height" : {
"feet" : 5,
"inch" : 6
}
}
},
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "C_CJdXUB1QlsTOLdE5yk",
"_score" : 1.0,
"_source" : {
"user" : "user4",
"height" : {
"feet" : 5,
"inch" : 8
}
}
},
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "T_CJdXUB1QlsTOLdFZwx",
"_score" : 1.0,
"_source" : {
"user" : "user5",
"height" : {
"feet" : 2,
"inch" : 3
}
}
}
]
The query I used to find heights between 5 and 6 feet:
"query": {
"range": {
"height.feet": {
"gte": 5,
"lte": 6
}
}
}
The gte is equivalent to greater than or equal to and the lte is equivalent to less than or equal to.
The results are:
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "OfCHdXUB1QlsTOLdRJgd",
"_score" : 1.0,
"_source" : {
"user" : "user1",
"height" : {
"feet" : 5,
"inch" : 8
}
}
},
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "CvCJdXUB1QlsTOLdEpx5",
"_score" : 1.0,
"_source" : {
"user" : "user3",
"height" : {
"feet" : 5,
"inch" : 6
}
}
},
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "C_CJdXUB1QlsTOLdE5yk",
"_score" : 1.0,
"_source" : {
"user" : "user4",
"height" : {
"feet" : 5,
"inch" : 8
}
}
}
]
}
Let me know if you have any issues, I will be glad to help :)
As per your request, if you need to combine both metrics, you may use a bool query:
"query": {
"bool": {
"must": [
{
"range": {
"height.feet": {
"gte": 5,
"lte": 6
}
}
},{
"range": {
"height.inch": {
"gte": 6,
"lte": 8
}
}
}
]
}
}
The response:
"hits" : [
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "OfCHdXUB1QlsTOLdRJgd",
"_score" : 2.0,
"_source" : {
"user" : "user1",
"height" : {
"feet" : 5,
"inch" : 8
}
}
},
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "CvCJdXUB1QlsTOLdEpx5",
"_score" : 2.0,
"_source" : {
"user" : "user3",
"height" : {
"feet" : 5,
"inch" : 6
}
}
},
{
"_index" : "height-index-array",
"_type" : "_doc",
"_id" : "C_CJdXUB1QlsTOLdE5yk",
"_score" : 2.0,
"_source" : {
"user" : "user4",
"height" : {
"feet" : 5,
"inch" : 8
}
}
}
]
Links:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
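One caveat with the two-clause bool query is that feet and inches are tested independently, so a height such as 5 ft 9 in is rejected by the inch clause even though it lies between 5 ft 6 in and 6 ft 8 in. A possible alternative, which is not part of the answer above, is to also index the height as a single number of inches and run one range query on it. In this sketch the field name height_in is a hypothetical choice, as are the function names:

```python
def to_inches(feet, inch):
    """Convert a feet/inches pair to total inches."""
    return feet * 12 + inch


def height_range_query(lo, hi, field="height_in"):
    """Build a range query on a total-inches field.
    `lo` and `hi` are (feet, inch) tuples; `field` is a hypothetical name."""
    return {
        "query": {
            "range": {field: {"gte": to_inches(*lo), "lte": to_inches(*hi)}}
        }
    }


def users_in_range(users, lo, hi):
    """Local check that mirrors the range query on sample documents."""
    low, high = to_inches(*lo), to_inches(*hi)
    return [
        u["user"]
        for u in users
        if low <= to_inches(u["height"]["feet"], u["height"]["inch"]) <= high
    ]


if __name__ == "__main__":
    # The five sample users from the test data above.
    users = [
        {"user": "user1", "height": {"feet": 5, "inch": 8}},
        {"user": "user2", "height": {"feet": 7, "inch": 9}},
        {"user": "user3", "height": {"feet": 5, "inch": 6}},
        {"user": "user4", "height": {"feet": 5, "inch": 8}},
        {"user": "user5", "height": {"feet": 2, "inch": 3}},
    ]
    print(height_range_query((5, 0), (6, 0)))
    print(users_in_range(users, (5, 0), (6, 0)))
```

With a single numeric field, a HeightRange maps directly onto the gte and lte bounds of one range query.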
