How to read the JSON output of a faceted search query? - elasticsearch

I have Movies that each belong to a genre and have multiple ratings. With Elasticsearch, I want to do a faceted search on Genres first, and then on Ratings.
I was reading about the idea here: http://www.elasticsearch.org/guide/reference/api/search/facets/
But I am confused about how to interpret the output of this curl query:
curl -X POST "http://localhost:9200/movies/_search?pretty=true" -d '
{
"query" : { "query_string" : {"query" : "T*"} },
"facets" : {
"categories" : { "terms" : {"field" : "categories"} }
}
}
'
{
"took" : 35,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
"_index" : "movies",
"_type" : "movie",
"_id" : "13",
"_score" : 1.0, "_source" : {"category_id":2,"created_at":"2013-05-03T16:40:21Z","description":null,"title":"Tiny Plastic Men","updated_at":"2013-05-03T16:40:21Z","user_id":null}
}, {
"_index" : "movies",
"_type" : "movie",
"_id" : "32",
"_score" : 1.0, "_source" : {"category_id":14,"created_at":"2013-05-03T16:55:02Z","description":null,"title":"The Extreme Truth","updated_at":"2013-05-03T16:55:02Z","user_id":null}
}, {
"_index" : "movies",
"_type" : "movie",
"_id" : "39",
"_score" : 1.0, "_source" : {"category_id":7,"created_at":"2013-05-03T16:55:02Z","description":null,"title":"A Time of Day","updated_at":"2013-05-03T16:55:02Z","user_id":null}
} ]
},
"facets" : {
"categories" : {
"_type" : "terms",
"missing" : 3,
"total" : 0,
"other" : 0,
"terms" : [ ]
}
}
}
I do have some movies whose titles start with a 'T', but I would additionally expect movies from the Genre/Category 'Thriller'.
So what can I read from the JSON above?

It seems your facet does not match any field in your documents. You should probably use:
curl -X POST "http://localhost:9200/movies/_search?pretty=true" -d '
{
"query" : { "query_string" : {"query" : "T*"} },
"facets" : {
"categories" : { "terms" : {"field" : "category_id"} }
}
}
'
Then you should get a list of category_id values and a count of the documents for each category_id.
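To make the facet block in the question easier to read, here is a small Python sketch that summarises it. The function name describe_terms_facet and the summary keys are made up for illustration; the field meanings follow how the terms facet reports its counts ("missing" is the number of matched documents with no value for the field, "other" covers values outside the returned terms).

```python
def describe_terms_facet(facet):
    """Summarise a terms-facet block taken from a search response."""
    return {
        # documents that matched the query but have no value in the faceted field
        "docs_without_field": facet["missing"],
        # field values counted across all matched documents
        "values_counted": facet["total"],
        # counted values that fall outside the returned top terms
        "values_outside_returned_terms": facet["other"],
        # term -> number of documents, for the terms that were returned
        "returned_terms": {entry["term"]: entry["count"] for entry in facet["terms"]},
    }


# The facet block from the question: all 3 hits lack a "categories" field.
facet = {"_type": "terms", "missing": 3, "total": 0, "other": 0, "terms": []}
summary = describe_terms_facet(facet)
print(summary)
```

With the output from the question, all three hits end up under "missing", which is consistent with the documents having a category_id field but no categories field.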

Facets are deprecated. See https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-facets.html
Better alternative is to use aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-aggregations.html
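As a rough sketch of what the same request could look like with aggregations, the Python snippet below only builds the request body. The helper name terms_aggregation_body is hypothetical, and the field category_id is taken from the answer above; adjust both to your own mapping.

```python
import json


def terms_aggregation_body(query_text, field, agg_name="categories"):
    """Build a search body that counts documents per value of `field`."""
    return {
        "query": {"query_string": {"query": query_text}},
        # "aggs" takes the place of the old "facets" section
        "aggs": {agg_name: {"terms": {"field": field}}},
    }


body = terms_aggregation_body("T*", "category_id")
print(json.dumps(body, indent=2))
```

The body would be sent to the same movies/_search endpoint. In the response the counts appear under aggregations.categories.buckets, each bucket carrying a key and a doc_count.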

Related

How do I apply reindex to new data values through filters?

This is an example of the base data (search output):
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"#version" : "1",
"message" : "hello",
"#timestamp" : "2021-05-13T02:50:05.962Z"
}
},
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"#version" : "1",
"message" : "python",
"#timestamp" : "2021-05-13T02:50:05.947Z"
}
First, out of the various fields, I extracted only the message values (request and output below).
GET 0513_final_test_instgram/_search?_source=message&filter_path=hits.hits._source
{
"hits" : {
"hits" : [
{
"_source" : {
"message" : "hello"
}
},
{
"_source" : {
"message" : "python"
}
I learned that reindex can copy documents into a new index.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
However, even after reading the documentation I do not understand how to use it for this.
Code from my 0513 attempt:
POST _reindex
{
"source": {
"index": "0513_final_test_instgram"
},
"dest": {
"index": "new_data_index"
}
}
How can I use reindex to store only the extracted message values in a new index?
Update: attempt based on the comment
Output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"message" : "hello"
}
},
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"message" : "python"
}
}
You simply need to specify which fields you want to reindex into the new index:
POST _reindex
{
"source": {
"index": "0513_final_test_instgram",
"_source": ["message"]
},
"dest": {
"index": "new_data_index"
}
}
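If you build the request from a script, the same body can be put together like this. It is only a sketch: the helper name reindex_body is made up, and actually sending the body (a POST to the _reindex endpoint) is left out.

```python
import json


def reindex_body(source_index, dest_index, fields):
    """Build a _reindex body that copies only the listed source fields."""
    return {
        # "_source" inside "source" limits which fields are copied
        "source": {"index": source_index, "_source": list(fields)},
        "dest": {"index": dest_index},
    }


body = reindex_body("0513_final_test_instgram", "new_data_index", ["message"])
print(json.dumps(body, indent=2))
```

Passing more field names in the list would copy those fields as well.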

How to get max_score in a query in elasticsearch

Hi, I am new to Elasticsearch. I was wondering how I could get the max_score of my query and then compare it with the score of every hit. For example, if the max_score were 2.6, I would want to take that value and compare it with the _score of each document returned by the query.
GET searchentities/_search
{
"query": {
"multi_match": {
"query": "iron man",
"type": "bool_prefix",
"fields": [
"title",
"title._2gram",
"title._3gram"
],
"minimum_should_match": 2
}
}
}
Gives back the following:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 2.4475412,
"hits" : [
{
"_index" : "searchentities",
"_type" : "_doc",
"_id" : "IronMan",
"_score" : 2.4475412,
"_source" : {
"id" : "Iron Man",
"ad" : false,
"verified" : false,
"clicks" : 2,
"photoID" : "5f8a9dd82ab79c00017722bb",
"title" : "Iron Man "
}
},
{
"_index" : "searchentities",
"_type" : "_doc",
"_id" : "IronMan3",
"_score" : 2.2448254,
"_source" : {
"id" : "Iron Man 3 ",
"ad" : false,
"verified" : false,
"clicks" : 2,
"photoID" : "5f8a9dd82ab79c00017722bb",
"title" : "Iron Man 3 "
}
},
{
"_index" : "searchentities",
"_type" : "_doc",
"_id" : "IronMan2",
"_score" : 2.2448254,
"_source" : {
"id" : "Iron Man 2 ",
"ad" : false,
"verified" : false,
"clicks" : 20,
"photoID" : "5f8a9dd82ab79c00017722bb",
"title" : "Iron Man 2"
}
}
]
}
}
I would want to see
2.44-2.44 = 0
2.44-2.24 = .2
2.44-2.24 = .2
Unfortunately, there is no way to post-process the score in this way in Elasticsearch.
The max score is simply the highest score once all scores have been calculated.
This value is not available to the post-processing options that exist in Elasticsearch.
See:
https://discuss.elastic.co/t/using-max-score-inside-script-function/156871/3
I think you have to write your own program that processes the result and returns the difference between the scores.
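A minimal client-side sketch of that idea in Python, assuming the search response has already been parsed into a dict. The function name score_gaps and the rounding to two digits are my own choices, and the sample response is trimmed down to the fields that matter.

```python
def score_gaps(response, digits=2):
    """Return (id, max_score - _score) for every hit in a search response."""
    hits = response["hits"]
    max_score = hits["max_score"]
    return [
        (hit["_id"], round(max_score - hit["_score"], digits))
        for hit in hits["hits"]
    ]


# Scores copied from the response in the question.
response = {
    "hits": {
        "max_score": 2.4475412,
        "hits": [
            {"_id": "IronMan", "_score": 2.4475412},
            {"_id": "IronMan3", "_score": 2.2448254},
            {"_id": "IronMan2", "_score": 2.2448254},
        ],
    }
}

gaps = score_gaps(response)
for doc_id, gap in gaps:
    print(doc_id, gap)
```

Since hits.max_score is part of every scored response, this needs no second query.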

Using index sorting by default in _search

I am using Elasticsearch 7.6 and the Index Sorting feature, which was introduced in 6.0.
What I would like to do is run GET /myindice/_search without specifying a sort and get the documents back in the order defined by the Index Sorting settings of my index, NOT in insertion order.
My index, as per the documentation:
PUT twitter
{
"settings" : {
"index" : {
"sort.field" : "date",
"sort.order" : "desc"
}
},
"mappings": {
"properties": {
"date": {
"type": "date"
}
}
}
}
PUT twitter/_doc/a
{
"date": "2015-01-01"
}
PUT twitter/_doc/b
{
"date": "2016-01-01"
}
PUT twitter/_doc/c
{
"date": "2017-01-01"
}
My initial thought was that
GET twitter/_search
should return docs C, B and A.
Instead I get the following:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "a",
"_score" : 1.0,
"_source" : {
"date" : "2015-01-01"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "b",
"_score" : 1.0,
"_source" : {
"date" : "2016-01-01"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "c",
"_score" : 1.0,
"_source" : {
"date" : "2017-01-01"
}
}
]
}
}
The documentation is not clear on this particular point, and all of its example queries use sort:
https://www.elastic.co/guide/en/elasticsearch/reference/6.0/index-modules-index-sorting.html
Am I required to specify the sort order in the GET query (and hence repeat the sort already specified for Index Sorting)?
Thanks in advance to any diligent soul who can help me.
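For reference, this is roughly what repeating the index sort in the request would look like, sketched in Python. The helper name search_sorted_like_index is made up and the snippet only builds the body. An explicit sort in the request does determine the order of the hits; whether it can be omitted is the open question above.

```python
import json


def search_sorted_like_index(sort_field, sort_order):
    """Build a match_all search body that repeats the index sort explicitly."""
    return {
        "query": {"match_all": {}},
        "sort": [{sort_field: {"order": sort_order}}],
    }


# Same field and order as the index settings of the twitter index.
body = search_sorted_like_index("date", "desc")
print(json.dumps(body, indent=2))
```

Sent to twitter/_search, this body asks for the documents ordered by date, newest first.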

Elasticsearch suggestion scoring not working with fuzzy search

When I run the following Elasticsearch query to get autocomplete data, the results I receive are not relevant and the scoring does not seem to work:
GET quick_search/_search
{
"suggest": {
"name-suggest": {
"text": "Clic",
"completion": {
"field": "Name",
"size": 25,
"skip_duplicates": true,
"fuzzy" : {
"fuzziness": 1,
"prefix_length": 1,
"min_length": 4,
"unicode_aware": true
}
}
}
}
}
The search text is "Clic", but the fuzzy search does not return the most relevant data. How can I boost results such as "CLIC7000", which is more relevant to my query than "CLI36"?
{
"took" : 706,
"timed_out" : false,
"_shards" : {
"total" : 15,
"successful" : 15,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : 0.0,
"hits" : [ ]
},
"suggest" : {
"name-suggest" : [
{
"text" : "Clic",
"offset" : 0,
"length" : 4,
"options" : [
{
"text" : "CLI36",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "330719",
"_score" : 3.0,
"_source" : {
"ID" : "330719",
"Name" : "CLI36"
}
},
{
"text" : "CLI361511B001",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "330717",
"_score" : 3.0,
"_source" : {
"ID" : "330717",
"Name" : "CLI361511B001"
}
},
{
"text" : "CLI42C6385B001",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185340",
"_score" : 3.0,
"_source" : {
"ID" : "185340",
"Name" : "CLI42C6385B001"
}
},
{
"text" : "CLI42PM",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185345",
"_score" : 3.0,
"_source" : {
"ID" : "185345",
"Name" : "CLI42PM"
}
},
{
"text" : "CLI42PM6389B001",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185343",
"_score" : 3.0,
"_source" : {
"ID" : "185343",
"Name" : "CLI42PM6389B001"
}
},
{
"text" : "CLI441",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "233554",
"_score" : 3.0,
"_source" : {
"ID" : "233554",
"Name" : "CLI441"
}
},
{
"text" : "CLI451BK",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185334",
"_score" : 3.0,
"_source" : {
"ID" : "185334",
"Name" : "CLI451BK"
}
},
{
"text" : "CLI451BK6523B001",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185332",
"_score" : 3.0,
"_source" : {
"ID" : "185332",
"Name" : "CLI451BK6523B001"
}
},
{
"text" : "CLI451C",
"_index" : "quick_search",
"_type" : "quick_search",
"_id" : "185331",
"_score" : 3.0,
"_source" : {
"ID" : "185331",
"Name" : "CLI451C"
}
}
]
}
]
}
}

How to perform an arithmetic operation on data from elasticsearch

I need the average of cpuload for a specific nodetype. For example, if I give the nodetype tpt, it should return the average cpuload across all nodes of type tpt. I tried different methods, but in vain...
My data in elasticsearch is below:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0003",
"_score" : 1.0,
"_source" : {
"kpi" : {
"CpuAverageLoad" : 13,
"NodeId" : "kishan",
"NodeType" : "Tpt",
"State" : "online",
"Static_limit" : 0
}
}
},
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0005",
"_score" : 1.0,
"_source" : {
"kpi" : {
"CpuAverageLoad" : 15,
"NodeId" : "kishan1",
"NodeType" : "tpt",
"State" : "online",
"Static_limit" : 0
}
}
},
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0004",
"_score" : 1.0,
"_source" : {
"kpi" : {
"MaxLbCapacity" : "700000",
"NodeId" : "kishan2",
"NodeType" : "bang",
"OnlineCSCF" : [
"001",
"002"
],
"State" : "Online",
"TdbGroup" : 1,
"TdGroup" : 0
}
}
},
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0002",
"_score" : 1.0,
"_source" : {
"kpi" : {
"MaxLbCapacity" : "700000",
"NodeId" : "kishan3",
"NodeType" : "bang",
"OnlineCSCF" : [
"001",
"002"
],
"State" : "Online",
"TdLGroup" : 1,
"TGroup" : 0
}
}
}
]
}
}
And my query is
curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source" : "kpi[CpuAverageLoad].value > params.param1",
"lang" : "painless",
"params" : {
"param1" : 5
}
}
}
}
}
}
}'
but it is failing because it does not recognise the source field:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "[script] unknown field [source], parser not found"
}
],
"type" : "illegal_argument_exception",
"reason" : "[script] unknown field [source], parser not found"
},
"status" : 400
}
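To pin down the arithmetic being asked for, here is a client-side Python sketch that averages CpuAverageLoad over the hits of one NodeType. It is not a fix for the script query above. The function name average_cpu_load is made up, and matching the node type case-insensitively is an assumption (the data contains both "Tpt" and "tpt").

```python
def average_cpu_load(hits, node_type):
    """Average kpi.CpuAverageLoad over hits whose kpi.NodeType matches."""
    wanted = node_type.lower()
    loads = [
        hit["_source"]["kpi"]["CpuAverageLoad"]
        for hit in hits
        # assumption: "Tpt" and "tpt" are the same node type
        if hit["_source"]["kpi"].get("NodeType", "").lower() == wanted
        and "CpuAverageLoad" in hit["_source"]["kpi"]
    ]
    # None when no matching document reports a load
    return sum(loads) / len(loads) if loads else None


# Trimmed copies of the hits shown in the question.
hits = [
    {"_source": {"kpi": {"CpuAverageLoad": 13, "NodeId": "kishan", "NodeType": "Tpt"}}},
    {"_source": {"kpi": {"CpuAverageLoad": 15, "NodeId": "kishan1", "NodeType": "tpt"}}},
    {"_source": {"kpi": {"MaxLbCapacity": "700000", "NodeId": "kishan2", "NodeType": "bang"}}},
]

average = average_cpu_load(hits, "tpt")
print(average)
```

For the two tpt documents in the question (loads 13 and 15) this gives 14.0. Documents without a CpuAverageLoad value, such as the bang nodes, are skipped.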
