Query with match to get all values for a given field in Elasticsearch

I'm pretty new to Elasticsearch and would like to write a query that returns all the values of a specific field. For example, given the fields "Number" and "change_manager_group", is there a query that lists every Number for which change_manager_group = "Change Managers - 2"?
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 10,
"successful" : 10,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1700,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test-tem-changes",
"_type" : "_doc",
"_id" : "CHG0393073_1554800400000",
"_score" : 1.0,
"_source" : {
"work_notes" : "",
"priority" : "4 - Low",
"planned_start" : 1554800400000,
"Updated_by" : "system",
"Updated" : 1554819333000,
"phase" : "Requested",
"Number" : "CHG0312373",
"change_manager_group" : "Change Managers - 1",
"approval" : "Approved",
"downtime" : "false",
"close_notes" : "",
"Standard_template_version" : "",
"close_code" : null,
"actual_start" : 1554819333000,
"closed_by" : "",
"Type" : "Normal"
}
},
{
"_index" : "test-tem-changes",
"_type" : "_doc",
"_id" : "CHG0406522_0",
"_score" : 1.0,
"_source" : {
"work_notes" : "",
"priority" : "4 - Low",
"planned_start" : 0,
"Updated_by" : "svcmdeploy_automation",
"Updated" : 1553320559000,
"phase" : "Requested",
"Number" : "CHG041232",
"change_manager_group" : "Change Managers - 2",
"approval" : "Approved",
"downtime" : "false",
"close_notes" : "Change Installed",
"Standard_template_version" : "",
"close_code" : "Successful",
"actual_start" : 1553338188000,
"closed_by" : "",
"Type" : "Automated"
}
},
{
"_index" : "test-tem-changes",
"_type" : "_doc",
"_id" : "CHG0406526_0",
"_score" : 1.0,
"_source" : {
"work_notes" : "",
"priority" : "4 - Low",
"planned_start" : 0,
"Updated_by" : "svcmdeploy_automation",
"Updated" : 1553321854000,
"phase" : "Requested",
"Number" : "CHG0412326",
"change_manager_group" : "Change Managers - 2",
"approval" : "Approved",
"downtime" : "false",
"close_notes" : "Change Installed",
"Standard_template_version" : "",
"close_code" : "Successful",
"actual_start" : 1553339629000,
"closed_by" : "",
"Type" : "Automated"
}
},
I tried this after a bit of googling, but it errors out:
curl -XGET "http://localhost:9200/test-tem-changes/_search?pretty=true" -H 'Content-Type: application/json' -d '
{
"query" : { "Number" : {"query" : "*"} }
}
'
What am I missing here?

To get all the documents where change_manager_group is exactly "Change Managers - 2", use a term query. Below it is wrapped in a filter context, which skips relevance scoring and is therefore faster.
If change_manager_group is not mapped as a keyword field, you may need to query change_manager_group.keyword instead, depending on your mapping.
GET test-tem-changes/_search
{
"query": {
"bool": {
"filter": {
"term": {
"change_manager_group": "Change Managers - 2"
}
}
}
}
}
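The question asks only for the Number values, so a minimal Python sketch of the same request with source filtering may help. The function names `build_numbers_query` and `extract_numbers` are hypothetical helpers, not part of any Elasticsearch client, and the sketch only builds the request body and reads a response that has already been fetched.

```python
import json


def build_numbers_query(group: str, field: str = "change_manager_group") -> dict:
    """Build a search body that filters on an exact group value.

    Only the "Number" field is requested via _source filtering, so each hit
    carries just the value the question asks for.
    """
    return {
        "_source": ["Number"],
        "query": {"bool": {"filter": {"term": {field: group}}}},
    }


def extract_numbers(response: dict) -> list:
    """Pull the "Number" value out of every hit of a search response."""
    return [hit["_source"]["Number"] for hit in response["hits"]["hits"]]


body = build_numbers_query("Change Managers - 2")
print(json.dumps(body, indent=2))
```

Note that a search returns only 10 hits by default, so a "size" value (or pagination) would be needed to list every matching Number.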

Related

How to count the number of repetitions of a specific word in specific fields of each document in the ElasticSearch index?

I'm pretty new to ElasticSearch and would be thankful for any help.
I have an index.
Here is an example of the data:
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1834,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "profile_similarity",
"_id" : "9c346fe0-253b-4c68-8f11-97bbb18d9c9a",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Salt Lake City Metropolitan Area",
"headline" : "Product Manager"
}
},
{
"_index" : "profile_similarity",
"_id" : "e97cdbe8-445f-49f0-b659-6a19829a0a14",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Los Angeles",
"headline" : "K2 & Amazon, Smarter King, LLC."
}
},
{
"_index" : "profile_similarity",
"_id" : "a7a69710-4fad-4b7d-88e4-bd0873e6fd03",
"_score" : 1.0,
"_source" : {
"country" : "CA",
"city" : "Greater Toronto Area",
"headline" : "Senior Product Manager"
}
}
]
}
}
Its mappings:
{
"profile_similarity_ivan" : {
"mappings" : {
"properties" : {
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
},
"country" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
},
"headline" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
}
}
}
}
}
For the fields country and headline, I would like to count the occurrences of a specific word.
For example, if I search for 'US', the output might look like this:
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1834,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "profile_similarity",
"_id" : "9c346fe0-253b-4c68-8f11-97bbb18d9c9a",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Salt Lake City Metropolitan Area",
"headline" : "Product Manager",
"country_count_US" : 1,
"headline_count_US" : 0
}
},
{
"_index" : "profile_similarity",
"_id" : "e97cdbe8-445f-49f0-b659-6a19829a0a14",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Los Angeles",
"headline" : "K2 & Amazon, Smarter King, LLC.",
"country_count_US" : 1,
"headline_count_US" : 0
}
},
{
"_index" : "profile_similarity",
"_id" : "a7a69710-4fad-4b7d-88e4-bd0873e6fd03",
"_score" : 1.0,
"_source" : {
"country" : "CA",
"city" : "Greater Toronto Area",
"headline" : "Senior Product Manager",
"country_count_US" : 0,
"headline_count_US" : 0
}
}
]
}
}
I noticed that this can be done using runtime fields in ElasticSearch and scripting with Painless.
However, I am having trouble writing the Painless script for this task.
Could you please help me write this script and build the right ElasticSearch query?
I would also be thankful for advice on whether this task can be done with other ElasticSearch functionality (not only runtime fields).
Thanks
This can be done, but you need to fix three things.
You do not seem to have created a mapping for your index; what you show looks like the dynamic mapping ES assigns on its own to any given field. Even with your current mappings, you can simply run a terms aggregation on the results of your query to get the counts of the words you need. Just pass them as individual terms to be aggregated. Something like this will give you some output:
GET _search
{
"query": {
"match": {
"country": "US"
}
},
"aggs": {
"country_count": {
"composite" : {
"sources" : [
{"country" : {"terms" : {"field" : "country"}}},
{"id" : {"terms" : {"field" : "_id", "include" : "US"}}}
]
}
}
}
}
The composite aggregation is meant to return, PER DOCUMENT, how many times the word "US" occurs.
See the Composite Aggregation docs for how to paginate it. That way you can get the required counts for EVERY SINGLE DOCUMENT.
Aggregations are generally the tool for questions like this. You may need to tweak the field mappings to use a different analyzer (for example, whitespace).
But in general you just need terms aggregations.
HTH.
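As a client-side alternative to the aggregation above, the per-document counts from the question can also be computed after the search returns. The following Python sketch assumes whole-word, case-sensitive matching, and the helper names `count_word` and `add_counts` are made up for illustration.

```python
import re


def count_word(text, word: str) -> int:
    """Count whole-word, case-sensitive occurrences of `word` in `text`."""
    if not text:
        return 0
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))


def add_counts(hit: dict, word: str, fields=("country", "headline")) -> dict:
    """Return a copy of the hit's _source with one <field>_count_<word> per field."""
    source = dict(hit["_source"])
    for field in fields:
        source[f"{field}_count_{word}"] = count_word(source.get(field), word)
    return source


hit = {"_source": {"country": "US", "city": "Los Angeles",
                   "headline": "K2 & Amazon, Smarter King, LLC."}}
print(add_counts(hit, "US"))
```

This keeps the index untouched, at the cost of transferring every hit to the client.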

How do I apply reindex to new data values through filters?

This is the output for basic_data (example):
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"#version" : "1",
"message" : "hello",
"#timestamp" : "2021-05-13T02:50:05.962Z"
},
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"#version" : "1",
"message" : "python",
"#timestamp" : "2021-05-13T02:50:05.947Z"
}
First, out of the various fields, I extracted only the message values (code example below).
GET 0513_final_test_instgram/_search?_source=message&filter_path=hits.hits._source
{
"hits" : {
"hits" : [
{
"_source" : {
"message" : "hello"
}
},
{
"_source" : {
"message" : "python"
}
I learned about reindex, which stores documents in a new index.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
However, I still don't understand how to do this, even after reading the documentation.
My attempt (0513):
POST _reindex
{
"source": {
"index": "0513_final_test_instgram"
},
"dest": {
"index": "new_data_index"
}
}
How do I use reindex to store only the extracted message values in a new index?
Update: attempt based on the comment
Output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"message" : "hello"
}
},
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"message" : "python"
}
}
You simply need to specify which fields you want to reindex into the new index:
POST _reindex
{
"source": {
"index": "0513_final_test_instgram",
"_source": ["message"]
},
"dest": {
"index": "new_data_index"
}
}
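To make the effect of the "_source" list concrete, here is a small Python sketch. It only builds the reindex body and mimics the field filtering on one flat document; `build_reindex_body` and `keep_fields` are hypothetical helper names, and nothing here talks to a cluster.

```python
def build_reindex_body(source_index: str, dest_index: str, fields) -> dict:
    """Build a _reindex body that copies only the listed source fields."""
    return {
        "source": {"index": source_index, "_source": list(fields)},
        "dest": {"index": dest_index},
    }


def keep_fields(doc: dict, fields) -> dict:
    """Mimic source filtering on one flat document: keep only the listed keys."""
    return {key: value for key, value in doc.items() if key in fields}


body = build_reindex_body("0513_final_test_instgram", "new_data_index", ["message"])
doc = {"host": "DESKTOP-7MDCA36", "message": "hello"}
print(body)
print(keep_fields(doc, ["message"]))
```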

How to get max_score in a query in elasticsearch

Hi, I am new to Elasticsearch. I was wondering how I could get the max_score of my query and then compare it to the _score of each result. For example, if the max_score was 2.6, I would want to take that value and compare it with the _score of every doc returned by the query.
GET searchentities/_search
{
"query": {
"multi_match": {
"query": "iron man",
"type": "bool_prefix",
"fields": [
"title",
"title._2gram",
"title._3gram"
],
"minimum_should_match": 2
}
}
}
Gives back the following:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 2.4475412,
"hits" : [
{
"_index" : "searchentities",
"_type" : "_doc",
"_id" : "IronMan",
"_score" : 2.4475412,
"_source" : {
"id" : "Iron Man",
"ad" : false,
"verified" : false,
"clicks" : 2,
"photoID" : "5f8a9dd82ab79c00017722bb",
"title" : "Iron Man "
}
},
{
"_index" : "searchentities",
"_type" : "_doc",
"_id" : "IronMan3",
"_score" : 2.2448254,
"_source" : {
"id" : "Iron Man 3 ",
"ad" : false,
"verified" : false,
"clicks" : 2,
"photoID" : "5f8a9dd82ab79c00017722bb",
"title" : "Iron Man 3 "
}
},
{
"_index" : "searchentities",
"_type" : "_doc",
"_id" : "IronMan2",
"_score" : 2.2448254,
"_source" : {
"id" : "Iron Man 2 ",
"ad" : false,
"verified" : false,
"clicks" : 20,
"photoID" : "5f8a9dd82ab79c00017722bb",
"title" : "Iron Man 2"
}
}
]
}
}
I would want to see
2.44-2.44 = 0
2.44-2.24 = .2
2.44-2.24 = .2
Unfortunately, there is no way to post-process the score in this way in Elasticsearch.
The max score is just the highest score after all scores have been calculated.
This value is not available to any of the post-processing options that exist in Elasticsearch.
See:
https://discuss.elastic.co/t/using-max-score-inside-script-function/156871/3
I think you have to write your own program that processes the result and returns the difference between the scores.
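Such a program can be quite small. The Python sketch below assumes the search response has already been parsed into a dict; the function name `score_gaps` and the rounding to four decimals are choices made here for illustration.

```python
def score_gaps(response: dict, digits: int = 4) -> list:
    """Return (_id, max_score - _score) for every hit in a search response."""
    hits = response["hits"]
    max_score = hits["max_score"]
    return [
        (hit["_id"], round(max_score - hit["_score"], digits))
        for hit in hits["hits"]
    ]


response = {
    "hits": {
        "max_score": 2.4475412,
        "hits": [
            {"_id": "IronMan", "_score": 2.4475412},
            {"_id": "IronMan3", "_score": 2.2448254},
            {"_id": "IronMan2", "_score": 2.2448254},
        ],
    }
}
for doc_id, gap in score_gaps(response):
    print(doc_id, gap)
```

Keep in mind that max_score is null when the results are sorted by something other than _score, so this only works for score-sorted searches.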

How to perform arithmetic operations on data from Elasticsearch

I need the average cpuload for a specific nodetype. For example, if I give the nodetype tpt, it should return the average cpuload across all nodes of type tpt. I tried different methods, but in vain.
My data in elasticsearch is below:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0003",
"_score" : 1.0,
"_source" : {
"kpi" : {
"CpuAverageLoad" : 13,
"NodeId" : "kishan",
"NodeType" : "Tpt",
"State" : "online",
"Static_limit" : 0
}
}
},
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0005",
"_score" : 1.0,
"_source" : {
"kpi" : {
"CpuAverageLoad" : 15,
"NodeId" : "kishan1",
"NodeType" : "tpt",
"State" : "online",
"Static_limit" : 0
}
}
},
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0004",
"_score" : 1.0,
"_source" : {
"kpi" : {
"MaxLbCapacity" : "700000",
"NodeId" : "kishan2",
"NodeType" : "bang",
"OnlineCSCF" : [
"001",
"002"
],
"State" : "Online",
"TdbGroup" : 1,
"TdGroup" : 0
}
}
},
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0002",
"_score" : 1.0,
"_source" : {
"kpi" : {
"MaxLbCapacity" : "700000",
"NodeId" : "kishan3",
"NodeType" : "bang",
"OnlineCSCF" : [
"001",
"002"
],
"State" : "Online",
"TdLGroup" : 1,
"TGroup" : 0
}
}
}
]
}
}
And my query is
curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source" : "kpi[CpuAverageLoad].value > params.param1",
"lang" : "painless",
"params" : {
"param1" : 5
}
}
}
}
}
}
}'
but it is failing, as it is unable to find the exact source:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "[script] unknown field [source], parser not found"
}
],
"type" : "illegal_argument_exception",
"reason" : "[script] unknown field [source], parser not found"
},
"status" : 400
}
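One possible approach, sketched in Python under stated assumptions: request an avg aggregation on kpi.CpuAverageLoad, or average the values client-side from the hits. The names `build_avg_request` and `average_cpu_load` are hypothetical. Whether the match query finds both "Tpt" and "tpt" depends on how kpi.NodeType is mapped and analyzed, which the question does not show, so the client-side version compares node types case-insensitively instead.

```python
def build_avg_request(node_type: str) -> dict:
    """Build a search body that averages kpi.CpuAverageLoad for one node type."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"match": {"kpi.NodeType": node_type}},
        "aggs": {"avg_cpu_load": {"avg": {"field": "kpi.CpuAverageLoad"}}},
    }


def average_cpu_load(response: dict, node_type: str):
    """Average CpuAverageLoad over hits of the given node type (case-insensitive)."""
    loads = []
    for hit in response["hits"]["hits"]:
        kpi = hit["_source"].get("kpi", {})
        same_type = str(kpi.get("NodeType", "")).lower() == node_type.lower()
        if same_type and "CpuAverageLoad" in kpi:
            loads.append(kpi["CpuAverageLoad"])
    return sum(loads) / len(loads) if loads else None


sample = {"hits": {"hits": [
    {"_source": {"kpi": {"CpuAverageLoad": 13, "NodeType": "Tpt"}}},
    {"_source": {"kpi": {"CpuAverageLoad": 15, "NodeType": "tpt"}}},
    {"_source": {"kpi": {"MaxLbCapacity": "700000", "NodeType": "bang"}}},
]}}
print(average_cpu_load(sample, "tpt"))
```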

How to read the JSON output of a faceted search query?

I have Movies that each belong to a genre and have multiple ratings. With ElasticSearch, I want to do a faceted search on Genres first, and then on Ratings.
I was reading about the idea here: http://www.elasticsearch.org/guide/reference/api/search/facets/
But I am confused about how to read the output of this curl query:
curl -X POST "http://localhost:9200/movies/_search?pretty=true" -d '
{
"query" : { "query_string" : {"query" : "T*"} },
"facets" : {
"categories" : { "terms" : {"field" : "categories"} }
}
}
'
{
"took" : 35,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
"_index" : "movies",
"_type" : "movie",
"_id" : "13",
"_score" : 1.0, "_source" : {"category_id":2,"created_at":"2013-05-03T16:40:21Z","description":null,"title":"Tiny Plastic Men","updated_at":"2013-05-03T16:40:21Z","user_id":null}
}, {
"_index" : "movies",
"_type" : "movie",
"_id" : "32",
"_score" : 1.0, "_source" : {"category_id":14,"created_at":"2013-05-03T16:55:02Z","description":null,"title":"The Extreme Truth","updated_at":"2013-05-03T16:55:02Z","user_id":null}
}, {
"_index" : "movies",
"_type" : "movie",
"_id" : "39",
"_score" : 1.0, "_source" : {"category_id":7,"created_at":"2013-05-03T16:55:02Z","description":null,"title":"A Time of Day","updated_at":"2013-05-03T16:55:02Z","user_id":null}
} ]
},
"facets" : {
"categories" : {
"_type" : "terms",
"missing" : 3,
"total" : 0,
"other" : 0,
"terms" : [ ]
}
}
}
I have some movies that start with a 'T', but I would also expect movies from the Genre/Category 'Thriller'.
So what can I read from the JSON above?
It seems your facet does not match any field in your documents. You should probably use:
curl -X POST "http://localhost:9200/movies/_search?pretty=true" -d '
{
"query" : { "query_string" : {"query" : "T*"} },
"facets" : {
"categories" : { "terms" : {"field" : "category_id"} }
}
}
'
Then you should get a list of category_id values and a count of documents for each category_id.
Facets are deprecated. See https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-facets.html
A better alternative is to use aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-aggregations.html
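For a terms-only request like the one above, the aggregation form has the same inner shape and goes under "aggs" instead of "facets". The Python sketch below illustrates that rename and how the bucket counts can be read from a response; `facets_to_aggs` and `bucket_counts` are hypothetical helper names, and other facet types may not translate this directly.

```python
def facets_to_aggs(body: dict) -> dict:
    """Move a terms-only "facets" section to "aggs", leaving the rest unchanged."""
    converted = {key: value for key, value in body.items() if key != "facets"}
    if "facets" in body:
        converted["aggs"] = body["facets"]
    return converted


def bucket_counts(response: dict, name: str) -> dict:
    """Map each terms bucket key to its document count."""
    buckets = response["aggregations"][name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


old_body = {
    "query": {"query_string": {"query": "T*"}},
    "facets": {"categories": {"terms": {"field": "category_id"}}},
}
new_body = facets_to_aggs(old_body)
print(new_body)
```

A terms aggregation reports its results under "aggregations", as a list of buckets that each carry a "key" and a "doc_count".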
