Search by text field - elasticsearch

Here is my index:
λ curl -XGET -u elastic:elasticpassword http://192.168.1.71:9200/test/mytype/_search?pretty -d'{"query":{"match_all":{}}}'
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "mytype",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "Dio",
"age" : 10
}
},
{
"_index" : "test",
"_type" : "mytype",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "Paul",
"pro" : {
"f" : "Cris",
"t" : "So"
}
}
}
]
}
}
Here is a default mapping:
λ curl -XGET -u elastic:elasticpassword http://192.168.1.71:9200/test/mytype/_mapping?pretty
{
"test" : {
"mappings" : {
"mytype" : {
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
I can find by age field, but cannot by name field. Why ?
λ curl -XGET -u elastic:elasticpassword http://192.168.1.71:9200/test/mytype/_search?pretty -d'{"query":{"term":{"age":10}}}'
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "mytype",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "Dio",
"age" : 10
}
}
]
}
}
λ curl -XGET -u elastic:elasticpassword http://192.168.1.71:9200/test/mytype/_search?pretty -d'{"query":{"term":{"name":"Paul"}}}'
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}

The problem is that you name field is analyzed by default with the standard analyzer, which lowercases the field. You can either search for paul or search in name.keyword field with Paul.

Related

elasticsearch reducing result to one column - return only 1 value for each document

I try to reduce the json result of elasticsearch to only the column or columns i suggested to get. Is there any way?
When I use the following command, I get the result nasted into "_source":
{
"from": "0", "size":"2",
"_source":["id"],
"query": {
"match_all": {}
}
}
'
and there is no need for my use case.
I get this result:
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "indexer_1",
"_type" : "type_indexer_1",
"_id" : "38142",
"_score" : 1.0,
"_source" : {
"id" : 38142
}
},
{
"_index" : "indexer_1",
"_type" : "type_indexer_1",
"_id" : "38147",
"_score" : 1.0,
"_source" : {
"id" : 38147
}
}
]
}
}
What I would like to have:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"id" : 38142
},
{
"id" : 38147
}
]
}
}
And this json-result I would love:
{
{
"id" : 38142
},
{
"id" : 38147
}
}
Is there any way out of the box in ES to reduce the result set?
you can filter the output JSON look at the documentation : response filtering
GET /index/_search?filter_path=hits.hits._id
{
"from": "0",
"size":"2",
"_source":["id"],
"query": {
"match_all": {}
}
}

Null field in elasticsearch need to be replaced

How can i replace the "build_duration" : "null", with value 21600000 in elasticsearch?
DevTools > Console
GET myindex/_search
{
"query": {
"term": {
"build_duration": "null"
}
}
}
Output:-
{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 9.658761,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "40324749",
"_score" : 9.658761,
"_source" : {
"build_duration" : "null",
"build_end_time" : "2021-05-20 04:00:36",
"build_requester" : "daniel.su",
"build_site" : "POL",
"build_id" : "40324749",
"#version" : "1"
}
}
]
}
}
With below query able to replace the filed value.
POST /myindex/_update/mydocid
{
"doc" : {
"build_duration": "21600000"
}
}

How do I apply reindex to new data values through filters?

This is basic_data(example) Output value
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"#version" : "1",
"message" : "hello",
"#timestamp" : "2021-05-13T02:50:05.962Z"
},
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"#version" : "1",
"message" : "python,
"#timestamp" : "2021-05-13T02:50:05.947Z"
}
First of all, out of various field values, only message values have been extracted.(under code example)
GET 0513_final_test_instgram/_search?_source=message&filter_path=hits.hits._source
{
"hits" : {
"hits" : [
{
"_source" : {
"message" : "hello"
}
},
{
"_source" : {
"message" : "python"
}
I got to know reindex that stores new indexes.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
However, I don't know even if I look at the document.
0513 attempt code
POST _reindex
{
"source": {
"index": "0513_final_test_instgram"
},
"dest": {
"index": "new_data_index"
}
}
How do you use reindex to store data that only extracted message values in a new index?
update comment attempt
output
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"message" : "hello"
}
},
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"message" : "python"
}
}
You simply need to specify which fields you want to reindex into the new index:
{
"source": {
"index": "0513_final_test_instgram",
"_source": ["message"]
},
"dest": {
"index": "new_data_index"
}
}

How to perform the arthimatic operation on data from elasticsearch

I need to have average of cpuload on specific nodetype. For example if I give nodetype as tpt it should give the average of cpuload of nodetype's of all tpt available. I tried different methods but vain...
My data in elasticsearch is below:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0003",
"_score" : 1.0,
"_source" : {
"kpi" : {
"CpuAverageLoad" : 13,
"NodeId" : "kishan",
"NodeType" : "Tpt",
"State" : "online",
"Static_limit" : 0
}
}
},
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0005",
"_score" : 1.0,
"_source" : {
"kpi" : {
"CpuAverageLoad" : 15,
"NodeId" : "kishan1",
"NodeType" : "tpt",
"State" : "online",
"Static_limit" : 0
}
}
},
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0004",
"_score" : 1.0,
"_source" : {
"kpi" : {
"MaxLbCapacity" : "700000",
"NodeId" : "kishan2",
"NodeType" : "bang",
"OnlineCSCF" : [
"001",
"002"
],
"State" : "Online",
"TdbGroup" : 1,
"TdGroup" : 0
}
}
},
{
"_index" : "kpi",
"_type" : "kpi",
"_id" : "\u0002",
"_score" : 1.0,
"_source" : {
"kpi" : {
"MaxLbCapacity" : "700000",
"NodeId" : "kishan3",
"NodeType" : "bang",
"OnlineCSCF" : [
"001",
"002"
],
"State" : "Online",
"TdLGroup" : 1,
"TGroup" : 0
}
}
}
]
}
}
And my query is
curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source" : "kpi[CpuAverageLoad].value > params.param1",
"lang" : "painless",
"params" : {
"param1" : 5
}
}
}
}
}
}
}'
but is falling as it is unable to find the exact source.
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "[script] unknown field [source], parser not found"
}
],
"type" : "illegal_argument_exception",
"reason" : "[script] unknown field [source], parser not found"
},
"status" : 400
}

ElasticSearch doubles my _id field when filtering on it

I don't understand why ES doubles my _id field in an array when I'm filtering on it.
curl -X GET "http://localhost:9200/pgep-development_broadcasts/broadcast/_search?pretty=true" -d '{"query":{"match_all":{}},"fields":["_id", "title"]}'
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "pgep-development_broadcasts",
"_type" : "broadcast",
"_id" : "50ed959dcc93282abc000062",
"_score" : 1.0,
"fields" : {
"_id" : [ "50ed959dcc93282abc000062", "50ed959dcc93282abc000062" ],
"title" : "24 heures d'info"
}
} ]
}
}
You are probably hitting this bug https://github.com/elasticsearch/elasticsearch/issues/2161. If this is the case, you can simply stop storing id field.

Resources