I am trying to check whether a client_id already exists in the index. The problem is that ES still retrieves the full document even though I am only providing half of the ID. Here is the mapping:
"mappings": {
  "properties": {
    "client_id": { "index": true, "type": "keyword" },
    "client_name": { "index": true, "type": "keyword" },
    "data_index_server": { "type": "ip" },
    "data_file_node_path": { "index": true, "type": "keyword" }
  }
}
If I have a record like this:
{
"_index": "client_index",
"_type": "_doc",
"_id": "wYlkrYMB_q_jkYaCv6pU",
"_version": 1,
"_score": 1,
"_source": {
"doc": {
"client_id": "0935be6b-61fe-4ec4-80c8-5c5ee8384378",
"client_name": "citi",
"data_file_node_path": " ",
"data_index_server": " "
}
},
"fields": {
"doc.client_id": [
"0935be6b-61fe-4ec4-80c8-5c5ee8384378"
],
"doc.client_name": [
"sample_name"
],
"doc.data_index_server": [
" "
],
"doc.client_name.keyword": [
"citi"
],
"doc.data_file_node_path.keyword": [
" "
],
"doc.client_id.keyword": [
"0935be6b-61fe-4ec4-80c8-5c5ee8384378"
],
"doc.data_index_server.keyword": [
" "
],
"doc.data_file_node_path": [
" "
]
}
}
My search request is below. If I take only part of the ID and search with it, I expect the hits to be zero:
POST /client_index/_search
{
"query": {
"match": {
"doc.client_id":"0935be6b-61fe-4ec4-80c8" ,
}
}
}
I have followed this post: How to make Elasticsearch only match the full field. I changed the field type to keyword and also toggled index between true and false, but no result.
I think you have mixed up a few things: in the mapping you showed just four simple top-level properties, while in the search response everything is nested under a doc object (doc.client_id, doc.client_id.keyword, and so on), so those fields were created by dynamic mapping rather than by the mapping you posted. Dynamically mapped strings become analyzed text fields with a .keyword sub-field, and your match query targets the analyzed doc.client_id, which is why a partial ID still matches.
You read it correctly: keyword fields are not analyzed, and on these fields you will only get results for the full field value.
You should just define your client_id as a keyword field in a new index, index your sample document, and search on that index to see it in action.
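A minimal sketch of that experiment (the client_test index name is illustrative):
PUT /client_test
{
  "mappings": {
    "properties": {
      "client_id": { "type": "keyword" }
    }
  }
}
PUT /client_test/_doc/1
{
  "client_id": "0935be6b-61fe-4ec4-80c8-5c5ee8384378"
}
POST /client_test/_search
{
  "query": {
    "term": { "client_id": "0935be6b-61fe-4ec4-80c8" }
  }
}
The term query against the keyword field returns zero hits for the truncated ID; only the full value produces a hit.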
I've been trying to configure a Logstash pipeline with an snmptrap input along with yamlmibdir. Here's the code:
input {
snmptrap {
host => "abc"
port => 1062
yamlmibdir => "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp/mibs"
}
}
filter {
mutate {
gsub => ["message","^\"{","{"]
gsub => ["message","}\"$","}"]
gsub => ["message","[\\]",""]
}
json { source => "message" }
split {
field => "message"
target => "events"
}
}
output {
elasticsearch {
hosts => "xyz"
index => "logstash-%{+YYYY.MM.dd}"
}
stdout { codec => rubydebug }
}
and this is the result shown in Kibana (JSON format):
{
"_index": "logstash-2019.11.18-000001",
"_type": "_doc",
"_id": "Y_5zjG4B6M9gb7sxUJwG",
"_version": 1,
"_score": null,
"_source": {
"#version": "1",
"#timestamp": "2019-11-21T05:33:07.675Z",
"tags": [
"_jsonparsefailure"
],
"1.11.12.13.14.15": "teststring",
"message": "#<SNMP::SNMPv1_Trap:0x244bf33f #enterprise=[1.2.3.4.5.6], #timestamp=#<SNMP::TimeTicks:0x196a1590 #value=55>, #varbind_list=[#<SNMP::VarBind:0x21f5e155 #name=[1.11.12.13.14.15], #value=\"teststring\">], #specific_trap=99, #source_ip=\"xyz\", #agent_addr=#<SNMP::IpAddress:0x5a5c3c5f #value=\"xC0xC1xC2xC3\">, #generic_trap=6>",
"host": "xyz"
},
"fields": {
"#timestamp": [
"2019-11-21T05:33:07.675Z"
]
},
"sort": [
1574314387675
]
}
As you can see in the message field, it's an array, so how can I get all the fields inside the array, and also be able to select these fields to display in Kibana?
PS1: I still get the _jsonparsefailure tag if I select type 'Table' in the expanded document.
PS2: Even though I use gsub to remove '\' from the expected JSON result, why do I still get a result with '\'?
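One thing worth noting: the message field is not JSON at all; it is the Ruby inspect string of an SNMP::SNMPv1_Trap object, which is why the json filter tags the event with _jsonparsefailure, and the remaining \" sequences are just the JSON view escaping the embedded double quotes. A hedged sketch of extracting the first varbind's OID and value with grok instead (the pattern is written against the sample message above and is untested):
filter {
  grok {
    # pull the first varbind OID and value out of the trap's inspect string
    match => { "message" => '@name=\[%{DATA:varbind_oid}\], @value="%{DATA:varbind_value}"' }
  }
}
This only captures the first varbind; a trap carrying several varbinds would need repeated captures or a ruby filter.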
This is my JSON log file. I'm trying to store it in Elasticsearch through Logstash.
{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":
["Action", "Adventure", "Sci-Fi"] }
After storing the data in Elasticsearch, my result is as follows:
{
"_index": "filebeat-6.2.4-2018.11.09",
"_type": "doc",
"_id": "n-J39mYB6zb53NvEugMO",
"_score": 1,
"_source": {
"#timestamp": "2018-11-09T03:15:32.262Z",
"source": "/Users/jinwoopark/Jin/json_files/testJson.log",
"offset": 106,
"message": """{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }""",
"id": "%{id}",
"#version": "1",
"host": "Jinui-MacBook-Pro.local",
"tags": [
"beats_input_codec_plain_applied"
],
"prospector": {
"type": "log"
},
"title": "%{title}",
"beat": {
"name": "Jinui-MacBook-Pro.local",
"hostname": "Jinui-MacBook-Pro.local",
"version": "6.2.4"
}
}
}
What I'm trying to do is this:
I want to store only the genre value in the message field, and store the other values (e.g. id, title) in extra fields (the created id and title fields). But the extra fields were stored with unresolved values (%{id}, %{title}). It seems like I need to modify my Logstash json filter, but here I need your help.
My current Logstash configuration is as follows:
input {
beats {
port => 5044
}
}
filter {
json {
source => "genre" # want to store only genre (from the JSON log) in the message field
}
mutate {
add_field => {
"id" => "%{id}" // want to create extra field for id value from log file
"title" => "%{title}" // want to create extra field for title value from log file
}
}
date {
match => [ "timestamp", "dd/MM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
}
stdout {
codec => rubydebug
}
}
When you tell the json filter that the source is genre, it ignores the rest of the document, which would explain why you don't get an id or title.
It seems like you should parse the entire JSON document, and then use the mutate filter's replace option to move the contents of genre into message.
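A minimal sketch of that combination (field names taken from the log line above, untested):
filter {
  json {
    # parse the whole JSON log line, creating id, title, year and genre fields
    source => "message"
  }
  mutate {
    # overwrite message with the parsed genre value
    replace => { "message" => "%{genre}" }
  }
}
Note that genre is an array, so interpolating it into message flattens it into a string; if you want to keep the array itself, mutate's copy option (copy => { "genre" => "message" }) may be a better fit.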
I search for the keyword machine4 in my ES. My Python client call is simply:
result = es.search(q='machine4', index='machines')
Result look like this
[
{
"_score": 0.13424811,
"_type": "person",
"_id": "2",
"_source": {
"date": "**20180601**",
"deleted": [],
"changed": [
"machine1",
"machine2",
"machine3"
],
"**added**": [
"**machine4**",
"machine5"
]
},
"_index": "contacts"
},
{
"_score": 0.13424811,
"_type": "person",
"_id": "3",
"_source": {
"date": "**20180701**",
"deleted": [
"machine2"
],
"**changed**": [
"machine1",
"**machine4**",
"machine3"
],
"added": [
"machine7"
]
},
"_index": "contacts"
}
]
So we can easily see:
On 20180601, machine4 belonged to added.
On 20180701, machine4 belonged to changed.
I can write another function to analyze the result: basically, loop through every key/value pair of each item and check whether the searched keyword belongs to it, like this:
for result in search_results['hits']['hits']:
    source_result = result['_source']
    for key, value in source_result.items():
        if 'machine4' in value:
            print(key)
However, I wonder whether ES has an API to detect which key/mapping/field the searched keyword belongs to? In this case it is added in the 1st result, and changed in the 2nd result.
Thank you so much
Alex
The simple answer seems to be that no, Elasticsearch doesn't have a way to do this out of the box, because Lucene doesn't have it, as per this thread
Elasticsearch has the concept of highlights, however. These could be useful, but they do require you to have some idea about which fields the match may be in.
The ES Python client documentation suggests there's no way to do that as a simple parameter to search, but you can build a query with a highlight section and pass it as the request body. It would look something like:
q = {
    "query": {"multi_match": {"query": "machine4", "fields": ["added", "changed", "deleted"]}},
    "highlight": {"fields": {"added": {}, "changed": {}, "deleted": {}}}
}
result = es.search(index='machines', body=q)
Hope this is helpful!
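A short sketch of reading the highlight section back out of the response; only the fields that actually matched appear under each hit's highlight key:
for hit in result['hits']['hits']:
    # 'highlight' maps each matching field name to its highlighted fragments
    for field_name in hit.get('highlight', {}):
        print(hit['_id'], field_name)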
I am trying to call the below query using NEST
GET 123_original/_doc/_mtermvectors
{
"ids": [
"9a271078-086f-4f4b-8ca0-16376c2f49a7",
"481ce3db-69bf-4886-9c38-fcb878d44925"
],
"parameters": {
"fields": ["*"],
"positions": false,
"offsets": false,
"payloads": false,
"term_statistics": false,
"field_statistics": false
}
}
The NEST API (I think) would look something like this
var term = await elasticClient.MultiTermVectorsAsync(x =>
{
return x.Index(indexOriginal) // 123_original
.Type(typeName) // _doc
.GetMany<ElasticDataSet>(ids.Keys) // list of strings
.Fields("*")
.FieldStatistics(false)
.Positions(false)
.Offsets(false)
.TermStatistics(false)
.Payloads(false);
});
The problem is that the above call returns the following error:
Index name is null for the given type and no default index is set. Map an index name using ConnectionSettings.DefaultMappingFor<TDocument>() or set a default index using ConnectionSettings.DefaultIndex().
And this is the query that it's trying to execute, which has the index in it but is missing the ids; it works in Kibana when the ids are set:
123_original/_doc/_mtermvectors?fields=%2A&field_statistics=false&positions=false&offsets=false&term_statistics=false&payloads=false
I cannot find any documentation on how to use Multi Term Vectors with NEST.
The Multi Term Vectors API within NEST does not expose the ability to set only ids; it always assumes that you are passing "docs".
Even when passing
client.MultiTermVectors(mt => mt
.Index("123_original")
.Type("_doc")
.GetMany<object>(ids)
.Fields("*")
.Positions(false)
.Offsets(false)
.Payloads(false)
.TermStatistics(false)
.FieldStatistics(false)
);
The _index and _type for each id are inferred from object in GetMany<object>, so the request ends up as:
POST http://localhost:9200/123_original/_doc/_mtermvectors?pretty=true&fields=*&positions=false&offsets=false&payloads=false&term_statistics=false&field_statistics=false
{
"docs": [
{
"_index": "users",
"_type": "object",
"_id": "9a271078-086f-4f4b-8ca0-16376c2f49a7"
},
{
"_index": "users",
"_type": "object",
"_id": "481ce3db-69bf-4886-9c38-fcb878d44925"
}
]
}
I think this could be exposed in a more consumable way within the client in the future.
The good news is that you can submit the exact query that you would like with the low-level client exposed on IElasticClient, and still get back a high-level response:
MultiTermVectorsResponse response =
client.LowLevel.Mtermvectors<MultiTermVectorsResponse>("123_original", "_doc", PostData.Serializable(new
{
ids = ids,
parameters = new
{
fields = new[] { "*" },
positions = false,
offsets = false,
payloads = false,
term_statistics = false,
field_statistics = false
}
}));
which will send the following request:
POST http://localhost:9200/123_original/_doc/_mtermvectors?pretty=true
{
"ids": [
"9a271078-086f-4f4b-8ca0-16376c2f49a7",
"481ce3db-69bf-4886-9c38-fcb878d44925"
],
"parameters": {
"fields": [
"*"
],
"positions": false,
"offsets": false,
"payloads": false,
"term_statistics": false,
"field_statistics": false
}
}
I have documents of the type below in Elasticsearch. I want to write a query such that when it matches particular fields, the results are ordered automatically according to each field's weight.
{
"_id": 652,
"name":"jason rock",
"emails": [
{
"email_id": "angel#gmail.com",
"em_id": "1228"
},
{
"email_id": "rock#gmail.com",
"em_id": "1228"
}
],
"org": [
{
"org_job_title": "Manager",
"org_name": "OYO Rooms",
"org_id":2
},
{
"org_job_title": "QA Lead",
"org_name": "prime technology",
"org_id":1
}
],
"address": [
{
"add_id": "15067770698711",
"formatted_address": "TUCSON AZ"
},
{
"add_id": "15078034145004",
"formatted_address": "DRAGRAM SUITE"
}
]
}
I want a query where any match on name shows at the top, then matches on emails.email_id, then org.org_name and org.org_job_title, and then address.formatted_address.
If no match is found on name, results should be ranked according to the priority above. For example, if a record does not match on name or emails.email_id but does match on org.org_name, that record should be at the top of the results.
If matches are present in both, say on name for _id 102 and on org.org_name for _id 101, then _id 102 should appear at the top.
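A minimal sketch of that kind of ranking with a boosted multi_match query (the contacts index name and the boost values are illustrative, and this assumes those paths are plain object fields mapped as text rather than nested types):
POST /contacts/_search
{
  "query": {
    "multi_match": {
      "query": "jason",
      "fields": [
        "name^4",
        "emails.email_id^3",
        "org.org_name^2",
        "org.org_job_title^2",
        "address.formatted_address"
      ]
    }
  }
}
A higher boost multiplies that field's contribution to _score, so a name match outranks an org.org_name match for the same term.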