When searching with routing, it gives me data for other routing keys as well. Please help me out.
I have set up routing in Elasticsearch 2.0 and queried with a routing key; below is an example:
GET myindex/mytype/_search?routing=5
{
"query": {
"match_all": {}
}
}
I searched with routing key 5, but the output I got was:
hits": [
{
"_index": "goqii",
"_type": "nazar",
"_id": "2047",
"_score": 1,
"_routing": "10",
"_source": {
"userId": "111239",
"activityId": "765982",
"activityUserId": "111239",
"activityType": "water",
"commentText": "kinu juice",
"status": "delievered",
"createdTime": "2016-01-13 13:28:54"
}
},
{
"_index": "goqii",
"_type": "nazar",
"_id": "2046",
"_score": 1,
"_routing": "5",
"_source": {
"userId": "110554",
"activityId": "251449",
"activityUserId": "110554",
"activityType": "activity",
"commentText": "did home cycling yesterday for 20mins",
"status": "delievered",
"createdTime": "2016-01-13 12:04:31"
}
}
]
It is giving me data for both routing key 5 and routing key 10. Please help me out if I am doing something wrong.
Routing doesn't guarantee that all the documents on a shard have the same routing key; it guarantees that all the documents sharing a routing key end up on the same shard.
Routing is the process of determining which shard a document will reside in.
How many shards do you have in your index? It defaults to 5.
It sounds like you have the default number of shards in your index.
The routing scheme hashes the routing value (by default the document ID) and uses that to find a shard. Routing ensures that documents with a particular routing value all go to the same shard, but that doesn't mean that other documents aren't routed to that shard too.
In your case both routing values "5" and "10" may be hashing to the same shard, hence the behavior.
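One way to confirm this is the _search_shards API, which accepts the same routing parameter and reports which shard a routing value resolves to. A quick sketch, using the index name from your response (adjust it if needed):
GET goqii/_search_shards?routing=5
GET goqii/_search_shards?routing=10
If both calls report the same shard number, that explains why documents with both routing values come back from your search.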
Related
My events in Elasticsearch look something like this (simplified version):
{
"_index": "greatest_index-2023.01",
"_type": "_doc",
"_id": "5BQ8yIUBtpR1CBn8kFyo",
"_version": 1,
"_score": 0,
"_source": {
"#version": "1",
"#timestamp": "2023-01-18T09:35:50.251Z",
"id": "4e80c00dd8e003c8",
"action": "action1"
},
"fields": {
"#timestamp": [
"2023-01-18T09:35:50.251Z"
]
}
}
Basically, the "id" field is common to multiple events. Each id goes through a few "action" field values through time (action1, action2, action3) - only once for each action value.
I'm trying to create a visualization in Kibana that would display the actions each id went through.
If it were a table, it could look something like this:
id                      actions
5BQ8yIUBtpR1CBn8kFyo    action1, action2
pISQ9VDSJVlkqklv9VQ9    action1
cohqBHSQC85AHB67AB2h    action1, action2, action3
I tried to use Transforms in the Elasticsearch section of Kibana (v 7.5.0), but it doesn't seem to be the right way.
How would you recommend doing that?
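For what it's worth, one possible way to get this grouping without a transform is a terms aggregation on id with a sub-aggregation on action. This is only a sketch and assumes the default dynamic mapping, i.e. that id.keyword and action.keyword exist as keyword sub-fields:
GET greatest_index-2023.01/_search
{
  "size": 0,
  "aggs": {
    "by_id": {
      "terms": { "field": "id.keyword", "size": 1000 },
      "aggs": {
        "actions": { "terms": { "field": "action.keyword" } }
      }
    }
  }
}
In Kibana, the same grouping can be expressed as a data table visualization with two Terms bucket aggregations, one on id.keyword and one on action.keyword.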
I search for the keyword machine4 in my ES. My Python client call is simply:
result = es.search(index='machines', q='machine4')
The result looks like this:
[
{
"_score": 0.13424811,
"_type": "person",
"_id": "2",
"_source": {
"date": "**20180601**",
"deleted": [],
"changed": [
"machine1",
"machine2",
"machine3"
],
"**added**": [
"**machine4**",
"machine5"
]
},
"_index": "contacts"
},
{
"_score": 0.13424811,
"_type": "person",
"_id": "3",
"_source": {
"date": "**20180701**",
"deleted": [
"machine2"
],
"**changed**": [
"machine1",
"**machine4**",
"machine3"
],
"added": [
"machine7"
]
},
"_index": "contacts"
}
]
So we can easily see:
On date 20180601, machine4 belonged to added.
On date 20180701, machine4 belonged to changed.
I can write another function to analyze the result: basically loop through every key/value pair of each item and check whether the searched keyword belongs to it, like this:
for result in search_results['hits']['hits']:
    source_result = result['_source']
    for key, value in source_result.items():
        if 'machine4' in value:
            print(key)
However, I wonder whether ES has an API to detect which key/mapping/field the searched keyword belongs to? In this case it is added in the 1st result and changed in the 2nd result.
Thank you so much
Alex
The simple answer seems to be that no, Elasticsearch doesn't have a way to do this out of the box, because Lucene doesn't have it, as per this thread.
Elasticsearch has the concept of highlights, however. These could be useful, but they do require you to have some idea about which fields the match may be in.
The ES Python search documentation suggests there's no dedicated parameter for this, but you can build a query with highlighting and pass it as the body argument of search. It would look something like:
query = {"query": {"multi_match": {"query": "machine4", "fields": ["added", "changed", "deleted"]}},
         "highlight": {"fields": {"added": {}, "changed": {}, "deleted": {}}}}
result = es.search(index='machines', body=query)
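If the candidate fields are not known up front, highlighting also accepts wildcard field patterns, so something like the following may work (a sketch in Dev Tools syntax, using the index name shown in the results above):
GET contacts/_search
{
  "query": { "query_string": { "query": "machine4" } },
  "highlight": { "fields": { "*": {} } }
}
Each hit then carries a highlight object whose keys are exactly the fields that produced the match.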
Hope this is helpful!
I have data in Elasticsearch which has a time field named Created. Below is the data:
{
"_index": "machine",
"_type": "health",
"_id": "30",
"_score": 1,
"_source": {
"Data": {
"DataId": "46",
"Created": "2018-06-11T07:31:33.739575"
},
"Datacount": 2,
"hostname": "username",
"health": "running"
}
}
As shown in the data above, I am using Data.Created as my time field in Elasticsearch. Now I want to query the data, for which I open Dev Tools and enter the command below:
GET machine/health/_search?
This gives me all the data belonging to the index machine and type health. How can I sort this data in descending order on the basis of Data.Created so that the latest data comes first? Also, how can I get only the data between two time ranges?
Thanks
Simply add the sort query string parameter
GET machine/health/_search?sort=Data.Created:desc
In order to add a range you can do it like this:
GET machine/health/_search?sort=Data.Created:desc&q=Data.Created:[2018-06-10T00:00:00 TO 2018-06-11T00:00:00]
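The same request can also be written with a request body, which avoids having to URL-encode the date range. A sketch using the index, type and field from the question:
GET machine/health/_search
{
  "sort": [
    { "Data.Created": { "order": "desc" } }
  ],
  "query": {
    "range": {
      "Data.Created": {
        "gte": "2018-06-10T00:00:00",
        "lte": "2018-06-11T00:00:00"
      }
    }
  }
}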
I'm trying out the new machine learning module in X-Pack. I'm trying to identify rare response codes in HTTP access logs over time. My logs are being stored in Elasticsearch as below:
{
"_index": "logstash-2017.05.18",
"_type": "Accesslog",
"_id": "AVxvVfFGdMmRr-0X-J5P",
"_version": 1,
"_score": null,
"_source": {
"request": "/web/Q123/images/buttons/asdf.gif",
"server": "91",
"auth": "-",
"ident": "-",
"verb": "GET",
"type": "Accesslog",
"path": "/path/to/log",
"#timestamp": "2017-05-18T10:20:00.000Z",
"response": "304",
"clientip": "1.1.1.1",
"#version": "1",
"host": "ip-10-10-10-10",
"httpversion": "1.1",
"timestamp": "18/May/2017:10:20:00 +0530"
},
"fields": {
"#timestamp": [
1495102800000
]
}
I added a detector where I selected the function as 'rare' and the 'by_field_name' as 'response'. But when I save the job, I get the following error:
Save failed: [illegal_argument_exception] Can't merge a non object mapping [response] with an object mapping [response]
Please help.
The error message means that you are trying to change an existing mapping. However, that is not possible in Elasticsearch. Once a mapping has been created, it cannot be changed.
As explained by Shay Banon himself:
You can't change existing mapping type, you need to create a new index
with the correct mapping and index the data again.
So you must create a new index to create this mapping. Depending on the situation, you either
create an additional index, or
delete the current index and re-create it from scratch.
Of course in the latter case you will lose all data in the index, so prepare accordingly.
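As an illustration only (the new index name accesslog-v2 and the keyword type for response are assumptions, not taken from the question), the steps could look roughly like this in Dev Tools:
PUT accesslog-v2
{
  "mappings": {
    "Accesslog": {
      "properties": {
        "response": { "type": "keyword" }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "logstash-2017.05.18" },
  "dest": { "index": "accesslog-v2" }
}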
I illustrate my question with an example.
Assume there is a type named "post". Each post has a userid field indicating its author. This is how a post is stored:
{
"_index": "test",
"_type": "post",
"_id": "10098259467307",
"_score": 1,
"_source": {
"text": "user 1 message",
"userid": 1,
"id": 10098259467307,
}}
Is there a way to extract relationships between users based on the co-occurrence of terms within the "text" field of the posts they have authored?
For example, if there are two posts containing the word "elastic", what I would like to see is an edge between the posts' authors.
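Not a full graph answer, but as a sketch of the underlying idea: for a given term, a terms aggregation on userid over the posts matching that term returns the set of authors who used it, and every pair of those authors would form an edge. Index, type and field names are taken from the example document above:
GET test/post/_search
{
  "size": 0,
  "query": { "match": { "text": "elastic" } },
  "aggs": {
    "authors": { "terms": { "field": "userid" } }
  }
}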