How to denormalize relational data in Elasticsearch

I am working on a social networking application and I am using Elasticsearch to serve data. My queries involve multiple joins. Users can share posts, and each post has one parent user. In my scenario, I need to show the posts of the users that you follow.
Type Post
{
"_index" : "xxxxxx",
"_type" : "_doc",
"_id" : "p-370648",
"_score" : null,
"_routing" : "2",
"_source" : {
"uid" : "9a73b0e0-a52c-11ec-aa58-37061b467b8c",
"user_id" : 87,
"id" : 370648,
"type" : {
"parent" : "u-87",
"name" : "post"
},
"item_type_number" : 2,
"source_key" : "youtube-5wcpIrpbvXQ#2"
}
}
Type User
{
"_index" : "trending",
"_type" : "_doc",
"_id" : "u-56432",
"_score" : null,
"_routing" : "1",
"_source" : {
"gender" : "female",
"picture" : "125252125.jpg",
"uid" : "928de1a5-cc93-4fd3-adec-b9fb220abc2b",
"full_name" : "Shannon Owens",
"dob" : "1990-08-18",
"id" : 56432,
"username" : "local_12556",
"type" : {
"name" : "user"
}
}
}
Type Follow
{
"_index" : "trending",
"_type" : "_doc",
"_id" : "fr-561763",
"_score" : null,
"_routing" : "6",
"_source" : {
"user_id" : 25358,
"id" : 561763,
"object_id" : 36768,
"status" : "U",
"type" : {
"parent" : "u-36768",
"name" : "followers"
}
}
}
In this scenario, when a user follows someone, we save a record of type "followers" in Elasticsearch, where object_id is the user being followed and user_id is the user who follows. Each post, on the other hand, has one parent user. So when I fetch documents of type post from Elasticsearch, I need a two-level join.
The first join is between the post and its parent user, and the second checks the follow status against that user. This query works well when there is no traffic on the system. But when traffic arrives and concurrent requests are sent, the Elasticsearch query goes down under the processing load. I tried to fix this with a more powerful server (more CPU/RAM), but it still falls over.
So I decided to denormalize the post data, but the problem is that I cannot check the follow status against a post.
If I run another query against the DB and use some caching, I run into memory exhaustion when the follow data for thousands of users comes back in the query. So is there any way to check the follow status directly on documents of type post, instead of adding a parent join to the query?
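One possible direction, offered only as a sketch and not as a confirmed fix: since each post already carries the author's user_id (as in the Post document above), the follow check could be written as a terms filter on that field, using Elasticsearch's terms lookup to read the followed IDs from a single per-user document instead of going through a parent join. The index name follow_lists, the field following_ids, the document ID scheme u-<id>, and the helper build_feed_query are all hypothetical; only user_id comes from the documents above.

```python
from typing import Any, Dict


def build_feed_query(viewer_id: int, size: int = 20) -> Dict[str, Any]:
    """Build a search body for posts whose author is followed by the viewer.

    Assumption: a hypothetical index "follow_lists" holds one document per
    user (ID "u-<user id>") with an array field "following_ids" listing the
    IDs of the users that user follows.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {
                        # Terms lookup: Elasticsearch fetches the term list
                        # from the referenced document at query time.
                        "terms": {
                            "user_id": {
                                "index": "follow_lists",
                                "id": f"u-{viewer_id}",
                                "path": "following_ids",
                            }
                        }
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_feed_query(25358), indent=2))
```

With this layout the application never has to load the follow list itself; the trade-off is that the per-user follow document must be updated on every follow and unfollow.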

Related

Check documents not existing at elasticsearch

I have millions of indexed documents. After indexing, I found a document count mismatch. I want to send an array of hundreds of document IDs, check in Elasticsearch whether those documents exist, and get back the IDs that have not been indexed.
Example:
These are the indexed documents:
[497499, 497550, 498370, 498476, 498639, 498726, 498826, 500479, 500780, 500918]
I'm sending 4 at a time:
[497499, 88888, 497550, 77777]
The response should contain the ones that are not there:
[88888, 77777]
You should consider using the _mget endpoint and then parsing the result, for instance:
GET someidx/_mget?_source=false
{
"docs" : [
{
"_id" : "c37m5W4BifZmUly9Ni-X"
},
{
"_id" : "2"
}
]
}
Result :
{
"docs" : [
{
"_index" : "someidx",
"_type" : "_doc",
"_id" : "c37m5W4BifZmUly9Ni-X",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true
},
{
"_index" : "someidx",
"_type" : "_doc",
"_id" : "2",
"found" : false
}
]
}
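The parsing step can be sketched in Python as follows. This is a minimal example that assumes the response has already been decoded into a dict with the shape shown above; the helper name missing_ids is hypothetical and the sample response is trimmed to the fields the helper reads.

```python
from typing import Any, Dict, List


def missing_ids(mget_response: Dict[str, Any]) -> List[str]:
    """Return the _id of every entry that the _mget response marks as not found."""
    return [
        doc["_id"]
        for doc in mget_response.get("docs", [])
        if not doc.get("found", False)
    ]


# Trimmed copy of the result shown above.
response = {
    "docs": [
        {"_index": "someidx", "_id": "c37m5W4BifZmUly9Ni-X", "found": True},
        {"_index": "someidx", "_id": "2", "found": False},
    ]
}

if __name__ == "__main__":
    print(missing_ids(response))  # prints ['2']
```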

Kibana - given an index, how to find saved objects relying on it?

In Kibana I have many dozens of indices.
Given one of them, I want a way to find all the saved objects (searches/dashboards/visualizations) that rely on this index.
Thanks
You can retrieve the document ID of your index pattern and then use it to search your .kibana index:
{
"_index" : ".kibana",
"_type" : "index-pattern",
"_id" : "AWBWDmk2MjUJqflLln_o", <---- take this id...
You can use this query on Kibana 5:
GET .kibana/_search?q=AWBWDmk2MjUJqflLln_o <---- ...and use it here
You'll find your visualizations:
{
"_index" : ".kibana",
"_type" : "visualization",
"_id" : "AWBZNJNcMjUJqflLln_s",
"_score" : 6.2450323,
"_source" : {
"title" : "CA groupe",
"visState" : """{"title":"XXX","type":"pie","params":{"addTooltip":true,"addLegend":true,"legendPosition":"right","isDonut":false,"type":"pie"},"aggs":[{"id":"1","enabled":true,"type":"sum","schema":"metric","params":{"field":"XXX","customLabel":"XXX"}},{"id":"2","enabled":true,"type":"terms","schema":"segment","params":{"field":"XXX","size":5,"order":"desc","orderBy":"1","customLabel":"XXX"}}],"listeners":{}}""",
"uiStateJSON" : "{}",
"description" : "",
"version" : 1,
"kibanaSavedObjectMeta" : {
"searchSourceJSON" : """{"index":"AWBWDmk2MjUJqflLln_o","query":{"match_all":{}},"filter":[]}"""
^
|
this is where your index pattern is used
}
}
},
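If you post-process the search hits in a script, the check can be sketched like this. The helper objects_using_pattern is hypothetical, and it only inspects the kibanaSavedObjectMeta.searchSourceJSON field shown in the hit above; saved objects that reference the pattern some other way would not be caught by it.

```python
import json
from typing import Any, Dict, List


def objects_using_pattern(hits: List[Dict[str, Any]], pattern_id: str) -> List[Dict[str, Any]]:
    """Return id/type/title of hits whose searchSourceJSON points at pattern_id."""
    matches = []
    for hit in hits:
        source = hit.get("_source", {})
        raw = source.get("kibanaSavedObjectMeta", {}).get("searchSourceJSON")
        if not raw:
            continue
        try:
            search_source = json.loads(raw)  # the field is a JSON string
        except ValueError:
            continue  # skip hits whose searchSourceJSON is not valid JSON
        if isinstance(search_source, dict) and search_source.get("index") == pattern_id:
            matches.append(
                {"id": hit.get("_id"), "type": hit.get("_type"), "title": source.get("title")}
            )
    return matches


# Sample hits modelled on the result above; the second one is made up for contrast.
sample_hits = [
    {
        "_id": "AWBZNJNcMjUJqflLln_s",
        "_type": "visualization",
        "_source": {
            "title": "CA groupe",
            "kibanaSavedObjectMeta": {
                "searchSourceJSON": '{"index":"AWBWDmk2MjUJqflLln_o","query":{"match_all":{}},"filter":[]}'
            },
        },
    },
    {
        "_id": "other",
        "_type": "visualization",
        "_source": {
            "title": "unrelated",
            "kibanaSavedObjectMeta": {"searchSourceJSON": '{"index":"some-other-id"}'},
        },
    },
]

if __name__ == "__main__":
    print(objects_using_pattern(sample_hits, "AWBWDmk2MjUJqflLln_o"))
```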

ElasticSearch Bulk with ingest plugin

I am using the Attachment Processor in a pipeline.
Everything works fine, but I wanted to index multiple documents at once, so I tried to use the bulk API.
Bulk works fine too, but I can't find how to send the URL parameter "pipeline=attachment".
This single-document request works:
POST testindex/type1/1?pipeline=attachment
{
"data": "Y291Y291",
"name" : "Marc",
"age" : 23
}
This bulk request works:
POST _bulk
{ "index" : { "_index" : "testindex", "_type" : "type1", "_id" : "2" } }
{ "name" : "jean", "age" : 22 }
But how can I index Marc with his data field in bulk so that it is processed by the pipeline?
Thanks to Val's comment, I did the following and it works fine:
POST _bulk
{ "index" : { "_index" : "testindex", "_type" : "type1", "_id" : "2", "pipeline": "attachment"} }
{"data": "Y291Y291", "name" : "jean", "age" : 22}
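When the bulk body is generated from code, the same idea can be sketched as follows. The helper build_bulk_body is hypothetical; it simply places the pipeline name in each action's metadata, as in the request above, and emits newline-delimited JSON with the trailing newline that the bulk API expects.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple


def build_bulk_body(
    index: str,
    doc_type: str,
    docs: Iterable[Tuple[str, Dict[str, Any]]],
    pipeline: Optional[str] = None,
) -> str:
    """Build an NDJSON bulk body of index actions, optionally routed through a pipeline."""
    lines = []
    for doc_id, source in docs:
        action = {"_index": index, "_type": doc_type, "_id": doc_id}
        if pipeline:
            action["pipeline"] = pipeline  # per-action pipeline, as in the example above
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body(
        "testindex",
        "type1",
        [("2", {"data": "Y291Y291", "name": "jean", "age": 22})],
        pipeline="attachment",
    )
    print(body, end="")
```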

Is the order of operations guaranteed in a bulk update?

I am sending delete and index requests to elasticsearch in bulk (the example is adapted from the docs):
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
The sequence above is intended to first delete a possible document with _id=1, then index a new document with the same _id=1.
Is the order of the actions guaranteed? In other words, for the example above, can I be sure that the delete will not touch the document indexed afterwards (because the order is not respected for one reason or another)?
The delete operation is useless in this scenario: if you simply index a document with the same ID, it will automatically and implicitly delete/replace the previous document with the same ID.
So if a document with ID=1 already exists, simply sending the command below will replace it (read: delete and re-index it):
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
According to an Elastic Team Member:
Elasticsearch is distributed and concurrent. We do not guarantee that requests are executed in the order they are received.
https://discuss.elastic.co/t/are-bulk-index-operations-serialized/83770/6
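Following the first answer's point that an index action already replaces an existing document, a client could strip such deletes before sending the batch, which avoids relying on ordering at all. The sketch below is illustrative only: the tuple layout (action, metadata, source) and the helper drop_redundant_deletes are assumptions, not part of any Elasticsearch client.

```python
from typing import Any, Dict, List, Optional, Tuple

Op = Tuple[str, Dict[str, Any], Optional[Dict[str, Any]]]


def drop_redundant_deletes(ops: List[Op]) -> List[Op]:
    """Remove delete actions that are followed by an index action on the same document."""
    kept = []
    for pos, (action, meta, source) in enumerate(ops):
        key = (meta["_index"], meta["_id"])
        later_index = any(
            later_action == "index" and (later_meta["_index"], later_meta["_id"]) == key
            for later_action, later_meta, _ in ops[pos + 1:]
        )
        if action == "delete" and later_index:
            continue  # the later index action replaces the document anyway
        kept.append((action, meta, source))
    return kept


# The delete/index pair from the question.
ops = [
    ("delete", {"_index": "test", "_type": "type1", "_id": "1"}, None),
    ("index", {"_index": "test", "_type": "type1", "_id": "1"}, {"field1": "value1"}),
]

if __name__ == "__main__":
    for action, meta, source in drop_redundant_deletes(ops):
        print(action, meta, source)
```

A delete with no later index action on the same ID is kept, since in that case it is the only operation that removes the document.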

Cannot update path in timestamp value

Here is my problem: I'm trying to insert a bunch of data into Elasticsearch and visualize it using Kibana, but I have an issue with Kibana's timestamp recognition.
My time field is called "dateStart", and I tried to use it as a timestamp using the following command :
curl -XPUT 'localhost:9200/test/type1/_mapping' -d'{ "type1" :{"_timestamp":{"enabled":true, "format":"yyyy-MM-dd HH:mm:ss","path":"dateStart"}}}'
But this command gives me the following error message:
{"error":"MergeMappingException[Merge failed with failures {[Cannot update path in _timestamp value. Value is null path in merged mapping is missing]}]","status":400}
I'm not sure I understand what this command does, but what I would like to do is tell Elasticsearch and Kibana to use my "dateStart" field as a timestamp.
Here is a sample of my insert file (I use bulk insert) :
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1"} }
{ "dateStart" : "15-03-31 06:00:00", "score":0.9920092243874442}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2"} }
{ "dateStart" : "15-03-23 06:00:00", "score":0.0}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3"} }
{ "dateStart" : "15-03-29 12:00:00", "score":0.0}
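One detail visible in the sample: the dateStart values use a two-digit year ("15-03-31 06:00:00"), while the format given in the mapping command is yyyy-MM-dd HH:mm:ss. Whether that is related to the error above is not established here, but the values could be normalized before building the bulk file. The sketch below assumes the values are ordered year-month-day and that "15" means 2015; the helper name to_mapping_format is hypothetical.

```python
from datetime import datetime


def to_mapping_format(value: str) -> str:
    """Convert 'yy-MM-dd HH:mm:ss' (as in the sample) to 'yyyy-MM-dd HH:mm:ss'."""
    # Assumption: two-digit year first, then month, then day.
    parsed = datetime.strptime(value, "%y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(to_mapping_format("15-03-31 06:00:00"))  # prints 2015-03-31 06:00:00
```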
