Is there a preferred way to join two Elasticsearch indices so that I can sort my query?
Index #1
// GET /activities/_doc/1aadea40-e93b-42b4-9c76-05ebed4335fe (simplified output)
{
"_index" : "activities-1605040906149",
"_type" : "_doc",
"_id" : "1aadea40-e93b-42b4-9c76-05ebed4335fe",
"_source" : {
"date" : 1614286078420,
"activityId" : "1aadea40-e93b-42b4-9c76-05ebed4335fe",
"referralId" : "943f6d94-b2dd-4e89-9383-447fdd1d73d8",
"duration" : 90
}
}
Index #2
// GET /referrals/_doc/2c022a6e-2543-4cdd-8595-98aea41e8966 (simplified output)
{
"_index" : "referrals-1612984843755",
"_type" : "_doc",
"_id" : "2c022a6e-2543-4cdd-8595-98aea41e8966",
"_source" : {
"displayName" : "JOHN DOE",
"referralId" : "2c022a6e-2543-4cdd-8595-98aea41e8966",
}
}
I’d like to be able to join the contents of the referrals index with the contents of my activities index and then sort based on the referral’s displayName. I would need to do this for tens of thousands of records.
Other solutions include denormalizing my data, but I was hoping to see if there was an alternative way.
You can do joins in the following ways:
nested structure: map the related data as a nested object inside a single document, so no join is needed
parent-child relationship: use a 'type' (join) field to distinguish your documents within the same index; parent and child are indexed into the same shard, hence the query is limited to a single shard
application-side joins: denormalize at index time, e.g. copy the referral's fields (such as displayName) directly into each activities document, allowing you to search/query on them directly
Then you can sort based on the referral's field, as sketched below.
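For the application-side approach, a minimal sketch, assuming you copy the referral's displayName into each activities document at index time and that displayName is mapped with a keyword sub-field (both the copied field and the mapping are assumptions, not part of the original mappings):

// re-index the activity with the denormalized displayName
// (value looked up from the matching referral document)
PUT activities-1605040906149/_doc/1aadea40-e93b-42b4-9c76-05ebed4335fe
{
  "date" : 1614286078420,
  "activityId" : "1aadea40-e93b-42b4-9c76-05ebed4335fe",
  "referralId" : "943f6d94-b2dd-4e89-9383-447fdd1d73d8",
  "duration" : 90,
  "displayName" : "JOHN DOE"
}

// then sort activities directly on the copied field
GET activities-1605040906149/_search
{
  "sort" : [
    { "displayName.keyword" : "asc" }
  ]
}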
Related
I am working on a social networking application and I am using Elasticsearch for service data. I have multiple joins in Elasticsearch. Users can share posts and each post has one parent user. I have a scenario where I show posts of those users whom you follow.
Type Post
{
"_index" : "xxxxxx",
"_type" : "_doc",
"_id" : "p-370648",
"_score" : null,
"_routing" : "2",
"_source" : {
"uid" : "9a73b0e0-a52c-11ec-aa58-37061b467b8c",
"user_id" : 87,
"id" : 370648,
"type" : {
"parent" : "u-87",
"name" : "post"
},
"item_type_number" : 2,
"source_key" : "youtube-5wcpIrpbvXQ#2"
}
}
Type User
{
"_index" : "trending",
"_type" : "_doc",
"_id" : "u-56432",
"_score" : null,
"_routing" : "1",
"_source" : {
"gender" : "female",
"picture" : "125252125.jpg",
"uid" : "928de1a5-cc93-4fd3-adec-b9fb220abc2b",
"full_name" : "Shannon Owens",
"dob" : "1990-08-18",
"id" : 56432,
"username" : "local_12556",
"type" : {
"name" : "user"
}
}
}
Type Follow
{
"_index" : "trending",
"_type" : "_doc",
"_id" : "fr-561763",
"_score" : null,
"_routing" : "6",
"_source" : {
"user_id" : 25358,
"id" : 561763,
"object_id" : 36768,
"status" : "U",
"type" : {
"parent" : "u-36768",
"name" : "followers"
}
}
}
So in this scenario, if a user follows someone, we save a record in Elasticsearch with object_id (the user being followed), user_id (the user who follows), and type "followers"; on the other hand, each post has one parent user. So when I try to fetch posts with type post, I need two levels of joins.
The first joins the post with its parent user, and the second checks the following status with the user. This query works well when there is no traffic on the system, but when concurrent requests come in, the Elasticsearch query falls over under the processing load, even though I tried to fix the issue with a higher-performance server with more CPU/RAM.
So I decided to denormalize the type post data, but the problem is that I am unable to check the following status with the post.
If I do another query from the DB and use some caching, I face memory exhaustion when thousands of followed users come back in the query. So is there any way to check the following status directly in the query for type posts, instead of adding a parent join to the query?
PS: I'm new to elasticsearch
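One way to avoid the second join level is a terms lookup: keep one document per user listing the IDs they follow, and let the posts query fetch that list at search time. A minimal sketch, assuming a hypothetical user-following index whose documents hold a following_ids array (both the index name and the field name are assumptions, as is the type.name mapping):

// fetch posts whose author is in the current user's following list
GET trending/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "type.name": "post" } },
        {
          "terms": {
            "user_id": {
              "index": "user-following",
              "id": "25358",
              "path": "following_ids"
            }
          }
        }
      ]
    }
  }
}

This trades the parent-child join for one extra document fetch per search, at the cost of keeping the following list up to date whenever a follow/unfollow happens.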
http://localhost:9200/indexname/domains/<mydocname>
Let's suppose we have indexname as our index and I'm uploading a lot of documents at <mydocname> with domain names, e.g.:
http://localhost:9200/indexname/domains/google.com
http://localhost:9200/indexname/domains/company.com
Looking at http://localhost:9200/indexname/_count , it says that we have "count": 119687 documents.
I just want Elasticsearch to return the document names of all 119687 entries, which are domain names.
How do I achieve that and is it possible to achieve that in one single query?
Looking at the example http://localhost:9200/indexname/domains/google.com, I am assuming your doc_type is domains and your doc id / "document name" is google.com.
_id is the document name here, and it is always part of the response. You can use source filtering to disable the source, and the response will show only something like below:
GET indexname/_search
{
"_source": false
}
Output
{
...
"hits" : [
{
"_index" : "indexname",
"_type" : "domains",
"_id" : "google.com",
"_score" : 1.0
}
]
...
}
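Note that a single search returns at most 10,000 hits by default, so pulling all 119687 IDs requires pagination. A minimal sketch using the scroll API (the 1m keep-alive and the page size are arbitrary choices):

// first page: keep the search context alive for 1 minute
GET indexname/_search?scroll=1m
{
  "size": 1000,
  "_source": false
}

// subsequent pages: pass the _scroll_id returned by the previous call
POST _search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}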
If documentname is a field that is mapped, then you can still use source filtering to include only that field.
GET indexname/_search
{
"_source": ["documentname"]
}
I am trying to get the make mapping for different companies using Elasticsearch, in which the different companies are defined in the company field. I want to fetch all the makes which occur for a particular company, and based on those, fetch the makes of other companies that are common with the first company's makes.
I am new to Elasticsearch and want some assistance with this.
I have tried segregating the problem using Elasticsearch filters and aggregations, but I am still not able to get the required values.
Data for vehicle mappings:
{
"_index" : "vehiclemapping",
"_id" : "fN1P-GwBjuCNVtK7BNxL",
"company":"abc1",
"make":"make1"
},
{
"_index" : "vehiclemapping",
"_id" : "fN1P-GwBjuCNVtK7BNx2",
"company":"abc2",
"make":"make2"
},
{
"_index" : "vehiclemapping",
"_id" : "fN1P-GwBjuCNVtK7BNx3",
"company":"abc3",
"make":"make3"
},
{
"_index" : "vehiclemapping",
"_id" : "fN1P-GwBjuCNVtK7BNx4",
"company":"abc1",
"make":"make2"
},
{
"_index" : "vehiclemapping",
"_id" : "fN1P-GwBjuCNVtK7BNx5",
"company":"abc2",
"make":"make1"
},
{
"_index" : "vehiclemapping",
"_id" : "fN1P-GwBjuCNVtK7BNx6",
"company":"abc2",
"make":"make3"
}
I am expecting to get as a result all documents having company = 'abc1', along with the other documents whose makes match abc1's makes.
Expected output:
{ "make": "make1" },
{ "make": "make2" },
{ "make": "make3" }
I have logs captured in an Elastic index; the "Message" field in the index holds the entire log message. I want to split that data into multiple fields like timestamp, IP, etc.
Note: The logs are pumped directly into Elastic from our application using POST.
I have created a grok pattern to split this information, but I am not sure how to apply the transformation on the fly.
{
"_index" : "logs_exception",
"_type" : "_doc",
"_id" : "9RI-BGoBwdzZ5ffB3_Sj",
"_score" : 2.4795628,
"_source" : {
"CorrelationId" : "bd3fc7d6-ca39-44e1-9a59-xxasdasd1",
"Message" : "2019-04-10 10:36:27,780 [8] ERROR LoggingService.TestConsole.Program [(null)] - System.AppDomainUnloadedException: Attempted to access an unloaded AppDomain."
}
}
Can we create a pipeline in Elastic that feeds from one index, applies the grok pattern, and pushes the results to another index? Or what's the best way to do this?
The best way is to configure an ingest node to preprocess your documents before indexing them into ES.
In your case you need a grok processor to match the Message field and split it into separate fields. Below is a sample pipeline definition with a grok processor to ingest your document into Elastic:
{
"description" : "...",
"processors": [
{
"grok": {
"field": "message",
"patterns": ["%{DATESTAMP:timestamp}%{SPACE}%{SPACE}\[(?<misc1>.*)\]%{SPACE}%{WORD:loglevel}%{SPACE}%{JAVACLASS:originator}%{SPACE}\[(?<misc2>.*)\]%{SPACE}%{GREEDYDATA:data}"]
}
}
]
}
With the above pipeline definition in place, your data will be indexed as below.
{
"_index" : "logs_exception",
"_type" : "_doc",
"_id" : "9RI-BGoBwdzZ5ffB3_Sj",
"_score" : 2.4795628,
"_source" : {
"CorrelationId" : "bd3fc7d6-ca39-44e1-9a59-xxasdasd1",
"timestamp" : "19-04-10 10:36:27,780",
"misc1" : 8,
"loglevel": ERROR,
"originator": "LoggingService.TestConsole.Program",
"misc2": (null),
"data" : "- System.AppDomainUnloadedException: Attempted to access an unloaded AppDomain.",
"Message" : "2019-04-10 10:36:27,780 [8] ERROR LoggingService.TestConsole.Program [(null)] - System.AppDomainUnloadedException: Attempted to access an unloaded AppDomain."
}
}
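To apply this to documents that are already indexed, the pipeline can be registered and then referenced from a _reindex call that writes into a new index. A minimal sketch (the names logs-pipeline and logs_exception_v2 are assumptions):

// register the pipeline
PUT _ingest/pipeline/logs-pipeline
{
  "description": "split Message into separate fields",
  "processors": [
    {
      "grok": {
        "field": "Message",
        "patterns": ["%{DATESTAMP:timestamp}%{SPACE}%{SPACE}\\[(?<misc1>.*)\\]%{SPACE}%{WORD:loglevel}%{SPACE}%{JAVACLASS:originator}%{SPACE}\\[(?<misc2>.*)\\]%{SPACE}%{GREEDYDATA:data}"]
      }
    }
  ]
}

// re-index the existing documents through the pipeline into a new index
POST _reindex
{
  "source": { "index": "logs_exception" },
  "dest": { "index": "logs_exception_v2", "pipeline": "logs-pipeline" }
}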
If the logs pass through Logstash instead, you can make use of its json filter:
filter {
  json {
    source => "message"
    target => "event"
  }
}
I am sending delete and index requests to elasticsearch in bulk (the example is adapted from the docs):
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
The sequence above is intended to first delete a possible document with _id=1, then index a new document with the same _id=1.
Is the order of the actions guaranteed? In other words, for the example above, can I be sure that the delete will not touch the document indexed afterwards (because the order would not be respected for a reason or another)?
The delete operation is useless in this scenario: if you simply index a document with the same ID, it will automatically and implicitly delete/replace the previous document with that ID.
So if a document with ID=1 already exists, simply sending the command below will replace it (read: delete and re-index it):
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
According to an Elastic Team Member:
Elasticsearch is distributed and concurrent. We do not guarantee that requests are executed in the order they are received.
https://discuss.elastic.co/t/are-bulk-index-operations-serialized/83770/6