Elastic Search single index vs multiple index - elasticsearch

I want to insert nested Structure in Elastic Search.
For Example :
[
  {
    "Product": "P1",
    "Desc": "productDesc",
    "Items": [
      {
        "I1": "i1",
        "I_desc": "i1_desc",
        "prices": [
          { "id": "price1", "value": 10 },
          { "id": "price2", "value": 20 }
        ]
      },
      {
        "I2": "i2",
        "I_desc": "i2_desc",
        "prices": [
          { "id": "price1", "value": 10 },
          { "id": "price", "value": 20 }
        ]
      }
    ]
  },
  {
    "Product": "P12",
    "Desc": "product2Desc",
    "Items": [
      {
        "I1": "i1",
        "I_desc": "i1_desc",
        "prices": [
          { "id": "price11", "value": 12 },
          { "id": "price12", "value": 10 }
        ]
      },
      {
        "I2": "i3",
        "I_desc": "i3_desc",
        "prices": [
          { "id": "price11", "value": 12 },
          { "id": "price31", "value": 33 }
        ]
      }
    ]
  }
]
I want to index this nested structure in Elasticsearch, using index pro with ids P1 and P12 (two documents).
Then I want to query the data, for example:
1. Give me all product ids that contain a price with id = price11
2. All products that contain the item i1
Should I use a single index per id, or index all the attributes (Item, productDesc, prices, id, value)?
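A single index with nested mappings generally covers both queries; there is no need for one index per product id. A minimal sketch (index name pro and field names taken from the example above; the keyword/integer types and the recent-Elasticsearch mapping syntax are assumptions):

```json
PUT pro
{
  "mappings": {
    "properties": {
      "Product": { "type": "keyword" },
      "Desc":    { "type": "text" },
      "Items": {
        "type": "nested",
        "properties": {
          "I_desc": { "type": "text" },
          "prices": {
            "type": "nested",
            "properties": {
              "id":    { "type": "keyword" },
              "value": { "type": "integer" }
            }
          }
        }
      }
    }
  }
}
```

Query 1 then becomes a nested query on the inner prices path:

```json
GET pro/_search
{
  "_source": ["Product"],
  "query": {
    "nested": {
      "path": "Items.prices",
      "query": { "term": { "Items.prices.id": "price11" } }
    }
  }
}
```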

Related

Query for documents where field is largest

I have data like this:
{
  "UID": "a24asdb34-asd42ljdf-ikloewqr",
  "createdById": 1,
  "name": "name1",
  "createDate": "01.14.2019",
  "latest": 369
},
{
  "UID": "a24asdb34-asd42ljdf-ikloewqr",
  "createdById": 1,
  "name": "name2",
  "createDate": "01.14.2019",
  "latest": 395
},
{
  "UID": "a24asdb34-asd42ljdf-ikloewqr",
  "createdById": 1,
  "name": "name3",
  "createDate": "01.14.2019",
  "latest": 450
}
I need a query that selects the document whose latest field is greater than that field in all the other documents.
Java code
@Query(value = "[ { $sort: { latest: -1 } }, { $limit: 1 } ]", fields = "{ 'UID': 1, 'name': 1, 'createDate': 1 }")
Page<MyObject> findByCreatedById(String userId, Pageable pageable);
The shell equivalent must use aggregate (find does not accept a pipeline):
db.orders.aggregate([
  { $sort: { latest: -1 } },
  { $limit: 1 }
])
Sort in descending order of the latest field and limit the result size to 1.
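Since findByCreatedById filters on a user, the shell equivalent of the whole query combines a $match with the sort and limit (collection name orders taken from the snippet above):

```javascript
db.orders.aggregate([
  { $match: { createdById: 1 } },                   // the findByCreatedById filter
  { $sort: { latest: -1 } },                        // greatest "latest" first
  { $limit: 1 },                                    // keep only the top document
  { $project: { UID: 1, name: 1, createDate: 1 } }  // the projected fields
])
```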

Elasticsearch Top 10 Most Frequent Values In Array Across All Records

I have an index "test" whose document structure is shown below. Each document has an array of "tags". How can I query this index to get the top 10 most frequently occurring tags?
Also, what best practices should one follow when there are more than 2 million docs in this index?
{
  "_index": "test",
  "_type": "data",
  "_id": "1412879673545024927_1373991666",
  "_score": 1.0,
  "_source": {
    "instagramuserid": "1373991666",
    "likes_count": 163,
    "#timestamp": "2017-06-08T08:52:41.803Z",
    "post": {
      "created_time": "1482648403",
      "comments": { "count": 9 },
      "user_has_liked": true,
      "link": "https://www.instagram.com/p/BObjpPMBWWf/",
      "caption": {
        "created_time": "1482648403",
        "from": {
          "full_name": "PARAMSahib ™",
          "profile_picture": "https://scontent.cdninstagram.com/t51.2885-19/s150x150/12750236_1692144537739696_350427084_a.jpg",
          "id": "1373991666",
          "username": "parambanana"
        },
        "id": "17845953787172829",
        "text": "This feature talks about how to work pastels .\n\nDull gold pullover + saffron khadi kurta + baby pink pants + Deep purple patka and white sneakers - Perfect colours for a Happy sunday christmas morning . \n#paramsahib #men #menswear #mensfashion #mensfashionblog #mensfashionblogger #menswearofficial #menstyle #fashion #fashionfashion #fashionblog #blog #blogger #designer #fashiondesigner #streetstyle #streetfashion #sikh #sikhfashion #singhstreetstyle #sikhdesigner #bearded #indian #indianfashionblog #indiandesigner #international #ootd #lookbook #delhistyleblog #delhifashionblog"
      },
      "type": "image",
      "tags": [
        "men", "delhifashionblog", "menswearofficial", "fashiondesigner",
        "singhstreetstyle", "fashionblog", "mensfashion", "fashion",
        "sikhfashion", "delhistyleblog", "sikhdesigner", "indianfashionblog",
        "lookbook", "fashionfashion", "designer", "streetfashion",
        "international", "paramsahib", "mensfashionblogger", "indian",
        "blog", "mensfashionblog", "menstyle", "ootd",
        "indiandesigner", "menswear", "blogger", "sikh",
        "streetstyle", "bearded"
      ],
      "filter": "Normal",
      "attribution": null,
      "location": null,
      "id": "1412879673545024927_1373991666",
      "likes": { "count": 163 }
    }
  }
},
A terms aggregation gives you the most frequent values, and it returns the top 10 buckets by default. Note that a terms aggregation needs an exact-value field, so with default dynamic mapping on Elasticsearch 5.x+ you would typically aggregate on the .keyword sub-field:
{
  "size": 0,
  "aggs": {
    "frequent_tags": {
      "terms": { "field": "post.tags.keyword", "size": 10 }
    }
  }
}

Aggregating on generic nested array in Elasticsearch with NEST

I'm trying to analyse data with Elasticsearch. I started working with Elasticsearch and NEST about four months ago, so I might have missed some obvious things. All examples are simplified or altered, but the core is the same.
The data contains an array of nested objects, each of which also contain an array of nested objects, and again, each contains an array of nested objects. The data is obtained from an information request which contains XML messages. The messages are parsed and each element containing (multiple) text elements is saved with their element name, location, and an array with all text element names and values under the message name. I'm thinking this set-up might make analyzing the data easier.
Mapping example:
{
  "data": {
    "properties": {
      "id": { "type": "string" },
      "action": { "type": "string" },
      "result": { "type": "string" },
      "details": {
        "type": "nested",
        "properties": {
          "description": { "type": "string" },
          "message": {
            "type": "nested",
            "properties": {
              "name": { "type": "string" },
              "nodes": {
                "type": "nested",
                "properties": {
                  "name": { "type": "string" },
                  "value": { "type": "string" }
                }
              },
              "source": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
Data example:
{
  "id": "123456789",
  "action": "GetInformation",
  "result": "Success",
  "details": [
    {
      "description": "Request",
      "message": [
        {
          "name": "Body",
          "source": "Message|Body",
          "nodes": [
            { "name": "Action", "value": "GetInformation" },
            { "name": "Identity", "value": "1234" }
          ]
        }
      ]
    },
    {
      "description": "Response",
      "message": [
        {
          "name": "Object",
          "source": "Message|Body|Object",
          "nodes": [
            { "name": "ID", "value": "123" },
            { "name": "Name", "value": "Jim" }
          ]
        },
        {
          "name": "Information",
          "source": "Message|Body|Information",
          "nodes": [
            { "name": "Type", "value": "Birth City" },
            { "name": "City", "value": "Los Angeles" }
          ]
        },
        {
          "name": "Information",
          "source": "Message|Body|Information",
          "nodes": [
            { "name": "Type", "value": "City of Residence" },
            { "name": "City", "value": "New York" }
          ]
        }
      ]
    }
  ]
}
XML Example:
<Message>
  <Body>
    <Object>
      <ID>123</ID>
      <Name>Jim</Name>
    </Object>
    <Information>
      <Type>Birth City</Type>
      <City>Los Angeles</City>
    </Information>
    <Information>
      <Type>City of Residence</Type>
      <City>New York</City>
    </Information>
  </Body>
</Message>
I want to analyse the Name and Value properties of Nodes so I can get an overview of each city within the index that functions as a birthplace and how many people were born in them. Something like:
Dictionary<string, int> birthCities = {
{"Los Angeles", 400}, {"New York", 800},
{"Detroit", 500}, {"Michigan", 700} };
The code I have so far:
var response = client.Search<Data>(search => search
    .Query(query => query
        .Match(match => match
            .OnField(data => data.Action)
            .Query("GetInformation")
        )
    )
    .Aggregations(a1 => a1
        .Nested("Messages", messages => messages
            .Path(data => data.details.FirstOrDefault().Message)
            .Aggregations(a2 => a2
                .Terms("Sources", termSource => termSource
                    .Field(data => data.details.FirstOrDefault().Message.FirstOrDefault().Source)
                    .Aggregations(a3 => a3
                        .Nested("Nodes", nodes => nodes
                            .Path(data => data.details.FirstOrDefault().Message.FirstOrDefault().Nodes)
                            .Aggregations(a4 => a4
                                .Terms("Names", termName => termName
                                    .Field(data => data.details.FirstOrDefault().Message.FirstOrDefault().Nodes.FirstOrDefault().Name)
                                    .Aggregations(a5 => a5
                                        .Terms("Values", termValue => termValue
                                            .Field(data => data.details.FirstOrDefault().Message.FirstOrDefault().Nodes.FirstOrDefault().Value)
                                        )
                                    )
                                )
                            )
                        )
                    )
                )
            )
        )
    )
);
var dict = new Dictionary<string, long>();
var sAggr = response.Aggs.Nested("Messages").Terms("Sources");
foreach (var item in sAggr.Items)
{
    if (item.Key.Equals("information"))
    {
        var nAggr = item.Nested("Nodes").Terms("Names");
        foreach (var nItem in nAggr.Items)
        {
            if (nItem.Key.Equals("city"))
            {
                var vAgg = nItem.Terms("Values");
                foreach (var vItem in vAgg.Items)
                {
                    if (!dict.ContainsKey(vItem.Key))
                    {
                        dict.Add(vItem.Key, 0);
                    }
                    dict[vItem.Key] += vItem.DocCount;
                }
            }
        }
    }
}
This code gives me every city and how many times it occurs, but since birth cities and cities of residence are saved with the same element name at the same location (neither of which I can change), I've found no way to distinguish between them.
Specific types for each action are sadly not an option. So my question is: how can I count all occurrences of a city name with the Birth City type, preferably without having to fetch and iterate over all documents?
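One way to express this server-side is to filter the message nested documents down to those that contain a Type = "Birth City" node, and only then aggregate the City node values. A sketch in raw query DSL (it assumes the name and value fields are indexed as exact values, i.e. not_analyzed/keyword; otherwise the term filters will not match "Birth City"):

```json
{
  "size": 0,
  "aggs": {
    "messages": {
      "nested": { "path": "details.message" },
      "aggs": {
        "birth_city_messages": {
          "filter": {
            "nested": {
              "path": "details.message.nodes",
              "query": {
                "bool": {
                  "must": [
                    { "term": { "details.message.nodes.name": "Type" } },
                    { "term": { "details.message.nodes.value": "Birth City" } }
                  ]
                }
              }
            }
          },
          "aggs": {
            "nodes": {
              "nested": { "path": "details.message.nodes" },
              "aggs": {
                "cities": {
                  "filter": { "term": { "details.message.nodes.name": "City" } },
                  "aggs": {
                    "city_values": {
                      "terms": { "field": "details.message.nodes.value" }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

The city_values buckets then only count City values coming from messages whose sibling nodes mark them as birth cities; the same shape can be rebuilt in NEST by inserting a Filter aggregation between the Nested and Terms levels.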

Update object in array with new fields mongodb

I have a MongoDB document where horses is an array of objects with id, name and type:
{
  "_id": 33333333333,
  "horses": [
    { "id": 72029, "name": "Awol", "type": "flat" },
    { "id": 822881, "name": "Give Us A Reason", "type": "flat" },
    { "id": 826474, "name": "Arabian Revolution", "type": "flat" }
  ]
}
I need to add new fields to one of the array elements. I tried something like this, but it did not work as expected:
horse = {
  "place": 1,
  "body": 11
}
Card.where({'_id' => 33333333333}).find_and_modify({'$set' => {'horses.' + index.to_s => horse}}, upsert: true)
But this removed all the existing fields and inserted the new ones. How can I add the new fields while keeping the existing ones?
Indeed, this command will overwrite the whole subdocument:
'$set': {
  'horses.0': {
    "place": 1,
    "body": 11
  }
}
You need to set the individual fields instead:
'$set': {
  'horses.0.place': 1,
  'horses.0.body': 11
}

Dynamic fields and slow queries

Currently, I'm managing a set of lists, each containing a number of members.
Every list can look different when it comes to the fields and their naming.
Typically, a basic list member could look like this (from my members collection):
{
  "_id": ObjectId("52284ae408edcb146200009f"),
  "list_id": 1,
  "status": "active",
  "imported": 1,
  "fields": {
    "firstname": "John",
    "lastname": "Doe",
    "email": "john@example.com",
    "birthdate": ISODate("1977-09-03T23:08:20.000Z"),
    "favorite_color": "Green",
    "interests": [
      { "id": 8, "value": "Books" },
      { "id": 10, "value": "Travel" },
      { "id": 12, "value": "Cooking" },
      { "id": 15, "value": "Wellnes" }
    ]
  },
  "created_at": ISODate("2012-05-06T15:12:26.000Z"),
  "updated_at": ISODate("2012-05-06T15:12:26.000Z")
}
All the fields under the "fields" key are unique to the current list id, and they can change for every list id, which means a new list could look like this:
{
  "_id": ObjectId("52284ae408edcb146200009f"),
  "list_id": 2,
  "status": "active",
  "imported": 1,
  "fields": {
    "fullname": "John Doe",
    "email": "john@example.com",
    "cell": 123456787984
  },
  "created_at": ISODate("2012-05-06T15:12:26.000Z"),
  "updated_at": ISODate("2012-05-06T15:12:26.000Z")
}
Currently, my application allows users to search dynamically in each of the custom fields, but since these have no indexes, the process can be very slow.
I don't believe it's an option to let list creators select which fields should be indexed, but I really need to speed this up.
Is there any solution for this?
If you refactor your documents so that fields is an array of name/value pairs, you can leverage indexes:
fields: [
  { name: 'fullName', value: 'John Doe' },
  { name: 'email', value: 'john#example.com' },
  ...
]
Create a compound index on fields.name and fields.value.
Of course this is not a solution for "deeper" values like your interests list.
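As a sketch, the index and a query that uses it might look like this (collection name members taken from the question; $elemMatch keeps both conditions on the same array element, so the compound index can be used):

```javascript
db.members.createIndex({ "fields.name": 1, "fields.value": 1 })

db.members.find({
  list_id: 2,
  fields: { $elemMatch: { name: "email", value: "john@example.com" } }
})
```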