How to index thousands of sub-objects?

I have something like this:
MainObject
~3000 SubObjects
Each SubObject has ~2 SubSubObjects
Likewise, each SubSubObject has ~1 SubSubSubObject
For each SubObject I need a piece of information from the MainObject (an array of integers). For now, when I add the MainObject with all its SubObjects to the database (via a console command), I duplicate the array into every object (thousands of duplicates...), and whenever I need to edit this array, I re-index everything again... I'm sure I can do better.
In the documentation I've seen that there are several possibilities: object, nested, parent/child... But I don't really know which one is best.
In another post, someone explained to me how to do it with nested documents and aggregations... But I couldn't get it to work... And the more I read, the more I doubt the nested approach.
Thank you for your help.
Edit: here is a simplified tree of my entities (in Doctrine), as JSON:
{
"public": false,
"authorized_users": [1, 23, 51],
"chromosomes": [
{
"name": "C1",
"locus": [
{
"name": "locus1",
"features": [
{
"name": "feature1",
"products": [
{
"name": "product1"
//...
}
]
}
]
}
]
}
]
}
I only search on the name of locus, feature and product entities, but with a filter on public and authorized_users. That's why I create documents like these (in Elasticsearch):
{
"_type": "locus",
"name": "locus1",
"public": false,
"authorized_users": [1, 23, 51]
},
{
"_type": "locus",
"name": "locus2",
"public": false,
"authorized_users": [1, 23, 51]
},
{
"_type": "feature",
"name": "feature1",
"public": false,
"authorized_users": [1, 23, 51]
}
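Here is a minimal Python sketch of the denormalization described above. It walks the Doctrine tree and emits one flat document per locus, feature and product, each carrying the root's public and authorized_users fields. It then builds a search body that matches on name and filters on access. The doc_type key (used instead of the _type metadata field), the function names and the exact query shape are assumptions for illustration, not the author's code. The bool, filter, term and match clauses are standard Elasticsearch query DSL.

```python
# Sketch only: assumes the tree shape shown above
# (chromosomes -> locus -> features -> products).
# "doc_type" and the function names are hypothetical.

def flatten_tree(root):
    """Emit one flat document per locus, feature and product, each carrying
    the root's access-control fields (denormalization)."""
    acl = {"public": root["public"], "authorized_users": list(root["authorized_users"])}
    docs = []
    for chromosome in root.get("chromosomes", []):
        for locus in chromosome.get("locus", []):
            docs.append({"doc_type": "locus", "name": locus["name"], **acl})
            for feature in locus.get("features", []):
                docs.append({"doc_type": "feature", "name": feature["name"], **acl})
                for product in feature.get("products", []):
                    docs.append({"doc_type": "product", "name": product["name"], **acl})
    return docs


def build_search_body(text, user_id):
    """Full-text match on name, restricted (in filter context, so no scoring)
    to public documents or documents whose authorized_users contains user_id."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": text}}],
                "filter": [
                    {
                        "bool": {
                            "should": [
                                {"term": {"public": True}},
                                {"term": {"authorized_users": user_id}},
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ],
            }
        }
    }


tree = {
    "public": False,
    "authorized_users": [1, 23, 51],
    "chromosomes": [
        {"name": "C1", "locus": [
            {"name": "locus1", "features": [
                {"name": "feature1", "products": [{"name": "product1"}]}
            ]}
        ]}
    ],
}
docs = flatten_tree(tree)
print([d["doc_type"] for d in docs])  # ['locus', 'feature', 'product']
```

The drawback the question describes remains. When authorized_users changes, every flattened document has to be rewritten (for example with the _update_by_query API). Nested or parent/child mappings are meant to avoid that cost.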

Related

Laravel intertwined relationship

While user A is editing the permissions of user B, user A needs to see both their own permissions and the permissions that user B has. For this I wrote something like the following, and it gives me a nice output:
Controller:
PermissionCategoryResource::collection(PermissionCategory::with([
'permissions' => fn ($query) => $query->whereHas('adminUsers', fn ($query) => $query->where('admin_users.id', $this->user()->id)),
'selected' => fn ($query) => $query->whereHas('adminUsers', fn ($query) => $query->where('admin_users.id', $id)),
])
->select('id','name')
->get());
output:
{
"id": 2,
"name": "user",
"permissions": {
"permissions": [
[{
"id": 2,
"name": "userCreate"
},
{
"id": 3,
"name": "userUpdate"
},
{
"id": 4,
"name": "userDelete"
}
]
],
"selected": [
[{
"id": 2,
"name": "userCreate"
},
{
"id": 3,
"name": "userUpdate"
}
]
]
}
},
selected: The permissions of user B, edited by user A.
However, to compare the permissions in "permissions" with those in "selected", I have to loop over both with foreach, and I don't like nested foreach loops. I think Laravel has a solution for this. I'm new to Laravel and still learning, so please bear with me. This is what I want: while looping over the entries in "permissions", check whether each permission also exists in "selected". If it does, add the key "selected": true to that permission. In short, the output should look like this:
{
"id": 2,
"name": "user",
"permissions": {
"permissions": [
[{
"id": 2,
"name": "userCreate",
"selected": true
},
{
"id": 3,
"name": "userUpdate",
"selected": true
},
{
"id": 4,
"name": "userDelete"
}
]
],
"selected": [
[{
"id": 2,
"name": "userCreate"
},
{
"id": 3,
"name": "userUpdate"
}
]
]
}
},
I think the example explains it better than words.
I tried this with resources and with array mapping, but failed. Do you have a suggestion for solving this?
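The nested loop can be avoided by collecting the ids in "selected" into a set once, then making a single pass over "permissions" with a constant-time membership check. The following is a minimal sketch of that idea in plain Python. It assumes each list is a flat list of {id, name} objects (the output above wraps them in an extra array), and the function name is hypothetical. In Laravel the same idea could roughly be expressed with Collection methods such as pluck('id'), map() and contains(), but the exact code would depend on how the resource is built.

```python
# Sketch of the set-lookup idea; field names follow the JSON above,
# the function name is hypothetical.

def mark_selected(permissions, selected):
    """Return copies of `permissions`, adding "selected": True to each one
    whose id also appears in `selected`. One pass over each list instead of
    a nested loop: O(n + m) rather than O(n * m)."""
    selected_ids = {p["id"] for p in selected}
    result = []
    for perm in permissions:
        item = dict(perm)  # copy so the input list is not mutated
        if item["id"] in selected_ids:
            item["selected"] = True
        result.append(item)
    return result


permissions = [
    {"id": 2, "name": "userCreate"},
    {"id": 3, "name": "userUpdate"},
    {"id": 4, "name": "userDelete"},
]
selected = [{"id": 2, "name": "userCreate"}, {"id": 3, "name": "userUpdate"}]
for p in mark_selected(permissions, selected):
    print(p)
```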

How to cleanly batch queries together in Gremlin

I am writing a GraphQL resolver that retrieves all vertices connected by a particular edge, using the following query (created returns vertices with the label person):
software {
created {
name
}
}
This resolves to the following Gremlin query for each software node found:
g.V().hasLabel('software').has('name', 'ripple').in('created')
This returns a result that includes all properties of the object:
{
"result": [
{
"#type": "d",
"#rid": "#24:0",
"#version": 6,
"#class": "person",
"in_knows": [
"#35:0"
],
"name": "josh",
"out_created": [
"#32:0",
"#33:0"
],
"age": 32,
"#fieldTypes": "in_knows=g,out_created=g"
}
],
"dbStats": {
...
}
}
I realize this will fall foul of GraphQL's N+1 query problem, so I'm trying to batch queries together using the DataLoader pattern. (I'm also hoping to do property selection, so that I'm not asking the database to return too much data.)
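The DataLoader pattern collects the keys requested while resolving one level of the GraphQL query, then issues a single batched backend call for all of them. The batch function must return one result per key, in the same order as the keys. The following is a minimal synchronous Python sketch of that contract. BatchLoader and fake_batch are hypothetical stand-ins, where fake_batch plays the role of a batched Gremlin query. Real DataLoader implementations are asynchronous, cache per request and schedule the dispatch automatically.

```python
# Minimal synchronous DataLoader-style batcher (hypothetical class).

class BatchLoader:
    def __init__(self, batch_fn):
        # batch_fn(keys) must return one result per key, in the same order.
        self.batch_fn = batch_fn
        self.pending = []
        self.cache = {}

    def load(self, key):
        """Register a key; its result is available after dispatch()."""
        if key not in self.cache and key not in self.pending:
            self.pending.append(key)

    def dispatch(self):
        """Issue ONE backend call for all pending (deduplicated) keys."""
        if self.pending:
            keys, self.pending = self.pending, []
            results = self.batch_fn(keys)
            if len(results) != len(keys):
                raise ValueError("batch_fn must return one result per key")
            self.cache.update(zip(keys, results))

    def get(self, key):
        return self.cache[key]


calls = []

def fake_batch(keys):
    # Stand-in for a single batched Gremlin query.
    calls.append(list(keys))
    data = {"ripple": ["josh"], "lop": ["peter", "marko", "josh"]}
    return [data.get(k, []) for k in keys]


loader = BatchLoader(fake_batch)
for name in ["ripple", "lop", "ripple"]:  # three resolver calls
    loader.load(name)
loader.dispatch()
print(calls)              # [['ripple', 'lop']]
print(loader.get("lop"))  # ['peter', 'marko', 'josh']
```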
So I'm trying to craft a query like this:
g.V().union(
__.hasLabel('software').has('name', 'ripple').
project('parent', 'child').by('id').
by(__.in('created').fold()),
__.hasLabel('software').has('name', 'lop').
project('parent', 'child').by('id').
by(__.in('created').fold())
)
But this returns the following, where the properties are missing and only the ids of the vertices I want are included:
{
"result": [
{
"parent": "ripple",
"child": [
"#24:0"
]
},
{
"parent": "lop",
"child": [
"#22:0",
"#23:0",
"#24:0"
]
}
],
"dbStats": {
...
}
}
My question is: how can I have the Gremlin query return all of the properties of the found vertices and none of the other properties? Should I even be doing batching this way?

For anyone else reading: the query I was trying to write wouldn't work because the TraversalSet created in .by(__.in('created').fold()) can't be cast from a List to an ElementMap, since the stream cardinality wouldn't be enforced. (You can only have one record per row, I think?)
My working approach is to duplicate the keys for each row and specify the properties needed. The query below works on Gremlin 3.3, as used in OrientDB (ODB). If you have Gremlin 3.4.4 or later, where the elementMap() step is available, you can replace the last by step with by(elementMap('name', 'age')):
g.V().union(
__.hasLabel('software').has('name', 'ripple').
as('parent').
in('created').as('child').
select('parent', 'child').
by(values('name')).
by(properties('id', 'name', 'age').
group().by(__.key()).
by(__.value())),
__.hasLabel('software').has('name', 'lop').
as('parent').
in('created').as('child').
select('parent', 'child').
by(values('name')).
by(properties('id', 'name', 'age').
group().by(__.key()).
by(__.value()))
)
So that you get a result like this:
{"data": [
{
"parent": "ripple",
"child": {
"id": 5717,
"name": "josh",
"age": 32
}
},
{
"parent": "lop",
"child": {
"id": 5709,
"name": "peter",
"age": 35
}
},
{
"parent": "lop",
"child": {
"id": 5713,
"name": "marko",
"age": 29
}
},
{
"parent": "lop",
"child": {
"id": 5717,
"name": "josh",
"age": 32
}
}
]
}
This lets you build a lookup that concatenates all results for "lop" and "ripple" into arrays.
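Here is a minimal Python sketch of that lookup step. It assumes the rows arrive as the list under "data" shown above. It groups children by parent, then answers a batch of keys in the order they were requested, which is the ordering a DataLoader batch function has to preserve. The function names are hypothetical.

```python
from collections import defaultdict

def build_lookup(rows):
    """Group flat {"parent": ..., "child": {...}} rows by parent name."""
    lookup = defaultdict(list)
    for row in rows:
        lookup[row["parent"]].append(row["child"])
    return dict(lookup)

def resolve(keys, lookup):
    """Return one list per requested key, in request order ([] if no match)."""
    return [lookup.get(k, []) for k in keys]


rows = [
    {"parent": "ripple", "child": {"id": 5717, "name": "josh", "age": 32}},
    {"parent": "lop", "child": {"id": 5709, "name": "peter", "age": 35}},
    {"parent": "lop", "child": {"id": 5713, "name": "marko", "age": 29}},
    {"parent": "lop", "child": {"id": 5717, "name": "josh", "age": 32}},
]
lookup = build_lookup(rows)
print([c["name"] for c in lookup["lop"]])  # ['peter', 'marko', 'josh']
```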

Pagination in many-to-many mapping in Spring Boot

I have two tables with a many-to-many mapping: content and tag. I also have a third table, content_tag, to normalize the many-to-many relationship. But I am having a problem with pagination. For example, when I fetch a tag by name, it returns a single tag object with multiple content objects nested inside. I know how to paginate tags, but how can I paginate the content objects nested in a single tag object? Please see the result below, which I get from Postman.
{
"id": 12,
"tag": "Viral",
"contents": [
{
"id": 15,
"idHide": "0",
"listCategory": {
"id": 11,
"title": "Dramai",
"titleBn": "ড্রামাই",
"status": "active"
},
"title": "Cat",
"titleBn": "#99",
"brief": "uytyyyy",
"briefBn": "#495",
"highlight": "0",
"dim": "0",
"sticky": "0",
"status": "active",
"createdAt": "Jan 24, 2018 3:08:34 PM",
"listTag": [
{
"id": 12,
"title": "Viral"
},
{
"id": 13,
"title": "Progress"
},
{
"id": 14,
"title": "Limit"
}
]
}
]
}
Note that "contents" in the response is the list of content objects for the Viral tag. Sometimes the contents array grows beyond 50 objects, so I want to paginate it. How can I implement pagination for contents?
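One straightforward option is to page the contents list after loading it. It is sketched here in Python rather than Java for brevity. The function takes a zero-based page index and a page size, slices the list, and returns the metadata a client needs to request more. The function name and response keys are illustrative, loosely mirroring the content, totalElements and totalPages fields of a serialized Spring Data Page. For large tags it is usually better to page in the database instead. For example, a repository query over content joined to tag could accept a Pageable, so that only one page of rows is ever loaded.

```python
import math

def paginate(items, page, size):
    """Return one page of `items` (0-based page index, as with Spring Data's
    PageRequest) plus metadata the client needs to request more pages."""
    if page < 0 or size < 1:
        raise ValueError("page must be >= 0 and size >= 1")
    total = len(items)
    start = page * size
    return {
        "content": items[start:start + size],
        "page": page,
        "size": size,
        "totalElements": total,
        "totalPages": math.ceil(total / size),
    }


contents = [{"id": i} for i in range(1, 56)]  # e.g. 55 contents under "Viral"
last = paginate(contents, page=5, size=10)
print([c["id"] for c in last["content"]], last["totalPages"])  # [51, 52, 53, 54, 55] 6
```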

Best approach for an Elasticsearch time-based feeds module?

I am new to Elasticsearch and looking for the best way to build a feed module with time-based feeds, along with their groups and comments.
I learned a little and came up with the following:
PUT /group
{
"mappings": {
"groupDetail": {},
"content": {
"_parent": {
"type": "groupDetail"
}
},
"comment": {
"_parent": {
"type": "content"
}
}
}
}
So each type will be stored as separate documents in the index.
But then I found a post saying that parent/child is more costly for search than nested objects.
With nested objects it would look something like the following: two groups (feeds) with their details, and their content and comments as nested elements.
{
"_index": "group",
"_type": "groupDetail",
"_id": 6829,
"_score": 1,
"_source": {
"groupid": 6829,
"name": "Jignesh Public",
"insdate": "2016-10-01T04:09:33.916Z",
"upddate": "2017-04-19T05:19:40.281Z",
"isVerified": true,
"tags": [
"spotrs",
"surat"
],
"content": [
{
"contentid": 1,
"type": "1",
"byUser": 5858,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 1,
"v": "lorem ipsum long text 1"
},
{
"t": 2,
"v": "http://www.imageurl.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
},
{
"contentid": 2,
"type": "2",
"byUser": 5859,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 4,
"v": "http://www.videoURL.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
}
]
}
}
{
"_index": "group",
"_type": "groupDetail",
"_id": 6849,
"_score": 1,
"_source": {
"groupid": 6849,
"name": "Xyz Group Public",
"insdate": "2016-10-01T04:09:33.916Z",
"upddate": "2017-04-19T05:19:40.281Z",
"isVerified": false,
"tags": [
"spotrs",
"food"
],
"content": [
{
"contentid": 3,
"type": "1",
"byUser": 5858,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 1,
"v": "lorem ipsum long text 3"
},
{
"t": 2,
"v": "http://www.imageurl.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
},
{
"contentid": 4,
"type": "2",
"byUser": 5859,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 4,
"v": "http://www.videoURL.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
}
]
}
}
Now, if I go with nested objects, I'm confused: if users add comments very frequently, will the reindexing cost become a problem?
So the main thing I want to ask is: which approach lets me add comments frequently while also keeping content search fast?

Performance
Parent/child stores related data in the same shard, as separate documents, which avoids network round trips;
Parent/child needs a join step when retrieving data;
Nested objects store the inner and outer objects together, as a single document;
So, we can infer:
Updating a nested object re-indexes the whole document (the root plus all of its nested objects), which can be very expensive if the document is large;
Updating a parent or a child alone does not affect the other;
Searching nested objects is a little faster, since it saves the join step;
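On the query side the difference looks roughly like this. The following minimal Python sketch builds the two request bodies. It assumes that content and content.comments are mapped with type nested in the first model, and that comment is the child type of content in the second, as in the _parent mapping in the question. The field names come from the JSON above; the function names are hypothetical.

```python
def nested_comment_query(text):
    """Search comments stored as nested objects inside the group document.
    Assumes both "content" and "content.comments" are mapped as nested."""
    return {
        "query": {
            "nested": {
                "path": "content",
                "query": {
                    "nested": {
                        "path": "content.comments",
                        "query": {"match": {"content.comments.comment": text}},
                    }
                },
            }
        }
    }


def has_child_comment_query(text):
    """Parent/child model: return content documents that have at least one
    matching "comment" child (this is the join step mentioned above)."""
    return {
        "query": {
            "has_child": {
                "type": "comment",
                "query": {"match": {"comment": text}},
            }
        }
    }


print(nested_comment_query("Comment 1")["query"]["nested"]["path"])  # content
```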
Suggestions
As far as I understand your problem, you should use parent/child.
As your groups accumulate more and more comments, adding a new comment to a nested document will still re-index the whole document, which can be very time-consuming;
On the other hand, searching comments with parent/child needs just one more lookup after finding the child, which is relatively acceptable.
Furthermore, you should also take into account how often comments are searched compared to how often they are added:
If you search a lot but add few new comments, nested objects may be the better choice;
Otherwise, choose parent/child;
You may also combine both of them:
While a feed is active, store it with parent/child;
When it is closed, i.e., no more comments can be added, move it to a new index that uses nested objects;
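The combined approach needs a step that folds the separately stored parent/child documents into one nested document when a feed closes. The following is a minimal Python sketch of just that assembly step. It assumes the documents have already been fetched together with their parent ids. Fetching them and writing the result with the bulk API are left out, and all names are hypothetical.

```python
from collections import defaultdict

def to_nested(group, contents, comments):
    """Assemble one nested group document from separately stored docs.
    contents: list of (parent_group_id, content_doc)
    comments: list of (parent_content_id, comment_doc)
    """
    comments_by_content = defaultdict(list)
    for content_id, comment in comments:
        comments_by_content[content_id].append(comment)
    nested = dict(group)  # copy: the input group doc is left unchanged
    nested["content"] = [
        {**content, "comments": comments_by_content.get(content["contentid"], [])}
        for parent_id, content in contents
        if parent_id == group["groupid"]
    ]
    return nested


group = {"groupid": 6829, "name": "Jignesh Public"}
contents = [
    (6829, {"contentid": 1, "type": "1"}),
    (6829, {"contentid": 2, "type": "2"}),
    (6849, {"contentid": 3, "type": "1"}),  # belongs to another group
]
comments = [
    (1, {"byuser": 5859, "comment": "Comment 1"}),
    (1, {"byuser": 5860, "comment": "Comment 2"}),
    (2, {"byuser": 5859, "comment": "Comment 1"}),
]
doc = to_nested(group, contents, comments)
print([c["contentid"] for c in doc["content"]])  # [1, 2]
```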

Without more detail than "very frequently", it is hard to come up with a recommendation. You also haven't described what your data looks like. Comments on a blog post may be rare, even in heated discussions. Comments or replies in a forum thread (which would result in a huge document) might be something very different. I'd personally start with nested and see how it goes, but I don't know all the requirements, so this might be the wrong answer.

Elasticsearch: search by sub-collection value

I need help with a specific ES query.
I have objects in an Elasticsearch index. Here is an example of one of them (Participant):
{
"_id": null,
"ObjectID": 6008,
"EventID": null,
"IndexName": "crmws",
"version_id": 66244,
"ObjectData": {
"PARTICIPANTTYPE": "2",
"STATE": "ACTIVE",
"EXTERNALID": "01010111",
"CREATORID": 1006,
"partAttributeList":
[
{
"SYSNAME": "A",
"VALUE": "V1"
},
{
"SYSNAME": "B",
"VALUE": "V2"
},
{
"SYSNAME": "C",
"VALUE": "V2"
}
],
....
I need to find only the entities that match on a single element of partAttributeList. For example, the whole Participant entity where SYSNAME=A and VALUE=V1 in the same partAttributeList element.
If I use the usual matches:
{"match": {"ObjectData.partAttributeList.SYSNAME": "A"}},
{"match": {"ObjectData.partAttributeList.VALUE": "V1"}}
Of course, I will find more objects than I really need. Here is an example of an unwanted object that can be found:
...
{
"SYSNAME": "A",
"VALUE": "X"
},
{
"SYSNAME": "B",
"VALUE": "V1"
}..
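The extra hits come from how Elasticsearch indexes an array of objects mapped as a plain object. Each field is flattened into its own list of values, so the pairing between SYSNAME and VALUE inside one element is lost. The Python sketch below only simulates that behavior to show the false positive; it is not Elasticsearch code. The last part builds a nested query body, which assumes partAttributeList is remapped with type nested and the data reindexed. The function and variable names are illustrative.

```python
def flatten_object_array(objs):
    """Simulate how an object-mapped array is indexed: one value list per
    field, with the pairing between fields lost."""
    flat = {}
    for obj in objs:
        for field, value in obj.items():
            flat.setdefault(field, []).append(value)
    return flat

def flat_match(objs, sysname, value):
    """Two independent match clauses, as in the question."""
    flat = flatten_object_array(objs)
    return sysname in flat.get("SYSNAME", []) and value in flat.get("VALUE", [])

def same_element_match(objs, sysname, value):
    """What a nested query checks: both conditions on ONE element."""
    return any(o.get("SYSNAME") == sysname and o.get("VALUE") == value for o in objs)


unwanted = [{"SYSNAME": "A", "VALUE": "X"}, {"SYSNAME": "B", "VALUE": "V1"}]
print(flat_match(unwanted, "A", "V1"))          # True (false positive)
print(same_element_match(unwanted, "A", "V1"))  # False

# Assumes ObjectData.partAttributeList is mapped as "nested".
nested_query = {
    "query": {
        "nested": {
            "path": "ObjectData.partAttributeList",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"ObjectData.partAttributeList.SYSNAME": "A"}},
                        {"match": {"ObjectData.partAttributeList.VALUE": "V1"}},
                    ]
                }
            },
        }
    }
}
```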

As I understand it, you are trying to search multiple fields of the same object for exact matches of a piece of text, so please try this:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-query-strings.html
