Best approach for an Elasticsearch time-based feeds module? - performance

I am new to Elasticsearch and am looking for the best way to build a feed module with time-based feeds, along with their groups and comments.
I have learned a little and came up with the following.
PUT /group
{
  "mappings": {
    "groupDetail": {},
    "content": {
      "_parent": {
        "type": "groupDetail"
      }
    },
    "comment": {
      "_parent": {
        "type": "content"
      }
    }
  }
}
With this, each type is stored separately within the index.
But then I found a post saying that parent/child is a more costly operation for search than nested objects.
Something like the following is two groups (feeds), each with its details, and with content and comments as nested elements.
{
"_index": "group",
"_type": "groupDetail",
"_id": 6829,
"_score": 1,
"_source": {
"groupid": 6829,
"name": "Jignesh Public",
"insdate": "2016-10-01T04:09:33.916Z",
"upddate": "2017-04-19T05:19:40.281Z",
"isVerified": true,
"tags": [
"spotrs",
"surat"
],
"content": [
{
"contentid": 1,
"type": "1",
"byUser": 5858,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 1,
"v": "lorem ipsum long text 1"
},
{
"t": 2,
"v": "http://www.imageurl.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
},
{
"contentid": 2,
"type": "2",
"byUser": 5859,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 4,
"v": "http://www.videoURL.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
}
]
}
}
{
"_index": "group",
"_type": "groupDetail",
"_id": 6849,
"_score": 1,
"_source": {
"groupid": 6849,
"name": "Xyz Group Public",
"insdate": "2016-10-01T04:09:33.916Z",
"upddate": "2017-04-19T05:19:40.281Z",
"isVerified": false,
"tags": [
"spotrs",
"food"
],
"content": [
{
"contentid": 3,
"type": "1",
"byUser": 5858,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 1,
"v": "lorem ipsum long text 3"
},
{
"t": 2,
"v": "http://www.imageurl.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
},
{
"contentid": 4,
"type": "2",
"byUser": 5859,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 4,
"v": "http://www.videoURL.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
}
]
}
}
Now, if I think about using nested objects, I am confused: if users add comments very frequently, will the reindexing overhead become a problem?
So the main thing I want to ask is: which approach lets me add comments frequently while keeping content search fast?

Performance
Parent/child stores related data in the same shard, as separate documents, which avoids network overhead;
Parent/child needs a join step when retrieving data;
Nested objects store the inner and outer objects together, as a single document;
So we can infer:
Updating a nested object re-indexes the whole document, which can be very expensive if your documents are large;
Updating a parent or a child alone does not affect the other one;
Searching nested objects is a little faster, since it saves the join step;
Suggestions
As far as I understand your problem, you should use parent/child (see the indexing sketch after these suggestions).
As your group's comments grow, adding a new comment to a nested document would still re-index the whole content, which can be very time-consuming;
On the other hand, searching a comment with parent/child just needs one more lookup after finding the child, which is relatively acceptable.
Furthermore, you should also take into account the ratio of searching comments to adding comments:
If you search a lot but add few new comments, nested objects may be the better choice;
Otherwise, choose parent/child;
By the way, you may combine both of them:
While a feed is active, use parent/child to store it;
When it is closed, i.e. no more comments can be added, move it to a new index with nested objects;
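To make the trade-off concrete, here is a minimal indexing sketch using the _parent mapping from the question (ES 2.x/5.x syntax); the group id 6829 and the content/comment bodies are taken from the sample documents, everything else is illustrative. Adding a comment indexes only one small child document; the content and group documents are not touched:

PUT /group/content/1?parent=6829
{
  "contentid": 1,
  "type": "1",
  "byUser": 5858,
  "insdate": "2016-10-01 11:20"
}

PUT /group/comment/1?parent=1&routing=6829
{
  "byuser": 5859,
  "comment": "Comment 1",
  "upddate": "2016-10-01T04:09:33.916Z"
}

Note the routing=6829 on the comment: with a two-level chain (groupDetail -> content -> comment), the grandchild must be routed to the same shard as its group, while parent=1 points at the content document.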

If you do not specify more detailed info other than "very frequently", it is going to be hard to come up with a recommendation. Also, you have not mentioned what your data looks like. A comment on a blog post might happen rarely, even in heated discussions. A comment/reply in a forum post (which will result in a huge document) might be something very different. I'd personally start with nested and see how it goes (a minimal nested mapping is sketched below), but I also do not know all the requirements, so this might be a very wrong answer.
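For reference, a minimal sketch of the nested-object alternative, using the field names from the sample documents in the question (ES 2.x/5.x mapping syntax assumed):

PUT /group
{
  "mappings": {
    "groupDetail": {
      "properties": {
        "content": {
          "type": "nested",
          "properties": {
            "comments": {
              "type": "nested"
            }
          }
        }
      }
    }
  }
}

With this mapping every new comment re-indexes the whole groupDetail document, which is exactly the cost discussed in the other answer.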

Related

Elasticsearch - Sort By Distance Not Working?

I have an index where the records are stored in the following format:
"_source": {
"name": "ACME Pallets",
"about": null,
"slug": "acme-pallets",
"serviceAreas": [
{
"admin1": "usa",
"admin2": null,
"admin3": null,
"admin4": null,
"countryCode": "US",
"googlePlaceId": null,
"locality": null,
"selectedLevel": "admin1"
}
],
"id": "fadsflsjdfkk3234234",
"addresses": [
{
"address1": "4342 Dietrich Rd",
"address2": null,
"city": "San Antonio",
"countryCode": "US",
"latitude": 29.44122,
"longitude": -98.34404,
"primary": true,
"name": "office",
"postal": "78219",
"province": "TX",
"location": {
"lat": 29.44156,
"lon": -98.37704
}
}
]
}
I am trying to return results from this index where the records are sorted by distance to the search point I pass in. My sort config being passed in looks like this:
_geo_distance: {
'addresses.location': { lat: 31.75917, lon: -106.48749 },
order: 'asc',
unit: 'mi',
mode: 'min'
}
The results I receive back are not sorted according to distance. If I manually plot out the individual locations on a map and the search pin passed in, I can see that the sorting is out of order.
If I pass in a sorting config to my search to sort alphabetically or to sort by relevance (aka _score), the sorting returned is correct.
Does anyone know why ES might be returning my results incorrectly when sorting by distance?
addresses is an array in my index. Each object inside of addresses has a property called location of type geo_point.
From all the documentation that I've read, passing 'addresses.location': { lat: 31.75917, lon: -106.48749 } into the search should work, but it doesn't. ES should be smart enough to find the location geo point in each object and use it as the reference when calculating the distance. If there is more than one object inside the addresses array, then ES by default should take the center point of all the objects inside addresses and use that to calculate the distance from the search point.
In my case, I don't have any data where addresses has more than one object. I ended up creating a location geo_point property outside of the addresses property during index build and then passing in location: { lat: 31.75917, lon: -106.48749 } for the search. This made ES sort results based on distance correctly.
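Concretely, the sort config passed in now is the same _geo_distance clause as before, just pointed at the new top-level field (a sketch, same coordinates as above):

_geo_distance: {
  location: { lat: 31.75917, lon: -106.48749 },
  order: 'asc',
  unit: 'mi',
  mode: 'min'
}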
What my new index looks like with the added location property:
"_source": {
"name": "ACME Pallets",
"about": null,
"slug": "acme-pallets",
"serviceAreas": [
{
"admin1": "usa",
"admin2": null,
"admin3": null,
"admin4": null,
"countryCode": "US",
"googlePlaceId": null,
"locality": null,
"selectedLevel": "admin1"
}
],
"id": "fadsflsjdfkk3234234",
"addresses": [
{
"address1": "4342 Dietrich Rd",
"address2": null,
"city": "San Antonio",
"countryCode": "US",
"latitude": 29.44122,
"longitude": -98.34404,
"primary": true,
"name": "office",
"postal": "78219",
"province": "TX",
"location": {
"lat": 29.44156,
"lon": -98.37704
}
}
],
"location": {
"lat": 29.44156,
"lon": -98.37704
}
}

Delete existing Records if they are not in sent array Rails 5 API

I need help on how to delete records that exist in the DB but not in the array sent in a request;
My Array:
[
{ "id": "509",
"name": "Motions move great",
"body": "",
"subtopics": [
{
"title": "Tywan",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
},
{
"title": "Transportations Gracious",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
},
{
"title": "Transportation part",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
}
]
},
{
"name": "Motions kkk",
"body": "",
"subtopics": [
{
"title": "Transportations",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
}
]
}
]
Below is my implementation: where am I going wrong?
@topics = @course.topics.map{|m| m.id()}
@delete = @topics
puts @delete
if Topic.where.not('id IN(?)', @topics).any?
  @topics.each do |topic|
    topic.destroy
  end
end
it's not clear to me where, in your code, you pick up the ids sent in the array you showed before... so I'm assuming something like this:
objects_sent = [
{ "id": "509",
"name": "Motions move great",
"body": "",
"subtopics": [
{
"title": "Tywan",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
},
{
"title": "Transportations Gracious",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
},
{
"title": "Transportation part",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
}
]
},
{
"name": "Motions kkk",
"body": "",
"subtopics": [
{
"title": "Transportations",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
}
]
}
]
Since you have your array like this, the only information you need to query the database with is the ids (also assuming the ids in the array are the ids in the database, otherwise it wouldn't make sense). You can get them like this:
sent_ids = objects_sent.map{|o| o[:id].to_i}
Also, it seems to me that, for the code you showed, you want to destroy them based on a specific course. There are two ways to do that. First, using the association (I prefer this one):
@course.topics.where.not(id: sent_ids).destroy_all
Or you can do the query directly on the Topic model, passing the course_id:
Topic.where(course_id: @course.id).where.not(id: sent_ids).destroy_all
ActiveRecord is smart enough to build that query correctly either way. Give it a try and see which works better for you.

Combine json response in nifi

We are calling InvokeHTTP processors and getting a response which is JSON. Example:
{
"id": "h569gcjhcm",
"doi": {
"id": "10.17632/h569gcjhcm.1",
"status": "allocated",
"prefix": "10.17632"
},
"name": "Data for: Flooding of the Caspian Sea at the intensification of Northern Hemisphere Glaciations",
"description": "Supplementary data for the Jeirankechmez section in Azerbaijan.\n\n- Appendix A contains all paleomagnetic data and interpretations of the Jeirankechmez section. This .dir file can be imported into the paleomagnetism.org webportal under \"Interpretation Portal\", \"Advanced Options\", \"Import Application Save\". For further details on the use of paleomagnetism.org please refer to the article by Koymans et al. (2016) - https://doi.org/10.1016/j.cageo.2016.05.007.\n- Appendix B contains the magnetic susceptibility data for the analysed samples, including geographic coordinates and stratigraphic levels.\n- Appendix C contains the 40Ar/39Ar data for the three analysed volcanic ash layers. ",
"version": 1,
"publish_date": "2019-01-29T12:51:38.090Z",
"data_licence": {
"id": "01d9c749-3c4d-4431-9df3-620b2dcfe144",
"short_name": "CC BY 4.0",
"full_name": "Creative Commons Attribution 4.0 International",
"description": "This dataset is licensed under a Creative Commons Attribution 4.0 International licence.\n\nWhat does this mean?\nYou can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.",
"url": "http://creativecommons.org/licenses/by/4.0",
"category": "Creative"
},
"contributors": [
{
"first_name": "Christiaan",
"last_name": "van Baak"
},
{
"first_name": "Marius",
"last_name": "Stoica"
},
{
"first_name": "Arjen",
"last_name": "Grothe"
},
{
"first_name": "Gareth",
"last_name": "Davies"
},
{
"profile_id": "72970719-95c8-341b-80d2-afa9e7154baf",
"first_name": "Wout",
"last_name": "Krijgsman"
},
{
"profile_id": "3a4bfe2c-4098-3859-9b88-789fa993e05a",
"first_name": "Keith",
"last_name": "Richards"
},
{
"profile_id": "f1660f3c-ebbd-3289-8240-1f4ea7913df4",
"first_name": "Klaudia",
"last_name": "Kuiper"
},
{
"first_name": "Elmira",
"last_name": "Aliyeva"
}
],
"versions": [
{
"version": 1,
"publish_date": "2019-01-29T12:51:38.090Z",
"available": true
}
],
"files": [
{
"filename": "Appendix_A_Jeirankechmez_pmag_interpretations.dir",
"id": "f2f4cba7-2411-4737-a9b2-f094db30dca1",
"content_details": {
"id": "994bc865-5300-4d76-a373-e528ccd830e8",
"sha256_hash": "2427c4b077372760973ce8224694f2a2ee5383c7f022ad818164d847a20e27cc",
"sha1_hash": "73792dc6d6eb2c1de1e04926ba5d4420dd0aaece",
"content_type": "application/x-director",
"size": 917022,
"created_date": "2019-01-03T00:00:00.000Z"
"download_expiry_time": "2019-01-29T13:52:25.729Z"
},
"metrics": {
"downloads": 0,
"previews": 0
}
},
{
"filename": "Appendix_B_Sample_locations_susceptibility.xlsx",
"id": "64241bf0-5279-49e8-a505-be9075b910e1",
"content_details": {
"id": "af8809d0-8e63-4599-abaa-e7af9ad39959",
"sha256_hash": "0588f44a0cbd477aa2798323e57ce0b2d4a118e767c0b1ffdc9eb1017e4d23c2",
"sha1_hash": "02e89f6f197ebf495e1e2c3d1aab250efc7545e7",
"content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"size": 24770,
"created_date": "2019-01-03T00:00:00.000Z"
,
"download_expiry_time": "2019-01-29T13:52:25.732Z"
},
"metrics": {
"downloads": 0,
"previews": 0
}
},
{
"filename": "Appendix_C_ArAr_data.xlsx",
"id": "2e912027-ff3f-48ad-98b9-b643b59ba0e3",
"content_details": {
"id": "4960377c-060d-41f6-b7af-150617d8ebeb",
"sha256_hash": "235dc32c1e99f350ee5c99908a5f5d72d1aeeab02f78c2e0181d585bd1880fa6",
"sha1_hash": "6483156e4577948cac5d2679eee862c76faed1c9",
"content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"size": 18510,
"created_date": "2019-01-03T00:00:00.000Z"
},
"metrics": {
"downloads": 0,
"previews": 0
}
}
],
"articles": [
{
"id": "10.1016/j.gloplacha.2019.01.007",
"title": "Flooding of the Caspian Sea at the intensification of Northern Hemisphere Glaciations",
"doi": "10.1016/j.gloplacha.2019.01.007",
"journal": {
"issn": "0921-8181",
"name": "Global and Planetary Change",
"url": "http://www.sciencedirect.com/science/journal/09218181"
}
}
],
"categories": [
{
"id": "http://com/vocabulary/OmniScience/Concept-170590667",
"label": "Geology"
},
{
"id": "http://data.elsevier.com/vocabulary/OmniScience/Concept-473860195",
"label": "Strontium Isotope"
}
],
"institutions": [ ],
"metrics": {
},
"available": true,
"related_links": [ ]
}
I am using $contributors.profile_id from the above JSON to call a new endpoint (InvokeHTTP): https://api.xxx.com/profile/$.profile_id
The JSON response for this is:
"contributors": [
{
“profile_id”:”cedferfiherhforhforf”
"first_name": “xxx”,
"last_name": "van Baak”,
“other_ids”:[] ,
“Other info”: “deeded” }
I have to call this endpoint once per object in contributors (say we have 5 objects in contributors, so I have to call the endpoint 5 times) and combine these 5 responses together.
Then I have to merge that combined response back into the main response.
Just an example:
EvaluateJsonPath to extract "id" into an attribute, to join by this attribute later
SplitJson to split your JSON by "contributors"
Call the endpoint (InvokeHTTP)
MergeContent to merge by "id", using the fragment count written by SplitJson
(a rough property sketch follows)
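A rough sketch of the processor configuration (the attribute names dataset.id and profile_id are made up for illustration; double-check the exact property names for your NiFi version):

EvaluateJsonPath   Destination: flowfile-attribute; dynamic property dataset.id = $.id
SplitJson          JsonPath Expression: $.contributors
                   (SplitJson also writes fragment.identifier / fragment.index / fragment.count attributes)
EvaluateJsonPath   Destination: flowfile-attribute; dynamic property profile_id = $.profile_id
InvokeHTTP         HTTP Method: GET; Remote URL: https://api.xxx.com/profile/${profile_id}
MergeContent       Merge Strategy: Defragment (reassembles the splits using the fragment.* attributes),
                   or Bin-Packing Algorithm with Correlation Attribute Name: dataset.id

After MergeContent you have the combined contributor responses in one flow file, which you can then merge back into the main response.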

Elastic Search. Search by sub-collection value

Need help with specific ES query.
I have objects in an Elasticsearch index. Example of one of them (a Participant):
{
"_id": null,
"ObjectID": 6008,
"EventID": null,
"IndexName": "crmws",
"version_id": 66244,
"ObjectData": {
"PARTICIPANTTYPE": "2",
"STATE": "ACTIVE",
"EXTERNALID": "01010111",
"CREATORID": 1006,
"partAttributeList":
[
{
"SYSNAME": "A",
"VALUE": "V1"
},
{
"SYSNAME": "B",
"VALUE": "V2"
},
{
"SYSNAME": "C",
"VALUE": "V2"
}
],
....
I need to find only the entities that match on partAttributeList elements. For example, the whole Participant entity where SYSNAME=A and VALUE=V1 occur in the same element of partAttributeList.
If I use the usual matches:
{"match": {"ObjectData.partAttributeList.SYSNAME": "A"}},
{"match": {"ObjectData.partAttributeList.VALUE": "V1"}}
Of course I will find more objects than I really need. Example of an unwanted object that can be found:
...
{
"SYSNAME": "A",
"VALUE": "X"
},
{
"SYSNAME": "B",
"VALUE": "V1"
}..
What I gather you are trying to do is search multiple fields of the same object for exact matches of a piece of text, so please try this out:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-query-strings.html
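If ObjectData.partAttributeList is mapped as a nested object (an assumption; the mapping is not shown in the question), a nested query keeps both conditions on the same array element, which the plain match clauses above cannot do. A sketch:

POST /_search
{
  "query": {
    "nested": {
      "path": "ObjectData.partAttributeList",
      "query": {
        "bool": {
          "must": [
            { "match": { "ObjectData.partAttributeList.SYSNAME": "A" } },
            { "match": { "ObjectData.partAttributeList.VALUE": "V1" } }
          ]
        }
      }
    }
  }
}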

Error given sending JSON array in AJAX request, using JavaScript

Hi, I am doing a project involving AJAX (XMLHttpRequest).
How am I going to access the title inside the note section? Normally, to get e.g. startYear, it is something like detail = eval(...),
then loop over it with a for loop;
inside it would be something like:
var startyear = "";
startyear += ...[i].startYear;
Something like this, but how am I going to access the title inside the note?
I tried detail.notes.note.title, but it says it is null or not an object.
This is the JSON data:
{
"infos": {
"info": [
{
"startYear": "1900",
"endYear": "1930",
"timeZoneDesc": "daweerrewereopreproewropewredfkfdufssfsfsfsfrerewrBlahhhhh..",
"timeZoneID": "1",
"note": {
"notes": [
{
"id": "1",
"title": "Mmm"
},
{
"id": "2",
"title": "Wmm"
},
{
"id": "3",
"title": "Smm"
}
]
},
"links": [
{
"id": "1",
"title": "Red House",
"url": "http://infopedia.nl.sg/articles/SIP_611_2004-12-24.html"
},
{
"id": "2",
"title": "Joo Chiat",
"url": "http://www.the-inncrowd.com/joochiat.htm"
},
{
"id": "3",
"title": "Bake",
"url": "https://thelongnwindingroad.wordpress.com/tag/red-house-bakery"
}
]
}
]
}
}
Use http://jsonlint.com/ to help you out with JSON.
I got this...
Parse error on line 30:
... ] }
----------------------^
Expecting '}', ',', ']'
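Once the JSON parses cleanly, the titles live at infos.info[i].note.notes[j].title. A minimal sketch, assuming xhr is your XMLHttpRequest object and detail holds the parsed data (variable names are illustrative):

var detail = JSON.parse(xhr.responseText); // safer than eval(...)
var info = detail.infos.info;
for (var i = 0; i < info.length; i++) {
  var notes = info[i].note.notes;          // the array is "notes", inside "note"
  for (var j = 0; j < notes.length; j++) {
    alert(notes[j].title);                 // "Mmm", "Wmm", "Smm"
  }
}

detail.notes.note.title comes back null because notes is nested inside each element of infos.info, not at the top level.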
