ACID update of an Elasticsearch document

I'm building a Tinder-like system, and I need to know which cards have already been seen.
If I save the cards in Elasticsearch, I end up with a document like this:
{ "name": "David", "location": { "lat": ..., "lon": ... }, "seenFromUsers": [] }
I'm wondering whether it makes sense to keep this list inside the document itself; it could easily hold around 2,000 entries.
But if I do an update in Elasticsearch, do I always have to pass all 2,000 entries again? If two users do this at the same time, does one update get lost? How can I simply add another ID to the array? Is that even possible?
What other solutions are there?

Another solution would be a completely different approach. Instead of creating documents like this:
{
"name": "David",
"location": { "lat": ..., "lon": ...},
"seenFromUsers": ["Laura", "Simone"]
}
think in Relations like this:
{
"name": "David",
"seenBy": "Laura"
}
{
"name": "David",
"seenBy": "Simone"
}
This approach will give you simpler queries, and the ACID problem is solved: new profile views are simply new documents.
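As a rough sketch (assuming a hypothetical index name, here profile_views), recording a new profile view is just an index request:
POST /profile_views/_doc   # "profile_views" is a hypothetical index name
{
"name": "David",
"seenBy": "Laura"
}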
As a benefit, you'll get rid of inner objects, and it becomes easier to add additional data to this relation:
{
"name": "David",
"seenBy": "Laura",
"timestamp": ...,
"liked": true
}
{
"name": "David",
"seenBy": "Simone",
"timestamp": ...,
"liked": false
}
And now you'll be able to run a simple query for all positive likes of a profile, or for bi-directional likes/matches.
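For example, a query for all positive likes of David's profile could look roughly like this (a sketch, assuming the same hypothetical profile_views index and that name and seenBy are mapped as keyword fields):
{
"query": {
"bool": {
"filter": [
{ "term": { "name": "David" } },   # the profile being looked at
{ "term": { "liked": true } }      # only positive likes
]
}
}
}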

Related

Is there a way to write an expression in Power Automate to retrieve an item from SurveyMonkey?

There is no dynamic content you can get from the SurveyMonkey trigger in Power Automate except for the Analyze URL, Created Date, and Link. Is it possible I could retrieve the data with an expression so I could add fields to SharePoint or send emails based on answers to questions?
For instance, here is some JSON data for a county multiple-choice field; I would like to know the county so I can have the email sent to the correct person:
{
"id": "753498214",
"answers": [
{
"choice_id": "4963767255",
"simple_text": "Williamson"
}
],
"family": "single_choice",
"subtype": "menu",
"heading": "County where the problem is occurring:"
}
And basically, is there a way to create dynamic fields from the content so it would be more usable?
I am a novice so your answer will have to assume I know nothing!
Thanks for considering the question.
So far, everything I have tried has been unsuccessful!
I was able to get an answer on Microsoft Power Users support.
Put this data in a Compose action:
{
"id": "753498214",
"answers": [
{
"choice_id": "4963767255",
"simple_text": "Williamson"
}
],
"family": "single_choice",
"subtype": "menu",
"heading": "County where the problem is occurring:"
}
Then use these expressions in additional Compose actions:
To get choice_id:
outputs('Compose')?['answers']?[0]?['choice_id']
To get simple_text:
outputs('Compose')?['answers']?[0]?['simple_text']
Here is the reference link where I retrieved the answer:
https://powerusers.microsoft.com/t5/General-Power-Automate/How-to-write-an-expression-to-retrieve-answer/m-p/1960784#M114215

How to index and query nested documents in Elasticsearch

I have 1 million users in a Postgres table. It has around 15 columns of different data types (integer, array of strings, string, etc.). Currently I use normal SQL queries to filter the data as per my requirements.
I also have N projects (at most 5) under each user. I have indexed these projects in Elasticsearch and run fuzzy searches on them. Currently, for each project (a text file) I have created a document in Elasticsearch.
Both systems are working fine.
Now I need to query the data across both systems. For example: I want all the records that contain the keyword "java" (in Elasticsearch) and have more than 10 years of experience (available in Postgres).
Since the user count will be increasing drastically, I have moved all the Postgres data into Elasticsearch.
There is a chance that filters will be applied only on the fields related to the user (not the project-related fields).
Now I need to nest the projects under the corresponding users. I tried parent-child types and it didn't work for me.
Could anyone help me with the following things?
What will be the correct way of indexing projects associated with the users?
Since each project document has a field called category, is it possible to get the matched category name in the response?
Is there any better way to implement this?
From your description, we can tell that the "base document" is based on users.
Now, regarding your questions:
Based on what I said before, you can add all the projects associated with each user as an array, like this:
{
  "user_name": "John W.",
  ..., #More information from this user
  "projects": [
    {
      "project_name": "project_1",
      "role": "Dev",
      "category": "Business Intelligence"
    },
    {
      "project_name": "project_3",
      "role": "QA",
      "category": "Machine Learning"
    }
  ]
},
{
  "user_name": "Diana K.",
  ..., #More information from this user
  "projects": [
    {
      "project_name": "project_1",
      "role": "Project Leader",
      "category": "Business Intelligence"
    },
    {
      "project_name": "project_4",
      "role": "DataBase Manager",
      "category": "Mobile Devices"
    },
    {
      "project_name": "project_5",
      "role": "Project Manager",
      "category": "Web services"
    }
  ]
}
The goal of this structure is to keep all of a user's info in each document, even if some of it is repeated. Doing this will allow you to bring back, for example, all the users that work on a specific project with queries like this:
{
  "query": {
    "match": {
      "projects.project_name": "project_1"
    }
  }
}
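In the same spirit, the original requirement (the keyword "java" in a project and more than 10 years of experience) becomes a single bool query once the user-level fields from Postgres live in the same document. A sketch, assuming hypothetical fields: projects.content holding the project text and experience_years as an integer on the user:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "projects.content": "java" } }       # hypothetical project text field
      ],
      "filter": [
        { "range": { "experience_years": { "gt": 10 } } } # hypothetical user-level field
      ]
    }
  }
}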
Yes. As in the queries above, you can match all the projects by their "category" field. However, keep in mind that since your base document is the user, such a search will bring back the whole user document.
For that case, you might want to use the terms aggregation, which brings back the unique values of a given field. This can be combined with a query, like this:
{
  "size": 0, #Set this to 0 since you want to focus on the aggregation's result.
  "query": {
    "match": {
      "projects.category": "Mobile Devices"
    }
  },
  "aggs": {
    "unique_projects_names": {
      "terms": { "field": "projects.project_name" }
    }
  }
}
That last query will bring back, in the aggregation result, all the unique project names that have the category "Mobile Devices".
You can create a new index where you store all the information related to your projects. However, the relationships between users and projects won't be easy to keep (remember that ES is NOT intended to be a structured or relational DB, like SQL) and the queries will become very complex, even if you decide to name both of your indices (users and projects) in a way that lets you query them with a wildcard.
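For illustration, querying with a wildcard simply means hitting several indices in one request, e.g. (a sketch, assuming the two indices share a hypothetical prefix such as app_users and app_projects):
POST /app_*/_search   # matches both app_users and app_projects (hypothetical names)
{
  "query": {
    "match": { "project_name": "project_1" }
  }
}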
EDIT: Additionally, you can consider storing all the info related to your projects in Postgres and making the calls separately: first get the project ID (or name) from ES, and then the project's info from Postgres (since I assume that is the info least likely to change).
Hope this is helpful! :D

How to get name/confidence individually from classify_text?

Most of the other methods in the Language API, such as analyze_syntax, analyze_sentiment, etc., can return their constituent elements, like
sentiment.score
sentiment.magnitude
token.part_of_speech.tag
etc etc etc....
but I have not found a way to return name and confidence in isolation from classify_text. It doesn't look like it's possible, but that seems weird. Am I missing something? Thanks
The language.documents.classifyText method returns a ClassificationCategory object which contains name and confidence. If you only want one of the fields you can filter by categories/name or categories/confidence. As an example I executed:
POST https://language.googleapis.com/v1/documents:classifyText?fields=categories%2Fname&key={YOUR_API_KEY}
{
"document": {
"content": "this is a test for a StackOverflow question. I get an error because I need more words in the document and I don't know what else to say",
"type": "PLAIN_TEXT"
}
}
Which returns:
{
"categories": [
{
"name": "/Science/Computer Science"
},
{
"name": "/Computers & Electronics/Programming"
},
{
"name": "/Jobs & Education"
}
]
}
Direct link to API explorer for interactive testing of my example (change content, filters, etc.)

Using Elasticsearch to build flow/funnel results based on unique identifiers

I want to be able to return a set of counts of individual documents from a single index based on a previous set of results, and am wondering if there is a way to do it without running a separate query for each.
So, given a data set like this (simplified version of my ES documents):
{
"name": "visit",
"sessionId": "session1"
},
{
"name": "visit",
"sessionId": "session2"
},
{
"name": "visit",
"sessionId": "session3"
},
{
"name": "click",
"sessionId": "session1"
},
{
"name": "click",
"sessionId": "session3"
}
What I would like to do is search for name: visit and get a count of all of those. That part is easy. But I would also like to count the name: click docs whose sessionId appears in the name: visit result set, and return that count along with the name: visit count.
Is there an easy way to do this? I have looked at the aggregation APIs, but they don't seem to quite fit my needs. There also seems to be a parent/child relationship, but it doesn't apply to my situation since both kinds of documents I want to count individually are of the same type.
Expected result would be something like this:
{
"count": {
// total number of visit events since this is my start point
"visit": 3,
// the amount of click results that have sessionId
// matching my previous search's sessionId values
"click": 2
}
}
At first glance, you need to do this in two queries:
a first aggregation query to retrieve the sessionIds, and
a second query, filtered with those sessionIds, to find the count of clicks (a sketch of both steps follows below).
I don't think it's a big deal to run those two queries, but that depends on how much data you have and how many sessionIds you want to retrieve at once.
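A sketch of those two steps (assuming a hypothetical index named events and keyword mappings for name and sessionId) could look like this. First, count the visits and collect their sessionIds:
{
  "size": 0,
  "query": { "term": { "name": "visit" } },
  "aggs": {
    "session_ids": {
      "terms": { "field": "sessionId", "size": 10000 }   # cap is arbitrary, adjust to your data
    }
  }
}
Then feed the returned sessionId values (here the ones from the sample data) into a second query; its total hit count is the click number you are after:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "name": "click" } },
        { "terms": { "sessionId": ["session1", "session2", "session3"] } }
      ]
    }
  }
}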

RethinkDB: Including a subdocument for a nested doc

I am performing an operation, and it works, but I want to know if there is a better or more efficient way to do what I want.
I have an object in my db that looks like this:
{
  "id": "testId",
  "name": "testName",
  "products": [
    {
      "name": "product1",
      "info": "sampleInfo",
      "templateIds": [
        "asdf-1",
        "asdf-2"
      ]
    },
    {
      "name": "product2",
      "info": "sampleInfo",
      "templateIds": [
        "asdf-1",
        "asdf-2"
      ]
    }
  ]
}
As you can see, each "product" in the "products" array has a sub-array of templateIds. These match templates stored in another table. What I want to do is create a query that merges those templates onto each product object before I send it all back.
Currently I am doing this with sub-merges:
r.table('suites').get('testId').merge(function(suite){
return {
products: suite('products').merge(function(product){
return {
templates: r.expr(product('templateIds')).map(function(id) {
return r.table('templates').get(id)
})
}
})
}
})
My question is: is there a more efficient way to do this? Or is there a completely different way of thinking I should employ to do this?
Thanks guys!
That looks right to me. The only thing I can think of is that r.table('templates').get_all(r.args(product('templateIds'))) is shorter than product('templateIds').map(function(id){ return r.table('templates').get(id); }) and might well be faster.
EDIT: If you have a small number of templates, another thing that would make this run faster would be to do the substitution in the client instead and cache the retrieved templates by ID. RethinkDB will have to do a separate read for each template ID, even if it sees the same one over and over again, because it doesn't know enough to know whether or not caching those values is safe.
