What are the options for marking/flagging documents in Elasticsearch? - elasticsearch

Let's assume I have a front-end app for a blog and I stored the blog posts in an Elasticsearch instance (this is a hypothetical example).
I want multiple users to be able to mark some blog posts as favorite and the super users to be able to flag blog posts. For marking as favorite, only the user that did the marking is able to see it as marked. For flagging, if one user flags a post, all the other users see it as flagged.
I was thinking about adding a boolean field for the flagging and an array field with the user ids for the marking. This way I can use a boolean query to find flagged posts and for the favorite posts of a user I can use an exists query.
I'm pretty new to Elasticsearch, so I'm not sure if this will perform well enough on millions/billions of posts. What other options are there?
Edit: Forgot to mention that I would also like to have paging for the blog posts and be able to filter out/in the flagged or marked posts. For example I want first (ordered by creation date) 10 blog posts that are marked as favorite, or last 10 blog posts flagged.

To build a favorites system, one solution is to store the favorites in a separate index, with one document per favorite containing:
blog_id
user_id
created_at
This way you can easily add, remove, and search favorites.
I want multiple users to be able to mark some blog posts as favorite
When user 1 clicks the favorite link on blog 2, the system stores this document in the "favorite" index: {"user_id":1, "blog_id": 2, "created_at": "2019-10-02 12:00:02", "blog_created_at": "2019-01-01 09:10:11"}
only the user that did the marking is able to see it as marked.
To check whether the blog being displayed is marked as favorite by the user reading it, you can either do a get by ID, if you use the concatenation user_id-blog_id as the document ID, or run a search on blog_id and user_id. If a record exists, the blog is a favorite for that user.
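A minimal sketch of the favorite record and its deterministic ID, assuming a Python application in front of Elasticsearch. The helper names (favorite_doc_id, build_favorite_doc) are hypothetical; only the field names come from the example document. The functions only build the ID and body you would send; they do not talk to a cluster. Elasticsearch's default date mapping expects ISO 8601, so the date format used in the example likely needs an explicit format in the index mapping.

```python
from datetime import datetime

# Date format copied from the example document in the answer.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def favorite_doc_id(user_id: int, blog_id: int) -> str:
    """Deterministic document ID, so a get by ID answers 'is this a favorite?'."""
    return f"{user_id}-{blog_id}"


def build_favorite_doc(user_id: int, blog_id: int,
                       created_at: datetime, blog_created_at: datetime) -> dict:
    """Body of one record in the (assumed) "favorite" index."""
    return {
        "user_id": user_id,
        "blog_id": blog_id,
        "created_at": created_at.strftime(DATE_FORMAT),
        "blog_created_at": blog_created_at.strftime(DATE_FORMAT),
    }


# User 1 marks blog 2 as favorite.
doc_id = favorite_doc_id(1, 2)
doc = build_favorite_doc(
    1, 2,
    created_at=datetime(2019, 10, 2, 12, 0, 2),
    blog_created_at=datetime(2019, 1, 1, 9, 10, 11),
)
```

Because the ID is derived from the pair, indexing the same favorite twice overwrites the same document instead of creating a duplicate, and removing a favorite is a delete by that ID.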
The same applies to the list page: you know the user_id, so once you have built the list of blog_ids you are going to display, you can run one search for those IDs and use the result to mark the favorites when you render the list of blogs.
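The list-page lookup could be sketched like this, assuming the favorite records have user_id and blog_id fields as in the example document. The function names are hypothetical, and the query body is plain Elasticsearch query DSL built as a Python dict; sending it to the cluster is left out.

```python
from typing import Dict, Iterable, List


def favorites_lookup_query(user_id: int, blog_ids: List[int]) -> dict:
    """Search body: which of these blogs has this user marked as favorite?"""
    return {
        "size": len(blog_ids),      # at most one record per (user, blog) pair
        "_source": ["blog_id"],     # only the blog ID is needed for rendering
        "query": {
            "bool": {
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"terms": {"blog_id": blog_ids}},
                ]
            }
        },
    }


def mark_favorites(blog_ids: Iterable[int], hits: Iterable[dict]) -> Dict[int, bool]:
    """Map each displayed blog ID to True if a favorite record came back for it."""
    favorite_ids = {hit["_source"]["blog_id"] for hit in hits}
    return {blog_id: blog_id in favorite_ids for blog_id in blog_ids}


query = favorites_lookup_query(1, [2, 3, 5])
# `hits` stands in for response["hits"]["hits"] from the search call.
marks = mark_favorites([2, 3, 5], [{"_source": {"blog_id": 3}}])
```

Using filter clauses rather than scoring clauses fits here because the lookup is a yes/no match and relevance does not matter.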
This solution should perform well even with billions of posts.
If you have flags, you can store them the same way and add a category field.
Depending on how many flags and what kinds of flags you have, you can either save them in the same index with a category field ['favorite', 'flag', ...] or save them in separate indices.
Another thing to consider is using periodic indices (monthly, weekly, or daily), depending on the number of documents you will store and how many updates (adding/removing favorites) you expect. You can roll the indices up into yearly ones later if there is less activity on them.
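As a rough illustration of periodic indices, the sketch below derives an index name from the date a favorite was created. The "favorite-" prefix and the naming pattern are assumptions, not something Elasticsearch requires; any consistent scheme works, and searches can then target all periods at once with an index pattern such as favorite-*.

```python
from datetime import date


def favorite_index_name(day: date, period: str = "monthly") -> str:
    """Name of the periodic index a favorite created on `day` would go to."""
    if period == "daily":
        return f"favorite-{day:%Y.%m.%d}"
    if period == "weekly":
        # ISO year and week number, so weeks spanning New Year stay together.
        iso_year, iso_week, _ = day.isocalendar()
        return f"favorite-{iso_year}.w{iso_week:02d}"
    if period == "monthly":
        return f"favorite-{day:%Y.%m}"
    raise ValueError(f"unknown period: {period!r}")


monthly_name = favorite_index_name(date(2019, 10, 2))
daily_name = favorite_index_name(date(2019, 10, 2), "daily")
weekly_name = favorite_index_name(date(2019, 10, 2), "weekly")
```

One trade-off to keep in mind: with periodic indices a get by ID needs to know which index holds the record, so the existence check may have to become a search across the pattern.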
One last thing: consider using a cache to absorb frantic clicking on the favorite button, which would otherwise add and remove documents repeatedly. That increases the number of deleted documents in your index, which can slow it down.
Edit for the Edit in the question:
For example I want first (ordered by creation date) 10 blog posts
that are marked as favorite, or last 10 blog posts flagged.
You can add the blog creation date to your favorite records as "blog_created_at" (I updated the example document). Then you can sort by blog creation date and limit the results to 10 if you want the first 10.
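A sketch of such a sorted, paged search body, assuming the favorite records carry user_id and blog_created_at as in the example document. The function name and its parameters are hypothetical; the body uses the standard from, size, and sort options of the search API.

```python
def favorites_page_query(user_id: int, page: int = 0, page_size: int = 10,
                         oldest_first: bool = True) -> dict:
    """Search body for one page of a user's favorites, ordered by blog creation date."""
    return {
        "from": page * page_size,   # offset of the first record on this page
        "size": page_size,
        "query": {"bool": {"filter": [{"term": {"user_id": user_id}}]}},
        "sort": [
            {"blog_created_at": {"order": "asc" if oldest_first else "desc"}}
        ],
    }


first_ten = favorites_page_query(user_id=1)
third_page_newest = favorites_page_query(user_id=1, page=2, oldest_first=False)
```

Note that from plus size is limited to 10,000 results by default, so very deep paging would need a different mechanism such as search_after.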
For the other case in your comment:
If I want to get just 10 blog posts, ordered by date, which are not
marked as favorite or not flagged
You can add a boolean field to the blog document, something like "has_favorite" or "has_flag", and set it to true once the blog has a favorite.
You set it to true the first time the blog is marked as favorite; if it is already true, you do nothing.
Then you can filter on this field to find the blogs that have no favorite.
When somebody removes a favorite, you can count how many favorites the blog still has and, if the count is 0, set has_favorite to false. Only this case generates an extra update, and it is probably a tiny share of cases (maybe 0.001%), so it is better to focus on the other 99%. If that share grows, the solution needs to be adapted.
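The bookkeeping for has_favorite could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the blog document is assumed to have a created_at date field, and has_favorite is assumed to be set to false when a blog is first indexed (a term filter on false does not match documents where the field is missing).

```python
from typing import Optional


def has_favorite_update(action: str, favorites_left: int,
                        current_flag: bool) -> Optional[dict]:
    """Partial blog document to apply after an add/remove, or None if nothing changes.

    `favorites_left` is the number of favorite records for the blog
    after the action has been applied.
    """
    if action == "add":
        # Only the first favorite flips the flag; later ones do nothing.
        return None if current_flag else {"has_favorite": True}
    if action == "remove":
        # Only removing the last favorite flips the flag back.
        if favorites_left == 0 and current_flag:
            return {"has_favorite": False}
        return None
    raise ValueError(f"unknown action: {action!r}")


def not_favorited_query(page_size: int = 10) -> dict:
    """Search body: newest blogs that nobody has marked as favorite."""
    return {
        "size": page_size,
        "query": {"bool": {"filter": [{"term": {"has_favorite": False}}]}},
        "sort": [{"created_at": {"order": "desc"}}],
    }


first_add = has_favorite_update("add", favorites_left=1, current_flag=False)
last_remove = has_favorite_update("remove", favorites_left=0, current_flag=True)
```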

Related

How to index two document types in parent-child relationship in Elasticsearch

I am building a search functionality for two types of related documents, let's call them "blogs" and "posts", respectively a blog website (with a bunch of posts) and the specific posts written in that blog. I'd like to be able to search against both of them. In a relational database (which ES is not), I would have two main tables which would be linked against a foreign key, and I could search the two tables separately or with a join. In Elasticsearch, I am considering a parent-child relationship where "blog" is the parent document, and there are potentially many "post" documents associated with it as the child.
EDIT: I should explain why I want to index them this way. Basically, I want people to be able to search for blogs (the overall series of posts written by the same author), and the search terms might not be in the blog's description alone, but rather in the posts; for instance, a blog about Python might have a general description that talks about python, but the blog posts might talk about django, so if someone searches for "django" I'd like the python blog to come up. Also, I want people to be able to search for specific posts. I also think (prove me wrong!) these need to be separate types of documents because they would have different fields, e.g. a post might have a date field, while a blog would not have that field.
In any case: Ideally, I would like to be able to offer a search function against "blog" which would also search against the "post" text (as the relevant text might be in the post); additionally, I'd like to allow users to search all posts regardless of what blog they are associated with.
What are the best practices for setting this up? From what I can tell, Elasticsearch has removed the ability to have two types of documents on the same index, and parent-child relationships need to be on the same index. With this constraint, it seems like parent-child relationships would only be for relationships between documents of the same type, e.g. if you are indexing people and you can indicate who is a parent and child (literally).
The other option would be to create two indexes, one for blogs (which would include the posts' texts) and a second index which would include only the posts. But my instinct is that this would duplicate a tremendous amount of data, and also a lot more work to keep it updated and in sync with my main relational data store.

GraphQL query for only unseen content - Schema Advice

I'm building a GraphQL schema through AWS AppSync and have a question regarding schema structure. My app will show users new posts and have them either join or pass on them. I'm trying to build it so that I only show users new posts and never repeat a post, or at least not for a certain amount of time. It's similar to swiping on Tinder: they don't show you somebody again if you've already swiped on them. Does anybody have any ideas how to structure this in my schema? Do I need to store references to all of the seen posts in the user model, or should I store each swipe as its own model, and how should I structure the querying? I'd appreciate any advice on this.
Thanks!
Assuming a post has a creation time, you could keep track of the last (max created time) post they've seen, then display anything after that.
But think about what happens if they've been off the app for 5 minutes, or 5 days, or 5 weeks... depending on the volume of posts you anticipate they could quickly get behind and have to wade through too many older posts.
One thought would be to show the next-oldest post, derived from the creation time of the most recent post they viewed, unless N posts have been created since the last time they were online (a threshold you'd have to decide). In that case, start with the (N - X)th post (where X is 5, or 500, again depending on volume) and continue until they're all caught up.
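One possible reading of that idea, as a small sketch: if the backlog of unseen posts is within the threshold, show everything after the last viewed post; otherwise skip ahead so that only the newest X posts remain. The function name, the parameter names, and this exact interpretation of "N - X" are assumptions.

```python
from bisect import bisect_right
from typing import List


def posts_to_show(created_times: List[int], last_seen: int,
                  threshold: int, catch_up: int) -> List[int]:
    """Return creation times of the posts to display, oldest first.

    `created_times` must be sorted ascending. `threshold` plays the role
    of N and `catch_up` the role of X in the description above.
    """
    first_unseen = bisect_right(created_times, last_seen)
    backlog = len(created_times) - first_unseen
    if backlog > threshold:
        # Too far behind: skip ahead, but never re-show an already seen post.
        first_unseen = max(first_unseen, len(created_times) - catch_up)
    return created_times[first_unseen:]


times = list(range(1, 11))                    # ten posts created at t = 1..10
small_backlog = posts_to_show(times, last_seen=7, threshold=5, catch_up=2)
large_backlog = posts_to_show(times, last_seen=2, threshold=5, catch_up=3)
```

In a real schema the same decision would more likely be made in the resolver or data-source query, using the stored last-seen timestamp as the range key.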
There are lots of ways you could program it; it all depends on your use case. You may want to take "popular" posts into account, for example; those might be weighted above/before the other posts in their backlog.
Hope this helped.

Where can I find sold item notes in the Square API?

I'm trying to create a report to pull my Square POS transaction data into Excel through Power Query. Basically, I want the information available in the standard "Items Detail CSV" report found in the Square Dashboard, but connected to Excel so I can build custom reporting and update it any time with a refresh.
I can connect to the Square data in Excel no problem. What I'm having trouble with is finding the right data, specifically the notes the merchant can enter in during the time of sale about the item. We have several items that will come up as "Custom Amount" where we enter additional notes about the item, and I can't find the notes field through the API.
After looking through the documentation, I've tried two main queries:
The Transactions List from v2:
https://connect.squareup.com/v2/locations/{location_id}/transactions
The Payments List from v1:
https://connect.squareup.com/v1/{location_id}/payments
The Payments List includes the v1PaymentItemization data type, which should include the notes field defined as "Notes entered by the merchant about the item at the time of payment, if any." I assume that's what I'm looking for.
(Link to documentation: https://docs.connect.squareup.com/api/connect/v1#type-v1paymentitemization)
However, I cannot find the notes field anywhere in my pulled results. There are no error messages, and I see every data field listed in the v1PaymentItemization documentation except "notes" in my query results.
Edit: The notes we're using are on individual items, not the payment as a whole. This fits the description of V1 List Payments -> PaymentItemization -> notes. I did check the tender notes as mentioned in the comments, but this was not what I'm looking for. Until now, I wasn't aware we could make a comment on the transaction as a whole, as opposed to individual items. The individual items notes would be more helpful anyway.
Either way, I didn't find the tender notes in the pulled data either. Most of the fields are there in the data pull, but not notes, v1TenderEntryMethod, and a couple of others. There's actually more data available in the standard dashboard reports than is actually pulling from the API.
I do realize a workaround is to export the Item Detail CSV report from the Square Dashboard, and then manipulate the data in Excel from there. I could even have a connection to the folder or file where I save my exports. It's just not as smooth as the desired result of opening Excel, setting my parameters there, and clicking refresh to get the data and formatted report all in one place.
Thanks
Second Edit: In the POS, I'm entering an amount which shows up as Custom Amount in the itemized list for the sale. I then click on the Custom Amount to add a note to it and specify what the item is (e.g., "Lamp"). That note is applied to a single item, and there may be several items per transaction that have these notes added to them which would otherwise only show as "Custom Amount" on a report or receipt. We do this because we sell several items that are not standard inventory items, but we do want to keep track of what we've sold.
I can see these notes for each item in the standard reporting, so I know the data is entered and saved correctly. However, I can't find the note field when I pull from the API. I see all of the other itemization fields (i.e., name, quantity, item_detail, itemization_type, etc.), but not the note field.
I'm getting these results with a simple /v1/payments pull with no parameters or filters.

How to create a quick search in CRM that spans multiple entities with grouped conditions

We are a housing association with a large CRM system (2016 & SP1). We have a new requirement: our users need to be able to search for people who are current (i.e. not previous) occupants or residents, or who are not residents (e.g. contractors).
For this purpose, we need to search the Person entity which has a related Tenancy entity. Person has TenancyType field with possible (option set) values Occupant, Resident, Contractor. Tenancy has TenancyStatus field with possible (text) values Current and Previous.
We tried using the following filter criteria in the quick view on the Person entity:
thinking that it would return all people who are not previous residents. However we noticed that it would filter out contractors because contractors do not have related tenancy records.
We needed to change the criteria to return all contractors OR all residents and occupants with no previous tenancy. So we changed it to the following:
at which point we got stuck because we noticed that it was not possible to AND together the second and the third conditions as the third one is a related entity.
We are wondering what the best way is to achieve the above, bearing in mind that we do not want a separate view for each condition, e.g. one for residents, one for non-residents, etc.
Any help or suggestion is greatly appreciated.
It is not possible to do this with a single query.
Instead, you can use two queries. If you do not want to do that, then using reports (as suggested by Alex) or a BI-solution would be other possibilities.
Thanks to everyone here who spent time answering my question. The following describes the correct answer:
https://community.dynamics.com/crm/f/117/p/241352/666651#666651

How to sort posts to keep new top rated posts at top

Let's say we have a website with posts.
The information we can get is:
post_time (from site launch)(doesn't change)
post_rating (changes over time)
number_of_comments (changes over time)
What would be a good formula for the website to keep fresh and good posts at the top, without showing the same post at the top again?
I figured I should give "weights" to each of the fields above, where post_time will have the heaviest weight.
What kind of sorting does 9gag use, for instance?
The difference between post_time values can vary between 1 second and minutes/hours.
edit:
Clarification:
I have a database where I keep all of this information. What I need is a formula that will keep the posts page up to date, so that a user who logs in now and again in 20 minutes will see different posts.
OK, so the simplest way I found is as follows:
Add an additional DATE field called top_date to the posts SQL table.
Select from the posts table the highest-rated posts whose top_date is NULL.
Update top_date of the chosen post with the current timestamp.
Now the posts page can sort by top_date DESC.
There can also be a function, called every few minutes, that gathers the top-rated posts (by rating only) that have no top_date and randomly chooses one to update.
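The steps above can be sketched with Python's built-in sqlite3 module. The table layout, the function names, and the use of a numeric timestamp instead of a DATE column are assumptions made to keep the example self-contained; here the random choice is made among posts tied for the highest rating.

```python
import random
import sqlite3
import time


def promote_top_post(conn, now=None):
    """Stamp one of the highest-rated, never-promoted posts with top_date."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT id FROM posts WHERE top_date IS NULL "
        "AND rating = (SELECT MAX(rating) FROM posts WHERE top_date IS NULL)"
    ).fetchall()
    if not rows:
        return None                      # every post has already been promoted
    post_id = random.choice(rows)[0]     # pick randomly among tied posts
    conn.execute("UPDATE posts SET top_date = ? WHERE id = ?", (now, post_id))
    conn.commit()
    return post_id


def front_page(conn, limit=10):
    """Post IDs for the posts page, most recently promoted first."""
    rows = conn.execute(
        "SELECT id FROM posts WHERE top_date IS NOT NULL "
        "ORDER BY top_date DESC LIMIT ?", (limit,)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, rating INTEGER, top_date REAL)")
conn.executemany("INSERT INTO posts (id, rating) VALUES (?, ?)", [(1, 5), (2, 9), (3, 7)])
first = promote_top_post(conn, now=100.0)    # highest-rated unpromoted post
second = promote_top_post(conn, now=200.0)   # next highest-rated unpromoted post
page = front_page(conn)
```

Calling promote_top_post from a scheduled job every few minutes gives the periodic refresh described above; a post that already has a top_date is never selected again, so it cannot return to the top.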
If I was unclear, feel free to ask any question here or in a private message.

Resources