best practice for very simple relation on a nosql table - graphql

I am using a DynamoDB table with a GraphQL API to store posts. I want a user to be able to mark certain posts as favorites.
I have thought about creating a relation table of user to post, but I also thought about just adding an array of userIds to the post object, holding the IDs of all users who have made that post a favorite.
My understanding is that a UUID is 16 bytes, so even if, say, 10,000 users favorite a post, that array will be about 160 KB. Not insignificant, but manageable to pass that much data with the object each time it is loaded.
Just wondering what the best practice is for this scenario. I'm pretty new to NoSQL.

With DynamoDB you have to think about access patterns first:
To get the favorite posts of a user, store a postsIds array in the user table
To get the users who like a post, store a likerIds array in the post table
To get a bidirectional link, do both of the above
Please also keep in mind that:
You can select fields when getting a document (only select the fields you are interested in)
I don't see a scenario where you would load 10k usernames and display them
The above solution looks pretty good for common scenarios.
More advanced solution:
There could be a more powerful way to do that using range keys. For instance:
Hash key: postID | Range key: likerID | title       | ...
post1            |                    | MyFancyPost |
post1            | user1              |             |
post1            | user2              |             |
This structure is more powerful, and could store a lot of connections without having any "big" field in the post model:
You can easily paginate and count the list of likers
It can handle many more likers for a single post
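To make the hash key / range key layout concrete, here is a minimal in-memory sketch in Python. It is not the DynamoDB API: the class LikesTable and its methods are hypothetical names that only imitate how one row per (postID, likerID) pair allows counting and cursor-based pagination. In DynamoDB itself this would roughly correspond to a Query on the hash key with Limit and ExclusiveStartKey.

```python
from bisect import bisect_right


class LikesTable:
    """In-memory stand-in for a table keyed by (postID, likerID)."""

    def __init__(self):
        # One sorted list of liker IDs (range keys) per post ID (hash key).
        self._partitions = {}

    def add_like(self, post_id, liker_id):
        likers = self._partitions.setdefault(post_id, [])
        i = bisect_right(likers, liker_id)
        # Same (postID, likerID) pair twice is still a single row.
        if i == 0 or likers[i - 1] != liker_id:
            likers.insert(i, liker_id)

    def count_likers(self, post_id):
        return len(self._partitions.get(post_id, []))

    def query_likers(self, post_id, limit, start_after=None):
        """Return (page, last_key); last_key is None when nothing is left."""
        likers = self._partitions.get(post_id, [])
        start = 0 if start_after is None else bisect_right(likers, start_after)
        page = likers[start:start + limit]
        has_more = start + limit < len(likers)
        return page, (page[-1] if has_more and page else None)


table = LikesTable()
for user in ["user3", "user1", "user2", "user1"]:
    table.add_like("post1", user)

first_page, cursor = table.query_likers("post1", limit=2)
second_page, end_cursor = table.query_likers("post1", limit=2, start_after=cursor)
```

The post item never grows with the number of likers; each page is fetched by passing the last key of the previous page back in.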

Related

GraphQL viewer-contextual queries on the top-level or within the viewer type?

When building the query and type graph structure in a GraphQL API, where would you put highly contextual queries that only apply to the viewer?
On the top-level (query.friendRequests)
This would remove noise in the User entity and only keep queries in there that are queryable for all users. Not just the viewing user.
It would add many more top-level queries, with a risk of them becoming specialized for specific things, which goes against the thinking-in-a-graph and model-data-around-business-logic ideas.
On the viewer entity (query.viewer.friendRequests)
From a data perspective, it makes more sense to put this underneath the viewer entity (which is a User type). Friend requests always belong to a parent object, which is always a user.
Other Examples
Dashboard widgets
User notifications
Action items / TODO items / Task lists
Messages
Counters and badges
What are you guys' thoughts on this? What would be a good best-practice to follow for viewing user contextual queries that don't apply to other user entities in an API implementation?
We have always put it under a specific field in Query. First we started with a me query that would return a user. But this did not turn out to be very practical, because the user type got very big and most fields did not need the whole user object but only the user's ID. In your example we would have done two queries:
SELECT * FROM account WHERE id = $id
SELECT * FROM friend_request WHERE account_id = $id
Unless we queried a trivial field on the me query, the first query was completely wasted.
Then we got inspired a bit by this thread, and especially by this answer from Lee Byron:
Viewer is what we used everywhere at FB, so it’s stuck with me. Also, a Viewer is not a User, it’s an Auth session - which references a User. So there’s a useful distinction of terms.
Now we have a viewer query that returns a Viewer object. This object then has a field user to query the actual user object. This also might or might not help solve the problem around private and public fields on your user object.
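A rough Python sketch of that idea, not tied to any GraphQL library: the Viewer only carries the user's ID from the auth session, so friend requests can be resolved without first loading the account row. The in-memory ACCOUNTS and FRIEND_REQUESTS data, the query_log list, and all function names are made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for the account and friend_request tables.
ACCOUNTS = {1: {"id": 1, "name": "alice"}}
FRIEND_REQUESTS = [
    {"account_id": 1, "from_id": 2},
    {"account_id": 1, "from_id": 3},
]

query_log = []  # records which "table" each resolver touched


@dataclass
class User:
    id: int
    name: str


@dataclass
class Viewer:
    """An auth session that references a user by ID only."""
    user_id: int

    def user(self):
        # Only runs when the client actually asks for viewer.user.
        query_log.append("account")
        row = ACCOUNTS[self.user_id]
        return User(id=row["id"], name=row["name"])

    def friend_requests(self):
        # Needs the ID only, so the account lookup is skipped.
        query_log.append("friend_request")
        return [r for r in FRIEND_REQUESTS if r["account_id"] == self.user_id]


def resolve_viewer(session_user_id):
    # No table access here: the ID comes from the authenticated session.
    return Viewer(user_id=session_user_id)


viewer = resolve_viewer(1)
requests = viewer.friend_requests()
```

With this shape, a query for viewer.friendRequests touches only friend_request, and the account lookup happens only when viewer.user is requested.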

Search/retrieve by a large OR query clause with Solr or Elasticsearch

I have a search database of car models: "Nissan Gtr", "Hyundai Elantra", "Honda Accord", etc...
Now I also have a user list and the types of cars they like
user1 likes: carId:1234, carId:5678 etc...
Given user1, I would like to return all the cars he likes; it can be anywhere from 0 to hundreds.
What is the best way to model this in Solr or potentially another "nosql" system that can help with this problem?
I'm using Solr but I have the opportunity to use another system if I can and if it makes sense.
EDIT:
The Solr solution is too slow for joins (maybe we can try nested documents). And the current MySQL solution, which uses join tables, has over 2 billion rows.
So, you just want to store a mapping between User->Cars, and retrieve the cars based on the user... sounds very simple:
Your docs are Users: they contain id (indexed) and other fields
one of the fields is 'carsliked', multivalued, which contains the set of car ids he likes
you have the details about each car in a different collection, for example.
given a user id, you retrieve the 'carsliked' field, and get the car details with a cross-collection join
You could also use nested objects to store each liked car (with all the info about it) inside each user, but that is a bit more complex. As a plus, you don't need the join in the query.
Solr would allow you many more things, for example: given a car, which users like it? Elasticsearch will work exactly the same way (and probably many other tools, given how simple your use case seems).
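As an alternative to a join, the lookup can also be done in two steps: read 'carsliked' from the user document, then fetch the cars by ID. The Python sketch below only builds the second request; it does not talk to a server. It assumes the cars collection is keyed by a field called id (Solr) or carId (Elasticsearch), which are guesses about the schema. The Solr string uses the terms query parser and the Elasticsearch body uses a terms query.

```python
def solr_terms_filter(field, ids):
    """Build a Solr filter query string using the terms query parser."""
    return "{!terms f=%s}%s" % (field, ",".join(str(i) for i in ids))


def es_terms_query(field, ids):
    """Build an Elasticsearch search body using a terms query."""
    return {"query": {"terms": {field: list(ids)}}}


# Hypothetical 'carsliked' value read from the user1 document in step one.
cars_liked = ["1234", "5678"]

# A user may like 0 cars; in that case the second request can be skipped.
if cars_liked:
    solr_fq = solr_terms_filter("id", cars_liked)
    es_body = es_terms_query("carId", cars_liked)
```

Both forms take the whole ID list as a single clause, so a user with hundreds of liked cars does not require a hand-built OR expression.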

Parse.com post-comment relationship

I would like to build an application like Facebook (it actually has nothing to do with Facebook, but for the nature of the question we can say so).
I currently have a table named Post and another named Comment and of course I would represent the one-to-many relationship between them (I read the documentation here but wasn't really helpful to me).
In Comment I created a column with a pointer to the Post class with the parent Post.
In Post I then created a column with an Array in which the related comments' IDs will be stored.
(Each post will have a fairly small number of comments, between 10 and 100.)
Is the technique used here the best? Are there more efficient methods?
If your array is only storing the objectIDs for the comments then it's probably more idiomatic to use a Relation as the column type rather than an Array.
A Relation is more efficient in that the IDs aren't returned when you retrieve your Post object, so your Post objects will transfer faster, and it has the same disadvantage as storing the object IDs in an Array in that you'll still have to run a query to get the Comment objects. The only possible downside I can see is that if you need the number of comments, you can calculate this from the size of the array, but with a Relation you'll have to run a count query (or maintain a separate count field).
With an Array you're introducing a slight data maintenance/integrity overhead as well. If your users have the ability to delete comments, then you'll also need to remove the comment ID from the array. And this will require a permissive ACL (to allow a commenter to edit a post they may not have created, and because of this they'll have the ability to edit any value in the post), or you'll have to have a before/after save action to update the Post when a Comment is deleted.

Best approach on allowing users create their own fields

I'm about to embark on a project where a user will be able to create their own custom fields. MY QUESTION - what's the best approach for something like this?
Use case: we have medical records with attributes like first_name, last_name etc... However we also want a user to be able to log into their account and create custom fields. For instance they may want to create a field called 'second_phone' etc... They will then map their CRM to their fields within this app so they can import their data.
I'm thinking on creating tables like 'field_sets (has_many fields)', 'fields', 'field_values' etc...
This seems like it would be somewhat common hence why I thought I would first ask for opinions and/or existing examples.
This is where some modern schemaless databases can help you. My favourite is MongoDB. In short: you take whatever data you have and stuff a document with it. No hard thinking required.
If, however, you are in relational land, EAV (entity-attribute-value) is one of the classic approaches.
I have also seen people do these things:
predefine some "optional" fields in the schema and use them if necessary.
serialize this optional data to string (using JSON, for example) and write it to text blob.
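For the relational route, an EAV layout along the lines of the 'fields' and 'field_values' tables from the question could look roughly like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database; the table and column names, the account ID 42, and the sample record are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Fixed attributes live in ordinary columns.
    CREATE TABLE records (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT
    );
    -- One row per custom field that an account has defined.
    CREATE TABLE fields (
        id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (account_id, name)
    );
    -- One row per (record, custom field) pair that has a value.
    CREATE TABLE field_values (
        record_id INTEGER NOT NULL REFERENCES records(id),
        field_id INTEGER NOT NULL REFERENCES fields(id),
        value TEXT,
        PRIMARY KEY (record_id, field_id)
    );
""")
conn.execute("INSERT INTO records VALUES (1, 'Ada', 'Lovelace')")
conn.execute("INSERT INTO fields VALUES (1, 42, 'second_phone')")
conn.execute("INSERT INTO field_values VALUES (1, 1, '555-0100')")


def custom_fields(record_id):
    """Return the custom fields of one record as a {name: value} dict."""
    rows = conn.execute(
        "SELECT f.name, v.value FROM field_values v "
        "JOIN fields f ON f.id = v.field_id WHERE v.record_id = ?",
        (record_id,),
    )
    return dict(rows.fetchall())


custom = custom_fields(1)
```

Adding a new custom field is then an INSERT into fields rather than a schema change, at the cost of an extra join and untyped (text) values.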

How are applications like Twitter implemented?

Suppose A follows 100 people,
then it will need 100 join statements,
which is horrible for the database, I think.
Or are there other ways?
Why would you need 100 Joins?
You would have a simple table "Follows" with your ID and the other person's ID in it...
Then you retrieve the "Tweets" by joining something like this:
-- Tweets written by the accounts you follow: you are the FollowerID in those rows
SELECT TOP 100
    tweet.*
FROM
    tweet
INNER JOIN
    followers ON followers.MasterID = tweet.AuthorID
WHERE
    followers.FollowerID = @yourID
Now you just need decent caching, and make sure you use a non-locking query, and you have all the information... (Well, maybe add some user data into the mix)
Edit:
tweet
ID - tweetid
AuthorID - ID of the poster
Followers
MasterID - (Basically your ID)
FollowerID - (ID of the person following you)
The Followers table has a composite ID based on master and followerID
It should have 2 indexes - one on "masterID - followerID" and one on "FollowerID and MasterID"
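Here is a small runnable sketch of that layout using Python's built-in sqlite3 module. It follows the column descriptions above (FollowerID is the account doing the following, MasterID is the account being followed); the body column, the index name, and the sample rows are invented for the example, and LIMIT replaces TOP because this is SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweet (
        ID INTEGER PRIMARY KEY,
        AuthorID INTEGER NOT NULL,
        body TEXT
    );
    CREATE TABLE followers (
        MasterID INTEGER NOT NULL,    -- the account being followed
        FollowerID INTEGER NOT NULL,  -- the account that follows
        PRIMARY KEY (MasterID, FollowerID)  -- first index: "who follows X?"
    );
    -- Second index, reversed: "whom does X follow?"
    CREATE INDEX followers_by_follower ON followers (FollowerID, MasterID);
""")
# Account 1 follows 2 and 3; account 2 follows 1.
conn.executemany("INSERT INTO followers VALUES (?, ?)", [(2, 1), (3, 1), (1, 2)])
conn.executemany(
    "INSERT INTO tweet VALUES (?, ?, ?)",
    [(10, 2, "from 2"), (11, 3, "from 3"), (12, 4, "from 4"), (13, 1, "from 1")],
)


def timeline(user_id, limit=100):
    """IDs of the newest tweets written by accounts that user_id follows."""
    rows = conn.execute(
        "SELECT tweet.ID FROM tweet "
        "JOIN followers ON followers.MasterID = tweet.AuthorID "
        "WHERE followers.FollowerID = ? "
        "ORDER BY tweet.ID DESC LIMIT ?",
        (user_id, limit),
    )
    return [row[0] for row in rows]


feed = timeline(1)
```

Following 100 people is still a single join: the followers rows for the user select the authors, and the number of followed accounts only changes how many rows match.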
The real trick is to minimize your database usage (e.g., cache, cache, cache) and to understand usage patterns. In the specific case of Twitter, they use a bunch of different techniques, from queuing and an insane amount of in-memory caching to some really clever data flow optimizations. Give Scaling Twitter: Making Twitter 10000 percent faster and the other associated articles a read. The answer to your question about how to implement "following" is to denormalize the data (precalculate and maintain join tables instead of performing joins on the fly) or to not use a database at all. <-- Make sure to read this!
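One way to read "precalculate and maintain join tables" is fan-out on write: when someone posts, the tweet ID is pushed into a ready-made timeline for each follower, so reading a timeline needs no join. The Python sketch below is only an in-memory illustration of that idea, not a description of how Twitter actually does it; the function names and the 800-entry cap are arbitrary assumptions.

```python
from collections import defaultdict, deque

# author ID -> IDs of the accounts following that author
followers_of = defaultdict(set)
# user ID -> precomputed timeline, newest first (the cap is an arbitrary choice)
timelines = defaultdict(lambda: deque(maxlen=800))


def follow(follower_id, author_id):
    followers_of[author_id].add(follower_id)


def post_tweet(author_id, tweet_id):
    # Pay the cost at write time: copy the tweet ID to every follower.
    for follower_id in followers_of[author_id]:
        timelines[follower_id].appendleft(tweet_id)


def read_timeline(user_id, limit=100):
    # Read time is a plain lookup, no join.
    return list(timelines[user_id])[:limit]


follow(1, 2)
follow(1, 3)
post_tweet(2, 10)
post_tweet(3, 11)
post_tweet(4, 12)  # nobody follows 4, so nothing is fanned out
feed = read_timeline(1)
```

The trade-off is more work and storage per write in exchange for cheap reads, which suits a workload where timelines are read far more often than tweets are posted.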

Resources