Storing contents/references of the graph nodes on a graph db - social-networking

I am building a social-network-style web app. There are a number of related entities like publishers, users, publications, posts, and comments, with relations like subscribe, like, share, comment, friend, follow, work, and author. As usual, some of the entities also have content, such as the contents of publications, posts, and comments.
Publications and posts are similar to classical social posts, each with one or more text fields or media files. Publications, posts, and comments will keep growing and will reach high numbers.
I plan to use Cayley, possibly backed by PostgreSQL.
The tuples describing the relations between entities so far are:
publisher -publish- publication
user -subscribe- publisher
user -comment- publication
user -like- publication
user -share- publication
user -post- post
user -friend- user
user -comment- post
user -like- post
user -share- post
publisher -follow- publisher
user -work- publisher
user -author- publication
The question is:
Where should I store my posts/comments, which also have content (one or more text fields and media files) and will be liked, shared, and commented on by users?
Should I store posts/comments in the graph database? If so, should I store the content itself, or only a reference, keeping the content in another database, table, or document? Keep in mind that the quantities of these entities will reach large numbers.

Stumbled upon this open question and thought I'd try to answer it for future reference.
And of course it totally depends on your implementation and the requirements.
Comments you can store in Cayley, but I would leave images/video or other media out of it. Just store a URI to the specific resource, or a UUID for the object in another datastore.
Also, for comments, you could store the UUID/ID for the comment in your graph DB and store the actual comment in Postgres or even in a file on disk (just as an example).
I personally prefer to keep my graphdb as small as possible and store actual content in a different storage.
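To make that concrete, here is a minimal sketch of the reference-only pattern (assuming Python with psycopg2; the table layout and the graph_write placeholder are illustrative, not part of Cayley's actual API): the comment body goes into Postgres, and the graph only receives small edges that point at the comment's UUID.

    import uuid
    import psycopg2  # assumed Postgres driver

    def add_comment(conn, graph_write, user_id, post_id, body):
        """Store the comment body in Postgres, store only references in the graph."""
        comment_id = str(uuid.uuid4())

        # 1. The actual content lives in Postgres (hypothetical table/columns).
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO comments (id, author_id, post_id, body) VALUES (%s, %s, %s, %s)",
                (comment_id, user_id, post_id, body),
            )
        conn.commit()

        # 2. The graph only gets lightweight edges; graph_write is a placeholder
        #    for however you write quads to Cayley (e.g. its HTTP write endpoint).
        graph_write(subject=f"user:{user_id}", predicate="comment", object=f"comment:{comment_id}")
        graph_write(subject=f"comment:{comment_id}", predicate="on", object=f"post:{post_id}")

        return comment_id

This keeps the graph limited to identifiers and relations, while the bulky content stays in storage that is better suited to it.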

Related

Dynamic forms with floating labels Django/Bootstrap5

Good day, I've been struggling with the architecture for an upcoming forms project; the specifications from the stakeholder are:
Django framework,
Bootstrap5,
Floating Labels,
Multiple users that cannot have access to each other's DB.
Operation Theory:
A law clerk logs in and requests a form via the API; the questions in the form are saved somehow and presented to a user who logs in at a different time. After the questions are answered by the user, the Q's and A's are submitted and saved for review before being pushed elsewhere.
Things I'm struggling with:
Should I create two (2) tables and save the questions returned from the API as field entries? Then, once the user is prepared to answer, dynamically present the questions using floating labels and save the answers to another table?
The questions from the API call can be any number; there can be as few as 5 or as many as 130. How do I create fields at runtime with floating labels according to the questionnaire length? I also have to match them (Q's and A's) back up for review by a third party.
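One way to approach the dynamic-fields part is to build the form fields at runtime from whatever the API returns. A minimal sketch (not from the original post; the question shape {"id": ..., "text": ...} is an assumption):

    from django import forms

    class QuestionnaireForm(forms.Form):
        """Form whose fields are created at runtime from the API questions."""

        def __init__(self, questions, *args, **kwargs):
            super().__init__(*args, **kwargs)
            for q in questions:
                self.fields[f"q_{q['id']}"] = forms.CharField(
                    label=q["text"],
                    required=False,
                    # Bootstrap 5 floating labels expect the input inside a
                    # .form-floating wrapper in the template; the widget only
                    # carries the CSS class and a placeholder.
                    widget=forms.TextInput(attrs={
                        "class": "form-control",
                        "placeholder": q["text"],
                    }),
                )

    # Usage in a view (the API call and question shape are assumptions):
    # form = QuestionnaireForm(questions=fetch_questions(), data=request.POST or None)

The answers can then be read from form.cleaned_data and written to a separate answers table keyed by question id, which also keeps the Q's and A's matched up for later review.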

best practice for very simple relation on a nosql table

I am using a DynamoDB table with a GraphQL API to store posts. I want a user to be able to mark certain posts as favorites.
I thought about creating a relation table of user to post, but I also thought about just adding an array of userIds to the post object with the ids of all users who have marked that post a favorite.
My understanding is that a UUID is 16 bytes, so even if, say, 10,000 users favorite the object, that array will be 160 KB. Not insignificant, but manageable to pass that much data with the object each time it is loaded.
Just wondering what is the best practice for this scenario. I'm pretty new to nosql.
With DynamoDB you have to think about access patterns first:
To get the favorite posts of a user, store a postsIds array in the user table
To get the users who like a post, store a likerIds array in the post table
To get a bidirectional link, do both of the above
Please also keep in mind that:
You can select fields when getting a document (only select the fields you are interested in)
I don't see a scenario where you would load 10k usernames and display them
The above solution looks pretty good for common scenarios.
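For the field-selection point, a small sketch (assuming boto3 and a hypothetical "Posts" table keyed by postID):

    import boto3

    table = boto3.resource("dynamodb").Table("Posts")  # hypothetical table name

    # Fetch only the title, skipping the (potentially large) likerIds array.
    resp = table.get_item(
        Key={"postID": "post1"},
        ProjectionExpression="title",
    )
    post = resp.get("Item")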
More advanced solution:
There could be a more powerful way to do that using range keys. For instance:
Hash key: postID | Range key: likerID | title       | ...
post1            |                    | MyFancyPost |
post1            | user1              |             |
post1            | user2              |             |
This structure is more powerful and can store a lot of connections without putting any "big" field on the post model:
you can easily paginate and count the list of likers
it can handle many more likers for a single post
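A small sketch of querying that layout (assuming boto3; the table name and the "user..." prefix of the likerIDs follow the example rows above):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("Posts")  # hypothetical table name

    # Page through the likers of one post; no big array lives on the post item.
    resp = table.query(
        KeyConditionExpression=Key("postID").eq("post1") & Key("likerID").begins_with("user"),
        Limit=50,  # page size; pass ExclusiveStartKey=LastEvaluatedKey to continue
    )
    likers = [item["likerID"] for item in resp["Items"]]

    # Count likers without transferring the items themselves.
    count = table.query(
        KeyConditionExpression=Key("postID").eq("post1") & Key("likerID").begins_with("user"),
        Select="COUNT",
    )["Count"]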

Distributed GraphQL in microservices

I'm trying to write microservices in Java. I've implemented GraphQL endpoints using graphql-spring-boot-starter.
Now I have a problem how to make it efficient.
The data model is like a tree, and I need to query data from multiple services at once. The problem is how to filter on a member of a collection, something like CONTAINS in a database, except that the data is not in a separate table but in a separate microservice. Maybe the problem is that the domain is not split correctly between services?
Let's take an example: I have 3 microservices: users, libraries, books. Every library has a collection of users and books (just lists of identifiers, like foreign keys). Every book has a name and a genre. Every library also has lists of the books borrowed by each user (identifiers too).
Question 1 - should the library service host the lists of books and users (just identifiers, like foreign keys)? Is that the correct approach?
Question 2 - I want to find libraries in which specified users (by surname) have borrowed books of a specified genre. Going from the top, I first need to find the libraries containing those users. Not easy, as the names live in a different service. We need to query for the users first, gather their identifiers, and only then can we query for libraries. But that isn't all: now we need to find the books for every user and check their genres, in yet another service. And it still isn't all: I want everything presented nicely, so the whole output should be sorted and paged. That forces me to collect all the data from all services and then page and sort it, which of course will not be efficient.
Please don't concentrate on this example; I'm looking for a general approach, not a fix for this one case. I've tried to use DataFetchers, but it's troublesome and there are no good examples of calling GraphQL-to-GraphQL. Most examples cover calling REST endpoints etc.
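To make the pain point visible, here is a sketch of the cross-service join described in Question 2 (Python, with hypothetical client objects for the three services; none of this is from the original post):

    def find_libraries(users_client, libraries_client, books_client,
                       surname, genre, page, page_size):
        # 1. users service: resolve the surname to user ids
        user_ids = users_client.find_ids_by_surname(surname)        # hypothetical call

        # 2. libraries service: libraries that contain any of those users
        libraries = libraries_client.find_by_user_ids(user_ids)     # hypothetical call

        # 3. books service: check the genre of every borrowed book
        results = []
        for lib in libraries:
            borrowed_ids = [bid for uid in user_ids
                            for bid in lib.get("borrowed_by", {}).get(uid, [])]
            books = books_client.get_many(borrowed_ids)              # hypothetical call
            if any(book["genre"] == genre for book in books):
                results.append(lib)

        # 4. sort and page in memory, because no single service owns the full view
        results.sort(key=lambda lib: lib["name"])
        start = page * page_size
        return results[start:start + page_size]

Every request fans out to all three services, and the sorting/paging can only happen after everything has been pulled into one place, which is exactly the inefficiency described above.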

BigTable query with IN operator to get all user group keys

I have a little problem with permissions in my future social application.
The platform will be a non-relational DB (Google's BigTable).
In my application each user has groups (for example: friends, collaborators, family...). Each group contains some friends (like on Facebook), and the user can publish some content (news, short texts, ...) only to that group.
If a user is in one of my groups, they are my friend. Like on Facebook, but with more groups.
My idea is that each user can see (in their own "feed") all the latest content from all friends on one page (like Facebook's Top News).
But I have problems creating a simple query.
For example:
SELECT * FROM News WHERE group_key IN [list_of_groups_where_i_am]
This works, but it requires sub-queries, and the IN list is limited to 30 items.
The other way is heavy caching of content.
Does anybody have an idea? Or any study material, examples, ...
With a requirement like this you can optimize for either read or write, but usually not both. You have the write-optimized version: just write a record with the right group key, but use a complex query to get content for all the groups.
The read-optimized version would be to write the content (or just its id) to a feed for each user, which makes the read query very simple.
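A minimal sketch of that read-optimized approach (fan-out on write); the store objects and their methods are hypothetical placeholders, not a real BigTable/Datastore API:

    def publish_news(news_store, feed_store, group_members, group_key, news):
        """Write the content once, then copy its id into every member's feed."""
        news_id = news_store.put(group_key, news)
        for user_id in group_members:
            feed_store.append(user_id, news_id)   # one small feed row per reader

    def read_feed(feed_store, news_store, user_id, limit=50):
        # The read side needs no IN(...) query: just the user's own feed entries.
        news_ids = feed_store.latest(user_id, limit)
        return [news_store.get(nid) for nid in news_ids]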

Outlook contact sync - How to identify the correct object to sync with?

I have a web application that syncs Outlook contacts to a database (and back) via CDO. The DB contains every contact only once (at least theoretically; of course duplicates happen), providing a single point of change for a contact, regardless of how many users have that particular contact in Outlook (like Interaction or similar products).
The sync process is not automatic but user-initiated. An arbitrary timespan can pass before users decide to sync their contacts. A subset of these contacts may have been updated by other users in the meantime.
Generally, this runs fine, but I have never been able to solve this fundamental problem:
How do I doubtlessly identify a contact object in a mailbox?
1. I can't rely on PR_ENTRYID; this property changes on contact move or mailbox move.
2. I can't rely on my own IDs (e.g. a DB table ID), because these get copied with the contact.
3. I absolutely can't rely on fields like name or e-mail address; they are subject to changes and updates.
Currently I use a combination of 1 (preferred) and 2 (fall-back). But inevitably, sometimes users run into the problem of synching to the wrong contact because there is none with a given PR_ENTRYID, but two with the same DB ID, of which the wrong one is chosen.
There are a bunch of Outlook-synching products out there, so I guess the problem must be solvable.
I had a similar problem to overcome with an internal outlook plugin that does contact syncing. I ended up sticking a database id in the Outlook object and referring to that when doing syncs.
The difference here is that our system has a bunch of duplicates that get resolved later by the users. When they get merged, I'll remove the old records and update Outlook with all of the new information along with a new id.
You could do fuzzy matching to identify duplicates, but duplicate resolution is a funny problem that's mostly trial and error. We've been successful at implementing "fuzzy" matching logic using the levenshtein distance algorithm for names and addresses cleaned down to a hash code.
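As an illustration of that fuzzy-matching idea (not the original plugin's code; the field names and the threshold are assumptions):

    def levenshtein(a, b):
        """Classic dynamic-programming edit distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def normalized_key(contact):
        # "Cleaned down" key: lowercase name + address with whitespace collapsed.
        raw = (contact.get("name", "") + contact.get("address", "")).lower()
        return " ".join(raw.split())

    def likely_duplicates(c1, c2, threshold=3):
        return levenshtein(normalized_key(c1), normalized_key(c2)) <= threshold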
Good luck, my syncing experiences have been somewhat painful.
