Designing a scalable Firebase-based feed model - performance

Question:
How do you design a social network "feed" with Firebase as the backend, in a way that scales?
Possible answers:
The "MVP" solution is to create a feeds root node with one child per user, and to append every new post from a followed user to each of their followers' feeds.
users
  user1
    name: bob
  user2
    name: alice
    follows:
      user1: true
posts
  post1
    author: user1
    text: 'Hi there'
feeds
  user2
    post1: true
This works well and is demoed in the Firefeed project. But it does not scale: if Katy Perry wants to post something, her mobile phone has to write to millions of feeds.
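For illustration, here is roughly what that naive client-side fan-out looks like with the legacy SDK, one write per follower (a sketch only; it assumes a /followers/user1 index, which the schema above would also need in order to know who follows whom):
// Naive fan-out: one write per follower (fine for a handful of followers, hopeless for millions).
var rootRef = new Firebase('https://<YOUR-FIREBASE-APP>.firebaseio.com');
rootRef.child('followers/user1').once('value', function (snap) {
  Object.keys(snap.val() || {}).forEach(function (followerId) {
    // one write (and one security-rule evaluation) per follower, all from the author's device
    rootRef.child('feeds/' + followerId + '/post1').set(true);
  });
});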
Hence the solution reported in this SO question: delegate the operation to a server-side process.
My problem is that Firebase is a "no-backend" solution, and that is the main reason I use it, so before adding a server I'd like to make sure there is really no way to implement this feature without one.
What if the feeds child were removed from the above schema?
Then do this:
baseRef.child('posts')
  .orderBy('author')
  .whereIn(baseRef.child('users/user2/follows').keys())
Unfortunately, neither whereIn nor subqueries exist in the Firebase API :(
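(For context, the closest one can get with the existing orderByChild/equalTo API is one query per followed user, merged client-side; a sketch only, not equivalent to a real whereIn, and it keeps one listener open per followed user:)
// Approximate "whereIn" client-side: one equalTo() query per followed author.
var rootRef = new Firebase('https://<YOUR-FIREBASE-APP>.firebaseio.com');
rootRef.child('users/user2/follows').once('value', function (snap) {
  Object.keys(snap.val() || {}).forEach(function (authorId) {
    rootRef.child('posts')
      .orderByChild('author')
      .equalTo(authorId)
      .on('child_added', function (postSnap) {
        // merge results client-side; N followed users means N open queries
        console.log(postSnap.key(), postSnap.val());
      });
  });
});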
Is any other model structure possible without the need for a server?
Thanks

The Firebase folks kind of replied on their blog: https://www.firebase.com/blog/2015-10-07-how-to-keep-your-data-consistent.html
The post is about data fan-out (spreading items across many nodes in one atomic write operation).
The technique directly addresses the feed model of the original question.
The post actually contains example code for implementing it:
A function for creating the fan-out object (really just a plain object whose keys are the database paths to be written):
function fanoutPost({ uid, followers, post }) {
  // uid is the author's id (not used in this snippet);
  // followers is the hash of follower ids ({ followerId: true, ... })
  var followerIds = Object.keys(followers);
  var fanoutObj = {};
  // queue a write of the post to each follower's timeline
  followerIds.forEach((key) => fanoutObj['/timeline/' + key] = post);
  return fanoutObj;
}
And the logic using this function :
var rootRef = new Firebase('https://<YOUR-FIREBASE-APP>.firebaseio.com');
var followersRef = rootRef.child('followers');
var followers = {};
// keep an up-to-date local copy of the followers hash
followersRef.on('value', (snap) => followers = snap.val());

var btnAddPost = document.getElementById('btnAddPost');
var txtPostTitle = document.getElementById('txtPostTitle');

btnAddPost.addEventListener('click', () => {
  // make the post
  var post = { title: txtPostTitle.value };
  // build the fan-out object
  var fanoutObj = fanoutPost({
    uid: rootRef.getAuth().uid,
    followers: followers,
    post: post
  });
  // send the object to the Firebase db as a single multi-path update (the fan-out)
  rootRef.update(fanoutObj);
});
Note: this is far more scalable than a loop that writes to each follower's feed one request at a time. It could nevertheless be insufficient for millions of followers; in that case it would be safer to rely on a server-side operation making the writes. I think the client-side approach can handle up to a few hundred followers, which is around the average follower count on social media (this needs to be verified by testing, though).
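For the millions-of-followers case, here is a rough sketch of what such a server-side fan-out could look like, assuming a small Node process using the same legacy SDK (the /queue/posts path, the chunk size, and the post.author field are assumptions made for illustration):
// Sketch of a server-side fan-out worker (hypothetical paths and field names).
var Firebase = require('firebase');
var rootRef = new Firebase('https://<YOUR-FIREBASE-APP>.firebaseio.com');

// Clients push new posts to a queue instead of fanning them out themselves.
rootRef.child('queue/posts').on('child_added', function (snap) {
  var post = snap.val();
  rootRef.child('followers/' + post.author).once('value', function (followersSnap) {
    var followerIds = Object.keys(followersSnap.val() || {});
    // fan out in chunks so each multi-path update() stays reasonably small
    var CHUNK_SIZE = 500;
    for (var i = 0; i < followerIds.length; i += CHUNK_SIZE) {
      var fanoutObj = {};
      followerIds.slice(i, i + CHUNK_SIZE).forEach(function (key) {
        fanoutObj['/timeline/' + key + '/' + snap.key()] = post;
      });
      rootRef.update(fanoutObj);
    }
    // remove the queue item once it has been fanned out
    snap.ref().remove();
  });
});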

Related

GraphQL Authorization / Permission

So basically, how do you handle permissions?
Let's say we have a list of Posts of some kind, with a first argument to limit the number of posts, and only the owner and approved users can read the posts; everyone else can't. What is the best way to implement this?
query {
  viewer {
    posts(first: 10) {
      id
      text
    }
  }
}
What I'm currently thinking of is having a single source of truth for whether a user can read a post or not, and hooking it up with the dataloader module.
But how do I query for exactly 10 posts? If I query my DB for exactly 10 rows and then filter them with some business logic, I can end up with, for example, only 8 posts returned.
A solution is to not put a limit on the query, but that's not very efficient. So what is a good way to go about this?
Inspiration from here
(1) https://dev-blog.apollodata.com/auth-in-graphql-part-2-c6441bcc4302
(2) https://dev-blog.apollodata.com/graphql-at-facebook-by-dan-schafer-38d65ef075af
(1) solved it by using
export const DB = {
  Lists: {
    all: (user_id) => {
      return sql.raw("SELECT id FROM lists WHERE owner_id IS NULL OR owner_id = %s", user_id);
    }
  }
}
as the query, and then to filter out which rows can be read:
resolve: (root, _, ctx) => {
  // factor out data fetching
  return DB.Lists.all(ctx.user_id)
    .then(lists => {
      // enforce auth on each node
      return lists.map(auth.List.enforce_read_perm(ctx.user_id));
    });
}
So we can clearly see that it queries for all the rows, even if, say, the first argument was 1, which is what I'm trying to avoid.
Maybe I'm approaching the problem the wrong way; since the business logic lives on a different layer than the DB one, there may be no way around querying all the rows. Any help appreciated.
For future reference and for other people searching for solutions:
I used DataLoader to solve the authorization problem.
I literally implemented what they did in https://dev-blog.apollodata.com/graphql-at-facebook-by-dan-schafer-38d65ef075af and used this boilerplate repo as guidance. Not much more to say than that.
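For reference, here is a minimal sketch of that pattern; the helper names (canSee, postLoader, DB.Posts.byIds, DB.Posts.idsForViewer) are made up, but the shape follows the talk linked above: batch-load nodes with DataLoader, then apply one per-node permission check and null out what the viewer can't see.
// Per-node authorization on top of DataLoader (hypothetical helpers).
const DataLoader = require('dataloader');

// Batch-load posts by id so each post is fetched at most once per request.
// DB.Posts.byIds must resolve to posts in the same order as the given ids.
const postLoader = new DataLoader(ids => DB.Posts.byIds(ids));

// Single source of truth for read permission on a post.
function canSee(viewer, post) {
  return post.owner_id === viewer.id || post.approved_readers.includes(viewer.id);
}

const resolvers = {
  Viewer: {
    posts: async (viewer, { first }, ctx) => {
      // over-fetching (or fetching the next page) is still needed if many ids get filtered out
      const ids = await DB.Posts.idsForViewer(ctx.user_id, first);
      const posts = await postLoader.loadMany(ids);
      // enforce auth on each node; unauthorized nodes become null
      return posts.map(post => (canSee(ctx.viewer, post) ? post : null));
    }
  }
};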

Merging a dynamic number of collections together

I'm working on my first Laravel project: a family tree. I have 4 branches of the family, each with people/families/images/stories/etc. A given user on the website will have access to everything for 1, 2, or 4 of these branches of the family (I don't want to show a cousin stuff about people they're not related to).
So on various pages I want the collections returned from the controller to contain things based on the given user's permissions. Merge seems like the right way to do this.
I have scopes to get people from each branch of the family, and in the following example I also have a scope for people with a birthday this month. To show the right set of birthdays for this user, I can merge each group individually, provided the user has access.
Here's what my function would look like if I showed everyone in all 4 family branches:
public function get_birthday_people()
{
    $user = \Auth::user();

    $jones_birthdays = Person::birthdays()->jones()->get();
    $smith_birthdays = Person::birthdays()->smith()->get();
    $lee_birthdays = Person::birthdays()->lee()->get();
    $brandt_birthdays = Person::birthdays()->brandt()->get();

    $birthday_people = $jones_birthdays
        ->merge($smith_birthdays)
        ->merge($lee_birthdays)
        ->merge($brandt_birthdays);

    return $birthday_people;
}
My challenge: I'd like to modify it so that I check the user's access and only add each group of people accordingly. I'm imagining something that is the same as above, except that I add conditionals like this:
if ($user->jones_access) {
    $jones_birthdays = Person::birthdays()->jones()->get();
} else {
    $jones_birthdays = NULL;
}
But that throws an error for users without access, because I can't call merge on NULL (or on an empty array, or on the other versions of 'nothing' that I tried).
What's a good way to do something like this?
if ($user->jones_access) {
    $jones_birthdays = Person::birthdays()->jones()->get();
} else {
    $jones_birthdays = new Collection;
}
Better yet, do the merge in the condition, no else required.
$birthday_people = new Collection;

if ($user->jones_access) {
    // merge() returns a new collection rather than mutating, so reassign the result
    $birthday_people = $birthday_people->merge(Person::birthdays()->jones()->get());
}
You are going to want your Eloquent query to only return the relevant data for the user requesting it. It doesn't make sense to query Lee birthdays when a Jones person is accessing that page.
So what you will wind up doing is something like
$birthdays = App\Person::where('family', $user->family)->get();
This pulls in Persons where their family property is equal to the family of the current user.
This probably does not match the way you have your relationships right now, but hopefully it will get you on the right track to getting them sorted out.
If you really want to go ahead with a bunch of queries and authorization checks, read up on the authorization features of Laravel. They let you assign abilities to users and check them easily.

Plugin performance in Microsoft Dynamics CRM 2013/2015

Time to leave shy mode behind and make my first post on Stack Overflow.
After doing loads of research (plugins, performance, indexes, types of update, friends) and after trying several approaches, I was unable to find a proper answer/solution.
So, if possible, I would like your feedback/help with a Microsoft Dynamics CRM 2013/2015 plugin performance issue (or coding technique).
Scenario:
Microsoft Dynamics CRM 2013/2015
2 Entities with Relationship 1:N
EntityA
EntityB
EntityB has the following columns:
Id | EntityAId | ColumnDemoX (decimal) | ColumnDemoY (currency)
Entity A has: 500 records
Entity B has: 150 records for each Entity A record, so 500 * 150 = 75,000 records.
Objective:
Create a post-operation update plugin on Entity A to "mimic" the following SQL command:
Update EntityB
Set ColumnDemoX = (some quantity), ColumnDemoY = (some quantity) * (some value)
Where EntityAId = (some id)
One approach could be:
using (var serviceContext = new XrmServiceContext(service))
{
    // fetch the EntityB records related to the updated EntityA record
    var query = from b in serviceContext.EntityBSet
                where b.EntityAId.Equals(someId)
                select b;

    foreach (EntityB entB in query)
    {
        entB.ColumnDemoX = (some quantity);
        entB.ColumnDemoY = (some quantity) * (some value);
        serviceContext.UpdateObject(entB);
    }

    serviceContext.SaveChanges();
}
Problem:
The foreach over 150 records in the post-update plugin takes 20 seconds or more, while the equivalent SQL statement
Update EntityB Set ColumnDemoX = (some quantity), ColumnDemoY = (some quantity) * (some value) Where EntityAId = (some id)
takes 0.00001 secs.
Any suggestion/solution?
Thank you all for reading.
H
You can use ExecuteMultipleRequest: when you iterate over the 150 entities, collect the entities you need to update and then send the request afterwards. If you do this, you only call the service once, which is very good for performance.
If your process could grow bigger and bigger, then you should think about making it asynchronous, as a plug-in or a custom workflow activity.
This is an example:
// Create an ExecuteMultipleRequest object.
var requestWithResults = new ExecuteMultipleRequest()
{
    // Assign settings that define execution behavior: continue on error, return responses.
    Settings = new ExecuteMultipleSettings()
    {
        ContinueOnError = false,
        ReturnResponses = true
    },
    // Create an empty organization request collection.
    Requests = new OrganizationRequestCollection()
};

// Add an UpdateRequest for each entity to the request collection.
foreach (var entity in input.Entities)
{
    UpdateRequest updateRequest = new UpdateRequest { Target = entity };
    requestWithResults.Requests.Add(updateRequest);
}

// Execute all the requests in the request collection using a single web method call.
ExecuteMultipleResponse responseWithResults =
    (ExecuteMultipleResponse)_serviceProxy.Execute(requestWithResults);
A few solutions come to mind, but I don't think they will please you...
Is this really a problem? Yes, it's slow, and a direct database update would be much faster. However, if you can run it as a background (asynchronous) process, you'll get your numbers anyway. Is it really an "I need these numbers the second I click or the business goes down" situation?
It can be a reason to ditch 2013: in CRM 2015 you can use a calculated field. If you only need these numbers to show up on forms (e.g. you don't use them in reporting), you could also do it in JavaScript.
Warning: this one is for the desperate. If you really need your update to be synchronous and immediate, you can't use calculated fields, and you really know what you're doing, etc., why not do it directly in the database? I know this is very bad advice; there are a lot of reasons not to do it this way (you can read a few here). It's unsupported, and if you do something wrong it can go really bad. But if your real situation is as simple as your example (just a calculated field, no entity creation, no relationship modification), you could do it this way. You'll have to consider many things: you won't have any auditing on the fields, no security, caching issues, no "modified by", etc. Actually, I pretty much advise against this solution.
1 - Put this logic into an async workflow.
OR
2 - Don't use
serviceContext.UpdateObject(entB);
serviceContext.SaveChanges();
Instead, get all the records (150) in the post stage, update the fields, and use an ExecuteMultipleRequest to update the CRM records in one go.
Don't send an update request for each and every record.

Meteor: filter data in publish or on client

In Meteor I want to work at the document level with a MongoDB database, and according to sources, what I have to watch out for is expensive publications. So today my question is:
How would I go about publishing documents with relations? Would I follow the relational type of query, where we find assignment details via an assignment id, like this:
Meteor.publish('someName', function () {
  var empId = "dj4nfhd56k7bhb3b732fd73fb";
  var assignmentIds = Assignment.find({ employee_id: empId })
    .map(function (assignment) { return assignment._id; });
  return AssignmentDetails.find({ assignment_id: { $in: assignmentIds } });
});
or should we rather take an approach like this, where we skip that filtering step in the publish and instead publish every assignment_detail and handle the filtering on the client:
Meteor.publish('someName', function () {
  var empId = "dj4nfhd56k7bhb3b732fd73fb";
  var assignmentData = Assignment.find({ employee_id: empId });
  var detailData = AssignmentDetails.find({ employee_id: empId });
  return [assignmentData, detailData];
});
I guess this is a question of whether the amount of data being searched through on the server should be larger, or whether the amount of data being transferred to the client should be larger.
Which of these would be most cost effective for the server?
It's a matter of opinion, but if possible I would strongly recommend attaching employee_id to docs in AssignmentDetails, as you have in the second example. You're correct in suggesting that publications are expensive, but much more so if the publication function is more complex than necessary, and you can reduce your pub function to one line if you have employee_id in AssignmentDetails (even where there are many employee_ids for each assignment) by just searching on that. You don't even need to return that field to the client (you can specify the fields to return in your find), so the only incurred overhead would be in database storage (which is very cheap) and in adding it to inserted/updated AssignmentDetails docs (which would be imperceptible). The actual amount of data transferred would be the same as in the first case.
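That one-line publication might look something like the sketch below; the publication name and the published field names are placeholders, and limiting the fields via the second argument to find is optional:
// One-line publication keyed on the denormalized employee_id (sketch with placeholder names).
Meteor.publish('assignmentDetailsForEmployee', function (empId) {
  // publish only the fields the client needs; employee_id itself need not be sent down
  return AssignmentDetails.find(
    { employee_id: empId },
    { fields: { assignment_id: 1, title: 1 } }
  );
});
// On the client: Meteor.subscribe('assignmentDetailsForEmployee', empId);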
The alternative of just publishing everything might be fine for a small collection, but it really depends on the number of assignments, and it's not going to be at all scalable this way. You need to send the entire collection to the client every time a client connects, which is expensive and time-consuming at both ends if it's more than a MB or so, and there isn't really any way round that overhead when you're talking about a dynamic (i.e. frequently-changing) collection, which I think you are (whereas for largely static collections you can do things with localStorage and poll-and-diff).

How can I change the column name of an existing Class in the Parse.com Web Browser interface?

I couldn't find a way to change the name of a column I had just created, either in the browser interface or via an API call. It looks like all object-related API calls manipulate instances, not the class definition itself?
Anyone know if this is possible, without having to delete and re-create the column?
This is how I did it in python:
import json, httplib, urllib

connection = httplib.HTTPSConnection('api.parse.com', 443)
params = urllib.urlencode({"limit": 1000})
connection.connect()
connection.request('GET', '/1/classes/Object?%s' % params, '', {
    "X-Parse-Application-Id": "yourID",
    "X-Parse-REST-API-Key": "yourKey"
})
result = json.loads(connection.getresponse().read())
objects = result['results']

for object in objects:
    connection = httplib.HTTPSConnection('api.parse.com', 443)
    connection.connect()
    objectId = object['objectId']
    objectData = object['data']
    connection.request('PUT', ('/1/classes/Object/%s' % objectId), json.dumps({
        "clonedData": objectData
    }), {
        "X-Parse-Application-Id": "yourID",
        "X-Parse-REST-API-Key": "yourKEY",
        "Content-Type": "application/json"
    })
This is not optimized - you can batch 50 of these operations together at once, but since I'm only running it once I didn't bother. Also, since Parse has a 1000-result query limit, you will need to run the load multiple times with a skip parameter, like:
params = urllib.urlencode({"limit":1000, "skip":1000})
From this Parse forum answer: https://www.parse.com/questions/how-can-i-rename-a-column
Columns cannot be renamed. This is to avoid breaking an existing app. If your app is still under development, you can just query for all the objects in your class and copy the value of the old column to the new column. The REST API is very useful for this. You may then drop the old column in the Data Browser.
Hope it helps.
Yes, it's not a feature provided by Parse (yet). But there are some third-party API management tools that you can use to rename the fields in the response. One free tool is called apibond.com.
It's a workaround, but I hope it helps.

Resources