How to best create sorted sets with Redis - sorting

I'm still a bit lost when it comes to Sorted Sets and how to best construct them. Currently I have a simple set of activity on my site. Normally it will display things like User Followed, User liked, User Post etc. The JSON looks something like...
{
id: 2808697,
activity_type: "created_follower",
description: "Bob followed this profile",
body: null,
user: "Bob",
user_id: 99384,
user_profile_id: 233007,
user_channel_id: 2165811,
user_cube_url: "bob-anerson",
user_action: "followed this profile",
buddy: "http://s3.amazonaws.com/stuff/ju-logo.jpg",
affected: "Bill Anerson is following Jon Denver.",
created_at: "2014-06-24T20:34:11-05:00",
created_ms: 1403660051902,
profile_id: 232811,
channel_id: 2165604,
cube_url: "jondenver",
type: "profiles"
}
So if the activity type can be multiple things (i.e. Created Follow, Liked Event, Posted News, etc.), how would I go about putting this all in a sorted set? I'm already sure I want the score to be the created_ms, but the question is: can I do multiple values in a sorted set that all have keys as fields? Should most of this be in a hash? I realize this is a fairly open question, but after trying to wrap my head around all the tutorials I'm just concerned about setting up the data structure beforehand so I don't get caught too deep in the weeds.

A sorted set is useful if you want to... keep stuff sorted! ;)
So, I assume you're interested in keeping the activities sorted by their creation time (ms). As for storing the actual data, you have two options:
Use the sorted set itself to store the data, even in native JSON format. Note that with this approach you'll only be able to fetch the entire JSON and you'll have to parse it at the client.
Alternatively, use the sorted set to store "pointers" to hashes - i.e. the values will be the key names under which you store the data. From your description, this appears to be the preferable approach.
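A minimal sketch of that second approach in raw Redis commands (the key naming scheme and the subset of hash fields are assumptions for illustration):

```
ZADD profile:233007:activity 1403660051902 activity:2808697
HMSET activity:2808697 activity_type created_follower user Bob user_id 99384 created_ms 1403660051902
ZREVRANGE profile:233007:activity 0 9
```

ZREVRANGE returns the activity key names newest-first; an HGETALL per returned key (ideally pipelined) then yields the fields, so you can read individual attributes instead of parsing a JSON blob on the client.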

Related

Spring Data MongoDB - Embedded Document as Reference in Other Document

I'd like to know if it's possible (or even correct) to use embedded documents as reference in other documents.
I know I can move the embedded document to its own collection but the main goal is to have the performance benefit of embedded document and also avoid duplication.
For example:
User
{
_id: ObjectId("4fed0591d17011868cf9c982"),
_class: "User"
...
addresses: [ {
_id: ObjectId("87KJbk87gjgjjygREewakj86"),
_class: "Address",
...
} ]
}
Order
{
_id: ObjectId("gdh60591d123487658cf9c982"),
_class: "Order",
...
address: ObjectId("87KJbk87gjgjjygREewakj86")
}
Your case reminds me of the typical relational approach, which I was a victim of, too, when starting to use document-oriented DBs. All of your entities in the example are referenced; there is no redundancy anymore.
You should start to get used to the idea of letting go of normalization and starting to duplicate data. In many cases it is hard to determine which data should be referenced and which should be embedded. Your case tends to be quite clear, though.
Without knowing your entire domain model, the address seems to be a perfect candidate for a value object. Do not maintain an Address collection, embed it within the user object. In Order, you could either make a reference to the user, which gives you implicitly the address object and might make sense, since an order is made by a user.
But... I recommend that you embed the address entirely in the Order. First, it is faster, since you don't need to resolve a reference. Second, the address in shipped orders should never change! Consider the orders of the last year: if you only held a reference to the address, you would lose the information about which address they were shipped to once the user changes his address.
Suggestion: Always take a snapshot of the address and embed it in the Order. Save the MongoDB ID of the user as a regular string (no DBRef) within the Order. If a user should change his address, you can query for all non-shipped orders of that user and amend the address.
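A minimal sketch of the snapshot idea in Python, operating on plain dicts as they would come out of the driver (the field names and the choice of the first address as the default ship-to address are assumptions for illustration):

```python
import copy

def create_order(user, items):
    """Build an order that embeds a snapshot of the user's current address.

    `user` is a plain dict as loaded from MongoDB; the first entry of its
    embedded "addresses" array is taken as the ship-to address here.
    """
    return {
        "_class": "Order",
        "user_id": str(user["_id"]),  # plain string, no DBRef
        "address": copy.deepcopy(user["addresses"][0]),  # snapshot, not a reference
        "items": items,
    }

user = {
    "_id": "4fed0591d17011868cf9c982",
    "addresses": [{"street": "Main St 1", "city": "Lisbon"}],
}
order = create_order(user, ["book"])

# Changing the user's address later must not affect the stored order.
user["addresses"][0]["city"] = "Porto"
```

Because the order holds its own copy, a later change to the user's address leaves already-placed orders untouched, which is exactly the immutability the shipped-order history needs.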
Since you asked if this is even correct, I would say, gently, "No." At least not typically.
But if you did want to insist on using an embedded address from user:
You can reference the user embedded address in the Order object, just not the way you might think! If you stored the id of the user in the order (it should already be there if Order belongs_to User), then you merely use user.address instead of copying the address instance as you have done.
ALTERNATIVE
I hope to illustrate a better approach to modeling the domain...
A more common approach is to instantiate a new order object, using the user's address as the default "ship to" address for the order, yet allow the user to override the shipping address if desired. In theory, each order could have a different "ship to" address.
Just because two classes have an address, does not mean they are necessarily the same address.
COMMENTARY
Orders are more of a historical document than one that changes. Therefore, orders are generally immutable once placed. Your model, however, allows the address to change every time the user changes their address; that change ripples into the orders, which would be incorrect as far as normal order business logic goes.
Assume your address last year was in Spain and you had Order #1 show Spain when you ran a report of Orders last year. Imagine if your address this year is now Portugal and Order #1 now shows Portugal in the same report. That would be factually incorrect.
BTW: @Matt gave you the tip that from a "problem domain" perspective, you likely do not want to model it as you have. I am merely elaborating on that...
Since I got no answer I will post here how I did it. If you have a better solution I am happy to hear it.
It looks like there's no way to create/reference a collection inside another collection, so I had to extract the addresses from the user collection into its own collection and create a reference in the User and Order collections as mentioned here. I was expecting something more flexible, but couldn't find one.
User
{
_id: ObjectId("4fed0591d17011868cf9c982"),
_class: "User"
...
addresses: [ {
"$ref" : "addresses",
"$id" : ObjectId("87KJbk87gjgjjygREewakj86")
} ]
}
Address
{
_id: ObjectId("87KJbk87gjgjjygREewakj86"),
...
}
Order
{
_id: ObjectId("gdh60591d123487658cf9c9867"),
_class: "Order",
...
address: {
"$ref" : "addresses",
"$id" : ObjectId("87KJbk87gjgjjygREewakj86")
}
}

Algorithm for recursively linked objects

I'm maintaining a small program that goes through documents in a Neo4j database and dumps a JSON-encoded object to a document database. In Neo4j—for performance reasons, I imagine—there's no real data, just ID's.
Imagine something like this:
posts:
post:
id: 1
tags: 1, 2
author: 2
similar: 1, 2, 3
I have no idea why it was done like this, but this is what I have to deal with. The program then uses the ID's to fetch information for each data structure, resulting in a proper structure. Instead of author being just an int, it's an Author object, with name, email, and so on.
This worked well until the similar feature was added. Similar consists of ID's referencing other posts. Since in my loop I'm building the actual post objects, how can I reference them in an efficient manner? The only thing I could imagine was creating a cache with the posts I already "converted" and, if the referenced ID is not in the cache, put the current post on the bottom of the list. Eventually, they will all be processed.
The approach you're proposing won't work if there are cycles of similar relationships, which there probably are.
For example, you've shown a post 1 that is similar to a post 2. Let's say you come across post 1 first. It refers to post 2, which isn't in the cache yet, so you push post 1 back onto the end of the queue. Now you get to post 2. It refers to post 1, which isn't in the cache yet, so you push post 2 back onto the end of the queue. This goes on forever.
You can solve this problem by building the post objects in two passes. During the first pass, you make Post objects and fill them with all the information except for the similar references, and you build up a map[int]*Post that maps ID numbers to posts. On the second pass, for each post, you iterate over the similar ID numbers, look up each one in the map, and use the resulting *Post values to fill a []*Post slice of similar posts.
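The two passes can be sketched in Python (the answer describes Go types; the structure is the same, and the `Post` class here is a hypothetical stand-in for the real one):

```python
def link_posts(raw_posts):
    """Two-pass build: first create every Post object, then wire up the
    `similar` references via an id -> Post map. Cycles are harmless because
    by the second pass every post already exists in the map."""
    class Post:
        def __init__(self, pid):
            self.id = pid
            self.similar = []

    # Pass 1: build every post and index it by id.
    posts = {p["id"]: Post(p["id"]) for p in raw_posts}
    # Pass 2: resolve similar ids to the already-built objects.
    for p in raw_posts:
        posts[p["id"]].similar = [posts[sid] for sid in p["similar"]]
    return posts

# A cycle: 1 is similar to 2 and 2 is similar to 1 -- no infinite loop.
posts = link_posts([{"id": 1, "similar": [2]}, {"id": 2, "similar": [1]}])
```

Each post ends up holding real object references rather than bare ids, and the mutual 1 <-> 2 relationship that would starve the queue-based approach is resolved in linear time.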

How exactly does the fetchAllIfNeeded differ from fetchAll in the JS SDK?

I never quite understood the if needed part of the description.
.fetchAll()
Fetches the given list of Parse.Object.
.fetchAllIfNeeded()
Fetches the given list of Parse.Object if needed.
What is the situation where I might use this and what exactly determines the need? I feel like it's something super elementary but I haven't been able to find a satisfactory and clear definition.
In the example in the API, I notice that the fetchAllIfNeeded() has:
// Objects were fetched and updated.
In the success while the fetchAll only has:
// All the objects were fetched.
So does the fetchAllIfNeeded() also save stuff too? Very confused here.
UPDATES
TEST 1
Going on some of the hints @danh left in the comments I tried the following things.
var todos = [];
var x = new Todo({content:'Test A'}); // Parse.Object
todos.push(x);
x.save();
// So now we have a todo saved to parse and x has an id. Async assumed.
x.set({content:'Test B'});
Parse.Object.fetchAllIfNeeded(todos);
So in this scenario, my client-side x is different from the server. But x.hasChanged() is false, since we used the set function and the change event was triggered. fetchAllIfNeeded returns no results. So it isn't trying to compare this outright to what is on the server in order to sync and fetch.
I notice that in the request payload, running the fetchAllIfNeeded is sending the following interesting thing.
{where: {objectId: {$in: []}}, _method: "GET",…}
So it seems that on the client side something determines whether an object "is needed".
Test 2
So now, based on the comments I tried manipulating the changed state of the object by setting with silent.
x.set({content:'Test C'}, {silent:true});
x.hasChanged(); // true
Parse.Object.fetchAllIfNeeded(todos);
Still nothing interesting. Clearly the server state ("Test A") is different from the client-side state ("Test C"), and I still get results [] and the request payload is:
{where: {objectId: {$in: []}}, _method: "GET",…}
UPDATE 2
Figured it out by looking at the Parse source. See answer.
After many manipulations, then taking a look at the source - I figured this out. Basically fetchAllIfNeeded will fetch models in an array that have no data, meaning there are no attribute properties and values.
So the use case would be: you have, let's say, a parent object with an array of nested Parse.Objects. When you fetch the parent object, the nested child objects in the array will not be included (unless you have the include query constraint set). Instead, pointers are sent back to the client side, and in your client those pointers are translated into 'empty' models with no data, basically just blank Parse.Objects with ids.
Specifically, the Parse.Object has an internal Boolean property called _hasData which seems to be toggled true any time stuff like set, or fetch, or whatever gives that model attributes.
So, let's say you need to fetch those child objects. You can just do something like
var childObjects = parent.get('children'); // Array
Parse.Object.fetchAllIfNeeded(childObjects);
And it will search for those children who are currently only represented as empty Objects with id.
It's useful as opposed to fetchAll in that you might go through the children array and lazily load one at a time as needed, then at a later time need to "get the rest". fetchAllIfNeeded essentially just filters "the rest" and sends a whereIn query that limits fetching to those child objects that have no data.
In the Parse documentation, they have a comment in the callback response to fetchAllIfNeeded as:
// Objects were fetched and UPDATED.
I think they mean the client-side objects were updated. fetchAllIfNeeded definitely sends GET calls, so I doubt anything updates on the server side. So this isn't some sync function. This really confused me, as I instantly thought of server-side updating, when they really mean:
// Client objects were fetched and updated.

Rails ActiveRecords with own attributes + associated objects' IDs

I have a rather simple ActiveRecords associations like such (specifically in Rails 4):
An organization has many users
A user belongs to an organization
But in terms of ActiveRecord queries, what's an optimal way to construct a query that returns an array of Organizations, each with its own array of the user ids associated with it? Basically, I'd like to return the following data structure:
#<ActiveRecord::Relation [#<Organization id: 1, name: "org name",.... user_ids: [1,2,3]>, <Organization id: 2...>]>
... or to distill it even further in JSON:
[{id: 1, name: 'org name', ... user_ids: [1,2,3]}, {...}]
where user_ids is not part of the Organizations table but simply an attribute constructed on the fly by ActiveRecord.
Thanks in advance.
EDIT: After trying a few things out, I came up with something that returned the result in the format I was looking for. But I'm still not sure (nor convinced) if this is the most optimal query:
Organization.joins(:users).select("organizations.*, '[#{User.joins(:organization).pluck(:id).join(',')}]' as user_ids").group('organizations.id')
Alternatively, the JBuilder/Rabl approach @Kien Thanh suggested seems very reasonable and approachable. Is that considered current best practice nowadays for Rails-based API development (the app has the back-end and front-end pieces completely de-coupled)?
The only thing to be aware of with a library solution such as JBuilder or Rabl is to watch the performance when they build the json.
As for your query use includes instead of joins to pull the data back.
orgs = Organization.includes(:users)
You should not have to group your results this way (unless the group was for some aggregate value).
The has_many association gives you some automatic helper methods on each record, one of which is association_ids (here, user_ids).
So if you create your own JSON from a hash you can do
orgs.map { |o| o.attributes.merge(user_ids: o.user_ids).to_json }
EDIT: Forgot to add the reference for has_many http://guides.rubyonrails.org/association_basics.html#has-many-association-reference

Advice on how to model a MongoDB document

I am building an API using Codeigniter and MongoDB.
I have some questions about how to "model" the MongoDB documents.
A user should have basic data like name, and a user should also be able to follow other users. As it is now, each user document keeps track of all people that are following him and all that he is following. This is done by using arrays of user _ids.
Like this:
"following": [323424,2323123,2312312],
"followers": [355656,5656565,5656234234,23424243,234246456],
"fullname": "James Bond"
Is this a good way? Perhaps the user document should only contain ids of people that the user is following and not who is following him? I can imagine that keeping potentially thousands of ids (for followers) in an array will make the document too big.
All input is welcome!
The max document size is currently limited to 16MB (v1.8.x and up), which is pretty big. But I still think it would be OK in this case to move the follower relations to their own collection -- you never know how big your project gets.
However, I would recommend using database references for storing the follower relations: it's way easier to resolve the user from a database reference. Have a look at:
http://www.mongodb.org/display/DOCS/Database+References
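A sketch of the separate follower-relations collection, here simulated with a plain Python list standing in for the MongoDB collection (the document shape and the `follow` helper are assumptions for illustration):

```python
def follow(relations, follower_id, followed_id):
    """Record a follow as its own edge document instead of growing an
    unbounded array inside the user document."""
    doc = {"follower": follower_id, "followed": followed_id}
    if doc not in relations:  # a unique compound index would enforce this in MongoDB
        relations.append(doc)
    return doc

relations = []
follow(relations, 355656, 323424)
follow(relations, 5656565, 323424)

# "Who follows 323424?" becomes a query on the relations collection
# instead of a read of an ever-growing embedded array:
followers = [r["follower"] for r in relations if r["followed"] == 323424]
```

With this layout a user with a huge follower count never pushes a single document toward the 16MB limit, and both directions ("who do I follow?" and "who follows me?") are simple indexed queries on the edge collection.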
