Redis - how to store my data? - memory-management

On the Redis site, the "Memory optimization" page says that small hashes use much less memory than the same data stored in separate keys, so it is better to store a few values as fields of one small hash than as a few keys. So I thought of making, for example, a users hash and storing each user in a field as JSON-serialized data. But what if my hash is REALLY big, meaning it has a lot of fields?
Is it better to store the users as a single hash with a lot of fields or as several small hashes?
I'm asking because the Redis site says that "small" hashes are better than several keys for storing a couple of values, but I don't know if that still applies to really big hashes.

I would say your best solution is creating a key per user, perhaps named by the user's id, and storing the JSON data there.
We tried storing each user as one hash with a field for each of the user's properties, but we found we never really used the fields individually and in most cases needed most of the data (HGETALL), so we switched to storing JSON, which also helps with preserving data types.
I would need more detail about what you're storing and how you access it to give more suggestions.

Let's say you have a user like this:
{"ID": "344", "Name": "Blah", "Occupation": "Engineer", "Telephone": [ "550-33...", ...] }
You would serialize the JSON and store it as what Redis calls a String, i.e. you would use the GET and SET commands.
e.g.
SET "user:344" "<SERIALIZED>"
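A minimal Python sketch of that key-per-user layout. It assumes a plain dict stands in for Redis so the example runs without a server. The helper names `user_key`, `save_user` and `load_user` are hypothetical.

```python
import json

def user_key(user_id: str) -> str:
    """Build the per-user key, e.g. "user:344"."""
    return f"user:{user_id}"

def save_user(store: dict, user: dict) -> None:
    # Equivalent of: SET user:<id> <serialized JSON>
    store[user_key(user["ID"])] = json.dumps(user)

def load_user(store: dict, user_id: str) -> dict:
    # Equivalent of: GET user:<id>, then deserialize
    return json.loads(store[user_key(user_id)])

store = {}  # stand-in for Redis (assumption for the sketch)
save_user(store, {"ID": "344", "Name": "Blah", "Occupation": "Engineer"})
print(load_user(store, "344")["Name"])  # Blah
```

Against a real server, redis-py's `r.set(user_key(uid), json.dumps(user))` and `json.loads(r.get(user_key(uid)))` would do the same job.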
Since users are one of your main objects, a "users" hash would not be a small hash.
The gist of the documentation is about hashes with a small number of elements. For example, let's say that in your whole system you have 10 colors, and you want to associate some data with each of them. Instead of doing:
color:blue -> DATA, color:white -> DATA
it is better to use a hash:
colors -> blue -> DATA
colors -> white -> DATA
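The Redis "Memory optimization" page also describes a middle ground for the original question: split one huge hash into many small bucket hashes, so that every hash stays small enough for the compact encoding. Below is a minimal sketch. It assumes numeric user ids and a bucket size of 100; choose a size below your `hash-max-ziplist-entries` (or `hash-max-listpack-entries` in newer versions) setting. A dict of dicts stands in for Redis. `bucket_location` and the `users:<bucket>` key scheme are illustrative, not prescribed.

```python
from __future__ import annotations

import json

def bucket_location(user_id: int, bucket_size: int = 100) -> tuple[str, str]:
    """Map a numeric id to (hash key, field) so no hash holds more than bucket_size fields."""
    bucket, offset = divmod(user_id, bucket_size)
    return f"users:{bucket}", str(offset)

def hset(store: dict, key: str, field: str, value: str) -> None:
    # Stand-in for: HSET key field value
    store.setdefault(key, {})[field] = value

def hget(store: dict, key: str, field: str) -> str | None:
    # Stand-in for: HGET key field
    return store.get(key, {}).get(field)

store: dict = {}  # dict of dicts standing in for Redis
for uid in range(1000):
    key, field = bucket_location(uid)
    hset(store, key, field, json.dumps({"ID": str(uid)}))

print(len(store), max(len(h) for h in store.values()))  # 10 100
print(bucket_location(344))                               # ('users:3', '44')
print(json.loads(hget(store, *bucket_location(344))))     # {'ID': '344'}
```

With redis-py, the two stand-ins become `r.hset(key, field, value)` and `r.hget(key, field)`.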

Related

Can I sort a bunch of values without retaining the actual content of the string? Two-Key sort one from on-premise another in the cloud

What do I want to do
I want to sort a bunch of strings, simple enough.
What are my constraints
I have the original text, which is what I want to sort, stored on-premises; the cloud has some other "columns" of data that are not on-premises, and for security reasons I cannot move the original text from on-premises to the cloud.
The real constraint is that I cannot have all the data in one place, which makes sorting and paging on values across on-premises and cloud data difficult.
What I thought of (and where I need help)
Maybe I can take a hash or some other way of extracting certain data from the string in such a way that the original string cannot be reproduced (takes care of the security thing) but the extracted string would be enough that I can do sorting on it.
Example
on-premises data:
[
{
"id": 1,
"name": "abcd"
}
]
cloud data:
[
{
"id": 1,
"price": "20"
}
]
I need to sort on both price and name in the above example (imagine 100,000 rows of such data).
What you need to do is to store pairs of a string and the corresponding id, e.g. in two lists/arrays (whatever the programming language of your choice offers).
Then start sorting the strings, but each time you move a string, move the id the same way.
Alternatively, most programming languages offer constructs that let you make pairs and then sort those pairs by their strings, which automatically moves the ids around.
Both ways mean that after sorting, you can still find the id for each string, then with that id you can access the corresponding cloud data as usual.
As an example, the programming language C offers the compound data type construct
struct IdStringPair
{
int id;
char* string;
/* actually just the address of where the full string is stored,
but basically what you probably want to use */
};
Hardly any programming language exists which does not offer something similar.
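In Python, for example, tuples play the role of that struct. This is a minimal sketch. It assumes the on-premises rows are available as `(id, name)` tuples and that the cloud data can be looked up by id. A dict stands in for the cloud service, and `cloud_prices` and `page_by_name` are hypothetical names.

```python
# On-premises: (id, name) pairs; names never leave this side.
on_prem = [(1, "abcd"), (2, "aaaa"), (3, "zz"), (4, "mmm")]

# Cloud: id -> other columns (stand-in for a cloud lookup by id).
cloud_prices = {1: "20", 2: "35", 3: "5", 4: "12"}

def page_by_name(pairs, page, page_size):
    """Sort pairs by name, keeping each id attached, and return one page of ids."""
    ordered = sorted(pairs, key=lambda pair: pair[1])  # ids move with their names
    start = page * page_size
    return [pair_id for pair_id, _name in ordered[start:start + page_size]]

ids = page_by_name(on_prem, page=0, page_size=2)
print(ids)                              # [2, 1]
print([cloud_prices[i] for i in ids])   # ['35', '20']
```

Only the ordered ids cross to the cloud. The cloud side then fetches its columns for exactly those ids.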
If conversely the data to sort by is in the cloud, then sorting has to take place in the cloud, i.e. by something that can execute the sorting algorithm there. Make sure that you sort the id along with the key. Finding the on-premises string is then the same as before: whatever you previously did to find the string for an id, do it with the ids you got from the sorted cloud data.
This is the same as the first situation/solution, just mirrored.
The core concept is to always sort the ids along with the key (and other data), which removes the need to have the data from the other side of the gap between cloud and premises. That applies to all variants of sorting separated data.
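The mirrored case can be sketched under the same assumptions. The cloud sorts its own records and returns only ids, and the on-premises side then resolves names by id. The prices are strings in the example data, so they are converted to numbers before comparing; otherwise "20" would sort before "5". `cloud_page_by_price` and `names_on_prem` are hypothetical.

```python
cloud = [{"id": 1, "price": "20"}, {"id": 2, "price": "35"}, {"id": 3, "price": "5"}]
names_on_prem = {1: "abcd", 2: "aaaa", 3: "zz"}

def cloud_page_by_price(records, page, page_size):
    """Runs in the cloud: sort by numeric price, keep ids attached, return one page of ids."""
    ordered = sorted(records, key=lambda rec: float(rec["price"]))
    start = page * page_size
    return [rec["id"] for rec in ordered[start:start + page_size]]

ids = cloud_page_by_price(cloud, page=0, page_size=3)
print(ids)                              # [3, 1, 2]
print([names_on_prem[i] for i in ids])  # ['zz', 'abcd', 'aaaa']
```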

Query core data store based on a transient calculated value

I'm fairly new to the more complex parts of Core Data.
My application has a core data store with 15K rows. There is a single entity.
I need to display a subset of those rows in a table view filtered on a calculated search criteria, and for each row displayed add a value that I calculate in real time but don't store in the entity.
The calculation needs to use a couple of values supplied by the user.
A hypothetical example:
Entity: contains fields "id", "first", and "second"
User inputs: 10 and 20
Search / Filter Criteria: only display records where the entity field "id" is a prime number between the two supplied numbers. (I need to build some sort of complex predicate method here I assume?)
Display: all fields of all records that meet the criteria, along with a derived field (not in the Core Data entity) that is the sum of the "id" field and a random number, so each row in the table view would contain 4 fields:
"id", "first", "second", -calculated value-
From my reading / Googling it seems that a transient property might be the way to go, but I can't work out how to do this given that the search criteria and the resultant property need to calculate based on user input.
Could anyone give me any pointers that will help me implement this code? I'm pretty lost right now, and the examples I can find in books etc. don't match my particular needs well enough for me to adapt them as far as I can tell.
Thanks
Darren.
The first thing you need to do is to stop thinking in terms of fields, rows and columns, as none of those structures are actually part of Core Data. In this case, it is important because Core Data supports arbitrarily complex fetches but the SQLite store does not. So, if you use an SQLite store, your fetches are restricted to those supported by SQLite.
In this case, predicates aimed at SQLite can't perform complex operations such as calculating whether an attribute value is prime.
The best solution for your first case would be to add a boolean attribute isPrime and then modify the setter for your id attribute to calculate whether the new id value is prime and set isPrime accordingly. That flag will be stored in the SQLite store and can be fetched against, e.g. isPrime == YES AND id >= %@ AND id <= %@, with the two user-supplied numbers as the arguments.
The second case would simply use a transient property for which you would supply a custom getter to calculate its value when the managed object was in memory.
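Both ideas are language-neutral, so here is a minimal Python sketch of the same design. It is not Core Data API. It shows three things:

* A stored `is_prime` flag that is recomputed whenever `id` is set.
* A simple fetch-style filter on that flag plus an id range.
* A derived value computed on demand instead of stored, like a transient property with a custom getter.

The class `Record`, its attribute names and the `offset` parameter (standing in for the question's random number) are assumptions.

```python
import random

def is_prime(n: int) -> bool:
    """Simple trial division; adequate for ids in a store of ~15K rows."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

class Record:
    """Hypothetical stand-in for the managed object (not Core Data API)."""

    def __init__(self, id_: int, first: str, second: str):
        self.first = first
        self.second = second
        self.id = id_  # routed through the setter, so is_prime is set too

    @property
    def id(self) -> int:
        return self._id

    @id.setter
    def id(self, value: int) -> None:
        # Mirrors the custom setter: keep the stored flag in sync with id.
        self._id = value
        self.is_prime = is_prime(value)

    def calculated(self, offset: int) -> int:
        # Mirrors a transient property: derived when needed, never stored.
        return self.id + offset

records = [Record(n, f"first{n}", f"second{n}") for n in range(1, 31)]
low, high = 10, 20                       # the two user inputs
# Equivalent of the simple predicate: isPrime == YES AND id >= low AND id <= high
matches = [r for r in records if r.is_prime and low <= r.id <= high]
print([r.id for r in matches])           # [11, 13, 17, 19]

offset = random.randint(0, 100)          # the "random number" from the question
rows = [(r.id, r.first, r.second, r.calculated(offset)) for r in matches]
```

Precomputing the flag keeps the expensive test out of the fetch. The fetch itself then only needs comparisons that SQLite can evaluate.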
One often overlooked option is to use an XML store instead of an SQLite store. If the amount of data is relatively small, e.g. a few thousand text attributes with a total memory footprint of a few dozen megabytes, then an XML store will be super fast and can handle more complex operations.
SQLite is sort of the stunted stepchild in Core Data. It is useful for large data sets and low memory, but with memory becoming ever more plentiful, it's losing its edge. I find myself using it less these days. You should consider whether you need SQLite in this particular case.

Sort by key in Cassandra

Let's assume I have a keyspace with a column family that stores user objects and the key of these objects is the username.
How can I use Hector to get a list of users sorted by username?
I tried to use a RangeSlicesQuery, paging works fine with this query, but the results are not sorted in any way.
I'm an absolute Cassandra beginner, can anyone point me to a simple example that shows how to sort a column family by key? Please ask if you need more details on my efforts.
Edit:
The result was not sorted because I used the default RandomPartitioner instead of the OrderPreservingPartitioner in cassandra.yaml.
It is probably better not to rely on sorting by key but to use a secondary index.
Quoting Cassandra: The Definitive Guide:
Column names are stored in sorted order according to the value of compare_with. Rows,
on the other hand, are stored in an order defined by the partitioner (for example,
with RandomPartitioner, they are in random order, etc.)
I guess you are using RandomPartitioner, which "... return[s] data in an essentially random order."
You should probably use OrderPreservingPartitioner (OPP), where "Rows are therefore stored by key order, aligning the physical structure of the data with your sort order."
Be aware of the inefficiency of OPP.
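Here is why the rows came back unsorted. RandomPartitioner places each row by a token derived from the MD5 hash of its key, and range scans walk token order, not key order. Below is a simplified Python model. It assumes the token is the absolute value of the MD5 digest read as a signed big-endian integer. Treat it as illustrative, not as Cassandra's code.

```python
import hashlib

def random_partitioner_token(key: str) -> int:
    """Simplified model of a RandomPartitioner token: |MD5(key) as a signed big-endian int|."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

users = ["alice", "bob", "carol", "dave", "erin"]
by_token = sorted(users, key=random_partitioner_token)  # roughly the order a range scan returns
by_key = sorted(users)                                   # the order OPP would give
print(by_token)
print(by_key)
```

The token order bears no relation to the alphabetical order. That is why paging works with RangeSlicesQuery while the results still look shuffled.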
(edit on Mar 07, 2014)
Important:
This answer is very old now.
It is a system-wide setting. You can set it in cassandra.yaml. See this doc. Again, OPP is highly discouraged. This document is for version 1.1, and you can see it is deprecated there. It has likely been removed from the latest versions. If you do want to use OPP, you may want to revisit the architecture.
Or create a row called "meta:userNames" in same column family and put all user names as a look up hash. Something like that.
Users {
key: "meta:userNames" {david:david, paolo:paolo, victor:victor},
key: "paolo" {password:"*****", locale:"it_it"},
key: "david" {password:"*****", locale:"en_us"},
key: "victor" {password:"*****", locale:"en_uk"}
}
First query the meta:userNames columns (which are sorted) and use them to get the user rows. Don't try to get everything via a single query as in SQL-driven databases. Use Cassandra as a huge hash map that provides rapid random access to its data.
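Here is a minimal Python model of that lookup-row pattern. It assumes a dict of dicts stands in for the column family. It also assumes the column names in the `meta:userNames` row come back sorted by the comparator, which is modeled here with `sorted`. `page_users`, `start_after` and `count` are hypothetical names. With Hector, the two steps would roughly be a slice query on the meta row followed by a multiget of the user rows.

```python
users_cf = {
    "meta:userNames": {"david": "david", "paolo": "paolo", "victor": "victor"},
    "paolo": {"password": "*****", "locale": "it_it"},
    "david": {"password": "*****", "locale": "en_us"},
    "victor": {"password": "*****", "locale": "en_uk"},
}

def page_users(cf, count, start_after=None):
    """Step 1: slice the sorted column names of the meta row; step 2: fetch those rows."""
    names = sorted(cf["meta:userNames"])            # comparator (column) order
    if start_after is not None:
        names = [n for n in names if n > start_after]
    return [(name, cf[name]) for name in names[:count]]

first = page_users(users_cf, count=2)
print([name for name, _ in first])                  # ['david', 'paolo']
nxt = page_users(users_cf, count=2, start_after=first[-1][0])
print([name for name, _ in nxt])                    # ['victor']
```

Paging with `start_after` mirrors how a column slice resumes from the last column name seen.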

Comment System using Redis Database System

I am trying to build a comment system using Redis. I am currently using hashes to store the comment data, but the problem I am facing is that after 10 or 12 comments, the comments lose their order and start appearing randomly. Does anyone know what data type should be used for building a commenting system with Redis? Currently my hashes are of the form:
postid:comments commentid:userid "Testcomment"
Thanks, any help will be appreciated.
Hashes are set up for quick access by key rather than retrieval in order. If you need items in a particular order, try a list or sorted set.
The reason it appears to work at first is an optimization for small hashes. While a hash has only a few small entries, Redis stores it in a compact list-like encoding, which keeps insertion order. Once it grows past the configured limits (hash-max-ziplist-entries / hash-max-ziplist-value), Redis converts it to a real hash table for efficient lookups, and the fields come back in hash order rather than insertion order.
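Here is a minimal sketch of the sorted-set approach using redis-py-style calls: `incr`, `hset` with `mapping=` (redis-py 3.5+), `zadd`, `zrange` and `hgetall`. It assumes each comment stays a hash, and a per-post sorted set scored by timestamp provides the order. The key names are hypothetical. Any client object with those methods works, e.g. `redis.Redis(decode_responses=True)`.

```python
import time

def add_comment(r, post_id, user_id, text, ts=None):
    """Store a comment as a hash and index it by time in a per-post sorted set."""
    comment_id = r.incr(f"post:{post_id}:comment_seq")   # per-post id counter
    r.hset(f"post:{post_id}:comment:{comment_id}",
           mapping={"user": user_id, "text": text})
    score = time.time() if ts is None else ts             # the ordering key
    r.zadd(f"post:{post_id}:comments", {comment_id: score})
    return comment_id

def get_comments(r, post_id, start=0, stop=-1):
    """Return comment hashes in score (time) order; start/stop are inclusive, as in ZRANGE."""
    ids = r.zrange(f"post:{post_id}:comments", start, stop)
    result = []
    for cid in ids:
        if isinstance(cid, bytes):                        # clients without decode_responses
            cid = cid.decode()
        result.append(r.hgetall(f"post:{post_id}:comment:{cid}"))
    return result
```

The sorted set also makes paging cheap: `get_comments(r, post_id, 0, 9)` returns the first ten comments in order, however many the post has.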
With my web app, I am using a format like this.
(appname):(postid):(comment id) - The hash for each comment
(appname):(postid):count - The latest comment id
And then I query the (appname):(postid):count key to get the amount of times I should run a loop that gets the contents of the (appname):(postid):(comment id) hash.
Example Code
// $app and $postId hold the app name and post id; comment ids assumed to run 0..count-1
$count = (int) $redis->get("$app:$postId:count");
for ($i = 0; $i < $count; $i++) {
    var_dump($redis->hGetAll("$app:$postId:$i"));
}

What to prefer in GQL; StringListProperty or ListProperty?

I am building an application with a many-to-many relationship:
An item of entity 'Picture' can be linked to any number of Galleries ('Gallery'). And of course a Gallery can hold any number of Pictures.
So, following the Google Suggestion here, I will use a List at 'Picture' which holds the foreign keys of 'Gallery'. This is the BigTable approach.
(The old-style Relational DB approach would be to have a table / entity in between 'Picture' and 'Gallery'.)
Here's my question: When storing the Key, should I go for a "StringListProperty" on 'Picture' or would a "ListProperty(db.Key)" work better?
One reason I see for a StringList would be that I could also store values other than Keys, but on the other hand that would be dirty style anyway. But I am also pretty sure that Google suggested not using more than one list on an entity because the index(es) will explode. So this would keep a backdoor open for me.
As for the ListProperty with type "Key" one point would be the automatic verification, if the value is actually a Key.
As it is very easy to convert Strings to Keys and vice versa, I don't see any reason to prefer one of the list types here.
When it comes to performance issues, I have no idea on how I could test this - but it looks like this will be the main factor in this decision.
Curious about your input. Especially if someone has tested the performance on this or would be so kind and do it.
Cheers,
//Hannes
Use a db.ListProperty(db.Key) if you're intending to store lists of keys. They will be stored in a binary representation, which is more compact than the string representation you would use in a string list.
You're right that mixing keys with other objects in a list is messy. Having multiple lists in an entity is fine, as long as you don't index more than one of them in the same custom index - that is what causes exploding indexes.
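Here is the arithmetic behind "exploding indexes". A custom index over two list properties needs one index row per combination of their values, so the entry count is the product of the list lengths. A small Python illustration with hypothetical property values follows. The datastore's actual per-entity index limits are not modeled.

```python
from itertools import product

gallery_keys = [f"gallery-{i}" for i in range(20)]  # hypothetical list property values
tags = [f"tag-{i}" for i in range(30)]              # a second hypothetical list property

# One custom index over both lists: every (gallery, tag) pair becomes an index row.
combined_rows = list(product(gallery_keys, tags))
print(len(combined_rows))               # 600

# Each list indexed on its own: roughly one row per value, so growth is linear.
print(len(gallery_keys) + len(tags))    # 50
```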
Use db.ListProperty(db.Key); this will make fetching the data easier than with strings. If the Gallery model has a property pic_list of type db.ListProperty(db.Key) containing the keys of the Picture entities (assuming Picture is the name of your entity), then Picture.get(gallery.pic_list), where gallery is a Gallery instance, will fetch all the picture entities.
