Modeling data in Redis - memory-management

I am building a system that keeps track of many counters in real time in Redis. Each counter holds the impression and conversion counts for ad keywords shown on a specific URL.
i.e. if 10 keywords are shown on a specific URL, I need to update a count for each of those keywords, for both impressions and conversions. And on each impression of a URL, a different set of 10 keywords may be shown.
i.e. the basic data model I need is something like
url =>
  k1 =>
    impression => 2
    conversion => 1
  k2 =>
    impression => 100
    conversion => 8
  ...
  k100 (max around 100)
I understand Redis doesn't have nested hashes, so I can't store a two-level hash as shown above.
What is the best way to solve this problem?
I thought of combining k1-impression and k1-conversion and making each one a single field, i.e. like
url =>
  k1-impression => 100
  k1-conversion => 3
  ... and so on
But the problem is that the lengths of 'k1', 'k2', etc. are significant (120-150 bytes), and I don't want to replicate that data, if possible, to save on memory.
How would I go about solving this problem?
Any help will be appreciated.

If your keywords are long enough that you're worried about it, you should normalize them. Make a hash of keyword -> id and a hash of id -> keyword, for encoding and decoding them. Then you can have per-url hashes of the form url => {kw_id:impressions => 1123, kw_id:conversions => 28}. This will also serve you well when you start needing to build indexes of the keywords, which you will as soon as you get a requirement to show, for example, the top 10 best-performing keywords across all urls.
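A minimal sketch of that normalization with the Ruby redis gem (the key names kw:ids, kw:names and kw:next_id are illustrative assumptions, not anything Redis prescribes):

require 'redis'

redis = Redis.new

# Look up the compact integer id for a keyword, creating one on first sight.
# Not atomic as written; a production version would wrap this in a Lua
# script or MULTI/EXEC so two clients can't allocate ids for the same keyword.
def keyword_id(redis, keyword)
  id = redis.hget('kw:ids', keyword)
  return id if id
  id = redis.incr('kw:next_id').to_s
  redis.hset('kw:ids', keyword, id)    # keyword -> id, for encoding
  redis.hset('kw:names', id, keyword)  # id -> keyword, for decoding
  id
end

# Bump a counter in the per-url hash; the 120-150 byte keyword string is
# stored once, and each url hash only repeats the short numeric id.
def track_impression(redis, url, keyword)
  redis.hincrby("url:#{url}", "#{keyword_id(redis, keyword)}:impressions", 1)
end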

Related

Laravel 5.3 and Redis (predis) - autoincrement hash and delete hash `row`

I've been flirting with Redis for a while now.
I watched this video series some time ago and it was awesome. I've been through some of the documentation, and the mention of the time complexity of each query blew me away; this is something that's rarely mentioned in web materials but is of huge importance for app building.
Anyhow, I'm trying to make my app use Redis on the consumer end, so users can fetch the data as fast as possible.
So I'm trying to save some objects to hash as:
$redis->hmset("taxi_car", array(
    "brand" => "Toyota",
    "model" => "Yaris",
    "license number" => "RO-01-PHP",
    "year of fabrication" => 2010,
    "nr_stats" => 0
));
as found here, and this works nicely.
However I can't find a way to delete the whole entry anywhere.
Did I get this hash thing wrong?
Following this example, I would like to delete the entry with a given license number. All I could find is how to delete the license number field from the object:
$redis->hdel("taxi_car", "license number");
and can't figure out how to delete the whole hash row (please do correct me with the proper word for row here).
Another problem here is that it seems this only allows me to save a single taxi_car in Redis. How do I set the UUID so I can have multiple taxi cars?
I'm going to play with this a bit, any help is welcome. Thanks!
To delete a key of any type, Hash included, call the Redis DEL command.
To have multiple keys, give them different names, e.g. taxi_car:1, taxi_car:2 etc.
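A minimal sketch of that layout, shown with the Ruby redis gem for brevity (the predis calls map one-to-one; the counter key taxi_car:next_id is an assumption):

require 'redis'

redis = Redis.new

# Allocate a fresh numeric id and store one car per namespaced key.
id = redis.incr('taxi_car:next_id')
redis.hmset("taxi_car:#{id}",
            'brand', 'Toyota',
            'model', 'Yaris',
            'license number', 'RO-01-PHP')

# DEL removes the whole hash (the entire "row"), not just one field.
redis.del("taxi_car:#{id}")

Looking a car up by its license number would then need a small secondary index, e.g. a hash mapping license numbers to ids.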

What's the 'Rails 4 Way' of finding some number of random records?

User.find(:all, :order => "RANDOM()", :limit => 10) was the way I did it in Rails 3.
User.all(:order => "RANDOM()", :limit => 10) is how I thought Rails 4 would do it, but this is still giving me a Deprecation warning:
DEPRECATION WARNING: Relation#all is deprecated. If you want to eager-load a relation, you can call #load (e.g. `Post.where(published: true).load`). If you want to get an array of records from a relation, you can call #to_a (e.g. `Post.where(published: true).to_a`).
You'll want to use the order and limit methods instead. You can get rid of the all.
For PostgreSQL and SQLite:
User.order("RANDOM()").limit(10)
Or for MySQL:
User.order("RAND()").limit(10)
As the random function could change for different databases, I would recommend using the following code:
User.offset(rand(User.count)).first
Of course, this is useful only if you're looking for only one record.
If you want to get more than one, you could do something like:
User.offset(rand(User.count) - 10).limit(10)
The - 10 is to ensure you get 10 records in case rand returns a number greater than count - 10 (you may want to clamp the offset at zero, since a negative offset can raise a database error).
Keep in mind you'll always get 10 consecutive records.
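If the consecutiveness is a problem, a minimal workaround (at the cost of one query per record, a trade-off discussed in a later answer) is to draw each record from its own random offset:

count = User.count
random_users = 10.times.map { User.offset(rand(count)).first }

Note this can return the same record more than once; deduplicate or re-draw if that matters.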
I think the best solution is really ordering randomly in the database.
But if you need to avoid a database-specific random function, you can use the pluck and shuffle approach.
For one record:
User.find(User.pluck(:id).shuffle.first)
For more than one record:
User.where(id: User.pluck(:id).sample(10))
I would suggest making this a scope as you can then chain it:
class User < ActiveRecord::Base
scope :random, -> { order(Arel::Nodes::NamedFunction.new('RANDOM', [])) }
end
User.random.limit(10)
User.active.random.limit(10)
While not the fastest solution, I like the brevity of:
User.ids.sample(10)
The .ids method yields an array of User IDs and .sample(10) picks 10 random values from this array.
I strongly recommend this gem for random records; it is specially designed for tables with lots of data rows:
https://github.com/haopingfan/quick_random_records
All the other answers perform badly with a large database, except this gem:
quick_random_records only costs 4.6ms in total.
the accepted answer User.order('RAND()').limit(10) costs 733.0ms.
the offset approach costs 245.4ms in total.
the User.all.sample(10) approach costs 573.4ms.
Note: My table only has 120,000 users. The more records you have, the larger the performance difference will be.
UPDATE:
Performance on a table with 550,000 rows:
Model.where(id: Model.pluck(:id).sample(10)) costs 1384.0ms
the gem quick_random_records costs only 6.4ms in total
For MySQL this worked for me:
User.order("RAND()").limit(10)
You could call .sample on the records, like: User.all.sample(10)
The answer of @maurimiranda, User.offset(rand(User.count)).first, is not good if we need 10 random records, because User.offset(rand(User.count) - 10).limit(10) returns a sequence of 10 records starting from a random position; they are not fully random. We would have to call that function 10 times to get 10 truly random records.
Besides that, offset is also not good if the random function returns a high value. If your query looks like offset: 10000 and limit: 20, it generates 10,020 rows and throws away the first 10,000 of them, which is very expensive. So calling offset.limit 10 times is not efficient either.
So I think that if we just want one random user, User.offset(rand(User.count)).first may be better (at least we can improve it by caching User.count).
But if we want 10 or more random users, User.order("RAND()").limit(10) should be better.
Here's a quick solution. I'm currently using it with over 1.5 million records and getting decent performance. The best solution would be to cache one or more random record sets and refresh them with a background worker at a desired interval.
Create a random_records_helper.rb file:

module RandomRecordsHelper
  # Note: this assumes ids run contiguously from 1 to User.count with no
  # gaps (e.g. from deletions), and the sample may contain duplicates.
  def random_user_ids(n)
    user_ids = []
    user_count = User.count
    n.times { user_ids << rand(1..user_count) }
    user_ids
  end
end

In the controller:

@users = User.where(id: random_user_ids(10))
This is much quicker than the .order("RANDOM()").limit(10) method - I went from a 13-second load time down to 500ms.

How to create unique ID in format xx-123 on rails

Is it possible to create a unique ID for articles in Rails?
For example, first article will get ID - aa-001,
second - aa-002
...
article #999 - aa-999,
article #1000 - ab-001 and so on?
Thanks in advance for your help!
The following method gives the next id in the sequence, given the previous one:

def next_id(id, limit = 3, separator = '-')
  if id[/[0-9]+\z/] == ?9 * limit
    # numeric part is maxed out ("999"): advance the prefix and reset to 001
    "#{id[/\A[a-z]+/i].next}#{separator}#{?0 * (limit - 1)}1"
  else
    id.next
  end
end
> next_id("aa-009")
=> "aa-010"
> next_id("aa-999")
=> "ab-001"
The limit parameter specifies the number of digits. You can use as many prefix characters as you want.
Which means you could use it like this in your application:
> Post.last.special_id
=> "bc-999"
> next_id(Post.last.special_id)
=> "bd-001"
However, I'm not sure I'd advise you to do it like this. Databases have smart mechanisms to avoid race conditions when ids are created by concurrent inserts; Postgres, for example, relies on sequences for that, which in turn do not guarantee gapless ids.
This approach has no such mechanism, which could potentially lead to race conditions. However, if that is extremely unlikely to happen, such as in a case where you are the only one writing articles, you could do it anyway. I'm not exactly sure what you want to use this for, but you might want to look into to_param.
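For reference, hooking such an identifier into URLs via to_param is small. A minimal sketch, assuming the generated code lives in a special_id column (the column name is an assumption):

class Article < ActiveRecord::Base
  # Rails calls to_param when building URLs, so article_path(article)
  # yields /articles/aa-042 instead of /articles/42.
  def to_param
    special_id
  end
end

# In the controller, look the record up by the same column:
article = Article.find_by!(special_id: params[:id])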
You may want to look into the FriendlyId gem. There’s also a Railscast on this topic which covers a manual approach as well as the usage of FriendlyId.

Ruby on Rails - generating bit.ly style identifiers

I'm trying to generate UUIDs with the same style as bit.ly urls like:
http://bit [dot] ly/aUekJP
or cloudapp ones:
http://cl [dot] ly/1hVU
which are even smaller
how can I do it?
I'm now using the UUID gem for Ruby, but I'm not sure if it's possible to limit the length and get something like this.
I am currently using this:
UUID.generate.split("-")[0] => b9386070
But I would like something even smaller, while knowing that it will be unique.
Any help would be pretty much appreciated :)
edit note: replaced dot letters with [dot] for workaround of banned short link
You are confusing two different things here. A UUID is a universally unique identifier. It has a very high probability of being unique, even if millions are being created all over the world at the same time. It is generally displayed as a 36-digit string. You cannot chop off the first 8 characters and expect the result to be unique.
Bitly, TinyURL et al. store links and generate a short code to represent each link. They do not reconstruct the URL from the code; they look it up in a data store and return the corresponding URL. These are not UUIDs.
Without knowing your application it is hard to advise on which method you should use; however, you could store whatever you are pointing at in a data store with a numeric key and then rebase the key to base 32, using the 10 digits and 22 lowercase letters, perhaps avoiding the obvious typo problems like 'o', 'i', 'l', etc.
EDIT
On further investigation, there is a Ruby base32 gem available that implements Douglas Crockford's Base32 encoding.
A 5-character Base32 string can represent over 33 million integers, and a 6-character string over a billion.
If you are working with numbers, you can use the built-in Ruby methods:
6175601989.to_s(30)
=> "8e45ttj"
and to go back:
"8e45ttj".to_i(30)
=> 6175601989
So you don't have to store anything; you can always decode an incoming short_code.
This works OK as a proof of concept, but you aren't able to avoid ambiguous characters like 1lji0o. If you are just looking to use the code to obfuscate database record IDs, this will work fine. In general, short codes are supposed to be easy to remember and to transfer from one medium to another, like reading one off someone's presentation slide or hearing it over the phone. If you need to avoid characters that are hard to read or hard to hear, you might need to switch to a process where you generate an acceptable code and store it.
I found this to be short and reliable:
def create_uuid(prefix=nil)
  time = (Time.now.to_f * 10_000_000).to_i
  jitter = rand(10_000_000)
  key = "#{jitter}#{time}".to_i.to_s(36)
  [prefix, key].compact.join('_')
end
This spits out unique keys that look like this: '3qaishe3gpp07w2m'
Reduce the 'jitter' size to reduce the key size.
Caveat:
This is not guaranteed unique (use SecureRandom.uuid for that), but it is highly reliable:
10_000_000.times.map {create_uuid}.uniq.length == 10_000_000
The only way to guarantee uniqueness is to keep a global count and increment it for each use: 0000, 0001, etc.
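A minimal sketch of that counter approach, here using Redis INCR as the shared counter (the key name short_code_counter is an assumption; any atomic counter works), combined with the base-36 trick shown above:

require 'redis'

redis = Redis.new

# INCR is atomic, so concurrent callers can never receive the same number,
# which makes the resulting codes unique by construction (though guessable).
def next_short_code(redis)
  redis.incr('short_code_counter').to_s(36)
end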

data structure to support lookup based on full key or part of key

I need to be able to look up based on the full key or part of the key.
E.g. I might store keys like 10,20,30,40; 11,12,30,40; 12,20,30,40
I want to be able to search for 10,20,30,40 or for 20,30,40.
What is the best data structure for achieving this, best in terms of lookup time?
Our programming language is Java; any pointers to open source projects would be appreciated.
Thanks in advance.
If those were the actual numbers I'd be working with, I'd use an array where a given index contains an array of all records that contain that index. If the actual numbers were larger, I'd use a hash table employed the same way.
So the structure would look like (empty indexes elided, in the case of the array implementation):
10 => ((10,20,30,40)),
11 => ((11,12,30,40)),
12 => ((11,12,30,40), (12,20,30,40)),
20 => ((10,20,30,40), (12,20,30,40)),
30 => ((10,20,30,40), (11,12,30,40), (12,20,30,40)),
40 => ((10,20,30,40), (11,12,30,40), (12,20,30,40)),
It's not clear to me whether your searches are inclusive (OR-based) or exclusive (AND-based), but either way you look up the record groups for each element of the search set; for the inclusive search you find their union, and for the exclusive search you find their intersection.
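A sketch of that inverted index, in Ruby for brevity (the same shape maps directly onto a Java HashMap<Integer, List<int[]>>):

keys = [[10, 20, 30, 40], [11, 12, 30, 40], [12, 20, 30, 40]]

# element => list of full keys containing that element
index = Hash.new { |h, k| h[k] = [] }
keys.each { |key| key.each { |element| index[element] << key } }

# Exclusive (AND-based) search: intersect the record groups.
def search_all(index, query)
  query.map { |e| index.fetch(e, []) }.reduce(:&)
end

# Inclusive (OR-based) search: union of the record groups.
def search_any(index, query)
  query.map { |e| index.fetch(e, []) }.reduce(:|)
end

search_all(index, [20, 30, 40]) # => [[10, 20, 30, 40], [12, 20, 30, 40]]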
Since you seem to care about retrieval time over other concerns (such as space), I suggest you use a hashtable and enter your items several times, once per subkey. So you'd put("10,20,30,40", mydata), then put("20,30,40", mydata), and so on (of course this would be wrapped in a method; you're not going to call put manually so many times).
Use a tree structure. Here is an open source project that might help ... written in Java :-)
http://suggesttree.sourceforge.net/
