I have a Redis datastore with data stored under alphanumeric, non-date keys. How might I get the values that have been stored longer than a certain time period?
Store the name of every key you add in a Sorted Set, with the score being the creation timestamp. To retrieve ranges, such as keys created before time x, use ZRANGEBYSCORE (or ZRANGE with the BYSCORE option in Redis 6.2+).
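A minimal pure-Python stand-in for this pattern (no Redis server required); with a real client these would be ZADD and ZRANGEBYSCORE calls, and the set name keys_by_ctime is illustrative:

```python
# member -> score dict modeling a sorted set of key names scored by creation time.
keys_by_ctime = {}

def zadd(member, score):
    """ZADD keys_by_ctime <score> <member>: record a key's creation time."""
    keys_by_ctime[member] = score

def zrangebyscore(min_score, max_score):
    """ZRANGEBYSCORE keys_by_ctime <min> <max>: members ordered by score."""
    return sorted((m for m, s in keys_by_ctime.items() if min_score <= s <= max_score),
                  key=lambda m: keys_by_ctime[m])

now = 1_000_000
zadd("abc123", now - 7_200)   # key created two hours ago
zadd("def456", now - 60)      # key created one minute ago

# Keys stored longer than one hour:
old_keys = zrangebyscore(0, now - 3_600)
```

Whenever you SET a data key, also ZADD its name here; whenever you delete one, ZREM it, so the index stays in sync.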
I'm starting work on a project that will involve a lot of work with sorted sets. I need to keep some sets sorted and perform CRUD operations as fast as possible. Is there any Tarantool functionality that lets me insert data into a sorted set, like Redis's ZADD? Or do I have to sort the data on my own (using C or Lua scripts), or are sorted selects from Tarantool fast enough? Please give me some opinions or advice.
In Tarantool, a TREE index automatically keeps your data sorted. Create a simple space with a TREE primary key on the first field. You can store any JSON data in the second, third, fourth, ... fields, or you can format the space to reflect your schema, and inserted values will have to conform to it, just like in a relational database.
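A sketch of that setup in Tarantool's Lua console (space and field names are illustrative):

```lua
-- A space whose TREE primary index keeps tuples sorted by the first field.
box.cfg{}
local s = box.schema.space.create('scores', {if_not_exists = true})
s:create_index('primary', {type = 'TREE', parts = {{1, 'unsigned'}},
                           if_not_exists = true})

s:insert{30, 'carol'}
s:insert{10, 'alice'}
s:insert{20, 'bob'}

-- select() walks the TREE index, so tuples come back in key order:
-- {10,'alice'}, {20,'bob'}, {30,'carol'} -- no explicit sort needed.
s:select{}
```

So insert/replace/delete play the role of ZADD/ZREM, and range selects over the TREE index are already sorted.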
I'd like to store the log of byte counters for 10 million LAN devices.
Each device reports its byte counter value every 15 minutes (96 samples/day), and each data sample has 500 columns. Each device is identified by its serial number, dev_sn.
At the end of the day, I will process the data (compute the total bytes per device) for all devices and store the results in HIVE data format.
The raw data would look like this (e.g., devices sn1, sn2, and sn3 report values at t1, t2, and t3):
Method 1: Use both dev_sn and timestamp as the composite row-key.
Method 2: Use dev_sn as the row-key and store each data as the version update of the existing values.
To find the total bytes:
Method 1: Scan the composite keys by the sn1 prefix, sort by time, and process the data.
Method 2: Look up sn1, pull all the versions, and process the data.
I think Method 2 is the better solution, as it creates fewer row keys, but I'm not sure it really is the better approach. Some advice would be really helpful.
This is subjective, but I always opt for a composite row key over versioning, for the following reasons:
You can store unlimited "versions" per device. With versioning, this number is limited (as set in configuration).
It's much easier to retrieve entries from specific timestamps/time ranges with an HBase command. Prefix scans are much easier to work with than the version API.
There's no reason for you to want to reduce the number of row keys - HBase is designed specifically to store huge numbers of row keys.
What if you need to delete last Tuesday's data? With versioning that's difficult; with composite keys it's a small piece of code.
As an aside, be sure to pre-split your table's regions so that the dev_sn values distribute evenly across region servers.
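The ordering property behind Method 1 can be sketched in a few lines; the key format and padding width here are assumptions for illustration, not HBase API calls:

```python
# Composite row key: dev_sn plus a fixed-width timestamp, so HBase's
# lexicographic row ordering groups each device's samples in time order
# and a prefix scan on the serial returns them already sorted.

def row_key(dev_sn: str, epoch_seconds: int) -> str:
    # Zero-pad the timestamp so string order matches numeric order.
    return f"{dev_sn}#{epoch_seconds:010d}"

keys = [
    row_key("sn2", 1_700_000_000),
    row_key("sn1", 1_700_000_900),
    row_key("sn1", 1_700_000_000),
]

# HBase stores rows sorted lexicographically by key:
ordered = sorted(keys)

# A "prefix scan" for device sn1 is then one contiguous slice of the table:
sn1_rows = [k for k in ordered if k.startswith("sn1#")]
```

Deleting a time range (e.g., last Tuesday) is then just a scan between two key bounds, which is the "small piece of code" mentioned above.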
Aerospike is blazingly fast and reliable, but expensive. The cost, for us, is based on the amount of data stored.
We'd like the ability to query records based on their upsert time. Currently, when we add or update a record, we set a bin to the current epoch time and can run scan queries on this bin.
It just occurred to me that Aerospike knows when to expire a record based on when it was upserted, and since we can query the TTL value from the record metadata via a simple UDF, it might be possible to infer the upsert time for records with a TTL. We're effectively using space to store a value that's already known.
Is it possible to access record creation or expiry time, via UDF, without explicitly storing it?
At this point, Aerospike only stores the void time along with the record (the time when the record expires). So the upsert time is unfortunately not available. Stay tuned, though, as I heard there were some plans to have some new features that may help you. (I am part of Aerospike's OPS/Support team).
void time: this tracks the life of a key in the system. It is the time at which the key should expire, and it is used by the eviction subsystem.
So the TTL is derived from the void time.
Given a record's TTL, we can only calculate the void time (now + ttl).
Based on what you have, I think you can evaluate the upsert time from the TTL only if you add the same amount of expiration to all your records, say CONSTANT_EXPIRATION_TIME.
In that case:
upsert_time = now - (CONSTANT_EXPIRATION_TIME - ttl)
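A worked example of that formula, assuming every record really is written with the same TTL; all the numbers are illustrative:

```python
CONSTANT_EXPIRATION_TIME = 86_400   # every record written with a 1-day TTL

now = 1_700_000_000                 # current epoch seconds
ttl = 82_800                        # TTL read back from the record's metadata

# The record has "used up" (CONSTANT_EXPIRATION_TIME - ttl) seconds of its
# life, so that is how long ago it was upserted:
upsert_time = now - (CONSTANT_EXPIRATION_TIME - ttl)

# Equivalently via the void time (now + ttl), the only value Aerospike stores:
void_time = now + ttl
same_answer = void_time - CONSTANT_EXPIRATION_TIME
```

Here the record was upserted 3,600 seconds (one hour) ago. If records are written with different TTLs, the TTL alone cannot disambiguate the upsert time.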
HTH
I'm keeping a list of online users in Redis, with one key corresponding to one user. Keys are set to time out in 15 minutes, so to see roughly how many users have been active in the past 15 minutes, I can do:
redisCli.keys('user:*').count
The problem is as the number of keys grows, the time it takes to fetch all the keys before counting them is increasing noticeably. Is there a way to count the keys without actually having to fetch all of them first?
There is an alternative to directly indexing keys in a Set or Sorted Set, which is to use the new SCAN command. It depends on the use case, memory / speed tradeoff, and required precision of the count.
Another alternative is that you use Redis HyperLogLogs, see PFADD and PFCOUNT.
Redis does not have an API for only counting keys with a specific pattern, so it is also not available in the ruby client.
What I can suggest is to have another data structure from which to read the number of users.
For instance, you can use Redis's Sorted Set, where you keep each user with the timestamp of its last TTL refresh as the score; you can then call ZCOUNT to get the current number of active users:
redisCli.zcount('active_users', 15.minutes.ago.to_i, Time.now.to_i)
From time to time you will need to clean up the old values by:
redisCli.zremrangebyscore 'active_users', 0, 15.minutes.ago.to_i
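The whole lifecycle of that pattern can be modeled in pure Python (no Redis server needed); the set name 'active_users' and the window are illustrative:

```python
WINDOW = 15 * 60                 # 15-minute activity window, in seconds
active_users = {}                # member -> score, modeling the sorted set

def touch(user, now):
    """ZADD active_users <now> <user>: refresh the user's last-seen score."""
    active_users[user] = now

def zcount(min_score, max_score):
    """ZCOUNT active_users <min> <max>: count members in the score range."""
    return sum(1 for s in active_users.values() if min_score <= s <= max_score)

def zremrangebyscore(min_score, max_score):
    """ZREMRANGEBYSCORE active_users <min> <max>: drop stale entries."""
    stale = [u for u, s in active_users.items() if min_score <= s <= max_score]
    for u in stale:
        del active_users[u]
    return len(stale)

now = 10_000
touch("alice", now - 2_000)      # last seen ~33 minutes ago: stale
touch("bob", now - 100)          # last seen ~2 minutes ago: active

active_count = zcount(now - WINDOW, now)     # counts only bob
removed = zremrangebyscore(0, now - WINDOW)  # evicts alice
```

ZCOUNT is O(log N), so the count stays cheap no matter how many users there are, unlike KEYS which walks the whole keyspace.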
Does the SCN itself encode a timestamp, or is it looked up from some table?
An AskTom post explains that the timestamp (to +/- 3 seconds) is stored in a RAW field in SMON_SCN_TIME. Is that where the function gets the value?
If so, when is that table purged, if ever? And what triggers that purge?
If it is purged, does that make it impossible to translate old SCNs to timestamps?
If it's impossible, then that rules out any long-term uses of that field (read: auditing).
If I put that function in a query, would joining to that table be faster?
If so, does anyone know how to convert that RAW column?
The SCN does not encode a time value. I believe it is an auto-incrementing number.
I would guess that SMON is inserting a row into SMON_SCN_TIME (or whatever table underlies it) every time it increments the SCN, including the current timestamp.
I queried for the minimum recorded timestamp in several databases; they all go back about 5 days and have a little under 1,500 rows in the table. So the retention is less than the instance lifetime.
I imagine the lower bound on how long the data is kept might be determined by the DB_FLASHBACK_RETENTION_TARGET parameter, which defaults to 1 day.
I would recommend using the function, they've probably provided it so they can change the internals at will.
I have no idea what the RAW column TIM_SCN_MAP contains, but the TIME_DP and SCN columns would appear to give you the mapping.
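The function in question is presumably Oracle's SCN_TO_TIMESTAMP, with TIMESTAMP_TO_SCN as a rough inverse; a sketch of its use, where the literal SCN is illustrative:

```sql
-- Current SCN of the database (requires access to V$DATABASE):
SELECT CURRENT_SCN FROM V$DATABASE;

-- Map an SCN to an approximate timestamp, and a timestamp back to an SCN.
-- Both raise an error (e.g. ORA-08181) once the SCN or time falls outside
-- the retained SMON_SCN_TIME mapping window -- the retention concern above.
SELECT SCN_TO_TIMESTAMP(1234567) FROM DUAL;
SELECT TIMESTAMP_TO_SCN(SYSTIMESTAMP) FROM DUAL;
```

Because the mapping ages out, for long-term auditing you would need to capture the timestamp at write time rather than rely on converting old SCNs later.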