Time-based sliding windows in RethinkDB

I am testing RethinkDB for a specific application.
I have a collection of events, each with its own timestamp, e.g.:
[{ event: "event1" , timestamp: "2016-05-28T00:01:00Z" },
{ event: "event2" , timestamp: "2016-05-28T00:02:00Z" },
{ event: "event3" , timestamp: "2016-05-28T00:03:00Z" },
]
Suppose that "now" is "2016-05-28T00:02:00Z".
Is it possible to run a sliding-window between() and changes() query on RethinkDB that works on time-based events, in order to extract the events happening in the current minute?
I know there is a between() operator, but the documentation says that r.now() is evaluated only once, when the query is issued, and not continuously over time.
Thank you.

There isn't a way to do that inside of RethinkDB. You could subscribe to all changes on the table and post-filter on the client, which would actually work pretty well since most writes will be setting the timestamp to something within the last minute.
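For illustration, a minimal sketch of that client-side post-filter, assuming the RethinkDB Python driver, the events table from the question, and timestamps stored as ISO 8601 strings as in the sample above (the one-minute window and connection details are assumptions):

from datetime import datetime, timedelta, timezone
from rethinkdb import RethinkDB  # pip install rethinkdb

r = RethinkDB()
conn = r.connect("localhost", 28015)

# Subscribe to every change on the table and keep only events whose
# timestamp falls within the last minute.
for change in r.table("events").changes().run(conn):
    doc = change["new_val"]
    if doc is None:  # deletions carry no new_val
        continue
    ts = datetime.strptime(doc["timestamp"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if datetime.now(timezone.utc) - ts <= timedelta(minutes=1):
        print(doc)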

Related

Aerospike: set expiration date for a specific field

I have an Aerospike cache consisting of a list of records whose values have a JSON-like structure.
Example value: {"name": "John", "count": 10}
I was wondering if it is possible to set an expiration time for only the count field and reset it after some time.
Aerospike doesn't support such functionality out of the box. You would have to code this yourself (hence your other post, I guess: Best way to update single field in Aerospike). You can add filters so this only happens based on the record's metadata (its last update time, accessible through Expressions) or any other logic, and it should be efficient and performant to then let a background ops query do the work.
Another approach is to add your own custom expiration timestamp to your bin data, like so:
{"name":"John", "count":10, "validTill":1672563600000000000}.
Here I am generating it as below (you can use a different future timestamp format):
$ date --date="2023-01-01 09:00:00" +%s%N
1672563600000000000
Now when you read the record, read through an expression that returns count (10 here) if your current clock is behind validTill, and 0 otherwise. This can work if the count value on read is all you care about. Also, when you update the count value in a future write, you can use the same expression logic to update both count and validTill.
If this works for you, you don't have to scan and update records using background jobs.
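As a rough illustration of the validTill idea, here is the same check done client-side with the Aerospike Python client (the namespace, set, key, and one-hour refresh window are assumptions; the answer above does this server-side with an Expression instead):

import time
import aerospike  # pip install aerospike

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
key = ("test", "users", "john")

# Read the record and treat count as expired once validTill has passed.
_, _, bins = client.get(key)
now_ns = time.time_ns()
count = bins["count"] if now_ns < bins["validTill"] else 0
print(count)

# When writing a new count, refresh validTill as well (here: one hour ahead).
client.put(key, {"count": count + 1, "validTill": now_ns + 3600 * 10**9})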

Why doesn't GA4 set the timestamp that I send with the event?

I'm building server-side events and sending them to GA4, and I need to set timestamp_micros on each event. But the GA4 interface always shows me the upload date of the event, not the one I set. I also see in the BigQuery export that my event has the upload timestamp. I even tried sending the event via the Event Builder, but it still shows the upload timestamp.
And yes, I always set the timestamp within the last 72 hours, as the documentation requires.
The body I send:
data = {
    "user_id": "123456",
    "timestamp_micros": "1636025014649000",
    "client_id": "TLei4bvWcgN0rPjwmbMrT2QaIDRy7It5bzc0xNJ14Ew=.1635349750",
    "non_personalized_ads": False,
    "events": [{
        "name": "tutorial_begin",
        "params": {}
    }]
}
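(For context, a body like this is posted to the GA4 Measurement Protocol roughly as below; the measurement_id and api_secret values are placeholders:)

import requests

resp = requests.post(
    "https://www.google-analytics.com/mp/collect",
    params={"measurement_id": "G-XXXXXXX", "api_secret": "YOUR_API_SECRET"},
    json=data,
)
print(resp.status_code)  # 2xx even when the event is silently dropped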
Does anyone know why GA4 doesn't set timestamp_micros?
Please help, it's really important.
The one thing I noted is that in the BigQuery export, in the column user_properties.value.set_timestamp_micros (rows 3 and 4 in the image), I see exactly the difference between my timestamp (the one I set) and the GA4 timestamp (upload time). But the event_timestamp column holds the upload timestamp. So we can say that GA4 sees my timestamp, but the interface shows the upload timestamp.
It works. But in my case the problem was that I was looking at the events_intraday_* table, which only shows the upload time. The next day, when the data moves to the events_* table, the event_timestamp column holds the timestamp I set.
I also see the timestamp I set in the GA4 interface.
And yes, I always set the timestamp within the last 72 hours, as the documentation requires.
I was shocked to find that, in practice, if the event timestamp's date doesn't equal the current date (in the timezone set in your GA4 settings), GA4 will not record your event.
For example, if you send an event with a timestamp_micros date of 2023-01-23 (yesterday), but the current date in your GA4 timezone is 2023-01-24 (today), the GA4 API says 200 OK but doesn't record your event.
I don't know why Google does this to us :(

Elasticsearch: record history of changing index

We're using Elasticsearch to store data related to the features that our customers use. The index, say feature-usage, is updated every time a customer activates or deactivates a feature.
Sample data:
Customer ID, Uses feature A, Uses feature B
1 , true , false
2 , true , true
3 , false , true
4 , true , false
This data reflects the "right now". There's no timestamp attached.
One of the views I can currently provide based on that is:
feature A is used by 3 customers right now.
feature B is used by 2 customers right now.
I would like to be able to show a history for this data:
feature A was used by 2 customers yesterday
feature A was used by 1 customer 2 days ago
Essentially, I want to create a graph showing the evolution of feature usages. For that, I need to store historical data, which I imagine would look something like this:
Day , customers using feature A, customers using feature B
2021-05-17, 2 , 1
2021-05-18, 3 , 1
2021-05-19, 2 , 2
On a SQL database, I would probably run a nightly cron job to generate this data. I tried playing around with elasticsearch's transforms and rollups, but I couldn't figure out a good solution.
Is there a way to transform feature-usage into the historical data as shown here, using only elasticsearch and no external code/cron jobs?
You can provide your index with an ingest pipeline that adds the current timestamp when the data is ingested, so you would have historical information.
You can define a sample pipeline as shown in the official documentation:
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "set": {
        "description": "Index the ingest timestamp as 'event.ingested'",
        "field": "event.ingested",
        "value": "{{{_ingest.timestamp}}}"
      }
    }
  ]
}
Then, you can set the default pipeline on your existing index:
PUT feature-usage/_settings
{
  "index.default_pipeline": "my-pipeline"
}
Now you should have a timestamp (event.ingested) on each document and should be able to work with date_histogram aggregations to find the answers you seek.
The idea is to first aggregate the data by date and then perform counts based on conditions, as in the sketch below.
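As an illustrative sketch against the standard _search endpoint (the field names uses_feature_a / uses_feature_b are assumptions; adjust them to your mapping):

import requests

# One bucket per ingestion day, with a per-feature count inside each bucket.
query = {
    "size": 0,
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "event.ingested", "calendar_interval": "day"},
            "aggs": {
                "feature_a": {"filter": {"term": {"uses_feature_a": True}}},
                "feature_b": {"filter": {"term": {"uses_feature_b": True}}},
            },
        }
    },
}
resp = requests.post("http://localhost:9200/feature-usage/_search", json=query).json()
for bucket in resp["aggregations"]["per_day"]["buckets"]:
    print(bucket["key_as_string"], bucket["feature_a"]["doc_count"], bucket["feature_b"]["doc_count"])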
Kind regards,
Mirko

CouchDB: get session date and query it in a view

I want to get the session date when the session opens, so that I can get all the records created after I opened my session, not before. I want something like this:
function(doc) {
  if (doc.created_at) {
    if (session.date >= doc.created_at) {
      emit(doc.created_at, doc);
    }
  }
}
I fell into this trap myself when I was a CouchDB newbie.
You need to understand first that the map function is not executed when you run the view. It is only executed the first time the view is queried after a document was last updated, and only if the stale parameter was either not used or set to update_after.
What you can do instead is use the startkey parameter when accessing the view. If you set it to the session date, only those documents created after the session date will be returned.
You do, however, have to ensure consistent formatting so that the keys sort correctly, e.g. by translating them to epoch times or to a format in the style of yyyymmdd-hhmmss, like 20140618-211259 for the time now (18th of June 2014 at 21:12:59).
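A small sketch of querying such a view over CouchDB's HTTP API with startkey (the database, design document, and view names are assumptions):

from datetime import datetime, timezone
import requests

# The key format must match what the map function emits (yyyymmdd-hhmmss here).
session_date = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")

resp = requests.get(
    "http://localhost:5984/mydb/_design/events/_view/by_created_at",
    params={"startkey": f'"{session_date}"'},  # view keys are JSON, so quote the string
)
for row in resp.json()["rows"]:
    print(row["key"], row["id"])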
Some examples of the parameters you can use are here.

Efficient way to find lots of "most recent" events in sqlite3

I've got an sqlite3 database that contains events. Each event is either the "on" or the "off" of something happening, and contains the time of the event, what the event is, and some miscellaneous data that varies by event.
I want to query to find the last event of each type. So far, this is the query I have come up with:
SELECT * from event where name='event1on' or name='event1off' ORDER BY t DESC LIMIT 1
This works, but it is slow when I have a lot of events I want to find the latest one of. I suspect this is because for each SELECT a full scan of the database must be made (several million rows), but I am at a loss to find a more efficient way to do this.
If you have SQLite 3.7.11 or later, you can use max to select other fields from the record that contains the maximum value:
SELECT *, max(t) FROM event GROUP BY name
To speed up this query, try creating one index on the name and t fields.
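A minimal sketch of both the index and the grouped query using Python's sqlite3 module (the database file name is an assumption; the table and column names follow the question):

import sqlite3

conn = sqlite3.connect("events.db")

# A composite index lets SQLite find the latest row per name without a full table scan.
conn.execute("CREATE INDEX IF NOT EXISTS idx_event_name_t ON event(name, t)")

# With SQLite >= 3.7.11, the bare max(t) makes the other selected columns
# come from the row that holds the maximum t within each name group.
for row in conn.execute("SELECT *, max(t) FROM event GROUP BY name"):
    print(row)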
