API input: begin/end dates or start time plus time window, for performance data

For an API that delivers a time range of performance data at a granularity down to 1 second, consumed primarily by live graphs (AJAX) but possibly by alerts as well, would it be better for the input to take a begin date and an end date, or a start time and a time width?
For example:
GET_DATA("beginTime": "2016-01-01T00:00:00Z", "endTime": "2016-01-01T00:05:00Z")
or
GET_DATA("dataTime":"2016-01-01T00:00:00Z","timeWidthSecs":300)
Both versions have advantages.
The first API would be good in the cases where I knew exactly what time range I wanted. An exact time range would be most pertinent if I knew the time range that a performance problem occurred.
The second API is more flexible. Since the main use case is graphing data, it would allow me to overload the arguments. For example:
GET_DATA("dataTime":"2016-01-01T00:00:00Z") /* returns all data since the given time; good for polling and refreshing a graph */
GET_DATA("timeWidthSecs":300) /* returns the most recent 300 secs; good for the initial setup of a graph */
GET_DATA("dataTime":"2016-01-01T00:00:00Z","timeWidthSecs":300) /* returns the 300 secs after the given time; good for getting a historic time range */
GET_DATA("dataTime":"2016-01-01T00:00:00Z","timeWidthSecs":-300) /* returns the 300 secs before the given time; good for getting a historic time range */
Since as a consumer I might be polling, getting all data since a given time (i.e. the last time I already have data for) would be useful. Of course there should be some other construct to stream data for a push model, but this would be useful for those who want a pull model.
Code that gives a timeWidth in seconds would also be easier to read and debug than code that gives two dates, where a human would have to do the date arithmetic.
Thoughts?
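To make the overloads above concrete, here is a minimal sketch of how a server might turn the second API's arguments into an explicit begin/end pair. The function name `resolve_window`, its parameter names, and the choice to reject a call with no arguments are assumptions for illustration, not part of any existing API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def resolve_window(
    data_time: Optional[datetime] = None,
    time_width_secs: Optional[int] = None,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Map (dataTime, timeWidthSecs) onto an explicit (begin, end) pair."""
    if now is None:
        now = datetime.now(timezone.utc)
    if data_time is None and time_width_secs is None:
        # Assumption: an empty request is an error rather than "everything".
        raise ValueError("need dataTime, timeWidthSecs, or both")
    if data_time is None:
        # Only a width: the most recent N seconds.
        if time_width_secs <= 0:
            raise ValueError("timeWidthSecs must be positive without dataTime")
        return now - timedelta(seconds=time_width_secs), now
    if time_width_secs is None:
        # Only a start time: everything since that time.
        return data_time, now
    # Both: a positive width looks forward, a negative width looks back.
    other = data_time + timedelta(seconds=time_width_secs)
    return min(data_time, other), max(data_time, other)
```

With this mapping the first API becomes a special case of the second, so a service could accept either form and normalize to one internally.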

Custom timestamp is not taken into account in blob path for Stream Analytics

Given a query that looks like this:
SELECT
EventDate,
system.Timestamp as test
INTO
[azuretableoutput]
FROM
[csvdata] TIMESTAMP BY EventDate
According to documentation, EventDate should now be used as timestamp.
However, when storing data into Blob storage with this path:
sadata/Y={datetime:yyyy}/M={datetime:MM}/D={datetime:dd}
I seem to still get ingested time. In my case, ingested time means nothing and I need to use EventDate for the path. Is this possible?
When checking the data in Visual Studio, test and EventDate should be equal. However, the results look like this:
EventDate ;Test
2020-04-03T11:13:07.3670000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:07.0460000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:07.0460000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:07.3670000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:08.1470000Z;2020-04-09T02:16:15.5390000Z
The late arrival tolerance window is set to: 99:23:59:59
The out-of-order tolerance is set to: 00:00:00:00, with the out-of-order action set to adjust.
When testing the same query in Stream Analytics on Azure I get this result:
[{"eventdate":"2020-04-03T11:13:20.1060000Z","test":"2020-04-03T11:13:20.1060000Z"},
{"eventdate":"2020-04-03T11:13:20.1060000Z","test":"2020-04-03T11:13:20.1060000Z"},
{"eventdate":"2020-04-03T11:13:20.1060000Z","test":"2020-04-03T11:13:20.1060000Z"}]
So far so good. When running the query with data on Azure it produces this path:
Y=2020/M=04/D=09
It should have produced this path:
Y=2020/M=04/D=03
Interestingly enough, when checking the data that is actually stored in Blob storage I find this:
EventDate,test
2020-04-03T11:20:39.3100000Z,2020-04-09T19:33:35.3870000Z,
System.Timestamp seems to be altered only when testing the query on sampled data; it is not actually altered when the query is running normally and receiving data.
I have tested this with late arrival setting set to 0 and 20 days. In reality I need to disable late arrival adjustment as I might get events that are years old through the pipeline.
This issue has been brought up and closed on the MicrosoftDocs GitHub.
The Microsoft folks say:
The maximum late arrival tolerance is 20 days, so if the policy is set to 99:23:59:59 (99 days), the adjustment could be causing a discrepancy in System.Timestamp.
By definition of late arrival tolerance window, for each incoming event, Azure Stream Analytics compares the event time with the arrival time; if the event time is outside of the tolerance window, you can configure the system to either drop the event or adjust the event’s time to be within the tolerance.
Consider that after watermarks are generated, the service can potentially receive events with event time lower than the watermark. You can configure the service to either drop those events, or adjust the event’s time to the watermark value.
As a part of the adjustment, the event’s System.Timestamp is set to the new value, but the event time field itself is not changed. This adjustment is the only situation where an event’s System.Timestamp can be different from the value in the event time field, and may cause unexpected results to be generated.
For more information, please see Understand time handling in Azure Stream Analytics.
Unfortunately, testing with sample data in the Azure portal doesn't take policies into account at this time.
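The adjustment described above can be pictured with a simplified model: an event whose time is older than the arrival time minus the tolerance is moved forward to that boundary, and any other event keeps its own time. This is only a sketch of the quoted description, not the service's implementation. The function name and the arrival time used in the example are assumptions.

```python
from datetime import datetime, timedelta, timezone


def adjusted_timestamp(event_time, arrival_time, late_tolerance):
    """Simplified model of the late-arrival 'adjust' action.

    Returns the value the event would carry as its timestamp: the event
    time itself, unless it is older than arrival_time - late_tolerance.
    """
    earliest_allowed = arrival_time - late_tolerance
    return max(event_time, earliest_allowed)


# Event time taken from the question; the arrival time is an assumed value.
event = datetime(2020, 4, 3, 11, 13, 7, tzinfo=timezone.utc)
arrival = datetime(2020, 4, 9, 2, 16, 15, tzinfo=timezone.utc)

within = adjusted_timestamp(event, arrival, timedelta(days=20))
outside = adjusted_timestamp(event, arrival, timedelta(0))
```

In this model a 20-day tolerance leaves the event time untouched, while a zero tolerance moves it to the arrival time, which is the only case where the two values differ.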
Potentially other helpful resources:
System.Timestamp()
TIMESTAMP BY
Event ordering policies
Time handling
Job monitoring

Iterating through Google events that are in the past

I'm implementing a view for Google Calendar events in my application using the following endpoint:
https://developers.google.com/google-apps/calendar/v3/reference/events/list
The problem I have is implementing a way to go to the previous page of events. For example: the user sees 20 events for the current date and, after pressing a button, sees the 20 preceding events.
As I can see, Google provides only:
"nextPageToken": string
which fetches the results for the next page.
I see two ways the problem could be solved:
1. Fetch the results in descending order and then traverse them the same way as we do with nextPageToken. The problem is that the docs state that only ascending order is available:
"startTime": Order by the start date/time (ascending). This is only
available when querying single events (i.e. the parameter singleEvents
is True)
2. Fetch all events for a specific time period and traverse the pages until I reach the current date or the end of the list, remembering every nextPageToken along the way, then use the remembered tokens to go backwards. The clear drawback is that an unpredictable number of pages must be traversed to reach the current date, which can dramatically affect performance. But at least it is something the Google APIs allow. Update: I tried this approach with a 5-year time span, and it sometimes takes up to 20 seconds to get the page token for the current date.
Is there a more convenient way to implement the ability to go to the previous pages?
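The second approach (remembering page tokens) could be sketched as follows. `TokenPager` and `fetch_page` are hypothetical names. `fetch_page(token)` stands for whatever call returns one page of events plus its nextPageToken, and the sketch assumes that saved tokens stay valid while they are kept.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]  # (events on the page, next page token)


class TokenPager:
    """Remembers the token that opens each visited page so we can go back."""

    def __init__(self, fetch_page: Callable[[Optional[str]], Page]):
        self._fetch = fetch_page
        self._tokens: List[Optional[str]] = [None]  # None opens the first page
        self._index = 0

    def _load(self) -> List[str]:
        items, next_token = self._fetch(self._tokens[self._index])
        # Record the next token only when we are on the newest known page.
        if next_token is not None and self._index == len(self._tokens) - 1:
            self._tokens.append(next_token)
        return items

    def current(self) -> List[str]:
        return self._load()

    def next(self) -> List[str]:
        self._load()  # make sure the following token is known
        if self._index + 1 < len(self._tokens):
            self._index += 1
        return self._load()

    def previous(self) -> List[str]:
        if self._index > 0:
            self._index -= 1
        return self._load()


# Stand-in for the real API call, with three fixed pages.
_PAGES = {None: (["e1", "e2"], "t1"), "t1": (["e3", "e4"], "t2"), "t2": (["e5"], None)}


def fake_fetch(token: Optional[str]) -> Page:
    return _PAGES[token]
```

This only avoids repeated traversal once a page has been visited; the initial walk to the current date, which is the slow part described above, is still needed.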

WinHttpWriteData completion

I'm using WinHTTP to transfer large files to a PHP-based web server and I want to display the progress and an estimated speed. After reading the docs I decided to use chunked transfer encoding. The files get transferred correctly, but there is an issue with estimating the time that I cannot solve.
I'm using a loop to send chunks with WinHttpWriteData (header+trailer+footer) and I compute the time difference between start and finish with GetTickCount. I have a fixed bandwidth of 4 Mbit/s configured on my router in order to test the correctness of my estimation.
The typical time difference for chunks of 256 KB is between 450 and 550 ms, which is correct. The problem is that once in a while (every few seconds or tens of seconds) WinHttpWriteData returns really fast, in about 4-10 ms, which is obviously not possible. The next difference is then much higher than the average 500 ms.
Why does WinHttpWriteData confirm, either synchronously or asynchronously, that it has written the data to the destination when, in reality, the data is still being transferred? Any ideas?
Oversimplified, my code looks like:
while (dataLeft)
{
    t1 = GetTickCount();
    WinHttpWriteData(hRequest, chunkHdr, chunkHdrLen, NULL);
    waitWriteConfirm();
    WinHttpWriteData(hRequest, actualData, actualDataLen, NULL);
    waitWriteConfirm();
    WinHttpWriteData(hRequest, chunkFtr, chunkFtrLen, NULL);
    waitWriteConfirm();
    t2 = GetTickCount();
    tdif = t2 - t1;
}
This is simply the nature of how sockets work in general.
Whether you call a lower-level function like send() or a higher-level function like WinHttpWriteData(), the function returns success or failure based on whether it was able to pass the data to the underlying socket layer in the kernel. The kernel queues up data for eventual transmission in the background. It does not report back when the data is actually transmitted, or whether the receiver acks the data. The kernel happily accepts new data as long as there is room in the queue, even if it will take a while to actually transmit. Otherwise, it will block the sender until room becomes available in the queue.
If you need to monitor actual transmission speed, you have to monitor the low level network activity directly, such as with a packet sniffer or driver hook. Otherwise, you can only monitor how fast you are able to pass data to the kernel (which is usually good enough for most purposes).
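If the hand-off rate to the kernel is all that can be measured, one common way to keep occasional fast returns from distorting a progress display is to average over a sliding window of several seconds instead of timing each chunk. The answer does not prescribe this; it is a rough sketch, written in Python for brevity, with a hypothetical class name and window length.

```python
from collections import deque


class ThroughputEstimator:
    """Averages bytes handed off over a sliding time window."""

    def __init__(self, window_secs=10.0):
        self.window_secs = window_secs
        self._samples = deque()  # (completion time in seconds, bytes written)

    def record(self, completed_at, num_bytes):
        """Call once per completed chunk write."""
        self._samples.append((completed_at, num_bytes))
        # Drop samples that have fallen out of the window.
        while completed_at - self._samples[0][0] > self.window_secs:
            self._samples.popleft()

    def bytes_per_sec(self):
        """Return the windowed rate, or None if there is too little data."""
        if len(self._samples) < 2:
            return None
        elapsed = self._samples[-1][0] - self._samples[0][0]
        if elapsed <= 0:
            return None
        # The oldest sample only marks the window start, so its bytes are
        # excluded from the total.
        total = sum(n for _, n in list(self._samples)[1:])
        return total / elapsed


est = ThroughputEstimator(window_secs=10.0)
for done_at in (0.5, 1.0, 1.004, 2.0):  # one write "completes" after 4 ms
    est.record(done_at, 256 * 1024)
rate = est.bytes_per_sec()
```

Here the chunk that completes after 4 ms is followed by a slow one, and the windowed rate stays near the configured link speed instead of spiking.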

Java Redis Rate limiting

I want to do rate limiting on a REST API using Redis. Could you please suggest which Redis data structure would be appropriate? I tried RedisTemplate, but with it I could not find a way to expire an element after updating a key and value.
There are multiple approaches, depending on what exactly you are trying to achieve - from general "ops per second" limiting, to fine grained limits in a lower resolution, like how many posts a specific user can make per day, etc.
One very simple and elegant approach I like is an expiring counter. The technique takes advantage of the fact that INCR does not change a key's expiration time in Redis. So if you want to allow 1000 requests per second on a resource, create a key with the value 1 (by running INCR) and expire it in one second. Then for each request, check whether the counter has reached 1000: if not, increment it; if it has, block the request. When the time window has passed, the key expires automatically and is recreated on the next request.
In terms of pseudo code, the algorithm is:
def limit(resource_key):
    current = GET(resource_key)
    if current != NULL and current >= 1000:
        return ERROR
    else:
        value = INCR(resource_key)
        if value == 1:
            # first request in this window: start the one-second expiry
            EXPIRE(resource_key, 1)
        return OK
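A runnable version of the pseudocode might look like the sketch below. It assumes a client object exposing `get`, `incr` and `expire` (method names that redis-py's client also provides). `FakeRedis` is a hypothetical in-memory stand-in so the example runs without a server, and `allow_request` is an illustrative name.

```python
import time


class FakeRedis:
    """In-memory stand-in offering get/incr/expire, for demonstration only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> [value, expires_at or None]

    def _live(self, key):
        entry = self._data.get(key)
        if entry and entry[1] is not None and self._clock() >= entry[1]:
            del self._data[key]  # key has expired
            return None
        return entry

    def get(self, key):
        entry = self._live(key)
        return None if entry is None else entry[0]

    def incr(self, key):
        entry = self._live(key)
        if entry is None:
            entry = self._data[key] = [0, None]
        entry[0] += 1  # like INCR, this leaves any existing expiry alone
        return entry[0]

    def expire(self, key, seconds):
        entry = self._live(key)
        if entry is not None:
            entry[1] = self._clock() + seconds


def allow_request(client, resource_key, limit=1000, window_secs=1):
    """Return True if the request fits in the current window."""
    current = client.get(resource_key)
    if current is not None and int(current) >= limit:
        return False
    if client.incr(resource_key) == 1:
        # First request in this window: start the expiry clock.
        client.expire(resource_key, window_secs)
    return True
```

Because the check and the increment are separate commands, concurrent callers against a real server could slip slightly past the limit; whether that matters depends on how strict the limit needs to be.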

Consistent N1QL query with the Couchbase gocb SDK

I'm currently implementing EventSourcing for my Go Actor lib.
The problem I have right now is that when an actor restarts and needs to replay all its state from the event journal, the query might return inconsistent data.
I know that I can solve this using MutationToken
But, if I do that, I would be forced to write all events in sequential order, that is, write the last event last.
That way the mutation token for the last event would be enough to get all the data consistently for the specific actor.
This is, however, very slow: writing about 10,000 events in order takes about 5 seconds on my setup.
If I instead write those 10,000 events asynchronously, using goroutines, I can write all of the data in less than one second.
But then the writes happen in nondeterministic order and I can't know which mutation token I can trust.
E.g. event 999 might be written before event 843 due to goroutine scheduling, AFAIK.
What are my options here?
Technically speaking, MutationToken and asynchronous operations are not mutually exclusive. It may be possible without a change to the client (I'm not sure), but the key is to collect all the MutationToken responses and then issue the query with the highest sequence number per vbucket across all of them.
Given a single MutationToken, you can add the others to it. I don't directly see a way to do this, but since internally it's just a map, it should be relatively straightforward, and I'm sure we (Couchbase) would take a contribution that does this. At the lowest level, it's just a map of vbucket sequences that is provided to the query at the time the query is issued.
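The merge described above, keeping the highest sequence number seen per vbucket, can be sketched independently of any SDK. The `MutationToken` type below is a hypothetical two-field model, not the gocb type, and the code is in Python only to illustrate the data handling.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class MutationToken:
    """Hypothetical model: which vbucket was written, and at what sequence."""
    vbucket_id: int
    sequence_number: int


def merge_tokens(tokens: Iterable[MutationToken]) -> Dict[int, int]:
    """Collapse many tokens into one map: vbucket id -> highest sequence."""
    highest: Dict[int, int] = {}
    for token in tokens:
        if token.sequence_number > highest.get(token.vbucket_id, -1):
            highest[token.vbucket_id] = token.sequence_number
    return highest


# Tokens may arrive in any order when writes are issued concurrently.
merged = merge_tokens([
    MutationToken(12, 843),
    MutationToken(7, 5),
    MutationToken(12, 999),
    MutationToken(7, 3),
])
```

Since only the maximum per vbucket matters, the order in which the asynchronous writes complete does not affect the merged result.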
