Iterating through Google events that are in the past - algorithm

I'm implementing view for google events in my application using the following end-point:
https://developers.google.com/google-apps/calendar/v3/reference/events/list
The problem that I have is implementing a feature to make it possible to go to the previous page of events. For example: user is having 20 events for the current date and once he presses the button, they have 20 past events.
As I can see, Google provides only:
"nextPageToken": string
That fetches the results for the next page.
The way I see the problem can be solved:
Fetch results in descending order and then traverse them the same way as we do with nextPageToken. The problem is that it is stated in the doc that only asc is available:
"startTime": Order by the start date/time (ascending). This is only
available when querying single events (i.e. the parameter singleEvents
is True)
Fetch all the events for specific time period, traverse the pages until I get to the current date or to the end of the list, memorize all the nextPageTokens. Use memorized values to be able to go backwards. The clear drawback of it is the fact that we need to go through unpredictable number of pages to get the current date. That can dramatically affect the performance. But, at least it is something that Google APIs allow. Updated: Checked that approach with 5 years time span and sometimes it takes up to 20 seconds to get the current date page token.
Is there a more convenient way to implement the ability to go to the previous pages?

Related

Properly way for obtaining the N latest videos of channel : search vs playlistItem vs activities endpoints

I'm developing for a web app that needs to retrieve the last 10 videos of a user(channel).
First approach
Was to use the search endpoint with param 'forMine' ordering by date, but then I figured that maybe that param could retrieve videos uploaded by the user in a diferent channel or whatever...
First result with channel ID and date - 1st Aproach
Second approach
Was to use the search endpoint with param 'channelId' ordering by date, but then I realized that descriptions were incomplete and most importantly there were some videos missing comparing with first aproach, even if the missing videos belonged to same channel (as showed in pics links)
First resutl with channel ID and date - 2nd Aproach
So, then I googled to find some solution and found other way.
Third approach
Was to use the playlistItem endpoint as I found in Google, and seemed ok (I supposed) because it returned same videos that first aproach and consumed less quota but this method left me with doubts as I didn't knew if the videos would be the latest or maybe they would be sorted by position in the playlist and couldn't be trusted to be the most recent
That said, what would be the correct way to get the N most recent videos from a channel, please?
Regardless of the quota consumption (the less quota the better, of course, but an accurate result is essential)
I'm so confussed with the API response...
Thank you so much!
-- EDITED: NEW APPROACH AND FURTHER INVESTIGATIONS --
Fourth approach
Was to use activities endpoint as was stated by #stvar in his answer. I found that this way, as on second approach, there were some videos missing comparing with first and third approaches, and it was required to retrieve everything without 'maxResults' param because there were activities not related to video upload, making mandatory to perform pagination and a self filtering by type 'upload' after retrieving response in order to get N videos (or be confident in getting N videos uploaded in first 50 retrieved items)
Self Investigations
Further investigations and tests bringed me response to the issue of 'missing videos' of some approaches.
The status of that missing videos were 'unlisted', so they were videos uploaded to the channel, property of the channel, uploaded by user of the channel... but not retrieved by some methods that seemed to retrieve only 'public' videos not 'unlisted' (hidden) nor 'private'.
NOTE: I did my test with Google API PHP Client Library, this behaviour seems not to be on 'Try this API' as it returns only 'public' items, so be careful on trust in 'Try this API' results as it seems to use some hidden filters or something...
Also I tested the channel upload playlist to verify that the order can not be changed and has a LIFO sorting
CONCLUSIONS
At this point, my self conclusion is that there is not a proper way to solve this but quite ways to do it in depend of requisites of status and amount of free quota
Search endpoint seems to work all right, if you have a good amount of unused quota (100 each call) that is the direct way and easiest one as you can sort it and filtering as needed by a bunch of params, taking care to use 'forMine' param if you need every uploaded video or 'channelId' if you need only 'listed' and 'public' ones.
PlaylistItems endpoint is a proper way if you are in a quota crisis (1 each call) as the result is sorted by recent date, taking care to do pagination and post filtering if only 'public' videos are needed till retrieve the desired amount of video ids, otherwhise you can go all the way easy.
Note that the date used to order is the upload date not the post date
(thanks to #stvar for bringing this to the attention)
Activity endpoint, also for quota crisis (1 each call), while it could be more accurate than the others if you only want public videos (it is ordered by recent 'first publish date' so not accurate 100% neither ), is for me the one that gives more work, as it retrieves activities other than 'video upload', so you can not skip pagination and post filtering to retrieve the desired amount of video ids, besides that way you only have access, as said before, to public videos (which is fine if that meets your needs).
Anyway, if you need more than 50 ids, you need to make pagination whatever the aproach you use.
Hope this help someone else and thanks so much to contributors
PS: People in charge of the YouTube API, perhaps a filter by state among some others would be interesting, Thanks!!!
You may employ the Activities.list API endpoint, queried with:
mine=true,
part=snippet,contentDetails,
fields=items(snippet(type),contentDetails(upload)), and
maxResults=50.
For to obtain your desired N uploads, you have to implement pagination. That is that you have to successively call the endpoint until you reach N result set items that have snippet.type equal with upload.
Note that you may well use channelId=CHANNEL_ID instead of mine=true, if you're interested about the most recent uploads of a channel identified by its ID CHANNEL_ID rather than your own channel.
According to the docs, you'll get from this endpoint a result set made of Activities resource items that will contain the following info:
contentDetails.upload (object)
The upload object contains information about the uploaded video. This property is only present if the snippet.type is upload.
contentDetails.upload.videoId (string)
The ID that YouTube uses to uniquely identify the uploaded video.
The official docs state that each call to Activities.list endpoint has a quota cost of one unit.
Futhermore, upon obtaining a set of video IDs, you may invoke the Videos.list endpoint with a properly assigned id parameter, for to obtain from the endpoint all the details you need for each and every video of your interest.
Note that if you have a set of video IDs of cardinality K, since the parameter id of Videos.list endpoint can be specified as a comma-separated list of video IDs, then you may reduce the number of calls to Videos.list endpoint from K to floor(K / 50) + (K % 50 ? 1 : 0) by appropriately using the feature of id just mentioned.
According to the official docs, each call to Videos.list endpoint has also a quota cost of one unit.
Clarifications upon OP's request:
Question no. 1: The Activities.list endpoint produces only the activities specified by the Activities resource. The type property enumerates them all:
snippet.type (string)
The type of activity that the resource describes.
Valid values for this property are: channelItem, comment (not currently returned), favorite, like, playlistItem, promotedItem, recommendation, social, subscription, upload, bulletin (deprecated).
Indeed your remark is correct. For example, when getting the most recent 10 uploads, is possible that you'll have to scan a number of pages P of result sets, with P >= 2, until you reached collecting the desired 10 upload items. (Actual tests have confirmed me this to be factual.)
Question no. 2: The Activities.list endpoint produces items that are sorted by publishedAt; just replace the above fields with:
fields=items(snippet(type,publishedAt),contentDetails(upload))
and see that for yourself.
I could make here the following argument justifying the necessity that the items resulted upon the invocation of Activities.list endpoint be ordered chronologically by publishedAt (the newest first). One may note that, indeed, the official docs quoted above do not specify explicitly that ordering condition I just mentioned; but bare with me for a while:
My argument is of a pragmatic kind: if the result set of Activities.list is not ordered as mentioned, then this endpoint becomes useless. This is so, since, in this case, for one to obtain the most recent upload activity would have to fetch locally all the upload activities, for to then scan that result set for the most recent one. Being compelled to fetch all upload activities only for to obtain the newest one is pragmatically a nonsense. Therefore, by way of contradiction, the result set has to be ordered chronologically by publishedAt with the newest being the first.
Question no. 3: Indeed Search.list is not precise -- it has a fuzzy behavior. I can confirm this based on my own experience; but, unfortunately, I cannot point you to official docs (from Google or YouTube) that acknowledge and explain this behavior. As unfortunate as it is, for its users Search.list is completely opaque.
On the other hand, Activities.list is precise -- it has to be like that; if it wouldn't be precise, then that's a serious bug in the implementation (in my educated opinion).

Mailchimp members activity

I've got some kind of script. Goal is:
Get Mailchimp Lists
For each list get members
For each member get activity
Store it
Does anyone know - if there any way to not use one API call for each member to get his activity?
I've got around 28 000 members.
28 000 API calls - seems as bad as it can be.
I've tried to get Lists Activity, but no way, it is always empty. So I really have to get exactly members activity.
I'm currently attempting to do something very similar and there is a workaround, although I am not sure how feasible it is. Basically, you can do it through reports, email activity:
http://developer.mailchimp.com/documentation/mailchimp/reference/reports/email-activity/
The challenge here will be that you will try to pull 28.000+ records at a time, therefore it will take a long time. From my brief calculations it can take up to 1 minute per 1000 records (you will need to loop through 1000 records at a time, otherwise it will most likely time out).
The larger problem is maintaining this 'database', if you have activity constantly happening (i.e. opens/clicks/bounces) then you will need to pull the whole campaign activity again and update wherever you store it. I've been trying to find a workaround with no success. You could use the 'since=2017-10-07T00:00:00+00:00' parameter, however it still returns a blank list when there is no activity unfortunately. If only 1000 members are actually active, it will return 27.000 rows of no activity. It would be great if there would be another parameter we could potentially apply to return only emails where there was an action.
Please let me know if you find a better solution.
P.S. - it might be worth reaching out to mailchimp support for this
Update - you can use the Mailchimp Export api: https://developer.mailchimp.com/documentation/mailchimp/guides/how-to-use-the-export-api/ and extract the email activity. I had huge issues unpacking it, please follow the links below: Decode text response from API in Python 3.6 and Separate pd DataFrame Rows that are dictionaries into columns . Let me know if you have any other questions.

Is Elasticsearch Scroll API not recommended for real-time pagination?

I understand that Elasticsearch Scroll API is not intended for real-time user requests. But would it be bad if it's used for that? I have a requirement to implement paginated results (to be displayed on web frontend) and from/size approach is returning duplicates across pages. Presumably because I have a sharded setup (with no replicas at all). I've tried setting preferencebut it did not help.
Scroll API does not seem to have this issue, I'm wondering if it's really bad to use it for my use case?
Thanks
Results from a scrolling search reflect the state of the index at the time of the initial search request. Subsequent indexing or document changes only affect later search and scroll requests. it means that your pagination is based on the time you requested the search result, so you don't see new document or will see deleted in your result. Also Scroll API is not recommended by ES for deep pagination any more(ES 7.x). you can find more info on ElasticSearch documentation page: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/scroll-api.html
On the question 'why you get duplicate results', I think this is caused by intermediate indexing. When doing independent search calls with pagination, each call runs independently (still using some caching). So if you ask the first 100, you get the first 100 at that time. When then asking x seconds later the 'next' 100, you get 100 - 199 at x seconds later. If meanwhile a new document got indexed which logically fits in the first 100, it will push the rest further. This way, your result 100 (first in the second results) might have been #99 in the first call. When then gluing them together in the UI, you see the same result twice.
Both scroll and search-after are designed to refer ES back to the original call, indicating it that you want to continue counting from that moment onwards.
I have not found a good explanation though why search_after is better than scroll.
I assume that scroll is optimized for the use case where you will go through the entire set anyway (so the pagination is to avoid overloading the client and the pipe between ES and client with too big chunks at once). While search_after is optimized for the use case where you are likely to only go a few pages far/deep (it is known that human users tend to stay on the first page with a quickly lowering frequency of going much further, because you would force your eyes to find something into overwhelming amounts of information). Implementing good filters in the user interface is the much better approach.

What is the difference between startDate and a filter on "published" in the Okta Events API?

I've written a .NET app using the Okta.Core.Client 0.2.9 SDK to pull events from our organization's syslog for import into another system. We've got it running every 5 minutes, pulling events published since the last event received in the previous run.
We're seeing delays on some events showing up. If I do a manual run at the top of the hour for the previous hour's data, it'll include more rows than the 5-minute runs. While trying to figure out why I remembered the startDate param, mutually-exclusive with the filter one I've been using.
The docs don't mention much about it - just that it "Specifies the timestamp to list events after". Does it work the same as published gt "some-date"? We're capturing data for chunks of time, so I needed to include a "less than" filter and ignored startDate. But the delayed events have me looking for a workaround.
Are you facing delayed results using startDate or filter?
Yes published gt "some-date" and startDate work the same way. The following two API calls.
/api/v1/events?limit=100&startDate=2016-07-06T00:00:00.000Z
and
/api/v1/events?limit=100&filter=published gt "2016-07-06T00:00:00.000Z"
returns the same result. Since, they are mutually exclusive filter might come in handy in creating more specific queries including the other query parameters in your query using filter.

GWT - Populate Grid asynchronously

we've got a GWT application with a simple search mask displaying the results as a grid.
Server side processing time is ok as well as network latency.
Client rendering time is ok even on low spec hardware with internet explorer 6 as long as the number of results is not too high (max 100 rows in the grid).
We have implemented a navigation scheme allowing the user to scroll up/down the grid. That's fast enough also.
Has anybody an idea if it is possible to display the first 100 results immediately and pull the rest in the background? The GWT architecture allows this. However I'm interested in possible pitfalls e.g. what happens if the user starts another query while the browser is still fetching previous results etc.
Thanks!
Holger
LazyPanel and this blog post might be a good starting point for you :)
The GWT Incubator has also many interesting (albeit not always complete/perfect/stable) tables and other pagination solutions - like PagingScrollTable.
Assuming your plan is to send the first 100, and then bring the rest, you can use bulks for the rest of the results. then, if a user initiates another search, you just wait for the end of the bulk ( ie, check between bulk retrivals if you have a pending query ).
Another way you can go is assign identifiers to the user searches. this will make the problem of mixed results non-existant, and will also help you with results history for multiple searches.
we found that users love the live grid look & feel, which solves most of those problems, but that might not be optional always.

Resources