Why is Google Drive's files:list API so inconsistent? - google-api

I've been working a bunch with Google Drive's files.list API to get the metadata (mimetypes and sharing permissions) across all 331,000 accounts in our Workspace instance.
But the API is inconsistent.
For some users, it returns over 100 items per second; for others, it's under 17 items per second. This is true even though the users have similar file/folder counts and similar numbers of shared items and collaborators.
For some users, it lists millions of files with no issues; for others, it throws 500 errors over and over again. The 500 errors tend to come from specific accounts that fail repeatedly, even across multiple runs on multiple days.
This is all occurring even with huge exponential backoffs using Python's retrying module:
@retry(wait_exponential_multiplier=5000, wait_exponential_max=100000, stop_max_attempt_number=30)
For longer runs, the error rate from the API is extremely low for the first 12 hours or so, then later in the run it spikes to 5% or more.
Is this normal for Drive's files.list API and just something I will need to code around, or is there something about it that I'm not understanding?
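For comparison, here is a minimal stdlib-only sketch of the backoff policy described by the decorator above: the wait doubles on each attempt from a 5-second multiplier, is capped at 100 seconds, and stops after 30 attempts. The names backoff_delays, call_with_backoff and the is_retryable callback are illustrative assumptions; they are not part of the retrying module or the Drive API, and the exact schedule retrying uses may differ slightly.

```python
import time


def backoff_delays(multiplier_ms=5000, max_ms=100000, attempts=30):
    """Yield the wait in ms before each retry: multiplier * 2**n, capped at max_ms."""
    for attempt in range(1, attempts):
        yield min(multiplier_ms * 2 ** attempt, max_ms)


def call_with_backoff(func, is_retryable, sleep=time.sleep, **policy):
    """Call func(); after a retryable error, wait and try again until attempts run out."""
    delays = backoff_delays(**policy)
    while True:
        try:
            return func()
        except Exception as exc:
            delay_ms = next(delays, None)
            # Give up on non-retryable errors or once the attempt budget is spent.
            if delay_ms is None or not is_retryable(exc):
                raise
            sleep(delay_ms / 1000.0)
```

With these defaults the waits reach the 100-second cap by the fifth retry, so most of a 30-attempt budget is spent at the cap. For accounts that fail on every run, that is a long time to spend before giving up.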

Related

What is "sf_max_daily_api_calls"?

Does someone know what "sf_max_daily_api_calls" parameter in Heroku mappings does? I do not want to assume it is a daily limit for write operations per object and I cannot find an explanation.
I tried to open a ticket with Heroku, but their support ticket form has a required "Which application?" drop-down, and none of the support categories offer anything to choose there; the only option is "Please choose..."
I tried to find any reference to this field and can't - I can only see it used in Heroku's Quick Start guide, but without an explanation. I have a very busy object I'm working on, read/write, and want to understand any limitations I need to account for.
Salesforce orgs have a rolling 24h limit on daily API calls. Generally the limit is very generous in test orgs (sandboxes), 5M calls, because you can make stupid mistakes there. In production it's lower. A bit counterintuitive, but it protects their resources and forces you to write optimised code/integrations...
You can see your limit in Setup -> Company information. There's a formula in documentation, roughly speaking you gain more of that limit with every user license you purchased (more for "real" internal users, less for community users), same as with data storage limits.
Also every API call is supposed to return current usage (in special tag for SOAP API, in a header in REST API) so I'm not sure why you'd have to hardcode anything...
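As a rough sketch of reading that usage from a REST response: the value is assumed here to arrive in a Sforce-Limit-Info header shaped like "api-usage=18/5000". The function names and the 30% default are illustrative assumptions (30% is just the example threshold used later in this answer), so check the header against your own responses before relying on it.

```python
import re

# Org-wide usage only; the lookbehind skips a "per-app-api-usage=" entry.
USAGE_RE = re.compile(r"(?<![\w-])api-usage=(\d+)/(\d+)")


def parse_api_usage(header_value):
    """Return (used, limit) from a value like 'api-usage=18/5000', else None."""
    match = USAGE_RE.search(header_value or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def over_threshold(header_value, threshold=0.30):
    """True when used/limit has reached the warning threshold."""
    usage = parse_api_usage(header_value)
    if usage is None:
        return False
    used, limit = usage
    return limit > 0 and used / limit >= threshold
```

Checking this on every response means nothing has to be hardcoded: the integration can slow down or alert as soon as the ratio crosses whatever threshold you pick.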
If you write your operations right the limit can be very generous. No idea how that Heroku Connect works. Ideally you'd spot some "bulk api 2.0" in the documentation or try to find synchronous vs async in there.
A normal old-school synchronous update via the SOAP API lets you process 200 records at a time, using 1 API call. The REST bulk API accepts csv/json/xml of up to 10K records and processes them asynchronously; you poll for an "is it done yet" result... So starting a job, uploading files, committing the job and then only checking, say, once a minute can easily be 4 API calls, and you can process millions of records before hitting the limit.
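To put numbers on that comparison, here is a back-of-the-envelope sketch using the figures from this answer (200 records per synchronous call, 10K records per bulk job, about 4 calls per job). The per-job call model and the function names are simplifying assumptions, not Salesforce's accounting rules.

```python
import math


def sync_calls(records, batch_size=200):
    """API calls for synchronous updates at batch_size records per call."""
    return math.ceil(records / batch_size)


def bulk_calls(records, job_size=10_000, fixed_calls_per_job=3, polls_per_job=1):
    """API calls for bulk jobs: start + upload + commit, plus status polls."""
    jobs = math.ceil(records / job_size)
    return jobs * (fixed_calls_per_job + polls_per_job)


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        print(n, "records:", sync_calls(n), "sync calls vs", bulk_calls(n), "bulk calls")
```

Under these assumptions the bulk route uses roughly a tenth of the calls, and the gap widens if each job is polled less often than it is started.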
When all else fails, you exhausted your options, can't optimise it anymore, can't purchase more user licenses... I think they sell "packets" of more API calls limit, contact your account representative. But there are lots of things you can try before that, not the least of them being setting up a warning when you hit say 30% threshold.

Fetch third party data in a periodic interval

I've an application with 10M users. The application has access to the user's Google Health data. I want to periodically read/refresh users' data using Google APIs.
The challenge that I'm facing is the memory-intensive task. Since Google does not provide any callback for new data, I'll be doing background sync (every 30 mins). All users would be picked and added to a queue, which would then be picked sequentially (depending upon the number of worker nodes).
Now for 10M users being refreshed every 30 mins, I need a lot of worker nodes.
Each user request takes around 1 sec including network calls.
In 30 mins, one node can process 1800 users (30 x 60 seconds at 1 sec per user).
To process 10M users, I need 10M/1800 = roughly 5.6K nodes.
Quite expensive. Both monetary and operationally.
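The estimate above can be written as a small calculation. The concurrency_per_node parameter is a hypothetical addition that is not in the setup described here; it only shows how the node count would change if one node could keep several requests in flight at once, since most of the 1 second is network wait.

```python
import math


def nodes_needed(users, seconds_per_user, interval_seconds, concurrency_per_node=1):
    """Nodes required to refresh every user once per interval."""
    users_per_node = (interval_seconds / seconds_per_user) * concurrency_per_node
    return math.ceil(users / users_per_node)


if __name__ == "__main__":
    # Sequential processing, as described: one request at a time per node.
    print(nodes_needed(10_000_000, 1, 30 * 60))
    # Hypothetical: 50 requests in flight per node.
    print(nodes_needed(10_000_000, 1, 30 * 60, concurrency_per_node=50))
```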
Then I thought of using lambdas. However, a lambda requires a NAT with an internet gateway to access the public internet. Relatively, it's very cheap.
Want to understand if there's any other possible solution wrt the scale?
Without knowing more about your architecture and the google APIs it is difficult to make a recommendation.
Firstly, I would see if Google offers bulk export functionality, then batch up the user requests. So instead of making 1 request per user, you could make, say, 1 request for 100k users. This would reduce the overhead associated with connecting and with processing/parsing the message metadata.
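A minimal sketch of the batching side of this, assuming a bulk endpoint exists at all (which is not confirmed here). The batched helper is illustrative, and whatever call sends each batch is left out because it depends entirely on what the API offers.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    user_ids = range(1_000_000)  # stand-in for real user ids
    batches = sum(1 for _ in batched(user_ids, 100_000))
    print(batches, "bulk requests instead of 1,000,000 single requests")
```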
Secondly, I'd look to see if I could reduce the processing time. For example, an interpreted language like Python is in a lot of cases much slower than a compiled language like C# or Go. Or maybe a library or algorithm can be replaced with something more optimal.
Without more details of your specific setup, it's hard to offer more specific advice.

JMeter and page views

I'm trying to use data from google analytics for an existing website to load test a new website. In our busiest month over an hour we had 8361 page requests. So should I get a list of all the urls for these page requests and feed these to jMeter, would that be a sensible approach? I'm hoping to compare the page response times against the existing website.
If you need to do this very quickly, say you have less than an hour for scripting, then you can do it this way to check that there are no major differences between the 2 instances.
If you would like to go deeper:
8361 requests per hour == 2.3 requests per second so it doesn't make any sense to replicate this load pattern as I'm more than sure that your application will survive such an enormous load.
Performance testing is not only about hitting URLs from list and measuring response times, normally the main questions which need to be answered are:
how many concurrent users my application can support providing acceptable response times (at this point you may be also interested in requests/second)
what happens when the load exceeds the threshold, what types of errors start occurring and what is the impact.
does application recover when the load gets back to normal
what is the bottleneck (i.e. lack of RAM, slow DB queries, low network bandwidth on server/router, whatever)
So the options are:
If you need a "quick and dirty" solution, you can use the list of URLs from Google Analytics with e.g. CSV Data Set Config or Access Log Sampler, or parse your application logs to replay production traffic with JMeter.
A better approach would be checking Google Analytics to identify which groups of users you have and their behavioral patterns, e.g. X % of unauthenticated users are browsing the site, Y % of authenticated users are searching, Z % of users are doing checkout, etc. After that you need to properly simulate all these groups using separate JMeter Thread Groups, keeping in mind cookies, headers, cache, think times, etc. Once you have this form of test, gradually and proportionally increase the number of virtual users and monitor how response time grows with the number of virtual users until you hit some form of bottleneck.
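As an illustration of "gradually and proportionally", here is a small sketch that splits a total number of virtual users across groups and produces the per-step counts you might type into each Thread Group. The group names, percentages and step sizes are made-up assumptions, and shares are expected to be whole-number percentages; nothing here talks to JMeter itself.

```python
def split_users(total_users, shares):
    """Split total_users across groups in proportion to integer shares."""
    total_share = sum(shares.values())
    counts = {name: total_users * share // total_share
              for name, share in shares.items()}
    # Hand out users lost to rounding down, largest share first.
    leftover = total_users - sum(counts.values())
    for name in sorted(shares, key=shares.get, reverse=True)[:leftover]:
        counts[name] += 1
    return counts


def ramp_steps(start, stop, step):
    """Total virtual users for each stage of the ramp-up, inclusive of stop."""
    return list(range(start, stop + 1, step))


if __name__ == "__main__":
    shares = {"browse": 70, "search": 25, "checkout": 5}  # hypothetical mix
    for total in ramp_steps(50, 250, 50):
        print(total, split_users(total, shares))
```

Keeping the mix constant while only the total grows makes it easier to attribute a jump in response time to load rather than to a change in what the users are doing.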
The "sensible approach" would be to know the profile, the pattern of your load.
For that, it's excellent that you already have this data.
Yes, you can feed it in as is, but that would be the quick & dirty approach. Getting the data analysed, distilling patterns out of it and applying them to your test plan seems smarter.

Parse.com how to investigate excessive amount of requests

I'm developing a basic messaging system on Parse.com at the moment, and I have noticed in the Events Analytics screen that I'm hitting 30,000+ requests per day. This is a shock considering I'm the only person using the system at the moment. Obviously with a few users I would blow my API request limit straight away.
I'm pretty experienced with Parse.com these days, so I'm lean with queries and I'm alert to not putting finds, saves, retrieves, etc in for loops. I also understand that saveAll() on an array of ParseObjects doesn't always limit the request count to 1 (depending on relationships inside that object).
So how does one track down where the excessive calls are coming from?
I see the above Analytics > Performance > Served Requests data, but how do I drill down to see if cloud code or iOS is the culprit?
Current solution is to effectively unit test each block of Parse code and look at the results in above screen.
For the benefit of others who may happen upon this thread with the same questions, I found some techniques to hunt down where excessive requests are coming from.
1) Parse's documentation on the APIs themselves is really good, but there isn't a lot of information / guides for the admin interfaces. Under Analytics -> Explorer -> Make a table there is a capability to download all the requests for a specific day (to import into a spreadsheet). The data isn't very detailed though, and the dates are epoch timestamps, so it is hard to follow. At least you can see [Request Type, Class, Installation ID], e.g. ["find", "MyParseClass", "Cloud Code"].
2) My other technique was to add custom Analytic events to the code. So in Cloud Code for example, I added the following line to each beforeSave and afterSave event:
Parse.Analytics.track('MyClass_beforeSave', null);
3) Obviously, Parse logs these calls in the Logs window, but given you can only see the most recent transactions and can't clear them, I found it mostly unhelpful in tracking down the excessive calls.
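For the downloaded table from technique 1, a short script can make the epoch timestamps readable and count requests per hour by type, class and installation. This is only a sketch: the column names (timestamp, request_type, class, installation_id) and the assumption that timestamps are in seconds are guesses, so adjust them to whatever the export actually contains.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def summarize_requests(csv_text):
    """Count rows per (hour, request type, class, installation id)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed: epoch seconds; bucketed to the hour in UTC.
        when = datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc)
        hour = when.strftime("%Y-%m-%d %H:00")
        key = (hour, row["request_type"], row["class"], row["installation_id"])
        counts[key] += 1
    return counts


if __name__ == "__main__":
    with open("requests.csv", newline="") as handle:  # hypothetical file name
        for key, count in summarize_requests(handle.read()).most_common(10):
            print(count, key)
```

Sorting by count puts the noisiest combinations first, which is usually enough to tell whether Cloud Code or a client installation is responsible.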

google places api requests - OVER_QUERY_LIMIT before actually over the limit

I have developed a google places application to get info about places. I have verified my identity with google and as per the limits, I should be allowed up to 100 000 requests per day. However, after under 300 requests (different numbers go through each day), I get the message back: OVER_QUERY_LIMIT. Any similar experiences or ideas how to enable the requests to go through?
Thank you.
D Lax
You can track your requests at
https://code.google.com/apis/console/?noredirect#:stats
I ran into this issue as well. I was able to find 3 throttle limits for the places api, given by google.
10 api calls per every 1 second
100 api calls per every 100 seconds
50,000 api calls per 1 day
If I were to go over any of these limits, I would receive the OVER_QUERY_LIMIT error and it would return no results for that given address.
I found a way to have my program sleep for 11 seconds after calling the places api with a dataset of 10 addresses. Then the program would call the places api with a new dataset of 10 addresses. This solution gets around the 10 calls/second and the 100 calls/100 seconds throttle limits. However, I did run into the OVER_QUERY_LIMIT error once I tried my 25th dataset of 10 addresses (after 240 api calls). So it is clear that there are other throttles not published to help protect the google maps platform.
But, I did see that the limits mentioned above may be changed if you get in contact with the google api help team and sort it out with them.
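Instead of a fixed 11-second sleep, the two short-term limits listed above can be enforced with a sliding-window limiter that waits only as long as needed. This is a sketch under the assumption that those published numbers apply to your key; the class name is made up, and it cannot protect against the unpublished throttles mentioned above.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until a call fits within every (max_calls, window_seconds) limit."""

    def __init__(self, limits, clock=time.monotonic, sleep=time.sleep):
        self.limits = limits
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of past calls, oldest first

    def acquire(self):
        longest = max(window for _, window in self.limits)
        while True:
            now = self.clock()
            # Forget calls older than the longest window.
            while self.calls and now - self.calls[0] >= longest:
                self.calls.popleft()
            wait = 0.0
            for max_calls, window in self.limits:
                recent = [t for t in self.calls if now - t < window]
                if len(recent) >= max_calls:
                    # Wait until enough old calls leave this window.
                    wait = max(wait, recent[-max_calls] + window - now)
            if wait <= 0:
                self.calls.append(now)
                return
            self.sleep(wait)


if __name__ == "__main__":
    limiter = SlidingWindowLimiter([(10, 1.0), (100, 100.0)])
    for address in ["example address"] * 3:  # placeholder data
        limiter.acquire()
        # call the places api for `address` here
```

The daily limit is not handled here; a simple counter that is reset once per day would cover it.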
