Parse.com how to investigate excessive amount of requests - parse-platform

I'm developing a basic messaging system on Parse.com at the moment, and I have noticed in the Events Analytics screen that I'm hitting 30,000+ requests per day. This is a shock considering I'm the only person using the system at the moment. Obviously, with even a few users I would blow my API request limit straight away.
I'm pretty experienced with Parse.com these days, so I'm lean with queries and I'm alert to not putting finds, saves, retrieves, etc in for loops. I also understand that saveAll() on an array of ParseObjects doesn't always limit the request count to 1 (depending on relationships inside that object).
So how does one track down where the excessive calls are coming from?
I see the Analytics > Performance > Served Requests data above, but how do I drill down to see whether Cloud Code or iOS is the culprit?
My current solution is effectively to unit test each block of Parse code and look at the results in the screen above.

For the benefit of others who may happen upon this thread with the same questions, I found some techniques to hunt down where excessive requests are coming from.
1) Parse's documentation on the APIs themselves is really good, but there isn't much information or many guides for the admin interfaces. Under Analytics -> Explorer -> Make a table, there is a capability to download all the requests for a specific day (to import into a spreadsheet). The data isn't very detailed, though, and the dates are epoch timestamps, so they are hard to follow. At least you can see [Request Type, Class, Installation ID], e.g. ["find", "MyParseClass", "Cloud Code"].
2) My other technique was to add custom Analytic events to the code. So in Cloud Code for example, I added the following line to each beforeSave and afterSave event:
Parse.Analytics.track('MyClass_beforeSave', null);
3) Obviously, Parse logs these calls in the Logs window, but given that you can only see the most recent transactions and can't clear them, I found it mostly unhelpful in tracking down the excessive calls.
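For technique 1, the exported table can be summarized with a short script instead of a spreadsheet. This is a minimal sketch, assuming each exported row has already been parsed into an object with the hypothetical fields `timestamp` (epoch seconds), `requestType`, `className` and `installationId`. The real export's column names and timestamp unit may differ.

```javascript
// Hypothetical row shape: { timestamp, requestType, className, installationId }.
// Assumes timestamps are epoch seconds; drop the * 1000 if they are milliseconds.
function epochToIso(epochSeconds) {
  return new Date(epochSeconds * 1000).toISOString();
}

// Count requests per (type, class, source) and return the busiest combinations first.
function summarizeRequests(rows) {
  const counts = new Map();
  for (const row of rows) {
    const key = [row.requestType, row.className, row.installationId].join(" | ");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample data, only to show the shape of the result.
const sampleRows = [
  { timestamp: 1400000000, requestType: "find", className: "MyParseClass", installationId: "Cloud Code" },
  { timestamp: 1400000005, requestType: "find", className: "MyParseClass", installationId: "Cloud Code" },
  { timestamp: 1400000009, requestType: "save", className: "MyParseClass", installationId: "abc-123" },
];

for (const [key, count] of summarizeRequests(sampleRows)) {
  console.log(count, key);
}
console.log(epochToIso(sampleRows[0].timestamp));
```

Sorting by count puts the noisiest request type, class and source at the top, which is usually enough to tell whether Cloud Code or a client is responsible.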

investments/transactions/get endpoint - how long to return data?

I've been testing Plaid's investments transactions endpoint (investments/transactions/get) in development.
I'm encountering issues with highly variable delays for data to be returned (following the product initialization with Link). Plaid states that it takes 1–2 minutes to return investment transaction data, but I've found that in practice, it can be up to several hours before the data is returned.
Anyone else using this endpoint and getting data returned within 1–2 minutes, or is it generally a longer wait?
If it is a longer wait, do you simply wait for the DEFAULT_UPDATE webhook before you retrieve the data?
So far, my experience with their investments/transactions/get has been problematic (missing transactions, product doesn't work as described in their docs, limited sandbox dataset, etc.) so I'm very interested in hearing from anyone with more experience with this endpoint.
Do you find this endpoint generally reliable, and the data provided to be usable, or have you had issues? I've not seen any issues with investments/holdings/get, so I'm hoping that my problems are unusual, and I just need to push through it.
I'm testing in development with my own brokerage accounts, so I know what the underlying transactions are compared to what Plaid is returning to me. My calls are set up correctly, and I can't get a helpful answer from Plaid support.
I took a look at the support issue, and it does appear that the problem you're hitting is related to a bug (or two different bugs, in this case).
However, for posterity and anyone else reading this question, I looked it up: in the general case the endpoint is pretty fast. P95 latency for calling /investments/transactions/get is currently about 1 second. Initial calls on an Item will have higher latency, as they have more data to fetch and are blocked on Plaid extracting the data for the Item for the first time (hence the 1-2 minute guidance in the docs).
In addition, Investments updates at some major brokerages are scheduled to happen only overnight after market close, so there might be a delay of 12+ hours between making a trade and seeing that trade be returned by the API.
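On the question of waiting for the webhook: one way to avoid polling is to fetch only when the relevant webhook arrives. The sketch below is not Plaid's recommended implementation. It assumes the webhook body carries `webhook_type` set to "INVESTMENTS_TRANSACTIONS", `webhook_code` set to "DEFAULT_UPDATE", and an `item_id`, which should be checked against the current webhook documentation. `fetchTransactions` is a hypothetical callback that would call /investments/transactions/get.

```javascript
// Decide whether an incoming webhook means new investment transactions
// are ready to fetch. The field names and values are assumptions to verify
// against Plaid's webhook documentation.
function shouldFetchInvestmentsTransactions(payload) {
  return (
    payload != null &&
    payload.webhook_type === "INVESTMENTS_TRANSACTIONS" &&
    payload.webhook_code === "DEFAULT_UPDATE"
  );
}

// fetchTransactions is a hypothetical callback supplied by the caller.
// Returns true when a fetch was triggered, false when the webhook was ignored.
function handleWebhook(payload, fetchTransactions) {
  if (shouldFetchInvestmentsTransactions(payload)) {
    fetchTransactions(payload.item_id);
    return true;
  }
  return false;
}
```

Given the overnight update schedule mentioned above, a handler like this may not fire until many hours after a trade.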

What is "sf_max_daily_api_calls"?

Does someone know what "sf_max_daily_api_calls" parameter in Heroku mappings does? I do not want to assume it is a daily limit for write operations per object and I cannot find an explanation.
I tried to open a ticket with Heroku, but in their support ticket form "Which application?" drop-down is required, but none of the support categories have anything to choose there from, the only option is "Please choose..."
I tried to find any reference to this field and can't - I can only see it used in Heroku's Quick Start guide, but without an explanation. I have a very busy object I'm working on, read/write, and want to understand any limitations I need to account for.
Salesforce orgs have a rolling 24-hour limit on API calls. Generally the limit is very generous in test orgs (sandboxes), 5M calls, because you can make stupid mistakes there. In production it's lower. It's a bit counterintuitive, but it protects their resources and forces you to write optimised code and integrations.
You can see your limit in Setup -> Company information. There's a formula in documentation, roughly speaking you gain more of that limit with every user license you purchased (more for "real" internal users, less for community users), same as with data storage limits.
Also every API call is supposed to return current usage (in special tag for SOAP API, in a header in REST API) so I'm not sure why you'd have to hardcode anything...
If you write your operations right the limit can be very generous. No idea how that Heroku Connect works. Ideally you'd spot some "bulk api 2.0" in the documentation or try to find synchronous vs async in there.
A normal old-school synchronous update via the SOAP API lets you process 200 records at a time, consuming 1 API call. The REST bulk API accepts CSV/JSON/XML of up to 10K records and processes them asynchronously; you poll for an "is it done yet" result. So starting the job, uploading files, committing the job and then only checking, say, once a minute can easily be 4 API calls, and you can process millions of records before hitting the limit.
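To make the difference concrete, here is the arithmetic as a small sketch. It uses only the figures quoted in the paragraph above (200 records per synchronous call, 10K records per bulk job, about 4 calls per job); the function names are made up for illustration, and real jobs may need extra status polls.

```javascript
// Figures taken from the paragraph above.
const SYNC_RECORDS_PER_CALL = 200;
const BULK_RECORDS_PER_JOB = 10000;
const CALLS_PER_BULK_JOB = 4; // start job, upload, commit, one status check

// API calls needed when every call carries at most 200 records.
function syncApiCalls(recordCount) {
  return Math.ceil(recordCount / SYNC_RECORDS_PER_CALL);
}

// API calls needed with bulk jobs; extraPollsPerJob models additional status checks.
function bulkApiCalls(recordCount, extraPollsPerJob = 0) {
  const jobs = Math.ceil(recordCount / BULK_RECORDS_PER_JOB);
  return jobs * (CALLS_PER_BULK_JOB + extraPollsPerJob);
}

console.log(syncApiCalls(1000000)); // 5000
console.log(bulkApiCalls(1000000)); // 400
```

Under these assumptions, a million records cost 5,000 calls synchronously and about 400 via bulk jobs.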
When all else fails, you have exhausted your options, can't optimise any more, and can't purchase more user licenses, I think they sell "packets" of additional API calls; contact your account representative. But there are lots of things you can try before that, not the least of them being setting up a warning when you hit, say, a 30% threshold.
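A threshold warning can be built on the usage figure that REST responses carry. This sketch assumes the `Sforce-Limit-Info` response header in the form `api-usage=18/5000`; the function names are made up for illustration, and the 30% default simply mirrors the figure above.

```javascript
// Parse the value of the Sforce-Limit-Info response header,
// e.g. "api-usage=18/5000". Returns null if it is missing or malformed.
// The leading group keeps "per-app-api-usage=..." from matching by accident.
function parseApiUsage(headerValue) {
  const match = /(?:^|[\s;,])api-usage=(\d+)\/(\d+)/.exec(headerValue || "");
  if (!match) return null;
  return { used: Number(match[1]), limit: Number(match[2]) };
}

// True when usage has reached the given fraction of the daily limit.
function isOverThreshold(usage, threshold = 0.3) {
  return usage !== null && usage.used / usage.limit >= threshold;
}

const usage = parseApiUsage("api-usage=1500/5000");
if (isOverThreshold(usage)) {
  console.log("API usage warning:", usage.used, "of", usage.limit);
}
```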

How can I track AJAX performance using Google Analytics?

My web application makes many AJAX requests, so it can be categorized as a Single Page Application.
What I want is to track AJAX technical performance using Google Analytics.
The GA documentation suggests implementing Virtual Pageview Tracking, as detailed in this link:
https://developers.google.com/analytics/devguides/collection/analyticsjs/single-page-applications
After implementing virtual pageview tracking, Pageviews stats and Page URIs seem to be fed into GA correctly. But timing stats such as Avg. Page Load Time (sec) are not: all of them have no value!
I tried these 3 scenarios to implement Virtual Page Tracking, but none of them is working.
Am I missing something, or is it a GA limitation, so that we cannot collect timing stats for virtual pages the way we can for real pageviews?
Are there any other tools you would suggest to track AJAX performance?
GA is not meant to be used to track page performance, and the Value in GA implies monetary value.
When it says "tracking pageviews" it's not about measuring performance, it's about tracking user activity. As in, how many pages per session, what pages, what led to conversions, where they have troubles going through and so forth. Not a technical tool, but an analytics/marketing tool.
Technically, you still could use it to track page performance and people do it. But not as you've done it. You have to remove any network influence on your timestamps since normal fluctuation there would exceed the useful timing of page performance.
I think the most elegant way of doing it would be to create a custom metric in the GA interface and then populate it with performance-measuring events (or pageviews). So:
1) Take a new Date() timestamp (or whatever you do in jQuery to get the current timestamp) right before the POST request.
2) Get another new Date() in the POST callback.
3) Calculate the difference in milliseconds and send that as the value of the custom metric with the pageview.
4) Wait two days for the new data to be processed, then build a custom report using your custom metric.
Now when you improve performance of your endpoint, you will be able to see statistical improvements in that report.
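The timing steps can be sketched as a small helper. This is one possible shape, not the only one: `timedRequest` is a made-up name, the clock is injectable so the helper can be tested, and the commented usage assumes jQuery, analytics.js, a hypothetical `/api/messages` endpoint, and a custom metric configured at index 1.

```javascript
// Time one request and hand the elapsed milliseconds to a reporter.
// startRequest receives a completion callback; `now` is injectable for testing.
function timedRequest(startRequest, reportDuration, now = () => Date.now()) {
  const startedAt = now();
  startRequest(function onComplete() {
    reportDuration(Math.round(now() - startedAt));
  });
}

// Browser usage (assumes jQuery, analytics.js, and a custom metric at index 1):
// timedRequest(
//   (done) => $.post("/api/messages", { text: "hi" }, done),
//   (ms) => ga("send", "event", "ajax", "post", { metric1: ms })
// );
```

Keeping the reporter separate means the same helper could feed a different tool later without touching the request code.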
This is usually done on the backend, though, with Datadog or a similar tool with endpoint monitoring functionality.
When performance is measured on the front-end, we usually use the native performance API, so the window.performance object. Or whatever your front-end rendering library suggests using for that. Here's a bit more on this: https://developer.mozilla.org/en-US/docs/Web/API/performance_property That way you're taking into account a bit more data, not just one endpoint response time.

Eventual Consistency in microservice-based architecture temporarily limits functionality

I'll illustrate my question with Twitter. For example, Twitter has microservice-based architecture which means that different processes are in different servers and have different databases.
A new tweet appears: server A stores some data in its own database, then generates new events and fires them. Servers B and C haven't received these events at this point, so they haven't stored anything in their databases or processed anything.
The user who created the tweet wants to edit it. To achieve that, all three services A, B and C should have processed all events and stored all required data in their databases, but services B and C aren't consistent yet. That means we are not able to provide edit functionality at the moment.
As far as I can see, a possible workaround could be switching to immediate consistency, but that would take away all the benefits of a microservice-based architecture and would probably cause problems with tight coupling.
Another workaround is to restrict the user's actions for some time, until data is consistent across all necessary services. This is possibly a solution, depending on the customer and their business requirements.
Yet another workaround is to add additional logic, or perhaps a service D, that stores edits as user actions and applies them to the data only once it is consistent. The drawback is greatly increased system complexity.
And there are two-phase commits, but those are 1) not really reliable and 2) slow.
I think slowness is a huge drawback under loads such as Twitter's. But it could probably be solved, whereas the lack of reliability cannot, again, without increasing the complexity of the solution.
So, the questions are:
Are there any nice solutions to the illustrated situation or only things that I mentioned as workarounds? Maybe some programming platforms or databases?
Did I misunderstand something, and are some of the workarounds incorrect?
Is there any other approach except Eventual Consistency that will guarantee that all data will be stored and all necessary actions will be executed by other services?
Why has Eventual Consistency been picked for this use case? As far as I can see, right now it is the only way to guarantee that some data will be stored or some action will be performed in an event-driven approach, where some services start their work when an event is fired; in my example, that event would be “tweet is created”. So, if services B and C go down, I need to be able to perform the action successfully once they are up again.
Things I would like to achieve are: reliability, ability to bear high loads, adequate complexity of solution. Any links on any related subjects will be very much appreciated.
If there are natural limitations of this approach and what I want cannot be achieved using this paradigm, it is okay too. I just need to know that this problem really isn't solved yet.
It is all about tradeoffs. With eventual consistency in your example, it may mean that the user cannot edit for maybe a few seconds, since most eventually consistent technologies do not take long to replicate data across nodes. So in this use case it is absolutely acceptable, since users are pretty slow in their actions.
For example:
"MongoDB is consistent by default: reads and writes are issued to the primary member of a replica set. Applications can optionally read from secondary replicas, where data is eventually consistent by default." (from the official MongoDB FAQ)
Another alternative that is getting more popular is to use a streaming platform such as Apache Kafka where it is up to your architecture design how fast the stream consumer will process the data (for eventual consistency). Since the stream platform is very fast it is mostly only up to the speed of your stream processor to make the data available at the right place. So we are talking about milliseconds and not even seconds in most cases.
The key thing in these sorts of architectures is to have each service be autonomous when it comes to writes: it can take the write even if none of the other application-level services are up.
So in the example of a Twitter-like service, you would model it as:
Service A manages the content of a post.
So when a user makes a post, a write happens in Service A's DB, and from that instant the post can be edited, because editing is just a request to A.
If there's some other service that consumes the "post content" change events from A and after a "new post" event exposes some functionality, that functionality isn't going to be exposed until that service sees the event (yay tautologies). But that's just physics: the sun could have gone supernova five minutes ago and we can't take any action (not that we could have) until we "see the light".
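The autonomous-write idea can be illustrated with an in-memory toy. This is a sketch under simplifying assumptions, not a real messaging setup: `PostService` stands in for Service A, `SearchService` is a hypothetical downstream consumer, and a plain array plays the role of the event stream.

```javascript
// Service A owns post content: it accepts writes on its own and queues events.
class PostService {
  constructor() {
    this.posts = new Map();
    this.outbox = []; // events not yet delivered to other services
    this.nextId = 1;
  }
  create(text) {
    const id = this.nextId++;
    this.posts.set(id, text);
    this.outbox.push({ type: "post-changed", id, text });
    return id;
  }
  edit(id, text) {
    if (!this.posts.has(id)) throw new Error("unknown post " + id);
    this.posts.set(id, text);
    this.outbox.push({ type: "post-changed", id, text });
  }
  // Hand over all queued events and start a fresh queue.
  drainEvents() {
    const events = this.outbox;
    this.outbox = [];
    return events;
  }
}

// A downstream service only knows what it has seen in events.
class SearchService {
  constructor() {
    this.index = new Map();
  }
  consume(events) {
    for (const event of events) this.index.set(event.id, event.text);
  }
  find(id) {
    return this.index.get(id);
  }
}

const a = new PostService();
const b = new SearchService();
const id = a.create("helo world");
a.edit(id, "hello world");   // works immediately; B has seen nothing yet
console.log(b.find(id));     // undefined
b.consume(a.drainEvents());  // events arrive later
console.log(b.find(id));     // "hello world"
```

The edit never waits on the consumer; only the functionality that the consumer itself exposes lags behind.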

Java EE servlet to create a file and show progress while creating it

I need to write a servlet that will return to the user a csv that holds some statistics.
I know how to return just the file, but how can I do it while showing a progress bar of the file creation process?
I am having trouble understanding how can I do something ajaxy to show the progress of the file creation, while creating the file at the same time - if I create a servlet that will return the completion percentage, how can it keep the same file it is creating while returning a response every x seconds to the browser to show the progress.
There are two fundamentally different approaches. One is true asynchronous delivery using an approach such as Comet. You can see some descriptions in articles such as this. I would use this approach where the data you are delivering is naturally incremental, for example live measurements from instrumentation. Some Java app servers have nice integration between their JMS message systems and Comet to the browser.
The other approach is that you have a polling mechanism. The JavaScript in the browser makes periodic calls to the server to get status (and maybe the next chunk of data). The advantage of this approach is that you are using a very standard programming model, less new stuff to learn. For many cases, such as "are there new answers for the Stack Overflow question I'm working on?" this is quite sufficient.
Your challenge may be to determine any useful progress information. How would you know how far through the generation of the CSV file you are?
If you are firing off a long-running request from a servlet, it's quite likely that you will effectively spin off a worker thread to do that work (maybe using JMS, maybe using async workers) and immediately return a response to the browser saying "Understood, I'm thinking". This ensures that you are not vulnerable to any HTTP response timeouts. The problem then is how to determine the current progress. Unless the "worker" doing the work has some way to communicate its partial progress, you have nothing useful to say. This kind of thing tends to be very application-specific. Some tasks very naturally have progress points (consider printing: we know how many pages there are to do and how many have been printed); others don't (consider determining whether a number is prime: yes or no, with perhaps no useful intermediate stages).
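The browser side of the polling approach might look like the sketch below. It assumes the worker can report progress and that a status request returns an object shaped like `{ percent, done, downloadUrl }`. That shape, the function name `pollProgress`, and the URLs in the usage comment are all hypothetical.

```javascript
// Poll a status source until the job reports completion.
// fetchStatus receives a callback that takes { percent, done, downloadUrl }.
// `schedule` is injectable so the loop can be tested without real timers.
function pollProgress(fetchStatus, onProgress, onDone, intervalMs = 2000, schedule = setTimeout) {
  function tick() {
    fetchStatus(function (status) {
      onProgress(status.percent);
      if (status.done) {
        onDone(status.downloadUrl);
      } else {
        schedule(tick, intervalMs);
      }
    });
  }
  tick();
}

// Browser usage (assumes jQuery and hypothetical servlet URLs):
// pollProgress(
//   (cb) => $.getJSON("/report/status?job=42", cb),
//   (percent) => $("#bar").css("width", percent + "%"),
//   (url) => { window.location = url; }
// );
```

On the server, the worker would update a progress value keyed by job id, a status servlet would read it, and a separate request would download the finished file.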
