We want to migrate our dedicated servers to Azure platform for scaling easy and investigated a lot of Azure services for our needs. So one of the Azure service that we want to use is Azure Stream Analytics (ASA).
We've added some Azure Platforms according to our needs for performing some tests (it is not important what they for, for now). Here is the structure:
SimpleApp (Sending Request, Not In Azure) -> Event Hub 1 (EH1) -> ASA -> Event Hub 2 (EH2) -> Function App (FA)
SimpleApp sends a simple HTTP GET request to classic dedicated server
which is named TESTSERVER. It tooks max 100-150ms and it represents
our start time. After that it sends the message to EH1.
ASA's query is simple like this: SELECT * INTO [Output] FROM [Input]
Function App sends a simple HTTP GET request to TESTSERVER for
identifying finish time.
We've shocked when we see the results from our TESTSERVER logs. It tooks 4000-5000ms!
Then we started to investigate the issue. Checked values like EventEnqueuedUtcTime and EventProcessedUtcTime to identify which block causes this slowness. But these time values are totally irrelevant. For example; EventEnqueuedUtcTime should be less than EventProcessedUtcTime but not! So this shows us time servers may be different even in different Azure blocks and we cannot use them to measure. Am I wrong?
Anyway, after this we suspected that maybe the last Azure Function App may cause this slowness. We thought that maybe Function App's Event Hub Trigger does not work well. So we designed a new test environment:
SimpleApp (Sending Request, Not In Azure) -> Event Hub 1 (EH1) -> Function App (FA1) -> Event Hub 2 (EH2) -> Function App 2 (FA2)
Second shock... It tooks only ~400ms totally!
Then, we've performed a lot of tests with different architecture which contains ASA but all of them are too slow for us.
Have you experienced any performance issues with ASA? Could you please share your experience and your flows' total time consumption?
Best regards.
There is a latency when merging all the event in chronological order from the Event Hub.
ASA will visit all partitions from EH, get the data and organize the events into chronological order. This means that data must arrive at all partitions in the EH. I think this will also explain the strange behavior you are seeing with the EventProcessedUtcTime, it might be that because the events are ordered, the logical processing time is before the actual arrival time. Although I'm unsure about this because I do not know the inner workings of ASA.
This latency will increase with the number of partitions used, especially when the dataflow is slow.
You can sidestep the merger by partitioning on the field partitionid from EH.
Make sure you are sending the data to the correct partition in EH as well.
More information can be found here at the Stream Analytics blog.
Related
I've an application with 10M users. The application has access to the user's Google Health data. I want to periodically read/refresh users' data using Google APIs.
The challenge that I'm facing is the memory-intensive task. Since Google does not provide any callback for new data, I'll be doing background sync (every 30 mins). All users would be picked and added to a queue, which would then be picked sequentially (depending upon the number of worker nodes).
Now for 10M users being refreshed every 30 mins, I need a lot of worker nodes.
Each user request takes around 1 sec including network calls.
In 30 mins, I can process = 1800 users
To process 10M users, I need 10M/1800 nodes = 5.5K nodes
Quite expensive. Both monetary and operationally.
Then thought of using lambdas. However, lambda requires a NAT with an internet gateway to access the public internet. Relatively, it very cheap.
Want to understand if there's any other possible solution wrt the scale?
Without knowing more about your architecture and the google APIs it is difficult to make a recommendation.
Firstly I would see if google offer a bulk export functionality, then batch up the user requests. So instead of making 1 request per user you can make say 1 request for 100k users. This would reduce the overhead associated with connecting and processing/parsing of the message metadata.
Secondly i'd look to see if i could reduce the processing time, for example an interpreted language like python is in a lot of cases much slower than a compiled language like C# or GO. Or maybe a library or algorithm can be replaced with something more optimal.
Without more details of your specific setup its hard to offer more specific advice.
I'll illustrate my question with Twitter. For example, Twitter has microservice-based architecture which means that different processes are in different servers and have different databases.
A new tweet appears, server A stored in its own database some data, generated new events and fired them. Server B and C didn't get these events at this point and didn't store anything in their databases nor processed anything.
The user that created the tweet wants to edit that tweet. To achieve that, all three services A, B, C should have processed all events and stored to db all required data, but service B and C aren't consistent yet. That means that we are not able to provide edit functionality at the moment.
As I can see, a possible workaround could be in switching to immediate consistency, but that will take away all microservice-based architecture benefits and probably could cause problems with tight coupling.
Another workaround is to restrict user's actions for some time till data aren't consistent across all necessary services. Probably a solution, depends on customer and his business requirements.
And another workaround is to add additional logic or probably service D that will store edits as user's actions and apply them to data only when they will be consistent. Drawback is very increased complexity of the system.
And there are two-phase commits, but that's 1) not really reliable 2) slow.
I think slowness is a huge drawback in case of such loads as Twitter has. But probably it could be solved, whereas lack of reliability cannot, again, without increased complexity of a solution.
So, the questions are:
Are there any nice solutions to the illustrated situation or only things that I mentioned as workarounds? Maybe some programming platforms or databases?
Do I misunderstood something and some of workarounds aren't correct?
Is there any other approach except Eventual Consistency that will guarantee that all data will be stored and all necessary actions will be executed by other services?
Why Eventual Consistency has been picked for this use case? As I can see, right now it is the only way to guarantee that some data will be stored or some action will be performed if we are talking about event-driven approach when some of services will start their work when some event is fired, and following my example, that event would be “tweet is created”. So, in case if services B and C go down, I need to be able to perform action successfully when they will be up again.
Things I would like to achieve are: reliability, ability to bear high loads, adequate complexity of solution. Any links on any related subjects will be very much appreciated.
If there are natural limitations of this approach and what I want cannot be achieved using this paradigm, it is okay too. I just need to know that this problem really isn't solved yet.
It is all about tradeoffs. With eventual consistency in your example it may mean that the user cannot edit for maybe a few seconds since most of the eventual consistent technologies would not take too long to replicate the data across nodes. So in this use case it is absolutely acceptable since users are pretty slow in their actions.
For example :
MongoDB is consistent by default: reads and writes are issued to the
primary member of a replica set. Applications can optionally read from
secondary replicas, where data is eventually consistent by default.
from official MongoDB FAQ
Another alternative that is getting more popular is to use a streaming platform such as Apache Kafka where it is up to your architecture design how fast the stream consumer will process the data (for eventual consistency). Since the stream platform is very fast it is mostly only up to the speed of your stream processor to make the data available at the right place. So we are talking about milliseconds and not even seconds in most cases.
The key thing in these sorts of architectures is to have each service be autonomous when it comes to writes: it can take the write even if none of the other application-level services are up.
So in the example of a twitter like service, you would model it as
Service A manages the content of a post
So when a user makes a post, a write happens in Service A's DB and from that instant the post can be edited because editing is just a request to A.
If there's some other service that consumes the "post content" change events from A and after a "new post" event exposes some functionality, that functionality isn't going to be exposed until that service sees the event (yay tautologies). But that's just physics: the sun could have gone supernova five minutes ago and we can't take any action (not that we could have) until we "see the light".
I'm developing a basic messaging system on the Parse.com at the moment and I have noticed in the Events Analytics screen I'm hitting 30,000+ requests per day. This is a shock considering I'm the only person using the system at the moment. Obviously with a few users I would blow my API request limit straight away.
I'm pretty experienced with Parse.com these days, so I'm lean with queries and I'm alert to not putting finds, saves, retrieves, etc in for loops. I also understand that saveAll() on an array of ParseObjects doesn't always limit the request count to 1 (depending on relationships inside that object).
So how does one track down where the excessive calls are coming from?
I see the above Analytics > Performance > Served Requests data, but how do I drill down to see if cloud code or iOS is the culprit?
Current solution is to effectively unit test each block of Parse code and look at the results in above screen.
For the benefit of others who may happen upon this thread with the same questions, I found some techniques to hunt down where excessive requests are coming from.
1) Parse's documentation on the API's themselves is really good, but there isn't a lot of information / guides for the admin interfaces. Under: Analytics -> Explorer -> Make a table there is a capability to download all the requests for a specific day (to import into a spreadsheet). The data isn't very detailed though and the dates are epoch timestamps, so hard to follow. At least you can see [Request Type, Class, Installation ID] e.g. ["find", "MyParseClass", "Cloud Code"].
2) My other technique was to add custom Analytic events to the code. So in Cloud Code for example, I added the following line to each beforeSave and afterSave event:
Parse.Analytics.track('MyClass_beforeSave', null);
3) Obviously, Parse logs these calls in the Logs window, but given you can only see the most recents transactions and can't clear them, I found it mostly unhelpful in tracking down the excessive calls.
What is the best practise solution for programmaticaly changing the XML file where the number of instances are definied ? I know that this is somehow possible with this csmanage.exe for the Windows Azure API.
How can i measure which Worker Role VMs are actually working? I asked this question on MSDN Community forums as well: http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/02ae7321-11df-45a7-95d1-bfea402c5db1
To modify the configuration, you might want to look at the PowerShell Azure Cmdlets. This really simplifies the task. For instance, here's a PowerShell snippet to increase the instance count of 'WebRole1' in Production by 1:
$cert = Get-Item cert:\CurrentUser\My\<YourCertThumbprint>
$sub = "<YourAzureSubscriptionId>"
$servicename = '<YourAzureServiceName>'
Get-HostedService $servicename -Certificate $cert -SubscriptionId $sub |
Get-Deployment -Slot Production |
Set-DeploymentConfiguration {$_.RolesConfiguration["WebRole1"].InstanceCount += 1}
Now, as far as actually monitoring system load and throughput: You'll need a combination of Azure API calls and performance counter data. For instance: you can request the number of messages currently in an Azure Queue:
http://yourstorageaccount.queue.core.windows.net/myqueue?comp=metadata
You can also set up your role to capture specific performance counters. For example:
public override bool OnStart()
{
var diagObj= DiagnosticMonitor.GetDefaultInitialConfiguration();
AddPerfCounter(diagObj,#"\Processor(*)\% Processor Time",60.0);
AddPerfCounter(diagObj, #"\ASP.NET Applications(*)\Request Execution Time", 60.0);
AddPerfCounter(diagObj,#"\ASP.NET Applications(*)\Requests Executing", 60.0);
AddPerfCounter(diagObj, #"\ASP.NET Applications(*)\Requests/Sec", 60.0);
//Set the service to transfer logs every minute to the storage account
diagObj.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);
//Start Diagnostics Monitor with the new storage account configuration
DiagnosticMonitor.Start("DiagnosticsConnectionString",diagObj);
}
So this code captures a few performance counters into local storage on each role instance, then every minute those values are transferred to table storage.
The trick, now, is to retrieve those values, parse them, evaluate them, and then tweak your role instances accordingly. The Azure API will let you easily pull the perf counters from table storage. However, parsing and evaluating will take some time to build out.
Which leads me to my suggestion that you look at the Azure Dynamic Scaling Example on the MSDN code site. This is a great sample that provides:
A demo line-of-business app hosting a wcf service
A load-generation tool that pushes messages to the service at a rate you specify
A load-monitoring web UI
A scaling engine that can either be run locally or in an Azure role.
It's that last item you want to take a careful look at. Based on thresholds, it compares your performance counter data, as well as queue-length data, to those thresholds. Based on the comparisons, it then scales your instances up or down accordingly.
Even if you end up not using this engine, you can see how data is grabbed from table storage, massaged, and used for driving instance changes.
Quantifying the load is actually very application specific - particularly when thinking through the Worker Roles. For example, if you are doing a large parallel processing application, the expected/hoped for behavior would be 100% CPU utilization across the board and the 'scale decision' may be based on whether or not the work queue is growing or shrinking.
Further complicating the decision is the lag time for the various steps - increasing the Role Instance Count, joining the Load Balancer, and/or dropping from the load balancer. It is very easy to get into a situation where you are "chasing" the curve, constantly churning up and down.
As to your specific question about specific VMs, since all VMs in a Role definition are identical, measuring a single VM (unless the deployment starts with VM count 1) should not really tell you much - all VMs are sitting behind a load balancer and/or are pulling from the same queue. Any variance should be transitory.
My recommendation would be to pick something that is not inherently highly variable to monitor (e.g. CPU). Generally, you want to find a trending point - for web apps it may be the response queue, for parallel apps it may be azure queue depth, etc. but for either they would be the trend and not the absolute number. I would also suggest measuring them at fairly broad intervals - minutes, not seconds. If you have a load you need to respond to in seconds, then realistically you will need to increase your running instance count ahead of time.
With regard to your first question, you can also use the Autoscaling Application Block to dynamically change instance counts based on a set of predefined rules.
I'm currently trying to build an application that inherently needs good time synchronization across the server and every client. There are alternative designs for my application that can do away with this need for synchronization, but my application quickly begins to suck when it's not present.
In case I am missing something, my basic problem is this: firing an event in multiple locations at exactly the same moment. As best I can tell, the only way of doing this requires some kind of time synchronization, but I may be wrong. I've tried modeling the problem differently, but it all comes back to either a) a sucky app, or b) requiring time synchronization.
Let's assume I Really Really Do Need synchronized time.
My application is built on Google AppEngine. While AppEngine makes no guarantees about the state of time synchronization across its servers, usually it is quite good, on the order of a few seconds (i.e. better than NTP), however sometimes it sucks badly, say, on the order of 10 seconds out of sync. My application can handle 2-3 seconds out of sync, but 10 seconds is out of the question with regards to user experience. So basically, my chosen server platform does not provide a very reliable concept of time.
The client part of my application is written in JavaScript. Again we have a situation where the client has no reliable concept of time either. I have done no measurements, but I fully expect some of my eventual users to have computer clocks that are set to 1901, 1970, 2024, and so on. So basically, my client platform does not provide a reliable concept of time.
This issue is starting to drive me a little mad. So far the best thing I can think to do is implement something like NTP on top of HTTP (this is not as crazy as it may sound). This would work by commissioning 2 or 3 servers in different parts of the Internet, and using traditional means (PTP, NTP) to try to ensure their sync is at least on the order of hundreds of milliseconds.
I'd then create a JavaScript class that implemented the NTP intersection algorithm using these HTTP time sources (and the associated roundtrip information that is available from XMLHTTPRequest).
As you can tell, this solution also sucks big time. Not only is it horribly complex, but only solves one half the problem, namely giving the clients a good notion of the current time. I then have to compromise on the server, either by allowing the clients to tell the server the current time according to them when they make a request (big security no-no, but I can mitigate some of the more obvious abuses of this), or having the server make a single request to one of my magic HTTP-over-NTP servers, and hoping that request completes speedily enough.
These solutions all suck, and I'm lost.
Reminder: I want a bunch of web browsers, hopefully as many as 100 or more, to be able to fire an event at exactly the same time.
Let me summarize, to make sure I understand the question.
You have an app that has a client and server component. There are multiple servers that can each be servicing many (hundreds) of clients. The servers are more or less synced with each other; the clients are not. You want a large number of clients to execute the same event at approximately the same time, regardless of which server happens to be the one they connected to initially.
Assuming that I described the situation more or less accurately:
Could you have the servers keep certain state for each client (such as initial time of connection -- server time), and when the time of the event that will need to happen is known, notify the client with a message containing the number of milliseconds after the beginning value that need to elapse before firing the event?
To illustrate:
client A connects to server S at time t0 = 0
client B connects to server S at time t1 = 120
server S decides an event needs to happen at time t3 = 500
server S sends a message to A:
S->A : {eventName, 500}
server S sends a message to B:
S->B : {eventName, 380}
This does not rely on the client time at all; just on the client's ability to keep track of time for some reasonably short period (a single session).
It seems to me like you're needing to listen to a broadcast event from a server in many different places. Since you can accept 2-3 seconds variation you could just put all your clients into long-lived comet-style requests and just get the response from the server? Sounds to me like the clients wouldn't need to deal with time at all this way ?
You could use ajax to do this, so yoǘ'd be avoiding any client-side lockups while waiting for new data.
I may be missing something totally here.
If you can assume that the clocks are reasonable stable - that is they are set wrong, but ticking at more-or-less the right rate.
Have the servers get their offset from a single defined source (e.g. one of your servers, or a database server or something).
Then have each client calculate it's offset from it's server (possible round-trip complications if you want lots of accuracy).
Store that, then you the combined offset on each client to trigger the event at the right time.
(client-time-to-trigger-event) = (scheduled-time) + (client-to-server-difference) + (server-to-reference-difference)
Time synchronization is very hard to get right and in my opinion the wrong way to go about it. You need an event system which can notify registered observers every time an event is dispatched (observer pattern). All observers will be notified simultaneously (or as close as possible to that), removing the need for time synchronization.
To accommodate latency, the browser should be sent the timestamp of the event dispatch, and it should wait a little longer than what you expect the maximum latency to be. This way all events will be fired up at the same time on all browsers.
Google found the way to define time as being absolute. It sounds heretic for a physicist and with respect to General Relativity: time is flowing at different pace depending on your position in space and time, on Earth, in the Universe ...
You may want to have a look at Google Spanner database: http://en.wikipedia.org/wiki/Spanner_(database)
I guess it is used now by Google and will be available through Google Cloud Platform.