We have just added Application Insights to our application, and while monitoring requests, failures and exceptions all makes sense, the Dependency stats do not.
There are over 9,000 items in our "Total of Dependency calls by Dependency" table for fewer than 250 requests. I'm sure that the app (which uses Entity Framework) has not issued 9,000 SQL calls for those roughly 250 requests.
When I try to drill down into the individual items I can only see that the Dependency type is SQL, as shown below.
Could someone help me understand this more?
It seems like Application Insights has surfaced a real issue with your DAL. Naturally it would take looking into your code to confidently determine what's going on. My best guess is that your code suffers from the N+1 Selects anti-pattern (one query to load a list of entities, then one additional query per entity as lazily loaded navigation properties are accessed), which is a very common pitfall when using Entity Framework.
You can read more about N+1 Selects and EF here.
I know this question was already asked but I could not find a satisfying answer.
I started to dive deeper into building a real RESTful API, and I like its constraint of using links for decoupling. So I built my first service (with Java/Spring) and it works well (although I struggled a bit with finding the right format, but that's another question). After this first step I thought about my real-world use case: microservices, highly decoupled individual services. So I applied this to my previous scenario and came across some problems and doubts.
SCENARIO:
My setup consists of a reverse proxy (Traefik, which works as service discovery and API gateway) and two microservices. In addition, there is an OpenID Connect security layer. My services are a Player service and a Team service.
So after auth I have an access token with the userId, and I can call player/userId to get the player information and teams?playerId=userId to get all of the player's teams.
In my opinion, I would link to the opposite service in both responses: player/userId would link to teams?playerId=userId and vice versa.
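To make that concrete, a link like this could be added on the Player side roughly as follows (a minimal sketch using Spring HATEOAS; the controller, Player record and gateway URL are just placeholders, not my actual code):

```java
import org.springframework.hateoas.EntityModel;
import org.springframework.hateoas.Link;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class PlayerController {

    // Hypothetical representation; the real service loads this from its own database.
    public record Player(String userId, String name) {}

    @GetMapping("/player/{userId}")
    public EntityModel<Player> getPlayer(@PathVariable String userId) {
        Player player = new Player(userId, "some player");

        EntityModel<Player> resource = EntityModel.of(player);
        // The base URL of the Team service is the part I don't want to hardcode.
        resource.add(Link.of("https://gateway.example.com/teams?playerId=" + userId, "teams"));
        return resource;
    }
}
```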
QUESTION:
I haven't found a solution besides linking via a hardcoded URL, but this comes with so many downsides that I can't imagine it being the solution used in real-world applications. I mean, just imagine your API is a bit more advanced and you have to link to 10 resources. If something changes, you have to refactor and redeploy them all.
Besides the synchronization problem, how do you handle state in such a case? I mean, REST is all about state transfer. So I won't offer the link from the player to the Teams service if the player is in no team. Of course I can add the team IDs as an attribute of the player to decide whether to include the link or not, but this again increases coupling between the services.
The more I dive in, the more obstacles I find, and I'm about to just stay with my Spring REST Docs and neglect the core of REST, which is a pity to me.
Practicable for a microservice architecture?
Fielding, 2000
The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction.
Fielding, 2008
REST is intended for long-lived network-based applications that span multiple organizations.
It is not immediately clear to me that "microservices" are going to fall into the sweet spot of "the web". We're not, as a rule, trying to communicate with a microservice that is controlled by another company, and we often don't get a lot of benefit out of caching, or code on demand, or the other REST architectural constraints. How important is it to us that we can use general-purpose components to exchange information between different microservices within our solution? And so on.
If something changes, you have to refactor and redeploy them all.
Yes; and if that's going to be a problem for us, then we need to invest more work up front to define a stable interface between the two. (The fact that we are using "links" isn't special in that regard - if these two things are going to talk to each other, then they are going to need to speak a common language; if that common language needs to evolve over time (likely) then you need to build those capabilities into it).
If you want change over time, then you have to plan for it.
If you want backwards/forwards compatibility, then you have to plan for it.
Your identifiers don't need to be static - there are lots of possible ways of deferring the definition of an identifier; the most obvious being that you can use another identifier to look up the identifier you want, or the formula for calculating it, or whatever.
Think about how Google works - the links they use change all the time, but it doesn't matter, because the protocol (refresh your bookmarked search form, enter your text in "the" one field, click the button) hasn't changed in 20 years. The interface is stable (even though the underlying spellings of the identifiers are not) and that's enough.
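To make that concrete for your Java/Spring client: the only thing you hardcode is an entry point, and everything else is looked up by following link relations. Here is a minimal sketch using Spring HATEOAS's Traverson client; the rel names, URL and template parameter are illustrative only, not taken from your services:

```java
import java.net.URI;
import java.util.Map;

import org.springframework.hateoas.MediaTypes;
import org.springframework.hateoas.client.Traverson;

public class TeamsClient {

    public static void main(String[] args) {
        // The only hardcoded identifier is the API entry point; everything else is
        // discovered by following "rel" names exposed in the responses.
        Traverson traverson = new Traverson(URI.create("https://gateway.example.com/"), MediaTypes.HAL_JSON);

        String teams = traverson
                .follow("player", "teams")                       // rel names, not URLs
                .withTemplateParameters(Map.of("userId", "42"))  // fills URI templates served by the API
                .toObject(String.class);

        System.out.println(teams);
    }
}
```

If the server later changes the spelling of its URLs, this client keeps working as long as the rel names stay stable.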
Looking at eShopOnContainers, the microservice reference architecture from Microsoft, I see that for each service a call is made in Program.cs to host.MigrateDbContext, which in turn executes all of the EF migrations for the given context.
In a real-world orchestrator, isn't it possible that numerous containers for the same service could be spun up almost simultaneously? And if that happened, isn't it likely that multiple containers trying to execute the same migrations would deadlock or cause other issues?
Is this something that wasn't dealt with because it is beyond the scope of a reference project or does EF have something built in to handle concurrency that I'm not seeing?
I've found that there are numerous approaches to this problem, each with its own strengths and weaknesses. Some are straightforward: bringing the entire app down, updating the schema, and then bringing the app back online. Some implement the schema changes as a series of smaller changes, each of which is both forward and backward compatible, allowing zero downtime. Still others leverage built-in or third-party tools written specifically to address this task.
So, to answer my own question, this topic was almost certainly omitted because it was beyond the scope of the eShopOnContainers project/eBooks. The right choice for you will vary based on your project's size, complexity, acceptable downtime, etc.
I've been doing a lot of googling regarding managing dependencies between microservices. We're trying to move away from a big monolithic app to microservices in order to scale organizationally and be able to develop faster, with multiple teams working in parallel.
However, as we try to functionally partition the monolith into microservices, we see how intertwined the business logic and data really are. This was not a problem when we were sitting on top of one big DB and were able to do big relational joins. But with microservices, this becomes a problem.
One solution is to make microservice-A go to 5-10 other microservices to get the necessary data (this is the equivalent of a DB view with a join). Another solution is to make microservice-A listen to events from 5-10 other services and populate local storage with the relevant info (this is the equivalent of a materialized view). Either way, microservice-A is coupled with 5-10 other services, and if new info is needed in microservice-A, some of the services it depends on will need to be released before microservice-A. Note that microservice-A is itself depended upon by other services. Bottom line: we end up with DISTRIBUTED dependency hell.
Many articles advocate for the second solution – i.e. something along the lines of Event Sourcing, Choreography, etc.
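For concreteness, the second option usually ends up looking roughly like this (a minimal sketch assuming Spring Kafka; the event type, topic name and in-memory read model are hypothetical stand-ins for whatever the upstream services publish):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class TeamProjectionListener {

    // Hypothetical event published by an upstream service.
    public record TeamChangedEvent(String teamId, String name, List<String> playerIds) {}

    // Local "materialized view" owned by microservice-A; in practice this would be
    // a table in A's own database rather than an in-memory map.
    private final Map<String, TeamChangedEvent> teamSummaries = new ConcurrentHashMap<>();

    @KafkaListener(topics = "team-events", groupId = "microservice-a")
    public void onTeamChanged(TeamChangedEvent event) {
        // Store only the fields microservice-A actually needs, not the whole upstream aggregate.
        teamSummaries.put(event.teamId(), event);
    }
}
```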
I would appreciate any shared experiences, recommendations and insights.
Philometor.
While not technically an "answer", I can definitely share some of my observations and experiences. Your question concerning services calling other services for database operations reminded me of a project where an architect sold senior management on the idea of "decoupling" persistence from the rest of the applications by implementing hundreds of REST interfaces in what essentially was a distributed DAO pattern in front of a very large enterprise database. The project ended up exactly the way I predicted - a dismal failure.
Microservices aren't about turning a monolithic application into a distributed monolithic application. In my example project above, the monolith was turned into a stove-piped, fragile, chaotic mess, with the coupling only moved to service contracts instead of Java class method signatures, and with a performance hit so bad the application was unusable. Last I heard they are still running their original monolith.
Microservices should be more of a vertical partitioning of your application and not a horizontal one. In my opinion it's better to think in terms of business function partitioning rather than "converting" an existing monolith. There's no rule that determines how big a microservice must be, but it should be big enough to do one complete synchronous function without needing to directly depend on outside services (as much as possible) to complete its work. If a microservice performs a complex business function that affects 50 tables, so be it! It owns those tables. Ideally, if a service goes down it should affect only the business functionality it's responsible for, and not directly affect other services. As you can see, this thinking is the complete opposite of that which produced the distributed mess in my project example.
Not only do you need to ensure that the motivation behind replacing monoliths with microservices is sound, but you also need to step outside the monolith, revisit the actual business, and begin partitioning that instead. Like everything else, baby steps are the way to go. Start with one small complete business function, and convert that into a single microservice instead of trying to replace the monolith all at once.
I'm trying to build a shopping-cart-like webapp on GAE. So far I haven't deployed anything on GAE and have just been doing some POC work locally... then I read this:
http://borglin.net/gwt-project/?page_id=688
It surprised me when I read through those "weaknesses", and I'm pretty worried about whether putting the app on GAE is a good choice. Would someone, especially someone who has experience building a real-world app with cash transactions, please give me some ideas / share your thoughts?
The article said GAE has "No https support for your domain". Is that true? I thought all I need to do is point my domain https://www.abc.com to GAE https://abc.appspot.com ...
For BigTable, I understand it would be quite hard to build analytic/statistical functionality in my app (e.g. provide a monthly transaction summary). Does anyone have experience handling such a situation...? Export data from BigTable to an RDBMS and do some SQL...?
The article also said that BigTable has bad write performance: "I'm lucky to get 100 writes per second in a GAE request." Is that true? I cannot find any figures to support or disprove it...
I'm now using SpringMVC + Objectify on the server side. Is it too heavy for GAE? Some say Spring can cause a long cold start... how cold is it? How long would it take to init an app with, say, 20+ different pages/controllers and 20+ kinds of entities/DAOs? Any ballpark figure?
P.S. If you know of any real-world apps built on GAE, please share them here, because I want to know how far (or how big) my app could go.
Thanks a lot!
1) That is true. HTTPS is only supported for .appspot.com. A very big shortcoming.
2) That's not really true; you can do any kind of monthly summary report using the remote API if you need to do complex joins and such. You can also export the data and use an offline tool.
3) I haven't seen that kind of write performance bottleneck, but there is eventual consistency to deal with. That said, 100 writes/second is not a small number...
4) I would avoid Spring on App Engine. A lot of people use it happily, but I found that startup time was very slow and that caused problems.
SSL for custom domains is in a testing phase. Please note that HTTPS/SSL was not designed to work on shared-IP hosting (such as GAE), so there are SSL extensions with varying browser support (SNI/VIP).
Queries are quite weak on Datastore. They are also expensive. There are two ways to do analytics:
a. Create a set of sharded counters and update a counter every time an event happens (= a financial transaction). We use this and it works well (see the sketch after this list). The only downside is that this is "online" analytics: you cannot add additional analytics parameters for past data.
b. Upload (anonymized) data to Google BigQuery and do analytics there.
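A minimal sketch of the sharded-counter approach from (a), assuming Objectify (which you are already using) and a version whose transact() accepts a lambda; the entity name and shard count are placeholders:

```java
import static com.googlecode.objectify.ObjectifyService.ofy;

import java.util.Random;

import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;

public class TransactionCounter {

    // Hypothetical entity; it must be registered with ObjectifyService.register(...) at startup.
    @Entity
    public static class CounterShard {
        @Id Long id;
        long count;
    }

    private static final int NUM_SHARDS = 20;
    private static final Random RANDOM = new Random();

    /** Pick a random shard and increment it, spreading writes across entity groups. */
    public static void increment() {
        final long shardId = RANDOM.nextInt(NUM_SHARDS) + 1;
        ofy().transact(() -> {
            CounterShard shard = ofy().load().type(CounterShard.class).id(shardId).now();
            if (shard == null) {
                shard = new CounterShard();
                shard.id = shardId;
            }
            shard.count++;
            ofy().save().entity(shard).now();
        });
    }

    /** Sum all shards to read the total; reads are cheap compared to contended writes. */
    public static long total() {
        long sum = 0;
        for (CounterShard shard : ofy().load().type(CounterShard.class).list()) {
            sum += shard.count;
        }
        return sum;
    }
}
```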
Datastore has a limitation of about 5 writes/updates per second to a SINGLE entity or entity group (some sources say 1 write/s). There is no limitation on parallel writes to different entities. Remember, GAE is a distributed system where all apps use ONE BigTable database under the hood, so this is pretty scalable.
I don't have experience with this, but there are various reports on the net. See this http://www.listry.com/blog/2010/03/google-app-engine-cold-start-guide-for
I don't know about the other topics, but what I can tell you is that we use a combination of Guice + Jersey as a substitute for Spring :) It's better for GAE if we take the startup time into account.
What performance testing scenarios should be considered for a website with huge traffic? Is there any way to identify the elements of the code which are adversely affecting site performance?
Please provide something like a checklist of generalised scenarios to be tested to ensure proper performance testing.
It would be good to start with load testing tools like JMeter or PushToTest and start running them against your web application. JMeter simulates HTTP traffic and loads the server that way. You can do that, and also load test the AJAX parts of your application with PushToTest, because it can use Selenium scripts.
If you don't have the resources (computers to run load tests) you can always use a service like BrowserMob to run the scripts against a web accessible server.
It sounds like you need more of a test plan than a suggestion of tools to use. In performance testing, it is best to look at the users of the application -
How many will use the application on a light day? How many will use the app on a heavy day?
What type of users make up your user population?
What transactions will each of these user types perform?
Using this information, you can identify the major transactions and come up with different user levels (e.g. 10, 25, 50, 100) and percentages of user types (30% user A, 50% user B, ...) to test these transactions with. Time each of these transactions for each test you execute and examine how the transaction times change as compared to your user levels.
After gathering some metrics, since you should be able to narrow transactions to individual pieces of code, you will be able to know where to focus your code improvements. If you still need to narrow things down further, finer tests within each transaction can be created to provide more granular results.
Concurrency will kill you here, as you need to test your maximum projected concurrent users (plus wiggle room) hitting the database, website, and any other web service simultaneously. It really depends on the technologies you're using, but if you have a large interaction of different web technologies, you may want to check out Neoload. I've had nothing but success with this web stress tool, and the support is top notch if you need to emulate specific, complicated behavior (such as mocking AMF traffic, or using responses from web pages to dictate request behavior).
If you have a DB layer then this should be the initial focus of your attention, once the system is stable (i.e. no memory leaks or other resource issues). If the DB is not the bottleneck (or not relevant) then you need to correlate CPU/memory/disk IO and network traffic with the increasing load and increasing response times. This gives you an idea of capacity and of correlation (but not causation) with resource usage.
To find the cause of a given issue with resources you need to establish a Six Sigma-style project where you define the problem and perform root cause analysis in order to pinpoint the piece of code (or resource configuration) that is the bottleneck. Once you have done this a couple of times in your environment, you will notice patterns of workload, resource usage and countermeasures (solutions) that will guide you in your future performance testing 'projects'.
To choose the correct performance scenarios you need to go through the following basic checklist:
High priority scenarios from the business logic perspective. For example: login/order transactions, etc.
Scenarios most used by end users. Here you may need information from monitoring tools like New Relic, etc.
Search / filtering functionality (if applicable)
Scenarios which involve different user roles/permissions
A performance test is a comparison test, either with the previous release of the same application or with the existing players in the market.
Case 1- Existing application
1) Carry out the test for the same scenarios as covered before to get a clear picture of the application's response before and after the upgrade.
2) If you need to dig deeper, you can go back to the database team to understand which functionalities are getting the most requests. Also ask them about the average total number of requests on a typical day so that you can decide what user load and test duration to use.
Case 2- New Application
1) Look for existing market players and design your test around the critical functions of the rival product (e.g. Gmail supports many functions, but what is used most often is launch -> login -> compose mail -> inbox -> outbox).
2) You can always go back to your clients and ask what they consider to be business-critical scenarios, or scenarios that will be used most often.