Building scalable applications vs. just cloud-ready applications - performance

Recently, I got into a discussion with a seasoned architect about the ideal architecture and design for a multi-tenant web application that runs in a web farm. The application's only job is to let users upload any number of Excel files, which the system processes to generate very complex reports. Processing these files takes a long time (an hour each; let's take that as a constraint). Hence, after uploading, users wait for a notification from the system to download the generated reports.
At first glance the requirement looks pretty simple, but the expectation is that the application must be 100% scalable.
We discussed various solutions and architectures, but found none of them satisfactory. I'd like members of this community to propose a solution, with a design and technologies. This is not a professional assignment of mine; it's just a survey of architects' views on building scalable applications vs. just cloud-ready applications, where it's easier to scale the infrastructure than to focus on the application's scalability.

Users can upload Excel files through the site; their credentials are passed along. The backend registers the request in the database and returns a request id (a GUID or something). The process ends.
A Windows service runs and polls the database, looking for new requests to process. You could use Quartz.NET to schedule a number of parallel jobs that handle requests. The request handling (processing the Excel files and generating the reports) is delegated to load-balanced WCF services, so the more WCF services you have, the more parallel Quartz jobs can be scheduled. Another type of Quartz job can be scheduled to send mails once requests have been handled.
The site polls at regular intervals to see the status and progress of requests (or of a specific request) and to manage them. For completed requests, the report can be downloaded.
I think this is a very scalable solution that is also loosely coupled.
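Quartz.NET mirrors the Java Quartz API, so a minimal sketch of the polling job in Java Quartz terms might look like the following. The data-access and remote-call steps are placeholders for your own repository and load-balanced service layer; only the scheduling skeleton is real:

    import org.quartz.*;
    import org.quartz.impl.StdSchedulerFactory;

    public class RequestPoller {

        // Claims one pending upload request and delegates the hour-long
        // processing to a load-balanced service instance.
        public static class ProcessRequestJob implements Job {
            @Override
            public void execute(JobExecutionContext ctx) throws JobExecutionException {
                // 1. Atomically claim the next PENDING row (status -> IN_PROGRESS).
                // 2. Call the load-balanced report service with the file reference.
                // 3. Mark the row COMPLETED so the mail job can notify the user.
                System.out.println("processing next pending request, if any");
            }
        }

        public static void main(String[] args) throws SchedulerException {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
            JobDetail job = JobBuilder.newJob(ProcessRequestJob.class)
                    .withIdentity("processRequests").build();
            Trigger trigger = TriggerBuilder.newTrigger()
                    .withIdentity("pollEvery30s")
                    .withSchedule(SimpleScheduleBuilder.repeatSecondlyForever(30))
                    .build();
            scheduler.scheduleJob(job, trigger);
            scheduler.start();
        }
    }

The claim in step 1 has to be atomic (an UPDATE ... WHERE status = 'PENDING' that locks the row), so adding more worker machines never picks up the same request twice.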

Related

Stateful workflow engine vs Orchestrated idempotent services

I realize the benefits of a workflow engine, such as easy-to-understand communication, easy waiting, parallelism, and compensating actions, all with an informative graphical model. The concept is great and more manageable than a dogmatic event-driven architecture with no central coordinator and no specified flow.
We are currently using a legacy workflow engine to orchestrate microservices in the insurance business. Over time, chunks of business logic and little helper scripts have crept into the process model, which is not a developer-friendly solution to maintain and test to continuous-integration standards. The lack of available expertise and future support is also a huge risk from a project-management perspective.
I played around with Camunda and Activiti, but immediately faced compatibility issues with Spring Boot 3 and a lack of up-to-date examples and general knowledge outside of a relatively small user community. This gives me a bad feeling of drowning in the same swamp we are in now.
We planned to design our own Java-based orchestrator, which just invokes the specified microservices in a specified order when the process is started or a user task is completed. The orchestrator will also handle monitoring and versioning of the process flow. It's up to the microservices to validate their business context and halt the process by raising user tasks if necessary. When a user task is completed, the orchestrator restarts the whole process from the beginning with all tasks cleared. It is the responsibility of the microservices to no-op when their work was already done in a previous run. Eventually, the process reaches its end and finishes. This solution would be a good balance of modern DX and coordinated process management.
Are there examples of, or a name for, such an idempotent orchestrated architecture?
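For concreteness, the control loop I'm describing could be as small as this sketch (all names are illustrative; the real version would make remote calls and persist run state):

    import java.util.List;

    // Each step wraps one idempotent microservice call: it either reports DONE
    // (possibly as a no-op, because its work already happened in a previous
    // run) or HALTED because it raised a user task.
    public class RestartOrchestrator {

        public enum StepResult { DONE, HALTED }

        public interface Step {
            StepResult invoke(String processId); // a remote call in practice
        }

        private final List<Step> orderedSteps;

        public RestartOrchestrator(List<Step> orderedSteps) {
            this.orderedSteps = orderedSteps;
        }

        // Called when the process starts and again each time a user task
        // completes; finished services no-op, so every run converges on the
        // first step that still has work to do.
        public boolean run(String processId) {
            for (Step step : orderedSteps) {
                if (step.invoke(processId) == StepResult.HALTED) {
                    return false; // wait for the user task, then re-run
                }
            }
            return true; // reached the end of the flow
        }
    }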
You only get into the challenge of aligning dependencies between your services and the process engine (and other components) if you tightly couple the orchestration engine with the services. That happened to me many times in the past, too. If you separate the engine (called a remote process engine in Camunda 7, and the only architecture in Camunda 8), then you are not influenced by its dependencies. Try, for instance, the Camunda Run distribution with the external task pattern, or C8 SaaS, to get to a cleaner, decoupled architecture. See Bernd Ruecker's reasoning here.
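With the external task pattern, a worker is a plain Java process talking REST to the engine. A minimal sketch using the Camunda 7 external task client (the URL and topic name are examples):

    import org.camunda.bpm.client.ExternalTaskClient;

    public class ReportWorker {
        public static void main(String[] args) {
            // Talks REST to a remote engine (e.g. Camunda Run); the worker has
            // no dependency on the engine's Spring or library versions.
            ExternalTaskClient client = ExternalTaskClient.create()
                    .baseUrl("http://localhost:8080/engine-rest")
                    .asyncResponseTimeout(20000) // long polling
                    .build();

            client.subscribe("generate-report") // topic from the BPMN service task
                    .lockDuration(60000)
                    .handler((externalTask, externalTaskService) -> {
                        // ... invoke your microservice here ...
                        externalTaskService.complete(externalTask);
                    })
                    .open();
        }
    }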
Details will depend on your specific requirements, but I would definitely advise anyone against building a homegrown solution. There are enough options on the market, and those times are over. Requirements grow over time, there are security vulnerabilities to be aware of and to fix, etc. It means high maintenance, no market for resources, no synergies; you would need to maintain proprietary knowledge in the company and could not achieve the same level of quality and feature richness as a more broadly used solution can. For a list of options, see for instance Bernd Ruecker's articles. Among the available options, I would personally prefer an orchestrator that uses a graphical process-modelling approach based on the BPMN 2 standard. It helps clarity, knowledge transfer, and business-IT alignment, and the standard is a vendor-independent skill set.
There is no need to build your own. Use the temporal.io open-source project. Besides the Java SDK, it supports Go, TypeScript/JavaScript, Python, and PHP.
The project started at Uber in 2016. Hundreds of companies use it for mission-critical applications.
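To give a feel for it: with the Temporal Java SDK the process model is plain code, and waiting, retries, and state are handled by the engine. A minimal sketch (the claim-handling names are invented for illustration):

    import io.temporal.activity.ActivityInterface;
    import io.temporal.activity.ActivityOptions;
    import io.temporal.workflow.Workflow;
    import io.temporal.workflow.WorkflowInterface;
    import io.temporal.workflow.WorkflowMethod;
    import java.time.Duration;

    @WorkflowInterface
    public interface ClaimWorkflow {
        @WorkflowMethod
        void process(String claimId);
    }

    @ActivityInterface
    interface ClaimActivities {
        void validate(String claimId);
        void calculatePayout(String claimId);
        void notifyCustomer(String claimId);
    }

    class ClaimWorkflowImpl implements ClaimWorkflow {
        private final ClaimActivities activities = Workflow.newActivityStub(
                ClaimActivities.class,
                ActivityOptions.newBuilder()
                        .setStartToCloseTimeout(Duration.ofMinutes(5))
                        .build());

        @Override
        public void process(String claimId) {
            activities.validate(claimId);        // each call is durable and retried
            activities.calculatePayout(claimId);
            activities.notifyCustomer(claimId);
        }
    }

A worker process registers the workflow and activity implementations against a task queue; the activity implementations are where your existing microservice calls go.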

Measure average web app response time from the client side over a long period of time

My company has over a hundred users of a specific CRM web application, which is provided to us as a service by another company.
The users of this application are very dissatisfied with its average response time, and I need to find a way to gather metrics over a certain period of time (let's say a week) to prove to the service provider that they are really providing a bad service.
If the application were mine, I would get the metrics from New Relic or some other equivalent monitoring service, but since it is not, I'm looking for something that can do some sort of client-side monitoring.
I already checked PageSpeed from Google and YSlow from Yahoo, but both are only useful when you want to test the application for a few seconds. They are not meant for the long-term monitoring I need.
Would anybody know a way to get this kind of monitoring from a client side perspective?
LoadRunner is free of charge for 50 users, but what you really need is not a test tool but a synthetic user monitor, which runs every n minutes and pulls the stats. You can build it yourself using LoadRunner 12, JMeter, or any other HTTP sampling technology. You could also use a service like Gomez for sampling, or mPulse from SOASTA for tracking every page component across all users.
Keep in mind that your browser's developer tools will time all of the components of the request to give you page times, as will Dynatrace for the web client.
If you have access to the web server, then consider configuring the web server logs to capture the W3C time-taken field, which will track every request. Depending on the server, the level of granularity can be down to the microsecond on each and every request.
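Once time-taken is being captured, turning a log into an average is trivial. A rough sketch, assuming an IIS-style W3C log with a #Fields: directive and time-taken in milliseconds:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Arrays;
    import java.util.List;

    // Averages the time-taken column of a W3C extended log. Lines starting
    // with # are directives; the #Fields: directive names the columns.
    public class TimeTakenAverage {
        public static void main(String[] args) throws IOException {
            int timeTakenIndex = -1;
            long total = 0, count = 0;
            for (String line : Files.readAllLines(Paths.get(args[0]))) {
                if (line.startsWith("#Fields:")) {
                    List<String> fields =
                            Arrays.asList(line.substring(8).trim().split(" "));
                    timeTakenIndex = fields.indexOf("time-taken");
                } else if (!line.startsWith("#") && timeTakenIndex >= 0) {
                    String[] cols = line.split(" ");
                    total += Long.parseLong(cols[timeTakenIndex]);
                    count++;
                }
            }
            System.out.printf("requests=%d, avg time-taken=%.1f ms%n",
                    count, count == 0 ? 0.0 : (double) total / count);
        }
    }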
You could also look at a service like LiteSquare, which can process those web logs and provide ammunition for server changes to improve performance, on a no-gain, no-charge model.
One (expensive) solution would be to use LoadRunner's endurance-test feature. Check here for a demonstration.
Another tool is Oracle OATS.
JMeter is a free tool, though I'm not sure whether it's reliable enough to run for a whole week.
These are load-generation tools, so if you are testing as a single client, you should carefully choose your load amount (e.g., one user).
Last but not least, you could create your own web-service client and a cron job that runs it at your specified times of day and logs the access time.
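A minimal version of such a probe, using Java's built-in HTTP client (the URL is a placeholder; append the output to a CSV file from cron and you have a week of evidence):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;

    // Synthetic-user probe: fetch one CRM page and print a CSV line with the
    // wall-clock response time. Run from cron (or Task Scheduler) every few
    // minutes.
    public class ResponseTimeProbe {
        public static void main(String[] args) throws Exception {
            String url = args.length > 0 ? args[0] : "https://crm.example.com/login";
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(10))
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();

            long start = System.nanoTime();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            // timestamp, HTTP status, milliseconds
            System.out.printf("%s,%d,%d%n",
                    java.time.Instant.now(), response.statusCode(), elapsedMs);
        }
    }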
If what you want is to get data from their server, that is impossible without hacking into it. All you can do is monitor the website as a client, using some of the tools above, make a report, and present it to them. But even then they could challenge your bandwidth, your test method, etc.
I recommend that you negotiate with them to share their logs and to prove that their system can support a certain amount of load. If you are a customer of theirs, you can file a complaint or evaluate other offers.
Dynatrace was already mentioned in combination with load testing. As you said you want to monitor your live system, I want to bring Dynatrace up again. Most of the time it is used for live-system monitoring to understand what end users are actually doing. It is also available as a 30-day trial, so there is no need to buy it; use it for your sanity check: http://bit.ly/dttrial

Performance Testing Methodology

I'm looking for a "concrete" methodology to pinpoint the performance bottlenecks of a service provided through a web application. I want a holistic approach that includes testing of the network, the database, and the web application.
Suppose you are in front of a web application that allows you to download PDF files once you are logged in to your company network.
You access the application with a browser.
The end-user requirement is that the web application must allow PDF files (up to 5 MB in size) to be downloaded in no more than 1 minute.
Some technical details:
- The application consists of a database, a document management system (e.g., Alfresco), and pieces of Java code.
- A user authenticates by providing a username and password to the application; the application in turn sends them to the LDAP server (deployed on another physical server). A Java servlet does this work and additionally queries the DB to determine the user's role (a user can be an administrator, a reader, or a writer).
- An authenticated user accesses a search page; after searching for a document, the file can be downloaded. The search works like this: the user fills in some fields (e.g., the name of the document), the fields are sent to the document management system, which performs the actual search for the file and returns the results to the application.
When the user clicks the download button, the application retrieves the document from the document management system.
The underlying network should be 1 Gbit Ethernet with some routers/bridges and a load balancer; we have broad knowledge of the network topology.
My question is: if there is a performance bottleneck somewhere (in the network, in the web application, e.g., poor coding) that violates the above requirement (1-second download time), how can we discover it? Which element should we start from? For instance, should we try to understand network performance first, then the document management system, and finally the whole system (application, network, database)? How should we incrementally increase the number of download requests?
I'm looking for a methodology; I've already read:
http://www.agileload.com/performance-testing/performance-testing-methodology/test-methodology
http://msdn.microsoft.com/en-us/library/bb924375.aspx
What performance testing methodology are you using for your webapps?
All of them contain nice suggestions, but I want a more practical methodology with reference to the testing of web applications.
Thank you in advance
Is it one minute or one second for a 5MB file? Can you post a diagram of how the various pieces are connected?
There is a way to determine how network latency and application processing contribute to the total response time.
It requires instrumenting the browser and the other components that make up the complete system, i.e., writing code in JavaScript, Java, C/C++, Perl, Python, etc. and embedding it into each application component so that the components can report events to a central collector.
If instrumentation cannot easily be added to the components, the alternative is to insert event-collecting proxies between components and have them report events to the central collector. You can determine and factor out delays due to the proxies by running a few tests with and without the proxies in the path.
Once the events arrive at the central collector, you get good visibility into how the response time is made up.
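As a sketch of what that instrumentation might look like in one of the Java components (the collector URL and the JSON shape are assumptions, not a standard; clocks on all components must be synchronized for the combined timeline to make sense):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Each tier reports (component, event, requestId, timestamp) to a central
    // collector, which can then reconstruct where the response time is spent.
    public final class EventReporter {
        private static final HttpClient HTTP = HttpClient.newHttpClient();
        private static final String COLLECTOR = "http://collector.internal:9000/events";

        public static void report(String component, String event, String requestId) {
            String json = String.format(
                    "{\"component\":\"%s\",\"event\":\"%s\",\"requestId\":\"%s\",\"ts\":%d}",
                    component, event, requestId, System.currentTimeMillis());
            HttpRequest req = HttpRequest.newBuilder(URI.create(COLLECTOR))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(json))
                    .build();
            // fire-and-forget, so the instrumentation does not distort the timings
            HTTP.sendAsync(req, HttpResponse.BodyHandlers.discarding());
        }
    }

    // Usage inside, e.g., the download servlet:
    //   EventReporter.report("webapp", "download-start", requestId);
    //   ... fetch the document from the DMS ...
    //   EventReporter.report("webapp", "download-end", requestId);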

Java EE App Design

I am writing a Java EE application that is supposed to consume SAP BAPIs/RFCs using JCo and expose them as web services to other downstream systems. The application needs to scale to huge volumes, on the order of tens of thousands of simultaneous users.
I would like to have suggestions on how to design this application so that it can meet the required volume.
It's good that you are thinking of scalability right from the design phase. Martin Abbott and Michael Fisher (of PayPal/eBay fame) lay out a framework called the AKF Scale Cube for scaling web apps. The main principle is to scale your app along three axes.
X-axis: cloning of services/data so that work can easily be distributed across instances. For a web app, this implies the ability to add more web servers (clustering).
Y-axis: separation of work by responsibility, action, or data. In your case, for example, you could serve different API calls from different servers.
Z-axis: separation of work by customer or requester. In your case you could say requesters from region 1 will access server 1, requesters from region 2 will access server 2, etc.
Design your system so that you can follow all three of the above if you need to. But when you initially deploy, you may not need to use all three methods.
You can check out the book "The Art of Scalability" by the same authors: http://amzn.to/oSQGHb
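To make the three axes concrete, here is a toy routing sketch (all region, API, and host names are made up):

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicInteger;

    // Pick a cluster by requester region (Z-axis), a server group by API
    // (Y-axis), then round-robin across clones (X-axis).
    public class AkfRouter {

        private static final Map<String, Map<String, List<String>>> CLUSTERS = Map.of(
                "region-1", Map.of(
                        "orders",  List.of("http://r1-orders-a", "http://r1-orders-b"),
                        "billing", List.of("http://r1-billing-a")),
                "region-2", Map.of(
                        "orders",  List.of("http://r2-orders-a"),
                        "billing", List.of("http://r2-billing-a", "http://r2-billing-b")));

        private final AtomicInteger counter = new AtomicInteger();

        public String route(String region, String api) {
            List<String> clones = CLUSTERS.get(region).get(api);   // Z, then Y
            return clones.get(Math.floorMod(counter.getAndIncrement(),
                    clones.size()));                               // X
        }
    }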
A definitive answer is not possible, but based on the information you provided, this does not seem to be a problem as long as your application is stateless, so that it only forwards requests to SAP and returns the responses; in that case it maintains no state at all. If it comes to, e.g., asynchronous message handling, temporary database storage, or session-state management, it becomes more complex. If there is indeed no need to maintain state, you can easily scale your application out to dozens of application servers without changing your application architecture.
In my experience, this is not necessarily the case when it comes to SAP integration. Think of a shopping cart you want to fill based on products available in SAP: you may want to maintain this cart in your application and only submit the final cart to SAP, since otherwise you end up building an e-commerce application inside your backend.
Most important is that you reduce CPU utilization in your application to avoid a too-large cluster, and that you reduce all kinds of I/O wherever possible, e.g., small SOAP messages to reduce network I/O.
Furthermore, I recommend designing a proper abstraction layer on top of JCo, including the JCO.PoolManager for connection pooling. You may also need a well-thought-out authorization concept if you work with a connection pool managed by only one technical user.
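A minimal sketch of such an abstraction, assuming JCo 3 (where pooling is configured per destination rather than through the JCo 2.x JCO.PoolManager) and a hypothetical destination named SAP_BACKEND; BAPI_FLIGHT_GETLIST is SAP's demo BAPI:

    import com.sap.conn.jco.JCoDestination;
    import com.sap.conn.jco.JCoDestinationManager;
    import com.sap.conn.jco.JCoException;
    import com.sap.conn.jco.JCoFunction;

    public final class SapGateway {

        // Hypothetical destination name; host, credentials (typically one
        // technical user), and pool settings such as
        // jco.destination.pool_capacity live in SAP_BACKEND.jcoDestination.
        private static final String DESTINATION = "SAP_BACKEND";

        // Real code would map the result tables to DTOs and translate
        // JCoException into application-level errors.
        public int countFlights(String airline) throws JCoException {
            JCoDestination dest = JCoDestinationManager.getDestination(DESTINATION);
            JCoFunction fn = dest.getRepository().getFunction("BAPI_FLIGHT_GETLIST");
            if (fn == null) {
                throw new IllegalStateException("BAPI not found in repository");
            }
            fn.getImportParameterList().setValue("AIRLINE", airline);
            fn.execute(dest); // connection is borrowed from and returned to the pool
            return fn.getTableParameterList().getTable("FLIGHT_LIST").getNumRows();
        }
    }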
Just some (not well structured) thoughts...

Performance monitoring all layers of a system

I use several load-testing tools (LoadRunner, JMeter, NeoLoad) to performance-test different applications. I'm wondering whether it is possible to monitor all layers of an application stack. For example, say I have the following data chain:
Loadbalancer <-x-> Application Server <-x-> RMI <-x-> Java Application <-x-> MQ <-x-> Legacy application <-x-> Database
The points I have marked with an x in the chain are where I am interested in monitoring, for example, average response times.
Obviously we could simply create a wrapper around all endpoints to gather the statistics for us, and maybe import them into LoadRunner or other load-testing tools to sit alongside the tools' built-in performance statistics, but maybe there are tools/applications that already do this?
If not, how should we proceed in order to gather these kinds of statistics?
The standard for this was supposed to be Application Response Measurement (ARM). It was a cross-language set of APIs that did just what you are looking for. The issue is that the products implementing this spec all tend to be big, expensive, "enterprise"-level monitoring tools. Think multi-week installs, consultants, more infrastructure, and lots of buzzwords.
Still, if this is a mission-critical app with a mission-critical budget, that may be what you need. But you may be able to build your own that does just enough without too much effort. A quick search turns up at least one open-source ARM implementation if you still want to use that API.
Another option is simply to have transactions you can run against each tier of the system to check general responsiveness. For example, you can have a static web page on the LB, a no-op tx on the app server, a "hello" servlet in the Java app, put a message directly on the queue, etc. During a performance/load test, these could be hit directly by the load-testing tool, or you could write a wrapper servlet/application call that does this as a single HTTP (RMI?) call. Running these a few times a minute won't add much load to the system, but it should help you pinpoint which tier is slower. The nice thing about this approach is that it also works in production; just watch out for security issues.
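The per-tier probes can be trivial. For the Java app tier, for instance, something like this hypothetical servlet is enough, and the load-testing tool just times the round trip:

    import java.io.IOException;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // No-op probe: answers immediately, so its round-trip time approximates
    // the cost of traversing the stack down to the Java app and back.
    @WebServlet("/probe/hello")
    public class HelloProbeServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            resp.setContentType("text/plain");
            resp.getWriter().write("OK " + System.currentTimeMillis());
        }
    }

Hitting the static LB page and this servlet in the same script and subtracting the two timings gives you a rough per-tier cost.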
For a single-user kind of test, where you know you have a problem (e.g., this tx is "slow"), I have also had pretty good luck with network tracing. It's very tedious, but when you aren't sure which tier is slow, starting a network trace on a few machines and running a single tx usually gives a good idea of what the system is doing.
I have handled this decomposition a number of ways in the past. The first is at a very low level, using protocol-analyzer dumps to find the points in time where a conversation leaves tier X and enters tier Y. The second method is log examination across the various tiers. Something that can make your examination quite useful in this case is a common log server for all of your components (syslog, Rsyslog, etc.) and a nice log-parsing tool, such as the freely available Microsoft Log Parser. The third method is using an application's audit trail stored in the database. You may find this when working on enterprise-service-bus-style applications, which have a consumer/producer model and a bus to pass information rather than a direct connection. The audit trails I have seen are typically stored in a database and allow tracking an individual transaction through the entire application infrastructure. Your load balancer, as a network device, may be out of the hunt on this one.
Note: if you go the protocol-analyzer or log route, be sure to synchronize all of your source information devices to a common time server. Having one of your collectors (analyzer, app log) off on a timestamp basis can be a real hair-pulling experience when you get to the analysis phase.
As to how you move your collected data into LoadRunner, that part is very mechanical. The Analysis program supports an interface for importing external data points. The format is very specific and is documented in both the help and the online docs. This import process works very well; I often have to use it to collect statistics from hosts that I do not have direct monitoring access to but which need to be included as part of the monitored test infrastructure.
James Pulley
Moderator (YahooGroups LoadRunner, Advanced-Loadrunner; GoogleGroups lr-LoadRunner; Linkedin LoadRunner, LoadRunnerByTheHour; SQAForums LoadRunner, WinRunner)
