How many concurrent users does your appliance successfully serve? (GSA QPS)

Hi fellow GSA developers,
Just wanted to know, in your experience, what model of GSA you are using, how much concurrent search request load your appliance serves successfully, and the total number of documents you have.
I know each and every environment is different, but one can scale the numbers proportionally and get a sense of the capability of the GSA black box.
I'm calling the GSA a black box since you can never find out the physical memory or any other hardware spec, nor can you change it. The only way to scale is to buy more boxes :)
Note: the question is about the GSA as a search engine, not from the portal perspective. That is, I'm only concerned with the GSA's QPS rather than a custom portal's QPS; custom portals are, well, custom, and they are only as good as their design.

We use two GSAs with Software Version 7.2, arranged in a GSA^n "cluster". The index holds about 600,000 documents, and as all of them are protected, the GSA has to spend quite a lot of effort determining which user is allowed to see which document.
Each of the two GSAs is guaranteed to perform 50 queries per second. We once did a load test, and as some of the queries completed in less than a second and thereby freed up the "slot" for incoming queries, we were able to process 140 queries per second for a noticeably long time.
99% of our queries complete in less than a second, and as we have a rather complex permission structure (users with lots of group memberships), I would say this is a good result.
Like #BigMikeW already said: to get your own figures you should do a load test. Google Support provides a script which can exhaust the GSA and tell you at which QPS rate it started failing (it will simply start returning HTTP 5xx status codes).
And talking of "black box": you actually can find out the hardware specs. All of the GSAs I have seen so far (T3 and T4) have a Dell service tag. When you enter that tag on Dell's site you will find out what is inside the box. But that's pointless, because you can't modify any of it ;-) It only becomes interesting if you use a GSA model that can be repurposed.

This depends on a lot of factors beyond just what model/version you have.
Are the requests part of an already authenticated session?
Are you using early or late binding?
How many authentication mechanisms are you using?
What's the flex authz rule order?
What's the permit/deny ratio for the results?
Any numbers you get in response to this question will have no real meaning for any other environment. My advice would be to load test your own environment and use those results for capacity planning.

With the latest software, the GSA has 50 threads dedicated to search responses. This means it can be responding to 50 requests at any given time. If searches take 0.5 seconds on average, you can average about 100 QPS.
If they take longer, you'll see this figure reduced. The GSA will also queue up a few requests before responding with the appropriate HTTP response saying the server is overloaded.
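The thread-count arithmetic above is just Little's Law (concurrency = throughput × latency) rearranged. A minimal sketch, using the 50-thread figure from the answer; the latencies are purely illustrative:

```python
def max_qps(worker_threads: int, avg_latency_s: float) -> float:
    """Little's Law rearranged: sustainable QPS = concurrency / avg latency."""
    return worker_threads / avg_latency_s

print(max_qps(50, 0.5))  # 0.5 s average latency -> 100.0 qps
print(max_qps(50, 1.0))  # slower queries halve the ceiling -> 50.0 qps
```

The same formula also explains the 140 QPS load-test result: many sub-second queries free their slots early, raising effective throughput above the rated figure.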

Related

Use traditional post, Ajax post or Channel API for my own Like button?

I'm developing a mobile app with 40 million users per day.
The app will show articles to the user that they can choose to read, no image, just plain text. The user can pull to refresh to get new articles.
I would like to add a like button to each individual article (my own like button, not Facebook's). Assume each user clicks Like 100 times per day; that equals 40M × 100 = 4,000M like requests per day.
I'm a newbie with no experience with big projects. What is the best approach that suits my project? I found that the Google Channel API costs 0.0001 dollars per channel created, which is 80M × 0.0001 = 8,000 USD per day (assuming 2 connections per person), which is quite expensive. Is there another way to do this, e.g. Ajax or a traditional POST? My app doesn't need real-time. Which one consumes fewer resources? Can someone please guide me; I really need help.
I plan to use Google App Engine for this project.
A small difference in efficiency would multiply to a significant change in operational costs at those volumes. I would not blindly trust theoretical claims made by documentation. It would be sensible to build and test each alternative design and ensure it is compatible with the rest of your software. A few days of trials with several thousand simulated users will produce valuable results at a bearable cost.
Channels, Ajax and conventional web requests are all feasible at the conceptual level of your question. Add in some monitoring code and compare the results of load tests at various levels of scale. In addition to performance and cost, the instrumentation code should also monitor reliability.
I highly doubt your app will get 40 million users a day, and doubt even more that each of those will click Like ten times a day.
But I don't understand why clicking Like would result in a lot of data transfer. It's a simple Ajax request, which wouldn't even need to return anything more than an empty response, with a 200 status code for success and a 400 for failure.
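A minimal sketch of the kind of lightweight handler described above, written as a plain WSGI app. The in-memory counter and the `article` query parameter are hypothetical stand-ins for a real datastore write:

```python
from urllib.parse import parse_qs

like_counts = {}  # article_id -> count; a real app would persist this

def like_app(environ, start_response):
    """Count a Like and return an empty body: 200 on success, 400 otherwise."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    article_id = params.get("article", [None])[0]
    if environ.get("REQUEST_METHOD") != "POST" or article_id is None:
        start_response("400 Bad Request", [("Content-Length", "0")])
        return [b""]
    like_counts[article_id] = like_counts.get(article_id, 0) + 1
    start_response("200 OK", [("Content-Length", "0")])
    return [b""]
```

The empty response body keeps per-click transfer down to HTTP headers, which is the point the answer makes: the "data transfer" per Like is tiny.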
I would experiment with different options on a small scale first, to get some data from which you can extrapolate your costs. However, a simple Ajax request with a lightweight handler is likely to be cheaper than the Channel API.
If you are getting 40m daily users, reading at least 100 articles, then making 100 likes, I'm guessing you will have something exceeding 8bn requests a day. On that basis, your instance costs are likely to be significant before even considering a like button. At that volume of requests, how you handle the request on the server side will be important in managing your costs.
Using tools like AppStats, Chrome Dev Tools and Apache Jmeter will help you get a better view on your response times, instance & bandwidth costs and user experience before you scale up.

What's the correct Cloudwatch/Autoscale settings for extremely short traffic spikes on Amazon Web Services?

I have a site running on amazon elastic beanstalk with the following traffic pattern:
~50 concurrent users normally.
~2000 concurrent users for 1 to 2 minutes when a post is made to our Facebook page.
Amazon Web Services claims to be able to scale rapidly to challenges like this, but the "Greater than x for more than 1 minute" setup in CloudWatch doesn't appear to be fast enough for this traffic pattern.
Usually within seconds all the EC2 instances crash, killing all CloudWatch metrics, and the whole site is down for 4 to 6 minutes. So far I've yet to find a configuration that works for this scenario.
Here is the graph of a smaller event that also killed the site:
Are these links posted predictably? If so, you can use Scaling by Schedule; alternatively you might change the DESIRED-CAPACITY value of the Auto Scaling group, or even trigger as-execute-policy to scale out right before your link is posted.
Do you know you can have multiple scaling policies in one group? So you might have special Auto Scaling policy for your case, something like SCALE_OUT_HIGH which adds say 10 more instances at once. Take a look at as-put-scaling-policy command.
Also, you need to check your code and find bottlenecks.
Which HTTP server do you use? Consider switching to nginx, as it's much faster and consumes fewer resources than Apache. Try using memcached; a NoSQL store like Redis is a fine option for high read/write loads as well.
The suggestion from AWS was as follows:
We are always working to make our systems more responsive, but it is challenging to provision virtual servers automatically with a response time of a few seconds, as your use case appears to require. Perhaps there is a workaround that responds more quickly or that is more resilient when requests begin to increase.
Have you observed whether the site performs better if you use a larger instance type or a larger number of instances in the steady state? That may be one method to be resilient to rapid increases in inbound requests. Although I recognize it may not be the most cost-effective, you may find this to be a quick fix.
Another approach may be to adjust your alarm to use a threshold or a metric that would reflect (or predict) your demand increase sooner. For example, you might see better performance if you set your alarm to add instances after you exceed 75 or 100 users. You may already be doing this. Aside from that, your use case may have another indicator that predicts a demand increase; for example, a posting on your Facebook page may precede a significant request increase by several seconds or even a minute. Using CloudWatch custom metrics to monitor that value and then setting an alarm to Auto Scale on it may also be a potential solution.
So I think the best answer is to run more instances at lower traffic and use custom metrics to predict traffic from an external source. I am going to try, for example, monitoring Facebook and Twitter for posts with links to the site and scaling up straight away.

How do I figure out my RAM requirements for Cloud Hosting?

I'm new to everything that is 'the cloud.'
I will be developing a website/platform that will have around 15,000,000 estimated monthly visitors after the first year of production.
I'm assuming that the site will have 5 page views per visitor, and 100kb of data transfer per page.
I've contacted several cloud hosting companies, but they tell me that I need to have 'hardware requirements.'
Since I'm rather clueless about IT stuff, I'd like to know:
What are the factors that need to be analyzed in order to determine:
How many servers are required
vCPUs per server required
RAM per server required
Total storage per server required
Big thanks in advance!
I don't agree with the other answer, as it's nearly total guesswork, as will be anything you can generate yourself.
The only surefire way to know is to get some hardware, stick your application on it, and run some load testing to see if you can reach the traffic level you want, with a certain amount of free overhead on the servers. Only then will you know what you need. No one else can answer this question, as every application is different. This is your application; only you can test it.
The data given won't help much in determining the numbers you want, but based on my experience I'll try to help you with the analysis.
15,000,000 visits a month means about 700K visits a day (assuming approximately 30-35% of visits are by repeat visitors).
700K × 5 = 3.5 million page views a day.
Assuming a 14-hour active period, typical for single-timezone sites, that's about 70 requests/sec.
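The traffic arithmetic above, spelled out as code. The 700K daily-visit figure and the 14-hour active window are the answer's own assumptions:

```python
daily_visits = 700_000       # derived from 15M monthly visits with repeat visitors
pages_per_visit = 5
active_hours = 14            # single-timezone site: traffic concentrated in the day

page_views_per_day = daily_visits * pages_per_visit            # 3,500,000
avg_req_per_sec = page_views_per_day / (active_hours * 3600)   # ~69-70 req/s
print(page_views_per_day, round(avg_req_per_sec))
```

Note this is an average over the active window; the real peak will be higher, so any load test should target a multiple of this figure.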
With a user base this big, one thing you surely need is a high-performance DB server with one slave.
Configuration of these DB servers:
Enough memory that the whole active data set plus indexes fits in RAM (no swapping or thrashing should happen). Calculate this based on what you will store per user and for how long.
Use reliable storage like RAID10 (higher read/write bandwidth).
Provision enough storage, and make sure it's elastic (like AWS EBS).
Make the frontend app servers lightweight and horizontally scalable. Put them behind a load balancer (use a software load balancer like nginx or HAProxy). You should be able to add as many as you need to reach your goal.
For the load balancer and frontends, use servers with 4 CPUs and 4-8 GB RAM.
How much load each frontend can take needs to be tested with a load testing method and realistic test data.
Reduce load on the database/persistent store using in-memory and/or persistent caches like memcached, Membase, or Redis. Start with 8 GB servers and add more as you feel the need.
I have not discussed DB partitioning; do that only when you feel the need for it. Do not over-invest at the start.
With 15M users a month this setup should be enough, but again it all depends on your (1) memory footprint and (2) amount of active data.
I tried to answer as much as possible. Comment on any points you disagree with or want to discuss further.

How to decide on what hardware to deploy web application

Suppose you have a web application, no specific stack (Java/.NET/LAMP/Django/Rails, all good).
How would you decide on which hardware to deploy it? What rules of thumb exist when determining how many machines you need?
How would you translate parameters such as concurrent users, simultaneous connections, daily hits, and DB read/write ratio into a decision on how much, and which, hardware you need?
Any resources on this issue would be very helpful...
Specifically - any hard numbers from real world experience and case studies would be great.
Capacity Planning is quite a detailed and extensive area. You'll need to accept an iterative model with a "Theoretical Baseline > Load Testing > Tuning & Optimizing" approach.
Theory
The first step is to decide on the business requirements: how many users are expected at peak usage? Remember, these numbers are usually inaccurate by some margin.
As an example, let's assume that all the peak traffic (in the worst case) comes over 4 hours of the day. So if the website expects 100K hits per day, we don't divide that over 24 hours, but over 4 hours instead. My site now needs to support peak traffic of 25K hits per hour.
This breaks down to about 417 hits per minute, or 7 hits per second. And this is on the front end alone.
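The peak-window arithmetic above can be sketched as:

```python
daily_hits = 100_000
peak_hours = 4  # assume all peak traffic lands in a 4-hour window, not 24

hits_per_hour = daily_hits / peak_hours   # 25,000
hits_per_minute = hits_per_hour / 60      # ~417
hits_per_second = hits_per_minute / 60    # ~7
print(hits_per_hour, round(hits_per_minute), round(hits_per_second))
```

The choice of peak window is the sensitive assumption here: halving it to 2 hours doubles the per-second target your system must support.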
Add to this the number of internal transactions such as database operations, any file i/o per user, any batch jobs which might run within the system, reports etc.
Tally all these up to get the number of transactions per second, per minute etc that your system needs to support.
This gets further complicated when you have requirements such as "average response time must be under 3 seconds", which means you have to factor in network latency, firewalls, proxies, etc.
Finally - when it comes to choosing hardware, check out the published datasheets from each manufacturer such as Sun, HP, IBM, Windows etc. These detail the maximum transactions per second under test conditions. We usually accept 50% of those peaks under real conditions :)
But ultimately the choice of the hardware is usually a commercial decision.
Also, you need to keep a minimum of 2 servers at each tier (web / app / even DB) for failover clustering.
Load testing
It's recommended to have a separate reference testing environment throughout the project lifecycle and post-launch so you can come back to run dedicated performance tests on the app. Scale this to be a smaller version of production, so if Prod has 4 servers and Ref has 1, then you test for 25% of the peak transactions etc.
Tuning & Optimizing
Too often, people throw some expensive hardware together and expect it all to work beautifully. You'll need to tune the hardware and OS for various parameters such as TCP timeouts; these are published by the software vendors, and this has to be done once the software is finalized. Set these tuning params on the Ref environment, test, and then decide which ones you need to carry over to Production.
Determine your expected load.
Setup a machine and run some tests against it with a Load testing tool.
See how close you are: if you only accomplished 10% of the peak load, then with some margin for error you know you are going to need load balancing. Design and implement a solution and test again. Make sure your solution is flexible enough to scale.
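The extrapolation step above can be sketched as a quick calculation: estimate the fleet size from what one box achieved in testing, plus headroom. The 30% headroom figure is an illustrative choice, not a rule:

```python
import math

def servers_needed(peak_load, load_per_server, headroom=0.3):
    """Estimate fleet size: peak load plus headroom, divided by per-box capacity."""
    return math.ceil(peak_load * (1 + headroom) / load_per_server)

# One box handled 1,000 req/s in testing (10% of a 10,000 req/s peak):
print(servers_needed(10_000, 1_000))  # 13 servers with 30% headroom
```

Treat the result as a starting point only: as the next answer notes, scaling is rarely linear, so re-test with the larger fleet rather than trusting the multiplication.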
Trial and error is pretty much the way to go. It really depends on the individual app and usage patterns.
Test your app with a sample load and measure performance and load metrics. DB queries, disk hits, latency, whatever.
Then get an estimate of the expected load when deployed (ask the domain expert); you have to consider average load AND spikes.
Multiply the two and add some just to be sure. That's a really rough idea of what you need.
Then implement it, keeping in mind you usually won't scale linearly and you probably won't get the expected load ;)

Recommendations for Web application performance benchmarks

I'm about to start testing an intranet web application. Specifically, I have to determine the application's performance.
Please could someone suggest formal/informal standards for how I can judge the application's performance.
Use a tool for stress and load testing. If you're using Java, take a look at JMeter. It provides different methods to test your application's performance. You should focus on:
Response time: how fast your application responds to normal requests. Test some read/write use cases.
Load test: how your application behaves at high-traffic times. The tool will submit several requests (you can configure this properly) during a period of time.
Stress test: can your application operate over a long period of time? This test pushes your application to its limits.
Start with these; if you're interested, there are other kinds of tests.
"Specifically, I have to determine the application's performance...."
This comes full circle to the issue of requirements: the captured expectations of your user community for what is considered reasonable and effective. Requirements have a number of components:
General response time: "Under a load of ..., the site shall have a general response time of less than x, y% of the time..."
Specific response times: "Under a load of ..., credit card processing shall take less than z seconds, a% of the time..."
System capacity items: "Under a load of ..., CPU / network / RAM / disk shall not exceed n% of capacity..."
The load profile: the mix of the number of users and transactions under which the specific, objective measures are collected to determine system performance.
You will notice that the response times and other measures are not absolutes. Taking a page from Six Sigma manufacturing principles, the cost to move from 1 exception in a million to 1 exception in a billion is extraordinary, and the cost to move to zero exceptions is usually not bearable by the average organization. What is considered an acceptable response time for an application unique to your organization will likely be entirely different from that of a highly commoditized, public internet-facing offering. For highly competitive solutions, response time expectations on the internet are trending toward the 2-3 second range, where user abandonment picks up severely. This has dropped over the past decade from 8 seconds, to 4 seconds, and now into the 2-3 second range. Some applications, like Facebook, shoot for almost imperceptible response times in the sub-one-second range for competitive reasons. If you are looking for a hard standard, it just doesn't exist.
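A requirement of the form "under x seconds, y% of the time" is straightforward to check against measured samples. A sketch using a nearest-rank percentile; the latency list is purely illustrative:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (seconds)."""
    ordered = sorted(samples)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

def meets_sla(samples, threshold_s, pct):
    """True if at least pct% of samples are at or below threshold_s."""
    within = sum(1 for s in samples if s <= threshold_s)
    return within / len(samples) * 100 >= pct

latencies = [0.8, 1.1, 0.9, 2.5, 1.0, 0.7, 1.2, 0.95, 3.1, 0.85]
print(percentile(latencies, 95))        # the tail sample: 3.1
print(meets_sla(latencies, 2.6, 90))    # "under 2.6 s, 90% of the time" -> True
```

Percentile-based targets are exactly why the text warns against chasing absolutes: tightening the percentage from 90% to 99.9% can multiply the cost of meeting the same threshold.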
Something that will help your understanding is to read through a couple of industry benchmarks for style, form, function.
TPC-C Database Benchmark Document
SpecWeb2009 Benchmark Design Document
Setting up a solid set of performance tests which represents your needs is a non-trivial matter. You may want to bring in a specialist to handle this phase of your QA efforts.
On your tool selection, make sure you get one that can
Exercise your interface
Report against your requirements
You or your team has the skills to use
You can get training on and will attend with management's blessing
Misfire on any of the four elements above and you may as well have purchased the most expensive tool on the market and hired the most expensive firm to deploy it.
Good luck!
To test the front end, YSlow is great for getting statistics on how long your pages take to load from a user perspective. It breaks down into stats for each specific HTTP request, the time it took, etc. Get it at http://developer.yahoo.com/yslow/
Firebug, of course, is also essential. You can profile your JS explicitly or in real time by hitting the profile button, making optimisations where necessary and seeing how long all your functions take to run. This changed the way I measure the performance of my JS code. http://getfirebug.com/js.html
Really, the big thing is response time, but other indicators I would look at are processor and memory usage versus the number of concurrent users/processes. I would also check that everything performs as expected under normal and then peak load. You might encounter scenarios where higher load causes application errors due to various requests stepping on each other.
If you really want to get detailed information you'll want to run different types of load/stress tests. You'll probably want to look at a step load test (a gradual increase of users on system over time) and a spike test (a significant number of users all accessing at the same time where almost no one was accessing it before). I would also run tests against the server right after it's been rebooted to see how that affects the system.
You'll also probably want to look at a concept called HEAT (Hostile Environment Application Testing). This shows what happens when some part of the system goes offline. Does the system degrade gracefully? This should be a key standard.
My one really big piece of advice is to establish what the system is supposed to do before doing the testing. The main reason is accountability. Get people to agree on what the system is supposed to do and then test whether that holds true. This is key because people will immediately see the results, and those will become the baseline benchmark for what is acceptable.
