Does JMeter performance testing affect live websites - jmeter

I have been using my blog to learn JMeter and I wondered how risky this could be. For example, if I load test a site (e.g. one hosted with limited resources) with 100,000 users or more, wouldn't it affect the website? Are there mechanisms to prevent such a scenario?

Yes, it will affect your website. Performance benchmarking tools do introduce load and are designed to stress test applications, websites and databases. The idea is to do this before you deploy your application, website and other systems, so that you know what your theoretical limits are. Also keep in mind that monitoring a system's performance with a tool adds extra load of its own, so the numbers you get from these tools are not always 100% accurate. It's better to know the theoretical limitations than not to know at all.
One mechanism you can use to stop such tools being used in a malicious way is to run an intrusion detection system (IDS) on the network edge. These systems will probably identify this type of activity as a DoS attack of sorts and then block the originating IP.
DDoS attacks make things a lot more difficult to cope with. This is where thousands of machines each make requests to the same target at a volume small enough not to be picked up by the IDS as a DoS attack. The IDS just sees many small amounts of traffic, requests, etc. coming from a lot of addresses. This makes it very hard to tell a real request from one that is part of an attack.
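To illustrate why a per-address threshold catches a single load generator but not a distributed one, here is a minimal sketch of a sliding-window counter keyed by source IP. This is not how any particular IDS works; the class name, the function name and the threshold values are all made up for the example.

```python
from collections import defaultdict, deque


class PerIpRateDetector:
    """Flags a source IP once it exceeds max_requests within window_seconds.

    Hypothetical illustration only; real IDS products use far richer signals.
    """

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def is_suspicious(self, ip, now):
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


def flagged_ips(detector, requests):
    """Return the set of IPs flagged over a list of (ip, timestamp) pairs."""
    return {ip for ip, ts in requests if detector.is_suspicious(ip, ts)}
```

Feeding 1,000 requests from one address within a second into a detector with a limit of 100 per second flags that address. The same 1,000 requests spread across 1,000 addresses flag nothing, which is the DDoS problem described above.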


What differentiates virtual users from real users when performing a load test?

Can anyone point out the difference between a virtual user and a real user?
In the context of web load testing, there are a lot of differences. A virtual user is a simulation of a human using a browser to perform some actions on a website. One company offers what they call "real browser users", but they, too, are simulations - just at a different layer (browser vs HTTP). I'm going to assume you are using "real users" to refer to humans.
Using humans to conduct a load test has a few advantages, but is fraught with difficulties. The primary advantage is that there are real humans using real browsers - which means that, if they are following the scripts precisely, there is virtually no difference between the test and real traffic. The list of difficulties, however, is long. First, it is expensive. The process does not scale well beyond a few dozen users in a limited number of locations. Humans may not follow the script precisely...and you may not be able to tell if they did. The test is likely not perfectly repeatable. It is difficult to collect, integrate and analyze metrics from real browsers. I could go on...
Testing tools which use virtual users to simulate real users do not have any of those disadvantages - as they are engineered for this task. However, depending on the tool, they may not perform a perfect simulation. Most load testing tools work at the HTTP layer - simulating the HTTP messages passed between the browser and server. If the simulation of these messages is perfect, then the server cannot tell the difference between real and simulated users...and thus the test results are more valid. The more complex the application is, particularly in the use of JavaScript/AJAX, the harder it is to make a perfect simulation. The capabilities of tools in this regard vary widely.
There is a small group of testing tools that actually run real browsers and simulate the user by pushing simulated mouse and keyboard events to the browser. These tools are more likely to simulate the HTTP messages perfectly, but they have their own set of problems. Most are limited to working with only a single browser (e.g. Firefox). It can be hard to get good metrics out of real browsers. This approach is far more scalable than using humans, but not nearly as scalable as HTTP-layer simulation. For sites that need to test <10k users, though, the web-based solutions using this approach can provide the required capacity.
There is a difference.
It depends on your JMeter setup. If you are testing from a single box, your I/O is limited: you can't imitate, say, 10K users with JMeter on a single box. You can do small tests with one box. If you use multiple JMeter boxes, that's another story.
Also, what about cookies? Do you store cookies while load testing your app? That does make a difference.
A virtual user is an automated emulation of a real user's browser and HTTP requests.
Thus a virtual user is designed to simulate a real user. It is also possible to configure virtual users to run through what we think a real user would do, but without all the delay between getting a page and submitting a new one.
This allows us to simulate a much higher load on our server.
The key differences between virtual user simulations and real users are the network between the server and their device, as well as the actual actions a real user performs on the website.
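To put a rough number on "a much higher load", a back-of-the-envelope sketch: in a closed pool of users, each one cycles through a request, a wait for the response, and a pause (think time), so the request rate is roughly users divided by the cycle length. The function name and the figures below are assumptions for illustration, and the model assumes response time stays constant, which it usually does not once the server is under stress.

```python
def requests_per_second(users, response_time_s, think_time_s):
    """Approximate steady-state request rate of a closed pool of users.

    Each user repeats: send a request, wait response_time_s for the reply,
    then pause think_time_s before sending the next request.
    """
    cycle = response_time_s + think_time_s
    if cycle <= 0:
        raise ValueError("response_time_s + think_time_s must be positive")
    return users / cycle


# 100 users with 9.5 s of think time vs. the same 100 users with none.
with_think_time = requests_per_second(100, 0.5, 9.5)
without_think_time = requests_per_second(100, 0.5, 0.0)
```

Under these assumed numbers the same 100 virtual users produce 10 requests per second with think time and 200 without it.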

Load testing consumer websites

I am looking to load test a consumer website. I have tried using JMeter. However, in that case, all the requests originate from one machine. What I really want is to simulate real users across the country, some on low-speed dialup connections and others on high-speed ones.
What are the best practices to follow in such a scenario?
JMeter supports distributed testing - so if you're already comfortable with it as a tool, you can use it to power these distributed requests from an arbitrary number of machines, too.
Note that all machines run the exact same test plan, so either your plan should have some random selection of fast/slow environments, or you may be able to select which profile to use based on some system properties.
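The "random selection or select by property" idea can be sketched outside JMeter as follows. This is plain Python, not JMeter configuration; the profile names, shares and bandwidth figures are invented for the example, and the `forced` argument stands in for whatever per-machine property you would pass to each load generator.

```python
import random

# Hypothetical profiles: name -> (share of users, bandwidth in kbit/s).
PROFILES = {
    "dialup": (0.2, 56),
    "broadband": (0.8, 10_000),
}


def pick_profile(rng, forced=None):
    """Return a profile name: the forced one if given, else a weighted random pick."""
    if forced is not None:
        if forced not in PROFILES:
            raise KeyError(forced)
        return forced
    names = list(PROFILES)
    weights = [PROFILES[name][0] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

Every machine runs the same code; one that should represent only dialup users passes `forced="dialup"`, while the others let the weighted pick decide per simulated user.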
You might want to consider using a 3rd party service such as Load Impact.
You've expressed two different but related concerns - traffic coming from a single machine and simulating various end-user network speeds.
Why is the first one a concern for your testing? Unless you have a load balancer that uses the IP address as part of its load-distributing algorithm, the vast majority of servers (and application platforms) don't care that all the traffic is coming from a single machine (or IP address). Note also that you can configure the OS of your load generator for multiple IP addresses and the better load-testing tools will make use of those IP addresses so that traffic comes from all of them.
For the simulation of end-user network speeds, again, the better load-testing tools will do this for you. That can give you a pretty good feel for how the bandwidth will affect the page load time, without actually using distributed load generation. But tools frequently do not account for latency. That is where there is no substitute for distributing your load generation.
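A deliberately simplified model shows why bandwidth throttling alone misses the effect of latency. It treats load time as transfer time plus one round-trip delay per sequential round trip, and ignores TCP slow start, parallel connections and server time. The function name and all the numbers are assumptions for illustration.

```python
def estimated_load_time_s(page_bytes, bandwidth_bps, rtt_s, round_trips):
    """Rough page load estimate: time to move the bytes plus time spent waiting.

    page_bytes    total size of the page and its resources
    bandwidth_bps link speed in bits per second
    rtt_s         network round-trip time in seconds
    round_trips   number of sequential request/response exchanges
    """
    transfer = page_bytes * 8 / bandwidth_bps
    waiting = rtt_s * round_trips
    return transfer + waiting


# Same 500 kB page on the same 10 Mbit/s link, near vs. far from the server.
nearby = estimated_load_time_s(500_000, 10_000_000, 0.02, 20)
far_away = estimated_load_time_s(500_000, 10_000_000, 0.2, 20)
```

With these assumed figures the nearby user waits about 0.8 s and the distant one about 4.4 s, even though both have identical bandwidth.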
You can do distributed testing with JMeter, though it can be a bit cumbersome. How many locations do you need? Without knowing more about what you need, my first suggestion would be to choose a tool that has features designed specifically to do what you need. I will pimp our product, Web Performance Load Tester, but there are certainly other options. Load Tester can emulate various end-user connection speeds and has built-in support for generating load from Amazon EC2 (US east and west coast and Dublin, with Asia coming soon). Once you set up an EC2 account, you can be running your first test from the cloud in 10 minutes.

Is Perl the fastest way to write a high performance page?

I was inspired by Slashdot: I have heard that it uses very few servers to support a lot of users with fast response times. There is also a website named Slashcode, though I'm not sure whether Slashdot uses its source code.
I am wondering if Perl is the best choice for writing a high performance web page. Is it true that using Apache or IIS adds a lot of overhead?
Any ideas, books, papers, or tutorials?
I'm going to assume that by "high performance" you mean both in the real time taken to produce a page and also how many it can serve concurrently.
The programming language isn't as important as your servers and algorithms. You may want to look into the C10k problem: the challenge of getting a single web server to handle more than 10,000 concurrent connections, and the technologies and refined techniques developed to meet it. Things like the Nginx and lighttpd web servers and the Varnish cache came out of this line of work.
Big wins come from using a very light, very fast, very modular web server (Apache and IIS ain't it) with a very light, very fast cache in front of it to avoid having to process the same thing twice. For a high concurrency server, even caching for a few seconds can save you hundreds or thousands of processes. By chopping up a static page into a series of AJAX requests you can cache the more static bits and pieces independently of the bits that change frequently.
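The "caching for a few seconds" idea can be sketched as a tiny time-to-live cache in front of the code that builds a page. This is only an in-process illustration with made-up names; in practice the cache would more likely be a separate piece of software sitting in front of the web server.

```python
import time


class MicroCache:
    """Caches rendered responses for a short, fixed time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._entries = {}   # key -> (expires_at, value)

    def get_or_render(self, key, render):
        """Return the cached value for key, calling render() only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = render()
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

With a 2 second TTL, a page requested 1,000 times per second is rendered once every 2 seconds instead of 2,000 times, at the cost of users seeing content that is up to 2 seconds old.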
Instead of using mod_blah that embeds your program into a web server, use FastCGI or similar that puts your programs into their own little application servers. This allows them to run independent of the web server, possibly on remote machines and with load balancing. This lets you easily scale your processing power.
Eventually you're going to micro-optimize really important bits of your application code to the point where the language matters, but you can focus on the really important bits rather than having to do the whole project solely according to raw performance.
Regardless of how fast your code is, at some point the bottleneck will stop being your code, and start being the web server itself.
As long as you're not using the CGI interface[1] to talk to the web server, the language isn't going to have a noticeable impact on performance in 99% of cases. The exceptions are those in which you're doing heavy back-end processing rather than simply grabbing something out of a database, lightly massaging it, and sending it off to the user - and, if you are doing that kind of thing, you're likely better off doing it asynchronously if possible and stuffing the results into a database to be lightly massaged and viewed later.
The reason is, quite simply, that network connection and data transfer times will be so much longer than your program's execution time that it's not even funny. If it's taking 2 seconds to establish a network connection to the server and do the data transmission in each direction, nobody is going to care whether the processing on the server adds 0.1s or 0.2s on top of that 2s of network activity.
[1] Note that I am talking here about the vanilla CGI "start up a new process to service each incoming request" model, not the Perl CGI module (CGI.pm). There are ways to use the CGI module while also making use of a long-lived process which handles multiple requests over its lifetime.
Architecture and system design are more important than language choice for a high traffic app.
But selecting a language is not the first thing you should do, unless you are planning to write everything from the ground up.
You should be selecting a toolset.
If you want to have something soonest, look at existing web applications. What meets your needs? How customizable is it? Does it meet your performance/scalability requirements? If so, the language you use will be the language your app uses.
If you can't find a good match in existing apps, look at different frameworks: Catalyst, Rails, Squatting, Camping, Jifty, Django. There's a nice list of them on Wikipedia.
You should be able to find a framework that will do the job, many of them. Pick some contenders and choose one. The language you use will be the language your framework uses.
There's really no such thing as a "high performance page". That's like asking what the fastest car is (and if you watch enough Top Gear, you know that's not a simple answer). You have to think about what you actually want to do (i.e. the particular task), what you have to do to make that happen, and which tools would work best for that.
Are you going to have a lot of people doing a lot of small things, or fewer people doing really big things? Is it all going to happen at once (i.e. spikes), or is it going to be constant demand? Are you sending back small chunks of data or serving up really large files?
Suppose that every portion were as fast as possible. It's a fantasy for sure, but consider it anyway. Now that everything is fast as possible, rank every part according to how relatively fast they are. What's the slowest part? Is it disk access? Network IO? Socket availability?
If you aren't at the point where you're already thinking about this, the language probably isn't that important beyond your skill with it.
There are a lot of books on web performance out there. :)
This post on Server Fault suggests that you could write an extension module for nginx to serve dynamic content.
Such modules need to be compiled to native machine code, so most likely are faster than running Perl.
I don't believe it would be faster than other common choices such as PHP, Python, Ruby, Java, or C#.

Requiring web redirects for downloads in games

I've noticed that several games suffer from poor download speeds (<10Kb/s) when you don't use a web redirect (an Apache, IIS, ... server). Is there a programmatic reason for this that I don't see, or is it for other reasons I'm missing?
It's been assumed that this is intended to avoid bogging down the server, but I'm looking to see if there is a less obvious, code-based reason.
There are three reasons, all linked.
Real web servers serve static content better. Fact. They're optimised for standard download-style traffic and separating that jazz from core game server code makes a lot of sense to a lot of developers. Plus they tend to use comparatively fewer resources and have higher levels of flexibility than a custom-made HTTP server.
Moving HTTP downloads off the game server keeps those important CPU cycles doing what you want them to: letting people frag the hell out of each other... If you can shuffle off the non-critical traffic to another, cheaper and/or clustered server, you keep your game playing smoothly.
As I hinted above, you can cluster or CDN the HTTP traffic, something you can't do (for fairly obvious reasons) with the game servers. This would only really apply on really busy networks but it's a good way to manage your traffic if you're dealing with a lot of potential downloads and they're all mission-critical.
Some game servers do handle it themselves, but, as you've noticed, most do it at a nauseatingly slow rate, again for the second reason above: resources. Bandwidth is almost as important as CPU, so uploads are heavily rate-limited to keep players in the game going at top speed.
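One common way to cap upload bandwidth like this is a token bucket. The sketch below is a generic illustration, not how any particular game server does it; the class name and the 10 kB/s figure are assumptions.

```python
class TokenBucket:
    """Allows sends at up to rate_bytes_per_s, with bursts up to capacity_bytes."""

    def __init__(self, rate_bytes_per_s, capacity_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start with a full bucket
        self.last = 0.0               # timestamp of the previous call

    def allow(self, nbytes, now):
        """Return True and spend tokens if nbytes may be sent at time now."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A server would call `allow()` before sending each chunk of a download and delay the chunk when it returns False, so game traffic keeps the rest of the link.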

High traffic web sites

What makes a site good for high traffic?
Does it have more to do with the hardware/infrastructure, or with how one writes the software, using Java as the example, if it matters?
I'm wondering how the software changes just because it is expected that billions of users will be on the site, if at all.
My understanding up to this point is that the code doesn't change, but that it is deployed on multiple servers, in a cluster, and a load balancer distributes the load, so really, on any one server/deployment, the application is just as any other standard application/website.
I highly recommend reading Jeff Atwood's blog post on micro-optimization. In earlier posts he talks about how this site was created and the hardware upgrades it has had (in short: better hardware helps only to the extent that it is faster), but the real speed of a site comes from good programming, and that article seems like it should answer a lot of your questions about programming for high traffic.
Hardware is cheap. Programming is expensive.
There are some programming techniques to make sure your code can handle multiple simultaneous views/updates. If you're using an existing framework, much of that work is (hopefully) done for you, but otherwise you're going to find stuff that worked for a few hundred hits an hour on one server isn't going to work when you're getting hundreds of thousands of hits and you have to deploy multiple load balancing machines.
Well, it is primarily an issue of hardware scaling but there are a few things to keep in mind with respect to the software involved in scaling. For example, if you are on a server farm, you'll need to work with a session management server (either via SQL Server or via a state server - which has implications in that your session variables need to be serializable).
But, in the bigger picture, there are a variety of things that you would want to do to scale to an enterprise level. For example, it becomes particularly important that you abstract out your database calls to a DAL because you may well need to adopt the use of a middleware package for high volume environments.
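The earlier point about session variables needing to be serializable is language-independent: anything kept in a session on a server farm has to survive a round trip to the shared store. A small Python sketch of that check, using `pickle` as a stand-in for whatever serialization the session server uses (the function name is made up):

```python
import pickle
import threading


def can_be_stored_remotely(session):
    """Return True if the session survives a serialize/deserialize round trip."""
    try:
        restored = pickle.loads(pickle.dumps(session))
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return restored == session


# Plain data round-trips; a live resource such as a lock does not.
plain_session = {"user_id": 42, "cart": ["book", "pen"]}
bad_session = {"user_id": 42, "lock": threading.Lock()}
```

Code that works on one server with in-memory sessions can fail on a farm for exactly this reason, so it is worth checking before scaling out.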
