I have a set of Locust tests that I'm using to test a REST API.
These Locust tests perform extremely poorly when running on EC2 instances.
I developed the tests within a Vagrant VM environment using one VM as the Locust master and one VM as the Locust slave.
Without any effort to tweak the tests, my Vagrant dev environment can run up to 200 users and generate 50 rps.
But if I run the exact same Locust tests on EC2 and hit the exact same API host, the performance is terrible.
Example using a c3.large instance as the master and a c3.2xlarge instance as the slave:
If I specify 300 users, Locust will generate 100 or so at the specified spawn rate, then add new users only very slowly. It usually slows to the point where it never actually reaches 300 users. And at best I get between 5 and 8 rps.
I'm not sure where to start looking for the discrepancy in performance. Is it the Locust master or the slave? Is it something specific to EC2?
JusDockin, have you checked the file descriptor limit on your instances?
ulimit -n
ulimit -Hn
This may directly affect the number of HTTP requests you're able to generate.
I can easily generate 200 users on a t2.small instance (I didn't try to generate more, as Amazon doesn't look too kindly on load testing unless you warn them in advance). Here's what I have in my user data:
echo "* soft nofile 40000" >> /etc/security/limits.conf
echo "* hard nofile 40000" >> /etc/security/limits.conf
I have a 32 GB, i7 machine running Windows 10 and I am trying to generate a 10k VU concurrent load via JMeter. For some reason I am unable to go beyond 1k concurrent users, and I start getting BindException or socket connection errors. Can someone help me with the settings to achieve that kind of load? Also, if someone is up for freelancing, I am happy to consider that as well. Any help would be great, as I am nearing production and am unable to load test this use case. If you have any other tools that I can use effectively, that would also help.
You have reached the limit of one computer, so you must run the test in a distributed environment across multiple machines.
You can set up JMeter's distributed testing in your own environment, or use BlazeMeter or another cloud-based load testing tool.
We can use BlazeMeter, which gives us an easy way to handle our load tests. All we need to do is upload our JMX file to BlazeMeter. We can also upload a consolidated CSV file with all the necessary data, and BlazeMeter will take care of splitting it depending on the number of engines we have set.
On BlazeMeter we can set the number of users, or the combination of engines (slave systems) and threads, that we want to apply to our tests. We can also configure additional values like multiple locations.
1k concurrent sounds low enough that it's something else... 1024 is also the default open file descriptor limit on a lot of Linux distributions, so maybe try raising the limit.
ulimit -Sn
will show you your current limit and
ulimit -Hn
will show you the hard limit you can reach before you have to touch configuration files. Edit /etc/security/limits.conf as root and set something like
yourusername soft nofile 50000
yourusername hard nofile 50000
Here, yourusername must be the username of the user you run JMeter as.
After this you will probably have to log out and back in (or restart) for the changes to take effect. If you're not on Linux, I don't know how to do this; you will have to google it :D
Recommendation:
As a k6 developer I can propose it as an alternative tool, but running 10k VUs on a single machine will be hard with it as well. Every VU takes some memory, at least 1-3 MB, and this goes up the larger your script is. But with 32 GB you could still run up to 1-2k VUs and use http.batch to make concurrent requests, which might simulate the 10k VUs depending on what your actual workflow is like.
I managed to run the stages sample with 300 VUs on a single 3770 i7 CPU and 4 GB of RAM in a virtual machine, and got 6.5k+ rps to another virtual machine on a neighboring physical machine (the latency is very low). So maybe 1.5-2k VUs with a somewhat more interesting script and some higher latency, as this will give the Go runtime time to actually run GC while waiting for TCP packets. I highly recommend using discardResponseBodies if you don't need the response bodies, and fetching them only for the requests where you actually do. This helps a lot with the memory consumption of each VU.
I want to schedule the times at which an EC2 instance starts/stops. Is it possible to do this without using cron, AWS Data Pipeline, or Lambda? I am trying to create a shell script to automate this task. Please suggest!
Here is a Lambda function which will help you start or stop any instance that has the correct tags with start and stop values.
https://gist.github.com/gregarious-repo/b75eb8cb34e9b3644542c81fa7c7c23b
You just need to create a new function with the proper permissions and trigger it with a CloudWatch event that executes every minute.
Once you have it up and running, tag your instances with start and stop keys whose values are the times you want.
You can use a shell script along with the AWS CLI to get this done.
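For illustration, a minimal sketch using the AWS CLI together with the at scheduler, which avoids cron entirely (the instance ID and times are placeholders):

#!/bin/sh
# Hypothetical instance ID - replace with your own.
INSTANCE_ID="i-0123456789abcdef0"
# Queue a start at 08:00 and a stop at 20:00 with at(1) - no cron involved.
echo "aws ec2 start-instances --instance-ids $INSTANCE_ID" | at 08:00
echo "aws ec2 stop-instances --instance-ids $INSTANCE_ID" | at 20:00

Note that at only queues a one-off job, so the script would need to be re-run (or re-queue itself) if you want a daily schedule.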
In case your EC2 instance is in an auto scaling setup, or if you can place it in one, you can use the scheduled auto scaling feature as well. You can read about it here:
http://docs.aws.amazon.com/autoscaling/latest/userguide/schedule_time.html
Note: the auto scaling feature itself is free; you pay only for the resources that are actually used. You can place a single-instance setup in an auto scaling group with the minimum size set to 0 and use scheduled auto scaling to achieve the same result.
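As a rough sketch, such a schedule can be created from a shell script with the AWS CLI (the group name, action names, and recurrence values are placeholders):

# Scale up to one instance at 08:00 UTC every day...
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name daily-start \
  --recurrence "0 8 * * *" \
  --min-size 1 --max-size 1 --desired-capacity 1
# ...and back down to zero at 20:00 UTC.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name daily-stop \
  --recurrence "0 20 * * *" \
  --min-size 0 --max-size 0 --desired-capacity 0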
I am using Locust to load test an application. I wrote and tested the script on my local Ubuntu system, and all went well.
I created an EC2 instance, using an Amazon Linux image, and after adjusting the file limits in /etc/security/limits.conf I loaded up Locust, and things went normally for a small test (a simple GET test, just to check the plumbing: 2,000 users, 20 hatch rate).
However, when I loaded up a larger test, 8,000 users at a 40 hatch rate, I noticed that somewhere around 3,000 or 4,000 users the hatch rate appeared to slow down, adding just 4-5 rather than the 40 new "users" at a time. So it took a long time to reach 8,000. Is that expected behavior? If not, any idea what the problem might be?
What Locust calls "users" are actually gevent-spawned TaskSets. This means you're spawning thousands of greenlets in a single Python process, which incurs a great deal of overhead managing those greenlets.
If you want to spawn thousands of TaskSets, I'd recommend running Locust in distributed mode. You can have many slaves running on the same hardware, or distribute your slaves across many instances. Google has written up a neat article and open sourced some Kubernetes containers for just such a purpose. We wrote our own Docker container with Alpine and a heavily modified Locust, our ratio of slaves to TaskSets ended up being 1:100. The ratio of slaves to instances heavily depends on what instance size you get.
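For reference, a distributed run with the older master/slave flags this era of Locust used looks roughly like this (the locustfile name and master address are placeholders):

# On the master node:
locust -f locustfile.py --master
# On each slave node, pointing at the master's address:
locust -f locustfile.py --slave --master-host=192.0.2.10

You can start several slave processes per machine to spread TaskSets across CPU cores.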
I have created a test plan for creating a user profile.
I want to run my test plan for 100 users. When I run it for 10 users with a ramp-up time of 2 seconds it runs successfully, but when I try it with 100 users or more it fails, even with a ramp-up time of 40 seconds for 100 users.
I am not able to understand what the problem may be.
In my test plan the thread users are differentiated by ID.
Thanks in advance.
It's a broad question; this behavior can be caused by:
Your application under test can't handle the load of 100 threads. Check the logs for errors and make sure the application/web server and/or database configuration allows 100+ concurrent connections. You can also check the "Latency" metric to see whether the problem lies with the infrastructure or the application itself.
Your load generator machine can't create 100 concurrent threads. If so, you'll need to consider JMeter Distributed Testing (see the sketch after this list).
Your script isn't optimized, e.g. it uses memory-consuming listeners like "View Results Tree", graph listeners, or regular expression extractors. Try following the JMeter Performance and Tuning Tips guide and see whether that resolves your issue.
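As a sketch, a non-GUI distributed run from the controller machine looks something like this (the test plan name and generator addresses are placeholders; each remote host must be running jmeter-server):

jmeter -n -t testplan.jmx -R 192.0.2.11,192.0.2.12 -l results.jtl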
I agree with Dmitri; the reason could be one of the above three.
One more thing you can try:
You can run JMeter in GUI mode to validate your script, and after validation run it in non-GUI mode, which saves a lot of memory and CPU processing (the GUI is the heaviest part of JMeter).
You can run your JMeter script in non-GUI mode like this:
jmeter -n -t testplan.jmx -l results.jtl -H proxy_host -P proxy_port
Generally, a 100-user test can be carried out successfully on a single dual-core machine with 2 GB of RAM (the load generator in your case).
Some more things you can look at to find the actual bottleneck:
1. Check the application server logs (the server on which your application is hosted). If there are failures there, look at the performance counters on the server (CPU, memory, network, etc.) to see whether anything is overloaded (if the server runs Windows, check with perfmon; if Linux, try sar). If something is overloaded, your app server can't take the load of 100 users; try tuning it further.
2. Check the load generator's performance counters (JVM heap usage, CPU, memory, etc.). If the JVM heap size is too small, try increasing it; if other counters are overloaded, try distributed load testing. A sampling sketch follows this list.
3. Remove unwanted/heavy listeners and assertions from the script.
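For example, on a Linux load generator the counters from points 1 and 2 can be sampled like this (the PID is a placeholder; jstat ships with the JDK):

# CPU utilization, 5 samples at 1-second intervals:
sar -u 1 5
# Memory usage, same sampling:
sar -r 1 5
# JVM heap/GC usage of the JMeter process (replace 12345 with its PID):
jstat -gcutil 12345 1000 5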
maybe this will help :)
Update (2013 Oct, 1st):
Today I finally found out what the issue was. After debugging for hours with xdebug and trying to execute multiple scripts, I noticed that the body of the script was not the problem. When executing a test worker with a sleep of 10-20 seconds, I saw that the CPU was idling most of the time, and so deduced that what was consuming most of the CPU was bootstrapping Symfony.
My scripts were executed very quickly and then killed, only to spawn a new script, and so on. I fixed it by adding a do{}while() loop that exits after a random number of seconds (random so that the workers don't all restart at the same time).
I've reduced the load from an average of 35-45% to an average of 0.5-1.5%. That's a HUGE improvement. To summarize: Symfony is bootstrapped once, and after that the script just waits until a random timeout to kill itself and launch a new instance of itself. This avoids scripts hanging, database connections timing out, etc.
If you have a better solution, do not hesitate to share. I'm happy to go from 100% CPU usage (x4 servers, because of the auto-scaling) to less than 1% (and only one server) for the same amount of work; it's even faster now.
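For what it's worth, the relaunch side of this pattern can be a trivial shell wrapper, assuming the worker exits on its own after its random timeout as described (a minimal sketch, using the console command from the update below):

#!/bin/sh
# Relaunch the worker whenever it exits after its random timeout.
while true; do
    ./app/console --env=prod my:command:action
    sleep 1   # brief pause so a crashing worker can't spin the CPU
done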
Update (2013 Sep, 24th):
I just noticed that the console component of Symfony uses the dev environment by default.
I specified prod on the command line, ./app/console --env=prod my:command:action, and the execution time was divided by 5, which is pretty good.
Also, I have the feeling that curl_exec is eating a lot of CPU, but I'm not sure.
I'm trying to debug the CPU usage using xdebug by reading the generated cachegrind files, but there is no reference to CPU cycles used per function or class, only the time spent and the memory used.
If you want to use xdebug in a PHP command line just use #!/usr/bin/env php -d xdebug.profiler_enable=On at the top of the script
If anyone has a tip to debug this with xdebug I'll be happy to hear it ;)
I'm asking this question without real hope.
I have a server that I use to run workers that process some background tasks. This server is an EC2 instance (m1.small) inside an auto-scaling group with a high-CPU alarm set up.
I have something like 20 workers (PHP script instances) waiting for jobs to process. To run the scripts I'm using the console component of the Symfony 2.3 framework.
There is not much happening in each job: fetching data from a URL, looping over the results, and inserting them row by row (~1000 rows per job) into MySQL (an RDS server).
The thing is that with 1 or 2 workers running, the CPU is at 100% (I don't think it's at 100% all the time, but it's spiking every second or so), which causes the auto-scaling group to launch new instances.
I'd like to reduce the CPU usage, which is not justified at all. I was looking at php-fpm (FastCGI), but it looks like it's for web servers only. The PHP CLI wouldn't use it, right?
Any help would be appreciated,
Cheers
I'm running PHP 5.5.3 with the FPM SAPI, and as @datasage pointed out in his comment, this only affects the web-based side of things. If you run php -v on the CLI you'll notice:
PHP 5.5.3 (cli) (built: Sep 17 2013 19:13:27)
So FPM isn't really part of the CLI stuff.
I'm also in a similar situation to yours, except I'm running jobs via Zend Framework 2. I've found that jobs which loop over information can be resource-intensive at times, but I've also found that this was caused by the way I had originally written the loop myself. It had nothing to do with PHP in general.
I'm not sure about your setup, but in one of my jobs, which runs forever, I've found this works out best, and my server load is almost nil:
[root@router ~]$ w
12:20:45 up 4:41, 1 user, load average: 0.00, 0.01, 0.05
USER TTY LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 11:52 5.00s 0.02s 0.00s w
Here is just an "example":
do {
sleep(2);
// Do your processing here
} while (1);
Where it says // Do your processing here, I'm actually running several DB queries, processing files, and running server commands based on the job requirements.
So in short, I wouldn't blame PHP-FPM for your problems; I'd start by looking at how you've written your code and make the necessary changes.
I currently have four jobs which run forever, continuously looking at my DB for jobs to process, and the server load has never spiked. I've even tested this with 1,000 jobs pending.