Performance testing a uwsgi application in a microservice

I have my uwsgi application, which uses CherryPy to expose APIs, deployed as a container pod in Kubernetes. I need to evaluate the performance of the container in order to set resource limits such as CPU and memory.
I am using Grafana to check the CPU and memory usage.
The container resource requests and limits are configured as below,
deployment.yaml
resources:
  requests:
    memory: 256Mi
    cpu: 100m
  limits:
    memory: 512Mi
    cpu: 200m
I am using Locust to post requests in parallel to check the request limit and other metrics, but I am observing something very different.
import invokust

url = "http://<node_ipaddress>:<node_port>/config/"

settings = invokust.create_settings(
    locustfile='locust_file.py',
    host=url,
    num_clients=500,
    hatch_rate=500,
    run_time='1m'
)

load_test = invokust.LocustLoadTest(settings)
load_test.run()
load_test.stats()
locust_file.py
from locust import HttpLocust, TaskSet, task
# assert_that is assumed to come from the assertpy package
from assertpy import assert_that

users = {'name': 'abc'}

class UserTasks(TaskSet):
    def __init__(self, *args, **kwargs):
        super(UserTasks, self).__init__(*args, **kwargs)
        self.payload = users

    def on_start(self):
        pass

    @task
    def index(self):
        response = self.client.post("/SRVCC/Global", data=self.payload)
        print(response.status_code)
        assert_that(200).is_equal_to(response.status_code)
        print("Successfully added user")

class User(HttpLocust):
    task_set = UserTasks
    # wait times are in milliseconds, i.e. 100-200 seconds between tasks
    min_wait = 100000
    max_wait = 200000
I ran with 100 users and it works fine without errors. Later I increased it to 500 users and some of the requests fail with status code 0.
When I check the CPU and memory usage in Grafana I don't see any problem; both stay well within the configured limits.
I cannot understand why some of the requests are failing even though the resource usage is fine.

Is there any specific reason you are using invokust instead of running Locust natively?
It'd be useful to include the following info as well; based on your question alone, it's impossible to determine the answer:
the versions of locust and invokust used
how you invoked the load test
the exact output of your script
Even if you use locust directly, I'd recommend some tweaks:
It looks like you have quite a long minimum wait time, which means your users are not very active. Try reducing the wait time and decreasing the number of users.
Hatching 500 users per second is pretty intense and your machine may struggle with it. Try ramping up gradually, as in the sketch below: https://docs.locust.io/en/stable/running-locust-in-step-load-mode.html
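As a rough illustration (the numbers are only a starting point, not a recommendation), the same invokust setup from the question can be run with fewer users and a much gentler hatch rate:

import invokust

url = "http://<node_ipaddress>:<node_port>/config/"

# Fewer simulated users and a gentler hatch rate; increase these step by step.
settings = invokust.create_settings(
    locustfile='locust_file.py',
    host=url,
    num_clients=100,
    hatch_rate=10,
    run_time='3m'
)

load_test = invokust.LocustLoadTest(settings)
load_test.run()
print(load_test.stats())

In locust_file.py, lowering min_wait and max_wait (for example to 1000 and 2000 ms) makes each simulated user send requests far more often, so fewer users are needed to produce the same load.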

Related

Form Recognizer Heavy Workload

My use case is the following:
Once every day I upload 1000 single-page PDFs to Azure Storage and process them with Form Recognizer via the latest azure-form-recognizer Python client.
So far I'm using the async version of the client and I send the 1000 coroutines concurrently.
tasks = {asyncio.create_task(analyze_async(doc)): doc for doc in documents}
pending = set(tasks)

# Handle retries
while pending:
    # Back off in case of 429 responses
    await asyncio.sleep(1)
    # Wait until all concurrent calls have completed
    finished, pending = await asyncio.wait(
        pending, return_when=asyncio.ALL_COMPLETED
    )
    # Check whether a task raised an exception and register it for a new run
    for task in finished:
        doc = tasks[task]
        if task.exception():
            new_task = asyncio.create_task(analyze_async(doc))
            tasks[new_task] = doc
            pending.add(new_task)
Now I'm not really comfortable with this setup. The main reason is the unpredictable succession of service states within the same iteration: the service can be up, then throw 429, then be up again, which is not deterministic enough for me. I was wondering whether another approach is possible. Do you think I should instead increase the number of transactions progressively, starting with 15 (the default TPS), then 50, then 100, until the queue is empty? Or is there another option?
Thanks
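For what it's worth, the "start at 15 TPS and grow" idea is usually implemented by capping the number of in-flight requests instead of launching all 1000 coroutines at once. A minimal sketch, reusing analyze_async and documents from above; the concurrency limit, retry count, and backoff policy are made-up values to adjust:

import asyncio

MAX_CONCURRENCY = 15   # start around the default TPS; raise gradually if no 429s occur
MAX_RETRIES = 5

async def analyze_with_limit(sem, doc):
    # Retry a single document with exponential backoff on failure (e.g. 429).
    for attempt in range(MAX_RETRIES):
        try:
            async with sem:
                return await analyze_async(doc)   # coroutine from the question
        except Exception:
            await asyncio.sleep(2 ** attempt)     # hypothetical backoff: 1s, 2s, 4s, ...
    raise RuntimeError("Document failed after %d attempts: %r" % (MAX_RETRIES, doc))

async def run_all(documents):
    # The semaphore bounds how many documents are analyzed at the same time.
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    return await asyncio.gather(*(analyze_with_limit(sem, d) for d in documents))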
We need to enable CORS on the storage account and adjust its settings so that Form Recognizer can access the heavy workload.
Follow this procedure to implement the heavy workload in Form Recognizer:
Create a storage account to upload the files. Use page blobs for higher performance, and choose ZRS redundancy for a more robust setup.
Go to CORS and add the required URL: set the Allowed origins to https://formrecognizer.appliedai.azure.com
Go to the containers and upload the documents.
Use the container and blob information as the input for the recognizer. If you work from Form Recognizer Studio, the total size of the documents is taken into account and there is also a limit on the number of characters, so it is suggested to use Python code with the container you created as the input folder.
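If you take the Python route, a minimal sketch looks roughly like the following, assuming the azure-ai-formrecognizer 3.x package and a SAS URL per blob; the endpoint, key, model name, and URL are placeholders:

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Placeholders: substitute your own resource endpoint and key.
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
key = "<your-key>"

client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))

# Analyze one blob by its SAS URL; "prebuilt-document" is only an example model.
poller = client.begin_analyze_document_from_url("prebuilt-document", "<blob-sas-url>")
result = poller.result()
print(len(result.pages), "pages analyzed")

Looping over the blobs in the container (or feeding them through the bounded-concurrency sketch above) keeps you away from the Studio size and character limits mentioned above.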

How to use Sphinx Search with concurrency?

I have a large database (100M rows) indexed by SphinxSearch. Each search takes 0.1-0.5s. However, if I run 10 searches concurrently, they take 20s on average.
Is it the expected behaviour of SphinxSearch?
Should I adjust the config or move to another search engine for concurrency?
My config file is simple:
searchd
{
    listen       = 9312
    listen       = 9306:mysql41
    pid_file     = /var/searchd.pid
    read_timeout = 30
    log          = /var/log/sphinxsearch/searchd.log
    query_log    = /var/log/sphinxsearch/query.log
}
Is it the expected behaviour of SphinxSearch?
It heavily depends on the number of CPUs. If you have more than 10 physical CPUs, then latency degrading from 0.5 sec to 20 sec when increasing the concurrency from 1 to 10 is definitely not expected. In that case, first of all make sure all your CPUs are busy under the concurrent load. If they are not, then, depending on your Sphinx version and multi-tasking mode, let it run with more threads.
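To see how latency actually behaves as concurrency grows, you can hit the SphinxQL listener (the 9306:mysql41 port from the config above) from a small script. A sketch, assuming the pymysql package and substituting your own index name and query:

import time
from concurrent.futures import ThreadPoolExecutor

import pymysql

QUERY = "SELECT id FROM my_index WHERE MATCH('test') LIMIT 10"  # placeholder index/query

def timed_search(_):
    # Each worker opens its own connection to the SphinxQL (MySQL protocol) listener.
    conn = pymysql.connect(host="127.0.0.1", port=9306, user="")
    start = time.time()
    with conn.cursor() as cur:
        cur.execute(QUERY)
        cur.fetchall()
    conn.close()
    return time.time() - start

for concurrency in (1, 5, 10):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_search, range(concurrency)))
    print(concurrency, "concurrent:", round(max(latencies), 3), "s worst case")

If the worst-case latency grows roughly linearly with concurrency while the CPUs stay idle, the queries are being serialized and the multi-tasking settings are the place to look.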
Should I adjust the config or move to another search engine for concurrency?
I recommend Manticore Search as:
it's open source - https://github.com/manticoresoftware/manticoresearch/
it's the only fork of Sphinx and if you are familiar with Sphinx in general it shouldn't be a problem to migrate
hundreds of bugs have been fixed
the multi-tasking mode is completely different (coroutines)

Locust response time for the first request

I am using Locust and my code looks as below:

import time

from locust import TaskSet, task, constant
from locust.contrib.fasthttp import FastHttpUser

class RecommenderTasks(TaskSet):
    @task
    def test_recommender_multiple_platforms(self):
        start = round(time.time() * 1000)
        self.client.get('recommendations', name='Test')
        end = round(time.time() * 1000)
        print(end - start)

class RecommenderUser(FastHttpUser):
    tasks = [RecommenderTasks]
    wait_time = constant(1)
    host = "https://my-host.com/"
When I test with this code, I get the following output times (in milliseconds):
374
62
65
68
64
I am not sure why the very first task alone takes about 300+ ms while the rest are as expected. Because of this, my overall average time also increases. Could you please help me here?
Locust response times are measured from the time the initial request is sent to the server to the time a response is received. By default Locust reuses socket connections when available, but creates new ones if an existing one isn't available. When connecting via HTTPS, a number of things need to be done to set up the connection initially, and the performance of that connection setup generally depends on things the server is doing. You could look into ways of reducing your connection setup time. How to do that will vary widely depending on your stack, but you can find general principles in SO answers like this one:
how to reduce ssl time of website
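If that one-off setup cost is skewing your averages, a common workaround (not part of the answer above, just a pattern to consider) is to issue a throwaway request in on_start so the TLS/connection setup happens before the measured tasks run. A sketch reusing the classes from the question; the 'warmup' name is only there to keep it separate in the stats:

from locust import TaskSet, task, constant
from locust.contrib.fasthttp import FastHttpUser

class RecommenderTasks(TaskSet):
    def on_start(self):
        # Warm up the connection once per user so the first measured request
        # does not pay the TLS/connection-setup cost.
        self.client.get('recommendations', name='warmup')

    @task
    def test_recommender_multiple_platforms(self):
        self.client.get('recommendations', name='Test')

class RecommenderUser(FastHttpUser):
    tasks = [RecommenderTasks]
    wait_time = constant(1)
    host = "https://my-host.com/"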

Neo4j 2.0.0 - Poor performance for dev/test in a virtual machine

I have Neo4j server running inside a virtual machine using Ubuntu 13.10 and I am accessing via REST using Cypher queries. The virtual machine has 4 GB of memory allocated to it.
I've changed the open file count to 40000, set the initial JVM heap to 1G and my neo4j.properties file is as follows:
neostore.nodestore.db.mapped_memory=250M
neostore.relationshipstore.db.mapped_memory=100M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=100M
keep_logical_logs=3 days
node_auto_indexing=true
node_keys_indexable=id
I've also updated sysctl based on the Neo4j Linux tuning guide:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
Since I am testing queries, the basic routine is to run my suite of tests and then delete all of the nodes and run them all again. At the start of each test run, the database has 0 nodes in it. My suite of tests of about 100 queries is taking 22 seconds to run. Basic parameterized creates such as:
CREATE (x:user { email: {param0},
name: {param1},
displayname: {param2},
id: {param3},
href: {param4},
object: {param5} })
CREATE x-[:LOGIN]->(:login { password: {param6},
salt: {param7} } )
are currently taking over 170 ms to execute (and that's the average; the first query takes about 700 ms). During a test run, the CPU in the VM never exceeds 50% and memory usage is at a steady 1.4 GB.
Why would creating a single node in an empty database take 170ms? At this point unit testing is becoming almost impossible since it is so slow. This is my first time trying to tune Neo4j so I'm not really sure how to figure out where the problem is or what changes should be made.
Additional Details
I'm using Go 1.2 to make REST calls to the cypher endpoint (http://localhost:7474/db/data/cypher) of a locally installed Neo4j instance. I'm setting the request headers for content-type to "application/json", accept to "application/json" and "X-Stream" to true. I always return either an array of maps or nothing depending on the query.
It seems like the creates are the problem and are taking forever. For example:
2014/01/15 11:35:51 NewUser took 123.314938ms
2014/01/15 11:35:51 NewUser took 156.101784ms
2014/01/15 11:35:52 NewUser took 167.439442ms
2014/01/15 11:35:52 ValidatePassword took 4.287416ms
NewUser creates two new nodes and one relationship and is taking 167ms, while ValidatePassword is a read-only operation and it completes in 4ms. Also note that the three calls to NewUser are identical parameterized queries. While the creates are the big problem, I'm also a little concerned that Neo4j is taking 4ms to just find a labeled node when there are only 100 nodes in the database.
I do not restart the server in between test runs or delete the database. I issue a single delete all nodes query MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r at the end of the test run. Running the same test suite multiple times back to back does not improve the query times.
Are your 100 queries all the same only with different parameters, or actually 100 different queries?
What you see is actually setup work. The parser has to load the parsing rules initially, which takes a few ms. Also, new queries that have not been seen before are compiled, planned and put in the query cache.
So the first query always takes a bit longer. But as you parameterize, all subsequent ones should be fast.
Can you confirm that?
I think you see the transactional overhead of flushing the transaction to disk.
Did you try to batch more requests into one, i.e. with the transactional endpoint? Or /db/data/batch (but I'd rather use the new tx endpoint /db/data/transaction; see the sketch at the end of this answer).
Did you create an index for your lookup property for your validate query?
Can you do me a favor and test your create query without a label? I found some perf issues when testing that myself earlier this week.
Just ran a test with curl:
for i in `seq 1 10`; do time curl -i -H content-type:application/json -H accept:application/json -H X-Stream:true -d @perf_test.json http://localhost:7474/db/data/cypher; done
I'm getting between 16 and 30 ms per request externally, including starting curl:
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8; stream=true
Access-Control-Allow-Origin: *
Transfer-Encoding: chunked
Server: Jetty(9.0.5.v20130815)
{"columns":[],"data":[]}
real 0m0.016s
user 0m0.005s
sys 0m0.005s
Perhaps it is rather the VM (disk or network) or the cross-vm communication?
Did another test with ab and 1000 requests for both endpoints, got a mean of about 5 ms both times.
https://gist.github.com/jexp/8452037
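To try the batching suggestion, here is a rough sketch that sends several parameterized statements in one call to the Neo4j 2.0 transactional endpoint. It uses Python and requests rather than the Go client from the question, and the statements and parameters are only illustrative:

import requests

# Several parameterized statements batched into a single transaction commit.
payload = {
    "statements": [
        {
            "statement": "CREATE (x:user { email: {email}, name: {name} })",
            "parameters": {"email": "a@example.com", "name": "abc"},
        },
        {
            "statement": "CREATE (x:user { email: {email}, name: {name} })",
            "parameters": {"email": "b@example.com", "name": "def"},
        },
    ]
}

resp = requests.post(
    "http://localhost:7474/db/data/transaction/commit",
    json=payload,
    headers={"Accept": "application/json", "X-Stream": "true"},
)
print(resp.status_code, resp.json())

Batching the whole NewUser operation (two nodes and a relationship, or even several users) into one commit means the transactional flush to disk is paid once per batch instead of once per request.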

Django1.3 multiple gunicorn workers caching problems

I have weird caching problems with the 1.3 version of Django. I probably have something configured wrong, but am not sure what.
A good example is django-avatar, which uses caching and which many people use. Even if I don't have a cache backend defined, the avatar seems to be cached, which by itself would be OK, but it keeps switching back and forth between the last values cached. Example: I upload a new avatar; now on approximately 50% of the requests it will show me the new one, 50% the old one. If I delete the old one I still get it on the site 50% of the time. The only way to fix it is to disable the caching of the avatar by setting it to one second.
First I thought it was because I used django.core.cache.backends.locmem.LocMemCache, which I had never used before, but it even happens when I don't configure a cache backend at all.
I found one similar bug:
Django caching bug .. even if caching is disabled
but my pages render just fine; it's the template tags (for now) that cause the problems in my setup.
I use django 1.3, postgres, nginx, gunicorn 0.12.0, greenlet==0.3.1, eventlet==0.9.16
I just did some more testing and realized that it only happens when I start gunicorn using the config file. If I start it with ./manage.py run_gunicorn everything is fine. Running "gunicorn_django -c deploy/gunicorn.conf.py" causes the problems.
The only explanation I can think of is that each worker gets its own cache (I wonder why, since I did not define a cache).
Update: running ./manage.py run_gunicorn -w 4 also causes the same problems. Therefore I am almost certain that the multiple workers are causing the problems and each worker caches the values separately.
My configuration:
import os
import socket
import sys
PORT = 8000
PROC_NAME = 'myapp_gunicorn'
LOGFILE_NAME = 'gunicorn.log'
TIMEOUT = 3600
IP = '127.0.0.1'
DEPLOYMENT_ROOT = os.path.dirname(os.path.abspath(__file__))
SITE_ROOT = os.path.abspath(os.path.sep.join([DEPLOYMENT_ROOT, '..']))
CPU_CORES = os.sysconf("SC_NPROCESSORS_ONLN")
sys.path.insert(0, os.path.join(SITE_ROOT, "apps"))
bind = '%s:%s' % (IP, PORT)
logfile = os.path.sep.join([DEPLOYMENT_ROOT, 'logs', LOGFILE_NAME])
proc_name = PROC_NAME
timeout = TIMEOUT
worker_class = 'eventlet'
workers = 2 * CPU_CORES + 1
I also tried it without using 'eventlet', but got the same errors.
Thanks for any help.
It is most likely defaulting to the local-memory cache, which means each worker has its own version of the cache in its own memory space. If you hit worker 1 you get a different cache than worker 3. Nginx is spreading the load between the workers, most likely via round-robin distribution, so you are changing workers on each hit, which explains your wacky results.
When you do manage.py run_gunicorn it is most likely running a single worker, and thus only one cache, and that is why you don't see the same results.
Using memcached or something similar is the way to go; see the sketch below.
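If you go the memcached route, a cache shared by all gunicorn workers can be configured in settings.py roughly like this (Django 1.3 CACHES syntax; the host and port are placeholders for your memcached instance, and the backend needs the python-memcached package installed):

# settings.py -- one memcached instance shared by every gunicorn worker
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}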
