I am building an application with InfluxDB. I need to monitor sensor data from many thousands of sensors, and I need to set up alerts and notifications for each one of them. This results in several thousand checks.
When I started creating checks, Influx started choking at about 500 running checks. Is this expected? Am I working against how Influx is supposed to be used here?
Additional info:
My series cardinality is low, <50.
The database in question had no actual data in it, so the checks ran on an empty database.
Influx runs in docker, and the container went from about 2% cpu usage to closer to 400% when the checks were saved.
We are experimenting with Apache Beam (using the Go SDK) and Dataflow to parallelize one of our time-consuming tasks. For a little more context: we have a caching job that takes some queries, runs them against a database, and caches the results. Each database query may take anywhere from a few seconds to many minutes, and we want to run them in parallel for quicker task completion.
Created a simple pipeline that looks like this:
// Create initial PCollection.
startLoad := beam.Create(s, "InitialLoadToStartPipeline")
// Emits a unit of work along with query and date range.
cachePayloads := beam.ParDo(s, &getCachePayloadsFn{Config: config}, startLoad)
// Emits a cache response which includes errCode, errMsg, time etc.
cacheResponses := beam.ParDo(s, &cacheQueryDoFn{Config: config}, cachePayloads)
...
The number of units that getCachePayloadsFn emits is not large; it will mostly be in the hundreds, and at most a few thousand in production.
Now the issue is that cacheQueryDoFn is not executed in parallel; the queries run sequentially, one by one. We confirmed this by adding logs in StartBundle and ProcessElement (logging the goroutine id, process id, start and end times, etc.) in the caching function, which showed no overlap in execution.
We want the queries to always run in parallel, even if there are just 10 of them. From our understanding of the documentation, the runner creates bundles from the overall input, those bundles run in parallel, and elements within a bundle run sequentially. Is there a way to control the number of bundles created from the load, or any other way to increase parallelism?
Things we tried:
Keeping num_workers=2 and autoscaling_algorithm=None. This starts two VMs, but the Setup method that initializes the DoFn runs on only one VM, and that VM is used for the entire load.
Found the sdk_worker_parallelism option here, but we are not sure how to set it correctly. We tried setting it with beam.PipelineOptions.Set("sdk_worker_parallelism", "50"). No effect.
By default, Create is not parallel, and all the DoFns are fused into the same stage as the Create, so they have no parallelism either. See https://beam.apache.org/documentation/runtime/model/#dependent-parallellism for more information on this.
You can explicitly force a fusion break with the Reshuffle transform.
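For example, a minimal sketch against the pipeline above, assuming the Go SDK's beam.Reshuffle transform:

// Break fusion between the payload-producing stage and the expensive
// query stage, so cacheQueryDoFn can be distributed across workers
// instead of being fused with the non-parallel Create.
cachePayloads = beam.Reshuffle(s, cachePayloads)
cacheResponses := beam.ParDo(s, &cacheQueryDoFn{Config: config}, cachePayloads)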
I have been running my JMeter script with the following setup
Users: 100
Loop Controller: 5
I used Loop Controller on the http request where transactions are needed to iterate.
My question is about a particular search request: after 5 successful searches, the succeeding searches display "Customer API. Unable to establish connection".
[Images: the first displays a lower load time, while the second displays a higher load time]
It looks like you have discovered a bottleneck in your application, or at least the application is not properly configured for high loads. Almost a minute of response time for transferring 600 kilobytes is not something I would expect from a well-behaved application.
The error message you're getting is very specific, so I would recommend checking your application logs as the very first step. The remaining steps would be inspecting the application and middleware configuration and ensuring that everything is properly set up for high performance and has enough headroom to operate in terms of CPU, RAM, network, disk, etc.
Context:
My Spring-Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: It receives a constant stream of pubsub messages via push, and makes anywhere from 0 to 5 writes to an associated CloudSQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response-time varies between 50ms and 60sec, probably due to some resource contention.
In order to increase throughput/ decrease resource contention, I'm looking to experiment with the connection pool size per app-instance, as well as the concurrency and max-instances parameters for my cloud run app.
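As a minimal sketch of the pool-size experiment, assuming Spring Boot's default HikariCP pool (the property names are standard Spring Boot; the values are purely illustrative):

# application.properties
# Keep the per-instance pool small so that
# max-instances x pool-size stays below the Cloud SQL connection limit.
spring.datasource.hikari.maximum-pool-size=5
spring.datasource.hikari.minimum-idle=1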
I understand that due to Spring-Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.
Problem:
I'm experiencing problems when deploying a Spring-Boot app to Cloud Run with max-instances set to a value greater than 1:
Instances start, handle a single request successfully, and then produce no more logs.
This happens a few times per minute, leading me to believe that instances get started (cold start), handle a single request, die, and then get started again. They are not being reused as described in the docs (and as happens when I set max-instances to 1). Official docs on concurrency
Instead, I expect 3 container instances to be started, each of which then handles requests according to the concurrency setting.
[Graph: billable container time at max-instances=3]
As shown in the graph, the number of instances fluctuates wildly once the new revision with max-instances=3 is deployed.
The graphs for CPU- and memory-usage also look like this.
There are no error logs. As before with max-instances=1, there are warnings indicating that there are not enough instances available to handle requests (HTTP 429).
Connection Limit of CloudSQL instance has not been exceeded
Requests are handled at less than 10/s
Finally, this is the command used to deploy:
gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated
What could cause this behavior?
Some months ago, in the private alpha, I performed tests and observed the same behavior. After discussion with the Google team, I understood that instances are over-provisioned "just in case": an instance crashes, an instance is preempted, the traffic suddenly increases, ...
The trade-off of this is that you will have more cold starts than your max-instances value. Worse, you will be charged for these over-provisioned cold starts -> this is not a real issue, because Cloud Run has a huge free tier that covers this kind of glitch.
Going deeper into the logs (you can do this by creating a sink of the Cloud Run logs into BigQuery and then querying them), even if there are more instances up than your max-instances, only max-instances of them are active at the same time. I'm not sure I'm being clear: with your parameters, that means that if you have 5 instances up at the same time, only 3 serve traffic at any given point in time.
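As a sketch, a log sink like the following routes Cloud Run logs into BigQuery for querying (the sink, project, and dataset names are placeholders; the BigQuery dataset must already exist):

gcloud logging sinks create my-cloud-run-sink \
  bigquery.googleapis.com/projects/my-project/datasets/cloud_run_logs \
  --log-filter='resource.type="cloud_run_revision"'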
This part is not documented because it evolves constantly to find the best balance between over-provisioning and a lack of resources (and 429 errors).
#Steren #AhmetB can you confirm or correct me?
When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs and will try to scale to that amount. If a sudden burst of requests occurs, Cloud Run will instantiate a larger number of instances in response. This is done in order to adapt to a possible higher number of network requests beyond what it is currently serving, while attempting to take into account how long the existing instances will need to finish serving their current requests. Per the documentation, it is possible for the number of container instances to go above the max-instances value when traffic spikes.
You mentioned that with max-instances set to 1 it was running fine, but later you mentioned it was in fact producing 429s with it set to 1 as well. Seeing 429s together with spiking instance counts could indicate that the amount of traffic is not being handled smoothly.
It is also worth noting that, because of the cold-start time you mention, while an instance is serving its first request(s) the number of concurrent requests is by design hard-set to 1. Only once the instance is fully ready is the concurrency setting you have chosen applied.
Was there some specific reason you chose 3 and 3 for the max-instances and concurrency settings? Also, how was concurrency set when you had max-instances set to 1? Perhaps you could try increasing the concurrency further (max 80) and/or max-instances (upper limit 1000) and see if that removes the 429s.
JMeter tests are run in a master-slave fashion with around 8 slave machines. However, with the remote batching mode set to MODE_STRIPPED_BATCH, I am not able to run tests for more than 64 hours. Throughput is around 450 requests per minute, and per slave machine this results in the creation of JTL files that are around 1.5 GB. All 8 slaves are going to send this to the master (1.5 GB x 8), and the I/O probably gets too much for the master to handle. The master machine has 16 GB of RAM and around 250 GB of disk storage.

I was wondering if the JMeter distributed architecture has any provision to make long-running soak tests possible without unexplained stress on the master machine. Obviously I have the option to abandon the master-slave setup and go for 8 independent nodes; however, in that case I'll run into complications with serving the data CSV files (which I currently serve using the Simple Table Server plugin from the master machine) and with aggregating the result files. Any suggestions, please? It would be great to be able to run tests for at least around 4 days (96 hours or so).
I would suggest going for an independent JMeter workers + external data collector setup.
Actually, JMeter's right-out-of-the-box "distributed scaling" abilities are weak, way outdated & overall pretty ridiculous. As are its data collection/aggregation/processing abilities.
This situation actually puzzles me a lot - mind you, rivals are even worse, so there's literally NOTHING in the field (except for, perhaps, some SaaS solutions trying to monetize on this gap).
But it is what it is...
So that's about why-s, now to how-s.
If I were you, I would:
Containerize the JMeter worker
Equip each container with a watchdog to quickly restart the worker if things go south locally (or probably even on a schedule, to refresh it periodically). Be that an internal one, or an external one like cloud services have - it doesn't matter.
Set up a timeseries database - I recommend InfluxDB, it's an excellent product & it's free in basic version (which is going to be enough for your purposes).
Flow your test results/metrics into that DB - do not collect them locally! You can do it right from your tests with a pretty simple custom listener (the Influx line protocol is ridiculously simple & fast - see the sketch after this list), or you can have an external agent watching the result files as they flow. I just suggest you not use the so-called Backend Listener for the job - it's garbage, it won't shape your data right, so you'd have to do additional ops to bring it into order.
If you shape your test result/metrics data properly, you'll get it already time-synced into a single set - and the further processing options are amazingly powerful!
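To illustrate how simple the line protocol is, here is a hypothetical result line pushed over the InfluxDB 1.x write API (the measurement, tag, and field names are made up; the timestamp is in nanoseconds):

# Format: measurement,tag=value field=value,... timestamp
curl -XPOST 'http://influxdb:8086/write?db=jmeter' \
  --data-binary 'jmeter,label=Search,node=slave-01 rt=532i,success=1i 1669900000000000000'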
My expectation is that you're looking for the StrippedAsynch sampler sender mode.
As per the documentation:
Asynch
samples are temporarily stored in a local queue. A separate worker thread sends the samples. This allows the test thread to continue without waiting for the result to be sent back to the client. However, if samples are being created faster than they can be sent, the queue will eventually fill up, and the sampler thread will block until some samples can be drained from the queue. This mode is useful for smoothing out peaks in sample generation. The queue size can be adjusted by setting the JMeter property asynch.batch.queue.size (default 100) on the server node.
StrippedAsynch
remove responseData from successful samples, and use Async sender to send them.
So on slave node add the following line to user.properties file:
mode=StrippedAsynch
and on the server (slave) nodes define asynch.batch.queue.size to be high enough not to impact JMeter's throughput (i.e. not slow it down) and low enough not to overwhelm the master. I would start with 1000.
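For example, in the user.properties file on each slave (1000 is just a starting point to tune from):

# Queue for the asynchronous sample sender; JMeter's default is 100.
asynch.batch.queue.size=1000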
Another option is using StrippedDiskStore, but you will have to collect the serialized results manually after test completion (and make sure that the slave processes do not shut down, because the results are deleted when a slave process finishes).
You could use JMeter PerfMon Plugin to monitor memory and network usage on master and slaves.
I am using JMeter to test my web server https://buyandbrag.in.
I have tested it with 100 users, but the main server is not showing whether it is under load or not.
I want to know whether the test is really pressuring the main server (a cloud server I am using), or just using the resources of the client machine where the tool is installed.
Yes, as mentioned, you should be monitoring both servers to see how they handle the load. The simplest way to do this is with top (if your server OS is *NIX); you should also watch the network activity, i.e. bandwidth and connection states (TIME_WAIT, CLOSE_WAIT and so on).
Also, if you're using Apache, keep an eye on the logs - you should see the requests being logged there.
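As a quick sketch with standard *NIX tools (the Apache log path varies by distribution):

# Watch CPU and memory on the server under test.
top

# Count connections per TCP state (ESTABLISHED, TIME_WAIT, CLOSE_WAIT, ...).
netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c

# Tail the Apache access log to confirm the requests are arriving.
tail -f /var/log/apache2/access.log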
Good luck with the tests
I want to know how many users my website can handle. When I tested with 50 threads, the CPU usage of my server increased but not the connection count in the log (it showed just 2 connections), and the bandwidth usage is not that high either.
Firstly, what connections are you referring to? Apache, DB, etc.?
Secondly, if you want to see how many users your current setup can handle, you need to create a profile or traffic model of what an average user will do on your site.
For example:
Say 90% of the time they will search for something
5% of the time they will purchase x
5% of the time they log in.
Once you have your "traffic model" defined, implement it in JMeter, then start increasing your load in increments: run your load test for 10 minutes with x users, then increment that number after 10 minutes, and so on until you find your breaking point.
If you graph your responses you should see two main things:
1) The optimum response time / number of users before the service degrades
2) The tipping point, i.e. at what point you start returning 503s, etc.
Now you'll have enough data to scale your site or to start making performance improvements from a code point of view.