What is the suitable number of queue workers? - laravel

I was wondering if there is a relation between number of queue workers and CPU or RAM resources. I noticed that Laravel defaults to 8-10 workers, however on my own experience I've tried once to increase them to 50 workers and what a huge and fast performance I get with my Digitalocean VPS 8GB RAM and 4CPUs compared to only 10 workers.
So is there any relation between there number and resources ?

Related

Elasticsearch Ram recommendation

I'm deploying an Elasticsearch cluster with roughly 40GB a day with a time-to-live of 365 days. Write speed would be around 50 msgs/sec. Reads would be mostly driven by user dashboards, so the read frequency won't be high. What would be the best hardware requirements for this amount of data? How many master and data nodes may require in this situation?
obviously base on search index rate you should choose the hardware. 50 msg/sec is very low for elasticsearch. you have total 14.6TB data that is your 85 percent of total disk (base on 85% watermark). this means that you need 17TB disk. I think you can use one server with 128GB RAM and atleast 10 Core CPU and 17TB disk or have two server with half of this config. one server is master and data node and one server will be only data node.

How to increase greenplum concurrency and # query per sec

We have a fairly big Greenplum v4.3 cluster. 18 hosts, each host has 3 segment nodes. Each host has approx 40 cores and 60G memory.
The table we have is 30 columns wide, which has 0.1 billion rows. The query we are testing has 3-10 secs response time when there is no concurrency pressure. As we increase the # of queries we fired in parallel, the latency is decreasing from avg 3 secs to 50ish secs as expected.
But we've found that regardless how many queries we fired in parallel, we only have like very low QPS(query per sec), almost just 3-5 queries/sec. We've set the max_memory=60G, memory_limit=800MB, and active_statments=100, hoping the CPU and memory can be highly utilized, but they are still poorly used, like 30%-40%.
I have a strong feeling, we tried to feed up the cluster in parallel badly, hoping to take the best out of the CPU and Memory utilization. But it doesn't work as we expected. Is there anything wrong with the settings? or is there anything else I am not aware of?
There might be multiple reasons for such behavior.
Firstly, every Greenplum query uses no more than one processor core on one logical segment. Say, you have 3 segments on every node with 40 physical cores. Running two parallel queries will utilize maximum 2 x 3 = 6 cores on every node, so you will need about 40 / 6 ~= 6 parallel queries to utilize all of your CPUs. So, maybe for your number of cores per node its better to create more segments (gpexpand can do this). By the way, are the tables that used in the queries compressed?
Secondly, it may be a bad query. If you will provide a plan for the query, it may help to understand. There some query types in Greenplum that may have master as a bottleneck.
Finally, that might be some bad OS or blockdev settings.
I think this document page Managing Resources might help you mamage your resources
You can use Resource Group limit/controll your resource especialy concurrency attribute(The maximum number of concurrent transactions, including active and idle transactions, that are permitted in the resource group).
Resouce queue help limits ACTIVE_STATEMENTS
Note: The ACTIVE_STATEMENTS will be the total statement current running, when you have 50s cost queries and next incoming queries, this could not be working, mybe 5 * 50 is better.
Also, you need config memory/CPU settings to enable your query can be proceed.

Storm topology processing slowing down gradually

I have been reading about apache Storm tried few examples from storm-starter. Also learnt about how to tune the topology and how to scale it to perform fast enough to meet the required throughput.
I have created example topology with acking enabled, i am able to achieve 3K-5K messages processing per second. It performs really fast in initial 10 to 15min or around 1mil to 2mil message and then it starts slowing down. On storm UI, I can see the overall latency starts going up gradually and does not comes back, after a while the processing drops to only few hundred a second. I am getting exact same behavior for all the typologies i tried, the simplest one is to just read from kafka using KafkaSpout and send it to transform bolt parse the msg and send it to kafka again using KafkaBolt. The parser is very fast as it takes less than a millisecond to parse the message. I tried few option of increasing/describing the parallelism, changing the buffer sizes etc. but same behavior. Please help me to find out the reason for gradual slowness in the topology. Here is the config i am using
1 Nimbus machine (4 CPU) 24GB RAM
2 Supervisor machines (8CPU) and using 1 thread per core with 24GB RAM
4 Node kafka cluster running on above 2 supervisor machines (each topic has 4 partitions)
KafkaSpout(2 parallelism)-->TransformerBolt(8)-->KafkaBolt(2)
topology.executor.receive.buffer.size: 65536
topology.executor.send.buffer.size: 65536
topology.spout.max.batch.size: 65536
topology.transfer.buffer.size: 32
topology.receiver.buffer.size: 8
topology.max.spout.pending: 250
At the start
After few minutes
After 45 min - latency started going up
After 80 min - Latency will keep going up and will go till 100 sec by the time it reaches 8 to 10mil messages
Visual VM screenshot
Threads
Pay attention to the capacity metric on RT_LEFT_BOLT, it is very close to 1; which explains why your topology is slowing down.
From the Storm documentation:
The Storm UI has also been made significantly more useful. There are new stats "#executed", "execute latency", and "capacity" tracked for all bolts. The "capacity" metric is very useful and tells you what % of the time in the last 10 minutes the bolt spent executing tuples. If this value is close to 1, then the bolt is "at capacity" and is a bottleneck in your topology. The solution to at-capacity bolts is to increase the parallelism of that bolt.
Therefore, your solution is to add more executors (and tasks) to that given bolt (RT_LEFT_BOLT). Another thing you can do is reduce the number of executors on RT_RIGHT_BOLT the capacity indicates you don't need that many executors, probably 1 or 2 can do the job.
The issue was due to GC setting with newgen params, it was not using the allocated heap completely so internal storm queues were getting full and running out of memory. The strange thing was that storm did not throw out of memory error, it just got stalled, with the help of visual vm i was able to trace it down.

Block Replication Limits in HDFS

I'm currently rebuilding our servers that have our region-servers and data nodes. When I take down a data node, after 10 minutes the blocks that it had are being re-replicated among other data nodes, as it should. We have 10 data-nodes, so I see heavy network traffic as the blocks are being re-replicated. However, I'm seeing that traffic to be about only 500-600mbps per server (the machines all have gigabit interfaces) so it's definitely not network-bound. I'm trying to figure out what is limiting the speed that the data-nodes send and receive blocks. Each data-node has six 7200 rpm sata drives, and the IO usage is very low during this, only peaking to 20-30% per drive. Is there a limit built into hdfs that limits the speed at which blocks are replicated?
The rate of replication work is throttled by HDFS to not interfere with cluster traffic when failures happen during regular cluster load.
The properties that control this are dfs.namenode.replication.work.multiplier.per.iteration (2), dfs.namenode.replication.max-streams (2) and dfs.namenode.replication.max-streams-hard-limit (4). The foremost controls the rate of work to be scheduled to a DN at every heartbeat that occurs, and the other two further limit the maximum parallel threaded network transfers done by a DataNode at a time. The values in () indicate their defaults. Some description of this is available at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
You can perhaps try to increase the set of values to (10, 50, 100) respectively to spruce up the network usage (requires a NameNode restart), but note that your DN memory usage may increase slightly as a result of more blocks information being propagated to it. A reasonable heap size for these values for the DN role would be about 4 GB.
P.s. These values were not tried by me on production systems personally. You will also not want to max out the re-replication workload such that it affects regular cluster work, as recovery of 1/3 replicas may be of lesser priority than missing job/query SLAs due to lack of network resources (unless you have a really fast network that's always under-utilised even under loaded periods). Try to tune it till you're satisfied with the results.

Optimal number of Resque workers for maximum performance

I am using Resque for achieving cheap parallelism in my academic research - I split huge tasks into relatively small independent portions and submit them to Resque. These tasks do some heavy stuff, extensively using both database(MongoDB if that's important) and CPU.
All this works extremely slow - for my relatively small portion of dataset 1000 jobs get created and 14 hours of constant working of 2 workers is enough only for finishing ~800 of them. As you might've already suspected, this speed is more than frustrating.
I have a quad-core processor(Core i5 something, not high-end) and apart from Mongo instance and resque workers nothing gets scheduled on CPU for a considerable period of time.
Now that you know my story, all I am asking is - how do I squeeze maximum out of this setting? I believe that 3 workers + 1 mongo instance will quickly fill up all the cores, but at the same time mongo doesn't have to work all the time..

Resources