Too many consumers in Rabbit MQ (Bunny) - ruby

I'm sending lots of data to my app through JMeter.
My subscribe block and the publisher look like this:
BunnyStarter.start_bunny_components
cons = BunnyStarter.queue.subscribe do |delivery_info, metadata, payload|
  method_calling(payload)
  cons.cancel
end
BunnyStarter.exchange.publish(body.to_json, routing_key: BunnyStarter.queue.name)
And my BunnyStarter class:
def self.start_bunny_components
  if @@conn.nil?
    @@conn = Bunny.new
    @@conn.start
    @@ch = @@conn.create_channel
    @@queue = @@ch.queue("dump_probe_queue")
    @@exchange = @@ch.default_exchange
  end
end
The problem is that, although I call cons.cancel after method_calling, the RabbitMQ admin UI still shows about one thousand consumers created in roughly 6 minutes.
Is that because of the rate and the amount of data I'm sending?
How can I improve this?

I have seen this issue before. The reason it is creating thousands of consumers is that you are opening a new connection for each subscription instead of reusing one. Eventually your consumers will shut down because of this.
The number of consumers is not caused by the amount of data; it is caused by the consumer code creating one connection per subscription.
Solution:
Instead of creating multiple connections, create only one connection and open as many channels as you need on it.
I mean one instance of Connection and multiple instances of Model (the channel), so that the same connection is shared by multiple models.
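A minimal, broker-free sketch of the "one connection, many channels" idea follows. StubConnection and Messaging are hypothetical stand-ins invented for illustration; they are not part of Bunny. The only point is that the connection is created once and every caller gets channels from that same instance.

```ruby
# Hypothetical stand-in for a broker connection (not Bunny's API).
# It counts how many connections get opened so the sharing is visible.
class StubConnection
  class << self
    attr_accessor :opened
  end
  self.opened = 0

  attr_reader :channel_count

  def initialize
    self.class.opened += 1
    @channel_count = 0
  end

  # Channels are lightweight and multiplexed over the one connection.
  def create_channel
    @channel_count += 1
    "channel-#{@channel_count}"
  end
end

# Hypothetical holder that hands out a single shared connection.
module Messaging
  LOCK = Mutex.new

  # Creates the connection on first use and reuses it afterwards.
  def self.connection
    LOCK.synchronize { @connection ||= StubConnection.new }
  end
end

channels = 3.times.map { Messaging.connection.create_channel }
puts "connections opened: #{StubConnection.opened}, channels: #{channels.inspect}"
```

With a real client the same shape would apply: start the connection once at boot, then create channels (and register the subscriber) once, rather than on every request.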

Related

Save consumer/tailer read offset for ChronicleQueue

I am exploring ChronicleQueue to save events generated in one of my applications. I would like to publish the saved events to a different system in their original order of occurrence after some processing. I have multiple instances of my application, and each instance could run a single-threaded appender to append events to ChronicleQueue. Although ordering across instances is a necessity, I would like to understand these 2 questions.
1) How would the read index for my events be saved so that I don't end up reading and publishing the same message from ChronicleQueue multiple times?
In the code below (picked from the example on GitHub), the index is saved across application restarts until we reach the end of the queue. The moment we reach the end of the queue, we end up reading all the messages again from the start. I want to make sure that, for a particular consumer identified by a tailer ID, the messages are read only once. Do I need to save the read index in another queue and use that to achieve what I need here?
String file = "myPath";
try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(file).build()) {
    for (int i = 0; i < 10; i++) {
        cq.acquireAppender().writeText("test" + i);
    }
}
try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(file).build()) {
    ExcerptTailer atailer = cq.createTailer("a");
    System.out.println(atailer.readText());
    System.out.println(atailer.readText());
    System.out.println(atailer.readText());
}
2) I also need some suggestions on whether there is a way to preserve the ordering of events across instances.
Using a named tailer should ensure that the tailer only reads a message once. If you have an example where this doesn't happen, can you create a test to reproduce it?
The order of entries in a queue is fixed at write time, and all tailers see the same messages in the same order; there isn't any option to change this.

direct reply pseudo queue with bunny gem

I am creating a RabbitMQ RPC in Ruby 2.3 using Bunny 2.7.0.
I've implemented it with one reply queue per client, but I expect quite a large number of clients, so this approach is not efficient. I want to use RabbitMQ's direct reply-to feature.
connection = Bunny.new(rabbitmq_url, :automatically_recover => true)
connection.start
channel = connection.create_channel
reply_queue = channel.queue('amq.rabbitmq.reply-to', no_ack: true)
On the last line of code I receive this error:
Bunny::AccessRefused: ACCESS_REFUSED - queue name 'amq.rabbitmq.reply-to' contains reserved prefix 'amq.*'
In theory that is expected, according to http://rubybunny.info/articles/queues.html,
but on the other hand, the article https://www.rabbitmq.com/direct-reply-to.html describes the existence and usability of this queue.
I want to declare the queue because I need to subscribe to it to receive responses:
consumer = reply_queue.subscribe do |_, properties, payload|
# action
end
I don't understand what I am doing wrong here.
There are similar topics with examples of this approach in other languages and tools such as Node.js, and those seem to work fine. What am I doing wrong with Bunny?
Update
Found the problem: I was using an older version of the RabbitMQ server, one that did not support the direct reply-to queue yet.
I think it's trying to create the queue, which you're not allowed to do.
https://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-September/030095.html
My ruby is a tad rusty but give this a try:
channel = connection.create_channel
channel.queue_declare('amq.rabbitmq.reply-to', :passive => true)

Horizontally and vertically scaling RabbitMQ consumers?

How do I scale RabbitMQ consumers in Ruby using the AMQP gem?
I read the documentation and came up with something that (seemingly) works
in a trivial example. The examples scale horizontally. New processes connect
to the broker and receive a subset of messages. From there each process can
spin up multiple consumer threads. It uses the low level consumer interface
described in the documentation.
Here's the code:
require 'amqp'
workers = (ARGV[1] || 4).to_i # ARGV entries are strings; Integer#times below needs an integer
puts "Running #{workers} workers"
AMQP.start do |amqp|
  channel = AMQP::Channel.new
  channel.on_error do |conn, ex|
    raise ex.reply_text
  end
  exchange = channel.fanout 'scaling.test', durable: true, prefetch: 1
  queue = channel.queue("worker_queue", auto_delete: true).bind(exchange)
  workers.times do |i|
    consumer = AMQP::Consumer.new channel, queue, "consumer-#{i}", exclusive = false, manual_ack = false
    consumer.consume.on_delivery do |meta, payload|
      meta.ack
      puts "Consumer #{consumer.consumer_tag} in #{Process.pid} got #{payload}"
    end
  end
  trap('SIGTERM') do
    amqp.start { EM.stop }
  end
end
There are a few things I'm unsure of:
1) Does the exchange type matter? The documentation states a direct exchange load balances messages between queues. I tested this example using direct and fanout exchanges and it functioned the same way. So if I'd like to support vertical and horizontal scaling, does the exchange type matter?
2) What should the :prefetch option be? I thought one would be best.
3) How does the load balancing work specifically? The documentation states that load balancing happens between consumers and not between queues. However, when I run two processes I can see process one print out "1, 2, 3, 4", then process two print out "5, 6, 7, 8". I thought they would be out of order, or is the channel itself the consumer? This would make sense in accordance with the output but not the documentation.
4) Does this look correct from the EventMachine perspective? Do I need to do some sort of thread pooling to get multiple consumers in the same process working correctly?
Most of this is actually covered in the documentation for brokers like RabbitMQ, but in answer to your questions:
For a worker queue you most likely want a direct exchange, that is, one which will route a message (job) to one worker exactly, and not to multiple workers at once. But this might change depending on your work. Fanout by definition should route the same message to multiple consumers.
Prefetch should be 1 for this type of setup. Generally speaking, this asks the broker to fill the consumer's network buffer with only 1 message until it is ack'd. An alternative setup would be to have 1 consumer and n workers, in which case you would set the prefetch to n. It is also worth noting that in this sort of setup you shouldn't ack until after you've done the work.
Load balancing is basically round-robin between consumers, which is why you're seeing everything sequentially. If each bit of work takes a different amount of time, you'll see the distribution change.
I hope that helps. I haven't done anything with the Ruby AMQP library for a while — we rewrote all our workers in Go.
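To make the round-robin point concrete, here is a small stdlib-only simulation. It is a sketch under stated assumptions, not the broker's actual scheduler: it assumes every consumer is idle and acks immediately, and that two processes (labelled p1 and p2 here, hypothetical names) each registered four consumers, matching the default of 4 workers in the question. Under those assumptions it reproduces the "1, 2, 3, 4" then "5, 6, 7, 8" pattern described above.

```ruby
# Deal messages to consumers in strict rotation. This models an idealised
# broker where every consumer is free when its turn comes around.
def round_robin(messages, consumers)
  deliveries = Hash.new { |hash, consumer| hash[consumer] = [] }
  messages.each_with_index do |message, index|
    deliveries[consumers[index % consumers.size]] << message
  end
  deliveries
end

# Two processes with four consumers each, registered in start-up order.
consumers = %w[p1 p2].flat_map do |process|
  (0...4).map { |i| [process, "consumer-#{i}"] }
end

deliveries = round_robin((1..8).to_a, consumers)

# Group what each process would print.
by_process = Hash.new { |hash, process| hash[process] = [] }
deliveries.each { |(process, _tag), messages| by_process[process].concat(messages) }

by_process.each { |process, messages| puts "#{process}: #{messages.join(', ')}" }
```

So the consumers, not the channel, take turns; the output only looks grouped by process because each process's consumers sit next to each other in the rotation.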

Publisher finishes before subscriber and messages are lost - why?

I'm fairly new to ZeroMQ and trying to get a basic pub/sub to work. When I run the following (with the subscriber starting before the publisher), the publisher finishes but the subscriber hangs without having received all the messages. Why?
I think the socket is being closed, but have the messages actually been sent? Is there a way of ensuring all messages are received?
Publisher:
import zmq
import random
import time
import tnetstring
context=zmq.Context()
socket=context.socket(zmq.PUB)
socket.bind("tcp://*:5556")
y=0
for x in xrange(5000):
    st = random.randrange(1,10)
    data = []
    data.append(random.randrange(1,100000))
    data.append(int(time.time()))
    data.append(random.uniform(1.0,10.0))
    s = tnetstring.dumps(data)
    print 'Sending ...%d %s' % (st,s)
    socket.send("%d %s" % (st,s))
    print "Messages sent: %d" % x
    y+=1
print '*** SERVER FINISHED. # MESSAGES SENT = ' + str(y)
Subscriber:
import sys
import zmq
import tnetstring
# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")
filter = "" # get all messages
socket.setsockopt(zmq.SUBSCRIBE, filter)
x=0
while True:
    topic,data = socket.recv().split()
    print "Topic: %s, Data = %s. Total # Messages = %d" % (topic,data,x)
    x+=1
In ZeroMQ, clients and servers always try to reconnect; they won't go down if the other side disconnects (because in many cases you'd want them to resume talking if the other side comes up again). So in your test code, the client will just wait until the server starts sending messages again, unless you stop recv()ing messages at some point.
In your specific instance, you may want to investigate calling socket.close() and context.term() once the publisher is done. The term() call will block until all the messages have been sent. You also have the slow-joiner problem. You can add a sleep after the bind, but before you start publishing. This works in a test case, but you will want to really understand what is a solution versus a band-aid.
You need to think of the PUB/SUB pattern like a radio. The sender and receiver are both asynchronous. The Publisher will continue to send even if no one is listening. The subscriber will only receive data if it is listening. If the network goes down in the middle, the data will be lost.
You need to understand this in order to design your messages. For example, if you design your messages to be "idempotent", it doesn't matter if you lose data. An example of this would be a status type message. It doesn't matter if you have any of the previous statuses. The latest one is correct and message loss doesn't matter. The benefits to this approach is that you end up with a more robust and performant system. The downsides are when you can't design your messages this way.
Your example includes a type of message that requires no loss. Another type of message would be transactional. For example, if you just sent the deltas of what changed in your system, you would not be able to lose the messages. Database replication is often managed this way which is why db replication is often so fragile. To try to provide guarantees, you need to do a couple things. One thing is to add a persistent cache. Each message sent needs to be logged in the persistent cache. Each message needs to be assigned a unique id (preferably a sequence) so that the clients can determine if they are missing a message. A second socket (ROUTER/REQ) needs to be added for the client to request the missing messages individually. Alternatively, you could just use the secondary socket to request resending over the PUB/SUB. The clients would then all receive the messages again (which works for the multicast version). The clients would ignore the messages they had already seen. NOTE: this follows the MAJORDOMO pattern found in the ZeroMQ guide.
An alternative approach is to create your own broker using ROUTER/DEALER sockets. When the ROUTER socket sees a DEALER connect, it stores that client's ID. When the ROUTER needs to send data, it iterates over all client IDs and sends the message to each. Each message should contain a sequence number so that the client can tell which missing messages to request. NOTE: this is a sort of reimplementation of Kafka from LinkedIn.
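The sequence-number idea in the answer above can be sketched on the client side as follows. This is an illustration only: SequenceTracker and its method names are hypothetical, it uses no ZeroMQ calls, and it assumes the publisher numbers messages with consecutive integers. It records which numbers were skipped (so they can be re-requested over the second socket) and ignores messages it has already seen.

```ruby
# Tracks received sequence numbers and remembers the gaps.
class SequenceTracker
  attr_reader :missing

  def initialize
    @next_expected = nil
    @missing = []
  end

  # Returns :accepted for a new message, :recovered when a previously
  # missing message finally arrives, and :duplicate for anything already seen.
  def observe(sequence)
    @next_expected ||= sequence # the first message sets the baseline
    if sequence >= @next_expected
      # Everything between the expected number and this one was skipped.
      @missing.concat((@next_expected...sequence).to_a)
      @next_expected = sequence + 1
      :accepted
    elsif @missing.delete(sequence)
      :recovered
    else
      :duplicate
    end
  end
end

tracker = SequenceTracker.new
[1, 2, 5].each { |sequence| tracker.observe(sequence) }
puts "missing after 1, 2, 5: #{tracker.missing.inspect}"
```

A subscriber would call observe for every message and periodically ask the publisher to resend whatever is listed in missing.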

Posting large number of messages to AMQP queue

Using v0.7.1 of the Ruby amqp library and Ruby 1.8.7, I am trying to post a large number (millions) of short (~40 bytes) messages to a RabbitMQ server. My program's main loop (well, not really a loop, but still) looks like this:
AMQP.start(:host => '1.2.3.4',
           :username => 'foo',
           :password => 'bar') do |connection|
  channel = AMQP::Channel.new(connection)
  exchange = channel.topic("foobar", {:durable => true})
  i = 0
  EM.add_periodic_timer(1) do
    print "\rPublished #{i} commits"
  end
  results = get_results # <- Returns an array
  processor = proc do
    if x = results.shift then
      exchange.publish(x, :persistent => true,
                       :routing_key => "test.#{i}")
      i += 1
      EM.next_tick processor
    end
  end
  EM.next_tick(processor)
  AMQP.stop {EM.stop}
end
The code starts processing the results array just fine, but after a while (usually, after 12k messages or so) it dies with the following error
/Library/Ruby/Gems/1.8/gems/amqp-0.7.1/lib/amqp/channel.rb:807:in `send':
The channel 1 was closed, you can't use it anymore! (AMQP::ChannelClosedError)
No messages are stored on the queue. The error seems to be happening just when network activity from the program to the queue server starts.
What am I doing wrong?
First mistake is that you didn't post the RabbitMQ version that you are using. Lots of people are running old obsolete version 1.7.2 because that is what is in their OS package repositories. Bad move for anyone sending the volume of messages that you are. Get RabbitMQ 2.5.1 from the RabbitMQ site itself and get rid of your default system package.
Second mistake is that you did not tell us what is in the RabbitMQ logs.
Third mistake is that you said nothing about what is consuming the messages. Is there another process running somewhere that has declared a queue and bound it to the exchange? There is NO message queue unless somebody declares it to RabbitMQ and binds it to an exchange. Even then, messages will only flow if the binding key for the queue matches the routing key that you publish with.
Fourth mistake. You have routing keys and binding keys mixed up. The routing key is a string such as topic.test.json.echos, and the binding key (used to bind a queue to an exchange) is a pattern like topic.# or topic.*.json.
Updated after your clarifications
Regarding versions, I'm not sure when it was fixed but there was a problem in 1.7.2 with large numbers of persistent messages causing RabbitMQ to crash when it rolled over its persistence log, and after crashing it was unable to restart until someone manually undid the rollover.
When you say that a connection is being opened and closed, I hope that it is not per message. That would be a strange way to use AMQP.
Let me repeat. Producers do NOT write messages to queues. They write messages to exchanges which then route the messages to queues based on the routing key (string) and the queue's binding key (pattern). In your example I misread the use of the # sign, but I see nothing which declares a queue and binds it to the exchange.
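The difference between a routing key (a plain string) and a binding key (a pattern) can be illustrated with a small matcher. This is a sketch of the topic-exchange matching rules, not RabbitMQ's implementation: in a binding key, * stands for exactly one dot-separated word and # stands for zero or more words. The function names topic_match? and match_words are hypothetical.

```ruby
# True when the routing key (a plain string) satisfies the binding key (a pattern).
def topic_match?(binding_key, routing_key)
  match_words(binding_key.split('.'), routing_key.split('.'))
end

def match_words(pattern, words)
  return words.empty? if pattern.empty?

  head, *rest = pattern
  case head
  when '#'
    # '#' may absorb zero or more words; try every possible split.
    (0..words.size).any? { |skip| match_words(rest, words.drop(skip)) }
  when '*'
    # '*' must consume exactly one word.
    !words.empty? && match_words(rest, words.drop(1))
  else
    words.first == head && match_words(rest, words.drop(1))
  end
end

puts topic_match?('topic.#', 'topic.test.json.echos')
puts topic_match?('topic.*.json', 'topic.test.json.echos')
```

For the publisher in the question, a queue bound with test.# would receive every message published with a routing key of the form test.N, but only if some process actually declares that queue and binds it to the exchange.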

Resources