ZMQ PUB/SUB socket high latency under low load - zeromq

I am testing ZMQ PUB/SUB socket performance with NetMQ.
In the test program below, a SUB socket acts as the server and receives messages, while a PUB socket acts as the client and sends them. The topic of each message is simply a sequence number, and the body is the time at which the PUB client sent it. The SUB server measures latency by comparing the time a message is received against the send time embedded in the message.
For 1,000 messages sent over ~1 second, I measure an average latency of ~4 ms and a maximum latency of ~40 ms. However, based on http://wiki.zeromq.org/results:more-precise-0mq-tests, ZMQ's latency should be on the order of microseconds, far less than what I measured.
Am I using ZMQ PUB/SUB sockets incorrectly? Is there any way to reduce the latency? Any ideas are welcome and appreciated.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using NetMQ;
using NetMQ.Sockets;

namespace NetMQ_Hello
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                Console.WriteLine("Please use \"server\" or \"client\" as the first argument.");
                return;
            }
            string mode = args[0];
            if (mode == "server")
            {
                RunSubAsServer();
            }
            else if (mode == "client")
            {
                RunPubAsClient();
            }
        }

        static void RunPubAsClient()
        {
            string address = "tcp://localhost:1234";
            PublisherSocket pubSocket = new PublisherSocket();
            pubSocket.Connect(address);
            Thread.Sleep(10);

            Console.WriteLine("Starting to send messages...");
            DateTime startTime = DateTime.UtcNow;
            for (int i = 0; i < 1000; i++)
            {
                long data = DateTime.UtcNow.ToBinary();
                byte[][] frames = new byte[2][] { BitConverter.GetBytes(i), BitConverter.GetBytes(data) };
                pubSocket.SendMultipartBytes(frames);
                Console.WriteLine(string.Format("Sent: {0}, {1}", i, data));
            }
            DateTime endTime = DateTime.UtcNow;

            Console.WriteLine(string.Format("1,000 messages sent in {0}", endTime - startTime));
            Console.WriteLine("Press enter to exit...");
            Console.ReadLine();
            pubSocket.Disconnect(address);
        }

        static void RunSubAsServer()
        {
            string address = "tcp://*:1234";
            SubscriberSocket subSocket = new SubscriberSocket();
            subSocket.Bind(address);
            subSocket.SubscribeToAnyTopic();
            Thread.Sleep(10);

            // Stats
            List<TimeSpan> latencies = new List<TimeSpan>();

            Console.WriteLine("Starting to receive messages...");
            List<byte[]> frames = new List<byte[]>();
            while (true)
            {
                if (subSocket.TryReceiveMultipartBytes(
                    timeout: TimeSpan.FromSeconds(1),
                    frames: ref frames,
                    expectedFrameCount: 2))
                {
                    DateTime now = DateTime.UtcNow;
                    int topic = BitConverter.ToInt32(frames[0], 0);
                    DateTime sentTime = DateTime.FromBinary(BitConverter.ToInt64(frames[1], 0));
                    TimeSpan latency = now - sentTime;
                    Console.WriteLine(string.Format("Received: {0}, {1}, delay {2}",
                        topic, sentTime, latency));
                    latencies.Add(latency);
                    if (topic == 1000 - 1)
                    {
                        break;
                    }
                }
            }

            int n = latencies.Count;
            double max = latencies.Max().TotalMilliseconds;
            double mean = latencies.Sum(s => s.TotalMilliseconds) / n;
            Console.WriteLine(String.Format("Latency\nMax: {0}ms\nMean: {1}ms", max, mean));
            Console.WriteLine("Press enter to exit...");
            Console.ReadLine();
            subSocket.Unbind(address);
        }
    }
}
Sample output from the SUB server:
Received: 990, 9/30/2020 12:53:11 PM, delay 00:00:00.0010009
Received: 991, 9/30/2020 12:53:11 PM, delay 00:00:00
Received: 992, 9/30/2020 12:53:11 PM, delay 00:00:00.0019993
Received: 993, 9/30/2020 12:53:11 PM, delay 00:00:00.0010016
Received: 994, 9/30/2020 12:53:11 PM, delay 00:00:00
Received: 995, 9/30/2020 12:53:11 PM, delay 00:00:00.0019993
Received: 996, 9/30/2020 12:53:11 PM, delay 00:00:00.0010007
Received: 997, 9/30/2020 12:53:11 PM, delay 00:00:00
Received: 998, 9/30/2020 12:53:11 PM, delay 00:00:00
Received: 999, 9/30/2020 12:53:11 PM, delay 00:00:00.0030002
Latency
Max: 39ms
Mean: 4.49101730769231ms
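A measurement-side caveat worth checking (an aside added here, not part of the original post): DateTime.UtcNow can have coarse resolution on some Windows/.NET configurations (the system clock may only tick every ~1-15.6 ms), which would match the 0-3 ms quantization visible in the per-message delays above. A minimal sketch of a higher-resolution timestamp, assuming both processes run on the same machine so the underlying performance counter is comparable between them:

using System;
using System.Diagnostics;

static class PreciseClock
{
    // Sender side: embed this raw high-resolution counter value in the
    // payload instead of DateTime.UtcNow.ToBinary().
    public static long Now() => Stopwatch.GetTimestamp();

    // Receiver side (same machine only): convert the counter delta to a TimeSpan.
    public static TimeSpan Elapsed(long sentTimestamp) =>
        TimeSpan.FromSeconds((Stopwatch.GetTimestamp() - sentTimestamp) / (double)Stopwatch.Frequency);
}

With this, the PUB client would send BitConverter.GetBytes(PreciseClock.Now()) and the SUB server would compute PreciseClock.Elapsed(...) on the received value.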

Related

QueueBrowser vs MessageConsumer

When we compare QueueBrowser with MessageListener, QueueBrowser is very slow.
QueueBrowser takes approx. 1 minute to process 100 messages, whereas the consumer processes ~840 messages in the same time.
Is this much difference expected? Can you please suggest whether anything needs to be changed in the code below?
queueEnum = queueBrowserIn.GetEnumerator();
// Browse messages until the enumerator is exhausted.
while (queueEnum.MoveNext())
{
    messageCount++;
    LogWrite($"Message No - {messageCount} - Method: ProcessNewMessage" + DateTime.Now);
    IBytesMessage bytesMessage = queueEnum.Current as IBytesMessage;
    if (bytesMessage != null)
    {
        byte[] arrayMessage = new byte[bytesMessage.BodyLength];
        bytesMessage.ReadBytes(arrayMessage);
        string message = System.Text.Encoding.Default.GetString(arrayMessage);
    }
}
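For contrast, a minimal sketch of the MessageConsumer/listener side of the comparison (assuming the Apache NMS flavor of this API, since IBytesMessage appears above; IBM XMS names differ slightly): messages are pushed to a callback instead of being pulled through a browser enumerator.

using Apache.NMS;

public static class ListenerExample
{
    public static void Attach(ISession session, IDestination queue)
    {
        IMessageConsumer consumer = session.CreateConsumer(queue);
        // The broker pushes each message to this callback as it arrives.
        consumer.Listener += message =>
        {
            if (message is IBytesMessage bytesMessage)
            {
                byte[] body = new byte[bytesMessage.BodyLength];
                bytesMessage.ReadBytes(body);
                string text = System.Text.Encoding.Default.GetString(body);
                // process text here
            }
        };
    }
}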

Transmitting 1500 KB (hex file) data over UDS using CAPL test module

I am trying to download my hex file of size 1500 KB via UDS with a CAPL test module, with:
p2 timer = 50 ms
p2* timer = 5000 ms
Here is a snippet of my code for the data transfer:
void TS_transferData()
{
  byte transferData_serviceid = 0x36;
  byte blockSequenceCounter = 0x1;
  byte buffer[4093];
  byte binarydata[4095];
  long i, ret1, ret2, ret3, temp, timeout = 0, Counter = 0;
  char filename[30] = "xxx.bin";
  dword readaccess_handle;
  diagrequest ECU_QUALIFIER.* request;
  long valueleft;

  readaccess_handle = OpenFileRead(filename, 1);
  if (readaccess_handle != 0)
  {
    while ((valueleft = fileGetBinaryBlock(buffer, elcount(buffer), readaccess_handle)) == 4093)
    {
      binarydata[0] = transferData_serviceid;
      binarydata[1] = blockSequenceCounter;
      for (i = 0; i < elcount(buffer); i++)
      {
        binarydata[i + 2] = buffer[i];
      }
      diagResize(request, elCount(binarydata));
      DiagSetPrimitiveData(request, binarydata, elcount(binarydata));
      DiagSendRequest(request);
      write("length of binarydata %d ", elcount(binarydata));

      // Wait until the request has been completely sent
      ret1 = TestWaitForDiagRequestSent(request, 20000);
      if (ret1 == 1) // Request sent
      {
        ret2 = TestWaitForDiagResponse(request, 50);
        if (ret2 == 1) // Response received
        {
          ret3 = DiagGetLastResponseCode(request); // Get the code of the response
          if (ret3 == -1) // Is it a positive response?
          {
            ;
          }
          else
          {
            testStepFail(0, "4.0", "Binary Datatransfer on server Failed");
            break;
          }
        }
        else if (ret2 == timeout)
        {
          testStepFail(0, "4.0", "Binary Datatransfer on server Failed");
          write("timeout occurred while TestWaitForDiagResponse with block %d ", blockSequenceCounter);
        }
      }
      else if (ret1 == timeout)
      {
        testStepFail(0, "4.0", "Binary Datatransfer on server Failed");
        write("timeout occurred while TestWaitForDiagRequestSent %d ", blockSequenceCounter);
      }

      if (blockSequenceCounter == 255)
        blockSequenceCounter = 0;
      else
        ++blockSequenceCounter;
    }
  }

  // handle the rest of the bytes to be transmitted
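  // A hedged sketch (added here, not from the original post) of one way to
  // send the final partial block: when fileGetBinaryBlock() returns fewer
  // than 4093 bytes, the while loop above exits with the remaining byte
  // count still in valueleft, so those bytes have not been transmitted yet.
  // The guard also skips this if the loop ended via break (valueleft == 4093).
  if (valueleft > 0 && valueleft < elcount(buffer))
  {
    binarydata[0] = transferData_serviceid;
    binarydata[1] = blockSequenceCounter;
    for (i = 0; i < valueleft; i++)
    {
      binarydata[i + 2] = buffer[i];
    }
    diagResize(request, valueleft + 2);
    DiagSetPrimitiveData(request, binarydata, valueleft + 2);
    DiagSendRequest(request);
    if (TestWaitForDiagRequestSent(request, 20000) == 1)
    {
      TestWaitForDiagResponse(request, 50);
    }
  }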
  fileClose(readaccess_handle);
}
The software download is happening, but it is taking a very long time.
For the TestWaitForDiagRequestSent() function, any timeout value less than 20000 gives me a timeout error.
Is there any other way I can reduce the transfer time, or where am I going wrong with the calculation?
Is there any example I can refer to that shows how to transmit such long data using CAPL?
Sorry, I am a beginner with CAPL and the UDS protocol.

quarkus http calls load test results 1000 requests - 16 seconds vs 65 seconds

Test 1:
#Path("/performance")
public class PerformanceTestResource {
#Timeout(20000)
#GET
#Path("/resource")
#Produces(MediaType.APPLICATION_JSON)
public Response performanceResource() {
final String name = Thread.currentThread().getName();
System.out.println(name);
Single<Data> dataSingle = null;
try {
dataSingle = Single.fromCallable(() -> {
final String name2 = Thread.currentThread().getName();
System.out.println(name2);
Thread.sleep(1000);
return new Data();
}).subscribeOn(Schedulers.io());
} catch (Exception ex) {
int a = 1;
}
return Response.ok().entity(dataSingle.blockingGet()).build();
}
}
The test itself (see also the callPeriodically definition):
@QuarkusTest
public class PerformanceTestResourceTest {

    @Tag("load-test")
    @Test
    public void loadTest() throws InterruptedException {
        int CALL_N_TIMES = 1000;
        final long CALL_NIT_EVERY_MILLISECONDS = 10;

        final LoadTestMetricsData loadTestMetricsData = LoadTestUtils.callPeriodically(
            this::callHttpEndpoint,
            CALL_N_TIMES,
            CALL_NIT_EVERY_MILLISECONDS
        );
        assertThat(loadTestMetricsData.responseList.size(), CoreMatchers.is(equalTo(Long.valueOf(CALL_N_TIMES).intValue())));

        long executionTime = loadTestMetricsData.duration.getSeconds();
        System.out.println("executionTime: " + executionTime + " seconds");
        assertThat(executionTime, allOf(greaterThanOrEqualTo(1L), lessThan(20L)));
    }
}
Results test 1:
executionTime: 16 seconds
Test 2: same but without the @Timeout annotation:
executionTime: 65 seconds
Q: Why? I think even 16 seconds is slow.
Q: How to make it faster: say, 2 seconds for 1000 calls?
I realise that I use .blockingGet() in the resource, but still, I would expect re-use of the blocking threads.
P.S.
I tried to go more 'reactive', returning Single or CompletionStage from the responses - but this seems not ready yet (buggy on the RESTEasy side). So I went with a simple .blockingGet() and Response.
UPDATE: Reactive / RX Java 2 Way
#path("/performance")
public class PerformanceTestResource {
//#Timeout(20000)
#GET
#Path("/resource")
#Produces(MediaType.APPLICATION_JSON)
public Single<Data> performanceResource() {
final String name = Thread.currentThread().getName();
System.out.println(name);
System.out.println("name: " + name);
return Single.fromCallable(() -> {
final String name2 = Thread.currentThread().getName();
System.out.println("name2: " + name2);
Thread.sleep(1000);
return new Data();
});
}
}`
pom.xml:
<dependency>
    <groupId>io.smallrye</groupId>
    <artifactId>smallrye-context-propagation-propagators-rxjava2</artifactId>
</dependency>
<dependency>
    <groupId>org.jboss.resteasy</groupId>
    <artifactId>resteasy-rxjava2</artifactId>
</dependency>
Then, when running the same test:
executionTime: 64 seconds
The output would be something like:
name: vert.x-worker-thread-5
vert.x-worker-thread-9
name: vert.x-worker-thread-9
name2: vert.x-worker-thread-9
name2: vert.x-worker-thread-5
So we are blocking the worker thread that is used on the REST/resource side. That's why. Then:
If I use Schedulers.io() to put the sleep-1000 call on a separate execution context:
return Single.fromCallable(() -> { ... }).subscribeOn(Schedulers.io());
executionTime: 16 seconds
The output will be something like this (note the newcomer: RxCachedThreadScheduler):
name: vert.x-worker-thread-5
name2: RxCachedThreadScheduler-1683
vert.x-worker-thread-0
name: vert.x-worker-thread-0
vert.x-worker-thread-9
name: vert.x-worker-thread-9
name2: RxCachedThreadScheduler-1658
vert.x-worker-thread-8
It seems that regardless of whether I use blockingGet() explicitly or not, I get the same result.
I assume that if nothing were blocked, it would take around 2-3 seconds.
Q: Is there a way to fix/tweak this from this point?
I assume the use of Schedulers.io(), which brings in the RxCachedThreadScheduler, is the bottleneck here, so I end up with the 16 seconds. Is 200 I/O threads the limit by default? But those threads should be reused, not really blocked. (I don't think it is a good idea to set that limit to 1000.)
Q: Or anyway: how would one make the app as responsive/reactive/performant as it should be with Quarkus? Or what did I miss?
Thanks!
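One knob worth checking here (a hedged aside based on standard Quarkus/Vert.x behavior, not on anything stated above): a blocking JAX-RS endpoint runs on the Vert.x worker pool, which defaults to 20 threads, and 1000 requests that each sleep one second on 20 workers need roughly 1000 / 20 = 50 seconds, in the ballpark of the 65 seconds observed. The pool size is configurable in application.properties; the value below is only illustrative:

quarkus.vertx.worker-pool-size=200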
OK, maybe it is me.
In my callPeriodically() I pass CALL_NIT_EVERY_MILLISECONDS = 10 milliseconds.
10 ms × 1000 calls = 10,000 ms: ten seconds just to issue the requests.
So I set it to 0.
And got 6 seconds for the server to handle 1000 simultaneous requests.
Still not 2-3 seconds, but 6.
There seems to be no difference between using .blockingGet() and returning Response, and returning Single.
--
But just to mention it: this hello-world app below takes 1 second to process 1000 parallel requests, while the Quarkus one takes 6 seconds.
public class Sample2 {

    static final AtomicInteger atomicInteger = new AtomicInteger(0);

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        final List<Single<Response>> listOfSingles = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 1000; i++) {
            // try {
            //     Thread.sleep(10);
            // } catch (InterruptedException e) {
            //     e.printStackTrace();
            // }
            final Single<Response> responseSingle = longCallFunction();
            listOfSingles.add(responseSingle);
        }
        Single<Response> last = Single.merge(listOfSingles).lastElement().toSingle();
        final Response response = last.blockingGet();
        long end = System.currentTimeMillis();
        System.out.println("Execution time: " + (end - start) / 1000);
        System.out.println(response);
    }

    static Single<Response> longCallFunction() {
        return Single.fromCallable(() -> { // 1 sec
            System.out.println(Thread.currentThread().getName());
            Thread.sleep(1000);
            int code = atomicInteger.incrementAndGet();
            //System.out.println(code);
            return new Response(code);
        }).subscribeOn(Schedulers.io());
    }
}

What is the overhead of Masstransit when running on RabbitMQ?

When using MassTransit with RabbitMQ I see disappointing performance: deliver/get drops to 0.20/s. I started to investigate with a simple test application that runs, in parallel, a thread sending messages to RabbitMQ using the RabbitMQ client library and a thread sending the same message to RabbitMQ using the MassTransit library. The RabbitMQ client library can send 10 times more messages than the MassTransit library.
RabbitMQ is running in a Docker container in a Hyper-V machine.
The PublishConfirm flag of MassTransit has some effect, but not much.
To get a fair comparison, I defined the same topology for the RabbitMQ case as is used in the MassTransit case.
MassTransit code:
public MasstransitMessageSender(string user, string password, string rabbitMqHost)
{
    this.user = user;
    this.password = password;
    this.rabbitMqHost = rabbitMqHost;
}
public Task SendCommands(int numberOfMessages)
{
    return Task.Run(() =>
    {
        var busControl = global::MassTransit.Bus.Factory.CreateUsingRabbitMq(
            sbc =>
            {
                // Host control
                var host =
                    sbc.Host(
                        new Uri(this.rabbitMqHost),
                        h =>
                        {
                            h.Username(this.user);
                            h.Password(this.password);
                            h.Heartbeat(60);
                            h.PublisherConfirmation = false;
                        });
            });
        busControl.StartAsync().Wait();
        var task = busControl.GetSendEndpoint(new Uri(this.rabbitMqHost + "/MasstransitService"));
        var tasks = new List<Task>();
        task.Wait();
        var endpoint = task.Result;
        for (var i = 0; i < numberOfMessages; i++)
        {
            tasks.Add(endpoint.Send<IMyMessage>(new
            {
                Number = i,
                Description = "MyMessage"
            }));
        }
        Task.WaitAll(tasks.ToArray());
    });
}
RabbitMQ code:
public RabbitMqMessageSender(string user, string password, string rabbitMqHost)
{
    var myUri = new Uri(rabbitMqHost);
    this.factory = new ConnectionFactory
    {
        HostName = myUri.Host,
        UserName = user,
        Password = password,
        VirtualHost = myUri.LocalPath,
        Port = (myUri.Port > 0 ? myUri.Port : -1),
        AutomaticRecoveryEnabled = true
    };
}

public Task SendCommands(int numberOfMessages)
{
    return Task.Run(() =>
    {
        using (var connection = this.factory.CreateConnection())
        {
            using (var channel = connection.CreateModel())
            {
                var messageProperties = channel.CreateBasicProperties();
                channel.ExchangeDeclare("PerformanceConsole:ShowRabbitMqMessage", "fanout", true, false, null);
                channel.ExchangeDeclare("RabbitMqService", "fanout", true, false, null);
                channel.QueueDeclare("RabbitMqService", true, false, false, null);
                channel.ExchangeBind("RabbitMqService", "PerformanceConsole:ShowRabbitMqMessage", "", null);
                channel.QueueBind("RabbitMqService", "RabbitMqService", "", null);
                for (var i = 0; i < numberOfMessages; i++)
                {
                    var bericht = new
                    {
                        Volgnummer = 1,
                        Tekst = "Bericht"
                    };
                    var body = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(bericht));
                    channel.BasicPublish("RabbitMqService", "", messageProperties, body);
                }
            }
        }
    });
}
The RabbitMQ dashboard shows the following rates:
MassTransit incoming: 664 msg/s
RabbitMQ incoming: 6764 msg/s
I expected the incoming rates to be in the same range.
Maybe I made a mistake in the configuration; suggestions are welcome.
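One thing that may be skewing the comparison (an observation added here, not from the original post): the MassTransit test creates and starts a new bus inside SendCommands, so connection and startup cost is counted against the send rate, while the RabbitMQ test only opens a connection and a channel. A minimal sketch that hoists the bus out of the timed path (names such as IMyMessage and the endpoint URI are taken from the code above; everything else is an assumption):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using MassTransit;

public class ReusableBusSender
{
    private readonly IBusControl busControl;
    private readonly string rabbitMqHost;

    public ReusableBusSender(string user, string password, string rabbitMqHost)
    {
        this.rabbitMqHost = rabbitMqHost;
        // Create and start the bus once, outside any timed send loop.
        busControl = Bus.Factory.CreateUsingRabbitMq(sbc =>
        {
            sbc.Host(new Uri(rabbitMqHost), h =>
            {
                h.Username(user);
                h.Password(password);
            });
        });
        busControl.Start();
    }

    public async Task SendCommands(int numberOfMessages)
    {
        var endpoint = await busControl.GetSendEndpoint(new Uri(rabbitMqHost + "/MasstransitService"));
        var tasks = new List<Task>();
        for (var i = 0; i < numberOfMessages; i++)
        {
            tasks.Add(endpoint.Send<IMyMessage>(new { Number = i, Description = "MyMessage" }));
        }
        await Task.WhenAll(tasks);
    }
}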
Using the MassTransit-Benchmark I get the following performance on my MacBook Pro 2015, using netcoreapp2.2 on OS X.
PhatBoyG-Pro15:MassTransit-Benchmark Chris$ dotnet run -f netcoreapp2.2 -- --clients=50 --count=50000 --prefetch=100
MassTransit Benchmark
Transport: RabbitMQ
Host: localhost
Virtual Host: /
Username: guest
Password: *****
Heartbeat: 0
Publisher Confirmation: False
Running Message Latency Benchmark
Message Count: 50000
Clients: 50
Durable: False
Payload Length: 0
Prefetch Count: 100
Concurrency Limit: 0
Total send duration: 0:00:05.2250045
Send message rate: 9569.37 (msg/s)
Total consume duration: 0:00:07.2385114
Consume message rate: 6907.50 (msg/s)
Concurrent Consumer Count: 8
Avg Ack Time: 4ms
Min Ack Time: 0ms
Max Ack Time: 251ms
Med Ack Time: 4ms
95t Ack Time: 6ms
Avg Consume Time: 1431ms
Min Consume Time: 268ms
Max Consume Time: 2075ms
Med Consume Time: 1639ms
95t Consume Time: 2070ms
You can download the benchmark and run it yourself:
https://github.com/MassTransit/MassTransit-Benchmark

In Moon APNS, what is the logic behind fetching valid tokens, in GetFeedBack method?

I have been trying to figure out the list of valid tokens for my Apple app using the Moon APNS library.
When using GetFeedBack(), the count of received tokens varies drastically within a few minutes:
On the first attempt it was around 8000.
Then it was 0.
On the third attempt it was 1.
And again it was 0.
I am using a valid production certificate and have successfully pushed notifications with the same certificate to sample devices.
I do not understand the logic of the code and on what basis it returns the received tokens.
The code downloaded from Moon APNS is below.
public List<Feedback> GetFeedBack()
{
    try
    {
        var feedbacks = new List<Feedback>();
        Logger.Info("Connecting to feedback service.");

        if (!_conected)
            Connect(_feedbackHost, FeedbackPort, _certificates);

        if (_conected)
        {
            //Set up
            byte[] buffer = new byte[38];
            int recd = 0;
            DateTime minTimestamp = DateTime.Now.AddYears(-1);

            //Get the first feedback
            recd = _apnsStream.Read(buffer, 0, buffer.Length);
            Logger.Info("Feedback response received.");
            if (recd == 0)
                Logger.Info("Feedback response is empty.");

            //Continue while we have results and are not disposing
            while (recd > 0)
            {
                Logger.Info("processing feedback response");
                var fb = new Feedback();

                //Get our seconds since 1970
                byte[] bSeconds = new byte[4];
                byte[] bDeviceToken = new byte[32];
                Array.Copy(buffer, 0, bSeconds, 0, 4);

                //Check endianness
                if (BitConverter.IsLittleEndian)
                    Array.Reverse(bSeconds);
                int tSeconds = BitConverter.ToInt32(bSeconds, 0);

                //Add seconds since 1970 to that date, in UTC and then get it locally
                fb.Timestamp = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc).AddSeconds(tSeconds).ToLocalTime();

                //Now copy out the device token
                Array.Copy(buffer, 6, bDeviceToken, 0, 32);
                fb.DeviceToken = BitConverter.ToString(bDeviceToken).Replace("-", "").ToLower().Trim();

                //Make sure we have a good feedback tuple
                if (fb.DeviceToken.Length == 64 && fb.Timestamp > minTimestamp)
                {
                    //Raise event
                    //this.Feedback(this, fb);
                    feedbacks.Add(fb);
                }

                //Clear our array to reuse it
                Array.Clear(buffer, 0, buffer.Length);

                //Read the next feedback
                recd = _apnsStream.Read(buffer, 0, buffer.Length);
            }

            //Close the connection here!
            Disconnect();

            if (feedbacks.Count > 0)
                Logger.Info("Total {0} feedbacks received.", feedbacks.Count);

            return feedbacks;
        }
    }
    catch (Exception ex)
    {
        Logger.Error("Error occurred on receiving feed back. - " + ex.Message);
        return null;
    }
    return null;
}
Personally I have not used Moon APNS, but the way the APNS feedback service works is that every time you call it, it returns the tokens that became inactive since the last time you called the feedback service. That explains the pattern you are seeing here.
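To make that concrete, a minimal usage sketch (the PushNotification constructor arguments should be checked against your MoonAPNS version; the accumulation step stands in for whatever persistence you use): because each call only reports tokens that became inactive since the previous call, the results must be accumulated across calls rather than expected to repeat.

using System.Collections.Generic;
using MoonAPNS;

// false = production gateway here; verify the flag's meaning for your version.
var push = new PushNotification(false, "myProductionCert.p12", "certPassword");
var invalidTokens = new List<string>();

// GetFeedBack() returns null on error (see the code above), hence the guard.
List<Feedback> newlyInactive = push.GetFeedBack() ?? new List<Feedback>();
foreach (var fb in newlyInactive)
{
    // Accumulate; in a real app, persist these (database/file) across runs,
    // since the service will not report the same token again on the next call.
    invalidTokens.Add(fb.DeviceToken);
}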
