I was writing a simple program to learn the concept of a queue; the program is attached below.
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/queue.h"

QueueHandle_t xQueue_handle;

void vSenderFunction(void* params){
    uint16_t count = 100;
    while(true){
        count += 50;
        BaseType_t xStatus;
        uint32_t dataSent = (uint32_t) params + count;
        printf("SENDER::the data received is %d and is about to add to Queue\n", dataSent);
        xStatus = xQueueSendToBack(xQueue_handle, &dataSent, 0);
        printf("SENDER::Added to Queue\n");
        vTaskDelay(300 / portTICK_PERIOD_MS);
    }
}

void vReceiverFunction(void* params){
    uint16_t count = 0;
    uint32_t vDataReceived;
    while(true){
        count++;
        uint8_t vItemsInQueue = uxQueueMessagesWaiting(xQueue_handle);
        printf("RECEIVER:: %d.items in queue %d \n", count, vItemsInQueue);
        xQueueReceive(xQueue_handle, &vDataReceived, 0);
        printf("RECEIVER:: \t\t data RECEIVED is %d\n", vDataReceived);
        vTaskDelay(1000 / portTICK_PERIOD_MS);
    }
}

void app_main(void)
{
    xQueue_handle = xQueueCreate(5, sizeof(uint32_t));
    if(xQueue_handle != NULL){
        printf("Queue creation successfull.\r\n");
        xTaskCreate(vSenderFunction, "sender1", 4096, (void*) 100, 2, NULL);
        xTaskCreate(vReceiverFunction, "Receiver1", 4096, (void*) 100, 1, NULL);
    }else{
        printf("Queue creation failed.\r\n");
    }
}
In the main function I have assigned the priority of the receiver to 1 and the sender to 2, but when executing, the receiver gets executed first. The output is attached below:
I (306) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
Queue creation successfull.
RECEIVER:: 1.items in queue 0
SENDER::the data received is 250 and is about to add to Queue
RECEIVER:: data RECEIVED is 1073421060
SENDER::Added to Queue
SENDER::the data received is 300 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 350 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 400 and is about to add to Queue
SENDER::Added to Queue
RECEIVER:: 2.items in queue 4
RECEIVER:: data RECEIVED is 250
SENDER::the data received is 450 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 500 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 550 and is about to add to Queue
SENDER::Added to Queue
RECEIVER:: 3.items in queue 5
RECEIVER:: data RECEIVED is 300
SENDER::the data received is 600 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 650 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 700 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 750 and is about to add to Queue
SENDER::Added to Queue
RECEIVER:: 4.items in queue 5
RECEIVER:: data RECEIVED is 350
SENDER::the data received is 800 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 850 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 900 and is about to add to Queue
SENDER::Added to Queue
RECEIVER:: 5.items in queue 5
RECEIVER:: data RECEIVED is 400
SENDER::the data received is 950 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 1000 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 1050 and is about to add to Queue
SENDER::Added to Queue
RECEIVER:: 6.items in queue 5
RECEIVER:: data RECEIVED is 450
SENDER::the data received is 1100 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 1150 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 1200 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 1250 and is about to add to Queue
SENDER::Added to Queue
RECEIVER:: 7.items in queue 5
RECEIVER:: data RECEIVED is 500
SENDER::the data received is 1300 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 1350 and is about to add to Queue
SENDER::Added to Queue
SENDER::the data received is 1400 and is about to add to Queue
SENDER::Added to Queue
RECEIVER:: 8.items in queue 5
RECEIVER:: data RECEIVED is 600
Done
What is the program workflow here, and what am I missing?
I am using an STM32F103C8 and I need a function that will return the correct time in microseconds when called from within an interrupt handler. I found the following bit of code online which purports to do that:
uint32_t microsISR()
{
    uint32_t ret;
    uint32_t st = SysTick->VAL;
    uint32_t pending = SCB->ICSR & SCB_ICSR_PENDSTSET_Msk;
    uint32_t ms = UptimeMillis;

    if (pending == 0)
        ms++;

    return ms * 1000 - st / ((SysTick->LOAD + 1) / 1000);
}
My understanding of how this works is that it uses the system clock counter, which repeatedly counts down from 8000 (LOAD + 1); when it reaches zero, an interrupt is generated which increments the variable UptimeMillis. This gives the time in milliseconds. To get microseconds we take the current value of the system clock counter and divide it by 8000/1000 to give the offset in microseconds. Since the counter is counting down, we subtract this from the current time in milliseconds * 1000. (Actually, to be correct, I believe one should be added to the number of milliseconds in this calculation.)
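As a quick sanity check of that formula, here is a small Python calculation using the question's own numbers (LOAD + 1 = 8000 as stated above, and the Milli/Counter values from the first row of the trace further below):

# Quick arithmetic check of the formula, assuming LOAD + 1 = 8000 as stated above.
# The values are taken from the first row of the trace further below (Milli = 1661, Counter = 3602).
ms = 1661                      # UptimeMillis at the time of the call
st = 3602                      # SysTick->VAL, counting down
ticks_per_us = 8000 // 1000    # (SysTick->LOAD + 1) / 1000
us = ms * 1000 - st // ticks_per_us
print(us)                      # 1660550, matching the Micros column of that row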
This is all fine and good unless, when this function is called (in an interrupt handler), the system clock counter has already wrapped but the system clock interrupt has not yet been serviced; then the UptimeMillis count will be off by one. This is the purpose of the following lines:
if (pending == 0)
    ms++;
Looking at this, however, it does not make sense: it is incrementing the millisecond count if there is NO pending interrupt. Indeed, if I use this code, I get a large number of glitches in the returned time at the points at which the counter rolls over. So I changed the lines to:
if (pending != 0)
    ms++;
This produced much better results but I still get the occasional glitch (about 1 in every 2000 interrupts) which always occurs at a time when the counter is rolling over.
During the interrupt, I log the current value of milliseconds, microseconds and counter value. I find there are two situations where I get an error:
Milli Micros DT Counter Pending
1 1661 1660550 826 3602 0
2 1662 1661374 824 5010 0
3 1663 1662196 822 6436 0
4 1663 1662022 -174 7826 0
5 1664 1663847 1825 1228 0
6 1665 1664674 827 2614 0
7 1666 1665501 827 3993 0
The interrupts are coming in at a regular rate of about 820 us. In this case, what seems to be happening between interrupts 3 and 4 is that the counter has wrapped but the pending flag is NOT set. So I need to be adding 1000 to the value, and since I fail to do so I get a negative elapsed time.
The second situation is as follows:
Milli Micros DT Counter Pending
1 1814 1813535 818 3721 0
2 1815 1814357 822 5151 0
3 1816 1815181 824 6554 0
4 1817 1817000 1819 2 1
5 1817 1816817 -183 1466 0
6 1818 1817637 820 2906 0
This is a very similar situation, except in this case the counter has NOT yet wrapped and yet I am already seeing the pending interrupt flag, which causes me to erroneously add 1000.
Clearly there is some kind of race condition between the two competing interrupts. I have tried setting the clock interrupt priority both above and below that of the external interrupt but the problem persists.
Does anyone have any suggestions for how to deal with this problem, or a suggestion for a different approach to get the time in microseconds within an interrupt handler?
Read UptimeMillis before and after SysTick->VAL to ensure a rollover has not occurred.
uint32_t microsISR()
{
    uint32_t ms = UptimeMillis;
    uint32_t st = SysTick->VAL;

    // Did UptimeMillis rollover while reading SysTick->VAL?
    if (ms != UptimeMillis)
    {
        // Rollover occurred so read both again.
        // Must read both because we don't know whether the
        // rollover occurred before or after reading SysTick->VAL.
        // No need to check for another rollover because there is
        // no chance of another rollover occurring so quickly.
        ms = UptimeMillis;
        st = SysTick->VAL;
    }

    return ms * 1000 - st / ((SysTick->LOAD + 1) / 1000);
}
Or here is the same idea in a do-while loop.
uint32_t microsISR()
{
    uint32_t ms;
    uint32_t st;

    // Read UptimeMillis and SysTick->VAL until
    // UptimeMillis doesn't rollover.
    do
    {
        ms = UptimeMillis;
        st = SysTick->VAL;
    } while (ms != UptimeMillis);

    return ms * 1000 - st / ((SysTick->LOAD + 1) / 1000);
}
Summary
I see very high latency at the receiver using zmq.PUSH/PULL sockets.
Details
I'm new to ZeroMQ, and am trying to send a few MBytes from the sender to the receiver (my final aim is to send a camera stream from the sender to the receiver). The sender and receiver are different machines on the same local network (one is a Mac and the other is running Ubuntu 18.04, if that matters). The sender is able to send its packets very fast (I'm guessing they get buffered in some zmq/tcp queue), but the receiver receives them very slowly with increasing latency per packet.
Here's the sender:

"""
Sender
"""
import zmq
from time import time
import sys

addr = "tcp://192.168.86.33:5555"

context = zmq.Context()
socket = context.socket(zmq.PUSH)
socket.bind(addr)

num = 0
while True:
    frame = [0] * 1000000
    ts = time()
    num += 1
    socket.send_pyobj(dict(num=num, frame=frame, ts=ts))

    delay = time() - ts
    print("Sender:: pkt_num: {}, delay: {}".format(num, delay))
And the receiver:

"""
Receiver
"""
import zmq
from time import time

addr = "tcp://192.168.86.33:5555"
context = zmq.Context()

socket = context.socket(zmq.PULL)
socket.connect(addr)

while True:
    msg = socket.recv_pyobj()
    frame = msg['frame']
    num = msg['num']
    ts = msg['ts']

    delay = time() - ts

    if True:
        print("Receiver:: pkt_num: {} latency: {}".format(num, delay))
When I run this, I see the sender is able to send its packets very quickly:
Sender:: pkt_num: 1, delay: 0.026965618133544922
Sender:: pkt_num: 2, delay: 0.018309354782104492
Sender:: pkt_num: 3, delay: 0.01821303367614746
Sender:: pkt_num: 4, delay: 0.016669273376464844
Sender:: pkt_num: 5, delay: 0.01674652099609375
Sender:: pkt_num: 6, delay: 0.01668095588684082
Sender:: pkt_num: 7, delay: 0.015082836151123047
Sender:: pkt_num: 8, delay: 0.014363527297973633
Sender:: pkt_num: 9, delay: 0.014063835144042969
Sender:: pkt_num: 10, delay: 0.014398813247680664
But the receiver sees very high and growing packet latencies:
Receiver:: pkt_num: 1 latency: 0.1272585391998291
Receiver:: pkt_num: 2 latency: 0.2539491653442383
Receiver:: pkt_num: 3 latency: 0.40800905227661133
Receiver:: pkt_num: 4 latency: 0.5737316608428955
Receiver:: pkt_num: 5 latency: 0.7272651195526123
Receiver:: pkt_num: 6 latency: 0.9418754577636719
Receiver:: pkt_num: 7 latency: 1.0799565315246582
Receiver:: pkt_num: 8 latency: 1.228663682937622
Receiver:: pkt_num: 9 latency: 1.3731486797332764
Receiver:: pkt_num: 10 latency: 1.5067603588104248
I tried swapping the sender and receiver between the Mac and Linux machines, and saw the same behavior. Since my goal is to send a video stream from the sender to the receiver, these high latencies make this unusable for that purpose.
Edit 1
Based on user3666197's suggestion, I edited the sender/receiver test code to remove some overheads. On the sender side, I keep sending the same dict. I also added more prints.
Sender:
num = 0
frame = [0] * 1000000
payload = dict(num=0, frame=frame, ts=0.0)
while True:
    payload['num'] += 1
    payload['ts'] = time()
    socket.send_pyobj(payload)

    delay = time() - payload['ts']
    print("Sender:: pkt_num: {:>6d}, delay: {:6f}" \
          .format(payload['num'], delay))
Receiver:

socket = context.socket(zmq.PULL)
socket.connect(addr)
clk = zmq.Stopwatch()
clk.start()

while True:
    iterT = clk.stop()
    clk.start()
    msg = socket.recv_pyobj()
    rcvT = clk.stop()
    delay = time() - msg['ts']

    print("Server:: pkt_num: {:>6d} latency: {:>6f} iterT: {} rcvT: {}" \
          .format(msg['num'], delay, iterT, rcvT))
    clk.start()
The sender's per-packet delay has reduced further. An interesting data point revealed here is that the receiver takes almost 0.15 s to receive each packet, which seems to be the main problem.
Sender:: pkt_num: 1, delay: 1.797830
Sender:: pkt_num: 2, delay: 0.025297
Sender:: pkt_num: 3, delay: 0.019500
Sender:: pkt_num: 4, delay: 0.019500
Sender:: pkt_num: 5, delay: 0.018166
Sender:: pkt_num: 6, delay: 0.017320
Sender:: pkt_num: 7, delay: 0.017258
Sender:: pkt_num: 8, delay: 0.017277
Sender:: pkt_num: 9, delay: 0.017426
Sender:: pkt_num: 10, delay: 0.017340
In the receiver's prints, rcvT is the per-packet receive time in microseconds.
Server:: pkt_num: 1 latency: 2.395570 iterT: 1 rcvT: 331601
Server:: pkt_num: 2 latency: 0.735229 iterT: 1 rcvT: 137547
Server:: pkt_num: 3 latency: 0.844345 iterT: 1 rcvT: 134385
Server:: pkt_num: 4 latency: 0.991852 iterT: 1 rcvT: 166980
Server:: pkt_num: 5 latency: 1.089429 iterT: 2 rcvT: 117047
Server:: pkt_num: 6 latency: 1.190770 iterT: 2 rcvT: 119466
Server:: pkt_num: 7 latency: 1.348077 iterT: 2 rcvT: 174566
Server:: pkt_num: 8 latency: 1.460732 iterT: 1 rcvT: 129858
Server:: pkt_num: 9 latency: 1.585445 iterT: 2 rcvT: 141948
Server:: pkt_num: 10 latency: 1.717757 iterT: 1 rcvT: 149666
Edit 2
I implemented the solution pointed out in this answer that uses PUB/SUB, and it works perfectly; I get 30fps video on the receiver. I still do not understand why my PUSH/PULL sample sees a delay.
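For reference, a minimal PUB/SUB sketch along those lines (this is not the code from the linked answer; it is just the generic publisher/subscriber skeleton, reusing the address from above and the usual empty subscription filter):

# Minimal PUB/SUB sketch (illustrative only; not the exact code from the linked answer).
import zmq

# --- publisher process (camera side) ---
ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://192.168.86.33:5555")
# inside the capture loop: pub.send_pyobj(payload)

# --- subscriber process (display side) ---
ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.setsockopt_string(zmq.SUBSCRIBE, "")   # subscribe to all messages
sub.connect("tcp://192.168.86.33:5555")
# inside the display loop: msg = sub.recv_pyobj()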
Q : "ZeroMQ: very high latency with PUSH/PULL"
The ZeroMQ PUSH/PULL archetype is a part of the story. Let's decompose the claimed latency:
1 )
Let's measure the actual payload assembly costs [us]:
aClk = zmq.Stopwatch()

num = 0
while True:
    aClk.start() #-------------------------------------- .start() the microsecond timer
    frame = [0] * 1000000
    ts = time()
    num += 1
    aPayLOAD = dict( num   = num,
                     frame = frame,
                     ts    = ts
                     )
    _1 = aClk.stop() #---------------------------------- .stop() the microsecond timer

    aPayLOAD['ts'] = time()
    aClk.start() #-------------------------------------- .start() the microsecond timer
    socket.send_pyobj( aPayLOAD )
    _2 = aClk.stop() #---------------------------------- .stop() the microsecond timer

    delay = time() - aPayLOAD['ts']
    print( "Sender:: pkt_num: {0:>6d}, assy: {1:>6d} [us] .send_pyobj(): {2:>6d} [us] 'delay': {3:>10.6f} [s] ".format( num, _1, _2, delay ) )
2 )
Doing the same for the receiving side gives you a clear picture of how many [us] get consumed on the Python side before ZeroMQ has ever touched the first byte of the payload.
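A sketch of such a receiver-side measurement could look like this (not part of the original answer; it splits the blocking receive from the Python-side unpickling that .recv_pyobj() performs internally):

# Sketch of the mirrored receiver-side measurement (illustrative only).
import pickle
import zmq
from time import time

addr = "tcp://192.168.86.33:5555"          # same address as in the question
context = zmq.Context()
socket = context.socket(zmq.PULL)
socket.connect(addr)

aClk = zmq.Stopwatch()
while True:
    aClk.start()
    raw = socket.recv()                    # blocks until ZeroMQ hands over the next message
    _1 = aClk.stop()                       # [us] spent waiting in .recv()

    aClk.start()
    msg = pickle.loads(raw)                # the Python-side deserialisation .recv_pyobj() does internally
    _2 = aClk.stop()                       # [us] spent un-pickling

    delay = time() - msg['ts']
    print("Receiver:: pkt_num: {0:>6d} recv: {1:>6d} [us] unpickle: {2:>6d} [us] 'delay': {3:>10.6f} [s]".format(msg['num'], _1, _2, delay))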
3 )
Next comes the performance tuning:
Avoid Python garbage collection
Improve the wasteful memory management (avoid creating new instances; instead inject data into memory-efficient handlers, e.g. numpy.ndarray data zones, rather than re-composing lists into a Python variable)
Refactor the code to avoid the expensive dict-based key-value mapping and rather use compact binary maps, as available in the struct module (your proposed mapping is both static and trivial); see the sketch after this list
Test adding a compression step (it may improve the over-the-network hop latency a bit)
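A minimal sketch of such struct-based framing for this particular payload (the header layout is made up purely for illustration, and it assumes the frame is shipped as raw bytes rather than a pickled list):

# Illustrative only: a compact binary header instead of a pickled dict.
import struct

HEADER = struct.Struct("!Id")              # uint32 packet number + float64 timestamp, network byte order

def pack_payload(num, ts, frame_bytes):
    # prepend the tiny fixed-size header to the raw frame bytes
    return HEADER.pack(num, ts) + frame_bytes

def unpack_payload(blob):
    num, ts = HEADER.unpack_from(blob, 0)
    return num, ts, blob[HEADER.size:]

# sender:   socket.send(pack_payload(num, time(), frame_bytes), copy=False)
# receiver: num, ts, frame_bytes = unpack_payload(socket.recv())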
4 )
Only now comes the ZeroMQ part:
You may test for improved performance envelopes by adding more I/O threads to either of the Context( nIO_threads ) instances
You may go for .setsockopt( zmq.CONFLATE, 1 ) to ignore all but the most recent frame, as video streaming may get no added value from the "late" re-creation of already "old" frames; a small sketch of both knobs follows
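A small sketch of those two knobs on the receiving side (the values are illustrative, not tuned; the address is the one from the question):

# Illustrative only: more I/O threads plus CONFLATE on the receiving side.
import zmq

context = zmq.Context(io_threads=2)        # default is a single I/O thread per Context
socket = context.socket(zmq.PULL)
socket.setsockopt(zmq.CONFLATE, 1)         # keep only the newest message; set before connect()
socket.connect("tcp://192.168.86.33:5555") # address from the question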
I have an rsyslog instance that should just forward messages. It has the regular UDP port 514 open and receives messages. Forwarding via omfwd over TCP works for a while and then stops.
if $syslogfacility != 1 then {
    action(Name="syslog-fwd" Type="omfwd" Target="127.0.0.1" Port="10514" template="JSONDefaultstr" Action.ResumeInterval="5" Protocol="tcp")
    stop
}
In the log I can see the following:
2093.110977082:syslog-fwd queue:Reg/w0: wti 0x55e240948920: wti.c: worker awoke from idle processing
2093.110980024:syslog-fwd queue:Reg/w0: queue.c: DeleteProcessedBatch: we deleted 0 objects and enqueued 0 objects
2093.110982399:syslog-fwd queue:Reg/w0: queue.c: doDeleteBatch: delete batch from store, new sizes: log 1, phys 1
2093.110984879:syslog-fwd queue:Reg/w0: syslog-fwd queue: queue.c: dequeued 1 consumable elements, szlog 0 sz phys 1
2093.110991750:syslog-fwd queue:Reg/w0: ../action.c: action 'syslog-fwd': is transactional - executing in commit phase
2093.110994557:syslog-fwd queue:Reg/w0: omfwd.c: omfwd: beginTransaction
2093.110997258:syslog-fwd queue:Reg/w0: omfwd.c: omfwd: doTryResume 127.0.0.1 iRet 0
2093.110999651:syslog-fwd queue:Reg/w0: ../action.c: action[syslog-fwd] transitioned to state: itx
2093.111002109:syslog-fwd queue:Reg/w0: ../action.c: processBatchMain: i 0, processMsgMain iRet -2121
2093.111004393:syslog-fwd queue:Reg/w0: ../action.c: processBatchMain: i 0, COMM state set
2093.111006850:syslog-fwd queue:Reg/w0: ../action.c: actionCommit[syslog-fwd]: enter, 1 msgs
2093.111009128:syslog-fwd queue:Reg/w0: ../action.c: actionCommit[syslog-fwd]: processing...
2093.111011368:syslog-fwd queue:Reg/w0: ../action.c: actionTryCommit[syslog-fwd] enter
2093.111013724:syslog-fwd queue:Reg/w0: ../action.c: doTransaction: have commitTransaction IF, using that, pWrkrInfo 0x55e2409489f0
2093.111016211:syslog-fwd queue:Reg/w0: ../action.c: entering actionCallCommitTransaction[syslog-fwd], state: itx, nMsgs 1
2093.111018502:syslog-fwd queue:Reg/w0: omfwd.c: omfwd: doTryResume 127.0.0.1 iRet 0
2093.111020942:syslog-fwd queue:Reg/w0: omfwd.c: 127.0.0.1:10514/tcp
2093.111024094:syslog-fwd queue:Reg/w0: omfwd.c: omfwd: add 227 bytes to send buffer (curr offs 0)
2093.111047664:syslog-fwd queue:Reg/w0: omfwd.c: omfwd: TCP sent 227 bytes, requested 227
2093.111051182:syslog-fwd queue:Reg/w0: ../action.c: actionCallCommitTransaction[syslog-fwd] state: itx mod commitTransaction returned 0
2093.111053587:syslog-fwd queue:Reg/w0: ../action.c: action[syslog-fwd] transitioned to state: rdy
2093.111055999:syslog-fwd queue:Reg/w0: ../action.c: actionCommit[syslog-fwd]: return actionTryCommit 0
2093.111058371:syslog-fwd queue:Reg/w0: ../action.c: actionCommit[syslog-fwd]: done, iRet 0
2093.111060964:syslog-fwd queue:Reg/w0: queue.c: regular consumer finished, iret=0, szlog 0 sz phys 1
2093.111063484:syslog-fwd queue:Reg/w0: queue.c: DeleteProcessedBatch: etry 0 state 3
2093.111066649:syslog-fwd queue:Reg/w0: queue.c: DeleteProcessedBatch: we deleted 1 objects and enqueued 0 objects
2093.111069152:syslog-fwd queue:Reg/w0: queue.c: doDeleteBatch: delete batch from store, new sizes: log 0, phys 0
2093.111071641:syslog-fwd queue:Reg/w0: syslog-fwd queue: queue.c: dequeued 0 consumable elements, szlog 0 sz phys 0
2093.111074225:syslog-fwd queue:Reg/w0: queue.c: regular consumer finished, iret=4, szlog 0 sz phys 0
2093.111076514:syslog-fwd queue:Reg/w0: wti.c: syslog-fwd queue:Reg/w0: worker IDLE, waiting for work.
2093.280167252:imtcp.c : nsdpoll_ptcp.c: epoll returned 1 entries
2093.280182600:imtcp.c : tcpsrv.c: tcpsrv: ready to process 1 event entries
...
This works fine... but then suddenly:
2093.280485033:syslog-fwd queue:Reg/w0: wti 0x55e240948920: wti.c: worker awoke from idle processing
2093.280488998:syslog-fwd queue:Reg/w0: queue.c: DeleteProcessedBatch: we deleted 0 objects and enqueued 0 objects
2093.280491486:syslog-fwd queue:Reg/w0: queue.c: doDeleteBatch: delete batch from store, new sizes: log 2, phys 2
2093.280494077:syslog-fwd queue:Reg/w0: syslog-fwd queue: queue.c: dequeued 2 consumable elements, szlog 0 sz phys 2
2093.293312843:imtcp.c : nsdpoll_ptcp.c: epoll returned 1 entries
2093.293326156:imtcp.c : tcpsrv.c: tcpsrv: ready to process 1 event entries
And than "wti 0x55e240948920: wti.c: worker awoke from idle processing" never come up again.
The queue gets filled:
2094.037943773:main Q:Reg/w0 : ../action.c: action 'syslog-fwd': called, logging to builtin:omfwd (susp 0/0, direct q 0)
2094.037946442:main Q:Reg/w0 : syslog-fwd queue: queue.c: qqueueAdd: entry added, size now log 11, phys 13 entries
2094.037948880:main Q:Reg/w0 : syslog-fwd queue: queue.c: EnqueueMsg advised worker start
2094.037951334:main Q:Reg/w0 : ../action.c: action 'syslog-fwd': set suspended state to 0
...
2363.077252235:main Q:Reg/w0 : ../action.c: action 'syslog-fwd': called, logging to builtin:omfwd (susp 0/0, direct q 0)
2363.077255829:main Q:Reg/w0 : syslog-fwd queue: queue.c: queue nearly full (3000 entries), but could not drop msg (iRet: 0, severity 6)
2363.077258619:main Q:Reg/w0 : syslog-fwd queue: queue.c: doEnqSingleObject: queue FULL - waiting 2000ms to drain.
And now the funny part: when I add the following rule (before the other one):
if $syslogfacility == 4 then {
    action(Name="write4" Type="omfile" File="/var/log/syslog4")
    stop
}
everything works fine. Messages come in:
Oct 31 07:54:26 otherhost.com sssd_be: GSSAPI client step 2
Oct 31 07:54:27 somehost.com sssd_be: GSSAPI client step 1
Anybody with a hint?
I've observed the same issue myself and have raised it as a bug:
https://github.com/rsyslog/rsyslog/issues/3273
In my case it was not related to omfwd, as I send using omrelp; instead it is imfile that causes the issue.
I am trying to create a TCP receiver which can get data over a certain port and store it in a file. I use Apache Camel 2.16.1, Spring Boot 1.3.0-RELEASE and Netty 4.0.33 for this.
My setup is like the one described in the camel-spring-boot starter; the route definition is this:
@Component
public class Routes {

    @Bean
    RoutesBuilder myRouter() {
        return new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                from("netty4:tcp://192.168.0.148:10001?sync=false&allowDefaultCodec=false&decoder=#decoder")
                    .to("file:/temp/printjobs");
            }
        };
    }
}
I created a decoder that looks like this:
public class RawPrinterDecoder extends MessageToMessageDecoder<ByteBuf> {

    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf msg,
                          List<Object> out) throws Exception {
        byte[] array = new byte[msg.readableBytes()];
        msg.getBytes(0, array);
        String string = new String(array);
        out.add(string);

        System.out.println("Received " + msg.readableBytes()
                + " in a buffer with maximum capacity of " + msg.capacity());
        System.out.println("50 first bytes I received: "
                + string.substring(0, 50));
    }
}
Data is sent by using this command:
cat binaryfile | nc 192.168.0.148 10001
The route is built, but when I use it I am unable to get the original binary file back in one piece; instead I receive several blocks of data:
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 16384 in a buffer with maximum capacity of 16384
Received 16384 in a buffer with maximum capacity of 16384
Received 16384 in a buffer with maximum capacity of 16384
Received 5357 in a buffer with maximum capacity of 16384
(As you can see, my original file is processed in blocks of increasing size; it is 70839 bytes in size.)
The received blocks are each stored in a separate file because I am not able to join the parts:
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-1
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-3
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-5
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-7
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-9
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-11
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-13
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-15
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-17
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-19
...
How can I identify the first and last blocks and join them in the decoder? I tried an approach involving getMaxMessagesPerRead(), but it returns 16 and my file is split into 20 blocks of data.
I created a basic TCP server that reads incoming binary data in protocol buffer format and writes a binary message as a response. I would like to benchmark the round-trip time.
I tried iperf, but could not make it send the same input file multiple times. Is there another benchmark tool that can send a binary input file repeatedly?
If you have access to a Linux or Unix machine [1], you should use tcptrace. All you need to do is loop through your binary traffic test while capturing with a wireshark or tcpdump file.
After you have that .pcap file [2], analyze it with tcptrace -xtraffic <pcap_filename> [3]. This will generate two text files, and the average RTT stats for all connections in that pcap are shown at the bottom of the one called traffic_stats.dat.
[mpenning@Bucksnort tcpperf]$ tcptrace -xtraffic willers.pcap
mod_traffic: characterizing traffic
1 arg remaining, starting with 'willers.pcap'
Ostermann's tcptrace -- version 6.6.1 -- Wed Nov 19, 2003
16522 packets seen, 16522 TCP packets traced
elapsed wallclock time: 0:00:00.200709, 82318 pkts/sec analyzed
trace file elapsed time: 0:03:21.754962
Dumping port statistics into file traffic_byport.dat
Dumping overall statistics into file traffic_stats.dat
Plotting performed at 15.000 second intervals
[mpenning@Bucksnort tcpperf]$
[mpenning@Bucksnort tcpperf]$ cat traffic_stats.dat
Overall Statistics over 201 seconds (0:03:21.754962):
4135308 ttl bytes sent, 20573.672 bytes/second
4135308 ttl non-rexmit bytes sent, 20573.672 bytes/second
0 ttl rexmit bytes sent, 0.000 bytes/second
16522 packets sent, 82.199 packets/second
200 connections opened, 0.995 conns/second
11 dupacks sent, 0.055 dupacks/second
0 rexmits sent, 0.000 rexmits/second
average RTT: 67.511 msecs <------------------
[mpenning@Bucksnort tcpperf]$
The .pcap file used in this example was a capture I generated when I looped through an expect script that pulled data from one of my servers. This was how I generated the loop...
#!/usr/bin/python
from subprocess import Popen, PIPE
import time

for ii in xrange(0, 200):
    # willers.exp is an expect script
    Popen(['./willers.exp'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
    time.sleep(1)
You can adjust the sleep time between loops based on your server's accept() performance and the duration of your tests.
END NOTES:
[1] A Knoppix Live-CD will do
[2] Filtered to only capture test traffic
[3] tcptrace is capable of very detailed per-socket stats if you use other options...
================================
[mpenning@Bucksnort tcpperf]$ tcptrace -lr willers.pcap
1 arg remaining, starting with 'willers.pcap'
Ostermann's tcptrace -- version 6.6.1 -- Wed Nov 19, 2003
16522 packets seen, 16522 TCP packets traced
elapsed wallclock time: 0:00:00.080496, 205252 pkts/sec analyzed
trace file elapsed time: 0:03:21.754962
TCP connection info:
200 TCP connections traced:
TCP connection 1:
host c: myhost.local:44781
host d: willers.local:22
complete conn: RESET (SYNs: 2) (FINs: 1)
first packet: Tue May 31 22:52:24.154801 2011
last packet: Tue May 31 22:52:25.668430 2011
elapsed time: 0:00:01.513628
total packets: 73
filename: willers.pcap
c->d: d->c:
total packets: 34 total packets: 39
resets sent: 4 resets sent: 0
ack pkts sent: 29 ack pkts sent: 39
pure acks sent: 11 pure acks sent: 2
sack pkts sent: 0 sack pkts sent: 0
dsack pkts sent: 0 dsack pkts sent: 0
max sack blks/ack: 0 max sack blks/ack: 0
unique bytes sent: 2512 unique bytes sent: 14336
actual data pkts: 17 actual data pkts: 36
actual data bytes: 2512 actual data bytes: 14336
rexmt data pkts: 0 rexmt data pkts: 0
rexmt data bytes: 0 rexmt data bytes: 0
zwnd probe pkts: 0 zwnd probe pkts: 0
zwnd probe bytes: 0 zwnd probe bytes: 0
outoforder pkts: 0 outoforder pkts: 0
pushed data pkts: 17 pushed data pkts: 33
SYN/FIN pkts sent: 1/1 SYN/FIN pkts sent: 1/0
req 1323 ws/ts: Y/Y req 1323 ws/ts: Y/Y
adv wind scale: 6 adv wind scale: 1
req sack: Y req sack: Y
sacks sent: 0 sacks sent: 0
urgent data pkts: 0 pkts urgent data pkts: 0 pkts
urgent data bytes: 0 bytes urgent data bytes: 0 bytes
mss requested: 1460 bytes mss requested: 1460 bytes
max segm size: 792 bytes max segm size: 1448 bytes
min segm size: 16 bytes min segm size: 32 bytes
avg segm size: 147 bytes avg segm size: 398 bytes
max win adv: 40832 bytes max win adv: 66608 bytes
min win adv: 5888 bytes min win adv: 66608 bytes
zero win adv: 0 times zero win adv: 0 times
avg win adv: 14035 bytes avg win adv: 66608 bytes
initial window: 32 bytes initial window: 40 bytes
initial window: 1 pkts initial window: 1 pkts
ttl stream length: 2512 bytes ttl stream length: NA
missed data: 0 bytes missed data: NA
truncated data: 0 bytes truncated data: 0 bytes
truncated packets: 0 pkts truncated packets: 0 pkts
data xmit time: 1.181 secs data xmit time: 1.236 secs
idletime max: 196.9 ms idletime max: 196.9 ms
throughput: 1660 Bps throughput: 9471 Bps
RTT samples: 18 RTT samples: 24
RTT min: 43.8 ms RTT min: 0.0 ms
RTT max: 142.5 ms RTT max: 7.2 ms
RTT avg: 68.5 ms RTT avg: 0.7 ms
RTT stdev: 35.8 ms RTT stdev: 1.6 ms
RTT from 3WHS: 80.8 ms RTT from 3WHS: 0.0 ms
RTT full_sz smpls: 1 RTT full_sz smpls: 3
RTT full_sz min: 142.5 ms RTT full_sz min: 0.0 ms
RTT full_sz max: 142.5 ms RTT full_sz max: 0.0 ms
RTT full_sz avg: 142.5 ms RTT full_sz avg: 0.0 ms
RTT full_sz stdev: 0.0 ms RTT full_sz stdev: 0.0 ms
post-loss acks: 0 post-loss acks: 0
segs cum acked: 0 segs cum acked: 9
duplicate acks: 0 duplicate acks: 1
triple dupacks: 0 triple dupacks: 0
max # retrans: 0 max # retrans: 0
min retr time: 0.0 ms min retr time: 0.0 ms
max retr time: 0.0 ms max retr time: 0.0 ms
avg retr time: 0.0 ms avg retr time: 0.0 ms
sdv retr time: 0.0 ms sdv retr time: 0.0 ms
================================
You can always stick a shell loop around a program like iperf. Also, assuming iperf can read from a file (and thus stdin), or using programs like ttcp, a shell loop could cat a file N times into iperf/ttcp.
If you want a program which sends a file, waits for your binary response, and then sends another copy of the file, you probably are going to need to code that yourself.
You will need to measure the time in the client application to get a round-trip time, or monitor the network traffic going from and coming to the client to get the complete time interval. Measuring the time at the server will exclude any kernel-level delays in the server and all the network transmission times.
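A rough sketch of such a self-coded client, measuring the round trip in the client itself (the host, port, file name, and the assumption that the whole reply fits into a single recv() are all placeholders for illustration):

# Rough sketch: send the same binary request repeatedly and time each request/response round trip.
import socket
import time

HOST, PORT = "192.168.0.1", 1234             # placeholder address of the server under test
payload = open("input.binary", "rb").read()  # the same binary request every iteration

rtts = []
for _ in range(1000):
    with socket.create_connection((HOST, PORT)) as s:
        t0 = time.perf_counter()
        s.sendall(payload)
        reply = s.recv(4096)                 # assumes the whole binary response fits in one read
        rtts.append(time.perf_counter() - t0)

print("average RTT: %.3f ms" % (1000.0 * sum(rtts) / len(rtts)))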
Note that TCP performance will go down as the load goes up. If you're going to test under heavy load, you need professional tools that can scale to thousands (or even millions, in some cases) of new connections/second or concurrent established TCP connections.
I wrote an article about this on my blog (feel free to remove if this is considered advertisement, but I think it's relevant to this thread): http://synsynack.wordpress.com/2012/04/09/realistic-latency-measurement-in-the-application-layers
As a very simple high-level tool, netcat comes to mind... so something like time (nc hostname 1234 < input.binary | head -c 100), assuming the response is 100 bytes long.