Create netcat style consumer with Apache Camel and netty4: route - spring

I am trying to create a TCP receiver that accepts data on a given port and stores it in a file. I use Apache Camel 2.16.1, Spring Boot 1.3.0-RELEASE and netty 4.0.33 for this.
My setup follows the one described in the camel-spring-boot starter. The route definition is:
@Component
public class Routes {
    @Bean
    RoutesBuilder myRouter() {
        return new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                from("netty4:tcp://192.168.0.148:10001?sync=false&allowDefaultCodec=false&decoder=#decoder")
                    .to("file:/temp/printjobs");
            }
        };
    }
}
I created a decoder that looks like this:
public class RawPrinterDecoder extends MessageToMessageDecoder<ByteBuf> {
    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf msg,
            List<Object> out) throws Exception {
        byte[] array = new byte[msg.readableBytes()];
        // Copy from the reader index; index 0 is only correct for a fresh buffer
        msg.getBytes(msg.readerIndex(), array);
        String string = new String(array);
        out.add(string);
        System.out.println("Received " + msg.readableBytes()
                + " in a buffer with maximum capacity of " + msg.capacity());
        // Guard against blocks shorter than 50 characters
        System.out.println("50 first bytes I received: "
                + string.substring(0, Math.min(50, string.length())));
    }
}
Data is sent by using this command:
cat binaryfile | nc 192.168.0.148 10001
The route is built, but when I use it I do not get the binary file back in its original form. Instead I receive several blocks of data:
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 1024 in a buffer with maximum capacity of 1024
Received 16384 in a buffer with maximum capacity of 16384
Received 16384 in a buffer with maximum capacity of 16384
Received 16384 in a buffer with maximum capacity of 16384
Received 5357 in a buffer with maximum capacity of 16384
(As you can see, my original file is processed in blocks of increasing size; it is 70839 bytes in size.)
Each received block is stored in a separate file because I am not able to join the parts:
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-1
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-3
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-5
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-7
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-9
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-11
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-13
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-15
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-17
24.11.2015 23:47 1.024 ID-Tower-53867-1448405258593-0-19
...
How can I identify the first and last block and join them in the decoder? I tried an approach based on getMaxMessagesPerRead(), but it returns 16 while my file is split into 20 blocks of data.


Does go use something like space padding for structs? [duplicate]

I was playing around in Go and trying to calculate the size of struct objects, and I found something interesting. Take a look at the following structs:
type Something struct {
    anInteger  int16 // 2 bytes
    anotherInt int16 // 2 bytes
    yetAnother int16 // 2 bytes
    someBool   bool  // 1 byte
} // I expected 7 bytes total
type SomethingBetter struct {
    anInteger   int16 // 2 bytes
    anotherInt  int16 // 2 bytes
    yetAnother  int16 // 2 bytes
    someBool    bool  // 1 byte
    anotherBool bool  // 1 byte
} // I expected 8 bytes total
type Nested struct {
    Something           // 7 bytes expected at first
    completingByte bool // 1 byte
} // 8 bytes expected at first sight
But the results I got using unsafe.Sizeof(...) were as follows:
Something -> 8 bytes
SomethingBetter -> 8 bytes
Nested -> 12 bytes (even after finding out that "Something" uses 8 bytes, I expected this to use 9 bytes)
I suspect that Go does some kind of padding, but I don't know how or why. Is there a formula or some logic behind it? If it pads, is that done randomly or according to rules?
Yes, we have padding! If your system architecture is 32-bit, the word size is 4 bytes; if it is 64-bit, the word size is 8 bytes. What is the word size? "Word size" refers to the number of bits processed by a computer's CPU in one go (these days, typically 32 bits or 64 bits). Data bus size, instruction size and address size are usually multiples of the word size. In Go, each field is placed at an offset that is a multiple of its type's alignment, and the total struct size is rounded up to a multiple of the largest field alignment.
For example, consider this struct:
type data struct {
    a bool  // 1 byte
    b int64 // 8 bytes
}
This struct is not 9 bytes. With a word size of 8, the bool occupies the first byte of the first word and is followed by 7 bytes of padding, so that the int64 starts on an 8-byte boundary. The struct is 16 bytes in total.
Imagine:
p: padding
+-----------------------------------------+----------------+
| 1-byte bool | p | p | p | p | p | p | p | int-64 |
+-----------------------------------------+----------------+
first 8 bytes second 8 bytes
To minimize padding, order your struct fields from largest to smallest.
This layout wastes 8 bytes on padding:
type data struct {
    a string // 16 bytes            size 16
    b int32  //  4 bytes            size 20
             //  4 bytes padding    size 24
    c string // 16 bytes            size 40
    d int32  //  4 bytes            size 44
             //  4 bytes padding    size 48 - aligned on 8 bytes
}
This one is better:
type data struct {
    a string // 16 bytes            size 16
    c string // 16 bytes            size 32
    d int32  //  4 bytes            size 36
    b int32  //  4 bytes            size 40
             //  no padding         size 40 - aligned on 8 bytes
}
See here for more examples.

High CPU load on SYN flood

When under a SYN flood attack, my CPU reaches 100% in no time because of the kernel process named ksoftirqd.
I have tried many mitigations but none of them solves the problem.
This is my sysctl configuration, as returned by sysctl -p:
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
fs.file-max = 10000000
fs.nr_open = 10000000
net.core.somaxconn = 128
net.core.netdev_max_backlog = 2500
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
net.ipv4.tcp_syn_retries = 3
net.ipv4.tcp_tw_reuse = 1
net.netfilter.nf_conntrack_max = 10485760
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 15
vm.swappiness = 10
net.ipv4.icmp_echo_ignore_all = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_synack_retries = 1
Even after activating SYN cookies, the CPU load stays the same.
The listen queue of port 443 (the port under attack) shows 512 SYN_RECV entries, which is the default backlog limit set by NGINX.
That is also weird, because SOMAXCONN is set to a much lower value than 512 (128), so how can it exceed that limit?
SOMAXCONN should be the upper bound for every listening socket, and it is not.
I have read a lot and I am confused.
As far as I understand, SOMAXCONN is the backlog size for both the LISTEN and ACCEPT queues,
so what exactly is tcp_max_syn_backlog?
And how do I calculate the size of each queue?
I also read that SYN cookies do not activate immediately, but only after tcp_max_syn_backlog is reached. Is that true?
If so, it means its value needs to be lower than SOMAXCONN.
I even tried activating tcp_abort_on_overflow while under attack, but nothing changed.
If it is true that SYN cookies are activated on overflow, what is the result of applying both together?
I have 3 GB of RAM, of which only 700 MB is in use; my only problem is the CPU load.

Why malloc in WebAssembly requires 4x the memory?

I wrote a program in C that allocates memory with malloc() in an infinite loop.
My aim was to implement a simple denial of service using WebAssembly by opening multiple tabs and making the browser crash.
I can allocate at most about 2 GB per tab without crashing the tab (memory limitation for x64 browsers).
#include <stdlib.h>
#define MAX_MEM 2147483630 //2 GB
int main() {
    long int mem_used=209715000;
    while(1){
        if(mem_used<MAX_MEM){
            int *ptr = malloc(sizeof(int));
            mem_used+=4;
        }
    }
    return 0;
}
I expected it to work, but instead the tab crashes.
From the tests I have made, mem_used+=16 is the right choice to prevent the tab from crashing.
I don't know WebAssembly memory management in depth, so my guess is that it requires 4x the memory. Is that correct?
With emscripten, malloc adds some minimum chunk size and then aligns the address to at least 8 byte boundaries. So for small allocations (even zero bytes), malloc will appear to take significantly more space than needed. For big allocations, the overhead will be relatively small.
See comments in dlmalloc.c.
The following program demonstrates how much space malloc takes:
#include <cstdlib>   // malloc
#include <iostream>
int main() {
    char *previous, *current;
    // The distance between two consecutive allocations is the space the first one consumed
    previous = (char*)malloc(0);
    for(int i=0; i<32; ++i) {
        current = (char*)malloc(i+1);
        std::cout << "malloc(" << i << ") consumed " << (current-previous) << " bytes\n";
        previous = current;
    }
    std::cout << "\n";
    // Same measurement for powers of two
    previous = (char*)malloc(1);
    for(int i=0; i<12; ++i) {
        current = (char*)malloc( 1<<(i+1) );
        std::cout << "malloc(" << (1<<i) << ") consumed " << (current-previous) << " bytes\n";
        previous = current;
    }
    return 0;
}
This yields the following output:
malloc(0) consumed 16 bytes
malloc(1) consumed 16 bytes
malloc(2) consumed 16 bytes
malloc(3) consumed 16 bytes
malloc(4) consumed 16 bytes
malloc(5) consumed 16 bytes
malloc(6) consumed 16 bytes
malloc(7) consumed 16 bytes
malloc(8) consumed 16 bytes
malloc(9) consumed 16 bytes
malloc(10) consumed 16 bytes
malloc(11) consumed 16 bytes
malloc(12) consumed 16 bytes
malloc(13) consumed 24 bytes
malloc(14) consumed 24 bytes
malloc(15) consumed 24 bytes
malloc(16) consumed 24 bytes
malloc(17) consumed 24 bytes
malloc(18) consumed 24 bytes
malloc(19) consumed 24 bytes
malloc(20) consumed 24 bytes
malloc(21) consumed 32 bytes
malloc(22) consumed 32 bytes
malloc(23) consumed 32 bytes
malloc(24) consumed 32 bytes
malloc(25) consumed 32 bytes
malloc(26) consumed 32 bytes
malloc(27) consumed 32 bytes
malloc(28) consumed 32 bytes
malloc(29) consumed 40 bytes
malloc(30) consumed 40 bytes
malloc(31) consumed 40 bytes
malloc(1) consumed 16 bytes
malloc(2) consumed 16 bytes
malloc(4) consumed 16 bytes
malloc(8) consumed 16 bytes
malloc(16) consumed 24 bytes
malloc(32) consumed 40 bytes
malloc(64) consumed 72 bytes
malloc(128) consumed 136 bytes
malloc(256) consumed 264 bytes
malloc(512) consumed 520 bytes
malloc(1024) consumed 1032 bytes
malloc(2048) consumed 2056 bytes
See full source code in this repo
Your problem is that malloc implementations typically:
a) Include overhead; and
b) Round up to some unit
malloc (sizeof(int)) is using more than sizeof(int) bytes behind the scenes.
On practically any system, malloc() uses somewhat more memory than you request. Emscripten uses dlmalloc, a popular malloc() implementation, by default. According to Wikipedia:
Memory on the heap is allocated as "chunks", an 8-byte aligned data structure which contains a header, and usable memory. Allocated memory contains an 8 or 16 byte overhead for the size of the chunk and usage flags. Unallocated chunks also store pointers to other free chunks in the usable space area, making the minimum chunk size 16 bytes (32-bit system) and 24 bytes (64-bit system).
This means that even a single-byte allocation, malloc(1), uses at least 16 to 24 bytes. The reasons are memory alignment and the additional bytes each allocated block needs to store its metadata. Searching for how malloc() works will show why this overhead exists.
Therefore, for your purpose, the test should allocate a much larger memory block in each iteration to minimize this overhead. I would personally recommend 4 KB or 1 MB instead of sizeof(int).
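The measurements listed earlier fit a simple model: the request plus a small header, rounded up to a multiple of 8, with a floor of 16 bytes. The sketch below is only that model. The function name chunkSize and the 4-byte header are assumptions inferred from the printed output, not values taken from dlmalloc's source, and other builds or allocators may differ.

```go
package main

import "fmt"

// chunkSize models the bytes consumed by malloc(n) in the output above.
// Assumed constants (inferred from the measurements, not from dlmalloc itself):
// a 4-byte header, 8-byte rounding and a 16-byte minimum chunk.
func chunkSize(n int) int {
	const header, align, minChunk = 4, 8, 16
	size := (n + header + align - 1) / align * align
	if size < minChunk {
		return minChunk
	}
	return size
}

func main() {
	for _, n := range []int{0, 4, 12, 13, 32, 2048} {
		fmt.Printf("malloc(%d) -> %d bytes\n", n, chunkSize(n))
	}
}
```

Under this model malloc(4) costs 16 bytes, which matches the factor of 4 observed in the question, while malloc(2048) costs 2056 bytes, an overhead of well under one percent.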

How to change the size of the ESP32's UART-TX-FIFO

Configuration of the ESP32's UART_MEM_CONF_REG register does not change the size of the uart TX FIFO as expected.
I'm trying to change the size of UART0's TX FIFO to 512 bytes.
The FIFO's size (in bytes) can be set in UART_MEM_CONF_REG by configuring bits 7 to 10 (ESP32 TRM V4.0, page 364).
This register is 0x88 by default: 128-byte TX FIFO and 128-byte RX FIFO. So bit 7 = 1 sets a 128-byte TX FIFO size.
Unfortunately there is no information on how to set bits 7, 8, 9 and 10 to change the FIFO size. My first idea was to set bit 8 for 256 bytes, bit 9 for 512 bytes and bit 10 for 1024 bytes. I intend to use UART0 only, so there is no problem with the other UARTs' FIFO sizes.
I tried the following lines:
// Create a byte pattern to send
char buffer[256];
for (int i = 0; i < 256; i++) buffer[i] = i;
// e.g. set bit 8 for (maybe?) a 256-byte TX FIFO; other configurations have been tested as well
WRITE_PERI_REG(UART_MEM_CONF_REG(uart_num),0x108);
// Start uart driver, no event queue, no TX ringbuffer
uart_driver_install(uart_num, UART_BUF_SIZE, 0, 0, NULL, 0);
// send 256 bytes from a buffer
uart_tx_chars(uart_num, (const char*)buffer, 256);
// but only 128 bytes are sent
At least I expected some change of the TX-FIFO size. But that's not working. The transmission ends after 128 bytes are sent out - no matter how I set the bits 7 to 10 in the UART_MEM_CONF_REG.
What's wrong, what did I miss?

Maximum value of PCR

What is the maximum value of Program Clock Reference(PCR) in MPEG?
I understand that it is derived from a 27 MHz clock and periodically loaded into a 42-bit register.
PCR(i) = PCR_Base(i) * 300 + PCR_Ext(i)
where PCR_Base is loaded into a 33-bit register and
PCR_Ext is loaded into a 9-bit register.
So, the maximum value of PCR with respect to the 27 MHz clock is:
PCR = (2^33 - 1)*300 + (2^9 - 1) = 2,576,980,377,811.
=> (2,576,980,377,811/27,000,000) = 95443.7 s = 1590.7 min = 26.5 hours
The register overflows after 26.5 hours of continuous streaming. Is this understanding correct?
The PCR_Ext(i) value ranges from 0 to 299, not 0 to 511.
So the maximum is PCR = (2^33-1)*300 + 299 = 2,576,980,377,599.
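The arithmetic can be checked with a few lines of Go. This is a small sketch; the constant and function names are made up for illustration, and the only inputs are the figures from the question and answer (33-bit base, extension counting 0 to 299, 27 MHz clock).

```go
package main

import "fmt"

const (
	clockHz   = 27_000_000 // 27 MHz system clock
	baseBits  = 33         // width of PCR_Base
	extModulo = 300        // PCR_Ext counts 0..299
)

// maxPCR returns the largest PCR value: (2^33 - 1) * 300 + 299.
func maxPCR() uint64 {
	maxBase := uint64(1)<<baseBits - 1
	return maxBase*extModulo + (extModulo - 1)
}

func main() {
	pcr := maxPCR()
	seconds := float64(pcr) / clockHz
	fmt.Println("max PCR:", pcr)
	fmt.Printf("wraps after about %.1f s (%.1f hours)\n", seconds, seconds/3600)
}
```

The wrap-around period is still roughly 26.5 hours, since the corrected extension range changes the maximum by only 212 ticks of the 27 MHz clock.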
