How do we configure TLS max record size in JSSE (with SunJCE provider) with JDK 1.8? Is the TLS record size hardcoded to 16K bytes. We care a lot about latency in inter-service calls and want to experiment with smaller TLS record size.
There are a lot of articles on TLS record size and how a large size may be detrimental (e.g., http://chimera.labs.oreilly.com/books/1230000000545/ch04.html#TLS_RECORD_SIZE)
Thanks,
Arvind
It is not hardcoded: it depends on the cipher suite in use; but it's in the vicinity of 16k.
SSLSocket doesn't do any buffering, so you can control the maximum size actually used via a BufferedOutputStream constructed with the buffer size paramater. For example, the default of 8k will map to whatever the cipher suite needs in terms of ciphertext length, but it will be less than 16k. You would have to experiment a bit to find the size needed for the target record size.
Related
ZigZag requires a lot of overhead to write/read numbers. Actually I was stunned to see that it doesn't just write int/long values as they are, but does a lot of additional scrambling. There's even a loop involved:
https://github.com/mardambey/mypipe/blob/master/avro/lang/java/avro/src/main/java/org/apache/avro/io/DirectBinaryEncoder.java#L90
I don't seem to be able to find in Protocol Buffers docs or in Avro docs, or reason myself, what's the advantage of scrambling numbers like that? Why is it better to have positive and negative numbers alternated after encoding?
Why they're not just written in little-endian, big-endian, network order which would only require reading them into memory and possibly reverse bit endianness? What do we buy paying with performance?
It is a variable length 7-bit encoding. The first byte of the encoded value has it high bit set to 0, subsequent bytes have it at 1. Which is the way the decoder can tell how many bytes were used to encode the value. Byte order is always little-endian, regardless of the machine architecture.
It is an encoding trick that permits writing as few bytes as needed to encode the value. So an 8 byte long with a value between -64 and 63 takes only one byte. Which is common, the range provided by long is very rarely used in practice.
Packing the data tightly without the overhead of a gzip-style compression method was the design goal. Also used in the .NET Framework. The processor overhead needed to en/decode the value is inconsequential. Already much lower than a compression scheme, it is a very small fraction of the I/O cost.
While testing I observed that media attachments more than 4194304 bytes are rejected by the proxy. It throws the message - "HTTP content length exceeded 4194304 bytes".
Is this the functionality of LittleProxy implementation or there is any configuration which will allow bigger size of attachments than this?
This sounds like a TooLongFrameException thrown by HttpObjectAggregator.
The only way that I think this could happen is when using an HttpFiltersSource that specifies a non-zero value from getMaximumRequestBufferSizeInBytes() or getMaximumResponseBufferSizeInBytes(). You can increase those, but that increases your memory usage. If the filter can be rewritten so as to work with frames (chunks) as they stream through, then you can set the buffer size to 0 and dramatically reduce your memory consumption.
How can I calculate storage when FTPing to MainFrame? I was told LRECL will always remain '80'. Not sure how I can calculate PRI and SEC dynamically based on the file size...
QUOTE SITE LRECL=80 RECFM=FB CY PRI=100 SEC=100
If the site has SMS, you shouldn't need to, but if you need to calculate the number of tracks is the size of the file in bytes divided by 56,664, or the number of cylinders is the size of the file in bytes divided by 849,960. In either case, you would round up.
Unfortunately IBM's FTP server does not support the newer space allocation specifications in number of records (the JCL parameter AVGREC=U/M/K plus the record length as the first specification in the SPACE parameter).
However, there is an alternative, and that is to fall back on one of the lesser-used SPACE parameters - the blocksize specification. I will assume 3390 disk types for simplicity, and standard data sets.
For fixed-length records, you want to calculate the largest number that will fit in half a track (27994 bytes), because z/OS only supports block sizes up to 32760. Since you are dealing with 80-byte records, that number is 27290. Divide your file size by that number and that will give you the number of blocks. Then in a SITE server command, specify
SITE BLKSIZE=27920 LRECL=80 RECFM=FB BLOCKS=27920 PRI=calculated# SEC=a_little_extra
This is equivalent to SPACE=(27920,(calculated#,a_little_extra)).
z/OS space allocation calculates the number of tracks required and rounds up to the nearest track boundary.
For variable-length records, if your reading application can handle it, always use BLKSIZE=27994. The reason I have the warning about the reading application is that even today there are applications from ISVs that still have strange hard-coded maximum variable length blocks such as 12K.
If you are dealing with PDSEs, always use BLKSIZE=32760 for variable-length and the closest-to-32760 for fixed-length in your specification (32720 for FB/80), but calculate requirements based on BLKSIZE=4096. PDSEs are strange in their underlying layout; the physical records are 4096 bytes, which is because there is some linear data set VSAM code that handles the physical I/O.
I am trying to store a wordlist in redis. The performance is great.
My approach is of making a set called "words" and adding each new word via 'sadd'.
When adding a file thats 15.9 MB and contains about a million words, the redis-server process consumes 160 MB of ram. How come I am using 10x the memory, is there any better way of approaching this problem?
Well this is expected of any efficient data storage: the words have to be indexed in memory in a dynamic data structure of cells linked by pointers. Size of the structure metadata, pointers and memory allocator internal fragmentation is the reason why the data take much more memory than a corresponding flat file.
A Redis set is implemented as a hash table. This includes:
an array of pointers growing geometrically (powers of two)
a second array may be required when incremental rehashing is active
single-linked list cells representing the entries in the hash table (3 pointers, 24 bytes per entry)
Redis object wrappers (one per value) (16 bytes per entry)
actual data themselves (each of them prefixed by 8 bytes for size and capacity)
All the above sizes are given for the 64 bits implementation. Accounting for the memory allocator overhead, it results in Redis taking at least 64 bytes per set item (on top of the data) for a recent version of Redis using the jemalloc allocator (>= 2.4)
Redis provides memory optimizations for some data types, but they do not cover sets of strings. If you really need to optimize memory consumption of sets, there are tricks you can use though. I would not do this for just 160 MB of RAM, but should you have larger data, here is what you can do.
If you do not need the union, intersection, difference capabilities of sets, then you may store your words in hash objects. The benefit is hash objects can be optimized automatically by Redis using zipmap if they are small enough. The zipmap mechanism has been replaced by ziplist in Redis >= 2.6, but the idea is the same: using a serialized data structure which can fit in the CPU caches to get both performance and a compact memory footprint.
To guarantee the hash objects are small enough, the data could be distributed according to some hashing mechanism. Assuming you need to store 1M items, adding a word could be implemented in the following way:
hash it modulo 10000 (done on client side)
HMSET words:[hashnum] [word] 1
Instead of storing:
words => set{ hi, hello, greetings, howdy, bonjour, salut, ... }
you can store:
words:H1 => map{ hi:1, greetings:1, bonjour:1, ... }
words:H2 => map{ hello:1, howdy:1, salut:1, ... }
...
To retrieve or check the existence of a word, it is the same (hash it and use HGET or HEXISTS).
With this strategy, significant memory saving can be done provided the modulo of the hash is
chosen according to the zipmap configuration (or ziplist for Redis >= 2.6):
# Hashes are encoded in a special way (much more memory efficient) when they
# have at max a given number of elements, and the biggest element does not
# exceed a given threshold. You can configure this limits with the following
# configuration directives.
hash-max-zipmap-entries 512
hash-max-zipmap-value 64
Beware: the name of these parameters have changed with Redis >= 2.6.
Here, modulo 10000 for 1M items means 100 items per hash objects, which will guarantee that all of them are stored as zipmaps/ziplists.
As for my experiments, It is better to store your data inside a hash table/dictionary . the best ever case I reached after a lot of benchmarking is to store inside your hashtable data entries that are not exceeding 500 keys.
I tried standard string set/get, for 1 million keys/values, the size was 79 MB. It is very huge in case if you have big numbers like 100 millions which will use around 8 GB.
I tried hashes to store the same data, for the same million keys/values, the size was increasingly small 16 MB.
Have a try in case if anybody needs the benchmarking code, drop me a mail
Did you try persisting the database (BGSAVE for example), shutting the server down and getting it back up? Due to fragmentation behavior, when it comes back up and populates its data from the saved RDB file, it might take less memory.
Also: What version of Redis to you work with? Have a look at this blog post - it says that fragmentation has partially solved as of version 2.4.
I'm trying to get somepython code to decrypt data that was encrypted using the OS X CommonCrypto APIs. There is little to no documentation on the exact options that CommonCrypto uses, so I'm needing some help figuring out what options to set in PyCrypto.
Specifically, my CommonCrypto decryption setup call is:
CCCryptorCreateWithMode(kCCDecrypt, kCCModeCFB, kCCAlgorithmAES128, ccDefaultPadding, NULL, key, keyLength, NULL, 0, 0, 0, &mAESKey);
My primary questions are:
Since there is both a kCCModeCFB and kCCModeCFB8, what is CommonCrypto's definition of CFB mode - what segment size, etc?
What block size is the CommonCrypto AES128 using? 16 or 128?
What is the default padding, and does it even matter in CFB mode?
Currently, the first 4 bytes of data is decrypting successfully with PyCrypto *as long as I set the segment_size to 16*.
Ideas?
Without knowing CommonCrypto or PyCrypto, some partial answers:
AES (in all three variants) has a block size of 128 bits, which are 16 bytes.
CFB (cipher feedback mode) would actually also work without padding (i.e. with a partial last block), since for each
block the ciphertext is created as the XOR of plaintext with some keystream block, which only depends on previous blocks.
(You still can use any padding you want.)
If you can experiment with some known data, first have a look at the ciphertext size. If it is not a multiple of a
full block (and the same as the plaintext + IV), then it is quite likely no padding.
Otherwise, decrypt it with noPadding mode, have a look at the result, and compare with the different known padding modes.
From a glance at the source code, it might be PKCS#5-padding.
CFB8 is a variant of CFB which uses only the top 8 bits (= one byte) of each block cipher call output (which takes the
previous 128 bits (= 16 bytes) of ciphertext (or IV) as input). This needs 16 times as many block cipher calls, but
allows partial sending of a stream without having to worry about block boundaries.
There is another definition of CFB which includes a segment size - here the segment size is the number of
bits (or bytes) to be used from each cipher output. In this definition, the "plain" CFB would have a segment size of 128 bits (= 16 bytes), CFB8 would have a segment size of 8 bits (one byte).