I need to implement a random number generator myself in PyTorch, and to my knowledge, all kinds of RNGs need unsigned integers in their implementation. However, PyTorch only supports uint8 tensors, not uint32 or uint64. Is there an RNG implementation that uses only int instead of uint?
Related
Consider the following code snippet in C++ (Visual Studio 2015):
First Block
const int size = 500000000;
int sum = 0;
int *num1 = new int[size];   // initialized between 1-250
int *num2 = new int[size];   // initialized between 1-250
for (int i = 0; i < size; i++)
{
    sum += (num1[i] / num2[i]);
}
Second Block
const int size = 500000000;
int sum = 0;
float *num1 = new float[size];   // initialized between 1-250
float *num2 = new float[size];   // initialized between 1-250
for (int i = 0; i < size; i++)
{
    sum += (num1[i] / num2[i]);
}
I expected the first block to run faster because it uses integer operations, but the second block is considerably faster, even though it uses floating-point operations. Here are the results of my benchmark:
Division:
Type     Time
uint8    879.5 ms
uint16   885.284 ms
int      982.195 ms
float    654.654 ms
Floating-point multiplication is also faster than integer multiplication. Here are the results of my benchmark:
Multiplication:
Type     Time
uint8    166.339 ms
uint16   524.045 ms
int      432.041 ms
float    402.109 ms
My system specs: CPU Core i7-7700, RAM 64 GB, Visual Studio 2015.
Floating-point division is faster than integer division because of the exponent part in the floating-point representation: to divide the exponents, a plain subtraction is all that is needed.
int32_t division requires a fast division of 31-bit numbers, whereas float division requires a fast division of 24-bit mantissas (the leading one of the mantissa is implied and not stored in a floating-point number) plus a fast subtraction of the 8-bit exponents.
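For example, 96.0 is stored as 1.5 × 2^6 and 12.0 as 1.5 × 2^3, so 96.0 / 12.0 is computed as (1.5 / 1.5) × 2^(6-3) = 1.0 × 2^3 = 8.0: one short mantissa division plus one 8-bit exponent subtraction.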
See this excellent, detailed explanation of how division is performed in a CPU.
It may be worth mentioning that SSE and AVX instructions only provide floating-point division, not integer division. SSE instructions/intrinsics can easily be used to quadruple the speed of your float calculation.
If you look into Agner Fog's instruction tables (for example, for Skylake), the latency of 32-bit integer division is 26 CPU cycles, whereas the latency of SSE scalar float division is 11 CPU cycles (and, surprisingly, it takes the same time to divide four packed floats).
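To illustrate, here is a rough packed-division sketch (my own, not from the original answer; sum_div_sse is a made-up name, size is assumed to be a multiple of 4, and the accumulation is done in float, so the result can differ slightly from the question's int accumulator):

#include <immintrin.h>

float sum_div_sse(const float *num1, const float *num2, int size)
{
    __m128 vsum = _mm_setzero_ps();
    for (int i = 0; i < size; i += 4)
    {
        __m128 a = _mm_loadu_ps(&num1[i]);          // load 4 floats from num1
        __m128 b = _mm_loadu_ps(&num2[i]);          // load 4 floats from num2
        vsum = _mm_add_ps(vsum, _mm_div_ps(a, b));  // 4 divisions in one DIVPS
    }
    float parts[4];
    _mm_storeu_ps(parts, vsum);                     // spill the 4 partial sums
    return parts[0] + parts[1] + parts[2] + parts[3];
}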
Also note that in C and C++ there is no division on types narrower than int, so uint8_t and uint16_t are first promoted to int and the division then happens on ints. uint8_t division looks faster than int division because its values have fewer significant bits after the promotion, which lets the division complete sooner.
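A quick way to see the promotion (a small illustrative snippet, not part of the original answer):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 200, b = 3;
    // Both operands are promoted to int before the division, so a / b is
    // really an int division; only the assignment narrows back to uint8_t.
    uint8_t q = a / b;
    printf("%u\n", (unsigned)q);        // 66
    printf("%zu\n", sizeof(a / b));     // sizeof(int), typically 4
    return 0;
}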
Problem
I am working on flash memory optimization for the STM32F051. It turns out that conversion between the float and int types consumes a lot of flash.
Digging into this, the conversion to int takes around 200 bytes of flash memory, while the conversion to unsigned int takes around 1500 bytes!
It's known that int and unsigned int differ only in the interpretation of the sign bit, so this behavior is a great mystery to me.
Note: Performing the 2-stage conversion float -> int -> unsigned int also consumes only around 200 bytes.
Questions
Analyzing this, I have the following questions:
1) What is the mechanism of the conversion from float to unsigned int? Why does it take so much memory, when the conversion float -> int -> unsigned int takes so little? Maybe it's connected with the IEEE 754 standard?
2) Are there any problems to be expected when the conversion float -> int -> unsigned int is used instead of a direct float -> unsigned int?
3) Are there any methods to wrap the float -> unsigned int conversion while keeping a low memory footprint?
Note: A similar question has already been asked here (Trying to understand how the casting/conversion is done by compiler, e.g., when cast from float to int), but there is still no clear answer, and my question is specifically about the memory usage.
Technical data
Compiler: ARM-NONE-EABI-GCC (gcc version 4.9.3 20141119 (release))
MCU: STM32F051
MCU core: 32-bit ARM Cortex-M0
Code example
float -> int (~200 bytes of flash)
int main() {
    volatile float f;
    volatile int i;
    i = f;
    return 0;
}
float -> unsigned int (~1500 bytes! of flash)
int main() {
    volatile float f;
    volatile unsigned int ui;
    ui = f;
    return 0;
}
float -> int -> unsigned int (~200 bytes of flash)
int main() {
    volatile float f;
    volatile int i;
    volatile unsigned int ui;
    i = f;
    ui = i;
    return 0;
}
There is no fundamental reason why the conversion from float to unsigned int should be larger than the conversion from float to signed int; in practice, the float to unsigned int conversion can even be made smaller than the float to signed int conversion.
I did some investigations using the GNU Arm Embedded Toolchain (version 7-2018-q2), and as far as I can see the size problem is due to a flaw in the gcc runtime library. For some reason this library does not provide a specialized version of the __aeabi_f2uiz function for Armv6-M; instead it falls back on a much larger general version.
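As a possible answer to question 3, you can route the conversion through the cheap signed path yourself. This is a rough sketch (my own, not the library's code; float_to_uint is a made-up name): it still pulls in the float compare and subtract helpers, so check on your toolchain whether it actually ends up smaller.

// Only valid for finite inputs in [0, 2^32); negative values are clamped to 0.
unsigned int float_to_uint(float f)
{
    if (f >= 2147483648.0f)   // 2^31 and above: shift into signed range first
        return (unsigned int)(int)(f - 2147483648.0f) + 0x80000000u;
    if (f <= 0.0f)
        return 0u;
    return (unsigned int)(int)f;   // small values: the cheap float -> int path
}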
In gdb,
(gdb) p -2147483648
$28 = 2147483648
(gdb) pt -2147483648
type = unsigned int
Since -2147483648 is within the range of type int, why is gdb treating it as an unsigned int?
(gdb) pt -2147483647-1
type = int
(gdb) p -2147483647-1
$27 = -2147483648
I suspect that gdb applies the unary negation operator after determining the type of the literal:
In case 1: gdb parses 2147483648, which overflows the int type and becomes unsigned int. Then it applies the negation.
In case 2: 2147483647 is a valid int and stays int when the negation and the subtraction are subsequently applied.
gdb appears to be following a set of rules for determining the type of a decimal integer literal that are inconsistent with the rules given by the C standard.
I'll assume your system has 32-bit int and long int types, using 2's complement and no padding bits (that's a common choice for 32-bit systems, and it's consistent with what you're seeing). Then the ranges of int and unsigned int are:
int: -2147483648 .. +2147483647
unsigned int: 0 .. 4294967295
and the ranges of long int and unsigned long int are the same.
2147483647 is within the range of type int, so that's its type.
Since the value of 2147483648 is outside the range of type int, apparently gdb is choosing to treat it as an unsigned int. And -2147483648 is not an integer literal, it's an expression consisting of a unary - operator applied to the constant 2147483648. Since gdb treats 2147483648 as an unsigned int, it also treats -2147483648 as an unsigned int, and the unary - operator for unsigned types wraps around, yielding 2147483648.
As for -2147483647-1, that's an expression all of whose operands are of type int, and there's no overflow.
In all versions of ISO C, though, an unsuffixed decimal literal can never be of type unsigned int. In C90, its type is the first of:
int
long int
unsigned long int
that can represent its value. Under C99 rules (and later), the type of a decimal integer constant is the first of:
int
long int
long long int
that can represent its value.
I don't know whether there's a way to tell gdb to use C rules for integer literals.
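For comparison, here is what a C99/C++ compiler does with the same constants (illustrative only; the exact sizes depend on how wide int and long are on the platform):

#include <stdio.h>

int main(void)
{
    // Under C99/C++ rules an unsuffixed decimal constant that does not fit in
    // int gets the next wider *signed* type (long or long long), never
    // unsigned int, so negating it keeps the mathematical value.
    printf("%zu %zu\n", sizeof 2147483647, sizeof 2147483648);  // typically "4 8"
    printf("%lld\n", (long long)-2147483648);                   // -2147483648
    printf("%d\n", -2147483647 - 1);                            // -2147483648
    return 0;
}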
The documentation for GMP seems to list only the following algorithms for random number generation:
gmp_randinit_mt, the Mersenne Twister;
gmp_randinit_lc_2exp and gmp_randinit_lc_2exp_size, linear congruential.
There is also gmp_randinit_default, but it points to gmp_randinit_mt.
Neither the Mersenne Twister nor linear congruential generators should be used for Cryptography.
What do people usually do, then, when they want to use the GMP to build some cryptographic code?
(Using a cryptographic API for encrypting/decrypting/etc. doesn't help, because I'm actually implementing a new algorithm, which crypto libraries do not have.)
Disclaimer: I have only "tinkered" with RNGs, and that was over a year ago.
If you are on a Linux box, the solution is relatively simple and non-deterministic: just open and read a desired number of bits from /dev/urandom. If you need a large number of random bits for your program, however, then you might want to use a smaller number of bits from /dev/urandom as seeds for a PRNG.
Boost offers a number of PRNGs and a non-deterministic RNG, random_device. random_device uses the very same /dev/urandom on Linux and a similar (IIRC) facility on Windows, so it is an option if you need Windows or cross-platform support.
Of course, you just might want/need to write a function based on your favored RNG using GMP's types and functions.
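For the /dev/urandom route mentioned above, a minimal sketch might look like this (assuming a POSIX system; urandom_mpz is a made-up helper name and error handling is kept to the bare minimum):

#include <stdio.h>
#include <gmp.h>

// Read `bits` random bits from /dev/urandom into `out` (which the caller has
// already mpz_init'ed). Returns 0 on success, -1 on failure.
int urandom_mpz(mpz_t out, size_t bits)
{
    size_t nbytes = (bits + 7) / 8;
    unsigned char buf[128];                      // enough for up to 1024 bits
    if (nbytes > sizeof buf)
        return -1;

    FILE *f = fopen("/dev/urandom", "rb");
    if (!f || fread(buf, 1, nbytes, f) != nbytes) {
        if (f) fclose(f);
        return -1;
    }
    fclose(f);

    // Interpret the bytes as a big-endian unsigned integer.
    mpz_import(out, nbytes, 1, 1, 0, 0, buf);
    return 0;
}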
Edit:
#include <stdio.h>
#include <gmp.h>
#include <boost/random/random_device.hpp>

int main(int argc, char *argv[]) {
    unsigned min_digits = 30;
    unsigned max_digits = 50;
    unsigned quantity = 1000;  // How many numbers do you want?
    unsigned sequence = 10;    // How many numbers before reseeding?

    mpz_t rmin;                // smallest acceptable value: 10^(min_digits-1)
    mpz_init(rmin);
    mpz_ui_pow_ui(rmin, 10, min_digits - 1);

    mpz_t rmax;                // exclusive upper bound: 10^max_digits
    mpz_init(rmax);
    mpz_ui_pow_ui(rmax, 10, max_digits);

    gmp_randstate_t rstate;    // Mersenne Twister state, reseeded below
    gmp_randinit_mt(rstate);

    mpz_t rnum;
    mpz_init(rnum);

    boost::random::random_device rdev;  // non-deterministic seed source

    for (unsigned i = 0; i < quantity; i++) {
        if (!(i % sequence))
            gmp_randseed_ui(rstate, rdev());      // reseed every `sequence` numbers
        do {
            mpz_urandomm(rnum, rstate, rmax);     // uniform in [0, rmax)
        } while (mpz_cmp(rnum, rmin) < 0);        // reject values with too few digits
        gmp_printf("%Zd\n", rnum);
    }
    return 0;
}
The encryption algorithm for the Private and CharStrings dictionaries used in Type 1 fonts:
unsigned short int r;              // running key; must be initialised per the spec
unsigned short int c1 = 52845;
unsigned short int c2 = 22719;

unsigned char Decrypt(unsigned char cipher)
{
    unsigned char plain;
    plain = (cipher ^ (r >> 8));   // keystream byte is the high byte of r
    r = (cipher + r) * c1 + c2;    // update the running key with the cipher byte
    return plain;
}
the Type 1 Spec states,
This layer of encryption is intended to protect some of the hint information in the Private dictionary from casual inspection
Could you explain what casual inspection is, and why Adobe designed the encryption algorithm the way it is?
Thanks very much.
Casual inspection in this case means anything not involving serious cryptanalysis. Note that all inputs of Decrypt are combined linearly (albeit in a modulo ring), so the scheme would be fairly easy to break for someone versed in cryptanalysis. However, many people are not, the data are probably not worth real encryption, and the encryption and decryption are computationally easy and cheap.
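For illustration, the matching encryption step can be derived directly from the Decrypt routine quoted above (this sketch is mine, not copied from the spec): both sides XOR with the high byte of r and then feed the cipher byte back into r, so encryption and decryption stay in sync as long as r starts from the same key value.

unsigned char Encrypt(unsigned char plain)
{
    unsigned char cipher = plain ^ (r >> 8);  // same keystream byte Decrypt will use
    r = ((unsigned)cipher + r) * c1 + c2;     // same state update, done in unsigned arithmetic
    return cipher;
}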