PRIVATEKEYBLOB to RSA struct of OpenSSL - winapi

How can I convert a Microsoft key exported as a PRIVATEKEYBLOB into a struct RSA that can be used with OpenSSL?

On inspection of the documents, it looks like the fields map across pretty intuitively, one to one. Have you tried implementing mapping code using this information? I would give that a go if not.
Here is the Microsoft blob (linked to RSAPUBKEY):
typedef struct _RSAPUBKEY {
DWORD magic;
DWORD bitlen;
DWORD pubexp;
} RSAPUBKEY;
BLOBHEADER blobheader;
RSAPUBKEY rsapubkey;
BYTE modulus[rsapubkey.bitlen/8];
BYTE prime1[rsapubkey.bitlen/16];
BYTE prime2[rsapubkey.bitlen/16];
BYTE exponent1[rsapubkey.bitlen/16];
BYTE exponent2[rsapubkey.bitlen/16];
BYTE coefficient[rsapubkey.bitlen/16];
BYTE privateExponent[rsapubkey.bitlen/8];
Here is the RSA struct:
struct
{
BIGNUM *n; // public modulus
BIGNUM *e; // public exponent
BIGNUM *d; // private exponent
BIGNUM *p; // secret prime factor
BIGNUM *q; // secret prime factor
BIGNUM *dmp1; // d mod (p-1)
BIGNUM *dmq1; // d mod (q-1)
BIGNUM *iqmp; // q^-1 mod p
// ...
};
RSA
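A minimal sketch of that field mapping, written in Python rather than C so that it stays self-contained. It assumes the blob layout shown above, an 8-byte BLOBHEADER (bType, bVersion, reserved, aiKeyAlg), and that every multi-byte field is little-endian, as CryptoAPI key blobs are documented to be. The function name parse_private_key_blob and the returned dictionary are hypothetical. In C, each field would instead be handed to OpenSSL, for example with BN_bin2bn, which expects big-endian bytes, so each buffer would need to be byte-reversed first.

```python
import struct

RSA2_MAGIC = 0x32415352  # the ASCII bytes "RSA2" read as a little-endian DWORD


def parse_private_key_blob(blob: bytes) -> dict:
    """Split a PRIVATEKEYBLOB into integers named like OpenSSL's RSA fields."""
    # BLOBHEADER: bType (1 byte), bVersion (1), reserved (2), aiKeyAlg (4).
    _b_type, _b_version, _reserved, _alg_id = struct.unpack_from("<BBHI", blob, 0)
    # RSAPUBKEY: magic, bitlen, pubexp, three little-endian DWORDs.
    magic, bitlen, pubexp = struct.unpack_from("<III", blob, 8)
    if magic != RSA2_MAGIC:
        raise ValueError("not an RSA private key blob")

    full, half = bitlen // 8, bitlen // 16
    # Blob order on the left of the comments, OpenSSL name as the key.
    layout = [
        ("n", full),     # modulus
        ("p", half),     # prime1
        ("q", half),     # prime2
        ("dmp1", half),  # exponent1
        ("dmq1", half),  # exponent2
        ("iqmp", half),  # coefficient
        ("d", full),     # privateExponent
    ]
    key = {"e": pubexp}
    offset = 20  # 8 bytes of BLOBHEADER + 12 bytes of RSAPUBKEY
    for name, size in layout:
        key[name] = int.from_bytes(blob[offset:offset + size], "little")
        offset += size
    return key
```

A quick sanity check on a parsed key is that n == p * q and that dmp1 == d % (p - 1).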


CRC32 Calculation for Zero Filled Buffer/File

If I want to calculate the CRC32 value for a large number of consecutive zero bytes, is there a constant time formula I can use given the length of the run of zeros? For example, if I know I have 1000 bytes all filled with zeros, is there a way to avoid a loop with 1000 iterations (just an example, actual number of zeros is unbounded for the sake of this question)?
You can compute the result of applying n zeros not in O(1) time, but in O(log n) time. This is done in zlib's crc32_combine(). A binary matrix is constructed that represents the operation of applying a single zero bit to the CRC. The 32x32 matrix multiplies the 32-bit CRC over GF(2), where addition is replaced by exclusive-or (^) and multiplication is replaced by and (&), bit by bit.
Then that matrix can be squared to get the operator for two zeros. That is squared to get the operator for four zeros. The third one is squared to get the operator for eight zeros. And so on as needed.
Now that set of operators can be applied to the CRC based on the one bits in the number n of zero bits that you want to compute the CRC of.
You can precompute the resulting matrix operator for any number of zero bits, if you happen to know you will be frequently applying exactly that many zeros. Then it is just one matrix multiplication by a vector, which is in fact O(1).
You do not need to use the pclmulqdq instruction suggested in another answer here, but that would be a little faster if you have it. It would not change the O() of the operation.
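The matrix-squaring idea can be sketched as follows. This is an illustration in Python, not zlib's actual code: the function names are hypothetical, the polynomial is the reflected CRC-32 one that zlib uses, and each 32x32 matrix is stored as a list of 32 column words.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial


def mat_times(mat, vec):
    """Multiply a 32x32 GF(2) matrix (list of 32 columns) by a 32-bit vector."""
    out = 0
    i = 0
    while vec:
        if vec & 1:
            out ^= mat[i]  # addition over GF(2) is exclusive-or
        vec >>= 1
        i += 1
    return out


def mat_square(mat):
    """Return mat * mat: each new column is mat applied to the old column."""
    return [mat_times(mat, col) for col in mat]


def zero_bit_operator():
    """Operator that advances the raw CRC register by a single zero bit."""
    # Bit 0 falls off the end and feeds back the polynomial; bit k moves to k-1.
    return [POLY] + [1 << k for k in range(31)]


def apply_zero_bits(reg, nbits):
    """Advance the raw register by nbits zero bits in O(log nbits) steps."""
    op = zero_bit_operator()
    while nbits:
        if nbits & 1:
            reg = mat_times(op, reg)
        op = mat_square(op)  # operator for twice as many zero bits
        nbits >>= 1
    return reg


def crc32_append_zeros(crc, nbytes):
    """Given a finished CRC-32, return the CRC-32 after nbytes more zero bytes."""
    # Undo the final inversion, advance the register, then invert again.
    return apply_zero_bits(crc ^ 0xFFFFFFFF, 8 * nbytes) ^ 0xFFFFFFFF
```

Calling crc32_append_zeros(0, 1000) gives the CRC-32 of 1000 zero bytes, since the CRC-32 of an empty message is 0.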
Time complexity can be reduced to O(1) using a table lookup followed by a multiply. The explanation and example code are shown in the third section of this answer.
If the 1000 is a constant, a precomputed table of 32 values could be used, one per bit of the CRC, each holding that bit advanced by 8000 zero bits (that is, multiplied by 2^8000 mod poly). A set of matrices (one set per byte of the CRC) could be used to work with a byte at a time. Both methods would be constant time (fixed number of loops) O(1).
As commented above, if the 1000 is not a constant, then exponentiation by squaring could be used which would be O(log2(n)) time complexity, or a combination of precomputed tables for some constant number of zero bits, such as 256, followed by exponentiation by squaring could be used so that the final step would be O(log2(n%256)).
Optimization in general: for normal data with zero and non-zero elements, on a modern x86 with pclmulqdq (uses xmm registers), a fast crc32 (or crc16) can be implemented, although it's close to 500 lines of assembly code. Intel document: crc using pclmulqdq. Example source code for github fast crc16. For a 32 bit CRC, a different set of constants is needed. If interested, I converted the source code to work with Visual Studio ML64.EXE (64 bit MASM), and created examples for both left and right shift 32 bit CRCs, each with two sets of constants for the two most popular CRC 32 bit polynomials (left shift polys: crc32: 0x104C11DB7 and crc32c: 0x11EDC6F41, right shift polys are bit reversed).
Example code for fast adjustment of CRC using a software based carryless multiply modulo the CRC polynomial. This will be much faster than using a 32 x 32 matrix multiply. A CRC is calculated for non-zero data: crf = GenCrc(msg, ...). An adjustment constant is calculated for n zero bytes: pmc = pow(2, 8*n) % poly (using exponentiation by repeated squaring). Then the CRC is adjusted for the zero bytes: crf = (crf*pmc) % poly.
Note that time complexity can be reduced to O(1) with generation of a table of pow(2, 8*i) % poly constants for i = 1 to n. Then the calculation is a table lookup and a fixed iteration (32 cycles) multiply % poly.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h> /* uint8_t, uint32_t */
static uint32_t crctbl[256];
void GenTbl(void) /* generate crc table */
{
uint32_t crc;
uint32_t c;
uint32_t i;
for(c = 0; c < 0x100; c++){
crc = c<<24;
for(i = 0; i < 8; i++)
crc = (crc<<1)^((0-(crc>>31))&0x04c11db7);
crctbl[c] = crc;
}
}
uint32_t GenCrc(uint8_t * bfr, size_t size) /* generate crc */
{
uint32_t crc = 0u;
while(size--)
crc = (crc<<8)^crctbl[(crc>>24)^*bfr++];
return(crc);
}
/* carryless multiply modulo crc */
uint32_t MpyModCrc(uint32_t a, uint32_t b) /* (a*b)%crc */
{
uint32_t pd = 0;
uint32_t i;
for(i = 0; i < 32; i++){
pd = (pd<<1)^((0-(pd>>31))&0x04c11db7u);
pd ^= (0-(b>>31))&a;
b <<= 1;
}
return pd;
}
/* exponentiate by repeated squaring modulo crc */
uint32_t PowModCrc(uint32_t p) /* pow(2,p)%crc */
{
uint32_t prd = 0x1u; /* current product */
uint32_t sqr = 0x2u; /* current square */
while(p){
if(p&1)
prd = MpyModCrc(prd, sqr);
sqr = MpyModCrc(sqr, sqr);
p >>= 1;
}
return prd;
}
/* # data bytes */
#define DAT ( 32)
/* # zero bytes */
#define PAD (992)
/* DATA+PAD */
#define CNT (1024)
int main()
{
uint32_t pmc;
uint32_t crc;
uint32_t crf;
uint32_t i;
uint8_t *msg = malloc(CNT);
for(i = 0; i < DAT; i++) /* generate msg */
msg[i] = (uint8_t)rand();
for( ; i < CNT; i++)
msg[i] = 0;
GenTbl(); /* generate crc table */
crc = GenCrc(msg, CNT); /* generate crc normally */
crf = GenCrc(msg, DAT); /* generate crc for data */
pmc = PowModCrc(PAD*8); /* pmc = pow(2,PAD*8)%crc */
crf = MpyModCrc(crf, pmc); /* crf = (crf*pmc)%crc */
printf("%08x %08x\n", crc, crf); /* crf == crc */
free(msg);
return 0;
}
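The table-lookup variant mentioned before the C listing can be sketched in the same spirit. This is a Python port under stated assumptions: the same left-shift polynomial 0x04c11db7, zero initial value and no final xor as in the C code, with the hypothetical helpers build_zero_table and append_zeros added for illustration.

```python
POLY = 0x04C11DB7
MASK = 0xFFFFFFFF


def gen_crc(data):
    """Bitwise left-shift CRC with initial value 0 and no final xor."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            crc = ((crc << 1) & MASK) ^ (POLY if crc >> 31 else 0)
    return crc


def mpy_mod_crc(a, b):
    """Carryless multiply (a*b) modulo the CRC polynomial, 32 fixed steps."""
    pd = 0
    for _ in range(32):
        pd = ((pd << 1) & MASK) ^ (POLY if pd >> 31 else 0)
        if b >> 31:
            pd ^= a
        b = (b << 1) & MASK
    return pd


def build_zero_table(max_zero_bytes):
    """table[i] = pow(2, 8*i) % poly for i = 0 .. max_zero_bytes."""
    table = [1]
    for _ in range(max_zero_bytes):
        table.append(mpy_mod_crc(table[-1], 0x100))  # multiply by 2^8
    return table


def append_zeros(crc, n, table):
    """Adjust crc for n trailing zero bytes: one lookup plus one multiply."""
    return mpy_mod_crc(crc, table[n])
```

The table costs one entry per supported run length, so this only pays off when the set of run lengths is bounded and reused often.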
CRC32 is based on multiplication in GF(2)[X] modulo some polynomial, which is multiplicative. The tricky part is separating the non-multiplicative steps from the multiplicative one.
First define a sparse file with the following structure (in Go):
type SparseFile struct {
FileBytes []SparseByte
Size uint64
}
type SparseByte struct {
Position uint64
Value byte
}
In your case it would be SparseFile{[]SparseByte{}, 1000}
Then, the function would be:
func IEEESparse (file SparseFile) uint32 {
position2Index := map[uint64]int{}
for i , v := range(file.FileBytes) {
file.FileBytes[i].Value = bits.Reverse8(v.Value)
position2Index[v.Position] = i
}
for i := 0; i < 4; i++ {
index, ok := position2Index[uint64(i)]
if !ok {
file.FileBytes = append(file.FileBytes, SparseByte{Position: uint64(i), Value: 0xFF})
} else {
file.FileBytes[index].Value ^= 0xFF
}
}
// Add padding
file.Size += 4
newReminder := bits.Reverse32(reminderIEEESparse(file))
return newReminder ^ 0xFFFFFFFF
}
So note that:
Division is performed on bits in the opposite order (per byte).
First four bytes are xored with 0xFF.
File is padded with 4 bytes.
Remainder is reversed again.
Remainder is xored again.
The inner function reminderIEEESparse is the true remainder, and it is easy to implement in O(log n), where n is the size of the file.
You can find a full implementation here.

Go - Perform unsigned shift operation

Is there any way to perform an unsigned shift (namely, an unsigned right shift) operation in Go? Something like this in Java:
0xFF >>> 3
The only thing I could find on this matter is this post but I'm not sure what I have to do.
Thanks in advance.
The Go Programming Language Specification
Numeric types
A numeric type represents sets of integer or floating-point values.
The predeclared architecture-independent numeric types include:
uint8 the set of all unsigned 8-bit integers (0 to 255)
uint16 the set of all unsigned 16-bit integers (0 to 65535)
uint32 the set of all unsigned 32-bit integers (0 to 4294967295)
uint64 the set of all unsigned 64-bit integers (0 to 18446744073709551615)
int8 the set of all signed 8-bit integers (-128 to 127)
int16 the set of all signed 16-bit integers (-32768 to 32767)
int32 the set of all signed 32-bit integers (-2147483648 to 2147483647)
int64 the set of all signed 64-bit integers (-9223372036854775808 to 9223372036854775807)
byte alias for uint8
rune alias for int32
The value of an n-bit integer is n bits wide and represented using
two's complement arithmetic.
There is also a set of predeclared numeric types with
implementation-specific sizes:
uint either 32 or 64 bits
int same size as uint
uintptr an unsigned integer large enough to store the uninterpreted bits of a pointer value
Conversions are required when different numeric types are mixed in an
expression or assignment.
Arithmetic operators
<< left shift integer << unsigned integer
>> right shift integer >> unsigned integer
The shift operators shift the left operand by the shift count
specified by the right operand. They implement arithmetic shifts if
the left operand is a signed integer and logical shifts if it is an
unsigned integer. There is no upper limit on the shift count. Shifts
behave as if the left operand is shifted n times by 1 for a shift
count of n. As a result, x << 1 is the same as x*2 and x >> 1 is the
same as x/2 but truncated towards negative infinity.
In Go, it's an unsigned integer shift. Go has signed and unsigned integers.
It depends on what type the value 0xFF is. Assume it's one of the unsigned integer types, for example, uint.
package main
import "fmt"
func main() {
n := uint(0xFF)
fmt.Printf("%X\n", n)
n = n >> 3
fmt.Printf("%X\n", n)
}
Output:
FF
1F
Assume it's one of the signed integer types, for example, int.
package main
import "fmt"
func main() {
n := int(0xFF)
fmt.Printf("%X\n", n)
n = int(uint(n) >> 3)
fmt.Printf("%X\n", n)
}
Output:
FF
1F
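Both examples use 0xFF, a non-negative value, for which arithmetic and logical shifts give the same result. The difference only shows up for negative values. A small sketch in Python that models a 32-bit integer with an explicit mask; the 32-bit width and the function names are assumptions for illustration, and the masking step plays the role of the conversion to an unsigned type in Go.

```python
BITS = 32
MASK = (1 << BITS) - 1


def arithmetic_shift_right(x, n):
    """Sign-propagating shift; Python's >> on int already behaves this way."""
    return x >> n


def logical_shift_right(x, n):
    """Zero-filling shift: reinterpret x as unsigned 32-bit, then shift."""
    return (x & MASK) >> n


print(hex(logical_shift_right(0xFF, 3)))  # 0x1f, same as the examples above
print(arithmetic_shift_right(-8, 1))      # -4
print(hex(logical_shift_right(-8, 1)))    # 0x7ffffffc
```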

Algorithm Challenge: Arbitrary in-place base conversion for lossless string compression

It might help to start out with a real world example. Say I'm writing a web app that's backed by MongoDB, so my records have a long hex primary key, making my url to view a record look like /widget/55c460d8e2d6e59da89d08d0. That seems excessively long. Urls can use many more characters than that. While there are just under 8 x 10^28 (16^24) possible values in a 24 digit hex number, just limiting yourself to the characters matched by a [a-zA-Z0-9] regex class (a YouTube video id uses more), 62 characters, you can get past 8 x 10^28 in only 17 characters.
I want an algorithm that will convert any string that is limited to a specific alphabet of characters to any other string with another alphabet of characters, where the value of each character c could be thought of as alphabet.indexOf(c).
Something of the form:
convert(value, sourceAlphabet, destinationAlphabet)
Assumptions
all parameters are strings
every character in value exists in sourceAlphabet
every character in sourceAlphabet and destinationAlphabet is unique
Simplest example
var hex = "0123456789abcdef";
var base10 = "0123456789";
var result = convert("12245589", base10, hex); // result is "bada55";
But I also want it to work to convert War & Peace from the Russian alphabet plus some punctuation to the entire unicode charset and back again losslessly.
Is this possible?
The only way I was ever taught to do base conversions in Comp Sci 101 was to first convert to a base ten integer by summing digit * base^position and then doing the reverse to convert to the target base. Such a method is insufficient for the conversion of very long strings, because the integers get too big.
It certainly feels intuitively that a base conversion could be done in place, as you step through the string (probably backwards to maintain standard significant digit order), keeping track of a remainder somehow, but I'm not smart enough to work out how.
That's where you come in, StackOverflow. Are you smart enough?
Perhaps this is a solved problem, done on paper by some 18th century mathematician, implemented in LISP on punch cards in 1970 and the first homework assignment in Cryptography 101, but my searches have borne no fruit.
I'd prefer a solution in javascript with a functional style, but any language or style will do, as long as you're not cheating with some big integer library. Bonus points for efficiency, of course.
Please refrain from criticizing the original example. The general nerd cred of solving the problem is more important than any application of the solution.
Here is a solution in C that is very fast, using bit shift operations. It assumes that you know what the length of the decoded string should be. The strings are vectors of integers in the range 0..maximum for each alphabet. It is up to the user to convert to and from strings with restricted ranges of characters. As for the "in-place" in the question title, the source and destination vectors can overlap, but only if the source alphabet is not larger than the destination alphabet.
/*
recode version 1.0, 22 August 2015
Copyright (C) 2015 Mark Adler
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
Mark Adler
madler#alumni.caltech.edu
*/
/* Recode a vector from one alphabet to another using intermediate
variable-length bit codes. */
/* The approach is to use a Huffman code over equiprobable alphabets in two
directions. First to encode the source alphabet to a string of bits, and
second to encode the string of bits to the destination alphabet. This will
be reasonably close to the efficiency of base-encoding with arbitrary
precision arithmetic. */
#include <stddef.h> // size_t
#include <limits.h> // UINT_MAX, ULLONG_MAX
#if UINT_MAX == ULLONG_MAX
# error recode() assumes that long long has more bits than int
#endif
/* Take a list of integers source[0..slen-1], all in the range 0..smax, and
code them into dest[0..*dlen-1], where each value is in the range 0..dmax.
*dlen returns the length of the result, which will not exceed the value of
*dlen when called. If the original *dlen is not large enough to hold the
full result, then recode() will return non-zero to indicate failure.
Otherwise recode() will return 0. recode() will also return non-zero if
either of the smax or dmax parameters are less than one. The non-zero
return codes are 1 if *dlen is not long enough, 2 for invalid parameters,
and 3 if any of the elements of source are greater than smax.
Using this same operation on the result with smax and dmax reversed reverses
the operation, restoring the original vector. However there may be more
symbols returned than the original, so the number of symbols expected needs
to be known for decoding. (An end symbol could be appended to the source
alphabet to include the length in the coding, but then encoding and decoding
would no longer be symmetric, and the coding efficiency would be reduced.
This is left as an exercise for the reader if that is desired.) */
int recode(unsigned *dest, size_t *dlen, unsigned dmax,
const unsigned *source, size_t slen, unsigned smax)
{
// compute sbits and scut, with which we will recode the source with
// sbits-1 bits for symbols < scut, otherwise with sbits bits (adding scut)
if (smax < 1)
return 2;
unsigned sbits = 0;
unsigned scut = 1; // 2**sbits
while (scut && scut <= smax) {
scut <<= 1;
sbits++;
}
scut -= smax + 1;
// same thing for dbits and dcut
if (dmax < 1)
return 2;
unsigned dbits = 0;
unsigned dcut = 1; // 2**dbits
while (dcut && dcut <= dmax) {
dcut <<= 1;
dbits++;
}
dcut -= dmax + 1;
// recode a base smax+1 vector to a base dmax+1 vector using an
// intermediate bit vector (a sliding window of that bit vector is kept in
// a bit buffer)
unsigned long long buf = 0; // bit buffer
unsigned have = 0; // number of bits in bit buffer
size_t i = 0, n = 0; // source and dest indices
unsigned sym; // symbol being encoded
for (;;) {
// encode enough of source into bits to encode that to dest
while (have < dbits && i < slen) {
sym = source[i++];
if (sym > smax) {
*dlen = n;
return 3;
}
if (sym < scut) {
buf = (buf << (sbits - 1)) + sym;
have += sbits - 1;
}
else {
buf = (buf << sbits) + sym + scut;
have += sbits;
}
}
// if not enough bits to assure one symbol, then break out to a special
// case for coding the final symbol
if (have < dbits)
break;
// encode one symbol to dest
if (n == *dlen)
return 1;
sym = buf >> (have - dbits + 1);
if (sym < dcut) {
dest[n++] = sym;
have -= dbits - 1;
}
else {
sym = buf >> (have - dbits);
dest[n++] = sym - dcut;
have -= dbits;
}
buf &= ((unsigned long long)1 << have) - 1;
}
// if any bits are left in the bit buffer, encode one last symbol to dest
if (have) {
if (n == *dlen)
return 1;
sym = buf;
sym <<= dbits - 1 - have;
if (sym >= dcut)
sym = (sym << 1) - dcut;
dest[n++] = sym;
}
// return recoded vector
*dlen = n;
return 0;
}
/* Test recode(). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <assert.h>
// Return a random vector of len unsigned values in the range 0..max.
static void ranvec(unsigned *vec, size_t len, unsigned max) {
unsigned bits = 0;
unsigned long long mask = 1;
while (mask <= max) {
mask <<= 1;
bits++;
}
mask--;
unsigned long long ran = 0;
unsigned have = 0;
size_t n = 0;
while (n < len) {
while (have < bits) {
ran = (ran << 31) + random();
have += 31;
}
if ((ran & mask) <= max)
vec[n++] = ran & mask;
ran >>= bits;
have -= bits;
}
}
// Get a valid number from str and assign it to var
#define NUM(var, str) \
do { \
char *end; \
unsigned long val = strtoul(str, &end, 0); \
var = val; \
if (*end || var != val) { \
fprintf(stderr, \
"invalid or out of range numeric argument: %s\n", str); \
return 1; \
} \
} while (0)
/* "bet n m len count" generates count test vectors of length len, where each
entry is in the range 0..n. Each vector is recoded to another vector using
only symbols in the range 0..m. That vector is recoded back to a vector
using only symbols in 0..n, and that result is compared with the original
random vector. Report on the average ratio of input and output symbols, as
compared to the optimal ratio for arbitrary precision base encoding. */
int main(int argc, char **argv)
{
// get sizes of alphabets and length of test vector, compute maximum sizes
// of recoded vectors
unsigned smax, dmax, runs;
size_t slen, dsize, bsize;
if (argc != 5) { fputs("need four arguments\n", stderr); return 1; }
NUM(smax, argv[1]);
NUM(dmax, argv[2]);
NUM(slen, argv[3]);
NUM(runs, argv[4]);
dsize = ceil(slen * ceil(log2(smax + 1.)) / floor(log2(dmax + 1.)));
bsize = ceil(dsize * ceil(log2(dmax + 1.)) / floor(log2(smax + 1.)));
// generate random test vectors, encode, decode, and compare
srandomdev();
unsigned source[slen], dest[dsize], back[bsize];
unsigned mis = 0, i;
unsigned long long dtot = 0;
int ret;
for (i = 0; i < runs; i++) {
ranvec(source, slen, smax);
size_t dlen = dsize;
ret = recode(dest, &dlen, dmax, source, slen, smax);
if (ret) {
fprintf(stderr, "encode error %d\n", ret);
break;
}
dtot += dlen;
size_t blen = bsize;
ret = recode(back, &blen, smax, dest, dlen, dmax);
if (ret) {
fprintf(stderr, "decode error %d\n", ret);
break;
}
if (blen < slen || memcmp(source, back, slen * sizeof(*source))) // blen > slen is ok; memcmp counts bytes, not elements
mis++;
}
if (mis)
fprintf(stderr, "%u/%u mismatches!\n", mis, i);
if (ret == 0)
printf("mean dest/source symbols = %.4f (optimal = %.4f)\n",
dtot / (i * (double)slen), log(smax + 1.) / log(dmax + 1.));
return 0;
}
As has been pointed out in other StackOverflow answers, try not to think of summing digit * base^position as converting it to base ten; rather, think of it as directing the computer to generate a representation of the quantity represented by the number in its own terms (for most computers probably closer to our concept of base 2). Once the computer has its own representation of the quantity, we can direct it to output the number in any way we like.
By rejecting "big integer" implementations and asking for letter-by-letter conversion you are at the same time arguing that the numerical/alphabetical representation of quantity is not actually what it is, namely that each position represents a quantity of digit * base^position. If the nine-millionth character of War and Peace does represent what you are asking to convert it from, then the computer at some point will need to generate a representation for Д * 33^9000000.
I don't think any solution can work generally: if n^e != m for every integer e, then for a given MAX_INT there's no way to calculate the value of the target base in a certain place p once n^p > MAX_INT.
You can get away with this for the case where n^e == m for some e because the problem is recursively doable (the first e digits of n can be summed and converted into the first digit of m, and then chopped off and repeated).
If you don't have this useful property, then eventually you're going to have to take some part of the original base and perform a modulus in n^p, and n^p is going to be greater than MAX_INT, which means it's impossible.
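As one possible counterpoint to the argument above, schoolbook long division can be carried out directly on the list of digits, so that no intermediate value ever exceeds the source base times the destination base. A minimal sketch under that assumption, in Python (which has built-in big integers, though the code deliberately never relies on them). It treats the whole string as one number, so leading zero-valued symbols are dropped, and it takes quadratic time in the input length. It is not the bit-code approach from the C answer.

```python
def convert(value, source_alphabet, destination_alphabet):
    """Re-encode value from one alphabet to another by repeated long division."""
    src_base = len(source_alphabet)
    dst_base = len(destination_alphabet)
    index = {ch: i for i, ch in enumerate(source_alphabet)}
    digits = [index[ch] for ch in value]  # most significant digit first

    out = []
    while digits:
        remainder = 0
        quotient = []
        # Divide the whole digit list by dst_base, one digit at a time.
        for d in digits:
            acc = remainder * src_base + d  # always below src_base * dst_base
            q, remainder = divmod(acc, dst_base)
            if quotient or q:  # skip leading zeros of the quotient
                quotient.append(q)
        out.append(destination_alphabet[remainder])  # next least significant digit
        digits = quotient
    return "".join(reversed(out)) or destination_alphabet[0]


hex_alphabet = "0123456789abcdef"
base10 = "0123456789"
print(convert("12245589", base10, hex_alphabet))  # bada55
```

Whether a digit-array division like this counts as "cheating with a big integer library" is a judgment call; it is essentially what such libraries do internally.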

Floating Point Divider Hardware Implementation Details

I am trying to implement a 32-bit floating point divider in hardware and I am wondering if I can get any suggestions as to the tradeoffs between different algorithms?
My floating point unit currently supports multiplication and addition/subtraction, but I am not going to switch it to a fused multiply-add (FMA) floating point architecture since this is an embedded platform where I am trying to minimize area usage.
Once upon a very long time ago I came across this neat and easy to implement float/fixed point division algorithm used in military FPUs of that time period:
input must be unsigned and shifted so that x < y and both are in the range [0.5, 1]
don't forget to store the difference of shifts sh = shx - shy and original signs
find f (by iterating) so y*f -> 1 .... after that x*f -> x/y which is the division result
shift the x*f back by sh and restore result sign (sig=sigx*sigy)
the x*f can be computed easily like this:
z=1-y
(x*f)=(x/y)=x*(1+z)*(1+z^2)*(1+z^4)*(1+z^8)*(1+z^16)...(1+z^(2^n))
where
n = log2(number of fractional bits for fixed point, or mantissa bit size for floating point)
You can also stop when z^(2^n) is zero on fixed bit width data types.
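The product formula can be tried out quickly before committing it to hardware. A minimal sketch in Python using double precision, with hypothetical function names. It uses math.frexp and math.ldexp for the normalization and the shift back, and it stops once 1 + z no longer differs from 1 rather than waiting for z to reach zero.

```python
import math


def series_divide(x, y):
    """Compute x / y for 0.5 <= y <= 1 using only +, - and *."""
    z = 1.0 - y
    result = x * (1.0 + z)
    while True:
        z *= z                 # z, z^2, z^4, z^8, ...
        if 1.0 + z == 1.0:     # further factors would not change the product
            break
        result *= 1.0 + z
    return result


def divide(x, y):
    """Divide two floats: normalize, run the series, shift back, restore sign."""
    if y == 0.0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    mx, ex = math.frexp(abs(x))  # abs(x) == mx * 2**ex, mx in [0.5, 1)
    my, ey = math.frexp(abs(y))
    return sign * math.ldexp(series_divide(mx, my), ex - ey)


print(divide(1.0, 3.0))
```

Because z is at most 0.5 after normalization, the number of correct bits roughly doubles per factor, so only a handful of multiplications are needed.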
[Edit2] I had a bit of time and mood for this, so here is a 32-bit IEEE 754 C++ implementation.
I removed the old (bignum) examples to avoid confusion for future readers (they are still accessible in edit history if needed)
//---------------------------------------------------------------------------
// IEEE 754 single masks
const DWORD _f32_sig =0x80000000; // sign
const DWORD _f32_exp =0x7F800000; // exponent
const DWORD _f32_exp_sig=0x40000000; // exponent sign
const DWORD _f32_exp_bia=0x3F800000; // exponent bias
const DWORD _f32_exp_lsb=0x00800000; // exponent LSB
const DWORD _f32_exp_pos= 23; // exponent LSB bit position
const DWORD _f32_man =0x007FFFFF; // mantissa
const DWORD _f32_man_msb=0x00400000; // mantissa MSB
const DWORD _f32_man_bits= 23; // mantissa bits
//---------------------------------------------------------------------------
float f32_div(float x,float y)
{
union _f32 // float bits access
{
float f; // 32bit floating point
DWORD u; // 32 bit uint
};
_f32 xx,yy,zz; int sh; DWORD zsig; float z;
// result signum abs value
xx.f=x; zsig =xx.u&_f32_sig; xx.u&=(0xFFFFFFFF^_f32_sig);
yy.f=y; zsig^=yy.u&_f32_sig; yy.u&=(0xFFFFFFFF^_f32_sig);
// initial exponent difference sh and normalize exponents to speed up shift in range
sh =0;
sh-=((xx.u&_f32_exp)>>_f32_exp_pos)-(_f32_exp_bia>>_f32_exp_pos); xx.u&=(0xFFFFFFFF^_f32_exp); xx.u|=_f32_exp_bia;
sh+=((yy.u&_f32_exp)>>_f32_exp_pos)-(_f32_exp_bia>>_f32_exp_pos); yy.u&=(0xFFFFFFFF^_f32_exp); yy.u|=_f32_exp_bia;
// shift input in range
while (xx.f> 1.0f) { xx.f*=0.5f; sh--; }
while (xx.f< 0.5f) { xx.f*=2.0f; sh++; }
while (yy.f> 1.0f) { yy.f*=0.5f; sh++; }
while (yy.f< 0.5f) { yy.f*=2.0f; sh--; }
while (xx.f<=yy.f) { yy.f*=0.5f; sh++; }
// divider block
z=(1.0f-yy.f);
zz.f=xx.f*(1.0f+z);
for (;;)
{
z*=z; if (z==0.0f) break;
zz.f*=(1.0f+z);
}
// shift result back
for (;sh>0;) { sh--; zz.f*=0.5f; }
for (;sh<0;) { sh++; zz.f*=2.0f; }
// set signum
zz.u&=(0xFFFFFFFF^_f32_sig);
zz.u|=zsig;
return zz.f;
}
//---------------------------------------------------------------------------
I wanted to keep it simple so it is not optimized yet. You can for example replace all *=0.5 and *=2.0 by exponent inc/dec ... If you compare with FPU results on float operator / this will be a bit less precise because most FPUs compute on 80 bit internal format and this implementation is only on 32 bits.
As you can see, I am using just +, -, * from the FPU. This can be sped up by using fast sqr algorithms like
Fast bignum square computation
especially if you want to use big bit widths ...
Do not forget to implement normalization and/or overflow/underflow correction.

Produce MD5 or SHA1 hash code to long (64 bits)

I need to compute a hash code of a string and store it into a 'long' variable.
MD5 and SHA1 produce hash codes which are longer than 64 bits (MD5: 128 bits, SHA1: 160 bits).
Any ideas, anyone?
Cheers,
Doron
You can truncate the hash and use just the first 64 bits. The hash will be somewhat less strong, but the first 64 bits are still extremely likely to be unique.
For most uses of a hash this is both a common and perfectly acceptable practice.
You can also store the complete hash in two 64-bit integers.
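A minimal sketch of both options using Python's hashlib. The function names are hypothetical, and reading the digest bytes as big-endian is an arbitrary choice made here for illustration. The to_signed64 helper shows the reinterpretation needed when the value has to fit a signed 64-bit type such as Java's long.

```python
import hashlib


def md5_first64(data: bytes) -> int:
    """Truncate: keep only the first 64 bits of the 128-bit MD5 digest."""
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def md5_halves(data: bytes):
    """Keep everything: return the digest as two 64-bit integers."""
    digest = hashlib.md5(data).digest()
    return (int.from_bytes(digest[:8], "big"),
            int.from_bytes(digest[8:], "big"))


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return value - (1 << 64) if value >= (1 << 63) else value


print(hex(md5_first64(b"")))  # 0xd41d8cd98f00b204
```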
The FNV Hash is pretty easy to implement. We extended it to 64 bits and it works very well. Using it is much faster than computing MD5 or SHA1 and then truncating the result. However, we don't depend on it for cryptographic functions--just for hash tables and such.
More information on FNV, with source code and detailed explanations: http://isthe.com/chongo/tech/comp/fnv/
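For reference, a minimal sketch of a 64-bit FNV hash in Python. It assumes the FNV-1a variant with the published 64-bit offset basis and prime; the answer does not say which variant was used, so treat that choice as an assumption.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: xor each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64  # keep the result within 64 bits
    return h


print(hex(fnv1a_64(b"a")))  # 0xaf63dc4c8601ec8c
```

As the answer notes, this is suitable for hash tables and similar uses, not for anything that needs cryptographic strength.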
I'm using this (Java):
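import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
public class SimpleLongHash {
final MessageDigest md;
//
public SimpleLongHash() throws NoSuchAlgorithmException {
md = MessageDigest.getInstance("MD5");
}
//
public long hash(final String str) {
// fixed charset, so the hash does not depend on the platform default
return hash(str.getBytes(StandardCharsets.UTF_8));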
}
public long hash(final byte[] buf) {
md.reset();
final byte[] digest = md.digest(buf);
return (getLong(digest, 0) ^ getLong(digest, 8));
}
//
private static final long getLong(final byte[] array, final int offset) {
long value = 0;
for (int i = 0; i < 8; i++) {
value = ((value << 8) | (array[offset+i] & 0xFF));
}
return value;
}
}
What would be the probability for a collision as a result of a XOR between the first 64 bits and the last 64 bits?
XOR the bits together? E.g. for MD5, bits 0-63 XOR bits 64-127, voila, 64 bits. This will give you a weaker hash, check if that's acceptable for you.
(also, unless your environment is extremely constrained - e.g. embedded devices - there's a question of "why do you need to shorten it?")
You can also play with various hash algorithms with FooBabel Hasher
