How to convert fixed-point VHDL type back to float? - vhdl

I am using IEEE fixed point package in VHDL.
It works well, but I now facing a problem concerning their string representation in a test bench : I would like to dump them in a text file.
I have found that it is indeed possible to directly write ufixed or sfixed using :
write(buf, to_string(x)); --where x is either sfixed or ufixed (and buf : line)
But then I get values like 11110001.10101 (for sfixed q8.5 representation).
So my question : how to convert back these fixed point numbers to reals (and then to string) ?

The variable needs to be split into two std-logic-vector parts, the integer part can be converted to a string using standard conversion, but for the fraction part the string conversion is a bit different. For the integer part you need to use a loop and divide by 10 and convert the modulo remainder into ascii character, building up from the lower digit to the higher digit. For the fractional part it also need a loop but one needs to multiply by 10 take the floor and isolate this digit to get the corresponding character, then that integer is used to be substracted to the fraction number, etc. This is the concept, worked in MATLAB to test and making a vhdl version I will share soon. I was surprised not to find such useful function anywhere. Of course fixed-point format can vary Q(N,M) N and M can have all sorts of values, while for floating point, it is standardized.

Related

Convert string currency to float with ruby

I have the following string:
"1.273,08"
And I need to convert to float and the result must be:
1273.08
I tried some code using gsub but I can't solve this.
How can I do this conversion?
You have already received two good answers how to massage your String into your desired format using String#delete and String#tr.
But there is a deeper problem.
The decimal value 1 273.0810 cannot be accurately represented as an IEEE 754-2019 / ISO/IEC 60559:2020 binary64 floating point value.
Just like the value 1/3rd can easily be represented in ternary (0.13) but has an infinite representation in decimal (0.33333333…10, i.e. 0.[3]…10) and thus cannot be accurately represented, the value 8/100th can easily be represented in decimal (0.0810) but has an infinite representation in binary (0.0001010001111010111000010100011110101110000101…2, i.e. 0.[00010100011110101110]…2). In other words, it is impossible to express 1 273.0810 as a Ruby Float.
And that's not specific to Ruby, or even to programming, that is just basic high school maths: you cannot represent this number in binary, period, just like you cannot represent 1/3rd in decimal, or π in any integer base.
And of course, computers don't have infinite memory, so not only does 1 273.0810 have an infinite representation in binary, but as a Float, it will also be cut off after 64 bits. The closest possible value to 1 273.0810 as an IEEE 754-2019 / ISO/IEC 60559:2020 binary64 floating point value is 1 273.079 999 999 999 927 240 423 858 1710, which is less than 1 273.0810.
That is why you should never represent money using binary numbers: everybody will expect it to be decimal, not binary; if I write a cheque, I write it in decimal, not binary. People will expect that it is impossible to represent $ 1/3rd, they will expect that it is impossible to represent $ π, but they will not expect and not accept that if they put $ 1273.08 into their account, they will actually end up with slightly less than that.
The correct way to represent money would be to use a specialized Money datatype, or at least using the bigdecimal library from the standard library:
require 'bigdecimal'
BigDecimal('1.273,08'.delete('.').tr(',', '.'))
#=> 0.127308e4
I would do
"1.273,08".delete('.') # delete '.' from the string
.tr(',', '.') # replace ',' with '.'
.to_f # translate to float
#=> 1273.08
So, we're using . as a thousands separator and , instead of a dot:
str = "1.273,08"
str.gsub('.','').gsub(',', '.').to_f

Implementation of strtof(), floating-point multiplication and mantissa rounding issues

This question is not so much about the C as about the algorithm. I need to implement strtof() function, which would behave exactly the same as GCC one - and do it from scratch (no GNU MPL etc.).
Let's skip checks, consider only correct inputs and positive numbers, e.g. 345.6e7. My basic algorithm is:
Split the number into fraction and integer exponent, so for 345.6e7 fraction is 3.456e2 and exponent is 7.
Create a floating-point exponent. To do this, I use these tables:
static const float powersOf10[] = {
1.0e1f,
1.0e2f,
1.0e4f,
1.0e8f,
1.0e16f,
1.0e32f
};
static const float minuspowersOf10[] = {
1.0e-1f,
1.0e-2f,
1.0e-4f,
1.0e-8f,
1.0e-16f,
1.0e-32f
};
and get float exponent as a product of corresponding bits in integer exponent, e.g. 7 = 1+2+4 => float_exponent = 1.0e1f * 1.0e2f * 1.0e4f.
Multiply fraction by floating exponent and return the result.
And here comes the first problem: since we do a lot of multiplications, we get a somewhat big error becaule of rounding multiplication result each time. So, I decided to dive into floating point multiplication algorithm and implement it myself: a function takes a number of floats (in my case - up to 7) and multiplies them on bit level. Consider I have uint256_t type to fit mantissas product.
Now, the second problem: round mantissas product to 23 bits. I've tried several rounding methods (round-to-even, Von Neumann rounding - a small article about them), but no of them can give the correct result for all the test numbers. And some of them really confuse me, like this one:
7038531e-32. GCC's strtof() returns 0x15ae43fd, so correct unbiased mantissa is 2e43fd. I go for multiplication of 7.038531e6 (biased mantissa d6cc86) and 1e-32 (b.m. cfb11f). The resulting unbiased mantissa in binary form is
( 47)0001 ( 43)0111 ( 39)0010 ( 35)0001
( 31)1111 ( 27)1110 ( 23)1110 ( 19)0010
( 15)1011 ( 11)0101 ( 7)0001 ( 3)1101
which I have to round to 23 bits. However, by all rounding methods I have to round it up, and I'll get 2e43fe in result - wrong! So, for this number the only way to get correct mantissa is just to chop it - but chopping does not work for other numbers.
Having this worked on countless nights, my questions are:
Is this approach to strtof() correct? (I know that GCC uses GNU MPL for it, and tried to see into it. However, trying to copy MPL's implementation would require porting the entire library, and this is definitely not what I want). Maybe this split-then-multiply algorithm is inevitably prone to errors? I did some other small tricks, (e.g. create exponent tables for all integer exponents in float range), but they led to even more failed conversions.
If so, did I miss something while rounding? I thought so for long time, but this 7038531e-32 number completely confused me.
If I want to be as precise as I can I usually do stuff like this (however I usually do the reverse operation float -> text):
use only integers (no floats what so ever)
as you know float is integer mantissa bit-shifted by integer exponent so no need for floats.
For constructing the final float datatype you can use simple union with float and 32 bit unsigned integer in it ... or pointers to such types pointing to the same address.
This will avoid rounding errors for numbers that fit completely and shrink error for those that don't fit considerably.
use hex numbers
You can convert your text of decadic number on the run into its hex counterpart (still as text) from there creating mantissa and exponent integers is simple.
Here:
How to convert a gi-normous integer (in string format) to hex format? (C#)
is C++ implementation example of dec2hex and hex2dec number conversions done on text
use more bits for mantissa while converting
for task like this and single precision float I usually use 2 or 3 32 bit DWORDs for the 24 bit mantissa to still hold some precision after the multiplications If you want to be precise you have to deal with 128+24 bits for both integer and fractional part of number so 5x32 bit numbers in sequence.
For more info and inspiration see (reverse operation):
my best attempt to print 32 bit floats with least rounding errors (integer math only)
Your code will be just inverse of that (so many parts will be similar)
Since I post that I made even more advanced version that recognize formatting just like printf , supports much more datatypes and more without using any libs (however its ~22.5 KByte of code). I needed it for MCUs as GCC implementation of prints are not very good there ...

Fastest algorithm to convert hexadecimal numbers into decimal form without using a fixed length variable to store the result

I want to write a program to convert hexadecimal numbers into their decimal forms without using a variable of fixed length to store the result because that would restrict the range of inputs that my program can work with.
Let's say I were to use a variable of type long long int to calculate, store and print the result. Doing so would limit the range of hexadecimal numbers that my program can handle to between 8000000000000001 and 7FFFFFFFFFFFFFFF. Anything outside this range would cause the variable to overflow.
I did write a program that calculates and stores the decimal result in a dynamically allocated string by performing carry and borrow operations but it runs much slower, even for numbers that are as big as 7FFFFFFFF!
Then I stumbled onto this site which could take numbers that are way outside the range of a 64 bit variable. I tried their converter with numbers much larger than 16^65 - 1 and still couldn't get it to overflow. It just kept on going and printing the result.
I figured that they must be using a much better algorithm for hex to decimal conversion, one that isn't limited to 64 bit values.
So far, Google's search results have only led me to algorithms that use some fixed-length variable for storing the result.
That's why I am here. I wanna know if such an algorithm exists and if it does, what is it?
Well, it sounds like you already did it when you wrote "a program that calculates and stores the decimal result in a dynamically allocated string by performing carry and borrow operations".
Converting from base 16 (hexadecimal) to base 10 means implementing multiplication and addition of numbers in a base 10x representation. Then for each hex digit d, you calculate result = result*16 + d. When you're done you have the same number in a 10-based representation that is easy to write out as a decimal string.
There could be any number of reasons why your string-based method was slow. If you provide it, I'm sure someone could comment.
The most important trick for making it reasonably fast, though, is to pick the right base to convert to and from. I would probably do the multiplication and addition in base 109, so that each digit will be as large as possible while still fitting into a 32-bit integer, and process 7 hex digits at a time, which is as many as I can while only multiplying by single digits.
For every 7 hex digts, I'd convert them to a number d, and then do result = result * ‭(16^7) + d.
Then I can get the 9 decimal digits for each resulting digit in base 109.
This process is pretty easy, since you only have to multiply by single digits. I'm sure there are faster, more complicated ways that recursively break the number into equal-sized pieces.

Integer to Binary Conversion in Simulink

This might look a repetition to my earlier question. But I think its not.
I am looking for a technique to convert the signal in the Decimal format to binary format.
I intend to use the Simulink blocks in the Xilinx Library to convert decimal to binary format.
So if the input is 3, the expected output should in 11( 2 Clock Cycles). I am looking for the output to be obtained serially.
Please suggest me how to do it or any pointers in the internet would be helpful.
Thanks
You are correct, what you need is the parallel to serial block from system generator.
It is described in this document:
http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_1/sysgen_ref.pdf
This block is a rate changing block. Check the mentions of the parallel to serial block in these documents for further descriptions:
http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_1/sysgen_gs.pdf
http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_1/sysgen_user.pdf
Use a normal constant block with a Matlab variable in it, this already gives the output in "normal" binary (assuming you set the properties on it to be unsigned and the binary point at 0.
Then you need to write a small serialiser block, which takes that input, latches it into a shift register and then shifts the register once per clock cycle with the bit that "falls off the end" becoming your output bit. Depending on which way your shift, you can make it come MSB first of LSB first.
You'll have to build the shift register out of ordinary registers and a mux before each one to select whether you are doing a parallel load or shifting. (This is the sort of thing which is a couple of lines of code in VHDL, but a right faff in graphics).
If you have to increase the serial rate, you need to clock it from a faster clock - you could use a DCM to generate this.
Matlab has a dec2bin function that will convert from a decimal number to a binary string. So, for example dec2bin(3) would return 11.
There's also a corresponding bin2dec which takes a binary string and converts to a decimal number, so that bin2dec('11') would return 3.
If you're wanting to convert a non-integer decimal number to a binary string, you'll first want to determine what's the smallest binary place you want to represent, and then do a little bit of pre- and post-processing, combined with dec2bin to get the results you're looking for. So, if the smallest binary place you want is the 1/512th place (or 2^-9), then you could do the following (where binPrecision equals 1/512):
function result = myDec2Bin(decNum, binPrecision)
isNegative=(decNum < 0);
intPart=floor(abs(decNum));
binaryIntPart=dec2bin(intPart);
fracPart=abs(decNum)-intPart;
scaledFracPart=round(fracPart / binPrecision);
scaledBinRep=dec2bin(scaledFracPart);
temp=num2str(10^log2(1/binPrecision)+str2num(scaledBinRep),'%d');
result=[binaryIntPart,'.',temp(2:end)];
if isNegative
result=['-',result];
end
end
The result of myDec2Bin(0.256, 1/512) would then be 0.010000011, and the result of myDec2Bin(-0.984, 1/512) would be -0.111111000. (Note that the output is a string.)

How do I trim the zero value after decimal

As I tried to debug, I found that : just as I type in
Dim value As Double
value = 0.90000
then hit enter, and it automatically converts to 0.9
Shouldn't it keep the precision in double in visual basic?
For my calculation, I absolutely need to show the precision
If precision is required then the Currency data type is what you want to use.
There are at least two representations of your value in play. One is the value you see on the screen -- a string -- and one is the internal representation -- a binary value. In dealing with fractional values, the two are often not equivalent and where they aren't, it's because they can't be.
If you stick with doubles, VB will maintain 53 bits of mantissa throughout your calculations, no matter how they might appear when printed. If you transition through the string domain, say by saving to a file or DB and later retrieving, it often has to leave some of that precision behind. It's inevitable, because the interface between the two domains is not perfect. Some values that can be exactly represented as strings (or Decimals, that is, powers of ten) can't be exactly represented as fractional powers of 2.
This has nothing to do with VB, it's the nature of floating point. The best you can do is control where the rounding occurs. For this purpose your friend is the Format function, which controls how a value appears in string form.
? Format$(0.9, "0.00000") will show you an example.
You are getting what you see on the screen confused with what bits are being set in the Double to make that number.
VB is simply being "helpful", and simply knocking off excess zeros. But for all intents and purposes,
0.9
is identical to
0.90000
If you don't believe me, try doing this comparison:
Debug.Print CDbl("0.9") = CDbl("0.90000")
As has already been said, displayed precision can be shown using the Format$() function, e.g.
Debug.Print Format$(0.9, "0.00000")
No, it shouldn't keep the precision. Binary floating point values don't retain this information... and it would be somewhat odd to do so, given that you're expressing the value in one base even though it's being represented in another.
I don't know whether VB6 has a decimal floating point type, but that's probably what you want - or a fixed point decimal type, perhaps. Certainly in .NET, System.Decimal has retained extra 0s from .NET 1.1 onwards. If this doesn't help you, you could think about remembering two integers - e.g. "90000" and "100000" in this case, so that the value you're representing is one integer divided by another, with the associated level of precision.
EDIT: I thought that Currency may be what you want, but according to this article, that's fixed at 4 decimal places, and you're trying to retain 5. You could potentially just multiply by 10, if you always want 5 decimal places - but it's an awkward thing to remember to do everywhere... and you'd have to work out how to format it appropriately. It would also always be 4 decimal places, I suspect, even if you'd specified fewer - so if you want "0.300" to be different to "0.3000" then Currency may not be appropriate. I'm entirely basing this on articles online though...
You can also enter the value as 0.9# instead. This helps avoid implicit coercion within an expression that may truncate the precision you expect. In most cases the compiler won't require this hint though because floating point literals default to Double (indeed, the IDE typically deletes the # symbol unless the value was an integer, e.g. 9#).
Contrast the results of these:
MsgBox TypeName(0.9)
MsgBox TypeName(0.9!)
MsgBox TypeName(0.9#)

Resources