My actual vector has 110 elements that I'll use to extract features from images in matlab, I took this one (tb) to simplify
tb=[22.9 30.0 30.3 27.8 24.1 28.2 26.4 12.6 39.7 38.0];
normalized_V = tb/norm(tb);
I = mat2gray(tb);
For normalized_v I got 0.2503 0.3280 0.3312 0.3039 0.2635 0.3083 0.2886 0.1377 0.4340 0.4154.
For I I got 0.3801 0.6421 0.6531 0.5609 0.4244 0.5756 0.5092 0 1.0000 0.9373 which one should I use if any of those 2 methods and why, and should I transform the features vector to 1 element after extraction for better training or leave it as a 110 element vector.
Normalization can be performed in several ways, such as the following:
Normalizing the vector between 0 and 1. In that case, just use:(tb-min(tb))/max(tb)
Making the maximum point at 1. In that case, just use: tb/max(tb) (which is the method that you have been used before).
Making the mean 0 and the standard deviation as 1. This is the most common method for using the returned values as features in a classification procedure and thus, I think that it is the one that you should use right now: zscore(tb) (or (tb-mean(tb))/std(tb)).
So, your final values would be:
zscore(tb)
ans =
-0.6664
0.2613
0.3005
-0.0261
-0.5096
0.0261
-0.2091
-2.0121
1.5287
1.3066
Edit:
In regard to your second question, it depends on the number of observations. Every single classifier takes an MxN matrix of data and an Mx1 vector of labels as inputs. In this case, M refers to the number of observations, whereas N refers to the number of features. Usually, in order to avoid over-fitting, it is recommended to use a number of features less than the tenth part of the number of observations (i.e., the number of observations must be M > 10N).
So, in your case, if you use the entire 110-set of features, you should have a minimum of 1100 observations, otherwise you can have problems with over-fitting.
I am struggling to find a way to generate a random number within a given interval in PostScript.
Basically PostScript has three functions to help you generate (pseudo-)random numbers. Those are rand, srand and rrand.
The later two are for passing a seed to the number generator to be able to reproduce specific results. At least that´s what I understood they are for. Anyway they don´t seem suitable for my case.
So rand seems to be the only function I can use to generate a random number, but...
rand returns a random integer in the range 0 to 231 − 1 (From the PostScript Language Reference, page 637 (651 in the PDF))
This is far beyond the the interval I´m looking for. I am more interested in values up to small thousands, maybe 10.000 or something like that and small float values, up to 100, all with the lower limit of 0.
I thought I could just narrow my numbers down by simple divisions and extracting the root but that tends to give me unusable small values in quite a lot cases. I am wondering if there are robust ways to either shrink a large number down to what I need or, I´d prefer that, only generate numbers in the desired interval.
Besides: while-loops are not possible in PostScript, otherwise I´d have written a function to generate numbers until they fit in my interval.
Any hints on what to look for breaking numbers down into my interval?
mod is often good enough and it's fast. But you may get a more uniform distribution by using floating-point ops.
rand 16#7fffffff div 100 mul cvi
This is because mod discards the upper bits of the input. And the PRNG is usually trying to randomize over all the bits. By scaling down then up, they all contribute something in the way of rounding effects.
Just use the modulo operator to get it down to the size you want:
GS>rand 100 mod stack
7
I have been solving a programming challenge in UVA and got this problem, which is REALLY strange. Here is the flawed code:
program WTF;
begin
WriteLn(Trunc(2.01 * 100));
ReadLn();
end.
Obviously, I need to get 201 as Integer, but I get 200, this happens because Double somehow doesn't store the exact value... It's 2.01 = 2.00(9) for reasons unbeknownst to me, can someone explain this and provide a solution?
Edit: Yet, I figgured that using Round() instead of Trunc() fixes this... But still, why wouldn't Trunc() work?
Double stores numbers of the form s*2p where s and p are integers. The number 2.01 is not of the form s*2p for any integers s, p so it cannot be stored exactly in a Double.
The solution here is to round 2.01 * 100 to the nearest integer instead of truncating it. Although 2.01 is not exactly 2.01, it is only a little bit below. Rounding to the nearest integer would result in 201.
Note that if by 2.00(9) you mean 2.0099999999… repeating indefinitely, then 2.00(9)is not the Double you get when you write 2.01. The nearest Double to the real 2.01, and the number you got, is 2.0099999999999997868371792719699442386627197265625. It is of the form s * 2p: 9052235251014696 * 2-52
I have an external application which monitors CPU and GPU temperatures...
I am using Lua with the alien extension to grab these values (via GetWindowText) and to do some pattern matching on these values, effectively extracting the temperature digits out of the string, which by default shows up as something like CPU 67.875 °C...
But perhaps I have the wrong idea on how patterns work in LUA (since they don't appear to be exactly like regex)?
The pattern I am using is [%d]+[.%d+]* which should match any number between 0 and 100.0, correct?
Yet oddly enough, I am getting incredibly strange output when values reach around 56.5 degrees (see link).
Why is this happening?
And how can I extract the correct floating point values (as a string) between 0 and 100 in the format of XYY.ZZZ, where X is not optional, Y is optional, and . is optional unless Z exists?
You're seeing the effect of accumulated rounding errors because 0.16 cannot be precisely represented in floating point. The code below performs better:
local n = 0
while n < 10000 do
local s = tostring(n/100)
local t = s:match("[%d]+[.%d+]*")
print(t)
n = n + 16
end
Now, to your question, try the simpler pattern below:
s="CPU 67.875 °C"
print(s:match("CPU +(.-) +"))
I have an array of numbers that potentially have up to 8 decimal places and I need to find the smallest common number I can multiply them by so that they are all whole numbers. I need this so all the original numbers can all be multiplied out to the same scale and be processed by a sealed system that will only deal with whole numbers, then I can retrieve the results and divide them by the common multiplier to get my relative results.
Currently we do a few checks on the numbers and multiply by 100 or 1,000,000, but the processing done by the *sealed system can get quite expensive when dealing with large numbers so multiplying everything by a million just for the sake of it isn’t really a great option. As an approximation lets say that the sealed algorithm gets 10 times more expensive every time you multiply by a factor of 10.
What is the most efficient algorithm, that will also give the best possible result, to accomplish what I need and is there a mathematical name and/or formula for what I’m need?
*The sealed system isn’t really sealed. I own/maintain the source code for it but its 100,000 odd lines of proprietary magic and it has been thoroughly bug and performance tested, altering it to deal with floats is not an option for many reasons. It is a system that creates a grid of X by Y cells, then rects that are X by Y are dropped into the grid, “proprietary magic” occurs and results are spat out – obviously this is an extremely simplified version of reality, but it’s a good enough approximation.
So far there are quiet a few good answers and I wondered how I should go about choosing the ‘correct’ one. To begin with I figured the only fair way was to create each solution and performance test it, but I later realised that pure speed wasn’t the only relevant factor – an more accurate solution is also very relevant. I wrote the performance tests anyway, but currently the I’m choosing the correct answer based on speed as well accuracy using a ‘gut feel’ formula.
My performance tests process 1000 different sets of 100 randomly generated numbers.
Each algorithm is tested using the same set of random numbers.
Algorithms are written in .Net 3.5 (although thus far would be 2.0 compatible)
I tried pretty hard to make the tests as fair as possible.
Greg – Multiply by large number
and then divide by GCD – 63
milliseconds
Andy – String Parsing
– 199 milliseconds
Eric – Decimal.GetBits – 160 milliseconds
Eric – Binary search – 32
milliseconds
Ima – sorry I couldn’t
figure out a how to implement your
solution easily in .Net (I didn’t
want to spend too long on it)
Bill – I figure your answer was pretty
close to Greg’s so didn’t implement
it. I’m sure it’d be a smidge faster
but potentially less accurate.
So Greg’s Multiply by large number and then divide by GCD” solution was the second fastest algorithm and it gave the most accurate results so for now I’m calling it correct.
I really wanted the Decimal.GetBits solution to be the fastest, but it was very slow, I’m unsure if this is due to the conversion of a Double to a Decimal or the Bit masking and shifting. There should be a
similar usable solution for a straight Double using the BitConverter.GetBytes and some knowledge contained here: http://blogs.msdn.com/bclteam/archive/2007/05/29/bcl-refresher-floating-point-types-the-good-the-bad-and-the-ugly-inbar-gazit-matthew-greig.aspx but my eyes just kept glazing over every time I read that article and I eventually ran out of time to try to implement a solution.
I’m always open to other solutions if anyone can think of something better.
I'd multiply by something sufficiently large (100,000,000 for 8 decimal places), then divide by the GCD of the resulting numbers. You'll end up with a pile of smallest integers that you can feed to the other algorithm. After getting the result, reverse the process to recover your original range.
Multiple all the numbers by 10
until you have integers.
Divide
by 2,3,5,7 while you still have all
integers.
I think that covers all cases.
2.1 * 10/7 -> 3
0.008 * 10^3/2^3 -> 1
That's assuming your multiplier can be a rational fraction.
If you want to find some integer N so that N*x is also an exact integer for a set of floats x in a given set are all integers, then you have a basically unsolvable problem. Suppose x = the smallest positive float your type can represent, say it's 10^-30. If you multiply all your numbers by 10^30, and then try to represent them in binary (otherwise, why are you even trying so hard to make them ints?), then you'll lose basically all the information of the other numbers due to overflow.
So here are two suggestions:
If you have control over all the related code, find another
approach. For example, if you have some function that takes only
int's, but you have floats, and you want to stuff your floats into
the function, just re-write or overload this function to accept
floats as well.
If you don't have control over the part of your system that requires
int's, then choose a precision to which you care about, accept that
you will simply have to lose some information sometimes (but it will
always be "small" in some sense), and then just multiply all your
float's by that constant, and round to the nearest integer.
By the way, if you're dealing with fractions, rather than float's, then it's a different game. If you have a bunch of fractions a/b, c/d, e/f; and you want a least common multiplier N such that N*(each fraction) = an integer, then N = abc / gcd(a,b,c); and gcd(a,b,c) = gcd(a, gcd(b, c)). You can use Euclid's algorithm to find the gcd of any two numbers.
Greg: Nice solution but won't calculating a GCD that's common in an array of 100+ numbers get a bit expensive? And how would you go about that? Its easy to do GCD for two numbers but for 100 it becomes more complex (I think).
Evil Andy: I'm programing in .Net and the solution you pose is pretty much a match for what we do now. I didn't want to include it in my original question cause I was hoping for some outside the box (or my box anyway) thinking and I didn't want to taint peoples answers with a potential solution. While I don't have any solid performance statistics (because I haven't had any other method to compare it against) I know the string parsing would be relatively expensive and I figured a purely mathematical solution could potentially be more efficient.
To be fair the current string parsing solution is in production and there have been no complaints about its performance yet (its even in production in a separate system in a VB6 format and no complaints there either). It's just that it doesn't feel right, I guess it offends my programing sensibilities - but it may well be the best solution.
That said I'm still open to any other solutions, purely mathematical or otherwise.
What language are you programming in? Something like
myNumber.ToString().Substring(myNumber.ToString().IndexOf(".")+1).Length
would give you the number of decimal places for a double in C#. You could run each number through that and find the largest number of decimal places(x), then multiply each number by 10 to the power of x.
Edit: Out of curiosity, what is this sealed system which you can pass only integers to?
In a loop get mantissa and exponent of each number as integers. You can use frexp for exponent, but I think bit mask will be required for mantissa. Find minimal exponent. Find most significant digits in mantissa (loop through bits looking for last "1") - or simply use predefined number of significant digits.
Your multiple is then something like 2^(numberOfDigits-minMantissa). "Something like" because I don't remember biases/offsets/ranges, but I think idea is clear enough.
So basically you want to determine the number of digits after the decimal point for each number.
This would be rather easier if you had the binary representation of the number. Are the numbers being converted from rationals or scientific notation earlier in your program? If so, you could skip the earlier conversion and have a much easier time. Otherwise you might want to pass each number to a function in an external DLL written in C, where you could work with the floating point representation directly. Or you could cast the numbers to decimal and do some work with Decimal.GetBits.
The fastest approach I can think of in-place and following your conditions would be to find the smallest necessary power-of-ten (or 2, or whatever) as suggested before. But instead of doing it in a loop, save some computation by doing binary search on the possible powers. Assuming a maximum of 8, something like:
int NumDecimals( double d )
{
// make d positive for clarity; it won't change the result
if( d<0 ) d=-d;
// now do binary search on the possible numbers of post-decimal digits to
// determine the actual number as quickly as possible:
if( NeedsMore( d, 10e4 ) )
{
// more than 4 decimals
if( NeedsMore( d, 10e6 ) )
{
// > 6 decimal places
if( NeedsMore( d, 10e7 ) ) return 10e8;
return 10e7;
}
else
{
// <= 6 decimal places
if( NeedsMore( d, 10e5 ) ) return 10e6;
return 10e5;
}
}
else
{
// <= 4 decimal places
// etc...
}
}
bool NeedsMore( double d, double e )
{
// check whether the representation of D has more decimal points than the
// power of 10 represented in e.
return (d*e - Math.Floor( d*e )) > 0;
}
PS: you wouldn't be passing security prices to an option pricing engine would you? It has exactly the flavor...