Need explanation of a reverse scientific notation function in Lisp (Scheme)

Here's the function I have and understand: it takes 1) your coefficient and 2) your exponent and produces the plain number from the scientific notation.
Example:
coefficient 7,
exponent 3
7 * 10^3 = 7000
(define (scientific coefficient exponent)
  (* coefficient (expt 10 exponent)))
Here's what I'm struggling with: the function to go the other way around, from 7000 back to the coefficient and exponent used to get it into scientific notation. I've got a working function through networking, but I really struggle to understand it entirely.
(define (sci-exponent number)
  (floor (/ (log number) (log 10))))

(define (sci-coefficient number)
  (/ number (expt 10 (sci-exponent number))))
If anyone could help me understand, it'll be greatly appreciated! Thanks for reading either way!

Look at the body of sci-exponent: it takes the floor of log(number)/log(10). As you might remember from math class, log_a(n1) / log_a(n2) = log_n2(n1) (the change-of-base rule). So what you're getting there is log_10(number), the floor of which gives you the number of digits of number minus 1, which is exactly the exponent for the scientific notation.
The coefficient is then easily derived from the exponent. Since, as you wrote, coeff * 10^exp = number, it follows that number / 10^exp = coeff, which is exactly what sci-coefficient implements.
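Not from the original question, but it may help to trace the same two definitions transcribed into Python:

import math

def sci_exponent(number):
    # floor(log(number) / log(10)) is just floor(log10(number)):
    # the number of digits minus 1 for a positive integer
    return math.floor(math.log(number) / math.log(10))

def sci_coefficient(number):
    # undo the power of ten found above
    return number / 10 ** sci_exponent(number)

print(sci_exponent(7000))     # 3
print(sci_coefficient(7000))  # 7.0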

Related

How to generate random numbers in [0 ... 1.0] in Common Lisp

My understanding of Common Lisp pseudorandom number generation is that (random 1.0) will generate a fraction strictly less than 1. I would like to get numbers up to 1.0 inclusive. Is this possible? I guess I could decide on a degree of precision and generate integers and divide by the range, but I'd like to know if there is a more widely accepted way of doing this. Thanks.
As you say, random will generate numbers in [0,1) by default, and in general (random x) will generate random numbers in [0,x). If these were real numbers and if the distribution really is random, then the probability of getting any number is zero, so this is effectively no different than [0,1]. But they're not real numbers: they're floats, so the probability of getting any particular value is higher since there are only a finite number of floats in [0,1].
Fortunately you can express exactly what you want: CL has a bunch of constants with names like *-epsilon which are defined so that, for instance
(/= (+ 1.0f0 single-float-epsilon) 1.0f0)
and single-float-epsilon is the smallest single-float for which this is true.
Thus (random (+ 1.0f0 single-float-epsilon)) will produce random single-floats in the range [0,1], and will eventually probably turn out 1.0f0. You can test this:
(defun tsit ()
  (let ((f (+ 1.0f0 single-float-epsilon)))
    (assert (/= f 1.0f0) (f) "oops")
    (loop for i upfrom 1
          for v = (random f)
          when (= v 1.0f0)
          return (values i v))))
And for me
> (tsit)
12839205
1.0
If you use double floats it takes ... quite a lot longer ... to get 1.0d0 (and remember to use double-float-epsilon).
I have a bit of a different idea here. Instead of trying to stretch the range over an epsilon, we can work with the original range, and pick a victim number somewhere in that range which gets mapped to the range limit. We can avoid a hard-coded victim by choosing one randomly, and changing it from time to time:
(defun make-random-gen (range)
  (let ((victim nil)
        (count 1))
    (lambda ()
      (when (zerop (decf count))
        (setf count 10000
              victim (random range)))
      (let ((out (random range)))
        (if (eql out victim) range out)))))
(defun testit ()
  (loop with r = (make-random-gen 1.0)
        for x = (funcall r)
        until (eql x 1.0)
        counting t))
At the listener:
[5]> (testit)
23030093
There is a small bias here in that the victim is never equal to range. So that is to say, the range value such as 1.0 is never victim and therefore always has a certain chance of occurring. Whereas every other value can potentially take a turn at being victim, having its chance of occurring temporarily reduced to zero. That should be faintly detectable in a statistical analysis of the output in that the range value will occur slightly more often than any other value.
It would be interesting to update this approach with a correction for that; here is an attempt:
(defun make-random-gen (range)
  (let ((victim nil)
        (count 1))
    (labels ((gen ()
               (when (zerop (decf count))
                 (setf count 10000
                       victim (gen)))
               (let ((out (random range)))
                 (if (eql out victim) range out))))
      #'gen)))
Now when we select victim, we recurse on our own function which can potentially select range. Whenever range is selected as victim, that value is correctly suppressed: range will not occur in the output, because out will never be eql to range.
We can justify this with the following hand-waving argument:
Let us suppose that the recursive call to gen has a slight bias in favor of range being output. But whenever that happens, range is selected as victim, which prevents it from appearing in the output of gen.
There is a kind of negative feedback which should almost entirely correct the bias.
Note: our random-number-generating lambda would be better designed if it also captured a random state object and used that. Then the sequence it yields would be undisturbed by other uses of the pseudo-random-number generator. That's a different topic.
On a theoretical note, observe that neither [0, 1) nor [0, 1] yields a strictly correct distribution. If we had a mathematically ideal PRNG, it would yield actual real numbers in these ranges. Since that range contains an uncountable infinity of real values, each one would occur with zero probability: 1/aleph-null which, I'm guessing, is so tiny that it cannot be distinguished from a real zero.
What we want is the floating-point PRNG to approximate the ideal PRNG.
The problem is that each floating-point value approximates a range of real values. So this means that if we have a generator of values in the range 0.0 to 1.0, it actually represents a range of real numbers from -epsilon to 1.0 + epsilon. If we take values from this PRNG and plot a bar graph of values, each bar in the graph has to have some nonzero width. The 0.0 bar is centered on 0, and the 1.0 bar is centered on 1. The distribution of real numbers extends from the left edge of the left bar, to the right edge of the right bar.
In order to create a PRNG which mimics an even distribution of values in the 0.0 to 1.0 interval, we have to include the 0.0 and 1.0 values with half probability. So that is to say, when we collect a large number of values from the PRNG, the 0.0 and 1.0 bars of the graph should be about half as high as all the other bars.
Under these conditions, we cannot distinguish the [0, 1.0) interval from the [0, 1.0] interval because they are exactly as large. We must include the 1.0 value, at about half the usual probability to account for the above uniformity problem. If we simply exclude that value, we create a bias in the wrong direction, because the 1.0 bar in the histogram now has a zero value.
One way we could rescue the situation might be to take the 1.0-epsilon bar of the histogram and make that value 50% more likely, so that the bar is 50% taller than average. Basically, we overload that last value of the range just before 1.0 to represent everything up to and not including 1.0, requiring that value to be more likely. And then, we exclude the 1.0 value from the output. All values approaching 1.0 from the left get mapped to the extra 50% probability of 1.0 - epsilon.
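If one wanted to realize those half-height endpoint bars concretely, here is one way on a finite grid (a Python sketch rather than CL; the grid resolution n is my assumption, not anything from the question):

import random

def uniform_closed(n=2**24):
    # Draw k uniformly from [0, 2n) and fold it onto the n+1 grid points
    # 0/n, 1/n, ..., n/n. Endpoints 0 and n each receive exactly one of
    # the 2n outcomes; every interior point receives two, so the 0.0 and
    # 1.0 bars of the histogram are half as tall as the others.
    k = random.randrange(2 * n)
    return ((k + 1) // 2) / n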

Number of items necessary to exceed a given collision probability for large spaces

(This is not a homework problem. If there is a class that offers this question as homework, please tell me as I would love to take it.)
This is related to the birthday problem.
I'm looking for a practical algorithm to calculate the number of items necessary to exceed a collision probability of p for large spaces. I need this for evaluating the suitability of hashing algorithms for storing large numbers of items.
For example, f(365, .5) should return 23, the number of people needed to exceed a 0.5 probability that any two of them share the same birthday.
I have created a simple implementation using an exact collision probability calculation:
import logging
import math

logger = logging.getLogger(__name__)

def _items_for_p(buckets, p):
    """Return the number of items for chance of collision to exceed p."""
    logger.debug('_items_for_p(%r, %r)', buckets, p)
    up = buckets
    down = 1
    while up > (down + 1):
        n = (up + down) // 2
        logger.debug('up=%r, down=%r, n=%r', up, down, n)
        if _collision_p(buckets, n) > p:
            logger.debug('Lowering up to %r', n)
            up = n
        else:
            logger.debug('Raising down to %r', n)
            down = n
    return up

def _collision_p(buckets, items):
    """Return the probability of a collision."""
    return 1 - _no_collision_p(buckets, items)

def _no_collision_p(buckets, items):
    """Return the probability of no collision."""
    logger.debug('_no_collision_p(%r, %r)', buckets, items)
    fac = math.factorial
    return fac(buckets) / ((buckets ** items) * fac(buckets - items))
Needless to say, this does not work for the large spaces I want to work with (2^256, 2^512, etc).
I am looking for an algorithm that can calculate this in a reasonable amount of time with reasonable accuracy. The Wikipedia page provides mathematical approximations, but admittedly my math is a bit rusty, and I don't want to spend a lot of time investigating one approximation only to find that I cannot both generalize it and implement it quickly.
Solution to the generalised birthday problem for probability p = 0.5:
As noted by Wikipedia, there is no proven formula that is quick to compute, but there is a formula that is conjectured to be exact. The formula involves computing square roots, natural logarithms, and basic arithmetic:
Sqrt(2*d*ln 2) + (3 - 2*ln 2)/6 + (9 - 4*(ln 2)^2)/(72 * Sqrt(2*d*ln 2)) - 2*(ln 2)^2/(135*d)
so you can feed in your d=2^256 and find out the answer that is conjectured to be exact.
Here's a quick attempt at implementing it, limited to the accuracy of Python floats:
def solve_birthday_problem(d):
    ln2 = math.log(2)
    term1 = (2 * d * ln2) ** 0.5
    term2 = (3 - 2 * ln2) / 6.0
    term3 = (9 - 4 * ln2 ** 2) / (72 * (2 * d * ln2) ** 0.5)
    term4 = 2 * ln2 ** 2 / (135.0 * d)
    return math.ceil(term1 + term2 + term3 - term4)
You will need to fix it up to get an accurate arbitrary-precision integer result; the decimal library may be what is needed for that.
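For instance, here is a sketch of that fix using decimal (the precision of 80 digits is an arbitrary choice of mine, comfortably enough for d = 2^512):

from decimal import Decimal, getcontext, ROUND_CEILING

def solve_birthday_problem_big(d):
    getcontext().prec = 80  # arbitrary; well beyond float precision
    d = Decimal(d)
    ln2 = Decimal(2).ln()
    root = (2 * d * ln2).sqrt()
    n = (root
         + (3 - 2 * ln2) / 6
         + (9 - 4 * ln2 ** 2) / (72 * root)
         - 2 * ln2 ** 2 / (135 * d))
    return int(n.to_integral_value(rounding=ROUND_CEILING))

print(solve_birthday_problem_big(365))      # 23
print(solve_birthday_problem_big(2 ** 256)) # conjectured-exact answer, about 4.0e38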

Behaviour of my own Haskell function: sometimes stops producing (easy to produce) results

I wrote a Haskell function to produce prime factorizations for numbers up to a certain threshold, composed of some given prime factors. Minimal working code can be found here:
http://lpaste.net/117263
The problem: it works very well for "threshould <= 10^9" on my computer, but beginning with "threshould = 10^10" the method doesn't produce any results on my computer; I never see even the first list element on my screen. The name of the critical function is "exponentSets". For every prime in the list 'factors', it computes the possible exponents (with respect to already chosen exponents for other primes). Further comments are in the code. If 10^10 works fine on your machine, try it with a higher exponent (10^11 ...).
My question: what is responsible for that? How can I improve the quality of the function "exponentSets"? (I'm still not very experienced in Haskell, so someone more experienced might have an idea.)
Even though you are using 64-bit integers, you still do not have enough capacity to store a temporary integer which is created in intLog:
intLog base num =
  let searchExtend lower@(e, n) =
        let upper@(e', n') = (2 * e, n^2) -- this line is what causes the problems
            -- some code
  in (some if) searchExtend (1, base)
rawLists is defined like this:
rawLists = recCall 1 threshould
Which in turn sets remaining_threshould in recCall to
threshould `quot` 1 -- same as threshould
Now intLog gets called by recCall like this:
intLog p remaining_threshould
which is the same as
intLog p threshould
Now comes the interesting part: since num p is smaller than your base threshold, you call searchExtend (1, base), which then in turn does this:
searchExtend (e, n) =
  let (e', n') = (2 * e, n ^ 2)
Since n is remaining_threshould, which is the same as threshould, you essentially square 2^32 + 1 and store the result in an Int, which overflows and causes rawLists to give bogus results:
(2 ^ 32 + 1) ^ 2 :: Int     is 8589934593
(2 ^ 32 + 1) ^ 2 :: Integer is 18446744082299486209
Switching those intermediate values to Integer, which is arbitrary-precision, avoids the overflow.
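You can watch the wraparound happen outside GHC as well; here is a small Python sketch (the to_int64 helper is mine, purely illustrative) that masks a product to 64 bits the way a fixed-width Int does:

def to_int64(x):
    # keep the low 64 bits, then reinterpret as signed two's complement
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x

print(to_int64((2**32 + 1) ** 2))  # 8589934593, matching the Int result above
print((2**32 + 1) ** 2)            # 18446744082299486209, the Integer result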

Hill Cipher using a 2 x 2 Key Matrix

I'm new to cryptography and I cannot seem to get my head around this problem:
The problem says that a Hill Cipher using the below 2 x 2 key matrix (K) was used to produce the ciphertext "KCFL".
K = (3 5)
    (2 3)
It then asks to use the Hill Cipher to show the calculations and the plain text when I decipher the same encrypted message "KCFL".
I know with other matrices, e.g. for the determinant there is usually a formula, such as:
a x d - b x c
However, for the Hill Cipher I am completely lost.
I have done the following:
a) found the inverse of K:
K inverse = (-3  5)
            ( 2 -3)
b) wrote the ciphertext "KCFL" as a matrix (K=10, C=2 in the first column; F=5, L=11 in the second):
KCFL = (10  5)
       ( 2 11)
c) The next step (mod 26) confuses me. How do I use mod 26 together with the cipher key to decipher and find the plain text?
Any help is greatly appreciated.
Many thanks.
To perform mod 26 on the matrix, take each number mod 26. If a number is negative, add multiples of 26 until you hit a non-negative number.
This may also help you.
26 modulo in hill cipher encryption
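Putting the steps together under the usual conventions (A=0 ... Z=25, ciphertext read in column pairs), here is a short Python sketch of the whole decryption; the helper name is mine, not part of the problem:

# Multiply each ciphertext column vector by the inverse key, mod 26.
# Python's % already returns non-negative results, so the "add
# multiples of 26" step happens automatically (-3 % 26 == 23).
K_INV = [[-3, 5], [2, -3]]  # inverse of K = [[3, 5], [2, 3]]; det(K) = -1

def decrypt_pair(pair):
    x, y = (ord(c) - ord('A') for c in pair)
    a = (K_INV[0][0] * x + K_INV[0][1] * y) % 26
    b = (K_INV[1][0] * x + K_INV[1][1] * y) % 26
    return chr(a + ord('A')) + chr(b + ord('A'))

cipher = "KCFL"
print(decrypt_pair(cipher[:2]) + decrypt_pair(cipher[2:]))  # -> GOOD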

First N digits of a long number in constant-time?

In a Project Euler problem I need to deal with numbers that can have hundreds of digits, and I need to perform some calculation on the first 9 digits.
My question is: what is the fastest possible way to determine the first N digits of a 100-digit integer? The last N digits are easy with modulo/remainder. For the first digits I can apply modulo 100 times to get digit by digit, or I can convert the number to a String and truncate, but those are all linear time. Is there a better way?
You can count the number of digits with this function:
(defn dec-digit-count [n]
  (inc (if (zero? n)
         0
         (long (Math/floor (Math/log10 n))))))
Now we know how many digits there are, and we want to keep only the first 9. We have to divide the number by 10^(digit-count - 9), or in Clojure:
(defn first-digits [number digits]
  ;; divide by 10^(digit count - digits wanted); BigInteger math keeps
  ;; this exact for numbers too large to fit in a long
  (quot number (.pow (biginteger 10) (- (dec-digit-count number) digits))))
And call it like (first-digits your-number 9). I think it's constant time; I'm only not sure about the log10 implementation. But it's surely a lot faster than a modulo/loop solution.
Also, there's an even easier solution: you can simply copy & paste the first 9 digits from the number.
Maybe you can use not one long number, but a tuple of two numbers: [first-digits, last-digits]. Perform operations on both of them, each time truncating to the required length (twice the needed length; 9 in your case): the first at the right and the second at the left.
Like
222000333 * 666000555
147|852344988184|815
222111333 * 666111555
147|950925407752|815
so you can do only two small calculations: 222 * 666 = 147[852] and 333 * 555 = [184]815
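For what it's worth, here is a loose Python sketch of that bookkeeping (the split/mul helpers are mine, purely illustrative; carries from the dropped middle digits are why twice the needed length is kept as slack):

K = 3  # digits we actually need (9 in the question)

def split(n):
    # keep 2*K leading and 2*K trailing digits as a tuple
    s = str(n)
    return (int(s[:2 * K]), int(s[-2 * K:]))

def mul(a, b):
    (ha, la), (hb, lb) = a, b
    hi = int(str(ha * hb)[:2 * K])  # truncate at the right
    lo = (la * lb) % 10 ** (2 * K)  # truncate at the left
    return (hi, lo)

print(mul(split(222000333), split(666000555)))  # (147852, 184815)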
But the comment about the "a-ha" solution is the most relevant for Project Euler :)
In Java:
public class Main {
    public static void main(String[] args) {
        long N = 7812938291232L;
        System.out.println(N / (int) (Math.pow(10, Math.floor(Math.log10(N)) - 8)));
        N = 1234567890;
        System.out.println(N / (int) (Math.pow(10, Math.floor(Math.log10(N)) - 8)));
        N = 1000000000;
        System.out.println(N / (int) (Math.pow(10, Math.floor(Math.log10(N)) - 8)));
    }
}
yields
781293829
123456789
100000000
This may help you: first n digits of an exponentiation, and the answer from this question.
That algorithm has a complexity of O(b), but it is easy to change it to get O(log b).
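For what it's worth, when the big number arises as a power a^b (common in Project Euler), the leading digits come straight from the fractional part of b*log10(a); here is a Python sketch of that idea, O(1) in b but limited by float precision:

import math

def first_digits_of_pow(a, b, n=9):
    # a**b = 10**(b * log10(a)); only the fractional part of the
    # exponent determines the leading digits, whatever the magnitude
    frac = (b * math.log10(a)) % 1.0
    return int(10 ** (frac + n - 1))

print(first_digits_of_pow(2, 1000))  # 107150860, the first 9 digits of 2**1000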
