Testing functions through I/O in Haskell

I'm playing Codewars to sharpen my Haskell skills, and running into a problem that I haven't had in imperative languages.
Let's say I'm writing a function foo() in JavaScript, which takes an int, adds two, squares it, subtracts one, and returns the square root of that number.
var foo = function(n) {
    n += 2;
    n = n * n;
    n -= 1;
    return Math.sqrt(n);
};
I want to check on the state of the data being processed in the function at various points to help me troubleshoot/revise/debug code, so I will insert console.log() statements whenever I want to see where I'm at. For example, am I, in fact, squaring the sum of n+2 correctly halfway through the function? Let's see...
var foo = function(n) {
    n += 2;
    n = n * n;
    console.log("n = " + n);
    n -= 1;
    return Math.sqrt(n);
};
While this example should be simple enough for a Haskeller to write in one line, if you have a complex function and want to check the state at different points, how do Haskellers do it? Is there a standard practice using the IO monad? Do they get around it some other way?

GHCi has a fancy debugger that lets you step through your code and evaluate it line by line, checking its state and intermediary results.
But, of course, there is also the printf-style debugging that you are asking for, using the trace function from Debug.Trace. Nothing wrong with using that for small scripts imho, but it's generally discouraged.
trace has the type String -> a -> a, so you pass a string that gets printed (via unsafePerformIO) and an arbitrary second argument, which is simply returned unchanged.
In your example we could use it as follows. This is your function translated to Haskell:
foo x = sqrt $ (x+2)**2 - 1
Now we can just add trace and the string we want to see, e.g. "Debug Me: " ++ (show ((x+2)**2)). First import Debug.Trace, though:
import Debug.Trace
foo x = sqrt $ (trace ("Debug Me: " ++ (show ((x+2)**2))) ((x+2)**2)) - 1
A bit ugly? Following David Young's comment below, we can instead use traceShowId :: Show a => a -> a if what we want to output is identical to the intermediary result (converted to a String, of course):
import Debug.Trace
foo x = sqrt $ (traceShowId ((x+2)**2)) - 1
See here for a summary of debugging options.
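If it helps to connect this back to an imperative mindset, here's a rough Python sketch of the same idea (the trace helper below is hypothetical, written just for illustration): an identity function whose only side effect is printing.
def trace(msg, value):
    # Print a message, then hand the value through unchanged,
    # mimicking Debug.Trace.trace's String -> a -> a behaviour.
    print(msg)
    return value

def foo(x):
    squared = trace("Debug Me: " + str((x + 2) ** 2), (x + 2) ** 2)
    return (squared - 1) ** 0.5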

Related

How to compute and store the digits of sqrt(n) up to 10^6 decimal places?

I am doing research work for which I need to compute and store the square root of 2 up to 10^6 decimal places. I have googled for this, but I only found a NASA page, and I don't know how they computed it. I used setprecision in C++, but that gives the result to only around 50 places. What should I do?
NASA page link: https://apod.nasa.gov/htmltest/gifcity/sqrt2.1mil
I have also tried binary search, but that was not fruitful.
long double ans = sqrt(n);
cout << fixed << setprecision(50) << ans << endl;
You have various options here. You can work with an arbitrary-precision floating-point library (for example MPFR with C or C++, or mpmath or the built-in decimal library in Python). Provided you know what error guarantees that library gives, you can ensure that you get the correct decimal digits. For example, both MPFR and Python's decimal guarantee correct rounding here, but MPFR has the disadvantage (for your particular use-case of getting decimal digits) that it works in binary, so you'd also need to analyse the error induced by the binary-to-decimal conversion.
You can also work with pure integer methods, using an arbitrary-precision integer library (like GMP), or a language that supports arbitrary-precision integers out of the box (for example, Java with its BigInteger class: recent versions of Java provide a BigInteger.sqrt method): scale 2 by 10**2n, where n is the number of places after the decimal point that you need, take the integer square root (i.e., the integer part of the exact mathematical square root), and then scale back by 10**n. See below for a relatively simple but efficient algorithm for computing integer square roots.
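As a concrete sketch of that recipe in Python (using math.isqrt from Python 3.8+; the helper name sqrt_decimal is mine, not a library's, and it assumes n >= 1 and places >= 1):
from math import isqrt  # Python 3.8+

def sqrt_decimal(n, places):
    # Truncated decimal expansion of sqrt(n): scale up, take the
    # integer square root, then re-insert the decimal point.
    scaled = isqrt(n * 10**(2 * places))  # floor(sqrt(n) * 10**places)
    s = str(scaled)
    return s[:-places] + "." + s[-places:]

print(sqrt_decimal(2, 6))  # prints 1.414213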
The simplest out-of-the-box option here, if you're willing to use another language, is to use Python's decimal library. Here's all the code you need, assuming Python 3 (not Python 2, where this will be horribly slow).
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 10**6 + 1 # number of significant digits needed
>>> sqrt2_digits = str(Decimal(2).sqrt())
The str(Decimal(2).sqrt()) operation takes less than 10 seconds on my machine. Let's check the length, and the first and last hundred digits (we obviously can't reproduce the whole output here):
>>> len(sqrt2_digits)
1000002
>>> sqrt2_digits[:100]
'1.41421356237309504880168872420969807856967187537694807317667973799073247846210703885038753432764157'
>>> sqrt2_digits[-100:]
'2637136344700072631923515210207475200984587509349804012374947972946621229489938420441930169048412044'
There's a slight problem with this: the result is guaranteed to be correctly rounded, but that's rounded, not truncated. That means the final "4" digit could be the result of a final round up; that is, the actual digit in that position could be a "3", with an "8" or "9" (for example) following it.
We can get around this by computing a couple of extra digits, and then truncating them (after double checking that rounding of those extra digits doesn't affect the truncation).
>>> getcontext().prec = 10**6 + 3
>>> sqrt2_digits = str(Decimal(2).sqrt())
>>> sqrt2_digits[-102:]
'263713634470007263192351521020747520098458750934980401237494797294662122948993842044193016904841204391'
So indeed the millionth digit after the decimal point is a 3, not a 4. Note that if the last 3 digits computed above had been "400", we still wouldn't have known whether the millionth digit was a "3" or a "4", since that "400" could again be the result of a round up. In that case, you could compute another two digits and try again, and so on, stopping when you have an unambiguous output. (For further reading, search for "The table maker's dilemma".)
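If you wanted to mechanize that "add digits and retry" loop, it might look something like the following sketch (the sqrt2_truncated helper and its guard-digit logic are mine, not a library's):
from decimal import Decimal, getcontext

def sqrt2_truncated(places, guard=2):
    # Truncated digits of sqrt(2); retry with more guard digits until
    # a round-up at the end can no longer affect the truncated prefix.
    while True:
        getcontext().prec = places + guard + 1  # total significant digits
        digits = str(Decimal(2).sqrt())         # "1." followed by the fraction
        kept, extra = digits[:places + 2], digits[places + 2:]
        if extra.strip("0"):  # some guard digit is nonzero: truncation is unambiguous
            return kept
        guard += 2            # all-zero guard digits could hide a round-up; retry

print(sqrt2_truncated(6))  # prints 1.414213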
(Note that setting the decimal module's rounding mode to ROUND_DOWN does not work here, since the Decimal.sqrt method ignores the rounding mode.)
If you want to do this using pure integer arithmetic, Python 3.8 offers a math.isqrt function for computing exact integer square roots. In this case, we'd use it as follows:
>>> from math import isqrt
>>> sqrt2_digits = str(isqrt(2*10**(2*10**6)))
This takes a little longer: around 20 seconds on my laptop. Half of that time is for the binary-to-decimal conversion implicit in the str call. But this time, we got the truncated result directly, and didn't have to worry about the possibility of rounding giving us the wrong final digit(s).
Examining the results again:
>>> len(sqrt2_digits)
1000001
>>> sqrt2_digits[:100]
'1414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641572'
>>> sqrt2_digits[-100:]
'2637136344700072631923515210207475200984587509349804012374947972946621229489938420441930169048412043'
This is a bit of a cheat, because (at the time of writing) Python 3.8 hasn't been released yet, although beta versions are available. But there's a pure Python version of the isqrt algorithm in the CPython source, that you can copy and paste and use directly. Here it is in full:
import operator

def isqrt(n):
    """
    Return the integer part of the square root of the input.
    """
    n = operator.index(n)
    if n < 0:
        raise ValueError("isqrt() argument must be nonnegative")
    if n == 0:
        return 0
    c = (n.bit_length() - 1) // 2
    a = 1
    d = 0
    for s in reversed(range(c.bit_length())):
        # Loop invariant: (a-1)**2 < (n >> 2*(c - d)) < (a+1)**2
        e = d
        d = c >> s
        a = (a << d - e - 1) + (n >> 2*c - e - d + 1) // a
    return a - (a*a > n)
The source also contains an explanation of the above algorithm and an informal proof of its correctness.
You can check that the results from the two methods above agree (modulo the extra decimal point in the first result). They're computed by completely different methods, so this acts as a sanity check on both.
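For instance, the cross-check could be scripted like this (recomputing both results; roughly half a minute total, going by the timings above):
from decimal import Decimal, getcontext
from math import isqrt

getcontext().prec = 10**6 + 3
via_decimal = str(Decimal(2).sqrt()).replace(".", "")[:10**6 + 1]
via_isqrt = str(isqrt(2 * 10**(2 * 10**6)))
assert via_decimal == via_isqrt  # same 1000001 digits by two independent routes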
You could use big integers, e.g. BigInteger in Java. Then you calculate the square root of 2e12 or 2e14. Note that sqrt(2) = 1.4142... and sqrt(200) = 14.142... Then you can use the Babylonian method to get all the digits: e.g., with S = 2e14, iterate x(n+1) = (x(n) + S / x(n)) / 2 until x(n) doesn't change. Maybe there are more efficient algorithms that converge faster.
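In Python, the same idea might look like this (a sketch of the integer Babylonian/Newton iteration, starting from a guess that is at least the true root so the sequence decreases):
def babylonian_isqrt(S):
    # floor(sqrt(S)) via x(n+1) = (x(n) + S / x(n)) / 2 in integer arithmetic.
    if S < 2:
        return S
    x = 1 << ((S.bit_length() + 1) // 2)  # initial guess >= sqrt(S)
    while True:
        y = (x + S // x) // 2
        if y >= x:  # stopped decreasing: x is floor(sqrt(S))
            return x
        x = y

# sqrt(2) to 6 decimal places: scale 2 by 10**12 first.
print(babylonian_isqrt(2 * 10**12))  # prints 1414213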
// Input: a positive integer, the number of precise digits after the decimal point
// Output: a string representing the long float square root
function findSquareRoot(number, numDigits) {
    function get_power(x, y) {
        let result = 1n;
        for (let i = 0; i < y; i++) {
            result = result * BigInt(x);
        }
        return result;
    }

    let a = 5n * BigInt(number);
    let b = 5n;
    const precision_digits = get_power(10, numDigits + 1);
    while (b < precision_digits) {
        if (a >= b) {
            a = a - b;
            b = b + 10n;
        } else {
            a = a * 100n;
            b = (b / 10n) * 100n + 5n;
        }
    }

    // sqrt(number) has floor(log10(number) / 2) + 1 digits before the decimal point
    const decimal_pos = Math.floor(Math.log10(number) / 2) + 1;
    let result = (b / 100n).toString();
    result = result.slice(0, decimal_pos) + '.' + result.slice(decimal_pos);
    return result;
}

OCaml performance vs Go

I'm trying to implement UTF-8 decoding in OCaml as a learning project. To check the performance, I'm benchmarking against the Go standard library.
This is the Go code:
package main

import (
    "fmt"
    "time"
    "unicode/utf8"
)

func main() {
    start := time.Now()
    for i := 0; i < 1000000000; i++ {
        utf8.ValidRune(23450)
    }
    elapsed := time.Since(start)
    fmt.Println(elapsed)
}
When I run it, I get:
go build b.go
./b
344.979492ms
I decided to write an equivalent in OCaml:
let min = 0x0000
let max = 0x10FFFF
let surrogateMin = 0xD800
let surrogateMax = 0xDFFF

let validUchar c =
  if (0 <= c && c < surrogateMin) then
    true
  else if (surrogateMax < c && c <= max) then
    true
  else
    false

let time f x =
  let t = Sys.time () in
  let _ = f x in
  let t2 = Sys.time () in
  let diff = (t2 -. t) *. 1000. in
  print_endline ((string_of_float diff) ^ "ms")

let test () =
  for i = 0 to 1000000000 do
    let _ = validUchar 23450 in
    ()
  done

let () = time test ()
Output:
ocamlopt bMl.ml -o bMl
./bMl
2041.075ms
The OCaml equivalent basically copies the implementation of the Go stdlib from https://golang.org/src/unicode/utf8/utf8.go#L517
Why is the OCaml code so much slower?
As observed, you should be using Unix.gettimeofday to measure wall-clock time. You can also use Sys.opaque_identity to prevent OCaml from optimizing useless operations away, and ignore to 'return unit' instead of the usual value of an expression. Altogether:
let time f x =
  let t = Unix.gettimeofday () in
  ignore (Sys.opaque_identity (f x));
  let t2 = Unix.gettimeofday () in
  ...

let test () =
  for i = 1 to 1_000_000_000 do
    ignore (Sys.opaque_identity (validUchar 23450))
  done
Note the i = 1, which you want if you want exactly one billion iterations (a figure I couldn't tell was one billion before adding the underscores, which OCaml allows). Previously you were measuring one billion plus 1 iterations. Not that that was the difference.
Your verbose definition of validUchar did not help its performance any; you can write a microbenchmark to confirm that.
Finally, after making the changes suggested above and writing your validUchar in a more natural manner, I get an OCaml runtime that's identical to the Go runtime ... after adding -O3 to the ocamlopt arguments. And it's easy to confirm that this isn't due to the compiler 'optimizing the operations away' -- commenting out the f x call in time results in runtimes of 0 or near-0 values like 1.19e-06.
Don't be discouraged by the responses you got to this question. But do expect that any kind of "why does this benchmark have this result?" question to a programming forum will be answered similarly.
Sys.time shouldn't be used for time measurements, as it returns processor time, not real time. The Unix.gettimeofday function is a much better candidate. Alternatively, you can time your program from the shell using the time command.
As a side note, benchmarking is hard, and it is very easy to get misleading results. In your particular case, if you turn on optimizations, both compilers will remove the computations, since their results are not used, and will produce code that does nothing, and is thus rather fast :)
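The processor-time vs. real-time distinction is easy to demonstrate in any language; here's a quick Python illustration (not OCaml, but the same contrast as Sys.time vs. Unix.gettimeofday):
import time

cpu_start = time.process_time()   # CPU time, analogous to Sys.time
wall_start = time.perf_counter()  # wall-clock timer, analogous to Unix.gettimeofday
time.sleep(1)                     # sleeping consumes wall time but almost no CPU
print(time.process_time() - cpu_start)   # ~0.0
print(time.perf_counter() - wall_start)  # ~1.0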

How to decide between Global or Local vector for m choose n

Suppose we want to write code that prints all ways of selecting n out of m options.
I think the programming language does not matter, but if I should state it: Python.
I put the assignments in a vector A. Is it better to define A as a global variable or to pass it to the function each time? Why?
def choose(ind, n):
    if n == 0:
        print(A)
        return
    elif len(A) <= ind:
        return
    else:
        A[ind] = 1
        choose(ind + 1, n - 1)
        A[ind] = 0
        choose(ind + 1, n)
Always prefer passing over mutating globals whenever feasible.
Say you have the following functions:
def some_fun1(n):
    return n + 1

m = 1

def some_fun2():
    return m + 1
With the first function, you can load up your REPL and throw data at it just by passing it as an argument. Your testing of that pure function has 0 effect on the rest of the program, which makes testing significantly easier.
With the second function, any time you need to test it, you must manually set all the globals the function relies on, which could potentially affect the operation of other functions if they rely on the same globals. This makes testing harder, and for that reason, among others, mutating globals should be avoided.
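For this particular problem, a version that passes A explicitly might look like the following sketch (the same recursion as in the question, just with A as a parameter):
def choose(A, ind, n):
    # Same logic as above, but A travels through the call as an argument.
    if n == 0:
        print(A)
    elif ind < len(A):
        A[ind] = 1
        choose(A, ind + 1, n - 1)
        A[ind] = 0
        choose(A, ind + 1, n)

choose([0] * 4, 0, 2)  # prints the C(4, 2) = 6 selection vectors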

Better random "feeling" integer generator for short sequences

I'm trying to figure out a way to create random numbers that "feel" random over short sequences. This is for a quiz game, where there are four possible choices, and the software needs to pick one of the four spots in which to put the correct answer before filling in the other three with distractors.
Obviously, arc4random() % 4 will create more than sufficiently random results over a long sequence, but in a short sequence it's entirely possible (and a frequent occurrence!) to have five or six of the same number come back in a row. This is what I'm aiming to avoid.
I also don't want to simply say "never pick the same square twice," because that results in only three possible answers for every question but the first. Currently I'm doing something like this:
BOOL acceptable = NO;
do {
    currentAnswer = arc4random() % 4;
    if (currentAnswer == lastAnswer) {
        if (arc4random() % 4 == 0) {
            acceptable = YES;
        }
    } else {
        acceptable = YES;
    }
} while (!acceptable);
Is there a better solution to this that I'm overlooking?
If your question was how to compute currentAnswer using your example's probabilities non-iteratively, Guffa has your answer.
If the question is how to avoid random-clustering without violating equiprobability and you know the upper bound of the length of the list, then consider the following algorithm which is kind of like un-sorting:
from random import randrange
# randrange(a, b) yields a <= N < b

def decluster():
    for i in range(seq_len):
        j = (i + 1) % seq_len
        if seq[i] == seq[j]:
            i_swap = randrange(i, seq_len)  # is best lower bound 0, i, j?
            if seq[j] != seq[i_swap]:
                print('swap', j, i_swap, (seq[j], seq[i_swap]))
                seq[j], seq[i_swap] = seq[i_swap], seq[j]

seq_len = 20
seq = [randrange(1, 5) for _ in range(seq_len)]; print(seq)
decluster(); print(seq)
decluster(); print(seq)
where any relation to actual working Python code is purely coincidental. I'm pretty sure the prior probabilities are maintained, and it does seem to break clusters (and occasionally adds some). But I'm pretty sleepy, so this is for amusement purposes only.
You populate an array of outcomes, then shuffle it, then assign them in that order.
So for just 8 questions:
from random import shuffle

answer_slots = [0, 0, 1, 1, 2, 2, 3, 3]
shuffle(answer_slots)
print(answer_slots)  # e.g. [1, 3, 2, 1, 0, 2, 3, 0]
To reduce the probability of a repeated number by 25%, you can pick a random number between 0 and 3.75, and then rotate it so that the 0.75 ends up at the previous answer.
To avoid using floating-point values, you can multiply the factors by four.
Pseudo code (where / is an integer division):
currentAnswer = ((random(0..14) + (lastAnswer + 1) * 4) % 16) / 4
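You can check the resulting distribution with a quick simulation (Python here for brevity; next_answer just transcribes the pseudocode above):
from collections import Counter
from random import randrange

def next_answer(last):
    # 15 equally likely values; the short three-value bucket lands on `last`.
    return ((randrange(15) + (last + 1) * 4) % 16) // 4

counts = Counter(next_answer(2) for _ in range(150_000))
print(counts)  # answer 2 shows up ~30000 times (3/15), the others ~40000 (4/15)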
Set up a weighted array. Let's say the last value was a 2. Make an array like this:
array = [0,0,0,0,1,1,1,1,2,3,3,3,3];
Then pick a number in the array.
newValue = array[arc4random() % 13];
Now switch to using math instead of an array.
newValue = ( ( ( arc4random() % 13 ) / 4 ) + 1 + oldValue ) % 4;
For P possibilities and a weight 0 < W <= 1 (with 1/W an integer), use:
newValue = ( ( ( arc4random() % ((P-1)/W + 1) ) * W ) + 1 + oldValue ) % P;
For P = 4 and W = 1/4, (P-1)/W + 1 = 13. This says the last value will be 1/4 as likely as the other values.
If you completely eliminate the most recent answer it will be just as noticeable as the most recent answer showing up too often. I do not know what weight will feel right to you, but 1/4 is a good starting point.

Algorithm for series

A, B, C, …, Z, AA, AB, …, AZ, BA, BB, …, ZZ, AAA, …: write a function that takes an integer n and returns the string at that position in the series. Can somebody tell me the algorithm to find the nth value in the series?
Treat those strings as numbers in base 26 with A=0. It's not quite an exact translation because in real base 26 A=AA=AAA=0, so you have to make some adjustments as necessary.
Here's a Java implementation:
static String convert(int n) {
    int digits = 1;
    for (int j = 26; j <= n; j *= 26) {
        digits++;
        n -= j;
    }
    String s = "";
    for (; digits --> 0 ;) {
        s = (char) ('A' + (n % 26)) + s;
        n /= 26;
    }
    return s;
}
This converts 0=A, 26=AA, 702=AAA as required.
Without giving away too much (since this question seems to be a homework problem), what you're doing is close to the same as translating that integer n into base 26. Good luck!
If, as others suspect, this is homework, then this answer probably won't be much help. If this is for a real-world project though, it might make sense to make a generator instead, which is an easy and idiomatic thing to do in some languages, such as Python. Something like this:
def letterPattern():
    pattern = [0]
    while True:
        yield pattern
        pattern[0] += 1
        # iterate through all numbers in the list *except* the last one
        for i in range(0, len(pattern) - 1):
            if pattern[i] == 26:
                pattern[i] = 0
                pattern[i + 1] += 1
        # now if the last number is 26, set it to zero, and append another zero to the end
        if pattern[-1] == 26:
            pattern[-1] = 0
            pattern.append(0)
Except instead of yielding pattern itself you would reverse it, and map 0 to A, 1 to B, etc. then yield the string. I've run the code above and it seems to work, but I haven't tested it extensively at all.
I hope you'll find this readable enough to implement, even if you don't know Python. (For the Pythonistas out there, yes the "for i in range(...)" loop is ugly and unpythonic, but off the top of my head, I don't know any other way to do what I'm doing here)
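For instance, that reversing-and-mapping step could be a small wrapper around the generator above, something like:
from itertools import islice

def letters():
    # Reverse each digit list and map 0 -> 'A', ..., 25 -> 'Z'.
    for pattern in letterPattern():
        yield ''.join(chr(ord('A') + digit) for digit in reversed(pattern))

print(list(islice(letters(), 28)))  # ['A', 'B', ..., 'Z', 'AA', 'AB']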
