In the following video, Thinking Parallel Programming, in the end, Guy Steele recommends ways to improve your code performance
He mentions at around 1:01, if you have a state s and array of functions f[i], then rather than doing
for i <- 1 to n; s = f[i](s)
do
fcomposed(s) where fcomposed is the composition of all functions in f[i]
I don't see any benefit here. Have I misinterpreted something?
Related
As part of an advent of code challenge, I've written the following functions in Haskell:
simulateUntilRepeat_int a b i = if (a /= b) then (simulateUntilRepeat_int a (updateCycle b) (i+1)) else i
simulateUntilRepeat a = simulateUntilRepeat_int a (updateCycle a) 1
The purpose of this is to take a list of moons and simulate their movement until they resume their original position, returning the number of cycles it took for them to get there. (the function updateCycle does one iteration of the simulation). However, when I attempt to run this it uses all available memory and then gets killed by the operating system. The question does admit that this may take a very large number of cycles.
Googling around about this problem I find the usual fix is to make some of the parameters strict, but I think I've experimented with all possible permutations of strictness on the parameters to no avail. By the looks of this function, I'd have anticipated the compiler would be able to use the tail recursion optimisation and turn it into a loop, but this seems to not be happening somehow.
A friend of mine, who is knowledgeable in haskell suggested changing the form of the function to the following:
f a b0 = length (takeWhile (/= a) (iterate updateCycle b0))
But doing this didn't fix it either, leaving me out of ideas.
The comments are undoubtedly correct that your approach is not the intended solution method.
However, the functions you've posted would not, in and of themselves, cause a memory leak, fail to tail recurse, or lead to poor performance. Given your code above plus the definitions:
updateCycle 4686774942 = 0
updateCycle n = n+1
main = do
print $ simulateUntilRepeat (0 :: Int)
and compiling with -O2, the program runs in constant memory on my laptop in about 30 seconds. Adding explicit type signatures to use Int in place of Integer for the iteration count:
simulateUntilRepeat_int :: Int -> Int -> Int -> Int
simulateUntilRepeat :: Int -> Int
it runs in about 2.4 seconds.
So, to understand why your program is gobbling all available memory or why your strictness annotations failed to make a difference, it would probably be necessary to see the whole working program (or preferably a minimal example that illustrates the performance problem). If the program is short, and the question is "why is the performance of this program totally unreasonable?" instead of "how can I optimize my program to run as fast as possible?", it might still be a good SO question. Otherwise, the Code Review site might be better -- you can post a larger program there and ask for general performance advice, and that's considered on-topic for that site.
We know that while and for are the most frequently used loop syntax for many programming and scripting languages. Here I would like to ask some questions regarding convertibility and feasibility of using while vs for loop.
Is for to while and vice versa transformation or conversion always possible? I mean suppose one used while loop for some functionality and I want to replace while with for or say vice-versa, then Is while ⇋ for transformation/conversion always possible (also interested in knowing the feasibility)? It would be helpful of I can refer If any research regarding this carried out.
I'm also interested in getting the general guidance for using while vs for. Also want to know if while has some advantages over for and vice versa.
Note: I've this question for log time, I thought -- being a great programming site, this question can be useful here. If the question is not suitable here. I'm unsure if this question is acceptable here, so requesting to consider it liberal; you can ask me to remove if such question hurts the quality of site :)
I will answer using Java as a reference, though this answer should also be completely valid for C, C#, C++ and many others. If we consider the following for loop:
for (int i=0; i < 10; ++i) {
// do something, maybe involving i
}
We can see that the loop has 3 components:
int i=0; initialization of loop counter
i < 10 criteria for loop to execute
++i increment to loop counter
The following while loop is functionally equivalent to the above for loop:
int i=0;
while (i < 10) {
// do something, maybe involving i
++i;
}
We can see that the main difference between this while loop and the for loop are that the declaration and initialization of the loop counter is outside the loop in the former case. Also, we increment the loop counter inside the actual while loop. The check for the loop continuing is still done inside the loop structure, as with for loops.
So a for loop can be thought of an enhanced while loop of sorts. It frees us from having to create a loop counter outside the loop, and also we can increment/change the loop counter within the loop structure, rather than mixing such logic with the code of the loop body.
I am having trouble constructing my own nested selection statements (ifs) and repetition statements (for loops, whiles and do-whiles). I can understand what most simple repetition and selection statements are doing and although it takes me a bit longer to process what the nested statements are doing I can still get the general gist of the code (keeping count of the control variables and such). However, the real problem comes down to the construction of these statements, I just can't for the life of me construct my own statements that properly aligns with the pseudo-code.
I'm quite new to programming in general so I don't know if this is an experience thing or I just genuinely lack a very logical mind. It is VERY demoralising when it takes me a about an hour to complete 1 question in a book when I feel like it should just take a fraction of the time.
Can you people give me some pointers on how I can develop proper nested selection and repetition statements?
First of all, you need to understand this:
An if statement defines behavior for when **a decision is made**.
A loop (for, while, do-while) signifies **repetitive (iterative) work being done** (such as counting things).
Once you understand these, the next step, when confronted with a problem, is to try to break that problem up in small, manageable components:
i.e. decisions, that provide you with the possible code paths down the way,
and various work you need to do, much of which will end up being repetitive,
hence the need for one or more loops.
For instance, say we have this simple problem:
Given a positive number N, if N is even, count (display numbers) from
0(zero) to N in steps of 2, if N is odd, count from 0 to N in steps of
1.
First, let's break up this problem.
Step 1: Get the value of N. For this we don't need any decision, simply get it using the preferred method (from file, read console, etc.)
Step 2: Make a decision: is N odd or even?
Step 3: According to the decision made in Step 2, do work (count) - we will iterate from 0 to N, in steps of 1 or 2, depending on N's parity, and display the number at each step.
Now, we code:
//read N
int N;
cin<<N;
//make decision, get the 'step' value
int step=0;
if (N % 2 == 0) step = 2;
else step = 1;
//do the work
for (int i=0; i<=N; i=i+step)
{
cout >> i >> endl;
}
These principles, in my opinion, apply to all programming problems, although of course, usually it is not so simple to discern between concepts.
Actually, the complicated phase is usually the problem break-up. That is where you actually think.
Coding is just translating your thinking so the computer can understand you.
20-30 years ago arithmetic operations like division were one of the most costly operations for CPUs. Saving one division in a piece of repeatedly called code was a significant performance gain. But today CPUs have fast arithmetic operations and since they heavily use instruction pipelining, conditionals can disrupt efficient execution. If I want to optimize code for speed, should I prefer arithmetic operations in favor of conditionals?
Example 1
Suppose we want to implement operations modulo n. What will perform better:
int c = a + b;
result = (c >= n) ? (c - n) : c;
or
result = (a + b) % n;
?
Example 2
Let's say we're converting 24-bit signed numbers to 32-bit. What will perform better:
int32_t x = ...;
result = (x & 0x800000) ? (x | 0xff000000) : x;
or
result = (x << 8) >> 8;
?
All the low hanging fruits are already picked and pickled by authors of compilers and guys who build hardware. If you are the kind of person who needs to ask such question, you are unlikely to be able to optimize anything by hand.
While 20 years ago it was possible for a relatively competent programmer to make some optimizations by dropping down to assembly, nowadays it is the domain of experts, specializing in the target architecture; also, optimization requires not only knowing the program, but knowing the data it will process. Everything comes down to heuristics, tests under different conditions etc.
Simple performance questions no longer have simple answers.
If you want to optimise for speed, you should just tell your compiler to optimise for speed. Modern compilers will generally outperform you in this area.
I've sometimes been surprised trying to relate assembly code back to the original source for this very reason.
Optimise your source code for readability and let the compiler do what it's best at.
I expect that in example #1, the first will perform better. The compiler will probably apply some bit-twiddling trick to avoid a branch. But you're taking advantage of knowledge that it's extremely unlikely that the compiler can deduce: namely that the sum is always in the range [0:2*n-2] so a single subtraction will suffice.
For example #2, the second way is both faster on modern CPUs and simpler to follow. A judicious comment would be appropriate in either version. (I wouldn't be surprised to see the compiler convert the first version into the second.)
I was trying to solve ITA Software's "Word Nubmers" puzzle using a brute force approach. It looks like my Haskell version is more than 10 times slower than a C#/C++ version.
The answer
Thanks to Bryan O'Sullivan's answer, I was able to "correct" my program to acceptable performance. You can read his code which is much cleaner than mine. I am going to outline the key points here.
Int is Int64 on Linux GHC x64. Unless you unsafeCoerce, you should just use Int. This saves you from having to fromIntegral. Doing Int64 on Windows 32-bit GHC is just darn slow, avoid it. (This is in fact not GHC's fault. As mentioned in my blog post below, 64 bit integers in 32-bit programs is slow in general (at least in Windows))
-fllvm or -fvia-C for performance.
Prefer quotRem to divMod, quotRem already suffices. That gave me 20% speed up.
In general, prefer Data.Vector to Data.Array as an "array"
Use the wrapper-worker pattern liberally.
The above points were enough to give me about 100% boost over my original version.
In my blog post, I have detailed a step-by-step illustrated example of how I turned the original program to match Bryan's program. There are other points mentioned there as well.
The original question
(This may sound like a "could you do the work for me" post, but I argue that such a concrete example would be very instructive since profiling Haskell performance is often seen as a myth)
(As noted in the comments, I think I have misinterpreted the problem. But who cares, we can focus on performance in a different problem)
Here's a my version of a quick recap of the problem:
A wordNumber is defined as
wordNumber 1 = "one"
wordNumber 2 = "onetwo"
wordNumber 3 = "onethree"
wordNumber 15 = "onetwothreefourfivesixseveneightnineteneleventwelvethirteenfourteenfifteen"
...
Problem: Find the 51-billion-th letter of (wordNumber Infinity); assume that letter is found at 'wordNumber x', also find 'sum [1..x]'
From an imperative perspective, a naive algorithm would be to have 2 counters, one for sum of numbers and one for sum of lengths. Keep counting the length of each wordNumber and "break" to return the result.
The imperative brute-force approach is implemented in C# here: http://ideone.com/JjCb3. It takes about 1.5 minutes to find the answer on my computer. There is also an C++ implementation that runs in 45 seconds on my computer.
Then I implemented a brute-force Haskell version: http://ideone.com/ngfFq. It cannot finish the calculation in 5 minutes on my machine. (Irony: it's has more lines than the C# version)
Here is the -p profile of the Haskell program: http://hpaste.org/49934
Question: How to make it perform comparatively to the C# version? Are there obvious mistakes I am making?
(Note: I am fully aware that brute-forcing it is not the correct solution to this problem. I am mainly interested in making the Haskell version perform comparatively to the C# version. Right now it is at least 5x slower so obviously I am missing something obvious)
(Note 2: It does not seem to be space leaking. The program runs with constant memory (about 2MB) on my computer)
(Note 3: I am compiling with `ghc -O2 WordNumber.hs)
To make the question more reader friendly, I include the "gist" of the two versions.
// C#
long sumNum = 0;
long sumLen = 0;
long target = 51000000000;
long i = 1;
for (; i < 999999999; i++)
{
// WordiLength(1) = 3 "one"
// WordiLength(101) = 13 "onehundredone"
long newLength = sumLen + WordiLength(i);
if (newLength >= target)
break;
sumNum += i;
sumLen = newLength;
}
Console.WriteLine(Wordify(i)[Convert.ToInt32(target - sumLen - 1)]);
-
-- Haskell
-- This has become totally ugly during my squeeze for
-- performance
-- Tail recursive
-- n-th number (51000000000 in our problem) -> accumulated result -> list of 'zipped' left to try
-- accumulated has the format (sum of numbers, current lengths of the whole chain, the current number)
solve :: Int64 -> (Int64, Int64, Int64) -> [(Int64, Int64)] -> (Int64, Int64, Int64)
solve !n !acc#(!sumNum, !sumLen, !curr) ((!num, !len):xs)
| sumLen' >= n = (sumNum', sumLen, num)
| otherwise = solve n (sumNum', sumLen', num) xs
where
sumNum' = sumNum + num
sumLen' = sumLen + len
-- wordLength 1 = 3 "one"
-- wordLength 101 = 13 "onehundredone"
wordLength :: Int64 -> Int64
-- wordLength = ...
solution :: Int64 -> (Int64, Char)
solution !x =
let (sumNum, sumLen, n) = solve x (0,0,1) (map (\n -> (n, wordLength n)) [1..])
in (sumNum, (wordify n) !! (fromIntegral $ x - sumLen - 1))
I've written a gist that contains both a C++ version (a copy of yours from a Haskell-cafe message, with a bug fixed) and a Haskell translation.
Notice that the two are structurally almost identical. When compiled with -fllvm, the Haskell code runs at about half the speed of the C++ code, which is pretty good.
Now let's compare my Haskell wordLength code to yours. You're passing around an extra unnecessary parameter, which is unnecessary (you apparently figured that out when writing the C++ code that I translated). Also, the large number of bang patterns suggests panic; they're almost all useless.
Your solve function is also very confused.
You're passing parameters in three different ways: a regular Int, a 3-tuple, and a list! Whoa.
This function is necessarily not very regular in its behaviour, so while you gain nothing stylistically by using a list to supply your counter, you probably force GHC to allocate memory. In other words, this both obfuscates the code and makes it slower.
By using a tuple for three parameters (for no obvious reason), you're again working hard to force GHC to allocate memory for every step through the loop, when it could avoid doing so if you passed the parameters directly.
Only your n parameter is dealt with in a sensible way, but you don't need a bang pattern on it.
The only parameter that needs a bang pattern is sumNum, because you never inspect its value until after the loop has finished. GHC's strictness analyser will deal with the others. All of your other bang patterns are unnecessary at best, misdirections at worst.
Here are two pointers I could come up with in a quick investigation:
Note that using Int64 is really slow when you are using a 32 bit build of GHC, as is the default for Haskell Platform, currently. This also turned out to be the main villain in a previous performance problem (there I give a few more details).
For reasons I don't quite understand the divMod function does not seem to get inlined. As a result, the numbers are returned on the heap. When using div and mod separately, wordLength' executes purely on the stack as it should be.
Sadly I currently have no 64-bit GHC around to test whether this is enough to solve the problem.