Related
Today, while practicing some Algorithm questions I found an interesting question.
The question is
You have to divide 1 to n (with one missing value x ) into two equal
halfs such that sum of the two halfs are equal.
Example:
If n = 7 and x = 4
The solution will be {7, 5} and {1, 2, 3, 6}
I can answer it with brute force method but i want an efficient solution
Can any one help me out?
If the sum of the elements 1→N without x is odd then there is no solution.
Otherwise you can find your solution in O(N) with balanced selection.
4 in a row
First let us consider that any sequence of four contiguous numbers can be split in two sets with equal sum given that:
[x, x+1, x+2, x+3] → [x+3, x];[x+2, x+1]
Thus selecting them and placing them in sets A B B A balances sets A and B.
4 across
Moreover, when we have two couples across an omitted value, it can hold a similar property:
[x-2, x-1, x+1, x+2] → [x+2, x-2]; [x+1, x-1]
so still A B B A
At this point we can fix the following cases:
we have a quadruplet: we split it as in case 1
we have 2 numbers, x and other 2 numbers: we split as in case 2
Alright, but it can happen we have 3 numbers, x and other 3 numbers, or other conditions. How can we select in balanced manner anyway?
+2 Gap
If we look again at the gap across x:
[x-1, x+1]
we can notice that somehow if we split the two neighbors in two separate sets we must balance a +2 on the set with bigger sum.
Balancing Tail
We can do this by using the last four numbers of the sequence:
[4 3 2 1] → [4, 2] ; [3, 1] → 6 ; 4
Finally we have to consider that we might not have one of them, so let's build the other case:
[3 2 1] → [2] ; [3, 1] → 2 ; 4
and let us also realize we can do the very same at the other end of the sequence with an A B A B (or B A B A) pattern - if our +2 stands on B (or A);
4 across +
It is amazing that 4 across still holds if we jump h (odd!) numbers:
[x+3, x+2, x-2, x-3] → [x+3, x-3]; [x+2, x-2]
So, exploring the array we can draw the solution step by step
An example:
11 10 9 8 7 6 5 4 3 2 1
the sum it's even, so x can be only an even number:
x = 10
11 - 9 | 8 7 6 5 | 4 3 2 1 → (+2 gap - on A) (4 in a row) (balancing tail)
A B A B B A B A B A
x = 8
11 10 | 9 - 7 | 6 5 | 4 3 2 1 → (4 across +) (+2 gap - on A) (balancing tail)
a b A B | b a | B A B A
x = 6
11 10 9 8 | 7 - 5 | 4 3 2 1 → (4 in a row) (+2 gap - on A) (balancing tail)
A B B A A B A B B B
x = 4 we have no balancing tail - we have to do that with head
11 10 9 8 | 7 6 | 5 - 3 | 2 1 → (balancing head) (4 across +) (+2 gap)
A B A B A B | b a | B A
x = 2
11 10 9 8 | 7 6 5 4 | 3 - 1 → (balancing head) (4 in a row) (+2 gap)
A B A B A B B A B A
It is interesting to notice the symmetry of the solutions. Another example.
10 9 8 7 6 5 4 3 2 1
the sum it's odd, so x can be only an odd number, and the number of elements now is odd.
x = 9
10 - 8 | 7 6 5 4 | 3 2 1 → (+2 gap - on A) (4 in a row) (balancing tail)
A B A B B A B A B
x = 7
10 9 | 8 - 6 | 5 4 | 3 2 1 → (4 across +) (+2 gap - on A) (balancing tail)
a b | A B | b a B A B
x = 5
10 9 8 7 | 6 - 4 | 3 2 1 → (4 in a row) (+2 gap - on A) (balancing tail)
A B B A A B B A B
x = 3
10 9 8 7 | 6 5 | 4 - 2 | 1 → (balancing head) (4 across + virtual 0) (+2 gap)
A B A B B A | a b | A
x = 1
10 9 8 7 | 6 5 4 3 | 2 → (balancing head) (4 in a row) (+2 gap virtual 0)
A B A B A B B A B
Finally it is worth to notice we can switch from A to B whenever we have a full balanced segment (i.e. 4 in a row or 4 across)
Funny said - but the property requesting the sum([1 ... N]-x) to be even makes the cases quite redundant if you try yourself.
I am pretty sure this algorithm can be generalized - I'll probably provide a revised version soon.
This problem can be solved by wrapping the standard subset sum problem of dynamic programming with preprocessing steps. These steps are of O(1) com
Algorithm (n, x):
sum = n * (n+1) / 2
neededSum = sum - x
If (neededSum % 2 != 0): return 0
create array [1..n] and remove x from it
call standard subsetsum(arr, 0, neededSum/2, [])
Working python implementation of subsetsum algorithm - printing all subsets is given below.
def subsetsum(arr, i, sum, ss):
if i >= len(arr):
if sum == 0:
print ss
return 1
else:
return 0
ss1 = ss[:]
count = subsetsum(arr, i + 1, sum, ss1)
ss1.append(arr[i])
count += subsetsum(arr, i + 1, sum - arr[i], ss1)
return count
arr = [1, 2, 3, 10, 5, 7]
sum = 14
a = []
print subsetsum(arr, 0, sum, a)
Hope it helps!
I'm solving the following problem, which in essence is "find the diameter of a connected undirected weighted graph", in Haskell. Now, the solution below produces correct answers, but exceeds the time limit on 9/27 of the tests. I'm far from a Haskell prodigy, can you guys give me a clue whether and how I can improve the performance of my solution without using the builtin Data.Graph module? I tried using accumulator parameters, strict pairs and strict evaluation in some places, but either I used them incorrectly or the performance issue is elsewhere. Thanks in advance!
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.List (maximumBy)
import Data.Ord (comparing)
buildGraph :: [Int] -> Map.Map Int [(Int, Int)] -> Map.Map Int [(Int, Int)]
buildGraph [] acc = acc
buildGraph (from:to:dist:rest) acc = let withTo = Map.insertWith (++) from [(to, dist)] acc
withFromTo = Map.insertWith (++) to [(from, dist)] withTo
in buildGraph rest $ withFromTo
data Queue a = Queue {
ingoing :: [a]
, outgoing :: [a]
} deriving Show
toQueue xs = Queue [] xs
enqMany xs (Queue is os) = (Queue (reverse xs ++ is) os)
deq (Queue is []) = deq (Queue [] $ reverse is)
deq (Queue is (o:os)) = (o, Queue is os)
extract :: (Ord a) => a -> Map.Map a [b] -> [b]
extract k m = case Map.lookup k m of
Just value -> value
Nothing -> error "sdfsd" -- should never happen
bfs node graph = bfs' Set.empty (toQueue [(node, 0)]) []
where
bfs' :: Set.Set Int -> Queue (Int, Int) -> [(Int, Int)] -> [(Int, Int)]
bfs' visited (Queue [] []) acc = acc
bfs' visited que acc = let ((n, dist), rest) = deq que
in if Set.member n visited
then bfs' visited rest acc
else let children = map (\(i, d) -> (i, d + dist)) $ extract n graph
newNodes = enqMany children rest
in bfs' (Set.insert n visited) newNodes ((n, dist):acc)
findMostDistant xs = maximumBy (comparing snd) xs
solve input = answer
where
-- the first number is the number of edges and is not necessary
(_:triples) = map read $ words input
graph = buildGraph triples Map.empty
-- pick arbitary node, find the farther node from it using bfs
(mostDistant, _) = findMostDistant $ bfs (head triples) graph
-- find the farthest node from the previously farthest node, counting the distance on the way
(_, answer) = findMostDistant $ bfs mostDistant graph
tests = [
"11 2 7 2 1 7 6 5 1 8 2 8 6 8 6 9 10 5 5 9 1 9 0 10 15 3 1 21 6 4 3" -- 54
, "5 3 4 3 0 3 4 0 2 6 1 4 9" -- 22
, "16 2 3 92 5 2 10 14 3 42 2 4 26 14 12 50 4 6 93 9 6 24 15 14 9 0 2 95 8 0 90 0 13 60 9 10 59 1 0 66 11 12 7 7 10 35" -- 428
]
runZeroTests = mapM_ print $ map solve tests
main = do
answer <- solve <$> getContents
print answer
deq (Queue [] []) causes an infinite loop, I think.
When I’ve solved contest problems in Haskell, typically the biggest performance hog has been the slow I/O library, which operates on lazy linear linked lists of wide characters. The first thing I always do for a programming contest is replace that with fast I/O,
Here’s a version that makes minimal changes to the program logic and just replaces the I/O with Data.ByteString.Lazy.Char8, implemented with a lazily-evaluated list of strict byte arrays, and Data.ByteString.Builder, which builds a function to fill an output buffer. It should be useful to calculate the speed-up from fast I/O alone.
{-# LANGUAGE OverloadedStrings #-} -- Added
import Data.ByteString.Builder
(Builder, char7, intDec, toLazyByteString) -- Added
import qualified Data.ByteString.Lazy.Char8 as B8 -- Added
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.List (maximumBy)
import Data.Maybe (fromJust) -- Added
import Data.Monoid ((<>)) -- Added
import Data.Ord (comparing)
buildGraph :: [Int] -> Map.Map Int [(Int, Int)] -> Map.Map Int [(Int, Int)]
buildGraph [] acc = acc
buildGraph (from:to:dist:rest) acc = let withTo = Map.insertWith (++) from [(to, dist)] acc
withFromTo = Map.insertWith (++) to [(from, dist)] withTo
in buildGraph rest $ withFromTo
data Queue a = Queue {
ingoing :: [a]
, outgoing :: [a]
} deriving Show
toQueue xs = Queue [] xs
enqMany xs (Queue is os) = (Queue (reverse xs ++ is) os)
deq (Queue is []) = deq (Queue [] $ reverse is)
deq (Queue is (o:os)) = (o, Queue is os)
extract :: (Ord a) => a -> Map.Map a [b] -> [b]
extract k m = case Map.lookup k m of
Just value -> value
Nothing -> error "sdfsd" -- should never happen
bfs node graph = bfs' Set.empty (toQueue [(node, 0)]) []
where
bfs' :: Set.Set Int -> Queue (Int, Int) -> [(Int, Int)] -> [(Int, Int)]
bfs' visited (Queue [] []) acc = acc
bfs' visited que acc = let ((n, dist), rest) = deq que
in if Set.member n visited
then bfs' visited rest acc
else let children = map (\(i, d) -> (i, d + dist)) $ extract n graph
newNodes = enqMany children rest
in bfs' (Set.insert n visited) newNodes ((n, dist):acc)
findMostDistant xs = maximumBy (comparing snd) xs
solve triples = answer -- Changed (by deleting one line)
where
graph = buildGraph triples Map.empty
-- pick arbitary node, find the farther node from it using bfs
(mostDistant, _) = findMostDistant $ bfs (head triples) graph
-- find the farthest node from the previously farthest node, counting the distance on the way
(_, answer) = findMostDistant $ bfs mostDistant graph
tests = [ -- Unchanged, but now interpreted as OverloadedStrings
"11 2 7 2 1 7 6 5 1 8 2 8 6 8 6 9 10 5 5 9 1 9 0 10 15 3 1 21 6 4 3" -- 54
, "5 3 4 3 0 3 4 0 2 6 1 4 9" -- 22
, "16 2 3 92 5 2 10 14 3 42 2 4 26 14 12 50 4 6 93 9 6 24 15 14 9 0 2 95 8 0 90 0 13 60 9 10 59 1 0 66 11 12 7 7 10 35" -- 428
]
runZeroTests = B8.putStr -- Changed
. toLazyByteString
. foldMap format
. map (solve . parse)
$ tests
main :: IO () -- Changed
main = B8.interact ( toLazyByteString . format . solve . parse )
parse :: B8.ByteString -> [Int] -- Added
-- the first number is the number of edges and is not necessary
parse = map (fst . fromJust . B8.readInt) . tail . B8.words
format :: Int -> Builder -- Added
format n = intDec n <> eol where
eol = char7 '\n'
With help from #Davislor with doing IO using ByteString and a few other things I managed to get 100 points on the problem. In the end, what I did to optimize it was:
Using ByteString IO as #Davislor suggested
Since I knew integers in the input were valid, I wrote my own parseInt function that does not perform unnecessary checks.
Instead of lazy Map, I used Array to create an adjacency list. I do not know what the asymptotic complexity of constructing an Array using accumArray is (I believe it should be O(n)), but lookup in the array should be O(1), instead of the O(log n) for the Map.
Here is the final solution:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE BangPatterns #-}
import Data.ByteString.Builder
(Builder, char7, intDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as B8
import qualified Data.Set as Set
import Data.Monoid ((<>))
import Data.Char (ord)
import Data.ByteString (getLine)
import Data.Array (Array, array, accumArray, (!), (//))
buildAdjList :: Int -> [Int] -> Array Int [(Int, Int)]
buildAdjList n xs = accumArray (flip (:)) [] (0, n) $ triples xs []
where
triples [] res = res
triples (x:y:dist:rest) res = let edgeXY = (x, (y, dist))
edgeYX = (y, (x, dist))
in triples rest (edgeXY:edgeYX:res)
data Queue a = Queue {
ingoing :: [a]
, outgoing :: [a]
} deriving Show
enqMany xs (Queue is os) = Queue (reverse xs ++ is) os
deq (Queue [] []) = error "gosho"
deq (Queue is []) = deq (Queue [] $ reverse is)
deq (Queue is (o:os)) = (o, Queue is os)
bfs !node adjList = let start = (node, 0) in bfs' Set.empty (Queue [] [start]) start
where
bfs' :: Set.Set Int -> Queue (Int, Int) -> (Int, Int) -> (Int, Int)
bfs' visited (Queue [] []) !ans = ans
bfs' visited que !ans = let (curr#(n, dist), rest) = deq que
in if Set.member n visited
then bfs' visited rest ans
else let children = map (\(i, d) -> (i, d + dist)) $ adjList ! n
newNodes = enqMany children rest
in bfs' (Set.insert n visited) newNodes (longerEdge curr ans)
longerEdge :: (Int, Int) -> (Int, Int) -> (Int, Int)
longerEdge a b = if (snd a) < (snd b) then b else a
parseInt :: B8.ByteString -> Int
parseInt str = parseInt' str 0 where
parseInt' str !acc
| B8.null str = acc
| otherwise = parseInt' (B8.tail str) $ ((ord $ B8.head str) - 48 + acc * 10)
parseIntList :: B8.ByteString -> [Int]
parseIntList = map parseInt . B8.words
solve :: [Int] -> Int
solve (n:triples) = answer
where
graph = buildAdjList n triples
-- pick arbitary node, find the farther node from it using bfs
(mostDistant, _) = bfs (head triples) graph
-- find the farthest node from the previously farthest node, counting the distance on the way
(_, answer) = bfs mostDistant graph
main :: IO ()
main = B8.interact ( toLazyByteString . intDec . solve . parseIntList )
-- debug code below
tests = [
"11 2 7 2 1 7 6 5 1 8 2 8 6 8 6 9 10 5 5 9 1 9 0 10 15 3 1 21 6 4 3" -- 54
, "5 3 4 3 0 3 4 0 2 6 1 4 9" -- 22
, "16 2 3 92 5 2 10 14 3 42 2 4 26 14 12 50 4 6 93 9 6 24 15 14 9 0 2 95 8 0 90 0 13 60 9 10 59 1 0 66 11 12 7 7 10 35" -- 428
]
runZeroTests = B8.putStr
. toLazyByteString
. foldMap format
. map (solve . parseIntList)
$ tests
format :: Int -> Builder
format n = intDec n <> eol
where eol = char7 '\n'
There could still be room for improvement, the Set for visited nodes could be changed to a bit array, Int32 can be used instead of Int, BangPatterns could be applied, although I feel like I can't really make sense of the execution order of Haskell programs.
insert :: Eq(a) => a -> a -> [a] -> [a]
insert m n [] = []
insert m n (x:xs) | m==x = n : x : insert m n xs
| otherwise = x : insert m n xs
The insert function above is a working function that inserts a value n into a list before all instances of a value m.
I need help writing an interspace function that inserts a value n between all values m and q in a list. This is what I have so far:
interspace :: Eq(a) => a -> a -> a->[a] -> [a]
interspace m n q[] = []
interspace m n q (x:xs)| m==x && q==(head xs) = n: x : insert m n (headxs)++interspace m n q (xs)
| otherwise = x : interspace m n q xs
Since you will only be adding values to the front of the list, your insert function is unnecessary. (:) will suffice instead. Much like in insert, we pass recursively over the list. Since we want to check if two values match at a time and will also call the function recursively on different lists based on whether or not we find a match, it's a good idea to pattern match (x1:x2:xs) rather than just (x:xs).
If m matches x1 and q matches x2, we place the onto the head of the list and call interspace recursively on the rest of the list. If they do not mach, we call interspace on (x2:xs).
interspace :: Eq a => a -> a -> a-> [a] -> [a]
interspace m n q [] = []
interspace m n q [x] = [x]
interspace m n q (x1:x2:xs) | m == x1 && q == x2 = m : n : q : interspace m n q xs
| otherwise = x1 : interspace m n q (x2:xs)
Example usage:
ghci>> interspace 1 2 1 [1,1,2,1,1]
[1,2,1,2,1,2,1]
ghci>> interspace 1 2 1 [1,1,3]
[1,2,1,3]
ghci>> interspace 1 2 1 [1,2,4,2]
[1,2,4,2]
ghci>> interspace 1 2 1 [1]
[1]
ghci>> interspace 1 2 1 []
[]
I'm looking for an efficient polynomial-time solution to the following problem:
Implement a recursive function node x y for calculating the (x,y)-th number in a number triangle defined as
g(x,y) = 0 if |x| > y
= 1 if (x,y) = (0,0)
= sum of all incoming paths otherwise
The sum of all incoming paths to a node is defined as the sum of the values of all possible paths from the root node (x, y) = (0, 0) to the node under consideration, where at each node (x,y) a path can either continue diagonally down and left (x−1,y+1), straight down (x,y+1), or diagonally down and right (x+1,y+1). The value of a path to a node is defined as the sum of all the nodes along that path up to, but not including, the node under consideration.
The first few entries in the number triangle are given in the table:
\ x -3 -2 -1 0 1 2 3
\
y \ _________________________
|
0 | 0 0 0 1 0 0 0
|
1 | 0 0 1 1 1 0 0
|
2 | 0 2 4 6 4 2 0
|
3 | 4 16 40 48 40 16 4
I am trying to work out a naive solution first, here is what I have:
node x y | y < 0 = error "number cannot be negative"
| (abs x) > y = 0
| (x == 0) && (y == 0) = 1
| otherwise = node (x+1) (y-1) + node x (y-1) + node (x-1) (y-1)
Whenever I run this I get:
"* Exception: stack overflow"?
I believe your problem is a bit more complicated than your example code suggests. First, let's be clear about some definitions here:
Let pathCount x y be the number of paths that end at (x, y). We have
pathCount :: Int -> Int -> Integer
pathCount x y
| y == 0 = if x == 0 then 1 else 0
| otherwise = sum [ pathCount (x + d) (y - 1) | d <- [-1..1]]
Now let's pathSum x y be the sum of all paths that end in (x, y). We have:
pathSum :: Int -> Int -> Integer
pathSum x y
| y == 0 = if x == 0 then 1 else 0
| otherwise = sum [ pathSum (x + d) (y - 1) + node x y * pathCount (x + d) (y - 1)
| d <- [-1..1] ]
With this helper, we can finally define node x y properly:
node :: Int -> Int -> Integer
node x y
| y == 0 = if x == 0 then 1 else 0
| otherwise = sum [ pathSum (x + d) (y - 1) | d <- [-1..1]]
This algorithm as such is exponential time in its current form. We can however add memoization to make the number of additions quadratic. The memoize package on Hackage makes this easy as pie. Full example:
import Control.Monad
import Data.List (intercalate)
import Data.Function.Memoize (memoize2)
node' :: Int -> Int -> Integer
node' x y
| y == 0 = if x == 0 then 1 else 0
| otherwise = sum [ pathSum (x + d) (y - 1) | d <- [-1..1]]
node = memoize2 node'
pathCount' :: Int -> Int -> Integer
pathCount' x y
| y == 0 = if x == 0 then 1 else 0
| otherwise = sum [ pathCount (x + d) (y - 1) | d <- [-1..1]]
pathCount = memoize2 pathCount'
pathSum' :: Int -> Int -> Integer
pathSum' x y
| y == 0 = if x == 0 then 1 else 0
| otherwise = sum [ pathSum (x + d) (y - 1) + node x y * pathCount (x + d) (y - 1)
| d <- [-1..1] ]
pathSum = memoize2 pathSum'
main =
forM_ [0..n] $ \y ->
putStrLn $ intercalate " " $ map (show . flip node y) [-n..n]
where n = 5
Output:
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 1 1 1 0 0 0 0
0 0 0 2 4 6 4 2 0 0 0
0 0 4 16 40 48 40 16 4 0 0
0 8 72 352 728 944 728 352 72 8 0
16 376 4248 16608 35128 43632 35128 16608 4248 376 16
As you can see the algorithm the size of the numbers will get out of hands rather quickly. So the runtime is not O(n^2), while the number of arithmetic operations is.
You're thinking in terms of outgoing paths, when you should be thinking in terms of incoming paths. Your recursive step is currently looking for nodes from below, instead of above.
First of all, sorry if this is long. I wanted to explain the step by step thought process.
To start off with, you need one crucial fact: You can represent the "answer" at each "index" by a list of paths. For all the zeros, this is [[]], for your base case it is [[1]], and for example, for 0,2 it is [[6,1,1],[6,1,1],[6,1,1]]. This may seem like some redundancy, but it simplifies things down the road. Then, extracting the answer is head . head if the list is non empty, or const 0 if it is.
This is very useful because you can store the answer as a list of rows (the first row would be '[[1]], [], [] ...) and the results of any given row depend only on the previous row.
Secondly, this problem is symmetrical. This is pretty obvious.
The first thing we will do will mirror the definition of fib very closely:
type Path = [[Integer]]
triangle' :: [[Path]]
triangle' = ([[1]] : repeat []) : map f triangle'
We know this must be close to correct, since the 2nd row will depend on the first row only, the third on the 2nd only, etc. So the result will be
([[1]] : repeat []) : f ([[1]] : repeat []) : f ....
Now we just need to know what f is. Firstly, its type: [Path] -> [Path]. Quite simply, given the previous row, return the next row.
Now you may see another problem arising. Each invocation of f needs to know how many columns in the current row. We could actually count the length of non-null elements in the previous row, but it is simpler to pass the parameter directly, so we change map f triangle' to zipWith f [1..] triangle', giving f the type Int -> [Path] -> [Path].
f needs to handle one special case and one general case. The special case is x=0, in this case we simply treat the x+1,y-1 and x-1,y-1 recursions the same, and otherwise is identical to gn. Lets make two functions, g0 and gn which handle these two cases.
The actually computation of gn is easy. We know for some x we need the elements x-1, x, x+1 of the previous row. So if we drop x-1 elements before giving the previous row to the xth invocation of gn, gn can just take the first 3 elements and it will have what it needs. We write this as follows:
f :: Int -> [Path] -> [Path]
f n ps = g0 ps : map (gn . flip drop ps) [0..n-1] ++ repeat []
The repeat [] at the end should be obvious: for indices outside the triangle, the result is 0.
Now writing g0 and gs is really quite simple:
g0 :: [Path] -> Path
g0 (a:b:_) = map (s:) q
where
s = sum . concat $ q
q = b ++ a ++ b
gn :: [Path] -> Path
gn (a:b:c:_) = map (s:) q
where
s = sum . concat $ q
q = a ++ b ++ c
On my machine this version is about 3-4 times faster than the fastest version I could write with normal recursion and memoization.
The rest is just printing or pulling out the number you want.
triangle :: Int -> Int -> Integer
triangle x y = case (triangle' !! y) !! (abs x) of
[] -> 0
xs -> head $ head xs
triList :: Int -> Int -> Path
triList x y = (triangle' !! y) !! (abs x)
printTri :: Int -> Int -> IO ()
printTri width height =
putStrLn $ unlines $ map unwords
[[ p $ triangle x y | x <- [-x0..x0]] | y <- [0..height]]
where maxLen = length $ show $ triangle 0 height
x0 = width `div` 2
p = printf $ "%" ++ show maxLen ++ "d "
Suppose that you were given a list of input/ouput pairs:
f 0 = 0
f 1 = 2
f 2 = 1
f 3 = -1
f 4 = 0
f 5 = 0
f 6 = -76
f 7 = -3
f 8 = 3
f 9 = -1
f 10 = -1
f 11 = -6
f 12 = -1
f 13 = -1
f 14 = 4
f 15 = -2
f 16 = -10
f 17 = 0
f 18 = 0
f 19 = -1
f 20 = 2
f 21 = 3
f 22 = 0
f 23 = 4
f 24 = 2
f 25 = -1
f 26 = 0
f 27 = 0
f 28 = -4
f 29 = -2
f 30 = -14
Now suppose you were asked to find the definition of f using a proper, small mathematical formula instead of an enumeration of values. That is, the answer should be f x = floor(tan(x*x-3)) (or similar), because that is a small formula that is correct for every input. How would you do it?
So let's simplify. You want a function such that
f 1 = 10
f 2 = 3
f 3 = 8
There exists a formula for immediately finding a polynomial function which meets these demands. In particular
f x = 6 * x * x - 25 * x + 29
works. It turns out to be the case that if you have the graph of any function
{ (x_1, y_1), (x_2, y_2), ..., (x_i, y_i) }
you can immediately build a polynomial which exactly matches those inputs and outputs.
So, given that polynomials like this exist you're never going to solve your problem (finding a particular solution like floor(tan(x*x-3))) without enforcing more constraints. In particular, if you don't somehow outlaw or penalize polynomials then I'm always going to deliver them to you.
In general, what you'd like to do is (a) define a search space and (b) define a metric of fitness, also known as a loss function. If your search space is finite then you have yourself a solution immediately: rank every element of your search space according to your loss function and select randomly from the set of solutions which tie for best.
What it sounds like you're asking for is much harder though—if you're looking through the space of all possible programs then that space is unbelievably large. Searching it exhaustively is impossible unless we constrain ourselves heavily or accept approximation. Secondly, we must have very good understanding of your loss function and how it interacts with the search space as we'll want to make intelligent guesses to move forward through this vast space.
You mention genetic algorithms—they're often lauded for this kind of work and indeed they can be a method of driving search through a large space with an uncertain loss function, but they also fail as often as they succeed. Someone who is genuinely skilled at using genetic algorithms to solve problems will spend all of their time crafting the search space and the loss function to direct the algorithm toward meaningful answers.
Now this can be done for general programs if you're careful. In fact, this was the subject of last year's ICFP programming contest. In particular, search on this page for "Rules of the ICFP Contest 2013" to see the set up.
I think feed forward neural network (FFNN) and genetic programming (GP) are good techniques for complicated function simulation.
if you need function as polynomials use the GP otherwise FFNN is very simple and the matlab have a library for it.
I think the "interpolation" don't get what I am asking. Maybe I was not clear enough, but fortunately I've managed to get a semi-satisfactory answer to my question using a brute-force search algorithm myself. Using only a list of input/output pairs, as presented in the question, I was able to recover the original function. The comments on this snippet should explain it:
import Control.Monad.Omega
{- First we define a simple evaluator for mathematical expressions -}
data A = Add A A | Mul A A | Div A A | Sub A A | Pow A A |
Sqrt A | Tan A | Sin A | Cos A |
Num Float | X deriving (Show)
eval :: A -> Float -> Float
eval (Add a b) x = eval a x + eval b x
eval (Mul a b) x = eval a x * eval b x
eval (Div a b) x = eval a x / eval b x
eval (Sub a b) x = eval a x - eval b x
eval (Pow a b) x = eval a x ** eval b x
eval (Sqrt a) x = sqrt (eval a x)
eval (Tan a) x = tan (eval a x)
eval (Sin a) x = sin (eval a x)
eval (Cos a) x = cos (eval a x)
eval (Num a) x = a
eval X x = x
{- Now we enumerate all possible terms of that grammar -}
allTerms = do
which <- each [1..15]
if which == 1 then return X
else if which == 2 then do { x <- allTerms; y <- allTerms; return (Add x y) }
else if which == 3 then do { x <- allTerms; y <- allTerms; return (Mul x y) }
else if which == 4 then do { x <- allTerms; y <- allTerms; return (Div x y) }
else if which == 5 then do { x <- allTerms; y <- allTerms; return (Sub x y) }
else if which == 6 then do { x <- allTerms; y <- allTerms; return (Pow x y) }
else if which == 7 then do { x <- allTerms; y <- allTerms; return (Sqrt x) }
else if which == 8 then do { x <- allTerms; y <- allTerms; return (Tan x) }
else if which == 9 then do { x <- allTerms; y <- allTerms; return (Sin x) }
else if which == 10 then do { x <- allTerms; y <- allTerms; return (Cos x) }
else return (Num (which-10))
{- Then we create 20 input/output pairs of a random function -}
fun x = x+tan(x*x)
maps = let n=20 in zip [1..n] (map fun [1..n])
{- This tests a function in our language against a map of in/out pairs -}
check maps f = all test maps where
test (a,b) = (eval f a) == b
{- Naw lets see if a brute-force search can recover the original program
from the list of input/output pairs alone! -}
main = print $ take 1 $ filter (check maps) (runOmega allTerms)
{- Ouput: [Add X (Tan (Mul X X))]
Yay! As much as there are infinite possible solutions,
the first solution is actually our initial program.
-}
One possible definition goes like this:
f 0 = 0
f 1 = 2
f 2 = 1
f 3 = -1
f 4 = 0
f 5 = 0
f 6 = -76
f 7 = -3
f 8 = 3
f 9 = -1
f 10 = -1
f 11 = -6
f 12 = -1
f 13 = -1
f 14 = 4
f 15 = -2
f 16 = -10
f 17 = 0
f 18 = 0
f 19 = -1
f 20 = 2
f 21 = 3
f 22 = 0
f 23 = 4
f 24 = 2
f 25 = -1
f 26 = 0
f 27 = 0
f 28 = -4
f 29 = -2
f 30 = -14