Binary to ternary representation conversion - algorithm

Does anybody know (or can point to a source describing) a method or algorithm to convert a number represented in the binary numeral system into ternary (my particular case), or a universal algorithm for such conversions?
The solution I've already implemented is to convert a number to decimal first and then convert it into required numeral system. This works, but there are two steps. I wonder if it could be done in one step easily without implementing ternary arithmetic first? Is there some trick, guys?
UPD: It seems I didn't manage to describe clearly which way of conversion I'm looking for. I'm not asking for some way to convert base-2 to base-3, I do know how to do this. You may consider that I have algebraic data structures for ternary and binary numbers, in Haskell it looks like this:
data BDigit = B0 | B1
type BNumber = [BDigit]
data TDigit = T0 | T1 | T2
type TNumber = [TDigit]
And there are two obvious ways to convert one to another: the first is to convert it into Integer first and get the result (not an interesting way); the second is to implement my own multiplication and addition in base 3 and compute the result by multiplying the digit values by the respective powers of two (straightforward and heavy).
So I'm wondering if there's another method than these two.

If you are doing it with a computer things are already in binary, so just repeatedly dividing by 3 and taking remainders is about as easy as things get.
If you are doing it by hand, long division in binary works just like long division in decimal.
Just divide by three and take remainders. If we start with 16 (10000 in binary):
       101
11 | 10000
      11
       100
        11
         1
10000 / 11 = 101 + 1/11, so the least significant digit is 1
101 / 11 = 1 + 10/11, so the next digit is 2
1 / 11 = 0 + 1/11, so the most significant digit is 1
So 16 in ternary is 121.
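The long division above can also be carried out directly on a list of bits, without loading the number into a machine integer. This is a minimal sketch, assuming bits are plain Ints stored most significant first; the names divMod3 and toTernary are illustrative and not part of the answer.

```haskell
-- Divide a binary number (bits as 0/1 Ints, most significant first) by 3.
-- Returns the quotient bits (same length, possibly with leading zeros)
-- and the remainder (0, 1 or 2).
divMod3 :: [Int] -> ([Int], Int)
divMod3 = go 0
  where
    go r []     = ([], r)
    go r (b:bs) =
      let cur      = 2 * r + b        -- bring down the next bit
          (q, r')  = cur `divMod` 3   -- cur is at most 5, so q is 0 or 1
          (qs, rf) = go r' bs
      in (q : qs, rf)

-- Repeatedly divide by 3. Each remainder is the next ternary digit, starting
-- from the least significant one, so it is consed onto the front of the
-- accumulator to give a most-significant-first result.
toTernary :: [Int] -> [Int]
toTernary bits = case go bits [] of
                   [] -> [0]
                   ds -> ds
  where
    go bs acc
      | all (== 0) bs = acc
      | otherwise     = let (q, r) = divMod3 bs in go q (r : acc)
```

For the worked example, toTernary [1,0,0,0,0] goes through the same three divisions as the hand calculation.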

You can use some clever abbreviations for converting. The following code goes in the "wrong" direction: it is a conversion from ternary to binary based on the fact that 3^2 = 2^3 + 1, using only binary addition. Basically, I'm converting two ternary digits into three binary digits. Going from binary to ternary would be slightly more complicated, as ternary addition (and probably subtraction) would be required (working on that). I'm assuming the least significant digit is at the head of the list (which is the only way that makes sense), so you have to read the numbers "backwards".
addB :: BNumber -> BNumber -> BNumber
addB a [] = a
addB [] b = b
addB (B0:as) (B0:bs) = B0 : (addB as bs)
addB (B0:as) (B1:bs) = B1 : (addB as bs)
addB (B1:as) (B0:bs) = B1 : (addB as bs)
addB (B1:as) (B1:bs) = B0 : (addB (addB as bs) [B1])
t2b :: TNumber -> BNumber
t2b [] = []
t2b [T0] = [B0]
t2b [T1] = [B1]
t2b [T2] = [B0,B1]
t2b (T2:T2:ts) = let bs = t2b ts in addB bs (B0:B0:B0:(addB bs [B1]))
t2b (t0:t1:ts) =
  let bs = t2b ts
      (b0,b1,b2) = conv t0 t1
  in addB bs (b0:b1:b2:bs)
  where conv T0 T0 = (B0,B0,B0)
        conv T1 T0 = (B1,B0,B0)
        conv T2 T0 = (B0,B1,B0)
        conv T0 T1 = (B1,B1,B0)
        conv T1 T1 = (B0,B0,B1)
        conv T2 T1 = (B1,B0,B1)
        conv T0 T2 = (B0,B1,B1)
        conv T1 T2 = (B1,B1,B1)
[Edit] Here is the binary to ternary direction, as expected a little bit more lengthy:
addT :: TNumber -> TNumber -> TNumber
addT a [] = a
addT [] b = b
addT (T0:as) (T0:bs) = T0 : (addT as bs)
addT (T1:as) (T0:bs) = T1 : (addT as bs)
addT (T2:as) (T0:bs) = T2 : (addT as bs)
addT (T0:as) (T1:bs) = T1 : (addT as bs)
addT (T1:as) (T1:bs) = T2 : (addT as bs)
addT (T2:as) (T1:bs) = T0 : (addT (addT as bs) [T1])
addT (T0:as) (T2:bs) = T2 : (addT as bs)
addT (T1:as) (T2:bs) = T0 : (addT (addT as bs) [T1])
addT (T2:as) (T2:bs) = T1 : (addT (addT as bs) [T1])
subT :: TNumber -> TNumber -> TNumber
subT a [] = a
subT [] b = error "negative numbers not supported"
subT (T0:as) (T0:bs) = T0 : (subT as bs)
subT (T1:as) (T0:bs) = T1 : (subT as bs)
subT (T2:as) (T0:bs) = T2 : (subT as bs)
subT (T0:as) (T1:bs) = T2 : (subT as (addT bs [T1]))
subT (T1:as) (T1:bs) = T0 : (subT as bs)
subT (T2:as) (T1:bs) = T1 : (subT as bs)
subT (T0:as) (T2:bs) = T1 : (subT as (addT bs [T1]))
subT (T1:as) (T2:bs) = T2 : (subT as (addT bs [T1]))
subT (T2:as) (T2:bs) = T0 : (subT as bs)
b2t :: BNumber -> TNumber
b2t [] = []
b2t [B0] = [T0]
b2t [B1] = [T1]
b2t [B0,B1] = [T2]
b2t [B1,B1] = [T0,T1]
b2t (b0:b1:b2:bs) =
  let ts = b2t bs
      (t0,t1) = conv b0 b1 b2
  in subT (t0:t1:ts) ts
  where conv B0 B0 B0 = (T0,T0)
        conv B1 B0 B0 = (T1,T0)
        conv B0 B1 B0 = (T2,T0)
        conv B1 B1 B0 = (T0,T1)
        conv B0 B0 B1 = (T1,T1)
        conv B1 B0 B1 = (T2,T1)
        conv B0 B1 B1 = (T0,T2)
        conv B1 B1 B1 = (T1,T2)
[Edit2] A slightly improved version of subT which doesn't need addT:
-- compares digits with ==, so TDigit must derive Eq
subT :: TNumber -> TNumber -> TNumber
subT a [] = a
subT [] b = error "negative numbers not supported"
subT (a:as) (b:bs)
  | b == T0 = a : (subT as bs)
  | a == b = T0 : (subT as bs)
  | a == T2 && b == T1 = T1 : (subT as bs)
  | otherwise = let td = if a == T0 && b == T2 then T1 else T2
                in td : (subT as $ addTDigit bs T1)
  where addTDigit [] d = [d]
        addTDigit ts T0 = ts
        addTDigit (T0:ts) d = d:ts
        addTDigit (T1:ts) T1 = T2:ts
        addTDigit (t:ts) d = let td = if t == T2 && d == T2 then T1 else T0
                             in td : (addTDigit ts T1)

I think that everybody is missing something important. First, compute a table in advance: for each binary bit position, we need the ternary representation of the corresponding power of two. In MATLAB, I'd build it like this, although every other step after that will be done purely by hand, since the computation is so easy.
dec2base(2.^(0:10),3)
ans =
0000001
0000002
0000011
0000022
0000121
0001012
0002101
0011202
0100111
0200222
1101221
Now, consider the binary number 011000101 (which happens to be the decimal number 197, as we will find out later.) Extract the ternary representation for each binary bit from the table. I'll write out the corresponding rows.
0000001
0000011
0002101
0011202
Now just sum. We get this representation, in uncarried ternary.
0013315
Yes, those are not ternary numbers, but they are almost in a valid base 3 representation. Now all you need to do is to do the carries. Start with the units digit.
5 is larger than 2, so subtract off the number of multiples of 3, and increment the second digit of the result as appropriate.
0013322
The second digit is now a 2, a legal ternary digit, so go on to the third digit. Do that carry too,
0014022
Finally yielding the now completely valid ternary number...
0021022
Were my computations correct? I'll let MATLAB make the final judgement for us:
base2dec('011000101',2)
ans =
197
base2dec('0021022',3)
ans =
197
Have I pointed out just how trivial this operation was, that I could do the conversion entirely by hand, going essentially directly from binary to ternary, at least once I had that initial table written down and stored?
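For readers who prefer Haskell to MATLAB, the same table-then-carry procedure might look roughly like the sketch below. It assumes digits are plain Ints stored least significant first, and all names (powersOfTwo, addDigits, normalize, binaryToTernary) are made up for illustration. The table is built by repeated doubling rather than by dec2base.

```haskell
-- Ternary digits are stored least significant first.

-- Ternary expansions of 1, 2, 4, 8, ...: each row is the previous one doubled.
powersOfTwo :: [[Int]]
powersOfTwo = iterate (normalize . map (* 2)) [1]

-- Add two digit lists position by position, without carrying.
addDigits :: [Int] -> [Int] -> [Int]
addDigits (x:xs) (y:ys) = x + y : addDigits xs ys
addDigits xs     []     = xs
addDigits []     ys     = ys

-- Propagate carries so that every digit ends up in the range 0..2.
normalize :: [Int] -> [Int]
normalize = go 0
  where
    go 0 []     = []
    go c []     = go 0 [c]
    go c (d:ds) = let (c', d') = (c + d) `divMod` 3 in d' : go c' ds

-- Bits are given least significant first. Pick the table rows for the set
-- bits, sum them digit-wise ("uncarried ternary"), then do the carries.
binaryToTernary :: [Int] -> [Int]
binaryToTernary bits =
  normalize (foldr addDigits [] [row | (1, row) <- zip bits powersOfTwo])
```

With the bits of 011000101 reversed, binaryToTernary [1,0,1,0,0,0,1,1,0] passes through the uncarried sum [5,1,3,3,1] and yields [2,2,0,1,2], which is 21022 read backwards.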

I'm afraid I don't know enough Haskell to be able to express this in code but I wonder if using Horner's rule for evaluating polynomials might yield a method.
For example ax^2 + bx + c can be evaluated as c+x*(b+x*a).
To convert, say,
the ternary number a*9+b*3+c to binary, one starts with the binary representation of a, then multiplies that by 3 (i.e shift and add), then adds the binary representation of b, multiplies the result by 3 and adds c.
It seems to me this should be doable with a map (to get the binary representation of the ternary digits) and a fold (of a,b -> a+3*b)
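Here is a rough Haskell sketch of the Horner idea, applied in the direction the question asks for (binary to ternary) rather than the ternary-to-binary direction described above. Each step then computes 2*n + b in ternary instead of 3*n + digit in binary. Digits are plain Ints and the names doubleAndAdd and binaryToTernaryHorner are assumptions for illustration only.

```haskell
import Data.List (foldl')

-- One Horner step in ternary: given the ternary digits of n (least
-- significant first) and a bit b, produce the digits of 2*n + b.
doubleAndAdd :: [Int] -> Int -> [Int]
doubleAndAdd ds b = go b ds
  where
    go 0 []       = []
    go c []       = [c]     -- a leftover carry becomes a new top digit
    go c (d:rest) = let (c', d') = (2 * d + c) `divMod` 3
                    in d' : go c' rest

-- Bits are given most significant first, the order Horner's rule consumes
-- them in; the result is least significant digit first.
binaryToTernaryHorner :: [Int] -> [Int]
binaryToTernaryHorner = foldl' doubleAndAdd []
```

The only ternary arithmetic needed is this single "double and add a bit" pass, which is lighter than general ternary multiplication and addition.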

In case this is homework, pseudocode to write x in base b backwards:
while (x != 0) {
q <-- x/b
r <-- x - q*b
print r
x <-- q
}
I'm sure you can figure out how to write the result forwards instead of backwards. Note that / needs to be C-style integer division (the result is an integer, truncated toward zero).
Note that this doesn't depend at all on the base that the arithmetic is performed in. Arithmetic is defined on integers, not the representation of integers in a specific base.
Edit: Based on your updated question, I would slam the digit representation into an integer (via ors and shifts) and use the algorithm described above with integer arithmetic.
Certainly you could do it as you describe, but it seems like an awful lot of work.
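Since the question uses Haskell, the pseudocode loop above could be written as an unfold. This is only a sketch for non-negative x and a base b >= 2; the function names are illustrative.

```haskell
import Data.List (unfoldr)

-- Digits of a non-negative x in base b, least significant first
-- (the "backwards" order that the pseudocode prints).
toBaseBackwards :: Integer -> Integer -> [Integer]
toBaseBackwards b = unfoldr step
  where
    step 0 = Nothing
    step x = let (q, r) = x `quotRem` b in Just (r, q)

-- The same digits, most significant first.
toBase :: Integer -> Integer -> [Integer]
toBase b = reverse . toBaseBackwards b
```

As in the pseudocode, zero produces no digits at all.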

I don't think there's a super-efficient way.
"The solution I've already implemented
is to convert a number to decimal
first."
I assume that you are actually converting to some built-in integer type first. I don't think that built-in integer has anything to do with base 10. (Though, when you print it, there will be a base 10 conversion).
Maybe you'd expect there to be some algorithm which looks at the input one digit at a time and produces the output.
But, say you want to convert 3486784400 (base 10) to base 3. You'll need to examine every digit before producing output, because
3486784401 (base 10) = 100000000000000000000 (base 3)
3486784400 (base 10) = 22222222222222222222 (base 3)
..also
"compute the result multiplying digit
values to respective power of two"
explicitly computing a power isn't necessary, see convert from base 60 to base 10

I think there might be some different "views" of the problem, though I'm not sure any of them are faster or better. For example, the low-order base 3 digit of n is just n mod 3. Let's say you already have the binary representation of n. Then consider how the powers of 2 work out mod 3: 2^0 = 1 mod 3, 2^1 = 2 mod 3, 2^2 = 1 mod 3, 2^3 = 2 mod 3, ... In other words, the powers alternate between being 1 mod 3 and being 2 mod 3. You now have an easy way to get the low-order base 3 digit: scan the binary representation of n, add either 1 or 2 at each bit position where a 1 occurs, and reduce the total mod 3.
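As a small illustration of that observation, assuming bits are 0/1 Ints stored least significant first (the function name is made up):

```haskell
-- Lowest ternary digit of n, computed from its bits (least significant
-- first) using the fact that 2^k mod 3 cycles through 1, 2, 1, 2, ...
lowTernaryDigit :: [Int] -> Int
lowTernaryDigit bits = sum (zipWith (*) bits (cycle [1, 2])) `mod` 3
```

For 197 (bits 1,0,1,0,0,0,1,1 from the low end) the weighted sum is 1 + 1 + 1 + 2 = 5, and 5 mod 3 = 2, which is the last digit of 21022.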

No, you can't convert a base2 number to a base3 number without loading it into an integer. The reason is that 2 and 3 are coprime - they have no common factors.
If you were working with base2 and base4, or even base6 and base9, then the set of integers up to the lowest common multiple of the two bases would be represented by two isomorphic sets. For example 13 (base4) = 0111 (base2), so converting 1313 (base4) = 01110111 (base2) - it's a find and replace operation.
At least the solution that you have works and is relatively simple. If you need to improve performance, then convert the entire base2 representation to an integer before starting the base3 conversion; it means fewer modulus operations. The alternative would be to process each character in the base2 number one by one, in which case you'll be dividing by all the powers of 3 for each digit in the base2 representation.
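The find-and-replace conversion between base 4 and base 2 mentioned above is short enough to show in full. This sketch works on digit strings, and the name base4ToBase2 is just for illustration.

```haskell
-- Convert a base-4 digit string to base 2 by replacing each digit with
-- its fixed two-bit pattern; no arithmetic on the whole number is needed.
base4ToBase2 :: String -> String
base4ToBase2 = concatMap expand
  where
    expand '0' = "00"
    expand '1' = "01"
    expand '2' = "10"
    expand '3' = "11"
    expand c   = error ("not a base-4 digit: " ++ [c])
```

Here base4ToBase2 "1313" gives "01110111", as in the example. No such per-digit table exists between base 2 and base 3, which is the point being made.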

If you use binary-coded-ternary (one pair of bits per trit) you can convert using parallel arithmetic. See this tutorial.


Is this LL(1) parse table correct?

Given grammar:
S -> AB
A -> aA | b
B -> CA
C -> cC | ɛ
Is this its LL(1) parsing table?
No, it is not entirely correct because of these calculations:
First(S) = First(A) = {a,b}
First(A) = {a,b}
First(B) = First(C) = {c,ε}
First(C) = {c,ε}
Considering that the Follow of each non-terminal symbol is the terminal symbol right after:
Follow(S) ={a,b} (if SAB --> AB then SaAB --> aAB or SbB --> bB)
Follow(A) = {a,c} (if AaA-->aA and Ab --> b then AaA --> aA or Ab --> b)
Follow(B) = Follow (A) = {a,c} (model production A --> aB, which a terminal, and a = ε, then Follow (A) = Follow (B))
Follow(C) = {a,b} (from B-->CA, B-->CaA or B-->Cb)
So the difference between your parse table and these calculations is that, in the row for non-terminal B, the entries in columns a and b are NULL.
Yes it is correct.
First(S) = First(A) = {a,b}
First(A) = {a,b}
First(B) = {a,b,c}
B->CA and C->cC|ɛ
First(C) = {c,ε}
so if we put ɛ as a replacement of C in B -> CA, we'll have B -> A, Thus First(B)= First(A) instead of ɛ.
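The FIRST sets listed just above, including First(B) = {a,b,c}, can be checked mechanically with the usual fixed-point computation. The sketch below is a generic textbook-style implementation, not something taken from either answer; the Grammar representation and all names are assumptions.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Symbol  = String
type Grammar = M.Map Symbol [[Symbol]]   -- non-terminal -> alternatives

-- The grammar from the question; [] is the epsilon production.
grammar :: Grammar
grammar = M.fromList
  [ ("S", [["A", "B"]])
  , ("A", [["a", "A"], ["b"]])
  , ("B", [["C", "A"]])
  , ("C", [["c", "C"], []])
  ]

-- Marker used for epsilon inside FIRST sets.
eps :: Symbol
eps = "ε"

-- FIRST of a sequence of symbols under the current approximation fs.
firstOfSeq :: Grammar -> M.Map Symbol (S.Set Symbol) -> [Symbol] -> S.Set Symbol
firstOfSeq _ _ [] = S.singleton eps
firstOfSeq g fs (x:xs)
  | x `M.notMember` g = S.singleton x    -- x is a terminal
  | eps `S.member` fx = S.union (S.delete eps fx) (firstOfSeq g fs xs)
  | otherwise         = fx
  where fx = M.findWithDefault S.empty x fs

-- Start from empty sets and recompute until nothing changes.
firstSets :: Grammar -> M.Map Symbol (S.Set Symbol)
firstSets g = go (M.map (const S.empty) g)
  where
    go fs = let fs' = M.map (S.unions . map (firstOfSeq g fs)) g
            in if fs' == fs then fs else go fs'
```

Because C can derive ε, the computation for B -> CA continues past C into First(A), which is where a and b come from.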

Haskell Optimizations for List Processing stymied by Lazy Evaluation

I'm trying to improve the efficiency of the following code. I want to count all occurrences of a symbol before a given point (as part of pattern-matching using a Burrows-Wheeler transform). There's some overlap in how I'm counting symbols. However, when I have tried to implement what looks like it should be more efficient code, it turns out to be less efficient, and I'm assuming that lazy evaluation and my poor understanding of it is to blame.
My first attempt at a counting function went like this:
count :: Ord a => [a] -> a -> Int -> Int
count list sym pos = length . filter (== sym) . take pos $ list
Then in the body of the matching function itself:
matching str refCol pattern = match 0 (n - 1) (reverse pattern)
where n = length str
refFstOcc sym = length $ takeWhile (/= sym) refCol
match top bottom [] = bottom - top + 1
match top bottom (sym : syms) =
let topCt = count str sym top
bottomCt = count str sym (bottom + 1)
middleCt = bottomCt - topCt
refCt = refFstOcc sym
in if middleCt > 0
then match (refCt + topCt) (refCt + bottomCt - 1) syms
else 0
(Stripped down for brevity - I'm memoizing first occurrences of symbols in refCol through a Map, and a couple other details as well).
Edit: Sample use would be:
matching "AT$TCTAGT" "$AACGTTTT" "TCG"
which should be 1 (assuming I didn't mistype anything).
Now, I'm recounting everything in the middle between the top pointer and the bottom twice, which adds up when I count a million character DNA string with only 4 possible choices for characters (and profiling tells me that this is the big bottleneck, too, taking 48% of my time for bottomCt and around 38% of my time for topCt). For reference, when calculating this for a million character string and trying to match 50 patterns (each of which is between 1 and 1000 characters), the program takes about 8.5 to 9.5 seconds to run.
However, if I try to implement the following function:
countBetween :: Ord a => [a] -> a -> Int -> Int -> (Int, Int)
countBetween list sym top bottom =
let (topList, bottomList) = splitAt top list
midList = take (bottom - top) bottomList
getSyms = length . filter (== sym)
in (getSyms topList, getSyms midList)
(with changes made to the matching function to compensate), the program takes between 18 and 22 seconds to run.
I've also tried passing in a Map which can keep track of previous calls, but that also takes about 20 seconds to run and runs up the memory usage.
Similarly, I've shortened length . filter (== sym) to a fold, but again: 20 seconds for foldr, and 14-15 for foldl.
So what would be a proper Haskell way to optimize this code through rewriting it? (Specifically, I'm looking for something that doesn't involve precomputation - I may not be reusing strings very much - and which explains something of why this is happening).
Edit: More clearly, what I am looking for is the following:
a) Why does this behaviour happen in Haskell? How does lazy evaluation play a role, what optimizations is the compiler making to rewrite the count and countBetween functions, and what other factors may be involved?
b) What is a simple code rewrite which would address this issue so that I don't traverse the lists multiple times? I'm looking specifically for something which addresses that issue, rather than a solution which sidesteps it. If the final answer is, count is the most efficient possible way to write the code, why is that?
I'm not sure lazy evaluation has much to do with the performance of the code. I think the main problem is the use of String, which is a linked list, instead of a more performant string type.
Note that this call in your countBetween function:
let (topList, bottomList) = splitAt top list
will re-create the linked list corresponding to topList, meaning
a lot more allocations.
A Criterion benchmark to compare splitAt versus using take n/drop n
may be found here: http://lpaste.net/174526. The splitAt version is
about 3 times slower and, of course, has a lot more allocations.
Even if you don't want to "pre-compute" the counts you can improve
matters a great deal by simply switching to either ByteString or Text.
Define:
countSyms :: Char -> ByteString -> Int -> Int -> Int
countSyms sym str lo hi =
length [ i | i <- [lo..hi], BS.index str i == sym ]
and then:
countBetween :: ByteString -> Char -> Int -> Int -> (Int,Int)
countBetween str sym top bottom = (a,b)
where a = countSyms sym str 0 (top-1)
b = countSyms sym str top (bottom-1)
Also, don't use reverse on large lists - it will reallocate the
entire list. Just index into a ByteString / Text in reverse.
Memoizing counts may or may not help. It all depends on how it's done.
It seems that the main point of the match routine is
to transform an interval (bottom,top) to another interval
based on the current symbol sym. The formulas are
basically:
ref_fst = index of sym in ref_col
-- defined in an outer scope
match :: Char -> (Int,Int) -> (Int,Int)
match sym (bottom, top) | bottom > top = (bottom, top) -- if the empty interval
match sym (bottom, top) =
let
top_count = count of sym in str from index 0 to top
bot_count = count of sym in str from index 0 to bottom
mid_count = top_count - bot_count
in if mid_count > 0
then (ref_fst + bot_count, ref_fst + top_count)
else (1,0) -- the empty interval
And then matching is just a fold over pattern using match
with the initial interval (0, n-1).
Both top_count and bot_count can be computed efficiently
using a precomputed lookup table, and below is code which
does that.
If you run test1 you'll see a trace of how the interval
is transformed via each symbol in the pattern.
Note: There may be off-by-1 errors, and I've hard coded
ref_fst to be 0 - I'm not sure how this fits into the
larger algorithm, but the basic idea should be sound.
Note that once the counts vector has been created
there is no need to index into the original string anymore.
Therefore, even though I use a ByteString here for
the (larger) DNA sequence, it's not crucial, and the
mkCounts routine should work just as well if passed a String
instead.
Code also available at http://lpaste.net/174288
{-# LANGUAGE OverloadedStrings #-}
import Data.Vector.Unboxed ((!))
import qualified Data.Vector.Unboxed as UV
import qualified Data.Vector.Unboxed.Mutable as UVM
import qualified Data.ByteString.Char8 as BS
import Debug.Trace
import Text.Printf
import Data.List
mkCounts :: BS.ByteString -> UV.Vector (Int,Int,Int,Int)
mkCounts syms = UV.create $ do
let n = BS.length syms
v <- UVM.new (n+1)
let loop x i | i >= n = return x
loop x i = let s = BS.index syms i
(a,t,c,g) = x
x' = case s of
'A' -> (a+1,t,c,g)
'T' -> (a,t+1,c,g)
'C' -> (a,t,c+1,g)
'G' -> (a,t,c,g+1)
_ -> x
in do UVM.write v i x
loop x' (i+1)
x <- loop (0,0,0,0) 0
UVM.write v n x
return v
data DNA = A | C | T | G
deriving (Show)
getter :: DNA -> (Int,Int,Int,Int) -> Int
getter A (a,_,_,_) = a
getter T (_,t,_,_) = t
getter C (_,_,c,_) = c
getter G (_,_,_,g) = g
-- narrow a window
narrow :: Int -> UV.Vector (Int,Int,Int,Int) -> DNA -> (Int,Int) -> (Int,Int)
narrow refcol counts sym (lo,hi) | trace msg False = undefined
where msg = printf "-- lo: %d hi: %d refcol: %d sym: %s top_cnt: %d bot_count: %d" lo hi refcol (show sym) top_count bot_count
top_count = getter sym (counts ! (hi+1))
bot_count = getter sym (counts ! lo)
narrow refcol counts sym (lo,hi) =
let top_count = getter sym (counts ! (hi+1))
bot_count = getter sym (counts ! (lo+0))
mid_count = top_count - bot_count
in if mid_count > 0
then ( refcol + bot_count, refcol + top_count-1 )
else (lo+1,lo) -- signal an empty window
findFirst :: DNA -> UV.Vector (Int,Int,Int,Int) -> Int
findFirst sym v =
let n = UV.length v
loop i | i >= n = n
loop i = if getter sym (v ! i) > 0
then i
else loop (i+1)
in loop 0
toDNA :: String -> [DNA]
toDNA str = map charToDNA str
charToDNA :: Char -> DNA
charToDNA = go
where go 'A' = A
go 'C' = C
go 'T' = T
go 'G' = G
dnaToChar A = 'A'
dnaToChar C = 'C'
dnaToChar T = 'T'
dnaToChar G = 'G'
first :: DNA -> BS.ByteString -> Int
first sym str = maybe len id (BS.elemIndex (dnaToChar sym) str)
where len = BS.length str
test2 = do
-- matching "AT$TCTAGT" "$AACGTTTT" "TCG"
let str = "AT$TCTAGT"
refcol = "$AACGTTTT"
syms = toDNA "TCG"
-- hard coded for now
-- may be computed and memoized
refcol_G = 4
refcol_C = 3
refcol_T = 5
counts = mkCounts str
w0 = (0, BS.length str -1)
w1 = narrow refcol_G counts G w0
w2 = narrow refcol_C counts C w1
w3 = narrow refcol_T counts T w2
firsts = (first A refcol, first T refcol, first C refcol, first G refcol)
putStrLn $ "firsts: " ++ show firsts
putStrLn $ "w0: " ++ show w0
putStrLn $ "w1: " ++ show w1
putStrLn $ "w2: " ++ show w2
putStrLn $ "w3: " ++ show w3
let (lo,hi) = w3
len = if lo <= hi then hi - lo + 1 else 0
putStrLn $ "length: " ++ show len
matching :: BS.ByteString -> BS.ByteString -> String -> Int
matching str refcol pattern =
let counts = mkCounts str
n = BS.length str
syms = toDNA (reverse pattern)
firsts = (first A refcol, first T refcol, first C refcol, first G refcol)
go (lo,hi) sym = narrow refcol counts sym (lo,hi)
where refcol = getter sym firsts
(lo, hi) = foldl' go (0,n-1) syms
len = if lo <= hi then hi - lo + 1 else 0
in len
test3 = matching "AT$TCTAGT" "$AACGTTTT" "TCG"
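The core of the precomputed-count idea can be seen in miniature with plain lists and a single symbol. This is a simplified sketch, not the answer's vector-based code; the names prefixCounts and countRange are made up, and list indexing with !! is linear, so a vector (as used above) is what makes each lookup constant-time.

```haskell
import Data.List (scanl')

-- prefixCounts sym xs !! i is the number of occurrences of sym among the
-- first i elements of xs; the result has length (length xs + 1).
prefixCounts :: Eq a => a -> [a] -> [Int]
prefixCounts sym = scanl' (\n x -> if x == sym then n + 1 else n) 0

-- Occurrences of sym at indices lo..hi (inclusive), from two table lookups
-- instead of a fresh traversal of the string.
countRange :: [Int] -> Int -> Int -> Int
countRange counts lo hi = counts !! (hi + 1) - counts !! lo
```

The string is traversed once to build the table; every later count for any window is a subtraction of two entries.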

Generic algorithm to enumerate sum and product types on Haskell?

Some time ago, I asked how to map back and forth from godel numbers to terms of a context-free language. While the answer solved the issue specifically, I'm having trouble actually programming it generically. So, this question is more generic: given a recursive algebraic data type with terminals, sums and products - such as
data Term = Prod Term Term | SumL Term | SumR Term | AtomA | AtomB
what is an algorithm that will map a term of this type to its godel number, and its inverse?
Edit: for example:
data Foo = A | B Foo | C Foo deriving Show
to :: Foo -> Int
to A = 1
to (B x) = to x * 2
to (C x) = to x * 2 + 1
from :: Int -> Foo
from 1 = A
from n = case mod n 2 of
0 -> B (from (div n 2))
1 -> C (from (div n 2))
Here, to and from do what I want for Foo. I'm just asking for a systematic way to derive those functions for any datatype.
In order to avoid dealing with a particular Goedel numbering, let's define a class that'll abstract the necessary operations (with some imports we'll need later):
{-# LANGUAGE TypeOperators, DefaultSignatures, FlexibleContexts, DeriveGeneric #-}
import Control.Applicative
import GHC.Generics
import Test.QuickCheck
import Test.QuickCheck.Gen
class GodelNum a where
fromInt :: Integer -> a
toInt :: a -> Maybe Integer
encode :: [a] -> a
decode :: a -> [a]
So we can inject natural numbers and encode sequences. Let's further create a canonical instance of this class that we'll use throughout the code, which does no real Goedel encoding, just constructs a tree of terms.
data TermNum = Value Integer | Complex [TermNum]
deriving (Show)
instance GodelNum TermNum where
fromInt = Value
toInt (Value x) = Just x
toInt _ = Nothing
encode = Complex
decode (Complex xs) = xs
decode _ = []
For real encoding we'd use another implementation that'd use just one Integer, something like newtype SomeGoedelNumbering = SGN Integer.
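To give a feel for what a single-Integer encoding of sequences could look like, here is one possible sketch based on the Cantor pairing function. It is an assumption about how such a numbering might be built, not the implementation this answer has in mind. All names are hypothetical, it is not wired into the GodelNum class, and the inverse uses a deliberately naive search.

```haskell
-- Cantor pairing of two natural numbers.
pair :: Integer -> Integer -> Integer
pair m n = (m + n) * (m + n + 1) `div` 2 + m

-- Inverse of pair. w is the largest number whose triangular number
-- w*(w+1)/2 does not exceed p; found by linear search to stay exact.
unpair :: Integer -> (Integer, Integer)
unpair p = (m, w - m)
  where
    w = last (takeWhile (\k -> k * (k + 1) `div` 2 <= p) [0 ..])
    m = p - w * (w + 1) `div` 2

-- A bijection between lists of naturals and naturals:
-- 0 is the empty list, n + 1 is a cons cell encoded by pairing.
encodeList :: [Integer] -> Integer
encodeList []     = 0
encodeList (x:xs) = 1 + pair x (encodeList xs)

decodeList :: Integer -> [Integer]
decodeList 0 = []
decodeList n = let (x, rest) = unpair (n - 1) in x : decodeList rest
```

For example, encodeList [1,2] is 30 and decodeList 30 gives [1,2] back. The numbers grow very quickly with list length, so this is only meant to show the shape of the idea.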
Let's further create a class for types that we can encode/decode:
class GNum a where
gto :: (GodelNum g) => a -> g
gfrom :: (GodelNum g) => g -> Maybe a
default gto :: (Generic a, GodelNum g, GGNum (Rep a)) => a -> g
gto = ggto . from
default gfrom :: (Generic a, GodelNum g, GGNum (Rep a)) => g -> Maybe a
gfrom = liftA to . ggfrom
The last four lines define a generic implementation of gto and gfrom using GHC Generics and DefaultSignatures. The class GGNum that they use is a helper class which we'll use to define encoding for the atomic ADT operations - products, sums, etc.:
class GGNum f where
ggto :: (GodelNum g) => f a -> g
ggfrom :: (GodelNum g) => g -> Maybe (f a)
-- no-arg constructors
instance GGNum U1 where
ggto U1 = encode []
ggfrom _ = Just U1
-- products
instance (GGNum a, GGNum b) => GGNum (a :*: b) where
ggto (a :*: b) = encode [ggto a, ggto b]
ggfrom e | [x, y] <- decode e = liftA2 (:*:) (ggfrom x) (ggfrom y)
| otherwise = Nothing
-- sums
instance (GGNum a, GGNum b) => GGNum (a :+: b) where
  ggto (L1 x) = encode [fromInt 0, ggto x]
  ggto (R1 y) = encode [fromInt 1, ggto y]
  ggfrom e | [n, x] <- decode e = case toInt n of
                                    Just 0 -> L1 <$> ggfrom x
                                    Just 1 -> R1 <$> ggfrom x
                                    _      -> Nothing
           | otherwise = Nothing -- malformed input, as in the product case
-- metadata
instance (GGNum a) => GGNum (M1 i c a) where
ggto (M1 x) = ggto x
ggfrom e = M1 <$> ggfrom e
-- constants and recursion of kind *
instance (GNum a) => GGNum (K1 i a) where
ggto (K1 x) = gto x
ggfrom e = K1 <$> gfrom e
Having that, we can then define a data type like yours and just declare its GNum instance, everything else will be automatically derived.
data Term = Prod Term Term | SumL Term | SumR Term | AtomA | AtomB
deriving (Eq, Show, Generic)
instance GNum Term where
And just to be sure we've done everything right, let's use QuickCheck to verify that our gfrom is an inverse of gto:
instance Arbitrary Term where
arbitrary = oneof [ return AtomA
, return AtomB
, SumL <$> arbitrary
, SumR <$> arbitrary
, Prod <$> arbitrary <*> arbitrary
]
prop_enc_dec :: Term -> Property
prop_enc_dec x = Just x === gfrom (gto x :: TermNum)
main :: IO ()
main = quickCheck prop_enc_dec
Notes:
The same thing could be accomplished using Scrap Your Boilerplate, perhaps more efficiently, as it allows somewhat higher-level access - enumerating constructors and records, etc.
See also the paper Efficient Bijective Gödel Numberings for Term Algebras (I haven't read the paper yet, but it seems related).
For fun, I decided to try the approach in the link you posted, and didn't get stuck anywhere. So here's my code, with no commentary (the explanation is the same as the last time). First, code stolen from the other answer:
{-# LANGUAGE TypeSynonymInstances #-}
import Control.Applicative
import Data.Universe.Helpers
type Nat = Integer
class Godel a where
to :: a -> Nat
from :: Nat -> a
instance Godel Nat where to = id; from = id
instance (Godel a, Godel b) => Godel (a, b) where
to (m_, n_) = (m + n) * (m + n + 1) `quot` 2 + m where
m = to m_
n = to n_
from p = (from m, from n) where
isqrt = floor . sqrt . fromIntegral
base = (isqrt (1 + 8 * p) - 1) `quot` 2
triangle = base * (base + 1) `quot` 2
m = p - triangle
n = base - m
And the code specific to your new type:
data Term = Prod Term Term | SumL Term | SumR Term | AtomA | AtomB
deriving (Eq, Ord, Read, Show)
ts = AtomA : AtomB : interleave [uncurry Prod <$> ts +*+ ts, SumL <$> ts, SumR <$> ts]
instance Godel Term where
to AtomA = 0
to AtomB = 1
to (Prod t1 t2) = 2 + 0 + 3 * to (t1, t2)
to (SumL t) = 2 + 1 + 3 * to t
to (SumR t) = 2 + 2 + 3 * to t
from 0 = AtomA
from 1 = AtomB
from n = case quotRem (n-2) 3 of
(q, 0) -> uncurry Prod (from q)
(q, 1) -> SumL (from q)
(q, 2) -> SumR (from q)
The same ghci test as last time:
*Main> take 30 (map from [0..]) == take 30 ts
True

Make a century in OCaml

Here is quite a typical make a century problem.
We have a natural number list [1;2;3;4;5;6;7;8;9].
We have a list of possible operators [Some '+'; Some '*';None].
Now we create a list of operators from above possibilities and insert each operator into between each consecutive numbers in the number list and compute the value.
(Note a None b = a * 10 + b)
For example, if the operator list is [Some '+'; Some '*'; None; Some '+'; Some '+'; Some '+'; Some '+'; Some '+'], then the value is 1 + 2 * 34 + 5 + 6 + 7 + 8 + 9 = 104.
Please find all possible operator lists for which the value = 100.
The only way I can think of is brute-force.
I generate all possible operator lists.
Compute all possible values.
Then filter so I get all operator lists which produce 100.
exception Cannot_compute
exception Cannot_glue
let rec candidates n ops =
if n = 0 then [[]]
else
List.fold_left (fun acc op -> List.rev_append acc (List.map (fun x -> op::x) (candidates (n-1) ops))) [] ops
let glue l opl =
let rec aggr acc_l acc_opl = function
| hd::[], [] -> (List.rev (hd::acc_l), List.rev acc_opl)
| hd1::hd2::tl, None::optl -> aggr acc_l acc_opl (((hd1*10+hd2)::tl), optl)
| hd::tl, (Some c)::optl -> aggr (hd::acc_l) ((Some c)::acc_opl) (tl, optl)
| _ -> raise Cannot_glue
in
aggr [] [] (l, opl)
let compute l opl =
let new_l, new_opl = glue l opl in
let rec comp = function
| hd::[], [] -> hd
| hd::tl, (Some '+')::optl -> hd + (comp (tl, optl))
| hd1::hd2::tl, (Some '-')::optl -> hd1 + (comp ((-hd2)::tl, optl))
| hd1::hd2::tl, (Some '*')::optl -> comp (((hd1*hd2)::tl), optl)
| hd1::hd2::tl, (Some '/')::optl -> comp (((hd1/hd2)::tl), optl)
| _, _ -> raise Cannot_compute
in
comp (new_l, new_opl)
let make_century l ops =
List.filter (fun x -> fst x = 100) (
List.fold_left (fun acc x -> ((compute l x), x)::acc) [] (candidates ((List.length l)-1) ops))
let rec print_solution l opl =
match l, opl with
| hd::[], [] -> Printf.printf "%d\n" hd
| hd::tl, (Some op)::optl -> Printf.printf "%d %c " hd op; print_solution tl optl
| hd1::hd2::tl, None::optl -> print_solution ((hd1*10+hd2)::tl) optl
| _, _ -> ()
I believe my code is ugly. So I have the following questions
compute l opl evaluates the expression given the number list and the operator list. Basically it is a typical math evaluation. Is there any nicer implementation?
I have read Chapter 6 in Pearls of Functional Algorithm Design. It uses some techniques to improve the performance. I found it really obscure and hard to understand. Can anyone who has read it help?
Edit
I refined my code. Basically, I will scan the operator list first to glue all numbers where their operator is None.
Then in compute, if I meet a '-' I will simply negate the 2nd number.
A classic dynamic programming solution (which finds the = 104
solution instantly) that does not risk any problem with operators
associativity or precedence. It only returns a boolean saying whether
it's possible to come with the number; modifying it to return the
sequences of operations to get the solution is an easy but interesting
exercise, I was not motivated to go that far.
let operators = [ (+); ( * ); ]
module ISet = Set.Make(struct type t = int let compare = compare end)
let iter2 res1 res2 f =
res1 |> ISet.iter @@ fun n1 ->
res2 |> ISet.iter @@ fun n2 ->
f n1 n2
let can_make input target =
let has_zero = Array.fold_left (fun acc n -> acc || (n=0)) false input in
let results = Array.make_matrix (Array.length input) (Array.length input) ISet.empty in
for imax = 0 to Array.length input - 1 do
for imin = imax downto 0 do
let add n =
(* OPTIMIZATION: if the operators are known to be monotonous, we need not store
numbers above the target;
(Handling multiplication by 0 requires to be a bit more
careful, and I'm not in the mood to think hard about this
(I think one need to store the existence of a solution,
even if it is above the target), so I'll just disable the
optimization in that case)
*)
if n <= target || has_zero then
results.(imin).(imax) <- ISet.add n results.(imin).(imax) in
let concat_numbers =
(* concatenates all number from i to j:
i=0, j=2 -> (input.(0)*10 + input.(1))*10 + input.(2)
*)
let rec concat acc k =
let acc = acc + input.(k) in
if k = imax then acc
else concat (10 * acc) (k + 1)
in concat 0 imin
in add concat_numbers;
for k = imin to imax - 1 do
let res1 = results.(imin).(k) in
let res2 = results.(k+1).(imax) in
operators |> List.iter (fun op ->
iter2 res1 res2 (fun n1 n2 -> add (op n1 n2););
);
done;
done;
done;
let result = results.(0).(Array.length input - 1) in
ISet.mem target result
Here is my solution, which evaluates according to the usual rules of precedence. It finds 303 solutions to find [1;2;3;4;5;6;7;8;9] 100 in under 1/10 second on my MacBook Pro.
Here are two interesting ones:
# 123 - 45 - 67 + 89;;
- : int = 100
# 1 * 2 * 3 - 4 * 5 + 6 * 7 + 8 * 9;;
- : int = 100
This is a brute force solution. The only slightly clever thing is that I treat concatenation of digits as simply another (high precedence) operation.
The eval function is the standard stack-based infix expression evaluation that you will find described many places. Here is an SO article about it: How to evaluate an infix expression in just one scan using stacks? The essence is to postpone evaluating by pushing operators and operands onto stacks. When you find that the next operator has lower precedence you can go back and evaluate what you pushed.
type op = Plus | Minus | Times | Divide | Concat
let prec = function
| Plus | Minus -> 0
| Times | Divide -> 1
| Concat -> 2
let succ = function
| Plus -> Minus
| Minus -> Times
| Times -> Divide
| Divide -> Concat
| Concat -> Plus
let apply op stack =
match op, stack with
| _, [] | _, [_] -> [] (* Invalid input *)
| Plus, a :: b :: tl -> (b + a) :: tl
| Minus, a :: b :: tl -> (b - a) :: tl
| Times, a :: b :: tl -> (b * a) :: tl
| Divide, a :: b :: tl -> (b / a) :: tl
| Concat, a :: b :: tl -> (b * 10 + a) :: tl
let rec eval opstack numstack ops nums =
match opstack, numstack, ops, nums with
| [], sn :: _, [], _ -> sn
| sop :: soptl, _, [], _ ->
eval soptl (apply sop numstack) ops nums
| [], _, op :: optl, n :: ntl ->
eval [op] (n :: numstack) optl ntl
| sop :: soptl, _, op :: _, _ when prec sop >= prec op ->
eval soptl (apply sop numstack) ops nums
| _, _, op :: optl, n :: ntl ->
eval (op :: opstack) (n :: numstack) optl ntl
| _ -> 0 (* Invalid input *)
let rec incr = function
| [] -> []
| Concat :: rest -> Plus :: incr rest
| x :: rest -> succ x :: rest
let find nums tot =
match nums with
| [] -> []
| numhd :: numtl ->
let rec try1 ops accum =
let accum' =
if eval [] [numhd] ops numtl = tot then
ops :: accum
else
accum
in
if List.for_all ((=) Concat) ops then
accum'
else try1 (incr ops) accum'
in
try1 (List.map (fun _ -> Plus) numtl) []
I came up with a slightly obscure implementation (for a variant of this problem) that is a bit better than brute force. It works in place, rather than generating intermediate data structures, keeping track of the combined values of the operators that have already been evaluated.
The trick is to keep track of a pending operator and value so that you can evaluate the "none" operator easily. That is, if the algorithm had just progressed through 1 + 23, the pending operator would be +, and the pending value would be 23, allowing you to easily generate either 1 + 23 + 4 or 1 + 234 as necessary.
type op = Add | Sub | Nothing
let print_ops ops =
let len = Array.length ops in
print_char '1';
for i = 1 to len - 1 do
Printf.printf "%s%d" (match ops.(i) with
| Add -> " + "
| Sub -> " - "
| Nothing -> "") (i + 1)
done;
print_newline ()
let solve k target =
let ops = Array.make k Nothing in
let rec recur i sum pending_op pending_value =
let sum' = match pending_op with
| Add -> sum + pending_value
| Sub -> if sum = 0 then pending_value else sum - pending_value
| Nothing -> pending_value in
if i = k then
if sum' = target then print_ops ops else ()
else
let digit = i + 1 in
ops.(i) <- Add;
recur (i + 1) sum' Add digit;
ops.(i) <- Sub;
recur (i + 1) sum' Sub digit;
ops.(i) <- Nothing;
recur (i + 1) sum pending_op (pending_value * 10 + digit) in
recur 0 0 Nothing 0
Note that this will generate duplicates - I didn't bother to fix that. Also, if you are doing this exercise to gain strength in functional programming, it might be beneficial to reject the imperative approach taken here and search for a similar solution that doesn't make use of assignments.
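For what it's worth, here is one possible assignment-free take on the same pending-value trick, sketched in Haskell rather than OCaml. The names solve and go, and the choice to carry a sign (+1 or -1) instead of an operator value, are assumptions of this sketch and not part of the code above. It assumes 1 <= k <= 9, never puts an operator in front of the leading 1, and returns the matching expressions as strings instead of printing them.

```haskell
-- All expressions over the digits 1..k (joined by " + ", " - " or nothing)
-- whose value equals the target. Assumes 1 <= k <= 9.
solve :: Int -> Int -> [String]
solve k target
  | k < 1     = []
  | otherwise = go 2 0 1 1 "1"
  where
    -- d:       next digit to place
    -- acc:     value of everything before the pending number
    -- sign:    +1 or -1, the operator in front of the pending number
    -- pending: the number currently being built by concatenation
    -- expr:    textual form of the expression so far
    go d acc sign pending expr
      | d > k     = [expr | acc' == target]
      | otherwise =
          go (d + 1) acc' 1 d (expr ++ " + " ++ show d)
            ++ go (d + 1) acc' (-1) d (expr ++ " - " ++ show d)
            ++ go (d + 1) acc sign (pending * 10 + d) (expr ++ show d)
      where
        -- fold the pending number into the running total
        acc' = acc + sign * pending
```

For example, solve 3 6 evaluates to ["1 + 2 + 3"], and solve 9 100 includes "123 - 45 - 67 + 89".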

Writing infinite list to skip every factor of p?

How can I efficiently represent the list [0..] \\ [t+0*p, t+1*p ..]?
I have defined:
Prelude> let factors p t = [t+0*p, t+1*p ..]
I want to efficiently represent an infinite list that is the difference of [0..] and factors p t, but using \\ from Data.List requires too much memory for even medium-sized lists:
Prelude Data.List> [0..10000] \\ (factors 5 0)
<interactive>: out of memory
I know that I can represent the values between t+0*p and t+1*p with:
Prelude> let innerList p1 p2 t = [t+p1+1, t+p1+2 .. t+p2-1]
Prelude> innerList 0 5 0
[1,2,3,4]
However, repeatedly calculating and concatenating innerList for increasing intervals seems clumsy.
Can I efficiently represent [0..] \\ (factors p t) without calculating rem or mod for each element?
For the infinite list [0..] \\ [t,t+p..],
yourlist t p = [0..t-1] ++ [i | m <- [0,p..], i <- [t+m+1..t+m+p-1]]
Of course this approach doesn't scale at all if you want to remove other factors as well, like
[0..] \\ [t,t+p..] \\ [s,s+q..] \\ ...
in which case you'll have to remove them in sequence with minus, mentioned in Daniel Fischer's answer. There is no magic bullet here.
But there's also a union, with which the above becomes
[0..] \\ ( [t,t+p..] `union` [s,s+q..] `union` ... )
The advantage is that we can arrange the unions in a tree and get an algorithmic improvement.
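A minimal sketch of the tree idea, assuming a finite list of ascending progressions. Here unionAsc and minusAsc are hand-written stand-ins for union and minus on ordered lists, and unionAll and without are hypothetical names introduced only for this illustration.

```haskell
-- Union of two ascending lists, keeping each element once.
unionAsc :: Ord a => [a] -> [a] -> [a]
unionAsc xxs@(x:xs) yys@(y:ys) = case compare x y of
    LT -> x : unionAsc xs yys
    EQ -> x : unionAsc xs ys
    GT -> y : unionAsc xxs ys
unionAsc xs [] = xs
unionAsc [] ys = ys

-- Difference of two ascending lists.
minusAsc :: Ord a => [a] -> [a] -> [a]
minusAsc xxs@(x:xs) yys@(y:ys) = case compare x y of
    LT -> x : minusAsc xs yys
    EQ -> minusAsc xs ys
    GT -> minusAsc xxs ys
minusAsc xs _ = xs

-- Combine a finite list of ascending lists pairwise, so the unions
-- form a balanced tree instead of a linear chain.
unionAll :: Ord a => [[a]] -> [a]
unionAll []   = []
unionAll [xs] = xs
unionAll xss  = unionAll (pairs xss)
  where
    pairs (a:b:rest) = unionAsc a b : pairs rest
    pairs rest       = rest

-- [0..] without any of the progressions [t, t+p ..], given as (t, p) pairs.
without :: [(Integer, Integer)] -> [Integer]
without tps = [0..] `minusAsc` unionAll [ [t, t + p ..] | (t, p) <- tps ]
```

With this, take 10 (without [(0,5),(1,3)]) gives [2,3,6,8,9,11,12,14,17,18]. Each surviving element then passes through one minusAsc and roughly log n levels of unionAsc rather than n nested differences.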
You can't use (\\) for that, because
(\\) :: (Eq a) => [a] -> [a] -> [a]
(\\) = foldl (flip delete)
the list of elements you want to remove is infinite, and a left fold never terminates when the list it folds over is infinite.
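If you only ever need a bounded prefix, one workaround is to cut the list of removals off at the same bound, so the fold has a finite list to walk over. This is just a sketch; boundedDifference is a hypothetical name, and (\\) is still quadratic, so it only makes sense for small ranges.

```haskell
import Data.List ((\\))

-- Elements of [0..n] that are not of the form t + k*p.
-- takeWhile makes the second argument finite, so (\\) can terminate.
boundedDifference :: Integer -> Integer -> Integer -> [Integer]
boundedDifference n p t = [0 .. n] \\ takeWhile (<= n) [t, t + p ..]
```

For instance, boundedDifference 12 5 0 is [1,2,3,4,6,7,8,9,11,12].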
If you would rather use something already written than write it yourself, you can use minus from the data-ordlist package.
The performance should be adequate.
Otherwise,
minus :: Ord a => [a] -> [a] -> [a]
minus xxs@(x:xs) yys@(y:ys)
    | x < y     = x : minus xs yys
    | x == y    = minus xs ys
    | otherwise = minus xxs ys
minus xs _ = xs
You can use a list comprehension with a predicate, using rem:
>>> let t = 0
>>> let p = 5
>>> take 40 $ [ x | x <- [1..], x `rem` p /= t ]
[1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19,21,22,23,24,26,27,28,29,31,32,33,34,36,37,38,39,41,42,43,44,46,47,48,49]
If you want efficiency, why does your solution have to use list comprehension syntax?
Why not something like this?
gen' n i p | i == p = gen' (n + p) 1 p
gen' n i p = (n+i) : gen' n (i+1) p
gen = gen' 0 1
and then do
gen 5
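Note that gen only covers the case t = 0. A possible variant with an offset, written as a sketch: skipEvery is a hypothetical name, and it assumes t >= 0 and p >= 2.

```haskell
-- [0..] without t, t+p, t+2p, ... using only a counter, no rem or mod.
-- Assumes t >= 0 and p >= 2.
skipEvery :: Integer -> Integer -> [Integer]
skipEvery p t = [0 .. t - 1] ++ go (t + 1) 1
  where
    -- n is the next candidate, i counts the steps since the last skipped value
    go n i
      | i == p    = go (n + 1) 1            -- n is t + k*p, so skip it
      | otherwise = n : go (n + 1) (i + 1)
```

For example, take 8 (skipEvery 5 2) gives [0,1,3,4,5,6,8,9].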
Because you have ascending lists, you can simply lazily merge them:
nums = [1..]
nogos = factors p t
result = merge nums (dropWhile (<head nums) nogos) where
    merge (a:as) (b:bs)
        | a < b = a : merge as (b:bs)
        | a == b = merge as bs
        | otherwise = error "should not happen"
Writing this in a general way, as a function that builds the difference of two infinite lists provided only that they are in ascending order, is left as an exercise. In the end, the following should be possible:
[1..] `infiniteDifference` primes `infiniteDifference` squares
For this, make it a left-associative operator.
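One way the exercise might be solved, as a sketch rather than the only answer. The fixity level 5 is an arbitrary choice, and the primes and squares definitions are deliberately naive placeholders added just so that the example can run.

```haskell
infixl 5 `infiniteDifference`

-- Difference of two ascending (possibly infinite) lists.
infiniteDifference :: Ord a => [a] -> [a] -> [a]
infiniteDifference xxs@(x:xs) yys@(y:ys)
    | x < y     = x : infiniteDifference xs yys   -- x cannot occur later in yys
    | x == y    = infiniteDifference xs ys        -- drop x
    | otherwise = infiniteDifference xxs ys       -- y is too small to matter
infiniteDifference xs _ = xs

-- Placeholder inputs for the example (trial division, not efficient).
primes :: [Integer]
primes = [ n | n <- [2..]
             , all (\d -> n `mod` d /= 0) (takeWhile (\d -> d * d <= n) [2..]) ]

squares :: [Integer]
squares = [ n * n | n <- [1..] ]
```

With these definitions, take 10 ([1..] `infiniteDifference` primes `infiniteDifference` squares) evaluates to [6,8,10,12,14,15,18,20,21,22].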

Resources