I'm trying to understand how an old machine (a PLC) generates a check byte in its data exchange, but I can't figure out what algorithm it uses or how it is applied.
I have very sparse documentation about the machine, and I have already tried some algorithms like normal CRC, CCITT CRC, XMODEM CRC... and none of them gives the right result.
The message is formed like this:
M*NNNNNNwwSSdd
where:
M* - is fixed
NNNNNN - N is a number or a space
ww - w is a number too or a space
SS - S is a char or a space
dd - d is a number or a space
Here are some examples with the check byte they generate (the '×' character is really a space ' '; I use it only to make the number of spaces easier to count):
a:
M*614976××××12 -> a
M*615138×××××× -> a
b:
M*615028××××12 -> b
M*615108×××××× -> b
c:
M*614933×××××× -> c
M*614956××××12 -> c
d:
M*614934×××××× -> d
M*614951××××12 -> d
e:
M*614942×××××× -> e
M*615079×××××× -> e
f:
M*614719××××12 -> f
M*614936×××××× -> f
g:
M*614718××××12 -> g
M*614937×××××× -> g
h:
M*614727×××××× -> h
M*614980××××12 -> h
i:
M*614734××××12 -> i
M*614939×××××× -> i
M*×××××××××××× -> i
z:
M*××××××××SC12 -> z
j:
M*××××××××××12 -> j
y:
M*××××××××SC×× -> y
There are more combinations, but these should be enough.
Another particularity is that the check byte always lies in the range 0x60 to 0x7F, never outside it (my current solution works because I loop over this range until the machine gives me an OK).
So my question is: do you know how this check byte is calculated? Can you point me to some simpler algorithms for checking data integrity in PLCs? It must be something simple, since the resulting check byte is only one character.
Thanks
It seems to me that if I xor together all the characters in the message, treating them as ASCII and replacing your quasi-x with a space, and then xor in 0x0E, I get the checksum character. At the very least I suggest that you construct a table showing the xor of all the characters in the message and the checksum character, written out in hex. Something like this is quite plausible considering the block check described in www.bttautomatyka.com.pl/pdf/HA466357.pdf
(I had actually written a mod-2 equation solver and was going to look for a 5-bit CRC, when this popped out!)
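A minimal Haskell sketch of the xor rule described above, useful for building the suggested hex table. It assumes the hypothesis holds in general: it reproduces every example listed in the question, but the 0x0E constant is only fitted to those examples, not taken from any documentation. `xorAll`, `checkByte`, `tableRow` and `fromDisplay` are hypothetical helper names.

```haskell
import Data.Bits (xor)
import Data.Char (chr, ord)
import Data.List (foldl')
import Numeric (showHex)

-- XOR of all character codes in a message (a space counts as 0x20).
xorAll :: String -> Int
xorAll = foldl' xor 0 . map ord

-- Hypothesised check byte: xor of the whole message, then xor with 0x0E.
-- 0x0E is an empirical fit to the examples, not a documented constant.
checkByte :: String -> Char
checkByte msg = chr (xorAll msg `xor` 0x0E)

-- One row of the suggested table: message, its xor and the check char, in hex.
tableRow :: String -> Char -> String
tableRow msg c =
  show msg ++ "  xor=0x" ++ showHex (xorAll msg) "" ++ "  check=0x" ++ showHex (ord c) ""

-- The question writes spaces as '×' (U+00D7); map them back to real spaces.
fromDisplay :: String -> String
fromDisplay = map (\ch -> if ch == '\xD7' then ' ' else ch)
```

For example, `checkByte (fromDisplay "M*\xD7\xD7\xD7\xD7\xD7\xD7\xD7\xD7\xD7\xD7\xD7\xD7")` gives `'i'`: "M" xor "*" is 0x67, the twelve spaces cancel out in pairs, and 0x67 xor 0x0E is 0x69. It also suggests why the result stays in 0x60 to 0x7F: the 0x60 bits come from "M*", and the remaining characters cancel in the high bits for messages of this shape.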
Can somebody please help me draw a NFA that accepts this language:
{ w | the length of w is 6k + 1 for some k ≥ 0 }
I have been stuck on this problem for several hours now. I do not understand where the k comes into play and how it is used in the diagram...
We can use the Myhill-Nerode theorem to constructively produce a provably minimal DFA for this language. This is a useful exercise. First, a definition:
Two strings w and x are indistinguishable with respect to a language L iff: (1) for every string y such that wy is in L, xy is in L; (2) for every string z such that xz is in L, wz is in L.
The insight in Myhill-Nerode is that if two strings are indistinguishable w.r.t. a regular language, then a minimal DFA for that language will see to it that the machine ends up in the same state for either string. Indistinguishability is reflexive, symmetric and transitive so we can define equivalence classes on it. Those equivalence classes correspond directly to the set of states in the minimal DFA. Now, to find the equivalence classes for our language. We consider strings of increasing length and see for each one whether it's indistinguishable from any of the strings before it:
e, the empty string, has no strings before it. We need a state q0 to correspond to the equivalence class this string belongs to. The set of strings that can come after e to reach a string in L is L itself; also written c(c^6)*
c, any string of length one, has only e before it. These are not, however, indistinguishable; we can add e to c to get ce = c, a string in L, but we cannot add e to e to get a string in L, since e is not in L. We therefore need a new state q1 for the equivalence class to which c belongs. The set of strings that can come after c to reach a string in L is (c^6)*.
cc, any string of length two: it turns out we need a new state q2 here; the set of strings that take cc to a string in L is ccccc(c^6)*. Show this.
ccc, any string of length three: it turns out we need a new state q3 here; the set of strings that take ccc to a string in L is cccc(c^6)*. Show this.
cccc, any string of length four: it turns out we need a new state q4 here; the set of strings that take cccc to a string in L is ccc(c^6)*. Show this.
ccccc, any string of length five: it turns out we need a new state q5 here; the set of strings that take ccccc to a string in L is cc(c^6)*. Show this.
Consider the string cccccc. What strings take us to a string in L? Well, c does. So does c followed by any string whose length is a multiple of 6. Interestingly, this is the same as L itself, and we already have an equivalence class for that: e can also be followed by any string in L to get a string in L. So cccccc and e are indistinguishable. What's more, since all strings of length 6 are indistinguishable from shorter strings, we no longer need to keep checking longer strings: our DFA needs only the states q0 to q5 we have already identified. The work we've done above also defines the transitions we need in our DFA, the initial state and the accepting states:
The DFA will have a transition on symbol c from state q to state q' if x is a string in the equivalence class corresponding to q and xc is a string in the equivalence class corresponding to q';
The initial state will be the state corresponding to the equivalence class to which e, the empty string, belongs;
A state q is accepting if any string (hence all strings) belonging to the equivalence class corresponding to q is in the language; equivalently, if the set of strings that take strings in the equivalence class to a string in L includes e, the empty string.
We may use the notes above to write the DFA in tabular form:
q x q'
-- -- --
q0 c q1 // e + c = c
q1 c q2 // c + c = cc
q2 c q3 // cc + c = ccc
q3 c q4 // ccc + c = cccc
q4 c q5 // cccc + c = ccccc
q5 c q0 // ccccc + c = cccccc ~ e
We have q0 as the initial state and the only accepting state is q1.
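The transition table above can be run directly. A minimal Haskell sketch: since the language depends only on the length of w, every input symbol triggers the same transition, so the alphabet does not matter. `State`, `delta` and `accepts` are hypothetical names.

```haskell
import Data.List (foldl')

-- The six Myhill-Nerode equivalence classes, named as in the table above.
data State = Q0 | Q1 | Q2 | Q3 | Q4 | Q5
  deriving (Eq, Show, Enum)

-- Transition on any symbol: q_n goes to q_(n+1), and q5 wraps back to q0.
delta :: State -> State
delta Q5 = Q0
delta q = succ q

-- Start in q0 and accept exactly when the run ends in q1.
accepts :: String -> Bool
accepts = (== Q1) . foldl' (\q _ -> delta q) Q0
```

For instance, `accepts "c"` and `accepts "abcdefg"` are `True` (lengths 1 and 7), while `accepts ""` and `accepts "cccccc"` are `False`.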
Here's an NFA which goes 6 states forward; then, if there is one more character, it stops on the final state. Otherwise it loops back non-deterministically to the start and past the final state.
(Start) S1 -> S2 -> S3 -> S5 -> S6 -> S7 (Final State) -> S8 - (loop forever)
^ |
^ v |_|
|________________________| (non deterministically)
As a personal exercise while learning Haskell, I'm trying to port this F# snippet for Random Art.
I haven't embedded the full source code to avoid bloating the question, but it is available as a gist.
An important part of the program is this Expr type:
data Expr =
    VariableX
  | VariableY
  | Constant
  | Sum Expr Expr
  | Product Expr Expr
  | Mod Expr Expr
  | Well Expr
  | Tent Expr
  | Sin Expr
  | Level Expr Expr Expr
  | Mix Expr Expr Expr
  deriving Show
and two functions:
gen :: Int -> IO Expr randomly generates a tree-like structure, given a number of iterations;
eval :: Expr -> IO (Point -> Rgb Double) walks the tree and finally produces a drawing function.
The higher the number passed to gen, the higher the probability that the following exception is raised: Ratio has zero denominator.
I'm new to Haskell, so to track down the problem I tried compiling it as follows:
ghc RandomArt.hs -prof -auto-all -caf-all
This gave me only the following additional (and, to me, quite useless) information:
$ ./RandomArt +RTS -xc
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
GHC.Real.CAF
--> evaluated by: Main.eval.\,
called from Main.eval,
called from Main.tga.pxs',
called from Main.tga,
called from Main.save,
called from Main.main,
called from :Main.CAF:main
--> evaluated by: Main.eval.\.r,
called from Main.eval.\,
called from Main.eval,
called from Main.tga.pxs',
called from Main.tga,
called from Main.save,
called from Main.main,
called from :Main.CAF:main
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
Main.tga,
called from Main.save,
called from Main.main,
called from GHC.Real.CAF
RandomArt: Ratio has zero denominator
The code that persists the generated function to a TGA file works, because it was my previous exercise (a port from OCaml).
I've tried evaluating various Expr trees from GHCi, assembling them by hand or applying functions as in the program, but I wasn't able to identify the bug.
The Haskell docs mention a package named loch that should be able to compile while preserving source line numbers, but I was not able to install it (although I normally install every package I need with cabal install).
To be honest, there are two questions:
where is the bug (in this specific case)?
which tool do I need to master to find bugs like this (or bugs in general)?
Thanks in advance.
The exception
Let's focus on the exception first.
Finding the bug
where is the bug (in this specific case)?
In mod'. We can check this easily by providing an alternative version instead of the one from Data.Fixed:
mod' :: RealFrac a => a -> a -> a
mod' _ 0 = error "Used mod' on 0"
mod' a b =
  let k = floor $ a / b
  in a - (fromInteger k) * b
We now get Used mod' on 0.
Rationale
which tool do I need to master to find bugs like this (or bugs in general)?
In this case, the necessary hint was already in the exception's message:
Ratio has zero denominator
This means that there's a place where you divide by zero in the context of a Ratio, so you need to look at all the places where you divide something. Since you use only (/) and mod', it boils down to whether either of them can actually throw this exception:
(/) on Double returns ±Infinity on division by zero (NaN for 0/0) instead of throwing,
mod' uses toRational internally, which produces a Ratio Integer.
So there's only one culprit left. Note that the other implementation yields the same results if b isn't zero.
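A small check of this reasoning, assuming the Data.Fixed definition of mod' that goes through toRational (as in base at the time of the question). `modByZero` and `plainDivision` are hypothetical names used only for illustration.

```haskell
import Control.Exception (ArithException (..), evaluate, try)
import Data.Fixed (mod')

-- Data.Fixed.mod' divides two Rationals internally, so a zero divisor
-- surfaces as the ArithException RatioZeroDenominator
-- ("Ratio has zero denominator").
modByZero :: IO (Either ArithException Double)
modByZero = try (evaluate (mod' 5.5 (0 :: Double)))

-- Plain (/) on Double follows IEEE 754 and returns Infinity instead.
plainDivision :: Double
plainDivision = 5.5 / 0
```

In GHCi, `modByZero` should return `Left Ratio has zero denominator`, while `plainDivision` is `Infinity`.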
The actual problem
Using mod or mod' with b == 0 isn't well-defined. After all, a modulo operation should hold the following property:
prop_mod :: Integral n => n -> n -> Bool
prop_mod a b =
  let m = a `mod` b
      d = a `div` b
  in a == b * d + m    -- (1)
     && abs m < abs b  -- (2)
If b == 0, there doesn't exist any pair (d, m) such that (1) and (2) hold. If we relax this law and throw (2) away, the result of mod isn't necessarily unique anymore. This leads to the following definition:
mod' :: RealFrac a => a -> a -> a
mod' a 0 = a -- this is arbitrary
mod' a b =
  let k = floor $ a / b
  in a - (fromInteger k) * b
However, this is an arbitrary definition. You have to ask yourself, "What do I actually want to do if I cannot use mod in a sane way". Since F# apparently didn't complain about a % 0, have a look at their documentation.
Either way, you cannot use a library mod function, since they aren't defined for a zero denominator.
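One way to keep that decision explicit, rather than burying an arbitrary value inside mod', is to return Maybe and let the caller choose. A sketch only; `safeMod'` is a hypothetical name, not a library function.

```haskell
-- Floored modulo that refuses a zero divisor instead of inventing a result.
-- The law a == b * d + m from prop_mod holds whenever the result is Just m
-- (up to floating-point rounding).
safeMod' :: RealFrac a => a -> a -> Maybe a
safeMod' _ 0 = Nothing
safeMod' a b = Just (a - fromInteger (floor (a / b)) * b)
```

A caller that wants the arbitrary `mod' a 0 = a` convention can write `fromMaybe a (safeMod' a b)` (with `fromMaybe` from Data.Maybe), while a caller that wants F#-like behaviour can substitute whatever value the F# documentation specifies.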
My problem is the following. I have a list of substitutions, including one substitution for each letter of the alphabet, but also some substitutions for groups of more than one letter. For example, in my cipher p becomes b, l becomes w, e becomes i, but le becomes by, and ple becomes memi.
So, while I can think of a few simple/naïve ways of implementing this cipher, they are not very efficient, and I was wondering what the most efficient way to do it would be. The answer doesn't have to be in any particular language; a general structured-English algorithm would be fine, but if it must be in some language I'd prefer C++ or Java or similar.
EDIT: I don't need this cipher to be decipherable, an algorithm that mapped all single letters to the letter 'w' but mapped the string 'had' to the string 'jon' instead should be ok, too (then the string "Mary had a little lamb." would become "Wwww jon w wwwwww wwww.").
I'd like the algorithm to be fully general.
One possible approach is to use a deterministic automaton. The closest commonly used example to your problem is the Aho–Corasick string matching algorithm. The difference is that instead of reporting matches, you emit cipher text on some transitions: at each transition you either emit cipher text or you don't.
In your example
p -> b
l -> w
e -> i
le -> by
ple -> memi
The automaton (in Erlang like pseudocode)
start(p) -> p(next());
start(l) -> l(next());
start(e) -> e(next());
...
p(l) -> pl(next());
p(X) -> emit(b), start(X).
l(e) -> emit(by), start(next());
l(X) -> emit(w), start(X).
e(X) -> emit(i), start(X).
pl(e) -> emit(memi), start(next());
pl(X) -> emit(b), l(X).
If you are not familiar with Erlang: start() and p() are functions, one for each state. Each line with -> is one transition, and the actions follow the ->. emit() is a function that emits cipher text and next() is a function returning the next character. X is a variable standing for any other character.
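The same idea can be sketched in Haskell with a trie (prefix tree) and greedy leftmost-longest matching. This is a simpler, swapped-in technique rather than a full Aho–Corasick automaton; for the example rules it produces the same output as the Erlang-like automaton above. `Trie`, `insertRule`, `buildTrie`, `longestMatch`, `encipher` and `exampleRules` are hypothetical names, and copying characters that have no rule unchanged is an assumption the question leaves open.

```haskell
import qualified Data.Map.Strict as M

-- A trie node: the replacement if some rule ends exactly here,
-- plus the children keyed by the next input character.
data Trie = Trie (Maybe String) (M.Map Char Trie)

emptyTrie :: Trie
emptyTrie = Trie Nothing M.empty

-- Insert one rule such as ("ple", "memi"). Rule keys must be non-empty,
-- otherwise encipher would never consume any input.
insertRule :: (String, String) -> Trie -> Trie
insertRule ([], out) (Trie _ kids) = Trie (Just out) kids
insertRule (c : cs, out) (Trie v kids) =
  Trie v (M.insert c (insertRule (cs, out) child) kids)
  where
    child = M.findWithDefault emptyTrie c kids

buildTrie :: [(String, String)] -> Trie
buildTrie = foldr insertRule emptyTrie

-- Walk down the trie along the input, remembering the longest rule that
-- matched so far together with the input left over after it.
longestMatch :: Trie -> String -> Maybe (String, String)
longestMatch = go Nothing
  where
    go best (Trie v kids) s =
      let best' = maybe best (\out -> Just (out, s)) v
      in case s of
           (c : cs) | Just t <- M.lookup c kids -> go best' t cs
           _ -> best'

-- Greedy leftmost-longest substitution; unmatched characters are copied.
encipher :: Trie -> String -> String
encipher _ [] = []
encipher t s@(c : cs) = case longestMatch t s of
  Just (out, rest) -> out ++ encipher t rest
  Nothing -> c : encipher t cs

exampleRules :: [(String, String)]
exampleRules = [("p", "b"), ("l", "w"), ("e", "i"), ("le", "by"), ("ple", "memi")]
```

With `t = buildTrie exampleRules`, `encipher t "apple"` gives `"abmemi"` and `encipher t "pl"` gives `"bw"`. Each position walks at most as far as the longest rule, so the cost is O(n·m) for input length n and longest rule length m; a full automaton avoids re-reading characters but is more work to build.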
I have a question about algorithm design.
Suppose an encoder has three bitstreams S, R and B, as below, and produces the output D.
My question is how the decoder can derive S or B given only D and R.
My initial idea, using the xor (^) operation, is below; here D=1001.
Encoder:
S |R |X=S^R |B |D=X^B
0001 |1010 |1011 |0010 |1001
The decoder, however, receives only D and R:
Decoder:
D |R |
1001 |1010 |
But X=S^R and X=B^D, so S^R=B^D, or equivalently D^R = S^B. Hence I have no idea how to derive S and B separately.
Can anyone suggest a concept, an idea, or a known algorithm to deal with this?
Many thanks:)
You can't. D = S ^ B ^ R; given D and R, you can get S ^ B back (as you showed), but the rest of the information is simply gone.
Of course it had to be gone: you have only 8 bits (D and R), and you can't store 12 bits of information (S, R and B) in them. If you could, you could apply that transformation recursively until you had compressed an arbitrary amount of information down to a single byte.
But maybe you have some a priori knowledge about S and B, or some useful relation between them or between one of them and R. If you have enough such knowledge, you might be able to reconstruct S and B (for example, if B = R, then D = S, so you know them all).
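A brute-force sketch of the counting argument for the 4-bit example: for an observed D and R, each of the 16 possible values of S pairs with exactly one B, so the decoder cannot tell them apart. `candidates` and `narrow` are hypothetical names; the predicate passed to `narrow` stands in for whatever a priori knowledge you might have.

```haskell
import Data.Bits (xor)

-- Every (S, B) pair of 4-bit values consistent with D = S ^ R ^ B.
candidates :: Int -> Int -> [(Int, Int)]
candidates d r = [(s, s `xor` d `xor` r) | s <- [0 .. 15]]

-- Extra knowledge, expressed as a predicate on (S, B), shrinks the set.
narrow :: ((Int, Int) -> Bool) -> Int -> Int -> [(Int, Int)]
narrow known d r = filter known (candidates d r)
```

With D = 1001 (9) and R = 1010 (10), `candidates 9 10` has 16 entries, one of which is the true pair (0001, 0010). Knowing B = R leaves exactly one: `narrow (\(_, b) -> b == 10) 9 10` is `[(9, 10)]`, i.e. S = D.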
I wanted to write an efficient implementation of the Floyd-Warshall all pairs shortest path algorithm in Haskell using Vectors to hopefully get good performance.
The implementation is quite straightforward, but instead of using a 3-dimensional |V|×|V|×|V| matrix, a 2-dimensional vector is used, since we only ever read the previous k value.
Thus, the algorithm is really just a series of steps where a 2D vector is passed in, and a new 2D vector is generated. The final 2D vector contains the shortest paths between all nodes (i,j).
My intuition told me that it would be important to make sure that the previous 2D vector was evaluated before each step, so I used BangPatterns on the prev argument to the fw function and the strict foldl':
{-# Language BangPatterns #-}
import Control.DeepSeq
import Control.Monad (forM_)
import Data.List (foldl')
import qualified Data.Map.Strict as M
import Data.Vector (Vector, (!), (//))
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as V hiding (length, replicate, take)

type Graph = Vector (M.Map Int Double)
type TwoDVector = Vector (Vector Double)

infinity :: Double
infinity = 1/0

-- calculate shortest path between all pairs in the given graph, if there are
-- negative cycles, return Nothing
allPairsShortestPaths :: Graph -> Int -> Maybe TwoDVector
allPairsShortestPaths g v =
  let initial = fw g v V.empty 0
      results = foldl' (fw g v) initial [1..v]
  in if negCycle results
       then Nothing
       else Just results
  where -- check for negative elements along the diagonal
        negCycle a = any not $ map (\i -> a ! i ! i >= 0) [0..(V.length a-1)]

-- one step of the Floyd-Warshall algorithm
fw :: Graph -> Int -> TwoDVector -> Int -> TwoDVector
fw g v !prev k = V.create $ do -- ← bang
  curr <- V.new v
  forM_ [0..(v-1)] $ \i ->
    V.write curr i $ V.create $ do
      ivec <- V.new v
      forM_ [0..(v-1)] $ \j -> do
        let d = distance g prev i j k
        V.write ivec j d
      return ivec
  return curr

distance :: Graph -> TwoDVector -> Int -> Int -> Int -> Double
distance g _ i j 0 -- base case; 0 if same vertex, edge weight if neighbours
  | i == j    = 0.0
  | otherwise = M.findWithDefault infinity j (g ! i)
distance _ a i j k = let c1 = a ! i ! j
                         c2 = (a ! i ! (k-1))+(a ! (k-1) ! j)
                     in min c1 c2
However, when I run this program on a 1000-node graph with 47978 edges, things do not look good at all: memory usage is very high and the program takes far too long to run. The program was compiled with ghc -O2.
I rebuilt the program for profiling, and limited the number of iterations to 50:
results = foldl' (fw g v) initial [1..50]
I then ran the program with +RTS -p -hc and +RTS -p -hd:
This is... interesting, but I guess it's showing that it's accumulating tonnes of thunks. Not good.
Ok, so after a few shots in the dark, I added a deepseq in fw to make sure prev really is evaluated:
let d = prev `deepseq` distance g prev i j k
Now things look better, and I can actually run the program to completion with constant memory usage. It's obvious that the bang on the prev argument was not enough.
For comparison with the previous graphs, here is the memory usage for 50 iterations after adding the deepseq:
Ok, so things are better, but I still have some questions:
Is this the correct solution for this space leak? Am I wrong in feeling that inserting a deepseq is a bit ugly?
Is my usage of Vectors here idiomatic/correct? I'm building a completely new vector for every iteration and hoping that the garbage collector will delete the old Vectors.
Are there any other things I could do to make this run faster with this approach?
For references, here is graph.txt: http://sebsauvage.net/paste/?45147f7caf8c5f29#7tiCiPovPHWRm1XNvrSb/zNl3ujF3xB3yehrxhEdVWw=
Here is main:
main = do
  ls <- fmap lines $ readFile "graph.txt"
  let numVerts = head . map read . words . head $ ls
  let edges = map (map read . words) (tail ls)
  let g = V.create $ do
        g' <- V.new numVerts
        forM_ [0..(numVerts-1)] (\idx -> V.write g' idx M.empty)
        forM_ edges $ \[f,t,w] -> do
          -- subtract one from vertex IDs so we can index directly
          curr <- V.read g' (f-1)
          V.write g' (f-1) $ M.insert (t-1) (fromIntegral w) curr
        return g'
  let a = allPairsShortestPaths g numVerts
  case a of
    Nothing -> putStrLn "Negative cycle detected."
    Just a' -> do
      putStrLn $ "The shortest, shortest path has length "
              ++ show ((V.minimum . V.map V.minimum) a')
First, some general code cleanup:
In your fw function, you explicitly allocate and fill mutable vectors. However, there is a premade function for this exact purpose, namely generate. fw can therefore be rewritten as
V.generate v (\i -> V.generate v (\j -> distance g prev i j k))
Similarly, the graph generation code can be replaced with replicate and accum:
let parsedEdges = map (\[f,t,w] -> (f - 1, (t - 1, fromIntegral w))) edges
let g = V.accum (flip (uncurry M.insert)) (V.replicate numVerts M.empty) parsedEdges
Note that this totally removes all need for mutation, without losing any performance.
Now, to the actual questions:
In my experience, deepseq is very useful, but only as a quick fix for space leaks like this one. The fundamental problem is not that you need to force the results after you've produced them. Instead, the use of deepseq implies that you should have been building the structure more strictly in the first place. In fact, if you add a bang pattern in your vector creation code like so:
let !d = distance g prev i j k
Then the problem is fixed without deepseq. Note that this doesn't work with the generate code, because, for some reason (I might create a feature request for this), vector does not provide strict functions for boxed vectors. However, when I get to unboxed vectors in answer to question 3, which are strict, both approaches work without strictness annotations.
As far as I know, the pattern of repeatedly generating new vectors is idiomatic. The only thing not idiomatic is the use of mutability - except when they are strictly necessary, mutable vectors are generally discouraged.
There are a couple of things to do:
Most simply, you can replace Map Int with IntMap. As that isn't really the slow point of the function, this doesn't matter too much, but IntMap can be much faster for heavy workloads.
You can switch to using unboxed vectors. Although the outer vector has to remain boxed, as vectors of vectors can't be unboxed, the inner vector can be. This also solves your strictness problem - because unboxed vectors are strict in their elements, you don't get a space leak. Note that on my machine, this improves the performance from 4.1 seconds to 1.3 seconds, so the unboxing is very helpful.
You can flatten the vector into a single one and use multiplication and division to convert between two-dimensional and one-dimensional indices. I don't recommend this, as it is a bit involved, quite ugly, and, due to the division, actually slows down the code on my machine.
You can use repa. This has the huge advantage of automatically parallelizing your code. Note that, since repa flattens its arrays and apparently doesn't eliminate the divisions needed to fill them (this is possible with nested loops, but I think it uses a single loop and a division), it has the same performance penalty I mentioned above, bringing the runtime from 1.3 seconds to 1.8. However, if you enable parallelism and use a multicore machine, you start seeing some benefits. Unfortunately, your current test case is too small to see much benefit, so on my 6-core machine I see it drop back down to 1.2 seconds. If I increase the size back to [1..v] instead of [1..50], parallelism brings it from 32 seconds to 13. Presumably, with a larger input you would see more benefit.
If you're interested, I've posted my repa-ified version here.
EDIT: Use -fllvm. Testing on my computer with repa, I get 14.7 seconds without parallelism, which is almost as good as without -fllvm and with parallelism. In general, LLVM handles array-based code like this very well.
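A minimal sketch of the unboxed-inner-vector suggestion above, also using the IntMap and replicate/accum suggestions, and assuming the vector and containers packages. `buildGraph`, `forceRows`, `base`, `step` and `allPairs` are hypothetical names, not taken from either the question or the answer. As a precaution, the rows are forced explicitly with seq: an outer boxed vector built with V.generate or V.map would otherwise hold each row as an unevaluated thunk that keeps the previous iteration alive.

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

-- Adjacency maps with 0-based vertex ids (IntMap instead of Map Int).
type Graph = V.Vector (IM.IntMap Double)

-- Boxed outer vector of unboxed rows of distances.
type Dist = V.Vector (U.Vector Double)

infinity :: Double
infinity = 1 / 0

-- Build the graph from 0-based (from, to, weight) triples via replicate + accum.
buildGraph :: Int -> [(Int, Int, Double)] -> Graph
buildGraph n es =
  V.accum (\m (t, w) -> IM.insert t w m) (V.replicate n IM.empty)
    [(f, (t, w)) | (f, t, w) <- es]

-- Force every row to weak head normal form. For an unboxed vector that means
-- fully evaluated, so no chain of thunks survives from one k to the next.
forceRows :: Dist -> Dist
forceRows d = V.foldr seq () d `seq` d

-- k = 0: 0 on the diagonal, the edge weight for neighbours, infinity otherwise.
base :: Graph -> Dist
base g = forceRows (V.generate n row)
  where
    n = V.length g
    row i = U.generate n $ \j ->
      if i == j then 0 else IM.findWithDefault infinity j (g V.! i)

-- One step: additionally allow vertex (k - 1) as an intermediate vertex,
-- mirroring the question's distance function.
step :: Dist -> Int -> Dist
step prev k = forceRows (V.map row prev)
  where
    viaK = prev V.! (k - 1)
    row ri =
      let dik = ri U.! (k - 1)
      in U.imap (\j dij -> min dij (dik + viaK U.! j)) ri

-- Nothing if a diagonal entry becomes negative (a negative cycle).
allPairs :: Graph -> Maybe Dist
allPairs g
  | any (\i -> (d V.! i) U.! i < 0) [0 .. n - 1] = Nothing
  | otherwise = Just d
  where
    n = V.length g
    d = foldl' step (base g) [1 .. n]
```

For example, with edges 0→1 (4), 1→2 (1) and 0→2 (10), the distance from 0 to 2 comes out as 5. Whether this matches the timings quoted above depends on the machine and compiler flags; it is meant to show the structure, not to reproduce those numbers.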