Verify whether the following answer is correct - computation-theory

I am asked to write a grammar that generates the following language over the alphabet Z = {a, b}:
M = {w | the number of b's in w is 3 modulo 4}
My Answer is
S->bP| bJ | aS
P->bQ | bK | aP
Q->bR | bL | aQ
R->bS | e | aS
L->e
will this work?
Can we make it better?

Not sure what J, K and L are. But yes, you can probably do better; a DFA with 4 states can recognize your language, so there's definitely a regular grammar with four nonterminals:
S -> aS | bR
R -> aR | bT
T -> aT | bU | b
U -> aU | bS | a
This works because states S, R, T and U correspond to having seen 0, 1, 2 and 3 instances of b, modulo 4. Seeing an a leaves you in whatever state you were in before, while seeing a b takes you to the next state. Because state U is accepting we add U -> e, and then remove the empty production in the usual way.
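As a quick sanity check on this construction, here is a sketch in Python (the function name is mine) that simulates the corresponding 4-state DFA and compares it against the defining condition on every short string over {a, b}:

```python
# Simulate the 4-state DFA: states 0..3 = number of b's seen mod 4,
# with state 3 accepting. Compare against the defining condition
# for all strings over {a, b} up to length 8.
from itertools import product

def dfa_accepts(w):
    state = 0
    for ch in w:
        if ch == 'b':
            state = (state + 1) % 4
    return state == 3

for n in range(9):
    for letters in product('ab', repeat=n):
        w = ''.join(letters)
        assert dfa_accepts(w) == (w.count('b') % 4 == 3)
print("DFA matches the language on all strings up to length 8")
```

Each DFA state maps directly to one of the nonterminals S, R, T, U, which is why the 4-nonterminal regular grammar suffices.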

Max sum in a vector, subject to a condition defined by another vector

I have 2 columns in a data.frame: a and b.
I need to find the maximum sum of a over rows whose b values sum to x.
For example:
|  a  | b |
|-----|---|
| 401 | 2 |
| 380 | 3 |
| 380 | 2 |
| 370 | 1 |
So, for sum(b) = 1, max(sum(a)) = 370; for sum(b) = 2, max(sum(a)) = 401, and so on.
How can I find a solution to this problem?
I am not sure whether this problem can be solved using linear programming.
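No answer is given above, but one natural attack is a subset-sum style dynamic program: for every achievable total of b, keep the best achievable total of a. A sketch in Python (the function name is mine, and I assume rows may be combined freely and b values are small non-negative integers):

```python
# For each achievable total of b, keep the best (maximum) total of a,
# via a subset-sum style dynamic program over the rows.
def max_a_for_b_sum(rows, x):
    """rows: list of (a, b) pairs; returns the max sum of a over
    subsets whose b values sum exactly to x, or None if impossible."""
    best = {0: 0}  # b-total -> best a-total
    for a, b in rows:
        # snapshot the items so each row is used at most once
        for b_tot, a_tot in list(best.items()):
            cand_b, cand_a = b_tot + b, a_tot + a
            if best.get(cand_b, float('-inf')) < cand_a:
                best[cand_b] = cand_a
    return best.get(x)

rows = [(401, 2), (380, 3), (380, 2), (370, 1)]
print(max_a_for_b_sum(rows, 1))  # 370
print(max_a_for_b_sum(rows, 2))  # 401
```

This is essentially a knapsack with an equality constraint, which is why a plain linear program is an awkward fit but integer/dynamic programming handles it directly.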

Dynamic Programming Algorithm: Walking on Grid

I was given a few practice questions for an upcoming exam. I have already been given the solution to this problem, which is described in the picture here, but there really is no explanation of the solution.
I am curious how I can arrive at the answer. I figure I can create a bunch of subproblems, like the traversals A->C, A->D and A->E, and then figure out A->B based on the previous solutions. But I am quite lost.
First, how many ways are there to get from (0,0) to (x, y) using only R = right and U = up steps? (No restriction yet on avoiding other points.) Each such path will have length x+y and contain x R's and y U's, so there are binom(x+y, x) = binom(x+y, y) ways to do this.
Using this information you can calculate how many paths there are from A to B (call this nAB), from A to C (call it nAC), from A to D, etc., for all pairs of points. Note that there are no paths from C to D (since you cannot go down) and no paths from D to C (since you cannot go left).
Now, use inclusion-exclusion. The idea is to subtract off the bad cases. For example, a bad case is starting at A, going through C, and then to B. How many ways can this be done? nAC x nCB. Another bad case is going through E; there are nAE x nEB ways to do this, so subtract them off. Going through D is also bad; there are nAD x nDB ways to end up at B going through D, so subtract those off too. Now the problem is that you've subtracted off too much (paths that go through two of the bad points), so add those back in. How many paths go through C and E and end at B? nAC x nCE x nEB; add those in. How many paths go through D and E? nAD x nDE x nEB; add those in. In principle, you'd then subtract the paths that go through all three bad points, but there are none of those.
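As a concrete sketch of this computation, here is a Python version. The coordinates are an assumption on my part (one consistent reading of the figure): A = (0,0), B = (4,4), forbidden points C = (1,3), D = (3,1), E = (3,3).

```python
# Inclusion-exclusion count, using binomials for unrestricted
# lattice-path counts. Coordinates are assumed, not from the figure:
# A = (0,0), B = (4,4), forbidden C = (1,3), D = (3,1), E = (3,3).
from math import comb

def n_paths(p, q):
    """Right/up paths from p to q; 0 if q is not up-and-right of p."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    if dx < 0 or dy < 0:
        return 0
    return comb(dx + dy, dx)

A, B = (0, 0), (4, 4)
C, D, E = (1, 3), (3, 1), (3, 3)

total = n_paths(A, B)
# subtract paths through each single forbidden point
for P in (C, D, E):
    total -= n_paths(A, P) * n_paths(P, B)
# add back paths through two forbidden points
# (C->D and D->C contribute 0, as noted above)
for P, Q in ((C, E), (D, E)):
    total += n_paths(A, P) * n_paths(P, Q) * n_paths(Q, B)
# the triple C,D,E term is 0, since no path connects C and D
print(total)
```

With these assumed coordinates the count comes out to 14, matching the grid worked out below by dynamic programming.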
This problem can be solved by hand in a way very similar to how you calculate the numbers in Pascal's triangle. In that triangle, every number is the sum of the two numbers above it. Similarly, in your case, the number of ways to get to a certain point (using the shortest path), is simply the sum of the number of ways to get to the point to its left and to the one below it. Demonstrated with ASCII art:
1 -> 1 -> 7 -> 7 ->14
^         ^         ^
|         |         |
1    C    6    E    7
^         ^         ^
|         |         |
1 -> 3 -> 6 -> 6 -> 7
^    ^    ^         ^
|    |    |         |
1 -> 2 -> 3    D    1
^    ^    ^         ^
|    |    |         |
A -> 1 -> 1 -> 1 -> 1
So to calculate the number of paths to a certain field, you first need the answers to the two sub-problems to its left and below it. This is a textbook example of a problem that can be solved using dynamic programming. The only tricky bit in implementing this is how you handle the forbidden points and the edges. One way to do this in practice is to initialize all the edge cells and the forbidden points to zero, and point A to 1:
0 ->
^
|
0 ->      0 ->      0 ->
^
|
0 ->
^
|
0 ->                0 ->
^
|
     1 ->
          ^    ^    ^    ^
          |    |    |    |
          0    0    0    0
From there on, you can calculate all the missing fields using a simple sum, starting from point A in the lower left, and working your way up to point B in the top right:
0 -> 1 -> 1 -> 7 -> 7 ->14
     ^    ^    ^    ^    ^
     |    |    |    |    |
0 -> 1    0 -> 6    0 -> 7
     ^         ^         ^
     |         |         |
0 -> 1 -> 3 -> 6 -> 6 -> 7
     ^    ^    ^    ^    ^
     |    |    |    |    |
0 -> 1 -> 2 -> 3    0 -> 1
     ^    ^    ^         ^
     |    |    |         |
     1 -> 1 -> 1 -> 1 -> 1
          ^    ^    ^    ^
          |    |    |    |
          0    0    0    0
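The grid walk above can be sketched in a few lines of Python. The grid size and forbidden coordinates are my reading of the diagrams (A = (0,0), B = (4,4), forbidden points at (1,3), (3,1) and (3,3)):

```python
# Dynamic program over the grid: each cell holds the number of
# right/up paths from A to it; forbidden cells stay at zero.
def count_paths(width, height, forbidden):
    ways = [[0] * width for _ in range(height)]
    ways[0][0] = 1  # point A at the lower left
    for y in range(height):
        for x in range(width):
            if (x, y) in forbidden or (x, y) == (0, 0):
                continue
            left = ways[y][x - 1] if x > 0 else 0
            below = ways[y - 1][x] if y > 0 else 0
            ways[y][x] = left + below  # sum of the two sub-problems
    return ways[height - 1][width - 1]

print(count_paths(5, 5, {(1, 3), (3, 1), (3, 3)}))  # 14, as in the grid
```

Instead of padding with explicit zero rows and columns, this version treats out-of-range neighbours as zero, which is the same initialization trick expressed in code.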

Why SRP is not plaintext-equivalent?

About the SRP Protocol:
http://en.wikipedia.org/wiki/Secure_remote_password_protocol
I can see that the generation of the session key (K) is perfectly safe, but in the last step the user sends a proof of K (M). If the network is insecure and an attacker in the middle captures M, would he not be able to authenticate without having K?
A little background
Well known values (established beforehand):
n A large prime number. All computations are performed modulo n.
g A primitive root modulo n (often called a generator).
The user's password is established as:
x = H(s, P)
v = g^x
H() One-way hash function
s A random string used as the user's salt
P The user's password
x A private key derived from the password and salt
v The host's password verifier
The authentication:
+---+------------------------+--------------+----------------------+
|   | Alice                  | Public Wire  | Bob                  |
+---+------------------------+--------------+----------------------+
| 1 |                        | C -->        | (lookup s, v)        |
| 2 | x = H(s, P)            | <-- s        |                      |
| 3 | A = g^a                | A -->        |                      |
| 4 |                        | <-- B, u     | B = v + g^b          |
| 5 | S = (B - g^x)^(a + ux) |              | S = (A · v^u)^b      |
| 6 | K = H(S)               |              | K = H(S)             |
| 7 | M[1] = H(A, B, K)      | M[1] -->     | (verify M[1])        |
| 8 | (verify M[2])          | <-- M[2]     | M[2] = H(A, M[1], K) |
+---+------------------------+--------------+----------------------+
u Random scrambling parameter, publicly revealed
a,b Ephemeral private keys, generated randomly and not publicly revealed
A,B Corresponding public keys
m,n The two quantities (strings) m and n concatenated
S Calculated exponential value
K Session key
The answer to your question:
As you can see, both parties calculate K (the session key) separately, based on the values available to each of them.
If Alice's password P entered in step 2 matches the one she originally used to generate v, then both values of S will match.
The actual session key K, however, is never sent over the wire; only proof that both parties have successfully calculated the same session key. So a man-in-the-middle could resend the proof, but since he does not have the actual session key, he would not be able to do anything with the intercepted data.
The proof is only valid for a particular K.
Without MITM:
Alice <-K-> Bob
Alice produces a proof for K and Bob accepts it
With MITM:
Alice <-K1-> Eve <-K2-> Bob
Alice produces a proof for K1 but when Eve presents it to Bob he doesn't accept it because it doesn't fit K2.
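The shared-key computation in the table can also be checked numerically. Here is a toy run in Python with deliberately tiny, insecure parameters (n = 23, g = 5, and made-up values for x, a, b, u); real deployments use large safe primes and hash the inputs as shown above.

```python
# Toy SRP key agreement with tiny, INSECURE demo parameters,
# following the table above (classic SRP: B = v + g^b).
n, g = 23, 5                 # small prime and generator -- demo only!

x = 6                        # stand-in for x = H(s, P)
v = pow(g, x, n)             # password verifier stored by Bob

a, b, u = 3, 7, 4            # ephemeral secrets and scrambling parameter
A = pow(g, a, n)             # Alice's public key
B = (v + pow(g, b, n)) % n   # Bob's public key

# Alice:  S = (B - g^x)^(a + u*x) mod n
S_alice = pow((B - pow(g, x, n)) % n, a + u * x, n)
# Bob:    S = (A * v^u)^b mod n
S_bob = pow(A * pow(v, u, n), b, n)

print(S_alice, S_bob)        # identical, as the algebra guarantees
assert S_alice == S_bob
```

The algebra behind it: B - g^x = g^b when v = g^x, so Alice computes g^(b(a+ux)), while Bob computes (g^a · g^(xu))^b, which is the same quantity.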

Why is (a | b) equivalent to a - (a & b) + b?

I was looking for a way to do a BITOR() with an Oracle database and came across a suggestion to just use BITAND() instead, replacing BITOR(a,b) with a + b - BITAND(a,b).
I tested it by hand a few times and verified that it seems to work for all the binary numbers I could think of, but I can't come up with a quick mathematical proof of why it is correct.
Could somebody enlighten me?
A & B is the set of bits that are on in both A and B. A - (A & B) leaves you with all those bits that are only on in A. Add B to that, and you get all the bits that are on in A or those that are on in B.
Simple addition of A and B won't work because of carrying where both have a 1 bit. By removing the bits common to A and B first, we know that (A-(A&B)) will have no bits in common with B, so adding them together is guaranteed not to produce a carry.
Imagine you have two binary numbers, a and b, and suppose these numbers never have a 1 in the same bit position: if a has a 1 in some bit, then b always has a 0 in the corresponding bit, and vice versa. For example:
a = 00100011
b = 11000100
This would be an example of a and b satisfying the above condition. In this case it is easy to see that a | b would be exactly the same as a + b.
a | b = 11100111
a + b = 11100111
Let's now take two numbers that violate our condition, i.e. two numbers that have at least one common bit set to 1:
a = 00100111
b = 11000100
Is a | b the same as a + b in this case? No
a | b = 11100111
a + b = 11101011
Why are they different? When we add bits that are 1 in both numbers, we produce a so-called carry: the resulting bit is 0, and a 1 is carried to the next bit to the left: 1 + 1 = 10. The | operation has no carry, so 1 | 1 is again just 1.
This means that a | b and a + b differ when and only when the numbers have at least one common 1 bit. When we sum two numbers with a 1 in a common bit, that common bit gets added "twice" and produces a carry, which ruins the correspondence between a | b and a + b.
Now look at a & b. What does a & b calculate? a & b produces the number that has 1 in all bits where both a and b have 1. In our latest example
a = 00100111
b = 11000100
a & b = 00000100
As you saw above, these are exactly the bits that make a + b differ from a | b. The 1 bits in a & b indicate all positions where a carry will occur.
Now, when we do a - (a & b) we effectively remove (subtract) all "offending" bits from a and only such bits
a - (a & b) = 00100011
The numbers a - (a & b) and b have no common 1 bits, which means that if we add a - (a & b) and b we won't run into a carry and, if you think about it, we end up with the same result as if we had just done a | b:
a - (a & b) + b = 11100111
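If you'd rather check than prove, a brute-force verification (a Python sketch) confirms the identity over every pair of 8-bit values:

```python
# Exhaustive sanity check of the identity a | b == a - (a & b) + b
# over all pairs of 8-bit numbers.
for a in range(256):
    for b in range(256):
        assert (a | b) == a - (a & b) + b
print("identity holds for all 8-bit pairs")
```

Since the identity is bitwise in nature (each carry is eliminated before the addition), checking all 8-bit pairs is strong evidence it holds for any width.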
A&B = C where any bits left set in C are those set in both A and in B.
Either A-C = D or B-C = E sets just these common bits to 0. There is no carrying effect because 1-1=0.
D+B or E+A is similar to A+B except that because we subtracted A&B previously there will be no carry due to having cleared all commonly set bits in D or E.
The net result is that A-A&B+B or B-A&B+A is equivalent to A|B.
Here's a truth table if it's still confusing:
A | B | OR    A | B | &    A | B | -     A | B | +
--+---+---    --+---+--    --+---+---    --+---+---
0 | 0 | 0     0 | 0 | 0    0 | 0 | 0     0 | 0 | 0
0 | 1 | 1     0 | 1 | 0    0 | 1 | 0-1   0 | 1 | 1
1 | 0 | 1     1 | 0 | 0    1 | 0 | 1     1 | 0 | 1
1 | 1 | 1     1 | 1 | 1    1 | 1 | 0     1 | 1 | 1+1
Notice the carry rows in the + and - operations; we avoid those because A - (A & B) sets the cases where both bits in A and B are 1 to 0 in A. Adding B back then restores the cases where there was a 1 in either A or B but not both, so the OR truth table and the A - (A & B) + B truth table are identical.
Another way to eyeball it is to see that A + B is almost like A | B except for the carry in the bottom row. A & B isolates that bottom row for us, A - (A & B) moves those isolated cases out of the carry row, and (A - (A & B)) + B becomes equivalent to A | B.
While you could commute this to A + B - (A & B), I was afraid of a possible overflow, but that fear seems unjustified:
#include <stdio.h>

int main(void) {
    unsigned int a = 0xC0000000, b = 0xA0000000;
    printf("%x %x %x %x\n", a, b, a | b, a & b);
    printf("%x %x %x %x\n", a + b, a - (a & b), a - (a & b) + b, a + b - (a & b));
    return 0;
}
c0000000 a0000000 e0000000 80000000
60000000 40000000 e0000000 e0000000
Edit: So I wrote this before there were answers, then there was some 2 hours of down time on my home connection, and I finally managed to post it, noticing only afterwards that it'd been properly answered twice. Personally I prefer referring to a truth table to work out bitwise operations, so I'll leave it in case it helps someone.

Apriori Algorithm

I've heard about the Apriori algorithm several times before but never got the time or the opportunity to dig into it, can anyone explain to me in a simple way the workings of this algorithm? Also, a basic example would make it a lot easier for me to understand.
Apriori Algorithm
It is a candidate-generation-and-test approach for frequent pattern mining in datasets. There are two things you have to remember.
Apriori pruning principle - If an itemset is infrequent, none of its supersets should be generated or tested.
Apriori property - A given (k+1)-itemset is a candidate (k+1)-itemset only if every one of its k-itemset subsets is frequent.
Now, here is the apriori algorithm in 4 steps.
Initially, scan the database/dataset once to get the frequent 1-itemset.
Generate length k+1 candidate itemsets from length k frequent itemsets.
Test the candidates against the database/dataset.
Terminate when no frequent or candidate set can be generated.
Solved Example
Suppose there is a transaction database as follows, with 4 transactions, their transaction IDs, and the items bought in them. Assume the minimum support min_sup is 2. The support of an itemset is the number of transactions in which that itemset appears.
Transaction DB
tid | items
-------------
10 | A,C,D
20 | B,C,E
30 | A,B,C,E
40 | B,E
Now, let's create the candidate 1-itemsets with a 1st scan of the DB. We call this set C_1:
itemset | sup
-------------
{A} | 2
{B} | 3
{C} | 3
{D} | 1
{E} | 3
If we test this against min_sup, we can see that {D} does not satisfy the min_sup of 2. So it will not be included in the frequent 1-itemset, which we call L_1:
itemset | sup
-------------
{A} | 2
{B} | 3
{C} | 3
{E} | 3
Now, let's scan the DB a 2nd time and generate the candidate 2-itemsets, which we call C_2:
itemset | sup
-------------
{A,B} | 1
{A,C} | 2
{A,E} | 1
{B,C} | 2
{B,E} | 3
{C,E} | 2
As you can see, the itemsets {A,B} and {A,E} do not satisfy the min_sup of 2, and hence they will not be included in the frequent 2-itemset L_2:
itemset | sup
-------------
{A,C} | 2
{B,C} | 2
{B,E} | 3
{C,E} | 2
Now let's do a 3rd scan of the DB and count the candidate 3-itemsets, C_3:
itemset | sup
-------------
{A,B,C} | 1
{A,B,E} | 1
{A,C,E} | 1
{B,C,E} | 2
You can see that {A,B,C}, {A,B,E} and {A,C,E} do not satisfy the min_sup of 2, so they will not be included in the frequent 3-itemset L_3. (In fact, by the pruning principle they need never be generated at all, since each of them contains the infrequent 2-itemset {A,B} or {A,E}.)
itemset | sup
-------------
{B,C,E} | 2
Now, finally, we can calculate the support (supp), confidence (conf) and lift (interestingness) values of the association rules that can be generated from the itemset {B,C,E}:
Rule       | supp | conf   | lift
-----------+------+--------+-----
B -> C & E | 50%  | 66.67% | 1.33
E -> B & C | 50%  | 66.67% | 1.33
C -> E & B | 50%  | 66.67% | 0.89
B & C -> E | 50%  | 100%   | 1.33
E & B -> C | 50%  | 66.67% | 0.89
C & E -> B | 50%  | 100%   | 1.33
(Here lift(X -> Y) = conf(X -> Y) / supp(Y); for example, for C -> E & B it is (2/3) / (3/4) ≈ 0.89, and likewise for E & B -> C.)
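The walkthrough above can be reproduced with a short level-wise implementation. This is a sketch in Python (function names are mine); note that with the prune step applied, the doomed 3-itemsets {A,B,C}, {A,B,E} and {A,C,E} are never even counted:

```python
# Level-wise Apriori on the 4-transaction example, min_sup = 2.
from itertools import combinations

transactions = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
min_sup = 2

def support(itemset):
    # number of transactions containing the itemset
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_sup):
    items = sorted({i for t in transactions for i in t})
    # L_1: frequent 1-itemsets
    L = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_sup}]
    while L[-1]:
        # join step: unions of frequent k-itemsets giving (k+1)-itemsets
        candidates = {a | b for a in L[-1] for b in L[-1] if len(a | b) == len(a) + 1}
        # prune step: every k-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in L[-1] for s in combinations(c, len(c) - 1))}
        L.append({c for c in candidates if support(c) >= min_sup})
    return [lvl for lvl in L if lvl]

for k, level in enumerate(apriori(transactions, min_sup), start=1):
    print(k, sorted(sorted(s) for s in level))
```

Running this reproduces L_1 = {A}, {B}, {C}, {E}; L_2 = {A,C}, {B,C}, {B,E}, {C,E}; and L_3 = {B,C,E}, matching the tables above.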
See Top 10 algorithms in data mining (free access) or The Top Ten Algorithms in Data Mining. The latter gives a detailed description of the algorithm, together with details on how to get optimized implementations.
Well, I would assume you've read the Wikipedia entry, but you said "a basic example would make it a lot easier for me to understand". Wikipedia has just that, so I'll assume you haven't read it and suggest that you do.
Read the Wikipedia article.
The best introduction to Apriori can be downloaded from this book:
http://www-users.cs.umn.edu/~kumar/dmbook/index.php
You can download chapter 6 for free, which explains Apriori very clearly.
Moreover, if you want to download a Java version of Apriori and other algorithms for frequent itemset mining, you can check my website:
http://www.philippe-fournier-viger.com/spmf/
