Prove that the language is context-free - formal-languages

How can we design a PDA for the language
L= {w | number of 010s in w is more than the number of 101s}
to prove that it is context-free?

Use the stack as a counter for the difference between the number of occurrences of 010 and of 101 read so far. For example, push A's to count the 010s when they have occurred more often, and B's for the 101s.
In your state, remember the last two symbols you have seen. If, together with the currently read symbol, they form 010, you:
Put an A on the stack if the topmost stack symbol is an A or the stack is empty.
Else remove a B from the stack.
Analogously for 101.
So when you finish reading the input, the word is accepted if there is one or more A on the stack.
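To make the counting idea concrete, here is a small C++ sketch that simulates the PDA's stack with a signed counter (each A as +1, each B as -1); the finite control that remembers the last two symbols becomes a sliding three-symbol window over the input. The function name accepts is mine, for illustration only:

#include <cassert>
#include <string>

bool accepts(const std::string& w) {
    int diff = 0;  // (#occurrences of "010") - (#occurrences of "101")
    for (std::size_t i = 0; i + 3 <= w.size(); ++i) {
        if (w.compare(i, 3, "010") == 0) ++diff;       // like pushing an A (or popping a B)
        else if (w.compare(i, 3, "101") == 0) --diff;  // like pushing a B (or popping an A)
    }
    return diff > 0;  // accept iff at least one A remains on the stack
}

int main() {
    assert(accepts("0100"));    // one 010, no 101
    assert(!accepts("10101"));  // two 101s, one 010
    assert(!accepts("0101"));   // one of each: not strictly more
}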

Related

What is the quickest algorithm to merge input symbols and remove blanks in a Turing machine

I am looking for an algorithm that would let me start at the rightmost symbol of an input string containing interior blanks and merge the input symbols so that no blanks remain.
For example, 0 _ 0 _ 0 _ into 000.
I am aware of methods, shown in the form of state diagrams, for merging input symbols when there are no blanks in between, but I would like to know a good way to do it when this is not the case.
More examples of what I mean by removing input symbols in between (assuming an alphabet of 1, 0 for now):
Example input: 10011 output: 101
Example input: 11101101 output: 1110
I am not sure you can do that with only 2 alphabet symbols. The basic idea is to use an additional (blank) symbol in the tape alphabet and treat it as a skip character when moving along the tape.
If you know how to solve the initial problem, for example 0_1_0, you can convert an arbitrary tape containing only 0 and 1 into a tape with blanks and then proceed from there.
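Here is a rough C++ sketch of the compaction idea on an array-backed tape: scan once, copying every non-blank symbol down to the next free cell. A real Turing machine would implement the same idea with head movements and states rather than two indices; mergeBlanks is a hypothetical name for illustration:

#include <iostream>
#include <string>

std::string mergeBlanks(std::string tape) {
    std::size_t write = 0;
    for (std::size_t read = 0; read < tape.size(); ++read)
        if (tape[read] != '_')
            tape[write++] = tape[read];  // copy symbol down to the next free cell
    tape.erase(write);                   // on a TM: overwrite the tail with blanks
    return tape;
}

int main() {
    std::cout << mergeBlanks("0_0_0_") << '\n';  // prints 000
}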

In BCPL what does "of" do?

I am trying to understand some ancient code from a DEC PDP10 written in BCPL. A sample of the code is as follows:
test scanner()=S.DOTNAME then
$( word1:=checklook.up(scan.info,S.SFUNC,"unknown Special function [:s]")
D7 of temp:=P1 of word1
scanner()
$) or D7 of temp:=SF.ACTION
What do the "D7 of temp" and "P1 of word1" constructs do in this case?
The unstoppable Martin Richards is continuing to add features to the BCPL language(a), despite the fact that so few people are aware of it(b). Only seven or so questions are tagged bcpl on Stack Overflow but don't get me wrong: I liked this language and I have fond memories of using it back in the '80s.
Some of the things added since the last time I used it are the sub-field operators SLCT and OF. As per the manual on Martin's own site:
An expression of the form K OF E accesses a field of consecutive bits in memory. K must be a manifest constant equal to SLCT length:shift:offset and E must yield a pointer, p say.
The field is contained entirely in the word at position p + offset. It has a bit length of length and is shift bits from the right hand end of the word. A length of zero is interpreted as the longest length possible consistent with shift and the word length of the implementation.
Hence it's a more fine-grained way of accessing parts of memory than just the ! "dereference entire word" operator in that it allows you to get at specific bits within a word.
(a) Including, apparently, a version for the Raspberry Pi, which may finally give me an excuse to break out all those spare Pis I have lying around, and educate the kids about the "good old days".
(b) It was used for at least one MC6809 embedded system I worked on, and formed a non-trivial part of AmigaDOS many moons ago.
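For readers more at home in C-family languages, here is a rough C++ analogue of what K OF E computes, assuming K decomposes into the length, shift and offset described in the manual; selectField is a hypothetical helper that takes the three components separately rather than packed into one manifest constant:

#include <cstdint>
#include <iostream>

// Read the word at p + offset and pull out `length` bits that sit
// `shift` bits from the right-hand end of that word.
uint32_t selectField(const uint32_t* p, unsigned length, unsigned shift,
                     unsigned offset) {
    uint32_t word = p[offset];
    // length 0 means "the rest of the word above `shift`"
    uint32_t mask = (length == 0) ? (~0u >> shift) : ((1u << length) - 1u);
    return (word >> shift) & mask;
}

int main() {
    uint32_t rec[2] = {0, 0x0000ABCDu};
    // 8-bit field, 4 bits from the right, in the word at p + 1 -> prints bc
    std::cout << std::hex << selectField(rec, 8, 4, 1) << '\n';
}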

ZPL - Code 128: understanding better how to use Subsets B and C

I've been getting involved with ZPL (a little bit) for a few days now, so I'm sorry if these questions look stupid.
I have to build a Code 128 bar code, and I finally realized I have to make it as short as possible.
My main question is: is it possible to switch to subset C and then back to B for just 2 digits? I read the documentation, and subset C will read digits from 00 to 99, so in theory it should work; in practice, will it be worth it?
Basically, when I translate a bar code with Zebra Designer and print it to a file, it doesn't bother to switch to subset C for just a couple of digits.
This is the text I need to see in the bar code:
AB1C234D567890123456
Based on the documentation I read, I would build something like this:
FD>:AB1C>523>64D>5567890123456
Instead Zebra Designer does:
FD>:AB1C234D>5567890123456
So the other question is, will the bar code be the same length? Actually, will mine be shorter? [I don't have a printer with me at the moment]
Last question:
Let's say I don't want to spend much time scripting this up, will the following work ok, or will it make the bar code larger?
AB1C>523>64D>556>578>590>512>534>556
So I can just build a very simple script which checks two chars at a time; if they're both digits, it adds >5 to the string.
Thank you :)
Ah, some nice loose terminology. Do you mean couple="exactly 2" or couple="a few"?
Changing from one subset to another takes one code element, so for exactly 2 digits, you'd need one element to change and one to represent the 2 digits in subset C. On the other hand, staying with your original subset would take 2 elements - so no, it's not worth the change.
Further, if you were to change to C for 2 digits and then back to your original, the change would actually be costly - C(12)B = 3 elements whereas 12 would only be 2.
If you repeat the exercise for 4 digits, switching to C generates C(12)(34) = 3 elements against 4 if you stay with what you have; if you have to switch back, it's C(12)(34)B = 4 elements against the same 4 - so no gain.
With 6 or more successive numerics, you gain regardless of whether or not you switch back.
So overall:
2-digit terminal: no difference.
2-digit other: code is longer.
4-digit terminal: code is shorter.
4-digit other: no difference.
More than 4 digits: code is shorter.
And an ODD number of digits would need to be output in code A or B for the first digit and then the above table applies to the remainder.
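If you do end up scripting this, here is a rough C++ sketch of the rule in the table above, using the >5 (switch to subset C) and >6 (switch back to subset B) invocation codes that the question's own ZPL strings already use. Treat encodeField and the exact thresholds as my reading of the analysis, not a tested implementation:

#include <cctype>
#include <iostream>
#include <string>

std::string encodeField(const std::string& data) {
    std::string out;
    std::size_t i = 0;
    while (i < data.size()) {
        std::size_t run = 0;  // length of the digit run starting at i
        while (i + run < data.size() &&
               std::isdigit(static_cast<unsigned char>(data[i + run])))
            ++run;
        if (run == 0) {                         // not a digit: stay in subset B
            out += data[i++];
            continue;
        }
        std::size_t even = run - run % 2;       // subset C encodes digit pairs
        bool terminal = (i + run == data.size());
        if (even >= 6 || (terminal && even >= 4)) {
            if (run % 2) out += data[i++];      // odd leading digit stays in B
            out += ">5";                        // switch to subset C
            out += data.substr(i, even);
            i += even;
            if (!terminal) out += ">6";         // switch back to subset B
        } else {                                // short run: not worth switching
            out += data.substr(i, run);
            i += run;
        }
    }
    return out;
}

int main() {
    // Prints AB1C234D>5567890123456, matching what Zebra Designer produced.
    std::cout << encodeField("AB1C234D567890123456") << '\n';
}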
This may not be the answer you're looking for, but specifying A (Automatic Mode) as the final parameter to the ^BC command will make the printer do this for you.
Example:
^XA
^FO100,100
^BY3
^BCN,100,N,N,A
^FD0123456789^FS
^XZ

Stack question: pop out in a pattern

At run time I want to input two data types, double and string.
One of the conditions is that strings should pop in the order I input them, while doubles pop with the usual stack behaviour, LIFO. Another condition is that the stack is limited to a max size of 10.
E.g. one runtime example
Input: Hello 1 World 2 blah blah 3 4 5
Output: Hello 5 World 4 blah blah 3 2 1
My first question is: how many ways are there to solve this problem?
I have solved this problem using 3 stacks: one which stores doubles, one which stores strings, and one which is used to reverse the string order.
I need to save the pattern so the program knows in which order the doubles come, so I save the pattern in the string stack itself - since each stack is limited to size 10, I cannot push the pattern as separate entries.
So this is what my string stack will look like after the pushes:
Hello*
World*
blah
blah***
So on the first read I need to read from that specific stack position and extract just Hello out of it. The asterisks are left for later use, to tell the program that the next pop is a double.
My second question is whether there is some other, more elegant solution to this problem, since my solution involves some string manipulation, and for now I'm not actually using the pop function in the string case the way it is supposed to be used. I made the solution in C++, btw.
What you're doing is fine, except that if you must use a stack, then you aren't allowed to access random locations in a stack -- you can only push/pop -- and also it's not so nice to modify the input strings and store asterisks in them.
You can solve this using only push/pop operations with 5 stacks (technically, only 4 will be used at any one time, but since they are of different types, you need to declare all 5 in your program):
stack 1: push doubles in input order
stack 2: push strings in input order
stack 3: push data types (double or string) in input order
stack 4: reverse the order of strings in stack 2
stack 5: reverse the order of data types in stack 3
Now pop one data type at a time from stack 5; if it is a double, pop from stack 1, otherwise pop from stack 4, and print the popped value.
Edit: #jleedev makes a good point that there isn't a general solution when the stack size is limited. What I've described above assumes that you're allowed to use multiple stacks and that each stack can hold as many items as are present in the input.
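Here is a minimal C++ sketch of the five-stack approach, using a stack of bools for the data types and the question's own sample input; classifying a token by its first character is a simplification for illustration:

#include <cctype>
#include <iostream>
#include <stack>
#include <string>
#include <vector>

int main() {
    std::stack<double> d;          // stack 1: doubles in input order
    std::stack<std::string> s;     // stack 2: strings in input order
    std::stack<bool> t;            // stack 3: types (true = double)

    std::vector<std::string> input = {"Hello", "1", "World", "2",
                                      "blah", "blah", "3", "4", "5"};
    for (const auto& tok : input) {
        bool isDouble = std::isdigit(static_cast<unsigned char>(tok[0]));
        t.push(isDouble);
        if (isDouble) d.push(std::stod(tok));
        else s.push(tok);
    }

    std::stack<std::string> s2;    // stack 4: strings reversed (input order on pop)
    std::stack<bool> t2;           // stack 5: types reversed (input order on pop)
    while (!s.empty()) { s2.push(s.top()); s.pop(); }
    while (!t.empty()) { t2.push(t.top()); t.pop(); }

    // Doubles come LIFO from stack 1, strings in input order from stack 4.
    while (!t2.empty()) {
        if (t2.top()) { std::cout << d.top() << ' '; d.pop(); }
        else          { std::cout << s2.top() << ' '; s2.pop(); }
        t2.pop();
    }
    std::cout << '\n';  // prints: Hello 5 World 4 blah blah 3 2 1
}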
I'll ignore the stack size constraint since I think its meaning is unclear. In addition, if you can use multiple stacks, all limited to size 10, then you can simulate larger stacks by using multiple actual stacks.
So, this can be done with 2 stacks using only push/pop:
1. push everything onto stack A.
2. pop everything from A onto B.
3. if B is empty, return.
4. if B.top is a double, goto 7.
5. output B.top and pop it off of B.
6. goto 3.
7. pop all of B onto A.
8. while A.top is not a double, pop A onto B.
9. output A.top and pop it off of A.
10. goto 2.

In simple terms, how is compression commonly implemented?

So I've been thinking lately about how compression might be implemented, and what I've postulated so far is that it might use a sort of hash table of 'byte signature' keys with memory location values that mark where each 'byte signature' should be substituted back in when the compressed item is expanded.
Is this far from the truth?
How is compression typically implemented? No need for a page worth of answer, just in simple terms is fine.
Compression algorithms try to find repeated subsequences and replace them with a shorter representation.
Let's take the 25-byte string Blah blah blah blah blah! (200 bit) from An Explanation of the Deflate Algorithm as an example.
Naive approach
A naive approach would be to encode every character with a code word of the same length. We have 7 different characters and thus need codes of length ceil(ld(7)) = 3. Our code words can then look like these:
000 → "B"
001 → "l"
010 → "a"
011 → "h"
100 → " "
101 → "b"
110 → "!"
111 → not used
Now we can encode our string as follows:
000 001 010 011 100 101 001 010 011 100 101 001 010 011 100 101 001 010 011 100 101 001 010 011 110
B   l   a   h   _   b   l   a   h   _   b   l   a   h   _   b   l   a   h   _   b   l   a   h   !
That would just need 25·3 bit = 75 bit for the encoded word plus 7·8 bit = 56 bit for the dictionary, thus 131 bit (65.5%)
Or for sequences:
00 → "lah b"
01 → "B"
10 → "lah!"
11 → not used
The encoded word:
01 00 00 00 00 10
B lah b lah b lah b lah b lah!
Now we just need 6·2 bit = 12 bit for the encoded word plus 10·8 bit = 80 bit for the dictionary plus 3·8 bit = 24 bit for the length of each word, thus 116 bit (58.0%).
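As a quick sanity check of the per-character arithmetic, here is a short C++ sketch that recomputes the 131-bit figure from the string itself; the variable names are mine:

#include <cmath>
#include <iostream>
#include <set>
#include <string>

int main() {
    const std::string text = "Blah blah blah blah blah!";
    std::set<char> alphabet(text.begin(), text.end());  // 7 distinct symbols
    // With k distinct symbols you need ceil(log2(k)) bits per symbol.
    int bitsPerSymbol = (int)std::ceil(std::log2((double)alphabet.size()));
    int encoded = (int)text.size() * bitsPerSymbol;      // 25 * 3 = 75
    int dictionary = (int)alphabet.size() * 8;           // 7 * 8 = 56
    std::cout << encoded + dictionary << " bits\n";      // prints 131 bits
}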
Huffman code approach
The Huffman code is used to encode more frequent characters/substrings with shorter code than less frequent ones:
5 × "l", "a", "h"
4 × " ", "b"
1 × "B", "!"
// or for sequences
4 × "lah b"
1 × "B", "lah!"
A possible Huffman code for that is:
0 → "l"
10 → "a"
110 → "h"
1110 → " "
11110 → "b"
111110 → "B"
111111 → "!"
Or for sequences:
0 → "lah b"
10 → "B"
11 → "lah!"
Now our Blah blah blah blah blah! can be encoded to:
111110 0 10 110 1110 11110 0 10 110 1110 11110 0 10 110 1110 11110 0 10 110 1110 11110 0 10 110 111111
B l a h _ b l a h _ b l a h _ b l a h _ b l a h !
Or for sequences:
10 0 0 0 0 11
B lah b lah b lah b lah b lah!
Now our first code just needs 78 bit, or 8 bit for the sequence version, instead of the 25·8 = 200 bit of our initial string. But we still need to add the dictionary where our characters/sequences are stored. The per-character example needs 7 additional bytes (7·8 bit = 56 bit), and the per-sequence example needs 10 bytes (10·8 bit = 80 bit) plus 3 bytes for the length of each sequence (3·8 bit = 24 bit), thus 104 bit. That would result in:
56 + 78 = 134 bit (67.0%)
104 + 8 = 112 bit (56.0%)
The actual numbers may not be correct. Please feel free to edit/correct it.
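To see the Huffman construction in code, here is a compact C++ sketch that builds a code for the example string by repeatedly merging the two least frequent nodes. Tie-breaking differs between implementations, so the exact code words may not match the table above, but frequent symbols still get the shorter codes (the sketch also leaks its nodes, which is fine for a demo):

#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct Node {
    int freq;
    char sym;                        // valid only for leaves
    Node *left = nullptr, *right = nullptr;
};

// Walk the tree: left edge appends "0", right edge appends "1".
void collect(const Node* n, const std::string& prefix,
             std::map<char, std::string>& table) {
    if (!n->left) { table[n->sym] = prefix; return; }
    collect(n->left, prefix + "0", table);
    collect(n->right, prefix + "1", table);
}

int main() {
    const std::string text = "Blah blah blah blah blah!";
    std::map<char, int> freq;
    for (char c : text) ++freq[c];

    auto cmp = [](const Node* a, const Node* b) { return a->freq > b->freq; };
    std::priority_queue<Node*, std::vector<Node*>, decltype(cmp)> pq(cmp);
    for (auto& [c, f] : freq) pq.push(new Node{f, c});

    // Repeatedly merge the two least frequent nodes into one.
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();
        Node* b = pq.top(); pq.pop();
        pq.push(new Node{a->freq + b->freq, 0, a, b});
    }

    std::map<char, std::string> table;
    collect(pq.top(), "", table);
    int bits = 0;
    for (auto& [c, code] : table) {
        std::cout << "'" << c << "' -> " << code << '\n';
        bits += freq[c] * (int)code.size();
    }
    std::cout << "encoded length: " << bits << " bits\n";
}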
Check this wiki page...
Lossless compression algorithms usually exploit statistical redundancy in such a way as to represent the sender's data more concisely without error. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small.
Another kind of compression, called lossy data compression or perceptual coding, is possible if some loss of fidelity is acceptable. Generally, a lossy data compression will be guided by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color. JPEG image compression works in part by "rounding off" some of this less-important information. Lossy data compression provides a way to obtain the best fidelity for a given amount of compression. In some cases, transparent (unnoticeable) compression is desired; in other cases, fidelity is sacrificed to reduce the amount of data as much as possible.
Lossless compression schemes are reversible so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.
However, lossless data compression algorithms will always fail to compress some files; indeed, any compression algorithm will necessarily fail to compress any data containing no discernible patterns. Attempts to compress data that has already been compressed will therefore usually result in an expansion, as will attempts to compress all but the most trivially encrypted data.
In practice, lossy data compression will also come to a point where compressing again does not work, although an extremely lossy algorithm, like for example always removing the last byte of a file, will always compress a file up to the point where it is empty.
An example of lossless vs. lossy compression is the following string:
25.888888888
This string can be compressed as:
25.[9]8
Interpreted as, "twenty five point 9 eights", the original string is perfectly recreated, just written in a smaller form. In a lossy system, using
26
instead, the original data is lost, at the benefit of a smaller file size.
Lossless compression algorithms translate each possible input into distinct outputs, in such a way that more common inputs translate to shorter outputs. It's mathematically impossible for all possible inputs to be compressed -- otherwise, you'd have multiple inputs A and B compressing to the same form, so when you decompress it, do you get back to A or back to B? In practice, most useful information has some redundancy and this redundancy fits certain patterns; hence the data can usefully be compressed because the cases that expand when you compress them don't naturally arise.
Lossy compression, for example, that used in JPEG or MP3 compression, works by approximating the input data by some signal that can be expressed in fewer bits than the original. When you decompress it, you don't get the original, but you usually get something close enough.
In VERY simple terms, a common form of compression is a dictionary coder (http://en.wikipedia.org/wiki/Dictionary_coder). This involves replacing longer repeated strings with shorter ones.
For example if you have a file that looks like this:
"Monday Night","Baseball","7:00pm"
"Tuesday Night","Baseball","7:00pm"
"Monday Night","Softball","8:00pm"
"Monday Night","Softball","8:00pm"
"Monday Night","Baseball","5:00pm"
It would be roughly 150 characters, but if you were to do a simple substitution as follows:
A="Monday Night", B="Tuesday Night", C="Baseball", D="Softball", E="7:00pm", F="8:00pm", G="5:00pm"
Then the same content could be encoded as:
A,C,E
B,C,E
A,D,F
A,D,F
A,C,G
Using only 25 characters! A clever observer could also see how to easily reduce this further to 15 characters if we assumed some more things about the format of the file. Obviously there is the overhead of the substitution key, but very large files often contain a lot of these substitutions. This can be a very efficient way to compress large files or data structures and still keep them "somewhat" human-readable.
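A toy version of that substitution in C++, with the dictionary hard-coded rather than discovered from the data (a real dictionary coder such as LZ77/LZ78 builds it automatically):

#include <iostream>
#include <string>
#include <vector>

int main() {
    // Hard-coded dictionary: each long phrase maps to a one-character key.
    std::vector<std::pair<std::string, std::string>> dict = {
        {"\"Monday Night\"", "A"}, {"\"Tuesday Night\"", "B"},
        {"\"Baseball\"", "C"},     {"\"Softball\"", "D"},
        {"\"7:00pm\"", "E"},       {"\"8:00pm\"", "F"},
        {"\"5:00pm\"", "G"}};

    std::string line = "\"Monday Night\",\"Baseball\",\"7:00pm\"";
    for (const auto& [phrase, key] : dict)
        for (std::size_t pos; (pos = line.find(phrase)) != std::string::npos; )
            line.replace(pos, phrase.size(), key);  // substitute phrase with key

    std::cout << line << '\n';  // prints: A,C,E
}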
Rosetta Code has an entry on Huffman Coding, as does an earlier blog entry of mine.
