Is there a non-RE language that is composed of only 1 element? - computation-theory

I read in a book (Hromkovic, Communication Complexity and Parallel Computating) that there is an infinite number of non recursively - enumerable (non-RE) languages that are composed of only 1 element? But is that possible? I though that for a language to be non-RE (or even undecidable), the language has to be infinite.

No, all finite languages are regular because they can be accepted by finite automata. There are at least three possible explanations for what you read:
the author(s) wrote something different and you are paraphrasing incorrectly;
the author(s) wrote something other than what they intended, possibly using sloppy terminology;
the author(s) wrote something incorrect due to a misunderstanding of subject matter not belonging to their specialty.
If you quote the relevant passage, it might be possible to explore which of these options is most likely. Note that everybody makes mistakes - people who read books and people who write books alike. Note also that I use author(s) and not authors because it's possible this passage was written in isolation by one author and was not properly reviewed.

Related

software or algorithms to detect grammar units in texts

I'm not sure this is the right fit for stackoverflow but maybe you guys would suggest where to put this question otherwise but here it is anyway. Suppose I have a few sentences of a text like this:
John reads newspapers everyday. Right now he has just finished reading
one. He will read another one and might even read a small book
tomorrow.
This small extract contains the following grammar units:
present simple (reads)
present perfect (has finished)
future simple (will read)
modal verb may
Do you know of any software, algorithm or study that defines rules for identifying these grammar patterns?
Read this also if you are going to use Ruby than you can use TreeTop or find the equivalent parser in other programming language.
NTLK is a natural language parser for python, it works by tagging words. You can look at some examples here. It creates a parse-tree, which are very useful for these types of problems.
I haven't seen it distinguish between simple and perfect, but it could be modified to do so.

QA Algorithm for Q Processing

What Algorithm/method do I use for a Question Answering System's Question Processing?
I have been searching possible algorithms for my Question Answering System, the only thing that I think that would be possible to use is Parsing but I have asked about parsing in my last question and with the answers there i think its not possible to be used?(I'm not sure).
My idea of using Parsing is by Cutting the question into pieces word per word and then it will go through a Storage of Words that would determine what Kind of Word(noun,adjective,verb,etc) is being said. My purpose of using Parsing is to remove or rather to determine the Topic of the question.
The other idea of mine is the ChatterBot. A Chatterbot uses a query of words? Correct me if I'm not mistaken and those words are assigned to another Word. It would randomly choose a word from its Query.
Example: User's Statement: Hello > ChatterBot's Possible Replies: Hi,Hello,Hey
I'm not quite sure what is the possible method/algorithm to use in a Question Answering, I have read the Wikipedia post : http://en.wikipedia.org/wiki/Question_answering but I do not quite understand what algorithm to use in Question Processing.
Thank you.
PS: I'm developing in Javascript. Q = Question
You could use a naive bayes classifier in order to look at the questions and determine their subject. You'd need a lot of training data and a fairly narrow domain.
The sophisticated responses to this problem involve a lot of machine inference techniques which are a bit out of my skill level to explain extremely well. My idea is to use a markov network in which each word has an edge to one or two words next to it. A series of tests are applied to each word which indicate likely memberhood of that word to one of its possible meanings (For example, Mark is more likely a name if it's capitalized, but if the next word is 'a' it probably is used in the sense of a verb.) From there the machine can attempt to determine the actual meaning of the sentence, which will rely on the use of, again, unimaginably large amounts of training data.
Coursera's Probabilistic Graphical Models class (Probably their NLP class too) would probably be the best resource if you're interested in becoming skilled in this area. (PGM is the only reason I know anything about this!)
here's a great book, you may need to read to get a lot of stuff related to NLP, and Question answering systems http://www.amazon.com/Speech-Language-Processing-2nd-Edition/dp/0131873210
the book has a full section (V.Applications) that will help you a lot to develop a good system.
but note that the book is discussing theories and algorithms only (no code)
it's not about parsing text only, you'll need to understand the context to provide better answer. actually you need to extract some keywords and ignore everything else.
also you may read in topics Keywords (Bag of words), algorithms like (TF/IDF).

How can I learn to read formulas with greek symbols?

I suppose maybe it's because I don't know the keywords to google for, but I can't find any sources on how to read those formulas you see on wikipedia, like this for instance:
Erlang Distribution
I've searched in the math world and computer science world. It feels like it is assumed that we're supposed to understand it out of thin air. Beginner lessons seem scarce.
So far I know how sigma works. And that upside-down shape that is used as the half-life logo is called lambda. But what the heck is it trying to say?? Why is there a semi-colon in the function, etc..
If there is a book on this stuff I'd buy it in an instant. It is probably very basic stuff but I never had experience in theoretical math or even know where to look.
Does anyone know what this subject is called, and what to google for?
Formulas with this symbols usually are statistics or probability notations.
Greek letters (e.g. θ, β) are commonly used to denote unknown parameters (population parameters).
Greek letters used in mathematics, science, and engineering
you can find info here
Notation in probability and statistics
here
I think the colon in alt.: \scriptstyle \theta \;=\; \frac{1}{\lambda} > 0\, scale (real) in the box in Wikipedia is just saying that there is an alternative definition, in which you specify theta rather than lambda, and in that definition what is called theta is the reciprocal of lambda in the other definition.
I once complained to a much better mathematician than I was that I came unstuck with formulas with some of the weirder greek letters in because I couldn't write them recognisably in my handwriting (which is bad enough for the latin alphabet). He said a lot of the people he knew simply said "let x be funny-squiggle-thing" and rewrote with sensible letters. I really wish I'd thought of that.
In general, letters in weird alphabets behave pretty much like sensible letters, at least in the sort of thing you are pointing at. It's done as a sort of type-checking - usually all of the letters pinched from some particular foreign language are related in some way - e.g. all parameters. Unfortunately that doesn't hold exactly in the Wikipedia example you quote, where two of the greek letters stand for functions - one is definitely the Gamma function. I suspect the other is http://en.wikipedia.org/wiki/Digamma_function, but I'm not really sure.
Check out the resources list here: http://en.wikipedia.org/wiki/Greek_alphabet
I would say your best bet is still searching in Google (or other search engine, whatever float your boat) about the specific formula you are trying to learn. Sometimes a symbol may be used in different meaning in different formula.
Anyway, there is a good resource in here that explained a lot of math symbols, not just the Greek symbol.
Some link that may interest you here and here.
First, find a Greek alphabet (upper and lower case) to refer to, so that you can at least call lambda by it's name. No one starts out knowing automatically what the various Greek characters are, not even Greeks.
Second, Read the actual article, usually either the character is defined (as lambda happens to be in the Wikipedia page you references) or it's standard nomenclature (in which case you've done the right thing by looking for a basic article on the function in question-- I do this all the time so don't feel bad.) Or, as a third option, it's a crappy paper. Happens sometimes. It's kind of a pain, though, since you can't just do a text search on the lambda character in a PDF.
(Someone educate me on that if I'm wrong....)
Third, try to pick out which unfamiliar symbols are variables (like lambda) and which are operators (like sigma, and it's helpers.) It's the operators that can sometimes cause real trouble. A variable is just a name for something, but operators come freighted with more meaning, more rules, and more syntax. It's not always obvious which symbols are operators, either.
Finally, and specifically for computer science, a good introductory book (college freshman/sophomore level) on discrete math will hopefully treat most of the basic notations and operators to at least get your feet on the ground. Nowadays, you kids and your newfangled internet might be able to get something similar from Udacity, Edx, Course RA, or the Khan Academy.
Basically, it's a lot of hard work, especially on your own, but you're already doing most of the right things.

pseudo code - formal rules

Has anyone set out a proposal for a formal pseudo code standard?
Is it better to be a 'rough' standard to infer an understanding?
It is better to be a rough standard; the intent of pseudocode is to be human-readable, not machine-readable, and the goal of actually writing pseudocode is to convey a higher-level description of an algorithm while being unconcerned (typically) with the minutiae of the implementation. My opinion is that for it to qualify as pseudocode there has to be some ambiguity, and your goal should be a clear conveyance of your algorithmic intentions. Stick to common control structures, declarations and concepts that are paradigmatic to your target audience or language and you'll get the point across. If you start getting too formal, you're getting too close to writing actual code.
NOT AN ANSWER.
IMHO forcing a standard (pseudo code syntax, if you will) will cause people to be less clear on what they want to say.
Browse around, try to gather some knowledge about used conventions, and do your best to be clear.
Although this is by no means a formal proposal, Python is considered by some to be Executable Pseudocode.
in my opinion it depends on the people who are using your programs. In books for algorithms the pseudo code is very formal an near to maths, but is also described in a few paragraphs, so this is for scientific situations.
If I develop in others environments I would prefer a not that formal way because that is easier to understand for most people. I prefer spoken words over formalism. If you want formalism you could read the code instead.

Pseudocode interpreter?

Like lots of you guys on SO, I often write in several languages. And when it comes to planning stuff, (or even answering some SO questions), I actually think and write in some unspecified hybrid language. Although I used to be taught to do this using flow diagrams or UML-like diagrams, in retrospect, I find "my" pseudocode language has components of C, Python, Java, bash, Matlab, perl, Basic. I seem to unconsciously select the idiom best suited to expressing the concept/algorithm.
Common idioms might include Java-like braces for scope, pythonic list comprehensions or indentation, C++like inheritance, C#-style lambdas, matlab-like slices and matrix operations.
I noticed that it's actually quite easy for people to recognise exactly what I'm triying to do, and quite easy for people to intelligently translate into other languages. Of course, that step involves considering the corner cases, and the moments where each language behaves idiosyncratically.
But in reality, most of these languages share a subset of keywords and library functions which generally behave identically - maths functions, type names, while/for/if etc. Clearly I'd have to exclude many 'odd' languages like lisp, APL derivatives, but...
So my questions are,
Does code already exist that recognises the programming language of a text file? (Surely this must be a less complicated task than eclipse's syntax trees or than google translate's language guessing feature, right?) In fact, does the SO syntax highlighter do anything like this?
Is it theoretically possible to create a single interpreter or compiler that recognises what language idiom you're using at any moment and (maybe "intelligently") executes or translates to a runnable form. And flags the corner cases where my syntax is ambiguous with regards to behaviour. Immediate difficulties I see include: knowing when to switch between indentation-dependent and brace-dependent modes, recognising funny operators (like *pointer vs *kwargs) and knowing when to use list vs array-like representations.
Is there any language or interpreter in existence, that can manage this kind of flexible interpreting?
Have I missed an obvious obstacle to this being possible?
edit
Thanks all for your answers and ideas. I am planning to write a constraint-based heuristic translator that could, potentially, "solve" code for the intended meaning and translate into real python code. It will notice keywords from many common languages, and will use syntactic clues to disambiguate the human's intentions - like spacing, brackets, optional helper words like let or then, context of how variables are previously used etc, plus knowledge of common conventions (like capital names, i for iteration, and some simplistic limited understanding of naming of variables/methods e.g containing the word get, asynchronous, count, last, previous, my etc). In real pseudocode, variable naming is as informative as the operations themselves!
Using these clues it will create assumptions as to the implementation of each operation (like 0/1 based indexing, when should exceptions be caught or ignored, what variables ought to be const/global/local, where to start and end execution, and what bits should be in separate threads, notice when numerical units match / need converting). Each assumption will have a given certainty - and the program will list the assumptions on each statement, as it coaxes what you write into something executable!
For each assumption, you can 'clarify' your code if you don't like the initial interpretation. The libraries issue is very interesting. My translator, like some IDE's, will read all definitions available from all modules, use some statistics about which classes/methods are used most frequently and in what contexts, and just guess! (adding a note to the program to say why it guessed as such...) I guess it should attempt to execute everything, and warn you about what it doesn't like. It should allow anything, but let you know what the several alternative interpretations are, if you're being ambiguous.
It will certainly be some time before it can manage such unusual examples like #Albin Sunnanbo's ImportantCustomer example. But I'll let you know how I get on!
I think that is quite useless for everything but toy examples and strict mathematical algorithms. For everything else the language is not just the language. There are lots of standard libraries and whole environments around the languages. I think I write almost as many lines of library calls as I write "actual code".
In C# you have .NET Framework, in C++ you have STL, in Java you have some Java libraries, etc.
The difference between those libraries are too big to be just syntactic nuances.
<subjective>
There has been attempts at unifying language constructs of different languages to a "unified syntax". That was called 4GL language and never really took of.
</subjective>
As a side note I have seen a code example about a page long that was valid as c#, Java and Java script code. That can serve as an example of where it is impossible to determine the actual language used.
Edit:
Besides, the whole purpose of pseudocode is that it does not need to compile in any way. The reason you write pseudocode is to create a "sketch", however sloppy you like.
foreach c in ImportantCustomers{== OrderValue >=$1M}
SendMailInviteToSpecialEvent(c)
Now tell me what language it is and write an interpreter for that.
To detect what programming language is used: Detecting programming language from a snippet
I think it should be possible. The approach in 1. could be leveraged to do this, I think. I would try to do it iteratively: detect the syntax used in the first line/clause of code, "compile" it to intermediate form based on that detection, along with any important syntax (e.g. begin/end wrappers). Then the next line/clause etc. Basically write a parser that attempts to recognize each "chunk". Ambiguity could be flagged by the same algorithm.
I doubt that this has been done ... seems like the cognitive load of learning to write e.g. python-compatible pseudocode would be much easier than trying to debug the cases where your interpreter fails.
a. I think the biggest problem is that most pseudocode is invalid in any language. For example, I might completely skip object initialization in a block of pseudocode because for a human reader it is almost always straightforward to infer. But for your case it might be completely invalid in the language syntax of choice, and it might be impossible to automatically determine e.g. the class of the object (it might not even exist). Etc.
b. I think the best you can hope for is an interpreter that "works" (subject to 4a) for your pseudocode only, no-one else's.
Note that I don't think that 4a,4b are necessarily obstacles to it being possible. I just think it won't be useful for any practical purpose.
Recognizing what language a program is in is really not that big a deal. Recognizing the language of a snippet is more difficult, and recognizing snippets that aren't clearly delimited (what do you do if four lines are Python and the next one is C or Java?) is going to be really difficult.
Assuming you got the lines assigned to the right language, doing any sort of compilation would require specialized compilers for all languages that would cooperate. This is a tremendous job in itself.
Moreover, when you write pseudo-code you aren't worrying about the syntax. (If you are, you're doing it wrong.) You'll wind up with code that simply can't be compiled because it's incomplete or even contradictory.
And, assuming you overcame all these obstacles, how certain would you be that the pseudo-code was being interpreted the way you were thinking?
What you would have would be a new computer language, that you would have to write correct programs in. It would be a sprawling and ambiguous language, very difficult to work with properly. It would require great care in its use. It would be almost exactly what you don't want in pseudo-code. The value of pseudo-code is that you can quickly sketch out your algorithms, without worrying about the details. That would be completely lost.
If you want an easy-to-write language, learn one. Python is a good choice. Use pseudo-code for sketching out how processing is supposed to occur, not as a compilable language.
An interesting approach would be a "type-as-you-go" pseudocode interpreter. That is, you would set the language to be used up front, and then it would attempt to convert the pseudo code to real code, in real time, as you typed. An interactive facility could be used to clarify ambiguous stuff and allow corrections. Part of the mechanism could be a library of code which the converter tried to match. Over time, it could learn and adapt its translation based on the habits of a particular user.
People who program all the time will probably prefer to just use the language in most cases. However, I could see the above being a great boon to learners, "non-programmer programmers" such as scientists, and for use in brainstorming sessions with programmers of various languages and skill levels.
-Neil
Programs interpreting human input need to be given the option of saying "I don't know." The language PL/I is a famous example of a system designed to find a reasonable interpretation of anything resembling a computer program that could cause havoc when it guessed wrong: see http://horningtales.blogspot.com/2006/10/my-first-pli-program.html
Note that in the later language C++, when it resolves possible ambiguities it limits the scope of the type coercions it tries, and that it will flag an error if there is not a unique best interpretation.
I have a feeling that the answer to 2. is NO. All I need to prove it false is a code snippet that can be interpreted in more than one way by a competent programmer.
Does code already exist that
recognises the programming language
of a text file?
Yes, the Unix file command.
(Surely this must be a less
complicated task than eclipse's syntax
trees or than google translate's
language guessing feature, right?) In
fact, does the SO syntax highlighter
do anything like this?
As far as I can tell, SO has a one-size-fits-all syntax highlighter that tries to combine the keywords and comment syntax of every major language. Sometimes it gets it wrong:
def median(seq):
"""Returns the median of a list."""
seq_sorted = sorted(seq)
if len(seq) & 1:
# For an odd-length list, return the middle item
return seq_sorted[len(seq) // 2]
else:
# For an even-length list, return the mean of the 2 middle items
return (seq_sorted[len(seq) // 2 - 1] + seq_sorted[len(seq) // 2]) / 2
Note that SO's highlighter assumes that // starts a C++-style comment, but in Python it's the integer division operator.
This is going to be a major problem if you try to combine multiple languages into one. What do you do if the same token has different meanings in different languages? Similar situations are:
Is ^ exponentiation like in BASIC, or bitwise XOR like in C?
Is || logical OR like in C, or string concatenation like in SQL?
What is 1 + "2"? Is the number converted to a string (giving "12"), or is the string converted to a number (giving 3)?
Is there any language or interpreter
in existence, that can manage this
kind of flexible interpreting?
On another forum, I heard a story of a compiler (IIRC, for FORTRAN) that would compile any program regardless of syntax errors. If you had the line
= Y + Z
The compiler would recognize that a variable was missing and automatically convert the statement to X = Y + Z, regardless of whether you had an X in your program or not.
This programmer had a convention of starting comment blocks with a line of hyphens, like this:
C ----------------------------------------
But one day, they forgot the leading C, and the compiler choked trying to add dozens of variables between what it thought was subtraction operators.
"Flexible parsing" is not always a good thing.
To create a "pseudocode interpreter," it might be necessary to design a programming language that allows user-defined extensions to its syntax. There already are several programming languages with this feature, such as Coq, Seed7, Agda, and Lever. A particularly interesting example is the Inform programming language, since its syntax is essentially "structured English."
The Coq programming language allows "syntax extensions", so the language can be extended to parse new operators:
Notation "A /\ B" := (and A B).
Similarly, the Seed7 programming language can be extended to parse "pseudocode" using "structured syntax definitions." The while loop in Seed7 is defined in this way:
syntax expr: .while.().do.().end.while is -> 25;
Alternatively, it might be possible to "train" a statistical machine translation system to translate pseudocode into a real programming language, though this would require a large corpus of parallel texts.

Resources