How can I get informal synonyms or abbreviations for a word? I tried using stemmers (like the Porter stemmer) and thesauruses, but they don't seem to recognize "informal" synonyms for a word. I suppose my examples below are not really synonyms, but rather abbreviations.
Examples include:
Technology => Tech
Business => Biz
Applications => Apps
To the best of my knowledge, there is no such library available. The synonyms/abbreviations you mention in the question are part of the evolutionary nature of any natural language. That is to say, hard-coding such a list will never give you a complete set of such equivalences.
The only good long (or even medium) term solution is to use appropriate NLP/ML paradigms to "learn" them. Such equivalences are highly context dependent. For example:
NLP == natural language processing OR neuro-linguistic programming (ambiguous acronym)
Ft. == foot OR featuring (ambiguous abbreviation)
A historical (and slightly philosophical) discussion of this context dependence can be found here. For a more everyday example, see this Wikipedia disambiguation page (for the second example in the list above).
Basically, what I am trying to illustrate here is that there is no off-the-shelf tool/library for this because resolving synonymy (especially colloquial terms, abbreviations, etc.) is a difficult problem.
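To make the difficulty concrete, here is a minimal, purely heuristic Python sketch (the function names and the tiny vocabulary are made up for illustration). It accepts an abbreviation when its letters appear in order in a candidate word, starting at the front. Note that it already handles "Tech" and "Apps" but fails on "Biz", because "business" contains no "z" - which is exactly the point: such equivalences cannot be captured by simple hard-coded rules.

```python
def is_abbreviation(short, long):
    """Heuristic: `short` abbreviates `long` if, ignoring case,
    its letters appear in order in `long`, starting with the same letter."""
    short, long = short.lower(), long.lower()
    if not short or short[0] != long[0]:
        return False
    it = iter(long)
    # `ch in it` consumes the iterator, so letters must appear in order.
    return all(ch in it for ch in short)

vocabulary = ["technology", "business", "applications", "biology"]

def expansions(short):
    return [w for w in vocabulary if is_abbreviation(short, w)]
```

A learned system would instead score candidate pairs using corpus statistics and context, precisely because heuristics like this one break down.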
Related
I have a list of names of millions of famous people (from Wikidata), and I need to create a system that efficiently finds all people mentioned in a fairly short text: it can be just one word (eg. "Einstein") to a few pages of text (eg. a Wikipedia page).
I need the system to be fairly tolerant to spelling mistakes (eg. Mikael Jackson instead of Michael Jackson), and short forms (eg. M. Jackson). In case of ambiguity, it should return all possible people (eg. "George Bush" should return both father and son, and possibly other homonyms).
This related question has a few interesting answers, including using the Aho-Corasick algorithm. There are libraries in many languages, including Python. However, it does not seem to support fuzzy search (i.e. tolerating misspellings).
I guess I could extend the vocabulary to include all the possible spellings of each name, but that would make the vocabulary too large, so I would rather avoid that if possible (moreover, I may want to extend this solution to more than just people at one point).
I took a quick look at Lucene/ElasticSearch but it does not seem to support this use case (unless I missed it).
Any ideas?
Elasticsearch has support for fuzzy matching: see the documentation here.
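Elasticsearch's fuzzy query is based on edit distance. As a rough illustration of the same idea without a search cluster, Python's standard-library difflib can score misspelled names against a vocabulary (the names here are just the question's examples; a real system with millions of names would need an index, not a linear scan):

```python
import difflib

names = ["Michael Jackson", "George Bush", "Albert Einstein"]

def fuzzy_lookup(query, cutoff=0.8):
    # get_close_matches ranks candidates by a similarity ratio in [0, 1]
    # and keeps only those scoring at or above `cutoff`.
    return difflib.get_close_matches(query, names, n=5, cutoff=cutoff)
```

With this, "Mikael Jackson" still finds "Michael Jackson", which is the tolerance the question asks for.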
If a person learns data structures and algorithms in one programming language, do they need to learn them again for another language?
I am about to start a book on data structures and algorithms in JavaScript, since I also want to learn web development.
Will it help me with other languages too?
Data structures and algorithms are concepts that are independent of language. Therefore, once you master them in your favorite language, it's relatively easy to switch to another one.
Now, if you're asking about the built-in methods and APIs that different languages have, those do differ, but you shouldn't be learning language-specific APIs from a data structures and algorithms book anyway.
Yes... and no.
While the concepts behind algorithms and data structures, like space and time complexity or mutability, are language agnostic, some languages might not let you implement some of those patterns at all.
A good example is a recursive algorithm. In some languages, like Haskell, recursion is the standard way to iterate over a collection of elements. In other languages, like C, you should avoid recursive algorithms on unbounded collections, since you can easily hit the stack's depth limit. You could also imagine a language that is not even stack-based, in which a recursive algorithm would be impossible to implement directly. You could implement a stack on top of such a language, but it would almost certainly be slower than expressing the algorithm in a different fashion.
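The Python interpreter makes this concrete: its default call-stack limit (around 1000 frames) means a naive recursive traversal fails on inputs that a loop handles easily. This is just a sketch with made-up function names:

```python
def length_recursive(xs):
    # Elegant, but each element costs one stack frame.
    return 0 if not xs else 1 + length_recursive(xs[1:])

def length_iterative(xs):
    # The same computation expressed as a loop: constant stack usage.
    n = 0
    for _ in xs:
        n += 1
    return n
```

On a list of a few thousand elements, length_recursive raises RecursionError while length_iterative works fine.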
Another example is object-oriented data structures. Some languages, like Haskell, do not let you change values: everything is immutable and must be copied to be changed. This is analogous to how numbers are handled in JavaScript, where you cannot change the value 2, but you can take the value 2, add 1 to it, and store the result in a new location. Other languages, like C, have no (or only very clumsy) support for object-oriented programming, which will break most of the data structure patterns you will learn from a JavaScript-oriented book.
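You can see the same copy-to-change discipline even in Python's immutable types: you never modify a tuple or a string, you build a new one and the original is untouched.

```python
t = (1, 2)
t2 = t + (3,)    # "adding" an element really builds a new tuple
s = "cat"
s2 = s.upper()   # produces a new string; the original is unchanged
```

After this runs, t is still (1, 2) and s is still "cat"; only the new names t2 and s2 hold the changed values.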
In the end, it all boils down to performance. You don't write C code the way you write JavaScript or F# code. Each language has its quirks and thus needs different implementations, even though the ideas behind the algorithms and structures stay the same. You can always emulate a pattern in a language that does not support it, like OOP in C, but it will usually feel more natural to solve the problem a different way.
That being said, as long as you stay within the same kind of language, you can probably reuse 80%+ of that book. There are many OOP languages out there. JavaScript is probably the most exotic of them, with its ability to treat every object like a dictionary and its weird notion of "this", so some patterns in the book will not carry over to those other languages.
You do not need to relearn data structures and algorithms when you use another language. The reason is evident: every data structure and algorithm is the realization of some mathematical or logical idea.
For example, if you study sorting, you will certainly meet quicksort, merge sort, and others. The implementations of the different sorting algorithms rely on fundamental elements that almost every language has, such as variables, arrays, and loops. In other words, you can implement them without relying on features specific to one language (such as JavaScript).
Although the material is largely language-independent, I still suggest you study it in C. The reason is that C is a comparatively low-level high-level language, close to the operating system, and algorithms written in C are usually faster than the same ones in Java or Python (in most cases). C also doesn't ship with conveniences like the C++ STL or the Java collections; in C++ or Java, the hash map is already implemented for you. If you are new to data structures, you are better off building them from scratch yourself than being lazy and reaching for ready-made "tools".
The data structure and algorithm as concepts are the same across languages, the implementation however varies greatly.
Just look at the implementation of quicksort in an imperative language like C and in a functional language like Haskell. This particular algorithm is a poster child for functional languages, as it can be implemented in about two lines (and people are particularly fond of stressing that).
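The famous short Haskell quicksort translates almost directly into list comprehensions in other languages; here is a rendering of the same idea in Python. Note that, like the Haskell version, it builds new lists rather than partitioning in place, so it is concise but not the fast in-place quicksort you would write in C - which is precisely the point about implementations differing:

```python
def quicksort(xs):
    # Functional-style quicksort: filter around a pivot and recurse.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    return (quicksort([x for x in rest if x < pivot])
            + [pivot]
            + quicksort([x for x in rest if x >= pivot]))
```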
There is a finer point. Depending on the language, many data structures and algorithms needn't be implemented explicitly at all, other than as an academic exercise to learn about them. For example, in Python you don't really need to implement an associative container whereas in C++ you need to.
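For instance, Python's built-in dict already is a hash-based associative container, so you only ever write one yourself as an exercise. A deliberately tiny sketch of what the built-in spares you from (separate chaining, hand-rolled bucket lookup; class and method names are made up):

```python
class ToyHashMap:
    """A minimal separate-chaining hash table, for illustration only."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # Pick a bucket by hashing the key.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default
```

In Python you would of course just write `d = {}`; in a teaching setting (or in plain C) this is roughly the structure you would build by hand.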
If it helps, think of using DS and algo in different programming languages as narrating a story in multiple human languages. The plot elements remain the same but the ease of expression, or the lack thereof, varies greatly depending on the language used to narrate it.
Data structures and algorithms (DSA) are like human emotions: all humans share the same emotions, such as happiness and sadness.
Programming languages are like the different languages humans speak: Spanish, English, German, Arabic, and so on.
All humans have the same emotions (DSA), but when they express them in different languages (programming languages), the way of expressing them (the implementation) differs.
So when you switch to a different or new language, just have a look at how those data structures and algorithms are implemented in that language.
I would like to ask what formal system would be the most interesting to implement from scratch or reverse-engineer.
I've looked through some existing open-source logical/declarative programming systems, and I've decided to build something similar in my free time, or at least to grasp the general ideas of the implementation.
It would be great if such a system provided most of the expressive power and conciseness of modern academic work on logic and its relation to computational models.
What would you recommend studying, at least at the conceptual level? For example, Lambda-Prolog is interesting particularly because it allows higher-order relations, but AFAIK it is based on intuitionistic logic and therefore lacks the law of the excluded middle; that's generally a disadvantage for me.
I would also welcome any suggestions about modern logical programming systems which are less popular but more expressive/powerful.
Prolog was the first language that changed my point of view on programming. But later I found it to be not as high-level as I'd like.
Curry - I've tried only the Münster Curry compiler, and found it somewhat inconvenient. Actually, at this point, I decided to stop ignoring Haskell.
Mercury has many of the things I wished for in Prolog. I have high expectations for its ability to distinguish the modes of rules; mode declarations should let the compiler perform a lot of optimizations (I guess).
Twelf.
It generalizes lambda-prolog significantly, and it's a logical framework and a metalogical framework as well as a logic programming language. If you need a language with a heavy focus on logic as well as computation, it's the best I know of.
If I were to try to extend a logic based system, I'd choose Prolog Cafe as it is small, open sourced, standards compliant, and can be easily integrated into java based systems.
For the final project in a programming languages course I took, we had to embed a Prolog evaluator in Scheme using continuations and macros. The end result was that you could freely mix Scheme and Prolog code, and even pass arbitrary predicates written in Scheme to the Prolog engine.
It was a very instructive exercise. The first 12 lines of code (the and and or forms) literally took about 6 hours to write and get correct. That was essentially the search logic, written very concisely using continuations. The rest followed more easily. Then, once I added the unification algorithm, it all just worked.
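The unification step mentioned above is small enough to sketch. Here is a hedged Python version of the core substitution-building algorithm (the Scheme original used continuations; this sketch omits the occurs check and represents variables as strings starting with "?", both choices made up for brevity):

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    # Follow variable bindings until we reach a non-variable
    # or an unbound variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Return a substitution unifying a and b, or None on failure.
    Compound terms are tuples, e.g. ("father", "?X", "john")."""
    if subst is None:
        subst = {}
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None
```

Once you have this, resolution-style search is "just" trying rules and backtracking, which is where the continuations in the Scheme version earned their keep.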
I am brainstorming an idea for high-level software to manipulate matrix algebra equations (tensor manipulations, to be exact) and produce optimized C++ code using several criteria, such as the sizes of the dimensions, the available memory on the system, etc.
Something similar in spirit to the Tensor Contraction Engine (TCE), but specifically oriented towards producing optimized rather than general code.
The desired end result is software that is expert at producing parallel programs in my domain.
Does this sort of development fall on the category of expert systems?
What other projects out there work in the same area of producing code given the constraints?
What you are describing is more like a Domain-Specific Language.
http://en.wikipedia.org/wiki/Domain-specific_language
It wouldn't be called an expert system, at least not in the traditional sense of this concept.
Expert systems are rule-based inference engines, in which the expertise in question is clearly encapsulated in the rules. The system you suggest, while possibly encapsulating insight about the nature of the problem domain inside a linear algebra model of sorts, would act more as a black box than an expert system. One of the characteristics of expert systems is that they can produce an "explanation" of their reasoning; such a feature is possible in part because the knowledge representation, while formalized, remains close to simple statements in natural language. Matrices and the operations on them, while possibly derived from similar observations of reality, are a lot less transparent...
It is unclear from the description in the question whether the system you propose would optimize existing code (possibly in a limited domain), or whether it would produce optimized code driven by some external goal or objective function...
Well, production systems (rule systems) are one of four general approaches to computation (Turing machines, Church's recursive functions, Post production systems, and Markov algorithms [and several more have been added to that list]), which more or less have these respective realizations: imperative programming, functional programming, and rule-based programming - as far as I know, Markov algorithms don't have an independent implementation. These are all Turing-equivalent.
So rule-based programming can be used to write anything at all. Also, early mathematical/symbolic manipulation programs generally used rule-based programming until the problem was sufficiently well understood (whereupon the approach was changed to imperative or constraint programming - see MACSYMA - hmmm, MACSYMA was written in Lisp, so perhaps I have a different program in mind, or perhaps they originally implemented a rule system in Lisp for this).
You could easily write a rule system to perform the matrix manipulations. Depending on the logical support, you could keep a trace recording the rules that actually fired and contributed to a solution (some rules that fire may not contribute directly to a solution, after all). Then, for every rule, you have a mapping to a set of C++ instructions (these don't have to be "complete" - they act more like a semi-executable requirement) which are output as an intermediate language. That is then read by a parser which links it to the required input data and performs any needed fix-up. You might find it easier to generate functional code - for one thing, after the fix-up you could more easily optimize the output in functional source form.
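A toy version of that rule-plus-trace idea, sketched in Python with made-up rule names: each rule rewrites an expression tree, and the engine records which rules fired. That recorded trace is exactly what you would later map to C++ snippets. (For brevity this sketch only rewrites at the root of the expression; a real engine would traverse subexpressions too.)

```python
# Expressions are nested tuples, e.g. ("mul", "A", ("id",)) for A * I.

def rule_mul_identity(expr):
    # A * I  ->  A
    if isinstance(expr, tuple) and expr[0] == "mul" and expr[2] == ("id",):
        return expr[1]
    return None

def rule_add_zero(expr):
    # A + 0  ->  A
    if isinstance(expr, tuple) and expr[0] == "add" and expr[2] == ("zero",):
        return expr[1]
    return None

RULES = [("mul_identity", rule_mul_identity), ("add_zero", rule_add_zero)]

def simplify(expr):
    """Apply rules to a fixed point; return (result, trace of fired rules)."""
    trace = []
    changed = True
    while changed:
        changed = False
        for name, rule in RULES:
            new = rule(expr)
            if new is not None:
                trace.append(name)  # the trace is what maps to C++ output
                expr = new
                changed = True
    return expr, trace
```

Running this on (A * I) + 0 reduces it to A, and the trace lists the two rules that fired, in order.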
Having said that, other contributors have outlined a domain-specific language approach, and that is what the TCE people did too (my suggestion is the same thing, just using rules).
First of all, is this only possible on algorithms which have no side effects?
Secondly, where could I learn about this process, any good books, articles, etc?
Coq is a proof assistant that can extract verified OCaml output. It's pretty complicated, though. I never got around to looking at it, but my coworker started and then stopped using it after two months. That was mostly because he wanted to get things done more quickly, but if you need to verify an algorithm, it might be a good choice.
Here is a course that uses Coq and talks about proving algorithms.
And here is a tutorial about writing academic papers in Coq.
It's generally a lot easier to verify/prove correctness when no side effects are involved, but it's not an absolute requirement.
You might want to look at some of the documentation for a formal specification language like Z. A formal specification isn't a proof itself, but is often the basis for one.
I think that verifying the correctness of an algorithm means validating its conformance with a specification. There is a branch of theoretical computer science called Formal Methods, which may be what you are looking for if you need to get as close to a proof as you can. From Wikipedia:
Formal methods are a particular kind of mathematically-based techniques for the specification, development and verification of software and hardware systems.
You will be able to find many learning resources and tools from the multitude of links on the linked Wikipedia page and from the Formal Methods wiki.
Usually proofs of correctness are very specific to the algorithm at hand.
However, there are several well-known tricks that are used and reused. For example, with iterative algorithms you can use loop invariants, and with recursive algorithms, induction.
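As a tiny concrete illustration (function name made up), a loop invariant can even be checked at run time with assertions, although a real proof discharges it once and for all by induction over the loop iterations rather than by executing them:

```python
def sum_upto(n):
    """Returns 0 + 1 + ... + n, asserting its loop invariant each iteration."""
    total, i = 0, 0
    while i < n:
        i += 1
        total += i
        # Invariant: after processing i, total == i * (i + 1) / 2.
        # Holds before the loop (0 == 0) and is preserved by each step,
        # so on exit (i == n) it gives the correctness of the result.
        assert total == i * (i + 1) // 2
    return total
```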
Another common trick is reducing the original problem to a problem for which a proof of correctness is easier to show, then either generalizing the easier solution or showing that a solution to the easier problem can be translated into a solution to the original one. Here is a description.
If you have a particular algorithm in mind, you may do better in asking how to construct a proof for that algorithm rather than a general answer.
Buy these books: http://www.amazon.com/Science-Programming-Monographs-Computer/dp/0387964800
The Gries book, The Science of Programming, is great stuff. Patient, thorough, complete.
Logic in Computer Science, by Huth and Ryan, gives a reasonably readable overview of modern systems for verifying systems. Once upon a time people talked about proving programs correct - with programming languages which may or may not have side effects. The impression I get from this book and elsewhere is that real applications are different - for instance proving that a protocol is correct, or that a chip's floating point unit can divide correctly, or that a lock-free routine for manipulating linked lists is correct.
ACM Computing Surveys Vol 41 Issue 4 (October 2009) is a special issue on software verification. It looks like you can get to at least one of the papers without an ACM account by searching for "Formal Methods: Practice and Experience".
The tool Frama-C, for which Elazar suggests a demo video in the comments, gives you a specification language, ACSL, for writing function contracts and various analyzers for verifying that a C function satisfies its contract and safety properties such as the absence of run-time errors.
An extended tutorial, ACSL by Example, shows actual C algorithms being specified and verified, and separates the side-effect-free functions from the effectful ones (the side-effect-free ones are considered easier and come first in the tutorial). This document is also interesting in that it was not written by the designers of the tools it describes, so it gives a fresher and more didactic view of these techniques.
If you are familiar with LISP then you should definitely check out ACL2: http://www.cs.utexas.edu/~moore/acl2/acl2-doc.html
Dijkstra's A Discipline of Programming and his EWDs laid the foundation for formal verification as a science of programming. A simpler work is Wirth's Systematic Programming, which begins with a simple approach to verification. Wirth uses pre-ISO Pascal as the language; Dijkstra uses an Algol-68-like formalism called Guarded Command Language (GCL). Formal verification has matured since Dijkstra and Hoare, but these older texts may still be a good starting point.
The PVS tool, developed at SRI, is a specification and verification system. I worked with it and found it very useful for theorem proving.
WRT (1), you will probably have to create a model of the algorithm that "captures" its side effects in program variables, i.e. model the state-based side effects explicitly.