Tutorials For Natural Language Processing [closed] - algorithm

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I recently took a Coursera class on "Natural Language Processing" and learnt a lot about parsing, IR, and other interesting topics such as Q&A. Although I grasped the concepts well, I did not get any practical knowledge of them. Can anyone suggest good online tutorials or books for Natural Language Processing?
Thanks

You could read Jurafsky and Martin's Speech and Language Processing (2008 edition), which is the standard textbook in the field. It's long, and has a variety of topics, so I'd suggest reading just the chapters that really apply to your interests.
Further, the best way to learn is almost certainly to actually implement NLP algorithms from scratch. You could pick some standard tasks (language modeling, text classification, POS-tagging, NER, parsing) and implement various algorithms from the ground up (ngram models, HMMs, Naive Bayes, MaxEnt, CKY) to really understand what makes them work. It also shouldn't be too hard to find some free dataset to test your implementations on.
Finally, there are lots of tutorials out there for specific NLP algorithms that are excellent. For example, if you want to build an HMM, I suggest Jason Eisner's tutorial which also covers smoothing and unsupervised training with EM. If you want to implement Gibbs sampling for unsupervised Naive Bayes training, I suggest Philip Resnik's tutorial.
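To make the "implement it from scratch" advice concrete, here is a minimal sketch of one of the tasks listed above: a bigram language model with add-one (Laplace) smoothing. The function names and the toy corpus are made up for illustration, and counting the padding symbols as part of the vocabulary is a simplifying assumption rather than how any particular textbook does it.

```python
from collections import Counter
import math

def train_bigram_model(sentences):
    """Count unigrams and bigrams over tokenized sentences padded with <s> and </s>."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)  # assumption: padding symbols count as vocabulary
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_log_prob(tokens, unigrams, bigrams):
    """Sum of log bigram probabilities for one tokenized sentence."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(math.log(bigram_prob(p, w, unigrams, bigrams))
               for p, w in zip(padded, padded[1:]))

# Toy corpus, purely illustrative.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
unigrams, bigrams = train_bigram_model(corpus)
print(bigram_prob("the", "cat", unigrams, bigrams))
print(sentence_log_prob(["the", "cat", "sat"], unigrams, bigrams))
```

Once this works on a toy corpus, swapping in a real free dataset and a better smoothing method is a natural next exercise.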

Aside from Jurafsky and Martin's book, Christopher D. Manning and Hinrich Schütze's Foundations of Statistical Natural Language Processing is also widely used. For IR, Manning et al. also wrote Introduction to Information Retrieval which can be read or downloaded online at their site.

If you want practical knowledge of how to work with natural language, you should start implementing it.
I suggest using NLTK (the Natural Language Toolkit) with Python. It's easy to implement NLP in Python.
You can refer to this link
http://nltk.org/
Or you can try it online on
http://cst.dk/online/pos_tagger/uk/

Instead of reading a specific book, diving into the sea of papers might be just as good an idea. http://www.aclweb.org, for example, hosts papers on many NLP topics. Through those papers, you get references to more papers, some of which are the foundations of a certain branch of NLP. And because they were written by different authors, you are unlikely to be influenced too much by one point of view.

If you are a Java developer, there is an extensive list of tutorials on how to build components of NLP systems using LingPipe at http://alias-i.com/lingpipe/demos/tutorial/read-me.html. Full disclosure: I wrote some of those tutorials and one of the books below.
There are a few books that are more industrially oriented:
1) Natural Language Processing with Java by Richard M Reese
This covers how to do some common tasks with a range of open source toolkits (including LingPipe).
2) Natural Language Processing with Java and LingPipe Cookbook by Breck Baldwin and Krishna Dayanidhi
This book is task driven at the level of "get the component built" and covers the major technologies driving most NLP systems that are text driven. It does not cover translation. It goes into more detail than the first book and has broader coverage than the LingPipe tutorials but is sometimes less detailed than the tutorials.
Breck

There is a hub for teaching and learning materials called TeLeMaCo. You can find resources for many aspects of NLP, and you can easily add more materials that you have found on the web.

Related

Data structures for bioinformatics [closed]

What are some data structures that should be known by somebody involved in bioinformatics? I guess that anyone is supposed to know about lists, hashes, balanced trees, etc., but I expect that there are domain specific data structures. Is there any book devoted to this subject?
The most fundamental data structure used in bioinformatics is the string. There is also a whole range of data structures for representing strings, and algorithms such as string matching depend on these efficient representations.
A comprehensive work on this is Dan Gusfield's Algorithms on Strings, Trees and Sequences
A lot of introductory books on bioinformatics will cover some of the basic structures you'd use. I'm not sure what the standard textbook is, but I'm sure you can find that. It might be useful to look at some of the language-specific books:
Bioinformatics Programming With Python
Beginning Perl for Bioinformatics
I chose those two as examples because they're published by O'Reilly, which, in my experience, publishes good quality books.
I just so happen to have the Python book on my hard drive, and a great deal of it talks about processing strings for bioinformatics using Python. It doesn't seem like bioinformatics uses any fancy special data structures, just existing ones.
Spatial indexing data structures (kd-trees, for example) are often used for nearest-neighbor queries over arbitrary feature vectors as well as for 3D protein structure analysis.
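For anyone who has not met a kd-tree before, the following is a rough sketch of building one and running a nearest-neighbor query against it. The nested-tuple representation, the function names, and the sample points are all illustrative assumptions; real projects would normally use a tested library implementation instead.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                   # median point becomes this node
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target (Euclidean distance)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or math.dist(point, target) < math.dist(best, target):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Search the far side only if the splitting plane is closer than the best match so far.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

# Illustrative 2D feature vectors; the same code works for 3D coordinates.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))
```

The pruning test in `nearest` is what makes the structure worthwhile: whole subtrees are skipped whenever they cannot contain anything closer than the current best.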
Best book for your $$ is Understanding Bioinformatics by Zvelebil because it covers everything from sequence analysis to structure comparison.
In addition to basic familiarity with the structures you mentioned, suffix trees (and suffix arrays), de Bruijn graphs, and interval graphs are used extensively. The Handbook of Computational Molecular Biology is very well written. I've never read the whole thing, but I've used it as a reference.
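As a small illustration of the suffix arrays mentioned above, here is a sketch that builds one naively and uses binary search to find every occurrence of a pattern. The naive construction is quadratic or worse and is only meant to show the idea; the function names and the short DNA string are made up for the example.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, suffix_array, pattern):
    """Binary-search the suffix array for every position where pattern occurs."""
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(suffix_array)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[start:lo])

genome = "ACGTACGTGA"  # illustrative sequence
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "ACG"))
```

All suffixes that start with the pattern sit next to each other in the sorted order, which is why two binary searches are enough to find them.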
I also highly recommend this book, http://www.comp.nus.edu.sg/~ksung/algo_in_bioinfo/
More recently, Python has become much more widely used in bioinformatics than Perl, so I really suggest you start with Python; it is widely used in my projects.
Many projects in bioinformatics involve combining information from different, semi-structured sources. RDF and ontologies are essential for much of this. See, for example, the bio2RDF project. http://bio2rdf.org/. A good understanding of identifiers is valuable.
Much bioinformatics is exploratory and rapid lightweight tools are often used. See workflow tools such as Taverna where the primary resource is often a set of web services - so HTTP/REST are common.
Whatever your mathematical or computational expertise is, you are likely to find an application in computational biology. If not, make this another question of stackoverflow and you'll be helped :o)
As mentioned in the other answers, somewhat timeless are string comparisons and pattern discovery in 1-dimensional data since sequences are so easy to get. With a renewed interest in medical informatics though you also have two/three-dimensional image analysis that you run e.g. against genomic data. With molecular biochemistry you also have pattern searches on 3D surfaces and molecular simulations. To study drug effects you will work with gene networks and compare those across tissues. Typical challenges for big data and information integration apply. And then, you need statistical descriptions of the likelihood of a pattern or the clinical association of any features identified to be found by chance.

Priority of learning programming craft and other suggestions [closed]

I am in the first year of my software development career (C++ & C#), and I now see my flaws and what I am missing in this field. Because of that, I drew some conclusions and made myself a plan to fill those gaps and increase my knowledge of software development. But after listing the tasks I need to do, I stumbled on a question whose answer is not obvious to me: what is the priority of those tasks? Here are the tasks, numbered in my order of priority:
Learning:
1. Functional programming (Scala)
2. Data structures & Algorithms (Cormen book to the rescue + TopCoder/ProjectEuler/etc)
3. Design patterns (GOF or Head First)
Do you agree with these tasks and priorities? Or am I missing something here? Any suggestions are welcome!
I think you have it backwards. Start with design patterns, which will help you reduce the amount of messy code you produce and better understand code written by other people (particularly libraries written with design patterns in mind).
In addition to the book of four, there are many other design pattern books -- Patterns of Enterprise Application Architecture, for example. It might be worth looking at them after you get a good grounding. But I also highly recommend Domain Driven Design, which I think gives you a way of thinking about how to structure your program, instead of just identifying pieces here and there.
Next you can go with algorithms. I prefer Skiena's The Algorithm Design Manual, whose emphasis is more on getting people to know how to select and use algorithms, as well as building them from well known "parts" than on getting people to know to make proofs about algorithms. It is also available for Kindle, which was useful to me.
Also, get a good data structures book -- people often neglect that. I like the Handbook of Data Structures and Applications, though I'm also looking into Advanced Data Structures.
However, I cannot recommend either TopCoder or Euler for this task. TopCoder is, imho, mostly about writing code fast. Nothing bad about it, but it's hardly likely to make a difference on day-to-day stuff. If you like it, by all means do it. Also, it's excellent preparation for job interviews with the more technically minded companies.
Project Euler, on the other hand, is much more targeted at scientific computing, computer science and functional programming. It will be an excellent training ground when learning functional programming.
There's something that has a bit of design patterns, algorithms and functional programming, which is Elements of Programming. It uses C++ for its examples, which is a plus for you.
As for functional programming, I think it is less urgent than the other two. However, I would recommend either Clojure or Haskell instead of Scala.
Learning functional programming in Scala is like learning Spanish in a latino neighborhood, while learning functional programming in Clojure is like learning Spanish in Madrid, and learning functional programming in Haskell is like learning Spanish in an isolated monastery in Spain. :-)
Mind you, I prefer Scala as a programming language, but I already knew FP when I came to it.
When you do get to functional programming, get Chris Okasaki's Purely Functional Data Structures, for a good grounding on algorithms and data structures for functional programming.
Beyond that, try to learn a new language every year. Even if not for the language itself, you are more likely to keep up to date with what people are doing nowadays.
Data structures and algorithms will help you no matter what language you use. I'd work on it first. Then design patterns (any OOP language will benefit from them). Functional programming is nice, but not necessarily a top priority.
Depends entirely on what you're doing.
I'd tailor which one you learn first to what would help you the most with your current job.
Write lots of code. Try to do it better every time. Occasionally work with more senior people, who can provide guidance, praise, and gentle correction.
I think that in general the topics that you have picked are very important, and may give you the chance to do something more than the usual boring stuff. However, I believe that the order should be something like this:
1. Data structures & Algorithms
2. Functional programming
3. Software Design
4. Specific technologies you need
My opinion is that algorithms and data structures should come first. It is very hard to study algorithms if you have a lot of other things in your head (good coding practices, lots of programming paradigms, etc.). Also, with time, people tend to become lazier and lose the patience to get into the ideas of this complex subject. On the other hand, missing some fundamental understanding of how things can be represented or how they operate may lead to serious flaws in understanding anything more sophisticated. So, assuming that you have some grasp of imperative programming (the usual stuff taught in introductory courses), you should build on it with algorithms and data structures.
It is important to have at least a basic understanding of other paradigms. Functional programming is a good example. You may also consider getting familiar with logic programming. Having a basic understanding of algorithms and data structures will help you a lot in understanding how such languages work. I don't know whether Scala is the best language for that purpose, but it will probably do. Alternatively, you can pick something more classic like Lisp or Scheme. Haskell is also an interesting language.
About design patterns: knowing them will help you with object-oriented design, but you should be aware that design patterns are just a set of solutions to common problems. Knowing design patterns is by no means the same as knowing how to design software. To improve your software design skills you should study other materials too. Good places to build an understanding of these concepts are the book Code Complete and the MIT course 6.170 (its materials are publicly available).
At some point you will need to get into the details of a specific framework (or frameworks) that you will need for what you do. Keep in mind, that such frameworks change, and you should be able to adapt, and learn new technologies. For instance, knowing ASP.NET MVC now, may be worthless 5 years from now (or may not be, who knows?).
Finally, keep in mind, that no matter what you read, you need to practice a lot, which means solving problems, writing code, designing software, etc. Most of these concepts can not be easily explained, or even expressed with words, so you will need to reach most of them by yourself, (that is, you will need to reinvent the wheel many times).
Good luck with your career!
I would think functional programming would be low in priority, since the languages you use are OO in nature. Spending some time on design patterns and on the specifics of the language itself would be more useful.
I read both GOF and Head First; Head First is probably the easier and more fun of the two, but much thicker. You should probably also look at enterprise design patterns, such as Martin Fowler's page http://martinfowler.com/eaaCatalog/
What field do you think you will work in? Games? Web? That will probably decide how important the algorithms part will be for you.
I would say that you first need to understand (even if not remember) the basic algorithms and data structures (use Knuth and Cormen), and then learn architecture (design patterns belong here).
Functional programming is just one type of programming and is not mandatory. There are many great programmers who do not use functional programming, but I assume that for every kind of programming you must first know the basics: algorithms and data structures.
I'd say #2 goes first, especially if you are planning to use C++/C# at work, having a good command of data structures and algorithms will give you some edge. I see #1 and #3 as somewhat parallel paths, but I do have a couple of suggestions: start with the Head First book for patterns, the GOF is more like a reference book and also the notation and language may get quite abstruse. As for functional programming, may I suggest Clojure instead of Scala? I'm convinced that a "functional-first" language (like F# or Clojure) will force you to think functional (a good thing) instead of just patching your O-O/imperative skills.

Learning Pascal FC

I'm looking for tutorials and examples on Pascal FC's channels and rendezvous mechanisms.
There is a nice introductory tutorial, but unfortunately it is in Spanish.
You may also take a look at the language's reference manual and user guide, but they are not suitable for learning the language from scratch.
I have not found any digital, freely accessible, introductory material in English yet.
I've been learning with a Spanish book on Concurrent Programming and there may be a bunch of books that explain topics on Concurrent Programming with Pascal-FC, but I have not checked them.
However, you might find the bibliographies of these papers useful:
Teaching concurrent programming with Pascal-FC
Pascal-FC: a language for teaching concurrent programming
You do not need to download the papers to see the bibliography, the list is shown on the webpage. However, there are some explanations and examples in the papers that might be useful to you, it would thus be nice if you could access those papers.
There is also another thorough list of books on Concurrent Programming which you may want to have a look at if you are really keen on learning Pascal-FC.

Genetic algorithm resource [closed]

Lately I'm interested in the topic of genetic algorithms, but I couldn't find any good resource. If you know any good resource, book or a site I would appreciate it. I have solid knowledge of algorithms and Artificial Intelligence but I'm looking for something with good introduction in Genetic Programming.
Best references for me so far:
Genetic Algorithms in Search, Optimization, and Machine Learning by David E. Goldberg: a classic, still considered the bible of GAs by many.
An Introduction to Genetic Algorithms by Melanie Mitchell: more recent than the previous reference and packed with probably more interesting examples.
A Field Guide to Genetic Programming by Poli, Langdon, McPhee: this is more of a hands on guide and is getting very good reviews.
Also, if you're an absolute beginner, I'd suggest you start with the Hello World of Genetic Algorithms. There's nothing like a nice clean example to get started.
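In the spirit of that kind of "hello world" example, here is a rough sketch of a genetic algorithm that tries to evolve a random string toward a fixed target. It is not the code from the linked article; the target, the parameter values, and the choice of truncation selection with elitism are all assumptions picked to keep the example short.

```python
import random
import string

TARGET = "hello world"
ALPHABET = string.ascii_lowercase + " "

def fitness(candidate):
    """Number of positions that already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(candidate, rate, rng):
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                   for c in candidate)

def crossover(a, b, rng):
    """Single-point crossover of two equal-length strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def evolve(pop_size=100, generations=500, rate=0.05, seed=0):
    """Return the best string found and the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET)
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if population[0] == TARGET:
            break
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0]]              # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = children
    population.sort(key=fitness, reverse=True)
    return population[0], history

best, history = evolve()
print(best, fitness(best), len(history))
```

Because the best individual is always carried over unchanged, the best fitness never gets worse from one generation to the next; how quickly it reaches the target depends on the mutation rate and population size, which are worth experimenting with.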
I found Melanie Mitchell's book, An Introduction to Genetic Algorithms, to be very good. For a wider coverage of evolutionary computation topics, Introduction to Evolutionary Computing by Eiben and Smith is also worthwhile.
If you're just starting out, I recently wrote an introductory article that may be of use.
There are further links both in that article and also on the home page for my evolutionary computation framework.
I know this is an old question, but no answer has been accepted yet, so I thought I'd add my own contribution. One of the best free resources in my opinion for all things related to evolutionary computation (genetic algorithms, evolution strategies, genetic programming, etc.) is Sean Luke's online book Essentials of Metaheuristics.
This is a nice free book on the subject
http://www.lulu.com/items/volume_63/2167000/2167025/2/print/book.pdf
There is a great introduction to genetic algorithms at AI-Junkie.com as well as tutorials on many other AI and machine learning techniques. The genetic algorithms tutorial is aimed to 'explain genetic algorithms sufficiently for you to be able to use them in your own projects' while keeping the mathematics down as much as possible.
Here is Roger Alsing's recent article about building "Mona Lisa's picture" with a genetic algorithm :http://rogeralsing.com/2008/12/07/genetic-programming-evolution-of-mona-lisa/
Edited to remove hot link to the picture See: http://rogeralsing.files.wordpress.com/2008/12/evolutionofmonalisa1.gif
I've implemented my own version of this algorithm:
See http://plindenbaum.blogspot.com/2008/12/random-notes-2008-12.html
Clever Algorithms: Nature-Inspired Programming Recipes
by Jason Brownlee PhD.
This book is available free as a PDF. It covers a large number of nature-inspired algorithms, including evolutionary, swarm, and neural algorithms.
A short introduction I wrote a long time ago is available here, but a better short introduction is here.
For a larger and comprehensive, although somewhat out-dated, list of resources visit the comp.ai.genetic FAQ.
If I may plug one of my favorite books, The Algorithm Design Manual by Steve Skiena has a great section on genetic algorithms (plus a lot of other interesting heuristics for solving various types of problems).
The book Programming Collective Intelligence from O'Reilly has a chapter covering genetic algorithms.
It might be a little bit too basic, but it was a very illustrative example.
Practical Genetic Algorithms
'An Introduction to Genetic Algorithms' http://www.burns-stat.com/pages/Tutor/genetic.html
For an introductory approach (with an application to the Prisoner's Dilemma), see:
http://www2.econ.iastate.edu/tesfatsi/holland.gaintro.htm
I implemented a Genetic Algorithm with java generics. https://github.com/juanmf/ga
It will apply the 3 operators (Mutation, crossing, Selection), and evolve a population, given the concrete implementations of Individual, Gen, FitnessMeter and factories exposed as spring beans.
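/* This is all you have to add to the Spring application context
 * before running the application.
 */
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// IndividualFactory, PopulationFactory, FitnessMeter, Team and TeamAptitudeMeter
// come from the project itself; import them from wherever they live in your build.
@Configuration
public class Config {

    // Factory that creates the individuals of the population.
    @Bean(name = "individualFactory")
    public IndividualFactory getIndividualFactory() {
        return new Team.TeamFactory();
    }

    // Factory that creates the population to evolve.
    @Bean(name = "populationFactory")
    public PopulationFactory getPopulationFactory() {
        return new Team.TeamPopulationFactory();
    }

    // Fitness function used to score each individual.
    @Bean(name = "fitnessMeter")
    public FitnessMeter getFitnessMeter() {
        System.out.println("getFitnessMeter");
        return new TeamAptitudeMeter();
    }
}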
This is the design, inside grandt there is an implementation of a specific problem solution, as an example.

Natural Language Processing in Ruby [closed]

I'm looking to do some sentence analysis (mostly for twitter apps) and infer some general characteristics. Are there any good natural language processing libraries for this sort of thing in Ruby?
Similar to Is there a good natural language processing library but for Ruby. I'd prefer something very general, but any leads are appreciated!
Three excellent and mature NLP packages are Stanford Core NLP, Open NLP and LingPipe. There are Ruby bindings to the Stanford Core NLP tools (GPL license) as well as the OpenNLP tools (Apache License).
On the more experimental side of things, I maintain a Text Retrieval, Extraction and Annotation Toolkit (Treat), released under the GPL, that provides a common API for almost every NLP-related gem that exists for Ruby. The following list of Treat's features can also serve as a good reference in terms of stable natural language processing gems compatible with Ruby 1.9.
Text segmenters and tokenizers (punkt-segmenter, tactful_tokenizer, srx-english, scalpel)
Natural language parsers for English, French and German and named entity extraction for English (stanford-core-nlp).
Word inflection and conjugation (linguistics), stemming (ruby-stemmer, uea-stemmer, lingua, etc.)
WordNet interface (rwordnet), POS taggers (rbtagger, engtagger, etc.)
Language (whatlanguage), date/time (chronic, kronic, nickel), keyword (lda-ruby) extraction.
Text retrieval with indexation and full-text search (ferret).
Named entity extraction (stanford-core-nlp).
Basic machine learning with decision trees (decisiontree), MLPs (ruby-fann), SVMs (rb-libsvm) and linear classification (tomz-liblinear-ruby-swig).
Text similarity metrics (levenshtein-ffi, fuzzy-string-match, tf-idf-similarity).
Not included in Treat, but relevant to NLP: hotwater (string distance algorithms), yomu (bindings to Apache Tika for reading .doc, .docx, .pages, .odt, .rtf, .pdf), graph-rank (an implementation of GraphRank).
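Several of the gems above compute string distances. To show what such a metric does under the hood, here is a minimal sketch of Levenshtein edit distance. It is written in Python purely for illustration and is not the API or implementation of levenshtein-ffi, hotwater, or any other gem listed here; the function name is made up.

```python
def levenshtein(a, b):
    """Edit distance between a and b; insertions, deletions and substitutions each cost 1."""
    previous = list(range(len(b) + 1))       # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]                        # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))
```

Keeping only the previous row of the dynamic-programming table keeps memory use proportional to the length of the second string, which matters little for tweets but is a common design choice.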
There are some things at Ruby Linguistics and some links therefrom, though it doesn't seem anywhere close to what NLTK is for Python, yet.
You can always use JRuby and use the Java libraries.
EDIT: The ability to do ruby natively on the jvm and easily leverage java libraries is a big plus for rubyists. This is a good option that should be considered in a situation like this.
Which NLP toolkit to use in JAVA?
I found an excellent article detailing some NLP algorithms in Ruby here. This includes stemmers, date time parsers and grammar parsers.
TREAT – the Text REtrieval and Annotation Toolkit – is the most comprehensive toolkit I know of for Ruby: https://github.com/louismullie/treat/wiki/
I maintain a list of Ruby Natural Language Processing resources (libraries, APIs, and presentations) on GitHub that covers the libraries listed in the other answers here as well as some additional libraries.
Also consider using SaaS APIs like MonkeyLearn. You can easily train text classifiers with machine learning and integrate via an API. There's a Ruby SDK available.
Besides creating your own classifiers, you can pick pre-created modules for sentiment analysis, topic classification, language detection and more.
We also have extractors like keyword extraction and entities, and we'll keep adding more public modules.
Other nice features:
You have a GUI to create/test algorithms.
Algorithms run really fast in our cloud computing platform.
You can integrate with Ruby or any other programming language.
Try this one
https://github.com/louismullie/stanford-core-nlp
About stanford-core-nlp gem
This gem provides high-level Ruby bindings to the Stanford Core NLP package, a set of natural language processing tools for tokenization, sentence segmentation, part-of-speech tagging, lemmatization, and parsing of English, French and German. The package also provides named entity recognition and coreference resolution for English.
http://nlp.stanford.edu/software/corenlp.shtml
demo page
http://nlp.stanford.edu:8080/corenlp/
You need to be much more specific about what these "general characteristics" are.
In NLP "general characteristics" of a sentence can mean a million different things - sentiment analysis (ie, the attitude of the speaker), basic part of speech tagging, use of personal pronoun, does the sentence contain active or passive verbs, what's the tense and voice of the verbs...
I don't mind if you're vague about describing it, but if we don't know what you're asking it's highly unlikely we can be specific in helping you.
My general suggestion, especially for NLP, is you should get the tool best designed for the job instead of limiting yourself to a specific language. Limiting yourself to a specific language is fine for some tasks where the general tools are implemented everywhere, but NLP is not one of those.
The other issue in working with Twitter is a great deal of the sentences there will be half baked or compressed in strange and wonderful ways - which most NLP tools aren't trained for. To help there, the NUS SMS Corpus consists of "about 10,000 SMS messages collected by students". Due to the similar restrictions and usage, analysing that may be helpful in your explorations with Twitter.
If you're more specific I'll try and list some tools that will help.
I would check out Mark Watson's free book Practical Semantic Web and Linked Data Applications, Java, Scala, Clojure, and JRuby Edition. He has chapters on NLP using java, clojure, ruby, and scala. He also provides links to the resources you need.
For people looking for something more lightweight and simple to implement this option worked well for me.
https://github.com/yohasebe/engtagger
