QA Algorithm for Q Processing - algorithm

What Algorithm/method do I use for a Question Answering System's Question Processing?
I have been searching possible algorithms for my Question Answering System, the only thing that I think that would be possible to use is Parsing but I have asked about parsing in my last question and with the answers there i think its not possible to be used?(I'm not sure).
My idea of using Parsing is by Cutting the question into pieces word per word and then it will go through a Storage of Words that would determine what Kind of Word(noun,adjective,verb,etc) is being said. My purpose of using Parsing is to remove or rather to determine the Topic of the question.
The other idea of mine is the ChatterBot. A Chatterbot uses a query of words? Correct me if I'm not mistaken and those words are assigned to another Word. It would randomly choose a word from its Query.
Example: User's Statement: Hello > ChatterBot's Possible Replies: Hi,Hello,Hey
I'm not quite sure what is the possible method/algorithm to use in a Question Answering, I have read the Wikipedia post : http://en.wikipedia.org/wiki/Question_answering but I do not quite understand what algorithm to use in Question Processing.
Thank you.
PS: I'm developing in Javascript. Q = Question

You could use a naive bayes classifier in order to look at the questions and determine their subject. You'd need a lot of training data and a fairly narrow domain.
The sophisticated responses to this problem involve a lot of machine inference techniques which are a bit out of my skill level to explain extremely well. My idea is to use a markov network in which each word has an edge to one or two words next to it. A series of tests are applied to each word which indicate likely memberhood of that word to one of its possible meanings (For example, Mark is more likely a name if it's capitalized, but if the next word is 'a' it probably is used in the sense of a verb.) From there the machine can attempt to determine the actual meaning of the sentence, which will rely on the use of, again, unimaginably large amounts of training data.
Coursera's Probabilistic Graphical Models class (Probably their NLP class too) would probably be the best resource if you're interested in becoming skilled in this area. (PGM is the only reason I know anything about this!)

here's a great book, you may need to read to get a lot of stuff related to NLP, and Question answering systems http://www.amazon.com/Speech-Language-Processing-2nd-Edition/dp/0131873210
the book has a full section (V.Applications) that will help you a lot to develop a good system.
but note that the book is discussing theories and algorithms only (no code)
it's not about parsing text only, you'll need to understand the context to provide better answer. actually you need to extract some keywords and ignore everything else.
also you may read in topics Keywords (Bag of words), algorithms like (TF/IDF).

Related

Do I need to remember the hard-code for different sorts?

I am currently a freshmen in college doing some self-studying about the different sorting algorithms.
My studying source does provide the codes and I did some practice on it (coding the sorting base on the concept it). At the moment, I can provide selection sort with just a little bit of trouble (coding that is).
Am I required to memorize the codes? I know the difference between the sorts and the concept behind it. Do I need to memorize the Pseudocodes behind it as well? Will interviewers ever ask you to produce the codes on the spot?
There is no need to memorize the exact code syntax , but it would be important to understand the logic behind the sorting algorithms (ie. be able to explain using pseduocode).
I've been asked in interview how to do some basic sorting algorithms like a bubble sort, but nothing very complex. I was not required to write the exact "code" in any particular language, but just prove that i know the logic and can explain how it works.
Hello and welcome on Stack Overflow. I'm answering your post below, even though it will be closed as too broad or primarly opinion-based, because this QA site is oriented towards coding questions. You might want to ask this on another QA site, like programmers stackexchange.
Do I need to memorize the Pseudocodes behind it as well?
not really, as in every decent language you'll have a standard library that offers you state of the art implementations, and what you really need to remember is the complexity and mechanism of each sort algorithm, to choose the best fit for your dataset, when you need it.
And otherwise, when you really need to dig again in the pseudocodes, there are books (like the Art of Computer Programming by Donald Knuth), and wikipedia, and many other resources online.
Will interviewers ever ask you to produce the codes on the spot?
Yes they will. It happened to me at least five times. But most of the time, they'll understand that you might not remember the full pseudocode on the spot, but they expect you to know the mechanism and complexity, and be able to reinvent the algorithm on the spot.
Though, when you do interviews, you're usually compared to other people passing the same interview, and between two candidates that passes, they'll choose the one that did the best on the tests… And then you might loose a job opportunity because someone else remembered better those algorithms.

Evaluating language identification methods

Part of my thesis work is to evaluate number of language detection methods that are already available and then finally implement one them.
For this I have chosen the following methods,
N-Gram-Based Text Categorization by Cavnar and Trenkle
Statistical Identification of Language by Ted Dunning
Using compression-based language models for text categorization by Teahan and Harper
Character Set Detection
A composite approach to language/encoding detection
I have to first evaluate the methods and preferably present a table with accuracy for each of these methods. My question is that in order to find the accuracy of each of these methods, do I need to go ahead a build the language models using training data, then test them and record the accuracy or is there any other approach that I can follow here. Though most of the researches already include these accuracy tables, I am not sure if it's accepted in my education to simply grab it and present in the report.
Appreciate any thoughts on this.
I would also suggest asking your thesis advisor. Implementing all of them will be a lot of work, and it is very difficult to really compare them without being able to test them. If I remember correctly the last three have not been well evaluated in the literature, so it would be difficult to compare their results. I have implemented (and evaluated) only the first one of those myself. One big question is also how big a part of your thesis this LI evaluation and implementation is?

Techniques for calculating adjective frequency [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
I need to calculate word frequencies of a given set of adjectives in a large set of customer support reviews. However I don't want to include those that are negated.
For example suppose my list of adjectives was: [helpful, knowledgeable, friendly]. I want to make sure "friendly" isn't counted in a sentence such as "The representative was not very friendly."
Do I need to do a full NLP parse of the text or is there an easier approach? I don't need super high accuracy.
I'm not at all familiar with NLP. I'm hoping for something that doesn't have such a steep learning curve and isn't so processor intensive.
Thanks
If all you want is adjective frequencies, then the problem is relatively simple, as opposed to some brutal, not-so-good machine learning solution.
Wat do?
Do POS tagging on your text. This annotates your text with part of speech tags, so you'll have 95% accuracy or more on that. You can tag your text using the Stanford Parser online to get a feel for it. The parser actually also gives you the grammatical structure, but you only care about the tagging.
You also want to make sure the sentences are broken up properly. For this you need a sentence breaker. That's included with software like the Stanford parser.
Then just break up the sentences, tag them, and count all things with the tag ADJ or whatever tag they use. If the tags don't make sense, look up the Penn Treebank tagset (Treebanks are used to train NLP tools, and the Penn Treebank tags are the common ones).
How?
Java or Python is the language of NLP tools. Python, use NLTK. It's easy, well documented and well understood.
For Java, you have GATE, LingPipe and the Stanford Parser among others. It's a complete pain in the ass to use the Stanford Parser, fortunately I've suffered so you do not have to if you choose to go that route. See my google page for some code (at the bottom of the page) examples with the Stanford Parser.
Das all?
Nah, you might want to stem the adjectives too- that's where you get the root form of a word:
cars -> car
I can't actually think of a situation where this is necessary with adjectives, but it might happen. When you look at your output it'll be apparent if you need to do this. A POS tagger/parser/etc will get you your stemmed words (also called lemmas).
More NLP Explanations
See this question.
It depends on the source of your data. If the sentences come from some kind of generator, you can probably split them automatically. Otherwise you will need NLP, yes.
Properly parsing natural language pretty much is an open issue. It works "largely" for English, in particular since English sentences tend to stick to the SVO order. German for example is quite nasty here, as different word orders convey different emphasis (and thus can convey different meanings, in particular when irony is used). Additionally, German tends to use subordinate clauses much more.
NLP clearly is the way to go. At least some basic parser will be needed. It really depends on your task, too: do you need to make sure every one is correct, or is a probabilistic approach good enough? Can "difficult" cases be discarded or fed to a human for review? etc.

Artificial Intelligence/Rules to guess user taste in Apparel/Clothing

Are there standard rules engine/algorithms around AI that would predict the user taste on a particular kind of product like clothes.
I know it's one thing all e-commerce website will kill for. But I am looking out for theoretical patterns defined out there which would help make that prediction in a better way, if not accurately.
Two books that cover recommender systems:
Programming Collective Intelligence: Python, does a good job explaining the algorithm, but doesn't provide enough help IMO in terms of understanding how to scale.
Algorithms of the Intelligent Web: Java, harder to follow, but also covers using persistence, in this case MySQL, to facilitate scaling and identifiers areas in example code that will not scale as-is.
Basically two ways of approaching the problem, user or item based. Netflix appears to use the former, while Amazon the latter. Typically user based requires more time and/or processing power to generate recommendations because you tend to have more users than items to consider.
Not sure how to answer this, as this question is overly broad. What you are describing is a Machine Learning kind of task, and thus would fall under that (very broad) umbrella. There are a number of different algorithms that can be used for something like this, but most texts would tell you that the definition of the problem is the important part.
What parts of fashion are important? What parts are not? How are you going to gather the data? How noisy is the data? All of these are important considerations to the problem space. Pandora does a similar type of thing with music, with their big benefit being that their users tell them initially what they like and don't like.
To categorize their music, they actually have trained musicians listening to the music to identify all sorts of stuff. See the article on Ars Technica here for more information about that. Based on what I know about fashion tastes, I would say that it is a similar problem space, and would probably require experts to "codify" the information before you could attempt to draw parallels.
Sorry for the vague answer - if you want more specifics, I would recommend asking a more specific question, about specific algorithms or data sets, etc.

Basic programming/algorithmic concepts [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I'm about to start (with fellow programmers) a programming & algorithms club in my high school. The language of choice is C++ - sorry about that, I can't change this. We can assume students have little to no experience in the aforementioned topics.
What do you think are the most basic concepts I should focus on?
I know that teaching something that's already obvious to me isn't an easy task. I realize that the very first meeting should be given an extreme attention - to not scare students away - hence I ask you.
Edit: I noticed that probably the main difference between programmers and beginners is "programmer's way of thinking" - I mean, conceptualizing problems as, you know, algorithms. I know it's just a matter of practice, but do you know any kind of exercises/concepts/things that could stimulate development in this area?
Make programming fun!
Possible things to talk about would be Programming Competitions that either your club could hold itself or it could enter in locally. I compete in programming competitions at the University (ACM) level and I know for a fact that they have them at lower levels as well.
Those kind of events can really draw out some competitive spirit and bring the club members closer.
Things don't always have to be about programming either. Perhaps suggest having a LAN party where you play games, discuss programming, etc could be a good idea as well.
In terms of actual topics to go over that are programming/algorithm related, I would suggest as a group attempting some of these programming problems in this programming competition primer "Programming Challenges": Amazon Link
They start out with fairly basic programming problems and slowly progress into problems that require various Data Structures like:
Stacks
Queues
Dictionaries
Trees
Etc
Most of the problems are given in C++.
Eventually they progress into more advanced problems involving Graph Traversal and popular Graph algorithms (Dijkstra's, etc) , Combinatrics problems, etc. Each problem is fun and given in small "story" like format. Be warned though, some of these are very hard!
Edit:
Pizza and Soda never hurts either when it comes to getting people to show up for your club meetings. Our ACM club has pizza every meeting (once a month). Even though most of us would still show up it is a nice ice breaker. Especially for new clubs or members.
Breaking it Down
To me, what's unique about programming is the need to break down tasks into small enough steps for the computer. This varies by language, but the fact that you may have to write a "for loop" just to count to 100 takes getting used to.
The "top-down" approach may help with this concept. You start by creating a master function for your program, like filterItemsByCriteria();
You have no idea how that will work, so you break it down into further steps:
(Note: I don't know C++, so this is just a generic example)
filterItemsByCritera() {
makeCriteriaList();
lookAtItems();
removeNonMatchingItems();
}
Then you break each of those down further. Pretty soon you can define all the small steps it takes to make your criteria list, etc. When all of the little functions work, the big one will work.
It's kind of like the game kids play where they keep asking "why?" after everything you say, except you have to keep asking "how?"
Linked lists - a classic interview question, and for good reason.
I would try to work with a C subset, and not try to start with the OO stuff. That can be introduced after they understand some of the basics.
Greetings!
I think you are getting WAY ahead of yourself in forcing a specific language and working on specific topics and a curriculum.. It sounds like you (and some of the responders) are confusing "advising a programming club" with "leading a programming class". They are very different things.
I would get the group together, and the group should decide what exactly they want to get out of the club. In essence, make a "charter" for the club. Then (and only then) can you make determinations such as preferred language/platform, how often to meet, what will happen at the meetings, etc.
It may turn out that the best approach is a "survey", where different languages/platforms are explored. Or it may turn out that the best approach is a "topical"one, where there topic changes (like a book club) on a regular basis (this month is pointers, next month is sorting, the following is recursion, etc.) and then examples and discussions occur in various languages.
As an aside, I would consider a "language-agnostic" orientation for the club. Encourage the kids to explore different languages and platforms.
Good luck, and great work!
Well, it's a programming club, so it should be FUN! So I would say dive into some hand on experience right away. Start with explaining what a main() method is,then have students write a hello world program. Gradually improve the hello world program so it has functions and prints out user inputs.
I would say don't go into algorithm too fast for beginners, let them play with C++ first.
Someone mentioned above, "make programming fun". It is interesting today that people don't learn for the sake of learning. Most people want instant gratification.
Teach a bit of logic using Programming. This helps with(and is) problem solving. The classing one I have in my head are guessing games.
Have them make a program that guesses at a number between 0 and 100.
Have them make a black jack clone ... I have done this in basic :-(
Make paper instructions.
Explain the "Fried eggs" story. Ask the auditory what they would do to make themselves fried eggs. Make them note the step they think about. Probably you will receive less than 5 steps algorithm. Then explain them how many steps should be written down if we want to teach a computer to fry eggs. Something like:
1) Go to the Fridge
2) Open the fridge door
3) Search for eggs
4) If there are no eggs - go to the shop to buy eggs ( this is another function ;) )
5) If there are eggs - calculate how many do you need to fry
6) Close the fridge door
7) e.t.c. :)
Start with basics of C - syntax semantics e.t.c, and in parallel with that explain the very basic algorithms like bubble sort.
After the auditory is familiar with structured programming (this could take several weeks or months, depending how often you make the lessons), you can advance to C++ and OOP.
The content in Deitel&Deitel's C++ programming is a decent introduction, and the exercises proposed at the end of each chapter are nice toy problems.
Basically, you're talking about:
- control structures
- functions
- arrays
- pointers and strings
You might want to follow up with an introduction to the STL ("ok, now that we've done it the hard way... here's a simpler option")
Start out by making them understand a problem like for instance sorting. This is very basic and they should be able to relate quite fast. Once they see the problem then present them with the tools/solution to solve it.
I remember how it felt when I first was show an example of merge-sort. I could follow all the steps but what the hell was I for? Make then crave a solution to a problem and they will understand the tool and solution much better.
start out with a simple "hello world" program. This introduces fundamentals such as variables, writing to a stream and program flow.
Then add complexity from there (linked lists, file io, getting user input, etc).
The reason I say start with hello world is because the kid will get to see a running program really quick. It's nearly immediate feedback-as they will have written a running program right from the start.
IMO, Big-O is one of the more important concepts for beginning programmers to learn.
Have a debugging contest. Provide code samples that include a bug. Have a contest to see who can find the most or fastest.
There is an excellent book, How Not to Program in C++, that you could use to start with.
You always learn best from mistakes and I prefer to learn from some else's.
It will also let those with little experience learn by see code, even if the code only almost works.
In addition to the answers to this question, there are certain important topics to cover. Here's an example of how you could structure the lessons.
First Lesson: Terminology and Syntax
Terminology to cover: variable, operator, loop (iteration), method, reserved word, data type, class
Syntax to cover: assignment, operation, if/then/else, for loop, while loop, select, input/output
Second Lesson: Basic Algorithm Construction
Cover a few simple algorithms, involving some input, maybe a for or a while loop.
Third Lesson: More Advanced Algorithm Topics
This is for things like recursion, matrix manipulation, and higher-level mathematics. You don't have to get into too complex of topics, but introduce enough complexity to be useful in a real project.
Final Lesson: Group Project
Make a project that groups can get involved in doing.
These don't have to be single day lessons. You can spread the topics across multiple days.
Pseudocode should be a very first.
Edit: If they are total programming beginners then I would make the first half just about programming. Once you get to a level where talking about algorithms would make sense then pseudocode is really important to get under the nails.
Thanks for your replies!
And how would you teach them actual problem solving?
I know a bunch of students that know C++ syntax and a few basic algorithms, but they can't apply the knowledge they know when they solve real problems - they don't know the approach, the way to transcribe their thoughts into a set of strict steps. I do not talk about 'high-level' approaches like dynamic programming, greedy etc., but about basic algorithmic mindset.
I assume it's just because of the poor learning process they were going through. In other sciences - math, for example - they are really brilliant.
Just because you are familiar with algorithms does not mean you can implement them and just because you can program does not mean you can implement an algorithm.
Start simple with each topic (keep programming separate from designing algorithms). Once they have a handle on each, slowly start to bring the two concepts together.
Wow. C++ is one of the worst possible languages to start with, in terms of the amount of unrelated crap you need to get anything working (Java would be slightly worse, I guess).
When teaching beginners in a boilerplate-heavy environment, it's usual to start with "here's a simple C program. We'll discuss what all this crap at the top of the file is for later, but for now, concentrate on the lines between 'int main(void)' and the 'return' statement, which is where all the useful work is accomplished".
Once you're past that point, basic concepts to cover include the basic data structures (arrays, linked lists, trees, and dictionaries), and the basic algorithms (sorting, searching, etc).
Have your club learn how to actually program in any language by teaching the concepts of building software. Instead of running out an buying a dozen licenses for Visual Studio, have students use compilers, make systems, source files, objects and librarys in order to turn their C code into programs. I feel this is truly the beginning and actually empowers these kids to understand how to make software on any platform, without crutches that many educational institutions like to rely on.
As for the language of choice - congratulations - you'll find C++ is very rich in making you think of mathematical shortcuts and millions of ways to make your code perform even better (or to implement fancy patterns).
To the question: When I was beggining to program I would always try to break down one real life problem into several steps and then as I see similarity between tasks or data they transform I would always try to find a lazier, easier, meanier way to implement it.
Elegance came after when learning patterns and real algorithms.
Hank: Big O??? you mean tell beginning programmers that their code is of O(n^2) and yours is of n log n ??
I could see a few different ways to take this:
1) Basic programming building blocks. What are conditional statements, e.g. switch and if/else? What are repetition statements, e.g. for and while loops? How do we combine these to get a program to be the sequence of steps we want? You could take something as easy as adding up a grocery bill or converting temperatures or distances from metric to imperial or vice versa. What are basic variable types like a string, integer, or double? Also in here you could have Boolean Algebra for an advanced idea or possibly teach how to do arithmetic in base 2 or 16 which some people may find easy and others find hard.
2) Algorithmically what are similar building blocks. Sorting is a pretty simple topic that can be widely discussed and analysed to try to figure out how to make this faster than just swapping elements that seem out of order if you learn the Bubblesort which is the most brain dead way to do.
3) Compiling and run-time elements. What is a call stack? What is a heap? How is memory handled to run a program,e.g. the code pieces and data pieces? How do we open and manipulate files? What is compiling and linking? What are make files? Some of this is simple, but it can also be eye-opening just to see how things work which may be what the club covers most of the time.
These next 2 are somewhat more challenging but could be fun:
4) Discuss various ideas behind algorithms such as: 1) Divide and conquer, 2) Dynamic programming, 3) Brute force, 4) Creation of a data structure, 5) Reducing a problem to a similar one already solved for example Fibonacci numbers is a classic recursive problem to give beginning programmers, and 6) The idea of being, "greedy," like in a making change example if you were in a country where coin denominations where a,b, and c. You could also get into some graph theory examples like a minimum weight spanning tree if you want something somewhat exotic, or the travelling salesmen for something that can be easy to describe but a pain to solve.
5) Mathematical functions. How would you program a factorial, which is the product of all numbers from 1 to n? How would you compute the sums of various Arithmetic or Geometric Series? Or compute the number of Combinations or Permutations of r elements from a set of n? Given a set of points, approximate the polynomial that meets this requirement, e.g. in a 2-dimensional plane called x and y you could give 2 points and have people figure out what are the slope and y intercept if you have solved pairs of linear equations already.
6) Lists which can be implemented using linked lists and arrays. Which is better for various cases? How do you implement basic functions such as insert, delete, find, and sort?
7) Abstract Data Structures. What are stacks and queues? How do you build and test classes?
8) Pointers. This just leads to huge amounts of topics like how to allocate/de-allocate memory, what is a memory leak?
Those are my suggestions for various starting points. I think starting a discussion may lead to some interesting places if you can get a few people together that don't mind talking on the same subject week after week in some cases as sorting may be a huge topic to cover well if you want to get into the finer points of things.
You guys could build the TinyPIM project from "C++ Standard Library from Scratch" and then, when it's working, start designing your own extensions.

Resources