Dynamic text-pattern detection algorithm? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 11 years ago.
I was wondering whether such an algorithm exists. I have a bunch of text documents and would like to find a pattern among all of them, if one exists. Please note that I'm NOT trying to classify the documents; all I want to do is find a pattern, if it exists, among some documents. Thanks!

The question as it stands is rather vague: you need to know what you are looking for in order to be able to find it.
Some ideas that may be of use:
1. Get n-gram counts for each document separately for n = 1, 2, 3, 4 and then compare the frequencies of each n-gram across the documents. This should help you find phrases that commonly occur across all documents.
2. Use a part-of-speech tagger to convert all the docs into streams of POS tags, and then do the same as in 1.
3. Use PCFG software such as the Stanford Parser to get parse trees for all the sentences across all the documents, and then try to figure out how similar the distributions of sentence structures are for different documents.
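As a rough illustration of the first idea, here is a minimal Python sketch. It assumes whitespace tokenization and lowercasing, and the helper names `ngrams` and `shared_ngrams` are made up for this example rather than taken from any library. It keeps only the n-grams that occur in at least `min_docs` documents (all of them by default).

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def shared_ngrams(documents, n, min_docs=None):
    """Map each n-gram found in at least `min_docs` documents to its total count.

    Assumption: documents are plain strings, tokenized by lowercasing
    and splitting on whitespace.
    """
    if min_docs is None:
        min_docs = len(documents)  # default: must occur in every document
    doc_freq = Counter()  # number of documents containing the n-gram
    total = Counter()     # total occurrences across all documents
    for text in documents:
        grams = ngrams(text.lower().split(), n)
        total.update(grams)
        doc_freq.update(set(grams))
    return {g: total[g] for g, d in doc_freq.items() if d >= min_docs}


docs = [
    "the quick brown fox",
    "a quick brown dog",
    "quick brown cats sleep",
]
print(shared_ngrams(docs, 2))
```

The same function could be run on POS-tag streams instead of words for the second idea, provided the tagger output is joined into a space-separated string first.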

Related

Tag Generating Algorithm [closed]

Closed 9 years ago.
I'm trying to think of an algorithm that can search through a piece of text looking for keywords. For example, I have an array of words:
Sample=['Andy' 'Murray' 'is' 'expecting' 'a' 'difficult' 'test' 'when' 'he' 'faces' 'David' 'Ferrer' 'in' 'the' 'final' 'of' 'the' 'Sony' 'Open' 'on' 'Sunday'];
I want to pick out the important words like "Andy, Murray, David, Ferrer, Sunday, Open, Final" etc., but my knowledge of the technical side of English is limited, so I don't know which types of words I should be ignoring.
Are there any other good methods of finding tags from text that you can suggest? Or do you know which types of words I should be ignoring?
P.S. I would prefer any code to be in C++, but that's not a requirement :)
The classic approach in the field of Information Retrieval is the tf-idf model.
The tf (term frequency) component indicates how many times each term occurs in the document or sentence: the more often, the 'better', since repetition suggests the term is important to the text.
The idf (inverse document frequency) component reflects how many documents in the collection contain the term: the fewer documents, the more significant the word. A rare word that appears in a text helps you distinguish that document from the others. For intuition, the word 'the' will most likely say nothing about the document, and the idf value makes sure its weight is small.
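To make this concrete, here is a small Python sketch (Python rather than C++ only for brevity). It assumes one common variant of the weighting, tf = count / document length and idf = log(N / df); other variants exist, and the function names `tf_idf` and `top_tags` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_tags(doc_weights, k=3):
    """Pick the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    ["andy", "murray", "faces", "david", "ferrer", "in", "the", "final"],
    ["the", "weather", "in", "miami", "is", "hot"],
    ["the", "final", "score"],
]
weights = tf_idf(docs)
print(top_tags(weights[0], k=4))
```

Note that a word occurring in every document (here 'the') gets weight zero, so with a reasonably large collection the common function words drop out without a hand-written list of words to ignore.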

Algorithm for News Ranking? [closed]

Closed 10 years ago.
I am building a formula that ranks news items in order to produce a list of hot news.
The available factors include page views, time, and content.
What would be a solution to this problem?
Thanks
You can try a machine learning approach for this problem:
1. Extract your features, and give each a numeric value (you can use the Bag of Words model for content). Note that some feature selection algorithm might be needed.
2. Manually label a large enough set of examples, giving each of them a score according to its importance.
3. Use linear regression to build a function that evaluates each article and gives it a score.
4. Now that you have your regression function, use it to compute a raw score for each article.
5. For post-processing, combine this raw score with the time in a second function to get the article's final score. @MattBall's suggested link seems like a reasonable approach.
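A rough sketch of this pipeline in Python, using plain gradient descent so that it has no dependencies. Everything specific here is an assumption made for illustration: the two toy features, the learning rate and epoch count, and especially the exponential half-life decay in `final_score`, which is just one possible "second function" and not necessarily what the linked suggestion uses.

```python
def predict(x, weights, bias):
    """Raw score of one article: weighted sum of its features plus a bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def fit_linear_regression(features, scores, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on mean squared error."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, scores):
            error = predict(x, weights, bias) - y
            for j in range(n_features):
                grad_w[j] += 2 * error * x[j] / n_samples
            grad_b += 2 * error / n_samples
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


def final_score(raw_score, age_hours, half_life_hours=24.0):
    """Hypothetical post-processing: halve the score every `half_life_hours`."""
    return raw_score * 0.5 ** (age_hours / half_life_hours)


# Toy, hand-labelled examples: [scaled page views, keyword hit] -> importance.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y = [3.0, 4.0, 6.0, 1.0]
weights, bias = fit_linear_regression(X, y)

raw = predict([1.0, 1.0], weights, bias)
print(final_score(raw, age_hours=24))
```

In practice the features should be scaled to comparable ranges before fitting, otherwise a fixed learning rate like the one above may fail to converge.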

How to create an efficient auto-complete? [closed]

Closed 11 years ago.
I want to implement a "text" suggestion feature.
I have a huge amount of data; how can I implement an efficient and scalable auto-complete?
Edit 1:
I have a MySQL table with one client per row and a 'name' column. I want to create a suggest feature for searching client names (like Google Suggest, but with client names instead of queries). I have a huge number of rows; how can I design an efficient suggest?
When the user starts typing inside a text input, I want to display possible client names.
OK, I think I understand what you're looking for and here are some possible solutions for you:
What is the best autocomplete/suggest algorithm, data structure [C++/C] (the answers are generic enough despite the fact that it's a C/C++ question)
How to implement autocomplete on a massive dataset
Autocomplete using a trie
Algorithm for autocomplete?
Trie based addressbook and efficient search by name and contact number
How do you autocomplete names containing spaces?
Essentially, it seems like you're looking for auto-complete functionality (if I understood your question correctly). Along those lines, the above questions and their answers also provide a lot of references on how to do more complex suggestions (i.e. based on content, semantics, intent, etc.).
This will probably not address your question if you're looking for an algorithm that makes "related" suggestions, e.g.:
"water" may suggest kool-aid, gatorade, vitamin water.
"sea" may suggest ocean, lake, river
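For the plain prefix case (client names), several of the questions linked above mention a trie. Here is a minimal Python sketch of that idea. It assumes the names fit in memory and that matching should be case-insensitive; the class and method names are made up for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.names = []     # original names that end exactly at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name):
        """Index a name under its lowercased characters."""
        node = self.root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.names.append(name)

    def suggest(self, prefix, limit=10):
        """Return up to `limit` names starting with `prefix`, alphabetically."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []  # no name starts with this prefix
        results = []
        stack = [node]
        while stack and len(results) < limit:
            current = stack.pop()
            results.extend(current.names)
            # Push children in reverse order so they are popped alphabetically.
            for ch in sorted(current.children, reverse=True):
                stack.append(current.children[ch])
        return results[:limit]


trie = Trie()
for client in ["Anna Smith", "Andrew Jones", "Andy Murray", "Bob Stone"]:
    trie.insert(client)

print(trie.suggest("an"))
```

Lookup cost depends on the prefix length and the number of results collected, not on the total number of names, which is why this structure tends to come up for large datasets. Names containing spaces work as well, since a space is treated like any other character.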

Matching jobs to applicants [closed]

Closed 11 years ago.
I have a big database of jobs; each job has numerical and non-numerical attributes (position, field, salary, required experience...),
and applicants who specify some attributes of their own (age, expected salary...).
I want to create an application that automatically matches jobs with the appropriate candidates. What is the best algorithm (data mining or artificial intelligence) to apply in order to implement this app?
Thanks for your replies.
It seems that what you want is a recommendation algorithm, not a matching algorithm.
There is no single best recommendation algorithm that works for all cases. You should look into several algorithms and select the one that best suits your situation. I recommend looking at Apache Mahout, an open-source library that implements many such recommendation algorithms.

Algorithm to understand meaning [closed]

Closed 11 years ago.
I want to know if there is any specific algorithm that can be followed to understand the meaning of a word, sentence, or paragraph. Basically, I want to write a program that takes a text or paragraph as input, tries to find out what it means, and thereby highlights the emotions within the text.
Also, if there is an algorithm to understand things, can the same algorithm be applied to itself? That reduces the quest further, to the point where we become interested in knowing the meaning of meaning, or rather the definition of definition.
You want Natural Language Processing and Semantic Technology. This is still a flourishing area in computer science. Look at things such as a semantic reasoner; you can start with Jena. There are also other things you can look at, such as academic theses and papers.

Resources