Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more.
Closed 5 years ago.
I need a simple random English sentence generator. I need to populate it with my own words, but it needs to be capable of making longer sentences that at least follow the rules of English, even if they don't make sense.
I expect there are millions of them out there, so rather than re-inventing the wheel, I'm hoping you know of a source for one, or a resource that will give me enough information that I don't have to hunt down my rusty English skills.
You're looking for an implementation of Markov chains for English sentences.
A quick Google search for "markov chain sentence generator" returned:
http://www.jwz.org/dadadodo/
http://code.google.com/p/lorem-ipsum-generator/
http://kartoffelsalad.googlecode.com/svn-history/r9/trunk/lib/markov.py
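To make the idea concrete, here is a minimal word-level Markov chain sketch in Python. It is not taken from any of the linked projects. The function names `build_chain` and `generate`, the tiny corpus, and the two-word state (`order=2`) are assumptions for illustration; feed it your own text to get your own vocabulary.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=20, seed=None):
    """Random walk over the chain, stopping early at a state with no recorded successor."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)  # .get() avoids inserting empty entries into the defaultdict
        if not followers:
            break
        # Duplicates in `followers` make frequent continuations more likely to be picked.
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

corpus = "the cat sat on the mat and the dog sat on the rug and the cat saw the dog"
chain = build_chain(corpus, order=2)
print(generate(chain, length=12, seed=1))
```

A higher `order` gives more fluent but less varied output. Because each word depends only on the previous `order` words, the result reads like English locally without any guarantee of being grammatical as a whole.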
I know this is an old question, but as I found it via Google I think it's worth mentioning something like Context Free Grammars in addition to Markov Chains.
My understanding of Markov Chains is that they create the "next" item probabilistically only according to what the "current" item is. Perhaps I'm mistaken, but I don't see how this would ensure that the result would follow grammatical rules.
For instance, I usually see Markov Chains suggested as a way of creating "English sounding" words. If you create a Markov chain using letters from a dataset of English words, the output would be a word that sounds English, but it wouldn't necessarily be a real word. I believe the same would be true of sentences: you can generate something which may sound OK, but it may not be a grammatically correct sentence.
Context Free Grammars (or possibly also Regular Grammars?) might be a better candidate, since they generate sentences according to a defined ruleset, and it would be easy to populate them with your own words, as the original question requests. The downside is that you need to define these rules yourself, rather than relying on a dataset. It's been a long time since I've used a grammar to generate an English sentence, so I don't remember how hard it was to get good, varied responses.
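A minimal sketch of the grammar-based approach in Python, assuming a toy grammar whose rules and vocabulary are invented here. Put your own words under the terminal symbols (`Det`, `Adj`, `N`, `V`, `P`). The `max_depth` limit is an added assumption so that recursive rules such as `NP -> NP PP` always terminate.

```python
import random

# Toy context-free grammar: each non-terminal maps to its alternative expansions.
# Any symbol that is not a key is a terminal (an actual word).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "Adj", "N"], ["NP", "PP"]],
    "VP":  [["V", "NP"], ["VP", "PP"]],
    "PP":  [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "Adj": [["quick"], ["lazy"], ["green"]],
    "N":   [["fox"], ["dog"], ["teapot"]],
    "V":   [["sees"], ["chases"]],
    "P":   [["with"], ["under"]],
}

def expand(symbol, rng, depth=0, max_depth=6):
    """Recursively rewrite `symbol` until only terminal words remain."""
    if symbol not in GRAMMAR:
        return [symbol]
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, drop directly recursive rules (e.g. NP -> NP PP)
        # so expansion bottoms out for this grammar.
        options = [rule for rule in options if symbol not in rule] or options
    words = []
    for part in rng.choice(options):
        words.extend(expand(part, rng, depth + 1, max_depth))
    return words

def generate_sentence(seed=None):
    rng = random.Random(seed)
    sentence = " ".join(expand("S", rng))
    return sentence[0].upper() + sentence[1:] + "."

for seed in range(3):
    print(generate_sentence(seed))
```

Every sentence is derived from `S -> NP VP`, so the output is always grammatical with respect to this grammar, even when it is nonsense ("A green teapot chases the lazy fox").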
You might be able to use/modify part of the CS Paper Generator.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
One of my tasks is to analyze source code. The chunks that usually take the most time to understand are those where the developer uses confusing terms for variables or methods, like...
myfavoritething=json.dumps(url) (what is this variable? what is the purpose? apparently the user is encoding the url for some dark purpose...)
public void getName(values, result) { ... (this was not a getter, the function has no return value, but it calculates a result from the input values)
POST /loadService (which does not load anything, just returns a lot of variables)
Given that this is a quite common issue, is there a technical term for this coding-style bad habit?
Bad naming is considered a so-called code smell. The term was coined by Kent Beck.
In his book Refactoring: Improving the Design of Existing Code (Second Edition), Martin Fowler describes patterns that help with refactoring specific flaws in code.
He also dedicates a chapter to code smells which are closely related to refactoring.
Getting back to your question:
Given that this is a quite common issue, is there a technical term for this coding-style bad habit?
Since I consider this book one of the top sources on this topic, I would say the term that comes closest to what you are referring to is the Mysterious Name code smell, as described in Fowler's book:
Mysterious Name
Puzzling over some text to understand what’s going on is a great thing if you’re reading a detective novel, but not when you’re reading code. We may fantasize about being International Men of Mystery, but our code needs to be mundane and clear. One of the most important parts of clear code is good names, so we put a lot of thought into naming functions, modules, variables, classes, so they clearly communicate what they do and how to use them.
Finding good names is often hard. If you are looking for good advice in the right direction, I recommend the book Clean Code by Uncle Bob (Robert C. Martin). He dedicates a whole chapter to this topic (see Chapter 2, Meaningful Names).
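As a hypothetical illustration of curing the Mysterious Name smell with renames (Fowler's Rename Variable and Change Function Declaration refactorings), here is a before/after in Python based on the question's first two examples. The URL and the assumption that `getName` computes an average are invented; the question does not say what the real function calculates.

```python
import json

# --- Before: names that hide intent (adapted from the question) ---
myfavoritething = json.dumps("https://example.com/page")

def getName(values, result):
    # Not a getter: it appends to `result` instead of returning anything.
    result.append(sum(values) / len(values))

# --- After: same computations, names that say what happens ---
url_as_json_string = json.dumps("https://example.com/page")

def average(values):
    """Return the arithmetic mean of `values`."""
    return sum(values) / len(values)
```

The behavior is unchanged; only the names now tell the reader that the first value is a JSON-encoded string and that the function computes and returns a mean.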
There are a few naming conventions every developer should follow to improve readability, maintainability, etc.
The examples you gave are considered bad practice.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more.
Closed 4 years ago.
I am looking for an algorithm that would help me classify/group similar words (e.g., "Amazon.com" is similar to "Amazon", "Amz" or "Amzon"). Levenshtein is a commonly suggested algorithm, but there are others, such as Jaro-Winkler (for example, this Python library offers a few word similarity metrics).
I'm wondering if those, who have done similar word aggregation/grouping, might have more effective suggestions. Thank you!
I have done something like this. I used Levenshtein with a lot of heuristics.
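Here is a minimal Python sketch of Levenshtein-based grouping plus one heuristic. The `normalize` step (lowercasing and stripping a trailing ".com"), the 0.34 distance-to-length threshold, and greedy grouping against each group's first member are all assumptions for illustration, not the answerer's actual heuristics.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution all cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]

def normalize(s: str) -> str:
    # Example heuristic: lowercase and drop a trailing ".com".
    s = s.lower().strip()
    return s[:-4] if s.endswith(".com") else s

def group_similar(words, max_ratio=0.34):
    """Greedily put each word into the first group whose representative is close enough."""
    groups = []  # the first element of each group acts as its representative
    for word in words:
        key = normalize(word)
        for group in groups:
            rep = normalize(group[0])
            if levenshtein(key, rep) / max(len(key), len(rep), 1) <= max_ratio:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups

print(group_similar(["Amazon.com", "Amazon", "Amzon", "Amz", "eBay", "ebay.com"]))
# [['Amazon.com', 'Amazon', 'Amzon'], ['Amz'], ['eBay', 'ebay.com']]
```

With these settings "Amz" ends up alone: it is three deletions away from "amazon", which is too far relative to the word length. That is exactly the kind of case that needs extra heuristics (for example, prefix or abbreviation matching). Note also that greedy grouping depends on the input order.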
You should really look at the data and try to figure out what works best for you. Jaro-Winkler works well for names. If you try to use it for MD5 IDs, you're going to have a bad time.
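For comparison, here is a from-scratch Python sketch of the standard Jaro-Winkler similarity, using the commonly cited defaults (prefix scaling factor p = 0.1, prefix capped at 4 characters). It is not the code of the library mentioned in the question.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]: characters matched within a window, penalized for transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Take the matched characters in order from each string; the transposition
    # count is half the number of positions where the two sequences disagree.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score according to the length of the common prefix (up to max_prefix)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

The Winkler bonus rewards strings that share their first few characters, which suits names with typos. On hash-like IDs such as MD5 digests a shared prefix is pure coincidence, so the same bonus can make unrelated IDs look similar.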
If your strings are naturally very close to each other, either approach might lack the granularity to help you, or you might need more information from external sources.
In conclusion, set up a sandbox environment, run the different algorithms over your data and see which one works better. You can also look at the mistakes each algorithm makes and see whether a) you can live with them or b) you can fix them easily.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
We have a user with a music library of 100 songs. Of those, he loves 20, he hates 10, and there are 5 he neither hates nor loves. He has never listened to the remaining 65.
Question: What kind of algorithm(s) is/are used to scan the remaining 65 songs and find out music the user will like?
Do some research on a product called MusicIP; it had some very clever audio fingerprinting technology. It converted the track to WAV, created a fingerprint, and then used some clever magic to match songs that were similar.
To suggest new, unfamiliar content to a user, the general approach is to use machine learning, specifically collaborative filtering, which is often used for recommender systems. The idea is to use the knowledge of the crowd: find people (or groups) whose taste is similar to yours, and recommend new items that they tend to like.
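A minimal user-based collaborative filtering sketch in Python, assuming the question's categories are encoded as +1 (loves), -1 (hates) and 0 (neither), with unheard songs simply missing. The users, song names and the choice of cosine similarity over co-rated songs are invented for illustration.

```python
from math import sqrt

# Hypothetical ratings: +1 = loves, -1 = hates, 0 = neither; unheard songs are absent.
ratings = {
    "you":   {"song_a": 1, "song_b": 1, "song_c": -1},
    "alice": {"song_a": 1, "song_b": 1, "song_c": -1, "song_d": 1},
    "bob":   {"song_a": -1, "song_b": -1, "song_c": 1, "song_d": -1, "song_e": 1},
}

def cosine(u, v):
    """Cosine similarity computed over the songs both users have rated."""
    common = set(u) & set(v)
    dot = sum(u[s] * v[s] for s in common)
    norm_u = sqrt(sum(u[s] ** 2 for s in common))
    norm_v = sqrt(sum(v[s] ** 2 for s in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(user, ratings):
    """Similarity-weighted average of other users' ratings for songs `user` has not heard."""
    mine = ratings[user]
    totals, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim == 0:
            continue
        for song, rating in theirs.items():
            if song not in mine:
                totals[song] = totals.get(song, 0.0) + sim * rating
                weights[song] = weights.get(song, 0.0) + abs(sim)
    return {song: totals[song] / weights[song] for song in totals}

recommendations = predict("you", ratings)
print(sorted(recommendations, key=recommendations.get, reverse=True))  # ['song_d', 'song_e']
```

Negative similarity is kept on purpose here: a user with opposite taste who hates a song counts as evidence that you might like it. Many implementations drop negative neighbors instead; that design choice is worth testing on your own data.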
An alternative is to build a classification algorithm for like/dislike, but that might require extracting features from each song that capture the essence of the problem, and that's usually not trivial at all.
Some classification algorithms you might want to try are SVM, Naive Bayes, neural networks, decision trees and more. The real challenge, as I mentioned, would be finding the right features for the problem.
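As one concrete example of the classification route, here is a from-scratch Gaussian Naive Bayes sketch in Python. The two features (tempo in BPM and an "energy" score between 0 and 1) and the training data are hypothetical; as noted above, choosing real features is the hard part.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate each class's prior and the per-feature mean and variance."""
    rows_by_label = defaultdict(list)
    for features, label in zip(X, y):
        rows_by_label[label].append(features)
    model = {}
    for label, rows in rows_by_label.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A tiny epsilon keeps the variance positive if a feature is constant within a class.
        variances = [sum((x - m) ** 2 for x in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def classify(model, features):
    """Return the label with the highest log posterior, treating features as independent Gaussians."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for x, m, v in zip(features, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total
    return max(model, key=log_posterior)

# Hypothetical training data: [tempo_bpm, energy] for songs the user has rated.
X = [[120, 0.80], [125, 0.90], [118, 0.85], [70, 0.20], [65, 0.30], [80, 0.25]]
y = ["like", "like", "like", "dislike", "dislike", "dislike"]
model = fit_gaussian_nb(X, y)
print(classify(model, [122, 0.82]))  # like
```

Working in log space avoids underflow when many features are multiplied together.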
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
Let's say you wanted to compare essays by students and see whether one of those essays was plagiarized. How would you go about this in a naive manner (i.e., without a very complicated approach)? Of course, there are simple ways, like comparing the words used in the essays, and complicated ways, like using compression functions, but what are some other ways to check for plagiarism without too much complexity or theory?
There are several papers describing different approaches; I recommend reading this one.
The paper presents an algorithm based on an index structure built over the entire file collection.
According to the paper, the algorithm can be used to find similar code fragments in a large software system. Before the index is built, all the files in the collection are tokenized. This is a simple parsing problem and can be solved in linear time. For each of the N files in the collection, the output of the tokenizer for a file F_i is a string of n_i tokens.
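A minimal Python sketch of the tokenize-then-index idea, assuming a simple regex tokenizer and an inverted index keyed on runs of k = 4 consecutive tokens. This is a generic k-gram index for illustration, not necessarily the index structure the paper uses; the function names and sample documents are invented.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Single linear scan: lowercase word tokens and individual punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_index(docs, k=4):
    """Map every run of k consecutive tokens to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for pos in range(len(tokens) - k + 1):
            index[tuple(tokens[pos:pos + k])].append((doc_id, pos))
    return index

def shared_fragments(index):
    """Report the k-grams that occur in more than one document."""
    result = {}
    for gram, hits in index.items():
        doc_ids = {doc_id for doc_id, _ in hits}
        if len(doc_ids) > 1:
            result[gram] = sorted(doc_ids)
    return result

docs = {
    "a": "The quick brown fox jumps over the lazy dog.",
    "b": "A quick brown fox jumps high.",
    "c": "Nothing in common here.",
}
print(shared_fragments(build_index(docs)))
# {('quick', 'brown', 'fox', 'jumps'): ['a', 'b']}
```

Storing positions lets you extend matching k-grams into longer shared passages; a larger k reduces accidental matches on common phrases.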
Here is another paper you could read.
Another good algorithm is the SCAM-based algorithm, which detects plagiarism by comparing the set of words that are common to the test document and the registered document. As the paper puts it: "Our plagiarism detection system, like many Information Retrieval systems, is evaluated with metrics of precision and recall."
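A simplified Python sketch of the common-words idea, plus the precision and recall metrics the quoted passage mentions. `overlap_score` measures what fraction of the test document's distinct words also appear in a registered document; it is a stand-in for illustration, not the SCAM measure itself, and the function names are assumptions.

```python
import re

def word_set(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def overlap_score(test_doc, registered_doc):
    """Fraction of the test document's distinct words that also occur in the registered document."""
    test_words = word_set(test_doc)
    if not test_words:
        return 0.0
    return len(test_words & word_set(registered_doc)) / len(test_words)

def precision_recall(flagged, actually_plagiarized):
    """Precision: share of flagged items that are real cases. Recall: share of real cases flagged."""
    flagged, actual = set(flagged), set(actually_plagiarized)
    true_positives = len(flagged & actual)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

print(overlap_score("The cat sat", "the cat ran"))
print(precision_recall(["e1", "e2", "e3"], ["e1", "e4"]))
```

In practice you would flag a pair when the score exceeds a cutoff tuned on known cases, then use `precision_recall` against hand-labelled pairs to check that cutoff.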
You could take a look at Dick Grune's similarity comparator, which claims to work on natural language texts as well (I've only tried it on software). The algorithms are described as well. (By the way, his book on parsing is really good, in my opinion.)
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
In a big corp, they often ask developers to fill in a matrix of what skills they have at what level. It's generally seen as a bit of a pain but is it actually useful, or another way for bureaucrats to try and reduce developers to a bunch of numbers on a spreadsheet?
Skills matrices are only partially helpful; they are good at giving you a general picture of your current "experience".
However, these skills matrices do not include the most important aspect: the ability to learn.
This is the most important skill in IT in my view. And everyone learns at different speeds.
E.g., if you throw guy A into a new technology stack, how long before he/she is productive?
Since IT/software development is a very wide field, I regard skill sheets as quite useful. I used to be a Linux expert and my skill sheet reflected that. Then I shifted into iOS/Mac development, and my current employer asked me to fill out a skill sheet tuned to Mac... and I immediately noticed that I was a novice in this field back then ;-) Conversely, they were able to see whether I could fit into the company and where (in which team).
So of course they can be harmful if you lack the skills, but I think they make choices easier for employers (and I regard a big skill sheet in my CV as the most important part of the CV, even more so than the list of projects done).
The usefulness is totally dependent on what is being assessed. I work in an insurance company and this was done for all staff here. There was no category that I fit into and all the criteria were irrelevant.
I can see the benefit of assessing relevant criteria; it can identify weaknesses and target training, but those criteria need to be defined by someone who knows what you might not know.
Most of all, don't berate the bureaucrat for simplifying a complex object into a manageable set of information. As a programmer that's what you should be doing every day.
I think it is appropriate in big corps, but for small and specialized consultancies I would do a personal interview.
In big corporations, if you don't fit in one place you may fit in another... in small teams I would rather do a personal assessment.