It's kind of interesting that pi's decimal representation never ends and never settles into a permanently repeating pattern, which means it's quite possible that pi contains every possible combination of digits.
This guy calculated 5 trillion (5 x 10^12) digits of pi :D
http://www.numberworld.org/misc_runs/pi-5t/details.html
From the internet: "Converted into ASCII text, somewhere in that infinite string of digits is the name of every person you will ever love, the date, time and manner of your death, and the answers to all the great questions of the universe."
Wondering if somebody has already converted and analyzed the resulting string for known sequences of letters (words/sentences)?
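The conversion itself is easy to sketch. Here's a minimal Python example; the encoding scheme (mapping digit pairs 01-26 to a-z and skipping everything else) is an arbitrary choice of mine, and ASCII bytes or base-27 would work just as well:

```python
# Sketch: scan a digit string two digits at a time and map the pairs
# 01..26 to the letters a..z; pairs outside that range are skipped.
def digits_to_letters(digits):
    letters = []
    for i in range(0, len(digits) - 1, 2):
        value = int(digits[i:i + 2])
        if 1 <= value <= 26:
            letters.append(chr(ord('a') + value - 1))
    return ''.join(letters)

print(digits_to_letters("1415926535"))  # prints "no"
```

Running this over a long stretch of digits and then scanning the output against a word list would be the obvious next step.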
Check out this page: http://pi.nersc.gov/.
It allows you to search for both character strings and hexadecimal sequences. Note that this search engine has indexed only the first 4 billion digits of pi, and uses a formula to compute arbitrarily positioned binary or hexadecimal digits beyond those.
The idea that Pi contains everything ever is a nice idea, but if it's correct, it means Pi also contains an infinite amount of false statements about everything ever. For example, if Pi contains a list of all the people you will ever love, it also contains lists that look exactly like that list but are really just mixes of names arranged in a pattern that makes them look legitimate.
Following the same idea, the date, time, and manner of your death could also be "falsified". For example, let's say you are a man named Jason Delara, and you die at the age of 83, at 11:35 PM, in your sleep. Somewhere in Pi it could say, in ASCII text, "Jason Delara will die at age 83, 11:35 PM, passed in his sleep." Somewhere else it would also say "Jason Delara will die at age 35, at 6:00 AM, passed in a car accident." There could be an "infinite" number of these false predictions.
There's also the fact that, following the idea from above, for any one of the great questions of the universe, all but one of the answers in the digits are wrong, even if many of the answers make sense. I've thought about this a lot, and I thought, "What if some part of the digits states which facts are correct and which are not?" The answer is: "Then there is an infinite number of false lists in the digits claiming to do the same as the real list." In short, it would be pointless to convert Pi to ASCII text to try and figure everything out.
I know I'm a little late to the party, but I wrote this for anybody who comes here looking for the answers to the universe in an endless, non-repeating decimal.
It is massively convenient that pi is an irrational number whose digits we're still computing: if you can't find what you want in the sequence so far, then by definition it just happens to come later on.
As for it containing hidden information - if you create any random sequence long enough, you'll be able to create simple words from the resulting output.
Conspiracy theorists just love to see patterns where there are none. They forget the other noise and are endlessly fascinated by mere coincidences.
Would just like to provide further context for this question. Yes, the point is that PI goes on infinitely, which means there are endless possibilities for sentence structure and letter combinations. If PI is a normal number (which is widely conjectured but not proven), every single combination of letters occurs somewhere in it. So technically, everything in PI could apply to everything in the observable world around us.
Apologies for the non-descriptive question; if you can think of a better one, I'm all ears.
I'm writing some Perl to implement an algorithm and the code I have smells fishy. Since I don't have a CS background, I don't have a lot of knowledge of standard algorithms in my back pocket, but this seems like something that it might be.
Let me describe what I'm doing by way of metaphor:
You have a conveyor belt of oranges. The oranges pass you one by one. You also have an unlimited supply of flat-packed boxes.
For each orange, check it. If it is rotten, dispose of it
If it is good, put it in a box. If you don't have a box, grab a new one and construct it.
If the box has 10 oranges in it, close it up and put it on a pallet. Do not construct a new one.
Repeat until you have no more oranges
If you have a constructed box with some oranges in it, close it up and put it on a pallet
So, we have an algorithm for processing items in a list: if they meet some criteria, they should be added to a structure which, when it meets some other criteria, should be 'closed out'. Also, once the list has been processed, if there's an 'open' structure, it should be 'closed out' as well.
Naively, I assume that the algorithm consists of a loop acting over the list, with a conditional to see if the list element belongs in the structure and a conditional to see if the structure needs to be 'closed'.
Outside the loop, there would be one more conditional to close any outstanding structures.
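For illustration, the naive loop I just described might look like this in Python (the item representation is made up for the example):

```python
def pack_oranges(oranges, box_size=10):
    # Naive loop: filter items, fill an open box, close it when full,
    # and close any partially filled box after the loop.
    pallet = []   # closed boxes
    box = []      # the currently open box
    for orange in oranges:
        if orange == "rotten":
            continue                 # dispose of rotten oranges
        box.append(orange)
        if len(box) == box_size:     # box full: close it, put on pallet
            pallet.append(box)
            box = []                 # do not construct a new one yet
    if box:                          # leftover partially filled box
        pallet.append(box)
    return pallet
```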
So, here are my questions:
Is this a description of a well-known algorithm? If so, does it have a name?
Is there an effective way to coalesce the 'closing out the box' activity into a single place, as opposed to once inside the loop and once outside of the loop?
I tagged this as 'Perl' because Perlish approaches are of interest, but I'd be interested to hear of any other languages that have neat solutions to this.
It's a nice fit with a functional approach - you're iterating over a stream of Oranges, testing, grouping and operating on them. In Scala, it would be something like:
val oranges: Stream[Orange] = ... // generate a stream of Oranges
oranges.filter(_.isNotRotten).grouped(10).foreach { o => (new Box).fillBox(o) }
(grouped does the right thing with the partial box at the end)
There's probably Perl equivalents.
Is there an effective way to coalesce the 'closing out the box' activity into a single place,
as opposed to once inside the loop and once outside of the loop?
Yes. Simply add "... or there are no more oranges" to the "does the structure need to be closed" function. The easiest way of doing this is a do/while construct (technically speaking it's NOT a loop in Perl, though it looks like one):
my $current_container;
my $more_objects;
do {
    my $object = get_next_object();        # easiest implementation returns undef if no more
    $more_objects = more_objects($object); # easiest to implement as "defined $object"
    if (!$more_objects || can_not_pack_more($current_container)) {
        close_container($current_container);
        $current_container = open_container() if $more_objects;
    }
    pack($object, $current_container) if $more_objects;
} while ($more_objects);
IMHO, this doesn't really win you anything if the close_container() is encapsulated into a method - there's no major technical or code quality cost to calling it both inside and outside the loop. Actually, I'm strongly of the opinion that a convoluted workaround like I presented above is WORSE code quality wise than a straightforward:
my $current_container;
while (my $more_objects = more_objects(my $object = get_next_object())) {
    if (can_not_pack_more($current_container)) {  # false on undef
        close_container($current_container);
    }
    $current_container = open_container_if_closed($current_container);  # if defined
    pack($object, $current_container);
}
close_container($current_container);
It seems a bit over-complicated for the problem you are describing, but it sounds theoretically close to Petri nets. Check Petri nets on Wikipedia.
A perl implementation can be found here
I hope this will help you,
Jerome Wagner
I don't think there's a name for this algorithm. For a straight-forward implementation you'll need two tests: one to detect a full box while in the processing loop and one after the loop to detect a partially full box. The "closing the box" logic can be made into a subroutine to avoid duplicating it. A functional approach could provide a way around that:
use List::MoreUtils qw(part natatime);

my ($good, $bad) = part { $_->is_rotten() } @oranges;
$_->dispose() foreach @$bad;

my $it = natatime 10, @$good;
while (my @batch = $it->()) {
    my $box = Box->new();
    $box->add(@batch);
    $box->close();
    $box->stack();
}
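A Python equivalent of the same partition-and-chunk idea, using only the standard library (the helper name mirrors Perl's natatime and is my own choice):

```python
from itertools import islice

def natatime(n, iterable):
    # Yield successive lists of up to n items, like List::MoreUtils' natatime.
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch

oranges = ["good"] * 23 + ["rotten"] * 2
good = [o for o in oranges if o != "rotten"]
boxes = list(natatime(10, good))
# boxes of 10, 10 and 3 oranges; the last one is the partial box
```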
When looking at algorithms, the mainstream CS ones tend to handle very complex situations, or employ very complex approaches (look up NP-Complete, for example). Moreover, the algorithms tend to focus on optimization. How can this system be more efficient? How can I use fewer steps in this production schedule? What is the largest number of foos that I can fit in my bar? Etc.
An example of a complex approach in my opinion is quick sort because the recursion is pretty genius. I know it is standard, but I really like it. If you like algorithms, then check out the Simplex Algorithm - it has been very influential.
An example of a complex situation would be if you had oranges that go in, get sorted into 5 orange piles, then went to 5 different places to be peeled, then all came back with another path of oranges to total 10 orange piles, then each orange was individually sliced, and boxed in groups of exactly 2 pounds.
Back to your example. Your example is a simplified version of a flow network. Instead of having so many side paths and options, there is only one path with a capacity of one orange at a time. Of the flow network algorithms, the Ford-Fulkerson algorithm is probably the most influential.
So, you can probably fit one of these algorithms into the example posed, but it would be through a simplification process. Basically there is not enough complexity here to need any optimization. And there is no risk of running at an inefficient time complexity so there is no need to be running the "perfect approach".
The approach you detailed is going to be fine here, and the accepted answer above does a good job suggesting an actual functional solution to the defined problem. I just wanted to add my 2 cents with regards to algorithms.
As all developers do, we constantly deal with some kind of identifiers as part of our daily work. Most of the time, it's about bugs or support tickets. Our software, upon detecting a bug, creates a package that has a name formatted from a timestamp and a version number, which is a cheap way of creating reasonably unique identifiers to avoid mixing packages up. Example: "Bug Report 20101214 174856 6.4b2".
My brain just isn't that good at remembering numbers. What I would love to have is a simple way of generating alpha-numeric identifiers that are easy to remember.
It takes about 5 minutes to whip up an algorithm like the following in python, which produces halfway usable results:
import random

vowels = 'aeiuy'  # 'o' is left out; it's easily confused with '0'
consonants = 'bcdfghjklmnpqrstvwxz'
numbers = '0123456789'

random.seed()
for i in range(30):
    chars = list()
    chars.append(random.choice(consonants))
    chars.append(random.choice(vowels))
    chars.append(random.choice(consonants + numbers))
    chars.append(random.choice(vowels))
    chars.append(random.choice(vowels))
    chars.append(random.choice(consonants))
    print(''.join(chars))
The results look like this:
re1ean
meseux
le1ayl
kuteef
neluaq
tyliyd
ki5ias
This is already quite good, but I feel it is still easy to forget how they are spelled exactly, so if you walk over to a colleague's desk and want to look one of those up, there's still potential for difficulty.
I know of algorithms that perform trigram analysis on text (say you feed them a whole book in German) and that can generate strings that look and feel like German words and are thus easier to handle generally. This requires lots of data, though, and makes it slightly less suitable for embedding in an application just for this purpose.
Do you know of any published algorithms that solve this problem?
Thanks!
Carl
As you said, your sample is quite good. But if you want random identifiers that can easily be remembered, then you should not mix letters and digits arbitrarily. Instead, you could opt to postfix an alphabetic string with a couple of digits.
Also, in your sample you wisely excluded 'o', but forgot about 'l', which is easily confused with '1'. I suggest you remove the 'l' as well. ;-)
I am not sure that this answers your question, but maybe think about how many unique bug report numbers you need.
Simply using a four-character uppercase alphanumeric key like "BX-3D", you can have 36^4 = about 1.7 million bug reports.
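A quick sketch of such a key generator; the 36-symbol alphabet matches the 36^4 estimate, and the dash in "BX-3D" is just visual grouping:

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits  # 36 symbols

def short_key(n=4, rng=random):
    # n symbols from a 36-symbol alphabet gives 36**n possible keys.
    return "".join(rng.choice(ALPHABET) for _ in range(n))

assert 36 ** 4 == 1_679_616  # roughly 1.7 million four-symbol keys
```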
Edit: I just saw your sample. Maybe the results could be considerably improved if you used syllables instead of consonants and vowels.
I'd like to find a good way to extract some (let's say two) sentences from a text. Which would be better: a regexp or the split method? Your ideas?
As requested by Jeremy Stein - there are some examples
Examples:
Input:
The first thing to do is to create the Comment model. We’ll create this in the normal way, but with one small difference. If we were just creating comments for an Article we’d have an integer field called article_id in the model to store the foreign key, but in this case we’re going to need something more abstract.
First two sentences:
The first thing to do is to create the Comment model. We’ll create this in the normal way, but with one small difference.
Input:
Mr. T is one mean dude. I'd hate to get in a fight with him.
First two sentences:
Mr. T is one mean dude. I'd hate to get in a fight with him.
Input:
The D.C. Sniper was executed by lethal injection at a Virginia prison. Death was pronounced at 9:11 p.m. ET.
First two sentences:
The D.C. Sniper was executed by lethal injection at a Virginia prison. Death was pronounced at 9:11 p.m. ET.
Input:
In her concluding remarks, the opposing attorney said that "...in this and so many other instances, two wrongs won’t make a right." The jury seemed to agree.
First two sentences:
In her concluding remarks, the opposing attorney said that "...in this and so many other instances, two wrongs won’t make a right." The jury seemed to agree.
Guys, as you can see, it's not so easy to extract two sentences from a text. :(
As you've noticed, sentence tokenizing is a bit trickier than it first might seem, so you may as well take advantage of existing solutions. The Punkt sentence tokenizing algorithm is popular in NLP, and there is a good implementation in the Python Natural Language Toolkit, which they describe the use of here. They also describe another approach here.
There's probably other implementations around, or you could also read the original paper describing the Punkt algorithm: Kiss, Tibor and Strunk, Jan (2006): Unsupervised Multilingual Sentence Boundary Detection. Computational Linguistics 32: 485-525.
You can also read another Stack Overflow question about sentence tokenizing here.
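For a sense of what the edge cases involve, here is a deliberately rough Python heuristic; the abbreviation list is ad hoc and nowhere near complete, which is exactly why trained tokenizers like Punkt exist:

```python
import re

# Protect a few known abbreviations, then split on sentence-final
# punctuation followed by whitespace. Deliberately naive.
ABBREVS = ["Mr.", "Mrs.", "Dr.", "D.C.", "a.m.", "p.m.", "etc."]

def first_n_sentences(text, n=2):
    protected = text
    for abbr in ABBREVS:
        protected = protected.replace(abbr, abbr.replace(".", "<DOT>"))
    parts = re.split(r'(?<=[.!?])\s+', protected)
    sentences = [p.replace("<DOT>", ".") for p in parts]
    return " ".join(sentences[:n])

print(first_n_sentences("Mr. T is one mean dude. I'd hate to get in a fight with him. Really."))
# prints: Mr. T is one mean dude. I'd hate to get in a fight with him.
```

It handles the "Mr. T" example above, but quotes, ellipses, and unknown abbreviations will still break it.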
your_string = "First sentence. Second sentence. Third sentence"
sentences = your_string.split(".")
=> ["First sentence", " Second sentence", " Third sentence"]
No need to make simple code complicated.
Edit: Now that you've clarified that the real input is more complex than your initial example, you should disregard this answer as it doesn't consider edge cases. An initial look at NLP should show you what you're getting into, though.
Some of the edge cases that I've seen in the past to be a bit complicated are:
Dates: Some regions use dd.mm.yyyy
Quotes: While he was sighing — "Whatever, do it. Now. And by the way...". This was enough.
Units: He was going at 138 km. while driving on the freeway.
If you plan to parse these texts you should stay away from splits or regular expressions.
This will usually match sentences.
/\S(?:(?![.?!]+\s).)*[.?!]+(?=\s|$)/m
For your example of two sentences, take the first two matches.
irb(main):005:0> a = "The first sentence. The second sentence. And the third"
irb(main):006:0> a.split(".")[0...2]
=> ["The first sentence", " The second sentence"]
irb(main):007:0>
EDIT: here's how you handle the "This is a sentence ...... and another . And yet another ..." case:
irb(main):001:0> a = "This is the first sentence ....... And the second. Let's not forget the third"
=> "This is the first sentence ....... And the second. Let's not forget the third"
irb(main):002:0> a.split(/\.+/)
=> ["This is the first sentence ", " And the second", " Let's not forget the third"]
And you can apply the same range operator ... to extract the first 2.
You will find tips and software links on the sentence boundary detection Wikipedia page.
If you know which sentences to search for, a regex should do well searching for
((YOUR SENTENCE HERE)|(YOUR OTHER SENTENCE)){1}
Split would probably use up quite a lot of memory, as it also keeps the parts you don't need (all the text that isn't your sentence), whereas the regex only captures the sentence you searched for (if it finds it, of course).
If you're segmenting a piece of text into sentences, then what you want to do is begin by determining which punctuation marks can separate sentences. In general, these are !, ? and . (but if all you care about for the texts you're processing is the ., then just go with that).
Now, since these can appear inside quotations or as parts of abbreviations, what you want to do is find each occurrence of these punctuation marks and run some sort of machine-learning classifier to determine whether that occurrence starts a new sentence or does something else. This involves training data and a properly constructed classifier, and it won't be 100% accurate, because there's probably no way to be 100% accurate.
I suggest looking in the literature for sentence segmentation techniques, and have a look at the various natural language processing toolkits that are out there. I haven't really found one for Ruby yet, but I happen to like OpenNLP (which is in Java).
Background
While at the gym the other day, I was working with my combination lock, and realized something that would be useful to me as a programmer. To wit, my combination is three separate sets of numbers that either sound alike or have some other relation that makes them easy to remember. For instance, 5-15-25, 7-17-2, 6-24-5. These examples seem easy to remember.
Question
How would I implement something similar for passwords? Yes, they ought to be hard to crack, but they also should be easy for the end user to remember. Combination locks do that with a mix of numbers that have similar sounds, and with numbers that have similar properties (7-17-23: all prime; 17 rolls right off the tongue after 7; and 23 is another prime, and is, out of that set, the 'hard' one to remember).
Criteria
The Password should be easy to remember. Dog!Wolf is easy to remember, but once an attacker knows that your website gives out that combination, it makes it infinitely easier to check.
The words or letters should mostly follow the same sounds (for the most part).
At least 8 letters
Not use !@#$%^&*();'{}_+<>?,./ These punctuation marks, while appropriate for 'hard' passwords, do not have an 'easy to remember' sound.
Resources
This question is language-agnostic, but if there's a specific implementation for C#, I'd be glad to hear of it.
Update
A few users have said that 'this is bad password security'. Don't assume that this is for a website. This could just be for me to make an application for myself that generates passwords according to these rules. Here's an example.
The letters A-C-C-L-I-M-O-P 'flow', and they happen to be two regular words put together (Acclimate and Mop). Further, when a user says these letters, or says them as a word, it's an actual word for them. Easy to remember, but hard to crack (dictionary attack, obviously).
This question has a two-part goal:
Construct Passwords from letters that sound similar (using alliteration) or
Construct Passwords that mesh common words similarly to produce a third set of letters that is not in a dictionary.
You might want to look at:
The pronounceable password generation algorithm used by apg and explained in FIPS-181
Koremutake
First of all make sure the password is long. Consider using a "pass-phrase" instead of a single "pass-word". Breaking pass-phrases like "Dogs and wolves hate each other." is very hard yet they are quite easy to remember.
Some sites may also give you an advice which may be helpful, like Strong passwords: How to create and use them (linked from Password checker, which is a useful tool on its own).
Also, instead of trying to create easy-to-remember passwords, in some cases a much better alternative is to avoid remembering the password at all by using (and educating your users to use) a good password management utility (see What is your favourite password storage tool?). When doing this, the only part left is to create a hard-to-crack password, which is easy (any long enough random sentence will do).
I am surprised no one has mentioned the Multics algorithm described at http://www.multicians.org/thvv/gpw.html , which is similar to the FIPS algorithm but based on trigraphs rather than digraphs. It produces output such as
ahmouryleg
thasylecta
tronicatic
terstabble
I have ported the code to python as well: http://pastebin.com/f6a10de7b
You could use Markov chains to generate words that sound like English (or any other language you want) but are not actual words.
The question of easy to remember is really subjective, so I don't think you can write an algorithm like this that will be good for everyone.
And why use short passwords on web sites/computer applications instead of pass phrases? They are easy to remember but hard to crack.
After many years, I have decided to use the first letter of each word in a passphrase. It's very hard to crack, versatile for length and for restrictions like "you must have a digit", and hard to make errors with.
This works by creating a phrase. A crazy fun vivid topic is useful!
"Stack Overflow aliens landed without using rockets or wheels".
Take the first letter, your password is "soalwurow"
You can type this quickly and accurately since you're not remembering letter by letter, you're just speaking a sentence inside your head.
I also like having words alternate from the left and right side of the keyboard, it gives you a fractionally faster typing speed and more pleasing rhythm. Notice in my example, your hands alternate left-right-left-right.
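Extracting those first letters is a one-liner; a quick sketch:

```python
def initials(phrase):
    # First letter of each word, lowercased.
    return "".join(word[0].lower() for word in phrase.split())

print(initials("Stack Overflow aliens landed without using rockets or wheels"))
# prints "soalwurow"
```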
I have a few times used the following algorithm:
Put all lowercase vowels (from a-z) into an array Vowels
Put all lowercase consonants (from a-z) into another array Consonants
Create a third array Pairs of two letters in such a way, that you create all possible pairs of letters between Vowels and Consonants ("ab", "ba", "ac", etc...)
Randomly pick 3-5 elements from Pairs and concatenate them together as string Password
Randomly pick true or false
If true, remove the last letter from Password
If false, don't do anything
Substitute 2-4 randomly chosen characters in Password with its uppercase equivalent
Substitute 2-4 randomly chosen characters in Password with a randomly chosen integer 0-9
Voilà - now you should have a password of length between 5 and 10 characters, with upper and lower case alphanumeric characters. Having vowels and consonants take turns frequently makes them semi-pronounceable and thus easier to remember.
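A quick Python sketch of the steps above (the function name and the exact randomization choices are mine):

```python
import random
import string

VOWELS = [c for c in string.ascii_lowercase if c in "aeiouy"]
CONSONANTS = [c for c in string.ascii_lowercase if c not in "aeiouy"]
# All vowel/consonant pairs, in both orders.
PAIRS = [c + v for c in CONSONANTS for v in VOWELS] + \
        [v + c for v in VOWELS for c in CONSONANTS]

def pair_password(rng=random):
    # 3-5 random pairs, so 6-10 characters before any trimming.
    pw = "".join(rng.choice(PAIRS) for _ in range(rng.randint(3, 5)))
    if rng.random() < 0.5:                    # maybe drop the last letter
        pw = pw[:-1]
    chars = list(pw)
    for i in rng.sample(range(len(chars)), rng.randint(2, 4)):
        chars[i] = chars[i].upper()           # uppercase 2-4 positions
    for i in rng.sample(range(len(chars)), rng.randint(2, 4)):
        chars[i] = rng.choice(string.digits)  # digits in 2-4 positions
    return "".join(chars)
```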
FWIW I quite like jumbling word syllables for an easy but essentially random password. Take "Bongo", for example, as a random word. Swap the syllables and you get "Gobong". Swap the o's for zeros on top (or make some other common substitution) and you've got an essentially random character sequence with a trail that helps you remember it.
Now how you pick out syllables programmatically - that's a whole other question!
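Picking out syllables properly needs pronunciation data, but a crude approximation is possible in a few lines; the regex below is a naive guess at syllable boundaries, nothing more:

```python
import re

def swap_syllables(word):
    # Naive split: optional consonant onset, a vowel run, then coda
    # consonants that do not start the next syllable.
    parts = re.findall(r"[^aeiou]*[aeiou]+(?:[^aeiou](?![aeiou]))*", word.lower())
    if len(parts) < 2:
        return word
    return "".join(parts[1:] + parts[:1])  # rotate the first syllable to the end

print(swap_syllables("bongo").replace("o", "0"))  # prints "g0b0n"
```

This splits "bongo" into "bon" and "go"; real syllabification would need a dictionary with pronunciation information.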
When you generate a password for the user and send it by email, the first thing you should do when they first log in is force them to change their password. Passwords created by the system do not need to be easy to remember, because they should only be needed once.
Having easy to remember, hard to guess passwords is a useful concept for your users but is not one that the system should in some manner enforce. Suppose you send a password to your user's gmail account and the user doesn't change the password after logging in. If the password to the gmail account is compromised, then the password to your system is compromised.
So generating easy to remember passwords for your users is not helpful if they have to change the password immediately. And if they aren't changing it immediately, you have other problems.
I prefer giving users a "hard" password, requiring them to change it on the first use, and giving them guidance on how to construct a good, long pass phrase. I would also couple this with reasonable password complexity requirements (8+ characters, upper/lower case mix, and punctuation or digits). My rationale for this is that people are much more likely to remember something that they choose themselves and less likely to write it down somewhere if they can remember it.
A spin on the 'passphrase' idea is to take a phrase and write the first letters of each word in the phrase. E.g.
"A specter is haunting Europe - the specter of communism."
Becomes
asihe-tsoc
If the phrase happens to have punctation, such as !, ?, etc - might as well shove it in there. Same goes for numbers, or just substitute letters, or add relevant numbers to the end. E.g. Karl Marx (who said this quote) died in 1883, so why not 'asihe-tsoc83'?
I'm sure a creative brute-force attack could capitalise on the statistical properties of such a password, but it's still orders of magnitude more secure than a dictionary attack.
Another great approach is just to make up ridiculous words, e.g. 'Barangamop'. After using it a few times you will commit it to memory, but it's hard to brute-force. Append some numbers or punctuation for added security, e.g. '386Barangamop!'
Here's part 2 of your idea prototyped in a shell script. It takes 4-, 5- and 6-letter words (roughly 50,000) from the Unix dictionary file on your computer, and concatenates those words, overlapping them on a shared letter.
#! /bin/bash
RANDOM=$$
WORDSFILE=./simple-words
DICTFILE=/usr/share/dict/words

grep -ve '[^a-z]' ${DICTFILE} | grep -Ee '^.{4,6}$' > ${WORDSFILE}
N_WORDS=$(wc -l < ${WORDSFILE})

for i in $(seq 1 20); do
    password=""
    while [ ! "${#password}" -ge 8 ] || grep -qe "^${password}$" ${DICTFILE}; do
        while [ -z "${password}" ]; do
            password="$(sed -ne "$(( (150 * $RANDOM) % $N_WORDS + 1))p" ${WORDSFILE})"
            builtfrom="${password}"
        done
        word="$(sort -R ${WORDSFILE} | grep -m 1 -e "^..*${password:0:1}")"
        builtfrom="${word} ${builtfrom}"
        password="${word%${password:0:1}*}${password}"
    done
    echo "${password} (${builtfrom})"
done
Like most password generators, I cheat by outputting them in sets of twenty. This is often defended in terms of "security" (someone looking over your shoulder), but really it's just a hack to let the user pick the friendliest password.
I found that the 4-to-6-letter words from the dictionary file still contain obscure words.
A better source for words would be a written document. I copied all the words on this page and pasted them into a text document, and then ran the following set of commands to get the actual English words.
perl -pe 's/[^a-z]+/\n/gi' ./624425.txt | tr A-Z a-z | sort -u > ./words
ispell -l ./words | grep -Fvf - ./words > ./simple-words
Then I used these 500 or so very simple words from this page to generate the following passwords with the shell script -- the script parenthetically shows the words that make up a password.
backgroundied (background died)
soundecrazy (sounding decided crazy)
aboupper (about upper)
commusers (community users)
reprogrammer (replacing programmer)
alliterafter (alliteration after)
actualetter (actual letter)
statisticrhythm (statistical crazy rhythm)
othereplacing (other replacing)
enjumbling (enjoying jumbling)
feedbacombination (feedback combination)
rinstead (right instead)
unbelievabut (unbelievably but)
createdogso (created dogs so)
apphours (applications phrase hours)
chainsoftwas (chains software was)
compupper (computer upper)
withomepage (without homepage)
welcomputer (welcome computer)
choosome (choose some)
Some of the results in there are winners.
The prototype shows it can probably be done, but the intelligence you require about alliteration or syllable information requires a better data source than just words. You'd need pronunciation information. Also, I've shown you probably want a database of good simple words to choose from, and not all words, to better satisfy your memorable-password requirement.
Generating a single password the first time and every time -- something you need for the Web -- will take both a better data source and more sophistication. Using a better programming language than Bash with text files and using a database could get this to work instantaneously. Using a database system you could use the SOUNDEX algorithm, or some such.
Neat idea. Good luck.
I'm completely with rjh. The advantage of just using the starting letters of a pass-phrase is that it looks random, which makes it damn hard to remember if you don't know the phrase behind it, in case Eve looks over your shoulder as you type the password.
OTOH, if she sees you type about 8 characters, among which 's' twice, and then 'o' and 'r' she may guess it correctly the first time.
Forcing the use of at least one digit doesn't really help; you simply know that it will be "pa55word" or "passw0rd".
Song lyrics are an inexhaustible source of pass-phrases.
"But I should have known this right from the start"
becomes "bishktrfts". 10 letters; even lowercase only gives you 26^10 ≈ 1.4 x 10^14 combinations, which is a lot, especially since there's no shortcut for cracking it. (At 1 million combinations a second it would take over four years to test them all.)
As an extra (in case Eve knows you're a Police fan), you could swap e.g. the 2nd and 3rd letter, or take the second letter of the third word. Endless possibilities.
System generated passwords are a bad idea for anything other than internal service accounts or temporary resets (etc).
You should always use your own "passphrases" that are easy for you to remember but that are almost impossible to guess or brute force. For example the password for my old university account was.
Here to study again!
That is 20 characters using upper and lower case with punctuation. This is an unbelievably strong password and there is no piece of software that could generate a more secure one that is easier to remember for me.
Take look at the gpw tool. The package is also available in Debian/Ubuntu repositories.
One way to generate passwords that 'sound like' words would be to use a markov chain. An n-degree markov chain is basically a large set of n-tuples that appear in your input corpus, along with their frequency. For example, "aardvark", with a 2nd-degree markov chain, would generate the tuples (a, a, 1), (a, r, 2), (r, d, 1), (d, v, 1), (v, a, 1), (r, k, 1). Optionally, you can also include 'virtual' start-word and end-word tokens.
In order to create a useful markov chain for your purposes, you would feed in a large corpus of English language data - there are many available, including, for example, Project Gutenberg - to generate a set of records as outlined above. For generating natural language words or sentences that at least mostly follow rules of grammar or composition, a 3rd degree markov chain is usually sufficient.
Then, to generate a password, you pick a random 'starting' tuple from the set, weighted by its frequency, and output the first letter. Then, repeatedly select at random (again weighted by frequency) a 'next' tuple - that is, one that starts with the same letters that your current one ends with, and has only one letter different. Using the example above, suppose I start at (a, a, 1), and output 'a'. My only next choice is (a, r, 2), so I output another 'a'. Now, I can choose either (r, d, 1) or (r, k, 1), so I pick one at random based on their frequency of occurrence. Suppose I pick (r, k, 1) - I output 'r'. This process continues until you reach an end-of-word marker, or decide to stop independently (since most markov chains form a cyclic graph, you can potentially never finish generating if you don't apply an artificial length limitation).
At a word level (eg, each element of the tuple is a word), this technique is used by some 'conversation bots' to generate sensible-seeming nonsense sentences. It's also used by spammers to try and evade spam filters. At a letter level, as outlined above, it can be used to generate nonsense words, in this case for passwords.
One drawback: If your input corpus doesn't contain anything other than letters, nor will your output phrases, so they won't pass most 'secure' password requirements. You may want to apply some post-processing to substitute some characters for numbers or symbols.
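A minimal letter-level sketch of the idea in Python; here the "tuples" are stored as a context-to-counts map rather than explicit triples, and the toy corpus is stand-in data:

```python
import random
from collections import defaultdict, Counter

def train(words, n=2):
    # For each n-letter context, count the letters that follow it.
    # "$" is a virtual end-of-word token.
    model = defaultdict(Counter)
    starts = []
    for word in words:
        w = word.lower()
        if len(w) <= n:
            continue
        starts.append(w[:n])
        for i in range(len(w) - n):
            model[w[i:i + n]][w[i + n]] += 1
        model[w[-n:]]["$"] += 1
    return model, starts

def generate(model, starts, n=2, max_len=12, rng=random):
    out = rng.choice(starts)
    while len(out) < max_len:
        counts = model.get(out[-n:])
        if not counts:
            break
        letters = list(counts.keys())
        nxt = rng.choices(letters, weights=[counts[c] for c in letters])[0]
        if nxt == "$":   # hit an end-of-word marker
            break
        out += nxt
    return out

model, starts = train(["password", "wordplay", "payload", "crossword"])
print(generate(model, starts))  # a random nonsense word built from the corpus
```

The max_len cap is the artificial length limit mentioned above, since the chain can cycle forever otherwise.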
edit: After answering, I realized that this is in no way phonetically memorable. Leaving the answer anyway b/c I find it interesting. /edit
Old thread, I know... but it's worth a shot.
1) I'd probably build the largest dictionary you can amass, and arrange the words into buckets by part of speech.
2) Then, build a grammar that can make several types of sentences. The "type" of a sentence is determined by its permutation of parts of speech.
3) Randomly (or as close to random as possible), pick a type of sentence. What is returned is a pattern with placeholders for parts of speech (n-v-n would be noun-verb-noun).
4) Pick words at random from each part-of-speech bucket to stand in for the placeholders, and fill them in. (The example above might become something like car-ate-bicycle.)
5) Randomly scan each character, deciding whether or not to replace it with either a similar-sounding character (or set of characters) or a look-alike. This is the hardest step of the problem.
6) The resultant password would be something like kaR@tebyCICle.
7) Laugh at humorous results like the above that look like "karate bicycle".
I would really love to see someone implement passwords with control characters like "<Ctrl>+N" or even combo characters like "A+C" at the same time. Converting this to some binary equivalent would, IMHO, make password requirements much easier to remember, faster to type, and harder to crack (MANY more combinations to check).