Suggested variable system and procedures to incorporate this LEET table into my Free Pascal program - pascal

I code using Free Pascal and Lazarus.
I want to incorporate the LEET Table seen here (http://en.wikipedia.org/wiki/Leet#Orthography) into a new program, but I'm unsure of the best way to do so. Should I use array structures (one for each letter of the alphabet) or 'Set Types' for each letter or records for each letter? Any suggestions of how to implement an idea would be appreciated.
The aim of the program is to open and read a text file line by line (I've got this done already) using an OpenDialog and it will then say "For each word, if it finds the letters 'E', 'O' or 'I', replace them with values from the table for the letter found"
e.g. if strLineFromFile contains letter 'E', replace it with 3, £, + &....and so on
repeat
  ...
  Readln(SourceFile, strLineFromFile);
  { Look for letters E, I and O in strLineFromFile }
  { Look up LEET table and switch chars }
until EOF(SourceFile);
I'm open to suggestions on the best way to optimise this process. I'm not expecting pure code, but pointers as to which functions/procedures would be best and what variable system to use for optimum performance.
Note : I'm still learning so nothing too complex please!
Ted

Sets are not ordered, so they don't make sense here.
Use an array['a'..'z'] of array of string. The first array level is indexed by the input letter; the second level holds the various translations of that letter.

Related

How do I write a regex for Excel cell range?

I need to validate that something is an Excel cell range in Ruby, e.g. "A4:A6". By looking at it, the requirement I am looking for is:
<Alphabetical, Capitalised><Integer>:<Alphabetical, Capitalised><Integer>
I am not sure how to form a RegExp for this.
I would appreciate a small explanation for a solution, as opposed to purely a solution.
A bonus would be to check that the range is restricted to within a row or column. I think this would be out of scope of Regular Expressions though.
I have tried /[A-Z]+[0-9]+:[A-Z]+[0-9]+/. This works, but it allows extra characters to be added on to the beginning or end.
Anchoring the pattern is more on the right track, but it is still too permissive:
"HELLOAA3:A7".match(/\A[A-Z]+[0-9]+:[A-Z]+[0-9]+\z/) also returns a match, because [A-Z]+ accepts any number of letters.
How would I limit the number range to 10000?
How would I limit the number of characters to 3?
This is my solution:
(?:(?:\'?(?:\[(?<wbook>.+)\])?(?<sheet>.+?)\'?!)?(?<colabs>\$)?(?<col>[a-zA-Z]+)(?<rowabs>\$)?(?<row>\d+)(?::(?<col2abs>\$)?(?<col2>[a-zA-Z]+)(?<row2abs>\$)?(?<row2>\d+))?|(?<name>[A-Za-z]+[A-Za-z\d]*))
It includes named ranges, but the R1C1 notation is not supported.
The pattern is written in perl compatible regex dialect (i.e. can also be used with C#), I'm not familiar with Ruby, so I can't tell the difference, but you may want to look here: What is the difference between Regex syntax in Ruby vs Perl?
This will do both: match an Excel range and require that both cells are in the same row or column.
^([A-Z]+)(\d+):(\1\d+|[A-Z]+\2)$
A4:A6 // ok
A5:B10 // not ok
B5:Z5 // ok
AZ100:B100hello // not ok
The magic here is the back-reference group:
([A-Z]+)(\d+) -- column is in capture group 1, row in group 2
(\1\d+|[A-Z]+\2) -- the first column followed by any row number; or
-- any column letters followed by the first row
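For the two follow-up questions (at most 3 column letters, rows up to 10000), one possible sketch is to let the regex check the shape only and do the numeric and same-row/column checks in plain Ruby. The names CELL_RANGE, MAX_ROW and valid_range? are made up for this illustration, and the limits are simply the ones mentioned in the question.

```ruby
# Shape check only: 1-3 capital letters, then a row number without leading zeros.
# \A and \z anchor to the whole string (in Ruby, ^ and $ match at line boundaries).
CELL_RANGE = /\A([A-Z]{1,3})([1-9]\d*):([A-Z]{1,3})([1-9]\d*)\z/
MAX_ROW = 10_000 # assumed limit, taken from the question

# Hypothetical helper: true when text is a range within a single row or column.
def valid_range?(text)
  m = CELL_RANGE.match(text)
  return false unless m

  col1, row1, col2, row2 = m[1], m[2].to_i, m[3], m[4].to_i
  return false if row1 > MAX_ROW || row2 > MAX_ROW

  col1 == col2 || row1 == row2
end

puts valid_range?('A4:A6')           # true
puts valid_range?('A5:B10')          # false
puts valid_range?('HELLOAA3:A7')     # false
```

Comparing the row numbers as integers avoids trying to express "at most 10000" inside the regex itself.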

How to implement Siri/Cortana like functionality in commandline?

I would like to implement a small subset of siri/cortana like features in command line.
For e.g.
$ What is the sum of 100 and 1000
> Response: 1100
$ What is the product of 10 and 12
> Response: 120
The questions are predefined regular expressions. It needs to call the matching function in ruby.
Pattern: What is the sum of (\d+) and (\d+)
Ruby method to call: sum(a,b)
Any pointers/suggestion is appreciated.
That sounds exactly like Cucumber; maybe take a look and see if you can just use their classes to hack something together :) ?
You could do something like the following:
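question = gets.chomp
# Capture the operation name without a trailing space, so that $1 is a valid method name.
if /\A.*\b(sum|product|quotient|difference)\b\D+([0-9]+)\D+([0-9]+).*\z/.match(question)
  # Assumes methods named sum, product, quotient and difference are defined.
  send($1, $2.to_i, $3.to_i)
end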
Quick explanation for anyone that may be new to matching in Ruby:
This gets a line of input from the command line and scans it for a function name (i.e. sum, product, etc.) followed by one or more non-digit characters. Then it looks for a first number (similarly followed by one or more non-digit characters) and a second number followed by nothing or anything. The parentheses determine what gets assigned to the variables preceded by a $, i.e. the substring that matches the contents of the first set of parentheses gets assigned to $1.
Next, it calls the method whose name is the value of $1 with the arguments (cast to integers) found in $2 and $3.
Obviously, this isn't generalized at all--you're putting the method names in the regex, and it's taking a fixed number of arguments--but it'll hopefully be useful for getting you on the right track.
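A slightly more general variant, sketched here under the assumption that every question is a fixed pattern with integer arguments, is to keep a table that maps each regular expression to the code it should run. COMMANDS and respond are illustrative names, not part of any library.

```ruby
# Each predefined question pattern maps to a lambda that takes the captured numbers.
COMMANDS = {
  /\Awhat is the sum of (\d+) and (\d+)\??\z/i     => ->(a, b) { a + b },
  /\Awhat is the product of (\d+) and (\d+)\??\z/i => ->(a, b) { a * b }
}.freeze

# Returns the handler's result for the first matching pattern, or nil if none match.
def respond(question)
  COMMANDS.each do |pattern, handler|
    match = pattern.match(question.strip)
    return handler.call(*match.captures.map(&:to_i)) if match
  end
  nil
end

puts "Response: #{respond('What is the sum of 100 and 1000')}"   # Response: 1100
puts "Response: #{respond('What is the product of 10 and 12')}"  # Response: 120
```

Adding a new question then means adding one entry to the table rather than editing a shared regex.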

How do you autocomplete names containing spaces?

I am working on implementing an autocompletion script in JavaScript. However, some of the names are two-word names with a space in the middle. What kind of algorithm can you use to deal with that? I am using a trie to store the names.
The only solutions I could come up with were just saying that two word names cannot be used (either run them together or put a dash in the middle). The other idea was to create a list of these kind of names and have a separate loop to check the input. The other and possibly best idea I have is to redesign it slightly and have categories for first and last names and then an extra name category. I was wondering if there was a better solution out there?
Edit: I realized I wasn't very clear on what I was asking. My problem isn't adding two word phrases to the trie, but returning them when someone is typing in a name. In the trie I split the first and last names so you can search by either. So if someone types in the first name and then a space, how would I tell if they are typing in the rest of the first name or if they are now typing in the last name.
Why not have the trie also include the names with spaces?
Once you have a list of candidates, split each of them on the space and show the first token...
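As a rough illustration of this idea (written in Ruby rather than the asker's JavaScript, and with NameTrie, add and complete as made-up names), the trie below stores each full name, spaces included, and also each individual word of the name, so a prefix typed after a space still finds candidates.

```ruby
# Minimal trie sketch: every key (full name or single word) points back to full names.
class NameTrie
  def initialize
    @root = { children: {}, names: [] }
  end

  # Index the full name (with spaces) and each of its words.
  def add(full_name)
    keys = [full_name] + full_name.split(' ')
    keys.uniq.each { |key| insert(key.downcase, full_name) }
  end

  # Return all full names reachable from the typed prefix.
  def complete(prefix)
    node = @root
    prefix.downcase.each_char do |ch|
      node = node[:children][ch]
      return [] unless node
    end
    collect(node).uniq.sort
  end

  private

  def insert(key, full_name)
    node = @root
    key.each_char do |ch|
      node = (node[:children][ch] ||= { children: {}, names: [] })
    end
    node[:names] << full_name
  end

  def collect(node)
    node[:names] + node[:children].values.flat_map { |child| collect(child) }
  end
end

trie = NameTrie.new
trie.add('Mary Ann Smith')
trie.add('Mary Jones')
p trie.complete('mary ')  # ["Mary Ann Smith", "Mary Jones"]
p trie.complete('sm')     # ["Mary Ann Smith"]
```

Because the space is just another character in the full-name keys, the lookup does not need to decide whether the user is still typing a first name or has moved on to the last name.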
Is there a reason you are rolling your own autocomplete script, instead of using a currently existing one, such as YUI autocomplete? (i.e. are you doing it just for fun?, etc.)
If you have a way to parse the two-word names, then just include spaces in your trie. But if you cannot determine what is a two-word name and what is two separate words, and your trie cannot be large enough to hold all two-word sequences, then you have a problem.
One simple way to solve this is to default to allowing two-word pairs, but if you have too much branching after the space, throw away that entire branch. This way, when the first word is predictive for the second, you'll get autocompletion, but when it could be any of a huge number of things, your trie will end at the end of a single word.
If you are using a multiline editor, I guess the best choice is for each autocomplete item to be a single word. So the first name, middle name and last name must be parsed and each added as a lookup item.
For a (one-line) textbox you can include whitespace (and a first name + space + middle name + space + last name pattern) in the search criteria.

History of trailing comma in programming language grammars

Many programming languages allow trailing commas in their grammar following the last item in a list. Supposedly this was done to simplify automatic code generation, which is understandable.
As an example, the following is a perfectly legal array initialization in Java (JLS 10.6 Array Initializers):
int[] a = { 1, 2, 3, };
I'm curious if anyone knows which language was first to allow trailing commas such as these. Apparently C had it as far back as 1985.
Also, if anybody knows other grammar "peculiarities" of modern programming languages, I'd be very interested in hearing about those also. I read that Perl and Python for example are even more liberal in allowing trailing commas in other parts of their grammar.
I'm not an expert on the commas, but I know that standard Pascal was very persnickety about semicolons being statement separators, not terminators. That meant you had to be very, very careful about where you put one if you didn't want to get yelled at by the compiler.
Later Pascal-esque languages (C, Modula-2, Ada, etc.) had their standards written to accept the odd extra semicolon without behaving like you'd just peed in the cake mix.
I just found out that a g77 Fortran compiler has the -fugly-comma Ugly Null Arguments flag, though it's a bit different (and as the name implies, rather ugly).
The -fugly-comma option enables use of a single trailing comma to mean “pass an extra trailing null argument” in a list of actual arguments to an external procedure, and use of an empty list of arguments to such a procedure to mean “pass a single null argument”.
For example, CALL FOO(,) means “pass two null arguments”, rather than “pass one null argument”. Also, CALL BAR() means “pass one null argument”.
I'm not sure which version of the language this first appeared in, though.
[Does anybody know] other grammar "peculiarities" of modern programming languages?
One of my favorites, Modula-3, was designed in 1990 with Niklaus Wirth's blessing as the then-latest language in the "Pascal family". Does anyone else remember those awful fights about whether the semicolon should be a separator or a terminator? In Modula-3, the choice is yours! The EBNF for a sequence of statements is
stmt ::= BEGIN [stmt {; stmt} [;]] END
Similarly, when writing alternatives in a CASE statement, Modula-3 let you use the vertical bar | as either a separator or a prefix. So you could write
CASE c OF
| 'a', 'e', 'i', 'o', 'u' => RETURN Char.Vowel
| 'y' => RETURN Char.Semivowel
ELSE RETURN Char.Consonant
END
or you could leave off the initial bar, perhaps because you prefer to write OF in that position.
I think what I liked as much as the design itself was the designers' awareness that there was a religious war going on and their persistence in finding a way to support both sides.
Let the programmer choose!
P.S. Objective Caml allows permissive use of | in case expressions whereas the earlier and closely related dialect Standard ML does not. As a result, case expressions are often uglier in Standard ML code.
EDIT: After seeing T.E.D.'s answer I checked the Modula-2 grammar and he's correct, Modula-2 also supported semicolon as terminator, but through the device of the empty statement, which makes stuff like
x := x + 1;;;;;; RETURN x
legal. I suppose that's not a bad thing. Modula-2 didn't allow flexible use of the case separator |, however; that seems to have originated with Modula-3.
Something which has always galled me about C is that although it allows an extra trailing comma in an initializer list, it does not allow an extra trailing comma in an enumerator list (for defining the literals of an enumeration type). This little inconsistency has bitten me in the ass more times than I care to admit. And for no reason!

Putting spaces back into a string of text with unreliable space information

I need to parse some text from pdfs but the pdf formatting results in extremely unreliable spacing. The result is that I have to ignore the spaces and have a continuous stream of non-space characters.
Any suggestions on how to parse the string and put spaces back into the string by guessing?
I'm using ruby. Or should I say I'musingruby?
Edit: I've pulled the text out using pdf-reader. Some of the pdf files are nicely formatted and some are not. An example of text mixed with positioning:
.7aspe-5.5cts-715.1o0.6f-708.5f-0.4aces-721.4that-716.3are-720.0i-1.8mportant-716.3in-713.9soc-5.5i-1.8alcommunica6.6tion6.3.-711.6Althoug6.3h-708.1m-1.9od6.3els-709.3o6.4f-702.8f5.4ace-707.9proc6.6essing-708.2haveproposed-611.2ways-615.5to-614.7deal-613.2with-613.0these-613.9diff10.4erent-613.7tasks,-611.9it-617.1remainsunclear-448.0how-450.7these-443.2mechanisms-451.7might-446.7be-447.7implemented-447.2in-450.3visualOne-418.9model-418.8of-417.3human-416.4face-421.9processing-417.5proposes-422.7that-419.8informa-tion-584.5is-578.0processed-586.1in-583.1specialised-584.7modules-577.0(Breen-584.4et-582.9al.,-582.32002;Bruce-382.1and-384.0Y92.0oung,-380.21986;-379.2Haxby-379.9et-380.5al.,-
And if I print just the string data (I added returns at the end of each line to keep it from messing up the layout here):
'Distinctrepresentationsforfacialidentityandchangeableaspectsoffacesinthehumantemporal
lobeTimothyJ.Andrews*andMichaelP.EwbankDepartmentofPsychology,WolfsonResearchInstitute,
UniversityofDurham,UKReceived23December2003;revised26March2004;accepted27July2004Availab
leonline14October2004Theneuralsystemunderlyingfaceperceptionmustrepresenttheunchanging
featuresofafacethatspecifyidentity,aswellasthechangeableaspectsofafacethatfacilitates
ocialcommunication.However,thewayinformationaboutfacesisrepresentedinthebrainremainsc
ontroversial.Inthisstudy,weusedfMRadaptation(thereductioninfMRIactivitythatfollowsthe
repeatedpresentationofidenticalimages)toaskhowdifferentface-andobject-selectiveregionsofvisualcortexcontributetospecificaspectsoffaceperception'
The data is spit out by callbacks so if I print each string as it is returned it looks like this:
'The
-571.3
neural
-573.7
system
-577.4
underly
13.9
ing
-577.2
face
-573.0
perc
13.7
eption
-574.9
must
-572.1
repr
20.8
esent
-577.0
the
unchangin
14.4
g
-538.5
featur
16.5
es
-529.5
of
-536.6
a
-531.4
face
'
On examination it looks like the true spaces are large negative numbers (< -300) and the false spaces are much smaller positive numbers. Thanks guys. Just getting to the point where I am asking the question clearly helped me answer it!
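That observation can be sketched as a small filter. This assumes the callback output has already been collected into an array that mixes strings and numeric offsets; rebuild_text and SPACE_THRESHOLD are hypothetical names, and the -300 cut-off is only the value guessed from the sample above.

```ruby
SPACE_THRESHOLD = -300 # assumption: offsets below this are real word gaps

# Join text fragments, inserting a space only where the offset looks like a true gap.
# Two fragments with no number between them (as with "the" / "unchangin" in the
# sample) are assumed to be separate words.
def rebuild_text(tokens)
  text = +''
  previous_was_string = false
  tokens.each do |token|
    if token.is_a?(Numeric)
      text << ' ' if token < SPACE_THRESHOLD
      previous_was_string = false
    else
      text << ' ' if previous_was_string
      text << token
      previous_was_string = true
    end
  end
  text
end

tokens = ['The', -571.3, 'neural', -573.7, 'system', -577.4,
          'underly', 13.9, 'ing', -577.2, 'face', -573.0, 'perc', 13.7, 'eption']
puts rebuild_text(tokens) # The neural system underlying face perception
```

The threshold would probably need tuning per document, since the offsets depend on the font size and layout of each PDF.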
Hmmmm... I'd have to say that guessing is never a good idea. Looking at the problem root cause and solving that is the answer, anything else is a kludge.
If the spacing is unreliable from the PDF, how is it unreliable? The PDF viewer needs to be able to reliably space the text so the data is there somewhere, you just need to find it.
EDIT following comment:
The idea of parsing the file using a dictionary (your only other option really, apart from randomly inserting spaces and hoping for the best) and inserting spaces at identified word boundaries (a real problem when dealing with punctuation, plurals that don't alter the base word i.e. plural, etc.) would, I believe, be a much greater programming challenge than correctly parsing the PDF in the first place. After all, PDF is clearly defined whereas English is somewhat woolly.
Why not go down the route of existing solutions like ps2ascii on Linux, call it from your Ruby code and pick up the result?
PDF doesn't only store spaces as space characters, but also uses layout commands for spacing (so it doesn't print a space, but moves the "pen" to the right). Perhaps you should have a look at the PDF reference (the big PDF on the bottom of the site), Chapter 9 "Text" should be what you're looking for.
EDIT: After reading your comment to Lazarus' answer, this doesn't seem to be what you're looking for. I think you should try to get a word list from somewhere and try to split your text using it. A good strategy would be to do that using recursion, because for example:
"meandyou"
The first word could be "me" or "mean", but if you try "mean", "dyou" doesn't make sense, so it will be "me", same for the next word that could be "a" or "an" or "and", only "and" makes sense.
If it were me I'd go back to the source PDFs and try a different method of extracting the text, such as iText (for Java) or maybe some kind of PDF-to-HTML to text conversion software method.
