What means donna in cryptography? - algorithm

I've often see the word "donna" in some cryptography algorithm implementations. So what it means?
Example: poly1305-donna, curve25519-donna and etc.

The poly1305-donna implementation was written by Andrew Moon, who states the following on naming his project at GitHub:
I borrowed the idea for these from Adam Langley's curve25519-donna,
hence the name.
And Adam Langley, who is the author of curve25519-donna, runs a weblog about security and cryptography with his email address on (agl AT imperialviolet DOT org). Furthermore, he also has a Twitter account.
So if you are really interested, maybe just drop him a line. I'd suppose, it simply is his wife's name. Or maybe it is related to the term prima donna, who is the leading female singer in the opera, to emphasize the algorithm's supremacy? But again, just guessing.

Related

Is there a standard computer vocabulary for German? for Spanish?

I was given the task of coming up with shorter German words for the German version of our software.
It got me to thinking that there should be some sort of standard vocabulary for information technology somewhere. Like there "have to be" terms that most (if not all) German computer users use for what English-speakers call file, database, record, search, search terms, search hits, find and replace, delete, OCR ... you get the idea.
I found ISO 2382 on the ISO Web site, but it only seems to standardize English and French. Is there an equivalent standard for German? How about for Spanish, or for other languages?
I may suggest this book, although quite dated, was an attempt to come up with a set of standard computer terms for translating from German to English and back:
Grosses IWT-Wörterbuch der Computertechnik und der Wirtschaftsinformatik. Englisch-Deutsch. Deutsch-Englisch
I will offer up the answer, "no".
Even within English, there are not standard words to describe computer operations as you have presented them. Certainly one can "delete" a file, but they can also "erase" it, "remove" it, an (shudder) "move it to the trash can".
Instead of trying to solve the problem in the large, I suggest you solve the problem in the small. Build a glossary of commonly used German words, and whenever there is an opportunity to expand the Glossary, first look over the existing entries and do your best to reuse the current terminology.
In a way, the reason good English documentation works well is because good writers of English use a glossary like technique explicitly or implicitly. In the event that much of your documentation comes from a single source, or related set of sources, you can make a "translation map" of "when they say X, we say Y". But, even such simplifications often require native readers to re-read the translation in context, as languages are not nearly regular enough to do simple substitution without many pitfalls.
As a starting point, The Open Group (www.opengroup.org) seems to have defined glossaries as part of their work on The Open Group Architecture Framework (TOGAF), which appear to be the sort of thing I needed. For example, these document numbers and titles are taken directly from their Web site:
C148 TOGAF® 9.1 Translation Glossary: English – Hrvatski (Croatian)
C149 TOGAF® 9.1 Translation Glossary: English – Castilian Spanish
C146 TOGAF® 9.1 Translation Glossary: English – Portuguese (Portugal)
C13H TOGAF® 9.1 Translation Glossary: English – Slovak

Matching users with objects based on keywords and activity in Ruby

I have users that have authenticated with a social media site. Now based on their last X (let's say 200) posts, I want to map how much that content matches up with a finite list of keywords.
What would be the best way to do this to capture associated words/concepts (maybe that's too difficult) or just get a score of how much, say, my tweet history maps to 'Walrus' or 'banana'?
Would a naive Bayes work here to separate into 'matches' and 'no match'?
In Python I would say NLTK can easily do it. In Ruby maybe gem called lda-ruby will help you. Whole LDA concept is well explained here - look at Sarah Palin's email for example. There's even the example of an app (not entirely in Ruby, but still) which did that -> github.com/echen/sarah-palin-lda
Or maybe I just say stupid things and that can't help you at all. I'm not an expert ;)
A simple bayes would work in this case, it is highly used to detect if emails are spam or not so for a simple keyword matching it should work pretty well.
For this problem you could also apply a recommendation system where you look for the top recommended keyword for a user (or for a post).
There are a ton of ways for doing this. I would recommend you to read Programming Collective Intelligence. It is explained using python but since you know ruby there should be not problem to understand the code.

How can I do "related tags"?

I have tags on my website, and I input them one by one when I create a blog post. I love gmail's new feature, that ask you if you want to include X in a mail, if you type Y's name and that you often include both of them in the same messages.
I'd like to do something similar on my website, but I don't know how to represent the tags "related-ness" in an object or database ... thoughts ?
It all boils down to create associations between certain characteristics of your posts and certain tags, and then - when you press the "publish" button - to analyse the new post and propose all tags matched with your post characteristics.
This can be done in several ways from a "totally hard-coded" association to some sort of "learning AI"... and everything in-between.
Hard-coded solutions
This are the simplest algorithms to implement. You should first decide what characteristics of your post are relevant for tagging (e.g.: it's length if you tag them "short" or "long", the presence of photos or videos if you tag them "multimedia-content", etc...). The most obvious is however to focus on which words are used in posts. For example you could build a mapping like this:
tag_hint_words = {'code-development' : ['programming',
'language', 'python', 'function',
'object', 'method'],
'family' : ['Theresa', 'kids',
'uncle Ben', 'holidays']}
Then you would check your post for the presence of the words in the list (the code between [ and ] ) and propose the tag (the word before :) as a possible candidate.
A common approach is to give "scores", or in other word to put a number that indicates the probability a given tag is the right one. For example: if your post would contain the sentence...
After months of programming, we finally left for the summer holidays at uncle Ben's cottage. Theresa and the kids were ecstatic!
...despite the presence of the word "programming" the program should indicate family as the most likely tag to use, as there are many more words hinting.
Learning AI's
One of the obvious limitations of the above method is that - say one day you pick up java beside python - you would probably need to change your code and include words like "java" or "oracle" too. The same applies if you create new tags.
To circumvent this limitation (and have some fun!!) you could try to implement a learning algorithm. Learning algorithms are those who refine their outcome the more you use them (so they indeed... learn!). Some algorithm requires initial training (many spam filters and voice recognition programs need this initial "primer"). Some don't.
I am absolutely no expert on the subject, but two common AI's are: the Naive Bayes Classifier and some flavour of Neural network.
Although the WP pages might look scary, they are surprisingly easy to implement (at least in Python). Here's the recording of a lecture at PyCon 2009 on the subject "Easy AI with Python". I found it very informative and even somehow inspiring! :)
HTH!
You should have a look at this post :
Any suggestions for a db schema for storing related keywords?
If you're looking for a schema for storing related tags it will help.
Relevancy searches where multiple agents play a part are usually done using Collaborative filtering. You might want to give that a look see.
Look up Clustering (Machine Learning algorithm). Don't be intimidated by math, it's a pretty straightforward algorithm. Check out Machine Learning for Hackers for simpler explanations of many Machine Learning algorithms and methods.

"ID" or "Id" on User Interface

The QA manager where I work just informed me there is a bug in my desktop app due to the sign-on prompt being "Operator Id" when it should be "Operator ID". Her argument being that "Id" refers to the ego portion of Freud's "psychic apparatus" and is not semantically correct.
Now being an anal engineer (AE) I of course had to go and lookup Id vs ID and from my cursory investigations (google) it seems ID is just as commonly used for Freud's ego as Id is.
So my reasoning would be that Id is a shortened version of "Identifier" and is more correct or at least more commonly used than ID which would typically indicate a two word abbreviation.
I could just change the UI but then I wouldn't be holding up my profession as an AE so I was wondering if there any best practices or references for this sort of thing that I could use to support my argument? Keeping in mind that this question relates to the user interface and not the source code where abbreviations and casing are a whole different branch of philosophy.
According to Merriam-Webster, the abbreviation is "ID". If it were a correct abbreviation, it would have to be "Id." with the period.
Personally, I use "Id". The compiler doesn't care but my eyes do. Compare:
GetIDByWhatever <-- looks terrible
GetIdByWhatever <-- oh so pretty!
Aesthetics is more important than grammar when it comes to code, always. (Update: 4 years later, I don't stand by this statement anymore)
The 'D' doesn't stand for anything, so I've always considered it an abbreviation, not an acronym - and therefore I too use 'Id', not 'ID'.
I don't know about your qa's reasoning - words can have more than one meaning - this is not unusual in English :)
But it looks like the common usage is actually 'ID' (right or wrong :P), which is probably the format your users would expect.
As an UAUA (ultra-anal usability analyst), please use ID instead of Id.
Visually, it's more recognizable in English. Grammatically, "Id" is a word (rhymes with "squid") and the Freudian definition has been given above. We're never verbally asked to show "id", but "ID." I.D. is fine but passe, as the periods imply multiple words.
So.
Just use ID, okay?
OK.
It's interesting that so many feel "Id" should be the way to go.
I feel "ID" is appropriate because it hints at how we pronounce it -- I.D. Also, when I read "Id" in a running sentence, I sometimes have to come back and read it again just to ensure it's not a typo for "is" or "it".
So, as a technical writer, this is an issue that comes up for me quite regularly when reviewing other people's work, whether it be programmers, BAs or other writers. Typically, id refers to ego as others have said before me and the accepted abbreviation for identification is ID, just because plenty of people don't know or understand the rule doesn't mean that they are correct (sorry to be blunt), mind you the rules for punctuation and spelling to a large degree are almost as changeable as fashion!
However, what no-one seems to have asked is, does your company have a standard? At the end of the day if your company has a style guide and they have covered this topic in that guide, you should follow the guide. If it is not covered, then may I suggest that you raise the issue with the person that maintains the guide and include any stakeholders in the conversation. Consistency is key here. If the company you work for doesn't have a style guide, then perhaps it is time to start one!
Hope this helps...
The QA manager's line of reasoning is silly. Lots of English words have multiple meanings. "Lead", "lead", "lead" (metal, be at the front of, or a connector).
I would just try to be consistent with the capitalization used elsewhere in the app.
User interface and code are very different beasts...
"ID" is the correct answer for a user interface.
In code, consistency is your friend. Whether you like it or not, go with what is already there elsewhere in the code. If it's not there, then read up and make a decision, or get with the team and work out a way to go that everyone can agree to. Consistency makes life so much easier.
I prefer Id because when used with other 2-letter text, it doesn't become a single all-caps word
Photovoltaics systems ... PVID (one word or 2?) PvId (much more clear).
ID = Idaho! Id = Freud! Let the OCD begin!
There is a little OcD in all of us!
Anyway, the google style guide says this:
"ID: Not Id or id, except in string literals or enums. In some contexts, best to spell out as identifier or identification."
I'm going with that.
Microsoft is more vague, from what I could find.
as a short version of Identifier, I would use Id. Also ID it's freaky when you have functions like
getUserIDByName()
Multiple capitals in domain terms are quite problematic with CamelCase, as they can produce ambiguities and therefore dishomogeneity in your interfaces and namings
How would you say it if you were reading out loud? I'd pronounce the two letters. ID is correct, analogous with similar abbreviations such as TV. (No dots, please, as the letters don't stand for anything.)
When I'm dealing with abbreviations like this, I like to format them in small block capitals, but that's just a personal taste. Capitals, anyway.
(But I probably would continue to use Id in the code itself.)
I think its depend on the way we spell. We don't spell "it", but "ai-di". Id-ID is spell by two sounds, so people make the D in cap to avoid thinking id is a "word". Its more like a character symbol. I like the "ID" more, just because it's nicer.
ID is correct. I have been in the Id camp for years, simply because I believed it was an argument between abbreviations and acronyms. Until one day I learned of a 3rd type (subtype really) called Initialisms which are specific to abbreviations or acronyms that are specifically read one letter at a time rather than pronounced as a single word. Initialisms are all caps.

Algorithms to recognize misspelled names in texts

I need to develop an application that will index several texts and I need to search for people’s names inside these texts. The problem is that, while a person’s correct name is “Gregory Jackson Junior”, inside the text, the name might me written as:
- Greg Jackson Jr
- Gegory Jackson Jr
- Gregory Jackson
- Gregory J. Junior
I plan to index the texts on a nightly bases and build a database index to speed up the search. I would like recommendation for good books and/or good articles on the subject.
Thanks
Check these related questions.
Algorithm to find articles with similar text
How to search for a person's name in a text? (heuristic)
Your question is incorrectly phrased. The examples do not indicate misspelling but change in the form of writing a full name.
And,
would your search expect to match on words like son with reference to the example?
would it expect to match bob when looking for a name called Robert?
Are you looking for things like this and this?
Ok, reading your comment suggests you do not want to venture into that.
For the record. Use a Bayesian filter. You may use mechanical truck for initializing your algorithm.

Resources