Arranging First and Last Names in data - names

Are there guidelines or best practice for formatting first and last names displayed in gridviews? Is it better to display the first name first or the last name first?

There is no universally accepted convention; the ordering of parts of personal names depends on the culture.
You can read more about this here: http://www.w3.org/International/questions/qa-personal-names
Then again, from a user-interface perspective, if the user is more likely to sort by the last name it should come before the first name.

In general, I've always found it better to show the surname first, followed by given name. Perhaps it's possible for you to have two separate columns if sorting on either one could be useful.
Bear in mind, if this is appropriate to your task, that the first/last names are not always the given name/surname in certain cultures so it's better to refer to them as given/surnames.

Related

What's the point of the "/" after defining something in webfocus?

I'm currently learning webfocus and I'm learning about defining, but what is teaching me isn't doing a great job. An example that has been given to me is about cars. If I want to make a new field to find the difference between the retail cost and the dealer cost, they said to input
DEFINE FILE CAR
CALCULATED_DIFFERENCE/D5 = RETAIL_COST - DEALER_COST
END
What I'm confused about is what the
/D5
is there for. Is it required to define a file? Does it have to be something specific? I've researched a lot but haven't found any answers about it.
From what I understand, it joins the aliases and roots. For example the D could be short for difference. The 5 could just be a number. Without all of the example and question it's hard to say.
Here is another example http://www.hwconline.com/ibi_html/javaassist/intl/EN/help/source/topic547.htm
Section
A preceding forward slash '/' is required for all aliases and context roots. The WebFOCUS Administration Console automatically adds a forward slash if one was not entered.
I am assuming it could be a pointer. Look at the url and the / indicates like a tree node. If you want to get to a location then it's location/location/location/here
here being what your looking at. So if the /D5 is called it could mean page/page/page/Difference5 so for short it omits all of the pages and signifies the direct alias D5.
I am not entirely sure but that is what I have always thought. Look at everything dealing with locations a \ or / is always used.
I've found out what it means. The /D5 is known as formatting. It allows WebFOCUS to know what is being used in the file. The D stands for decimal, which means that if I wanted to I could format it as D/5.2 and an A stands for alphanumeric. If I were to define a file called person_last_name, I would use A/30. The number determines how much data it can hold, so 30 is all I need as a person's last name probably won't be longer than 30 characters.

How to change long gene names to abbreviated in some automatic way (microarray data processing)?

Is there any automatic way to convert a list of long gene names (like Cadherin_3453) to its abbreviations, like CDHRN_3453?
Are there any abbreviation name convention in Genomics, Bioinformatics?
Sorry, no code herein
There is the HUGO database which tries to standardize gene names. Depending on your use case you can either try to access their online search every time or download the data and use your own database.
Since you did not post a programming language that you want, I am guessing that this is just a simple, one-time exercise that you would like to do.
Though it is not a true abbreviation, you could just remove all of the vowels in the gene name (as you may have done by accident in your example).
You should use:
http://www.togglecase.com/convert_to_disemvowelled_text.php
It was able to change Cadherin_3453 to Cdhrn_3453.
If you a looking to be able to do this with a program that you can tailor to your specific needs, you can look into this SO question: String replace vowels in Python?

Intern Problem Statement for a bank

I saw an intern opportunity in a bank in dubai. They have a defined problem statement to be solved in 2 months. They told us just 2 lines -
"Basically the problem is about name matching logic.
There are two fields (variables) – both are employer names, and it’s a free text field. So we need to write a program to match these two variables."
Can anyone help me in understanding it? Is it just a simple pattern matching stuff?
Any help/comments would be appreciated.
I think this is what they are asking for:
They have two sources of related data, for example, one from an internal database, and the other from name card input.
Because the two fields are free text fields, there will be inconsistency. For example, Nitin Garg, or Garg, Nitin, or Mr. Nitin Garg, etc. Here is an extreme case of Gadaffi.
What you are supposed to do is to find a way to match all the names for a specific person together.
In short, match two pieces of data together by employer names, taking possible inconsistency into account.
Once upon a time there was a nice simple answer to the problem of matching up names despite mis-spellings and different transliterations - Soundex. But people have put a lot of work into this problem, so now you should probably use the results of that work, which is built into databases and add-ons - some free. See Fuzzy matching using T-SQL and http://anastasiosyal.com/archive/2009/01/11/18.aspx and http://msdn.microsoft.com/en-us/magazine/cc163731.aspx

How to detect vulnerable/personal information in CVs programmatically (by means of syntax analysis/parsing etc...)

To make matter more specific:
How to detect people names (seems like simple case of named entity extraction?)
How to detect addresses: my best guess - find postcode (regexes); country and town names and take some text around them.
As for phones, emails - they could be probably caught by various regexes + preprocessing
Don't care about education/working experience at this point
Reasoning:
In order to build a fulltext index on resumes all vulnerable information should be stripped out from them.
P.S. any 3rd party APIs/services won't do as a solution.
The problem you're interested in is information extraction from semi structured sources. http://en.wikipedia.org/wiki/Information_extraction
I think you should download a couple of research papers in this area to get a sense of what can be done and what can't.
I feel it can't be done by a machine.
Every other resume will have a different format and layout.
The best you can do is to design an internal format and manually copy every resume content in there. Or ask candidates to fill out your form (not many will bother).
I think that the problem should be broken up into two search domains:
Finding information relating to proper names
Finding information that is formulaic
Firstly the information relating to proper names could probably be best found by searching for items that are either grammatically important or significant. I.e. English capitalizes only the first word of the sentence and proper nouns. For the gramatical rules you could look for all of the words that have the first letter of the word capitalized and check it against a database that contains the word and the type [i.e. Bob - Name, Elon - Place, England - Place].
Secondly: Information that is formulaic. This is more about the email addresses, phone numbers, and physical addresses. All of these have a specific formats that don't change. Use a regex and use an algorithm to detect the quality of the matches.
Watch out:
The grammatical rules change based on language. German capitalizes EVERY noun. It might be best to detect the language of the document prior to applying your rules. Also, another issue with this [and my resume sometimes] is how it is designed. If the resume was designed with something other than a text editor [designer tools] the text may not line up, or be in a bitmap format.
TL;DR Version: NLP techniques can help you a lot.

Detecting misspelled words

I have a list of airport names and my users have the possibility to enter one airport name to select it for futher processing.
How would you handle misspelled names and present a list of suggestions?
Look up Levenshtein distances to match a correct name against a given user input.
http://norvig.com/spell-correct.html
does something like levenshtein but, because he doesnt go all the way, its more efficient
Employ spell check in your code. The list of words should contain only correct spellings of airports.
This is not a great way to do this. You should either go for a control that provides auto complete option or a drop down as someone else suggested.
Use AJAX if your technology supports.
I know its not what you asked, but if this is an application where getting the right airport is important (e.g. booking tickets) then you might want to have a confirmation stage to make sure you have the right one. There have been cases of people getting tickets for the wrong Sydney, for instance.
It may be better to let the user select from the list of airport names instead of letting them type in their own. No mistakes can be made that way.
While it won't help right away, you could keep track of typos, and see which name they finally enter when a correct name is entered. That way you can track most common typos, and offer the best options.
Adding to Kevin's suggestion, it might be a best of both worlds if you use an input box with javascript autocomplete. such as jquery autocomplete
edit: danish beat me :(
There may be an existing spell-check library you can use. The code to do this sort of thing well is non-trivial. If you do want to write this yourself, you might want to look at dictionary trie's.
One method that may work is to just generate a huge list of possible error words and their corrections (here's an implementation in Python), which you could cache for greater performance.

Resources