Sort on last name, first name, or both?

I have a dilemma that I've encountered before. What's the best in terms of usability when one displays personal names in a table? Should there be a single column for the name? If so, is "firstname lastname" or "lastname, firstname" preferable? Or would a column for "firstname" and a column for "lastname" be best? I'm thinking in terms of the user's desire to sort the columns. I like having a column for each name component because I can imagine that in some cases the first name will be more important to the user whereas in other cases the last name would be more important.
I would assume that many out there have had this dilemma and am looking for pearls of wisdom based on past experience.

Definitely have a column for each part. That gives you much more flexibility. So you could sort by surname, but print "firstname surname", for example.

If you don't have the screen real estate to have a column for each part, you can combine them into a single string whose format represents the sorting order. Each click on the column header cycles to the next sort order. For example:
Default: sort by last, first (ASC)
Bimbleman, Wally P.
Zonkenstein, Arnold Q.
1st click: sort by last, first (DESC)
Zonkenstein, Arnold Q.
Bimbleman, Wally P.
2nd click: sort by first, middle, last (ASC)
Arnold Q. Zonkenstein
Wally P. Bimbleman
3rd click: sort by first, middle, last (DESC)
Wally P. Bimbleman
Arnold Q. Zonkenstein
etc...
It's easier to read an entire name this way (vs. having it span across columns), it takes up less screen real estate, and it frees you from having to decide on a single format & sort.
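For what it's worth, the click-to-cycle behaviour is easy to sketch. Here is a minimal Python version (the Person fields and the state labels are my own illustrative choices, not from the answer above):

from dataclasses import dataclass
from itertools import cycle

@dataclass
class Person:
    first: str
    middle: str
    last: str

# Each state pairs a label with a sort key and a direction; a click on the
# column header advances to the next state.
SORT_STATES = cycle([
    ("last, first (ASC)", lambda p: (p.last, p.first), False),
    ("last, first (DESC)", lambda p: (p.last, p.first), True),
    ("first, middle, last (ASC)", lambda p: (p.first, p.middle, p.last), False),
    ("first, middle, last (DESC)", lambda p: (p.first, p.middle, p.last), True),
])

def on_header_click(people):
    # Advance to the next sort state and return the re-sorted rows.
    label, key, descending = next(SORT_STATES)
    return label, sorted(people, key=key, reverse=descending)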

As far as I know, each country has its own conventions for sorting names: some sort by first name and some by last name. I believe the right answer here depends on your app: what is it about, how many users will appear in those columns, and which users (age/nationality/context) are going to use it?

Really, I agree with Skilldrick - a good UI has at least separate columns for first and last names.
But don't forget that CONSISTENCY in a UI is even more important for usability: it gives the end user an implied expectation of how things are done.

You might consider calling the fields "Given Name" and "Family Name" to account for people who put their family name first. Of course this doesn't cover everyone (some people only have a given name) but it might reduce potential confusion with Chinese and Japanese names, for example.

In most cases you will find that these fields cover most scenarios: Title, Firstname, Middlename, Lastname.
In most systems I have worked with here in Australia, data is sorted by last name in the default display. On screen, if you provide search, the Lastname field usually comes before Firstname. Sorting by first name is just as common, though, so your system should always allow the view to switch to sorting by Firstname.

Here is a solution for a single column; I don't think separate columns can be scanned and read as quickly, although I don't have any data to back that up.
The primary focus of a user-oriented solution should be to display names as they would be read aloud, i.e. Title Firstname Middlename Lastname.
For most domains where the names are known to the user, sorting by first name is acceptable. Here is an example where a person's title is ignored in the sorting, and the sort field is clear because it is highlighted:
Arnold Q. Zonkenstein
Mr. David Cliff
Marty P. Bimbleman
For formal business oriented applications, the default sorting could be by surname. You can preserve reading order, while still sorting by last name, again using highlighting:
Marty P. Bimbleman
Mr. David Cliff
Arnold Q. Zonkenstein
If you want the sort field to be configurable, use an explicit control such as a checkbox; the solution of clicking multiple times on the column heading to cycle between sort fields will be jarring to the user (toggling sort direction by clicking on the heading is more acceptable).
IMO this is the simplest solution without any compromises.

DAX COUNT/COUNTA functions

I've looked at many threads regarding COUNT and COUNTA, but I can't seem to figure out how to use it correctly.
I am new to DAX and am learning my way around. I have tried looking this up and have gotten part of the way to where I need to be, but not all the way. I think I am confused about how to apply a filter.
Here's the situation:
Four separate queries are used to generate the data in the report, but only two are needed for the DAX function (Products and Display).
I have three columns I need to filter by, as follows:
Customer (Display or Products query; can do either)
Brand (Products query)
Location (Display query)
I want to count rows based on whether the combination of these columns is unique.
Here's an example:
Customer                    Item              Brand      Location
Big Box Buy                 Lego Big Blocks   Lego       Toys
Big Box Buy                 Lego Star Wars    Lego       Toys
Big Box Buy                 Surface Pro       Microsoft  Electronics
Little Shop on the Corner   Red Bicycle       Trek       Racks
In this example, even though the items are different, we want to look at just the customer, the brand, and the location. In the first two records the customer is "Big Box Buy", the brand is "Lego", and the location is "Toys". This combination appears twice, but I want to count it distinctly as "1". The next "Big Box Buy" record has the brand "Microsoft" and the location "Electronics". It appears once and only once, so its distinct count is "1" anyway. This means there are two separate entries for "Big Box Buy", both with a count of 1. And lastly there is "Little Shop on the Corner", which appears just once and is counted just once.
The "skeleton" of the code I have is basically just to see if I can get a count to work at all, which I can. It's the FILTER that I think is the problem (not used in the below example) judging by other threads I've read.
TotalDisplays = CALCULATE(COUNTA(products[Brand]))
Obviously I can't just count the amount of times a brand appears as that would give me duplicates. I need it unique based on if the following conditions are met:
Customer must be the same
Brand must be the same
Location must be the same
If so, we distinctly count it as one.
I know I ranted a bit and may seem to have gone in circles, but I was trying to figure out how to explain it. Please let me know if I need to edit this post or post clarification.
Many thanks in advance as I go through my journey with DAX!
I believe I have the answer. I used NATURALINNERJOIN in DAX to create a new, merged table, since I needed to reference all values in the same query (I couldn't figure out how to do it otherwise). I also created a "unique identity" calculated column that combined data from multiple columns but was hidden behind the scenes (not actually displayed on the report), so I could then take a measure of the unique values that way.
TotalDisplays = COUNTROWS(DISTINCT('GD-DP-Merge'[DisplayCountCalcCol]))
My calculated column is as follows:
DisplayCountCalcCol = 'GD-DP-Merge'[CustID] & 'GD-DP-Merge'[Brand] & 'GD-DP-Merge'[Location] & 'GD-DP-Merge'[Order#]
So the measure TotalDisplays now reports back the distinct count of rows based on the unique value of the customer ID, the brand, and the location of the item. I also threw in an order number just in case.
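If you'd rather skip the helper column, I believe a SUMMARIZE-based measure gives the same distinct count directly (an untested sketch against the same merged table; SUMMARIZE groups by the three columns and COUNTROWS counts the resulting groups):
TotalDisplays = COUNTROWS(SUMMARIZE('GD-DP-Merge', 'GD-DP-Merge'[CustID], 'GD-DP-Merge'[Brand], 'GD-DP-Merge'[Location]))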
Thanks!
I am semi-new to DAX and was struggling with the COUNT and COUNTA formulas; your post helped me find answers. I would like to add the solution I got for my query. I wanted a count of "Right Time Start Achieved", so if anyone is looking for this kind of answer, use the measure below; the filter argument names the table column and the string you want to match:
RTSA := CALCULATE(COUNTA(VEO_Daily_Services[RTS]), VEO_Daily_Services[RTS] = "RTSA")

How to write my query in Relational Algebra?

I have a dataset which has files of hotel reviews. Each file contains multiple reviews for a single hotel. Here are my two relations in BCNF:
Hotel(hotelID, OverallRating, AveragePrice, URL)
Review(hotelID, Author, Content, Date, No. Reader, No. Helpful,
Overall, Value, Rooms, Location, Cleanliness, Checkin / front desk,
Service, Business Service)
I am trying to write the following query in relational algebra:
Find all the reviews by the same user (i.e., given a user ID, return the list of all their
reviews).
By User ID, the question is referring to the Author attribute found in my second relation. The way I understand the question, it must take a user ID as an argument. Maybe you see it differently?
Here is what I have so far:
σ (Author = $1) (Review)
σ is the sigma symbol used to represent selection in relational algebra; I was having trouble inserting it into my question, so I'm writing the condition inline. $1 represents where the query would take the user ID argument. This is just to show my thinking; I do not think it's correct.
Thanks for your time
The query will be:
σ (Author = "Your_User_Id") (Hotel ⋈ (Hotel.hotelID = Review.hotelID) Review)
Where
σ = Selection operator
⋈ = Join operator
(-----) = Condition
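The same expression in LaTeX notation, for a cleaner rendering:
\sigma_{\mathrm{Author} = \text{"Your\_User\_Id"}}\left(\mathrm{Hotel} \bowtie_{\mathrm{Hotel.hotelID} = \mathrm{Review.hotelID}} \mathrm{Review}\right)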
Hope it helps. For more detail, refer to my notes for DBMS: Relational Algebra; search for the term "Relational Algebra" on the site to find the exact information you need quickly.

GT.M order values

Is there any way to sort the result set in GT.M by a specific value?
Let's say I have the global variable ^People(name,surname)=age and I want to get all the people with an age between 20 and 40, ordered by their age.
Edit: Sorry to Answer instead of continuing the comment chain... my permissions level isn't high enough to comment on this question yet.
As long as you won't have multiple people with the same name, but different surnames, Yogesh is correct. It would be best to use the index to find the correct node of ^People, then grab what you need from ^People. But if you would have multiple people with the same name, but a different surname, then you would want to include the surname in the ^PeopleAgeIndex subnodes.
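To make that concrete, here is a minimal sketch in M (untested; the index global's name, keeping it in sync, and integer-valued ages are my assumptions, not part of the original question):
; Assume ^PeopleAgeIndex(age,name,surname)="" is maintained alongside
; ^People whenever a record is added or removed. $ORDER then walks the
; index in age order. Starting at 19 assumes integer ages; with
; fractional ages, also skip any AGE<20.
AGEQ  NEW AGE,NAME,SUR
      SET AGE=19
      FOR  SET AGE=$ORDER(^PeopleAgeIndex(AGE)) QUIT:(AGE="")!(AGE>40)  DO
      . SET NAME=""
      . FOR  SET NAME=$ORDER(^PeopleAgeIndex(AGE,NAME)) QUIT:NAME=""  DO
      . . SET SUR=""
      . . FOR  SET SUR=$ORDER(^PeopleAgeIndex(AGE,NAME,SUR)) QUIT:SUR=""  DO
      . . . WRITE NAME,", ",SUR," (age ",AGE,")",!
      QUIT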

How to properly organize search of the person?

Let's say I have a list of persons in my datastore. Each person there may have the following fields:
last name (*)
first name
middle name
id (*)
driving licence id (*)
another id (*)
date of birth
region
place of birth
At least one of the fields marked with (*) must exist.
Now the user provides me with the same list of fields (and again, at least one of the fields marked with (*) must be provided). I should search for the person the user described. But not all fields have to match, and I should somehow display to the user how confident I am in the results of the search. Something like:
if the person matched by id and last name (and the user provided just these 2 fields for the search), then I am sure the result is correct (100%);
if the person matched by id and last name (but the user provided other fields, which were found in the database yet did not match), then I am only about 60% sure the result is correct;
etc.
(the numbers are provided just as an example)
How can I organize such a search? Is there any standard algorithm? I would also like to minimize the number of requests to the database.
P.S. I cannot show the user the actual field values from the database.
It sounds like your logic for determining the quality of a match will be too complex to handle at the database layer. I think you'll get the best performance by retrieving all of the records that match at least one of the mandatory keys, calculating the match score for each of them in memory, and returning the best score. For example, if the user provides you with an id, last name and place of birth, your query would look something like:
SELECT * FROM users WHERE id = 'the_id' OR last_name = 'the_last_name';
This could be a performance problem if you have a VERY large dataset with lots of common last names, but otherwise I would not expect to see too many collisions. You can check this on your own dataset outside of GAE. You could also get better performance, if all mandatory fields MUST match, by changing the OR to an AND.
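The in-memory scoring step could then be as simple as ranking candidates by the fraction of user-supplied fields that match (a Python sketch; the score formula and field handling are illustrative assumptions, not a prescribed algorithm):

def match_score(candidate, query):
    # Score = fraction of the fields the user actually supplied that match.
    provided = {k: v for k, v in query.items() if v is not None}
    matched = sum(1 for k, v in provided.items() if candidate.get(k) == v)
    return matched / len(provided)  # at least one (*) field is always provided

def best_matches(candidates, query, limit=10):
    # Rank the rows returned by the broad OR query and keep the top few.
    return sorted(candidates, key=lambda c: match_score(c, query), reverse=True)[:limit]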

How do I go about building a matching algorithm?

I've never built an algorithm for matching before and don't really know where to start. So here is my basic set up and why I'm doing it. Feel free to correct me if I'm not asking the right questions.
I have a database of names and unique identifiers for people. Several generated identifiers (internally generated and some third party), last name, first name, and birth date are the primary ones that I would be using.
Several times throughout the year I receive a list from a third party that needs to be imported and tied to the existing people in my database but the data is never as clean as mine. IDs could change, birth dates could have typos, names could have typos, last names could change, etc.
Each import could have 20,000 records so even if it's 99% accurate that's still 200 records I'd have to go in manually and match. I think I'm looking for more like 99.9% accuracy when it comes to matching the incoming people to my users.
So, how do I go about making an algorithm that can figure this out?
PS: Even if you don't have an exact answer, pointing me to some materials to reference would also be helpful.
PPS Some examples would be similar to what m3rLinEz wrote:
ID: 9876234 Fname: Jose LName: Guitierrez Birthdate:01/20/84 '- Original'
ID: 9876234 Fname: Jose LName: Guitierrez Birthdate:10/20/84 '- Typo in birth date'
ID: 0876234 Fname: Jose LName: Guitierrez Birthdate:01/20/84 '- Wrong ID'
ID: 9876234 Fname: Jose LName: Guitierrez-Brown Birthdate:01/20/84 '- Hyphenated last name'
ID: 9876234 Fname: Jose, A. LName: Guitierrez Birthdate:01/20/84 '- Added middle initial'
ID: 3453555 Fname: Joseph LName: Guitierrez Birthdate:01/20/84 '- Probably someone else with same birthdate and same last name'
You might be interested in Levenshtein distance.
The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.
You could compare each of your fields and compute the total distance, and by trial and error you may discover the appropriate threshold for records to be interpreted as matches. I have not implemented this myself, just thought of the idea :}
For example:
Record A - ID: 4831213321, Name: Jane
Record B - ID: 431213321, Name: Jann
Record C - ID: 4831211021, Name: John
The distance between A and B will be lower than between A and C or B and C, which indicates a better match.
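To make the idea concrete, here is a small Python sketch (summing per-field distances and the field names are illustrative assumptions):

def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def total_distance(rec_a, rec_b, fields=("id", "name")):
    # Sum the per-field distances; a lower total means a better match.
    return sum(levenshtein(str(rec_a[f]), str(rec_b[f])) for f in fields)

# With records A, B, C above, total_distance(A, B) comes out smaller than
# the A-C or B-C distances, matching the intuition described.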
When it comes to something like this, do not reinvent the wheel. The Levenshtein distance is probably your best bet if you HAVE to do this yourself, but otherwise, do some research on existing solutions that do database queries and fuzzy searches. They've been doing it longer than you; it'll probably be better, too.
Good luck!
If you're dealing with data sets of this size and different resources being imported, you may want to look into an Identity Management solution. I'm mostly familiar with Sun Identity Manager, but it may be overkill for what you're trying to do. It might be worth looking into.
If the data you are getting from third parties is consistent (same format each time), I'd probably create a table for each of the third parties you are getting data from, then import each new set of data into the same table each time. You can then join the two tables on their common columns with an SQL statement, so queries can pull data from multiple tables while making it look like it came from one unified table. Similarly, records added to one table that have no match in the other can be found and manually paired. This way you keep your 'clean' data separate from the junk you get from third parties. If you wanted a true import, you could then use that joined table to create a third table containing all your data.
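A hedged sketch of that join (the table and column names here are invented for illustration):

SELECT m.*, t.*
FROM my_people AS m
JOIN third_party_import AS t ON m.person_id = t.person_id;

-- Imported rows with no match in the clean table, for manual pairing:
SELECT t.*
FROM third_party_import AS t
LEFT JOIN my_people AS m ON m.person_id = t.person_id
WHERE m.person_id IS NULL;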
I would start with the easy near 100% certain matches and handle them first, so now you have a list of say 200 that need fixing.
For the remaining rows you can use a simplified version of Bayes' Theorem.
For each unmatched row, calculate the likelihood that it is a match for each row in your data set, assuming that the data contains certain changes which occur with certain probabilities. For example, a person changes their surname with probability 0.1% (possibly also depending on gender), changes their first name with probability 0.01%, and has a single typo with probability 0.2% (use the Levenshtein distance to count the number of typos). Other fields also change with certain probabilities. For each row, calculate the likelihood that the row matches, considering all the fields that have changed, then pick the one with the highest probability of being a match.
For example, a row with only a small typo in one field but equal on all others would have a 0.2% chance of being a match, whereas a row which differs in many fields might have only a 0.0000001% chance. So you pick the row with the small typo.
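A toy sketch of that scoring in Python (the probabilities mirror the examples above, the field names are illustrative, and levenshtein is the function from the earlier sketch):

import math

CHANGE_PROB = {"surname": 0.001, "first_name": 0.0001}  # chance a field genuinely changed
TYPO_PROB = 0.002                                        # chance of one single-character typo

def log_likelihood(incoming, candidate):
    # Sum log-probabilities over differing fields; identical fields cost nothing.
    score = 0.0
    for field, p_change in CHANGE_PROB.items():
        if incoming[field] != candidate[field]:
            typos = levenshtein(incoming[field], candidate[field])
            # Explain the difference as either that many independent typos
            # or one genuine change, whichever is more probable.
            score += math.log(max(TYPO_PROB ** typos, p_change))
    return score

# Pick the existing row with the highest likelihood:
# best = max(existing_rows, key=lambda row: log_likelihood(new_row, row))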
Regular expressions are what you need, why reinvent the wheel?
