What type of list should be used for a timeline? - html-lists

An online exercise I am doing gave as a solution for building a timeline an ordered list.
I had constructed the timeline using a description list, since I thought it would look weird to have a number or letter preceding a year.
I think a description list looks better, but I'm wondering about WAI-ARIA: does it make sense for a timeline to be constructed as an ordered list so the progression is semantically logical as well as in appearance?
And, if so, is it possible to hide the ordinal indicator (i.e., letter, number) of the <ol>?

Like you've suggested, it's all about semantics. Without referring to a spec, it makes sense to use HTML elements that "suggest" to someone/something (developers/machines) reading the code directly that there's further meaning, in this case the data following a logical order.
Other examples would be semantic elements introduced in HTML5 like <header>, <article>, <section>, <aside>, <time> or even older elements like <address>.
Comparing your two options:
An ordered list <ol> implies that the data is ordered, which suits a list of dates/events in a timeline.
A description list <dl> uses term elements <dt> for holding the term/name and description elements <dd> for describing that term. Depending on the type of timeline, it could be argued that a year is the term, but are you describing it as a term? Most likely it's not being described but just used as a point in time for other data (think: x-axis).
Furthermore, using an ordered list would mean:
Other developers (even in the CSS/JS) would know to respect the order, whether that be in the generation of those elements or in the styling, and get some insight into the data.
If you have an end user with a disability using a screen-reader, the reader can respect that order (think: cooking instructions).
So an ordered list is probably most appropriate, though don't lose sleep over your choice either; we're almost splitting hairs in this case.
If you need to hide the ordinal indicator you can do that quite easily with CSS:
ol {
  list-style: none; /* removes the ordinal indicator */
  padding-left: 0;  /* removes the left-over space, if needed */
}


Relation between two texts with different tags

I'm currently having a problem with the conception of an algorithm.
I want to create a WYSIWYG editor that goes along the current [bbcode] editor I have.
To do that, I use a div with contenteditable set to true for the WYSIWYG editor and a textarea containing the associated bbcode. Until there, no problem. But my concern is that if a user wants to add a tag (for example, the [b] tag), I need to know where they want to include it.
For that, I need to know exactly where in the bbcode I should insert the tags. I thought of comparing the two texts (one with html tags like <span>, the other with bbcode tags like [b]), and that's where I'm struggling.
I did some research but couldn't find anything that would help me, or I did not understand it correctly (maybe I searched for the wrong thing). What I could find is the Jaccard index, but I don't really know how to make it work correctly.
I also thought of another alternative. I could just take the code in the WYSIWYG editor before the cursor location, and split it every time I encounter a html tag. That way, I can, in the bbcode editor, search for the first occurrence, then search for the second occurrence starting at the last index found, and so on until I reach the place where the cursor is pointing at.
I'm not sure if it would work, and I find that solution a bit dirty. Am I totally wrong or should I do it this way?
Thanks for the help.
A popular way of determining what is the level of the similarity between the two texts is computing the mentioned Jaccard similarity. Citing Wikipedia:
The Jaccard index, also known as Intersection over Union and the Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures the similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:
J(A,B) = |A ∩ B| / |A ∪ B|
If you have a large number of texts, though, computing the exact Jaccard index for every possible pair of texts is computationally very expensive. The index can be approximated with a technique called minhashing. It applies several (e.g. 100) independent hash functions to the elements of each set and keeps, for each function, the minimum hash value; the list of minima is the set's signature. The nice property is that, for any one hash function, the probability that two sets A and B produce the same minimum is equal to J(A,B), so the fraction of matching signature positions estimates the index.
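To make the two ideas concrete, here is a minimal sketch of the exact Jaccard index and a MinHash estimate of it. Treating each text as a set of whitespace-separated words, and building the "independent" hash functions by salting SHA-1 with a seed, are assumptions made for this illustration only.

```python
import hashlib


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard index: |A & B| / |A | B| (taken as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(token: str, seed: int) -> int:
    # One hash function per seed, obtained by salting SHA-1 (an illustrative choice).
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens: set, num_hashes: int = 100) -> list:
    """For each hash function, keep the minimum hash value over the (non-empty) set."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of signature positions that agree; approximates the Jaccard index."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


a = set("the quick brown fox jumps over the lazy dog".split())
b = set("the quick brown cat jumps over the lazy dog".split())
exact = jaccard(a, b)
approx = estimate_jaccard(minhash_signature(a), minhash_signature(b))
```

The signatures have a fixed length regardless of how long the texts are, which is where the saving comes from when many texts have to be compared; more hash functions give a tighter estimate at a higher cost.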
Another way to cluster similar texts (or any other data) together is to use Locality Sensitive Hashing, which by itself is an approximation of what KNN does, and is usually worse than that, but is definitely faster to compute. The basic idea is to project the data into a low-dimensional binary space (that is, each data point is mapped to an N-bit vector, the hash key). Each hash function h must satisfy the locality-sensitive hashing property prob[h(x)=h(y)]=sim(x,y), where sim(x,y) in [0,1] is the similarity function of interest. For dot products it can be visualized as follows:
We can now ask what the hash of the indicated point would be (in this case it's 101), and everything that is close to this point has the same hash.
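A small sketch of the dot-product variant described above, assuming the common random-hyperplane construction: each bit of the key records on which side of a random hyperplane the point lies. The function names and the use of Gaussian normals are choices made for this example, not something given in the answer. For this family, the probability that one bit agrees for two points is 1 - θ/π, where θ is the angle between them.

```python
import random


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0) -> list:
    """Draw one random normal vector per bit of the hash key."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(point: list, hyperplanes: list) -> str:
    """Bit i is '1' if the point lies on the non-negative side of hyperplane i."""
    bits = []
    for normal in hyperplanes:
        dot = sum(p * n for p, n in zip(point, normal))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


planes = make_hyperplanes(num_bits=3, dim=2, seed=1)
key = lsh_key([1.0, 2.0], planes)  # a 3-bit key such as the "101" in the text
```

Points that share a key land in the same bucket, so a nearest-neighbour search only has to look inside one bucket instead of scanning everything.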
EDIT to answer the comment
No, you asked about the text similarity and so I answered that. You basically ask how can you predict the position of the character in text 2. It depends on whether you analyze the writer's style or just pure syntax. In any of those two cases, IMHO you need some sort of statistics that will tell where it is likely for this character to occur given all the other data/text. You can go with n-grams, RNNs, LSTMs, Markov Chains or any other form of sequential data analysis.

Why are the elements of a rascal set not ordered when printing?

For the sake of usability, when I write the following code
{1,2,3,4,5,6,7,8,9,10}
I expect the Rascal console to print the same, yet in the output window I see:
{10,9,8,7,6,5,4,3,2,1}
This example is overly simple of course, and for this the ordering doesn't really hurt. However, in more complex examples I would expect the output to be sorted so that I can more easily verify if a certain element is included in the set.
Does the current ordering of the printed set have a meaning?
From the Rascal Tutor:
A set is an unordered sequence of values and has the following properties:
All elements have the same static type.
The order of the elements does not matter.
A set contains an element only once. In other words, duplicate elements are eliminated and no matter how many times an element is added to a set, it will occur in it only once.
And the wikipedia page on Sets:
In computer science, a set is an abstract data structure that can store certain values, without any particular order, and no repeated values.
So the behaviour you observe is as expected: there is no order inside a set, and the order displayed is due to the implementation (a Java HashSet). Sorting before printing, or during construction, would add performance overhead and might give a user the incorrect impression that there is an order.
In regards to the first suggestion, using the same sequence as supplied, that would require a less efficient data structure, and would again hurt performance on the off chance we have to print a set.
And of course, you can always do:
import List;
import Set;
sort(toList({4,2,1,3}))
if you really want the output sorted.

how to recycle images, but not show anyone the same image twice?

I'm writing a web app similar to wtfimages.com in that one visitor should never (or rarely) see the same thing twice, but different visitors can see the same thing. Ideally, this would span visits, so that when Bob comes back tomorrow he doesn't see today's things again either.
Three first guesses:
have enough unique things that it's unlikely any user will draw enough items to repeat
actually track each user somehow and log what he has seen
have client-side Javascript request things by id according to a pseudorandom sequence seeded with something unique to the visitor and session (e.g., IP and time)
Edit: So the question is, which of these three is the best solution? Is there a better one?
Note: I suspect this question is the web 2.0 equivalent of "how do I implement strcpy?", where everybody worth his salt knows K&R's idiomatic while(*s++ = *t++) ; solution. If that's the case, please point me to the web 2.0 K&R, because this specific question is immaterial. I just wanted a "join the 21st century" project to learn CGI scripting with Python and AJAX with jQuery.
The simplest implementation I can think of would be to make a circular linked list, and then start individual users at random offsets in the linked list. You are guaranteed that they will see every image there is to see before they will see any image twice.
Technically, it only needs to be a linked list in a conceptual sense. For example, you could just use the database identifiers of the various items and wrap around once you've hit the last one.
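As a rough sketch of the conceptual version, assuming the image IDs are available as a list and that each visitor's starting offset and step count are stored somewhere (cookie, session, etc.), the wrap-around can be done with modular arithmetic. The function names are made up for this example.

```python
import random


def start_offset(num_images: int) -> int:
    """Pick a random starting position for a new visitor."""
    return random.randrange(num_images)


def image_at(image_ids: list, offset: int, step: int) -> int:
    """ID shown on the visitor's step-th request, wrapping around at the end."""
    return image_ids[(offset + step) % len(image_ids)]


image_ids = [101, 102, 103, 104, 105]  # hypothetical database identifiers
offset = start_offset(len(image_ids))
first_round = [image_at(image_ids, offset, step) for step in range(len(image_ids))]
```

Only two small integers have to be remembered per visitor, and every image is shown once before any repeats.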
There are complexity problems with other solutions. For example, if you want it to be a different order for each person, that requires permuting the elements in some way. But then you have to store that permutation, so as to guarantee that people see things in different orders. That's going to take up a lot of space. It will also require you to update everybody's permutations if you add or remove an image to the list of things to see, which is yet more work.
A compromise solution that still allows you to guarantee a person sees every image before they see any image twice while still varying things among people might be something like this:
Using some hash function H (say, MD5), take the hash of each image, and store the image with a filename equal to the digest (e.g. 194db8c5[...].jpg).
Decide on a number N. This will be the number of different paths that a randomly selected person could take to traverse all the images. For example, if you pick N = 10, each person will take one of 10 possible distinct journeys through the images. Don't pick an N larger than the digest size of H, since the rotations repeat beyond that (for MD5, this is 16 bytes, i.e. 32 hex characters in the file name; for SHA-1, it's 20 bytes, i.e. 40 hex characters).
Make N different permutations of the image list, with the ith such permutation being generated by rotating the characters in each file name i characters to the left, and then sorting all the entries. (For example, a file originally named abcdef with i == 4 will become efabcd. Now sort all the files that have been transformed in this way, and you have a distinct list.)
Randomly assign to each user a number r from 0 .. N - 1 inclusive. They now see the images in the ordering specified by r.
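The four steps above can be sketched as follows. This is only an illustration: the image contents are stand-in byte strings, and the helper names are invented here. Note that hashlib's hexdigest() for MD5 is 32 characters long, so at most 32 distinct rotations exist for such names.

```python
import hashlib
import random


def digest_name(image_bytes: bytes) -> str:
    """File name (without extension) derived from the MD5 digest of the image."""
    return hashlib.md5(image_bytes).hexdigest()


def rotate_left(name: str, i: int) -> str:
    """Rotate the characters of name i positions to the left."""
    i %= len(name)
    return name[i:] + name[:i]


def ordering(names: list, r: int) -> list:
    """The r-th permutation: names sorted by their r-rotated form."""
    return sorted(names, key=lambda n: rotate_left(n, r))


N = 10
names = [digest_name(bytes([i])) for i in range(5)]  # stand-ins for real image data
r = random.randrange(N)         # assigned once per user
journey = ordering(names, r)    # the order in which this user sees the images
```

Nothing per-user needs to be stored beyond r and a position in the list, and adding or removing an image only requires re-sorting the N lists.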
Ultimately, this seems like a lot of work. I'd say just suck it up and make it random, accept that people will occasionally see the same image again, and move on.
Personally I would just store a cookie on the user's machine which holds all the ID's of what he's seen. That way you can keep the 'randomness' and not have to show the items in sequential order as John Feminella's otherwise great solution suggests.
Applying the cookie data in an SQL query would also be trivial: say that you have a comma separated ID's in the cookie, you can just do this (in PHP):
"SELECT image FROM images WHERE id NOT IN(".$_COOKIE['myData'].") ORDER BY RAND() LIMIT 1"
Note that this is just a simple example; you should of course escape the cookie data properly, and there might be more efficient ways to select a random entry from a table.
Using a cookie also makes it possible to pick up where the user left off the previous time. And cookie size probably won't be an issue: you can hold a lot of IDs in 4KB, which is (usually) the maximum size of a cookie.
EDIT
If your cookie data looks like this:
$_COOKIE['myData'] == '1,6,19,200,70,16';
You can safely use that data in a SQL query with:
// mysql_real_escape_string() was removed in PHP 7; the IDs are numeric, so cast each one to an integer instead
$ids = array_map('intval', explode(',', $_COOKIE['myData']));
$query = "SELECT image FROM images WHERE id NOT IN(".implode(', ', $ids).") ORDER BY RAND() LIMIT 1";
This splits the ID string into individual IDs, casts each of them to an integer, then joins them with commas so that the query becomes:
$query == "SELECT image FROM images WHERE id NOT IN(1, 6, 19, 200, 70, 16) ORDER BY RAND() LIMIT 1"
So $_COOKIE[] variables are just like any other user-supplied variable, and you must take the same precautions with them as with other data.
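Since the question mentions Python CGI, here is a rough equivalent of the cookie approach in Python. It is a sketch under assumptions: the database is SQLite accessed through the standard sqlite3 module, the table is called images with columns id and image (mirroring the query above), and the cookie value has already been read into a string.

```python
import sqlite3


def parse_seen(cookie_value: str) -> list:
    """Keep only the comma-separated tokens that are plain decimal integers."""
    tokens = (tok.strip() for tok in cookie_value.split(","))
    return [int(tok) for tok in tokens if tok.isdecimal()]


def pick_unseen(conn: sqlite3.Connection, seen: list):
    """Return a random (id, image) row whose id is not in seen, or None."""
    if seen:
        # One "?" placeholder per ID; the values are bound, never interpolated.
        placeholders = ",".join("?" for _ in seen)
        sql = (f"SELECT id, image FROM images WHERE id NOT IN ({placeholders}) "
               "ORDER BY RANDOM() LIMIT 1")
    else:
        sql = "SELECT id, image FROM images ORDER BY RANDOM() LIMIT 1"
    return conn.execute(sql, seen).fetchone()
```

After serving a row, its id would be appended to the cookie value. Anything in the cookie that is not a number is simply dropped, so a tampered cookie can at worst make the visitor see repeats.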
You have two classes of solutions:
1. stateless
2. stateful
You need to pick one: (#1) is of course not guaranteed (i.e. the probability of showing the same image to a user varies), whilst (#2) allows you to make guarantees (depending on the implementation, of course).
Here is another suggestion you might want to consider:
Maintain state on the client side through HTML5 localStorage (when available): the value of this option will only continue to increase as the number of web browsers with HTML5 support increases.

Categorizing Words and Category Values

We were set an algorithm problem in class today, as an "if you figure out a solution you don't have to do this subject" challenge. So of course, we all thought we would give it a go.
Basically, we were provided a DB of 100 words and 10 categories. No mapping between the words and the categories is given. So it's basically a list of 100 words, and 10 categories.
We have to "place" the words into the correct category - that is, we have to "figure out" how to put the words into the correct category. Thus, we must "understand" the word, and then put it in the most appropriate category algorithmically.
e.g. one of the words is "fishing" and one of the categories is "sport", so the word would go into that category. There is some overlap between words and categories such that some words could go into more than one category.
If we figure it out, we have to increase the sample size and the person with the "best" matching % wins.
Does anyone have ANY idea how to start something like this? Or any resources ? Preferably in C#?
Even a keyword DB or something might be helpful ? Anyone know of any free ones?
First of all you need sample text to analyze, to get the relationship of words.
A categorization with latent semantic analysis is described in Latent Semantic Analysis approaches to categorization.
A different approach would be naive bayes text categorization. Sample text with the assigned category are needed. In a learning step the program learns the different categories and the likelihood that a word occurs in a text assigned to a category, see bayes spam filtering. I don't know how well that works with single words.
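To give an idea of what the learning step looks like, here is a bare-bones multinomial naive Bayes sketch with add-one smoothing. The training sentences are invented for this example; a real attempt would need a much larger labelled corpus.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: iterable of (category, text). Count words and documents per category."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, text in samples:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    return word_counts, doc_counts


def classify(word, word_counts, doc_counts):
    """Return the category maximising log P(category) + log P(word | category)."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category, counts in word_counts.items():
        prior = math.log(doc_counts[category] / total_docs)
        # Add-one (Laplace) smoothing so unseen words do not give log(0).
        likelihood = math.log((counts[word] + 1) / (sum(counts.values()) + len(vocab)))
        if prior + likelihood > best_score:
            best, best_score = category, prior + likelihood
    return best


# Hypothetical training data, made up for illustration.
samples = [
    ("sport", "fishing and football are outdoor sport activities"),
    ("sport", "the team won the football match"),
    ("weather", "a tornado and heavy rain hit the coast"),
    ("weather", "snow and rain are expected tomorrow"),
]
word_counts, doc_counts = train(samples)
guess = classify("fishing", word_counts, doc_counts)
```

With a single word as input, the classifier essentially picks the category in whose sample text the word is relatively most frequent, which is why the quality of the sample text matters so much.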
Really poor answer (demonstrates no "understanding") - but as a crazy stab you could hit google (through code) for (for example) "+Fishing +Sport", "+Fishing +Cooking" etc (i.e. cross join each word and category) - and let the google fight win! i.e. the combination with the most "hits" gets chosen...
For example (results first):
weather: fish
sport: ball
weather: hat
fashion: trousers
weather: snowball
weather: tornado
With code (TODO: add threading ;-p):
using System;
using System.Diagnostics;
using System.Globalization;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

static class Program {
    static void Main() {
        string[] words = { "fish", "ball", "hat", "trousers", "snowball", "tornado" };
        string[] categories = { "sport", "fashion", "weather" };
        using (WebClient client = new WebClient()) {
            foreach (string word in words) {
                // pick the category whose combined search reports the most hits
                var bestCategory = categories.OrderByDescending(
                    cat => Rank(client, word, cat)).First();
                Console.WriteLine("{0}: {1}", bestCategory, word);
            }
        }
    }

    // searches for "+word +category" and scrapes the reported hit count (0 if not found)
    static int Rank(WebClient client, string word, string category) {
        string s = client.DownloadString("http://www.google.com/search?q=%2B" +
            Uri.EscapeDataString(word) + "+%2B" +
            Uri.EscapeDataString(category));
        var match = Regex.Match(s, @"of about \<b\>([0-9,]+)\</b\>");
        int rank = match.Success ? int.Parse(match.Groups[1].Value, NumberStyles.Any) : 0;
        Debug.WriteLine(string.Format("\t{0} / {1} : {2}", word, category, rank));
        return rank;
    }
}
Maybe you are all making this too hard.
Obviously, you need an external reference of some sort to rank the probability that X is in category Y. Is it possible that he's testing your "out of the box" thinking and that YOU could be the external reference? That is, the algorithm is a simple matter of running through each category and each word and asking YOU (or whoever sits at the terminal) whether word X is in the displayed category Y. There are a few simple variations on this theme but they all involve blowing past the Gordian knot by simply cutting it.
Or not...depends on the teacher.
So it seems you have a couple options here, but for the most part I think if you want accurate data you are going to need to use some outside help. Two options that I can think of would be to make use of a dictionary search, or crowd sourcing.
In regards to a dictionary search, you could just go through the database, query it and parse the results to see if one of the category names is displayed on the page. For example, if you search "red" you will find "color" on the page and likewise, searching for "fishing" returns "sport" on the page.
Another, slightly more outside the box option would be to make use of crowd sourcing, consider the following:
Start by more or less randomly assigning name-value pairs.
Output the results.
Load the results up on Amazon Mechanical Turk (AMT) to get feedback from humans on how well the pairs work.
Input the results of the AMT evaluation back into the system along with the random assignments.
If everything was approved, then we are done.
Otherwise, retain the correct hits and process them to see if any pattern can be established, generate a new set of name-value pairs.
Return to step 3.
Granted, this would entail some financial outlay, but it might also be one of the simplest ways to get reasonably accurate data without much effort.
You could do a custom algorithm to work specifically on that data, for instance words ending in 'ing' are verbs (present participle) and could be sports.
Create a set of categorization rules like the one above and see how high an accuracy you get.
EDIT:
Steal the wikipedia database (it's free anyway) and get the list of articles under each of your ten categories. Count the occurrences of each of your 100 words in all the articles under each category, and the category with the highest 'keyword density' of that word (e.g. fishing) wins.
This sounds like you could use some sort of Bayesian classification as it is used in spam filtering. But this would still require "external data" in the form of some sort of text base that provides context.
Without that, the problem is impossible to solve. It's not an algorithm problem, it's an AI problem. But even AI (and natural intelligence as well, for that matter) needs some sort of input to learn from.
I suspect that the professor is giving you an impossible problem to make you understand at what different levels you can think about a problem.
The key question here is: who decides what a "correct" classification is? What is this decision based on? How could this decision be reproduced programmatically, and what input data would it need?
I am assuming that the problem allows using external data, because otherwise I cannot conceive of a way to deduce the meaning from words algorithmically.
Maybe something could be done with a thesaurus database, and looking for minimal distances between 'word' words and 'category' words?
Fire this teacher.
The only solution to this problem is to already have the solution to the problem, i.e. you need a table of keywords and categories to build your code that puts keywords into categories.
Unless, as you suggest, you add a system which "understands" English. This is the person sitting in front of the computer, or an expert system.
If you're building an expert system and don't even know it, the teacher is not good at giving problems.
Google is forbidden, but they have almost a perfect solution - Google Sets.
Because you need to understand the semantics of the words, you need external data sources. You could try using WordNet. Or you could maybe try using Wikipedia - find the page for every word (or maybe only for the categories) and look for other words appearing on the page or linked pages.
Yeah I'd go for the wordnet approach.
Check this tutorial on WordNet-based semantic similarity measurement. You can query Wordnet online at princeton.edu (google it) so it should be relatively easy to code a solution for your problem.
Hope this helps,
X.
Interesting problem. What you're looking at is word classification. While you can learn and use traditional information retrieval methods like LSA and categorization based on such - I'm not sure if that is your intent (if it is, then do so by all means! :)
Since you say you can use external data, I would suggest using wordnet and its link between words. For instance, using wordnet,
S: (n) fishing, sportfishing (the act of someone who fishes as a diversion)
    direct hypernym / inherited hypernym / sister term
        S: (n) outdoor sport, field sport (a sport that is played outdoors)
            direct hypernym / inherited hypernym / sister term
                S: (n) sport, athletics (an active diversion requiring physical exertion and competition)
What we see here is a list of relationships between words. The term fishing relates to outdoor sport, which relates to sport.
Now, if you get the drift - it is possible to use this relationship to compute a probability of classifying "fishing" to "sport" - say, based on the linear distance of the word-chain, or number of occurrences, et al. (It should be trivial to find resources on how to construct similarity measures using wordnet. When the prof says "not to use google", I assume he means programmatically and not as a means to get information to read up on!)
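A sketch of the "linear distance of the word-chain" idea, using a hand-written hypernym table instead of the real WordNet database. Only the fishing chain is taken from the excerpt above; the tornado entries are invented for illustration, and a real solution would load the links from WordNet.

```python
from collections import deque

# Toy hypernym links (word -> more general terms).
HYPERNYMS = {
    "fishing": ["outdoor sport"],
    "outdoor sport": ["sport"],
    "tornado": ["windstorm"],   # made-up entry for the example
    "windstorm": ["weather"],   # made-up entry for the example
}


def distance(word, category, hypernyms):
    """Number of hypernym links from word up to category, or None if unreachable."""
    queue = deque([(word, 0)])
    seen = {word}
    while queue:
        node, dist = queue.popleft()
        if node == category:
            return dist
        for parent in hypernyms.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None


def best_category(word, categories, hypernyms):
    """Category with the shortest chain from word, or None if none is reachable."""
    reachable = [(distance(word, c, hypernyms), c) for c in categories]
    reachable = [(d, c) for d, c in reachable if d is not None]
    return min(reachable)[1] if reachable else None
```

Here best_category("fishing", ["weather", "sport"], HYPERNYMS) follows fishing to outdoor sport to sport, a chain of two links, while "weather" is unreachable. Turning the distance into a probability (for instance by weighting shorter chains more heavily) is left open, as in the answer.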
As for C# with wordnet - how about http://opensource.ebswift.com/WordNet.Net/
My first thought would be to leverage external data. Write a program that google-searches each word, and takes the 'category' that appears first/highest in the search results :)
That might be considered cheating, though.
Well, you can't use Google, but you CAN use Yahoo, Ask, Bing, Ding, Dong, Kong...
I would do a few passes. First query the 100 words against 2-3 search engines, grab the first y resulting articles (y being a threshold to experiment with; 5 is a good start, I think) and scan the text. In particular I'll search for the 10 categories. If a category appears more than x times (x again being some threshold you need to experiment with), it's a match.
Based on that x threshold (i.e. how many times a category appears in the text) and how many of the top y pages it appears in, you can assign a weight to a word-category pair.
For better accuracy you can then do another pass with those non-google search engines with the word-category pair (with an AND relationship) and apply the number of resulting pages to the weight of that pair. Then simply assume the word-category pair with the highest weight is the right one (assuming you'll even have more than one option). You can also assign a word to multiple categories if the weights are close enough (z threshold maybe).
Based on that you can introduce any number of words and any number of categories. And you'll win your challenge.
I also think this method is good to evaluate the weight of potential adwords in advertising. but that's another topic....
Good luck
Harel
Use (either online, or download) WordNet, and find the number of relationships you have to follow between words and each category.
Use an existing categorized large data set such as RCV1 to train your system of choice. You could do worse than to start reading existing research and benchmarks.
Apart from Google there exist other "encyclopedic" datasets you can build on, some of them hosted as public data sets on Amazon Web Services, such as a complete snapshot of the English language Wikipedia.
Be creative. There is other data out there besides Google.
My attempt would be to use the toolset of CRM114 to provide a way to analyze a big corpus of text. Then you can utilize the matchings from it to give a guess.
My naive approach:
Create a huge text file like this (read the article for inspiration)
For every word, scan the text and whenever you match that word, count the 'categories' that appear within N positions (the maximum distance, aka radius) to the left and right of it.
The word is likely to belong in the category with the greatest counter.
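The counting step of this naive approach might look like the sketch below. The sample sentence is invented, tokenisation is a plain whitespace split, and category names are assumed to be single words; all of these are simplifications for the example.

```python
from collections import Counter


def categorize(word, categories, text, radius=5):
    """Count category names within radius tokens of each occurrence of word."""
    tokens = text.lower().split()
    category_set = set(categories)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != word:
            continue
        window = tokens[max(0, i - radius): i + radius + 1]
        counts.update(t for t in window if t in category_set)
    # The category seen most often near the word wins; None if none was seen.
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical corpus, made up for illustration.
text = ("fishing is a relaxing sport and in good weather "
        "many people enjoy fishing as a sport")
result = categorize("fishing", ["sport", "weather"], text, radius=4)
```

The radius is the main knob: too small and the category name is rarely inside the window, too large and every category co-occurs with every word.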
Scrape delicious.com and search for each word, looking at collective tag counts, etc.
Not much more I can say about that, but delicious is old, huge, incredibly-heavily tagged and contains a wealth of current relevant semantic information to draw from. It would be very easy to build a semantics database this way, using your word list as a basis from scraping.
The knowledge is in the tags.
As you don't need to attend the subject when you solve this 'riddle' it's not supposed to be easy I think.
Nevertheless I would do something like this (told in a very simplistic way)
Build up a neural network which you give some input (an (e)book, or several (e)books)
=> no google needed
This network classifies words (neural networks are great for 'unsure' classification). I think you may simply know which word belongs to which category because of the occurrences in the text ('fishing' is likely to be mentioned near 'sports').
After some training of the neural network it should "link" you the words to the categories.
You might be able to use the WordNet database, create some metric to determine how closely linked two words (the word and the category) are and then choose the best category to put the word in.
You could implement a learning algorithm to do this using a monte carlo method and human feedback. Have the system randomly categorize words, then ask you to vote them as "match" or "not match." If it matches, the word is categorized and can be eliminated. If not, the system excludes it from that category in future iterations since it knows it doesn't belong there. This will get very accurate results.
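A minimal sketch of that guess-and-vote loop. The human vote is modelled as a callback, is_match(word, category), which is an assumption of this example; in practice it would prompt whoever sits at the terminal.

```python
import random


def categorize_with_feedback(words, categories, is_match, seed=None):
    """Randomly guess categories, dropping every guess the voter rejects."""
    rng = random.Random(seed)
    candidates = {w: list(categories) for w in words}
    assigned = {}
    while candidates:
        for word in list(candidates):
            guess = rng.choice(candidates[word])
            if is_match(word, guess):
                assigned[word] = guess          # categorized, no longer considered
                del candidates[word]
            else:
                candidates[word].remove(guess)  # never propose this pair again
                if not candidates[word]:
                    assigned[word] = None       # rejected everywhere
                    del candidates[word]
    return assigned


# Stand-in for the human voter, made up for illustration.
truth = {"fishing": "sport", "tornado": "weather", "trousers": "fashion"}
result = categorize_with_feedback(
    truth, ["sport", "weather", "fashion"],
    is_match=lambda w, c: truth[w] == c, seed=42)
```

Each word needs at most as many votes as there are categories, so 100 words and 10 categories cost at most 1,000 votes, which is why the approach stops scaling once the sample size grows.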
This will work for the 100 word problem fairly easily. For the larger problem, you could combine this with educated guessing to make the process work faster. Here, as many people above have mentioned, you will need external sources. The google method would probably work the best, since google's already done a ton of work on it, but barring that you could, for example, pull data from your facebook account using the facebook apis and try to figure out which words are statistically more likely to appear with previously categorized words.
Either way, though, this cannot be done without some kind of external input that at some point came from a human. Unless you want to be cheeky and, for example, define the categories by some serialized value contained in the ascii text for the name :P

Which direction should the arrows point in a sorted table? [closed]

In a sorted table, it's common to have an up or a down arrow indicating the sort style. However, I'm having some trouble determining which direction the arrow should point. In an ASC sort, characters are sorted 1-9A-Za-z. Should the arrow point up or down?
I've found implementations of both on the web, so that didn't help me much: Up and Down (you have to create the table first).
Is there a hard and fast rule for this? I find myself able to justify both implementations. Which method do you use? Which is more intuitive to you and why?
Edit: Some of you have suggested alternate implementations like rising bars or having letters with an arrow indicating sort direction. Great suggestions. I'm definitely open to other options. The less ambiguous, the better. It might be picky, but I'd really like there to be minimal or no confusion on the part of the user.
Edit: I ended up going with the rising and falling bars for now. It's not standard, but seems less ambiguous than the triangles. The current sort column shows three bars, small to large (left to right) for ASC, the opposite for DESC. Other sortable columns have no bars by default, but hovering over any sortable column heading (including the current) shows bars depicting how the table will be sorted if that column heading is clicked.
I don't think of them as arrows, but as a visual mnemonic of the current state. So, showing a triangle pointing down shows descending order. It is visually in line with the icon with the largest item (base of the triangle) at the top of the list and the smallest (point of the triangle) at the bottom.
I've always gone with the following:
Ascending -- Arrow pointing up
Descending -- Arrow pointing down
In my opinion, the visual representation of the arrow pointing up/down most accurately explains the sort order.
I’ve done usability tests on this. There does not appear to be a consistent interpretation among users on what the arrows mean. I seem to recall that even each user was not consistent, thinking the arrow down meant ascending in one case and descending in another. I tried arrows to left and right (“forward” versus “backwards” sort), but they failed to be interpreted consistently too. I tried showing current state and showing the state that would result. Neither worked.
What did work was a schematic text depiction of the sort order: “A..Z” and “Z..A” for alpha, “1..9” and “9..1” for numeric, “1..12” and “12..1” for dates (the usability test used mm/dd/yy date format).
Show this text as read-only indicating the current state. Place a button beside the text to set or swap the sort order.
Didn’t try the rising/falling bar icon, but I expect it can run into difficulties where “bigger” is ambiguous. For example, is an older date in the past bigger (longer ago) or smaller (closer to Time 0) than a more recent date? Is Priority 1 bigger or smaller than Priority 2? Grade A bigger or smaller than Grade B? For that matter, who, other than geeks, thinks that “Zuschlag” is vastly bigger than “Abbott”? Not that I’m taking this personally, of course.
For some reason I feel it is always backwards. For me the down pointing arrow/triangle should represent the way I usually read things (from top to bottom -> from a to z) and the up pointing arrow is backwards from the way I read things (from z to a). But that's just me; since most popular UIs (Mac, Windows, etc.) use it the other way, they must know something :).
In any case consistency with what the user is used to is a good option.
My favorite is actually the way that e.g. Excel handles it -- don't use an arrow, but rather a custom icon with
A |
Z V
for ascending sort and
Z |
A V
for descending sort. Nobody will ever wonder which way you're sorting.
Now, if you can't use a custom icon but rather need a printable character, I'd say people are about as likely to be confused by either one. Windows uses the "small part of arrow corresponds to smaller value" for Explorer, which is to say that ascending sort points up. But plenty of other sources assume that the base of the arrow starts at the lowest value and points in the direction of the sort, which frankly makes as much sense as anything else. In other words, half your users will probably have to adjust either way.
Ascending : Arrow pointing up
Descending : Arrow pointing down
Tricks to remember:
Alphabets:
A scending i.e. A B C D
D escending i.e. D C B A
Numbers:
A=1,B=2...Z=26.
Ascending A B C D so 1 2 3 4 i.e. small to large
Descending D C B A so 4 3 2 1 i.e. large to small
Date:
A date actually converts to a number that increases by one each day, so it works like the number system again: today is bigger than yesterday, and today is smaller than tomorrow.
I like:
arrow pointing down for ascending order
arrow pointing up for descending order
Why? Because it feels like I just sorted the page. I clicked on the heading and it was "Wow! Sorted top-to-bottom". The reason "top-to-bottom" is called ascending is that the numbers/letters get higher in value as the computer writes to the screen; the opposite holds for descending. However, the list is actually descending down the screen from the top to the bottom, a to z. When you order it the other way, the beginning of the list is at the bottom of the screen.
So to the physical human mental logic - the kind that means clockwise is close and counter-clockwise is open, it makes sense to ignore how the machine sorts and outputs the data, and think rather about how a human might sort data: start at the beginning (smallest values) and at the top of the paper, then advance through to the end (largest value) on down the paper.
The reason the beginning is smallest is that 1 comes before 2, and the Roman alphabet starts with A and ends with Z. So this is sort of the default for us humans at this point in time. We write top to bottom and left to right. It has to do with handedness and the way we hold paper, I think. I'm not actually a human interface specialist. I just thought about why it seems more natural. The KDE guys are human interface specialists. Take a look at how Oxygen is done.
The other way I think is alright is a triangle that is actually showing that the data is smallest to largest. Again, this is rather technical and at first glance, the human might not "get it".
In the classic Finder, Apple didn't use arrows. Instead, there was a small icon that looked like three (or four?) horizontal lines of increasing or decreasing length. At first sight, it was like a triangle; but when looking at it, it was clear whether it was getting bigger or smaller.
Other GUIs (KDE, for example) use triangles, but most people interpret them as arrows, making the message ambiguous.
An arrow pointing up usually means larger or getting larger, so that should be used for Ascending order.
An arrow pointing downwards usually means something is smaller or getting smaller and it should be used for Descending order.
I expect the arrows to show the current state (pointing up when the list is currently ascending). This is what Windows Explorer does in Details View.
The other thing that you need to consider is whether the arrow represents the current sort direction or the sort direction that will be applied if you click on the arrow. (Not always obvious from the contents of the table as there can be arrows on every column)
Sorry to add to the confusion, but you need to consider this.
Clarification on this front can be partially achieved by adding a suitably worded tooltip to the arrow.
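As a rough illustration of a "suitably worded" tooltip, here is a small Python sketch. The function name `sort_tooltip`, the state strings `"asc"`/`"desc"` and the wording are all my own assumptions, not taken from any particular toolkit. The idea is that the text states both the current sort and what the next click will do, so the arrow no longer has to carry that meaning alone.

```python
from typing import Optional


def sort_tooltip(column: str, current: Optional[str]) -> str:
    """Describe the current sort state of a column and the effect of a click.

    `current` is "asc", "desc", or None when the column is not sorted
    (hypothetical state names chosen for this sketch).
    """
    if current == "asc":
        return f"{column}: sorted ascending. Click to sort descending."
    if current == "desc":
        return f"{column}: sorted descending. Click to sort ascending."
    return f"{column}: not sorted. Click to sort ascending."
```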
"Is there a hard and fast rule for this?" - Apparently not, since you found examples of both.
For general consistency, I'd say that the arrow should point up for ascending, down for descending. This is consistent with Windows (click a column header in Explorer) and Office (filter/sort a column in Excel).
The best place to check, in my opinion, would be large corporate websites like amazon, dabs etc., as these will be what users are most used to.
I think everybody agrees that the direction of the arrow requires the users to think about its meaning. What does a down arrow mean? A-Z or Z-A? What does it mean when sorting dates? It is clear only when the user looks at the actual sorted data.
For this reason, I find it best not to use any indication of direction. It is enough to indicate the fact that rows are sorted by a certain column. It is important to find out which way the users usually expect the rows to be sorted for the first click. The second click reverses the order, the third click turns sorting off.
I have implemented this several times and users have no problem sorting the rows by different columns the way they like.
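The click cycle described above (first click sorts the expected way, second reverses, third turns sorting off) could be sketched like this in Python. The names `next_sort_state` and `apply_sort` and the `"asc"`/`"desc"`/`None` states are assumptions for the example, not from a specific framework.

```python
from typing import Optional


def next_sort_state(current: Optional[str], first: str = "asc") -> Optional[str]:
    """Return the state after one click: first -> reversed -> off (None).

    `first` is the direction users expect on the first click for this column.
    """
    reverse = "desc" if first == "asc" else "asc"
    if current is None:
        return first       # first click: the expected direction
    if current == first:
        return reverse     # second click: reverse the order
    return None            # third click: sorting off


def apply_sort(rows, key, state):
    """Return rows in the order implied by `state`; None keeps the original order."""
    if state is None:
        return list(rows)
    return sorted(rows, key=key, reverse=(state == "desc"))
```

Passing `first="desc"` covers columns where users expect the largest or newest values on the first click.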
This is a highly opinionated question. But here is a logical solution and the reason behind why I chose it :
Solution:
If using an arrow to display sort order in a Table Column, it will be better to use a Down arrow for ascending and vice versa.
Reasons:
If we are referring to a picture or a graph, where visual and value-based traversal are both in the same direction, using "Ascending" or "Descending" will serve its purpose as intended. But when it comes to tables, the main source of confusion is that the values are traversed to higher values (upward, but only conceptually) while the direction of visual traversal is downward. And since an arrow is a visual clue (direction), it might be easier for someone to understand the directional traversal with it.
For many people the concepts of ascending and descending are understandable in terms of values. But in certain cases, the users of that table might not be aware of these concepts. For example, someone who has never been to school, a primary school kid, or someone completely new to the digital world. For them, a directional concept will be easier to understand: a-z, upward or downward; 1-9, upward or downward. It is to be noted that educated or experienced users can understand it either way (the 4th reason).
Next reason is that whether we are masters in digital tables or not, we have always written lists and tables. And in almost all the cases, we write it in ascending order in downward directions. So, it is somewhat hardwired to our brain.
Finally, the confusion on this issue has always existed, and a universal method is always better. To this day, I always analyse how the values are sorted to see if they are in ascending or descending order. The directional arrows have never served their purpose since they are not reliable. For educated or experienced users, knowing the order immediately will not be a problem. When we create a universal standard though, we must see to it that every probable user would be 'able' to understand it...
How I use it:
I use a tailed arrow for numbers, alphabets or any other values increasing progressively while traversing in that direction.
Since the values are increasing (by default), user can call it Ascending if he wants to, but the arrow is downwards. This also helps me in sorting 'word sets' (for example, the status of a record in the table. It may not be sorted in alphanumeric order but in order of status progression).
Hope this helped.
Remember that descending is for down. So, I would use the down arrow for descending. But, I always get confused by this anyway. I recommend that you use letters instead, like A-Z and Z-A instead of the arrows. Or, use them in conjunction with the arrows.
There is no hard and fast rule, but the best approach is to reduce the apparent complexity for the user, using the best mapping of down and up arrows to the terminology "Ascending" and "Descending".
Note that most non-numeric concepts don't have a strong natural mapping to "up" or "down".
Do letters/words go "up" or "down" in the dictionary? How about dates and times? Where it is ambiguous, I believe there is no "right" answer. I recommend allowing "Descending" to represent the more useful sort order for consistency, as the user is likely to think about the table as being "down" as they move their eyes down. I am leaving aside digital representations, since the users of most applications do not know or care about the internal representation. Therefore alphabet sorting could be "Descending" for A-Z, and time could be "Descending" with most recent first. The good news is, as long as the first click gives the user the sorting behavior that they expect, and the second reverses it, they mostly won't care which of the two modes is used.
That challenge about the correct sort to do "first" (on showing the content, or sorting on a new column) is the more important implication of the question. The type AND intended usage of the information determine how it should sort. Alphabetic should always default to A-Z, "descending" by my above logic. Numbers vary much more by use: numbers that represent sequential identifiers, say, an employee ID, would be ascending (1-10), while sorting on quantities or price would usually be descending, to bring the largest values to the top. Time also varies: most recent first ("Descending") usually works, but in some contexts the oldest should be listed first.
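One way to encode the "first click" defaults above is a small lookup from column kind to direction. This is only a sketch in Python: the kind names (`"text"`, `"identifier"`, `"quantity"`, `"time"`) and the function `first_click_sort` are hypothetical, and the table holds the usual defaults, which a given context may need to override (for example, oldest first). To sidestep the ascending/descending naming question, it stores only whether the largest value goes on top.

```python
# Whether the first click should put the largest value at the top.
# Kind names are hypothetical labels for this sketch.
LARGEST_FIRST = {
    "text": False,        # A-Z
    "identifier": False,  # sequential IDs such as employee ID: 1-10
    "quantity": True,     # quantities or prices: largest values at the top
    "time": True,         # most recent first (the usual, not universal, default)
}


def first_click_sort(rows, key, kind):
    """Sort rows the way a first click on a column of this kind would."""
    return sorted(rows, key=key, reverse=LARGEST_FIRST[kind])
```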

Resources