Nested sorting in dimension hierarchies (Tableau) - sorting

I am working on a visualization in Tableau that has a dimension hierarchy (product category, product sub-category, product type, etc.) sorted descending by number of orders. I want my viz to show only the first product level (product category) by default, sorted the same way, but give an option to drill down (using "+" on the dimension) to the detailed product levels with nested sorting (again, descending by number of orders).
(Image: Superstore data sample)
I tried using the nested sorting option for each product level, but when I drill up and then down again, the sorting is wrong, as if it has been cleared. I cannot find an option to keep the sorts fixed unless I keep all product levels visible in the viz (without the drill-down option).
Does anyone know how I can do this? I also tried different indexing and ranking calculations, but nothing seems to work. I know one option is to combine the hierarchy dimensions and sort on the combined field, but that makes the viz really untidy.
Thanks in advance!

Tableau will always sort based on the leftmost column. With the newer nested sorting you can more easily do a secondary sort. However, when you expand or collapse hierarchies as you describe, the sort might not be retained.
The "classic" way to do this is to create a rank by number of orders (it sounds like you were close on this one): RANK(COUNT([Order ID]), 'desc'). Make this a discrete measure and put it to the left of all the other dimensions.
To clean it up, you can uncheck "Show Header" on the rank pill.
If you then expand or collapse the hierarchy, the sorting is kept.
EDIT: Here is another way to try to accomplish this. It seems to work for 3 levels but starts to break down after that. (It also didn't seem to work well on grouped dimensions.)
Expand the hierarchy to all three levels.
On each dimension, enforce a sort order of Count of Order ID Descending.
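Outside Tableau, the ordering that both approaches aim for can be illustrated with a small Python sketch. This is not Tableau code; the rows and the `nested_sort` helper are made up for illustration and only show what "nested, descending by number of orders" means: categories ordered by their total, and sub-categories ordered within each category.

```python
from collections import defaultdict

# Hypothetical (category, sub_category, order_count) rows, not real Superstore figures.
rows = [
    ("Furniture", "Chairs", 40),
    ("Furniture", "Tables", 25),
    ("Technology", "Phones", 60),
    ("Technology", "Copiers", 10),
    ("Office Supplies", "Paper", 90),
    ("Office Supplies", "Binders", 30),
]

def nested_sort(rows):
    """Order categories by total orders (descending), then sub-categories within each."""
    totals = defaultdict(int)
    for category, _, count in rows:
        totals[category] += count
    # The category name is a tie-breaker so rows of one category stay together
    # even when two categories have the same total.
    return sorted(rows, key=lambda r: (-totals[r[0]], r[0], -r[2]))

ordered = nested_sort(rows)
for category, sub_category, count in ordered:
    print(category, sub_category, count)
```

The category total plays the same role as the rank pill placed to the left of the dimensions: it is the outermost sort key, so the inner levels can only reorder rows inside their own category.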

Related

In SPSS Modeler (17/18), what are the criteria for evaluating ties encountered while sorting a particular column using the sorting nugget?

I am sorting by a particular column using the sorting nugget in SPSS Modeler 17/18. However, I do not understand how ties are evaluated when values are repeated in the sorting column. None of the other columns seems to have any sequence associated with it. Can someone shed some light on this?
I have attached an illustration in which I am sorting on col3 (the Excel file is the original data). After sorting, however, none of the other columns (e.g. Key) seems to follow any sequence or order. How was the final data arrived at, then?
I have not been able to find any documentation to answer this question, but I believe that the order of ties after the sort is essentially random or at least determined by a number of factors that are outside of the user's control. Generally, I think it is determined by the order of the records in the source, but if you are querying a database or similar without specifying a sort order, you may see that the data will be sorted differently depending on the source system and it may even differ between each execution.
If your processing depends on the sort order of the data (including the order of the ties), the best approach is to specify the sort order in enough detail that ties cannot happen.
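To illustrate the idea of removing ties, here is a small Python sketch (not SPSS Modeler code). The records are hypothetical and the column names just mirror the question. Python's `sorted` happens to be stable, so ties keep their input order; that says nothing about what Modeler does, which is why adding a second sort field is the safer route.

```python
# Hypothetical (key, col3) records; names mirror the question, values are made up.
records = [("k3", 10), ("k1", 20), ("k2", 10), ("k4", 20)]

# Sorting on col3 alone: tied rows simply keep whatever order the input had.
by_col3 = sorted(records, key=lambda r: r[1])

# Sorting on col3, then key: every row now has a unique sort position,
# so the result no longer depends on the input order.
by_col3_then_key = sorted(records, key=lambda r: (r[1], r[0]))

print(by_col3)
print(by_col3_then_key)
```

In Modeler terms this would correspond to adding a second field to the sort, assuming a field such as Key is unique per record.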

Need Opinions / Perspectives - Pagination approach for large data set

We have a requirement where we need to show a lot of data in multiple grids & also provide the option to sort at the UI side. There are 2 approaches:
Load everything in UI & have UI side pagination & sorting.
Load server-side paginated data in the UI, and if the user sorts on a different column, call the API again to re-sort the data on that column and send the results back, again paginated.
The general feeling is that with approach 1, the UI would be unnecessarily loaded with extreme volumes of data (like 10k records across grids, ranging from 1-2 MB) and might have performance issues, not to mention the servers serving those requests for a large user group of close to a million. With approach 2, every time the user clicks to sort, there is an API call and server resources are wasted re-sorting the huge data set (when the user only cares to see a few tens of records).
What is the best way to handle this kind of scenario?
Are there any industry-standard practices we can refer to?
How do we quantify the UI performance?
There's a third approach:
The server has a different index for each possible sort order. When new data is added, it's inserted in the right place in each index. The UI for each user asks the server for "entries N*K to (N+1)*K" of the index that corresponds to whichever sort order the user selected. There is no sorting. There is no need to load everything into each UI.
Note 1: You can probably cheat a little - e.g. if you have an index for "sorted alphabetically in ascending order" then you can use the same index for "sorted alphabetically in descending order". In this way you might only need 4 indexes for 8 possible sort orders.
Note 2: You can probably cheat more. Rather than having one index for each sort order, you can split the data into "buckets" and have an index for each bucket for each sort order. E.g. instead of one index for "sorted alphabetically in ascending order" you could have one index for "starts with A", another index for "starts with B", ... In the same way, instead of one index for "sorted chronologically" you could have one index for this year, one index for last year, ... This helps to reduce insertion costs (when new data is added), and could allow you to improve the UI (e.g. little "skip to bucket" buttons users can use).
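A minimal in-memory sketch of the one-index-per-sort-order idea, in Python. The `SortedIndexes` class, its method names and the record layout are invented for illustration; a real server would most likely rely on database indexes rather than Python lists. It also shows the trick from Note 1 of serving the descending order from the ascending index.

```python
import bisect

class SortedIndexes:
    """Keeps one pre-sorted list of (value, id) pairs per sortable column."""

    def __init__(self, columns):
        self.columns = columns
        self.indexes = {c: [] for c in columns}

    def add(self, record):
        # Insert into each index at its sorted position; nothing is re-sorted.
        for c in self.columns:
            bisect.insort(self.indexes[c], (record[c], record["id"]))

    def page(self, column, n, k, descending=False):
        """Return the ids of entries n*k to (n+1)*k in the chosen sort order."""
        index = self.indexes[column]
        if descending:
            # Reuse the ascending index by reading it from the end.
            start = max(len(index) - (n + 1) * k, 0)
            end = max(len(index) - n * k, 0)
            return [rid for _, rid in reversed(index[start:end])]
        return [rid for _, rid in index[n * k:(n + 1) * k]]

store = SortedIndexes(["name", "year"])
for rec in [
    {"id": 1, "name": "delta", "year": 2020},
    {"id": 2, "name": "alpha", "year": 2023},
    {"id": 3, "name": "charlie", "year": 2021},
    {"id": 4, "name": "bravo", "year": 2022},
    {"id": 5, "name": "echo", "year": 2019},
]:
    store.add(rec)

print(store.page("name", 0, 2))                   # first page, by name ascending
print(store.page("year", 0, 2, descending=True))  # first page, newest first
```

Each page request is just a slice of an already ordered list, so the cost is paid once at insertion time instead of on every click.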
Are there any industry-standard practices we can refer to?
The industry standard practice depends on which industry. Far too many things are shifting to "web apps", where the industry standard practice is to get incompetent developers working for below minimum wage to slap together a piece of trash using extremely inefficient frameworks.
How do we quantify the UI performance?
I'd use response times (the time it takes for the app to start and show the user data, the time it takes to show data after scrolling/moving to a different page, the time between the user clicking on a different "sort order" button and the screen showing the data in the new sort order, etc).

Using scoring to find customers

I have a site where customers purchase items that are tagged with a variety of taxonomy terms. I want to create a group of customers who might be interested in the same items by considering the tags associated with purchases they've made. Rather than comparing a list of tags for each customer each time I want to build the group, I'm wondering if I can use some type of scoring to solve the problem.
The way I'm thinking about it, each tag would have some unique number assigned to it. When I perform a scoring operation it would render a number that could only be achieved by combining a specific set of tags.
I could update a customer's "score" periodically so that it remains relevant.
Am I on the right track? Any ideas?
Your description of the problem looks much more like a clustering or recommendation problem. I am not sure whether those tags are enough information for clustering or recommendation, though.
Your idea of a score doesn't look promising to me, because the same sum could be reached in several ways if the numbers aren't chosen carefully enough.
What I would suggest:
Store the tags for each user. When a user purchases a new item, add the item's tags to the user's tags. Periodically, update the user profiles. Say we have users A and B: if at update time the similarity between A and B is greater than some threshold, add a relation between them indicating that the two users are similar; if it is lower, remove the relation (if they were previously related). The similarity could be either the number of common tags or the ratio num_common_tags / num_of_tags_assigned_either_in_A_or_B.
Later on, when you want to get users with a particular set of tags, just run a query that checks which users have that set of tags. You can also find users similar to a given user by looking up which users are linked to that user.
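The ratio mentioned above (common tags divided by tags assigned to either user) is the Jaccard similarity. A minimal Python sketch of the periodic update, assuming tags are stored as one set per user; the function names, the sample users and the 0.5 threshold are made up for illustration.

```python
from itertools import combinations

def jaccard(tags_a, tags_b):
    """num_common_tags / num_of_tags_assigned_either_in_A_or_B."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)

def similar_pairs(user_tags, threshold):
    """Return the set of user pairs whose similarity is greater than the threshold."""
    pairs = set()
    for a, b in combinations(sorted(user_tags), 2):
        if jaccard(user_tags[a], user_tags[b]) > threshold:
            pairs.add((a, b))
    return pairs

# Hypothetical users and tags.
user_tags = {
    "A": {"red", "blue", "green"},
    "B": {"red", "blue"},
    "C": {"yellow"},
}
relations = similar_pairs(user_tags, threshold=0.5)
print(relations)
```

Comparing every pair is quadratic in the number of users, so for a large customer base this would probably need to run as a batch job or be restricted to users who share at least one tag.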
If you assign a unique power of two to each tag, then you can sum the values corresponding to the tags, and users with the exact same sets of tags will get identical values.
red = 1
green = 2
blue = 4
yellow = 8
For example, only customers who have the set of { red, blue } will have a value of 5.
This is essentially using a bitmap to represent a set. The drawback is that if you have many tags, you'll quickly run out of integers. For example, if your (unsigned) integer type is four bytes, you'd be limited to 32 tags. There are libraries and classes that let you represent much larger bitsets, but, at that point, it's probably worth considering other approaches.
Another problem with this approach is that it doesn't help you cluster members that are similar but not identical.
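A short Python sketch of the power-of-two encoding, using the four tag values listed above. The `encode` and `has_all` helpers are illustrative names. Python integers are arbitrary precision, so the 32-tag limit does not apply there, but it still does for fixed-width columns in most databases.

```python
# Tag values as listed above: each tag owns exactly one bit.
TAG_BITS = {"red": 1, "green": 2, "blue": 4, "yellow": 8}

def encode(tags):
    """Combine a set of tags into a single integer bitmap."""
    value = 0
    for tag in tags:
        value |= TAG_BITS[tag]
    return value

def has_all(value, required_tags):
    """True if the bitmap contains every tag in required_tags."""
    required = encode(required_tags)
    return value & required == required

customer = encode({"red", "blue"})
print(customer)                             # 5
print(has_all(customer, {"red"}))           # True
print(has_all(customer, {"red", "green"}))  # False
```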

Sorting a list based on multiple indices and weights

This is a rather long-winded explanation of what I'm looking at, so I apologize in advance.
Let's consider a Recipe:
Take the bacon and weave it ...blahblahblah...
This recipe has 3 Tags
author (most important) - Chandler Bing
category (medium importance) - Meat recipe (out of meat/vegan/raw/etc categories)
subcategory (lowest importance) - Fast food (out of fast food / haute cuisine etc)
I am a new user that sees a list of randomly sorted recipes (my palate/profile isn't formed yet). I start interacting with different recipes (reading them, saving them, sharing them) and each interaction adds to my profile (each time I read a recipe a point gets added to the respective category/author/subcategory). After a while my profile starts to look something like this :
Chandler Bing - 100 points
Gordon Ramsay - 49 points
Haute cuisine - 12 points
Fast food - 35 points
... and so on
Now, the point of this whole exercise is to sort the recipe list based on the individual user's preferences. For example, in this case I will always see Chandler Bing's recipes at the top (regardless of category), then Ramsay's recipes. At the same time, Bing's recipes will be sorted based on my preferred categories and subcategories, so I see his fast food recipes higher than his haute cuisine ones.
What am I looking at here in terms of a sorting algorithm?
I hope that my question has enough information but if there's anything unclear please let me know and I'll try to add to it.
I would allow the "Tags" with the most importance to have the greatest capacity in point difference. Example: Give author a starting value of 50 points, with a range of 0-100 points. Give Category a starting value of 25 points, with a possible range of 0-50 points, give subcategory a starting value of 12.5 points, with a possible range of 0-25 points. That way, if the user's palate changes over time, s/he will only have to work down from the maximum, or work up from the minimum.
From there, you can simply add up the points for each "Tag", and use one of many languages' sort() methods to compare each recipe.
You can write a comparison function that is used in your sort(). The point is when you're comparing two recipes just add up the points respectively based on their tags and do a simple comparison. That and whatever sorting algorithm you choose should do just fine.
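A minimal Python sketch of the "add up the points and compare" idea, using a key function instead of a two-argument comparator. The recipes, the profile values and the `score` helper are hypothetical.

```python
# Hypothetical user profile: points per author / category / subcategory.
profile = {
    "Chandler Bing": 100,
    "Gordon Ramsay": 49,
    "Meat": 20,
    "Fast food": 35,
    "Haute cuisine": 12,
}

# Hypothetical recipes, each carrying the three tags from the question.
recipes = [
    {"title": "Beef Wellington", "author": "Gordon Ramsay", "category": "Meat", "subcategory": "Haute cuisine"},
    {"title": "Bacon weave", "author": "Chandler Bing", "category": "Meat", "subcategory": "Fast food"},
    {"title": "Trifle", "author": "Chandler Bing", "category": "Dessert", "subcategory": "Haute cuisine"},
]

def score(recipe, profile):
    """Sum the user's points for each of the recipe's tags (unknown tags count as 0)."""
    return sum(profile.get(recipe[tag], 0) for tag in ("author", "category", "subcategory"))

ranked = sorted(recipes, key=lambda r: score(r, profile), reverse=True)
for r in ranked:
    print(r["title"], score(r, profile))
```

Note that a plain sum only keeps a favourite author strictly on top while the author points outweigh everything else combined; if that guarantee matters, the tags would need separate ranges or a tuple key such as (author points, category points, subcategory points).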
You can use a recursively subdividing MSD (most-significant-digit-first) sort, a kind of radix sort. It works as follows:
Take the most significant category of each recipe.
Sort the list of elements based on that category, grouping elements with the same category into one bucket (Ramsay bucket, Bing bucket etc).
Recursively sort each bucket, starting with the next category of importance (Meat bucket etc).
Concatenate the buckets together in order.
Complexity: O(kn), where k is the number of category types and n is the number of recipes.
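The steps above can be sketched in Python roughly as follows. The `msd_sort` function, the recipes and the profile values are made up for illustration; buckets are ordered by the user's points for each tag value, which is an assumption about how "in order" should be read here.

```python
from collections import defaultdict

def msd_sort(recipes, tag_order, profile):
    """Recursively bucket recipes by tag, most important tag first."""
    if not tag_order or len(recipes) <= 1:
        return list(recipes)
    tag, rest = tag_order[0], tag_order[1:]
    # Group recipes that share the same value for the current tag.
    buckets = defaultdict(list)
    for recipe in recipes:
        buckets[recipe[tag]].append(recipe)
    # Order the buckets by the user's points for that value, highest first.
    ordered_values = sorted(buckets, key=lambda v: profile.get(v, 0), reverse=True)
    result = []
    for value in ordered_values:
        # Sort inside each bucket on the next tag, then concatenate.
        result.extend(msd_sort(buckets[value], rest, profile))
    return result

# Hypothetical data.
profile = {"Chandler Bing": 100, "Gordon Ramsay": 49, "Meat": 30, "Vegan": 5,
           "Fast food": 35, "Haute cuisine": 12}
recipes = [
    {"title": "A", "author": "Gordon Ramsay", "category": "Meat", "subcategory": "Haute cuisine"},
    {"title": "B", "author": "Chandler Bing", "category": "Meat", "subcategory": "Haute cuisine"},
    {"title": "C", "author": "Chandler Bing", "category": "Meat", "subcategory": "Fast food"},
    {"title": "D", "author": "Gordon Ramsay", "category": "Vegan", "subcategory": "Fast food"},
]
ordered = msd_sort(recipes, ["author", "category", "subcategory"], profile)
print([r["title"] for r in ordered])
```

Unlike a summed score, this keeps every recipe by the preferred author ahead of all others, whatever the lower-priority tags say. Ordering the buckets adds a small sorting cost per level on top of the bucketing itself, which stays negligible while the number of distinct tag values is small.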
I think what you're looking for is not a sorting algorithm, but a rating scheme.
You say you want to sort by preferences. Let's assume these preferences have different “dimensions”, like level of complexity, type of cuisine, etc.
These dimensions have different levels of measurement. These can be e.g. numeric or simple categories/tags. It would be your job to:
Create a scheme of dimensions and scales that can represent a user's preferences.
Operationalize real-world data to fit into this scheme.
Create a profile for the users which reflects their preferences. Same for the chefs; treat them just like normal users here.
To actually match a user to a chef (or even to another user), create a sorting callback that compares all your dimensions against each other and checks that in each dimension the compared users have a similar value (on a numeric scale) or an overlapping set of properties (on a nominal scale, like tags). Then you sort the results by the best match.

Sorting Series Data on an RDLC Bar Chart

I have a list of objects that are sorted by a percentage value in ascending order. Then when I right-click on the bars and under the series data value field, I enter the following expression:
=FormatNumber(Avg(Fields!Percentage.Value),2)
Attached is an image of the chart. I would like to sort the bars in ascending order for each Milestone Phase. How can I do this? It appeared to be working before but I'm not quite sure what happened that caused it to not work anymore. Perhaps before I only had one value per contact and the average was the percentage since there was only one percentage to average out. Not quite sure but I can't seem to find out how to fix this.
I've selected the Series Group Properties, set the Group Expression to Group on ContactName and then changed the Sorting expression to =Avg(Fields!Percentage.Value) but this does not sort the bars in any order whatsoever so I'm not quite sure what to do next.
