I am looking to sort the following table in Tableau by risk rating and date. The risk rating can be sorted Low-Medium-High or the other way round, and the date from earliest to latest or the other way round. I understand that Tableau does some kind of nesting from left to right such that it only sorts the leftmost column. How can I overcome that so that a user can sort with as little effort as possible (perhaps in the form of a button)?
I was creating some analyses on revenue for past years. One thing I noticed is that the revenue measures for each month are the same for the corresponding months of every year. That is, revenue for April 2015 is the same as revenue for April 2016.
I did some searching to solve this problem. I found that our measure column 'Revenue' is aggregated based on the time dimension as 'Last(sum(revenue))'. So the actual revenue value of April 2019 is treated by OBIEE as the last one and copied to the April revenue of the other years.
I can understand that the keyword 'last' may be the reason for this, but shouldn't the year, quarter and month columns choose exactly those numbers that correspond to that date? Can someone explain how this works and suggest solutions, please?
Very simply put: The "LAST" is the reason. It doesn't "copy" the value though. It aggregates the values to the last existing value along the dimensional hierarchy specified.
The question is: What SHOULD that Saldo show? What is the real business rule?
Also, lastly: technical column names and ALL UPPER CASE COLUMN NAMES shouldn't be used in the BMM layer. The names should be user-focused, readable and pretty. Otherwise everybody has to go and change them 50 times over and over in the front-end.
It's been a year since I posted this question, but a fix for this incorrect representation of data was added today. In the previous version of the rpd, we used an alternative solution: we created two measure columns for saldo (saldo_year and saldo_month), set their levels at year and month respectively, and used them both in an analysis. This was a temporary solution until we built the second version of our rpd, since we realized that the structure of the old one wasn't completely correct and it was easier and less time-consuming to create a new one from the ground up than to fix the old one.
So, as @Chris mentioned, it was all about a correct time dimension and hierarchies. We thought we had created it with all requirements met, but recently we got the same problem in our analyses. Then we figured out that we hadn't set the id columns as primary keys in the month and quarter logical levels. After fixing that, we got the data we wanted. If anybody faces this kind of problem, the first thing to check in the rpd is how the time dimension and hierarchy are defined, and how the logical levels, primary keys and chronological keys are set in the hierarchy.
I am sorting by a particular column using the sorting nugget in SPSS Modeler 17/18. However, I do not understand how ties are evaluated when values are repeated in the sorting column. None of the other columns have any sequence associated with them. Can someone shed some light on this?
I have attached an illustration here where I am sorting on col3 (the Excel file is the original data). However, after sorting, none of the other columns (Key) seem to follow any sequence/order. How was the final data arrived at, then?
I have not been able to find any documentation to answer this question, but I believe that the order of ties after the sort is essentially random or at least determined by a number of factors that are outside of the user's control. Generally, I think it is determined by the order of the records in the source, but if you are querying a database or similar without specifying a sort order, you may see that the data will be sorted differently depending on the source system and it may even differ between each execution.
If your processing depends on the sort order of the data (including the order of the ties), the best approach is to specify the sort order in enough detail that ties cannot happen, for example by adding a unique column as the last sort key.
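To illustrate the point about ties, here is a minimal sketch in Python rather than in SPSS Modeler. The rows and the column names (key, col3) are made up for the example. It only shows the general idea: sorting on col3 alone leaves the tied rows in whatever order they arrived in, while adding a unique column as a second sort key gives the same result regardless of the input order.

```python
# Hypothetical rows in the form (key, col3); col3 contains repeated values (ties).
rows = [(7, "B"), (3, "A"), (9, "B"), (1, "A"), (5, "B")]

# Sort on col3 only. Python's sort is stable, so tied rows keep their input order.
by_col3 = sorted(rows, key=lambda r: r[1])

# Sort on col3, then on key as a tie-breaker. No ties remain, so the result
# no longer depends on the order in which the rows arrived.
by_col3_then_key = sorted(rows, key=lambda r: (r[1], r[0]))

# The same rows arriving in a different order (e.g. from an unordered query).
rows_reordered = list(reversed(rows))
by_col3_reordered = sorted(rows_reordered, key=lambda r: r[1])
by_col3_then_key_reordered = sorted(rows_reordered, key=lambda r: (r[1], r[0]))

print(by_col3 == by_col3_reordered)                    # order of ties differs
print(by_col3_then_key == by_col3_then_key_reordered)  # fully determined
```

In Modeler the equivalent would be to add further fields to the sort until every row is uniquely ordered.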
General Overview: I have an Oracle table 'product' that contains approximately 80 million records and I would like to improve the performance of joins that use this table. In most cases we are interested in a very small subset of records from (table) 'product' with (column) 'valid_until' date (value) 'mm/dd/9999'.
Possible solutions:
Partition 'mm/dd/9999' and use partition exchange to quickly load new data.
Use an index on 'valid_until' date.
Do you guys have any other possible Oracle solutions or ideas?
Based on needing to find 1% of records, I would expect an index to be adequate. It might pay to include the PK of the table as well if the query is just to find that for the current products.
If there is not a need to identify records by other valid_until dates then it might be worth using Oracle's equivalent of a partial index by indexing on:
case valid_until
when date '...whatever the date is...'
then valid_until
else null
end
... but that would mean changing the schema or the tool that generates the queries or both.
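The following is only a rough sketch of why the queries have to change. It uses Python's built-in sqlite3 module as a stand-in, not Oracle, and the sentinel date '9999-12-31', the index name and the sample rows are all assumptions made for the example. The point it shows is that the query must filter on the same CASE expression that the index is built on.

```python
import sqlite3

# Hypothetical stand-in for the 'mm/dd/9999' value used in the real table.
SENTINEL = "9999-12-31"

# The indexed expression: non-null only for rows that are currently valid.
CURRENT_EXPR = (
    f"CASE valid_until WHEN '{SENTINEL}' THEN valid_until ELSE NULL END"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, valid_until TEXT)")
conn.executemany(
    "INSERT INTO product (id, valid_until) VALUES (?, ?)",
    [(1, "2015-06-30"), (2, SENTINEL), (3, "2018-01-31"), (4, SENTINEL)],
)

# Index on the expression instead of on the plain column.
conn.execute(f"CREATE INDEX product_current_ix ON product ({CURRENT_EXPR})")

# The query repeats the exact expression so that the index is a candidate.
current_ids = [
    row[0]
    for row in conn.execute(
        f"SELECT id FROM product WHERE {CURRENT_EXPR} = ? ORDER BY id",
        (SENTINEL,),
    )
]
print(current_ids)
```

Note that the size benefit is specific to Oracle, where rows whose index key is entirely null are not stored in a B-tree index. SQLite does store them, so this sketch illustrates the query shape only, not the space saving.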
You might keep an eye on the table's statistics to make sure that the cardinality of the selected rows is subject to a reasonably accurate estimation.
I wouldn't go for a partition-based solution as a first choice, as the overhead of row-migration during the update of the valid_until values would be fairly high, but if an index cannot deliver the query performance then by all means try.
I'm building a table that contains about 400k rows of a messaging app's data.
The current table's columns look something like this:
message_id (int)| sender_userid (int)| other_col (string)| other_col2 (int)| create_dt (timestamp)
A lot of queries I would be running in the future will rely on a where clause involving the create_dt column. Since I expect this table to grow, I would like to try and optimize it right now. I'm aware that partitioning is one way, but when I partition it based on create_dt the result is too many partitions since I have every single date spanning back to Nov 2013.
Is there a way to instead partition by a range of dates? How about partition for every 3 months? or even every month? If this is possible - Could I possibly have too many partitions in the future making it inefficient? What are some other possible partition methods?
I've also read about bucketing, but as far as I'm aware that's only useful if you would be doing joins on a column that the bucket is based on. I would most likely be doing joins only on column sender_userid (int).
Thanks!
I think this might be a case of premature optimization. I'm not sure what your definition of "too many partitions" is, but we have a similar use case. Our tables are partitioned by date and customer column. We have data that spans back to Mar 2013. This created approximately 160k+ partitions. We also use a filter on date and we haven't seen any performance problems with this schema.
On a side note, Hive is getting better at scaling up to 100s of thousands of partitions and tables.
On another side note, I'm curious as to why you're using Hive in the first place for this. 400k rows is a tiny amount of data and is not really suited for Hive.
Check out Hive's built-in UDFs. With the right combination of them you can achieve what you want. Here's an example that partitions by month (it produces a "YEAR-MONTH" string that you can use as the partition column value):
select concat(cast(year(to_date(create_dt)) as string),'-',cast(month(to_date(create_dt)) as string))
But when partitioning on dates it is usually useful to have multiple levels of the date dimension so in this case you should have two partition columns, first for year and second for month:
select year(to_date(create_dt)),month(to_date(create_dt))
Keep in mind that timestamps and dates are strings, and that functions like month() or year() return integers as values of date fields. You can use simple mathematical operations to figure out the right partition.
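As a sketch of the "simple mathematical operations" mentioned above, here is the same arithmetic written in Python rather than HiveQL. The function names and the 'YYYY-MM-DD HH:MM:SS' input format are assumptions made for the example. The month key mirrors the concat expression above (no zero padding), and the quarter key covers the "every 3 months" case from the question.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to whatever create_dt actually contains.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def month_partition(create_dt: str) -> str:
    """Return a 'YEAR-MONTH' key, e.g. '2013-11' (month is not zero-padded)."""
    ts = datetime.strptime(create_dt, TS_FORMAT)
    return f"{ts.year}-{ts.month}"


def quarter_partition(create_dt: str) -> str:
    """Return a 'YEAR-Qn' key, grouping months into blocks of three."""
    ts = datetime.strptime(create_dt, TS_FORMAT)
    quarter = (ts.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, ...
    return f"{ts.year}-Q{quarter}"


print(month_partition("2013-11-05 08:30:00"))
print(quarter_partition("2013-11-05 08:30:00"))
print(quarter_partition("2014-03-17 23:59:59"))
```

A quarterly key gives four partitions per year instead of twelve or 365, which keeps the partition count low while still letting a filter on create_dt skip most of the data.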
I have an Excel 2010 pivot table that has categories and a count measure as the data. Those categories then have a date dimension nested underneath, filtered to show only the last two months.
When I sort the categories, I am sorting them by the total of the count measure across both June and July, in descending order.
Can anyone suggest how I can sort the categories based on the June data alone, as opposed to the total for both June and July?
Thanks!
Your question is not related to SQL Server Analysis Services. SSAS provides multidimensional data that is also used by pivot tables as a data source, so that's why you have seen pivot table questions here. But they are not Excel related.
Anyway, I want to try to answer your question. As far as I understand it, changing the order of the dimensions in your pivot table will be sufficient to achieve your goal. Add the date dimension to the pivot table first and then the category dimension, to get your data grouped by date (month). You may then sort by categories to get the result you want.
Hope this helps.