I was building some analyses on revenue for past years. One thing I noticed is that the revenue measure for each month is the same across years. That is, revenue for April 2015 is the same as revenue for April 2016.
I did some searching and found that our measure column 'Revenue' is aggregated on the time dimension as 'Last(sum(revenue))'. So the actual revenue value for April 2019 is treated by OBIEE as the last one and copied to April of the other years.
I can understand that the keyword 'last' may be the reason for this, but shouldn't the year, quarter, and month columns pick exactly the numbers that correspond to that date? Can someone explain how this works and suggest solutions, please?
Very simply put: The "LAST" is the reason. It doesn't "copy" the value though. It aggregates the values to the last existing value along the dimensional hierarchy specified.
The question is: What SHOULD that Saldo show? What is the real business rule?
One last point: technical column names and ALL UPPER CASE COLUMN NAMES shouldn't be used in the BMM layer. The names should be user-focused, readable, and pretty. Otherwise everybody has to rename them over and over in the front end.
It's been a year since I posted this question, but a fix for this incorrect representation of data was added today. In the previous version of the RPD, we used a workaround: we created two measure columns for saldo (saldo_year and saldo_month), set their levels to year and month respectively, and used both in an analysis. This was a temporary solution until we built the second version of our RPD, since we realized that the structure of the old one wasn't completely correct and that it was easier and less time-consuming to rebuild it from scratch than to fix it.
So, as Chris mentioned, it was all about a correct time dimension and hierarchies. We thought we had created it with all requirements met, but recently we got the same problem in our analyses. Then we figured out that we hadn't set the ID columns as primary keys in the month and quarter logical levels. After fixing that, we got the data we wanted. If anybody faces this kind of problem, the first thing to check in the RPD is how the time dimension and hierarchy are defined, and how the logical levels, primary keys, and chronological keys are set in the hierarchy.
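For anyone who wants to see the effect outside OBIEE, here is a small Python sketch (not OBIEE code, and not a description of its internals). It assumes one plausible mechanism: if the month level is keyed only by month number, every April lands in the same group and a "last" aggregation keeps only the latest value. The rows and the `last_by` helper are made up for illustration.

```python
# Hypothetical monthly revenue rows: (year, month, revenue).
ROWS = [
    (2015, 4, 100.0),
    (2016, 4, 120.0),
    (2019, 4, 150.0),
]

def last_by(rows, key_fn):
    """Keep the chronologically last revenue value per group key."""
    result = {}
    for year, month, revenue in sorted(rows):
        # Later rows overwrite earlier ones that share the same key.
        result[key_fn(year, month)] = revenue
    return result

# Month level keyed by month number only: all Aprils collapse into one group.
by_month_only = last_by(ROWS, lambda y, m: m)

# Month level keyed by (year, month): each April keeps its own value.
by_year_month = last_by(ROWS, lambda y, m: (y, m))
```

With the non-unique key, `by_month_only` holds a single April entry carrying the 2019 value, while `by_year_month` keeps one entry per year.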
I'm obtaining wrong results from a DAX formula and I can't understand why.
In my database I have articles that are composed of multiple tools, which are produced from blank tools. One blank can be used to produce multiple tools. I need to calculate blank sales for 3 time periods: last 6, last 12, and last 24 months.
This is my Power BI model:
The time period table I used for the time period slicer and the measure look like this:
To obtain Blank's sales volumes, I created 3 measures:
When I use the last formula, which I thought would have returned the right amount of Blank sold by article by time period, I obtain strange results.
When I select "last 24 months" time period, everything looks fine:
When I select "Last 12 months", the total is fine, but the total by article is wrong:
Finally, if I select "Last 6 months" time period, all the results are totally wrong:
The curious fact is that I checked the result by executing a SQL query on the database, and the DAX formula returns the right result (1466 for the selected time period), but only when used in a card, without filtering by article number.
I have no other filters that affect the visuals.
Could you help me understand why I'm not obtaining the right result, or suggest a better way to reach the desired results?
I'm guessing (at least part of) the problem is that you are backing up from different end dates because LASTDATE(Sales[DocumentDate]) can return different values for different ArticleNo.
I'm not sure what value you actually want for that date, possibly LASTDATE('Dates Table'[Date]), but I'm pretty sure you want it consistent across different ArticleNo.
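To make the guess above concrete, here is a rough Python sketch (not DAX) of the difference between backing up from each article's own last sale date and backing up from one shared end date. The sales rows, `months_back`, and `volume_by_article` are hypothetical, and the window is assumed to start on the first day of the month N months before the end date.

```python
from datetime import date

# Hypothetical sales rows: (article_no, document_date, quantity).
SALES = [
    ("A", date(2020, 6, 10), 5),
    ("A", date(2019, 11, 20), 7),
    ("B", date(2019, 9, 5), 3),
    ("B", date(2019, 2, 1), 4),
]

def months_back(d, n):
    """Return the first day of the month n months before d's month."""
    idx = d.year * 12 + (d.month - 1) - n
    return date(idx // 12, idx % 12 + 1, 1)

def volume_by_article(sales, months, shared_end=None):
    """Sum quantities per article inside a window ending at shared_end,
    or at each article's own last sale date when shared_end is None."""
    totals = {}
    for article in {a for a, _, _ in sales}:
        rows = [(d, q) for a, d, q in sales if a == article]
        end = shared_end if shared_end is not None else max(d for d, _ in rows)
        start = months_back(end, months)
        totals[article] = sum(q for d, q in rows if start <= d <= end)
    return totals

per_article_end = volume_by_article(SALES, 6)
shared_end = volume_by_article(SALES, 6, shared_end=max(d for _, d, _ in SALES))
```

In this made-up data, article B gets a non-zero "last 6 months" volume only when the window ends at B's own last sale date, which is the kind of per-row inconsistency described above.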
I have no clue where to post this question, so here it goes.
I have a Vue.js application that lets me order milk for a customer. The customer can choose to order milk 1, 2, or 7 times a week as part of a subscription service (90 days, for example).
Based on this frequency (once, twice, or 7 times a week), I need to create future orders. The problem is in how I am trying to approach this.
Should I create future orders beforehand? If I do, and the customer wants milk 3 times a week for 90 days, I would have to create about 39 future records (roughly 13 weeks x 3) in my Firestore subcollection, or 90 for daily delivery. And that is just for one customer. Is this wise? (I suspect it isn't, but I can't put my finger on a solution.)
Also, the user will have a date-time picker where they can choose a future date and get a list of all the orders on that date.
Instead of storing each recurrence, you could store only the recurrence pattern and then calculate the occurrences on the fly.
This of course does require more calculations, but it saves on the amount of data stored. In that sense this is a classical space vs time tradeoff.
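As a minimal sketch of that idea (in Python rather than in the Vue.js/Firestore stack from the question), one record per subscription is enough to answer "which orders fall on this date?". The `Subscription` class, its fields, and `orders_on` are hypothetical names chosen for illustration; the pattern is assumed to be a start date, a length in days, and a set of delivery weekdays.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Subscription:
    customer: str
    start: date
    days: int            # subscription length, e.g. 90
    weekdays: frozenset  # delivery weekdays, Monday=0 ... Sunday=6

    def delivers_on(self, day):
        """True if `day` is inside the subscription and is a delivery weekday."""
        end = self.start + timedelta(days=self.days)
        return self.start <= day < end and day.weekday() in self.weekdays

    def occurrences(self):
        """Yield every delivery date, computed from the pattern on demand."""
        for offset in range(self.days):
            day = self.start + timedelta(days=offset)
            if day.weekday() in self.weekdays:
                yield day

def orders_on(subscriptions, day):
    """List the customers who get a delivery on the given day."""
    return [s.customer for s in subscriptions if s.delivers_on(day)]

# Monday/Wednesday/Friday deliveries for 90 days, starting Monday 2024-01-01.
sub = Subscription("alice", date(2024, 1, 1), 90, frozenset({0, 2, 4}))
wednesday_orders = orders_on([sub], date(2024, 1, 3))
```

Here the 90-day Monday/Wednesday/Friday pattern expands to 39 delivery dates, but only one record is stored. The date picker query becomes a filter over subscriptions instead of a lookup of pre-created orders.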
I am designing a Data Warehouse and need some help with my fact table.
My fact table captures the facts for aged debt: it records all transactions against bills.
The dimension keys I have are listed below:
dim_month_end_key
dim_customer_key
dim_billing_account_key
dim_property_key
dim_bill_key
dim_charge_key
dim_payment_plan_key
dim_income_type_key
dim_transaction_date_key
dim_bill_date_key
I am trying to work out what my level of granularity would be, as all the keys together could be duplicated, for example if a customer makes a payment twice in one day.
To solve this, I am thinking of adding a time dimension, as the time should always be different.
However, the company does not need to report on time. Do I add it anyway just to prevent duplication?
Thanks
Cheryl
No, you don't need a time dimension.
There may be an apparent duplication in your fact table, but it will actually reflect 2 deposits in one day, so two valid records. The fact that you might not be able to tell the two transactions apart is not (necessarily) a problem for the system.
The report will sum all the deposit amounts, or count the number of deposits, along any dimension, and the totals will still be fine.
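A tiny Python sketch of why the totals still work, using made-up fact rows reduced to two of the keys from the question plus an assumed `amount` measure. Two identical rows are simply two payments.

```python
from collections import defaultdict

# Hypothetical fact rows: (dim_customer_key, dim_transaction_date_key, amount).
FACTS = [
    (101, 20240105, 50.0),
    (101, 20240105, 50.0),  # second, identical payment on the same day
    (102, 20240105, 20.0),
]

def totals_by_customer(facts):
    """Sum amounts and count payments per customer key."""
    total = defaultdict(float)
    count = defaultdict(int)
    for customer_key, _date_key, amount in facts:
        total[customer_key] += amount
        count[customer_key] += 1
    return dict(total), dict(count)

totals, counts = totals_by_customer(FACTS)
```

Both payments from customer 101 contribute to the sum and the count, even though the rows cannot be told apart.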
I have a dataset like the one below. The datetime column is the index, and type is a column holding a sequence. For example, R,C,D,D,D,R,R is a sequence.
start_time            type
2019-12-14 09:00:00   RCDDDRR
2019-12-14 10:00:00   CCRD
2019-12-14 11:00:00   DDRRCC
2019-12-14 12:00:00   ?
I want to predict what the next sequence would be at 12:00:00. Which algorithm is best for predicting the next sequence?
I know that we can use Markov chain to predict the probable sequence. However, are there any other better algorithms?
Thanks
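For reference, the Markov chain baseline mentioned above could look roughly like this in Python. This is only a sketch under a simplifying assumption: it models symbol-to-symbol transitions inside the sequences (first-order), not the choice of a whole next sequence. `fit_transitions` and `most_likely_next` are hypothetical helper names.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Count how often each symbol is followed by each other symbol."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts

def most_likely_next(counts, symbol):
    """Return the most frequent successor of `symbol`, or None if unseen."""
    if symbol not in counts:
        return None
    return counts[symbol].most_common(1)[0][0]

# The three sequences from the table above.
transitions = fit_transitions(["RCDDDRR", "CCRD", "DDRRCC"])
after_d = most_likely_next(transitions, "D")
```

A full next sequence could then be generated symbol by symbol from these transition counts, though with only three observed sequences the estimates are very rough.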
You could, for example, use k-NN or an SVM for prediction, but first you have to restructure the data and define features for the training dataset.
You can also use a method based on deep learning. I think this link can help you:
https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
LSTMs have an edge over conventional feed-forward neural networks and plain RNNs in many ways, because they can selectively remember patterns for long durations of time.
Where a plain RNN transforms all of the existing information at every step, LSTMs make small modifications to the information through multiplications and additions. With LSTMs, the information flows through a mechanism known as the cell state. This way, LSTMs can selectively remember or forget things. The information at a particular cell state has three different dependencies.
Let’s take the example of predicting stock prices for a particular stock. The stock price of today will depend upon:
The trend that the stock has been following in the previous days, maybe a downtrend or an uptrend.
The price of the stock on the previous day, because many traders compare the stock’s previous day price before buying it.
The factors that can affect the price of the stock for today. This can be a new company policy that is being criticized widely, or a drop in the company’s profit, or maybe an unexpected change in the senior leadership of the company.
These dependencies can be generalized to any problem as:
The previous cell state (i.e., the information that was present in the memory after the previous time step).
The previous hidden state (this is the same as the output of the previous cell).
The input at the current time step (i.e., the new information that is being fed in at that moment).
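To show how those three dependencies enter the computation, here is a minimal sketch of one step of a scalar LSTM cell in plain Python, using the standard gate equations. The weight layout (`w` mapping a gate name to an input weight, a hidden weight, and a bias) and the gate names are choices made for this illustration; real implementations use vectors and matrices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.

    x_t:    input at the current time step
    h_prev: previous hidden state
    c_prev: previous cell state
    w:      dict mapping gate name -> (input weight, hidden weight, bias)
    """
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x_t + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of c_prev to keep
    i = gate("input", sigmoid)         # how much new information to add
    g = gate("candidate", math.tanh)   # the new candidate information
    o = gate("output", sigmoid)        # how much of the cell state to expose

    c_t = f * c_prev + i * g           # multiply-and-add update of the memory
    h_t = o * math.tanh(c_t)           # new hidden state (the cell's output)
    return h_t, c_t

# With all-zero weights every sigmoid gate is 0.5 and the candidate is 0.
zero_weights = {name: (0.0, 0.0, 0.0)
                for name in ("forget", "input", "candidate", "output")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, w=zero_weights)
```

The line computing `c_t` is the "small modification by multiplications and additions": the old memory is scaled by the forget gate and the gated candidate is added to it.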
Maybe these links could help you too:
https://www.bioinf.jku.at/publications/older/2604.pdf
https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-introduction-to-lstm/
I will try to explain what I want to accomplish. I am looking for an algorithm or approach, not the actual implementation in my specific system.
I have a table with actuals (incoming customer requests) on a daily basis. These actuals need to be "copied" into the next year, where they will be used as a basis for planning the amount of requests in the future.
The smallest timespan for planning, on a technical basis, is a "period", which consists of at least one day. A period always changes after a week or after a month. This means that if a week falls in both May and June, it is split into two periods.
Here's an example:
2010-05-24 - 2010-05-30 Week 21 | Period_Id 123
2010-05-31 - 2010-05-31 Week 22 | Period_Id 124
2010-06-01 - 2010-06-06 Week 22 | Period_Id 125
We did this to reduce the amount of data, because we have a few thousand items that each have 365 daily values. For planning, this is reduced to "a few thousand x 65" (or whatever the period count is per year). I can aggregate a month, or a week, by combining all periods that belong to it. The important thing is that I could still use daily values, then find the corresponding period and add them there if necessary.
What I need is an approach for aggregating the actuals for every (working) day, week, or month into next year's equivalent period. My requirements are not fixed here. The actuals have a certain distribution, because there are certain deadlines and habits that are reflected in the data. I would like to preserve this as far as possible, but planning is never completely accurate, so I can make a compromise here.
Don't know if this is what you're looking for, but this is a strategy for calculating the forecasts using flexible periods:
First define a mapping for each day in next year to the corresponding day in this year. Then when you need a forecast for period x you take all days in that period and sum the actuals for the matching days.
With this you can precalculate every week/month but create new forecasts if the contents of periods change.
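A minimal Python sketch of this strategy. The mapping itself is an assumption made for illustration: each day next year is matched to the same weekday 52 weeks earlier. `matching_day`, `forecast_for_period`, and the sample actuals are hypothetical.

```python
from datetime import date, timedelta

def matching_day(day_next_year):
    """Assumed mapping: the same weekday 52 weeks (364 days) earlier."""
    return day_next_year - timedelta(weeks=52)

def forecast_for_period(start, end, actuals):
    """Sum the actuals of the matching days for every day in [start, end]."""
    total = 0
    day = start
    while day <= end:
        total += actuals.get(matching_day(day), 0)
        day += timedelta(days=1)
    return total

# Hypothetical daily actuals for this year (date -> number of requests).
ACTUALS = {
    date(2010, 5, 31): 10,
    date(2010, 6, 1): 4,
    date(2010, 6, 2): 6,
}

# A period next year that ends at the month boundary.
forecast = forecast_for_period(date(2011, 5, 30), date(2011, 5, 31), ACTUALS)
```

Because the forecast is summed per day, the period boundaries can move (as they do when a week is split by a month change) without touching the stored actuals.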
Map weeks to weeks. The first full week of this year to the first full week of the next. Don't worry about "periods" and aggregation; they are irrelevant.
Where a missing holiday leaves a hole in the data, just take the values for the same day of the previous week or the next week, and do the same at the beginning/end of the year.
Now for each day of the week, combine the results for the year and look for events more than, say, two standard deviations from the mean (if you don't know what that means then skip this step), and look for correlations with known events like holidays. If a holiday doesn't show an effect in this test then ignore it. If you find an effect, shift it to compensate for the different date next year. Don't worry about higher-order effects, you don't have enough data to pin them down.
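The per-weekday check could be sketched like this in Python. It assumes the actuals are a mapping from date to value and uses the population standard deviation; `weekday_outliers` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean, pstdev

def weekday_outliers(actuals, threshold=2.0):
    """Return dates whose value is more than `threshold` standard deviations
    from the mean of all values that share the same weekday."""
    by_weekday = defaultdict(list)
    for day, value in actuals.items():
        by_weekday[day.weekday()].append((day, value))

    outliers = []
    for rows in by_weekday.values():
        values = [v for _, v in rows]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # all values identical, nothing stands out
        outliers.extend(d for d, v in rows if abs(v - mu) > threshold * sigma)
    return sorted(outliers)

# Ten hypothetical Mondays: nine normal days and one with no requests.
mondays = [date(2010, 1, 4) + timedelta(weeks=i) for i in range(10)]
sample = {d: 100 for d in mondays}
sample[date(2010, 2, 8)] = 0
flagged = weekday_outliers(sample)
```

The flagged dates are the candidates to compare against a list of known holidays or events before deciding whether to shift anything.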
Now draw in periods wherever you like and aggregate all you want.
Don't make any promises about the accuracy of these predictions; there's no way to know it. Don't worry about whether this is the best possible way; it isn't, but it's as good as any you're likely to find. You can spend as much more time and effort fine-tuning this as you wish; it might raise expectations, but it's not likely to make the results much more accurate. It's about as likely to make them worse.
There is no a priori way to answer that question. You have to look at your data and use the results to decide what the important parameters are (day of week, week number, month, season, temperature outside?).
For example, if many of your customers are Jewish or Muslim, then the Gregorian calendar, ISO week numbers, and all that won't help you much, because Jewish and Muslim holidays (and so user behaviour) are determined using other calendars.
Another example: trying to predict iPhone search volume from last year's searches doesn't sound like a good idea. It seems that the important timescales are much longer than a year (the technology becoming mainstream over the years) and much shorter than a year (specific events that affect us for days or weeks).