Cognos: How to round an average of a data item result - reporting

I'm using Cognos Report Studio and I'm pretty new to the software. I've created a query that counts the number of days between two dates. There are multiple records, and I need the average of all the day counts. I can do all of that, but my result is 6.57254211... and I want this number rounded. I can't figure out how to do it. I got the average by applying an aggregate function, and I thought rounding would be applied the same way, but there is no Round option among the rollup aggregate functions. I also tried using _round() in my data item expression, but that returned an error. Besides, I'm pretty sure that would just round each individual number, not the average of all of them. Does anyone know how to do this?

I was able to round my average by creating a query calculation with the expression _round([Weekdays]), where [Weekdays] is the average I needed rounded.
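The difference between rounding each record and rounding the average can be seen in a small Python sketch. This is not Cognos code; the day counts are made-up values used only for illustration.

```python
# Hypothetical day counts between the two dates, one per record.
day_counts = [3, 9, 7, 8]

# Round once, after aggregating (the idea behind the query calculation).
rounded_average = round(sum(day_counts) / len(day_counts))

# Round every record first, then aggregate: whole-number inputs are
# unchanged by rounding, so the average is still fractional.
average_of_rounded = sum(round(d) for d in day_counts) / len(day_counts)

print(rounded_average)     # 7
print(average_of_rounded)  # 6.75
```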

Related

DAX formula, crossfilter function not returning expected result

I'm obtaining wrong results from a DAX formula and I can't understand why.
In my database I have articles that are made up of multiple tools, which are produced from blank tools. One blank can be used to produce multiple tools. I need to calculate blank sales for 3 time periods: the last 6, last 12 and last 24 months.
This is my Power BI model:
The time period table I used for the time period slicer and the measure look like this :
To obtain Blank's sales volumes, I created 3 measures:
When I use the last formula, which I thought would have returned the right amount of Blank sold by article by time period, I obtain strange results.
When I select "last 24 months" time period, everything looks fine:
When I select "Last 12 months", the total is fine, but the total by article is wrong:
Finally, if I select "Last 6 months" time period, all the results are totally wrong:
The curious thing is that I checked the result by running a SQL query on the database, and the DAX formula returns the right result (1466 for the selected time period), but only when it is used in a card, without filtering by Article number.
I have no other filters that affect the visuals.
Could you help me understand why I'm not obtaining the right result, or suggest a better way to reach the desired results?
I'm guessing (at least part of) the problem is that you are backing up from different end dates because LASTDATE(Sales[DocumentDate]) can return different values for different ArticleNo.
I'm not sure what value you actually want for that date, possibly LASTDATE('Dates Table'[Date]), but I'm pretty sure you want it consistent across different ArticleNo.
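The effect guessed at above can be illustrated outside DAX with a small Python sketch. Everything in it is an assumption made for illustration: the sales rows are invented, and a 180-day window stands in for "last 6 months". It only shows how a window that ends at each article's own last sale can differ from a window that ends at one shared date.

```python
from datetime import date, timedelta

# Hypothetical sales rows: (article number, document date, quantity).
sales = [
    ("A1", date(2018, 1, 15), 10),
    ("A1", date(2018, 3, 1), 5),
    ("A2", date(2018, 5, 20), 7),
    ("A2", date(2018, 8, 1), 3),
]

WINDOW = timedelta(days=180)  # rough stand-in for "last 6 months"


def total_in_window(rows, end):
    """Sum quantities whose date falls in (end - WINDOW, end]."""
    return sum(q for _, d, q in rows if end - WINDOW < d <= end)


def per_article(rows, anchor):
    """Total per article; anchor maps an article's rows to a window end date."""
    totals = {}
    for art in sorted({a for a, _, _ in rows}):
        own = [r for r in rows if r[0] == art]
        totals[art] = total_in_window(own, anchor(own))
    return totals


global_end = max(d for _, d, _ in sales)

# Window ends at each article's own last sale (a per-article "last date").
moving = per_article(sales, lambda own: max(d for _, d, _ in own))
# Window ends at one shared date for every article.
fixed = per_article(sales, lambda own: global_end)

print(moving)  # {'A1': 15, 'A2': 10}
print(fixed)   # {'A1': 5, 'A2': 10}
```

With the shared end date, the per-article figures add up to the overall total for the same window; with per-article end dates they do not have to.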

Statistics/Algorithm: How do I compare a weekly graph with its own history to see when in the past it was almost the same?

I’ve got a statistical/mathematical problem I’m stumped on and I was really hoping to get some help. I’m working on a research project where I need to compare a weekly graph with its own history to see when in the past it was almost the same. Think of this as “finding the closest match”. The information is displayed as a line graph, but it’s readily available as raw data:
Date        Result
08/10/18    52.5
08/07/18    60.2
08/06/18    58.5
08/05/18    55.4
08/04/18    55.2
and so on...
What I really want is an output that gives some form of correlation between the current data points and every other set of 5 consecutive data points in the history. So, something like:
Date range           Correlation
07/10/18-07/15/18    0.98
We’ll be having code written in Python so the software does this automatically (so that as new data is added, it runs and finds the set of numbers closest to the current one).
Here’s where the difficulty sets in: since the numbers are on a general upward trend over time, we don’t want to compare absolute values (the numbers might never really match). One suggestion has been to compare the delta (rate of change as a percentage over the previous day), or to use a log scale.
I’m wondering: how do I go about this? What kind of calculation can I use to get the desired results? I’ve looked at the different kinds of correlation equations, but they don’t account for the “shape” of the data, and they generally just average it out. The shape of the line chart is the important thing.
Thanks very much in advance!
I would simply divide the data of each week by their average (i.e., normalize them to an average of 1), then sum the squares of the differences of each day of each pair of weeks. This sum is what you want to minimize.
If you don't care about how much a graph oscillates relative to its mean, you can also normalize the variance. For each week, calculate the mean and variance, then subtract the mean and divide by the square root of the variance. Each week will then have mean 0 and variance 1. Then minimize the sum of squared differences as before.
If the normalization of data is all you can change in your workflow, just leave out the sum of squares of differences minimization part.
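A minimal Python sketch of the two normalizations and the sum-of-squares comparison described above. The function names, the window size of 5 and the sample series are assumptions made for illustration, not part of the original answer. Note that `standardize` would divide by zero on a perfectly flat window.

```python
from statistics import mean, pstdev


def scale_by_mean(window):
    """Divide by the window's average so the window has mean 1."""
    m = mean(window)
    return [x / m for x in window]


def standardize(window):
    """Subtract the mean and divide by the standard deviation (mean 0, variance 1)."""
    m, s = mean(window), pstdev(window)
    return [(x - m) / s for x in window]


def closest_window(series, size=5, normalize=scale_by_mean):
    """Return (start index, distance) of the past window most like the last one."""
    current = normalize(series[-size:])
    best = None
    # Only consider windows that end before the current window starts.
    for start in range(len(series) - 2 * size + 1):
        past = normalize(series[start:start + size])
        dist = sum((a - b) ** 2 for a, b in zip(current, past))
        if best is None or dist < best[1]:
            best = (start, dist)
    return best


# Hypothetical values in chronological order (oldest first).
series = [10, 12, 11, 13, 12, 20, 19, 18, 17, 16, 20, 24, 22, 26, 24]
print(closest_window(series))
print(closest_window(series, normalize=standardize))
```

In this made-up series the last five values are exactly twice the first five, so both normalizations pick the window starting at index 0 even though the absolute levels differ.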

Tableau - Calculated fields / grouping / Custom Dim

Tableau:
This may seem simple, but I ran out of the usual tricks I've used in other systems.
I want a variance column: essentially, adding a member 'Variance' to the Act/Plan dimension, which currently contains only the members 'Actual' and 'Plan'.
I've come in where the data structure and reporting is set up like so:
Actual | Plan
Profit measure
measure 2
measure 3
etc
The goal is to have a Variance column (calculated and not part of the Actual/Plan dimension)
Actual | Plan | Variance
Profit measure
measure 2
measure 3
etc
There are solutions that work for one measure only, and I've looked into those,
i.e., create calculated fields like this:
Profit_Actual | Profit_Plan | Variance
You put this on the columns, and you get a grid that I want... except a grid with only 1 measure.
This does not work if I want several measures on rows. Essentially, the solution above will only display the Profit measure, not Measure 1_Actual, Measure 2_Plan, etc.
So I tried a trick where I grouped the 3 calculated measures, i.e. Profit_Actual | Profit_Plan | Profit_Variance, as 'Profit_Measure'.
Then I created a parameter list: 'Actual', 'Plan', 'Variance'.
Now I can half achieve my goal by putting the parameter on columns and 'Profit_Measure' on rows (so I can have Measure 123_group etc. down on rows too). The trouble is that parameters are single-select only. If the custom parameter could display all of its options at once, my problem would be solved.
Any ideas on how I can achieve the Variance column I want?
Virtually adding a member to a dimension, calculated fields, tricks, workarounds...
Thank you
Any leads are appreciated
Gemmo
Okay. First, I had a really hard time understanding how your data is organized. Try to be clearer: describe what each entry in your database looks like, not what a specific view in Tableau looks like.
But I think I got it. I guess you have a collection of entries, and each entry has a number of measure fields (profit, etc.) and an Act/Plan field that identifies whether that entry is an actual value or a planned value. Is that correct?
Well, if that's the case, I'm sorry to say you have to calculate a variance field for each measure. Think about how your original dataset is structured. Do you think you can add a single field "Variance" to represent the variance of each measure? Well, you can: store the values in a string, and then extract them again using some string functions, but it's not very practical. The problem is that each entry has many measures. If it had only 1 measure, then a single variance field would suffice.
So, if you can reorganize your data, an easier set to work with (but with many more entries) would have the fields Measure, Value and Actual/Plan. The Measure field would hold a string identifying what you're measuring in that entry. Value would be a number holding the measurement itself. And Actual/Plan is the same as before. For instance:
Measure    Value    Actual/Plan
Profit     100      Actual
So each line in your current model would become n entries, where n is the number of measures you have right now. It's a larger dataset, but easier to work with. Think about it: now you can have one calculated field and use table calculations to calculate the variance only for that measure and/or Actual/Plan. Just use WINDOW_VAR, and put Measure and/or Actual/Plan in the partition.
Table calculations are awesome, take a look at this to understand it better. http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#calculations_tablecalculations_understanding_addressing.html
I generally like to have my data staged such that Actual is its own column and Plan is its own column in the data being fed to Tableau. It makes calculations so much easier.
If your data is such that there is a column called "Actual/Plan" and every row is populated with either "Actual" or "Plan" and there is another column called "Value" or "Measure" that is populated with the values, you can force Tableau to make them columns assuming you can't or won't rearrange your data.
Create a calculated field called "Actual" with the following calc:
IF [Actual/Plan] = 'Actual' THEN [Value] END
Similarly, create a calculated field called "Plan" with the following calc:
IF [Actual/Plan] = 'Plan' THEN [Value] END
Now, you can finally create your "Variance" and "Variance %" calculations (respectively):
SUM([Actual]) - SUM([Plan])
[Variance] / SUM([Plan])
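The same arithmetic can be sketched in plain Python to show what the four calculated fields produce for several measures at once. This is not Tableau code; the rows and measure names are hypothetical, and the layout assumes one row per measure and Actual/Plan value.

```python
# Hypothetical long-format rows: (measure, actual_or_plan, value).
rows = [
    ("Profit", "Actual", 120.0),
    ("Profit", "Plan", 100.0),
    ("Measure 2", "Actual", 45.0),
    ("Measure 2", "Plan", 50.0),
]

grid = {}
for measure, kind, value in rows:
    cell = grid.setdefault(measure, {"Actual": 0.0, "Plan": 0.0})
    # Same idea as IF [Actual/Plan] = 'Actual' THEN [Value] END, then SUM.
    cell[kind] += value

for measure, cell in grid.items():
    # SUM([Actual]) - SUM([Plan]) and [Variance] / SUM([Plan])
    cell["Variance"] = cell["Actual"] - cell["Plan"]
    cell["Variance %"] = cell["Variance"] / cell["Plan"]

for measure, cell in grid.items():
    print(measure, cell)
```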

Estimating number of results in Google App Engine Query

I'm attempting to estimate the total amount of results for app engine queries that will return large amounts of results.
In order to do this, I assigned a random floating point number between 0 and 1 to every entity. Then I executed the query for which I wanted to estimate the total results with the following 3 settings:
* I ordered by the random numbers that I had assigned in ascending order
* I set the offset to 1000
* I fetched only one entity
I then plugged that entity's random value into the following equation to estimate the total results (since I used 1000 as the offset above, the value of OFFSET would be 1000 in this case):
1 / RANDOM * OFFSET
The idea is that since each entity has a random number assigned to it, and I am sorting by that random number, the entity's random number assignment should be proportionate to the beginning and end of the results with respect to its offset (in this case, 1000).
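The idea can be tried out locally with a short simulation. This is a sketch under assumptions: the total of 60,000 entities, the seed and the offsets are arbitrary, and an in-memory sorted list stands in for the ordered datastore query.

```python
import random


def estimate_total(random_values, offset):
    """Estimate the count from the value found at `offset` in ascending order."""
    ordered = sorted(random_values)
    # Skipping `offset` entities lands on the (offset + 1)-th smallest value.
    return offset / ordered[offset]  # same as 1 / RANDOM * OFFSET


rng = random.Random(42)  # fixed seed so the run is repeatable
true_total = 60_000      # hypothetical number of matching entities
values = [rng.random() for _ in range(true_total)]

for offset in (1000, 5000, 20000):
    print(offset, round(estimate_total(values, offset)))
```

With uniformly assigned values, the estimates should scatter on both sides of the true total and tighten as the offset grows.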
The problem is that the estimates I am getting are too low, and the lower the offset, the lower the estimate. I had anticipated that a lower offset would give a less accurate estimate, but I thought the margin of error would fall both above and below the actual number of results.
Below is a chart demonstrating what I am talking about. As you can see, the predictions get more consistent (accurate) as the offset increases from 1000 to 5000. After that, though, the predictions follow a fourth-degree polynomial (y = -5E-15x^4 + 7E-10x^3 - 3E-05x^2 + 0.3781x + 51608).
Am I making a mistake here, or does the standard Python random number generator not distribute numbers evenly enough for this purpose?
Thanks!
Edit:
It turns out that this problem was due to my own mistake. In another part of the program, I was grabbing entities from the beginning of the series, doing an operation, then re-assigning the random number. This resulted in a denser distribution of random numbers towards the end.
I did a little more digging into this concept, fixed the problem, and tried it again on a different query (so the number of results is different from above). I found that this idea can be used to estimate the total results for a query. One thing of note is that the "error" is very similar for offsets that are close together. When I did a scatter chart in Excel, I expected the accuracy of the predictions at each offset to form a "cloud", meaning that offsets at the very beginning would produce a larger, less dense cloud that would converge to a very tiny, dense cloud around the actual value as the offsets got larger. This is not what happened, as you can see below in the chart of how far off the predictions were at each offset. Where I thought there would be a cloud of dots, there is a line instead.
This is a chart of the maximum after each offset. For example the maximum error for any offset after 10000 was less than 1%:
When using GAE, it makes a lot more sense not to do large amounts of work on reads: it's built and optimized for very fast request turnarounds. In this case it's actually more efficient to maintain a count of your results as you create the entities.
If you have a standard query, this is fairly easy - just use a sharded counter when creating the entities. You can seed this using a map reduce job to get the initial count.
If you have queries that might be dynamic, this is more difficult. If you know the range of possible queries that you might perform, you'd want to create a counter for each query that might run.
If the range of possible queries is infinite, you might want to think of aggregating counters or using them in more creative ways.
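The sharded counter idea mentioned above can be sketched in a few lines of Python. This is an in-memory stand-in with a hypothetical class name and shard count, not App Engine code; in a datastore each shard would be a separate entity updated in its own transaction.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a counter split across several shards."""

    def __init__(self, num_shards=20):
        # In a datastore each shard would be its own entity, so concurrent
        # writers rarely contend on the same one.
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Pick a shard at random and bump only that shard.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self):
        # Reading the count means summing every shard.
        return sum(self.shards)


counter = ShardedCounter()
for _ in range(1000):  # e.g. one increment per entity created
    counter.increment()
print(counter.total())  # 1000
```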
If you tell us the query you're trying to run, there might be someone who has a better idea.
A quick thought:
Have you tried the Datastore Statistics API? It may provide fast and accurate results if you don't update your set of entities very frequently.
http://code.google.com/appengine/docs/python/datastore/stats.html
[EDIT1.]
I did some math, and I think the estimation method you proposed here can be rephrased as an "order statistic" problem.
http://en.wikipedia.org/wiki/Order_statistic#The_order_statistics_of_the_uniform_distribution
For example:
If the actual number of entities is 60000, the question becomes: "what is the probability that your 1000th [2000th, 3000th, ...] sample falls in the interval [l, u], so that the total estimated from this sample is within an acceptable error of 60000?"
If the acceptable error is 5%, the interval [l, u] will be [0.015873015873015872, 0.017543859649122806]
I think the probability won't be very large.
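One way to put a number on that probability is sketched below in Python. It relies on the fact that the k-th smallest of N uniform draws follows a Beta(k, N - k + 1) distribution, and then, as a simplifying assumption, replaces that Beta with a normal distribution of the same mean and variance. The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_within(total, k, rel_error):
    """Approximate P(the estimate k / U is within rel_error of total).

    U is the k-th smallest of `total` uniform draws, i.e. Beta(k, total - k + 1),
    approximated here by a normal with the same mean and variance.
    """
    a, b = k, total - k + 1
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    lower = k / (total * (1 + rel_error))  # l in the interval [l, u]
    upper = k / (total * (1 - rel_error))  # u in the interval [l, u]
    return normal_cdf((upper - mean) / sd) - normal_cdf((lower - mean) / sd)


for k in (1000, 2000, 3000):
    print(k, round(prob_within(60_000, k, 0.05), 3))
```

For k = 1000 and a 5% error, `lower` and `upper` reproduce the interval quoted above. The normal approximation is rough, but running it gives a concrete figure to compare against the guess.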
This doesn't directly deal with the calculations aspect of your question, but would using the count attribute of a query object work for you? Or have you tried that out and it's not suitable? As per the docs, it's only slightly faster than retrieving all of the data, but on the plus side it would give you the actual number of results.
http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_count

Random noise in Solr score

I am looking for a way of introducing random noise into my scoring function, and I'm at a loss on how to best proceed.
Some background:
We use Solr for a web application that manages large-ish sets of photos for agencies.
One customer has an interesting requirement for scoring:
'quality' field, maintained by editors, from 1 (highest) to 3 (lowest);
'date' field, boosting more recent photos; I would probably use a logarithmic function;
However, due to how the stock photo market works, this will likely result in many similar photos appearing together.
Their request is to give 'quality' a large boost, but introduce some randomness so that photos will not appear in a strict date order.
Any ideas?
EDITED: a key requirement is to have "stable" query results: if I search twice for "tropical island" I can get a slightly different result set, but if I ask for the first page, then the second, then the first, I'd better get the same results :)
You could do this with FunctionQueries. For each photo add a field with a random number close to 1 (e.g. 0.99, 1.02) and use it in a product function query to alter the "natural" score.
Turns out my first approach to solving the problem was the correct one, and I had a trivial implementation bug. In case it helps others:
RandomSortField does have the characteristics I need (that is, returning repeatable results for the same query).
Leaving aside the FunctionQuery for a moment, even something trivial like:
sort=quality_i asc, date_d desc, random_12345 desc
will approximate my requirements.
However, when using the Sunspot ruby gem, there's no way of passing the seed, and that's what was tricking me earlier: I ended up using a different seed each time, thus getting "true" random results.
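Why a fixed seed matters can be shown with a small Python sketch. It does not reproduce how Solr's RandomSortField works internally; the hash-based `seeded_value` helper and the photo records are assumptions made for illustration. It only mirrors the sort order quality ascending, date descending, seeded random value descending.

```python
import hashlib

# Hypothetical photos: (id, quality 1..3, date as a YYYYMMDD integer).
photos = [
    ("p1", 1, 20180801),
    ("p2", 1, 20180801),
    ("p3", 1, 20180801),
    ("p4", 2, 20180810),
    ("p5", 1, 20180715),
]


def seeded_value(seed, doc_id):
    """Deterministic pseudo-random number in [0, 1) for a (seed, document) pair."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def ranked(seed):
    """Photo ids ordered by quality asc, date desc, seeded random value desc."""
    ordered = sorted(
        photos,
        key=lambda p: (p[1], -p[2], -seeded_value(seed, p[0])),
    )
    return [pid for pid, _, _ in ordered]


print(ranked(12345))
print(ranked(12345) == ranked(12345))  # same seed, same order: True
print(ranked(12345) == ranked(54321))  # a different seed may reorder the ties
```

Reusing one seed across page requests keeps paging stable, while changing the seed only shuffles photos that tie on quality and date.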
