Cognos dynamic prompts and multiple reports

My company makes individual products that go through different phases, with different timelines for those phases and different departments involved. For example, one product might have a 3-day start phase with 3 departments, a 2-day prototype phase with 4 departments, etc. Another product might have all, none, or more of these phases. The phases and timelines are input by the user.
I need to resolve the following:
1). Allow the user to determine how many phases will be used, and generate that number of prompt sets (a prompt set is a phase name, begin date, end date, and product ID [which is the same for each phase]). The number of prompt sets requested determines the number of phases in the output.
I have currently hard-coded 5 prompt sets, but I want this to be dynamic and user-driven, since users know how many phases they want to query and which dates correspond to those phases.
2). Provide a sum for each phase, then an overall sum.
3). Provide percentages (phase % of total, employee % of phase total) (optional)
Right now, I have a static number of prompts leading to a static number of crosstab report outputs. It looks like this:
                **D E P A R T M E N T S**
                  Design   Req      Rev     | Total
Phase: Start      hrs|amt  hrs|amt  hrs|amt | hrs|amt
BegDt EndDt Emp
1/3   1/6   Sue   1|100             2|200   | 3|300
            JJ             3|300            | 3|300
            Ted            2|200            | 2|200
----------------------------------------------------
      Total       1|100    5|500    2|200   | 8|800
                  Build    Design   Model    Rev     | Total
Phase: Proto      hrs|amt  hrs|amt  hrs|amt  hrs|amt | hrs|amt
BegDt EndDt Emp
1/7   1/8   Joe   1|100    1|100                     | 2|200
            Chris                   3|300            | 3|300
            Patty          1|100    2|200    2|200   | 5|500
-------------------------------------------------------------
      Total       1|100    2|200    5|500    2|200   | 10|1000
I want it to look like this (notice that every phase shows all departments):
                    **D E P A R T M E N T S**
                  Build    Design   Model    Rev      Req     | Total
Phase: Start      hrs|amt  hrs|amt  hrs|amt  hrs|amt  hrs|amt | hrs|amt
BegDt EndDt Emp
1/3   1/6   Sue            1|100             2|200            | 3|300
            JJ                                        3|300   | 3|300
            Ted                                       2|200   | 2|200
----------------------------------------------------------------------
Start Subtotal             1|100             2|200    5|500   | 8|800

                  Build    Design   Model    Rev      Req     | Total
Phase: Proto      hrs|amt  hrs|amt  hrs|amt  hrs|amt  hrs|amt | hrs|amt
BegDt EndDt Emp
1/7   1/8   Joe   1|100    1|100                              | 2|200
            Chris                   3|300                     | 3|300
            Patty          1|100    2|200    2|200            | 5|500
----------------------------------------------------------------------
Proto Subtotal    1|100    2|200    5|500    2|200            | 10|1000
----------------------------------------------------------------------
Total:            1|100    3|300    5|500    4|400    5|500   | 18|1800
Thanks for your help.

I've done a similar thing once.
Unfortunately, Report Studio can't create prompts (and variables) dynamically.
You can construct your set of prompts with JavaScript: not Cognos prompts, but plain HTML edit boxes. Then carefully pass the values from the edit boxes into a real, hidden prompt as text.

Expand rectangles as much as possible to cover another rectangle, minimizing overlap

Given a tiled, x- and y-aligned rectangle and (potentially) a starting set of other rectangles which may overlap, I'd like to find a set of rectangles so that:
if no starting rectangle exists, one might be created; otherwise do not create additional rectangles
each of the rectangles in the starting set are expanded as much as possible
the overlap is minimal
the whole tiled rectangle's area is covered.
This smells a lot like a set cover problem, but it still is... different.
The key is that each starting rectangle's area has to be maximized while still keeping the overall overlap small. A good solution balances necessary overlaps against large starting-rectangle sizes.
I'd propose a rating function that grows with the total area the rectangles cover and shrinks with every tile that is covered more than once; higher is better. The worked examples below show how it scores different outcomes.
Examples (assumes a rectangle tiled into a 4x4 grid; numbers in tiles denote starting rectangle "ID"):
easiest case: no starting rectangles provided, can just create one and expand it fully:
.---------------. .---------------.
| | | | | | 1 | 1 | 1 | 1 |
|---|---|---|---| |---|---|---|---|
| | | | | | 1 | 1 | 1 | 1 |
|---|---|---|---| => |---|---|---|---|
| | | | | | 1 | 1 | 1 | 1 |
|---|---|---|---| |---|---|---|---|
| | | | | | 1 | 1 | 1 | 1 |
·---------------· ·---------------·
rating: 16 * 1 - 0 = 16
more sophisticated:
.---------------. .---------------. .---------------.
| 1 | 1 | | | | 1 | 1 | 1 | 1 | | 1 | 1 | 2 | 2 |
|---|---|---|---| |---|---|---|---| |---|---|---|---|
| 1 | 1 | | | | 1 | 1 | 1 | 1 | | 1 | 1 | 2 | 2 |
|---|---|---|---| => |---|---|---|---| or |---|---|---|---|
| | | 2 | 2 | | 2 | 2 | 2 | 2 | | 1 | 1 | 2 | 2 |
|---|---|---|---| |---|---|---|---| |---|---|---|---|
| | | 2 | 2 | | 2 | 2 | 2 | 2 | | 1 | 1 | 2 | 2 |
·---------------· ·---------------· ·---------------·
ratings: (4 + 4) * 2 - 0 = 16 (4 + 4) * 2 - 0 = 16
pretty bad situation, with initial overlap:
.-----------------. .-----------------------.
| 1 | | | | | 1 | 1 | 1 | 1 |
|-----|---|---|---| |-----|-----|-----|-----|
| 1,2 | 2 | | | | 1,2 | 1,2 | 1,2 | 1,2 |
|-----|---|---|---| => |-----|-----|-----|-----|
| | | | | | 2 | 2 | 2 | 2 |
|-----|---|---|---| |-----|-----|-----|-----|
| | | | | | 2 | 2 | 2 | 2 |
·-----------------· ·-----------------------·
rating: (8 + 12) * 2 - (2 + 2 + 2 + 2) = 40 - 8 = 32
covering with 1 only:
.-----------------------.
| 1 | 1 | 1 | 1 |
|-----|-----|-----|-----|
| 1,2 | 1,2 | 1 | 1 |
=> |-----|-----|-----|-----|
| 1 | 1 | 1 | 1 |
|-----|-----|-----|-----|
| 1 | 1 | 1 | 1 |
·-----------------------·
rating: (16 + 2) * 1 - (2 + 2) = 18 - 4 = 14
more starting rectangles, also overlap:
.-----------------. .---------------------.
| 1 | 1,2 | 2 | | | 1 | 1,2 | 1,2 | 1,2 |
|---|-----|---|---| |---|-----|-----|-----|
| 1 | 1 | | | | 1 | 1 | 1 | 1 |
|---|-----|---|---| => |---|-----|-----|-----|
| 3 | | | | | 3 | 3 | 3 | 3 |
|---|-----|---|---| |---|-----|-----|-----|
| | | | | | 3 | 3 | 3 | 3 |
·-----------------· ·---------------------·
rating: (8 + 3 + 8) * 3 - (2 + 2 + 2) = 57 - 6 = 51
The starting rectangles may be located anywhere in the tiled rectangle and have any size (minimum bound 1 tile).
The starting grid might be as big as 33x33 currently, though potentially bigger in the future.
I haven't been able to reduce this problem to a well-known one, but that may just be my own inability.
My current approach to solve this in an efficient way would go like this:
if list of starting rects is empty:
    create a starting rect in tile (0, 0)
for each starting rect:
    calculate the distances in x and y direction to the next object (or wall)
sort distances in ascending order
while free space remains:
    pick the rect with the lowest distance
    expand it in the lowest-distance direction
I'm unsure whether this gives an optimal solution or really is the most efficient approach... and, naturally, whether there are edge cases it would fail on.
Proposed attack. Your mileage may vary. Shipping costs higher outside the EU.
Make a list of open tiles
Make a list of rectangles (dimension & corners)
We're going to try making +1 growth steps: expand some rectangle one unit in a chosen direction. In each iteration, find the +1 with the highest score. Iterate until the entire room (large rectangle) is covered.
Scoring suggestions:
Count the squares added by the extension: open squares are +1; occupied squares are -1 for each other rectangle overlapped.
For instance, in this starting position:
- - 3 3
1 1 12 -
- - 2 -
...if we try to extend rectangle 3 down one row, we get +1 for the empty square on the right, but -2 for overlapping both 1 and 2.
Divide this score by the current rectangle area. In the example above, we would have (+1 - 2) / (1*2), or -1/2 as the score for that move ... not a good idea, probably.
The entire first iteration would consider the moves below; directions are Up-Down-Left-Right
rect dir score
1 U 0.33 = (2-1)/3
1 D 0.33 = (2-1)/3
1 R 0.33 = (1-0)/3
2 U -0.50 = (0-1)/2
2 L 0.00 = (1-1)/2
2 R 0.50 = (2-1)/2
3 D -0.50 = (1-2)/2
3 L 0.50 = (1-0)/2
We have a tie for best score: 2 R and 3 L. I'll add a minor criterion of taking the greater expansion, 2 tiles over 1. This gives:
- - 3 3
1 1 12 2
- - 2 2
For the second iteration:
rect dir score
1 U 0.33 = (2-1)/3
1 D 0.33 = (2-1)/3
1 R -0.33 = (0-1)/3
2 U -0.50 = (0-2)/4
2 L 0.00 = (1-1)/4
3 D -1.00 = (0-2)/2
3 L 0.50 = (1-0)/2
Naturally, the other half of last round's tie, 3 L, is now the sole top choice, since the two moves did not conflict:
- 3 3 3
1 1 12 2
- - 2 2
Possible optimization: If a +1 has no overlap, extend it as far as you can (avoiding overlap) before computing scores.
In the final two iterations, we will similarly get 3 L and 1 D as our choices, finishing with
3 3 3 3
1 1 12 2
1 1 2 2
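To make the scoring rule concrete, here is a minimal sketch of it in Python (my own illustration; the grid representation, names, and inclusive (x0, y0, x1, y1) tile spans are assumptions, not part of the original):

# Score a +1 expansion step: +1 per open tile gained, -1 per other
# rectangle already covering a gained tile, divided by current area.

def covering(rects, x, y):
    """IDs of the rectangles whose span contains tile (x, y)."""
    return [i for i, (x0, y0, x1, y1) in rects.items()
            if x0 <= x <= x1 and y0 <= y <= y1]

def score(rects, rid, direction, width, height):
    x0, y0, x1, y1 = rects[rid]
    if direction == 'U':
        new = [(x, y0 - 1) for x in range(x0, x1 + 1)]
    elif direction == 'D':
        new = [(x, y1 + 1) for x in range(x0, x1 + 1)]
    elif direction == 'L':
        new = [(x0 - 1, y) for y in range(y0, y1 + 1)]
    else:  # 'R'
        new = [(x1 + 1, y) for y in range(y0, y1 + 1)]
    if any(not (0 <= x < width and 0 <= y < height) for x, y in new):
        return None  # the step would leave the room: illegal
    gain = 0
    for x, y in new:
        others = [i for i in covering(rects, x, y) if i != rid]
        gain += 1 if not others else -len(others)
    return gain / ((x1 - x0 + 1) * (y1 - y0 + 1))

# The grid from the walkthrough above:
rects = {1: (0, 1, 2, 1), 2: (2, 1, 2, 2), 3: (2, 0, 3, 0)}
print(score(rects, 3, 'D', 4, 3))  # (1 - 2) / 2 = -0.5, as in the text

Each iteration would then score every legal (rectangle, direction) pair and apply the best one, with the tie-break of preferring the longer expansion.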
Note that this algorithm will not get the same answer for your "pretty bad example": this one will cover the entire room with 2, reducing to only 2 overlap squares. If you'd rather have 1 expand in that case, we'll need a factor for the proportion of another rectangle that you're covering, instead of my constant value of 1.
Does that look like a tractable starting point for you?

Manually calculating time complexity of recursive Fibonacci algorithm

I am trying to understand the time complexity of the recursive Fibonacci algorithm.
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
Not having much mathematical background, I tried computing it by hand. That is, I manually counted the number of steps as n increases, ignoring everything I think takes constant time. Here is how I did it. Say I want to compute fib(5).
n = 0 - just a comparison on an if statement. This is constant.
n = 1 - just a comparison on an if statement. This is constant.
n = 2 - ignoring anything else, this should be 2 steps, fib(1) takes 1 step and fib(0) takes 1 step.
n = 3 - 3 steps now, fib(2) takes two steps and fib(1) takes 1 step.
n = 4 - 5 steps now, fib(3) takes 3 steps and fib(2) takes 2 steps.
n = 5 - 8 steps now, fib(4) takes 5 steps and fib(3) takes 3 steps.
Judging from these, I believe the running time might be fib(n+1). But I am not sure the +1 amounts to just a constant factor, because the difference between fib(n) and fib(n+1) can be very large.
I've read the following on SICP:
In general, the number of steps required by a tree-recursive process
will be proportional to the number of nodes in the tree, while the
space required will be proportional to the maximum depth of the tree.
In this case, I believe the number of nodes in the tree is fib(n+1). So I am confident I am correct. However, this video confuses me:
So this is a thing whose time complexity is order of actually, it
turns out to be Fibonacci of n. There's a thing that grows exactly as
Fibonacci numbers. 
...
That every one of these nodes in this tree has to be examined.
I am absolutely shocked. I've examined the nodes in the tree, and there are always fib(n+1) of them, and hence that many steps when computing fib(n). I can't figure out why some people say it takes fib(n) steps rather than fib(n+1).
What am I doing wrong?
In your program, you have these time-consuming actions (sorted by time used per action, quickest at the top):
Addition
IF (conditional jump)
Return from subroutine
Function call
Let's look at how many of these actions are executed, and compare the counts with n and fib(n):
n | fib | #ADD | #IF | #RET | #CALL
---+-----+------+-----+------+-------
0 | 0 | 0 | 1 | 1 | 0
1 | 1 | 0 | 1 | 1 | 0
For n≥2 you can calculate the numbers this way:
fib(n) = fib(n-1) + fib(n-2)
ADD(n) = 1 + ADD(n-1) + ADD(n-2)
IF(n) = 1 + IF(n-1) + IF(n-2)
RET(n) = 1 + RET(n-1) + RET(n-2)
CALL(n) = 2 + CALL(n-1) + CALL(n-2)
Why?
ADD: one addition is executed directly in the top-level instance, and both subroutines you call contain further additions that need to be executed.
IF and RET: same argument as before.
CALL: the same again, except that the top-level instance executes two calls itself.
So, this is your list for other values of n:
n | fib | #ADD | #IF | #RET | #CALL
---+--------+--------+--------+--------+--------
0 | 0 | 0 | 1 | 1 | 0
1 | 1 | 0 | 1 | 1 | 0
2 | 1 | 1 | 3 | 3 | 2
3 | 2 | 2 | 5 | 5 | 4
4 | 3 | 4 | 9 | 9 | 8
5 | 5 | 7 | 15 | 15 | 14
6 | 8 | 12 | 25 | 25 | 24
7 | 13 | 20 | 41 | 41 | 40
8 | 21 | 33 | 67 | 67 | 66
9 | 34 | 54 | 109 | 109 | 108
10 | 55 | 88 | 177 | 177 | 176
11 | 89 | 143 | 287 | 287 | 286
12 | 144 | 232 | 465 | 465 | 464
13 | 233 | 376 | 753 | 753 | 752
14 | 377 | 609 | 1219 | 1219 | 1218
15 | 610 | 986 | 1973 | 1973 | 1972
16 | 987 | 1596 | 3193 | 3193 | 3192
17 | 1597 | 2583 | 5167 | 5167 | 5166
18 | 2584 | 4180 | 8361 | 8361 | 8360
19 | 4181 | 6764 | 13529 | 13529 | 13528
20 | 6765 | 10945 | 21891 | 21891 | 21890
21 | 10946 | 17710 | 35421 | 35421 | 35420
22 | 17711 | 28656 | 57313 | 57313 | 57312
23 | 28657 | 46367 | 92735 | 92735 | 92734
24 | 46368 | 75024 | 150049 | 150049 | 150048
25 | 75025 | 121392 | 242785 | 242785 | 242784
26 | 121393 | 196417 | 392835 | 392835 | 392834
27 | 196418 | 317810 | 635621 | 635621 | 635620
You can see that the number of additions is exactly half the number of function calls (you could also have read this directly from the code). And if you count the initial program invocation as the very first function call, then the numbers of IFs, returns, and calls are exactly equal.
So you can combine 1 ADD, 2 IFs, 2 RETs, and 2 CALLs into one super-action that takes a constant amount of time.
You can also read from the table that the number of additions is exactly 1 less than fib(n+1), and the 1 can be ignored.
So, the running time is of order fib(n+1).
The ratio fib(n+1) / fib(n) gets closer and closer to Φ as n grows. Φ is the golden ratio, roughly 1.6180339887, which is a constant, and constant factors are ignored in orders. So the order O(fib(n+1)) is exactly the same as O(fib(n)).
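If you'd like to verify these identities without hand-counting, here is a small Python sketch (my own check; the counters simply mirror the recurrences above):

# Count the actions of the naive recursive fib and compare them
# with the closed forms derived above.

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def counts(n):
    """Return (#ADD, #IF, #RET, #CALL) for one top-level call fib(n)."""
    if n < 2:
        return (0, 1, 1, 0)
    a1, i1, r1, c1 = counts(n - 1)
    a2, i2, r2, c2 = counts(n - 2)
    return (1 + a1 + a2, 1 + i1 + i2, 1 + r1 + r2, 2 + c1 + c2)

for n in range(15):
    add, if_, ret, call = counts(n)
    assert add == fib(n + 1) - 1   # additions are fib(n+1) - 1
    assert call == 2 * add         # additions are half the calls
    assert if_ == ret == call + 1  # counting the initial invocation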
Now let's look at the space:
It is true that the maximum space needed to process a tree equals the maximum depth of the tree, i.e. the distance from the root to its most distant leaf. This is because fib(n-2) is only called after fib(n-1) has returned, so at any moment only one root-to-leaf path is on the stack.
So the space needed by your program is of order n.

Is it possible to do a 'normalized' dense_rank() in hive?

I have a consumer table like so.
consumer | product | quantity
-------- | ------- | --------
a | x | 3
a | y | 4
a | z | 1
b | x | 3
b | y | 5
c | x | 4
What I want is a 'normalized' rank assigned to each consumer so that I can split the table easily into training and testing sets. I used dense_rank() in Hive and got the table below.
rank | consumer | product | quantity
---- | -------- | ------- | --------
1 | a | x | 3
1 | a | y | 4
1 | a | z | 1
2 | b | x | 3
2 | b | y | 5
3 | c | x | 4
This is all well and good, but I want this to scale to any number of consumers, so I would ideally like the ranks to range between 0 and 1, like so:
rank | consumer | product | quantity
---- | -------- | ------- | --------
0.33 | a | x | 3
0.33 | a | y | 4
0.33 | a | z | 1
0.67 | b | x | 3
0.67 | b | y | 5
1 | c | x | 4
This way, I'd always know the range of the ranks and could split the data in a standard way (rank <= 0.7 for training, rank > 0.7 for testing).
Is there a way to achieve this in hive?
Or, is there a different and better approach to my original issue of splitting the data?
I tried to do a select * where rank < 0.7*max(rank), but Hive says the MAX UDAF is not yet available in where clauses.
percent_rank
select percent_rank() over (order by consumer) as pr
,*
from mytable
;
+-----+----------+---------+----------+
| pr | consumer | product | quantity |
+-----+----------+---------+----------+
| 0.0 | a | z | 1 |
| 0.0 | a | y | 4 |
| 0.0 | a | x | 3 |
| 0.6 | b | y | 5 |
| 0.6 | b | x | 3 |
| 1.0 | c | x | 4 |
+-----+----------+---------+----------+
Note that percent_rank is computed as (rank - 1) / (rows - 1), so ties share a value and the first consumer always gets 0.0. For filtering you'll need a sub-query / CTE:
select *
from (select percent_rank() over (order by consumer) as pr
,*
from mytable
) t
where pr <= ...
;

Fast algorithm for simple data group

There are several billion rows like this:
id | type | groupId
---+------+--------
1 | a |
1 | b |
2 | a |
2 | c |
1 | a |
2 | d |
2 | a |
1 | e |
5 | a |
1 | f |
4 | a |
1 | b |
4 | a |
1 | t |
8 | a |
3 | c |
6 | a |
I need to assign a groupId to these rows: if two rows share the same id or the same type, they belong to the same group. The result should look like this:
id | type | group
---+------+--------
1 | a | 1
1 | b | 1
2 | a | 1
2 | c | 1
1 | a | 1
2 | d | 1
2 | a | 1
1 | e | 1
5 | a | 1
1 | f | 1
4 | a | 1
1 | b | 1
4 | a | 1
7 | t | 2
8 | g | 3
3 | c | 1
6 | a | 1
I tried to use a loop to do this, but it is very inefficient; it would need several weeks to finish.
This is a classic example where you can use a Quick-Union algorithm.
Computational Limits
Time complexity for grouping N rows: O(N log* N) with weighted quick-union plus path compression, where log* N is the number of times you must take the log of a number until reaching 1, e.g. log* 10^100 = 5.
Space complexity: O(N)
Read more on this algorithm:
https://www.youtube.com/watch?v=MaNCMWhYIHo ,
https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
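For illustration, here is a minimal Python sketch of the idea (my own; it assumes the (id, type) pairs fit in memory — for billions of rows you would run the same pass per partition and then merge the partial groups): every distinct id and every distinct type becomes a node, each row unions its id node with its type node, and the resulting root labels the group.

# Weighted quick-union with path compression, applied to the grouping
# in the question: rows sharing an id or a type get the same group.

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression (halving)
        x = parent[x]
    return x

def union(parent, size, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return
    if size[ra] < size[rb]:  # weighting: hang the smaller tree below
        ra, rb = rb, ra
    parent[rb] = ra
    size[ra] += size[rb]

def assign_groups(rows):
    """rows: list of (id, type) pairs; returns one group id per row."""
    parent, size = {}, {}
    for rid, rtype in rows:
        for node in (('id', rid), ('type', rtype)):
            if node not in parent:
                parent[node], size[node] = node, 1
        union(parent, size, ('id', rid), ('type', rtype))
    labels, result = {}, []
    for rid, rtype in rows:
        root = find(parent, ('id', rid))
        labels.setdefault(root, len(labels) + 1)
        result.append(labels[root])
    return result

# A few rows from the question:
print(assign_groups([(1, 'a'), (1, 'b'), (2, 'a'), (7, 't'), (8, 'g')]))
# -> [1, 1, 1, 2, 3]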

80% Rule Estimation Value in PL/SQL

Assume a range of values is inserted into a schema table, and at the end of the month I want to apply the following algorithm to these records (e.g. 2,500 rows of numeric values): sort the values ascending (from smallest to highest) and then find the value 80% of the way through the sorted column.
In my example, if each row increases by one starting from 1, the 80% value will be the value in row 2000 (= 2500 - 2500*20/100). This needs to be implemented in a procedure where the number of rows is not constant; for example, it can vary from 2,500 to 1,000,000 per month.
Hint: You can achieve this using Oracle's cumulative aggregate functions. For example, suppose your table looks like this:
MY_TABLE
+-----+----------+
| ID | QUANTITY |
+-----+----------+
| A | 1 |
| B | 2 |
| C | 3 |
| D | 4 |
| E | 5 |
| F | 6 |
| G | 7 |
| H | 8 |
| I | 9 |
| J | 10 |
+-----+----------+
At each row, you can sum the quantities so far using this:
SELECT
    id,
    quantity,
    SUM(quantity) OVER (ORDER BY quantity ROWS UNBOUNDED PRECEDING)
        AS cumulative_quantity_so_far
FROM
    MY_TABLE
Giving you:
+-----+----------+----------------------------+
| ID | QUANTITY | CUMULATIVE_QUANTITY_SO_FAR |
+-----+----------+----------------------------+
| A | 1 | 1 |
| B | 2 | 3 |
| C | 3 | 6 |
| D | 4 | 10 |
| E | 5 | 15 |
| F | 6 | 21 |
| G | 7 | 28 |
| H | 8 | 36 |
| I | 9 | 45 |
| J | 10 | 55 |
+-----+----------+----------------------------+
Hopefully this will help in your work.
Write a query using the percentile_disc function to solve your problem. Sounds like it does what you want.
An example would be
select percentile_disc(0.8) within group (order by the_value)
from my_table
