How to do a dynamic Range with Pentaho CDE Dial Component - pentaho-cde

I am trying to make the Pentaho CDE Dial Component show a dynamic range.
Here's an example. Let's say I am looking at how Alex performs relative to the total marks available, e.g. 50 out of 100.
I run a query that outputs the two numbers in two separate columns: 50 | 100.
Now I want the dial component to show a range of 0 to 100 with the pointer at 50.
But if tomorrow Alex gets 20 out of 70, I want the dial component to show a range of 0 to 70 with the pointer at 20.
Currently, my dial component picks the first number and uses it as the top of the range. For example, if Alex scores 20 out of 70, the dial shows a range of 0 to 20 and the pointer is at 20.
The numbers come from the query into a query component, which in turn pushes them to the dial component.
My intervals array looks like [0,33,66,100].
What am I missing here?
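To make the desired behaviour concrete, here is a minimal Python sketch of the arithmetic only. It is not CDE code (CDE dashboards are configured in JavaScript), and the function name and the returned keys are hypothetical, not part of the dial component's API. It assumes the interval boundaries should stay at the same fractions of the range (0, 1/3, 2/3, 1) while the maximum comes from the second query column.

```python
def dial_settings(score, total, fractions=(0.0, 1 / 3, 2 / 3, 1.0)):
    """Scale the interval boundaries to the current total (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Each boundary is a fixed fraction of the dynamic maximum.
    intervals = [round(total * f, 2) for f in fractions]
    return {"intervals": intervals, "value": score}


today = dial_settings(50, 100)
tomorrow = dial_settings(20, 70)
```

With these assumptions, the range follows the second column (100, then 70) and the pointer follows the first column (50, then 20).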

Related

Why doesn't Javers report the correct row(s) that was added when comparing two objects?

When comparing two objects of the same size, Javers compares 1-to-1. However, if a new change is added, such as a new row in one of the objects, the comparison reports changes that are NOT changes. Is it possible to have Javers ignore the addition/deletion for the sake of just comparing like objects?
Basically the indices get out of sync.
Row  Name  Age  Phone(Cell/Work)
1    Jo    20   123
2    Sam   25   133
3    Rick  30   152
4    Rick  30   145
New List
Row  Name  Age  Phone(Cell/Work)
1    Jo    20   123
2    Sam   25   133
3    Bill  30   170
4    Rick  30   152
5    Rick  30   145
Because Bill is added the new comparison result will say that Rows 4,5 have changed when they actually didn't.
Thanks.
I'm guessing that your 'rows' are objects representing rows in an Excel table and that you have mapped them as ValueObjects and put them into some list.
Since ValueObjects don't have their own identity, it's unclear, even for a human, what the actual change was. Take a look at your row 4:
Row Name Age Phone(Cell/Work)
before:
4 Rick 30 145
after:
4 Rick 30 152
Did you change Phone at row 4 from 145 to 152? Or maybe you inserted new data at row 4? How can we know?
We can't. By default, JaVers chooses the simplest answer, so it reports a value change at index 4.
If you don't care about the indices, you can change the list comparison algorithm from Simple to Levenshtein distance. See https://javers.org/documentation/diff-configuration/#list-algorithms
SIMPLE algorithm generates changes for shifted elements (in case when elements are inserted or removed in the middle of a list). On the contrary, Levenshtein algorithm calculates short and clear change list even in case when elements are shifted. It doesn’t care about index changes for shifted elements.
But I'm not sure whether Levenshtein is implemented for ValueObjects. If it is not implemented yet, it's a feature request for javers-core.
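The difference between an index-based comparison and an alignment-based one can be illustrated outside JaVers. The sketch below is Python, not JaVers code, and it uses the standard library's difflib.SequenceMatcher as a stand-in for an edit-distance style alignment (difflib is not a Levenshtein implementation, so treat it only as an analogy). The rows are the ones from the question, and the function names are made up for this example.

```python
from difflib import SequenceMatcher

old_rows = [("Jo", 20, 123), ("Sam", 25, 133), ("Rick", 30, 152), ("Rick", 30, 145)]
new_rows = [("Jo", 20, 123), ("Sam", 25, 133), ("Bill", 30, 170),
            ("Rick", 30, 152), ("Rick", 30, 145)]


def simple_diff(old, new):
    """Compare position by position, the way an index-based list diff does."""
    changes = []
    for i in range(max(len(old), len(new))):
        before = old[i] if i < len(old) else None
        after = new[i] if i < len(new) else None
        if before != after:
            changes.append((i, before, after))
    return changes


def aligned_diff(old, new):
    """Align the two lists first, then report only the non-matching runs."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


index_based = simple_diff(old_rows, new_rows)
alignment_based = aligned_diff(old_rows, new_rows)
```

Here the index-based comparison flags every position from the inserted row onward, while the aligned comparison reports a single insertion of the Bill row.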

How to display a 2 terms graph in Kibana

In my ElasticSearch DB I have 2 fields:
1. A running number (1, 2, 3, etc.)
2. A different number that depends on the first number (100, 50, 8, 4005, etc.)
Now, I want to create a graph where the first number is on Y axis, and the second number is on the X axis.
How can it be done in Kibana?
Thank you.
You can create a line chart, vertical bar chart, etc., and use a Histogram aggregation on the first field together with an appropriate metric aggregation (e.g. avg or sum) on the second field.
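Behind such a chart sits an Elasticsearch aggregation request. As a rough sketch, the Python snippet below builds the kind of request body that corresponds to a histogram on the first field with an average of the second field per bucket. The field names running_number and dependent_number, the aggregation names, and the helper function are placeholders, since the question does not give the real field names.

```python
import json


def histogram_avg_request(bucket_field, metric_field, interval=1):
    """Build a search body: histogram buckets with an avg sub-aggregation."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "per_bucket": {
                "histogram": {"field": bucket_field, "interval": interval},
                "aggs": {"avg_value": {"avg": {"field": metric_field}}},
            }
        },
    }


body = histogram_avg_request("running_number", "dependent_number")
print(json.dumps(body, indent=2))
```

In Kibana the same choice is made in the visualization editor: the histogram is the bucket on one axis and the average is the metric on the other.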

What data structure can we use to efficiently check for resource availability?

This question is asked on behalf of reddit user /u/Dasharg95.
I want to build a hotel room reservation system where each hotel room can be booked for an arbitrary set of time frames. A common query against the reservation data set is trying to figure out what rooms are available for a given time frame. Is there a data structure for the reservation data set that allows this kind of query to be performed efficiently?
For example, say, we have five rooms with the following occupation times:
room 1: 9:00 -- 12:00, 15:00 -- 18:00, 19:30 -- 20:00
room 2: 8:00 -- 9:30, 15:30 -- 17:30, 18:00 -- 20:00
room 3: 6:30 -- 7:00, 7:30 -- 8:15
room 4: 12:00 -- 20:00
room 5: 7:00 -- 14:15, 18:00 -- 21:55
I want a data structure for the occupation times that is reasonably space efficient and allows for the following queries to be performed with reasonable performance:
what times a given room is occupied for
what rooms are free for the entirety of a given time frame
The 2D array system can still be useful without heavy resource usage. The room number can be derived from the array index. For example, with index i standing for the room:
String[] roomStatus = {"taken", "not", "taken", "not", "taken"};
An index is the position of an element in the array.
The second element, "not", has index 1, because the first element has index 0. To get the room number, add 1 to the index: a hotel with just one room would call it "Room 1", not "Room 0". So index + 1 gives the room number.
If you store the times in a fixed-width format (xxxx.yyyy, where xxxx is the opening time and yyyy is the closing time), you can split an element in half with a substring call to get the first four or last four characters, and print a time by putting a colon in the middle, so xxxx becomes xx:xx.
It could be stored in a simple 1D array, like so:
String[] rooms = {"0900.1200", "1500.1800", "1930.2000"};
...... edit, just realised that those times would be for one room x( ...
So, to assign multiple times for one room you might want to use a formatting system - like:
// * = the next four digits are the opening time
// - = the next four digits are the closing time
So you could hold multiple times in one element, like: {"*0800-0930*1530-1730*1800-2000", ....}
It's extremely complicated, but this only uses one array, and the computer could use a while loop to check if there are more times after the closing time -> if there are none, move to the next element / set of times, and room number / index.
Once you cycle through all elements, the room check is finished.
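As a rough illustration of the encoding described above, here is a short Python sketch (the answer's own fragments are Java-style; Python is used here only for brevity). It assumes every slot is written as *HHMM-HHMM with zero-padded 24-hour times and that a booking ending at a given time does not block one starting at that same time. The function names are made up for the example.

```python
import re

# One slot looks like "*0800-0930": '*' before the start, '-' before the end.
SLOT_PATTERN = re.compile(r"\*(\d{4})-(\d{4})")

# Index i holds the bookings of room i + 1 (rooms 1 and 2 from the question).
rooms = ["*0900-1200*1500-1800*1930-2000", "*0800-0930*1530-1730*1800-2000"]


def parse_slots(encoded):
    """Return the (start, end) pairs packed into one array element."""
    return SLOT_PATTERN.findall(encoded)


def pretty(hhmm):
    """Insert the colon: '0930' becomes '09:30'."""
    return f"{hhmm[:2]}:{hhmm[2:]}"


def is_free(encoded, start, end):
    """True if [start, end) overlaps none of the room's slots.

    Zero-padded HHMM strings compare correctly as plain strings.
    """
    return all(end <= s or start >= e for s, e in parse_slots(encoded))


def free_rooms(rooms, start, end):
    """Room numbers (index + 1) that are free for the whole time frame."""
    return [i + 1 for i, encoded in enumerate(rooms) if is_free(encoded, start, end)]
```

Every query still scans all slots of all rooms, so this stays simple rather than fast.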
Suppose you want 15-minute intervals: then you would have 24 × 4 = 96 different intervals per day, the first running from 0:00 to 0:15. Put this in binary, one bit per interval, and with some added information to check that you selected the right room you could use 100 bits. Now you write functions to create the bit string and to decode it, and store the strings in an array. Done.
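The bit-string idea can be sketched in Python, where an ordinary integer serves as the bit string. This is a minimal sketch under a few assumptions that are not in the answer: one bit per 15-minute slot within a single day, end times that fall inside a slot (such as 21:55) are rounded up so the whole slot counts as occupied, and the room number is kept as a dictionary key instead of being packed into extra bits. All names are illustrative.

```python
SLOT_MINUTES = 15
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 96 slots per day


def to_slot(hhmm, round_up=False):
    """Convert 'H:MM' to a slot index, optionally rounding up to the next slot."""
    hours, minutes = map(int, hhmm.split(":"))
    total = hours * 60 + minutes
    if round_up:
        return -(-total // SLOT_MINUTES)  # ceiling division
    return total // SLOT_MINUTES


def interval_mask(start, end):
    """Integer with one bit set for every slot touched by [start, end)."""
    first, last = to_slot(start), to_slot(end, round_up=True)
    return ((1 << (last - first)) - 1) << first


def room_mask(bookings):
    """OR together the masks of all bookings of one room."""
    mask = 0
    for start, end in bookings:
        mask |= interval_mask(start, end)
    return mask


bookings = {
    1: [("9:00", "12:00"), ("15:00", "18:00"), ("19:30", "20:00")],
    2: [("8:00", "9:30"), ("15:30", "17:30"), ("18:00", "20:00")],
    3: [("6:30", "7:00"), ("7:30", "8:15")],
    4: [("12:00", "20:00")],
    5: [("7:00", "14:15"), ("18:00", "21:55")],
}
masks = {room: room_mask(slots) for room, slots in bookings.items()}


def free_rooms(masks, start, end):
    """Rooms whose occupied bits do not intersect the requested bits."""
    wanted = interval_mask(start, end)
    return [room for room, occupied in sorted(masks.items())
            if (occupied & wanted) == 0]
```

The availability check is a single bitwise AND per room, and each room costs 96 bits per day regardless of how many bookings it has.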

Suitable machine learning algorithm for column selection

I am new to machine learning. In my work I need a machine learning algorithm to select some columns out of many columns in a 2D matrix, depending on the spread of the data. Below is a sample of the 2D matrix:
400 700 4 1400
410 710 4 1500
416 716 4 1811
..............
410 710 4 1300
Previously I used the standard deviation to select columns, with threshold values on the spread of the data in each column. Observe that the 3rd column is constant and the last column varies tremendously. The 1st and 2nd columns also vary, but the spread of their data is small. Applying the standard deviation to each of the columns gives (sigma) = 10, 10, 0, 200 respectively.
I chose some experimental threshold values to discard columns. If (sigma) falls outside the threshold range, the corresponding column is discarded. I calculated those threshold values manually. Although this method is very simple, maintaining the threshold values is tedious because there are many columns.
For this reason I want to use a standard machine learning algorithm, or somehow make these threshold values adaptive, so that I don't have to hard-code them. Can anyone suggest an appropriate algorithm for this?
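The method described above can be sketched in Python as follows. The sketch uses only the four rows visible in the sample (the elided rows are not reconstructed), so its standard deviations will not match the figures quoted for the full data. The threshold bounds are arbitrary illustration values. The relative_spreads function is an added assumption and not something from the question: dividing each sigma by the column mean is one possible way to make a single threshold comparable across columns of different magnitude.

```python
from statistics import mean, pstdev

# Only the visible sample rows; the real matrix has more.
matrix = [
    [400, 700, 4, 1400],
    [410, 710, 4, 1500],
    [416, 716, 4, 1811],
    [410, 710, 4, 1300],
]


def column_spreads(rows):
    """Population standard deviation of every column."""
    return [pstdev(column) for column in zip(*rows)]


def select_columns(rows, low, high):
    """Keep the column indices whose sigma lies inside [low, high]."""
    return [i for i, sigma in enumerate(column_spreads(rows)) if low <= sigma <= high]


def relative_spreads(rows):
    """Sigma divided by the absolute column mean (assumed normalization)."""
    result = []
    for column in zip(*rows):
        centre = mean(column)
        result.append(pstdev(column) / abs(centre) if centre else 0.0)
    return result


kept = select_columns(matrix, low=1, high=50)
```

With these illustration bounds, the constant third column and the widely varying fourth column are discarded, and the first two columns are kept.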

Calculating an average metric in GoodData

Based on GoodData's excellent suggestion for implementing Fact tables, I have been able to design a model that meets our client’s requirements for joining different attributes across different tables. The issue I have now is that the model metrics are highly denormalized, with data repeating itself. I am currently trying to figure out a way to dedupe results.
For example, I have two tables—the first is a NAMES table and the second is my fact table:
NAMES
Val2  Name
35    John
36    Bill
37    Sally
FACT
VAL1  VAL2  SCORE  COURSEGRADE
1     35    50     90%
2     35    50     80%
3     35    50     60%
4     36    10     75%
5     37    40     95%
What I am trying to do is write a metric that gives an average of SCORE with the duplicate values eliminated. GoodData is excellent in that it can actually give me back the unique results using the COUNT(VARIABLE1,RECORD) metric, but I can’t seem to get the average score to stick when eliminating the breakout information. If I keep all fields (including VAL2), it shows me everything:
VAL2 SCORE(AVG)
35 50
36 10
37 40
AVG: 33.33
But when I remove VAL2, I suddenly lose the "uniqueness" of the record.
SCORE(AVG)
40
What I want is the score of 33.33 we got above.
I’ve tried using a BY statement in my SELECT AVG(SCORE), but this doesn’t seem to work. It’s almost like I need some kind of DISTINCT clause. Any thoughts on how to get that rollup value shown in my first example above?
Happy to help here. I would try the following:
Create an intermediate metric (let's call it Score by Employee):
SELECT MIN( SCORE ) BY ID ALL IN ALL OTHER DIMENSIONS
Then, once you have this metric defined you should be able to create a metric for the average score as follows:
SELECT AVG( Score by Employee )
The reason we create the first metric is to force the table to normalize score around the ID attribute which gets rid of duplicates when we use this in the next metric (we could have used MAX or AVG also, it doesn't matter).
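The arithmetic behind the two metrics can be checked with a small Python sketch. It is not MAQL and does not use any GoodData API. It only replays the FACT table from the question, takes VAL2 as the ID attribute (an assumption, since the answer does not say which column ID refers to), and uses made-up function names.

```python
from statistics import mean

fact_rows = [
    {"VAL1": 1, "VAL2": 35, "SCORE": 50},
    {"VAL1": 2, "VAL2": 35, "SCORE": 50},
    {"VAL1": 3, "VAL2": 35, "SCORE": 50},
    {"VAL1": 4, "VAL2": 36, "SCORE": 10},
    {"VAL1": 5, "VAL2": 37, "SCORE": 40},
]


def naive_average(rows):
    """Average over every fact row, so repeated scores are counted repeatedly."""
    return mean([row["SCORE"] for row in rows])


def score_by_id(rows):
    """Intermediate step: one score per VAL2, taking the minimum of its rows."""
    per_id = {}
    for row in rows:
        key = row["VAL2"]
        per_id[key] = min(per_id.get(key, row["SCORE"]), row["SCORE"])
    return per_id


def deduplicated_average(rows):
    """Average of the per-ID scores, so each ID counts once."""
    return mean(list(score_by_id(rows).values()))
```

The naive average is 200 / 5 = 40, the value seen when VAL2 is removed from the report, while the deduplicated average is 100 / 3 ≈ 33.33, the rollup the question is after.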
Hopefully this solves your issue, let me know if it doesn't work and I'll be happy to help out more. Also feel free to check out GoodData's Developer Portal for more information about reporting:
https://developer.gooddata.com/docs/reporting
Best,
JT
You should definitely check the "How to build a metric in a metric" presentation by Petr Olmer (http://www.slideshare.net/petrolmer/in10-how-to-build-a-metric-in-a-metric).
It can help you to understand it better.
Cheers,
Peter
