gnuplot gives wrong results from stats matrix

Suppose that I have the file data.dat with the following content:
Days 1 2 4 6 10 15 20 30
Group 01 37.80 30.67 62.88 86.06 26.24 98.49 65.42 61.28
Group 02 38.96 72.99 38.24 74.11 39.54 91.59 81.14 91.22
Group 03 82.34 75.25 82.58 28.22 39.21 81.30 41.30 42.48
Group 04 75.52 42.83 66.80 20.50 94.08 74.78 95.09 53.16
Group 05 89.32 56.78 30.05 68.07 59.18 94.18 39.77 67.56
Group 06 70.03 78.71 37.59 60.55 46.40 82.73 67.34 93.38
Group 07 67.83 88.73 48.01 62.19 49.40 67.68 25.97 58.98
Group 08 61.15 96.06 59.62 39.42 60.06 94.18 76.06 32.02
Group 09 65.61 72.39 54.07 92.79 56.58 39.14 81.81 39.16
Group 10 59.65 77.81 40.51 68.49 66.15 80.33 87.31 42.07
The end goal is to create a histogram using the clustered histogram style.
Besides the graph, I need some values from data.dat, such as
size_x, size_y, min, max, and mean. To get these values I used
set datafile separator tab
stats 'data.dat' skip 1 matrix
The abridged output was:
* MATRIX: [9 X 10]
Minimum: 0.0000 [ 0 0 ]
Maximum: 98.4900 [ 6 0 ]
Mean: 56.0549
The size_x and size_y values are correct – 9 columns and 10 rows – but the min is not.
This is due to the fact that the first column is string-type.
When I include every
set datafile separator tab
stats 'data.dat' skip 1 matrix every ::1
to skip the first column, the abridged output is:
* MATRIX: [9 X 8]
Minimum: 20.5000 [ 0 3 ]
Maximum: 98.4900 [ 5 0 ]
Mean: 63.0617
This time the min and max values are right, but size_y (shown 8, expected 9) and the index of the min (expected [ 3 3 ]) are not.
What is going on? Did I make a mistake? Am I missing something?

The program tries to read a value from the first field of each row, sees "Group xx" and ends up filling in 0 for that entry. You need to tell it to skip the first column.
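The zero-fill explanation can be cross-checked outside gnuplot. The following is a minimal Python sketch, not gnuplot's implementation: the helper name summarize and the 0-based (row, column) convention are choices made for this illustration. It uses the values from the question and compares the statistics with and without a leading 0 standing in for each "Group xx" label.

```python
# Numeric part of data.dat from the question (header row and label column removed).
VALUES = [
    [37.80, 30.67, 62.88, 86.06, 26.24, 98.49, 65.42, 61.28],
    [38.96, 72.99, 38.24, 74.11, 39.54, 91.59, 81.14, 91.22],
    [82.34, 75.25, 82.58, 28.22, 39.21, 81.30, 41.30, 42.48],
    [75.52, 42.83, 66.80, 20.50, 94.08, 74.78, 95.09, 53.16],
    [89.32, 56.78, 30.05, 68.07, 59.18, 94.18, 39.77, 67.56],
    [70.03, 78.71, 37.59, 60.55, 46.40, 82.73, 67.34, 93.38],
    [67.83, 88.73, 48.01, 62.19, 49.40, 67.68, 25.97, 58.98],
    [61.15, 96.06, 59.62, 39.42, 60.06, 94.18, 76.06, 32.02],
    [65.61, 72.39, 54.07, 92.79, 56.58, 39.14, 81.81, 39.16],
    [59.65, 77.81, 40.51, 68.49, 66.15, 80.33, 87.31, 42.07],
]


def summarize(matrix):
    """Return (n_cols, n_rows, mean, min_cell, max_cell).

    min_cell and max_cell are (value, row, col) tuples with 0-based indices.
    """
    cells = [(v, r, c) for r, row in enumerate(matrix) for c, v in enumerate(row)]
    mean = sum(v for v, _, _ in cells) / len(cells)
    return len(matrix[0]), len(matrix), mean, min(cells), max(cells)


# Assumption for this sketch: a non-numeric label is read as 0.0.
with_labels = [[0.0] + row for row in VALUES]

labels_stats = summarize(with_labels)   # 9 columns x 10 rows, minimum is the filler 0
values_stats = summarize(VALUES)        # 8 columns x 10 rows, numeric data only

print(labels_stats)
print(values_stats)
```

With the filler zeros included, the mean comes out near 56.05 and the minimum is 0, which matches the first stats run in the question. Without them, the mean is near 63.06 and the minimum is 20.5 in row 3, column 3 (0-based).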
Amended answer
I think there is a bug here, as well as confusion between documentation and the actual implementation. The matrix rows and columns as implemented by the every selector are indexed from 0 to N-1 as they would be for C language arrays. The documentation incorrectly states or at least implies that the first row and column is matrix[1][1] rather than [0][0]. So the full command needed for your case is
gnuplot> set datafile sep tab
gnuplot> stats 'data.dat' every 1:1:1:1 matrix
warning: matrix contains missing or undefined values
* FILE:
Records: 80
Out of range: 0
Invalid: 0
Header records: 0
Blank: 10
Data Blocks: 1
* MATRIX: [9 X 8]
Mean: 63.0617
Std Dev: 20.6729
Sample StdDev: 20.8033
Skewness: -0.1327
Kurtosis: 1.9515
Avg Dev: 17.4445
Sum: 5044.9400
Sum Sq.: 352332.2181
Mean Err.: 2.3113
Std Dev Err.: 1.6343
Skewness Err.: 0.2739
Kurtosis Err.: 0.5477
Minimum: 20.5000 [ 0 3 ]
Maximum: 98.4900 [ 5 0 ]
I.e. every 1:1:1:1 tells it for both rows and columns the index increment is 1 and the submatrix starts at [1][1] rather than at the origin [0][0].
The output values are all correct, but the indices shown for the size [9 x 8] and the min/max entries are wrong. I will file a bug report for both issues.

I got sidetracked trying to characterize the bug revealed by the original answer and forgot to mention a simpler alternative. For this specific case of one row of column headers and one column of rowheaders, gnuplot provides a special syntax that works without error:
set datafile separator tab
stats 'data.dat' matrix rowheaders columnheaders

Related

How to extract vectors from a given condition matrix in Octave

I have a matrix with two columns and I'm trying to extract vectors from it. The first column is the data that I want to group into vectors, while the second column says which group each value belongs to.
A =
1 1
2 1
7 2
9 2
7 3
10 3
13 3
1 4
5 4
17 4
1 5
6 5
The result that I am looking for is
A1 =
1
2
A2 =
7
9
A3 =
7
10
13
A4=
1
5
17
A5 =
1
6
As an illustration, I used the eval function, but it didn't give the results I wanted.
Assuming that you don't actually need individually named separate variables, the following will put the values into separate cells of a cell array, each of which can be an arbitrary size and can then be retrieved using cell index syntax. It makes use of logical indexing so that each iteration of the for loop assigns to that cell in B just the values from the first column of A that have the correct number in the second column of A.
num_cells = max (A(:,2));      % number of groups
B = cell (num_cells,1);        % pre-allocate one cell per group
for idx = 1:num_cells
  B{idx} = A(A(:,2)==idx, 1);  % values from column 1 whose group is idx
end
B =
{
[1,1] =
1
2
[2,1] =
7
9
[3,1] =
7
10
13
[4,1] =
1
5
17
[5,1] =
1
6
}
Cell arrays are accessed a bit differently than normal numeric arrays. Array indexing (with ()) will return another cell, e.g.:
>> B(1)
ans =
{
[1,1] =
1
2
}
To get the contents of the cell so that you can work with them like any other variable, index them using {}.
>> B{1}
ans =
1
2
How it works:
Use max(A(:,2)) to find out how many array elements are going to be needed. A(:,2) uses subscript notation to indicate every value of A in column 2.
Create an empty cell array B with the right number of cells to contain the separated parts of A. This isn't strictly necessary, but with large amounts of data, things can slow down a lot if you keep adding on to the end of an array. Pre-allocating is usually better.
For each iteration of the for loop, it determines which elements in the 2nd column of A have the value matching the value of idx. This returns a logical array. For example, for the third time through the for loop, idx = 3, and:
>> A_index3 = A(:,2)==3
A_index3 =
0
0
0
0
1
1
1
0
0
0
0
0
That is a logical array of true/false values indicating which elements equal 3. You are allowed to mix logical and subscript indexing. Using this, we can retrieve just those values from the first column:
A(A_index3, 1)
ans =
7
10
13
We get the same result if we do it in a single line without the A_index3 intermediate placeholder:
>> A(A(:,2)==3, 1)
ans =
7
10
13
Putting it in a for loop where 3 is replaced by the loop variable idx, and we assign the answer to the idx location in B, we get all of the values separated into different cells.
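For readers who want to see the same grouping idea outside Octave, here is a minimal Python sketch. It is a translation for illustration only, not part of the answer above: the function name split_by_group is hypothetical, and a dictionary keyed by group number stands in for the cell array B.

```python
# The matrix A from the question as (value, group) pairs.
A = [
    (1, 1), (2, 1),
    (7, 2), (9, 2),
    (7, 3), (10, 3), (13, 3),
    (1, 4), (5, 4), (17, 4),
    (1, 5), (6, 5),
]


def split_by_group(pairs):
    """Collect the first-column values under their second-column group number."""
    groups = {}
    for value, group in pairs:
        # Create the list for a group the first time it is seen, then append.
        groups.setdefault(group, []).append(value)
    return groups


B = split_by_group(A)
print(B[3])  # values whose group number is 3
```

Unlike the Octave loop, this makes a single pass over the rows instead of one logical mask per group, and it does not require the group numbers to be consecutive.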

Finding maximum and minimum value in a matrix column using Octave

I have a 10 x 2 sample matrix as follows
2104 3
1600 3
2400 3
1416 2
3000 4
1985 4
1534 3
1427 3
1380 3
1494 3
I need a generalized method to find the minimum and maximum value in a column.
I can use
max(max(X)) to find the maximum value in a matrix, but not of a column.
Also, max(min(X)) to find the minimum value is not a generalized solution.
Given a matrix X, max(X) will return the maximum value in each column. You can index the result to get the value for a given column:
max(X)(1) % max of the first column (doesn't work in MATLAB)
Alternatively, extract the column and get its max:
max(X(:,1)) % max of the first column
max (and many similar functions) operate on columns by default. To get the maximum of each row, use max(X,[],2).
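The column-wise versus row-wise distinction can also be sketched in plain Python, using the sample matrix from the question. This is an illustrative translation, not Octave code, and the variable names col_max, col_min and row_max are made up for the example.

```python
# The 10 x 2 sample matrix from the question.
X = [
    [2104, 3],
    [1600, 3],
    [2400, 3],
    [1416, 2],
    [3000, 4],
    [1985, 4],
    [1534, 3],
    [1427, 3],
    [1380, 3],
    [1494, 3],
]

columns = list(zip(*X))                   # transpose: one tuple per column

col_max = [max(col) for col in columns]   # like max(X): one value per column
col_min = [min(col) for col in columns]   # like min(X)
row_max = [max(row) for row in X]         # like max(X,[],2): one value per row

print(col_max, col_min)
```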

MATLAB: finding a row index in a matrix

I have a matrix and I want to find the maximum value in each column, then find the index of the row of that maximum value.
A = magic(5)
A =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
[~,colind] = max(max(A))
colind =
3
returns colind as the column index that contains the maximum value. If you want the row:
[~,rowind] = max(A);   % row index of the maximum in each column
rowind(colind)         % row of the overall maximum
ans =
5
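You can use a fairly simple loop to do this.
MaximumVal = -Inf;   % start below any possible element
Indices = 0;
for i = 1:numel(array)
    if array(i) > MaximumVal
        MaximumVal = array(i);
        Indices = i;   % linear index of the largest element so far
    end
end
MaximumVal
Indices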
Another way to do this would be to use find. You can output the row and column of the maximum element immediately without invoking max twice as per your question. As such, do this:
% Define your matrix
A = ...;
% Find row and column location of where the maximum value is
[maxrow,maxcol] = find(A == max(A(:)));
Also, note that if several elements share the same maximum, this returns the rows and columns of all of them, so unlike max it is not limited to a single location.
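The two ideas in these answers (the row of the maximum in each column, and every location of the overall maximum) can be sketched in plain Python with the magic(5) matrix from the question. This is an illustration, not MATLAB code: the function names are hypothetical, and 1 is added to each index so the results line up with MATLAB's 1-based indexing.

```python
# magic(5) as printed in the question.
A = [
    [17, 24, 1, 8, 15],
    [23, 5, 7, 14, 16],
    [4, 6, 13, 20, 22],
    [10, 12, 19, 21, 3],
    [11, 18, 25, 2, 9],
]


def column_argmax_rows(matrix):
    """1-based row index of the first maximum in each column."""
    return [col.index(max(col)) + 1 for col in zip(*matrix)]


def all_max_positions(matrix):
    """1-based (row, col) of every element equal to the overall maximum."""
    biggest = max(max(row) for row in matrix)
    return [
        (r + 1, c + 1)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value == biggest
    ]


rowind = column_argmax_rows(A)
positions = all_max_positions(A)
print(rowind, positions)
```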

Sum up custom grand total on crosstab in BIRT

I have a crosstab and create custom grand total for the row level in each column dimension, by using a data element expression.
Crosstab Example:
             Cat 1                    Cat 2              GT
ITEM    C    F     %    VALUE    C    F     %    VALUE
A      101    0   0.9    10     112  105  93.8    10      20
B      294    8   2.7     6      69   66  95.7    10      16
C      211    7   3.3     4     212  161  75.9     6      10
------------------------------------------------------------------
GT     606   15   2.47    6     393  332  84.5     8    **14**
Explanation for GT row:
1. The C and F columns are summed from the rows above, but the % column is the result of dividing F by C.
2. A data element fills the VALUE column. Its value comes from a range definition that varies for each Cat (category). For instance, in Cat 1, a value between 0 and 1 gives 10, between 1 and 2 gives 8, etc. In Cat 2, a value between 85 and 100 gives 10, between 80 and 85 gives 8, etc.
3. The GT value in the GT row (14) is obtained by adding the VALUE of Cat 1 and the VALUE of Cat 2.
I am able to do points 1 and 2 above, but I can't get it working for the GT row. I don't know the code/expression to sum the VALUE data element across these 2 categories, because the VALUE fields come from a single data element in design mode.
I have found the solution to my problem. I can show the result by using report variables. I assign 2 report variables in the % field expression, based on the category in the data cube dimension (using an if statement). Then, in the data element expression, I reference both of them and add them.

Sparse Matrices Storage formats - Conversion

Is there an efficient way of converting a sparse matrix in Compressed Row Storage (CRS) format to Coordinate List (COO) format?
Have a look at Yousef Saad's library SPARSKIT -- he has subroutines to convert back and forth between compressed sparse row and coordinate formats, as well as several other sparse matrix storage schemes.
Anyhow, to see how to get the coordinate format from the compressed one, it's easiest to consider how you could have come up with the compressed row format in the first place. Say you have a sparse matrix in COO, where you've put everything in order, for example
rows: 1 1 1 1 2 2 2 2 2 3 3 3 ...
cols: 1 3 5 9 2 3 7 9 11 1 2 3 ...
So the non-zero entries in row 1 are (1,1), (1,3), (1,5), (1,9) and so forth. You're storing a lot of redundant data in the array of rows; you can instead just have an array ia such that ia(i) tells you the starting address in the array cols for row i. In our example above, we would then have
ia : 1 5 10 ...
cols: 1 3 5 9 2 3 7 9 11 1 2 3 ...
To go from COO to CSR, we just use the fact that
ia(i+1) = ia(i) + number of non-zero entries in row i
for any i. Knowing that, you can work backwards to get the COO format from CSR.
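Working backwards can be sketched as follows. This is a minimal Python illustration, not SPARSKIT's routine: the function names are hypothetical, indices are kept 1-based to match the text, and the example assumes the matrix has exactly the 12 entries shown above, so that ia ends with a final pointer of 13 (one past the last entry).

```python
def csr_row_pointers_to_coo_rows(ia):
    """Expand 1-based CSR row pointers into one row index per stored entry."""
    rows = []
    for i in range(len(ia) - 1):
        count = ia[i + 1] - ia[i]      # non-zero entries in row i + 1
        rows.extend([i + 1] * count)   # repeat the 1-based row number
    return rows


def coo_rows_to_csr_row_pointers(rows, n_rows):
    """Inverse direction: build 1-based row pointers from sorted row indices."""
    counts = [0] * n_rows
    for r in rows:
        counts[r - 1] += 1
    ia = [1]
    for c in counts:
        ia.append(ia[-1] + c)          # ia(i+1) = ia(i) + entries in row i
    return ia


# Assumed complete version of the truncated example: 3 rows, 12 entries.
ia = [1, 5, 10, 13]
rows = csr_row_pointers_to_coo_rows(ia)
print(rows)
```

The cols array and the array of values are the same in both formats when the COO entries are sorted by row, so only the row information needs converting. The cost is linear in the number of stored entries plus the number of rows.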

Resources