Discrepancy between classification report and confusion matrix

Maybe I'm reading the classification report or the confusion matrix wrong (or both!), but after training my classifier and running it on my test set, I get the following report:
            precision    recall  f1-score   support
          0      0.71      0.67      0.69      5086
          1      0.64      0.54      0.59      2244
          2      0.42      0.25      0.31       598
          3      0.65      0.22      0.33       262
          4      0.53      0.42      0.47       266
          5      0.42      0.15      0.22       466
          6      0.35      0.25      0.29       227
          7      0.07      0.05      0.06       127
          8      0.39      0.14      0.21       376
          9      0.35      0.25      0.29       167
         10      0.25      0.14      0.18       229
avg / total      0.61      0.52      0.55     10048
Which is good and all, but when I create my confusion matrix:
0 1 2 3 4 5 6 7 8 9 10
[[4288 428 80 16 44 58 33 38 47 21 33]
[ 855 1218 54 8 12 17 25 15 15 12 13]
[ 291 72 147 1 12 10 20 2 2 17 24]
[ 173 21 3 57 1 3 0 1 1 1 1]
[ 102 20 4 0 113 0 0 6 4 9 8]
[ 331 40 10 3 7 68 3 0 2 1 1]
[ 104 30 17 0 1 0 56 2 1 10 6]
[ 85 19 4 2 5 0 2 6 4 0 0]
[ 270 29 4 1 6 2 2 7 53 1 1]
[ 63 17 11 0 8 3 14 1 1 42 7]
[ 138 13 19 0 5 2 7 3 6 5 31]]
Am I wrong in assuming that it correctly predicted 4288 of the 5086 samples with class label 0, which should give a recall of 84.3% (0.843)? That's not the number the report gives. The precision seems wrong as well, unless I'm making a mistake when I compare the correct predictions (4288) with the sum of the rest of column 0, which gives 0.563, not 0.71.
What am I misunderstanding?
It might be worth noting that I use sklearn's classification_report and confusion_matrix for these.
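To check the arithmetic, here is a minimal sketch that recomputes per-class precision, recall and F1 directly from the printed matrix. It assumes NumPy is available and follows scikit-learn's convention that rows are true labels and columns are predicted labels. For class 0 it gives a recall of 4288/5086 ≈ 0.843, as you expected. The precision is 4288/6700 = 0.64, because column 0 sums to 6700.

The sketch also shows that the matrix implies an accuracy of about 0.605. For single-label data, the report's support-weighted recall (0.52) should equal that accuracy. The row sums match the report's support column exactly, yet none of the ratios do. A hedged reading is that the report and the matrix were not computed from the same y_true/y_pred arrays, or not from arrays in the same format. The question doesn't show enough code to confirm which.

```python
import numpy as np

# Confusion matrix copied from the question. In scikit-learn's convention,
# cm[i, j] counts samples whose true label is i and whose predicted label is j.
cm = np.array([
    [4288, 428, 80, 16, 44, 58, 33, 38, 47, 21, 33],
    [855, 1218, 54, 8, 12, 17, 25, 15, 15, 12, 13],
    [291, 72, 147, 1, 12, 10, 20, 2, 2, 17, 24],
    [173, 21, 3, 57, 1, 3, 0, 1, 1, 1, 1],
    [102, 20, 4, 0, 113, 0, 0, 6, 4, 9, 8],
    [331, 40, 10, 3, 7, 68, 3, 0, 2, 1, 1],
    [104, 30, 17, 0, 1, 0, 56, 2, 1, 10, 6],
    [85, 19, 4, 2, 5, 0, 2, 6, 4, 0, 0],
    [270, 29, 4, 1, 6, 2, 2, 7, 53, 1, 1],
    [63, 17, 11, 0, 8, 3, 14, 1, 1, 42, 7],
    [138, 13, 19, 0, 5, 2, 7, 3, 6, 5, 31],
])

tp = np.diag(cm)            # correct predictions per class (the diagonal)
support = cm.sum(axis=1)    # row sums: samples that truly belong to each class
predicted = cm.sum(axis=0)  # column sums: samples predicted as each class

recall = tp / support
precision = tp / predicted
f1 = 2 * precision * recall / (precision + recall)

for label in range(len(cm)):
    print(f"{label:>2}  {precision[label]:.2f}  {recall[label]:.2f}  "
          f"{f1[label]:.2f}  {support[label]}")

# With exactly one label per sample, support-weighted recall reduces to accuracy.
accuracy = tp.sum() / cm.sum()
print(f"accuracy: {accuracy:.3f}")
```

If the figures printed here still disagree with classification_report, passing the very same y_true and y_pred objects to both functions is a quick way to confirm they were built from the same data.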

Related

Is there a way to manually calculate entity total length from a dxf file?

I need the total length and width of the whole sheet, like below:
Length - 602
Width - 938
I checked the DXF file format reference document but didn't find any clue. Does anyone know how to do this calculation, or have code for it?
0
SECTION
2
HEADER
9
$ACADVER
1
AC1024
9
$ACADMAINTVER
70
109
9
$DWGCODEPAGE
3
ANSI_1252
9
$LASTSAVEDBY
1
haresh.patel
9
$INSBASE
10
0.0
20
0.0
30
0.0
9
$EXTMIN
10
-93.00730087511951
20
-51.69399222749615
30
0.0
9
$EXTMAX
10
1072.189249192752
20
688.3170963457955
30
0.0
9
$LIMMIN
10
0.0
20
0.0
9
$LIMMAX
10
12.0
20
9.0
9
$ORTHOMODE
70
1
9
$REGENMODE
70
1
9
$FILLMODE
70
1
9
$QTEXTMODE
70
0
9
$MIRRTEXT
70
1
9
$LTSCALE
40
1.0
9
$ATTMODE
70
1
9
$TEXTSIZE
40
0.2
9
$TRACEWID
40
0.05
9
$TEXTSTYLE
7
Standard
9
$CLAYER
8
0
9
$CELTYPE
6
ByLayer
9
$CECOLOR
62
256
9
$CELTSCALE
40
1.0
9
$DISPSILH
70
0
9
$DIMSCALE
40
1.0
9
$DIMASZ
40
1.0
9
$DIMEXO
40
0.0
9
$DIMDLI
40
0.0
9
$DIMRND
40
0.0
9
$DIMDLE
40
0.0
9
$DIMEXE
40
0.0
9
$DIMTP
40
0.0
9
$DIMTM
40
0.0
9
$DIMTXT
40
8.0
9
$DIMCEN
40
2.5
9
$DIMTSZ
40
0.0
9
$DIMTOL
70
0
9
$DIMLIM
70
0
9
$DIMTIH
70
0
9
$DIMTOH
70
1
9
$DIMSE1
70
0
9
$DIMSE2
70
0
9
$DIMTAD
70
1
9
$DIMZIN
70
8
9
$DIMBLK
1
9
$DIMASO
70
1
9
$DIMSHO
70
1
9
$DIMPOST
1
9
$DIMAPOST
1
9
$DIMALT
70
0
9
$DIMALTD
70
3
9
$DIMALTF
40
0.03937007874016
9
$DIMLFAC
40
1.0
9
$DIMTOFL
70
1
9
$DIMTVP
40
0.0
9
$DIMTIX
70
0
9
$DIMSOXD
70
0
9
$DIMSAH
70
0
9
$DIMBLK1
1
9
$DIMBLK2
1
9
$DIMSTYLE
2
PRAKASHDIM
9
$DIMCLRD
70
2
9
$DIMCLRE
70
2
9
$DIMCLRT
70
4
9
$DIMTFAC
40
1.0
9
$DIMGAP
40
3.0
9
$DIMJUST
70
0
9
$DIMSD1
70
0
9
$DIMSD2
70
0
9
$DIMTOLJ
70
0
9
$DIMTZIN
70
8
9
$DIMALTZ
70
0
9
$DIMALTTZ
70
0
9
$DIMUPT
70
0
9
$DIMDEC
70
1
9
$DIMTDEC
70
2
9
$DIMALTU
70
2
This is only a small part of the DXF file.
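One place to start is the HEADER variables $EXTMIN and $EXTMAX, which appear in the excerpt above. They hold the lower-left and upper-right corners of the drawing extents. Below is a minimal sketch, not a full DXF reader. The helper names read_header_points and drawing_extents are hypothetical. The parser assumes an ASCII DXF where every group-code line is followed by exactly one value line.

For the values shown in the question this gives roughly 1165.20 × 740.01 drawing units, which is not the expected 602 × 938. The wanted size therefore presumably belongs to one particular entity, such as a sheet border, rather than to everything in the drawing. Stored header extents are also not guaranteed to be tight, so computing a bounding box from the entities themselves may be more reliable.

```python
def read_header_points(lines):
    """Return {"$VAR": {"10": x, "20": y, "30": z}} for point-valued HEADER variables.

    Assumes a DXF file is a flat sequence of (group code, value) line pairs; in the
    HEADER section, group code 9 introduces a variable name and codes 10/20/30
    carry its X/Y/Z coordinates.
    """
    header = {}
    current = None
    for i in range(0, len(lines) - 1, 2):
        code, value = lines[i].strip(), lines[i + 1].strip()
        if code == "0" and value == "ENDSEC":
            break  # end of the HEADER section
        if code == "9":
            current = value
        elif current is not None and code in ("10", "20", "30"):
            header.setdefault(current, {})[code] = float(value)
    return header


def drawing_extents(lines):
    """Width (X) and height (Y) of the box spanned by $EXTMIN and $EXTMAX."""
    header = read_header_points(lines)
    ext_min, ext_max = header["$EXTMIN"], header["$EXTMAX"]
    width = ext_max["10"] - ext_min["10"]
    height = ext_max["20"] - ext_min["20"]
    return width, height


# Minimal HEADER built from the $EXTMIN/$EXTMAX values in the question.
sample = """0
SECTION
2
HEADER
9
$EXTMIN
10
-93.00730087511951
20
-51.69399222749615
30
0.0
9
$EXTMAX
10
1072.189249192752
20
688.3170963457955
30
0.0
0
ENDSEC
""".splitlines()

width, height = drawing_extents(sample)
print(f"width (X): {width:.2f}, height (Y): {height:.2f}")
```

To run it on a real file, pass open(path).read().splitlines() instead of the inline sample.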

Xcode 11 console spew

I've noticed some strange spew in the console after updating to Xcode 11.
Has anyone else seen this, or does anyone know what the issue might be?
0000000A: 0100 4 4 319
00000016: 0101 4 4 398
00000022: 0102 3 6 110
0000002E: 011A 5 8 116
0000003A: 011B 5 8 124
00000046: 0128 3 2 3
00000052: 0131 2 13 132
0000005E: 0132 2 20 146
000000A8: 0100 4 4 205
000000B4: 0101 4 4 256
000000C0: 0102 3 6 268
000000CC: 0103 3 2 6
000000D8: 0106 3 2 6
000000E4: 0115 3 2 3
000000F0: 0201 4 4 274
000000FC: 0202 4 4 7301
etc

remove lines with awk when the value is smaller than

Hello, I have this input:
10579 5 2.0 5 100 0 20 0 80 0 20 0.72
10586 5 2.0 5 100 0 20 0 40 20 40 1.52
10856 4 3.2 4 100 0 26 0 69 30 0 0.89
11049 6 12.2 6 65 26 48 14 36 49 0 1.43
11041 2 26.0 2 70 37 20 8 43 47 0 1.34
11012 5 3.0 5 90 9 25 0 56 43 0 0.99
11041 5 3.0 5 9 25 0 56 43 0 0.99
11096 6 2.2 6 100 0 26 15 30 53 0 1.42
11194 1 28.0 1 93 6 51 0 3 96 0 0.22
11236 5 2.4 5 0 24 0 41 58 0 0.98
11981 1 10.0 1 50 18 15 0 9 0 90 0.44
12184 5 2.2 5 100 0 22 18 0 54 27 1.44
12482 4 2.5 4 100 0 20 20 0 80 0 0.72
12627 5 2.2 5 100 0 22 18 0 81 0 0.68
I want to remove lines where column 5 is smaller than 85. I can do that with awk (awk '$5 > 85'), but I want to keep lines where $5 is empty (lines 7 and 10). So my output should look like this:
10579 5 2.0 5 100 0 20 0 80 0 20 0.72
10586 5 2.0 5 100 0 20 0 40 20 40 1.52
10856 4 3.2 4 100 0 26 0 69 30 0 0.89
11012 5 3.0 5 90 9 25 0 56 43 0 0.99
11041 5 3.0 5 9 25 0 56 43 0 0.99
11096 6 2.2 6 100 0 26 15 30 53 0 1.42
11194 1 28.0 1 93 6 51 0 3 96 0 0.22
11236 5 2.4 5 0 24 0 41 58 0 0.98
12184 5 2.2 5 100 0 22 18 0 54 27 1.44
12482 4 2.5 4 100 0 20 20 0 80 0 0.72
12627 5 2.2 5 100 0 22 18 0 81 0 0.68
awk '$5 > 85' also removes those lines. Any help? Thanks
You have to set your field delimiter to a single space; otherwise awk can't tell which field is missing.
$ awk -F' ' '$5>85' file
10579 5 2.0 5 100 0 20 0 80 0 20 0.72
10586 5 2.0 5 100 0 20 0 40 20 40 1.52
10856 4 3.2 4 100 0 26 0 69 30 0 0.89
11012 5 3.0 5 90 9 25 0 56 43 0 0.99
11096 6 2.2 6 100 0 26 15 30 53 0 1.42
11194 1 28.0 1 93 6 51 0 3 96 0 0.22
12184 5 2.2 5 100 0 22 18 0 54 27 1.44
12482 4 2.5 4 100 0 20 20 0 80 0 0.72
12627 5 2.2 5 100 0 22 18 0 81 0 0.68
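The answer's idea is to split on exactly one space so that an empty field survives. Here is a minimal Python sketch of that idea. It assumes, as the question implies but the pasted data no longer shows, that a missing value appears as two consecutive spaces. The sample rows and the keep_row name are illustrative only.

str.split(" ") keeps the empty string between two spaces. str.split() with no argument collapses runs of whitespace, just like awk's default field splitting. An empty field also has to be tested explicitly, because it is not greater than 85.

```python
def keep_row(line: str, col: int = 5, threshold: float = 85.0) -> bool:
    """Keep a row when column `col` (1-based, like awk's $5) is empty or > threshold."""
    # An explicit " " separator keeps the empty string produced by two adjacent
    # spaces; split() without arguments would collapse them and shift the columns.
    fields = line.rstrip("\n").split(" ")
    value = fields[col - 1] if len(fields) >= col else ""
    # Treat a missing or empty column as "keep", mirroring $5 == "" in awk.
    return value == "" or float(value) > threshold


# Illustrative rows; the third assumes its empty 5th column is stored as two spaces.
rows = [
    "10579 5 2.0 5 100 0 20 0 80 0 20 0.72",
    "11049 6 12.2 6 65 26 48 14 36 49 0 1.43",
    "11041 5 3.0 5  9 25 0 56 43 0 0.99",
]

for row in rows:
    if keep_row(row):
        print(row)
```

Under the same assumption, the awk counterpart would be awk -F'[ ]' '$5 == "" || $5 > 85' file. POSIX awk treats a field separator of a single space, which is what -F' ' sets, the same as the default whitespace splitting. A bracket expression is one way to force splitting on each individual space.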

Maths in a while loop causing random negative numbers

So I have done this in both Python and Bash. The code I'm about to post probably has a world of things wrong with it, but it is very basic and I can't see why it would cause this 'bug', which I'll explain shortly. I have done the same thing in Python, much more professionally and cleanly, and it also causes the error: at some point, the maths generates a negative number, which makes no sense.
#!/bin/bash
while [ 1 ];
do
zero=0
ARRAY=()
ARRAY2=()
first=`command to generate a list of numbers`
sleep 1
second=`command to generate a list of numbers`
# so now we have two data sets, 1 second between the capture of each.
for i in $first;
do
ARRAY+=($i)
done
for i in $second;
do
ARRAY2+=($i)
done
for (( c=$zero; c<=${#ARRAY2[@]}; c++ ))
do
expr ${ARRAY2[$c]} - ${ARRAY[$c]}
done
ARRAY=()
ARRAY2=()
zero=0
c=0
first=``
second=``
math=''
done
So the script grabs a set of data, waits 1 second, grabs it again, subtracts the two sets to get the difference, and prints that difference. It's very simple, and I have done it elegantly in Python too. No matter how I do it, every now and then (anywhere from 3 to 30 loops in) we get negative numbers, like so:
START 0 0 0 0 0 19 10 563 0
-34 19 14 2 0
-1302 1198
-532 639
-1078 1119 1 0 0
-843 33 880 0 5
-8
-13508 8773 4541 988 181
-12
-205 217
-9 7 1
-360 303 60 1 0 0
-12
-96 98 3
-870 904
-130
-2105 2264 6
-3084 1576 1650
-939 971
-2249 1150 1281
-693 9 513 142 76 expr: syntax error
Please help, I simply can't find anything about this.
Sample OUTPUT as requested:
ARRAY1 OUTPUT
1 15 1 25 25 1 2 1 3541 853 94567 42 5 1 351 51 1 11 1 13 7 14 12 3999 983 5 1938 3 8287 40 1 1 1 5253 706 1 1 1 1 5717 3 50 1 85 100376 17334 4655 1 1345 2 1 16 1777 1 3 38 23 8 32 47 781 947 1 1 206 9 1 3 2 81 2602 7 158 1 1 43 91 1 120 6589 6 2534 1092 1 6014 7 2 2 37 1 1 1 80 2 1 1270 15448 66 1 10238 1 10794 16061 4 1 1 1 9754 5617 1123 926 3 24 10 16
ARRAY2 OUTPUT
1 15 1 25 25 1 2 1 3555 859 95043 42 5 1 355 55 1 11 1 13 7 14 12 4015 987 5 1938 3 8335 40 1 1 1 5280 706 1 1 1 1 5733 3 50 1 85 100877 17396 4691 1 1353 2 1 16 1782 1 3 38 23 8 32 47 787 947 1 1 206 9 1 3 2 81 2602 7 159 1 1 43 91 1 120 6869 6 2534 1092 1 6044 7 2 2 37 1 1 1 80 2 1 1270 15563 66 1 10293 1 10804 16134 4 1 1 1 9755 5633 1135 928 3 24 10 16
START
The answer lies in Russell Uhl's comment above. Your loop runs one time too many (this is your code):
for (( c=$zero; c<=${#ARRAY2[@]}; c++ ))
do
expr ${ARRAY2[$c]} - ${ARRAY[$c]}
done
To fix it, change the test condition from c <= ${#ARRAY2[@]} to c < ${#ARRAY2[@]}:
for (( c=$zero; c < ${#ARRAY2[@]}; c++ ))
do
echo $((${ARRAY2[$c]} - ${ARRAY[$c]}))
done
I've also replaced expr with Bash's built-in arithmetic expansion $((...)).
The test script (sum.sh):
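#!/bin/bash
zero=0
ARRAY=()
ARRAY2=()
first="1 15 1 25 25 1 2 1 3541 853 94567 42 5 1 351 51 1 11 1 13 7 14 12 3999 983 5 1938 3 8287 40 1 1 1 5253 706 1 1 1 1 5717 3 50 1 85 100376 17334 4655 1 1345 2 1 16 1777 1 3 38 23 8 32 47 781 947 1 1 206 9 1 3 2 81 2602 7 158 1 1 43 91 1 120 6589 6 2534 1092 1 6014 7 2 2 37 1 1 1 80 2 1 1270 15448 66 1 10238 1 10794 16061 4 1 1 1 9754 5617 1123 926 3 24 10 16"
second="1 15 1 25 25 1 2 1 3555 859 95043 42 5 1 355 55 1 11 1 13 7 14 12 4015 987 5 1938 3 8335 40 1 1 1 5280 706 1 1 1 1 5733 3 50 1 85 100877 17396 4691 1 1353 2 1 16 1782 1 3 38 23 8 32 47 787 947 1 1 206 9 1 3 2 81 2602 7 159 1 1 43 91 1 120 6869 6 2534 1092 1 6044 7 2 2 37 1 1 1 80 2 1 1270 15563 66 1 10293 1 10804 16134 4 1 1 1 9755 5633 1135 928 3 24 10 16"
for i in $first; do
ARRAY+=($i)
done
# Alternately as chepner suggested:
ARRAY2=($second)
for (( c=$zero; c < ${#ARRAY2[@]}; c++ )); do
echo -n $((${ARRAY2[$c]} - ${ARRAY[$c]})) " "
done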
Running it:
samveen@precise:/tmp$ echo $BASH_VERSION
4.2.25(1)-release
samveen@precise:/tmp$ bash sum.sh
0 0 0 0 0 0 0 0 14 6 476 0 0 0 4 4 0 0 0 0 0 0 0 16 4 0 0 0 48 0 0 0 0 27 0 0 0 0 0 16 0 0 0 0 501 62 36 0 8 0 0 0 5 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 280 0 0 0 0 30 0 0 0 0 0 0 0 0 0 0 0 115 0 0 55 0 10 73 0 0 0 0 1 16 12 2 0 0 0 0
EDIT:
* Added improvements from suggestions in comments.
I think the problem has to be when the two arrays don't have the same size. It's easy to reproduce that syntax error -- one of the operands for the minus operator is an empty string:
$ a=5; b=3; expr $a - $b
2
$ a=""; b=3; expr $a - $b
expr: syntax error
$ a=5; b=""; expr $a - $b
expr: syntax error
$ a=""; b=""; expr $a - $b
-
Try
ARRAY=( $(command to generate a list of numbers) )
sleep 1
ARRAY2=( $(command to generate a list of numbers) )
if (( ${#ARRAY[@]} != ${#ARRAY2[@]} )); then
echo "error: different size arrays!"
echo "ARRAY: ${#ARRAY[@]} (${ARRAY[*]})"
echo "ARRAY2: ${#ARRAY2[@]} (${ARRAY2[*]})"
fi
"The error occurs whenever the first array is smaller than the second" -- of course. You're looping from 0 to the array size of ARRAY2. When ARRAY has fewer elements, you'll eventually try to access an index that does not exist in the array. When you try to reference an unset variable, bash gives you the empty string.
$ a=(1 2 3)
$ b=(4 5 6 7)
$ i=2; expr ${a[i]} - ${b[i]}
-3
$ i=3; expr ${a[i]} - ${b[i]}
expr: syntax error
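Since the asker mentions a Python version too, here is a minimal Python sketch of the same size check suggested in the answers above. The function name snapshot_deltas is hypothetical.

Unlike Bash, Python raises IndexError when you index past the end of a list instead of substituting an empty string. However, zip() silently stops at the shorter list, so the length check is still worth doing explicitly. Equal length alone doesn't guarantee that both snapshots list the same items in the same order. If the (unnamed) command can reorder or insert lines between calls, pairing values by position can still produce meaningless negative differences. That last point is an assumption about the command, not something the question confirms.

```python
def snapshot_deltas(first: list[int], second: list[int]) -> list[int]:
    """Return second[i] - first[i] for each position, after checking the lengths match."""
    if len(first) != len(second):
        # Positions only line up when both snapshots have the same number of items.
        raise ValueError(f"different size snapshots: {len(first)} vs {len(second)}")
    # zip() stops at the shorter input, so the explicit check above matters.
    return [b - a for a, b in zip(first, second)]


# The first eleven values of ARRAY1 / ARRAY2 from the question's sample output.
before = [1, 15, 1, 25, 25, 1, 2, 1, 3541, 853, 94567]
after = [1, 15, 1, 25, 25, 1, 2, 1, 3555, 859, 95043]
print(snapshot_deltas(before, after))
```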

Oracle grand total column and row

I have this table as a result from another query
STATUS R1 R2 R3 R4 R5 R6 R7 R8 R9
----------------------------------------------------
ACCEPTED 322 241 278 473 575 595 567 449 605
ADECUACIONES 0 0 0 0 2 0 1 0 50
AET 0 0 2 0 0 0 0 0 11
EXECUTED 0 80 1 18 9 57 34 30 20
IN PROCESS 0 0 0 0 0 4 25 2 112
FREQ 0 55 2 76 25 117 7 73 48
INSTALL 1 4 1 10 5 14 2 13 62
WO INSTALL 9 2 51 24 143 17 15 59 16
WOT VL 0 1 0 0 1 0 0 0 0
OTHER 22 7 20 28 44 30 6 6 109
PROG 1 0 1 0 0 2 3 0 0
PTE PROG 0 5 0 0 0 0 3 19 93
TMX 0 0 0 28 4 8 11 3 14
PROJ 0 1 12 26 13 8 0 2 4
What I expect to have is this
STATUS R1 R2 R3 R4 R5 R6 R7 R8 R9 TOTAL
----------------------------------------------------------
ACCEPTED 322 241 278 473 575 595 567 449 605 4105
ADECUACIONES 0 0 0 0 2 0 1 0 50 53
AET 0 0 2 0 0 0 0 0 11 13
EXECUTED 0 80 1 18 9 57 34 30 20 249
IN PROCESS 0 0 0 0 0 4 25 2 112 143
FREQ 0 55 2 76 25 117 7 73 48 403
INSTALL 1 4 1 10 5 14 2 13 62 112
WO INSTALL 9 2 51 24 143 17 15 59 16 336
WOT VL 0 1 0 0 1 0 0 0 0 2
OTHER 22 7 20 28 44 30 6 6 109 272
PROG 1 0 1 0 0 2 3 0 0 7
PTE PROG 0 5 0 0 0 0 3 19 93 120
TMX 0 0 0 28 4 8 11 3 14 68
PROJ 0 1 12 26 13 8 0 2 4 66
TOTAL 355 396 368 683 821 852 674 656 1144 5949
I've been playing with grouping() and rollup(), but I always get duplicated rows and unwanted null values.
If you run into problems, the grouping_id function will help you.
(You can select grouping_id(col), but also grouping_id(col1, col2, col3, ...).)
But your case is simpler.
It is like:
drop table fg_test_group;
create table fg_test_group (a number, b number, c number, d number);
insert into fg_test_group values (1, 2, 3, 4);
insert into fg_test_group values (2, 2, 3, 4);
insert into fg_test_group values (3, 2, 3, 4);
select nvl(to_char(a), 'total') as a , sum(b), sum(c), sum(d), grouping_id(a)
from fg_test_group
group by rollup (a)
;
where a is Status in your case.
CREATE TABLE TEST1 (STATUS VARCHAR2(10), R1 NUMBER, R2 NUMBER, R3 NUMBER);
INSERT INTO TEST1 VALUES ('ACCEPTED', 322,241,278);
INSERT INTO TEST1 VALUES ('EXECUTED', 0, 80, 1);
INSERT INTO TEST1 VALUES ('FREQ', 0, 55, 2);
COMMIT;
select NVL(TO_CHAR(STATUS), 'total') as STATUS ,SUM(R1) R1, SUM(R2) R2 , SUM(R3) R3, SUM(R1+R2+R3)
from TEST1
group by rollup (STATUS)
;
STATUS R1 R2 R3 SUM(R1+R2+R3)
ACCEPTED 322 241 278 841
EXECUTED 0 80 1 81
FREQ 0 55 2 57
total 322 376 281 979
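If ROLLUP keeps producing extra NULL rows, another way to get both the TOTAL column and the TOTAL row is a plain UNION ALL of the detail rows and one aggregate row. The sketch below deliberately swaps in SQLite, through Python's built-in sqlite3 module, so it runs anywhere. SQLite has no ROLLUP, which is why this technique is shown rather than the answer's. It uses the answer's TEST1 sample data. The is_total column is an assumption added only to keep the total row last. The query is plain SQL and may carry over to Oracle with little change, but that is untested here.

```python
import sqlite3

# In-memory copy of the answer's TEST1 sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (status TEXT, r1 INTEGER, r2 INTEGER, r3 INTEGER)")
conn.executemany(
    "INSERT INTO test1 VALUES (?, ?, ?, ?)",
    [("ACCEPTED", 322, 241, 278), ("EXECUTED", 0, 80, 1), ("FREQ", 0, 55, 2)],
)

# Detail rows with a per-row TOTAL column, followed by one grand-total row.
# is_total only exists to sort the TOTAL row last; it is dropped before printing.
query = """
    SELECT status, r1, r2, r3, r1 + r2 + r3 AS total, 0 AS is_total FROM test1
    UNION ALL
    SELECT 'TOTAL', SUM(r1), SUM(r2), SUM(r3), SUM(r1 + r2 + r3), 1 FROM test1
    ORDER BY is_total, status
"""
rows = [row[:-1] for row in conn.execute(query)]
for row in rows:
    print(*row)
```

The last row comes out as TOTAL 322 376 281 979, matching the ROLLUP result above.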
