Remove lines with awk when a column value is smaller than a threshold - bash

Hello, I have this input:
10579 5 2.0 5 100 0 20 0 80 0 20 0.72
10586 5 2.0 5 100 0 20 0 40 20 40 1.52
10856 4 3.2 4 100 0 26 0 69 30 0 0.89
11049 6 12.2 6 65 26 48 14 36 49 0 1.43
11041 2 26.0 2 70 37 20 8 43 47 0 1.34
11012 5 3.0 5 90 9 25 0 56 43 0 0.99
11041 5 3.0 5 9 25 0 56 43 0 0.99
11096 6 2.2 6 100 0 26 15 30 53 0 1.42
11194 1 28.0 1 93 6 51 0 3 96 0 0.22
11236 5 2.4 5 0 24 0 41 58 0 0.98
11981 1 10.0 1 50 18 15 0 9 0 90 0.44
12184 5 2.2 5 100 0 22 18 0 54 27 1.44
12482 4 2.5 4 100 0 20 20 0 80 0 0.72
12627 5 2.2 5 100 0 22 18 0 81 0 0.68
I want to remove lines where column 5 is smaller than 85. I can do that with awk (awk '$5 > 85'), but I want to keep lines where $5 is empty (lines 7 and 10 of the input). So my output should look like this:
10579 5 2.0 5 100 0 20 0 80 0 20 0.72
10586 5 2.0 5 100 0 20 0 40 20 40 1.52
10856 4 3.2 4 100 0 26 0 69 30 0 0.89
11012 5 3.0 5 90 9 25 0 56 43 0 0.99
11041 5 3.0 5 9 25 0 56 43 0 0.99
11096 6 2.2 6 100 0 26 15 30 53 0 1.42
11194 1 28.0 1 93 6 51 0 3 96 0 0.22
11236 5 2.4 5 0 24 0 41 58 0 0.98
12184 5 2.2 5 100 0 22 18 0 54 27 1.44
12482 4 2.5 4 100 0 20 20 0 80 0 0.72
12627 5 2.2 5 100 0 22 18 0 81 0 0.68
But awk '$5 > 85' removes those lines as well. Any help? Thanks

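You have to split on a single literal space, otherwise awk can't tell which field is missing. Note that -F' ' is not enough: a single space is awk's default FS and still treats any run of blanks as one separator, so the empty field disappears and the following fields shift left. Use a bracket expression instead, and explicitly keep the lines whose $5 is empty (this assumes the empty field is stored as two consecutive spaces in the file):
$ awk -F'[ ]' '$5 == "" || $5 > 85' file
10579 5 2.0 5 100 0 20 0 80 0 20 0.72
10586 5 2.0 5 100 0 20 0 40 20 40 1.52
10856 4 3.2 4 100 0 26 0 69 30 0 0.89
11012 5 3.0 5 90 9 25 0 56 43 0 0.99
11041 5 3.0 5  9 25 0 56 43 0 0.99
11096 6 2.2 6 100 0 26 15 30 53 0 1.42
11194 1 28.0 1 93 6 51 0 3 96 0 0.22
11236 5 2.4 5  0 24 0 41 58 0 0.98
12184 5 2.2 5 100 0 22 18 0 54 27 1.44
12482 4 2.5 4 100 0 20 20 0 80 0 0.72
12627 5 2.2 5 100 0 22 18 0 81 0 0.68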
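For comparison, here is a minimal Python sketch of the same filter. It assumes that an empty field is stored as two consecutive spaces; `keep_line` and `filter_lines` are hypothetical helper names. The key detail is `str.split(" ")`, which splits on every single space and therefore keeps an empty field as `""`, whereas `str.split()` with no argument collapses the run of blanks just like awk's default splitting.

```python
# Minimal sketch of awk -F'[ ]' '$5 == "" || $5 > 85'.
# Assumption: an empty field is written as two consecutive spaces.

def keep_line(line: str, col: int = 5, threshold: float = 85.0) -> bool:
    """Keep the line if field `col` (1-based) is empty or greater than `threshold`."""
    # split(" ") breaks on every single space, so an empty field survives as "";
    # split() with no argument would collapse the double space instead.
    fields = line.rstrip("\n").split(" ")
    # Like awk, treat a field beyond the end of the line as empty.
    value = fields[col - 1] if len(fields) >= col else ""
    return value == "" or float(value) > threshold


def filter_lines(lines):
    """Return only the lines that keep_line() accepts, in their original order."""
    return [line for line in lines if keep_line(line)]


if __name__ == "__main__":
    sample = [
        "11041 2 26.0 2 70 37 20 8 43 47 0 1.34",  # 70 is not > 85: dropped
        "11012 5 3.0 5 90 9 25 0 56 43 0 0.99",    # 90 > 85: kept
        "11041 5 3.0 5  9 25 0 56 43 0 0.99",      # empty 5th field: kept
    ]
    for line in filter_lines(sample):
        print(line)
```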


Is there a way to manually calculate entity total length from a dxf file?

I need the total length and width of the whole sheet, like below:
Length - 602
Width - 938
I checked the DXF file format reference document but didn't find any clue. Does anyone know about this calculation, or have code for it?
0
SECTION
2
HEADER
9
$ACADVER
1
AC1024
9
$ACADMAINTVER
70
109
9
$DWGCODEPAGE
3
ANSI_1252
9
$LASTSAVEDBY
1
haresh.patel
9
$INSBASE
10
0.0
20
0.0
30
0.0
9
$EXTMIN
10
-93.00730087511951
20
-51.69399222749615
30
0.0
9
$EXTMAX
10
1072.189249192752
20
688.3170963457955
30
0.0
9
$LIMMIN
10
0.0
20
0.0
9
$LIMMAX
10
12.0
20
9.0
9
$ORTHOMODE
70
1
9
$REGENMODE
70
1
9
$FILLMODE
70
1
9
$QTEXTMODE
70
0
9
$MIRRTEXT
70
1
9
$LTSCALE
40
1.0
9
$ATTMODE
70
1
9
$TEXTSIZE
40
0.2
9
$TRACEWID
40
0.05
9
$TEXTSTYLE
7
Standard
9
$CLAYER
8
0
9
$CELTYPE
6
ByLayer
9
$CECOLOR
62
256
9
$CELTSCALE
40
1.0
9
$DISPSILH
70
0
9
$DIMSCALE
40
1.0
9
$DIMASZ
40
1.0
9
$DIMEXO
40
0.0
9
$DIMDLI
40
0.0
9
$DIMRND
40
0.0
9
$DIMDLE
40
0.0
9
$DIMEXE
40
0.0
9
$DIMTP
40
0.0
9
$DIMTM
40
0.0
9
$DIMTXT
40
8.0
9
$DIMCEN
40
2.5
9
$DIMTSZ
40
0.0
9
$DIMTOL
70
0
9
$DIMLIM
70
0
9
$DIMTIH
70
0
9
$DIMTOH
70
1
9
$DIMSE1
70
0
9
$DIMSE2
70
0
9
$DIMTAD
70
1
9
$DIMZIN
70
8
9
$DIMBLK
1
9
$DIMASO
70
1
9
$DIMSHO
70
1
9
$DIMPOST
1
9
$DIMAPOST
1
9
$DIMALT
70
0
9
$DIMALTD
70
3
9
$DIMALTF
40
0.03937007874016
9
$DIMLFAC
40
1.0
9
$DIMTOFL
70
1
9
$DIMTVP
40
0.0
9
$DIMTIX
70
0
9
$DIMSOXD
70
0
9
$DIMSAH
70
0
9
$DIMBLK1
1
9
$DIMBLK2
1
9
$DIMSTYLE
2
PRAKASHDIM
9
$DIMCLRD
70
2
9
$DIMCLRE
70
2
9
$DIMCLRT
70
4
9
$DIMTFAC
40
1.0
9
$DIMGAP
40
3.0
9
$DIMJUST
70
0
9
$DIMSD1
70
0
9
$DIMSD2
70
0
9
$DIMTOLJ
70
0
9
$DIMTZIN
70
8
9
$DIMALTZ
70
0
9
$DIMALTTZ
70
0
9
$DIMUPT
70
0
9
$DIMDEC
70
1
9
$DIMTDEC
70
2
9
$DIMALTU
70
2
This is only a small part of the DXF file.
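One possible starting point, sketched in Python: the HEADER section stores the drawing extents in the `$EXTMIN` and `$EXTMAX` variables (group codes 10 and 20 for X and Y), so their difference gives the width and height of the drawing's bounding box. This assumes an ASCII DXF in which every group-code line is followed by exactly one value line, and that the header extents are up to date; the helper names are hypothetical. The pasted excerpt seems to have lost some empty value lines (after `$DIMBLK`, for instance), which is one reason the sketch stops as soon as both corners have been read.

```python
# Sketch: drawing-extents width/height from an ASCII DXF HEADER section.
# Assumes each group-code line is followed by exactly one value line.

def read_pairs(text: str):
    """Yield (group_code, value) pairs from ASCII DXF text."""
    lines = [line.strip() for line in text.splitlines()]
    for i in range(0, len(lines) - 1, 2):
        yield lines[i], lines[i + 1]


def read_extents(text: str) -> dict:
    """Collect the X (code 10) and Y (code 20) values of $EXTMIN and $EXTMAX."""
    wanted = {"$EXTMIN": {}, "$EXTMAX": {}}
    current = None
    for code, value in read_pairs(text):
        if code == "9":              # code 9 introduces a header variable name
            current = value if value in wanted else None
        elif code == "0":            # SECTION / ENDSEC: not inside a variable any more
            current = None
        elif current is not None and code in ("10", "20"):
            wanted[current][code] = float(value)
        if all(len(point) == 2 for point in wanted.values()):
            break                    # both corners found; skip the rest of the file
    return wanted


def extents_size(text: str):
    """Return (width, height) = $EXTMAX - $EXTMIN along X and Y."""
    ext = read_extents(text)
    lo, hi = ext["$EXTMIN"], ext["$EXTMAX"]
    return hi["10"] - lo["10"], hi["20"] - lo["20"]


if __name__ == "__main__":
    # Header fragment copied from the question's file.
    excerpt = "\n".join([
        "0", "SECTION", "2", "HEADER",
        "9", "$EXTMIN", "10", "-93.00730087511951", "20", "-51.69399222749615", "30", "0.0",
        "9", "$EXTMAX", "10", "1072.189249192752", "20", "688.3170963457955", "30", "0.0",
        "0", "ENDSEC",
    ])
    width, height = extents_size(excerpt)
    print(f"width={width:.3f} height={height:.3f}")  # width=1165.197 height=740.011
```

For a real file, pass the whole file contents to `extents_size`. Note that for this excerpt the extents come out at roughly 1165.2 by 740.0, not the 602 by 938 in the question, so the header extents may not be the "sheet" size meant here; computing a bounding box from the individual entities in the ENTITIES section would be the next thing to try.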

Extract column from file with shell [duplicate]

This question already has answers here:
bash: shortest way to get n-th column of output
(8 answers)
Closed 4 years ago.
I would like to extract column number 8 from the following table using shell (ash):
0xd024 2 0 32 20 3 0 1 0 2 1384 1692 -61 27694088
0xd028 0 1 5 11 1 0 46 0 0 301 187 -74 27689154
0xd02c 0 0 35 14 1 0 21 0 0 257 250 -80 27689410
0xd030 1 1 15 13 1 0 38 0 0 176 106 -91 27689666
0xd034 1 1 50 20 1 0 8 0 0 790 283 -71 27689980
0xd038 0 0 0 3 4 0 89 0 0 1633 390 -90 27690291
0xd03c 0 0 8 3 3 0 82 0 0 1837 184 -95 27690603
0xd040 0 0 4 5 1 0 90 0 0 0 148 -97 27690915
0xd064 0 0 36 9 1 0 29 0 0 321 111 -74 27691227
0xd068 0 0 5 14 14 0 40 0 0 8066 2270 -85 27691539
0xd06c 1 1 39 19 1 0 15 0 0 1342 261 -74 27691850
0xd070 0 0 12 11 1 0 53 0 0 203 174 -73 27692162
0xd074 0 0 18 2 1 0 75 0 0 301 277 -94 27692474
How can I do that?
The following command works fine: awk '{print $8}' file
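`awk '{print $8}'` works because awk's default field splitting treats any run of blanks as one separator. If the same thing is needed from a script, a rough Python equivalent (the `nth_column` name is hypothetical) relies on `str.split()` with no argument, which also collapses runs of whitespace; like awk, it yields an empty string for lines with fewer fields.

```python
def nth_column(lines, n: int):
    """Return the n-th (1-based) field of each line, roughly like awk '{print $n}'."""
    result = []
    for line in lines:
        # split() with no argument splits on runs of whitespace, as awk's default FS does.
        fields = line.split()
        # awk prints an empty line when a record has fewer than n fields.
        result.append(fields[n - 1] if len(fields) >= n else "")
    return result


if __name__ == "__main__":
    rows = [
        "0xd024 2 0 32 20 3 0 1 0 2 1384 1692 -61 27694088",
        "0xd028 0 1 5 11 1 0 46 0 0 301 187 -74 27689154",
        "0xd02c 0 0 35 14 1 0 21 0 0 257 250 -80 27689410",
    ]
    for value in nth_column(rows, 8):
        print(value)  # prints 1, 46 and 21 on separate lines
```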

Discrepancy between classification report and confusion matrix

Maybe I'm reading the classification report or the confusion matrix wrong (or both!), but after training my classifier and running it on my test set, I get the following report:
             precision    recall  f1-score   support
          0       0.71      0.67      0.69      5086
          1       0.64      0.54      0.59      2244
          2       0.42      0.25      0.31       598
          3       0.65      0.22      0.33       262
          4       0.53      0.42      0.47       266
          5       0.42      0.15      0.22       466
          6       0.35      0.25      0.29       227
          7       0.07      0.05      0.06       127
          8       0.39      0.14      0.21       376
          9       0.35      0.25      0.29       167
         10       0.25      0.14      0.18       229
avg / total       0.61      0.52      0.55     10048
Which is good and all, but when I create my confusion matrix:
0 1 2 3 4 5 6 7 8 9 10
[[4288 428 80 16 44 58 33 38 47 21 33]
[ 855 1218 54 8 12 17 25 15 15 12 13]
[ 291 72 147 1 12 10 20 2 2 17 24]
[ 173 21 3 57 1 3 0 1 1 1 1]
[ 102 20 4 0 113 0 0 6 4 9 8]
[ 331 40 10 3 7 68 3 0 2 1 1]
[ 104 30 17 0 1 0 56 2 1 10 6]
[ 85 19 4 2 5 0 2 6 4 0 0]
[ 270 29 4 1 6 2 2 7 53 1 1]
[ 63 17 11 0 8 3 14 1 1 42 7]
[ 138 13 19 0 5 2 7 3 6 5 31]]
Am I wrong in assuming that it correctly predicted 4288 of the 5086 samples of class 0, which should give a recall of 84.3% (0.843)? That's not the number the report spits out. The precision seems wrong as well, unless I'm making a mistake when I compare the correct predictions (4288) with the sum of the rest of column 0, which gives 0.563 and not 0.71.
What am I misunderstanding?
It might be worth noting that I use sklearn's classification_report and confusion_matrix for these.
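A quick way to check the numbers is to recompute them from the matrix. sklearn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, so recall for class k is the diagonal entry divided by its row sum and precision is the diagonal entry divided by its column sum. The Python sketch below (hypothetical helper `per_class_scores`, plain lists instead of NumPy) does exactly that. The row sums reproduce the report's support column, yet class 0 comes out at recall 4288/5086 ≈ 0.84 and precision 4288/6700 = 0.64; since `classification_report` uses the same definitions, this suggests the report and the matrix were not computed from the same `y_true`/`y_pred` pair.

```python
# Confusion matrix from the question: rows = true labels, columns = predicted labels
# (the orientation used by sklearn's confusion_matrix).
CM = [
    [4288, 428, 80, 16, 44, 58, 33, 38, 47, 21, 33],
    [855, 1218, 54, 8, 12, 17, 25, 15, 15, 12, 13],
    [291, 72, 147, 1, 12, 10, 20, 2, 2, 17, 24],
    [173, 21, 3, 57, 1, 3, 0, 1, 1, 1, 1],
    [102, 20, 4, 0, 113, 0, 0, 6, 4, 9, 8],
    [331, 40, 10, 3, 7, 68, 3, 0, 2, 1, 1],
    [104, 30, 17, 0, 1, 0, 56, 2, 1, 10, 6],
    [85, 19, 4, 2, 5, 0, 2, 6, 4, 0, 0],
    [270, 29, 4, 1, 6, 2, 2, 7, 53, 1, 1],
    [63, 17, 11, 0, 8, 3, 14, 1, 1, 42, 7],
    [138, 13, 19, 0, 5, 2, 7, 3, 6, 5, 31],
]


def per_class_scores(cm):
    """Return (precision, recall, support) lists computed from a square confusion matrix."""
    n = len(cm)
    support = [sum(row) for row in cm]                               # true samples per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predictions per class
    recall = [cm[k][k] / support[k] if support[k] else 0.0 for k in range(n)]
    precision = [cm[k][k] / predicted[k] if predicted[k] else 0.0 for k in range(n)]
    return precision, recall, support


if __name__ == "__main__":
    precision, recall, support = per_class_scores(CM)
    for k in range(len(CM)):
        print(f"{k:>2}  precision={precision[k]:.2f}  recall={recall[k]:.2f}  support={support[k]}")
```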

R plotting boxplot with different amount of entries

I have a 50x2 matrix, but column 2 has a different number of entries per row (some cells hold several comma-separated values). How can I make a box plot where the x axis is the position and the y axis shows the different counts? Ideally, I'd like to take the absolute value of the counts. Thanks in advance!
> mat.count[1:50,]
position count
1 136873135 0
2 136873136 0
3 136873137 0
4 136873138 0
5 136873139 0
6 136873140 -15
7 136873141 0
8 136873142 0
9 136873143 0
10 136873144 0
11 136873145 0
12 136873146 0
13 136873147 0
14 136873148 0
15 136873149 0
16 136873150 0
17 136873151 0
18 136873152 0
19 136873153 0
20 136873154 0
21 136873155 0
22 136873156 0
23 136873157 0
24 136873158 0
25 136873159 0
26 136873160 0
27 136873161 0
28 136873162 0
29 136873163 0
30 136873164 0
31 136873165 0
32 136873166 0
33 136873167 0
34 136873168 -1
35 136873169 0
36 136873170 0
37 136873171 0
38 136873172 0
39 136873173 -70
40 136873174 -66
41 136873175 -73,-1,-1,-1,-73,-1
42 136873176 -52
43 136873177 0
44 136873178 0
45 136873179 -66,-1
46 136873180 -1
47 136873181 0
48 136873182 -68,-75
49 136873183 -67,-67
50 136873184 -60,-56,-56
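The underlying difficulty is that column 2 holds comma-separated strings, so each position has a variable number of values. A minimal sketch of the data preparation, written in Python with hypothetical helper names: split each cell on commas, convert to integers and take absolute values, giving one list per position. A list of numeric vectors, one per position, is the shape a box plot with one box per position needs; in R, `strsplit()` and `abs()` would play the same roles, and `boxplot()` accepts such a list.

```python
def parse_counts(cell: str):
    """Turn a cell such as '-73,-1' into a list of absolute values: [73, 1]."""
    return [abs(int(part)) for part in cell.split(",") if part.strip()]


def counts_by_position(rows):
    """Map each position to its list of absolute counts (one box per position)."""
    return {position: parse_counts(cell) for position, cell in rows}


if __name__ == "__main__":
    rows = [
        (136873175, "-73,-1,-1,-1,-73,-1"),
        (136873182, "-68,-75"),
        (136873184, "-60,-56,-56"),
    ]
    for position, values in counts_by_position(rows).items():
        print(position, values)
```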

Oracle grand total column and row

I have this table as the result of another query:
STATUS R1 R2 R3 R4 R5 R6 R7 R8 R9
----------------------------------------------------
ACCEPTED 322 241 278 473 575 595 567 449 605
ADECUACIONES 0 0 0 0 2 0 1 0 50
AET 0 0 2 0 0 0 0 0 11
EXECUTED 0 80 1 18 9 57 34 30 20
IN PROCESS 0 0 0 0 0 4 25 2 112
FREQ 0 55 2 76 25 117 7 73 48
INSTALL 1 4 1 10 5 14 2 13 62
WO INSTALL 9 2 51 24 143 17 15 59 16
WOT VL 0 1 0 0 1 0 0 0 0
OTHER 22 7 20 28 44 30 6 6 109
PROG 1 0 1 0 0 2 3 0 0
PTE PROG 0 5 0 0 0 0 3 19 93
TMX 0 0 0 28 4 8 11 3 14
PROJ 0 1 12 26 13 8 0 2 4
What I expect is this:
STATUS R1 R2 R3 R4 R5 R6 R7 R8 R9 TOTAL
----------------------------------------------------------
ACCEPTED 322 241 278 473 575 595 567 449 605 4105
ADECUACIONES 0 0 0 0 2 0 1 0 50 53
AET 0 0 2 0 0 0 0 0 11 13
EXECUTED 0 80 1 18 9 57 34 30 20 249
IN PROCESS 0 0 0 0 0 4 25 2 112 143
FREQ 0 55 2 76 25 117 7 73 48 403
INSTALL 1 4 1 10 5 14 2 13 62 112
WO INSTALL 9 2 51 24 143 17 15 59 16 336
WOT VL 0 1 0 0 1 0 0 0 0 2
OTHER 22 7 20 28 44 30 6 6 109 272
PROG 1 0 1 0 0 2 3 0 0 7
PTE PROG 0 5 0 0 0 0 3 19 93 120
TMX 0 0 0 28 4 8 11 3 14 68
PROJ 0 1 12 26 13 8 0 2 4 66
TOTAL 355 396 368 683 821 852 674 656 1144 5949
I've been playing with grouping() and rollup(), but I always get duplicated rows and unwanted null values.
If you run into problems, the grouping_id function will help you.
(You can select grouping_id(col), but also grouping_id(col1, col2, col3, etc.).)
But your case is simpler.
It works like this:
drop table fg_test_group;
create table fg_test_group (a number, b number, c number, d number);
insert into fg_test_group values (1, 2, 3, 4);
insert into fg_test_group values (2, 2, 3, 4);
insert into fg_test_group values (3, 2, 3, 4);
select nvl(to_char(a), 'total') as a , sum(b), sum(c), sum(d), grouping_id(a)
from fg_test_group
group by rollup (a)
;
where a corresponds to STATUS in your case. Applied to the first columns of your data:
CREATE TABLE TEST1 (STATUS VARCHAR2(10), R1 NUMBER, R2 NUMBER, R3 NUMBER);
INSERT INTO TEST1 VALUES ('ACCEPTED', 322,241,278);
INSERT INTO TEST1 VALUES ('EXECUTED', 0, 80, 1);
INSERT INTO TEST1 VALUES ('FREQ', 0, 55, 2);
COMMIT;
select NVL(TO_CHAR(STATUS), 'total') as STATUS ,SUM(R1) R1, SUM(R2) R2 , SUM(R3) R3, SUM(R1+R2+R3)
from TEST1
group by rollup (STATUS)
;
STATUS R1 R2 R3 SUM(R1+R2+R3)
ACCEPTED 322 241 278 841
EXECUTED 0 80 1 81
FREQ 0 55 2 57
total 322 376 281 979
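What GROUP BY ROLLUP(STATUS) adds here is one extra row whose STATUS is NULL (turned into 'total' by NVL) and whose values are the column sums, while SUM(R1+R2+R3) gives each row its own total. The same arithmetic as a small Python sketch (the `add_totals` name is hypothetical) can be handy for checking the expected TOTAL column and row:

```python
def add_totals(rows):
    """rows: list of (status, [r1, r2, ...]).

    Appends a row total to each row and adds a final 'total' row of column sums,
    mirroring SUM(R1+...+Rn) together with GROUP BY ROLLUP(STATUS).
    """
    out = []
    col_sums = [0] * (len(rows[0][1]) if rows else 0)
    for status, values in rows:
        out.append((status, list(values) + [sum(values)]))    # per-row TOTAL column
        col_sums = [a + b for a, b in zip(col_sums, values)]  # running column sums
    out.append(("total", col_sums + [sum(col_sums)]))         # ROLLUP super-aggregate row
    return out


if __name__ == "__main__":
    test1 = [("ACCEPTED", [322, 241, 278]), ("EXECUTED", [0, 80, 1]), ("FREQ", [0, 55, 2])]
    for status, values in add_totals(test1):
        print(status, *values)
```

Run on the TEST1 rows, it prints the same four lines as the query result above.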
