Creating share of one variable in Top/Bottom 10 of another variable - max

I have 5 variables: nacional , nacional=1 for natives , nacional=0 for non-natives, FirmsID: NPC_FIC, Establsihmenti;s ID: ESTAB_ID, Occupation codes: CCPCodes and skill ratio: sk_ratio.I am going to compute the share of immigrants in Top 10 and Bottom 10 occupations. Since each firm has more than one establishment more often makes the solution tricky. Any help are apprecited.
input float nacioanal double(NPC_FIC ESTAB_ID) float CCPCodes double sk_ratio
1 500988754 100861 1114 4.359499
1 501950749 893218 1114 4.359499
1 500988750 990861 6914 2.309491
1 501951290 125210 1114 4.359499
1 500988760 100861 1141 5.459470
1 502000740 -820100 1114 4.359499
0 500988762 880861 7514 66.359409
1 500988754 100861 1114 4.359499
1 577988759 210869 4116 6.059120
1 500988092 670869 3216 10.059785
I used 'extremes sk_ratio CCPCodes, n(10)' to make top/bottom 10 but because of repeated valules I could not go futher.

Related

Count number of connections in each network using DAX

I have a dataset like this in Power BI with connections between "Participant ID" Column and "Knows Participant":
Participant ID
Knows Participant
111
353
111
777
111
112
111
249
112
143
112
144
113
111
113
244
114
NaN
115
113
...
...
777
111
777
398
777
114
778
NaN
779
112
3499
NaN
I've build Network chart. However, there are a lot of 1-1 connections that are not very useful for visualization, so I want to exclude them (see image):
Is it possible to count a number of connections in each network using DAX and then use this value to filter out all nodes with only 1 connection (red circled)? Or maybe filter out 1 connection nodes using another approach?
I've tried to make a calculated column using DAX:
Connection Column = COUNTROWS(
FILTER(Table,
EARLIER(Table[Knows Participant])=Table[Knows Participant])
)
However, it only shows duplicate values in "Knows Participant" Column, but not number of connections in each network.
Example of desired output:
Participant ID
Knows Participant
Number of Connections in the Network
111
353
4
353
444
4
444
551
4
551
987
4
112
143
1
220
190
1
333
337
2
337
410
2
765
0
You need the PATH functions as you're essentially trying to flatten a hierarchy and then exclude certain parts of it. The following help page gives a good rundown of the approach to take.
https://learn.microsoft.com/en-us/dax/understanding-functions-for-parent-child-hierarchies-in-dax
You can add a column to the table with a measure like this:
VAR pIdLinksCount = CALCULATE(COUNTROWS(tbl), ALL('tbl'[Knows Participant]))
VAR neighbourLinksCount =
IF(
pIdLinksCount=1
, -- if pIdLinksCount=1 then count neighbour links
VAR neighbourId =
CALCULATETABLE(
Values('tbl'[Knows Participant])
)
RETURN
CALCULATE(
COUNTROWS(tbl)
,ALL() -- removes all filters from data model
,'tbl'[Participant ID] = neighbourId -- applies filter to [Participant ID] column
--,'tbl'[Participant ID] IN neighbourId -- alternatively try this. I believe it is not necessary
)
,2 -- returns 2 if pIdLinksCount>1.
-- The "value = 2" will return "result > 3 = TRUE()"
)
VAR result = pIdLinksCount + neighbourLinksCount
RETURN
IF(
result>2
,1
,0
)
The idea is to check a neighbor too - if it has more then 1 link

why my dxf file not working in AutoCAD giving me ID 11 incorrect: already used

I have generated a dxf file but when I opened it with AutoCAD, crashes AutoCAD and gives a message ID 11 incorrect: already used.
the dxf content: https://github.com/tarikjabiri/dxf/blob/dev/examples/latest.dxf
I can't spot the problem 3 days I am trying to solve it.
I think something wrong with the APPID because it holding the ID 11 or the Handle in the language of DXF.
I have a dxf working: https://github.com/tarikjabiri/dxf/blob/dev/examples/Minimal_DXF_AC1021.dxf
Thanks in advance.
There are two minor issues:
DIMSTYLE table
0
TABLE
2
DIMSTYLE
105 <<< handle group code of the table "head" is 5 as usual
8
100
AcDbSymbolTable
100
AcDbDimStyleTable
70
1
0
DIMSTYLE
5 <<< handle group code of the table entry is 105
12
330
8
100
AcDbSymbolTableRecord
100
AcDbDimStyleTableRecord
2
STANDARD
70
0
40
1
BLOCK_RECORD table entries for *MODEL_SPACE and *PAPER_SPACE
0
TABLE
2
BLOCK_RECORD
5
9
330
0
100
AcDbSymbolTable
70
2
0
BLOCK_RECORD
5
14
330
9
100
AcDbSymbolTableRecord
100
AcDbRegAppTableRecord <<< subclass marker string "AcDbBlockTableRecord"
2
*MODEL_SPACE
70
0
70
0
280
After this changes the file opens in Autodesk DWG Trueview 2022.

Issue on DXF code for an ELLIPSE for AutoCad

I´m having an issue with a DXF code for an Ellipse, I´m trying to have it Graphed by AutoCad 2019 but it won´t recognize the code for an unknown reason. I would greatly apreciate any insight on the issue. Thanks a lot in advance
0
SECTION
2
ENTITIES
0
ELLIPSE
8
0
10
43.6886
20
16.2019
30
0
11
64.4949
21
16.2019
31
0
210
0
220
0
230
0
40
0.4
41
0
42
6.28319
0
ENDSEC
0
EOF
Well, you are missing some AutoCAD 'housekeeping' stuff in your dxf file. You have all the geometry bits for your ellipse, but AutoCAD doesn't know where to put them in the overall DXF file. So you have to include things like the drawing 'Handle' and the other required elements that place the ellipse in the overall framework of the whole drawing. An ellipse in a DXF file starts out like what is shown below:
ENTITIES
0
ELLIPSE
5
86
330
70
100
AcDbEntity
8
0
100
AcDbEllipse
10
43.6886
20
16.2019
30
0
The R2000 dxf spec will tell you what all those pieces are for specifically, but everything above
AcDbEllipse
10
is needed to place the ellipse in the greater context of the rest of the drawing. Without it, the ellipse would not be recognized.
I can tell you that code 8 identifies the layer the entity is on, in this case, 0 and code 5 identifies a unique handle (id code) for the entity, in this case 86. The handle must be unique for every entity in a dxf file. If you are manipulating/creating dxf entities in code, you have to be very careful to never have duplicate handles.
This ia an example of DXF with a line from 100,100,0 to 200,200,0
999
Start Section ***************************************************************
0
SECTION
999
Start Entities ****************************************************************
2
ENTITIES
999
Line ************************************************************************
0
LINE
8
LAYER1
10
100
20
100
30
0
11
200
21
200
31
0
0
ENDSEC
999
End Section ***************************************************************
999
End File ********************************************************************
0
EOF
999 is for comment
then you have to start section and entities
at the end close section and file
If you need more Info contact me

How can I convert the qblast XML output into the NCBI BLAST -outfmt 17?

I started my project with the NCBI standalone BLAST and used the -outfmt 17 option. For my purpose that formatting is extremely helpful. However, I had to change to Biopython and I'm now using qblast to align my sequences to the NCBI NT database. Can I save/convert the qblast XML in a format which is comparable to the NCBI BLAST standalone -outfmt 17 format?
Thank you very much for your help!
Cheers,
Philipp
I'm going to assume you meant -outfmt 7 and you need an output with columns.
from Bio.Blast import NCBIWWW, NCBIXML
# This is the BLASTN query which returns an XML handler in a StringIO
r = NCBIWWW.qblast(
"blastn",
"nr",
"ACGGGGTCTCGAAAAAAGGAGAATGGGATGAGAAGGATATATGGGTAGTGTCATTTTTTAACTTGCAGAT" +
"TTCATCCTAGTCTTCCAGTTATCGTTTCCTAGCACTCCATGTTCCCAAGATAGTGTCACCACCCCAAGGA" +
"CTCTCTCTCATTTTCTTTGCCTGGGCCCTCTTTCTACTGAGGAGTCGTGGCCTTCCATCAGTAGAAGCCG",
expect=1E-5)
# Now we read that XML extracting the info
for record in NCBIXML.parse(r):
for alignment in record.alignments:
for hsp in alignment.hsps:
cols = "{}\t" * 10
print(cols.format(hsp.positives / hsp.align_length,
hsp.align_length,
hsp.align_length - hsp.positives,
hsp.gaps,
hsp.query_start,
hsp.query_end,
hsp.sbjct_start,
hsp.sbjct_end,
hsp.expect,
hsp.score))
Outputs something like:
1 210 0 0 1 210 89250 89459 8.73028e-102 420.0
0 206 19 2 5 210 46259 46462 5.16461e-73 314.0
1 210 0 0 1 210 68822 69031 8.73028e-102 420.0
0 206 19 2 5 210 25825 26028 5.16461e-73 314.0
1 210 0 0 1 210 65887 66096 8.73028e-102 420.0
...

Pandas performance issue of dataframe column "rename" and "drop"

Below is the line_profiler record of a function :
Wrote profile results to FM_CORE.py.lprof
Timer unit: 2.79365e-07 s
File: F:\FM_CORE.py
Function: _rpt_join at line 1068
Total time: 1.87766 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1068 #profile
1069 def _rpt_join(dfa, dfb, join_type='inner'):
1070 ''' join two dataframe together by ('STK_ID','RPT_Date') multilevel index.
1071 'join_type' can be 'inner' or 'outer'
1072 '''
1073
1074 2 56 28.0 0.0 try: # ('STK_ID','RPT_Date') are normal column
1075 2 2936668 1468334.0 43.7 rst = pd.merge(dfa, dfb, how=join_type, on=['STK_ID','RPT_Date'], left_index=True, right_index=True)
1076 except: # ('STK_ID','RPT_Date') are index
1077 rst = pd.merge(dfa, dfb, how=join_type, left_index=True, right_index=True)
1078
1079
1080 2 81 40.5 0.0 try: # handle 'STK_Name
1081 2 426472 213236.0 6.3 name_combine = pd.concat([dfa.STK_Name, dfb.STK_Name])
1082
1083
1084 2 900584 450292.0 13.4 nameseries = name_combine[-Series(name_combine.index.values, name_combine.index).duplicated()]
1085
1086 2 1138140 569070.0 16.9 rst.STK_Name_x = nameseries
1087 2 596768 298384.0 8.9 rst = rst.rename(columns={'STK_Name_x': 'STK_Name'})
1088 2 722293 361146.5 10.7 rst = rst.drop(['STK_Name_y'], axis=1)
1089 except:
1090 pass
1091
1092 2 94 47.0 0.0 return rst
What surprise me is these two lines:
1087 2 596768 298384.0 8.9 rst = rst.rename(columns={'STK_Name_x': 'STK_Name'})
1088 2 722293 361146.5 10.7 rst = rst.drop(['STK_Name_y'], axis=1)
Why a simple dataframe column "rename" and "drop" operation costs that much percentage of time (8.9% + 10.7%)? Anyway, the "merge" operation only costs 43.7% , and "rename"/"drop" looks not like a calculation-intensive operation. How to improve it ?

Resources