Transforming data (to a format appropriate for Event History analysis, survival analysis, or panel regressions) - events

I am struggling to convert existing data into a format appropriate for event history analysis (EHA), in other words survival analysis.
The data is recorded on a daily basis:
Company_name  Date        Policy_Type
A             2021-01-03  vaccination
A             2021-01-04  quarantine
A             2021-01-05  tracing
A             2021-01-08  sickleave
A             2021-01-10  voucher
A             2021-01-11  testingkit
B             2021-01-02  quarantine
B             2021-01-03  vaccination
B             2021-01-05  voucher
C             2021-01-05  tracing
C             2021-01-08  sickleave
D             2021-01-04  voucher
D             2021-01-07  vaccination
D             2021-01-10  tracing
D             2021-01-11  quarantine
D             2021-01-12  surveys
D             2021-01-15  sickleave
D             2021-01-15  testingkit
D             2021-01-15  travellimit
E             2021-01-05  quarantine
E             2021-01-11  vaccination
E             2021-01-14  tracing
E             2021-01-15  testingkit
...
I would like to transform the data above to a format appropriate for the event history analysis OR panel regressions like below:
Company_name  Date        vaccination  quarantine  tracing  sickleave  voucher  testingkit  surveys  travellimit
A             2021-01-01  0            0           0        0          0        0           0        0
A             2021-01-02  0            0           0        0          0        0           0        0
A             2021-01-03  1            0           0        0          0        0           0        0
A             2021-01-04  1            1           0        0          0        0           0        0
A             2021-01-05  1            1           1        0          0        0           0        0
A             2021-01-06  1            1           1        0          0        0           0        0
A             2021-01-07  1            1           1        0          0        0           0        0
A             2021-01-08  1            1           1        1          0        0           0        0
A             2021-01-09  1            1           1        1          0        0           0        0
A             2021-01-10  1            1           1        1          1        0           0        0
A             2021-01-11  1            1           1        1          1        1           0        0
A             2021-01-12  1            1           1        1          1        1           0        0
A             2021-01-13  1            1           1        1          1        1           0        0
A             2021-01-14  1            1           1        1          1        1           0        0
A             2021-01-15  1            1           1        1          1        1           0        0
B             2021-01-01  0            0           0        0          0        0           0        0
B             2021-01-02  0            1           0        0          0        0           0        0
B             2021-01-03  1            1           0        0          0        0           0        0
B             2021-01-04  1            1           0        0          0        0           0        0
B             2021-01-05  1            1           0        0          1        0           0        0
B             2021-01-06  1            1           0        0          1        0           0        0
B             2021-01-07  1            1           0        0          1        0           0        0
B             2021-01-08  1            1           0        0          1        0           0        0
B             2021-01-09  1            1           0        0          1        0           0        0
B             2021-01-10  1            1           0        0          1        0           0        0
B             2021-01-11  1            1           0        0          1        0           0        0
B             2021-01-12  1            1           0        0          1        0           0        0
B             2021-01-13  1            1           0        0          1        0           0        0
B             2021-01-14  1            1           0        0          1        0           0        0
B             2021-01-15  1            1           0        0          1        0           0        0
C             2021-01-01  0            0           0        0          0        0           0        0
C             2021-01-02  0            0           0        0          0        0           0        0
C             2021-01-03  0            0           0        0          0        0           0        0
C             2021-01-04  0            0           0        0          0        0           0        0
C             2021-01-05  0            0           1        0          0        0           0        0
C             2021-01-06  0            0           1        0          0        0           0        0
C             2021-01-07  0            0           1        0          0        0           0        0
C             2021-01-08  0            0           1        1          0        0           0        0
C             2021-01-09  0            0           1        1          0        0           0        0
C             2021-01-10  0            0           1        1          0        0           0        0
C             2021-01-11  0            0           1        1          0        0           0        0
C             2021-01-12  0            0           1        1          0        0           0        0
C             2021-01-13  0            0           1        1          0        0           0        0
C             2021-01-14  0            0           1        1          0        0           0        0
C             2021-01-15  0            0           1        1          0        0           0        0
D             2021-01-01  0            0           0        0          0        0           0        0
D             2021-01-02  0            0           0        0          0        0           0        0
D             2021-01-03  0            0           0        0          0        0           0        0
D             2021-01-04  0            0           0        0          1        0           0        0
D             2021-01-05  0            0           0        0          1        0           0        0
D             2021-01-06  0            0           0        0          1        0           0        0
D             2021-01-07  1            0           0        0          1        0           0        0
D             2021-01-08  1            0           0        0          1        0           0        0
D             2021-01-09  1            0           0        0          1        0           0        0
D             2021-01-10  1            0           1        0          1        0           0        0
D             2021-01-11  1            1           1        0          1        0           0        0
D             2021-01-12  1            1           1        0          1        0           1        0
D             2021-01-13  1            1           1        0          1        0           1        0
D             2021-01-14  1            1           1        0          1        0           1        0
D             2021-01-15  1            1           1        1          1        1           1        1
E             2021-01-01  0            0           0        0          0        0           0        0
E             2021-01-02  0            0           0        0          0        0           0        0
E             2021-01-03  0            0           0        0          0        0           0        0
E             2021-01-04  0            0           0        0          0        0           0        0
E             2021-01-05  0            1           0        0          0        0           0        0
E             2021-01-06  0            1           0        0          0        0           0        0
E             2021-01-07  0            1           0        0          0        0           0        0
E             2021-01-08  0            1           0        0          0        0           0        0
E             2021-01-09  0            1           0        0          0        0           0        0
E             2021-01-10  0            1           0        0          0        0           0        0
E             2021-01-11  1            1           0        0          0        0           0        0
E             2021-01-12  1            1           0        0          0        0           0        0
E             2021-01-13  1            1           0        0          0        0           0        0
E             2021-01-14  1            1           1        0          0        0           0        0
E             2021-01-15  1            1           1        0          0        1           0        0
Is there any program that I can use to transform the first format into the second format?
I can use Stata and R, and I could also consider Python. The original data is in CSV format.
Thank you so much for considering my question!
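One possible way to do this in Python, using only the standard library, is sketched below. It is not a definitive solution: the function name events_to_panel and the SAMPLE_CSV excerpt are made up for illustration, and it assumes that a policy stays in force from its first event date onwards (which is what the desired table suggests) and that the panel window (start and end dates) is supplied by you.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical excerpt of the long-format CSV from the question.
SAMPLE_CSV = """Company_name,Date,Policy_Type
A,2021-01-03,vaccination
A,2021-01-04,quarantine
A,2021-01-05,tracing
B,2021-01-02,quarantine
B,2021-01-03,vaccination
B,2021-01-05,voucher
"""


def events_to_panel(csv_text, start, end):
    """Expand (company, date, policy) events into one row per company and day.

    A policy column is 0 before its first event date and 1 from that date on.
    """
    first_seen = {}               # (company, policy) -> earliest event date
    companies, policies = [], []  # kept in order of first appearance
    for row in csv.DictReader(io.StringIO(csv_text)):
        company, policy = row["Company_name"], row["Policy_Type"]
        day = date.fromisoformat(row["Date"])
        if company not in companies:
            companies.append(company)
        if policy not in policies:
            policies.append(policy)
        key = (company, policy)
        first_seen[key] = min(day, first_seen.get(key, day))

    header = ["Company_name", "Date", *policies]
    panel = []
    for company in companies:
        day = start
        while day <= end:
            # 1 if this company has adopted the policy on or before this day.
            flags = [
                int((company, p) in first_seen and first_seen[(company, p)] <= day)
                for p in policies
            ]
            panel.append([company, day.isoformat(), *flags])
            day += timedelta(days=1)
    return header, panel


header, panel = events_to_panel(SAMPLE_CSV, date(2021, 1, 1), date(2021, 1, 15))
```

The resulting header and rows can be written back out with csv.writer. Because policy columns are created in order of first appearance, the column order follows the order of the events in the input file.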

Related

Find 4-neighbors using J

I'm trying to find the 4-neighbors of all 1's in a matrix of 0's and 1's using the J programming language. I have a method worked out, but am trying to find a method that is more compact.
To illustrate, let's say I have the matrix M—
] M=. 4 4$0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
0 0 0 0
and I want to generate—
0 0 1 0
0 1 0 1
0 0 1 0
0 0 0 0
I've sorted something close (which I owe to this little gem: https://www.reddit.com/r/cellular_automata/comments/9kw21u/i_made_a_34byte_implementation_of_conways_game_of/)—
] +/+/(|:i:1*(2 2)$1 0 0 1)&|.M
0 0 1 0
0 1 2 1
0 0 1 0
0 0 0 0
This is fine because I'll be weighting the initial 1's anyway (and the actual numbers aren't that important for my application). But I feel this could be more compact, and I've hit a wall. The compactness of the expression really does matter for my application.
Building on @Eelvex's comment solution: if you are willing to make the verb dyadic, it becomes pretty simple. The left argument can be the rotation matrix, and the result is then composed with +./, which is a logical OR and can be weighted however you want.
] M0=. 4 4$0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
0 0 0 0
] m =.2,\5$0,i:1
0 _1
_1 0
0 1
1 0
m +./@:|. M0
0 0 1 0
0 1 0 1
0 0 1 0
0 0 0 0
There is still an issue with the edges (which wrap around), but that also occurs with your original solution, so I am hoping that you are not concerned with that.
] M1=. 4 4$1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
m +./@:|. M1
0 1 0 1
1 0 0 0
0 0 0 0
1 0 0 0
If you did want to clean that up, you can use the slightly longer m +./@:(|.!.0), which fills the rotation with 0's.
] M2=. 4 4$ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 1
m +./@:(|.!.0) M2
0 0 0 0
0 0 0 0
0 0 0 1
0 0 1 0
m +./@:(|.!.0) M1
0 1 0 0
1 0 0 0
0 0 0 0
0 0 0 0
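For readers who do not use J, the same idea (mark the four orthogonal neighbours of every 1, without wrapping at the edges) can be sketched in plain Python. This is only an illustration of the zero-filled behaviour described above, not a translation of the J expression; the name four_neighbors is made up for this example.

```python
def four_neighbors(grid):
    """Return a 0/1 grid marking the up/down/left/right neighbours of every 1.

    Neighbours that would fall outside the grid are dropped (no wrap-around).
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c]:
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        out[nr][nc] = 1
    return out


M0 = [[0, 0, 0, 0],
      [0, 0, 1, 0],
      [0, 0, 0, 0],
      [0, 0, 0, 0]]
for row in four_neighbors(M0):
    print(*row)
```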

Is there a way to show the index of atoms in rdkit.Chem.rdmolops.GetAdjacencyMatrix?

I'm trying to convert a compound from a mol object to an adjacency matrix. However, I encountered a problem: rdkit.Chem.rdmolops.GetAdjacencyMatrix() doesn't provide the atom indices for the adjacency matrix. Is there any way to include the index data for the adjacency matrix in RDKit?
rdkit.Chem.rdmolops.GetAdjacencyMatrix((Mol)mol)
Since the rows and columns of the RDKit adjacency matrix are ordered by atom index, from zero upwards, you can convert it to a pandas DataFrame.
from rdkit import Chem
import pandas as pd
s = 'CCC(C(O)C)CN'
mol = Chem.MolFromSmiles(s)
am = Chem.GetAdjacencyMatrix(mol)
print(am)
[[0 1 0 0 0 0 0 0]
 [1 0 1 0 0 0 0 0]
 [0 1 0 1 0 0 1 0]
 [0 0 1 0 1 1 0 0]
 [0 0 0 1 0 0 0 0]
 [0 0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0 1]
 [0 0 0 0 0 0 1 0]]
df = pd.DataFrame(am)
print(df)
   0  1  2  3  4  5  6  7
0  0  1  0  0  0  0  0  0
1  1  0  1  0  0  0  0  0
2  0  1  0  1  0  0  1  0
3  0  0  1  0  1  1  0  0
4  0  0  0  1  0  0  0  0
5  0  0  0  1  0  0  0  0
6  0  0  1  0  0  0  0  1
7  0  0  0  0  0  0  1  0
If you want element symbols instead of indices:
element = [atom.GetSymbol() for atom in mol.GetAtoms()]
print(element)
['C', 'C', 'C', 'C', 'O', 'C', 'C', 'N']
df_e = pd.DataFrame(am, index=element, columns=element)
print(df_e)
   C  C  C  C  O  C  C  N
C  0  1  0  0  0  0  0  0
C  1  0  1  0  0  0  0  0
C  0  1  0  1  0  0  1  0
C  0  0  1  0  1  1  0  0
O  0  0  0  1  0  0  0  0
C  0  0  0  1  0  0  0  0
C  0  0  1  0  0  0  0  1
N  0  0  0  0  0  0  1  0

Computing block sum for an arbitrary region in an image

I wonder what is the most effective way to solve the following problem:
(If there is a name for this problem, I would like to know it as well)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 1 0 1 1 1;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
I have an image in which I am interested in the pixels marked by 1 in the matrix above. For each of these pixels I want to calculate the sum over the block around it. A block sum is easy to calculate from an integral image, but I don't want to build one for the whole image, since that involves a lot of unnecessary computation.
One option I can come up with is to find the minimum and maximum marked coordinates in the horizontal and vertical directions and then take a rectangular portion of the image, enlarged so that it covers every block: for example, +2 pixels in each direction if the block size is 5. But this solution still includes unnecessary calculation.
If I had a list of these indices, I could loop through them and calculate the sum for each block. But if another pixel close by shares pixels with that block, I need to recalculate them; and if I save them, I somehow need to check whether they have already been calculated, which takes time as well.
Is there a known solution for this sort of a problem?
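To make the bounding-box option concrete, here is a rough Python sketch: it builds an integral image (summed-area table) only over the bounding box of the marked pixels, enlarged by half a block, and then reads each block sum with four lookups. The function name block_sums_at, the default block size of 5, and the choice to treat pixels outside the image as zero are assumptions made for this example, not something given in the question.

```python
def block_sums_at(image, mask, block=5):
    """Sum of the block x block window centred on each marked pixel.

    image and mask are equally sized lists of lists; pixels outside the
    image are treated as zero. Returns {(row, col): block_sum}.
    """
    half = block // 2
    marked = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not marked:
        return {}
    rows, cols = len(image), len(image[0])

    # Bounding box of the marked pixels, enlarged by half a block, clipped to the image.
    r0 = max(min(r for r, _ in marked) - half, 0)
    r1 = min(max(r for r, _ in marked) + half, rows - 1)
    c0 = max(min(c for _, c in marked) - half, 0)
    c1 = min(max(c for _, c in marked) + half, cols - 1)
    h, w = r1 - r0 + 1, c1 - c0 + 1

    # Summed-area table of the cropped region, with a leading zero row and column.
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_total = 0
        for j in range(w):
            row_total += image[r0 + i][c0 + j]
            sat[i + 1][j + 1] = sat[i][j + 1] + row_total

    sums = {}
    for r, c in marked:
        # Window bounds in crop coordinates (top/left inclusive, bottom/right exclusive).
        top = max(r - half, r0) - r0
        bottom = min(r + half, r1) + 1 - r0
        left = max(c - half, c0) - c0
        right = min(c + half, c1) + 1 - c0
        sums[(r, c)] = sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]
    return sums
```

This still touches every pixel inside the enlarged bounding box once, so it only pays off when the marked region is small relative to the image.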

Sorting the connected component in order

I have a question about the ordering of connected components. I have a binary image (only 0 and 1), and I run the following MATLAB function on it:
f=
1 0 0 1 0 0 0 1 0 0
1 1 0 1 1 1 0 0 1 0
0 0 0 0 0 0 0 1 1 1
1 0 0 0 1 0 1 0 1 1
1 1 0 0 0 0 0 1 1 1
0 0 0 1 0 0 1 0 0 0
0 0 0 1 0 1 1 0 1 1
1 1 0 0 1 0 0 0 1 0
1 1 0 1 1 1 0 1 0 0
1 1 0 0 1 0 0 0 1 0
[L num]=bwlabel(f);
Suppose that it gives me the matrix:
1 0 0 4 0 0 0 5 0 0
1 1 0 4 4 4 0 0 5 0
0 0 0 0 0 0 0 5 5 5
2 0 0 0 6 0 5 0 5 5
2 2 0 0 0 0 0 5 5 5
0 0 0 5 0 0 5 0 0 0
0 0 0 5 0 5 5 0 7 7
3 3 0 0 5 0 0 0 7 0
3 3 0 5 5 5 0 7 0 0
3 3 0 0 5 0 0 0 7 0
But as you can see in this result, the labels are ordered column by column. Now I want to change this to row order, meaning that number 4 becomes 2, number 5 becomes 3, and so on.
The order should be left to right and top to bottom (reading order). How can I do that?
Thank you so much.
f=f';
[L num]=bwlabel(f);
L=L';
Does this solve your problem?
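Outside MATLAB, the same reading-order numbering can be obtained by scanning the image row by row and giving each component the next label when its first pixel is met. The Python sketch below illustrates that idea; it is not bwlabel itself. The name label_row_major is made up, and 8-connectivity is assumed because that is bwlabel's default.

```python
from collections import deque


def label_row_major(image):
    """Label 8-connected components of a 0/1 grid in reading order.

    Components are numbered by the position of their first pixel when
    scanning left to right, top to bottom. Returns (labels, count).
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                # Breadth-first flood fill over the 8 surrounding pixels.
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


labels, count = label_row_major([[1, 0, 1],
                                 [0, 0, 0],
                                 [1, 0, 0]])
```

In this small example the top-right pixel gets label 2 and the bottom-left pixel gets label 3, whereas a column-by-column scan would number them the other way round.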

How to convert this 7-segment decoder to Boolean expressions

How can I convert this 7-segment decoder truth table to Boolean expressions?
BCD 7-Segment decoder
A B C D a b c d e f g
0 0 0 0 0 0 0 0 0 0 1
0 0 0 1 1 0 0 1 1 1 1
0 0 1 0 0 0 1 0 0 1 0
0 0 1 1 0 0 0 0 1 1 0
0 1 0 0 1 0 0 1 1 0 0
0 1 0 1 0 1 0 0 1 0 0
0 1 1 0 0 1 0 0 0 0 0
0 1 1 1 0 0 0 1 1 1 1
1 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 1 0 0
I suggest you use a Karnaugh map.
You'll need one for each output column, so seven 4x4 tables.
There are even a few Karnaugh map generators on the web that you can use.
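As a starting point before minimising, the truth table can also be turned mechanically into unminimised sum-of-products expressions, one per output column. The Python sketch below does only that: it does not perform Karnaugh-map minimisation, and it treats the unlisted input combinations (1010 to 1111) as 0 rather than as don't-cares. The names TABLE and canonical_sop are made up for this example; X' denotes NOT X.

```python
# Truth table from the question: inputs ABCD, then outputs abcdefg.
TABLE = """\
0000 0000001
0001 1001111
0010 0010010
0011 0000110
0100 1001100
0101 0100100
0110 0100000
0111 0001111
1000 0000000
1001 0000100
"""


def canonical_sop(table, inputs="ABCD", outputs="abcdefg"):
    """Return {output: sum-of-minterms string} for every output column."""
    rows = [line.split() for line in table.strip().splitlines()]
    expressions = {}
    for k, name in enumerate(outputs):
        terms = []
        for in_bits, out_bits in rows:
            if out_bits[k] == "1":
                # One product term per row where this output is 1.
                terms.append("".join(
                    var if bit == "1" else var + "'"
                    for var, bit in zip(inputs, in_bits)
                ))
        expressions[name] = " + ".join(terms) if terms else "0"
    return expressions


for name, expr in canonical_sop(TABLE).items():
    print(f"{name} = {expr}")
```

Each expression can then be simplified with a Karnaugh map, where treating 1010 to 1111 as don't-cares usually allows larger groupings.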
