I have a data frame which houses data for a few individuals in my study. These individuals belong to one of four groups. I would like to plot each individual's curve and compare them to others in that group.
I was hoping to facet by group and then use the units argument to draw lines for each individual in a lineplot.
Here is what I have so far:
g = sns.FacetGrid(data=m, col='Sex', row='Group')
g.map(sns.lineplot, 'Time', 'residual')
The docs say that g.map passes positional arguments to lineplot in the order they appear in lineplot's signature, and units is at the end of a very long list, so I can't reach it positionally.
How can I facet a line plot and use the units argument?
Here is my data:
Subject Time predicted Concentration Group Sex residual
1 0.5 0.24 0.01 NAFLD Male -0.23
1 1.0 0.4 0.33 NAFLD Male -0.08
1 2.0 0.58 0.8 NAFLD Male 0.22
1 4.0 0.59 0.59 NAFLD Male -0.0
1 6.0 0.47 0.42 NAFLD Male -0.04
1 8.0 0.33 0.23 NAFLD Male -0.1
1 10.0 0.22 0.16 NAFLD Male -0.06
1 12.0 0.15 0.33 NAFLD Male 0.18
3 0.5 0.26 0.08 NAFLD Female -0.18
3 1.0 0.45 0.45 NAFLD Female 0.01
3 2.0 0.66 0.7 NAFLD Female 0.03
3 4.0 0.74 0.76 NAFLD Female 0.02
3 6.0 0.62 0.7 NAFLD Female 0.08
3 8.0 0.46 0.4 NAFLD Female -0.06
3 10.0 0.32 0.27 NAFLD Female -0.05
3 12.0 0.21 0.21 NAFLD Female -0.0
4 0.5 0.52 0.13 NAFLD Female -0.39
4 1.0 0.91 1.18 NAFLD Female 0.27
4 2.0 1.37 1.03 NAFLD Female -0.34
4 4.0 1.55 2.02 NAFLD Female 0.47
4 6.0 1.32 1.19 NAFLD Female -0.13
4 8.0 1.0 0.89 NAFLD Female -0.1
4 10.0 0.71 0.66 NAFLD Female -0.05
4 12.0 0.48 0.5 NAFLD Female 0.02
5 0.5 0.46 0.16 NAFLD Female -0.3
5 1.0 0.76 0.98 NAFLD Female 0.22
5 2.0 1.05 1.03 NAFLD Female -0.02
5 4.0 1.03 1.06 NAFLD Female 0.03
5 6.0 0.8 0.77 NAFLD Female -0.03
5 8.0 0.57 0.5 NAFLD Female -0.07
5 10.0 0.4 0.42 NAFLD Female 0.02
5 12.0 0.27 0.33 NAFLD Female 0.06
6 0.5 1.08 1.02 NAFLD Female -0.06
6 1.0 1.53 1.66 NAFLD Female 0.13
6 2.0 1.67 1.52 NAFLD Female -0.16
6 4.0 1.3 1.44 NAFLD Female 0.14
6 6.0 0.94 0.94 NAFLD Female -0.0
6 8.0 0.68 0.63 NAFLD Female -0.05
6 10.0 0.49 0.36 NAFLD Female -0.13
6 12.0 0.35 0.48 NAFLD Female 0.13
7 0.5 0.5 0.34 Control Female -0.16
7 1.0 0.81 0.84 Control Female 0.04
7 2.0 1.08 1.17 Control Female 0.1
7 4.0 1.0 0.99 Control Female -0.01
7 6.0 0.73 0.65 Control Female -0.08
7 8.0 0.5 0.49 Control Female -0.01
7 10.0 0.33 0.37 Control Female 0.04
7 12.0 0.22 0.25 Control Female 0.03
8 0.5 0.44 0.37 Control Male -0.06
8 1.0 0.67 0.74 Control Male 0.07
8 2.0 0.82 0.8 Control Male -0.03
8 4.0 0.72 0.72 Control Male 0.01
8 6.0 0.54 0.54 Control Male -0.0
8 8.0 0.4 0.38 Control Male -0.02
8 10.0 0.29 0.31 Control Male 0.02
8 12.0 0.21 0.21 Control Male 0.0
9 0.5 0.51 0.26 Control Female -0.25
9 1.0 0.86 0.66 Control Female -0.21
9 2.0 1.23 1.62 Control Female 0.39
9 4.0 1.3 1.26 Control Female -0.03
9 6.0 1.07 0.94 Control Female -0.13
9 8.0 0.81 0.74 Control Female -0.07
9 10.0 0.59 0.62 Control Female 0.03
9 12.0 0.43 0.54 Control Female 0.11
10 0.5 0.81 0.82 Control Female 0.01
10 1.0 1.05 1.03 Control Female -0.02
10 2.0 1.04 1.04 Control Female -0.0
10 4.0 0.77 0.81 Control Female 0.04
10 6.0 0.55 0.52 Control Female -0.03
10 8.0 0.39 0.35 Control Female -0.04
10 10.0 0.28 0.31 Control Female 0.03
10 12.0 0.2 0.21 Control Female 0.01
11 0.5 0.08 0.07 NAFLD Male -0.01
11 1.0 0.15 0.08 NAFLD Male -0.07
11 2.0 0.24 0.13 NAFLD Male -0.11
11 4.0 0.32 0.45 NAFLD Male 0.12
11 6.0 0.33 0.38 NAFLD Male 0.05
11 8.0 0.3 0.28 NAFLD Male -0.02
11 10.0 0.25 0.23 NAFLD Male -0.02
11 12.0 0.2 0.16 NAFLD Male -0.04
12 0.5 0.72 0.75 NAFLD Female 0.03
12 1.0 0.84 0.76 NAFLD Female -0.08
12 2.0 0.8 0.77 NAFLD Female -0.03
12 4.0 0.67 0.74 NAFLD Female 0.07
12 6.0 0.56 0.65 NAFLD Female 0.09
12 8.0 0.46 0.48 NAFLD Female 0.02
12 10.0 0.38 0.34 NAFLD Female -0.05
12 12.0 0.32 0.25 NAFLD Female -0.07
13 0.5 0.28 0.07 Control Female -0.21
13 1.0 0.49 0.38 Control Female -0.1
13 2.0 0.74 0.94 Control Female 0.2
13 4.0 0.88 0.84 Control Female -0.04
13 6.0 0.77 0.79 Control Female 0.02
13 8.0 0.61 0.57 Control Female -0.03
13 10.0 0.45 0.44 Control Female -0.01
13 12.0 0.32 0.32 Control Female 0.01
14 0.5 0.26 0.04 NAFLD Female -0.22
14 1.0 0.44 0.35 NAFLD Female -0.1
14 2.0 0.64 0.84 NAFLD Female 0.19
14 4.0 0.68 0.73 NAFLD Female 0.04
14 6.0 0.54 0.45 NAFLD Female -0.1
14 8.0 0.39 0.34 NAFLD Female -0.05
14 10.0 0.26 0.26 NAFLD Female 0.01
14 12.0 0.16 0.24 NAFLD Female 0.07
15 0.5 0.3 0.11 NAFLD Male -0.19
15 1.0 0.49 0.61 NAFLD Male 0.12
15 2.0 0.67 0.68 NAFLD Male 0.01
15 4.0 0.64 0.67 NAFLD Male 0.03
15 6.0 0.48 0.42 NAFLD Male -0.06
15 8.0 0.33 0.31 NAFLD Male -0.02
15 10.0 0.22 0.26 NAFLD Male 0.04
15 12.0 0.15 0.17 NAFLD Male 0.02
16 0.5 0.16 0.05 NAFLD Male -0.12
16 1.0 0.26 0.35 NAFLD Male 0.1
16 2.0 0.33 0.32 NAFLD Male -0.01
16 4.0 0.28 0.27 NAFLD Male -0.01
16 6.0 0.19 0.17 NAFLD Male -0.02
16 8.0 0.12 0.13 NAFLD Male 0.01
16 10.0 0.07 0.09 NAFLD Male 0.02
16 12.0 0.05 0.05 NAFLD Male 0.0
17 0.5 0.32 0.16 NAFLD Female -0.16
17 1.0 0.54 0.59 NAFLD Female 0.06
17 2.0 0.74 0.78 NAFLD Female 0.04
17 4.0 0.71 0.76 NAFLD Female 0.05
17 6.0 0.53 0.43 NAFLD Female -0.1
17 8.0 0.36 0.35 NAFLD Female -0.01
17 10.0 0.23 0.25 NAFLD Female 0.02
17 12.0 0.15 0.2 NAFLD Female 0.05
18 0.5 0.49 0.18 Control Female -0.31
18 1.0 0.81 0.82 Control Female 0.01
18 2.0 1.1 1.27 Control Female 0.16
18 4.0 1.03 1.06 Control Female 0.03
18 6.0 0.72 0.65 Control Female -0.07
18 8.0 0.45 0.38 Control Female -0.07
18 10.0 0.26 0.28 Control Female 0.02
18 12.0 0.14 0.19 Control Female 0.04
19 0.5 0.15 0.04 NAFLD Female -0.11
19 1.0 0.27 0.21 NAFLD Female -0.06
19 2.0 0.43 0.43 NAFLD Female -0.01
19 4.0 0.56 0.66 NAFLD Female 0.1
19 6.0 0.54 0.52 NAFLD Female -0.02
19 8.0 0.47 0.48 NAFLD Female 0.01
19 10.0 0.38 0.38 NAFLD Female 0.0
19 12.0 0.29 0.24 NAFLD Female -0.05
20 0.5 0.38 0.07 NAFLD Female -0.31
20 1.0 0.6 0.82 NAFLD Female 0.22
20 2.0 0.75 0.79 NAFLD Female 0.04
20 4.0 0.63 0.58 NAFLD Female -0.05
20 6.0 0.44 0.39 NAFLD Female -0.05
20 8.0 0.29 0.27 NAFLD Female -0.02
20 10.0 0.19 0.23 NAFLD Female 0.04
20 12.0 0.13 0.19 NAFLD Female 0.07
21 0.5 0.37 0.28 NAFLD Male -0.09
21 1.0 0.56 0.66 NAFLD Male 0.1
21 2.0 0.68 0.64 NAFLD Male -0.04
21 4.0 0.59 0.62 NAFLD Male 0.02
21 6.0 0.45 0.43 NAFLD Male -0.02
21 8.0 0.34 0.31 NAFLD Male -0.03
21 10.0 0.26 0.29 NAFLD Male 0.03
21 12.0 0.19 0.2 NAFLD Male 0.0
22 0.5 0.28 0.21 Control Male -0.07
22 1.0 0.42 0.5 Control Male 0.08
22 2.0 0.5 0.47 Control Male -0.03
22 4.0 0.42 0.42 Control Male 0.0
22 6.0 0.31 0.32 Control Male 0.01
22 8.0 0.23 0.22 Control Male -0.01
22 10.0 0.16 0.17 Control Male 0.01
22 12.0 0.12 0.11 Control Male -0.01
23 0.5 0.46 0.18 Control Female -0.28
23 1.0 0.75 0.65 Control Female -0.1
23 2.0 1.03 1.23 Control Female 0.2
23 4.0 0.96 1.05 Control Female 0.09
23 6.0 0.67 0.58 Control Female -0.1
23 8.0 0.42 0.36 Control Female -0.06
23 10.0 0.24 0.22 Control Female -0.02
23 12.0 0.14 0.14 Control Female 0.0
24 0.5 0.2 0.14 NAFLD Male -0.06
24 1.0 0.33 0.41 NAFLD Male 0.08
24 2.0 0.44 0.4 NAFLD Male -0.04
24 4.0 0.41 0.42 NAFLD Male 0.01
24 6.0 0.31 0.31 NAFLD Male 0.0
24 8.0 0.22 0.21 NAFLD Male -0.01
24 10.0 0.15 0.17 NAFLD Male 0.02
24 12.0 0.1 0.09 NAFLD Male -0.02
25 0.5 0.28 0.05 NAFLD Female -0.23
25 1.0 0.48 0.43 NAFLD Female -0.05
25 2.0 0.7 0.82 NAFLD Female 0.12
25 4.0 0.75 0.8 NAFLD Female 0.06
25 6.0 0.6 0.56 NAFLD Female -0.03
25 8.0 0.42 0.38 NAFLD Female -0.04
25 10.0 0.28 0.28 NAFLD Female -0.0
25 12.0 0.18 0.18 NAFLD Female -0.0
26 0.5 0.65 0.38 NAFLD Female -0.27
26 1.0 1.0 1.2 NAFLD Female 0.2
26 2.0 1.23 1.26 NAFLD Female 0.03
26 4.0 1.0 0.98 NAFLD Female -0.02
26 6.0 0.67 0.59 NAFLD Female -0.08
26 8.0 0.43 0.42 NAFLD Female -0.01
26 10.0 0.27 0.33 NAFLD Female 0.06
26 12.0 0.17 0.22 NAFLD Female 0.05
27 0.5 0.1 0.07 NAFLD Male -0.02
27 1.0 0.17 0.18 NAFLD Male 0.02
27 2.0 0.24 0.23 NAFLD Male -0.01
27 4.0 0.27 0.3 NAFLD Male 0.02
27 6.0 0.24 0.22 NAFLD Male -0.01
27 8.0 0.19 0.17 NAFLD Male -0.01
27 10.0 0.14 0.16 NAFLD Male 0.01
27 12.0 0.11 0.11 NAFLD Male 0.0
28 0.5 0.23 0.16 Control Female -0.08
28 1.0 0.4 0.39 Control Female -0.01
28 2.0 0.58 0.57 Control Female -0.01
28 4.0 0.62 0.69 Control Female 0.07
28 6.0 0.49 0.46 Control Female -0.04
28 8.0 0.35 0.39 Control Female 0.04
28 10.0 0.23 0.18 Control Female -0.05
28 12.0 0.15 0.12 Control Female -0.03
29 0.5 0.33 0.24 Control Female -0.09
29 1.0 0.55 0.5 Control Female -0.05
29 2.0 0.8 0.86 Control Female 0.06
29 4.0 0.84 0.91 Control Female 0.07
29 6.0 0.66 0.58 Control Female -0.08
29 8.0 0.46 0.43 Control Female -0.03
29 10.0 0.3 0.33 Control Female 0.03
29 12.0 0.19 0.2 Control Female 0.01
30 0.5 0.23 0.19 Control Female -0.04
30 1.0 0.4 0.41 Control Female 0.01
30 2.0 0.6 0.6 Control Female -0.0
30 4.0 0.68 0.71 Control Female 0.03
30 6.0 0.58 0.56 Control Female -0.03
30 8.0 0.45 0.43 Control Female -0.02
30 10.0 0.33 0.36 Control Female 0.02
30 12.0 0.24 0.24 Control Female 0.0
31 0.5 0.36 0.31 Control Female -0.05
31 1.0 0.61 0.66 Control Female 0.05
31 2.0 0.85 0.82 Control Female -0.03
31 4.0 0.86 0.9 Control Female 0.05
31 6.0 0.65 0.62 Control Female -0.03
31 8.0 0.45 0.43 Control Female -0.02
31 10.0 0.3 0.31 Control Female 0.01
31 12.0 0.19 0.21 Control Female 0.02
32 0.5 0.24 0.14 NAFLD Male -0.09
32 1.0 0.4 0.41 NAFLD Male 0.01
32 2.0 0.56 0.61 NAFLD Male 0.04
32 4.0 0.57 0.58 NAFLD Male 0.02
32 6.0 0.43 0.39 NAFLD Male -0.04
32 8.0 0.29 0.28 NAFLD Male -0.01
32 10.0 0.19 0.2 NAFLD Male 0.01
32 12.0 0.12 0.14 NAFLD Male 0.03
33 0.5 0.17 0.05 NAFLD Male -0.12
33 1.0 0.28 0.23 NAFLD Male -0.06
33 2.0 0.42 0.56 NAFLD Male 0.14
33 4.0 0.45 0.42 NAFLD Male -0.03
33 6.0 0.36 0.33 NAFLD Male -0.03
33 8.0 0.26 0.24 NAFLD Male -0.02
33 10.0 0.18 0.21 NAFLD Male 0.03
33 12.0 0.12 0.14 NAFLD Male 0.02
34 0.5 0.09 0.1 NAFLD Male 0.01
34 1.0 0.16 0.19 NAFLD Male 0.03
34 2.0 0.25 0.23 NAFLD Male -0.03
34 4.0 0.32 0.32 NAFLD Male -0.0
34 6.0 0.32 0.3 NAFLD Male -0.02
34 8.0 0.28 0.3 NAFLD Male 0.02
34 10.0 0.24 0.25 NAFLD Male 0.02
34 12.0 0.2 0.18 NAFLD Male -0.02
35 0.5 0.15 0.02 NAFLD Female -0.13
35 1.0 0.27 0.14 NAFLD Female -0.14
35 2.0 0.46 0.38 NAFLD Female -0.08
35 4.0 0.64 0.8 NAFLD Female 0.16
35 6.0 0.67 0.74 NAFLD Female 0.07
35 8.0 0.63 0.61 NAFLD Female -0.02
35 10.0 0.55 0.51 NAFLD Female -0.04
35 12.0 0.46 0.42 NAFLD Female -0.04
36 0.5 0.19 0.12 NAFLD Female -0.07
36 1.0 0.32 0.36 NAFLD Female 0.04
36 2.0 0.47 0.46 NAFLD Female -0.01
36 4.0 0.53 0.57 NAFLD Female 0.04
36 6.0 0.48 0.43 NAFLD Female -0.05
36 8.0 0.41 0.39 NAFLD Female -0.01
36 10.0 0.34 0.38 NAFLD Female 0.04
36 12.0 0.28 0.27 NAFLD Female -0.01
37 0.5 0.1 0.02 NAFLD Male -0.08
37 1.0 0.17 0.1 NAFLD Male -0.08
37 2.0 0.28 0.27 NAFLD Male -0.01
37 4.0 0.36 0.44 NAFLD Male 0.08
37 6.0 0.34 0.37 NAFLD Male 0.03
37 8.0 0.29 0.28 NAFLD Male -0.02
37 10.0 0.23 0.22 NAFLD Male -0.02
37 12.0 0.18 0.15 NAFLD Male -0.03
If you use FacetGrid.map_dataframe, you can pass the arguments almost as if you were calling lineplot directly, because each facet's subset is handed to the plotting function as its data= keyword argument, so the plot variables can all be given by name:
g = sns.FacetGrid(data=m, col='Sex', row='Group')
g.map_dataframe(sns.lineplot, x='Time', y='residual', units='Subject', estimator=None)
A potential workaround is to define a wrapper function:
g = sns.FacetGrid(data=m, col='Sex', row='Group')

def f(x, y, z, *args, **kwargs):
    return sns.lineplot(*args, x=x, y=y, units=z, estimator=None, **kwargs)

g.map(f, 'Time', 'residual', 'Subject')
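For completeness: newer seaborn versions (0.9+) provide relplot, which constructs the FacetGrid internally and forwards keyword arguments such as units on to lineplot, so no wrapper is needed. A minimal sketch, reusing the question's data frame m:

import seaborn as sns

# relplot builds the facet grid itself and passes extra keywords on to
# lineplot, so units and estimator can be supplied directly.
sns.relplot(data=m, x='Time', y='residual', col='Sex', row='Group',
            units='Subject', estimator=None, kind='line')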
Related
I have generated two datablocks of columnar data ($Data1 and $Data2) with set table; here are the first values of $Data1:
01/11/2021 00:15:00 15.0 70.0 0.10 1010.0 0.8 228 1.4 0.0
01/11/2021 00:30:00 14.8 71.0 0.20 1010.0 1.0 200 1.9 0.0
01/11/2021 00:45:00 14.6 73.0 0.30 1010.1 0.8 142 1.4 0.0
01/11/2021 01:00:00 14.6 74.0 0.20 1010.0 1.2 147 2.0 0.0
and of $Data2:
01/11/2021 00:15:00 14.8 56.0 0.00 1012.0 2.1 228 4.8 0.0
01/11/2021 00:30:00 14.2 59.0 0.00 1012.1 2.7 202 5.8 0.0
01/11/2021 00:45:00 14.6 62.0 0.00 1012.0 1.6 228 3.4 0.0
01/11/2021 01:00:00 14.0 65.0 0.00 1011.9 1.9 228 3.3 0.0
I have merged them into a new datablock called $Data with print, like this:
set print $Data
do for [i=1:|$Data1|-6] { print $Data1[i] }
do for [i=1:|$Data2|-6] { print $Data2[i] }
set print
I know how to plot the datablock $Data, but do you know how to edit it? (By editing I mean being able to read the numerical values, not plotting them.)
I have a text file with a listing as shown below. I want to fill in the missing numbers in the first columns, as shown.
Typical original text:
  5  401    6   5.80   0.15  -3.56   0.61  -0.02   0.96
            8  -6.11  -0.64   4.07   0.24   0.20   0.38
     402    6  -0.33   1.07   0.30   1.29  -0.00   2.04
            8   0.02  -0.59   0.21   0.50   0.22   0.79
     403    6   3.77  -0.70  -2.74  -0.94   0.20  -1.48
            8  -4.08   0.22   2.23  -0.06  -0.19  -0.09
     404    6  -2.36   0.22   1.12  -0.26   0.21  -0.41
            8   2.05   0.27  -1.63   0.20  -0.16   0.32
 16  401   16  -6.30  -0.76  -3.61   0.64  -0.22  -1.01
          227   5.99   0.27   4.12   0.47   0.15  -0.74
     402   16 -12.50   0.14  -7.52  -0.01  -0.24   0.02
          227  12.19   0.35   8.03   0.24   0.13  -0.38
     403   16  20.48   0.19  12.84  -0.29   0.03   0.46
          227 -20.79  -0.68 -13.35  -0.64  -0.18   1.02
     404   16  14.28   1.09   8.93  -0.94   0.01   1.48
          227 -14.59  -0.60  -9.44  -0.87  -0.21   1.38
709  401  374  -1.17  -0.99  25.11   0.63  -1.12  -0.11
          204   1.05   0.79 -24.91  -0.19  -0.62   0.06
     402  374  -1.55   1.09  30.49  -0.90  -1.40   0.14
          204   1.43  -0.90 -30.28   0.41  -0.79  -0.09
     403  374   1.90  -1.58   0.79   1.65   0.50  -0.21
          204  -2.02   1.38  -0.99  -0.93   0.41   0.14
     404  374   1.51   0.50   6.16   0.12   0.22   0.04
          204  -1.64  -0.31  -6.37  -0.32   0.24  -0.02
How I want it to be:
  5  401    6   5.80   0.15  -3.56   0.61  -0.02   0.96
  5  401    8  -6.11  -0.64   4.07   0.24   0.20   0.38
  5  402    6  -0.33   1.07   0.30   1.29  -0.00   2.04
  5  402    8   0.02  -0.59   0.21   0.50   0.22   0.79
  5  403    6   3.77  -0.70  -2.74  -0.94   0.20  -1.48
  5  403    8  -4.08   0.22   2.23  -0.06  -0.19  -0.09
  5  404    6  -2.36   0.22   1.12  -0.26   0.21  -0.41
  5  404    8   2.05   0.27  -1.63   0.20  -0.16   0.32
 16  401   16  -6.30  -0.76  -3.61   0.64  -0.22  -1.01
 16  401  227   5.99   0.27   4.12   0.47   0.15  -0.74
 16  402   16 -12.50   0.14  -7.52  -0.01  -0.24   0.02
 16  402  227  12.19   0.35   8.03   0.24   0.13  -0.38
 16  403   16  20.48   0.19  12.84  -0.29   0.03   0.46
 16  403  227 -20.79  -0.68 -13.35  -0.64  -0.18   1.02
 16  404   16  14.28   1.09   8.93  -0.94   0.01   1.48
 16  404  227 -14.59  -0.60  -9.44  -0.87  -0.21   1.38
709  401  374  -1.17  -0.99  25.11   0.63  -1.12  -0.11
709  401  204   1.05   0.79 -24.91  -0.19  -0.62   0.06
709  402  374  -1.55   1.09  30.49  -0.90  -1.40   0.14
709  402  204   1.43  -0.90 -30.28   0.41  -0.79  -0.09
709  403  374   1.90  -1.58   0.79   1.65   0.50  -0.21
709  403  204  -2.02   1.38  -0.99  -0.93   0.41   0.14
709  404  374   1.51   0.50   6.16   0.12   0.22   0.04
709  404  204  -1.64  -0.31  -6.37  -0.32   0.24  -0.02
I had a similar problem before, where two "cells" were regularly missing (e.g. the 402 to 404 numbers above were also missing). Back then I managed to use this script:
for /F "delims=" %%i in ('type "tmp1.txt"') do (
set row=%%i
set cnt=0
for %%l in (%%i) do set /A cnt+=1
if !cnt! equ 7 (
set row=!header! !row!
) else (
for /F "tokens=1,2" %%j in ("%%i") do set header=%%j %%k
)
echo.!row!
) >> "tmp2.txt"
Ideas, anyone?
Assuming the file is formatted with spaces (no TABs):
@echo off
setlocal enabledelayedexpansion
(for /f "delims=" %%a in (tmp1.txt) do (
set "line=%%a"
set "col1=!line:~0,3!"
set "col2=!line:~3,5!"
set "rest=!line:~8!"
if "!col1!" == " " (
set "col1=!old1!"
) else (
set "old1=!col1!"
)
if "!col2!" == " " (
set "col2=!old2!"
) else (
set "old2=!col2!"
)
echo !col1!!col2!!rest!
))>tmp2.txt
You will notice that I don't split the lines into tokens with for /f; instead I take each line as a whole and "split" it manually, to preserve the format (the length of each substring). Then I simply replace "empty values" with the saved value from the line before.
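For comparison, here is the same fill-down idea as a minimal Python sketch; the column widths 3 and 5 and the file names tmp1.txt/tmp2.txt mirror the batch example above and are assumptions about the input format:

# Slice each line at fixed offsets and reuse the previous value
# whenever a slice is blank (fill-down).
old1 = old2 = ""
with open("tmp1.txt") as src, open("tmp2.txt", "w") as dst:
    for line in src:
        col1, col2, rest = line[0:3], line[3:8], line[8:].rstrip("\n")
        if col1.strip():
            old1 = col1   # remember the last non-blank value
        else:
            col1 = old1   # blank slice: fill down
        if col2.strip():
            old2 = col2
        else:
            col2 = old2
        dst.write(col1 + col2 + rest + "\n")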
Edit in response to "I have made a mistake when pasting the original text. There are 4 (empty) spaces before all lines.":
Adapt the counting as follows (for the first "token", increase the length by 4; for the rest, add 4 to the start position and keep the lengths unchanged):
set "col1=!line:~0,7!"
set "col2=!line:~7,5!"
set "rest=!line:~12!"
and adapt if "!col1!" == "   " ( to if "!col1!" == "       " ( (from three to seven spaces)
Assume I have the following DataFrame in Scala Spark, where the years value is a String categorical representation, but there is an order in the data.
+-----+
|years|
+-----+
| 0-1|
| 1-2|
| 2-5|
| 5-10|
+-----+
I would like to create a pairwise matrix representing the "distance" for each pair of values: identical values get a score of 1, the values at the extreme ends (e.g. "0-1" and "5-10") get 0, and the remaining pairs are filled in with a linear model.
I would like the following expected result (in a DataFrame or a similar structure that lets me query a pair):
x/y,  0-1,  1-2,  2-5,  5-10
0-1,  1,    0.67, 0.33, 0
1-2,  0.67, 1,    0.67, 0.33
2-5,  0.33, 0.67, 1,    0.67
5-10, 0,    0.33, 0.67, 1
In the end, for a given pair of years I would like to retrieve the distance value. I would like to avoid hard-coding this solution; is there a better way to do it?
Simply map your labels to the points 0 = 0/g, 1/g, 2/g, ..., g/g = 1, where g is the number of gaps between adjacent labels, that is, the number of labels minus one. The similarity of a pair is then 1 minus the distance between their points; with four labels g = 3, so, for example, "0-1" and "2-5" lie at 0 and 2/3 and get a similarity of 1 - 2/3 ≈ 0.33:
def similarityMatrix[A](xs: List[A]): Map[A, Map[A, Double]] = {
val numGaps = xs.size - 1
val positions = xs.zip((0 to numGaps).map(i => i.toDouble / numGaps)).toMap
def similarity(x: A, y: A) = 1.0 - math.abs(positions(x) - positions(y))
xs.map(x => (x, xs.map(y => (y, similarity(x, y))).toMap)).toMap
}
Your example:
val ranges = List("0-1", "1-2", "2-5", "5-10")
val matrix = similarityMatrix(ranges)
for (x <- ranges) {
for (y <- ranges) {
printf("%4.2f ", matrix(x)(y))
}
println()
}
Printing the nested map gives:
1.00 0.67 0.33 0.00
0.67 1.00 0.67 0.33
0.33 0.67 1.00 0.67
0.00 0.33 0.67 1.00
Individual pairs can be queried directly, e.g. matrix("0-1")("2-5") returns 0.33. And it works for any number of labels, of course:
1.00 0.94 0.88 0.81 0.75 0.69 0.63 0.56 0.50 0.44 0.38 0.31 0.25 0.19 0.13 0.06 0.00
0.94 1.00 0.94 0.88 0.81 0.75 0.69 0.63 0.56 0.50 0.44 0.38 0.31 0.25 0.19 0.13 0.06
0.88 0.94 1.00 0.94 0.88 0.81 0.75 0.69 0.63 0.56 0.50 0.44 0.38 0.31 0.25 0.19 0.13
0.81 0.88 0.94 1.00 0.94 0.88 0.81 0.75 0.69 0.63 0.56 0.50 0.44 0.38 0.31 0.25 0.19
0.75 0.81 0.88 0.94 1.00 0.94 0.88 0.81 0.75 0.69 0.63 0.56 0.50 0.44 0.38 0.31 0.25
0.69 0.75 0.81 0.88 0.94 1.00 0.94 0.88 0.81 0.75 0.69 0.63 0.56 0.50 0.44 0.38 0.31
0.63 0.69 0.75 0.81 0.88 0.94 1.00 0.94 0.88 0.81 0.75 0.69 0.63 0.56 0.50 0.44 0.38
0.56 0.63 0.69 0.75 0.81 0.88 0.94 1.00 0.94 0.88 0.81 0.75 0.69 0.63 0.56 0.50 0.44
0.50 0.56 0.63 0.69 0.75 0.81 0.88 0.94 1.00 0.94 0.88 0.81 0.75 0.69 0.63 0.56 0.50
0.44 0.50 0.56 0.63 0.69 0.75 0.81 0.88 0.94 1.00 0.94 0.88 0.81 0.75 0.69 0.63 0.56
0.38 0.44 0.50 0.56 0.63 0.69 0.75 0.81 0.88 0.94 1.00 0.94 0.88 0.81 0.75 0.69 0.63
0.31 0.38 0.44 0.50 0.56 0.63 0.69 0.75 0.81 0.88 0.94 1.00 0.94 0.88 0.81 0.75 0.69
0.25 0.31 0.38 0.44 0.50 0.56 0.63 0.69 0.75 0.81 0.88 0.94 1.00 0.94 0.88 0.81 0.75
0.19 0.25 0.31 0.38 0.44 0.50 0.56 0.63 0.69 0.75 0.81 0.88 0.94 1.00 0.94 0.88 0.81
0.13 0.19 0.25 0.31 0.38 0.44 0.50 0.56 0.63 0.69 0.75 0.81 0.88 0.94 1.00 0.94 0.88
0.06 0.13 0.19 0.25 0.31 0.38 0.44 0.50 0.56 0.63 0.69 0.75 0.81 0.88 0.94 1.00 0.94
0.00 0.06 0.13 0.19 0.25 0.31 0.38 0.44 0.50 0.56 0.63 0.69 0.75 0.81 0.88 0.94 1.00
I want to sort a file based on the values in columns 2-8.
Essentially I want ascending order based on the highest value that appears on the line in any of those fields, ignoring columns 1, 9 and 10. I.e. the line containing the highest value should be the last line of the file, the line with the 2nd-largest value should be the 2nd-last line, etc. If the next number in ascending order appears on multiple lines (like A/B), I don't care in which order they get printed.
I've looked at using sort but can't figure out an easy way to do what I want.
I'm a bit stumped; any ideas?
Input:
#1 2 3 4 5 6 7 8 9 10
A 0.00 0.00 0.01 0.23 0.19 0.07 0.26 0.52 0.78
B 0.00 0.00 0.02 0.26 0.19 0.09 0.20 0.56 0.76
C 0.00 0.00 0.02 0.16 0.20 0.22 2.84 0.60 3.44
D 0.00 0.00 0.02 0.29 0.22 0.09 0.28 0.62 0.90
E 0.00 0.00 0.90 0.09 0.18 0.05 0.24 1.21 1.46
F 0.00 0.00 1.06 0.03 0.04 0.01 0.00 1.13 1.14
G 0.00 0.00 1.11 0.10 0.31 0.08 0.64 1.60 2.25
H 0.00 0.00 1.39 0.03 0.04 0.01 0.01 1.47 1.48
I 0.00 0.00 1.68 0.16 0.55 0.24 5.00 2.63 7.63
J 0.00 0.00 6.86 0.52 1.87 0.59 12.79 9.83 22.62
K 0.00 0.00 7.26 0.57 2.00 0.64 11.12 10.47 21.59
Expected output:
#1 2 3 4 5 6 7 8 9 10
A 0.00 0.00 0.01 0.23 0.19 0.07 (0.26) 0.52 0.78
B 0.00 0.00 0.02 (0.26) 0.19 0.09 0.20 0.56 0.76
D 0.00 0.00 0.02 (0.29) 0.22 0.09 0.28 0.62 0.90
E 0.00 0.00 (0.90) 0.09 0.18 0.05 0.24 1.21 1.46
F 0.00 0.00 (1.06) 0.03 0.04 0.01 0.00 1.13 1.14
G 0.00 0.00 (1.11) 0.10 0.31 0.08 0.64 1.60 2.25
H 0.00 0.00 (1.39) 0.03 0.04 0.01 0.01 1.47 1.48
C 0.00 0.00 0.02 0.16 0.20 0.22 (2.84) 0.60 3.44
I 0.00 0.00 1.68 0.16 0.55 0.24 (5.00) 2.63 7.63
K 0.00 0.00 7.26 0.57 2.00 0.64 (11.12) 10.47 21.59
J 0.00 0.00 6.86 0.52 1.87 0.59 (12.79) 9.83 22.62
Preprocess the data: print the max of columns 2 through 8 at the start of each line, then sort, then remove the added column:
awk '
NR==1{print "x ", $0}
NR>1{
max = $2;
for( i = 3; i <= 8; i++ )
if( $i > max )
max = $i;
print max, $0
}' OFS=\\t input-file | sort -n | cut -f 2-
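The same decorate-sort-undecorate idea, sketched in Python for comparison (assuming, as above, that the data sits in input-file with a single header line):

# Sort the data lines by the maximum of fields 2-8, keeping the header first.
with open('input-file') as f:
    header, *rows = f.read().splitlines()

def row_max(line):
    # Fields 2-8 are indices 1..7 after whitespace splitting (field 1 is the label).
    return max(float(v) for v in line.split()[1:8])

print(header)
for row in sorted(rows, key=row_max):
    print(row)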
Another pure awk variant (GNU awk, since it relies on PROCINFO["sorted_in"]):
$ awk 'NR==1; # print header
NR>1{ #For other lines,
a=$2;
ai=2;
for(i=3;i<=8;i++){
if($i>a){
a=$i;
ai=i;
}
} # Find the max number in the line
$ai= "(" $ai ")"; # decoration - mark highest with ()
g[$0]=a;
}
function cmp_num_val(i1, v1, i2, v2) {return (v1 - v2);} # sorting function
END{
PROCINFO["sorted_in"]="cmp_num_val"; # assign sorting function
for (a in g) print a; # print
}' sortme.txt | column -t # column -t for formatting.
#1 2 3 4 5 6 7 8 9 10
A 0.00 0.00 0.01 0.23 0.19 0.07 (0.26) 0.52 0.78
B 0.00 0.00 0.02 (0.26) 0.19 0.09 0.20 0.56 0.76
D 0.00 0.00 0.02 (0.29) 0.22 0.09 0.28 0.62 0.90
E 0.00 0.00 (0.90) 0.09 0.18 0.05 0.24 1.21 1.46
F 0.00 0.00 (1.06) 0.03 0.04 0.01 0.00 1.13 1.14
G 0.00 0.00 (1.11) 0.10 0.31 0.08 0.64 1.60 2.25
H 0.00 0.00 (1.39) 0.03 0.04 0.01 0.01 1.47 1.48
C 0.00 0.00 0.02 0.16 0.20 0.22 (2.84) 0.60 3.44
I 0.00 0.00 1.68 0.16 0.55 0.24 (5.00) 2.63 7.63
K 0.00 0.00 7.26 0.57 2.00 0.64 (11.12) 10.47 21.59
J 0.00 0.00 6.86 0.52 1.87 0.59 (12.79) 9.83 22.62
When I require open-uri and either active_support/core_ext/numeric/conversions.rb or active_support/core_ext/big_decimal/conversions.rb, open("http://some.website.com") becomes extremely slow.
How can I avoid this?
Ruby 2.0.0, active_support 4.0.0
EDIT
Here are the profiling results. There are a great many calls to Gem::Dependency#matching_specs (among others).
source (with conversions)
require 'open-uri'
require 'active_support/core_ext/numeric/conversions'
open 'http://stackoverflow.com'
result
% cumulative self self total
time seconds seconds calls ms/call ms/call name
21.46 0.56 0.56 22620 0.02 0.11 Gem::Dependency#matching_specs
13.41 0.91 0.35 4567 0.08 0.76 Array#each
5.36 1.05 0.14 1500 0.09 0.15 Gem::Version#<=>
4.98 1.18 0.13 3810 0.03 0.11 Gem::BasicSpecification#contains_requirable_file?
3.83 1.28 0.10 5353 0.02 0.03 Gem::StubSpecification#activated?
3.45 1.37 0.09 27604 0.00 0.00 Gem::StubSpecification#name
3.07 1.45 0.08 1382 0.06 0.33 nil#
3.07 1.53 0.08 2139 0.04 0.25 Gem::Specification#initialize
2.68 1.60 0.07 106 0.66 5.85 Kernel#gem_original_require
2.68 1.67 0.07 21258 0.00 0.00 String#===
...
source (without conversions)
require 'open-uri'
open 'http://stackoverflow.com'
result
% cumulative self self total
time seconds seconds calls ms/call ms/call name
36.36 0.08 0.08 46 1.74 10.65 Kernel#gem_original_require
22.73 0.13 0.05 816 0.06 0.09 nil#
4.55 0.14 0.01 46 0.22 11.09 Kernel#require
4.55 0.15 0.01 22 0.45 22.27 Net::BufferedIO#rbuf_fill
4.55 0.16 0.01 3 3.33 3.33 URI::Parser#split
4.55 0.17 0.01 88 0.11 0.34 Module#module_eval
4.55 0.18 0.01 133 0.08 0.45 Object#DelegateClass
4.55 0.19 0.01 184 0.05 0.11 Gem.find_unresolved_default_spec
4.55 0.20 0.01 1280 0.01 0.01 Integer#chr
4.55 0.21 0.01 1280 0.01 0.01 String#%
4.55 0.22 0.01 1381 0.01 0.01 Module#method_added
...