Related
This is a follow up of question:
Calculating the total time in an online meeting
Suppose there was a meeting and the meeting record is saved in a CSV file. How to write a bash script/awk script to find out the total amount of time for which an employee stayed online. One employee may leave and rejoin the meeting, all his/her online time should be calculated.
What I did is as follows, but got stuck on how to compare one record with all other record, and add the total time of each joined and left pairs of a person.
cat tst.awk
BEGIN { FS=" *, *"; OFS=", " }
NR==1 { next }
$1 in joined {
jt = time2secs(joined[$1])
lt = time2secs($3)
totSecs[$1] += (lt - jt)
delete joined[$1]
next
}
{ joined[$1] = $3 }
END {
for (name in totSecs) {
print name, secs2time(totSecs[name])
}
}
function time2secs(time, t) {
split(time,t,/:/)
return (t[1]*60 + t[2])*60 + t[3]
}
function secs2time(secs, h,m,s) {
h = int(secs / (60*60))
m = int((secs - (h*60*60)) / 60)
s = int(secs % 60)
return sprintf("%02d:%02d:%02d", h, m, s)
}
The start_time and end_time of the meeting are given at command line such as:
$ ./script.sh input.csv 10:00:00 13:00:00
Only the time between startTime (10:00:00) and EndTime (13:00:00) should be considered. The persons who have joined but not left, must be considered as left at Endtime and their online time should be added also. I tried but no desired result.
The output should look like this: (Can be stored in an output file)
Bob, 02:44:00
John, 00:41:00
David, 02:50:00
James, 01:39:30
The contents of the CSV file is as follows:
Employee_name, Joined/Left, Time
David, joined, 09:40:00
David, left, 10:20:00
David, joined, 10:30:00
John, joined, 10:00:00
Bob, joined, 10:01:00
James, joined, 10:00:30
Bob, left, 10:20:00
James, left, 11:40:00
John, left, 10:41:00
Bob, joined, 10:35:00
$ cat tst.awk
BEGIN { FS=" *, *"; OFS=", " }
NR==1 { next }
{ names[$1] }
$3 < beg { next }
$3 >= end { exit }
$2 == "joined" {
joined[$1] = $3
}
($2 == "left") && ($1 in joined) {
jt = time2secs(joined[$1])
lt = time2secs($3)
totSecs[$1] += (lt - jt)
delete joined[$1]
next
}
END {
for (name in names) {
if (name in joined) {
jt = time2secs(joined[name])
lt = time2secs(end)
totSecs[name] += (lt - jt)
}
print name, secs2time(totSecs[name])
}
}
function time2secs(time, t) {
split(time,t,/:/)
return (t[1]*60 + t[2])*60 + t[3]
}
function secs2time(secs, h,m,s) {
h = int(secs / (60*60))
m = int((secs - (h*60*60)) / 60)
s = int(secs % 60)
return sprintf("%02d:%02d:%02d", h, m, s)
}
.
$ awk -v beg='10:00:00' -v end='13:00:00' -f tst.awk file
James, 01:39:30
David, 02:30:00
Bob, 02:44:00
John, 00:41:00
Let's say we have a tree like the one below. Is there an algorithm that given 2 nodes find the path that connects them. For example, given (A,E) it will return [A,B,E], or given (D,G) it will return [D,B,A,C,G]
A
/ \
B C
/ \ / \
D E F G
You will need to have a tree implementation where a child node has a link to its parent.
Then for both nodes you can build the path from the node to the root, just by following the parent link.
Then compare the two paths, starting from the ends of the paths (where the root is): as long as they are the same, remove that common node from both paths.
Finally you are left with two diverting paths. Reverse the second, and join the two paths, putting the last removed node in between the two.
Here is an implementation in JavaScript:
function getPathToRoot(a) {
if (a.parent == null) return [a];
return [a].concat(getPathToRoot(a.parent));
}
function findPath(a, b) {
let p = getPathToRoot(a);
let q = getPathToRoot(b);
let common = null;
while (p.length > 0 && q.length > 0 && p[p.length-1] == q[q.length-1]) {
common = p.pop();
q.pop();
}
return p.concat(common, q.reverse());
}
// Create the example tree
let nodes = {};
for (let label of "ABCDEFG") {
nodes[label] = { label: label, parent: null };
}
nodes.B.parent = nodes.C.parent = nodes.A;
nodes.D.parent = nodes.E.parent = nodes.B;
nodes.F.parent = nodes.G.parent = nodes.C;
// Perform the algorithm
let path = findPath(nodes.E, nodes.G);
// Output the result
console.log("Path from E to G is:");
for (let node of path) {
console.log(node.label);
}
I'm trying to parse a file that contains lines in a hierarchical structure. For example the file:
a b c
a b d
a B C
A B C
indicates that a contains b and B, that b contains c and d, that B contains C. A contains a different B which contains its own C.
This is much like a list of files.
I want to format this in a hierarchical bracketed way like:
a {
b {
c
d
}
B {
C
}
}
A {
B {
C
}
}
I couldn't come up with a decent way to do this. I thought that AWK would be my best bet, but came up short with how to actually implement it.
Context
My input is actually a list of files. I can of course separate the fields by spaces if needed, or keep them with /. The files are unordered and generated from a code-base during compile-time via inspection. My desired output is going to be a graphviz DOT file containing each file in its own subgraph.
Thus for the input:
a/b/c
a/b/d
a/B/C
A/B/C
the output would be
digraph {
subgraph cluster_a {
label = a
subgraph cluster_b {
label = b
node_1 [label=c]
node_2 [label=d]
}
subgraph cluster_B {
label = B
node_3 [label=C]
}
}
subgraph cluster_A {
label = A
subgraph cluster_B {
label = B
node_4 [label=C]
}
}
}
Does anybody know how I could get this processing done? I'm open to other tools as well, not just AWK.
NOTE: Depth is not fixed, though I could pre-compute the maximum depth if necessary. Not all leaves will be at the same depth either.
I'm open to other tools as well, not just AWK.
I offer this Python solution:
import sys
INDENT = ' '
NODE_COUNT = 1
def build(node, l):
x = l[0]
if x not in node:
node[x] = {}
if len(l) > 1:
build(node[x], l[1:])
def indent(s, depth):
print('%s%s' % (INDENT * depth, s))
def print_node(label, value, depth):
if len(value.keys()) > 0:
indent('subgraph cluster_%s {' % label, depth)
indent(' label = %s' % label, depth)
for child in value:
print_node(child, value[child], depth+1)
indent('}', depth)
else:
global NODE_COUNT
indent('node_%d [label=%s]' % (NODE_COUNT, label), depth)
NODE_COUNT += 1
def main():
d = {}
for line in sys.stdin:
build(d, [x.strip() for x in line.split()])
print('digraph {')
for k in d.keys():
print_node(k, d[k], 1)
print('}')
if __name__ == '__main__':
main()
Result:
$ cat rels.txt
a b c
a b d
a B C
A B C
$ cat rels.txt | python3 make_rels.py
digraph {
subgraph cluster_a {
label = a
subgraph cluster_b {
label = b
node_1 [label=c]
node_2 [label=d]
}
subgraph cluster_B {
label = B
node_3 [label=C]
}
}
subgraph cluster_A {
label = A
subgraph cluster_B {
label = B
node_4 [label=C]
}
}
}
If the depth is fixed at 3 levels
gawk -F/ '
{f[$1][$2][$3] = 1}
END {
n = 0
print "digraph {"
for (a in f) {
print " subgraph cluster_" a " {"
print " label = " a
for (b in f[a]) {
print " subgraph cluster_" b " {"
print " label = " b
for (c in f[a][b]) {
printf " node_%d [label=%s]\n", ++n, c
}
print " }"
}
print " }"
}
print "}"
}
' file
digraph {
subgraph cluster_A {
label = A
subgraph cluster_B {
label = B
node_1 [label=C]
}
}
subgraph cluster_a {
label = a
subgraph cluster_B {
label = B
node_2 [label=C]
}
subgraph cluster_b {
label = b
node_3 [label=c]
node_4 [label=d]
}
}
}
If the depth is arbitrary, things get complicated.
I usually find the answers to my questions by looking around here (I'm glad stackovergflow exists!), but I haven't found the answer to this one... I hope you can help me :)
I am using the projection.matrix() function from the "popbio" package to create transition matrices. In the function, you have to specify the "stage" and "fate" (both categorical variables), and the "fertilities" (a numeric column).
Everything works fine, but I would like to apply the function to 1:n fertility columns within the data frame, and get a list of matrices generated from the same categorical variables with the different fertility values.
This is how my data frame looks like (I only include the variables I am using for this question):
stage.fate = data.frame(replicate(2, sample(0:6,40,rep=TRUE)))
stage.fate$X1 = as.factor(stage.fate$X1)
stage.fate$X2 = as.factor(stage.fate$X2)
fertilities = data.frame(replicate(10,rnorm(40, .145, .045)))
df = cbind(stage.fate, fertilities)
colnames(df)[1:2]=c("stage", "fate")
prefix = "control"
suffix = seq(1:10)
fer.names = (paste(prefix ,suffix , sep="."))
colnames(df)[3:12] = c(fer.names)
Using
library(popbio)
projection.matrix(df, fertility=control.1)
returns a single transition matrix with the fertility values incorporated into the matrix.
My problem is that I would like to generate a list of matrices with the different fertility values in one go (in reality the length of my data is >=300, and the fertility columns ~100 for each of four different treatments...).
I will appreciate your help!
-W
PS This is how the function in popbio looks like:
projection.matrix =
function (transitions, stage = NULL, fate = NULL, fertility = NULL,
sort = NULL, add = NULL, TF = FALSE)
{
if (missing(stage)) {
stage <- "stage"
}
if (missing(fate)) {
fate <- "fate"
}
nl <- as.list(1:ncol(transitions))
names(nl) <- names(transitions)
stage <- eval(substitute(stage), nl, parent.frame())
fate <- eval(substitute(fate), nl, parent.frame())
if (is.null(transitions[, stage])) {
stop("No stage column matching ", stage)
}
if (is.null(transitions[, fate])) {
stop("No fate column matching ", fate)
}
if (missing(sort)) {
sort <- levels(transitions[, stage])
}
if (missing(fertility)) {
fertility <- intersect(sort, names(transitions))
}
fertility <- eval(substitute(fertility), nl, parent.frame())
tf <- table(transitions[, fate], transitions[, stage])
T_matrix <- try(prop.table(tf, 2)[sort, sort], silent = TRUE)
if (class(T_matrix) == "try-error") {
warning(paste("Error sorting matrix.\n Make sure that levels in stage and fate columns\n match stages listed in sort option above.\n Printing unsorted matrix instead!\n"),
call. = FALSE)
sort <- TRUE
T_matrix <- prop.table(tf, 2)
}
T_matrix[is.nan(T_matrix)] <- 0
if (length(add) > 0) {
for (i in seq(1, length(add), 3)) {
T_matrix[add[i + 0], add[i + 1]] <- as.numeric(add[i +
2])
}
}
n <- length(fertility)
F_matrix <- T_matrix * 0
if (n == 0) {
warning("Missing a fertility column with individual fertility rates\n",
call. = FALSE)
}
else {
for (i in 1:n) {
fert <- tapply(transitions[, fertility[i]], transitions[,
stage], mean, na.rm = TRUE)[sort]
F_matrix[i, ] <- fert
}
}
F_matrix[is.na(F_matrix)] <- 0
if (TF) {
list(T = T_matrix, F = F_matrix)
}
else {
T_matrix + F_matrix
}
}
<environment: namespace:popbio>
My question was answered via ResearchGate by Caner Aktas
Answer:
fertility.list<-vector("list",length(suffix))
names(fertility.list)<-fer.names
for(i in suffix) fertility.list[[i]]<-projection.matrix(df,fertility=fer.names[i])
fertility.list
Applying popbio “projection.matrix” to multiple fertilities and generate list of matrices?. Available from: https://www.researchgate.net/post/Applying_popbio_projectionmatrix_to_multiple_fertilities_and_generate_list_of_matrices#5578524f60614b1a438b459b [accessed Jun 10, 2015].
I use igraph.
I want yo find all the possible paths between 2 nodes.
For the moment, there doesn't seem to exist any function to find all the paths between 2 nodes in igraph
I found this subject that gives code in python:
All possible paths from one node to another in a directed tree (igraph)
I tried to port it to R but I have some little problems. It gives me the error:
Error of for (newpath in newpaths) { :
for() loop sequence incorrect
Here is the code:
find_all_paths <- function(graph, start, end, mypath=vector()) {
mypath = append(mypath, start)
if (start == end) {
return(mypath)
}
paths = list()
for (node in graph[[start]][[1]]) {
if (!(node %in% mypath)){
newpaths <- find_all_paths(graph, node, end, mypath)
for (newpath in newpaths){
paths <- append(paths, newpath)
}
}
}
return(paths)
}
test <- find_all_paths(graph, farth[1], farth[2])
Here is a dummy code taken from the igrah package, from which to get a sample graph and nodes:
actors <- data.frame(name=c("Alice", "Bob", "Cecil", "David",
"Esmeralda"),
age=c(48,33,45,34,21),
gender=c("F","M","F","M","F"))
relations <- data.frame(from=c("Bob", "Cecil", "Cecil", "David",
"David", "Esmeralda"),
to=c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
same.dept=c(FALSE,FALSE,TRUE,FALSE,FALSE,TRUE),
friendship=c(4,5,5,2,1,1), advice=c(4,5,5,4,2,3))
g <- graph.data.frame(relations, directed=FALSE, vertices=actors)
farth <- farthest.nodes(g)
test <- find_all_paths(graph, farth[1], farth[2])
thanks!
If anyone sees where the problem, that would be of great help...
Mathieu
I also attempted to translate that same solution from Python to R and came up with the following which seems to do the job for me:
# Find paths from node index n to m using adjacency list a.
adjlist_find_paths <- function(a, n, m, path = list()) {
path <- c(path, list(n))
if (n == m) {
return(list(path))
} else {
paths = list()
for (child in a[[n]]) {
if (!child %in% unlist(path)) {
child_paths <- adjlist_find_paths(a, child, m, path)
paths <- c(paths, child_paths)
}
}
return(paths)
}
}
# Find paths in graph from vertex source to vertex dest.
paths_from_to <- function(graph, source, dest) {
a <- as_adj_list(graph, mode = "out")
paths <- adjlist_find_paths(a, source, dest)
lapply(paths, function(path) {V(graph)[unlist(path)]})
}