Force Graphviz to complain for duplicate nodes - graphviz

I just noticed that duplicate node names (even if labeled uniquely) get processed without complaint by Graphviz. For example, consider the following simple graph as rendered (with circo) in the image below:
graph {
a [label="a1"]
a [label="a2"]
b
c
d
e
a -- b;
b -- c;
a -- c;
d -- c;
e -- c;
e -- a;
}
I want the above graph to have two nodes: a1 and a2. So I know I should instantiate them with unique names (different than what I did above). But in a large graph, I may not notice that I mistakenly instantiated two different nodes with identical names. So if I do something like this, I'd like to force Graphviz to complain about it or bring it to my attention somehow, maybe with a warning or an error message.
How do I accomplish that?

All the graphviz programs silently merge nodes with duplicate names and I cannot find any way to have them produce a warning when they do that. Since we only have to find the cases where nodes are declared by themselves, however, rather than nodes that are implicitly declared when an edge is declared (in which case duplication is normal and expected), we just have to find all the node names and identify the duplicates.
If no more than one node is ever declared on a line, this could be done with the following script:
#!/bin/sh
sed -n 's/^[\t ][\t ]*\([_a-zA-Z][_a-zA-Z0-9]*\) *\(\[.*\)*;*$/\1/ p' | \
sort | uniq -c | awk '$1>1'
If we call this script findDupNodes, we can run it as follows:
$ findDupNodes <duplicates.gv
2 a
The script finds node names that are either declared by themselves or with a list of attributes that starts with [, sorts them, counts how many times each is declared (with uniq -c) and filters out the ones that are declared only once.
Multiple nodes can be declared on a single line (e.g. a; b; c; d;), but this script does not handle that case, or (probably) some other cases, most of which would require a full-blown DOT language parser.
Nevertheless, this script should find many of the duplicate node names that might find their way into hand-written graphviz scripts.
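As a rough alternative to the sed pipeline, the same idea can be sketched in Python so that several declarations on one line are also counted. This is only a heuristic, not a DOT parser: the function name find_duplicate_nodes and the regular expression are illustrative assumptions, quoted node names are ignored, and a label containing a semicolon would be split incorrectly.

```python
import re
from collections import Counter

# A statement that is just "name" or "name [attrs]". Edge statements contain
# "--" or "->" and therefore never match this pattern.
NODE_STMT = re.compile(r'^([A-Za-z_][A-Za-z_0-9]*)\s*(\[.*\])?$')

# DOT keywords (case-insensitive) that look like identifiers but are not nodes.
KEYWORDS = {"graph", "digraph", "subgraph", "node", "edge", "strict"}


def find_duplicate_nodes(dot_source):
    """Return {node_name: count} for nodes declared on their own more than once."""
    counts = Counter()
    for line in dot_source.splitlines():
        # Naive split on ';' so that "a; b; c;" yields three statements.
        for stmt in line.split(";"):
            match = NODE_STMT.match(stmt.strip())
            if match and match.group(1).lower() not in KEYWORDS:
                counts[match.group(1)] += 1
    return {name: n for name, n in counts.items() if n > 1}


EXAMPLE = '''graph {
a [label="a1"]
a [label="a2"]
b
c
d
e
a -- b;
b -- c;
a -- c;
d -- c;
e -- c;
e -- a;
}'''

print(find_duplicate_nodes(EXAMPLE))
```

Unlike the sed expression, this sketch does not require the declarations to be indented.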

Related

How can different values be used as the same parameter one after another?

I run a command in the terminal that outputs the below..
abc -> 1
bcd -> g
cde -> 2
def -> 3
efg -> 4
What you see on the left of -> represents the first parameter of another function, and what you see on the right of -> represents the second parameter of the same function.
What you see on the left is essentially paired with what is on the right, and I want to utilise this information in the other function.
I had the idea of outputting what's on the left into one file and outputting what's on the right into another file, then creating a function that reads both files, pulls out the information one by one, and uses it in the other function until all information has been used, i.e. after efg -> 4 has been used in the other function, it would stop.
My questions are:
How can different values be used as the same parameter one after another?
How can you pair two pieces of information from two separate files? So that the first of the pair is run as one parameter and the second of the pair is run as the other parameter.
Is there a better approach to this?
Shell scripts are great for processing text and running commands. There's no need for temporary files. A simple loop can do it:
some_command | while read -r param1 _ param2; do
    use_values "$param1" "$param2"
done
Here some_command is a placeholder for the command that prints the output above and use_values is a placeholder for the "other function" that uses the two values.
I used _ as a variable name for the -> bit, which is ignored. _ is a common idiom to indicate a variable that isn't used.
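For comparison, here is a minimal Python sketch of the same pairing idea, assuming the command's output has already been captured as a string. The names parse_pairs and use_values are hypothetical stand-ins: use_values plays the role of the "other function" and simply returns its two arguments joined together.

```python
def parse_pairs(output):
    """Split each 'left -> right' line into a (left, right) tuple."""
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        # Keep only lines of the form: <left> -> <right>
        if len(fields) == 3 and fields[1] == "->":
            pairs.append((fields[0], fields[2]))
    return pairs


def use_values(param1, param2):
    """Hypothetical stand-in for the function that consumes both values."""
    return f"{param1}:{param2}"


SAMPLE = """abc -> 1
bcd -> g
cde -> 2
def -> 3
efg -> 4"""

# Each pair is consumed in order; the loop ends after the last line.
results = [use_values(left, right) for left, right in parse_pairs(SAMPLE)]
print(results)
```

Because each line carries both values, the pairing is preserved without writing anything to separate files.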

Rename all contents of directory with a minimum of overhead

I am currently in the position where I need to rename all files in a directory. The chance that a file does not change name is minimal, and the chance that an old filename is the same as a new filename is considerable, making renaming conflicts likely.
Thus, simply looping over the files and renaming old->new is not an option.
The easy / obvious solution is to rename everything to have a temporary filename: old->tempX->new. Of course, to some degree, this shifts the issue, because now there is the responsibility of checking nothing in the old names list overlaps with the temporary names list, and nothing in the temporary names list overlaps with the new list.
Additionally, since I'm dealing with slow media and virus scanners that love to slow things down, I would like to minimize the actual actions on disk. Besides that, the user will be impatiently waiting to do more stuff. So if at all possible, I would like to process all files on disk in a single pass (by smartly re-ordering rename operations) and avoid exponential time shenanigans.
This last bit has brought me to a 'good enough' solution where I first create a single temporary directory inside my directory, move-rename everything into that, and finally move everything back into the old folder and delete the temporary directory. This gives me a complexity of O(2n) for disk actions.
If possible, I'd love to get the on-disk complexity to O(n), even if it comes at a cost of increasing the in-memory actions to O(99999n). Memory is a lot faster after all.
I am personally not at-home enough in graph theory, and I suspect the entire 'rename conflict' thing has been tackled before, so I was hoping someone could point me towards an algorithm that meets my needs. (And yes, I can try to brew my own, but I am not smart enough to write an efficient algorithm, and I probably would leave in a logic bug that rears its ugly head rarely enough to slip through my testing. xD)
One approach is as follows.
Suppose file A renames to B and B is a new name, we can simply rename A.
Suppose file A renames to B and B renames to C and C is a new name, we can follow the list in reverse and rename B to C, then A to B.
In general this will work providing there is not a loop. Simply make a list of all the dependencies and then rename in reverse order.
If there is a loop we have something like this:
A renames to B
B renames to C
C renames to D
D renames to A
In this case we need a single temporary file per loop.
Rename the first in the loop, A to ATMP.
Then our list of modifications becomes:
ATMP renames to B
B renames to C
C renames to D
D renames to A
This list no longer has a loop so we can process the files in reverse order as before.
The total number of file moves with this approach will be n + number of loops in your rearrangement.
Example code
So in Python this might look like this:
D = {1: 2, 2: 3, 3: 4, 4: 1, 5: 6, 6: 7, 10: 11}  # Map from start name to final name
def rename(start, dest):
    moved.add(start)
    print('Rename {} to {}'.format(start, dest))
moved = set()
filenames = set(D.keys())
tmp = 'tmp file'
for start in list(D.keys()):  # Iterate over a copy: D gains a tmp entry when a loop is found
    if start in moved:
        continue
    A = []  # List of files to rename
    p = start
    while True:
        A.append(p)
        dest = D[p]
        if dest not in filenames or dest in moved:
            # dest is a new name, or the file that held it has already been moved away
            break
        if dest == start:
            # Found a loop
            D[tmp] = D[start]
            rename(start, tmp)
            A[0] = tmp
            break
        p = dest
    for f in A[::-1]:
        rename(f, D[f])
This code prints:
Rename 1 to tmp file
Rename 4 to 1
Rename 3 to 4
Rename 2 to 3
Rename tmp file to 2
Rename 6 to 7
Rename 5 to 6
Rename 10 to 11
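The claim that the number of moves is n plus the number of loops can be checked with a small sketch. The function name count_moves is an illustrative assumption, and the sketch assumes that every new name is unique and that no file is mapped to its own name.

```python
def count_moves(mapping):
    """Return len(mapping) plus the number of cycles in the rename mapping."""
    seen = set()
    cycles = 0
    for start in mapping:
        if start in seen:
            continue
        p = start
        # Follow the chain of renames until it leaves the mapping
        # or reaches a name that has already been visited.
        while p in mapping and p not in seen:
            seen.add(p)
            p = mapping[p]
        if p == start:
            # The walk came back to where it began: one loop, one temporary move.
            cycles += 1
    return len(mapping) + cycles


EXAMPLE = {1: 2, 2: 3, 3: 4, 4: 1, 5: 6, 6: 7, 10: 11}
print(count_moves(EXAMPLE))
```

For the example mapping there are 7 files and one loop (1, 2, 3, 4), which agrees with the 8 rename lines printed above.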
Looks like you're looking at a sub-problem of topological sort.
However, it's simpler, since each file can depend on just one other file.
Assuming that there are no loops, and supposing map is the mapping from old names to new names:
In a loop, just select any file to rename, and send it to a function which does the following:
1. If its destination name is not conflicting (no file with the new name exists), just rename it.
2. Otherwise (a conflict exists):
2.1 rename the conflicting file first, by sending it to the same function recursively
2.2 rename this file
A sort-of Java pseudo code would look like this:
// map is the map, map[oldName] = newName;
HashSet<String> oldNames = new HashSet<String>(map.keys());
while (oldNames.size() > 0)
{
    String file = oldNames.first(); // Just selects any filename from the set;
    renameFile(map, oldNames, file);
}
...
void renameFile (map, oldNames, file)
{
    if (oldNames.contains(map[file]))
    {
        // The target name is still occupied: rename its current holder first
        renameFile(map, oldNames, map[file]);
    }
    OS.rename(file, map[file]); //actual renaming of file on disk
    map.remove(file);
    oldNames.remove(file);
}
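The same recursive idea can be written as a short runnable Python sketch. It only plans the order of renames instead of touching the disk, and, like the pseudo code, it assumes there are no loops (a loop would recurse forever). The name plan_renames is an illustrative assumption.

```python
def plan_renames(mapping):
    """Return (old, new) pairs in an order that never overwrites a pending file.

    Assumes the mapping contains no loops and that new names are unique.
    """
    pending = dict(mapping)  # Work on a copy so the caller's mapping is untouched
    order = []

    def visit(name):
        target = pending[name]
        if target in pending:
            # The target name is still held by a file waiting to be renamed,
            # so that file has to move out of the way first.
            visit(target)
        order.append((name, target))
        del pending[name]

    while pending:
        visit(next(iter(pending)))  # Pick any file that is still pending
    return order


print(plan_renames({"a": "b", "b": "c", "x": "y"}))
```

Each file is visited once, so the number of planned renames equals the number of entries in the mapping.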
I believe you are interested in a Graph Theory modeling of the problem so here is my take on this:
You can build the bidirectional mapping of old file names to new file names as a first stage.
Now, you compute the intersection set I of the old filenames and new filenames. Each target "new filename" appearing in this set requires the "old filename" to be renamed first. This is a dependency relationship that you can model in a graph.
Now, to build that graph, we iterate over that I set. For each element e of I:
Insert a vertex in the graph representing the file e needing to be renamed if it doesn't exist yet
Get the "old filename" o that has to be renamed into e
Insert a vertex representing o into the graph if it doesn't already exist
Insert a directed edge (e, o) in the graph. This edge means "e must be renamed before o". If that edge introduces a cycle (*), do not insert it and mark o as a file that needs to be moved-and-renamed.
You now have to iterate over the roots of your graph (vertices that have no in-edges) and perform a BFS using them as a starting point and perform the renaming each time you discover a vertex. The renaming can be a common rename or a move-and-rename depending on if the vertex was tagged.
The last step is to move the moved-and-renamed files back from their sandbox directory to the target directory.
C++ Live Demo to illustrate the graph processing.

SHELL Sorting output alphabetically

I have a script with output, for example a c d txt iso e z, that I need to sort alphabetically. These are file extensions, so I can't compile it together into one word and then split it up.
Can anyone help me?
If the name of your script is foo and it writes to stdout a string such as a c d txt iso e z, you can get the sorted list by, for instance:
sorted_output=$(foo|xargs -n 1|sort)
Of course, depending on what you are going to do with the result, it might make more sense to store it into an array.
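If the result ends up being processed outside the shell, the same split-and-sort step can be sketched in Python, where the sorted values naturally land in a list. The function name sort_extensions is an illustrative assumption, and the input is taken to be the script's output captured as a string.

```python
def sort_extensions(output):
    """Split whitespace-separated extensions and return them sorted alphabetically."""
    return sorted(output.split())


print(sort_extensions("a c d txt iso e z"))
```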

SPSS: Can I generate graphs for multiple variables using a single syntax input with the GRAPH command?

I was wondering if it was possible to create graphs for multiple variables in a single syntax command in SPSS:
GRAPH
/HISTOGRAM(NORMAL)=
As it is, I'm creating multiple graphs as such:
GRAPH
/HISTOGRAM(NORMAL)=CO
GRAPH
/HISTOGRAM(NORMAL)=Min_last
GRAPH
/HISTOGRAM(NORMAL)=Day_abs
etc etc.
If I would do something along the lines of:
GRAPH
/HISTOGRAM(NORMAL)=CO Min_last Day_abs
and it would generate a graph for each variable, I'd be pretty happy.
Anyways, let me know if you think it's possible or if I need to provide more info. Thanks for reading!
If you just want to save typing and want an independent set of graphs, you can define a macro like this.
define !H (!positional !cmdend)
!do !i !in (!1)
graph /histogram(normal)=!i.
!doend
!enddefine.
and invoke it with a list of variables.
!H salary salbegin.
The way I like to do it is to reshape the data so all three variables are in the same column using VARSTOCASES and then either panel the charts in small multiples (if you want the axes to be the same) or use SPLIT FILE to produce separate charts. Example of the split file approach below:
*Making fake data.
INPUT PROGRAM.
LOOP #i = 1 TO 100.
COMPUTE CO = RV.NORMAL(0,1).
COMPUTE Min_last = RV.UNIFORM(0,1).
COMPUTE Days_abs = RV.POISSON(5).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
*Reshaping to long.
VARSTOCASES /MAKE V FROM CO Min_last Days_abs /INDEX VLab (V).
*Split file and build separate charts.
SORT CASES BY VLab.
SPLIT FILE BY VLab.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=V
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: V=col(source(s), name("V"))
GUIDE: axis(dim(1), label("Value"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(V))), shape.interior(shape.square))
END GPL.
SPLIT FILE OFF.

Keeping track of path while recursively going through branches (more info in description )- Using Tcl

Background:
Writing a script in Tcl
Running the script using a tool called IDSBatch from linux (centos) terminal
I have a system (.rdl file) that contains blocks, groups and registers.
Blocks can contain other blocks, groups, or registers. Whereas groups can only have registers and registers stand alone.
The problem I am having is I want to print out the "address" of each register i.e the name of the block(s), group and register associated with that specific register. For example:
           ______Block (a)______
          |                     |
      Block (b)              reg(bob)
     |         |
group(tall)   group(short)
 |       |           |
reg(bill) reg(bobby) reg(burt)
In the end the output should be something along the lines of:
reg one: a.bob
reg two: a.b.tall.bill
reg three: a.b.tall.bobby
reg four: a.b.short.burt
The true problem comes from the fact that blocks can contain blocks. So the system will not always have one to three levels (one level would be Block--reg, two levels would be Block--Block--reg or Block ---group---reg and so on...)
I was leaning towards some sort of recursive solution, where I would access an element, say a block, and get all of its children (groups, blocks and regs), then use the same function to access its children (unless it's a register). This way it can take care of any combination of blocks, groups and registers, but then I'm stuck on how to keep track of the address of a specific register.
Thank you for taking the time in reading this and would appreciate any input or suggestions.
You could use a list for doing that.
Starting with an empty list, you append all address parts to it. If you come across a register, you can then construct the path from front to back. After every level of recursion, you remove the last element to get rid of the part you handled.
Example: you just came across the register bill. At that point, your list is a -> b -> tall. To get the address, you iterate over the list and concatenate the nodes together, then append bill to the resulting string.
So, your recursion function would be somewhat like
If the currently handled element is a register: reconstruct the path from the list and the register's name.
If the currently handled element is not a register: append its name to the list, call the function for each of its children with that list, and then remove the last element of the list.
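The two cases above can be sketched as follows. The sketch is in Python rather than Tcl, and the tree representation (tuples of kind, name and children) and the function name register_paths are illustrative assumptions, not the data model that IDSBatch exposes.

```python
def register_paths(node, path=None):
    """Return the dotted address of every register below node."""
    if path is None:
        path = []
    kind, name, children = node
    if kind == "reg":
        # Register: rebuild the address from the list plus the register's own name.
        return [".".join(path + [name])]
    # Block or group: push its name, recurse into the children, then pop it again.
    path.append(name)
    found = []
    for child in children:
        found.extend(register_paths(child, path))
    path.pop()
    return found


# The example system from the question, written as nested tuples.
TREE = ("block", "a", [
    ("reg", "bob", []),
    ("block", "b", [
        ("group", "tall", [("reg", "bill", []), ("reg", "bobby", [])]),
        ("group", "short", [("reg", "burt", [])]),
    ]),
])

for address in register_paths(TREE):
    print(address)
```

Because the name is removed again after each level, a single list can be shared by the whole traversal no matter how deeply the blocks are nested.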
