Automatic multiline labels in Graphviz? - graphviz

I'm using Graphviz to draw some graphs. I'm using labels on nodes and I can put in "\n" to force it to split the label across 2 lines. Is there some way to get Graphviz (or dot, which I'm using) to automatically see that it should split some node labels, and to make the best choice itself automagically?

Yes, HTML-like labels (<...>) support the <br/> tag, which you can use to break lines. E.g.
"A" -> "B"
[label = <1. <br/>
2. <br/>
3. <br/>
4. <br/>
.... <br/>
> color="blue" style="dashed"];
These also work when embedding Graphviz in LaTeX, where \n would not.

I've also searched for this, but I don't think it's possible in the current version. The current "solution" is to write code that automatically adds the "\n" every few characters, based on the minimum distance between nodes (nodesep attribute, if I'm not mistaken).
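As a rough illustration of that workaround, here is a minimal Python sketch that rewraps every label="..." attribute at a fixed character width. The function name wrap_dot_labels and the default width are made up for this example, and it assumes labels contain no escaped double quotes; it is not tied to nodesep or any other Graphviz attribute.

```python
import re
import textwrap

def wrap_dot_labels(dot_source, width=20):
    """Insert dot's "\\n" escape into every label="..." longer than `width`."""
    def rewrap(match):
        # textwrap.wrap breaks on whitespace and never splits inside a word
        lines = textwrap.wrap(match.group(1), width=width)
        return 'label="' + "\\n".join(lines) + '"'
    # Assumption: labels are double-quoted and contain no escaped quotes
    return re.sub(r'label="([^"]*)"', rewrap, dot_source)

src = 'a [label="a fairly long node label that needs wrapping"];'
print(wrap_dot_labels(src, width=20))
```

The result can be piped into dot like any other dot source.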

One person wrote a Perl script to achieve this. I found it in his blog: Text wrapping with dot (graphviz).
⚠ Note
This only works if the labels are in the format node [ label="node label" ]. If the nodes are declared directly (e.g. "node label") then it doesn't work.
Perl script:
#!/usr/bin/perl
use strict;
my $usage = "setdotlabelwidth [char-width] < [dotfile]";
my $width = shift() or die("Usage: $usage $!");
while(<STDIN>)
{
    if(m/label="(.*?)"/)
    {
        my $labeltext = $1;
        my @words = split(/ /, $labeltext);
        my @newtext = ();
        my $newline = "";
        foreach my $word(@words)
        {
            if( length($newline) > 0 and
                length($newline) + length($word) > $width )
            {
                push(@newtext, $newline);
                $newline = "";
            }
            $newline .= " " if( length($newline) > 0 );
            $newline .= $word;
        }
        push(@newtext, $newline) if( length($newline) > 0 );
        my $newlabel = join("\\n", @newtext);
        s/label=".*?"/label="$newlabel"/;
    }
    print;
}
Save this program as setdotlabelwidth, then simply pipe the output into GraphViz. If for example you want to set the width to 35 characters, then the command is:
./setdotlabelwidth 35 < tile-error-correction.dot | dot -Tpng -o tile-error-correction.png

(Not sure how we're supposed to deal with duplicate questions?)
dot2tex (latex + graphviz) handles text wrapping,
along with other workarounds to the typesetting limitations of graphviz.
You'll find a short example at this duplicate question,
with prescribed fixed line width.

Related

Perl: how to combine consecutive page numbers?

OS: Windows server 2012, so I don't have access to Unix utils
Activestate Perl 5.16. Sorry I cannot upgrade the OS or Perl, I'm stuck with it.
I did a google search and read about 10 pages from that, I find similar problems but not what I'm looking for.
I then did 3 searches here and found similar issues with SQL, R, XSLT, but not what I'm looking for.
I actually am not sure where to start so I don't even have code yet.
I'd like to combine consecutive page numbers into a page range. Input will be a series of numbers in an array.
Input as an array of numbers: my @a=(1,2,5)
Output as a string: 1-2, 5
Input ex: (1,2,3,5,7)
Output ex: 1-3, 5, 7
Input ex: (100,101,102,103,115,120,121)
Output ex: 100-103,115,120-121
Thank you for your help!
This is the only code I have so far.
sub procpages_old
# $aref = array ref to list of page numbers.
# $model = used for debugging.
# $zpos = used for debugging only.
{my($aref,$model,$zpos)=@_;
    my $procname=(caller(0))[3];
    my @arr=@$aref; # Array of page numbers.
    my @newarr=();
    my $i=0;
    my $np1=0; # Page 1 of possible range.
    my $np2=0; # Page 2 of possible range.
    my $p1=0; # Page number to test.
    my $p2=0;
    my $newpos=0;
    while ($i<$#arr)
    {
        $np1=$arr[$i];
        $np2=getdata($arr[$i+1],'');
        $p1=$np1;
        $p2=$np2;
        while ($p2==($p1+1)) # Consecutive page numbers?
        {
            $i++;
            $p1=$a[$i];
            $p2=getdata($a[$i+1],'');
        }
        $newarr[$newpos]=$np1.'-'.$p2;
        $newpos++;
        # End of loop
        $i++;
    }
    my $pages=join(', ',@arr);
    return $pages;
}
That's called an intspan. Use Set::IntSpan::Fast::XS.
use Set::IntSpan::Fast::XS qw();
my $s = Set::IntSpan::Fast::XS->new;
$s->add(100,101,102,103,115,120,121);
$s->as_string; # 100-103,115,120-121
This seems to do what you want.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
while (<DATA>) {
    chomp;
    say rangify(split /,/);
}
sub rangify {
    my @nums = @_;
    my @range;
    for (0 .. $#nums) {
        if ($_ == 0 or $nums[$_] != $nums[$_ - 1] + 1) {
            push @range, [ $nums[$_] ];
        } else {
            push @{$range[-1]}, $nums[$_];
        }
    }
    for (@range) {
        if (@$_ == 1) {
            $_ = $_->[0];
        } else {
            $_ = "$_->[0]-$_->[-1]";
        }
    }
    return join ',', @range;
}
__DATA__
1,2,5
1,2,3,5,7
100,101,102,103,115,120,121
The rangify() function builds an array of arrays. It traverses your input list and if a number is just one more than the previous number then it adds the new number to the second-level array that's currently at the end of the first-level array. If the new number is not sequential, it adds a new second-level array at the end of the first level array.
Having built this data structure, we walk the first-level array, looking at each of the second-level arrays. If the second level array contains only one element then we know it's not a range, so we overwrite the value with the single number from the array. If it contains more than one element, then it's a range and we overwrite the value with the first and last elements separated with a hyphen.
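For comparison, the same two-pass idea can be sketched in Python. This is only an illustration, not a translation of the Perl above: it keeps just the first and last page of each run instead of every number, and it assumes the input is already sorted and free of duplicates.

```python
def rangify(pages):
    """Collapse sorted page numbers into a string such as '1-3,5,7'."""
    runs = []  # one [first, last] pair per run of consecutive pages
    for page in pages:
        if runs and page == runs[-1][1] + 1:
            runs[-1][1] = page          # extend the current run
        else:
            runs.append([page, page])   # start a new run
    # A run whose first and last page coincide is a single page, not a range
    return ",".join(
        str(first) if first == last else f"{first}-{last}"
        for first, last in runs
    )

for sample in ([1, 2, 5], [1, 2, 3, 5, 7], [100, 101, 102, 103, 115, 120, 121]):
    print(rangify(sample))
```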
So I managed to adjust this code to work for me. Pass your array of numbers into procpages() which will then call num2range().
######################################################################
# In:
# Out:
sub num2range
{
    local $_ = join ',' => @_;
    s/(?<!\d)(\d+)(?:,((??{$++1}))(?!\d))+/$1-$+/g;
    tr/-,/, /;
    return $_;
}
######################################################################
# Concatenate consecutive page numbers in array.
# In: array like (1,2,5,7,99,100,101)
# Out: string like "1-2, 5, 7, 99-101"
sub procpages
{my($aref,$model,$zpos)=@_;
    my $procname=(caller(0))[3];
    my @arr=@$aref;
    my $pages=num2range(@arr);
    $pages=~s/\,/\-/g; # Change comma to dash.
    $pages=~s/ /\, /g; # Change space to comma and space.
    #$pages=~s/\,/\, /g;
    return $pages;
}
You probably have the best solution already with the Set::IntSpan::Fast::XS module, but assuming you want to take the opportunity to learn Perl, here's another Perl-ish way to do it.
use strict;
use warnings;
my @nums = (1,2,5);
my $prev = -999; # assuming you only use positive values, this will work
my @out = ();
for my $num (@nums) {
    # if we are continuing a sequence, add a hyphen unless we did last time
    if ($num == $prev + 1) {
        push (@out, '-') unless (@out and $out[-1] eq '-');
    }
    else {
        # if we are breaking a sequence (@out ends in '-'), add the previous number first
        if (@out and $out[-1] eq '-') {
            push(@out, $prev);
        }
        # then add the current number
        push (@out, $num);
    }
    # track the previous number
    $prev = $num;
}
# add the final number if necessary to close the sequence
push(@out, $prev) if (@out and $out[-1] eq '-');
# join all values with comma
my $pages = join(',', @out);
# flatten the ',-,' sequence to a single '-'
$pages =~ s/,-,/-/g;
print "$pages\n";
This is not super elegant or short, but is very simple to understand and debug.

Deleting lines with more than 30% lowercase letters

I'm trying to process some data but I'm unable to find a working solution for my problem. I have a file which looks like:
>ram
cacacacacacacacacatatacacatacacatacacacacacacacacacacacacaca
cacacacacacacaca
>pam
GAATGTCAAAAAAAAAAAAAAAAActctctct
>sam
AATTGGCCAATTGGCAATTCCGGAATTCaattggccaattccggaattccaattccgg
and many lines more....
I want to filter out all the sequences, together with their headers (a header starts with >), whose sequence string (the lines not starting with >) contains 30 percent or more lowercase letters. A sequence string can span multiple lines.
So after command xy the output should look like:
>pam
GAATGTCAAAAAAAAAAAAAAAAActctctct
I tried some mix of a while loop for reading the input file and then working with awk, grep, sed but there was no good outcome.
Here's one idea, which sets the record separator to ">" to treat each header with its sequence lines as a single record.
Because the input starts with a ">", which causes an initial empty record, we guard the computation with NR > 1 (record number greater than one).
To count the number of characters we add the lengths of all the lines after the header. To count the number of lower-case characters, we save the string in another variable and use gsub to replace all the lower-case letters with nothing --- just because gsub returns the number of substitutions made, which is a convenient way of counting them.
Finally we check the ratio and print or not (adding back the initial ">" when we do print).
BEGIN { RS = ">" }
NR > 1 {
    total_cnt = 0
    lower_cnt = 0
    for (i=2; i<=NF; ++i) {
        total_cnt += length($i)
        s = $i
        lower_cnt += gsub(/[a-z]/, "", s)
    }
    ratio = lower_cnt / total_cnt
    # $0 still ends with its own newline, so printf avoids an extra blank line
    if (ratio < 0.3) printf ">%s", $0
}
$ awk -f seq.awk seq.txt
>pam
GAATGTCAAAAAAAAAAAAAAAAActctctct
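The same record-by-record logic, sketched in Python for anyone who finds the awk hard to follow. The names lowercase_share and filter_records are invented for this example; it assumes every record starts with a ">" header line and skips records that have no sequence lines at all.

```python
from itertools import chain

def lowercase_share(seq):
    """Fraction of characters in seq that are lowercase (0.0 for an empty string)."""
    return sum(ch.islower() for ch in seq) / len(seq) if seq else 0.0

def filter_records(lines, threshold=0.3):
    """Yield (header, body_lines) for records whose lowercase share is below threshold."""
    header, body = None, []
    for line in chain(lines, [">"]):      # trailing sentinel header flushes the last record
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None and body and lowercase_share("".join(body)) < threshold:
                yield header, body
            header, body = line, []
        else:
            body.append(line)

sample = [
    ">ram",
    "cacacacacacacacacatatacacatacacatacacacacacacacacacacacacaca",
    "cacacacacacacaca",
    ">pam",
    "GAATGTCAAAAAAAAAAAAAAAAActctctct",
    ">sam",
    "AATTGGCCAATTGGCAATTCCGGAATTCaattggccaattccggaattccaattccgg",
]
for header, body in filter_records(sample):
    print(header)
    print("\n".join(body))
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.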
Or, with GNU awk (gensub and RT are gawk extensions):
awk '{n=length(gensub(/[A-Z]/,"","g"));if(NF && n/length*100 < 30)print a $0;a=RT}' RS='>[a-z]+\n' file
RS='>[a-z]+\n' - Sets the record separator to the line containing '>' and name
RT - This value is set by what is matched by RS above
a=RT - save previous RT value
n=length(gensub(/[A-Z]/,"","g")); - get the length of lower case chars
if(NF && n/length*100 < 30)print a $0; - check we have a value and that the percentage is less than 30 for lower case chars
awk '/^>/{b=B;gsub( /[A-Z]/,"",b);
if( length( b) < length( B) * 0.3) print H "\n" B
H=$0;B="";next}
{B=( (B != "") ? B "\n" : "" ) $0}
END{ b=B;gsub( /[A-Z]/,"",b);
if( length( b) < length( B) * 0.3) print H "\n" B
}' YourFile
Quick and dirty; a function would better suit the duplicated test-and-print step.
Nowadays I would not use sed or awk anymore for anything longer than 2 lines.
#! /usr/bin/perl
use strict; # Force variable declaration.
use warnings; # Warn about dangerous language use.
sub filter # Declare a subroutine, a function called `filter`.
{
my ($header, $body) = @_; # Give the first two function arguments the names header and body.
my $lower = $body =~ tr/a-z//; # Count the translation of the characters a-z to nothing.
print $header, $body, "\n" # Print header, body and newline,
unless $lower / length ($body) > 0.3; # unless lower characters have more than 30%.
}
my ($header, $body); # Declare two variables for header and body.
while (<>) { # Loop over all lines from stdin or a file given in the command line.
if (/^>/) { # If the line starts with >,
filter ($header, $body) # call filter with header and body,
if defined $header; # if header is defined, which is not the case at the beginning of the file.
($header, $body) = ($_, ''); # Assign the current line to header and an empty string to body.
} else {
chomp; # Remove the newline at the end of the line.
$body .= $_; # Append the line to body.
}
}
filter ($header, $body); # Filter the last record.

Script to find words inside a given word from wordlist

I have a dictionary with 250K words (txt file). For each of those words I would like to come up with a script that will throw all possible anagrams (each anagram should also be in the dictionary).
Ideally the script would output in this format:
word1: anagram1,anagram2...
word2: anagram1,anagram2...
Any help would be greatly appreciated.
Inspired by this, I would suggest you create a Trie.
Then, the trie with N levels will have all possible anagrams (where N is the length of the original word). Now, to get different sized words, I suggest you simply traverse the trie, ie. for all 3 letter subwords, just make all strings that are 3 levels deep in the trie.
I'm not really sure of this, because I didn't test this, but it's an interesting challenge, and this suggestion would be how I would start tackling it.
Hope it helps a little =)
It must be anagram week.
I'm going to refer you to an answer I submitted to a prior question: https://stackoverflow.com/a/12811405/128421. It shows how to build a hash for quick searches of words that have common letters.
For your purpose, of finding substrings/inner-words, you will also want to find the possible inner words. Here's how to quickly locate unique combinations of letters of varying sizes, based on a starting word:
word = 'misses'
word_letters = word.downcase.split('').sort
3.upto(word.length) { |i| puts word_letters.combination(i).map(&:join).uniq }
eim
eis
ems
ess
ims
iss
mss
sss
eims
eiss
emss
esss
imss
isss
msss
eimss
eisss
emsss
imsss
eimsss
Once you have those combinations, split them (or don't do the join) and do look-ups in the hash my previous answer built.
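Putting the two pieces together, a minimal Python sketch of the idea might look like this. The function names and the tiny word list are made up for the example, and the index is only a guess at what the hash from the linked answer contains (sorted letters as key, matching words as value).

```python
from collections import defaultdict
from itertools import combinations

def build_index(words):
    """Map each sorted-letter signature to the dictionary words that share it."""
    index = defaultdict(list)
    for w in words:
        index["".join(sorted(w.lower()))].append(w)
    return index

def inner_words(word, index, min_len=3):
    """Return dictionary words spellable from a subset of the letters of word."""
    letters = sorted(word.lower())
    found = set()
    for size in range(min_len, len(letters) + 1):
        # combinations of a sorted list are themselves sorted, so each one
        # is already a valid signature; set() drops the repeated ones
        for combo in set(combinations(letters, size)):
            found.update(index.get("".join(combo), []))
    found.discard(word)                  # a word is not its own inner word
    return sorted(found)

words = ["misses", "miss", "mess", "seism", "sis", "tree"]  # hypothetical mini dictionary
index = build_index(words)
print(inner_words("misses", index))
```

The number of combinations grows quickly with word length, so for long words this will be slow, but each lookup is a single hash access.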
What I tried so far in Perl :
use strict;
use warnings;
use Algorithm::Combinatorics qw(permutations);
die "First argument should be a dict\n" unless $ARGV[0] or die $!;
open my $fh, "<", $ARGV[0] or die $!;
my @arr = <$fh>;
my $h = {};
map { chomp; $h->{lc($_)} = [] } @arr;
foreach my $word (@arr) {
    $word = lc($word);
    my $chars = [ ( $word =~ m/./g ) ];
    my $it = permutations($chars);
    while ( my $p = $it->next ) {
        my $str = join "", @$p;
        if ($str ne $word && exists $h->{$str}) {
            push @{ $h->{$word} }, $str
                unless grep { /^$str$/ } @{ $h->{$word} };
        }
    }
    if (@{ $h->{$word} }) {
        print "$word\n";
        print "\t$_\n" for @{ $h->{$word} };
    }
}
END{ close $fh; }
There may be room for speed improvements, but it works.
I use the French dict from the words Arch Linux package.
EXAMPLE
$ perl annagrammes.pl /usr/share/dict/french
abaissent
absentais
abstenais
abaisser
baissera
baserais
rabaisse
(...)
NOTE
To install the Perl module:
cpan -i Algorithm::Combinatorics
h = Hash.new { |hash, key| hash[key] = [] }
array_of_words.each{|w| h[w.downcase.chars.sort].push(w)}
h.values

aggregate totals when key changes in Perl

I have an input file with the following format
ant,1
bat,1
bat,2
cat,4
cat,1
cat,2
dog,4
I need to aggregate the col2 for each key (column1) so the result is:
ant,1
bat,3
cat,7
dog,4
Other considerations:
Assume that the input file is sorted
The input file is pretty large (about 1M rows), so I don't want to use an array and take up memory
Each input line should be processed as we read it, and move to the next line
I need to write the results to an outFile
I need to do this in Perl, but a pseudo-code or algorithm would help just as fine
Thanks!
This is what I came up with... want to see if this can be written better/elegant.
open infile, outFile
prev_line = <infile>;
print_line = $prev_line;
while(<>){
curr_line = $_;
@prev_cols=split(',', $prev_line);
@curr_cols=split(',', $curr_line);
if ( $prev_cols[0] eq $curr_cols[0] ){
$prev_cols[1] += curr_cols[1];
$print_line = "$prev_cols[0],$prev_cols[1]\n";
$print_flag = 0;
}
else{
$print outFile "$print_line";
$print_flag = 1;
$print_line = $curr_line;
}
$prev_line = $curr_line;
}
if($print_flag = 1){
print outFile "$curr_line";
}
else{
print outFile "$print_line";
}
#!/usr/bin/perl
use warnings;
use strict;
use integer;
my %a;
while (<>) {
my ($animal, $n) = /^\s*(\S+)\s*,\s*(\S+)/;
$a{$animal} += $n if defined $n;
}
print "$_,${a{$_}}\n" for sort keys %a;
This short code affords you the chance to learn Perl's excellent hash facility, as %a. Hashes are central to Perl. One really cannot write fluent Perl without them.
Observe incidentally that the code exercises Perl's interesting autovivification feature. The first time a particular animal is encountered in the input stream, no count exists, so Perl implicitly assumes a pre-existing count of zero. Thus, the += operator does not fail, even though it seems that it should. It just adds to zero in the first instance.
On the other hand, it may happen that not only the number of data but the number of animals is so large that one would not like to store the hash %a. In this case, one can still calculate totals, provided only that the data are sorted by animal in the input, as they are in your example. In this case, something like the following might suit (though regrettably it is not nearly so neat as the above).
#!/usr/bin/perl
use warnings;
use strict;
use integer;
my $last_animal = undef;
my $total_for_the_last_animal = 0;
sub start_new_animal ($$) {
my $next_animal = shift;
my $n = shift;
print "$last_animal,$total_for_the_last_animal\n"
if defined $last_animal;
$last_animal = $next_animal;
$total_for_the_last_animal = $n;
}
while (<>) {
my ($animal, $n) = /^\s*(\S+)\s*,\s*(\S+)/;
if (
defined($n) && defined($animal) && defined($last_animal)
&& $animal eq $last_animal
) { $total_for_the_last_animal += $n; }
else { start_new_animal $animal, $n; }
}
start_new_animal undef, 0;
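The asker also said pseudo-code would do, so here is the same streaming idea as a short Python sketch. It relies on the input being sorted by key, holds only one group's running total in memory, and uses an in-memory string in place of the real input file; aggregate_sorted is just a name chosen for the example.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

def aggregate_sorted(rows):
    """Yield (key, total) per run of equal keys; rows must already be sorted by key."""
    for key, group in groupby(rows, key=itemgetter(0)):
        # group is consumed lazily, so only the current key's rows are in flight
        yield key, sum(int(value) for _, value in group)

# Stand-in for the real input file (assumption: two comma-separated columns, no blank lines)
data = io.StringIO("ant,1\nbat,1\nbat,2\ncat,4\ncat,1\ncat,2\ndog,4\n")
for key, total in aggregate_sorted(csv.reader(data)):
    print(f"{key},{total}")
```

Replacing the StringIO object with an open file handle, and print with a write to the output file, gives the line-at-a-time behaviour asked for.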
Use Perl’s awk mode.
-a
turns on autosplit mode when used with a -n or -p. An implicit split command to the @F array is done as the first thing inside the implicit while loop produced by the -n or -p.
perl -ane 'print pop(@F), "\n";'
is equivalent to
while (<>) {
    @F = split(' ');
    print pop(@F), "\n";
}
}
An alternate delimiter may be specified using -F.
All that’s left for you is to accumulate the sums in a hash and print them.
$ perl -F, -lane '$s{$F[0]} += $F[1];
END { print "$_,$s{$_}" for sort keys %s }' input
Output:
ant,1
bat,3
cat,7
dog,4
It's trivial in perl. Loop on the file input. Split the input line on comma. For each key in column one keep a hash to which you add the value in column two. At the end of the file print the list of hash keys and their values. It can be done in one line but that would obfuscate the algorithm.

Use wordlist for Codeigniter captcha

I have a wordlist of dictionary words in .txt format. How can I use this with the captcha_helper instead of random characters? I've already extended the captcha_helper file but am having issues integrating my wordlist.txt file for use.
After doing some poking, I found a solution:
// This is the modified version in captcha_helper.php
if($word == ''){
$wordsfile = '../words.php';
$fp = fopen($wordsfile, 'r');
$length = strlen(fgets($fp));
$line = rand(1, (filesize($wordsfile)/$length)-2);
if(fseek($fp, $length*$line) == -1) return FALSE;
$word = trim(fgets($fp));
fclose($fp);
}
But I noticed that sometimes the last letter would get cut off. Is there a way to make sure that the first and last letter never get placed outside of the bounding box?
In this case you can use a function that passes your words to the script at random for display.
I think this is the better option.
instead to work wit

Resources