Pretty printing a tree data structure in Ruby

I am working on building a compiler, and within that I generate a tree that represents the source program that is passed in. I want to display this in a tree-like fashion so I can show the structure of the program to anyone interested.
Right now I just have the tree printing on a single line like this:
ProgramNode -> 'Math' BlockNode -> DeclarationNode -> ConstantDeclarationNode -> const ConstantListNode -> [m := 7, ConstantANode -> [n := StringLiteralNode -> "TEST" ]] ;
What I would like is something like this:
                 ProgramNode
                 /         \
            'Math'       BlockNode
                             |
                      DeclarationNode
                             |
                   ConstantDeclarationNode ------------------------------
                      /            \                                    |
                  const      ConstantListNode                           |
                              /  |  \       \                           |
                             m  :=  7    ConstantANode                  |
                                           /  |  \                      |
                                          n  :=  StringLiteralNode      |
                                                    /  |  \             |
                                                   "  TEST  "           ;
I haven't really worked with trees in Ruby, how are they usually represented?
Any help would be appreciated.

This kind of pretty printing requires quite a bit of math. Besides, it's unclear what should happen if the tree grows too wide for the console window. I don't know of any existing libraries that'll do this. I personally use awesome_print.
require 'awesome_print'
tree = {'ConstantDeclarationNode' => ['const',
                                      'ConstantListNode' => ['m', ':=', '7']]}
ap tree
# >> {
# >>     "ConstantDeclarationNode" => [
# >>         [0] "const",
# >>         [1] {
# >>             "ConstantListNode" => [
# >>                 [0] "m",
# >>                 [1] ":=",
# >>                 [2] "7"
# >>             ]
# >>         }
# >>     ]
# >> }
It has tons of options, check it out!
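If a vertical layout is acceptable, an indented outline sidesteps the positioning math entirely and never grows wider than the deepest path. Below is a minimal sketch of that idea, not an existing library: `Node` and `tree_lines` are names made up for this example, and the node type (a label plus an array of children) is only one common way to represent a tree in Ruby.

```ruby
# Hypothetical node type for illustration: a label plus an array of child nodes.
Node = Struct.new(:label, :children) do
  def initialize(label, children = [])
    super(label, children)
  end
end

# Return the tree as an array of lines, one node per line.
# Each level is indented and joined to its parent with ASCII connectors.
def tree_lines(node, prefix = '', last = true, root = true)
  connector = root ? '' : (last ? '`-- ' : '|-- ')
  lines = ["#{prefix}#{connector}#{node.label}"]
  # Children of a non-last node keep a vertical bar so siblings stay connected.
  child_prefix = root ? '' : prefix + (last ? '    ' : '|   ')
  node.children.each_with_index do |child, i|
    is_last = (i == node.children.size - 1)
    lines.concat(tree_lines(child, child_prefix, is_last, false))
  end
  lines
end

tree = Node.new('ProgramNode', [
  Node.new("'Math'"),
  Node.new('BlockNode', [
    Node.new('DeclarationNode', [
      Node.new('ConstantDeclarationNode', [
        Node.new('const'),
        Node.new('ConstantListNode', [Node.new('m := 7')])
      ])
    ])
  ])
])

puts tree_lines(tree)
```

Returning lines instead of printing inside the recursion keeps the function easy to test and lets the caller decide where the output goes.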

You need to check out the Graph gem. It is amazing and remarkably simple to work with. You can choose the direction of your tree and the shape of the nodes, as well as colors and so much more. I first found out about it at Rubyconf last year and was blown away.
It is as simple as:
require 'graph'
digraph do
  edge "ProgramNode", "BlockNode"
  edge "ProgramNode", "Math"
  edge "BlockNode", "DeclarationNode"
end
Obviously you would want to programmatically enter the edges :)
Here is a link to a pdf of the talk which will give more information on it:
There is also a video of the talk on Confreaks if you are interested.
Cheers,
Sean

Related

Debugging Forth using a Test Harness

I would like to use a simple test harness to test my code during debugging using the same methodology as the Forth test harness developed by John Hayes.
The concept is to define a function, say my+ and then to define simple code snippets that will test the code when Tdebug is on.
Tdebug if T{ 1 1 my+ -> 2 }T else
Is it really as simple as including tester.f and changing {> to T{ and } to }T?
I plan to omit tester.f in the production release if size is an issue.
Edit:
debug if ... then does not work because it is outside compile...
Now I need help!
If debug is true tester.f works well.
If debug is false t{ and }t must work like ( ... ) comments. How do I code this?
0 constant debug
: t{
  debug if
    ( as defined in tester.fr )
  else
    ( what goes here? )
  then
;
: }t
  debug if
    ( as defined in tester.fr )
  else
    ( and what goes here? )
  then
;
The only way is to parse the input source stream up to }t. If t{ can be nested, it becomes a little trickier — see the reference implementation of [ELSE] word.
For reference, the production-mode definition of t{ word for simple (not nested) case in standard Forth:
: t{ ( "ccc }t" -- ) \ skip up to '}t'
  begin
    begin parse-name dup while S" }t" compare 0= until exit then 2drop
    refill 0=
  until
;
However, I suggest placing the tests in separate files ("spec" files) and including those files conditionally. In that case you don't need another (production-mode) definition of the t{ word at all.
I eventually did something similar to what @ruvim suggested, including tester.f in debug mode and notester.f in production, as follows:
\ notester.fs
( include either tester.fs or notester.fs )
\ adapted from longcomment.txt
false variable verbose
: t{ ( -- ) \ Long comment
  begin
    token                                 \ Get next token
    dup 0= if 2drop cr query token then   \ If length of token is zero, end of
                                          \ line is reached.
                                          \ Fetch new line. Fetch new token.
    s" }t" compare                        \ Search for }t
  until
  immediate 0-foldable
;
: testing ( -- ) \ Talking comment.
  source verbose @
  if dup >r type cr r> >in !
  else >in ! drop [char] * emit
  then
;
t{ 1 1 + -> 2 }t \ Usage sample
I find that having the tests as usage comments in the production file assists clarity.

Vowpal Wabbit - How to get prediction probabilities from contextual bandit model on a test sample

Given a trained contextual bandit model, how can I retrieve a prediction vector on test samples?
For example, let's say I have a train set named "train.dat" containing lines formatted as below
1:-1:0.3 | a b c # <action:cost:probability | features>
2:2:0.3 | a d d
3:-1:0.3 | a b e
....
And I run below command.
vw -d train.dat --cb 30 -f cb.model --save_resume
This produces a file, 'cb.model'. Now, let's say I have a test dataset as below
| a d d
| a b e
I'd like to see probabilities as below
0.2 0.7 0.1
The interpretation of these probabilities would be that action 1 should be picked 20% of the time, action 2 - 70%, and action 3 - 10% of the time.
Is there a way to get something like this?
When you use "--cb K", the prediction is the optimal arm/action based on argmax policy, which is a static policy.
When using "--cb_explore K", the prediction output contains the probability for each arm/action. Depending on the exploration policy you pick, the probabilities are calculated differently.
If you send those lines to a daemon running your model, you'd get just that. You send a context, and the reply is a probability distribution across the number of allowed actions, presumably comprising the "recommendation" provided by the model.
Say you have 3 actions, like in your example. Start a contextual bandits daemon:
vowpalwabbit/vw -d train.dat --cb_explore 3 -t --daemon --quiet --port 26542
Then send a context to it:
| a d d
You'll get just what you want as the reply.
In the Workspace class, initialize the object and then call the method predict(prediction_type: int). The corresponding parameter values are listed below:
class PredictionType(IntEnum):
    SCALAR = pylibvw.vw.pSCALAR
    SCALARS = pylibvw.vw.pSCALARS
    ACTION_SCORES = pylibvw.vw.pACTION_SCORES
    ACTION_PROBS = pylibvw.vw.pACTION_PROBS
    MULTICLASS = pylibvw.vw.pMULTICLASS
    MULTILABELS = pylibvw.vw.pMULTILABELS
    PROB = pylibvw.vw.pPROB
    MULTICLASSPROBS = pylibvw.vw.pMULTICLASSPROBS
    DECISION_SCORES = pylibvw.vw.pDECISION_SCORES
    ACTION_PDF_VALUE = pylibvw.vw.pACTION_PDF_VALUE
    PDF = pylibvw.vw.pPDF
    ACTIVE_MULTICLASS = pylibvw.vw.pACTIVE_MULTICLASS
    NOPRED = pylibvw.vw.pNOPRED

How to capture part of a sentence that starts with a verb and finishes with nouns

I am trying to use NLTK package to capture the following chunk in a sentence:
verb + smth + noun
or it may be
verb + smth + noun + and + noun
I honestly spent an entire day messing with regexes, but still could not produce anything useful.
I was looking at this tutorial, which wasn't much help.
When you have an idea of what those somethings that come in between might be, there is a relatively easy method using NLTK's CFG. This is most certainly not the most efficient way. For a comprehensive analysis, consult chapter 8 of the NLTK book.
We have two patterns as you mentioned:
<verb> ... <noun>
<verb> ... <noun> "and" <noun>
We should assemble a list of VPs and NPs and also the range of possible words that could happen in between. As a silly little example:
import nltk

grammar = nltk.CFG.fromstring("""
% start S
S -> VP SOMETHING NP
VP -> V
SOMETHING -> WORDS SOMETHING
SOMETHING ->
NP -> N 'and' N
NP -> N
V -> 'told' | 'scolded' | 'loved' | 'respected' | 'nominated' | 'rescued' | 'included'
N -> 'this' | 'us' | 'them' | 'you' | 'I' | 'me' | 'him'|'her'
WORDS -> 'among' | 'others' | 'not' | 'all' | 'of'| 'uhm' | '...' | 'let'| 'finish' | 'certainly' | 'maybe' | 'even' | 'me'
""")
Now suppose this is the list of the sentences we want to use our filter against:
sentences = ['scolded me and you', 'included certainly uhm maybe even her and I', 'loved me and maybe many others','nominated others not even him', 'told certainly among others uhm let me finish ... us and them', 'rescued all of us','rescued me and somebody else']
As you can see, the third and the last phrases don't pass the filter. We can check whether the rest match the pattern:
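def sentence_filter(sent, grammar):
    rd_parser = nltk.RecursiveDescentParser(grammar)
    try:
        # parse() yields one tree per valid parse; any tree at all is a match
        matched = any(True for _ in rd_parser.parse(sent))
    except ValueError:
        # raised when the sentence contains a word the grammar does not cover
        matched = False
    print("SUCCESS!" if matched else "Doesn't match the filter...")

for s in sentences:
    s = s.split()
    sentence_filter(s, grammar)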
When we run this, we get this result:
>>>
SUCCESS!
SUCCESS!
Doesn't match the filter...
SUCCESS!
SUCCESS!
SUCCESS!
Doesn't match the filter...
>>>

Extracting plain text output from binary file

I am working with Graphchi's pagerank example: https://github.com/GraphChi/graphchi-cpp/wiki/Example-Apps#pagerank-easy
The example app writes a binary file with vertex information that I would like to read/convert to a plain text file (to later load into R or some other language).
The documentation states that:
"GraphChi will write the values of the edges in a binary file, which is easy to handle in other programs. Name of the file containing vertex values is GRAPH-NAME.4B.vout. Here "4B" refers to the vertex-value being a 4-byte type (float)."
The 'easy to handle' part is what I'm struggling with - I have experience with high level languages but not C++ or dealing with binary files. I have found a few things through searching stackoverflow but no luck yet in reading this file. Ideally this would be done through bash or python.
thanks very much for your help on this.
Update: hexdump graph-name.4B.vout | head -5 gives:
0000000 999a 3e19 7468 3e7f 7d2a 3e93 d8e0 3ec4
0000010 cec6 3fe4 d551 3f08 eff2 3e54 999a 3e19
0000020 999a 3e19 3690 3e8c 0080 3f38 9ea3 3ef5
0000030 b7d6 3f66 999a 3e19 10e3 3ee1 400c 400d
0000040 a3df 3e7c 999a 3e19 979c 3e91 5230 3f18
Here is example code showing how you can use GraphChi to write the output as a string:
https://github.com/GraphChi/graphchi-cpp/wiki/Vertex-Aggregators
But the file is just a simple byte array. Here is an example of how to read it in Python:
import struct
import sys

inputfile = sys.argv[1]
# Open in binary mode: the file is a raw array of 4-byte floats
with open(inputfile, "rb") as f:
    data = f.read()

print("%d bytes" % len(data))
n = len(data) // 4          # number of vertex values
s = struct.Struct("f")      # one native-endian 4-byte float
for i in range(n):
    x = s.unpack_from(data, i * 4)[0]
    print("%d %f" % (i, x))
I was having the same trouble. Luckily I work with a bunch of network engineers who helped me out! On Mac or Linux, the following command prints the 4B.vout data one line per node, with the same integer values as given in the summary file. If your file is called, e.g., filename.4B.vout, then some command-line Perl gets you:
cat filename.4B.vout | LANG= perl -0777 -e '$,="\n"; print unpack("L*",<>),"";'
Edited to add: this is for the assignments of connected component ID and community ID. The node IDs are implicit: the 1st line is the ID of the node labeled 0, the 2nd line is the node labeled 1, etc. But I am copy-pasting here, so I'm not sure how it would need to change for floats. It works great for the integer values per node.
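For the float values in the question, the same unpack idea should work with a float template in place of the integer one. As a rough sketch in Ruby, under the assumption that the file was written on a little-endian machine (the hexdump in the question is consistent with that, since the first word pair then decodes to about 0.15, but it does not prove it): `hexdump_words_to_floats` and `read_vout` are names invented for this example, not part of GraphChi.

```ruby
# Default `hexdump` output shows 16-bit words as read on the host, so on a
# little-endian machine "999a 3e19" stands for the bytes 9a 99 19 3e.
# Rebuild those bytes, then decode them as 4-byte little-endian floats.
def hexdump_words_to_floats(words)
  bytes = words.map { |w| [w.hex].pack('v') }.join  # 'v' = 16-bit little-endian
  bytes.unpack('e*')                                # 'e' = 32-bit little-endian float
end

# Read a whole .4B.vout file the same way (assumed little-endian floats).
def read_vout(path)
  File.binread(path).unpack('e*')
end

# First line of the hexdump from the question.
sample = %w[999a 3e19 7468 3e7f 7d2a 3e93 d8e0 3ec4]
values = hexdump_words_to_floats(sample)
values.each_with_index { |x, i| puts format('%d %f', i, x) }
```

Calling `read_vout('graph-name.4B.vout')` and printing each value with its index would give the one-line-per-vertex text file the question asks for.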

Calculate sum of size notated figures?

I want to calculate the total size of all .mobi files from this link (it's a good link, by the way).
As a learning exercise, I have made a 'pipe' (let's call it a) that outputs all the sizes from that page, which looks like:
189K
20M
549K
2.2M
1.9M
3.1M
2.5M
513K
260K
1.1M
2.8M
5.1M
3.7M
1.5M
5.6M
1.0M
5.6M
1.5M
4.9M
3.4M
810K
My target is to get the total size (ex: 50.50M, or 50000K) - sum of all these numbers.
My question is how to calculate that target using piping (a | some_other_commands). Answers using Python or any other language (preferably one-liners) are welcome. Thanks a lot.
For fun, a solution in shell:
a | sed -e 's/M$/ 1024 * +/' -e 's/K$/ +/' | dc -e '0' -f - -e 'p'
Perl one-liner:
a | perl -ne 's/^([\d.]+)M$/$1*1024/e;$sum+=$_; END{print $sum."K"}'
It assumes that all entries are in either Kilobytes or Megabytes as shown in OPs input.
Sigh, someone says “one-liner” and all my code-golf reflexes fire...
ruby -e 'puts $<.read.split.inject(0){ |m,e| m += e.to_f * { "M" => 1, "K" => 0.001 }[e[-1,1]]}.to_s+"M"'
or, with some shortcuts...
ruby -ne 'p @e=@e.to_f+$_.to_f*{"M"=>1,"K"=>0.001}[$_[-2,1]]'
Update: Heh, ok, hard to read. The OP asked for a "one liner". :-)
#!/usr/bin/env ruby
total = 0
while s = gets                                        # get line
  scalefactorMK = s.chomp[-1,1]                       # get the M or K
  scalefactor = { 'M'=>1,'K'=>0.001 }[scalefactorMK]  # get numeric scale
  total += s.to_f * scalefactor                       # accumulate total
end
puts "%5.1fM" % [total]
If you have Ruby (1.9+):
require 'net/http'
url = "http://hewgill.com/~greg/stackoverflow/ebooks/"
response = Net::HTTP.get_response(URI.parse(url))
data = response.body
total = 0
data.split("\n").each do |x|
  if x =~ /\.mobi/
    size = x.split(/\s+/)[-1]
    # multiplier for the unit suffix; no suffix means plain bytes
    c = case size[-1]
        when 'K' then 1024
        when 'M' then 1024 * 1024
        when 'G' then 1024 * 1024 * 1024
        else 1
        end
    total += size.to_f * c  # to_f keeps the fraction in sizes such as 2.2M
  end
end
puts "Total size: %.2f MB" % (total / (1024.0 * 1024.0))
awk (assume files less than 1K don't substantially add to the total):
a | awk '/K/ {sum += $1/1024} /M/ {sum += $1} END {printf("%.2fM\n", sum)}'
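The answers above disagree on the K-to-M factor (1024 in some, 0.001 in others), so the totals differ slightly. Here is a small Ruby sketch that keeps everything in bytes and applies one convention throughout. It assumes the listing uses binary units (1K = 1024 bytes), which the page does not state, and `UNITS` and `total_bytes` are names made up for this example.

```ruby
# Multiplier for each size suffix, assuming binary (1024-based) units.
UNITS = { 'K' => 1024, 'M' => 1024**2, 'G' => 1024**3 }.freeze

# Sum size strings such as "189K" or "2.2M" and return the total in bytes.
# An entry with no recognized suffix is treated as a plain byte count.
def total_bytes(sizes)
  sizes.sum do |s|
    s = s.strip
    s.to_f * UNITS.fetch(s[-1], 1)
  end
end

# First four sizes from the question.
sample = %w[189K 20M 549K 2.2M]
puts format('%.2fM', total_bytes(sample) / 1024.0**2)
```

To use it in the pipeline, replace `sample` with `ARGF.each_line` and run it as `a | ruby script.rb`.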

Resources