I have a large .txt file containing data that maps crashing inputs for various programs to their crash sites. The data is formatted as
, and each line is another crash.
I tried to run the Ruby script below to sort them automatically, but it gave no output. Any and all suggestions would be appreciated.
#!/usr/bin/ruby
fn = ARGV[0]
$result = Hash.new([])
File.open(fn, "r") do |f|
  f.readlines do |l|
    ar = l.split
    puts(ar)
    $result[ar[1]].push[ar[0]]
  end
end
$result.each do |k, v|
  puts(k)
  puts(v)
end
I think the problem is that $result = Hash.new([]) doesn't do what you want/think it does.
It returns the same array object for every non-existent key, and it never stores that array in the hash, so later requests for the same key still fall back to the one shared default.
Instead you can use the block version of Hash.new:
result = Hash.new {|hash, key| hash[key] = [] }
The version of Hash.new with a default value is more useful for avoiding nil checks when you're using a hash to maintain counts, e.g.:
counts = Hash.new(0)
counts['foo'] += 1
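A minimal sketch of both forms, plus the question's script rewritten around the block form. Two further details in that script are worth fixing (my reading, not part of the answer above): `IO#readlines` returns an array and does not yield, so the block in `f.readlines do |l| ... end` never runs, which on its own explains the missing output; and `push[ar[0]]` calls `push` with no arguments and then indexes the result, where `push(ar[0])` was intended. The helper names (`shared_default_demo`, `group_crashes`, etc.) are hypothetical, and `group_crashes` assumes the crash site is the second whitespace-separated field, as in the question's `ar[1]`.

```ruby
# Pitfall: Hash.new(obj) hands back the SAME obj for every missing key
# and never stores it, so pushing onto it mutates one shared array.
def shared_default_demo
  h = Hash.new([])
  h[:a].push(1)
  h[:b].push(2)
  [h.keys, h[:anything]] # no keys were stored; the default now holds [1, 2]
end

# Fix: the block form stores a fresh array the first time each key is read.
def block_default_demo
  h = Hash.new { |hash, key| hash[key] = [] }
  h[:a].push(1)
  h[:b].push(2)
  h
end

# A plain default is fine for counters: += builds a new Integer and assigns it.
def count_words(words)
  counts = Hash.new(0)
  words.each { |w| counts[w] += 1 }
  counts
end

# The question's script as a function. Assumes "<input> <crash site>" lines.
def group_crashes(lines)
  result = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    input, site = line.split
    next if site.nil?        # skip blank or one-field lines
    result[site].push(input) # push(...), not push[...]
  end
  result
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  File.open(ARGV[0]) do |f|
    # each_line feeds every line to group_crashes; readlines would ignore a block
    group_crashes(f.each_line).each do |site, inputs|
      puts site
      puts inputs
    end
  end
end
```

Run it as `ruby script.rb crashes.txt`; the file is closed automatically when the `File.open` block ends.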
I've sent my I18n files to be translated by a third party. Since my translator is not computer-savvy, we made a spreadsheet with the keys, which were sent in dot notation, and the translated values.
For example:
es.models.parent: "Pariente"
es.models.teacher: "Profesor"
es.models.school: "Colegio"
How can I move that into a YAML file?
UPDATE: Just like @tadman said, this already is YAML. So if you are with the, you are just fine.
So this answer focuses on the case where you want a tree (nested) structure in the YAML.
The first thing to do is transform this into a Hash.
So the previous data becomes:
tr = {}
tr["es.models.parent"] = "Pariente"
tr["es.models.teacher"] = "Profesor"
tr["es.models.school"] = "Colegio"
Then we build a deeper (nested) hash from it:
result = {} # The resulting hash
tr.each do |k, value|
  h = result
  keys = k.split(".") # This key is a concatenation of keys
  keys.each_with_index do |key, index|
    h[key] = {} unless h.has_key? key
    if index == keys.length - 1 # If it's the last element,
      h[key] = value            # then we only need to set the value
    else
      h = h[key]
    end
  end
end
require 'yaml'
puts result.to_yaml # Here it is, for your YAMLing pleasure
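The same idea can be written more compactly with `reduce` and `||=`, and because (as noted above) the translator's flat file is already valid YAML, it can be loaded directly instead of being typed in as a hash. A minimal sketch; `expand_dotted_keys` is a hypothetical helper name, and it assumes no key is both a leaf and a prefix of another key (e.g. `es.models` alongside `es.models.parent`).

```ruby
require 'yaml'

# Turn {"es.models.parent" => "Pariente"} into
# {"es" => {"models" => {"parent" => "Pariente"}}}.
def expand_dotted_keys(flat)
  flat.each_with_object({}) do |(dotted, value), result|
    *parents, leaf = dotted.split('.')
    # Walk down one nested hash per parent segment, creating it if missing.
    node = parents.reduce(result) { |hash, key| hash[key] ||= {} }
    node[leaf] = value
  end
end

flat_yaml = <<~YAML
  es.models.parent: "Pariente"
  es.models.teacher: "Profesor"
  es.models.school: "Colegio"
YAML

flat = YAML.safe_load(flat_yaml) # the dotted keys load as plain string keys
puts expand_dotted_keys(flat).to_yaml
```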
I'm creating a hash-like object for a little script that reads a file one line at a time and assigns arrays into my hash class. I get wildly different results depending on whether I subclass Hash or not, and using super changes things in ways I don't understand.
My main issue is that without subclassing Hash (< Hash) it works perfectly, but I get none of Hash's methods (e.g. to iterate over the keys and get things out of it). Subclassing Hash lets me do those things, but then it seems that only the last element of each hashed array is ever stored. So I'd appreciate any insight into how to get the methods of a subclass. The Dictionary class is a great example I found on this site and does exactly what I want, so I'm trying to understand how to use it properly.
filename = 'inputfile.txt.'
# ??? class Dictionary < Hash
class Dictionary
  def initialize()
    @data = Hash.new { |hash, key| hash[key] = [] }
  end
  def [](key)
    @data[key]
  end
  def []=(key,words)
    @data[key] += [words].flatten
    @data[key]
    # super(key,words)
  end
end
listData = Dictionary.new
File.open(filename, 'r').each_line do |line|
line = line.strip.split(/[^[:alpha:]|#|\.]/)
puts "LIST-> #{line[0]} SUB-> #{line[1]} "
listData[line[0]] = ("#{line[1]}")
end
puts '====================================='
puts listData.inspect
puts '====================================='
print listData.reduce('') {|s, (k, v)|
s << "The key is #{k} and the value is #{v}.\n"
}
If anyone understands what is going on here when subclassing Hash and has some pointers, that would be excellent.
Running without explicit < Hash:
./list.rb:34:in `<main>': undefined method `reduce' for #<Dictionary:0x007fcf0a8879e0> (NoMethodError)
That is the typical error I see when I try to iterate over my hash in any way.
Here is a sample input file:
listA billg#microsoft.com
listA ed#apple.com
listA frank#lotus.com
listB evanwhite#go.com
listB joespink#go.com
listB fredgrey#stop.com
I can't reproduce your problem using your code:
d = Dictionary.new #=> #<Dictionary:0x007f903a1adef8 @data={}>
d[4] << 5 #=> [5]
d[5] << 6 #=> [6]
d #=> #<Dictionary:0x007f903a1adef8 @data={4=>[5], 5=>[6]}>
d.instance_variable_get(:@data) #=> {4=>[5], 5=>[6]}
But of course you won't get reduce if you don't subclass or include a class/module that defines it, or define it yourself!
The way you have implemented Dictionary is bound to have problems. You should call super instead of reimplementing wherever possible. For example, simply this works:
class Dictionary < Hash
  def initialize
    super { |hash, key| hash[key] = [] }
  end
end
d = Dictionary.new #=> {}
d['answer'] << 42 #=> [42]
d['pi'] << 3.14 #=> [3.14]
d #=> {"answer"=>[42], "pi"=>[3.14]}
If you want to reimplement how and where the internal hash is stored (i.e., using @data), you'd have to reimplement at least each (since almost all Enumerable methods are built on it) plus the getters/setters. That's not worth the effort when you can just change one method instead.
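For comparison, a sketch of that non-subclass route: keep the hash in `@data`, define `each`, and `include Enumerable`, which then supplies `reduce`, `map`, `select` and the rest. The `[]=` below mirrors the question's intent (append rather than replace); this is one possible design under those assumptions, not a copy of any library class.

```ruby
# Composition instead of inheritance: the real hash lives in @data,
# and Enumerable builds reduce/map/select/... on top of #each.
class Dictionary
  include Enumerable

  def initialize
    @data = Hash.new { |hash, key| hash[key] = [] }
  end

  def [](key)
    @data[key]
  end

  # Append rather than replace, as the question's []= intended.
  def []=(key, words)
    @data[key].concat([words].flatten)
  end

  # Yields [key, values] pairs, like Hash#each; returns an Enumerator without a block.
  def each(&block)
    return enum_for(:each) unless block_given?
    @data.each(&block)
    self
  end
end

list_data = Dictionary.new
list_data['listA'] = 'a@example.com'
list_data['listA'] = 'b@example.com'
list_data['listB'] = 'c@example.com'

# reduce now exists because Enumerable is included.
report = list_data.reduce(String.new) { |s, (k, v)| s << "The key is #{k} and the value is #{v}.\n" }
print report
```

The trade-off matches the answer above: composition means writing `each` yourself, while subclassing Hash gets everything for free.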
While Andrew Marshall's answer is already correct, you could also try the alternative below.
Going from your code, we can assume that you want to create an object that acts like a Hash, but with slightly different behaviour. Hence our first line of code is:
class Dictionary < Hash
Assigning a new value to a key in the dictionary works differently here. From your example above, the assignment shouldn't replace the previous value; instead, it pushes the new value onto the existing array or, if the key doesn't exist yet, stores a new array initialized with the new value.
Here I use the << operator as shorthand for Array's push method.
Also, the method returns the value, since that's what super does (see the if branch).
def []=(key, value)
  if self[key]
    self[key] << value
    return value # here we mimic what super does
  else
    super(key, [value])
  end
end
The advantage of using our own class is that we can add new methods to it, and they will be available to all of its instances. Hence we don't need to monkeypatch the Hash class, which is considered dangerous.
def size_of(key)
  return self[key].size if self[key]
  return 0 # the case for a non-existing key
end
Now, if we combine all of the above, we get this code:
class Dictionary < Hash
  def []=(key, value)
    if self[key]
      self[key] << value
      return value
    else
      super(key, [value])
    end
  end
  def size_of(key)
    return self[key].size if self[key]
    return 0 # the case for a non-existing key
  end
end
player_emails = Dictionary.new
player_emails["SAO"] = "kirito#sao.com" # note: no << operator needed here
player_emails["ALO"] = "lyfa#alo.com"
player_emails["SAO"] = "lizbeth#sao.com"
player_emails["SAO"] = "asuna#sao.com"
player_emails.size_of("SAO") #=> 3
player_emails.size_of("ALO") #=> 1
player_emails.size_of("GGO") #=> 0
p player_emails
#=> {"SAO" => ["kirito#sao.com", "lizbeth#sao.com", "asuna#sao.com"],
#=>  "ALO" => ["lyfa#alo.com"] }
But, of course, the class definition could be replaced with this single line:
player_emails = Hash.new { |hash, key| hash[key] = [] }
# (plain Hash.new { [] } would NOT work: the block's array is
# returned but never stored, so << would be lost)
#
# Note that we won't use
#
#   player_emails[key] = value
#
# but instead
#
#   player_emails[key] << value
#
# Oh, and if you count the comments,
# it's no longer a single line.
Now that the answer is finished, I want to comment on some of your example code:
filename = 'inputfile.txt.'
# Maybe it's better to use ARGF instead,
# so you could supply the filename on the command line.
# And does the filename really end with a dot? O.o;
File.open(filename, 'r').each_line do |line|
# This line opens the file anonymously,
# then accesses each line of the file.
# Please correct me, but will the file be closed properly? I doubt it.
# Safer version:
File.open(filename, 'r') do |file|
  file.each_line do |line|
    # ...
  end
end # the file is closed when we reach here
# ARGF version:
ARGF.each_line do |line|
  # ...
end
# Inside the each_line block
line = line.strip.split(/[^[:alpha:]|#|\.]/)
# I don't know what you mean by that line,
# but using that regex results in
#
# ["listA", "", "", "billg#microsoft.com"]
#
# Hence, your example will fail, since
# line[0] == "listA" and line[1] == ""
# Also note that your regex means
#
# any character except:
# letters, '|', '#', and '.' ('|' is listed twice)
#
# If you want to split on one or more
# whitespace characters, use \s+ instead.
# Hence we could replace it with:
line = line.strip.split(/\s+/)
puts "LIST-> #{line[0]} SUB-> #{line[1]} "
# OK, is this supposed to debug the line?
# Tip: the simplest way to debug is:
#
# p line
#
# that's all.
listData[line[0]] = ("#{line[1]}")
# Why use (), then "", then #{}?
# I suggest:
listData[line[0]] = line[1]
# But to make it simpler, you could actually do this instead:
key, value = line.strip.split(/\s+/)
listData[key] = value
# Outside the block:
puts '====================================='
# OK, that's too loooooooooong...
puts '=' * 30
# or better, assign it to a variable since you use it twice
a = '=' * 30
puts a
p listData # a better way to debug
puts a
# next:
print listData.reduce('') { |s, (k, v)|
  s << "The key is #{k} and the value is #{v}.\n"
}
# Why use reduce?
# For debugging you could use `p listData` instead.
# But since you are printing it, why not iterate over
# each element and print it?
listData.each do |k, v|
  puts "The key is #{k} and the value is #{v}."
end
OK, sorry for blabbering so much. Hope it helps.
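Putting those suggestions together, the whole script might look roughly like this. It is a sketch, not the only way to do it: it reads the file(s) named on the command line through ARGF, groups the addresses with a block-form `Hash.new` instead of a custom class, and prints one line per list. `group_lists` is a hypothetical helper name.

```ruby
# Group "listName address" lines by list name.
def group_lists(lines)
  lists = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    key, value = line.split # no argument: split on runs of whitespace
    next if value.nil?      # skip blank or incomplete lines
    lists[key] << value
  end
  lists
end

# Run as: ruby list.rb inputfile.txt
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  separator = '=' * 30
  list_data = group_lists(ARGF.each_line)

  puts separator
  p list_data
  puts separator
  list_data.each do |k, v|
    puts "The key is #{k} and the value is #{v}."
  end
end
```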
I have a set of word strings which I am turning into a hash, grouped by the size of the string. I am doing this by:
hash = set.group_by(&:size)
resulting in
hash = {5=>["apple", "andys", "throw", "balls"], 7=>["bananas", "oranges"]}
I want to further group the hash values by first letter, so that the end result looks like:
hash = {5=>{"a"=>["apple", "andys"], "b"=>["balls"], "t"=>["throw"]}, 7=>{"b"=>["bananas"], "o"=>["oranges"]}}
I tried putting
hash.each_value do |value|
  value = value.group_by(&:chr)
end
after the first group_by, but that only seems to return the original hash. I am admittedly a Ruby beginner, so I'm not sure whether I could do this in one fell swoop, or exactly how the (&:size) notation works if I were asked to write it out. Thoughts?
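Reassigning the block variable value only rebinds a local name; it doesn't change what the hash stores. To update your hash, assign the new value back to each key:
hash.each do |key, value|
  hash[key] = value.group_by(&:chr)
end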
I'd keep the whole computation functional:
>> Hash[set.group_by(&:size).map { |k, vs| [k, vs.group_by(&:chr)] }]
=> {5=>{"a"=>["apple", "andys"], "t"=>["throw"], "b"=>["balls"]},
7=>{"b"=>["bananas"], "o"=>["oranges"]}}
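On Ruby 2.4 or later, `Hash#transform_values` expresses the same functional style without rebuilding the hash from pairs. The sketch also writes out the `&:size` shorthand the question asked about: `&` converts the symbol to a proc via `Symbol#to_proc`, so `group_by(&:size)` behaves like `group_by { |word| word.size }`. The word list is taken from the question.

```ruby
words = %w[apple andys throw balls bananas oranges]

# Written out: &:size is shorthand for { |word| word.size }.
by_size = words.group_by { |word| word.size }

# Regroup each size bucket by first letter (String#chr returns the first character).
nested = by_size.transform_values { |ws| ws.group_by(&:chr) }

p nested
```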
The title really, really doesn't explain things. My situation is that I would like to read a file and put its contents into a hash. Now I want to make it clever: I want to create a loop that opens every file in a directory and puts each one into a hash. The problem is that I don't know how to assign a name based on the file name, e.g.:
hash = {}
Dir.glob(path + "*") do |datafile|
  file = File.open(datafile)
  file.each do |line|
    key, value = line.chomp.split("\t")
    # Problem here is that I wish to have a different
    # hash name for every file I loop through
    hash[key] = value
  end
  file.close
end
Is this possible?
Why don't you use a hash whose keys are the file names (in your case datafile) and whose values are hashes into which you insert your data?
hash = Hash.new { |h, key| h[key] = Hash.new }
Dir.glob(path + '*') do |datafile|
  next unless File.stat(datafile).file?
  File.open(datafile) do |file|
    file.each do |line|
      key, value = line.chomp.split("\t")
      puts key, value
      # Different hash name for every file is now hash[datafile]
      hash[datafile][key] = value
    end
  end
end
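A variant of the same approach as a reusable method, assuming each line is `key<TAB>value`: it uses `File.foreach` so every file is closed automatically, strips the newline before splitting, and keys the outer hash by the file's base name rather than its full path. `load_tab_files` is a hypothetical name; use `path` instead of `File.basename(path)` if you prefer full paths as keys.

```ruby
# Build { "file name" => { key => value, ... } } for every regular file in dir.
def load_tab_files(dir)
  Dir.glob(File.join(dir, '*')).each_with_object({}) do |path, all|
    next unless File.file?(path) # skip subdirectories etc.

    table = {}
    File.foreach(path) do |line|
      key, value = line.chomp.split("\t", 2) # split at the first tab only
      table[key] = value unless key.nil?     # blank lines give no key
    end
    all[File.basename(path)] = table
  end
end
```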
You want to dynamically create variables with the names of the files you process? Try this:
Dir.glob(path + "*") do |fileName|
  File.open(fileName) do |file|
    # the variable `hash` and a variable named fileName will be
    # pointing to the same object...
    # (caveat: since Ruby 1.9, a local variable created inside eval
    # is not visible to the code outside that eval call)
    hash = eval("#{fileName} = Hash.new")
    file.each do |line|
      key, value = line.chomp.split("\t")
      hash[key] = value
    end
  end
end
Of course, you would have to make sure you turn the filename into a valid Ruby identifier first. A variable named "bla.txt" wouldn't be valid in Ruby, and neither would "path/to/bla.csv".
If you want to create a dynamic variable, you can also use #instance_variable_set (assuming that instance variables are also OK):
Dir.glob(path + "*") do |datafile|
  hash = {}
  File.open(datafile) do |file|
    file.each do |line|
      key, value = line.chomp.split("\t")
      hash[key] = value
    end
  end
  instance_variable_set("@file_#{File.basename(datafile)}", hash)
end
This only works when the filename is a valid Ruby variable name. Otherwise you would need some transformation.
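One possible transformation, as a sketch: drop the directory and last extension with `File.basename(path, '.*')`, then replace every character that isn't an ASCII letter, digit or underscore. The `@file_` prefix keeps the name valid even when the base name starts with a digit. `ivar_name_for` is a hypothetical helper; note that different files can collapse to the same name (e.g. `a-b.txt` and `a_b.csv`), in which case the later one overwrites the earlier.

```ruby
# Build a valid instance-variable name from an arbitrary file path.
def ivar_name_for(path)
  base = File.basename(path, '.*').gsub(/[^A-Za-z0-9_]/, '_')
  "@file_#{base}"
end

holder = Object.new
holder.instance_variable_set(ivar_name_for('path/to/bla.csv'), { 'k' => 'v' })
p holder.instance_variable_get(:@file_bla)
```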
Can't you just do the following?
filehash = {} # after the File.open line
...
# instead of hash[key] = value, next two lines
hash[datafile] = filehash
filehash[key] = value
You may want to use something like this:
hash[file] = {}
hash[file][key] = value
Two levels of hashes are enough now:
fileHash -> lineHash -> content.
I have a configuration class in Ruby that used to have keys like "core.username" and "core.servers", which were stored in a YAML file just like that.
Now I'm trying to change it to be nested, but without having to change all the places that refer to keys the old way. I've managed it with the reader method:
def [](key)
  namespace, *rest = key.split(".")
  target = @config[namespace]
  rest.each do |k|
    return nil unless target[k]
    target = target[k]
  end
  target
end
But when I tried the same with the writer method, it runs, but the value isn't set in the @config hash. @config is set with just a call to YAML.load_file.
I managed to get it working with eval, but that is not something I would like to keep for long.
def []=(key, value)
  namespace, *rest = key.split(".")
  target = "@config[\"#{namespace}\"]"
  rest.each do |key|
    target += "[\"#{key}\"]"
  end
  eval "#{target} = value"
  self[key]
end
Is there any decent way to achieve this, preferably without changing plugins and code throughout?
def []=(key, value)
  subkeys = key.split(".")
  lastkey = subkeys.pop
  subhash = subkeys.inject(@config) do |hash, k|
    hash[k]
  end
  subhash[lastkey] = value
end
Edit: Fixed the split.
PS: You can also replace the inject with an each-loop like in the [] method if you prefer. The important thing is that you do not call [] with the last key, but instead []= to set the value.
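If new keys may introduce namespaces that don't exist yet in the loaded YAML, the `inject` above will reach `nil` and fail. A sketch that creates missing intermediate hashes with `||=`, paired with a matching nil-safe reader; `Config` and its constructor are hypothetical stand-ins for the question's class (which fills `@config` via `YAML.load_file`), and writing below an existing non-hash value (e.g. `core.username.x` when `core.username` is a string) is not handled.

```ruby
# Dotted-key access over a nested hash, e.g. config["core.username"].
class Config
  def initialize(data = {})
    @config = data # in the question this comes from YAML.load_file
  end

  # Walk one level per segment; stop with nil if a level is missing.
  def [](key)
    key.split('.').reduce(@config) do |node, k|
      break nil unless node.is_a?(Hash)
      node[k]
    end
  end

  # Create missing parent hashes, then assign with []= on the last key.
  def []=(key, value)
    *parents, leaf = key.split('.')
    node = parents.reduce(@config) { |hash, k| hash[k] ||= {} }
    node[leaf] = value
  end
end
```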
I used recursion:
def change(hash)
  if hash.is_a? Hash
    hash.inject({}) do |acc, kv|
      acc[change(kv.first)] = change(kv.last)
      acc
    end
  else
    hash.to_s.split('.') # Do your fancy stuff here
  end
end