Can we store multiple objects in a file? - ruby

I am already familiar with "How can I save an object to a file?"
But what if we have to store multiple objects (say hashes) in a file?
I tried appending YAML.dump(hash) to a file from various locations in my code, but the difficult part is reading it back. Since a YAML dump can span many lines, do I have to parse the file myself? That would only complicate the code. Is there a better way to achieve this?
PS: The same issue exists with Marshal.dump, so I prefer YAML since it's more human-readable.

YAML.dump creates a single YAML document. If you have several YAML documents together in a file, you have a YAML stream, so when you appended the results of several calls to YAML.dump you ended up with a stream.
If you read this back with YAML.load you will only get the first document. To get all the documents back you can use YAML.load_stream, which gives you an array with one entry per document.
An example:
require 'yaml'

f = File.open('data.yml', 'w')
YAML.dump({:foo => 'bar'}, f)  # writes the first document
YAML.dump({:baz => 'qux'}, f)  # appends a second document to the same file
f.close
After this data.yml will look like this, containing two separate documents:
---
:foo: bar
---
:baz: qux
You can now read it back like this:
all_docs = YAML.load_stream(File.open('data.yml'))
This will give you an array like [{:foo=>"bar"}, {:baz=>"qux"}].
If you don’t want to load all the documents into an array in one go you can pass a block to load_stream and handle each document as it is parsed:
YAML.load_stream(File.open('data.yml')) do |doc|
  # handle the doc here
end
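Since the question mentions appending dumps from various places in the code: opening the file in append mode works the same way, with each dump adding one more document to the stream (a small sketch, not from the original answer):
File.open('data.yml', 'a') { |f| YAML.dump({:quux => 'corge'}, f) }  # appends a third document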

You could save multiple objects by creating a delimiter (something that marks where one object ends and the next one begins). You would then process the file in two steps:
read the file, splitting it around each delimiter
use YAML to restore the hashes from each chunk
But this would be a bit cumbersome, and there is a much simpler solution. Let's say you have three hashes to save:
student = { first_name: "John" }
restaurant = { location: "21 Jump Street" }
order = { main_dish: "Happy Meal" }
You can simply put them in an array and then dump them:
objects = [student, restaurant, order]
dump = YAML.dump(objects)
You can restore your objects easily:
saved_objects = YAML.load(dump)
saved_student = saved_objects[0]
Depending on the relationship between your objects, you may prefer to use a Hash instead of an array to save them (so that you can refer to them by name instead of relying on their order).
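For instance, a quick sketch of the Hash variant (the file name and keys here are just illustrative):
require 'yaml'

objects = { 'student' => student, 'restaurant' => restaurant, 'order' => order }
File.write('objects.yml', YAML.dump(objects))

# On newer Rubies (Psych 4) this may need permitted_classes: [Symbol],
# because the inner hashes above use symbol keys.
saved_objects = YAML.load_file('objects.yml')
saved_student = saved_objects['student']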

Related

Using Kiba: Is it possible to define and run two pipelines in the same file? Using an intermediate destination & a second source

My processing has a "condense" step before needing further processing:
Source: Raw event/analytics logs of various users.
Transform: Insert each row into a hash according to UserID.
Destination / Output: An in-memory hash like:
{
  "user1" => [event, event, ...],
  "user2" => [event, event, ...]
}
Now, I've got no need to store these user groups anywhere; I'd just like to carry on processing them. Is there a common pattern with Kiba for using an intermediate destination? E.g.
# First pass
source EventSource # 10,000 rows of single events
transform { |row| insert_into_user_hash(row) }
@users = Hash.new
destination UserDestination, users: @users

# Second pass
source UserSource, users: @users # 100 rows of grouped events, created in the previous step
transform { |row| analyse_user(row) }
I'm digging around the code and it appears that all transforms in a file are applied to the source, so I was wondering how other people have approached this, if at all. I could save to an intermediate store and run another ETL script, but was hoping for a cleaner way - we're planning lots of these "condense" steps.
To directly answer your question: you cannot define 2 pipelines inside the same Kiba file. You can have multiple sources or destinations, but the rows will all go through each transform, and through each destination too.
That said, you have quite a few options before resorting to splitting into 2 pipelines, depending on your specific use case.
I'm going to email you to ask a few more detailed questions in private, in order to properly reply here later.
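For what it's worth, one way to get the effect from a single plain Ruby script (rather than one Kiba ETL file) is to build two jobs programmatically with Kiba.parse and run them back to back, handing the condensed hash from the first job to the second. This is only a sketch, not part of the original answer; EventSource, UserSource and analyse_user are the asker's hypothetical classes/helpers, and it assumes each raw row exposes a :user_id key:
require 'kiba'

users = Hash.new { |h, k| h[k] = [] }

condense = Kiba.parse do
  # First pass: group raw events by user into the shared hash
  source EventSource
  transform { |row| users[row[:user_id]] << row; row }
end

analyse = Kiba.parse do
  # Second pass: one row per user group, fed from the hash built above
  source UserSource, users: users
  transform { |row| analyse_user(row) }
end

Kiba.run(condense)
Kiba.run(analyse)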

Write to a ruby file using ruby

Alright, so what I have is a Ruby file that takes an input and writes it to another Ruby file. I do not want to write it to a text file, because I am trying to insert this item into a Hash that can later be accessed in another run of the program, which can only be achieved by writing the info to a text file or another Ruby file. In this case I want to write it into another Ruby file. Here's the first file:
test_text = gets.chomp
to_write_to = File.open("rubylib.rb", "a")
test_text = "hobby => #{test_text},"
to_write_to.puts test_text
This appends the given info at the BOTTOM of the file. The other file, rubylib.rb, is this:
user_info = {
  "name" => "bob",
  "favorite_color" => "red"
}
I have a threefold question:
1) Is it possible to add test_text to the hash BEFORE the closing bracket?
2) Using this method, will the rubylib.rb file, when run, parse the added text as code, or as something else?
3) Is there a better way to do this?
What I am trying to do is actually physically write the new data into the Hash so that it is still there the next time the file is run, to store data about the user, because if I add it the normal way it will be lost the next time the file is run. Is there a way to store data between runs of a Ruby file without writing to a text file?
I've done the best I can to give you the info you need and explain the situation as best I can. If you need clarification or more info, please leave a comment and I'll try and get back to you by commenting on that.
Thanks for the help
You should use YAML for this.
Here's how you could create a .yml file with the data you used in your example:
require "yaml"
user_info = { "name" => "bob", "favorite_color" => "red" }
File.write("user_info.yml", user_info.to_yaml)
This creates a file that looks like this:
---
name: bob
favorite_color: red
On a subsequent execution of your program, you can load the .yml file and you'll get back the same Hash that you started with:
user_info = YAML.load_file("user_info.yml")
# => { "name" => "bob", "favorite_color" => "red" }
And you can add new items to the Hash and save it again:
user_info["hobby"] = "fishing"
File.write("user_info.yml", user_info.to_yaml)
Now the file has these contents:
---
name: bob
favorite_color: red
hobby: fishing
Use a database, even SQLite, and it'll let you store data for multiple sessions without any sort of encoding. Writing to a file as you are is really not scalable or practical. You'll slam into some real problems quickly with it.
I'd recommend looking at Sequel and its associated documentation for how to easily work with databases. That's a much more scalable approach and will save you a lot of headaches as you grow your code.
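To give a feel for that route, here is a minimal sketch using Sequel with SQLite (requires the sequel and sqlite3 gems; the table and column names are just illustrative):
require 'sequel'

DB = Sequel.sqlite('user_info.db')   # a file on disk, so it persists between runs

DB.create_table?(:user_info) do      # create_table? only creates the table if it is missing
  primary_key :id
  String :key
  String :value
end

settings = DB[:user_info]
settings.insert(key: 'hobby', value: 'fishing') if settings.where(key: 'hobby').empty?
settings.each { |row| puts "#{row[:key]}: #{row[:value]}" }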

Weird JSON parsing issues with Ruby

I'm downloading content from a webpage that seems to be in JSON. It is a large file with the following format:
"address1":"123 Street","address2":"Apt 1","city":"City","state":"ST","zip":"xxxxx","country":"US"
There are about 1000 of these entries, where each entry is contained within brackets. When I download the page using RestClient.get (open-uri for some reason was throwing an HTTP 500 error), the data is in the following format:
\"address\1":\"123 Street\",\"address2\":\"Apt 1\",\"city\":\"City\",\"state\":\"ST\",\"zip\":\"xxxxx\",\"country\":\"US\"
When I then use the JSON class:
parsed = JSON.parse(data_out)
it completely scrambles both the order of the entries within the data structure and the order of the fields within each entry, for example:
"address1"=>"123 Street", "city"=>"City", "country"=>"US", "address2"=>"Apt 1"
If instead I use
data_j=data_out.to_json
then I get:
\\\"address\\\1":\\\"123 Street\\\",\\\"address2\\\":\\\"Apt 1\\\",\\\"city\\\":\\\"City\\\",\\\"state\\\":\\\"ST\\\",\\\"zip\\\":\\\"xxxxx\\\",\\\"country\\\":\\\"US\\\"
Further, only using the JSON class seems to allow me to select the entries I want:
parsed[1]["address1"]
=> "123 Street"
data_j[1]["address1"]
TypeError: can't convert String into Integer
from (irb):17:in `[]'
from (irb):17
from :0
Any idea what's going on? I guess since the JSON commands are working I can use them, but it is disconcerting that it's scrambling the entries and the order of the objects.
Although the data appears ordered in string form, it represents an unordered dataset. The line:
parsed = JSON.parse(data_out)
which you use is the correct way to convert the string form into something usable in Ruby. I cannot see the full structure from your example, so I don't know whether the top level is an array or an id-based hash. I suspect the latter, since you say it becomes unordered when you view it from Ruby. Therefore, if you knew which part of the address you were interested in, you might have code like this:
# Writes all the cities
parsed.each do |id, data|
  puts data["city"]
end
If the outer structure is an array, you'd do this:
# Writes all the cities
parsed.each do |data|
  puts data["city"]
end
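As for the extra backslashes: calling to_json on a string that already contains JSON does not parse it; it just re-encodes the whole thing as one JSON string literal, which is why data_j is a String and data_j[1]["address1"] fails. A tiny illustration (the sample data here is made up):
require 'json'

data_out = '[{"address1":"123 Street","city":"City"}]'

parsed = JSON.parse(data_out)   # => an Array of Hashes
parsed[0]["address1"]           # => "123 Street"

data_j = data_out.to_json       # => the original text wrapped in quotes, with the inner quotes escaped
data_j.class                    # => String, so data_j[1]["address1"] cannot fetch a field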

Parsing one large array into several sub-arrays

I have a list of adjectives (found here) that I would like to be the basis for a "random_adjective(category)" method.
I'm really just taking a stab at this, as my first real attempt at a useful program.
Step 1: Open file, remove formatting. No problem.
list = File.read('adjectivelist')
list = list.gsub(/\n/, " ")
The next step is to break the string up by category:
list.split(" ")
Now I have an array of every word in the file. Neat. The ones with a tilde before them represent the category names.
Now I would like to break up this LARGE array into several smaller ones, based on category.
I need help with the syntax here; the pseudocode would be something like this:
Scan the array for an element which begins with a tilde.
Now create a new array based on the name of that element sans the tilde, and ALSO place this "category name" into the "categories" array. Now pull all the elements from the main array, and pop them into the sub-array, until you meet another tilde. Then repeat the process until there are no more elements in the array.
Finally, I would pull a random word from the category named in the parameter. If there was no category name matching the parameter, it would return false and exit (this is simply in case I want to add more categories later).
Tips would be appreciated.
You may want to go back and split the first time around like this:
categories = list.split(" ~")
Then each list item will start with the category name. This saves you from having to go back through your data structure as you suggest. Consider that a tip: sometimes it's better to re-think the start of a coding problem than to head inexorably forwards.
The structure you are reaching towards is probably a Hash, where the keys are category names, and the values are arrays of all the matching adjectives. It might look like this:
{
  'category' => [ 'word1', 'word2', 'word3' ]
}
So you might do this:
words_in_category = Hash.new
categories.each do |category_string|
  cat_name, *words = category_string.split(" ")
  words_in_category[cat_name] = words
end
Finally, to pick a random element from an array, Ruby provides a very useful method, sample, so you can just do this:
words_in_category[ chosen_category ].sample
. . . assuming chosen_category contains the string name of an actual category. I'll leave it to you to figure out how to put this all together and handle errors, bad input, etc.
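For completeness, one possible way to finish that off (a sketch, not part of the original answer), returning false when the category doesn't exist, as the question asks:
def random_adjective(words_in_category, category)
  words = words_in_category[category]
  words ? words.sample : false
end

random_adjective(words_in_category, "colour")  # => e.g. "red", or false ("colour" is just an example category name)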
Use slice_before:
categories = list.split(" ").slice_before(/~\w+/)
This creates a sub-array for each word starting with ~, containing that word and all the following words up to (but not including) the next match.
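Putting that together with the Hash idea from the first answer, a sketch (assuming the file looks roughly like "~colors red blue ~sizes small large"; the real adjectivelist format may differ):
list = File.read('adjectivelist').gsub(/\n/, " ")

words_in_category = list.split(" ")
                        .slice_before { |w| w.start_with?("~") }
                        .each_with_object({}) do |(name, *words), hash|
                          hash[name.sub(/\A~/, "")] = words
                        end

words_in_category["colors"].sample  # a random adjective from the (illustrative) "colors" category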
If this file format is your own invention and you are free to change it, then I recommend saving the data in YAML or JSON format and reading it back when needed. There are libraries for this, so there is no need to worry about the parsing mess. Don't spend time reinventing the wheel.

Ruby CSV, how to write two variables and an array to the same row?

Hey guys, I'm writing a Ruby program that reads a database of food items and recipes in CSV format and writes it back to a file. I'm having issues writing to a CSV file correctly.
I want to write an object's attributes to a CSV file:
csv_text = CSV.open("FoodDB1.txt", "w") do |i|
  @@dataList.each do |j|
    if j.get_type == "b"
      i << [j.name, j.get_type, j.cal]
    elsif j.get_type == "r"
      i << [j.name, j.get_type, j.print_bFood]
    end
  end
end
I have two types of objects, a basic food and a recipe. Both are stored in the dataList array. I check each object for its type; if it's a basic food, writing it is easy since it is just three simple fields. If it is a recipe, I write the name, the type, and the basic foods that make up that recipe.
The issue I'm having is with this line:
i << [j.name,j.get_type,j.print_bFood]
It prints out the name of the recipe, the type (whether it's a basic food or a recipe), and then finally the list of foods in the recipe. That last part is where I'm having issues.
bFood is an array of basic foods that is stored in the object, and I'm having trouble adding it to the CSV row. I tried making a method (print_bFood) that returns a string of the combined array using .join(","), but because of the commas in the string, when CSV writes it to a file it is wrapped in quotes:
"PB&J Sandwich,r,"Jelly,Peanut butter,Bread slice, Bread slice""
I want it to look like this
"PB&J Sandwich,r,Jelly,Peanut butter,Bread slice, Bread slice"
Any ideas on what could help? I've looked for ways to do this and I just can't think of anything anymore.
One idea I had was that if I could just add on to an existing row, I could iterate through the bFood array and add each element to the row, but I haven't found anything that can do that.
If I read this correctly, you should just need...
i << [j.name, j.get_type, j.bFood].flatten
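A quick illustration of what flatten does to that row (stand-in values, since the asker's classes aren't shown): the array of basic foods is spliced into the row, so each food becomes its own CSV field rather than one joined, quoted string.
require 'csv'

name   = "PB&J Sandwich"
type   = "r"
b_food = ["Jelly", "Peanut butter", "Bread slice", "Bread slice"]

CSV.open("FoodDB1.txt", "w") do |csv|
  csv << [name, type, b_food].flatten
end
# FoodDB1.txt now contains: PB&J Sandwich,r,Jelly,Peanut butter,Bread slice,Bread slice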
