Best way to store data with pure ruby, no dependencies [closed] - ruby

I must create an application using only tools available in ruby core or stdlib. Do YAML or SQLite come with ruby? What are some of the other tools available that would allow me to store data to a file? What are their advantages or disadvantages?

Ruby's stdlib is deep. Maybe too deep. I knew sqlite wasn't in there, but I figured something was. Here is what I found...
There are up to 4 different simple databases already in the stdlib:
PStore - Very simple persistent hash. Handles marshaling for you, so you can store trees of ruby objects. Pure ruby solution.
SDBM - C-based key/value store. Ruby ships with the entire source so it should be portable across platforms. Simple string keys and values only.
GDBM - Another string-only key/value store. Uses GNU dbm. It's Enumerable, so it's a little more hash-like. Possibly not very portable.
DBM - Uses the DBM headers available on the platform Ruby was compiled on, so it could be one of several DBM implementations (read: not portable). Yet another string-only key/value store; that makes three of them. Unlike GDBM, though, this one will let you store non-string values and then silently ruin them by calling #to_s or #inspect on them.
I might actually use PStore for small things myself now. SQLite is probably better, but PStore is undoubtedly simpler so if the job is small enough it makes sense.
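For example, a minimal PStore sketch (the file name and keys are just placeholders):
require 'pstore'

# create or open the store file; PStore handles marshaling for you
store = PStore.new('data.pstore')

# all reads and writes happen inside a transaction
store.transaction do
  store[:users] = [{ name: 'Alice', admin: true }]
  store[:updated_at] = Time.now
end

# a read-only transaction for fetching values back
store.transaction(true) do
  p store[:users]
end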
You can also use serialization. Marshal will dump actual ruby objects and their data. YAML can sort of do this as well. Using JSON/YAML/CSV you can finely control the format of the data. All of these can be used with File to write their output to a file.
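A rough sketch of the Marshal route (the file name and data are just examples):
# dump a plain Ruby structure to disk, then load it back
data = { 'scores' => [1, 2, 3], 'active' => true }

File.open('data.dump', 'wb') { |f| f.write(Marshal.dump(data)) }
loaded = File.open('data.dump', 'rb') { |f| Marshal.load(f.read) }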

You can use Ruby's stdlib CSV library to store tabular database data. Its format is very useful for storing, exporting, and importing DB data. See the documentation on CSV here. As an example, just do:
require 'csv'

# save
CSV.open("file.csv", "wb") do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ... more rows
end

# load
CSV.foreach("file.csv") do |row|
  row # => ["row", "of", "CSV", "data"]
  # ... process each row
end

require 'json'

File.open 'local.rbdb', 'w+' do |f|
  f.write JSON.generate(write_target)
end
Build your write_target data in a typical manner, and then use JSON as a storage format.
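Reading it back is the mirror image (a sketch assuming the same local.rbdb file):
require 'json'

write_target = JSON.parse(File.read('local.rbdb'))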

Related

Storing binary Ruby Marshalled objects in Git. Use filters to convert to text (JSON or YAML) and back? [closed]

I want to version control binary files that contain the data to run our project. The format used by the program is Marshalled Ruby objects; there are no options to change this in the program, it's Windows only, and it's closed source. Lovely, right?
Here is some good news, though. Most of the classes are well documented and for the most part are close to just being Structs, but some have custom Marshalling methods. I also plan to build tools for diffing and merging these files, but figuring out how to put them into the repo is more important.
So, would using filters to smudge binary files into text (JSON or YAML) for storage in Git and clean them back out to binary for the working directory be a wise idea or just a waste of time?
Rough implementation of both filters, dropping imports, using YAML, and untested with Git:
puts Marshal.load(gets).to_yaml # Smudge
puts Marshal.dump(YAML.load(gets)) # Clean
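A slightly fuller sketch of how these might be wired into Git (the script names, the *.dat pattern, and the filter name are all assumptions; note that in Git's terminology clean runs when staging, worktree to repo, and smudge runs on checkout, so with text stored in the repo the YAML conversion belongs on the clean side):
# to_yaml_filter.rb (clean): Marshal blob on stdin, YAML on stdout
require 'yaml'
STDIN.binmode
puts Marshal.load(STDIN.read).to_yaml

# to_marshal_filter.rb (smudge): YAML on stdin, Marshal blob on stdout
# (on Ruby 3.1+ / Psych 4 you may need YAML.unsafe_load for arbitrary object types)
require 'yaml'
STDOUT.binmode
print Marshal.dump(YAML.load(STDIN.read))

# wiring, roughly:
#   .gitattributes:  *.dat filter=marshal2yaml
#   git config filter.marshal2yaml.clean  "ruby to_yaml_filter.rb"
#   git config filter.marshal2yaml.smudge "ruby to_marshal_filter.rb"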
Edit: I thought I should note that there are deflated Ruby scripts stored in one of these files. A clean project has about 133 KB of Zlib-deflated script in it, about 800 KB when inflated.
I wouldn't get too caught up in the guideline of not storing binary files in Git.
The real challenge comes, as you suggested, in diffing and merging these files. If you store them as text, you likely don't need to do anything special here. YAML and JSON are both relatively easy to diff and merge manually.
If it is convenient, store text. This will let anybody diff the files using whatever tools they already have available.
On the other hand, if you are already planning to write your own diff and merge tools (which can be hooked into Git) you shouldn't have too much trouble storing the original binary files.
Storing binary files and using your custom diff / merge tools will require users to have those tools available for diffing and merging.

Data Splitting in Ruby

I am looking for a gem that will split a CSV dataset into smaller datasets for training and test on a machine learning system. There is a package in R which will do this, based on random sampling; but my research has not turned up anything in Ruby. The reason I wanted to do this in Ruby is that the original dataset is quite large, e.g. 17 million rows or 5.5 gig. R expects to load the entire dataset into memory. Ruby is far more flexible. Any suggestions would be appreciated.
This will partition your original data to two files without loading it all into memory:
require 'csv'

sample_perc = 0.75

CSV.open('sample.csv', 'w') do |sample_out|
  CSV.open('test.csv', 'w') do |test_out|
    CSV.foreach('alldata.csv') do |row|
      (Random.rand < sample_perc ? sample_out : test_out) << row
    end
  end
end
You can use the smarter_csv Ruby gem and set chunk_size to the desired sample size, then save the chunks as Resque jobs, which can be processed in parallel.
https://github.com/tilo/smarter_csv
See the examples on that GitHub page.
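A rough sketch of that approach (SplitterJob is a hypothetical Resque job class, and the chunk_size is arbitrary):
require 'smarter_csv'
require 'resque'

# read the big CSV in chunks and enqueue each chunk as a background job
SmarterCSV.process('alldata.csv', chunk_size: 10_000) do |chunk|
  Resque.enqueue(SplitterJob, chunk)
end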
CSV is built into Ruby; you don't need any gem to do this:
require 'csv'

csvs = (1..10).map { |i| CSV.open("data#{i}.csv", "w") }

CSV.foreach("data.csv") do |row|
  csvs.sample << row
end

csvs.each(&:close)
CSV.foreach will not load the entire file into memory.
You will probably want to write your own code for this, based around Ruby's bundled csv gem. There are lots of possibilities for how to split the data, and the requirement to do this efficiently over such a large data set is quite specialist, whilst also not requiring that much code.
However, you might have some luck looking through the many sub-features of ai4r
I've not yet found many mature pre-packaged machine learning algorithms for Ruby (of the kind you would find in R or in Python's scikit-learn). No random forests, GBM etc., or if there are, they are difficult to find. There is a Ruby interface to R, and also wrappers for ATLAS. I have tried neither.
I do make use of ruby-fann (neural nets), and the narray gem is your friend for large numerical data sets.

Using gzip compression in Sinatra with Ruby

Note: I had another similar question about how to GZIP data using Ruby's zlib. Technically it was answered, and I didn't feel I could keep evolving that question once it had been answered, so although this question is related it is not the same...
The following code (I believe) is GZIP'ing a static CSS file and storing the result in the result variable. But what do I do with it next? In other words: how can I send this data back to the browser so it is recognised as GZIP'ed rather than served at the original file size (e.g. when checking my YSlow score I want it to correctly credit me for GZIP'ing static resources)?
z = Zlib::Deflate.new(6, 31)
z.deflate(File.read('public/Assets/Styles/build.css'))
z.flush
result = z.finish # could also have done: result = z.deflate(file, Zlib::FINISH)
z.close
...one thing to note is that in my previous question the respondent clarified that Zlib::Deflate.deflate will not produce gzip-encoded data. It will only produce zlib-encoded data and so I would need to use Zlib::Deflate.new with the windowBits argument equal to 31 to start a gzip stream.
But when I run this code I don't actually know what to do with the result variable and its contents. There is no information on the internet (that I can find) about how to send GZIP-encoded static resources (like JavaScript, CSS, HTML etc.) to the browser to make the page load quicker. It seems every Ruby article I read assumes Ruby on Rails!?
Any help really appreciated.
After zipping the file you would simply return the result and make sure to set the Content-Encoding: gzip header on the response. Google has a nice little introduction to gzip compression and what you have to watch out for. Here is what you could do in Sinatra:
get '/whatever' do
  headers['Content-Encoding'] = 'gzip'
  StringIO.new.tap do |io|
    gz = Zlib::GzipWriter.new(io)
    begin
      gz.write(File.read('public/Assets/Styles/build.css'))
    ensure
      gz.close
    end
  end.string
end
One final word of caution, though. You should probably choose this approach only for content that you created on the fly or if you just want to use gzip compression in a few places.
If, however, your goal is to serve most or even all of your static resources with gzip compression enabled, then it will be a much better solution to rely on what is already supported by your web server instead of polluting your code with this detail. There's a good chance that you can enable gzip compression with some configuration settings. Here's an example of how it is done for nginx.
Another alternative would be to use the Rack::Deflater middleware.
Just to highlight the Rack::Deflater way as an answer:
As mentioned in the comment above, just put the compression in config.ru:
use Rack::Deflater
That's pretty much it!
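For example, a minimal config.ru sketch (assuming a classic-style Sinatra app defined in app.rb; the file name is an assumption):
# config.ru
require 'rack/deflater'
require './app'

use Rack::Deflater
run Sinatra::Application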
Since the goal here is to compress web-related data like CSS files, I want to recommend brotli. It was heavily optimized for exactly this purpose, and any modern web browser supports it.
You can use the ruby-brs bindings for Ruby:
gem install ruby-brs
require "brs"
require "sinatra"
get "/" do
headers["Content-Encoding"] = "br"
BRS::String.compress File.read("sample.css")
end
You can use the streaming interface instead; it is similar to the Zlib interface.
require "brs"
require "sinatra"
get "/" do
headers["Content-Encoding"] = "br"
StringIO.new.tap do |io|
writer = BRS::Stream::Writer.new io
begin
writer.write File.read("sample.css")
ensure
writer.close
end
end
.string
end
You can also use the nonblocking methods; please see the ruby-brs documentation for more information.

Ruby reading VB.NET generated data

This is the general goal I am trying to achieve:
My VB.NET program will generate some Lists that may contain booleans, integers, strings, or more lists. I want the program to output a "file" which basically contains such data. It is important that the file cannot be read by humans. Okay, actually, fine: human-readable data wouldn't be bad.
Afterward, I want my Ruby program to take that file and read the contents. The Lists become arrays, and integers, booleans and strings are read fine by Ruby. I just want to be able to read the file; I might not need to write it using Ruby.
In .NET you'd use a BinaryWriter; if you're using IronRuby you'd then use a BinaryReader. If you're not using IronRuby, then perhaps...
contents = open(path_to_binary_file, "rb") {|io| io.read }
Why do you not want it to be human readable? I hope it's not for security reasons...
Use JSON. On the .NET side you can use the Json.NET NuGet package.
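On the Ruby side, reading it back is then trivial with the stdlib json library (a sketch assuming the VB.NET program wrote its lists out as JSON to data.json):
require 'json'

# .NET lists come back as Ruby arrays; strings, numbers, and booleans map directly
data = JSON.parse(File.read('data.json'))
p data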

Java .properties file equivalent for Ruby?

I need to store some simple properties in a file and access them from Ruby.
I absolutely love the .properties file format that is the standard for such things in Java (using the java.util.Properties class)... it is simple, easy to use and easy to read.
So, is there a Ruby class somewhere that will let me load up some key value pairs from a file like that without a lot of effort?
I don't want to use XML, so please don't suggest REXML (my purpose does not warrant the "angle bracket tax").
I have considered rolling my own solution... it would probably be about 5-10 lines of code tops, but I would still rather use an existing library (it is essentially a hash built from a file)... as that would bring it down to 1 line.
UPDATE: It's actually a straight Ruby app, not rails, but I think YAML will do nicely (it was in the back of my mind, but I had forgotten about it... have seen but never used as of yet), thanks everyone!
Is this for a Rails application or a plain Ruby one?
Really, with either you may be able to stick your properties in a YAML file and then YAML::load(File.open("file")) it.
NOTE from Mike Stone: It would actually be better to do:
File.open("file") { |yf| YAML::load(yf) }
or
YAML.load_file("file")
as the Ruby docs suggest; otherwise the file won't be closed until garbage collection. But good suggestion regardless :-)
Another option is to simply use another Ruby file as your configuration file.
For example, create a file called 'options' containing:
{
  :blah => 'blee',
  :foo => 'bar',
  :items => ['item1', 'item2'],
  :stuff => true
}
And then in your Ruby code do something like:
ops = eval(File.open('options') {|f| f.read })
puts ops[:foo]
YAML will do it perfectly as described above. For an example, in one of my Ruby scripts I have a YAML file like:
migration:
  customer: Example Customer
  test: false
sources:
  - name: Use the Source
    engine: Foo
  - name: Sourcey
    engine: Bar
which I then use within Ruby as:
require 'yaml'

config = YAML.load_file(File.join(File.dirname(__FILE__), ARGV[0]))
puts config['migration']['customer']
config['sources'].each do |source|
  puts source['name']
end
inifile (http://rubydoc.info/gems/inifile/2.0.2/frames) will support basic .properties files and also .ini files with [SECTION] headers, e.g.
[SECTION]
key=value
YAML is good when your data has complex structure, but it can be fiddly with spaces, tabs, line endings etc., which might cause problems if the files are not maintained by programmers. By contrast, .properties and .ini files are more forgiving and may be suitable if you don't need the deep structure available through YAML.
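Reading the example file above with the inifile gem might look like this (a sketch assuming it is saved as settings.ini):
require 'inifile'

ini = IniFile.load('settings.ini')
puts ini['SECTION']['key']   # => "value"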
Devender Gollapally wrote a class to do precisely that... though I'd recommend using a YAML file instead.
Instead of the .properties style of config file, you might consider using YAML. YAML is used in Ruby on Rails for database configuration, and has gained popularity in other languages (Python, Java, Perl, and others).
An overview of the Ruby YAML module is here: http://www.ruby-doc.org/core/classes/YAML.html
And the home page of YAML is here: http://yaml.org
