I'm trying to package a gem that relies on a large relational reference source, currently implemented as a 2.1GB sqlite database file. I've placed the file in a /data directory and included it appropriately in the gemspec. gem build works fine (although it takes half an hour to compress itself!), but gem install errors out:
ERROR: While executing gem ... (RangeError)
integer 2243380224 too big to convert to `int'
This would be totally cryptic if I hadn't noticed that 2243380224 is the exact file size of the database. However, knowing the cause of the error doesn't bring me any closer to a solution.
In the case at hand, it would not make sense to require users to separately download the database and specify it in their project configuration. I want gem install to deliver this functionality out of the box. Any suggestions on what best practice should be on packaging up ruby functionality that relies on mining a large information repository?
I suspect the error is coming from this line where rubygems tries to read the entry from the tar archive (a gem is basically a tar archive).
Not only is it trying to read the entry into memory in one go, but ruby's IO.read requires that its first argument fit into a signed long (not sure this is documented, but it's certainly what MRI does). On many common platforms, that means a 32 bit signed integer, max value 2**31-1, which is less than the size of your sqlite file.
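A quick check with the numbers from the error message shows why the read fails on those platforms:

```ruby
# The file size from the error message vs. the largest 32-bit signed integer:
file_size = 2_243_380_224
max_int32 = 2**31 - 1            # 2_147_483_647
puts file_size > max_int32       # prints "true", hence the RangeError
```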
You won't be able to install this gem without patching rubygems itself. Furthermore, since it does everything in memory (since it's expecting files to be smallish) you might run into memory allocation issues.
A gem can have a post install message, which you could use to prompt users to run your download script, but you can't run a script automatically (unless you abuse extconf.rb). Your users will probably thank you when minor code changes to your gem don't require them to re-download the 2GB data files.
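As a sketch of the post-install-message approach (the gem name and fetch command below are hypothetical, not from the original question):

```ruby
require "rubygems"

# Hypothetical gemspec: ship only the code, not the 2.1GB database,
# and tell the user how to fetch the data after install.
spec = Gem::Specification.new do |s|
  s.name    = "big-reference-gem"        # hypothetical name
  s.version = "0.1.0"
  s.summary = "Mines a large reference database"
  s.authors = ["You"]
  s.files   = Dir["lib/**/*.rb"]         # note: data/ is NOT listed here
  s.post_install_message = <<~MSG
    big-reference-gem needs its reference database (~2.1GB).
    Run `big-reference-gem fetch-data` to download it.
  MSG
end

puts spec.post_install_message
```

rubygems prints post_install_message at the end of gem install, so the download itself stays out of the install step and code-only updates stay small.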
Related
Gems and gem dependencies are becoming a bit of a nightmare for our organisation. I am the only Ruby developer at the moment, but with more coming on board all the time, we really need to get gems and dependencies in order.
The problem is that the development VMs do not have internet access, but they do have access to a shared directory that we can map to. Currently I have just been downloading gems and dependencies one at a time to my local net-connected laptop, moving them to the shared directory, and then copying them over to the VM - clearly this is nonsense, and I need to start using bundler or some other system.
I am sure other companies have had similar problems; what is considered best practice?
Loading gems from the project (not from the system or the internet) can solve this problem. Download all the needed gems on a connected machine, then copy the vendor folder (via a flash drive, CD, or your shared directory) into each project, and each project can load its gems from that folder.
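Bundler supports exactly this workflow; a minimal sketch (run from the project directory):

```shell
# On the internet-connected laptop, inside the project:
bundle package            # vendors every gem (and its dependencies) into vendor/cache

# Copy the project, including vendor/cache, to the shared directory.
# Then, on the offline VM:
bundle install --local    # resolves from vendor/cache only, no network needed
```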
Ruby gem dependencies on offline server
While using the guard-cucumber plugin, I found it wasn't working with terminal-notifier-guard.
What's the best way to figure out what's going wrong and start making lasting fixes?
Some important criteria are:
I want to get up and running as fast as possible.
Once I'm up and running I want to be able to iterate as fast as possible.
I want to alter my project as little as possible.
I want to be able to keep my changes in version control so I don't lose track of them.
I want to be able to get feedback on my changes if I want.
I want to test my changes in an isolated env.
I don't want to change the guard-cucumber gem installed on my system.
This is my process:
Fork the project and clone it.
Create a new branch.
Change the name to guard-cucumber-cats in the gemspec.
In the cloned gem, do some work.
Add a breakpoint near my work with pry:
require 'pry'
binding.pry
Run rake install to install guard-cucumber-cats on my machine.
In my project, add guard-cucumber-cats as a dep and comment out guard-cucumber in gemspec. For example:
# spec.add_development_dependency "guard-cucumber", "~> 2.1.2"
spec.add_development_dependency "guard-cucumber-cats"
Run my project, hit debug, mess around with code.
Repeat steps 4-8.
Thinking about process in terms of criteria:
Getting up and running wasn't too bad
Iterating was pretty painful; having to reinstall guard-cucumber-cats every time I wanted to test a change was awkward.
Changing one line in my project gemspec doesn't seem bad, however it would be better to be able to change nothing.
Having all the benefits of git right from the beginning was a huge plus.
Same goes for GitHub.
I don't have a workflow set up with either vagrant or docker, so my "isolated" test env is basically just me on my laptop. Maybe this is a good opportunity to set up a working containerized dev env? I'm not sure, but the way I have it now doesn't seem to meet this criterion.
Although it seems awkward to have to rename the gem and then install it on my system, it is nice that I didn't need to change the actual guard-cucumber on my machine.
Ideally I'd like to be able to do something like:
cp -r my project into some sandboxed env
Clone the forked guard-cucumber into the same sandboxed env
Have everything "just work"
Do work on the cloned guard-cucumber
Repeat step 4.
I read in another answer to just start editing the system gem, but I didn't like that solution because I'd lose the benefits of git and GitHub.
Is there a better process that's working for you? Or areas my process can be improved?
Use :path in the project's gemfile.
Suppose you are working on gem my_gem and project my_project at the same time. To make this painless, use this in the Gemfile of your project:
# in /path/to/my_project/Gemfile
...
gem 'my_gem', :path => '/path/to/my_gem'
...
And then you can iterate without having to build the gem all the time.
There isn't even a need to rename the gem.
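If you'd rather not edit the path in the Gemfile on every machine, Bundler also supports local overrides of git-sourced gems (the fork URL, branch name, and path below are illustrative):

```shell
# The Gemfile keeps a git source, e.g.:
#   gem "guard-cucumber", git: "https://github.com/you/guard-cucumber.git", branch: "cats"

# Then point bundler at your local clone; edits are picked up on the
# next run without rebuilding or renaming the gem:
bundle config local.guard-cucumber /path/to/guard-cucumber
```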
There are two Ruby projects I am currently working on which have both a .rvmrc and a .ruby-version file in their root dir. I use rvm to manage my Ruby versions in my local development environment, and have my own local .rvmrc files in my home directory's copy of various Ruby versions, so naturally I get the rvm warning when I change directory into these projects:
You are using '.rvmrc', it requires trusting, it is slower and it is not compatible with other ruby managers,
you can switch to '.ruby-version' using 'rvm rvmrc to ruby-version'
or ignore this warning with 'rvm rvmrc warning ignore /home/ME/craft/ruby/rails/CLIENT/APPLICATION/.rvmrc',
'.rvmrc' will continue to be the default project file in RVM 1 and RVM 2,
to ignore the warning for all files run 'rvm rvmrc warning ignore all.rvmrcs'.
I felt it was odd to have both configuration dotfiles in the same project at first, and figured that it might be a historical quirk of Ruby culture that I was unaware of (I'm a less-opinionated language generalist, really). Personally I never use .rvmrc in a project; I work on 10-15 Ruby projects in a year and rarely see this file in anything I work with.
The problem really arises in the second of these two projects, where the .rvmrc file has an older patch level of the Ruby version than the .ruby-version file. This resulted in some complications for my local environment that I resolved, though I feel it's a bit awkward. To make it worse, I fixed my environment for the (git) 'master' branch of the project, and when I switched to the latest feature branch, the .ruby-version file was updated to yet another different patch number. So I repeated things like reinstalling bundler and reinstalling all the gems, and I chose to manually switch to this patch version. I am unsure of the 'correct' way to do the above, and this way seems to work for my environment (at the cost of duplicating gems and taking up a bit of space on my hard disk).
I am concerned as to why a project would have both these files defined, and especially concerned for a project that has differing versions/patch numbers in each file.
Is this normal? Should this be rectified by removing the .rvmrc file from the project? Should the .rvmrc file, at the very least, be updated to the same version and patch number as the .ruby-version file? I instinctively feel this isn't right, but want to be aware of any history regarding rvm and other methods of maintaining Ruby versions which might actually make this decision sensible. Can anyone relay the history of how such a situation might sensibly evolve, or is it just a symptom of too many cooks in the kitchen over time?
(possibly related question concerning .ruby-version and Gemfile)
It is a "bad practice" in that it maintains two conventions at once, which can lead to version management issues in some environments. It also makes it possible for one of the conventions to fall out of sync with the other with regard to the version of ruby used in the project. The .ruby-version file is the more conventional of the two at this time, so it would be best to remove the .rvmrc file and maintain only .ruby-version.
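rvm itself can do the conversion (the command comes straight from the warning rvm prints); a sketch, with the project path illustrative:

```shell
cd /path/to/project
rvm rvmrc to ruby-version   # generates .ruby-version (and .ruby-gemset, if a gemset was set) from .rvmrc
git rm .rvmrc               # if the old file is left behind, remove it from the repo too
```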
I have one machine which runs multiple standalone ruby scripts. Every time I have to upgrade a gem for one of the scripts, I have to check its impact on the other scripts as well. Do you think it would be good practice to create one Gemfile per script, or can someone recommend a better way to maintain such a system? I also sometimes want to use different versions of the same gem in different scripts.
How about using RVM or rbenv?
In this case, you will need to put each script in a separate folder with its own rvm/rbenv config.
I create an independent gemset for each project so that each project's gems and versions can't interfere with the others.
RVM gives you all the tools to manage gemsets.
So you will need to create a separate folder for each script, with its own .ruby-gemset file in it.
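A sketch of the per-script layout (the ruby version and gemset name are illustrative):

```shell
mkdir script_one && cd script_one
echo "2.1.5"      > .ruby-version   # ruby used by this script only
echo "script_one" > .ruby-gemset    # isolated gemset for this script only

# rvm picks both files up when you cd in; or select explicitly:
rvm use ruby-2.1.5@script_one --create
gem install some_gem                # installs into this gemset only
```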
Is it possible to install the entire database (PostgreSQL 8.2) via the command prompt, a batch file, or a registry file, bypassing the usual interactive installation procedure? The question then becomes: how can we supply default parameters such as the name, password, language, and default location of the database? Currently I'm working on the Windows XP platform.
Thank you.
For 8.3 and lower the obvious answer is: http://pginstaller.projects.pgfoundry.org/ which supports or supported silent installations. For more recent versions, please read: http://forums.enterprisedb.com/posts/list/2135.page
Use of existing installers would simplify your life and be where I would start.
This being said, there is no reason you can't generate a script to register DLLs properly, run initdb, etc. This will take some extra knowledge of both PostgreSQL and Windows, and will mostly be suitable for custom solutions (i.e. not cases where you are merely packaging software that runs with PostgreSQL). I don't think a complete answer can be given here, because once you need such a solution you need to design your installation around it. Books could be written on that topic. The docs http://www.postgresql.org/docs/9.0/static/install-windows.html should get you started, however, since the only real difference between installing from source and installing from precompiled binaries is that you need to compile the source files first.
Failing that you could take a look at the binary zip packages. Typically these can be extracted and PostgreSQL can be run from inside them.
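A minimal sketch of that script-it-yourself route from an extracted binary zip (paths, service name, and password are illustrative; run as a user allowed to manage services):

```shell
REM Initialize a fresh cluster with a known superuser, reading the
REM password from a file so it never appears on the command line
echo secret> C:\pg\pw.txt
C:\pg\bin\initdb -D C:\pg\data -U postgres --pwfile=C:\pg\pw.txt -E UTF8 --locale=C

REM Register PostgreSQL as a Windows service and start it
C:\pg\bin\pg_ctl register -N PostgreSQL -D C:\pg\data
net start PostgreSQL
```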