Build Fog gem with just required providers and limit dependencies - ruby

I'm using the excellent Fog gem to access just the Rackspace Cloud Files service. My challenge is that I'm trying to keep the service that is accessing Cloud Files lightweight, and it seems that Fog through its flexibility has a lot of dependencies and code I'll never need.
Has anybody tried to build a slimmed down copy of Fog, just to include a subset of providers, and therefore limit the dependencies? For example, for the Rackspace Cloud Files API exclusively, I'd expect to be able to handle everything without net-ssh, net-scp, nokogiri gems, and all the unused code for Amazon, Rackspace and 20 other providers that are not being used. I'm hoping to avoid upgrading the gem every time one of those unused providers ships a bug fix, while keeping my memory footprint down.
I'd appreciate any experience anybody may have in doing this, or advice from anybody familiar with building Fog in what I can and can't rip out.
If I'm just using the wrong gem, then that's equally fine. I'll move to something more focused.

I'm the maintainer of fog so I'll chime in to help fill in some of the explanation/gaps. The short answer is, yeah it's a lot of stuff, but mostly it shouldn't impact you negatively.
First off, fog grew pretty organically over time, so it did get bigger than intended. One of the ways that we contend with this is that we rather aggressively avoid requiring/loading files until really needed. So although you have to download lots of provider files you won't use to install fog, they shouldn't actually end up in memory. That was the simplest thing we could do in order to keep things "just working" while also reducing the memory usage (and load time).
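As a rough illustration of the deferred-loading idea (this is not fog's actual code; the `LazyProviders` class and the provider names are made up for the sketch), a registry can hold a loader block per provider and only run it the first time that provider is asked for:

```ruby
# Hypothetical registry: each provider registers a block that does the
# expensive loading, and the block only runs on first use.
class LazyProviders
  def initialize
    @loaders = {}
    @loaded  = {}
  end

  # Remember how to load a provider without loading it yet.
  def register(name, &loader)
    @loaders[name] = loader
  end

  # Load the provider on first access and cache the result.
  def fetch(name)
    @loaded[name] ||= @loaders.fetch(name).call
  end

  # Names of the providers that have actually been loaded so far.
  def loaded_names
    @loaded.keys
  end
end

providers = LazyProviders.new
providers.register(:rackspace) { "rackspace storage code" } # stand-in for a real require
providers.register(:aws)       { "aws storage code" }

providers.fetch(:rackspace)
p providers.loaded_names # prints [:rackspace]
```

Only the provider that was fetched ends up in memory; the others cost disk space but nothing at runtime.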
The release schedule doesn't tend to be too crazy (on average about once a month) and tends to include a mix of stuff across most of the providers. So I'd expect you won't have too much churn here (outside of emergency/security type fixes which might warrant shortening the normal cycle).
So, that hopefully provides some insight into the state of the art. We have also discussed starting to split things out more in the longer term. I think if/when that happens we would end up with something like fog-rackspace for all the rackspace related things. And then they could share things through fog-core or similar. We have a rough outline, but it is a pretty big undertaking without a huge upside, so it isn't something we have really actively begun on.
Hope that helps, certainly happy to discuss further if you have further questions or concerns.

I work for Rackspace on, among other things, our Ruby SDKs. You're using the right gem. Fog is our official Ruby API.
This is possibly something that could be done by introducing another gemspec into the project that builds from only fog core and the Rackspace-specific files. However, this would be unconventional and would make the release process more complicated for @geemus (the gem maintainer), especially should other providers start to do the same. Longer term, this would serve to divert the fog community away from acting as a unified API.
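To make the idea concrete, here is a minimal sketch of what such a second gemspec might look like. The gem name, version, author, and file globs are all assumptions for illustration; the real layout of the fog source tree may differ, and no such gem is implied to exist.

```ruby
require "rubygems"

# Hypothetical gemspec that packages only the shared core and the
# Rackspace-specific files. Names and paths are illustrative assumptions.
spec = Gem::Specification.new do |s|
  s.name    = "fog-rackspace-only"              # hypothetical gem name
  s.version = "0.0.1"                           # placeholder version
  s.summary = "Subset of fog limited to core and Rackspace files"
  s.authors = ["example"]                       # placeholder author

  # Assumed directory layout; adjust the globs to the actual source tree.
  s.files = Dir["lib/fog/core/**/*.rb"] + Dir["lib/fog/rackspace/**/*.rb"]
end
```

Dependencies would then be declared only for the gems those files actually need, which is where the maintenance burden mentioned above comes from: each such gemspec has to be kept in step with the main one.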

Related

Figuring Out Which Gems Rails App Does Not Use

Working on an app that has almost 200 gems. Has anyone figured out how to isolate gems that are not used so they can be taken out of the mix.
Due to the dynamic nature of Ruby, it's not possible to know for sure if a gem is or isn't used without testing. Although it is bad practice to load them in ad-hoc without a good reason, it is possible to require a gem at any point in the execution of the program. They do not need to be loaded up-front.
Although there might be advantages to loading gems on demand, for instance, keeping a lower memory footprint and reducing launch times, it does make it difficult to determine if or where they are actually used.
There isn't always a correlation between a gem name and the methods it uses. While many have a namespace that's easily grepped for, some just add methods to existing classes which can complicate tracking them down, especially if they go so far as to patch out old methods with new ones that have the same name.
If you are able to exercise a large portion of the application through your unit, functional, and integration tests it might be possible to use ruby-prof to at least get a sense of which gems are used. That could make identifying candidates for removal easier.
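A coarser check than ruby-prof, using only what RubyGems already exposes, is to look at which files have actually been required once the test suite has run. The sketch below is a rough heuristic rather than a profiler: the helper name `loaded_file_counts` is made up, and a gem that was required up front (for example by Bundler) will show up as loaded even if none of its methods were ever called.

```ruby
require "rubygems"

# Count how many loaded files belong to each activated gem.
# loaded_features: paths of required files (defaults to $LOADED_FEATURES)
# specs: objects responding to #name and #full_gem_path
def loaded_file_counts(loaded_features = $LOADED_FEATURES,
                       specs = Gem.loaded_specs.values)
  counts = Hash.new(0)
  loaded_features.each do |path|
    spec = specs.find do |s|
      path.start_with?(s.full_gem_path + File::SEPARATOR)
    end
    counts[spec.name] += 1 if spec
  end
  counts
end

# Activated gems with no loaded files are candidates for a closer look.
def gems_without_loaded_files(loaded_features = $LOADED_FEATURES,
                              specs = Gem.loaded_specs.values)
  counts = loaded_file_counts(loaded_features, specs)
  specs.map(&:name).reject { |name| counts.key?(name) }
end
```

Calling `gems_without_loaded_files` at the end of a test run (for instance from an `at_exit` block) gives a list of candidates, which still need to be confirmed by removing them and re-running the tests.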

Best practice for hosting a large image library

I'm looking into setting up a fairly large scale image library site. I'm looking at CodeIgniter for the framework as I hear it's easy to work with and quick. What I'm looking for help on is the server set up. I've been speaking to a company about getting hosting set up as it's not something I've had much experience with.
I know a Content Delivery Network is worth thinking about, as is using something like Varnish, but I don't want to start building anything only to have to redo it to take this into account.
So my question is this: What's the best way to go about setting this up? Should I start building the site as efficiently as possible and worry about speeding the server up later, or is it something that needs sorting out before anything's built?
I recommend building a working site to meet your most basic requirements. Don't worry about features or requirements that are so far off they may never materialize. You can always refactor and improve performance, but requirements and priorities often change, especially once you have something to work with and people are actually using your site.
Having to constantly change/improve working code is often better than doing lots of planning up front, only to end up realizing later that you made a wrong assumption and have to make major changes on a code base that never worked. This is basically Agile vs Waterfall.
If you like PHP, CodeIgniter is a quick way to get started. The most important thing is to be sure to follow conventions and be consistent so that you can easily make major changes without worrying about breaking everything, or having to maintain lots of documentation.
I wouldn't worry about Varnish yet. CodeIgniter has lots of caching options built-in. You won't have millions of users over night, so if you find your growth trajectory going vertical, you can always re-align your priorities at that point. Also, explosive growth is usually tied with people giving you lots of money, so you have more options on solving that "problem".
I would start out with a CDN, as it seems like an essential part of your site. It will largely address image backup as well. Just be sure to comparison shop, because CDN services vary quite a bit. Also, for simplicity, you may want to look into origin-pull.

What is a good open source package for building flexible spam detection on a large Rails site?

My site is getting larger and it's starting to attract a lot of spam through various channels. The site has a lot of different types of UGC (profiles, forums, blog comments, status updates, private messages, etc, etc). I have various mitigation efforts underway, which I hope to deploy in a blitzkrieg fashion to convince the spammers that we're not a worthwhile target. I have high confidence in what I'm doing functionality wise, but one missing piece is killing all the old spam all at once.
Here's what I have:
Large good/bad corpora (5-figure bad, 6 or 7-figure good). A lot of the spam has very reliable fingerprints, and the fact that I've sort of been ignoring it for 6 months helps :)
Large, modular Rails site deployed to AWS. It's not a huge traffic site, but we're running 8 instances with the beginnings of a SOA.
Ruby, Redis, Resque, MySQL, Varnish, Nginx, Unicorn, Chef, all on Gentoo
My requirements:
I want it to perform reasonably well given the volume of data (therefore I'm wary of a pure ruby solution).
I should be able to train multiple classifications to different types of content (419-scam vs botnet link spam)
I would like to be able to add manual factors based on our own detective work (pattern matching, IP reuse, etc)
Ultimately I want to construct a nice interface to be used with Ruby. If this requires getting my hands dirty in C or whatever, I can handle it, but I'll avoid it if I can.
I realize this is a long and vague question, but what I'm looking for primarily is just a list of good packages, and secondarily any random thoughts from someone who has built a similar system about ways to approach it.
We looked for an acceptable open source solution and didn't find one.
If you come to the same conclusion and decide to consider proprietary anti-spam, check out the paid Akismet collaborative spam filtering service. We've had decent performance from it across a dozen medium sized sites. It integrates with Rails through Rack and rackismet.

safe browsing with ruby

Is there any usable Ruby code to interact with the Safe Browsing API from Google?
I searched Google but didn't find any mature and solid code.
I have 3 points:
(0) I'd say that This looks alright, as does this
(1) Having used quite a few ruby gems for various obscure things, I find bugs all the time. It helps the open source community and the world if you find a gem, fix a bug, and let the rest of the world benefit by submitting a pull request. Tests make the life of a contributor sooooooooooo much easier, and guarantee that your fix works, so use gems with extensive tests where possible, even if they are not mature and you half-expect them to fail.
(2) From experience, gems which have lots of objects encapsulating something can sometimes be counterproductive. This has tripped me up in the case of the ruby mail gem and the tire gem (though that's not to say that they are not good and incredibly useful gems.). This applies to you if you only need to make one type of API call, say, and take a simple action. Using the simplest gem is sometimes advantageous, and for this purpose you might not need to use any gem at all! Just write a class that uses Net::HTTP to call the HTTP API: https://developers.google.com/safe-browsing/lookup_guide
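A minimal sketch of that last suggestion follows. The class name `LookupClient`, the endpoint, and the parameter names are placeholders, not the real Safe Browsing values; take the actual endpoint and required parameters from the lookup guide linked above.

```ruby
require "net/http"
require "uri"

# Tiny wrapper around one HTTP GET call. The endpoint and the fixed
# parameters are supplied by the caller; nothing here is specific to
# any particular API version.
class LookupClient
  def initialize(endpoint, params = {})
    @endpoint = endpoint
    @params   = params
  end

  # Build the request URI, form-encoding the URL to be checked.
  def uri_for(url)
    uri = URI(@endpoint)
    uri.query = URI.encode_www_form(@params.merge("url" => url))
    uri
  end

  # Perform the request and return the Net::HTTPResponse.
  def lookup(url)
    Net::HTTP.get_response(uri_for(url))
  end
end

# Placeholder endpoint and key, for illustration only.
client = LookupClient.new("https://example.invalid/lookup", "key" => "abc")
puts client.uri_for("http://a.com/?x=1")
```

Keeping URI construction separate from the network call makes the class easy to test without hitting the API.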

Ruby daemons vs daemon-kit gems: what are the pros and cons?

What are the relative pros and cons of:
http://rubyforge.org/projects/daemons
http://github.com/kennethkalmer/daemon-kit
Which is more robust?
Are there any other effective Ruby daemon management tools?
DISCLAIMER: I maintain daemon-kit, so this might appear biased but I'm trying my best to be honest.
daemon-kit grew as a set of wrappers around the daemons gem, then about a year ago (with 0.1.7.3) I ripped all traces of the daemons gem from the project and handled everything myself, which resolved the issues you mentioned above, as well as several others.
Instead of acclaiming my own project (not that it needs it), I'll highlight some shortcomings that I plan to address in the future:
Daemons are not easily embedded into Rails applications
Project layout enforced on developers might be too rigid
Biased towards capistrano-based deployments of daemons
Testing daemons is difficult, but not inconceivable
I've got a separate branch where I'm toying with a total rewrite that hopes to make the project more flexible, but it is by no means a pain at the moment. It is currently in production use at quite a few companies, from ISP infrastructure management to telecoms, twitter polling & processing, and just about everything in between.
Movement on the project has been slow in the last few months, purely because it works well. The low version number is very deceptive, it should in fact be way past a 1.x release by now...
Hope this helps!
