Ruby web spider & search engine library - ruby

I'm looking for a Ruby library or gem (or set of gems) which will not only do spidering, but also collect the data into, say, a database, and allow basic searches on the data (i.e. a typical web search).
I've found several spidering libraries, so that part seems well covered (I was going to try Anemone first), but I can't find anything that will take the spidered data and allow querying on it. For lack of an existing one, I was going to write something myself with Anemone.
Any suggestions?

That blog post might give you some pointers. Also, look into ferret for the search part.
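For the querying half, here is a minimal sketch of the idea behind a search index, assuming the spider hands you a URL and the page text for every page it visits. `TinySearchIndex` and the example URLs are hypothetical names made up for illustration; it keeps everything in memory, whereas a real setup would persist the postings to a database or use a proper search library.

```ruby
require "set"

# Hypothetical in-memory inverted index: maps each term to the set of URLs
# whose text contains it. Not part of Anemone, Ferret, or any other gem.
class TinySearchIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  # Call this once per spidered page, e.g. from the crawler's page callback.
  def add(url, text)
    tokenize(text).each { |term| @postings[term] << url }
  end

  # Returns the sorted URLs of pages that contain every term in the query.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?

    terms.map { |term| @postings.fetch(term, Set.new) }.reduce(:&).to_a.sort
  end

  private

  # Lowercases the text and splits it into unique alphanumeric terms.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/).uniq
  end
end

index = TinySearchIndex.new
index.add("http://example.com/a", "Ruby web spider")
index.add("http://example.com/b", "Ruby search engine")
index.search("ruby spider") # => ["http://example.com/a"]
```

This only does exact-term AND matching; ranking, stemming, and phrase queries are what a dedicated search library adds on top.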

There is a Ruby gem that may help you:
http://spidr.rubyforge.org/

There is lots of great stuff on github.com.

Related

Ruby Project VS Ruby Gem

I have read through Q&A/articles that explain the ideal structure of a Ruby project. I read the RubyGems guides on how to create a Ruby gem. I have just read a Q&A asking at what point a ruby project becomes a ruby gem but I can not for the life of me see the difference between the two. The structure seems to be the same. The files, where they go, everything looks the same to me. Is it how they're used? Can someone please explain the difference between the two to me?
The question that must be answered with respect to whether to 'Gemify' is: am I writing something that is readily reusable in a different context? If the answer is yes, then your application is a candidate for 'Gemification'. If not, then it is generally not worth the additional complexity to convert a Ruby project into a gem.
For example, if one makes a CLI Ruby application that collects mortgage rates from multiple vendors and updates a database, then there are two ways this could be converted into a gem.
First: You could generalise the interface/configuration and make it useful as a plugin/add-on/extension to projects written by someone needing the same or similar functionality. So someone could add the gemified version to their project and use it to do the grunt work for them and just make use of the results. This describes the most common use case for gems.
Second: However, you could also extract the framework of your CLI project layout into a generator gem for others to easily create their own CLI project layouts. This is how Rails came to be.
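On the mechanical side, what usually turns a project directory into something RubyGems can package is a gemspec. A minimal sketch for the mortgage-rates example might look like the following; the gem name, version, author, and file layout are all assumptions made up for illustration. In a real `.gemspec` file the `Gem::Specification.new` block would simply be the last expression, and it is assigned to a constant here only so it can be inspected.

```ruby
require "rubygems"

# Hypothetical gemspec for the mortgage-rates CLI described above.
# Every value below is a placeholder, not taken from a real project.
MORTGAGE_RATES_SPEC = Gem::Specification.new do |spec|
  spec.name        = "mortgage_rates"
  spec.version     = "0.1.0"
  spec.summary     = "Collects mortgage rates from multiple vendors"
  spec.authors     = ["Example Author"]
  spec.files       = Dir["lib/**/*.rb"]   # the reusable library code
  spec.executables = ["mortgage_rates"]   # the CLI entry point, expected in bin/
end

puts MORTGAGE_RATES_SPEC.full_name
```

The project layout (lib/, bin/, tests) stays the same either way, which is why the two look identical on disk.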

Sphinx alternative without Rails

I need an indexing and searching gem like Sphinx, but without needing Rails. Any suggestions? It has to run under Ruby 1.9.3 on a Windows box. I tried Sphinx without Rails, but it needs MySQL and a lot of configuration, and I didn't succeed. Can you recommend something that uses a built-in database or a feature like SQLite?
While certain plugins (like thinking_sphinx) are Rails-specific, Sphinx itself is just a search server, and you can use the client gem directly to index and search whatever you want; it need not be in ActiveRecord (which, incidentally, you could also use outside of Rails if you wished).
Another alternative is Apache Solr, which provides similar functionality to Sphinx, including a sophisticated search system (supporting stemming and lots of other nice things).
Although Sphinx itself works regardless of whether you use Rails or not, if you're having difficulties with it, you can always try something like Solr or Elasticsearch. I have heard good things about the latter, and it is quite trivial to get running on Windows, which you seem to be using.
Another alternative to Sphinx and Solr is OpenSearchServer, a search engine based on Lucene. It has good Windows support. Have a look at the documentation here on how to install it on Windows. You can use the API to integrate it with Ruby.
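Since these servers are driven over HTTP, no Rails is needed on the Ruby side. As a rough sketch, assuming an Elasticsearch server on its default port 9200, a `match` query can be sent to the `_search` endpoint with nothing but the standard library. The index name `documents` and field name `body` are hypothetical, and the helper names are made up for this example.

```ruby
require "json"
require "net/http"
require "uri"

# Builds (but does not send) a POST request for a simple match query.
# base_url, index_name and field are assumptions supplied by the caller.
def build_search_request(base_url, index_name, field, text)
  uri = URI.join(base_url, "/#{index_name}/_search")
  request = Net::HTTP::Post.new(uri.path)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(query: { match: { field => text } })
  [uri, request]
end

# Sends the query and returns the parsed JSON response.
# Requires a running server, so it is not called in this sketch.
def search(base_url, index_name, field, text)
  uri, request = build_search_request(base_url, index_name, field, text)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end

uri, request = build_search_request("http://localhost:9200", "documents", "body", "ruby spider")
puts uri
puts request.body
```

Solr and OpenSearchServer expose their own HTTP APIs with different paths and parameters, so the request-building part would change, but the overall shape stays the same.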

Extracting source code from a document for testing

I wrote tutorials for some Ruby gems I wrote. They are in a Markdown (Kramdown) text document. To ensure the integrity of the source code in the tutorials as development of the gems continues, I want to extract the source code from the tutorial document and run tests to ensure the code is correct and working. Before reinventing the wheel I searched, but found nothing on this kind of problem. Is there any software that can help me solve my problem? Ruby software would be cool, but I'm not particular about the language. I'm sure I can't be the first person to encounter this problem.
The other option is to have only placeholders in the tutorial documents, keep all the files external, and then populate the document prior to publishing. This would mean a lot more loose files, but would be significantly easier to implement.
Org mode in Emacs can do that, but it means you can't write in Markdown.
Are you after something like Ruby DocTest?
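If you end up rolling your own, a minimal sketch of the extraction step could look like this. It assumes the tutorial uses fenced code blocks (`~~~` or backtick fences) tagged with a language name, and it ignores indented code blocks entirely. `extract_code_blocks` and `run_snippet` are hypothetical helper names, not part of Kramdown or any other gem.

```ruby
require "open3"
require "rbconfig"

# Matches a fenced block: opening fence, optional language tag, body,
# and a closing fence of the same characters and length.
FENCE = /^(~~~+|```+)[ \t]*(\w+)?[ \t]*\n(.*?)^\1[ \t]*$/m

# Returns the bodies of all fenced blocks tagged with the given language.
def extract_code_blocks(markdown, language = "ruby")
  markdown.scan(FENCE)
          .select { |_fence, lang, _code| lang == language }
          .map { |_fence, _lang, code| code }
end

# Runs a snippet in a separate Ruby process so one failure cannot affect
# the others. Returns [succeeded?, combined stdout and stderr].
def run_snippet(code)
  output, status = Open3.capture2e(RbConfig.ruby, stdin_data: code)
  [status.success?, output]
end

tutorial = "Intro\n\n~~~ ruby\nputs 1 + 1\n~~~\n\nText\n\n~~~ shell\nls\n~~~\n"

extract_code_blocks(tutorial).each_with_index do |code, number|
  ok, output = run_snippet(code)
  puts "snippet #{number + 1}: #{ok ? 'ok' : 'FAILED'}"
  puts output unless ok
end
```

This only checks that each snippet runs without raising; comparing the output against what the tutorial claims would need some convention for marking expected results in the document.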

Discovering capabilities of Ruby gem

The Ruby (and RoR) community publishes a large number of gems. But more often than not, using these gems requires a good amount of effort, especially if one is new to Ruby. It would be nice if Ruby experts (rockstars) shared the best approaches to using inadequately documented gems.
Thanks
--arsh
As my manager likes to say:
  The truth is in the code.

Look for examples of how others have used it, and modify as necessary.
  - There are frequently example directories in gems
  - Search the internet, people like to put this stuff in blogs

Read the docs.
  - Maybe posted on github
  - Frequently a link from the rubygems page
  - If installed as a gem, you can host your own server with $ gem server then go to localhost:8808 to get a list of all your installed gems, and you click the one you are interested in to see its documentation.

Look for tutorials that cover the gem
  - Railscasts are great for this
  - Many gems will have a wiki on github
  - Many of the more useful / cool / fun gems will be talked about in different books. You can get a lot of tutorials about how to deal with a given gem by getting a book that uses that gem to do something. The downside of this is that these kinds of books tend to go out of date pretty quickly.

Look at the code
  - If the code base is small, or you have a specific question about how something works, or want the truly definitive source, go check out the code.
  - If the code is installed as a gem, you can type $ gem environment and it will tell you your rubygems dir. Go there, cd into the gem you are interested in, check out its code in the lib directory.

Ask a mailing list
  - If a gem or project is large enough, it will have its own mailing list. You can usually find these by going to its homepage or reading its readme.
  - If not, try asking about the gem on the Ruby or the Ruby on Rails mailing lists.
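The "look at the code" step can also be done from Ruby itself instead of walking the directory printed by `gem environment`. A small sketch using the RubyGems API follows; `gem_source_dir` and `gem_lib_files` are hypothetical helper names, and the gem name in the usage line is just an example that may or may not be installed on your machine.

```ruby
require "rubygems"

# Returns the installation directory of an installed gem, or nil if
# no gem with that name can be found.
def gem_source_dir(gem_name)
  Gem::Specification.find_by_name(gem_name).gem_dir
rescue Gem::LoadError
  nil
end

# Lists the Ruby files under the gem's lib directory, relative to the
# gem's root, so you know where to start reading.
def gem_lib_files(gem_name)
  root = gem_source_dir(gem_name)
  return [] unless root

  Dir.glob(File.join(root, "lib", "**", "*.rb")).sort.map { |path| path.sub("#{root}/", "") }
end

puts gem_source_dir("rake").inspect
puts gem_lib_files("rake").first(5)
```

For gems that ship with Ruby itself the reported directory may be empty, because their files live in the standard library path rather than under the gems directory.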
You can always give your own gems a rockstar promotion. Vimeo: Zombie-chaser version 0.1: Mutation testing ... with zombies!

Website Listing Commonly Used Ruby Gems, Including Alternatives

I know that I've seen this site before, but cannot remember it for the life of me. Basically, it is a listing of commonly used gems, like XML parsing or ORM libraries. For the ORM case, it lists ActiveRecord, DataMapper, and the like, stating the advantages and disadvantages of each. Does anyone know what this site is? I've googled and have not been able to find it.
You have Ruby Toolbox for that: http://www.ruby-toolbox.com
It only shows information about the gems' activity on GitHub, but it's still interesting.
