Anything better than ruby alchemy for extracting keywords? - ruby

I've currently written an algorithm in Ruby based on the arc90 readability code to extract an article from a web page.
Now that I have the article, I want to extract keywords and specific information from it (names, author, etc)
I heard Alchemy was a great ruby gem for doing this though it consumes a lot of resources. Are there any better gems I can use for this?

highscore is a fast, lightweight and easy-to-use gem for extracting keywords from longer content:
https://rubygems.org/gems/highscore
I use it in production and it works like a charm.
The question is a bit old, but I'll leave this here for others who arrive from Google.

There is an OpenCalais gem which provides similar capability. In addition to entity extraction it can also detect events and relations between entities. It's not lightweight, though I couldn't tell if it's better or worse than Alchemy as I haven't used the Alchemy gem. Hope this helps.

Related

Ruby Object Mapper documentation

I need to use an ORM for my project and I opted to use ROM, since every other ORM seems to be directly tied to Rails and I am not using and will not use Rails.
I find the official documentation of ROM, found on www.rom-rb.org, to be awful. It doesn't provide any minimal code examples or even a cohesive experience. Instead, it gives you multitudes of theoretical breadcrumbs that are completely unrelated to each other. After reading the entire documentation under the "Learn" section, I realised my knowledge of ROM did not change in the slightest.
I need help from developers who have experience with ROM. Is there an actual source of knowledge on how to use ROM? Where can I find code examples or projects that use ROM? Or is there a good alternative ORM in Ruby that isn't tied to Rails?
Following some recommendations from the comments to the question, I decided to ditch ROM in favour of Sequel. It really turned out to be much simpler and, most importantly, to actually work.
It is quite ironic that ROM is in part based on Sequel...

Can I read webpage data using Ruby?

I am looking for a way to automate testing and web page form filling, and I also want to extract web page data and store it permanently in our database. Is there any way to fulfil these requirements using Ruby? If so, please point me to the Ruby modules that can help.
Yes, you can do all these tasks using Ruby and some gems.
I recommend taking a look at the Nokogiri gem for data extraction:
https://github.com/sparklemotion/nokogiri
And the Capybara gem for testing and automating forms and the like:
https://github.com/jnicklas/capybara
P.S.: Capybara gem does much more than just this, but it can be applied to your case too.
Since some web pages are not valid XML, you can also use regular expressions to fetch the data you want from a page. Sometimes an XML-reader approach just fails.
Sample:
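require 'open-uri'
# On Ruby 3.0 and later, call URI.open; plain open no longer fetches URLs
page_content = URI.open("http://your_page.com").read
# The m flag lets . match newlines, since a body usually spans many lines
page_body = page_content[/<body[^>]*>(.*)<\/body>/im, 1]
# do whatever you want with it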
As VBSlover said, capybara is useful to deal with browsing related stuff.
Doing this in an automated way every n minutes or the like is also possible with the whenever gem.
For storing the data in a database there are plenty of very good gems out there.
Final answer: there is nothing you can't do with Ruby nowadays. Okay, maybe except writing some really (!) high-performance code / 3D-Engines.
Edit:
If you can tell me exactly what you want to do, I may be able to suggest some matching gems.
"There is a gem for it" is usually a good rule of thumb. You can search rubygems.org for the keywords you need, or look at https://www.ruby-toolbox.com/ for categorized and ranked suggestions for your problem. :)
EDIT 2:
have a look at http://watir.com/
maybe just play around with it in some little painless scripts to get a feeling for it and if it is the solution for you.
Watir drives browsers the same way people do. It clicks links, fills
in forms, presses buttons. Watir also checks results, such as whether
expected text appears on the page.
Once it has clicked through everything for you, just scrape the results (or whatever you need) from the webpage, using an XML parser (Nokogiri would be a good choice) or some regexps.
Then stuff your data in your database. ActiveRecord comes to mind for this, but it may or may not be overkill. Depending on your database, choose whatever adapter/connection gem you like (again: there are MANY).
If you want to do this every hour or the like, just use the whenever gem (it manages a cronjob for you) or simply write an infinite loop with sleep(x) in it if you want. There is more than one way to do it. :)
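The "infinite loop with sleep(x)" variant could look roughly like the sketch below. run_every is a hypothetical helper name, not part of any gem, and the optional iterations limit is only there so the loop can be stopped when trying it out.

```ruby
# Hypothetical helper: runs the given block, then sleeps, over and over.
# Pass iterations: n to stop after n runs; leave it nil to loop forever.
def run_every(interval_seconds, iterations: nil)
  count = 0
  loop do
    yield
    count += 1
    break if iterations && count >= iterations
    sleep(interval_seconds)
  end
  count
end

# Example: run a (placeholder) scrape three times, 10 ms apart.
run_every(0.01, iterations: 3) { puts "scraping at #{Time.now}" }
```

For anything that has to survive reboots or crashes, a cron job (which is what the whenever gem sets up) is probably the safer choice.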
First of all, you need a proper operating system: use Linux, BSD or MacOS.
Windows will suit some people, but not you as a Ruby developer: too many libraries need C extensions, which are a pain in the ass to compile under Cygwin.
I recommend installing a Ruby version manager so you can try out different Ruby versions. I prefer RVM, the Ruby Version Manager.
Install Ruby 1.9.3; it is the standard nowadays.
Through RubyGems, install the mechanize gem, which does pretty much all the website automation you will need. It is modelled on WWW::Mechanize from Perl.
Nokogiri would also be useful for parsing XML data like (X)HTML, but remember that you need the libxml libraries installed on your system first.
Ah, according to your question:
Yes, you can read websites using ruby, for example read this webpage:
require 'httpclient' # provided by the httpclient gem
http = HTTPClient.new
http.get "http://stackoverflow.com/questions/14235393/can-i-read-webpage-data-using-ruby"
Done

safe browsing with ruby

Is there any usable Ruby code for interacting with Google's Safe Browsing API?
I searched on Google but didn't find any mature and solid code.
I have 3 points:
(0) I'd say that This looks alright, as does this
(1) Having used quite a few ruby gems for various obscure things, I find bugs all the time. It helps the open source community and the world if you find a gem, fix a bug, and let the rest of the world benefit by submitting a pull request. Tests make the life of a contributor sooooooooooo much easier, and guarantee that your fix works, so use gems with extensive tests where possible, even if they are not mature and you half-expect them to fail.
(2) From experience, gems which have lots of objects encapsulating something can sometimes be counterproductive. This has tripped me up in the case of the ruby mail gem and the tire gem (though that's not to say that they are not good and incredibly useful gems.). This applies to you if you only need to make one type of API call, say, and take a simple action. Using the simplest gem is sometimes advantageous, and for this purpose you might not need to use any gem at all! Just write a class that uses Net::HTTP to call the HTTP API: https://developers.google.com/safe-browsing/lookup_guide
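To make point (2) concrete, here is a minimal sketch of such a class using only Net::HTTP from the standard library. The class name SafeBrowsingLookup is made up, and the endpoint, query parameters and response handling are assumptions based on the legacy Lookup API guide linked above; check them against whatever version of the API is current before relying on this.

```ruby
require "net/http"
require "uri"

# Hypothetical wrapper around a single Safe Browsing lookup call.
class SafeBrowsingLookup
  # ASSUMPTION: endpoint of the legacy Lookup API; verify against current docs.
  DEFAULT_ENDPOINT = "https://sb-ssl.google.com/safebrowsing/api/lookup"

  def initialize(api_key:, client: "my-app", endpoint: DEFAULT_ENDPOINT)
    @api_key = api_key
    @client = client
    @endpoint = endpoint
  end

  # Builds the GET URI; the parameter names are assumed from the legacy guide.
  def request_uri(url)
    uri = URI(@endpoint)
    uri.query = URI.encode_www_form(
      client: @client, key: @api_key, appver: "1.0", pver: "3.1", url: url
    )
    uri
  end

  # Returns an array of threat labels, or an empty array if nothing was flagged.
  # ASSUMPTION: 204 means "no match" and 200 carries a comma-separated list.
  def lookup(url)
    response = Net::HTTP.get_response(request_uri(url))
    case response
    when Net::HTTPNoContent then []
    when Net::HTTPOK then response.body.strip.split(",")
    else raise "lookup failed: HTTP #{response.code}"
    end
  end
end

# Building the request does not touch the network:
puts SafeBrowsingLookup.new(api_key: "YOUR_KEY").request_uri("http://example.com/")
```

Keeping request_uri separate from lookup means the URL building can be unit-tested without hitting Google at all.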

Creating on-demand, print-quality PDFs (preferably in Ruby if feasible)

Main-Question
What's your fast and reliable (as in "stable") solution to create on-demand, newspaper-like (as in "using advanced layout or typesetting") PDFs out of an application on a Linux server?
Therefore: No, HTML2PDF is not the solution I'm looking for. ;-)
Bonus-Question
And if it's not Ruby-based: Is there a way to steer your solution out of a Rails application? Preferably over a webservice or a something-2-Ruby-bridge-kind-of-thing?
Thanks a lot for your suggestions!
Update
There's a similar question and the rtex gem suggested there looks like what I'm looking for. I'll keep this question unanswered to see if there are other suggestions.
Typesetting well is hard.
If you can't find a ruby typesetting library, you may want to look at running a background pdflatex. LaTeX source is pretty easy to generate programmatically.
How useful this idea is will depend a bit on how complicated your documents are, and how much you care about the quality of output. If you have simple text only, and only want something a bit better than html, you probably have more options.
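As a rough illustration of generating LaTeX source programmatically, here is a sketch in plain Ruby. The helper names latex_escape and article_to_latex and the very simple two-column layout are my own assumptions; a real newspaper-style layout would need a far more elaborate template.

```ruby
# Characters with special meaning in LaTeX and their escaped forms.
LATEX_ESCAPES = {
  "\\" => "\\textbackslash{}",
  "&" => "\\&", "%" => "\\%", "$" => "\\$", "#" => "\\#",
  "_" => "\\_", "{" => "\\{", "}" => "\\}",
  "~" => "\\textasciitilde{}", "^" => "\\textasciicircum{}"
}.freeze

# Escapes user-supplied text in a single pass so it is safe to embed.
def latex_escape(text)
  text.gsub(/[\\&%$_{}~^#]/) { |char| LATEX_ESCAPES[char] }
end

# Hypothetical generator: returns a complete LaTeX document as a string.
def article_to_latex(title:, author:, paragraphs:)
  body = paragraphs.map { |paragraph| latex_escape(paragraph) }.join("\n\n")
  <<~LATEX
    \\documentclass[twocolumn]{article}
    \\title{#{latex_escape(title)}}
    \\author{#{latex_escape(author)}}
    \\begin{document}
    \\maketitle
    #{body}
    \\end{document}
  LATEX
end

source = article_to_latex(
  title: "Profits up 50%",
  author: "R&D desk",
  paragraphs: ["First paragraph.", "Costs fell by $3 million."]
)
puts source

# To render it in the background (assumes pdflatex is installed):
#   File.write("article.tex", source)
#   system("pdflatex", "-interaction=nonstopmode", "article.tex")
```

The escaping step matters most here: unescaped %, & or $ in article text will either break the pdflatex run or silently swallow content.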
Prawn is designed for this type of thing, and it's under current development.
With PHP I've had great luck with FPDF. I generate a few thousand high-quality reports every day with it. It never misses a beat and is pretty quick. With PHP running on a webserver, I imagine it wouldn't be too hard to set up Ruby to feed the PHP page the data required to generate the PDF and then have Ruby pick up the result.
EDIT: It looks like there is a port for Ruby. http://zeropluszero.com/software/fpdf/

Ruby off the rails

Sometimes it feels that my company is the only company in the world using Ruby but not Ruby on Rails, to the point that Rails has almost become synonymous with Ruby.
I'm sure this isn't really true, but it'd be fun to hear some stories about non-Rails Ruby usage out there.
One of the huge benefits of Ruby is the ability to create DSLs very easily. Ruby allows you to create "business rules" in a natural language way that is usually easy enough for a business analyst to use. Many Ruby apps outside of web development exist for this purpose.
I highly recommend Googling "ruby dsl" for some excellent reading, but I would like to leave you with one post in particular. Russ Olsen wrote a two part blog post on DSLs. I saw him give a presentation on DSLs and it was very good. I highly recommend reading these posts.
I also found this excellent presentation on Ruby DSLs by Obie Fernandez. Highly recommended reading!
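For anyone who hasn't seen one, a tiny "business rules" DSL could look something like this sketch. The classes Rule and RuleBook, the rule names and the discount logic are all invented for illustration; the technique shown is just instance_eval on a block, which is one common way to build such DSLs.

```ruby
# One rule: a name, a condition and a discount percentage.
class Rule
  attr_reader :name

  def initialize(name, &block)
    @name = name
    @condition = ->(_order) { false }
    @discount = 0
    instance_eval(&block) # run the block with this rule as self
  end

  def applies_when(&condition)
    @condition = condition
  end

  def discount(percent)
    @discount = percent
  end

  def discount_for(order)
    @condition.call(order) ? @discount : 0
  end
end

# A collection of rules, also filled in through a block.
class RuleBook
  def initialize(&block)
    @rules = []
    instance_eval(&block)
  end

  def rule(name, &block)
    @rules << Rule.new(name, &block)
  end

  # Returns the largest discount among the rules that apply.
  def best_discount(order)
    @rules.map { |rule| rule.discount_for(order) }.max || 0
  end
end

# This is the part a business analyst would read or edit.
PRICING = RuleBook.new do
  rule "bulk order" do
    applies_when { |order| order[:quantity] >= 100 }
    discount 10
  end

  rule "loyal customer" do
    applies_when { |order| order[:years_as_customer] > 5 }
    discount 15
  end
end

puts PRICING.best_discount({ quantity: 150, years_as_customer: 2 }) # prints 10
```

The trade-off with instance_eval is that self changes inside the block, so the rule bodies cannot see instance variables from the surrounding code.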
I use Ruby extensively in my work, and none of it is Rails (or even web) based.
My domain is usually client-side Windows applications (wxRuby GUI) and scripts, automating Excel, Internet Explorer, SQL Server queries and report generation (win32ole COM automation). I also use the sqlite, pdf-writer, and gruff libraries for various data munging and graph generation tasks.
Rails' success has been great for Ruby, but I agree that Rails has received so much attention that Ruby's value beyond the web is often overlooked.
We are mainly a C++ shop, but we've found several areas where Ruby has proven quite useful. Here are a few:
Code Generation - Built several DSLs to generate C++/Java/C# code from single input files
Build Support
scripts to generate Makefiles for unix from Visual Studio Project Files
scripts for building projects and formatting the output for Cruise Control
scripts for running our unit tests and formatting the output for Cruise Control
scripts for manipulating Visual Studio projects and solutions from the command line
Integration Tests - We can crank out tests much quicker and cleaner using Ruby than C++
QA's entire testing suite is written in Ruby
Ruby is basically my go to tool for where it makes sense. And it makes sense in a lot of places.
Google Sketchup uses Ruby as an embedded scripting language. You can use it to perform all sorts of 3d modeling and import/export tasks. The scripting works with the free version and there's even decent documentation.
Ruby with a homebrew extension written in C++ does all the heavy pixel pushing for my photography processing. I was using Python+numpy, but when doing artsy stuff, Ruby is just more fun. Also, the relative lack of, or lesser maturity of, good image processing libraries makes me feel less like I'm reinventing wheels. I am clueless about Rails, other than that I've heard of it, have a fuzzy idea what it is, and actually have a book on it (unopened).
We use Watir (Ruby library) to test our .net web application.
Check out Shoes, a simple API for building GUIs in Ruby aimed at novice programmers.
Or you could use Ruby to make music ala Giles Bowkett's Archaeopteryx. This presentation by Giles about Archaeopteryx is one of the best presentations ever. I highly recommend it.
RubyCocoa and MacRuby. Possible to make full Cocoa-based GUI apps without Rails. And then you get to use Interface Builder, too.
I worked on a museum project last year that used a lot of Ruby. (http://ourspace.tepapa.com/home)
The part that I spent most of my time on was an interactive floor map. The map on the floor has sensors, so when people walk on it, lights are triggered, displays in the wall show images or videos, and audio tracks are played.
All the control code for this part of the exhibit is Ruby. I wrote C interfaces with Ruby wrappers to communicate with the floor sensors and the lighting controllers. The system queries a MySQL database for the media files to be displayed and then tells computers in the walls to play the media via UDP.
It's the most reliable part of the entire exhibit.
Ruby was used for the other major part of the exhibit, the Wall, though I didn't have much to do with that. Most of the graphics were prototyped in Ruby using interfaces to OpenGL, a bit of Cocoa and a physics library before being ported to pure Obj-C.
Puppet and Chef: DevOps
I didn't see a mention of Puppet or Chef in the 30 answers that preceded my arrival. Ruby appears to dominate current work in cloud automation and is the base, extension, and templating language of these two big players. They are used primarily to distribute system and application configuration information for server arrays and for general IT workstation management.
The DevOps field is quite Ruby-aware. Today, Perl has a competitor. While a really simple script may often still be written directly for sh(1), a complex task now might be done in Ruby rather than Perl.
The only site I've done with Ruby at work is using Rails, but I'd like to try Merb.
Other than that I do a lot of little utility programs in Ruby, for instance an app that reads RSS feeds and imports new posts into a database.
It's fun, so I also write some dumb stuff just because it's so quick. Yesterday I wrote an app to play the Monty Hall problem 100,000 times to help a friend convince her professor that switching is the correct strategy.
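A Monty Hall simulation along those lines fits in a few lines of Ruby. This is my own sketch rather than the original program; the method name monty_hall_win_rate and the fixed seed are arbitrary choices.

```ruby
# Plays the Monty Hall game `trials` times and returns the fraction won.
def monty_hall_win_rate(trials:, switch:, rng: Random.new)
  wins = 0
  trials.times do
    doors = [0, 1, 2]
    prize = rng.rand(3)
    pick  = rng.rand(3)
    # The host opens a door that is neither the player's pick nor the prize.
    opened = (doors - [pick, prize]).sample(random: rng)
    # Switching means taking the one door that is still closed.
    pick = (doors - [pick, opened]).first if switch
    wins += 1 if pick == prize
  end
  wins.to_f / trials
end

rng = Random.new(42) # fixed seed so repeated runs give the same numbers
stay_rate   = monty_hall_win_rate(trials: 100_000, switch: false, rng: rng)
switch_rate = monty_hall_win_rate(trials: 100_000, switch: true,  rng: rng)
puts format("stay: %.3f  switch: %.3f", stay_rate, switch_rate)
```

With this many trials the two rates should land close to 1/3 and 2/3 respectively, which is the whole argument for switching.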
I almost take it as an insult that Ruby is seen as a Rails thing. It is like back when CGI was the latest trend and everyone figured that if you knew Perl, it must be only because you programmed CGI apps. Ruby is just a scripting language for me. It is not as mature as Python, so I somewhat regret having to jump through some of its hoops and recent changes, but I still like it and use it. I work in a Java shop, where Groovy is the ideal choice for a scripting language, but I still use Ruby at home and for throwaway scripts that don't need to be shared at work.
I was considering getting into RoR because of all the buzz and how quick and simple it is supposed to be, but after looking over Rails I didn't see anything that was amazing, innovative, or noticeably faster to develop with compared to any other framework. The only benefit I saw was that I could code in Ruby, which would be nice, but initial setup, server maintenance and scaling are more difficult, which offsets the pleasure of coding in Ruby.
I created a presentation -- coincidentally named Off The Rails -- to discuss Rack-based web applications:
https://github.com/alexch/Off-The-Rails
The git repo includes slides in Markdown format and sample code (in the form of running applications and middleware). Here's the abstract:
Ruby on Rails is the most popular web application framework for Ruby. But it's not the only one! If you think Rails is too big, or too opinionated, or too anything, you might be happy to learn about the new generation of so-called microframeworks built on Rack. And since Rails 3 is itself a Rack app, you don't have to give up Rails to get the benefit of Sinatra routes or Grape APIs.
And here are some references:
This talk lives at https://github.com/alexch/off-the-rails
Yehuda's #10 Favorite Thing About Ruby
Rack
rack-test
rack-client
Sinatra
Grape
Vegas
Siesta
Rerun
Hope you find it useful!
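To give a flavour of what "built on Rack" means in the abstract above: a Rack application is just an object that responds to call(env) and returns a status, a headers hash and a body. The sketch below needs no gems to run; the names HELLO_APP, Timing and APP and the x-runtime-ms header are made up for illustration and are not taken from the talk's sample code.

```ruby
# A Rack app: call(env) returns [status, headers, body], where body
# responds to each and yields strings.
HELLO_APP = lambda do |env|
  case env["PATH_INFO"]
  when "/hello"
    [200, { "content-type" => "text/plain" }, ["Hello from Rack\n"]]
  else
    [404, { "content-type" => "text/plain" }, ["Not found\n"]]
  end
end

# Middleware is an app that wraps another app.
class Timing
  def initialize(app)
    @app = app
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
    [status, headers.merge("x-runtime-ms" => format("%.2f", elapsed_ms)), body]
  end
end

APP = Timing.new(HELLO_APP)

# Calling the app by hand with a minimal, hand-built env hash:
status, headers, body = APP.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/hello" })
puts status
puts body.join

# With the rack gem installed, a config.ru containing `run APP`
# would serve the same object over HTTP via rackup.
```

Because the contract is that small, the app can be exercised in tests with a plain hash, which is roughly what rack-test automates.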
I'm mostly a Web developer, and I learned Ruby to use Rails, but I like the language so much that I started developing a desktop Swing application in Ruby, using JRuby and Monkeybars. I'm competent in Java, but don't much like using it, and the Swing API is horrible, so putting Ruby on top has been a big win.
We mainly use rails, but we have plenty of other non-rails ruby things - for example a standalone authentication daemon thing for centralized authentication of users, and an 'image processing server' which runs arbitrary numbers of ruby processes to process images in parallel.
Oh, and don't forget good old Rake :-)
Ruby is also used for desktop applications, especially JRuby for developing Swing desktop applications.
I've used Ruby at work for
A data extractor, generating csv files from binary output.
A .ini file generator, turning a simple syntax into a repetitive .ini format.
A simple TCP/IP server, acting as stand-in for the customer's system during testing.
We use Ruby to implement our test automation software. This includes test framework and driver code for Selenium RC, WATIR and AutoIT.
Ruby is powerful enough to create comprehensive applications that can interface with Test tools like Selenium or WATIR, while at the same time reading from data files, interacting with a remote Windows UI and performing near transparent network communication. All while running on Windows or Linux.
The uncluttered syntax makes it ideal for new and inexperienced programmers to read, while its totally OO nature makes it easy for these same programmers to apply good (recently learned) OO techniques from the start.
The flexible nature of Ruby's syntax also makes the use and creation of DSLs much easier. This allows less-technical people to get involved and to read, and possibly create, their own tests.
I have used Ruby for code generation of C# and T-SQL stored procedures in a project with unstable requirements. The data model was encoded in a YAML file and .erb templates were used for the classes and stored procedures. It also allowed for a much more DRY solution than would have been possible with straight C#, as repetitive code could be factored out into a single method in the code generator.
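A stripped-down version of that YAML-plus-ERB approach might look like the sketch below. The model layout, the template and the generate_class helper are invented for illustration and are not the original project's files.

```ruby
require "yaml"
require "erb"

# Hypothetical data model; in a real project this would live in a .yml file.
MODEL_YAML = <<~YAML
  class_name: Customer
  fields:
    - { name: Id, type: int }
    - { name: Name, type: string }
YAML

# Hypothetical C# class template; trim_mode "-" lets <%- and -%> strip
# the whitespace and newlines around the loop tags.
CLASS_TEMPLATE = ERB.new(<<~TEMPLATE, trim_mode: "-")
  public class <%= model["class_name"] %>
  {
  <%- model["fields"].each do |field| -%>
      public <%= field["type"] %> <%= field["name"] %> { get; set; }
  <%- end -%>
  }
TEMPLATE

# Parses the YAML model and renders the class source as a string.
def generate_class(yaml_source)
  model = YAML.safe_load(yaml_source)
  CLASS_TEMPLATE.result_with_hash(model: model)
end

puts generate_class(MODEL_YAML)
```

When the requirements change, only the YAML model is edited and every generated class and stored procedure is regenerated from it.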
Where I work, we use Ruby to do a number of different one-off type batch jobs. One example of that is a job that interacts with Amazon's S3 service. At the time, the Ruby S3 library was probably the easiest one out there for us to get up and running in a short amount of time.
I wrote an order processing expert system (see DSL answer as well), converted 100k lines of customer specific perl into about 10k lines of ruby handling dozens of customers. No web components at all, no Rails.
I am a WebDriver user. WebDriver uses Ruby, via Rake, to automate its build process. See http://code.google.com/p/webdriver/ for details.
Heh, great question.
I used Ruby to convert Excel spreadsheet airport facility data to sqlite3 for the android phone platform while making an app for pilots.
I use Ruby with Sinatra which is much simpler than Rails. I did use Rails but just found that it has turned into a bit of a monster, although Rails is still amazing compared to web frameworks available for Java.
The main features of Ruby that I love, however, are "eval" and "method_missing", which Rails actually uses, for example in ActiveRecord, so that you can use the amazing "find_by_<field_name>" queries.
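For anyone curious how method_missing makes that kind of finder possible, here is a toy sketch. It only illustrates the mechanism on an in-memory array; the Table class is invented and this is not how ActiveRecord itself is implemented.

```ruby
# Toy in-memory table that answers find_by_<field> calls dynamically.
class Table
  def initialize(rows)
    @rows = rows
  end

  # Called by Ruby whenever an undefined method is invoked on a Table.
  def method_missing(name, *args)
    field = name.to_s[/\Afind_by_(\w+)\z/, 1]
    return super unless field # not a finder: fall back to NoMethodError

    @rows.find { |row| row[field.to_sym] == args.first }
  end

  # Keeps respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("find_by_") || super
  end
end

people = Table.new([
  { name: "Ada", city: "London" },
  { name: "Matz", city: "Matsue" }
])

p people.find_by_name("Matz")   # finds the row whose :name is "Matz"
p people.find_by_city("London") # finds the row whose :city is "London"
```

Defining respond_to_missing? alongside method_missing is the usual courtesy, so that respond_to? and method objects keep working for the dynamic names.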
I used Ruby for a lot of back-end code simply because I was the only person who was tasked to do it and needed a nice clean language that allowed me to be very productive and write easy to maintain code. I find Ruby allows me to do that easier than Perl and Python. Other people's mileage might vary on that but it works well for me.
Besides that, I like how Sequel and Nokogiri work. I also used ActiveRecord for a while separately from Rails.
We use some Ruby for file manipulation but have not been able to incorporate rails yet.
I've used Ruby a lot professionally for quick scripts for things like shuffling files around. I'm the same way in that I was using Ruby first before touching Rails at all.
In Boulder there was an excellent group of Ruby users who met monthly. This point was made - that Ruby does have an existence beside its use in Rails. Plain Ruby users do exist, are begging for attention, have neat things to show, and can find each other at user group meetings.
They also had better pizza than the Python group, who also met on the same day of the month. Can only pick one...
While we do have several Rails apps at work, we also use Ruby for some fairly intensive non-web stuff.
We've got an SMS delivery daemon, which pulls messages from a queue and then delivers them, and a credit card processing daemon that other apps can call out to, which makes sure there's a central audit trail.
