ThinkingSphinx does not work in RSpec - ruby

I've read posts stating that this is due to transactional fixtures, but it looks like we're not using those. I've read other posts that suggest I do some funky stopping, restarting, and reindexing before each test run. That doesn't seem to help.
ThinkingSphinx runs just fine in the app, but in test it behaves very strangely. After creating a couple of Organization models in my spec, I run a sphinx search. It has the two records that I'm expecting, but also a lot of nils. That's annoying, but I add a :retry_stale => true and those go away, so I'm satisfied with this first query:
(rdb:1) p Organization.search '', :retry_stale => true
[#<Organization id: 1, ein: nil, name: "Allina", created_at: "2012-09-11 15:03:04", updated_at: "2012-09-11 15:03:04", parent_id: nil, main_phone: "7867685187x894", fax: nil, email: nil, url: nil, crm_id: 1, target_market: false, notes: nil, form990_notes: nil, mec_revenue: 750000.0, legacy_id: nil, teaching_program_type_id: nil, sponsorship_type_id: nil, is_member_council_teaching_hospitals: nil, has_teaching_program: nil, is_major_academic_medical_center: nil, does_participate_in_survey: nil, status: "modified", net_revenue: nil, gross_revenue: nil, source_name: nil, source_id: nil, import_concat_key: nil, bigtime_id: nil, is_non_hc: nil, is_client: nil, contact_record_id: nil, form990_legacy_id: nil, number_of_beds: nil, tax_exempt_status: nil, delta: true>, #<Organization id: 2, ein: nil, name: "HealthPartners", created_at: "2012-09-11 15:03:04", updated_at: "2012-09-11 15:03:04", parent_id: nil, main_phone: "3407862693x0935", fax: nil, email: nil, url: nil, crm_id: 1, target_market: false, notes: nil, form990_notes: nil, mec_revenue: nil, legacy_id: nil, teaching_program_type_id: nil, sponsorship_type_id: nil, is_member_council_teaching_hospitals: nil, has_teaching_program: nil, is_major_academic_medical_center: nil, does_participate_in_survey: nil, status: "modified", net_revenue: nil, gross_revenue: nil, source_name: nil, source_id: nil, import_concat_key: nil, bigtime_id: nil, is_non_hc: nil, is_client: nil, contact_record_id: nil, form990_legacy_id: nil, number_of_beds: nil, tax_exempt_status: nil, delta: true>]
But now if I try to search by name, which is most definitely indexed on my model, I don't get my record.
(rdb:1) p Organization.search 'Allina', :retry_stale => true
[]
I tried to modify the options to no avail, so I thought I'd just try searching for single letters in star mode. The results are disturbing.
(rdb:1) p Organization.search 'C', :star => true, :retry_stale => true
[#<Organization id: 1, ein: nil, name: "Allina", created_at: "2012-09-11 15:03:04", updated_at: "2012-09-11 15:03:04", parent_id: nil, main_phone: "7867685187x894", fax: nil, email: nil, url: nil, crm_id: 1, target_market: false, notes: nil, form990_notes: nil, mec_revenue: 750000.0, legacy_id: nil, teaching_program_type_id: nil, sponsorship_type_id: nil, is_member_council_teaching_hospitals: nil, has_teaching_program: nil, is_major_academic_medical_center: nil, does_participate_in_survey: nil, status: "modified", net_revenue: nil, gross_revenue: nil, source_name: nil, source_id: nil, import_concat_key: nil, bigtime_id: nil, is_non_hc: nil, is_client: nil, contact_record_id: nil, form990_legacy_id: nil, number_of_beds: nil, tax_exempt_status: nil, delta: true>]
There is clearly no letter 'C' in the name field. My model also indexes the status field, which is set to modified, and it indexes the state field of the address association. That value is 'CA', so I thought maybe ThinkingSphinx is finding that record with that index. So I thought I'd try to search for 'CA', and guess what happened:
(rdb:1) p Organization.search 'CA', :star => true, :retry_stale => true
[]
Nothing! What is with this wonky behavior? Ok, you guys probably want to see some configs and spec helpers and some index definitions:
config/sphinx.yml
development:
enable_star: 1
min_infix_len: 1
test:
enable_star: 1
min_infix_len: 1
spec/spec_helper.rb
Spork.prefork do
require 'headless'
headless = Headless.new(:display => 99)
headless.start
at_exit do
headless.destroy
end
ENV["RAILS_ENV"] ||= 'test'
require File.expand_path("../../config/environment", __FILE__)
require 'rspec/rails'
require 'capybara/rspec'
require 'database_cleaner'
#require 'rspec/autorun'
Dir[Rails.root.join("spec/support/**/*.rb")].each {|f| require f}
RSpec.configure do |config|
config.use_transactional_fixtures = false
config.treat_symbols_as_metadata_keys_with_true_values = true
config.infer_base_class_for_anonymous_controllers = false
config.include Devise::TestHelpers, :type => :controller
config.extend ControllerMacros, :type => :controller
config.include Warden::Test::Helpers, :type => :request
config.include FactoryGirl::Syntax::Methods
config.before(:suite) do
DatabaseCleaner.strategy = :truncation, {
:except => %w(sponsorship_types teaching_program_types title_groups positions tax_exempt_organization_types)}
end
config.before(:each) do
DatabaseCleaner.start
end
config.after(:each) do
DatabaseCleaner.clean
end
end
end
Spork.each_run do
end
app/models/organization.rb
define_index do
indexes :name, :sortable => true
indexes address(:state), :as => :state
indexes status
set_property :delta => true
end

We wrote Thinking Sphinx tests in Test::Unit, and it worked pretty well.
Include the following in your spec/spec_helper.rb:
require 'thinking_sphinx/test'
ThinkingSphinx::Test.init
Also, in each test, call ThinkingSphinx::Test.start in the setup and ThinkingSphinx::Test.stop in the teardown.

Related

Ruby and RSpec - Test fails when expected output is the same as the method

I know I am missing something simple here. I want to write a test that checks if an array of array has been outputted. The test keeps failing but what the test expects is the same that the method is giving.
connect4.rb
class Board
attr_accessor :board
def make_and_print_board
grid = Array.new(6) { Array.new(6)}
p grid
end
end
connect4_spec.rb
require './lib/connect4'
RSpec.describe Board do
let (:new_board) {Board.new}
it "prints board" do
expect{new_board.make_and_print_board}.to output(
Array.new(6) { Array.new(6)}
).to_stdout
end
end
This is the error...
1) Board prints board
Failure/Error:
expect{new_board.make_and_print_board}.to output(
Array.new(6) { Array.new(6)}
).to_stdout
expected block to output [[nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil]] to stdout, but output "[nil, nil, nil, nil, nil, nil]\n[nil, nil, nil, nil, nil, nil]\n[nil, nil, nil, nil, nil, nil]\n[nil, nil, nil, nil, nil, nil]\n[nil, nil, nil, nil, nil, nil]\n[nil, nil, nil, nil, nil, nil]\n"
What am I missing here? Why isn't it passing? How can I get this test to pass?
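The correct way to write this test is to be explicit about your expectation: test the exact string you expect it to print. The output matcher compares against the printed text, not against the array itself, and p appends a newline to what it prints, so write it this way: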
RSpec.describe Board do
let (:new_board) {Board.new}
it 'prints board' do
p_output = "[[nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil]]\n"
expect{new_board.make_and_print_board}.to output(p_output).to_stdout
end
end
But you might want to add this spec if you care more about the internals:
it 'returns a 6 x 6 2d array' do
expect( new_board.make_and_print_board ).to match_array Array.new(6) { Array.new(6)}
end
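Why the original spec fails can be seen without RSpec. The sketch below is only an illustration: capture_stdout is a hypothetical helper (not part of RSpec), and the grid is shrunk to 2 x 2 to keep the output short. It shows that p writes the inspect string followed by a newline, and that this String is never equal to the Array that was passed to output(...).

```ruby
require 'stringio'

# Hypothetical helper: run the block and return everything it wrote to $stdout.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original
end

grid = Array.new(2) { Array.new(2) }

# p prints grid.inspect followed by a newline.
p_output = capture_stdout { p grid }

puts p_output.inspect    # prints "[[nil, nil], [nil, nil]]\n"
puts(grid == p_output)   # prints false: an Array is never == to a String
```

Passing the expected string (including the trailing newline) to output(...) is therefore what makes the matcher succeed.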

Classifying array of strings into the closest regex

I have the following array:
a
=> ["http://dominio-1-736865.com/path1",
"http://dominio-2-570941.com/path2",
"http://102.160.194.146/path4",
"http://142.231.2.110",
"http://142.231.2.110/path/inventado",
"http://dominio-3-468658.com/path2",
"http://dominio-3-468658.com/path2/path1",
"http://dominio-3-468658.com/path2/path2",
"http://subdominio.dominio-3-468658.com/path2",
"http://www.dominio-3-468658.com/path2",
"http://este-se-repite.re/AP-448055"]
Then I need to group like this:
fqdns
=> ["dominio-1-736865.com", "dominio-2-570941.com", "102.160.194.146", "142.231.2.110", "dominio-3-468658.com", "subdominio.dominio-3-468658.com", "este-se-repite.re"]
to get this:
["http://dominio-1-736865.com/path1"]
["http://dominio-2-570941.com/path2"]
["http://102.160.194.146/path4"]
["http://142.231.2.110", "http://142.231.2.110/path/inventado"]
["http://dominio-3-468658.com/path2", "http://dominio-3-468658.com/path2/path1", "http://dominio-3-468658.com/path2/path2", "http://www.dominio-3-468658.com/path2"]
["http://subdominio.dominio-3-468658.com/path2"]
["http://este-se-repite.re/AP-448055"]
The problem is with subdominio.dominio-3-468658.com and dominio-3-468658.com: a URL with the subdomain matches both, but I need it to land only in the group for the subdomain. How can I achieve this in Ruby?
[25] pry(#<Notifications::Notification>)> a.map{|d| d.match(fqdns[1])}
=> [nil, #<MatchData "dominio-2-570941.com">, nil, nil, nil, nil, nil, nil, nil, nil, nil]
[26] pry(#<Notifications::Notification>)> a.map{|d| d.match(fqdns[0])}
=> [#<MatchData "dominio-1-736865.com">, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]
[27] pry(#<Notifications::Notification>)> a.map{|d| d.match(fqdns[2])}
=> [nil, nil, #<MatchData "102.160.194.146">, nil, nil, nil, nil, nil, nil, nil, nil]
[28] pry(#<Notifications::Notification>)> a.map{|d| d.match(fqdns[3])}
=> [nil, nil, nil, #<MatchData "142.231.2.110">, #<MatchData "142.231.2.110">, nil, nil, nil, nil, nil, nil]
[29] pry(#<Notifications::Notification>)> a.map{|d| d.match(fqdns[4])}
=> [nil, nil, nil, nil, nil, #<MatchData "dominio-3-468658.com">, #<MatchData "dominio-3-468658.com">, #<MatchData "dominio-3-468658.com">, #<MatchData "dominio-3-468658.com">, #<MatchData "dominio-3-468658.com">, nil]
[30] pry(#<Notifications::Notification>)> a.map{|d| d.match(fqdns[5])}
=> [nil, nil, nil, nil, nil, nil, nil, nil, #<MatchData "subdominio.dominio-3-468658.com">, nil, nil]
[31] pry(#<Notifications::Notification>)> a.map{|d| d.match(fqdns[6])}
=> [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, #<MatchData "este-se-repite.re">]
No need to add regexes here (and create another problem). Use the right tool for the job: URI parsers.
uris = ["http://dominio-1-736865.com/path1",
"http://dominio-2-570941.com/path2",
"http://102.160.194.146/path4",
"http://142.231.2.110",
"http://142.231.2.110/path/inventado",
"http://dominio-3-468658.com/path2",
"http://dominio-3-468658.com/path2/path1",
"http://dominio-3-468658.com/path2/path2",
"http://subdominio.dominio-3-468658.com/path2",
"http://www.dominio-3-468658.com/path2",
"http://este-se-repite.re/AP-448055"]
require 'uri'
uris.group_by{|u| URI(u).host}.values
# => [
# ["http://dominio-1-736865.com/path1"],
# ["http://dominio-2-570941.com/path2"],
# ["http://102.160.194.146/path4"],
# ["http://142.231.2.110", "http://142.231.2.110/path/inventado"],
# ["http://dominio-3-468658.com/path2", "http://dominio-3-468658.com/path2/path1", "http://dominio-3-468658.com/path2/path2"],
# ["http://subdominio.dominio-3-468658.com/path2"],
# ["http://www.dominio-3-468658.com/path2"],
# ["http://este-se-repite.re/AP-448055"]
#]
Finally, if you want to put domains with "www." in the same bucket with their naked versions:
uris.group_by{|u| URI(u).host.sub(/^www\./, '')}
=> {"dominio-1-736865.com"=>["http://dominio-1-736865.com/path1"],
"dominio-2-570941.com"=>["http://dominio-2-570941.com/path2"],
"102.160.194.146"=>["http://102.160.194.146/path4"],
"142.231.2.110"=>["http://142.231.2.110", "http://142.231.2.110/path/inventado"],
"dominio-3-468658.com"=>
["http://dominio-3-468658.com/path2", "http://dominio-3-468658.com/path2/path1", "http://dominio-3-468658.com/path2/path2", "http://www.dominio-3-468658.com/path2"],
"subdominio.dominio-3-468658.com"=>["http://subdominio.dominio-3-468658.com/path2"],
"este-se-repite.re"=>["http://este-se-repite.re/AP-448055"]}
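If the buckets have to follow the fqdns list from the question rather than a fixed "www." rule, one option is to assign each URL to the longest FQDN that its host equals or is a subdomain of. This is a minimal sketch under that assumption; closest_fqdn is a hypothetical helper name, and the two arrays are copied from the question.

```ruby
require 'uri'

# URLs and FQDN list copied from the question.
urls = ["http://dominio-1-736865.com/path1",
        "http://dominio-2-570941.com/path2",
        "http://102.160.194.146/path4",
        "http://142.231.2.110",
        "http://142.231.2.110/path/inventado",
        "http://dominio-3-468658.com/path2",
        "http://dominio-3-468658.com/path2/path1",
        "http://dominio-3-468658.com/path2/path2",
        "http://subdominio.dominio-3-468658.com/path2",
        "http://www.dominio-3-468658.com/path2",
        "http://este-se-repite.re/AP-448055"]

fqdns = ["dominio-1-736865.com", "dominio-2-570941.com", "102.160.194.146",
         "142.231.2.110", "dominio-3-468658.com",
         "subdominio.dominio-3-468658.com", "este-se-repite.re"]

# Hypothetical helper: return the longest FQDN that equals the URL's host
# or is a parent domain of it (nil when nothing matches).
def closest_fqdn(url, fqdns)
  host = URI(url).host
  fqdns.select { |fqdn| host == fqdn || host.end_with?(".#{fqdn}") }
       .max_by(&:length)
end

groups = urls.group_by { |url| closest_fqdn(url, fqdns) }
groups.each { |fqdn, members| puts "#{fqdn}: #{members.inspect}" }
```

Picking the longest match is what keeps the subdominio URL out of the dominio-3-468658.com bucket, while the www URL still falls into it because no longer entry matches.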
You can use Enumerable#group_by:
a.group_by {|url| url.match(/http:\/\/([^\/]*)\/?/)[1] }.values
# => [["http://dominio-1-736865.com/path1"],
# ["http://dominio-2-570941.com/path2"],
# ["http://102.160.194.146/path4"],
# ["http://142.231.2.110", "http://142.231.2.110/path/inventado"],
# ["http://dominio-3-468658.com/path2",
# "http://dominio-3-468658.com/path2/path1",
# "http://dominio-3-468658.com/path2/path2"],
# ["http://subdominio.dominio-3-468658.com/path2"],
# ["http://www.dominio-3-468658.com/path2"],
# ["http://este-se-repite.re/AP-448055"]]
Regex explanation (without escaping)
http://([^/]*)/?
http:// matches prefix (same in every address)
([^/]*) captures host part - everything but slash /
/? optional slash ending the address

How to download specific links from a page with Nokogiri

I have a webpage with a list of names (which are regular links). When I click on one of the names on the first page, it opens another page which has a list of files for download as links. I want to download only the links that end with fq.gz, for all of the page 1 links.
To do this I have been trying to use Nokogiri:
require 'nokogiri'
require 'open-uri'
url = 'http://myURL/'
doc = Nokogiri::HTML(open(url))
puts doc.css('li')[2]['href']
doc.traverse do |el|
[el[:src], el[:href]].grep(/\.(fq.gz)$/i).map{|l| URI.join(url, l).to_s}.each do |link|
File.open(File.basename(link),'wb'){|f| f << open(link,'rb').read}
end
end
However, I don't think this opens up each of the page 1 links to get the fq.gz ending files in the next level.
The format of the links I am interested in is:
<td>SLX-7998.blabla.fq.gz</td>
I tried using this code which is heavily adapted from one of the answers below but nothing gets downloaded and I get the array as below
master_page.links_with(:href => /ViewSample/).map {|link| link.click
link = agent.get(agent.page.uri.to_s)
if link.content.include?("fq.gz")
out_file = File.new("downloaded_file", "w")
out_file.puts(agent.get_file(link[:href]))
out_file.close
end
=> [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]
This is the basis for a quick search for anchors containing certain sub-strings in the linked-text:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<a href="http://foo">foo.fq.gz</a>
<a href="http://bar">bar.fq.gz</a>
<a href="http://baz">baz</a>
EOT
nodes = doc.search('a').select{ |node| node.text[/fq\.gz$/] }
At this point nodes is a NodeSet of Nodes that match the /fq\.gz$/ pattern in their text:
nodes
# => [#(Element:0x3fd9818bda2c {
# name = "a",
# attributes = [
# #(Attr:0x3fd982027060 { name = "href", value = "http://foo" })],
# children = [ #(Text "foo.fq.gz")]
# }),
# #(Element:0x3fd9818bd928 {
# name = "a",
# attributes = [
# #(Attr:0x3fd982035ef8 { name = "href", value = "http://bar" })],
# children = [ #(Text "bar.fq.gz")]
# })]
We can walk through those and extract just the href parameters:
hrefs = nodes.map{ |node| node['href'] }
Resulting in an array that can be iterated over:
hrefs
# => ["http://foo", "http://bar"]
You should be able to figure out the rest.
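As a hint for that remaining step, here is a minimal sketch of turning collected hrefs into absolute URLs and local file names. The base URL and the hrefs are hypothetical placeholders, not values from the real page, and the actual download (for example with open-uri) is deliberately left out so the snippet runs without network access.

```ruby
require 'uri'

# Hypothetical values for illustration; substitute the real page URL and
# the hrefs collected from the matching anchors.
base_url = 'http://example.com/samples/'
hrefs = ['files/SLX-7998.blabla.fq.gz', 'http://example.org/data/bar.fq.gz']

# Resolve relative hrefs against the page they came from;
# absolute hrefs pass through unchanged.
targets = hrefs.map { |href| URI.join(base_url, href) }

# Pair each absolute URL with the local file name it would be saved under.
downloads = targets.map { |uri| [uri.to_s, File.basename(uri.path)] }
downloads.each { |url, name| puts "#{url} -> #{name}" }
```

Each pair can then be handed to whatever download call you prefer, writing the response body to the file in binary mode.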
You sound like you could use Mechanize, which is a tool for automating interaction with web pages that uses Nokogiri as a dependency. You could probably do something like this:
require 'mechanize'
$agent = Mechanize.new
master_page = $agent.get("http://master_page")
# search returns a Nokogiri NodeSet, so iterate over it with each
master_page.search("a.download_list_link").each do |download_list_link|
  download_list_page = $agent.get(download_list_link[:href])
  download_list_page.search("td > a").each do |link|
    next unless link.content.include?("fq.gz")
    # Write in binary mode, one file per link, named after the link text
    File.binwrite(link.content.strip, $agent.get_file(link[:href]))
  end
end
Some things that I wrote there will depend on the specific names of elements on the pages you're visiting, but I think that the general idea there will solve your problem.
Edit:
Regarding the errors you're getting with an array of nil objects, one problem that I see is that you forgot to close the block:
master_page.links_with(:href => /ViewSample/).map {|link| link.click
...
# no terminating curly brace

Ruby: printing one attribute of user model for each user

I have a bunch of users in my database with these attributes, however, I only want the email address for each user
#<User id: 1, email: "email@yahoo.com", encrypted_password: "", reset_password_token: nil, reset_password_sent_at: nil, remember_created_at: nil, sign_in_count: 0, current_sign_in_at: nil, last_sign_in_at: nil, current_sign_in_ip: nil, last_sign_in_ip: nil, created_at: "2012-09-03 09:14:01", updated_at: "2012-09-03 09:14:01", name: nil, confirmation_token: nil, confirmed_at: nil, confirmation_sent_at: nil, unconfirmed_email: nil, opt_in: nil, invitation_token: nil, invitation_sent_at: nil, invitation_accepted_at: nil, invitation_limit: nil, invited_by_id: nil, invited_by_type: nil>,
In the console, I did
u = User.all
which printed all the users and their attributes.
Now, to get the email address for each, I tried
u.each do |f|
f.email
end
but it just printed the whole list of users again, with all their attributes.
Can anyone show me how to print a list of email addresses for all the users, leaving out the other attributes?
The console prints the return value of the last expression you typed.
So if you write u.each { anything }, the console prints the return value of each, which is the collection itself, regardless of what the block does. To print something explicitly, use an output method (puts, p, pp, print, etc.):
users = User.all
puts users.map(&:email).join("\n")
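The difference between each and map can be reproduced outside Rails. The sketch below stands in a plain Struct for the User model (an assumption made only so the snippet runs on its own) and uses made-up addresses.

```ruby
# Stand-in for the ActiveRecord model, for illustration only.
User = Struct.new(:id, :email)

users = [User.new(1, "a@example.com"), User.new(2, "b@example.com")]

# each ignores the block's value and returns the receiver,
# which is why the console showed the full user list again.
each_result = users.each { |user| user.email }

# map collects the block's value for every element.
emails = users.map(&:email)

puts each_result.equal?(users)   # prints true
puts emails                      # prints one address per line
```

puts called with an array prints each element on its own line, so the join("\n") in the answer above is optional.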

RSpec fails when matching two classes, even though the classes are _identical_

I have the following piece of code that checks if the returned video object is the same as the one I put in.
The video object is a Panda Stream object, and referring to their homepage, the Panda::Video.find should return exactly the object I put in with the Panda::Video.create method.
# screencast_spec.rb
before(:each) do
@screencast = Factory.build(:screencast)
@video = Panda::Video.create(:source_url => "http://panda-test-harness-videos.s3.amazonaws.com/panda.mp4")
end
it "should fetch the video with the #video method" do
@screencast.video = @video
@screencast.video.should == @video
end
# screencast.rb
# Returns the panda::video object
def video
Panda::Video.find(self.video_id)
end
# Sets the +video_id+ reference
def video= video_object_or_id
video_object_or_id = video_object_or_id.id unless video_object_or_id.is_a?(Integer)
self.video_id = video_object_or_id
end
The error that RSpec returns is this:
1) Screencast#video should fetch the video with the #video method
Failure/Error: @screencast.video.should == @video
expected: #<Panda::Video id: "dac9bf057c9b667f57096054a64625a1", created_at: "2012/01/29 18:04:07 +0000", updated_at: "2012/01/29 18:04:07 +0000", original_filename: "panda.mp4", source_url: "http://panda-test-harness-videos.s3.amazonaws.com/panda.mp4", duration: nil, height: nil, width: nil, extname: ".mp4", file_size: nil, video_bitrate: nil, audio_bitrate: nil, audio_codec: nil, video_codec: nil, fps: nil, audio_channels: nil, audio_sample_rate: nil, status: "processing", path: "dac9bf057c9b667f57096054a64625a1", cloud_id: "55372b1612963b045f27bb093fed0abb">
got: #<Panda::Video id: "dac9bf057c9b667f57096054a64625a1", created_at: "2012/01/29 18:04:07 +0000", updated_at: "2012/01/29 18:04:07 +0000", original_filename: "panda.mp4", source_url: "http://panda-test-harness-videos.s3.amazonaws.com/panda.mp4", duration: nil, height: nil, width: nil, extname: ".mp4", file_size: nil, video_bitrate: nil, audio_bitrate: nil, audio_codec: nil, video_codec: nil, fps: nil, audio_channels: nil, audio_sample_rate: nil, status: "processing", path: "dac9bf057c9b667f57096054a64625a1", cloud_id: "55372b1612963b045f27bb093fed0abb"> (using ==)
Diff:
# ./spec/models/screencast_spec.rb:26:in `block (3 levels) in <top (required)>'
Note the diff saying that there is no difference between the two objects, so why is the comparison failing?
If any of you have any pointers, or have tried testing the panda stream before I would appreciate all the guidance I can get
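Ruby provides three methods for testing equality: ==, eql?, and equal?. Unless a class redefines it, == (inherited from Object) compares by object identity, so two distinct objects are unequal even when all their attributes match. Classes can redefine ==, as String does to compare contents. If Panda::Video does not redefine it, the object returned by find will never be == to the one returned by create, which would explain the empty diff.
If you want to learn more on this topic, I recommend this blog post: http://www.skorks.com/2009/09/ruby-equality-and-object-comparison/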
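A small self-contained illustration of that behaviour follows. PlainVideo and ComparableVideo are hypothetical classes invented for this sketch; they are not part of the Panda gem, whose actual implementation is not shown in the question.

```ruby
# Hypothetical class that does not redefine ==, so it inherits
# identity comparison from Object.
class PlainVideo
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

# Hypothetical subclass that redefines == to compare by id.
class ComparableVideo < PlainVideo
  def ==(other)
    other.is_a?(ComparableVideo) && id == other.id
  end
end

a = PlainVideo.new("dac9")
b = PlainVideo.new("dac9")
puts a == b        # prints false: same data, different objects

c = ComparableVideo.new("dac9")
d = ComparableVideo.new("dac9")
puts c == d        # prints true: == was redefined to compare ids
puts c.equal?(d)   # prints false: equal? always checks identity
```

If the video class in the spec behaves like PlainVideo here, comparing a stable attribute such as the id of each object, rather than the objects themselves, would sidestep the problem.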
