How to download specific links from a page with Nokogiri - ruby

I have a webpage with a list of names (which are regular links). When I click on one of the names on the first page, it opens another page that has a list of files for download as links. I want to download only the links that end with fq.gz, for all of the page 1 links.
To do this I have been trying to use Nokogiri:
require 'nokogiri'
require 'open-uri'
url = 'http://myURL/'
# URI.open replaces the old Kernel#open(url) form, which newer Rubies no longer support
doc = Nokogiri::HTML(URI.open(url))
puts doc.css('li')[2]['href']
doc.traverse do |el|
  [el[:src], el[:href]].grep(/\.fq\.gz$/i).map { |l| URI.join(url, l).to_s }.each do |link|
    File.open(File.basename(link), 'wb') { |f| f << URI.open(link, 'rb').read }
  end
end
However, I don't think this opens up each of the page 1 links to get the fq.gz ending files in the next level.
The format of the links I am interested in is:
<td>SLX-7998.blabla.fq.gz</td>
I tried using this code, which is heavily adapted from one of the answers below, but nothing gets downloaded and I get the array below:
master_page.links_with(:href => /ViewSample/).map {|link| link.click
link = agent.get(agent.page.uri.to_s)
if link.content.include?("fq.gz")
out_file = File.new("downloaded_file", "w")
out_file.puts(agent.get_file(link[:href]))
out_file.close
end
=> [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]

This is the basis for a quick search for anchors containing certain substrings in their link text:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<a href="http://foo">foo.fq.gz</a>
<a href="http://bar">bar.fq.gz</a>
<a href="http://baz">baz</a>
EOT
nodes = doc.search('a').select{ |node| node.text[/fq\.gz$/] }
At this point nodes is a NodeSet of Nodes that match the /fq\.gz$/ pattern in their text:
nodes
# => [#(Element:0x3fd9818bda2c {
# name = "a",
# attributes = [
# #(Attr:0x3fd982027060 { name = "href", value = "http://foo" })],
# children = [ #(Text "foo.fq.gz")]
# }),
# #(Element:0x3fd9818bd928 {
# name = "a",
# attributes = [
# #(Attr:0x3fd982035ef8 { name = "href", value = "http://bar" })],
# children = [ #(Text "bar.fq.gz")]
# })]
We can walk through those and extract just the href attributes:
hrefs = nodes.map{ |node| node['href'] }
Resulting in an array that can be iterated over:
hrefs
# => ["http://foo", "http://bar"]
You should be able to figure out the rest.
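Assuming the hrefs may be relative to the page they were scraped from, the rest is mostly resolving each one to an absolute URL and saving the response in binary mode (.fq.gz files are gzip data). A minimal sketch using only the standard library; the page URL and file paths below are made up for illustration:

```ruby
require 'uri'
require 'open-uri'

# Turn each (possibly relative) href into an absolute URL, resolved against
# the page it was scraped from, and pick a local filename from the URL path.
def download_targets(page_url, hrefs)
  hrefs.map do |href|
    absolute = URI.join(page_url, href)
    [absolute.to_s, File.basename(absolute.path)]
  end
end

# Fetch one file in binary mode and save it under the given name.
# Not called below, since it needs network access.
def download(url, filename)
  URI.open(url, 'rb') { |remote| File.binwrite(filename, remote.read) }
end

targets = download_targets('http://example.com/samples/ViewSample?id=1',
                           ['files/SLX-7998.a.fq.gz', '/data/SLX-7998.b.fq.gz'])
targets.each { |url, name| puts "#{url} -> #{name}" }
```

URI.join leaves an absolute href unchanged, so the same call works for both kinds of link; looping over targets and calling download(url, name) would then fetch every file.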

You sound like you could use Mechanize, which is a tool for automating interaction with web pages that uses Nokogiri as a dependency. You could probably do something like this:
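require 'mechanize'
agent = Mechanize.new
master_page = agent.get("http://master_page")
# search returns a NodeSet, so iterate over it with each
master_page.search("a.download_list_link").each do |download_list_link|
  # resolve against the master page, since the agent's current page changes as we go
  download_list_page = agent.get(master_page.uri.merge(download_list_link[:href]))
  download_list_page.search("td > a").each do |link|
    next unless link.content.include?("fq.gz")
    file_uri = download_list_page.uri.merge(link[:href])
    # write each file in binary mode under its own name
    File.binwrite(File.basename(file_uri.path), agent.get_file(file_uri))
  end
end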
Some things that I wrote there will depend on the specific names of elements on the pages you're visiting, but I think that the general idea there will solve your problem.
Edit:
Regarding the errors you're getting with an array of nil objects, one problem that I see is that you forgot to close the block:
master_page.links_with(:href => /ViewSample/).map {|link| link.click
...
# no terminating curly brace
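Separately from the missing brace, the array of nils is what map returns: it collects the value of the block's last expression for every link. In the posted block that expression is the if, which evaluates to nil when the link does not match, and even when it does match, the last statement run is out_file.close, which also returns nil. A small standard-library illustration (the file names are made up):

```ruby
names = ['a.txt', 'SLX-7998.x.fq.gz', 'b.txt']

# map keeps the value of the block's last expression for every element.
# An if without an else evaluates to nil when its condition is false.
mapped = names.map { |name| name if name.end_with?('fq.gz') }
p mapped   # prints [nil, "SLX-7998.x.fq.gz", nil]

# To act only on the matching items, filter first and iterate with each.
wanted = names.select { |name| name.end_with?('fq.gz') }
wanted.each { |name| puts "would download #{name}" }
p wanted   # prints ["SLX-7998.x.fq.gz"]
```

When the block is run only for its side effects (downloading), each is the clearer choice, and its return value is not a list of nils.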

Related

Ruby and RSpec - Test fails when expected output is the same as the method

I know I am missing something simple here. I want to write a test that checks whether an array of arrays has been output. The test keeps failing, but what the test expects is the same as what the method gives.
connect4.rb
class Board
attr_accessor :board
def make_and_print_board
grid = Array.new(6) { Array.new(6)}
p grid
end
end
connect4_spec.rb
require './lib/connect4'
RSpec.describe Board do
let (:new_board) {Board.new}
it "prints board" do
expect{new_board.make_and_print_board}.to output(
Array.new(6) { Array.new(6)}
).to_stdout
end
end
This is the error...
1) Board prints board
Failure/Error:
expect{new_board.make_and_print_board}.to output(
Array.new(6) { Array.new(6)}
).to_stdout
expected block to output [[nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil]] to stdout, but output "[nil, nil, nil, nil, nil, nil]\n[nil, nil, nil, nil, nil, nil]\n[nil, nil, nil, nil, nil, nil]\n[nil, nil, nil, nil, nil, nil]\n[nil, nil, nil, nil, nil, nil]\n[nil, nil, nil, nil, nil, nil]\n"
What am I missing here? Why isn't it passing? How can I get this test to pass?
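The correct way to write this test is to be explicit about your expectation. The output matcher compares against the exact String written to stdout, so give it the exact text you expect rather than an Array. p writes the inspected value followed by a newline, so write it this way: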
RSpec.describe Board do
  let(:new_board) { Board.new }
  it 'prints board' do
    p_output = "[[nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil]]\n"
    expect { new_board.make_and_print_board }.to output(p_output).to_stdout
  end
end
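If you care more about the internals, you might also add this spec. It works because p returns its argument, so the method returns the grid (which is still printed while the spec runs):
it 'returns a 6 x 6 2d array' do
  expect(new_board.make_and_print_board).to match_array(Array.new(6) { Array.new(6) })
end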
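RSpec's output(...).to_stdout matcher temporarily replaces $stdout with a StringIO and compares what was written against the expected value, which is why an Array can never match the captured String. A plain-Ruby sketch of the same capture (standard library only) shows the exact string p produces for the grid, and that p hands back its argument, which is what the second spec relies on:

```ruby
require 'stringio'

grid = Array.new(6) { Array.new(6) }

# Swap $stdout for a StringIO while p runs, then restore the real stdout.
captured = StringIO.new
original_stdout = $stdout
$stdout = captured
begin
  returned = p(grid)
ensure
  $stdout = original_stdout
end

# p wrote grid.inspect plus a newline, and returned grid itself.
puts captured.string == grid.inspect + "\n"   # prints true
puts returned.equal?(grid)                    # prints true
```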

Ruby: For a 2D array,why are nested while loops overwriting elements defined previously? [duplicate]

This question already has answers here:
Creating matrix with `Array.new(n, Array.new)`
(2 answers)
Closed 5 years ago.
Context: I'm trying to populate a 2D array with while loops, after which I want to try to do it with the {} block format. The point is to understand how these two syntax structures can do the same thing.
I have been reviewing this code and scouring the internet for the past hour, and I've decided that I'm simply not getting something, but I don't understand what that is.
The outcome should be
=> [["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"],
..(sequentially)..
["H1", "H2", "H3", "H4", "H5", "H6", "H7", "H8"]]
The code is as follows:
char = ('A'..'H').to_a
num = (1..8).to_a
arr = Array.new(8, Array.new(8))
x = 0
while x < 8
  y = 0
  while y < 8
    arr[x][y] = char[x] + num[y].to_s
    y += 1
  end
  x += 1
end
arr
Thank you in advance, I appreciate your patience and time.
####Edit####
The source of the confusion was a misunderstanding of references. Array.new(n, Array.new(n)) fills the outer array with n references to one and the same inner array, so a write through any row shows up in every row. This question is addressed directly here: Creating matrix with `Array.new(n, Array.new)`. Although I thought it was an issue with my while loops, the problem was indeed how I created the matrix.
Your code doesn't work because every row of arr refers to the same array. Ruby is pass-by-value, but all the values are references: https://stackoverflow.com/a/1872159/3759158
Have a look at the output
2.4.3 :087 > arr = Array.new(8,Array.new(8))
=> [[nil, nil, nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil, nil, nil], [nil, nil, nil, nil, nil, nil, nil, nil]]
2.4.3 :088 > arr[0][0] = 'B'
=> "B"
2.4.3 :089 > arr
=> [["B", nil, nil, nil, nil, nil, nil, nil], ["B", nil, nil, nil, nil, nil, nil, nil], ["B", nil, nil, nil, nil, nil, nil, nil], ["B", nil, nil, nil, nil, nil, nil, nil], ["B", nil, nil, nil, nil, nil, nil, nil], ["B", nil, nil, nil, nil, nil, nil, nil], ["B", nil, nil, nil, nil, nil, nil, nil], ["B", nil, nil, nil, nil, nil, nil, nil]]
This happens because variables (and array slots) hold references to objects, so two names can point at the same array. You can see the effect in a simple example:
a = []
b = a
b << 10
p a # => [10]
The very same thing is happening in your code: all eight rows are one array.
Instead of all that, try this:
('A'..'H').map{|alph| (1..8).map{|num| "#{alph}#{num}"}}
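If you want to keep the while loops, the only change needed is how the matrix is built: the block form Array.new(8) { Array.new(8) } runs the block once per row, so each row is a separate array. A sketch comparing both constructions, using the loops from the question unchanged:

```ruby
chars = ('A'..'H').to_a
nums = (1..8).to_a

# Shared rows: the second argument is one array object reused for every slot.
shared = Array.new(8, Array.new(8))
puts shared.map(&:object_id).uniq.size   # prints 1

# Distinct rows: the block builds a fresh inner array for each slot.
arr = Array.new(8) { Array.new(8) }
puts arr.map(&:object_id).uniq.size      # prints 8

x = 0
while x < 8
  y = 0
  while y < 8
    arr[x][y] = chars[x] + nums[y].to_s
    y += 1
  end
  x += 1
end

p arr.first   # prints ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]
p arr.last    # prints ["H1", "H2", "H3", "H4", "H5", "H6", "H7", "H8"]
```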

Display an icon only for certain input values in Squib

My card game (built with Squib) is based on a CSV file. In this file I have (among others) one column called main.
Here's the content of the column; as you can see, a lot of nils:
print data['main']
> [3, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, 0, 0, nil, nil, 0, nil, nil, nil, 0, nil, nil, 0, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, 1, nil, nil, nil, nil, nil, -1, nil, nil, nil, nil, 0, 0, 0, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]
Desired behavior
For each card (row):
If there is a value in this column, I would like to display the value, with an SVG as a background
If there is no value (nil), I would like to skip the text and the SVG altogether
What I tried
1 - I tried the following, but data['Sprint'] is an array, not a single value, so the != comparison does not work per card:
data = Squib.csv file: 'data.csv'
[...]
if data['Sprint'] != nil
  text str: data['Sprint3'], layout: 'sprint3'
  svg layout: 'block', file: 'svg\left_block.svg'
end
2 - So I attempted to iterate through the array with an each method, but this of course leads to displaying every element of the array on each card:
data['Main'].each do |n|
  if n != nil
    text str: n, layout: 'main'
    svg layout: 'block', file: 'svg\up_block.svg'
  end
end
My Ruby knowledge is here at its end. I have no idea how to display one element only when the value in the main column is not nil. Any idea? Thanks!
Squib's svg method works the way you want it to: if it gets nil for file, it won't do anything. So in your layout file just put:
block:
  file: 'svg\up_block.svg'
And then the svg call does NOT have that "file" option (that would set it for all cards no matter what).
When layout is nil, it'll default to the svg method's "file", which is nil, which happens to do nothing; that's what you want. So your data field could be something like ['block', nil, nil, nil], and the icon would only show up on the first card.
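Rather than typing that nil/'block' column by hand, it can be derived from the main column itself. A plain-Ruby sketch, assuming data['main'] holds one value or nil per card as in the printout in the question; the sample values and the block_layouts name are illustrative only:

```ruby
# Hypothetical sample of the 'main' column, one entry per card.
main = [3, nil, nil, 0, nil, -1]

# 'block' layout where main has a value, nil elsewhere.
# 0 counts as a value here, since only nil is replaced.
block_layouts = main.map { |value| value.nil? ? nil : 'block' }
p block_layouts   # prints ["block", nil, nil, "block", nil, "block"]

# In the deck script this array would then be passed per card, e.g.
#   svg layout: block_layouts
# (assumption: relies on Squib skipping the svg when the layout is nil,
# as described above).
```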
Check out some new articles I've written on the topic:
http://squib.readthedocs.org/en/latest/arrays.html
http://squib.readthedocs.org/en/latest/guides/getting-started/part_2_iconography.html
(work in progress, of course)
These can also be helpful: https://github.com/andymeneely/squib/blob/master/samples/ranges.rb
That if-statement reminds me how nanDeck would do it - not the same way that Squib works. You COULD do it with an "each", or a "select", but that's more complicated than it needs to be.
(full disclosure: I am the developer of Squib)

How can I set elements of a two dimensional array in ruby?

I tried:
1.9.3-p448 :046 > a=Array.new(7){Array.new(7)}
=> [[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil]]
1.9.3-p448 :047 > a[0,0]='a'
=> "a"
1.9.3-p448 :048 > a[0,1]='b'
=> "b"
1.9.3-p448 :049 > a[0,2]='c'
=> "c"
1.9.3-p448 :050 > a[1,0]='d'
=> "d"
1.9.3-p448 :051 > a[1,1]='e'
=> "e"
1.9.3-p448 :052 > a[1,2]='f'
=> "f"
and I got:
1.9.3-p448 :053 > a
=> ["c", "f", [nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil]]
but I wanted
1.9.3-p448 :053 > a
=> [["a", "b", "c", nil, nil, nil, nil], ["d", "e", "f", nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil],
[nil, nil, nil, nil, nil, nil, nil]]
In Ruby, as @Daniel points out, multidimensional array elements are accessed the same way as in, for example, C.
The notation you're attempting to use is from, for example, Pascal, but doesn't work the way you think it does in Ruby. What it does in Ruby is give a start index and a count.
So if you have:
a = ['a','b','c','d','e','f']
Then a[2,3] will be:
['c','d','e']
This is described in the Ruby Array class documentation. If you attempt to assign to it, Ruby will dynamically change the array accordingly. In the above example, if I do this:
a[2,3] = 'h'
Then a will become:
['a','b','h','f']
Or if I do this:
a[2,0] = 'j'
Ruby inserts a value at position 2 and now I get:
['a','b','j','h','f']
In other words, assigning a value to a[2,3] replaced the subarray of three values with whatever I assigned to it. In the case of a two-dimensional array, such as in the original example,
a[0,0] = 'a' # Inserts a new first row of array with value 'a'
a[0,1] = 'b' # Replaces the entire first row of array with 'b'
a[0,2] = 'c' # Replaces the entire first two rows of array with 'c'
a[1,0] = 'd' # Inserts a new second row of array with value 'd'
a[1,1] = 'e' # Replaces the entire second row of array with 'e'
a[1,2] = 'f' # Replaces the entire second and third rows of array with 'f'
Thus, you get the result that you see.
You're currently assigning letters to a range of the outer array. This is the syntax to reference the inner arrays:
a[0][0]='a'
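Putting the two answers together, a short sketch that fills the first two rows the way the question intended with a[row][col], and for contrast shows that the comma form a[start, length] = value changes the outer array instead:

```ruby
a = Array.new(7) { Array.new(7) }

# a[row] returns the inner array; [col]= then assigns into that row.
%w[a b c].each_with_index { |letter, col| a[0][col] = letter }
%w[d e f].each_with_index { |letter, col| a[1][col] = letter }

p a[0]   # prints ["a", "b", "c", nil, nil, nil, nil]
p a[1]   # prints ["d", "e", "f", nil, nil, nil, nil]

# By contrast, a zero-length slice assignment inserts into the OUTER array.
b = Array.new(7) { Array.new(7) }
b[0, 0] = 'a'   # inserts 'a' before the first row
p b.size        # prints 8
```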

ThinkingSphinx does not work in RSpec

I've read posts stating that this is due to transactional fixtures, but it looks like we're not using those. I've read other posts that suggest I do some funky stopping, restarting, and reindexing before each test run. That doesn't seem to help.
ThinkingSphinx runs just fine in the app, but in test it behaves very strangely. After creating a couple of Organization models in my spec, I run a sphinx search. It has the two records that I'm expecting, but also a lot of nils. That's annoying, but I add a :retry_stale => true and those go away, so I'm satisfied with this first query:
(rdb:1) p Organization.search '', :retry_stale => true
[#<Organization id: 1, ein: nil, name: "Allina", created_at: "2012-09-11 15:03:04", updated_at: "2012-09-11 15:03:04", parent_id: nil, main_phone: "7867685187x894", fax: nil, email: nil, url: nil, crm_id: 1, target_market: false, notes: nil, form990_notes: nil, mec_revenue: 750000.0, legacy_id: nil, teaching_program_type_id: nil, sponsorship_type_id: nil, is_member_council_teaching_hospitals: nil, has_teaching_program: nil, is_major_academic_medical_center: nil, does_participate_in_survey: nil, status: "modified", net_revenue: nil, gross_revenue: nil, source_name: nil, source_id: nil, import_concat_key: nil, bigtime_id: nil, is_non_hc: nil, is_client: nil, contact_record_id: nil, form990_legacy_id: nil, number_of_beds: nil, tax_exempt_status: nil, delta: true>, #<Organization id: 2, ein: nil, name: "HealthPartners", created_at: "2012-09-11 15:03:04", updated_at: "2012-09-11 15:03:04", parent_id: nil, main_phone: "3407862693x0935", fax: nil, email: nil, url: nil, crm_id: 1, target_market: false, notes: nil, form990_notes: nil, mec_revenue: nil, legacy_id: nil, teaching_program_type_id: nil, sponsorship_type_id: nil, is_member_council_teaching_hospitals: nil, has_teaching_program: nil, is_major_academic_medical_center: nil, does_participate_in_survey: nil, status: "modified", net_revenue: nil, gross_revenue: nil, source_name: nil, source_id: nil, import_concat_key: nil, bigtime_id: nil, is_non_hc: nil, is_client: nil, contact_record_id: nil, form990_legacy_id: nil, number_of_beds: nil, tax_exempt_status: nil, delta: true>]
But now if I try to search by name, which is most definitely indexed on my model, I don't get my record.
(rdb:1) p Organization.search 'Allina', :retry_stale => true
[]
I tried to modify the options to no avail, so I thought I'd just try searching for single letters in star mode. The results are disturbing.
(rdb:1) p Organization.search 'C', :star => true, :retry_stale => true
[#<Organization id: 1, ein: nil, name: "Allina", created_at: "2012-09-11 15:03:04", updated_at: "2012-09-11 15:03:04", parent_id: nil, main_phone: "7867685187x894", fax: nil, email: nil, url: nil, crm_id: 1, target_market: false, notes: nil, form990_notes: nil, mec_revenue: 750000.0, legacy_id: nil, teaching_program_type_id: nil, sponsorship_type_id: nil, is_member_council_teaching_hospitals: nil, has_teaching_program: nil, is_major_academic_medical_center: nil, does_participate_in_survey: nil, status: "modified", net_revenue: nil, gross_revenue: nil, source_name: nil, source_id: nil, import_concat_key: nil, bigtime_id: nil, is_non_hc: nil, is_client: nil, contact_record_id: nil, form990_legacy_id: nil, number_of_beds: nil, tax_exempt_status: nil, delta: true>]
There is clearly no letter 'C' in the name field. My model also indexes the status field, which is set to modified, and it indexes the state field of the address association. That value is 'CA', so I thought maybe ThinkingSphinx is finding that record with that index. So I thought I'd try to search for 'CA', and guess what happened:
(rdb:1) p Organization.search 'CA', :star => true, :retry_stale => true
[]
Nothing! What is with this wonky behavior? OK, you guys probably want to see some configs, spec helpers, and index definitions:
config/sphinx.yml
development:
  enable_star: 1
  min_infix_len: 1
test:
  enable_star: 1
  min_infix_len: 1
spec/spec_helper.rb
Spork.prefork do
  require 'headless'
  headless = Headless.new(:display => 99)
  headless.start
  at_exit do
    headless.destroy
  end
  ENV["RAILS_ENV"] ||= 'test'
  require File.expand_path("../../config/environment", __FILE__)
  require 'rspec/rails'
  require 'capybara/rspec'
  require 'database_cleaner'
  #require 'rspec/autorun'
  Dir[Rails.root.join("spec/support/**/*.rb")].each {|f| require f}
  RSpec.configure do |config|
    config.use_transactional_fixtures = false
    config.treat_symbols_as_metadata_keys_with_true_values = true
    config.infer_base_class_for_anonymous_controllers = false
    config.include Devise::TestHelpers, :type => :controller
    config.extend ControllerMacros, :type => :controller
    config.include Warden::Test::Helpers, :type => :request
    config.include FactoryGirl::Syntax::Methods
    config.before(:suite) do
      DatabaseCleaner.strategy = :truncation, {
        :except => %w(sponsorship_types teaching_program_types title_groups positions tax_exempt_organization_types)}
    end
    config.before(:each) do
      DatabaseCleaner.start
    end
    config.after(:each) do
      DatabaseCleaner.clean
    end
  end
end
Spork.each_run do
end
app/models/organization.rb
define_index do
  indexes :name, :sortable => true
  indexes address(:state), :as => :state
  indexes status
  set_property :delta => true
end
We wrote ThinkingSphinx tests in Test::Unit, and it worked pretty well.
Include the following in your spec/spec_helper.rb:
require 'thinking_sphinx/test'
ThinkingSphinx::Test.init
Also, in each test, call ThinkingSphinx::Test.start in the setup and ThinkingSphinx::Test.stop in the teardown.
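In RSpec, that per-test setup/teardown maps onto before and after hooks. A sketch of the spec_helper.rb fragment, using only the ThinkingSphinx::Test calls named above; it assumes the thinking-sphinx gem is loaded and only runs inside an RSpec suite:

```ruby
# spec/spec_helper.rb (fragment)
require 'thinking_sphinx/test'
ThinkingSphinx::Test.init

RSpec.configure do |config|
  # Start Sphinx before each example and stop it afterwards,
  # mirroring the Test::Unit setup/teardown described above.
  config.before(:each) { ThinkingSphinx::Test.start }
  config.after(:each) { ThinkingSphinx::Test.stop }
end
```

Starting and stopping the daemon around every example is slow; moving the hooks to :suite is a possible trade-off if the specs tolerate a shared Sphinx process.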
