Issue parsing web page data from twitter for dashing ruby app - ruby

I think my issue is the same as the one in Having problems with Ruby file from Dashing, which to date has no answer.
The full problem is that when I start dashing I get:
scheduler caught exception:
undefined method `[]' for nil:NilClass
/home/bhladmin/Shopify-dashing-e672d84/dashboard/jobs/twitter_user.rb:19:in `block in <top (required)>'
/usr/lib64/ruby/gems/1.9.1/gems/rufus-scheduler-2.0.23/lib/rufus/sc/jobs.rb:230:in `call'
/usr/lib64/ruby/gems/1.9.1/gems/rufus-scheduler-2.0.23/lib/rufus/sc/jobs.rb:230:in `trigger_block'
/usr/lib64/ruby/gems/1.9.1/gems/rufus-scheduler-2.0.23/lib/rufus/sc/jobs.rb:204:in `block in trigger'
/usr/lib64/ruby/gems/1.9.1/gems/rufus-scheduler-2.0.23/lib/rufus/sc/scheduler.rb:430:in `call'
/usr/lib64/ruby/gems/1.9.1/gems/rufus-scheduler-2.0.23/lib/rufus/sc/scheduler.rb:430:in `block in trigger_job'
Something isn't right on line 19, but I can't work out what...
The full section of code is below...
#!/usr/bin/env ruby
require 'net/http'

# Track publicly available information of a twitter user, like followers,
# following and tweet count, by scraping the user profile page.

# Config
# ------
twitter_username = ENV['TWITTER_USERNAME'] || 'foobugs'

SCHEDULER.every '2m', :first_in => 0 do |job|
  http = Net::HTTP.new("twitter.com", Net::HTTP.https_default_port())
  http.use_ssl = true
  response = http.request(Net::HTTP::Get.new("/#{twitter_username}"))
  if response.code != "200"
    puts "twitter communication error (status-code: #{response.code})\n#{response.body}"
  else
    tweets = /profile["']>[\n\t\s]*<strong>([\d.,]+)/.match(response.body)[1].delete('.,').to_i
    following = /following["']>[\n\t\s]*<strong>([\d.,]+)/.match(response.body)[1].delete('.,').to_i
    followers = /followers["']>[\n\t\s]*<strong>([\d.,]+)/.match(response.body)[1].delete('.,').to_i
    send_event('twitter_user_tweets', current: tweets)
    send_event('twitter_user_followers', current: followers)
    send_event('twitter_user_following', current: following)
  end
end
From the previous question it looks like the way of extracting the data from the web page is the problem, but I don't know Ruby well enough to see it. I've tried removing the ENV['TWITTER_USERNAME'] section to make sure the username I'm using (not the one above) is picked up. If I dump out the raw HTML it contains the info I'm searching for, so I know that part is working.
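The error itself means one of the regexes found nothing: when match fails it returns nil, and calling [1] on nil raises undefined method `[]' for nil:NilClass. A guarded version of the failing line would look like this sketch (same pattern, just checked before indexing):
if (m = /profile["']>[\n\t\s]*<strong>([\d.,]+)/.match(response.body))
  tweets = m[1].delete('.,').to_i
else
  puts "tweet-count pattern not found in profile page"
end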

I think I've solved this myself by going about it a different way: I've changed the code to use the Twitter API rather than page scraping. Details below. The auth checking and timeout handling aren't great, so if anyone has hints on making them better they'd be welcome...
#### Get your twitter keys & secrets:
#### https://dev.twitter.com/docs/auth/tokens-devtwittercom
Twitter.configure do |config|
  config.consumer_key = 'YOUR_CONSUMER_KEY'
  config.consumer_secret = 'YOUR_CONSUMER_SECRET'
  config.oauth_token = 'YOUR_OAUTH_TOKEN'
  config.oauth_token_secret = 'YOUR_OAUTH_SECRET'
end

twitter_username = 'foobugs'
MAX_USER_ATTEMPTS = 10
user_attempts = 0

SCHEDULER.every '10m', :first_in => 0 do |job|
  begin
    tw_user = Twitter.user(twitter_username)
    if tw_user
      tweets = tw_user.statuses_count
      followers = tw_user.followers_count
      following = tw_user.friends_count
      send_event('twitter_user_tweets', current: tweets)
      send_event('twitter_user_followers', current: followers)
      send_event('twitter_user_following', current: following)
    end
  rescue Twitter::Error => e
    user_attempts += 1
    puts "Twitter error #{e}"
    puts "\e[33mFor the twitter_user widget to work, you need to put in your twitter API keys in the jobs/twitter_user.rb file.\e[0m"
    sleep 5
    retry if user_attempts < MAX_USER_ATTEMPTS
  end
end

Related

ZAP automation :undefined method `[]' for nil:NilClass (NoMethodError)
I am getting the above error while trying to get the response from ZAP using Ruby. Below is my code:
Then(/^I should be able to see security warnings$/) do
  # Get response via RestClient framework method.
  begin
    response = JSON.parse RestClient.get "http://#{$zap_proxy}:#{$zap_proxy_port}/json/core/view/alerts"
  rescue RestClient::ServerBrokeConnection
    # Classify the alerts
    events = response['alerts']
    high_risks = events.select { |x| x['risk'] == 'High' }
    high_count = high_risks.size
    medium_count = events.select { |x| x['risk'] == 'Medium' }.size
    low_count = events.select { |x| x['risk'] == 'Low' }.size
    informational_count = events.select { |x| x['risk'] == 'Informational' }.size
  end
  # Check high alert count and print them
  if high_count > 0
    high_risks.each { |x| p x['alert'] }
  end
  # Expect high alert count equal to 0
  expect(high_count).to eq 0
  # Print alerts with risk levels
  site = Capybara.app_host
  response = JSON.parse RestClient.get "http://#{$zap_proxy}:#{$zap_proxy_port}/json/core/view/alerts",
                                       params: { zapapiformat: 'JSON', baseurl: site }
  response['alerts'].each { |x| p "#{x['alert']} risk level: #{x['risk']}" }
end
Can someone please help me? My intention is to print the security alerts and display them on my command prompt.
I think you have a nil value in events and you are trying to read x['...'] from nil.
It would help to have a little more detail, including the failing line.
Edit:
Try events = response['alerts'].reject { |x| x.nil? }
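For what it's worth, here is a minimal sketch of that fetch-and-classify step with the parsing moved out of the rescue clause (same hypothetical $zap_proxy settings as in the question; in the original, events is only assigned when RestClient::ServerBrokeConnection is raised, so on the normal path it stays nil):
require 'json'
require 'rest-client'

# Fetch and parse on the normal path, not inside a rescue clause.
response = JSON.parse RestClient.get "http://#{$zap_proxy}:#{$zap_proxy_port}/json/core/view/alerts"

# Guard against a missing key and drop any nil entries before classifying.
events = (response['alerts'] || []).compact
high_risks = events.select { |x| x['risk'] == 'High' }
puts "high alerts: #{high_risks.size}"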

In RoR, how do I catch an exception if I get no response from a server?

I’m using Rails 4.2.3 and Nokogiri to get data from a web site. I want to perform an action when I don’t get any response from the server, so I have:
begin
  content = open(url).read
  if content.lstrip[0] == '<'
    doc = Nokogiri::HTML(content)
  else
    begin
      json = JSON.parse(content)
    rescue JSON::ParserError => e
      content
    end
  end
rescue Net::OpenTimeout => e
  attempts = attempts + 1
  if attempts <= max_attempts
    sleep(3)
    retry
  end
end
Note that this is different than getting a 500 from the server. I only want to retry when I get no response at all, either because I get no TCP connection or because the server fails to respond (or some other reason that causes me not to get any response). Is there a more generic way to take account of this situation other than how I have it? I feel like there are a lot of other exception types I’m not thinking of.
This is a generic sample of how you can define timeout durations for the HTTP connection and perform several retries in case of an error while fetching content (edited):
require 'open-uri'
require 'nokogiri'

url = "http://localhost:3000/r503"

openuri_params = {
  # set timeout durations for the HTTP connection
  # (the default value for open_timeout and read_timeout is 60 seconds)
  :open_timeout => 1,
  :read_timeout => 1,
}

attempt_count = 0
max_attempts = 3
begin
  attempt_count += 1
  puts "attempt ##{attempt_count}"
  content = open(url, openuri_params).read
rescue OpenURI::HTTPError => e
  # it's a 404, etc. (do nothing)
rescue SocketError, Net::ReadTimeout => e
  # the server can't be reached or doesn't send any response
  puts "error: #{e}"
  sleep 3
  retry if attempt_count < max_attempts
else
  # connection was successful and content is fetched,
  # so here we can parse content with Nokogiri,
  # or call a helper method, etc.
  doc = Nokogiri::HTML(content)
  p doc
end
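One thing worth noting about the sample above: the else branch of a begin/rescue/else block runs only when no exception was raised, so the Nokogiri parsing stays outside the protected region and its own errors are not mistaken for network failures.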
When it comes to rescuing exceptions, you should aim to have a clear understanding of:
Which lines in your system can raise exceptions
What is going on under the hood when those lines of code run
What specific exceptions could be raised by the underlying code
In your code, the line that's fetching the content is also the one that could see network errors:
content = open(url).read
If you go to the documentation for the OpenURI module you'll see that it uses Net::HTTP & friends to get the content of arbitrary URIs.
Figuring out what Net::HTTP can raise is actually very complicated but, thankfully, others have already done this work for you. Thoughtbot's suspenders project has lists of common network errors that you can use. Notice that some of those errors have to do with different network conditions than what you had in mind, like the connection being reset. I think it's worth rescuing those as well, but feel free to trim the list down to your specific needs.
So here's what your code should look like (skipping the Nokogiri and JSON parts to simplify things a bit):
require 'net/http'
require 'open-uri'

HTTP_ERRORS = [
  EOFError,
  Errno::ECONNRESET,
  Errno::EINVAL,
  Net::HTTPBadResponse,
  Net::HTTPHeaderSyntaxError,
  Net::ProtocolError,
  Timeout::Error,
]

MAX_RETRIES = 3
attempts = 0

begin
  content = open(url).read
rescue *HTTP_ERRORS => e
  if attempts < MAX_RETRIES
    attempts += 1
    sleep(2)
    retry
  else
    raise e
  end
end
I would think about using Timeout, which raises an exception after a short period:
require 'timeout'

MAX_RESPONSE_TIME = 2 # seconds
max_attempts = 3
attempts = 0

begin
  content = nil # needs to be defined before the following block
  Timeout.timeout(MAX_RESPONSE_TIME) do
    content = open(url).read
  end
  # parsing `content`
rescue Timeout::Error => e
  attempts += 1
  if attempts <= max_attempts
    sleep(3)
    retry
  end
end

ruby net/http `read_body': Net::HTTPOK#read_body called twice (IOError)

I'm getting read_body called twice (IOError) using the net/http library. I'm trying to download files and use HTTP sessions efficiently, and I'm looking for some help or advice to fix my issue. From my debug message it appears that when I log the response, readbody=true. Is that why read_body is called twice when I try to write the large file in chunks?
D, [2015-04-12T21:17:46.954928 #24741] DEBUG -- : #<Net::HTTPOK 200 OK readbody=true>
I, [2015-04-12T21:17:46.955060 #24741] INFO -- : file found at http://hidden:8080/job/project/1/maven-repository/repository/org/project/service/1/service-1.zip.md5
/usr/lib/ruby/2.2.0/net/http/response.rb:195:in `read_body': Net::HTTPOK#read_body called twice (IOError)
from ./deploy_application.rb:36:in `block in get_file'
from ./deploy_application.rb:35:in `open'
from ./deploy_application.rb:35:in `get_file'
from ./deploy_application.rb:59:in `block in <main>'
from ./deploy_application.rb:58:in `each'
from ./deploy_application.rb:58:in `<main>'
require 'net/http'
require 'logger'

STAMP = Time.now.utc.to_i
@log = Logger.new(STDOUT)

# project, build, service remove variables above
project = "project"
build = "1"
service = "service"
version = "1"

BASE_URI = URI("http://hidden:8080/job/#{project}/#{build}/maven-repository/repository/org/#{service}/#{version}/")

# file pattern for application is zip / jar. Hopefully the lib in the zipfile is acceptable.
# example for module download /#{service}/#{version}.zip /#{service}/#{version}.zip.md5 /#{service}/#{version}.jar /#{service}/#{version}.jar.md5

def clean_exit(code)
  # remove temp files on exit
end

def get_file(file)
  puts BASE_URI
  uri = URI.join(BASE_URI, file)
  @log.debug(uri)
  request = Net::HTTP::Get.new uri #.request_uri
  @log.debug(request)
  response = @http.request request
  @log.debug(response)
  case response
  when Net::HTTPOK
    size = 0
    progress = 0
    total = response.header["Content-Length"].to_i
    @log.info("file found at #{uri}")
    # need to handle file open error
    Dir.mkdir "/tmp/#{STAMP}"
    File.open "/tmp/#{STAMP}/#{file}", 'wb' do |io|
      response.read_body do |chunk|
        size += chunk.size
        new_progress = (size * 100) / total
        unless new_progress == progress
          @log.info("\rDownloading %s (%3d%%) " % [file, new_progress])
        end
        progress = new_progress
        io.write chunk
      end
    end
  when 404
    @log.error("maven repository file #{uri} not found")
    exit 4
  when 500...600
    @log.error("error getting #{uri}, server returned #{response.code}")
    exit 5
  else
    @log.error("unknown http response code #{response.code}")
  end
end

@http = Net::HTTP.new(BASE_URI.host, BASE_URI.port)
files = [ "#{service}-#{version}.zip.md5", "#{service}-#{version}.jar", "#{service}-#{version}.jar.md5" ].each do |file| #"#{service}-#{version}.zip",
  get_file(file)
end
Edit: Revised answer!
Net::HTTP#request, when called without a block, will pre-emptively read the body. The documentation isn't clear about this, but it hints at it by suggesting that the body is not read if a block is passed.
If you want to make the request without reading the body, you'll need to pass a block to the request call, and then read the body from within that. That is, you want something like this:
@http.request request do |response|
  # ...
  response.read_body do |chunk|
    # ...
  end
end
This is made clear in the implementation; Response#reading_body will first yield the unread response to a block if given (from #transport_request, which is called from #request), then read the body unconditionally. The block parameter to #request gives you that chance to intercept the response before the body is read.
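To make the pattern concrete, here is a small self-contained sketch of a streaming download (the URL and output path are made up for illustration): the block form of #request yields the response before the body is consumed, so read_body runs exactly once.
require 'net/http'

# Hypothetical URL and destination path, for illustration only.
uri = URI("http://example.com/files/archive.zip")

Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new(uri)
  # The block form yields the response before its body has been read...
  http.request(request) do |response|
    File.open("/tmp/archive.zip", "wb") do |io|
      # ...so this is the first and only call to read_body.
      response.read_body { |chunk| io.write(chunk) }
    end
  end
end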

Having problems with Ruby file from Dashing

I am having trouble with twitter_user.rb, which is supposed to get the number of tweets, followers, and following of a given Twitter username.
I assume that I am supposed to replace TWITTER_USERNAME in line 9 with the Twitter username that I am interested in. I did that and started dashing but I got:
scheduler caught exception:
undefined method '[]' for nil:NilClass
/.../jobs/twitter_user.rb:19:in 'block in <top (required)>'
It looks like the problem is with line 19 which is:
tweets = /profile["']>[\n\t\s]*<strong>([\d.,]+)/.match(response.body)[1].delete('.,').to_i
Can anybody tell me what is going on and how to fix it?
Your assumption is incorrect. The program is looking for an environment variable called TWITTER_USERNAME that is set to the relevant user name. If that variable doesn't exist then the code uses foobugs instead.
If you would rather modify the code than set up an environment variable, then change
twitter_username = ENV['TWITTER_USERNAME'] || 'foobugs'
to
twitter_username = 'myusername'
This is untested code, but it gives a general idea of how it should have been written. If you clone the source on the original page you can adjust it for your own purposes (i.e. fix it):
require 'nokogiri'
doc = Nokogiri::XML(content)
tweets = doc.at('profile strong').text.delete('.,').to_i
following = doc.at('following strong').text.delete('.,').to_i
followers = doc.at('followers strong').text.delete('.,').to_i
The above three lines can be reduced to something like:
tweets, following, followers = %w[profile following followers].map { |tag|
  doc.at("#{tag} strong").text.delete(',.').to_i
}
Again, without a usable sample of the XML/HTML I can't do much more, but as a practice we (programmers) shouldn't use regular expressions to try to parse XML or HTML. It's much too easy to break a pattern with either of those types of files.
I managed to solve the same issue for myself by using the Twitter API instead to pull out the relevant information. It seems the web page had changed too much for the scraping to work, and it could also stop working again without notice, as various people have already said...
This is the solution I used.
#### Get your twitter keys & secrets:
#### https://dev.twitter.com/docs/auth/tokens-devtwittercom
Twitter.configure do |config|
  config.consumer_key = 'YOUR_CONSUMER_KEY'
  config.consumer_secret = 'YOUR_CONSUMER_SECRET'
  config.oauth_token = 'YOUR_OAUTH_TOKEN'
  config.oauth_token_secret = 'YOUR_OAUTH_SECRET'
end

twitter_username = 'foobugs'
MAX_USER_ATTEMPTS = 10
user_attempts = 0

SCHEDULER.every '10m', :first_in => 0 do |job|
  begin
    tw_user = Twitter.user(twitter_username)
    if tw_user
      tweets = tw_user.statuses_count
      followers = tw_user.followers_count
      following = tw_user.friends_count
      send_event('twitter_user_tweets', current: tweets)
      send_event('twitter_user_followers', current: followers)
      send_event('twitter_user_following', current: following)
    end
  rescue Twitter::Error => e
    user_attempts += 1
    puts "Twitter error #{e}"
    puts "\e[33mFor the twitter_user widget to work, you need to put in your twitter API keys in the jobs/twitter_user.rb file.\e[0m"
    sleep 5
    retry if user_attempts < MAX_USER_ATTEMPTS
  end
end
I resolved this by substituting this line:
followers = /<strong>([\d.]+)<\/strong> Follower/.match(response.body)[0].delete('.,').to_i
with these two:
followers_count_metadata = /followers_count":[\d]+/.match(response.body)
followers = /[\d]+/.match(followers_count_metadata.to_s).to_s
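Those two steps can also be collapsed into one (a sketch, assuming the page metadata still embeds a followers_count JSON field as above): String#[] with a regexp and a capture-group index returns just the digits, ready for to_i.
followers = response.body[/followers_count":(\d+)/, 1].to_i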

tailable cursor in mongo db timing out

I am trying to create an oplog watcher in Ruby. So far I've come up with the small script below.
require 'rubygems'
require 'mongo'

db = Mongo::Connection.new("localhost", 5151).db("local")
coll = db.collection('oplog.$main')

loop do
  cursor = Mongo::Cursor.new(coll, :tailable => true)
  while not cursor.closed?
    if doc = cursor.next_document
      puts doc
    else
      sleep 1
    end
  end
end
The problem with this is that after 5 or 6 seconds, when it has spat out a lot of data, it times out and I get an error:
C:/RailsInstaller/Ruby1.8.7/lib/ruby/gems/1.8/gems/mongo-1.4.0/lib/../lib/mongo/connection.rb:807:in `check_response_flags': Query response returned CURSOR_NOT_FOUND. Either an invalid cursor was specified, or the cursor may have timed out on the server. (Mongo::OperationFailure)
from C:/RailsInstaller/Ruby1.8.7/lib/ruby/gems/1.8/gems/mongo-1.4.0/lib/../lib/mongo/connection.rb:800:in `receive_response_header'
from C:/RailsInstaller/Ruby1.8.7/lib/ruby/gems/1.8/gems/mongo-1.4.0/lib/../lib/mongo/connection.rb:768:in `receive'
from C:/RailsInstaller/Ruby1.8.7/lib/ruby/gems/1.8/gems/mongo-1.4.0/lib/../lib/mongo/connection.rb:493:in `receive_message'
from C:/RailsInstaller/Ruby1.8.7/lib/ruby/gems/1.8/gems/mongo-1.4.0/lib/../lib/mongo/connection.rb:491:in `synchronize'
from C:/RailsInstaller/Ruby1.8.7/lib/ruby/gems/1.8/gems/mongo-1.4.0/lib/../lib/mongo/connection.rb:491:in `receive_message'
from C:/RailsInstaller/Ruby1.8.7/lib/ruby/gems/1.8/gems/mongo-1.4.0/lib/../lib/mongo/cursor.rb:494:in `send_get_more'
from C:/RailsInstaller/Ruby1.8.7/lib/ruby/gems/1.8/gems/mongo-1.4.0/lib/../lib/mongo/cursor.rb:456:in `refresh'
from C:/RailsInstaller/Ruby1.8.7/lib/ruby/gems/1.8/gems/mongo-1.4.0/lib/../lib/mongo/cursor.rb:124:in `next_document'
from n.rb:7
from n.rb:6:in `loop'
from n.rb:6
What I don't understand is, when I'm able to see the actual data, how can it suddenly say the cursor is not found? I'm pretty new to Ruby, and any ideas on what direction I should take would be useful to me.
The solution is to have an exception handling mechanism to capture the exception that is thrown when the cursor reads the last document in a relatively small oplog with a higher number of writes per second. When the cursor reaches the end of the oplog, it throws an exception saying there are no more records.
require 'rubygems'
require 'mongo'

db = Mongo::Connection.new("localhost", 5151).db("local")
coll = db.collection('oplog.$main')

loop do
  cursor = Mongo::Cursor.new(coll, :timeout => false, :tailable => true)
  while not cursor.closed?
    begin
      if doc = cursor.next_document
        puts "Timestamp"
        puts doc["ts"]
        puts "Record"
        puts doc["o"]
        puts "Affected Collection"
        puts doc["ns"]
      end
    rescue
      puts ""
      break
    end
  end
end
This now works as the exception is being handled. Thanks to the mongodb-user Google group for pointing this out to me.
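A possible refinement, sketched against the same legacy mongo 1.x driver API used above: the bare rescue swallows every StandardError, so rescuing Mongo::OperationFailure specifically and re-creating the tailable cursor keeps unrelated bugs visible and lets the watcher run indefinitely.
require 'rubygems'
require 'mongo'

db = Mongo::Connection.new("localhost", 5151).db("local")
coll = db.collection('oplog.$main')

cursor = Mongo::Cursor.new(coll, :timeout => false, :tailable => true)
loop do
  begin
    if doc = cursor.next_document
      puts doc["ts"]
    else
      sleep 1 # nothing new in the oplog yet
    end
  rescue Mongo::OperationFailure
    # The tailable cursor died (for example it fell off the end of the
    # oplog), so open a fresh one and carry on instead of breaking out.
    cursor = Mongo::Cursor.new(coll, :timeout => false, :tailable => true)
  end
end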
