How to get Feedjira to work with HTTPS feeds? - ruby

For instance, the following example code returns 0:
require 'feedjira'
feed_parsed = Feedjira::Feed.fetch_and_parse("https://news.yahoo.com/rss/topstories")
puts feed_parsed

Setting ssl_verify_peer to false lets the request succeed and the feed is fetched, although it also turns off verification of the server's certificate. For instance:
require 'feedjira'
feed_parsed = Feedjira::Feed.fetch_and_parse("https://news.yahoo.com/rss/topstories", {:ssl_verify_peer => false})
puts feed_parsed
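If turning verification off is not acceptable, one alternative is to fetch the XML yourself with certificate checks left on and hand the body to the parser. The following is only a minimal sketch using Ruby's standard library: verified_https_client is a hypothetical helper name, and passing the downloaded body to something like Feedjira.parse assumes a Feedjira version that exposes that method.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper: builds an HTTP client for the URL with peer
# verification explicitly enabled. No connection is opened here.
def verified_https_client(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http
end

feed_uri = URI.parse('https://news.yahoo.com/rss/topstories')
client = verified_https_client(feed_uri.to_s)

# The actual download would be a network call, left commented out here:
# xml = client.get(feed_uri.request_uri).body
# The XML string could then be given to the feed parser.
```

If the request fails with a certificate error in this setup, the likely culprit is the local CA bundle rather than the feed library.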

Related

Rejecting info from being stored in a file

I have a working program that searches Google using Mechanize. However, when it searches Google it also pulls in URLs that look like http://webcache.googleusercontent.com/.
I would like to keep those URLs from being stored in the file. All the sites' URLs are structured differently.
Source code:
require 'mechanize'
PATH = Dir.pwd
SEARCH = "test"
def info(input)
  puts "[INFO]#{input}"
end
def get_urls
  info("Searching for sites.")
  agent = Mechanize.new
  page = agent.get('http://www.google.com/')
  google_form = page.form('f')
  google_form.q = "#{SEARCH}"
  url = agent.submit(google_form, google_form.buttons.first)
  url.links.each do |link|
    if link.href.to_s =~ /url.q/
      str = link.href.to_s
      str_list = str.split(%r{=|&})
      urls_to_log = str_list[1]
      success("Site found: #{urls_to_log}")
      File.open("#{PATH}/temp/sites.txt", "a+") {|s| s.puts("#{urls_to_log}")}
    end
  end
  info("Sites dumped into #{PATH}/temp/sites.txt")
end
get_urls
Text file:
http://www.speedtest.net/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:M47_v0xF3m8J
http://www.speedtest.net/%252Btest%26gbv%3D1%26%26ct%3Dclnk
http://www.speedtest.net/results.php
http://www.speedtest.net/mobile/
http://www.speedtest.net/about.php
https://support.speedtest.net/
https://en.wikipedia.org/wiki/Test
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:R94CAo00wOYJ
https://en.wikipedia.org/wiki/Test%252Btest%26gbv%3D1%26%26ct%3Dclnk
https://www.test.com/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:S92tylTr1V8J
https://www.test.com/%252Btest%26gbv%3D1%26%26ct%3Dclnk
https://www.speakeasy.net/speedtest/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:sCEGhiP0qxEJ:https://www.speakeasy.net/speedtest/%252Btest%26gbv%3D1%26%26ct%3Dclnk
https://www.google.com/webmasters/tools/mobile-friendly/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:WBvZnqZfQukJ:https://www.google.com/webmasters/tools/mobile-friendly/%252Btest%26gbv%3D1%26%26ct%3Dclnk
http://www.humanmetrics.com/cgi-win/jtypes2.asp
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:w_lAt3mgXcoJ:http://www.humanmetrics.com/cgi-win/jtypes2.asp%252Btest%26gbv%3D1%26%26ct%3Dclnk
http://speedtest.xfinity.com/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:snNGJxOQROIJ:http://speedtest.xfinity.com/%252Btest%26gbv%3D1%26%26ct%3Dclnk
https://www.act.org/content/act/en/products-and-services/the-act/taking-the-test.html
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:1sMSoJBXydo
https://www.act.org/content/act/en/products-and-services/the-act/taking-the-test.html%252Btest%26gbv%3D1%26%26ct%3Dclnk
https://www.16personalities.com/free-personality-test
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:SQzntHUEffkJ
https://www.16personalities.com/free-personality-test%252Btest%26gbv%3D%26%26ct%3Dclnk
https://www.xamarin.com/test-cloud
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:ypEu7XAFM8QJ:
https://www.xamarin.com/test-cloud%252Btest%26gbv%3D1%26%26ct%3Dclnk
It works now. I had an issue with success('log'), so I commented it out (the posted code never defines a success method).
str_list = str.split(%r{=|&})
# skip the link when the host part of the URL is Google's cache
next if str_list[1].split('/')[2] == "webcache.googleusercontent.com"
urls_to_log = str_list[1]
# success("Site found: #{urls_to_log}")
File.open("#{PATH}/temp/sites.txt", "a+") {|s| s.puts("#{urls_to_log}")}
There are well-tested wheels for tearing URLs apart into their component parts, so use them. Ruby comes with URI, which makes it easy to extract the host, path or query:
require 'uri'
URL = 'http://foo.com/a/b/c?d=1'
URI.parse(URL).host
# => "foo.com"
URI.parse(URL).path
# => "/a/b/c"
URI.parse(URL).query
# => "d=1"
Ruby's Enumerable module includes reject and select which make it easy to loop over an array or enumerable object and reject or select elements from it:
(1..3).select{ |i| i.even? } # => [2]
(1..3).reject{ |i| i.even? } # => [1, 3]
Using all that you could check the host of a URL for sub-strings and reject any you don't want:
require 'uri'
%w[
http://www.speedtest.net/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:M47_v0xF3m8J
].reject{ |url| URI.parse(url).host[/googleusercontent\.com$/] }
# => ["http://www.speedtest.net/"]
Using these methods and techniques you can reject or select from an input file, or just peek into single URLs and choose to ignore or honor them.
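As a rough sketch of how that could look for a list of collected URLs: the helper name blocked_host? and the choice to drop lines that do not parse as URLs are assumptions made for illustration, not part of the original program.

```ruby
require 'uri'

# Hypothetical helper: true when the URL's host is the blocked domain
# or one of its subdomains. Unparseable input is treated as blocked
# (an assumption) so it never reaches the log file.
def blocked_host?(url, blocked_domain = 'googleusercontent.com')
  host = URI.parse(url).host
  return true if host.nil?
  host == blocked_domain || host.end_with?(".#{blocked_domain}")
rescue URI::InvalidURIError
  true
end

urls = [
  'http://www.speedtest.net/',
  'http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:M47_v0xF3m8J',
  'https://en.wikipedia.org/wiki/Test',
  'not a url'
]

# Keep only the URLs whose host is not blocked.
kept = urls.reject { |url| blocked_host?(url) }
puts kept
```

The same reject call works on File.readlines output if the URLs are already sitting in a text file, as long as each line is stripped of its trailing newline first.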

Hash/string gets escaped

This is my hyperresource client:
require 'rubygems'
require 'hyperresource'
require 'json'
api = HyperResource.new(root: 'http://127.0.0.1:9393/todos',
headers: {'Accept' => 'application/vnd.127.0.0.1:9393/todos.v1+hal+json'})
string = '{"todo":{"title":"test"}}'
hash = JSON.parse(string)
api.post(hash)
puts hash
The hash output is: {"todo"=>{"title"=>"test"}}
At my Sinatra with Roar API I have this post function:
post "/todos" do
  params.to_json
  puts params
  @todo = Todo.new(params[:todo])
  if @todo.save
    @todo.extend(TodoRepresenter)
    @todo.to_json
  else
    puts 'FAIL'
  end
end
My puts 'params' over here gets: {"{\"todo\":{\"title\":\"test\"}}"=>nil}
I found out, these are 'escaped strings' but I don't know where it goes wrong.
EDIT:
I checked my API with curl and the Postman Chrome extension; both work fine. It seems to be just HyperResource.
You are posting JSON, ergo you either need to register a Sinatra middleware that will automatically parse incoming JSON requests, or you need to do it yourself.
require 'rubygems'
require 'hyperresource'
require 'json'
api = HyperResource.new(root: 'http://127.0.0.1:9393/todos',
headers: {'Accept' => 'application/vnd.127.0.0.1:9393/todos.v1+hal+json'})
string = '{"todo":{"title":"test"}}'
hash = JSON.parse(string)
api.post({:data => hash})
puts hash
---
post "/todos" do
  # JSON.parse returns a hash with string keys
  payload = JSON.parse(params[:data])
  puts payload.inspect
  @todo = Todo.new(payload['todo'])
  if @todo.save
    @todo.extend(TodoRepresenter)
    @todo.to_json
  else
    puts 'FAIL'
  end
end
Should do what you need.
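A plausible reading of the escaped params in the question is that the raw JSON body was decoded as if it were form data. The sketch below only illustrates that idea with the standard library: URI.decode_www_form stands in for the form decoding a Rack application performs (an assumption, since the server's actual decoding path is not shown), and raw_body is the string from the question.

```ruby
require 'json'
require 'uri'

raw_body = '{"todo":{"title":"test"}}'

# Decoded as form data, the body contains no "=" or "&", so the whole
# JSON text becomes the name of a single parameter.
form_pairs = URI.decode_www_form(raw_body)
only_key = form_pairs.first.first

# Parsed as JSON, the same body yields the nested hash the handler wants.
parsed = JSON.parse(raw_body)
title = parsed['todo']['title']

puts only_key
puts title
```

This is why parsing the body as JSON on the server side, by hand or through a middleware, makes the nested todo hash available again.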

How to exit from async call when url timeout with ruby/curb

I am using Ruby curb to call multiple urls at once, e.g.
require 'rubygems'
require 'curb'
easy_options = {:follow_location => true}
multi_options = {:pipeline => true}
Curl::Multi.get(['http://www.example.com','http://www.trello.com','http://www.facebook.com','http://www.yahoo.com','http://www.msn.com'], easy_options, multi_options) do|easy|
# do something interesting with the easy response
puts easy.last_effective_url
end
The problem is that I want to cancel the remaining async calls as soon as any URL times out. Is that possible?
As far as I know the current API doesn't expose the Curl::Multi instance, since otherwise you could do:
stop_everything = proc { multi.cancel! }
multi = Curl::Multi.get(array_of_urls, on_failure: stop_everything)
The easiest way might be to patch the Curl::Multi.http to return the m variable.
See https://github.com/taf2/curb/blob/master/lib/curl/multi.rb#L85
I think this will do exactly what you ask for:
require 'rubygems'
require 'curb'
responses = {}
requests = ['http://www.example.com','http://www.trello.com','http://www.facebook.com','http://www.yahoo.com','http://www.msn.com']
m = Curl::Multi.new
requests.each do |url|
  responses[url] = ""
  c = Curl::Easy.new(url) do |curl|
    curl.follow_location = true
    curl.on_body {|data| responses[url] << data; data.size }
    curl.on_success {|easy| puts easy.last_effective_url }
    curl.on_failure {|easy| puts "ERROR:#{easy.last_effective_url}"; @should_stop = true}
  end
  m.add(c)
end
m.perform { m.cancel! if @should_stop }

Suggested Redis driver for use within Goliath?

There seem to be several options for establishing Redis connections for use within EventMachine, and I'm having a hard time understanding the core differences between them.
My goal is to use Redis within Goliath.
The way I establish my connection now is through em-synchrony:
require 'em-synchrony'
require 'em-synchrony/em-redis'
config['redis'] = EventMachine::Synchrony::ConnectionPool.new(:size => 20) do
EventMachine::Protocols::Redis.connect(:host => 'localhost', :port => 6379)
end
What is the difference between the above, and using something like em-hiredis?
If I'm using Redis for sets and basic key:value storage, is em-redis the best solution for my scenario?
We use em-hiredis very successfully inside Goliath. Here's a sample of how we coded publishing:
config/example_api.rb
# These give us direct access to the redis connection from within the API
config['redisUri'] = 'redis://localhost:6379/0'
config['redisPub'] ||= EM::Hiredis.connect('')
example_api.rb
class ExampleApi < Goliath::API
  use Goliath::Rack::Params # parse & merge query and body parameters
  use Goliath::Rack::Formatters::JSON # JSON output formatter
  use Goliath::Rack::Render # auto-negotiate response format
  def response(env)
    env.logger.debug "\n\n\nENV: #{env['PATH_INFO']}"
    env.logger.debug "REQUEST: Received"
    env.logger.debug "POST Action received: #{env.params} "
    # processing of requests from browser goes here
    resp =
      case env.params["action"]
      when 'SOME_ACTION' then process_action(env)
      when 'ANOTHER_ACTION' then process_another_action(env)
      else
        # skip
      end
    env.logger.debug "REQUEST: About to respond with: #{resp}"
    [200, {'Content-Type' => 'application/json', 'Access-Control-Allow-Origin' => "*"}, resp]
  end
  # process an action
  def process_action(env)
    # extract message data
    data = Hash.new
    data["user_id"], data["object_id"] = env.params['user_id'], env.params['object_id']
    publishData = { "action" => 'SOME_ACTION_RECEIVED',
                    "data" => data }
    redisPub.publish("Channel_1", Yajl::Encoder.encode(publishData))
    return data
  end
  # process another action
  def process_another_action(env)
    # extract message data
    data = Hash.new
    data["user_id"], data["widget_id"] = env.params['user_id'], env.params['widget_id']
    publishData = { "action" => 'SOME_OTHER_ACTION_RECEIVED',
                    "data" => data }
    redisPub.publish("Channel_1", Yajl::Encoder.encode(publishData))
    return data
  end
end
Handling subscriptions is left as an exercise for the reader.
What em-synchrony does is patch the em-redis gem so it can be used with fibers, which effectively allows it to run in Goliath.
Here is a project using Goliath + Redis that can guide you on how to make all this work: https://github.com/igrigorik/mneme
Here is an example with em-hiredis. Goliath wraps each request in a fiber, so one way to test it is:
require 'rubygems'
require 'bundler/setup'
require 'em-hiredis'
require 'em-synchrony'
EM::run do
  Fiber.new do
    ## this is what you can use in goliath
    redis = EM::Hiredis.connect
    p EM::Synchrony.sync redis.keys('*')
    ## end of goliath block
  end.resume
end
and the Gemfile I used:
source 'https://rubygems.org'
gem 'em-hiredis'
gem 'em-synchrony'
If you run this example you will get the list of defined keys in your redis database printed on screen.
Without the EM::Synchrony.sync call you would get a deferrable but here the fiber is suspended until the calls return and you get the result.

Ruby AMQP under Thin HTTP server

I'm running a simple Thin server that publishes messages to different queues. The code looks like this:
require "rubygems"
require "thin"
require "amqp"
require 'msgpack'
app = Proc.new do |env|
  params = Rack::Request.new(env).params
  command = params['command'].strip rescue "no command"
  number = params['number'].strip rescue "no number"
  p command
  p number
  AMQP.start do
    if command =~ /\A(create|c|r|register)\z/i
      MQ.queue("create").publish(number)
    elsif m = (/\A(Answer|a)\s?(\d+|\d+-\d+)\z/i.match(command))
      MQ.queue("answers").publish({:number => number,:answer => "answer" }.to_msgpack )
    end
  end
  [200, {'Content-Type' => "text/plain"} , command ]
end
Rack::Handler::Thin.run(app, :Port => 4001)
Now when I run the server, and do something like http://0.0.0.0:4001/command=r&number=123123123
I'm always getting duplicate outputs, something like :
"no command"
"no number"
"no command"
"no number"
The first question is why I'm getting duplicate requests. Does it have something to do with the browser? When I use curl I don't see the same behavior. The second question is why I can't get the params.
Any tips about the best implementation for such a server would be highly appreciated.
Thanks in advance.
The second request comes from the browser looking for the favicon.ico. You can inspect the requests by adding the following code in your handler:
params = Rack::Request.new(env).params
p env # add this line to see the request in your console window
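As for the empty params, one likely cause, judging only from the URL quoted in the question, is that it has no ? before command=..., so the whole string is treated as the path and there is no query string to parse. A small standard-library sketch of the difference (the second URL, with the ? added, is an assumption about what was intended):

```ruby
require 'uri'

# URL as written in the question: no "?" separating path and query.
without_q = URI.parse('http://0.0.0.0:4001/command=r&number=123123123')

# Same request with the "?" added (assumed to be the intended form).
with_q = URI.parse('http://0.0.0.0:4001/?command=r&number=123123123')

# Without the "?", everything after the host is path and the query is nil.
p without_q.path
p without_q.query

# With the "?", the query string decodes into the expected parameters.
decoded = URI.decode_www_form(with_q.query).to_h
p decoded
```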
Alternatively you could use Sinatra:
require "rubygems"
require "amqp"
require "msgpack"
require "sinatra"
get '/:command/:number' do
  command = params['command'].strip rescue "no command"
  number = params['number'].strip rescue "no number"
  p command
  p number
  AMQP.start do
    if command =~ /\A(create|c|r|register)\z/i
      MQ.queue("create").publish(number)
    elsif m = (/\A(Answer|a)\s?(\d+|\d+-\d+)\z/i.match(command))
      MQ.queue("answers").publish({:number => number,:answer => "answer" }.to_msgpack )
    end
  end
  return command
end
and then run ruby the_server.rb at the command line to start the http server.
