HTTParty request returns 404 code - ruby

I'm sending an HTTP request with the HTTParty Ruby gem with the following code:
require 'httparty'
require 'pry'
page = HTTParty.get('http://www.cubuffs.com/')
binding.pry
You can verify that the URL is valid. When exploring the results with Pry, I get the following:
[1] pry(main)> page
=> nil
[2] pry(main)> page.code
=> 404
[3] pry(main)> page.response
=> #<Net::HTTPNotFound 404 Not Found readbody=true>
I'm pretty sure nothing is wrong with my code, because I can substitute other URLs and they work as expected. For some reason, URLs from this domain return a 404 code. Any ideas what is wrong here and how to fix it?

The owner of that site is checking the browser's User-Agent header and doesn't like the one that HTTParty is using. You can get the page by including a User-Agent header from a real browser. Here is the one from Chrome:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Modify your code as follows:
require 'httparty'
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
page = HTTParty.get('http://www.cubuffs.com/', headers: { 'User-Agent' => user_agent })
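To see exactly which header gets sent, the same idea can be expressed with Ruby's standard-library Net::HTTP. This is swapped in for HTTParty and is not HTTParty's API. It is a minimal sketch that assumes the site only inspects the User-Agent header. build_get and fetch_with_user_agent are hypothetical helper names.

```ruby
require 'net/http'
require 'uri'

# Desktop Chrome User-Agent string quoted in the answer above.
CHROME_UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 ' \
            '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'

# Hypothetical helper: build a GET request whose User-Agent replaces
# Net::HTTP's default value ("Ruby").
def build_get(url, user_agent = CHROME_UA)
  request = Net::HTTP::Get.new(URI(url))
  request['User-Agent'] = user_agent
  request
end

# Hypothetical helper: send the request and return the Net::HTTPResponse
# (for example Net::HTTPOK, or Net::HTTPNotFound if the header is rejected).
def fetch_with_user_agent(url, user_agent = CHROME_UA)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_get(url, user_agent))
  end
end
```

Calling build_get('http://www.cubuffs.com/')['User-Agent'] lets you inspect the header before anything goes over the network. fetch_with_user_agent('http://www.cubuffs.com/').code then returns the status code as a String. Without the explicit header, a Net::HTTP request identifies itself as "Ruby".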

Related

symfony5 - heroku - urls 404 not found

I am using Symfony 5, and have installed easyAdmin 3.1 bundle.
It works fine in my localhost using:
https://127.0.0.1:8000/admin
However, when I deploy to my Heroku account, I get an error when accessing /admin:
404 : The requested URL was not found on this server.
My Heroku app URL is:
https://symfony-aristos.herokuapp.com/admin
As a matter of fact, this happens with any URL other than /, so it's something with the redirects:
https://symfony-aristos.herokuapp.com/login
In the Heroku logs, all I see is:
2020-11-25T07:58:41.279377+00:00 app[web.1]: 10.11.131.158 - - [25/Nov/2020:07:58:41 +0000] "GET /admin HTTP/1.1" 404 196 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36"
Is there anything else missing on heroku side to get it working?
I hope this helps other people:
Symfony 4 and newer applications no longer contain a .htaccess or other configuration to allow rewriting of URLs in such a way that they do not require the index.php script in the path.
An easy fix is to require the apache-pack, which adds a public/.htaccess with the rewrite rules:
$ composer require symfony/apache-pack
$ git add composer.json composer.lock symfony.lock public/.htaccess
$ git commit -m "apache-pack"

Puppeteer Random user-agent args

I recently asked how to use random user agents from a .json file, but after I added a Puppeteer screen capture, it keeps showing headless Chrome, because I had copied the previous answer into the wrong place.
This is where the user agent is actually set in my code:
const browser = await puppeteer.launch({
headless: false,
args: ['--headless', '--disable-infobars', '--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36', '--no-sandbox', `--proxy-server=socks5://127.0.0.1:${port}`] : ['--no-sandbox'],
});
So how do I pick a random user agent from a list inside the arguments?
My previous question, whose random user agent code I put in the wrong place, is here: Puppeteer browser useragent list
But adding that code here doesn't work.
So after --user-agent= I want to call a "random" function, but how?
You can use the user-agents module.
Install it with npm install user-agents, then:
const UserAgent = require("user-agents");
const userAgent = new UserAgent({
deviceCategory: "desktop",
platform: "Linux x86_64",
});
Then, in the args array, use "--user-agent=" + userAgent.toString(),

Is there an update to open-uri that changes the way you call a User-Agent?

The book "Instant Nokogiri" and the Packt Hub Nokogiri page include a User-Agent example that spoofs a browser while crawling the New York Times website for the top story.
I am working through this book; the code is a little dated, so I updated it.
My version of the code is:
require 'open-uri'
require 'nokogiri'
require 'sinatra'
browser = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4)
AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1'
doc = Nokogiri::HTML(open('http://nytimes.com', browser))
nyt_headline = doc.at_css('h2 span').content
nyt_url = "http://nytimes.com" + doc.at_css('.css-16ugw5f a')[:href]
html = "<h1>Nokogiri News Service</h1>"
html += "<h2>Top Story: #{nyt_headline}</h2>"
get '/' do
html
end
I am running this through a terminal session on Mac OS and getting this error:
invalid access mode Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) (ArgumentError)
AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1 (URI::HTTP resource is read only.)
I don't believe I am attempting to 'write'. Not sure why a 'read only' error would block this from running. It was working before I added the User Agent info.
See OpenURI's open documentation:
URI.open("http://www.ruby-lang.org/en/",
"User-Agent" => "Ruby/#{RUBY_VERSION}",
"From" => "foo@bar.invalid",
"Referer" => "http://www.ruby-lang.org/") {|f|
# ...
}
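The header options must be a Hash, but you're passing a String. open-uri treats a String in that position as a file access mode (like "r"), which is why it raises "invalid access mode ... (URI::HTTP resource is read only.)".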
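Applied to the question's code, a minimal sketch might look like the following. It uses plain open-uri only, with Nokogiri and Sinatra left out so the snippet runs on its own. fetch_html is a hypothetical helper name. The header is passed as a Hash, and the User-Agent is kept on a single line because the question's literal spans two lines and would put a newline into the header value.

```ruby
require 'open-uri'

# Safari User-Agent from the question, joined into one line:
# an HTTP header value must not contain a newline.
BROWSER = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) ' \
          'AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1'

# Hypothetical helper: header fields go in a Hash (string keys), not a bare
# String, which open-uri would parse as an access mode. The block form
# returns the response body and closes the stream afterwards.
def fetch_html(url, user_agent = BROWSER)
  URI.open(url, 'User-Agent' => user_agent, &:read)
end
```

The result can then be handed to Nokogiri as in the question, e.g. Nokogiri::HTML(fetch_html('http://nytimes.com')). The sketch calls URI.open rather than bare open, because since Ruby 3.0 Kernel#open no longer opens URLs.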

bypass cloudflare protection with casperjs or phantomjs while using tor proxy

I use Tor with CasperJS through Tor's SOCKS proxy.
My OS: Windows 10 x64
My test.js:
var casper = require('casper').create({
verbose: true,
logLevel: 'error',
pageSettings: {
loadImages: false, // The WebPage instance used by Casper will
loadPlugins: false, // use these settings
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}
});
var caturl = ('http://www.test.com');
casper.start(caturl, function() {
this.echo(this.getTitle());
});
casper.run();
Result from my local machine:
casperjs test.js
This Is Page Title
When Tor is running (I'm sure it works fine, and I've tested the SOCKS proxy before), I run:
casperjs --proxy=127.0.0.1:9150 --proxy-type=socks5 test.js
Attention Required! | Cloudflare
As I see it, Cloudflare wants me to solve a reCAPTCHA before it opens the site.
But when I open the same link in Tor Browser, it loads normally without asking for a reCAPTCHA.
Why does CasperJS get a reCAPTCHA while Tor Browser (using the same proxy IP) doesn't? Is this related to the user agent, or something else?

Ember isn't looking for my API

So I'm running an Ember-CLI and Rails 5 API-only app. It works fine in development when I use the --proxy http://localhost:3000 flag, but now I am trying to deploy to Heroku.
I have two sides: recipme-ember and recipme-rails. Feel free to explore the repos:
amclelland/fancy_recipme_frontend
amclelland/fancy_recipme
So after some struggling I have both sides deployed, but the Ember app refuses to talk to the Rails app.
When I try to go to the Meal index I see this in the Heroku logs:
"GET /meals HTTP/1.1" 200 711 "http://recipme-ember.herokuapp.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/50.0.2661.102 Chrome/50.0.2661.102 Safari/537.36"
Looks like the Ember app is trying to get the /meals data from itself.
I have set API_URL env var for my Ember app to the Rails Heroku URL. Not sure if there's something else I need to set.
Thanks in advance!
You need to set the host property for your adapter:
// adapters/application.js
import DS from 'ember-data';
export default DS.RESTAdapter.extend({
host: 'http://yourapi.herokuapp.com',
});
If you need different hosts for development and production, you can set the value per environment in config/environment.js and read it in the adapter.
