Scrapy referring back to original page instead of next page - xpath

I am having trouble using Scrapy to follow a "next page" link. According to the log, it is referring back to itself instead of the "next page" URL. Here is the code:
import scrapy
class QuotesSpider(scrapy.Spider):
    name = "quotes2"
    start_urls = [
        'http://search.jeffersondeeds.com/pdetail.php?instnum=2016230701&year=2016&db=0&cnum=20',
    ]
    def parse(self, response):
        for quote in response.xpath('//div'):
            yield {
                'record': quote.select(".//span/text()").extract()
            }
        next_page = response.xpath('//*[@id="nextpage"]/a/@href').extract()
        if next_page is not None:
            print("GOOOO BUCKS!!")
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
        else:
            print("Ahhh fooey!")
The XPath looks to be correct, but the URL being captured as next_page is the original URL (from start_urls).

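next_page isn't None; it is an empty list, because extract() always returns a list and this XPath matches nothing. Joining an empty value onto the response URL just gives the response URL back, which is why the request points at the original page.
The "next page" link is generated by JavaScript inside '//table//script/text()', so it is not present as an element in the HTML that Scrapy receives.
You can get it with: response.xpath('//table//script/text()').re_first("href=\\'(pdetail.*)\\'>")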
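A minimal sketch of that extraction step outside Scrapy, using only the standard library. The script text is a made-up stand-in for what the page's script tag might contain, and find_next_page is a hypothetical helper, not part of Scrapy. The pattern mirrors the one above but stops at the closing quote.

```python
import re
from urllib.parse import urljoin

BASE_URL = ("http://search.jeffersondeeds.com/pdetail.php"
            "?instnum=2016230701&year=2016&db=0&cnum=20")

# Assumption: illustrative script text only; the real page's markup may differ.
SCRIPT_TEXT = """document.write("<a href='pdetail.php?instnum=2016230702&year=2016&db=0&cnum=21'>Next</a>");"""


def find_next_page(script_text, base_url):
    """Return the absolute next-page URL found in script_text, or None."""
    match = re.search(r"href='(pdetail[^']*)'>", script_text)
    if match is None:
        return None
    # Resolve the relative link against the URL of the current page.
    return urljoin(base_url, match.group(1))


# An empty extraction result is a list, so an "is not None" check passes.
assert [] is not None
# Joining an empty value returns the base URL, hence the self-referring request.
assert urljoin(BASE_URL, "") == BASE_URL
```

Inside the spider, a truthiness check such as `if next_page:` would cover both None and an empty result.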

dpymenus, discord.py how to get the currently selected page in a menu

How would I get the currently open page in dpymenus, e.g. page 2 in this example?
from discord.ext import commands
from dpymenus import Page, PaginatedMenu
class Demo(commands.Cog):
    def __init__(self, client):
        self.client = client
    @commands.command()
    async def demo(self, ctx: commands.Context):
        page1 = Page(title='Page 1', description='First page test!')
        page1.add_field(name='Example A', value='Example B')
        page2 = Page(title='Page 2', description='Second page test!')
        page2.add_field(name='Example C', value='Example D')
        page3 = Page(title='Page 3', description='Third page test!')
        page3.add_field(name='Example E', value='Example F')
        menu = PaginatedMenu(ctx)
        menu.add_pages([page1, page2, page3])
        await menu.open()
def setup(client):
    client.add_cog(Demo(client))
Also, what should this be tagged with?
I have discovered the answer:
Use menu.page; it returns something like Page 1 Title.
You can use menu.page.index to get the index of the page item.
I found the answer in a Discord server. I was not able to find any info in https://dpymenus.readthedocs.io/en/latest/, so if you can find it there, please tell me.

How to get captcha img src with Ruby and Mechanize?

I'm trying to write a simple crawler that fills in two input fields. The page has an img element. In Chrome's developer tools I can see that the img has a src attribute, but after fetching the page with Mechanize the src attribute is gone. How do I get around this?
Code:
require 'mechanize'
agent = Mechanize.new
agent.user_agent_alias = 'Windows Chrome'
page = agent.get('https://ercdmd.ru/?gpay')
form = page.forms.first
form.gpay_abon = '00-0000000000'
captcha = page.at('#img_captcha')
pp captcha
Output:
#(Element:0x15e90ec {
name = "img",
attributes = [ #(Attr:0x15e8c14 { name = "id", value = "img_captcha" })]
})
My idea is to fetch an invoice through a Telegram bot query. Since there is a captcha, I thought I could read the captcha image src with Mechanize and send that image through Telegram. Then I would type in the digits I see in the image and send them back to Mechanize to fill in the second input field. But now I am stuck.
Is there another way to get the invoice from that source?
Looking at that page, the captcha URL would be:
captcha_url = "https://ercdmd.ru/captcha.php?time=#{Time.now.to_i}000"
Give that a try and see if it works.

Redirect user to the appropriate page from ASP classic login page

I have a website on which certain pages are secured with a login page written in vbscript (framework is asp classic). Given that the username is "foo" and the password is "bar", the login page currently accepts the username and password and then redirects the user to a default page. We will call it "page1". Below is the code:
Response.Buffer = True
If lcase(Request.Form("username")) = "foo" AND lcase(Request.Form("password")) = "bar" then
    Session.Contents("foo") = "1"
    Response.Redirect("page1.asp")
Else
    Response.Redirect("failure.asp")
End If
This works, but the user will always be redirected to the same page, regardless of which page they were trying to reach. I would like the user to be redirected to the page they were trying to reach before being sent to the login page. Supposing the user wanted to go to "page2", I tried the following code:
Response.Buffer = True
If Request.ServerVariables("URL")= "http://www.mysite.com.com/page2.asp" AND lcase(Request.Form("username")) = "foo" AND lcase(Request.Form("password")) = "bar" then
    Session.Contents("Dealer") = "1"
    Response.Redirect("page2.asp")
ElseIf lcase(Request.Form("username")) = "foo" AND lcase(Request.Form("password")) = "bar" then
    Session.Contents("foo") = "1"
    Response.Redirect("page1.asp")
Else
    Response.Redirect("failure.asp")
End If
This does not work, which is probably because
Request.ServerVariables("URL")
is pulling the url from the login page. Does anyone know how to send the user to the page that was originally requested?
Thanks in advance for any help/advice!
On your login form page add a hidden field for the referring page and populate the value like this:
<input name="referer" type="hidden" value="<%= Request.ServerVariables("HTTP_REFERER") %>" />
Then redirect to this page once the form is submitted successfully:
Response.Redirect(Request.Form("referer"))
I don't know Classic ASP or VBScript, but what about passing the desired page (the one requiring login) in a query string?
Send the user to login.aspx?redirectUrl=desiredPage.asp, and once login is done, redirect to the page retrieved from the query string.
This page explains it better.
ASP.NET: directing user to login page, after login send user back to page requested originally?
Every secured page should have some header code like this:
If Not Session("LoggedIn") Then
    Response.Redirect "login.asp?r=" & Server.UrlEncode(Request.ServerVariables("SCRIPT_NAME"))
End If
I typically put this into an include file called "private.asp" and make sure to include it at the top of every page that should be secured.
In your login page, after you've successfully logged in the user, check your querystring value to see if you should forward the user back to an originally requested page:
' After successful login...
strReturnURL = Request.QueryString("r")
If Len(strReturnURL) > 0 Then
    Response.Redirect strReturnURL
Else
    ' Send them to your homepage...
    Response.Redirect "/"
End If
Your sign-in page URL should look like this:
http://domain.com/login.asp?urlstr=page2.asp
Response.Buffer = True
dim redirecturl
redirecturl = Request("urlstr")
If lcase(Request.Form("username")) = "foo" AND lcase(Request.Form("password")) = "bar" and len(redirecturl)>0 then
    Session("Dealer") = "1"
    Response.Redirect(redirecturl)
ElseIf lcase(Request.Form("username")) = "foo" AND lcase(Request.Form("password")) = "bar" and len(redirecturl)=0 then
    Session("foo") = "1"
    Response.Redirect("login.asp")
Else
    Response.Redirect("failure.asp")
End If

extracting an img source link from a string in ruby

I have this string
#<Fletcher::Model::Amazon alt="You Are Not a Gadget: A Manifesto (Vintage)" border="0" element="img" height="240" id="prodImage" onload="if (typeof uet == 'function') { if(typeof setCSMReq=='function'){setCSMReq('af');setCSMReq('cf');}else{uet('af');uet('cf');amznJQ.completedStage('amznJQ.AboveTheFold');} }" onmouseout="sitb_doHide('bookpopover'); return false;" onmouseover="sitb_showLayer('bookpopover'); return false;" src="http://ecx.images-amazon.com/images/I/51bpl1wA%2BaL._BO2,204,203,200_PIsitb-sticker-arrow-click,TopRight,35,-76_AA240_SH20_OU01_.jpg" width="240">
I simply want the link in the src attribute:
http://ecx.images-amazon.com/images/I/51bpl1wA%2BaL._BO2,204,203,200_PIsitb-sticker-arrow-click,TopRight,35,-76_AA240_SH20_OU01_.jpg
How can I parse this string to get the link?
Below is a listing of the relevant functions:
module Fletcher
  module Model
    class Amazon < Fletcher::Model::Base
      # A regular expression for determining if a url comes from a specific service/website
      def self.regexp
        /amazon\.com/
      end
      # Parse data and look for object attributes to give to object
      def parse(data)
        super(data)
        case doc
        when Nokogiri::HTML::Document
          # Get Name
          self.name = doc.css("h1.parseasinTitle").first_string
          # Get Description
          self.description = doc.css("div#productDescriptionWrapper").first_string
          # Get description from meta title if not found
          self.description = doc.xpath("//meta[@name='description']/@content").first_string if description.nil?
          # Get Price
          parse_price(doc.css("b.priceLarge").first_string)
          # Get Images
          self.images = doc.xpath("//table[@class='productImageGrid']//img").attribute_array
          self.image = images.first
        end
      end
    end
  end
end
In that case I believe it would be: fletchedProduct.image[:src]
require 'open-uri'
x = %Q{#<Fletcher::Model::Amazon alt="You Are Not a Gadget: A Manifesto (Vintage)" border="0" element="img" height="240" id="prodImage" onload="if (typeof uet == 'function') { if(typeof setCSMReq=='function'){setCSMReq('af');setCSMReq('cf');}else{uet('af');uet('cf');amznJQ.completedStage('amznJQ.AboveTheFold');} }" onmouseout="sitb_doHide('bookpopover'); return false;" onmouseover="sitb_showLayer('bookpopover'); return false;" src="http://ecx.images-amazon.com/images/I/51bpl1wA%2BaL._BO2,204,203,200_PIsitb-sticker-arrow-click,TopRight,35,-76_AA240_SH20_OU01_.jpg" width="240">}
url = URI.extract(x)
puts url[2]
output:
http://ecx.images-amazon.com/images/I/51bpl1wA%2BaL._BO2,204,203,200_PIsitb-sticker-arrow-click,TopRight,35,-76_AA240_SH20_OU01_.jpg
Hope this helps. I just happened to need to be able to do this last week and looked it up.

How to load Blobproperty image in Google App Engine?

I wrote some code.
I can save an image in a BlobProperty.
But I cannot load the image into an HTML page.
Source code:
class Product(db.Model):
    image = db.BlobProperty()
...
class add:
    productImage = self.request.get('image')
    product.image = db.Blob(productImage)
    product.put()
When I put {{product.image}} into the HTML template, the page showed a run of unreadable characters (the raw bytes of the image) instead of the picture.
What should I do if I want to load the image from the datastore?
I use an auxiliary view:
def serve_image(request, image):
    if image == "None":
        image = ""
    response = HttpResponse(image)
    response['Content-Type'] = "image/png"
    response['Cache-Control'] = "max-age=7200"
    return response
and in the model:
def get_image_path(self):
    # This returns the url of serve_image, with the argument of image's pk.
    # Something like /main/serve_image/1231234dfg22; this url will return a
    # response image with the blob
    return reverse("main.views.serve_image", args=[str(self.pk)])
and just use {{ model.get_image_path }} instead.
(this is django-nonrel, but I guess you could figure out what it does)
Also, there is a post here about this; you should check it out.
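The idea in this answer, serving the blob from its own URL and pointing an img tag at that URL, can be sketched without Django or App Engine. This is a minimal sketch using a plain WSGI callable. IMAGES, image_app, and image_tag are hypothetical names, the dict stands in for the datastore, and image/png is assumed as the stored format.

```python
# Hypothetical in-memory stand-in for the datastore: key -> raw image bytes.
# Only the PNG signature is stored here, purely for illustration.
IMAGES = {"1": b"\x89PNG\r\n\x1a\n"}


def image_app(environ, start_response):
    """Minimal WSGI app: /image/<key> returns the stored bytes as image/png."""
    key = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    data = IMAGES.get(key)
    if data is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [
        ("Content-Type", "image/png"),      # tells the browser these bytes are an image
        ("Content-Length", str(len(data))),
        ("Cache-Control", "max-age=7200"),
    ])
    return [data]


def image_tag(key):
    """HTML that points the browser at the image URL instead of inlining bytes."""
    return '<img src="/image/%s">' % key
```

The template would then output the result of something like image_tag (a URL reference) rather than the blob itself, which is what produced the unreadable characters.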
