I use Scrapy and Splash to crawl all the URLs in a website. On websites with static HTML it works very well! But when I crawl websites that have AJAX pages or HTML5 (for example: http://testphp.vulnweb.com/AJAX/, http://testhtml5.vulnweb.com), I cannot get any URLs. Does anyone have a solution for this problem?
Thanks so much!
Use the Requests package for Python.
You can issue the request for the AJAX content directly and read the response.
Example code is below.
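The trick is to find the XHR endpoint the page actually calls (the browser dev tools' network tab shows it) and request that URL directly instead of rendering the page. A minimal sketch, with a hypothetical endpoint, shown here via JavaScript's fetch (Python's requests.get(...) works the same way):

// Hypothetical endpoint - look up the real one in the network tab.
fetch('http://testphp.vulnweb.com/AJAX/data.php')
  .then(function (res) { return res.text(); })
  .then(function (body) {
    // The raw response holds the content the AJAX page renders;
    // extract the URLs you need from it here.
    console.log(body);
  });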
I have a page on my site that doesn't display well when viewed in Mozilla Firefox. I have some jQuery code that doesn't work in Firefox, though it displays fine in other browsers.
I am wondering if there is a way I can redirect my visitors who use Firefox to an alternate URL where I will remove the jQuery code.
Any help, please?
Thanks.
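A minimal sketch of one way to do that (user-agent sniffing; the alternate URL is hypothetical, and feature detection is generally more reliable than browser sniffing):

// Firefox ships "Firefox" in its user-agent string.
if (navigator.userAgent.indexOf('Firefox') !== -1) {
  // Hypothetical alternate page with the jQuery code removed.
  window.location.replace('http://www.example.com/no-jquery.html');
}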
Because of an HTTP_REFERER issue, I need to pass a URL from an HTTPS site to an HTTP one.
I have this bit of JavaScript, but it is not working:
Save this page as PDF
Could I also find out how I would append the current page's URL, using JavaScript, to their API URL?
http://api.htm2pdf.co.uk/urltopdf?apikey=yourapikey&url=http://www.example.com
Any advice?
You need to block the anchor tag's default click event.
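A sketch of that approach, reusing the htm2pdf URL from the question (event.preventDefault() stops the browser from following the href, so only window.open runs):

<a href="http://api.htm2pdf.co.uk/urltopdf?apikey=yourapikey"
   onclick="event.preventDefault(); window.open('http://api.htm2pdf.co.uk/urltopdf?apikey=yourapikey&url=' + window.location.href, '_blank');">Save this page as PDF</a>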
I would use either JavaScript or the href attribute, not both; I don't see how they would work well together.
You can use .preventDefault() as noted, but why put the href attribute there in the first place?
Is this what you're looking for? It should work on both HTTP and HTTPS sites.
<a onclick="window.open('http://api.htm2pdf.co.uk/urltopdf?apikey=yourapikey&url=' + window.location.href, '_blank', 'location=yes,scrollbars=yes,status=yes');">Save as PDF</a>
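One caveat worth noting: window.location.href can itself contain ? and & characters, so wrapping it in encodeURIComponent() keeps the api.htm2pdf.co.uk query string intact.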
I need an iframe script that I can give to my different clients so they can embed it in their sites, just like YouTube or Facebook do.
But it does not get rendered, due to cross-domain restrictions.
I have gone through all the documentation on X-Frame-Options and cross-domain AJAX calls.
The problem with a cross-domain AJAX call is that I have only JSONP to work with.
I have tried this: go to any YouTube video and get its embed code. It's a plain iframe, e.g. <iframe width="420" height="315" src="http://www.youtube.com/embed/7N5OhNplEd4" frameborder="0" allowfullscreen></iframe>
If you inject the above snippet into your HTML, it will render, but as soon as you edit the src of the iframe to youtube.com itself, it goes blank.
Facebook's iframe also renders everywhere smoothly.
This has been driving me crazy.
Please guide me on this. Thanks in advance!
If you look at the response headers from youtube.com, it returns "X-Frame-Options: SAMEORIGIN", so they add that header on the server to stop people from displaying YouTube website pages in an iframe.
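To make your own page embeddable, do the opposite on your server: send no X-Frame-Options header for the widget page your clients iframe, and keep SAMEORIGIN on the rest of your site. A minimal sketch, assuming Node.js and a hypothetical /embed route:

const http = require('http');

http.createServer(function (req, res) {
  if (req.url === '/embed') {
    // Widget page: no X-Frame-Options header, so any site may iframe it.
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end('<p>Embeddable widget</p>');
  } else {
    // Regular pages: block cross-origin framing, as youtube.com does.
    res.writeHead(200, {
      'Content-Type': 'text/html',
      'X-Frame-Options': 'SAMEORIGIN'
    });
    res.end('<p>This page only renders in same-origin frames.</p>');
  }
}).listen(8080);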
For Googlebot's AJAX crawling I use the "_escaped_fragment_" argument on my website.
Now I have checked Yandex's search results for my site.
I saw that the AJAX responses don't appear in the search results.
Is there an option for Yandex like "_escaped_fragment_"?
Otherwise, should I check the user agent and, if it includes "YandexBot", serve the non-AJAX page?
Thank you
I found out that Yandex also supports Google's AJAX-crawling proposal.
There is no need to change any code if you have already optimized your site for Googlebot crawling.
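For reference, the scheme both crawlers support maps a pretty URL like http://example.com/#!key=value to a bot request for http://example.com/?_escaped_fragment_=key=value, which the server answers with a pre-rendered HTML snapshot. A minimal sketch, assuming Node.js and hypothetical content:

const http = require('http');
const url = require('url');

http.createServer(function (req, res) {
  const fragment = url.parse(req.url, true).query._escaped_fragment_;
  res.writeHead(200, { 'Content-Type': 'text/html' });
  if (fragment !== undefined) {
    // Crawler request: serve the static HTML snapshot for this fragment.
    res.end('<html><body>Snapshot for ' + fragment + '</body></html>');
  } else {
    // Normal visitors get the AJAX-driven page.
    res.end('<html><body><script src="/app.js"></script></body></html>');
  }
}).listen(8080);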
I am looking at the Gawker blogs (http://io9.com, http://lifehacker.com/) and I'm curious how they are made.
When I click on a link, only the article part of the page reloads, displaying a loading icon while it does.
But what I can't figure out is that the links point to new URLs like io9.com/something/something, and it's not like the AJAX pages I've seen, where JavaScript appends a site.com/#something hash to the end of the URL to mark the page after an AJAX request.
Can the full-blown URL be changed from JavaScript, or what is happening?
When that happens, the website is using the HTML5 History API. This API can change the URL (via JavaScript) without reloading the page.
See caniuse.com for browser support.
If you would like to implement it on your website, backbonejs.org would be very useful.
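A minimal sketch of the pattern (the link class and content container are hypothetical; history.pushState() swaps the address-bar URL without a reload, and popstate covers back/forward):

document.addEventListener('click', function (e) {
  var link = e.target.closest('a.article-link'); // hypothetical link class
  if (!link) return;
  e.preventDefault();
  var href = link.href;
  fetch(href)
    .then(function (res) { return res.text(); })
    .then(function (html) {
      document.querySelector('#article').innerHTML = html; // hypothetical container
      history.pushState({}, '', href); // full URL in the address bar, no reload
    });
});

// On back/forward, re-render the content that matches the restored URL.
window.addEventListener('popstate', function () {
  // Re-fetch or restore the article for location.pathname here.
});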