Prevent Google Web Preview bot - user-agent

I noticed today in the web server logs that we sometimes get bursts (450 requests in 2 seconds) of requests from a user agent containing "Google Web Preview". Looking at other Stack Overflow questions, it seems this is probably related to the preview functionality on the search page, or maybe to the saved/most-used links at the bottom of a user's Chrome tabs.
I've already blocked these particular URLs in robots.txt, so it's obviously ignoring that. It seems from this 2010 instant previews page that you can add a nosnippet tag and Google will then not try to fetch the preview. However, adding nosnippet wouldn't actually stop the request, as they'd still have to fetch the page to parse out the tag.
Short of blocking Google's IP addresses, which I don't want to do, is there a decent way to stop Google periodically hammering the server?
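(For illustration, a minimal sketch of what refusing these requests by user agent at the application layer could look like, assuming an Express-style Node server; the "Google Web Preview" substring is the one from the logs above, while the port and handlers are purely illustrative.)

```typescript
import express from "express";

const app = express();

// Refuse the preview fetcher before any expensive handlers run.
// The substring matches the user agent seen in the log bursts described above.
app.use((req, res, next) => {
  const userAgent = req.get("User-Agent") ?? "";
  if (userAgent.includes("Google Web Preview")) {
    res.status(403).send("Previews are not served for this site.");
    return;
  }
  next();
});

// Normal traffic is unaffected.
app.get("/", (_req, res) => {
  res.send("Hello");
});

app.listen(3000);
```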

You've probably already done this, but when I run into this kind of issue I make a buffer page, put the links I don't want rendered on it (e.g. the link to the admin panel), and mark that page noindex.
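If you go the buffer-page route, the noindex hint can also be sent as an X-Robots-Tag response header rather than a meta tag. A rough sketch, again assuming an Express-style server and a made-up /admin-gateway path; note that, as the question points out, crawlers still have to fetch the page once to see the directive:

```typescript
import express from "express";

const app = express();

// Buffer page linking to what we don't want indexed or previewed.
// The noindex/nosnippet directives are sent as a response header,
// so the HTML itself doesn't need a meta tag.
app.get("/admin-gateway", (_req, res) => {
  res.set("X-Robots-Tag", "noindex, nosnippet");
  res.send('<a href="/admin">Admin panel</a>');
});

app.listen(3000);
```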

Related

Can I see the network calls from a Firefox add-on?

I'd like to see the calls made from a Firefox add-on.
I know it's calling its website's REST API, and I would like to see the requests to better understand the API.
However, in the Web Developer Network tab, these calls do not appear. Is there an option to see them, whether in the dev tools or in about:config?
Edit: I tried about:debugging too, but it doesn't seem to capture the requests either. There are some background requests, yes, but not the ones I know should be there.
As I don't know whether this is generic or specific to the extension I'm looking at, I'll give details. I'm trying to look at the requests made by the raindrop.io extension (https://raindrop.io/), which offers an API (https://developer.raindrop.io).
When I click on the extension button, I can create a bookmark for the page. For instance: the one I'm editing right now
This goes through requests to the REST API (at least a POST to https://api.raindrop.io/rest/v1/raindrop). I know because:
I can see similar requests when doing an operation from the website itself
I can send this request via the JS console and make it work
However, I do not see this request in the normal Network console; I see no requests from the extension at all.
I do not see it from the debugging one either. I do see some requests, but only background GET requests to a /links API that returns the full list of bookmarks. A request is made after I've added my bookmark, but it is clearly not the one that performs the update.
Another way I know the request is being made is that if I try to bookmark something weird (like the debugging tab), the extension displays the same error I get if I manually send a malformed request to the API.
So these calls happen. But I can't see them anywhere.
Note that the illustration uses this add-on because it's what I'm looking at right now, but I have had the same issue with others in the past: no way to see the foreground requests of an add-on.
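One workaround while the extension's own traffic stays invisible is to replay the call by hand from a normal page console, where it does show up in the Network tab. A sketch of what that could look like; the request body and the Authorization token are assumptions based on the public API docs linked above, not something captured from the extension:

```typescript
// Replay the bookmark-creation call manually so it appears in the Network tab.
// Body shape and token are placeholders -- see https://developer.raindrop.io.
const response = await fetch("https://api.raindrop.io/rest/v1/raindrop", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer <test-token>", // placeholder test token
  },
  body: JSON.stringify({ link: "https://example.com/" }),
});
console.log(response.status, await response.json());
```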

How do browsers (Firefox, more specifically) know which cookies are tracking cookies?

I came across a situation where Firefox in private browsing mode blocks some of the cookies on my site, more specifically Google Analytics cookies like _ga, _gid, etc. Searching the internet I came across this article. So browsers like Firefox somehow identify these cookies as tracking cookies. But how? How does the browser know which cookies are tracking cookies and which are not? I need to know this because next time I set cookies on my server I don't want them to be blocked by browsers.
In the context of the article, it just means blocking referral links. For instance, it blocks sending the referral information from Facebook to other sites.
Other sites use the referral information to decide who to pay to get more traffic and stuff like that.
There's like 100 different versions of the idea of "tracking" though.
Like the article points out, your ISP always knows every DNS lookup you do and every call to an IP, so they always know all of your traffic and are "tracking" it.
There's also "ad tracking", where all those Google calls send out what the crawler says is on the page in order to create targeted ads and all that.
I think, based on what you wrote, you're just talking about tracking links which is just scrubbing the referral link part though.
You'd have to be more specific if that's not what you're looking at.

Why do some programmers use GET instead of POST in requests that modify information on the server?

I've learned that you shouldn't be using GET requests for URLs that modify information on the server because you could get problems with browser link prefetch, search engine crawlers etc.
But when viewing the source code of some sites, I saw that many big companies don't follow this approach.
For example: I signed up for tidal.com and activated a subscription.
When I went to the subscription page I got a page where I was able to cancel my subscription. But the "cancel my subscription" button is not a form performing a POST request, but simply a link to https://my.tidal.com/br/account/subscription/cancel
Likewise, "reactivate subscription" is a link to https://go.tidal.com/br/account/subscription/resume/40cd9e3e-3d58-4c80-aee7-c378011b49d4
Why are they doing that if my action is modifying information on the server?
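For contrast, a rough sketch of how the same cancel action could be expressed as a POST, so that link prefetchers and crawlers following plain GET links can't trigger it. The path comes from the question; the function name is illustrative and CSRF handling is omitted:

```typescript
// Trigger the cancellation with an explicit POST instead of a plain link.
// A real implementation would also send a CSRF token with the request.
async function cancelSubscription(): Promise<void> {
  const res = await fetch("/br/account/subscription/cancel", {
    method: "POST",
    credentials: "include", // send the session cookie
  });
  if (!res.ok) {
    throw new Error(`Cancel failed with status ${res.status}`);
  }
}
```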

How do I get XHR/Ajax resource timing data from window.performance?

When I open the Firefox "network" tab in the developer tools, I'm able to see the timing data from all the requests my page is making, including application/json (XHR) calls. I want to be able to get this timing information programmatically.
In Selenium, I let my page load fully and then ask the window.performance.getEntries() method for all of the resources. It gives me back a ton, including CSS, JavaScript, etc., but I don't see the calls to our RESTful services that show up in the Firefox window as "json" requests.
Since Firefox shows them in its Network tab in the developer tools, is there a way for me to get them programmatically? Our app is an Angular app that is not using iframes.
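For reference, a small sketch of filtering the Resource Timing entries down to XHR/fetch calls; the property names are standard Resource Timing API fields, but whether the detailed timings are populated for cross-origin services depends on the header discussed in the answer below:

```typescript
// Keep only the XHR/fetch entries from the Resource Timing buffer.
const entries = performance.getEntriesByType("resource") as PerformanceResourceTiming[];
const xhrEntries = entries.filter(
  (e) => e.initiatorType === "xmlhttprequest" || e.initiatorType === "fetch"
);

for (const e of xhrEntries) {
  console.log(e.name, {
    total: e.duration,
    // These detailed fields are zero for cross-origin resources unless the
    // server sends a Timing-Allow-Origin header (see the answer below).
    ttfb: e.responseStart - e.requestStart,
  });
}
```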
I figured out my issue after a day of googling and trying different things. Thanks to this article I discovered that I needed to add Timing-Allow-Origin: * to the response headers of all the services.
Once I did that, the timing information started to appear. It's apparently needed because the services are hosted on a different domain than my client. I don't understand the ramifications of leaving that header in there, so I'll make sure it doesn't get deployed to production.
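The answer doesn't say what the services are running on, so purely as an illustration, here is what adding that header could look like in an Express-style service. Echoing a specific client origin instead of * limits who gets the detailed timings if this ever does reach production:

```typescript
import express from "express";

const app = express();

// Expose detailed resource timing to the client's origin only, rather than
// to every site with "*" (the origin below is a placeholder).
app.use((_req, res, next) => {
  res.set("Timing-Allow-Origin", "https://my-client.example.com");
  next();
});
```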

Does a website link (href) validation service exist?

I am looking for a web service kind of like Google Analytics.
Paste some JavaScript into your web page and, if any of the links there become invalid, hey presto, an email is sent to someone telling them which link on which page is broken.
Anyone heard of such a service?
This would slow the page loading down a lot if it had to check for broken links every time someone visited it (basically an HTTP request for every link). Not that it isn't possible, but the implementation would have to be very, very good.
JavaScript cannot send emails; you would have to use Ajax to post the details to another page, which would then email the admin. As this is all client-side, it is very open to abuse.
I would suggest using a program to do this every now and again. There are even Firefox extensions that do it, rather than a standalone program. Google will also list a whole host of websites offering the service.
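If a self-hosted check every now and again is acceptable, here is a sketch of the idea in Node/TypeScript; the URL list is a placeholder, and the scheduling and email step are left out:

```typescript
// Periodically-run link check: request each URL and report the broken ones.
const linksToCheck = [
  "https://example.com/page-one",
  "https://example.com/page-two",
]; // placeholder list; in practice this would be extracted from the site's pages

async function checkLinks(urls: string[]): Promise<void> {
  for (const url of urls) {
    try {
      const res = await fetch(url, { method: "HEAD" });
      if (!res.ok) {
        console.error(`Broken link (HTTP ${res.status}): ${url}`);
      }
    } catch (err) {
      console.error(`Unreachable: ${url}`, err);
    }
  }
}

checkLinks(linksToCheck);
```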
