Related to the subject of adblockers: it's easy to find out whether someone has an adblocker, but I am interested in finding out whether the browser is also blocking some analytics calls (cf. the EasyPrivacy list, for instance, or uBlock).
Is there a way to find out whether a specific JS call has an "ERR_BLOCKED_BY_CLIENT /failed" status?
Thanks!
Due to a limitation of the API of a website I use for searching some products, I have to scrape the HTML of its Products page. There's no other way, because it only offers a free API with that limitation. I need 10 or 100 times more items than the API returns, and even if I call it 5 times, it returns the same set of products as a single call.
I don't need to scrape a lot of pages in a short period of time. Normally a scraping bot would scrape all that data in a few minutes. For me a few hours is acceptable, so my scraper can behave more like a human.
The question is: what are the ways to make my scraper look like a normal user?
First, make fewer calls in a short period of time.
Use a headless browser, maybe?
Use a VPN? Or a proxy? Or both?
What are other pointers?
Note: in my case scraping is the only way to achieve what I want because the API doesn't work. So there's no question whether I should use the API or scraping. I simply can only use scraping.
You are basically heading in the right direction.
Yet I suspect that you haven't really mastered the API (or it's a weird one) if calling it 5 times returns the same set of products as 1 call. An API should let users access all available data (subject to a rate limit, though).
The items you've asked about:
Make fewer calls in a short period of time. - Kind of true, but you should still find out what request frequency is acceptable for the particular site (one that avoids both detection and bandwidth throttling).
Use a headless browser. - Yes. Discard cookies, be anonymous.
Use a VPN? Or a proxy? - Proxy, yes: use an appropriate proxy service that gives you enough flexibility to avoid being detected. A VPN does not help, since the network nodes you would scrape from are limited in number and (basically) have static IPs.
I think this post might help you.
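As a rough illustration of the "fewer calls" point, here is a minimal Python sketch that spaces requests out with a randomized delay. Everything in it is an assumption for illustration: `fetch_paced`, `jittered_delays`, and the `fetch` callable are hypothetical names, and the 30-second base delay is arbitrary, since the acceptable frequency depends on the site.

```python
import random
import time
from typing import Callable, Iterable, Iterator, List, Optional


def jittered_delays(base_seconds: float, jitter: float,
                    rng: random.Random) -> Iterator[float]:
    """Yield delays drawn uniformly from base_seconds * (1 - jitter) to base_seconds * (1 + jitter)."""
    while True:
        yield base_seconds * rng.uniform(1.0 - jitter, 1.0 + jitter)


def fetch_paced(urls: Iterable[str],
                fetch: Callable[[str], str],
                base_seconds: float = 30.0,   # arbitrary assumption, tune per site
                jitter: float = 0.5,
                sleep: Callable[[float], None] = time.sleep,
                rng: Optional[random.Random] = None) -> List[str]:
    """Call fetch(url) for each URL, pausing a randomized interval between calls."""
    if rng is None:
        rng = random.Random()
    delays = jittered_delays(base_seconds, jitter, rng)
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, a jittered pause before the rest.
            sleep(next(delays))
        pages.append(fetch(url))
    return pages
```

`fetch` is whatever function actually downloads a page (a plain HTTP client or a headless browser); passing it in, along with `sleep`, keeps the pacing logic separate and easy to try out without touching the network.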
I query the Google Places API once a week to see whether my requests to add new places have passed the moderation queue (i.e. whether the scope has changed to google).
The problem is that I don't know how to tell whether a request has been rejected by moderation.
The only solution seems to be to call get-current-place (or another "search" request) and look for the place, since a place should no longer appear in the results once moderation has rejected it, but I'm not really convinced by that solution.
Thanks
Unfortunately there isn't a good or stable way to do this at the moment. Looking at search results gives an approximation, but there are other reasons that a valid place may not appear in the results, so it won't give a strong confirmation.
How can I abort a store load while the ajax call is still executing? I have a simple store with proxy type of 'ajax' and 'json' reader.
The documentation does not indicate any way to abort this. I have noticed that jsonp does allow aborting a load in progress. Do I have to switch to jsonp?
The motivation here is that I have a search bar and list object that gets populated with results. The actual search on the backend can take 5-10 seconds. So if a user starts a search then quickly wants to do another search (in case, for example, the first search was a typo), then the new search needs to abort the first search ajax call. Otherwise, I am seeing mixed results showing up in my search results.
As usual, any help is greatly appreciated!
Mohammad
The solution I have used in the past for this exact problem is to tag each request with an incrementing counter. When a request completes, I check its counter, and if a request with a higher counter has been made since, I disregard the result.
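The counter idea can be sketched in a language-neutral way. The following Python snippet is only an illustration, not the code I actually used; `LatestRequestTracker` and its method names are made up for this example. In a browser the same logic would sit in the store's load callback.

```python
import itertools
from typing import Any, Optional


class LatestRequestTracker:
    """Keeps only the response that belongs to the most recently started request."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._latest = 0                      # id of the newest request started so far
        self.results: Optional[Any] = None    # what the list would currently display

    def start(self) -> int:
        """Call when a search is sent; returns the id to attach to that request."""
        self._latest = next(self._counter)
        return self._latest

    def complete(self, request_id: int, results: Any) -> bool:
        """Call when a response arrives; returns False if the response is stale."""
        if request_id < self._latest:
            # A newer request has been started since, so disregard this result.
            return False
        self.results = results
        return True
```

Note that this does not abort anything on the wire: the slow first search still runs to completion on the backend, and its response is simply dropped when it arrives.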
I am trying to create a real-time, collaborative application, like Google Wave for example.
When user1 writes something, it shows up on user2's screen at the same time.
I did a little research and found some ways to do this with Ajax:
1. Every X seconds, send a request to the server to check what is "happening".
2. Timeout: a long request. Problem: I saw that I can do this only with IE8.
Are there other options? What is the best way to do this?
And for way number 2, is it true that I can do this only with IE8?
Yosy
With Comet-style AJAX, the server can wait for notifications from each client and notify all the other clients when something happens, so there's no need for polling. Look up keywords like Comet and Bayeux. Dojo has a good implementation.
I'm not sure what you are referring to in 2, but if I were going to implement something like this, I'd do what you explain in 1. Basically your server will be keeping track of the conversation, and the clients will constantly ask for updates.
Another possible option would be Flash, but I don't know much about it other than that it would be capable, so you're on your own for researching that.
Some notes on keeping things running quickly in option 1:
Remember that you only have 2 "ajax" calls to work with on the client side (older browsers allow only 2 connections to a host at once). So keep track of the calls that are out. Make use of abort() if a call takes too long or its response is not going to be valid anymore.
Get the most out of your calls: if you need to send text to the server, use the response to get an update on the current "conversation".
An example:
Say, I have an AJAX chat on a page where people can talk to each other.
How is it possible to display (send) the message sent by person A to persons B, C and D while they have the chat opened?
I understand that technically it works a bit differently: the chat (ajax) reads from the DB (or another source), say every second, to find out whether there are new messages to display.
But I wonder if there is a method to send the new message to the rest of the people just when it is sent, and not to load the DB with 1000s of reads every second.
Please note that the AJAX chat is just an example to explain what I want, not something I want to build. I just need to know whether there is a method to let all browsers that have a specific page (ajax) open know that there is new content on the server that should be fetched.
{sorry for my English}
Since the server cannot respond to a client without a corresponding request, you need to keep state for each user's queued message. However, this is exactly what the database accomplishes. You cannot get around this by replacing the database with something that doesn't just accomplish the same thing in a different way. That said, there are surely optimizations you could do. Keep in mind, however, that you shouldn't prematurely optimize situations like this; databases are designed to handle extremely high traffic, and it's very possible (and in fact, likely), that the scenario described will be handled just fine by the database out of the box.
What you're describing is generally referred to as the 'Comet' concept. See the Wikipedia article for details, especially implementation options (long polling, etc.).
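To give a feel for the long-polling variant, here is a minimal in-memory Python sketch of the server side. It is an assumption-laden illustration rather than a real Comet server: `MessageBoard` and its methods are hypothetical names, messages live in a plain list, and the HTTP layer that would call `wait_for_new` from a request handler is left out.

```python
import threading
from typing import List, Tuple


class MessageBoard:
    """In-memory message list where readers can block until something new arrives."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._changed = threading.Condition()

    def post(self, text: str) -> int:
        """Store a message and wake every waiting reader; returns the message count."""
        with self._changed:
            self._messages.append(text)
            self._changed.notify_all()
            return len(self._messages)

    def wait_for_new(self, last_seen: int, timeout: float) -> Tuple[int, List[str]]:
        """Block until there are messages beyond last_seen, or until timeout seconds pass.

        Returns the new message count and the messages the caller has not seen yet
        (an empty list on timeout, after which the client simply asks again).
        """
        with self._changed:
            self._changed.wait_for(lambda: len(self._messages) > last_seen,
                                   timeout=timeout)
            return len(self._messages), self._messages[last_seen:]
```

Each client holds one request open in `wait_for_new` and reconnects with the count it received, so a message reaches the waiting clients as soon as it is posted instead of at the next polling tick.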
Another answer is to have the server push changes to connected clients. That way there is just one call to the database, and the server then pushes the change to all the clients. This article indicates it is possible; however, I have never tried this myself.
It's very basic, but if you want to stick with a standard AJAX solution, a simple means of reducing load on the server when polling would be to get the AJAX call to forward the last collected comment ID for that client - you then use that (with the appropriate escaping) in the lookup query on the server side to ensure you only return new comments.
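A minimal sketch of that lookup, using Python's built-in `sqlite3` purely for illustration: the `comments` table, its columns, and the function names are assumptions, and your backend and schema will differ. The placeholder (`?`) takes care of the escaping.

```python
import sqlite3
from typing import List, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: an auto-incrementing id plus the comment text.
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)")


def add_comment(conn: sqlite3.Connection, body: str) -> int:
    cur = conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


def comments_since(conn: sqlite3.Connection, last_id: int) -> List[Tuple[int, str]]:
    """Return only the comments newer than the last id the client reported."""
    cur = conn.execute(
        "SELECT id, body FROM comments WHERE id > ? ORDER BY id",
        (int(last_id),))   # bound parameter, never string concatenation
    return cur.fetchall()
```

The client remembers the highest id it has received and sends it with the next poll; a first-time client sends 0 and gets everything.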