Using random user agent vs proxy in scraping?

I have recently started working on web scraping.
I found that we can use a proxy or random user agents to avoid anti-scraping detection.
Is there any difference between a proxy and random user agents?
I got confused because, as I understand it, both are used to hide the identity of the original client request.
If my understanding is wrong, please let me know.

User agent and proxy are totally different concepts.
1) User agents: The user agent is sent to the targeted website in the request headers.
When I send a request to stackoverflow, my user agent is:
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
It says I'm using Firefox on Linux, plus some other information. Everyone using the same browser version (Firefox 68.0) on the same platform (Ubuntu Linux, x86_64) will have the same user agent.
This library will help you find the most common user agents used on the web, so that your user agent blends in with everyone else's: https://github.com/Lobstrio/shadow-useragent
2) Proxy
A proxy lets you hide your IP address. The website you target will receive the IP address of the proxy rather than yours. If your IP gets blocked by the website, then using a proxy would normally unblock it.
There can be many reasons why you get blocked during scraping, but rotating IPs and user agents can be effective in some cases.
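To make the distinction concrete, here is a minimal sketch using only Python's standard library. The user agent only changes a request header, while the proxy changes the route (and therefore the IP address the website sees). The proxy address and the list of user agents are placeholder assumptions, not values from the answer above.

```python
import random
import urllib.request

# Assumed pool of user agents; in practice you would use current, common ones.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0",
]
PROXY_URL = "http://127.0.0.1:8080"  # hypothetical proxy address


def build_request(url, user_agents=USER_AGENTS):
    """Return a Request whose User-Agent header is picked at random."""
    headers = {"User-Agent": random.choice(user_agents)}
    return urllib.request.Request(url, headers=headers)


def build_opener(proxy_url=PROXY_URL):
    """Return an opener that routes HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


request = build_request("http://example.com/")
opener = build_opener()
# opener.open(request) would send the request through the proxy,
# carrying the randomly chosen User-Agent header.
print(request.get_header("User-agent"))
```

The two settings are independent: you can rotate one, the other, or both.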


Mikrotik Received HTTP POST from server or web page

I'm new to MikroTik, please help.
All connected devices are blocked from the internet.
So I want to send an HTTP POST from a web page to the MikroTik with the client's IP and MAC address as data,
and then have the MikroTik allow this IP + MAC address to connect to the internet.
Is that possible in MikroTik?
Or is there another option for this? Thank you.
This strongly depends on how you block users from accessing the outside network.
Since you mentioned client IP and MAC address, my guess is Hotspot. If I'm correct then THIS would be your way

Is there a way to implement captive portal on windows hotspot?

I am looking for a way to implement a captive portal for the Windows 10 mobile hotspot. The idea is to redirect all devices that connect to the hotspot to a webpage.
I was able to find this article which shows how to do it on Linux.
But I have been unsuccessful in finding a similar one for Windows. Posts like this one proved to be dead ends.
I am okay with using a simple nginx server to give a 302 redirect response to clients if needed, but I prefer not to use any existing software that implements a captive portal.
UPDATE
I have succeeded in triggering a captive portal on clients (Linux laptop, Android device, etc.) using a workaround.
Whenever a device connects to the hotspot, it sends a request to some predefined websites to check whether the Wi-Fi connection has internet access. If it gets a 302 response, it opens the captive portal window.
So I added the following entries to the hosts file on the Windows machine.
127.0.0.1 clients3.google.com #android
127.0.0.1 connectivitycheck.gstatic.com #android
127.0.0.1 nmcheck.gnome.org #ubuntu
These requests will then be resolved locally using the hosts file entries and sent to the nginx server which gives a 302 redirect to all http requests.
The setup I mentioned in the UPDATE above was finally tweaked to get where I wanted. I used dnschef, an open-source DNS server that works perfectly as a command-line tool.
The steps I followed:
1) Start the Windows mobile hotspot.
2) Go to Network adapters => Select hotspot adapter => Change IPv4 settings => set 127.0.0.1 as DNS server.
3) Start dnschef with --fakeip=192.168.137.1
4) Start an HTTP server on 192.168.137.1 and give a 302 redirect response to all requests.
And that's it! Whenever a device connects to the hotspot, it will attempt to connect to one of the preset websites used to determine internet connectivity. These requests will be resolved by dnschef to our nginx server. The nginx server then gives a 302 redirect, which triggers the captive portal on the client.
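The HTTP server in the last step does not have to be nginx. As a rough sketch, assuming Python is available on the hotspot machine, a small standard-library server can answer every request with a 302. The portal path `/login` and the portal page content are hypothetical, and the demo binds to 127.0.0.1 on a free port so it can run anywhere; on the hotspot you would bind to 192.168.137.1 port 80 instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PORTAL_PATH = "/login"  # hypothetical portal page
PORTAL_URL = "http://192.168.137.1" + PORTAL_PATH


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == PORTAL_PATH:
            # Serve the portal page itself, otherwise the redirect would loop.
            body = b"<html><body>Welcome to the hotspot</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            return
        # Every other request (including connectivity checks) gets a 302.
        self.send_response(302)
        self.send_header("Location", PORTAL_URL)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server(host="127.0.0.1", port=0):
    """Start the redirect server in a background thread (port 0 = any free port)."""
    server = HTTPServer((host, port), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(host, port, path="/"):
    """Send one GET without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("GET", path)
    response = conn.getresponse()
    result = (response.status, response.getheader("Location"))
    conn.close()
    return result


server = start_server()
status, location = probe("127.0.0.1", server.server_address[1])
portal_status, _ = probe("127.0.0.1", server.server_address[1], PORTAL_PATH)
print(status, location, portal_status)
server.shutdown()
```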
I tried a similar approach using dnscrypt-proxy, which provides dedicated captive-portal support. Since this is nothing more than DNS cloaking, there are several ways to direct requests for certain "connection-checking" domains to a local webserver.
Unlike in the accepted answer, I figured out an even easier and more flexible way by using the Windows hosts file without any third-party DNS proxy. Instead of associating the connection-checking domains with localhost, I mapped them to the IP address of the physical Wi-Fi access point (which is 192.168.137.1). This causes Wi-Fi clients to send their connection-checking requests directly to the webserver that is running on the local PC and listening for all connections on port 80.
hosts file:
192.168.137.1 captive.apple.com
192.168.137.1 clients3.google.com
192.168.137.1 nmcheck.gnome.org
192.168.137.1 connectivitycheck.gstatic.com
192.168.137.1 connectivitycheck.android.com
192.168.137.1 www.msftncsi.com
192.168.137.1 dns.msftncsi.com
192.168.137.1 www.msftconnecttest.com
192.168.137.1 ipv6.msftconnecttest.com
192.168.137.1 ipv4only.arpa
This webserver (in my case ASP.NET Core) redirects clients to a login page unless they are already registered. For registered clients, the webserver can answer the calls just like the "real" servers behind those connection-checking domains do, so that clients that have already logged in successfully are not redirected again.
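The decision logic can be sketched independently of the web framework. The following Python function is only an illustration: the portal URL, the in-memory set of registered clients, and especially the table of "online" replies are assumptions. The reply values reflect what these probe endpoints are commonly reported to return and should be verified against the real servers before relying on them.

```python
PORTAL_URL = "http://192.168.137.1/login"  # hypothetical login page

# Assumed replies that make a client believe it is online.
# Verify these against the real connection-checking servers.
ONLINE_REPLIES = {
    "connectivitycheck.gstatic.com": (204, ""),
    "clients3.google.com": (204, ""),
    "www.msftconnecttest.com": (200, "Microsoft Connect Test"),
    "www.msftncsi.com": (200, "Microsoft NCSI"),
}

# Hypothetical in-memory registry; a real portal would persist this.
registered_clients = set()


def answer_probe(client_ip, host):
    """Return (status, headers, body) for a connection-checking request."""
    host = host.split(":")[0].lower()  # drop an optional port, normalise case
    if client_ip not in registered_clients:
        # Unknown client: send it to the login page.
        return 302, {"Location": PORTAL_URL}, ""
    # Known client: imitate the real server; the 204 fallback is a guess.
    status, body = ONLINE_REPLIES.get(host, (204, ""))
    return status, {}, body


before = answer_probe("192.168.137.50", "www.msftconnecttest.com")
registered_clients.add("192.168.137.50")
after = answer_probe("192.168.137.50", "www.msftconnecttest.com")
print(before)
print(after)
```

Keying the decision on the client IP is the simplest choice for a sketch; a real portal might also track the MAC address or a session.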

OS X / Chrome - Tunnel all traffic over HTTP Proxy?

I have written a proxy server listening on port 80 that can bypass the firewall by accessing blocked websites. It can be accessed this way: http://proxy-server/?url=http://blockedwebsite.com
How can I automate this, by forwarding all requests to http://blockedwebsite.com to http://proxy-server/?url=http://blockedwebsite.com without actually showing proxy-server in the URL?
I'm looking for a solution that works with OS X (all traffic) and Google Chrome (http / https only).
Thanks!

Using wget and getting a different outcome than when using a browser

I am using wget for windows (gnuwin32 wget-1.11.4-1) in Windows 8 and using it for a helpdesk tool called kayako, telling it to poll from an email queue. The command line looks like this:
wget.exe -O null --timeout 25 http://xxx.kayako.com/cron/index.php?/Parser/ParserMinute/POP3IMAP
I know it takes around 20 seconds to receive a response from the server in my particular case when using a browser with the url in the command line above. However, when using that command, it returns almost immediately. This is an excerpt of the output:
Connecting to xxx.kayako.com[xxx.xxx.xxx.xxx]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
I would like to know what the difference between the two cases is and how I could get wget to behave in the same way as the browser (I know it currently doesn't, because kayako is not polling from the email queue).
There are a number of potential variables, but one of the more common distinctions made by web servers is based on the user agent string you are reporting. By default, wget will identify itself truthfully as wget. If this is an issue, you can use the --user-agent= option to change the user agent string.
For example, you could identify as Firefox on 64-bit Windows with something like --user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0".
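To see what the server actually receives, here is a small self-contained Python sketch (standard library only, not wget itself). It starts a throwaway local server that echoes back the User-Agent header, then requests it once with the client's default user agent and once with the Firefox string above. The local server and the `fetch` helper are illustrative assumptions.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

FIREFOX_UA = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0"


class EchoUserAgent(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply with whatever User-Agent the client reported.
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url, user_agent=None):
    """GET the URL, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    # Empty ProxyHandler: never route this local demo through a system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=25) as response:
        return response.read().decode("utf-8")


server = HTTPServer(("127.0.0.1", 0), EchoUserAgent)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

default_ua = fetch(url)             # the library identifies itself truthfully
custom_ua = fetch(url, FIREFOX_UA)  # the server now sees a browser-like client
print(default_ua)
print(custom_ua)
server.shutdown()
```

A server that branches on this header can treat the two requests differently, which is one possible explanation for a browser and a command-line client getting different results from the same URL.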

Using Fiddler to debug the Windows Phone 7 emulator

I recently started using the updated beta tools for Windows Phone 7 and ran into an interesting problem. It seems that with Fiddler running, any HTTP requests run through the emulator start returning a null result and raise a "not found" web exception. This is easy to reproduce with WebClient.DownloadStringAsync(). The old versions of the emulator did work with Fiddler, if I remember correctly. Has anyone had luck getting the two to work together? If it's not possible, I'd be open to any other tool that could help debug web requests from the WP7 emulator.
It looks like there is a blog post that describes getting Fiddler working with Windows Phone 7 through some customized rules for setting up Fiddler as a reverse proxy.
Here is part of the instructions from the Fiddler website, though the blog post seems a little clearer:
Option #1: Configure Fiddler as a Reverse-Proxy
Fiddler can be configured so that any traffic sent to http://127.0.0.1:8888 is automatically sent to a different port on the same machine. To set this configuration:
1) Start REGEDIT
2) Create a new DWORD named ReverseProxyForPort inside HKCU\SOFTWARE\Microsoft\Fiddler
3) Set the DWORD to the local port you'd like to re-route inbound traffic to (generally port 80 for a standard HTTP server)
4) Restart Fiddler
5) Navigate your browser to http://127.0.0.1:8888
Option #2: Write a FiddlerScript rule
Alternatively, you can write a rule that does the same thing.
Say you're running a website on port 80 of a machine named WEBSERVER. You're connecting to the website using Internet Explorer Mobile Edition on a Windows SmartPhone device for which you cannot configure the web proxy. You want to capture the traffic from the phone and the server's response.
1) Start Fiddler on the WEBSERVER machine, running on the default port of 8888.
2) Click Tools | Fiddler Options, and ensure the "Allow remote clients to connect" checkbox is checked. Restart if needed.
3) Choose Rules | Customize Rules.
4) Inside the OnBeforeRequest handler, add a new line of code:
if (oSession.host.toLowerCase() == "webserver:8888") oSession.host = "webserver:80";
5) On the SmartPhone, navigate to http://webserver:8888
Requests from the SmartPhone will appear in Fiddler. The requests are forwarded from port 8888 to port 80 where the webserver is running. The responses are sent back through Fiddler to the SmartPhone, which has no idea that the content originally came from port 80.
I'm not able to get Fiddler to monitor the traffic, so I use Wireshark, which works fine.
