Using wget and getting a different outcome than when using a browser - windows

I am using wget for Windows (gnuwin32 wget-1.11.4-1) on Windows 8 to trigger a helpdesk tool called Kayako, telling it to poll an email queue. The command line looks like this:
wget.exe -O null --timeout 25 http://xxx.kayako.com/cron/index.php?/Parser/ParserMinute/POP3IMAP
I know it takes around 20 seconds to receive a response from the server in my particular case when using a browser with the url in the command line above. However, when using that command, it returns almost immediately. This is an excerpt of the output:
Connecting to xxx.kayako.com[xxx.xxx.xxx.xxx]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
I would like to know what the difference between the two cases is and how I could get wget to behave the same way as the browser (I know it currently doesn't, because Kayako is not polling the email queue).

There are a number of potential variables, but one of the more common distinctions made by web servers is based on the user agent string you are reporting. By default, wget will identify itself truthfully as wget. If this is an issue, you can use the --user-agent= option to change the user agent string.
For example, you could identify as Firefox on 64-bit Windows with something like --user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0".
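The same idea can be tried outside wget. The following is a minimal Python sketch, assuming the standard library's urllib is acceptable; the helper names `build_request` and `fetch` are made up for illustration, and whether the server actually keys on the user agent is still only a guess.

```python
import urllib.request

# Same Firefox-style string as in the answer above.
FIREFOX_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) "
              "Gecko/20100101 Firefox/19.0")


def build_request(url, user_agent=FIREFOX_UA):
    """Return a Request that reports `user_agent` instead of Python's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=25):
    """Send the request and return (status, body); needs network access."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.status, resp.read()
```

Calling `fetch()` with the cron URL once with the default user agent and once with the Firefox string, and comparing how long each call takes, would show whether the header is what makes the difference.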

Using random user agent vs proxy in scraping?

I have recently been working on web scraping.
I found that we can use a proxy or random user agents to avoid anti-scraping detection.
Is there any difference between a proxy and random user agents?
I got confused because, as I understand it, both are used to hide the identity of the original client request.
If my understanding is wrong, please let me know.
User agents and proxies are totally different concepts.
1) User agents: the user agent is sent to the target website in the request headers.
When I send a request to Stack Overflow, my user agent is:
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
It says I'm using Firefox on Linux, plus other details. Everyone using the same browser version (Firefox 68.0) on the same platform will send the same user agent.
This library will help you find the most common user agents on the web, so that yours blends in: https://github.com/Lobstrio/shadow-useragent
2) Proxy
A proxy lets you hide your IP address. The website you target will see the IP address of the proxy rather than yours. If your IP gets blocked by the website, using a proxy would normally restore access.
There can be many reasons why you get blocked while scraping, but rotating IPs and user agents can be effective in some cases.
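To make the distinction concrete, here is a rough Python sketch using only the standard library. The proxy addresses are placeholders from a documentation IP range and the function names are invented for this example; it only shows where each setting goes, not a complete scraper.

```python
import random
import urllib.request

# Hypothetical pools; substitute real values of your own.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0",
]
PROXIES = ["http://203.0.113.10:3128", "http://203.0.113.11:3128"]


def pick_identity(rng=random):
    """Choose one (user_agent, proxy) pair for the next request."""
    return rng.choice(USER_AGENTS), rng.choice(PROXIES)


def build_opener_and_request(url, user_agent, proxy):
    """The proxy changes the source IP; the header changes the reported browser."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    return opener, request
```

A request would then be sent with `opener.open(request, timeout=10)`. The user agent travels inside the HTTP headers, while the proxy decides which machine the target site sees the connection coming from.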

Gradle extremely slow HTTP resource access on Windows. How to diagnose and fix?

Gradle 2.2 takes hours to build a project on a Windows PC, while the same build takes 8 minutes on Linux. When run with --debug on the slow machine, Gradle reports no errors, but it stops and waits for approx. 2 minutes at every resource, after every User-Agent line:
18:39:15.819 [DEBUG] [org.apache.http.headers] >> User-Agent: Gradle/2.0 (Windows 7;6.1;amd64) (Oracle Corporation;1.7.0_67;24.65-b04)
<2 min. delay>
18:41:15.527 [DEBUG] [org.apache.http.impl.conn.DefaultClientConnection] Receiving response: HTTP/1.1 200 OK
18:41:15.527 [DEBUG] [org.apache.http.headers] << HTTP/1.1 200 OK
Linux workstations on the same subnet (behind the same firewall and using the same squid proxy) do not have this delay.
An Extended snip from Windows is here.
Snip from Linux build around same point in build.
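In a long --debug log, stalls like the one in the excerpt above can be located mechanically by comparing consecutive timestamps. This is a small Python sketch, assuming every relevant line starts with an HH:MM:SS.mmm timestamp as in the excerpt; the function name `find_stalls` and the 30-second threshold are arbitrary choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "18:39:15.819 " style timestamp of a debug line.
STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) ")


def find_stalls(lines, threshold=timedelta(seconds=30)):
    """Return (gap, previous_line, next_line) for each gap above threshold."""
    stalls = []
    prev_time = prev_line = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        # Only the time of day is logged, so a build crossing midnight
        # would need extra handling that this sketch leaves out.
        now = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        if prev_time is not None and now - prev_time > threshold:
            stalls.append((now - prev_time, prev_line, line))
        prev_time, prev_line = now, line
    return stalls
```

Running it over the Windows and the Linux log and comparing which lines precede each stall would show whether the delay always follows an outgoing HTTP request.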
This seems to have been a very strange issue with a transparent HTTP proxy and the DansGuardian web filter. For still unknown reasons, this one PC's HTTP traffic got mangled.
This is odd, because our entire LAN's HTTP traffic to the internet is content filtered. There was a filtering exception that allowed any traffic from this slow PC to pass unfiltered, but it had the opposite of the expected effect. Gradle traffic became crazy slow on the 'unfiltered' PC, while content-filtered workstations had no problems. Even stranger, Gradle also ran at normal speed on unfiltered Linux workstations.
The workaround was to configure IPTables and the transparent proxy to completely ignore the slow PC's HTTP traffic. So now it is unfiltered and unproxied. It has been nicknamed the pornstation.
It happened to us as well, though in our case it was caused by the antivirus on the PC (NOD32, to be specific).
We had to completely disable the HTTP/web filters on it.
May not be your case, but may help others coming here for advice.

Can't send email with bash script

The terminal does not show any error message, but I never receive the email.
This is my code:
mail -s "hello" "example@example.com" <<EOF
hello
world
EOF
Works fine for me:
pax> mail -s "hello" "pax" <<EOF
hi there
EOF
pax> mailx
Mail version 8.1.2 01/15/2001. Type ? for help.
"/var/mail/pax": 1 message 1 new
>N 1 pax@paxbox.com Sat Jun 14 10:25 16/629 hello
& _
You should try it with a local address first (as I have) to see if a mail is being created.
Beyond that, you should realise that mail simply adds mail messages into the mail system. If you want to find out what happens after that, you'll need to look into whatever MTAs (mail transfer agents) you have set up on your system.
If the MTA itself fails, you'll almost certainly get a mail back to the sending account stating so (you can use mailx as I have above, to discover this).
Since you haven't specified your systems, I'll give advice below based on Debian since that's what I'm used to.
On my Debian box, exim is the MTA but, by default, it does not support sending to remote domains. You can modify this by running:
sudo dpkg-reconfigure exim4-config
but you need to be careful not to relay emails lest you unknowingly become a spam-bot. More details can be found here.
You may find, if you want them to go to the outside world, that it's better to send them to your ISP via SMTP rather than trying to configure mail on your local box to do it.
However, if you want to go the mail route, simply run dpkg-reconfigure as above, select "Internet site; mail is sent and received directly using SMTP" as the answer to the first question, then accept defaults for all the other questions (checking to ensure you only accept mail from your local addresses 127.0.0.1 and ::1).
Then wait for exim to restart and try sending the mail again.
Just be aware that exim typically starts queue runners (the processes that actually send out your email) on a schedule (30 minutes for me) so it may take some time for the message to go out.
You can examine the files in /var/log/exim4 to see what's happening. In my case, my ISP rejected the attempt since it knows nothing about pax@paxbox.com, but you may be able to find an open SMTP relay somewhere or spoof your sending details to something your ISP will allow.
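For the alternative mentioned earlier, handing the message to your ISP over SMTP instead of relying on the local MTA, a minimal Python sketch could look like the following. The host name, port and the use of STARTTLS with a login are assumptions; your ISP's actual settings may differ, and the function names are invented for this example.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical values; use the host, port and addresses your ISP gives you.
SMTP_HOST = "smtp.example.net"
SMTP_PORT = 587


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text message equivalent to the `mail -s` call."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_isp(msg, username, password):
    """Submit msg to the ISP's server; an exception is raised if it is refused."""
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT, timeout=30) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)
```

Unlike `mail`, which only queues the message locally, a rejection by the remote server shows up immediately as an exception in the calling script.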

Scapy Windows sr1() not being answered

I'm using the Scapy library for Python 2.6 on Windows 7 in order to see if I can spoof my IP address (for non-malicious purposes, I'm curious how it works).
When I use the sr1() function, it sends the packet, but it gets nothing in return. I have to interrupt it manually using CTRL+C in order for it to stop receiving packets that are not an answer.
I've tried the approaches from both "Python-Scapy or the like-How can I create an HTTP GET request at the packet level" and "Scapy: no reply on raw ICMP packet", with no luck.
I have tried tracking it in Wireshark, but nothing showed up.
I know Scapy is not made for Windows, so that could be the issue. If so, I can get a Linux environment instead.
NOTE: I am running this through the console version of Scapy, but I got exactly the same results running it through Python scripts.
IP(dst="www.google.com")/TCP(dport=80,flags="S")
In Scapy, this TCP packet's sport defaults to 20 (ftp-data).
Worse, www.google.com does not reply to packets sent from sport=20.
Try:
IP(dst="www.google.com")/TCP(sport=65000,dport=80,flags="S")
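As a script, the same probe might be sketched as below. The helper names are made up, the port range is simply the usual dynamic range rather than anything Scapy requires, and the call needs Scapy installed plus administrator/root privileges. Passing `timeout` to `sr1()` also avoids having to interrupt it with CTRL+C when no answer arrives.

```python
import random


def random_source_port(rng=random):
    """Pick a port from the dynamic/private range (49152-65535)."""
    return rng.randint(49152, 65535)


def syn_probe(host, dport=80, timeout=2):
    """Send one SYN and return the reply, or None if nothing arrives in time."""
    # Imported here so the helper above works without Scapy installed.
    from scapy.all import IP, TCP, sr1

    packet = IP(dst=host) / TCP(sport=random_source_port(), dport=dport, flags="S")
    return sr1(packet, timeout=timeout, verbose=0)
```

If `syn_probe("www.google.com")` still returns None, the next thing to check would be whether the packet leaves the machine at all, for example in Wireshark.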

Windows HTTP issue

I use HTTP to send binary data to a server (a PUT request with Content-Type: application/octet-stream). Until recently this worked fine.
But now I am getting a 504 HTTP error on Windows (I have tried several Windows machines).
I have tried everything I could think of to correct this behavior. When I capture the request with Fiddler, I see that the full request has been sent to the server, but the server does not respond.
If I send exactly the same request from a Linux machine, it still works as before.
In addition, I noticed that the Windows machines work correctly (same PUT request and Content-Type: application/octet-stream) when the request body contains only literal characters.
Any idea what I can do about this? Is it a known issue?
I solved this issue. The problem wasn't on Windows.
It came from the server's kernel version (kernel 3.0.23).
When I updated the kernel (Ubuntu Server, from 3.0.23 to 3.0.24), it started working again.
It was most likely a problem in how TCP packets were recognized.
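To narrow down a problem like this, it can help to send a text body and a binary body from the same client and compare the results. Below is a minimal Python sketch with the standard library; the URL is a placeholder and `build_put` is a name invented for this example.

```python
import urllib.request


def build_put(url, payload):
    """Build a PUT request carrying `payload` (bytes) as application/octet-stream."""
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


# Hypothetical endpoint; replace with the server under test.
URL = "http://server.example/upload"
text_request = build_put(URL, b"only plain letters")
binary_request = build_put(URL, bytes(range(256)))  # every possible byte value
```

Each request can be sent with `urllib.request.urlopen(request, timeout=30)`. A 504 would surface as `urllib.error.HTTPError`, so running both requests from a Windows and a Linux client shows whether only the binary body fails.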
