How can I open a list of URLs on Windows

I'm looking for a way to open a list of URLs in all of my browsers (Firefox, Chrome, and IE) on Windows using a scriptable shell such as PowerShell or Cygwin.
Ideally I should be able to type the URLs as arguments to the command, e.g. `openUrl http://example.net http://example2.net http://example3.com ...`
I would also need this script to pass authentication info in the HTTP header (encoded username and password).

With Chrome it's not hard:
$chrome = (gi ~\AppData\Local\Google\Chrome\Application\chrome.exe ).FullName
$urls = "stackoverflow.com","slate.com"
$urls | % { & $chrome $_ }

First, how to open URLs in PowerShell. Opening a URL in PowerShell is very simple: just use start
start http://your.url.com
I think you can simply use foreach to handle the list of URLs.
Second, passing authentication via the URL. There is a standard way to do this for HTTP-based authentication (not HTML form-based). You can construct the URL like:
http://username:password@your.url.com
Again, this only works for HTTP-based authentication.

Look at HKCR\http\shell\open\command to see how each browser handles URLs. Then just use the normal methods to launch the browsers with the appropriate URLs.
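Pulling these answers together, here is a minimal Python sketch of the idea (not from the original answers): it reads the default http handler command from HKCR\http\shell\open\command, embeds basic-auth credentials into each URL, and launches every browser it can find. The browser paths and the credentials are illustrative assumptions.

import os
import subprocess
import winreg
from urllib.parse import urlsplit, urlunsplit

def with_basic_auth(url, user, password):
    # Insert user:password@ into the URL; only useful for HTTP basic auth.
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=f"{user}:{password}@{parts.netloc}"))

def default_http_command():
    # Read how the default browser is launched, per the registry hint above.
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, r"http\shell\open\command") as key:
        return winreg.QueryValue(key, None)

browsers = [  # assumed install locations; adjust for your machine
    os.path.expandvars(r"%LocalAppData%\Google\Chrome\Application\chrome.exe"),
    r"C:\Program Files\Mozilla Firefox\firefox.exe",
    r"C:\Program Files\Internet Explorer\iexplore.exe",
]
urls = ["http://example.net", "http://example2.net"]

print("Default handler command:", default_http_command())
for browser in browsers:
    if os.path.exists(browser):
        for url in urls:
            subprocess.Popen([browser, with_basic_auth(url, "user", "secret")])

Note that credentials in the URL only cover HTTP basic auth, and some browsers (IE in particular) block user:password@ URLs by default.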

Related

How to use multiple proxies when crawling with scrapy + splash?

We crawl with scrapy + splash and want to use multiple proxies, but Splash proxy profiles only support a single proxy (https://splash.readthedocs.io/en/stable/api.html#proxy-profiles):
[proxy]
; required
host=proxy.crawlera.com
port=8010
; optional, default is no auth
username=username
password=password
; optional, default is HTTP. Allowed values are HTTP and SOCKS5
type=HTTP
How can we use multiple proxies when crawling with scrapy + splash?
There are several options:
use multiple profiles (as Rafael Almeida suggested in comment);
pass a different proxy URL with each request (see http://splash.readthedocs.io/en/stable/api.html#arg-proxy) - a sketch of this approach follows after this list;
write a Splash Lua script and use request:set_proxy in the splash:on_request callback - there is an example in the docs. This way you can set a different proxy for different requests initiated by a page, not just a single proxy per rendered page. I'm not aware of a way to do that in other browser automation tools like phantomjs or selenium.
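For the second option, a minimal sketch of a spider that passes a different proxy URL with each Splash request (the proxy pool, spider name, and scrapy_splash usage are illustrative assumptions based on the linked docs, not part of the original answer):

import itertools

import scrapy
from scrapy_splash import SplashRequest  # assumes scrapy-splash is installed and configured

# Illustrative proxy pool; replace with your own proxies.
PROXIES = itertools.cycle([
    "http://user:pass@proxy1.example.com:8010",
    "http://user:pass@proxy2.example.com:8010",
])

class RotatingProxySpider(scrapy.Spider):
    name = "rotating_proxy"
    start_urls = ["http://quotes.toscrape.com/"]

    def start_requests(self):
        for url in self.start_urls:
            # Splash accepts a per-request 'proxy' argument (api.html#arg-proxy).
            yield SplashRequest(url, self.parse, args={"proxy": next(PROXIES)})

    def parse(self, response):
        self.logger.info("Rendered %s", response.url)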

How do I copy cookies from Chrome?

I am using bash to POST to a website that requires me to be logged in first, so I need to send the request with my login cookie. I tried logging in and keeping the cookies, but that doesn't work because the site uses JavaScript to hash the password in a really weird fashion, so instead I'm going to just take my login cookies for the site from Chrome. How do I get the cookies from Chrome and format them for curl?
I'm trying to do this:
curl --request POST -d "a=X&b=Y" -b "what goes here?" "site.com/a.php"
Hit F12 to open the developer console (Mac: Cmd+Opt+J)
Look at the Network tab.
Do whatever you need to on the web site to trigger the action you're interested in
Right click the relevant request, and select "Copy as cURL"
This will give you the curl command for the action you triggered, fully populated with cookies and all. You can of course also copy the flags as a basis for new curl commands.
In Chrome:
Open web developer tools (view -> developer -> developer tools)
Open the Application tab (on older versions, Resources)
Open the Cookies tree
Find the cookie you are interested in.
In the terminal
add --cookie "cookiename=cookievalue" to your curl request.
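If you end up scripting this rather than typing the curl command by hand, the same copied cookie can be sent from Python; a sketch (the cookie name, value, and endpoint are placeholders taken from the question, and the requests package is an assumption):

import requests  # third-party; pip install requests

# Same idea as `curl -b "cookiename=cookievalue"`: send the copied cookie with the POST.
resp = requests.post(
    "https://site.com/a.php",               # placeholder endpoint from the question
    data={"a": "X", "b": "Y"},
    cookies={"cookiename": "cookievalue"},  # paste the value copied from Chrome
)
print(resp.status_code, resp.text[:200])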
There's an even easier way to do this in Chrome/Chromium.
The open source Chrome extension
cookies.txt exports cookie data in a cookies.txt file, and generates an optional ready-made wget command.
*I have nothing to do with the extension, it just works really well.
Can't believe no one has mentioned this. Here's the easiest and quickest way.
Open your browser's developer tools, click on the Console tab, and within the console type the following and press Enter...
console.log(document.cookie)
The result is printed immediately in the proper format. Just highlight it and copy it.
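The string that console.log(document.cookie) prints is already in the name=value; name2=value2 form that curl's -b flag accepts. If you want it as a structured object instead, a small sketch (the pasted cookie string is a placeholder):

from http.cookies import SimpleCookie

# Paste the output of console.log(document.cookie) here.
raw = "sessionid=abc123; csrftoken=xyz789"

jar = SimpleCookie()
jar.load(raw)
cookies = {name: morsel.value for name, morsel in jar.items()}
print(cookies)  # {'sessionid': 'abc123', 'csrftoken': 'xyz789'}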
I was curious whether others were seeing that Chrome's "Copy as cURL" feature no longer includes cookies.
It then occurred to me that this is likely a security measure. If you visit example.com, requests copied as curl to example.com will include cookies. However, requests copied for other domains or subdomains will have the cookies sanitized; a.example.com or test.com will not get cookies, for example.
For anyone that wants all of the cookies for a site, but doesn't want to use an extension:
Open developer tools -> Application -> Cookies.
Select the first cookie in the list and hit Ctrl/Cmd-A
Copy all of the data in this table with Ctrl/Cmd-C
Now you have a TSV (tab-separated value) string of cookie data. You can process this in any language you want, but in Python (for example):
import io
import numpy as np
import pandas as pd

cookie_str = """[paste cookie str here]"""
# Copied from the developer tools window.
cols = ['name', 'value', 'domain', 'path', 'max_age', 'size', 'http_only', 'secure', 'same_party', 'priority']
# Parse into a dataframe.
df = pd.read_csv(io.StringIO(cookie_str), sep='\t', names=cols, index_col=False)
Now you can export them in Netscape format:
# Fill in NaNs and format True/False for cookies.txt.
df = df.fillna(False).assign(flag=True).replace({True: 'TRUE', False: 'FALSE'})
# Get unix timestamp from max_age.
max_age = (
    df.max_age
    .replace({'Session': np.nan})
    .pipe(pd.to_datetime))
start = pd.Timestamp("1970-01-01", tz='UTC')
max_age = (
    ((max_age - start) // pd.Timedelta('1s'))
    .fillna(0)      # Session expiry becomes 0
    .astype(int))   # Floats would end with ".0"
df = df.assign(max_age=max_age)
cookie_file_cols = ['domain', 'flag', 'path', 'secure', 'max_age', 'name', 'value']
with open('cookies.txt', 'w') as fh:
    # Python's cookiejar wants this header.
    fh.write('# Netscape HTTP Cookie File\n')
    df[cookie_file_cols].to_csv(fh, sep='\t', index=False, header=False)
And finally, back to the shell:
# Get user agent from navigator.userAgent in devtools
wget -U $USER_AGENT --load-cookies cookies.txt $YOUR_URL
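If you would rather stay in Python for the final request too, the generated cookies.txt can be loaded with the standard library's cookiejar; a sketch (the target URL and user-agent string are placeholders):

import urllib.request
from http.cookiejar import MozillaCookieJar

jar = MozillaCookieJar('cookies.txt')   # the file written above
jar.load(ignore_discard=True, ignore_expires=True)

opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
opener.addheaders = [('User-Agent', 'paste navigator.userAgent here')]
with opener.open('https://site.com/a.php') as resp:
    print(resp.status, resp.read()[:200])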

How can I rewrite URLs in the Zeus web server for Mobile useragent?

I need to redirect anyone with a mobile user agent to a file called mobile.php.
My web hosting provider, Net Registry uses the Zeus web server.
Here's the script I've written from my research
RULE_1_START:
# get the document root
map path into SCRATCH:DOCROOT from /
match IN:User-Agent into $ with iPad|iPod|iPhone|Android|\s+Mobile
if matched then
    set OUT:Location = /mobile.php
endif
RULE_1_END:
I used the instructions on my host's site.
I pasted that into their console and it has worked to do redirects. Net Registry has some odd console that you submit to, and it takes 10 minutes to update the Zeus server config (annoying as hell).
Anyway, my issue is that it redirects me to the wrong location:
So if you visit the site with a user agent string that contains ipad|ipod|android|\s+mobile, then you will trigger it.
It takes me here:
http://example.com.au/mobile.php,%20catalog/index.php
I can't work out how to fix that, or why that happens because at the moment this file exists:
http://example.com.au/mobile.php
as does this one:
http://example.com.au/index.php. Contents of this file are:
<?php header("Location: catalog/index.php");
Any ideas on how I can make this work more like an apache .htaccess url Rewrite?
See the official Zeus documentation.
Fixed it by changing
set OUT:Location = /mobile.php
to
set URL = /mobile.php
From the manual...
Using Request Rewrite Scripts
To use the request rewriting functionality, create a script in the Zeus Request
Rewrite Scripting Language. The script contains instructions telling the
Virtual Server how to change the URL or headers of requests that match specified criteria.
The Virtual Server compiles the script, and (if the rewrite functionality is
enabled) uses it every time it receives a request. It runs the commands in the
script, changing the URL if it matches the specified criteria. Once the script is
finished, the Virtual Server continues processing the resulting URL.
When changing the URL, the rewrite functionality can only change the local
part of it, that is, the part of the URL after the host name. For example, if a
user requests http://www.myhost.com/sales/uk.html, the rewrite
functionality can only make changes to /sales/uk.html. This means that
you cannot use the rewrite functionality to change the request to refer to a
file on another Virtual Server.
For example, the following script illustrates how to change requests for any
HTML files in the /sales directory so that the user receives them from the
/newsales directory instead:
match URL into $ with ^/sales/(.*)\.html
if matched set URL=/newsales/$1.html
The rewrite functionality can also change the HTTP headers that were received
with a request, and create new HTTP headers to be returned to the user. For
example, the following script changes the HTTP host header, so that a request
for www.mysite.com/subserver is redirected to the Subserver
www.subserver.mysite.com:
match URL into $ with ^/([^/]+)/(.*)$
if matched then
    set IN:Host = www.$1.mysite.com
    set URL = /$2
endif

How do I find out what my external IP address is?

My computers are sitting behind a router/firewall. How do I programmatically find out what my external IP address is? I can use http://www.whatsmyip.org/ for ad-hoc queries, but the TOS don't allow for automated checks.
Any ideas?
http://ipecho.net/plain appears to be a workable alternative, as whatismyip.com now requires membership for their automated link. They very kindly appear to be offering this service for free, so please don't abuse it.
Unfortunately there is no easy way to do it.
I would use a site like www.whatsmyip.org and parse the output.
checkip.dyndns.com returns a very simple HTML file which looks like this:
<html>
<head>
<title>Current IP Check</title>
</head>
<body>
Current IP Address: 84.151.156.163
</body>
</html>
This should be very easy to parse.
Moreover, the site has existed for about ten years.
There is hope that it will be around for a while.
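A minimal Python sketch of that parsing (the regex and the use of urllib are my own, not part of the original answer; the service just returns the line shown above):

import re
import urllib.request

# checkip.dyndns.com returns "Current IP Address: x.x.x.x" inside a tiny HTML page.
with urllib.request.urlopen("http://checkip.dyndns.com") as resp:
    html = resp.read().decode()

match = re.search(r"Current IP Address:\s*([\d.]+)", html)
if match:
    print(match.group(1))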
If you have access to a webserver with modphp, you can roll your own:
<?php print $_SERVER['REMOTE_ADDR']; ?>
If you don't want that to get abused, you'll have to keep it secret or add request limits.
I've been using one on my server for years.
Explicitly:
Create a file called whatismyip.php in your public_html folder in your website. It can be called anything and be anywhere in your webroot.
Add the line above, and then query your server:
curl http://example.com/whatismyip.php
for example.
Unfortunately, as of 2013, whatismyip.com charges for the service.
http://www.icanhazip.com is still going strong, 3 years later. Just outputs the IP as text, absolutely nothing else.
http://checkip.dyndns.org still works as well.
You can also use Google if you want to be sure it won't go down, but it can still block you for TOS violations.
https://www.google.ie/search?q=whats+is+my+ip
But even when they block me, they still tell me my client IP address in the error message.
curl ifconfig.me
or
curl ifconfig.me/ip
In case you don't have curl installed,
wget ifconfig.me/ip 2>/dev/null && cat ip
Hope this helps.
If the router you are behind speaks UPnP, you could always use a UPnP library for whatever language you are developing in to query the router for its external IP.
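For example, with the Python bindings of the miniupnpc library (an assumption on my part, and it only works if your router's IGD answers UPnP discovery):

import miniupnpc  # pip install miniupnpc

upnp = miniupnpc.UPnP()
upnp.discoverdelay = 200        # milliseconds to wait for discovery replies
devices = upnp.discover()       # broadcast and collect IGD candidates
if devices:
    upnp.selectigd()            # pick the Internet Gateway Device
    print(upnp.externalipaddress())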
http://myexternalip.com provides this kind of information. To retrieve your IP you have plenty of options:
http://myexternalip.com/ - browser + lots of examples of how to use it
http://myexternalip.com/raw - a pure text answer, only your IP, no other crap
http://myexternalip.com/json - a response ready for JSON parsers, also supports JSONP
HEAD http://myexternalip.com - send only a HEAD request and get the answer
Since this question was asked a while back, there's now a freely available web service designed specifically to allow you to determine your IP address programmatically, called ipify.
$ curl 'https://api.ipify.org?format=json'
Results in
{"ip": "1.2.3.4" /* your public IP */}
Another way: if you have access to a cloud email account (Yahoo, Google, Hotmail), send yourself an email. Then view the headers and you should see your IP address in there.
I would look up the exact field, but the headers vary between implementations; look for the received-by entries and follow them until you get to something that looks like sent-by.
EDIT: This answers how to find the IP address manually, not the programmatic approach.
My WRT54G router tells me through its Local Router Access feature (the http(s) administration interface), and I imagine something similar could be done with many other devices. In this case, the entry page gives the octets of the IPv4 address in four lines containing this phrase:
class=num maxLength=3 size=3 value='i' name='wan_ipaddr_N' id='wan_ipaddr_N'
Where i is the octet value and N is the octet number. This bit of doggerel fetches and parses it for me, courtesy of cygwin:
#! /usr/bin/env perl
use strict;
use warnings 'all';

my( $account, $password ) = @ARGV;

open QUERY,
    "curl --sslv3 --user '$account:$password' https://Linksys/ --silent |"
    or die "Failed to connect to router";

my @ipaddr = ('x','x','x','x');
while( <QUERY> ) {
    $ipaddr[$2] = $1 if /value='(\d+)' name='wan_ipaddr_([0-3])/;
}
close QUERY;

print join('.', @ipaddr);
There is no guarantee that this will work with all versions of the router firmware.
If your router is set to use http for this interface, drop the --sslv3 curl option, and you can use dotted-decimal notation to address the router. To use https with the curl options above, I also did this:
Used a browser to fetch the router's self-signed certificate (saved as Linksys.crt).
Added it to my CA bundle:
openssl x509 -in Linksys.crt -text >> /usr/ssl/certs/ca-bundle.crt
Added 'Linksys' to my hosts file (C:\Windows\System32\Drivers\etc\HOSTS on my Win8 box), as an alias for the router's address. If the dotted-decimal notation is given to curl instead of this alias, it rejects the connection on account of a certificate subject name mismatch.
Alternatively, you could just use the --insecure option to bypass certificate verification, which probably makes more sense in the circumstances.
whatismyip.com or ipchicken.com are very easy to parse.
If you have a webhost or vps you can also determine it, without fear of it randomly going down leaving you stuck.
ifcfg.me allows lookup via nslookup, telnet, ftp, and http, and it even works with IPv6.
Simple but not elegant for this use: I created a VBS file with the following code to drop the result into Dropbox and Google Drive ... I have to delete the file for a new one to sync, though, for some reason.
This runs on a PC at my home. The PC is set to resume after a power outage, and a scheduled task runs this once a day (note that if you have it run often, the site will block your requests).
Now I can get my IP address on the road and watch people steal my stuff :-)
get_html "http://ipecho.net/plain", "C:\Users\joe\Google Drive\IP.html"
get_html "http://ipecho.net/plain", "C:\Users\joe\Dropbox\IP.html"
sub get_html (up_http, down_http)
dim xmlhttp : set xmlhttp = createobject("msxml2.xmlhttp.3.0")
xmlhttp.open "get", up_http, false
xmlhttp.send
dim fso : set fso = createobject ("scripting.filesystemobject")
dim newfile : set newfile = fso.createtextfile(down_http, true)
newfile.write (xmlhttp.responseText)
newfile.close
set newfile = nothing
set xmlhttp = nothing
end sub

HTTP request with custom headers fields using Windows Scripts

Is it possible to perform an HTTP request with specific header fields (like 'Referer', 'Cookie' or 'User-Agent') using Windows Script Host or any other Windows scripting technology?
Thanks.
Yes (VBScript) :-
Dim oWinHTTP
Set oWinHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
oWinHTTP.Open "GET", "http://remoteserver/thing.ext", False
oWinHTTP.SetRequestHeader "User-Agent", "My Agent String"
oWinHTTP.Send
Using WinHttp gives you the greatest level of control; you can use MSXML2.ServerXMLHTTP.3.0 if you want more efficient access to any XML DOM that is sent back. The standard MSXML2.XMLHTTP.3.0 component goes through WinINet, which gives you the user's proxy settings and cookie store etc., but reduces your control over the conversation.
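If Python happens to be the "other Windows scripting technology" at hand, the equivalent request with custom headers looks roughly like this (a sketch; the URL and header values are placeholders echoing the VBScript example):

import urllib.request

req = urllib.request.Request(
    "http://remoteserver/thing.ext",
    headers={
        "User-Agent": "My Agent String",
        "Referer": "http://example.com/",
        "Cookie": "name=value",
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read()[:200])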
Can we set a request header like Cache-Control: max-age=10000 using JSP/Java?
Actually, I want the previous page to show its old data when the browser's back button is clicked, but only if the user comes back to the page within a specified time, say 10 minutes.
Thanks,
Anurag
