Get the source code of a website using lynx - bash

How can I access the source code of a page protected with a login form, using lynx, w3m, links, etc.?
lynx -source -auth=user:pass domain.com
lynx -source -accept_all_cookies -auth=user:pass domain.com
lynx -accept_all_cookies -auth=user:pass domain.com
all fail me.
Thanks.

What about:
lynx --source -accept_all_cookies -auth=user:pass "domain.com"
The -- and the quotation marks do the trick for me sometimes.
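If the page is protected with HTTP basic auth rather than an HTML form, curl gives you the same thing and can be easier to debug (just a sketch; domain.com is the placeholder from the question):
# Dump the raw HTML behind HTTP basic auth; -L follows redirects, -s hides the progress bar
curl -sL -u user:pass "https://domain.com"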

If there is a login form in front of the page, you cannot get past it with lynx or similar applications; you actually need to write a script. Use something like the Mechanize module in either Perl or Python, something like this:
import mechanize

browser = mechanize.Browser()
browser.open('http://domain.com/login')   # URL of the page that serves the login form
browser.select_form(nr=0)                 # select the first form on the page
browser.form['username'] = 'USERNAME'     # field names must match the form's input names
browser.form['password'] = 'PASSWORD'
browser.submit()
print(browser.response().read())          # source of the page behind the form
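Roughly the same thing can be done from the shell by letting curl submit the form and keep the session cookie. This is only a sketch: the login URL and the field names (username, password) are assumptions and must match whatever the actual form uses.
# Submit the login form once and store the session cookie
curl -c cookies.txt -d "username=USER&password=PASS" "https://domain.com/login"
# Reuse the cookie to fetch the source of the protected page
curl -b cookies.txt "https://domain.com/protected-page"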

Related

Cannot download a file on OneDrive programmatically from Japan?

I made a script that downloads several files located in my professional OneDrive. This script works perfectly from a French computer and from a US computer, but it does not work from a Japanese computer.
To help you understand the problem, I will detail the program:
1- I set up the token system (inspired by Jay Lee's detailed answer) and retrieve the token in the access_token variable.
2- To download the file, in my case I cannot use
curl -w %{time_total} https://graph.microsoft.com/v1.0/me/drive/items/01M...WU/content -H "Authorization: Bearer $access_token"
Thus, this is how I proceed:
#I get the item properties
itemProperties=$(curl ${ODf1Mb} -H "Authorization: Bearer $access_token")
#In these properties I select the downloadUrl that will permit me to download the file
downloadUrl=$(echo -e "$itemProperties" | grep "#microsoft.graph.downloadUrl" | awk -F'[",]' '{ print $9 }')
#Finally I execute this URL storing the download time in a variable (I do all this stuff for this)
dload=$(curl -w %{time_total} ${downloadUrl} -H "Authorization: Bearer $access_token")
As I said at the beginning, it works on the French and US computers but not on the Japanese machine. I do get the itemProperties and the downloadUrl, but when I call the downloadUrl with curl it seems that it cannot reach the server: the curl progress output does not even show the total size to be downloaded. By comparison, on the French machine the same call completes normally.
I know there is a warning about the command substitution, but I haven't tried to fix it yet because it does its job.
Note -> the downloadUrl has this format:
https://lpl-my.sharepoint.com/personal/{user}_{company infra domain}_com/_layouts/15/download.aspx?
I just cannot figure out what the problem is. I can access https://lpl-my.sharepoint.com through the browser, so I don't think the server IP is banned.
Check your ping / traceroute to see if lpl-my.sharepoint.com resolves to the same network location.
Also, I have seen other folks run curl with -v to get verbose traces and see what the difference is.
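For example, something along these lines run on both the French and the Japanese machine (a sketch; it reuses the downloadUrl variable from the question):
# Check whether the hostname resolves to the same front end from both locations
nslookup lpl-my.sharepoint.com
traceroute lpl-my.sharepoint.com
# Re-run the failing download with verbose output to inspect DNS, TLS and redirects
curl -v -w '%{time_total}\n' "${downloadUrl}" -o /dev/null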

Proxy for Ruby HTTP traffic

I have a ruby script, that posts data to a URL:
require 'httparty'
data = {}
data[:client_id] = '123123'
data[:key] = '123321'
url = "http://someserver.com/endpoint/"
response = HTTParty.post(url, :body => data)
Now I am using Charles for sniffing the HTTP traffic. This works great from the browser, but not from the terminal where I run my script:
$ ruby MyScript.rb
How can I tell Ruby or my Terminal.app to use the Charles proxy at http://localhost:88888?
Update: Another solution would be to see the request before it is sent, so that I would not necessarily need the proxy.
Setting the proxy as timmah suggested should work.
Anyway, 88888 is not a valid port! I think you want to use 8888 (the Charles proxy default port).
So the right commands would be:
export http_proxy=localhost:8888
ruby MyScript.rb
If your script were to use https://, you would also (or instead) need to specify an HTTPS proxy like so:
export https_proxy=localhost:8888
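If you prefer not to export the variables for the whole shell session, you can set them just for the one run:
http_proxy=localhost:8888 https_proxy=localhost:8888 ruby MyScript.rb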

Nginx - Password Protect Directory

I want to password protect my entire site. I am running Debian Squeeze. Say I want my username to be "Jane" and my password to be "V3RySEcRe7".
In my app-nginx.conf:
auth_basic "Restricted";
auth_basic_user_file /etc/nginx/htpasswd;
In my shell script I have this:
printf "Jane:$(openssl passwd -1 V3RySEcRe7)\n" >> /etc/nginx/htpasswd
When I go to my site it is password protected, but the credentials I use don't work. Where am I going wrong here?
I'm sure you'd have fixed this by now, but thought I'd add this for others:
The Nginx documentation is a little cryptic on this, but it does mention that the "Apache variant of the MD5-based password algorithm (apr1)" should be used to generate the password hash. So using the -apr1 flag instead of -1 will work:
printf "Jane:$(openssl passwd -apr1 V3RySEcRe7)\n" >> /etc/nginx/htpasswd

How do I execute an HTTP PUT in bash?

I'm sending requests to a third-party API. It says I must send an HTTP PUT to http://example.com/project?id=projectId
I tried doing this with PHP curl, but I'm not getting a response from the server. Maybe something is wrong with my code because I've never used PUT before. Is there a way for me to execute an HTTP PUT from bash command line? If so, what is the command?
With curl it would be something like
curl --request PUT --header "Content-Length: 0" http://website.com/project?id=1
but like Mattias said, you would probably want some data in the body as well, so you would also want to set the Content-Type and the data (and the Content-Length would then be larger).
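For example, a PUT with a body could look roughly like this (a sketch; the URL and JSON payload are placeholders, and curl sets the Content-Length for you when you use --data):
curl --request PUT \
     --header "Content-Type: application/json" \
     --data '{"name": "example"}' \
     "http://website.com/project?id=1"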
If you really want to only use bash it actually has some networking support.
echo -e "PUT /project?id=123 HTTP/1.1\r\nHost: website.com\r\n\r\n" > \
/dev/tcp/website.com/80
But I guess you also want to send some data in the body?
Like Mattias suggested, Bash can do the job without further tools. If you want to send data, you have to set at least "Content-Length". With the variables "host", "port", "resource" and "data" defined, you can do an HTTP PUT with
echo -e "PUT /$resource HTTP/1.1\r\nHost: $host:$port\r\nContent-Length: ${#data}\r\n\r\n$data\r\n" > /dev/tcp/$host/$port
I tested this with a REST API and it works fine.
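If you also want to read the server's reply, the same /dev/tcp device can be opened read/write on a file descriptor (a sketch using the same variables as above):
# Open FD 3 to the target, send the request, then print the response and close
exec 3<>"/dev/tcp/$host/$port"
printf 'PUT /%s HTTP/1.1\r\nHost: %s:%s\r\nContent-Length: %s\r\nConnection: close\r\n\r\n%s' \
  "$resource" "$host" "$port" "${#data}" "$data" >&3
cat <&3    # print the server's response
exec 3>&-  # close the descriptor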

How to scrape a _private_ google group?

I'd like to scrape the discussion list of a private Google group. It's a multi-page list and I might have to do this again later, so scripting sounds like the way to go.
Since this is a private group, I need to log in to my Google account first.
Unfortunately I can't manage to log in using wget or Ruby's Net::HTTP. Surprisingly, Google Groups is not accessible through the ClientLogin interface, so all the code samples are useless.
My ruby script is embedded at the end of the post. The response to the authentication query is a 200 OK, but there are no cookies in the response headers and the body contains the message "Your browser's cookie functionality is turned off. Please turn it on."
I got the same output with wget. See the bash script at the end of this message.
I don't know how to work around this. Am I missing something? Any ideas?
Thanks in advance.
John
Here is the ruby script:
# a ruby script
require 'net/https'
http = Net::HTTP.new('www.google.com', 443)
http.use_ssl = true
path = '/accounts/ServiceLoginAuth'
email='john@gmail.com'
password='topsecret'
# form inputs from the login page
data = "Email=#{email}&Passwd=#{password}&dsh=7379491738180116079&GALX=irvvmW0Z-zI"
headers = { 'Content-Type' => 'application/x-www-form-urlencoded',
'user-agent' => "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/6.0"}
# Post the request and print out the response to retrieve our authentication token
resp, data = http.post(path, data, headers)
puts resp
resp.each {|h, v| puts h+'='+v}
#warning: peer certificate won't be verified in this SSL session
Here is the bash script:
# A bash script for wget
CMD=""
CMD="$CMD --keep-session-cookies --save-cookies cookies.tmp"
CMD="$CMD --no-check-certificate"
CMD="$CMD --post-data='Email=john#gmail.com&Passwd=topsecret&dsh=-8408553335275857936&GALX=irvvmW0Z-zI'"
CMD="$CMD --user-agent='Mozilla'"
CMD="$CMD https://www.google.com/accounts/ServiceLoginAuth"
echo $CMD
wget $CMD
wget --load-cookies="cookies.tmp" http://groups.google.com/group/mygroup/topics?tsc=2
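(A side note on the wget script itself, not necessarily the cause of the cookie error: because $CMD is expanded unquoted, the single quotes inside --post-data and --user-agent reach wget literally instead of being removed by the shell. A sketch using a bash array avoids that:)
ARGS=(--keep-session-cookies --save-cookies cookies.tmp --no-check-certificate)
ARGS+=(--post-data='Email=john@gmail.com&Passwd=topsecret&dsh=-8408553335275857936&GALX=irvvmW0Z-zI')
ARGS+=(--user-agent='Mozilla')
wget "${ARGS[@]}" https://www.google.com/accounts/ServiceLoginAuth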
Have you tried Mechanize for Ruby?
The Mechanize library is used for automating interaction with websites; you could log in to Google and browse your private Google group, saving what you need.
Here is an example where Mechanize is used for Gmail scraping.
I did this previously by logging in manually with Firefox and then used Chickenfoot to automate browsing and scraping.
I found this PHP solution for scraping private Google Groups.

Resources