Why does curl -o output contain sequences like "^[[38;5;250m", when "surf" output looks fine? - bash

I want to save the output of wttr.in to a file with curl. The problem is that the saved output doesn't look the way it does when I just browse to wttr.in.
What I did:
curl wttr.in -o ~/wt.tex and curl wttr.in -o ~/wt
The output looks like this: <output>
It should look like https://wttr.in.

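I solved it myself:
less -r -f -L wt.tex
-r makes less pass raw control characters through, so the terminal interprets the color escape sequences instead of showing them as text.
-f forces less to open the file without asking.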
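If the goal is a plain-text file rather than a colored view, the escape sequences can also be filtered out before saving. A minimal sketch, assuming the output only contains color sequences of the form ESC[...m like the one in the title; strip_ansi is a hypothetical helper name, not part of curl or less:

```shell
#!/bin/sh
# strip_ansi: hypothetical helper that removes color sequences (ESC [ ... m)
# from stdin. Other escape sequences (cursor movement etc.) are left untouched.
strip_ansi() {
    esc=$(printf '\033')          # literal ESC character, built portably
    sed "s/${esc}\[[0-9;]*m//g"
}

# Demonstration on a sample line (no network needed):
printf '\033[38;5;250mSunny\033[0m +21 C\n' | strip_ansi

# Assumed usage with the command from the question:
#   curl -s wttr.in | strip_ansi > ~/wt
```

The less -r approach keeps the colors; this one trades them for a file that reads cleanly in any editor.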

Related

return filename after downloading file with curl

I would like to capture, in a variable, the name of a file downloaded with curl. I am using the --remote-name flag to preserve the filename, as shown below.
My code:
file1=$(curl -O --remote-name 'https://url.com/download_file.tgz')
echo $file1
You can use the -w|--write-out switch of curl:
file1="$(curl -O --remote-name -s \
-w "%{filename_effective}" "https://url.com/download_file.tgz")"
echo "$file1"
file1=download_file.tgz
url=https://url.com/$file1 #encoding this might be necessary
curl -O --remote-name "$url"
echo "$file1"
If, to construct the URL, you need to know the filename that you want to download, then you don't need anything from curl to identify the file that it downloaded unless there is not a 1:1 relationship between the basename of the URL and file that was downloaded.
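When that 1:1 relationship does hold, the name can be derived from the URL with plain parameter expansion instead of asking curl. A sketch, assuming the URL does not end in a slash; url_basename is a hypothetical helper name:

```shell
#!/bin/sh
# url_basename: hypothetical helper that prints the last path component of a
# URL, ignoring any query string or fragment.
url_basename() {
    u=${1%%[?#]*}                # drop "?query" and "#fragment", if present
    printf '%s\n' "${u##*/}"     # keep everything after the last "/"
}

url='https://url.com/download_file.tgz'
file1=$(url_basename "$url")
echo "$file1"
```

If the server redirects or the name is chosen some other way, the -w "%{filename_effective}" approach is the safer one.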

iterate through specific files using webHDFS in a bash script

I want to download specific files from an HDFS directory, namely those whose names start with "total_conn_data_". Since there are many files, I want to write a bash script.
Here's what I do:
myPatternFile="total_conn_data_*.csv"
for filename in `curl -i -X GET "https://knox.blabla/webhdfs/v1/path/to/the/directory/?OP=LISTSTATUS" -u username`; do
curl -i -X GET "https://knox.blabla/webhdfs/v1/path/to/the/directory/$filename?OP=OPEN" -u username -L -o "./data/$filename" -k;
done
But it does not work, since curl -i -X GET "https://knox.blabla/webhdfs/v1/path/to/the/directory/?OP=LISTSTATUS" -u username returns JSON text, not file names.
How should I do this? Thanks
The LISTSTATUS call returns its output in JSON format only. You will have to use another tool, such as jq or sed, to parse that output and get the list of files.
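For illustration: a LISTSTATUS response is an object of the form {"FileStatuses":{"FileStatus":[...]}}, where each entry carries the file name in pathSuffix. A sketch using jq, assuming jq is installed; the sample JSON is made up, and list_matching is a hypothetical helper name:

```shell
#!/bin/sh
# list_matching: hypothetical helper. Reads a WebHDFS LISTSTATUS response on
# stdin and prints the names of regular files that start with the given prefix.
list_matching() {
    jq -r --arg p "$1" '
        .FileStatuses.FileStatus[]
        | select(.type == "FILE")
        | .pathSuffix
        | select(startswith($p))'
}

# Made-up sample response (a real one has more fields per entry):
sample='{"FileStatuses":{"FileStatus":[
  {"pathSuffix":"total_conn_data_01.csv","type":"FILE"},
  {"pathSuffix":"other.csv","type":"FILE"},
  {"pathSuffix":"total_conn_data_old","type":"DIRECTORY"}]}}'

if command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$sample" | list_matching total_conn_data_
else
    echo "jq is not installed" >&2
fi

# Assumed use in the loop from the question (note: no -i, so that only the
# JSON body reaches jq, not the HTTP headers):
#   curl -s -u username "https://knox.blabla/webhdfs/v1/path/to/the/directory/?OP=LISTSTATUS" |
#       list_matching total_conn_data_ |
#       while IFS= read -r filename; do ...; done
```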

wget to parse a webpage in shell

I am trying to extract URLS from a webpage using wget. I tried this
wget -r -l2 --reject=gif -O out.html www.google.com | sed -n 's/.*href="\([^"]*\).*/\1/p'
It displays FINISHED
Downloaded: 18,472 bytes in 1 files
But it does not display the web links. If I run the two steps separately
wget -r -l2 --reject=gif -O out.html www.google.com
sed -n 's/.*href="\([^"]*\).*/\1/p' < out.html
Output
http://www.google.com/intl/en/options/
/intl/en/policies/terms/
It is not displaying all of the links:
http://www.google.com
http://maps.google.com
https://play.google.com
http://www.youtube.com
http://news.google.com
https://mail.google.com
https://drive.google.com
http://www.google.com
http://www.google.com
http://www.google.com
https://www.google.com
https://plus.google.com
Moreover, I want to get links from the 2nd level and deeper. Can anyone give a solution for this?
Thanks in advance
The -O file option captures the output of wget and writes it to the specified file, so there is no output going through the pipe to sed.
You can say -O - to direct wget output to standard output.
If you don't want to use grep, you can try
sed -n "/href/ s/.*href=['\"]\([^'\"]*\)['\"].*/\1/gp"
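Note that these sed commands print at most one link per input line, because the leading .* is greedy; a page that puts several links on one line loses all but the last. A sketch of a grep-based variant, assuming double-quoted href attributes and a grep that supports -o (GNU and BSD grep do); extract_links is a hypothetical helper name:

```shell
#!/bin/sh
# extract_links: hypothetical helper that prints every double-quoted href value
# found on stdin, one per line, even when several share one input line.
extract_links() {
    grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'
}

# Demonstration on a made-up HTML line with two links:
printf '%s\n' '<a href="http://maps.google.com">Maps</a> <a href="/intl/en/policies/terms/">Terms</a>' |
    extract_links

# Assumed usage with wget writing to standard output:
#   wget -q -O - www.google.com | extract_links
```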

Shell script to batch download files using curl + cookie and merge those files

I have a list of urls to files that I want to download and join. Those can only be accessed when authenticated.
So first I call:
curl -c cookie.txt http://url.to.authenticate
Then I can download a file file1 using the cookie:
curl -b cookie.txt -O http://url.to.file1
At the end I would just use cat:
cat file1 file2 file3 ... > file_merged
I have 320 of those urls stored in a text file and want to create a script with these urls included in the script, so all I need is to copy the script to a remote computer and execute it.
I am not that good at shell scripting and would love it if someone could help me out a bit.
Maybe something a little more fail-proof than
#!/bin/sh
curl -c cookie.txt http://url.to.authenticate
curl -b cookie.txt -O http://url.to.file1
curl -b cookie.txt -O http://url.to.file2
curl -b cookie.txt -O http://url.to.file3
...
cat file1 file2 file3 ... file320 > file_merged
So, something like (if your list of files is stored in files.txt):
#!/bin/sh
curl -c cookie.txt http://url.to.authenticate
while IFS= read -r f; do
curl -b cookie.txt -O http://url.to."$f"
cat "$f" >> file_merged
rm -f "$f"
done < files.txt
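Since the question asks for the URLs to live inside the script itself, the list can also be fed from a here-document. A sketch under that assumption; fetch_one and merge_urls are hypothetical helper names, and curl's -f and -sS options are added so that an HTTP error aborts the run instead of merging an error page:

```shell
#!/bin/sh
# fetch_one: hypothetical helper that writes one URL's body to stdout,
# sending the cookies saved by the login step. -f makes curl fail on HTTP errors.
fetch_one() {
    curl -fsS -b cookie.txt "$1"
}

# merge_urls: hypothetical helper. Reads URLs from stdin and appends each
# download to the file named by $1, stopping at the first failure.
merge_urls() {
    out=$1
    : > "$out"                        # start with an empty output file
    while IFS= read -r url; do
        [ -n "$url" ] || continue     # skip blank lines
        if ! fetch_one "$url" >> "$out"; then
            echo "download failed: $url" >&2
            return 1
        fi
    done
}

# Assumed usage, with the URL list embedded in the script:
#   curl -c cookie.txt http://url.to.authenticate
#   merge_urls file_merged <<'EOF'
#   http://url.to.file1
#   http://url.to.file2
#   EOF
```

Streaming each download straight into the merged file avoids the temporary per-file copies and the rm step.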

How to download multiple URLs using wget using a single command?

I am using the following command to download a single webpage with all its images and JS using wget on Windows 7:
wget -E -H -k -K -p -e robots=off -P /Downloads/ http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html
It downloads the HTML as required, but when I passed it a text file with a list of 3 URLs to download, it didn't produce any output. This is the command I am using:
wget -E -H -k -K -p -e robots=off -P /Downloads/ -i ./list.txt -B 'http://'
I tried this also:
wget -E -H -k -K -p -e robots=off -P /Downloads/ -i ./list.txt
In this case the URLs in the text file had http:// prepended.
list.txt contains a list of 3 URLs that I need to download using a single command. Please help me resolve this issue.
From man wget:
2 Invoking
By default, Wget is very simple to invoke. The basic syntax is:
wget [option]... [URL]...
So, just use multiple URLs:
wget URL1 URL2
Or using the links from comments:
$ cat list.txt
http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html
http://www.verizonwireless.com/smartphones-2.shtml
http://www.att.com/shop/wireless/devices/smartphones.html
and your command line:
wget -E -H -k -K -p -e robots=off -P /Downloads/ -i ./list.txt
works as expected.
First create a text file with the URLs that you need to download, e.g. download.txt.
download.txt will look like this:
http://www.google.com
http://www.yahoo.com
then use the command wget -i download.txt to download the files. You can add many URLs to the text file.
If you have a list of URLs separated on multiple lines like this:
http://example.com/a
http://example.com/b
http://example.com/c
but you don't want to create a file and point wget to it, you can do this:
wget -i - <<< 'http://example.com/a
http://example.com/b
http://example.com/c'
Pedantic version:
for x in {'url1','url2'}; do wget "$x"; done
The advantage is that you can treat it as a single wget URL command.
