wget using sed output in POST request - bash

Fairly new to the world of UNIX and trying to get my head round its quirks.
I am trying to make a fairly simple shell script that uses wget to send an XML file that has been pre-processed with sed.
I thought about using a pipe, but it caused some weird behaviour where it just output my XML to the console.
This is what I have so far:
File_Name=$1
echo "File name being sent to KCIM is : " $1
wget "http://testserver.com" --post-file `sed s/XXXX/$File_Name/ < template.xml` |
--header="Content-Type:text/xml"
From the output I can see I am not doing this right, as it's creating a badly formatted HTTP request:
POST data file <?xml' missing: No such file or directory
Resolving <... failed: node name or service name not known.
wget: unable to resolve host address<'
Bonus points for explaining what the problem is as well as the solution.

For wget, the --post-file option sends the contents of the named file. In your case you are passing the data itself, so you probably want --post-data.
The way you are doing it right now, bash substitutes the output from sed onto the command line, and wget gets something like:
wget ... --post-file <?xml stuff stuff stuff
So wget goes looking for a file called <?xml instead of using that text verbatim.
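A minimal sketch of the corrected script, assuming template.xml holds the XXXX placeholder and testserver.com stands in for the real endpoint; --post-data takes the substituted XML itself rather than a file name:

File_Name=$1
echo "File name being sent to KCIM is: $File_Name"

# the command substitution inserts sed's output as the request body,
# so no intermediate file is needed
wget "http://testserver.com" \
    --header="Content-Type: text/xml" \
    --post-data="$(sed "s/XXXX/$File_Name/" < template.xml)"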

WGET saves with wrong file and extension name possibly due to BASH

I've tried this on a few forum threads already.
However, I keep on getting the same failure as a result.
To replicate the problem :
Here is an url leading to a forum thread with 6 pages.
http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/1/vc/1
What I typed into the console was :
wget "http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/{1..6}/vc/1"
And here is what I got:
--2018-06-14 10:44:17-- http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/%7B1..6%7D/vc/1
Resolving forex.kbpauk.ru (forex.kbpauk.ru)... 185.68.152.1
Connecting to forex.kbpauk.ru (forex.kbpauk.ru)|185.68.152.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: '1'
1 [ <=> ] 19.50K 58.7KB/s in 0.3s
2018-06-14 10:44:17 (58.7 KB/s) - '1' saved [19970]
The file was saved as simply "1", with no extension, it seems.
My expectation was that the file would be saved with an .html extension, because it's a webpage.
I'm trying to get wget to work, but if it's possible to do what I want with curl then I would also accept that as an answer.
Well, there are a couple of issues with what you're trying to do.
The double quotes around your URL actually prevent bash's brace expansion, so you're not really downloading 6 files, but a single URL with "{1..6}" in it. You probably want to drop the quotes around the URL to allow bash to expand it into 6 different parameters.
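For example, you can see what the unquoted expansion produces (using just the tail of the URL for brevity):

$ echo page/{1..6}/vc/1
page/1/vc/1 page/2/vc/1 page/3/vc/1 page/4/vc/1 page/5/vc/1 page/6/vc/1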
I notice that all of the pages are called "1", irrespective of their actual page numbers. This means the server is always serving a page with the same name, making it very hard for Wget or any other tool to actually make a copy of the webpage.
The real way to create a mirror of the forum would be to use this command line:
$ wget -m --no-parent -k --adjust-extension http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/1
Let me explain what this command does:
-m --mirror activates the mirror mode (recursion)
--no-parent asks Wget to not go above the directory it starts from
-k --convert-links will edit the HTML pages you download so that the links in them will point to the other local pages you have also downloaded. This allows you to browse the forum pages locally without needing to be online
--adjust-extension This is the option you were originally looking for. It will cause Wget to save the file with a .html extension if it downloads a text/html file but the server did not provide an extension.
Simply use the -O switch to specify the output filename; otherwise wget just defaults to a name taken from the URL, which in your case is 1.
So if you wanted to call your file what-i-want-to-call-it.html then you would do
wget "http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/{1..6}/vc/1" -O what-i-want-to-call-it.html
If you type wget --help into the console you will get a full list of all the options that wget provides.
To verify it has worked, type the following to output the file:
cat what-i-want-to-call-it.html
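If you actually want all six pages, each saved under its own name, a small loop works (just a sketch combining the unquoted brace expansion with -O; the page-N.html names are made up):

for page in {1..6}; do
    # ${page} expands inside the quotes; -O gives each download an explicit name
    wget "http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/${page}/vc/1" \
        -O "page-${page}.html"
done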

How to combine url with filename from file

Text file (filename: listing.txt) with names of files as its contents:
ace.pdf
123.pdf
hello.pdf
I wanted to download these files from the URL http://www.myurl.com/.
In bash, I tried to merge these together and download the files using wget, e.g.:
http://www.myurl.com/ace.pdf
http://www.myurl.com/123.pdf
http://www.myurl.com/hello.pdf
Tried variations of the following but without success:
for i in $(cat listing.txt); do wget http://www.myurl.com/$i; done
No need to use cat and loop. You can use xargs for this:
xargs -I {} wget http://www.myurl.com/{} < listing.txt
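As a quick sanity check (sketch only), prefix wget with echo to see the commands xargs would run:

xargs -I {} echo wget http://www.myurl.com/{} < listing.txt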
Actually, wget has options which can avoid loops & external programs completely.
-i file
--input-file=file
Read URLs from a local or external file. If - is specified as file, URLs are read from the standard input. (Use ./- to read from a file literally named -.)
If this function is used, no URLs need be present on the command line. If there are URLs both on the command line and in an input file, those on the command lines will be the first ones to be retrieved. If --force-html is not specified, then file should consist of a series of URLs, one per line.
However, if you specify --force-html, the document will be regarded as html. In that case you may have problems with relative links, which you can solve either by adding "<base href="url">" to the documents or by specifying --base=url on the command line.
If the file is an external one, the document will be automatically treated as html if the Content-Type matches text/html. Furthermore, the file's location will be implicitly used as base href if none was specified.
-B URL
--base=URL
Resolves relative links using URL as the point of reference, when reading links from an HTML file specified via the -i/--input-file option (together with --force-html, or when the input file was fetched remotely from a server describing it as HTML). This is equivalent to the presence of a "BASE" tag in the HTML input file, with URL as the value for the "href" attribute.
For instance, if you specify http://foo/bar/a.html for URL, and Wget reads ../baz/b.html from the input file, it would be resolved to http://foo/baz/b.html.
Thus,
$ cat listing.txt
ace.pdf
123.pdf
hello.pdf
$ wget -B http://www.myurl.com/ -i listing.txt
This will download all three files.
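Another option (sketch only; myurl.com is the question's placeholder host) is to build the full URLs with sed and feed them to wget's standard input via -i -:

sed 's|^|http://www.myurl.com/|' listing.txt | wget -i -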

bash, getting the mime type of a file handle

How would I get the mime type of a file handle without saving it to disk?
What I mean is a file that is not saved to disk; rather, I extracted it from an archive and plan on piping it to another script.
Say I extracted the file like this:
tar -xOzf images.tar.gz images/logo.jpg | myscript
Now inside myscript I would like to check the mime type of the file before processing it further. How would I go about this?
As some people think my comment above is helpful, I'm posting it as an answer.
The file command is able to determine a file's mime type on the fly, i.e. when being piped. It is able to read a file from stdin, printing just the file's --mime-type when -b is passed. Considering your example, you probably want to extract a single file from an archive and determine its file/mime type.
$ tar -xOzf foo.tar.gz file_in_archive.txt | file -b --mime-type -
text/plain
So for a simple text file extracted from an archive to stdout, it could look like the example above. Hope that helped. Regards
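If myscript itself needs to branch on the type before doing anything else, one approach (just a sketch; the image/* check and the temp file handling are assumptions, needed because file - consumes the stream) is to buffer stdin first:

#!/bin/bash
# buffer the piped data so we can inspect it and still process it afterwards
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp"

mime=$(file -b --mime-type "$tmp")    # e.g. image/jpeg, text/plain
case "$mime" in
    image/*) echo "got an image ($mime), processing ..." ;;
    *)       echo "unexpected type: $mime" >&2; exit 1 ;;
esac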

Linux Scripting

Can you give me a sample of how to filter for a certain keyword, for example "error", in /var/log/messages and then send an email when it finds that word in real time?
I would just like to watch for the error keyword in /var/log/messages and then send it to my email address.
Simply grep it:
tail -f log.log | grep error
This will list all the errors; you can then mail them.
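Wired up to mail, that could look like this (just a sketch; admin@example.com is a placeholder address and this assumes a mail or mailx command is installed):

tail -F -n0 /var/log/messages | \
while read -r line; do
    # send each matching line as a separate message
    echo "$line" | grep -iq "error" && \
        echo "$line" | mail -s "error in /var/log/messages" admin@example.com
done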
What you can do is this:
On a regular basis (which you decide), you:
copy the main file to another file
DIFF against that file, taking out only the newly added parts (if the file is written sequentially, this will be a nice and clean block of lines at the end of the file)
copy the main file to the other file again (this sets the new reference for the next check)
then GREP for whatever you want in the block of lines you found two steps back
report the found lines, using the wanted method (mail, ...), as in the sketch below
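Something like this (just a sketch, meant to be run periodically, for example from cron; the temp file names and the address are placeholders):

LOG=/var/log/messages
REF=/var/tmp/messages.ref
NEW=/var/tmp/messages.new

touch "$REF"
cp "$LOG" /var/tmp/messages.now
# keep only the lines added since the last run
diff "$REF" /var/tmp/messages.now | grep '^>' | sed 's/^> //' > "$NEW"
mv /var/tmp/messages.now "$REF"          # new reference for the next check

# mail only if an error actually appeared
if grep -i "error" "$NEW" > /var/tmp/messages.err; then
    mail -s "errors in $LOG" admin@example.com < /var/tmp/messages.err
fi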

using wget to log redirected URLs shell script

I was trying to find a way of using wget to log the list of redirected website URLs into one file.
For example:
www.website.com/1234 now redirects to www.newsite.com/a2as4sdf6nonsense
and
www.website.com/1235 now redirects to www.newsite.com/ab6haq7ah8nonsense
Wget does output the redirect, but doesn't log the new location. I get this in the terminal:
HTTP request sent, awaiting response...301 moved permanently
Location: http://www.newsite.com/a2as4sdf6
...
I would just like to capture that new URL to a file.
I was using something like this:
for i in `seq 1 9999`; do
wget http://www.website.com/$i -O output.txt
done
But this outputs the source code of each webpage to that file. I am trying to retrieve only the redirect info. Also, I would like to append a new line to the same output file each time it retrieves a new URL.
I would like the output to look something like:
www.website.com/1234 www.newsite.com/a2as4sdf6nonsense
www.website.com/1235 www.newsite.com/ab6haq7ah8nonsense
...
It's not a perfect solution, but it works:
wget http://tinyurl.com/2tx --server-response -O /dev/null 2>&1 |\
awk '(NR==1){SRC=$3;} /^ Location: /{DEST=$2} END{ print SRC, DEST}'
wget is not a perfect tool for that; curl would be a bit better.
This is how it works: we fetch the URL, but redirect all output (the page content) to /dev/null. We ask for the server response HTTP headers (to get the Location header), then we pass it all to awk.
Note that there might be several redirections. I assumed you want the last one.
Awk gets the URL you asked for from the first line (NR==1) and the destination URL from each Location header. At the end, we print both SRC and DEST as you wanted.
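As a sketch of the curl alternative mentioned above, wrapped in the question's loop (website.com and urls.txt are placeholders; %{redirect_url} reports where the response's redirect points):

for i in $(seq 1 9999); do
    # -s silences progress, -o /dev/null discards the body,
    # -w '%{redirect_url}' prints the Location target of the response
    dest=$(curl -s -o /dev/null -w '%{redirect_url}' "http://www.website.com/$i")
    echo "www.website.com/$i $dest" >> urls.txt
done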
