curl - know if I reached an HTML file from the header - bash

I send a curl command from the shell with the --head flag:
curl -k --head http://www.something.com/whatever
Is there any way to know from the response headers whether this link points to an HTML file that can be shown in a browser, or to another type of downloadable file (PDF, DOC, TXT, etc.)?
Thanks.

The response should contain a Content-Type header; for HTML files it will be:
Content-Type: text/html
See also the list of known MIME types.
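For example, a minimal shell check along these lines (reusing the placeholder URL from the question) inspects the Content-Type header and reports whether the resource is HTML:

# a sketch: -s silences progress output, -I sends a HEAD request,
# -k skips certificate verification as in the question
content_type=$(curl -sIk http://www.something.com/whatever | grep -i '^Content-Type:')
case "$content_type" in
  *text/html*) echo "HTML page - viewable in a browser" ;;
  *)           echo "other type: $content_type" ;;
esac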

Related

How to download a private repository from Gitlab without exposing the access token in the url?

I want to download a private repository from GitLab using a curl command (unfortunately I am obliged to do so). I know that the following works:
curl 'https://gitlab.com/api/v4/projects/2222/repository/archive?private_token=hlpat-21321' > your_project.tar.gz
where 2222 is the project id and hlpat-21321 is the access token (both are made up).
I want to do the same thing but without exposing the access token directly. An idea would be to use stdin, i.e. take the token as input from the user on the command line. How can I do that?
Quoting the curl man page:
Starting in 7.55.0, this option can take an argument in @filename
style, which then adds a header for each line in the input file. Using
@- will make curl read the header file from stdin.
Pass the token as a header instead, and feed the header to curl from stdin:
curl --header @- ...url... <<< "PRIVATE-TOKEN: private_token"
This way you don't need to put the token in the URL.
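To keep the token out of your shell history as well, you can prompt for it first; a minimal sketch, reusing the project id 2222 from the question:

# read the token without echoing it to the terminal
read -rsp "GitLab token: " token; echo
# feed the header to curl on stdin so the token never appears in the URL or in argv
curl --header @- \
  "https://gitlab.com/api/v4/projects/2222/repository/archive" \
  <<< "PRIVATE-TOKEN: $token" > your_project.tar.gz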

Sejda-Console HTML to PDF Conversion

I am attempting to convert HTML documents to PDF format using a bash script. I've found that the Sejda converter does a good job of fully rendering the charts I need, but am having some trouble using it in the console rather than the web interface. Although the documentation at https://www.sejda.com/developers gives an example of how to convert a URL, does anyone know of a similar way to convert a local file in the console?
The HTML to PDF conversion is not available via the sejda-console.
However, you can convert a local file through the sejda.com API, not only URLs, by posting the file's HTML contents.
Here's an example converting HTML code from the command line:
curl https://api.sejda.com/v1/tasks \
  --fail --silent --show-error \
  --header "Content-Type: application/json" \
  --data '{"htmlCode": "<strong>HTML</strong> code here",
           "type": "htmlToPdf"}' > converted.pdf
Disclaimer: I'm one of the developers.

WGET saves with wrong file and extension name possibly due to BASH

I've tried the suggestions from a few forum threads already, but I keep getting the same failure as a result.
To replicate the problem:
Here is a URL leading to a forum thread with 6 pages:
http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/1/vc/1
What I typed into the console was:
wget "http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/{1..6}/vc/1"
And here is what I got:
--2018-06-14 10:44:17-- http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/%7B1..6%7D/vc/1
Resolving forex.kbpauk.ru (forex.kbpauk.ru)... 185.68.152.1
Connecting to forex.kbpauk.ru (forex.kbpauk.ru)|185.68.152.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: '1'
1 [ <=> ] 19.50K 58.7KB/s in 0.3s
2018-06-14 10:44:17 (58.7 KB/s) - '1' saved [19970]
The file was saved simply as "1", with no extension.
My expectation was that the file would be saved with an .html extension, because it's a webpage.
I'm trying to get wget to work, but if it's possible to do what I want with curl then I would also accept that as an answer.
Well, there's a couple of issues with what you're trying to do.
The double quotes around your URL prevent Bash brace expansion, so you're not really downloading 6 files, but a single URL with the literal {1..6} in it (the log shows it URL-encoded as %7B1..6%7D). You need to leave the braces unquoted so Bash can expand them into 6 separate arguments.
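As an illustration (a sketch, not tested against this server), you can quote the fixed parts of the URL while leaving the braces outside the quotes so Bash still expands them:

wget "http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/"{1..6}"/vc/1"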
I notice that all of the pages are called "1", irrespective of their actual page numbers. This means the server is always serving a page with the same name, making it very hard for Wget or any other tool to actually make a copy of the webpage.
The real way to create a mirror of the forum would be to use this command line:
$ wget -m --no-parent -k --adjust-extension http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/1
Let me explain what this command does:
-m --mirror activates the mirror mode (recursion)
--no-parent asks Wget to not go above the directory it starts from
-k --convert-links will edit the HTML pages you download so that the links in them will point to the other local pages you have also downloaded. This allows you to browse the forum pages locally without needing to be online
--adjust-extension This is the option you were originally looking for. It will cause Wget to save the file with a .html extension if it downloads a text/html file but the server did not provide an extension.
Simply use the -O switch to specify the output filename; otherwise wget defaults to the last component of the URL path, which in your case is 1.
So if you wanted to call your file what-i-want-to-call-it.html, you would do:
wget "http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/{1..6}/vc/1" -O what-i-want-to-call-it.html
If you type wget --help into the console, you will get a full list of the options wget provides.
To verify it has worked, type the following to print the downloaded file:
cat what-i-want-to-call-it.html
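If you do want all six pages, each in its own file, a small loop keeps the names distinct; a minimal sketch along the same lines:

# download each page and give it its own .html name
for n in {1..6}; do
  wget "http://forex.kbpauk.ru/showflat.php/Cat/0/Number/107623/page/0/fpart/$n/vc/1" -O "page-$n.html"
done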

Downloading pdf files with wget. (characters after file extension?)

I'm trying to recursively download all .pdf files from a webpage.
The files URL have this format:
"http://example.com/fileexample.pdf?id=****"
I'm using these parameters:
wget -r -l1 -A.pdf http://example.com
wget is rejecting all the files when saving. I get this error when using --debug:
Removing file due to recursive rejection criteria in recursive_retrieve()
I think that's happening because of this "?id=****" after the extension.
But did you try -A "*.pdf*"? According to the wget docs, a wildcard pattern is matched against the entire filename, including the ?id=**** suffix, so this should work.
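Put into the original command, that would look like this (example.com standing in for the real site):

wget -r -l1 -A "*.pdf*" http://example.com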

Set Content-Type of a file created with Ruby's File?

I'm using File.open to create a .csv file on the fly.
But what I need to do is set the Content-Type of the file to binary/octet-stream so that the browser will automatically download it instead of just displaying the contents of it in the browser.
The file itself is created locally and then uploaded to Amazon S3.
Short Answer
There is no way to specify a Content-Type value in the filesystem when you create your file. In fact, this is probably not the best way to achieve your goal.
In order to suggest that a browser download a file rather than display it, you can leave Content-Type: text/csv and add the header Content-Disposition: attachment, or Content-Disposition: attachment; filename=<your custom filename>.csv to also change the filename in the "Save As..." dialog.
Setting Content-Disposition using Paperclip and AWS::S3
To set the Content-Disposition header using Paperclip, you can add a key to your has_attached_file definition: s3_headers.
has_attached_file :spreadsheet,
                  :path => 'perhaps/a/custom/path/:class/:id/:filename',
                  :or_maybe => 'other parameters',
                  :s3_headers => { 'Content-Disposition' => 'attachment' }
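If you upload the file outside Paperclip, the same header can be set at upload time; a hedged sketch using the AWS CLI (bucket and key names are placeholders):

aws s3 cp report.csv s3://your-bucket/report.csv \
  --content-type "text/csv" \
  --content-disposition 'attachment; filename="mycustomname.csv"'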
Content-Type issues
By default, a file with the extension .csv should be classified as a text/csv file. You can check this with Mime::Type.lookup_by_extension('csv').to_s # => "text/csv". If this is not the case, you can add text/csv as a custom mime-type by creating a config/initializers/mime_types.rb file and adding:
Mime::Type.register 'text/csv', :csv
However, this should rarely be necessary (unless Windows does something funky with content types; I've only tested on Linux).
Examples
I've put up two examples that you can check. The first is a CSV file uploaded with a text/plain MIME type, which the browser displays inline instead of downloading (my browser downloads text/csv files, hence text/plain for this demo):
https://s3.amazonaws.com/stackoverflow-demo/demo.csv
The second also has a MIME type of text/plain, but I added the header Content-Disposition: attachment; filename="mycustomname.csv"
https://s3.amazonaws.com/stackoverflow-demo/demo-download.csv
You'll notice that the first link is displayed in browser, while the second link is downloaded with the custom name mycustomname.csv.
To learn why, look at the headers using curl -I.
$ curl -I https://s3.amazonaws.com/stackoverflow-demo/demo-download.csv
HTTP/1.1 200 OK
Content-Disposition: attachment; filename="mycustomname.csv"
Content-Type: text/plain
versus
$ curl -I https://s3.amazonaws.com/stackoverflow-demo/demo.csv
HTTP/1.1 200 OK
Content-Type: text/plain
Note: unrelated headers were removed.
If you are using Ruby on Rails, you can use the send_data method:
send_data csv_data, :type => 'text/csv; charset=iso-8859-1; header=present', :disposition => "attachment; filename=some_csv_file.csv"
