Can't "wget" a link after passing a captcha - terminal

Normally, after passing the captcha, I can copy and paste the link to the protected resource (e.g. a video) and download it with "wget" in the terminal.
Something weird happens with this website, which offers raw video footage for download:
http://download.iris32.com/clip/H058_C003_0320O5.html
After the captcha, I find that the link to the video ends with ".html". If I open it in the current browser, it starts to download; in other browsers or with wget, I only get the captcha webpage.
So, what trick are they using? Is there a way to download the video with "wget"?
Thanks

Related

Internal Links Not Working Convert .HTM to .pdf

I am trying to convert an .htm file from the SEC website to a .pdf and have the internal links work. I am successfully converting to .pdf using wkhtmltopdf, but all the internal links point me back to the first page.
wkhtmltopdf https://www.sec.gov/Archives/edgar/data/1594617/000119312514117433/d640354ds1a.htm test.pdf
It looks like there's an issue with wkhtmltopdf dealing with anchor tags that have no content. There's a PR that was opened in 2017 to resolve it, but it remains open.
As it turns out, your document does indeed have empty anchor tags, so that's probably the root cause:
<A NAME="toc640354_15"></A>
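To check whether a given document has the same problem before converting it, a minimal sketch like the following can list anchors that have a name (or id) but no text content. It uses only Python's standard html.parser; the class and function names are made up for illustration, and it treats an anchor that contains only an image as empty, which may or may not matter for your document.

```python
from html.parser import HTMLParser


class EmptyAnchorFinder(HTMLParser):
    """Collect the name (or id) of every <a> element that contains no text."""

    def __init__(self):
        super().__init__()
        self.empty_anchors = []  # names of anchors that closed without any text
        self._open = []          # stack of [name, has_text] for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A NAME=...> arrives as a/name.
        if tag == "a":
            attr_map = dict(attrs)
            self._open.append([attr_map.get("name") or attr_map.get("id"), False])

    def handle_data(self, data):
        # Any non-whitespace text marks the innermost open anchor as non-empty.
        if self._open and data.strip():
            self._open[-1][1] = True

    def handle_endtag(self, tag):
        if tag == "a" and self._open:
            name, has_text = self._open.pop()
            if name and not has_text:
                self.empty_anchors.append(name)


def find_empty_anchors(html):
    """Return the names of all named anchors without text, in document order."""
    finder = EmptyAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.empty_anchors
```

Feeding it the HTML of the filing should list targets such as toc640354_15 if they are empty.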
I would suggest using chrome to produce the pdf, with its --headless and --print-to-pdf flags. From your chrome installation directory, do:
chrome.exe --headless --disable-gpu --print-to-pdf="C:\path\to\file.pdf" https://www.sec.gov/Archives/edgar/data/1594617/000119312514117433/d640354ds1a.htm
Make sure you specify an absolute path to the output file or it doesn't seem to work, for whatever reason. The command will immediately return without any output or indication of success. Give it a few seconds to retrieve, render and write the file.
I tested with your document, and the links work perfectly.
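If you want to script this, here is a rough sketch in Python that builds the same command, forces the output path to be absolute, and then waits a few seconds for the file to appear. The function names, the wait time and the polling interval are my own assumptions; only the three Chrome flags come from the command above.

```python
import os
import subprocess
import time


def build_print_command(chrome_path, url, output_pdf):
    """Return the argument list for a headless Chrome print-to-PDF run."""
    # A relative output path does not seem to work, so always make it absolute.
    absolute_output = os.path.abspath(output_pdf)
    return [
        chrome_path,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={absolute_output}",
        url,
    ]


def print_to_pdf(chrome_path, url, output_pdf, wait_seconds=10.0):
    """Run Chrome, then poll for the output file since the command gives no feedback."""
    subprocess.run(build_print_command(chrome_path, url, output_pdf), check=True)
    target = os.path.abspath(output_pdf)
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if os.path.exists(target):
            return True
        time.sleep(0.5)
    return os.path.exists(target)
```

You would call print_to_pdf with the path to your Chrome executable, the SEC URL and the desired file name; it returns True once the PDF exists.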

Pictures pasted from Word are not shown

Why are images not showing when I paste from MS Word?
CKEditor shows this source:
<h1><img src="file:///C:\Users\user\AppData\Local\Temp\msohtmlclip1\01\clip_image002.jpg" style="height:88px; width:1005px" /></h1>
This file exists.
The CKEditor version is the one for ASP.NET.
Tested on Chrome, IE 10 and IE 11.
Your CKEditor is presumably running on a web page, with a http:// address.
Modern browsers don't support embedding images (or anything else) from file:// URLs in http:// pages (or https://, or any other protocol) for security reasons.
This is because there'd be the danger of a malicious site embedding something from your private files (like a document), and then using some security hole to read and upload it elsewhere.
But even if this worked, it wouldn't do you much good: the image isn't uploaded into CKEditor, so it would show up on your computer only. Anyone else viewing the page you're editing would see a broken image link.
As far as I know, there's currently no way around uploading the image separately.
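What you can do is detect the problem before saving. The sketch below (Python, standard library only) scans the editor's HTML for img tags whose src is a file: URL, so that you can warn the user or ask for an upload. The class and function names are made up for illustration, and it is not part of CKEditor itself.

```python
from html.parser import HTMLParser


class LocalImageFinder(HTMLParser):
    """Collect the src of every <img> that points at a local file: URL."""

    def __init__(self):
        super().__init__()
        self.local_sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed through here.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if src.strip().lower().startswith("file:"):
            self.local_sources.append(src)


def find_local_images(html):
    """Return the file: image sources found in the pasted HTML."""
    finder = LocalImageFinder()
    finder.feed(html)
    finder.close()
    return finder.local_sources
```

Any source it returns will be broken for everyone except the person who pasted it.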

How to download pdf file in ruby without .pdf in the link

I need to download a PDF from a website which does not provide a link ending with .pdf, using Ruby. Manually, when I click on the link to download the PDF, it takes me to a new page, and the dialog box to save/open the file appears after some time.
Please help me download the file.
The link
You can do this:
require 'open-uri'
File.open('my_file_name.pdf', 'wb') do |file|
  # URI.open comes from open-uri; plain Kernel#open no longer accepts URLs on Ruby 3.0+
  file.write URI.open('http://someurl.com/2013-1-2/somefile/download').read
end
I have been doing this for my projects and it works.
If you just need a simple Ruby script to do it, I'd just run wget from the script, like this: exec 'wget "http://path.to.the.file/and/some/params"'
At that point, though, you might as well run wget directly.
The other way is to just run a GET on the page that you know the PDF is at:
require 'net/http'
# Net::HTTP.get(host, path) expects a bare hostname, without the http:// scheme
source = Net::HTTP.get("the.website.com", "/and/some/params")
There are a number of other HTTP clients that you could use, but as long as you make a GET request to the endpoint that the PDF is at, it should give you the raw data. Then you can just rename the file, and you'll have the PDF.
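The same idea works in any language. Here is a minimal Python sketch (standard library only) that makes the GET request and also checks that the body starts with the %PDF- signature before saving it, which catches the case where the server hands back an HTML page instead. The function names and the 30-second timeout are my own choices, not anything the site requires.

```python
import urllib.request


def looks_like_pdf(data):
    """PDF files start with the bytes %PDF-, whatever the URL looks like."""
    return data[:5] == b"%PDF-"


def download_pdf(url, destination, timeout=30):
    """GET the URL and save the body only if it really is a PDF.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    if not looks_like_pdf(body):
        raise ValueError("response does not look like a PDF (maybe an HTML page?)")
    with open(destination, "wb") as handle:
        handle.write(body)
    return len(body)
```

Calling download_pdf with the download URL and a file name such as thefile.pdf either writes the file or raises, so you notice straight away when the link did not return a PDF.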
In your case, I ran the following commands to get the pdf
wget http://www.lawcommission.gov.np/en/documents/prevailing-laws/constitution/func-download/129/chk,d8c4644b0f086a04d8d363cb86fb1647/no_html,1/
mv index.html thefile.pdf
Then open the pdf. Note that these are linux commands. If you want to get the file with a ruby script, you could use something like what I previously mentioned.
Update:
There is an added complication that was not initially stated, which is that the URL to the PDF changes every time the PDF is updated. To make this work, you probably want to do some web scraping. I suggest Nokogiri. This way you can look at the page where the download is and then perform a GET request on the desired URL. Furthermore, the server that hosts the PDF is misconfigured and breaks Chrome within a few seconds of opening the page.
How to solve this problem: I went to the site and refreshed it, then broke the connection to the server (press the X where there would otherwise be a refresh button). Then I right-clicked next to the download link, selected "Inspect element", and browsed the DOM to find something definitively identifying (like an id). Thankfully, I found one: <strong id="telecharger"> Download</strong>. This means that you can use something like page.css('strong#telecharger')[0].parent['href'], which should give you a URL. Then you can perform a GET request as described above. I don't have time to write the script for you (too much work to do), but this should be enough to solve the problem.
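To show the shape of that scraping step, here is a rough sketch in Python using only the standard library instead of Nokogiri. It looks for the strong element with id "telecharger" and takes the href of the nearest enclosing link, which is a slightly looser rule than the direct .parent lookup above. The class and function names are made up, and I am assuming the strong tag sits inside the download link, as it did when I inspected the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DownloadLinkFinder(HTMLParser):
    """Find the href of the <a> element that encloses <strong id=marker_id>."""

    def __init__(self, marker_id):
        super().__init__()
        self.marker_id = marker_id
        self.href = None
        self._anchor_hrefs = []  # hrefs of the <a> elements currently open

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a":
            self._anchor_hrefs.append(attr_map.get("href"))
        elif (tag == "strong" and attr_map.get("id") == self.marker_id
              and self._anchor_hrefs and self.href is None):
            # Keep the first match only: the innermost link around the marker.
            self.href = self._anchor_hrefs[-1]

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_hrefs:
            self._anchor_hrefs.pop()


def find_download_url(html, page_url, marker_id="telecharger"):
    """Return the absolute download URL, or None if the marker is not inside a link."""
    finder = DownloadLinkFinder(marker_id)
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    return urljoin(page_url, finder.href)  # resolve relative hrefs against the page
```

The URL it returns is the one to request with whatever HTTP client you are using.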

Can't stream tiff image using cfcontent

We're using cfcontent to stream images from outside the web root. jpegs, pngs, and gifs all stream correctly, but I can't get any tiff files to stream. I get the broken image icon, even though I can see that the file is in the correct location and the file name is stored correctly and is being passed correctly.
The code is really simple - the displaying page has
<img src="?file=common/includes/displayPhoto.cfm&thisImage=someImage.tiff" />
and displayPhoto.cfm has
<cfcontent type = "image/*" file = "#imagePath#" deleteFile = "No">
where #imagePath# has the fully qualified file path.
I'm at a loss - and my friend Google has let me down.
You're probably doing it right. The problem is that most browsers do not support TIFF. The only browser that does (and not all TIFF formats, at that!) is Safari.
On Firefox, Internet Explorer, Google Chrome, Opera etc, you will always see the broken image icon - because to the browser that image is indeed "broken", unintelligible.
See also Display TIFF image in all web browser
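If you keep streaming files yourself, it can help to decide per file whether the browser can show it at all. Below is a small sketch of that decision in Python rather than CFML. The function name and the lookup table are my own, and the "displayable" set is simply the three formats the question says already work. It also returns a concrete type such as image/tiff instead of the image/* wildcard.

```python
import os

# Extension to MIME type, limited to the formats mentioned in the question.
CONTENT_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}

# Formats that the question reports as streaming correctly in the browser.
BROWSER_DISPLAYABLE = {"image/jpeg", "image/png", "image/gif"}


def plan_image_response(file_name):
    """Return (content_type, can_display_inline) for an image file name."""
    extension = os.path.splitext(file_name)[1].lower()
    content_type = CONTENT_TYPES.get(extension, "application/octet-stream")
    return content_type, content_type in BROWSER_DISPLAYABLE
```

When the second value is False, the file would need to be converted to one of the supported formats (or offered as a download) rather than placed in an img tag.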
Your URL is actually wrong: you have something.cfm&x=y. In URLs, the query string begins with a question mark: something.cfm?x=y.
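To illustrate the difference, here is a short Python sketch using the standard urllib.parse module; the two helper names are made up. It builds a query string with the leading question mark and shows that a parser finds no parameters at all when the URL only contains an ampersand.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(path, params):
    """Join a path and its parameters with the question mark that starts a query."""
    return f"{path}?{urlencode(params)}"


def read_params(url):
    """Return the query parameters of a URL as a plain dict (first value wins)."""
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}
```

With these, build_url("something.cfm", {"x": "y"}) gives something.cfm?x=y, while read_params("something.cfm&x=y") comes back empty because everything is treated as part of the path.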

How to edit FTP URLs to HTTP to make images display?

I need to upload images into a page in my website.
I usually use the WinSCP FTP program because it gives me the option "Copy to Clipboard (Include paths)". I copy the images' URLs through this option, and the images usually upload and display successfully on the website.
I'm trying to do the same now for a new page, but it is not working. No option in WinSCP helps at all; all I get is a small icon instead of the image. When I use FileZilla to copy the URL, the images upload and display successfully, but the problem is that the page then asks for the username and password to display the images.
I've been googling about it and I realise that the problem could be that I need to change the FTP URL to HTTP. I tried to do it this way:
ftp://username#domain.org/domain_restore/pics/anton.jpg
to:
http://username.domain.org/anton.jpg
Is that probably totally wrong? I tried some other ways, but I'm only a beginner and I don't know how to edit it or how to find out what the problem is.
I followed the instructions of someone from my host's support, who advised me to restore all my directories in the FTP manager. I did that, but I feel like I messed it up, because now all the folders and directories are duplicated. Could that also be the problem?
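For reference, the conversion being attempted usually amounts to dropping the FTP login and the server-side folder that acts as the web root, then putting the rest of the path after the site address. The Python sketch below shows that mapping under loud assumptions: the web root folder name (public_html here) and the site address are hypothetical and depend entirely on the host, and a file that sits outside the web root (for example in a restore folder) has no HTTP address at all.

```python
from urllib.parse import urlsplit


def ftp_to_http(ftp_url, web_root, site_base):
    """Map an FTP file URL to the HTTP URL that serves the same file.

    web_root is the FTP folder that the web server publishes (hypothetical here);
    site_base is the public address of the site.
    """
    path = urlsplit(ftp_url).path          # drops the scheme and the username@host part
    root = web_root.rstrip("/") + "/"
    if not path.startswith(root):
        raise ValueError("file is outside the web root, so it has no HTTP URL")
    return site_base.rstrip("/") + "/" + path[len(root):]
```

For a file at ftp://username@domain.org/public_html/pics/anton.jpg with web root /public_html and site address http://domain.org, this would give http://domain.org/pics/anton.jpg.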
