Original URL of a saved .webarchive on macOS

We have a saved .webarchive and we want to retrieve the original URL. Is that possible?
Background: my wife filled out a long application on the web and saved a local copy as a .webarchive. The instructions say that to make changes you have to go back to the URL you were at during a certain step of the submission. The instructions are complicated and confusing, and like most of these long applications, hard to deal with anyway. She doesn't have that URL. We went back through her Safari history and found one URL for the site from that day, but it just came up with an error.
To give a sense of the site's sophistication: they have a link for downloading Flash Player.
We're trying to contact the site; the application is due in 24 hours. Fortunately what they have is OK, she just wanted to make some edits and add some info.
I looked at the 13k lines of the .webarchive in a text editor and, skimming through, I don't see anything obvious. There is a com.apple.print plist embedded, but no URL. I also looked at Get Info and there's no URL there either (some things I download from the web do have the original URL).
Thank you for any help.

Actually, a .webarchive is itself a binary plist and can be read like any other .plist file. The original URL of a webarchive, if there is one, should be stored at :WebMainResource:WebResourceURL, which can be read with:
/usr/libexec/PlistBuddy -c 'print ":WebMainResource:WebResourceURL"' /path/to/file
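If you'd rather do the same lookup from a script, here is a minimal Ruby sketch that reads the same key; it assumes the third-party CFPropertyList gem (gem install CFPropertyList), which can parse binary plists:
require 'cfpropertylist'
plist = CFPropertyList::List.new(file: '/path/to/file.webarchive')
archive = CFPropertyList.native_types(plist.value)
# WebMainResource describes the saved page itself; WebResourceURL,
# when present, is the address the page was saved from.
puts archive.dig('WebMainResource', 'WebResourceURL')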

Related

How to get full image paths from web page using Firebug?

I would like to download all images in full quality from this blog: http://w899c8kcu.homepage.t-online.de/Blog.
I have access to the server, but I cannot find the directory where the images lie. When I use Firebug on the first picture, it shows me http://w899c8kcu.homepage.t-online.de/Blog;session=f0577255d9df9185d3abe04af0ce922d&focus=CMTOI_de_dtag_hosting_hpcreator_widget_PictureGallery_15716702&path=image.action&frame=CMTOI_de_dtag_hosting_hpcreator_widget_PictureGallery_15716702?id=34877331&width=1000&height=2000&crop=false.
How can I find the file paths like /dirname/image.jpg?
According to its HTML output the page obviously uses the CM4all content management system (CMS).
I don't know precisely how this CMS works, but CMSs generally either save files under cryptic names within a folder specified in the CMS's configuration, or not in the file system at all but in a database.
Also, the CMS may only save compressed or resized versions of the original files.
So, if you don't want to or are not able to dig into the server-side script code to find out if and where the images are saved, you should contact the company behind CM4all about this.

How to download pdf file in ruby without .pdf in the link

I need to download a PDF from a website which does not provide a link ending in .pdf, using Ruby. Manually, when I click on the link to download the PDF, it takes me to a new page, and the dialog box to save/open the file appears after some time.
Please help me in downloading the file.
The link
You can do this:
require 'open-uri'
File.open('my_file_name.pdf', 'wb') do |file|
  # URI.open comes from open-uri (a bare open on a URL string no longer works on current Ruby)
  file.write URI.open('http://someurl.com/2013-1-2/somefile/download').read
end
I have been doing this for my projects and it works.
If you just need a simple Ruby script to do it, I'd just run wget, like this: exec 'wget "http://path.to.the.file/and/some/params"'. At that point, though, you might as well run wget directly.
The other way is to just run a GET on the page that you know the PDF is at:
require 'net/http'
source = Net::HTTP.get(URI('http://the.website.com/and/some/params'))
There are a number of other HTTP clients you could use, but as long as you make a GET request to the endpoint the PDF is at, it should give you the raw data. Then you just save that data under a .pdf name and you'll have the PDF, as in the sketch below.
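Putting that together, a minimal sketch (the URL is the same placeholder as above):
require 'net/http'
# The response body is the raw PDF bytes.
body = Net::HTTP.get(URI('http://the.website.com/and/some/params'))
# "Renaming" just means saving the bytes under a .pdf name.
File.binwrite('downloaded.pdf', body)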
In your case, I ran the following commands to get the pdf
wget http://www.lawcommission.gov.np/en/documents/prevailing-laws/constitution/func-download/129/chk,d8c4644b0f086a04d8d363cb86fb1647/no_html,1/
mv index.html thefile.pdf
Then open the PDF. Note that these are Linux commands. If you want to get the file with a Ruby script, you could use something like what I previously mentioned.
Update:
There is an added complication that was not initially stated: the URL to the PDF changes every time the PDF is updated. In order to make this work, you probably want to do some web scraping; I suggest nokogiri. This way you can look at the page where the download link is and then perform a GET request on the desired URL. Furthermore, the server that hosts the PDF is misconfigured, and breaks Chrome within a few seconds of opening the page.
How to solve this: I went to the site and refreshed it, then broke the connection to the server (press the X where there would otherwise be a refresh button). Then right-click next to the download link, select Inspect Element, and browse the DOM to find something definitively identifying (like an id). Thankfully, I found <strong id="telecharger"> Download</strong>. This means you can use something like page.css('strong#telecharger')[0].parent['href'], which should give you a URL. Then you can perform a GET request on it as described above; see the sketch below. I don't have time to write the whole script for you (too much work to do), but this should be enough to solve the problem.
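A rough sketch of that approach, assuming the nokogiri and open-uri gems; the page URL below is a placeholder for whatever page actually hosts the download link:
require 'open-uri'
require 'nokogiri'
# Placeholder: the page that contains the download link.
page_url = 'http://www.lawcommission.gov.np/en/documents/prevailing-laws/constitution/'
page = Nokogiri::HTML(URI.open(page_url))
# Find the <strong id="telecharger"> label and take the href of its parent <a>.
pdf_url = page.css('strong#telecharger')[0].parent['href']
# If the href is relative, resolve it against the page URL first.
pdf_url = URI.join(page_url, pdf_url).to_s
File.open('thefile.pdf', 'wb') do |file|
  file.write URI.open(pdf_url).read
end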

Joomla and JomSocial Error

I'm having a problem with a client site. I'm not good with Joomla (we mostly do WordPress), but one of my long-time clients asked me to take over a site from another developer who never finished it, so I obliged. The problem is, everything is working great except for the Community page:
http://gettingripped.com/index.php/community
The only errors I'm finding are with the Facebook integration (which they told me the previous dev never finished/fixed). I'm really confused here... anyone out there have any ideas? It seems that instead of showing the proper titles, raw strings like Com_community_somethingElseHere are replacing everything.
Thank you guys in advance for your help!
Seems something is wrong with the en-GB.com_community.ini file.
Location: gettingripped.com/language/en-GB/en-GB.com_community.ini
I could not find the file at the above location!
Put this file in that folder and it will work.
If you can't find the file to put in the folder, create your own and place it there. How? Google for this string as it is (including double quotes): "en-GB.com_community.ini" and open the first couple of results.
Then copy and paste the displayed file content into your own .ini file (name it en-GB.com_community.ini) and place it in your en-GB folder.
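For reference, Joomla language files are plain INI files of KEY="Value" pairs; the key names below are only illustrative, not the real contents (the actual file has hundreds of entries):
; en-GB.com_community.ini (illustrative excerpt)
COM_COMMUNITY="Community"
COM_COMMUNITY_VIDEOS="Videos"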
Load the page and it will show up as it should!

Browser add-on to find a download's origin

Back in the earlier days of the internet, I remember that in certain browsers, every time you downloaded an image or a file, the URL the file was downloaded from would be written into that file's properties (in the Summary tab, I guess). I think Netscape v2 did this, if I remember correctly.
I really miss that kind of functionality as every once in a while I'll run into a neat little program stored somewhere in the depths of my hard drive and wonder where I got it from originally.
I googled around but I'm not quite sure what terms to use to describe what I'm looking for. So I'm wondering if anyone knows of a Firefox plug-in or something similar that would do this?
If you use the DownThemAll! extension for Firefox, you can tell it to prepend the URL of the site to the downloaded file name...
thus you end up with files like:
download.com_utils_compression_ABCD32.exe
It also works really well when you want to download/queue a bunch of files.
You download http://example.com/foo to ~/Desktop/foo, and you want to see the originating URL in the properties of the local file foo?
Back when I used OS X, I remember Safari used to record the original URL in the resource fork of the downloaded file. Can't remember what the named fork is, well, named, but it'll show up in the properties panel from Finder. Since it's there, Spotlight will probably index it, too, but I haven't used OS X since 10.3.
If you use Opera, and haven't cleared the file out from your download manager, select the download and it'll show the original URL that the file is from in the properties pane.
Is this what you want? If so... well, I don't know of a similar Firefox extension, but it'll clarify the question.
For the IE browser, I use the hell out of Fiddler to look at all traffic going across the wire.
For Firefox, you can use the Firebug plugin. There is a "Net" tab that will show you the request information going across the wire.
Most of the time you can use one of these tools to see what URL was requested in order to start a download. You can also view all the get and post information that might need to be sent in order to have your request succeed.
Fiddler is here: http://www.fiddlertool.com/fiddler/
FireBug is here: https://addons.mozilla.org/en-US/firefox/addon/1843
Best of Luck!

Load an image from Firefox cache?

I'm trying to load an image from the Firefox cache as the title suggests. I'm running Ubuntu, so the location of my cache is /home/me/.mozilla/firefox/xxxxxx.default/Cache
However, in the Cache (and this is on Mac, too) the filenames are just ridiculous combinations of letters and numbers. Is there a way to pinpoint a certain file?
You should take a look at the source code of the CacheViewer Add-on.
Download the file instead of installing it (right click and save as) and then extract it (it's just a Zip file, even though it has a .xpi extension), then extract the cacheviewer.jar file inside the resulting chrome folder. Finally go into content and then cacheviewer to find the javascript and XUL files.
From my brief investigation, the useful routines are in the cacheviewer.js file, though if you were hoping there would be a simple javascript one liner for accessing cached items you're probably going to be disappointed. The XUL files (which are just XML) are helpful in working out which JS functions are called to perform particular tasks. I'm not too sure how all this maps into Greasemonkey, rather than the extension environment, but hopefully there's enough code to get you started.
Ummm, that really is an internal implementation detail. But I suggest looking at how about:cache?device=disk and about:cache-entry?client=HTTP&sb=1&key=https://stackoverflow.com/Content/img/wmd/blockquote.png are implemented.
Also, http://www.securityfocus.com/infocus/1832 gives details, too. Note that Firefox doesn't use a separate file for everything...
And of course, Firefox may change the format at any time.
Just give your img src= attribute the full URL. If the image happens to be cacheable (the server sends an appropriate Expires: or Cache-Control: header, for example) and it's already in the cache, Firefox will not hit the network.
HTTP caching is supposed to be invisible. When you're generating content, you generally shouldn't worry about it.
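That said, if you want to check whether a given image is cacheable, here is a quick Ruby sketch that inspects the relevant response headers (the URL is just the example from the about:cache answer above):
require 'net/http'
uri = URI('https://stackoverflow.com/Content/img/wmd/blockquote.png')
response = Net::HTTP.get_response(uri)
# These headers control whether and for how long the browser may cache the image.
puts "Cache-Control: #{response['Cache-Control']}"
puts "Expires: #{response['Expires']}"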
You can point REDbot at a URL to see all sorts of delicious information about its cacheability.
