Selenium - Retaining Firefox cache and history files

Is there a way to disable Selenium creating a temporary directory and profile when it starts Firefox?
I fully understand why Selenium does things the way it does. I am just experimenting with it, trying to create Firefox caches and histories for computer forensics training purposes. To this end, I have set up a clean virtual machine with a pristine user account. I can now run a Python script with the Selenium API to start Firefox, visit a couple of web pages and shut down.
The problem is, it leaves nothing behind. This is of course excellent if you are using Selenium for its intended purpose, but it thwarts my work by deleting everything.
So is there a way to disable the temporary profile creation and just start Firefox as it would start if run by the user without Selenium?

Addition 5:34 PM: The Java API documentation mentions a system property, webdriver.reap_profile, that should prevent deletion of temporary files. I went to the source and it appears this property is not honored in the Python WebDriver class:
def quit(self):
    """Quits the driver and close every associated window."""
    try:
        RemoteWebDriver.quit(self)
    except (http_client.BadStatusLine, socket.error):
        # Happens if Firefox shutsdown before we've read the response from
        # the socket.
        pass
    self.binary.kill()
    try:
        shutil.rmtree(self.profile.path)
        if self.profile.tempfolder is not None:
            shutil.rmtree(self.profile.tempfolder)
    except Exception as e:
        print(str(e))
Deletion of the files upon quit appears to be unconditional. I will solve this in my case by injecting
return self.profile.path
into /usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py just after self.binary.kill(). This probably breaks all sorts of things and is a horrible thing to do, but it appears to do exactly what I want. The return value tells the calling function the random name of the temporary directory under /tmp. Not elegant, but it appears to work after a recompile.
If a more elegant solution exists, I would be happy to flag that as the correct one.
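A less invasive alternative (a minimal sketch only, assuming the same Selenium 2.x Python bindings quoted above; the class name is my own invention) is to subclass the Firefox driver and override quit() so the shutil.rmtree() calls are never reached, leaving the temporary profile on disk:

import socket
try:
    import http.client as http_client  # Python 3
except ImportError:
    import httplib as http_client      # Python 2

from selenium import webdriver
from selenium.webdriver.remote.webdriver import WebDriver as RemoteWebDriver

class ProfileKeepingFirefox(webdriver.Firefox):
    """Firefox driver whose quit() skips the profile cleanup, so the
    temporary profile (cache, places.sqlite, ...) survives under /tmp."""

    def quit(self):
        try:
            RemoteWebDriver.quit(self)
        except (http_client.BadStatusLine, socket.error):
            # Firefox may close the socket before we read the response.
            pass
        self.binary.kill()
        # Deliberately no shutil.rmtree() here; just report where the profile is.
        print("Profile kept at: %s" % self.profile.path)

driver = ProfileKeepingFirefox()
driver.get("http://example.com")
driver.quit()  # the profile directory is left behind for forensic inspection

This keeps the change in your own script instead of the installed package, so a Selenium upgrade will not silently undo it.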

Related

Google API service account authentication can't find JSON credentials file

I can't get the Google API to find my service account's credentials. I downloaded the necessary JSON file with the right name into the proper place, and I'm using Python code straight off the API documentation:
import gspread
gc = gspread.service_account()
sh = gc.open("Example spreadsheet (I'll replace this with my actual sheet name later)")
print(sh.sheet1.get('A1'))
The code stops at gc = gspread.service_account() with a FileNotFoundError. I discovered via an error message that this is because it's looking at a completely wrong file path (I think it assumes I'm on a Mac when I'm actually on a Windows PC?). Overriding the file name, i.e.
gc = gspread.service_account(filename="insert\actual\path\here.json"),
does not work either, which is the mystifying part. I copied that path straight out of my file explorer, doubled the backslashes so Python doesn't treat them as escape sequences (that happened once), and tried every modification of the file path I could think of (%APPDATA%\gspread\service_account.json instead of the whole thing, etc.). What could be going wrong?
Edit: #mods, feel free to close the question! I found the issue, which is that I was using the Repl.it online coding environment instead of a local one. I ported everything over to IDLE and it worked fine. I strongly suspect Repl.it just couldn't access my local files at all (I also tried it on Repl.it with a random screenshot in a different place, and it threw the same error).
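For anyone hitting the same FileNotFoundError in a local environment rather than on Repl.it, a minimal sketch of pointing gspread at an explicit key file on Windows (the path below is made up; a raw string avoids the backslash-escape problem mentioned above):

import gspread

# Hypothetical location of the downloaded key; gspread's default search path
# on Windows is %APPDATA%\gspread\service_account.json.
creds_path = r"C:\Users\me\AppData\Roaming\gspread\service_account.json"

gc = gspread.service_account(filename=creds_path)
sh = gc.open("Example spreadsheet")
print(sh.sheet1.get("A1"))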

Python: requests module does not cache, so why this error?

I have a link to a raw txt file on GitHub of the form https://raw.githubusercontent.com/XXX/YYY/master/txtfile where I periodically put a new version, so that a Python script will know it must update. The Python script (Python 3.5) uses an infinite while loop and the requests module:
while True:
    try:
        r = requests.get('https://raw.githubusercontent.com/XXX/YYY/master/txtfile', timeout=10)
        required_version = r.text
    except:
        required_version = 0
    log_in_txt_file(required_version)
    sleep(10)
This script runs under Windows. However, even though the version has been updated on the server, the log still shows that the request is getting the previous version! If I fetch the version from a browser (Chrome) the same thing happens, but after a few F5 refreshes the new version appears (in the browser and in the log); the script, though, still sometimes logs the old version and sometimes the new one! I tried to make the URL vary with:
https://raw.githubusercontent.com/XXX/YYY/master/txtfile?_=time.time
But the problem remains. I'm using an Amazon WorkSpace and I'm pretty sure it's an OS issue. My question: how can I work around this using Python? Any ideas?
This is not a client-side caching issue. In effect, GitHub's servers are caching the content and serving you the stale version until they are updated.
GitHub serves your data from a series of web servers, distributed geographically to reduce loading times. These servers don't all update at the same time; until a change has propagated to all of them you'll see old and new content returned for that URL, depending on which machine served the content for a specific request.
You can't really use GitHub to detect when a new version has been released, not reliably. Instead, generate a unique filename (a GUID, perhaps) that at a future time will contain the new version information. Give that filename out with the current version, and poll that. Releasing a new version then consists of generating the filename for the version after it and putting the information at the current 'new version' URL. Each version links to the next file, and when it appears you only need to load it once.
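A rough sketch of that chaining scheme in Python (the function name and polling interval are my own; each version file is assumed to hold the version string on its first line and the name of the next file on its second line):

import time
import requests

BASE = 'https://raw.githubusercontent.com/XXX/YYY/master/'  # same repo as above

def wait_for_next_version(next_name, interval=10):
    # Poll the pre-announced "next version" file until it appears. Because the
    # filename is brand new, no GitHub front-end can serve a stale copy of it.
    while True:
        try:
            r = requests.get(BASE + next_name, timeout=10)
            if r.status_code == 200:
                version, following = r.text.splitlines()[:2]
                return version, following  # 'following' is the file to poll for the release after this one
        except requests.RequestException:
            pass
        time.sleep(interval)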

How can a bookmarklet access a Firefox extension (or vice versa)

I have written a Firefox extension that catches when a particular URL is entered and does some stuff. My main app launches Firefox with this URL. The URL contains sensitive information so I don't want it being stored in the history.
I'm concerned about the case where the extension is not installed. If it's not installed and Firefox gets launched with the sensitive URL, it will get stored in history and there's nothing I can do about it. So my idea is to use a bookmarklet.
I will launch Firefox with "javascript:window.location.href='pleaseinstallthisplugin.html'; sensitiveinfo='blahblah'".
If the extension is not installed they will get redirected to a page that tells them to install it and the sensitive info won't get stored in the history. If the extension IS installed it will grab the information in the sensitiveinfo variable and do its thing.
My question is, can the bookmarklet call a method in the extension to pass the sensitive info (and if so, how) or can the extension catch when javascript is being called in the bookmarklet?
How can a bookmarklet and Firefox extension communicate?
p.s. The alternative means of getting around this situation would be for my main app to launch Firefox and communicate with the extension using sockets but I am loath to do that because I've run into too many issues over the years with users with crazy firewalls blocking socket communication. I'd like to do everything without sockets if possible.
As far as I know, bookmarklets can never access chrome files (extensions).
Bookmarklets are executed in the scope of the current document, which is almost always a content document. However, if you are passing it in via the command line, it seems to work:
/Applications/Namoroka.app/Contents/MacOS/firefox-bin javascript:alert\(Components\)
Accessing Components would throw if it was not allowed, but the alert displays the proper object.
You could use unsafeWindow to inject a global. You can add a mere property so that your bookmarklet only needs to detect whether the global is defined or not, but you should know that, as far as I know, there is no way to prohibit sites in a non-bookmarklet context from also sniffing for this same global (since it may be a privacy concern to some that sites can detect whether they are using the extension). I have confirmed in my own add-on which injects a global in a manner similar to that below that it does work in a bookmarklet as well as regular site context.
If you register an nsIObserver, e.g., where content-document-global-created is the topic, and then unwrap the subject, you can inject your global (see this if you need to inject something more sophisticated like an object with methods).
Here is some (untested) code which should do the trick:
var observerService = Cc['@mozilla.org/observer-service;1'].getService(Ci.nsIObserverService);
observerService.addObserver({observe: function (subject, topic, data) {
    var unsafeWindow = XPCNativeWrapper.unwrap(subject);
    unsafeWindow.myGlobal = true;
}}, 'content-document-global-created', false);
See this and this if you want an apparently easier way in an SDK add-on (not sure whether SDK postMessage communication would work as an alternative but with the apparently same concern that this would be exposed to non-bookmarklet contexts (i.e., regular websites) as well).

Run a script directly in 2 different browsers

I have created a Ruby test script that uses Selenium RC to test my web app directly in 2 browsers (IE, Firefox). My script runs first on IE, then continues on Firefox, and then should be continued and finished in the already opened IE browser. My problem is: I can't continue (reconnect) to run my script in the already opened IE browser. I use:
@browser = RSpecSeleniumHelper.connect_browser("URL")
but it opens a new session (it needs to keep the previous session).
Is there a particular reason you need to switch between browsers half way through?
I have no idea how you'd fix the problem, but it seems like it would be best solved by running the tests in one browser at a time.
I'm also unsure why you need to switch back and forth in your browsers.
Regardless, I'm doing something similar, but with a different library: the "Selenium" gem (gem install selenium). Here's what I would do in your situation:
@ie_driver = Selenium::SeleniumDriver.new(rc_host, port, "*iexplore", url, 1000)
@ie_driver.start
@ie_driver.whatever # test code
@ff_driver = Selenium::SeleniumDriver.new(rc_host, port, "*firefox", url, 1000)
@ff_driver.start
@ff_driver.whatever # test code
@ff_driver.stop
@ie_driver.whatever # continue test code with IE
@ie_driver.stop
In summary, while I'm not really familiar with your selenium library, typically I would create 2 instances of the R/C driver, that way I won't have to interrupt the session.

Launching a registered mime helper application

I used to be able to launch a locally installed helper application by registering a given mime-type in the Windows registry. This let users click once on a link to the current install of our internal browser application. It worked fine in Internet Explorer 5 (most of the time) and Firefox, but now does not work in Internet Explorer 7.
The filename passed to my shell/open/command is not the full physical path to the downloaded install package. The path parameter I am handed by IE is
"C:\Document and Settings\chq-tomc\Local Settings\Temporary Internet Files\
EIPortal_DEV_2_0_5_4[1].expd"
This unfortunately does not resolve to the physical file when calling FileExists() or when attempting to create a TFileStream object.
The path I am handed is missing Internet Explorer's hidden caching sub-directory for Temporary Internet Files, "Content.IE5\ALBKHO3Q"; the absolute physical path would be expressed as
"C:\Document and Settings\chq-tomc\Local Settings\Temporary Internet Files\
Content.IE5\ALBKHO3Q\EIPortal_DEV_2_0_5_4[1].expd"
Yes, the sub-directories are randomly generated by IE and that should not be a concern so long as IE passes the full path to my helper application, which it unfortunately is not doing.
Installation of the mime helper application is not a concern. It is installed/updated by a global login script for all 10,000+ users worldwide. The mime helper is only invoked when the user clicks on an internal web page with a link to an installation of our Desktop browser application. That install is served back with a mime-type of "application/x-expeditors". The registration of the ".expd" / "application/x-expeditors" mime-type looks like this.
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.expd]
#="ExpeditorsInstaller"
"Content Type"="application/x-expeditors"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\ExpeditorsInstaller]
"EditFlags"=hex:00,00,01,00
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\ExpeditorsInstaller\shell]
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\ExpeditorsInstaller\shell\open]
#=""
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\ExpeditorsInstaller\shell\open\command]
#="\"C:\\projects\\desktop2\\WebInstaller\\WebInstaller.exe\" \"%1\""
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\MIME\Database\Content Type\application/x-expeditors]
"Extension"=".expd"
I had considered enumerating all of a user's IE cache entries but I would be concerned with how long it may take to examine them all or that I may end up finding an older cache entry before the current entry I am looking for. However, the bracketed filename suffix "[n]" may be the unique key.
I have tried wininet method GetUrlCacheEntryInfo but that requires the URL, not the virtual path handed over by IE.
My hope is that there is a Shell function that given a virtual path will hand back the physical path.
I believe the sub-directories created by IE are randomly generated, so you won't be able to guarantee that it will be named the same every time, and the problem I see with the registry method is that it only works while the file is still in the cache... emptying the cache would purge the file, requiring yet another installation.
Would it not be better to install this helper into application data?
I'm not sure about this but perhaps this may lead you in the right direction: try using URL cache functions from the wininet DLL: FindFirstUrlCacheEntry, FindNextUrlCacheEntry, FindCloseUrlCache for enumeration and when you locate an entry whose local file name matches the given path maybe you can use RetrieveUrlCacheEntryFile to retrieve the file.
I am using a similar system with the X-Appl browser to display WAML web applications and it works perfectly. Maybe you should have a look at how they managed to do it.
It looks like iexplore is passing the shell namespace "name" of the file rather than the filesystem name.
I don't think there is a documented way to be passed a shell item id on the command line - explorer does it to itself, but there are marshaling considerations, as shell item ids are (pointers to) binary data structures that are only valid in a single process.
What I might try doing is:
1. Call SHGetDesktopFolder which will return the root IShellFolder object of the shell namespace.
2. Call the IShellFolder::ParseDisplayName to turn the name you are given back into a shell item id list.
3. Try IShellFolder::GetDisplayNameOf with the SHGDN_FORPARSING flag - which, frankly, feels like we've just gone in a complete circle and are back where we started, because I think it's this API that's ultimately responsible for returning the "wrong" filesystem-relative path.
Some follow-up to close out this question.
It turned out the real issue was how I was creating the file handle with TFileStream. I changed it to open with fmOpenRead or fmShareDenyWrite, which solved what turned out to be a file locking issue:
srcFile := TFileStream.Create(physicalFilename, fmOpenRead or fmShareDenyWrite);
