PhantomJS and sessions

I can't maintain a session while moving through a website once logged in.
I can successfully log in to the site (note that whatever the page is, after the login you are redirected to the homepage), but then I have to move to another page. First I tried page.open(), then page.evaluate() changing the window's location.href property, but in both cases the result is, unfortunately, that I'm not logged in anymore. I traced the login status by rendering the page on every page load event with incremental PNG names (1.png, 2.png, etc.). I also tried the --cookies-file=cookies.txt parameter, but it didn't help much.
My questions are:
What is the best way to "move" through site pages with phantomjs?
Is there a specific way to handle sessions in these cases (maybe sending cookies manually on each .open(), just saying)?
Thanks for the help.

Sessions require cookies. You have to pass an extra argument to phantomjs:
--cookies-file=/path/to/cookies.txt
Look here for more info.
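For example (login.js here is just a placeholder for your own script):

    phantomjs --cookies-file=cookies.txt login.js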
Edit: Does your cookies.txt contain anything?

I had issues with the cookies-file approach. The file would be written, and I could see cookie information in it, but future requests would still be treated as unauthenticated. As a last-ditch effort, I simply took the page.cookies array, serialized it to JSON, and saved it to a file. My next script would open the file, deserialize it into a variable, and set page.cookies to that variable. Sure enough, this worked! Just thought I would pass it along.
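For what it's worth, a minimal sketch of that save/restore approach (the URLs and the mycookies.json file name are placeholders; page.cookies, fs.read/fs.write, and phantom.addCookie are standard PhantomJS APIs):

    // save-cookies.js -- run this after a successful login
    var fs = require('fs');
    var page = require('webpage').create();

    page.open('http://example.com/login-landing', function (status) {
        // serialize the current cookie array to JSON and write it to disk
        fs.write('mycookies.json', JSON.stringify(page.cookies), 'w');
        phantom.exit();
    });

    // restore-cookies.js -- run this in a later session
    var fs = require('fs');
    var page = require('webpage').create();

    // deserialize the saved cookies and set page.cookies to that variable
    // (calling phantom.addCookie(c) per cookie is an equivalent, more global route)
    page.cookies = JSON.parse(fs.read('mycookies.json'));

    page.open('http://example.com/members-only', function (status) {
        page.render('logged-in.png'); // should now show the authenticated page
        phantom.exit();
    });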

Related

Getting the JSON file when triggering AJAX

I'm writing a crawler to get the content from a website which uses AJAX.
There is a "show more" button at the bottom of the page, and my original approach was to use Selenium.PhantomJS to pretend to be a web browser, but it works on some websites and not on others.
I'm wondering if there is some way I can directly get the underlying JSON file behind the AJAX action. Please give me some details, thanks.
By the way, I'm using Python.
I understand this is less of a Python problem than a scraping problem in general (and I understand you meant "scraping" instead of "crawling": a scraper reads/parses/processes one page, whereas a crawler processes multiple pages and their relation to each other).
You can get the JSON file directly, given you know its URL. If you don't (for example because the URL changes from time to time), you might need to search through the JavaScript files on the page manually to find out how the URL is generated.
Once you know the JSON file's URL, it's quite simple. As you already seem to know how to get the HTML of the "main" page, you can use your existing code to get the JSON file.
I'm not familiar with PhantomJS, but I reckon it's easier to get the JSON file immediately instead of simulating an AJAX request (if that's even possible with Phantom).
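Since the rest of this thread is about PhantomJS, here is a minimal sketch of hitting such an endpoint directly with it (the endpoint URL is made up; in Python you'd do the same with whatever HTTP client you already use):

    // fetch-json.js -- open the AJAX endpoint directly instead of clicking "show more"
    var page = require('webpage').create();

    page.open('http://example.com/api/items?page=2', function (status) {
        if (status === 'success') {
            // for a non-HTML response, page.plainText holds the raw body
            var data = JSON.parse(page.plainText);
            console.log('got ' + data.length + ' items'); // assumes the endpoint returns an array
        }
        phantom.exit();
    });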

Ajax generated pages with different URLs

I couldn't really word the title very well, but here's my problem: I've got a webpage that reads from a database each time the user clicks a button, and the content of part of the page is then replaced.
Because it is an AJAX load, everything is done in the background, and so the URL stays the same. This wasn't a problem at all until I realised that I will want a different Facebook comments box for each set of content that is loaded - so that if someone comments, it is posted to their Facebook profile, and people who click the link are taken to that specific content.
So... what I need is some way of referencing each set of content, and I've found a site that does exactly that (I'm sure there are a lot of them).
Here's the link.
Each set of content has a different 'hash code' (I don't know the actual name for it) appended to the URL - in this case the code is "#1922934" - which allows people to post links to that specific set of content on Facebook etc., and also allows a different Facebook comment box for each set of content.
Does anyone know how such a set-up can be achieved or how these 'hash codes' work?
Here's a document from Wikipedia on it:
http://en.wikipedia.org/wiki/Fragment_identifier
The main idea is that URI fragments are used because they don't cause a page reload. They also can be used to refer to anchors on a web page.
What I would do is, on page load, use JavaScript to read the URI fragment (location.hash), then make a request to your server to load the comments etc. The URI fragment cannot be read by the server; it is only available on the client (browser).
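A small sketch of that client-side flow (the /content endpoint and the content element are assumptions about your own page):

    // on load, read the fragment (e.g. "#1922934") and fetch the matching content
    function loadContentForHash() {
        var id = location.hash.slice(1); // strip the leading '#'
        if (!id) return;
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/content?id=' + encodeURIComponent(id)); // endpoint is hypothetical
        xhr.onload = function () {
            document.getElementById('content').innerHTML = xhr.responseText;
        };
        xhr.send();
    }

    window.addEventListener('load', loadContentForHash);
    // also react when the user moves between fragments without a full reload
    window.addEventListener('hashchange', loadContentForHash);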
Sounds like you want something like SammyJS.

get title and url of the previous page on Magento

I'm trying to get the title and URL of the previous page the customer visited in Magento.
We can get the URL via the $_SERVER['HTTP_REFERER'] variable, but I'm not sure how to get its title.
I would appreciate any help on this issue, or maybe different approaches to displaying the previous page (URL + title) in Magento.
This information isn't sent normally, so you may want to add an observer that saves the last page title in the session. The trick is that this will save the last page from /any/ tab, not necessarily the referring page. You could save all page titles this way, but you'd have to hold onto everything (they could click the same link twice consecutively, etc.). This may also run into issues with full-page caching.
I know this is old, but FYI, you can add JavaScript to the page and use
alert(document.referrer);
Given the share of web users with JavaScript turned on (~98%) and most sites' requirements, it's safe. You would be able to make any adjustments to the DOM with that. Just one way of getting what you need.
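For example, to put the referrer into the page rather than an alert (note that document.referrer only gives you the URL; the previous page's title is not exposed to the client, and the element id below is made up):

    var ref = document.referrer;
    if (ref) {
        var link = document.createElement('a');
        link.href = ref;
        link.textContent = ref; // the title would have to be fetched and parsed separately
        document.getElementById('previous-page').appendChild(link);
    }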

Virtuemart Display Page after Remote redirect

I am aiming to create a payment module. Its users will be redirected away from the site's URL so that the transaction can be processed by a third party at a different URL. I would then like customers to be redirected back to a generic 'success' page that notifies them the order was a success. I have tried redirecting to the default success page (checkout.thankyou.php), but I get lots of errors; all the constants etc. that the application requires have obviously been lost during the redirect.
I would like to be able to retrieve the theme currently enabled in the configuration and use it to insert some basic HTML into the view. I would also like to access the database to perform some queries.
Can anybody advise? I am very stuck, and cannot find anything useful in the documentation! Thank you.
Can you be more specific about what type of information you want on your success page? If you just want basic HTML, there's no reason you can't simply write a basic Joomla article and redirect to that instead of trying to redirect to a VM partial. Again, if it's just basic HTML (no data from the transaction), you can use a code inspector (like Firefox's Inspect Element) to track down the CSS classes you like from the template and use them in your Joomla article to make it look like the VM template. You can find most of them in components/com_virtuemart/themes/default/themes.css.
If you need to display actual transaction data in your Thank You message, be prepared for a bit more work. You're probably going to have to write a cookie containing the record data BEFORE it gets sent offsite, and then read the cookie just prior to rendering the Thank You page.
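A rough client-side sketch of that cookie idea (the cookie name, record fields, and provider URL are all made up; in Virtuemart you'd more likely set the cookie server-side in PHP, but the shape is the same):

    // before sending the customer offsite: stash the record data in a cookie
    var order = { id: 12345, total: '49.99' }; // made-up record data
    document.cookie = 'vm_order=' + encodeURIComponent(JSON.stringify(order)) +
                      '; path=/; max-age=3600';
    location.href = 'https://payment-provider.example/checkout';

    // later, just prior to rendering the Thank You page: read it back
    var match = document.cookie.match(/(?:^|; )vm_order=([^;]*)/);
    if (match) {
        var order = JSON.parse(decodeURIComponent(match[1]));
        // render order.id and order.total into the Thank You message
    }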

Firefox 3 doesn't allow 'Back' to a form if the form resulted in a redirect last time

Greetings,
Here's the problem I'm having. I have a page which redirects directly to another page the first time it is visited. If the user clicks 'back', though, the page behaves differently and instead displays content (tracking session IDs to make sure this is the second time the page has been loaded). To do this, I tell the user's browser to disable caching for the relevant page.
This works well in IE7, but Firefox 3 won't let me click 'back' to a page that resulted in a redirect. I assume it does this to prevent the typical back-->redirect again loop that frustrates so many users. Any ideas for how I may override this behavior?
Alexey
EDIT: The page which we redirect to is an external site over which we have no control. Server-side redirects won't work because they wouldn't create a 'back' entry in the browser's history.
To quote:
Some people in the thread are talking about server-side redirect, and redirect headers (same thing)... keep in mind that we need client-side redirection which can be done in two ways:
a) A META header - Not recommended, and has some problems
b) Javascript, which can be done in at least three ways ("location", "location.href" and "location.replace()")
A server-side redirect won't and shouldn't activate the back button, and can't display the typical "You'll be redirected now" page... so it's no good (it's what we're doing at the moment, actually, where you're immediately redirected to the "lucky" page).
I think the Mozilla team takes a step in the right direction by breaking this particularly annoying pattern. Finding a way around it somehow defeats the purpose, doesn't it?
Instead of redirecting on first encounter, you could simply make your page render differently when a user hits it the first time. Should be easy enough on the server side, since you already have the code that is able to make that distinction.
You can get around this by creating an iframe and saving the state of the page in a form field in the iframe before doing the redirect. All browsers save the form fields of an iframe.
This page has a really good description of how to get it working. This is the same technique Google Maps uses when you click on map search results.
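A rough sketch of the trick (state.html, the element ids, and the external URL are all assumptions; the key point is that browsers restore an iframe's form fields from session history when the user comes back):

    // the page embeds a small same-origin iframe:
    //   <iframe id="state-frame" src="state.html"></iframe>
    // and state.html contains a single hidden field:
    //   <input type="hidden" id="state" value="">
    window.addEventListener('load', function () {
        var field = document.getElementById('state-frame')
                            .contentDocument.getElementById('state');
        if (field.value !== 'visited') {
            field.value = 'visited'; // first visit: mark the iframe's field...
            location.href = 'http://external.example/landing'; // ...then redirect
        } else {
            // the user came back: show the real content instead of redirecting again
        }
    });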
I'm strongly in favor of the Firefox behaviour.
The most basic way to redirect is to let the server send HTTP status code 302 + Location header back to the client. This way the client (typically a browser) will not place the request URI into its history, but just resend the same request to the advocated URI.
Now it seems that Firefox has started to apply that behaviour to server responses that attempt redirection another way, e.g. via JavaScript in the onload event.
If you want the browser not to display a page, I think the best solution is if the server does not send the page in the first place.
It's possibly an aid to eliminating repeated actions.
A common way of doing things is:
page 1 -> [Action] -> page 2 -> redirect to page 2 without the action parameters.
Now if you were permitted to click the back button in this situation and visit the page without the redirect, the action would be blindly re-performed.
Instead, firefox presumes the server sent a redirect header for a good reason.
Note, though, that you can still have content delivered after the redirect header: sending a redirect header (at least in PHP) doesn't terminate execution, so in theory, if you were to ignore the redirect request, you would see the page doing weird stuff.
(I circumvent this by routing all our redirects through the same function call, where I call an explicit terminate directly after the redirect, because people tend to assume when coding that this is how it behaves.)
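To make the pattern above concrete, here is a hedged sketch of Post/Redirect/Get using Node's built-in http module (the port and paths are arbitrary); note the res.end() directly after the redirect header, playing the role of the explicit terminate mentioned above:

    // prg-sketch.js -- Post/Redirect/Get with Node's built-in http module
    var http = require('http');

    http.createServer(function (req, res) {
        if (req.method === 'POST' && req.url === '/action') {
            // ...perform the action here (e.g. write to the database)...
            res.writeHead(302, { Location: '/page2' }); // redirect without the action parameters
            res.end(); // stop right after the redirect, so no stray content follows it
        } else if (req.url === '/page2') {
            res.writeHead(200, { 'Content-Type': 'text/html' });
            res.end('<p>Done. Reloading or going back will not repeat the action.</p>');
        } else {
            res.writeHead(404);
            res.end();
        }
    }).listen(8080);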
In the URL bar of Firefox, type about:config, then change the setting
browser.sessionstore.postdata
from 0 to 1.
