Dealing with HTML source vs browser DOM - Ruby

I'm building a scraper in Ruby using Nokogiri, and I've noticed that the DOM created by parsing the source is sometimes different from the DOM the browser builds from the same source.
For example, the browser inserts tbody tags, it can repair tags if the document is not well formed, and JavaScript can make changes at runtime.
The problem is that I'm getting the desired element's path from the browser DOM (using an element inspector at this stage), but when I search for that element in the DOM built from the source, nothing is found because of these differences.
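Here is a small illustration of the mismatch, assuming only the nokogiri gem: Nokogiri keeps the table markup exactly as written, whereas a browser wraps the rows in a tbody element, so a path copied from the browser's inspector finds nothing.

require 'nokogiri'

# Nokogiri's HTML parser (libxml2) does not insert tbody around table rows.
html = '<table><tr><td>cell</td></tr></table>'
doc  = Nokogiri::HTML(html)

p doc.at_css('table > tr > td')          # found: the tr is a direct child of the table
p doc.at_css('table > tbody > tr > td')  # nil: the browser-style path has no match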
Is it possible to get the same DOM as the browser, and if so, how?

You can use a browser and get the DOM from the rendered page. There are several drivers, such as Selenium, headless WebKit, and Poltergeist, that can drive headless or real browsers.
Different browsers will probably build slightly different DOMs, since there is no single official reference implementation, so you will need to find the one that is the best fit for you.
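As a rough sketch, assuming the selenium-webdriver and nokogiri gems plus a local chromedriver: load the page in a real (or headless) browser, then hand the browser-built DOM to Nokogiri so that the paths you copy from the inspector actually match. The URL and selector are placeholders.

require 'selenium-webdriver'
require 'nokogiri'

options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--headless')            # drop this line to watch the browser
driver = Selenium::WebDriver.for(:chrome, options: options)

driver.get('https://example.com')             # placeholder URL

# page_source returns the browser's serialization of the current DOM
# (tbody tags, JavaScript changes and all), not the raw bytes the server sent.
doc = Nokogiri::HTML(driver.page_source)
puts doc.css('table tbody tr').size           # placeholder selector

driver.quit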

Related

How do you see your HTML edits after you reload (Firefox DevTools)?

I feel like this should be really easy to find if it exists, but I've been googling for 10 minutes. I don't see it mentioned in the official documentation.
I'm trying to troubleshoot a solution to a problem. The solution requires rearranging the order of the head child elements.
It's trivial to make that change with Firefox DevTools, but I don't know how to view the page with those changes; if I reload (Ctrl+R), the HTML goes back to the server version. Does this feature exist?
FWIW, I can find ways of doing this for CSS and JS changes.
There is currently (as of Firefox 92) no such feature for re-applying HTML changes after a reload.
What I found is a feature request in Mozilla's bug tracker.
At the moment, the only workaround is to transfer the changes you've made in the Inspector directly into the server-side script that creates the HTML output.
Note: Re-applying changes to HTML in the browser would require some heuristics, because the resources served over the network could change in the meantime.
For CSS, the simple DevTools solution is to replace a file entirely with the saved one. For JavaScript, there are heuristics to recognize where a line has moved when the code changes between two reloads. Those solutions cannot easily be transferred to HTML, though, as it is generated dynamically most of the time.

Firefox tool for finding difference in HTML before and after AJAX response?

I want to inspect a page that is using AJAX requests for making some slight changes to the HTML. The source is huge and the AJAX response is not JSON/XML but 1000 lines of JavaScript code, which manipulates the DOM.
There is Firediff, but it seems to only work with an outdated version of Firebug. What other tools are there for me to inspect the differences in HTML code before and after an AJAX call?
What you can do is copy the HTML before and after modifications and compare results.
To copy the HTML:
go to the 'Elements' tab,
right click on the body tag and choose "Copy",
paste the result somewhere safe.
Repeat after modification.
To compare the gathered results, you can use the online tool diffchecker.com or the console tool diff (Linux / OS X).
Alternatively, you can use an extension like DOMListener that lists all DOM changes in the console.
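If you would rather script the comparison, here is a rough sketch with the selenium-webdriver gem (the URL and the trigger selector are placeholders): snapshot the DOM before and after the action that fires the AJAX call, then feed both files to the diff tool mentioned above.

require 'selenium-webdriver'

driver = Selenium::WebDriver.for(:firefox)
driver.get('https://example.com')                  # placeholder URL

File.write('before.html', driver.page_source)
driver.find_element(css: '#load-more').click       # placeholder AJAX trigger
sleep 2                                            # crude wait for the response to land
File.write('after.html', driver.page_source)
driver.quit

system('diff', 'before.html', 'after.html')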

Firefox plugin for a web developer that shows all resources (JS, CSS, HTML) as a single unified file?

I'm developing with a really complex CMS, and sometimes I need to know whether something was actually sent along with my rendered HTML.
Since this is a huge CMS, I have at least 30 resources (JS, CSS) linked to a page, and going through each one, clicking it and searching for a string, is not the best way to do it.
I would like a plugin that gets all the resources from a page and merges them into a single text file, so I can search only once. Is this possible? Does something like this exist?
(I know Firebug can inspect an element and such, but a search option for a specific scenario - like a type=submit somewhere in a CSS file - is faster and more useful.)
The plugin you need is the Web Developer toolbar add-on for Firefox.
You can search all JavaScript files in plain text by clicking Information -> View JavaScript
You can search all CSS files in plain text by clicking CSS -> View CSS
In Firebug, when you inspect an element, it shows all the CSS rules that apply to it and a link to the source file involved. I think that is way more powerful than what you want.
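If you would rather script it than install an extension, here is a rough sketch of the same idea outside the browser, assuming the nokogiri gem and the open-uri standard library. The URL and output filename are placeholders, and resources injected by JavaScript at runtime will be missed.

require 'open-uri'
require 'nokogiri'

page_url = 'https://example.com'                   # placeholder URL
doc = Nokogiri::HTML(URI.open(page_url).read)

# Collect every linked script and stylesheet on the page.
resource_urls = doc.css('script[src]').map { |s| s['src'] } +
                doc.css('link[rel="stylesheet"]').map { |l| l['href'] }

merged = doc.to_html
resource_urls.each do |url|
  absolute = URI.join(page_url, url).to_s          # resolve relative paths
  merged << "\n\n==== #{absolute} ====\n" << URI.open(absolute).read
rescue OpenURI::HTTPError, URI::InvalidURIError
  # skip anything that fails to download or resolve
end

File.write('merged_resources.txt', merged)         # search this one file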

Selenium RC: Button is not clicked but test passes

I have a script which enters some data on the page and clicks the save button.
Here I used the HTML component ID for the save button:
selenium.click("StudentID:saveData");
I even provided a proper wait condition and also tried with an XPath locator.
The test passes. It doesn't throw any error message, but the button is not clicked and the data does not get updated.
Please let me know what the issue might be.
I had a similar problem and used a CSS selector instead. CSS selectors are much faster than XPath (and in my experience work better in general, though XPath is necessary for certain things).
If you are using Firefox, install the Firebug add-on; right-clicking on an element on the page gives you the option to copy its CSS path. I've found that I often have to tweak it to get it working properly, but it lets you reach very deeply nested elements quickly.
The W3C has a good page on CSS selectors here.
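Selenium RC is quite dated now, so as a rough sketch here is the same click with the newer selenium-webdriver gem and a CSS locator; the URL, the escaped ID, and the confirmation selector are all placeholders. The key point is to wait explicitly for evidence that the save happened, so the test cannot pass silently.

require 'selenium-webdriver'

driver = Selenium::WebDriver.for(:firefox)
driver.get('https://example.com/students')             # placeholder URL

# The colon in the element id has to be escaped in a CSS selector.
driver.find_element(css: '#StudentID\\:saveData').click

# Fail loudly unless a confirmation appears within 10 seconds.
wait = Selenium::WebDriver::Wait.new(timeout: 10)
wait.until { driver.find_element(css: '.save-confirmation').displayed? }  # placeholder selector

driver.quit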

Firefox Live DOM

Is it possible, through a plugin or a setting or something, to get Firefox to recognize the live DOM as the source code?
Basically, Firebug and other similar tools can recognize elements on the page that Firefox itself does not.
I understand that with these extensions I can see the changes made by JavaScript, but Firefox does not seem to fully recognize them.
I'll try to clarify.
If I load a page and view the source (Ctrl+U), I see what the server sent to Firefox, which is what Firefox ostensibly treats as the source code of the page. If that source contains JavaScript which alters the DOM and I hit Ctrl+U again, the code is not updated.
I am using a testing tool (the iMacros Firefox plugin) to automate functionality, but it does not recognize the updated DOM because Firefox does not. Firebug and similar tools can recognize these "live" updates. Does that help?
http://www.chapter31.com/2006/12/04/viewing-ajax-generated-source-code/
You can try using the Web Developer extension with its View Generated Source option.
I'm still not sure I understand your question, but I think what you're getting at is the Web Developer extension for Firefox, specifically its "View Generated Source" feature.
That will let you see the altered DOM.
Firebug gives you this ability:
For instance, check the HTML tab while running a jQuery ticker and watch the dynamic changes happen live in the DOM.
Usually, when I have weird issues with either the console or the DOM inspector in Firebug, I find that restarting the browser and validating your code is the way forward.
That said, I'm not really following your question; the document that Firebug shows is the same one in the Firefox window...?
It looks like the problem is not that you want Firefox to show the current DOM when you hit Ctrl+U, but that you want some automated testing tool to be able to test your web pages.
Perhaps you should use a testing tool that is suited to testing rich web applications; Selenium, for example, can do this.
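As a minimal sketch, assuming the selenium-webdriver gem and geckodriver, you can also ask the browser itself for the generated source instead of relying on Ctrl+U (the URL is a placeholder):

require 'selenium-webdriver'

driver = Selenium::WebDriver.for(:firefox)
driver.get('https://example.com')                      # placeholder URL

# outerHTML reflects every change JavaScript has made to the live DOM.
generated_source = driver.execute_script('return document.documentElement.outerHTML')
puts generated_source

driver.quit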
