Search for string in a webpage with frames - vbscript

I am trying to search the webpage for particular strings, and I am using the following code:
strPageContent = appIE.document.documentElement.InnerHTML
However, the page is quite complicated, with many frames and framesets, and the command above returns only the contents of some parent tags. How can I access the contents of a particular div element? Please see the markup below:
<html>
<head>...</head>
<frameset id="NavContent_Workhorse" frameborder="0" framespacing="0" rows="*,0">
<frameset id="Nav_Content" border="3" frameborder="1" framespacing="3" cols="240,*">
<frame name="nav" src="/interface/sidebar/sidebar.def" scrolling="no">
<frame name="content" src="/interface/home.def" frameborder="1" border="3" marginheight="0" marginwidth="0" scrolling="no">
#document
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang>
<head> … </head>
<body onload="LoadAdd();">
<div class="PageTitle" id="PageTitle">...</div>
<div class="ToolBar" id="PageBody">...</div>
<div id="error" class="none"> … </div>
<div id="content" style="display: block; height: 445px;">
<form id="frmUSR" method="post" target="workhorse" onsubmit="return false;" action="/setup/users_groups/users/insert.sdl?parentid=14">
<div id="wiz_1" class="wiz_vis">
<table class="frmTbl">
<thead class="title">
<tr>
<th class="label">
TEXT THAT I AM LOOKING FOR
</th>
Thanks in advance!
Edit:
I forgot to add that I did try the code below, but I get a null value
Set div = appIE.document.getElementById("wiz_1")
Edit 2:
The purpose of the script is to automate filling out the user creation forms on my company's system (webpage UI). I don't know why, but I cannot get a reference to anything that is below the main <frameset id="Nav_Content" border="3". I keep getting null values.

You can access the content of an element with a particular ID inside a frame like this:
Set frame = appIE.Document.parentWindow.window.frames("content")
Set div = frame.document.getElementById("wiz_1")
WScript.Echo div.innerHTML
WScript.Echo div.innerText
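If the frame name is not known in advance, or the element sits several framesets deep, the same lookup can be repeated recursively over every nested frame. The sketch below is written in JavaScript rather than VBScript, against the same DOM members the answer uses (`document`, `frames`, `getElementById`). The function name `findInFrames` is hypothetical, and it assumes each frame is a window-like object whose `frames` collection has a `length` and numeric indexes.

```javascript
// Depth-first search for an element id in a window and all of its nested frames.
// `win` is assumed to be a window-like object exposing `document` and `frames`.
function findInFrames(win, id) {
  var doc;
  try {
    doc = win.document; // reading this may throw, e.g. for a cross-origin frame
  } catch (err) {
    return null; // skip frames whose document cannot be read
  }
  var hit = doc && doc.getElementById ? doc.getElementById(id) : null;
  if (hit) {
    return hit;
  }
  var frames = win.frames || [];
  for (var i = 0; i < frames.length; i++) {
    var found = findInFrames(frames[i], id);
    if (found) {
      return found;
    }
  }
  return null; // not present in this window or any frame below it
}
```

Called as `findInFrames(window.top, 'wiz_1')`, it would return the first matching element or `null`; frames whose `document` cannot be read are skipped rather than aborting the search.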

Related

Selenium/Ruby - can't access save button in a modal popup

For the life of me I cannot get control of anything in this modal. I just want to click on this dang save button.
For all the other modals that are like this I am able to use this code successfully:
@driver.switch_to.frame @driver.find_element(:xpath, "//*[contains(@name, 'modal')]")
I get a not found error this time around though.
Here's the HTML of the modal I'm trying to access, including the button (NOTE: the modal number changes, so I can't hard-code modal3):
<html class=" ext-strict">
<head>
<body class=" ext-gecko ext-gecko2" keydownhandlerset="true">
<div id="patientChartsContainer">
<div id="patientSearch" style="display:none">
<div class="tooltipWrapper">
<div id="patientPhotoContainer"></div>
<div id="modalWindowContainer" class="">
<div class="axShadowLayer window local" style="display: block; width: 558px; height: 453px; left: 384px; right: auto; top: 228px; z-index: 1010;">
<div class="axShadowTopRow">
<div class="axShadowMiddleRow">
<div class="axShadowBottomRow">
<div class="axShadowContentLayer">
<div class="priModalWrapper">
<ul class="priModalHeader">
<div class="priModalContentBackground"></div>
<ul class="priModalFooter">
<div class="priModalContentWrapper">
<div class="priModalContentContainer">
<iframe id="modal1" class="windowFrame" name="modal1" src="/chart/ui/desktop/patientCharts/chartSummary/chartNote/createChartNote/createNoteModal.html" allowtransparency="true" frameborder="0">
<!DOCTYPE html>
<html class=" ext-strict" xmlns="http://www.w3.org/1999/xhtml">
<head>
<body class=" ext-gecko ext-gecko2" keydownhandlerset="true">
<div id="newNoteContainer" class="newEncounterNote">
<ul class="newNoteOptions">
<div class="axModalButtonsFooter">
<div class="footerButtonsWrapperRight">
<div class="buttonClass axSaveButton">
<span>Save</span>
I've tried a few things and variations of what I have below:
@driver.find_element(:xpath, "//div[@id='modalWindowContainer']/div/div[4]/div/ul/li/div[2]/div[3]").click
@driver.switch_to.frame @driver.find_element(:xpath, "//*[contains(@name, 'modal')]")
@driver.switch_to.default_content
@driver.switch_to.frame(@driver.find_element(:class, 'windowFrame'))
@driver.find_element(:css, "div.buttonClass.axSaveButton").click
@driver.switch_to.frame @driver.find_element(:class, 'windowFrame')
@driver.find_element(:xpath => "//button/span[contains(text(),'Save')]").click
/html/body/div1/div/div/div/span
Try:
@driver.switch_to.default_content
@driver.switch_to.frame @driver.find_element(:class, 'windowFrame')
@driver.find_element(:css, "div.buttonClass.axSaveButton > span").click
This will:
reset the frame context
find the proper iframe
click on the span element (which has the Save text, and might hold the listener) in the <div class="buttonClass axSaveButton"> tag

Simple wkhtmltopdf conversion with framesets creating empty pdf

We need to convert/provide our html-based in-app HelpSystem to an on-disc pdf for the client to view outside of the application.
I'm trying to use wkhtmltopdf with a very basic file (3 frames with links to simple .html files) but getting an empty .pdf when I run the following from the command line:
wkhtmltopdf "C:\Program Files (x86)\wkhtmltopdf\index.html" "c:\delme\test.pdf"
I know frames are somewhat deprecated but it’s what I’ve got to deal with. Are the frames causing the empty pdf?
Index.html:
<html>
<head>
<title>Help</title>
</head>
<frameset cols="28%, 72%">
<frameset rows="8%, 92%">
<frame noresize="noresize" src="Buttons.html" name="UPPERLEFT" />
<frame noresize="noresize" src="mytest2.html" name="LOWERLEFT" />
</frameset>
<frame noresize="noresize" src="mytest.html" name="RIGHT" />
</frameset>
</html>
mytest.html:
<html>
<body>
<p>
<b>This text is bold</b>
</p>
<p>
<strong>This text is strong</strong>
</p>
<p>
<em>This text is emphasized</em>
</p>
<p>
<i>This text is italic</i>
</p>
<p>
<small>This text is small</small>
</p>
<p>This is
<sub>subscript</sub> and
<sup>superscript</sup></p>
</body>
</html>
mytest2.html:
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<h2>The blockquote Element</h2>
<p>The blockquote element specifies a section that is quoted from another source.</p>
<p>Here is a quote from WWF's website:</p>
<blockquote cite="http://www.worldwildlife.org/who/index.html">For 50 years, WWF has been protecting the future of nature. The
world’s leading conservation organization, WWF works in 100 countries and is supported by 1.2 million members in the United
States and close to 5 million globally.</blockquote>
<p>
<b>Note:</b> Browsers usually indent blockquote elements.</p>
<h2>The q Element</h2>
<p>The q element defines a short quotation.</p>
<p>WWF's goal is to:
<q>Build a future where people live in harmony with nature.</q> We hope they succeed.</p>
<p>
<b>Note:</b> Browsers insert quotation marks around the q element.</p>
</body>
</html>
buttons.html:
<html>
<body>
<center>
<table>
<tr>
<td>
<form method="link" action="mytest.html" target="LOWERLEFT">
<input type="submit" value="Contents" />
</form>
</td>
<td>
<form method="link" action="mytest2.html" target="LOWERLEFT">
<input type="submit" value="Index" />
</form>
</td>
</tr>
</table>
</center>
</body>
</html>
Taken from the official wkhtmltopdf issues area from a code project member’s answer; emphasis is mine:
wkhtmltopdf calculates the TOC based on the H* (e.g. H1, H2 and so on)
tags in the supplied documents. It does not recurse into frames and
iframes.. It will nest dependend on the number, to make sure that it
does the right thing, it is good to make sure that you only have
H(n+1) tags under a H(n) tag and not H(n+k) for some k larger
then 1. 2000+ files sounds like a lot. You might run out of memory
while converting the output. If it does not work for you.. you could
try using the switch to dump the outline to a xml file, to see what it
would but into a TOC.

Best way to markup "mainContentOfPage"?

For other areas of a web page the markup is simple, e.g. navigation element, header, footer, sidebar.
Not so with mainContentOfPage. I've seen a number of different ways to implement this, most recently (and this is the one I found strangest) on schema.org itself:
<div itemscope itemtype="http://schema.org/Table">
<meta itemprop="mainContentOfPage" content="true"/>
<h2 itemprop="about">list of presidents</h2>
<table>
<tr><th>President</th><th>Party</th></tr>
<tr>
<td>George Washington (1789-1797)</td>
<td>no party</td>
</tr>
<tr>
<td>John Adams (1797-1801)</td>
<td>Federalist</td>
</tr>
...
</table>
</div>
I could use some examples; the main content of my page is in this case a search results page, but I would plan to use this on other pages too (homepage, product page, etc.)
Edit, I found some more examples:
Would this be valid? I found this on a blog:
<div id="main" itemscope itemtype="http://schema.org/WebPageElement" itemprop="mainContentOfPage">
<p>The content</p>
</div>
I also found this even simpler example on another blog (might be too simple?):
<div id="content" itemprop="mainContentOfPage">
<p>The content</p>
</div>
The mainContentOfPage property can be used on WebPage and expects a WebPageElement as value.
But Table is not a child of WebPage and true is not an expected value. So this example is in fact strange, as it doesn’t follow the specification.
A parent WebPage should use Table as value for mainContentOfPage:
<body itemscope itemtype="http://schema.org/WebPage">
<div itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/Table">
</div>
</body>
EDIT: Update
Your second example is the same as mine; it just uses the more general WebPageElement instead of Table. (Of course you’d still need a parent WebPage item, as in my example.)
Your third example is not in line with schema.org’s definition, as the value is Text and not the expected WebPageElement (or child) item.
A valid option would be:
<body itemscope itemtype="http://schema.org/WebPage">
<main itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/WebPageElement">
<div itemprop="about" itemscope="" itemtype="http://schema.org/Thing">
<h1 itemprop="name">whatever</h1>
</div>
</main>
</body>
Of course you may add related properties to top-level or nested elements, and change Thing into any other item type listed in the Full Hierarchy. I also recommend using mainEntity. The documentation still doesn't clarify whether it's really necessary, but according to the first example here, when using WebPage you may want to specify a mainEntity:
<body itemscope itemtype="http://schema.org/WebPage">
<header><h1 itemscope itemprop="mainEntity" itemtype="http://schema.org/Thing">whatever</h1></header>
<main itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/WebPageElement">
<div itemprop="about" itemscope="" itemtype="http://schema.org/Thing">
<h2 itemprop="name">whatever</h2>
</div>
</main>
</body>
I cannot tell whether this would also be valid:
<body itemscope itemtype="http://schema.org/WebPage">
<main itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/WebPageElement">
<div itemprop="mainEntity" itemscope="" itemtype="http://schema.org/Thing">
<h1 itemprop="name">whatever</h1>
</div>
</main>
</body>
The documentation doesn't say anything about setting mainEntity on nested items.
In any case, consider that "[...] Every web page is implicitly assumed to be declared to be of type WebPage [...]" as stated in the WebPage description, and that the use of HTML tags such as <main>, <footer> or <header> already gives information about what types of elements are used in a page. So if you do not actually need to add relevant information to those elements or to your web page itself, with proper use of HTML tags you could easily do without mainContentOfPage or even WebPage.

Phantom <span> element using ImportXML with XPath in Google Spreadsheet

I am trying to get the value of an element attribute from this site via importXML in Google Spreadsheet using XPath.
The attribute value I seek is content, found in the <span> with itemprop="price".
<div class="left" style="margin-top: 10px;">
<meta itemprop="currency" content="RON">
<span class="pret" itemprop="price" content="698,31 RON">
<p class="pret">Pretul tau:</p>
698,31 RON
</span>
...
</div>
I can access <div class="left"> but I can't get to the <span> element.
Tried using:
//span[@class='pret']/@content I get #N/A;
//span[@itemprop='price']/@content I get #N/A;
//div[@class='left']/span[@class='pret' and @itemprop='price']/@content I get #N/A;
//div[@class='left']/span[1]/@content I get #N/A;
//div[@class='left']/span/text() to get the text node of <span> I get #N/A;
//div[@class='left']//span/text() I get the text node of a <span> lower in div.left.
To get the text node of <span> I have to use //div[@class='left']/text(). But I can't use that text node because the layout of the span changes if a product is on sale, so I need the attribute.
It's like the span I'm looking for does not exist, although it appears in the development view of Chrome and in the page source, and all the XPath expressions work in the console using $x("").
I tried to generate the XPath directly from the developer tools by right-clicking and I get //*[@id='produs']/div[4]/div[4]/div[1]/span, which does not work. I also tried to generate the XPath with Firefox and plugins for FF and Chrome, to no avail. The XPath generated in these ways did not even work on sites I managed to scrape with "hand coded XPath".
Now, the strangest thing is that on this other site with apparently similar code structure the XPath //span[@itemprop='price']/@content works.
I have struggled with this for 4 days now. I'm starting to think it has something to do with the auto-closing meta tag, but why doesn't this happen on the other site?
Perhaps the following formulas can help you:
=ImportXML("http://...","//div[@class='product-info-price']//div[@class='left']/text()")
Or
=INDEX(ImportXML("http://...","//div[@class='product-info-price']//div[@class='left']"), 1, 2)
UPDATE
It seems that IMPORTXML fails when it cannot properly parse the entire document. An extract of the document, something like:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<div class="product-info-price">
<div class="left" style="margin-top: 10px;">
<meta itemprop="currency" content="RON">
<span class="pret" itemprop="price" content="698,31 RON">
<p class="pret">Pretul tau:</p>
698,31 RON
</span>
<div class="resealed-info">
» Vezi 1 resigilat din aceasta categorie
</div>
<ul style="margin-left: auto;margin-right: auto;width: 200px;text-align: center;margin-top: 20px;">
<li style="color: #000000; font-size: 11px;">Rata de la <b>28,18 RON</b> prin BRD</li>
<li style="color: #5F5F5F;text-align: center;">Pretul include TVA</li>
<li style="color: #5F5F5F;">Cod produs: <span style="margin-left: 0;text-align: center;font-weight: bold;" itemprop="identifier" content="mol:GA-Z87X-UD3H">GA-Z87X-UD3H</span> </li>
</ul>
</div>
<div class="right" style="height: 103px;line-height: 103px;">
<form action="/?a=shopping&sa=addtocart" method="post" id="add_to_cart_form">
<input type="hidden" name="product-183641" value="on"/>
<img src="/templates/marketonline/images/pag-prod/buton_cumpara.jpg"/>
</form>
</div>
</div>
</html>
works with the following XPath query:
"//div[@class='product-info-price']//div[@class='left']//span[@itemprop='price']/@content"
UPDATE
It occurs to me that one option is that you can use Apps Script to create your own ImportXML function, something like:
/* CODE FOR DEMONSTRATION PURPOSES */
function MyImportXML(url) {
var found, html, content = '';
var response = UrlFetchApp.fetch(url);
if (response) {
html = response.getContentText();
if (html) content = html.match(/<span class="pret" itemprop="price" content="(.*)">/gi)[0].match(/content="(.*)"/i)[1];
}
return content;
}
Then you can use as follows:
=MyImportXML("http://...")
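One caveat with the demonstration function: `match` returns `null` when nothing matches, so the chained `[0]` would throw on a page without the expected span. A slightly more defensive sketch of just the extraction step, in plain JavaScript so it can be tried outside Apps Script, might look like this. `extractPriceContent` is a hypothetical name, and the code assumes double-quoted attributes.

```javascript
// Returns the content attribute of the first <span ... itemprop="price" ...> tag,
// or an empty string when no such tag (or no content attribute) is found.
function extractPriceContent(html) {
  // Locate the opening tag of a span carrying itemprop="price";
  // the order of the other attributes is not assumed.
  var tag = /<span\b[^>]*\bitemprop\s*=\s*"price"[^>]*>/i.exec(html || '');
  if (!tag) {
    return '';
  }
  // Read the content attribute from that opening tag only.
  var attr = /\bcontent\s*=\s*"([^"]*)"/i.exec(tag[0]);
  return attr ? attr[1] : '';
}
```

Inside the Apps Script function, the result of `response.getContentText()` could be passed to this helper in place of the chained `match` calls.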
At this time, the web page referred to in the first link doesn't include a span tag with itemprop="price", but the following XPath returns 639:
//b[@itemprop='price']
It looks to me like the problem was that the meta tag was not XHTML compliant, but now all the meta tags are properly closed.
Before:
<meta itemprop="currency" content="RON">
Now
<meta itemprop="priceCurrency" content="RON" />
For web pages that are not XHTML compliant, another solution should be used instead of IMPORTXML, such as IMPORTDATA with REGEXEXTRACT, or Google Apps Script with the UrlFetch Service and the JavaScript match function, among other alternatives.
Try something like this:
# assumes `tree` is an lxml tree already parsed from the page
print('content by key', tree.xpath('//*[@itemprop="price"]')[0].get('content'))
or
nodes = tree.xpath('//div/meta/span')
for node in nodes:
    print('content =', node.get('content'))
But I haven't tried that.

How to convert Spring <form:checkboxes> to AngularJS equivalent?

I am migrating segments of Spring MVC code into AngularJS and hit the following problem:
In Spring, there is a nice tag that will take a Collection (or Map) of items and a property path to magically generate a list of checkboxes and have the selected ones checked;
<form:checkboxes path="selectedItems" items="${items}" />
where selectedItems is a List of values and items is a Map of values to names.
Yes, I can display all the checkboxes using this code:
<span ng-repeat="(key, value) in items" >
<input type="checkbox" ng-value="key" > <label class="label" >{{value}}</label>
</span>
But the trick is: how can we auto-select the checkboxes based on the values in selectedItems, and then keep the binding updated when the user selects or unselects other items?
Directives give your HTML tags more power. I wrote a simple directive that takes an "items" property, generates a list of checkboxes, and checks the selected ones according to each item's status.
HTML: define the data in your controller and add the <checkboxes> tag
<!DOCTYPE html>
<html lang="en" ng-app="myApp">
<head>
<meta charset="utf-8">
<title>Angular test</title>
</head>
<body>
<script src="//ajax.googleapis.com/ajax/libs/angularjs/1.0.6/angular.min.js"></script>
<script src="js/app.js"></script>
<div ng-controller="CheckboxesCtrl">
<checkboxes items="items"></checkboxes>
<button ng-click="changeData()">change data</button>
</div>
</body>
</html>
app.js: define the controller and directive
var app = angular.module('myApp',[]);
app.controller('CheckboxesCtrl',function($scope){
//fake data
$scope.items = [{label:"A",checked:true},{label:"B",checked:true},{label:"C",checked:false}];
//data binding test
$scope.changeData = function(){
$scope.items[0].checked=false;
$scope.items[0].label="changed A";
}
});
//checkboxes directive
app.directive('checkboxes',function(){
return {
restrict: "E",
scope:{
items: "="
},
template: '<div ng-repeat="item in items">'+
'<input type="checkbox" ng-value="{{item.label}}" ng-checked="item.checked" />'+
' <label class="label"> {{item.label}} </label>'+
'</div>'
};
});
I used ng-checked directive to process checkbox status binding. You could try my test JSFiddle.
Hope this is helpful for you.
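The directive above expects each item to carry its own `checked` flag, whereas the question starts from a map of values to names (`items`) plus a list of selected values (`selectedItems`). A minimal sketch of converting between the two shapes, in plain JavaScript, could look like the following. The function names `buildCheckboxModel` and `collectSelected` are hypothetical and not part of AngularJS.

```javascript
// Build one entry per checkbox from a value->label map and a list of selected values.
function buildCheckboxModel(items, selectedItems) {
  return Object.keys(items).map(function (key) {
    return {
      value: key,
      label: items[key],
      checked: selectedItems.indexOf(key) !== -1 // pre-check the selected values
    };
  });
}

// Collect the values of all checked entries, e.g. before posting back to the server.
function collectSelected(model) {
  return model
    .filter(function (item) { return item.checked; })
    .map(function (item) { return item.value; });
}
```

In a controller, the result of `buildCheckboxModel` could be assigned to `$scope.items`, and `collectSelected($scope.items)` called when the form is submitted. Binding each checkbox with `ng-model="item.checked"` instead of `ng-checked` would be one way to have user clicks update the flags.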
