Use HtmlUnit to search google

The following code is an attempt to search Google and return the results as text or HTML.
The code was copied almost entirely from snippets online, and I see no reason why it should not return results from the search. How do you return Google search results, using HtmlUnit to submit the search query, without a browser?
import com.gargoylesoftware.htmlunit.WebClient;
import java.io.*;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import java.net.*;
public class GoogleSearch {
    public static void main(String[] args) throws IOException, MalformedURLException {
        final WebClient webClient = new WebClient();
        HtmlPage page1 = webClient.getPage("http://www.google.com");
        HtmlInput input1 = page1.getElementByName("q");
        input1.setValueAttribute("yarn");
        HtmlSubmitInput submit1 = page1.getElementByName("btnK");
        page1 = submit1.click();
        System.out.println(page1.asXml());
        webClient.closeAllWindows();
    }
}

There must be some browser detection that changes the generated HTML, because when inspecting the HTML with page1.getWebResponse().getContentAsString(), the submit button is named btnG and not btnK (which is not what I observe in Firefox). Make this change, and the result will be the expected one.

I've just checked this. There are actually two button names, one for each of two Google pages:
btnK: on the Google home page (where there is one long text box in the middle of the screen). Here the button's id is 'gbqfa'.
btnG: on the Google results page (where the main text box is at the top of the screen). Here the button's id is 'gbqfb'.
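Both answers come down to the same check: look at the HTML that HtmlUnit actually received and use whichever button name is in it. Here is a minimal sketch of that check on a raw HTML string, using only the JDK. The sample markup and the helper name firstPresentName are made up for illustration; in the real program the string would come from page1.getWebResponse().getContentAsString(). The regular expression is a rough probe, not an HTML parser.

```java
import java.util.regex.Pattern;

class ButtonNameProbe {
    // Returns the first candidate that occurs as a name=... attribute value in the HTML,
    // or null if none of the candidates is present.
    static String firstPresentName(String html, String... candidates) {
        for (String candidate : candidates) {
            Pattern p = Pattern.compile("name\\s*=\\s*[\"']?" + Pattern.quote(candidate) + "\\b");
            if (p.matcher(html).find()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for what the server sent to HtmlUnit.
        String html = "<form><input name=\"q\">"
                + "<input type=\"submit\" name=\"btnG\" value=\"Search\"></form>";
        System.out.println(firstPresentName(html, "btnK", "btnG"));
    }
}
```

The returned name could then be passed to page1.getElementByName(...) instead of hard-coding btnK.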

Related

Wicket page is refreshed if use ajax after form is submitted with target=_blank

I have a preview button. When the user presses preview, the form is submitted to a new tab that shows a PDF file containing the form data.
I use a custom SubmitLink to do that:
SubmitResourceLink
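public abstract class SubmitResourceLink extends SubmitLink implements IResourceListener {
    private final IResource resource;
    // Constructor inferred from the usage below; the original snippet omits it.
    public SubmitResourceLink(String id, Form<?> form, IResource resource) {
        super(id, form);
        this.resource = resource;
    }
    @Override
    public final void onResourceRequested() {
        Attributes a = new Attributes(RequestCycle.get().getRequest(), RequestCycle.get().getResponse(), null);
        resource.respond(a);
    }
}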
Usage on the form:
new SubmitResourceLink("previewBtn", form, new JasperReportsResource() {
    private static final long serialVersionUID = -2596569027102924489L;
    @Override
    public byte[] getData(Attributes attributes) {
        return control.getExportPreviewByteStream(estimateModel.getObject());
    }
}) {
    private static final long serialVersionUID = 1L;
    @Override
    protected String getTriggerJavaScript() {
        String js = super.getTriggerJavaScript();
        js = "document.getElementById('" + form.getMarkupId() + "').target='_blank';" + js;
        return js;
    }
    @Override
    public void onSubmit() {
        form.add(AttributeModifier.append("target", Model.of("_blank")));
        processInputs(form);
        onResourceRequested();
    }
}.setDefaultFormProcessing(false);
When I press preview, a new tab is opened. But when I then type into any Ajax component (e.g. AutoCompleteTextField), the Ajax response is <ajax-response><redirect>....</redirect></ajax-response> and the page is refreshed.
After pressing preview, I want to keep using the current form normally.
Thanks.

This is caused by the "stale page protection" in Wicket.
The first click opens the same page instance in a new tab/window. This increments the page's renderCount counter, i.e. it says "this page has been rendered N times".
The links in Wicket look like ?2-1.ILinkListener-component~path. Here '2' is the page id and '1' is the page render count.
So the links in tab1 have renderCount 'N', and the links in tab2 - 'N+1'.
Clicking on a link in tab1 will fail with StalePageException that tells Wicket "the user is trying to use an obsolete version of the page. Please render the latest version of the page so the user can try again".
This protection is needed because the user may do many actions in tab2, e.g. replace a panel that replaces/hides the link the user wants to click in tab1. Without such protection Wicket would either fail with ComponentNotFoundException while trying to click the Link or, even worse, do the wrong action if the Link/Button is in a repeater and the repeater has changed its items in tab2.
To overcome your problem you should open a new page instance in tab2, i.e. the link still submits the form, but in onSubmit() it does something like setResponsePage(getPage().getClass()). This way it won't render the current page instance an (N+1)th time.
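To make the render-count mechanism concrete, here is a toy model in plain Java. It is not Wicket's implementation: the class and method names are invented, and the stale check is reduced to comparing the render count encoded in the URL with the page's current one. Only the URL shape ?2-1.ILinkListener-component~path is taken from the explanation above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RenderCountDemo {
    // Matches the "?<pageId>-<renderCount>." prefix of a listener URL.
    private static final Pattern PREFIX = Pattern.compile("^\\?(\\d+)-(\\d+)\\.");

    // Returns {pageId, renderCount} parsed from the URL.
    static int[] parse(String url) {
        Matcher m = PREFIX.matcher(url);
        if (!m.find()) {
            throw new IllegalArgumentException("not a listener URL: " + url);
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    // Toy stand-in for the stale check: the link is stale when the render count
    // it was generated with differs from the page's current render count.
    static boolean isStale(String url, int currentRenderCount) {
        return parse(url)[1] != currentRenderCount;
    }

    public static void main(String[] args) {
        String linkInTab1 = "?2-1.ILinkListener-component~path";
        // Assumption for the demo: the preview rendered the same page instance
        // again in tab2, so its render count went from 1 to 2.
        int renderCountAfterPreview = 2;
        System.out.println(isStale(linkInTab1, renderCountAfterPreview));
    }
}
```

In this model the link from tab1 is reported as stale once the same page instance has been rendered again, which is why responding with a fresh page instance in tab2 leaves tab1's links usable.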

HTML UNIT FOUND: INTERNAL ERROR: Oops! Exiting

I'm new to HtmlUnit. When I run the code below:
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
public class WeBrowser {
    public void homePage() throws Exception {
        final WebClient webClient = new WebClient();
        // Get the first page
        final HtmlPage page1 = webClient.getPage("http://some_url");
        // Get the form that we are dealing with and within that form,
        // find the submit button and the field that we want to change.
        final HtmlForm form = page1.getFormByName("myform");
        final HtmlSubmitInput button = form.getInputByName("submitbutton");
        final HtmlTextInput textField = form.getInputByName("userid");
        // Change the value of the text field
        textField.setValueAttribute("root");
        // Submit the form by clicking the button and get back the second page.
        final HtmlPage page2 = button.click();
        webClient.closeAllWindows();
    }
}
it shows the following error:
Exception in thread "main" org.apache.bcel.verifier.exc.AssertionViolatedException:
FOUND:
INTERNAL ERROR: Oops!
Exiting!!
at org.apache.bcel.verifier.exc.AssertionViolatedException.main(AssertionViolatedException.java:102)

I had almost the same problem yesterday. Although it looks odd, try debugging rather than running, or add a delay before clicking the submit button:
Thread.sleep(10000);
Also take a look at this answer: AssertionViolatedException

I was getting the same error using Eclipse to run Java code that had previously worked. After using 'project>>clean>>all projects' the problem disappeared. I don't know what triggered it, but all projects in the workspace were affected.

Failing to extract content by xpath using HtmlUnit

I'm trying to extract the title from this Maltese news page
http://www.maltarightnow.com/Default.asp?module=news&at=Inawgurat+%26%23289%3Bnien+%26%23289%3Bdid+f%27Marsalforn&t=a&aid=99839603&cid=19
using the following XPath
html/body/table/tbody/tr[2]/td/table/tbody/tr[4]/td/table/tbody/tr[1]/td[1]/table/tbody/tr/td/table/tbody/tr/td[2]/table[3]/tbody/tr[1]/td/h1
(It isn't pretty, but this XPath was generated by Chrome and makes sense, since the page lacks element ids.)
I'm extracting the title programmatically using HtmlUnit in Java. Here's the code. I've extracted the news content and article date using the same code (obviously with a different XPath).
public static void main(String[] args) {
    WebClient webClient = new WebClient();
    HtmlPage page = null;
    try {
        page = webClient.getPage("http://www.maltarightnow.com/?module=news&at=Inawgurat+%26%23289%3Bnien+%26%23289%3Bdid+f%27Marsalforn&t=a&aid=99839603&cid=19");
    } catch (FailingHttpStatusCodeException | IOException e) {
    }
    String text = ((DomElement)page.getFirstByXPath("html/body/table/tbody/tr[2]/td/table/tbody/tr[4]/td/table/tbody/tr[1]/td[1]/table/tbody/tr/td/table/tbody/tr/td[2]/table[3]/tbody/tr[1]/td/h1")).asText();
    System.out.println(text);
}
However, it throws a NullPointerException for that XPath at
((DomElement)page.getFirstByXPath("html/body/table/tbody/tr[2]/td/table/tbody/tr[4]/td/table/tbody/tr[1]/td[1]/table/tbody/tr/td/table/tbody/tr/td[2]/table[3]/tbody/tr[1]/td/h1")).asText();
The DomElement is not being found, yet I'm sure it's there; Chrome created the XPath, after all.
What could be the cause of this?
Thanks in advance.

It is not that easy. You should:
1. Look at the markup HtmlUnit is actually building, with page.asXml().
2. Correct the XPath you're traversing so that it matches whatever HtmlUnit output in the previous step.
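The two steps can be tried out without HtmlUnit. Below is a minimal sketch using the JDK's own XPath engine on a made-up document: a path containing a tbody step finds nothing when the parsed tree has no tbody element, and the lookup returns null instead of throwing. The markup, class name, and paths are hypothetical; whether the real page differs from Chrome's view in this particular way is an assumption, which is exactly what comparing against the asXml() output would settle.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathMismatchDemo {
    // Returns the text of the first node matching the expression, or null if nothing matches.
    static String firstText(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical tree as the parser built it: no tbody element.
        String xml = "<html><body><table><tr><td><h1>Title</h1></td></tr></table></body></html>";
        // Path as another tool reported it, with a tbody step: no match.
        System.out.println(firstText(xml, "/html/body/table/tbody/tr/td/h1"));
        // Path corrected against the tree that was actually parsed: match.
        System.out.println(firstText(xml, "/html/body/table/tr/td/h1"));
    }
}
```

The same null check applies to the result of getFirstByXPath before calling asText() on it, so a mismatched path shows up as a clear message instead of a NullPointerException.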

HTML Unit - login to secure website using form - can't connect to page after form

I'm a newbie to Java and HtmlUnit, so any help would be greatly appreciated. Thanks in advance.
I'm trying to log in to a web page that is protected by username and password authentication, by submitting a username and password to the form on the page with HtmlUnit, mirroring the actions of a web browser. The website itself uses form-based authorisation.
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.Iterator;
import java.util.Set;
//Import htmlunit classes
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import com.gargoylesoftware.htmlunit.util.Cookie;
//This Class attempts to submit user and password credentials
//and mirrors how a login button would be clicked on a webpage:
public class submitForm {
    public static void main(String[] args) throws Exception {
        WebClient webClient = new WebClient();
        // Get the first page
        HtmlPage page1 = (HtmlPage) webClient.getPage("http://cmdbjr/frameset.php?ci_name=&ci_id=&ci_type=");
        // Get the form that we are dealing with and within that form,
        // find the submit button and the field that we want to change.
        HtmlForm form = page1.getFormByName("loginform");
        // Enter login and passwd
        form.getInputByName("user_id").setValueAttribute("#####");
        form.getInputByName("password").setValueAttribute("#####");
        // Click "Sign In" button/link
        page1 = (HtmlPage) form.getInputByValue("Log In").click();
        // I added the cookie section but this returns a null pointer exception
        Set<Cookie> cookie = webClient.getCookieManager().getCookies();
        if (cookie != null) {
            Iterator<Cookie> i = cookie.iterator();
            while (i.hasNext()) {
                webClient.getCookieManager().addCookie(i.next());
            }
        }
        // Get page as Html
        String htmlBody = page1.getWebResponse().getContentAsString();
        // Save the response in a file
        String filePath = "c:/temp/test_out.html";
        BufferedWriter bw = new BufferedWriter(new FileWriter(new File(filePath)));
        bw.write(htmlBody);
        bw.close();
        // Change the value of the text field
        // userField.setValueAttribute("#####");
        // passwordField.setValueAttribute("#####");
        // Now submit the form by clicking the button and get back the second page.
        // final HtmlPage page2 = button.click();
        webClient.closeAllWindows();
    }
}
If I run the code without the cookie section, the page I am trying to reach after the login page doesn't appear; instead an error page appears saying I'm not connected to the internet.
If the code is run with the cookie section, this error is returned:
Exception in thread "main" java.lang.NullPointerException at contentWeb.main(contentWeb.java:26)
I'm new to Java and HtmlUnit, so any help at all would be greatly appreciated.
Thanks in advance.

I replicated your example with my Yahoo Mail login credentials and it worked. However, I added webClient.setThrowExceptionOnScriptError(false); to ignore exceptions on script errors.

Can Selenium verify text inside a PDF loaded by the browser?

My web application loads a pdf in the browser. I have figured out how to check that the pdf has loaded correctly using:
verifyAttribute
xpath=//embed/@src
{URL of PDF goes here}
It would be really nice to be able to check the contents of the pdf with Selenium - for example verify that some text is present. Is there any way to do this?

While this is not natively supported, I have found a couple of ways using the Java driver. One way is to have the PDF open in your browser (with Adobe Acrobat installed), use keyboard shortcuts to select all text (CTRL+A) and copy it to the clipboard (CTRL+C), and then verify the text in the clipboard, e.g.:
protected String getLastWindow() {
    return session().getEval("var windowId; for(var x in selenium.browserbot.openedWindows ){windowId=x;} ");
}
@Test
public void testTextInPDF() throws InterruptedException {
    session().click("link=View PDF");
    String popupName = getLastWindow();
    session().waitForPopUp(popupName, PAGE_LOAD_TIMEOUT);
    session().selectWindow(popupName);
    session().windowMaximize();
    session().windowFocus();
    Thread.sleep(3000);
    session().keyDownNative("17"); // Press the CTRL key
    session().keyPressNative("65"); // 65 is the key code for A
    session().keyUpNative("17"); // Release the CTRL key
    Thread.sleep(1000);
    session().keyDownNative("17"); // Press the CTRL key
    session().keyPressNative("67"); // 67 is the key code for C
    session().keyUpNative("17"); // Release the CTRL key
    TextTransfer textTransfer = new TextTransfer();
    assertTrue(textTransfer.getClipboardContents().contains("Some text in my pdf"));
}
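The numbers 17, 65 and 67 in the test above match the virtual key codes defined in java.awt.event.KeyEvent, which as far as I know is what the native key methods expect. A small sketch that derives the strings from the named constants instead of hard-coding them; the class name NativeKeyCodes is made up for illustration.

```java
import java.awt.event.KeyEvent;

class NativeKeyCodes {
    // String forms of the AWT virtual key codes, ready to pass where the test uses "17", "65", "67".
    static final String CTRL = String.valueOf(KeyEvent.VK_CONTROL);
    static final String A = String.valueOf(KeyEvent.VK_A);
    static final String C = String.valueOf(KeyEvent.VK_C);

    public static void main(String[] args) {
        System.out.println(CTRL + " " + A + " " + C);
    }
}
```

With these, session().keyDownNative(NativeKeyCodes.CTRL) reads more clearly than the bare "17".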
Another way, still in Java, is to download the PDF and then convert it to text with PDFBox; see http://www.prasannatech.net/2009/01/convert-pdf-text-parser-java-api-pdfbox.html for an example of how to do this.

You cannot do this natively with WebDriver. However, the PDFBox API can be used to read the content of the PDF file. You first have to switch focus to the browser window where the PDF file is open. You can then parse the content of the PDF file and search it for the desired text string.
Here is code that uses the PDFBox API to extract the text of a PDF document so it can be searched.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import org.pdfbox.cos.COSDocument;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;
public class pdfToTextConverter {
    public static void pdfToText(String path_to_PDF_file, String Path_to_output_text_file) throws FileNotFoundException, IOException {
        // Parse text from a PDF into a string variable.
        // Use the method parameters, not string literals, as the paths.
        File f = new File(path_to_PDF_file);
        PDFParser parser = new PDFParser(new FileInputStream(f));
        parser.parse();
        COSDocument cosDoc = parser.getDocument();
        PDDocument pdDoc = new PDDocument(cosDoc);
        PDFTextStripper pdfStripper = new PDFTextStripper();
        String parsedText = pdfStripper.getText(pdDoc);
        // Release the document once the text has been extracted
        pdDoc.close();
        System.out.println(parsedText);
        // Write parsed text into a file
        PrintWriter pw = new PrintWriter(Path_to_output_text_file);
        pw.print(parsedText);
        pw.close();
    }
}
JAR Source
http://sourceforge.net/projects/pdfbox/files/latest/download?source=files
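Once the text has been extracted, searching it is plain string work. Here is a minimal sketch, assuming the extracted text may contain line breaks or runs of whitespace inside a phrase, so both sides are normalized before comparing. The class name, method names, and sample text are hypothetical.

```java
class PdfTextSearch {
    // Collapses every run of whitespace (including line breaks) into a single space.
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    // True if the phrase occurs in the extracted text, ignoring whitespace differences.
    static boolean containsPhrase(String extractedText, String phrase) {
        return normalize(extractedText).contains(normalize(phrase));
    }

    public static void main(String[] args) {
        // Made-up stand-in for the parsedText string produced by the converter above.
        String parsedText = "Invoice\nSome text in\nmy pdf\n";
        System.out.println(containsPhrase(parsedText, "Some text in my pdf"));
    }
}
```

A test can then assert on containsPhrase(parsedText, ...) instead of reading through the printed output.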

Unfortunately you can not do this at all with Selenium

There is a way.
Before you click the link, you can obtain its href value:
element.GetAttribute("href")
Then, after the PDF loads, you can get the current URL:
driver.Url
Then you can just check whether the URL contains the href.
It's not the best, but it's better than nothing.
