I wanted to share how to retrieve the content of an HTML page after it has been changed by AJAX.
The following code returns the old page:
public class Test {
public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException, InterruptedException {
String url = "valid html page";
WebClient client = new WebClient(BrowserVersion.FIREFOX_17);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setRedirectEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(true);
client.getOptions().setCssEnabled(true);
client.getOptions().setUseInsecureSSL(true);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);
client.setAjaxController(new NicelyResynchronizingAjaxController());
HtmlPage page = client.getPage(url);
System.out.println(page.getWebResponse().getContentAsString());
}
}
What is happening here?
The answer is that page.getWebResponse() refers to the initial page, that is, the content as the server originally sent it.
To get the updated content, we have to use the page variable itself:
package utils;
import java.io.IOException;
import java.net.MalformedURLException;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
public class Test {
public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException, InterruptedException {
String url = "valid html page";
WebClient client = new WebClient(BrowserVersion.FIREFOX_17);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setRedirectEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(true);
client.getOptions().setCssEnabled(true);
client.getOptions().setUseInsecureSSL(true);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);
client.setAjaxController(new NicelyResynchronizingAjaxController());
HtmlPage page = client.getPage(url);
System.out.println(page.asXml());
System.out.println(page.getWebResponse().getContentAsString());
}
}
I found the hint in the following link
http://htmlunit.10904.n7.nabble.com/Not-expected-result-code-from-htmlunit-td28275.html
Ahmed Ashour writes:
Hi,
You shouldn't use WebResponse, which is meant to get the actual content from the server.
You should use htmlPage.asText() or .asXml()
Yours,
Ahmed
In my Spring Boot project I am using docx4j to load a file from the target folder. The file exists (the System.out.println("exists !!") check prints to the console), but I still cannot load it. Any solution? Here is the code:
public void testDocx4j() throws Docx4JException, FileNotFoundException {
File file = ResourceUtils.getFile("classpath:compare.docx");
if(file.exists()){
System.out.println("exists !!");
}
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(file);
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
}
I was trying to load the file with docx4j.
The following works for me:
import java.io.IOException;
import java.io.InputStream;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.utils.ResourceUtils;
public class LoadAsResource {
public static void main(String[] args) throws Docx4JException, IOException {
InputStream is = ResourceUtils.getResource("sample-docxv2.docx");
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.load(is);
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
System.out.println(documentPart.getXML());
}
}
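One possible reason the stream-based version behaves differently, offered as an assumption about the asker's setup rather than a diagnosis: a classpath resource is not always a plain file on disk (inside a packaged jar it is not), so it cannot always be opened as a java.io.File, but it can be read as a stream. The sketch below uses only the JDK and reads the JDK's own Object.class as a stand-in resource; the class name ClasspathResourceDemo and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;

final class ClasspathResourceDemo {

    /** Returns the URL scheme under which a classpath resource is found. */
    static String schemeOf(Class<?> owner, String resourceName) {
        URL url = owner.getResource(resourceName);
        if (url == null) {
            throw new IllegalArgumentException("Resource not found: " + resourceName);
        }
        return url.getProtocol();
    }

    /** Reads the first four bytes of a classpath resource through a stream. */
    static int readMagic(Class<?> owner, String resourceName) {
        try (InputStream in = owner.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + resourceName);
            }
            int magic = 0;
            for (int i = 0; i < 4; i++) {
                int b = in.read();
                if (b < 0) {
                    throw new IOException("Resource is shorter than 4 bytes");
                }
                magic = (magic << 8) | b;
            }
            return magic;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The resource is not addressed with a "file" URL, yet the stream is readable.
        System.out.println("scheme = " + schemeOf(Object.class, "Object.class"));
        System.out.printf("magic  = 0x%08X%n", readMagic(Object.class, "Object.class"));
    }
}
```

If the scheme printed for your own resource is anything other than file, loading it through an InputStream, as in the answer above, is the safer route.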
Hello everyone. I'm learning Spring Boot by following some tutorials, and I'm stuck on showing an image in the browser. I have managed to save it under a path called "user-photos", but it is saved as a separate folder rather than inside resources, and when I try to show it in the browser I get a 404 error.
This is the controller I use to save a user with the image:
@PostMapping("/users/save")
public String saveUser(User user,
RedirectAttributes redirectAttributes,
@RequestParam("image") MultipartFile multipartFile) throws IOException {
if (!multipartFile.isEmpty()) {
String fileName = StringUtils.cleanPath(multipartFile.getOriginalFilename());
user.setPhotos(fileName);
User savedUser = userService.save(user);
String uploadDir = "ShopmeWebParent/ShopmeBackEnd/user-photos/" + savedUser.getId();
FileUploadUtil.saveFile(uploadDir, fileName, multipartFile);
}
// userService.save(user);
redirectAttributes.addFlashAttribute("message", "The user has been saved successfully.");
return "redirect:/users";
}
Next I created an MVC configuration class, MvcConfig, so that the images can be served:
package com.shopme.admin;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.ResourceHandlerRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
import java.nio.file.Path;
import java.nio.file.Paths;
@Configuration
public class MvcConfig implements WebMvcConfigurer {
@Override
public void addResourceHandlers(ResourceHandlerRegistry registry) {
String dirName = "user-photos";
Path userPhotosDir = Paths.get(dirName);
String userPhotosPath = userPhotosDir.toFile().getAbsolutePath();
registry.addResourceHandler("/user-photos/**")
.addResourceLocations("file:/" + userPhotosPath + "/");
}
}
Finally I created an img tag to show it, with the following code snippet:
<img th:if="${user.photos != null}" th:src="@{${user.photosImagePath}}" >
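If you haven't solved it yet: in the controller, remove the "ShopmeWebParent/ShopmeBackEnd/" prefix from uploadDir and leave only "user-photos/" + savedUser.getId(). That way the files are saved under the same user-photos directory that MvcConfig serves.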
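To see why the two paths have to agree, here is a minimal sketch using only java.nio.file. It assumes, as in the question, that both relative paths are resolved against the application's working directory; the class UploadPathDemo and its methods are hypothetical and not part of the project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class UploadPathDemo {

    /** Resolves a relative directory against the current working directory. */
    static Path resolve(String relativeDir) {
        return Paths.get(relativeDir).toAbsolutePath().normalize();
    }

    /** True when the upload directory lies inside the directory the resource handler serves. */
    static boolean isServed(String uploadDir, String servedDir) {
        return resolve(uploadDir).startsWith(resolve(servedDir));
    }

    /** Builds a file: location via Path.toUri() instead of concatenating "file:/" by hand. */
    static String resourceLocation(String servedDir) {
        String uri = resolve(servedDir).toUri().toString();
        // toUri() only appends the slash when the directory already exists.
        return uri.endsWith("/") ? uri : uri + "/";
    }

    public static void main(String[] args) {
        String served = "user-photos";
        // Path used by the controller in the question: outside the served directory.
        System.out.println(isServed("ShopmeWebParent/ShopmeBackEnd/user-photos/1", served));
        // Path after removing the prefix: inside the served directory.
        System.out.println(isServed("user-photos/1", served));
        System.out.println(resourceLocation(served));
    }
}
```

The first check reports false and the second true, which matches the 404: the handler looks in one directory while the upload lands in another.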
I'm trying my hand at HtmlUnit and have run into trouble when I try to click an area with JavaScript.
Here is the code:
import java.io.IOException;
import java.net.MalformedURLException;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlArea;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlMap;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
public class ToPost {
/**
* @param args
* @throws IOException
* @throws MalformedURLException
* @throws FailingHttpStatusCodeException
*/
public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
HtmlPage page;
final WebClient webClient = new WebClient();
page = webClient.getPage("http://www.hidrografico.pt/previsao-mares.php");
System.out.println(page.getTitleText());
HtmlPage pagePortoLeixoes = setPort(page, "362,64,440,90");
System.out.println("Are they the same? "+page.asXml().equals(pagePortoLeixoes.asXml()));
}
private static HtmlPage setPort(HtmlPage page, String coordinatesPort) throws IOException {
HtmlMap map = page.getHtmlElementById("FPMap1");
Iterable<HtmlElement> childAreas = map.getChildElements();
HtmlArea tempArea;
for (HtmlElement htmlElement : childAreas) {
tempArea = (HtmlArea) htmlElement;
if(tempArea.getCoordsAttribute().equals(coordinatesPort)){
System.out.println("Found Leixoes! --> "+ tempArea.asXml());
return tempArea.click();
}
}
return null;
}
}
I don't show it here, but in my full code I double-checked that I really do not end up on the page I want.
What is happening? Why doesn't the click work?
HtmlUnit's click() often works poorly when "complex" JavaScript is involved.
http://htmlunit.sourceforge.net/apidocs/com/gargoylesoftware/htmlunit/html/HtmlElement.html#click()
Simulates clicking on this element, returning the page in the window that has the focus after the element has been clicked. Note that the returned page may or may not be the same as the original page, depending on the type of element being clicked, the presence of JavaScript action listeners, etc
In this case, you'll have to find another way to get the data.
What I did notice is that the .rss links give you direct links to the individual cities,
e.g. http://www.hidrografico.pt/previsao-mares-aveiro.php
Another way would be to forge a POST request (use a tool such as HttpFox to check which requests the browser sends for the page you are stuck on).
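Forging a POST request could look roughly like the sketch below, which uses the JDK's java.net.http API (Java 11+) rather than HtmlUnit. The form field name porto and its value are placeholders, not the site's real parameters; take the actual ones from the request your browser sends. The sketch only builds the request, so it runs without network access; FormPostDemo and its methods are hypothetical names.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class FormPostDemo {

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds (but does not send) a form POST to the given URL. */
    static HttpRequest buildPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("porto", "Leixoes"); // hypothetical field name and value
        HttpRequest request =
                buildPost("http://www.hidrografico.pt/previsao-mares.php", fields);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(encodeForm(fields));
        // To send it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```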
Possibly showing Javascript Test Support
package htmlunitpoc;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
/**
*
* @author
*/
public class HtmlPoc {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws Exception {
WebClient wc = new WebClient();
HtmlPage page = (HtmlPage) wc.getPage("http://www.google.com");
HtmlForm form = page.getFormByName("f");
HtmlSubmitInput button = (HtmlSubmitInput) form.getInputByName("btnG");
HtmlPage page2 = (HtmlPage) button.click();
}
}
But I get:
Nov 17, 2010 3:41:14 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
BUILD SUCCESSFUL (total time: 4 seconds)
This does not help, as it does not run as a unit test and does not show pass/fail results.
I am using NetBeans 6.9.1.
That's because you haven't written it as a unit test. HtmlUnit is somewhat misnamed: it's not a test runner itself but a "headless browser" that lets you interact with a website from Java as if you were a browser.
Try this instead:
import junit.framework.TestCase;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
public class HtmlPoc
extends TestCase
{
public void test()
throws Exception
{
WebClient wc = new WebClient();
HtmlPage page = (HtmlPage) wc.getPage("http://www.google.com");
HtmlForm form = page.getFormByName("f");
HtmlSubmitInput button = (HtmlSubmitInput) form.getInputByName("btnG");
HtmlPage page2 = (HtmlPage) button.click();
assertNotNull( page2 ) ;
}
}
Are there any libraries or APIs available to convert MHT files to images? Can we use Universal Document Converter software to do this? Appreciate any thoughts.
If you really want to do this programmatically,
MHT: Archived Web Page. When you save a Web page as a Web archive in Internet Explorer, the Web page saves this information in Multipurpose Internet Mail Extension HTML (MHTML) format with a .MHT file extension. All relative links in the Web page are remapped and the embedded content is included in the .MHT file.
you can use a JEditorPane to convert this into an image:
import javax.imageio.ImageIO;
import javax.swing.*;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.io.File;
import java.io.IOException;
import java.net.URL;
public class Test {
private static volatile boolean loaded;
public static void main(String[] args) throws IOException {
loaded = false;
URL url = new URL("http://www.google.com");
JEditorPane editorPane = new JEditorPane();
editorPane.addPropertyChangeListener(new PropertyChangeListener() {
public void propertyChange(PropertyChangeEvent evt) {
if (evt.getPropertyName().equals("page")) {
loaded = true;
}
}
});
editorPane.setPage(url);
while (!loaded) {
Thread.yield();
}
File file = new File("out.png");
componentToImage(editorPane, file);
}
public static void componentToImage(Component comp, File file) throws IOException {
Dimension prefSize = comp.getPreferredSize();
System.out.println("prefSize = " + prefSize);
BufferedImage img = new BufferedImage(prefSize.width, comp.getPreferredSize().height,
BufferedImage.TYPE_INT_ARGB);
Graphics graphics = img.getGraphics();
comp.setSize(prefSize);
comp.paint(graphics);
ImageIO.write(img, "png", file);
}
}
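The while (!loaded) loop above spins until the "page" property changes and never gives up if the page fails to load. One alternative, sketched here on the assumption that a bounded wait is acceptable, is a CountDownLatch released by the same property change. To stay runnable without a display or network, the sketch fires the event from a PropertyChangeSupport on a background thread instead of a real JEditorPane; PageLoadLatchDemo and its methods are hypothetical names.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class PageLoadLatchDemo {

    /** Returns a listener that releases the latch when the named property changes. */
    static PropertyChangeListener releaseOn(String propertyName, CountDownLatch latch) {
        return evt -> {
            if (propertyName.equals(evt.getPropertyName())) {
                latch.countDown();
            }
        };
    }

    /** Waits up to timeoutMillis; returns true only if the latch was released in time. */
    static boolean await(CountDownLatch latch, long timeoutMillis) {
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Fires one property change from a background thread and reports whether the waiter saw "page". */
    static boolean simulateLoad(String firedProperty, long timeoutMillis) {
        CountDownLatch loaded = new CountDownLatch(1);
        PropertyChangeSupport source = new PropertyChangeSupport(new Object());
        source.addPropertyChangeListener(releaseOn("page", loaded));
        Thread loader = new Thread(
                () -> source.firePropertyChange(firedProperty, null, "loaded"));
        loader.start();
        return await(loaded, timeoutMillis);
    }

    public static void main(String[] args) {
        System.out.println("page event seen:  " + simulateLoad("page", 2000));
        System.out.println("other event seen: " + simulateLoad("other", 200));
    }
}
```

With a real JEditorPane the same listener would be registered through editorPane.addPropertyChangeListener(releaseOn("page", loaded)) before calling setPage(url), and componentToImage would run only if await returns true.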