HtmlUnit - HTMLParser (page with special characters)

I have a resource (a static HTML page) that I want to use for testing. But when I load the static page, some of its characters come out in the wrong encoding. I tried the StringEscapeUtils class, but it doesn't work.
My function:
import org.apache.commons.lang3.StringEscapeUtils;

private HtmlPage getStaticPage() throws IOException, ClassNotFoundException {
final Reader reader = new InputStreamReader(this.getClass().getResourceAsStream("/" + "testPage" + ".html"), "UTF-8");
final StringWebResponse response = new StringWebResponse(StringEscapeUtils.unescapeHtml4(IOUtils.toString(reader)), StandardCharsets.UTF_8, new URL(URL_PAGE));
return HTMLParser.parseHtml(response, WebClientFactory.getInstance().getCurrentWindow());
}

final Reader reader = new InputStreamReader(this.getClass().getResourceAsStream("/" + "testPage" + ".html"), "UTF-8");
For the reader, use the actual encoding of the file (from your comment, I guess this is windows-1252 in your case).
Then read the file into a String (e.g. using commons-io).
Then you can process it like this:
final StringWebResponse tmpResponse = new StringWebResponse(anHtmlCode,
new URL("http://www.wetator.org/test.html"));
final WebClient tmpWebClient = new WebClient(aBrowserVersion);
try {
final HtmlPage tmpPage = HTMLParser.parseHtml(tmpResponse, tmpWebClient.getCurrentWindow());
return tmpPage;
} finally {
tmpWebClient.close();
}
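A minimal sketch of the "read the file with its real encoding" step, using only the JDK (Java 9+ for readAllBytes) instead of commons-io. ResourceText and readAll are hypothetical names, and windows-1252 is only the answerer's guess, so substitute whatever encoding the file was actually saved in. In the question's code, the stream would come from getClass().getResourceAsStream("/testPage.html") and the resulting String would be passed to new StringWebResponse(...).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResourceText {

    // Reads the whole stream and decodes it with the charset the file was saved in.
    static String readAll(InputStream in, Charset fileCharset) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), fileCharset);
        }
    }

    public static void main(String[] args) throws IOException {
        // "Café €" as saved by a windows-1252 editor: é is 0xE9, € is 0x80
        byte[] raw = {0x43, 0x61, 0x66, (byte) 0xE9, 0x20, (byte) 0x80};
        Charset cp1252 = Charset.forName("windows-1252");

        String right = readAll(new ByteArrayInputStream(raw), cp1252);
        String wrong = readAll(new ByteArrayInputStream(raw), StandardCharsets.UTF_8);

        System.out.println(right.equals("Caf\u00e9 \u20ac")); // true
        System.out.println(wrong.indexOf('\uFFFD') >= 0);     // true: the bytes are not valid UTF-8
    }
}
```

Decoding the same bytes as UTF-8 produces replacement characters, which is the kind of garbling that unescapeHtml4 cannot repair: the damage happens when the bytes are decoded, not in the HTML markup.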
If you still have problems, please make a simple sample of your page that shows the problem and upload it here together with your code.

Related

How to read and write files in a reactive way using InputStream and OutputStream

I am trying to read an Excel file, manipulate it or add new data to it, and write it back out. I am also trying to make this a completely reactive process using Flux and Mono. The idea is to return the resulting file or byte array via a web service.
My question is: how do I get an InputStream and OutputStream in a non-blocking way?
I am using the Apache POI library to read and generate the Excel file.
I currently have a solution based on a mix of Mono.fromCallable() and blocking code that gets the InputStream.
For example, the web service part is as follows:
@GetMapping(value = API_BASE_PATH + "/download", produces = "application/vnd.ms-excel")
public Mono<ByteArrayResource> download() {
Flux<TimeKeepingEntry> createExcel = excelExport.createDocument(false);
return createExcel.then(Mono.fromCallable(() -> {
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
excelExport.getWb().write(outputStream);
return new ByteArrayResource(outputStream.toByteArray());
}).subscribeOn(Schedulers.elastic()));
}
And the processing of the file:
public Flux<TimeKeepingEntry> createDocument(boolean all) {
Flux<TimeKeepingEntry> entries = null;
try {
InputStream inputStream = new ClassPathResource("Timesheet Template.xlsx").getInputStream();
wb = WorkbookFactory.create(inputStream);
Sheet sheet = wb.getSheetAt(0);
log.info("Created document");
if (all) {
//all entries
} else {
entries = service.findByMonth(currentMonthName).log("Excel Export - retrievedMonths").sort(Comparator.comparing(TimeKeepingEntry::getDateOfMonth)).doOnNext(timeKeepingEntry-> {
this.populateEntry(sheet, timeKeepingEntry);
});
}
} catch (IOException e) {
log.error("Error Importing File", e);
}
return entries;
}
This works well enough, but it is not very much in line with Flux and Mono. Some guidance here would be good. I would prefer to have the whole sequence non-blocking.
Unfortunately, the WorkbookFactory.create() operation is blocking, so you have to perform that operation using imperative code. However, fetching each timeKeepingEntry can be done reactively. Your code would look something like this:
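public Flux<TimeKeepingEntry> createDocument() {
return Flux.generate(
this::getWorkbookSheet, // state supplier: opens the workbook (blocking) once
(sheet, sink) -> {
// assumes the helper returns null once the sheet has no more entries
TimeKeepingEntry entry = getNextTimeKeepingEntryFrom(sheet);
if (entry == null) {
sink.complete();
} else {
sink.next(entry);
}
return sheet; // the generator must return the state for the next round
},
this::closeWorkbook); // called when the Flux terminates or is cancelled
}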
This will keep the workbook in memory, but will fetch each entry on demand when the elements of the Flux are requested.
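A self-contained sketch of the same idea, assuming reactor-core 3.3+ on the classpath. To keep it runnable without Apache POI, the "sheet" is a hypothetical in-memory List<String> and openSheetBlocking stands in for WorkbookFactory.create(...). The blocking open is wrapped in Mono.fromCallable and moved to Schedulers.boundedElastic() (newer Reactor versions deprecate the Schedulers.elastic() used in the question in its favor), and the rows are then emitted on demand by Flux.generate, with an Iterator as the generator state.

```java
import java.util.Iterator;
import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.publisher.SynchronousSink;
import reactor.core.scheduler.Schedulers;

public class ReactiveRowsSketch {

    // Hypothetical stand-in for the blocking WorkbookFactory.create(...) call:
    // here it simply returns the rows of an in-memory "sheet".
    static List<String> openSheetBlocking() {
        return List.of("row 1", "row 2", "row 3");
    }

    // Emits one row per downstream request; the iterator is the generator's state.
    static Flux<String> rows(List<String> sheet) {
        return Flux.generate(
                sheet::iterator,                                  // state supplier
                (Iterator<String> it, SynchronousSink<String> sink) -> {
                    if (it.hasNext()) {
                        sink.next(it.next());
                    } else {
                        sink.complete();                          // no more rows
                    }
                    return it;                                    // must return the state
                },
                it -> System.out.println("cleanup: close the workbook here"));
    }

    // Runs the blocking open on a scheduler intended for blocking work,
    // then streams the rows lazily.
    static Flux<String> createDocument() {
        return Mono.fromCallable(ReactiveRowsSketch::openSheetBlocking)
                .subscribeOn(Schedulers.boundedElastic())
                .flatMapMany(ReactiveRowsSketch::rows);
    }

    public static void main(String[] args) {
        System.out.println(createDocument().collectList().block());
    }
}
```

Because generate only produces a row when one is requested, an operator such as take(2) stops the iteration early and still triggers the cleanup callback.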

How to extract and manipulate data within a Nifi processor

I'm trying to write a custom Nifi processor which will take in the contents of the incoming flow file, perform some math operations on it, then write the results into an outgoing flow file. Is there a way to dump the contents of the incoming flow file into a string or something? I've been searching for a while now and it doesn't seem that simple. If anyone could point me toward a good tutorial that deals with doing something like that it would be greatly appreciated.
The Apache NiFi Developer Guide documents the process of creating a custom processor very well. In your specific case, I would start with the Component Lifecycle section and the Enrich/Modify Content pattern. Any other processor which does similar work (like ReplaceText or Base64EncodeContent) would be good examples to learn from; all of the source code is available on GitHub.
Essentially you need to implement the #onTrigger() method in your processor class, read the flowfile content and parse it into your expected format, perform your operations, and then re-populate the resulting flowfile content. Your source code will look something like this:
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
FlowFile flowFile = session.get();
if (flowFile == null) {
return;
}
final ComponentLog logger = getLogger();
AtomicBoolean error = new AtomicBoolean();
AtomicReference<String> result = new AtomicReference<>(null);
// This uses a lambda function in place of a callback for InputStreamCallback#process()
session.read(flowFile, in -> {
long start = System.nanoTime();
// Read the flowfile content into a String
// TODO: May need to buffer this if the content is large
try {
final String contents = IOUtils.toString(in, StandardCharsets.UTF_8);
result.set(new MyMathOperationService().performSomeOperation(contents));
long stop = System.nanoTime();
if (getLogger().isDebugEnabled()) {
final long durationNanos = stop - start;
DecimalFormat df = new DecimalFormat("#.###");
getLogger().debug("Performed operation in " + durationNanos + " nanoseconds (" + df.format(durationNanos / 1_000_000_000.0) + " seconds).");
}
} catch (Exception e) {
error.set(true);
getLogger().error(e.getMessage() + " Routing to failure.", e);
}
});
if (error.get()) {
session.transfer(flowFile, REL_FAILURE);
} else {
// Again, a lambda takes the place of OutputStreamCallback#process()
FlowFile updatedFlowFile = session.write(flowFile, out -> {
final String resultString = result.get();
final byte[] resultBytes = resultString.getBytes(StandardCharsets.UTF_8);
// TODO: This can use a while loop for performance
out.write(resultBytes, 0, resultBytes.length);
out.flush();
});
session.transfer(updatedFlowFile, REL_SUCCESS);
}
}
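The TODO in the read callback notes that loading the whole content into a String may not scale. As a hedged alternative, the math can be done while streaming, through the session.write(flowFile, StreamCallback) overload, whose callback receives both the original content's InputStream and the new content's OutputStream. The sketch below shows only the stream-processing part in plain Java (11+); summing the comma-separated numbers on each line is a hypothetical stand-in for MyMathOperationService, and LineMathSketch is an illustrative name.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class LineMathSketch {

    // Reads comma-separated numbers line by line and writes one sum per line,
    // so only a single line is held in memory at a time.
    static void sumEachLine(InputStream in, OutputStream out) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isBlank()) {
                continue; // skip empty lines
            }
            long sum = 0;
            for (String field : line.split(",")) {
                sum += Long.parseLong(field.trim());
            }
            writer.write(Long.toString(sum));
            writer.write('\n');
        }
        writer.flush(); // flush but do not close: the caller owns the streams
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "1,2,3\n10, 20\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        sumEachLine(new ByteArrayInputStream(input), out);
        System.out.print(out.toString(StandardCharsets.UTF_8)); // prints 6 and 30 on separate lines
    }
}
```

Inside onTrigger this would become something like session.write(flowFile, (in, out) -> LineMathSketch.sumEachLine(in, out)), with error handling that routes the flow file to REL_FAILURE when a line cannot be parsed.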
Daggett is right that the ExecuteScript processor is a good place to start because it will shorten the development lifecycle (no building NARs, deploying, and restarting NiFi to use it) and when you have the correct behavior, you can easily copy/paste into the generated skeleton and deploy it once.

Image without extension in src not loading only in IE, works perfectly in all other browsers

I have the following HTML code:
<img title="hotelThumbImage" id="hotelThumbImage01" width="140px" height="129px"
src="/b2c/images/?url=FixedPkgB2c/FF-252-325"/>
In IE the image does not load, while in other browsers such as Firefox and Chrome it renders correctly.
Related question: How to make a Servlet call from the UI which returns the content itself and place an img tag using script in the output?
My project is suffering from this too. It happens because IE refuses to download or display files whose actual format differs from the one their extension suggests. This seems to be a security measure: otherwise, malicious code could be disguised as an image file simply by changing the file's extension.
Firefox and Chrome are smart enough to display the file as an image as long as its content actually is an image, but IE, it seems, takes no chances.
You'll have to add the extension that matches your image's format for it to display in IE.
Edit: It's also possible that your server is sending the file with a header denoting plain text. Again, Firefox and Chrome are smart enough to handle it, but IE isn't. See: https://stackoverflow.com/a/32988576/4793951
Welcome to IE world... :(
What I would do, to have better control of the situation, is modify the getter method, i.e. Holiday.getPkgCode():
public String getPkgCode() throws IOException {
if (!this.pkgCode.contains(".")) {
String ext = ImgUtil.determineFormat(this.pkgCode);
return this.pkgCode + ImgUtil.toExtension(ext);
} else {
return this.pkgCode;
}
}
To use it, you will need to handle the exceptions and add this ImgUtil class (adapted from here):
class ImgUtil {
public static String determineFormat(String name) throws IOException {
// get image format in a file
File file = new File(name);
// create an image input stream from the specified file; try-with-resources closes it
try (ImageInputStream iis = ImageIO.createImageInputStream(file)) {
// get all currently registered readers that recognize the image format
Iterator<ImageReader> iter = ImageIO.getImageReaders(iis);
if (!iter.hasNext()) {
throw new RuntimeException("No readers found!");
}
// get the first reader and ask it for its format name
ImageReader reader = iter.next();
return reader.getFormatName();
}
}
public static String toExtension(String ext) {
// the case of the format name varies between readers, so compare case-insensitively
switch (ext.toUpperCase(java.util.Locale.ROOT)) {
case "JPEG": return ".jpg";
case "PNG": return ".png";
case "GIF": return ".gif";
}
return null;
}
}
TEST IT:
NOTE: I placed a JPEG image without an extension in the C:\tmp folder.
public class Q37052184 {
String pkgCode = "C:\\tmp\\yorch";
public static void main(String[] args) throws IOException {
Q37052184 q = new Q37052184();
System.out.println(q.getPkgCode());
}
// paste the getPkgCode() getter shown above here
}
OUTPUT:
C:\tmp\yorch.jpg
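You have to set the Content-Type header of the response in the servlet.
For example, in Spring MVC 4.3+ (where @GetMapping is available); note that the query string belongs in a request parameter, not in the mapping path:
@GetMapping("/b2c/images/")
public ResponseEntity<byte[]> getImageThumbnail(@RequestParam("url") String url) {
HttpHeaders headers = new HttpHeaders();
// use the media type that matches the actual image bytes, e.g. MediaType.IMAGE_JPEG
headers.setContentType(MediaType.IMAGE_JPEG);
byte[] content = ...; // load the image bytes for the requested url
return ResponseEntity.ok().headers(headers).body(content);
}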
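Both answers come down to knowing the image's real format: the IE workaround needs a matching extension, and the servlet needs a matching Content-Type. As a lighter alternative to ImageIO readers, the sketch below inspects the first bytes ("magic numbers") of the content. ImageSniffer and its methods are hypothetical names; it recognizes only PNG, JPEG and GIF signatures and needs Java 11+ for readNBytes. In the Spring controller above, a non-null result could be turned into a media type with MediaType.parseMediaType(...).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ImageSniffer {

    // Returns the MIME type implied by the leading bytes,
    // or null if the format is not one of the few recognized here.
    static String mimeType(byte[] head) {
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) return "image/png";
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return "image/jpeg"; // JPEG start-of-image marker
        if (startsWith(head, 'G', 'I', 'F', '8')) return "image/gif";
        return null;
    }

    // Reads just enough bytes to check the signatures above.
    static String mimeType(InputStream in) throws IOException {
        return mimeType(in.readNBytes(8));
    }

    // Maps a MIME type to a file extension (for the IE workaround).
    static String extension(String mimeType) {
        if (mimeType == null) return null;
        switch (mimeType) {
            case "image/png": return ".png";
            case "image/jpeg": return ".jpg";
            case "image/gif": return ".gif";
            default: return null;
        }
    }

    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] pngHeader = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
        String mime = mimeType(new ByteArrayInputStream(pngHeader));
        System.out.println(mime + " " + extension(mime)); // image/png .png
    }
}
```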

Loading a PDF in-browser from a file in the server file system?

How can I get a pdf located in a file in a server's directory structure to load in a browser for users of a Spring MVC application?
I have googled this and found postings about how to generate PDFs, but their answers do not work in this situation. For example, this other posting is not relevant because res.setContentType("application/pdf"); in my code below does not solve the problem. Also, this other posting describes how to do it from a database but does not show full working controller code. Other postings had similar problems that caused them not to work in this case.
I need to simply serve up a file (not from a database) and have it be viewable by a user in their browser. The best I have come up with is the code below, which asks the user to download the PDF or to view it in a separate application outside the browser. What specific changes can I make to the code below so that the user automatically sees the PDF content inside their browser when they click on the link, instead of being prompted to download it?
@RequestMapping(value = "/test-pdf")
public void generatePdf(HttpServletRequest req,HttpServletResponse res){
res.setContentType("application/pdf");
res.setHeader("Content-Disposition", "attachment;filename=report.pdf");
ServletOutputStream outStream=null;
try {
BufferedInputStream bis = new BufferedInputStream(
new FileInputStream(new File("/path/to", "nameOfThe.pdf")));
/*ServletOutputStream*/ outStream = res.getOutputStream();
//to make it easier to change to 8 or 16 KBs
int FILE_CHUNK_SIZE = 1024 * 4;
byte[] chunk = new byte[FILE_CHUNK_SIZE];
int bytesRead = 0;
while ((bytesRead = bis.read(chunk)) != -1) {outStream.write(chunk, 0, bytesRead);}
bis.close();
outStream.flush();
outStream.close();
}
catch (Exception e) {e.printStackTrace();}
}
Change
res.setHeader("Content-Disposition", "attachment;filename=report.pdf");
To
res.setHeader("Content-Disposition", "inline;filename=report.pdf");
You should also set the Content-Length header.
Spring's FileCopyUtils is handy for copying the file to the response (it also closes both streams when done):
@Controller
public class FileController {
@RequestMapping("/report")
void getFile(HttpServletResponse response) throws IOException {
String fileName = "report.pdf";
String path = "/path/to/" + fileName;
File file = new File(path);
FileInputStream inputStream = new FileInputStream(file);
response.setContentType("application/pdf");
response.setContentLength((int) file.length());
response.setHeader("Content-Disposition", "inline;filename=\"" + fileName + "\"");
FileCopyUtils.copy(inputStream, response.getOutputStream());
}
}
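The key change in both answers is inline instead of attachment. A small sketch of building that header value, assuming Java 10+ (for URLEncoder.encode(String, Charset)); contentDisposition is a hypothetical helper, not a servlet or Spring API (Spring 5 also ships a ContentDisposition builder that serves the same purpose). Besides the quoted filename, it adds the RFC 6266 filename* parameter, which browsers use for non-ASCII names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ContentDispositionSketch {

    // "inline" asks the browser to display the file (e.g. in its PDF viewer);
    // "attachment" asks it to download the file instead.
    static String contentDisposition(boolean inline, String fileName) {
        String type = inline ? "inline" : "attachment";
        // ASCII-only fallback: replace non-ASCII characters, quotes and backslashes
        String fallback = fileName.replaceAll("[^\\x20-\\x7E]", "_")
                .replace("\\", "_")
                .replace("\"", "_");
        // RFC 5987 encoding for filename*: percent-encoded UTF-8 (spaces as %20, not +)
        String encoded = URLEncoder.encode(fileName, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A");
        return type + "; filename=\"" + fallback + "\"; filename*=UTF-8''" + encoded;
    }

    public static void main(String[] args) {
        // prints: inline; filename="report.pdf"; filename*=UTF-8''report.pdf
        System.out.println(contentDisposition(true, "report.pdf"));
    }
}
```

In the controllers above it would be used as response.setHeader("Content-Disposition", contentDisposition(true, fileName)).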

Uploading more than one image

Dear All,
I am working with Spring MVC. I want to upload more than one image from the client. How can I achieve this? I know how to handle multipart form data for a single image, but now I am expecting some data along with several images from the client.
Any help or URL that would help me is appreciated.
Thanks,
Op
An image is also a file. Whether you store it in a database or in the file system, it is still a file.
In Spring MVC, you could do it as shown in the link below:
http://viralpatel.net/blogs/spring-mvc-multiple-file-upload-example/
Here is the code I tried; it is working fine at my end.
// Handle multiple images
@RequestMapping(method = RequestMethod.POST, value = "upload", consumes = MediaType.MULTIPART_FORM_DATA_VALUE,
produces = MediaType.APPLICATION_JSON_VALUE)
public @ResponseBody JSONResponse uploadImages(HttpServletRequest req)
throws Exception {
try {
MultipartHttpServletRequest multipartRequest = (MultipartHttpServletRequest) req;
// getFileMap() holds one file per form field name; use getMultiFileMap()
// if several files are sent under the same field name
for (Map.Entry<String, MultipartFile> me : multipartRequest.getFileMap().entrySet()) {
String fileName = me.getKey() + "_" + System.currentTimeMillis();
MultipartFile multipartFile = me.getValue();
System.out.println("Original fileName - " + multipartFile.getOriginalFilename());
System.out.println("fileName - " + fileName);
saveImage(fileName, multipartFile);
}
}
catch (Exception e) {
e.printStackTrace();
}
return new JSONResponse();
}
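The answer calls saveImage(fileName, multipartFile) without showing it. A minimal, hypothetical version is sketched below in plain Java (11+); it takes an InputStream (which Spring's MultipartFile.getInputStream() provides) so it can run without Spring. It uses a random UUID instead of System.currentTimeMillis() so that two uploads with the same field name in the same millisecond cannot overwrite each other, and it strips unsafe characters from the field name before using it in a path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class ImageStore {

    // Hypothetical helper: stores one uploaded image under a unique name
    // inside targetDir and returns the path it was written to.
    static Path saveImage(Path targetDir, String fieldName, InputStream content) throws IOException {
        Files.createDirectories(targetDir);
        // keep only safe characters from the form field name
        String safeField = fieldName.replaceAll("[^A-Za-z0-9_-]", "_");
        Path target = targetDir.resolve(safeField + "_" + UUID.randomUUID());
        try (InputStream in = content) {
            Files.copy(in, target); // throws if the file already exists
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("uploads");
        Path saved = saveImage(dir, "image1", new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println(saved + " (" + Files.size(saved) + " bytes)");
    }
}
```

In the controller loop it could be called as saveImage(uploadDir, me.getKey(), multipartFile.getInputStream()), where uploadDir is whatever directory you choose; MultipartFile.transferTo(File) is another option Spring provides.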
