I have written a controller that takes a domain name as input, crawls the whole site, and returns the result in JSON format.
http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.google.com
This gives the data for Google.
http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.yahoo.com
This gives the data for Yahoo.
If I hit these two URLs simultaneously, I get mixed data: the results of one request affect the other, even when I hit them from different machines.
Here is my controller
@RequestMapping("/getUrlCrawlData/{domain:.+}")
@ResponseBody
public String registerContact(@PathVariable("domain") String domain) throws HttpStatusException, SQLException, IOException {
    List<URLdata> urldata = null;
    Gson gson = new Gson();
    String json;
    urldata = crawlService.crawlURL("http://" + domain);
    json = gson.toJson(urldata);
    return json;
}
What do I need to modify to allow multiple independent connections?
Update
Following is my crawl Service
public List<URLdata> crawlURL(String domain) throws HttpStatusException, SQLException, IOException {
    testDomain = domain;
    urlList.clear();
    urlMap.clear();
    urldata.clear();
    urlList.add(testDomain);

    processPage(testDomain);

    //Get all pages
    for (int i = 1; i < urlList.size(); i++) {
        if (urlList.size() >= 500) {
            break;
        }
        processPage(urlList.get(i));
        //System.out.println(urlList.get(i));
    }

    //Calculate Time
    for (int i = 0; i < urlList.size(); i++) {
        getTitleAndMeta(urlList.get(i));
    }
    return urldata;
}
public static void processPage(String URL) throws SQLException, IOException, HttpStatusException {
    //get useful information
    try {
        Connection.Response response = Jsoup.connect(URL)
                .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                .timeout(10000)
                .execute();
        Document doc = response.parse();

        //get all links and recursively call the processPage method
        Elements questions = doc.select("a[href]");
        for (Element link : questions) {
            String linkName = link.attr("abs:href");
            if (linkName.contains(testDomain.replaceAll("http://www.", ""))) {
                if (linkName.contains("#")) {
                    linkName = linkName.substring(0, linkName.indexOf("#"));
                }
                if (linkName.contains("?")) {
                    linkName = linkName.substring(0, linkName.indexOf("?"));
                }
                if (!urlList.contains(linkName) && urlList.size() <= 500) {
                    urlList.add(linkName);
                }
            }
        }
    } catch (HttpStatusException e) {
        System.out.println(e);
    } catch (SocketTimeoutException e) {
        System.out.println(e);
    } catch (UnsupportedMimeTypeException e) {
        System.out.println(e);
    } catch (UnknownHostException e) {
        System.out.println(e);
    } catch (MalformedURLException e) {
        System.out.println(e);
    }
}
Each of your requests (http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.google.com and http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.yahoo.com) is processed in a separate thread. You have two invocations of the crawlURL() method running simultaneously, but both use the same shared variables (testDomain, urlList, urlMap and urldata), so they corrupt each other's data in those variables.
One way to fix the problem is to declare these variables locally (inside the method). This way, new instances of these variables will be created for each invocation of crawlURL(). Alternatively, you can create a new instance of your CrawlService class for each invocation of the crawlURL() method.
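For example, here is a minimal sketch of the first option; it assumes you also change processPage() and getTitleAndMeta() to take the state as parameters (hypothetical signatures, since their originals rely on the shared fields):

public List<URLdata> crawlURL(String domain) throws HttpStatusException, SQLException, IOException {
    // Per-invocation state instead of shared fields
    List<String> urlList = new ArrayList<>();
    List<URLdata> urldata = new ArrayList<>();

    urlList.add(domain);
    processPage(domain, domain, urlList);             // hypothetical signature: (url, testDomain, urlList)
    for (int i = 1; i < urlList.size() && urlList.size() < 500; i++) {
        processPage(urlList.get(i), domain, urlList);
    }
    for (String url : urlList) {
        getTitleAndMeta(url, urldata);                // hypothetical signature: (url, urldata)
    }
    return urldata;
}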
Synchronizing the threads would be a bad idea here, because one request would have to wait for another to complete before it could be processed by crawlURL().
As far as Spring MVC is concerned, every request runs in a separate thread. So I think the problem is in crawlService, which, I suppose, is not stateless (it behaves like a singleton with shared state). Try creating a new crawl service for every request and check whether your data still gets mixed. If creating a crawl service is an expensive operation, you should rewrite it to work in a stateless way.
@RequestMapping("/getUrlCrawlData/{domain:.+}")
@ResponseBody
public String registerContact(@PathVariable("domain") String domain) throws HttpStatusException, SQLException, IOException {
    Gson gson = new Gson();
    List<URLdata> urldata = new CrawlService().crawlURL("http://" + domain);
    return gson.toJson(urldata);
}
I think

urldata = crawlService.crawlURL("http://" + domain);

is the call that is affected by multiple requests coming in simultaneously.
Check whether crawlService is safe for multithreading, i.e. check whether the crawlURL() method is synchronized; if not, make it synchronized. Alternatively, synchronize the block that calls crawlService inside the controller.
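For example, a minimal sketch of that last option inside the controller method (note that, as the answer above points out, this serializes the crawls, so only one request is processed at a time):

synchronized (crawlService) {
    urldata = crawlService.crawlURL("http://" + domain);
}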
Related
Using Spring Boot, I am trying to implement a REST controller, which can handle a GET request asking to return a BLOB object from my database.
Googling around a little bit, and putting pieces together, I have created the following code snippet:
#GetMapping("student/pic/studentId")
public void getProfilePicture(#PathVariable Long studentId, HttpServletResponse response) throws IOException {
Optional<ProfilePicture> profilePicture;
profilePicture = profilePictureService.getProfilePictureByStudentId(studentId);
if (profilePicture.isPresent()) {
ServletOutputStream outputStream = response.getOutputStream();
outputStream.write(profilePicture.get().getPicture());
outputStream.close();
}
}
I am sending the GET request using VanillaJS and the fetch-API:
async function downloadPicture(profilePic, studentId) {
    const url = "http://localhost:8080/student/pic/" + studentId;
    const response = await fetch(url);
    const responseBlob = await response.blob();
    if (responseBlob.size > 0) {
        profilePic.src = URL.createObjectURL(responseBlob);
    }
}
Somehow, this works. That's great, but now I would like to understand the usage of HttpServletResponse in this context, which I am not familiar with. It seems to me that the fetch API makes use of HttpServletResponse (maybe even creates it), since I am not creating this object or doing anything with it.
What is very strange to me is that the return type of my controller method getProfilePicture() is void, and yet I am still sending a response, which is most definitely not void.
Also, if the profilePicture is not found in my database, for example because a non-existing studentId was passed, my controller method does not do anything, but I still get a response code of 200. That's why I added the responseBlob.size > 0 check in my JavaScript to test for a positive response.
Can someone explain this magic to me, please?
The javadoc for response.getOutputStream() says "Returns a ServletOutputStream suitable for writing binary data in the response." It is literally the response stream, and you write the picture bytes into it; it is not related to the client reading the response. Alternatively, you could just return a byte array, which will be written into the response stream automatically, and the result will be the same.
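A minimal sketch of that byte-array alternative, reusing the names from the question (the produces media type is an assumption about how the pictures are stored):

@GetMapping(value = "student/pic/{studentId}", produces = MediaType.IMAGE_JPEG_VALUE)
public byte[] getProfilePicture(@PathVariable Long studentId) {
    return profilePictureService.getProfilePictureByStudentId(studentId)
            .map(ProfilePicture::getPicture)
            .orElse(new byte[0]); // still a 200 response, just with an empty body, when nothing is found
}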
To return a different http status code you should change the method return type to ResponseEntity<byte[]>:
#GetMapping("student/pic/studentId")
public ResponseEntity<byte[]> getProfilePicture(#PathVariable Long studentId, HttpServletResponse response) throws IOException {
Optional<ProfilePicture> profilePicture = profilePictureService.getProfilePictureByStudentId(studentId);
if (profilePicture.isPresent()) {
return ResponseEntity.ok(profilePicture.get().getPicture()); //status code 200
} else {
return ResponseEntity.notFound().build(); //status code 404
}
}
ResponseEntity is basically Spring's way of returning different status codes and messages.
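If you also want to set the content type explicitly, the ResponseEntity builder supports that too (MediaType.IMAGE_JPEG is an assumption about the picture format):

return ResponseEntity.ok()
        .contentType(MediaType.IMAGE_JPEG)
        .body(profilePicture.get().getPicture());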
Is there a reason why you are manually downloading the image via JavaScript? You could just create an img element with the HTTP link to the image and the browser will automatically display the image content: <img src="http://localhost:8080/student/pic/studentId">
I am aware of TransferManager and its .uploadFileList() and .uploadFileDirectory() methods; however, they accept java.io.File arguments. I have a collection of byte-array input streams containing JPEG image data, and I don't want to create intermediate files to store this data before I upload it, either.
So what I need is essentially what the S3 client's PutObjectRequest does but for a collection of InputStream objects. Also, if one upload fails, I want to abort the whole thing and not upload anything, much like how a database transaction will reverse the changes if something goes wrong along the way.
Is this possible with the Java SDK?
Before I share an answer, please consider upgrading...
FYI: constructing TransferManager directly is deprecated in the AWS SDK for Java; it is now created via TransferManagerBuilder. Please consider upgrading if TransferManagerBuilder suits your needs.
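A minimal sketch of what that upgraded setup might look like for an in-memory JPEG upload; the bucket name, key, and the empty byte array are placeholders, not values from the question:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;
import java.io.ByteArrayInputStream;

public class InMemoryUploadSketch {
    public static void main(String[] args) throws InterruptedException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        TransferManager tm = TransferManagerBuilder.standard().withS3Client(s3).build();

        byte[] jpegBytes = new byte[0]; // your in-memory image data goes here
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(jpegBytes.length);  // required for InputStream uploads to avoid buffering warnings
        metadata.setContentType("image/jpeg");

        Upload upload = tm.upload(new PutObjectRequest(
                "my-bucket", "images/photo-1.jpg",
                new ByteArrayInputStream(jpegBytes), metadata));
        upload.waitForCompletion();  // blocks until the upload finishes or throws on failure
        tm.shutdownNow(false);       // false = keep the underlying S3 client open
    }
}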
Now, since you asked about TransferManager, you could either 1) copy the code below and replace the functionality/arguments with your custom in-memory handling of the input stream, handling it in your own function, or 2) use the second sample further below as-is.
GitHub source, modified to work with an InputStream (the related issue is listed here):
private def uploadFile(is: InputStream, s3ObjectName: String, metadata: ObjectMetadata) = {
  try {
    val putObjectRequest = new PutObjectRequest(bucketName, s3ObjectName, is, metadata)
    // TransferManager supports asynchronous uploads and downloads
    val upload = transferManager.upload(putObjectRequest)
    upload.addProgressListener(ExceptionReporter.wrap(UploadProgressListener(putObjectRequest)))
  } catch {
    case e: Exception => throw new RuntimeException(e)
  }
}
Bonus: a nice custom answer here using sequence input streams:
public void combineFiles() {
    List<String> files = getFiles();
    long totalFileSize = files.stream()
            .map(this::getContentLength)
            .reduce(0L, (f, s) -> f + s);
    try {
        try (InputStream partialFile = new SequenceInputStream(getInputStreamEnumeration(files))) {
            ObjectMetadata resultFileMetadata = new ObjectMetadata();
            resultFileMetadata.setContentLength(totalFileSize);
            s3Client.putObject("bucketName", "resultFilePath", partialFile, resultFileMetadata);
        }
    } catch (IOException e) {
        LOG.error("An error occurred while combining files. {}", e);
    }
}

private Enumeration<? extends InputStream> getInputStreamEnumeration(List<String> files) {
    return new Enumeration<InputStream>() {
        private Iterator<String> fileNamesIterator = files.iterator();

        @Override
        public boolean hasMoreElements() {
            return fileNamesIterator.hasNext();
        }

        @Override
        public InputStream nextElement() {
            try {
                return new FileInputStream(Paths.get(fileNamesIterator.next()).toFile());
            } catch (FileNotFoundException e) {
                System.err.println(e.getMessage());
                throw new RuntimeException(e);
            }
        }
    };
}
I am trying to read an Excel file, manipulate it or add new data to it, and write it back out. I am also trying to do this as a completely reactive process using Flux and Mono. The idea is to return the resulting file or byte array via a web service.
My question is: how do I get an InputStream and OutputStream in a non-blocking way?
I am using the Apache POI library to read and generate the Excel file.
I currently have a solution based on a mix of Mono.fromCallable() and blocking code for getting the InputStream.
For example, the web service part is as follows:
@GetMapping(value = API_BASE_PATH + "/download", produces = "application/vnd.ms-excel")
public Mono<ByteArrayResource> download() {
    Flux<TimeKeepingEntry> createExcel = excelExport.createDocument(false);
    return createExcel.then(Mono.fromCallable(() -> {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        excelExport.getWb().write(outputStream);
        return new ByteArrayResource(outputStream.toByteArray());
    }).subscribeOn(Schedulers.elastic()));
}
And the processing of the file:
public Flux<TimeKeepingEntry> createDocument(boolean all) {
    Flux<TimeKeepingEntry> entries = null;
    try {
        InputStream inputStream = new ClassPathResource("Timesheet Template.xlsx").getInputStream();
        wb = WorkbookFactory.create(inputStream);
        Sheet sheet = wb.getSheetAt(0);
        log.info("Created document");
        if (all) {
            //all entries
        } else {
            entries = service.findByMonth(currentMonthName)
                    .log("Excel Export - retrievedMonths")
                    .sort(Comparator.comparing(TimeKeepingEntry::getDateOfMonth))
                    .doOnNext(timeKeepingEntry -> this.populateEntry(sheet, timeKeepingEntry));
        }
    } catch (IOException e) {
        log.error("Error Importing File", e);
    }
    return entries;
}
This works well enough, but it is not very much in line with Flux and Mono. Some guidance here would be good; I would prefer to have the whole sequence non-blocking.
Unfortunately the WorkbookFactory.create() operation is blocking, so you have to perform that operation using imperative code. However, fetching each timeKeepingEntry can be done reactively. Your code would look something like this:
public Flux<TimeKeepingEntry> createDocument() {
    return Flux.generate(
            this::getWorkbookSheet,
            (sheet, sink) -> {
                sink.next(getNextTimeKeepingEntryFrom(sheet));
                return sheet; // generate's BiFunction must return the (possibly updated) state
            },
            this::closeWorkbook);
}
This will keep the workbook in memory, but will fetch each entry on demand when the elements of the Flux are requested.
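Another option, closer to the question's service-based lookup, is to keep the blocking POI call but confine it to a worker thread. A sketch of that, reusing wb, service, currentMonthName and populateEntry() from the question:

public Flux<TimeKeepingEntry> createDocument() {
    return Mono.fromCallable(() -> {
                InputStream inputStream = new ClassPathResource("Timesheet Template.xlsx").getInputStream();
                wb = WorkbookFactory.create(inputStream); // blocking I/O, runs on the elastic scheduler
                return wb.getSheetAt(0);
            })
            .subscribeOn(Schedulers.elastic())
            .flatMapMany(sheet -> service.findByMonth(currentMonthName)
                    .sort(Comparator.comparing(TimeKeepingEntry::getDateOfMonth))
                    .doOnNext(entry -> populateEntry(sheet, entry)));
}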
I'm trying to write a custom NiFi processor which will take in the contents of the incoming flowfile, perform some math operations on it, then write the results into an outgoing flowfile. Is there a way to dump the contents of the incoming flowfile into a string or something? I've been searching for a while now and it doesn't seem that simple. If anyone could point me toward a good tutorial that deals with doing something like that, it would be greatly appreciated.
The Apache NiFi Developer Guide documents the process of creating a custom processor very well. In your specific case, I would start with the Component Lifecycle section and the Enrich/Modify Content pattern. Any other processor which does similar work (like ReplaceText or Base64EncodeContent) would be good examples to learn from; all of the source code is available on GitHub.
Essentially you need to implement the #onTrigger() method in your processor class, read the flowfile content and parse it into your expected format, perform your operations, and then re-populate the resulting flowfile content. Your source code will look something like this:
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }

    final ComponentLog logger = getLogger();
    AtomicBoolean error = new AtomicBoolean();
    AtomicReference<String> result = new AtomicReference<>(null);

    // This uses a lambda function in place of a callback for InputStreamCallback#process()
    session.read(flowFile, in -> {
        long start = System.nanoTime();
        // Read the flowfile content into a String
        // TODO: May need to buffer this if the content is large
        try {
            final String contents = IOUtils.toString(in, StandardCharsets.UTF_8);
            result.set(new MyMathOperationService().performSomeOperation(contents));
            long stop = System.nanoTime();
            if (getLogger().isDebugEnabled()) {
                final long durationNanos = stop - start;
                DecimalFormat df = new DecimalFormat("#.###");
                getLogger().debug("Performed operation in " + durationNanos + " nanoseconds (" + df.format(durationNanos / 1_000_000_000.0) + " seconds).");
            }
        } catch (Exception e) {
            error.set(true);
            getLogger().error(e.getMessage() + " Routing to failure.", e);
        }
    });

    if (error.get()) {
        session.transfer(flowFile, REL_FAILURE);
    } else {
        // Again, a lambda takes the place of the StreamCallback#process() callback
        FlowFile updatedFlowFile = session.write(flowFile, (in, out) -> {
            final String resultString = result.get();
            final byte[] resultBytes = resultString.getBytes(StandardCharsets.UTF_8);
            // TODO: This can use a while loop for performance
            out.write(resultBytes, 0, resultBytes.length);
            out.flush();
        });
        session.transfer(updatedFlowFile, REL_SUCCESS);
    }
}
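For completeness, a minimal sketch of the surrounding processor class that the snippet above assumes; the class name is hypothetical, and REL_SUCCESS / REL_FAILURE must be declared and exposed via getRelationships():

public class MyMathProcessor extends AbstractProcessor {

    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Flowfiles whose content was processed successfully")
            .build();

    public static final Relationship REL_FAILURE = new Relationship.Builder()
            .name("failure")
            .description("Flowfiles that could not be processed")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return new HashSet<>(Arrays.asList(REL_SUCCESS, REL_FAILURE));
    }

    // onTrigger() from above goes here
}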
Daggett is right that the ExecuteScript processor is a good place to start, because it shortens the development lifecycle (no building NARs, deploying, and restarting NiFi to use it), and once you have the correct behavior, you can easily copy/paste it into the generated skeleton and deploy it once.
I have a YANG model (known to MDSAL) which I am using in an OpenDaylight application. In my application, I am presented with a JSON-formatted String which I want to store in the MDSAL database. I could use the builder of the object that I wish to store and set its fields from the JSON-formatted String one by one, but this is laborious and error-prone.
Alternatively I could post from within the application to the Northbound API which will eventually write to the MDSAL datastore.
Is there a simpler way to do this?
Thanks,
Assuming that your incoming JSON matches the structure of your YANG model exactly (does it?), I believe what you are really looking for is to transform that JSON into a "binding independent" (not setters of the generated Java class) internal model - NormalizedNode & Co. Somewhere in the controller or mdsal project there is a "codec" class that can do this.
You can either search for such code and its usages (I find looking at tests is always useful) in the ODL controller and mdsal projects' source code, or in other ODL projects which do similar things - I'm thinking specifically of browsing around the jsonrpc and daexim projects' sources; this in particular looks like it may inspire you: https://github.com/opendaylight/daexim/blob/stable/nitrogen/impl/src/main/java/org/opendaylight/daexim/impl/ImportTask.java
Best of luck.
Based on the information above, I constructed the following (which I am posting here to help others). I still do not know how to get rid of the deprecated reference to SchemaService (perhaps somebody can help).
private void importFromNormalizedNode(final DOMDataReadWriteTransaction rwTrx, final LogicalDatastoreType type,
        final NormalizedNode<?, ?> data) throws TransactionCommitFailedException, ReadFailedException {
    if (data instanceof NormalizedNodeContainer) {
        @SuppressWarnings("unchecked")
        YangInstanceIdentifier yid = YangInstanceIdentifier.create(data.getIdentifier());
        rwTrx.put(type, yid, data);
    } else {
        throw new IllegalStateException("Root node is not instance of NormalizedNodeContainer");
    }
}
private void importDatastore(String jsonData, QName qname) throws TransactionCommitFailedException, IOException,
        ReadFailedException, SchemaSourceException, YangSyntaxErrorException {
    LOG.info("jsonData = " + jsonData);
    byte[] bytes = jsonData.getBytes();
    InputStream is = new ByteArrayInputStream(bytes);

    final NormalizedNodeContainerBuilder<?, ?, ?, ?> builder = ImmutableContainerNodeBuilder.create()
            .withNodeIdentifier(new YangInstanceIdentifier.NodeIdentifier(qname));

    try (NormalizedNodeStreamWriter writer = ImmutableNormalizedNodeStreamWriter.from(builder)) {
        SchemaPath schemaPath = SchemaPath.create(true, qname);
        LOG.info("SchemaPath " + schemaPath);
        SchemaNode parentNode = SchemaContextUtil.findNodeInSchemaContext(schemaService.getGlobalContext(),
                schemaPath.getPathFromRoot());
        LOG.info("parentNode " + parentNode);
        try (JsonParserStream jsonParser = JsonParserStream.create(writer, schemaService.getGlobalContext(),
                parentNode)) {
            try (JsonReader reader = new JsonReader(new InputStreamReader(is))) {
                reader.setLenient(true);
                jsonParser.parse(reader);
                DOMDataReadWriteTransaction rwTrx = domDataBroker.newReadWriteTransaction();
                importFromNormalizedNode(rwTrx, LogicalDatastoreType.CONFIGURATION, builder.build());
            }
        }
    }
}