Parse a JsonNode using the Java 8 Stream API - java-8

I'm reading all objects from a Salesforce environment using Java. It works fine, but the code below takes 10 minutes to convert the JSON into a Java ArrayList. I was thinking I could use the Java 8 Stream API to parallelize the parsing logic. Below is my working code; any suggestions appreciated.
/**
 * @Desc   : Get all available objects (tables) from Salesforce
 * @return : List<SalesforceObject>
 */
public List<SalesforceObject> getAllsObjects() {
    List<SalesforceObject> listsObject = new ArrayList<SalesforceObject>();
    try {
        // query Salesforce
        final URIBuilder builder = new URIBuilder(this.sfAccess.instanceURL);
        builder.setPath(appProp.salesforceObjectPath);
        final HttpGet get = new HttpGet(builder.build());
        get.setHeader("Authorization", "Bearer " + this.sfAccess.token);
        final CloseableHttpClient httpclient = HttpClients.createDefault();
        final HttpResponse queryResponse = httpclient.execute(get);
        // parse
        final ObjectMapper mapper = new ObjectMapper().enable(SerializationFeature.INDENT_OUTPUT);
        final JsonNode queryResults = mapper.readValue(queryResponse.getEntity().getContent(), JsonNode.class);
        System.out.println(queryResults);
        // This line takes - 10 mins
        listsObject.addAll(mapper.convertValue(queryResults.get("sobjects"), new TypeReference<List<SalesforceObject>>() {}));
        return listsObject;
    } catch (IOException e) {
        e.printStackTrace();
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
    return null;
}

You are looking for something like:
return StreamSupport.stream(queryResults.get("sobjects").spliterator(), true)
        .map(sObj -> mapper.convertValue(sObj, SalesforceObject.class))
        .collect(Collectors.toList());
Note that your concurrency will still be limited by the number of CPU cores on your server.
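For reference, here is a sketch of how that could slot into the original method, touching only the parsing step (the HTTP call stays as in the question; mapper and queryResults are the variables already defined above):
// Sketch only: replace the single convertValue call on the whole array
// with a parallel stream over the "sobjects" array node.
final JsonNode sobjects = queryResults.get("sobjects");
final List<SalesforceObject> listsObject = StreamSupport.stream(sobjects.spliterator(), true)
        .map(sObj -> mapper.convertValue(sObj, SalesforceObject.class))
        .collect(Collectors.toList());
return listsObject;
Whether this actually helps depends on where the 10 minutes are spent; it is worth timing the mapping step in isolation before and after the change.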

Related

Spring Boot using Apache POI streaming workbook - How do I release the memory taken up?

I am writing a simple Spring Boot app that reads from the DB and writes the data out to an Excel file using Apache POI. The generated file can contain up to 100K rows and is around 8-10 MB in size.
My controller class:
public ResponseEntity<Resource> getExcelData(@RequestBody ExcelRequest request) {
    InputStreamResource file = new InputStreamResource(downloadService.startExcelDownload(request));
    return ResponseEntity.ok()
            .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=myFile.xlsx")
            .contentType(MediaType.parseMediaType("application/vnd.ms-excel"))
            .body(file);
}
Service class:
public ByteArrayInputStream startExcelDownload(ExcelRequest request) {
    /** Apache POI code using SXSSFWorkbook **/
    SXSSFWorkbook workbook = new SXSSFWorkbook(1000);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
        // Excel generation logic here
        ...
        workbook.write(out);
        return new ByteArrayInputStream(out.toByteArray());
    } catch (IOException | ParseException e) {
        throw new RuntimeException("fail to import data to Excel file: " + e.getMessage());
    } finally {
        try {
            out.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Here is what I see in VisualVM, and in the heap dump:
byte[] 151,521,152 B (41.2%) 3,100,020 (30.7%)
Is there something I have missed? Should the byte[] continue to take up memory after the response has been returned? The memory goes down once I manually run the GC.
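One way to avoid holding the whole file in a byte[] at all is to stream the workbook straight to the response and dispose of the SXSSF temp files afterwards. This is only a sketch under assumptions: it uses Spring's StreamingResponseBody, and buildWorkbook is a hypothetical helper standing in for the generation logic elided in the question.
public ResponseEntity<StreamingResponseBody> getExcelData(@RequestBody ExcelRequest request) {
    StreamingResponseBody body = outputStream -> {
        // buildWorkbook is a hypothetical helper that runs the elided generation logic
        SXSSFWorkbook workbook = downloadService.buildWorkbook(request);
        try {
            workbook.write(outputStream);   // bytes go straight to the client, not into a byte[]
        } finally {
            workbook.dispose();             // delete the temp files SXSSF spools to disk
            workbook.close();
        }
    };
    return ResponseEntity.ok()
            .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=myFile.xlsx")
            .contentType(MediaType.parseMediaType("application/vnd.ms-excel"))
            .body(body);
}
With this shape the response bytes never accumulate on the heap, so there is no large byte[] left waiting for a GC cycle after the download completes.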

read json file from resources

I'm trying to read a JSON file located in the documents folder under resources in Quarkus.
Here is my code:
try (InputStream inputStream = classLoader.getResourceAsStream("documents/helloWorldDocument.json")) {
    // Retrieve the JSON document and put into a string/object map
    ObjectMapper mapper = new ObjectMapper();
    TypeReference<HashMap<String, Object>> documentMapType =
            new TypeReference<HashMap<String, Object>>() {};
    //
    Map<String, Object> document = mapper.readValue(
            new File(inputStream.toString()),
            documentMapType);
    // Use builder methods in the SDK to create the directive.
    RenderDocumentDirective renderDocumentDirective = RenderDocumentDirective.builder()
            .withToken("helloWorldToken")
            .withDocument(document)
            .build();
    // Add the directive to a responseBuilder.
    responseBuilder.addDirective(renderDocumentDirective);
    // Tailor the speech for a device with a screen.
    speechText.append(" You should now also see my greeting on the screen.");
} catch (IOException e) {
    throw new AskSdkException("Unable to read or deserialize the hello world document", e);
}
but I'm getting an exception. I'd really appreciate it if anyone could help.
(I'm implementing APL for an Alexa skill.)
After searching a lot, I solved it with this:
try {
    File file = new File(
            Objects.requireNonNull(this.getClass().getClassLoader().getResource("helloWorldDocument.json")).getFile()
    );
    ObjectMapper mapper = new ObjectMapper();
    TypeReference<HashMap<String, Object>> documentMapType =
            new TypeReference<HashMap<String, Object>>() {};
    Map<String, Object> document = mapper.readValue(
            new File(file.toString()),
            documentMapType);
    RenderDocumentDirective renderDocumentDirective = RenderDocumentDirective.builder()
            .withToken("helloWorldToken")
            .withDocument(document)
            .build();
    responseBuilder.addDirective(renderDocumentDirective);
    speechText.append(" You should now also see my greeting on the screen.");
} catch (IOException e) {
    throw new AskSdkException("Unable to read or deserialize the hello world document", e);
}
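An alternative worth noting (a sketch, not tested against the Alexa SDK): since the original code already obtains an InputStream from the class loader, Jackson can read that stream directly. This also keeps working when the app is packaged as a jar, where getResource(...).getFile() does not point at a real file. The path below assumes the document sits under documents/ in resources, as in the question.
try (InputStream inputStream = getClass().getClassLoader()
        .getResourceAsStream("documents/helloWorldDocument.json")) {
    ObjectMapper mapper = new ObjectMapper();
    TypeReference<HashMap<String, Object>> documentMapType =
            new TypeReference<HashMap<String, Object>>() {};
    // readValue can consume the stream directly; no File is needed
    Map<String, Object> document = mapper.readValue(inputStream, documentMapType);
    // ... build the RenderDocumentDirective exactly as in the code above
} catch (IOException e) {
    throw new AskSdkException("Unable to read or deserialize the hello world document", e);
}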

How to download data from a URL?

I can download data via HttpUrlConnection and InputStream, but I need the raw data: I want to build a download manager around it and then convert the raw data to a binary or image format. From my research I've seen "download file from url" examples, but I can't download the file on my Mac; I always get a FileNotFoundException. Please help me. How can I download data from a URL?
public class DownloadData extends AsyncTask<Void, Void, Void> {
    @Override
    protected Void doInBackground(Void... params) {
        try {
            downloadData("https://blablalabla/get");
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public void downloadData(String myurl) throws IOException {
        URL u = new URL(myurl);
        InputStream is = u.openStream();
        DataInputStream dis = new DataInputStream(is);
        byte[] buffer = new byte[1024];
        int length;
        OutputStream fos = new FileOutputStream(new File(Environment.getExternalStorageDirectory() + "/Users/ilknurpc/Desktop/text.docx"));
        while ((length = dis.read(buffer)) > 0) {
            fos.write(buffer, 0, length);
        }
    }
}
If you want to build a workable download manager, I would suggest taking a look at the Tomcat Default Servlet implementation.
There are a few HTTP headers you need to understand, such as ETags and HTTP Range headers, for a proper implementation.
Thankfully, the Tomcat Default Servlet handles these prerequisites for you.
You can adapt this servlet in your code with minor changes (package declaration etc.).
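For completeness, a minimal sketch of just the byte-copy part with try-with-resources so the streams are always closed. The destination File is a placeholder and must point somewhere the process can actually write; the mixed external-storage/desktop path in the question is a likely cause of the FileNotFoundException.
// Minimal sketch: copy the raw bytes of a URL into a local file.
// 'destination' is a placeholder; pass a path the process is allowed to write to.
public void downloadData(String myurl, File destination) throws IOException {
    URL url = new URL(myurl);
    try (InputStream in = url.openStream();
         OutputStream out = new FileOutputStream(destination)) {
        byte[] buffer = new byte[8192];
        int length;
        while ((length = in.read(buffer)) != -1) {
            out.write(buffer, 0, length);
        }
    }
}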

CloseableHttpClient.execute freezes once every few weeks despite timeouts

We have a Groovy singleton that uses PoolingHttpClientConnectionManager (httpclient 4.3.6) with a pool size of 200 to handle very high concurrent connection counts to a search service, and processes the XML responses.
Despite having specified timeouts, it freezes about once a month but runs perfectly fine the rest of the time.
The Groovy singleton is below. The method retrieveInputFromURL seems to block on client.execute(get).
@Singleton(strict=false)
class StreamManagerUtil {
    // Instantiate once and cache for lifetime of Singleton class
    private static PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager();
    private static CloseableHttpClient client;
    private static final IdleConnectionMonitorThread staleMonitor = new IdleConnectionMonitorThread(connManager);
    private int warningLimit;
    private int readTimeout;
    private int connectionTimeout;
    private int connectionFetchTimeout;
    private int poolSize;
    private int routeSize;
    PropertyManager propertyManager = PropertyManagerFactory.getInstance().getPropertyManager("sebe.properties")

    StreamManagerUtil() {
        // Initialize all instance variables in singleton from properties file
        readTimeout = 6
        connectionTimeout = 6
        connectionFetchTimeout = 6
        // Pooling
        poolSize = 200
        routeSize = 50
        // Connection pool size and number of routes to cache
        connManager.setMaxTotal(poolSize);
        connManager.setDefaultMaxPerRoute(routeSize);
        // ConnectTimeout : time to establish connection with GSA
        // ConnectionRequestTimeout : time to get connection from pool
        // SocketTimeout : waiting for packets from GSA
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(connectionTimeout * 1000)
                .setConnectionRequestTimeout(connectionFetchTimeout * 1000)
                .setSocketTimeout(readTimeout * 1000).build();
        // Keep alive for 5 seconds if server does not have keep alive header
        ConnectionKeepAliveStrategy myStrategy = new ConnectionKeepAliveStrategy() {
            @Override
            public long getKeepAliveDuration(HttpResponse response, HttpContext context) {
                HeaderElementIterator it = new BasicHeaderElementIterator(response.headerIterator(HTTP.CONN_KEEP_ALIVE));
                while (it.hasNext()) {
                    HeaderElement he = it.nextElement();
                    String param = he.getName();
                    String value = he.getValue();
                    if (value != null && param.equalsIgnoreCase("timeout")) {
                        return Long.parseLong(value) * 1000;
                    }
                }
                return 5 * 1000;
            }
        };
        // Close all connections older than 5 seconds. Run as separate thread.
        staleMonitor.start();
        staleMonitor.join(1000);
        client = HttpClients.custom().setDefaultRequestConfig(config).setKeepAliveStrategy(myStrategy).setConnectionManager(connManager).build();
    }
    private retrieveInputFromURL(String categoryUrl, String xForwFor, boolean isXml) throws Exception {
        URL url = new URL(categoryUrl);
        GPathResult searchResponse = null
        InputStream inputStream = null
        HttpResponse response;
        HttpGet get;
        try {
            long startTime = System.nanoTime();
            get = new HttpGet(categoryUrl);
            response = client.execute(get);
            int resCode = response.getStatusLine().getStatusCode();
            if (xForwFor != null) {
                get.setHeader("X-Forwarded-For", xForwFor)
            }
            if (resCode == HttpStatus.SC_OK) {
                if (isXml) {
                    extractXmlString(response)
                } else {
                    StringBuffer buffer = buildStringFromResponse(response)
                    return buffer.toString();
                }
            }
        } catch (Exception e) {
            throw e;
        } finally {
            // Release connection back to pool
            if (response != null) {
                EntityUtils.consume(response.getEntity());
            }
        }
    }
    private extractXmlString(HttpResponse response) {
        InputStream inputStream = response.getEntity().getContent()
        XmlSlurper slurper = new XmlSlurper()
        slurper.setFeature("http://xml.org/sax/features/validation", false)
        slurper.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
        slurper.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
        slurper.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
        return slurper.parse(inputStream)
    }

    private StringBuffer buildStringFromResponse(HttpResponse response) {
        StringBuffer buffer = new StringBuffer();
        BufferedReader rd = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
        String line = "";
        while ((line = rd.readLine()) != null) {
            buffer.append(line);
            System.out.println(line);
        }
        return buffer
    }
    public class IdleConnectionMonitorThread extends Thread {
        private final HttpClientConnectionManager connMgr;
        private volatile boolean shutdown;

        public IdleConnectionMonitorThread(PoolingHttpClientConnectionManager connMgr) {
            super();
            this.connMgr = connMgr;
        }

        @Override
        public void run() {
            try {
                while (!shutdown) {
                    synchronized (this) {
                        wait(5000);
                        connMgr.closeExpiredConnections();
                        connMgr.closeIdleConnections(10, TimeUnit.SECONDS);
                    }
                }
            } catch (InterruptedException ex) {
                // Ignore
            }
        }

        public void shutdown() {
            shutdown = true;
            synchronized (this) {
                notifyAll();
            }
        }
    }
}
I also found this in the log, leading me to believe it happened while waiting for response data:
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
Findings thus far:
We are using Java 1.8u25. There is an open issue on a similar scenario: https://bugs.openjdk.java.net/browse/JDK-8075484
HttpClient had a similar report (https://issues.apache.org/jira/browse/HTTPCLIENT-1589), but this was fixed in the 4.3.6 version we are using.
Questions
Can this be a synchronisation issue? From my understanding, even though the singleton is accessed by multiple threads, the only shared data is the cached CloseableHttpClient.
Is there anything else fundamentally wrong with this code or approach that may be causing this behaviour?
I do not see anything obviously wrong with your code. I would strongly recommend setting the SO_TIMEOUT parameter on the connection manager, though, to make sure it applies to all new sockets at creation time, not at request-execution time.
It would also help to know what exactly 'freezing' means. Are worker threads getting blocked waiting to acquire connections from the pool, or waiting for response data?
Please also note that worker threads can appear 'frozen' if the server keeps sending bits of chunk-coded data. As usual, a wire/context log of the client session would help a lot:
http://hc.apache.org/httpcomponents-client-4.3.x/logging.html
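For reference, in HttpClient 4.3 the socket-level timeout suggested above can be set on the pooling connection manager like this (a sketch reusing the same readTimeout value as in the question's constructor):
// Sketch: apply SO_TIMEOUT at socket-creation time via the connection manager,
// in addition to the per-request RequestConfig already in place.
SocketConfig socketConfig = SocketConfig.custom()
        .setSoTimeout(readTimeout * 1000)
        .build();
connManager.setDefaultSocketConfig(socketConfig);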

ObjectMapper Not Handling UTF-8 Properly?

I'm using ObjectMapper to serialize posts in my system to JSON. These posts contain entries from all over the world and include UTF-8 characters. The problem is that ObjectMapper doesn't seem to handle these characters properly. For example, the string "Musée d'Orsay" gets serialized as "Mus?©e d'Orsay".
Here's my code that does the serialization:
public static String toJson(List<Post> posts) {
    ObjectMapper objectMapper = new ObjectMapper()
            .configure(Feature.USE_ANNOTATIONS, true);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
        objectMapper.writeValue(out, posts);
    } catch (JsonGenerationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (JsonMappingException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return new String(out.toByteArray());
}
Interestingly, the exact same List<Post> posts gets serialized just fine when I return it via a request handler using @ResponseBody with the following configuration:
public void configureMessageConverters(List<HttpMessageConverter<?>> converters) {
    ObjectMapper m = new ObjectMapper()
            .enable(Feature.USE_ANNOTATIONS)
            .disable(Feature.FAIL_ON_UNKNOWN_PROPERTIES);
    MappingJacksonHttpMessageConverter c = new MappingJacksonHttpMessageConverter();
    c.setObjectMapper(m);
    converters.add(c);
    super.configureMessageConverters(converters);
}
Any help greatly appreciated!
Aside from the conversions, how about simplifying the process to:
return objectMapper.writeValueAsString(posts);
which speeds things up (no need to go from POJO to a byte array and then decode to chars to build a String) and, more importantly, shortens the code.
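Plugged into the original method, that suggestion would look roughly like this (a sketch; writeValueAsString throws JsonProcessingException, which extends IOException, so one catch block covers the previous three):
public static String toJson(List<Post> posts) {
    ObjectMapper objectMapper = new ObjectMapper()
            .configure(Feature.USE_ANNOTATIONS, true);
    try {
        // Produces a String directly, so there is no byte[] to decode
        // with the platform default charset.
        return objectMapper.writeValueAsString(posts);
    } catch (IOException e) {
        throw new RuntimeException("Failed to serialize posts to JSON", e);
    }
}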
Not 10 minutes later, I found the problem. The issue wasn't with ObjectMapper; it was with how I was turning the ByteArrayOutputStream into a String. I changed the code as follows and everything started working:
try {
    return out.toString("utf-8");
} catch (UnsupportedEncodingException e) {
    return out.toString();
}
