Minimize the memory impact of generating and downloading a file - spring-boot

For testing purposes only, I created this controller, which generates a 100MB file and returns it to the client. The method is very fast. The content of the file does not matter. The file is generated on the fly, it is not saved to disk.
Is it possible to reduce the impact on memory, particularly on the Java heap? Thank you
#GetMapping("/testDownload100MB")
public ResponseEntity<Resource> download100MB() throws IOException {
int sizeInBytes = 100 * 1024 * 1024;
ByteArrayOutputStream outStream = new ByteArrayOutputStream(sizeInBytes);
for (int i = 0; i < sizeInBytes; i++) {
outStream.write(0);
}
Resource resource = new ByteArrayResource(outStream.toByteArray());
return ResponseEntity.ok()
.headers(utilities.getGenericHttpHeadersToDownloadFile())
.contentLength(sizeInBytes)
.contentType(MediaType.APPLICATION_OCTET_STREAM)
.body(resource);
}
The code of utilities.getGenericHttpHeadersToDownloadFile() is not relevant to the question; however, I include it for completeness:
public HttpHeaders getGenericHttpHeadersToDownloadFile() {
    HttpHeaders headers = new HttpHeaders();
    headers.add(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=myfile");
    headers.add("Cache-Control", "no-cache, no-store, must-revalidate");
    headers.add("Pragma", "no-cache");
    headers.add("Expires", "0");
    return headers;
}

Solved.
The code in my question throws a java.lang.OutOfMemoryError: Java heap space on my Spring Boot server. This new code solves the issue:
#GetMapping("/testDownload100MB")
public ResponseEntity<Resource> download100MB() throws IOException {
int size1MB = 1 * 1024 * 1024; // size in bytes
File tempFile = new File(System.getProperty("user.home") + File.separator + "tempFile" + System.currentTimeMillis());
for (int i = 0; i < 100; i++) {
ByteArrayOutputStream outStream = new ByteArrayOutputStream(size1MB);
for (int j = 0; j < size1MB; j++) {
outStream.write(0);
}
FileUtils.writeByteArrayToFile(tempFile, outStream.toByteArray(), true); // append to the temp file
}
Resource resource = new FileSystemResource(tempFile);
Timer timer = new Timer();
timer.schedule(new TimerTask() {
#Override
public void run() {
// we can assume that the file can be safely deleted after five minutes
tempFile.delete();
}
}, 1000 * 60 * 5);
return ResponseEntity.ok()
.headers(utilities.getGenericHttpHeadersToDownloadFile())
.contentLength(tempFile.length())
.contentType(MediaType.APPLICATION_OCTET_STREAM)
.body(resource);
}
Note that FileUtils must be imported with: import org.apache.commons.io.FileUtils;.
In this case, no object larger than 1 MB is placed on the Java heap. In addition, FileSystemResource does not need to load the entire file into memory. On the other hand, I had to add a timer to delete the file: since this is a method for testing purposes, I can safely assume that after a few minutes the file is no longer needed.
If there are better solutions that do not require a temporary file, add your answer :)
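For example, one direction that avoids both the in-memory byte array and the temporary file is Spring's StreamingResponseBody (org.springframework.web.servlet.mvc.method.annotation), which writes straight to the response output stream in small chunks. This is only a sketch I have not run against the original project; the header helper and sizes mirror the question, and the 8 KB chunk size is an arbitrary choice:
@GetMapping("/testDownload100MBStreaming")
public ResponseEntity<StreamingResponseBody> download100MBStreaming() {
    int sizeInBytes = 100 * 1024 * 1024;
    StreamingResponseBody body = outputStream -> {
        byte[] chunk = new byte[8 * 1024]; // the only buffer kept on the heap
        int written = 0;
        while (written < sizeInBytes) {
            int toWrite = Math.min(chunk.length, sizeInBytes - written);
            outputStream.write(chunk, 0, toWrite);
            written += toWrite;
        }
    };
    return ResponseEntity.ok()
            .headers(utilities.getGenericHttpHeadersToDownloadFile())
            .contentLength(sizeInBytes)
            .contentType(MediaType.APPLICATION_OCTET_STREAM)
            .body(body);
}
The lambda runs on a thread managed by Spring MVC after the controller method returns, so heap usage stays at roughly the chunk size regardless of how large the download is.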

Related

gRPC slow serialization on large dataset

I know that Google states that protobufs don't support large messages (i.e. greater than 1 MB), but I'm trying to stream a dataset that is tens of megabytes using gRPC, and some people seem to say it's OK, at least with some splitting...
However, when I try to send an array this way (repeated uint32), it takes like 20 seconds on the same local machine.
// proto
service PAS {
    // analyze single file
    rpc getPhotonRecords (PhotonRecordsRequest) returns (PhotonRecordsReply) {}
}
message PhotonRecordsRequest {
    string fileName = 1;
}
message PhotonRecordsReply {
    repeated uint32 PhotonRecords = 1;
}
where PhotonRecordsReply needs to be ~10 million uint32 in length...
Does anyone have an idea on how to speed this up? Or what technology would be more appropriate?
So I think I've implemented streaming based on comments and answers given, but it still takes the same amount of time:
// proto
service PAS {
    // analyze single file
    rpc getPhotonRecords (PhotonRecordsRequest) returns (stream PhotonRecordsReply) {}
}
class PAS_GRPC(pas_pb2_grpc.PASServicer):
    def getPhotonRecords(self, request: pas_pb2.PhotonRecordsRequest, _context):
        raw_data_bytes = flb_tools.read_data_bytes(request.fileName)
        data = flb_tools.reshape_flb_data(raw_data_bytes)
        index = 0
        chunk_size = 1024
        len_data = len(data)
        while index < len_data:
            # last chunk
            if index + chunk_size > len_data:
                yield pas_pb2.PhotonRecordsReply(PhotonRecords=data[index:])
            # all other chunks
            else:
                yield pas_pb2.PhotonRecordsReply(PhotonRecords=data[index:index + chunk_size])
            index += chunk_size
Min repro: GitHub example
If you change it over to use streams, that should help. It took less than 2 seconds to transfer for me; note this was without SSL and on localhost. This is code I threw together. I did run it and it worked, but I'm not sure what might happen if the file is not a multiple of 4 bytes, for example. Also, the byte order used when reading is Java's default (big-endian).
I made my 10 meg file like this.
dd if=/dev/random of=my_10mb_file bs=1024 count=10240
Here's the service definition. Only thing I added here was the stream to the response.
service PAS {
    // analyze single file
    rpc getPhotonRecords (PhotonRecordsRequest) returns (stream PhotonRecordsReply) {}
}
Here's the server implementation.
public class PhotonsServerImpl extends PASImplBase {
    @Override
    public void getPhotonRecords(PhotonRecordsRequest request, StreamObserver<PhotonRecordsReply> responseObserver) {
        log.info("inside getPhotonRecords");
        // open the file; I suggest using the java.nio API for the fastest read times.
        Path file = Paths.get(request.getFileName());
        try (FileChannel fileChannel = FileChannel.open(file, StandardOpenOption.READ)) {
            int blockSize = 1024 * 4;
            ByteBuffer byteBuffer = ByteBuffer.allocate(blockSize);
            boolean done = false;
            while (!done) {
                PhotonRecordsReply.Builder response = PhotonRecordsReply.newBuilder();
                // read the next 4 KB block (up to 1024 ints) from the file.
                byteBuffer.clear();
                int read = fileChannel.read(byteBuffer);
                if (read < blockSize) {
                    done = true;
                }
                // write to the response.
                byteBuffer.flip();
                for (int index = 0; index < read / 4; index++) {
                    response.addPhotonRecords(byteBuffer.getInt());
                }
                // send the response
                responseObserver.onNext(response.build());
            }
        } catch (Exception e) {
            log.error("", e);
            responseObserver.onError(
                    Status.INTERNAL.withDescription(e.getMessage()).asRuntimeException());
        }
        responseObserver.onCompleted();
        log.info("exit getPhotonRecords");
    }
}
The client just logs the size of the array received.
public long getPhotonRecords(ManagedChannel channel) {
    if (log.isInfoEnabled())
        log.info("Enter - getPhotonRecords ");
    PASGrpc.PASBlockingStub photonClient = PASGrpc.newBlockingStub(channel);
    PhotonRecordsRequest request = PhotonRecordsRequest.newBuilder().setFileName("/udata/jdrummond/logs/my_10mb_file").build();
    photonClient.getPhotonRecords(request).forEachRemaining(photonRecordsReply -> {
        log.info("got this many photons: {}", photonRecordsReply.getPhotonRecordsCount());
    });
    return 0;
}
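If the client needs the whole array back rather than just the chunk counts, the streamed replies can be reassembled as they arrive. A sketch of that idea (the stub and message types are the generated ones used above; the helper name and the plain List accumulator are mine):
public int[] collectPhotonRecords(ManagedChannel channel, String fileName) {
    PASGrpc.PASBlockingStub stub = PASGrpc.newBlockingStub(channel);
    PhotonRecordsRequest request = PhotonRecordsRequest.newBuilder().setFileName(fileName).build();
    List<Integer> all = new ArrayList<>(); // java.util
    // the blocking stub returns an Iterator over the streamed replies
    stub.getPhotonRecords(request)
            .forEachRemaining(reply -> all.addAll(reply.getPhotonRecordsList()));
    // unbox once, after the stream has completed
    return all.stream().mapToInt(Integer::intValue).toArray();
}
Keep in mind this still materializes all ~10 million values in client memory; if that is a concern, process each reply's list as it arrives instead of collecting everything.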

Create AIScene instance from the file's content

I'm writing a Java web service where it is possible to upload a 3D object, operate on it and store it.
What I'm trying to do is create an AIScene instance using a byte[] as the input parameter, which is the file itself (its content).
I have found no way to do this in the docs; all import methods require a path.
Right now I'm looking at both the LWJGL Java version of Assimp as well as the C++ version. It doesn't matter which one is used to solve the issue.
Edit: the code I'm trying to get working:
@Override
public String uploadFile(MultipartFile file) {
    AIFileIO fileIo = AIFileIO.create();
    AIFileOpenProcI fileOpenProc = new AIFileOpenProc() {
        public long invoke(long pFileIO, long fileName, long openMode) {
            AIFile aiFile = AIFile.create();
            final ByteBuffer data;
            try {
                data = ByteBuffer.wrap(file.getBytes());
            } catch (IOException e) {
                throw new RuntimeException();
            }
            AIFileReadProcI fileReadProc = new AIFileReadProc() {
                public long invoke(long pFile, long pBuffer, long size, long count) {
                    long max = Math.min(data.remaining(), size * count);
                    memCopy(memAddress(data) + data.position(), pBuffer, max);
                    return max;
                }
            };
            AIFileSeekI fileSeekProc = new AIFileSeek() {
                public int invoke(long pFile, long offset, int origin) {
                    if (origin == Assimp.aiOrigin_CUR) {
                        data.position(data.position() + (int) offset);
                    } else if (origin == Assimp.aiOrigin_SET) {
                        data.position((int) offset);
                    } else if (origin == Assimp.aiOrigin_END) {
                        data.position(data.limit() + (int) offset);
                    }
                    return 0;
                }
            };
            AIFileTellProcI fileTellProc = new AIFileTellProc() {
                public long invoke(long pFile) {
                    return data.limit();
                }
            };
            aiFile.ReadProc(fileReadProc);
            aiFile.SeekProc(fileSeekProc);
            aiFile.FileSizeProc(fileTellProc);
            return aiFile.address();
        }
    };
    AIFileCloseProcI fileCloseProc = new AIFileCloseProc() {
        public void invoke(long pFileIO, long pFile) {
            /* Nothing to do */
        }
    };
    fileIo.set(fileOpenProc, fileCloseProc, NULL);
    AIScene scene = aiImportFileEx(file.getName(),
            aiProcess_JoinIdenticalVertices | aiProcess_Triangulate, fileIo); // ISSUE HERE. file.getName() is not a path, just a name; so is getOriginalName() in my case.
    try {
        Long id = scene.mMeshes().get(0);
        AIMesh mesh = AIMesh.create(id);
        AIVector3D vertex = mesh.mVertices().get(0);
        return mesh.mName().toString() + ": " + (vertex.x() + " " + vertex.y() + " " + vertex.z());
    } catch (Exception e) {
        e.printStackTrace();
    }
    return "fail";
}
When debugging the method, I get an access violation in the method that binds to the native function:
public static long naiImportFileEx(long pFile, int pFlags, long pFS)
this is the message:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000007400125d, pid=6400, tid=0x0000000000003058
#
# JRE version: Java(TM) SE Runtime Environment (8.0_201-b09) (build 1.8.0_201-b09)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.201-b09 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# V  [jvm.dll+0x1e125d]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\ragos\IdeaProjects\objectstore3d\hs_err_pid6400.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
It is possible if we use the aiImportFileFromMemory method.
The approach I wanted to follow was copied from a GitHub demo and actually copies the buffer around unnecessarily.
The reason for the access violation was the use of indirect buffers (for more info on why that is a problem, check this out).
The solution is not nearly as complicated as the code I initially pasted:
@Override
public String uploadFile(MultipartFile file) throws IOException {
    ByteBuffer buffer = BufferUtils.createByteBuffer((int) file.getSize());
    buffer.put(file.getBytes());
    buffer.flip();
    AIScene scene = Assimp.aiImportFileFromMemory(buffer, aiProcess_Triangulate, (ByteBuffer) null);
    Long id = scene.mMeshes().get(0);
    AIMesh mesh = AIMesh.create(id);
    AIVector3D vertex = mesh.mVertices().get(0);
    return mesh.mName().dataString() + ": " + (vertex.x() + " " + vertex.y() + " " + vertex.z());
}
Here I create a direct buffer with the appropriate size, load the data and flip it (this part is a must). After that I let Assimp do its magic, which gives me pointers into the structure. With the return statement I just check that I got valid data.
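One extra detail worth adding here (my note, not part of the original answer): the returned AIScene is a view over native memory, so it is worth checking it for null and releasing it once you are done. A minimal sketch built on the call above:
AIScene scene = Assimp.aiImportFileFromMemory(buffer, aiProcess_Triangulate, (ByteBuffer) null);
if (scene == null) {
    // the import failed; Assimp keeps a human-readable reason
    throw new IllegalStateException(Assimp.aiGetErrorString());
}
try {
    // ... read meshes and vertices as above ...
} finally {
    // frees the native scene graph once the needed data has been extracted
    Assimp.aiReleaseImport(scene);
}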
edit
As was pointed out in the comments, this implementation is limited to a single file upload and assumes it gets everything it needs from that one MultipartFile, so it won't work well with formats that reference external files. See the docs for more detail.
The demo that was linked in the question's comments, and which was used as the base for the code in the question, has a different use case from my original one.

Best way to handle awt.Image buffering in JavaFX

I have a class that takes a String parameter and performs a Google search, then gets the ten resulting images and puts them in an array, which is then handled by another method in the same class. Using javafx.scene.image would probably allow me to implement the buffering progress easily, but there is a bug with JavaFX Image that misinterprets the color encoding of normal images and saves a weird-looking image to the hard drive, so I decided to use awt.Image instead.
This is the image search class:
public class GoogleCustomSearch {
    static String key = //custom google id;
    static String cx = // also a custom google id;
    static String searchType = "image";
    static java.awt.Image[] publicImageArray;

    public static java.awt.Image[] Search(String searchParameter, int start) throws IOException, URISyntaxException {
        String formatedText = URLEncoder.encode(searchParameter, "UTF-8");
        URL url = new URL("https://www.googleapis.com/customsearch/v1?" + "key=" + key + "&cx=" + cx + "&q=" + formatedText + "&searchType=" + searchType + "&imgSize=medium" + "&start=" + start + "&num=10");
        System.out.println(url);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        GResults results = new Gson().fromJson(br, GResults.class);
        java.awt.Image[] imageArray = new java.awt.Image[10];
        //JProgressBar prb = new JProgressBar();
        //MediaTracker loadTracker = new MediaTracker(prb);
        for (int i = 0; i < 10; i++) {
            try {
                imageArray[i] = ImageIO.read(new URL(results.getLink(i)));
            } catch (java.io.IOException e) {
                imageArray[i] = ImageIO.read(new File("C:\\Users\\FILIP.D\\IdeaProjects\\Manual_Artwork\\src\\MAT - NoImage.jpg"));
            }
        }
        conn.disconnect();
        return imageArray;
    }

    public static BufferedImage getImage(String searchPar, int index, boolean newSearch) throws IOException, URISyntaxException {
        int adaptedIndex;
        int start;
        BufferedImage bimage;
        if (index < 10) {
            adaptedIndex = index;
            start = 1;
        } else if (index < 20) {
            start = 11;
            adaptedIndex = index % 10;
            if (index == 10) {
                publicImageArray = new java.awt.Image[10];
                publicImageArray = Search(searchPar, start);
            }
        } else if (index < 30) {
            start = 21;
            adaptedIndex = index % 10;
            if (index == 20) {
                publicImageArray = new java.awt.Image[10];
                publicImageArray = Search(searchPar, start);
            }
        } else {
            adaptedIndex = index % 10;
            start = 21; // after 30 this loops back over the first 10
        }
        if (newSearch) {
            publicImageArray = new java.awt.Image[10];
            publicImageArray = Search(searchPar, start);
            return bimage = (BufferedImage) publicImageArray[adaptedIndex];
        } else {
            return bimage = (BufferedImage) publicImageArray[adaptedIndex];
        }
    }

    public static RenderedImage getLiveImage(int index) {
        return (RenderedImage) publicImageArray[index % 10];
    }
}
And this is the snippet of the main GUI class that handles moving to the next image in the array:
private void nextImageResult() throws IOException, URISyntaxException {
    if (imgNr == -1) {
        imgNr++;
        changeImage(SwingFXUtils.toFXImage(GoogleCustomSearch.getImage(oppenedTrack.getArtistName() + "+" + oppenedTrack.getTrackName(), imgNr, true), null));
    } else {
        imgNr++;
        changeImage(SwingFXUtils.toFXImage(GoogleCustomSearch.getImage(oppenedTrack.getArtistName() + "+" + oppenedTrack.getTrackName(), imgNr, false), null));
    }
}
To summarise, I need a proper way to show a progress bar in place of the image before it loads, and it must not hang the UI, for which I can use Task. I could also optimise the loading of the array with MediaTracker, so that the first few images are prioritised.
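As a rough sketch of the Task approach (my own illustration, not code from the project): the download loop runs on a background thread, updateProgress() drives a bound ProgressIndicator, and the FX image is only set once loading succeeds. Here links stands in for the ten result URLs (e.g. collected from GResults.getLink(i)), and progressIndicator and changeImage are placeholders for the real GUI members:
Task<java.awt.image.BufferedImage[]> loadTask = new Task<java.awt.image.BufferedImage[]>() {
    @Override
    protected java.awt.image.BufferedImage[] call() throws Exception {
        // runs off the FX thread, so the UI stays responsive while images download
        java.awt.image.BufferedImage[] images = new java.awt.image.BufferedImage[links.size()];
        for (int i = 0; i < links.size(); i++) {
            images[i] = javax.imageio.ImageIO.read(new java.net.URL(links.get(i)));
            updateProgress(i + 1, links.size()); // drives the bound progress bar
        }
        return images;
    }
};
progressIndicator.progressProperty().bind(loadTask.progressProperty());
loadTask.setOnSucceeded(e -> changeImage(SwingFXUtils.toFXImage(loadTask.getValue()[0], null)));
Thread loader = new Thread(loadTask, "image-loader");
loader.setDaemon(true);
loader.start();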

Memory issues when running Spark job on relatively large input

I am running a Spark cluster with 50 machines. Each machine is a VM with 8 cores and 50GB of memory (41GB seems to be available to Spark).
I am running on several input folders; I estimate the size of the input to be ~250GB gz compressed.
Although the number and configuration of machines seems sufficient, the job fails after about 40 minutes of running. I can see the following errors in the logs:
2558733 [Result resolver thread-2] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 345.0 in stage 1.0 (TID 345, hadoop-w-3.c.taboola-qa-01.internal): java.lang.OutOfMemoryError: Java heap space
java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
java.lang.StringCoding.decode(StringCoding.java:193)
java.lang.String.<init>(String.java:416)
java.lang.String.<init>(String.java:481)
com.doit.customer.dataconverter.Phase0$3.call(Phase0.java:699)
com.doit.customer.dataconverter.Phase0$3.call(Phase0.java:660)
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
and also:
2653545 [Result resolver thread-2] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 122.1 in stage 1.0 (TID 392, hadoop-w-22.c.taboola-qa-01.internal): java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
java.lang.StringCoding.decode(StringCoding.java:193)
java.lang.String.<init>(String.java:416)
java.lang.String.<init>(String.java:481)
com.doit.customer.dataconverter.Phase0$3.call(Phase0.java:699)
com.doit.customer.dataconverter.Phase0$3.call(Phase0.java:660)
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
How do I go about debugging such an issue?
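(One general technique, not from the original post: have each executor write a heap dump when it hits the OutOfMemoryError, then open the dump in a tool such as Eclipse MAT to see which objects dominate. Both the Spark key and the JVM flags below are standard options; the dump path and app name are just examples.)
SparkConf conf = new SparkConf() // org.apache.spark.SparkConf
        .setAppName("phase0-debug")
        .set("spark.executor.extraJavaOptions",
             "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-dumps");
JavaSparkContext sc = new JavaSparkContext(conf);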
EDIT: I found the root cause of the problem. It is this piece of code:
private static final int MAX_FILE_SIZE = 40194304;
....
....
JavaPairRDD<String, List<String>> typedData = filePaths.mapPartitionsToPair(new PairFlatMapFunction<Iterator<String>, String, List<String>>() {
    @Override
    public Iterable<Tuple2<String, List<String>>> call(Iterator<String> filesIterator) throws Exception {
        List<Tuple2<String, List<String>>> res = new ArrayList<>();
        String fileType = null;
        List<String> linesList = null;
        if (filesIterator != null) {
            while (filesIterator.hasNext()) {
                try {
                    Path file = new Path(filesIterator.next());
                    // filter non-trc files
                    if (!file.getName().startsWith("1")) {
                        continue;
                    }
                    fileType = getType(file.getName());
                    Configuration conf = new Configuration();
                    CompressionCodecFactory compressionCodecs = new CompressionCodecFactory(conf);
                    CompressionCodec codec = compressionCodecs.getCodec(file);
                    FileSystem fs = file.getFileSystem(conf);
                    ContentSummary contentSummary = fs.getContentSummary(file);
                    long fileSize = contentSummary.getLength();
                    InputStream in = fs.open(file);
                    if (codec != null) {
                        in = codec.createInputStream(in);
                    } else {
                        throw new IOException();
                    }
                    byte[] buffer = new byte[MAX_FILE_SIZE];
                    BufferedInputStream bis = new BufferedInputStream(in, BUFFER_SIZE);
                    int count = 0;
                    int bytesRead = 0;
                    try {
                        while ((bytesRead = bis.read(buffer, count, BUFFER_SIZE)) != -1) {
                            count += bytesRead;
                        }
                    } catch (Exception e) {
                        log.error("Error reading file: " + file.getName() + ", trying to read " + BUFFER_SIZE + " bytes at offset: " + count);
                        throw e;
                    }
                    Iterable<String> lines = Splitter.on("\n").split(new String(buffer, "UTF-8").trim());
                    linesList = Lists.newArrayList(lines);
                    // get rid of first line in file
                    Iterator<String> it = linesList.iterator();
                    if (it.hasNext()) {
                        it.next();
                        it.remove();
                    }
                    //res.add(new Tuple2<>(fileType,linesList));
                } finally {
                    res.add(new Tuple2<>(fileType, linesList));
                }
            }
        }
        return res;
    }
});
In particular, it allocates a 40MB buffer for each file in order to read the file's content through a BufferedInputStream. This exhausts the heap at some point.
The thing is:
If I read line by line (which does not require a buffer), the read will be very inefficient.
If I allocate one buffer and reuse it for each file read - is that possible in terms of parallelism? Or will it get overwritten by several threads?
Any suggestions are welcome...
EDIT 2: I fixed the first memory issue by moving the byte array allocation outside the iterator, so it gets reused across all partition elements. But there is still the new String(buffer, "UTF-8").trim() that gets created for the split - an object that is also created every time. I could use a StringBuffer/StringBuilder, but then how would I set the charset encoding without a String object?
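A side note on that charset question (my addition, not from the original code): java.nio can decode a reused byte[] straight into a CharBuffer without building an intermediate String, and Guava's Splitter accepts any CharSequence. A sketch, assuming buffer and count are the array and byte count from the snippet above:
CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder(); // java.nio.charset
CharBuffer chars = decoder.decode(ByteBuffer.wrap(buffer, 0, count)); // throws CharacterCodingException on malformed input
Iterable<String> lines = Splitter.on('\n').split(chars); // Guava Splitter works on any CharSequence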
Eventually I changed the code as follows:
// Transform list of files to list of all files' content in lines grouped by type
JavaPairRDD<String, List<String>> typedData = filePaths.mapToPair(new PairFunction<String, String, List<String>>() {
    @Override
    public Tuple2<String, List<String>> call(String filePath) throws Exception {
        Tuple2<String, List<String>> tuple = null;
        try {
            String fileType = null;
            List<String> linesList = new ArrayList<String>();
            Configuration conf = new Configuration();
            CompressionCodecFactory compressionCodecs = new CompressionCodecFactory(conf);
            Path path = new Path(filePath);
            fileType = getType(path.getName());
            tuple = new Tuple2<String, List<String>>(fileType, linesList);
            // filter non-trc files
            if (!path.getName().startsWith("1")) {
                return tuple;
            }
            CompressionCodec codec = compressionCodecs.getCodec(path);
            FileSystem fs = path.getFileSystem(conf);
            InputStream in = fs.open(path);
            if (codec != null) {
                in = codec.createInputStream(in);
            } else {
                throw new IOException();
            }
            BufferedReader r = new BufferedReader(new InputStreamReader(in, "UTF-8"), BUFFER_SIZE);
            // Get rid of the first line in the file
            r.readLine();
            // Read all lines
            String line;
            while ((line = r.readLine()) != null) {
                linesList.add(line);
            }
        } catch (IOException e) { // Filtering of files whose reading went wrong
            log.error("Reading of the file " + filePath + " went wrong: " + e.getMessage());
        } finally {
            return tuple;
        }
    }
});
So now I do not use a 40MB buffer, but instead build the lines list dynamically using an ArrayList. This solved my current memory issue, but now I am getting other strange errors that fail the job. I will report those in a different question...

Posting file on Background Agent / HttpWebRequest stream buffer keeps growing?

I need to POST a 5MB file from within a ResourceIntensiveTask, where the OS sets a maximum memory usage of 5MB.
So I am trying to stream the file directly from storage, but the Stream associated with the HttpWebRequest keeps growing in size. This is the code:
public void writeStream(Stream writer, string filesource, string filename)
{
    var store = System.IO.IsolatedStorage.IsolatedStorageFile.GetUserStoreForApplication();
    var f = store.OpenFile(filesource, FileMode.Open, FileAccess.Read);
    store.Dispose();
    byte[] buffer = Encoding.UTF8.GetBytes(String.Format(@"Content-Disposition: form-data; name=""file""; filename=""{0}""\n", filename));
    writer.Write(buffer, 0, buffer.Length);
    buffer = Encoding.UTF8.GetBytes("Content-Type: application/octet-stream\n");
    writer.Write(buffer, 0, buffer.Length);
    long initialMemory = Microsoft.Phone.Info.DeviceStatus.ApplicationCurrentMemoryUsage;
    buffer = new byte[2048];
    int DataRead = 0;
    do
    {
        DataRead = f.Read(buffer, 0, 2048);
        if (DataRead > 0)
        {
            writer.Write(buffer, 0, DataRead);
            Array.Clear(buffer, 0, 2048);
        }
    } while (DataRead > 0);
    double increasedMemory = ((double)Microsoft.Phone.Info.DeviceStatus.ApplicationCurrentMemoryUsage - initialMemory) / 1000000;
    buffer = Encoding.UTF8.GetBytes("\n--" + boundary + "\n--");
    writer.Write(buffer, 0, buffer.Length);
    writer.Flush();
}
The increasedMemory debug variable is used to get the difference in memory before and after the file is read and streamed to the HttpWebRequest, and it shows almost exactly the size of the file (5MB), which means the process memory is growing by 5MB.
I am also setting AllowReadStreamBuffering = false on the HttpWebRequest.
How do I keep memory low? How can I upload large files when the memory usage limit is 5MB?
The problem is that without being able to turn off write buffering, the connection to the server is not even made until BeginGetResponse() is called after closing the request stream (verified with WireShark).
The only way I can think of to get around this would be to use sockets directly (although that will be way more complicated if using an SSL connection).
This code works for me and doesn't increase memory usage while sending data to the server. I haven't tested it in a background task but don't see any reason it wouldn't work.
Socket _socket;
const int BUFFERSIZE = 4096;
byte[] writebuffer = new byte[BUFFERSIZE];
string hostName = "www.testdomain.com";
string hostPath = "/test/testupload.aspx";
IsolatedStorageFileStream isoFile;

public void SocketPOST(string hostName, string filesource)
{
    using (IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForApplication())
    {
        if (store.FileExists(filesource))
        {
            isoFile = store.OpenFile(filesource, FileMode.Open, FileAccess.Read);
        }
    }
    _socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
    _socket.SetNetworkRequirement(NetworkSelectionCharacteristics.NonCellular);
    SocketAsyncEventArgs socketEventArg = new SocketAsyncEventArgs();
    socketEventArg.RemoteEndPoint = new DnsEndPoint(hostName, 80);
    socketEventArg.Completed += new EventHandler<SocketAsyncEventArgs>(Socket_Completed);
    _socket.ConnectAsync(socketEventArg);
}

private void Socket_Completed(object sender, SocketAsyncEventArgs e)
{
    if (e.SocketError == SocketError.Success)
    {
        switch (e.LastOperation)
        {
            case SocketAsyncOperation.Connect: // Connected so started sending data, headers first
                if (e.ConnectSocket.Connected)
                {
                    StringBuilder sbHeaders = new StringBuilder("POST " + hostPath + " HTTP/1.1\r\n");
                    sbHeaders.Append("HOST: " + hostName + "\r\n");
                    sbHeaders.Append("USER-AGENT: MyWP7App/1.0\r\n");
                    sbHeaders.Append("Content-Type: text/plain; charset=\"utf-8\"\r\n");
                    sbHeaders.Append("Content-Length: " + isoFile.Length.ToString() + "\r\n\r\n");
                    byte[] headerBuffer = Encoding.UTF8.GetBytes(sbHeaders.ToString());
                    e.SetBuffer(headerBuffer, 0, headerBuffer.Length);
                    if (!e.ConnectSocket.SendAsync(e)) Socket_Completed(e.ConnectSocket, e);
                }
                break;
            case SocketAsyncOperation.Send:
            case SocketAsyncOperation.SendTo: // Previous buffer sent so send next one if stream not finished
                Array.Clear(writebuffer, 0, BUFFERSIZE);
                int DataRead = 0;
                DataRead = isoFile.Read(writebuffer, 0, BUFFERSIZE);
                if (DataRead > 0)
                {
                    e.SetBuffer(writebuffer, 0, DataRead);
                    if (!_socket.SendAsync(e)) Socket_Completed(e.ConnectSocket, e);
                }
                else
                {
                    isoFile.Dispose();
                    if (!_socket.ReceiveAsync(e)) Socket_Completed(e.ConnectSocket, e);
                }
                break;
            case SocketAsyncOperation.Receive:
            case SocketAsyncOperation.ReceiveFrom:
                if (e.BytesTransferred > 0)
                {
                    string response = Encoding.UTF8.GetString(e.Buffer, e.Offset, e.BytesTransferred).Trim('\0');
                    // Check response if necessary
                    e.ConnectSocket.Shutdown(SocketShutdown.Both);
                    e.ConnectSocket.Dispose();
                }
                break;
            default:
                break;
        }
    }
}
Note: I've left a lot of the error handling out to keep the example short.
SSL Note: Because SSL works at the TCP level and WP7 doesn't currently support SSL sockets (SslStream) you would need to handle the certificate handshake, cipher exchange, etc yourself to set up the SSL connection on the socket and then encrypt everything being sent (and decrypt everything received) with the agreed algorithms. There has been some success using the Bouncy Castle API so that could be possible (see this blog post).
One thing I noticed: you forgot to dispose f!
I personally would use the code like this:
public void writeStream(Stream writer, string filesource, string filename)
{
    using (var store = System.IO.IsolatedStorage.IsolatedStorageFile.GetUserStoreForApplication())
    {
        long initialMemory = Microsoft.Phone.Info.DeviceStatus.ApplicationCurrentMemoryUsage;
        using (var f = store.OpenFile(filesource, FileMode.Open, FileAccess.Read))
        {
            byte[] buffer = Encoding.UTF8.GetBytes(string.Format(@"Content-Disposition: form-data; name=""file""; filename=""{0}""\n", filename));
            writer.Write(buffer, 0, buffer.Length);
            buffer = Encoding.UTF8.GetBytes("Content-Type: application/octet-stream\n");
            writer.Write(buffer, 0, buffer.Length);
            buffer = new byte[2048];
            int DataRead = 0;
            do
            {
                DataRead = f.Read(buffer, 0, 2048);
                if (DataRead > 0)
                {
                    writer.Write(buffer, 0, DataRead);
                }
            } while (DataRead > 0);
            buffer = Encoding.UTF8.GetBytes("\n--" + boundary + "\n--");
            writer.Write(buffer, 0, buffer.Length);
            writer.Flush();
        }
        double increasedMemory = ((double)Microsoft.Phone.Info.DeviceStatus.ApplicationCurrentMemoryUsage - initialMemory) / 1000000;
    }
}
The boundary var seems to be missing, so a coding error still remains here!
