Unable to read large file in Azure App Service - Spring Boot

We have a Spring Boot application running in Azure App Service that performs ETL operations on CSV files.
A file is placed in the instance's local directory, from where the application picks it up and processes it. We are facing an issue when the uploaded file is larger than 10 MB: the reader is not able to read the file and returns null. We are using Super CSV to process the CSV file.
FileReader fr = new FileReader(filePath);
BufferedReader bufferedReader = new BufferedReader(fr);
CsvListReader reader = new CsvListReader(bufferedReader, CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);
List<String> read = reader.read();
The reader.read() method returns null. The issue happens only in Azure App Service (Linux); the same code works perfectly on my local machine.
Can anyone help me figure out what the issue is here?
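For reference, here is a minimal sketch of the same read done with try-with-resources and an explicit charset, reading row by row instead of with a single call; the file path and the UTF-8 charset are assumptions, but structuring it this way makes it easier to confirm whether the file is actually present and non-empty on the App Service instance:
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import org.supercsv.io.CsvListReader;
import org.supercsv.prefs.CsvPreference;

public class CsvProbe {
    public static void main(String[] args) throws IOException {
        Path filePath = Paths.get("/home/data/input.csv"); // hypothetical path on the instance
        System.out.println("File size on disk: " + Files.size(filePath)); // confirm the upload completed
        try (BufferedReader br = Files.newBufferedReader(filePath, StandardCharsets.UTF_8);
             CsvListReader reader = new CsvListReader(br, CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE)) {
            List<String> row;
            long rows = 0;
            // read() returns null only once the end of the stream is reached
            while ((row = reader.read()) != null) {
                rows++;
            }
            System.out.println("Rows read: " + rows);
        }
    }
}
If read() returns null on the very first call, the stream is already at end of file, which usually points to an empty or still-being-written file rather than to Super CSV itself.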

Related

fs.statSync() isn't giving the actual birthtime of the file - Node.js

I am using the fs Node module to manage files and reading each file's creation time (birthtime). It works absolutely fine when I run the app on my local machine, but when I run it against EFS from a Node.js Lambda function it returns 1970-01-01T00:00:00.000Z, which is not the actual creation time of the file.
// fs and path are Node core modules
const fs = require("fs");
const path = require("path");

var efsDirectory = "/mnt/data/";
var filePath = path.join(efsDirectory, file);
console.log("This file is going to be executed :", file);
var response = fs.statSync(filePath);
let fileBirthTime = response.birthtime;
console.log("File path is : ", filePath);
After joining, the path looks like /mnt/data/172.807056.json, which is the actual path of the file.
In the CloudWatch logs I am getting this:
On my local machine it works fine and gives the actual file birthtime. Can anyone tell me why I am getting this?
I posted the same question on AWS re:Post, and an engineer responded with the following answer. Pasting it here in case someone else is facing the same problem.
You are getting this result because birthtime is not supported on most NFS file systems, including EFS. Even on Linux it depends on the kernel and the type of file system whether this field is supported. The default file system on Amazon Linux 2 on EBS doesn't return a value for birthtime, whereas the latest Ubuntu image does. This is why you see a difference between running locally and against EFS.

Access Azure Storage Emulator through the Hadoop FileSystem API

I have a Scala codebase where I am accessing Azure blob files using the Hadoop FileSystem API (and not the Azure blob web client). My usage is of the following form:
val hadoopConfig = new Configuration()
hadoopConfig.set(s"fs.azure.sas.${blobContainerName}.${accountName}.blob.windows.core.net", sasKey)
hadoopConfig.set("fs.defaultFS", s"wasbs://${blobContainerName}#${accountName}.blob.windows.core.net")
hadoopConfig.set("fs.wasb.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
hadoopConfig.set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure")
val fs = FileSystem.get(
  new java.net.URI(s"wasbs://${blobContainerName}#${accountName}.blob.windows.core.net"),
  hadoopConfig)
I am now writing unit tests for this code using the Azure Storage Emulator as the storage account. I went through this page, but it only explains how to access the emulator through the AzureBlobClient web APIs. I need to figure out how to test the code above by accessing the Azure Storage Emulator through the Hadoop FileSystem API. I have tried the following, but it does not work:
val hadoopConfig = new Configuration()
hadoopConfig.set(s"fs.azure.sas.${containerName}.devstoreaccount1.blob.windows.core.net",
  "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==")
hadoopConfig.set("fs.defaultFS", s"wasbs://${containerName}#devstoreaccount1.blob.windows.core.net")
hadoopConfig.set("fs.wasb.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
hadoopConfig.set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure")
val fs = FileSystem.get(
  new java.net.URI(s"wasbs://${containerName}#devstoreaccount1.blob.windows.core.net"),
  hadoopConfig)
I was able to solve this problem and connect to the storage emulator by adding the following two configuration properties:
hadoopConfig.set("fs.azure.test.emulator",
"true")
hadoopConfig.set("fs.azure.storage.emulator.account.name",
"devstoreaccount1.blob.windows.core.net")

Hadoop: Can files be created without overwriting, by appending a suffix to the name?

So in a lot of cases, when you add a file to a directory where a file with the same name already exists, it'll append something to the end of it. For example, a unique, incrementing number.
So, let's say the Hadoop system I'm connecting to has a folder called "/input", and there's already a file there called "sample.txt". So the full path would be "/input/sample.txt". If I tried to create a new file with the path "/input/sample.txt", it would save it as "/input/sample1.txt", or something like that. And then there would be two files in that directory, sample.txt and sample1.txt.
I'm new to Hadoop; my company has me building an interface that will allow our application to connect to Hadoop systems. I've got some simple client code working nicely, but I don't see anything in the API about how to do this. It's a behavior that other components of our product have, and while it's not strictly necessary, I would like to provide it for consistency's sake.
Thanks in advance.
PS. The client code I'm working on is in Java and uses Apache's Hadoop Client library.
Use the FileSystem exists() API and change the file name to suit your needs (increment a suffix or whatever).
Sample Java code to do that:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://namenode:9000");
FileSystem fs = FileSystem.get(conf);
Path path = new Path("/input/sample.txt");
int suffix = 1;
// keep incrementing the suffix until no file with that name exists
while (fs.exists(path)) {
    path = new Path("/input/sample" + suffix++ + ".txt");
}
// create the new file at 'path'
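One caveat worth adding (not part of the original answer): the exists() check followed by a create is not atomic, so if several writers can target the same directory, it may be safer to rely on create(path, false), which fails when the path already exists. Continuing the snippet above:
byte[] data = "file contents".getBytes(java.nio.charset.StandardCharsets.UTF_8);
try (FSDataOutputStream out = fs.create(path, false)) {
    // overwrite=false makes create() fail instead of clobbering an existing file
    out.write(data);
} catch (org.apache.hadoop.fs.FileAlreadyExistsException e) {
    // the name was taken between exists() and create(); pick the next suffix and retry
}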
https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileSystem.html

Windows Service Use Isolated Storage

I have a Windows service which seems to be unable to use isolated storage (most likely because I'm doing something wrong).
The service is running under Local System.
Every time I run the code, there is no change to the isolated storage folder.
Can someone please tell me why this is so?
I have tried on Windows 7 and Windows 8.1.
IsolatedStorageFileStream configFile = new IsolatedStorageFileStream("UIsolate.cfg", FileMode.Create);
// create a writer to write to the stream
StreamWriter writer = new StreamWriter(configFile);
// write some data to the config. file
writer.WriteLine("test");
// flush the buffer and clean up
writer.Close();
configFile.Close();

BigQuery UTF-8 problems

I am using google-api-services-bigquery in Java to load data from JSON files stored in Google Cloud Storage into BigQuery.
Everything was OK with this configuration:
Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationLoad configLoad = new JobConfigurationLoad();
configLoad.setSchema(schema);
configLoad.setDestinationTable(destTable);
configLoad.setEncoding(StringConstants.UTF_8);
configLoad.setCreateDisposition("CREATE_IF_NEEDED");
configLoad.setWriteDisposition("WRITE_APPEND");
configLoad.setSourceFormat("NEWLINE_DELIMITED_JSON");
configLoad.setAllowQuotedNewlines(false);
configLoad.setSourceUris(gcsPaths);
config.setLoad(configLoad);
job.setConfiguration(config);
But since about 2014-01-30 12:00:00 GMT, Russian characters in JSON values have been replaced by question marks. The application was running as a daemon and was not even restarted at that moment, so I think the issue is caused by some change in BigQuery. Does BigQuery now use Latin-1?
Does anyone know how I can solve this?
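Purely as a debugging aid (this is an assumption about the client side, not something stated above): question marks typically appear when text is encoded with a charset that cannot represent the characters, so it is worth confirming that the JSON files are written with an explicit UTF-8 encoder before they are uploaded to Cloud Storage, rather than with the platform default charset, for example:
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WriteJsonUtf8 {
    public static void main(String[] args) throws IOException {
        // hypothetical local file that is later uploaded to Cloud Storage
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get("rows.json"), StandardCharsets.UTF_8)) {
            // an explicit UTF-8 encoder avoids the platform default charset,
            // which on some hosts cannot represent Cyrillic characters
            out.write("{\"name\":\"Москва\"}");
            out.newLine();
        }
    }
}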
