The documentation of MultipartFile says that
The file contents are either stored in memory or temporarily on disk.
I want to be sure that the multipart file is not stored in memory but on disk.
There are multiple ways to do this, but the following code should help:
private void saveFile(MultipartFile multipartFile) throws Exception {
    // 'request' is assumed to be an available HttpServletRequest (e.g. injected into the controller)
    String dirPath = request.getServletContext().getRealPath("/");
    // Write into that directory, keeping the original file name
    File destination = new File(dirPath, multipartFile.getOriginalFilename());
    multipartFile.transferTo(destination);
}
Or you can define your own custom path.
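If the goal is to guarantee that uploads never stay in memory at all, the multipart file-size threshold can also be set to zero so every part is written to disk. A minimal sketch, assuming Spring Boot 2.x and its standard servlet multipart support (the temp directory is hypothetical):
import javax.servlet.MultipartConfigElement;

import org.springframework.boot.web.servlet.MultipartConfigFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.util.unit.DataSize;

@Configuration
public class MultipartDiskConfig {

    @Bean
    public MultipartConfigElement multipartConfigElement() {
        MultipartConfigFactory factory = new MultipartConfigFactory();
        // A threshold of 0 means every uploaded part is written to disk rather than buffered in memory
        factory.setFileSizeThreshold(DataSize.ofBytes(0));
        // Hypothetical temp directory for the on-disk parts; adjust to your environment
        factory.setLocation("/tmp/uploads");
        return factory.createMultipartConfig();
    }
}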
I would like to download a list of files with name and content in Apache Camel.
Currently I am downloading the file content of all files as byte[] and storing them in a List. I then read the list using a ConsumerTemplate.
This works well. This is my Route:
from(downloadUri)
    .aggregate(AggregationStrategies.flexible(byte[].class)
        .accumulateInCollection(LinkedList.class))
    .constant(true)
    .completionFromBatchConsumer()
    .to("direct:" + this.destinationObjectId);
I get the List of all downloaded file contents as byte[] as desired.
I would like to extend it now so that it downloads the content and the file name of each file. It shall be stored in a pair object:
public class NameContentPair {
    private String fileName;
    private byte[] fileContent;

    public NameContentPair(String fileName, byte[] fileContent) { ... }
}
These pair objects for each downloaded file shall in turn be stored in a List. How can I change or extend my Route to do this?
I tried Camel Converters, but was not able to build them properly into my Route. I always got the Route setup wrong.
I solved this by implementing a custom AggregationStrategy.
It reads the file name and the file content from each Exchange and puts them into a list as NameContentPair objects. The file content and file name are present in the Exchange's body as a RemoteFile and are read from there.
The general aggregation implementation is based on the example implementation from https://camel.apache.org/components/3.15.x/eips/aggregate-eip.html
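For reference, such a strategy could look roughly like the sketch below, assuming Camel 3.x and that each incoming Exchange body is a RemoteFile (the class name matches the one used in the route that follows):
import java.util.ArrayList;
import java.util.List;

import org.apache.camel.AggregationStrategy;
import org.apache.camel.Exchange;
import org.apache.camel.component.file.remote.RemoteFile;

public class FileContentWithFileNameInListAggregationStrategy implements AggregationStrategy {

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // Each new exchange carries one downloaded file as a RemoteFile in its body
        RemoteFile<?> remoteFile = (RemoteFile<?>) newExchange.getIn().getBody();
        String fileName = remoteFile.getFileName();
        byte[] fileContent = newExchange.getContext().getTypeConverter()
                .convertTo(byte[].class, newExchange, remoteFile.getBody());
        NameContentPair pair = new NameContentPair(fileName, fileContent);

        if (oldExchange == null) {
            // First exchange of the batch: start the list and use this exchange as the aggregate
            List<NameContentPair> pairs = new ArrayList<>();
            pairs.add(pair);
            newExchange.getIn().setBody(pairs);
            return newExchange;
        }

        // Subsequent exchanges: append to the list carried by the aggregated exchange
        @SuppressWarnings("unchecked")
        List<NameContentPair> pairs = oldExchange.getIn().getBody(List.class);
        pairs.add(pair);
        return oldExchange;
    }
}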
The aggregation strategy is then added to the route:
from(downloadUri)
    .aggregate(new FileContentWithFileNameInListAggregationStrategy())
    .constant(true)
    .completionFromBatchConsumer()
    .to("direct:" + this.destinationObjectId);
My project has a requirement where the user uploads a CSV file which has to be pushed to a SQL Server database.
I know we can use Spring Batch to process a large number of records, but I'm not able to find any tutorial/sample code for this requirement.
All the tutorials I came across just hardcode the CSV file name and use in-memory databases, like the one below:
https://spring.io/guides/gs/batch-processing/
The user input file lands in a shared drive location at a scheduled time, with a file name prefix like stack_overflow_dd-MM-yyyy HH:mm, on a daily basis. How can I poll the network shared drive every 5-10 minutes, for at least one hour daily, and upload the file to the database if its name matches the regex?
How can I first take the CSV file from the shared location, store it in memory or somewhere else, and then configure Spring Batch to read it as input?
Any help here would be appreciated. Thanks in advance.
All the tutorials which I came across just hardcoded the CSV file name and in-memory databases
You can find samples in the official repo here. Here is an example where the input file name is not hardcoded but passed as a job parameter.
How can I take the csv file first from shared location and store it in memory or somewhere and then configure spring batch to read that as input.
You can proceed in two steps: download the file locally then read/process/write it to the database (See https://stackoverflow.com/a/52110781/5019386).
how can I poll the network shared drive every 5-10 minutes, for at least one hour daily, and upload to the database if the name matches the regex
Once you have defined your job, you can schedule it to run when you want using:
a scheduler like Quartz
or using Spring's task scheduling features.
or using a combination of Spring Integration and Spring Batch: Spring Integration would poll the directory and then launch a Spring Batch job when appropriate. This approach is described here.
More details on job scheduling here.
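For example, with Spring's task scheduling the polling could be sketched roughly as follows (the job bean, cron window, share path, and file pattern are assumptions to adapt; @EnableScheduling is required on a configuration class):
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class SharedDriveCsvPoller {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job csvToSqlServerJob; // hypothetical job bean defined elsewhere

    // Fires every 10 minutes during the 9 o'clock hour, every day (adjust to your window)
    @Scheduled(cron = "0 0/10 9 * * *")
    public void pollSharedDrive() throws Exception {
        Path sharedDir = Paths.get("//fileserver/share/input"); // hypothetical mounted share
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sharedDir, "stack_overflow_*")) {
            for (Path file : files) {
                // The file path becomes a job parameter, so the reader is not hardcoded
                jobLauncher.run(csvToSqlServerJob, new JobParametersBuilder()
                        .addString("input.file", file.toAbsolutePath().toString())
                        .addLong("run.id", System.currentTimeMillis()) // keep each launch unique
                        .toJobParameters());
            }
        }
    }
}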
You can create a service layer that processes the Excel file, reads the data from it, and constructs Java objects to save into the DB. Here I have used Apache POI to parse the Excel data and read from the Excel sheet.
public class FileUploadService {

    @Autowired
    FileUploadDao fileUploadDao;

    public String uploadFileData(String inputFilePath) {
        Workbook workbook = null;
        Sheet sheet = null;
        try {
            workbook = getWorkBook(new File(inputFilePath));
            sheet = workbook.getSheetAt(0);
            /* Build the header portion of the output file */
            String headerDetails = "EmployeeId,EmployeeName,Address,Country";
            String[] headerNames = headerDetails.split(",");
            /* Read and process each row */
            ArrayList<ExcelTemplateVO> employeeList = new ArrayList<>();
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext()) {
                Row row = rowIterator.next();
                // Read and process each column in the row
                ExcelTemplateVO excelTemplateVO = new ExcelTemplateVO();
                int count = 0;
                while (count < headerNames.length) {
                    // The header name doubles as the setter suffix, e.g. "setEmployeeId"
                    String methodName = "set" + headerNames[count];
                    String inputCellValue = getCellValueBasedOnCellType(row, count++);
                    setValueIntoObject(excelTemplateVO, ExcelTemplateVO.class, methodName,
                            "java.lang.String", inputCellValue);
                }
                employeeList.add(excelTemplateVO);
            }
            fileUploadDao.saveFileDataInDB(employeeList);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return "Success";
    }
}
I believe your question has already been answered here.
The author of the question has even uploaded a repository with his working result:
https://github.com/PriyankaBolisetty/SpringBatchUploadCSVFileToDatabase/tree/master/src/main/java/springbatch_example
You can retrieve and filter the list of files on a shared drive using the JCIFS API method SmbFile.listFiles(String wildcard).
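For illustration, a rough listing sketch with the jcifs 1.x API (server name, share path, credentials, and wildcard are hypothetical):
import jcifs.smb.NtlmPasswordAuthentication;
import jcifs.smb.SmbFile;

public class SharedDriveLister {

    public static void main(String[] args) throws Exception {
        // Hypothetical domain credentials and share path
        NtlmPasswordAuthentication auth =
                new NtlmPasswordAuthentication("DOMAIN", "user", "password");
        SmbFile inputDir = new SmbFile("smb://fileserver/share/input/", auth);

        // listFiles(String wildcard) filters server-side; finer matching (e.g. the
        // date/time part of the name) can be done afterwards with a regex
        for (SmbFile file : inputDir.listFiles("stack_overflow_*")) {
            System.out.println(file.getName());
        }
    }
}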
I set up an FTPS server using Apache MINA. By overriding the default ftplet I can detect when a client starts uploading a new file to the server. I want to redirect the transfer to an S3 bucket, instead of having the file written to disk. The ftplet documentation in the MINA project states (https://mina.apache.org/ftpserver-project/ftplet.html) that
We can get the data input stream from request
But I cannot find how to get that stream from the two arguments.
Furthermore, in the FAQ there is a code example where a download is obtained from a database, by overriding the onDownloadStart method (https://mina.apache.org/ftpserver-project/faq.html#how-can-i-send-binary-data-stored-in-a-database-when-the-ftp-server-gets-the-retr-command):
public FtpletEnum onDownloadStart(FtpSession session, FtpRequest request,
FtpReplyOutput response) throws FtpException, IOException {
....
However, I am using the latest MINA version (mina-core 2.0.16, ftplet-api 1.1.1, ftpserver-core 1.1.1) and that method does not include the third argument. Has this changed in the latest versions?
The onDownloadStart example you're referring to seems to be out of date. For starters, the FtpletEnum class used was part of an early version of ftplet-api. Newer versions don't have it anymore. At least I was not able to find it.
Despite that, it's still possible to get the uploaded file from the client. You can ask for a DataConnection from the session when overriding DefaultFtplet's onUploadStart method.
OutputStream outputStream = new ByteArrayOutputStream();
DataConnectionFactory connectionFactory = session.getDataConnection();
try {
    DataConnection dataConnection = connectionFactory.openConnection();
    dataConnection.transferFromClient(session, outputStream);
    // now outputStream contains the uploaded file and you could
    // store it in S3 if you wish
} catch (Exception e) {
    e.printStackTrace();
} finally {
    connectionFactory.closeDataConnection();
}
Keep in mind that you might also have to notify the client with response codes if your onUploadStart method returns SKIP. From the Ftplet docs:
This method will be called before the file upload. The file name can be get from the request argument. We can get the data input stream from request. This will be called before the permission check. This is called during STOR command. If the method returns SKIP, it has to send responses before and after processing. For example, before opening the data input stream, the method has to notify the client with a response code 150. Similarly, after the data transfer, the method has to notify the client with a response code 226. In case of any error, the method should send different response codes like 450, 425, 426, 551.
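Putting the quoted requirements together, an onUploadStart override could look roughly like this (a sketch against the 1.1.x ftplet API; the class name and the S3 step are placeholders):
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.ftpserver.ftplet.DataConnection;
import org.apache.ftpserver.ftplet.DataConnectionFactory;
import org.apache.ftpserver.ftplet.DefaultFtpReply;
import org.apache.ftpserver.ftplet.DefaultFtplet;
import org.apache.ftpserver.ftplet.FtpException;
import org.apache.ftpserver.ftplet.FtpReply;
import org.apache.ftpserver.ftplet.FtpRequest;
import org.apache.ftpserver.ftplet.FtpSession;
import org.apache.ftpserver.ftplet.FtpletResult;

public class S3UploadFtplet extends DefaultFtplet {

    @Override
    public FtpletResult onUploadStart(FtpSession session, FtpRequest request)
            throws FtpException, IOException {
        String fileName = request.getArgument(); // name given in the STOR command

        // Tell the client we are about to open the data connection
        session.write(new DefaultFtpReply(FtpReply.REPLY_150_FILE_STATUS_OKAY,
                "Opening data connection for " + fileName));

        DataConnectionFactory connectionFactory = session.getDataConnection();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        try {
            DataConnection dataConnection = connectionFactory.openConnection();
            dataConnection.transferFromClient(session, outputStream);
            // outputStream.toByteArray() now holds the uploaded bytes; push them to S3 here
            session.write(new DefaultFtpReply(FtpReply.REPLY_226_CLOSING_DATA_CONNECTION,
                    "Transfer complete for " + fileName));
        } catch (Exception e) {
            session.write(new DefaultFtpReply(FtpReply.REPLY_425_CANT_OPEN_DATA_CONNECTION,
                    "Transfer failed for " + fileName));
        } finally {
            connectionFactory.closeDataConnection();
        }

        // SKIP tells the server we handled the data transfer and the replies ourselves
        return FtpletResult.SKIP;
    }
}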
I'd like to utilize Spring Integration to initiate messages about files that appear in a remote location, without actually transferring them. All I require is the generation of a Message with, say, header values indicating the path to the file and filename.
What's the best way to accomplish this? I've tried stringing together an FTP inbound channel adapter with a service activator to write the header values I need, but this causes the file to be transferred to a local temp directory, and by the time the service activator sees it, the message consists of a java.io.File that refers to the local file and the remote path info is gone. Is it possible to transform the message prior to this local transfer occurring?
We had a similar problem and solved it with filters. On the inbound-channel-adapter you can set a custom filter implementation. The filter is called during each poll before anything is downloaded, so you have all the information about the remote files and can decide whether each file should be downloaded or not. For example:
<int-sftp:inbound-channel-adapter id="test"
session-factory="sftpSessionFactory"
channel="testChannel"
remote-directory="${sftp.remote.dir}"
local-directory="${sftp.local.dir}"
filter="customFilter"
delete-remote-files="false">
<int:poller trigger="pollingTrigger" max-messages-per-poll="${sftp.max.msg}"/>
</int-sftp:inbound-channel-adapter>
<beans:bean id="customFilter" class="your.class.location.SftpRemoteFilter"/>
The filter class is just an implementation of the FileListFilter interface. Here is a dummy filter implementation:
public class SftpRemoteFilter implements FileListFilter<LsEntry> {

    private static final Logger log = LoggerFactory.getLogger(SftpRemoteFilter.class);

    @Override
    public final List<LsEntry> filterFiles(LsEntry[] files) {
        log.info("Here are the files.");
        // Do something smart
        return Collections.emptyList();
    }
}
But if you want to do it as you described, I think it is possible by setting headers on the payloads and then using those same headers when consuming the payload; in that case you should use Message<File> instead of File in your service activator method, for example:
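A minimal sketch of such a service activator (channel name taken from the XML above; whether the remote-file headers are populated depends on your Spring Integration version and adapter configuration):
import java.io.File;

import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.file.FileHeaders;
import org.springframework.messaging.Message;
import org.springframework.stereotype.Component;

@Component
public class RemoteFileInfoHandler {

    @ServiceActivator(inputChannel = "testChannel")
    public void handle(Message<File> message) {
        // Standard file headers that the (S)FTP inbound adapters may populate
        Object remoteDir = message.getHeaders().get(FileHeaders.REMOTE_DIRECTORY);
        Object remoteFile = message.getHeaders().get(FileHeaders.REMOTE_FILE);
        File localFile = message.getPayload();
        // React to the file's metadata here without reading the local copy
        System.out.println("Remote file " + remoteFile + " in " + remoteDir
                + " synced to " + localFile);
    }
}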
I am new to Hadoop. Basically I am writing a program which takes two multi-FASTA files (ref.fasta, query.fasta) which are 3+ GB...
ref.fasta:
>gi|12345
ATATTATAGGACACCAATAAAATT..
>gi|5253623
AATTATCGCAGCATTA...
..and so on..
query.fasta:
>query
ATTATTTAAATCTCACACCACATAATCAATACA
AATCCCCACCACAGCACACGTGATATATATACA
CAGACACA...
Now to each mapper I need to give a single part of the ref file and the whole query file,
i.e.
>gi|12345
ATATTATAGGACACCAATA....
(a single FASTA sequence from the ref file)
AND the entire query file, because I want to run an exe inside the mapper which takes both as input.
So do I process ref.fasta outside and then give it to the mapper? Or something else?
I just need the approach that will take minimum time.
Thanks.
The best approach for your use case may be to put the query file in the distributed cache and get the file object ready in configure()/setup() so it can be used in map(), and to have the ref file as the normal input.
You may do the following:
In your run() add the query file to the distributed cache:
DistributedCache.addCacheFile(new URI(queryFile-HDFS-Or-S3-Path), conf);
Now the mapper class could look something like the following:
public static class MapJob extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

    private File queryFile;

    @Override
    public void configure(JobConf job) {
        try {
            // The first (and only) cached file is the query file added in run()
            Path queryFilePath = DistributedCache.getLocalCacheFiles(job)[0];
            queryFile = new File(queryFilePath.toString());
        } catch (IOException e) {
            throw new RuntimeException("Could not load query file from distributed cache", e);
        }
    }

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Use the queryFile object and [key, value] from your ref file here to run the exe file as desired
    }
}
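Putting the two pieces together, the driver's run() might look roughly like this with the old mapred API (class name, argument order, and output types are assumptions; MapJob is the mapper shown above and is assumed to be visible to this driver):
import java.net.URI;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class FastaCompareDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // args: <ref input dir> <query file on HDFS> <output dir>
        JobConf conf = new JobConf(getConf(), FastaCompareDriver.class);
        conf.setJobName("fasta-compare");

        // ref.fasta (split across mappers) is the normal job input
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[2]));
        conf.setMapperClass(MapJob.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        // query.fasta is shipped to every task node via the distributed cache
        DistributedCache.addCacheFile(new URI(args[1]), conf);

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new FastaCompareDriver(), args));
    }
}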
I faced a similar problem.
I'd suggest you pre-process your ref file and split it into multiple files (one per sequence).
Then copy those files to a folder on the hdfs that you will set as your input path in your main method.
Then implement a custom input format class and a custom record reader class. Your record reader will just pass the local file split path (as a Text value) to either the key or the value parameter of your map method, along the lines of the sketch below.
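For illustration, such an input format might be sketched with the old mapred API roughly as follows (class names are hypothetical; the record reader emits the split's path once as the value):
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Emits exactly one record per file: the file's path as the value
public class WholeFilePathInputFormat extends FileInputFormat<NullWritable, Text> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path filename) {
        return false; // one sequence file per mapper, never split
    }

    @Override
    public RecordReader<NullWritable, Text> getRecordReader(InputSplit split, JobConf job,
            Reporter reporter) throws IOException {
        return new PathRecordReader((FileSplit) split);
    }

    private static class PathRecordReader implements RecordReader<NullWritable, Text> {
        private final FileSplit split;
        private boolean done = false;

        PathRecordReader(FileSplit split) {
            this.split = split;
        }

        @Override
        public boolean next(NullWritable key, Text value) {
            if (done) {
                return false;
            }
            value.set(split.getPath().toString()); // hand map() the file path only
            done = true;
            return true;
        }

        @Override
        public NullWritable createKey() { return NullWritable.get(); }

        @Override
        public Text createValue() { return new Text(); }

        @Override
        public long getPos() { return done ? split.getLength() : 0; }

        @Override
        public float getProgress() { return done ? 1.0f : 0.0f; }

        @Override
        public void close() { }
    }
}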
For the query file that is required by all map functions, again add your query file to HDFS and then add it to the DistributedCache in your main method.
In your map method you'll then have access to both local file paths and can pass them to your exe.
Hope that helps.
I had a similar problem and eventually re-implemented the functionality of the blast exe file so that I didn't need to deal with reading files in my map method and could instead deal entirely with Java objects (Genes and Genomes) that are parsed from the input files by my custom record reader and then passed as objects to my map function.
Cheers, Wayne.