Load a large CSV and save only part of it (CSV format) with Spring Boot / Spring Batch - spring-boot

I would like to know if there is a way to load large data in CSV format and then save only a part of it (e.g. lines 50-100) back in CSV format.

CSV is in reality just text; you can turn an image into text bytes, but the result will not be usable as CSV.

If your data is too large to load everything at once, you can read the CSV file line by line and keep only the lines you want.
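For the "read line by line and keep a slice" approach, here is a minimal sketch in plain Java (the file names and line range are hypothetical); Spring Batch's FlatFileItemReader/FlatFileItemWriter would also work, but a lazy line stream is enough for a simple slice:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CsvSlice {
    public static void main(String[] args) throws IOException {
        Path in = Path.of("input.csv");   // hypothetical input path
        Path out = Path.of("output.csv"); // hypothetical output path
        try (Stream<String> lines = Files.lines(in)) {
            // Files.lines streams lazily, so the whole file is never in memory.
            // Skip the first 49 lines, then keep lines 50-100 (51 lines).
            Files.write(out, lines.skip(49).limit(51).collect(Collectors.toList()));
        }
    }
}

If the slice also needs a header row, write the first line of the input to the output before skipping.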

Related

Read Excel file with CSV Data Set Config Element

I'm building a project in JMeter and I would like to read an Excel file with the CSV Data Set Config element, to avoid using Groovy to read it.
Do you know if it is possible? If not, can any other JMeter element help me read an Excel file row by row?
Many thanks in advance,
Best regards,
CSV Data Set Config is only able to read text files, as per Wikipedia:
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values.
If your data file is a binary file like .xls or .xlsx, unfortunately CSV Data Set Config won't help; you have the following options:
1. Export the .xls or .xlsx file to CSV using MS Excel or an equivalent tool.
2. If for any reason you cannot use point 1, you can go for the Apache POI libraries to read the Excel file formats in JSR223 Test Elements, like it's described in the How to Implement Data Driven Testing in your JMeter Test article (a minimal sketch follows this list).
3. You can also see Busy Developers' Guide to HSSF and XSSF Features for code snippets covering popular user scenarios for reading/writing data from/to Excel files.
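As a rough illustration of option 2, a minimal standalone sketch using Apache POI (data.xlsx is a placeholder path; in JMeter the same logic would sit inside a JSR223 element):

import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelRowReader {
    public static void main(String[] args) throws Exception {
        // data.xlsx is a placeholder path
        try (Workbook wb = new XSSFWorkbook(new FileInputStream("data.xlsx"))) {
            Sheet sheet = wb.getSheetAt(0);          // first sheet
            DataFormatter fmt = new DataFormatter(); // renders cells as displayed text
            for (Row row : sheet) {                  // iterate rows one by one
                StringBuilder line = new StringBuilder();
                for (Cell cell : row) {
                    if (line.length() > 0) line.append(',');
                    line.append(fmt.formatCellValue(cell));
                }
                System.out.println(line);
            }
        }
    }
}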

Create parquet file with qualified name in databricks

I have to process some raw data files in CSV with cleansing transformations and load them as .parquet files in the cleanse layer. The raw layer file (CSV) and the cleanse layer file should have the same name.
But I cannot save the .parquet file with the given name: a directory is created instead, and underneath it the .parquet part files are saved with random names. Please help me accomplish this.
This is how parquet files are designed to be: a collection of multiple row groups.
The name of your parquet output is the folder the chunks of data are saved under.
If you want a single file, you will have to use a different file format, and you will likely lose the parallelization capabilities parquet offers for both read and write.
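That said, a commonly used workaround in Spark/Databricks, not part of the original answer, is to coalesce the data down to one partition so only a single part file is written, and then move/rename that file. A sketch with the Spark Java API, with placeholder paths:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SingleFileParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();
        // The input and output paths below are placeholders.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .csv("/mnt/raw/my_file.csv");
        // coalesce(1) forces all data into one partition, so exactly one
        // part-*.parquet file is written inside the output directory.
        df.coalesce(1)
          .write()
          .mode("overwrite")
          .parquet("/mnt/cleanse/my_file_tmp");
        // The single part file can then be moved/renamed to
        // /mnt/cleanse/my_file.parquet (e.g. with dbutils.fs in a
        // Databricks notebook or the Hadoop FileSystem API).
    }
}

Note that coalesce(1) gives up write parallelism, which is exactly the trade-off described above.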

read csv file data and store it in database using spring framework

I need help: I want code that reads the data in a CSV file and then stores that data into a database. I have tried reading a CSV file with a known number of rows and columns. The challenge is that I want to create a utility where I don't know the number of columns and rows in the CSV file, so how would I do it? Please help.
Have you explored Spring Batch? You can write your own implementation of LineTokenizer for columns that change dynamically.
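A minimal sketch of such a tokenizer, assuming comma-separated input (the class name is hypothetical):

import org.springframework.batch.item.file.transform.DefaultFieldSet;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.file.transform.LineTokenizer;

// Tokenizes however many columns each line happens to have, so the
// column count does not need to be known up front.
public class DynamicColumnTokenizer implements LineTokenizer {
    @Override
    public FieldSet tokenize(String line) {
        String[] values = (line == null) ? new String[0] : line.split(",", -1);
        return new DefaultFieldSet(values); // unnamed fields, read by index
    }
}

You would plug this into a FlatFileItemReader via a DefaultLineMapper, with a FieldSetMapper that walks the fields by index using fieldSet.getFieldCount().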

Adding image to hdfs in hadoop

My data is in the format of a CSV file (sam,1,34,there,hello). I want to add an image to each row in the CSV file using Hadoop. Does anybody have any idea how to do it? I have seen HIPI, which processes image files and adds them as well, but I want to add the image as a column in the CSV file.
If you have to use a CSV file, consider using Base64 encoding over the binary image data - it will give you a printable string. But in general I would recommend switching to a SequenceFile, where you can store the image directly in binary format.
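A minimal sketch of the Base64 approach in plain Java (photo.jpg and the sample row are placeholders):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class ImageToCsvColumn {
    public static void main(String[] args) throws Exception {
        // photo.jpg is a placeholder; any binary image file works.
        byte[] imageBytes = Files.readAllBytes(Path.of("photo.jpg"));
        // Base64 produces a printable string with no commas, so it is
        // safe to embed as a CSV column.
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        // Append the encoded image as one more column on an existing row.
        String row = "sam,1,34,there,hello," + encoded;
        System.out.println(row);
    }
}

On the read side, Base64.getDecoder().decode(encoded) recovers the original image bytes.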

Kettle: load CSV file which contains multiple data tables

I'm trying to import data from a CSV file which, unfortunately, contains multiple data tables. Actually, it's not really a pure CSV file.
It contains a header section with some metadata, and then the actual CSV data parts, separated by:
//-------------
Table <table_nr>;;;;
An example file looks as follows:
Summary;;
Reporting Date;29/05/2013;12:36:18
Report Name;xyz
Reporting Period From;20/05/2013;00:00:00
Reporting Period To;26/05/2013;23:59:59
//-------------
Table 1;;;;
header1;header2;header3;header4;header5
string_aw;0;0;0;0
string_ax;1;1;1;0
string_ay;1;2;0;1
string_az;0;0;0;0
TOTAL;2;3;1;1
//-------------
Table 2;;;
header1;header2;header3;header4
string_bv;2;2;2
string_bw;3;2;3
string_bx;1;1;1
string_by;1;1;1
string_bz;0;0;0
What would be the best way to load such data using Kettle?
Is there a way to split this file into the header and CSV data parts and then process each of them as separate inputs?
Thanks in advance for any hints and tips.
Best,
Haes.
I don't think there are any steps that will really help you with data in such a format. You probably need to do some preprocessing before bringing your data into a CSV step. You could still do this in your job, though, by calling out to the shell and executing a command there first, like maybe an awk script (or the splitter sketched below) to split up the file into its component files, and then load those files via the normal Kettle pattern.
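As an equivalent preprocessing sketch in Java (report.csv and the part file names are placeholders), splitting on the //------------- separator shown above:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Splits the combined report on the "//-------------" separator lines:
// part0.csv gets the metadata header, part1.csv gets Table 1, and so on.
public class ReportSplitter {
    public static void main(String[] args) throws IOException {
        Path input = Path.of("report.csv"); // placeholder path
        int part = 0;
        BufferedWriter out = Files.newBufferedWriter(Path.of("part" + part + ".csv"));
        try (BufferedReader in = Files.newBufferedReader(input)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("//-------------")) {
                    out.close(); // finish the current part file
                    part++;
                    out = Files.newBufferedWriter(Path.of("part" + part + ".csv"));
                } else {
                    out.write(line);
                    out.newLine();
                }
            }
        } finally {
            out.close();
        }
    }
}

Each partN.csv can then be fed to a normal CSV file input step; the "Table <table_nr>" line at the top of each part can be skipped or used to name the output.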
