Serilog: writing to an undetermined number of log files at runtime

I have a bunch of archives, each containing thousands of files to be processed. I want to write one log file as a summary, one containing errors (if any), and finally, for each archive, a separate log file with all processed entries.
The first two can be configured with WriteTo and filters, but I don't know up front how many archives there are. Is it possible to start new log files at runtime, in a loop?
Thanks

Yes, Serilog loggers can be created and destroyed on-the-fly:
foreach (var archive in archives)
{
    var filename = "archive-" + archive.Name + ".txt";
    using (var log = new LoggerConfiguration()
        .WriteTo.File(filename)
        .CreateLogger())
    {
        log.Information("Processing archive");
        // etc.
    }
}

Related

ForEach Controller is not getting triggered in JMeter when using a list from a JSR223 PreProcessor

I need to read a folder and store the file names in a list. Then I have a ForEach Controller over which the filename list should iterate, i.e. I have a sampler inside the ForEach Controller and I have to pass a different file path to that HTTP request in the Files Upload section.
I have successfully read the files from the folder, stored them in a list, and passed that list as input to the ForEach Controller.
However, my ForEach Controller is not even getting hit (it is not listed in the View Results Tree) and I can't see any information about it in my console.
Reading files from the folder and storing them in a list:
import java.io.File;

File file = new File("D:\\testdata\\trunk\\version2\\Jmeter\\loadscript\\Files\\txt\\test")
String[] arr = file.list()
def varList = []
for (String e : arr) {
    varList.add(e)
}
vars.put("filelist", varList)
ForEach Controller:
When I use a Debug PostProcessor, the values inside the list are displayed correctly.
The ForEach Controller picks up JMeter Variables in the form:
filelist_1
filelist_2
filelist_3
etc.
So you need to slightly amend your code to look like:
File file = new File("D:\\testdata\\trunk\\version2\\Jmeter\\loadscript\\Files\\txt\\test")
String[] arr = file.list()
arr.eachWithIndex { f, index ->
    vars.put("filelist_" + (index + 1), f)
}
Also, you may find the Directory Listing Config plugin easier to use; it can be installed via the JMeter Plugins Manager.

Transactions for file operations in Laravel

In Laravel I can do database transactions by passing a closure to the DB::transaction function.
Does Laravel have any support for a similar feature for the File or Storage facades, where the file operations are run in a transaction, with a rollback in case a file operation fails?
I'm imagining something like
$files = ... // Something that returns a collection

File::transaction(function () use ($files) {
    $files->each(function () {
        File::move(....);
    });
});
There is no built-in way of doing it, so you'd have to implement it yourself.
A simple way of achieving it would be:
$fileName = ""; // or $fileNames (an array) if there are multiple file uploads
$files = "";    // used if you're going to update or delete files; again, use an array for multiple file modifications

try {
    /* Just a note: your new file could overwrite an existing file,
       so before uploading, check whether another file exists with the same filename.
       If it does, load that file and keep it in the $files variable.
    */

    // Upload the file
    $fileName = // name of the uploaded file
    $files = // any existing file you're going to modify; load the entire file data, not just the name

    // Modify/delete the file
} catch (\Exception $e) {
    // Now delete the file using $fileName (or $fileNames) if the variable is not empty.
    // If you modified/deleted any file, undo those modifications using the data in $files (if it's not empty).
}
In this method, existing files are loaded into memory, but if there are multiple large files it might be better to move them to a temporary location instead, and move them back if an exception is thrown. Just don't forget to delete these temporary files if the file transaction is a success.

Spring ResourceArrayPropertyEditor: How to filter only files?

I am using Spring's ResourceArrayPropertyEditor to find all resources matching certain patterns (to keep this example simple, let's say I am looking for .foo files).
What I do:
ResourceArrayPropertyEditor resolver = new ResourceArrayPropertyEditor();
String[] resourcePattern = new String[]{"classpath*:**/*.foo"};
resolver.setValue(resourcePattern);
Resource[] resources = (Resource[]) resolver.getValue();
Problem: this not only finds all "*.foo" files on my classpath, but also all package folders that end with "foo", for example "org.mydomain.database.foo".
I do not need these entries and will even get errors when trying to process them.
How can I filter the resources so that they only contain files (like find . -type f)?
The documentation says that ResourceArrayPropertyEditor uses PathMatchingResourcePatternResolver by default to resolve the individual resources. Judging from the source code of PathMatchingResourcePatternResolver, it selects all resources matching the specified pattern and does not check whether each one is a directory or a file.
The only option is to check the isReadable() property of each Resource after you get the list of resources.
Resource[] resources = (Resource[]) resolver.getValue();
for (Resource resource : resources) {
    if (resource.isReadable()) {
        // will work only for files
    }
}
Or, if you use Java 8 streams:
Resource[] resources = (Resource[]) resolver.getValue();
Resource[] fileResources = Arrays.stream(resources).filter(Resource::isReadable).toArray(Resource[]::new);
This approach is preferable to resource.getFile().isDirectory(), because there is no need to handle an IOException.

Parsing several csv files using Spring Batch

I need to parse several CSV files from a given folder. As each CSV has different columns, there is a separate table in the DB for each CSV. I need to know:
Does Spring Batch provide any mechanism that scans through the given folder, so that I can pass those files one by one to the reader?
As I am trying to make the reader/writer generic, is it possible to just get the column header for each CSV? Based on that, I am trying to build the tokenizer and also the insert query.
Code sample:
public ItemReader<Gdp> reader1() {
    FlatFileItemReader<Gdp> reader1 = new FlatFileItemReader<Gdp>();
    reader1.setResource(new ClassPathResource("datagdp.csv"));
    reader1.setLinesToSkip(1);
    reader1.setLineMapper(new DefaultLineMapper<Gdp>() {
        {
            setLineTokenizer(new DelimitedLineTokenizer() {
                {
                    setNames(new String[] { "region", "gdpExpend", "value" });
                }
            });
            setFieldSetMapper(new BeanWrapperFieldSetMapper<Gdp>() {
                {
                    setTargetType(Gdp.class);
                }
            });
        }
    });
    return reader1;
}
Use a MultiResourceItemReader to scan all files.
I think you need a sort of classifier-aware ItemReader as the MultiResourceItemReader.delegate, but Spring Batch doesn't offer one, so you have to write your own.
For the ItemProcessor and ItemWriter, Spring Batch offers classifier-aware implementations (ClassifierCompositeItemProcessor and ClassifierCompositeItemWriter).
Obviously, the more different input files you have, the more XML configuration you must write, but it should be straightforward to do.
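As a rough sketch of the MultiResourceItemReader approach (assuming the CSVs sit in an input/ folder on the file system, and that the FlatFileItemReader from the question is passed in as the delegate with its setResource() call removed, since MultiResourceItemReader supplies the resource for each file):
import java.io.IOException;

import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;

public MultiResourceItemReader<Gdp> multiReader(FlatFileItemReader<Gdp> delegate) throws IOException {
    // Scan the folder for every CSV file (folder and pattern are only examples)
    Resource[] resources = new PathMatchingResourcePatternResolver()
            .getResources("file:input/*.csv");

    MultiResourceItemReader<Gdp> multiReader = new MultiResourceItemReader<>();
    multiReader.setResources(resources);
    // The delegate parses each resource in turn; MultiResourceItemReader sets the resource on it
    multiReader.setDelegate(delegate);
    return multiReader;
}
With that in place, ClassifierCompositeItemProcessor/ClassifierCompositeItemWriter can route each parsed item to the writer for the matching table.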
I suppose you are expecting this kind of implementation.
In the partitioner, read all the file names, the file header of each, and the insert query for the writer, and save them in the Execution Context.
In the slave step, for every reader and writer, pass on the Execution Context: take the file to read, give the file header to the tokenizer, and give the insert query to that writer.
This resolves your question.
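A rough sketch of such a partitioner (the input folder and the context keys "filePath" and "header" are assumptions for illustration):
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class CsvFilePartitioner implements Partitioner {

    private final File inputDir = new File("input"); // example folder

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        File[] csvFiles = inputDir.listFiles((dir, name) -> name.endsWith(".csv"));
        if (csvFiles == null) {
            return partitions;
        }
        for (File csv : csvFiles) {
            ExecutionContext context = new ExecutionContext();
            context.putString("filePath", csv.getAbsolutePath());
            context.putString("header", readHeader(csv)); // first line, e.g. "region,gdpExpend,value"
            partitions.put("partition-" + csv.getName(), context);
        }
        return partitions;
    }

    // Reads only the header row so the slave step can build its tokenizer and insert query from it
    private String readHeader(File csv) {
        try (BufferedReader reader = new BufferedReader(new FileReader(csv))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
Each slave reader/writer can then be step-scoped and pull filePath and header from #{stepExecutionContext[...]} to set its resource, its tokenizer column names, and its insert SQL.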
Answers to your questions:
I don't know of a specific mechanism in Spring Batch for scanning files.
You can use opencsv as a generic CSV reader; it offers a lot of mechanisms for reading files.
About OpenCSV:
If you are using a Maven project, try importing this dependency:
<dependency>
    <groupId>net.sf.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>2.0</version>
</dependency>
You can read your files by mapping rows to an object for a specific format, like the example below, or work with generic headers:
private static List<PeopleData> extrairDadosPeople() throws IOException {
    // 'people' is the CSV file (or path) defined elsewhere in the class
    CSVReader readerPeople = new CSVReader(new FileReader(people));
    List<PeopleData> listPeople = new ArrayList<PeopleData>();
    String[] nextLine;
    while ((nextLine = readerPeople.readNext()) != null) {
        PeopleData personData = new PeopleData();
        personData.setIncludeData(nextLine[0]);
        personData.setPartnerCode(Long.valueOf(nextLine[1]));
        listPeople.add(personData);
    }
    readerPeople.close();
    return listPeople;
}
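To cover the generic-header case, here is a minimal sketch (the method name and the map-based row representation are just illustrative; the import matches the net.sf.opencsv 2.0 dependency above, whose package is au.com.bytecode.opencsv, while newer releases use com.opencsv): read the first row as the header and key every following row by those column names.
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import au.com.bytecode.opencsv.CSVReader;

// Reads any CSV into a list of column-name -> value maps, using the first row as the header
public static List<Map<String, String>> readGenericCsv(String path) throws IOException {
    CSVReader reader = new CSVReader(new FileReader(path));
    String[] header = reader.readNext(); // e.g. "region", "gdpExpend", "value"
    List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
    String[] nextLine;
    while ((nextLine = reader.readNext()) != null) {
        Map<String, String> row = new LinkedHashMap<String, String>();
        for (int i = 0; i < header.length && i < nextLine.length; i++) {
            row.put(header[i], nextLine[i]);
        }
        rows.add(row);
    }
    reader.close();
    return rows;
}
The header array also gives you the column names you need to build the DelimitedLineTokenizer and the insert query mentioned in the question.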
There are a lot of other ways to read CSV files using opencsv:
If you want to use an Iterator style pattern, you might do something like this:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
Or, if you just want to slurp the whole lot into a List, just call readAll()...
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
List myEntries = reader.readAll();
which will give you a List of String[] that you can iterate over. If all else fails, check out the opencsv Javadocs.
If you want to customize quote characters and separators, you'll find constructors that cater for supplying your own separator and quote characters. Say you're using a tab for your separator, you can do something like this:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t');
And if you single quoted your escaped characters rather than double quote them, you can use the three arg constructor:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t', '\'');
You may also skip the first few lines of the file if you know that the content doesn't start till later in the file. So, for example, you can skip the first two lines by doing:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t', '\'', 2);
Can I write csv files with opencsv?
Yes. There is a CSVWriter in the same package that follows the same semantics as the CSVReader. For example, to write a tab separated file:
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"), '\t');
// feed in your array (or convert your data to an array)
String[] entries = "first#second#third".split("#");
writer.writeNext(entries);
writer.close();
If you'd prefer to use your own quote characters, you may use the three arg version of the constructor, which takes a quote character (or feel free to pass in CSVWriter.NO_QUOTE_CHARACTER).
You can also customise the line terminators used in the generated file (which is handy when you're exporting from your Linux web application to Windows clients). There is a constructor argument for this purpose.
Can I dump out SQL tables to CSV?
Yes, you can. There is a feature on CSVWriter that lets you pass a ResultSet to writeAll().
java.sql.ResultSet myResultSet = ....
writer.writeAll(myResultSet, includeHeaders);
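For example, a minimal end-to-end sketch (the JDBC URL, query, and output file name are placeholders):
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import au.com.bytecode.opencsv.CSVWriter;

public static void dumpTableToCsv() throws Exception {
    Connection conn = DriverManager.getConnection("jdbc:h2:mem:test"); // placeholder JDBC URL
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery("SELECT * FROM my_table");        // placeholder query

    CSVWriter writer = new CSVWriter(new FileWriter("my_table.csv"));
    writer.writeAll(rs, true); // true = include the column headers as the first row
    writer.close();

    rs.close();
    stmt.close();
    conn.close();
}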
Is there a way to bind my CSV file to a list of Javabeans?
Yes there is. There is a set of classes to allow you to bind a CSV file to a list of JavaBeans based on column name, column position, or a custom mapping strategy. You can find the new classes in the com.opencsv.bean package. Here's how you can map to a java bean based on the field positions in your CSV file:
ColumnPositionMappingStrategy strat = new ColumnPositionMappingStrategy();
strat.setType(YourOrderBean.class);
String[] columns = new String[] {"name", "orderNumber", "id"}; // the fields to bind to in your JavaBean
strat.setColumnMapping(columns);
CsvToBean csv = new CsvToBean();
List list = csv.parse(strat, yourReader);

Apache Spark on YARN: Large number of input data files (combine multiple input files in spark)

I need help with implementation best practices.
The operating environment is as follows:
Log data files arrive irregularly.
The size of a log data file ranges from 3.9 KB to 8.5 MB. The average is about 1 MB.
The number of records in a data file ranges from 13 lines to 22,000 lines. The average is about 2,700 lines.
Data files must be post-processed before aggregation.
The post-processing algorithm can change.
Post-processed files are managed separately from the original data files, since the post-processing algorithm might change.
Daily aggregation is performed. All post-processed data files must be filtered record by record, and aggregations (average, max, min, ...) are calculated.
Since aggregation is fine-grained, the number of records after aggregation is not that small. It can be about half of the number of original records.
At some point, the number of post-processed files can reach about 200,000.
A data file should be deletable individually.
In a test, I tried to process 160,000 post-processed files with Spark, starting with sc.textFile() and a glob path; it failed with an OutOfMemory exception in the driver process.
What is the best practice to handle this kind of data?
Should I use HBase instead of plain files to save post-processed data?
I wrote my own loader. It solved our problem with small files in HDFS. It uses Hadoop's CombineFileInputFormat.
In our case it reduced the number of mappers from 100,000 to approximately 3,000 and made the job significantly faster.
https://github.com/RetailRocket/SparkMultiTool
Example:
import ru.retailrocket.spark.multitool.Loaders
val sessions = Loaders.combineTextFile(sc, "file:///test/*")
// or val sessions = Loaders.combineTextFile(sc, conf.weblogs(), size = 256, delim = "\n")
// where size is split size in Megabytes, delim - line break character
println(sessions.count())
I'm pretty sure the reason you're getting OOM is the handling of so many small files. What you want is to combine the input files so you don't get so many partitions. I try to limit my jobs to about 10k partitions.
After textFile, you can use .coalesce(10000, false)... I'm not 100% sure that will work though, because it's been a while since I've done it; please let me know. So try:
sc.textFile(path).coalesce(10000, false)
You can use this approach.
First, get a Buffer/List of S3 paths (the same works for HDFS or local paths).
If you're working with Amazon S3, then:
import scala.collection.JavaConverters._
import java.util.ArrayList
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.ObjectListing
import com.amazonaws.services.s3.model.S3ObjectSummary
import com.amazonaws.services.s3.model.ListObjectsRequest

def listFiles(s3_bucket: String, base_prefix: String) = {
  var files = new ArrayList[String]

  // S3 client and ListObjects request
  var s3Client = new AmazonS3Client();
  var objectListing: ObjectListing = null;
  var listObjectsRequest = new ListObjectsRequest();

  // Your S3 bucket
  listObjectsRequest.setBucketName(s3_bucket)
  // Your folder path or prefix
  listObjectsRequest.setPrefix(base_prefix)

  // Adding s3:// to the paths and adding them to a list
  do {
    objectListing = s3Client.listObjects(listObjectsRequest);
    for (objectSummary <- objectListing.getObjectSummaries().asScala) {
      files.add("s3://" + s3_bucket + "/" + objectSummary.getKey());
    }
    listObjectsRequest.setMarker(objectListing.getNextMarker());
  } while (objectListing.isTruncated());

  // Removing the base directory name
  files.remove(0)

  // Creating a Scala collection from the same
  files.asScala
}
Now pass this list to the following piece of code. Note: sc is the SparkContext, and sc.textFile returns an RDD[String], not a DataFrame:
import org.apache.spark.rdd.RDD

var df: RDD[String] = null
for (file <- files) {
  val fileRdd = sc.textFile(file)
  if (df != null) {
    df = df.union(fileRdd)
  } else {
    df = fileRdd
  }
}
Now you have a final unified RDD, i.e. df.
Optionally, you can also repartition it into a single big RDD:
val files = sc.textFile(filename, 1).repartition(1)
Repartitioning always works :D
