Spring Batch write to different entities

I have processed a "wrapperObject" (AimResponse in this case).
Depending on the property "type", I map it to a Document or a SourceSpace object.
Then I need to persist these entities. I found an example similar to this one:
@Override
public void write(List<? extends List<AimResponse>> list) throws Exception {
    List<SourceSpace> sourceSpaces = new ArrayList<>();
    List<Document> documents = new ArrayList<>();
    for (List<AimResponse> item : list) {
        for (AimResponse i : item) {
            if (i.getType().indexOf("folder") >= 0) {
                SourceSpace sourceSpace = Mapper.aimResponseToSourceSpace(i);
                sourceSpace.setStatus(Status.FOUND.name());
                sourceSpaces.add(sourceSpace);
            } else if (i.getType().indexOf("document") >= 0) {
                Document document = Mapper.aimResponseToDocument(i);
                document.setStatus(Status.FOUND.name());
                documents.add(document);
            }
        }
    }
    if (!CollectionUtils.isEmpty(sourceSpaces)) {
        sourceSpaceWriter.write(sourceSpaces);
    }
    if (!CollectionUtils.isEmpty(documents)) {
        documentWriter.write(documents);
    }
}
In this example I'm not able to instantiate the JdbcBatchItemWriter. In any case, I think it would be better if the processor could split the items into two different lists and call two different writers, each with its own type, but I guess that's not possible.
Any help is appreciated.

ClassifierCompositeItemWriter is what you are looking for. It allows you to classify items according to a given criteria and call the corresponding writer.
In your case, you can classify items based on their type (i.getType()) and use a writer for each type. You can find an example of how to use that writer here.
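For illustration, here is a minimal sketch (not from the original post) that assumes the processor already returns the mapped entity (a SourceSpace or a Document) and that sourceSpaceWriter and documentWriter are the existing delegate writers, e.g. JdbcBatchItemWriter beans:
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ClassifierCompositeItemWriter;

public ClassifierCompositeItemWriter<Object> classifierWriter(ItemWriter<Object> sourceSpaceWriter,
                                                              ItemWriter<Object> documentWriter) {
    ClassifierCompositeItemWriter<Object> writer = new ClassifierCompositeItemWriter<>();
    // Route each item to the delegate writer matching its concrete type
    writer.setClassifier(item -> item instanceof SourceSpace ? sourceSpaceWriter : documentWriter);
    return writer;
}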

Related

Hibernate queries getting slower and slower

I'm working on a process that checks and updates data from an Oracle database. I'm using Hibernate and the Spring framework in my application.
The application reads a CSV file, processes the content, then persists the entities:
public class Main {
    Input input = ReadCSV(path);
    EntityList resultList = Process.process(input);
    WriteResult.write(resultList);
    ...
}
// Process class that loops over the input
public class Process {
    public EntityList process(Input input) {
        EntityList results = ...;
        ...
        for (Line line : input.readLine()) {
            results.add(ProcessLine.process(line));
            ...
        }
        return results;
    }
}
// retrieving and updating entities
class ProcessLine {
    @Autowired
    DomaineRepository domaineRepository;
    @Autowired
    CompanyDomaineService companydomaineService;

    @Transactional
    public MyEntity process(Line line) {
        // getCompanyByXX is a CrudRepository method with @Query that returns an entity object
        MyEntity companyToAttach = domaineRepository.getCompanyByCode(line.getCode());
        MyEntity companyToDetach = domaineRepository.getCompanyBySiret(line.getSiret());
        if (companyToDetach == null || companyToAttach == null) {
            throw new CustomException("Custom Exception");
        }
        // attachCompany retrieves some relationEntity, then removes companyToDetach and adds companyToAttach; this updates the relationEntity.company attribute.
        companydomaineService.attachCompany(companyToAttach, companyToDetach);
        return companyToAttach;
    }
}
public class WriteResult {
    @Autowired
    DomaineRepository domaineRepository;

    @Transactional
    public void write(EntityList results) {
        for (MyEntity result : results) {
            domaineRepository.save(result);
        }
    }
}
The application works well on files with few lines, but when I try to process large files (200,000 lines), performance slows drastically and I get an SQL timeout.
I suspect cache issues, but I'm wondering if saving all the entities at the end of the processing isn't a bad practice?
The problem is your for loop, which is doing individual saves on the results and thus issues single inserts, slowing it down. Hibernate and Spring support batch inserts, and they should be used whenever possible.
Something like domaineRepository.saveAll(results).
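As a rough sketch (assuming domaineRepository is a Spring Data repository and EntityList is an Iterable of MyEntity), the write step would then become a single batched call; note that Hibernate only turns this into real JDBC batches when batching is enabled, for example via the hibernate.jdbc.batch_size property:
@Transactional
public void write(EntityList results) {
    // One saveAll call instead of one save per entity
    domaineRepository.saveAll(results);
}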
Since you are processing a lot of data, it might be better to do things in batches: instead of fetching one company to attach at a time, fetch a list of companies to attach and process those, then fetch a list of companies to detach and process those.
public EntityList process(Input input) {
    EntityList results;
    List<Code> companiesToAdd = new ArrayList<>();
    List<Siret> companiesToRemove = new ArrayList<>();
    for (Line line : input.readLine()) {
        companiesToAdd.add(line.getCode());
        companiesToRemove.add(line.getSiret());
        ...
    }
    results = process(companiesToAdd, companiesToRemove);
    return results;
}

public List<MyEntity> process(List<Code> companiesToAdd, List<Siret> companiesToRemove) {
    List<MyEntity> attachList = domaineRepository.getCompanyByCodeIn(companiesToAdd);
    List<MyEntity> detachList = domaineRepository.getCompanyBySiretIn(companiesToRemove);
    if (attachList.isEmpty() || detachList.isEmpty()) {
        throw new CustomException("Custom Exception");
    }
    companydomaineService.attachCompany(attachList, detachList);
    return attachList;
}
The above code is just pseudo code to point you in the right direction; you will need to work out what works for you.
For every line you read, you are doing two read operations here:
MyEntity companyToAttach = domaineRepository.getCompanyByCode(line.getCode());
MyEntity companyToDetach = domaineRepository.getCompanyBySiret(line.getSiret());
You can read more than one line, use an In query, and then process that list of companies.
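For example, the getCompanyByCodeIn / getCompanyBySiretIn methods used in the pseudo code above could be declared as Spring Data derived In queries (a sketch; Code, Siret and the Long id type are placeholders for whatever your entity actually uses):
public interface DomaineRepository extends CrudRepository<MyEntity, Long> {
    // One query fetching every company whose code is in the given collection
    List<MyEntity> getCompanyByCodeIn(Collection<Code> codes);
    // One query fetching every company whose siret is in the given collection
    List<MyEntity> getCompanyBySiretIn(Collection<Siret> sirets);
}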

How to retrieve data by property in Couchbase Lite?

My documents have a property docType that separates them based on their purpose, in this specific case template or audit. However, when I do the following:
document.getProperty("docType").equals("template");
document.getProperty("docType").equals("audit");
The results are always the same: every time, it returns all stored documents without filtering them by docType.
Below, you can check the query function.
public static Query getData(Database database, final String type) {
    View view = database.getView("data");
    if (view.getMap() == null) {
        view.setMap(new Mapper() {
            @Override
            public void map(Map<String, Object> document, Emitter emitter) {
                if (String.valueOf(document.get("docType")).equals(type)) {
                    emitter.emit(document.get("_id"), null);
                }
            }
        }, "4");
    }
    return view.createQuery();
}
Any hint?
This is not a valid way to do it. Your view function must be pure (it cannot reference external state such as "type"). Once that is created you can then query it for what you want by setting start and end keys, or just a set of keys in general to filter on.
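A sketch of that approach, using the same Couchbase Lite 1.x classes as the question (assuming Query.setKeys is available in your version; otherwise setStartKey/setEndKey work too): emit docType as the key in a pure map function and filter at query time.
public static Query getData(Database database, final String type) {
    View view = database.getView("data");
    if (view.getMap() == null) {
        view.setMap(new Mapper() {
            @Override
            public void map(Map<String, Object> document, Emitter emitter) {
                // Pure map function: no reference to the external "type" variable
                emitter.emit(document.get("docType"), null);
            }
        }, "5"); // bump the view version because the map function changed
    }
    Query query = view.createQuery();
    // Filter at query time instead of inside the map function
    query.setKeys(Collections.<Object>singletonList(type));
    return query;
}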

Traversing and filtering a Set, comparing its objects' getters to an array, using Stream

I've got some working, inelegant code here:
The custom object is:
public class Person {
    private int id;
    public int getId() { return this.id; }
}
And I have a class containing a Set<Person> allPersons with all available subjects. I want to extract a new Set<Person> based upon one or more IDs of my choosing. I've written something which works using a nested enhanced for loop, but it strikes me as inefficient and makes a lot of unnecessary comparisons. I am getting used to working with Java 8, but can't quite figure out how to compare the Set against an array. Here is my working, but verbose, code:
public class MyProgram {
    private Set<Person> allPersons; // contains 100 people with ids 1-100

    public Set<Person> getPersonById(int[] ids) {
        Set<Person> personSet = new HashSet<>(); // or any type of set
        for (int i : ids) {
            for (Person p : allPersons) {
                if (p.getId() == i) {
                    personSet.add(p);
                }
            }
        }
        return personSet;
    }
}
And to get my result, I'd call something along the lines of:
Set<Person> resultSet = getPersonById(new int[]{2, 56, 66});
//resultSet would then contain 3 people with the corresponding ID
My question is: how would I convert the getPersonById method into something that streams allPersons and finds a match for the ID against any of the ints in its parameter array? I thought of some filter operation, but since the parameter is an array, I can't get it to take just the ones I want.
The working answer to this is:
return allPersons.stream()
        .filter(p -> Arrays.stream(ids).anyMatch(i -> i == p.getId()))
        .collect(Collectors.toSet());
However, using the bottom half of @Flown's suggestion, if the program were designed to have a Map it would also work (and work much more efficiently).
As you said, you can introduce a Stream::filter step using a Stream::anyMatch operation.
public Set<Person> getPersonById(int[] ids) {
    Objects.requireNonNull(ids);
    if (ids.length == 0) {
        return Collections.emptySet();
    }
    return allPersons.stream()
            .filter(p -> IntStream.of(ids).anyMatch(i -> i == p.getId()))
            .collect(Collectors.toSet());
}
If the method is called more often, it would be a good idea to map each Person to its id in a Map<Integer, Person>. The advantage is that the lookup is much faster than iterating over the whole set of Person. Your algorithm may then look like this:
private Map<Integer, Person> idMapping;

public Set<Person> getPersonById(int[] ids) {
    Objects.requireNonNull(ids);
    return IntStream.of(ids)
            .filter(idMapping::containsKey)
            .mapToObj(idMapping::get)
            .collect(Collectors.toSet());
}
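The lookup map itself could be built once up front, for example (a sketch, assuming the ids are unique):
// Build the map once, then reuse it for every getPersonById call
Map<Integer, Person> idMapping = allPersons.stream()
        .collect(Collectors.toMap(Person::getId, Function.identity()));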

Learning Java streams: how to pass a value from the outer loop to the nested loop in a functional way

I have a map of a map of strings. This map is parsed from a JSON object and represents the criteria entered by the user to filter a list in the UI.
In the REST service I want to populate an object with data that comes from this map. Unfortunately I cannot change the queryModel object. The QueryModel object has a list of filters. Each filter has a list of fields and a list of operations to be applied to the fields. My goal is to convert the following code to Java 8 streams.
for (Map.Entry<String, Map<String, String>> entry : filters.entrySet()) {
    Filter filter = new Filter();
    filter.setFields(new ArrayList<String>());
    filter.getFields().add(entry.getKey());
    filter.setValues(new ArrayList<String>());
    filter.setOperators(new ArrayList<String>());
    if (entry.getValue() != null) {
        for (String key : entry.getValue().keySet()) {
            if (key.equals("value")) {
                filter.getValues().add(entry.getValue().get(key));
            } else if (key.equals("matchMode")) {
                filter.getOperators().add(entry.getValue().get(key));
            }
        }
        queryModel.getFilters().add(filter);
    }
}
As you can see, I first set the name of the field in the fields list, and then for that field I loop over the values to get the value entered and the match mode. In a functional style I don't know how to save the field of the outer loop so I can set it on the filter object created in the inner loop.
This was my attempt:
public static Filter getFilter(Map.Entry<String, String> entry) {
    Filter filter = new Filter();
    filter.setFields(new ArrayList<String>());
    filter.getFields().add(entry.getKey());
    filter.setValues(new ArrayList<String>());
    filter.setOperators(new ArrayList<String>());
    if (entry.getKey().equals("value")) {
        filter.getValues().add(entry.getValue());
    } else if (entry.getKey().equals("matchMode")) {
        filter.getOperators().add(entry.getValue());
    }
    return filter;
}
List<Filter> filterList = filters.entrySet().stream()
        .filter(stringMapEntry -> stringMapEntry.getValue() != null)
        .flatMap(entry -> entry.getValue().entrySet().stream())
        .map(innerEntry -> QueryModelAdapter.getFilter(innerEntry))
        .collect(Collectors.toList());
queryModel.setFilters(filterList);
In QueryModelAdapter.getFilter I need the outer entry from before the flatMap. How can I do that?
Before I say anything: be polite when asking questions. Nobody gets paid for answering questions here; everyone does it for their own pleasure.
So be nice to them, at least with your words.
Alright, I think your question is more suitable for CodeReview than StackOverflow.
One thing to note: you can't rewrite your legacy Java projects so that every single line uses lambdas and streams.
Sometimes the old-fashioned way is better than the new features.
You don't need to iterate a Map to retrieve a matching value, so you can remove that inner loop.
Let's take your current class (whichever class you copied the code from) and name it RespectOthers.java:
private static Filter getEmptyFilter() {
    Filter filter = new Filter();
    filter.setFields(new ArrayList<String>());
    filter.setValues(new ArrayList<String>());
    filter.setOperators(new ArrayList<String>());
    return filter;
}

private static Filter setKeyAndValues(Filter inputFilterObj, Map.Entry<String, Map<String, String>> entry, QueryModel queryModel) {
    inputFilterObj.setFields(new ArrayList<String>());
    inputFilterObj.getFields().add(entry.getKey());
    if (entry.getValue() != null) {
        inputFilterObj.getValues().add(entry.getValue().get("value"));
        inputFilterObj.getOperators().add(entry.getValue().get("matchMode"));
        queryModel.getFilters().add(inputFilterObj);
    }
    return inputFilterObj;
}

List<Filter> finalOutput = filters.entrySet().stream()
        .map(e -> RespectOthers.setKeyAndValues(RespectOthers.getEmptyFilter(), e, myQueryModel))
        .collect(Collectors.toList());

Parallel Access to Elements in a Groovy List

This is a simple efficiency question around the Groovy language: I have a Customer object that has an id, and I would like to transfer those IDs into another list; in my view each transfer is atomic, so the work can be parallelized.
e.g. linear execution:
public List<Long> extractIds(List<Customer> customerList) {
    List<Long> customerIds = new ArrayList<Long>();
    customerList.each { it -> customerIds.add(it.id) }
    return customerIds
}
Question: What is the most efficient way to transfer the IDs in the above example when holding a large volume of customers?
The simplest method would be:
public List<Long> extractIds(List<Customer> customerList) {
    customerList.id
}
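(Equivalently, the spread operator makes the property collection explicit: customerList*.id.)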
Or, if you want to do it in a multi-threaded fashion, you can use gpars:
import static groovyx.gpars.GParsPool.withPool

public List<Long> extractIds(List<Customer> customerList) {
    withPool {
        customerList.collectParallel { it.id }
    }
}
But you may find the first brute-force method is quicker for this simple example (rather than spinning up a thread pool, and synchronizing the collection of results from different threads)
