How to do a bulk update using elasticsearchOperations in Spring Data Elasticsearch? - spring-boot

We were doing bulk inserts by using bulkIndex as follows:
List<IndexQuery> indexQueries = new ArrayList<>();
for (FooDocument fooDocument : fooDocuments) {
    IndexQuery query = new IndexQueryBuilder()
            .withId(String.valueOf(fooDocument.getId()))
            .withObject(fooDocument)
            .build();
    indexQueries.add(query);
}
elasticsearchOperations.bulkIndex(indexQueries, IndexCoordinates.of("foo_index_3"));
We also want to use for bulk updates something like:
elasticsearchOperations.bulkUpdate(updateQueries, IndexCoordinates.of("foo_index_3"));
But bulkUpdate requires a list of UpdateQuery. I am trying to create it by doing something like:
List<UpdateQuery> updateQueries = new ArrayList<>();
for (FooDocument fooDocument : fooDocuments) {
    UpdateQuery updateQuery = new UpdateQueryBuilder() // which builder class and method is required?
            .withId(String.valueOf(fooDocument.getId()))
            .withObject(fooDocument)
            .build();
    updateQueries.add(updateQuery);
}
But unlike IndexQueryBuilder, there is no UpdateQueryBuilder available. What is the correct way to build an UpdateQuery, and which builder class should we use? I am wondering whether the UpdateQueryBuilder class has been deprecated.
P.S.: we are using version 4.0.2.RELEASE of spring-data-elasticsearch.

You create an UpdateQuery with its builder like this:
UpdateQuery updateQuery = UpdateQuery.builder(id)
        .with(...) // one of the builder's with* methods, e.g. withDocument(...)
        .build();
Here the builder is a nested class of UpdateQuery, not a separate top-level class.
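Putting it together for the bulk update, a minimal sketch for Spring Data Elasticsearch 4.0.x could look like this; the partial Document built here and its field name are assumptions (only the fields you want to update need to be sent):

List<UpdateQuery> updateQueries = new ArrayList<>();
for (FooDocument fooDocument : fooDocuments) {
    // Document is org.springframework.data.elasticsearch.core.document.Document;
    // it holds the (partial) source to apply in the update.
    Document document = Document.create();
    document.put("name", fooDocument.getName()); // hypothetical field
    UpdateQuery updateQuery = UpdateQuery.builder(String.valueOf(fooDocument.getId()))
            .withDocument(document)
            .build();
    updateQueries.add(updateQuery);
}
elasticsearchOperations.bulkUpdate(updateQueries, IndexCoordinates.of("foo_index_3"));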

Related

Get Aggregate Information from Elasticsearch using Spring-data-elasticsearch, ElasticsearchRepository

I would like to get aggregate results from ES, like avgSize (the average of a field named 'size'), total hits for documents that match a term, and some other aggregates in the future, for which I don't think ElasticsearchRepository has any methods to call. I built the query and aggregation builders as below. I want to use my repository interface, but I am not sure what the return object type should be. Should it be a document type from my DTOs? I have also seen examples where the searchQuery is passed directly to ElasticsearchTemplate, but then what is the point of having a repository interface that extends ElasticsearchRepository?
Repository Interface
public interface CCFilesSummaryRepository extends ElasticsearchRepository<DataReferenceSummary, UUID> {
}
Elastic configuration
@Configuration
@EnableElasticsearchRepositories(basePackages = "com.xxx.repository.es")
public class ElasticConfiguration {

    @Bean
    public ElasticsearchOperations elasticsearchTemplate() throws UnknownHostException {
        return new ElasticsearchTemplate(elasticsearchClient());
    }

    @Bean
    public Client elasticsearchClient() throws UnknownHostException {
        Settings settings = Settings.builder().put("cluster.name", "elasticsearch").build();
        TransportClient client = new PreBuiltTransportClient(settings);
        client.addTransportAddress(new TransportAddress(InetAddress.getLocalHost(), 9200));
        return client;
    }
}
Service Method
public DataReferenceSummary createSummary(final DataSet dataSet) {
    try {
        QueryBuilder queryBuilder = QueryBuilders.matchQuery("type", dataSet.getDataSetCreateRequest().getContentType());
        AvgAggregationBuilder avgAggregationBuilder = AggregationBuilders.avg("avg_size").field("size");
        ValueCountAggregationBuilder valueCountAggregationBuilder = AggregationBuilders.count("total_references")
                .field("asset_id");
        SearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(queryBuilder)
                .addAggregation(avgAggregationBuilder)
                .addAggregation(valueCountAggregationBuilder)
                .build();
        return ccFilesSummaryRepository.search(searchQuery).iterator().next();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}
DataReferenceSummary is just a POJO for now, and during my build I am getting an error that says: Unable to build bean CCFilesSummaryRepository, IllegalArgumentException: DataReferenceSummary is not a managed Object.
First, DataReferenceSummary must be a class annotated with @Document.
In Spring Data Elasticsearch 3.2.0 (the current version) you need to define the repository return type as AggregatedPage<DataReferenceSummary>; the returned object will contain the aggregations.
From the upcoming version 4.0 on, you will have to define the return type as SearchHits<DataReferenceSummary> and find the aggregations in this returned object.
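For illustration, a rough sketch of the 3.2.x entity mapping and of reading the aggregations via the template (the index name, field names and the queryForPage call are assumptions; with a repository method the return type would be AggregatedPage<DataReferenceSummary> as described above):

@Document(indexName = "data_reference") // hypothetical index name
public class DataReferenceSummary {
    @Id
    private UUID id;
    private String type;
    private Long size;
    // getters and setters omitted
}

// Executing the NativeSearchQuery through the template; in 3.2.x the result
// is an AggregatedPage exposing the aggregations alongside the hits.
AggregatedPage<DataReferenceSummary> page =
        elasticsearchTemplate.queryForPage(searchQuery, DataReferenceSummary.class);
Avg avgSize = page.getAggregations().get("avg_size");
double average = avgSize.getValue();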

How to call appropriate Item Processor for different records?

I have a flat file containing different record types (header, record and footer):
HR,...
RD,...
FR,...
ItemReader
@Bean
@StepScope
public FlatFileItemReader reader(@Value("#{jobParameters['inputFileName']}") String inputFileName) {
    FlatFileItemReader reader = new FlatFileItemReader();
    reader.setResource(new FileSystemResource(inputFileName));
    reader.setLineMapper(patternLineMapper());
    return reader;
}
@Bean
public LineMapper patternLineMapper() {
    PatternMatchingCompositeLineMapper patternLineMapper = new PatternMatchingCompositeLineMapper<>();
    Map<String, LineTokenizer> tokenizers = new HashMap<String, LineTokenizer>();
    try {
        tokenizers.put("HR*", headerLineTokenizer());
        tokenizers.put("RD*", recordLineTokenizer());
        tokenizers.put("FR*", footerLineTokenizer());
    } catch (Exception e) {
        e.printStackTrace();
    }
    Map<String, FieldSetMapper> fieldSetMappers = new HashMap<String, FieldSetMapper>();
    fieldSetMappers.put("HR*", new HeaderFieldSetMapper());
    fieldSetMappers.put("RD*", new RecordFieldSetMapper());
    fieldSetMappers.put("FR*", new FooterFieldSetMapper());
    patternLineMapper.setTokenizers(tokenizers);
    patternLineMapper.setFieldSetMappers(fieldSetMappers);
    return patternLineMapper;
}
These are working fine, and Spring Batch applies the appropriate line mapper for each record. The problem is that when I want to use the same approach for the item processor, I get a java.lang.ClassCastException, because Spring Batch tries to map the domain object [returned from the reader] to java.lang.String.
ItemProcessor
@Bean
@StepScope
public ItemProcessor processor() {
    ClassifierCompositeItemProcessor processor = new ClassifierCompositeItemProcessor();
    PatternMatchingClassifier<ItemProcessor> classifier = new PatternMatchingClassifier<>();
    Map<String, ItemProcessor> patternMap = new HashMap<>();
    patternMap.put("HR*", new HeaderItemProcessor());
    patternMap.put("RD*", new RecordItemProcessor());
    patternMap.put("FR*", new FooterItemProcessor());
    classifier.setPatternMap(patternMap);
    processor.setClassifier(classifier);
    return processor;
}
I also used BackToBackPatternClassifier, but it turns out it has a bug, and when I use generics like ItemWriter<Object> I get a "Couldn't Open File" exception. The question is:
How can I make an ItemProcessor that handles the different record types returned from the reader?
Your issue is that the classifier you use in the ClassifierCompositeItemProcessor is based on a String pattern and not a type. What really should happen is something like:
The reader returns a specific type of items based on the input pattern, something like:
HR* -> HRType
RD* -> RDType
FR* -> FRType
This is what you have basically done on the reader side. Now on the processing side, the processor will receive objects of type HRType, RDType and FRType. So the classifier should not be based on String as input type, but on the item type, something like:
Map<Object, ItemProcessor> patternMap = new HashMap<>();
patternMap.put(HRType.class, new HeaderItemProcessor());
patternMap.put(RDType.class, new RecordItemProcessor());
patternMap.put(FRType.class, new FooterItemProcessor());
This classifier uses Object type because your ItemReader returns a raw type. I would not recommend using raw types and Object type in the classifier. What you should do is:
Create a base class for your items and a specific class for each type
Make the reader return items of type <? extends BaseClass>
Use a org.springframework.classify.SubclassClassifier in your ClassifierCompositeItemProcessor, as in the sketch below
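A minimal sketch of that last step, assuming a common BaseType with HRType, RDType and FRType subclasses and the processors from the question:

@Bean
public ClassifierCompositeItemProcessor<BaseType, BaseType> processor() {
    Map<Class<? extends BaseType>, ItemProcessor<?, ? extends BaseType>> typeMap = new HashMap<>();
    typeMap.put(HRType.class, new HeaderItemProcessor());
    typeMap.put(RDType.class, new RecordItemProcessor());
    typeMap.put(FRType.class, new FooterItemProcessor());

    // SubclassClassifier routes each item to the processor registered for its class
    // (or the closest registered superclass); null here means no fallback processor is registered.
    SubclassClassifier<BaseType, ItemProcessor<?, ? extends BaseType>> classifier =
            new SubclassClassifier<>(typeMap, null);

    ClassifierCompositeItemProcessor<BaseType, BaseType> processor = new ClassifierCompositeItemProcessor<>();
    processor.setClassifier(classifier);
    return processor;
}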

Writing to multiple files dynamically in Spring Batch

In Spring Batch I configure a file writer as follows:
@Bean
public FlatFileItemWriter<MyObject> flatFileItemWriter() throws Exception {
    FlatFileItemWriter<MyObject> itemWriter = new FlatFileItemWriter<>();
    // The pass-through aggregator just calls toString on any item passed in.
    itemWriter.setLineAggregator(new PassThroughLineAggregator<>());
    String outputPath = File.createTempFile("output", ".out").getAbsolutePath();
    System.out.println(">>output path=" + outputPath);
    itemWriter.setResource(new FileSystemResource(outputPath));
    itemWriter.afterPropertiesSet();
    return itemWriter;
}
What happens if MyObject is a complex structure that can vary depending on configuration settings, etc., and I want to write different parts of that structure to different files?
How do I do this?
Have you looked at CompositeItemWriter? You may need to have CompositeLineMapper in your reader as well as ClassifierCompositeItemProcessor depending on your needs.
Below is an example of a CompositeItemWriter:
@Bean
public ItemWriter fileWriter() {
    CompositeItemWriter compWriter = new CompositeItemWriter();
    FlatFileItemWriter<MyObject_data> dataWriter = new FlatFileItemWriter<MyObject_data>();
    FlatFileItemWriter<MyObject_otherdata> otherWriter = new FlatFileItemWriter<MyObject_otherdata>();
    List<ItemWriter> iList = new ArrayList<ItemWriter>();
    iList.add(dataWriter);
    iList.add(otherWriter);
    compWriter.setDelegates(iList);
    return compWriter;
}
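Note that each delegate still needs its own Resource, LineAggregator and afterPropertiesSet() call before being added to the composite. A hedged sketch of configuring one delegate (the file name and the getData() accessor are made up for illustration):

@Bean
public FlatFileItemWriter<MyObject_data> dataWriter() throws Exception {
    FlatFileItemWriter<MyObject_data> dataWriter = new FlatFileItemWriter<>();
    dataWriter.setResource(new FileSystemResource("data.out"));            // hypothetical output file
    dataWriter.setLineAggregator(item -> String.valueOf(item.getData()));  // hypothetical accessor on MyObject_data
    dataWriter.afterPropertiesSet();
    return dataWriter;
}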

Spring Boot Spring Batch: how to set the query dynamically on an ItemReader

I am new to Spring. I have a use case where I need to execute multiple SQL queries that all return the same POJO. I would like to write one item reader and change the query in each step. Is there a way to do this?
You can use Spring Batch late binding by adding @StepScope to your reader.
Sample code
@StepScope
@Bean
public ItemReader<Pojo> myReader() {
    JdbcCursorItemReader<Pojo> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(basicDataSource);
    // You can inject the SQL as you need.
    // Some examples:
    // using #{jobParameters['']}
    // using #{jobExecutionContext['input.file.name']}
    // using #{stepExecutionContext['input.file.name']}
    reader.setSql("Your-SQL");
    reader.setRowMapper(new MyMapper());
    return reader;
}
Check section 5.4 of the Spring Batch reference documentation:
https://docs.spring.io/spring-batch/reference/html/configureStep.html
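For example, injecting the SQL itself through a job parameter might look like this; a sketch, where the parameter key "query" is an assumption and Pojo/MyMapper are taken over from the snippet above:

@StepScope
@Bean
public JdbcCursorItemReader<Pojo> myReader(
        @Value("#{jobParameters['query']}") String sql, // hypothetical job-parameter key
        DataSource dataSource) {
    JdbcCursorItemReader<Pojo> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(dataSource);
    reader.setSql(sql); // the SQL is resolved per step execution thanks to @StepScope
    reader.setRowMapper(new MyMapper());
    return reader;
}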

Deserializing DynamoDBResults with gson fails

I have a specific use case where I take the results from one DynamoDB table and store them, serialized, in another DynamoDB table.
Now when I use gson to deserialize the data being retrieved,
I get this error:
java.lang.RuntimeException: Unable to invoke no-args constructor for class java.nio.ByteBuffer. Register an InstanceCreator with Gson for this type may fix this problem.
at com.google.gson.internal.ConstructorConstructor$12.construct(ConstructorConstructor.java:210)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:186)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:103)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:196)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:187)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:145)
at com.google.gson.Gson.fromJson(Gson.java:810)
at com.google.gson.Gson.fromJson(Gson.java:775)
My method looks like this:
public void store(MyCustomObject obj) {
    String primaryKey = obj.getKey();
    List<Map<String, AttributeValue>> results = AmazonDynamoDB.query(...).getItems();
    Gson gson = new Gson();
    List<String> records = results.stream()
            .map(mappedResult -> gson.toJson(mappedResult))
            .collect(Collectors.toList());
    Map<String, AttributeValue> attributeMap = transformToAttributeMap(records);
    PutItemRequest putItemRequest = new PutItemRequest().withItem(attributeMap);
    AmazonDynamoDB.putItem(...);
}
The method to retrieve the records looks something like this:
public void retrieve(String id) {
    QueryRequest...
    Map<String, AttributeValue> records = DynamoDB.query(...).getItems();
    List<String> serializedRecords = new ArrayList<>();
    List<AttributeValue> values = records.get("key");
    for (AttributeValue attributeValue : values) {
        serializedRecords.add(attributeValue.getS());
    }
    Gson gson = new Gson();
    Type recordType = new TypeToken<Map<String, AttributeValue>>() { }.getType();
    List<Map<String, AttributeValue>> actualRecords = serializedRecords.stream()
            .map(record -> gson.fromJson(record, recordType))
            .collect(Collectors.toList());
}
What am I doing wrong?
The problem is that the AttributeValue class has a java.nio.ByteBuffer field named b. Gson tries to deserialize data into it, but the ByteBuffer class has no default constructor, so Gson cannot deserialize the b field.
An alternative solution is to use the newer DynamoDB document API of the AWS SDK. The following example should work:
AmazonDynamoDBClient client = new AmazonDynamoDBClient(
new ProfileCredentialsProvider());
Item item = new DynamoDB(client).getTable("user").getItem("Id", "user1");
String json = item.toJSON();
Item deserialized = Item.fromJSON(json);
You should modify the credentials provider according to your setup.
Not exactly the best workaround/answer, but I was able to do this:
Item item = new Item().withJSON("document", jsonStr);
Map<String,AttributeValue> attributes = InternalUtils.toAttributeValues(item);
return attributes.get("document").getM();
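Building on that, a possible round trip (serialize on store, deserialize on retrieve) without Gson could use the same document-API helpers; a sketch, where queryRequest, the "document" attribute name and the surrounding variables are assumptions:

// Store: convert the low-level query result into document-API Items, then to JSON strings.
List<Map<String, AttributeValue>> results = client.query(queryRequest).getItems();
List<String> records = InternalUtils.toItemList(results).stream()
        .map(Item::toJSON)
        .collect(Collectors.toList());

// Retrieve: turn each JSON string back into a low-level attribute map.
List<Map<String, AttributeValue>> actualRecords = records.stream()
        .map(json -> InternalUtils.toAttributeValues(new Item().withJSON("document", json))
                .get("document").getM())
        .collect(Collectors.toList());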
