Elasticsearch ttl (time to live) in the dynamic Java mapping file - Spring Data Elasticsearch

We use elastic search dynamic mapping and the java file is as follows.
#Document(indexName = "test", type = "test", shards = 1, replicas = 0)
public class ElasticSearchIndexObject {
private #Id
#Indexed
String id;
private #Indexed("name")
String name;
}
We use a scheduler that runs every 60 minutes to fetch data from the DB and add it to the index.
Connection conn = dataSource.getConnection();
Statement stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(FETCH_SIZE);
ResultSet rs = stmt.executeQuery(ESEARCH_QUERY);
List<ElasticSearchIndexObject> indexObjects = new ArrayList<>();
while (rs.next()) {
    ElasticSearchIndexObject indexObj = new ElasticSearchIndexObject();
    indexObj.setName(rs.getString("name"));
    indexObj.setId(rs.getString("id"));
    indexObjects.add(indexObj);
}
elasticSearchObjectIndexRepository.save(indexObjects);
indexObjects.clear();
This scheduler runs every 60 minutes and adds to or updates the index:
Add - if the id is not yet in the index
Update - if the id is already in the index
The problem is with records deleted from the database: they are never removed from the index and become orphan records.
I came across the "ttl" property and am looking for a way to add it to the index so that orphan records get deleted once the ttl expires.
If the ttl should not be added to each document, should it be set at a generic level for all documents? If so, should I set it on each scheduled run?
Thanks,

Be sure your index type already has the "_ttl" : { "enabled" : true } mapping configured. Then pass the _ttl value for your document in _source. In your POJO, add this field:
@JsonInclude(value = Include.NON_EMPTY) // makes the field optional
@JsonProperty("_ttl")
private Long ttl;
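
For reference, a sketch of what the index creation could look like on an Elasticsearch 1.x/2.x cluster (the _ttl meta-field was deprecated in 2.0 and removed in 5.0, so this only applies to older versions); the default value here is an assumption:

PUT test
{
  "mappings": {
    "test": {
      "_ttl": { "enabled": true, "default": "2h" }
    }
  }
}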

According to this open issue, it doesn't look like the _ttl field is currently supported by Spring Data Elasticsearch.
Another way of doing it is to "soft-delete" records from your database by setting a flag (e.g. a new boolean column). The flag would be true while the record is active and false once the record is deleted. That way, when your import process runs, you get all records, and based on that flag you know which documents have to be deleted from Elasticsearch.
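
A minimal sketch of how the import loop above could apply such a flag, assuming a hypothetical active column in the query results and the repository from the question:

while (rs.next()) {
    ElasticSearchIndexObject indexObj = new ElasticSearchIndexObject();
    indexObj.setId(rs.getString("id"));
    indexObj.setName(rs.getString("name"));
    if (rs.getBoolean("active")) {
        indexObjects.add(indexObj); // active: will be added or updated by save()
    } else {
        // soft-deleted in the DB: remove the matching document from the index
        elasticSearchObjectIndexRepository.delete(indexObj);
    }
}
elasticSearchObjectIndexRepository.save(indexObjects);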

Related

Spring Data Elasticsearch without entity fields

I'm using Spring Data Elasticsearch. My document does not have any static fields; data is accumulated per quarter, roughly 6 GB per quarter (we call them versions). Let's say we get 5 GB of data in Jan 2021 with 140 columns; in the next version I may get 130 or 120 columns, which we do not know in advance. The end-user requirement is to get the information from the database and show it in a tabular format that can be filtered. In MongoDB we have BasicDBObject; do we have anything similar in Spring Boot Elasticsearch?
I can provide, let's say, 4-5 columns which are common in every version record; apart from those, I need to retrieve the data without mentioning the column names in the POJO, and I need to use filters on them just like I can in MongoDB:
List<BaseClass> getMultiSearch(@RequestBody Map<String, Object>[] attributes) {
    Query orQuery = new Query();
    Criteria orCriteria = new Criteria();
    List<Criteria> orExpression = new ArrayList<>();
    for (Map<String, Object> accounts : attributes) {
        Criteria expression = new Criteria();
        accounts.forEach((key, value) -> expression.and(key).is(value));
        orExpression.add(expression);
    }
    orQuery.addCriteria(orCriteria.orOperator(orExpression.toArray(new Criteria[orExpression.size()])));
    return mongoOperations.find(orQuery, BaseClass.class);
}
You can define an entity class, for example, like this:
public class GenericEntity extends LinkedHashMap<String, Object> {
}
To have that returned in your calling site:
public SearchHits<GenericEntity> allGeneric() {
    var criteria = Criteria.where("fieldname").is("value");
    Query query = new CriteriaQuery(criteria);
    return operations.search(query, GenericEntity.class, IndexCoordinates.of("indexname"));
}
But notice: when writing data into Elasticsearch, the mapping for new fields/properties in that index will be dynamically updated, and there is a limit on how many entries a mapping can have (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html). So take care not to run into that limit.
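
As an illustration, writing such a dynamic entity could look like the following sketch; the field names are made up, and every previously unseen key triggers a dynamic mapping update that counts toward that limit:

GenericEntity entity = new GenericEntity();
entity.put("id", "42");
entity.put("newColumnThisQuarter", "some value"); // previously unseen field: the mapping grows
operations.save(entity, IndexCoordinates.of("indexname"));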

Is it possible to dynamically remove indexing on a field in Elasticsearch using Spring Data?

I need to index all fields of an Elasticsearch index when building the index, but after some time, if needed, change the index value to false and improve performance by removing indexing from some of the fields.
As far as I have searched, read the docs and tested with Spring Data: when I set index = false in @Field after building the index, nothing changes and the field is still searchable.
#Document(indexName = "book")
#Setting(refreshInterval = "30s", shards = 3)
class Book(
#Id
#Field(type = FieldType.Keyword)
var id: String? = null,
#Field(type = FieldType.Keyword )
var title: String? = null,
#Field(type = FieldType.Keyword,index = false)
val isbn: String)
I wanted to know if there is another solution to change the indexing of fields dynamically, after the index has been built, using Spring Data?
You'll need to run the modified program with a new index name to create a new index with the adjusted mapping, and then manually reindex from the old index into the new one: see the Elasticsearch documentation about reindexing.
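
A sketch of that manual reindex step via the REST API; the index names book and book_v2 are assumptions:

POST _reindex
{
  "source": { "index": "book" },
  "dest": { "index": "book_v2" }
}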

How to filter by range criteria using ElasticsearchRepository

I need to fetch employees who joined between 2021-12-01 and 2021-12-31. I am using ElasticsearchRepository to fetch data from an Elasticsearch index.
How can we fetch by range criteria using the repository?
public interface EmployeeRepository extends ElasticsearchRepository<Employee, String>, EmployeeRepositoryCustom {
    List<Employee> findByJoinedDate(String joinedDate);
}
I have tried the Between option like below, but it returns no results:
List<Employee> findByJoinedDateBetween(String fromJoinedDate, String toJoinedDate);
My index configuration:
@Document(indexName = "employee", createIndex = true, type = "_doc", shards = 4)
public class Employee {
    @Field(type = FieldType.Text)
    private String joinedDate;
}
Note: You seem to be using an outdated version of Spring Data Elasticsearch. The type parameter of the @Document annotation was deprecated in 4.0 and removed in 4.1, as Elasticsearch itself does not support typed indices since version 7.
To your question:
In order to be able to have a range query for dates in Elasticsearch, the field in question must be of type date (the Elasticsearch type). For your entity this would mean (I refer to the attributes from the current version 4.3):
@Nullable
@Field(type = FieldType.Date, pattern = "uuuu-MM-dd", format = {})
private LocalDate joinedDate;
This defines the joinedDate to have a date type and sets the string representation to the given pattern. The empty format argument makes sure that the additional default values (DateFormat.date_optional_time and DateFormat.epoch_millis) are not set here. This results in the following mapping in the index:
{
  "properties": {
    "joinedDate": {
      "type": "date",
      "format": "uuuu-MM-dd"
    }
  }
}
If you check the mapping in your index (GET localhost:9200/employee/_mapping) you will see that in your case the joinedDate is of type text. You will either need to delete the index and have it recreated by your application, or create it with a new name and then, after the application has written the mapping, reindex the data from the old index into the new one (https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-reindex.html).
Once you have the index with the correct mapping in place, you can define the method in your repository like this:
List<Employee> findByJoinedDateBetween(LocalDate fromJoinedDate, LocalDate toJoinedDate);
and call it:
repository.findByJoinedDateBetween(LocalDate.of(2021, 1, 1), LocalDate.of(2021, 12, 31));
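
For reference, the derived Between method should produce roughly the following range query against the index (a sketch; the Between bounds are inclusive):

{
  "query": {
    "range": {
      "joinedDate": {
        "gte": "2021-01-01",
        "lte": "2021-12-31"
      }
    }
  }
}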

Spring Redis: Range query "greater than" on a field

I am using Redis to store some data and later query and update it with the latest information.
Considering an example:
I receive File data, which carries info on the file and the physical storage location of that file.
One shelf has multiple racks, and each rack can have multiple files.
Each file has a version field, and it gets updated (incremented) when an operation on file is performed.
How do I plan to store?
I need to query based on "shelfID + rack ID" -- To get all files.
I need to query based on "shelfID + rack ID + version > XX" -- To get all files with version more than specified.
Now, getting all files belonging to a shelf and rack is achievable in Spring Data Redis:
I create a key from the combination of the two IDs and later query based on this key.
private <T> void save(String id, T entity) {
    redisTemplate.opsForValue().set(id, entity);
}
But how do I query on the version field?
I had marked the "version" field as @Indexed, but the Spring repository query does not work.
#RedisHash("shelves")
public class ShelfEntity {
#Indexed
#Id
private String id;
#Indexed
private String shelfId;
#Indexed
private String rackId;
#Indexed
private String fileId;
#Indexed
private Integer version;
private String fileName;
// and other updatable fields
}
Repository method:
List<ShelfEntity> findAllByShelfIdAndRackIdAndVersionGreaterThan(String shelfId,
        String rackId, int version);
The above gives this error:
java.lang.IllegalArgumentException: GREATER_THAN (1): [IsGreaterThan, GreaterThan] is not supported for redis query derivation
Q. How do I query based on version greater than?
Q. Is it even possible with Spring Data Redis?
Q. If it is possible, how should I model the data (into which data structure) in order to make such queries?
Q. If we don't use Spring, how would we do this in Redis using redis-cli, and with which data structure?
Maybe something like:
<key, key, value>
<shelfId+rackId, version, fileData>
I am not sure how to model this in Redis.
Update 2:
One shelf can have N racks.
One rack can have N files.
Each file object has a version.
This version gets updated (0 -> 1 -> 2 ...).
I want to store only the latest version of a file.
So, if we have 1 file object:
shelfId - 1
rackId - 1
fileId - 1
version - 0
... then on update of the version we should still have 1 file object, now with:
version - 1
I tried keeping an MD5 hash of shelfId + rackId as the key in a hash data structure, but then I cannot query on version.
I also tried using a ZSet, saving it like this:
private void saveSet(List<ShelfEntity> shelfInfo) {
    for (ShelfEntity item : shelfInfo) {
        redisTemplate.opsForZSet()
                .add(item.getId(), item, item.getVersion());
    }
}
So the version becomes the score. But the problem is that we cannot update members of the set in place, so for one fileId there are multiple versions, and when I query, I get duplicates.
Get code:
Set<ShelfEntity> objects = (Set<ShelfEntity>) (Object) redisTemplate.opsForZSet()
        .rangeByScore(generateMd5Hash("-", shelfId, rackId), startVersion,
                Double.MAX_VALUE);
Now, this is an attempt to mimic version > XX:
Create a ZSET for each shelfId and rackId combination.
Use two methods to save and update records in Redis.
// this method stores all shelf info in Redis
public void save(List<ShelfEntity> shelfInfo) {
    for (ShelfEntity item : shelfInfo) {
        redisTemplate.opsForZSet()
                .add(item.getId(), item, item.getVersion());
    }
}
Use update to remove the old entry and insert a new one. Redis does not support updating a sorted-set member in place, so you need to remove the existing record and add a new one:
public void update(List<ShelfEntity> oldRecords, List<ShelfEntity> newRecords) {
    if (oldRecords.size() != newRecords.size()) {
        throw new IllegalArgumentException("old and new records must have same number of entries");
    }
    for (int i = 0; i < oldRecords.size(); i++) {
        ShelfEntity oldItem = oldRecords.get(i);
        ShelfEntity newItem = newRecords.get(i);
        redisTemplate.opsForZSet().remove(oldItem.getId(), oldItem);
        redisTemplate.opsForZSet()
                .add(newItem.getId(), newItem, newItem.getVersion());
    }
}
Read items from the ZSET with their scores (this assumes a RedisTemplate<String, ShelfEntity>; note that rangeByScoreWithScores has an inclusive minimum, so pass version + 1 to get a strict greater-than):
List<ShelfEntity> findAllByShelfIdAndRackIdAndVersionGreaterThan(String shelfId,
        String rackId, int version) {
    Set<TypedTuple<ShelfEntity>> objects = redisTemplate.opsForZSet()
            .rangeByScoreWithScores(generateMd5Hash("-", shelfId, rackId), version + 1,
                    Double.MAX_VALUE);
    List<ShelfEntity> shelfEntities = new ArrayList<>();
    for (TypedTuple<ShelfEntity> entry : objects) {
        ShelfEntity entity = entry.getValue();
        entity.setVersion(entry.getScore().intValue());
        shelfEntities.add(entity);
    }
    return shelfEntities;
}
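
To the redis-cli part of the question: the same idea without Spring is plain sorted-set commands; the key and member values below are made-up examples:

ZADD shelf1:rack1 0 "file1"
ZADD shelf1:rack1 1 "file1-v1"
ZRANGEBYSCORE shelf1:rack1 (0 +inf WITHSCORES

The ( prefix makes the minimum exclusive, which matches version > 0.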

Filter search query in Spring MongoDB

In the feed collection, "likeCount" and "commentCount" are two fields. I want to get all documents where "likeCount" + "commentCount" is greater than 100. How can I write the search filter query in Spring MongoDB?
Below is my sample feed collection data.
{
  "_id" : ObjectId("55deb33dcb9be727e8356289"),
  "channelName" : "Facebook",
  "likeCount" : 2,
  "commentCount" : 10
}
To compare a single field, we can write a search query like:
BasicDBObject searchFilter = new BasicDBObject();
searchFilter.append("likeCount", new BasicDBObject("$gte",100));
DBCursor feedCursor = mongoTemplate.getCollection("feed").find(searchFilter);
Try this:
db.collection.aggregate([
  { $project: { total: { '$add': ["$likeCount", "$commentCount"] } } },
  { $match: { total: { $gt: 100 } } }
])
You would need to use the MongoDB Aggregation Framework with Spring Data MongoDB. The following returns all feeds with a combined like and comment count greater than 100, using the aggregation framework:
Entities
class FeedsCount {
    @Id String id;
    String channelName;
    long likeCount;
    long commentCount;
    long totalLikesComments;
    //...
}
Aggregation
import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;

Aggregation agg = newAggregation(Feed.class,
    project("id", "channelName", "likeCount", "commentCount")
        .andExpression("likeCount + commentCount").as("totalLikesComments"),
    match(where("totalLikesComments").gt(100))
);

// Convert the aggregation result into a List
AggregationResults<FeedsCount> groupResults
    = mongoTemplate.aggregate(agg, FeedsCount.class);
List<FeedsCount> results = groupResults.getMappedResults();
In the code above, we first create a new aggregation via the newAggregation static factory method, to which we pass a list of aggregation operations. These operations define the aggregation pipeline of the Aggregation.
As a first step, we select the "id", "channelName", "likeCount" and "commentCount" fields from the input collection with the project operation and add a new field "totalLikesComments", a computed property that stores the sum of the "likeCount" and "commentCount" fields.
In the second step, we filter the intermediate result using a match operation, which accepts a Criteria query as an argument.
Note that the name of the input collection is derived from the Feed class passed as the first parameter to the newAggregation method.
