Is it possible to dynamically remove indexing on a field in Elasticsearch using Spring Data?

I need to index all the fields of an Elasticsearch index when building the index, but after some time, if needed, set index to false on some of the fields to improve performance by no longer indexing them.
I searched, read the docs, and tested this with Spring Data: when I set index = false in @Field after the index has been built, nothing changes and the field is still searchable.
@Document(indexName = "book")
@Setting(refreshInterval = "30s", shards = 3)
class Book(
@Id
@Field(type = FieldType.Keyword)
var id: String? = null,
@Field(type = FieldType.Keyword)
var title: String? = null,
@Field(type = FieldType.Keyword, index = false)
val isbn: String)
Is there another way to change the indexing of fields dynamically after the index has been built, using Spring Data?

The index setting of an existing field cannot be changed in place. You'll need to run the modified program with a new index name so that a new index is created with the adjusted mapping, and then manually reindex from the old index to the new one; see the Elasticsearch documentation about reindexing.
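As a minimal sketch of the reindexing step, the code below builds the request for Elasticsearch's _reindex endpoint using only the JDK. The target name book_v2, the address http://localhost:9200, and the ReindexSketch class are assumptions for illustration; the new index must already exist with the adjusted mapping before the request is sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ReindexSketch {

    // Builds the JSON body for POST /_reindex: copy all documents from source to dest.
    static String reindexBody(String sourceIndex, String destIndex) {
        return "{\"source\":{\"index\":\"" + sourceIndex + "\"},"
             + "\"dest\":{\"index\":\"" + destIndex + "\"}}";
    }

    // Builds (but does not send) the HTTP request against an assumed cluster address.
    static HttpRequest reindexRequest(String baseUrl, String sourceIndex, String destIndex) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_reindex"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(reindexBody(sourceIndex, destIndex)))
                .build();
    }

    public static void main(String[] args) {
        // Index names and address are placeholders, not taken from a real cluster.
        HttpRequest request = reindexRequest("http://localhost:9200", "book", "book_v2");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(reindexBody("book", "book_v2"));
    }
}
```

The request can then be sent with java.net.http.HttpClient, or the same body can be posted from any other HTTP tool.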

Related

How to filter Range criteria using ElasticSearch Repository

I need to fetch Employees who joined between 2021-12-01 and 2021-12-31. I am using ElasticsearchRepository to fetch data from an Elasticsearch index.
How can I query by a range using the repository?
public interface EmployeeRepository extends ElasticsearchRepository<Employee, String>,EmployeeRepositoryCustom {
List<Employee> findByJoinedDate(String joinedDate);
}
I have tried the Between option as below, but it returns no results:
List<Employee> findByJoinedDateBetween(String fromJoinedDate, String toJoinedDate);
My index configuration:
@Document(indexName="employee", createIndex=true, type="_doc", shards = 4)
public class Employee {
@Field(type=FieldType.Text)
private String joinedDate;
Note: You seem to be using an outdated version of Spring Data Elasticsearch. The type parameter of the @Document
annotation was deprecated in 4.0 and removed in 4.1, as Elasticsearch itself does not support typed indices since
version 7.
To your question:
In order to be able to have a range query for dates in Elasticsearch the field in question must be of type date (the
Elasticsearch type). For your entity this would mean (I refer to the attributes from the current version 4.3):
@Nullable
@Field(type = FieldType.Date, pattern = "uuuu-MM-dd", format = {})
private LocalDate joinedDate;
This defines joinedDate to have a date type and sets the string representation to the given pattern. The
empty format argument makes sure that the additional default values (DateFormat.date_optional_time and
DateFormat.epoch_millis) are not set here. This results in the
following mapping in the index:
{
"properties": {
"joinedDate": {
"type": "date",
"format": "uuuu-MM-dd"
}
}
}
If you check the mapping in your index (GET localhost:9200/employee/_mapping) you will see that in your case the
joinedDate is of type text. You will either need to delete the index and have it recreated by your application or
create it with a new name and then, after the application has written the mapping, reindex the data from the old
index into the new one (https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-reindex.html).
Once you have the index with the correct mapping in place, you can define the method in your repository like this:
List<Employee> findByJoinedDateBetween(LocalDate fromJoinedDate, LocalDate toJoinedDate);
and call it:
repository.findByJoinedDateBetween(LocalDate.of(2021, 12, 1), LocalDate.of(2021, 12, 31));
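For reference, a Between condition on a date field corresponds roughly to an Elasticsearch range query. The sketch below builds such a query body by hand with the same uuuu-MM-dd pattern; the class and method names are hypothetical, and the exact JSON that Spring Data Elasticsearch generates may differ.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class JoinedDateRangeSketch {

    // Same pattern as in the @Field annotation, so the strings match the index mapping.
    private static final DateTimeFormatter PATTERN = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    // Builds a range query with inclusive bounds (gte/lte) on the given field.
    static String rangeQuery(String field, LocalDate from, LocalDate to) {
        return "{\"query\":{\"range\":{\"" + field + "\":{"
             + "\"gte\":\"" + PATTERN.format(from) + "\","
             + "\"lte\":\"" + PATTERN.format(to) + "\"}}}}";
    }

    public static void main(String[] args) {
        // The dates are the ones asked about in the question.
        System.out.println(rangeQuery("joinedDate",
                LocalDate.of(2021, 12, 1), LocalDate.of(2021, 12, 31)));
    }
}
```

Running such a body against the _search endpoint of the index is a quick way to check that the mapping is really of type date before debugging the repository method.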

Spring Boot 2 with Hibernate Search, indexes are not created on save

I have an entity defined as below. If I use save(), Hibernate Search does not create a new index entry for the newly created entity. Updating or modifying an existing entity works as expected.
I'm using Kotlin with Spring Boot 2.
@Entity(name = "shipment")
@Indexed
data class Shipment(
@Id @GeneratedValue(strategy = GenerationType.IDENTITY) val id: Long = -1,
@JoinColumn(name = "user") @ManyToOne() var user: User?,
@IndexedEmbedded
@JoinColumn(name = "sender") @ManyToOne(cascade = [CascadeType.ALL]) val sender: Contact,
@IndexedEmbedded
@JoinColumn(name = "sender_information") @ManyToOne(cascade = [CascadeType.ALL]) val senderInformation: ShipmentInformation,
) {}
This is the save function. I use the same function to update my entity, and the index is updated if an index entry already exists.
@Transactional
fun save(user: User, shipment: Shipment): Shipment {
shipment.user = user;
return this.shipmentRepository.save(shipment)
}
application.properties
spring.jpa.properties.hibernate.search.default.directory_provider=filesystem
spring.jpa.properties.hibernate.search.default.indexBase=./lucene/
spring.jpa.open-in-view=false
If I restart the server, indexing manually works too.
@Transactional
override fun onApplicationEvent(event: ApplicationReadyEvent) {
val fullTextEntityManager: FullTextEntityManager = Search.getFullTextEntityManager(entityManager)
// Configure a single MassIndexer: each createIndexer() call returns a new, unconfigured instance,
// so the settings must be chained on the same one that is started.
fullTextEntityManager.createIndexer()
.purgeAllOnStart(true)
.optimizeAfterPurge(true)
.batchSizeToLoadObjects(15)
.cacheMode(CacheMode.IGNORE)
.threadsToLoadObjects(2)
.typesToIndexInParallel(2)
.startAndWait()
}
I tried forcing the use of the JPA transaction manager, but it did not help.
@Bean(name = arrayOf("transactionManager"))
@Primary
fun transactionManager(@Autowired entityManagerFactory: EntityManagerFactory): org.springframework.orm.jpa.JpaTransactionManager {
return JpaTransactionManager(entityManagerFactory)
}
Update
I think I found why I don't get the results of newly inserted entities.
My search query has a condition on "pid" field which is declared:
@Field(index = Index.YES, analyze = Analyze.NO, store = Store.NO)
@SortableField
@Column(name = "id", updatable = false, insertable = false)
@JsonIgnore
@NumericField val pid: Long,
and query:
query.must(queryBuilder.keyword().onField("customer.pid").matching(user.customer.id.toString()).createQuery())
pid is not stored and so newly inserted values are not visible. Can this be the cause?
BTW: How can I query/search by nested indexed document id? In my case it is customer.id which is DocumentId. I've tried to change the query like below but don't get any result, should I create a new field to query?
query.must(queryBuilder.keyword().onField("customer.id").matching(user.customer.id.toString()).createQuery())
Update 2
I found a solution and now get the newly inserted data too. There was an error in the definition of the "pid" field; I have now defined my fields as below and it works as expected.
@Fields(
Field(name = "pid", index = Index.YES, analyze = Analyze.YES, store = Store.NO)
)
@SortableField(forField = "pid")
@Id @GeneratedValue(strategy = GenerationType.IDENTITY) val id: Long?,
Can we search and sort by id in an easy way or is it the best practice? I know that we should use native JPA functions to get results by id but in my case I need to search by an embedded id to restrict search results. (depends on role of user) so therefore it is not an option for me.
And I don't understand why manual indexing works...
> BTW: How can I query/search by nested indexed document id? In my case it is customer.id which is DocumentId. I've tried to change the query like below but don't get any result, should I create a new field to query?
Normally you don't need to create a separate field if all you want is to perform an exact match.
> Can we search and sort by id in an easy way
Searching, yes, at least in Hibernate Search 5.
Sorting, no: you need a dedicated field.
> or is it the best practice?
The best practice is to declare a field alongside your @DocumentId if you need anything more complex than an exact match on the ID.
> I know that we should use native JPA functions to get results by id
I'm not sure I understand what you mean by "native JPA functions".
> but in my case I need to search by an embedded id to restrict search results. (depends on role of user)
Yes, this should work. That is, it should work if the id is properly populated.
> And I don't understand why manual indexing works...
Neither do I, but I suppose the explanation lies in the "error in the definition of "pid" field". Maybe the ID wasn't populated properly in some cases, leading to the entity being considered as deleted by Hibernate Search?
If you need me to give you a definitive answer, the best way to get it would be to create a reproducer. You can use this as a template: https://github.com/hibernate/hibernate-test-case-templates/tree/master/search
This looks odd:
@Id @GeneratedValue(strategy = GenerationType.IDENTITY) val id: Long = -1,
I'd expect a nullable long, initialized to null (or whatever is the Kotlin equivalent).
I'm not sure this is the problem, but I imagine it could be, as a non-null ID is generally only expected from an already persisted entity.
Other than that, I think you're on the right track: if mass indexing works but not automatic indexing, it may have something to do with your changes not being executed in database transactions.

Spring Data Elasticsearch Problem with IP_Range Data type

I use Spring Boot 2.0.1.RELEASE / Spring Data Elasticsearch 3.0.6.
I annotate my domain class with the @Document annotation and I have a field as below:
@Field(store = true, type = FieldType.?)
private String ipRange;
As you can see, I need to set the field type to ip_range, which exists among the Elasticsearch data types
but not in the FieldType enum.
I want to create this document's index with the ElasticsearchTemplate.createIndex(doc) method, but none of the FieldType enum values supports the ip_range data type.
Spring Data Elasticsearch currently (3.2.0.M2) does not support this. I saw that you already opened an issue, thanks for that. This answer is just for completeness and for other users having the same problem.
Thanks @P.J.Meisch for your reply. I used the @Mapping annotation to specify my mapping directly in JSON format; Spring Data already supports creating the index based on this config. I am still waiting for range data type support so that I can refactor my code.
My Document:
@Document(createIndex = true, indexName = "mydomain", type = "doc-rule"
, refreshInterval = BaseDocument.REFRESH_INTERVAL, replicas = BaseDocument.REPLICA_COUNT, shards = BaseDocument.SHARD_COUNT)
@Mapping(mappingPath = "/elasticsearch/mappings/mydomain-mapping.json")
public class MyDomainDoc {
@Field(store = true, type = FieldType.text)
private List<String> ipRange;
... other fields
}
And my mydomain-mapping.json file:
{
"properties": {
...,
"ipRange": {
"type": "ip_range",
...
},
...
}
}

Elasticsearch + Spring Boot: Query creation from method names for property with @InnerField/@MultiField

I'm trying to build an Elasticsearch query from a method name and am curious what the method name would be if one of the properties has multiple fields, like the following:
@MultiField(
mainField = @Field(type = Text, fielddata = true),
otherFields = {
@InnerField(suffix = "keyword", type = Keyword)
}
)
private String resourceType;
I need the "keyword" type (non-analyzed) so I can search by the entire string.
I have tried it as
List<Event> findByResourceType_KeywordIsIn(Collection<String> list);
and get the following error:
No property keyword found for type String! Traversed path: Event.resourceType.
Is there any way I can tell spring-data-elasticsearch that it is the same property, but an InnerField?
P.S.: I can certainly go with either @Query or just build the entire query using NativeSearchQueryBuilder, but I'm curious whether I can achieve it with just a method name (less code -> less unit testing :) )
Thanks
This won't work with the method names of Repository implementations. The logic in Spring Data that does the parsing uses the (possibly nested) properties of the Java class, whereas you need a query that searches the resourceType.keyword Elasticsearch field.
So, as you already wrote, you'll need a @Query to do this.
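To illustrate what such a query has to target, here is a minimal sketch that builds a terms query on the keyword sub-field by hand, using only the JDK. The class and method names are hypothetical, and the sample values are made up; the point is only that the field name in the query is resourceType.keyword rather than a Java property path.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

final class KeywordTermsQuerySketch {

    // Wraps a value in JSON quotes; only backslashes and double quotes are escaped here.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a terms query against the keyword sub-field of the given property.
    static String termsOnKeyword(String property, Collection<String> values) {
        String joined = values.stream()
                .map(KeywordTermsQuerySketch::quote)
                .collect(Collectors.joining(","));
        return "{\"terms\":{\"" + property + ".keyword\":[" + joined + "]}}";
    }

    public static void main(String[] args) {
        // "vm" and "disk" are placeholder resource types.
        System.out.println(termsOnKeyword("resourceType", List.of("vm", "disk")));
    }
}
```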

Elastic Search ttl (time to live) in the dynamic java mapping file - Spring data elastic search

We use Elasticsearch dynamic mapping, and the Java file is as follows.
@Document(indexName = "test", type = "test", shards = 1, replicas = 0)
public class ElasticSearchIndexObject {
private @Id
@Indexed
String id;
private @Indexed("name")
String name;
}
We use a scheduler that runs every 60 minutes to fetch the data from the DB and add it to the index.
Connection conn = dataSource.getConnection();
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(FETCH_SIZE);
rs = stmt.executeQuery(ESEARCH_QUERY);
int i=1;
while (rs.next()) {
ElasticSearchIndexObject indexObj = new ElasticSearchIndexObject();
indexObj.setName(rs.getString("name"));
indexObj.setId(rs.getString("id"));
indexObjects.add(indexObj);
i=i+1;
}
elasticSearchObjectIndexRepository.save(indexObjects);
indexObjects.clear();
}
This scheduler runs every 60 minutes and adds to or updates the index:
Add - if the id is not yet in the index
Update - if the id is already in the index
The problem is with records deleted from the database. These records are not deleted from the index and become orphan records.
I came across the "ttl" property and am looking for a way to add it to the index so that the orphan records are deleted after the ttl time.
If the ttl is not to be added to each index, should it be set at a generic level for all the documents? If so, should I set it on each scheduled run?
Thanks,
Be sure your index type has its "_ttl" : { "enabled" : true } mapping already configured. Then pass the _ttl value for your document in _source. In your POJO add this field:
@JsonInclude(value=Include.NON_EMPTY) //to make it optional
@JsonProperty("_ttl")
private Long ttl;
According to this open issue it doesn't look like the _ttl field is currently supported by Spring Data Elasticsearch.
Another way of doing it is to "soft-delete" records from your database by setting a flag (i.e. a new boolean column). The flag would be true when the record is active and false when the record is deleted. That way when your import process runs, you'd get all records and based on that flag you know you have to delete the documents from Elasticsearch.
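The soft-delete approach can be sketched as a small planning step in the import job: split the fetched rows by the flag, then save one group and delete the other. The Row and SyncPlan types and the active flag name are assumptions for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

final class SoftDeleteSyncSketch {

    // Hypothetical row shape: id, name, and the soft-delete flag read from the database.
    static final class Row {
        final String id;
        final String name;
        final boolean active;

        Row(String id, String name, boolean active) {
            this.id = id;
            this.name = name;
            this.active = active;
        }
    }

    // Result of one import run: ids to save (add/update) and ids to delete from the index.
    static final class SyncPlan {
        final List<String> toSave = new ArrayList<>();
        final List<String> toDelete = new ArrayList<>();
    }

    // Splits the fetched rows by the active flag, preserving their order.
    static SyncPlan plan(List<Row> rows) {
        SyncPlan plan = new SyncPlan();
        for (Row row : rows) {
            (row.active ? plan.toSave : plan.toDelete).add(row.id);
        }
        return plan;
    }

    public static void main(String[] args) {
        SyncPlan plan = plan(List.of(
                new Row("1", "alpha", true),
                new Row("2", "beta", false)));
        System.out.println("save=" + plan.toSave + " delete=" + plan.toDelete);
    }
}
```

In the scheduler, the rows in toSave would be indexed as before, and the ids in toDelete would be passed to whatever delete operation the repository offers.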

Resources