Spring Data Solr: how to force a numeric-looking string field to be the Solr string type

I'm trying to use spring-data-solr:3.0.6 to index data from different sources. One field, casenumber, comes in different formats. When casenumber has ONLY digits, say 123, spring-data-solr indexes the field as plong. That causes no problem until later, when a record arrives with casenumber “CASE456”. The Solr engine then throws an error, of course, because casenumber must be a long.
Can I let Spring Data know that "123" is a string, rather than having it guessed as a number, without touching the schema? I like the schemaless mode. I have tried the following code, but spring-data-solr still indexes “123” as 123. There is little documentation about @Indexed/type. Thanks
@SolrDocument(collection =..)
public class CaseDocument
{
    @Indexed(type="string")
    private String caseNumber;
    // OR
    @Indexed(type="lowercase")
    private String caseNumber;
    ....

Related

How to filter Range criteria using ElasticSearch Repository

I need to fetch Employees who joined between 2021-12-01 and 2021-12-31. I am using ElasticsearchRepository to fetch data from an Elasticsearch index.
How can we query by a range criterion using the repository?
public interface EmployeeRepository extends ElasticsearchRepository<Employee, String>,EmployeeRepositoryCustom {
List<Employee> findByJoinedDate(String joinedDate);
}
I have tried the Between option like below, but it returns no results:
List<Employee> findByJoinedDateBetween(String fromJoinedDate, String toJoinedDate);
My Index configuration
@Document(indexName="employee", createIndex=true,type="_doc", shards = 4)
public class Employee {
    @Field(type=FieldType.Text)
    private String joinedDate;
Note: You seem to be using an outdated version of Spring Data Elasticsearch. The type parameter of the @Document
annotation was deprecated in 4.0 and removed in 4.1, as Elasticsearch itself has not supported typed indices since
version 7.
To your question:
In order to be able to have a range query for dates in Elasticsearch the field in question must be of type date (the
Elasticsearch type). For your entity this would mean (I refer to the attributes from the current version 4.3):
@Nullable
@Field(type = FieldType.Date, pattern = "uuuu-MM-dd", format = {})
private LocalDate joinedDate;
This defines the joinedDate to have a date type and sets the string representation to the given pattern. The
empty format argument makes sure that the additional default values (DateFormat.date_optional_time and DateFormat.epoch_millis) are not set here. This results in the
following mapping in the index:
{
"properties": {
"joinedDate": {
"type": "date",
"format": "uuuu-MM-dd"
}
}
}
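To see what string representation the pattern uuuu-MM-dd yields on the Java side, here is a minimal java.time sketch. The class and method names are hypothetical and this is not how Spring Data Elasticsearch performs its conversion internally; it only illustrates the pattern and an inclusive date comparison (inclusive bounds are an assumption here).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Spring Data Elasticsearch.
final class JoinedDatePatternSketch {

    // Same pattern string as in the @Field annotation.
    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    // LocalDate -> string in the form the pattern describes.
    static String toIndexString(LocalDate date) {
        return FORMATTER.format(date);
    }

    // String in that form -> LocalDate.
    static LocalDate fromIndexString(String value) {
        return LocalDate.parse(value, FORMATTER);
    }

    // Inclusive range check on real dates rather than on analyzed text.
    static boolean isBetween(LocalDate date, LocalDate from, LocalDate to) {
        return !date.isBefore(from) && !date.isAfter(to);
    }
}
```

For example, JoinedDatePatternSketch.toIndexString(LocalDate.of(2021, 12, 1)) gives the string 2021-12-01, which parses back to the same LocalDate.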
If you check the mapping in your index (GET localhost:9200/employee/_mapping) you will see that in your case the
joinedDate is of type text. You will either need to delete the index and have it recreated by your application or
create it with a new name and then, after the application has written the mapping, reindex the data from the old
index into the new one (https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-reindex.html).
Once you have the index with the correct mapping in place, you can define the method in your repository like this:
List<Employee> findByJoinedDateBetween(LocalDate fromJoinedDate, LocalDate toJoinedDate);
and call it:
repository.findByJoinedDateBetween(LocalDate.of(2021, 12, 1), LocalDate.of(2021, 12, 31));

How to match numeric and boolean values in a lucene query

I am using Hibernate Search to construct a Lucene query that returns string values that contain (part of) the search string. In addition, the query must only return the string values if the language id matches and the deleted flag isn't set to true. I've written the code below for this, but the problem is that it doesn't return anything.
private Query getQueryWithBooleanClauses(Class entityClass, String searchString, Long preferredLanguageId, FullTextEntityManager fullTextEntityManager, String firstField, String... additionalFields) {
QueryBuilder queryBuilder = getQueryBuilder(entityClass, fullTextEntityManager);
Query containsSearchString = getMatchingStringCondition(searchString, queryBuilder, firstField, additionalFields);
BooleanQuery isPreferredOrDefaultLanguageTranslation = getLanguageCondition(preferredLanguageId);
BooleanQuery finalQuery = new BooleanQuery.Builder()
.add(new TermQuery(new Term("parentDeleted", "false")), BooleanClause.Occur.MUST)
.add(new TermQuery(new Term("parentApproved", "true")), BooleanClause.Occur.MUST)
.add(new TermQuery(new Term("childDeleted", "false")), BooleanClause.Occur.MUST)
.add(isPreferredOrDefaultLanguageTranslation, BooleanClause.Occur.MUST)
.add(containsSearchString, BooleanClause.Occur.MUST)
.build();
return finalQuery;
}
getMatchingStringCondition
private Query getMatchingStringCondition(String searchString, QueryBuilder queryBuilder, String firstField, String... additionalFields) {
log.info(MessageFormat.format("{0}*", searchString));
return queryBuilder.simpleQueryString()
.onFields(firstField, additionalFields)
.withAndAsDefaultOperator()
.matching(MessageFormat.format("{0}*", searchString))
.createQuery();
}
getLanguageCondition
private BooleanQuery getLanguageCondition(Long preferredLanguageId) {
return new BooleanQuery.Builder()
.add(createLanguagePredicate(preferredLanguageId), BooleanClause.Occur.SHOULD)
.add(createLanguagePredicate(languageService.getDefaultLanguage().getId()), BooleanClause.Occur.SHOULD)
.build();
}
createLanguagePredicate
private Query createLanguagePredicate(Long languageId){
return new TermQuery(new Term("language.languageId", languageId.toString()));
}
Query executing method
public List<AutoCompleteSuggestion> findAllBySearchStringAndDeletedIsFalse(Class entityClass, String searchString, Long preferredLanguageId){
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
Query finalQuery = getQueryWithBooleanClauses(entityClass, searchString, preferredLanguageId, fullTextEntityManager, "parent.latinName", "translatedName");
FullTextQuery fullTextQuery = fullTextEntityManager.createFullTextQuery(finalQuery, entityClass);
fullTextQuery.setProjection("parentId", "autoCompleteSuggestion", "childApproved"); // order must match the argument order of the AutoCompleteSuggestion constructor, see convertToAutoCompleteSuggestionList
fullTextQuery.setMaxResults(maxResults);
return convertToAutoCompleteSuggestionList(fullTextQuery.getResultList());
}
This code doesn't throw an error but never returns anything either. Only when I remove all the boolean conditions for the boolean and numerical fields, leaving only the containsSearchString condition, does the query return anything.
According to the post "Hibernate Search 5.0 Numeric Lucene Query HSEARCH000233 issue", this happens because as of Hibernate Search 5 numerical fields are no longer treated as text fields, and you can't perform matching queries on numerical fields.
You can force the fields to be treated as text fields by annotating them with @FieldBridge, but I'd rather not do that. So my question is: how do I perform match queries on non-text fields like booleans, dates, and numbers?
EDIT: It works if I annotate all the fields required for filtering with @FieldBridge(impl = implementation.class); the index parameter must also always be set to YES.
But now all these fields will be stored as strings, which is undesirable. So I'd still like to know if there is another, more elegant way to apply filters.
EDIT 2:
@yrodiere, when I removed @FieldBridge(impl = LongBridge.class) from languageId and replaced the line .add(isPreferredOrDefaultLanguageTranslation, BooleanClause.Occur.MUST) with:
.add(queryBuilder.bool().must(queryBuilder.keyword().onField("language.languageId").matching(languageService.getDefaultLanguage().getId().toString()).createQuery()).createQuery(), BooleanClause.Occur.MUST)
I get the error:
org.hibernate.search.exception.SearchException: HSEARCH000238: Cannot create numeric range query for field 'language.languageId', since values are not numeric (Date, int, long, short or double)
However, I just discovered that matching() also accepts a Long, so I don't have to call toString() on it. When matching() uses the Long value I don't get an error, but nothing is returned either.
Only when I used new TermQuery(new Term("language.languageId", languageId.toString())) instead of matching(), while also using a LongBridge for languageId, did anything get returned. Am I defining the matching() query incorrectly?
I also have a different question that I wanted to start a new SO question for, but maybe you can answer it in this thread as well :). The question is about the includeEmbeddedObjectId parameter of @IndexedEmbedded. I think I know what this does, but I would like some confirmation from you.
I assume that when I set this to true, the id of the parent entity will be included in the Lucene document of the child entity, correct? Let's say that this parent entity is used in a matching() query that's used as a true/false condition. Is it then correct to assume that the search will be faster because the id can now also be found in the Lucene document of the child entity?
Thanks
Booleans are still indexed as strings in Hibernate Search 5. See org.hibernate.search.bridge.builtin.BooleanBridge. So boolean fields are not part of the problem here.
If you really want to create numeric queries yourself, in Hibernate Search 5 you will have to use numeric range queries, e.g.:
private Query createLanguagePredicate(Long languageId){
return org.apache.lucene.search.NumericRangeQuery.newLongRange("language.languageId", languageId,
languageId, true, true);
}
That being said, to avoid this kind of problem, you should use the Hibernate Search DSL. Then you'll pass values of the type you use in your model (here, a Long), and Hibernate Search will create the right query automatically.
Or even better, upgrade to Hibernate Search 6, which exposes a different API that is less verbose and has fewer quirks. See for yourself in the documentation of the Search DSL in Hibernate Search 6, in particular the predicate DSL.

Spring Redis: Range query "greater than" on a field

I am using Redis to store some data and later query it and update it with latest information.
Considering an example:
I receive File data, which carries info on the file and the physical storage location of that file.
One shelf has multiple racks, and each rack can have multiple files.
Each file has a version field, and it gets updated (incremented) when an operation on file is performed.
How do I plan to store?
I need to query based on "shelfID + rack ID" -- To get all files.
I need to query based on "shelfID + rack ID + version > XX" -- To get all files with version more than specified.
Getting all files belonging to a shelf and rack is achievable in Spring Data Redis.
I create a key from the combination of the two IDs and later query based on this key.
private <T> void save(String id, T entity) {
redisTemplate.opsForValue().set(id, entity);
}
But, how do I query for version field?
I had marked the "version" field as @Indexed, but the Spring repository query does not work.
@RedisHash("shelves")
public class ShelfEntity {
    @Indexed
    @Id
    private String id;
    @Indexed
    private String shelfId;
    @Indexed
    private String rackId;
    @Indexed
    private String fileId;
    @Indexed
    private Integer version;
    private String fileName;
    // and other updatable fields
}
Repository method:
List<ShelfEntity> findAllByShelfIdAndRackIdAndVersionGreaterThan(String centerCd,
String floorCd, int version);
Above, gives error:
java.lang.IllegalArgumentException: GREATER_THAN (1): [IsGreaterThan,
GreaterThan]is not supported for redis query derivation
Q. How do I query based on Version Greater than?
Q. Is it even possible with Spring Data Redis?
Q. If possible, how should I model the data (into what data structure), in order to make such queries?
Q. If we don't use Spring, how to do this in Redis using redis-cli and data structure?
Maybe something like:
<key, key, value>
<shelfId+rackId, version, fileData>
I am not sure how to model this in Redis?
Update 2:
One shelf can have N racks.
One rack can have N files.
Each file object will have a version.
This version gets updated (0 -> 1 -> 2 ...)
I want to store only the latest version of a file.
So, if we have 1 file object
shelfId - 1
rackId - 1
fileId - 1
version - 0
.... on update of version ... we should still have 1 file object.
version - 1
I tried keeping key as a MD5 hash of shelfId + rackId, in hash data structure.
But cannot query on version.
I also tried using a ZSet.
Saving it like this:
private void saveSet(List<ShelfEntity> shelfInfo) {
for (ShelfEntity item : shelfInfo) {
redisTemplate.opsForZSet()
.add(item.getId(), item, item.getVersion());
}
}
So, version becomes the score.
But the problem is that we cannot update items of a set.
So for one fileId, there are multiple versions.
When I query, I get duplicates.
Get code:
Set<ShelfEntity> objects = (Set<ShelfEntity>) (Object) redisTemplate.opsForZSet()
.rangeByScore(generateMd5Hash("-", shelfId, rackId), startVersion,
Double.MAX_VALUE);
Now, this is an attempt to mimic version > XX
Create ZSET for each shelfId and rackId combination
Use two methods to save and update records in Redis
// this method stores all shelf info in the db
public void save(List<ShelfEntity> shelfInfo) {
    for (ShelfEntity item : shelfInfo) {
        redisTemplate.opsForZSet()
            .add(item.getId(), item, item.getVersion());
    }
}
Use update to remove the old entry and insert the new one. A sorted-set member cannot be modified in place, so you need to remove the existing record and add a new one.
public void update(List<ShelfEntity> oldRecords, List<ShelfEntity> newRecords) {
if (oldRecords.size() != newRecords.size()){
throw new IllegalArgumentException("old and new records must have same number of entries");
}
for (int i=0;i<oldRecords.size();i++) {
ShelfEntity oldItem = oldRecords.get(i);
ShelfEntity newItem = newRecords.get(i);
redisTemplate.opsForZSet().remove(oldItem.getId(), oldItem);
redisTemplate.opsForZSet()
.add(newItem.getId(), newItem, newItem.getVersion());
}
}
Read items from the ZSET by score.
List<ShelfEntity> findAllByShelfIdAndRackIdAndVersionGreaterThan(String shelfId,
        String rackId, int version) {
    // rangeByScoreWithScores filters by score; rangeWithScores would select by index instead.
    // The lower bound is inclusive, so this returns entries with score >= version.
    Set<TypedTuple<ShelfEntity>> objects = (Set<TypedTuple<ShelfEntity>>) redisTemplate.opsForZSet()
        .rangeByScoreWithScores(generateMd5Hash("-", shelfId, rackId), version,
            Double.MAX_VALUE);
    List<ShelfEntity> shelfEntities = new ArrayList<>();
    for (TypedTuple<ShelfEntity> entry : objects) {
        ShelfEntity entity = entry.getValue();
        entity.setVersion(entry.getScore().intValue());
        shelfEntities.add(entity);
    }
    return shelfEntities;
}
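The duplicates described in the question come from using the whole serialized entity, version included, as the sorted-set member. A possible alternative, sketched here as a plain in-memory Java model rather than real Redis calls, is to use the fileId as the member and the version as the score, with the payload kept separately. In Redis itself, ZADD on an existing member updates its score, so no remove step would be needed with that layout. The class and method names are hypothetical, and this is only an illustration of the data model under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for one sorted set per shelf/rack combination. NOT a Redis client.
final class RackIndexSketch {

    // member (fileId) -> score (version); mirrors a ZSET whose member is the fileId
    private final Map<String, Integer> versionByFileId = new TreeMap<>();

    // fileId -> latest payload; mirrors a separate hash or value per file
    private final Map<String, String> payloadByFileId = new TreeMap<>();

    // Insert or overwrite: one entry per fileId, so an update never creates a duplicate.
    void upsert(String fileId, int version, String payload) {
        versionByFileId.put(fileId, version);
        payloadByFileId.put(fileId, payload);
    }

    // Strict "version > minVersion" filter, returning the latest payloads.
    List<String> payloadsWithVersionGreaterThan(int minVersion) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : versionByFileId.entrySet()) {
            if (entry.getValue() > minVersion) {
                result.add(payloadByFileId.get(entry.getKey()));
            }
        }
        return result;
    }

    int size() {
        return versionByFileId.size();
    }
}
```

With this layout, saving file 1 at version 0 and then at version 1 leaves a single entry for that file, and a query for versions greater than 0 returns only its latest payload.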

Elasticsearch + Spring Boot: Query creation from method names for property with @InnerField/@MultiField

I'm trying to build an Elasticsearch query using a method name and am just curious what the method name would be if one of the properties has multiple fields, like the following:
@MultiField(
    mainField = @Field(type = Text, fielddata = true),
    otherFields = {
        @InnerField(suffix = "keyword", type = Keyword)
    }
)
private String resourceType;
I need the "keyword" type (non-analyzed) so I can search on the entire string.
I have tried it as
List<Event> findByResourceType_KeywordIsIn(Collection<String> list);
and I get the following error:
No property keyword found for type String! Traversed path: Event.resourceType.
Is there any way I can tell spring-data-elasticsearch that it is the same property but an InnerField?
P.S.: I can certainly go with either @Query or just build the entire query using NativeSearchQueryBuilder, but I'm curious whether I can achieve it with just a method name (less code -> less unit testing :) )
Thanks
This won't work with the method names of Repository implementations. The logic in Spring Data that does the parsing uses the (possibly nested) properties of the Java class, whereas you need a query that searches the resourceType.keyword Elasticsearch field.
So, as you already wrote, you'll need a @Query to do this.

How to create a query parser to parse a query param in Spring REST

My query parameter looks like this:
q=name:abc+age:20+roleid:(23|45)|audience:(23|24). Here + is for AND and | is for OR.
I have to accept this query param as it is in my Spring controller and use it to query Solr to fetch the data.

@Controller
@RequestMapping("/user")
public class BooksController {
    @RequestMapping(value="/details", method=RequestMethod.GET)
    public ResponseEntity<?> getUser(final HttpServletRequest request) {
        String params = request.getParameterMap().get("q")[0];
        // passing this string to make the query in Apache Solr
    }
}
I need to write a parser that splits the param value so I can build a Solr query. How do I write a query parser that splits the above URL parameter into a Solr query satisfying the OR/AND conditions? name:abc+age:20+roleid:(23|45)|audience:(23|24) means: create a Solr query where name=abc and age=20 and roleid in (23,45) or audience in (23,24). This is how the user sends the query.
E.g.: firstName:(abc|bcd)+lastName:abc+emailId:abc+dsID:abc|countryCd:US+audienceId:(123+678)
First, using a regex, split it into segments like this:
firstName:(abc|bcd)+ ----- segment 1
lastName:abc+ ----- segment 2
emailId:abc+ ----- segment 3
dsID:abc| ----- segment 4
countryCd:US+ ----- segment 5
audienceId:(123+678) ----- segment 6
Many such segments may come in the URL.
I have a class called Queryobj:
class Queryobj{
private String field;
private List value;
private String internalOperator;
private String externalOperator;
}
Then, again using a regex, map firstName:(abc|bcd)+ like this:
field=firstName
value={abc,bcd}
internalOperator=|
externalOperator=+
Likewise, the segment emailId:abc+ maps to:
field=emailId
value=abc
internalOperator=null
externalOperator=+
The same goes for the other segments. If there are n segments, we have n objects.
After that, add each object to a linked list. If internalOperator or externalOperator is null, leave it as null. How can I achieve that?
You can use this regex pattern to get each "key:value operator" segment:
Pattern keyValuePattern = Pattern.compile("[\\w]+:([\\w@.]+|\\([\\w|+@.]+\\))[+|]?");
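Building on that pattern, here is a minimal sketch of how the segments could be mapped onto objects shaped like the Queryobj class from the question. It is a variant of the pattern with capture groups added; the class names QuerySegment and QuerySegmentParser are hypothetical, and allowing @ inside values (for e-mail addresses) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value holder mirroring the Queryobj class from the question.
final class QuerySegment {
    final String field;
    final List<String> values;
    final String internalOperator; // operator inside the parentheses, or null
    final String externalOperator; // operator after the segment, or null

    QuerySegment(String field, List<String> values, String internalOperator, String externalOperator) {
        this.field = field;
        this.values = values;
        this.internalOperator = internalOperator;
        this.externalOperator = externalOperator;
    }
}

final class QuerySegmentParser {
    // group 1: field, group 2: bare value, group 3: parenthesised list, group 4: trailing operator
    private static final Pattern SEGMENT =
        Pattern.compile("(\\w+):(?:([\\w@.]+)|\\(([\\w|+@.]+)\\))([+|])?");

    static List<QuerySegment> parse(String query) {
        List<QuerySegment> segments = new ArrayList<>();
        Matcher matcher = SEGMENT.matcher(query);
        while (matcher.find()) {
            String field = matcher.group(1);
            String external = matcher.group(4); // null when the segment is the last one
            if (matcher.group(2) != null) {
                // Single bare value: no internal operator.
                segments.add(new QuerySegment(field,
                    Collections.singletonList(matcher.group(2)), null, external));
            } else {
                String inner = matcher.group(3);
                // Assumes one kind of operator per list; "|" wins if both appear.
                String internal = inner.contains("|") ? "|" : inner.contains("+") ? "+" : null;
                segments.add(new QuerySegment(field,
                    Arrays.asList(inner.split("[|+]")), internal, external));
            }
        }
        return segments;
    }
}
```

Calling QuerySegmentParser.parse on the example string from the question yields six segments in order, which can then be added to a list and turned into a Solr query. Note that in an HTTP query string a literal + is typically decoded as a space unless it is sent as %2B, so the raw value may need to be read or encoded accordingly.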
