Elasticsearch document id type integer vs string: Is there any performance difference?

I am using Elasticsearch 2.3.1. Currently all the document ids are integers, but I have a situation where the document ids can be numeric or sometimes alphanumeric strings, so I need to make the field type 'string'.
So I need to know whether there is any performance difference based on the type of the id. Please help.

Elasticsearch will store the id as a String even if your mapping says otherwise:
"mappings": {
"properties": {
"id": {
"type": "integer"
},
That is my mapping, but when I do a sort on _id I get documents ordered as:
10489, 10499, 105, 10514...
i.e. in String order.
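For illustration (my request, not part of the original answer; my_index is a placeholder name), that ordering comes from a sort such as:
GET my_index/_search
{
  "sort": [
    { "_id": "asc" }
  ]
}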

The latest version of ES (7.14) mandates that a document's _id be a String. You can see this in the documentation for org.elasticsearch.action.index.IndexRequest: it only accepts a String _id, and no other types are supported. Example usage of IndexRequest can be found here: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-index.html
In case the above link stops working later, here is the snippet from the link:
IndexRequest request = new IndexRequest("posts");
request.id("1"); // This is the only method available to set the document's _id.
String jsonString = "{" +
        "\"user\":\"kimchy\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"trying out Elasticsearch\"" +
        "}";
request.source(jsonString, XContentType.JSON);
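To actually send the request (my addition, not part of the quoted snippet; it assumes a RestHighLevelClient instance named client), something like the following would come next:
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.xcontent.XContentType;

// index the document synchronously; the response always carries the _id as a String
IndexResponse response = client.index(request, RequestOptions.DEFAULT);
String id = response.getId();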

Related

How to filter Range criteria using ElasticSearch Repository

I need to fetch Employees who joined between 2021-12-01 and 2021-12-31. I am using ElasticsearchRepository to fetch data from an Elasticsearch index.
How can we apply range criteria using the repository?
public interface EmployeeRepository extends ElasticsearchRepository<Employee, String>, EmployeeRepositoryCustom {
    List<Employee> findByJoinedDate(String joinedDate);
}
I have tried the Between option as below, but it returns no results:
List<Employee> findByJoinedDateBetween(String fromJoinedDate, String toJoinedDate);
My index configuration:
@Document(indexName = "employee", createIndex = true, type = "_doc", shards = 4)
public class Employee {
    @Field(type = FieldType.Text)
    private String joinedDate;
Note: you seem to be using an outdated version of Spring Data Elasticsearch. The type parameter of the @Document annotation was deprecated in 4.0 and removed in 4.1, as Elasticsearch itself has not supported typed indices since version 7.
To your question:
In order to be able to run a range query on dates in Elasticsearch, the field in question must be of type date (the Elasticsearch type). For your entity this would mean (I refer to the attributes from the current version 4.3):
@Nullable
@Field(type = FieldType.Date, pattern = "uuuu-MM-dd", format = {})
private LocalDate joinedDate;
This defines joinedDate to be of type date and sets its string representation to the given pattern. The empty format argument makes sure that the additional default formats (DateFormat.date_optional_time and DateFormat.epoch_millis) are not set here. This results in the following mapping in the index:
{
  "properties": {
    "joinedDate": {
      "type": "date",
      "format": "uuuu-MM-dd"
    }
  }
}
If you check the mapping in your index (GET localhost:9200/employee/_mapping) you will see that in your case the
joinedDate is of type text. You will either need to delete the index and have it recreated by your application or
create it with a new name and then, after the application has written the mapping, reindex the data from the old
index into the new one (https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-reindex.html).
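A minimal reindex call might look like this (the new index name employee_v2 is just a placeholder):
POST _reindex
{
  "source": { "index": "employee" },
  "dest": { "index": "employee_v2" }
}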
Once you have the index with the correct mapping in place, you can define the method in your repository like this:
List<Employee> findByJoinedDateBetween(LocalDate fromJoinedDate, LocalDate toJoinedDate);
and call it:
repository.findByJoinedDateBetween(LocalDate.of(2021, 1, 1), LocalDate.of(2021, 12, 31));

ElasticSearch with Nest: Partial search using multiple words using Query<>.Wildcard

I have been pulling my hair out trying to configure partial search over Elasticsearch-indexed data using the NEST library version 5.3.1 (the same version applies to one of its dependencies, Elasticsearch.Net).
As per suggestions found online, I used data attributes to specify the analyzer type on some of the indexed properties, as shown below:
public class Article
{
    public int Id { get; set; }

    [Completion(Analyzer = "standard", PreservePositionIncrements = true, PreserveSeparators = true)]
    public string Title { get; set; }

    public string Url { get; set; }
}
I have at least one record in the search index for type "Article" with a title starting with "The greatest ....". Whenever I perform a partial search for the keyword "greatest" using the code below, it works just fine, returning matching search results.
MultiTermQueryRewrite multiqueryRewrite = null;
var searchQuery = Query<Article>.Wildcard(f => f.Title, "*greatest*", rewrite: multiqueryRewrite);
var client = ElasticsearchClient.GetClient<Article>();
return client.Search<Article>(s => s.Query(searchQuery));
But... if I try searching for "the greatest" with any of the variations listed below, I don't get any results back.
var searchQuery = Query<Article>.Wildcard(f => f.Title, "*the greatest*", rewrite: multiqueryRewrite);
or
var searchQuery = Query<Article>.Wildcard(f => f.Title, "*the*greatest*", rewrite: multiqueryRewrite);
or even
var searchQuery = Query<Article>.Wildcard(f => f.Title, "*the?greatest*", rewrite: multiqueryRewrite);
I am new to the ElasticSearch product, so any help would be greatly appreciated.
Thanks in advance for your help.
As per the documentation:
Wildcard: Matches documents that have fields matching a wildcard expression (not analyzed).
Since the title field is analyzed, it gets tokenized before being indexed. Text such as "The Greatest" is tokenized and then lower-cased (the behaviour of the standard analyzer), so it is stored in the inverted index as two tokens, the and greatest.
When you search for *greatest*, a match is found because there is a token corresponding to that pattern.
But when you search for *the greatest*, nothing is found because no single token contains that text.
You can use a query_string query instead:
var searchQuery = Query<Article>.QueryString(c => c
    .Query("*the greatest*")
    .DefaultField(p => p.Title));
Hope this helps!!
The standard analyzer applied to the Title field produces lower-cased terms, so your "The Greatest" title is stored as [the, greatest]. You could consider using the keyword analyzer instead, but please note that you will then have to deal with word casing.
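For example (my sketch, not from the original answer), the attribute mapping for a keyword-analyzed title in NEST 5.x could look like this:
public class Article
{
    public int Id { get; set; }

    // the keyword analyzer emits the whole title as a single, case-preserved term,
    // so "*the greatest*" can match, but only if the casing agrees
    [Text(Analyzer = "keyword")]
    public string Title { get; set; }

    public string Url { get; set; }
}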

Converting stringified float to float in Elasticsearch

I have a mapping in an Elasticsearch index with a certain string field called duration. However, duration is actually a float: it is passed in as a string by my provisioning chain, so it always looks something like "0.12". I'd now like to create a new index with a new mapping where the duration field is a float. Here's what I've done, which isn't working at all, either for old entries or for incoming new ones.
First, I create my new index with my new mapping:
PUT new_index
{
  "mappings": {
    "new_mapping": {
      "properties": {
        "duration": { "type": "float" },
        ...
      }
    }
  }
}
I then check that the new mapping is really in place using:
GET new_index/_mapping
I then copy the contents of the old index into the new one:
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
However, when I look at the entries in new_index, be it the ones I added with that last POST or the new ones that have since come in through my provisioning chain, the duration entry is still a string, even though its _type is new_mapping.
What am I doing wrong here? Or is there simply no way to convert a string to a float within Elasticsearch?
The duration field in the new index will be indexed as a float (as per your mapping); however, if the duration field in the source document is still a string, it will stay a string in the _source while still being indexed as a float.
You can run a range query "from 1.00 to 3.00" on the new index and compare with what you get from the old index. Since the old index runs a lexical range (because of the string type) you might get results with a duration of 22.3, while in the new index you'll only get durations that are really between 1.00 and 3.00, as in the example below.
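For illustration, such a range query (the bounds are just example values) looks like this:
GET new_index/_search
{
  "query": {
    "range": {
      "duration": {
        "gte": 1.00,
        "lte": 3.00
      }
    }
  }
}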

Elasticsearch 2.x index mapping _id

I ran ElasticSearch 1.x (happily) for over a year. Now it's time for some upgrading - to 2.1.x. The nodes should be turned off and then (one-by-one) on again. Seems easy enough.
But then I ran into trouble. The major problem is the field _uid, which I created myself so that I could derive the exact location of a document from a random other one (by hashing a value). This way I knew that only that exact one would be returned. During the upgrade I got:
MapperParsingException[Field [_uid] is a metadata field and cannot be added inside a document. Use the index API request parameters.]
But when I try to map my former _uid to _id (which should also be good enough) I get something similar.
The reason I used the _uid param is that the lookup time is a lot lower than that of a termsQuery (or the like).
How can I still use the _uid or _id field in each document for fast (and exact) lookup of certain documents? Note that I have to fetch thousands of exact documents at a time, so I need an ID-like query. Also, the _uid or _id of a document may not exist (in that case I want, as now, a 'false-like' result).
Note: the upgrade from 1.x to 2.x is pretty big (filters gone, no dots in names, no default access to _xxx fields).
Update (no avail):
Updating the mapping of _uid or _id using:
final XContentBuilder mappingBuilder = XContentFactory.jsonBuilder().startObject().startObject(type)
        .startObject("_id").field("enabled", "true").field("default", "xxxx").endObject()
        .endObject().endObject();
CLIENT.admin().indices().prepareCreate(index).addMapping(type, mappingBuilder)
        .setSettings(Settings.settingsBuilder().put("number_of_shards", nShards).put("number_of_replicas", nReplicas))
        .execute().actionGet();
results in:
MapperParsingException[Failed to parse mapping [XXXX]: _id is not configurable]; nested: MapperParsingException[_id is not configurable];
Update: changed the name to _id instead of _uid, since the latter is built out of _type#_id. So I'd need to be able to write to _id.
Since there appears to be no way around setting _uid or _id directly, I'll post my solution. I mapped every document that had a _uid to a uid field (for internal referencing). At some point it came to me: you can set the relevant id explicitly when indexing.
To bulk-insert documents with an explicit id you can:
final BulkRequestBuilder builder = client.prepareBulk();
for (final Doc doc : docs) {
    // the third argument of prepareIndex is the explicit document id
    builder.add(client.prepareIndex(index, type, doc.getId()).setSource(doc.toJson()));
}
final BulkResponse bulkResponse = builder.execute().actionGet();
Notice the third argument; it may be null (or you can use the two-argument variant), in which case the id will be generated by ES.
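It is also worth checking the response for per-item failures (my addition, not part of the original answer):
// individual bulk items can fail independently, so inspect the response
if (bulkResponse.hasFailures()) {
    throw new IllegalStateException(bulkResponse.buildFailureMessage());
}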
To then get some documents by id you can:
final List<String> uids = getUidsFromSomeMethod(); // ids of the documents to get
final MultiGetRequestBuilder builder = CLIENT.prepareMultiGet();
builder.add(index_name, type, uids);
final MultiGetResponse multiResponse = builder.execute().actionGet();
for (final MultiGetItemResponse response : multiResponse.getResponses()) {
    if (only_want_to_know_whether_it_exists) {
        // in this case I simply want to know whether the doc exists
        exist.add(response.getResponse().isExists());
    } else {
        // retrieve the doc as json (from the item response, not the request builder)
        final String json = response.getResponse().getSourceAsString();
        // handle JSON
    }
}
If you only want a single document:
final GetResponse response = client.prepareGet().setIndex(index).setType(type).setId(id).execute().actionGet();
Doing the single lookup via the REST API is described under mapping-id-field in the documentation (the following is copied from there):
# Example documents
PUT my_index/my_type/1
{
  "text": "Document with ID 1"
}
PUT my_index/my_type/2
{
  "text": "Document with ID 2"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ]
    }
  },
  "script_fields": {
    "UID": {
      "script": "doc['_id']"
    }
  }
}

Spring Data Solr Facet Range example?

Using Spring Data Solr 1.4, I have a FacetQuery defined as such:
@Facet(fields = { "client", "state", "market", "price" }, limit = 10)
FacetPage<SearchResponse> findTerm(String fieldName, String fieldValue, String filterField, String filterValue, Pageable pageable);
How do I add ranges to the facet price? I don't want all the single values, but 10-20, 20-30, 30-40, etc.
This still seems to be an open issue in that version. Check:
https://github.com/spring-projects/spring-data-solr/pull/29
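For reference, later Spring Data Solr releases added programmatic range faceting on FacetOptions. A hedged sketch from memory, assuming version 2.1+ (verify the exact class and method signatures against your version):
// facet prices into buckets 10-20, 20-30, 30-40 (start 10, end 40, gap 10)
FacetQuery query = new SimpleFacetQuery(new SimpleStringCriteria("*:*"));
query.setFacetOptions(new FacetOptions()
        .addFacetByRange(new FacetOptions.FieldWithNumericRangeParameters("price", 10, 40, 10)));
FacetPage<SearchResponse> page = solrTemplate.queryForFacetPage(query, SearchResponse.class);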
