Differentiating fields that didn't exist before a schema change in Elasticsearch

I'm trying to add a field to an Elasticsearch schema. I already have about a million records in the index that don't have the field, and I need to be able to differentiate those from the records added after the field was introduced. Using the modified date is an absolute last resort, because I don't know when this will be turned on in production.
What I considered was having the old records return something like
{
  myField: null
}
while the new ones would return
{
  myField: { }
}
but I can't find a way to set the field on insert.
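For what it's worth, documents indexed before the mapping change simply have no value for the new field, so one hedged option is to select them with an exists query rather than stamping anything at insert time. A minimal sketch, assuming an index named my_index and the field name from the question:

GET /my_index/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": { "field": "myField" }
      }
    }
  }
}

Dropping the must_not wrapper selects only the newer documents instead. Note that exists treats an explicit null as missing, so the myField: null versus myField: { } distinction above would not be visible to it.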

Related

Terms query does not work on keyword field which contains an array of values

I am a beginner with Elasticsearch. I recently added a new field jc_job_meta_field of type keyword (see the mapping output below), and my index is en-gb. I expect it to be an array holding a bunch of values, and I now have a document with ["Virtual", "Hybrid"] in that field. I want to be able to search for all entries with Virtual in jc_job_meta_field, but when I run a terms query like this
{
  "query": {
    "terms": {
      "jc_job_meta_field": ["Virtual"]
    }
  }
}
Nothing is returned (see the query output below). Shouldn't it at least return that exact document with [Virtual, Hybrid]? I checked a similar post here, and it seems like I am doing exactly what's supposed to work. What went wrong? Thanks in advance!
[Screenshots omitted: my mapping and field values; my query returning no hits.]
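As a sanity check, here is a minimal reproduction on a scratch index (the index name en-gb-test is assumed; the field name is from the question) showing that a terms query does match array-valued keyword fields. If this works but the live index returns nothing, the live field is likely mapped as text (so the stored values were analyzed and lowercased) or the stored values differ in case:

PUT /en-gb-test
{
  "mappings": {
    "properties": {
      "jc_job_meta_field": { "type": "keyword" }
    }
  }
}

POST /en-gb-test/_doc/1?refresh=true
{
  "jc_job_meta_field": ["Virtual", "Hybrid"]
}

GET /en-gb-test/_search
{
  "query": {
    "terms": { "jc_job_meta_field": ["Virtual"] }
  }
}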

Is it possible to query by field data type in Elasticsearch?

I need to query Elasticsearch by field data type, and I have not been successful in creating that query. I want to be able to (1) specify the type I want to search for in the query, i.e. all fields of {"type": "boolean"}, and also (2) take a field and see what its type is.
The reason is to check that each field is mapped correctly. Say I inserted the following data into this index and its fields, and I now want to see programmatically what the data types of those fields are. How would I query that?
POST /index_name1/_doc/
{
  "field1": "hello_field_2",
  "field2": "123456.54321",
  "field3.field4": false,
  "field3.field5.field10": "POINT(-117.918976 33.812511)",
  "field3.field5.field8": "field_of_dragons",
  "field3.field5.field9": "2022-05-26T07:47:26.133275Z"
}
I have tried:
GET /index_name1/_search
{
  "query": {
    "wildcard": {
      "field3.field4": { "type": "*" }
    }
  }
}
That gives [wildcard] query does not support [type].
I've tried many other queries and searched the documentation and threads, but can't find anything that will do this. It has got to be possible, right?
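Field types live in the index mapping, not in the documents, so they aren't reachable with a search query, but they can be read programmatically. A sketch using the question's index name: the first request returns the full mapping, and the second, the field capabilities API, reports the type of every matching field and also works across multiple indices:

GET /index_name1/_mapping

GET /index_name1/_field_caps?fields=*

The _field_caps response groups each field by type, so finding all boolean fields amounts to filtering that response client-side; there is no server-side query that selects fields by their mapped type.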

How elasticsearch updateByQuery syntax works

I've been working with Elasticsearch for some days. As I'm creating a CRUD, I've come across the updateByQuery method. I'm working with NestJS, and the way I'm updating a field is:
await this.elasticSearch.updateByQuery({
  index: 'my_index_user',
  body: {
    query: {
      match: {
        name: 'user_name',
      },
    },
    script: {
      inline: 'ctx._source.name = "new_user_name"',
    },
  },
});
My question is:
Why does Elasticsearch need the syntax 'ctx._source.name = "new_user_name"' to specify what the new value of the name field should be? What is ctx._source in this context?
As mentioned in the official doc on source filtering, _source holds the field values exactly as they were sent to Elasticsearch; they are stored as-is and do not go through the analysis process.
Take the example of a text field with the standard analyzer (the default) applied, in which you store the value foo bar. Elasticsearch breaks the field value apart during analysis, so two tokens, foo and bar, are stored in the inverted index; but if you want to see the original value, i.e. foo bar, you can check _source and get it.
Hence it's always better to have the original value (untouched by the analysis process) available in _source, and this API updates the field value there. In an update script, ctx is the context for the document being processed and ctx._source is that document's _source, so assigning to ctx._source.name rewrites the stored name value. Keeping the original in _source also helps when you later want to reindex into a new index or change how the field is analyzed.
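For reference, the NestJS client call above corresponds to a plain REST request along these lines (same index, field, and values as in the question; note that in current versions the script parameter is named source rather than the older inline):

POST /my_index_user/_update_by_query
{
  "query": {
    "match": { "name": "user_name" }
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.name = 'new_user_name'"
  }
}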

Type of field for prefix search in Elastic Search

I'm confused about which field type I should use for prefix search; many answers suggest search_as_you_type, but I think autocomplete is not what I'm going for.
I have a UUID field:
id: 34y72ca1-3739-41ff-bbec-f6d17479384c
The following terms should return the doc above:
3
34
34y72ca1
34y72ca1-3739
34y72ca1-3739-41ff-bbec-f6d17479384c
Using 3739 should not return it, as the ID doesn't start with 3739. Partial search was what I initially was going for, but the wildcard field type is not supported on Amazon AWS, so I compromised on prefix search instead.
I tried the search_as_you_type field, but it doesn't return the result when I use the whole ID. My actual use case is that results are shown when the user presses Enter rather than live as they type, so it's OK if speed is compromised; I just hope for something that performs well over many rows of data.
Thanks
If you have not explicitly defined any index mapping, then you need to use the id.keyword field instead of the id field for the prefix query to return the appropriate results. This sub-field uses the keyword analyzer instead of the standard analyzer:
{
  "query": {
    "prefix": {
      "id.keyword": {
        "value": "34y72ca1"
      }
    }
  }
}
Otherwise, you can modify your index mapping by adding multi-fields for the id field, along the lines of the sketch below.
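A minimal sketch of such a mapping, with the index name my-index assumed; the raw sub-field is a keyword copy of id that the prefix query can then target as id.raw:

PUT /my-index
{
  "mappings": {
    "properties": {
      "id": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}

If the index is large, the index_prefixes mapping parameter on text fields can also speed up prefix queries, at the cost of some extra index size.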

Group by field in found document

The best way to explain what I want to accomplish is by example.
Let us say that I have an object with fields name, color, and transaction_id. I want to search for documents where name and color match specified values, which I can accomplish easily with boolean queries.
But I do not want only the documents found by that search. I also want the transaction those documents belong to, which is identified by transaction_id. For example, if a document is found with transaction_id equal to 123, I want my query to return all documents with transaction_id equal to 123.
Of course, I can do that with two queries: a first one to fetch all documents that match the criteria, and a second one that returns all documents having one of the transaction_id values found by the first.
But is there any way to do it in a single query?
You can use a parent-child relationship between the transaction and your objects, or denormalize your data to nest the objects inside the transactions. Otherwise you'll have to do an application-side join, meaning two queries.
Try an index mapping similar to the following, and include a parent id when indexing the objects (a single-query example follows the mapping):
{
  "mappings": {
    "transaction": {},
    "object": {
      "_parent": {
        "type": "transaction"
      }
    }
  }
}
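With that mapping in place, a single request can return every object whose transaction contains a matching child, by wrapping a has_child query inside a has_parent query. A hedged sketch (the index name my-index and the term values are assumptions; note that _parent mappings belong to pre-6.0 Elasticsearch, as in the guide linked below, while newer versions use the join field type instead):

GET /my-index/object/_search
{
  "query": {
    "has_parent": {
      "parent_type": "transaction",
      "query": {
        "has_child": {
          "type": "object",
          "query": {
            "bool": {
              "must": [
                { "term": { "name": "some_name" } },
                { "term": { "color": "red" } }
              ]
            }
          }
        }
      }
    }
  }
}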
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html
