Accessing other documents inside update and reindex script

Accessing other documents inside update and reindex script - elasticsearch

Is there anyway to run a query and fetch matching documents inside a script?
For example I want to add all of the documents matching the query x to the field f of my document d like this:
d:
{
other fields...
"f": [ {doc 1 matching the query}, {doc 2 matching the query}, ... ]
It is possible in sql to use a query inside another query. Does elasticsearch have anything similar?
Tried writing all kinds of code in script to fetch data from indexes in my database but didn't work.

Related

Elastic search wildcard search space issue

Consider index field "ProductName" having the value "dove 3.75oz" and when user searches for "dove 3.75oz" text below bool query is working fine to retreive the document:
{"bool":{"must":[{"wildcard":{"ProductName":{"value":"dove"}}},{"wildcard":{"ProductName":{"value":"3.75oz"}}}]}}
If user searches for "dove 3.75 oz" (Space between "3.75" and "oz") the bool query is failing to retrieve the same document:
{"bool":{"must":[{"wildcard":{"ProductName":{"value":"dove"}}},{"wildcard":{"ProductName":{"value":"3.75 oz"}}}]}}
Question: How to design a query using a wildcard query that supports space or no spaces? Please share an example.

Text fields values are broken into tokens by default and then stored. So something like "hello man"" will be saved separately as hello and man because of the space between them. And that is exactly why this will not work with a wildcard query.
{"wildcard":{"ProductName":{"value":"3.75 oz"}}}
It only works for single tokens. For wildcard queries you can use a special field type called wildcard.
If you do not want to reindex your data, try phrase search like:
"match_phrase": {
"ProductName": {
"query": "3.75 oz"
}
}

Elasticsearch query based on properties of another document

Is there a way in ES to do a single query that finds documents that are based on values "close" (whose logic I determine) to values in another document?
Example: i have document like this:
{
"myId": 10,
"price": 200
}
Now I want to run a query that finds documents that are within 100 either side of the price of the above document (but I don't know the price of document on the client..all I have is the myId)
In other words, i want to write a client method like this:
GetSimilarDocuments(int myId);
Is that possible to do in a single ES query? Or do I need two round trips? (get the document, then do another query based on the values of the document)

Unable to loop through array field ES 6.1

I'm facing a problem in ElasticSearch 6.1 that I cannot solve and I don't know why. I have read the docs several times and maybe I'm missing something.
I have a scripted query that needs to do some calculation before decides if a record is available or not.
Here is the following script:
https://gist.github.com/dunice/a3a8a431140ec004fdc6969f77356fdf
What I'm doing is trying to loop though an array field with the following source:
"unavailability": [
{
"starts_at": "2018-11-27T18:00:00+00:00",
"local_ends_at": "2018-11-27T15:04:00",
"local_starts_at": "2018-11-27T13:00:00",
"ends_at": "2018-11-27T20:04:00+00:00"
},
{
"starts_at": "2018-12-04T18:00:00+00:00",
"local_ends_at": "2018-12-04T15:04:00",
"local_starts_at": "2018-12-04T13:00:00",
"ends_at": "2018-12-04T20:04:00+00:00"
},
]
When the script is executed it throws the error: No field found for [unavailability] in mapping with types [aircraft]
Is there any clue to make it work?
Thanks
UPDATE
Query:
https://gist.github.com/dunice/3ccd7d83ca6ddaa63c11013b84e659aa
UPDATE 2
Mapping:
https://gist.github.com/dunice/f8caee114bbd917115a21b8b9175a439
Data example:
https://gist.github.com/dunice/8ad0602bc282b4ca19bce8ae849117ad

You cannot access an array present in the source document via doc_values (i.e. doc). You need to directly access the source document via the _source variable instead, like this:
for(int i = 0; i < params._source['unavailability'].length; i++) {
Note that depending on your ES version, you might want to try ctx._source or just _source instead of params._source

I solve my use-case in a different approach.
Instead having a field as array of object like unavailability was I decided to create two fields as array of datetime:
unavailable_from
unavailable_to
My script walks through the first field then checks the second with the same position.
UPDATE
The direct access to _source is disabled by default:
https://github.com/elastic/elasticsearch/issues/17558

Group by field in found document

The best way to explain what I want to accomplish is by example.
Let us say that I have an object with fields name and color and transaction_id. I want to search for documents where name and color match the specified value and that I can accomplish easily with boolean queries.
But, I do not want only documents which were found with search query. I also want transaction to which those documents belong, and that is specified with transaction_id. For example, if a document has been found with transaction_idequal to 123, I want my query to return all documents with transaction_idequal to 123.
Of course, I can do that with two queries, first one to fetch all documents that match criteria, and the second one that will return all documents that have one of transaction_idvalues found in first query.
But is there any way to do it in a single query?

You can use parent-child relation ship between transaction and your object. Or nest the denormalize your data to include the objects in the transactions. Otherwise you'll have to do an application side join, meaning 2 queries.
Try an index mapping similar to the following, and include a parent_id in the objects.
{
"mappings": {
"transaction": {},
"object": {
"_parent": {
"type": "transaction"
}
}
}
}
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html

For 1 billion documents, Populate data from one field to another fields in the same collection using MongoDB

I need to populate data from one field to multiple fields on the same collection. For example:
Currently I have document like below:
{ _id: 1, temp_data: {temp1: [1,2,3], temp2: "foo bar"} }
I want to populate into two different fields on the same collection as like below:
{ _id: 1, temp1: [1,2,3], temp2: "foo bar" }
I have one billion documents to migrate. Please suggest me the efficient way to update all one billion documents?

In your favorite language, write a tool that runs through all documents, migrates them, and store them in a new database.
Some hints:
When iterating the results, make sure they are sorted (e.g. on the _id) so you can implement resume should your migration code crash at 90%...
Do batch inserts: read, say, 1000 items, migrate them, then write 1000 items in a single batch to the new database. Reads are automatically batched.
Create indexes after the migration, not before. That will be faster and lead to less fragmentation

Here I made a query for you, use following query to migrate your data
db.collection.find().forEach(function(myDoc) {
db.collection_new.update(
{_id: myDoc._id},
{
$unset: {'temp_data': 1},
$set: {
'temp1': myDoc.temp_data.temp1,
'temp2': myDoc.temp_data.temp2
}
},
{ upsert: true }
)
});
To learn more about foreach cursor please visit link
Need $limit and $skip operator to migrate data in batches. In update query i have used upsert beacuse there if already exist it will update otherwise inserted entry wiil be new.
Thanks

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Accessing other documents inside update and reindex script - elasticsearch

Related

Elastic search wildcard search space issue

Elasticsearch query based on properties of another document

Unable to loop through array field ES 6.1

Group by field in found document

For 1 billion documents, Populate data from one field to another fields in the same collection using MongoDB

Categories

Resources