Elasticsearch 2.4: retrieve all records matching all search criteria using scroll, and sorting

I am using Elasticsearch for the first time, and based on my requirements I have some doubts and questions about scroll.
I need to retrieve all data that matches all of the search criteria.
1) I am trying to use scroll, but while searching about it I found
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_21_search_changes.html
which says that search type scan is deprecated,
but NEST still supports it.
So should I use "search type scan" or "sort by _doc"? (I am using Elasticsearch 2.4.)
2) Can I sort on any field when scrolling?
3) When clearing the scroll with
var test2 = client.ClearScroll(x => x.ScrollId(results.ScrollId));
I get the following error:
Invalid NEST response built from a unsuccessful low level call on DELETE: /_search/scroll
Audit trail of this API call:
[1] BadResponse: Node: http://mydomain#localhost:9200/ Took: 00:00:00.0160110
OriginalException: System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at Elasticsearch.Net.HttpConnection.Request[TReturn](RequestData requestData) in C:\Users\russ\source\elasticsearch-net-2.x\src\Elasticsearch.Net\Connection\HttpConnection.cs:line 141
Request:
{"scroll_id":["c2NhbjswOzE7dG90YWxfaGl0czoxMjs="]}
Response:
{}
So is this the correct way to clear a scroll or not?
Update: below is my code:
List<Object> indexedList = new List<Object>();
ISearchResponse<ListingSearch> listingResult =
    client.Search<ListingSearch>(search => search
        .Index(Constant.ES_INDEX)
        .Type(Constant.ES_TYPE)
        .From(listingSearch.StartIndex)
        .Size(10)
        .Source(s => s.Include(i => i.Fields(outpputFields)))
        .Query(query => query
            .Bool(boolean => boolean
                .Must(must => must.Term(t => t.Field("is_deleted").Value(false)))
            )
        )
        .Sort(x => x.Field("_doc", SortOrder.Ascending))
        .Scroll("60s")
    );
var results = client.Scroll<ListingSearch>("60s", listingResult.ScrollId);
while (results.Documents.Any())
{
    foreach (var doc in results.Hits)
    {
        indexedList.Add(doc);
    }
    results = client.Scroll<ListingSearch>("60s", results.ScrollId);
}
// Clear scroll
var test2 = client.ClearScroll(x => x.ScrollId(results.ScrollId));
With the above code I get data,
but if I change the size from 10 to 1000, I get no records.
I'm not sure whether the issue is the amount of data, because my ES index has only 12-15 documents.

NEST 2.x has SearchType.Scan because it is compatible with all Elasticsearch 2.x versions, so the search type needs to exist when NEST 2.x is used against Elasticsearch 2.0. Sending the search type to later versions won't have any effect.
The most efficient way to retrieve documents with scroll is to sort by _doc, but you can specify any sort parameters when scrolling.
When using the scroll API, you should use the scroll_id from the previous response in the next scroll call to fetch the next set of results. Once you have finished with a scroll, it is a good idea to clear it by calling ClearScroll(), as you are doing. Your call looks correct; perhaps the scroll_id has already expired by the time you make the clear call?
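A likely explanation for the update (data with size 10, nothing with size 1000): when sorting by _doc, the search request that opens the scroll already returns the first batch of hits in listingResult, but the loop only starts collecting from the first Scroll() call, so that batch is discarded. With 12-15 documents and a size of 1000, every hit arrives in the initial response and the first Scroll() call comes back empty. (With search type scan, by contrast, the initial response carries no hits.) The Java sketch below models the loop against a hypothetical in-memory FakeScrollServer; it is not NEST or any real client API, and it keeps scroll ids constant for simplicity, whereas real code should always send the id from the latest response.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One batch of scroll results plus the scroll id to send with the next request. */
final class ScrollPage {
    final List<String> hits;
    final String scrollId;

    ScrollPage(List<String> hits, String scrollId) {
        this.hits = hits;
        this.scrollId = scrollId;
    }
}

/** Hypothetical in-memory stand-in for a server with a scroll API (not a real client). */
final class FakeScrollServer {
    private final List<String> docs;
    private final Map<String, Integer> offsets = new HashMap<>(); // open scroll id -> next offset
    private final Map<String, Integer> sizes = new HashMap<>();   // open scroll id -> batch size
    private int counter = 0;

    FakeScrollServer(List<String> docs) {
        this.docs = docs;
    }

    /** Like a search with scroll=60s sorted by _doc: opens a context AND returns the first batch. */
    ScrollPage search(int size) {
        String id = "scroll-" + counter++;
        offsets.put(id, 0);
        sizes.put(id, size);
        return scroll(id);
    }

    /** Like the scroll API: returns the next batch, empty once everything has been returned. */
    ScrollPage scroll(String scrollId) {
        int from = offsets.get(scrollId);
        int to = Math.min(from + sizes.get(scrollId), docs.size());
        offsets.put(scrollId, to);
        return new ScrollPage(new ArrayList<>(docs.subList(from, to)), scrollId);
    }

    /** Like clear scroll: returns false if the context is unknown (already cleared or expired). */
    boolean clearScroll(String scrollId) {
        sizes.remove(scrollId);
        return offsets.remove(scrollId) != null;
    }
}

final class ScrollLoop {
    /** Correct loop: keep the initial batch, then scroll until an empty batch comes back. */
    static List<String> fetchAll(FakeScrollServer server, int size) {
        List<String> all = new ArrayList<>();
        ScrollPage page = server.search(size);
        while (!page.hits.isEmpty()) {
            all.addAll(page.hits);
            page = server.scroll(page.scrollId); // always send the most recent scroll id
        }
        server.clearScroll(page.scrollId);
        return all;
    }

    /** The question's loop: the initial batch is never added, so it is silently lost. */
    static List<String> fetchSkippingFirstBatch(FakeScrollServer server, int size) {
        List<String> all = new ArrayList<>();
        ScrollPage first = server.search(size); // hits of this response are ignored
        ScrollPage page = server.scroll(first.scrollId);
        while (!page.hits.isEmpty()) {
            all.addAll(page.hits);
            page = server.scroll(page.scrollId);
        }
        server.clearScroll(page.scrollId);
        return all;
    }
}
```

With 12 documents, fetchSkippingFirstBatch returns 2 documents for size 10 and none for size 1000, matching the symptom, while fetchAll returns all 12 either way. In the C# code, the equivalent change would be to add listingResult.Hits to indexedList before entering the while loop.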

Related

Elasticsearch doesn't give me any error when updating a non-existent document

I'm running an updateByQuery operation in Elasticsearch using Spring Data Elasticsearch (Spring Boot parent v2.6.1, Elasticsearch 7.15.2).
I have 2 documents stored in my ES index.
When I search for a non-existent document, I don't get any error, so I can't tell whether the update actually ran.
Updates for a document that exists work fine. I'd like to find a way to log it when no documents are updated.
What should I look at? What should I change so that I get some indication of whether there was an update?
Here's my code snippet:
UpdateByQueryRequest request = new UpdateByQueryRequest("index");
Map<String, Object> data = new HashMap<>();
data.put("marks", "30");
data.put("name", "timmy");
data.put("roll_number", "10");
request.setScript(
    new Script(
        ScriptType.INLINE, "painless",
        "if (ctx._source.name == params.name && ctx._source.roll_number == params.roll_number) {ctx._source.marks=params.marks;}",
        data));
BulkByScrollResponse resp = globalClient.updateByQuery(request, RequestOptions.DEFAULT);
log.info("response: {}", resp.getStatus());
I've added the response status as well. What I find weird is that for both existing and non-existent documents, the updated count is 2, the same as the number of documents in my index.
Response for a non-existent record:
response: BulkIndexByScrollResponse[sliceId=null,updated=2,created=0,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s]
Response for an existing record:
response: BulkIndexByScrollResponse[sliceId=null,updated=2,created=0,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s]
This is pure Elasticsearch code; there is nothing from Spring Data Elasticsearch in it.
Where do you specify the query? I don't see one in your code. That means all documents will be updated: 2 in your case.
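It also helps to know how update by query counts results: every matched document goes through the script and is reindexed, and it is counted in updated even if the script leaves _source unchanged, unless the script sets ctx.op = 'noop', in which case it is counted in noops instead. The Java sketch below is a hypothetical in-memory model of that counting; UpdateCounts, UpdateScript and UpdateByQueryModel are illustrative names, not Elasticsearch classes.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Counters analogous to the updated/noops fields of BulkByScrollResponse. */
final class UpdateCounts {
    int updated;
    int noops;
}

/** Models a script: mutates _source and returns true if it would set ctx.op = 'noop'. */
interface UpdateScript {
    boolean apply(Map<String, String> source);
}

final class UpdateByQueryModel {
    /** Every document matched by the query goes through the script and is counted. */
    static UpdateCounts run(List<Map<String, String>> index,
                            Predicate<Map<String, String>> query,
                            UpdateScript script) {
        UpdateCounts counts = new UpdateCounts();
        for (Map<String, String> source : index) {
            if (!query.test(source)) {
                continue; // not matched: not touched and not counted
            }
            if (script.apply(source)) {
                counts.noops++;   // script asked to skip this document
            } else {
                counts.updated++; // reindexed, even if _source did not actually change
            }
        }
        return counts;
    }

    /** The question's script; noopOtherwise models adding "else { ctx.op = 'noop'; }". */
    static UpdateScript setMarksIfMatch(String name, String rollNumber, String marks,
                                        boolean noopOtherwise) {
        return source -> {
            if (name.equals(source.get("name")) && rollNumber.equals(source.get("roll_number"))) {
                source.put("marks", marks);
                return false;
            }
            return noopOtherwise;
        };
    }
}
```

With a match-all predicate and a name that does not exist, the model reports updated = 2, as in the question. With the real high-level REST client, the corresponding options would be to restrict the request with request.setQuery(...) (for example a query on name and roll_number) and check resp.getUpdated(), or to add an else branch setting ctx.op = 'noop' to the script and check resp.getNoops().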

How can I use X.PagedList with Elasticsearch NEST?

Background
I'm using Elasticsearch as the search engine for a new ASP.NET Core 2.1 website I'm working on, and the NEST API to integrate with it. I want to use X.PagedList to handle the paging for me.
I've used it in other ASP.NET Core projects, where it worked well when querying data in MS SQL Server.
Code
ISearchResponse<Foo> searchResponse =
    _elasticSearchClient.Search<Foo>(s => s
        .Query(q => q
            .Bool(b => b.Filter(distanceFilters))
        )
        .Source(src => src
            .Includes(i => i
                .Fields(
                    f => f.Field1,
                    f => f.Field2,
                    f => f.Field3
                )
            )
        )
        .From(options.From)
        .Size(options.Size)
    );
var hitsMD = searchResponse.HitsMetadata;
var results = hitsMD?.Hits.Select(s => new Hit()
    {
        Index = s.Index,
        Id = s.Id,
        Score = s.Score,
        Job = s.Source
    }
).ToPagedList(PageNumber, PageSize);
Issue
When I call .ToPagedList on the search results returned by Elasticsearch, it only shows one page of results.
The issue is that Elasticsearch has its own paging mechanism, so it only returns one page of hits.
Because Elasticsearch passes back the total number of hits, I had the idea of telling the PagedList how many items are in the list by setting the PagedList.TotalItemCount property. However, I can't, because its setter is private.
I've tried removing From and Size, but then the search returns 10 hits, which is Elasticsearch's default size, presumably set for performance reasons.
Question
How can I make use of the X.PagedList package while integrating with Elasticsearch through the NEST API?
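You've basically got all the pieces here already. All you're missing is StaticPagedList<T>. Since paging is already handled by Elasticsearch, you simply need to define a static paging setup: instead of calling .ToPagedList(), pass the single page of hits together with the page number, the page size, and the total hit count reported by Elasticsearch. This assumes options.From is (PageNumber - 1) * PageSize and options.Size is PageSize, so the hits returned are exactly the requested page, i.e.:
var hits = searchResponse.Hits.Select(h => new Hit { Index = h.Index, Id = h.Id, Score = h.Score, Job = h.Source });
var pagedResults = new StaticPagedList<Hit>(hits, PageNumber, PageSize, (int)searchResponse.Total);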
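The metadata StaticPagedList needs comes down to simple arithmetic: the page number maps to the from offset of the search, and the page count follows from the total hit count. A Java sketch of that arithmetic; Paging and its methods are hypothetical helpers, assuming 1-based page numbers as in X.PagedList:

```java
/** Page arithmetic for a 1-based page number, as used by X.PagedList. */
final class Paging {
    private Paging() {}

    /** Offset to send as the Elasticsearch "from" value for the requested page. */
    static int fromFor(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return (pageNumber - 1) * pageSize;
    }

    /** Number of pages implied by the total hit count, rounded up. */
    static long pageCount(long totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }

    /** Whether a page after pageNumber exists. */
    static boolean hasNextPage(int pageNumber, long totalHits, int pageSize) {
        return pageNumber < pageCount(totalHits, pageSize);
    }
}
```

For PageNumber = 3 and PageSize = 10, the search would use From(20).Size(10), and 95 total hits give 10 pages. Deep pages are limited by the index.max_result_window setting (10,000 hits by default), so requests where from + size exceeds it are rejected.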

Delete by query in Elasticsearch with JEST

I have some custom data (let's call it Camera) in Elasticsearch, and I can see it in Kibana.
I tried to delete data by query, following the accepted answer to the question "ElasticSearch Delete by Query". My code is:
String query = "{\"Name\":\"test Added into Es\"}";
DeleteByQuery delete = new DeleteByQuery.Builder(query).addIndex(this._IndexName).addType(this._TypeName).build();
JestResult deleteResult = this._JestClient.execute(delete);
The result is 404 Not Found.
There is clearly a Camera document in Elasticsearch whose Name matches the query, so I believe the 404 has some other cause.
Did I do anything wrong? Should I change the query string?
The query needs to be a real query, not a partial document.
Try this instead:
String query = "{\"query\": { \"match\": {\"Name\":\"test Added into Es\"}}}";
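Since the body is assembled by hand as a Java string, a small helper that always adds the query/match wrapper and escapes the value avoids both the original mistake and broken JSON when a name contains quotes. A stdlib-only sketch; MatchQueryJson, matchQueryBody and jsonEscape are hypothetical helpers, not part of JEST:

```java
/** Hypothetical helper for building a delete-by-query body from a field and a value. */
final class MatchQueryJson {
    private MatchQueryJson() {}

    /** Escapes quotes, backslashes and control characters for a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control chars
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Wraps the field/value pair in a real query: {"query": { "match": {field: value}}}. */
    static String matchQueryBody(String field, String value) {
        return "{\"query\": { \"match\": {\"" + jsonEscape(field) + "\":\"" + jsonEscape(value) + "\"}}}";
    }
}
```

matchQueryBody("Name", "test Added into Es") produces the same string as the answer. Be aware that a match query on an analyzed text field matches documents containing any of the analyzed terms by default, which for a delete can remove more than intended; a match_phrase query, or a term query on a not-analyzed (keyword) field, is stricter.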

Cannot index document

I have written some code using the Elasticsearch.Net and NEST client libraries that should index a document without using a POCO to map fields, as I have many different documents.
Question 1) Is this the correct way to create an index? Does .AddMapping<string>(mapping => mapping.Dynamic(true)) create the mapping based on the document passed in?
var newIndex = client.CreateIndex(indexName, index => index
    .NumberOfReplicas(replicas)
    .NumberOfShards(shards)
    .Settings(settings => settings
        .Add("merge.policy.merge_factor", "10")
        .Add("search.slowlog.threshold.fetch.warn", "1s")
    )
    .AddMapping<string>(mapping => mapping.Dynamic(true))
);
Question 2) Is this possible?
string document = "{\"name\": \"Mike\"}";
var newIndex = client.Index(document, indexSelector => indexSelector
    .Index(indexName)
);
When I run the code in "Question 2", it returns:
{"Unable to perform request: 'POST ' on any of the nodes after retrying 0 times."}
NEST only deals with typed objects. In this case, passing a string will cause it to index the document into /{indexName}/string/{id}.
Since it can't infer an id from a string and you do not pass one, it will fail on that, or on the fact that it can't serialize a string. I'll update the client to throw a better exception in this case.
If you want to index a document as a string, use the exposed Elasticsearch.Net client like so:
client.Raw.Index(indexName, typeName, id, stringJson);
If you want Elasticsearch to come up with an id, you can use:
client.Raw.Index(indexName, typeName, stringJson);
client is the NEST client, and the Raw property is an Elasticsearch.Net client with the same connection settings.
Please note that I might rename Raw to LowLevel in the next beta update; I'm still debating that.
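The two Raw.Index calls correspond to the two forms of the REST index API in this (typed, pre-7.x) era: PUT /{index}/{type}/{id} when you supply the id, and POST /{index}/{type} when Elasticsearch should generate it. A small Java sketch that builds the request line for a raw JSON body; IndexRequestLine and requestLine are hypothetical helpers, and this describes the REST API rather than exactly what Elasticsearch.Net sends internally:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class IndexRequestLine {
    private IndexRequestLine() {}

    /** Method and path for indexing a JSON string: PUT with an explicit id, POST to let ES generate one. */
    static String requestLine(String index, String type, String id) {
        String path = "/" + encode(index) + "/" + encode(type);
        if (id == null || id.isEmpty()) {
            return "POST " + path;                 // Elasticsearch generates the id
        }
        return "PUT " + path + "/" + encode(id);   // caller-supplied id
    }

    /** Percent-encodes one path segment (URLEncoder turns spaces into '+', so fix those up). */
    private static String encode(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

The JSON string itself is sent unchanged as the request body; no POCO or serialization step is involved.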

Multi get returns source as null after bulk update

I am using Elasticsearch multi get to read documents after a bulk update. It's returning some document sources as null.
MultiGetRequestBuilder builder = client.prepareMultiGet();
builder.setRefresh(true);
builder.add(indexName, type, idsList);
MultiGetResponse multiResponse = builder.execute().actionGet();
for (MultiGetItemResponse response : multiResponse.getResponses())
{
    String customerJson = response.getResponse().getSourceAsString();
    System.out.println("customerJson::" + customerJson);
}
Any issues in my code? Thanks in advance.
When you say some sources are returned as null, I assume the get response is marking those documents as not existing?
If that's the case, then maybe:
some index requests in the bulk are failing due to a mapping error or some other error, or
you need to refresh your index between indexing and the multi get (i.e. your docs are not available for search yet):
transportClient.admin().indices().prepareRefresh(index).execute();
Good luck.
EDIT: You answered your own question in the comment, but for readability's sake: when using get or multi get, if a routing key was used when indexing, it must be specified again in the get. Otherwise, the wrong shard is determined using the default routing, and the get fails.
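Why the routing key matters: Elasticsearch picks a document's shard roughly as shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. A Java sketch of that rule; ShardRouting is a hypothetical helper, and String.hashCode() stands in for the Murmur3 hash Elasticsearch actually uses, so real shard numbers will differ:

```java
final class ShardRouting {
    private ShardRouting() {}

    /**
     * shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the id.
     * String.hashCode() is a stand-in for the Murmur3 hash Elasticsearch really uses.
     */
    static int shardFor(String id, String routing, int primaryShards) {
        String key = (routing != null) ? routing : id;
        return Math.floorMod(key.hashCode(), primaryShards);
    }
}
```

With 5 primary shards, a document with id "1" indexed with routing "2" lands on shard 0 in this model, but a get for id "1" without routing looks on shard 4 and reports the document as missing. With the Java API, the routing can be supplied per item, e.g. via MultiGetRequest.Item#routing(...).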
