I am struggling with a new problem in Elasticsearch 7. I get the "limit of total fields" error when trying to create an index with AutoMap in the NEST library (C#).
await _elasticContext.GetClient().Indices.CreateAsync(indexName, c => c
.Map<DocumentWrapper>(m => m.AutoMap()));
The question is how to integrate the index.mapping.total_fields.limit setting into the call above. Or is there an alternative solution?
Thank you
Please find an example below
var response = await client.Indices.CreateAsync("my_index1", c => c
.Settings(s => s.Setting("index.mapping.total_fields.limit", 10))
.Map(m => m.AutoMap<Document>()));
Hope that helps.
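From the documentation for index.mapping.total_fields.limit: the maximum number of fields in an index. Field and object mappings, as well as field aliases, count towards this limit. The default value is 1000, so to get past the error the limit has to be raised above that (the 10 in the example is only a placeholder value).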
Related
I'm indexing a lot of data in Elasticsearch (through NEST) from multiple processes each running multiple threads. Part of indexing a document is finding out if we have seen a similar document before. This feature is implemented by generating a hash of a set of fields on the document and checking if we have documents in Elasticsearch with the same hash. Before indexing a document, I make the following query:
var result = elasticClient
    .Count<MyDocument>(c => c
        .Index(indexName)
        .Query(q => q
            .ConstantScore(qs => qs
                .Filter(f => f
                    .Term(field => field.Hash, hash))))
...
This returns a count of existing documents with the specified hash. So far so good. However, if a process indexes two documents with the same hash within the same second, the count check doesn't work, since the first document isn't available for search yet. I'm running with the default refresh interval (1 second). For now I have added a refresh call after indexing each document:
var refreshResponse = client.Refresh(indexName);
This also works but it doesn't scale when indexing large amounts of documents (indexing becomes slow as already pointed out here: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html).
Any ideas for how to avoid having to call Refresh but still be able to perform a uniqueness check? I'm thinking of some sort of local cache, shared between all threads, holding the hashes of documents indexed since the last refresh. I know that this won't work across processes, but that is acceptable for now.
I ended up implementing a write-through cache as suggested by Val. This makes it possible to remove the call to Refresh but still make the count on each iteration. This is implemented using a singleton MemoryCache shared between all threads:
var cache = new MemoryCache("hashes");
When checking for uniqueness I check the cache in case no similar documents are found in Elasticsearch:
var result = elasticClient
    .Count<MyDocument>(c => c
        .Index(indexName)
        .Query(q => q
            .ConstantScore(qs => qs
                .Filter(f => f
                    .Term(field => field.Hash, hash)))));
bool isUnique = false;
if (result.Count == 0)
{
    isUnique = !cache.Contains(hash);
}
In case the count for the hash returns 0 I check a cache for that hash.
When a document has been successfully indexed, I store the hash in the cache with an expiration:
var policy = new CacheItemPolicy();
policy.AbsoluteExpiration = DateTimeOffset.UtcNow.AddSeconds(5);
cache.AddOrGetExisting(hash, string.Empty, policy);
TTL could probably be 1 second as well since that is the refresh interval I currently have configured on the index.
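One thing the check-then-add sequence above leaves open is that two threads can both see the hash as missing before either stores it. A minimal sketch of a cache that checks and records in one atomic step follows. RecentHashCache, TryClaim and RemoveExpired are hypothetical names, not part of NEST or of the answer above, and the sketch uses a ConcurrentDictionary rather than the MemoryCache used above. Note that it claims the hash before indexing rather than after, so a failed index call leaves the claim in place until the TTL runs out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper: remembers hashes for a short TTL so that documents
// indexed since the last refresh are still detected as duplicates.
public sealed class RecentHashCache
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _expiries =
        new ConcurrentDictionary<string, DateTimeOffset>();
    private readonly TimeSpan _ttl;

    public RecentHashCache(TimeSpan ttl)
    {
        _ttl = ttl;
    }

    // Returns true only for the first caller that claims the hash
    // while no unexpired entry for it exists.
    public bool TryClaim(string hash, DateTimeOffset now)
    {
        var newExpiry = now + _ttl;
        while (true)
        {
            if (_expiries.TryAdd(hash, newExpiry))
            {
                return true; // nobody held this hash
            }
            if (_expiries.TryGetValue(hash, out var existing))
            {
                if (existing > now)
                {
                    return false; // another caller holds a live claim
                }
                // The old claim has expired: replace it, but only if it is unchanged.
                if (_expiries.TryUpdate(hash, newExpiry, existing))
                {
                    return true;
                }
            }
            // The entry changed or vanished concurrently: retry.
        }
    }

    // Drops expired entries so the dictionary does not grow without bound.
    public void RemoveExpired(DateTimeOffset now)
    {
        foreach (var entry in _expiries)
        {
            if (entry.Value <= now)
            {
                // Removes the pair only if key and value still match.
                ((ICollection<KeyValuePair<string, DateTimeOffset>>)_expiries).Remove(entry);
            }
        }
    }
}
```

With this shape the uniqueness check would read roughly as: if the Elasticsearch count is 0 and cache.TryClaim(hash, DateTimeOffset.UtcNow) returns true, index the document; otherwise treat it as a duplicate. RemoveExpired has to be called periodically by the application.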
I searched the NEST docs but can't seem to find a proper answer.
My question is how to search multiple indices matching an index pattern using NEST. For example,
if I have indices with the following names in Elasticsearch
media-2017-10, media-2018-03, media-2018-04
then to select them I need to use the wildcard character *, like this:
client.Search<Media>(s => s
    .Index("media-*")
    // query goes here .....
Is this possible in NEST?
Yes, this works. Try it :)
.Index(...) accepts wildcard indices
You can also search multiple indices this way:
var indexNames = new[] {
    "media-*",
    "docs-*",
    "common-*"
};
// A string[] converts implicitly to Nest.Indices.
Nest.Indices allIndices = indexNames;
return _elasticClient
    .SearchAsync<EsBaseModel>(s => s
        .Index(allIndices)
        .Size(_esConfig.MaxCallIDsSize)
        .RequestConfiguration(r => r.RequestTimeout(TimeSpan.FromMinutes(5)))
        .Query(q => q
            .Match(m => m.Field("fieldname").Query(condition))));
Steps:
Create an array of index names as strings.
The names can be explicit, or wildcard patterns as supported in the NEST client docs.
Note: pay attention to search performance, since it can take a while to search all the indices you've provided.
(You can optimize by leaving out indices for very old dates, limiting the number of results, etc.)
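To illustrate the "ignore very old dates" idea, here is a minimal sketch that builds explicit index names for the most recent months instead of using a wildcard over everything. It assumes the monthly naming scheme from the question (prefix-yyyy-MM); MonthlyIndexNames and ForLastMonths are hypothetical names, not part of NEST.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: builds names such as "media-2018-04" for the
// newest month and the months before it.
public static class MonthlyIndexNames
{
    public static string[] ForLastMonths(string prefix, DateTime newestMonth, int monthCount)
    {
        if (monthCount < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(monthCount));
        }

        // Normalize to the first day of the month so AddMonths never clips a day.
        var first = new DateTime(newestMonth.Year, newestMonth.Month, 1);
        var names = new List<string>(monthCount);
        for (var i = 0; i < monthCount; i++)
        {
            var month = first.AddMonths(-i);
            names.Add(prefix + "-" + month.ToString("yyyy-MM", CultureInfo.InvariantCulture));
        }
        return names.ToArray();
    }
}
```

The resulting string[] can be assigned to a Nest.Indices variable in the same way as the array above. Keep in mind that Elasticsearch rejects a search that names an index which does not exist unless the request sets ignore_unavailable, so this only fits if every month in the range has an index or that option is enabled.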
Background
I'm using ElasticSearch as the search engine for a new ASP.Net Core 2.1 website I'm working on. I'm using the Nest API to integrate with it. I want to use the X.PagedList to handle the paging for me.
I've used this in other ASP.Net Core projects and it's worked well querying data in MS SQL Server.
Code
ISearchResponse<Foo> searchResponse =
    _elasticSearchClient.Search<Foo>(s => s
        .Query(q => q
            .Bool(b => b.Filter(distanceFilters))
        )
        .Source(src => src
            .Includes(i => i
                .Fields(
                    f => f.Field1,
                    f => f.Field2,
                    f => f.Field3
                )
            )
        )
        .From(options.From)
        .Size(options.Size)
    );
var hitsMD = searchResponse.HitsMetadata;
var results = hitsMD?.Hits.Select(s => new Hit()
    {
        Index = s.Index,
        Id = s.Id,
        Score = s.Score,
        Job = s.Source
    }
).ToPagedList(PageNumber, PageSize);
Issue
When I call .ToPagedList on the search results returned by ElasticSearch, it only shows one page of results.
The issue is that ElasticSearch has its own paging mechanism so it's only returning one page of hits.
I had the idea that because ElasticSearch passes back the total number of hits I could tell the PagedList how many items are in the list by setting the PagedList.TotalItemCount property. However, I can't do this as it's a private set.
I've tried removing the from and size but this returns 10 hits which is ElasticSearch's default size which they obviously put in place for performance reasons.
Question
How can I make use of the X.PagedList package whilst integrating into ElasticSearch using the Nest API?
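You've basically got all the pieces here already. All you're missing is StaticPagedList<T>. Since paging is already being handled by Elasticsearch, you simply need to define a static paging setup, passing in the current page of results (without calling ToPagedList on them) together with the total hit count that Elasticsearch reports (total below, e.g. searchResponse.Total in NEST), i.e.: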
var pagedResults = new StaticPagedList<Foo>(results, PageNumber, PageSize, total);
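For this to line up, the From and Size sent to Elasticsearch have to describe the same page that is passed to StaticPagedList<T>. A minimal sketch of that arithmetic, assuming 1-based page numbers as X.PagedList uses; PagingMath, From and PageCount are hypothetical names for illustration only.

```csharp
using System;

// Hypothetical helper: converts a 1-based page number into the "from"
// offset for Elasticsearch and derives the page count from the total hits.
public static class PagingMath
{
    // Offset of the first hit on the given page, e.g. page 3 of size 20 starts at 40.
    public static int From(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return checked((pageNumber - 1) * pageSize);
    }

    // Number of pages needed for totalHits, rounding up for a partial last page.
    public static int PageCount(long totalHits, int pageSize)
    {
        if (totalHits < 0) throw new ArgumentOutOfRangeException(nameof(totalHits));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return checked((int)((totalHits + pageSize - 1) / pageSize));
    }
}
```

In the question's code this would mean passing PagingMath.From(PageNumber, PageSize) to .From(...) and PageSize to .Size(...). Elasticsearch also caps from + size at index.max_result_window (10,000 by default), so very deep pages need a different approach.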
So I'm trying the following query but I only get ten results. I want all matching results.
elasticSearchQuery = (q => q.Filtered(frd => frd
    .Query(qf => qf.MatchAll())
    .Filter(f => f.Bool(b =>
        b.Must(mt =>
            mt.Term("productType", productTypeId)
        )))));
The MatchAll part doesn't seem to work. What am I doing wrong?
You have to specify number of results. From and size can be set as request parameters, they can also be set within the search body. from defaults to 0, and size defaults to 10.
D Volsky is correct, the default size is 10. You can see this in the documentation here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
The reason for this is that results that appear completely unrelated might be returned even if their score is low. You might try having your query return 1000 or more results, but applying a min_score to the results. A min_score may help to ensure your results are still relevant. Documentation for min_score here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-min-score.html
follow me on this one...
if i've got a db of movies and i want to search on multiple fields and return the results into a single field, how would i accomplish this?
let me set an example...
my documents have a title and artists.name (array). i want the user to be able to search in both title and artist at the same time so that the results are in the same field. this would be implemented in an 'autocomplete' search scenario where you get results as you type.
so if a user types 'mike' i want to be able to search for actors (artists.name) with the name mike and titles with the word mike in it. in this case, you might return 'magic mike' and 'mike meyers' in the same autocomplete result set. (imdb.com has this implementation)
i understand how to search both of those fields, but how do i return them into one? i believe i'd have to have some knowledge on where my 'hit' came from - title or artists.name. so maybe that's the larger question here - how do i tell which field the hit came from?
I don't think there is any direct way to determine which field(s) a query matched on. I can think of a few "workaround" approaches that may do it for you. One is to use the multi search API and execute a separate query on each field. Another is to use highlighting, which returns the fields in which a match was found.
Example using multi search:
var response = client.MultiSearch(ms => ms
    .Search<Artist>("name", s => s.Query(q => q.Match(m => m.OnField(a => a.Name).Query("mike"))))
    .Search<Artist>("titles", s => s.Query(q => q.Match(m => m.OnField(a => a.Titles).Query("mike")))));
response.GetResponse<Artist>("name"); // <-- Contains search results from matching on Name
response.GetResponse<Artist>("titles"); // <-- Contains search results from matching on Titles
Example using highlighting:
var response = client.Search<Artist>(s => s
    .Query(q => q
        .MultiMatch(m => m
            .OnFields(a => a.Name, a => a.Titles)
            .Query("mike")))
    .Highlight(h => h
        .OnFields(fs => fs.OnField(a => a.Name),
                  fs => fs.OnField(a => a.Titles))));
You can then inspect the Highlights object of each hit, or the Highlights object of the response to determine what field the match came from.
There is also the explain api, and you can add explain to your query, but that will return a lot of irrelevant scoring info, which you would have to parse through. Probably too cumbersome for your needs.
As a side note- for autocomplete functionality, if possible I would really try to leverage the completion suggester instead of the above solutions. These are pre-computed suggestions that are created when you index your documents by building up FSTs, which will increase your indexing time as well as index size, but as a result will provide extremely fast suggestions.