BulkAll only works with default mapping - elasticsearch

I am trying to use the BulkAll() method to add documents to my index.
The index is created using a mapping which works using IndexMany():
client.CreateIndex(indexName, c => c
    .Settings(s => s
        .NumberOfShards(shardCount)
        .NumberOfReplicas(replicaCount))
    .Mappings(ms => ms
        .Map<MyDocument>(m => m.AutoMap())
    ));
I then try to use BulkAll():
Console.WriteLine("Indexing documents into elasticsearch...");
var waitHandle = new CountdownEvent(1);
var bulkAll = client.BulkAll(docsToUpload, b => b
    .Index(indexName)
    .BackOffRetries(2)
    .BackOffTime("30s")
    .RefreshOnCompleted(true)
    .MaxDegreeOfParallelism(4)
    .Size(100)
);
bulkAll.Subscribe(new BulkAllObserver(
    onNext: (b) => { Console.Write("."); },
    onError: (e) => { throw e; },
    onCompleted: () => waitHandle.Signal()
));
waitHandle.Wait();
Console.WriteLine("Done.");
This logs out a row of "....." followed by "Done" as expected, but my index is empty when I check it.
If I don't do the first step of creating the index then the bulk upload creates an index with default mappings, but I can't use it without my mappings.

Related

How to add conditional properties for index creation in elasticsearch nest?

I want to create an index whose properties are added conditionally, much like conditional filters are added with a QueryContainer.
PropertiesDescriptor<object> ps = new PropertiesDescriptor<object>();
if (condition)
{
    ps.Text(s => s.Name(name[1]));
}
if (condition)
{
    ps.Number(s => s.Name(name[1]));
}
if (!_con.client.Indices.Exists(indexname).Exists)
{
    var createIndexResponse = _con.client.Indices.Create(indexname, index => index
        .Settings(s => s.NumberOfShards(1).NumberOfReplicas(0))
        .Map(m => m.Properties(ps)));
}
But I receive the following error. Can you guide me on how to achieve this?
cannot convert from 'Nest.PropertiesDescriptor<object>' to 'System.Func<Nest.PropertiesDescriptor<object>, Nest.IPromise<Nest.IProperties>>'
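You are almost there. As the error says, Properties expects a function that returns the descriptor rather than the descriptor itself, so just change the Properties part to m.Properties(p => ps).
_con.client.Indices.Create(indexname, index => index
    .Settings(s => s.NumberOfShards(1).NumberOfReplicas(0))
    .Map(m => m.Properties(p => ps)));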
Hope that helps.

Setting the Elasticsearch routing_partition_size using NEST

I'm using NEST to create an index in Elasticsearch 5.5. I need to set the index.routing_partition_size setting on index creation but don't see that setting in the CreateIndexDescriptor object. How can I specify this value in NEST?
My settings currently look like this:
return createIndexSelector
    //add analyzers and tokenizers
    .Settings(s => s
        .NumberOfReplicas(2)
        .NumberOfShards(40)
        .Setting("refresh_interval", 10)
        .Analysis(a => a
            .Analyzers(az => az
                .Custom("str_search_analyzer", c1 => GetCustomSearchAnalyzer())
                .Custom("str_index_analyzer", c2 => GetCustomNgramAnalyzer()))
            .Tokenizers(tz => tz
                .NGram("autocomplete_ngram_tokenizer", ng => GetCustomAutoCompleteTokenizer()))))
    //add mappings for invoice and contact doc types
    .Mappings(m => m
        .Map<DocType>(mDocType => mDocType.Properties(DocType.AddAllMappings)));
Assuming you are using NEST 5.x, it's under IndexSettingsDescriptor:
var createIndexResponse = await client.CreateIndexAsync("index", c => c
    .Settings(s => s.RoutingPartitionSize(10)));
Which produces the following request:
{
  "settings": {
    "index.routing_partition_size": 10
  }
}
Hope that helps.

Adding FunctionScore/FieldValueFactor to a MultiMatch query

We've got a pretty basic query that lets users provide query text and boosts matches on different fields. Now we want to add another boost based on votes, but we're not sure where to nest the FunctionScore.
Our original query is:
var results = await _ElasticClient.SearchAsync<dynamic>(s => s
    .Query(q => q
        .MultiMatch(mm => mm
            .Fields(f => f
                .Field("name^5")
                .Field("hobbies^2")
            )
            .Query(queryText)
        )
    )
);
If I try to nest in FunctionScore around the MultiMatch, it basically ignores the query/fields, and just returns everything in the index:
var results = await _ElasticClient.SearchAsync<dynamic>(s => s
    .Query(q => q
        .FunctionScore(fs => fs
            .Query(q2 => q2
                .MultiMatch(mm => mm
                    .Fields(f => f
                        .Field("name^5")
                        .Field("hobbies^2")
                    )
                    .Query(queryText)
                )
            )
        )
    )
);
My expectation is that since I'm not providing any Functions, this should do exactly the same thing as the query above. Then, adding functions to the FunctionScore would boost the results based on the functions I give it (in my case, boosting on the votes field using FieldValueFactor).
The documentation around this is a little fuzzy, particularly for certain combinations, like MultiMatch, FunctionScore, and query text. I did find this answer, but it doesn't cover the case where query text is included.
I'm pretty sure it boils down to my still foggy understanding of how Elastic queries work, but I'm just not finding much to cover the (what I would think is a pretty common) scenario of:
A user entering a query
Boosting matches of that query with certain fields
Boosting all results based on the value of a numeric field
Your function_score query is correct. The reason you are not seeing the results you expect is a feature in NEST called conditionless queries: a function_score query is considered conditionless when it has no functions, so NEST omits it from the serialized form sent in the request.
The easiest way to see this is with a small example:
private static void Main()
{
    var defaultIndex = "my-index";
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var settings = new ConnectionSettings(pool, new InMemoryConnection())
        .DefaultIndex(defaultIndex)
        .DisableDirectStreaming()
        .PrettyJson()
        .OnRequestCompleted(callDetails =>
        {
            if (callDetails.RequestBodyInBytes != null)
            {
                Console.WriteLine(
                    $"{callDetails.HttpMethod} {callDetails.Uri} \n" +
                    $"{Encoding.UTF8.GetString(callDetails.RequestBodyInBytes)}");
            }
            else
            {
                Console.WriteLine($"{callDetails.HttpMethod} {callDetails.Uri}");
            }
            Console.WriteLine();
            if (callDetails.ResponseBodyInBytes != null)
            {
                Console.WriteLine($"Status: {callDetails.HttpStatusCode}\n" +
                    $"{Encoding.UTF8.GetString(callDetails.ResponseBodyInBytes)}\n" +
                    $"{new string('-', 30)}\n");
            }
            else
            {
                Console.WriteLine($"Status: {callDetails.HttpStatusCode}\n" +
                    $"{new string('-', 30)}\n");
            }
        });
    var client = new ElasticClient(settings);
    var queryText = "query text";
    var results = client.Search<dynamic>(s => s
        .Query(q => q
            .FunctionScore(fs => fs
                .Query(q2 => q2
                    .MultiMatch(mm => mm
                        .Fields(f => f
                            .Field("name^5")
                            .Field("hobbies^2")
                        )
                        .Query(queryText)
                    )
                )
            )
        )
    );
}
which emits the following request
POST http://localhost:9200/my-index/object/_search?pretty=true&typed_keys=true
{}
You can disable the conditionless feature by marking a query as Verbatim
var results = client.Search<dynamic>(s => s
    .Query(q => q
        .FunctionScore(fs => fs
            .Verbatim() // <-- send the query *exactly as is*
            .Query(q2 => q2
                .MultiMatch(mm => mm
                    .Fields(f => f
                        .Field("name^5")
                        .Field("hobbies^2")
                    )
                    .Query(queryText)
                )
            )
        )
    )
);
This now sends the query
POST http://localhost:9200/my-index/object/_search?pretty=true&typed_keys=true
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "query text",
          "fields": [
            "name^5",
            "hobbies^2"
          ]
        }
      }
    }
  }
}
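For the original goal of boosting by votes, a minimal sketch of adding a FieldValueFactor function to the same query might look like the following. The field name votes comes from the question; the factor, modifier, missing value and boost mode are illustrative assumptions, not recommendations. The descriptor method names are the ones I believe NEST 5.x/6.x exposes, so check them against your client version. As in the example above, InMemoryConnection is used so that nothing is sent to a real cluster.

```csharp
using System;
using System.Text;
using Elasticsearch.Net;
using Nest;

public static class FunctionScoreSketch
{
    public static void Main()
    {
        var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
        // InMemoryConnection: the request is serialized but never leaves the process.
        var settings = new ConnectionSettings(pool, new InMemoryConnection())
            .DefaultIndex("my-index")
            .DisableDirectStreaming()
            .PrettyJson()
            .OnRequestCompleted(details =>
            {
                // Print the serialized request body so the query can be inspected.
                if (details.RequestBodyInBytes != null)
                    Console.WriteLine(Encoding.UTF8.GetString(details.RequestBodyInBytes));
            });
        var client = new ElasticClient(settings);
        var queryText = "query text";

        var results = client.Search<dynamic>(s => s
            .Query(q => q
                .FunctionScore(fs => fs
                    .Query(q2 => q2
                        .MultiMatch(mm => mm
                            .Fields(f => f
                                .Field("name^5")
                                .Field("hobbies^2")
                            )
                            .Query(queryText)
                        )
                    )
                    .Functions(fn => fn
                        .FieldValueFactor(fvf => fvf
                            .Field("votes")                            // field name from the question
                            .Factor(1.2)                               // assumed value
                            .Modifier(FieldValueFactorModifier.Log1P)  // assumed modifier
                            .Missing(1)                                // assumed default for documents without votes
                        )
                    )
                    .BoostMode(FunctionBoostMode.Multiply)             // assumed: multiply query score by function score
                )
            )
        );
    }
}
```

Because a function is now supplied, the function_score query should no longer count as conditionless under the rule described above, so Verbatim() should not be needed here. Inspecting the printed request body is the quickest way to confirm that for your NEST version.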

Elastic Search with Scroll and Slice with NEST to retrieve large volume data parallel

I am writing NEST code to retrieve a large volume of data from Elasticsearch. Right now I am using the Scroll feature to fetch all records from the cluster synchronously.
Below is the code snippet.
var response = elasticClient.Search<IndexType>(s => s
    .Source(sf => sf
        .Includes(i => i
            .Fields(
                f => f.DateTime
            )
        )
    )
    .Scroll("1m")
    .From(0)
    .Size(9999)
    .Query(q => q
        .DateRange(r => r
            .Field(f => f.DateTime)
            .GreaterThanOrEquals(new DateTime(2017, 01, 01))
            .LessThan(new DateTime(2017, 04, 01))
        )
    )
    .Sort(q => q.Ascending(u => u.DateTime))
);
List<IndexType> allData = new List<IndexType>();
while (response.Documents.Any())
{
    foreach (var document in response.Documents)
    {
        allData.Add(document);
    }
    // Same document type as the initial search, so the response can be reassigned.
    response = elasticClient.Scroll<IndexType>("1m", response.ScrollId);
}
Now, instead of the while loop (fetching 10000 records per batch until all documents are fetched), is there any mechanism to do this asynchronously or in parallel, so that I don't have to wait for every iteration?
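One possible approach, sketched here and not taken from this thread, is NEST's ScrollAll helper, which issues a sliced scroll and runs the slices concurrently. It assumes Elasticsearch 5.0 or later (sliced scroll is not available before that) and a NEST version that exposes ScrollAll and ScrollAllObserver. The IndexType class below is a hypothetical stand-in for the question's document type, and the slice count and batch size are illustrative values.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using Nest;

// Hypothetical stand-in for the question's IndexType document class.
public class IndexType
{
    public DateTime DateTime { get; set; }
}

public static class SlicedScrollSketch
{
    public static ConcurrentBag<IndexType> FetchAll(IElasticClient client, int numberOfSlices)
    {
        // Slices are processed concurrently, so collect into a thread-safe bag.
        var allData = new ConcurrentBag<IndexType>();
        Exception failure = null;
        var done = new ManualResetEventSlim(false);

        var observable = client.ScrollAll<IndexType>("1m", numberOfSlices, sc => sc
            .MaxDegreeOfParallelism(numberOfSlices)
            .Search(s => s
                .Size(1000) // assumed batch size per scroll request
                .Query(q => q
                    .DateRange(r => r
                        .Field(f => f.DateTime)
                        .GreaterThanOrEquals(new DateTime(2017, 01, 01))
                        .LessThan(new DateTime(2017, 04, 01))
                    )
                )
            )
        );

        observable.Subscribe(new ScrollAllObserver<IndexType>(
            onNext: r =>
            {
                // Each notification carries one scroll page from one slice.
                foreach (var doc in r.SearchResponse.Documents)
                    allData.Add(doc);
            },
            onError: e => { failure = e; done.Set(); },
            onCompleted: () => done.Set()
        ));

        // Block until every slice has completed or an error was reported.
        done.Wait();
        if (failure != null)
            throw new InvalidOperationException("ScrollAll failed", failure);
        return allData;
    }
}
```

Pages from different slices arrive interleaved, so the overall order of documents is not the sorted order of the original loop; if order matters, sort the collected results afterwards.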

example of how to use synonyms in nest

I haven't found a solid example of how to create and use synonyms using NEST for Elasticsearch. If anyone has one, it would be helpful.
My attempt looks like this, but I don't know how to apply it to a field.
var syn = new SynonymTokenFilter
{
    Synonyms = new [] { "pink, p!nk => pink", "lil, little", "ke$ha, kesha => ke$ha" },
    IgnoreCase = true,
    Tokenizer = "standard"
};
client.CreateIndex("myindex", i =>
{
    i
        .Analysis(a => a.Analyzers(an => an
                .Add("fullTermCaseInsensitive", fullTermCaseInsensitive)
            )
            .TokenFilters(x => x
                .Add("synonym", syn)
            )
        )
...
It's very simple :)
You first need to define the synonym filter; then you can use it in your custom analyzer, where you can also add other types of filters.
Small example:
.Analysis(descriptor => descriptor
    .Analyzers(bases => bases
        .Add("folded_word", new CustomAnalyzer()
            {
                Filter = new List<string> { "icu_folding", "trim", "synonym" },
                Tokenizer = "standard"
            }
        )
    )
    .TokenFilters(i => i
        .Add("synonym", new SynonymTokenFilter()
            {
                SynonymsPath = "analysis/synonym.txt",
                Format = "Solr"
            }
        )
    )
)
Then you can use the custom analyzer in the mapping part
Assuming your fullTermCaseInsensitive analyzer is custom, you need to add your synonym filter to it:
var fullTermCaseInsensitive = new CustomAnalyzer()
{
    .
    .
    .
    Filter = new string[] { "syn" }
};
And upon creating your index, you can add a mapping and apply the fullTermCaseInsensitive analyzer to your field(s):
client.CreateIndex("myindex", c => c
    .Analysis(a => a
        .Analyzers(an => an.Add("fullTermCaseInsensitive", fullTermCaseInsensitive))
        .TokenFilters(tf => tf.Add("syn", syn)))
    .AddMapping<MyType>(m => m
        .Properties(p => p
            .String(s => s.Name(t => t.MyField).Analyzer("fullTermCaseInsensitive")))));
