I am looking for a way to stream all (~10^6+) documents via the .NET NEST client.
I want to boost performance by using parallel async requests (e.g. ActionBlock, Task.WhenAll()).
The old-fashioned way, without any parallelism:
var objects = new List<object>();
var searchResponse = await elasticClient.SearchAsync<object>(
new SearchRequest<object>("myIndex")
{
Size = 7000,
Query = new BoolQuery
{
//...
},
// why here and in scroll itself?
Scroll = "2s",
Sort = new List<ISort>
{
//..
}
});
while (searchResponse.Documents.Any())
{
objects.AddRange(searchResponse.Documents);
searchResponse = await elasticClient.ScrollAsync<object>("2s", searchResponse.ScrollId).ConfigureAwait(false);
}
return objects;
Then an attempt using a parallel sliced scroll:
var result = new ConcurrentBag<object>();
var tasks = Enumerable.Range(0, 4).Select(
id => new SearchRequest<object>("myIndex")
{
// has to be lower than 1024?
Size = 1000,
Query = new BoolQuery
{
//...
},
// why here and in scroll itself?
Scroll = "2s",
Sort = new List<ISort>
{
//..
}
}).Select(
async searchRequest =>
{
var searchResponse = await elasticClient.SearchAsync<object>(searchRequest).ConfigureAwait(false);
while (searchResponse.Documents.Any())
{
foreach (var document in searchResponse.Documents) result.Add(document);
searchResponse = await elasticClient.ScrollAsync<object>("2s", searchResponse.ScrollId).ConfigureAwait(false);
}
// good idea right?
//await elasticClient.ClearScrollAsync(x => x.ScrollId(searchResponse.ScrollId)).ConfigureAwait(false);
});
await Task.WhenAll(tasks).PreserveAllExceptions().ConfigureAwait(false);
return result.ToList();
But this only gives me a fraction of the actually available documents.
Moreover, sliced scroll is limited to 1024 documents per slice.
I was not able to increase this value to 7000:
{
"myIndex_template": {
"settings": {
"index": {
"number_of_shards": "1",
"number_of_replicas": "0",
"max_slices_per_scroll": "10000"
}
}
}
}
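For reference, a sliced scroll normally ties each parallel request to its own slice id; a sketch of how that could look with NEST's SlicedScroll type on the request (reusing the index, size, and query from above — without a slice id on each request, every task walks the full result set):

```csharp
var maxSlices = 4;
var tasks = Enumerable.Range(0, maxSlices).Select(
    id => new SearchRequest<object>("myIndex")
    {
        Size = 1000,
        Scroll = "2s",
        // Identify this request's slice; Max must match the number of tasks.
        Slice = new SlicedScroll { Id = id, Max = maxSlices },
        Query = new BoolQuery
        {
            //...
        }
    });
```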
I have 50k users in my collection. I want to add a new field, say rowNumber, to every record. I tried this solution using a cursor, but it is taking too much time, so how can I increase performance, or what would be a better approach?
let user, cursor, rowNumber = 1;
cursor = User.aggregate().cursor({ batchSize: 10 }).exec();
while ((user = await cursor.next())) {
let savedUser = await User.updateOne(
{ _id: user._id },
{ $set: { rowNumber: rowNumber++ } }
);
}
MongoDB has a bulkWrite functionality just for that purpose.
const users = await User.find().exec();
const writeOperations = users.map((user, i) => {
return {
updateOne: {
filter: { _id: user._id },
update: { $set: { rowNumber: i + 1 } }
}
};
});
await User.bulkWrite(writeOperations);
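With 50k documents it may also be worth splitting the operations so a single bulkWrite call does not grow unbounded. A minimal sketch of such a helper (chunk is a hypothetical name, not part of Mongoose):

```javascript
// Split an array of write operations into fixed-size batches.
function chunk(operations, batchSize) {
  const batches = [];
  for (let i = 0; i < operations.length; i += batchSize) {
    batches.push(operations.slice(i, i + batchSize));
  }
  return batches;
}

// Usage with the writeOperations array built above:
// for (const batch of chunk(writeOperations, 1000)) {
//   await User.bulkWrite(batch, { ordered: false });
// }
```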
I am trying to do an Elasticsearch query with a sort option. My query is like this:
var client = new ElasticClient(settings);
var query = new
{
query = new
{
term = new { title = "7-0 v Spurs" }
},
Sort = new List<ISort>
{
new SortField { Field = "releaseFrom", Order = SortOrder.Descending }
}
};
and my search is like this:
var stream = new MemoryStream();
client.Serializer.Serialize(query, stream);
var jsonQuery = System.Text.Encoding.UTF8.GetString(stream.ToArray());
var qRequest = new SearchRequest(jsonQuery);
var searchResponse = client.LowLevel.Search<SearchResponse<dynamic>>(IndexingService.IndexName, "article_en", qRequest);
I am getting a result, but it returns records which do not match the title, and it also does not sort.
This is the query which is generated:
{ "query": { "term": { "title": "7-0 v Spurs" } }, "sort": [ { "releaseFrom": { "order": "desc" } } ] }
Does anybody have a suggestion in case I am missing something here?
Found the solution.
Used ElasticLowLevelClient instead of ElasticClient.
Code is like this:
var lowlevelClient = new ElasticLowLevelClient(settings);
var stream = new MemoryStream();
lowlevelClient.Serializer.Serialize(query, stream);
var jsonQuery = System.Text.Encoding.UTF8.GetString(stream.ToArray());
var searchResponse = lowlevelClient.Search<SearchResponse<dynamic>>(IndexingService.IndexName, "article_en", jsonQuery);
There is one change in the query as well:
match = new { title = "7-0 v Spurs" }
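Putting both changes together, the query object becomes something like this (a sketch; the lowercase keys make the anonymous object serialize to the JSON shown above, and match analyzes the input, unlike term, which looks for one exact token that an analyzed title field never contains):

```csharp
var query = new
{
    query = new
    {
        // match analyzes "7-0 v Spurs" into terms before searching
        match = new { title = "7-0 v Spurs" }
    },
    sort = new[]
    {
        new { releaseFrom = new { order = "desc" } }
    }
};
```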
I would like to create an index mapping through NEST, but I want to pass a raw Elasticsearch request directly:
var setting = new ConnectionSettings(new Uri("uri"));
setting.DefaultIndex(_esIndexName);
var client = new ElasticClient(setting);
string rawEsRequest= "PUT /myindex
{
""mappings"": {
""review"": {
""properties"": {
""commentaire"": {
""analyzer"" : ""french"",
""type"": ""text"",
""fields"": {
""keyword"": {
""type"": ""keyword"",
""ignore_above"": 256
}
}
},
""date_creaation"": {
""type"": ""date""
}
}}}}"
// want to do something like this:
client.Mapping.rawPut(rawEsRequest);
Do you know if it is possible to send a raw Elasticsearch request to create the mapping?
Yes, with the low level client in Elasticsearch.Net that is also exposed on the high level client in NEST through the .LowLevel property. You just need to remove the HTTP verb and URI as these are part of the method call on the client.
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var defaultIndex = "myindex";
var connectionSettings = new ConnectionSettings(pool)
.DefaultIndex(defaultIndex);
var client = new ElasticClient(connectionSettings);
string rawEsRequest = @"{
""mappings"": {
""review"": {
""properties"": {
""commentaire"": {
""analyzer"" : ""french"",
""type"": ""text"",
""fields"": {
""keyword"": {
""type"": ""keyword"",
""ignore_above"": 256
}
}
},
""date_creaation"": {
""type"": ""date""
}
}
}
}
}";
ElasticsearchResponse<dynamic> putResponse =
client.LowLevel.IndicesCreate<dynamic>(defaultIndex, rawEsRequest);
I am looking for how to initialize a SearchRequest object with several non-nested aggregations using the object initializer syntax.
If the request were given as a parameter to ElasticClient.Search() with the lambda expression helper, it would be written like below:
var response = Client.Search<person>(s => s.Aggregations(a =>
a.Terms("bucketAge", t => t.Field("age").Size(50))
.Terms("bucketCity", t => t.Field("city").Size(50))));
Paradoxically, I did find how to write an aggregation with a nested aggregation:
var searchRequest = new SearchRequest<person>
{
Size = 0,
Aggregations = new TermsAggregation("bucketAge")
{
Field = "age",
Size = 50,
Aggregations = new TermsAggregation("bucketcity")
{
Field = "city",
Size = 50
}
}
};
But I fail to initialize a SearchRequest with two aggregations on the same level with something like this:
var searchRequest = new SearchRequest<person>
{
Size = 0,
Aggregations =
{
new TermsAggregation("bucketAge")
{
Field = "age",
Size = 50
},
new TermsAggregation("bucketcity")
{
Field = "city",
Size = 50
}
}
};
How can I do this, please?
With the Object Initializer syntax, you can combine aggregations with &&
var searchRequest = new SearchRequest<person>
{
Size = 0,
Aggregations =
new TermsAggregation("bucketAge")
{
Field = "age",
Size = 50
} &&
new TermsAggregation("bucketcity")
{
Field = "city",
Size = 50
}
};
var searchResponse = client.Search<person>(searchRequest);
You can use the longer-winded method with an aggregation dictionary if you prefer:
var aggregations = new Dictionary<string, AggregationContainer>
{
{ "bucketAge", new TermsAggregation("bucketAge")
{
Field = "age",
Size = 50
}
},
{ "bucketcity", new TermsAggregation("bucketcity")
{
Field = "city",
Size = 50
}
},
};
var searchRequest = new SearchRequest<person>
{
Size = 0,
Aggregations = new AggregationDictionary(aggregations)
};
var searchResponse = client.Search<person>(searchRequest);
Note that the keys in the Dictionary<string, AggregationContainer> will be the names of the aggregations in the request.
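Both forms should serialize to the same aggregations section in the request body, keyed by those names (a sketch of the expected JSON):

```json
{
  "size": 0,
  "aggs": {
    "bucketAge": { "terms": { "field": "age", "size": 50 } },
    "bucketcity": { "terms": { "field": "city", "size": 50 } }
  }
}
```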
I'm not able to get TermVector results properly through SolrNet. I tried the following code:
QueryOptions options = new QueryOptions()
{
OrderBy = new[] { new SortOrder("markupId", Order.ASC) },
TermVector = new TermVectorParameters
{
Fields = new[] { "text" },
Options = TermVectorParameterOptions.All
}
};
var results = SolrMarkupCore.Query(query, options);
foreach (var docVectorResult in results.TermVectorResults)
{
foreach (var vectorResult in docVectorResult.TermVector)
System.Diagnostics.Debug.Print(vectorResult.ToString());
}
In the above code, results.TermVectorResults in the outer foreach gives the proper count, whereas docVectorResult.TermVector in the inner foreach is empty.
I've copied the generated Solr query of the above code, issued it against the Solr admin, and I properly get the termVectors values. The actual query I issued is below:
http://localhost:8983/solr/select/?sort=markupId+asc&tv.tf=true&start=0&q=markupId:%2823%29&tv.offsets=true&tv=true&tv.positions=true&tv.fl=text&version=2.2&rows=50
First, check the HTTP query to make sure the term vector feature is set up properly.
If it is not, change your indexing based on:
The Term Vector Component
If it is OK, you can use ExtraParams and change the default handler to the term vector handler. Try this:
public SolrQueryExecuter<Product> instance { get; private set; }
public ICollection<TermVectorDocumentResult> resultDoc(string q)
{
string SERVER = "http://localhost:7080/solr/core"; // change this
var container = ServiceLocator.Current as SolrNet.Utils.Container;
instance = new SolrQueryExecuter<Product>(
container.GetInstance<ISolrAbstractResponseParser<Product>>(),
new SolrConnection(SERVER),
container.GetInstance<ISolrQuerySerializer>(),
container.GetInstance<ISolrFacetQuerySerializer>(),
container.GetInstance<ISolrMoreLikeThisHandlerQueryResultsParser<Product>>());
instance.DefaultHandler = "/tvrh";
SolrQueryResults<Product> results =
instance.Execute(new SolrQuery(q),
new QueryOptions
{
Fields = new[] { "*" },
Start = 0,
Rows = 10,
ExtraParams = new Dictionary<string, string> {
{ "tv.tf", "false" },
{ "tv.df", "false" },
{ "tv.positions", "true" },
{ "tv", "true" },
{ "tv.offsets", "false" },
{ "tv.payloads", "true" },
{ "tv.fl", "message" },// change the field name here
}
}
);
return results.TermVectorResults;
}