NEST: Update source filter - elasticsearch

I have a method that takes a query as a parameter, like this:
public ISearchResponse<Object> SearchComponent(SearchDescriptor<Object> query)
{
...
}
In this query I want to add a source filter like:
public ISearchResponse<Object> SearchComponent(SearchDescriptor<Object> query)
{
query = query.Source(sf =>
sf.Exclude(e => e
.Field("SomeField")
));
...
}
But what happens if the query already has a source filter? This filter will override the existing one, right? How can I update the existing query's source filter?

This seems to work, but it's not the most elegant solution. Can anyone come up with a better alternative?
public ISearchResponse<Object> SearchComponent(ISearchRequest query)
{
var excludeFields = new List<string>();
excludeFields.Add("SomeField");
if (query.Source == null)
{
query.Source = new SourceFilter {Include = "*", Exclude = excludeFields.ToArray()};
}
else if (query.Source.Exclude == null)
{
query.Source.Exclude = excludeFields.ToArray();
}
else
{
query.Source.Exclude.And(excludeFields.ToArray());
}
...
}

You are using an older version of NEST than I am, and this has changed a bit, but I figure you can do something along these lines (here moreFields holds the additional fields to exclude):
var exclude = query.Source?.Exclude;
query.Source = new SourceFilter() { Excludes = (exclude ?? new Field[0]).Union(moreFields) };
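The merge step in that answer (take the existing exclusions if there are any, then union in the new ones) can be sketched independently of the NEST version. This is a minimal sketch that uses plain strings in place of NEST's field types; the SourceExcludes class and its Merge method are hypothetical names for illustration, not part of NEST.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not a NEST type: plain strings stand in for field names.
public static class SourceExcludes
{
    // Returns the existing exclusions followed by any additional ones,
    // without duplicates. Either argument may be null (for example when
    // the query has no source filter yet).
    public static string[] Merge(IEnumerable<string> existing, IEnumerable<string> additional)
    {
        return (existing ?? Enumerable.Empty<string>())
            .Union(additional ?? Enumerable.Empty<string>(), StringComparer.Ordinal)
            .ToArray();
    }
}
```

The merged array would then be assigned back to the exclude property of whichever source filter type your NEST version exposes. Note that replacing the whole filter with a new one, as in the snippet above, also drops any existing include list, so copy that across as well if the incoming query may have one.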

Related

Nest QueryContainer usage

Hi, I am able to populate a QueryContainer with DateRangeQuery objects as shown below:
QueryContainer marriageDateQuerys = null;
if (!string.IsNullOrEmpty((item.marriage_date)))
{
DateRangeQuery query = new DateRangeQuery();
query.Field = "marriages.marriage_date";
query.Name = item.marriage_date;
query.GreaterThanOrEqualTo = item.marriage_date;
query.LessThanOrEqualTo = item.marriage_date;
marriageDateQuerys &= query;
}
But when I try to populate a QueryContainer with a MatchQuery or TermQuery in the same way, it does not work.
QueryContainer marriageSpouseFirstNameQuerys = null;
if (!string.IsNullOrEmpty((item.spouse_first_name)))
{
MatchQuery query = new MatchQuery();
query.Field = "marriages.spouse_first_name";
query.Name = item.spouse_first_name;
marriageSpouseFirstNameQuerys &= query;
}
The query object is created inside the last if block, but marriageSpouseFirstNameQuerys is not populated with it. I even tried marriageSpouseFirstNameQuerys += query; but without any success.
I haven't tried it, but you could try something like this:
Query = new QueryContainer(new BoolQuery
{
Must = new List<QueryContainer>
{
new MatchQuery
{
//props
},
new TermQuery
{
Field = field,
Value = value
},
}
})
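The code below worked for me after making changes based on eyildiz's answer. The key difference from my original code is that Query is set; the original only set Name, which merely labels the query: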
if (!string.IsNullOrEmpty((item.spouse_last_name)))
{
marriageSpouseLastNameQuery = new QueryContainer(
new MatchQuery
{
Field = "marriages.spouse_last_name",
Query = item.spouse_last_name
});
lstmarriageSpouseLastNameQuerys.Add(marriageSpouseLastNameQuery);
}

elasticsearch nest stopword filter does not work

I am trying to implement the Elasticsearch NEST client. I can index documents and SQL data and search them perfectly, but I am not able to apply stopwords to these records. Below is the code. Please note that I put "abc" in my stopword list.
public IndexSettings GetIndexSettings()
{
var stopTokenFilter = new StopTokenFilter();
string stopwordsfilePath = Convert.ToString(ConfigurationManager.AppSettings["Stopwords"]);
string[] stopwordsLines = System.IO.File.ReadAllLines(stopwordsfilePath);
List<string> words = new List<string>();
foreach (string line in stopwordsLines)
{
words.Add(line);
}
stopTokenFilter.Stopwords = words;
var settings = new IndexSettings { NumberOfReplicas = 0, NumberOfShards = 5 };
settings.Settings.Add("merge.policy.merge_factor", "10");
settings.Settings.Add("search.slowlog.threshold.fetch.warn", "1s");
settings.Analysis.Analyzers.Add("xyz", new StandardAnalyzer { StopWords = words });
settings.Analysis.Tokenizers.Add("keyword", new KeywordTokenizer());
settings.Analysis.Tokenizers.Add("standard", new StandardTokenizer());
settings.Analysis.TokenFilters.Add("standard", new StandardTokenFilter());
settings.Analysis.TokenFilters.Add("lowercase", new LowercaseTokenFilter());
settings.Analysis.TokenFilters.Add("stop", stopTokenFilter);
settings.Analysis.TokenFilters.Add("asciifolding", new AsciiFoldingTokenFilter());
settings.Analysis.TokenFilters.Add("word_delimiter", new WordDelimiterTokenFilter());
return settings;
}
public void CreateDocumentIndex(string indexName = null)
{
IndexSettings settings = GetIndexSettings();
if (!this.client.IndexExists(indexName).Exists)
{
this.client.CreateIndex(indexName, c => c
.InitializeUsing(settings)
.AddMapping<Document>
(m => m.Properties(ps => ps.Attachment
(a => a.Name(o => o.Documents)
.TitleField(t => t.Name(x => x.Name)
.TermVector(TermVectorOption.WithPositionsOffsets))))));
}
var r = this.client.GetIndexSettings(i => i.Index(indexName));
}
Indexing Data
var documents = GetDocuments();
documents.ForEach((document) =>
{
indexRepository.IndexData<Document>(document, DOCindexName, DOCtypeName);
});
public bool IndexData<T>(T data, string indexName = null, string mappingType = null)
where T : class, new()
{
if (data == null)
{
throw new ArgumentNullException("data");
}
var result = this.client.Index<T>(data, c => c.Index(indexName).Type(mappingType));
return result.IsValid;
}
In one of my documents I have put the single line "abc", and I do not expect it to be returned because "abc" is in my stopword list. But searching still returns that document. Below is the search query.
public IEnumerable<dynamic> GetAll(string queryTerm)
{
var queryResult = this.client.Search<dynamic>(d => d
.Analyzer("xyz")
.AllIndices()
.AllTypes()
.QueryString(queryTerm)).Documents;
return queryResult;
}
Please suggest where I am going wrong.

NEST Elasticsearch Reindex examples

My objective is to reindex an index with 10 million shards for the purposes of changing field mappings to facilitate significant terms analysis.
My problem is that I am having trouble using the NEST library to perform a re-index, and the documentation is (very) limited. If possible I need an example of the following in use:
http://nest.azurewebsites.net/nest/search/scroll.html
http://nest.azurewebsites.net/nest/core/bulk.html
NEST provides a nice Reindex method you can use, although the documentation is lacking. I've used it in a very rough-and-ready fashion with this ad-hoc WinForms code.
private ElasticClient client;
private double count;
private void reindex_Completed()
{
MessageBox.Show("Done!");
}
private void reindex_Next(IReindexResponse<object> obj)
{
count += obj.BulkResponse.Items.Count();
var progress = 100 * count / (double)obj.SearchResponse.Total;
progressBar1.Value = (int)progress;
}
private void reindex_Error(Exception ex)
{
MessageBox.Show(ex.ToString());
}
private void button1_Click(object sender, EventArgs e)
{
count = 0;
var reindex = client.Reindex<object>(r => r.FromIndex(fromIndex.Text).NewIndexName(toIndex.Text).Scroll("10s"));
var o = new ReindexObserver<object>(onError: reindex_Error, onNext: reindex_Next, completed: reindex_Completed);
reindex.Subscribe(o);
}
And I've just found the blog post that showed me how to do it: http://thomasardal.com/elasticsearch-migrations-with-c-and-nest/
Unfortunately the NEST implementation is not quite what I expected. In my opinion it's a bit over-engineered for what is possibly the most common use case.
A lot of people just want to update their mappings with zero downtime...
In my case, I had already taken care of creating the index with all its settings and mappings, but NEST insists that it must create a new index when reindexing. That, among many other things. Too many other things.
I found it much less complicated to implement this directly, since NEST already has Search, Scroll, and Bulk methods (this is adapted from NEST's implementation):
// Assuming you have already created and setup the index yourself
public void Reindex(ElasticClient client, string aliasName, string currentIndexName, string nextIndexName)
{
Console.WriteLine("Reindexing documents to new index...");
var searchResult = client.Search<object>(s => s.Index(currentIndexName).AllTypes().From(0).Size(100).Query(q => q.MatchAll()).SearchType(SearchType.Scan).Scroll("2m"));
if (searchResult.Total <= 0)
{
Console.WriteLine("Existing index has no documents, nothing to reindex.");
}
else
{
var page = 0;
IBulkResponse bulkResponse = null;
do
{
var result = searchResult;
searchResult = client.Scroll<object>(s => s.Scroll("2m").ScrollId(result.ScrollId));
if (searchResult.Documents != null && searchResult.Documents.Any())
{
searchResult.ThrowOnError("reindex scroll " + page);
bulkResponse = client.Bulk(b =>
{
foreach (var hit in searchResult.Hits)
{
b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
}
return b;
}).ThrowOnError("reindex page " + page);
Console.WriteLine("Reindexing progress: " + (page + 1) * 100);
}
++page;
}
while (searchResult.IsValid && bulkResponse != null && bulkResponse.IsValid && searchResult.Documents != null && searchResult.Documents.Any());
Console.WriteLine("Reindexing complete!");
}
Console.WriteLine("Updating alias to point to new index...");
client.Alias(a => a
.Add(aa => aa.Alias(aliasName).Index(nextIndexName))
.Remove(aa => aa.Alias(aliasName).Index(currentIndexName)));
// TODO: Don't forget to delete the old index if you want
}
And the ThrowOnError extension method in case you want it:
public static T ThrowOnError<T>(this T response, string actionDescription = null) where T : IResponse
{
if (!response.IsValid)
{
throw new CustomExceptionOfYourChoice(actionDescription == null ? string.Empty : "Failed to " + actionDescription + ": " + response.ServerError.Error);
}
return response;
}
I second Ben Wilde's answer above. Better to have full control over index creation and the re-index process.
What's missing from Ben's code is support for parent/child relationships. Here is my code to fix that:
Replace the following lines:
foreach (var hit in searchResult.Hits)
{
b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
}
With this:
foreach (var hit in searchResult.Hits)
{
var jo = hit.Source as JObject;
JToken jt;
if(jo != null && jo.TryGetValue("parentId", out jt))
{
// Document is child-document => add parent reference
string parentId = (string)jt;
b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id).Parent(parentId));
}
else
{
b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
}
}

The method 'OrderBy' must be called before the method 'Skip' Exception

I was trying to implement a jqGrid using MvcJqGrid and I got this exception:
System.NotSupportedException was unhandled by user code
Message=The method 'Skip' is only supported for sorted input in LINQ to Entities. The method 'OrderBy' must be called before the method 'Skip'.
OrderBy is called before Skip, so why is the exception thrown? How can it be solved?
I encountered the exception in the controller:
public ActionResult GridDataBasic(GridSettings gridSettings)
{
var jobdescription = sm.GetJobDescription(gridSettings);
var totalJobDescription = sm.CountJobDescription(gridSettings);
var jsonData = new
{
total = totalJobDescription / gridSettings.PageSize + 1,
page = gridSettings.PageIndex,
records = totalJobDescription,
rows = (
from j in jobdescription
select new
{
id = j.JobDescriptionID,
cell = new[]
{
j.JobDescriptionID.ToString(),
j.JobTitle,
j.JobType.JobTypeName,
j.JobPriority.JobPriorityName,
j.JobType.Rate.ToString(),
j.CreationDate.ToShortDateString(),
j.JobDeadline.ToShortDateString(),
}
}).ToArray()
};
return Json(jsonData, JsonRequestBehavior.AllowGet);
}
GetJobDescription Method and CountJobDescription Method
public int CountJobDescription(GridSettings gridSettings)
{
var jobdescription = _dataContext.JobDescriptions.AsQueryable();
if (gridSettings.IsSearch)
{
jobdescription = gridSettings.Where.rules.Aggregate(jobdescription, FilterJobDescription);
}
return jobdescription.Count();
}
public IQueryable<JobDescription> GetJobDescription(GridSettings gridSettings)
{
var jobdescription = orderJobDescription(_dataContext.JobDescriptions.AsQueryable(), gridSettings.SortColumn, gridSettings.SortOrder);
if (gridSettings.IsSearch)
{
jobdescription = gridSettings.Where.rules.Aggregate(jobdescription, FilterJobDescription);
}
return jobdescription.Skip((gridSettings.PageIndex - 1) * gridSettings.PageSize).Take(gridSettings.PageSize);
}
And Finally FilterJobDescription and OrderJobDescription
private static IQueryable<JobDescription> FilterJobDescription(IQueryable<JobDescription> jobdescriptions, Rule rule)
{
if (rule.field == "JobDescriptionID")
{
int result;
if (!int.TryParse(rule.data, out result))
return jobdescriptions;
return jobdescriptions.Where(j => j.JobDescriptionID == result);
}
// Similar Statements
return jobdescriptions;
}
private IQueryable<JobDescription> orderJobDescription(IQueryable<JobDescription> jobdescriptions, string sortColumn, string sortOrder)
{
if (sortColumn == "JobDescriptionID")
return (sortOrder == "desc") ? jobdescriptions.OrderByDescending(j => j.JobDescriptionID) : jobdescriptions.OrderBy(j => j.JobDescriptionID);
return jobdescriptions;
}
The exception means that Skip always needs sorted input, even when the user has not clicked on a column to sort by. Your orderJobDescription method returns the query unsorted whenever sortColumn is anything other than "JobDescriptionID", and I imagine no sort column is specified when the grid view is first opened, before the user can click a column header. To catch this case, I suggest defining a default sort order that applies when no other sorting criterion is given, for example:
switch (sortColumn)
{
case "JobDescriptionID":
return (sortOrder == "desc")
? jobdescriptions.OrderByDescending(j => j.JobDescriptionID)
: jobdescriptions.OrderBy(j => j.JobDescriptionID);
case "JobDescriptionTitle":
return (sortOrder == "desc")
? jobdescriptions.OrderByDescending(j => j.JobDescriptionTitle)
: jobdescriptions.OrderBy(j => j.JobDescriptionTitle);
// etc.
default:
return jobdescriptions.OrderBy(j => j.JobDescriptionID);
}
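One way to make a forgotten default harder to reintroduce is to have the ordering helper return IOrderedQueryable, so the compiler rejects any branch that returns the unsorted input. The following is a rough sketch under assumptions: the Job class, the JobPaging class, and the column names are hypothetical stand-ins for the entities in the question, and it runs against an in-memory list via AsQueryable, which does not itself enforce the ordering rule the way LINQ to Entities does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the JobDescription entity.
public class Job
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public static class JobPaging
{
    // The return type is IOrderedQueryable, so every code path must sort.
    public static IOrderedQueryable<Job> ApplyOrder(IQueryable<Job> jobs, string sortColumn, string sortOrder)
    {
        bool desc = string.Equals(sortOrder, "desc", StringComparison.OrdinalIgnoreCase);
        switch (sortColumn)
        {
            case "Title":
                return desc ? jobs.OrderByDescending(j => j.Title) : jobs.OrderBy(j => j.Title);
            default:
                // Fallback when no (or an unknown) sort column is given.
                return desc ? jobs.OrderByDescending(j => j.Id) : jobs.OrderBy(j => j.Id);
        }
    }

    // Sorts first, then pages. pageIndex is 1-based, as in the question.
    public static List<Job> GetPage(IQueryable<Job> jobs, string sortColumn, string sortOrder, int pageIndex, int pageSize)
    {
        return ApplyOrder(jobs, sortColumn, sortOrder)
            .Skip((pageIndex - 1) * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

Calling JobPaging.GetPage(query, null, null, 1, 20) then pages by Id ascending instead of reaching Skip with unsorted input.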
Edit
About the follow-up problems from your comment: you cannot use ToString() in a LINQ to Entities query, and the next problem would be that you cannot create a string array in a query. I suggest loading the data from the DB in its native types and then converting to strings (and to the string array) in memory:
rows = (from j in jobdescription
select new
{
JobDescriptionID = j.JobDescriptionID,
JobTitle = j.JobTitle,
JobTypeName = j.JobType.JobTypeName,
JobPriorityName = j.JobPriority.JobPriorityName,
Rate = j.JobType.Rate,
CreationDate = j.CreationDate,
JobDeadline = j.JobDeadline
})
.AsEnumerable() // DB query runs here, the rest is in memory
.Select(a => new
{
id = a.JobDescriptionID,
cell = new[]
{
a.JobDescriptionID.ToString(),
a.JobTitle,
a.JobTypeName,
a.JobPriorityName,
a.Rate.ToString(),
a.CreationDate.ToShortDateString(),
a.JobDeadline.ToShortDateString()
}
})
.ToArray()
I had the same type of problem after sorting using some code from Adam Anderson that accepted a generic sort string in OrderBy.
After getting this exception, I did a lot of research and found this very clever fix:
var query = SelectOrders(companyNo, sortExpression);
return Queryable.Skip(query, iStartRow).Take(iPageSize).ToList();
Hope that helps !
SP

foreach or switch messing up linq query?

Trying to build up a query to filter on data in the following manner works OK, returning users filtered by whatever filters are in the FilterNamesAndValues parameter.
GetAllUsersFiltered(..., Dictionary<string,string> FilterNamesAndValues)
{
....
List<DataContracts.IUser> lstUsers = new List<DataContracts.IUser>();
....
var query = from u in lstUsers select u;
string firstName = string.Empty;
FilterNamesAndValues.TryGetValue("FirstName", out firstName);
query = query.Where(u => u.FirstName == firstName);
string company = string.Empty;
FilterNamesAndValues.TryGetValue("Company", out company);
query = query.Where(u => u.CompanyName == company);
....
return query.ToList();
}
The example below however doesn't work and I can't see why:
GetAllUsersFiltered(..., Dictionary<string,string> FilterNamesAndValues)
{
....
List<DataContracts.IUser> lstUsers = new List<DataContracts.IUser>();
....
var query = from u in lstUsers select u;
foreach (KeyValuePair<string, string> kv in FilterNamesAndValues)
{
if (kv.Value != null)
{
switch (kv.Key)
{
case "FirstName":
query = query.Where(u => u.FirstName == kv.Value);
break;
case "Company":
query = query.Where(u => u.CompanyName == kv.Value);
break;
}
}
}
return query.ToList();
}
After the application has hit the first switch case, I can do a query.ToList() and see a row in there. But by the time the execution has gone around the loop to hit the second filter, query.ToList() returns nothing. The query is not filtered successively the way it was in the first example and worse than that, the filter conditions have effectively been lost. There's probably an obvious explanation for this, but right now I can't see it.
The problem is that you're closing over kv in the foreach, but the query is executed using deferred execution. This causes it to close over the wrong value. For details on what's happening, I'd recommend Eric Lippert's post titled "Closing over the loop variable considered harmful".
You can solve this via a temporary:
foreach (KeyValuePair<string, string> kvOriginal in FilterNamesAndValues)
{
// Make a temporary in the correct scope!
KeyValuePair<string, string> kv = kvOriginal;
if (kv.Value != null)
{
switch (kv.Key)
{
case "FirstName":
query = query.Where(u => u.FirstName == kv.Value);
break;
case "Company":
query = query.Where(u => u.CompanyName == kv.Value);
break;
}
}
}
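The effect can be reproduced in isolation. Since C# 5 the foreach loop variable is a fresh variable on each iteration, so the question's code behaves as intended on current compilers, but the same trap remains for for loops and for any variable declared outside the loop. The sketch below models the old behaviour explicitly with a shared variable; the ClosureDemo class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    // Every lambda captures the same variable, and the query only runs at
    // ToList(), so all filters end up testing against the last value.
    public static List<string> Broken(IEnumerable<string> words, IEnumerable<string> required)
    {
        IEnumerable<string> query = words;
        string current = null; // one variable shared by every lambda
        foreach (var r in required)
        {
            current = r;
            query = query.Where(w => w.Contains(current));
        }
        return query.ToList();
    }

    // Each lambda captures its own copy, so the filters stay distinct.
    public static List<string> Fixed(IEnumerable<string> words, IEnumerable<string> required)
    {
        IEnumerable<string> query = words;
        foreach (var r in required)
        {
            string copy = r; // fresh variable per iteration
            query = query.Where(w => w.Contains(copy));
        }
        return query.ToList();
    }
}
```

With words { "ab", "b", "a" } and required { "a", "b" }, Broken applies the "b" filter twice and keeps "ab" and "b", while Fixed keeps only "ab".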

Resources