Lucene.net Negation clause is not working - full-text-search

I am very much new to Lucene.net and though I am not able to achieve basic functionality i.e. Not in.
My requirement is to search "road?construction" without "Works" word.
e.g.
Main Road Construction Works -- Invalid
Road Construction And Maintenance Services -- Valid (Doesn't contains word Works)
Please refer my code below.
string searchQuery = "\"road?construction\"*";
BooleanQuery query2 = new BooleanQuery();
Query query;
try
{
query = parser.Parse(searchQuery.Trim());
}
catch (ParseException)
{
query = parser.Parse(QueryParser.Escape(searchQuery.Trim()));
}
query2.Add(query,Occur.SHOULD);
query2.Add(new BooleanClause(new TermQuery (new Term("Name", "Works")), Occur.MUST_NOT));
This still gets both above mentioned record in to search result. I wish to cut invalid record(first).
Here is the result query generated in backend.
Please suggest workaround.
Thanks in advanced.

Not sure why your putting wildcard characters into the phrase. If you're looking for "road construction" then that's all you need. If you are looking to allow some variations then maybe a "slop phrase" is what you need ie. "road construction"~2. The number part allows for n "operations" like n additional words inbetween.
Here's a set of tests that show your examples (TestExpr2, TestExpr3) and some working variations (TestExpr1 and TestQuery).
Hope this helps
[TestClass]
public class UnitTest7
{
[TestMethod]
public void TestExpr1()
{
TestExpr("\"road construction\" -works");
}
[TestMethod]
public void TestExpr2()
{
TestExpr("\"road?construction\"* -works");
}
[TestMethod]
public void TestExpr3()
{
TestExpr(QueryParser.Escape("\"road?construction\"* -works"));
}
private void TestExpr(string expr)
{
var writer = CreateIndex();
Add(writer, "Main Road Construction Works");
Add(writer, "Road Construction And Maintenance Services");
writer.Flush(true, true, true);
var searcher = new IndexSearcher(writer.GetReader());
var result = Search(searcher, expr);
Assert.AreEqual(1, result.Count);
Assert.IsTrue(result.Contains("Road Construction And Maintenance Services"));
writer.Dispose();
}
[TestMethod]
public void TestQuery()
{
var writer = CreateIndex();
Add(writer, "Main Road Construction Works");
Add(writer, "Road Construction And Maintenance Services");
writer.Flush(true, true, true);
var searcher = new IndexSearcher(writer.GetReader());
var query = new BooleanQuery();
var p = new PhraseQuery();
p.Add(new Term("name", "road"));
p.Add(new Term("name", "construction"));
query.Add(p, Occur.MUST);
query.Add(new TermQuery(new Term("name", "works")), Occur.MUST_NOT);
var result = Search(searcher, query);
Assert.AreEqual(1, result.Count);
Assert.IsTrue(result.Contains("Road Construction And Maintenance Services"));
writer.Dispose();
}
private List<string> Search(IndexSearcher searcher, string expr)
{
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "name", analyzer);
var query = queryParser.Parse(expr);
return Search(searcher, query);
}
private List<string> Search(IndexSearcher searcher, Query query)
{
var collector = TopScoreDocCollector.Create(10, true);
searcher.Search(query, collector);
var result = new List<string>();
var matches = collector.TopDocs().ScoreDocs;
foreach (var item in matches)
{
var id = item.Doc;
var doc = searcher.Doc(id);
result.Add(doc.GetField("name").StringValue);
}
return result;
}
IndexWriter CreateIndex()
{
var directory = new RAMDirectory();
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
var writer = new IndexWriter(directory, analyzer, new IndexWriter.MaxFieldLength(1000));
return writer;
}
void Add(IndexWriter writer, string text)
{
var document = new Document();
document.Add(new Field("name", text, Field.Store.YES, Field.Index.ANALYZED));
writer.AddDocument(document);
}
}

Related

Parallel call to Elasticsearch in c#

I am trying to make a parallel call to Elasticsearch index for multiple queries and aggregate the result, using Parallel.ForEach.
My code:
private static List<SearchResponse<dynamic>> SearchQueryInParallel(string indexName, string lang, string[] eQueries)
{
var result = new List<SearchResponse<dynamic>>();
var exceptions = new ConcurrentQueue<Exception>();
object mutex = new object();
try
{
Parallel.ForEach(eQueries,
() => new SearchResponse<dynamic>()
, (q, loopState, subList) =>
{
var x = LowlevelClient.Search<SearchResponse<dynamic>>(indexName, $"article_{lang}", q);
subList = x;
return subList;
}, subList =>
{
lock (result)
result.Add(subList);
}
);
}
catch (AggregateException ae)
{
foreach (var e in ae.InnerExceptions)
{
exceptions.Enqueue(e);
}
}
if (exceptions.ToList().Any())
{
//there are exceptions, do something with them
//do something?
}
return result;
}
The problem I am facing is that sublist in the above case is long.
It gives me the following error:
Can not convert from SearchResponse to long.
The same thing is working when I used without multithreading, the code is:
var items = new List<dynamic>();
var searchResponse = lowLevelClient.Search<SearchResponse<dynamic>>(elasticIndexName, $"article_{languageCode.ToLowerInvariant()}", query);
foreach (var document in searchResponse.Body.Documents)
{
items.Add(document);
}
Any help, please? If somebody has any other way to achieve the parallel call and aggregating data from returned values would be greatly appreciated.

Mock IDocumentQuery with ability to use query expressions

I need to be able to mock IDocumentQuery, to be able to test piece of code, that queries document collection and might use predicate to filter them:
IQueryable<T> documentQuery = client
.CreateDocumentQuery<T>(collectionUri, options);
if (predicate != null)
{
documentQuery = documentQuery.Where(predicate);
}
var list = documentQuery.AsDocumentQuery();
var documents = new List<T>();
while (list.HasMoreResults)
{
documents.AddRange(await list.ExecuteNextAsync<T>());
}
I've used answer from https://stackoverflow.com/a/49911733/212121 to write following method:
public static IDocumentClient Create<T>(params T[] collectionDocuments)
{
var query = Substitute.For<IFakeDocumentQuery<T>>();
var provider = Substitute.For<IQueryProvider>();
provider
.CreateQuery<T>(Arg.Any<Expression>())
.Returns(x => query);
query.Provider.Returns(provider);
query.ElementType.Returns(collectionDocuments.AsQueryable().ElementType);
query.Expression.Returns(collectionDocuments.AsQueryable().Expression);
query.GetEnumerator().Returns(collectionDocuments.AsQueryable().GetEnumerator());
query.ExecuteNextAsync<T>().Returns(x => new FeedResponse<T>(collectionDocuments));
query.HasMoreResults.Returns(true, false);
var client = Substitute.For<IDocumentClient>();
client
.CreateDocumentQuery<T>(Arg.Any<Uri>(), Arg.Any<FeedOptions>())
.Returns(query);
return client;
}
Which works fine as long as there's no filtering using IQueryable.Where.
My question:
Is there any way to capture predicate, that was used to create documentQuery and apply that predicate on collectionDocuments parameter?
Access the expression from the query provider so that it will be passed on to the backing collection to apply the desired filter.
Review the following
public static IDocumentClient Create<T>(params T[] collectionDocuments) {
var query = Substitute.For<IFakeDocumentQuery<T>>();
var queryable = collectionDocuments.AsQueryable();
var provider = Substitute.For<IQueryProvider>();
provider.CreateQuery<T>(Arg.Any<Expression>())
.Returns(x => {
var expression = x.Arg<Expression>();
if (expression != null) {
queryable = queryable.Provider.CreateQuery<T>(expression);
}
return query;
});
query.Provider.Returns(_ => provider);
query.ElementType.Returns(_ => queryable.ElementType);
query.Expression.Returns(_ => queryable.Expression);
query.GetEnumerator().Returns(_ => queryable.GetEnumerator());
query.ExecuteNextAsync<T>().Returns(x => new FeedResponse<T>(query));
query.HasMoreResults.Returns(true, true, false);
var client = Substitute.For<IDocumentClient>();
client
.CreateDocumentQuery<T>(Arg.Any<Uri>(), Arg.Any<FeedOptions>())
.Returns(query);
return client;
}
The important part is where the expression passed to the query is used to create another query on the backing data source (the array).
Using the following example subject under test for demonstration purposes.
public class SubjectUnderTest {
private readonly IDocumentClient client;
public SubjectUnderTest(IDocumentClient client) {
this.client = client;
}
public async Task<List<T>> Query<T>(Expression<Func<T, bool>> predicate = null) {
FeedOptions options = null; //for dummy purposes only
Uri collectionUri = null; //for dummy purposes only
IQueryable<T> documentQuery = client.CreateDocumentQuery<T>(collectionUri, options);
if (predicate != null) {
documentQuery = documentQuery.Where(predicate);
}
var list = documentQuery.AsDocumentQuery();
var documents = new List<T>();
while (list.HasMoreResults) {
documents.AddRange(await list.ExecuteNextAsync<T>());
}
return documents;
}
}
The following sample tests when an expression is passed to the query
[TestMethod]
public async Task Should_Filter_DocumentQuery() {
//Arrange
var dataSource = Enumerable.Range(0, 3)
.Select(_ => new Document() { Key = _ }).ToArray();
var client = Create(dataSource);
var subject = new SubjectUnderTest(client);
Expression<Func<Document, bool>> predicate = _ => _.Key == 1;
var expected = dataSource.Where(predicate.Compile());
//Act
var actual = await subject.Query<Document>(predicate);
//Assert
actual.Should().BeEquivalentTo(expected);
}
public class Document {
public int Key { get; set; }
}

Dynamics CRM Solution Import via SDK is not working

I have the below code that imports a solution into CRM Dynamics.
The code executes successfully and the import job data returns a result of success. How ever when I look for the solution in Settings->Solutions it is not there. Can anyone suggest a fix?
private void ImportSolution(string solutionPath)
{
byte[] fileBytes = File.ReadAllBytes(solutionPath);
var request = new ImportSolutionRequest()
{
CustomizationFile = fileBytes,
ImportJobId = Guid.NewGuid()
};
var response = _settings.DestinationSourceOrgService.Execute(request);
var improtJob = new ImportJob(_settings);
var importJobResult = improtJob.GetImportJob(request.ImportJobId);
var data = importJobResult.Attributes["data"].ToString();
var jobData = new ImportJobData(data);
var filePath = $#"{this._settings.SolutionExportDirectory}\Logs\";
var fileName = $#"{filePath}{jobData.SolutionName}.xml";
Directory.CreateDirectory(filePath);
File.WriteAllText(fileName, data);
PrintResult(jobData.Result, jobData.SolutionName);
}
public class ImportJob
{
private readonly ConfigurationSettings _settings;
public ImportJob(ConfigurationSettings settings)
{
_settings = settings;
}
public Entity GetImportJob(Guid importJobId)
{
var query = new QueryExpression
{
EntityName = "importjob",
ColumnSet = new ColumnSet("importjobid", "data", "solutionname"),
Criteria = new FilterExpression()
};
var result = _settings.DestinationSourceOrgService.Retrieve("importjob", importJobId, new ColumnSet("importjobid", "data", "solutionname", "progress"));
return result;
}
}
Thre ImportSolutionResponse does not contain any information as per the screen shot below.

File upload example for grapevine

I am new to Web API and REST services and looking to build a simple REST server which accepts file uploads. I found out grapevine which is simple and easy to understand. I couldn't find any file upload example?
This is an example using System.Web.Http
var streamProvider = new MultipartFormDataStreamProvider(ServerUploadFolder);
await Request.Content.ReadAsMultipartAsync(streamProvider);
but the grapevine Request property does not have any method to do that. Can someone point me to an example?
If you are trying to upload a file as a binary payload, see this question/answer on GitHub.
If you are trying to upload a file from a form submission, that will be a little bit trickier, as the multi-part payload parsers haven't been added yet, but it is still possible.
The following code sample is complete untested, and I just wrote this off the top of my head, so it might not be the best solution, but it's a starting point:
public static class RequestExtensions
{
public static IDictionary<string, string> ParseFormUrlEncoded(this IHttpRequest request)
{
var data = new Dictionary<string, string>();
foreach (var tuple in request.Payload.Split('='))
{
var parts = tuple.Split('&');
var key = Uri.UnescapeDataString(parts[0]);
var val = Uri.UnescapeDataString(parts[1]);
if (!data.ContainsKey(key)) data.Add(key, val);
}
return data;
}
public static IDictionary<string, FormElement> ParseFormData(this IHttpRequest request)
{
var data = new Dictionary<string, FormElement>();
var boundary = GetBoundary(request.Headers.Get("Content-Type"));
if (boundary == null) return data;
foreach (var part in request.Payload.Split(new[] { boundary }, StringSplitOptions.RemoveEmptyEntries))
{
var element = new FormElement(part);
if (!data.ContainsKey(element.Name)) data.Add(element.Name, element);
}
return data;
}
private static string GetBoundary(string contenttype)
{
if (string.IsNullOrWhiteSpace(contenttype)) return null;
return (from part in contenttype.Split(';', ',')
select part.TrimStart().TrimEnd().Split('=')
into parts
where parts[0].Equals("boundary", StringComparison.CurrentCultureIgnoreCase)
select parts[1]).FirstOrDefault();
}
}
public class FormElement
{
public string Name => _dispositionParams["name"];
public string FileName => _dispositionParams["filename"];
public Dictionary<string, string> Headers { get; private set; }
public string Value { get; }
private Dictionary<string, string> _dispositionParams;
public FormElement(string data)
{
var parts = data.Split(new [] { "\r\n\r\n", "\n\n" }, StringSplitOptions.None);
Value = parts[1];
ParseHeaders(parts[0]);
ParseParams(Headers["Content-Disposition"]);
}
private void ParseHeaders(string data)
{
Headers = data.TrimStart().TrimEnd().Split(new[] {"\r\n", "\n"}, StringSplitOptions.RemoveEmptyEntries).Select(header => header.Split(new[] {':'})).ToDictionary(parts => parts[0].TrimStart().TrimEnd(), parts => parts[1].TrimStart().TrimEnd());
}
private void ParseParams(string data)
{
_dispositionParams = new Dictionary<string, string>();
foreach (var part in data.Split(new[] {';'}))
{
if (part.IndexOf("=") == -1) continue;
var parts = part.Split(new[] {'='});
_dispositionParams.Add(parts[0].TrimStart(' '), parts[1].TrimEnd('"').TrimStart('"'));
}
}
}
If you are looking for something async to use immediately, you can try to implement the answer to this stackoverflow question, which has not been tested by me.

Entity Framework, Code First and Full Text Search

I realize that a lot of questions have been asked relating to full text search and Entity Framework, but I hope this question is a bit different.
I am using Entity Framework, Code First and need to do a full text search. When I need to perform the full text search, I will typically have other criteria/restrictions as well - like skip the first 500 rows, or filter on another column, etc.
I see that this has been handled using table valued functions - see http://sqlblogcasts.com/blogs/simons/archive/2008/12/18/LINQ-to-SQL---Enabling-Fulltext-searching.aspx. And this seems like the right idea.
Unfortunately, table valued functions are not supported until Entity Framework 5.0 (and even then, I believe, they are not supported for Code First).
My real question is what are the suggestions for the best way to handle this, both for Entity Framework 4.3 and Entity Framework 5.0. But to be specific:
Other than dynamic SQL (via System.Data.Entity.DbSet.SqlQuery, for example), are there any options available for Entity Framework 4.3?
If I upgrade to Entity Framework 5.0, is there a way I can use table valued functions with code first?
Thanks,
Eric
Using interceptors introduced in EF6, you could mark the full text search in linq and then replace it in dbcommand as described in http://www.entityframework.info/Home/FullTextSearch:
public class FtsInterceptor : IDbCommandInterceptor
{
private const string FullTextPrefix = "-FTSPREFIX-";
public static string Fts(string search)
{
return string.Format("({0}{1})", FullTextPrefix, search);
}
public void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
{
}
public void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
{
}
public void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
{
RewriteFullTextQuery(command);
}
public void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
{
}
public void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
{
RewriteFullTextQuery(command);
}
public void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
{
}
public static void RewriteFullTextQuery(DbCommand cmd)
{
string text = cmd.CommandText;
for (int i = 0; i < cmd.Parameters.Count; i++)
{
DbParameter parameter = cmd.Parameters[i];
if (parameter.DbType.In(DbType.String, DbType.AnsiString, DbType.StringFixedLength, DbType.AnsiStringFixedLength))
{
if (parameter.Value == DBNull.Value)
continue;
var value = (string)parameter.Value;
if (value.IndexOf(FullTextPrefix) >= 0)
{
parameter.Size = 4096;
parameter.DbType = DbType.AnsiStringFixedLength;
value = value.Replace(FullTextPrefix, ""); // remove prefix we added n linq query
value = value.Substring(1, value.Length - 2);
// remove %% escaping by linq translator from string.Contains to sql LIKE
parameter.Value = value;
cmd.CommandText = Regex.Replace(text,
string.Format(
#"\[(\w*)\].\[(\w*)\]\s*LIKE\s*#{0}\s?(?:ESCAPE N?'~')",
parameter.ParameterName),
string.Format(#"contains([$1].[$2], #{0})",
parameter.ParameterName));
if (text == cmd.CommandText)
throw new Exception("FTS was not replaced on: " + text);
text = cmd.CommandText;
}
}
}
}
}
static class LanguageExtensions
{
public static bool In<T>(this T source, params T[] list)
{
return (list as IList<T>).Contains(source);
}
}
For example, if you have class Note with FTS-indexed field NoteText:
public class Note
{
public int NoteId { get; set; }
public string NoteText { get; set; }
}
and EF map for it
public class NoteMap : EntityTypeConfiguration<Note>
{
public NoteMap()
{
// Primary Key
HasKey(t => t.NoteId);
}
}
and context for it:
public class MyContext : DbContext
{
static MyContext()
{
DbInterception.Add(new FtsInterceptor());
}
public MyContext(string nameOrConnectionString) : base(nameOrConnectionString)
{
}
public DbSet<Note> Notes { get; set; }
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder.Configurations.Add(new NoteMap());
}
}
you can have quite simple syntax to FTS query:
class Program
{
static void Main(string[] args)
{
var s = FtsInterceptor.Fts("john");
using (var db = new MyContext("CONNSTRING"))
{
var q = db.Notes.Where(n => n.NoteText.Contains(s));
var result = q.Take(10).ToList();
}
}
}
That will generate SQL like
exec sp_executesql N'SELECT TOP (10)
[Extent1].[NoteId] AS [NoteId],
[Extent1].[NoteText] AS [NoteText]
FROM [NS].[NOTES] AS [Extent1]
WHERE contains([Extent1].[NoteText], #p__linq__0)',N'#p__linq__0 char(4096)',#p__linq__0='(john)
Please notice that you should use local variable and cannot move FTS wrapper inside expression like
var q = db.Notes.Where(n => n.NoteText.Contains(FtsInterceptor.Fts("john")));
I have found that the easiest way to implement this is to setup and configure full-text-search in SQL Server and then use a stored procedure. Pass your arguments to SQL, allow the DB to do its job and return either a complex object or map the results to an entity. You don't necessarily have to have dynamic SQL, but it may be optimal. For example, if you need paging, you could pass in PageNumber and PageSize on every request without the need for dynamic SQL. However, if the number of arguments fluctuates per query, it will be the optimal solution.
As the other guys mentioned, I would say start using Lucene.NET
Lucene has a pretty high learning curve, but I found an wrapper for it called "SimpleLucene", that can be found on CodePlex
Let me quote a couple of codeblocks from the blog to show you how easy it is to use. I've just started to use it, but got the hang of it really fast.
First, get some entities from your repository, or in your case, use Entity Framework
public class Repository
{
public IList<Product> Products {
get {
return new List<Product> {
new Product { Id = 1, Name = "Football" },
new Product { Id = 2, Name = "Coffee Cup"},
new Product { Id = 3, Name = "Nike Trainers"},
new Product { Id = 4, Name = "Apple iPod Nano"},
new Product { Id = 5, Name = "Asus eeePC"},
};
}
}
}
The next thing you want to do is create an index-definition
public class ProductIndexDefinition : IIndexDefinition<Product> {
public Document Convert(Product p) {
var document = new Document();
document.Add(new Field("id", p.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
document.Add(new Field("name", p.Name, Field.Store.YES, Field.Index.ANALYZED));
return document;
}
public Term GetIndex(Product p) {
return new Term("id", p.Id.ToString());
}
}
and create an search index for it.
var writer = new DirectoryIndexWriter(
new DirectoryInfo(#"c:\index"), true);
var service = new IndexService();
service.IndexEntities(writer, Repository().Products, ProductIndexDefinition());
So, you now have an search-able index. The only remaining thing to do is.., searching! You can do pretty amazing things, but it can be as easy as this: (for greater examples see the blog or the documentation on codeplex)
var searcher = new DirectoryIndexSearcher(
new DirectoryInfo(#"c:\index"), true);
var query = new TermQuery(new Term("name", "Football"));
var searchService = new SearchService();
Func<Document, ProductSearchResult> converter = (doc) => {
return new ProductSearchResult {
Id = int.Parse(doc.GetValues("id")[0]),
Name = doc.GetValues("name")[0]
};
};
IList<Product> results = searchService.SearchIndex(searcher, query, converter);
The example here http://www.entityframework.info/Home/FullTextSearch is not complete solution. You will need to look into understand how the full text search works. Imagine you have a search field and the user types 2 words to hit search. The above code will throw an exception. You need to do pre-processing on the search phrase first to pass it to the query by using logical AND or OR.
for example your search phrase is "blah blah2" then you need to convert this into:
var searchTerm = #"\"blah\" AND/OR \"blah2\" ";
Complete solution would be:
value = Regex.Replace(value, #"\s+", " "); //replace multiplespaces
value = Regex.Replace(value, #"[^a-zA-Z0-9 -]", "").Trim();//remove non-alphanumeric characters and trim spaces
if (value.Any(Char.IsWhiteSpace))
{
value = PreProcessSearchKey(value);
}
public static string PreProcessSearchKey(string searchKey)
{
var splitedKeyWords = searchKey.Split(null); //split from whitespaces
// string[] addDoubleQuotes = new string[splitedKeyWords.Length];
for (int j = 0; j < splitedKeyWords.Length; j++)
{
splitedKeyWords[j] = $"\"{splitedKeyWords[j]}\"";
}
return string.Join(" AND ", splitedKeyWords);
}
this methods uses AND logic operator. You might pass that as an argument and use the method for both AND or OR operators.
You must escape none-alphanumeric characters otherwise it would throw exception when a user enters alpha numeric characters and you have no server site model level validation in place.
I recently had a similar requirement and ended up writing an IQueryable extension specifically for Microsoft full text index access, its available here IQueryableFreeTextExtensions

Resources