SolrNet ANDs and ORs with SolrQuery objects and eDisMax - solrnet

I have been building Solr queries manually as strings and passing them to SolrNet. The queries can be complicated combinations of ANDs and ORs like this:
_query_:"field1:[1 TO 10] OR
field2:[1 TO 10] OR
field3:[1 TO 10]"
AND
_query_:"field4:(keyword)"
AND
_query_:"field5:(keyword)"
This was working well, but looking into the API for SolrNet, I see there are objects I could use for each clause and then pull these objects together to form the complete query. I would much rather implement this with that approach than build and concatenate the strings.
(I should mention that I am using the eDisMax parser, which allows me to use the _query_ field as you see above.)
The API is well documented for ANDs and ORs, but I need to have the ANDs and ORs grouped to handle situations like the one above: things like (a OR b) AND (c OR d). Has anyone done this before with SolrNet? Thanks!
UPDATE: I found an example that I think combines ANDs and ORs with parentheses here. Unfortunately, this assumes that I know the structure of the query in advance. Instead, I will be creating a SolrNet query dynamically based on user input, so I can't hardcode a pattern like (a) && (b || c).

The following code should give you what you are looking for:
var queryList = new List<ISolrQuery>();
if (condition1)
    queryList.Add(new SolrMultipleCriteriaQuery(new List<ISolrQuery>
    {
        new SolrQueryByRange<decimal>("field1", 1, 10),
        new SolrQueryByRange<decimal>("field2", 1, 10),
        new SolrQueryByRange<decimal>("field3", 1, 10)
    }, "OR"));
if (condition2)
    queryList.Add(new SolrQueryByField("field4", keyword));
if (condition3)
    queryList.Add(new SolrQueryByField("field5", keyword));
var finalQuery = new SolrMultipleCriteriaQuery(queryList, "AND");
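Because a composite query can hold other composite queries as children, a structure of any depth can be assembled at run time. The following is a minimal JavaScript sketch of that idea only, not SolrNet code: term, group and render are hypothetical helpers that build a small tree and print it as a parenthesised string, so you can see how a grouping like (a OR b) AND c falls out of nesting. It makes no claim about how SolrNet itself serializes queries.

```javascript
// Hypothetical helpers, for illustration only (not part of SolrNet).
const term = (field, value) => ({ kind: 'term', field, value });
const group = (op, clauses) => ({ kind: 'group', op, clauses });

// Render a query tree as a string, parenthesising every nested group.
function render(node) {
  if (node.kind === 'term') return node.field + ':' + node.value;
  const parts = node.clauses.map(render);
  if (parts.length === 1) return parts[0];
  return '(' + parts.join(' ' + node.op + ' ') + ')';
}

// Clauses are collected at run time, for example from user input.
const clauses = [];
clauses.push(group('OR', [
  term('field1', '[1 TO 10]'),
  term('field2', '[1 TO 10]'),
  term('field3', '[1 TO 10]'),
]));
clauses.push(term('field4', 'keyword'));
clauses.push(term('field5', 'keyword'));

const finalQuery = render(group('AND', clauses));
console.log(finalQuery);
```

The point of the sketch is that the list of clauses is just data: each entry can itself be a group, so nothing about the shape has to be hardcoded.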


How to get documents that contain sub-string in FaunaDB

I'm trying to retrieve all the tasks documents that have the string first in their name.
I currently have the following code, but it only works if I pass the exact name:
res, err := db.client.Query(
f.Map(
f.Paginate(f.MatchTerm(f.Index("tasks_by_name"), "My first task")),
f.Lambda("ref", f.Get(f.Var("ref"))),
),
)
I think I can use ContainsStr() somewhere, but I don't know how to use it in my query.
Also, is there a way to do it without using Filter()? I ask because it seems like it filters after the pagination, and it messes up with the pages
FaunaDB provides a lot of constructs. This makes it powerful, but it also means you have a lot to choose from. With great power comes a small learning curve :).
How to read the code samples
To be clear, I use the JavaScript flavor of FQL here and typically expose the FQL functions from the JavaScript driver as follows:
const faunadb = require('faunadb')
const q = faunadb.query
const {
Not,
Abort,
...
} = q
You do have to be careful when exposing Map like that, since it will conflict with JavaScript's own Map. In that case, you could just use q.Map.
Option 1: using ContainsStr() & Filter
Basic usage according to the docs
ContainsStr('Fauna', 'a')
Of course, this works on a specific value so in order to make it work you need Filter and Filter only works on paginated sets. That means that we first need to get a paginated set. One way to get a paginated set of documents is:
q.Map(
Paginate(Documents(Collection('tasks'))),
Lambda(['ref'], Get(Var('ref')))
)
But we can do that more efficiently since one get === one read and we don't need the docs, we'll be filtering out a lot of them. It's interesting to know that one index page is also one read so we can define an index as follows:
{
name: "tasks_name_and_ref",
unique: false,
serialized: true,
source: "tasks",
terms: [],
values: [
{
field: ["data", "name"]
},
{
field: ["ref"]
}
]
}
And since we added name and ref to the values, the index will return pages of name and ref which we can then use to filter. We can, for example, do something similar with indexes, map over them and this will return us an array of booleans.
Map(
Paginate(Match(Index('tasks_name_and_ref'))),
Lambda(['name', 'ref'], ContainsStr(Var('name'), 'first'))
)
Since Filter also works on arrays, we can simply replace Map with Filter. We'll also add LowerCase to ignore casing, and we have what we need:
Filter(
Paginate(Match(Index('tasks_name_and_ref'))),
Lambda(['name', 'ref'], ContainsStr(LowerCase(Var('name')), 'first'))
)
In my case, the result is:
{
"data": [
[
"Firstly, we'll have to go and refactor this!",
Ref(Collection("tasks"), "267120709035098631")
],
[
"go to a big rock-concert abroad, but let's not dive in headfirst",
Ref(Collection("tasks"), "267120846106001926")
],
[
"The first thing to do is dance!",
Ref(Collection("tasks"), "267120677201379847")
]
]
}
Filter and reduced page sizes
As you mentioned, this is not exactly what you want, since it also means that if you request pages of 500 in size, elements might be filtered out and you might end up with a page of size 3, then one of 7. You might think: why can't I just get my filtered elements in pages? Well, the current behaviour is a good idea for performance reasons, since Filter basically checks each value. Imagine you have a massive collection and filter out 99.99 percent. You might have to loop over many elements to get to 500, which all cost reads. We want pricing to be predictable :).
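To make the effect concrete, here is a small plain-JavaScript simulation (no FaunaDB involved). The task names and the page size of 4 are made up for illustration, and paginate is a hypothetical helper. It pages first and filters second, which is the order described above:

```javascript
// Hypothetical sample data, for illustration only.
const names = [
  'buy milk', 'first draft', 'call mom', 'pay rent',
  'headfirst dive', 'walk dog', 'the first step', 'read book',
];

// Split an array into pages of a fixed size.
function paginate(items, size) {
  const pages = [];
  for (let i = 0; i < items.length; i += size) {
    pages.push(items.slice(i, i + size));
  }
  return pages;
}

// Filtering happens per page, after each page has already been read.
const filteredPages = paginate(names, 4).map(page =>
  page.filter(name => name.toLowerCase().includes('first'))
);

// Page sizes after filtering are no longer the requested size.
const pageSizes = filteredPages.map(page => page.length);
console.log(pageSizes);
```

Every page still costs a full read of 4 entries, yet the caller receives fewer results per page, and a different number each time.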
Option 2: indexes!
Each time you want to do something more efficient, the answer lies in indexes. FaunaDB provides you with the raw power to implement different search strategies but you'll have to be a bit creative and I'm here to help you with that :).
Bindings
In index bindings, you can transform the attributes of your document. In our first attempt we will split the string into words (I'll implement multiple approaches since I'm not entirely sure which kind of matching you want).
We do not have a string split function, but since FQL is easily extended, we can write it ourselves and bind it to a variable in our host language (in this case JavaScript), or use one from this community-driven library: https://github.com/shiftx/faunadb-fql-lib
function StringSplit(string: ExprArg, delimiter = " ") {
  return If(
    Not(IsString(string)),
    Abort("StringSplit only accepts strings"),
    // Match every run of characters that is not the delimiter, then lower-case each match
    q.Map(
      FindStrRegex(string, Concat(["[^\\", delimiter, "]+"])),
      Lambda("res", LowerCase(Select(["data"], Var("res"))))
    )
  )
}
And use it in our binding.
CreateIndex({
name: 'tasks_by_words',
source: [
{
collection: Collection('tasks'),
fields: {
words: Query(Lambda('task', StringSplit(Select(['data', 'name'], Var('task')))))
}
}
],
terms: [
{
binding: 'words'
}
]
})
Hint: if you are not sure whether you have got it right, you can always put the binding in values instead of terms, and then you'll see in the Fauna dashboard whether your index actually contains values.
What did we do? We just wrote a binding that transforms the value into an array of values at the time a document is written. When you index an array of a document in FaunaDB, these values are indexed separately yet all point to the same document, which will be very useful for our search implementation.
We can now find tasks that contain the string 'first' as one of their words by using the following query:
q.Map(
Paginate(Match(Index('tasks_by_words'), 'first')),
Lambda('ref', Get(Var('ref')))
)
Which will give me the document with name:
"The first thing to do is dance!"
The other two documents didn't contain the exact words, so how do we do that?
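To see why only one of the three example names matched, here is a plain-JavaScript sketch of what a word index amounts to (this is not FQL). The numeric ids are stand-ins for the document refs, and splitWords is a hypothetical helper that mirrors the StringSplit binding by splitting on spaces and lower-casing:

```javascript
// The three example names from above; ids 1..3 are hypothetical stand-ins for refs.
const tasks = {
  1: "Firstly, we'll have to go and refactor this!",
  2: "go to a big rock-concert abroad, but let's not dive in headfirst",
  3: 'The first thing to do is dance!',
};

// Split on spaces and lower-case, mirroring the StringSplit binding.
const splitWords = (s) => s.toLowerCase().split(' ').filter(w => w.length > 0);

// Build the index: word -> set of ids of the documents containing that word.
const wordIndex = new Map();
for (const [id, name] of Object.entries(tasks)) {
  for (const word of splitWords(name)) {
    if (!wordIndex.has(word)) wordIndex.set(word, new Set());
    wordIndex.get(word).add(id);
  }
}

// A lookup is an exact match on a whole word.
const hits = [...(wordIndex.get('first') || [])];
console.log(hits);
```

The other two names only produce the index entries "firstly," and "headfirst", so an exact lookup of "first" never reaches them.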
Option 3: indexes and Ngram (exact contains matching)
To make exact contains matching efficient, you need to use a function called 'NGram' (still undocumented, since we'll make it easier in the future). Dividing a string into ngrams is a search technique that is often used under the hood in other search engines. In FaunaDB we can easily apply it thanks to the power of indexes and bindings. The Fwitter example has an example in its source code that does autocompletion. That example won't work for your use case, but I do reference it for other users, since it's meant for autocompleting short strings, not for searching for a short string in a longer string like a task.
We'll adapt it though for your use-case. When it comes to searching it's all a tradeoff of performance and storage and in FaunaDB users can choose their tradeoff. Note that in the previous approach, we stored each word separately, with Ngrams we'll split words even further to provide some form of fuzzy matching. The downside is that the index size might become very big if you make the wrong choice (this is equally true for search engines, hence why they let you define different algorithms).
What NGram essentially does is get substrings of a string of a certain length.
For example:
NGram('lalala', 3, 3)
Will return the substrings of length 3: 'lal', 'ala', 'lal', 'ala'.
If we know that we won't be searching for strings longer than a certain length, let's say length 10 (it's a tradeoff: increasing the size will increase the storage requirements but allow you to query for longer strings), you can write the following Ngram generator.
function GenerateNgrams(Phrase) {
  return Distinct(
    Union(
      Let(
        {
          // Reduce this array if you want fewer ngrams per word.
          indexes: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
          indexesFiltered: Filter(
            Var('indexes'),
            // keep only the lengths greater than 0
            Lambda('l', GT(Var('l'), 0))
          ),
          // Phrase is a JavaScript argument (an FQL expression), so it is used directly, not through Var
          ngramsArray: q.Map(Var('indexesFiltered'), Lambda('l', NGram(LowerCase(Phrase), Var('l'), Var('l'))))
        },
        Var('ngramsArray')
      )
    )
  )
}
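Outside FQL, the idea behind the generator can be sketched in plain JavaScript. This is an approximation for illustration, not the FQL built-in: ngrams and generateNgrams are hypothetical helpers, and the maximum length of 9 simply mirrors the array of lengths used above.

```javascript
// All substrings of exactly `size` characters, in order of appearance.
function ngrams(text, size) {
  const out = [];
  for (let i = 0; i + size <= text.length; i++) {
    out.push(text.slice(i, i + size));
  }
  return out;
}

// All distinct lower-cased n-grams for lengths 1..maxLen.
function generateNgrams(phrase, maxLen = 9) {
  const all = new Set();
  const lower = phrase.toLowerCase();
  for (let len = 1; len <= maxLen; len++) {
    for (const gram of ngrams(lower, len)) all.add(gram);
  }
  return [...all];
}

console.log(ngrams('lalala', 3));
console.log(generateNgrams('The first thing to do is dance!').includes('first'));
```

Because every substring up to the maximum length becomes an index term, any search string of that length or shorter is an exact term lookup. That is also why the index grows quickly with the maximum length.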
You can then write your index as follows:
CreateIndex({
name: 'tasks_by_ngrams_exact',
// we actually want to sort to get the shortest word that matches first
source: [
{
// If your collections have the same property that you want to access, you can pass a list to the collection
collection: [Collection('tasks')],
fields: {
wordparts: Query(Lambda('task', GenerateNgrams(Select(['data', 'name'], Var('task')))))
}
}
],
terms: [
{
binding: 'wordparts'
}
]
})
And you have an index backed search where your pages are the size you requested.
q.Map(
Paginate(Match(Index('tasks_by_ngrams_exact'), 'first')),
Lambda('ref', Get(Var('ref')))
)
Option 4: indexes and Ngrams of size 3 or trigrams (Fuzzy matching)
If you want fuzzy searching, trigrams are often used. In this case our index will be simple, so we're not going to use an external function.
CreateIndex({
name: 'tasks_by_ngrams',
source: {
collection: Collection('tasks'),
fields: {
ngrams: Query(Lambda('task', Distinct(NGram(LowerCase(Select(['data', 'name'], Var('task'))), 3, 3))))
}
},
terms: [
{
binding: 'ngrams'
}
]
})
If we placed the binding in values again to see what comes out, we would see the individual trigrams of each task name.
In this approach, we use trigrams on both the indexing side and the querying side. On the querying side, that means that the word 'first' which we search for will also be divided into trigrams: 'fir', 'irs' and 'rst'.
For example, we can now do a fuzzy search as follows:
q.Map(
Paginate(Union(q.Map(NGram('first', 3, 3), Lambda('ngram', Match(Index('tasks_by_ngrams'), Var('ngram')))))),
Lambda('ref', Get(Var('ref')))
)
In this case, we actually do 3 searches: we search for each of the trigrams and union the results, which returns all sentences that contain first.
But if we had misspelled it and written frst, we would still match all three, since there is a trigram (rst) that matches.
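The union-of-trigrams idea can be checked with a plain-JavaScript sketch (not FQL). The array positions stand in for document refs, and trigrams and fuzzySearch are hypothetical helpers written for this illustration:

```javascript
// The three example names from the earlier result.
const tasks = [
  "Firstly, we'll have to go and refactor this!",
  "go to a big rock-concert abroad, but let's not dive in headfirst",
  'The first thing to do is dance!',
];

// Distinct lower-cased substrings of length 3.
function trigrams(text) {
  const lower = text.toLowerCase();
  const out = new Set();
  for (let i = 0; i + 3 <= lower.length; i++) out.add(lower.slice(i, i + 3));
  return out;
}

// Index: trigram -> set of task positions.
const index = new Map();
tasks.forEach((name, pos) => {
  for (const gram of trigrams(name)) {
    if (!index.has(gram)) index.set(gram, new Set());
    index.get(gram).add(pos);
  }
});

// Search: union of the matches for every trigram of the search term.
function fuzzySearch(term) {
  const found = new Set();
  for (const gram of trigrams(term)) {
    for (const pos of index.get(gram) || []) found.add(pos);
  }
  return [...found].sort((a, b) => a - b);
}

console.log(fuzzySearch('first'));
console.log(fuzzySearch('frst'));
```

Since a single shared trigram is enough for a match, this approach trades precision for tolerance. Requiring several trigrams to match would make it stricter.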

Power Query - Multiple OR statement with values

I've been doing research on this and I find a plethora of articles related to Text, but they don't seem to be working for me.
To be clear this formula works, I'm just looking to make it more efficient. My formula looks like:
if [organization_id] = 1 or [organization_id] = 2 or [organization_id] = 3 then "North" else if … where organization_id is of type "WholeNumber"
I'd like to simplify this by doing something like:
if [organization_id] in {1, 2, 3} then "North" else if …
I've tried wrapping in Parenthesis, Braces, & Brackets. Nothing seems to work. Most articles are using some form of text.replace function and mine is just a custom column.
Does MCode within Power Query have any efficiencies like this or do I have to write out each individual statement like the first line?
I've had success with a List.Contains formulation:
List.Contains({1,2,3}, [organization_id])
The above checks if [organization_id] is in the list supplied in the first argument.
In some cases, you may not want to hardcode a list as shown above but reference a table column instead. For example,
List.Contains(TableWithDesiredIds[id_column], [organization_id])

Linq and lambda expression

What is the difference between LINQ and Lambda Expressions? Are there any advantages to using lambda instead of linq queries?
Linq is language integrated query. When using linq, a small anonymous function is often used as a parameter. That small anonymous function is a lambda expression.
var q = someList.Where(a => a > 7);
In the above query a => a > 7 is a lambda expression. It's the equivalent of writing a small utility method and passing that to Where:
bool smallMethod(int value)
{
return value > 7;
}
// Inside another function:
var q = someList.Where(smallMethod);
This means that your question is really not possible to answer. Linq and lambdas are not interchangeable, rather lambdas are one of the technologies used to implement linq.
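The same relationship shows up outside C#. As a rough JavaScript analogy (an assumption made purely for illustration, with Array.prototype.filter standing in for LINQ's Where), the arrow function plays the role of the lambda and filter is the query operator that consumes it:

```javascript
const someList = [3, 8, 12, 5];

// A named helper, like smallMethod in the C# example.
function greaterThanSeven(value) {
  return value > 7;
}

// The query operator is the same; only the way the predicate is written differs.
const withLambda = someList.filter(a => a > 7);
const withNamedFunction = someList.filter(greaterThanSeven);

console.log(withLambda, withNamedFunction);
```

Both calls produce the same result, which is why asking whether to use the operator or the lambda is not a meaningful choice: one is passed to the other.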
LINQ is Language Integrated Query, whereas lambda expressions are similar to the anonymous methods of .NET 2.0.
You can't really compare them. Maybe you are confused because LINQ is associated with lambda expressions most of the time.
You need to see this article: Basics of LINQ & Lamda Expressions
EDIT: (I am not so sure, but maybe you are looking for the difference between Query Syntax and Method Syntax)
int[] numbers = { 5, 10, 8, 3, 6, 12};
//Query syntax:
IEnumerable<int> numQuery1 =
from num in numbers
where num % 2 == 0
orderby num
select num;
//Method syntax:
IEnumerable<int> numQuery2 = numbers.Where(num => num % 2 == 0).OrderBy(n => n);
In the above example taken from MSDN, Method Syntax contains a lambda expression (num => num % 2 == 0) which works like a method: it takes a number as input and returns true if it is even.
They both are similar, and in the words of Jon Skeet, they both compile to similar code.
In a nutshell:
LINQ is a querying technology (Language Integrated Query). LINQ makes extensive use of lambdas as arguments to standard query operator methods such as Where.
A lambda expression is an anonymous function that contains expressions and statements. It is completely separate and distinct from LINQ.

Explain the below Linq Query?

results.Where(x=>x.Members.Any(y=>members.Contains(y.Name.ToLower())))
I happened to see this query in internet. Can anyone explain this query please.
suggest me a good LINQ tutorial for this newbie.
thank you all.
Edited:
what is this x and y stands for?
x is a single result, of the type of the elements in the results sequence.
y is a single member, of the type of the elements in the x.Members sequence.
These are lambda expressions (x => x.whatever) that were introduced into the language with C# 3, where x is the input, and the right side (x.whatever) is the output (in this particular usage scenario).
An easier example
var list = new List<int> { 1, 2, 3 };
var oddNumbers = list.Where(i => i % 2 != 0);
Here, i is a single int item that is an input into the expression. i % 2 != 0 is a boolean expression evaluating whether the input is even or odd. The entire expression (i => i % 2 != 0) is a predicate, a Func<int, bool>, where the input is an integer and the output is a boolean. Follow? As you iterate over the query oddNumbers, each element in the list sequence is evaluated against the predicate. Those that pass then become part of your output.
foreach (var item in oddNumbers)
Console.WriteLine(item);
// writes 1, 3
It's a lambda expression. Here is a great LINQ tutorial
Interesting query, but I don't like it.
I'll answer your second question first. x and y are parameters to the lambda methods that are defined in the calls to Where() and Any(). You could easily change the names to be more meaningful:
results.Where(result =>
    result.Members.Any(member => members.Contains(member.Name.ToLower())));
And to answer your first question, this query will return each item in results whose Members collection has at least one item whose lower-cased Name is contained in the members collection.
The logic there doesn't make a whole lot of sense to me without knowing what the Members collection is or what it holds.
x will be every instance of the results collection. The query uses lambda syntax, so x=>x.somemember means "invoke somemember on each x passed in". Where is an extension method for IEnumerables that expects a function that will take an argument and return a boolean. Lambda syntax creates delegates under the covers, but is far more expressive for carrying out certain types of operation (and saves a lot of typing).
Without knowing the type of objects held in the results collection (results will be something that implements IEnumerable), it is hard to know exactly what the code above will do. But an educated guess is that it will check the members of each x in the above collection, and return an IEnumerable of only those x that have at least one member whose lower-cased name appears in members.
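To check that reading against data, here is a JavaScript sketch of the same shape, with filter standing in for Where, some for Any and includes for Contains. The sample objects are hypothetical, since the real types behind results and members are not known:

```javascript
// Hypothetical sample data; the names are lower-case in `members` on purpose.
const members = ['alice', 'bob'];
const results = [
  { id: 1, Members: [{ Name: 'Alice' }, { Name: 'Carol' }] },
  { id: 2, Members: [{ Name: 'Dave' }] },
  { id: 3, Members: [{ Name: 'BOB' }] },
];

// Keep each result that has at least one member whose lower-cased name is in `members`.
const matching = results.filter(x =>
  x.Members.some(y => members.includes(y.Name.toLowerCase()))
);

const matchingIds = matching.map(r => r.id);
console.log(matchingIds);
```

Result 1 is kept even though Carol is not in the list, because one matching member is enough. Result 2 is dropped because none of its members match.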

Recursively (?) compose LINQ predicates into a single predicate

(EDIT: I have asked the wrong question. The real problem I'm having is over at Compose LINQ-to-SQL predicates into a single predicate - but this one got some good answers so I've left it up!)
Given the following search text:
"keyword1 keyword2 keyword3 ... keywordN"
I want to end up with the following SQL:
SELECT [columns] FROM Customer
WHERE
(Customer.Forenames LIKE '%keyword1%' OR Customer.Surname LIKE '%keyword1%')
AND
(Customer.Forenames LIKE '%keyword2%' OR Customer.Surname LIKE '%keyword2%')
AND
(Customer.Forenames LIKE '%keyword3%' OR Customer.Surname LIKE '%keyword3%')
AND
...
AND
(Customer.Forenames LIKE '%keywordN%' OR Customer.Surname LIKE '%keywordN%')
Effectively, we're splitting the search text on spaces, trimming each token, constructing a multi-part OR clause based on each token, and then AND'ing the clauses together.
I'm doing this in Linq-to-SQL, and I have no idea how to dynamically compose a predicate based on an arbitrarily-long list of subpredicates. For a known number of clauses, it's easy to compose the predicates manually:
dataContext.Customers.Where(customer =>
    (customer.Forenames.Contains("keyword1") || customer.Surname.Contains("keyword1"))
    &&
    (customer.Forenames.Contains("keyword2") || customer.Surname.Contains("keyword2"))
    &&
    (customer.Forenames.Contains("keyword3") || customer.Surname.Contains("keyword3"))
);
but I want to handle an arbitrary list of search terms. I got as far as
Func<Customer, bool> predicate = /* predicate */;
foreach(var token in tokens) {
    predicate = (customer
        => predicate(customer)
        &&
        (customer.Forenames.Contains(token) || customer.Surname.Contains(token)));
}
That produces a StackOverflowException - presumably because the predicate() on the RHS of the assignment isn't actually evaluated until runtime, at which point it ends up calling itself... or something.
In short, I need a technique that, given two predicates, will return a single predicate composing the two source predicates with a supplied operator, but restricted to the operators explicitly supported by Linq-to-SQL. Any ideas?
I would suggest another technique
you can do:
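IQueryable<Customer> query = dataContext.Customers;
and then, inside a loop, do
foreach (string keyword in keywordlist)
{
    // Copy the loop variable: before C# 5 every closure would share the same variable.
    string k = keyword;
    query = query.Where(c => c.Forenames.Contains(k) || c.Surname.Contains(k));
}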
If you want a more succinct and declarative way of writing this, you could also use Aggregate extension method instead of foreach loop and mutable variable:
var query = keywordlist.Aggregate(
    dataContext.Customers.AsQueryable(),
    (q, keyword) => q.Where(c => c.Forenames.Contains(keyword) ||
                                 c.Surname.Contains(keyword)));
This takes dataContext.Customers as the initial state and then updates this state (query) for every keyword in the list using the given aggregation function (which just calls Where, as Gnomo suggests).
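The composition itself, and the reason the original loop recursed forever, can be shown with a small in-memory JavaScript sketch. It is only an analogy: it works on a plain array and says nothing about translation to SQL, and the customer records and the clauseFor helper are hypothetical.

```javascript
// Hypothetical sample data.
const customers = [
  { Forenames: 'John Paul', Surname: 'Smith' },
  { Forenames: 'Paula', Surname: 'Johnson' },
  { Forenames: 'Mary', Surname: 'Jones' },
];

// Split the search text on spaces and trim each token.
const tokens = 'john paul'.split(' ').map(t => t.trim()).filter(t => t.length > 0);

// One OR-clause per token.
const clauseFor = (token) => (c) =>
  c.Forenames.toLowerCase().includes(token) ||
  c.Surname.toLowerCase().includes(token);

// AND the clauses together. Each step closes over `prev`, a value fixed at that step,
// instead of a variable that is later reassigned to the new function itself.
const predicate = tokens.reduce(
  (prev, token) => (c) => prev(c) && clauseFor(token)(c),
  () => true
);

const found = customers.filter(predicate).map(c => c.Surname);
console.log(found);
```

In the failing loop, the new function referred to the predicate variable, which by the time of the call already pointed at the new function. Here each composed function holds on to the previous one, so the chain ends at the initial always-true predicate.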
