Implement a LIKE query operator for Aerospike - full-text-search

I'm new to Aerospike. Is there an easy way to search for part of a text value, as in MySQL? For example:
select * from test where column like '%hello%';
I find it difficult to migrate to a NoSQL database if common operations like this are not supported.
Thanks.

Predicate filtering was added in release 3.12. You can use the stringRegex method of the PredExp class of the Java client to implement an equivalent to LIKE. Predicate filters also currently exist for the C, C# and Go clients.
This example in the Aerospike Java client shows something similar:
Statement stmt = new Statement();
stmt.setNamespace(params.namespace);
stmt.setSetName(params.set);
stmt.setFilter(Filter.range(binName, begin, end));
stmt.setPredExp(
PredExp.stringBin("bin3"),
PredExp.stringValue("prefix.*suffix"),
PredExp.stringRegex(RegexFlag.ICASE | RegexFlag.NEWLINE)
);
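A LIKE pattern has to be rewritten as a regular expression before it can be handed to a regex-based filter such as the one above. A minimal sketch in Python, assuming only the two standard SQL wildcards (% and _) and no ESCAPE clause. The helper like_to_regex is hypothetical and not part of any Aerospike client:

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any run of characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is a literal
    return "^" + "".join(parts) + "$"

regex = like_to_regex("%hello%")
print(regex)
print(bool(re.search(regex, "say hello world")))
```

The resulting string is what would be passed as the regex value; whether anchoring is needed depends on how the target regex engine matches.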
If you're using a language client that doesn't yet support predicate filtering, you'd implement this with a stream UDF attached to a scan or query. For example, in the Python client you would create an instance of class aerospike.Query with or without a predicate and call the aerospike.Query.apply() method.
Ideally you would accelerate this by bucketing and using a predicate to narrow down your search, rather than scanning the entire set. For example, you can create a startswith bin that holds the first letter, use the predicate to find that, then send the matched records through the stream UDF. Just note that a LIKE with a leading wildcard is a horribly slow operation on an RDBMS as well, because it generally can't use an index.
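-- Build a filter that keeps records whose bin is a string containing val.
-- When plain is true, val is matched literally instead of as a Lua pattern.
local function bin_like(bin, val, plain)
  return function(rec)
    if rec[bin] and type(rec[bin]) == "string" then
      if string.find(rec[bin], val, 1, plain) then
        return true
      else
        return false
      end
    else
      return false
    end
  end
end

-- Copy every bin of the record into a map that can be returned to the client.
local function map_record(rec)
  local ret = map()
  for i, bin_name in ipairs(record.bin_names(rec)) do
    ret[bin_name] = rec[bin_name]
  end
  return ret
end

-- Stream UDF entry point: filter the stream, then map the matches.
function check_bins_match(stream, bin, val, plain)
  return stream : filter(bin_like(bin, val, plain)) : map(map_record)
end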
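The bucketing idea can be illustrated without a server. A rough sketch in plain Python, assuming records are dicts and using an in-memory dict as a stand-in for a secondary index on a startswith bin. It calls no Aerospike API, and the record layout is made up for the example:

```python
from collections import defaultdict

records = [
    {"id": 1, "title": "hello world"},
    {"id": 2, "title": "help wanted"},
    {"id": 3, "title": "say hello"},
]

# Stand-in for a secondary index on a "startswith" bin (first letter).
buckets = defaultdict(list)
for rec in records:
    buckets[rec["title"][0]].append(rec)

def search(first_letter, needle):
    """Narrow to one bucket, then apply the substring test to what is left."""
    candidates = buckets.get(first_letter, [])
    return [r["id"] for r in candidates if needle in r["title"]]

print(search("h", "hello"))
```

The narrowing step only helps when the search constrains the first letter. A pattern like '%hello%' still has to visit every bucket.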

Related

What does Expression<T> do?

What does Expression<T> do?
I have seen it used in a method similar to:
private Expression<Func<MyClass,bool>> GetFilter(...)
{
}
Can't you just return the Func<MyClass,bool> ?
Google and SO searches have failed me due to the < and > signs.
If TDelegate represents a delegate type, then Expression<TDelegate> represents a lambda expression that can be converted to a delegate of type TDelegate as an expression tree. This allows you to programmatically inspect a lambda expression to extract useful information.
For example, if you have
var query = source.Where(x => x.Name == "Alan Turing");
then x => x.Name == "Alan Turing" can be inspected programmatically if it's represented as an expression tree, but not so much if it's thought of as a delegate. This is particularly useful in the case of LINQ providers, which walk the expression tree to convert the lambda expression into a different representation. For example, LINQ to SQL would convert the above expression tree to
SELECT * FROM COMPUTERSCIENTIST WHERE NAME = 'Alan Turing'
It can do that because of the representation of the lambda expression as a tree whose nodes can be walked and inspected.
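Python has no Expression<T>, but its ast module gives a loosely comparable view of code as a tree. The sketch below is only an analogy, not how LINQ to SQL works internally: it parses the source of a lambda, walks the nodes, and emits a WHERE clause. The function where_clause and the single expression shape it accepts are assumptions made for the example:

```python
import ast

def where_clause(lambda_source: str) -> str:
    """Walk the syntax tree of `lambda x: x.Attr == 'literal'` and emit SQL."""
    tree = ast.parse(lambda_source, mode="eval")
    lam = tree.body
    if not isinstance(lam, ast.Lambda):
        raise ValueError("expected a lambda expression")
    cmp = lam.body
    # Only one shape is supported: <param>.<attribute> == <string constant>
    if not (isinstance(cmp, ast.Compare)
            and len(cmp.ops) == 1
            and isinstance(cmp.ops[0], ast.Eq)
            and isinstance(cmp.left, ast.Attribute)
            and isinstance(cmp.comparators[0], ast.Constant)
            and isinstance(cmp.comparators[0].value, str)):
        raise ValueError("unsupported expression shape")
    column = cmp.left.attr.upper()
    literal = cmp.comparators[0].value.replace("'", "''")  # escape quotes
    return f"WHERE {column} = '{literal}'"

print(where_clause('lambda x: x.Name == "Alan Turing"'))
```

A compiled function object offers nothing comparable to walk, which is the same distinction the answer draws between a delegate and an expression tree.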
An Expression allows you to inspect the structure of the code inside of the delegate rather than just storing the delegate itself.
As usual, MSDN is pretty clear on the matter:
MSDN - Expression(TDelegate)
Yes, Func<> can be used in place of an Expression. The utility of an expression tree is that it gives remote LINQ providers such as LINQ to SQL the ability to look ahead and see what statements are required to allow the query to function. In other words, to treat code as data.
// Run the debugger and hover over multBy2. It will tell you that it is a method, but not what the implementation is.
Func<int, int> multBy2 = x => 2 * x;
// Hover over this and it will show the implementation: the parameters, the method body and other data.
System.Linq.Expressions.Expression<Func<int, int>> expression = x => 2 * x;
In the code above you can compare what data is available via the debugger. I invite you to do this. You will see that Func has very little information available. Try it again with Expressions and you will see a lot of information including the method body and parameters are visible at runtime. This is the real power of Expression Trees.

Doesn't IQueryable.Expression = Expression.Constant(this); cause an infinite loop?

Expression trees represent code in a tree-like data structure, where each node is an expression. In LINQ they are used by LINQ providers, which convert them into the native language of a target process.
I know very little about expression trees, but I've been reading the following code, where the author uses Expression.Constant(this) to describe the initial query. Thus, according to the author, Expression.Constant(this) should enable the provider to retrieve the initial sequence of elements for someQuery.
But to my understanding this should instead cause an infinite loop, since the expression tree in someQuery.Expression describes the someQuery object and not the details of the query itself (or to put it differently, if the target platform is a SQL DB, Expression.Constant(this) doesn't describe in non-SQL terms which rows or tables a query should retrieve from the DB). Thus, when the provider looks into someQuery.Expression, it will only find a description D of the someQuery object. If it further inspects the D.Expression property, it again finds the description of the someQuery object, and so on, hence an infinite loop:
public class Query<T> : IQueryable<T>...
{
QueryProvider provider;
Expression expression;
public Query(QueryProvider provider) {
this.provider = provider;
this.expression = Expression.Constant(this);
}
...
}
Query<string> someQuery = new Query<string>();
thank you
I'd expect the provider to have knowledge of the query type and know that when it hit a constant expression of type Query<T>, it had hit a leaf, effectively. Sooner or later, the provider has to get to something describing "the whole table" or an equivalent. Of course the Query<T> would need the information about which table etc, but in a full example I'd expect it to have that information.
(Out of interest, am I the author in question? I wrote something very similar in C# in Depth...)
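The "constant is a leaf" argument can be sketched with a toy expression tree. This is a Python illustration under assumed names (Constant, Where, Query, translate), not the code from the book or any real LINQ provider. The translator recognises a constant holding a Query and stops there, so the self-reference is never followed:

```python
from dataclasses import dataclass

@dataclass
class Constant:
    value: object

@dataclass
class Where:
    source: object      # the expression being filtered
    condition: str      # kept as plain SQL text to stay short

class Query:
    def __init__(self, table):
        self.table = table
        self.expression = Constant(self)   # the root refers back to the query

def translate(expr):
    if isinstance(expr, Constant) and isinstance(expr.value, Query):
        # Leaf: the provider knows this type and does not look at
        # expr.value.expression, so there is nothing to recurse into.
        return f"SELECT * FROM {expr.value.table}"
    if isinstance(expr, Where):
        return f"{translate(expr.source)} WHERE {expr.condition}"
    raise TypeError(f"unsupported node: {expr!r}")

customers = Query("customers")
print(translate(Where(customers.expression, "city = 'Oslo'")))
```

The table name stored on Query plays the role of the "which table" information that the answer expects a full example to carry.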

Sorting CouchDB Views By Value

I'm testing out CouchDB to see how it could handle logging some search results. What I'd like to do is produce a view where I can produce the top queries from the results. At the moment I have something like this:
Example document portion
{
"query": "+dangerous +dogs",
"hits": "123"
}
Map function
(Not exactly what I need/want but it's good enough for testing)
function(doc) {
if (doc.query) {
var split = doc.query.split(" ");
for (var i in split) {
emit(split[i], 1);
}
}
}
Reduce Function
function (key, values, rereduce) {
return sum(values);
}
Now this will get me results in a format where a query term is the key and the count for that term on the right, which is great. But I'd like it ordered by the value, not the key. From the sounds of it, this is not yet possible with CouchDB.
So does anyone have any ideas of how I can get a view where I have an ordered version of the query terms & their related counts? I'm very new to CouchDB and I just can't think of how I'd write the functions needed.
It is true that there is no dead-simple answer. There are several patterns however.
http://wiki.apache.org/couchdb/View_Snippets#Retrieve_the_top_N_tags. I do not personally like this because they acknowledge that it is a brittle solution, and the code is not relaxing-looking.
Avi's answer, which is to sort in-memory in your application.
couchdb-lucene which it seems everybody finds themselves needing eventually!
What I like is what Chris said in Avi's quote. Relax. In CouchDB, databases are lightweight and excel at giving you a unique perspective of your data. These days, the buzz is all about filtered replication which is all about slicing out subsets of your data to put in a separate DB.
Anyway, the basics are simple. You take your .rows from the view output and you insert it into a separate DB which simply emits keyed on the count. An additional trick is to write a very simple _list function. Lists "render" the raw couch output into different formats. Your _list function should output
{ "docs":
[ {..view row1...},
{..view row2...},
{..etc...}
]
}
What that will do is format the view output exactly the way the _bulk_docs API requires it. Now you can pipe curl directly into another curl:
curl host:5984/db/_design/myapp/_list/bulkdocs_formatter/query_popularity \
| curl -X POST -H "Content-Type: application/json" -d @- host:5984/popularity_sorter/_bulk_docs
In fact, if your list function can handle all the docs, you may just have it sort them itself and return them to the client sorted.
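The reshaping that the _list function performs can be sketched outside CouchDB. A minimal Python version, assuming the grouped view result has the usual {"rows": [{"key": ..., "value": ...}]} shape. The sample rows are invented and to_bulk_docs is a hypothetical helper:

```python
import json

def to_bulk_docs(view_result: dict) -> str:
    """Wrap grouped view rows in the {"docs": [...]} envelope."""
    return json.dumps({"docs": list(view_result["rows"])})

view_result = {"rows": [{"key": "+dangerous", "value": 7},
                        {"key": "+dogs", "value": 12}]}
print(to_bulk_docs(view_result))
```

Each row becomes a document in the second database, where a view that emits the value as its key yields the rows ordered by count.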
This came up on the CouchDB-user mailing list, and Chris Anderson, one of the primary developers, wrote:
This is a common request, but not supported directly by CouchDB's
views -- to do this you'll need to copy the group-reduce query to
another database, and build a view to sort by value.
This is a tradeoff we make in favor of dynamic range queries and
incremental indexes.
I needed to do this recently as well, and I ended up doing it in my app tier. This is easy to do in JavaScript:
db.view('mydesigndoc', 'myview', {'group':true}, function(err, data) {
if (err) throw new Error(JSON.stringify(err));
data.rows.sort(function(a, b) {
return a.value - b.value;
});
data.rows.reverse(); // optional, depending on your needs
// do something with the data…
});
This example runs in Node.js and uses node-couchdb, but it could easily be adapted to run in a browser or another JavaScript environment. And of course the concept is portable to any programming language/environment.
HTH!
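The same application-tier sort carries over to other languages. A short Python sketch, assuming the rows have already been fetched from a grouped view and parsed into dicts (the sample data is invented):

```python
import heapq

rows = [
    {"key": "+dangerous", "value": 7},
    {"key": "+dogs", "value": 12},
    {"key": "+cats", "value": 3},
]

# Full sort, highest count first.
by_count = sorted(rows, key=lambda row: row["value"], reverse=True)

# When only the top N are needed, heapq avoids sorting everything.
top_two = heapq.nlargest(2, rows, key=lambda row: row["value"])

print([row["key"] for row in by_count])
print([row["key"] for row in top_two])
```

Either way the whole grouped result is transferred to the client first, which is the cost of this approach.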
This is an old question but I feel it still deserves a decent answer (I spent at least 20 minutes on searching for the correct answer...)
I disapprove of the other suggestions in the answers here and feel that they are unsatisfactory. Especially I don't like the suggestion to sort the rows in the applicative layer, as it doesn't scale well and doesn't deal with a case where you need to limit the result set in the DB.
The better approach that I came across is suggested in this thread and it posits that if you need to sort the values in the query you should add them into the key set and then query the key using a range - specifying a desired key and loosening the value range. For example if your key is composed of country, state and city:
emit([doc.address.country,doc.address.state, doc.address.city], doc);
Then you query just the country and get free sorting on the rest of the key components:
startkey=["US"]&endkey=["US",{}]
In case you also need to reverse the order, note that simply setting descending=true will not suffice. You also need to swap the start and end keys, i.e.:
startkey=["US",{}]&endkey=["US"]
See more reference at this great source.
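The range trick can be simulated on a sorted list of keys. This Python sketch is a simplification: tuples stand in for CouchDB's array keys, and a high Unicode string stands in for {}, which CouchDB collates after strings. The names index, key_range and key_range_desc are made up for the example:

```python
import bisect

HIGH = ("\uffff",)   # stand-in for the {} sentinel, assumed to sort last

# Keys as a view would hold them: sorted (country, state, city) tuples.
index = sorted([
    ("CA", "ON", "Toronto"),
    ("US", "CA", "Fresno"),
    ("US", "NY", "Albany"),
    ("US", "CA", "Anaheim"),
    ("UK", "ENG", "Leeds"),
])

def key_range(startkey, endkey):
    """Return every key between startkey and endkey, in index order."""
    lo = bisect.bisect_left(index, startkey)
    hi = bisect.bisect_right(index, endkey)
    return index[lo:hi]

def key_range_desc(startkey, endkey):
    # A descending scan starts at the high end, so the caller passes the
    # high key as startkey and the low key as endkey.
    return key_range(endkey, startkey)[::-1]

us = key_range(("US",), ("US",) + HIGH)
us_desc = key_range_desc(("US",) + HIGH, ("US",))
print(us)
print(us_desc)
```

Fixing only the first key component returns the matching rows already ordered by state and then city, which is the "free sorting" the answer refers to.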
I'm unsure about the 1 you have as your returned result, but I'm positive this should do the trick:
emit([doc.hits, split[i]], 1);
The rules of sorting are defined in the docs.
Based on Avi's answer, I came up with this Couchdb list function that worked for my needs, which is simply a report of most-popular events (key=event name, value=attendees).
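ddoc.lists.eventPopularity = function(req, res) {
  start({ headers : { "Content-type" : "text/plain" } });
  var data = [];
  var row;
  // Collect every view row before sorting.
  while ((row = getRow())) {
    data.push(row);
  }
  // Sort by attendee count, then reverse so the highest count comes first.
  data.sort(function(a, b) {
    return a.value - b.value;
  }).reverse();
  for (var i = 0; i < data.length; i++) {
    send(data[i].value + ': ' + data[i].key + "\n");
  }
}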
For reference, here's the corresponding view function:
ddoc.views.eventPopularity = {
map : function(doc) {
if(doc.type == 'user') {
for(i in doc.events) {
emit(doc.events[i].event_name, 1);
}
}
},
reduce : '_count'
}
And the output of the list function (snipped):
165: Design-Driven Innovation: How Designers Facilitate the Dialog
165: Are Your Customers a Crowd or a Community?
164: Social Media Mythbusters
163: Don't Be Afraid Of Creativity! Anything Can Happen
159: Do Agencies Need to Think Like Software Companies?
158: Customer Experience: Future Trends & Insights
156: The Accidental Writer: Great Web Copy for Everyone
155: Why Everything is Amazing But Nobody is Happy
I think every solution above will hurt CouchDB performance, though I am very new to this database. As far as I know, CouchDB views prepare their results before they are queried, so it seems we need to prepare the sorted results ourselves. For example, each search term could be stored in the database as a document with its hit count. When somebody searches, the search terms are looked up and their hit counts incremented. To see search term popularity, a view then emits a (hitcount, searchterm) pair.
The link Retrieve_the_top_N_tags seems to be broken, but I found another solution here.
Quoting the dev who wrote that solution:
rather than returning the results keyed by the tag in the map step, I would emit every occurrence of every tag instead. Then in the reduce step, I would calculate the aggregation values grouped by tag using a hash, transform it into an array, sort it, and choose the top 3.
As stated in the comments, the only problem would be in case of a long tail:
Problem is that you have to be careful with the number of tags you obtain; if the result is bigger than 500 bytes, you'll have couchdb complaining about it, since "reduce has to effectively reduce". 3 or 6 or even 20 tags shouldn't be a problem, though.
It worked perfectly for me. Check the link to see the code!
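The aggregation described in the quote (count with a hash, turn it into an array, sort, keep the top few) can be sketched in Python. This shows the counting logic only, not an actual CouchDB reduce function. The function top_tags and the sample tags are assumptions for the example:

```python
from collections import Counter

def top_tags(tag_occurrences, n=3):
    """Count every emitted tag and keep the n most frequent."""
    return Counter(tag_occurrences).most_common(n)

emitted = ["dogs", "cats", "dogs", "birds", "dogs", "cats"]
print(top_tags(emitted, n=2))
```

Keeping n small is what keeps the reduce output small, which is the concern raised in the comment about the size limit.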

Strange problem with LINQ to NHibernate and string comparison

I'm using LINQ to NHibernate and encountered a strange problem while comparing strings. Following code works fine but when I un-comment:
//MyCompareFunc(dl.DamageNumber, damageNumberSearch) &&
and comment:
dl.DamageNumber.Contains(damageNumberSearch) &&
then it breaks down: it seems that MyCompareFunc() always returns true, while dl.DamageNumber.Contains(damageNumberSearch) sometimes returns true and sometimes returns false.
In other words when I use string.Contains() in LINQ query directly it works, but when I move it to a method, it does not work.
internal List<DamageList> SearchDamageList(
DateTime? sendDateFromSearch, DateTime? sendDateToSearch, string damageNumberSearch,
string insuranceContractSearch)
{
var q = from dl in session.Linq<DamageList>()
where
CommonHelper.IsDateBetween(dl.SendDate, sendDateFromSearch, sendDateToSearch) &&
//MyCompareFunc(dl.DamageNumber, damageNumberSearch) &&
dl.DamageNumber.Contains(damageNumberSearch) &&
insuranceContractSearch == null ? true : CommonHelper.IsSame(dl.InsuranceContract, insuranceContractSearch)
select dl;
return q.ToList<DamageList>();
}
private bool MyCompareFunc(string damageNumber, string damageNumberSearch)
{
return damageNumber.Contains(damageNumberSearch);
}
I have to admit I'm not an expert in NHibernate, but while using a different ORM we have frequently run into the same kind of problem. The thing is that the LINQ engine, while translating the query, is capable of recognizing simple string functions from the .NET library, like Contains, and translating them into the SQL equivalent. This SQL equivalent does the comparison case-insensitively (it depends on the settings of the database, but that's usually the default).
On the other hand, the engine cannot parse the source code of your custom function, so it can't translate it into SQL and has to execute it in memory after preloading the result of the previous query from the database. This means it is executed as .NET code, where the comparison is case-sensitive by default.
That could be the reason for your mismatch of results ;)
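The mismatch between a database-side and an in-memory comparison is easy to reproduce. A small Python sketch using the standard sqlite3 module as a stand-in for the real database. The table and data are invented, and SQLite's default of ignoring ASCII case in LIKE is just one example of a case-insensitive database setting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE damage (number TEXT)")
conn.executemany("INSERT INTO damage VALUES (?)",
                 [("AB-100",), ("ab-200",), ("CD-300",)])

def contains_in_sql(needle):
    # Evaluated by the database; SQLite's LIKE ignores ASCII case by default.
    rows = conn.execute(
        "SELECT number FROM damage WHERE number LIKE ? ORDER BY number",
        (f"%{needle}%",))
    return [r[0] for r in rows]

def contains_in_memory(needle):
    # Evaluated in Python after loading every row; `in` is case-sensitive.
    rows = conn.execute("SELECT number FROM damage ORDER BY number")
    return [r[0] for r in rows if needle in r[0]]

print(contains_in_sql("ab"))
print(contains_in_memory("ab"))
```

The same search text returns two rows from the database-side filter and one from the in-memory filter.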
LINQ works with expressions, not with compiled functions. It will be fine if you use an Expression<TDelegate> instead of the "compiled" method.

Sorting IQueryable by Aggregate in VB.net

I've been searching for a quick example of sorting an IQueryable (using LINQ to SQL) by an aggregate value.
I basically need to calculate a few derived values (percentage difference between two values, etc.) and sort the results by them.
i.e.
return rows.OrderBy(Function(s) CalcValue(s.Visitors, s.Clicks))
I want to call an external function to calculate the Aggregate. Should this implement IComparer? or IComparable?
thanks
[EDIT]
Have tried to use:
Public Class SortByCPC : Implements IComparer(Of Statistic)
Public Function Compare(ByVal x As Statistic, ByVal y As Statistic) As Integer Implements System.Collections.Generic.IComparer(Of Statistic).Compare
Dim xCPC = x.Earnings / x.Clicks
Dim yCPC = y.Earnings / y.Clicks
Return yCPC - xCPC
End Function
End Class
LINQ to SQL doesn't like me using IComparer
LINQ to SQL is never going to like you using your own methods within a query - it can't see inside them and work out what you want the SQL to look like. It can only see inside expression trees, built up from lambda expressions in the query.
What you want is something like:
Dim stats = From x In db.Statistics
Where (something, if you want filtering)
Order By x.Earnings / x.Clicks
If you really want to fetch all of the results and then order them, you need to indicate to LINQ that you're "done" with the IQueryable side of things - call AsEnumerable() and then you can do any remaining processing on the client. It's better to get the server to do as much as possible though.
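The two options (let the server order the rows, or fetch everything and order on the client) can be compared side by side. A Python sketch using the standard sqlite3 module in place of LINQ to SQL. The table, columns and data are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statistics (name TEXT, earnings REAL, clicks INTEGER)")
conn.executemany("INSERT INTO statistics VALUES (?, ?, ?)",
                 [("a", 10.0, 5), ("b", 9.0, 3), ("c", 4.0, 4)])

# Server side: the ordering expression is part of the query itself.
server_sorted = [r[0] for r in conn.execute(
    "SELECT name FROM statistics ORDER BY earnings / clicks")]

# Client side: fetch every row, then sort with an arbitrary function.
def cost_per_click(row):
    _, earnings, clicks = row
    return earnings / clicks

rows = conn.execute("SELECT name, earnings, clicks FROM statistics").fetchall()
client_sorted = [r[0] for r in sorted(rows, key=cost_per_click)]

print(server_sorted)
print(client_sorted)
```

Both produce the same order here, but the client-side version transfers the full table before it can sort, which is why the answer prefers the server-side form.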
My VB is pretty bad, but I think this is what it should look like. This assumes that CalcValue returns a Double and that the type of rows is RowClass. This example does not use the IComparer version of the OrderBy extension; it relies on the fact that doubles are already comparable and returns the CalcValue result as the key.
Dim keySelector As Func(Of RowClass, Double) = _
Function(s As RowClass) CalcValue(s.Visitors, s.Clicks)
Return rows.OrderBy(keySelector)
Here are some links you might find useful.
IQueryable.OrderBy extension method
Lambda expressions for Visual Basic
My solution:
Dim stats = rows.OrderBy(Function(s) If(s.Visitors > 0, s.Clicks / s.Visitors, 0))
This also avoids division by zero when a row has no visitors.
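The same guard written as a sort key in Python, as a minimal sketch with invented sample data. The function click_rate is a hypothetical name mirroring the If(...) expression above:

```python
def click_rate(visitors: int, clicks: int) -> float:
    """Clicks per visitor, or 0 when there were no visitors."""
    return clicks / visitors if visitors > 0 else 0.0

# (page, visitors, clicks)
stats = [("home", 200, 50), ("about", 0, 0), ("blog", 100, 40)]
ordered = sorted(stats, key=lambda s: click_rate(s[1], s[2]))
print([page for page, _, _ in ordered])
```

Rows without visitors sort first because their key is 0.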

Resources