Groovy 1.8 :: LINQ Applied - linq

UPDATE 8/31/2011
Guillaume Laforge has nearly done it:
http://gaelyk.appspot.com/tutorial/app-engine-shortcuts#query
Looks like he's doing an AST transform to pull off the:
alias as Entity
bit. Cool stuff, Groovy 1.8 + AST transform = LINQ-esque queries on the JVM. GL's solution needs more work as far as I can see to pull off full query capabilities (like sub queries, join using(field) syntax and the like), but for his Gaelyk project apparently not necessary.
EDIT
As a workaround to achieving pure LINQ syntax, I have decided to def the aliases. Not a huge deal, and removes a major hurdle that would likely require complex AST transforms to pull off.
So, instead of:
from c as Composite
join t as Teams
...
I now define the aliases (note: need to cast to get auto complete on fields):
def(Teams t,Composite c,Schools s) = [Teams.new(),Composite.new(),Schools.new()]
and use map syntax for from, join, etc.
from c:Composite
join t:Teams
...
To solve issue #2 (see original below), add instance level getProperty methods to each pogo alias (whose scope is limited to ORM closure in which it is called, nice). We just return the string property name as we're building an sql statement.
[t,c,s].each{Object o-> o.metaClass.getProperty = { String k-> k } }
Making "good" progress ;-)
Now to figure out what to do about "=", that is tricky since set property is void. May have to go with eq, neq, gt, etc. but would really prefer the literal symbols, makes for closer-to-sql readability.
If interested, LINQ is doing quite a bit behind the scenes. Jon Skeet (praise his name) has a nice SO reply:
How LINQ works internally?
ORIGINAL
Have been checking out LINQ, highly impressed.
// LINQ example
var games =
from t in Teams
from g in t.Games
where g.gameID = 212
select new { g.gameDate,g.gameTime };
// Seeking Groovy Nirvana
latest { Integer teamID->
from c as Composite
join t as Teams
join s as Schools on ( schoolID = {
from Teams
where t.schoolID = s.schoolID } )
where t.teamID = "$teamID"
select c.location, c.gameType, s.schoolName
group c.gameID
order c.gameDate, c.gameTime
}
The proposed Groovy version compiles fine and if I def the aliases c,t,s with their corresponding POGO, I get strong typed IDE auocomplete on fields, nice. However, nowhere near LINQ, where there are no (visible) variable definitions other than the query itself, totally self contained and strongly typed, wow.
OK, so can it be done in Groovy? I think (hope) yes, but am hung up on 2 issues:
1) How to implicitly populate alias variable without def'ing? Currently I am overriding asType() on String so in "from c as Composite", c gets cast to Composite. Great, but the IDE "thinks" that in closure scope undefined c is a string and thus no autocomplete on POGO fields ;-(
2) Since #1 is not solved, I am def'ing the aliases as per above so I can get autocomplete. Fine, hacked (compared to LINQ), but does the trick. Problem here is that in "select c.location, c.gameType...", I'd like the fields to not be evaluated but simply return "c.location" to the ORM select method, and not null (which is its default value). getProperty() should work here, but I need it to only apply to pogo fields when called from ORM scope (e.g. orm field specific methods like select, order, group, etc.). Bit lost there, perhaps there's a way to annotate orm methods, or only invoke "special" pogo getProperty via orm method calls (which is the closure's delegate in above nirvana query).
Should point out that I am not looking to create a comprehensive LINQ for Groovy, but this one particular subset of LINQ I would love to see happen.

One of the biggest reasons for Guillaume to use an AST transform is because of the problem with "=". Even if you use == for the compare as normally done in Groovy, from the compareTo method that is called for it, you cannot make out a difference between ==, !=, <=, >=, <, >. There are two possible paths for this in later versions of Groovy that are in discussion. One is to use for each of those compares a different method, the other is to store a minimal AST which you can access at runtime. This goes in the direction of C# and is a quite powerful tool. The problem is more about how to do this efficiently.

Related

fuzzy logic for query-based document summarisation in Python

I am trying to use fuzzy logic to weight and extract the best sentences for the query. I have extracted the following features which they can be used in fuzzy logic:
Each sentence has cosine value.
How many proper-noun is in the sentence.
the position of the sentence in the document.
sentence length.
I want to use the above features to apply the fuzzy logic. for instance, i want to create the rule base something like the following
if cosineValue >= 0.9 && numberOfPropernoun >=1
THEN the sentence is important
I am not quite sure how to start implementing the rule base, the facts and inference engine. It would like someone to guide me to implement this in python. Please note that I am not familiar with logic programming languages. I would like to implement it in python
This is just a sketch; I'm not even going to try this code because I'm not sure what you want.
Make a class for your features:
Features = namedtuple('Features', ['cosine', 'nouns', 'position', ...])
etc.
Now imagine you are building your AST. What grammar does your language have? Well, you have conditions, and your conditions have consequences, and your conditions can be combined by boolean operators, so let's make some basic ones:
class CosineValue(object):
def evaluate(self, features):
return features.cosine
class Nouns(object):
def evaluate(self, features):
return features.nouns
... etc.
Now you need to combine these AST nodes with some operations
class GreaterThan(object):
def __init__(self, property, value):
self.property, self.value = property, value
def evaluate(self, sentence):
return property.evaluate(sentence) > self.value
Now GreaterThan(CosineValue(), 0.9) is an object (an abstract syntax tree, actually) that represents cosineValue > 0.9. You can evaluate it like so:
expr = GreaterThan(CosineValue(), 0.9)
expr.evaluate(Features(cosine=0.95, ...)) # returns True
expr.evaluate(Features(cosine=0.40, ...)) # returns False
These objects don't look like much, but what they are doing is reifying your process. Their structure encodes what formerly would have been code. Think about this, because this is the only hard part about what you are trying to do: comprehending how you can delay computation by turning it into structure, and how you can play with when values become part of your computation. You were probably stuck thinking about how to write those "if" statements and keeping them separate from the code and the runtime values you need to run them against. Now you should be able to see how, but it's a more advanced way of thinking about programming.
Now you need to build your if/then structure. I'm not sure what you want here either but I would say your if/then is going to be a class that takes an expression like we've just created as one argument and a "then" case, and does the test and either performs or does not perform the "then" case. Probably you will need if/then/else, or else a way to track if it fired, or a way to evaluate your if into a value. You will have to think about this part; nobody can tell you based on what you wrote above what you should be doing.
To make your conditions more powerful, you will need to add some more classes for boolean operators that take conditions as arguments, but it should be straightforward; you'll have And and Or, they'll both take two Condition arguments and their evaluation will do the sensible thing. You could make a Condition superclass, and then add some methods like And and Or to simplify generating these structures.
Finally, if you want to parse something like what you have above, you should try out pyparsing, but make sure you have the AST figured out first or it will be an uphill battle. Or look at what they have; maybe they have some primitives for this, I haven't dealt with pyparsing in a long time.
Best of luck, and please ask a better question next time!

Can LINQ be used in PowerShell?

I am trying to use LINQ in PowerShell. It seems like this should be entirely possible since PowerShell is built on top of the .NET Framework, but I cannot get it to work. For example, when I try the following (contrived) code:
$data = 0..10
[System.Linq.Enumerable]::Where($data, { param($x) $x -gt 5 })
I get the following error:
Cannot find an overload for "Where" and the argument count: "2".
Never mind the fact that this could be accomplished with Where-Object. The point of this question is not to find an idiomatic way of doing this one operation in PowerShell. Some tasks would be light-years easier to do in PowerShell if I could use LINQ.
The problem with your code is that PowerShell cannot decide to which specific delegate type the ScriptBlock instance ({ ... }) should be cast.
So it isn't able to choose a type-concrete delegate instantiation for the generic 2nd parameter of the Where method. And it also does't have syntax to specify a generic parameter explicitly. To resolve this problem, you need to cast the ScriptBlock instance to the right delegate type yourself:
$data = 0..10
[System.Linq.Enumerable]::Where($data, [Func[object,bool]]{ param($x) $x -gt 5 })
Why does [Func[object, bool]] work, but [Func[int, bool]] does not?
Because your $data is [object[]], not [int[]], given that PowerShell creates [object[]] arrays by default; you can, however, construct [int[]] instances explicitly:
$intdata = [int[]]$data
[System.Linq.Enumerable]::Where($intdata, [Func[int,bool]]{ param($x) $x -gt 5 })
To complement PetSerAl's helpful answer with a broader answer to match the question's generic title:
Note: The following applies up to at least PowerShell 7.2. Direct support for LINQ - with syntax comparable to the one in C# - is being discussed for a future version of PowerShell Core in GitHub issue #2226.
Using LINQ in PowerShell:
You need PowerShell v3 or higher.
You cannot call the LINQ extension methods directly on collection instances and instead must invoke the LINQ methods as static methods of the [System.Linq.Enumerable] type to which you pass the input collection as the first argument.
Having to do so takes away the fluidity of the LINQ API, because method chaining is no longer an option. Instead, you must nest static calls, in reverse order.
E.g., instead of $inputCollection.Where(...).OrderBy(...) you must write [Linq.Enumerable]::OrderBy([Linq.Enumerable]::Where($inputCollection, ...), ...)
Helper functions and classes:
Some methods, such as .Select(), have parameters that accept generic Func<> delegates (e.g, Func<T,TResult> can be created using PowerShell code, via a cast applied to a script block; e.g.:
[Func[object, bool]] { $Args[0].ToString() -eq 'foo' }
The first generic type parameter of Func<> delegates must match the type of the elements of the input collection; keep in mind that PowerShell creates [object[]] arrays by default.
Some methods, such as .Contains() and .OrderBy have parameters that accept objects that implement specific interfaces, such as IEqualityComparer<T> and IComparer<T>; additionally, input types may need to implement IEquatable<T> in order for comparisons to work as intended, such as with .Distinct(); all these require compiled classes written, typically, in C# (though you can create them from PowerShell by passing a string with embedded C# code to the Add-Type cmdlet); in PSv5+, however, you may also use custom PowerShell classes, with some limitations.
Generic methods:
Some LINQ methods themselves are generic and therefore require one or more type arguments.
In PowerShell (Core) 7.2- and Windows PowerShell, PowerShell cannot directly call such methods and must use reflection instead, because it only supports inferring type arguments, which cannot be done in this case; e.g.:
# Obtain a [string]-instantiated method of OfType<T>.
$ofTypeString = [Linq.Enumerable].GetMethod("OfType").MakeGenericMethod([string])
# Output only [string] elements in the collection.
# Note how the array must be nested for the method signature to be recognized.
PS> $ofTypeString.Invoke($null, (, ('abc', 12, 'def')))
abc
def
For a more elaborate example, see this answer.
In PowerShell (Core) 7.3+, you now have the option of specifying type arguments explicitly (see the conceptual about_Calling_Generic_Methods help topic); e.g.:
# Output only [string] elements in the collection.
# Note the need to enclose the input array in (...)
# -> 'abc', 'def'
[Linq.Enumerable]::OfType[string](('abc', 12, 'def'))
The LINQ methods return a lazy enumerable rather than an actual collection; that is, what is returned isn't the actual data yet, but something that will produce the data when enumerated.
In contexts where enumeration is automatically performed, notably in the pipeline, you'll be able to use the enumerable as if it were a collection.
However, since the enumerable isn't itself a collection, you cannot get the result count by invoking .Count nor can you index into the iterator; however, you can use member-access enumeration (extracting the values of a property of the objects being enumerated).
If you do need the results as a static array to get the usual collection behavior, wrap the invocation in [Linq.Enumerable]::ToArray(...).
Similar methods that return different data structures exist, such as ::ToList().
For an advanced example, see this answer.
For an overview of all LINQ methods including examples, see this great article.
In short: using LINQ from PowerShell is cumbersome and is only worth the effort if any of the following apply:
you need advanced query features that PowerShell's cmdlets cannot provide.
performance is paramount - see this article.
If you want to achieve LINQ like functionality then PowerShell has some cmdlets and functions, for instance: Select-Object, Where-Object, Sort-Object, Group-Object. It has cmdlets for most of LINQ features like Projection, Restriction, Ordering, Grouping, Partitioning, etc.
See Powershell One-Liners: Collections and LINQ.
For more details on using Linq and possibly how to make it easier, the article LINQ Through Powershell may be helpful.
I ran accross LINQ, when wanting to have a stable sort in PowerShell (stable: if property to sort by has the same value on two (or more) elements: preserve their order). Sort-Object has a -Stable-Switch, but only in PS 6.1+. Also, the Sort()-Implementations in the Generic Collections in .NET are not stable, so I came accross LINQ, where documentation says it's stable.
Here's my (Test-)Code:
# Getting a stable sort in PowerShell, using LINQs OrderBy
# Testdata
# Generate List to Order and insert Data there. o will be sequential Number (original Index), i will be Property to sort for (with duplicates)
$list = [System.Collections.Generic.List[object]]::new()
foreach($i in 1..10000){
$list.Add([PSCustomObject]#{o=$i;i=$i % 50})
}
# Sort Data
# Order Object by using LINQ. Note that OrderBy does not sort. It's using Delayed Evaluation, so it will sort only when GetEnumerator is called.
$propertyToSortBy = "i" # if wanting to sort by another property, set its name here
$scriptBlock = [Scriptblock]::Create("param(`$x) `$x.$propertyToSortBy")
$resInter = [System.Linq.Enumerable]::OrderBy($list, [Func[object,object]]$scriptBlock )
# $resInter.GetEnumerator() | Out-Null
# $resInter is of Type System.Linq.OrderedEnumerable<...>. We'll copy results to a new Generic List
$res = [System.Collections.Generic.List[object]]::new()
foreach($elem in $resInter.GetEnumerator()){
$res.Add($elem)
}
# Validation
# Check Results. If PropertyToSort is the same as in previous record, but previous sequence-number is higher, than the Sort has not been stable
$propertyToSortBy = "i" ; $originalOrderProp = "o"
for($i = 1; $i -lt $res.Count ; $i++){
if(($res[$i-1].$propertyToSortBy -eq $res[$i].$propertyToSortBy) -and ($res[$i-1].$originalOrderProp -gt $res[$i].$originalOrderProp)){
Write-host "Error on line $i - Sort is not Stable! $($res[$i]), Previous: $($res[$i-1])"
}
}
There is a simple way to make Linq chaining fluent, by setting a using statement to the Linq namespace, Then you can call the where function directly, no need to call the static Where function.
using namespace System.Linq
$b.Where({$_ -gt 0})
$b is an array of bytes, and I want to get all bytes that are greater than 0.
Works perfect.

Can you zip or use a custom join operator in non-expression linq?

I have multiple sequences I want to zip in a more readable way such that I could write
a.Zip(b)
in query syntax as
From a in Foo1
Join b in Foo2
without the result being a cross join
or
From a in Foo1
Zip b in Foo2
The standard Zip method allows you to write something like (using C#):
from t in first.Zip(second, (f, s) => new { First = f, Second = s }
select ... t.First ... t.Second;
I think this should be readable enough. Your question seems to suggest that you'd like to be able to create your own keyword e.g. zip and extend the C# query expressions using it. This is not possible in C# or Visual Basic (but I agree it would be nice in a way).
With some effort, you can redefine what standard C# query construct does, so that join would behave like zip (In that case, the equals part of the query would not be needed, so you'd have a lot of syntactic noise). Possibly, you could also redefine what from clause. I didn't try it, but I believe you could get something like:
from f in first.Zip()
from s in second.AddToZip()
from t in second.AddToZip()
select ... f ... s ... t ...;
I wrote an article that describes how to do this redefining for the group by clause (Using custom grouping operator in LINQ), so this can give you an idea how this can be done.
(But honestly, I think that the standard Zip method should be fine. The redefinition of operators is quite subtle. The group by example may be more appealing because grouping using method is uglier, but even that is on the edge...)

How LINQ works internally?

I love using LINQ in .NET, but I want to know how that works internally?
It makes more sense to ask about a particular aspect of LINQ. It's a bit like asking "How Windows works" otherwise.
The key parts of LINQ are for me, from a C# perspective:
Expression trees. These are representations of code as data. For instance, an expression tree could represent the notion of "take a string parameter, call the Length property on it, and return the result". The fact that these exist as data rather than as compiled code means that LINQ providers such as LINQ to SQL can analyze them and convert them into SQL.
Lambda expressions. These are expressions like this:
x => x * 2
(int x, int y) => x * y
() => { Console.WriteLine("Block"); Console.WriteLine("Lambda"); }
Lambda expressions are converted either into delegates or expression trees.
Anonymous types. These are expressions like this:
new { X=10, Y=20 }
These are still statically typed, it's just the compiler generates an immutable type for you with properties X and Y. These are usually used with var which allows the type of a local variable to be inferred from its initialization expression.
Query expressions. These are expressions like this:
from person in people
where person.Age < 18
select person.Name
These are translated by the C# compiler into "normal" C# 3.0 (i.e. a form which doesn't use query expressions). Overload resolution etc is applied afterwards, which is absolutely key to being able to use the same query syntax with multiple data types, without the compiler having any knowledge of types such as Queryable. The above expression would be translated into:
people.Where(person => person.Age < 18)
.Select(person => person.Name)
Extension methods. These are static methods which can be used as if they were instance methods of the type of the first parameter. For example, an extension method like this:
public static int CountAsciiDigits(this string text)
{
return text.Count(letter => letter >= '0' && letter <= '9');
}
can then be used like this:
string foo = "123abc456";
int count = foo.CountAsciiDigits();
Note that the implementation of CountAsciiDigits uses another extension method, Enumerable.Count().
That's most of the relevant language aspects. Then there are the implementations of the standard query operators, in LINQ providers such as LINQ to Objects and LINQ to SQL etc. I have a presentation about how it's reasonably simple to implement LINQ to Objects - it's on the "Talks" page of the C# in Depth web site.
The way providers such as LINQ to SQL work is generally via the Queryable class. At their core, they translate expression trees into other query formats, and then construct appropriate objects with the results of executing those out-of-process queries.
Does that cover everything you were interested in? If there's anything in particular you still want to know about, just edit your question and I'll have a go.
LINQ is basically a combination of C# 3.0 discrete features of these:
local variable type inference
auto properties (not implemented in VB 9.0)
extension methods
lambda expressions
anonymous type initializers
query comprehension
For more information about the journey to get there (LINQ), see this video of Anders in LANGNET 2008:
http://download.microsoft.com/download/c/e/5/ce5434ca-4f54-42b1-81ea-7f5a72f3b1dd/1-01%20-%20CSharp3%20-%20Anders%20Hejlsberg.wmv
In simple a form, the compiler takes your code-query and converts it into a bunch of generic classes and calls. Underneath, in case of Linq2Sql, a dynamic SQL query gets constructed and executed using DbCommand, DbDataReader etc.
Say you have:
var q = from x in dc.mytable select x;
it gets converted into following code:
IQueryable<tbl_dir_office> q =
dc.mytable.Select<tbl_dir_office, tbl_dir_office>(
Expression.Lambda<Func<mytable, mytable>>(
exp = Expression.Parameter(typeof(mytable), "x"),
new ParameterExpression[] { exp }
)
);
Lots of generics, huge overhead.
Basically linq is a mixture of some language facilities (compiler) and some framework extensions. So when you write linq queries, they get executed using appropriate interfaces such as IQuerable. Also note that the runtime has no role in linq.
But it is difficult to do justice to linq in a short answer. I recommend you read some book to get yourself in it. I am not sure about the book that tells you internals of Linq but Linq in Action gives a good handson about it.

How does coding with LINQ work? What happens behind the scenes?

For example:
m_lottTorqueTools = (From t In m_lottTorqueTools _
Where Not t.SlotNumber = toolTuple.SlotNumber _
And Not t.StationIndex = toolTuple.StationIndex).ToList
What algorithm occurs here? Is there a nested for loop going on in the background? Does it construct a hash table for these fields? I'm curious.
Query expressions are translated into extension method calls, usually. (They don't have to be, but 99.9% of queries use IEnumerable<T> or IQueryable<T>.)
The exact algorithm of what that method does varies from method to method. Your sample query wouldn't use any hash tables, but joins or grouping operations do, for example.
The simple Where call translates to something like this in C# (using iterator blocks, which aren't available in VB at the moment as far as I'm aware):
public static IEnumerable<T> Where(this IEnumerable<T> source,
Func<T, bool> predicate)
{
// Argument checking omitted
foreach (T element in source)
{
if (predicate(element))
{
yield return element;
}
}
}
The predicate is provided as a delegate (or an expression tree if you're using IQueryable<T>) and is called on each item in the sequence. The results are streamed and execution is deferred - in other words, nothing happens until you start asking for items from the result, and even then it only does as much as it needs to in order to provide the next result. Some operators aren't deferred (basically the ones which return a single value instead of a sequence) and some buffer the input (e.g. Reverse has to read to the end of the sequence before it can return any results, because the last result it reads is the first one it has to yield).
It's beyond the scope of a single answer to give details of every single LINQ operator I'm afraid, but if you have questions about specific ones I'm sure we can oblige.
I should add that if you're using LINQ to SQL or another provider that's based on IQueryable<T>, things are rather different. The Queryable class builds up the query (with the help of the provider, which implements IQueryable<T> to start with) and then the query is generally translated into a more appropriate form (e.g. SQL) by the provider. The exact details (including buffering, streaming etc) will entirely depend on the provider.
LINQ in general has a lot going on behind the scenes. With any query, it is first translated into an expression tree using an IQueryableProvider and from there the query is typically compiled into CIL code and a delegate is generated pointing to this function, which you are essentially using whenever you call the query. That's an extremely simplified overview - if you want to read a great article on the subject, I recommend you look at How LINQ Works - Creating Queries. Jon Skeet also posted a good answer on this site to this question.
What happens not only depends on the methods, but also depends on the LINQ provider in use. In the case where an IQueryable<T> is being returned, it is the LINQ provider that interprets the expression tree and processes it however it likes.

Resources