From 2b04bb497f2ba433afd062349afdb0f76fd810f9 Mon Sep 17 00:00:00 2001 From: Matt Warren Date: Fri, 4 Oct 2019 17:35:24 -0700 Subject: [PATCH] Add remaining blog posts and fix up a first two. --- ...building-part-1.md => building-part-01.md} | 13 +- docs/blog/building-part-02.md | 991 +++++++ docs/blog/building-part-03.md | 419 +++ docs/blog/building-part-04.md | 744 +++++ docs/blog/building-part-05.md | 2533 +++++++++++++++++ docs/blog/building-part-06.md | 727 +++++ docs/blog/building-part-07.md | 698 +++++ docs/blog/building-part-08.md | 822 ++++++ docs/blog/building-part-09.md | 746 +++++ docs/blog/building-part-10.md | 1842 ++++++++++++ docs/blog/building-part-11.md | 74 + docs/blog/building-part-12.md | 642 +++++ docs/blog/building-part-13.md | 888 ++++++ docs/blog/building-part-14.md | 243 ++ docs/blog/building-part-15.md | 395 +++ docs/blog/building-part-16.md | 189 ++ docs/blog/building-part-17.md | 432 +++ 17 files changed, 12394 insertions(+), 4 deletions(-) rename docs/blog/{building-part-1.md => building-part-01.md} (98%) create mode 100644 docs/blog/building-part-02.md create mode 100644 docs/blog/building-part-03.md create mode 100644 docs/blog/building-part-04.md create mode 100644 docs/blog/building-part-05.md create mode 100644 docs/blog/building-part-06.md create mode 100644 docs/blog/building-part-07.md create mode 100644 docs/blog/building-part-08.md create mode 100644 docs/blog/building-part-09.md create mode 100644 docs/blog/building-part-10.md create mode 100644 docs/blog/building-part-11.md create mode 100644 docs/blog/building-part-12.md create mode 100644 docs/blog/building-part-13.md create mode 100644 docs/blog/building-part-14.md create mode 100644 docs/blog/building-part-15.md create mode 100644 docs/blog/building-part-16.md create mode 100644 docs/blog/building-part-17.md diff --git a/docs/blog/building-part-1.md b/docs/blog/building-part-01.md similarity index 98% rename from docs/blog/building-part-1.md rename to docs/blog/building-part-01.md index 2188545..1edd6f5 100644 --- a/docs/blog/building-part-1.md +++ b/docs/blog/building-part-01.md @@ -1,7 +1,11 @@ -# LINQ: Building an IQueryable Provider - Part I +# LINQ: Building an IQueryable Provider - Part I: Reusable IQueryable base classes + +Matt Warren - MSFT; July 30, 2007 This is the first in a series of posts on how to build a LINQ IQueryable provider. Each post builds on the last one. +--- + I’ve been meaning for a while to start up a series of posts that covers building LINQ providers using `IQueryable`. People have been asking me advice on doing this for quite some time now, whether through internal Microsoft email or questions on the forums or by cracking the encryption and mailing me directly. @@ -41,7 +45,7 @@ The second property gives you the expression that corresponds to the query. This is quintessential essence of IQueryable’s being. The actual ‘query’ underneath the hood of an `IQueryable` is an expression that represents the query as a tree of LINQ query operators/method calls. This is the part of the `IQueryable` that your provider must comprehend in order to do anything useful. -If you look deeper you will see that the whole `IQueryable` infrastructure (including the System.Linq.Queryable version of LINQ standard query operators) +If you look deeper you will see that the whole `IQueryable` infrastructure (including the `System.Linq.Queryable` version of LINQ standard query operators) is just a mechanism to auto-construct expression tree nodes for you. 
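To make that concrete, here is a rough sketch (my own illustration, simplified from what the real `System.Linq.Queryable` methods do, not code from this series) of such an auto-constructing operator:

```csharp
public static class MyQueryableExtensions
{
    // Sketch of a Queryable-style operator: it does no filtering itself.
    // It only records the call as a MethodCallExpression on top of the
    // source query's tree and asks the provider for a new IQueryable<T>.
    public static IQueryable<T> MyWhere<T>(this IQueryable<T> source, Expression<Func<T, bool>> predicate)
    {
        MethodInfo method = ((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(typeof(T));
        return source.Provider.CreateQuery<T>(
            Expression.Call(null, method, source.Expression, Expression.Quote(predicate)));
    }
}
```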
When you use the `Queryable.Where` method to apply a filter to an `IQueryable`, it simply builds you a new `IQueryable` adding a method-call expression node on top of the tree representing the call you just made to `Queryable.Where`. @@ -221,8 +225,9 @@ That And that’s what I’ll start showing next time. ---- -UPDATE: +
+ +## UPDATE: It looks like I’ve forget to define a little helper class my implementation was using, so here it is: diff --git a/docs/blog/building-part-02.md b/docs/blog/building-part-02.md new file mode 100644 index 0000000..e026aa1 --- /dev/null +++ b/docs/blog/building-part-02.md @@ -0,0 +1,991 @@ +# LINQ: Building an IQueryable Provider – Part II: Where and a reusable Expression tree visitor + +Matt Warren - MSFT; July 31, 2007 + +This is the second post in a multi-post series on building LINQ providers using the IQueryable interface. +If you are new to this series please read the following post before proceeding. + +--- + +Now, that I’ve laid the groundwork defining a reusable version of `IQueryable` and `IQueryProvider`, namely `Query` and `QueryProvider`, +I’m going to build a provider that actually does something. +As I said before, what a query provider really does is execute a little bit of ‘code’ defined as an expression tree instead of actual IL. +Of course, it does not actually have to execute it in the traditional sense. +For example, LINQ to SQL translates the query expression into SQL and sends it to the server to execute it. + + +My sample below is going to work much like LINQ to SQL in that it translates and executes a query against an ADO provider. +However, I must add a disclaimer here, this sample is not a full-fledge provider in any sense. +I’m only going to handle translating the ‘Where’ operation and I’m not even going to attempt to do anything more complicated than allow the predicate to contain a field reference and few simple operators. +I may expand on this provider in the future, but for now it is mostly for illustrative purposes only. +Please don’t copy and paste and expect to have ship-quality code. + + +The provider is going to basically do two things: + 1) translate the query into a SQL command text + 2) translate the result of executing the command into objects. + +
+ +## The Query Translator + + +The query translator is going to simply visit each node in the query’s expression tree and translate the supported operations into text using a StringBuilder. For the sake of clarity assume there is a class called ExpressionVisitor that defines the base visitor pattern for Expression nodes. (I promise I’ll actually include the code for this at the end of the post but for now bear with me.) + + +```csharp +internal class QueryTranslator : ExpressionVisitor +{ + StringBuilder sb; + + internal QueryTranslator() + { + } + + internal string Translate(Expression expression) + { + this.sb = new StringBuilder(); + this.Visit(expression); + return this.sb.ToString(); + } + + private static Expression StripQuotes(Expression e) + { + while (e.NodeType == ExpressionType.Quote) + { + e = ((UnaryExpression)e).Operand; + } + + return e; + } + + protected override Expression VisitMethodCall(MethodCallExpression m) + { + if (m.Method.DeclaringType == typeof(Queryable) && m.Method.Name == "Where") + { + sb.Append("SELECT * FROM ("); + this.Visit(m.Arguments[0]); + sb.Append(") AS T WHERE "); + LambdaExpression lambda = (LambdaExpression)StripQuotes(m.Arguments[1]); + this.Visit(lambda.Body); + return m; + } + + throw new NotSupportedException(string.Format("The method '{0}' is not supported", m.Method.Name)); + } + + protected override Expression VisitUnary(UnaryExpression u) + { + switch (u.NodeType) + { + case ExpressionType.Not: + sb.Append(" NOT "); + this.Visit(u.Operand); + break; + + default: + throw new NotSupportedException(string.Format("The unary operator '{0}' is not supported", u.NodeType)); + } + + return u; + } + + protected override Expression VisitBinary(BinaryExpression b) + { + sb.Append("("); + + this.Visit(b.Left); + + switch (b.NodeType) + { + case ExpressionType.And: + sb.Append(" AND "); + break; + + case ExpressionType.Or: + sb.Append(" OR"); + break; + + case ExpressionType.Equal: + sb.Append(" = "); + break; + + case ExpressionType.NotEqual: + sb.Append(" <> "); + break; + + case ExpressionType.LessThan: + sb.Append(" < "); + break; + + case ExpressionType.LessThanOrEqual: + sb.Append(" <= "); + break; + + case ExpressionType.GreaterThan: + sb.Append(" > "); + break; + + case ExpressionType.GreaterThanOrEqual: + sb.Append(" >= "); + break; + + default: + throw new NotSupportedException(string.Format("The binary operator '{0}' is not supported", b.NodeType)); + } + + this.Visit(b.Right); + + sb.Append(")"); + + return b; + } + + protected override Expression VisitConstant(ConstantExpression c) + { + IQueryable q = c.Value as IQueryable; + + if (q != null) + { + // assume constant nodes w/ IQueryables are table references + sb.Append("SELECT * FROM "); + sb.Append(q.ElementType.Name); + } + else if (c.Value == null) + { + sb.Append("NULL"); + } + else + { + switch (Type.GetTypeCode(c.Value.GetType())) + { + case TypeCode.Boolean: + sb.Append(((bool)c.Value) ? 
1 : 0); + break; + + case TypeCode.String: + sb.Append("'"); + sb.Append(c.Value); + sb.Append("'"); + break; + + case TypeCode.Object: + throw new NotSupportedException(string.Format("The constant for '{0}' is not supported", c.Value)); + + default: + sb.Append(c.Value); + break; + } + } + + return c; + } + + protected override Expression VisitMemberAccess(MemberExpression m) + { + if (m.Expression != null && m.Expression.NodeType == ExpressionType.Parameter) + { + sb.Append(m.Member.Name); + return m; + } + + throw new NotSupportedException(string.Format("The member '{0}' is not supported", m.Member.Name)); + } +} +``` + + + + +As you can see, there’s not much there and still it is rather complicated. +What I’m expecting to see in the expression tree is at most a method call node with arguments referring to the source (argument 0) and the predicate (argument 1). +Take a look at the VisitMethodCall method above. +I explicitly handle the case of the `Queryable.Where` method, generating a “SELECT * FROM (“, recursively visiting the source and then appending “) AS T WHERE “, +and then visiting the predicate. +This allows for other query operators present in the source expression to be nested sub-queries. +I don’t handle any other operators, but if more than one call to Where is used then I’m able to deal with that gracefully. +It doesn’t matter what the table alias is that is used (I picked ‘T’) since I’m not going to generate references to it anyway. +A more full-fledged provider would of course want to do this. + + +There’s a little helper method included called `StripQutotes`. +Its job is to strip away any `ExpressionType.Quote` nodes on the method arguments (which may happen) so I can get at the pure lambda expression that I’m looking for. + + +The `VisitUnary` and `VisitBinary` methods are straightforward. +They simply inject the correct text for the specific unary and binary operators I’ve chosen to support. +The interesting bit of translation comes in the `VisitConstant` method. +You see, table references in my world are just the root IQueryable’s. +If a constant node holds one of my `Query` instances then I’ll just assume it’s meant to represent the root table +so I’ll append “SELECT * FROM” and then the name of the table which is simply the name of the element type of the query. +The rest of the translation for constant nodes just deals with actual constants. +Note, these constants are added to the command text as literal values. +There is nothing here to stop injection attacks that real providers need to deal with. + + +Finally, `VisitMemberAccess` assumes that all field or property accesses are meant to be column references in the command text. +No checking is done to prove that this is true. +The name of the field or property is assumed to match the name of the column in the database. + + +Given a class `Customers` with fields matching the names of the columns in the Northwind sample database, this query translator will generate queries that look like this: + + +#### For the query: + +```csharp +Query customers = ...; +IQueryable q = customers.Where(c => c.City == "London"); +``` + +#### Resulting SQL: + +```SQL +SELECT * FROM (SELECT *FROM Customers) AS T WHERE (city = ‘London’) +``` + +
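To try that translation directly (a quick usage sketch using only the pieces above; it assumes `customers` is the root `Query<Customers>` from the sample program shown later, and same-assembly access since `QueryTranslator` is internal), you can feed the query's expression straight to the translator:

```csharp
// Hand the IQueryable's expression tree to the translator and print the
// command text it builds.
IQueryable<Customers> q = customers.Where(c => c.City == "London");
string sql = new QueryTranslator().Translate(q.Expression);
Console.WriteLine(sql);
// Prints (modulo spacing):
// SELECT * FROM (SELECT * FROM Customers) AS T WHERE (City = 'London')
```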
+ +## The Object Reader + +The job of the object reader is to turn the results of a SQL query into objects. +I’m going to build a simple class that takes a `DbDataReader` and a type `T` and I’ll make it implement `IEnumerable`. +There are no bells and whistles in this implementation. +It will only work for writing into class fields via reflection. +The names of the fields must match the names of the columns in the reader and the types must match whatever the `DataReader` thinks is the correct type. + +```csharp +internal class ObjectReader : IEnumerable, IEnumerable where T : class, new() +{ + Enumerator enumerator; + + internal ObjectReader(DbDataReader reader) + { + this.enumerator = new Enumerator(reader); + } + + public IEnumerator GetEnumerator() + { + Enumerator e = this.enumerator; + + if (e == null) + { + throw new InvalidOperationException("Cannot enumerate more than once"); + } + + this.enumerator = null; + return e; + } + + IEnumerator IEnumerable.GetEnumerator() + { + return this.GetEnumerator(); + } + + class Enumerator : IEnumerator, IEnumerator, IDisposable + { + DbDataReader reader; + FieldInfo[] fields; + int[] fieldLookup; + T current; + + internal Enumerator(DbDataReader reader) + { + this.reader = reader; + this.fields = typeof(T).GetFields(); + } + + public T Current + { + get { return this.current; } + } + + object IEnumerator.Current + { + get { return this.current; } + } + + public bool MoveNext() + { + if (this.reader.Read()) + { + if (this.fieldLookup == null) + { + this.InitFieldLookup(); + } + + T instance = new T(); + + for (int i = 0, n = this.fields.Length; i < n; i++) + { + int index = this.fieldLookup[i]; + + if (index >= 0) + { + FieldInfo fi = this.fields[i]; + + if (this.reader.IsDBNull(index)) + { + fi.SetValue(instance, null); + } + else + { + fi.SetValue(instance, this.reader.GetValue(index)); + } + } + } + + this.current = instance; + + return true; + } + + return false; + } + + public void Reset() + { + } + + public void Dispose() + { + this.reader.Dispose(); + } + + private void InitFieldLookup() + { + var map = new Dictionary(StringComparer.InvariantCultureIgnoreCase); + + for (int i = 0, n = this.reader.FieldCount; i < n; i++) + { + map.Add(this.reader.GetName(i), i); + } + + this.fieldLookup = new int[this.fields.Length]; + + for (int i = 0, n = this.fields.Length; i < n; i++) + { + int index; + + if (map.TryGetValue(this.fields[i].Name, out index)) + { + this.fieldLookup[i] = index; + } + else + { + this.fieldLookup[i] = -1; + } + } + } + } +} +``` + +The `ObjectReader` creates a new instance of type `T` for each row read by the `DbDataReader`. +It uses the reflection API `FieldInfo.SetValue` to assign values to each field of the object. +When the `ObjectReader` is first created it instantiates an instance of the nested `Enumerator` class. +This enumerator is handed out when the `GetEnumerator` method is called. Since `DataReader`’s cannot reset and execute again, +the enumerator can only be handed out once. +If `GetEnumerator` is called a second time an exception is thrown. + + +The `ObjectReader` is lenient in the ordering of fields. +Since the `QueryTranslator` builds queries using `SELECT *` this is a must because otherwise the code has no way of knowing which column will appear first in the results. +Note, that it is generally inadvisable to use `SELECT *` in production code. +Remember this is just an illustrative sample to show how in general to put together a LINQ provider. 
In order to allow for different sequences of columns, the precise sequence is discovered at runtime when the first row is read from the `DataReader`.
The `InitFieldLookup` function builds a map from column name to column ordinal and then assembles a lookup table `fieldLookup` that maps between the object’s fields and the ordinals.
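As a standalone illustration (my own sketch, not from the original post; it assumes the `Customers` class and connection string used in the sample program later on, and same-assembly access since `ObjectReader` is internal), the reader can be used directly against any `DbDataReader`:

```csharp
// ObjectReader<T> turns each row of the DbDataReader into a T by matching
// column names to T's public fields, case-insensitively and in any order.
using (SqlConnection con = new SqlConnection(constr))
{
    con.Open();
    DbCommand cmd = con.CreateCommand();
    cmd.CommandText = "SELECT * FROM Customers";
    var customers = new ObjectReader<Customers>(cmd.ExecuteReader());

    foreach (Customers c in customers)
    {
        Console.WriteLine(c.ContactName);
    }
}
```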
+ +## The Provider + + +Now that we have these two pieces (and the classes define in the prior post) it’s quite easy to combine them together to make an actual IQueryable LINQ provider. + + +```csharp +public class DbQueryProvider : QueryProvider +{ + private readonly DbConnection connection; + + public DbQueryProvider(DbConnection connection) + { + this.connection = connection; + } + + public override string GetQueryText(Expression expression) + { + return this.Translate(expression); + } + + public override object Execute(Expression expression) + { + DbCommand cmd = this.connection.CreateCommand(); + cmd.CommandText = this.Translate(expression); + DbDataReader reader = cmd.ExecuteReader(); + Type elementType = TypeSystem.GetElementType(expression.Type); + + return Activator.CreateInstance( + typeof(ObjectReader<>).MakeGenericType(elementType), + BindingFlags.Instance | BindingFlags.NonPublic, null, + new object[] { reader }, + null); + } + + private string Translate(Expression expression) + { + return new QueryTranslator().Translate(expression); + } +} +``` + +As you can see, building a provider is now simply an exercise in combining these two pieces. +The `GetQueryText` method just needs to use the `QueryTranslator` to produce the command text. +The `Execute` method uses both `QueryTranslator` and `ObjectReader` to build a `DbCommand` object, execute it and return the results as an `IEnumerable`. + +
+ +### Trying it Out + + +Now that we have our provider we can try it out. +Since I’m basically following the LINQ to SQL model I’ll define a class for the Customers table, +a ‘Context’ that holds onto the tables (root queries) and a little program that uses them. + + +```csharp +public class Customers +{ + public string CustomerID; + public string ContactName; + public string Phone; + public string City; + public string Country; +} + +public class Orders +{ + public int OrderID; + public string CustomerID; + public DateTime OrderDate; +} + +public class Northwind +{ + public Query Customers; + public Query Orders; + + public Northwind(DbConnection connection) + { + QueryProvider provider = new DbQueryProvider(connection); + this.Customers = new Query(provider); + this.Orders = new Query(provider); + } +} + +class Program +{ + static void Main(string[] args) + { + string constr = @"…"; + + using (SqlConnection con = new SqlConnection(constr)) + { + con.Open(); + Northwind db = new Northwind(con); + + IQueryable query = + db.Customers.Where(c => c.City == "London"); + + Console.WriteLine("Query:\n{0}\n", query); + + var list = query.ToList(); + + foreach (var item in list) + { + Console.WriteLine("Name: {0}", item.ContactName); + } + + Console.ReadLine(); + } + } +} +``` + + +Now if we run this we should get the following output: (Note that you will have to add your own connection string to the program above.) + +``` +Query: SELECT * FROM (SELECT * FROM Customers) AS T WHERE (City = 'London') + +Name: Thom7as Hardy +Name: Victoria Ashworth +Name: Elizabeth Brown +Name: Ann Devon +Name: Simon Crowther +Name: Hari Kumar +``` + +Excellent, just what I wanted. I love it when a plan comes together. + + +That’s it folks. That’s a LINQ IQueryable provider. Well, at least a crude facsimile of one. +Of course, yours will do so much more than mine. It will handle all the edge cases and serve coffee. + +
+ +## APPENDIX – The Expression Visitor + + +Now you are in for it. I think I’ve received about an order of magnitude more requests for this class than for help on building a query provider. +There is an `ExpressionVisitor` class in `System.Linq.Expressions`, however it is internal so it’s not for your direct consumption as much as you’d like it to be. +If you shout real loud you might convince us to make that one public in the next go ‘round. + + +This expression visitor is my take on the (classic) visitor pattern. +In this variant there is only one visitor class that dispatches calls to the general Visit function out to specific `VisitXXX` methods corresponding to different node types. +Note not every node type gets it own method, for example all binary operators are treated in one `VisitBinary` method. +The nodes themselves do not directly participate in the visitation process. They are treated as just data. +The reason for this is that the quantity of visitors is actually open ended. You can write your own. +Therefore no semantics of visiting is coupled into the node classes. It’s all in the visitors. +The default visit behavior for node XXX is baked into the base class’s version of VisitXXX. + + +Another variant is that all `VisitXXX` methods return a node. The `Expression` tree nodes are immutable. +In order to change the tree you must construct a new one. The default `VisitXXX` methods will construct a new node if any of its sub-trees change. +If no changes are made then the same node is returned. +That way if you make a change to a node (by making a new node) deep down in a tree, the rest of the tree is rebuilt automatically for you. + + +Here’s the code. Enjoy. + + +```csharp +public abstract class ExpressionVisitor +{ + protected ExpressionVisitor() + { + } + + protected virtual Expression Visit(Expression exp) + { + if (exp == null) + return exp; + + switch (exp.NodeType) + { + case ExpressionType.Negate: + case ExpressionType.NegateChecked: + case ExpressionType.Not: + case ExpressionType.Convert: + case ExpressionType.ConvertChecked: + case ExpressionType.ArrayLength: + case ExpressionType.Quote: + case ExpressionType.TypeAs: + return this.VisitUnary((UnaryExpression)exp); + + case ExpressionType.Add: + case ExpressionType.AddChecked: + case ExpressionType.Subtract: + case ExpressionType.SubtractChecked: + case ExpressionType.Multiply: + case ExpressionType.MultiplyChecked: + case ExpressionType.Divide: + case ExpressionType.Modulo: + case ExpressionType.And: + case ExpressionType.AndAlso: + case ExpressionType.Or: + case ExpressionType.OrElse: + case ExpressionType.LessThan: + case ExpressionType.LessThanOrEqual: + case ExpressionType.GreaterThan: + case ExpressionType.GreaterThanOrEqual: + case ExpressionType.Equal: + case ExpressionType.NotEqual: + case ExpressionType.Coalesce: + case ExpressionType.ArrayIndex: + case ExpressionType.RightShift: + case ExpressionType.LeftShift: + case ExpressionType.ExclusiveOr: + return this.VisitBinary((BinaryExpression)exp); + + case ExpressionType.TypeIs: + return this.VisitTypeIs((TypeBinaryExpression)exp); + + case ExpressionType.Conditional: + return this.VisitConditional((ConditionalExpression)exp); + + case ExpressionType.Constant: + return this.VisitConstant((ConstantExpression)exp); + + case ExpressionType.Parameter: + return this.VisitParameter((ParameterExpression)exp); + + case ExpressionType.MemberAccess: + return this.VisitMemberAccess((MemberExpression)exp); + + case ExpressionType.Call: + return 
this.VisitMethodCall((MethodCallExpression)exp); + + case ExpressionType.Lambda: + return this.VisitLambda((LambdaExpression)exp); + + case ExpressionType.New: + return this.VisitNew((NewExpression)exp); + + case ExpressionType.NewArrayInit: + case ExpressionType.NewArrayBounds: + return this.VisitNewArray((NewArrayExpression)exp); + + case ExpressionType.Invoke: + return this.VisitInvocation((InvocationExpression)exp); + + case ExpressionType.MemberInit: + return this.VisitMemberInit((MemberInitExpression)exp); + + case ExpressionType.ListInit: + return this.VisitListInit((ListInitExpression)exp); + + default: + throw new Exception(string.Format("Unhandled expression type: '{0}'", exp.NodeType)); + } + } + + protected virtual MemberBinding VisitBinding(MemberBinding binding) + { + switch (binding.BindingType) + { + case MemberBindingType.Assignment: + return this.VisitMemberAssignment((MemberAssignment)binding); + + case MemberBindingType.MemberBinding: + return this.VisitMemberMemberBinding((MemberMemberBinding)binding); + + case MemberBindingType.ListBinding: + return this.VisitMemberListBinding((MemberListBinding)binding); + + default: + throw new Exception(string.Format("Unhandled binding type '{0}'", binding.BindingType)); + } + } + + protected virtual ElementInit VisitElementInitializer(ElementInit initializer) + { + ReadOnlyCollection arguments = this.VisitExpressionList(initializer.Arguments); + + if (arguments != initializer.Arguments) + { + return Expression.ElementInit(initializer.AddMethod, arguments); + } + + return initializer; + } + + protected virtual Expression VisitUnary(UnaryExpression u) + { + Expression operand = this.Visit(u.Operand); + + if (operand != u.Operand) + { + return Expression.MakeUnary(u.NodeType, operand, u.Type, u.Method); + } + + return u; + } + + protected virtual Expression VisitBinary(BinaryExpression b) + { + Expression left = this.Visit(b.Left); + Expression right = this.Visit(b.Right); + Expression conversion = this.Visit(b.Conversion); + + if (left != b.Left || right != b.Right || conversion != b.Conversion) + { + if (b.NodeType == ExpressionType.Coalesce && b.Conversion != null) + return Expression.Coalesce(left, right, conversion as LambdaExpression); + else + return Expression.MakeBinary(b.NodeType, left, right, b.IsLiftedToNull, b.Method); + } + + return b; + } + + protected virtual Expression VisitTypeIs(TypeBinaryExpression b) + { + Expression expr = this.Visit(b.Expression); + + if (expr != b.Expression) + { + return Expression.TypeIs(expr, b.TypeOperand); + } + + return b; + } + + protected virtual Expression VisitConstant(ConstantExpression c) + { + return c; + } + + protected virtual Expression VisitConditional(ConditionalExpression c) + { + Expression test = this.Visit(c.Test); + Expression ifTrue = this.Visit(c.IfTrue); + Expression ifFalse = this.Visit(c.IfFalse); + + if (test != c.Test || ifTrue != c.IfTrue || ifFalse != c.IfFalse) + { + return Expression.Condition(test, ifTrue, ifFalse); + } + + return c; + } + + protected virtual Expression VisitParameter(ParameterExpression p) + { + return p; + } + + protected virtual Expression VisitMemberAccess(MemberExpression m) + { + Expression exp = this.Visit(m.Expression); + + if (exp != m.Expression) + { + return Expression.MakeMemberAccess(exp, m.Member); + } + + return m; + } + + protected virtual Expression VisitMethodCall(MethodCallExpression m) + { + Expression obj = this.Visit(m.Object); + IEnumerable args = this.VisitExpressionList(m.Arguments); + + if (obj != m.Object || args != 
m.Arguments) + { + return Expression.Call(obj, m.Method, args); + } + + return m; + } + + protected virtual ReadOnlyCollection VisitExpressionList(ReadOnlyCollection original) + { + List list = null; + + for (int i = 0, n = original.Count; i < n; i++) + { + Expression p = this.Visit(original[i]); + + if (list != null) + { + list.Add(p); + } + else if (p != original[i]) + { + list = new List(n); + + for (int j = 0; j < i; j++) + { + list.Add(original[j]); + } + + list.Add(p); + } + } + + if (list != null) + { + return list.AsReadOnly(); + } + + return original; + } + + protected virtual MemberAssignment VisitMemberAssignment(MemberAssignment assignment) + { + Expression e = this.Visit(assignment.Expression); + + if (e != assignment.Expression) + { + return Expression.Bind(assignment.Member, e); + } + + return assignment; + } + + protected virtual MemberMemberBinding VisitMemberMemberBinding(MemberMemberBinding binding) + { + IEnumerable bindings = this.VisitBindingList(binding.Bindings); + + if (bindings != binding.Bindings) + { + return Expression.MemberBind(binding.Member, bindings); + } + + return binding; + } + + protected virtual MemberListBinding VisitMemberListBinding(MemberListBinding binding) + { + IEnumerable initializers = this.VisitElementInitializerList(binding.Initializers); + + if (initializers != binding.Initializers) + { + return Expression.ListBind(binding.Member, initializers); + } + + return binding; + } + + protected virtual IEnumerable VisitBindingList(ReadOnlyCollection original) + { + List list = null; + + for (int i = 0, n = original.Count; i < n; i++) + { + MemberBinding b = this.VisitBinding(original[i]); + + if (list != null) + { + list.Add(b); + } + else if (b != original[i]) + { + list = new List(n); + + for (int j = 0; j < i; j++) + { + list.Add(original[j]); + } + + list.Add(b); + } + } + + if (list != null) + return list; + + return original; + } + + protected virtual IEnumerable VisitElementInitializerList(ReadOnlyCollection original) + { + List list = null; + + for (int i = 0, n = original.Count; i < n; i++) + { + ElementInit init = this.VisitElementInitializer(original[i]); + + if (list != null) + { + list.Add(init); + } + else if (init != original[i]) + { + list = new List(n); + + for (int j = 0; j < i; j++) + { + list.Add(original[j]); + } + + list.Add(init); + } + } + + if (list != null) + return list; + + return original; + } + + protected virtual Expression VisitLambda(LambdaExpression lambda) + { + Expression body = this.Visit(lambda.Body); + + if (body != lambda.Body) + { + return Expression.Lambda(lambda.Type, body, lambda.Parameters); + } + + return lambda; + } + + protected virtual NewExpression VisitNew(NewExpression nex) + { + IEnumerable args = this.VisitExpressionList(nex.Arguments); + + if (args != nex.Arguments) + { + if (nex.Members != null) + return Expression.New(nex.Constructor, args, nex.Members); + else + return Expression.New(nex.Constructor, args); + } + + return nex; + } + + protected virtual Expression VisitMemberInit(MemberInitExpression init) + { + NewExpression n = this.VisitNew(init.NewExpression); + IEnumerable bindings = this.VisitBindingList(init.Bindings); + + if (n != init.NewExpression || bindings != init.Bindings) + { + return Expression.MemberInit(n, bindings); + } + + return init; + } + + protected virtual Expression VisitListInit(ListInitExpression init) + { + NewExpression n = this.VisitNew(init.NewExpression); + IEnumerable initializers = this.VisitElementInitializerList(init.Initializers); + + if (n != 
init.NewExpression || initializers != init.Initializers) + { + return Expression.ListInit(n, initializers); + } + + return init; + } + + protected virtual Expression VisitNewArray(NewArrayExpression na) + { + IEnumerable exprs = this.VisitExpressionList(na.Expressions); + + if (exprs != na.Expressions) + { + if (na.NodeType == ExpressionType.NewArrayInit) + { + return Expression.NewArrayInit(na.Type.GetElementType(), exprs); + } + else + { + return Expression.NewArrayBounds(na.Type.GetElementType(), exprs); + } + } + + return na; + } + + protected virtual Expression VisitInvocation(InvocationExpression iv) + { + IEnumerable args = this.VisitExpressionList(iv.Arguments); + Expression expr = this.Visit(iv.Expression); + + if (args != iv.Arguments || expr != iv.Expression) + { + return Expression.Invoke(expr, args); + } + + return iv; + } +} +``` \ No newline at end of file diff --git a/docs/blog/building-part-03.md b/docs/blog/building-part-03.md new file mode 100644 index 0000000..0293533 --- /dev/null +++ b/docs/blog/building-part-03.md @@ -0,0 +1,419 @@ +# LINQ: Building an IQueryable Provider – Part III: Local variable references + +Matt Warren - MSFT; August 1, 2007 + +--- + +This post is the third in a series of posts covering how to build a LINQ IQueryable provider. If you have not read the previous posts, please do so before proceeding. + + +Complete list of posts in the Building an IQueryable Provider series + + +Part III? Wasn’t I done in the last post? Didn’t I have the provider actually working, translating, executing and returning a sequence of objects? + + +Sure, that’s true, but only just so. The provider I built was really fragile. It only understood one major query operator and a few minor ones like comparison, etc. However, real providers are going to have to deal with many more operators and complicated interactions between them. For example, that provider did not even let you project the data into new shapes. How one goes about doing that is non-obvious. + + +Translating Local Variable References + + +Did you see what happens when the query references local variables? No? + + +string city = "London"; + + +var query = db.Customers.Where(c => c.City == city); + + + + + +What happens when we try to translate that? Go ahead try it. I’m waiting. + + +Bam! You get an exception, “The member 'city' is not supported.” What does that mean? I thought the ‘member’ city was one of the columns. Well, it is. What the exception is referring to is the local variable ‘city’. Yet, how is that a ‘member’? + + +Take a look at the ToString() translation of the expression tree. + + +Console.WriteLine(query.Expression.ToString()); + + + + + +Here’s what you get: + + +SELECT * FROM Customers.Where(c => return (c.City = value(Sample.Program+<>c__DisplayClass0).city)) + + +Aha! The c# compile has made a class to hold local variables that are being referenced in the lambda expression. This is the same thing it does when local variables are referenced inside an anonymous method. But you already knew that. Didn’t you? No? + + +Regardless, if I want to have my provider work with references to local variables I’m going to have to deal with it. Maybe I can just recognize field references to these compiler generated types? How do I identify a compiler generated type? By name? What if the c# compiler changes how they name them? What if another language uses a different scheme? Are local variables the only interesting case? What about references to member variables in scope at the time? 
Those aren’t going to be encoded as values in the tree either are they? At best they will be a constant node referencing the instance the member is on and then a MemberAccess node that accesses the member off that instance. Can I just recognize any member access against a constant node and evaluated it by hand using reflection? Maybe. What if the compiler generates something more complicated? + + +Okay, what I’m going to give you is a general purpose solution that will turn these compiler generated trees into something much more palatable, more like the trees you thought you were getting before I pointed out this mess. + + +What I really want to do is identify sub-trees in the overall tree that can be immediately evaluated and turned into values. If I can do this, then the rest of the translator only needs to deal with these values. Thank goodness I already have the ExpressionVisitor defined. I can use that and invent a pretty simple rule to determine what is sub-tree that can be evaluated locally. + + +Take a look at the code first and then I’ll explain how it does what it does. + + +public static class Evaluator { + + + /// + + + /// Performs evaluation & replacement of independent sub-trees + + + /// + + + /// The root of the expression tree. + + + /// A function that decides whether a given expression node can be part of the local function. + + + /// A new tree with sub-trees evaluated and replaced. + + + public static Expression PartialEval(Expression expression, Func fnCanBeEvaluated) { + + + return new SubtreeEvaluator(new Nominator(fnCanBeEvaluated).Nominate(expression)).Eval(expression); + + + } + + + + + + /// + + + /// Performs evaluation & replacement of independent sub-trees + + + /// + + + /// The root of the expression tree. + + + /// A new tree with sub-trees evaluated and replaced. + + + public static Expression PartialEval(Expression expression) { + + + return PartialEval(expression, Evaluator.CanBeEvaluatedLocally); + + + } + + + + + + private static bool CanBeEvaluatedLocally(Expression expression) { + + + return expression.NodeType != ExpressionType.Parameter; + + + } + + + + + + /// + + + /// Evaluates & replaces sub-trees when first candidate is reached (top-down) + + + /// + + + class SubtreeEvaluator: ExpressionVisitor { + + + HashSet candidates; + + + + + + internal SubtreeEvaluator(HashSet candidates) { + + + this.candidates = candidates; + + + } + + + + + + internal Expression Eval(Expression exp) { + + + return this.Visit(exp); + + + } + + + + + + protected override Expression Visit(Expression exp) { + + + if (exp == null) { + + + return null; + + + } + + + if (this.candidates.Contains(exp)) { + + + return this.Evaluate(exp); + + + } + + + return base.Visit(exp); + + + } + + + + + + private Expression Evaluate(Expression e) { + + + if (e.NodeType == ExpressionType.Constant) { + + + return e; + + + } + + + LambdaExpression lambda = Expression.Lambda(e); + + + Delegate fn = lambda.Compile(); + + + return Expression.Constant(fn.DynamicInvoke(null), e.Type); + + + } + + + } + + + + + + /// + + + /// Performs bottom-up analysis to determine which nodes can possibly + + + /// be part of an evaluated sub-tree. 
+ + + /// + + + class Nominator : ExpressionVisitor { + + + Func fnCanBeEvaluated; + + + HashSet candidates; + + + bool cannotBeEvaluated; + + + + + + internal Nominator(Func fnCanBeEvaluated) { + + + this.fnCanBeEvaluated = fnCanBeEvaluated; + + + } + + + + + + internal HashSet Nominate(Expression expression) { + + + this.candidates = new HashSet(); + + + this.Visit(expression); + + + return this.candidates; + + + } + + + + + + protected override Expression Visit(Expression expression) { + + + if (expression != null) { + + + bool saveCannotBeEvaluated = this.cannotBeEvaluated; + + + this.cannotBeEvaluated = false; + + + base.Visit(expression); + + + if (!this.cannotBeEvaluated) { + + + if (this.fnCanBeEvaluated(expression)) { + + + this.candidates.Add(expression); + + + } + + + else { + + + this.cannotBeEvaluated = true; + + + } + + + } + + + this.cannotBeEvaluated |= saveCannotBeEvaluated; + + + } + + + return expression; + + + } + + + } + + +} + + +The Evaluator class exposes a static method ‘PartialEval’ that you can call to evaluate these sub-trees in your expression, leaving only constant nodes with actual values in their place. + + +The majority of this code is the demarking of maximal sub-trees that can be evaluated in isolation. The actual evaluation is trivial since the sub-trees can be ‘compiled’ using LambaExpression.Compile, turned into a delegate and then invoked. You can see this happening in the SubtreeVisitor.Evaluate method. + + +The process of determining maximal sub-trees happens in two steps. First by a bottom-up walk in the Nominator class that determines which nodes could possibly be evaluated in isolation and then a top-down walk in SubtreeEvaluator that finds highest nodes representing sub-trees that were nominated. + + +The Nominator is parameterized by a function that you supply that can employ whatever heuristics you want to determine if some given node can be evaluated in isolation. The default heuristic is that any node except ExpresssionType.Parameter can be evaluated in isolation. Beyond that, a general rule states that if a child node cannot be evaluated locally then the parent node cannot either. Therefore, any node upstream of a parameter cannot be evaluated and will remain in the tree. Everything else will be evaluated and replaced as constants. + + +Now that I have this class I can put it to work by using it whenever I go about translating expression trees. Fortunately, I already have this operation factored out into the ‘Translate’ method on the DbQueryProvider class. + + +public class DbQueryProvider : QueryProvider { + … + + + private string Translate(Expression expression) { + + + expression = Evaluator.PartialEval(expression); + + + return new QueryTranslator().Translate(expression); + + + } + + +} + + + +Now if we try the following code we get a better result: + + +string city = "London"; + + +var query = db.Customers.Where(c => c.City == city); + + + + + +Console.WriteLine("Query:\n{0}\n", query); + + + +Which writes: + + +Query: +SELECT * FROM (SELECT * FROM Customers) AS T WHERE (City = 'London') + + +Exactly what I wanted. This provider is coming along nicely! + + +Maybe next time I’ll implement Select. 
\ No newline at end of file diff --git a/docs/blog/building-part-04.md b/docs/blog/building-part-04.md new file mode 100644 index 0000000..d56a207 --- /dev/null +++ b/docs/blog/building-part-04.md @@ -0,0 +1,744 @@ +# LINQ: Building an IQueryable Provider – Part IV: Select + +Matt Warren - MSFT; August 2, 2007 + +--- + + +This post is the fourth in a series of posts covering how to build a LINQ IQueryable provider. If you have not read the previous posts, please do so before proceeding. + + +Complete list of posts in the Building an IQueryable Provider series + + + +I just could not leave well enough alone. I had the crude LINQ provider working with just a translation of the Where method into SQL. I could execute the query and convert the results into my objects. But that’s not good enough for me, and I know it’s not good enough for you. You probably want to see it all; the transformation of a little sample program into a full-fledged working ORM system. Well, I’m probably not going to do that. However, I still think there’s a lot of common ground I can cover, that you can make use of in your provider, by showing you how I’m going to implement Select. + + +Implementing Select + + +Translating the Where method was easy compared to Select. I’m not talking about your garden variety SQL operation of selecting five columns out of ten. I’m talking about a LINQ Select operation where you can transform your data into just about any shape you want. The selector function of the LINQ Select operator can be any transforming expression that the user can imagine. There could be object constructors, initializers, conditionals, binary operators, method calls; the whole lot. How am I going to translate any of that to SQL, let alone reproduce that structure in the objects I return? + + +Fortunately, I don’t really have to do any of that. Why not? Surely, that’s got to be a lot of code to write? Right? The truth is I already have the code to do it, at least most of it, and I didn’t stay up all night writing it in a bout of binge programming. I didn’t write it at all. The user did when he wrote the query. + + +The selector function is the code to construct the results. If this were LINQ to objects instead of my IQueryable provider, the selector function would be the code that was run to produce the results. Why should it be any different now? + + +Aha, not so fast. Surely, the selector function would be just dandy if it were actual code and not an expression tree, and even if it were code it would just be a function that converts one object into another. Yet, I don’t have an object to convert. I have a DbDataReader and it has fields. It’s an object I want to produce. I don’t already have one. + + +Of course, you might be able to think of a real cheesy solution that just combines the prior ObjectReader with basically a LINQ to Objects version of Select to convert the results of retrieving all the data into a different shape. Yet, that would be a gross misuse of the fabric of space-time. We shouldn’t be retrieving all the data. We should just bring back the bits that are needed to produce the results. What a dilemma. + + +Fortunately, it’s still easy. I just have to convert the selector function I already have into the one I need. What is the one I need? Well, one that reads data from a DbDataReader for starters. Okay, sure, but maybe if I abstract the DataReader out of the problem and make it about getting the data from a method called ‘GetValue’. 
Yes, I know DataReader already has one of those, but it also has this nasty habit of returning DbNull’s. + + +public abstract class ProjectionRow { + + + public abstract object GetValue(int index); + + +} + + + +So here’s this simple abstract base class that represents a row of data. If my selector expression was pulling data out of this guy by calling ‘GetValue’ and then using an Expression.Convert operation I’d be smiling. + + +Let’s take a look at the code that builds my new selector. + + +internal class ColumnProjection { + + + internal string Columns; + + + internal Expression Selector; + + +} + + + + + +internal class ColumnProjector : ExpressionVisitor { + + + StringBuilder sb; + + + int iColumn; + + + ParameterExpression row; + + + static MethodInfo miGetValue; + + + + + + internal ColumnProjector() { + + + if (miGetValue == null) { + + + miGetValue = typeof(ProjectionRow).GetMethod("GetValue"); + + + } + + + } + + + + + + internal ColumnProjection ProjectColumns(Expression expression, ParameterExpression row) { + + + this.sb = new StringBuilder(); + + + this.row = row; + + + Expression selector = this.Visit(expression); + + + return new ColumnProjection { Columns = this.sb.ToString(), Selector = selector }; + + + } + + + + + + protected override Expression VisitMemberAccess(MemberExpression m) { + + + if (m.Expression != null && m.Expression.NodeType == ExpressionType.Parameter) { + + + if (this.sb.Length > 0) { + + + this.sb.Append(", "); + + + } + + + this.sb.Append(m.Member.Name); + + + return Expression.Convert(Expression.Call(this.row, miGetValue, Expression.Constant(iColumn++)), m.Type); + + + } + + + else { + + + return base.VisitMemberAccess(m); + + + } + + + } + + +} + + + +Now, that doesn’t look like all that much code does it. ColumnProjector is a visitor that walks down an expression tree converting column references (or what it thinks are references to columns) into tiny expressions that access the individual data values via a call to this GetValue method. Where does this method come from? There’s a ParameterExpression called ‘row’ that is typed to be a ProjectionRow that abstract class I defined above. Not only am I rebuilding this selector expression I am going to turn it eventually into the body of a lambda expression that takes a ProjectionRow as an argument. That way I can convert the LambdaExpression into a delegate by calling the LambdaExpression.Compile method. + + +Notice this visitor also builds up a string representing our SQL select clause. Bonus! Now whenever I see a Queryable.Select in the query expression I can convert the selector into both the function that produces the results and the select clause I need for the command text. + + +Let’s just see where this fits in. Here’s my modified QueryTranslator. (The relevant bits anyway.) + + +internal class TranslateResult { + + + internal string CommandText; + + + internal LambdaExpression Prsojector; + + +} + + + +internal class QueryTranslator : ExpressionVisitor { + + + StringBuilder sb; + + + ParameterExpression row; + + + ColumnProjection projection; + + + + + + internal QueryTranslator() { + + + } + + + + + + internal TranslateResult Translate(Expression expression) { + + + this.sb = new StringBuilder(); + + + this.row = Expression.Parameter(typeof(ProjectionRow), "row"); + + + this.Visit(expression); + + + return new TranslateResult { + + + CommandText = this.sb.ToString(), + + + Projector = this.projection != null ? 
Expression.Lambda(this.projection.Selector, this.row) : null + + + }; + + + } + + + + + + protected override Expression VisitMethodCall(MethodCallExpression m) { + + + if (m.Method.DeclaringType == typeof(Queryable)) { + + + if (m.Method.Name == "Where") { + + + sb.Append("SELECT * FROM ("); + + + this.Visit(m.Arguments[0]); + + + sb.Append(") AS T WHERE "); + + + LambdaExpression lambda = (LambdaExpression)StripQuotes(m.Arguments[1]); + + + this.Visit(lambda.Body); + + + return m; + + + } + + + else if (m.Method.Name == "Select") { + + + LambdaExpression lambda = (LambdaExpression)StripQuotes(m.Arguments[1]); + + + ColumnProjection projection = new ColumnProjector().ProjectColumns(lambda.Body, this.row); + + + sb.Append("SELECT "); + + + sb.Append(projection.Columns); + + + sb.Append(" FROM ("); + + + this.Visit(m.Arguments[0]); + + + sb.Append(") AS T "); + + + this.projection = projection; + + + return m; + + + } + + + } + + + throw new NotSupportedException(string.Format("The method '{0}' is not supported", m.Method.Name)); + + + } + + + + . . . + + +} + + + + + +As you can see, the QueryTranslator now handles the Select method, building a SQL SELECT statement just like the Where method did. However, it also remembers that last ColumnProjection (the result of calling ProjectColumns) and returns the newly reconstructed selector as a LambdaExpression in the TranslateResult object. + + +Now all I need is an ObjectReader that works off this LambdaExpression instead of just constructing a fixed object. + + +Look, here’s one now. + + +internal class ProjectionReader : IEnumerable, IEnumerable { + + + Enumerator enumerator; + + + + + + internal ProjectionReader(DbDataReader reader, Func projector) { + + + this.enumerator = new Enumerator(reader, projector); + + + } + + + + + + public IEnumerator GetEnumerator() { + + + Enumerator e = this.enumerator; + + + if (e == null) { + + + throw new InvalidOperationException("Cannot enumerate more than once"); + + + } + + + this.enumerator = null; + + + return e; + + + } + + + + + + IEnumerator IEnumerable.GetEnumerator() { + + + return this.GetEnumerator(); + + + } + + + + + + class Enumerator : ProjectionRow, IEnumerator, IEnumerator, IDisposable { + + + DbDataReader reader; + + + T current; + + + Func projector; + + + + + + internal Enumerator(DbDataReader reader, Func projector) { + + + this.reader = reader; + + + this.projector = projector; + + + } + + + + + + public override object GetValue(int index) { + + + if (index >= 0) { + + + if (this.reader.IsDBNull(index)) { + + + return null; + + + } + + + else { + + + return this.reader.GetValue(index); + + + } + + + } + + + throw new IndexOutOfRangeException(); + + + } + + + + + + public T Current { + + + get { return this.current; } + + + } + + + + + + object IEnumerator.Current { + + + get { return this.current; } + + + } + + + + + + public bool MoveNext() { + + + if (this.reader.Read()) { + + + this.current = this.projector(this); + + + return true; + + + } + + + return false; + + + } + + + + + + public void Reset() { + + + } + + + + + + public void Dispose() { + + + this.reader.Dispose(); + + + } + + + } + + +} + + + +The ProjectionReader class is nearly identical to the ObjectReader class shown in Part II, except that the logic for constructing an object out of fields is missing. In its place is just a call to this new ‘projector’ delegate. The idea is that this is going to be the same exact delegate that we can get out of our rebuilt selector expression. 
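To make that concrete, for a selector like `c => new { Name = c.ContactName, Phone = c.Phone }` the rebuilt expression that gets compiled is roughly equivalent to this hand-written lambda (my approximation of what the ColumnProjector produces, not actual generated output):

```csharp
// 'row' is the ProjectionRow parameter introduced by the translator.
// Each column reference has become a GetValue call by ordinal (matching the
// order of the generated SELECT list), cast back to the member's type.
row => new { Name = (string)row.GetValue(0), Phone = (string)row.GetValue(1) }
```

Compiling that lambda yields the projector delegate that the enumerator shown above invokes once per row (`this.current = this.projector(this)` in `MoveNext`).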
+ + +If you recall, I rebuilt the selector expression to reference a parameter of type ProjectionRow. Now, we see here that the Enumerator class inside the ProjectionReader implements this ProjectionRow. It’s a good thing too, because it’s the only object around that knows about the DbDataReader. It also makes it really easy for us to simply pass the ‘this’ expression into the delegate when we invoke it. + + +It’s nice when everything just seems to fit together. Now I just need to assemble the pieces in the DbQueryProvider. + + +So here’s my new provider: + + +public class DbQueryProvider : QueryProvider { + + + DbConnection connection; + + + + + + public DbQueryProvider(DbConnection connection) { + + + this.connection = connection; + + + } + + + + + + public override string GetQueryText(Expression expression) { + + + return this.Translate(expression).CommandText; + + + } + + + + + + public override object Execute(Expression expression) { + + + TranslateResult result = this.Translate(expression); + + + + + + DbCommand cmd = this.connection.CreateCommand(); + + + cmd.CommandText = result.CommandText; + + + DbDataReader reader = cmd.ExecuteReader(); + + + + + + Type elementType = TypeSystem.GetElementType(expression.Type); + + + if (result.Projector != null) { + + + Delegate projector = result.Projector.Compile(); + + + return Activator.CreateInstance( + + + typeof(ProjectionReader<>).MakeGenericType(elementType), + + + BindingFlags.Instance | BindingFlags.NonPublic, null, + + + new object[] { reader, projector }, + + + null + + + ); + + + } + + + else { + + + return Activator.CreateInstance( + + + typeof(ObjectReader<>).MakeGenericType(elementType), + + + BindingFlags.Instance | BindingFlags.NonPublic, null, + + + new object[] { reader }, + + + null + + + ); + + + } + + + } + + + + + + private TranslateResult Translate(Expression expression) { + + + expression = Evaluator.PartialEval(expression); + + + return new QueryTranslator().Translate(expression); + + + } + + +} + + + + + +The call to Translate gives me back everything I need. I just need to convert the LambdaExpression into a delegate by calling Compile. Notice how I still need to keep around the old ObjectReader. That’s just in case there was never a Select in the query at all. + + +Now let’s see if I can really use it. + + +string city = "London"; + + +var query = db.Customers.Where(c => c.City == city) + .Select(c => new {Name = c.ContactName, Phone = c.Phone}); + + +Console.WriteLine("Query:\n{0}\n", query); + + + +var list = query.ToList(); + + +foreach (var item in list) { + + + Console.WriteLine("{0}", item); + + +} + + + + + + + + +Running this now produces the following result: + + +Query: +SELECT ContactName, Phone FROM (SELECT * FROM (SELECT * FROM Customers) AS T WHERE (City = 'London')) AS T + + +{ Name = Thomas Hardy, Phone = (171) 555-7788 } +{ Name = Victoria Ashworth, Phone = (171) 555-1212 } +{ Name = Elizabeth Brown, Phone = (171) 555-2282 } +{ Name = Ann Devon, Phone = (171) 555-0297 } +{ Name = Simon Crowther, Phone = (171) 555-7733 } +{ Name = Hari Kumar, Phone = (171) 555-1717 } + + + +Look, I’m no longer returning all the results. That’s just what I wanted. The translated selector expression turned into a delegate that included the ‘new xxx’ anonymous type initializser and calls to GetValue that read the results from the DataReader right into my object w/o having to use reflection on every single field. This just gets better and better. + + +You must be thinking now that I’m finally done. This provider rocks! 
What more could be left to do? + + +There’s still a lot more that can be done. Yet, even with select, even with how good it appears to be working, there are some major holes/flaws in this solution. Fixing them is going to take a major overhaul of the code. + + +Fortunately, for me, that’s the fun part. See you next time in Part V. + + + + + +Matt + diff --git a/docs/blog/building-part-05.md b/docs/blog/building-part-05.md new file mode 100644 index 0000000..13a1707 --- /dev/null +++ b/docs/blog/building-part-05.md @@ -0,0 +1,2533 @@ +# LINQ: Building an IQueryable Provider – Part V: Improving column binding + +Matt Warren - MSFT; August 3, 2007 + +--- + +This is the fifth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts please take a look before proceeding, or if you are daring dig right in. + + +Complete list of posts in the Building an IQueryable Provider series + +Over the past four parts of this series I have constructed a working LINQ IQueryable provider that targets ADO and SQL and has so far been able to translate both Queryable.Where and Queryable.Select standard query operators. Yet, as big of an accomplishment that has been there are still a few gaping holes and I’m not talking about other missing operators like OrderBy and Join. I’m talking about huge conceptual gaffs that will bite anyone that strays from my oh-so-ideally crafted demo queries. + + +Fixing the Gaping Holes + + +Certainly, I can write a simple where/select pair and it works as advertised. My select expression can be arbitrarily complex and it still chugs along. + + +var query = db.Customers.Where(c => c.City == city) + + + .Select(c => new { + + + Name = c.ContactName, + + + Location = c.City }); + + + +However, if just rearrange the order of Where and Select it all falls apart. + + +var query = db.Customers.Select(c => new { + + + Name = c.ContactName, + + + Location = c.City }) + .Where(x => x.Location == city); + +This handsome little query generates SQL that’s not exactly right. + + + + + +SELECT * FROM (SELECT ContactName, City FROM (SELECT * FROM Customers) AS T) AS T WHERE (Location = 'London') + +It also generates an exception when executed, “Invalid column name 'Location'.” It seems my oversimplifying practice of treating member accesses as column references has backfired. I naively assumed that the only member accesses in the sub-trees would match the names of the columns being generated. Yet, that’s obviously not true. So either I need to find a way to change the names of columns to match or I need to figure out some other way to deal with member accesses. + + + + + +I suppose either is possible. Yet, if I consider a slightly more complicated example, renaming the columns is not sufficient. If the select expression produces a hierarchy of objects, then references to members can become a ‘multi-dot’ operation. + + + + + +var query = db.Customers.Select(c => new { + + + Name = c.ContactName, + + + Location = new { + City = c.City, + Country = c.Country + } + }) + .Where(x => x.Location.City == city); + +Now, how am I going to translate this? The existing code does not even contain a concept of what this intermediate ‘Location’ object could be. Luckily, I already realize what I need to do, yet it’s going to require a big change. I need to step away from this notion that my provider is just translating query expressions into text. It’s translating query expressions into SQL. 
Text is only one possible manifestation of SQL and it’s not a very good one for programming logic to operate on. Of course, I’m going to need text eventually, but if I could first represent SQL as an abstraction, I could handle much more complicated translations. + + + + + +Of course, the best data structure to operate on is a semantic SQL tree. So, ideally, I would have this entirely separate tree definition for SQL that I could translate LINQ query expressions into, but that would be a lot of work. Luckily, the definition of this ideal SQL tree would overlap a lot with LINQ trees, so I’m going to cheat and simply teach LINQ expression trees about SQL. To do this, I’ll add some new expression node types. It won’t matter if no other LINQ API understands them. I’ll just keep them to myself. + + + + + +internal enum DbExpressionType { + + + Table = 1000, // make sure these don't overlap with ExpressionType + + + Column, + + + Select, + + + Projection + + +} + + + + + +internal class TableExpression : Expression { + + + string alias; + + + string name; + + + internal TableExpression(Type type, string alias, string name) + + + : base((ExpressionType)DbExpressionType.Table, type) { + + + this.alias = alias; + + + this.name = name; + + + } + + + internal string Alias { + + + get { return this.alias; } + + + } + + + internal string Name { + + + get { return this.name; } + + + } + + +} + + + + + +internal class ColumnExpression : Expression { + + + string alias; + + + string name; + + + int ordinal; + + + internal ColumnExpression(Type type, string alias, string name, int ordinal) + + + : base((ExpressionType)DbExpressionType.Column, type) { + + + this.alias = alias; + + + this.name = name; + + + this.ordinal = ordinal; + + + } + + + internal string Alias { + + + get { return this.alias; } + + + } + + + internal string Name { + + + get { return this.name; } + + + } + + + internal int Ordinal { + + + get { return this.ordinal; } + + + } + + +} + + + + + +internal class ColumnDeclaration { + + + string name; + + + Expression expression; + + + internal ColumnDeclaration(string name, Expression expression) { + + + this.name = name; + + + this.expression = expression; + + + } + + + internal string Name { + + + get { return this.name; } + + + } + + + internal Expression Expression { + + + get { return this.expression; } + + + } + + +} + + + + + +internal class SelectExpression : Expression { + + + string alias; + + + ReadOnlyCollection columns; + + + Expression from; + + + Expression where; + + + internal SelectExpression(Type type, string alias, IEnumerable columns, Expression from, Expression where) + + + : base((ExpressionType)DbExpressionType.Select, type) { + + + this.alias = alias; + + + this.columns = columns as ReadOnlyCollection; + + + if (this.columns == null) { + + + this.columns = new List(columns).AsReadOnly(); + + + } + + + this.from = from; + + + this.where = where; + + + } + + + internal string Alias { + + + get { return this.alias; } + + + } + + + internal ReadOnlyCollection Columns { + + + get { return this.columns; } + + + } + + + internal Expression From { + + + get { return this.from; } + + + } + + + internal Expression Where { + + + get { return this.where; } + + + } + + +} + + + + + +internal class ProjectionExpression : Expression { + + + SelectExpression source; + + + Expression projector; + + + internal ProjectionExpression(SelectExpression source, Expression projector) + + + : base((ExpressionType)DbExpressionType.Projection, projector.Type) { + + + this.source = source; + + + 
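        // source: the SELECT query that supplies the columns;
        // projector: the expression that rebuilds result objects from those columns.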
this.projector = projector; + + + } + + + internal SelectExpression Source { + + + get { return this.source; } + + + } + + + internal Expression Projector { + + + get { return this.projector; } + + + } + + +} + + + + + +The only bits of SQL I really need to add to LINQ expression trees are the concepts of a SQL Select query that produces one or more columns, a reference to a column, a reference to a table and a projection that reassembles objects out of column references. + + + + + +I went ahead and defined my own DbExpressionType enum that ‘extends’ the base ExpressionType enum by picking a sufficiently large starting value to not collide with the other definitions. If there was such a way as to derive from an enum I would have done that, but this will work as long as I am diligent. + + + + + +Each of the new expression nodes follows the same pattern set by the LINQ expressions, being immutable, etc; except they are now modeling SQL concepts and not CLR concepts. Notice, how the SelectExpression contains a collection of columns, and both a from and where expression. That is because this expression node is meant to match what a legal SQL select statement would contain. + + + + + +The ProjectionExpression describes how to construct a result object out of the columns of a select expression. If you think about it, this is almost exactly the same job that the projection expression held in Part IV, the one that was used to build the delegate for the ProjectionReader. Only this time, it’s possible to reason about projection in terms of the SQL query and not just as a function that assembles objects out of DataReaders. + + + + + +Of course, now that I’ve got new nodes, I need a new visitor. The DbExpressionVisitor extends the ExpressionVisitor, adding the base visit pattern for the new nodes. 
+ + + + + +internal class DbExpressionVisitor : ExpressionVisitor { + + + protected override Expression Visit(Expression exp) { + + + if (exp == null) { + + + return null; + + + } + + + switch ((DbExpressionType)exp.NodeType) { + + + case DbExpressionType.Table: + + + return this.VisitTable((TableExpression)exp); + + + case DbExpressionType.Column: + + + return this.VisitColumn((ColumnExpression)exp); + + + case DbExpressionType.Select: + + + return this.VisitSelect((SelectExpression)exp); + + + case DbExpressionType.Projection: + + + return this.VisitProjection((ProjectionExpression)exp); + + + default: + + + return base.Visit(exp); + + + } + + + } + + + protected virtual Expression VisitTable(TableExpression table) { + + + return table; + + + } + + + protected virtual Expression VisitColumn(ColumnExpression column) { + + + return column; + + + } + + + protected virtual Expression VisitSelect(SelectExpression select) { + + + Expression from = this.VisitSource(select.From); + + + Expression where = this.Visit(select.Where); + + + ReadOnlyCollection columns = this.VisitColumnDeclarations(select.Columns); + + + if (from != select.From || where != select.Where || columns != select.Columns) { + + + return new SelectExpression(select.Type, select.Alias, columns, from, where); + + + } + + + return select; + + + } + + + protected virtual Expression VisitSource(Expression source) { + + + return this.Visit(source); + + + } + + + protected virtual Expression VisitProjection(ProjectionExpression proj) { + + + SelectExpression source = (SelectExpression)this.Visit(proj.Source); + + + Expression projector = this.Visit(proj.Projector); + + + if (source != proj.Source || projector != proj.Projector) { + + + return new ProjectionExpression(source, projector); + + + } + + + return proj; + + + } + + + protected ReadOnlyCollection VisitColumnDeclarations(ReadOnlyCollection columns) { + + + List alternate = null; + + + for (int i = 0, n = columns.Count; i < n; i++) { + + + ColumnDeclaration column = columns[i]; + + + Expression e = this.Visit(column.Expression); + + + if (alternate == null && e != column.Expression) { + + + alternate = columns.Take(i).ToList(); + + + } + + + if (alternate != null) { + + + alternate.Add(new ColumnDeclaration(column.Name, e)); + + + } + + + } + + + if (alternate != null) { + + + return alternate.AsReadOnly(); + + + } + + + return columns; + + + } + + +} + + + +That’s better. Now I feel like I’m really headed somewhere! + + + + + +The next step is to take a stick of dynamite and blow up the QueryTranslator. No more monolithic expression tree to string translator. What I need are individual pieces that handle separate tasks; one to bind the expression tree by figuring out what methods like Queryable.Select mean and another to convert the resulting tree into SQL text. Hopefully, by concocting this LINQ/SQL hybrid tree I’ll be able to figure out the member access mess. + + + + + +Here’s the code for the new QueryBinder class. 
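Before diving into the full listing, it may help to see the core substitution trick in isolation. The binder keeps a map from each lambda parameter to the projector expression of the source it ranges over, and swaps that projector in wherever the parameter appears. The little visitor below is only an illustrative sketch of that one idea; the name ParameterSubstituter is invented here, and it uses the framework's public ExpressionVisitor (available in later framework versions) rather than the hand-rolled one from this series.

    using System.Linq.Expressions;

    // Illustration only: replace every occurrence of a lambda parameter with
    // another expression -- the same substitution QueryBinder performs through
    // its parameter-to-projector map in VisitParameter.
    class ParameterSubstituter : ExpressionVisitor {
        readonly ParameterExpression parameter;
        readonly Expression replacement;

        private ParameterSubstituter(ParameterExpression parameter, Expression replacement) {
            this.parameter = parameter;
            this.replacement = replacement;
        }

        internal static Expression Substitute(Expression body, ParameterExpression parameter, Expression replacement) {
            return new ParameterSubstituter(parameter, replacement).Visit(body);
        }

        protected override Expression VisitParameter(ParameterExpression node) {
            // Leave unrelated parameters alone; only the mapped one is replaced.
            return node == this.parameter ? this.replacement : node;
        }
    }

Substituting a table's projector for the parameter 'c' is what later lets member accesses like c.City be resolved against the projector's bindings. With that idea in mind, on to the full class.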
+ + + + + +internal class QueryBinder : ExpressionVisitor { + + + ColumnProjector columnProjector; + + + Dictionary map; + + + int aliasCount; + + + + + + internal QueryBinder() { + + + this.columnProjector = new ColumnProjector(this.CanBeColumn); + + + } + + + + + + private bool CanBeColumn(Expression expression) { + + + return expression.NodeType == (ExpressionType)DbExpressionType.Column; + + + } + + + + + + internal Expression Bind(Expression expression) { + + + this.map = new Dictionary(); + + + return this.Visit(expression); + + + } + + + + + + private static Expression StripQuotes(Expression e) { + + + while (e.NodeType == ExpressionType.Quote) { + + + e = ((UnaryExpression)e).Operand; + + + } + + + return e; + + + } + + + + + + private string GetNextAlias() { + + + return "t" + (aliasCount++); + + + } + + + + + + private ProjectedColumns ProjectColumns(Expression expression, string newAlias, string existingAlias) { + + + return this.columnProjector.ProjectColumns(expression, newAlias, existingAlias); + + + } + + + + + + protected override Expression VisitMethodCall(MethodCallExpression m) { + + + if (m.Method.DeclaringType == typeof(Queryable) || + + + m.Method.DeclaringType == typeof(Enumerable)) { + + + switch (m.Method.Name) { + + + case "Where": + + + return this.BindWhere(m.Type, m.Arguments[0], (LambdaExpression)StripQuotes(m.Arguments[1])); + + + case "Select": + + + return this.BindSelect(m.Type, m.Arguments[0], (LambdaExpression)StripQuotes(m.Arguments[1])); + + + } + + + throw new NotSupportedException(string.Format("The method '{0}' is not supported", m.Method.Name)); + + + } + + + return base.VisitMethodCall(m); + + + } + + + + + + private Expression BindWhere(Type resultType, Expression source, LambdaExpression predicate) { + + + ProjectionExpression projection = (ProjectionExpression)this.Visit(source); + + + this.map[predicate.Parameters[0]] = projection.Projector; + + + Expression where = this.Visit(predicate.Body); + + + string alias = this.GetNextAlias(); + + + ProjectedColumns pc = this.ProjectColumns(projection.Projector, alias, GetExistingAlias(projection.Source)); + + + return new ProjectionExpression( + + + new SelectExpression(resultType, alias, pc.Columns, projection.Source, where), + + + pc.Projector + + + ); + + + } + + + + + + private Expression BindSelect(Type resultType, Expression source, LambdaExpression selector) { + + + ProjectionExpression projection = (ProjectionExpression)this.Visit(source); + + + this.map[selector.Parameters[0]] = projection.Projector; + + + Expression expression = this.Visit(selector.Body); + + + string alias = this.GetNextAlias(); + + + ProjectedColumns pc = this.ProjectColumns(expression, alias, GetExistingAlias(projection.Source)); + + + return new ProjectionExpression( + + + new SelectExpression(resultType, alias, pc.Columns, projection.Source, null), + + + pc.Projector + + + ); + + + } + + + + + + private static string GetExistingAlias(Expression source) { + + + switch ((DbExpressionType)source.NodeType) { + + + case DbExpressionType.Select: + + + return ((SelectExpression)source).Alias; + + + case DbExpressionType.Table: + + + return ((TableExpression)source).Alias; + + + default: + + + throw new InvalidOperationException(string.Format("Invalid source node type '{0}'", source.NodeType)); + + + } + + + } + + + + + + private bool IsTable(object value) { + + + IQueryable q = value as IQueryable; + + + return q != null && q.Expression.NodeType == ExpressionType.Constant; + + + } + + + + + + private string GetTableName(object 
table) { + + + IQueryable tableQuery = (IQueryable)table; + + + Type rowType = tableQuery.ElementType; + + + return rowType.Name; + + + } + + + + + + private string GetColumnName(MemberInfo member) { + + + return member.Name; + + + } + + + + + + private Type GetColumnType(MemberInfo member) { + + + FieldInfo fi = member as FieldInfo; + + + if (fi != null) { + + + return fi.FieldType; + + + } + + + PropertyInfo pi = (PropertyInfo)member; + + + return pi.PropertyType; + + + } + + + + + + private IEnumerable GetMappedMembers(Type rowType) { + + + return rowType.GetFields().Cast(); + + + } + + + + + + private ProjectionExpression GetTableProjection(object value) { + + + IQueryable table = (IQueryable)value; + + + string tableAlias = this.GetNextAlias(); + + + string selectAlias = this.GetNextAlias(); + + + List bindings = new List(); + + + List columns = new List(); + + + foreach (MemberInfo mi in this.GetMappedMembers(table.ElementType)) { + + + string columnName = this.GetColumnName(mi); + + + Type columnType = this.GetColumnType(mi); + + + int ordinal = columns.Count; + + + bindings.Add(Expression.Bind(mi, new ColumnExpression(columnType, selectAlias, columnName, ordinal))); + + + columns.Add(new ColumnDeclaration(columnName, new ColumnExpression(columnType, tableAlias, columnName, ordinal))); + + + } + + + Expression projector = Expression.MemberInit(Expression.New(table.ElementType), bindings); + + + Type resultType = typeof(IEnumerable<>).MakeGenericType(table.ElementType); + + + return new ProjectionExpression( + + + new SelectExpression( + + + resultType, + + + selectAlias, + + + columns, + + + new TableExpression(resultType, tableAlias, this.GetTableName(table)), + + + null + + + ), + + + projector + + + ); + + + } + + + + + + protected override Expression VisitConstant(ConstantExpression c) { + + + if (this.IsTable(c.Value)) { + + + return GetTableProjection(c.Value); + + + } + + + return c; + + + } + + + + + + protected override Expression VisitParameter(ParameterExpression p) { + + + Expression e; + + + if (this.map.TryGetValue(p, out e)) { + + + return e; + + + } + + + return p; + + + } + + + + + + protected override Expression VisitMemberAccess(MemberExpression m) { + + + Expression source = this.Visit(m.Expression); + + + switch (source.NodeType) { + + + case ExpressionType.MemberInit: + + + MemberInitExpression min = (MemberInitExpression)source; + + + for (int i = 0, n = min.Bindings.Count; i < n; i++) { + + + MemberAssignment assign = min.Bindings[i] as MemberAssignment; + + + if (assign != null && MembersMatch(assign.Member, m.Member)) { + + + return assign.Expression; + + + } + + + } + + + break; + + + case ExpressionType.New: + + + NewExpression nex = (NewExpression)source; + + + if (nex.Members != null) { + + + for (int i = 0, n = nex.Members.Count; i < n; i++) { + + + if (MembersMatch(nex.Members[i], m.Member)) { + + + return nex.Arguments[i]; + + + } + + + } + + + } + + + break; + + + } + + + if (source == m.Expression) { + + + return m; + + + } + + + return MakeMemberAccess(source, m.Member); + + + } + + + + + + private bool MembersMatch(MemberInfo a, MemberInfo b) { + + + if (a == b) { + + + return true; + + + } + + + if (a is MethodInfo && b is PropertyInfo) { + + + return a == ((PropertyInfo)b).GetGetMethod(); + + + } + + + else if (a is PropertyInfo && b is MethodInfo) { + + + return ((PropertyInfo)a).GetGetMethod() == b; + + + } + + + return false; + + + } + + + + + + private Expression MakeMemberAccess(Expression source, MemberInfo mi) { + + + FieldInfo fi = mi 
as FieldInfo; + + + if (fi != null) { + + + return Expression.Field(source, fi); + + + } + + + PropertyInfo pi = (PropertyInfo)mi; + + + return Expression.Property(source, pi); + + + } + + +} + + + +One thing to notice is that there is a lot more going on here than in the QueryTranslator of yore. Translation for Where and Select have been factored out into separate methods. They don’t produce text anymore, but instances of ProjectionExpression and SelectExpression. The ColumnProjector looks like it is doing something much more complicated too. I haven’t shown the code for that yet, but it has changed as well. There’s all these little helper methods for understanding the meaning of a table or a column. This is the starting of a factoring that may well end with some kind of mapping system, but I’ll leave that for the future. + + + + + +A key method to examine is the GetTableProjection method that actually assembles a query (with both Select and Project expressions) that represent the default query for getting members out of the database table. No more “select *” here. The default table projection only represents the columns that I defined in my domain class declarations. + + + + + +Another thing to note is the change in the VisitMemberAccess method. I no longer only consider trivial accesses off parameter nodes. I actually attempt to resolve member access by looking up the meaning of a member and returning the sub-expression that a member translates into. + + + + + +This is how it works. When I translate a ‘table’ constant into a table projection (via GetTableProjection), I include a projector expression that describes how to construct an object out of the table’s columns. When I get to a Select or Where method I add a mapping from the parameter expression declared by the LambdaExpression argument to the projector of the ‘previous’ portion of the query. For the first Where or Select that’s just the projector from the table projection. So, later when I see a parameter expression in VisitParameter I substitute the entire previous projector expression. It’s okay to do this, the nodes are immutable and so I can include them many times in the tree. Finally, when I get to the member access node I’ve already turned the parameter into its semantic equivalent. This expression is likely a new or member-init expression node, and so I merely do a lookup in this structure to find the expression that the member access should be replaced with. Often, this simply finds a ColumnExpression node defined by the table projection. It could, however, find another new or member-init expression from a previous Select operation that produced a hierarchy. If it did, then a subsequent member access operation would look inside this expression, and so on. + + + + + +Whew! That’s a lot to take in. Unfortunately, I am not done. There’s also this ColumnProjector class that is a whole lot different than before. Take a look. 
+ + + + + + internal sealed class ProjectedColumns { + + + Expression projector; + + + ReadOnlyCollection columns; + + + internal ProjectedColumns(Expression projector, ReadOnlyCollection columns) { + + + this.projector = projector; + + + this.columns = columns; + + + } + + + internal Expression Projector { + + + get { return this.projector; } + + + } + + + internal ReadOnlyCollection Columns { + + + get { return this.columns; } + + + } + + + } + + + + + + internal class ColumnProjector : DbExpressionVisitor { + + + Nominator nominator; + + + Dictionary map; + + + List columns; + + + HashSet columnNames; + + + HashSet candidates; + + + string existingAlias; + + + string newAlias; + + + int iColumn; + + + + + + internal ColumnProjector(Func fnCanBeColumn) { + + + this.nominator = new Nominator(fnCanBeColumn); + + + } + + + + + + internal ProjectedColumns ProjectColumns(Expression expression, string newAlias, string existingAlias) { + + + this.map = new Dictionary(); + + + this.columns = new List(); + + + this.columnNames = new HashSet(); + + + this.newAlias = newAlias; + + + this.existingAlias = existingAlias; + + + this.candidates = this.nominator.Nominate(expression); + + + return new ProjectedColumns(this.Visit(expression), this.columns.AsReadOnly()); + + + } + + + + + + protected override Expression Visit(Expression expression) { + + + if (this.candidates.Contains(expression)) { + + + if (expression.NodeType == (ExpressionType)DbExpressionType.Column) { + + + ColumnExpression column = (ColumnExpression)expression; + + + ColumnExpression mapped; + + + if (this.map.TryGetValue(column, out mapped)) { + + + return mapped; + + + } + + + if (this.existingAlias == column.Alias) { + + + int ordinal = this.columns.Count; + + + string columnName = this.GetUniqueColumnName(column.Name); + + + this.columns.Add(new ColumnDeclaration(columnName, column)); + + + mapped = new ColumnExpression(column.Type, this.newAlias, columnName, ordinal); + + + this.map[column] = mapped; + + + this.columnNames.Add(columnName); + + + return mapped; + + + } + + + // must be referring to outer scope + + + return column; + + + } + + + else { + + + string columnName = this.GetNextColumnName(); + + + int ordinal = this.columns.Count; + + + this.columns.Add(new ColumnDeclaration(columnName, expression)); + + + return new ColumnExpression(expression.Type, this.newAlias, columnName, ordinal); + + + } + + + } + + + else { + + + return base.Visit(expression); + + + } + + + } + + + + + + private bool IsColumnNameInUse(string name) { + + + return this.columnNames.Contains(name); + + + } + + + + + + private string GetUniqueColumnName(string name) { + + + string baseName = name; + + + int suffix = 1; + + + while (this.IsColumnNameInUse(name)) { + + + name = baseName + (suffix++); + + + } + + + return name; + + + } + + + + + + private string GetNextColumnName() { + + + return this.GetUniqueColumnName("c" + (iColumn++)); + + + } + + + + + + class Nominator : DbExpressionVisitor { + + + Func fnCanBeColumn; + + + bool isBlocked; + + + HashSet candidates; + + + + + + internal Nominator(Func fnCanBeColumn) { + + + this.fnCanBeColumn = fnCanBeColumn; + + + } + + + + + + internal HashSet Nominate(Expression expression) { + + + this.candidates = new HashSet(); + + + this.isBlocked = false; + + + this.Visit(expression); + + + return this.candidates; + + + } + + + + + + protected override Expression Visit(Expression expression) { + + + if (expression != null) { + + + bool saveIsBlocked = this.isBlocked; + + + this.isBlocked = false; + + + 
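                // At this point isBlocked has been saved and reset, so it only
                // reflects the children visited by the call below. If any child
                // cannot be a column, it sets isBlocked, which keeps this node
                // from being nominated; the '|=' at the end then propagates the
                // block upward, so only maximal column-able sub-trees survive.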
base.Visit(expression); + + + if (!this.isBlocked) { + + + if (this.fnCanBeColumn(expression)) { + + + this.candidates.Add(expression); + + + } + + + else { + + + this.isBlocked = true; + + + } + + + } + + + this.isBlocked |= saveIsBlocked; + + + } + + + return expression; + + + } + + + } + + + } + +The ColumnProjector is no longer assembling text for the select command, nor is it rewriting the selector expression into a function that constructs an object from a DataReader. However, it is doing something that is almost the same thing. It is assembling a list of ColumnDeclaration objects that are going to be used to construct a SelectExpression node, and it is rewriting the selector expression into a projector expression that references the columns assembled in the list. + + + + + +So how does it work? I’ve probably over engineered this class for what I’m using it for right now, but it will come in handy in the future so I’ll leave it as is. Before I get into how it works, let’s think about what it needs to do. + + + + + +Given some selector expression, I really need to figure out which parts of the expression should correspond to column declarations of a SQL select statement. These could be as simple as identifying the column references (ColumnExpression’s) left over in the tree after binding. Of course, that would mean the expression ‘a + b’ would turn into two column declarations, one for ‘a’ and one for ‘b’, leaving the ‘+’ operation residing in the newly rebuilt projector expression. That would certainly work, but wouldn’t it make sense to be able to put the entire ‘a + b’ expression into the column? That way it would be computed by SQL server instead of during construction of the results. If a Where operation, appearing after the Select operation, were to reference this expression it would have to evaluate it on the server anyway. Ignoring for the moment that I have not even allowed for ‘+’ operations to be translated, you can start to see that the problem of figuring out what should be in a column and what should be in the projection is a problem akin to the problem of figuring out how to pre-evaluate isolated sub-trees. + + + + + +The Evaluator class did this work using a two pass system of first nominating nodes that could be evaluated locally and then a second pass to pick the first nominated nodes from the top down, thus achieving evaluation of ‘maximal’ sub-trees. Figuring out what could or should be in a column declaration is just same kind of problem, only with a possibly different rule about what should be included or not. I want to pluck off the maximal sub-trees of nodes that can legally be placed in columns. Instead of evaluating these trees I just want to make them part of a SelectExpression and rewrite the remaining selector tree into a projector that references the new columns. + + + + + +You will see as you examine the code there is one additional complication that did not exist in the Evaluator. I may need to invent names for these column declarations if they truly are based on more complex sub expressions. + + + + + +Okay, now that I’ve built hybrid expression trees, bound them and gave them a nice spanking, I still need to get text out or the whole process is for naught. So I took the text generating code from the QueryTranslator and built a new QueryFormatter class whose sole responsibility is to generate the text from an expression tree. No more needing to actually build the tree at the same time. 
+ + + + + +internal class QueryFormatter : DbExpressionVisitor { + + + StringBuilder sb; + + + int indent = 2; + + + int depth; + + + + + + internal QueryFormatter() { + + + } + + + + + + internal string Format(Expression expression) { + + + this.sb = new StringBuilder(); + + + this.Visit(expression); + + + return this.sb.ToString(); + + + } + + + + + + protected enum Identation { + + + Same, + + + Inner, + + + Outer + + + } + + + + + + internal int IdentationWidth { + + + get { return this.indent; } + + + set { this.indent = value; } + + + } + + + + + + private void AppendNewLine(Identation style) { + + + sb.AppendLine(); + + + if (style == Identation.Inner) { + + + this.depth++; + + + } + + + else if (style == Identation.Outer) { + + + this.depth--; + + + System.Diagnostics.Debug.Assert(this.depth >= 0); + + + } + + + for (int i = 0, n = this.depth * this.indent; i < n; i++) { + + + sb.Append(" "); + + + } + + + } + + + + + + protected override Expression VisitMethodCall(MethodCallExpression m) { + + + throw new NotSupportedException(string.Format("The method '{0}' is not supported", m.Method.Name)); + + + } + + + + + + protected override Expression VisitUnary(UnaryExpression u) { + + + switch (u.NodeType) { + + + case ExpressionType.Not: + + + sb.Append(" NOT "); + + + this.Visit(u.Operand); + + + break; + + + default: + + + throw new NotSupportedException(string.Format("The unary operator '{0}' is not supported", u.NodeType)); + + + } + + + return u; + + + } + + + + + + protected override Expression VisitBinary(BinaryExpression b) { + + + sb.Append("("); + + + this.Visit(b.Left); + + + switch (b.NodeType) { + + + case ExpressionType.And: + + + sb.Append(" AND "); + + + break; + + + case ExpressionType.Or: + + + sb.Append(" OR"); + + + break; + + + case ExpressionType.Equal: + + + sb.Append(" = "); + + + break; + + + case ExpressionType.NotEqual: + + + sb.Append(" <> "); + + + break; + + + case ExpressionType.LessThan: + + + sb.Append(" < "); + + + break; + + + case ExpressionType.LessThanOrEqual: + + + sb.Append(" <= "); + + + break; + + + case ExpressionType.GreaterThan: + + + sb.Append(" > "); + + + break; + + + case ExpressionType.GreaterThanOrEqual: + + + sb.Append(" >= "); + + + break; + + + default: + + + throw new NotSupportedException(string.Format("The binary operator '{0}' is not supported", b.NodeType)); + + + } + + + this.Visit(b.Right); + + + sb.Append(")"); + + + return b; + + + } + + + + + + protected override Expression VisitConstant(ConstantExpression c) { + + + if (c.Value == null) { + + + sb.Append("NULL"); + + + } + + + else { + + + switch (Type.GetTypeCode(c.Value.GetType())) { + + + case TypeCode.Boolean: + + + sb.Append(((bool)c.Value) ? 
1 : 0); + + + break; + + + case TypeCode.String: + + + sb.Append("'"); + + + sb.Append(c.Value); + + + sb.Append("'"); + + + break; + + + case TypeCode.Object: + + + throw new NotSupportedException(string.Format("The constant for '{0}' is not supported", c.Value)); + + + default: + + + sb.Append(c.Value); + + + break; + + + } + + + } + + + return c; + + + } + + + + + + protected override Expression VisitColumn(ColumnExpression column) { + + + if (!string.IsNullOrEmpty(column.Alias)) { + + + sb.Append(column.Alias); + + + sb.Append("."); + + + } + + + sb.Append(column.Name); + + + return column; + + + } + + + + + + protected override Expression VisitSelect(SelectExpression select) { + + + sb.Append("SELECT "); + + + for (int i = 0, n = select.Columns.Count; i < n; i++) { + + + ColumnDeclaration column = select.Columns[i]; + + + if (i > 0) { + + + sb.Append(", "); + + + } + + + ColumnExpression c = this.Visit(column.Expression) as ColumnExpression; + + + if (c == null || c.Name != select.Columns[i].Name) { + + + sb.Append(" AS "); + + + sb.Append(column.Name); + + + } + + + } + + + if (select.From != null) { + + + this.AppendNewLine(Identation.Same); + + + sb.Append("FROM "); + + + this.VisitSource(select.From); + + + } + + + if (select.Where != null) { + + + this.AppendNewLine(Identation.Same); + + + sb.Append("WHERE "); + + + this.Visit(select.Where); + + + } + + + return select; + + + } + + + + + + protected override Expression VisitSource(Expression source) { + + + switch ((DbExpressionType)source.NodeType) { + + + case DbExpressionType.Table: + + + TableExpression table = (TableExpression)source; + + + sb.Append(table.Name); + + + sb.Append(" AS "); + + + sb.Append(table.Alias); + + + break; + + + case DbExpressionType.Select: + + + SelectExpression select = (SelectExpression)source; + + + sb.Append("("); + + + this.AppendNewLine(Identation.Inner); + + + this.Visit(select); + + + this.AppendNewLine(Identation.Outer); + + + sb.Append(")"); + + + sb.Append(" AS "); + + + sb.Append(select.Alias); + + + break; + + + default: + + + throw new InvalidOperationException("Select source is not valid type"); + + + } + + + return source; + + + } + + +} + + + + + +In addition to adding logic to write out the new SelectExpression node, I’ve advanced the formatting logic to include (gasp) new-lines and indentation. Now isn’t that special? + + + + + +Of course, I also have to end up with a LambdaExpression that builds the result objects. I used to get that out of the ColumnProjector class, but now it’s generating these semantic SQL projections, not at all what I need to do the real heavy-lifting of making actual objects. So I need to transform it again. I built a little class called ProjectionBuilder to do that. 
+ + + + + +internal class ProjectionBuilder : DbExpressionVisitor { + + + ParameterExpression row; + + + private static MethodInfo miGetValue; + + + + + + internal ProjectionBuilder() { + + + if (miGetValue == null) { + + + miGetValue = typeof(ProjectionRow).GetMethod("GetValue"); + + + } + + + } + + + + + + internal LambdaExpression Build(Expression expression) { + + + this.row = Expression.Parameter(typeof(ProjectionRow), "row"); + + + Expression body = this.Visit(expression); + + + return Expression.Lambda(body, this.row); + + + } + + + + + + protected override Expression VisitColumn(ColumnExpression column) { + + + return Expression.Convert(Expression.Call(this.row, miGetValue, Expression.Constant(column.Ordinal)), column.Type); + + + } + + +} + + + + + +This class simply does what the ColumnProjector used to do, only it now benefits from the better binding logic introduced by the QueryBinder, so it empirically knows which nodes to translate into data reading expressions. + + + + + +Luckily, I don’t have to rewrite the ProjectionReader. It still works the same as before. What I do get to do is get rid of the ObjectReader, since I now always have a projector expression. I build one in the QueryBinder automatically every time I see a table. + + + + + +That just leaves the putting it all together step. Here’s the new rewrite to DbQueryProvider. + + + + + +public class DbQueryProvider : QueryProvider { + + + DbConnection connection; + + + + + + public DbQueryProvider(DbConnection connection) { + + + this.connection = connection; + + + } + + + + + + public override string GetQueryText(Expression expression) { + + + return this.Translate(expression).CommandText; + + + } + + + + + + public override object Execute(Expression expression) { + + + TranslateResult result = this.Translate(expression); + + + Delegate projector = result.Projector.Compile(); + + + + + + DbCommand cmd = this.connection.CreateCommand(); + + + cmd.CommandText = result.CommandText; + + + DbDataReader reader = cmd.ExecuteReader(); + + + + + + Type elementType = TypeSystem.GetElementType(expression.Type); + + + return Activator.CreateInstance( + + + typeof(ProjectionReader<>).MakeGenericType(elementType), + + + BindingFlags.Instance | BindingFlags.NonPublic, null, + + + new object[] { reader, projector }, + + + null + + + ); + + + } + + + + + + internal class TranslateResult { + + + internal string CommandText; + + + internal LambdaExpression Projector; + + + } + + + + + + private TranslateResult Translate(Expression expression) { + + + expression = Evaluator.PartialEval(expression); + + + ProjectionExpression proj = (ProjectionExpression)new QueryBinder().Bind(expression); + + + string commandText = new QueryFormatter().Format(proj.Source); + + + LambdaExpression projector = new ProjectionBuilder().Build(proj.Projector); + + + return new TranslateResult { CommandText = commandText, Projector = projector }; + + + } + + +} + + + + + +It’s not too different from before. The Translate method has multiple steps, invoking the additional visitors, and the Execute method no longer has to build ObjectReader if the projector does not exist. It always exists. 
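To make the end-to-end flow a little more concrete, here is roughly the kind of delegate that falls out of Compile for a simple projection. This is only an illustrative sketch: the ProjectorSketch name is invented, the ordinals depend entirely on the column list the binder happened to produce, and the real code builds the lambda with ProjectionBuilder rather than writing it by hand. It assumes the ProjectionRow abstraction introduced earlier in the series.

    using System;

    static class ProjectorSketch {
        // Roughly what the compiled projector does for
        //     c => new { Name = c.ContactName, Location = c.City }
        // once the binder has reduced those member accesses to columns 0 and 1.
        internal static Func<ProjectionRow, object> Example() {
            return row => new {
                Name = (string)row.GetValue(0),      // ContactName column
                Location = (string)row.GetValue(1)   // City column
            };
        }
    }
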
+ + + + + +So, now if I write the following query: + + + + + +var query = db.Customers.Select(c => new { + + + Name = c.ContactName, + + + Location = new { + City = c.City, + Country = c.Country + } + }) + .Where(x => x.Location.City == city); + + + +It runs successfully producing the following output: + + + + + +Query: + + +SELECT t2.ContactName, t2.City, t2.Country + + +FROM ( + + + SELECT t1.ContactName, t1.City, t1.Country + + + FROM ( + + + SELECT t0.ContactName, t0.City, t0.Country, t0.CustomerID, t0.Phone + + + FROM Customers AS t0 + + + ) AS t1 + + +) AS t2 + + +WHERE (t2.City = 'London') + + + + + +{ Name = Thomas Hardy, Location = { City = London, Country = UK } } + + +{ Name = Victoria Ashworth, Location = { City = London, Country = UK } } + + +{ Name = Elizabeth Brown, Location = { City = London, Country = UK } } + + +{ Name = Ann Devon, Location = { City = London, Country = UK } } + + +{ Name = Simon Crowther, Location = { City = London, Country = UK } } + + +{ Name = Hari Kumar, Location = { City = London, Country = UK } } + + + + + + +A better looking query, a better looking result, and it works no matter how many Select’s or Where’s I add, no matter how complex I make each projection. + + + + + + + + + + + +At least I’ll let you think that until I point out the next gaping hole. J + + + + + +Until next time! + diff --git a/docs/blog/building-part-06.md b/docs/blog/building-part-06.md new file mode 100644 index 0000000..571d872 --- /dev/null +++ b/docs/blog/building-part-06.md @@ -0,0 +1,727 @@ +#LINQ: Building an IQueryable Provider – Part VI: Nested queries + +Matt Warren - MSFT; August 9, 2007 + +--- + +This is the sixth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you might want to file for a government grant and take sabbatical because you've got a lot of catching up to do. ?? + + +Complete list of posts in the Building an IQueryable Provider series + + +So, again you thought I was done with this series, that I've given up and moved on to greener pastures. You think that since Select works wonderfully that that's all you need to know to make your own IQueryable provider? Ha! There's loads more to know. And, by the way, Select is still broken. + + + + + +Finishing Select + +Broken? How is this possible? I thought you were this bigshot Microsoft uber developer that never made mistakes! Yet, you claim you gave us shoddy code? I cut and pasted that stuff already into my product and my boss says we 'go live' next Monday! How could you do this to me? Sniff. + + +Relax. It's not that broken. It's just a little bit broken. + + +Recall in the last post I invented these new expression nodes; Table, Column, Select and Projection. They worked great didn't they? Sure they did! The part that is broken is that I did not handle all the cases where they might appear. The case I handled was the most obvious, when Projection nodes appear on the top of the query tree. After all, since I'm only allowing for Select and Where anyway, the last operation must be one of those. The code actually assumed that to be true. + + +That's not the problem. + + +The problem is Projection nodes may also appear inside selector expressions themselves. For example, take a look at the following. + + + var query = from c in db.Customers + + + select new { + + + Name = c.ContactName, + + + Orders = from o in db.Orders + + + where o.CustomerID == c.CustomerID + + + select o + + + }; + + +I added a nested query in the select expression. 
This is very different than the queries I wrote before, which were only tabular. Now, I'm basically asking my provider to construct a hierarchy: each object is going to have a name and a collection of orders. How in the world am I going to do that? SQL can't even do that. And even if I wanted to disallow it outright, what happens if someone does write this?


Bang! I get an exception. Not the one I thought though. I guess the code is more buggy than I realized. I expected to get an exception when trying to compile the projector function, since this lovely query should leave a little ProjectionExpression fly in my selection expression soup. Didn't I claim that it was okay to invent my own expression nodes because no one was going to see them anyway? Ha! Looks like I was wrong. (The actual exception I get is from assembling a bad expression tree because I was mistyping the Projection nodes when I constructed them. That's going to have to be fixed.)


Assuming I fix the typing bug, what am I going to do about these nested projection nodes? I could just catch them and throw my own exception with an apologetic disclaimer that nesting is a no-no. But then I wouldn't be a good LINQ citizen, and I wouldn't have the fun of figuring out how to actually make it work.


So on to the good stuff.


Nested Queries

What I want to do when I see a nested ProjectionExpression is turn it into a nested query. SQL cannot actually do this, so I'm going to have to do it or something like it in my own code. However, I'm not going to shoot for a super advanced solution here, just one that actually retrieves the data.


Since the projector function has to be turned into executable code, I'm going to have to swap some piece of code into the expression where this ProjectionExpression node currently is. It's got to be something that constructs the Orders collection out of something. It can't come from the current DataReader since that guy only holds tabular results. Therefore it's got to come from another DataReader. What I really need is something that turns a ProjectionExpression into a function that, when executed, returns this collection.


Now where have I seen something like that before?


Thinking...


Right. That's what my provider already does, more or less. Whew, I thought this was going to be difficult. My provider already converts an expression tree into a result sequence via the Execute method. I guess I'm already halfway home!


So what I need to do is add a function to my good ol' ProjectionRow class that executes a nested query. It can figure out how to get back to the provider for me in order to do the actual work.


Here's the new code for ProjectionRow and ProjectionBuilder.
+ + public abstract class ProjectionRow { + + public abstract object GetValue(int index); + + public abstract IEnumerable ExecuteSubQuery(LambdaExpression query); + + } + + internal class ProjectionBuilder : DbExpressionVisitor { + + ParameterExpression row; + + string rowAlias; + + static MethodInfo miGetValue; + + static MethodInfo miExecuteSubQuery; + + + internal ProjectionBuilder() { + + if (miGetValue == null) { + + miGetValue = typeof(ProjectionRow).GetMethod("GetValue"); + + miExecuteSubQuery = typeof(ProjectionRow).GetMethod("ExecuteSubQuery"); + + } + + } + + + internal LambdaExpression Build(Expression expression, string alias) { + + this.row = Expression.Parameter(typeof(ProjectionRow), "row"); + + this.rowAlias = alias; + + Expression body = this.Visit(expression); + + return Expression.Lambda(body, this.row); + + } + + + protected override Expression VisitColumn(ColumnExpression column) { + + if (column.Alias == this.rowAlias) { + + return Expression.Convert(Expression.Call(this.row, miGetValue, Expression.Constant(column.Ordinal)), column.Type); + + } + + return column; + + } + + + protected override Expression VisitProjection(ProjectionExpression proj) { + + LambdaExpression subQuery = Expression.Lambda(base.VisitProjection(proj), this.row); + + Type elementType = TypeSystem.GetElementType(subQuery.Body.Type); + + MethodInfo mi = miExecuteSubQuery.MakeGenericMethod(elementType); + + return Expression.Convert( + + Expression.Call(this.row, mi, Expression.Constant(subQuery)), + + proj.Type + + ); + + } + + } + + + +So, just like I inject code to call GetValue when I see a ColumnExpression, I'm going to inject code to call ExecuteSubQuery when I see a ProjectionExpression. + + +I decided I needed to bundle up the projection and the parameter I was using to refer to my ProjectionRow, because as it turns out the ProjectionExpression also gets its ColumnExpressions converted. Luckily, there was already a class designed to do that, LambdaExpression, so I used it as the argument type for ExecuteSubQuery. + + +Also notice how I pass the subquery as a ConstantExpression. This is how I trick the Expression.Compile feature into not noticing that I've invented new nodes. Fortunately, I didn't want them to be compiled just yet anyway. + + +Next to take a look at is the changed ProjectionReader. Of course, the Enumerator now implements ExecuteSubQuery for me. 
+ + internal class ProjectionReader : IEnumerable, IEnumerable { + + Enumerator enumerator; + + internal ProjectionReader(DbDataReader reader, Func projector, IQueryProvider provider) { + + this.enumerator = new Enumerator(reader, projector, provider); + + } + + + public IEnumerator GetEnumerator() { + + Enumerator e = this.enumerator; + + if (e == null) { + + throw new InvalidOperationException("Cannot enumerate more than once"); + + } + + this.enumerator = null; + + return e; + + } + + + IEnumerator IEnumerable.GetEnumerator() { + + return this.GetEnumerator(); + + } + + + class Enumerator : ProjectionRow, IEnumerator, IEnumerator, IDisposable { + + DbDataReader reader; + + T current; + + Func projector; + + IQueryProvider provider; + + + internal Enumerator(DbDataReader reader, Func projector, IQueryProvider provider) { + + this.reader = reader; + + this.projector = projector; + + this.provider = provider; + + } + + + public override object GetValue(int index) { + + if (index >= 0) { + + if (this.reader.IsDBNull(index)) { + + return null; + + } + + else { + + return this.reader.GetValue(index); + + } + + } + + throw new IndexOutOfRangeException(); + + } + + + public override IEnumerable ExecuteSubQuery(LambdaExpression query) { + + ProjectionExpression projection = (ProjectionExpression) new Replacer().Replace(query.Body, query.Parameters[0], Expression.Constant(this)); + + projection = (ProjectionExpression) Evaluator.PartialEval(projection, CanEvaluateLocally); + + IEnumerable result = (IEnumerable)this.provider.Execute(projection); + + List list = new List(result); + + if (typeof(IQueryable).IsAssignableFrom(query.Body.Type)) { + + return list.AsQueryable(); + + } + + return list; + + } + + + private static bool CanEvaluateLocally(Expression expression) { + + if (expression.NodeType == ExpressionType.Parameter || + + expression.NodeType.IsDbExpression()) { + + return false; + + } + + return true; + + } + + + public T Current { + + get { return this.current; } + + } + + + object IEnumerator.Current { + + get { return this.current; } + + } + + + public bool MoveNext() { + + if (this.reader.Read()) { + + this.current = this.projector(this); + + return true; + + } + + return false; + + } + + + public void Reset() { + + } + + + public void Dispose() { + + this.reader.Dispose(); + + } + + } + + } + + + +Now, you can see that when I construct the ProjectionReader I pass the instance of my provider here. I'm going to use that to execute the subquery down in the ExecuteSubQuery function. + + +Looking at ExectueSubQuery... Hey, what is that Replacer.Replace thing? + + +I haven't shown you that bit of magic yet. I will in just a moment. Let me explain what is going on in this method first. I've got the argument that is a LambdaExpression that holds onto the original ProjectionExpression in the body and the parameter that was used to reference the current ProjectionRow. That's all fine and dandy though. The problem I have is that I can't just execute the projection expression via a call back to my provider because all of the ColumnExpressions that used to reference the outer query (think join condition in the Where clause) are now GetValue expressions. + + +That's right, I've got references to the outer query in my sub query; chocolate in my peanut butter. I can't leave those particular calls to GetValue in the projection, because they would be trying to access columns that don't exist in the new query. What a dilema, Charlie Brown. + + +Thinking... + + +Aha! I've got it. 
Fortunately, all the data for those GetValue calls is readily available. It's sitting in the DataReader one object reference away from the code in ExecuteSubQuery. The data is available in the current row. So what I want to do is somehow 'evaluate' those little bits of expressions right here and now and force those sub expressions to call those GetValue methods. I wish I had code that could do that. That would be perfect. + + +Wait, isn't that what Evaluator.PartialEval does? Sure, but that won't work. Why? Because those silly little expressions have references to my ProjectionRow parameter, and ParameterExpressions are the rule that tell it not to eval the expression. If I could only get rid of those silly parameter references and instead have constants that point to my current running instance of ProjectionRow, then I could use Evaluator.PartialEval to turn those expressions into values! Life would be good. + + +How to do that? I need a tool that will search my expression tree and replace some nodes with other nodes. Yah, that's the ticket. + + +Here's something, I call it Replacer. It simply walks the tree looking for references to one node instance and swapping it for references to a different node. + + internal class Replacer : DbExpressionVisitor { + + Expression searchFor; + + Expression replaceWith; + + internal Expression Replace(Expression expression, Expression searchFor, Expression replaceWith) { + + this.searchFor = searchFor; + + this.replaceWith = replaceWith; + + return this.Visit(expression); + + } + + protected override Expression Visit(Expression exp) { + + if (exp == this.searchFor) { + + return this.replaceWith; + + } + + return base.Visit(exp); + + } + + } + + +Beautiful! Sometimes I amaze even myself. + + +Okay, great, now I can swap out those nasty references to the ProjectionRow parameter with the real honest-to-goodness instance. That's what the first line in ExecuteSubQuery does. And it only took a few dozen lines of English to explain it. ?? + + +The second line calls Evaluate.PartialEval. Just what I wanted. The line after that calls the provider to execute! Hurray! Then I throw the results into a List object. Finally, I recognize that I might have to turn the result back into an IQueryable. Weird, I know, but the type of the 'Orders' property in the original query was IQueryable because that's how IQueryable query operators work, so C# invented the anonymous type using that for the member type. If I try to just return the list, the projector that combines the results together will be none-too-pleased. Fortunately, I already have a facility to turn IEnumerable's into IQueryables; Queryable.AsQueryable. + + +Wow! It's almost as if someone designed this stuff to work together. + + +Full disclosure: I did cheat a little bit. I had to modify the Evaluator class. I had to get it to understand my new expression types. I know, I know, I said no one else needed to know about them, but it is my code too, so I guess that's alright. I'll save that one-liner for you to view in the attached zip file. I only post mega-long code snippets, not measly one-liners. + + +I also had to invent a new CanEvaluateLocally rule for Evaluator to use. I needed to make sure that it would never think it was possible to evaluate one of my new nodes locally. 
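For reference, the CanEvaluateLocally rule shown above leans on an IsDbExpression check that isn't spelled out in the post. A minimal sketch of what such an extension method could look like, assuming it simply keys off the 1000 offset that DbExpressionType starts at (the class name here is made up):

    using System.Linq.Expressions;

    internal static class DbExpressionTypeExtensions {
        // The custom node types were deliberately numbered from 1000 so they
        // can never collide with the built-in ExpressionType values, which
        // makes the test a simple range check.
        internal static bool IsDbExpression(this ExpressionType nodeType) {
            return (int)nodeType >= 1000;
        }
    }
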
+ + +So now let's take a look on how DbQueryProvider changed + + public class DbQueryProvider : QueryProvider { + + DbConnection connection; + + TextWriter log; + + public DbQueryProvider(DbConnection connection) { + + this.connection = connection; + + } + + + public TextWriter Log { + + get { return this.log; } + + set { this.log = value; } + + } + + + public override string GetQueryText(Expression expression) { + + return this.Translate(expression).CommandText; + + } + + + public override object Execute(Expression expression) { + + return this.Execute(this.Translate(expression)); + + } + + + private object Execute(TranslateResult query) { + + Delegate projector = query.Projector.Compile(); + + + if (this.log != null) { + + this.log.WriteLine(query.CommandText); + + this.log.WriteLine(); + + } + + + DbCommand cmd = this.connection.CreateCommand(); + + cmd.CommandText = query.CommandText; + + DbDataReader reader = cmd.ExecuteReader(); + + + Type elementType = TypeSystem.GetElementType(query.Projector.Body.Type); + + return Activator.CreateInstance( + + typeof(ProjectionReader<>).MakeGenericType(elementType), + + BindingFlags.Instance | BindingFlags.NonPublic, null, + + new object[] { reader, projector, this }, + + null + + ); + + } + + + internal class TranslateResult { + + internal string CommandText; + + internal LambdaExpression Projector; + + } + + + private TranslateResult Translate(Expression expression) { + + ProjectionExpression projection = expression as ProjectionExpression; + + if (projection == null) { + + expression = Evaluator.PartialEval(expression); + + projection = (ProjectionExpression)new QueryBinder().Bind(expression); + + } + + string commandText = new QueryFormatter().Format(projection.Source); + + LambdaExpression projector = new ProjectionBuilder().Build(projection.Projector, projection.Source.Alias); + + return new TranslateResult { CommandText = commandText, Projector = projector }; + + } + + } + + + +The only thing that changed is my Translate method. It recognizes when it is handed a ProjectionExpression and chooses not do the work to turn an users query expression into a ProjectionExpression. Instead, it just skips down to the step that builds the command text and projection. + + +Did I forget to mention I added a 'Log' feature just like LINQ to SQL has. That will help us see what's going on. I added it here in my Context class too. + + public class Northwind { + + public Query Customers; + + public Query Orders; + + private DbQueryProvider provider; + + public Northwind(DbConnection connection) { + + this.provider = new DbQueryProvider(connection); + + this.Customers = new Query(this.provider); + + this.Orders = new Query(this.provider); + + } + + + public TextWriter Log { + + get { return this.provider.Log; } + + set { this.provider.Log = value; } + + } + + } + + + +Taking it for a Spin + +Now let's give this new magic mojo a spin. 
+ + string city = "London"; + + var query = from c in db.Customers + + where c.City == city + + select new { + + Name = c.ContactName, + + Orders = from o in db.Orders + + where o.CustomerID == c.CustomerID + + select o + + }; + + foreach (var item in query) { + + Console.WriteLine("{0}", item.Name); + + foreach (var ord in item.Orders) { + + Console.WriteLine("\tOrder: {0}", ord.OrderID); + + } + + } + + + + + + +Run this and it outputs the following: + + + + +Thomas Hardy + Order: 10355 + Order: 10383 + Order: 10453 + Order: 10558 + Order: 10707 + Order: 10741 + Order: 10743 + Order: 10768 + Order: 10793 + Order: 10864 + Order: 10920 + Order: 10953 + Order: 11016 +Victoria Ashworth + Order: 10289 + Order: 10471 + Order: 10484 + Order: 10538 + Order: 10539 + Order: 10578 + Order: 10599 + Order: 10943 + Order: 10947 + Order: 11023 +Elizabeth Brown + Order: 10435 + Order: 10462 + Order: 10848 +Ann Devon + Order: 10364 + Order: 10400 + Order: 10532 + Order: 10726 + Order: 10987 + Order: 11024 + Order: 11047 + Order: 11056 +Simon Crowther + Order: 10517 + Order: 10752 + Order: 11057 +Hari Kumar + Order: 10359 + Order: 10377 + Order: 10388 + Order: 10472 + Order: 10523 + Order: 10547 + Order: 10800 + Order: 10804 + Order: 10869 + + + +Here are the queries it executed: (I used the new .Log property to capture these) + + +SELECT t2.ContactName, t2.CustomerID +FROM ( + SELECT t1.CustomerID, t1.ContactName, t1.Phone, t1.City, t1.Country + FROM ( + SELECT t0.CustomerID, t0.ContactName, t0.Phone, t0.City, t0.Country + FROM Customers AS t0 + ) AS t1 + WHERE (t1.City = 'London') +) AS t2 + + +SELECT t4.OrderID, t4.CustomerID, t4.OrderDate +FROM ( + SELECT t3.OrderID, t3.CustomerID, t3.OrderDate + FROM Orders AS t3 +) AS t4 +WHERE (t4.CustomerID = 'AROUT') +SELECT t4.OrderID, t4.CustomerID, t4.OrderDate +FROM ( + SELECT t3.OrderID, t3.CustomerID, t3.OrderDate + FROM Orders AS t3 +) AS t4 +WHERE (t4.CustomerID = 'BSBEV') +SELECT t4.OrderID, t4.CustomerID, t4.OrderDate +FROM ( + SELECT t3.OrderID, t3.CustomerID, t3.OrderDate + FROM Orders AS t3 +) AS t4 +WHERE (t4.CustomerID = 'CONSH') +SELECT t4.OrderID, t4.CustomerID, t4.OrderDate +FROM ( + SELECT t3.OrderID, t3.CustomerID, t3.OrderDate + FROM Orders AS t3 +) AS t4 +WHERE (t4.CustomerID = 'EASTC') +SELECT t4.OrderID, t4.CustomerID, t4.OrderDate +FROM ( + SELECT t3.OrderID, t3.CustomerID, t3.OrderDate + FROM Orders AS t3 +) AS t4 +WHERE (t4.CustomerID = 'NORTS') +SELECT t4.OrderID, t4.CustomerID, t4.OrderDate +FROM ( + SELECT t3.OrderID, t3.CustomerID, t3.OrderDate + FROM Orders AS t3 +) AS t4 +WHERE (t4.CustomerID = 'SEVES') + + + + + +Okay, maybe lots of extra little queries is not ideal. Still, its better than throwing an exception! + + + + + +Now, finally, Select is done. It really can handle any projection. Maybe. \ No newline at end of file diff --git a/docs/blog/building-part-07.md b/docs/blog/building-part-07.md new file mode 100644 index 0000000..04b1662 --- /dev/null +++ b/docs/blog/building-part-07.md @@ -0,0 +1,698 @@ +# LINQ: Building an IQueryable provider – Part VII: Join and SelectMany + +Matt Warren - MSFT; September 4, 2007 + +--- + + +This is the seventh in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you might want to rethink your place in the universe. ?? +Complete list of posts in the Building an IQueryable Provider series +There's been a few weeks hiatus since the last installment. I hope that most of you have used the free time to explore building your own providers. 
I keep hearing about other people's LINQ to watchamacallit projects and it is encouraging. Today I want to show you how I went about adding 'join' capability to my provider. Finally, it will be able to do something more interesting than just select and where. + + +Implementing Join + +There are actually a couple different ways to represent a join using LINQ. In C# or VB if I use more than one 'from' clause I am performing a cross product, and if I match keys from one with keys from the other I am performing a join. + +var query = from c in db.Customers + + from o in db.Orders + + where c.CustomerID == o.CustomerID + + select new { c.ContactName, o.OrderDate }; + + +Of course, there is also the 'join' clause that is an explicit join. + +var query = from c in db.Customers + + join o in db.Orders on c.CustomerID equals o.CustomerID + + select new { c.ContactName, o.OrderDate }; + + +Both of these queries produce the same results. So why are there two ways to do the same thing? + + +The answer to that is a bit involved, but I'll try to make a stab at it here. The explicit join requires me to specify two key expressions to match; in database parlance this is known as an equi-join. The nested from clauses allow more flexibility. The reason the explicit join is so restrictive is that by being restrictive the LINQ to Objects implementation can be efficient in execution without needing to analyze and rewrite the query. The good news here is that almost all joins used in database queries are equi-joins. + + +Also, by not being as expressive the explicit join is much simpler to implement. In this post I'll actually implement both, but I'll start with the explicit join because it has fewer pitfalls. + + +The definition of the Queryable Join method looks like this: + +public static IQueryable Join( + this IQueryable outer, + IEnumerable inner, + Expression> outerKeySelector, + Expression> innerKeySelector, + Expression> resultSelector + ) + +That's a lot of arguments and a lot of generics! Fortunately, its not as hard to understand as it looks. The 'inner' and 'outer' parameters are referring to input sequences (the sequences on both sides of the join); each have their own key selector (the expressions that appear in the 'on' clause on opposite sides of the 'equals'); and finally an expression that is used to produce a result of the join. This last 'resultSelector' might be a bit confusing since it does not seem to appear in the C# or VB syntax. In fact, it actually does. In my example above it is the select expression. In other examples, not shown here, it might be a compiler generated projection that carries the data forward to the next query operation. + + +Either way, its rather straight forward to implement. In fact, I already have almost everything I need to implement it. What I don't have yet is a node to represent the join. + + +At least that's easy to remedy. + + internal enum DbExpressionType { + + Table = 1000, // make sure these don't overlap with ExpressionType + + Column, + + Select, + + Projection, + + Join + + } + +I modified my enum to make a new 'Join' node type, and then I implement a JoinExpression. 
+ + internal enum JoinType { + + CrossJoin, + + InnerJoin, + + CrossApply, + + } + + internal class JoinExpression : Expression { + + JoinType joinType; + + Expression left; + + Expression right; + + Expression condition; + + internal JoinExpression(Type type, JoinType joinType, Expression left, Expression right, Expression condition) + + : base((ExpressionType)DbExpressionType.Join, type) { + + this.joinType = joinType; + + this.left = left; + + this.right = right; + + this.condition = condition; + + } + + internal JoinType Join { + + get { return this.joinType; } + + } + + internal Expression Left { + + get { return this.left; } + + } + + internal Expression Right { + + get { return this.right; } + + } + + internal new Expression Condition { + + get { return this.condition; } + + } + + } + + +I've also defined a JoinType enum and filled it out with all the join types I'll be needing to know about. 'CrossApply' is a SQL Server only join type. Ignore it for now, I don't need it for the equi-join. In fact, I only need the 'InnerJoin'. The other two come later. I told you this was the simpler case. + + +What about outer joins? That will have to be a topic for another post. ?? + + +Now, that I have a new JoinExpression I'll need to update my DbExpressionVisitor. + + internal class DbExpressionVisitor : ExpressionVisitor { + + protected override Expression Visit(Expression exp) { + ... + + switch ((DbExpressionType)exp.NodeType) { + ... + + case DbExpressionType.Join: + + return this.VisitJoin((JoinExpression)exp); + ... + + } + + } + ... + + protected virtual Expression VisitJoin(JoinExpression join) { + + Expression left = this.Visit(join.Left); + + Expression right = this.Visit(join.Right); + + Expression condition = this.Visit(join.Condition); + + if (left != join.Left || right != join.Right || condition != join.Condition) { + + return new JoinExpression(join.Type, join.Join, left, right, condition); + + } + + return join; + + } + + } + +So far so good. Now, I just update QueryFormatter to know how to produce SQL text out of this new node. + + internal class QueryFormatter : DbExpressionVisitor { + ... + + protected override Expression VisitSource(Expression source) { + + switch ((DbExpressionType)source.NodeType) { + ... + + case DbExpressionType.Join: + + this.VisitJoin((JoinExpression)source); + + break; + ... + + } + ... + + } + + protected override Expression VisitJoin(JoinExpression join) { + + this.VisitSource(join.Left); + + this.AppendNewLine(Indentation.Same); + + switch (join.Join) { + + case JoinType.CrossJoin: + + sb.Append("CROSS JOIN "); + + break; + + case JoinType.InnerJoin: + + sb.Append("INNER JOIN "); + + break; + + case JoinType.CrossApply: + + sb.Append("CROSS APPLY "); + + break; + + } + + this.VisitSource(join.Right); + + if (join.Condition != null) { + + this.AppendNewLine(Indentation.Inner); + + sb.Append("ON "); + + this.Visit(join.Condition); + + this.AppendNewLine(Indentation.Outer); + + } + + return join; + + } + + } + + +The idea here is that JoinExpression nodes appear in the same place as other query source expressions such as SelectExpression and TableExpression. Therefore, I've modified the VisitSource method to know about Joins, and I've added an implementation for VisitJoin. + + +Of couse, I'm not going to get anywhere if I don't know how to turn expression nodes calling the Queryable Join method into my new JoinExpression. What I need is a method in QueryBinder just like the BindSelect and BindWhere methods. 
This turns out to be the meat of the operation, however, it turns out to be rather straight-forward since I already have the support built in to handle the other operators. + + internal class QueryBinder : ExpressionVisitor { + ... + + protected override Expression VisitMethodCall(MethodCallExpression m) { + + if (m.Method.DeclaringType == typeof(Queryable) || + + m.Method.DeclaringType == typeof(Enumerable)) { + + switch (m.Method.Name) { + ... + + case "Join": + + return this.BindJoin( + + m.Type, m.Arguments[0], m.Arguments[1], + + (LambdaExpression)StripQuotes(m.Arguments[2]), + + (LambdaExpression)StripQuotes(m.Arguments[3]), + + (LambdaExpression)StripQuotes(m.Arguments[4]) + + ); + + } + + } + ... + + } + ... + + protected virtual Expression BindJoin(Type resultType, Expression outerSource, Expression innerSource, LambdaExpression outerKey, LambdaExpression innerKey, LambdaExpression resultSelector) { + + ProjectionExpression outerProjection = (ProjectionExpression)this.Visit(outerSource); + + ProjectionExpression innerProjection = (ProjectionExpression)this.Visit(innerSource); + + this.map[outerKey.Parameters[0]] = outerProjection.Projector; + + Expression outerKeyExpr = this.Visit(outerKey.Body); + + this.map[innerKey.Parameters[0]] = innerProjection.Projector; + + Expression innerKeyExpr = this.Visit(innerKey.Body); + + this.map[resultSelector.Parameters[0]] = outerProjection.Projector; + + this.map[resultSelector.Parameters[1]] = innerProjection.Projector; + + Expression resultExpr = this.Visit(resultSelector.Body); + + JoinExpression join = new JoinExpression(resultType, JoinType.InnerJoin, outerProjection.Source, innerProjection.Source, Expression.Equal(outerKeyExpr, innerKeyExpr)); + + string alias = this.GetNextAlias(); + + ProjectedColumns pc = this.ProjectColumns(resultExpr, alias, outerProjection.Source.Alias, innerProjection.Source.Alias); + + return new ProjectionExpression( + + new SelectExpression(resultType, alias, pc.Columns, join, null), + + pc.Projector + + ); + + } + + } + +Inside the BindJoin method It's almost like I'm handling two operators at once. I've got two sources, so I end up with two different source projections. I use these projections to seed the global map that is used to translate parameter references and then I translate each of the key expressions. Finally, the same goes for the result expression, except that the result expression can see both source projections instead of just one. + + +Once I have translated all the input expressions I have enough to represent the join, so I go ahead and construct the JoinExpression. Then I use ProjectColumns to build me a column list for the new SelectExpression that I'm going to wrap around the whole thing. Notice there is one small change in ProjectColumns. It allows me to specify more than one pre-existing alias. This is important, because with a Join there are actually two aliases that my result expression may be referring to. + + +That's it. I'm actually done. Join should work as advertised. + + +Let's try it out. 
+ +var query = from c in db.Customers + + where c.CustomerID == "ALFKI" + + join o in db.Orders on c.CustomerID equals o.CustomerID + + select new { c.ContactName, o.OrderDate }; + +Console.WriteLine(query); + + +foreach (var item in query) { + + Console.WriteLine(item); + +} + + + +Running this produces the following output: + + +SELECT t2.ContactName, t4.OrderDate +FROM ( + SELECT t1.CustomerID, t1.ContactName, t1.Phone, t1.City, t1.Country + FROM ( + SELECT t0.CustomerID, t0.ContactName, t0.Phone, t0.City, t0.Country + FROM Customers AS t0 + ) AS t1 + WHERE (t1.CustomerID = 'ALFKI') +) AS t2 +INNER JOIN ( + SELECT t3.OrderID, t3.CustomerID, t3.OrderDate + FROM Orders AS t3 +) AS t4 + ON (t2.CustomerID = t4.CustomerID) +{ ContactName = Maria Anders, OrderDate = 8/25/1997 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 10/3/1997 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 10/13/1997 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 1/15/1998 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 3/16/1998 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 4/9/1998 12:00:00 AM } + + + +Now for the hard stuff. ?? + + +Implementing SelectMany + +If you've got any experience with SQL you are probably confounded by my insistence that nested 'from' case should be difficult. After all, with SQL this might only seem to be the difference between a CROSS JOIN and an INNER JOIN. For those non-SQL inclined a CROSS JOIN is not really a join at all, it's a cross product. To make it a join, it relies on a join condition placed in the WHERE clause to do the actually joining. So the only real difference, as far as SQL is concerned, is a CROSS JOIN puts is join condition in the WHERE clause and a INNER JOIN puts its join condition in the ON clause. There shouldn't be so much fuss. + + +Oh, but there is. There's a lot of fuss. The problem isn't in the SQL, its more about what's not in the SQL. You see a LINQ nested from is not the same thing as a CROSS JOIN. Sometimes it's the same, but not always. + + +The problem comes down to what is visible when. A join takes two completely independent sub queries and joins them together using a join condition. The join condition can see columns from each side of the join, but that's the only expression that can. A LINQ nested from is very different. The inner source expression can see items enumerated from the outer source. Think of them as nested foreach loops. The inner one can reference the variable for the outer one. + + +The problem comes in finding a suitable translation for queries that are specified with these rogue references to the outer variable. + + +If your query looks like this then no problem: + +var query = from c in db.Customers + + from o in db.Orders + + where c.CustomerID == o.CustomerID + + select new { c.ContactName, o.OrderDate }; + +This translates into method calls that look like this: + +var query = db.Customers + + .SelectMany(c => db.Orders, (c, o) => new { c, o }) + + .Where(x => x.c.CustomerID == x.o.CustomerID) + + .Select(x => new { x.c.ContactName, x.o.OrderDate }); + + +The SelectMany's collection expression 'db.Orders' never references anything from 'c'. This is easy to translate to SQL since we can simply put db.Customers and db.Orders on opposite sides of a join. 
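+
+To make that concrete, here is a rough sketch (illustration only, not code from the provider) of the join node the binder can build for this easy form, using the JoinExpression type defined above; 'resultType', 'customersProjection' and 'ordersProjection' are hypothetical stand-ins for the result type and the two bound ProjectionExpressions. Note that the equality test is not part of the join node at all; it arrives through the separate Where() call and ends up as an ordinary WHERE on a select wrapped around the join.
+
+    // Illustration: two independent sources, so a plain cross join node suffices.
+    // The filter contributed by the later Where() operator is not represented here.
+    JoinExpression join = new JoinExpression(
+        resultType,                     // hypothetical: the SelectMany call's result type
+        JoinType.CrossJoin,
+        customersProjection.Source,     // hypothetical: projection bound from db.Customers
+        ordersProjection.Source,        // hypothetical: projection bound from db.Orders
+        null                            // no ON condition for a cross join
+    );
+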
+ + +However, if you simply change how you write the query to this: + +var query = from c in db.Customers + + from o in db.Orders.Where(o => o.CustomerID == c.CustomerID) + + select new { c.ContactName, o.OrderDate }; + +Now, you've got a very different beast. Translated to method calls this becomes: + +var query = db.Customers + + .SelectMany( + + c => db.Orders.Where(o => c.CustomerID == o.CustomerID), + + (c, o) => new { c.ContactName, o.OrderDate } + + ); + +Now the join condition exists as part of the SelectMany's collection expression, so it references 'c'. Translation can no longer simply be a process of putting both source expressions on either side of a join in SQL, whether CROSS or INNER. + + +So how am I going to solve this problem? I'm not. Not really. I'm only going to solve it in the crudest sense. I'm going to let Microsoft's SQL solve it for me, mostly. Microsoft SQL2005 introduced a new type of join operator, CROSS APPLY, that has the exact same semantics as what I'm looking for, a very happy coincidence indeed. A CROSS APPLY's right-hand-side expression can reference columns from the left-hand-side. That's why I included it when defining the JoinType enum. + + +A large part of LINQ to SQL translation engine exists to reduce CROSS APPLY's into CROSS JOIN in an opportunistic fashion. Without this work, LINQ to SQL would likely not work very well with SQL2000, and even so it is not always possible. Adding capability to do this to the sample provider would also be a lot of work that I'm reluctant to do right now. However, I'm not entirely unfeeling, so I've thrown in a bone. I'm going to catch the easy case and convert that to a CROSS JOIN. + + +So let's take a look at the code. + + internal class QueryBinder : ExpressionVisitor { + + protected override Expression VisitMethodCall(MethodCallExpression m) { + + if (m.Method.DeclaringType == typeof(Queryable) || + + m.Method.DeclaringType == typeof(Enumerable)) { + + switch (m.Method.Name) { + ... + + case "SelectMany": + + if (m.Arguments.Count == 2) { + + return this.BindSelectMany( + + m.Type, m.Arguments[0], + + (LambdaExpression)StripQuotes(m.Arguments[1]), + + null + + ); + + } + + else if (m.Arguments.Count == 3) { + + return this.BindSelectMany( + + m.Type, m.Arguments[0], + + (LambdaExpression)StripQuotes(m.Arguments[1]), + + (LambdaExpression)StripQuotes(m.Arguments[2]) + + ); + + } + + break; + ... + + } + + } + ... + + } + + protected virtual Expression BindSelectMany(Type resultType, Expression source, LambdaExpression collectionSelector, LambdaExpression resultSelector) { + + ProjectionExpression projection = (ProjectionExpression)this.Visit(source); + + this.map[collectionSelector.Parameters[0]] = projection.Projector; + + ProjectionExpression collectionProjection = (ProjectionExpression)this.Visit(collectionSelector.Body); + + JoinType joinType = IsTable(collectionSelector.Body) ? 
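+                // A plain table source (a ConstantExpression wrapping one of the context's
+                // IQueryable tables, per IsTable below) cannot reference the outer range
+                // variable, so CROSS JOIN is safe; anything else falls back to CROSS APPLY.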
JoinType.CrossJoin : JoinType.CrossApply; + + JoinExpression join = new JoinExpression(resultType, joinType, projection.Source, collectionProjection.Source, null); + + string alias = this.GetNextAlias(); + + ProjectedColumns pc; + + if (resultSelector == null) { + + pc = this.ProjectColumns(collectionProjection.Projector, alias, projection.Source.Alias, collectionProjection.Source.Alias); + + } + + else { + + this.map[resultSelector.Parameters[0]] = projection.Projector; + + this.map[resultSelector.Parameters[1]] = collectionProjection.Projector; + + Expression result = this.Visit(resultSelector.Body); + + pc = this.ProjectColumns(result, alias, projection.Source.Alias, collectionProjection.Source.Alias); + + } + + return new ProjectionExpression( + + new SelectExpression(resultType, alias, pc.Columns, join, null), + + pc.Projector + + ); + + } + + + private bool IsTable(Expression expression) { + + ConstantExpression c = expression as ConstantExpression; + + return c != null && IsTable(c.Value); + + } + ... + + } + + + +The first interesting thing to note is that there are two different forms of SelectMany that are interesting. The first form takes a source expression and a collectionSelector expression. It produces a sequence of the same elements that are produced in the collectionSelector, only merging all the individual sequences together. The second form adds the resultSelector expression that lets you project your own result out of the two joined items. I've implemented BindSelectMany to work with or without the resultSelector being specified. + + +Note that on the fourth line of the function I determine which kind of Join I'm going to use to represent the SelectMany call. If I can determine that the collectionSelector is just a simple table query then I know it cannot have references to outer query variable (the parameter to the collectionSelector lambda expression). Therefore I know I can safely chose to use the CROSS JOIN instead of the CROSS APPLY. If I wanted to be a bit more sophisticated I could have written a visitor to prove that the collectionSelector made no references. Maybe I'll do that next time. I have a feeling I'm going to need to know this for other reasons. Yet for now this is the simple test I'm going to use. + + +All in all, this code is not too different from the BindJoin function or the others. I do have to handle the case without the resultSelector. In this case, I simply get to reuse the collectionProjection again as the final projection. + + + + + +Let's try the new code too. 
+ +var query = from c in db.Customers + + where c.CustomerID == "ALFKI" + + from o in db.Orders + + where c.CustomerID == o.CustomerID + + select new { c.ContactName, o.OrderDate }; + +Console.WriteLine(query); + + +foreach (var item in query) { + + Console.WriteLine(item); + +} + + + +Running this code now produces the following results: + + +SELECT t6.ContactName, t6.OrderDate +FROM ( + SELECT t5.CustomerID, t5.ContactName, t5.Phone, t5.City, t5.Country, t5.OrderID, t5.CustomerID1, t5.OrderDate + FROM ( + SELECT t2.CustomerID, t2.ContactName, t2.Phone, t2.City, t2.Country, t4.OrderID, t4.CustomerID AS CustomerID1, t4.OrderDate + FROM ( + SELECT t1.CustomerID, t1.ContactName, t1.Phone, t1.City, t1.Country + FROM ( + SELECT t0.CustomerID, t0.ContactName, t0.Phone, t0.City, t0.Country + FROM Customers AS t0 + ) AS t1 + WHERE (t1.CustomerID = 'ALFKI') + ) AS t2 + CROSS JOIN ( + SELECT t3.OrderID, t3.CustomerID, t3.OrderDate + FROM Orders AS t3 + ) AS t4 + ) AS t5 + WHERE (t5.CustomerID = t5.CustomerID1) +) AS t6 + +{ ContactName = Maria Anders, OrderDate = 8/25/1997 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 10/3/1997 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 10/13/1997 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 1/15/1998 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 3/16/1998 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 4/9/1998 12:00:00 AM } + + +Yikes! The queries are starting too look a bit hefty. I guess that's what happens when I keep mindlessly adding new select layers. Maybe one of these days I'll figure out a way to reduce this to just what is necessary. ?? + + +Of course, If I write the query in such a way that it does not pass my simple check I'm going to get a CROSS APPLY. + +var query = db.Customers + + .Where(c => c.CustomerID == "ALFKI") + + .SelectMany( + + c => db.Orders.Where(o => c.CustomerID == o.CustomerID), + + (c, o) => new { c.ContactName, o.OrderDate } + + ); + +Console.WriteLine(query); + + +foreach (var item in query) { + + Console.WriteLine(item); + +} + + + +This code produces the following: + + +SELECT t2.ContactName, t5.OrderDate +FROM ( + SELECT t1.CustomerID, t1.ContactName, t1.Phone, t1.City, t1.Country + FROM ( + SELECT t0.CustomerID, t0.ContactName, t0.Phone, t0.City, t0.Country + FROM Customers AS t0 + ) AS t1 + WHERE (t1.CustomerID = 'ALFKI') +) AS t2 +CROSS APPLY ( + SELECT t4.OrderID, t4.CustomerID, t4.OrderDate + FROM ( + SELECT t3.OrderID, t3.CustomerID, t3.OrderDate + FROM Orders AS t3 + ) AS t4 + WHERE (t2.CustomerID = t4.CustomerID) +) AS t5 + +{ ContactName = Maria Anders, OrderDate = 8/25/1997 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 10/3/1997 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 10/13/1997 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 1/15/1998 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 3/16/1998 12:00:00 AM } +{ ContactName = Maria Anders, OrderDate = 4/9/1998 12:00:00 AM } + + + +Exactly what I was expecting! + + +Now my provider handles Join and SelectMany calls. Do I hear a "woot woot" out there? Maybe my ears are just ringing. This provider does a lot, but there are still obvious holes to fill and operators to implement. I should get paid for doing this. 
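+
+One parting sketch before signing off. Earlier I said a more sophisticated BindSelectMany could prove whether the collection selector references the outer range variable instead of relying on the crude IsTable test. Here is roughly what such a visitor could look like (an illustration, not yet part of the provider; it assumes the ExpressionVisitor base class from Part II, which exposes a VisitParameter override):
+
+    // Sketch: reports whether a given lambda parameter is referenced anywhere in an expression.
+    internal class ReferenceChecker : ExpressionVisitor {
+        ParameterExpression toFind;
+        bool found;
+
+        internal bool References(Expression expression, ParameterExpression parameter) {
+            this.toFind = parameter;
+            this.found = false;
+            this.Visit(expression);
+            return this.found;
+        }
+
+        protected override Expression VisitParameter(ParameterExpression p) {
+            if (p == this.toFind) {
+                this.found = true;
+            }
+            return p;
+        }
+    }
+
+With that in hand, BindSelectMany could pick CrossJoin whenever the checker reports no reference to collectionSelector.Parameters[0], and CrossApply otherwise.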
\ No newline at end of file diff --git a/docs/blog/building-part-08.md b/docs/blog/building-part-08.md new file mode 100644 index 0000000..ddfe78f --- /dev/null +++ b/docs/blog/building-part-08.md @@ -0,0 +1,822 @@ +# LINQ: Building an IQueryable Provider – Part VIII: OrderBy + +Matt Warren - MSFT; October 9, 2007 + +--- + +This is the eighth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you might want to shut off the Halo 3 and get crackin'. +Complete list of posts in the Building an IQueryable Provider series +This post has a been a few weeks in coming. I realize that many of you have been anxiously awaiting the next installment. Your custom LINQ providers have been left alone collecting dust when they'd really rather be out on the town dazzling passers-by. + + +Implementing OrderBy + +Today's topic is translating those order-by clauses. Fortunately, there is only one way to do ordering, and that's using the LINQ ordering specific query operators. The bad news is that there are four different operators. + + +When I write a query using the query syntax ordering looks simple. It's just one clause. + + var query = from c in db.Customers + + orderby c.Country, c.City + + select c; + +However, when that syntax is translated into LINQ there is more than one LINQ operator involved. + + var query = db.Customers.OrderBy(c => c.Country).ThenBy(c => c.City); + +In fact, there is one ordering operator per ordering expression specified. So LINQ providers translating to SQL will need to convert this pattern of individual operators back into a single clause. The code to do this is going to be just a wee bit more involved than the previous operator translations, primarily because it needs to gather all these expressions up before it can do anything with them. The prior operators had the luxury of being able to simply build a new outer select layer over the previous query accumulation. They only ever had to consider their immediate arguments. Ordering, instead, must delve deep into the otherness and understand more than just itself. + + +The first thing I'm going to need is a way to represent the order-by clause. The simplest thing to do is simply add this ordering stuff right into the SelectExpression I already have. Yet, since each ordering expression can have a direction, ascending or descending, I'm going to need to remember that too. + + +So I'm going to add these new definitions: + + internal enum OrderType { + + Ascending, + + Descending + + } + + internal class OrderExpression { + + OrderType orderType; + + Expression expression; + + internal OrderExpression(OrderType orderType, Expression expression) { + + this.orderType = orderType; + + this.expression = expression; + + } + + internal OrderType OrderType { + + get { return this.orderType; } + + } + + internal Expression Expression { + + get { return this.expression; } + + } + + } + + + +The new type 'OrderExpression' is not really an Expression node, since I'm not expecting to be able to stick one anywhere in the tree. They will only every appear in a SelectExpression as part of its definition. Therefore, SelectExpression needs to change a little bit. + + internal class SelectExpression : Expression { + ... + + ReadOnlyCollection orderBy; + + internal SelectExpression( + + Type type, string alias, IEnumerable columns, + + Expression from, Expression where, IEnumerable orderBy) + + : base((ExpressionType)DbExpressionType.Select, type) { + ... 
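+            // Normalize whatever sequence is handed in to a ReadOnlyCollection so the
+            // node stays immutable and can safely be shared between rewritten trees.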
+ + this.orderBy = orderBy as ReadOnlyCollection; + + if (this.orderBy == null && orderBy != null) { + + this.orderBy = new List(orderBy).AsReadOnly(); + + } + + } + ... + + internal ReadOnlyCollection OrderBy { + + get { return this.orderBy; } + + } + + } + + + +Of course, I need to update DbExpressionVisitor to know about this ordering stuff: + + internal class DbExpressionVisitor : ExpressionVisitor { + ... + + protected virtual Expression VisitSelect(SelectExpression select) { + + Expression from = this.VisitSource(select.From); + + Expression where = this.Visit(select.Where); + + ReadOnlyCollection columns = this.VisitColumnDeclarations(select.Columns); + + ReadOnlyCollection orderBy = this.VisitOrderBy(select.OrderBy); + + if (from != select.From || where != select.Where || columns != select.Columns || orderBy != select.OrderBy) { + + return new SelectExpression(select.Type, select.Alias, columns, from, where, orderBy); + + } + + return select; + + } + ... + protected ReadOnlyCollection VisitOrderBy(ReadOnlyCollection expressions) { + + if (expressions != null) { + + List alternate = null; + + for (int i = 0, n = expressions.Count; i < n; i++) { + + OrderExpression expr = expressions[i]; + + Expression e = this.Visit(expr.Expression); + + if (alternate == null && e != expr.Expression) { + + alternate = expressions.Take(i).ToList(); + + } + + if (alternate != null) { + + alternate.Add(new OrderExpression(expr.OrderType, e)); + + } + + } + + if (alternate != null) { + + return alternate.AsReadOnly(); + + } + + } + + return expressions; + + } + + } + +And I have to fix up the other places in the code that I construct SelectExpressions, but that's relatively easy. + + +Converting order-by clauses back to text is also not so bad. + + internal class QueryFormatter : DbExpressionVisitor { + ... + + protected override Expression VisitSelect(SelectExpression select) { + ... + + if (select.OrderBy != null && select.OrderBy.Count > 0) { + + this.AppendNewLine(Indentation.Same); + + sb.Append("ORDER BY "); + + for (int i = 0, n = select.OrderBy.Count; i < n; i++) { + + OrderExpression exp = select.OrderBy[i]; + + if (i > 0) { + + sb.Append(", "); + + } + + this.Visit(exp.Expression); + + if (exp.OrderType != OrderType.Ascending) { + + sb.Append(" DESC"); + + } + + } + + } + ... + + } + ... + + } + + +The heavy lifting comes in the QueryBinder where I need to build up the ordering clause out of these method calls. What I decided to do was build up a list of these ordering expressions and evaluate them all at once. Since ThenBy and ThenByDescending operators must follow other ordering operators, its easy to walk down the tree, throwing each of these in a collection until I reach the root of the order-by clause, a call to OrderBy or OrderByDescending. + + internal class QueryBinder : ExpressionVisitor { + + + + + + + + ... + protected override Expression VisitMethodCall(MethodCallExpression m) { + + if (m.Method.DeclaringType == typeof(Queryable) || + + m.Method.DeclaringType == typeof(Enumerable)) { + ... 
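+                // All four ordering operators funnel into two bind methods: OrderBy and
+                // OrderByDescending start a new ordering clause, while ThenBy and
+                // ThenByDescending just queue up extra orderings until the root call is reached.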
+ + switch (m.Method.Name) { + + case "OrderBy": + + return this.BindOrderBy(m.Type, m.Arguments[0], (LambdaExpression)StripQuotes(m.Arguments[1]), OrderType.Ascending); + + case "OrderByDescending": + + return this.BindOrderBy(m.Type, m.Arguments[0], (LambdaExpression)StripQuotes(m.Arguments[1]), OrderType.Descending); + + case "ThenBy": + + return this.BindThenBy(m.Arguments[0], (LambdaExpression)StripQuotes(m.Arguments[1]), OrderType.Ascending); + + case "ThenByDescending": + + return this.BindThenBy(m.Arguments[0], (LambdaExpression)StripQuotes(m.Arguments[1]), OrderType.Descending); + + } + + } + ... + + } + + + List thenBys; + + protected virtual Expression BindOrderBy(Type resultType, Expression source, LambdaExpression orderSelector, OrderType orderType) { + + List myThenBys = this.thenBys; + + this.thenBys = null; + + ProjectionExpression projection = (ProjectionExpression)this.Visit(source); + + + this.map[orderSelector.Parameters[0]] = projection.Projector; + + List orderings = new List(); + + orderings.Add(new OrderExpression(orderType, this.Visit(orderSelector.Body))); + + + if (myThenBys != null) { + + for (int i = myThenBys.Count - 1; i >= 0; i--) { + + OrderExpression tb = myThenBys[i]; + + LambdaExpression lambda = (LambdaExpression)tb.Expression; + + this.map[lambda.Parameters[0]] = projection.Projector; + + orderings.Add(new OrderExpression(tb.OrderType, this.Visit(lambda.Body))); + + } + + } + + + string alias = this.GetNextAlias(); + + ProjectedColumns pc = this.ProjectColumns(projection.Projector, alias, projection.Source.Alias); + + return new ProjectionExpression( + + new SelectExpression(resultType, alias, pc.Columns, projection.Source, null, orderings.AsReadOnly()), + + pc.Projector + + ); + + } + + + protected virtual Expression BindThenBy(Expression source, LambdaExpression orderSelector, OrderType orderType) { + + if (this.thenBys == null) { + + this.thenBys = new List(); + + } + + this.thenBys.Add(new OrderExpression(orderType, orderSelector)); + + return this.Visit(source); + + } + + ... + + } + + + +When a call to BindThenBy is made (used for ThenBy's and ThenByDescending's), the call's arguments are just appended to a growing list of then-by info. I re-use the OrderExpression to store the then-by's since its the same layout. Later, when BindOrderBy is called, the binding logic can then bind everything and build up single SelectExpression. Note that when I do bind the then-by's, I iterate over the collection in reverse since the then-by's were collected backward. + + +Now, with all that in place, everything should be ready to go. + + +Giving it a whirl, I see that the query: + + var query = from c in db.Customers + + orderby c.Country, c.City + + select c; + +Is translated into: + + + SELECT t1.CustomerID, t1.ContactName, t1.Phone, t1.City, t1.Country + FROM ( + SELECT t0.CustomerID, t0.ContactName, t0.Phone, t0.City, t0.Country + FROM Customers AS t0 + ) AS t1 + ORDER BY t1.Country, t1.City + + +Which is exactly what I wanted. Wahoo! + + + + + +Unfortunately, that's not the end of the story. You probably knew I was going to say that. You were probably thinking, hey, it seems he's about wrapped up on the feature yet there is a whole bunch more text to read. He can't possibly be done. There must be something wrong. He's trying to trick me. It's like one of those puzzle questions. Arrg! I hate puzzle questions. + + +Yes, there is something definitely wrong. In this example, ordering appeared to work fine. 
The translator translated the query, the server ate it up and presto, results ordered as requested. The problem lies in other potential examples. The truth is that LINQ is a bit more free with ordering than SQL. As its stands now, a slightly different example would generate SQL text that SQL databases would choke on. + + +LINQ allows me to put ordering expressions anywhere I please. SQL is very restrictive. There are a few exceptions, but mostly I'm constrained to having only a single order-by clause on the outer-most SQL Select statement. Just like in my example above. Yet what would happen if the order-by came earlier? What if I piled on some more LINQ operators that logically came after the ordering? + + +What if instead I had this query. + + var query = from c in db.Customers + + orderby c.City + + where c.Country == "UK" + + select c; + + +It's very similar to the one before, yet I now have a where clause after the orderby. I can't do that with SQL. Even if I could, what kind of SQL would my provider generate? + + SELECT t2.City, t2.Country, t2.CustomerID, t2.ContactName, t2.Phone + FROM ( + SELECT t1.City, t1.Country, t1.CustomerID, t1.ContactName, t1.Phone + FROM ( + SELECT t0.City, t0.Country, t0.CustomerID, t0.ContactName, t0.Phone + FROM Customers AS t0 + ) AS t1 + ORDER BY t1.City + ) AS t2 + WHERE (t2.Country = 'UK') +Egads! That's definitely not going to fly. Aside from the fact that this query is getting out of control size-wise, the order-by is now nested and that should not happen. At least if I don't want the user getting exceptions anytime they deviate from the basic ho-hum query. + + +It even fails when I add a simple projection. + + var query = from c in db.Customers + + orderby c.City + + select new { c.Country, c.City, c.ContactName }; + + +This translates with the same problem. + + + SELECT t2.Country, t2.City, t2.ContactName + FROM ( + SELECT t1.City, t1.Country, t1.ContactName, t1.CustomerID, t1.Phone + FROM ( + SELECT t0.City, t0.Country, t0.ContactName, t0.CustomerID, t0.Phone + FROM Customers AS t0 + ) AS t1 + ORDER BY t1.City + ) AS t2 +Clearly something needs to be done. The question is, what? + + + + + +Of course, I already have a solution in case you were wondering. It pretty much entails fixing up the query tree to abide by SQL's rules about ordering. This means lifting the order-by expressions out from where they don't belong and adding them back where they do. + + +Which is not going to be easy. The query tree, based on LINQ expression nodes is immutable, meaning it cannot be changed. That's not the hard part because fortunately the visitors were written to recognize change automatically and build me a new immutable tree. The hard part is going to be making sure the table aliases all match up correctly and dealing with order-by's that reference columns that no longer exist in the output. + + +Looks like the real heavy lifting has not even started yet. + + + + + +Reordering for SQL's sake + +So how did I do it? I basically wrote another visitor that just deals with moving the order-by's around the tree. I've tried to simplify it as much as possible, but in the end its still a complicated piece of code. I could have attempted to integrate this rewriting into the binder itself, but that would have only added as much complication to an already involved process. It was better to separate it out so did not add confusion to the rest. + + +Take a look. 
+ + /// + + /// Move order-bys to the outermost select + + /// + internal class OrderByRewriter : DbExpressionVisitor { + + IEnumerable gatheredOrderings; + + bool isOuterMostSelect; + + public OrderByRewriter() { + + } + + + public Expression Rewrite(Expression expression) { + + this.isOuterMostSelect = true; + + return this.Visit(expression); + + } + + + protected override Expression VisitSelect(SelectExpression select) { + + bool saveIsOuterMostSelect = this.isOuterMostSelect; + + try { + + this.isOuterMostSelect = false; + + select = (SelectExpression)base.VisitSelect(select); + + bool hasOrderBy = select.OrderBy != null && select.OrderBy.Count > 0; + + if (hasOrderBy) { + + this.PrependOrderings(select.OrderBy); + + } + + bool canHaveOrderBy = saveIsOuterMostSelect; + + bool canPassOnOrderings = !saveIsOuterMostSelect; + + IEnumerable orderings = (canHaveOrderBy) ? this.gatheredOrderings : null; + + ReadOnlyCollection columns = select.Columns; + + if (this.gatheredOrderings != null) { + + if (canPassOnOrderings) { + + HashSet producedAliases = new AliasesProduced().Gather(select.From); + + // reproject order expressions using this select's alias so the outer select will have properly formed expressions + + BindResult project = this.RebindOrderings(this.gatheredOrderings, select.Alias, producedAliases, select.Columns); + + this.gatheredOrderings = project.Orderings; + + columns = project.Columns; + + } + + else { + + this.gatheredOrderings = null; + + } + + } + + if (orderings != select.OrderBy || columns != select.Columns) { + + select = new SelectExpression(select.Type, select.Alias, columns, select.From, select.Where, orderings); + + } + + return select; + + } + + finally { + + this.isOuterMostSelect = saveIsOuterMostSelect; + + } + + } + + + protected override Expression VisitJoin(JoinExpression join) { + + // make sure order by expressions lifted up from the left side are not lost + + // when visiting the right side + + Expression left = this.VisitSource(join.Left); + + IEnumerable leftOrders = this.gatheredOrderings; + + this.gatheredOrderings = null; // start on the right with a clean slate + + Expression right = this.VisitSource(join.Right); + + this.PrependOrderings(leftOrders); + + Expression condition = this.Visit(join.Condition); + + if (left != join.Left || right != join.Right || condition != join.Condition) { + + return new JoinExpression(join.Type, join.Join, left, right, condition); + + } + + return join; + + } + + + /// + + /// Add a sequence of order expressions to an accumulated list, prepending so as + + /// to give precedence to the new expressions over any previous expressions + + /// + + /// + + protected void PrependOrderings(IEnumerable newOrderings) { + + if (newOrderings != null) { + + if (this.gatheredOrderings == null) { + + this.gatheredOrderings = newOrderings; + + } + + else { + + List list = this.gatheredOrderings as List; + + if (list == null) { + + this.gatheredOrderings = list = new List(this.gatheredOrderings); + + } + + list.InsertRange(0, newOrderings); + + } + + } + + } + + + protected class BindResult { + + ReadOnlyCollection columns; + + ReadOnlyCollection orderings; + + public BindResult(IEnumerable columns, IEnumerable orderings) { + + this.columns = columns as ReadOnlyCollection; + + if (this.columns == null) { + + this.columns = new List(columns).AsReadOnly(); + + } + + this.orderings = orderings as ReadOnlyCollection; + + if (this.orderings == null) { + + this.orderings = new List(orderings).AsReadOnly(); + + } + + } + + public 
ReadOnlyCollection Columns { + + get { return this.columns; } + + } + + public ReadOnlyCollection Orderings { + + get { return this.orderings; } + + } + + } + + + /// + + /// Rebind order expressions to reference a new alias and add to column declarations if necessary + + /// + + protected virtual BindResult RebindOrderings(IEnumerable orderings, string alias, HashSet existingAliases, IEnumerable existingColumns) { + + List newColumns = null; + + List newOrderings = new List(); + + foreach (OrderExpression ordering in orderings) { + + Expression expr = ordering.Expression; + + ColumnExpression column = expr as ColumnExpression; + + if (column == null || (existingAliases != null && existingAliases.Contains(column.Alias))) { + + // check to see if a declared column already contains a similar expression + + int iOrdinal = 0; + + foreach (ColumnDeclaration decl in existingColumns) { + + ColumnExpression declColumn = decl.Expression as ColumnExpression; + + if (decl.Expression == ordering.Expression || + + (column != null && declColumn != null && column.Alias == declColumn.Alias && column.Name == declColumn.Name)) { + + // found it, so make a reference to this column + + expr = new ColumnExpression(column.Type, alias, decl.Name, iOrdinal); + + break; + + } + + iOrdinal++; + + } + + // if not already projected, add a new column declaration for it + + if (expr == ordering.Expression) { + + if (newColumns == null) { + + newColumns = new List(existingColumns); + + existingColumns = newColumns; + + } + + string colName = column != null ? column.Name : "c" + iOrdinal; + + newColumns.Add(new ColumnDeclaration(colName, ordering.Expression)); + + expr = new ColumnExpression(expr.Type, alias, colName, iOrdinal); + + } + + newOrderings.Add(new OrderExpression(ordering.OrderType, expr)); + + } + + } + + return new BindResult(existingColumns, newOrderings); + + } + + } + +That's a whole lot of something! ?? + + +The main visitation algorithm works like this. As the visitor walks back up the tree it maintains a growing collection of order-by expressions. This is almost the opposite of what the binder was doing, as it collected then-by expressions as it walked down the tree. If both an outer level and inner level select node have order-by expressions, neither of the expressions are lost. The outer level ones simply take precedence by appearing before the inner ones. This happens in VisitSelect by calling the PrependOrderings function that adds the current order-by's as a block to the head of the growing list. + + +Next I decide if the current select node can even have order-by expressions in it. Currently, this is a simple test of whether the select node is the outermost select node or not. This question would be more interesting if I had TSQL's TOP clause available. I also determine whether this select node can pass on ordering information. Again, currently, this has to do with the nodes outermost levelness. If I had DISTINCT available, I would have shut off the order-by propagation. The reason comes clear when I take into consideration what comes next. The rebind. + + +After determining that this node must pass on its order-by's to an outer node, the order-by expressions themselves must be rewritten to appear as expressions that reference this select's table alias since these expressions are currently built to refer to aliases from an even deeper nesting. 
In addition, if any of these columns that are referenced in the order-by expressions are not available in this select node's column projection, then I need to add them to the projection so they are accessible from the outer select. This whole ball of wax is called rebinding, and is cordoned off into its own function RebindOrderings. + + +Now to get back to a prior point, if a given select node had also been distinct, it would have been incorrect to allow order-by expressions to introduce new columns into the projection. This would have changed the evaluation of the distinct operation as it would have been based on additional columns. No bother really right now since there is no distinct, but I might add it next week so its worth it to be thinking ahead. This is the actual reason that LINQ to SQL does not maintain ordering through a distinct or union operation. + + + + + +So putting it all together, I just need to modify DbQueryProvider to call this new visitor. + + public class DbQueryProvider : QueryProvider { + ... + + private TranslateResult Translate(Expression expression) { + + ProjectionExpression projection = expression as ProjectionExpression; + + if (projection == null) { + + expression = Evaluator.PartialEval(expression, CanBeEvaluatedLocally); + + expression = new QueryBinder(this).Bind(expression); + expression = new OrderByRewriter().Rewrite(expression); + projection = (ProjectionExpression)expression; + + } + + string commandText = new QueryFormatter().Format(projection.Source); + + LambdaExpression projector = new ProjectionBuilder().Build(projection.Projector, projection.Source.Alias); + + return new TranslateResult { CommandText = commandText, Projector = projector }; + + } + ... + + } + +Now, if I run this otherwise complicated query + + var query = from c in db.Customers + + orderby c.City + + where c.Country == "UK" + + select new { c.City, c.ContactName }; + +I get this translation in SQL + + + SELECT t3.City, t3.ContactName + FROM ( + SELECT t2.City, t2.Country, t2.ContactName, t2.CustomerID, t2.Phone + FROM ( + SELECT t1.City, t1.Country, t1.ContactName, t1.CustomerID, t1.Phone + FROM ( + SELECT t0.City, t0.Country, t0.ContactName, t0.CustomerID, t0.Phone + FROM Customers AS t0 + ) AS t1 + ) AS t2 + WHERE (t2.Country = 'UK') + ) AS t3 + ORDER BY t3.City +Which is far cry better than what it was generating before. + + +If I run it to completion, I get the following output: + + +{ City = Cowes, ContactName = Helen Bennett } +{ City = London, ContactName = Simon Crowther } +{ City = London, ContactName = Hari Kumar } +{ City = London, ContactName = Thomas Hardy } +{ City = London, ContactName = Victoria Ashworth } +{ City = London, ContactName = Elizabeth Brown } +{ City = London, ContactName = Ann Devon } + + + +There! Now that's ordering, or at least a good start. + + +Of course, it would still look better if I could reduce some of the unnecessary layers of sub-queries. Maybe next time. ?? + diff --git a/docs/blog/building-part-09.md b/docs/blog/building-part-09.md new file mode 100644 index 0000000..54688d7 --- /dev/null +++ b/docs/blog/building-part-09.md @@ -0,0 +1,746 @@ +# LINQ: Building an IQueryable Provider – Part IX: Removing redundant subqueries + +Matt Warren - MSFT; January 16, 2008 + +--- + +This is the nineth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts here's a handy list of all the fun you've been missing. 
+Complete list of posts in the Building an IQueryable Provider series +It's now officially a trend that additional installments to this series take longer and longer to be produced. Blame the television writer's strike, I do. +Cleaning up the Mess + +I've been promising for a while to show you how I'm going to go about cleaning up the unnecessary layers of nested select expressions that my query translator has been accumulating. It's easy for a human brain to look at a query and realize that it could be written a lot simpler. However, its a lot easier for a computer program to just keep piling the layers on, after all the semantics are the same and the boon we get from keeping the program simple is nothing to sneer at. +It's easy to see the problem in a simple query with a where clause. + + from c in db.Customers + + where c.Country == "UK" + + select c; + + +This innocuous query turns into the following SQL: +SELECT t1.Country, t1.CustomerID, t1.ContactName, t1.Phone, t1.City +FROM ( + SELECT t0.Country, t0.CustomerID, t0.ContactName, t0.Phone, t0.City + FROM Customers AS t0 +) AS t1 +WHERE (t1.Country = 'UK') +Why the extra SELECT? It's easy to see why it happens when you know how the translation works and what the underlying LINQ query really is. +The LINQ query's method call syntax really looks like this: + + db.Customers.Where(c => c.Country == "UK").Select(c => c); + + +It has two LINQ query operators, Where() and Select(). My translation engine in the SqlBinder class translates both of these method calls into two separate SelectExpression's. +Ideally, the SQL query would have looked like this: +SELECT t0.Country, t0.CustomerID, t0.ContactName, t0.Phone, t0.City +FROM Customers AS t0 +WHERE (t0.Country = 'UK') + + +However, that's just the easy case. It gets worse as more operators are added. Did you think the translator was smart enough to merge multiple Where clauses together? I certainly did not add any code for that. It would be nice if the language compiler did it for me, but what about the case where additional Where() operators are added conditionally after the base query is already formed? + +var query = + + from c in db.Customers + + where c.Country == "UK" + + select c; + +... + +query = from c in query + + where c.Phone == "555-5555" + + select c; + + +This becomes a triple layer monstrosity, which would only be good if it were a sandwich. + + +SELECT t2.CustomerID, t2.ContactName, t2.Phone, t2.City, t2.Country +FROM ( + SELECT t1.CustomerID, t1.ContactName, t1.Phone, t1.City, t1.Country + FROM ( + SELECT t0.CustomerID, t0.ContactName, t0.Phone, t0.City, t0.Country + FROM Customers AS t0 + ) AS t1 + WHERE (t1.Country = 'UK') +) AS t2 +WHERE (t2.Phone = '555-5555') +And its not just the layering either. What happens when when I try to project out a subset of the data? + +var query = + + from c in db.Customers + + where c.Country == "UK" + + select c.CustomerID; + +This becomes the following: + + +SELECT t2.CustomerID +FROM ( + SELECT t1.CustomerID, t1.ContactName, t1.Phone, t1.City, t1.Country + FROM ( + SELECT t0.CustomerID, t0.ContactName, t0.Phone, t0.City, t0.Country + FROM Customers AS t0 + ) AS t1 + WHERE (t1.Country = 'UK') +) AS t2 +Why oh why do the nested queries keep reselecting data that's never being used? Hopefully the database engine is smart enough not to pipeline all that data when most of its never even referred to in the query or returned to the client. 
Yet, wouldn't it be a whole lot nicer if the query translator could reduce this madness down to a simpler form that an actual human might write? Then maybe someone would be able to make head or tails out of this one: + +var query = from c in db.Customers + + join o in db.Orders on c.CustomerID equals o.CustomerID + + let m = c.Phone + + orderby c.City + + where c.Country == "UK" + + where m != "555-5555" + + select new { c.City, c.ContactName } into x + + where x.City == "London" + + select x; + +I'm not even going to show you this one yet, as it might frighten you to the point of powering down the computer. + + +What I am going to do is show you what I've done about it, how I rolled up my sleeves and wrote some code that saves the day. It turned out not to be too horribly difficult. I had imagined working with the immutable expression tree would become more and more complex as the desired transformations became more and more interesting, given how complicated the order-by rewriter seemed to get. However, I was pleasantly surprised to find out that this clean-up logic was in fact turning out to be actually sort of clean. + + + + + +Removing Redudant Subqueries + +I first wanted to tackle how to get rid of redundant subqueries. For example, an identity select does absolutely nothing, yet its adds a whole new layer. What I needed was a way to remove select expressions that do nothing interesting. Of course, the first thing I needed to do was decide what made an expression redundant. Obviously, a select expression is redundant if it does not actually do anything at all, except for re-selecting columns from another select expression. Other select expressions would seem redundant if the only thing they do are operations that could have been combined with those of another select expression. + + +The next thing I needed to do was to figure out how to actually remove a select expression from an expression tree and end up with a tree that is actually legal. This step had an ominous feel that I was going to end up devising something even more arcane than the order-by rewriter. I was surprised that it turned out to be so simple. Take a look. + + internal class SubqueryRemover : DbExpressionVisitor + + { + + HashSet selectsToRemove; + + Dictionary> map; + + public Expression Remove(SelectExpression outerSelect, params SelectExpression[] selectsToRemove) + + { + + return Remove(outerSelect, (IEnumerable)selectsToRemove); + + } + + + public Expression Remove(SelectExpression outerSelect, IEnumerable selectsToRemove) + + { + + this.selectsToRemove = new HashSet(selectsToRemove); + + this.map = selectsToRemove.ToDictionary(d => d.Alias, d => d.Columns.ToDictionary(d2 => d2.Name, d2 => d2.Expression)); + + return this.Visit(outerSelect); + + } + + + protected override Expression VisitSelect(SelectExpression select) + + { + + if (this.selectsToRemove.Contains(select)) + + { + + return this.Visit(select.From); + + } + + else + + { + + return base.VisitSelect(select); + + } + + } + + + protected override Expression VisitColumn(ColumnExpression column) + + { + + Dictionary nameMap; + + if (this.map.TryGetValue(column.Alias, out nameMap)) + + { + + Expression expr; + + if (nameMap.TryGetValue(column.Name, out expr)) + + { + + return this.Visit(expr); + + } + + throw new Exception("Reference to undefined column"); + + } + + return column; + + } + + } + + + +That's it. 
This is a nice little class that will rewrite an expression tree and remove one or more select expressions from it, automatically fixing up all references to columns that go away. Looking at the two visit methods, the code looks trivial. When I see a select that's one of ones to be removed, I simply throw it away by returning its 'from' expression. When I see a column expression that is referencing a column that is declared in a select expression that is removed, I substitute the expression used in the declaration for the reference. As it turns out the most interesting piece of the whole visitor is figuring out the set of columns that are going away, and this is determined using a nice little LINQ query in the Remove method to construct a dictionary of dictionaries that tells me this. + + +Now all that's left is to write some code that actually figures out which subqueries are the redundant ones. + + +And here that is. + + internal class RedundantSubqueryRemover : DbExpressionVisitor + + { + + internal Expression Remove(Expression expression) + + { + + return this.Visit(expression); + + } + + protected override Expression VisitSelect(SelectExpression select) + + { + + select = (SelectExpression)base.VisitSelect(select); + + + // first remove all purely redundant subqueries + + List redundant = new RedundantSubqueryGatherer().Gather(select.From); + + if (redundant != null) + + { + + select = (SelectExpression)new SubqueryRemover().Remove(select, redundant); + + } + + + // next attempt to merge subqueries + + + // can only merge if subquery is a single select (not a join) + + SelectExpression fromSelect = select.From as SelectExpression; + + if (fromSelect != null) + + { + + // can only merge if subquery has simple-projection (no renames or complex expressions) + + if (HasSimpleProjection(fromSelect)) + + { + + // remove the redundant subquery + + select = (SelectExpression)new SubqueryRemover().Remove(select, fromSelect); + + // merge where expressions + + Expression where = select.Where; + + if (fromSelect.Where != null) + + { + + if (where != null) + + { + + where = Expression.And(fromSelect.Where, where); + + } + + else + + { + + where = fromSelect.Where; + + } + + } + + if (where != select.Where) + + { + + return new SelectExpression(select.Type, select.Alias, select.Columns, select.From, where, select.OrderBy); + + } + + } + + } + + + return select; + + } + + + private static bool IsRedudantSubquery(SelectExpression select) + + { + + return HasSimpleProjection(select) + + && select.Where == null + + && (select.OrderBy == null || select.OrderBy.Count == 0); + + } + + + private static bool HasSimpleProjection(SelectExpression select) + + { + + foreach (ColumnDeclaration decl in select.Columns) + + { + + ColumnExpression col = decl.Expression as ColumnExpression; + + if (col == null || decl.Name != col.Name) + + { + + // column name changed or column expression is more complex than reference to another column + + return false; + + } + + } + + return true; + + } + + + class RedundantSubqueryGatherer : DbExpressionVisitor + + { + + List redundant; + + + internal List Gather(Expression source) + + { + + this.Visit(source); + + return this.redundant; + + } + + + protected override Expression VisitSelect(SelectExpression select) + + { + + if (IsRedudantSubquery(select)) + + { + + if (this.redundant == null) + + { + + this.redundant = new List(); + + } + + this.redundant.Add(select); + + } + + return select; + + } + + } + + } + + +The RedundantSubqueryRemover is a bit more involved that the 
SubqueryRemover, but it basically has a simple algorithm. When it examines a given select expression it tries to determine if one or more sub-select's are redundant. To determine this set it uses anothger visitor, the RedundantSubqueryGatherer, which builds a list of redundant subqueries that can be reached without recursing down into any 'from' expressions. This allows it to consider all the sub select expressions that are in scope to the parent select expression, seeing through any join expression nodes that may exist. Once I have this list, I just use the SubqueryRemover to remove them. + + +Following that, I look to see if any select expressions can be merged together. The only interesting thing to consider at this time is whether a subquery would be considered redundant except for a where expression that can easily be combined into the outer select expression. If I find one of these, I go ahead and remove the subquery and add its where expression to the outer select expression. Yes, I'm surprised too that it works out that easily. + + + + + +Removing Unused Columns + +The second part of clean up is to get rid of unused columns. If I project into a smaller set of columns and don't even reference some of the others then why keep them in the query at all? It might not matter to the semantics of the query and it might not even make the query faster, but it will certainly be easier on the eyes, and it might make it possible to discover more redundant subqueries that might have slipped by simply because the subquery computed a column that is later ignored. + + +Of course, this turns out to be a lot more complicated than the other two, so I saved it for last. + + internal class UnusedColumnRemover : DbExpressionVisitor + + { + + Dictionary> allColumnsUsed; + + internal Expression Remove(Expression expression) + + { + + this.allColumnsUsed = new Dictionary>(); + + return this.Visit(expression); + + } + + + protected override Expression VisitColumn(ColumnExpression column) + + { + + HashSet columns; + + if (!this.allColumnsUsed.TryGetValue(column.Alias, out columns)) + + { + + columns = new HashSet(); + + this.allColumnsUsed.Add(column.Alias, columns); + + } + + columns.Add(column.Name); + + return column; + + } + + + protected override Expression VisitSelect(SelectExpression select) + + { + + // visit column projection first + + ReadOnlyCollection columns = select.Columns; + + + HashSet columnsUsed; + + if (this.allColumnsUsed.TryGetValue(select.Alias, out columnsUsed)) + + { + + List alternate = null; + + for (int i = 0, n = select.Columns.Count; i < n; i++) + + { + + ColumnDeclaration decl = select.Columns[i]; + + if (!columnsUsed.Contains(decl.Name)) + + { + + decl = null; // null means it gets omitted + + } + + else + + { + + Expression expr = this.Visit(decl.Expression); + + if (expr != decl.Expression) + + { + + decl = new ColumnDeclaration(decl.Name, decl.Expression); + + } + + } + + if (decl != select.Columns[i] && alternate == null) + + { + + alternate = new List(); + + for (int j = 0; j < i; j++) + + { + + alternate.Add(select.Columns[j]); + + } + + } + + if (decl != null && alternate != null) + + { + + alternate.Add(decl); + + } + + } + + if (alternate != null) + + { + + columns = alternate.AsReadOnly(); + + } + + } + + + ReadOnlyCollection orderbys = this.VisitOrderBy(select.OrderBy); + + Expression where = this.Visit(select.Where); + + Expression from = this.Visit(select.From); + + + if (columns != select.Columns || orderbys != select.OrderBy || where != select.Where || from != 
select.From) + + { + + return new SelectExpression(select.Type, select.Alias, columns, from, where, orderbys); + + } + + + return select; + + } + + + protected override Expression VisitProjection(ProjectionExpression projection) + + { + + // visit mapping in reverse order + + Expression projector = this.Visit(projection.Projector); + + SelectExpression source = (SelectExpression)this.Visit(projection.Source); + + if (projector != projection.Projector || source != projection.Source) + + { + + return new ProjectionExpression(source, projector); + + } + + return projection; + + } + + + protected override Expression VisitJoin(JoinExpression join) + + { + + // visit join in reverse order + + Expression condition = this.Visit(join.Condition); + + Expression right = this.VisitSource(join.Right); + + Expression left = this.VisitSource(join.Left); + + if (left != join.Left || right != join.Right || condition != join.Condition) + + { + + return new JoinExpression(join.Type, join.Join, left, right, condition); + + } + + return join; + + } + + } + + + +The reason is turns out to be so much more complicated is that in order to figure out what columns are used I have to examine the places where they are used before I get to the place where they are defined because that's where I need to remove them, which is exactly opposite of how the visitors normally flow through the expression tree, so instead of simply override a few pieces of a visitor I have to re-specify visit methods I could have otherwise ignored. + + +When I get to a select expression I have to examine the column declarations, where expression and order-by expressions before I recurse down into the from expression, because all of these expression may reference columns declared in the sub queries. The first thing I do is grovel through the set of column declarations and remove the ones that are not referenced by the layer above. The top level select expression's column expression are, of course, reference by the projection expression. Most of the code in VisitSelect is just the work of reassembling this list of declarations. The rest is pretty much boiler plate from the base DbExpressionVisitor, except in a reverse order. + + + + + +Putting It to the Test + +The last thing to do is to wire these new clean-up visitors into the query translation process. The appropriate place to do this is where all the other top level visitors are plugged in, the DbQueryProvider class. + + public class DbQueryProvider : QueryProvider { + ... + + private TranslateResult Translate(Expression expression) { + + ProjectionExpression projection = expression as ProjectionExpression; + + if (projection == null) { + + expression = Evaluator.PartialEval(expression, CanBeEvaluatedLocally); + + expression = new QueryBinder(this).Bind(expression); + + expression = new OrderByRewriter().Rewrite(expression); + + expression = new UnusedColumnRemover().Remove(expression); + + expression = new RedundantSubqueryRemover().Remove(expression); + + projection = (ProjectionExpression)expression; + + } + + string commandText = new QueryFormatter().Format(projection.Source); + + LambdaExpression projector = new ProjectionBuilder().Build(projection.Projector, projection.Source.Alias); + + return new TranslateResult { CommandText = commandText, Projector = projector }; + + } + ... + + } + + +I just added them in right after the OrderByRewriter, allowing unused columns to be removed first in case doing so helps the redundant subquery removal. + + +Now all I have to do is try it out. 
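+
+A quick sanity check first. With the clean-up visitors plugged in, the simple filter query from the top of this post should now collapse to a single SELECT, roughly the 'ideal' form shown earlier (the exact column order and alias are my guess based on that example):
+
+    var check = from c in db.Customers
+                where c.Country == "UK"
+                select c;
+    Console.WriteLine(check);
+
+    // Expected shape, per the ideal translation above:
+    // SELECT t0.Country, t0.CustomerID, t0.ContactName, t0.Phone, t0.City
+    // FROM Customers AS t0
+    // WHERE (t0.Country = 'UK')
+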
+ + +How about that huge scary query? How bad could it be? + +var query = from c in db.Customers + + join o in db.Orders on c.CustomerID equals o.CustomerID + + let m = c.Phone + + orderby c.City + + where c.Country == "UK" + + where m != "555-5555" + + select new { c.City, c.ContactName } into x + + where x.City == "London" + + select x; + + +Without the new additions the translation is horrific. + + +SELECT t9.City, t9.ContactName +FROM ( + SELECT t8.City, t8.ContactName + FROM ( + SELECT t7.CustomerID, t7.ContactName, t7.Phone, t7.City, t7.Country, t7.OrderID, t7.CustomerID1, t7.OrderDate + FROM ( + SELECT t6.CustomerID, t6.ContactName, t6.Phone, t6.City, t6.Country, t6.OrderID, t6.CustomerID1, t6.OrderDate + FROM ( + SELECT t5.CustomerID, t5.ContactName, t5.Phone, t5.City, t5.Country, t5.OrderID, t5.CustomerID1, t5.OrderDate + FROM ( + SELECT t4.CustomerID, t4.ContactName, t4.Phone, t4.City, t4.Country, t4.OrderID, t4.CustomerID1, t4.OrderDate + FROM ( + SELECT t1.CustomerID, t1.ContactName, t1.Phone, t1.City, t1.Country, t3.OrderID, t3.CustomerID AS CustomerID1, t3.OrderDate + FROM ( + SELECT t0.CustomerID, t0.ContactName, t0.Phone, t0.City, t0.Country + FROM Customers AS t0 + ) AS t1 + INNER JOIN ( + SELECT t2.OrderID, t2.CustomerID, t2.OrderDate + FROM Orders AS t2 + ) AS t3 + ON (t1.CustomerID = t3.CustomerID) + ) AS t4 + ) AS t5 + ) AS t6 + WHERE (t6.Country = 'UK') + ) AS t7 + WHERE (t7.Phone <> '555-5555') + ) AS t8 +) AS t9 +WHERE (t9.City = 'London') +ORDER BY t9.City +When I gather the courage to look I start to worry about exceeding some maximum query size sending this to the server. Clearly, if the table had a more realistic number of columns the size would explode comparatively. + + +Now take a look at the query produced when the new clean-up code is enabled. + + +SELECT t0.City, t0.ContactName +FROM Customers AS t0 +INNER JOIN Orders AS t2 + ON (t0.CustomerID = t2.CustomerID) +WHERE (((t0.Country = 'UK') AND (t0.Phone <> '555-5555')) AND (t0.City = 'London')) +ORDER BY t0.City +How sweet is that? It's actually easier to read than the original LINQ query. + + +Sometimes I do amaze even myself. \ No newline at end of file diff --git a/docs/blog/building-part-10.md b/docs/blog/building-part-10.md new file mode 100644 index 0000000..b7710be --- /dev/null +++ b/docs/blog/building-part-10.md @@ -0,0 +1,1842 @@ +# LINQ: Building an IQueryable Provider – Part X: GroupBy and Aggregates + +Matt Warren - MSFT; July 8, 2008 + +-------------------------------- + + +This is the tenth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you'll want to find a nice shady tree, relax and meditate on why your world is so confused and full of meaningless tasks that it has kept you from pursuing the perfection of writing IQueryable providers. + + +Complete list of posts in the Building an IQueryable Provider series + + +Last time I blamed the television writers strike for delaying me in getting out my next installment. This time I blame the lack of one, and sunny days, and my son riding his bicycle, and the grueling, tiresome task of getting paid. Would you believe some of the code we are writing has nothing what-so-ever to do with IQueryable LINQ providers? Crazy, I know. Maybe it would be a different story around here if we had a few more shady trees. + + +Fortunately for you I have a prime piece of savory source code ready for slow grilling over a bed of hot mesquite. 
It's something I'd like to say I was saving for a special occasion, but the truth is I've been putting it off because I thought the code would be just too overwhelming. I mean for me, not you. You see I've been trying to do two things with this series; one, provide a guide to help developers navigate and understand the power and flexibility of the IQuerayble interface, and two, attempt to prove to myself that code written in a purely functional style (or as pure as I can make it) can end up much cleaner and easier on the eyes. I've been horrified ever since tackling OrderBy that this subject would become my undoing. Knowing how involved the translation was for LINQ to SQL I was dreading the immutable madness that would ensue. Obviously, I survived the ordeal and so will you. + + +Grappling with GroupBy and Aggregates + +Translating just GroupBy itself might not be so difficult if I did not have to also account for aggregates. It seems that GroupBy is not very interesting without the ability to write expressions involving Count() and Max() and Sum(), and that's where the difficulty sets in. + + +A query involving groups and aggregates that looks like this: + + +var query = from o in db.Orders + + + group o by o.CustomerID into g + + + select g.Max(o => o.OrderID); + + +Is translated into a series of method calls that looks like this: + + +var query = db.Orders.GroupBy(o => o.CustomerID).Select(g => g.Max(o => o.OrderID)); + + + + + +And this is a problem because as I translate from the bottom up (as I have been doing all along) building individual SELECT subqueries for each query operator, I won’t even discover the use of the aggregate ‘Max’ until after I’ve created the SELECT with the GROUP BY. So how do I get the two pieces together in the same place at the same time without violating my functional style and immutable expression nodes? Weirder yet, how do I even know that a call to an aggregate method should be tied back to a particular GroupBy? What if I had two group-bys? + + + + + +So where do I start to explain? Maybe the easy stuff first. I’ve added a GroupBy property to SelectExpression. + + + + + +internal class SelectExpression : Expression { + ... + + + ReadOnlyCollection groupBy; + + + + + + internal SelectExpression( + ..., + + + IEnumerable groupBy + + + ) + + + + internal ReadOnlyCollection GroupBy { + + + get { return this.groupBy; } + + + } + + +} + + + + + +Now every place where I construct one I have to specify a collection of grouping expressions (or null). GroupBy expressions also get visited after everything else (so far). Here’s the new version of VisitSelect in DbExpressionVisitor. + + + + + +protected virtual Expression VisitSelect(SelectExpression select) { + + + Expression from = this.VisitSource(select.From); + + + Expression where = this.Visit(select.Where); + + + ReadOnlyCollection columns = this.VisitColumnDeclarations(select.Columns); + + + ReadOnlyCollection orderBy = this.VisitOrderBy(select.OrderBy); + + + ReadOnlyCollection groupBy = this.VisitExpressionList(select.GroupBy); + + + if (from != select.From + + + || where != select.Where + + + || columns != select.Columns + + + || orderBy != select.OrderBy + + + || groupBy != select.GroupBy + + + ) { + + + return new SelectExpression(select.Type, select.Alias, columns, from, where, orderBy, groupBy); + + + } + + + return select; + + +} + + + + + +I also added an AggregateExpression class. It represents a call to an aggregate operator. 
+ + + + + + internal enum AggregateType { + + + Count, + + + Min, + + + Max, + + + Sum, + + + Average + + + } + + + + + + internal class AggregateExpression : Expression { + + + AggregateType aggType; + + + Expression argument; + + + internal AggregateExpression(Type type, AggregateType aggType, Expression argument) + + + : base((ExpressionType)DbExpressionType.Aggregate, type) { + + + this.aggType = aggType; + + + this.argument = argument; + + + } + + + internal AggregateType AggregateType { + + + get { return this.aggType; } + + + } + + + internal Expression Argument { + + + get { return this.argument; } + + + } + + + } + + + + + +Now I at least know what to turn those aggregates into. But how do I get these aggregate expressions into the right place in the query tree? What if I did nothing and just let the aggregates fall where they may? What would it look like? Would it even be legal SQL? + + + + + +SELECT MAX(t5.OrderID) + + +FROM ( + + + SELECT t0.CustomerID + + + FROM Orders AS t0 + + + GROUP BY t0.CustomerID + + + ) AS t5 + + + + + +Oh, no. Danger Will Robinson! This is not legal SQL. Where does OrderID even come from? It’s not even projected out of the sub-query with the GROUP BY. And even if it were somehow projected, the result of the query would be all wrong. It would give me a single maximum instead of one for each group. Getting group-by & aggregates to work together is going to be tricky. + + + + + +Is it even possible to solve it and maintain my layering approach? What about correlated sub-queries? I could form a sub-query that joins back to the original query based on the grouping expressions and have it contain the aggregate. + + + + + +SELECT ( + + + SELECT MAX(t2.OrderID) + + + FROM Orders AS t2 + + + WHERE ((t2.CustomerID IS NULL AND t5.CustomerID IS NULL) OR (t2.CustomerID = t5.CustomerID)) + + + ) AS c0 + + +FROM ( + + + SELECT t0.CustomerID + + + FROM Orders AS t0 + + + GROUP BY t0.CustomerID + + + ) AS t5 + + + + + +Now that at least executes on the server. It might not be pretty and likely not efficient unless the server is really good at unscrambling my query, but it does execute and produce the correct results. So it is technically legal. But surely I can do better! + + +The GroupBy Operator + + + + +By now you are thinking, OMG, he’s going to stick us with a wacky solution like he did with nested queries! Never fear, my friend. Do not look at the man behind the current. Cast your gaze ahead and all will be made clear. In my hands, I have the GroupBy operator. Watch as it transforms into a SQL query inside the QueryBinder! + + + + + + protected override Expression VisitMethodCall(MethodCallExpression m) { + + + if (m.Method.DeclaringType == typeof(Queryable) || + + + m.Method.DeclaringType == typeof(Enumerable)) { + + + switch (m.Method.Name) { + ... + + + case "GroupBy": + + + if (m.Arguments.Count == 2) { + + + return this.BindGroupBy( + + + m.Arguments[0], + + + (LambdaExpression)StripQuotes(m.Arguments[1]), + + + null, + + + null + + + ); + + + } + + + else if (m.Arguments.Count == 3) { + + + return this.BindGroupBy( + + + m.Arguments[0], + + + (LambdaExpression)StripQuotes(m.Arguments[1]), + + + (LambdaExpression)StripQuotes(m.Arguments[2]), + + + null + + + ); + + + } + + + else if (m.Arguments.Count == 4) { + + + return this.BindGroupBy( + + + m.Arguments[0], + + + (LambdaExpression)StripQuotes(m.Arguments[1]), + + + (LambdaExpression)StripQuotes(m.Arguments[2]), + + + (LambdaExpression)StripQuotes(m.Arguments[3]) + + + ); + + + } + + + break; + ... 
+ + + } + + + } + + + return base.VisitMethodCall(m); + + + } + + + + + +As you can see, I’m handling three different variations of GroupBy. The first one simply takes a single keySelector lambda (the thing to group by). It is supposed to return a sequence of grouping’s that each have a group key value and a collection of elements that make up that group. The second form is the same as the first except it also includes an elementSelector lambda that allows me to specify a map between an item in the source sequence and its shape of the elements in the group. The last form includes a resultSelector lambda that allows me to specify what the key and group collection turn into, instead of assuming they always form into instances of IGrouping. With all these variations, the translation of group-by is going to get a bit involved, so bear with me. + + + + + +Get ready. Get set. Open your eyes. Close them. Now open them again and really look. Go! + + + + + + protected virtual Expression BindGroupBy(Expression source, LambdaExpression keySelector, LambdaExpression elementSelector, LambdaExpression resultSelector) { + + + ProjectionExpression projection = this.VisitSequence(source); + + + + + + this.map[keySelector.Parameters[0]] = projection.Projector; + + + Expression keyExpr = this.Visit(keySelector.Body); + + + + + + Expression elemExpr = projection.Projector; + + + if (elementSelector != null) { + + + this.map[elementSelector.Parameters[0]] = projection.Projector; + + + elemExpr = this.Visit(elementSelector.Body); + + + } + + + + + + // Use ProjectColumns to get group-by expressions from key expression + + + ProjectedColumns keyProjection = this.ProjectColumns(keyExpr, projection.Source.Alias, projection.Source.Alias); + + + IEnumerable groupExprs = keyProjection.Columns.Select(c => c.Expression); + + + + + + // make duplicate of source query as basis of element subquery by visiting the source again + + + ProjectionExpression subqueryBasis = this.VisitSequence(source); + + + + + + // recompute key columns for group expressions relative to subquery (need these for doing the correlation predicate) + + + this.map[keySelector.Parameters[0]] = subqueryBasis.Projector; + + + Expression subqueryKey = this.Visit(keySelector.Body); + + + + + + // use same projection trick to get group-by expressions based on subquery + + + ProjectedColumns subqueryKeyPC = this.ProjectColumns(subqueryKey, subqueryBasis.Source.Alias, subqueryBasis.Source.Alias); + + + IEnumerable subqueryGroupExprs = subqueryKeyPC.Columns.Select(c => c.Expression); + + + Expression subqueryCorrelation = this.BuildPredicateWithNullsEqual(subqueryGroupExprs, groupExprs); + + + + + + // compute element based on duplicated subquery + + + Expression subqueryElemExpr = subqueryBasis.Projector; + + + if (elementSelector != null) { + + + this.map[elementSelector.Parameters[0]] = subqueryBasis.Projector; + + + subqueryElemExpr = this.Visit(elementSelector.Body); + + + } + + + + + + // build subquery that projects the desired element + + + string elementAlias = this.GetNextAlias(); + + + ProjectedColumns elementPC = this.ProjectColumns(subqueryElemExpr, elementAlias, subqueryBasis.Source.Alias); + + + ProjectionExpression elementSubquery = + + + new ProjectionExpression( + + + new SelectExpression(TypeSystem.GetSequenceType(subqueryElemExpr.Type), elementAlias, elementPC.Columns, subqueryBasis.Source, subqueryCorrelation), + + + elementPC.Projector + + + ); + + + + + + string alias = this.GetNextAlias(); + + + + // make it possible to tie aggregates 
back to this group-by + + + GroupByInfo info = new GroupByInfo(alias, elemExpr); + + + this.groupByMap.Add(elementSubquery, info); + + + + + + Expression resultExpr; + + + if (resultSelector != null) { + + + Expression saveGroupElement = this.currentGroupElement; + + + this.currentGroupElement = elementSubquery; + + + // compute result expression based on key & element-subquery + + + this.map[resultSelector.Parameters[0]] = keyProjection.Projector; + + + this.map[resultSelector.Parameters[1]] = elementSubquery; + + + resultExpr = this.Visit(resultSelector.Body); + + + this.currentGroupElement = saveGroupElement; + + + } + + + else { + + + // result must be IGrouping + + + resultExpr = Expression.New( + + + typeof(Grouping<,>).MakeGenericType(keyExpr.Type, subqueryElemExpr.Type).GetConstructors()[0], + + + new Expression[] { keyExpr, elementSubquery } + + + ); + + + } + + + + + + ProjectedColumns pc = this.ProjectColumns(resultExpr, alias, projection.Source.Alias); + + + + + + // make it possible to tie aggregates back to this group-by + + + Expression projectedElementSubquery = ((NewExpression)pc.Projector).Arguments[1]; + + + this.groupByMap.Add(projectedElementSubquery, info); + + + + + + return new ProjectionExpression( + + + new SelectExpression(TypeSystem.GetSequenceType(resultExpr.Type), alias, pc.Columns, projection.Source, null, null, groupExprs), + + + pc.Projector + + + ); + + + } + + + + + +It starts off easy, following the same binding pattern as the other operators. First I bind the key selector and get a key expression. Then if I have an element selector I bind that too, to get an element expression, otherwise I use the projector expression from the incoming source as the element expression. Got it? + + + + + +Now I have to figure out what the grouping expressions are going to be. You might be thinking I already know what they are, and there’s only one of them. It’s the key expression, right? Well, not exactly. It turns out its common to need to group by more than one property or column of data. The only way to do that with LINQ is to construct an anonymous type with multiple parts. So a key might be an anonymous type with multiple interesting fields. How do I turn that into one or more expression that can be evaluated as part of a SQL group by clause? + + + + + +It turns out I already have a handy function that does what I want. If I treat the key expression they same way I treat a projection expression, I can use the ColumnProjector to turn the expression into a list of column declarations, each with its own scalar expression. Then I can simply throw away the column declarations and keep the expressions that define the columns and voila I have grouping expressions! Don’t try this at home. Err, I mean, go ahead and try this at home. It’s easy. Watch, I’ll do it again in a moment. + + + + + +Moving along we come to some code trying to compute a ‘basis’. What’s that? Well, it turns out SQL doesn’t really support a GroupBy operator the way it is defined in LINQ. The LINQ operator, by default, produces a sequence of groups with each group containing the elements that got designated as part of that group. SQL group-by can only produce aggregate computations over the groups, not the groups themselves. Oh, dear, what to do, what to do? + + + + + +The only way to solve this problem is to form a self join back to the data I want. 
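
To make the shape of that self join concrete before walking through the code, here it is as a LINQ-to-Objects analogy. This is purely illustrative: 'orders' stands in for the Orders table, and the provider of course builds this as SQL over the expression tree, not as objects in memory.

    // pair each distinct grouping key with the original rows that share that key,
    // recovering the group members that SQL's GROUP BY alone would throw away
    var groups =
        orders.Select(o => o.CustomerID).Distinct()
              .Select(key => new
              {
                  Key = key,
                  Elements = orders.Where(o => o.CustomerID == key)   // the self join
              });
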
If I take the result of the group-by and join it back to the original query (everything up until the group-by) matching group-by key values I get all the data back that SQL was not going to let me have. That’s what the ‘subqueryBasis’ is. It’s a rebinding of the original input sequence. Then we rebind the key again and do my ColumnProjector trick to get a new collection of group expressions. Now I can use both sets to form a predicate for a join. + + + + + +Next, you can see that if I did specify an element selector I now bind it with respect to this new ‘basis’ and use the resulting expression as the projection coming out of this new branch of the query tree. + + + + + +Alas, I am not yet done. What about that third argument? If it’s specified then I have another lambda I have to do something with and if it is not specified then I need to figure out how I’m going to return an IGrouping. + + + + + +Okay, that doesn’t turn out to be too difficult. I created a Grouping class that I can use in case I need to actually return this stuff all the way back to the calling code. + + + + + + public class Grouping : IGrouping { + + + TKey key; + + + IEnumerable group; + + + + + + public Grouping(TKey key, IEnumerable group) { + + + this.key = key; + + + this.group = group; + + + } + + + + + + public TKey Key { + + + get { return this.key; } + + + } + + + + + + public IEnumerator GetEnumerator() { + + + return this.group.GetEnumerator(); + + + } + + + + + + IEnumerator IEnumerable.GetEnumerator() { + + + return this.group.GetEnumerator(); + + + } + + + } + + + + + +Even if I don’t, I can still use this type as a way to encode the fact that I’m returning a grouping, which is what I do. The final result of the call to GroupBy (unless a result selector was supplied) is a sequence of these groupings, so the projector expression resulting from this operation is simply a construction of one of these Grouping types via a NewExpression node. If a result selector is supplied that lambda determines the shape of the result so I just bind that the same way I bind all the other lambdas. In both cases, I’m using the newly created extra ‘basis’ query that projects out elements to form the group collection that is either transformed with the result selector or returned to the caller. + + + + + +That’s about it; clean and neat. Right? What about those other few lines of code referring to ‘GroupByInfo’ and ‘currentGroupElement’? That, sir, is the man behind the current. Now I’ll show you a little of him. + + + + + +Later, when I’m binding aggregates I will need to invent a way to tie those aggregates back to this group-by; if they belong to this group-by. So, later, I’m going to need to know something about this group-by (and certainly all group-by’s that might have already been encountered) to be able to pick the right one. This is what the GroupByInfo is. It’s the bits of information I hope will be useful later to correlate the two pieces together. If we move along now to aggregate binding I’ll show you how it’s used. + + +Aggregate Binding + + + + +Aggregates get treated just like any other operator. + + + + + + protected override Expression VisitMethodCall(MethodCallExpression m) { + + + if (m.Method.DeclaringType == typeof(Queryable) || + + + m.Method.DeclaringType == typeof(Enumerable)) { + ... 
+ + + switch (m.Method.Name) { + + + case "Count": + + + case "Min": + + + case "Max": + + + case "Sum": + + + case "Average": + + + if (m.Arguments.Count == 1) { + + + return this.BindAggregate(m.Arguments[0], m.Method, null); + + + } + + + else if (m.Arguments.Count == 2) { + + + LambdaExpression selector = (LambdaExpression)StripQuotes(m.Arguments[1]); + + + return this.BindAggregate(m.Arguments[0], m.Method, selector); + + + } + + + break; + + + } + + + } + + + return base.VisitMethodCall(m); + + + } + + + + + +The actual binding has a few extra bits that help make the job a bit easier. + + + + + + Expression currentGroupElement; + + + + + + class GroupByInfo { + + + internal string Alias { get; private set; } + + + internal Expression Element { get; private set; } + + + internal GroupByInfo(string alias, Expression element) { + + + this.Alias = alias; + + + this.Element = element; + + + } + + + } + + + + + + private AggregateType GetAggregateType(string methodName) { + + + switch (methodName) { + + + case "Count": return AggregateType.Count; + + + case "Min": return AggregateType.Min; + + + case "Max": return AggregateType.Max; + + + case "Sum": return AggregateType.Sum; + + + case "Average": return AggregateType.Average; + + + default: throw new Exception(string.Format("Unknown aggregate type: {0}", methodName)); + + + } + + + } + + + + + + private bool HasPredicateArg(AggregateType aggregateType) { + + + return aggregateType == AggregateType.Count; + + + } + + + + + + private Expression BindAggregate(Expression source, MethodInfo method, LambdaExpression argument, bool isRoot) { + + + Type returnType = method.ReturnType; + + + AggregateType aggType = this.GetAggregateType(method.Name); + + + bool hasPredicateArg = this.HasPredicateArg(aggType); + + + + + + if (argument != null && hasPredicateArg) { + + + // convert query.Count(predicate) into query.Where(predicate).Count() + + + source = Expression.Call(typeof(Queryable), "Where", method.GetGenericArguments(), source, argument); + + + argument = null; + + + } + + + + + + ProjectionExpression projection = this.VisitSequence(source); + + + + + + Expression argExpr = null; + + + if (argument != null) { + + + this.map[argument.Parameters[0]] = projection.Projector; + + + argExpr = this.Visit(argument.Body); + + + } + + + else if (!hasPredicateArg) { + + + argExpr = projection.Projector; + + + } + + + + + + string alias = this.GetNextAlias(); + + + var pc = this.ProjectColumns(projection.Projector, alias, projection.Source.Alias); + + + Expression aggExpr = new AggregateExpression(returnType, aggType, argExpr); + + + Type selectType = typeof(IEnumerable<>).MakeGenericType(returnType); + + + SelectExpression select = new SelectExpression(selectType, alias, new ColumnDeclaration[] { new ColumnDeclaration("", aggExpr) }, projection.Source, null); + + + + + + if (isRoot) { + + + ParameterExpression p = Expression.Parameter(selectType, "p"); + + + LambdaExpression gator = Expression.Lambda(Expression.Call(typeof(Enumerable), "Single", new Type[] { returnType }, p), p); + + + return new ProjectionExpression(select, new ColumnExpression(returnType, alias, ""), gator); + + + } + + + + + + SubqueryExpression subquery = new SubqueryExpression(returnType, select); + + + + + + // if we can find the corresponding group-info we can build a special AggregateSubquery node that will enable us to + + + // optimize the aggregate expression later using AggregateRewriter + + + GroupByInfo info; + + + if (!hasPredicateArg && 
this.groupByMap.TryGetValue(projection, out info)) { + + + // use the element expression from the group-by info to rebind the argument so the resulting expression is one that + + + // would be legal to add to the columns in the select expression that has the corresponding group-by clause. + + + if (argument != null) { + + + this.map[argument.Parameters[0]] = info.Element; + + + argExpr = this.Visit(argument.Body); + + + } + + + else { + + + argExpr = info.Element; + + + } + + + aggExpr = new AggregateExpression(returnType, aggType, argExpr); + + + + + + // check for easy to optimize case. If the projection that our aggregate is based on is really the 'group' argument from + + + // the query.GroupBy(xxx, (key, group) => yyy) method then whatever expression we return here will automatically + + + // become part of the select expression that has the group-by clause, so just return the simple aggregate expression. + + + if (projection == this.currentGroupElement) + + + return aggExpr; + + + + + + return new AggregateSubqueryExpression(info.Alias, aggExpr, subquery); + + + } + + + + + + return subquery; + + + } + + + + + +Binding aggregates has its own slew of caveats that drive how this binding function is formed. Aggregates can be specified with our without an argument. Normally, if an aggregate is not specified with an argument it is implied that the aggregate is operating over the element of the sequence. + + + + + +For example, it is legal to write: + + + + + + db.Orders.Select(o => o.OrderID).Max(); + + + + + +It has the same meaning as: + + + + + + db.Orders.Max(o => o.OrderID); + + + + + +So if an aggregate invocation has no argument, I can simply pick up and use the projector expression coming out of the source sequence, right? Well, not when the aggregate is Count. The LINQ Count aggregate takes a predicate as an argument not a value expression. So I definitely don’t want to treat Count with an argument the same way I treat other aggregates. For Count, I want to take that argument and turn it into a WHERE clause. I do that near the top of the BindAggregate function by tacking on a call to Queryable.Where right onto the source before I even translate it. + + + + + +The next thing I do is actually create the AggregateExpression and stuff it into a column of a new SelectExpression. Whoa! Stop right there. Didn’t I want to avoid this? Don’t I want this aggregate ending up alongside the group-by? Yes, but not just yet. + + + + + +What happens is that the source expression, once translated, already is a correlated sub-query! How did that happen? Take a moment to consider what the source of the aggregate really is. Go ahead, I’ll wait. Got it yet? It’s the silly ‘g’. Ugh. + + + + + +Remember this query? + + + + + +var query = db.Orders.GroupBy(o => o.CustomerID).Select(g => g.Max(o => o.OrderID)); + + + + + +Now do you see it? The ‘g’ is the parameter to the Select operator. Where does it come from? Yes, that’s right. It comes from the output of the GroupBy operator. And what was that? Yes, right again. It was a sequence of Grouping instances. So a single ‘g’ is a single Grouping instance. Which itself is a collection of grouped items, which was formed using that basis query back in the BindGroupBy method, which was a self-join back to the original query with a join condition based on matching up grouping key expressions. Yes, that query, and yes, replicated here, in the context of an aggregate expression, it is indistinguishable from a correlated sub-query. 
So I simply add my extra aggregate expression on top and, presto change-o, I am finished! + + + + + +Well, not really. Skipping over this next bit about “isRoot” you can see that I create a SubqueryExpression. I have not talked about this yet, but this is how I represent a true correlated sub-query. + + + + + +And here it is in all its majestic glory. + + + + + + internal class SubqueryExpression : Expression { + + + SelectExpression select; + + + internal SubqueryExpression(Type type, SelectExpression select) + + + : base((ExpressionType)DbExpressionType.Subquery, type) { + + + this.select = select; + + + } + + + internal SelectExpression Select { + + + get { return this.select; } + + + } + + + } + + + + + +Okay, that was a tad anti-climatic. Get over it. It comes in handy later. + + + + + +So what’s the rest of the method doing? Ah, yes, you’ve seen it. This is where I look to see if I can tie the aggregate back to the group-by that it’s related to. I do this by looking up the GroupByInfo based on the source projection; again that’s the ‘g’. If the ‘g’ translates into the same expression that I created back in the BindGroupBy method then I know it’s the same one. Back in that method I stored a memento of the group-by keyed by this very same expression, and now I’m pulling this information back up. + + + + + +What you don’t see me doing here is actually going back and changing the SelectExpression that contains the group-by. Believe me; I really want to, but not just yet. Instead, what I do now is drop a breadcrumb into the query tree so that I can later follow the trail in one sweeping tree rewriting operation. To do this, I have invented yet another expression node to act as this breadcrumb. + + + + + + internal class AggregateSubqueryExpression : Expression { + + + string groupByAlias; + + + Expression aggregateInGroupSelect; + + + SubqueryExpression aggregateAsSubquery; + + + internal AggregateSubqueryExpression(string groupByAlias, Expression aggregateInGroupSelect, SubqueryExpression aggregateAsSubquery) + + + : base((ExpressionType)DbExpressionType.AggregateSubquery, aggregateAsSubquery.Type) + + + { + + + this.aggregateInGroupSelect = aggregateInGroupSelect; + + + this.groupByAlias = groupByAlias; + + + this.aggregateAsSubquery = aggregateAsSubquery; + + + } + + + internal string GroupByAlias { get { return this.groupByAlias; } } + + + internal Expression AggregateInGroupSelect { get { return this.aggregateInGroupSelect; } } + + + internal SubqueryExpression AggregateAsSubquery { get { return this.aggregateAsSubquery; } } + + + } + + + + + +What this node does is hold onto two possible futures. Either, the sweeping rewrite is going to move the aggregate expression back into the select with the group-by or its going to abandon all hope and fall back to the correlated sub-query approach. That’s why this node holds onto two different expressions. One is the sub-query node I’ve already created. The other is an expression that is fit and ready to become a citizen of the group. I get this expression by simply rebinding the argument relative to the element expression I stuffed into the GroupByInfo when binding the GroupBy. Now I stick both of the variations into a AggregateSubqueryExpression (reminding myself that unless I rewrite this the aggregate is going to be interpreted as a correlated subquery.) + + + + + +What about ‘currentGroupElement’? How does this factor in? Recall that there was a form of GroupBy that took as an argument the result selector. 
This is not the form that C# converts query expressions into (but VB does when you use the Aggregate clause). If the aggregate expression is being bound as part of the result selector then whatever expression I return from this method is going to end up in the same select expression with the GroupBy, no need to optimize later. So if this is the case I simply have the method return the optimal aggregate expression and forget all about the sub-query nonsense. + + + + + +That just leaves us with ‘isRoot.’ Explaining isRoot is simple. If the aggregate operator itself is the root of the tree, and in this case I mean with operator on top, or the ‘last’ operator, then it’s a very different kind of aggregate than normal. It is not associated with a group-by. It is a top-level aggregate. + + + + + +I get these when I type things like: + + + + + + db.Orders.Max(o => o.OrderID); + + + + + +Now instead of returning a sequence of aggregates, I need only one for the whole sequence. However, the query provider still gets back a sequence of rows from ADO and the ProjectionReader is still an IEnumerable of whatever the returned element type is. In this case it is IEnumerable. Yet, I can clearly not return that IEnumerable as the result of the execution of the query. I need just a single int. So I need a way to tell the provider how to get the one true int out of the sequence. That’s what I’m doing here. I’ve added an extra lambda to the ProjectionExpression that tells the provider later how to aggregate the sequence back into a singleton. The ProjectionExpression already has a ‘projector’ that tells the provider how to turn DataReader rows into objects. Now is also has an ‘aggregator’ that tells the provider how to turn the entire result set into a single value. The aggregator I construct calls the Enumerable.Single method. + + + + + +That’s really how it all works, except for the real magic, the part where I wave a wand and polymorph the ugly tree into the beautiful one. That’s the aggregate rewriter. + + +Aggregate Rewriting + + + + +Are you ready for it? This is the part that does all the truly maligned and dishonest shuffling of bits. 
+ + + + + + internal class AggregateRewriter : DbExpressionVisitor { + + + ILookup lookup; + + + Dictionary map; + + + + + + private AggregateRewriter(Expression expr) { + + + this.map = new Dictionary(); + + + this.lookup = AggregateGatherer.Gather(expr).ToLookup(a => a.GroupByAlias); + + + } + + + + + + internal static Expression Rewrite(Expression expr) { + + + return new AggregateRewriter(expr).Visit(expr); + + + } + + + + + + protected override Expression VisitSelect(SelectExpression select) { + + + select = (SelectExpression)base.VisitSelect(select); + + + if (lookup.Contains(select.Alias)) { + + + List aggColumns = new List(select.Columns); + + + foreach (AggregateSubqueryExpression ae in lookup[select.Alias]) { + + + string name = "agg" + aggColumns.Count; + + + ColumnDeclaration cd = new ColumnDeclaration(name, ae.AggregateInGroupSelect); + + + this.map.Add(ae, new ColumnExpression(ae.Type, ae.GroupByAlias, name)); + + + aggColumns.Add(cd); + + + } + + + return new SelectExpression(select.Type, select.Alias, aggColumns, select.From, select.Where, select.OrderBy, select.GroupBy); + + + } + + + return select; + + + } + + + + + + protected override Expression VisitAggregateSubquery(AggregateSubqueryExpression aggregate) { + + + Expression mapped; + + + if (this.map.TryGetValue(aggregate, out mapped)) { + + + return mapped; + + + } + + + return this.Visit(aggregate.AggregateAsSubquery); + + + } + + + + + + class AggregateGatherer : DbExpressionVisitor { + + + List aggregates = new List(); + + + private AggregateGatherer() { + + + } + + + + + + internal static List Gather(Expression expression) { + + + AggregateGatherer gatherer = new AggregateGatherer(); + + + gatherer.Visit(expression); + + + return gatherer.aggregates; + + + } + + + + + + protected override Expression VisitAggregateSubquery(AggregateSubqueryExpression aggregate) { + + + this.aggregates.Add(aggregate); + + + return base.VisitAggregateSubquery(aggregate); + + + } + + + } + + + } + + + + + +That was it. No, really. That was the hard part. Look closer, you’ll see where it actually does something. + + + + + +The AggregateRewriter is actually two visitors; the primary rewriting visitor and a secondary gatherer that builds a collection of all the reachable aggregate subqueries. See, I told you having this new node would come in handy. + + + + + +What is actually going on is first all the aggregate expressions are gathered up and formed into a lookup table keyed off the group-by alias. Then during the rewrite, when I get to the select expression with that alias I simply tack on all the aggregate expressions by inventing columns for them. On the side I keep a mapping between all the original aggregate sub-queries and a column-expression that references the newly declared column. A new select expression is created with the new set of columns, triggering a cascade of tree rebuilding (keeping everything nice and immutable.) Then, when the aggregate sub-query is visited later, if it’s found in the map I simply replace it with the new column expression. + + + + + +And now the aggregate expressions are in the right place. Peace, harmony and all is well with the world. + + + + + +I can see you are now impressed with my mad wizard-like skillz. + + + + + +Let’s put it all together. I add the aggregate rewriter to the pipeline of visitors we have in DbQueryProvider.Translate(). 
+ + + + + + private TranslateResult Translate(Expression expression) { + + + ProjectionExpression projection = expression as ProjectionExpression; + + + if (projection == null) { + + + expression = Evaluator.PartialEval(expression, CanBeEvaluatedLocally); + + + expression = QueryBinder.Bind(this, expression); + + + expression = AggregateRewriter.Rewrite(expression); + + + expression = OrderByRewriter.Rewrite(expression); + + + expression = UnusedColumnRemover.Remove(expression); + + + expression = RedundantSubqueryRemover.Remove(expression); + + + projection = (ProjectionExpression)expression; + + + } + + + string commandText = QueryFormatter.Format(projection.Source); + + + string[] columns = projection.Source.Columns.Select(c => c.Name).ToArray(); + + + LambdaExpression projector = ProjectionBuilder.Build(projection.Projector, projection.Source.Alias, columns); + + + return new TranslateResult(commandText, projector, projection.Aggregator); + + + } + + + + + +Notice the new ‘aggregator’ being passed along in the result. The Execute method now makes use of it. + + + + private object Execute(TranslateResult query) { + + + Delegate projector = query.Projector.Compile(); + + + + + + if (this.log != null) { + + + this.log.WriteLine(query.CommandText); + + + this.log.WriteLine(); + + + } + + + + + + DbCommand cmd = this.connection.CreateCommand(); + + + cmd.CommandText = query.CommandText; + + + DbDataReader reader = cmd.ExecuteReader(); + + + + + + Type elementType = TypeSystem.GetElementType(query.Projector.Body.Type); + + + + + + IEnumerable sequence = (IEnumerable) Activator.CreateInstance( + + + typeof(ProjectionReader<>).MakeGenericType(elementType), + + + BindingFlags.Instance | BindingFlags.NonPublic, null, + + + new object[] { reader, projector, this }, + + + null + + + ); + + + + + + if (query.Aggregator != null) { + + + Delegate aggregator = query.Aggregator.Compile(); + + + AggregateReader aggReader = (AggregateReader) Activator.CreateInstance( + + + typeof(AggregateReader<,>).MakeGenericType(elementType, query.Aggregator.Body.Type), + + + BindingFlags.Instance | BindingFlags.NonPublic, null, + + + new object[] { aggregator }, + + + null + + + ); + + + return aggReader.Read(sequence); + + + } + + + else { + + + return sequence; + + + } + + + } + + +Taking it for a Spin + + + + +Let’s give the provider a big soupy group-by ball of aggravation and see how it behaves. + + + + + + var query = from o in db.Orders + + + group o by o.CustomerID into g + + + select new { + + + Customer = g.Key, + + + Total = g.Sum(o => o.OrderID), + + + Min = g.Min(o => o.OrderID), + + + Avg = g.Average(o => o.OrderID) + + + }; + + + + + +This query translates into: + + + + + +SELECT t0.CustomerID, SUM(t0.OrderID) AS agg1, MIN(t0.OrderID) AS agg2, AVG(t0.OrderID) AS agg3 + + +FROM Orders AS t0 + + +GROUP BY t0.CustomerID + + + + + +The results of execution are: + + + + + +{ Customer = ALFKI, Total = 64835, Min = 10643, Avg = 10805 } + + +{ Customer = ANATR, Total = 42618, Min = 10308, Avg = 10654 } + + +{ Customer = ANTON, Total = 74195, Min = 10365, Avg = 10599 } + + +{ Customer = AROUT, Total = 139254, Min = 10355, Avg = 10711 } + + +... + + +{ Customer = WILMK, Total = 75650, Min = 10615, Avg = 10807 } + + +{ Customer = WOLZA, Total = 75595, Min = 10374, Avg = 10799 } + + + + + +Don’t ask me why I think it is interesting to total up the OrderID’s. I’m just weird. + + + + + +That’s it. Really. This time I mean it. I’m done. If you want more take a look at the source code attached. 
All the gory details are in there, along with a bazillion minor bug fixes I made since the last installment. + + + + + +DISCLAIMER: The provided solution for GroupBy works well for many simple cases, such as when the aggregate expressions immediately follow the GroupBy operator. It is possible to write queries that produce the correlated sub-query form. This is normal. There is definitely still room for improvement. + + + + + +Now, get out from under that shady tree and get back to writing code. You know I am. + diff --git a/docs/blog/building-part-11.md b/docs/blog/building-part-11.md new file mode 100644 index 0000000..833e4f8 --- /dev/null +++ b/docs/blog/building-part-11.md @@ -0,0 +1,74 @@ +# LINQ: Building an IQueryable Provider – Part XI: More of Everything + +Matt Warren - MSFT; July 14, 2008 + +--------------------------------- + +This is the eleventh in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you’ll want to do so before proceeding, or at least before proceeding to copy the code into your own project and telling your boss you single-handedly solved the data layer problem over the weekend. + + +Complete list of posts in the Building an IQueryable Provider series + + +No excuses for a late post this time. In my estimation, not only am I early but I have arrived with abundance. This is not my ordinary, keep it to the basics, slow and steady post about adding that next incremental feature to a feature-starved sample program. This is the Big Kahuna! This is the post where I finally go over the edge, lose my cool and let loose on the code. No more Mr. Nice guy. Now, I’m really taking it to the cleaners. + + +There’s so much goodness here I don’t even know where to start. Even if I did, how could I possibly describe it all in one little post? I’m not even going to try. You’re going to have to download the source to get the real truth. A link is located at the bottom of the post. Click it. Feel the power flow over the internet, through the wires and onto your screen. Feel the radiance seep in through your eyes and take hold of your brain with the vice-like grip of a big-time wrestler. + + +So just what have I been cooking? Let’s take a look at the highlights. + + +More Query Operators + +That’s right; a lot more. Don’t believe me? Look at the code and see for yourself. Too busy? Boss breathing down your neck. Okay, here’s the list. + + +Distinct - Not only is there translation for this, but Distinct also makes an appearance inside most aggregates! AVG(DISTINCT t0.Price) anyone! + + +Skip & Take – Both work and work well together using the ROW_NUMBER & BETWEEN combo that gets incredible performance. + + +First, FirstOrDefault, Single & SingleOrDefault – No, these are not just the same thing. They really do behave differently. + + +Any, All & Contains - Not only do these guys work, but all three work with local collections. Use collection.Any(xxx) to get any predicate expression expanded into a series of OR conditions over the input set. + + +Framework Methods and Properties + + +Hallelujahs, brothers! Now you can write queries that reference simple String, DateTime, Decimal, and Math operations, and get them translated into equivalent SQL. I’ve not implemented every possible method but you’ll find a lot there. There are so many I can’t even list them all here. Rest assured you’ll get classics like string StartsWith, EndsWith and Contains and every variation of Equals and Compare that the framework can throw at me. 
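
As a quick illustration (my own example, not from the post), a predicate that leans on these framework methods:

    var query = db.Customers.Where(c => c.ContactName.StartsWith("Maria")
                                     && c.Phone.Contains("555"));

should now come out the other side as something along the lines of `WHERE (t0.ContactName LIKE 'Maria%') AND (t0.Phone LIKE '%555%')`. The exact text is the formatter's business; the point is that these method calls no longer fall over during translation.
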
+ + +Parameters + +That’s what I said, parameters. No more SQL injection attacks in the sample code, I’ve finally gotten around to adding parameter support. Take a look, it’s all rather straight forward; a new node type NamedValueExpression, a new visitor Parameterizer and a few extra lines of code to create and assign parameter values at execution time. Not every argument is parameterized either. Simple numeric values are left as literals. + + +Simpler Queries + +The RedundantSubqueryRemover has been enhanced to know how to merge more types of SelectExpression’s together. Now, you’ll get even less layers of subqueries. + + +Compiled Queries + +OMG! Can it be so? Yes, compiled queries too. How is it possible? It took a little refactoring of how queries are executed. There is now a new low-level Execute method on DbQueryProvider that takes a translated command, parameters and a function that converts a DbDataReader into a result (this could be sequence or a single value.) Given that and LINQ Expression trees themselves, it is easy to see how I can turn a request to execute a LambdaExpression into constructing a function that when called, calls down to the low-level execute method with pre-translated SQL queries and a pre-translated object reader. + + +Unit Tests + +Okay, get back up into your chair. I realized how easy it is to lose control and find the need to roll around the floor in ecstasy. However, your co-workers are starting to wonder if they should call for medical assistance. + + +This does not mean I’m using TDD, or DDT or any TLA methodology; it just an apology really. The sample codebase has become so unwieldy that even I can’t trust myself anymore and have discovered so many prior features that I’ve broken in the process of doing bigger and better things that I literally broke down and wrote some simple unit tests. AND they are simple too. For the most part they prove 1) some LINQ operators are translated into the SQL I thought looked good at the time, and 2) the database server actually accepts these queries and does something with them. There are over 200 of these tests now. There is also a start at verifying actual semantics by looking at the results. I started a second bank of tests for these and so far only have a few compiled-query tests listed. More surely to come. + + + + + +Whew! I know that’s a lot to GROK, but take a look anyway. If you’ve been building your own provider and have been stuck on how to do some of these kinds of features (except for tests, shame on you if you haven’t got your own tests yet), now you can get moving again. + + +But WAIT, there’s more! I’m not done yet. Even with all this I’ve shown you today, I still have a few more ideas. More posts to come. \ No newline at end of file diff --git a/docs/blog/building-part-12.md b/docs/blog/building-part-12.md new file mode 100644 index 0000000..7309e98 --- /dev/null +++ b/docs/blog/building-part-12.md @@ -0,0 +1,642 @@ +# Building a LINQ IQueryable Provider – Part XII: Relationships and the IQToolkit + +Matt Warren - MSFT; November 17, 2008 + +------------------------------------- + +This is the twelfth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you probably were born yesterday. How could you possibly make sense of this post without any context at all? At least make an attempt. Sometimes I don't know why I bother. 
+ + +Complete list of posts in the Building an IQueryable Provider series + + + + + +It's been so long since I last posted we must be up to Web 5.0 by now. I suspect there are not any actual web developers left. Its all just twitter application programming and cybernetic mind melds now. Does anybody still write code? In a text based language? Without an electric shunt wired directly into your cerebral cortex? Nobody? + + +Sigh. I'm probably just talking to an empty internet; everyone's moved on to Planck Space. But since I've got nothing better to do, I might as well get on with it. + + +IQueryable Toolkit + +One of the first things you'll notice when you take a look at the source is that I changed it quite a bit. I moved code around, changed names gratuitously, added & removed classes and broke a lot of continuity with the prior versions. One of the biggest changes is that the code is no longer just a sample. All those internal classes are public, the project builds as a DLL, the tests are hosted separately and the namespace is no longer 'Sample.' Why, oh why would I do this? Simple. + + +I originally started the project to give developers a structured hands-on walk through of the construction of an IQueryable provider; hoping to inspire many of you to build your own. (Many of you have.) The source code was made available so you could learn from it and as a cheap way to get you to debug it for me. (Many of you haven't.) And so you could take from it what you would, copy a class here, steal a method there, etc, to make your job easier to get your own LINQ to XYZ up off the ground quicker. + + +Yet, I received so many requests to re-use the entire code base wholesale, that I now realize what you really want is a toolkit not just some sample code. As a toolkit it is still sample code. The sources are still there and you can take from it what you will. Or you can build it into a shiny DLL and then build your code on top of it. All the pieces I've built so far are publicly re-usable. Many of the classes have been enhanced for extensibility, so you can mix & match and override to your hearts content to build the system you want out of the pieces I already have. + + +How it breaks down + +At the top there is a namespace 'IQ'. I thought that it was cleaver & consise. Maybe someone out there will come up with a better one, something catchy, like a cold, and I'll feel compelled to catch it, fall under-the-weather for a few weeks and emerge blissfully sedated enough to go with it. Maybe. Under the 'IQ' namespace exists all the bits and pieces that are generally useful for any IQueryable impementation. This is where you'll find the basic ExpressionVisitor class, the generic Query and base QueryProvider class. You'll also get these: + + + +ExpressionComparer - compares two expression trees for equality. immensely useful. + + +ExpressionWriter - Get a better translation of that narly expression tree into a C#-ish syntax that you can actually read. Helps debugging a lot. + + +Grouping - It's that dead simple implementation of the LINQ IGrouping interface. + + +CompoundKey - A class that helps you represent compound key values. + + +ScopedDictionary - 'cuz I needed it and now you can have it too. Works sort of like a Dictionary but with nested scopes. + + +TypeHelper - Used to be called TypeSystem (how pretentious of me). Helps you know a few things about types. + + +DeferredList - An implementation of IList that enables deferred loading. (Bells are probably ringing in your head about now.) 
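
To give a flavor of that last one, a deferred list just holds onto a query and only runs it the first time somebody touches the contents. The sketch below is a minimal illustration of the idea, not the class that ships in the toolkit (the real one implements IList/IList<T> in full, with every member delegating the same way).

    public class DeferredList<T>
    {
        IEnumerable<T> source;   // the query that produces the items
        List<T> items;           // materialized on first use

        public DeferredList(IEnumerable<T> source)
        {
            this.source = source;
        }

        private List<T> Items
        {
            get
            {
                if (this.items == null)
                {
                    this.items = new List<T>(this.source);  // the query executes here
                }
                return this.items;
            }
        }

        public int Count { get { return this.Items.Count; } }
        public T this[int index] { get { return this.Items[index]; } }
    }
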
+ + +Nested inside this namespace is another one, called simply 'Data' (I know, original, huh). In here you'll find all those other classes that made up most of my previous posts like the provider itself, DbQueryProvider, that works on top of any DbConnection. To really understand what has gone on in my head you need to get into this class and look around. + + +Logically, it still does the same thing. It primarily just implements the IQueryable.Execute method, converting query expressions into SQL commands, executing them using the DbConnection and translating the results into objects. That's all still there, its just been re-arranged a bit and made a lot more extensible. + + +How it is extended is the interesting bit that flavors everything else to come. The DbQueryProvider's brains are now fully pluggable. I took a look at what was going on during execution and broke it down into logically atomic steps. The prior version used to first translate the query, build a generic reader and then execute and return. That's still sort-of what happens, but there's an even better break-down. + + +Queries are translated, but the translation goes through three distinct phases: + + + +1) mapping is applied + + +2) policy is enforced + + +3) the target query language has the last say. + + +Each of these steps is pluggable. How? There are three new classes to get acquainted to. + + + +QueryMapping - Defines information & rules to map an object model onto a database model + + +QueryPolicy - Defines information & rules to determine inclusion of related data & execution plans + + +QueryLanguage - Defines information & rules to adhere to a target language, including converting the query into text + + +Every DbQueryProvider is now supplied these three at runtime, each can be overriden by you to take control at different points in then translation and execution process. The mapping, policy & language can each override a portion of the query translation pipeline, injecting its own rules as rewrites of the query expression using the common DbExpression primitives provided. + + +In addition, the QueryLanguage controls the final translation of the tree to SQL text (or whatever target language is being used), and the QueryPolicy is invoked to build the execution plan. Note, the execution plan is not the simple object-reader of yore. It is now an entire runtime generated program for completely executing the database query & constructing objects out of the results. The QueryPolicy is allowed to do whatever final rewrite is necessary to turn the translated query expression into an executable piece of code. + + +Of course, by default, these three mostly do what was being done before, and if you care to look you'll see that the QueryLanguage's Format method (for turning queries into text) simply defers to the TSqlFormatter class (which used to be called QueryFormatter.) You can overidde this method and have it call your own formatter. You can even override the TSqlFormatter and make a few minor changes if your back end SQL is largely the same. + + +If you want to implement a different strategy to retrieving hierarchical results, you can inject your own logic both during query translation and you can take over the entire process of building the execution plan. The default BuildExecutionPlan on the QueryPolicy class defers to the ExecutionBuilder (used to be ProjectionBuilder in its former life), but you can change that and do it your own way. 
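
For instance, plugging in your own SQL dialect might look roughly like this. The member signature is inferred from the description above rather than copied from the toolkit, and MySqlFormatter is a made-up stand-in for whatever formatter you write, so treat it as a sketch.

    public class MyQueryLanguage : QueryLanguage
    {
        public override string Format(Expression expression)
        {
            // hand the translated tree to a dialect-specific formatter
            // instead of the default TSqlFormatter
            return MySqlFormatter.Format(expression);
        }
    }

The same pattern applies to QueryMapping and QueryPolicy: override the piece of the pipeline you care about and let the defaults handle the rest.
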
Of course, all the source is available, so you can build your version based off of the one that exists in the toolkit. + + +The crazy logic for inferring mapping information by simply using the names of types and members is retained, but walled off in a specific implementation of mapping called the ImplicitMapping. You can re-use this one if your needs are as meager as my demos. Or you can build your own. + + +Of course, now that you've had a moment to think about it, given so much flexibility, you'll inevitably start asking about what database providers are supported. (Still just TSQL.) You'll also want to know what complex mapping models I've invented and how they compare to ones used by ORM's. (Still just the implicit demo one.) And then you'll want to know what I did to improve reading data hierarchies because that N+1 strategy is just a loser. (Here's where you finally get something out of me.) + + +Relationships and Loading + +The query translation finally understands relationship properties. The QueryMapping understands the abstraction of mapping properties, these can map to columns or relationships (like associations) or whatever you want. The mapping decides what's possible. What have I implemented? The basic mapping understands association relationships only; those kinds of relationships that are made via a join matching one or more columns across two tables. + + +The interesting part (at least to me) is how relationships are dealt with during query translation and later during execution. Does a one to many relationship property residing in the output projection lead to extra queries at execution time or not? If so, how many? + + +I've implemented a few rewriters that deal with relationships. For example, the SingletonProjectionRewriter finds projections of singleton relationships (one to one or many to one) and folds them into the query itself using a server-side join. A singleton query doesn't change the cardinality of the results, so it is generally safe. Yet, what happens when you refer to the same singleton relationship property more than once? Do you get the same join over and over again? To hold back the flood of redundant joins I had to come up with another rewriter, one that would find the duplicate uses and get them to place nice together, but because this situation can happen more often than just as the side effect of doing the singleton rewrites, I wrote it to work on joins and not just nested projections. The RedundantJoinRemover looks for identical joins expressed in the same query and throws out all the extra ones. It does not look through into sub-queries, so its best to first removed redundant sub-query layers using the RedundantSubqueryRemover in hopes of forcing as many joins into the same layer. + + +The ClientJoinedProjectionRewriter attempts to do something very different. It looks to convert nested projections (ones that would become extra queries per object at execution time) into client-side joins. So you'll get one join per included relationship. This is not the "Big Join" strategy that LINQ to SQL uses. It precisely the strategy that I still consider inferior if I'm asked to ensure correctness of the results. So why did I choose to implement it? I'm not being asked to insure the correctness of results, nor am I being asked to ensure that large result sets can stream back from the server. I assume that the normal degree of ambiguity of results you get any time you attempt to get related results via more than one query is okay in this case too. 
I also assume that a user will choose to employ a transaction to get better consistency. You can also disable this one easily if you want. You can implement a policy that uses one or more other strategies. + + +This is what the ClientJoinedProjectionRewriter actually does. If the SelectExpression of a nested projection is constrained to the outer query via an equality comparison, like this column equals that parameter and so on I figure I can easily turn this into a client-side LINQ to object style join. So I rewrite the query so it restates all the significant bits of the queries that came before it. So if you wanted to retrieve customers in London and all their orders you'll get one query for the customers in London and then another query for all the orders for all the customers in London. The idea is that the second query is actually run first and all the objects are stuffed into a lookup table. When the query for customers executes, it picks up the matching orders right out of the lookup table. Neat. + + +Taking it for a Spin + +Given a simple query for customers in London: + +var query = from c in db.Customers + + where c.City == "London" + + select c; + +And, assuming I have a policy that tells the provider that the Orders property on customers is supposed to be included whenever I ask for customers. (The TestPolicy used by the supplied test project lets me pick and choose properties by name.) + + +When I start out, the provider first sees the query as LINQ expression tree: + +Query(Test.Customer).Where(c => (c.City = "London")) + + +After mapping the query expression now looks like this: + +Project( + + @"SELECT t0.City, t0.ContactName, t0.Country, t0.CustomerID, t0.Phone + + FROM [Customers] AS t0 + + WHERE (t0.City = 'London')", + + new Customer() { + + City = A0.Column("City"), + + ContactName = A0.Column("ContactName"), + + Country = A0.Column("Country"), + + CustomerID = A0.Column("CustomerID"), + + Phone = A0.Column("Phone") + + }, + + p => Queryable.AsQueryable(p) + +) + +It's got the SelectExpression (formatted by default as TSQL), followed by an expression that constructs a customer out of the query's result columns, and function that converts the presumed resulting IEnumerable into the expected type. + + +Next the QueryPolicy is asked to translate, so it applies rules like determining if a relationship property is included in the output or not. Since I set up the policy to automatically include orders, I get a new expression that looks like this: + +Project( + + @"SELECT t0.City, t0.ContactName, t0.Country, t0.CustomerID, t0.Phone + + FROM [Customers] AS t0 + + WHERE (t0.City = 'London')", + + new Customer() { + + City = A0.Column("City"), + + ContactName = A0.Column("ContactName"), + + Country = A0.Column("Country"), + + CustomerID = A0.Column("CustomerID"), + + Phone = A0.Column("Phone"), + + Orders = Project( + + @"SELECT t0.CustomerID, t0.OrderDate, t0.OrderID + + FROM [Orders] AS t0 + + WHERE (t0.CustomerID = t2.CustomerID)", + + new Order() { + + CustomerID = A1.Column("CustomerID"), + + OrderDate = A1.Column("OrderDate"), + + OrderID = A1.Column("OrderID") + + }, + + p => Enumerable.ToList(p) + + ) + + }, + + p => Queryable.AsQueryable(p) + +) + +Now I've got two queries, one nested inside the other. If nothing happens to this, the inner query will get executed once per row in the outer query results. + + +Next, the QueryPolicy also applies the the ClientJoinedProjectionRewriter. 
It attempts to convert nested projections into a query to fetch all the data (that will be executed once per outer query) with the resulting objects joined on the client machine. + +Project( + + @"SELECT t0.City, t0.ContactName, t0.Country, t0.CustomerID, t0.Phone + + FROM [Customers] AS t0 + + WHERE (t0.City = 'London')", + + new Customer() { + + City = A0.Column("City"), + + ContactName = A0.Column("ContactName"), + + Country = A0.Column("Country"), + + CustomerID = A0.Column("CustomerID"), + + Phone = A0.Column("Phone"), + + Orders = ClientJoin( + + OuterKey(A0.Column("CustomerID")), + + InnerKey(A1.Column("CustomerID")), + + Project( + + @"SELECT t0.CustomerID, t1.Test, t1.CustomerID AS CustomerID1, t1.OrderDate, t1.OrderID + + FROM [Customers] AS t0 + + OUTER APPLY ( + + SELECT t2.CustomerID, t2.OrderDate, t2.OrderID, 1 AS Test + + FROM [Orders] AS t2 + + WHERE (t2.CustomerID = t0.CustomerID) + + ) AS t1 + + WHERE (t0.City = 'London')", + + Outer( + + A1.Column("Test"), + + new Order() { + + CustomerID = A1.Column("CustomerID1"), + + OrderDate = A1.Column("OrderDate"), + + OrderID = A1.Column("OrderID") + + } + + ), + + p => Enumerable.ToList(p) + + ) + + ) + + }, + + p => Queryable.AsQueryable(p) + +) + +Now I have an embedded ClientJoin node. Notice how the inner query has a join so that it will retrieve all the orders for all the customers in London now. Unfortunately, its an OUTER APPLY join when it does not really need to be. But don't fear. The QueryLanguage will soon fix it. The combined query also gets a new column 'Test'. This is used to determine if the join found at least one matching row or not. If no match was found instead of the value 1 the 'Test' column will be null. I can use this information to make sure the resulting collection for a non-match is empty. + + +Next the QueryLanguage get its shot at translating the tree. When it does, it attempts to get rid of any APPLY node if at all possible; OUTER APPLY's become LEFT OUTER JOIN's and CROSS APPLY's become INNER JOIN's. + +Project( + + @"SELECT t0.City, t0.ContactName, t0.Country, t0.CustomerID, t0.Phone + + FROM [Customers] AS t0 + + WHERE (t0.City = 'London')", + + new Customer() { + + City = A0.Column("City"), + + ContactName = A0.Column("ContactName"), + + Country = A0.Column("Country"), + + CustomerID = A0.Column("CustomerID"), + + Phone = A0.Column("Phone"), + + Orders = ClientJoin( + + OuterKey(A0.Column("CustomerID")), + + InnerKey(A1.Column("CustomerID")), + + Project( + + @"SELECT t0.CustomerID, t1.Test, t1.CustomerID AS CustomerID1, t1.OrderDate, t1.OrderID + + FROM [Customers] AS t0 + + LEFT OUTER JOIN ( + + SELECT t2.CustomerID, t2.OrderDate, t2.OrderID, 1 AS Test + + FROM [Orders] AS t2 + + ) AS t1 + + ON (t1.CustomerID = t0.CustomerID) + + WHERE (t0.City = 'London')", + + Outer( + + A1.Column("Test"), + + new Order() { + + CustomerID = A1.Column("CustomerID1"), + + OrderDate = A1.Column("OrderDate"), + + OrderID = A1.Column("OrderID") + + } + + ), + + p => Enumerable.ToList(p) + + ) + + ) + + }, + + p => Queryable.AsQueryable(p) + +) + +Now the query expression is fully translated we only have left to build the execution plan. The QueryPolicy is asked to turn the expression into a program that will do the actual execution. 
+ +Invoke( + + null, + + lookup1 => ((IQueryable)ExecutionBuilder.Sequence(new Object[] { + + ExecutionBuilder.Assign( + + lookup1, + + Enumerable.ToLookup( + + Enumerable.Where( + + ((DbQueryProvider)Query(Test.Customer).Provider).Execute( + + new QueryCommand>( + + @"SELECT t0.CustomerID, t1.Test, t1.CustomerID AS CustomerID1, t1.OrderDate, t1.OrderID + + FROM [Customers] AS t0 + + LEFT OUTER JOIN ( + + SELECT t2.CustomerID, t2.OrderDate, t2.OrderID, 1 AS Test + + FROM [Orders] AS t2 + + ) AS t1 + + ON (t1.CustomerID = t0.CustomerID) + + WHERE (t0.City = @p0)", + + new String[] {"p0"}, + + r1 => new KeyValuePair( + + r1.IsDBNull(0) + + ? null + + : ((String)Convert.ChangeType( + + r1.GetValue(0), + + System.String + + )), + + r1.IsDBNull(1) + + ? null + + : new Order() { + + CustomerID = r1.IsDBNull(2) + + ? null + + : ((String)Convert.ChangeType( + + r1.GetValue(2), + + System.String + + )), + + OrderDate = r1.IsDBNull(3) + + ? new DataTime("1/1/0001 12:00:00 AM") + + : ((DateTime)Convert.ChangeType( + + r1.GetValue(3), + + System.DateTime + + )), + + OrderID = r1.IsDBNull(4) + + ? 0 + + : ((Int32)Convert.ChangeType( + + r1.GetValue(4), + + System.Int32 + + )) + + } + + ) + + ), + + new Object[] {((Object)"London")} + + ), + + kvp => kvp.Value != null + + ), + + kvp => kvp.Key, + + kvp => kvp.Value + + ) + + ), + + Queryable.AsQueryable(((DbQueryProvider)Query(Test.Customer).Provider).Execute( + + new QueryCommand( + + @"SELECT t0.City, t0.ContactName, t0.Country, t0.CustomerID, t0.Phone + + FROM [Customers] AS t0 + + WHERE (t0.City = @p0)", + + new String[] {"p0"}, + + r0 => new Customer() { + + City = r0.IsDBNull(0) + + ? null + + : ((String)Convert.ChangeType( + + r0.GetValue(0), + + System.String + + )), + + ContactName = r0.IsDBNull(1) + + ? null + + : ((String)Convert.ChangeType( + + r0.GetValue(1), + + System.String + + )), + + Country = r0.IsDBNull(2) + + ? null + + : ((String)Convert.ChangeType( + + r0.GetValue(2), + + System.String + + )), + + CustomerID = r0.IsDBNull(3) + + ? null + + : ((String)Convert.ChangeType( + + r0.GetValue(3), + + System.String + + )), + + Phone = r0.IsDBNull(4) + + ? null + + : ((String)Convert.ChangeType( + + r0.GetValue(4), + + System.String + + )), + + Orders = Enumerable.ToList(lookup1.get_Item(r0.IsDBNull(3) + + ? null + + : ((String)Convert.ChangeType( + + r0.GetValue(3), + + System.String + + )))) + + } + + ), + + new Object[] {((Object)"London")} + + )) + + })) + + ) + + + +Now we're cookin' with GAS! This is the exact LINQ expression that can be run to execute the query and produce the results. Notice how the code directly calls the DbQueryProvider.Execute method that takes the command text, parameters and a function that converts a DbDataReader into a sequence of objects. That function is described inline with the rest of the LINQ expression. Yes, you can do that. That's how all those other LINQ operators work. + + +Also notice how I cheat with expression trees. The process to create the lookup table requires variables, assignments and statement sequences; all things not currently possible with expression trees -- or so you thought. The variable is really a parameter to a lambda expression. If you directly invoke an inline lambda expression you basically create a variable that is in scope over the body of the lambda. Next I call a method on ExecutionBuilder called Sequence. This lets me simulate having statement sequences. All the expressions are evaluated in order because the results are turned into an argument array. 
The Sequence method simply returns the last value in the array. The last cheat is the assignment. While there is no assignment operator in the expression tree (at least not yet), it is possible to pass a parameter to a method as a by-ref argument. Yes, this is ugly and causes side-effects, but that's exactly what I wanted. (Until we get support for statements sometime in the future!) The ExecutionBuilder Assign method take a ref argument and a value and simply assigns the value to the ref argument. + + +When the query is finally executed, all the data is retrieved up front in only two queries. + +SELECT t0.CustomerID, t1.Test, t1.CustomerID AS CustomerID1, t1.OrderDate, t1.OrderID + +FROM [Customers] AS t0 + +LEFT OUTER JOIN ( + + SELECT t2.CustomerID, t2.OrderDate, t2.OrderID, 1 AS Test + + FROM [Orders] AS t2 + + ) AS t1 + + ON (t1.CustomerID = t0.CustomerID) + +WHERE (t0.City = @p0) + +-- @p0 = [London] + +SELECT t0.City, t0.ContactName, t0.Country, t0.CustomerID, t0.Phone + +FROM [Customers] AS t0 + +WHERE (t0.City = @p0) + +-- @p0 = [London] + + + + + + +Well that's it then. Kudos, if you've read this far you really are trying to understand how to build your own IQueryable provider. Either that or you are up late at night with insomnia trying to find something softer than a hammer to put yourself to sleep with. + + +Have fun with the new code! + diff --git a/docs/blog/building-part-13.md b/docs/blog/building-part-13.md new file mode 100644 index 0000000..23295df --- /dev/null +++ b/docs/blog/building-part-13.md @@ -0,0 +1,888 @@ +# Building a LINQ IQueryable Provider – Part XIII: Updates and Batch Processing + +Matt Warren - MSFT; January 22, 2009 + +------------------------------------ + +This is the thirteenth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you probably have a life beyond the keyboard, but if you don't then follow the link below to find oodles more to help fill your meaningless existence. + + +Complete list of posts in the Building an IQueryable Provider series + + +It's been precisely the correct amount of time that it took for me to complete the additional goodness that is jam packed into this drop, less actual work, dinners out, dinners in, any interesting film and televisions programs, housework, trips out to the store, family game night, time reading fiction, napping on the couch and other assorted unavoidable activities. + + +The full source code can be found at: + + +http://www.codeplex.com/IQToolkit + + +I'll try to cover as much as I can in this post, however you'll like find other gems by scouring the source itself. + + +What's inside: + + + +Updates - Insert, Update & Delete operations. + + +Batch processing - true SQL Server batch processing. + + +Server language type systems - correct parameter types. + + +Mapping Changes - use the same class with multiple tables, etc. + + +Insert, Update and Delete + +It's about time that this toolkit actually got usable right out of the box. My original intention with the series was to show how to build an IQueryable provider and that turned more and more into a fully working query engine that you could actually use to get real work done. Yet, how many real world applications only ever need to pull data out of a database and never push it back? Not many. + + +So I knew I'd eventually want to add updates, because I knew that you'd eventually need to do it too. 
+Yet, every time I started thinking about updates I always fell into the trap of thinking about full-blown ORM's with object tracking, et al, and I did not really want to go there, at least not yet. As a toolkit I think it's just fine to define the primitives that a more advanced system might be built out of. And there is nothing wrong with those primitives being generally useful on their own. So you should be able to use the toolkit as-is and not only get a pretty good query engine but also something that at least works as a rudimentary data access layer.
+
+Common Primitives
+
+So then I set about thinking about just the primitives for updating data. They should have semantics similar to the underlying SQL equivalent commands. That means they should not defer work until some time later, but execute immediately. There should be at least the familiar commands, Insert, Update and Delete; but also Upsert (both Insert & Update combined) since it's so often the right thing for many situations.
+
+Also, like other LINQ operations, update commands should be a pattern, and be available for any kind of provider. So I set out thinking about what the pattern would look like and how it might be specified. This is what I came up with.
+
+public interface IUpdatable : IQueryable
+{
+}
+
+public interface IUpdatable<T> : IUpdatable, IQueryable<T>
+{
+}
+
+public static class Updatable
+{
+    public static S Insert<T, S>(this IUpdatable<T> collection, T instance, Expression<Func<T, S>> resultSelector)
+    public static S Update<T, S>(this IUpdatable<T> collection, T instance, Expression<Func<T, bool>> updateCheck, Expression<Func<T, S>> resultSelector)
+    public static S InsertOrUpdate<T, S>(this IUpdatable<T> collection, T instance, Expression<Func<T, bool>> updateCheck, Expression<Func<T, S>> resultSelector)
+    public static int Delete<T>(this IUpdatable<T> collection, T instance, Expression<Func<T, bool>> deleteCheck)
+    public static int Delete<T>(this IUpdatable<T> collection, Expression<Func<T, bool>> predicate)
+    public static IEnumerable<S> Batch<U, T, S>(this IUpdatable<U> collection, IEnumerable<T> instances, Expression<Func<IUpdatable<U>, T, S>> fnOperation, int batchSize, bool stream)
+    ...
+}
+
+This pattern works just like the LINQ Enumerable and Queryable patterns. I've declared an interface 'IUpdatable' that extends IQueryable, so anything that is updatable is also queryable, and then an Updatable class with a bunch of new extension methods that encapsulate the pattern. (I realize the IUpdatable name may be in conflict with some other library, but until I think of something better this is what it is.)
+
+The Insert method inserts an object instance into a collection. It's not an ordinary insert, like with List<T>. The collection is considered to be remote and inserting into it copies the data from your instance. It has an optional result-selector argument that can be a function you supply to construct a result out of the object after it has been inserted. This, of course, is intended to occur on the server and can be used to read back auto-generated state and computed expressions.
+
+IUpdatable<Customer> customers = ...;
+Customer c = ...;
+customers.Insert(c);
+
+IUpdatable<Order> orders = ...;
+Order o = ...;
+var id = orders.Insert(o, d => d.OrderID);
+
+The Update method updates a corresponding object already in the collection with the values in the instance supplied. This is a complete overwrite, not a dynamic update like LINQ to SQL would have generated. I have not yet defined a piecemeal update operation, but I still can. We'll see how it goes.
+In addition to an update selector (like the one for insert) you can also specify an update-check predicate. This is an expression evaluated against the server's state and can be used to implement optimistic concurrency by basically checking to see if the server's state is still the same as you remembered it. An ORM layer built on top of this primitive might choose to generate this expression automatically, based on mapping information, but here you must specify it manually if you want to use it.
+
+IUpdatable<Customer> customers = ...;
+Customer c = ...;
+var computedValue = customers.Update(c, d => d.City == originalCity, d => d.ComputedColumn);
+
+The InsertOrUpdate is the 'Upsert' operation. It will basically insert an object into the collection if a corresponding one does not exist, or update the one that does with the new values. You specify it just like you'd specify an update, except you call InsertOrUpdate.
+
+There are two flavors of Delete. The first one lets you delete the object in the collection corresponding to the instance. You can optionally specify a delete-check, which is similar to the update-check, a predicate function evaluated against the server's state. The delete will only occur if the check passes. The second flavor just lets you specify a predicate. It's basically a delete-all-where method and will delete all objects from the collection that match the predicate. So far, it's the only SQL-like 'set-based' operation I've defined.
+
+IUpdatable<Customer> customers = ...;
+Customer c = ...;
+customers.Delete(c, d => d.City == originalCity);
+
+IUpdatable<Customer> customers = ...;
+customers.Delete(c => c.CustomerID == "ALFKI");
+
+The last operation is Batch. It will allow you to specify an operation to apply to a whole set of instances. The operation can be one of the other commands like Insert or Update. You can use this method to Insert, Update or Delete a whole bunch of objects all at the same time. If possible, the provider will use optimized batching techniques to give you extra performance.
+
+IUpdatable<Customer> customers = ...;
+Customer[] custs = new Customer[] { ... };
+customers.Batch(custs, c => customers.Insert(c));
+
+If you've got many objects to update and you want to have instance-specific update-checks done, you can sneak the extra information into the batch process by combining the data together into a single collection and then piecing them apart in the operation.
+
+IUpdatable<Customer> customers = ...;
+var oldAndNew = new [] { new { Old = oldCustomer, New = newCustomer }, ... };
+customers.Batch(oldAndNew, (u, x) => u.Update(x.New, d => d.City == x.Old.City));
+
+Updates and DbQueryProvider
+
+In order to make use of this new capability I'm going to need a new object that declares the IUpdatable interface. The Query<T> class only implemented IQueryable<T>, and that was fine as long as I only ever wanted to query. Now I also want to be able to update, so I need a new class to represent the root of my query that I can also update. These things in databases are called tables, so that's what I'll stick with.
+
+public interface IQueryableTable : IQueryable
+{
+    string TableID { get; }
+}
+
+public interface IQueryableTable<T> : IQueryable<T>, IQueryableTable
+{
+}
+
+public class QueryableTable<T> : Query<T>, IQueryableTable<T>
+{
+    string id;
+
+    public QueryableTable(IQueryProvider provider, string id)
+        : base(provider)
+    {
+        this.id = id;
+    }
+
+    public QueryableTable(IQueryProvider provider)
+        : this(provider, null)
+    {
+    }
+
+    public string TableID
+    {
+        get { return this.id; }
+    }
+}
+
+public interface IUpdatableTable : IQueryableTable, IUpdatable
+{
+}
+
+public interface IUpdatableTable<T> : IQueryableTable<T>, IUpdatable<T>, IUpdatableTable
+{
+}
+
+public class UpdatableTable<T> : QueryableTable<T>, IUpdatableTable<T>
+{
+    public UpdatableTable(IQueryProvider provider, string id)
+        : base(provider, id)
+    {
+    }
+
+    public UpdatableTable(IQueryProvider provider)
+        : this(provider, null)
+    {
+    }
+}
+
+You'll note that not only did I define an UpdatableTable class, which is specifically what I wanted, I also went ahead and made a QueryableTable, and extra interfaces to correspond to them. This is intentional. Eventually, I may want to add more methods specific to tables here and I'll need a place to put them. Right now I've only added a property 'TableID'. You can ignore it for now, though it will get more interesting when I discuss the mapping changes.
+
+Take a look at the Northwind class in the test source code and you'll see how I made use of my new table class.
+
+The Plumbing
+
+Of course, update commands work in the query provider just like queries do. First there are a bunch of new DbExpression nodes to represent them.
+
+public abstract class CommandExpression : DbExpression
+{
+}
+
+public abstract class CommandWithResultExpression : CommandExpression
+{
+    public abstract Expression Result { get; }
+}
+
+public class InsertExpression : CommandWithResultExpression
+{
+    public TableExpression Table { get; }
+    public ReadOnlyCollection<ColumnAssignment> Assignments { get; }
+    public override Expression Result { get; }
+}
+
+public class ColumnAssignment
+{
+    public ColumnExpression Column { get; }
+    public Expression Expression { get; }
+}
+
+public class UpdateExpression : CommandWithResultExpression
+{
+    public TableExpression Table { get; }
+    public Expression Where { get; }
+    public ReadOnlyCollection<ColumnAssignment> Assignments { get; }
+    public override Expression Result { get; }
+}
+
+public class UpsertExpression : CommandWithResultExpression
+{
+    public Expression Check { get; }
+    public InsertExpression Insert { get; }
+    public UpdateExpression Update { get; }
+    public override Expression Result { get; }
+}
+
+public class DeleteExpression : CommandExpression
+{
+    public TableExpression Table { get; }
+    public Expression Where { get; }
+}
+
+public class BatchExpression : CommandExpression
+{
+    public Expression Input { get; }
+    public LambdaExpression Operation { get; }
+    public Expression BatchSize { get; }
+    public Expression Stream { get; }
+}
+
+Then there's the standard visit method in DbExpressionVisitor, DbExpressionWriter, etc. Binding them happens in the QueryBinder just like all other query operations, but the work of deciding what nodes to generate gets doled out to the QueryMapping object. Luckily, the base QueryMapping class has a default implementation that builds the correct DbExpression node.
If you want to map a single object into multiple tables or some other crazy scheme you'll probably have to have a more advanced mapping implementation. ?? + + +These nodes get plumbed through the system until they are encountered by the ExecutionBuilder and are formatted using the QueryLanguage rules. The TSQL formatter converts the nodes into corresponding TSQL text. Depending on the contents of the command expression, the generated SQL may have one or more actual TSQL operations. + + +Batch Processing + +ADO.Net has this nice feature built into its SqlClient API; the ability to get high-performance batch processing. Yet, the only way to get at it is through use of DataSet's or DataReaders. As far as I'm concerned this is rather low level and a bit complicated to use if you are starting out with domain objects and not DataSets. Your data access layer should do this for you. Yet, in order for it to do it, the abstraction for batch processing has to exist, which is why I added it to the updatable pattern. After that it was a cinch. ?? Not really. + + +What I needed fundamentally was something that would execute the same database command over and over again with different sets of parameters. This is basically what TSQL batching does as it is sent over the wire. So I needed to add this abstraction to DbQueryProvider. Yet, since only SqlClient supports this actually behavior I'd need a fall back plan. So DbQueryProvider implements a method to do batch processing, but it does not actually do it optimally. + +public virtual IEnumerable ExecuteBatch(QueryCommand query, IEnumerable paramSets, int batchSize, bool stream) + +{ + + var result = this.ExecuteBatch(query, paramSets); + + if (!stream) + + { + + return result.ToList(); + + } + + else + + { + + return new EnumerateOnce(result); + + } + +} + +private IEnumerable ExecuteBatch(QueryCommand query, IEnumerable paramSets) + +{ + + this.LogCommand(query, null); + + DbCommand cmd = this.GetCommand(query, null); + + foreach (var paramValues in paramSets) + + { + + this.LogMessage(""); + + this.LogParameters(query, paramValues); + + this.SetParameterValues(cmd, paramValues); + + int result = cmd.ExecuteNonQuery(); + + yield return result; + + } + +} + + +public virtual IEnumerable ExecuteBatch(QueryCommand query, IEnumerable paramSets, Func fnProjector, int batchSize, bool stream) + +{ + + var result = this.ExecuteBatch(query, paramSets, fnProjector); + + if (!stream) + + { + + return result.ToList(); + + } + + else + + { + + return new EnumerateOnce(result); + + } + +} + + +private IEnumerable ExecuteBatch(QueryCommand query, IEnumerable paramSets, Func fnProjector) + +{ + + this.LogCommand(query, null); + + DbCommand cmd = this.GetCommand(query, null); + + cmd.Prepare(); + + foreach (var paramValues in paramSets) + + { + + this.LogMessage(""); + + this.LogParameters(query, paramValues); + + this.SetParameterValues(cmd, paramValues); + + var reader = cmd.ExecuteReader(); + + if (reader.HasRows) + + { + + reader.Read(); + + yield return fnProjector(reader); + + } + + else + + { + + yield return default(T); + + } + + reader.Close(); + + } + +} + + +What I have here are four methods, two of which are just private implementations, but two others that are virtual so can be overridden. Batching can work via streaming or not. If streamed the results of each execution (or batch) is yielded out. 
This works great if the number of individual items is large, but takes a lot of discipline to remember to actually inspect the results or nothing gets executed at all! By requesting no streaming (stream == false) the execution occurs immediately and the results are packaged into a list that you can conveniently ignore if you so choose. That's why the implementation is separated out, so that the enumerable can be captured and converted to a list, enabling either behavior. The two types of ExecuteBatch differ in whether a result is computed via information coming back from the server or not. + + +Now that these are defined, I can implement a new kind of provider, a SqlClient specific version that makes automatic use of optimized batching. + +public class SqlQueryProvider : DbQueryProvider + +{ + + ... + + + public override IEnumerable ExecuteBatch(QueryCommand query, IEnumerable paramSets, int batchSize, bool stream) + + { + + var result = this.ExecuteBatch(query, paramSets, batchSize); + + if (!stream) + + { + + return result.ToList(); + + } + + else + + { + + return new EnumerateOnce(result); + + } + + } + + private IEnumerable ExecuteBatch(QueryCommand query, IEnumerable paramSets, int batchSize) + + { + + SqlCommand cmd = (SqlCommand)this.GetCommand(query, null); + + DataTable dataTable = new DataTable(); + + for (int i = 0, n = query.Parameters.Count; i < n; i++) + + { + + var qp = query.Parameters[i]; + + cmd.Parameters[i].SourceColumn = qp.Name; + + dataTable.Columns.Add(qp.Name, qp.Type); + + } + + SqlDataAdapter dataAdapter = new SqlDataAdapter(); + + dataAdapter.InsertCommand = cmd; + + dataAdapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None; + + dataAdapter.UpdateBatchSize = batchSize; + + + this.LogMessage("-- Start SQL Batching --"); + + this.LogCommand(query, null); + + + IEnumerator en = paramSets.GetEnumerator(); + + using (en) + + { + + bool hasNext = true; + + while (hasNext) + + { + + int count = 0; + + for (; count < dataAdapter.UpdateBatchSize && (hasNext = en.MoveNext()); count++) + + { + + var paramValues = en.Current; + + dataTable.Rows.Add(paramValues); + + this.LogMessage(""); + + this.LogParameters(query, paramValues); + + } + + if (count > 0) + + { + + int n = dataAdapter.Update(dataTable); + + for (int i = 0; i < count; i++) + + { + + yield return (i < n) ? 1 : 0; + + } + + dataTable.Rows.Clear(); + + } + + } + + } + + + this.LogMessage(string.Format("-- End SQL Batching --")); + + } + +} + + +Note that I only have an implementation for the variation of ExecuteBatch that computes no user specified result. This is due to there being no back-channel available when using SqlClient batching. + + +The implementation uses DataTable's and DataAdapters to make this work. A DataTable is created filled with the parameters necessary for executing the command. The DataAdapter is used to invoke the batch using the Update method. Of course, this doesn't actually have to be an update command. I can also use this to batch inserts and deletes too, or really any TSQL that I want to execute as long as I don't expected to get a bunch of data back. + + +Server Language Types + +One thing that has always bugged me about SQL Server was the need to get the command parameters right. If I declare the parameter to be the wrong text flavor I can cause serious performance issues for the query. So just setting the parameter values and having the SqlCommand object guess at the right SqlType encoding is really not a good plan. 
Fortunately, it is often possible to figure out the correct parameter types to use if the information is available via the mapping. Parameters are often never just sent to the server for no reason. If I use a parameter in a query I'm usually comparing it against a column. In most cases I can simply infer that the parameter should have the same server type as the column. + + +So I've gone ahead and defined a new server type system abstraction and threaded it into some of the DbExpressions and make use of it in some of the visitors. + +public abstract class QueryType + +{ + + public abstract DbType DbType { get; } + + public abstract bool NotNull { get; } + + public abstract int Length { get; } + + public abstract short Precision { get; } + + public abstract short Scale { get; } + +} + +public abstract class QueryTypeSystem + +{ + + public abstract QueryType Parse(string typeDeclaration); + + public abstract QueryType GetColumnType(Type type); + +} + + +A QueryType encodes the typical database scalar type. It has a few properties that you can use to inspect common attributes of a server type, but most code won't really care to know the details, it will just pass the information along until it ends up where I need it. A QueryTypeSystem is basically a factory for producing QueryType's. The Parse method will construct a language-specific QueryType out of some text encoding. This is typically the server language syntax for declaring a column of that particular type, like 'VARCHAR(10)'. + + +A QueryTypeSystem is specific to a language, so QueryLanguage is where you go to get one. + +public abstract class QueryLanguage + +{ + + public abstract QueryTypeSystem TypeSystem { get; } +} + +One place I definitely know where to encode server types is in the ColumnExpression. If a ColumnExpression knows what its server type is, then when I get to the point of comparing parameters to columns I know which server type is in play. + +public class ColumnExpression : DbExpression, IEquatable + +{ + + public QueryType QueryType { get; } + +} + +I've also stuck it into NamedValueExpression, because that's the type I'm using for parameters. + +public class NamedValueExpression : DbExpression + +{ + + public QueryType QueryType { get; } +} + +And I've basically modified Parameterizer, so that if a column and parameter (named-value expression) are ever paired together in any binary expression, I'll infer the parameter to have the same server type as the column. 
+ +public class Parameterizer : DbExpressionVisitor + +{ + + protected override Expression VisitBinary(BinaryExpression b) + + { + + Expression left = this.Visit(b.Left); + + Expression right = this.Visit(b.Right); + + if (left.NodeType == (ExpressionType)DbExpressionType.NamedValue + + && right.NodeType == (ExpressionType)DbExpressionType.Column) + + { + + NamedValueExpression nv = (NamedValueExpression)left; + + ColumnExpression c = (ColumnExpression)right; + + left = new NamedValueExpression(nv.Name, c.QueryType, nv.Value); + + } + + else if (b.Right.NodeType == (ExpressionType)DbExpressionType.NamedValue + + && b.Left.NodeType == (ExpressionType)DbExpressionType.Column) + + { + + NamedValueExpression nv = (NamedValueExpression)right; + + ColumnExpression c = (ColumnExpression)left; + + right = new NamedValueExpression(nv.Name, c.QueryType, nv.Value); + + } + + return this.UpdateBinary(b, left, right, b.Conversion, b.IsLiftedToNull, b.Method); + + } + + protected override ColumnAssignment VisitColumnAssignment(ColumnAssignment ca) + + { + + ca = base.VisitColumnAssignment(ca); + + Expression expression = ca.Expression; + + NamedValueExpression nv = expression as NamedValueExpression; + + if (nv != null) + + { + + expression = new NamedValueExpression(nv.Name, ca.Column.QueryType, nv.Value); + + } + + return this.UpdateColumnAssignment(ca, ca.Column, expression); + + } +} + + +Of course, the same goes for ColumnAssignment used by Insert and Update commands. You'll notice that I'm not having these types flow throughout the expression tree like normal types. I could probably get more edge cases correct if I did, but for now this handles most of the cases. + + +The GetCommand method in DbQueryProvider will now make use of this info when constructing parameters. The SqlQueryProvider expects to see a new TSqlType that's made available by a new TSqlTypeSystem found on the TSqlLanguage object. ?? + +public class SqlQueryProvider : DbQueryProvider + +{ + + protected override DbCommand GetCommand(QueryCommand query, object[] paramValues) + + { + + // create command object (and fill in parameters) + + SqlCommand cmd = new SqlCommand(query.CommandText, (SqlConnection)this.Connection); + + for (int i = 0, n = query.Parameters.Count; i < n; i++) + + { + + QueryParameter qp = query.Parameters[i]; + + TSqlType sqlType = (TSqlType)qp.QueryType; + + if (sqlType == null) + + sqlType = (TSqlType)this.Language.TypeSystem.GetColumnType(qp.Type); + + var p = cmd.Parameters.Add("@" + qp.Name, sqlType.SqlDbType, sqlType.Length); + + if (sqlType.Precision != 0) + + p.Precision = (byte)sqlType.Precision; + + if (sqlType.Scale != 0) + + p.Scale = (byte)sqlType.Scale; + + if (paramValues != null) + + { + + p.Value = paramValues[i] ?? DBNull.Value; + + } + + } + + return cmd; + + } +} + +Mapping Changes + +I've made some changes to the mapping system (or QueryMapping to be particular.) I came across a variety of odd behavior while developing update logic that basically boiled down to the ImplictMapping object not being able to tell the difference between a type that was intended to correspond to a database table and others that appeared there just for the sake of representation, like LINQ anonymous types. Some other mapping implementations might be able to tell the difference, but the simplest one couldn't so I needed to find another solution. + + +Obviously, everything that is an entity in a query comes from somewhere, and that's either from one of the roots of the query (a table) or via a relationship property. 
+It was a mistake to think otherwise (or not think about it at all). What I needed was a more explicit representation in the expression tree of what was an entity and what was not. I figured I could either add annotations to the tree in every node, or find some nominal solution that would do the trick.
+
+I chose to make a new expression node, EntityExpression, which I use as a wrapper around any expression that would normally be constructing an entity. This node is placed into the system when the QueryMapping first creates the sub-expression for constructing an entity or relationship, which is the point at which I actually know that I'm dealing with an entity and, in particular, which entity it is.
+
+public class EntityExpression : DbExpression
+{
+    public MappingEntity Entity { get; }
+    public Expression Expression { get; }
+}
+
+I've also introduced a new abstraction called MappingEntity. This is how I let the QueryMapping object place a bread-crumb into the expression tree so it can be reminded which exact entity was being referred to.
+
+public class MappingEntity
+{
+    public string TableID { get; }
+    public Type Type { get; }
+}
+
+It's really just a simple little class that minimally remembers the correspondence between the runtime type and the table it's being mapped to. If you've been paying attention you'll realize that this 'TableID' is the same property that was added to the IQueryableTable interface. That's how the query engine gets the table-id in the first place, right from the start of the query. Of course, the IQueryableTable<T> interface also knows the runtime type; that's the 'T' part. So your tables have all the information needed to make a MappingEntity. Except that job is deferred to the QueryMapping object so it can do whatever bookkeeping it wants.
+
+public abstract class QueryMapping
+{
+    public virtual MappingEntity GetEntity(Type type);
+    public virtual MappingEntity GetEntity(Type type, string tableID);
+}
+
+You'll also notice that most of the other methods on QueryMapping are now modified to take a MappingEntity as an argument.
+
+public abstract class QueryMapping
+{
+    public virtual bool IsMapped(MappingEntity entity, MemberInfo member)
+    public virtual bool IsColumn(MappingEntity entity, MemberInfo member)
+    public virtual bool IsIdentity(MappingEntity entity, MemberInfo member)
+    public virtual bool IsComputed(MappingEntity entity, MemberInfo member)
+    public virtual bool IsGenerated(MappingEntity entity, MemberInfo member)
+    public virtual bool IsRelationship(MappingEntity entity, MemberInfo member)
+    public virtual MappingEntity GetRelatedEntity(MappingEntity entity, MemberInfo member)
+    public virtual bool IsAssociationRelationship(MappingEntity entity, MemberInfo member)
+    ...
+}
+
+Now, what you're probably saying is "Gee, Matt, that looks a bit ominous. Why aren't these all just methods on MappingEntity now?" And you'd be right. They probably should be. Yet, that makes it a lot more difficult to subclass the mapping object and merely override some of the behavior. Not a big deal for complete mapping sub-systems that someday might exist, but painful for simple uses such as overriding the ImplicitMapping with a few additional rules. So I'm leaving it as-is for now until I can think about it more. Any thoughts from the peanut-gallery?
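+For what it's worth, the kind of small override this arrangement is meant to keep easy looks roughly like this (a sketch only; ImplicitMapping and the IsGenerated signature come from the discussion above, while the rule itself is made up for illustration):
+
+public class MyMapping : ImplicitMapping
+{
+    // example rule: treat any member literally named "Version" as database-generated
+    public override bool IsGenerated(MappingEntity entity, MemberInfo member)
+    {
+        return member.Name == "Version" || base.IsGenerated(entity, member);
+    }
+}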
+
+
+Also, it's worth noting that, given this new arrangement, with explicit entity information in the query tree connecting each entity back to the table it originated from, it is now possible to support mapping systems that allow individual runtime types to be mapped to more than one table. So you can have your cake and eat it too!
+
+That's All Folks!
+
+At least for today. The future may hold more goodies. Any suggestions are welcome, either as ideas or source-code contributions.
+
diff --git a/docs/blog/building-part-14.md b/docs/blog/building-part-14.md
new file mode 100644
index 0000000..b018335
--- /dev/null
+++ b/docs/blog/building-part-14.md
@@ -0,0 +1,243 @@
+# Building a LINQ IQueryable provider – Part XIV: Mapping and Providers
+
+Matt Warren - MSFT; April 8, 2009
+
+---------------------------------
+
+This is the fourteenth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you might request a week's vacation, sit back, relax with a mochaccino in one hand and a netbook in the other, or if you've got better things to do with your time print them all out and stuff them under your pillow. Who knows, it might work better.
+
+Complete list of posts in the Building an IQueryable Provider series
+
+Okay, enough with all the post-is-late guilt! It's done now, so breathe a sigh of relief and get on with the reading.
+
+What's inside:
+
+More Mapping - Finally a real mapping system, with attributes and XML.
+
+More Providers - MS Access and MS SQL Server Compact Edition
+
+More POCO - Constructors, Enums and Interfaces.
+
+More More More
+
+The full source code can be found at:
+
+http://www.codeplex.com/IQToolkit
+
+More Mapping
+
+Attribute Mapping - put attributes on the properties in the class that declares your tables.
+
+This differs from LINQ to SQL mapping attributes, which are placed on the entities themselves, and is more like the proposed LINQ to Entities mapping attributes. However, I've not actually gone out of my way to make them the same. The advantages to this approach are 1) keeping the mapping separate from the entity objects (more POCO), and 2) being able to supply different mapping for the same entity type based on the table the entities are accessed from.
+
+Mapping attributes look like this:
+
+[Table]
+[Column(Member = "CustomerId", IsPrimaryKey = true)]
+[Column(Member = "ContactName")]
+[Column(Member = "CompanyName")]
+[Column(Member = "Phone")]
+[Column(Member = "City", DbType="NVARCHAR(20)")]
+[Column(Member = "Country")]
+[Association(Member = "Orders", KeyMembers = "CustomerID", RelatedEntityID = "Orders", RelatedKeyMembers = "CustomerID")]
+public IUpdatableTable<Customer> Customers
+
+You specify the Table, Column and Association attributes as necessary. The 'Member' refers to the member in the entity type. If this is the same name as the database's column name you don't need to repeat it by specifying 'Name' too.
+
+You can specify nested mapping information by using a dot in the Member name. This allows you to have what some call value types, but to keep from clashing with .Net terminology I don't.
For example, if you've defined an Address type that you want to use in a nested relationship (actually embedded in the same table row) you can do that like this: + +[Table] + +[Column(Member = "EmployeeID", IsPrimaryKey = true)] + +[Column(Member = "LastName")] + +[Column(Member = "FirstName")] + +[Column(Member = "Title")] + +[Column(Member = "Address.Street", Name = "Address")] + +[Column(Member = "Address.City")] + +[Column(Member = "Address.Region")] + +[Column(Member = "Address.PostalCode")] + +public IUpdatable Employees + + +Xml Mapping -- this is same as attribute based mapping but data is read from an XML file. + +Xml mapping looks like this: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + +You use it like this: + + +XmlMapping mapping = XmlMapping.FromXml(TSqlLanguage.Default, File.ReadAllText(@"northwind.xml")); + +SqlQueryProvider provider = new SqlQueryProvider(connection, mapping); + + +Multi-table mapping -- Map multiple tables into a single entity. If you've got entity data spread out over multiple tables with a 1:1 association between them you can now specify the additional tables in mapping using the ExtensionTable attribute or equivalent XML element. + +Here's what a multi-table mapping looks like: +[Table(Name = "TestTable1", Alias = "TT1")] + +[ExtensionTable(Name = "TestTable2", Alias = "TT2", KeyColumns = "ID", RelatedAlias = "TT1", RelatedKeyColumns = "ID")] + +[ExtensionTable(Name = "TestTable3", Alias = "TT3", KeyColumns = "ID", RelatedAlias = "TT1", RelatedKeyColumns = "ID")] + +[Column(Member = "ID", Alias = "TT1", IsPrimaryKey = true, IsGenerated = true)] + +[Column(Member = "Value1", Alias = "TT1")] + +[Column(Member = "Value2", Alias = "TT2")] + +[Column(Member = "Value3", Alias = "TT3")] + +public IUpdatable MultiTableEntities + + +Extension tables are specified similar to how Associations are specified, except you are never referring to members, only column names. You use the 'Alias' value to connect column & association mappings with columns from particular tables. All queries for this multi-table entity treat the 'Table' as the primary table queried, all other tables are queried with left-outer joins. All keys for associations must be from the same alias. + +Can I mix nested mapping with multi-table mapping? I have not tried it, but in theory it should work. It should not matter which table your nested entity gets it's data from, so in effect you can have a composition relationship between one table and another as long as it is 1:1. + + + +What about many-to-many? Not yet. Making the system query a many-to-many relationship is relatively easy. I haven't yet figured out the right semantics for inserts & updates. Right now, all insert, updates and deletes are explicit via calls to the IUpdatable with real-live entities. Yet how do you make an explicit update to the link table that you don't have an entity directly mapped to? I need to ponder this some more. Possibly if one side of the relationship is a composition as opposed to an association, then it would be implied when that side is updated. Yet what if you chose not to load the relationship, how do you tell the system to not delete all previous relationships? + + +More Providers + + +MS Access -- This new query provider works with both Access 2000 - 2003 and Access 2007 data files. I don't know what the true differences are between the Jet and the Ace engines; the query language appears to be identical (as per my limited tests so far), yet the filename extension changed in Access 2007 to 'accdb' instead of 'mdb' and the northwind sample database plumped up an extra 66% in disk size without any additional data. + + +In order to make this work I've added an AccessLanguage object that is necessary to get the correct semantics for MS Access queries and an AccessFormatter object that handles generating the correct command text. In order to salvage as much as I could from the TSqlFormatter, I moved most of this code to a common SqlFormatter base class, and now the TSQL and Access formatters only supply the deviations from the standard syntax. (Of course, 'standard' is currently whatever I deem it to be so don't go getting some actual online specification and prove me wrong.) 
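+To give a feel for what "supplying the deviations" looks like, a dialect formatter ends up being something like the fragment below. (This is a made-up illustration rather than the toolkit's actual AccessFormatter, and the Write/Visit helpers on the base formatter are assumed; the VisitBinary override pattern matches the other visitors shown in this series.)
+
+public class MyDialectFormatter : SqlFormatter
+{
+    protected override Expression VisitBinary(BinaryExpression b)
+    {
+        // example deviation: this dialect concatenates strings with '&' instead of '+'
+        if (b.NodeType == ExpressionType.Add && b.Left.Type == typeof(string))
+        {
+            this.Visit(b.Left);
+            this.Write(" & ");
+            this.Visit(b.Right);
+            return b;
+        }
+        return base.VisitBinary(b);
+    }
+}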
+Access only allows one command at a time, so that added an extra wrinkle, but in the end there is now support in the system for providers that can only do one command at a time. This means there are multiple round-trips to the engine for things like inserting a record and getting back the computed keys. Luckily, the Access engine is in-proc so this is not really a burden. A new property on QueryLanguage, 'AllowMultipleCommands', determines how the execution plan is generated and whether multiple commands can be lumped together into a single ADO command.
+
+The good news is that the Access engine passes almost all the Northwind tests; some are not possible (mostly ones testing translation of framework methods that have no apparent equivalent in the Access expression engine). There were a lot of hairy, strange & subtle differences in syntax between Access and TSQL, but most were handled by having different format rules; some required new expression visitors to change the query tree, like no explicit cross joins! This caused me to write a visitor that attempts to get rid of cross joins (often injected by my visitor that tries to get rid of cross-apply joins), which is now generally useful to everyone. If that doesn't do it, another visitor attempts to isolate the cross joins from any other joins and push them into sub-queries, where Access lets me use the old-style comma-list, which is truly a cross join; it just can't be mixed with other kinds of joins in the same FROM clause.
+
+SQL Compact -- Yes, even more SQL Server. Though to be truthful, SQL Server Compact Edition (aka SQL CE, aka SQL Compact, aka Skweelzy) is not really SQL Server; it is an entirely different product that handles a subset of TSQL syntax, and is not a server at all since it runs in-proc just like MS Access.
+
+What about MySQL or Oracle?
+
+One day. The fact is that MS SQL and MS Access are easy for me to get to; they are already on my box. Getting something else up and running would take actual effort, and the MS secret database police might come get me. Meanwhile, if someone out there wants to put together a provider implementation I'll add it into the drop.
+
+Where did the SqlQueryProvider go?
+
+I moved it. With the addition of the new providers it became apparent that I'd have to start factoring out all this ADO provider-specific nonsense, otherwise all uses of the toolkit would have direct dependencies on way more than necessary. So I made separate projects, each building its own library. I may end up separating all the core 'data' stuff out into its own project too.
+
+The solution builds these libraries now:
+
+- IQToolkit.dll
+- IQToolkit.Data.Access.dll
+- IQToolkit.Data.SqlClient.dll
+- IQToolkit.Data.SqlServerCe.dll
+
+More POCO
+
+Constructors -- Use entities that don't have default constructors. It is now possible to have entities that require invocation of a constructor with parameters. The binding process will figure out how to call your constructor and the client-side code will call it for you as long as the constructor parameter names match property names. You can even have fully read-only entities if all data members are accounted for in the constructor.
+
+Enums -- They actually sort of work now. You can have a member in your entity typed as some enum and you get automatic mapping between that enum and a numeric type in the database.
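+For example (a hypothetical entity, not one from the test project), the member is simply declared with the enum type and the provider maps it to the underlying numeric column:
+
+public enum ShippingMethod { Ground = 0, Air = 1, Freight = 2 }
+
+public class Shipment
+{
+    public int ShipmentID;
+    public ShippingMethod Method;   // stored as a numeric column in the database
+}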
+ + +Interfaces and abstract base classes -- You can now declare you IQueryable's as IQueryable or IQueryable and have the provider create instances of a compatible type under the covers, automatically translating all your references to interface or abstract class members to the appropriate mapped members. You can have mutiple entities share a common interface or base class and get different mapping for each. You can write code using generics and constrain generic parameters based on an interface and write queries that will get correct translation at runtime. (Note, variation of mapping likely won't work with compiled queries, since the translation is fixed on the first execution.) + +Less Policy + + +There's not a whole lot of policy being used right now and the policy objects dependence on the mapping object was no where near as deep as the mapping object's dependence on the language. So policy is now independent of mapping, which means you can construct providers without specifying policy and/or reusing mapping with different policies. Now if I could only make it simpler to specify/construct mapping without needing to know the language. Back to the drawing board. + + +More Insanity + + +I apologize for the churn. The namespace changed so now all heck is going to break loose. Gone is the simple 'IQ' namespace and in its place is the 'IQToolkit' namespace. I really did like the 'IQ' name, it was short, classy and made you feel intelligent just by looking at it. Yet, it was hard to guess at if you did not already know what it was. I chose to change the namespace name to match the product name and the DLL name. You add reference to the IQToolkit.dll and you import/use the IQToolkit namespace. No fuss, no muss. Except for all those files you'll have to edit now. But hey, this is pre-pre-pre-pre beta stuff. Some people may think they are something special by snarkily keeping all their products in beta. They've got a lot to learn. + + +I hope this toolkit is becoming useful to many. I realize there have been a variety of requests for new things in the toolkit that I just have not gotten time to put in yet. So you can expect plenty more in the future. + + +So enough with reading. It's time to code! \ No newline at end of file diff --git a/docs/blog/building-part-15.md b/docs/blog/building-part-15.md new file mode 100644 index 0000000..e2b019b --- /dev/null +++ b/docs/blog/building-part-15.md @@ -0,0 +1,395 @@ +# Building a LINQ IQueryable provider – Part XV (IQToolkit v0.15): Transactions, Sessions and Factories + +Matt Warren - MSFT; June 16, 2009 + +--------------------------------- + +This is the fifteenth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you might try searching for the audio tapes on www.Bing.com. That would be a lot easier than reading. You won't find any, but you'll feel better for having tried. + + +Complete list of posts in the Building an IQueryable Provider series + + +Getting this version of the toolkit together has taken a lot of sleepless nights. Thank goodness for Netflix or I'd have had to spend those sleepless nights actually working on the Toolkit. + + +Okay, enough with the flavor text, let's get to the crunch. + + +What's inside: + + + +More Providers - MySQL and SQLite join the previous MS only line up. + + +Transactions - Use ADO transactions to control the isolation of your queries & updates. 
+ + +Entity Providers - The provider concept is expanded to include tables of entities + + +Entity Sessions - The session concept adds identity caching, change tracking and deferred updates via SubmitChanges + + +Provider Factory - Create providers on the fly w/o knowing anything more than the database name and mapping. + + +Madness + + + +The full source code and redistributable DLL's can be found at: + + +http://www.codeplex.com/IQToolkit + + +More Providers + + +MySQL -- With the stewardship of MySQL in doubt after the purchase of Sun by Oracle, I was leery of taking on the challenge of making a MySQL provider for the toolkit. Yet, the benefits of doing so turned out to be very significant and not just for the MySQL users, as it forced me to challenge some of my very assumptions about SQL which in turn made some of my meager testing better, which in turn made the toolkit better as a whole. + +I was surprised when I went to download the free and open-source database product that I had to fill out a bunch of information on myself and wait for a formal approval (that took many days) before I was *allowed* to access the product. This seemed like an awful lot of hassle and odd for an open source project; way too corporate. Fortunately, I persisted and got the server up an running with no problem. Next I searched high and low for MySQL version of Northwind so I could have something to run the tests against. If only Bing were available then! I had to settle for plain ol' Live Search. Fortunately I found what I was looking for, even if it didn't offer me any cash back. + +MySQL seems to me what you get if you take a bunch of C programmers and tell them to make a SQL database product and yet offer no guidance. This is not criticism, just a bit of snarky wit over many of the underscore endowed API's that make up the product. Luckily, you'll not need to worry yourself about the manual dexterity needed to type such awkward function names, as you'll be using LINQ and the MySQL provider will do the work for you. On the positive side, the function library of MySQL is quite large and feature rich. I had no problem finding appropriate translations. MySQL even had many interesting features that I'd like to go back and refactor the toolkit to take advantage of. Next release maybe. + +There were a few problems I uncovered while trying to get MySQL to pass the test suites. I took the opportunity to fill out the execution suite that has languished for a few releases now, so I could actually assert what the correct output ought to be for specific queries. In almost all the cases MySQL did the right thing (as far as I was expecting.) In most of the cases that it did not appear to do the right thing, it often turned out to be bad assumptions on my part about collation ordering and so removing those assumptions from the tests fixed everything up. The remaining problem still boggles my mind. + +In a test designed merely to prove that the translation of simple joins succeeded to generate the right query, the database did not return the correct number of rows. + +from c in db.Customers +from o in c.Orders +from d in o.Details +select d.ProductId + +which executes a MySql query that looks like this: + +SELECT d.ProductId +FROM [Customers] AS c +LEFT OUTER JOIN [Orders] AS o ON o.CustomerID = c.CustomerID +LEFT OUTER JOIN [Order_Details] AS d ON d.OrderID = o.OrderID + +I expected to get a number of rows corresponding to the total number of order-details in the database. But I didn't. I got a whole lot less. 
So I started experimenting with different forms of the query. If I selected o.OrderID instead of d.ProductID, I got a different number from either of the other two. If I selected c.CustomerID, again an entirely different number of rows. It was only when I chose to select the entire order-detail object that I got the number of rows I was expecting.

Something strange seems to be going on. On nobody's definition of SQL should I be getting a different number of rows depending on the columns in the selection (unless that SELECT has a DISTINCT operator specified). It was as if all queries were getting a default DISTINCT operation. When I took a look at the results when I selected out only ProductID, sure enough, all I got back were distinct product-id's. MySQL has an 'ALL' keyword that is the semantic opposite of DISTINCT, so I tried adding that to the query, but to no avail.

I don't know if this is a problem in the query engine or the transport layer, or if the MySQL folks actually think it is appropriate behavior. As far as I'm concerned it is catastrophically bad; E_FAIL. For this problem alone I would recommend not using MySQL. Still, it may not be as bad as it sounds. It is likely very rare that you'd ever actually write a query that was expected to retrieve duplicate rows of data. So it may not ever impact you at all. I do recommend testing all your queries (and your application) before you put them into production.

SQLite - This is an open-source database product that is similar in nature to MS Access and SQL Compact, since it runs in-proc and is not really a server. After wrangling with MySQL, I thought I knew all I needed to know about different forms of SQL, until Jonathan Peppers sent me his go at making a SQLite provider. (See, I told everyone that I'd add new providers if they would send them to me.) Of course, SQLite had a few problems of its own.

While MySQL has a huge library of functions, SQLite has a tiny one; many .NET APIs don't have supported translations. (That is, of the set I had translated for MS SQL. To be truthful, MS Access falls short on some of these too.) So queries using APIs other than some simple string manipulation or equality testing probably won't work against SQLite.

In addition, one of SQLite's big drawbacks is its lack of a rich type system. SQLite's developers claim this to be a feature, and it may afford great flexibility, but it is often a death sentence for LINQ queries, which are all strongly typed. SQLite does have types, but only a few of them, like number, text and binary. Problems arise when you try to use DateTimes, as they are not really their own type in SQLite, but a text layout. For example, if you were trying to find all orders for a customer that happened in January, you might write this query:

from o in db.Orders
where o.CustomerID == cust && o.OrderDate.Month == 1
select o

which produces this SQL:

SELECT o.OrderID, o.OrderDate
FROM Orders AS o
WHERE o.CustomerID = @cust AND STRFTIME('%m', o.OrderDate) = 1

The date function that extracts the month out of the OrderDate column is really a text formatting function that pulls the month portion out of the date-time text layout, which in this case is the text '01', since the date-time format is always padded to a specific width. When this is compared against the number (1), another subtle difference crops up. For all other SQLs I've come across, text is considered the weakest form of type, so when two types are compared for equality the text is always converted to the other form. Yet, in SQLite, the opposite is true (which truthfully is more like C# and Java). So when the two types are compared they are found incompatible, because the number (1) is turned into the text '1' and that is not the same as the text '01'. Adding insult to injury, there appear to be no type conversion functions at all, so I can't even work around the problem by injecting a conversion. My ignorance of SQLite may just be showing here, or a lack of sufficient documentation.

So my recommendation, if you are using SQLite, is to stay away from most API functions in your where clauses and such. API calls in the projection are okay, because these get executed on the client. If you really do need date logic in a filter, one way to cope is sketched below.
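One pragmatic way to get this particular query working is to let SQLite handle the part it translates well and finish the date logic in memory. This is only a sketch of the idea, not a toolkit feature; it assumes the same Northwind-style context and `cust` variable as the query above:

var janOrders = db.Orders
    .Where(o => o.CustomerID == cust)      // translated and executed by SQLite
    .AsEnumerable()                        // switch to LINQ to Objects from here on
    .Where(o => o.OrderDate.Month == 1)    // date logic evaluated on the client
    .ToList();

This pulls back more rows than strictly necessary, but it sidesteps the STRFTIME comparison problem entirely.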
Transactions

I'm surprised no one's called me on this before. The DbQueryProvider and its ilk have been suspiciously lacking in support for transactions. The providers work, LINQ queries are converted into ADO Commands and executed, yet those ADO Command objects are never assigned an ADO transaction, even if you started one explicitly.

Of course, the official word from Microsoft is to stop using ADO transactions altogether and instead use the System.Transactions.TransactionScope object, which is newer, better and enables automatic use of distributed transactions, etc, etc, etc. And if you did use TransactionScope, then the problem I'm referring to would not be a problem; SqlCommand objects would implicitly enlist in the transaction without me having to specify anything.

Unfortunately, TransactionScope is mired in many problems and is not supported by all ADO providers, so ADO transactions are still a necessity. You can now use ADO transactions with query providers in a manner similar to LINQ to SQL.

provider.Transaction = provider.Connection.BeginTransaction();

// use the provider here to execute queries and updates, etc.

provider.Transaction.Commit();
provider.Transaction = null;

The provider will use whatever transaction object you give it when it creates new ADO command objects.

Entity Providers

I decided to formalize the pairing of query providers with a Table object that enables updates and other facilities. An entity provider is now defined by these three interfaces:

public interface IEntityProvider : IQueryProvider
{
    IEntityTable<T> GetTable<T>(string tableId);
    IEntityTable GetTable(Type type, string tableId);
}

public interface IEntityTable : IQueryable, IUpdatable
{
    new IEntityProvider Provider { get; }
    string TableId { get; }
    object GetById(object id);
    int Insert(object instance);
    int Update(object instance);
    int Delete(object instance);
    int InsertOrUpdate(object instance);
}

public interface IEntityTable<T> : IQueryable<T>, IEntityTable, IUpdatable<T>
{
    new T GetById(object id);
    int Insert(T instance);
    int Update(T instance);
    int Delete(T instance);
    int InsertOrUpdate(T instance);
}

You can now always get to a table directly from a provider; the two concepts are coupled together. An entity table also has explicit CRUD methods and implements IUpdatable, so there is no more separation between normal tables and updatable tables. In my mind this simplifies things quite a bit.
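To make that concrete, here is a rough sketch of how the table interface is meant to be used. The Customer class and the "Customers" table id come from the Northwind sample mapping used throughout this series, and GetMyProvider() is just a stand-in for however you construct your provider:

IEntityProvider provider = GetMyProvider();        // stand-in: construct your provider however you like
IEntityTable<Customer> customers = provider.GetTable<Customer>("Customers");

Customer cust = customers.GetById("ALFKI");        // fetch by primary key
cust.ContactName = "Maria Anders";
customers.Update(cust);                            // sends an UPDATE immediately

customers.Insert(new Customer { CustomerID = "XYZZY", CompanyName = "Xyzzy Ltd" });

Each of these calls hits the database right away; the deferred, change-tracked style of working is what entity sessions, described below, add on top.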
Of course, this caused me to want to rename DbQueryProvider. Ooops. This will likely cause you some grief, as any of your existing code that was using DbQueryProvider directly is now not going to compile. The new name for this class is DbEntityProvider. It might not matter so much now that there is a nifty IEntityProvider interface.

Entity Sessions

One thing missing from the Toolkit so far has been all of that context stuff that LINQ to SQL and LINQ to Entities have. When you use LINQ to SQL you have a change tracking service that detects when your objects change, and sends the updates for you all at the same time when you call SubmitChanges.

An entity session is all of this change-tracking, deferred-updating stuff packaged up together. It is distinctly different from an entity provider in some ways, yet similar to one in many others.

An entity session is defined below:

public interface IEntitySession
{
    IEntityProvider Provider { get; }
    ISessionTable<T> GetTable<T>(string tableId);
    ISessionTable GetTable(Type elementType, string tableId);
    void SubmitChanges();
}

public interface ISessionTable : IQueryable
{
    IEntitySession Session { get; }
    IEntityTable ProviderTable { get; }
    object GetById(object id);
    void SetSubmitAction(object instance, SubmitAction action);
    SubmitAction GetSubmitAction(object instance);
}

public interface ISessionTable<T> : IQueryable<T>, ISessionTable
{
    new IEntityTable<T> ProviderTable { get; }
    new T GetById(object id);
    void SetSubmitAction(T instance, SubmitAction action);
    SubmitAction GetSubmitAction(T instance);
}

public enum SubmitAction
{
    None,
    Update,
    PossibleUpdate,
    Insert,
    InsertOrUpdate,
    Delete
}

public static class SessionTableExtensions
{
    public static void InsertOnSubmit<T>(this ISessionTable<T> table, T instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Insert);
    }

    public static void InsertOnSubmit(this ISessionTable table, object instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Insert);
    }

    public static void InsertOrUpdateOnSubmit<T>(this ISessionTable<T> table, T instance)
    {
        table.SetSubmitAction(instance, SubmitAction.InsertOrUpdate);
    }

    public static void InsertOrUpdateOnSubmit(this ISessionTable table, object instance)
    {
        table.SetSubmitAction(instance, SubmitAction.InsertOrUpdate);
    }

    public static void UpdateOnSubmit<T>(this ISessionTable<T> table, T instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Update);
    }

    public static void UpdateOnSubmit(this ISessionTable table, object instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Update);
    }

    public static void DeleteOnSubmit<T>(this ISessionTable<T> table, T instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Delete);
    }

    public static void DeleteOnSubmit(this ISessionTable table, object instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Delete);
    }
}

As you can see, an entity session has tables, just like a provider. Yet those tables are not directly updatable. Instead, you assign submit actions to entity instances; these are the actions that take place later when you call SubmitChanges. There are a bunch of extension methods defined to add the LINQ to SQL-style InsertOnSubmit() methods to the interface. These simply call the SetSubmitAction() method for you.

Also note that a session is not a provider. It is a service used in conjunction with a provider.
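Here is a sketch of the intended workflow, again using the Northwind sample entities. The session variable stands for whatever IEntitySession implementation you are using (the Toolkit's concrete one is introduced just below), and GetMySession() is only a placeholder:

IEntitySession session = GetMySession();           // placeholder for however you obtain a session
ISessionTable<Customer> customers = session.GetTable<Customer>("Customers");

Customer cust = customers.GetById("ALFKI");
cust.Phone = "030-0074321";
customers.UpdateOnSubmit(cust);                    // queued as an UPDATE

customers.InsertOnSubmit(new Customer { CustomerID = "XYZZY", CompanyName = "Xyzzy Ltd" });

session.SubmitChanges();                           // nothing touches the database until here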
You can use multiple different sessions with the same provider instance. + +There is one current implementation of an entity session in the Toolkit called (you guessed it) DbEntitySession. You create a DbEntitySession by giving it an existing DbEntityProvider. The DbEntitySession hooks the provider in such a way that it gets first crack at all materialized objects before they are returned to you. In this way, the DbEntitySession can employ an identity cache so queries that retrieve the same entity will always return the same entity instance, and it can start automatic change tracking on all entities returned. + +You are also not locked into the session's behavior. At any time you can interact with the underlying provider instead for retrieving entities without passing through the identity cache or being changed tracked. You can even get to the provider's table directly off a session table. + + +Provider Factory + + +Now with so many providers and one single way to write queries you'd think it would be easy to switch between them. In reality it is not. You have to pick the provider you want, reference its library (IQToolkit.Data.XXX), reference its corresponding ADO library (System.Data.XXX), create the ADO connection, the mapping object and construct the provider. + +var connection = new SqlConnection("..."); +var mapping = new AttributeMapping(typeof(Northwind)); +var provider = new SqlProvider(connection, mapping, QueryPolicy.Default, null); +var db = new Northwind(provider); + +You can hide this all inside your database context class (or whatever you want to call yours), so you only have to write it once, but then your context class is tied to a specific provider. Instead, you could wrap this code up into a factory method of your own devising, but then calls to the factory would be spread throughout your codebase. There no good way to defer all this work to some configuration setting. Until now. + +Introducing the new factory methods built into DbEntityProvider. + +public static DbEntityProvider FromApplicationSettings(); + + +public static DbEntityProvider From(string filename, string mappingId); + +public static DbEntityProvider From(string provider, string connectionString, string mappingId); + + + +These methods allow you to get up and running with only knowing a few bits of information. You don't have to hard link you application to any particular provider. + +The FromApplicationSettings method creates you a new instance of a provider from information found in the config file. It looks for the "Provider", "Connection" and "Mapping" properties in the configuration and feeds them to the other factories. It is also possible to look this information up in web settings, but I have not formalized that one yet. + +The provider argument is a string that refers to the name of an assembly that contains the query provider. These are generally of the form IQToolkit.Data.XXX. If that assembly is not loaded, it will be loaded dynamically. This assembly can be in the assembly cache or in the same directory as your app (or other places that the runtime might look.) From this assembly it will look for a type in the same namespace (as the name of the assembly) that derives from DbEntityProvider. + +The connectionString and filename arguments are really the same thing. You can specify either the name of a database file or a full ADO connection string. If a file is specified, a correct connection string is obtained by calling the static GetConnectionString(string) method on the provider. 
A provider may be inferred from the file extension of a database file if none is specified. + +The mappingId can either refer to the name of a context class (like Northwind) that has mapping attributes on it or the name of an xml file. + +So now you can write code like this to get your provider. + +var db = new Northwind(DbEntityProvider.From(somedbfile, somemapfile)); + +Or better yet, you can use the FromApplicationSettings() method in the constructor of your context and still be configurable at runtime. + + +Madness + + +Of course, it wouldn't be a new toolkit release without some additional crazy changes. One significant change is namespaces again. This time its not going to conflict with your code too much. Most of the classes that where in IQToolkit.Data have been demoted into the namespace IQToolkit.Data.Common. This includes most all classes that are implementation detail or base classes. Mapping attributes and the like are now in IQToolkit.Data.Mapping. This makes the namespace clean and obvious when you start looking for things via intellisense. + +DbEntityProvider and DbEntitySession are the only classes sitting in IQToolkit.Data, as these are the ones you'll likely need to reference when writing code. IEntityProvider and IEntitySession are in IQToolkit namespace, because they are not specific to ADO (System.Data classes). + + + +I hope you find this version feature rich enough to either build application directly on top of it, or model your own provider or data layer by using these techniques. + + +Don't forget the audio tapes. + diff --git a/docs/blog/building-part-16.md b/docs/blog/building-part-16.md new file mode 100644 index 0000000..d80f503 --- /dev/null +++ b/docs/blog/building-part-16.md @@ -0,0 +1,189 @@ +# Building a LINQ IQueryable Provider – Part XVI (IQToolkit 0.16): Performance and Caching + +Matt Warren - MSFT; September 15, 2009 + +-------------------------------------- + + +This is the sixteenth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you might try rolling into a ball and crying for momma. That’s right, reading on is as hazardous to your health as a skinned knee. Just be warned and have an anti-biotic on hand. + + +Complete list of posts in the Building an IQueryable Provider series + + +I would have had the update out sooner but I couldn’t think of what to write in this excuse slot. My excuse generator was down for scheduled maintenance but did not come back on line as planned and wouldn’t you know it the techie responsible gave me all sorts of lame excuses. Some people! + + +What's inside: + + + +Performance – I’ve actually tried to improve things. + + +Caching – Re-use queries without the burden of defining delegates. + + +Evaluation – Execute expression trees without Reflection.Emit. + + +Bug Fixes – Most of the bugs reported on CodePlex are fixed. + + + +The full source code and redistributable DLL's can be found at: + + +http://www.codeplex.com/IQToolkit + + +Faster is Better + +One of the big things I tried to tackle this time around was to finally do some performance improvements. Until now, the only performance considerations were in the design of compiled queries and the use of compiled LINQ expressions for materializing objects. Yet, when I looked at actual performance of compiled queries versus straight ADO, there was still a lot of overhead. + + +Where was the time being spent? 
As it turns out, even though I was compiling a LINQ expression to represent the execution plan and an explicit function to convert DataReader rows into objects, ideally making the writing of data into objects as fast as possible, there was still room for improvement. The problem was not the writing of data into objects, but reading the data from the DataReader. I wholely blame myself for this. In an attempt to simplify the IQToolkit source code way back in one of the original posts I chose to read data using the general DataReader.GetValue method. This has two undesirable side effects; 1) the value is returned typed as object, which mean all non reference types (mostly numeric and datetime types) are boxed, which is measurably significant, and 2) in order to make sure the value was in the correct form to write it into the object I now had to coerce it back (which immediately led to an unbox operation, equally as significant.) + + +I tried many variations of solutions. Ideally, the runtime code would simply know which field access method of the DataReader to call and to coerce the results only if necessary. Unfortunately, the translation engine does not track enough information to be 100% certain of the data type the provider will return. It can make a good guess, but if it is wrong then the query will die in a fiery exception that halts the program and sends your manager storming toward your office. The solution I chose was sort of a hybrid. Based on an expected value type a type specific helper method is now invoked. This method calls the equivalent DataReader method inside of a try-catch. I know, I hate having to use try-catch in this way, but the cheap cost of setting up the try-catch and the rare condition where the guess is wrong led me to the dark side. I will now change the color of my light-saber from blue to red. + +Here’s an example of the GetInt32 helper method. + +public static class FieldReader + +{ + ... + public static Int32 GetInt32(DbEntityProviderBase provider, DbDataReader reader, int ordinal) + + { + + if (reader.IsDBNull(ordinal)) + + { + + return default(Int32); + + } + + try + + { + + return reader.GetInt32(ordinal); + + } + + catch + + { + + return (Int32)provider.Convert(reader.GetValue(ordinal), typeof(Int32)); + + } + + } + + ... + +} + +As long as the expected type is correct, the faster DataReader.GetInt32 method is called. If that fails, then the fallback is to call the general GetValue method and coerce the result. This should rarely happen. + + +This is all it took to get the compiled query into very low overhead versus direct ADO calls; mostly less than 3%. I’ve added a performance test you can run to check it out on your set up. Of course, this will vary depending on the query, the hardware, the load on the server and the network latency. + + +Caching Queries + +“Why can’t the provider simply cache the queries for me.” I’ve gotten this request a lot. Sometimes from direct pleas in email, other times from those of you trying to do it yourself and asking for advice. + + +It seems natural to imagine that it would only take a little bit of work for a provider to determine if a query being executed now is the same or similar to one executed before, so why is the only way to re-use a query to specify one using the cryptic syntax of generic methods, lambda expressions and delegates? And why do I have to hold onto the thing myself, can’t some service do this for me? 
+ + +Of course, my usual reaction is to give a heavy sigh in the privacy of my office and then craft a quite sensible fire-and-brimstone reply, complete with infallible logic and dramatic consternation, as to why and how you really are better off with the compiled query solution. But I’m tired of doing that, so instead of impressing you with my sound reasoning I’m going to show you how I went ahead and just did it. I like challenges, and there’s no better challenge than the impossible, or the sometimes impractical, or the generally ill-advised. + + +What I built is a new piece of code called the ‘QueryCache’. It is actually implemented to be generic enough that it will work with your own IQueryable provider. Yet its not currently integrated into any provider, though you may choose to embed it into yours. You can, however, use the cache as is to execute your queries and take advantage of its cache-y goodness. You don’t have to make delegates and invoke them, you simply have to give the cache your query and it will give you back the results. + + +var cache = new QueryCache(10); + +var query = from c in db.Customers where c.City == "London" select c; + +var result = cache.Execute(query); + +Here’s how it works. The cache maintains a most-recently-used list of compiled queries. Every time you execute a query via the cache, the cache compares your query against the ones in the cache. If it finds a match, it simply uses the existing compiled query and invokes it. If not, it makes a new compiled query and adds it to the list. + + +Of course, that’s the easy part. + + +The hard part is figuring out how to compare an IQueryable query object against a list of compiled-query delegate objects and determine which ones can be reused. For example, are these two the same query? + + +var query1 = from c in db.Customers where c.City == "London" select c; +var query2 = from c in db.Customers where c.City == "Seattle" select c; + +Technically they are different expression trees, but if that’s the deciding factor then I might as well give up now. They are structurally similar and so it is logical to assume that a query compiled for one should be nearly identical to a query compiled for the other. If I were using compiled queries directly I would simply choose to make the name of the city a parameter to the compiled query delegate and invoke it with different values. So isn’t that just what I want the cache to do for me? + + +To do this I need a little tool that will take a look at a query tree and decide which bits should be considered parameters and then give me back a new query tree with the parameters in place of those bits. As it turns out this is not really all that hard since it seems obvious for the most generality that anything that appears in the query tree as a constant should be deemed a parameter. A trivial expression visitor can produce this. + + +If it also wraps the rewritten query tree in a lambda expression and gives me back the set of constants that it extracted then I have everything I need to make a compiled query and invoke it with an array of values. I also have what I need to find an already compiled query in the cache, since compiled queries hold onto their defining query trees. So if my first query in the example above is already in the cache, it already has a parameterized query tree defining it and that ought to look awfully similar to the parameterized version of the second query. + + +The only thing I need now is a way to compare two lambda expressions to see if they are similar. 
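To make the idea concrete, here is a minimal sketch of such a parameterizer. It is not the toolkit's own implementation (the toolkit has its own visitor base class from earlier in this series, and its real parameterizer is more discriminating); this version is written against the public ExpressionVisitor that later shipped in .NET 4.0:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// A sketch only: every constant that is not the query root becomes a parameter,
// and the extracted constant values are handed back so a cached delegate can be invoked.
class ConstantParameterizer : ExpressionVisitor
{
    readonly List<ParameterExpression> parameters = new List<ParameterExpression>();
    readonly List<object> values = new List<object>();

    public static LambdaExpression Parameterize(Expression query, out object[] arguments)
    {
        var visitor = new ConstantParameterizer();
        var body = visitor.Visit(query);
        arguments = visitor.values.ToArray();
        return Expression.Lambda(body, visitor.parameters.ToArray());
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        // leave the IQueryable root (the table itself) alone
        if (typeof(IQueryable).IsAssignableFrom(node.Type))
            return node;

        var p = Expression.Parameter(node.Type, "p" + this.parameters.Count);
        this.parameters.Add(p);
        this.values.Add(node.Value);
        return p;
    }
}

Two structurally identical queries parameterized this way produce the same lambda shape with different argument arrays; all that remains is a way to compare those shapes.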
Fortunately, I wrote that ages ago. The ExpressionComparer does just this. + + +If I cobble all these parts together into a reasonable order I get my QueryCache. Now, I can use compiled queries without ever having to manually define them again! Huzzah! + + +Yet, if this is so grand, why haven’t I taken that one last step and simply plugged into the base DbEntityProvider class? + + +Unfortunately, reality is not as rosy and I would hope. + + +There are a few problems holding me back. The most significant is the silly act of parameterizing itself. The problem with the cache is that it doesn’t know which constants in the query should be made into parameters, so it turns them all into parameters. Yet, as much as I’d like it to be otherwise, databases don’t treat all parameters equally. Sometimes a parameter is terrific for performance; a database like SQL Server optimizes by storing parameterized query plans so you basically get the same benefit client and server. Yet, sometimes, for some bits of SQL, for some back ends, parameters either cannot be used or have crippling effects on speed. So it's usually the best policy to be explicit and judicious when using parameters, something the cache in all its glory cannot chose for you. + + +So I leave the cache, for now, as something you can explicitly opt-in to using, per query. + + +Please, give it a try. Feedback is welcome. + + +Expression Evaluator + +Sometimes you may want to execute expression trees as code without actually turning it into true runtime code. You may be running on a platform that does not currently support the Expression.Compile() method (or the ability to use Reflection.Emit to generate IL at runtime) like Windows Mobile. I’ve been encouraged by folks here at Microsoft to think about this from time to time, so I set out to explore this space and the result is a whole mess code called the ExpressionEvaluator, which is an expression tree interpreter. I added my experimental solution to the toolkit in case it is beneficial for someone. It is a lot of code, so it doesn’t compile into the library by default. You have to enable it with a specific #define, NOREFEMIT. This will also switch over all direct uses of Reflection.Emit in the toolkit to using this evaluator. + + +It has a hefty performance overhead, so I would not consider it a viable alternative to Reflection.Emit. I’ve even added many tricks to avoid calls to reflection and boxing. This has made it a lot faster than a naive solution, but still it pales in comparison to real IL. However, in cases where there is no alternative, its probably the best you are going to get. + + +Crazy Antics + +In keeping with the spirit of making changes for no good reason, I’ve unbundled the base set of toolkit code from the ADO provider specific bits. So now if you are using IQToolkit to build your own non-ADO based provider you don’t have to be dragging the rest of it along for the ride. All the SQL translation and DbEntityProvider goodness is now in its own DLL, IQToolkit.Data.dll. This means you’ll probably have to tweak your projects to include this new DLL, but that’s about all. + + +The DLL list is now: + + + +IQToolkit.dll +IQToolkit.Data.dll +IQToolkit.Data.Access.dll +IQToolkit.Data.SqlClient.dll +IQToolkit.Data.SqlServerCe.dll +IQToolkit.Data.MySql.dll +IQToolkit.Data.SQLite.dll + + +Of course, as always, you only need around the ones you are using, or none of them if you are simply lifting the code out as source. 
+ + +That’s All Folks + +I’m sure there are many more wonderful nuggets of goodness I added but forgot to mention. If you discover any of them please file a full report at http://www.codeplex.com/iqtoolkit. \ No newline at end of file diff --git a/docs/blog/building-part-17.md b/docs/blog/building-part-17.md new file mode 100644 index 0000000..d8d3010 --- /dev/null +++ b/docs/blog/building-part-17.md @@ -0,0 +1,432 @@ +# Building a LINQ IQueryable Provider – Part XVII (IQToolkit 0.17): Automatic Caching and Loading Policies + +Matt Warren - MSFT; February 9, 2010 + +------------------------------------ + +This is the seventeenth in a series of posts on how to build a LINQ IQueryable provider. If you thought this series would last this long then you must have some eerie metaphysical powers of insight that go beyond mere blind faith in the gods of the interwebs. So powerful, in fact, you may now be considered a threat to national security, since with all your witchery and kanoodling you might be secretly tapping the thoughts and provoking the actions of persons in higher office into doing unthinkable, irrevocable things. You may think you are doing good by protecting the innocent from master villains like Sylar, but the normals will still fear you anyway. You might want to keep a lookout for strange black sedans parked down the street from your home. I know I do. + + +Complete list of posts in the “Building an IQueryable Provider” series + + +I feel like I’m finally putting the finishing touches on this masterpiece. Sure, there’s always more to do, but the deliberate loose ends that I left in the product so you would have something to fret about are finally being seen to. + + +What's inside: + + + +Cleaning House– I’ve thrown down the gauntlet on FieldReader. + + +Automatic Caching – Entity providers get automatic query caching. + + +Automatic Loading – Finally a usable implementation of QueryPolicy. + +Split Personalities – Language, Mapping and Policy each split into two classes. + +Provider Abstraction – Build entity providers that use API’s other than ADO. + + +The full source code and redistributable DLL's can be found at: + + +http://www.codeplex.com/IQToolkit + + + +Fielding a new Reader + +I was not too happy with the state of the FieldReader class in the last installment. It was a bit quirky that it relied on exception handling to function whenever the entity member type did not match the database’s column type, at least in how the database provider chose to represent it. I normally pride myself on sticking to my guns, and I’ve always told myself to never, ever, use exception handling for anything except for handling truly exceptional cases, and a mismatch like this was not really an exceptional case, since it happens quite a lot when the whole purpose of having a mapping system is to allow for just this sort of flexibility between the representation of objects in the programming language and that of the tables in the database. And besides, it wasn’t fast enough. + + +What I was really trying to do was enable the object materialization code that is built by the ExecutionBuilder class to operate as fast as it could, without relying on a bunch of runtime checks and coercions to get the data into the right form. 
I wanted to be able to determine at query build time the correct method to call on the DataReader without having to verify each time a field is read that its going to work (because most of the ADO providers just give up and throw exceptions when you ask for the data in the wrong form). I introduced the FieldReader class as a way to isolate the logic that would attempt to reduce this overhead (so I didn’t have to force all this logic in to pure expression node format). Yet, the FieldReader itself still had the problem of trying to figure out if the call it was about to make against the DataReader was going to succeed or not. So to avoid the call to GetFieldType() each time, I bailed out and just wrapped the whole thing in a try-catch, falling back to the expensive GetValue call on the DataReader whenever the provider balked at my tenacity. + + +Looking at the problem again, I realized that it wasn’t the call to GetFieldType() that was so expensive, it was calling it every time I read the field. If I could simply call it once per query per field, then I could know the right course of action for each row without the cost, and without the exceptions. What I needed to be able to do was to save state about the specific query execution, per data reader, per field. But alas, the FieldReader class was just a bunch of static methods. + + +I needed to build a better mouse trap. + + +Getting state to be stored per query is not too difficult, since I create a new instance of a materializer for each query, so there is at least something to stick data onto, even if it might be awkward to do it. Yet, getting state per DataReader is much more difficult since there may be multiple DataReaders in a single query (think hierarchical results with preloaded associations.) What I needed was an abstraction for reading fields that would be different and adjustable for each DataReader. Of course, that’s the point where the aha for the aha moment occurred, and I realized that what I really needed was to abstract the DataReader out of the problem. + + +Introducing the new FieldReader. Its now a DataReader, or at least it looks like one. It’s the equivalent of a DataReader but defined by the entity provider, not the ADO provider. It wraps a DataReader and holds onto whatever state it pleases. + + +So let’s take a look at how it gets to be fast. + + +The FieldReader is an abstract base, and it doesn’t know a thing about DataReader’s at all. It exposes ReadXXXX methods to the materializer, and abstracts over a DataReader by declaring abstract methods that correspond to DataReader methods. 
public abstract class FieldReader
{
    TypeCode[] typeCodes;   // one TypeCode per field, computed lazily

    public FieldReader()
    {
    }

    protected void Init()
    {
        this.typeCodes = new TypeCode[this.FieldCount];
    }

    public Byte ReadByte(int ordinal)
    {
        if (this.IsDBNull(ordinal))
        {
            return default(Byte);
        }
        while (true)
        {
            switch (typeCodes[ordinal])
            {
                case TypeCode.Empty:
                    typeCodes[ordinal] = GetTypeCode(ordinal);
                    continue;
                case TypeCode.Byte:
                    return this.GetByte(ordinal);
                case TypeCode.Int16:
                    return (Byte)this.GetInt16(ordinal);
                case TypeCode.Int32:
                    return (Byte)this.GetInt32(ordinal);
                case TypeCode.Int64:
                    return (Byte)this.GetInt64(ordinal);
                case TypeCode.Double:
                    return (Byte)this.GetDouble(ordinal);
                case TypeCode.Single:
                    return (Byte)this.GetSingle(ordinal);
                case TypeCode.Decimal:
                    return (Byte)this.GetDecimal(ordinal);
                default:
                    return this.GetValue<Byte>(ordinal);
            }
        }
    }

    // ... the remaining ReadXXXX methods and the abstract accessors
    // (FieldCount, IsDBNull, GetByte, GetInt32, GetValue<T>, etc.) are elided ...
}

The ReadByte method shows an example of how this works. The FieldReader holds onto an array of TypeCodes, one per field, and uses it to determine which underlying method to call on the DataReader. It doesn't actually call the DataReader directly, since the FieldReader class does not actually know about a DataReader, but it does have abstract methods that a provider can implement to map onto its own model of reading data. The first time this method is called for a particular field ordinal, the TypeCode has a value of Empty. If this is the case it simply computes the correct TypeCode and tries again. A switch statement directs the code to the correct data reading method.

Now that I've done it, it seems almost obvious. Getting the rest of the stack to talk about FieldReaders instead of DataReaders took some work, and led to more fiddling and changes, but I'll get to that later.

Automatic Caching

In the last version I introduced the QueryCache object and showed how you could use it to get many of the benefits of a pre-compiled query without actually pre-compiling it. Yet, still, in order to use it you had to call it directly each time you wanted to execute a query. That was a bit awkward, but I was hesitant about going further at the time, worried that an automatic caching feature might have undesired effects for particular kinds of queries. I speculated that it was not possible to determine precisely which sub-expressions in a query ought to be handled as database parameters and which could be safely identified as constants (some queries cannot work with parameters in particular spots). However, an IQToolkit user called me out on this and showed me the light. I was humbled.

Now you are receiving the benefit in a new feature: the automatic query cache.

Each EntityProvider (the new name for the provider base class, and yes it does change every version) now has a Cache property that can be assigned an instance of a QueryCache. If the entity provider is given a cache it will automatically use that cache when executing queries. So now you can simply execute queries like normal, without pre-compiling, and if you execute what looks to be the same query again, it executes faster the second time.

How much faster is it really?

Unfortunately, it is still not as fast as having pre-compiled queries. Not by a long shot. The cost of actually looking in the cache and finding an equivalent already compiled query is still rather expensive.

What is the expensive part?
+ + +The most expensive part of this re-use is not comparing expression trees, but isolating the parameter values. I have to break the query down into two parts, the common query base and the parameters specified in the new query. I then match the new query’s common base with the prior executed queries to find a previously compiled query to reuse, but in order to execute the previous query I need to supply the new parameter values and in order to do that I have to turn expression tree fragments into values. + + +I can evaluate LINQ expressions by turning each one into a zero-parameter lambda and using LINQ’s own feature of compiling lambdas into delegates that can then be invoked to get values. Yet that feature, like pre-compiled queries, works great if I am going to evaluate the same expression over and over again. It’s a huge amount of overhead if I just want to evaluate the expression once. + + +The worse part of it all is realizing that was exactly what the partial evaluator was doing. It was isolating sub expressions and evaluating them using the LINQ compile feature, and yet these compiled delegates were only ever being invoked once. + + +I seriously needed a way to evaluate LINQ expressions without using Reflection.Emit. This is probably the point where YOU think “aha”, and predict I’m just being sneaky by leading up to discussing that very same expression evaluator that I included as source code in the last version. But I’m not. I’m not sneaky or even leading up to it. Because that evaluator wasn’t any faster. + + +What’s a developer left to do? + + +Handle the common cases as simply and straight-forward as possible and leave the ugly stuff to LINQ. So, I’ve added a bit of *quick* direct reflection calls to the partial evaluator that gets used in the common cases of basic field accesses, which happens a lot in C# when you write queries that reference locals that get captured in state classes. + + +It is really bizarre to discover that reflection API is the faster solution, after spending so much effort trying to work around it. + + +Automatic Loading + + +The only piece missing so far in making the toolkit a ready-to-use library for building apps on top of was its lack in any meaningful way of controlling deferred loading of associations. I designed in the QueryPolicy abstraction as a way to guide the query translator and executor in doing the right thing, but the only way of actually setting up a policy was to invent your own implementation of one. I figure about three of you actually did that. The rest sent me mail saying that deferred loading did not work. + + +Fortunately, I’m here to tell you that has all changed. There’s a new class in town and its ready to be instantiated. + + +I give you the EntityPolicy. + + +It works similarly to LINQ to SQL’s DataLoadOptions. You construct an instance of one and then call some methods on it telling it how you would like each association to be loaded or not and whether to either defer its loading or load it up front. + +var db = new Northwind(provider); + +var policy = new EntityPolicy(); + +policy.IncludeWith(c => c.Orders); + +provider.Policy = policy; + + +var cust = db.Customers.Single(c => c.CustomerID == "ALFKI"); + + +Use the IncludeWith() method to instruct the provider to include the Orders property when the Customer type is retrieved. This will result in an query returns back all the data for customers and their related orders. 
+ + +The query looks like this when run against the Access test database: + + +PARAMETERS p0 NVarChar; +SELECT t1.[CustomerID], t1.[OrderDate], t1.[OrderID] +FROM [Customers] AS t0 +LEFT OUTER JOIN [Orders] AS t1 + ON (t1.[CustomerID] = t0.[CustomerID]) +WHERE (t0.[CustomerID] = p0) +-- p0 = [ALFKI] + + +PARAMETERS p0 NVarChar; +SELECT t0.[City], t0.[CompanyName], t0.[ContactName], t0.[Country], t0.[CustomerID], t0.[Phone] +FROM [Customers] AS t0 +WHERE (t0.[CustomerID] = p0) +-- p0 = [ALFKI] + + + +It is actually two queries; one for customers and one for orders. + + +If you want to retrieve order details too you’ll need to modify your policy like this: + +var db = new Northwind(provider); + +var policy = new EntityPolicy(); + +policy.IncludeWith(c => c.Orders); + +policy.IncludeWith(o => o.Details); + +provider.Policy = policy; + + +var cust = db.Customers.Single(c => c.CustomerID == "ALFKI"); + + +And the query will turn into three separate queries; one each for customers, orders and order-details: + + +PARAMETERS p0 NVarChar; +SELECT t2.[OrderID], t2.[ProductID] +FROM ([Customers] AS t0 +LEFT OUTER JOIN [Orders] AS t1 + ON (t1.[CustomerID] = t0.[CustomerID])) +LEFT OUTER JOIN [Order Details] AS t2 + ON (t2.[OrderID] = t1.[OrderID]) +WHERE (t0.[CustomerID] = p0) +-- p0 = [ALFKI] + + +PARAMETERS p0 NVarChar; +SELECT t1.[CustomerID], t1.[OrderDate], t1.[OrderID] +FROM [Customers] AS t0 +LEFT OUTER JOIN [Orders] AS t1 + ON (t1.[CustomerID] = t0.[CustomerID]) +WHERE (t0.[CustomerID] = p0) +-- p0 = [ALFKI] + + +PARAMETERS p0 NVarChar; +SELECT t0.[City], t0.[CompanyName], t0.[ContactName], t0.[Country], t0.[CustomerID], t0.[Phone] +FROM [Customers] AS t0 +WHERE (t0.[CustomerID] = p0) +-- p0 = [ALFKI] + + +If you want to include Orders with Customers, but not up front, you can specify an option on the IncludeWith method to defer loading of this property. This will only work if the association property has been declared as a DeferredList or equivalent type (like a simple IList that can be assigned a DeferredList.) Otherwise, you’ll simply force the deferred query to be executed immediately upon assignment, which will be much less efficient than simply instructing the policy to include them without deferring. + +var db = new Northwind(provider); + +var policy = new EntityPolicy(); + +policy.IncludeWith(c => c.Orders, true); + +provider.Policy = policy; + + +var cust = db.Customers.Single(c => c.CustomerID == "ALFKI"); + + +Now, when you execute the query you only see the first query for customers until you actually look inside the Orders collection. + + +Automatic Filtering and Ordering Associations + + +Filtering or ordering an association collection can be done when you include it using the IncludeWith method or separately by using the AssociateWith method. 
var db = new Northwind(provider);
var policy = new EntityPolicy();
policy.IncludeWith(c => from o in c.Orders where o.OrderDate.Month == 1 select o);
provider.Policy = policy;

var cust = db.Customers.Single(c => c.CustomerID == "ALFKI");

This is logically the same as using IncludeWith and AssociateWith separately:

var db = new Northwind(provider);
var policy = new EntityPolicy();
policy.IncludeWith(c => c.Orders);
policy.AssociateWith(c => from o in c.Orders where o.OrderDate.Month == 1 select o);
provider.Policy = policy;

var cust = db.Customers.Single(c => c.CustomerID == "ALFKI");

Both produce the same queries:

PARAMETERS p0 NVarChar;
SELECT t1.[CustomerID], t1.[OrderDate], t1.[OrderID]
FROM [Customers] AS t0
LEFT OUTER JOIN [Orders] AS t1
  ON ((t1.[CustomerID] = t0.[CustomerID]) AND (Month(t1.[OrderDate]) = 1))
WHERE (t0.[CustomerID] = p0)
-- p0 = [ALFKI]

PARAMETERS p0 NVarChar;
SELECT t0.[City], t0.[CompanyName], t0.[ContactName], t0.[Country], t0.[CustomerID], t0.[Phone]
FROM [Customers] AS t0
WHERE (t0.[CustomerID] = p0)
-- p0 = [ALFKI]

It is also perfectly legal to specify the AssociateWith before the IncludeWith, or even without one. An AssociateWith by itself will still cause filtering to occur when the association property is used as part of the query.

Automatic Filtering Entities

It is also possible to specify filter operations that apply to entities any time they are referenced, not just via an association. You can use the Apply() method to apply any query operation over an entity table any time it is referenced in the query.

For example, imagine you wanted to limit the customer list to only customers in a particular city, so that any query over customers would automatically get this restriction.

var db = new Northwind(provider);
var policy = new EntityPolicy();
policy.Apply(custs => from c in custs where c.City == "London" select c);
provider.Policy = policy;

var custCount = db.Customers.Count();

So even when the Count() aggregate is used, the query includes this filter.

PARAMETERS p0 NVarChar;
SELECT COUNT(*)
FROM [Customers] AS t0
WHERE (t0.[City] = p0)
-- p0 = [London]

Of course, you can use this feature for more than just filtering, as long as the query returns the same types of objects that originated in the original table.

Split Personalities

If you dig a little beneath the surface of the new sources, or if you really are building your own providers based on the toolkit, you'll notice a big change that has occurred with the definition of the language, mapping and policy objects. Each of them has been split into two. The intention is to separate the models from the behaviors. In their prior incarnations, each of these classes both specified an API that exposed information about each area and at the same time had methods that directly took responsibility over a stage of query translation.

So with a QueryLanguage you get a QueryLinguist that applies language-specific transformations to queries; with QueryMapping you get a QueryMapper that applies the mapping; and with QueryPolicy you get the QueryPolice, which enforces the rules about associations and inclusion. (And as a benefit you can, with a straight face, tell users that yes, there are query police, and they'll come get you if you continue to write queries like that.)

Separating these out had the side effect of solving some awkward problems in the design.
For example, there was the necessity to have a fully specified QueryLanguage in existence before a QueryMapping could be created, since the translation steps of the mapper sometimes depend on the model of the language; yet a correct QueryLanguage model must be chosen by the provider, which in turn depends on the mapping. So there was an almost cyclic dependency that required the user to know more than would otherwise be necessary about the provider before constructing one.

Separating out the behaviors allows each language, mapping and policy to be specified separately from one another, enabling the provider to choose the language directly.

Provider Abstraction

It always bothered me that the backend models for these IQToolkit providers were directly tied to System.Data. It made a sort of sense, because most existing data access APIs on .NET are System.Data providers, so this dependency works for most uses. However, there is so much other code in the toolkit that could be applied to build query providers on top of other APIs that it was unfortunate that the ADO types (connection, transaction, etc.) were tied in at such a low level.

So, keeping in the spirit of always changing everything, I've once again refactored the base classes for entity providers.

EntityProvider is the new abstract base class for providers that brings in the model of query translations, languages, mappings and policies, but does not depend on ADO. DbEntityProvider then derives from this and introduces connections and transactions. The existing query providers for TSQL, Access, SQL Compact, MySQL and SQLite are all based on the DbEntityProvider class. Yet, it is now possible to build a provider that is based on another API and still make use of the rest of the infrastructure.

EntitySession is the class to use for sessions. It is not an abstract base, but it no longer depends on ADO types and functions entirely in conjunction with an EntityProvider. So it's possible to use sessions with new-style providers.

Even QueryTypeSystem has been simplified to not reference System.Data types, and a new DbTypeSystem class has been introduced to add knowledge of data types like DbType and SqlType.
\ No newline at end of file