When we are using Entity Framework as our data access choice to work with SQL Server then there are some potential pitfalls.
One of these pitfalls is doing selective includes. What do I mean by selective includes?
To answer that let's first look at our class and database model.
Lets suppose we have some customers and some invoices in our application. Our database schema and model classes could look like this:
And we may have data like this:
Now suppose we wanted to select all customer records and their invoices raised in August 2014. With plain SQL we might do something like this:
And from there we could create one customer object in C# for Mary and load in 3 invoice objects into her Invoices collection. Nice and simple.
In Entity Framework our first go at coding this problem might look something like this:
If we run this and debug it though we see that all invoices have been loaded as seen below:
The reason for this is that our code in plain English is actually saying get Customers where they have any invoices in August 2014 and include invoices.
Entity Framework will actually include all of the Invoices though - the where clause is just acting on customers - ie select customers who have such invoices.
This is counter intuitive especially if you are from an SQL background prior to working with Entity Framework. This can of course cause awful problems in your application. At worse it will just behave totally incorrectly and even if it works ok it might be very slow if you are loading lots of data that you don't actully need.
So what we need is a way to tell Entity Framework to load customers for a certain condition but also only load invoices for a certain condition as well.
It would be nice if within the Include statement we could put a Where clause on the end, like this for example.
This is just wishful thinking though and if you try this then it will just fall over.
So there are other ways of doing this.
We can of ccourse use the linq language syntax rather than the fluent api but I like the fluent api and use it for eveything so don't want to deviate from that just for this one scenario.
We could also load the customer and then filter the invoices as below:
Now the above code does work in that we end up with the data that we want. However it works by loading all customer invoices and then filtering the in memory collection. So whilst this will give you the data you want, it could be very slow particularly if a customer has thousands or millions of invoices. This is all related to deferred execution. This means that we can construct our linq queries and then add to them more and more and it is not until we iterate over the collection by using a foreach or calling ToList for example that the SQL code will actualy be executed against the database.
So ideally what we want is a way to construct our query in such a way that only one database hit is performed. We can do this but there is something a bit weird about how it works which I shall explain.
Lets first look at the syntax for a working example and then break it down.
Warning - do not make ICollections Virtual in your models otherwise this doesn't work!
Now there is something about this code that bothered me for a while and I couldn't understand it and that really bothered me. It was nice that I had a solution to my coding problem but it bothered me that to me I didn't think this code should work.
Let me explain why by looking at some simillar code but using an in memory collection of POCOs as our data store rather than EF and SQL.
My test project will use Customers and Invoices as well so I can match our Entity Framework code as much as possible.
My customer and invoice classes looks like this:
My main code which is a console application then looks like this
If we look at this code compared with my Entity Framework code we can see that it is exactly the same as shown below:
So what we are doing with both these pieces of code is using the linq Select statement to create a new anonymous type which consists of two properties. One of these properties is called c and is a Customer and the other property is called Invoices and is a collection of Invoices.
At the end of our code we sort of go backwards and use another linq Select but this time we are acting on our anonymous type and selecting just c.
Lets change our console app code slightly to omit this final Select and see what we have.
My new experimental code now has an extra bit of code in to do the above and looks like this:
If we run this and debug it we can see what is in our query before we do our final Select - ie what is in filteredCustomerTest.
It looks like this:
So we can see that we have an anonymous type which has a proprty called c which is of type Customer. This has all 6 invoices. We then have our second property called Invoices which is just the 3 invoices.
Now the problem I had is that when we no do a final Select on this in our console application we select just the c property with .Select(projection => projection.c) and in this case we will be left with the 6 invoices of course and we can see this when we run the application as follows:
This is what I would expect as we are not telling the c property in any way to use the other Invoices property. They are unrelated. This was the problem I was unable to understand with the Entity Framework query. With that it is as if it knew to use the Invoices property. So even though we have the same code, there must be something going on under the hood to explain this behaviour and this is what I would like to understand.
I suspect that the answer lies in the Select method. This method is part of linq and is an extension method of IQueryable as shown below.
When we run the Select method then the source is our DbSet of Customers and we can see in this code that the CreateQuery method is invoked against the Provider which belongs to the source.
Now I don't want to dive too far into the rabbit hole but it is clear enough that if we followed CreateQuery deep enough with the use of something like Reflector then we would eventually find some code which expalins our behavior.
This is enough to satisfy my curiosity for now and I do feel happy and confident in now constrcuting Entity Framework Selective includes.