This is more of a general approach question. I have an old ORM-like framework in this application. Basically, when it needs to load a set of objects/records, it triggers a new query for every child object in the result to fetch its data. Since the nesting can get pretty deep, that means a lot of queries to load everything. How would you approach making this more efficient?

I've left out a lot of detail to avoid blinding you with what I have (since that is not really important here).

(I'm not looking for framework suggestions.)

Question one is always: Does the loading really need to be more efficient? In other words, is there a measurable and significant wait/delay when you're loading things? If not, or if the delay is acceptably short, I wouldn't worry about it.

If there's a significant/unacceptable delay, then question two is: Where in code is that time being spent? Use a performance profiler; the results may be surprising (anecdotal evidence available if you want).

If profiling indicates the queries themselves, question three is: Do you need the data all at once, or are you using it piece by piece?

If you're using the data piece by piece, lazy loading is a decent option; it spreads the loading cost over time, and it will be far outweighed by the time spent doing whatever it is you do with the data. Also, this will avoid loading data that you don't actually use, if any.
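A minimal sketch of the lazy-loading idea, in Python with hypothetical names (`Order`, `fetch_lines` are invented for illustration): the child query runs only on first access, and the result is cached afterwards.

```python
# Lazy loading sketch: the child query runs only on first access,
# and later accesses reuse the cached result.
class Order:
    def __init__(self, order_id, fetch_lines):
        self.order_id = order_id
        self._fetch_lines = fetch_lines   # callable that would hit the database
        self._lines = None                # not loaded yet

    @property
    def lines(self):
        if self._lines is None:           # first access triggers the query
            self._lines = self._fetch_lines(self.order_id)
        return self._lines                # subsequent accesses are free

# Usage: no query happens until .lines is read.
order = Order(42, lambda oid: [("widget", 3), ("gadget", 1)])
print(order.lines)   # the (stubbed) query runs here, once
```

If the calling code never touches `.lines`, the query never runs at all, which is the "avoid loading data you don't use" part.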

If you need all of the data together, then it really comes down to the overhead of making the queries. Performance profiling will give you much better information, but off the top of my head, I can think of two things to do:

  1. Reuse the database connection for multiple queries
  2. Combine sub-queries for multiple objects

Does the loading really need to be more efficient?

Yes, it does.

Where in code is that time being spent?

As stated, in getting the child objects recursively one-by-one.

lazy loading is a decent option

I know, but I need all data in this case so it will only move the delay down the line.

Performance profiling will give you much better information

Not what I am after, but what I stated is the bottle-neck anyway.

Reuse the database connection for multiple queries

Already in place.

Combine sub-queries for multiple objects

How would I do that with deep nested child objects?

Legacy in-house. It doesn't matter; I'm looking for the logic, not a new ORM ;)

It does matter which ORM it is - is it hand-made? Are you able to add to it, does it have any extension points, or is it set in stone?

EF uses .Include with deferred execution, and that works well - maybe you can emulate it. It depends on the ORM, though!

are you able to add

It's not set in stone, that's why I'm looking for logic. I might be able to put the right method in.

EF uses .Include with deferred execution

Isn't that lazy loading? I'd prefer a method that doesn't use that (I already have it). I don't want to load the related records on demand - I need them all, via a more efficient method.

No, what I mentioned was eager loading - i.e. exactly what you need.

.Include tells the ORM what to include - i.e. which joins to generate in the SQL sent to SQL Server. It only does one database hit, and it does so using deferred execution - i.e. the query runs the first time the data is enumerated.

Oh, sorry. Saw deferred, not Include. I'll have a look.

All deferred means is that you can keep adding to the query - extra WHERE clauses and so on - but not until the query is enumerated will EF create the SQL and execute it against the database in one go.
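A rough sketch of that deferred-execution idea, in Python with invented names (this is not EF's actual implementation): each call merely records intent, and the SQL is only built and executed when the query is enumerated.

```python
# Deferred execution sketch: .where()/.include() just record what was
# asked for; SQL is generated and run once, at enumeration time.
class Query:
    def __init__(self, table, run_sql):
        self.table = table
        self.run_sql = run_sql      # callable that executes SQL, e.g. a DB driver
        self.wheres = []
        self.includes = []

    def where(self, clause):
        self.wheres.append(clause)  # just remember the clause
        return self

    def include(self, child_table):
        self.includes.append(child_table)  # remember requested joins
        return self

    def to_sql(self):
        sql = f"SELECT * FROM {self.table}"
        for child in self.includes:
            # naive naming convention, purely illustrative
            sql += f" JOIN {child} ON {child}.{self.table}Id = {self.table}.Id"
        if self.wheres:
            sql += " WHERE " + " AND ".join(self.wheres)
        return sql

    def __iter__(self):             # enumeration is the trigger: one DB hit
        return iter(self.run_sql(self.to_sql()))

# Usage: nothing runs while the query is being composed.
q = Query("Customer", run_sql=lambda sql: [sql]).include("Invoice").where("Customer.Name = 'Acme'")
results = list(q)  # only now is the SQL generated and executed
```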

Do you know of any explanations that describe how to build that one resulting query? I'm browsing/debugging the EF source, but a theoretical approach would be nice.

Does the loading really need to be more efficient?

Yes, it does.

Where in code is that time being spent?

As stated, in getting the child objects recursively one-by-one.

Performance profiling will give you much better information

Not what I am after, but what I stated is the bottle-neck anyway.

Have you actually profiled the code? It wasn't clear from your post, nor from your response.

I don't mean to pick at this; profiling is something everyone who has a performance problem should do, even if it's only to confirm what they already know about the code: nothing beats cold, hard evidence to justify spending time (== money) on this kind of improvement.

If you haven't, it might be worthwhile... I've seen some abysmally-performing .NET data access code that spent 95% of its time buried deep in the internals of DataReader digging up table metadata to determine column types, rather than on the query itself.

lazy loading is a decent option

I know, but I need all data in this case so it will only move the delay down the line.

Fair enough.

Reuse the database connection for multiple queries

Already in place.

Yay!

Combine sub-queries for multiple objects

How would I do that with deep nested child objects?

You end up with one query for each nesting level. In SQL, it ends up looking something like this:

SELECT
    ParentKey,
    ParentFields...
FROM ParentTable

SELECT
    p.ParentKey,
    c.ChildKey,
    c.ChildFields...
FROM ChildTable c
JOIN ParentTable p
    ON c.ParentKey = p.ParentKey

...modulo your data model.

If you need WHERE clauses, I'd store each SELECT statement's result into a table variable or something, then join on that, which beats filtering every table in every query.

...and you'll have to split up the child records by parent key in code, of course; that cost has to be paid somewhere.
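That splitting step can be sketched like this (Python, with made-up row shapes): group the joined child rows by parent key, then attach each group while materializing the parents.

```python
from collections import defaultdict

# Sketch of the in-code splitting step: given parent rows and joined
# child rows from the two queries, group the children by parent key.
parent_rows = [(1, "Acme"), (2, "Globex")]                 # (ParentKey, fields...)
child_rows = [(1, 10, "a"), (1, 11, "b"), (2, 12, "c")]    # (ParentKey, ChildKey, fields...)

children_by_parent = defaultdict(list)
for row in child_rows:
    children_by_parent[row[0]].append(row[1:])             # index by ParentKey

# Materialize each parent with its own slice of the child result set.
parents = [
    {"key": key, "name": name, "children": children_by_parent[key]}
    for key, name in parent_rows
]
```

One pass over each result set, so the in-code cost is linear in the number of rows returned.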

Have you actually profiled the code? It wasn't clear from your post, nor from your response.

I thought this "Not what I am after, but what I stated is the bottle-neck anyway" was an obvious yes ;)

You end up with one query for each nesting level.

EF does it in one query, wondering how to replicate that.

I know, thanks - it's just tricky to extract what you need from that.

Load-on-demand is what you are looking for. I suggest reading up on LINQ to SQL, which supports these sorts of scenarios out of the box.

Perhaps you should read the entire thread. That's not what I'm asking.

Let me have a think - working today but will post some thoughts at the weekend.

At a high level you need to provide a mechanism for client code to specify what it wants loaded.

EF with LINQ does this as follows:

var customers = context.Customers.ToList();

var customers = context.Customers.Include("Invoices").ToList();

var customers = context.Customers.Include("Invoices.InvoiceLines").ToList();

So the first says load just customers, the second says load customers and their invoices, and the last says load customers, invoices, and invoice lines.

The above usually uses lambdas rather than strings so:

var customers = context.Customers.Include(c => c.Invoices).ToList();

and so on.

So in your code you need to examine the client's specification and, when building your SQL, add joins as appropriate to get the data you need. You then need to parse the results from SQL and create OO objects from the flat data.

That is really at a very high level. Of course it is more tricky in practice, but that's the general idea, I think.
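As an illustration of that idea - a Python sketch with an invented naming convention for foreign keys; a real ORM would read these from its mappings - a dotted Include-style path can be turned into one join per nesting level:

```python
# Turn a dotted include path (e.g. "Invoices.InvoiceLines") into one
# JOIN clause per nesting level. Table/column names are illustrative:
# we assume each child table has a "<ParentTable>Id" foreign key.
def joins_for_include(root, include_path):
    joins, parent = [], root
    for child in include_path.split("."):
        joins.append(f"LEFT JOIN {child} ON {child}.{parent}Id = {parent}.Id")
        parent = child            # next level joins against this one
    return joins

# Usage: compose the final statement from the requested path.
sql = "SELECT * FROM Customers " + " ".join(joins_for_include("Customers", "Invoices.InvoiceLines"))
```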

Agreed. Using DataAnnotations and reflection I already know how to determine what is a field/FK to a table. Assuming a simple nesting level of company and contact, I can build a straightforward query. I'm not sure how to proceed if, say, the company also has a preferred transporter (also a company) in it. I know all my fields need prefixing, but how do I ensure they stay unique?

I'm happy to look at scenarios with you. Can you create some actual tables, though, and show me the SQL to create them? Then we can be sure we are looking at the right thing.

I will, but that will probably be next week. I don't have access to a computer at the moment.

Here's a simplified class hierarchy, but enough to get started I think.

// base class used for identification (reflection)
class BaseClass
{
    public int Id { get; set; } // PK, always
}

class Contact : BaseClass
{
    public string Name { get; set; }
}

class Company : BaseClass
{
    public string Name { get; set; }
    public List<Contact> Contacts { get; set; }
    public Company PreferredTransporter { get; set; }
}

SQL:

CREATE TABLE Contact (
    Id INT IDENTITY(1,1) NOT NULL,
    Name VARCHAR(100) NOT NULL,
    CompanyId INT NOT NULL)

CREATE TABLE Company (
    Id INT IDENTITY(1,1) NOT NULL,
    Name VARCHAR(100) NOT NULL,
    PreferredTransporterId INT NULL)

So say I want to build a query to get all companies with their attached contacts and transporters. A transporter will never have a transporter set, so no infinite recursion (avoided by business logic and a check constraint).
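For what it's worth, here is one way the single-query version of exactly this scenario could look - a Python/SQLite sketch with made-up sample data ('Acme', 'FastFreight', etc. are invented), where prefixed column aliases (c_, t_, ct_) keep the two occurrences of Company unique:

```python
import sqlite3

# Single-query sketch for the schema above, using SQLite for brevity.
# Prefixed aliases ("c_", "t_", "ct_") let Company appear twice
# (once as itself, once as the preferred transporter) without clashes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (Id INTEGER PRIMARY KEY, Name TEXT, PreferredTransporterId INTEGER);
    CREATE TABLE Contact (Id INTEGER PRIMARY KEY, Name TEXT, CompanyId INTEGER);
    INSERT INTO Company VALUES (1, 'Acme', 2), (2, 'FastFreight', NULL);
    INSERT INTO Contact VALUES (10, 'Alice', 1), (11, 'Bob', 1);
""")

rows = conn.execute("""
    SELECT c.Id   AS c_Id,  c.Name  AS c_Name,
           t.Id   AS t_Id,  t.Name  AS t_Name,
           ct.Id  AS ct_Id, ct.Name AS ct_Name
    FROM Company c
    LEFT JOIN Company t  ON t.Id = c.PreferredTransporterId  -- self-join, aliased
    LEFT JOIN Contact ct ON ct.CompanyId = c.Id
    ORDER BY c.Id, ct.Id
""").fetchall()

# Fold the flat, denormalized rows back into nested objects by company id.
companies = {}
for c_id, c_name, t_id, t_name, ct_id, ct_name in rows:
    comp = companies.setdefault(c_id, {"name": c_name, "transporter": t_name, "contacts": []})
    if ct_id is not None:        # LEFT JOIN yields NULLs when there are no contacts
        comp["contacts"].append(ct_name)
```

Note the transporter fields repeat on every contact row for the same company; that duplication is the usual price of the one-query approach.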

Hi

Sorry, meant to look at this yesterday but got tied up - will have a tinker tonight.

No rush. If it hadn't started raining I wouldn't have posted anything yesterday ;)

Have you actually profiled the code? It wasn't clear from your post, nor from your response.

I thought this "Not what I am after, but what I stated is the bottle-neck anyway" was an obvious yes ;)

Just making sure... it sounded like you probably had, but experience has taught me to verify in the absence of an explicit statement, especially regarding performance. I've encountered too many people who "know" what the problem is and don't see the point of actually measuring anything.

I'll leave it alone, but I notice you still haven't given an explicit "yes" :)

You end up with one query for each nesting level.

EF does it in one query, wondering how to replicate that.

Everything I can think of offhand ends up with a giant denormalized result, which could get unwieldy if there's a lot of nesting and a lot of data.

You could set up a simple object hierarchy and use a database profiler to see what EF comes up with.

Sorry, really busy on my new contract. Will definitely have a look at the weekend at the latest!

Really sorry - still uber busy!

No need to apologize, it is not urgent. I'm off next week anyway, and by then they might have seen the light and switched to EF for new projects.