An Agreeable Procrastination – and the blog of Niels Kühnel

Code, the universe and everything

Posts Tagged ‘.NET

Paging entities with eager fetched collections efficiently in NHibernate

with 7 comments

UPDATE: For almost any practical purpose batch loading is faster. The exotic cases are not worth the effort. This post is only relevant if you are obsessed with ORM performance tuning and want inspiration from some crazy ideas.

Last time I blogged (it’s been a while) I shared some ideas about how unrelated one and many to many relations could be loaded eagerly without the overhead of the Cartesian explosion. It turned out that this could be implemented in NHibernate by overriding the ANSIJoinFragment and wiring it from a custom dialect (see link at end).
Now, another problem arises when you want to paginate the results from these queries. When you say “give me results 10 to 20” NHibernate literally does that as it hydrates rows 10 to 20 from the result set. This gives you less than 10 entities with incomplete collections. At least this happens with SQL server, and all of the following is related to this issue.
What you really want is the 10th to 20th root entities of you query.

Assume that we have a very simple situation with some Persons (ID, Name, Tags) and Tags (ID, Name) and does a query like this

var persons = session.QueryOver().OrderBy(x => x.Name).Asc.Skip(5).Take(10)
                    .Fetch(x => x.Tags).Eager
                    .List();

Let’s consider the generated SQL and see where it goes wrong

SELECT TOP (@p0)
  ID0_1_, Name0_1_, Person3_3_, ID3_, ID5_0_, Name5_0_ FROM
    (SELECT [ID0_1_, Name0_1_...], ROW_NUMBER() OVER(ORDER BY this_.Name) as __hibernate_sort_row FROM [Person] this_left outer join [Tag] tags2_ ON this_.ID = tags2_.Person_id) as query
    WHERE query.__hibernate_sort_row > @p1
    ORDER BY query.__hibernate_sort_row;
@p0 = 10 [Type: Int32 (0)], @p1 = 5 [Type: Int32 (0)]

The problem is that ROW_NUMBER() is used only for offsetting and the good old SELECT TOP … is used for limiting. In the current form
neither of these take into account that multiple rows for the same root (here person) should only be counted once.
If we remove TOP and only uses the row number we get the following query that still doesn’t work, as row numbers are unique:

SELECT [ID0_1_, ... ] FROM
    (SELECT [ID0_1_, ... ], ROW_NUMBER() OVER(ORDER BY this_.Name) as __hibernate_sort_row FROM [Person] this_left outer join [Tag] tags2_ ON this_.ID = tags2_.Person_id) as query
    WHERE query.__hibernate_sort_row > @p1  AND query.__hibernate_sort_row     ORDER BY query.__hibernate_sort_row;
@p0 = 10 [Type: Int32 (0)], @p1 = 5 [Type: Int32 (0)]

If RANK is used instead of ROW_NUMBER we actually get what we want but it’s a.) very inefficient with joins as the server has to join all the records of the tables before it can say anything about rank, b.) too easy 🙂

What we really want is to do a sub query that confines the root entities we want and then join the eager load tables on only those. This is very close to the queries that would normally arise from lazy loading, except that the database server does it all at once as fast as it can.
If we consider this general query structure

SELECT {RootFields} (Other fields...) FROM {Root table} {Root alias} (Some joins and what not...) {Where} ORDER BY {Order by}

it must be made into this

SELECT {RootFields} (Other fields...) FROM (
   SELECT __n__, {RootFields} (Other fields...) FROM (
     SELECT ROW_NUMBER() OVER ({Order by}) __n__, {RootFields} FROM {Root table} {Root alias} {Where}
     ) {Main alias}
     (Some joins and what not...)
    ) query WHERE __n__ BETWEEN @Offset AND @Offset + @Limit
ORDER BY __n__

This puts some restrictions on the where clause. As it is shifted into the root entity query it can’t
consider fields from the other joins, but as the joins are assumed to be for eager loading they shouldn’t be filtered in the first place. If you really want to filter the root entities on their relations EXISTS queries should be used instead.

In my implementation I give the option to toggle the behavior by adding EagerMsSqlDialect.VeryBigNumber to the limit. Default is off,
so the number must be added. This makes it explicit that a dialect specific feature is used.
Curiously, a big number is actually needed as NHibernate cuts off in the hydration process after “limit” records has been processed so the extra records returned by the query wouldn’t be considered otherwise. I do prefer the former reason for VeryBigNumber though 🙂

Putting it all together
So with the eager fork joins and improved paging you can write code like this

var q = session.QueryOver().OrderBy(x => x.Name).Asc
                    .Skip(5)
// Add EagerMsSqlDialect.VeryBigNumber to limit to use the improved paging
                    .Take(EagerMsSqlDialect.VeryBigNumber + 10)
                    .Fetch(x => x.Tags).Eager
                    .Fetch(x => x.Skills).Eager
                    .Fetch(x => x.Phonenumbers).Eager
                    .Fetch(x=>x.Skills.First().Tags).Eager
                    .Fetch(x => x.Skills.First().Groups).Eager
                    .TransformUsing(Transformers.DistinctRootEntity)
                    .List();

Here

  1. only one query is sent to the database
  2. no redundant rows are returned (i.e. no Cartesian explosion)
  3. root entities 5 to 10 are returned
  4. the collections asked for are initialized so no lazy loads are used later

You can find my proof of concept code at https://bitbucket.org/nielskuhnel/eager-nh-extensions together with a small console example.
Unfortunately it doesn’t work with NHibernate.Linq, as a) the current dialect’s JoinFragment isn’t used when joins are constructed b) Take and Limit are discarded when FetchMany is used. Luckily it works like a charm with the new QueryOver API.

I would really love to hear if it works for you, and if it improves performance in code you have already written.

Written by niels.kuhnel

February 13, 2011 at 5:49 pm

Posted in Uncategorized

Tagged with ,