Feb 21 2008

Duplicate Madness

Tags: NHibernate, ORM
Symon Rottem @ 4:08 pm

One of the things that really kills me in NHibernate is the way queries that use joins often return duplicated entities. What set me off today is a thread I got involved in on the NHibernate forums, where the poster was getting what appeared to be bizarre entity duplication in a list property of an object retrieved using ISession.Get.

In this particular case the model was…

Person -< Order -< OrderItem

…where each Person had multiple Orders and each Order had multiple OrderItems. Both collections (Person.Orders and Order.OrderItems) had been mapped as bags, and for performance reasons (presumably appropriate to their context) their fetch strategy was set to fetch="join".

Everyone with me so far?

The problem is that if you call ISession.Get for a Person who has 1 Order, and that Order has 3 OrderItems, the Person.Orders collection will contain the same Order 3 times. Why? It comes down to the shape of the result set the database returns: NHibernate builds a query with joins in it to fetch all the data in one go (since that's what we asked for), and the join repeats the Order's columns on every OrderItem row.

The SQL query for ISession.Get(typeof(Person), 300) would look something like this (select clause omitted for brevity):

...FROM Person p
    INNER JOIN Orders o ON o.PersonId = p.Id
    INNER JOIN OrderItems i ON i.OrderId = o.Id
WHERE p.Id = 300;

(Note that this is hand-rolled, and NHibernate may well generate outer joins rather than inner joins, but you get the idea.)

Using our scenario from above of a Person with 1 Order which has 3 OrderItems, the result set might look like this:

p.Id  p.Name  o.Id  o.PersonId  o.Date      i.Id  i.OrderId  i.Description
300   Symon   595   300         21/02/2008  9876  595        Fishing Rod
300   Symon   595   300         21/02/2008  9877  595        Hat
300   Symon   595   300         21/02/2008  9878  595        Boat

Now, the Get operation returns only one Person object, but the Orders collection is populated from the result set, and since there are 3 rows containing order information, 3 Order entries are added to the collection.

It's not quite as bad as it looks, though. The database operation was efficient because we only had to make one trip to the database, and if you're thinking the extra Order entries are eating up a chunk of memory, you'd be wrong: the list contains 3 references to the same object in memory, so there's really only one Order there.
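To make the mechanism concrete, here is a minimal sketch in Python of how the three joined rows above might be turned into objects. It is not NHibernate code: the Person and Order classes, the hydrate function and the identity map are hypothetical stand-ins. It assumes, as NHibernate's session does, that each primary key maps to exactly one in-memory instance, and that a bag behaves like a plain list that accepts repeats.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the mapped entities; this is NOT NHibernate code.
# eq=False keeps Python's default identity-based equality and hashing.
@dataclass(eq=False)
class Order:
    id: int
    date: str
    items: list = field(default_factory=list)

@dataclass(eq=False)
class Person:
    id: int
    name: str
    orders: list = field(default_factory=list)  # a "bag": a plain list, repeats allowed

# The three joined rows from the result set above (only the columns we need).
rows = [
    (300, "Symon", 595, "21/02/2008", "Fishing Rod"),
    (300, "Symon", 595, "21/02/2008", "Hat"),
    (300, "Symon", 595, "21/02/2008", "Boat"),
]

def hydrate(rows):
    """Build objects from joined rows, reusing one instance per primary key."""
    identity_map = {}  # (entity name, id) -> the single in-memory instance
    person = None
    for p_id, p_name, o_id, o_date, item_desc in rows:
        if ("Person", p_id) not in identity_map:
            identity_map[("Person", p_id)] = Person(p_id, p_name)
        if ("Order", o_id) not in identity_map:
            identity_map[("Order", o_id)] = Order(o_id, o_date)
        person = identity_map[("Person", p_id)]
        order = identity_map[("Order", o_id)]
        person.orders.append(order)    # one append per row, so the bag ends up with 3 entries
        order.items.append(item_desc)  # the items themselves really are distinct
    return person

person = hydrate(rows)
print(len(person.orders))                                 # 3
print(all(o is person.orders[0] for o in person.orders))  # True: one Order, three references
```

The three entries in person.orders are the same object, which is why the duplication costs a few list slots rather than three copies of the Order.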

In this particular example, the duplicates would not be there if the collections were mapped as sets rather than bags. A set does not allow duplicates, so the repeats are filtered out as they're added.
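A minimal sketch of the difference, again in Python rather than NHibernate and with hypothetical names: the same Order instance is added once per joined row to a list-like bag and to a set-like collection. The set drops the repeats because it is handed the same object every time.

```python
class Order:
    """Hypothetical stand-in entity; Python's default equality and hash are identity-based."""
    def __init__(self, order_id):
        self.id = order_id

identity_map = {}      # order id -> the one in-memory Order instance
orders_as_bag = []     # <bag>-like collection: a list that accepts repeats
orders_as_set = set()  # <set>-like collection: repeats are silently dropped

# o.Id as it appears on each of the three joined rows.
for order_id in (595, 595, 595):
    if order_id not in identity_map:
        identity_map[order_id] = Order(order_id)
    order = identity_map[order_id]
    orders_as_bag.append(order)
    orders_as_set.add(order)

print(len(orders_as_bag))  # 3
print(len(orders_as_set))  # 1
```

The set de-duplicates here only because every row resolves to the same instance; how your real entities define equality would also matter.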

A more likely situation where you're going to see this behavior is when you use the default fetch strategy in your mapping but execute an HQL or ICriteria query with a join:

SELECT k FROM Cat c INNER JOIN c.Kittens k WHERE c.Colour = :ParentColour AND k.Colour = :KittenColour

When you call the IQuery.List() method, it will return a list of Kitten entities, but since more than one Cat can have the same Kitten in its Kittens collection (Mummy and Daddy, right?), the same Kitten may show up more than once in the results.
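Here is a minimal Python sketch of that situation with made-up cats and kittens. All names are hypothetical, and the list comprehension merely stands in for the HQL query: it yields one result per joined Cat/Kitten row, so two parents sharing the same kittens produce each kitten twice.

```python
class Kitten:
    """Hypothetical stand-in entity."""
    def __init__(self, name, colour):
        self.name, self.colour = name, colour

class Cat:
    """Hypothetical stand-in entity with a Kittens collection."""
    def __init__(self, name, colour, kittens):
        self.name, self.colour, self.kittens = name, colour, kittens

tom = Kitten("Tom", "black")
tabby = Kitten("Tabby", "black")
cats = [
    Cat("Mummy", "brown", [tom, tabby]),  # both parents reference the same kittens
    Cat("Daddy", "brown", [tom, tabby]),
]

def select_kittens(cats, parent_colour, kitten_colour):
    """Mimic SELECT k FROM Cat c INNER JOIN c.Kittens k WHERE ...: one result per joined row."""
    return [
        k
        for c in cats
        for k in c.kittens
        if c.colour == parent_colour and k.colour == kitten_colour
    ]

result = select_kittens(cats, "brown", "black")
print([k.name for k in result])  # ['Tom', 'Tabby', 'Tom', 'Tabby']
```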

This is, apparently, by design. When the developers of the original (Java) Hibernate were making their design decisions, they felt it made more sense for the returned objects to match the rows of the result set. I can't remember the post where I read this, since it was more than a year ago now, so I might not have this exactly right, and I'd be happy for someone to correct me.

Again, the solution is to feed the returned IList into a set, which will remove the duplicates for you. The ICriteria API also lets you supply a result transformer (for example, a DistinctRootEntityResultTransformer) that filters the list before it's handed back; if you don't believe me, you can re-read the documentation…
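As a rough sketch of the set-based fix, in Python and with a hypothetical distinct_by_identity helper that is not part of NHibernate, the function below drops repeated references while keeping the order in which entities first appeared, which a plain set does not guarantee. Comparing by object identity is enough here because the session hands back the same instance for the same entity.

```python
class Kitten:
    """Hypothetical stand-in entity."""
    def __init__(self, name):
        self.name = name

def distinct_by_identity(results):
    """Drop repeated references to the same object, keeping first-seen order."""
    seen = set()
    unique = []
    for entity in results:
        if id(entity) not in seen:  # id() identifies the in-memory instance
            seen.add(id(entity))
            unique.append(entity)
    return unique

tom, tabby = Kitten("Tom"), Kitten("Tabby")
results = [tom, tabby, tom, tabby]  # what a joined query might hand back

print([k.name for k in distinct_by_identity(results)])  # ['Tom', 'Tabby']
print(len(set(results)))  # 2, though a set makes no promise about order
```

If the order of the query results doesn't matter to you, a plain set is all you need; the helper only earns its keep when the query had an ORDER BY worth preserving.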