Lost Generation – Will come true unless we choose to reverse it

21. March 2009

This is off-topic but I do care about it.

NHibernate POID Generators revealed

20. March 2009

(Disclaimer: This post will be more or less a paraphrase of Fabio Maulo’s post, and I hope I can improve it a bit)

This topic is something that I wanted to write because I wasn’t aware of the drawbacks of “ native/identityimage” generator has until Fabio told me. Now it is my turn to spread the information to those who aren’t aware too. I even made a small poll via twitter, to see who uses what, and the result turns out to be that majority of people use identity/native for some reasons.

NHibernate has several object identifier generators for entities. Each of them has their cons and pros as anything else does.

We can basically seperate generators into two: PostInsertGenerator and ORM Style generators ( you can also call them identity style vs orm stlye generators). impact on your  I will investigate them in their categories.

ORM Style Generators 

ORM style generator can generate the identifiers before objects are sent to database. This is advantageous because you don’t need to go to database in order to have the ID, then set a relation based on this id. It also promotes Unit-Of-Work since you don’t need to go to database everytime an object is added/updated instead you do those at the moment of commit. Those generators are what WE SUGGEST.

Currently NHibernate provides several ORM style generators, some of them are listed below.

  • Guid
    Generates id’s by calling Guid.NewGuid(). Main drawback of this is with indexes. We know that Guids are more or less random(or pseudo-random let’s say) and this randomness creates fragmentation in database index. If you also think that the field is a PK, then it becomes more dramatic since they are stored in sorted manner.
  • Guid.Comb
    A very clever improvement over the Guid way. It creates guid based on the system time, and the guid it creates is database friendly. It doesn’t cause fragmentation in the table. You can see it from here

    I wonder if anybody reads the ALT of images? 
    (Image taken from Pamir Erdem’s blog)
    The effect of SequentialNewId() for default value has more or less the same effect of Guid.Comb
  • HiLo/Sequence HiLo
    This one is the one I like the most. It is both index friendly and user friendly. A HiLo id has 2 parts as the name suggests they are Hi and Lo. Each session factory gets the Hi value from database (with locking enabled), and lo values are managed by the session factory on its own. This algorithm also scales really well. All factories gets the Hi value only once. This reduces the the database traffic that aims to get the Hi values.

Post Insert Generators / Identity Style Generators

Post insert generators, as the name suggest, assigns the id’s after the entity is stored in the database. A select statement is executed against database. They have many drawbacks, and in my opinion they must be used only on brownfield projects. Those generators are what WE DO NOT SUGGEST as NH Team.

Some of the drawbacks are the following

  1. Unit Of Work is broken with the use of those strategies. It doesn’t matter if you’re using FlushMode.Commit, each Save results in an insert statement against DB. As a best practice, we should defer insertions to the commit, but using a post insert generator makes it commit on save (which is what UoW doesn’t do).
  2. Those strategies nullify batcher, you can’t take the advantage of sending multiple queries at once(as it must go to database at the time of Save)

There are several Post Insert Generator strategies (hey 2.1 has even more!) some of which are listed below(there are many, check Fabio’s post here)

  1. Identity
    Identity generator uses the value that is generated by MsSQL "identity” stuff. However, it’s meaning in the mapping changes depending on the dialect. For example, if database supports MsSQL like identity, then it will be used, if it supports sequences, then sequences will be used, etc. Something I learnt today from the MSSQL group is that MSSQL may sometimes return invalid SCOPE_IDENTITY() value.
  2. Guid.Native
    If I am to speak in terms of MsSQL terminology, it uses the NEWID() function to get a uniqueidentifier.

Comparison

I hear you say “you speak too much, all those doesn’t tell much, show me the code!” There it is, the comparison of post insert generators vs ORM style generators.

I will first start with demonstrating how they break UoW, then continue with Batcher! (did you know that NH uses NonBatchingBatcher by default? ;) )

The code under test is simple

[Test]
public void Should_not_insert_entity_in_a_transaction_HiLo()
{
    var post = new PostWithHiLo {Title = "Identity Generators Revealed"};
    var postComment = new PostCommentWithHiLo { Post = post, Comment = "Comment" };
    using (ISession session = factory.OpenSession())
    using (var tran = session.BeginTransaction())
    {
        session.Save(post); //No commit here
        session.Save(postComment);
        long insertCount = factory.Statistics.EntityInsertCount;
        Assert.That(insertCount, Is.EqualTo(0), "Shouldn't insert entity in a transaction before commit.");
    }
}

[Test]
public void Should_not_insert_entity_in_a_transaction_Identity()
{
    var post = new PostWithIdentity {Title = "Identity Generators Revealed"};
    var postComment = new PostCommentWithIdentity {Post = post, Comment = "Comment"};
    using (ISession session = factory.OpenSession())
    using (var tran = session.BeginTransaction())
    {
        session.Save(post);
        session.Save(postComment);
        long insertCount = factory.Statistics.EntityInsertCount;
        Assert.That(insertCount, Is.EqualTo(0), "Shouldn't insert entity in a transaction before commit.");
    }
}

Now, let’s try it. What do you expect in both cases? Should both test pass? The test with identity strategy fails as it tries to insert the entity even before calling a commit.

Now here is the explanation for the batcher:

using (ISession session = factory.OpenSession())
using (var tran = session.BeginTransaction())
{
    for (int i = 0; i < 3; i++)
    {
        var post = new PostWithHiLo {Title = string.Format("Identity Generators Revealed {0}", i)};
        session.Save(post);
    }
    tran.Commit();
}

The upper code sends queries to database only once. However, if you’re using the Identity style generators, then you’re in trouble.

Conclusion

You should know what you’re gaining and what you’re losing when using an identifier strategies. In case of a greenfield application, my choice would be to use HiLo as it is more user friendly(and this is what NH team suggests actually), and Guid.Comb in case a replication kinda thing is required. Most probably I wouldn’t use Identity. However, on a brownfield application, where you can’t really change the DB schema for some reason, than Identity should be used as a last resort.

I’d like to end this post with two sayings that I hear/see from Fabio

Human knowledge belongs to the world!
Quality is not achieved by chance!

An improvement on SessionFactory Initialization

14. March 2009

UPDATE: I have just committed the PersistentConfigurationBuilder for Castle NHibernate Facility. Thank you Jonathon Rossi for informing me!

We have received several complaints about slowness of SessionFactory initialization when there’s hundreds of entities, and Ayende has replied one of them here. It even gets worse if you’re using it in a web environment. You may think that it is not a problem since SessionFactory is initialized once in a web environment, but the major impact is not on production but development. Think how many times you start your application a day.

The problem is not really with NHibernate but with xml validation against the schema. Here are some profiler results for SessionFactory initialization with one thousand entities:

image

As you see, the adding XML resources takes the most time and the reason behind this is the schema validation. There is also an I/O cost involved (1040 resources should be read by NHibernate). There are several ways to get rid of it, one being the serialization of configuration. I spend 3 days (statics prevented me from spotting some bugs in the code) on this and I believe it pretty much works for every configuration. Another way of doing this is the merging of HBM files, which I believe faster than Serialization as Deserialization also takes some amount.

Now the results for the one using the Deserialized Configuration.

image

A nice feature of dotTrace allows us to compare the performance improvements over the old way.

image

We got 10 seconds rescued! Yay!

Now I am going to show how I used this feature in Castle NHibernate Facility. We have IConfigurationBuilder that is used to integrate various Configuration sources (such as FluentNHibernate).

First of all I must ensure that if any of the files that are used to create the Configuration change, we shouldn’t use the serialized configuration, instead the Configuration should be re-created.

public override Configuration GetConfiguration(IConfiguration config)
{
    log.Debug("Building the Configuration");

    string fileName = config.Attributes["fileName"];

    IConfiguration dependsOn = config.Children["dependsOn"];
    IList<string> list = new List<string>();

    foreach (var on in dependsOn.Children)
        list.Add(on.Value);

    Configuration cfg;
    if (IsNewConfigurationRequired(fileName, list))
    {
        log.Debug("Configuration is either old or some of the dependencies have changed");
        using(var fileStream = new FileStream(fileName, FileMode.OpenOrCreate))
        {
            cfg = base.GetConfiguration(config);
            this.WriteConfigurationToStream(fileStream, cfg);
        }
    }
    else
    {
        using (var fileStream = new FileStream(fileName, FileMode.OpenOrCreate))
        {
            cfg = this.GetConfigurationFromStream(fileStream);
        }
    }
    return cfg;
}protected virtual bool IsNewConfigurationRequired(string fileName,IList<string> dependencies)
{
    if (!File.Exists(fileName))
        return true;
    FileInfo fi = new FileInfo(fileName);
    DateTime lastModified = fi.LastWriteTime;
    bool requiresNew=false;
    for (int i = 0; i < dependencies.Count && !requiresNew; i++)
    {
        FileInfo dependency = new FileInfo(dependencies[i]);
        DateTime dependencyLastModified = dependency.LastWriteTime;
        requiresNew |= dependencyLastModified > lastModified;
    }
    return requiresNew;
}

Code doesn’t look really good, I guess, so I am open to any suggestions on improvement. The code is not yet in Castle Codebase, as our NH dependency on trunk is not the latest (and i am too lazy to update it). When I find time, I may update the dependency if others agree.

There is one thing that you have to be careful about. You must be aware that if you’re using IUserType, IInterceptor, ISqlFunction etc, all of those should be Serializable too!

, ,

I joined to devlicio.us

11. March 2009

Casey Charlton was kind enough to invite me to Devlicio.us, the place where the smartest guys of the .netosphere blog regularly. It is a great opportunity for me to be a part of it because the guys there and the audience devlicio.us has can definitely improve my skills while I share my knowledge and experience on the things I work on. On devlicio.us, I will be writing on programming, Castle and NHibernate. I will keep this blog open and and post the articles that I write to devlicio.us but I will also write some off topic here.

Thank you Casey!

Debugging NH itself with NHProf

2. March 2009

This is my second chance to play with NHProf, first was for demonstration purposes to my friend Murat Haksal. As we are close to release in NH, and being lazy for about a month and a half, I decided to do some work on NH (including serialization of configuration).

As usual, I first looked JIRA for some issues I can solve. I found NH-1391 important, so decided to give it a try. I created the test case and it was failing for lazy and non-lazy collections. This is obviously a bug, however it wasn’t that easy to debug. The reason for being hard is that debugger does many things behind the scenes, and it may even initialize your collections. Therefore, I wasn’t able to locate/trace the problem ( I usually forget where collection queries are built and executed, even if I spent some time before for other bugs). At this time, NH Prof came to the rescue.

By default, it doesn’t show StackTrace that belongs to NHibernate,Castle, etc. However, you can tell it not to do so.

 twitter
I removed the namespace NHibernate in order to be able to see the stack trace that belongs to NHibernate.

 


Then the problem was crystal clear. CollectionLoader class is responsible for collections. The rest was easy.

Thanks Ayende! You made it easy!

,