When NOT to Performance Tune Your Application

On a recent project, a colleague told me about a SQL query generated by Entity Framework that had gotten ridiculously out of hand. Entity Framework makes it pretty easy to build simple data access on top of the Table Per (Sub) Type pattern.

What this means is that you can have both a Student and an Instructor derive from a Person, and query to retrieve a strongly typed object. So here’s where performance and optimization come in. There are a couple of ways to query against this data model.

Method 1: Implicit Typing

var query = from p in Persons
            where p.PersonID.Equals(_personID)
            select p;

Method 2: Explicit Typing

var query = from p in Persons.OfType<Instructor>()
            where p.PersonID.Equals(_personID)
            select p;

They seem pretty similar, but there’s quite a significant difference in what gets generated. With Method 2, you let Entity Framework know exactly which table it’s querying against, which means your SQL code looks something like this:

SELECT
    PersonID,
    Column1,
    Column2,
    Column3
FROM
    Instructor
WHERE
    PersonID = @PersonID

However, when you don’t specify the type, Entity Framework constructs a SQL query intended to make the database go and figure it out. Keep in mind that there’s no automatic discriminator column; the type is worked out from the primary key, the ID column (more on this in a minute). The generated code looks something like this:

SELECT
    PersonID,
    Column1 as [0x01],
    Column2 as [0x02],
    CASE WHEN [1x01] IS NOT NULL THEN CAST([1x01] AS INT)
    ... many more case, casts, for every column in every table ...
    
FROM
    Person
    UNION ALL SELECT 
        PersonID as [1x01],
        Column1 as [1x02],
        Column2 as [1x03]
        UNION ALL SELECT
                ... many more unions for every table ...
WHERE
    PersonID = @PersonID
        

Now you can see that this query gets very complex as a product of:

a) the number of subtypes

b) the number of columns for each type

The generated query selects every column from every subtype table and looks for the one whose key isn’t null; that’s the “winner” subtype. I imagine (I haven’t tried this yet) that a nested subtype (like BusinessStudent in the linked example above) would cause an even uglier nesting of one union inside another.

Now back to the point of this article: performance. How bad is what we see above? In one empirical example, measured with JetBrains’ dotTrace and NUnit tests, I observed averages of:

Method 1: 126ms for the query to run

Method 2: 25ms for the query to run

I then discovered the benefits of precompiling Entity Framework view code to optimize the SQL generation. This bought me a gain of roughly 26% in performance for these specific empirical examples:

Method 1: 100ms

Method 2: 20ms

Now we have roughly 80ms to play with. We don’t know the type that we’re retrieving, but we want the optimized query of Method 2. If the code that gets us from Method 1 to Method 2 costs more than 80ms, the performance “fix” will be worse than the problem.

So far, given the constraints of EF (for now) and the Table Per (Sub) Type pattern, the only solution that comes to mind is reflection. This would involve storing the type as a discriminator column of sorts, reflecting on that type, and calling the generic Persons.OfType<T>() method via reflection. This costs us an extra query plus reflection, neither of which is cheap. A separate empirical example (not the same code as the first) brings the total cost to ~350ms, a net performance loss of 250ms compared with Method 1.
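The reflection step described above could look roughly like the sketch below. It is a minimal illustration, not the project's code: Person, Student and Instructor are hypothetical stand-in classes, the entity set is replaced by an in-memory list so the example runs on its own, and TypedQuery, OfRuntimeType and storedTypeName are names made up for the illustration. The only framework pieces relied on are Queryable.OfType<TResult>(IQueryable) and MethodInfo.MakeGenericMethod.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for the entity classes in the model.
public class Person { public int PersonID { get; set; } }
public class Student : Person { }
public class Instructor : Person { }

public static class TypedQuery
{
    // Queryable has a single OfType overload, so a lookup by name is unambiguous.
    private static readonly MethodInfo OfTypeDefinition =
        typeof(Queryable).GetMethod("OfType");

    // Equivalent to source.OfType<T>() when T is only known at run time.
    public static IQueryable OfRuntimeType(IQueryable source, Type subtype)
    {
        MethodInfo closed = OfTypeDefinition.MakeGenericMethod(subtype);
        return (IQueryable)closed.Invoke(null, new object[] { source });
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory data in place of the EF entity set.
        IQueryable<Person> persons = new List<Person>
        {
            new Student { PersonID = 1 },
            new Instructor { PersonID = 2 }
        }.AsQueryable();

        // Assumption: this name came back from the extra discriminator lookup.
        string storedTypeName = "Instructor";
        Type subtype = typeof(Person).Assembly.GetType(storedTypeName);

        Person match = TypedQuery.OfRuntimeType(persons, subtype)
            .Cast<Person>()
            .Single(p => p.PersonID == 2);

        Console.WriteLine(match.GetType().Name + " " + match.PersonID);
    }
}
```

Caching the open generic MethodInfo in a static field avoids repeating the lookup on every call, but the Invoke itself and the extra discriminator query still have to be paid each time, which is where the cost discussed here comes from.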

Method 1’s performance would have to degrade (through additional columns/subtypes) by ~250ms more in order to justify rolling a custom discriminator and reflecting to grab the subtype.
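The break-even reasoning can be written out as a few lines of arithmetic. This is only a sketch using the timings quoted in this post; the BreakEven class and the NetGainMs helper are hypothetical names for the illustration.

```csharp
using System;

public static class BreakEven
{
    // Milliseconds saved (positive) or lost (negative) by swapping
    // the current query for the proposed workaround.
    public static int NetGainMs(int currentMs, int workaroundMs)
    {
        return currentMs - workaroundMs;
    }

    public static void Main()
    {
        const int method1Ms = 100;         // implicit typing, precompiled views
        const int method2Ms = 20;          // explicit typing, precompiled views
        const int reflectionTotalMs = 350; // discriminator lookup + reflection + query

        // Budget available for any workaround before it stops paying off.
        Console.WriteLine(method1Ms - method2Ms);                   // 80

        // The reflection workaround blows through that budget.
        Console.WriteLine(NetGainMs(method1Ms, reflectionTotalMs)); // -250
    }
}
```

The workaround only starts to make sense once NetGainMs turns positive, which means Method 1 would have to slow down from 100ms to more than 350ms.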

This was a pretty interesting exercise in when not to make a performance optimization, even one that you know will need to be done in the long term.

Posted on 4/30/2009 7:01:00 AM by Jason Nadal

Categories: development | performance | software

VMs @ Home Development

Wow... it's been a while since I've updated this blog.

Today's topic is one that took me a while to get to... not to writing, but to actually implementing. More to the point, it took me an eternity to be convinced that a) using VMs for work at home (even for yourself) is a Good Thing (tm), b) it is not just good, but essential, and c) VMware is a better product and easier to use than MS Virtual Server and Virtual PC.

Some things take a while to get drilled into my head; these are the lessons that are hard learned. I say this having just exited my own webform post editor in favor of writing my posts in Notepad... typing straight into a web form is something I constantly harp on my wife about as something you just should not do. Who wants to rewrite a 3-page textbox entry after they've already typed it?

That was actually a poor segue, but it at least serves to illustrate my point... avoid getting burned.

I'm a developer. I like cutting edge stuff.

Those two statements together? Beta Testing for the win.

Over the years (well, since I tricked my way into the Windows 98 beta back in high school), I've tried countless pieces of software that were close to, but not quite, ready for prime time. I've lived without a functional DVD player, lived without sound, lived without being able to display anything on screen (well... except for the BIOS), and headed into it face-first. (Till Windows Home Server!)

With my development environment, I've learned that VMware is the best way to let me try out whatever betas I want on my host OS (currently running Win7 with nary an issue, now that I've told it to ignore the fact that the 64-bit drivers are unsigned, and got my hands on some beta drivers for other cards). All I have to do is take some snapshots in VMware Workstation, and I can revert the dev environment to stable points! Now I can have my nightly builds of ReSharper 4.5, and roll back if I've hosed my working environment! Add to this the fact that I can share my USB devices, and now I can sync my iPod when I'm out of state. (I can also have the VM's net connection go through my cell phone... really cool if I'm on the road.)

The other cool things are being able to use Unity to have virtual applications running side-by-side with host windows, and being able to have my native 2560x1600 resolution OS on a virtual machine. For $189 this is invaluable, even though it's a steep price to begin with.

Posted on 3/13/2009 6:27:00 PM by Jason Nadal

Categories: development | hardware | software

Retrofitting Unit Tests -- NCover to the rescue

Writing unit tests is the right thing to do to provide regression testing and some proof that things work how they should. They are the safety net that makes sure that when you change the facing on the capstone, the rest of the arch won't collapse.

That being said, one does not always get the opportunity to invest the time it takes to get past that big learning curve to make it part of the daily toolbelt. But knowing unit testing will help in the long term, you get the itch to add them after the fact.

The problem is that they're much harder to just tack on, as methods aren't guaranteed to be truly granular in their purpose. Methods and classes tend to be responsible for too much. So how can you write tests to cover all of the responsibilities?

The simplest answer is to write to what you know. I know that the method FastCash() logs into my bank account and withdraws $60. So I write unit tests to check a good login, a bad login, a bad PIN, a scenario where I have $60 to withdraw, and a scenario where I'd be overdrawn. Seems like a decent bunch of simple tests for the FastCash() method, right?
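As a rough sketch of what "writing to what you know" might look like, here is a toy version of those scenarios. Everything in it is hypothetical: the Atm class, the FastCashResult enum and the signature of FastCash() are invented for the illustration, and plain checks stand in for a test framework so the example runs on its own.

```csharp
using System;

public enum FastCashResult { Success, BadLogin, BadPin, InsufficientFunds }

// Hypothetical class under test: withdraws a fixed $60 after a login check.
public class Atm
{
    private readonly string _user;
    private readonly string _pin;
    public decimal Balance { get; private set; }

    public Atm(string user, string pin, decimal balance)
    {
        _user = user;
        _pin = pin;
        Balance = balance;
    }

    public FastCashResult FastCash(string user, string pin)
    {
        if (user != _user) return FastCashResult.BadLogin;
        if (pin != _pin) return FastCashResult.BadPin;
        if (Balance < 60m) return FastCashResult.InsufficientFunds;
        Balance -= 60m;
        return FastCashResult.Success;
    }
}

public static class FastCashTests
{
    private static void Check(bool condition, string name)
    {
        if (!condition) throw new Exception("Failed: " + name);
        Console.WriteLine("Passed: " + name);
    }

    public static void Main()
    {
        // Good login with enough money: $60 comes out.
        var funded = new Atm("jason", "1234", 100m);
        Check(funded.FastCash("jason", "1234") == FastCashResult.Success, "good login");
        Check(funded.Balance == 40m, "balance reduced by 60");

        // Bad login and bad PIN: nothing is withdrawn.
        Check(funded.FastCash("nobody", "1234") == FastCashResult.BadLogin, "bad login");
        Check(funded.FastCash("jason", "0000") == FastCashResult.BadPin, "bad PIN");
        Check(funded.Balance == 40m, "balance untouched after failures");

        // Would be overdrawn: the withdrawal is refused.
        var broke = new Atm("jason", "1234", 59m);
        Check(broke.FastCash("jason", "1234") == FastCashResult.InsufficientFunds, "overdrawn");
    }
}
```

Each check maps to one of the scenarios listed above; anything the method does beyond those scenarios is exactly what a set of tests like this will miss.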

Well, by using NCover, you can run coverage reports on your unit tests. When I do this, I discover I'm only hitting 67% of the logic of the FastCash method. As it turns out, there's a third bit of functionality in there: it automatically assumes you choose the Checking account. NCover shows you the lines that are not getting hit by your unit tests, and allows you to specifically target those additional scenarios you weren't truly testing.

I was able to put this to good use today while working on some pure business logic classes. These are much simpler to write unit tests for, as there are minimal dependencies and only occasional coupling, and they offer a great opportunity for well-defined acceptance criteria. Both make that area of code a natural fit for unit testing.

Posted on 10/10/2008 6:21:00 PM by Jason Nadal

Categories: development | software | troubleshooting | tdd

NDepend for Quality

Among the many code tools out there, this one has always seemed a far reach for me -- interpreting code complexity and putting a metric on how "good" code is.

From Andre Loker (whose blog I discovered only recently) comes a fairly deep review. The vendor offers a time-bombed trial version that's licensed for use on open source projects, and promises to release an extended license before the time bomb hits. I imagine this is to keep a close tab on the license without granting it in perpetuity.

Odd licensing aside, there is a wealth of information in the reports it generates, and in a turn that I found particularly interesting, it's able to import NCover reports. I'm about to give it a try on my FileCombiner application, though it may take quite a bit of reading in order to get any meaningful information about the results it gives.

Posted on 10/7/2008 6:01:00 PM by Jason Nadal

Categories: development | software
