Jetbrains Dottrace, Linq 2 Sql, and the case of the static DataContext

I've been in head-down, job-focused, non-blogging mode for a while now, but this issue has been enough to wake me out of hibernation. I've always been best at learning things on the job, since I tend to run into real-world issues that the examples never hit (happy path, anyone?), and I've had an intermittent memory exception in an app that I've been trying to track down on nights & weekends for a while. It's not severe enough to linger, but it's frequent enough (once a month or two) to nag at me for not fixing it.

Enter Jetbrains Dottrace Memory. I've used it for performance analysis a few times in the past, and discovered it's invaluable at tracking down where your problems are. Seriously, I can't recommend the Jetbrains suite of tools enough. Sometime in the past year or so, they split apart DotTrace performance from DotTrace memory & I'm at about a half-upgrade (DotTrace memory 3.5), with a currently bad integration story with Studio & Resharper 6, but that's a side story.

I figured I would need to run a memory snapshot on a machine that runs my application (x64 machine, .NET 4, 32-bit console application in case you're curious), since the network connectivity the application relies on doesn't exist on good-old localhost. (In the back of my mind I'm thinking I should take some of the remote server's connection logs, and mock up the remote side using an interface... I still may go down that path, but for now I need to rule out the TcpStream as a factor in the leak.)

So I install DotTrace, and don't bother putting in my personal product key, since I'm hoping to grab the dump and analyze it on my development workstation anyway. I fire it up, start my console app, run the dump, and boom: I get stuck mid-dump, my app crashes, and there's no snapshot. I'll spare you the hours and days of detail here, but the way I figured this out was to put Console.ReadLine() calls in my code, taking dumps deeper and deeper into the run until I found the line of code causing the problem.

It turns out that my Linq2Sql code is at fault (or DotTrace is... an argument can be made for either side). I have a repository that on construction stores a reference to the DataContext needed for queries. DotTrace failed to allow a memory dump after the data context was used. Wrapping it in dispose did not resolve the issue. A deeper change resolves the single code issue I had, however I would need to analyze the rest of the code I have that still uses Linq2Sql (I'm in favor of the NHibernate / Fluent NHibernate / Lambda Extensions kit for data access these days, but I still have some retro code lying around). The greater concern here is code changes just to allow tooling to work. Part of me is curious if the memory leaks are in fact due to stale DataContext objects that do not get disposed.

I'd be curious to see disposal patterns for Linq2Sql DataContext objects, however part of me thinks disposal of Linq2Sql altogether is the more prudent option. I had found this article on options for context instantiation (http://blog.stevensanderson.com/2007/11/29/linq-to-sql-the-multi-tier-story/), and my app used one of the options that relied on being run in single-thread mode (names changed to protect the guilty):

public static class DataContextHelper
{
   public static MyRequestModelDataContext RequestContext = 
      new MyRequestModelDataContext(ConnectionStrings.MyConnectionString);
   public static MyResultModelDataContext ResultContext = 
      new MyResultModelDataContext(ConnectionStrings.MyConnectionString);
}

 

Unwittingly, I found a workaround: when the repository's constructor bypasses the static helper and the context is instantiated inside the repository itself, even having the context in use doesn't prevent the memory dump. I know that static fields are initialized once and then stay in memory for the life of the application, and here that includes the two contexts they reference. I'm not sure what that open data context does to DotTrace, but clearly the result is a crash while attempting to dump. So at this point I start trying to eliminate DataContextHelper in favor of some more atomic operations, and an explicit lifecycle for the DataContext.

I started with the above, and a call to create a context in a private member in the repository, and moved to:

public class MyRepository
{
   private MyRequestModelDataContext _context = 
      new MyRequestModelDataContext(ConnectionStrings.MyConnectionString);
   public MyResponse GetResponse()
   {
      // Use the repository's own private context field
      var query = from x in _context.MyResponses
                  where x.Property.Equals(false)
                  orderby x.AnotherProperty
                  select x;
      var result = query.FirstOrDefault();
      return MyEntityConverter.ConvertFrom(result);
   }
}

 

Now the interesting thing is that this alone seems like it should solve the issue... when MyRepository goes out of scope, the garbage collector can reclaim both the repository instance and its own instance of the DataContext.

The only downside to this is that instantiating a context is by nature expensive; however, the typical use case is to create a single repository instance and use that for all of the data calls within a specific operation set or transaction anyway. So I make the change, and still see the memory dumps crash; clearly this is not the answer. So let's make things a little more interesting. One more change to the repository to make sure the context is cleaned up:

 

public class MyRepository : IDisposable
{
   ...
   public void Dispose()
   {
      if (_context != null)
      {
         _context.Dispose();
      }
   }
}

 

And then back a layer. The original code was:

var repo = new MyRepository();
return repo.GetResponse();

 

And the changed code is:

MyResponse result;
using (var repo = new MyRepository())
{
   //dump 1
   result = repo.GetResponse();
   //dump 2
}
//dump 3
return result;

 

The dump comments mark Console.ReadLine() calls injected to track where the dumps fail. While the console application sits waiting, I can run over to DotTrace's "Control Profiling" window and dump the memory into another snapshot (or try to, anyway). The result of the above change is that all three dump points work fine. The next step, of course, is to extend these changes to any other repositories and calls. The repositories are easy... the calls are not quite as easy. I find usages of the repository class and wrap everything in using statements; the next step is verifying, with a test run, that the application still functions. So far so good. I then go and run memory dumps at various points throughout the process -- mission accomplished. I'm still unclear as to the nature of the behavior, but the code change allows me to proceed with my diagnostic regimen.

What is a bit unsettling still is that there are points in the application during execution where I am unable to dump out memory without crashing. I need to keep an eye on where there are objects around that depend on open data contexts, and those seem to be directly related to when I am able to successfully generate snapshots.

Posted on 7/21/2011 8:00:00 PM by Jason Nadal

Categories: development | performance

When NOT to Performance Tune Your Application

On a recent project a colleague told me about a certain SQL query generated by Entity Framework that was ridiculously out of hand. Entity Framework makes it pretty easy to build simple data access on top of the Table Per (Sub) Type pattern.

What this means is that you may have both a Student and an Instructor derived from a Person, and query to retrieve a strongly typed object. So here’s where performance & optimization come in. There are a couple of ways to query against this data model.

Method 1: Implicit Typing

var query = from p in Persons
            where p.PersonID.Equals(_personID)
            select p;

Method 2: Explicit Typing

var query = from p in Persons.OfType<Instructor>()
            where p.PersonID.Equals(_personID)
            select p;

They seem pretty similar; however, there’s quite a significant difference in what gets generated. By using Method 2, you wind up letting Entity Framework know exactly what table it’s querying against, which means your SQL code looks something like this:

SELECT
    PersonID,
    Column1,
    Column2,
    Column3
FROM
    Instructor
WHERE
    PersonID = @PersonID

However when you don’t specify the type, Entity Framework constructs a SQL query intended to make SQL go and figure it out (keep in mind that there’s no automatic discriminator column – it figures out type based off of the primary key – the ID column. More on this in a minute). The generated code looks something like this:

SELECT
    PersonID,
    Column1 as [0x01],
    Column2 as [0x02],
    CASE WHEN [1x01] IS NOT NULL THEN CAST([1x01] AS INT)
    ... many more case, casts, for every column in every table ...
    
FROM
    Person
    UNION ALL SELECT 
        PersonID as [1x01],
        Column1 as [1x02],
        Column2 as [1x03]
        UNION ALL SELECT
                ... many more unions for every table ...
WHERE
    PersonID = @PersonID
        

Now you can see that this query gets very complex as a product of:

a) the number of subtypes

b) the number of columns for each type

The query generated gets the Cartesian product of all columns, and looks for the one where the key isn’t null – that’s the “winner” subtype. I imagine (haven’t yet tried this) that having a nested subtype involved here (like BusinessStudent in the linked example above) would cause an even more ugly nesting of the union within another union statement.

Now back to the point of this article – performance. How bad is what we see above? In an empirical example, thanks to JetBrains’ dotTrace and NUnit tests, I observed averages of:

Method 1: 126ms for the query to run

Method 2: 25ms for the query to run

I then discovered the benefits of precompiling Entity Framework view code to optimize the SQL generation. This bought me roughly a 20% gain in performance (26ms on Method 1) for the specific empirical examples.

Method 1: 100ms

Method 2: 20ms

Now we have roughly 80ms to play with – if the code to get from Method 1 to Method 2 (we don’t know the type that we’re retrieving, but we want the optimized query of Method 2) costs more than 80ms, then the performance “fix” will be worse than the problem.

So far, given the constraints of EF (for now) and the Table Per (Sub) Type pattern, the only solution that comes to mind is reflection – this would involve a stored type as a discriminator column of sorts, then reflecting on that type, and calling the generic Persons.OfType<T>() method via reflection. This costs us an extra query and reflection – neither of which is cheap. A separate empirical example (not the same code as the first) brings the total cost to ~350ms, a net performance loss of 250ms.

Method 1’s performance would have to degrade (through additional columns/subtypes) by ~250ms more in order to justify rolling a custom discriminator and reflecting to grab the subtype.
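
The break-even reasoning above is plain arithmetic, so here is a small sketch of it. It's written in JavaScript only because that's easy to run anywhere; the helper name netGain and the variable names are mine, and the timings are the rough averages quoted above (the ~350ms figure came from a separate test, so treat the comparison as approximate).

```javascript
// Hypothetical helper: milliseconds saved (positive) or lost (negative)
// by replacing the baseline path with the candidate path.
function netGain(baselineMs, candidateMs) {
  return baselineMs - candidateMs;
}

var method1Ms = 100;   // implicit typing, after view precompilation
var method2Ms = 20;    // explicit OfType, after view precompilation
var reflectedMs = 350; // type lookup + reflection + typed query (separate test)

// The most that a Method 1 -> Method 2 workaround can cost before it stops paying off.
var budgetMs = netGain(method1Ms, method2Ms);

// What the reflection-based workaround actually nets against plain Method 1.
var actualMs = netGain(method1Ms, reflectedMs);

console.log("budget:", budgetMs, "ms; reflection workaround nets:", actualMs, "ms");
```

A negative result means the "fix" is slower than the query it replaces, which is the case here.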

This was a pretty interesting exercise in when not to make performance optimizations that you know will need to be done long-term.

Posted on 4/30/2009 7:01:00 AM by Jason Nadal

Categories: development | performance | software

Javascript Function Body Equality Checking

If you need to check to see if two functions are the same instance, you can just check the variables for equality:

 

var f = function() { 
  alert("hello world");
}; 

var f1 = f;
var f2 = f;
var boolTest = (f1 === f2); // true: both variables point at the same function object

 

However, if you need to check to see if the bodies of those functions are equal (or both contain some text, etc -- essentially, just working with the actual language of the function), you can just cast as string to get the text of the function, then check for equivalency:

function fnsAreEqual(f1, f2){
  return String(f1) === String(f2);
} 

var boolTest2 = fnsAreEqual(
  function(){ alert("sameFn"); }, 
  function(){ alert("sameFn"); }
  ); 
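
One caveat worth sketching: the string comparison is an exact text match, so two functions that differ only in whitespace compare as different. The sketch below shows that, plus a crude workaround that collapses whitespace before comparing. fnsAreEqualLoose is a hypothetical name of mine, and the normalization is deliberately naive (it would also collapse whitespace inside string literals), so treat it as an illustration rather than a robust comparison.

```javascript
// Exact text comparison, as in the post.
function fnsAreEqual(f1, f2) {
  return String(f1) === String(f2);
}

// Hypothetical, deliberately naive variant: collapse runs of whitespace
// before comparing. Note it also alters whitespace inside string literals.
function fnsAreEqualLoose(f1, f2) {
  var normalize = function (fn) {
    return String(fn).replace(/\s+/g, " ").trim();
  };
  return normalize(f1) === normalize(f2);
}

// Same logic, different layout.
var a = function () { return 1 + 2; };
var b = function () {
  return 1 + 2;
};

var sameReference = (a === b);               // two distinct function objects
var sameTextExact = fnsAreEqual(a, b);       // source text differs by line breaks
var sameTextLoose = fnsAreEqualLoose(a, b);  // identical once whitespace is collapsed

console.log(sameReference, sameTextExact, sameTextLoose);
```

Comments and renamed variables would still make the two texts differ, so this only helps with formatting differences.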

updated: fixed JavaScript formatting

Posted on 4/27/2009 7:40:00 AM by Jason Nadal

Categories: javaScript | performance

Is my code smelly?

I've run my open source FileCombiner app through NDepend and dotTrace, and the results were on one hand startling, and on the other hand, expected.

Here are the first impressions:

dotTrace:

  • Discovered a huge performance hit surrounding my WriteByte code. While attempting to make something that's extremely granular for unit testing, I obviously lost sight of the overall speed goal.
  • Essentially what's going on is 29 million individual calls to the WriteByte code -- I need to optimize this (probably by using a buffer), but now I have measured performance as a benchmark to improve on.
  • I can now prove that improvements to code occurred, that they have a net positive effect, and the scale by which those improvements exist.
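
To make the buffering idea concrete, here is a minimal sketch of why batching helps: it counts how many times an underlying sink gets called when bytes are written one at a time versus through a fixed-size buffer. It's in JavaScript rather than the app's C#, and CountingSink, BufferedWriter, and the 4096-byte buffer size are all made up for illustration; none of this is FileCombiner's actual code.

```javascript
// Hypothetical sink standing in for a file stream; it only counts calls and bytes.
function CountingSink() {
  this.calls = 0;
  this.bytes = 0;
}
CountingSink.prototype.write = function (chunk) {
  this.calls += 1;
  this.bytes += chunk.length;
};

// Hypothetical buffered writer: collects bytes and hands them to the sink in blocks.
function BufferedWriter(sink, size) {
  this.sink = sink;
  this.buffer = new Uint8Array(size);
  this.used = 0;
}
BufferedWriter.prototype.writeByte = function (b) {
  this.buffer[this.used++] = b;
  if (this.used === this.buffer.length) {
    this.flush();
  }
};
BufferedWriter.prototype.flush = function () {
  if (this.used > 0) {
    // slice() copies, so the sink gets its own block and the buffer can be reused
    this.sink.write(this.buffer.slice(0, this.used));
    this.used = 0;
  }
};

var total = 10000;

// One sink call per byte.
var direct = new CountingSink();
for (var i = 0; i < total; i++) {
  direct.write(new Uint8Array([i & 0xff]));
}

// The same bytes through a 4096-byte buffer.
var buffered = new CountingSink();
var writer = new BufferedWriter(buffered, 4096);
for (var j = 0; j < total; j++) {
  writer.writeByte(j & 0xff);
}
writer.flush(); // push out the final partial block

console.log("direct calls:", direct.calls, "buffered calls:", buffered.calls);
```

Both sinks receive the same number of bytes; only the number of calls changes. Whether that translates into a real speedup is exactly what a before-and-after profiler run is for.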

NDepend:

  • This one's tougher. A lot of the measures here I don't yet fully understand.
  • The good:
    • 28% comment ratio (1:1 would be 50%)
    • Distance is extremely low (0.07). Code nicely straddles the safe area between the "zone of uselessness" and the "zone of pain"
    • Most classes have great numbers
  • The bad:
    • App is marked as high instability
    • High instability seems centered around the FilePartJoiner class
    • Trial/Open Source edition of NDepend does not allow import of NCover reports (that was a disappointment)
    • High levels of Efferent Coupling -- need to discover why. Generally this means that a class is tightly coupled to another class, but NDepend states that it filters out framework classes. What's interesting is that this is the main WinForm class. I wonder if the correct behavior here is to remove that class from processing? I wonder how that would affect the net numbers
    • The attribute that I've declared "CoverageExcludeAttribute" that NCover is set up to ignore is showing up as incredibly evil to NDepend. I need to figure out how to exclude that attribute from processing in NDepend! This may be two birds with one stone if I am able to make NDepend recognize that attribute as those classes marked with that attribute seem to be the pain points in the app.

My next steps are to resolve the performance issue, as well as reconfigure NDepend to avoid the CoverageExclude attributes & rerun the NDepend results. This may be followed by moving all library-type classes out to a separate DLL. This was going to be a later stage, but I may be going against the grain of proper dependency standards by holding off until later.

I'm definitely liking the process here, but it'll be interesting to attempt this same evaluation on an enterprise level application.

 

Posted on 10/8/2008 8:37:00 AM by Jason Nadal

Categories: development | performance
