Tuesday, 31 July 2012

Active Directory Argh!!!

I'm working on an intermittent problem with this DCOM service component. The service can get hit pretty hard, so as part of our testing we run 500 concurrent clients against it for a couple of days. It isn't clear what the problem is, and more annoying is that it only crashes after 24+ hours of runtime. I've tried running it with the debugger attached, only to have numerous sessions trashed by comms timeouts that cause the debugger to disconnect.

In the meantime I am looking at a problem with looking up user information in Active Directory. The service is a DCOM server that looks up a user based on an NT4 ID passed in (domain\username). It needs the user's Distinguished Name (DN), the User Principal Name (UPN) (or the Service Principal Name (SPN) for service accounts) and the email address, plus, for service accounts, the GUID and DNS name.

We use a mixture of ADSI and an implementation of IADsNameTranslate to do this: name translation gets the DN and UPN/SPN, and ADSI gets the user's email address, the service GUID and the service DNS name.

The problem is that with 500 threads all banging away, it barfs. Sometimes it just says it can't contact the domain (0x8007054B); at other times I get handle invalid (0x80070006), as well as a few others such as RPC failed (0x800706BF).
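All three of these errors are Win32 error codes wrapped in an HRESULT with facility 7 (FACILITY_WIN32), so the low 16 bits give the underlying Win32 error. 1355 is ERROR_NO_SUCH_DOMAIN, 6 is ERROR_INVALID_HANDLE and 1727 is RPC_S_CALL_FAILED_DNE ("the remote procedure call failed and did not execute"). The sketch below mirrors the SDK's SEVERITY, HRESULT_FACILITY and HRESULT_CODE macros in portable C++ so it compiles off Windows. The function and struct names are made up for illustration.

```cpp
#include <cstdint>

// Field layout of an HRESULT, as extracted by the SEVERITY, HRESULT_FACILITY
// and HRESULT_CODE macros in the Windows SDK's winerror.h:
//   hr >> 31             severity (1 = failure)
//   (hr >> 16) & 0x1FFF  facility (7 = FACILITY_WIN32)
//   hr & 0xFFFF          code (a Win32 error number when the facility is 7)
struct HResultParts {
    bool failed;
    std::uint32_t facility;
    std::uint32_t code;
};

constexpr HResultParts decodeHResult(std::uint32_t hr) {
    return HResultParts{(hr >> 31) != 0, (hr >> 16) & 0x1FFFu, hr & 0xFFFFu};
}

// Symbolic names for the Win32 codes behind the HRESULTs seen in this post.
constexpr const char* win32Name(std::uint32_t code) {
    return code == 6    ? "ERROR_INVALID_HANDLE"
         : code == 1355 ? "ERROR_NO_SUCH_DOMAIN"
         : code == 1727 ? "RPC_S_CALL_FAILED_DNE"
         : "unknown";
}
```

For example, `win32Name(decodeHResult(0x8007054B).code)` yields "ERROR_NO_SUCH_DOMAIN".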

With a single thread repeating the process thousands of times all is well. The more threads I have doing this the worse it gets.
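One way to put numbers on how failures scale with the thread count is a harness that runs the same lookup from N threads at once and tallies the failures by HRESULT. The sketch below is a minimal version in portable C++. The `Lookup` callable is a hypothetical stand-in for the real NameTranslate/ADSI code, and the sketch is not the test program used here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical signature for one directory lookup (NameTranslate + ADSI in the
// real service): returns an HRESULT-style value, 0 (S_OK) meaning success.
using Lookup = std::function<std::uint32_t()>;

struct StressResult {
    int completed;                          // lookups attempted in total
    std::map<std::uint32_t, int> failures;  // failing HRESULT -> count
};

// Runs `iterations` lookups on each of `threads` worker threads concurrently.
// `lookup` must itself be safe to call from several threads at once.
StressResult stress(const Lookup& lookup, int threads, int iterations) {
    assert(threads > 0 && iterations > 0);
    std::atomic<int> completed{0};
    std::map<std::uint32_t, int> failures;
    std::mutex failuresLock;  // guards `failures`
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                const std::uint32_t hr = lookup();
                ++completed;
                if (hr != 0) {
                    std::lock_guard<std::mutex> guard(failuresLock);
                    ++failures[hr];
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return StressResult{completed.load(), failures};
}
```

Running the real lookup through this with 1, 10, 100 and 500 threads would show whether the failure count grows with concurrency. It would also show which HRESULTs dominate at each level.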

I wrote a test program that just does the lookup many times in a number of threads. It fails more often than the actual service. It isn't consistent either: it fails a lot when it starts, and then it settles down and works for the rest of the test run. I have considered adding a retry mechanism, but I am concerned it will inundate the domain controller if things go wrong.
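The risk of a retry flooding the domain controller can be bounded. Cap the number of attempts, back off exponentially between them, and retry only errors that look transient. The sketch below shows this in portable C++. It assumes the operation is a hypothetical callable that returns an HRESULT-style value. Which codes `isTransient` treats as retryable is also an assumption, not something established above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Assumption: "cannot contact the domain" (0x8007054B) and "RPC call failed"
// (0x800706BF) are worth retrying; any other failure is returned at once.
bool isTransient(std::uint32_t hr) {
    return hr == 0x8007054Bu || hr == 0x800706BFu;
}

// Calls `op` at most `maxAttempts` times. Between attempts it sleeps for a
// delay that starts at `baseDelay` and doubles each time, capped at `maxDelay`.
// Both caps bound the extra load a failing client can put on the DC.
std::uint32_t retryWithBackoff(const std::function<std::uint32_t()>& op,
                               int maxAttempts,
                               std::chrono::milliseconds baseDelay,
                               std::chrono::milliseconds maxDelay,
                               int* attemptsUsed = nullptr) {
    assert(maxAttempts >= 1);
    std::chrono::milliseconds delay = baseDelay;
    std::uint32_t hr = 0;
    int attempt = 0;
    while (attempt < maxAttempts) {
        ++attempt;
        hr = op();
        if (hr == 0 || !isTransient(hr)) break;  // success or a non-retryable error
        if (attempt < maxAttempts) {
            std::this_thread::sleep_for(delay);
            delay = std::min(delay * 2, maxDelay);
        }
    }
    if (attemptsUsed) *attemptsUsed = attempt;
    return hr;
}
```

With 500 clients failing at the same moment, adding random jitter to each delay would also stop them from retrying in lockstep.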

The service is free threaded so I added this to the code:
#define _ATL_FREE_THREADED


When I change this to apartment threading, it seems to fail less, but with enough threads it still fails.
#define _ATL_APARTMENT_THREADED


I don't really understand this. I would have thought that if the NameTranslate object was not thread-safe, it would have declared itself as apartment threaded. Then, when my free-threaded client invoked it, the call would have been made in a single-threaded apartment.

The logical thing to do is to put a big fat lock around the code so only one thread does the lookup at a time, but this is a serious performance hit! This guy seems to do just that: http://pyyou.wordpress.com/tag/userprincipalname/
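A middle ground between one big lock and no lock at all is to cap how many threads may be inside the lookup at once. A limit of one is exactly the big lock. The sketch below is a minimal limiter built only from the standard library (C++20's `std::counting_semaphore` would do the same job). The class names are hypothetical, and it is untested whether any cap above one actually avoids these failures.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Lets at most `limit` threads hold a slot at the same time.
class ConcurrencyLimiter {
public:
    explicit ConcurrencyLimiter(int limit) : available_(limit) { assert(limit > 0); }

    // Blocks until a slot is free, then takes it.
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return available_ > 0; });
        --available_;
    }

    // Takes a slot only if one is free right now.
    bool tryAcquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (available_ == 0) return false;
        --available_;
        return true;
    }

    // Returns a slot and wakes one waiting thread.
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++available_;
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int available_;
};

// RAII helper: holds one slot for the lifetime of the object.
class LimiterSlot {
public:
    explicit LimiterSlot(ConcurrencyLimiter& limiter) : limiter_(limiter) { limiter_.acquire(); }
    ~LimiterSlot() { limiter_.release(); }
    LimiterSlot(const LimiterSlot&) = delete;
    LimiterSlot& operator=(const LimiterSlot&) = delete;

private:
    ConcurrencyLimiter& limiter_;
};
```

Each lookup would then be wrapped as `{ LimiterSlot slot(lookupLimiter); /* NameTranslate + ADSI calls */ }`, where `lookupLimiter` is a hypothetical shared `ConcurrencyLimiter`. The limit could then be raised until the failures reappear.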

I tried locking just the calls to NameTranslate (and not the ADSI calls), but this gives the same result.

I have considered using DsCrackNames but haven't tried it yet.

I don't understand why this doesn't just work...
