Tuesday 31 July 2012

Active Directory Argh!!!

Working on a dynamic problem with this DCOM service component. The service can get hit pretty hard, so as part of our testing we run 500 concurrent clients against it for a couple of days. It isn't clear what the problem is, and more annoying is that it only crashes after 24+ hours of runtime. I've tried running it with the debugger attached, only to have numerous sessions trashed by comms timeouts causing the debugger to disconnect.

In the meantime I am looking at a problem with the lookup of user information in Active Directory. The service is a DCOM server that looks up a user based on an NT4 ID passed in (domain\username). The service needs to look up the user's Distinguished Name (DN), the User Principal Name (UPN) (or Service Principal Name (SPN) for service accounts), the email address and, for service accounts, the GUID and DNS name.

We use a mixture of ADSI and an implementation of IADsNameTranslate to do this. Name translation is used to get the DN and UPN/SPN, ADSI is used to get the user's email address, service GUID and service DNS name.

The problem is that with 500 threads all banging away it barfs. Sometimes it just says it can't contact the domain (0x8007054B), at other times I get handle invalid (0x80070006), as well as a few others like RPC failed (0x800706BF).

With a single thread repeating the process thousands of times all is well. The more threads I have doing this the worse it gets.

I wrote a test program that just does the lookup many times in a number of threads. This fails more than the actual service. It's not consistent either: it will fail a lot when it starts and then settle down and just work for the rest of the test run. I contemplated adding a retry mechanism but I am concerned this will inundate the domain controller if it goes wrong.
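On the retry worry: a retry doesn't have to flood the domain controller if it is bounded and backs off between attempts. Below is a minimal sketch in standard C++, not the service's actual code. The names `retryWithBackoff`, `attempt`, `baseDelay` and `maxDelay` are made up for illustration, and the lookup itself is assumed to be wrapped in a callable that returns true on success.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <random>
#include <thread>

// Hypothetical helper (not part of ADSI or ATL): calls `attempt` until it
// returns true or maxAttempts calls have been made. Between failures it
// sleeps for a random time in [delay/2, delay], then doubles delay up to
// maxDelay, so 500 failing threads do not all retry at the same instant.
bool retryWithBackoff(const std::function<bool()>& attempt,
                      int maxAttempts,
                      std::chrono::milliseconds baseDelay,
                      std::chrono::milliseconds maxDelay)
{
    assert(maxAttempts >= 1);
    thread_local std::mt19937 rng{std::random_device{}()};

    std::chrono::milliseconds delay = baseDelay;
    for (int i = 0; i < maxAttempts; ++i) {
        if (attempt()) {
            return true;            // success, stop retrying
        }
        if (i + 1 == maxAttempts) {
            break;                  // out of attempts, do not sleep again
        }
        std::uniform_int_distribution<long long> jitter(delay.count() / 2,
                                                        delay.count());
        std::this_thread::sleep_for(std::chrono::milliseconds(jitter(rng)));
        delay = std::min<std::chrono::milliseconds>(delay * 2, maxDelay);
    }
    return false;                   // caller decides how to report the failure
}
```

The cap on attempts is what stops a dead domain controller from being hit forever, and the jitter spreads the retries out. Whether this actually helps with the errors above is untested here.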

The service is free threaded so I added this to the code:
#define _ATL_FREE_THREADED


When I change this to say apartment threading it seems to fail less but with enough threads it still fails.
#define _ATL_APARTMENT_THREADED


I don't really understand this as I would have thought that if the Name Translate object was not thread safe it would have declared itself as apartment threaded and when my free-threaded client invoked it then it would have been invoked in an apartment.

The logical thing to do is to put a big fat lock around the code so only one thread does the lookup at a time but this is a serious performance hit! This guy seems to do just that http://pyyou.wordpress.com/tag/userprincipalname/

I tried just locking the calls to NameTranslate (and not the ADSI calls) but this gives the same result.

I contemplated using DsCrackNames but haven't tried this yet.

I don't understand why this doesn't just work...

Wednesday 18 July 2012

TNT Tracking is weird

I am still waiting (with breath held) for my retina MacBook Pro. They weren't kidding about 3-4 weeks and are about to take it to 4 weeks to the day.

Anyway it went to shipped on Tuesday evening but the tracking number came up with no information until late yesterday. By then it showed the package had left the airport in Shanghai but nearly a day earlier.

Today the status is the same which seems odd. I noticed you can choose your country (the default when you click through from the Apple store is UK for some reason) at the top of the page so I changed to Australia, copied and pasted the tracking number in again and now I see it has arrived in Australia! (but none of the details from China).

Then I thought I would try making China my country (you can choose English language) and this time I have a bunch more scan entries from China.

I have to ask - what is the point of a tracking system where you have to search multiple countries for the status of your package? I reckon the package will be delivered before the initial UK tracking site updates the status.

I'm sure TNT is under some stress if Apple are using them to ship all their new shiny MacBooks but still... Useless...

Sunday 15 July 2012

Deadlocks

Ok this seems pretty obvious but for whatever reason I never noticed this before.

I was hunting a deadlock today that happens when a service loses the database connection. We use windows critical sections to implement our mutexes.

There are upwards of 30 threads performing a variety of actions in response to request messages. Some have noticed the DB connection loss and are trying to handle it, while others are blocked waiting for locks still held by the threads dealing with the DB connection loss. Urgh... And so a deadlock was born...

Anyway while trying to untangle who locked what I figured out that if you shift-F9 a critical section it has an OwningThread member, which is the ID of the thread that locked the critical section!

Told you it was obvious and I should have noticed it earlier.

Well it helped me today so I thought maybe someone else may not have noticed this.
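If the mutexes weren't built on Windows critical sections, the same trick could be copied by hand. The sketch below is a portable illustration only, not our mutex class: `OwnerTrackingMutex` is a made-up name, and it simply records the locking thread's ID so a debugger (or a log line) can show who holds the lock, much like OwningThread does.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper around std::mutex that remembers which thread holds it.
// A default-constructed std::thread::id means "nobody owns the lock".
class OwnerTrackingMutex {
public:
    void lock() {
        mutex_.lock();
        owner_.store(std::this_thread::get_id());   // record owner after acquiring
    }

    bool try_lock() {
        if (!mutex_.try_lock()) {
            return false;
        }
        owner_.store(std::this_thread::get_id());
        return true;
    }

    void unlock() {
        owner_.store(std::thread::id());            // clear owner before releasing
        mutex_.unlock();
    }

    // Safe to call from any thread, e.g. when dumping state during a hang.
    std::thread::id owner() const { return owner_.load(); }

private:
    std::mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};
```

It works with std::lock_guard and std::unique_lock because it provides lock, try_lock and unlock.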

Wednesday 11 July 2012

Bloody DHCP

Ok so in my recent post I talked about how damn Optus has this DNS that redirects unknown addresses to looksmart or some such and that you can get around it by using an alternative Optus DNS.

Well it took quite a few attempts to change the DNS setting! My DC had a second (hidden) LAN interface left over from the VM having been moved from VMware Server to ESX, and I thought it might be handing out the DNS configured on the hidden interface. My primary connection was coming up as 'Local Area Connection 2' and whenever I tried to change anything it would warn me that there was another interface with a duplicate IP address, but I couldn't see the duplicate device.

Apparently what you do to make hidden devices visible is run a cmd prompt and enter

set devmgr_show_nonpresent_devices=1

Then, from the same prompt, you run the device manager snap-in (devmgmt.msc), go to View and select 'Show hidden devices'.

Now I can see the disabled LAN interface and I just uninstall it.

Back to DHCP - so I tried deleting the lease for my machine from the DHCP management console and doing an ipconfig /renew but I still get the same (wrong) DNS. I tried doing a /release before I did a /renew but still the same.

I broke out Wireshark and found down in the details the DHCP server was handing out the wrong DNS!

Hmm Back to DHCP console. So I right-clicked the server-options and chose configure options. I selected option 6 in the general tab and added the correct DNS servers. I restarted the DHCP server and tried the /release /renew cycle but no luck. Wireshark still sees the wrong address being handed out.

I rebooted the server - still no luck. I checked the logs to see if it was saying anything about the problem but no luck.

I noticed that the reservation for one of my server machines had the wrong DNS server also. I removed it and re-added the reservation - again the DNS was wrong!

I then gave up and deleted the scope and created a new scope. I selected the DNS and WINS servers during the setup options and enabled the scope. /release /renew and voila! It works! Created the reservation and again it gets the right DNS.

God it's hard!


Monday 9 July 2012

ESXi 5.0 Update 1 Auto start VMs

As I said before there is a bug preventing autostart of the VMs after a startup of the box. Fixing this turned out to be easy once I figured out the commands.

You can log onto the hypervisor by enabling SSH - go to configuration tab in vSphere, select security profile (in the software box) and hit properties in the services area. Enable and start SSH.

Using putty (or any ssh client), log on by going ssh -l root <machine> and entering your password.

Ok now you can run this command to figure out the IDs of the VMs you want to start (and the order). The ID is the first column:

vim-cmd vmsvc/getallvms


Then edit the rc.local file (vim /etc/rc.local) and at the end add a series of commands to start each VM and sleep for a few seconds between commands. The command to start is

vim-cmd vmsvc/power.on <id>


So mine looks like this now:


#!/bin/sh


export PATH=/sbin:/bin

log() {
   echo "${1}"
   /bin/busybox logger init "${1}"
}


# execute all services registered in ${rcdir} ($1 or /etc/rc.local.d)
if [ -d "${1:-/etc/rc.local.d}" ] ; then
   for filename in $(find "${1:-/etc/rc.local.d}" | /bin/busybox sort) ; do
      if [ -f "${filename}" ] && [ -x "${filename}" ]; then
         log "running ${filename}"
         "${filename}"
      fi
   done
fi


vim-cmd vmsvc/power.on 3
sleep 10
vim-cmd vmsvc/power.on 4
sleep 10
vim-cmd vmsvc/power.on 6


Now when it boots it should start the VMs in the order you specified!

Sunday 8 July 2012

Intel processor - Choose Carefully!

Sigh... Unfortunately the store (MWAVE in Lidcombe) did NOT allow me to return the i7-3770K and exchange it for an i7-3770. You may recall I discovered (the hard way) that the faster 3770K model DOES NOT support VT-d even though the cheaper one does.

I can sort of understand this and would have been happy to pay a re-stocking fee, but what bugs me about this situation is that the description of these units on the vendor's site said nothing about VT-d. There was no way to tell the difference and in fact the specs on their site were identical.

Exhibit A
http://www.mwave.com.au/sku-19010245-Intel_Core_i7_3770_Quad_Core_3rd_Gen_Processor_-_Socket_LGA1155_-_3_4GHz_(Turbo_

And exhibit B
http://www.mwave.com.au/sku-19010244-Intel_Core_i7_3770K_Unlocked_Quad_Core_3rd_Gen_Processor_-_Socket_LGA1155_-_3_5G

Ok I should have checked on the Intel site but this seems harsh.

Oh well I was just starting to like this place. Time for a new computer store.

Tom

Bloody Optus DNS

To make a domain work, the DNS server has to be under the control of the domain controller. Windows domains use all sorts of magic host names and records to find stuff.

The initial router I was supplied by Optus (a Cisco unit) allowed me to specify the DNS IP addresses handed out by the built-in DHCP server so all was good - the primary DNS was the domain controller and the secondary was Optus' own.

The router was fast enough but it would occasionally reset at random times (less random when it was hot). One day it gave up entirely. Optus were good in that they sent somebody out pretty much straight away, replaced the router and I was back up and running.

They replaced it with a Netgear unit which also seems pretty fast but it doesn't allow me to specify the DNS! So after some head scratching I decided to try running DHCP on my DC. I've had problems with this before as for whatever reason the switch will not pass on the broadcasts and I found this problem to be worse on wireless.

Anyway I've configured the DHCP server and this has been going Ok.

In the past I found the routers would act as DNS proxies, so you would configure the DC as the primary DNS and the router as the secondary. This router doesn't do this and just passes the IP of the DNS it has been given out to the DHCP clients on the network. I didn't realize this and configured the router as the secondary. The effect was that lookups worked while the DC was up, as its DNS server would forward up to the network, but if the DC was down (say because I shut it down overnight) DNS wouldn't work. I figured this out recently and configured this correctly, so even if the DC is down, so long as the computer has a cached IP from a previous DHCP it works.

The problem then is that if you try and access an address that isn't in the DNS the stupid Optus DNS redirects you to this true local search provider. This meant that lookups for my local servers by name often resolved to true local! This is pretty frustrating and not helpful.

Turns out there is an Optus resolver that doesn't do this. This guy posted details of the IP addresses thankfully and this seems to work.

http://justlocal.blogspot.com.au/2009/11/annoying-optus-dns-assist-feature.html

So life is good. If only I could get my VMs to auto-boot with the box...

Thursday 5 July 2012

My Diffs now have Oil

It's been two years since Robin and I bought the Nitro car (a Thunder Tiger EB4 S2) and while we don't use it so much in summer, we've put a few litres of fuel through the thing since we had it.

It needs a bit of attention at the moment. Last weekend at the bashers track (Lansvale) we broke the wing mount, the aerial and cover. We still have problems with the exhaust coming off the rubber joiner every time we bump something.

So I spent some time this week stripping it down, including repairing the wing mount and just generally cleaning it. I massively tightened up the screw holding the exhaust and spent some time with some wire wool and some polish cleaning the crap off the tuned pipe.

Tonight I got the diff oil and spent some time pulling down the front and centre diff, wiping all the grease off and filling them with diff oil. I've never stripped the diffs before and was very much looking forward to doing it. I find the internals of the diffs just amazing. The front one (pictured below) was in pretty good condition - it was tight and relatively smooth although it was a little noisy. The centre diff has developed some play and you can see from the colour of the gears it has gotten pretty hot. I suspect the centre diff will need replacing.



It took a bit of effort to get to the front diff. In the end I decided the easiest way was to undo the screws on the servo saver posts, undo the strut attaching the plate at the front to the centre diff and then undo the four screws holding the whole front unit in. Then I unscrewed the two really long screws going through the front toe plate, the two screws going through the shock tower and the two long silver screws holding the front cover on the diff housing. This got me into the diff housing so I could remove the diff unit.



I think if I repeat the procedure when pulling apart the back it will work. I'll have to undo the strut going from the wing mount down to the chassis also.

I went with advice and am running 5K in the front, 5K centre and 1K oil in the back. It is quite viscous however and I wonder if I will end up changing the front for 3K later. It may settle down after we drive the car however.

Anomaly

Finished reading Anomaly yesterday and it was quite good overall. Anomaly is a budget (self-published) Sci-Fi book off Amazon.

Anomaly is a story about a strange physical anomaly that turns up in the middle of New York and which turns out to be an alien artifact. The story centres on a science teacher and a reporter who inadvertently get dragged into the analysis of the artifact. The anomaly captures a large sphere of road and buildings and rotates it relative to the land around it.

The characters struggle to think through what the anomaly might be and why it manifests itself as it does. They think through how to communicate with it and what its goals for being there might be. The book also imagines what the global consequences of the anomaly might be.

I really like the way the book portrays an interesting scientific anomaly without resorting to magic but also without getting bogged down in scientific detail. I like how the main character thinks through it all just using simple logic. (Spoiler) the rainbow texta idea for instance for showing the alien the range of the human visible spectrum is great.

What bothers me is that I think there are other possibilities that the characters could have come up with based on the evidence, but magically the one they chose seems to become the next step (well mostly).

The other aspect that really bothers me is the global impact of the anomaly - the world turns to crap (riots and deaths) just because some weird alien artifact turns up? Seriously?

Yes I know my last book rant was about religion but again this author chose to portray most of the clergy as morons. He effectively sets up a few characters as the 'good cops' with a sensible world view but in order to make his point he has to create a few moronic clergymen for them to argue with. This is at best clumsy and at worst just insulting.

In the book the author deals with the issues arising from the anomaly having turned up in the US and the US controlling its exploration. That part seems very realistic to me as I can imagine the scientists of the world going nuts when they were excluded. The only part I think the book gets wrong is the belief that this situation would be acceptable or necessary.

(Spoiler) and then the aliens turn out to be some sort of galactic police that ensure no new up and coming race does damage within the galaxy. Hmm I wonder if this alien race invades planets to hunt down weapons of mass destruction too?

And I think that's what REALLY tarnishes this book for me - the nauseatingly American-centric viewpoint.

Otherwise the story trundled along and had lots of cool ideas so probably worth the $2.00 or whatever it cost.

Wednesday 4 July 2012

Multiple Inheritance

Found what is probably a 'classic' C++ error today.

We have

class Message;

class LogonMessage : public Message

Then we have the processor that is invoked to handle this message which for whatever reason is defined like this:

class ProcessRequest

class ProcessLogonRequest 
    :    public ProcessRequest, 
         public LogonMessage

Now in an exception handler somewhere the code processing a ProcessLogonRequest catches an exception and dies:

void Service::process( ProcessRequest *processRequest )
{
    try
    {
...
    }
    catch( const SomeError& e )
    {
         Message *asMessage = (Message *)processRequest;

         //
         // This crashes
         //
         generateErrorResponse( asMessage->getSomeField() );
    }
}

I'm amazed this hasn't occurred earlier. I think it is because the processor mostly handles all the exceptions itself (only a system error like a DB going away gets up this far) that it didn't crash earlier.

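The problem is of course that Message and ProcessRequest are siblings in the class hierarchy and you can't cast directly from one to the other. A C-style cast between unrelated class types behaves like a reinterpret_cast: it compiles, but the pointer is not adjusted to the Message sub-object, so asMessage ends up pointing at the wrong part of the object. You could first down-cast to a ProcessLogonRequest and then cast to a Message but this isn't possible as the code doesn't know what it's got.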

Thankfully RTTI is enabled so we just changed this to:

Message *asMessage = dynamic_cast<Message *>(processRequest);

But then there is also a risk that processRequest is an instance of a class NOT derived from a Message so we also needed to handle the case where dynamic_cast returns NULL.
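Here is a cut-down sketch of the fix, assuming the class shapes described above. `ProcessPingRequest`, `fieldForErrorResponse`, the string return type and the fallback text are invented for the example, and the real classes obviously carry a lot more. Note that dynamic_cast only works because ProcessRequest has at least one virtual function.

```cpp
#include <cassert>
#include <string>

class Message {
public:
    virtual ~Message() {}
    virtual std::string getSomeField() const { return "message"; }
};

class LogonMessage : public Message {
public:
    std::string getSomeField() const override { return "logon"; }
};

class ProcessRequest {
public:
    virtual ~ProcessRequest() {}   // polymorphic base, required for dynamic_cast
};

// The processor that is both a ProcessRequest and a Message.
class ProcessLogonRequest : public ProcessRequest, public LogonMessage {};

// Hypothetical processor with no Message base at all.
class ProcessPingRequest : public ProcessRequest {};

// Cross-casts from the ProcessRequest side to the Message side at run time.
// Returns a fallback string when the request is not also a Message.
std::string fieldForErrorResponse(ProcessRequest* processRequest) {
    Message* asMessage = dynamic_cast<Message*>(processRequest);
    if (asMessage == nullptr) {
        return "<no message>";
    }
    return asMessage->getSomeField();
}
```

Passing a ProcessLogonRequest gets the field from its Message half, while a ProcessPingRequest takes the NULL branch instead of crashing.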

The other way to solve it would be for ProcessRequest to define a virtual method that returns a pointer to itself cast as a Message but then I would have to change every sub-class of ProcessRequest (and these are numerous).
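For comparison, the virtual-method alternative might look roughly like this. It is only a sketch: `asMessage`, `ProcessPingRequest` and the placeholder field value are assumptions, not our code. Giving the base class a default that returns NULL means only the processors that really are Messages need an override, but that is still one edit per such subclass.

```cpp
#include <cassert>

class Message {
public:
    virtual ~Message() {}
    int getSomeField() const { return 7; }   // placeholder value for illustration
};

class LogonMessage : public Message {};

class ProcessRequest {
public:
    virtual ~ProcessRequest() {}
    // Default: this processor is not a Message.
    virtual Message* asMessage() { return nullptr; }
};

class ProcessLogonRequest : public ProcessRequest, public LogonMessage {
public:
    // Implicit derived-to-base conversion, so the compiler adjusts the
    // pointer to the Message sub-object correctly. No RTTI needed.
    Message* asMessage() override { return this; }
};

// Hypothetical processor that is not a Message; it inherits the default.
class ProcessPingRequest : public ProcessRequest {};
```

The caller still has to check for NULL, exactly as with dynamic_cast.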

Tom

Monday 2 July 2012

Working at home

Given the name of my blog I thought I had to share this (thanks to Dan). Given the image of the developer after 1 year of working at home, what must I look like after nearly 10!

http://theoatmeal.com/comics/working_home

Sunday 1 July 2012

Server Rebuild

I've been trying to figure out why my VMs don't start when the machine is powered on. Apparently it is a bug :( Have to wait for Update 2 of ESXi 5.0 for a fix. http://communities.vmware.com/message/2014677

The other problem I have been facing is that when I first configured the domain I used the domain name I had based on my old company name (consultancy company). Initially I was using the domain for testing Autoenrollment functionality but over time I became more reliant on authentication. Anyway - unfortunately I no longer have that company and someone else is using the domain name. This causes havoc with name resolution. I decided to take the plunge and rename the domain. It shouldn't be too hard as I only have a single domain controller. Unfortunately it isn't straightforward and is a 10 step process involving a tool provided by Microsoft. The instructions are here http://technet.microsoft.com/en-us/windowsserver/bb405948.aspx


I started the process and got an error from rendom /list "The Behavior version of the Forest is 0 it must be 2 or greater to perform a domain rename: The server is unwilling to process the request. :8245"

Turns out I had to raise the domain functional level, which is easy - go to Active Directory Domains and Trusts, right-click the domain and choose 'Raise Domain Functional Level'. It still didn't work. After some googling it turns out I also need to raise the forest functional level. To do this you right-click on the 'Active Directory Domains and Trusts' entry in the tree and choose 'Raise Forest Functional Level'. Then the rendom /list worked.

I hit another problem with Framdyn.dll not being found by the gpfixup.exe tool, which turned out to be because there was an error in my path variable! It's been there for ages! Just a missing semicolon between the path to the wbem directory and whatever was next in the path.

I also ended up re-starting every computer in the domain so the new domain name would take effect but overall the process was moderately painless.

Name resolution works again!