10 January, 2012

Investigating problems in the wrong Universe.

A few weeks ago I was working on an extension to the action oriented workflow paradigm that will allow it to infer implicit workflows, using artificial intelligence, as working agents do their work. I had finished coding the principal core language changes (adding new classes to the core APIs, recompiling, building and distributing to the staging server) when I hit a problem during the distribute step, or rather just after it: I was unable to restart the AgilEntity server instance that had been rebuilt from the newly deployed changes.

Now, I'd seen similar problems dozens of times before and knew just what to do. My assumption was that some change I had made in the newly deployed code was obviously causing the problem. All the new code was properly unit tested per class, but there was still *the chance* that some novel object interaction in the deployed app was at fault. Maybe I had neglected to add a handler for a required db column, or forgotten to add the column to the db (both have happened at this point in the past). I quickly checked off the usual suspects and was still left with no ability to launch the server.

It was getting late and I was getting tired and frustrated. After 4 hours of trying I decided to give up for the night, let my brain rest and attack with vigor in the morning...er, afternoon.

When I got up, as is my wont, I hopped out of bed and came over to the computer. I thought to myself, why isn't this working?? Partially worried, I was now thinking of the need to finish the code and get the UI implementation started. I worried about possibly critically failing with the implementation...my mind was running away into paranoia territory. So I stepped back and thought to do the one thing that all Scientists and Engineers need to learn:

To question the most critical assumptions.

I had covered all the bases, and every time I entered the requested installation path line at the command prompt the server launched only one web context (a fully functional server launches several)...I looked at the web context that loaded and noticed it was the system context. This led me down a path where I investigated the class that bootstraps the system, and after about 40 minutes I placed a trace line in the bootstrap code to see the path. It was while it was running that I had the

paradigm shift in how I perceived the problem.

Like the sudden appearance of alternate faces in an illustration made to be an illusion, I realized that there was nothing wrong with the app; it was running exactly as designed. It was I that was flawed. The critical assumption that the *path I was entering* was the correct one had not been examined, and seeing the path in the trace report showed it was the wrong one.

I am a very parsimonious engineer (lazy). When I was writing the bootstrap code I wrote it so that I could easily change how the system loaded up by changing the input line. I was (for some reason) inputting the line for bootstrapping an *uninstalled* server (so it was loading only a local context for the web configuration UI, which worked) instead of the line for an already registered, XML configured node instance!

The difference between the lines shows it clearly:

New node install path (Windows path), what I was entering for hours:


Installed node install path, what I should have been entering:


So my entire Universe was wrong once that mistake was made. All my higher level assumptions about what the problem was, and my association of them with the recent changes (which is normally exactly the right thing to do), were false.

This long story illustrates how, even when *doing the right Science* in investigating a problem, one should always question the critical foundational assumptions to avoid wasting precious time troubleshooting away in the wrong Universe.


After entering the correct path (the second one) all the contexts loaded perfectly; there were no errors in bootstrapping induced by the changes I had made.
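
One way to make this kind of mistake visible sooner is to have the bootstrap announce, before doing anything else, which mode it resolved from the input line. What follows is only a minimal sketch in Python, not AgilEntity's actual bootstrap code. The names `BootPlan`, `plan_boot` and `describe`, the mode and context names, and the rule that a line ending in `.xml` means an installed node are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootPlan:
    """What the bootstrap intends to do for a given input line (hypothetical)."""
    mode: str
    path: str
    contexts: tuple


def plan_boot(input_line: str) -> BootPlan:
    """Resolve the boot mode from the input line.

    Assumption for this sketch only: a line ending in ".xml" points at the
    configuration of an installed node; anything else is treated as a new,
    uninstalled node that loads just the system context.
    """
    path = input_line.strip()
    if path.lower().endswith(".xml"):
        # Installed node: system context plus the configured contexts.
        return BootPlan("installed-node", path, ("system", "configured"))
    # New install: only the local context for the web configuration UI.
    return BootPlan("new-install", path, ("system",))


def describe(plan: BootPlan) -> str:
    """Build the single trace line to emit before any context is loaded."""
    contexts = ", ".join(plan.contexts)
    return f"bootstrap mode={plan.mode} path={plan.path!r} contexts={contexts}"


if __name__ == "__main__":
    import sys

    # Echo the resolved plan first, so a wrong input line is obvious at once.
    line = sys.argv[1] if len(sys.argv) > 1 else ""
    print(describe(plan_boot(line)))
```

With a trace line like this printed at startup, the foundational assumption (which path and mode the server is actually booting with) gets checked on every launch instead of after hours of chasing the recent changes.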