I had the most interesting error to deal with yesterday. I got up early on Saturday morning to do a code deployment, checked my client’s Test environment first, and noticed that the AOS service was hanging and would not start. Naturally, the Dynamics AX client would not start either, because the service never finished starting; it just sat in a status of “Starting”.
I quickly ran through the usual troubleshooting steps:
- Checked the Windows Event Logs
- Checked the Dynamics AX event logs
- Checked the SQL Server logs
- Attached a debugger to the service, but got nothing useful because the service never finished starting
- Verified that every dependency was starting just fine
Perplexing!!!
After spending all day losing my hair, I called off the Production deployment and explained to my client that I needed more time to find the issue. I never deploy while something is broken in the Test environment, although this one was strange because I had personally tested the new code improvements myself.
So I went to sleep, frustrated that I still had not found the problem or even any indicators of it.
Then, as often happens, a fresh mind made me look at things differently. This morning I began retracing my troubleshooting steps. None of the code had failed in testing. The service would start and just hang.
JUST HANG was the key phrase!!! When the AOS service just hangs, it is usually waiting on the SQL Server database, either because of a communication problem or because of locking.
I went back, looked at the locks, and found it. Sure enough, a transaction on the SQL Server from a user’s custom report was holding locks on some of the metadata tables. The report was written directly in SQL, so it ran outside the AOS. I killed the process and everything worked.
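The post doesn’t say exactly how the locks were inspected. On SQL Server, one common approach is to query the sys.dm_exec_requests dynamic management view, whose blocking_session_id column names the session each request is waiting on, and follow those links back to the session at the head of the chain: that is the session worth investigating (and, as here, possibly killing). The Python sketch below assumes those two columns have already been fetched into a dictionary; the function name head_blockers and the session ids in the example are hypothetical.

```python
from typing import Dict, Optional, Set


def head_blockers(blocked_by: Dict[int, Optional[int]]) -> Set[int]:
    """Return the sessions at the head of each blocking chain.

    blocked_by maps a session id to the id of the session blocking it,
    mirroring the session_id / blocking_session_id pair from
    sys.dm_exec_requests (a hypothetical, already-fetched snapshot).
    0 or None means "not blocked". Negative values, which SQL Server
    reserves for special cases, are ignored here for simplicity.
    """
    # Every positive blocking id is a session holding someone else up.
    blockers = {b for b in blocked_by.values() if b is not None and b > 0}
    # A head blocker is a blocker that is not itself blocked. A blocker
    # missing from the map has no row of its own, so it counts as unblocked.
    return {s for s in blockers if (blocked_by.get(s) or 0) <= 0}


# Hypothetical snapshot: 52 waits on 51, which waits on 60; 54 also waits on 60.
snapshot = {51: 60, 52: 51, 53: 0, 54: 60}
print(head_blockers(snapshot))  # {60}
```

A head blocker with no row of its own in the snapshot deserves a close look: sys.dm_exec_requests only lists sessions that are currently executing a request, so an idle session that still holds an open transaction appears only as someone else’s blocking_session_id. Once the head blocker is identified, the T-SQL KILL command ends that session.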
Let this be a cautionary tale: always be aware of locking.