Recently, one of our clients' OSB/ALSB domains had been suffering severe health issues for a couple of weeks. Preliminary analysis indicated that a rogue long-running thread was consuming a lot of CPU while working on the default diagnostic store of the WebLogic server. The diagnostic store was growing at an alarming rate, to a few hundred megabytes. Further analysis showed that the OSB/ALSB console was full of SLA alerts of all types. This blog article from Oracle connected the dots: SLA alerts are part of the WebLogic Diagnostic Framework (WLDF) and are stored in the default diagnostic store. Because of the high number of alerts, the rogue thread was constantly busy and was consuming all kinds of resources.
Lesson learned: check your SLA alerts if you experience similar issues. Tone them down if you can. Turn them off globally if you need immediate relief. Purge the WebLogic diagnostic store periodically with a WLST script, or simply reset it to a new one if you have that option.
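To know when a purge is due, it helps to watch how large the store has grown. The following is a minimal sketch in plain Python (not WLST) that adds up the files in a diagnostic store directory and compares the total against a limit. The function names, the 100 MB default limit, and the example path are assumptions made for illustration; the actual store location depends on how your domain is configured, and the purge itself still has to be done through WLST or a store reset.

```python
import os


def store_size_bytes(store_dir):
    """Return the combined size of the regular files directly inside store_dir."""
    total = 0
    for name in os.listdir(store_dir):
        path = os.path.join(store_dir, name)
        # Count only regular files; subdirectories are ignored.
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total


def needs_attention(store_dir, limit_mb=100):
    """Return True when the store is larger than limit_mb megabytes.

    limit_mb is a hypothetical threshold; pick one that suits your domain.
    """
    return store_size_bytes(store_dir) > limit_mb * 1024 * 1024
```

For example, `needs_attention("/path/to/domain/servers/AdminServer/data/store/diagnostics")` could be called from a scheduled job, with the path adjusted to wherever your server keeps its diagnostic store. A `True` result is a cue to look at the SLA alert volume and schedule a purge.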