Update June 24: We observed another service disturbance on June 24. See this image for more details.
At SysKit, we rely heavily on cloud services such as Office 365, Azure (DevOps), and a plethora of others that we use daily. When one of these goes down, it is the equivalent of not having electricity; the company comes to a grinding halt.
In the past couple of weeks, we had a number of issues with Office 365, SharePoint Online in particular: sites were slow to load, Office clients could not open documents, users could not share documents, etc. The questions on our Microsoft Teams channel were: “Is SharePoint Online down?”, “Is Office 365 down?”. Given that SysKit Insights has built-in functionality to monitor the performance of SharePoint Online, I decided to look into whether SharePoint was down.
Office 365 Service Health Status
I took a brief look at the Office 365 admin center but could not find any information about recent outages or the problems we had experienced, which I found odd. Time to explore more.
SysKit Insights to the rescue
We are continuously using our tools in production. We have Insights running against our tenant 24/7, so I went in to see what I could find out about the service status for the period June 12th – June 18th, 2019. Here are the results:
The monitored Office 365 tenancy is hosted in the European data centers.
As you can see in the image, we are tracking four different performance metrics that are available when a page in SharePoint Online is requested.
These metrics are:
- Page Load Time [ms] – the total time it took for a page to load
- Request Duration (SPRequestDuration) [ms] – the time required for the server to process the request
- SPIISLatency [ms] – the amount of time the page request waited before being processed
- Network Time [ms] – the amount of time it took to transfer the page content over the network (a high value potentially indicates a slow network)
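To make the relationship between the four metrics concrete, here is a minimal Python sketch. It assumes the server-side values arrive as HTTP response headers named after the metrics above, and that network time can be approximated as whatever is left of the total page load time. Both points are assumptions for illustration, and `PageTiming` and `parse_timing` are hypothetical names, not part of SysKit Insights.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class PageTiming:
    page_load_ms: float         # total time measured by the client
    request_duration_ms: float  # SPRequestDuration: server processing time
    iis_latency_ms: float       # SPIISLatency: time the request waited
    network_ms: float           # assumed: remainder of the total


def parse_timing(page_load_ms: float, headers: Mapping[str, str]) -> PageTiming:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    request_ms = float(lowered.get("sprequestduration", 0))
    latency_ms = float(lowered.get("spiislatency", 0))
    # Assumption: network time is what remains of the total, never below zero.
    network_ms = max(page_load_ms - request_ms - latency_ms, 0.0)
    return PageTiming(page_load_ms, request_ms, latency_ms, network_ms)


# Made-up sample values, for illustration only.
sample = parse_timing(1200.0, {"SPRequestDuration": "350", "SPIISLatency": "4"})
print(sample)
```

Splitting the total this way shows at a glance whether a slow page is dominated by server processing, queueing, or the network.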
As we can see from the data we gathered, there were unusually high response times at midday on June 13th and throughout almost the entire working day on June 17th and 18th, the exact dates when our users had issues accessing documents. If we look at the same data set with the 9th percentile line on it, we can easily spot that the service was slower than usual. So while it was not an outage, the evidence clearly confirms unusual slowness of the SharePoint Online environment.
Another metric available for monitoring slow SharePoint Online performance is called SPHealthScore. As per Microsoft Docs: “A value between 0 and 10, where 0 represents a low load and a high ability to process requests and 10 represents a high load and that the server is throttling requests to maintain adequate throughput.” However, our data shows there was nothing out of the ordinary for SPHealthScore compared to other days in the period. (Notice that this value is 0 over the weekend.)
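The idea of comparing response times against a percentile line can be sketched in a few lines of Python. This is only an illustration: the sample values are made up, the percentile is a free parameter (80 is used here purely because the sample set is tiny), and the nearest-rank method is one of several common percentile definitions, not necessarily the one our charts use.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def slow_samples(values: Sequence[float], pct: float) -> List[float]:
    """Return the samples that lie above the chosen percentile line."""
    threshold = percentile(values, pct)
    return [v for v in values if v > threshold]


# Hypothetical page load times in milliseconds.
load_times = [820, 790, 860, 805, 3100, 840, 815, 2950, 830, 810]
threshold = percentile(load_times, 80)
outliers = slow_samples(load_times, 80)
print(threshold, outliers)
```

Samples well above the line are the ones worth correlating with user complaints.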
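As a rough sketch of how such a score could be read and interpreted, the Python below assumes the value is delivered in the `X-SharePointHealthScore` response header. The function names and the `warn_at` threshold are hypothetical and chosen only for illustration; they are not part of any Microsoft or SysKit API.

```python
from typing import Mapping, Optional

HEALTH_HEADER = "x-sharepointhealthscore"  # compared in lower case


def read_health_score(headers: Mapping[str, str]) -> Optional[int]:
    """Return the health score (0-10), or None if the header is absent."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(HEALTH_HEADER)
    if raw is None:
        return None
    score = int(raw)
    if not 0 <= score <= 10:
        raise ValueError(f"health score out of range: {score}")
    return score


def describe_health(score: int, warn_at: int = 5) -> str:
    """Classify a score; warn_at is an arbitrary illustrative threshold."""
    if score == 10:
        return "throttling"
    if score >= warn_at:
        return "elevated load"
    return "normal"


score = read_health_score({"X-SharePointHealthScore": "0"})
print(score, describe_health(score))
```

A score that stays low while page load times climb, as in our case, suggests that this metric alone is not enough to detect slowness.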
So if you want to monitor your tenant and stay on top of not just potential outages but also cases like this, when the service is slow to respond, I invite you to try SysKit Insights. Make sure you configure it correctly so that you are alerted ahead of time about potential issues.
Our team is currently working to deliver the first version of SysKit Sense, our cloud offering that will allow you to monitor your Office 365 instance directly from Azure using a plethora of geo-distributed agents. Having multiple geo-distributed agents not only ensures that performance is adequate for your geo-distributed company but also helps rule out a local networking problem at one of your locations. Make sure you sign up for the preview of SysKit Sense.
The reports you can see in this blog post are part of our effort to build “SysKit Dashboard”, a live online system that is going to show real-time data coming from SysKit Sense. Please let me know in the comments below if you would like to test this system.