Ensure you get what you pay for
For today's post, I thought I’d take an administrator-centric detour to talk about Microsoft and Office 365 service level agreements (SLAs), what happens when Microsoft doesn’t meet them, and how to get a refund.
A couple weeks ago, I saw a nice reminder on Reddit that when a Microsoft 365 service goes down long enough, customers are entitled to a credit back on what they paid for that or other affected services. I had totally forgotten about this—my tenant is only a few users so it’s not like the credit is huge to my budget or anything—but I thought it would be a good topic to go through for the blog.
Even more useful would be a case study running through what actually happens when you request that refund. So this post covers a bit about what you pay for, what refunds you’re entitled to, and what happens when you request that service credit. Huge disclaimer, though: I’m not a lawyer and I’m not an SLA expert, but I am a customer who wants what he pays for. For your specific situation, you’ll always want to dive deep into the SLA and perhaps even get your legal counsel involved so you’re ready with arguments or backup when you request credits, especially if those credits are large.
Microsoft Service Level Agreements
Most of Microsoft’s online services come with a service level agreement. And almost all of them are set at the three-nine level: a 99.9%-uptime guarantee. You might also see references for four-nine and five-nine, though those are almost always advertisements of actual performance, not guaranteed performance.
Overall, Microsoft 365, Office 365, and Azure perform very well. But like all man-made systems, sometimes something happens and, well, they go down. If they’re down long enough, you as a customer are guaranteed a refund or credit.
There are probably dozens of various SLAs within the M365 ecosystem, whether it’s Azure Active Directory, Exchange Online, Teams call quality, various Power Platform ones, and more. What I’m saying is before you put in a credit request, be sure you understand the SLA of the specific service you’re talking about.
Generally, the credit is 25% back for less than 99.9% uptime, 50% back for less than 99%, and 100% back for less than 95%. Two things about this:
- From an outside perspective, being up at least 95% of the time, let alone 99.9% of the time, is actually really, really good. I’ve worked at places with on-prem systems that would go down here and there and they rarely made those numbers; I’m not trying to generalize or make a trend out of a single data point, but frankly I find it pretty impressive. (I was not on the team responsible for uptime, for the record.)
- Getting back so much for not meeting that SLA is a pretty good deal. What you get back is a credit toward what you paid for the service to begin with. So if Teams calling performs at 99.5% and—making numbers up here—let’s say you pay $4 per user per month, you’d get back $1 per user affected: 25%. That’s pretty good.
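To make the tiers concrete, here is a minimal Python sketch of the general credit schedule described above (25% below 99.9%, 50% below 99%, 100% below 95%). The function names and the $4 price are made up for illustration, and the tiers for your specific service may differ, so check its SLA before relying on these numbers.

```python
def credit_fraction(uptime_percent: float) -> float:
    """Fraction of the monthly fee credited back for a given monthly uptime.

    Assumes the general tiers from this post; individual services may differ.
    """
    if uptime_percent < 95.0:
        return 1.00
    if uptime_percent < 99.0:
        return 0.50
    if uptime_percent < 99.9:
        return 0.25
    return 0.0  # SLA met, no credit


def service_credit(uptime_percent: float, price_per_user: float, affected_users: int) -> float:
    """Credit in the billing currency, capped by design at what was paid."""
    return round(credit_fraction(uptime_percent) * price_per_user * affected_users, 2)


# The made-up Teams calling example: 99.5% uptime at $4 per user per month.
print(service_credit(99.5, 4.00, 1))  # 1.0
```

Because the largest fraction is 1.0, the credit can never exceed what you paid for the service, which matches the point that follows.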
However, you’ll never get more than you pay Microsoft for the service, even if what you’ve built on top of it means an outage costs you far more in lost revenue, or that dropped call lost you a six-figure customer. This isn’t insurance. It’s simply a reimbursement for what you’ve already paid for. Now, the math for this is generally set up in the SLA as a monthly uptime percentage: user minutes minus downtime, divided by user minutes, multiplied by 100.
It’s unnecessarily confusing, and I’ve never seen “user minutes” clearly defined anywhere on Microsoft’s website. But what this all means is basically this: the time the service was up divided by the total time in the month. You can use minutes, hours, days, whatever unit you’d like, as long as you’re consistent with your dimensional analysis.
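Here is a small Python sketch of that calculation. It assumes "user minutes" simply means minutes in the month multiplied by the number of users, and that every user was affected for the whole outage; both are my reading, not a definition from Microsoft. The function name is hypothetical.

```python
def monthly_uptime_percent(total_time: float, downtime: float) -> float:
    """Uptime as a percentage; any unit works if both arguments use the same one."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    if not 0 <= downtime <= total_time:
        raise ValueError("downtime must be between 0 and total_time")
    return (total_time - downtime) / total_time * 100


# A 30-day month with a 90-minute outage, measured three ways.
in_minutes = monthly_uptime_percent(30 * 24 * 60, 90)
in_hours = monthly_uptime_percent(30 * 24, 1.5)
# "User minutes" for 4 users who were all affected: the user count cancels out.
in_user_minutes = monthly_uptime_percent(4 * 30 * 24 * 60, 4 * 90)

print(round(in_minutes, 3), round(in_hours, 3), round(in_user_minutes, 3))
```

All three values come out the same, which is why the unit doesn't matter as long as you are consistent. If only some of your users were affected, the user count no longer cancels and the per-user bookkeeping starts to matter.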
Where to find the service health reports
During a service outage, you’ll find a reference to the service in question in the M365 admin center under Health > Service Health > Incidents. Now, of course, if the admin center is affected, you won’t be able to see anything, which is why I generally like to follow the @MSFT365Status Twitter handle. I’ve got a full post on how to get push notifications when any M365 service goes down if you're interested.
The important piece of info you need here is the Incident ID. What you’ll find in the Service Health Center is a listing of ongoing incidents. But by the time you’re looking for a credit, that incident is history, which is why you want to click the History tab. Here, you can search for the Incident ID to find the history of updates on the incident and, once the incident is over, a downloadable report covering the entire incident. These reports can be kind of interesting to read if you’re into the tech.
How to request a refund
So I decided to see for myself how this process all works. This is a case study of a small tenant—four licensed Office 365 users—in a Europe-based tenant with three users based in the United States.
If you’ve been paying attention, I’m sure you well remember the Azure Active Directory outage on September 28, 2020. Since AAD was down, effectively all Microsoft services were down. It was Incident ID MO222965. So we’re working under the AAD SLA.
Jumping to the Incident’s listing in the Service Health Center, you’ll see they list information about the outage, including that it lasted five hours. You can view all the message updates by clicking View history next to Latest Message.
In this case, I went through the M365 admin center to request my credit, even though this was technically an Azure issue. I figure it’s Microsoft’s issue to figure out who to route the ticket to given the huge impact of this outage.
In the admin center, you’ve got to click the little teal question mark box in the bottom-right to open any ticket. From here, you can’t just open a ticket. Of course you have to provide a summary of your issue so they can hopefully provide you some self-help information before you use up their time. Guess what is never an option for self-help: refunds!
So here I tried “SLA uptime refund”. No luck. But it does give the option to open a ticket. The subject line is populated with my search term and I can take the opportunity to provide some more detail in the box below it. Feel free to take my wording if you’d like.
“Due to the Sept 28 AAD outage (ID MO222965), our users were unable to access most of Microsoft 365 services during the outage. This outage is a violation of the 99.9% uptime SLA. I’d like to request the 25% credit for our Sept bill. Please let me know if you have any questions.”
Note that you have until the end of the next month to submit a credit request, so for this outage, the cutoff date is actually October 31, 2020.
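If you want to double-check a cutoff date, here is a quick Python sketch of the "end of the next month" rule as I've described it. The function name is hypothetical, and your SLA's exact deadline wording takes precedence over this.

```python
import calendar
from datetime import date


def claim_deadline(incident_date: date) -> date:
    """Last day of the calendar month following the incident."""
    year, month = incident_date.year, incident_date.month + 1
    if month > 12:  # a December incident rolls over into January
        year, month = year + 1, 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


print(claim_deadline(date(2020, 9, 28)))  # 2020-10-31
```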
Provide your contact info and press submit. Now starts the waiting game.
An SLA refund case study
I submitted my request at 9:13 am local time on a Friday, got my first response at 9:31 am local time saying they were looking into it. The back and forth was friendly and professional over the course of a couple days. The person I worked with listed their work hours in their signature, so I had an idea of when I would get responses, though their hours were less than a full 8-hour day. So our discussion likely went longer than it would have if their hours were 8 hours per day and more aligned with my time zone. And I will say there was a lot of pushback on the credits. It took until the next Thursday—6 days—to settle the issue.
Basically, what I was told in their first response was that the issue didn’t affect my tenant and the incident was not listed in my Service Health Center. But that wasn’t true. I showed you the message. And I can tell you with certainty that I was unable to access Outlook, Teams, and even Word and Excel due to the authentication issues.
After pushing back, they offered me a 25% credit on one E5 license as a good-faith gesture, since they couldn’t confirm that I was affected. In response, I provided a screenshot of the Service Health Center to prove the incident existed in my tenant.
In response to the screenshot, they gave in and acknowledged the outage affected my tenant. But! They stated that the downtime was "announced" and therefore didn’t count against the SLA, saying that the incident counted as 0.01% downtime from Microsoft.
They provided no math justifying where the 0.01% comes from (convenient figure, no?). Nor any justification for why it matters that the downtime was “announced”. I’m not even sure what that means. I’m sure it’s referencing some loophole in the SLA where, if they acknowledge the outage, it doesn’t count against the SLA, but I don’t know the details. Kind of defeats the point, you know? And hell, it’s not my job to know these details. If they’re going to deny me, they need to justify it and provide the source of that information. That’s what they’re paid for.
So, ignoring any concept of a loophole, I responded seeking to understand why the math wasn’t in my favor. Looking at the math of this situation, AAD was acknowledged to be down for 5 hours. I will note that we were impacted well before they acknowledged the outage started (probably 2 or 3 hours), but I stuck with the acknowledged figure because it’s the one Microsoft documented and it’s enough to breach the SLA on its own. (For what it’s worth, anything over 7.2 hours of downtime in a 720-hour month would drop uptime below 99% and into the 50% tier, so those extra hours could matter in a closer case.) So that’s 5 hours out of 720 hours in September per user, which is 99.31% uptime, rounding up. Clearly below 99.9%.
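Here is that arithmetic as a short Python sketch, so you can plug in your own outage. The function name is hypothetical; the 7- and 8-hour rows are only my rough estimate of the unacknowledged impact, not figures from Microsoft.

```python
import calendar


def uptime_for_outage(year: int, month: int, outage_hours: float) -> float:
    """Monthly uptime percentage for a single outage of the given length."""
    hours_in_month = calendar.monthrange(year, month)[1] * 24  # September 2020: 720
    return (hours_in_month - outage_hours) / hours_in_month * 100


# 5 hours is the acknowledged outage; 7 and 8 add my estimated extra impact.
for hours in (5, 7, 8):
    print(hours, round(uptime_for_outage(2020, 9, hours), 2))
# 5 99.31
# 7 99.03
# 8 98.89
```

The acknowledged 5 hours lands comfortably between 99% and 99.9%. An 8-hour outage would land just under 99%, so under the general tiers the exact duration can matter.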
I provided this math and asked for an explanation as to why this wasn’t the situation for my tenant. Ultimately, the Microsoft rep responded that my calculations were correct and informed me I would be receiving the service credits, along with a listing of the amount I would receive for our four accounts. So we’re getting the service credit that I’m under the impression we deserve.
So my takeaway here is not to accept an initial denial of service credits. Push back if you think you’re right. And if you’re not sure, don’t give up: place the burden on them to justify why you’re not entitled to the credit. They’re a trillion-dollar company and can afford it. Seriously, though.
Now, I’m [still] not a lawyer. I don’t write SLAs and I didn’t read them all. And sure, you can say, "Well, it's your responsibility to be fully aware of all the SLAs you agree to." Maybe. But what if you inherit a tenant as a new admin? Which, eventually, everyone will. My point here is, admins are not expected to be legal experts, even if they made the initial agreement, but they're customers just the same and have every right to complain when a perceived degradation of service occurs.
Additionally, my example is a very small tenant, minimal income for them, so there’s a chance that the Microsoft rep was right and just gave up because I’m a stubborn pain in the ass and it just wasn’t worth their effort. Wouldn’t be the first time, I’ll admit.
But to my credit, the math seems pretty clear. And they responded that I was correct once I sent them the details. Which is now a precedent for a future complaint. Maybe not a legal precedent, but a thorn-in-your-side precedent nonetheless. "You told me I was right so I’m going to refer to that later on" precedent. I can be really annoying when I think I’m owed something, by the way.
Anyway, that’s how you submit an SLA credit request. And how you do the math to confirm that you’re entitled to it. The larger the organization, the bigger the impact of this service credit. Your mileage of course may vary and you shouldn’t base your expectations on my experience. But I thought it might be useful to see how the process worked through a real example.
I hope you found this useful. Please leave any questions or comments below. I’m curious to hear your stories of how you’ve dealt with getting service credits back due to an SLA breach, and whether you noticed any issues or errors in my situation; I just reported it as it went, not necessarily how it’s supposed to go. Happy working the system to get the refund you deserve after the anxiety that comes with an M365 service outage.