June 15, 2020 T-Mobile
Network Outage Report
PS Docket No. 20-183
A Report of the Public Safety and Homeland Security Bureau
Federal Communications Commission
October 22, 2020
TABLE OF CONTENTS
I. INTRODUCTION (para. 1)
II. BACKGROUND (para. 3)
III. INCIDENT AND RESPONSE (para. 4)
A. Architecture of T-Mobile’s Network (para. 4)
B. Root Cause and Event Summary (para. 11)
C. Mitigation and Restoration Efforts (para. 19)
IV. ANALYSIS (para. 26)
A. Impact on Calls over T-Mobile’s Network (para. 26)
B. Impact on Calls Initiated Outside T-Mobile’s Network (para. 29)
C. Impact on 911 Calls Handled by T-Mobile (para. 35)
D. Public Impact (para. 38)
V. CORRECTIVE ACTIONS BY T-MOBILE (para. 43)
VI. NEXT STEPS (para. 45)
APPENDIX A: Outage Timeline
I. INTRODUCTION
1. Midday on June 15, 2020, T-Mobile experienced an outage on its wireless networks that
lasted over twelve hours, disrupting calling and texting services nationwide, including 911 service, as
well as access to data service in some areas. The Public Safety and Homeland Security Bureau (PSHSB)
estimates, based on data provided by T-Mobile and other affected service providers, that at least 41% of all
calls that attempted to use T-Mobile’s network during the outage failed, including at least 23,621 failed
calls to 911. The outage was initially caused by an equipment failure and then exacerbated by a network
routing misconfiguration that occurred when T-Mobile introduced a new router into its network. In
addition, the outage was magnified by a software flaw in T-Mobile’s network that had been latent for
months and interfered with customers’ ability to initiate or receive voice calls during the outage.
2. The Bureau investigated this incident, its effects, and the response. As part of its
investigation, Bureau staff reviewed and analyzed outage reports, interviewed T-Mobile personnel,
submitted written questions to affected service providers, and issued a Public Notice seeking comment on
the effects of the outage on public safety and consumers, which yielded 190 responses.
1
This report
presents the Bureau’s findings and recommendations. This outage provides the Federal Communications
Commission (Commission) and stakeholders with the opportunity to learn valuable lessons about network
reliability and the implementation of industry-accepted best practices. For example, the outage
demonstrates the importance of network operators periodically auditing the diversity of their networks
and taking appropriate measures to ensure resilience as needed.
II. BACKGROUND
3. The Commission stays abreast of disruptions to the Nation’s communications infrastructure
through outage reports filed by communications providers in the Network Outage Reporting System
(NORS) in the wake of major disruptions to their networks.
2
As part of this reporting framework,
Commission rules require wireless service providers to report to the Commission “significant
degradation[s] in the ability of an end user to establish and maintain a channel of communications as a
result of failure or degradation in the performance of a communications provider’s network.”
3
An outage
that occurs on facilities that a wireless service provider owns, operates, leases, or otherwise uses is
reportable when it is at least 30 minutes in duration and, inter alia, potentially affects at least 900,000
user minutes of telephony and associated data service or potentially affects a 911 special facility (e.g., a
Public Safety Answering Point (PSAP)).
4
Wireless providers must submit this notification in NORS
1
Public Safety and Homeland Security Bureau Seeks Comment on the Effects of June 15, 2020 T-Mobile Outage on
Public Safety Entities, Government Entities, and Consumers, PS Docket No. 20-183, Public Notice, 35 FCC Rcd
6462 (PSHSB June 23, 2020).
2
NORS is the Commission’s web-based filing system through which communications providers covered by the part
4 outage reporting rules must submit reports to the Commission. These reports are presumed confidential to protect
sensitive and proprietary information about communications networks. See 47 CFR § 4.2. As noted below,
however, this report includes material that is not subject to such confidential treatment.
3
47 CFR § 4.5(a); New Part 4 of the Commission’s Rules Concerning Disruptions to Communications, ET Docket
No. 04-35, Report and Order and Further Notice of Proposed Rulemaking, 19 FCC Rcd 16830, 16895-902, paras.
127-143 (2004).
4
47 CFR § 4.9(e)(1)(ii), (v); see also 47 CFR § 4.9(e)(1)(i), (iii), (iv) (requiring a wireless service outage be
reported when it affects a Mobile Switching Center, when it affects at least 667 OC3 minutes, or when it potentially
affects any special offices and facilities). PSHSB’s meetings with T-Mobile to discuss their outage reports during
the pendency of a permit-but-disclose proceeding fall within exception 10 of the Commission’s ex parte rules. 47
CFR § 1.1204(a)(10). Accordingly, PSHSB did not ask T-Mobile to disclose these meetings in the docket for this
proceeding, PS Docket No. 20-183, because doing so would have interfered with the effective conduct of its
investigation.
within 120 minutes of discovering that such an outage has occurred.
5
Wireless providers must also file an
Initial Outage Report not later than 72 hours after discovering the outage, and a Final Communications
Outage Report not later than 30 days after discovering the outage.
6
The Commission requires wireless
providers to notify 911 special facilities as soon as possible when they discover outages that could affect
them.
7
Wireless providers must convey all available and potentially useful information to the 911 special
facility to help mitigate the effects of the outage on callers to that facility.
8
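The reporting triggers and filing deadlines summarized above can be sketched in a few lines of Python. This is an illustrative sketch only, not a statement of the rules: it models just the two triggers named in the paragraph (user minutes and 911 special facilities), it assumes that user minutes can be approximated as outage duration multiplied by the number of potentially affected users, and the function names and sample inputs are hypothetical.

```python
from datetime import datetime, timedelta

# Thresholds and deadlines as summarized in the paragraph above.
MIN_DURATION_MINUTES = 30
MIN_USER_MINUTES = 900_000
NOTIFICATION_DEADLINE = timedelta(minutes=120)
INITIAL_REPORT_DEADLINE = timedelta(hours=72)
FINAL_REPORT_DEADLINE = timedelta(days=30)


def is_reportable(duration_minutes, users_potentially_affected, affects_911_facility):
    """Apply the two reporting triggers described in the text.

    Assumption: user minutes are approximated as duration multiplied by the
    number of potentially affected users.
    """
    if duration_minutes < MIN_DURATION_MINUTES:
        return False
    user_minutes = duration_minutes * users_potentially_affected
    return user_minutes >= MIN_USER_MINUTES or affects_911_facility


def filing_deadlines(discovered_at):
    """Return the latest permissible filing times, measured from discovery."""
    return {
        "notification": discovered_at + NOTIFICATION_DEADLINE,
        "initial_report": discovered_at + INITIAL_REPORT_DEADLINE,
        "final_report": discovered_at + FINAL_REPORT_DEADLINE,
    }


# Hypothetical outage: 45 minutes, 25,000 potentially affected users.
print(is_reportable(45, 25_000, affects_911_facility=False))

# Hypothetical discovery time, chosen only to show the date arithmetic.
for name, due in filing_deadlines(datetime(2020, 1, 1, 9, 0)).items():
    print(name, due.isoformat())
```

The other triggers listed in the rules (for example, Mobile Switching Center outages and OC3 minutes) are deliberately left out of this sketch.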
III. INCIDENT AND RESPONSE
A. Architecture of T-Mobile’s Network
4. T-Mobile operates a nationwide wireless network that supports cellular calling over 4th
Generation Long Term Evolution (LTE), 3rd Generation (3G), and 2nd Generation (2G) cellular
technologies, as well as calling over Wi-Fi. The overwhelming majority of T-Mobile customers use
T-Mobile’s LTE network. All Voice over LTE (VoLTE)-capable handsets sold by T-Mobile
have the capacity to make calls over the 3G, 2G, or Voice over Wi-Fi networks when VoLTE calling is
not available. While each of these technologies uses different facilities to access the network, they also
work interdependently to ensure that phone calls can complete across different networks.
5. Each LTE tower in T-Mobile’s network relies on wired connections and several routers to
take the necessary steps to complete calls and provide Internet access. This network is illustrated in
Figure 1 below. When a mobile device initiates a VoLTE phone call, the call transits from the device to
the LTE tower, from the LTE tower through the Evolved Packet Core, and then to the IP Multimedia
Subsystem where the device is registered and the call is routed appropriately towards the destination. The
Evolved Packet Core allows the devices to access the Internet and provides a multitude of other critical
functions. When a mobile device uses LTE data, this data connection transits from the device to the LTE
tower, from the LTE tower to the LTE Evolved Packet Core, and then unlike a VoLTE call, to the
Internet.
5
47 CFR § 4.9(e)(1); 47 CFR § 4.11 (specifying additional information that these reports must contain).
6
47 CFR § 4.9(e)(4).
7
See New Part 4 of the Commission’s Rules Concerning Disruptions to Communications, ET Docket No. 04-35,
Report and Order and Further Notice of Proposed Rulemaking, 19 FCC Rcd 16830 (2004) (2004 Part 4 Report and
Order); 47 CFR § 4.9.
8
See New Part 4 of the Commission’s Rules Concerning Disruptions to Communications, ET Docket No. 04-35,
Report and Order and Further Notice of Proposed Rulemaking, 19 FCC Rcd 16830 (2004); 47 CFR § 4.9(e).
Figure 1 – Paths of Calls and Data Across T-Mobile’s Network
6. The vast majority of mobile devices connected to T-Mobile’s network are simultaneously
registered and connected to both LTE and T-Mobile’s 3G and 2G circuit-switched networks. This allows
for calls to be placed over circuit-switched networks in the event a VoLTE or Wi-Fi call fails or times out,
or if LTE and Wi-Fi coverage is poor in a particular area. To access the 3G and 2G circuit-switched
networks, as well as the 911 network, the IP Multimedia Subsystem uses a node to send calls to the
correct gateways that lead to other networks.
7. Routing. Routers connect T-Mobile’s LTE towers to T-Mobile’s LTE network. These
routers utilize a routing protocol called Open Shortest Path First. Under this protocol, each set of lines
that connects T-Mobile’s routers (also known as a “link”) is assigned a weight. The network decides
where to send LTE traffic by selecting the route with the lowest cumulative weight. Figure 2 illustrates
an example of how the optimal route is determined.
9
9
Figure 2 is for illustrative purposes only and does not reflect the actual weighting of T-Mobile links, T-Mobile’s
network topology, or the locations of T-Mobile routers.
[Figure 1 diagram labels: WiFi; 4G LTE; 3G and 2G; Internet; WiFi Gateway; 3G/2G Network; Circuit Switched Gateway; T-Mobile Operated Network; Public Switched Network; Other Service Providers; IP Multimedia Subsystem; LTE Evolved Packet Core; Voice Traffic]
Figure 2 – Example of the Open Shortest Path First Protocol
8. In Figure 2, each line that connects routers is assigned a number that represents its weight.
For traffic that originates from the Seattle router and is destined for the Miami router, the path with the lowest
cumulative weight transits through the Los Angeles router. However, if the link between Seattle and Los
Angeles loses connectivity, the network will adapt by finding a new, alternative route. In that instance,
the path through Denver and New York will now have the lowest cumulative weight.
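The route selection described in paragraphs 7 and 8 can be sketched as a lowest-cumulative-weight search. The example below uses Dijkstra's algorithm, which is the standard way to compute such routes, over a small table of links. The link weights are hypothetical and chosen only so that the outcome matches the narrative; like Figure 2, they do not reflect T-Mobile's actual weights or topology, and the function name is illustrative.

```python
import heapq


def lowest_weight_path(links, source, destination):
    """Return (total_weight, path) for the lowest cumulative weight route.

    `links` maps frozenset({a, b}) -> weight for each bidirectional link.
    Returns (None, []) if the destination is unreachable.
    """
    # Build an adjacency list from the undirected link table.
    neighbors = {}
    for pair, weight in links.items():
        a, b = tuple(pair)
        neighbors.setdefault(a, []).append((b, weight))
        neighbors.setdefault(b, []).append((a, weight))

    # Dijkstra's algorithm: always expand the cheapest known route first.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, router, path = heapq.heappop(queue)
        if router == destination:
            return cost, path
        if router in visited:
            continue
        visited.add(router)
        for nxt, weight in neighbors.get(router, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None, []


# Hypothetical weights; NOT the values in Figure 2 or in T-Mobile's network.
links = {
    frozenset({"Seattle", "Los Angeles"}): 10,
    frozenset({"Los Angeles", "Miami"}): 20,
    frozenset({"Seattle", "Denver"}): 15,
    frozenset({"Denver", "New York"}): 15,
    frozenset({"New York", "Miami"}): 10,
}

normal = lowest_weight_path(links, "Seattle", "Miami")

# Simulate loss of the Seattle-Los Angeles link by removing it from the table.
failed_link = frozenset({"Seattle", "Los Angeles"})
degraded_links = {k: v for k, v in links.items() if k != failed_link}
rerouted = lowest_weight_path(degraded_links, "Seattle", "Miami")

print(normal)
print(rerouted)
```

With these assumed weights, the normal route runs through Los Angeles at a cumulative weight of 30, and after the link failure the route through Denver and New York becomes the lowest at 40.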
9. Mobile Device Registration. To make a VoLTE or Wi-Fi call, mobile devices must register
with T-Mobile’s Internet Protocol (IP) Multimedia Subsystem prior to sending and receiving any phone
calls or most text messages on T-Mobile’s network. The IP Multimedia Subsystem’s registration system
utilizes three nodes that manage the secure connection to the mobile device, route the call, and initiate the
connection with the called party. The mobile device registration system periodically refreshes user
registrations to ensure that information that it needs to connect calls and text messages stays updated. To
promote efficiency, T-Mobile programmed the system to retain, for each user, a record of which node was
last used to initiate a connection with a called party. If that specific node is congested, the system is
designed to try using a different node to complete the registration. T-Mobile states that its IP Multimedia
Subsystem contains many instances of this three-node registration system to provide both regional and
national redundancy. Conversely, LTE data does not require registration with the IP Multimedia
Subsystem.
10. Mobile devices also do not have to go through this same registration authentication process to
place 911 calls. FCC rules require unregistered mobile devices to be able to complete 911 calls to Public
Safety Answering Points (PSAPs).
10
Accordingly, in T-Mobile’s network, the IP Multimedia Subsystem
reroutes 911 calls made over VoLTE to a different node on T-Mobile’s network that is responsible for
processing 911 calls. As a result, in the event of an outage affecting T-Mobile’s IP Multimedia
Subsystem, 911 calls made over VoLTE and Voice over Wi-Fi would complete at a higher rate than other
VoLTE and Voice over Wi-Fi calls.
B. Root Cause and Event Summary
11. On June 15, 2020, T-Mobile was midway through the process of installing two new routers in the
southeast region of its network. Once the first of the two routers was installed, active, and handling
customer traffic, T-Mobile planned to slowly integrate the second new router into its network. T-Mobile
had planned to configure the second new router so that it was passive, connected to the network but only
10
See 47 CFR § 9.4 (requiring commercial mobile radio service (CMRS) providers subject to the 911 rules to
transmit all wireless 911 calls “without respect to their call validation process”); see also
911 Call-Forwarding
Requirements for Non-Service-Initialized Phones, Notice of Proposed Rulemaking, 30 FCC Rcd 3449, 3450, para. 1
(2015) (stating that the rule requires providers to transmit both 911 calls originating from customers that have
contracts with CMRS providers and calls originating from “non-service-initialized” devices to PSAPs).
receiving network traffic if another router or link between routers failed. To do this, T-Mobile
deliberately configured the links connected to the new passive router to have high Open Shortest Path
First weights. In the course of that configuration, T-Mobile misconfigured the weights of the links of
another router that was already active in the network segment but was not designed to process call
signaling traffic. T-Mobile did not have a fail-safe process in place to prevent or provide notice of this
misconfiguration. In the event of a router or link failure, the low Open Shortest Path First weights to this
router would cause it to receive a large percentage of call signaling traffic, which it could not pass.
12. Link Failure Exacerbated by Routing Misconfiguration. At 12:33 PM EDT, a fiber
transport link in the southeast region of T-Mobile’s VoLTE network failed.
11
Although T-Mobile
generally designed its network to mitigate this kind of failure by transferring traffic across a different link,
T-Mobile had misconfigured the weight of the links to one of its routers as illustrated in Figure 3. This
prevented the traffic from flowing to the new active router as intended.
12
Instead, traffic flowed to a
router that was not prepared to receive that traffic and not properly configured to pass a large percentage
of the call signaling traffic it received.
13
Because the router could not pass the traffic, the Atlanta market
became isolated, causing all LTE users in the market to lose connectivity to LTE data, VoLTE, and the
3G and 2G circuit-switched networks, which disrupted voice, text, and data services in the Southeast. After
twelve minutes, at 12:45 PM EDT, the fiber transport link was restored without intervention, ending the
isolation of the Atlanta market. However, registration system congestion caused by the link failure would
continue to affect T-Mobile’s networks.
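The kind of fail-safe check whose absence is noted in paragraph 11 can be illustrated with a simple audit: remove each link in turn, recompute the lowest-weight route, and flag any failure that would steer traffic through a router that cannot pass call signaling. The sketch below is hypothetical throughout. The router names, weights, and topology are invented for illustration and are not T-Mobile's; "config_a" and "config_b" only loosely mirror the misconfigured and corrected weightings described in the footnote to this paragraph.

```python
import heapq


def best_path(links, src, dst):
    """Lowest cumulative weight path via Dijkstra; [] if unreachable."""
    graph = {}
    for (a, b), w in links.items():
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    queue, seen = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return []


def audit_single_link_failures(links, src, dst, signaling_capable):
    """Return links whose failure would steer traffic through an unfit router.

    Note: a failure that leaves no route at all is not flagged by this sketch.
    """
    risky = []
    for failed in links:
        remaining = {k: w for k, w in links.items() if k != failed}
        path = best_path(remaining, src, dst)
        if any(router not in signaling_capable for router in path):
            risky.append(failed)
    return risky


# Hypothetical routers: "legacy" cannot pass call signaling traffic.
capable = {"market", "active", "passive", "core"}

# Misconfigured: links to "legacy" are cheaper than links to "passive".
config_a = {
    ("market", "active"): 10, ("active", "core"): 10,
    ("market", "legacy"): 20, ("legacy", "core"): 10,
    ("market", "passive"): 100, ("passive", "core"): 100,
}

# Corrected: links to "legacy" are weighted above the passive backup path.
config_b = dict(config_a)
config_b[("market", "legacy")] = 500
config_b[("legacy", "core")] = 500

risky_a = audit_single_link_failures(config_a, "market", "core", capable)
risky_b = audit_single_link_failures(config_b, "market", "core", capable)
print(risky_a)
print(risky_b)
```

Under the misconfigured weights, the audit flags both links on the primary path, because losing either one sends traffic to the router that cannot pass it. Under the corrected weights nothing is flagged, because the passive router becomes the backup route.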
13. Software Error. When the fiber transport link failed and the Atlanta market became isolated,
mobile device registration attempts in that market timed out. Mobile devices in the Atlanta area then tried
to re-register with the IP Multimedia Subsystem using Wi-Fi. While the network was designed to connect
to a different node to complete the registration, a software error triggered by the market isolation
prevented that connection from being completed.
14
Instead, the registration system repeatedly routed re-
registration attempts for each mobile device to the last node retained in its records, which was unavailable
due to the market isolation. Accordingly, mobile devices repeatedly attempted and failed to register using
Wi-Fi, creating a “registration storm” that congested the IP Multimedia Subsystem. After the failed
optical link was restored at 12:45 PM EDT, the mobile devices that had attempted and failed to register
over Wi-Fi now attempted to re-register over VoLTE. Because the IP Multimedia Subsystem was still
congested by the registration storm, the VoLTE re-registration attempts failed, and their network activity
further exacerbated the congestion. T-Mobile customers that could not re-register were unable to make
VoLTE and Wi-Fi calls, but could use LTE data.
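The difference between the intended node selection and the behavior produced by the software error can be shown with a toy simulation. This is a simplified sketch under stated assumptions, not T-Mobile's registration logic: the node names, device count, and retry limit are hypothetical, and each device is assumed to have the isolated node on record as its last-used node.

```python
def choose_node(nodes, last_node, available, flawed):
    """Pick a registration node for one attempt.

    nodes: ordered list of node names; last_node: node recorded for the user;
    available: set of nodes that can currently accept registrations.
    """
    if flawed or last_node in available:
        # The flawed path never looks past the recorded node.
        return last_node
    # Intended behavior: fall back to another node that is available.
    for node in nodes:
        if node != last_node and node in available:
            return node
    return last_node


def simulate(devices, max_retries, flawed):
    """Count registration attempts and successes for a set of devices."""
    nodes = ["node-a", "node-b", "node-c"]   # hypothetical node names
    available = {"node-b", "node-c"}          # node-a is isolated
    attempts = registered = 0
    for _device in range(devices):
        for _attempt in range(max_retries):
            attempts += 1
            if choose_node(nodes, "node-a", available, flawed) in available:
                registered += 1
                break
    return attempts, registered


intended = simulate(devices=1000, max_retries=5, flawed=False)
storm = simulate(devices=1000, max_retries=5, flawed=True)
print(intended)
print(storm)
```

In this toy model the intended logic registers every device on its first attempt, while the flawed logic spends every permitted retry on the unavailable node and registers no one. Lowering `max_retries` shrinks the volume of failed attempts without fixing the underlying selection error.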
14. Troubleshooting Misdiagnosis Exacerbates Outage. While T-Mobile engineers attempted
to recover from this outage and restore service, they ended up exacerbating its impact because they
11
A full timeline of this outage and T-Mobile’s attempts to mitigate it is included as Appendix A.
12
Configuration A represents how T-Mobile misconfigured the Open Shortest Path First weights of the links in the
network segment that precipitated this outage. As a result, when the fiber transport link failed and traffic was
redirected to the lowest cumulative weight route, it arrived at a router that was not prepared to receive call signaling
traffic, resulting in the traffic being dropped. Configuration B represents how T-Mobile could have configured the
Open Shortest Path First weights to properly transfer traffic in the event of a link failure. If T-Mobile had set the
weights of the Open Shortest Path First links to the routers that could not process call signaling traffic higher, as
Configuration B shows, then traffic would not have flowed to routers that could not process call signaling traffic and
dropped. Instead, the path with the lowest cumulative weights would have passed only through routers that were
prepared to handle call signaling traffic and to the passive router, as intended.
13
The traffic dropped was Multi-Protocol Label Switching (MPLS), which is used for traffic engineering and
optimizing the resources of a network.
14
This software error likely did not cause problems before this outage occurred because the outage was the first
notable market isolation since T-Mobile integrated this software into its network.
misdiagnosed the problem. T-Mobile believed that the fiber transport link that failed earlier in the day
was continuing to cause the ongoing outage. Acting on this belief, T-Mobile manually shut down the link
in an attempt to transfer traffic away from it. Due to the still-misconfigured Open Shortest Path First
weights, however, these steps recreated the outage’s initial conditions. LTE customers in the Atlanta
market were again disconnected from the LTE network and forced to establish calls over Wi-Fi, and their
registration attempts again failed and created a registration storm that added further congestion to T-
Mobile’s IP Multimedia Subsystem.
15
15. T-Mobile engineers almost immediately recognized that they had misdiagnosed the problem.
However, they were unable to resolve the issue by restoring the link because the network management
tools required to do so remotely relied on the same paths they had just disabled. When T-Mobile
engineers were able to access the equipment on site and correct their mistake by restoring the link an hour
later, customers in the Atlanta market were again able to attempt to register to VoLTE. However, this
again created additional congestion because T-Mobile engineers had not yet addressed the software error
that prevented registrations from completing.
16. Nationwide Spread. This wave of Voice over Wi-Fi and VoLTE registration attempts
resulted in the outage spreading out of the Atlanta market and across the country. When the IP
Multimedia Subsystem’s registration system for the Atlanta market was unavailable, external incoming
traffic destined for that system was redirected to the IP Multimedia Subsystem registration systems for
other regions. This, in turn, created enough congestion in those registration systems to cause the T-
Mobile network to send the registration attempts to other nodes. The software error again routed re-
registration attempts to the last node on record, which was likely already experiencing severe congestion.
Around 3:00 PM EDT, IP Multimedia Subsystem, VoLTE, and Voice over Wi-Fi registrations began to
fail nationwide as all IP Multimedia Subsystem registration nodes became increasingly congested.
17. Spread to 3G and 2G Networks. The vast majority of T-Mobile customer mobile devices
that were unable to connect to the VoLTE or Voice over Wi-Fi networks after 10 seconds fell back to T-
Mobile’s 3G and 2G circuit-switched networks to make and receive calls while the device continued its
registration attempts to the VoLTE network. The large number of devices attempting to fall back to the
3G and 2G networks created intermittent congestion in those networks, too. When 3G and 2G calls began
to fail due to that congestion, the network nodes that choose gateways for IP Multimedia Subsystem calls
destined for those networks would hold the resources for these call sessions after the call terminated.
These abandoned call sessions’ resource reservations overwhelmed these nodes’ computing resources,
which caused many 3G calls and 2G calls to fail.
18. Spread to 911 Networks. While mobile devices do not need to have their registration
authenticated in order to complete 911 calls, 911 calls were nonetheless affected by the 3G and 2G
network congestion because the same network nodes that choose gateways for calls destined for 3G and
2G networks also choose gateways for 911 calls. When those nodes’ computing resources became
overwhelmed by abandoned call sessions’ resource reservations, it also caused many 911 calls to fail.
C. Mitigation and Restoration Efforts
19. T-Mobile noticed service disruptions in its LTE network starting with the initial link failure at
12:33 PM EDT on June 15, 2020 when it confirmed that a fiber transport link to one of its routers failed.
From 12:45 PM EDT until 3:22 PM EDT, T-Mobile believed the service disruptions were caused by
either the new router it connected to the network or the link that had failed at the start of the outage,
which led T-Mobile engineers to manually shut down the external link to the new router. T-Mobile filed
15
The outage remained contained to the Atlanta market at this point because T-Mobile’s network is designed so that
registration traffic in one region cannot overflow into other regions.
a notification in NORS at 3:06 PM EDT.
16
20. From that point onward, T-Mobile’s restoration efforts were primarily focused on quelling
the registration congestion. Although T-Mobile had not yet diagnosed or fixed the software error,
T-Mobile reduced the number of registration retries allowed by the IP Multimedia Subsystem registration
nodes. T-Mobile also attempted to mitigate some of the congestion by requesting that its wholesale transport
provider, Inteliquent, Inc. (Inteliquent), block inbound local and long-distance traffic.
17
Further, T-Mobile
increased the capacity of the registration system by activating additional IP Multimedia Subsystem
registration nodes. T-Mobile also turned off those nodes’ overload controls, which
had transferred excessive signaling to other, regional nodes, spreading the outage beyond the Atlanta
market. Finally, T-Mobile restarted, removed, and replaced some of the nodes that choose gateways for
IP Multimedia Subsystem calls destined for 3G, 2G, and 911 networks to clear congestion from them.
Together, these changes reduced network congestion and restored the network to a normal working state
at 12:46 AM EDT on June 16.
21. Public Communication. T-Mobile also attempted to mitigate the effect of the outage by
communicating with its subscribers through a variety of channels, including direct communications with
subscribers (both individual and enterprise) as well as public statements and responses to media inquiries.
Neville Ray, T-Mobile President of Technology, first confirmed the existence of the outage for T-
Mobile’s subscribers and the public on Twitter at 4:18 PM EDT by describing it as a “voice and data
issue that has been affecting customers around the country,” and he would later tweet to encourage people
to use over-the-top voice apps like WhatsApp, iMessage, Signal, and FaceTime.
18
T-Mobile tweeted in
Spanish to confirm the existence of the outage at 6:25 PM EDT.
19
At 7:00 PM EDT, T-Mobile posted an
outage statement as a splash screen across all of its digital properties. T-Mobile also used Twitter to
announce that the outage had been resolved.
22. Over-the-top Voice Applications. T-Mobile states that the availability of over-the-top voice
applications, such as WhatsApp calling, FaceTime, and Facebook Messenger, may have mitigated the
customer impact of the outage. Consistent with T-Mobile’s recommendation via Twitter, some T-Mobile
subscribers completed calls during the outage in this way. T-Mobile states that over-the-top voice
applications were functional and used by customers throughout this outage with the exception of short
periods in the Atlanta market when LTE data was unavailable. The LTE data on which over-the-top
voice applications rely continued to be operational during this outage because LTE data does not require
registration with the IP Multimedia Subsystem. Although T-Mobile states that it cannot calculate the
exact number of over-the-top calls made during the outage, its data suggests that customers used
over-the-top applications to make voice calls during the outage.
20
The increase in over-the-top traffic was
likely driven by users’ inability to complete calls over T-Mobile’s network.
16
T-Mobile filed its initial report on June 18, 2020 at 2:57 PM EDT and its final report on July 15, 2020. PSHSB
also reviewed NORS reports filed in connection with the outage from AT&T, US Cellular, and Verizon.
17
In instances in which T-Mobile does not directly interconnect with another telecommunications service provider
to deliver voice calls, T-Mobile generally has arranged to exchange voice calls indirectly via Inteliquent’s network
(e.g., by designating Inteliquent as T-Mobile’s default tandem).
18
Neville Ray, Twitter, https://twitter.com/NevilleRay/status/1272624569707184128 (last visited Aug. 17, 2020);
Neville Ray, Twitter, https://twitter.com/NevilleRay/status/1272650750665953280 (last visited Aug. 18, 2020).
19
T-Mobile Latino, Twitter, https://twitter.com/TMobileLatino/status/1272656463148781568 (last visited Aug. 17,
2020).
20
Not all over-the-top calls carried by T-Mobile’s network were necessarily successful. Some of these calls may
have failed because of the data outage in the Atlanta market and because while some over-the-top calling apps are
data-only (e.g., WhatsApp), other over-the-top calling apps allow calls to use the public-switched telephone network
(e.g., Skype-out), which could have been affected by this outage.
23. PSAP Notification. T-Mobile also attempted to mitigate the effect of the outage by notifying
PSAPs. T-Mobile avers that it began notifying all the PSAPs with which it is connected nationwide via
email and/or phone call immediately after determining that the outage was reportable.
21
Specifically, T-
Mobile states that it determined the outage was reportable at 2:35 PM EDT and that it began notifying
PSAPs at 2:41 PM EDT. T-Mobile’s notification to PSAPs stated that “911 calls are still completing”
and warned PSAPs only that the delivery of location information may be affected. Immediately after
sending this email message, T-Mobile placed automated phone calls to those PSAPs around the country
that had informed T-Mobile that they require telephonic confirmation of outage notifications.
22
Although
this notification understated the impact of the outage on 911 calling, T-Mobile states that the information
it provided to PSAPs was consistent with its understanding of the outage’s impact at that time. T-Mobile
did not follow up with PSAPs to update their understanding of how the outage may be affecting them
until it began sending emails and/or phone calls to the same PSAPs to inform them that the outage was
resolved.
23
24. PSHSB elicited input from public safety entities on this outage’s impact and the notification
that they received from T-Mobile. PSHSB did not receive complaints from PSAPs that T-Mobile did not
notify them, and PSAPs did not submit comments on the record raising concerns that T-Mobile’s PSAP
notifications were inaccurate. Jefferson County, Colorado Emergency Communications Authority
(Jefferson County, Colorado) commented that it did not receive an official notification from T-Mobile
about the outage until 3:00 PM EDT and that, because of that lapse in time, it “was forced to resort to
piecing available information together to discover the scope of the outage,” including later trying to
determine which service providers were impacted by the outage.
24
25. Public Notification by PSAPs. Public safety officials in Seminole County, Florida and
Jefferson County, Colorado sent emergency alerts using the Emergency Alert System and Wireless
Emergency Alerts to mitigate the outage’s impact on the public’s access to emergency services. Both
entities’ alerts informed the public that 911 service was down for “some carriers,” and instructed the
public that, if they needed emergency assistance, they should call a PSAP’s alternative 10-digit number,
which they included in the body of the message.
25
Jefferson County, Colorado states that, when it sent
out its alert at 7:02 PM EDT, it received “122 emergency and administrative calls in the next [five] . . .
minutes,”
26
suggesting that there was significant pent-up demand to reach the PSAP because callers were
not able to reach it by dialing 911. Jefferson County, Colorado states that it received over 1,800 calls to
21
T-Mobile states that it updates PSAP contact information promptly upon receiving updates from PSAPs, and
proactively requests the PSAPs to update their contact information twice a year.
22
T-Mobile’s Initial PSAP Notification stated that “T-Mobile is notifying you that it is working to resolve a network
degradation that may impact the delivery of location information to your PSAP(s). T-Mobile understands that voice
calls to 911 are still completing. The FCC will be notified when T-Mobile files the appropriate Outage Notification.
T-Mobile’s Network Operations Center is available for any inquiries 24 hours a day, 7 days a week at [phone
number omitted], Option #7. Please reference Trouble Ticket [number omitted] when contacting T-Mobile.”
23
T-Mobile’s Follow-Up PSAP Notification stated that “The network degradation T-Mobile recently informed you
of (reference Trouble Ticket [number omitted]) is resolved. T-Mobile’s Network Operations Center is available for
any inquiries 24 hours a day, 7 days a week at [phone number omitted], Option #7. Please reference Trouble Ticket
[number omitted] when contacting T-Mobile.”
24
Jefferson County Communications Center Authority July 2, 2020, Comments at 2.
25
The public notification that these PSAPs provided is a standard response to an observed degradation of 911. T-
Mobile’s PSAP notification did not include sufficient information to inform PSAPs about how the public could
work around this outage.
26
Id. at 1.
its administrative line on June 15, 2020, more than on any other single day ever.
27
Some public safety
officials also notified the public about T-Mobile’s outage via social media. For example, Allegheny
County, Pennsylvania tweeted that some cell phone callers may not be able to call 911;
28
the Harris
County, Texas Sheriff’s Office tweeted that T-Mobile’s outage was affecting 911 service;
29
and the South
Salt Lake, Utah Police Department posted on Facebook that texting 911 and its administrative line may
work for T-Mobile customers while 911 service was down.
30
IV. ANALYSIS
A. Impact on Calls over T-Mobile’s Network
26. T-Mobile states that its network experienced an 18% reduction in completed calls during the
over-12-hour period of the outage when compared to the same period during the previous Monday. T-
Mobile states that some T-Mobile customers nationwide may have experienced intermittent issues placing
and receiving calls over VoLTE, depositing and retrieving voicemails over LTE, and sending and
receiving text messages. T-Mobile also states that, with the exception of subscribers in the Atlanta
market during the initial transport link failure and market isolation, all T-Mobile customers were able to
use LTE data services throughout the event. T-Mobile states that this outage was unrelated to any work
to integrate the T-Mobile and Sprint networks, and legacy Sprint customers were unaffected.
27. Critically, an 18% reduction in call success does not mean that only 18% of calls on T-
Mobile’s network failed during the outage. T-Mobile acknowledges that network congestion likely
required many of its subscribers to make 2-3 call attempts before successfully connecting. The record in
this proceeding demonstrates that consumers were frustrated that their calls failed, rather than satisfied
because their calls eventually succeeded after many retries.
31
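The distinction drawn in paragraph 27 can be illustrated with a short calculation. The sketch below is hypothetical and is not drawn from T-Mobile's data: it assumes that every call eventually completes after a fixed average number of attempts, and the function name is illustrative only. It shows that when subscribers need 2-3 attempts per completed call, half to two-thirds of all attempts fail, even though a count of completed calls would register no loss from those retried calls.

```python
def attempt_failure_share(avg_attempts_per_completed_call: float) -> float:
    """Share of call attempts that failed, assuming (hypothetically) that
    every call eventually completes after the given average number of attempts."""
    if avg_attempts_per_completed_call < 1:
        raise ValueError("a completed call needs at least one attempt")
    failed_attempts = avg_attempts_per_completed_call - 1
    return failed_attempts / avg_attempts_per_completed_call


# The report cites 2-3 attempts per successful connection; 1 is the no-retry baseline.
for attempts in (1, 2, 3):
    share = attempt_failure_share(attempts)
    print(f"{attempts} attempt(s) per completed call: {share:.0%} of attempts failed")
```

Under these assumptions, a measure based only on completed calls understates the failures that callers actually experienced.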
28. Accordingly, PSHSB asked T-Mobile to disclose the number of failed calls during the
outage, rather than the reduction in completed calls. T-Mobile states that it is limited in its ability to
measure call failures. T-Mobile states that it cannot provide estimates of the total number of call
attempts, nor the total number of failed calls during the outage because its network does not record call
attempts that failed to successfully register with the IP Multimedia Subsystem or that failed during the
earliest stages of the multi-stage call completion process. T-Mobile also states that it cannot accurately
estimate the number of calls originating from other service providers (i.e., calls sent to, rather than by, T-
Mobile subscribers) that failed while attempting to reach T-Mobile’s network. T-Mobile was able to
estimate the number of call attempts that failed as a result of the congestion on its 3G and 2G networks
because those calls failed after registration was complete.
32
In short, the outage measurements that T-
Mobile provided to PSHSB in the course of its investigation likely do not fully capture the call failures
caused by T-Mobile’s outage nor accurately reflect the consumer experience of the outage.
27
Id.
28
Allegheny County, Twitter, https://twitter.com/Allegheny_Co/status/1272636542276849664 (last visited Aug. 21,
2020).
29
Harris County Sheriff’s Office, Twitter, https://twitter.com/HCSOTexas/status/1272625869492715521 (last
visited Aug. 21, 2020).
30
South Salt Lake Police Department, Facebook, https://www.facebook.com/SSLPD/posts/1215015118846786 (last
visited Aug. 21, 2020).
31
See, e.g., Denae Jones June 24, 2020, Comments at 1; Mackenzie Rouse and Jake Rouse June 26, 2020,
Comments at 1; Franco Eulogio Mau June 24, 2020, Comments at 1; Menachem R. June 24, 2020, Comments at 1;
Scott Sprague June 23, 2020, Comments at 1.
32
T-Mobile states that its customers were generally able to place calls and send messages on the T-Mobile network
using 2G, 3G, and over-the-top voice and text applications, with “some limitations.”
B. Impact on Calls Initiated Outside T-Mobile’s Network
29. Although T-Mobile states that it cannot accurately estimate the number of calls initiated
outside T-Mobile’s network that failed due to the outage, T-Mobile did estimate that the outage resulted
in an incremental loss of 1.5% of calls originating on other carrier networks that T-Mobile states it would
have attempted to complete on a normal day. PSHSB’s investigation of this outage, however, gives it
access to data that was not available to T-Mobile when T-Mobile estimated its outage’s impact. PSHSB
finds that T-Mobile’s estimate of failed calls originating from other service providers’ networks is
significantly lower than, and conflicts with, some of those providers’ estimates. PSHSB estimates, based
on confidential and non-confidential data that other service providers shared with PSHSB, that over 250
million calls (or 73% of calls shared with PSHSB) from other service providers’ subscribers to T-Mobile
subscribers failed due to the outage.
30. Based on confidential call success and 3G and 2G call failure data shared by T-Mobile,
together with data on 911 calls and calls originating outside of T-Mobile’s network, PSHSB estimates
that at least 41% of all calls that attempted to use T-Mobile’s network during the outage did not complete
successfully. This estimate does not include any possible call failures arising from T-Mobile subscribers’
VoLTE or Voice over Wi-Fi call attempts, which could not be determined. However, PSHSB expects
that if this number could be determined, it would result in PSHSB’s estimate being much larger.
31. CenturyLink. CenturyLink found that its July Monday Average for failed calls was 0.195%
of the number of failed calls it experienced on June 15; and that its July Monday Average for dropped
calls was 0.053% of the number of dropped calls it experienced on June 15. In other words, as a result of
this outage, CenturyLink experienced more than 500 times as many failed calls and more than 1,800 times
as many dropped calls on June 15 as on an average July Monday.
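The conversion from CenturyLink's percentages to multipliers is a simple reciprocal. The following sketch uses only the two percentages reported in paragraph 31; the function name is illustrative.

```python
def multiplier_from_percent(baseline_as_percent_of_outage_day: float) -> float:
    """If a normal day's count is X% of the outage-day count, the outage-day
    count is (100 / X) times the normal day's count."""
    return 100.0 / baseline_as_percent_of_outage_day


failed_multiplier = multiplier_from_percent(0.195)   # failed calls
dropped_multiplier = multiplier_from_percent(0.053)  # dropped calls

print(f"Failed calls: about {failed_multiplier:,.0f} times the July Monday average")
print(f"Dropped calls: about {dropped_multiplier:,.0f} times the July Monday average")
```

The results, roughly 513 and 1,887, are consistent with the "more than 500" and "more than 1,800" figures stated above.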
32. US Cellular. US Cellular reports that 285,497 calls from its network successfully completed
into T-Mobile’s network during the outage, as compared to 951,271 during the same period on the
preceding Monday, a 70% reduction in call success. US Cellular states that during two periods, from
3:10 to 5:30 PM EDT and from 8:20 to 9:20 PM EDT, most calls (99%) were blocked, as compared to the
average Monday, when 1.9% of calls are blocked.
33
Further, US Cellular reports that it received 308,766
calls from T-Mobile’s network during the outage as compared to 1,032,543 during the same period on the
preceding Monday, a 70% reduction in call success.
33. AT&T. AT&T reports that 30,410,776 AT&T Mobility calls and 93,459 wireline calls from
AT&T networks were blocked from delivery to T-Mobile’s network during the outage, as compared to
213,704 combined AT&T Mobility and wireline calls blocked from delivery to T-Mobile’s network on an
average Monday. AT&T estimates that it experienced over 99.9% call blocking from AT&T’s network to
T-Mobile’s network from 2:00 PM EDT to 6:00 PM EDT, and over 90% call blocking from AT&T’s
network to T-Mobile’s network from approximately 6:00 PM EDT to 11:00 PM EDT. AT&T further
reports that 8.2 million calls from AT&T Mobility’s wireless network successfully completed into T-
Mobile’s network during the outage, as compared to 30.5 million during the same period of the preceding
Monday, a 73% reduction in call success. Further, AT&T estimates that 940,000 calls were successfully
completed from T-Mobile to AT&T Mobility during the outage period as compared to 19.6 million
during the same period on the preceding Monday, a 95% reduction in call success.
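The percentage reductions reported by US Cellular and AT&T in paragraphs 32 and 33 can be reproduced from the call counts given. The sketch below uses only figures stated in this report (the AT&T counts are the rounded values given in millions); the function and label names are illustrative.

```python
def percent_reduction(baseline: float, during_outage: float) -> float:
    """Percentage drop in completed calls relative to the preceding Monday."""
    return (baseline - during_outage) / baseline * 100.0


# (completed calls on the preceding Monday, completed calls during the outage)
figures = {
    "US Cellular to T-Mobile": (951_271, 285_497),
    "T-Mobile to US Cellular": (1_032_543, 308_766),
    "AT&T Mobility to T-Mobile": (30_500_000, 8_200_000),
    "T-Mobile to AT&T Mobility": (19_600_000, 940_000),
}

for label, (baseline, outage) in figures.items():
    print(f"{label}: {percent_reduction(baseline, outage):.0f}% reduction")
```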
34. Verizon. Verizon reports that approximately 11,800,000 Verizon Wireless calls and 373,460
wireline calls were blocked from delivery to T-Mobile’s network during the outage, as compared to fewer
than 10 per hour on an average Monday. Verizon states that, to the best of its knowledge, all wireline
calls to the T-Mobile network were blocked for the entirety of the event. With respect to wireless calls,
Verizon states that, at 12:33 PM EDT, the failure rate to T-Mobile was 18%; at 2:48 PM EDT, it
33
US Cellular indicates that a higher percentage of calls did go through between 5:30 PM EDT and 8:20 PM EDT.
oscillated between 70% and 95%; at 5:25 PM EDT, it fell to around 50%; and at 10:31 PM EDT, it fell to
8%. By 11:45 PM EDT, the call failure rate fell to Verizon’s baseline call failure rates for a typical
Monday.
C. Impact on 911 Calls Handled by T-Mobile
35. According to T-Mobile, the outage prevented 23,621 of the 134,874 calls to 911 that reached
T-Mobile’s network (17.5%) from reaching PSAPs. 911 calls made over T-Mobile’s VoLTE and Voice
over Wi-Fi networks completed at a higher rate than other calls made during the outage because T-
Mobile’s 911 infrastructure did not require 911 callers to register using the congested registration nodes.
The 911 calls that failed on T-Mobile’s network did so for several reasons:
7,469 calls to 911 failed due to congestion after reaching the part of T-Mobile’s IP
Multimedia Subsystem that sends calls to the 911 network.
16,152 calls to 911 failed due to congestion after reaching T-Mobile’s 3G and 2G circuit-
switched networks.
36. Within this account of failed 911 calls due to congestion, 2,501 calls to 911 failed because of
the congestion on PSAP administrative lines, which would have resulted in 911 callers receiving a busy
signal,
34
and 1,128 calls to 911 failed because of congestion at a national emergency call center used by
T-Mobile to deliver calls that cannot otherwise be routed. T-Mobile also counts within this total 572 calls
to California PSAPs that failed because they were not default routed, which T-Mobile asserts to be
consistent with state requirements.
35
T-Mobile delivered location information for 134,524 emergency
calls during the outage. Of these, it provided Phase II location information for 111,454 calls, suggesting
that only Phase I location was delivered for 23,070 calls.
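The 911 figures in paragraphs 35 and 36 are internally consistent, as the following arithmetic check shows. It uses only counts stated in this report; the variable names are illustrative.

```python
# Counts reported by T-Mobile for June 15, 2020.
calls_reaching_network = 134_874
failed_in_ims = 7_469               # failed after reaching the IMS 911 path
failed_in_circuit_switched = 16_152  # failed on the 3G and 2G networks

total_failed = failed_in_ims + failed_in_circuit_switched
failure_rate = total_failed / calls_reaching_network

calls_with_location = 134_524
calls_with_phase_ii = 111_454
calls_with_phase_i_only = calls_with_location - calls_with_phase_ii

print(f"Failed 911 calls: {total_failed:,} ({failure_rate:.1%})")
print(f"Calls with Phase I location only: {calls_with_phase_i_only:,}")
```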
37. As with its analysis of failed calls during T-Mobile’s outage,
36
PSHSB expects that the
average customer may have needed to make multiple call attempts to reach 911. T-Mobile states that it
saw a 30% increase in 911 call attempts on June 15 as compared to the average number of 911 calls
attempted during the previous two weeks. Some of this increase may have been due to “911 Hang-
up/Checks,” where the initial caller called 911 just to confirm whether they would be able to reach 911
call-takers in the event of an emergency, but then hung up, prompting the PSAP to call back.
37
Jefferson
County, Colorado states that this outage generated a higher-than-normal number of “911 Hang-
up/Checks,” which may have delayed needed emergency services because of the need for the PSAP to
call back.
38
D. Public Impact
38. Comments confirm PSHSB’s findings that the public lost data service in some areas during this
outage, and lost calling and texting services nationwide. Due to the congestion in the IP Multimedia
34
T-Mobile states that these calls failed due to either a lack of availability of 911-call takers or the fact that the
administrative line may have been congested by users testing their ability to reach 911.
35
Cal. Gov't Code § 8592.8-.9 (2017) (stating that a 911 call may be routed to the California Highway Patrol call
center or local PSAP after an annual review assesses the appropriate call routing to “maximize the efficiency of the
911 system” based on where the 911 call originates, whether routing is “economically and technologically feasible,”
and whether routing “benefit[s] public safety”).
36
PSHSB expects that T-Mobile’s records as to the number of attempted 911 calls are more likely to be complete
than T-Mobile’s overall calling records because the 911 calls that did fail during the outage failed because of
congestion, not because of an inability to register.
37
Jefferson County Communications Center Authority July 2, 2020, Comments at 1.
38
Id.
Subsystem, T-Mobile customers would have experienced random call success as the network would have
blocked normal public calls indiscriminately. This would mean that, while some callers might have been
able to connect a call on the first or second try, many other callers would have needed many more call
attempts to complete a call. The Bureau highlights the following themes extracted from these public
comments to illustrate this outage’s impact.
39. Based on the record, the June 15 outage on T-Mobile’s networks prevented some consumers
from summoning the help that they needed during emergencies. Not only were some consumers unable
to reach PSAPs by dialing 911, but they also were unable to reach roadside-service providers, medical
professionals, and family. One commenter noted that his mother, who has dementia, could not reach him
after her car would not start and her roadside-assistance provider could not call her to clarify her location;
she was stranded for seven hours but eventually contacted her son via a friend’s WhatsApp.
39
One
medical professional said that this outage prevented him from connecting to his patients for telephone and
video appointments, exacerbating the stress induced by the COVID-19 pandemic.
40
Another
medical professional said he felt lucky that he was not on-call at his hospital on June 15 because he was
unreachable due to his cellphone being his primary link to the hospital.
41
Two commenters described
being unable to alert family about surgery complications while alone in the hospital due to COVID-19-
related restrictive guest policies.
42
Fortunately, the Bureau did not receive any comments suggesting that
individuals experienced physical harm as a direct result of this outage.
40. The outage likely produced a large financial impact for individuals, employees, and
businesses. The record suggests that this outage resulted in a lost day of productivity for many who rely
on communications networks to do their jobs. For example, one commenter, a social worker, could not
communicate with at-risk children and families.
43
Another commenter stated that he missed client phone
calls and text messages, which cost him more than $3,000 in billable hours.
44
Others could not
accomplish basic work-related tasks such as scanning packages or using ride-sharing services to commute
to work.
45
Others expressed frustration at paying for T-Mobile’s wireless service, and then not being
compensated by T-Mobile when that service became unavailable.
46
The record does show, however, that
39
Kevin Fuhr June 23, 2020, Comments at 1.
40
Jake Walsh June 29, 2020, Comments at 1.
41
Vincent Romanelli June 24, 2020, Comments at 1.
42
See, e.g., Dawn Allen June 25, 2020, Comments at 1 (“I ended up being hospitalized for what was suppose[d] to
be an outpatient procedure. I was unable to be reached by my family[,] . . . trying to get a status on my care[,] nor
could I reach them. Being in pain and hospitalized while alone with no communication or explanation as to why
[telecommunication] services were disrupted compounded…my situation.”); Doug Bass June 25, 2020, Comments
at 1 (“I had my knee replaced at 8:45 . . . When I came out of surgery at 11:30, I was unable to get in touch with my
mother (the single visitor I was allowed to have for the entire day…due to COVID). I had to wait until almost 2 pm
when the nurse was able to get a phone (landline handset) to bring to my room for me to use.”).
43
Xiomara Cosme June 24, 2020, Comments at 1 (“I am a Children and Youth Social Worker where most of my job
duties are done from my phone. Due to the outage, I was not able to assess the safety of children or follow up with
families that where in need.”).
44
Jordan June 24, 2020, Comments at 1.
45
Brian Elsman June 23, 2020, Comments at 1; Denae Jones June 24, 2020, Comments at 1.
46
Denae Jones June 24, 2020, Comments at 1 (“T[-M]obile never sent a message, nor an apology[, n]or
compensated/credited me for that day.”); Mackenzie Rouse and Jake Rouse June 26, 2020, Comments at 1 (“T-
Mobile has stated they won’t credit our account because of the outage and I think that’s wrong.”); Franco Eulogio
Mau June 24, 2020, Comments at 1 (“No refunds.”).
some individuals who contacted T-Mobile to complain about the outage received $5 or $10.
47
41. Based on the record, the effects of the COVID-19 pandemic on consumers’ employment
circumstances appear to have worsened this outage’s disruption of people’s work lives. One commenter
stated that he was unable to remotely log in to his workplace because it requires two-factor authentication,
which did not function properly due to the outage.
48
Several commenters stated that the outage caused
them to miss job opportunities, including phone interviews,
49
and frustrated furloughed or laid-off
employees’ ability to file for unemployment.
50
42. PSHSB received more than 60 comments reporting a lack of communication from T-Mobile.
One commenter states that he thought the issue was his phone, so he bought a new one.
51
Another
commenter states that he drove to multiple T-Mobile stores during the pandemic to identify the issue.
52
One commenter states that he tried calling T-Mobile for more information, but T-Mobile did not answer,
53
and another commenter states that he called T-Mobile, but the person he spoke to said it was an issue
impacting all service providers.
54
Commenters also highlighted opportunities that T-Mobile could have
taken to communicate more effectively with the public about this outage. Several commenters noted that
T-Mobile could have alerted its customers about the outage via email or text message because LTE data
service was available for many consumers.
55
V. CORRECTIVE ACTIONS BY T-MOBILE
43. After T-Mobile resolved the outage, it took steps to prevent a recurrence of a similar event.
Specifically, T-Mobile:
Optimized Open Shortest Path First weights on links connecting to the routers in the Atlanta
47
Menachem R. June 24, 2020, Comments at 1 (“A lot of business lost . . . For my issues over two business days, I
was given $10 . . . [D]isgusting!”); Scott Sprague June 23, 2020, Comments at 1 (“I reached out to T-Mobile that
evening to see what was going on and if I could get some sort of credit from them. I fought with them for over two
hours because all they wanted to credit me was five dollars when I had missed out on a couple of hundred dollar
jobs.”); Nikki Gilbert July 15, 2020, Ex Parte at 1 (“You charge [$]75.00 or more for family plans but yet you offer
me [$]5.00 in a crisis? Sad.”).
48
George Rasko June 24, 2020, Comments at 1 (“My employer requires two-factor authorization for many on-line
activities. Logging-in with a password isn't enough . . . During the T[-]Mobile outage, the IT Help Desk spent two
hours trying to figure out why . . . I was not getting a call to finalize my computer login.”).
49
Rachel Church June 24, 2020, Comments at 1; Jordan Abad June 23, 2020, Comments at 1; Michael Farough June
24, 2020, Comments at 1; Samantha Dixon June 24, 2020, Comments at 1.
50
Martin Jamison-LeGere June 24, 2020, Comments at 1.
51
James Parziale July 1, 2020, Comments at 1.
52
David L. Risdon June 24, 2020, Comments at 1.
53
Brian Elsman June 23, 2020, Comments at 1 (stating that he eventually reached customer service via email).
54
Victor Burns June 30, 2020, Comments at 1.
55
Michael Thaler June 24, 2020, Comments at 1 (“There is NO REASON for T-M not sending texts to ALL
CUSTOMERS who might have been affected by the outage and to let us know they are working on it. SMS was
working as was data—meaning they could have sent emails.”); Stephanie Christmas June 24, 2020, Comments at 1
(“No information about the outage was available on their website or Mobile App.”); Nathaniel Leandro June 23,
2020, Comments at 1 (“My data services were not interrupted and I have T-Mobile's carrier app installed on my
phone. They should have sent a push notification…[or] text alerts.”); Kirk Ealy June 24, 2020, Comments at 1 (“No
text from T-Mobile about the issue and at this time my ability to send and receive a SMS message was working! . . .
The customer app showed no alert.”); but see J Bibi June 25, 2020, Comments at 1 (stating “I expect consumers to
be knowledgeable enough . . . to perform a simple Google search to learn about the outage.”).
market;
Created a separate communications channel to enable T-Mobile to manage the affected router
even during an outage condition so that, in the case of a recurrence of a similar event, T-Mobile
would be able to restore the affected router to a working state more quickly after intentionally
taking it offline;
Augmented processes regarding the phased integration of new devices into the network to include
additional potential failure scenarios like those seen in this outage;
Activated additional IP Multimedia Subsystem registration nodes to increase capacity;
Revised IP Multimedia Subsystem registration nodes’ overload settings for better management of
overload conditions;
Corrected the software error in the IP Multimedia Subsystem;
Introduced additional dedicated 911 nodes to enhance resiliency;
Reduced the number of retries allowed by the IP Multimedia Subsystem registration nodes
responsible for managing the secure connection with the mobile device from 4 to 2;
Improved the clarity and specificity of the error message generated on nodes that interconnect
with external networks when IP Multimedia Subsystem services are impacted to facilitate future
troubleshooting;
Improved call distribution logic for Voice over Wi-Fi services to allow regional containment
during potential future outages;
Deployed new vendor software updates to improve IP Multimedia Subsystem node robustness
and resiliency; and
Audited multiple systems across the circuit-switched, IP Multimedia Subsystem, and transport
networks for potential enhancements.
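The effect of reducing the retry limit from 4 to 2 can be approximated with a simple model. The sketch below is hypothetical and is not T-Mobile's implementation: it assumes that a device makes one initial registration attempt plus up to the allowed number of retries, that each retry occurs only if the previous attempt failed, and that every attempt fails independently with the same probability. The function name and the 90% failure probability are illustrative.

```python
def expected_attempts(max_retries: int, failure_probability: float) -> float:
    """Expected registration attempts per device: one initial attempt plus up
    to max_retries retries, each made only if the previous attempt failed."""
    return sum(failure_probability ** k for k in range(max_retries + 1))


for p in (1.0, 0.9):
    old_load = expected_attempts(4, p)  # previous setting: 4 retries
    new_load = expected_attempts(2, p)  # corrective setting: 2 retries
    print(f"failure probability {p:.0%}: "
          f"{old_load:.2f} -> {new_load:.2f} attempts per device")
```

Under these assumptions, when every attempt fails, the worst-case load on the registration nodes falls from 5 attempts per device to 3.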
44. While fiber link failures are common, PSHSB finds that these steps, taken together, will
reduce the likelihood that a fiber link failure could result in the recurrence of a similar event in T-
Mobile’s network because traffic would be routed to an alternative path that could handle it. Moreover, if
such an event recurred on T-Mobile’s network, it would not cause such a large service disruption because
T-Mobile would have improved its networks’ ability to manage congestion in the case of a similar event
and would have increased network capacity to maintain the network in a working state even with an
increased volume of traffic.
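Open Shortest Path First selects routes by computing the lowest-cost path over link weights, so adjusting weights changes which path traffic takes. The sketch below illustrates that mechanism with Dijkstra's algorithm on a hypothetical four-node topology. The node names and weights are assumptions made for illustration only and do not describe T-Mobile's actual network or configuration.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm over {node: {neighbor: weight}}.
    Returns (total_cost, path), or (inf, []) if the target is unreachable."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical weights: the new router initially offers the cheapest path.
before = {
    "market": {"new_router": 10, "backup_router": 20},
    "new_router": {"core": 10},
    "backup_router": {"core": 10},
    "core": {},
}
# Raising the weight toward the new router makes the backup path preferred.
after = {**before, "market": {"new_router": 30, "backup_router": 20}}

print(shortest_path(before, "market", "core"))
print(shortest_path(after, "market", "core"))
```

In this illustration, the weight change alone moves traffic from the path through the new router to the path through the backup router, without removing any link.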
VI. NEXT STEPS
45. The Bureau plans to engage in stakeholder outreach and guidance regarding industry-
accepted, recommended network reliability best practices to protect against similar outages in the future.
T-Mobile did not follow several network reliability best practices that could have prevented the outage, or
at least mitigated its effects:
Network operators should periodically audit the physical and logical diversity called for by the
design of their network segment(s) and take appropriate measures as needed.
56
The router that
dropped signaling traffic and precipitated this outage could never have provided functional
56
Communications Security, Reliability and Interoperability Council, Best Practice 12-9-0532 (2011),
https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data; see also FCC, March 8, 2017 AT&T
VoLTE 911 Outage Report and Recommendations, PS Docket No. 17-68 (PSHSB May 2017),
https://apps.fcc.gov/edocs_public/attachmatch/DOC-344941A1.pdf (recommending the same).
diversity for the link that failed because the router was not provisioned to process the signaling
traffic that the failed link carried. Further, T-Mobile could have prevented the outage if it had
audited its network during the new router integration to ensure that the traffic destined for the
failed link would redirect to a router that was able to pass it. If the backup route had operated as
it was designed, a nationwide outage would likely not have occurred.
Network operators and service providers should consider validating upgrades, new procedures
and commands in a lab or other test environment that simulates the target network and load prior
to the first application in the field.
57
T-Mobile had a latent software error in its network that it
failed to identify and address before it had a catastrophic impact. Had T-Mobile validated its IP
Multimedia Subsystem registration node software and router integration in a test environment that
simulated the relevant network segment, it could have discovered the software flaw and routing
misconfiguration before they could impact live calls.
Service providers should use virtual interfaces for routing protocols and network management to
maintain connectivity to network elements in the presence of physical interface outages.
58
The
most severe impact on calling that this outage caused occurred when T-Mobile engineers
intentionally took down a link in the course of troubleshooting and then were unable to restore it
for an hour. Had T-Mobile maintained a separate communications channel to enable it to manage
the affected router even when it took the suspected link down during troubleshooting, it
could have maintained superior visibility into the network and potentially resolved the outage
more quickly. T-Mobile implemented this best practice as a corrective action to prevent a
recurrence of this event.
59
Network operators and service providers should actively monitor and manage 911 network
components using network management controls, where available, to quickly restore 911 service
57
Communications Security, Reliability and Interoperability Council, Best Practice 12-10-0559 (2011),
https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data; see also Communications Security,
Reliability and Interoperability Council, Best Practice 12-9-8748 (2011), https://opendata.fcc.gov/Public-
Safety/CSRIC-Best-Practices/qb45-rw2t/data (stating that network operators, service providers, and equipment
suppliers “should test new devices to identify unnecessary services, outdated software versions, missing patches, and
misconfigurations, and validate compliance with or deviations from an organization’s security policy prior to being
placed on a network”); Communications Security, Reliability and Interoperability Council, Best Practice 12-9-8035
(2011), https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data (stating that network operators
and service providers should include steps to appropriately test all patches and fixes in a test environment prior to
distribution into the production environment in their patch/fix policy and process guidelines).
58
Communications Security, Reliability and Interoperability Council, Best Practice 12-10-0409 (2011),
https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data.
59
CenturyLink’s failure to implement this best practice also contributed to its December 2018 outage. Public
Safety and Homeland Security Bureau, December 27, 2018 CenturyLink Network Outage Report (2019),
https://docs.fcc.gov/public/attachments/DOC-359134A1.pdf. PSHSB issued a Public Notice to remind industry of
the importance of implementing it. Public Safety and Homeland Security Bureau Encourages Communications
Service Providers to Implement Important Network Reliability Best Practices, Public Notice, 34 FCC Rcd 9453
(PSHSB Oct. 15, 2019).
and provide priority repair during network failure events.
60
Reasonable 911 network monitoring
would have revealed to T-Mobile in real time that the outage was causing call blocking on PSAP
administrative lines, but the content of T-Mobile’s PSAP notification indicates that it likely did
not understand the extent of its outage’s 911 impact while it was occurring. Had T-Mobile
actively monitored its 911 network components, it might have been able to provide more accurate
PSAP notification.
46. As a result of its investigation, PSHSB has also identified network reliability issues that
network reliability standards bodies could study:
Whether VoLTE providers should prioritize redundancy for links that provide transport for
signaling and registration traffic between IP Multimedia Subsystem cores and other networks;
and
Whether, during any provisioning or rearrangement of IP Multimedia Subsystem routes, a
VoLTE provider should prioritize audits of all signaling and registration traffic that would need to
be rerouted in the event that the IP Multimedia Subsystem becomes unavailable.
61
47. In keeping with past practice, the Bureau plans to release a Public Notice, based on its
analysis of this and other recent outages, reminding companies of industry-accepted best practices,
including those recommended by the Communications Security, Reliability and Interoperability Council,
and their importance.
62
In addition, the Bureau will contact other major transport providers to discuss
their network practices and will offer its assistance to smaller providers to help ensure that our nation’s
communications networks remain robust, reliable, and resilient.
60
Communications Security, Reliability and Interoperability Council, Best Practice 12-9-0574 (2011),
https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data.
61
T-Mobile could have potentially prevented this outage if it had conducted such an audit as part of the
rearrangement and provisioning process. Note that the audit would not have been timely if scheduled for a later date
because the failure occurred almost immediately after backup facilities were rearranged.
62
See, e.g., Public Safety and Homeland Security Bureau Encourages Communications Service Providers to Follow
Best Practices to Help Ensure Network Reliability, Public Notice, 33 FCC Rcd 3776 (PSHSB 2018).
APPENDIX A:
Outage Timeline
Time (EDT)
Event
12:33 PM
A fiber transport link in the southeast region of T-Mobile’s VoLTE network failed
12:45 PM
The fiber transport link was repaired without intervention
T-Mobile manually shut down the link to the new router in an attempt to transfer traffic
away from the link it suspected was responsible for the ongoing outage.
2:41 PM
T-Mobile began notifying PSAPs
~3:00 PM
IP Multimedia Subsystem VoLTE and Voice over Wi-Fi registrations began to fail
nationwide as all IP Multimedia Subsystem regional registration nodes became
increasingly congested.
3:06 PM
T-Mobile filed a notification in NORS
T-Mobile reduced the number of registration retries allowed by the IP Multimedia
Subsystem registration nodes.
4:18 PM
Neville Ray, T-Mobile President of Technology, first confirmed the existence of the
outage for T-Mobile’s subscribers and the public on Twitter by describing it as a “voice
and data issue that has been affecting customers around the country.”
T-Mobile attempted to mitigate the outage by requesting that its wholesale transport
provider, Inteliquent, lock inbound local and long-distance traffic.
T-Mobile increased the capacity of the registration system by activating additional IP
Multimedia Subsystem registration nodes.
6:25 PM
T-Mobile tweeted in Spanish to confirm the existence of the outage.
7:00 PM
T-Mobile posted an outage statement as a splash screen across all of its digital
properties.
T-Mobile turned off the IP Multimedia Subsystem registration nodes’ overload controls.
T-Mobile restarted some of the nodes that choose gateways for IP Multimedia
Subsystem calls destined for 3G, 2G, and 911 networks in order to clear congestion
from them.
June 16, 12:46 AM
T-Mobile network restored to a normal working state.
T-Mobile began sending emails and/or phone calls to the same PSAPs to inform them
that the outage was resolved.