Vault Service Disruption affecting all PODs in Region Veeva Align, Veeva Vault, Veeva CDMS, MyVeeva (ePRO, eConsent), Veeva SiteVault, Vault CRM

Incident
April 25 2024,12:06pm PDT

Status: closed
Start: April 24 2024,6:17am PDT
End: April 24 2024,8:46am PDT
Duration: 2 hours 29 minutes
Affected Components:
Vault CRM Veeva CDMS Veeva Vault Veeva Align Vault CRM Align Veeva SiteVault VA-US Vault-US PODs VA-US-2 VA-US-3 Vault-EU PODs VA-US-SBX Vault-AP PODs VA-EU VA-EU-2 VA-EU-3 VA-EU-4 VA-EU-5 VA-EU-6 VA-EU-SBX VA-AP VA-AP-SBX VV1-1189 VV2-2145 VV1-1 VV2-10 VV3-33 cdb-1 cdb-101 VV1-1082 VV3-3122 VV2-2097 VV2-5000 VV1-4000 VV2-5001 VV3-6000 VV3-6001 VV1-1187 cdb-201 VV1-2 VV2-27 VV3-34 cdb-3 VV1-1110 VV2-2129 VV1-4001 VV3-3121 VV1-3 VV2-28 VV3-3048 cdb-5 VV1-1111 VV1-1191 VV1-4 VV2-29 VV3-3057 VV1-1160 VV2-2092 cdb-7 VV1-5 VV2-30 VV3-3064 VV2-2142 cdb-9 VV1-6 VV2-31 VV3-3086 cdb-11 VV1-7 VV2-32 VV3-3096 cdb-13 VV1-8 VV2-35 VV3-3099 cdb-15 VV1-9 VV2-41 VV3-3120 VV1-11 VV2-44 cdb-1002 VV3-3121 VV1-12 VV2-2047 VV1-13 VV2-2050 VV3-3123 VV1-17 VV1-14 VV2-2056 VV3-3124 VV1-22 VV1-15 VV2-2060 VV3-3125 VV1-37 VV1-16 VV2-2063 VV3-3127 VV1-1090 VV1-18 VV2-2070 VV3-3128 VV1-1127 VV1-19 VV2-2072 VV3-3129 VV1-1139 VV1-20 VV2-2075 VV1-1148 VV1-21 VV2-2080 VV1-1158 VV3-3131 VV1-23 VV2-2083 VV1-1159 VV1-24 VV2-2085 VV1-1161 VV1-25 VV2-2087 VV1-1174 VV1-26 VV2-2091 VV1-1175 VV1-38 VV2-2092 VV1-39 VV2-2095 VV1-40 VV1-42 VV2-2098 VV1-43 VV2-2120 VV1-1045 VV2-2121 VV1-1046 VV2-2122 VV1-1049 VV2-2124 VV1-1051 VV2-2125 VV1-1052 VV2-2126 VV1-1053 VV2-2127 VV1-1054 VV2-2128 VV1-1055 VV1-1058 VV2-2130 VV1-1061 VV2-2131 VV1-1062 VV2-2132 VV1-1065 VV2-2133 VV1-1066 VV2-2134 VV1-1067 VV2-2135 VV1-1068 VV2-2136 VV1-1069 VV2-2137 VV1-1073 VV2-2138 VV1-1074 VV2-2139 VV1-1076 VV2-2140 VV1-1077 VV2-2141 VV1-1078 VV2-2142 VV1-1079 VV2-2143 VV1-1081 VV1-1084 VV1-1088 VV1-1089 VV1-1094 VV1-1120 VV1-1121 VV1-1122 VV1-1124 VV1-1126 VV1-1128 VV1-1129 VV1-1130 VV1-1131 VV1-1132 VV1-1133 VV1-1135 VV1-1136 VV1-1137 VV1-1138 VV1-1140 VV1-1142 VV1-1144 VV1-1145 VV1-1146 VV1-1149 VV1-1150 VV1-1151 VV1-1152 VV1-1153 VV1-1154 VV1-1155 VV1-1156 VV1-1157 VV1-1163 VV1-1164 VV1-1165 VV1-1166 VV1-1168 VV1-1169 VV1-1170 VV1-1171 VV1-1172 VV1-1173 VV1-1176 VV1-1178 VV1-1180 VV1-1181 VV1-1183 VV1-1184 VV1-1185 VV1-1186 
VV1-1190
Update

April 24 2024,6:17am PDT

Veeva Vault PODs for Veeva Align, Veeva Vault, Veeva CDMS, MyVeeva (ePRO, eConsent), Veeva SiteVault, and Vault CRM are currently experiencing a service disruption. Veeva engineering teams are working to return the service to normal as quickly as possible.

Update

April 24 2024,6:53am PDT

We are continuing to investigate the issue.  We are in the process of reinitiating the Vault service across the regions.  Another update will be provided within 30 minutes.  Thank you for your patience.

Update

April 24 2024,7:15am PDT

We are currently performing tests across the regions.  Vaults will remain in maintenance during this initial testing period.  As testing completes, Vault services will be resumed.  

Update

April 24 2024,7:51am PDT

We are very sorry about this Vault-side outage and are working as hard and fast as we can. AP, EU and US West service is currently available. We are continuing to see issues, especially with US East.

I will post again within 15 minutes.

 - Avril England (General Manager, Vault)

Update

April 24 2024,8:10am PDT

We are continuing to see authentication service spikes from US East and are working to simultaneously isolate the issue and stabilize the service.   AP, EU and US West remain online for now.  East remains unstable.

Next post will be at 8:30am Pacific. 

 - Avril England (General Manager, Vault)

Update

April 24 2024,8:30am PDT

As of 8:20 a.m. Pacific, production service has been restored across all regions and appears to have stabilized. Sandbox service is offline for US East while we continue to troubleshoot the issue.

There is no evidence to suggest that this is related to a DDoS attack. However, we have been seeing an increase in the number of calls, which in turn is creating a backlog that is not dissipating normally. This backlog prevents authentication requests generated by user activity on the Vaults from being processed in a timely manner, resulting in a Vault outage.

Please know that we are treating this with the utmost urgency across Veeva and recognize the significant impact on your businesses.

Next update at 8:45 a.m. Pacific.

  • Avril England, General Manager Vault

Update

April 24 2024,8:45am PDT

US East PODs are still seeing some instability, and users are experiencing slow or unresponsive Vaults. US East sandboxes remain offline. All other regions are currently stable.


We are focusing our efforts on US East production at this time.


Next update at 9:00 a.m. Pacific.

  • Avril England, General Manager Vault

Resolved

April 24 2024,8:46am PDT

Service remains stable for all production vaults in all regions.

Update

April 24 2024,9:00am PDT

AP, EU and US West Vaults remain online and available.


US East Vaults are now online, but some Vaults continued to experience lingering effects of the outage. These lingering issues should now be addressed. The team is still carefully monitoring all production operations for further issues.


Next update at 9:15 a.m. Pacific.


  • Avril England, General Manager Vault

Update

April 24 2024,9:19am PDT

All Vault regions are online with production Vaults. US East Sandboxes remain offline.

The team continues to monitor the load and it appears to have stabilized.


Next update at 9:30 a.m. Pacific.


  • Avril England, General Manager Vault

Update

April 24 2024,9:30am PDT

All production vaults remain stable and available.


We are now going to begin bringing US East sandboxes slowly back online.

This slow reintroduction will be done with the goal of identifying which, if any, Vaults are creating unusual load on the authentication services. 


I will provide the next update at 10:00 a.m. Pacific.

 - Avril England, General Manager Vault

Update

April 24 2024,10:00am PDT

Service remains stable for all production vaults in all regions.

The US East sandboxes are slowly being brought back online. As of 10 a.m. Pacific, we are approximately 15% complete with restoring service to US East sandboxes.

Sandboxes in all other regions are operating normally.

The team is currently working on (1) restoring remaining sandbox service, (2) analyzing data for root cause and (3) evaluating changes to protect the service while root cause investigation continues.

I'll provide the next update at 10:30 a.m. Pacific.

 - Avril England, General Manager Vault


Update

April 24 2024,10:30am PDT

Service remains stable for all production vaults in all regions.

The US East sandboxes are approximately 50% back online now.  We are continuing the gradual restoration of service to the remaining 50%. 

Sandboxes in all other regions are operating normally.

Given the criticality of the situation, establishing root cause and sharing it with you is of the utmost importance. I will post information regarding root cause, mitigation strategies and preventive measures as soon as they are known.

An official RCA / Incident Report covering all three incidents will be available 10 working days after the root cause has been established.  

I'll provide the next update at 11:00 a.m. Pacific.

 - Avril England, General Manager Vault

Update

April 24 2024,11:00am PDT

All Vaults are back online across all regions.

Mitigation strategies for peak times, while this issue persists, are under review and I will share them as soon as we finalize the plans for tomorrow.

Root cause analysis also continues with urgency.

I will post my next update at 12:00 p.m. Pacific.

 - Avril England, General Manager Vault



Update

April 24 2024,12:00pm PDT

The Vault service is fully operational across all regions for production and sandbox vaults. 

Our engineering teams remain focused on investigating root cause and exploring mitigations for system stability while analysis continues.

We are in the process of scheduling a customer Zoom call with Veeva's senior leadership for tomorrow. I'll post the information with the next update at 1:00 p.m. Pacific.

 - Avril England, General Manager Vault

Update

April 24 2024,1:00pm PDT

The Vault service is fully operational across all regions for production and sandbox vaults. 

For tomorrow, Thursday, April 25,  Vault Sandboxes will be offline during the peak period from 2 a.m. - 9 a.m. Pacific.  This is not a fix for the load issue, but rather a mitigation effort, with the goal of reducing load on Vault's authentication service. 

We are also hosting a Zoom meeting where Peter Gassner, Veeva CEO, and I will provide an overview of the outage details and corrective actions. We will also be available for questions as part of that call. Please register here: https://veeva.zoom.us/webinar/register/WN_XzbLJcigTMidsumr4263eA

I will not post further updates today, unless the status of the Vault service changes. 

Again, thank you for your patience as we work through this issue.

 - Avril England, General Manager Vault

Incident Summary

April 25 2024,7:15am PDT

The service continues to operate normally across all PODs in all regions. It was not necessary to remove the sandboxes from the service.

Two fixes were deployed last night to reduce the traffic on the service. This, in addition to the capacity increase on Tuesday night, has resulted in system stability today.

We are closely monitoring the service on high alert while also continuing forensic analysis to look for other areas of improvement in how our applications interact with our authentication services.

- Avril England, General Manager Vault

Resolved

April 25 2024,12:06pm PDT

We hosted a Zoom meeting today where Peter Gassner, Veeva CEO, and I provided an overview of the outage details. Thank you to those who joined. For those who could not, a summary of that call and a recap of the outage issue and remediation can be found here: https://veeva-trust.status.page/incident/413153