Vault Authentication Service Outage - Summary

Informational
August 12 2024,1:30pm PDT

Status: closed
Date: May 15 2024,9:30am PDT
Affected Components:
Veeva CRM Veeva MultiChannel & Integrations Vault CRM Veeva Nitro Consumer Products Mobile Veeva Network Veeva CDMS Veeva Vault Veeva Align Vault CRM Align Veeva OpenData MyVeeva (ePRO, eConsent) Veeva SiteVault ePRO Vault Veeva China SFA Veeva Crossix Veeva Compass Veeva Link Veeva Mobile Veeva RTSM Veeva Sub-processors update Veeva Support Locations MC-01 NITRO-US QualityOne OD-NA MyVeeva-US VDM1 MC-20 VA-US NITRO-EU QualityOne Audit Checklists OD-EU MyVeeva-EU Vault-US PODs SiteVault-US VCRM-US PROD VCRMA-US ePRO Vault-US CDMS-US MC-30 NITRO-AP QualityOne Station Manager VA-US-2 VDM2 MyVeeva-AP Vault-EU PODs SiteVault-EU VCRMA-EU ePRO Vault-EU CDMS-EU VDM3 ENGAGE-01 VA-US-3 Vault-AP PODs SiteVault-AP VCRM-EU PROD ePRO Vault-AP CDMS-AP WC-36 VDM5 VA-US-SBX VA-EU VDM6 VDM20 VA-EU-2 VCRM-AP PROD VA-EU-3 VDM21 VA-EU-4 VDM22 VA-EU-5 VDM30 VA-EU-6 VDM40 SANDBOX VA-EU-SBX SANDBOX2 VA-AP SANDBOX3 VA-AP-SBX SANDBOX20 SANDBOX21 SANDBOX30 CRM-20 CRM-30 SMAR-CN VV1-4001 VV2-5001 VV3-6001 CRM-04 CRM-03 VV2-10 VV3-33 cdb-1 cdb-101 VV1-1082 VV3-3122 VV2-2097 VV2-5000 VV1-4000 VV3-6000 VV1-1187 VV2-2145 cdb-201 VV1-1179 VV2-2144 VV3-3130 VV2-2149 CRM-05 VV1-1 VV2-27 VV3-34 cdb-3 VV1-1110 VV2-2129 VV2-2092 VV3-3121 VV1-1189 VV1-1182 CRM-06 VV1-2 VV2-28 VV3-3048 cdb-5 VV1-1111 VV2-2142 VV1-1191 VV1-1188 CRM-08 VV1-3 VV2-29 VV3-3057 VV1-1160 cdb-7 CRM-10 VV1-4 VV2-30 VV3-3064 cdb-9 VV1-5 VV2-31 VV3-3086 cdb-11 VV1-6 VV2-32 VV3-3096 cdb-13 VV1-7 VV2-35 VV3-3099 cdb-15 VV1-8 VV2-41 VV3-3120 cdb-302 VV1-9 VV2-44 cdb-1002 VV3-3121 VV1-11 VV2-2047 VV3-3122 cdb-1004 VV1-12 VV2-2050 VV3-3123 VV1-17 VV1-13 VV2-2056 VV3-3124 VV1-22 VV1-14 VV2-2060 VV3-3125 VV1-37 VV1-15 VV2-2063 VV3-3127 VV1-1090 VV1-16 VV2-2070 VV3-3128 VV1-1127 VV1-18 VV2-2072 VV3-3129 VV1-1134 VV1-19 VV2-2075 VV1-1139 VV3-3130 VV1-20 VV2-2080 VV1-1148 VV3-3131 VV1-21 VV2-2083 VV1-1158 VV1-23 VV2-2085 VV1-1159 VV1-24 VV2-2087 VV1-1161 VV1-25 VV2-2091 VV1-1174 VV1-26 VV1-1175 VV2-2092 VV1-38 VV2-2095 VV1-1193 VV1-39 
VV2-2097 VV1-1194 VV1-40 VV2-2098 VV1-42 VV2-2120 VV1-43 VV2-2121 VV1-1045 VV2-2122 VV1-1046 VV2-2124 VV1-1049 VV2-2125 VV1-1051 VV2-2126 VV1-1052 VV2-2127 VV1-1053 VV2-2128 VV1-1054 VV2-2129 VV1-1055 VV2-2130 VV1-1058 VV2-2131 VV1-1061 VV2-2132 VV1-1062 VV2-2133 VV1-1065 VV2-2134 VV1-1066 VV2-2135 VV1-1067 VV2-2136 VV1-1068 VV2-2137 VV1-1069 VV2-2138 VV1-1073 VV2-2139 VV1-1074 VV2-2140 VV1-1076 VV2-2141 VV1-1077 VV2-2142 VV1-1078 VV2-2143 VV1-1079 VV2-2144 VV1-1081 VV2-2145 VV1-1082 VV2-2146 VV1-1084 VV2-2147 VV1-1088 VV2-2148 VV1-1089 VV1-1094 VV1-1110 VV1-1111 VV1-1120 VV1-1121 VV1-1122 VV1-1124 VV1-1126 VV1-1128 VV1-1129 VV1-1130 VV1-1131 VV1-1132 VV1-1133 VV1-1135 VV1-1136 VV1-1137 VV1-1138 VV1-1140 VV1-1142 VV1-1144 VV1-1145 VV1-1146 VV1-1149 VV1-1150 VV1-1151 VV1-1152 VV1-1153 VV1-1154 VV1-1155 VV1-1156 VV1-1157 VV1-1160 VV1-1163 VV1-1164 VV1-1165 VV1-1166 VV1-1167 VV1-1168 VV1-1169 VV1-1170 VV1-1171 VV1-1172 VV1-1173 VV1-1176 VV1-1178 VV1-1179 VV1-1180 VV1-1181 VV1-1182 VV1-1183 VV1-1184 VV1-1185 VV1-1186 VV1-1188 VV1-1190 VV1-1192 VV1-1193 VV1-1194 VV1-1195 VV1-1196
Update

May 15 2024,9:30am PDT

Outage Retrospective and Next Steps


The following is an update regarding the Vault outages that occurred on 22, 23, and 24 April 2024. 


The Incident Report was made available on 7 May and details the changes implemented, particularly on the evening of 24 April, that led to Vault Auth stability. The Incident Report is available via Support or your account team.


The Incident Report identified six areas of Corrective and Preventative Actions (CAPAs), which are tracked in Veeva’s QMS:


  • VAULT AUTHENTICATION SERVICES ARCHITECTURE

  • VAULT APPLICATION CODE AND VAULT INFRASTRUCTURE

  • CONTINUED INVESTIGATION

  • TEST ENHANCEMENTS

  • MONITORING AND LOGGING

  • VAULT CAPACITY PLANNING


The detailed work to size and scope the specific actions is now underway, with a target of 7 August for a definitive set of plans. In addition to the formal CAPA process, Vault engineering is conducting a complete code review to identify any other, similar code inefficiencies. Any resulting changes will be made available once fully tested. In addition, more awareness and oversight have been given to Vault application design and coding practices pertaining to the use of Vault’s Auth service.


For further details, please refer to the Incident Report.

Incident Summary

May 31 2024,2:22pm PDT

Outage Retrospective (31 May 2024)


The following is an update regarding the Vault outages that occurred on 22, 23, and 24 April 2024.


Recall from the last update (15 May) that the Incident Report (IR) was made available on 7 May. The IR identified six areas of Corrective and Preventative Actions (CAPAs) tracked in Veeva’s QMS, covering the following areas: 

  • VAULT AUTHENTICATION SERVICES ARCHITECTURE

  • VAULT APPLICATION CODE AND VAULT INFRASTRUCTURE

  • CONTINUED INVESTIGATION

  • TEST ENHANCEMENTS

  • MONITORING AND LOGGING

  • VAULT CAPACITY PLANNING

To date, there are 12 groupings of development efforts applied across the 6 CAPAs. Each grouping (or "epic") encompasses a range of discrete efforts. Many efforts have been addressed, many are currently in progress, and others have been or are being scoped and assigned.


To illustrate, one epic exists to reduce Vault Auth calls by caching operational metadata information. This particular epic pertains to the CAPA: VAULT APPLICATION CODE AND VAULT INFRASTRUCTURE. There are 100 discrete work efforts within this epic. As of this update, 43 of the items have been addressed.
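As a rough illustration of why caching reduces calls to a shared service, the following minimal sketch places a time-based cache in front of a lookup. It is not Vault code: the names `MetadataCache` and `fetch_from_auth`, the 60-second lifetime, and the key format are all hypothetical and chosen only for the example.

```python
import time
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Small time-based cache placed in front of an expensive lookup."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # function that performs the real lookup
        self._ttl = ttl_seconds    # how long a cached entry stays valid
        self._clock = clock        # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # fresh entry: no remote call is made
        value = self._fetch(key)   # missing or expired: call through once
        self._entries[key] = (now, value)
        return value


calls = []


def fetch_from_auth(key: str) -> dict:
    # Hypothetical stand-in for a remote call to an authentication service.
    calls.append(key)
    return {"key": key}


cache = MetadataCache(fetch_from_auth, ttl_seconds=60.0)
for _ in range(5):
    cache.get("vault-123")
print(len(calls))  # 1: five reads resulted in a single remote call
```

The trade-off in any such design is staleness: a longer lifetime saves more calls but delays the point at which changed metadata becomes visible.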


Many of the other epics are of a similar scale and composition.


The work done to categorize, size, assign, and progress these items puts us in a good position to meet the 7 August target for determining the completion dates of all discrete efforts under each CAPA.


This work is over and above the awareness and oversight given to Vault application design and coding practices pertaining to the use of Vault’s Auth service.  

Incident Summary

August 12 2024,1:28pm PDT

Outage Retrospective and Next Steps (12 August 2024)


The following is an update regarding actions taken by Vault teams to address the root causes of the Vault Auth outages that occurred on 22, 23, and 24 April 2024. 


Recall that an Incident Report was provided on 7 May 2024 that detailed the changes implemented up to that time, particularly on the evening of 24 April 2024, that led to Vault Authentication Services (“Vault Auth”) stability. (The Incident Report is available from your Veeva account team.)


The Incident Report further outlined six Corrective and Preventative Actions (CAPAs) tracked in Veeva’s QMS, covering the following areas: 


  • VAULT AUTHENTICATION SERVICES ARCHITECTURE

  • VAULT APPLICATION CODE AND VAULT INFRASTRUCTURE

  • CONTINUED INVESTIGATION

  • TEST ENHANCEMENTS

  • MONITORING AND LOGGING

  • VAULT CAPACITY PLANNING


Each CAPA represents a significant effort on its own. In the months since the outage events, Vault teams have engaged in concentrated efforts to complete the CAPAs. The initial target of 7 August 2024 for determining completion dates for the six CAPAs has been met. Beyond sizing and scoping the remaining items, much work has already gone into addressing each CAPA. Because this work began at the time of the event and has continued unabated since, pressure on Vault Auth has been significantly reduced and Vault Auth stability has been attained.


Remaining efforts now mostly entail complex architectural modifications requiring multiple releases to implement. The aim of these efforts is to allow for continued Vault platform growth and stability for years to come.


What follows is a summary of each CAPA with its anticipated target release for completion.


VAULT AUTHENTICATION SERVICES ARCHITECTURE

[DEV-714916] Enhancements to Vault Auth architecture

Target release for completion: 25R1


A key aspect of this CAPA is to relieve Vault Auth of every responsibility other than user authentication handling, which is a significant architectural change. This and the other efforts under this CAPA are divided among 4 separate development threads that encompass 45 individual work efforts in total, 22 of which have been closed. The remaining items span multiple releases given their complexity and scope.


VAULT APPLICATION CODE AND VAULT INFRASTRUCTURE

[DEV-714918] Enhancements to Vault application code and Vault infrastructure to reduce Vault Auth calls

Target release for completion: 24R3.3


This CAPA comprises 8 major efforts. Combined, they cover 326 individual work efforts, of which 284 are complete. Remaining efforts targeted for the 24R3.3 Limited Release will be delivered in the 25R1 General Release.


CONTINUED INVESTIGATION

[DEV-714923] A comprehensive investigation continues into the outages

Target release for completion: 25R1


18 of the 20 tasks associated with this effort have been completed. The CAPA remains open so that the remaining two items can be completed with due deliberation.


TEST ENHANCEMENTS

[DEV-714924] Expand test processes to identify excessive Vault Auth accesses earlier in the development cycle

RESOLVED


This CAPA consisted of 3 discrete efforts that stress tested Vault Auth and led to scripts that can be reapplied in ongoing performance testing.
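One general way to surface excessive service accesses early in a development cycle is to count calls during a test and fail when a budget is exceeded. The sketch below illustrates that idea only. `CountingAuthClient`, `FakeAuth`, `auth_call_budget`, and `render_page` are hypothetical names and do not describe Veeva's actual test scripts.

```python
from contextlib import contextmanager


class FakeAuth:
    """Hypothetical stand-in for an authentication service client."""

    def validate(self, token):
        return token == "ok"


class CountingAuthClient:
    """Hypothetical wrapper that counts every call passed to the client."""

    def __init__(self, inner):
        self._inner = inner
        self.call_count = 0

    def validate(self, token):
        self.call_count += 1
        return self._inner.validate(token)


@contextmanager
def auth_call_budget(client, max_calls):
    """Fail if the wrapped block makes more than max_calls auth calls."""
    start = client.call_count
    yield
    used = client.call_count - start
    if used > max_calls:
        raise AssertionError(f"{used} auth calls exceeded budget of {max_calls}")


def render_page(client, token, items):
    # Validates once and reuses the result, rather than once per item.
    allowed = client.validate(token)
    return [item for item in items if allowed]


client = CountingAuthClient(FakeAuth())
with auth_call_budget(client, max_calls=1):
    result = render_page(client, "ok", ["a", "b", "c"])
print(result, client.call_count)
```

A check of this kind can run alongside ordinary functional tests, so a change that starts validating once per item fails before it reaches performance testing.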


MONITORING AND LOGGING 

[DEV-714926] Improve Vault monitoring tools to support more granular analysis when events occur

Target release for completion: 25R1


This CAPA comprises 9 tickets, 4 of which have been completed to date. Remaining items are architecturally complex and will take time to implement.


VAULT CAPACITY PLANNING

[DEV-714928] Ensure that expansion of Vault PODs and Vault instances accounts for impacts to Vault Auth

RESOLVED


This effort enhances the documentation and tracking used for Vault POD and Vault capacity planning so that these processes reflect additional context.
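For readers who want a single view of progress, the counts reported above for the four CAPAs that remain open can be tallied as follows. This is plain arithmetic on the figures in this update. The two RESOLVED CAPAs are left out, and the unweighted overall percentage is an illustrative simplification, since individual work efforts differ in size.

```python
# (completed, total) work efforts per open CAPA, as stated in this update.
capa_progress = {
    "VAULT AUTHENTICATION SERVICES ARCHITECTURE": (22, 45),
    "VAULT APPLICATION CODE AND VAULT INFRASTRUCTURE": (284, 326),
    "CONTINUED INVESTIGATION": (18, 20),
    "MONITORING AND LOGGING": (4, 9),
}


def percent_complete(done, total):
    """Return completion as a percentage rounded to one decimal place."""
    return round(100.0 * done / total, 1)


for name, (done, total) in capa_progress.items():
    print(f"{name}: {done}/{total} ({percent_complete(done, total)}%)")

total_done = sum(done for done, _ in capa_progress.values())
total_items = sum(total for _, total in capa_progress.values())
overall = percent_complete(total_done, total_items)
print(f"Overall: {total_done}/{total_items} ({overall}%)")
```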


In summary, Veeva continues to harden Vault Auth while simultaneously reducing its scope of activities. The actions taken to date have led to gains in stability. Remaining actions build on this momentum and improve the scalability and reliability of Vault Auth to accommodate future expansion.