Gorilla Status

Welcome to the Gorilla status page! This will be updated by the development team to keep you informed of any server issues or planned maintenance that we have coming up.


All Systems Operational


Scheduled Maintenance

There is no future maintenance scheduled at present.


Incident History

October 6th, 2020

Server Issues

14:30 UTC After a routine autoscaling of the Gorilla server following increased traffic, the server has been notably underperforming. This has resulted in increased response times and periods of unresponsiveness.

15:00 UTC The lead developer is manually redeploying and restarting the server. However, this process also seems to be taking longer than normal.

Resolved

15:10 UTC Normal operation has resumed. We are continuing to investigate the root cause of the problem and will provide a thorough update soon.

September 17th, 2020

Bad Deploy

13:20 UTC A transient error has caused the main servers to not restart gracefully after deploying some maintenance changes.

Resolved

13:25 UTC We've had to force restart the server and all is back up now. Apologies for any inconvenience.

July 9th, 2020

Error on experiment completion

21:30 UTC We're seeing errors on experiment completion, which causes Gorilla to hang. We are investigating the root cause.

Resolved

July 10th, 05:30 UTC We've deployed a fix, so experiments should now behave normally. We tracked the cause down to an unexpected side effect of some new monitoring. If you have been affected by this issue, please file a support ticket and we will assist you.

April 16th, 2020

Unexpected Restarts

15:16 UTC We're seeing some unexpected server restarts, which cause a few minutes of downtime each time they happen.

15:44 UTC We're seeing another batch of restarts, resulting in a few minutes of downtime.

16:09 UTC The server seems to be responding normally, but we will consider this issue ongoing and continue to investigate.

Resolved

19:25 UTC We have seen no restarts beyond the two clusters at 15:16 and 15:44, and so are considering this issue resolved and continuing to investigate the causes.

April 9th, 2020

Server Update

07:16 UTC April Update is being deployed.

07:22 UTC April Update deployed. All systems operational throughout.

February 20th, 2020

Server Update

05:22 UTC February Update is being deployed.

05:31 UTC February Update deployed. All systems operational throughout.

February 4th, 2020

Archive Building Issue

12:27 UTC Experiment Data Archives appear not to be building correctly - the individual files come through empty with just a file path in them. Data files downloaded from the individual tree nodes download just fine.

Resolved

21:35 UTC We have resolved the issue. There was a recently introduced bug in the archive building logic, which has now been fixed.

December 12th, 2019

Server Update

06:19 UTC December Update is being deployed.

06:27 UTC December Update deployed. All systems operational throughout.

December 4th, 2019

SSL Issue

16:10 UTC The service that automatically renews our SSL certificates appears to have failed, leaving Gorilla without a valid SSL certificate.

Resolved

16:38 UTC We have resolved the issue and Gorilla now has a valid SSL certificate.
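
As an illustration of how a failed renewal like this can be caught before a certificate lapses, here is a minimal sketch of an independent expiry check. It is not Gorilla's actual monitoring: the function name and the idea of passing in the certificate's "notAfter" string are assumptions made for this example. It relies only on Python's standard `ssl.cert_time_to_seconds`.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the number of days left before a certificate expires.

    `not_after` is the certificate's "notAfter" field in the format used
    by Python's ssl module, e.g. "Jan  5 09:34:43 2018 GMT".
    `now` is a Unix timestamp; it defaults to the current time.
    (Hypothetical helper, for illustration only.)
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    # A negative result means the certificate has already expired.
    return (expires_at - now) / 86400
```

A scheduled job could call this with the live certificate's expiry date and raise an alert when the result drops below a chosen threshold, for example 14 days.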

November 27th, 2019

Server Update

06:20 UTC November Update is being deployed.

06:33 UTC November Update deployed. All systems operational throughout.

October 22nd, 2019

Server Update

04:46 UTC October Update is being deployed.

04:58 UTC October Update deployed. All systems operational throughout.

September 24th, 2019

Server Update

03:20 UTC September Update is being deployed.

03:41 UTC September Update deployed. All systems operational throughout.

August 28th, 2019

Server Update

05:44 UTC August Update is being deployed.

06:01 UTC August Update deployed. All systems operational throughout.

August 22nd, 2019

Delays building experiment data

12:30 UTC We are currently investigating reports that data is slow to generate and have been able to confirm that a small number of files have stalled in production.

Resolved

August 23rd 2019, 17:00 UTC We have resolved the issue and data generation is back to normal.

August 20th, 2019

Unexpected Server Restarts

14:30 UTC We are currently investigating an ongoing issue with unexpected restarts of our primary server. While the server is recovering quickly after each restart, the limited periods of downtime (30-90 seconds) have the potential to cause disruption in active experiments.

August 21st 2019, 16:00 UTC We have isolated a potential cause and will be deploying a potential fix as soon as we are able to do so.

Resolved

August 23rd 2019, 10:00 UTC We have resolved the issue and all systems are fully operational.

July 30th, 2019

Server Update

06:20 UTC July Update is being deployed.

06:32 UTC July Update deployed. All systems operational throughout.

June 25th, 2019

Server Update

04:46 UTC June Update is being deployed.

04:55 UTC June Update deployed. All systems operational throughout.

June 11th, 2019

Database Issues

12:10 UTC The same issue that occurred yesterday has reoccurred, with Gorilla becoming unresponsive. We're looking into it as a matter of urgency.

12:50 UTC While database activity and responsiveness have now returned to normal levels, the whole team are extremely concerned that this problem has occurred multiple times in a row. Current development work is being set aside so the team can focus their full efforts on tracking down the problem and resolving it. We are also investigating a restart of our services about 40 minutes ago, which could indicate that a hardware problem on Azure was at least a contributing factor.

18:30 UTC The problem has begun reoccurring intermittently over the last few hours.

Resolved

22:30 UTC We've temporarily increased server hardware and database capacity to allow Gorilla functionality to return to normal while we continue to investigate the root cause of the problem. Gorilla services should no longer be affected.

June 10th, 2019

Database Issues

14:23 UTC A sudden surge in activity and corresponding database requests has caused Gorilla to become unresponsive. We're looking into it as a matter of urgency.

Resolved

14:50 UTC Database activity and system load have returned to normal levels and all systems are fully operational. The primary cause of the activity surge is under investigation and, once identified, measures will be taken to prevent its recurrence.

May 16th, 2019

Server Update

04:29 UTC May Update is being deployed.

04:38 UTC May Update deployed. All systems operational throughout.

April 16th, 2019

Server Update

05:09 UTC April Update is being deployed.

05:17 UTC April Update deployed. All systems operational throughout.

March 28th, 2019

Login Issues

10:30 UTC Some accounts are having trouble accessing Gorilla after accepting the new Terms and Conditions. We have identified the problem and are deploying a fix.

Resolved

10:34 UTC All systems are fully operational.

To add a bit of context as to what happened here: there was essentially a null check missing in some new logic that we run when loading the home page (gorilla.sc/admin/home), which caused the home page to hang. As users are redirected to their home page after accepting the terms and conditions, it appeared as if Gorilla was unresponsive, but it was actually only the home page that was affected (other pages, e.g. /admin/projects or /admin/myaccount, would load fine). Additionally, participants would have been unaffected (they never load that page) and so live experiments would have continued as normal. Sorry for the inconvenience!
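
For readers unfamiliar with this class of bug, here is a minimal sketch of what a missing null check looks like and how a guard avoids it. It is not the actual Gorilla code: the function, its parameters, and the banner text are all hypothetical and exist only for this example.

```python
from typing import Optional


def terms_banner(accepted_version: Optional[str], current_version: str) -> str:
    """Return the banner text for a hypothetical home page.

    `accepted_version` is None for an account that has no recorded
    acceptance of the terms yet.
    """
    # The guard: handle the missing value before using it. Without this
    # check, calling .strip() on None would raise AttributeError and the
    # page would fail to load.
    if accepted_version is None:
        return "Please accept the Terms and Conditions."
    if accepted_version.strip() != current_version:
        return "The Terms and Conditions have been updated."
    return ""
```

The point of the sketch is only that new logic on a page every user is redirected to should handle the "no value yet" case explicitly.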

March 28th, 2019

Server Update

05:12 UTC March Update is being deployed.

05:27 UTC March Update deployed. There was about 4 minutes of downtime between 05:23 and 05:27; we're investigating the causes.

February 19th, 2019

Server Update

05:36 UTC February Update is being deployed.

05:42 UTC February Update deployed. All systems operational throughout.

January 22nd, 2019

Server Update

05:08 UTC January Update is being deployed.

05:22 UTC January Update deployed. All systems operational throughout.

December 13th, 2018

Server Update

05:19 UTC December Update is being deployed.

05:38 UTC December Update deployed. All systems operational throughout.

November 13th, 2018

Server Update

05:06 UTC November Update is being deployed.

05:24 UTC November Update deployed. All systems operational throughout.

October 16th, 2018

Server Update

04:45 UTC October Update is being deployed.

05:12 UTC October Update deployed. All systems operational throughout.

September 12th, 2018

Server Update

04:39 UTC September Update is being deployed.

04:44 UTC September Update deployed. All systems operational throughout.

August 3rd, 2018

Server Update

05:20 UTC July Update is being deployed.

05:31 UTC July Update deployed. All systems operational throughout.

July 9th, 2018

Database Issues

11:40 UTC A sudden database surge has caused Gorilla to become unresponsive. We're looking into it as a matter of urgency.

Resolved

12:09 UTC All systems are fully operational.

June 27th, 2018

Server Update

04:14 UTC June Update is being deployed.

04:26 UTC June Update deployed. All systems operational throughout.

May 24th, 2018

Database Issues

10:01 UTC A minor database migration which should have been quick appears to be taking much longer than expected, causing the system to stop working. We're looking into this as a matter of urgency.

Resolved

10:12 UTC All systems are fully operational.

May 7th, 2018

Server Update

04:00 UTC May Update is being deployed.

07:00 UTC May Update deployed. All systems operational throughout.

March 27th, 2018

Server Update

06:44 UTC April Update is being deployed.

06:51 UTC April Update deployed. All systems operational throughout.

March 13th, 2018

Server Update

09:12 UTC March Update is being deployed.

09:20 UTC March Update deployed. All systems operational throughout.

February 6th, 2018

Server Update

08:44 UTC February Update is being deployed.

08:47 UTC February Update deployed. All systems operational throughout.

December 5th, 2017

Server Update

08:29 UTC December Update is being deployed.

08:37 UTC December Update deployed. All systems operational throughout.

November 7th, 2017

Server Update

08:50 UTC November Update is being deployed.

09:01 UTC There is quite a lot of live traffic, so we're holding off updating the server for a few minutes to see if it dies down.

09:16 UTC November Update deployed. All systems operational throughout.

October 4th, 2017

Minor Outage

12:37 UTC The server was suddenly unavailable for around 40 seconds. We have traced this to an unexpected reboot of all our servers. We will follow this up with our cloud hosting provider to ensure it does not happen again.

Resolved

12:38 UTC All systems are fully operational.

September 26th, 2017

Database Issues

09:36 UTC An incorrect configuration setting led to too many database connections being established, flooding the database.

09:42 UTC The database connections have been cooled. Everything appears to be back to normal, but we will continue monitoring the situation.

Resolved

10:10 UTC The database load is back to normal. All systems are fully operational.
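
As an illustration of the kind of safeguard that keeps a misconfigured setting from flooding a database, here is a minimal sketch of a hard cap on concurrent connections. It is not Gorilla's actual configuration or code: the `ConnectionLimiter` class and its parameters are hypothetical, and it uses only Python's standard library.

```python
import threading
from contextlib import contextmanager


class ConnectionLimiter:
    """Cap the number of database connections in use at once (hypothetical)."""

    def __init__(self, max_connections: int) -> None:
        # Reject obviously bad settings up front rather than at runtime.
        if max_connections < 1:
            raise ValueError("max_connections must be at least 1")
        self.max_connections = max_connections
        self._slots = threading.BoundedSemaphore(max_connections)

    @contextmanager
    def connection(self, timeout: float = 1.0):
        # Wait up to `timeout` seconds for a free slot instead of opening
        # yet another connection when the cap has been reached.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free database connection slot")
        try:
            yield
        finally:
            # Always return the slot, even if the caller's work failed.
            self._slots.release()
```

Code that needs the database would wrap its work in `with limiter.connection():`, so that requests beyond the cap wait or fail fast rather than adding load.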

September 26th, 2017

Server Update

07:20 UTC October Update is being deployed.

07:34 UTC October Update deployed. All systems operational throughout.

September 12th, 2017

Database Issues

04:00 UTC A migration that was scheduled to run in the background is taking far longer than expected and consuming all database resources. The development team are working to try and speed it up.

09:19 UTC The migration is still running.

13:43 UTC Good grief - this is taking forever. Sorry everyone. We've weighed up the pros and cons of trying to intervene, but we think that any attempt to meddle is just going to create more problems that will take even longer to clear up. We think we just need to sit on our hands and wait.

Resolved

14:10 UTC The migration is now complete and all services are operational.