Email Cluster A Status Archive

Email Cluster A is Online

Updated Thursday, July 29th, 2010 at 3:43 AM ET
2010-07-29 at 7:43 UTC

The event related to provisioning services on Mail Cluster A has been resolved. The root cause appears to have been related to a power event at the data centre. We are still waiting for the data centre facilities team to identify the cause of the event and correct it.

In the meantime, we have connected alternative power to restore service. We may perform emergency maintenance in the near future. Thank you for your patience, and we apologize for the inconvenience.

Incident Summary:
Our Network Operations Centre monitoring received alarms for power failures at our Toronto data centre. Our technical teams began their investigation at approximately 05:00 UTC and determined that email provisioning services for Cluster A were also degraded. Mailbox access via IMAP, POP and Webmail, as well as inbound and outbound mail flow, was unaffected. Provisioning changes such as adding new accounts and changing passwords were affected.

Our data centre provider was engaged to investigate. We confirmed the scope of the issue and notified customers at 05:34 UTC. There was a power failure in the data centre which affected a rack of equipment. Email provisioning services were restored by 07:43 UTC. Due to built-in system redundancy, email provisioning was the only affected service. A software defect created an unnecessary dependency between the provisioning subsystem and some of the other subsystems that were affected by the power failure. This defect will be corrected in a future email software release.

Email Cluster A is Degraded

Updated Thursday, July 29th, 2010 at 1:43 AM ET
2010-07-29 at 5:43 UTC

Our operations team is currently investigating an issue related to degraded performance. We're speaking with our data centre techs and expect to have some more details on this event soon.

Update:
06:16 UTC: a power event in our Toronto data centre has affected provisioning related to mail cluster A. Mail delivery is unaffected, but provisioning-related functions are unavailable. We've escalated the issue and are waiting on the DC facilities team to further investigate onsite.

06:57 UTC: one of our engineers has arrived at the data centre and is investigating further. We expect a DC facilities technician to arrive within the hour (08:00 UTC) to further troubleshoot the electrical power issue.

Email Cluster A is Online

Updated Saturday, June 26th, 2010 at 5:00 AM ET
2010-06-26 at 9:00 UTC

The OpenSRS Email Services - Cluster A maintenance is complete.

Email Cluster A is In Maintenance

Updated Saturday, June 26th, 2010 at 12:45 AM ET
2010-06-26 at 4:45 UTC

We have scheduled a four-hour network maintenance window for OpenSRS Email Services - Cluster A starting now (05:00 UTC). During this time, we will be updating our calendar services.

Service Impact:

OpenSRS Email Services - Cluster A will be available during the four-hour window, except for calendar services. All mailbox access (IMAP, POP, SMTP, Webmail) and provisioning via the Mail Administration Center (MAC) will remain fully online.

Impact for Resellers:
Calendar services will be unavailable for the duration of the window. All other OpenSRS Email Services will remain available.

Impact for End-users:

Calendar services will be unavailable for the duration of the window. All other OpenSRS Email Services will remain available. Calendar functionality will remain unchanged. This is an infrastructure change only.

Email Cluster A is Online

Updated Sunday, June 13th, 2010 at 4:13 PM ET
2010-06-13 at 20:13 UTC

OpenSRS Email Services - Cluster A are available.

Incident Summary

2010-06-13 15:45 UTC: Our monitoring systems first alerted our Network Operations Centre staff to the presence of slow connections to a number of services and network systems.

2010-06-13 16:00 UTC: Operations staff became involved and began investigating the cause of the incident.

2010-06-13 16:15 UTC: A large number of alerts continued to show that core network components were overloaded, affecting connections to Email Cluster A, Domains/SSL Provisioning and Management, Managed DNS, and Storefront. Monitoring probes were registering significant spikes of abnormal traffic across a number of our services. This spike in traffic saturated a number of core network components. The incident was labeled Critical and was escalated to the Executive level.

2010-06-13 16:15 - 20:00 UTC: Using various traffic mitigation strategies, our Operations staff managed to balance all incoming data streams to our datacenter, easing pressure on a number of core network components. After identification and analysis, network elements were configured to drop unwanted traffic at the network edges.

2010-06-13 20:00 UTC: Email Cluster A service was returning to normal operation.

2010-06-13 20:15 UTC: Domain Service was set to degraded status while further traffic mitigation work was performed.

2010-06-13 21:00 UTC: All services online.

Email Cluster A is Degraded

Updated Sunday, June 13th, 2010 at 3:40 PM ET
2010-06-13 at 19:40 UTC

OpenSRS Cluster A is currently degraded. We have been experiencing a high traffic load for approximately 3 hours and 30 minutes. All of our technical teams are fully engaged and investigating.

Email Services, including mailbox access (IMAP, POP, SMTP and webmail), are available. However, if you are using our DNS for your email services, you may experience degraded service. Provisioning may also be affected. Email Cluster B is unaffected.

Our senior executive team is engaged and working to obtain details. We apologize for the inconvenience to you and your customers.

PLEASE NOTE: Our main communication channels, including http://status.opensrs.com, were affected during this issue. We will provide messages via email, and our Twitter account (http://twitter.com/@opensrsstatus) will have regular updates.

The OpenSRS Team

Email Cluster A is Online

Updated Saturday, June 5th, 2010 at 5:54 PM ET
2010-06-05 at 21:54 UTC

OpenSRS Cluster A is fully online. We have resolved the issue and will continue to monitor the service.

Incident Summary:

2010-06-05 17:15 UTC: Our monitoring systems first alerted our Network
Operations Centre staff to the presence of slow connections to a
number of services and network systems.

2010-06-05 17:30 UTC: Operations staff became involved and started
working on the incident.

2010-06-05 17:45 UTC: A large number of alerts continued to show that
core network components were overloaded. Incoming traffic to our
Canadian datacenter location was more than quadruple the normal peak
traffic level. This spike in traffic saturated our upstream links. The
incident was labeled Critical and was escalated to the Executive
level.

2010-06-05 18:00 - 20:00 UTC: Using various traffic routing strategies
over primary and backup data links, our Operations staff managed to
balance all incoming data streams to our datacenter, easing pressure
on a number of core network components. After identification and
analysis, network elements were configured to drop unwanted traffic at
the network edges.

2010-06-05 20:00 - 22:00 UTC: Services delivered out of our Canadian
(Toronto) datacenter were returning to normal.

2010-06-05 22:00 - 00:00 UTC: Operations staff noted that the
intensity of the unwanted incoming traffic was decreasing.

2010-06-06 01:00 UTC: Operations staff noted that the attack signature
disappeared completely from incoming data streams.

Email Cluster A is Degraded

Updated Saturday, June 5th, 2010 at 4:22 PM ET
2010-06-05 at 20:22 UTC

Beginning on 2010-06-05 at 17:51 UTC, the OpenSRS network experienced extremely high load that overloaded our network and one of our upstream providers. Since then we have redistributed load across a number of providers, and we believe we have identified the cause of the load and mitigated its impact.

We have restored intermittent service but are keeping the service status at Degraded while we continue to monitor and make adjustments to fully restore service.

An incident report will be provided by 21:00 UTC Monday.

Email Cluster A is Degraded

Updated Saturday, June 5th, 2010 at 2:23 PM ET
2010-06-05 at 18:23 UTC

We are currently experiencing intermittent connection issues. We are investigating the problem.

Email Cluster A is Online

Updated Friday, May 21st, 2010 at 3:06 PM ET
2010-05-21 at 19:06 UTC

Customers using domain aliases to log in to webmail should no longer encounter login issues. We have tested and implemented a fix.

All other Email services were unaffected.