Domain Service Status Archive

Domain Service is Online

Updated Friday, March 12th, 2010 at 7:55 AM ET
2010-03-12 at 12:55 UTC

Domains services are online.

Update:
We were investigating an intermittent 'internal error' message for parked pages. This has been resolved.

Domain Service is Online

Updated Monday, March 8th, 2010 at 3:07 PM ET
2010-03-08 at 20:07 UTC

Domains services are online. For the past hour, there were no instances of intermittent timeouts for Domain look-ups.

Incident Summary: (updated 22:29 UTC/17:29 ET)
Over the past week, we have been conducting a phased code roll-out for new OpenSRS look-up functionality. These changes will significantly improve the response times for all look-ups (including name suggestion calls).

The new functionality uses parallel streamed look-up elements (name suggestion, calls to individual Registries, etc.) and new levels of caching (in-memory caching of recent look-ups, plus data from zone files) to achieve the response time goals.

We had been performing volume and stress tests in QA and development environments since October 2009, but we recognized that we could not completely reproduce production loads and request profiles. Therefore we have been extensively testing this functionality in small pieces in Production since December 2009. Positive results encouraged us to roll these out as integrated components starting last Tuesday March 2nd.

We steadily increased the load on the new infrastructure through Thursday March 4th, until roughly 80% of load was using the new functionality by Friday March 5th. On that day we first experienced an issue where particular ‘workers’ (components that process different types of look-up commands) reached a maximum queue size that, due to volume, they could not clear. This resulted in some timeouts back to requesting clients. These workers were restarted and full service resumed within roughly an hour. Analysis then pointed to an issue with the system time being out of sync between components on different machines. This was corrected on all machines and no issues were detected for the remainder of Friday and over the weekend.

However, we experienced the same intermittent domain look-up timeout issue again this morning (Monday March 8th). The development team was prepared for such an occurrence and gathered more granular log information before the workers were again restarted to restore service.

Analysis of the logs indicated that:

1 - we were seeing a request pattern and volume that we did not encounter in the QA and development environments.
2 - our application load-balancing component was reacting to that pattern of requests by favouring certain types of look-ups for faster processing.
3 - certain types of workers which had been configured to handle high-volume look-up transactions (mostly .COM and .NET look-ups) were also configured to handle the lower volume (but higher latency) look-ups that were being favoured by the load-balancing function. This eventually caused the queues for high-volume transactions to fill up and not get relieved.
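
The effect described in these three points can be reproduced with a small, deliberately simplified simulation. It is a sketch under stated assumptions rather than a model of the production system: the tick-based timing, the worker count, the arrival rates and the queue limit are all invented for illustration.

```python
def simulate(workers, fast_per_tick, slow_per_tick, slow_ticks, ticks, max_queue):
    """Shared pool where slow look-ups are favoured over fast ones.

    Returns (fast look-ups still queued, fast look-ups dropped as timeouts).
    """
    busy = [0] * workers  # remaining ticks of work for each worker
    fast_queue = 0
    slow_queue = 0
    timeouts = 0
    for _ in range(ticks):
        slow_queue += slow_per_tick
        fast_queue += fast_per_tick
        # The fast queue is bounded; anything over the limit is counted as a timeout.
        if fast_queue > max_queue:
            timeouts += fast_queue - max_queue
            fast_queue = max_queue
        for i in range(workers):
            if busy[i] > 0:
                busy[i] -= 1
            if busy[i] == 0:
                if slow_queue > 0:  # favoured look-ups are served first
                    slow_queue -= 1
                    busy[i] = slow_ticks
                elif fast_queue > 0:
                    fast_queue -= 1
                    busy[i] = 1
    return fast_queue, timeouts


# Ten shared workers, six fast look-ups per tick, one slow look-up per tick
# that holds a worker for five ticks: the fast queue fills and never drains.
shared = simulate(workers=10, fast_per_tick=6, slow_per_tick=1,
                  slow_ticks=5, ticks=200, max_queue=50)

# The same fast traffic with no slow look-ups competing for the workers.
fast_only = simulate(workers=10, fast_per_tick=6, slow_per_tick=0,
                     slow_ticks=5, ticks=200, max_queue=50)
```

With these numbers the slow look-ups end up holding about half of the workers, so the remaining workers cannot keep up with the fast traffic even though each fast look-up is cheap.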

We have now implemented two things to fix the problem:

1 – Added a large amount of worker capacity to the pools
2 – Re-configured the workers to separate the high-volume lookups from all other types
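
A minimal sketch of the second step, assuming look-ups are routed by top-level domain and each pool has its own bounded queue. The pool names, queue sizes and the choice of .COM and .NET as the high-volume set are assumptions for the example, not the actual configuration.

```python
import queue

# Hypothetical split: TLDs treated as high-volume get a dedicated pool.
HIGH_VOLUME_TLDS = {"com", "net"}


def pool_for(domain):
    """Pick the pool name for a look-up based on its top-level domain."""
    tld = domain.rsplit(".", 1)[-1].lower()
    return "high_volume" if tld in HIGH_VOLUME_TLDS else "other"


class LookupRouter:
    """Keeps one bounded queue per pool so one class cannot crowd out the other."""

    def __init__(self, high_volume_size=1000, other_size=200):
        self.queues = {
            "high_volume": queue.Queue(maxsize=high_volume_size),
            "other": queue.Queue(maxsize=other_size),
        }

    def submit(self, domain):
        """Enqueue a look-up; return False instead of blocking when the pool is full."""
        try:
            self.queues[pool_for(domain)].put_nowait(domain)
            return True
        except queue.Full:
            return False
```

Because each pool has its own queue and its own workers, a backlog of slower look-ups can fill only the "other" queue and leaves the high-volume path untouched.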

Since then we have seen no recurrence of the issues and we believe that, having taken these steps, the service has returned to full stability. We are also re-evaluating our QA tests to look for this and similar issues in the future, to avoid a recurrence after new code is released.

Domain Service is Degraded

Updated Monday, March 8th, 2010 at 11:45 AM ET
2010-03-08 at 16:45 UTC

Customers may experience very intermittent service issues with domain look-ups. These blips last under 15 minutes. Our technical teams are monitoring, testing and investigating.

Update: (18:05 UTC/13:05 ET)
Our Network Operations Center team advises that instances of intermittent domain look-up timeouts have decreased significantly. All of our technical teams are working to test, monitor and address this issue.

Update: (19:01 UTC/14:01 ET)
Domain look-ups are working well. We implemented a change to our systems and are closely monitoring the results. We are leaving our status message as "Degraded" while we continue to closely test and analyze services.

Domain Service is Online

Updated Monday, March 8th, 2010 at 10:28 AM ET
2010-03-08 at 15:28 UTC

Domains services are online. Customers should no longer experience intermittent domain look-up issues.

We will provide an incident summary once it is available. Our technical teams continue to investigate.

Domain Service is Degraded

Updated Monday, March 8th, 2010 at 9:52 AM ET
2010-03-08 at 14:52 UTC

We are investigating intermittent issues with domain look-ups, including those made via the API. Resellers may experience timeouts. We will have more details soon.

Update: (15:00 UTC/10:00 ET)
Our technical teams are reviewing the logs. We continue to investigate.

Update: (15:13 UTC/10:13 ET)
Our technical teams are testing and seeing more successful results with look-ups. Services continue to be degraded while we work to analyze and address the symptoms.

Domain Service is Online

Updated Friday, March 5th, 2010 at 8:06 AM ET
2010-03-05 at 13:06 UTC

Domains services are online. We addressed the issue of a high number of connections for domain look-ups. Our technical teams continue to investigate.

Incident Summary: (21:14 UTC)
Domain look-ups were experiencing intermittent connection issues. We notified resellers at 07:45 ET/12:45 UTC. Our technical teams investigated the logs. The symptoms subsided naturally. Services were restored at 08:06 ET/13:06 UTC. We continue to investigate the root cause.

Domain Service is Degraded

Updated Friday, March 5th, 2010 at 7:45 AM ET
2010-03-05 at 12:45 UTC

Resellers may experience intermittent connection issues for domain look-ups. Our technical teams are investigating. We will have more details soon.

Update: (12:59 UTC)
Symptoms have decreased. The technical teams continue to test and review logs. Resellers should see a reduction of look-up issues.

Domain Service is Online

Updated Friday, February 26th, 2010 at 2:01 PM ET
2010-02-26 at 19:01 UTC

We encountered a brief power outage at OpenSRS headquarters. Only our Technical Support ticketing and phone systems were temporarily unavailable. All other services remained fully online.

Domain Service is Online

Updated Tuesday, February 16th, 2010 at 4:44 PM ET
2010-02-16 at 21:44 UTC

All Domains services are online. We addressed the issue which was causing domain look-up/name suggestion errors.

Incident summary (Updated: February 17, 2010 16:17 ET/21:17 UTC):

The incident was caused by a name resolution timeout error on a database component that is part of the name suggestion service. As a result, domain look-ups were intermittently unavailable. We advised customers at 16:22 ET/21:22 UTC. Our technical teams quickly identified and addressed the problem. The root cause was determined to be a minor authorized change to our internal DNS service, implemented a number of hours before the incident. We advised customers that the issue was resolved at 16:44 ET/21:44 UTC.

Domain Service is Degraded

Updated Tuesday, February 16th, 2010 at 4:22 PM ET
2010-02-16 at 21:22 UTC

Resellers may encounter errors with domain look-ups and using the name suggestion tool. Our technical teams are investigating. All other domains services are available.
