Disaster Recovery- Information Management Systems


Running head: LOOP INC. DISASTER RECOVERY PLAN 1


Loop Inc. Disaster Recovery Plan

Manogna Telikapalli

BA63570 G2 Disaster Recovery

Professor Gulsebnem (Sheb) Bishop



Executive Summary

As an e-commerce company that offers several forms of online services, Loop Inc. needs a strong disaster recovery plan. The company specializes in selling software, applications, and other electronic software products through its e-commerce platform. This business, together with others such as cloud storage services, means the company handles a large volume of data. This paper outlines the company’s Disaster Recovery Plan and the ways the company can recover its Information Technology function after a disaster. Disasters are both unpredictable and inevitable, which is why a disaster recovery plan must be in place. The guidelines in this plan can be applied to Loop Inc. as a whole or to any subsystem within the enterprise.

Loop Inc.’s disaster recovery plan focuses on the amount of downtime, which will be measured in days. The plan highlights the possible causes of disasters by treating human, natural, and mechanical factors as the key sources. It also addresses the effects of disasters and develops guidelines for recovering from them. It identifies and classifies the key risks or threats that may lead to disasters for the company, and it defines the processes and resources that support business continuity during a disaster. Finally, the plan defines the reconstitution mechanism for returning the business to normal operations and for working through the after-effects of the disaster.

Table of Contents

Loop Inc. Disaster Recovery Plan

Executive Summary

Identification of Disaster Risks

Risk Classification

Risk Assessment

Effects of Disasters

Disaster Recovery Mechanisms

Disaster Recovery Phases

Recommendations

References

Identification of Disaster Risks

Loop Inc., like most other online companies, faces both direct and indirect risks to its business. The identification of disaster risks is guided by the essential functions of the business, which set the ground for assessing and mitigating the risks. Essential functions are those whose interruption would greatly disrupt the flow of business and may result in financial losses (Reason, 2016). Online presence and functionality is the key function of the business: the company relies on its online presence to offer services to its clients. Other risks exist, but online presence should be given the highest priority. Risk evaluation was based on a number of attributes, as shown below.

Figure 1 Risk Attributes


Online functionality faces numerous risks, such as network failure, power outages, and physical or electronic damage to supporting equipment and facilities. The magnitude of a risk depends on the affected component and its effect on the company’s core functions. The effects of disasters that affect the entire business, for example, server failure, are different from those of disasters that affect specific sections of the business.

Risk Classification

The evaluation process allowed for the categorization of risks into different classes to help the company accurately prioritize them. Loop Inc.’s risks can be classified as data systems risks, external risks, facility risks, departmental risks, and desk-level risks.

i. Data systems risks

Data systems risks are associated with the use of shared infrastructure, for example, software applications, file servers, and networks. The failure of this shared infrastructure can affect several departments of the business. The analysis of these risks has helped identify all specific points of failure within Loop Inc.’s data systems architecture (Sadgrove, 2016). Inappropriate operating processes can also result in data systems risks.

The company may face a lengthy and expensive recovery from such failures because software, equipment, or personnel may need to be updated or replaced.

Loop Inc.’s data system risks will be evaluated under the following subcategories:

· Telecommunication systems and network

· Data communication network

· Shared servers

· Data storage or backup systems

· Software applications and bugs

· Viruses

ii. External risks

External risks are those associated with failures outside the enterprise. These risks are noteworthy because they are not under the company’s control. External risks for the company can be natural, human-caused, civil, or supplier-related. Natural disasters are key to this disaster recovery plan and top the list because they damage a large geographic area. Earthquakes have been noted as a major risk to Loop Inc. because most of the company’s facilities are located in earthquake-prone areas (Lan & Mojtahedi, 2017). The chances of mitigating these risks are considerable because meteorological threats can be forecast and the company can set up disaster recovery facilities. Human-caused risks include sabotage, acts of terrorism, crimes, and operational mistakes, among others. Civil risks include labor disputes, local political instability, and software legal claims, among others. Civil risks can be either internal or external to the company.

iii. Facility risks

Loop Inc. is highly dependent on the wellbeing of its local facilities. Facility risk analysis considers power sources, communication facilities, availability of water, climate control, the ability to avoid or control fire, structural risks, and physical security, among others (Lan & Mojtahedi, 2017). Securing the company’s facilities is a mandatory measure to protect assets from both employees and outsiders.

iv. Departmental risks

Failures within specific departments can be a risk to the company. Such risks can include the failure to load given scripts on the company’s system and missing communication links within a department among others. Unavailability of skilled employees can be a risk to the company’s output or performance.

v. Desk-level risks

The successful operation of Loop Inc. depends largely on the day-to-day personal work of its employees. Desk-level risks therefore require analyzing and accounting for all the processes and tools that support an employee’s job.

Risk Assessment

The risk assessment builds on the completed risk classification. All risks will be scored and sorted into categories based on their impact and likelihood. The risk assessment form is the basis of the score sheet used in the scoring process. The score sheet lists each main risk category with its subcategories as groupings and, under each grouping, the specific risks that can affect the business. Likelihood, impact, and restoration time are each estimated on a scale from 0 to 10; likelihood is considered over a long planning period, such as 5 years, while impact is highly sensitive to time.

Below is an example of the company’s risk assessment form with all the key elements of a score sheet. The rough projected risk score is obtained by multiplying the likelihood, impact, and restoration time (Webber & Wallace, 2017). The total risk score is zero whenever any of these columns holds a zero value. Sorting the score sheet in descending order puts the biggest risks, those that require the most attention, at the top.

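Table 1 Risk Assessment Form

Risk Assessment Form: External risks

Date:

| Grouping | Risk | Likelihood (0 – 10) | Impact (0 – 10) | Restoration Time (0 – 10) | Score |
|---|---|---|---|---|---|
| Natural disasters | Tornado | 0 | 9 | 10 | 0 |
| Natural disasters | Severe thunderstorm | 1 | 4 | 2 | 8 |
| Natural disasters | Earthquake | 5 | 9 | 10 | 450 |
| Natural disasters | Hail | 7 | 2 | 6 | 84 |
| Natural disasters | Snow or ice | 8 | 5 | 7 | 280 |
| Human-caused risks | Poor skills | 1 | 6 | 10 | 60 |
| Human-caused risks | Human error | 2 | 5 | 7 | 70 |
| Human-caused risks | Sabotage | 1 | 8 | 2 | 16 |
| Human-caused risks | Power supply cut | 9 | 9 | 2 | 162 |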
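The scoring rule described above (score = likelihood × impact × restoration time, then sort in descending order) can be sketched in a few lines of Python. This is only an illustrative sketch: the `Risk` class and `rank_risks` function are hypothetical names introduced here, and the values are copied from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    grouping: str
    name: str
    likelihood: int        # 0-10, estimated over the planning period
    impact: int            # 0-10
    restoration_time: int  # 0-10

    @property
    def score(self) -> int:
        # Rough risk score: product of the three estimates.
        # A zero in any column gives a total score of zero.
        return self.likelihood * self.impact * self.restoration_time


# Values copied from Table 1.
RISKS = [
    Risk("Natural disasters", "Tornado", 0, 9, 10),
    Risk("Natural disasters", "Severe thunderstorm", 1, 4, 2),
    Risk("Natural disasters", "Earthquake", 5, 9, 10),
    Risk("Natural disasters", "Hail", 7, 2, 6),
    Risk("Natural disasters", "Snow or ice", 8, 5, 7),
    Risk("Human-caused risks", "Poor skills", 1, 6, 10),
    Risk("Human-caused risks", "Human error", 2, 5, 7),
    Risk("Human-caused risks", "Sabotage", 1, 8, 2),
    Risk("Human-caused risks", "Power supply cut", 9, 9, 2),
]


def rank_risks(risks):
    """Return the risks sorted so the highest score comes first."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    for risk in rank_risks(RISKS):
        print(f"{risk.score:>4}  {risk.grouping}: {risk.name}")
```

With the Table 1 values, the earthquake (450) ranks first, followed by snow or ice (280) and the power supply cut (162).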

Effects of Disasters

The previous sections of this disaster recovery plan help in assessing risks and deciding where to cover the most critical ones. This section determines and lists the most probable effects of each disaster; Loop Inc.’s disaster recovery process will cover these specific effects. Multiple causes can produce the same effect, and these effects can, in turn, lead to other effects. This recovery plan focuses on earthquakes and power supply cuts as some of the main natural and human-caused risks, respectively (Reason, 2016). An earthquake leads to the failure of several entities, such as the company’s office facilities, operations staff, power system, telephone system, and data systems. Below is a sample mapping of the causes, effects, and affected entities for earthquakes and power supply cuts.

Table 2 Disaster Affected Entities

| Risk (Disaster) | Effects | Affected Entity |
|---|---|---|
| Earthquake | Telecom failure | Telephone instruments and network |
| Earthquake | Desktops destroyed | Desktops and workstations |
| Earthquake | Office space destroyed | Office space |
| Earthquake | Power disruption | Power |
| Earthquake | Operators cannot report to work | Office staff |
| Earthquake | Data systems destroyed | Data systems |
| Power supply cut | Data systems powered off | Data systems |
| Power supply cut | Desktops powered off | Desktops/workstations |
| Power supply cut | Power disruption | Power |
| Power supply cut | Telecom failure | Telephone instruments and network |
| Power supply cut | Data network down | Network devices and links |

The table above shows that several disasters may affect the same entities, which helps identify the entities that are most affected. Data systems and power are the entities with the highest probability of being affected because they support most of the company’s other entities.
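As a rough illustration of how the overlap between disasters can be read off Table 2, the sketch below counts how many disasters touch each entity. The function names are hypothetical, and "Desktops/workstations" is treated as the same entity as "Desktops and workstations", which is an assumption made here for the comparison.

```python
from collections import Counter

# Affected entities per disaster, copied from Table 2.
# "Desktops/workstations" is normalized to "Desktops and workstations"
# (an assumption that both rows name the same entity).
AFFECTED = {
    "Earthquake": [
        "Telephone instruments and network",
        "Desktops and workstations",
        "Office space",
        "Power",
        "Office staff",
        "Data systems",
    ],
    "Power supply cut": [
        "Data systems",
        "Desktops and workstations",
        "Power",
        "Telephone instruments and network",
        "Network devices and links",
    ],
}


def entity_hit_counts(affected):
    """Count the number of distinct disasters that affect each entity."""
    counts = Counter()
    for entities in affected.values():
        counts.update(set(entities))
    return counts


def shared_entities(affected):
    """Return the entities affected by every listed disaster, sorted by name."""
    counts = entity_hit_counts(affected)
    return sorted(e for e, n in counts.items() if n == len(affected))


if __name__ == "__main__":
    for entity in shared_entities(AFFECTED):
        print(entity)
```

A simple count only shows which entities appear under both disasters. It does not capture the point made in the text that data systems and power support most other entities, so the count is a starting point for the analysis rather than the full ranking.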

Determining the effects of disasters also requires the company, through the recovery plan, to set downtime tolerance limits. The downtime limits will be based on the “Affected Entity” list, with each entity having its own limit (Torabi & Sahebjamnia, 2015). The tolerance limits will be sorted in ascending order, and the entities with the lowest tolerance limits will be given the highest priority for recovery. The cost of downtime is one of the metrics used to evaluate downtime tolerance limits.

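Table 3 Risk Tolerance Limits

| Risk (Disaster) | Affected Entity | Cost of Downtime (0-5) | Tolerance Limits (0 - 10) |
|---|---|---|---|
| Earthquake | Telephone instruments and network | 4 | 2 |
| Earthquake | Desktops and workstations | 3 | 3 |
| Earthquake | Office space | 3 | 3 |
| Earthquake | Power | 4 | 1 |
| Earthquake | Office staff | 2 | 4 |
| Earthquake | Data systems | 5 | 1 |
| Power supply cut | Data systems | 5 | 1 |
| Power supply cut | Desktops/workstations | 3 | 3 |
| Power supply cut | Power | 4 | 1 |
| Power supply cut | Telephone instruments and network | 4 | 2 |
| Power supply cut | Network devices and links | 4 | 1 |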
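The prioritization rule stated before Table 3 (sort tolerance limits in ascending order and recover the least tolerant entities first) can be sketched as follows. The `Entity` type and `recovery_priority` function are hypothetical names; breaking ties by the higher cost of downtime is an assumption added here, since the plan does not say how ties are resolved.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    name: str
    cost_of_downtime: int  # 0-5, from Table 3
    tolerance_limit: int   # 0-10, from Table 3


# One row per distinct entity in Table 3.
ENTITIES = [
    Entity("Telephone instruments and network", 4, 2),
    Entity("Desktops and workstations", 3, 3),
    Entity("Office space", 3, 3),
    Entity("Power", 4, 1),
    Entity("Office staff", 2, 4),
    Entity("Data systems", 5, 1),
    Entity("Network devices and links", 4, 1),
]


def recovery_priority(entities):
    """Sort by ascending tolerance limit.

    Ties are broken by higher cost of downtime first (an assumption,
    not a rule stated in the plan). Remaining ties keep input order.
    """
    return sorted(entities, key=lambda e: (e.tolerance_limit, -e.cost_of_downtime))


if __name__ == "__main__":
    for rank, entity in enumerate(recovery_priority(ENTITIES), start=1):
        print(rank, entity.name, entity.tolerance_limit, entity.cost_of_downtime)
```

Under these assumptions, data systems come first, followed by power and the network devices and links, while office staff, with the highest tolerance limit, come last.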

The investment required for any recovery plan is based on the cost of downtime, which can be tangible or intangible. Tangible costs are consequences of business interruption, such as lost productivity and reduced revenue. Intangible costs are lost opportunities, for example when the company loses reputation or customers turn to competitors. The recovery plan also recognizes several interdependencies among the affected entities. Some disaster-affected entities need a detailed recovery sequence; for example, data system restoration depends on the restoration of power.
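One way to derive a recovery sequence from such interdependencies is a topological sort of a dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). Only the dependency of data systems on power comes from this plan; the other edges are hypothetical and included purely to make the example non-trivial.

```python
from graphlib import TopologicalSorter

# Maps each entity to the entities that must be restored before it.
DEPENDS_ON = {
    # Data systems -> Power is stated in the plan;
    # the dependency on the network is an assumption.
    "Data systems": {"Power", "Network devices and links"},
    # Assumed dependencies, for illustration only.
    "Network devices and links": {"Power"},
    "Telephone instruments and network": {"Power"},
    "Power": set(),
}


def recovery_sequence(depends_on):
    """Return one valid restoration order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, entity in enumerate(recovery_sequence(DEPENDS_ON), start=1):
        print(step, entity)
```

Any order the sorter returns restores power before the entities that depend on it. If the graph contained a circular dependency, `static_order()` would raise `graphlib.CycleError`, which would signal that the dependency list itself needs review.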

Disaster Recovery Mechanisms

Once the list of affected entities has been prepared and their failure tendency assessed, there is enough groundwork to analyze the recovery methods available for each entity. This analysis helps identify the most suitable recovery method for each entity.

i. Data systems

Disaster recovery facilities are key to supporting effective data redundancy for the company’s onsite data center. These facilities will act as offsite data storage and will also hold recovery systems for other failures, such as power cuts, network outages, storage failures, and loss of connectivity to paths and devices. To increase redundancy and reduce the need for disaster recovery, technologies such as redundant array of independent disks (RAID) and mirroring will be used in the software layer (Chang, 2015). On-site data center redundancy is one way of providing fast recovery from a hardware or software error, because a full disaster recovery is then not needed.

The company’s business needs will determine the nature of each disaster recovery mechanism. Loop Inc. will keep several duplicates of its data center to ensure that the company’s business processes are not affected by the loss of any one site. The company can also build its own data center specifically for disaster recovery purposes, equipped with the basic hardware required to keep the business running. Loop Inc. can eventually opt for a colocation facility, where it can access data center services on a rental basis.

ii. Major incident process

The major incident process aims at the efficient resolution of incidents that have a key impact on the company’s critical business processes. The process ensures sufficient quality and quantity of communication during major incidents, as well as sufficient resources for their resolution (Torabi & Sahebjamnia, 2015). It also provides a systematic incident review to prevent similar incidents from recurring. To realize these objectives, Loop Inc. will use its customized “Major Incident Handling Plan Model”, which is anchored on communication.

Figure 2 Major Incident Workflow

The major incident handling model will need a suitable major incident team for it to work successfully. The major incident team will consist of the problem manager, major incident manager, incident manager, and the service desk manager among other members. Loop Inc. will need a team that can accurately and swiftly tackle any incident in question while maintaining good customer relations. The team will also be responsible for the root cause analysis after resolving a given incident.

Disaster Recovery Phases

Loop Inc.’s disaster recovery process will proceed through three sequential phases: activation, execution, and reconstitution. The activation phase involves assessing and announcing the effects of the disaster (Kerzner & Kerzner, 2017). The execution phase involves carrying out the actual procedures for recovering from each disaster, with the company’s business operations restored on the recovery facilities or systems. In the last phase, reconstitution, the execution-phase procedures are stopped after the original system is restored.

i. Activation phase

The activation phase involves notification procedures, damage assessment, and disaster recovery activation planning. Notification procedures depend heavily on effective communication because they are the first measures taken as soon as an emergency or disruption has been predicted or detected. They describe how to notify the recovery personnel both during and outside working hours (Torabi & Sahebjamnia, 2015). Once a disaster is detected, a notification is sent to the damage assessment team so that it can assess the real damage and implement subsequent actions.

Notifications from one team to another can take place by pager, telephone, cell phone, or e-mail. Loop Inc. has a notification policy that describes the procedures to follow when required personnel cannot be contacted. This policy is clearly documented in the contingency plan. To document primary and alternate contact methods, Loop Inc. will use a call tree as shown below. The call tree includes procedures to follow when a specific individual cannot be contacted.

Figure 3 Call Tree Chart

The contact list in the plan will clearly identify the staff to be alerted, listed by name, role, and contact information. Where disrupted systems are interconnected with external organizations, the plan will provide a point of contact in each of those organizations (Richie & Kliem, 2015).
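The call tree idea, with primary and alternate contact methods and a fallback when an individual cannot be reached, can be sketched as a small recursive structure. Everything in this example is hypothetical: the `Contact` fields, the role names, and the `try_method` callback stand in for whatever the real contact list and paging system provide.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Contact:
    name: str
    role: str
    methods: List[str]                  # ordered: primary first, then alternates
    backup: Optional["Contact"] = None  # who to call if this person is unreachable
    reports: List["Contact"] = field(default_factory=list)


def notify(contact: Contact, try_method: Callable[[Contact, str], bool]) -> List[str]:
    """Walk the call tree and return a log of notification attempts.

    try_method(person, method) must return True when the person was reached.
    """
    log: List[str] = []
    person: Optional[Contact] = contact
    reached = False
    while person is not None and not reached:
        for method in person.methods:
            if try_method(person, method):
                log.append(f"reached {person.name} via {method}")
                reached = True
                break
        else:
            # Every method failed: record it and fall back to the backup.
            log.append(f"could not reach {person.name}")
            person = person.backup
    # Continue down the tree so one unreachable person does not block a branch.
    for report in contact.reports:
        log.extend(notify(report, try_method))
    return log


# Hypothetical example tree.
deputy = Contact("Damage assessment deputy", "Damage assessment", ["telephone"])
lead = Contact("Damage assessment lead", "Damage assessment",
               ["cell phone", "e-mail"], backup=deputy)
root = Contact("Recovery coordinator", "Coordination",
               ["pager", "telephone"], reports=[lead])

if __name__ == "__main__":
    unreachable = {"Damage assessment lead"}
    for line in notify(root, lambda p, m: p.name not in unreachable):
        print(line)
```

In this run the coordinator is reached on the primary method, the lead cannot be reached on any method, and the notification falls through to the deputy. A real implementation would also need to record timestamps and handle the case where the backup is unreachable as well.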

Damage assessment

Damage assessment will help establish how the contingency plan will be executed when the business’s services are disrupted. The nature and degree of the damage to the system are assessed as quickly as conditions permit. The evaluation should be done with personal safety as the highest priority. The damage assessment team should be the first to be notified of the incident, and it will use the damage assessment guidelines for investigating different types of disasters (Richie & Kliem, 2015). For a power outage in the data center facility, for example, the team can assess whether power can be restored before the facility’s UPS system runs out of stored power. If power cannot be restored in time, the disaster recovery plan can be activated immediately.
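The power outage check described above amounts to comparing the estimated restoration time with the remaining UPS runtime. The sketch below shows that comparison; the function name, its parameters, and the ten-minute safety margin are assumptions made for illustration and are not values taken from the plan.

```python
from datetime import timedelta
from typing import Optional


def should_activate_drp(
    ups_runtime_remaining: timedelta,
    estimated_power_restoration: Optional[timedelta],
    safety_margin: timedelta = timedelta(minutes=10),  # assumed buffer
) -> bool:
    """Return True when the disaster recovery plan should be activated.

    Activation is recommended when no restoration estimate is available,
    or when power is not expected back (plus a safety margin) before the
    UPS is depleted.
    """
    if estimated_power_restoration is None:
        return True
    return estimated_power_restoration + safety_margin >= ups_runtime_remaining


if __name__ == "__main__":
    # UPS can hold for 30 minutes, but power is expected back in 45 minutes.
    print(should_activate_drp(timedelta(minutes=30), timedelta(minutes=45)))
```

The margin exists because both inputs are estimates; the damage assessment team would set it according to how long a controlled failover to the recovery facility actually takes.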

Damage assessment processes will vary with each emergency, but Loop Inc. can use the following general checklist:

· Origin of the disruption or disaster

· Potential for additional emergencies or damage

· Area affected by the disaster

· Status of the physical infrastructure

· Inventory of the key equipment

· Functionality status of the important equipment

· Type of damage to equipment

· Items to be replaced

· Estimated restoration time for normal services

Activation planning

The disaster recovery plan should only be activated after a thorough damage assessment has been conducted, to avoid stalling normal business operations because of false alarms. The Disaster Recovery Committee will carry out disaster activation planning according to the extent of the damage from the disaster (Cook, 2015). The committee's plan should:

· Plan for communication between teams

· Catalog systems and services that need to be restored

· Catalog instructions for reporting failures to the team

· Show time estimates for each restoration

ii. Execution Phase

The execution phase involves bringing up the disaster recovery system, for example, temporary manual processing, operation, and recovery on an alternate system. Sequenced recovery activities should include instructions to coordinate with other teams in given situations, for example, when items need to be procured, when a key step is completed, and when an action is not completed within the estimated time frame (Lan & Mojtahedi, 2017). The listed recovery procedures will provide detailed processes for restoring the system and its components. Loop Inc.’s procedures for IT service damage will address actions such as:

· Acquire access authorization to damaged premises

· Notify users linked with the system

· Procure needed office supplies and a working space

· Secure and load backup media

· Restore critical application software and operating system

· Restore system data

· Test system functionality and security controls

· Connect the system back to other external systems of the network

iii. Reconstitution phase

In this phase, the business’s operations are transferred back to the original facility. Rebuilding can also be done in cases where the original facility is unrecoverable (Lan & Mojtahedi, 2017). This phase may last several days depending on the nature and severity of the destruction. The Disaster Recovery Committee will be involved in:

· Constantly monitoring the site or facility’s suitability for reoccupation

· Verifying that the site or facility is free from aftereffects of the disaster

· Establishing and maintaining connectivity between internal and external systems

· Ensuring full functionality by testing the system’s operations

· Shutting down the contingency system

· Arranging for operations as staff return to the original or rebuilt facility

Recommendations

This disaster recovery plan document should be kept up to date with Loop Inc.’s current organizational environment. To maintain the plan documentation, Loop Inc. should conduct periodic mock drills. The company should also record its experience whenever a disaster occurs and use it to improve the plan. The disaster recovery plan can also be maintained through periodic updates that reflect current information about the components covered in the DRP.

References

Chang, V. (2015). Towards a Big Data system disaster recovery in a Private Cloud. Ad Hoc Networks, 35, 65-82.

Cook, J. (2015). A six-stage business continuity and disaster recovery planning cycle. SAM Advanced Management Journal, 80(3), 23.

Kerzner, H. R., & Kerzner, H. (2017). Project management: A systems approach to planning, scheduling, and controlling. John Wiley & Sons.

Lan, B., & Mojtahedi, M. (2017). Critical attributes for proactive engagement of stakeholders in disaster risk management. International Journal of Disaster Risk Reduction, 21, 35-43.

Reason, J. (2016). Managing the risks of organizational accidents. Routledge.

Richie, G. D., & Kliem, R. L. (2015). Business continuity planning: A project management approach. Auerbach Publications.

Sadgrove, K. (2016). The complete guide to business risk management. Routledge.

Torabi, S. A., & Sahebjamnia, N. (2015). Integrated business continuity and disaster recovery planning: Towards organizational resilience. European Journal of Operational Research, 242(1), 261-273.

Webber, L., & Wallace, M. (2017). The disaster recovery handbook: A step-by-step plan to ensure business continuity and protect vital operations, facilities, and assets. Amacom.