2016-01: Disaster Recovery

Location: The Working Centre, 58 Queen Street South, Kitchener, ON
Date: January 11, 2016
Time: 7:00 PM

Sometimes, disaster strikes: lightning, fire, floods or other traumas can wipe out our IT infrastructure. How can we protect against losing everything?

Have you put together disaster recovery plans? Have you conducted disaster recovery exercises? What things are most important to consider? What things are trickiest? What best practices have you learned? Bring your experiences and stories.

Meeting Notes

– How can we protect against losing everything?
– Have you conducted disaster recovery exercises?
– What is most important to consider?
– What is trickiest?
– What best practices have you learned?

Why do disaster recovery?
-------------------------
– Sometimes external regulatory bodies require disaster recovery exercises
– Large customers might demand this
– Deal with data problems: corruption, partial loss, complete loss
– Deal with hardware issues: damage, partial or complete hardware loss
– Recovering software after a loss

– How do we manage this for small organizations?

### Convincing management
– How much will it cost if there is a disaster?
– How frequently will this happen?
– How much will this cost in terms of reputation and business continuity?
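One common way to put numbers on these questions is annualized loss expectancy: the cost of a single incident multiplied by how often it is expected per year. The sketch below is only an illustration. The scenario names, dollar figures and frequencies are hypothetical placeholders, not figures from the meeting.

```python
def annualized_loss_expectancy(single_loss_cost, incidents_per_year):
    """Expected yearly cost of one kind of disaster."""
    return single_loss_cost * incidents_per_year


# Hypothetical scenarios: (cost of one incident in dollars, expected incidents per year).
scenarios = {
    "server room flood": (50_000, 1 / 20),   # assumed: once every 20 years
    "ransomware infection": (8_000, 1 / 2),  # assumed: once every 2 years
}

for name, (cost, rate) in scenarios.items():
    yearly = annualized_loss_expectancy(cost, rate)
    print(f"{name}: about ${yearly:,.0f} per year")
```

Comparing these yearly figures with the yearly cost of backups and recovery exercises gives management something concrete to weigh. Reputation costs are harder to estimate and would have to be added as a separate guess.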

What to protect against?
------------------------

### Fires, Floods
– Offsite backups

### Natural vs non-natural disasters

### Hardware failure (especially unrecognized)

– How do we test the integrity of backups in a sane way?
– How do you set up a separate environment independent of the real one?
+ eg Microsoft Exchange restore (needs a domain!)
– Test restores into a virtual machine?
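One possible way to check a test restore without touching the production system is to record checksums at backup time and verify the restored files against that record inside the virtual machine. The following is a minimal sketch under that assumption. The function names and the JSON manifest format are made up for illustration, and it only covers plain files, not databases or Exchange stores.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Record a SHA-256 digest for every file under root at backup time."""
    root = Path(root)
    manifest = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_restore(restored_root, manifest_path):
    """Return (missing, corrupted) relative paths after a test restore."""
    restored_root = Path(restored_root)
    manifest = json.loads(Path(manifest_path).read_text())
    missing, corrupted = [], []
    for relative, expected in manifest.items():
        candidate = restored_root / relative
        if not candidate.is_file():
            missing.append(relative)
        elif sha256_of(candidate) != expected:
            corrupted.append(relative)
    return missing, corrupted
```

Because the manifest travels with the backup, the check can run on an isolated machine that has no access to the original files.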

How do we get a test environment set up?

Horror stories?
---------------
### Cryptolocker/cryptowall

– Has been recoverable with previous versions/shadow copies
– But you had better not disable UAC
– Use Shadow Explorer
– The ransomware will lock the files in your OneDrive as well

### Infected machines
– Home users with malware connecting into the staff network
– Access to machines by careless nephews

### Recover from bad hard drives
– ddrescue, photorec, foremost, testdisk
– A RAID disk dies while the array is rebuilding

### Loss of webserver
– bad kernel install
– loss of power. On reboot there was a kernel update
– Bob decided to upgrade Ubuntu
– The new version of Ubuntu was not compatible with old /var
– His last backups were from August
– LVM did not have enough room to expand /var
– He put in a new hard drive for LVM
– But his new drive went bad
– He saved his databases with mysqldump --all-databases
+ This was too big
+ There was a logical error in the dump
+ He had to chop up the dump file to recover the databases one at a time
– He used MariaDB as a “drop-in” replacement, but the file formats are not compatible
– “Oh no! Not another learning experience!”
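Chopping up a large dump by hand is error-prone. Dumps made with mysqldump --all-databases normally contain a comment line of the form "-- Current Database: `name`" before each database, so a script can split on that marker. The sketch below is an illustration and not what was done in the story. The function name is hypothetical, it assumes database names are safe to use as file names, and the session settings at the end of the dump stay with the last database.

```python
import re
from pathlib import Path

# mysqldump writes this comment line before each database in a multi-database dump.
MARKER = re.compile(rb"^-- Current Database: `(?P<name>[^`]+)`")


def split_dump(dump_path, out_dir):
    """Stream a multi-database dump into one .sql file per database."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    header = []      # session settings that appear before the first database
    current = None   # open output file for the database being copied
    written = []
    # Binary mode avoids decoding errors if the dump contains raw binary data.
    with open(dump_path, "rb") as dump:
        for line in dump:
            match = MARKER.match(line)
            if match:
                if current:
                    current.close()
                target = out_dir / (match.group("name").decode("utf-8") + ".sql")
                current = open(target, "wb")
                current.writelines(header)  # repeat the header in every file
                written.append(target)
            if current:
                current.write(line)
            else:
                header.append(line)
    if current:
        current.close()
    return written
```

The file is read line by line, so the dump never has to fit in memory. Each output file can then be inspected or restored on its own.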

### Cloud credentials
– Is this different from being locked out of physical systems?
– If your data is in the cloud, are you more vulnerable?
– But your local systems can be hacked as well
– Should you worry more or less about this?
– eg other people access deleted tweets via the API key
+ API keys can be revoked as well
+ Similar to being blacklisted on email servers, by Rogers

### Identity
– Establishing identities face to face is easier than with strangers
+ Web of Trust

Past Experiences
----------------
### Y2K switchover to another site
– The entire site was to be moved to another site
– This is good for testing the plan
– It took a month of planning
– At Ontario Hydro they had two different locations

### Famous disasters and recoveries
– RIM had some bad things happen
– Insurance companies would have a good sense of this (disaster theory)
– 2003 power failure

Strategies
----------
– Use redundancy for backups
+ Back up more than you need incrementally
+ eg back up to DVD and rsync to another host

– Don’t compress/encrypt backups?

– Back up your passwords someplace offsite and separate

– Choose products that are easier to back up
+ But nonprofits like cheap stuff, and if the proprietary software is cheap then they will use it
+ Organizational reputation has a high dollar value

– Backups need to be easy
+ eg cheap NASes with rsync (eg Synology, QNAP)
+ Having the ability to use the underlying Linux is good

– Auditing backups
+ Use Btrfs (the B-tree file system), which has checksums and snapshots
+ Spot checks of data
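A spot check can be as simple as picking a random handful of files and comparing them byte for byte with their backup copies. The sketch below assumes the backup is a plain mirrored directory tree, such as an rsync target on a NAS. The function name and the default sample size are arbitrary choices for illustration.

```python
import filecmp
import random
from pathlib import Path


def spot_check(source_root, backup_root, sample_size=20, seed=None):
    """Compare a random sample of source files with their backup copies.

    Returns the relative paths that are missing from the backup or differ.
    """
    source_root, backup_root = Path(source_root), Path(backup_root)
    files = sorted(p for p in source_root.rglob("*") if p.is_file())
    rng = random.Random(seed)
    sample = rng.sample(files, min(sample_size, len(files)))
    problems = []
    for path in sample:
        relative = path.relative_to(source_root)
        copy = backup_root / relative
        # shallow=False forces a content comparison instead of trusting size and mtime.
        if not copy.is_file() or not filecmp.cmp(path, copy, shallow=False):
            problems.append(relative.as_posix())
    return sorted(problems)
```

Files edited since the last backup run will also show up as differences, so the results need a human look before anyone raises an alarm.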

– High availability systems

– Fireproof safes for removable media

– Testing backups
+ Databases are hard
+ Exchange is hard
+ File shares should be easier? Random restores?

– Most data people have does not change much. Separate static from dynamic data.
+ But this requires user compliance
+ Incremental backups do this to some extent
+ Doing this for nonprofits is even harder because volunteers cannot be trained so well
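Where user compliance is hard to get, a script can make a rough first cut by treating files that have not been modified for a long time as static. This is only a sketch. The function name and the 365-day threshold are assumptions, and modification times are not always reliable after copies or restores.

```python
import time
from pathlib import Path


def split_static_dynamic(root, static_after_days=365, now=None):
    """Classify files under root as static or dynamic by last-modified time."""
    now = time.time() if now is None else now
    cutoff = now - static_after_days * 86400
    static, dynamic = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Files untouched since before the cutoff are treated as static.
            bucket = static if path.stat().st_mtime < cutoff else dynamic
            bucket.append(path)
    return static, dynamic
```

The static list could then go to write-once media or an archive location, while the dynamic list stays in the regular backup rotation.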

– Integrated practices vs one-time practices
+ One time: restore to bare metal
+ Integrated: password lists, shadow copies, redundant locations

### Windows backups?

– How to back up personal files on a system
– Installing stuff on Windows will put files all over the place
– Use Windows Backup and Restore
+ This works but your backup drive can be backed up
– Norton Ghost 15 (but this turns off the machine)
– Clonezilla (same)
– Use Restore points
– ShadowProtect from StorageCraft

### Mobile backups
– Use protection software (eg BlackBerry Protect)
– Google sync
– A bunch of stuff is stored on the server
– (but what about pictures?)
– Syncthing with a one-way sync
