Since 2018, we've made an annual post detailing our infrastructure and planning efforts to serve your library, your staff, and your patrons after a natural disaster, and sharing the additional improvements we've made in the last year to our Disaster Preparedness and Recovery plans. We've spent a significant amount of time building infrastructure, reviewing procedures, and planning for an untimely incident. For this year's post, we're starting with our most recent upgrades and working backwards to our original September 2018 update.
2021 Updates & Improvements
To prepare for this year's post, we asked Lee what he felt were important updates or changes made in 2021, and whether there was any other info he wanted the KLAS Users' Community to know about our efforts. The two items he mentioned are:
- The recovery process is the same as before, but note that restoration of databases is sequential (per server), not parallel. So, while a one-hour recovery is possible for a database at the top of the queue, others at the bottom of the list will have a longer wait as the recovery process works through the list.
- We have added weekly server snapshots to our AWS backup servers. For an end user, this makes no difference to restoration, but it makes Keystone staff's jobs infinitely easier.
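To make the effect of sequential restoration concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that every database on a server takes about one hour to restore; real restore times will vary. The function name and the database names are hypothetical.

```python
def estimated_completion_hours(queue, hours_per_restore=1.0):
    """Return {database: hours until its restore finishes} for one server.

    Restores run one after another, so each database waits for all of
    the databases ahead of it in the queue.
    """
    completion = {}
    elapsed = 0.0
    for name in queue:
        elapsed += hours_per_restore
        completion[name] = elapsed
    return completion


# Hypothetical queue of three databases on the same server.
estimates = estimated_completion_hours(["library_a", "library_b", "library_c"])
for name, hours in estimates.items():
    print(f"{name}: restored after about {hours:g} hour(s)")
```

Under these assumptions, the wait grows with a database's position in its server's queue, which is why the first database can be back in about an hour while the last one takes longer.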
2020 Updates & Improvements
Much of what we put into place as part of our disaster preparedness plan in 2020 is what allowed our staff to begin to work from home in March of that year and continue to do so even today and for the foreseeable future.
On September 1, 2020, we posted a list of the disaster recovery and preparedness process and infrastructure improvements we'd made over the past year, such as:
- Cloud-based databases running in multiple regions, to better place the system geographically near the library
- Incremental transaction data backups happen every 10 minutes
- Database backups are saved in the local region, as well as to a separate region. If a database hosted on the east coast has a disaster, there is a copy of the database backup in another region.
- Database backups are saved to the local server, as well as copied to S3 storage
- Database backups are also copied from Amazon's data centers to Google Storage
- A new automated system restoration process, replacing one that was manual prior to the beginning of 2020. The automated process takes about an hour, while the previous manual process took 10-12 hours at a minimum.
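As a rough illustration of the layered backup copies described in the list above, the Python sketch below checks whether a given backup has reached every location. It is not our actual tooling: the location labels and the function name are hypothetical, chosen only to mirror the places named in the list.

```python
# Locations named in the list above; the identifiers are hypothetical labels.
REQUIRED_LOCATIONS = {
    "local_server",
    "s3",
    "separate_region",
    "google_storage",
}


def missing_copies(found_locations):
    """Return the required locations that do not yet hold a copy, sorted."""
    return sorted(REQUIRED_LOCATIONS - set(found_locations))


# Example: a backup that has not yet been copied off Amazon's data centers.
print(missing_copies(["local_server", "s3", "separate_region"]))
```

An empty result would mean the backup exists in every listed location, so losing any single server, region, or provider still leaves a copy elsewhere.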
2019 Updates & Improvements
On August 15, 2019, we shared the news of some additional steps we'd taken to further enhance our disaster preparedness and recovery efforts including:
- Back-end changes to KLAS 7.7 to make it quicker and easier to create and store database back-ups
- A move to storing back-ups in the cloud, so they are safe and retrievable no matter where disaster strikes
- A new monitoring app so staff can be notified right away if something goes wrong with the servers and any emergencies can be dealt with as quickly as possible
2018 Updates & Improvements
Our first Emergency Recovery & Disaster Preparedness Key Notes Blog post, published September 11, 2018, was written as we faced the threat of Hurricane Florence and shared details about what we'd implemented at that point, including a combination of both procedural and physical preparedness such as:
- A gas-powered generator at our office
- Redundant internet providers, firewalls, and network routers
- Daily backups of data to our on-site servers
- Weekly data backups stored offsite
- Encrypted database backups on AWS S3
- VOIP Telephone system to allow staff to work remotely
- Keystone Status Page to communicate database availability, even if we’re unreachable
- Contingency plans and equipment needed for remote database and customer support
Here in North Carolina we usually think of late summer / early fall as the start of hurricane season. Well, this year is different (as is everything else in 2020) and we've already had a couple of named hurricanes develop, with one hitting the Outer Banks in late July. Therefore, we wanted to go ahead and review what we do to ensure we can continue to serve your library, your staff, and your patrons after a natural disaster, and share the additional improvements we've made this year to our Disaster Preparedness and Recovery plans. We've spent a significant amount of time building infrastructure, reviewing procedures, and planning for an untimely incident.
In fact, much of what we put into place as part of our disaster preparedness plan is what allowed our staff to begin to work from home in March and continue to do so even today and for the foreseeable future.
Our September 11, 2018 Key Notes Blog Post was written as we faced the threat of Hurricane Florence and shared details about what we'd implemented at that point, including a combination of both procedural and physical preparedness such as:
- A gas-powered generator at our office
- Redundant internet providers, firewalls, and network routers
- Daily backups of data to our on-site servers
- Weekly data backups stored offsite
- Encrypted database backups on AWS S3
- VOIP Telephone system to allow staff to work remotely
- Keystone Status Page to communicate database availability, even if we’re unreachable
- Contingency plans and equipment needed for remote database and customer support
On August 15, 2019, Katy posted to share the news of some additional steps we'd taken to further enhance our disaster preparedness and recovery efforts including:
- Back-end changes to KLAS 7.7 to make it quicker and easier to create and store database back-ups
- A move to storing back-ups in the cloud, so they are safe and retrievable no matter where disaster strikes
- A new monitoring app so staff can be notified right away if something goes wrong with the servers and any emergencies can be dealt with as quickly as possible
Today I'd like to share this year's improvements to our disaster recovery and preparedness process and infrastructure, which include:
- Cloud-based databases running in multiple regions, to better place the system geographically near the library
- Incremental transaction data backups happen every 10 minutes
- Database backups are saved in the local region, as well as to a separate region. If a database hosted on the east coast has a disaster, there is a copy of the database backup in another region.
- Database backups are saved to the local server, as well as copied to S3 storage
- Database backups are also copied from Amazon's data centers to Google Storage
- A new automated system restoration process, replacing one that was manual prior to the beginning of 2020. The automated process takes about an hour, while the previous manual process took 10-12 hours at a minimum.
Important Coronavirus Info
As efforts to slow the spread of the Coronavirus / COVID-19 ramp up, we wanted to share some info with you all:
1) Keystone Support will still be available
We are prepared to work from home if needed, and currently have no plans to close or restrict hours.
2) Working from home with KLAS
Working from home can't help you keep up with circulation during closures, but letting Reader Advisors (and other staff who do their jobs only or primarily in KLAS) work from home is easy with v7.7.
Because KLAS v7.7 secures your connection with seamless HTTPS instead of a VPN, Keystone-Hosted users can connect from anywhere. So long as your staff have access to home internet, they can take KLAS home with them. It will just work--no VPN or additional set-up required.
If you are still on Keystone-hosted version 7.6, you can connect from home using the instructions we gave after recovering from the SAN failure. No special VPN required, just the correct version of the OpenVPN software. If you need us to re-send instructions, let us know.
For Self-Hosted users, additional set-up may be needed to connect to your servers off-site. For 7.7, we should be able to create an activation key that will allow remote access. For 7.6, start with your own IT, and let them know that we're here to assist if needed.
Finally, if you have reduced staff handling your mail, we have strategies in this Forum Post: Short Staffed? Strategies to keep up.
3) Quarantined Inventory
If you need to quarantine returned books or other inventory from some or all of your patrons, contact Customer Support.
We can help you set up an additional circulation status (NAC:QTN), allowing you to track what inventory is currently set aside to allow any contamination to die off. If you find other measures are called for, we can discuss your needs and help you determine the best strategy.
In short, please know that we are here to support you and any efforts you are taking to stay healthy and protect your patrons.
First, we'd like to share a note with you from James Burts, Keystone Executive Vice President:
"Dealing with the new realities of Covid-19 has certainly been a very strange time. At Keystone, we began having staff wanting to work from home and self-isolate on March 12th, and over the following 2 weeks had increasing numbers of staff opting to work from home. As of March 30th, our local county mandated that we all self-isolate and work from home. I certainly hope you’ve not seen a change in our ability to support you all.
Fortunately, we’d already taken the steps necessary to allow all our staff to work from home effectively: steps we’d taken expecting that it would help us in the event of a snowstorm or hurricane that made roads unsafe. Instead, the roads are nearly empty, but it’s simply getting people together that’s unsafe. Who would have thought??
We continue to be available to help you and your staff in any way we can. Whether that’s helping create new workflows to quarantine materials, or helping your staff work on record cleanup while they are working from home without access to your collection, we are here for you all. We have provided some ideas for managing these strange times on KlasUsers.com, and are always interested in hearing other ideas you may have. If you have any questions, or any ideas that you would like to implement, please reach out to us. We’ll be happy to help talk through your thoughts, and help address your needs."
Next up, as part of our ongoing work to support you, a few more tips and tricks, this time for:
- Strategies for serving patrons when you have a restricted card run
- Blocking service to prisons or other institutions
Or you can follow these links for our previous suggestions:
- How can you connect to KLAS if you use a Mac?
- How can I quickly increase the number of books we're sending our patrons and / or titles we're duplicating onto a cartridge for them?
- Do I need to shut off Nightly?
- WebOPAC Notice
- Emailing your newsletter
- Record clean-up
Tip 1:
Strategies for serving patrons when you have a restricted card run.
Nightly sorts patrons that need service by:
- Serve Code (least frequent to most frequent, with List Only ahead of Autoselect)
- Last Served date (none to oldest to most recent)
This gives priority to patrons who haven’t been served for a while, and gives List Only patrons a better chance of getting their titles before they go out to Autoselect patrons. Under normal circumstances, this setup ensures that everyone will be served in a reasonable timeframe, even if you restrict your card run and don’t get to everyone who needs service each day.
However, these are not normal circumstances. If you’re currently running on a skeleton crew and severely limited card run, your Nightly Auto patrons might languish at the end of the list.
While these circumstances are in place, or even when the floodgates re-open and you need to play catch-up, you may want to switch up this order from time to time. If you would like to change up the order of the Nightly sort to give different patrons a shot at getting books, please contact Keystone Customer Support—and then be sure to let us know when we should put it back.
Tip 2:
Blocking service to prisons or other institutions:
If your Department of Corrections requires that service is suspended during this time, we can apply a block to all inmates for you. This will stop all circulation to those patrons for a specified period of time, though it does not impact their NLS direct magazines. Please let us know how to identify incarcerated patrons (such as by Patron Type), and how long you need the block to remain in place.
If a nursing home or other facility requests that you stop service to their patrons, you can apply a similar block. You would first need to set up a "Quarantine" block (let us know if you need assistance). Then, find the patrons who live in that place and add the block to each record. If you don't already have relationships set up linking the patrons to the facility, you can find them by querying on the address.
Quick Search –
- Main Status | Equals | A
- City | Equals | Raleigh
Advanced Search –
- Address | Street Address | Matches | 8016 Glenwood
This query will limit your results to active records in the target city, with an address matching the facility’s street address. The idea is to be just specific enough, which is why I recommend searching the street address only for the number and street name. If it isn’t a common street, you might even leave off the number, and review results to see if the facility has multiple buildings.
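The same filtering logic can be sketched outside of KLAS. The Python below is only an illustration of the idea, not how KLAS runs queries: the record fields (`main_status`, `city`, `street_address`), the "I" status code, and the sample patrons are all hypothetical, and "Matches" is approximated here as a case-insensitive substring check.

```python
def patrons_at_facility(patrons, city, street_fragment):
    """Return active patrons in `city` whose street address contains `street_fragment`."""
    fragment = street_fragment.lower()
    return [
        p for p in patrons
        if p["main_status"] == "A"
        and p["city"] == city
        and fragment in p["street_address"].lower()
    ]


# Hypothetical sample records.
patrons = [
    {"name": "Patron 1", "main_status": "A", "city": "Raleigh",
     "street_address": "8016 Glenwood Ave, Room 12"},
    {"name": "Patron 2", "main_status": "A", "city": "Raleigh",
     "street_address": "100 Main St"},
    {"name": "Patron 3", "main_status": "I", "city": "Raleigh",
     "street_address": "8016 Glenwood Ave, Room 4"},
]

matches = patrons_at_facility(patrons, "Raleigh", "8016 Glenwood")
print([p["name"] for p in matches])
```

Searching on just the number and street name keeps the match loose enough to catch room or unit numbers while still excluding other addresses in the same city.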
And that's it for this week's tips! We hope these have been helpful for you; please continue to let us know how we can help!
As thunder sounds over the Keystone offices and we enter peak hurricane season, it seems like a good time to revisit our Disaster Preparedness and Recovery procedures.
We’ve posted about this before, and the 2018 Post is still applicable, so feel free to have a look back at that one. But technology is ever-evolving, and we’ve been keeping up. Here’s an overview of some changes:
7.7 Procedures
The back-end changes in KLAS version 7.7 mean that creating and restoring backups is a different process from 7.6. As 7.7 was being created, new procedures were researched, tested, and implemented to ensure that data would be well-maintained going forward.
Cloud Storage
We have increasingly been pivoting to storing back-ups in the cloud, so that they are safe and retrievable no matter where disaster strikes.
Keeping that data secure and private is of course a high priority. We’ve also done extensive testing on the best methods for generating those backups and restoring them, so we can be confident that all the data is being kept, that it’s refreshed on the right schedule, and that we can get it back in place on our local servers ASAP if needed.
Finally, those cloud servers need routine maintenance and updates. As we need more of them, that has made a lot more work for Lee, who keeps on top of regular system updates for all of our servers including the cloud-based ones. So, he has also implemented a new system that will allow him to enter commands or initiate updates in one place, and have them go out to all of the cloud servers at once. (I wouldn’t mind something like that for my chores... imagine doing one load of laundry and when you’re done, two loads are clean!)
New On-Call App
Finally, our on-call staff have switched to a new monitoring app, ensuring that they will continue to be notified right away if something goes wrong with the servers and any emergencies can be dealt with as quickly as possible.
Downtime Update
What a week! ...and it’s only Tuesday.
As I’m sure all of you know, one of our servers decided that 2020 was just too much for it and bit the dust on Monday morning. Our 7.7 customers dodged the worst of it--we’ve been moving everyone to newer servers as they migrate to the new version--and have seen little disruption. Unfortunately, the rest of you have had significant downtime, and we apologize.
We attempted to resuscitate the server without success. At that point, our disaster recovery procedures went into effect: backups were recovered and our valiant IT and Dev team spent the rest of the day and night porting them to a new cloud server and getting everything rebuilt. Since then, we’ve been working with everyone to get VPNs pointing to the new server location, correcting settings to restore printing and reports, and doing a whole lot of troubleshooting. (All while Nancy is also running an Administrators Training.)
Up next: finish getting all the WebOPACs and WebOrder systems back online and functioning normally.
So, while we know this process hasn’t gone as quickly as hoped, please be assured that we are doing everything we can to get you back in business ASAP. Thank you all for your patience and assistance!
Offline Check In
We’ve posted a lot lately about our emergency preparedness, but is there anything you all can do to keep a minor disaster from grinding your services to a halt?
There is!
If you experience a network outage, whether due to bad weather, service provider outage, or construction chopping through your network cable, you can still work on checking in returned materials. This method will work if you can’t log into KLAS or even if you can’t access the internet at all. All you need is a computer (a laptop on battery power is fine) and a scanner.
Here’s how:
1) Open Notepad. Other word processors (such as Microsoft Word) can add formatting and metadata that will muddy the data. Basic Notepad leaves the text clean and ready to be imported.
2) Scan or type in the shelf ID indicating where the books will be shelved.
3) Scan the barcodes of everything for that shelf.
4) Repeat steps two and three as needed.
5) If you are on battery power, make sure to save the file frequently.
6) Save the file. You can then go ahead and shelve the materials as if they have been checked in.
This is an example of how your file should look (with, of course, your own library’s shelf IDs):
123F
01291237437
21012913274
20129137423
123G
20390746787
39287179034
30927405972
Then, once order has been restored and you can access KLAS again:
- Open the Circulation Module – Batch Check In window.
- Use the Browse button at the bottom of the screen and select your saved file.
- The shelf locations and barcodes will be loaded in.
- Press Submit.
- KLAS will separate each shelf into a separate batch and check in each item.
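For readers curious about the file layout, here is a minimal Python sketch of how a file like the example above could be split into one batch per shelf. This is not KLAS's actual import code. It assumes, for illustration only, that barcodes consist solely of digits while shelf IDs contain at least one letter, as in the sample; your library's IDs may differ. The function name is hypothetical.

```python
def split_into_batches(text):
    """Group scanned barcodes under the shelf ID line that precedes them."""
    batches = {}
    current_shelf = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.isdigit():
            if current_shelf is None:
                raise ValueError(f"barcode {line} appears before any shelf ID")
            batches[current_shelf].append(line)
        else:
            # Assumed rule: anything that is not all digits is a shelf ID.
            current_shelf = line
            batches.setdefault(current_shelf, [])
    return batches


sample = """123F
01291237437
21012913274
20129137423
123G
20390746787
39287179034
30927405972
"""

for shelf, barcodes in split_into_batches(sample).items():
    print(shelf, len(barcodes), "items")
```

The sketch also shows why a plain-text editor matters: the split depends on each line holding exactly one clean shelf ID or barcode, with no extra formatting mixed in.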
Hopefully, you won’t need to employ this method. But if you do experience an extended network outage, there’s no need to let the books pile up!
As always, if you have questions or need some additional assistance contact KLAS Customer Support.
Around our Raleigh, NC office today everyone is talking about how they are preparing their homes for the unwelcome arrival of Florence. But you might be wondering how we’ve prepared our operations and support for such an event.
The answer is a combination of both procedural and physical preparedness including:
- A gas-powered generator at our office
- Redundant internet providers, firewalls, and network routers
- Daily backups of data to our on-site servers
- Weekly data backups stored offsite
- Encrypted database backups on AWS S3
- VOIP Telephone system to allow staff to work remotely
- Keystone Status Page to communicate database availability, even if we’re unreachable
- Contingency plans and equipment needed for remote database and customer support
We have prepared for events like the one we are now facing, whether the event is a hurricane, ice storm, or some other disaster. Part of our annual SSAE audit is to further review and refine those disaster preparedness plans.
For example, our office is equipped with a 60kVA natural gas generator to power the building in the event of an outage. If electrical service is disrupted, the generator will maintain power to our servers and communications systems as long as the natural gas line provides fuel. Keystone also has redundant fiber-optic connections to the Internet with separate vendors, along with dual network routers and firewalls. This increases the chances that our communications will remain operational throughout a natural disaster—if one network goes down, the other can take over.
Here’s a summary of some of the anticipated “events”, the measures to manage them, and the anticipated impact on customers:
- Loss of power at Keystone’s office
  - Natural gas generator automatically kicks in to provide power to Keystone’s servers, and 2 workspaces for Keystone staff to be able to work.
  - The generator was tested less than a week ago (Saturday 9/8).
  - No discernable impact on customers.
- Roads are impassable / Unsafe for Keystone staff to come to work
  - VOIP telephone system allows staff to respond to phone calls remotely
  - VPN access allows staff to connect to the office network and work as if they were at their desks.
  - No discernable impact on customers.
- Roads are impassable and power is out at Keystone staff homes
  - Unfortunately, we don’t want staff trying to travel to the office if it’s not safe, and if they lose power/internet access from home, the staff won’t be able to support customers.
  - Impact: KLAS Hosting will continue uninterrupted, but our response to support calls will likely be greatly reduced until conditions improve.
- Loss of communications at Keystone’s office
  - Keystone has redundant fiber-optic Internet connections, from separate companies.
  - KLAS Hosting would not be impacted unless *both* connections were lost. The systems will automatically accommodate the loss of one.
  - Keystone’s telephone service would be impacted if our 2nd connection is lost. In this case, we will be reduced to only email communications.
  - In the case of loss of both communications links, Keystone will transfer KLAS hosting operations to our cloud-based Disaster Recovery site. (more information below)
  - Customer impact: depending on the number of communications links that are lost, the impact will range from being rather minor to quite substantial. Keystone will use the “Announcements” section of the new “Keystone Status Page” to communicate the current impact, and how to best contact us. It is accessible from the Keystone Status Page menu item on klasusers.com or this URL: https://uptime.statuscake.com/?TestID=emlREBtN3e
In the event of loss of both network connections, or power from the generator, we will begin the process of migrating customers to our new cloud-based disaster recovery site. This process takes several hours to complete, so customers will be notified when their database is available, along with instructions on how to connect to the cloud database, if necessary.
Procedurally, we take steps on a daily, weekly, and monthly basis to plan for both minor and major disruptions and disasters. We have checklists of tasks to complete just prior to anticipated weather events such as Florence to make sure we lessen the impact and the time and effort needed to recover. Such regular and one-time planning includes making sure we have secure backups of your data as well as the ability to continue maintaining and providing access to it. Additionally, we’ve taken steps to allow our support service personnel to work remotely with continued access to your databases, and to our phone and email tracking systems.
Just recently we implemented StatusCake, a monitoring system with a Keystone Status Notification Page, which will tell you whether or not KLAS is up, even if you cannot reach us. In KLAS 7.7 (the next major release) each library / organization will have their own notification page to check their own database’s status. KLAS 7.7 will also allow you to access KLAS remotely via a secure HTTPS connection rather than requiring a VPN.
Our physical office is not located in a floodplain, is near major electrical and telephone distribution centers, and all cables into the building are buried underground. During Hurricane Fran, the last major hurricane to affect Raleigh, services were restored to the area around Keystone's current offices within 2 hours. Since then, the City of Raleigh Police Department has leased the lower floor of our building, which means that restoring power and communications to the building is of an even higher priority to the utility companies.
We hope that all of you who are also in Florence’s path are able to stay safe. And for all of our customers, no matter what Mother Nature sends our way, rest assured that we're well-equipped to keep your data safe and any interruptions to your service at a minimum.