Thursday, June 24th, 2010 - 11:37 am EDT

Tech Tip: Common Ways to Tell You Are Not Prepared to Recover from a Disaster

Posted by: Michelle Liro

Today's tip comes to us from author Eric Beehler via our friends at Realtime Publishers.

Disaster recovery is somewhat of a buzzword in the IT industry, and IT professionals have all been exposed to their share of great disaster recovery ideas from business managers. These ideas are often based on the industry buzz and seem to only make more work for you with little gain overall. This is usually because the idea is not backed up with a real plan. The actual implementation of disaster recovery is usually a big chore to undertake correctly, but in the end, it is well worth the trouble.

It's important to be ready to recover your data and systems when a disaster strikes, but it is rarely a top priority in the grand scheme of IT projects when a crisis has yet to strike close to home. Unless your company has decided to make disaster recovery a high-level objective, it's usually the front-line administrator who will be saddled with the responsibility of implementing some sort of plan to save the day -- but you will likely be shortchanged on training and resources to get the job done.

There are many ways to deal with a disaster, from having a set of cold standby machines to employing a fully redundant hot data center. In reality, as the administrator, your job doesn't change much based on the scenario for recovery; it has to be up and available to keep your business running. You likely have some kind of plan now, but if you haven't been through the real thing, you really don't know if your plan will hold water. For Windows administrators, there are several problems that seem to expose themselves when it's time to exercise a disaster recovery plan, or worse yet, go through the real thing. Here are some common ways to tell that you are not ready for a disaster.

Plan for an Alternative Site
You are not ready for a disaster if you don't have a place to go, which requires planning for a full on-site disaster in which your site is down or inaccessible. There are several methods to address this issue if you don't have a solution today, from having an alternative site with servers waiting to be loaded up for operation to a warm site that is always ready and waiting to take traffic. These decisions are not usually made by you but by the CIO. All you can often do is consider the solution given to you and how that will impact your ability to recover. A cold site, for example, will allow you to have hardware and connectivity available, but you will need to account for operating systems (OSs), drivers, configuration differences, and data center differences. In a warm site, you have to ensure that changes to configurations and data remain synched across the two sites.

Plan for Downtime
You also have to consider whether the site solution will support the Recovery Time Objective (RTO) required by the applications and business. Simply put, the RTO is the amount of time your users will be without the functions supported by your server, which could be a Web site, a mailbox, or the ability to log on to the domain. You should have this time defined per application or function supported by your server. In a larger disaster recovery effort, this time may be defined for you, but don't be surprised if the business people you support have no idea that your server supports the functionality they require. You may need to interject with your personal knowledge of how your server functions in order to get this definition correct.

There are generally accepted categories for RTO that fall into tiers, as Figure 1 shows. Use these as a guideline but feel free to create standards within your own organization to meet your needs. If you have a need to recover applications within 2, 4, and 8 hours, redefine the tiers so that they make sense to your business through an analysis of the business impact of downtime. Just be sure that you can apply the standards as broadly as possible across the organization.
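As a rough illustration of tiering by RTO, the sketch below places each application in the loosest tier that still satisfies its required recovery time. The tier names, the 2-, 4-, and 8-hour limits (borrowed from the example above), and the 24-hour catch-all are assumptions for illustration; they are not the tiers from Figure 1.

```python
# Hypothetical RTO tiers: (tier name, hours within which recovery is promised).
# These limits are illustrative; replace them with your organization's own.
RTO_TIERS = [("Tier A", 2), ("Tier B", 4), ("Tier C", 8), ("Tier D", 24)]


def rto_tier(required_rto_hours):
    """Return the loosest tier whose promise still meets the requirement.

    Returns None when even the tightest tier is too slow, which signals
    that the tiers need to be redefined for that application.
    """
    # Walk from the loosest tier to the tightest one.
    for name, limit_hours in reversed(RTO_TIERS):
        if limit_hours <= required_rto_hours:
            return name
    return None


for app, hours in [("web store", 3), ("file shares", 8), ("trading desk", 1)]:
    print(app, "->", rto_tier(hours))
```

An application that must be back within 3 hours lands in the 2-hour tier, because the 4-hour tier would not guarantee it. A result of None is a prompt to revisit the tier definitions rather than an error.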
 

Plan Your Tolerance to Data Loss
You are not ready for a disaster if you don't know your tolerance for data loss. Let's start with the basic foundation of the backup. Whether you use simple tape backups or an advanced nearline solution, you have to consider that most solutions are put in place to account for day-to-day operational needs. First, the exercise you went through with RTO must be done for the Recovery Point Objective (RPO), which is the amount of data that can be lost. You have to understand what the business can afford to lose; this value is not necessarily tied to an RTO tier. Take, for example, a point of sale system. If the system is down for 5 hours, the business may be able to recover by entering the orders taken while the system was down, but data loss of 5 hours may mean millions of dollars in lost sales.



The gut reaction for your RPO on some of your systems may be that no data loss is acceptable. In other cases, 24 hours of data loss may be acceptable. The goal is to understand what can be tolerated, not what is desired. Everyone will desire no data loss, but put a realistic perspective to the real value of the data. If you define Tier A RPO as no data loss, then you have to put systems in place that allow for that reliability. This means copying transactions as they happen to a backup site, which is an expensive solution that should be used only on your critical business applications, depending on your budget. If you have Tier B systems as defined in Figure 2, you will need some sort of solution that will be separate from your nightly backups, as you cannot count on having your last nightly tape backup at your recovery site.

Considering the Loss of a Backup
You are not ready for disaster if you rely on your daily backup for a recovery scenario. You may have in your head that you can rely on the last tape backup in the event of a disaster. Whether such is the case depends on a key question: can you get your restore process to work offsite? Don't be so quick to answer this one. If you take advantage of offsite storage either through a vendor or your own in-house process, it is an excellent step, but offsite storage doesn't necessarily guarantee you can restore at your disaster recovery site within the specified RPO and RTO.

Tape drive compatibility, backup software, delivery time, drivers, and OSs are all considerations that you must address prior to saying your solution is ready. This is especially true for a third-party backup site that will provide you with "like" hardware. That equipment will not be your equipment, and even if it is, expect aspects of the infrastructure to be different, such as IP address schemes, firmware (which can be a nightmare when working with SANs), and simple access to the hardware.

You also have the issue of archive requirements and the fact that you likely rely on these tapes for your day-to-day restores. If you perform restores for file recovery and other issues, you likely want to keep those tapes close by. If you ship them away for maximum protection, it's going to cost a pretty penny in order to request tapes from your offsite storage vendor.

You also have to consider how those tapes make it to the recovery site. If you make full backups only once a week and you only do offsite storage once a week, you might only get a restore from 2 weeks prior. Why? Because if you are lucky enough to get your tapes offsite a day or two after the full backup and you get the shipment to your disaster recovery site 4 to 8 hours after they are requested, you can almost bet that Murphy's Law will strike and you will get a bad tape somewhere in the set. Then you have to move back in the chain, and with most full backups run weekly, you might be taking your system back 2 weeks or more if Murphy continues to strike. Now, the RPO of your plan that you expected to meet with your existing backup plan is not being met.
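The arithmetic behind that "2 weeks or more" figure can be sketched as follows. This is a simplified model, not a sizing tool: it assumes weekly full backups only (no incrementals), a fixed delay before each full set reaches offsite storage, and that every bad tape set forces you back one full backup. The function and parameter names are hypothetical.

```python
def worst_case_restore_age_days(full_interval_days, offsite_delay_days, bad_sets=0):
    """Age of the data you actually restore, in the worst case.

    The disaster is assumed to strike just before the newest full backup
    reaches offsite storage, so the newest usable set is already one
    interval plus the offsite delay old. Each bad set costs one more interval.
    """
    newest_offsite_age = full_interval_days + offsite_delay_days
    return newest_offsite_age + bad_sets * full_interval_days


# Weekly fulls, tapes leave the building two days later.
print(worst_case_restore_age_days(7, 2))              # every tape reads cleanly
print(worst_case_restore_age_days(7, 2, bad_sets=1))  # one bad set in the chain
```

Comparing the result with the RPO for each system shows quickly whether the existing tape rotation can meet it at all.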

Even if you do recover your servers with no issues, how long will it take to recover them all? Consider the queuing on the tape drives, with multiple servers waiting for those tapes to be loaded. It could take quite a long time before you even get a chance to try a restore to your server depending on the technology present at the recovery site. What can you do? Well, time to restore will be reduced if you can restore large chunks at one time. Consider putting systems with like RPO and RTO requirements in the same backup set.

Better yet, host them on a LUN or set of LUNs on your SAN or other logical storage method in your situation so that a restore can be done all at once. You might even consider booting from the SAN, which might save you from having to restore the local disk of many servers. If you have a blade server solution, this may even be baked into your infrastructure.
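To get a feel for the tape drive queuing described above, a back-of-the-envelope estimate is often enough. The sketch below assumes every restore takes the same time and each drive handles one server at a time; the server count, drive count, and restore time are made-up numbers.

```python
import math


def total_restore_hours(server_count, tape_drives, hours_per_restore):
    """Estimate elapsed time when each drive restores one server at a time."""
    # Servers are restored in "waves", one server per available drive.
    waves = math.ceil(server_count / tape_drives)
    return waves * hours_per_restore


# Hypothetical recovery site: 20 servers, 4 drives, 3 hours per restore.
print(total_restore_hours(server_count=20, tape_drives=4, hours_per_restore=3))
```

If the estimate exceeds the RTO of the servers at the back of the queue, that is a sign to group systems into fewer, larger restores or to move them to shared storage as suggested above.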

Using Disk-Based Backup
Let's also consider disk-based backup. This solution has become increasingly popular because of the low cost of hard disks and the ease of backup and restore. In addition, disks often take minutes to back up and restore what used to take hours. The software supported by these systems even has versioning, much more frequent backups, and nifty utilities that make life much easier on the administrator. This is usually all handled by complex backup management software such as Microsoft System Center Data Protection Manager. When using this kind of solution, consider employing these often-integrated features to support data replication of some sort, although vendors name these types of features differently.

You can even copy your live data to your recovery site using a SAN/NAS vendor's Failure Resistant Disk Solution (FRDS). You should, however, consider the fact that this kind of solution will be much more expensive than tapes because it will require duplicate equipment with data replication happening across a wide area network (WAN).

You should refer to your RTO and RPO tiers to determine whether certain servers and data sets could stand to be away from your disk replication and rely more on a tape solution. You should also consider your disaster site and understand whether it can support this kind of solution. You should treat your server restores as a form of triage. You need to know, based upon RTO and RPO, what you are going to recover first and what can wait.
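One simple way to express that triage is to sort the server inventory by RTO and break ties on RPO. The sketch below is only an illustration; the server names and hour values are invented, and a real plan would also account for dependencies between systems.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    rto_hours: float  # how long the function may be unavailable
    rpo_hours: float  # how much data loss is tolerable


def recovery_order(servers):
    """Sort servers so the tightest RTO comes first; break ties on RPO."""
    return sorted(servers, key=lambda s: (s.rto_hours, s.rpo_hours))


# Hypothetical inventory for illustration only.
inventory = [
    Server("file-server", rto_hours=8, rpo_hours=24),
    Server("order-database", rto_hours=2, rpo_hours=1),
    Server("mail", rto_hours=4, rpo_hours=4),
    Server("point-of-sale", rto_hours=2, rpo_hours=0),
]

for server in recovery_order(inventory):
    print(server.name, server.rto_hours, server.rpo_hours)
```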

Considering Configuration
If you can't identify the full configuration of your servers, you are not ready for a disaster. Realistically, can you keep track of 300 shares on a terabyte SAN served by a load-balanced Windows cluster server? Do you know which shares go to which directories on which LUNs? You have to document configurations. This is true whether you have a basic bare metal restore plan or a full redundant data center. The luxuries of a production environment won't be at your disposal. A normal production environment allows you the opportunity to compare configurations when something goes wrong and work through a problem. A disaster affords you no such luxury.

No matter how familiar you are with your systems, you need to have everything documented that can be changed. For any applications, you should have a guide for their installation in your environment. You should have the servers documented with everything from IP addresses and patches to database connections and configuration files. If you run IIS for Web applications, you should have that configuration documented as well. Some sort of context diagram is often useful to determine how your server interacts with other systems.

Utilize configuration management systems, such as SMS, to do some of the heavy lifting for you. Create reports and keep them up to date in an alternative location, either a paper copy offsite or an electronic one. Configuration problems seem to be a killer when recovering because changes sometimes get applied without strict control. What seems like a small change can kill you in a disaster when it hasn't been documented.

Documenting the infrastructure goes beyond your own servers, but is just as important when it's time to troubleshoot. You can bring your file server back and you can bring your application servers back, but if you don't have proper DNS or connectivity, no one will be connecting to those systems you've recovered. If you have dependencies on other systems, you need to identify them. Know what names should be in DNS, which IP addresses and subnet you are on, and which systems you interact with, such as database servers, the DMZ, Internet access, or other back-end services. When you tell a database administrator that your application is getting SQL errors, you should know what database server, database, port, connection type, and authentication type you are using. You should also know the user name and password being used, if there is one. Does the server break down into pieces? Does it have multiple applications or functions? Document those functions separately.
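Dependency documentation is easier to keep honest when it is also machine-readable. The sketch below, with hypothetical host names and a hypothetical Dependency record, stores each documented dependency and reports the ones whose DNS names do not resolve. It checks name resolution only; the port is recorded for documentation and is not probed.

```python
import socket
from dataclasses import dataclass


@dataclass
class Dependency:
    purpose: str  # what the dependency is used for
    host: str     # DNS name that must resolve at the recovery site
    port: int     # TCP port the service listens on (documented, not probed)


def unresolved(dependencies, resolve=socket.gethostbyname):
    """Return the documented dependencies whose DNS names do not resolve."""
    missing = []
    for dep in dependencies:
        try:
            resolve(dep.host)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(dep)
    return missing


# Hypothetical entries; replace with the names from your own documentation.
documented = [
    Dependency("orders database", "db01.example.internal", 1433),
    Dependency("intranet web front end", "web01.example.internal", 80),
]
```

Calling unresolved(documented) at the recovery site gives a quick list of names that still need DNS records. The resolve parameter exists so the check can be exercised with a stand-in resolver during a test.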

You can't think of a server as a single system if your customers don't see it as a single function. Remember that restoring an infrastructure means restoring many pieces of a whole, and you should not expect those pieces to work as reliably as they do in a production environment. In fact, when you face an issue in production, it usually has a single root cause, but a disaster recovery will usually experience several major issues at the same time. You need to know where you stand in the ecosystem of your environment to understand how to identify and help fix those issues.

Identifying Single Points of Failure
If you have a single point of failure, you are not ready for a disaster. A single point of failure can ruin your nicely laid out plans. Although it is not a requirement for disaster recovery, the 'N + 1' approach often used in disaster recovery planning means many components backed up by a single spare component. You can still run into problems using N + 1, especially at a cold site where you have not been exercising your disaster recovery equipment to ensure its health. You might consider having additional servers of a similar capacity available above the minimum number required to recover just in case you experience a failure at your recovery site.

An optimal solution will have redundancy built in to your recovery site the way you have it outfitted at your production site. If you have a failover cluster in one location, you would do the same in the recovery site, even though you could technically get by with a single server, assuming that server functions as expected. You should also consider the interdependencies of your infrastructure, such as network, when you think of this issue. Single switches, routers, domain controllers, and sources of power can also be points of failure.

Single points of failure don't stop at the system level. You might have that one guy or gal who knows everything about your environment. When you're at that person's desk and something goes wrong with the system or a specific application, he or she always has the answer. This is a good person to have, but when it comes down to it, you can't rely on a single person. When a disaster strikes, the go-to person may not be available during the recovery phase, yourself included. When everyone looks around and throws up their hands because such and such is down, what do you do? You wish you could go back in time and document that ingrained knowledge. This is also true for day-to-day operations, but especially necessary when everything is going wrong because of a disaster. The person who knows it all is not what you need; you need full documentation of the knowledge that person possesses. Your go-to should really be your documentation.

Integrating Disaster Recovery into Daily Life
If you don't integrate disaster recovery into your daily operations, you are not ready for a disaster. Organizations that plan for disaster recovery as a single project with a start and an end will fail. Don't let the hard work go to waste. When you put these plans in motion, get all that documentation done, have recovery solutions in place, and continue to update your documentation and test your systems. If you don't test your disaster recovery process regularly, how do you know it will work? If you don't update your documentation day-to-day when changes are made, your documentation is outdated and may even be detrimental to your recovery efforts. Don't let apathy or a disconnected process of change management get you in the end. Not only does integration help your readiness, it reduces the dedicated time needed to get ready for disaster recovery. Find a way to make what you use in disaster recovery a part of daily life.

Eric Beehler has been working in the IT industry since the mid-90s and has been playing with computer technology well before that. From Help desk technician to solutions provider, he has been involved at many layers of enterprise solutions from the desktop to the network to the server and the SAN. He currently has certifications from CompTIA (A+, N+, Server+), and Microsoft (MCITP: Enterprise Support Technician and Consumer Support Technician, MCTS: Windows Vista Configuration, MCDBA SQL Server 2000, MCSE+I Windows NT 4.0, MCSE Windows 2000, and MCSE Windows 2003). He also holds a Master’s degree in Business Administration from the University of Colorado at Colorado Springs. His experience includes more than nine years with Hewlett-Packard’s Managed Services division, working with Fortune 500 companies to deliver network and server solutions and, most recently, IT experience in the insurance industry working on highly available solutions and disaster recovery. He has co-authored books, including MCITP: Microsoft Windows Vista Desktop Support Enterprise Study Guide (Sybex/Wiley Publishing), authored several white papers, and co-hosts the "CS Techcast" podcast aimed at IT professionals. He provides consulting and training through Consortio Services, LLC.

For additional information about Disaster Recovery and High Availability topics, be sure to check out Marathon's Resource Center, which has an extensive library of white papers, webinars and eBooks available for download.

 

Disaster Recovery  Availability  Business Continuity  Disaster Tolerance  High Availability 




Thursday, June 17th, 2010 - 1:37 pm EDT

How to Cut Risks and Costs with a Downtime Analysis & Action Plan

Posted by: Michelle Liro

Earlier this week, we hosted a webinar on the topic of “How to Cut Risks and Costs with a Downtime Analysis & Action Plan.” We know from our experience in application availability that many companies avoid these types of assessments – they either don't know where to start or decide that they don’t have the time or experience to conduct an assessment, so they just live with the unknowns and hope that nothing bad happens. (We’ve seen the consequences of downtime at many companies and don’t recommend this method!)

Our VP of Services & Support, Beth Shea, explored this topic in detail and provided a simple framework that companies can use today to uncover their risks and put measures in place to minimize the impact of downtime. To learn more, be sure to watch the 30-minute webinar. You can also check out the Q&A session from the webinar, summarized below.

Q: When looking at the impact of downtime, is it just unplanned downtime, or should you include planned downtime as well?
You absolutely need to plan for both planned and unplanned downtime, as there’s a real cost and business impact to both. They both need to be included in your impact assessment.

Q: What about branch offices – should they be included in a downtime assessment?
According to Forrester Research, about 20% of a company’s business is tied up in branch and remote offices, and IT needs to include these offices in any assessment that they are conducting. You shouldn’t overlook these offices when putting together your downtime and business impact assessments. They have to be factored in.

Q: How often should I conduct a business impact and risk assessment?
What we’ve found with our customers is that conducting an annual assessment is sufficient, or in some cases, twice a year, depending on the type of business. You can then use these as your benchmark going forward to determine the success of the initiative and ensure that you have the key metrics to report to your management team.

Q: How do you determine when to use local high availability vs. a disaster recovery solution?
Fault tolerance, high availability, disaster recovery - all of these different terms can be confusing and they can have different meanings to different people. The way we think of this is that when you’re implementing high availability or fault tolerance this is to ensure that locally you are protected against the everyday, nuisance failures that cause downtime. If you lose a fan or a drive for example, you would automatically route to another server within the same building or local area. Disaster recovery solutions are really for recovery from catastrophes (fire, flood) or other events where you need to failover to a much more distant location. You don’t want to use this type of solution for everyday failures, as it can be very time consuming to failover and failback, and you can potentially lose some data. For local protection, you want high availability/fault tolerant solutions.

Q: What about hosted applications like salesforce.com, how do I account for those in this type of assessment?
In today’s world, many applications are offered as Software-as-a-Service (SaaS), sometimes called hosted applications, and are no longer hosted at your site. However, they are still important to your business and need to be included as part of your overall assessment. Our approach is to conduct the assessment for your SaaS applications as if they were all onsite. Then use your tiered analysis and make sure that your SaaS vendor is meeting your availability requirements for that application, and that they have the necessary protections in place to protect that application to the same level that you would if it were in-house.

Q: Does Marathon offer any services to conduct this type of assessment?
Yes, this is a service that we provide for our customers. Most customers are very satisfied with the service, because it usually has an immediate ROI for their business. If you are interested in this type of service, please feel free to call us at 978-489-1100.

Q: Does Marathon have any templates available to build a framework for this type of assessment?
Absolutely. From our 16+ years of working with customers on the assessment and prevention of downtime, we’ve put together an extensive list of questions to ask about the business risks and impact of downtime. Please feel free to contact us if you would like more information.

Q: How do you measure or put a price on the intangible impacts of downtime?
This can be tough to nail down, but what we recommend is developing some basic estimates. This isn’t meant to be an exact number. What we are really trying to achieve here is to prioritize applications, put them into the tiers that we discussed, and make sure that you are putting the right amount of resources against the right applications. From a productivity perspective, one metric you could use is the cost of employee salaries: how much it would cost in salary to have employees unable to work for a certain amount of time. This is just one example.
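The salary-based estimate mentioned above can be sketched in a few lines. The head count, hourly cost, and the share of work that actually stops are made-up inputs, and the function name is hypothetical; the goal is a rough figure for ranking applications, not an exact number.

```python
def downtime_salary_cost(employees, hourly_cost, hours_down, productivity_lost=1.0):
    """Rough salary cost of employees who cannot work during an outage.

    productivity_lost is the fraction of work that stops (1.0 = all of it).
    """
    return employees * hourly_cost * hours_down * productivity_lost


# Hypothetical outage: 50 employees at $40 per hour, idle for 4 hours.
print(downtime_salary_cost(50, 40.0, 4))
# Same outage, but staff can still do about half of their work.
print(downtime_salary_cost(50, 40.0, 4, productivity_lost=0.5))
```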

Q: Does everRun handle quick switch over to back up site if the main site goes down?
Yes, within seconds.

Q: What are the requirements for the backup site?
The machines at the backup site are in the same pool as the primary site, so the backup machines must meet the requirements to be in the same pool as the primary site machines.

Q: How about regular data sync between main site and backup site?
Since the primary and backup site are running in lockstep mode, the application and the data are always in sync between the primary and backup sites.

Downtime  Availability  Disaster Recovery  Fault Tolerance  High Availability  Interview  Webinar 




Monday, May 24th, 2010 - 11:58 am EDT

The Changing Dynamics of Data Protection

Posted by: Michelle Liro

Frank Ohlhorst, former Executive Technical Editor for eWeek and award-winning IT expert, was our expert guest speaker this week for the webinar, “Cut Your DR Costs and Get Better Data Protection.” During his presentation, Frank reviewed why he believes that now is the time to rethink traditional approaches to disaster recovery. He explained why the total cost of ownership for disaster recovery solutions is on the rise, and why changing data protection dynamics are making it more economical to focus your time and budget on the prevention of downtime and data loss, rather than recovery.

Below is the summary of the audience questions from the Q&A portion of the webinar.

Q: You talked about how HA can give you a geographic advantage. What do you mean by that?
Frank Ohlhorst: High availability systems are designed to work with multiple servers and there’s no reason why you can’t have those servers located hundreds or thousands of miles apart. You get a geographic advantage because your data centers are in multiple places and regional areas, so if a weather-related or other event occurs, let’s say a blizzard up north with a power outage, your data center down south can pick up the slack without kicking users off the system. The same can be said about a data center located in an area with hurricanes or other natural disasters. The geographic separation gives you added protection.
When high availability is paired with load balancing, it helps to locate the data resources closer to where the users are requesting them. Let’s say you have users in Utah, it’s better performance-wise to have them talk to the data center in Nevada rather than Virginia. It helps on that level also. HA solutions also have the tools for monitoring what is going on with your users and network, to help you plan out how you should assign users to specific data centers for the most efficiency.

Q: I understand how high availability can handle unplanned downtime, but what about planned downtime? Can it help there as well?
Frank Ohlhorst: Yes. The idea is that because you have multiple active systems to meet the users’ needs, you can take one of those systems down for maintenance and have the users serviced by the remaining active machines while you make the updates and improvements. Then when you are done, just resynchronize with the other systems, move the users over to those systems and update the rest of the servers.
Another great benefit of this is for testing upgrades and changes. So take one system offline and test your upgrades to see if they work properly before you return that system to production.

Q: If I have an HA solution in place, is back-up still necessary?
Frank Ohlhorst: 99% of the time the answer to that question is yes. It depends on what your corporate needs are. There are certain situations where HA might not deal with your catastrophe. Those are usually software-damaging events, like a virus infection, that wind up getting replicated across the system. Of course, that should really be part of your security planning to prevent events like that from even happening. With today’s security technologies, it’s pretty easy to prevent that. But if you did ever have one of those events, you do need something to roll back to, and that’s where the back-up comes into play. Ideally though, you should be preventing that type of event, because you also have the potential to lose active data if that happens. When it comes to compliance or auditing, you have to restore data relevant to that time period to meet the needs of e-discovery, compliance, accounting audits and other similar requirements. So you can’t just say, “I have HA in place, so I don’t need to back-up.”

Q: What about data de-duplication technologies, don’t they help solve this problem of managing large volumes of data?
Frank Ohlhorst: They reduce the data footprint for sure, but what we’re talking about here is availability of the data. They can certainly reduce the size of your data footprint, and you can use de-dup to speed up backups. At the end of the day though, if the system or application is not accessible to the user, then it’s not available and you haven’t met your objectives. It’s a simple matter of business logic that data de-duplication can improve performance and reduce the size of the footprint, but it doesn’t solve the problem of providing access to users during catastrophic events.

Q: Do you see continuous availability and high availability as the same, and if so, how do you differentiate between the two and the costs?
Frank Ohlhorst: There was a time when those technologies were very, very different. That was way back when we relied on expensive hardware-based solutions or appliances that provided continuous availability. High availability at that time was thought of as a method to switch from one server to another using a manual process in the case of an emergency.

High Availability technology has evolved significantly since then. Now, the two are really one and the same from a planning and software point of view. Today’s HA solutions eliminate that step of manual switchover. What you see with the vendors today is automatic HA technology that really delivers continuous availability. And the cost gap today is pretty much zero, since the technology for continuous availability and high availability has evolved to be almost one and the same.

Q: With an SRDF/S-type solution, how can we get around the fact that being geographically more separated to mitigate regional disruptions can mean slower primary system response times due to the need to remain synchronous?
Frank Ohlhorst: Let’s look at this first from the ideology of what we’re trying to do which is business continuity. So, if you encounter a situation when you lose connectivity to a system and it’s still available at another location, then you’ve met the goal there of providing continuity. And you’re in much better shape than you would be at that point if you had a disaster recovery solution instead of a business continuity solution.

The question you have to ask yourself at that point in time is: Is reduced performance better than no performance at all? For most businesses, the answer is yes. For others, if the performance lag is significant enough it can impact business. In those cases, you’ll have to work out a way to develop geographically dispersed sites that can provide enough performance to the user sets that need access to the data. You also need to make sure that your connectivity has enough bandwidth to support your BC/HA solutions, which means the ability to replicate the data in real time across the wire. You might have to invest in larger pipes for better connectivity to support that. But again, that depends on your particular business and your needs. There is no one correct answer to this question, but the good news is that there are several solutions today that can help you solve this problem and meet the levels of availability that you need for your business.
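To put rough numbers on the distance penalty raised in the question, the sketch below estimates the minimum delay added to each synchronous write. It assumes light travels through fiber at about 200,000 km per second (roughly two-thirds of its speed in a vacuum), a straight-line cable path, and a single round trip per write. Real links add switching, protocol, and storage overhead, so treat the result as a lower bound only.

```python
FIBER_KM_PER_MS = 200.0  # assumed signal speed in fiber, about 200,000 km/s


def min_round_trip_ms(distance_km):
    """Lower bound on the round-trip delay between two sites, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Hypothetical site separations in kilometers.
for km in (50, 500, 2000):
    print(km, "km ->", min_round_trip_ms(km), "ms per synchronous write")
```

Comparing this floor with the write latency your applications can tolerate gives a first indication of how far apart synchronously replicated sites can be.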

Disaster Recovery  Availability  Business Continuity  Continuous Availability  Data Replication  Disaster Tolerance  Fault Tolerance  High Availability  Interview  Webcast  Webinar 




Wednesday, April 28th, 2010 - 10:48 am EDT

High Availability for the Masses

Posted by: Michelle Liro

LAN Magazine, a leading IT publication in the Netherlands, recently published an in-depth review of everRun 2G titled, “High Availability for the Masses.” You can read the review (in English) here. Bram Dons, the reviewer and author of the article, had this to say about everRun 2G:

"Although recovery certainly plays a role in high availability, it is much more about preventing downtime and data loss. Marathon Technologies shows this with the new everRun 2G."

"In all evaluation tests we have conducted to date, Marathon’s everRun 2G is the only software product that manages to provide complete continuity in all circumstances. Failure of system components, power interruptions: the product does not fail."

To read additional reviews of everRun from leading IT publications like eWeek, Network Computing UK, and IX Magazine, check out our product reviews page.

EverRun  High Availability 




Monday, April 26th, 2010 - 1:38 pm EDT

10 Common Mistakes Made by Disaster Recovery Teams

Posted by: Michelle Liro

The application availability experts here at Marathon were asked to put together "10 Common Mistakes Made by Disaster Recovery Teams" for a featured slideshow for ITBusinessEdge.com. These 10 common mistakes are summarized below:

1. Confusing HA and DR
A lot of companies confuse high availability (HA) and disaster recovery (DR), or implement a DR solution when they really need HA. Put simply, HA is about preventing the everyday failures that cause downtime (network card failure, storage corruption), while DR solutions are designed to help you recover from true disasters (floods, hurricanes), not minor problems.

2. No specific disaster recovery plan
Implementing disaster recovery software or speaking broadly about “what-ifs” is not enough. The IT team must be well versed in a set plan which has been tested and proven effective. IT staff, as well as upper-level management, should be trained in the DR protocols in the case of any business disruption. In the event of a disaster, team members should already be familiar with the plan and not rely on in-the-moment decision making.

3. Untested disaster recovery plan
While testing the plan may not mean that it will go off without a hitch, it is an important step in preparing the company for a disaster. After testing, improvements should be made and the plan should be scrutinized for any possible holes.

4. Only involving the IT team in the planning process
Disasters affect the entire business, not just your IT infrastructure. Representatives from all company departments should be involved in the planning process and should know their role in the event of a disaster. In addition, it is imperative to train company executives and decision makers in how to carry out the plan. They should be aware of all protocols and be involved in testing exercises.

5. Adding too much complexity
Many technologies actually introduce complexity into the IT environment. For example, clustering technologies may require administrators to painstakingly maintain each server in the cluster to support successful failover. IT organizations instead should find and embrace those technologies that reduce complexity for operational staff—thereby eliminating potential sources of human error.

6. Purchasing inexpensive, low-quality hardware
While it is tough to justify shelling out the extra dough for a top-of-the-line server, it is well worth it on the day that your processor fails. Many IT staffs are working with constrained budgets and therefore have to buy lower-priced equipment. This equipment is more likely to fail, increasing the likelihood of future problems.

7. Using common components in the physical network hardware
For example, dual-ported network cards share common hardware logic, and a single card failure can disable both ports. For full redundancy, you need either two separate adapters or a built-in network port combined with a separate network adapter.

8. Utilizing on-site data replication
Many factors can cause site-wide failures, including an air conditioning failure or leaking roof, a power failure, or a major hurricane. Site disruptions can last anywhere from a few hours to days or even weeks. There are two methods for replicating data across sites. One method is to tightly couple redundant servers across high speed/low latency links, to provide zero data-loss and zero downtime. The other method is to loosely couple redundant servers over medium speed/higher latency/greater distance lines. This provides a disaster recovery capability where a remote server can be restarted with a copy of the application database missing only the last few updates. In the latter case, asynchronous data replication maintains a backup copy of the database.

9. Implementing a plan that worked for someone else
DR/HA is not one-size-fits-all. Every business has different objectives for different applications. It’s ok to look to others for guidance, but stay focused on your specific goals.

10. Not understanding business requirements
What exactly is it that you need to accomplish? Implementing wrong or incomplete solutions can waste time and money. Know what clients and users need and adjust the DR plan based on the service levels that need to be met.
 

Disaster Recovery  High Availability 




Thursday, April 15th, 2010 - 5:38 pm EDT

High Availability on a Budget Q&A

Posted by: Michelle Liro

Earlier this week, we had a great webinar featuring guest speaker Greg Cullen, Sr. Director of Technology at Marathon. Greg provided tips and advice that government agencies can use to ensure optimal availability for their critical applications at minimal cost. He also reviewed three different government agencies that protected their data and applications from downtime with Marathon’s everRun software.

These included the Brookline Police Department, which kept its 911 system available, even during a hardware failure on Christmas Day when no IT staff were on duty; the City of Santa Rosa utilities department, which protects its water treatment facility against outages from earthquakes, power failures and other natural disasters; and a county court system in California that protects its virtualized Exchange server and other paperwork processing applications from downtime.

The transcript of the Q&A session with Greg from the live webinar is below.
 

Q: Can you deploy the two everRun servers in different locations or do they have to be in the same location?
You absolutely can. You can deploy the two servers in the same room, or different rooms in the same building and even in two separate buildings. We call that configuration SplitSite. That’s one way to get disaster tolerance in the event that you have a site-wide outage, rather than having them in the same room or building.

Q: Is there a limit to how far apart the servers can be?
In general, they can be separated by as much as 100 miles, although it really depends on your bandwidth and latency on the connection between the two sites.
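
To put the distance figure in perspective, here is a minimal sketch of the best-case network round-trip time over a fiber link of a given length. It assumes light travels through fiber at roughly 200 km per millisecond (a common approximation) and that the cable follows a straight line; real routes are longer and add equipment delay, so treat the result as a lower bound. The function and constant names are hypothetical, and nothing here describes everRun's actual requirements.

```python
MILES_TO_KM = 1.609344
FIBER_KM_PER_MS = 200.0  # approximate speed of light in optical fiber (assumption)


def min_round_trip_ms(distance_miles: float) -> float:
    """Best-case round-trip time in milliseconds for a straight fiber run."""
    distance_km = distance_miles * MILES_TO_KM
    return 2 * distance_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    # 100 miles is the separation mentioned in the answer above.
    print(f"Best-case round trip: {min_round_trip_ms(100):.2f} ms")
```

For 100 miles this works out to about 1.6 ms before any switching or routing delay, which is why the answer hinges on the measured latency of your own connection rather than on distance alone.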

Q: What are the hardware requirements for everRun?
You’ll need Intel-based servers that have a moderate amount of memory, or as much as the application requires. And you’ll need enough networks to handle the production side as well as maintain the redundancy between the servers. Generally speaking, having four network adapters in each server and somewhere on the order of 100GB of disk space is sufficient.

Q: How does the everRun software compare with a clustering solution?
One of the biggest differences between everRun and clustering is that everRun is a single image for the application. Instead of installing and managing two instances of your application like you do with clustering, everRun is just a single image to install and manage. Changes happen on both servers simultaneously through that single image.

Also, everRun software does not require cluster-aware applications. everRun is application agnostic, and can support almost any Windows application. And one more thing, with most clustering solutions, you also need to have a shared storage container that both servers are connected to. everRun can support that model as well, but doesn’t require it like clusters do. In fact, to remove single points of failure, it’s much better to have local storage connected to each of the servers and everRun will manage that storage as a mirrored device.

Q: I’m confused by your use of DR. Can you define what you mean by disaster recovery?
We’ve found that everyone has a different definition of what they mean by disaster recovery. At a very high-level, we see disaster recovery as the need to protect your data. By comparison, we see high availability as the need to protect your application, data and network connectivity. DR means you’re trying to copy your changed data off site to protect it in the event of a true disaster. After the “disaster” is over, you then need to bring that data back to the primary site, or configure an alternate server to use the data in the DR site.

Q: Does everRun work with Siemens building security systems?
Yes, we have been working for several years with building automation and security companies including Johnson Controls, Tyco, Andover Controls, Siemens and many others. As long as the building system runs in Windows Server 2003 or 2008, we can provide availability for it with no custom scripts or custom coding. We have many deployments of everRun protecting these building security systems around the world.

Q: Does everRun work with e911 systems?
Yes – absolutely. Generally speaking, everRun is application agnostic and can work with almost any Windows application. We have many solutions out there where these emergency 911 centers are protected by everRun so that if there is some type of disaster, these systems continue to run.

Q: Is everRun available on a GSA schedule?
Yes, through our channel partners. Contact your Marathon account representative or call 978.489.1100 for specific partner information.

Q: How does everRun differ from data replication solutions?
A lot of times when people look at availability, they simply try to replicate the data. There’s a big issue with that though. That’s only one part of what you need to recover in the event of a failure. everRun not only replicates the data, but also keeps a redundant set of your application environment and network connectivity, and everything else that is required for the application to not see any failures at all, or to recover very quickly in the event of certain types of failures.
 

High Availability  EverRun  Interview  Webinar 




Friday, April 2nd, 2010 - 1:29 pm EDT

Welcome Thomas Goebels

Posted by: Marathon Technologies

Thomas Goebels recently joined Marathon as sales manager for the DACH (Germany, Austria and Switzerland) region. Thomas will be responsible for growing Marathon’s business across the region with a particular focus on the channel and end users. Thomas brings impressive sales experience and a remarkable track record to his new role, having previously held positions with NetScout Systems, PC-WARE Information Technologies AG, Bechtle and Novell. Thomas will be working closely with Marathon’s distributor in Germany, ADN. ADN has been Marathon’s key distributor for the DACH region since 2007.

“The need for data and application availability has never been greater as companies of all sizes now have to operate on a 24x7 global basis. Marathon is uniquely qualified to provide solutions to meet the customer pain points associated with this,” said Thomas. “I am eager to build on Marathon’s vision of providing automated high availability and disaster recovery solutions for businesses of all sizes.”

Thomas can be reached at tgoebels (at) marathontechnologies.com
 

Announcements  Marathon 




Thursday, March 18th, 2010 - 11:16 am EDT

Automation Webinar Q&A

Posted by: Michelle Liro

Earlier this week, Craig Resnick, research analyst from ARC Advisory Group, joined us to discuss Best Practices for Preventing Downtime in Automation Systems. Craig's presentation was very well received, with several attendees commenting on the high quality of the information Craig provided. If you haven't had a chance to see it yet, the on-demand recording is here and the recap of the Q&A from the webinar is below.

Q: Has the hierarchy at manufacturers changed where the groups that manage these different domains have converged, or are they still separate?
Craig Resnick, ARC Advisory Group: Over the last five years, we’ve seen the convergence of IT with the automation and operations groups. Five years ago we used to joke about the “civil wars” between these groups. IT used to poke fun at the factory floor about the age of the equipment, which can be 10, 20 or even 30 years old in some cases. The factory floor used to poke fun at IT because, as they put it, IT didn’t understand what “real-time” means. We’re finding now that there are many initiatives between these groups to converge different processes at different levels. This is an ongoing process that will take a while, but from what we’ve seen, once the convergence is made, it usually has very positive results for the business.

Q: Is everRun tested and approved by Siemens, Rockwell, etc.?
Yes. everRun works with a number of different automation systems and applications from Rockwell, Siemens, Johnson Controls, Dematic, Wonderware and many others. We’ve done qualification and certification testing with many vendors in the automation space. Because of the way that everRun is designed, it is almost transparent to the application, so we really can work with most vendors and have a very quick validation/certification process.

Q: Will a TCP connection from a SQL client to a SQL server be maintained through a failover?
At Marathon, we take a different approach to application availability. It’s not about failover and recovery, it’s about keeping systems up and running, even during a failure, with no impact to the users or the data. Failover isn’t something that we really do. We can actually maintain those connections, even with a failure, at all times if that’s what you need. We can maintain all connectivity, transparent to the user and the IP connections, and keep the system states intact.

Q: Does everRun work in both physical and virtual environments?
Yes, everRun works in both physical and virtual environments. We can protect both single and multiple workloads.

Q: What is the typical integration period to get everRun up and running at a site?
A typical engagement is about 2-3 days. The software itself installs very quickly, and after that there is the deployment and migration of applications, testing and training. We provide these services through our everRun ONE program.

Q: What is the typical overhead of everRun?
That will vary based on the application. Anywhere from 5-15% depending on the characteristics of the applications – storage intensive, I/O intensive, etc. But 5-15% is a typical estimate.

Q: Are the partnerships validated in both physical and virtual environments? We use the Dematic voice picking application.
We do support Dematic applications in both physical and virtual environments. Some of our vendors have only tested physical, some virtual and some both. Our technology is very similar for both physical and virtual, and in most cases will work with most applications in both. If you have a specific application that you would like to check on, just give us a call.

Webinar  Interview  Manufacturing  SQL  Virtualization 




Tuesday, March 9th, 2010 - 10:52 am EST

Q&A with Craig Resnick of ARC Advisory Group

Posted by: Michelle Liro

Next week Craig Resnick, research director and automation expert at ARC Advisory Group will be the guest speaker for our webinar "Best Practices for Preventing Downtime in Automation Systems."  We recently sat down with Craig to discuss some of the recent trends in the manufacturing and automation industries.

Q: What are some of the newer trends that you are seeing in the automation space?

Craig Resnick: A primary trend that we see at ARC is the convergence of automation and IT systems. Nearly every manufacturing company uses a variety of plant automation and enterprise IT systems to manage its operations. Plant floor systems, such as distributed control systems (DCS), programmable automation and logic controllers (PACs/PLCs), and a wide range of plant floor applications provide a wealth of real-time information regarding productivity, efficiency, equipment health, capability, and quality. Business systems, in turn, provide information on raw material costs, product orders and inventories, manufacturing resources, production schedules, etc. This wide range of information often remains isolated in systems such as manufacturing execution systems (MES), laboratory systems, maintenance systems, scheduling systems, enterprise resource planning (ERP) systems, supply chain management (SCM) systems, and customer relationship management (CRM) systems. Decisions based on data from any one of these systems will always be less than optimal because, without the corresponding information from the other systems, the information will be incomplete.

To close this gap between automation and IT systems, and to address the trend of the plant floor becoming more IT-centric, ARC has defined a new space called Collaborative Production Systems. These new systems consist of platforms in which the controls layer domains of process, logic, motion, building automation, and power control systems converge with the information layer domains of production management and MES systems. These converged systems enable, for example, the required data and information to be directly tied into applications such as corporate reporting and manufacturing compliance. Collaborative Production Systems will become the industrial blade server that provides full monitoring and control of the enterprise, from the office to the plant floor, sharing that information with the supply chain to, for example, procure materials and resources and purchase or sell power at the optimal times and prices from the smart grid, while providing full financial metrics and KPIs to ERP systems to maximize profitability.


Q: Now that corporate reporting and systems are heavily tied into the “factory floor”, how is that changing the need for system availability and data protection?

Craig Resnick: The need for system availability and data protection continues to expand, driven by a combination of issues ranging from global competition to regulatory requirements. Process safety and critical control are primarily focused on system availability and process uptime. As a specific example, take the pharmaceutical industry, where data and batch information can never be lost or interrupted. System availability and data protection needs are also forcing e-records regulations to evolve across the globe. In the US, this includes 21 CFR Part 11, as well as the FDA’s Good Manufacturing Practice (GMP) and Process Analytical Technology (PAT) initiatives. In Europe, this includes Annex 11 of the EU GMPs, Electronic Signatures Directive 1999/93/EC, and Data Protection Directive 95/46/EC. The European Data Protection Directive requires even more protection on data than the current FDA regulations and extends this requirement to clinical trial patients, as all clinical trial data requires maximum protection to remain compliant with regulations.

Unscheduled downtime is expensive. It often impacts production’s ability to meet its schedule and may cause missed customer commitments. Unplanned downtime, which also includes unexpected stoppages resulting from equipment failure, operator error, or nuisance trips, is the nemesis of all manufacturers. Statistics on the impact of unplanned downtime on plant operations show that it accounts for 2 to 5 percent of production lost in, for example, the petrochemical industry. Unscheduled downtime is also costly in terms of equipment damage, environmental harm, and worker safety. The cost of downtime is reflected in a primary key performance indicator (KPI) used by manufacturers known as Dynamic Overall Equipment Effectiveness (OEE), which helps determine the real-time impact of the performance of any individual process or piece of equipment on the overall efficiency of the plant. Unscheduled downtime is a primary factor that significantly lowers Dynamic OEE, which translates to the manufacturer decreasing both its efficiency and profitability.

Q: What are some of the basic steps that companies can implement to ensure the availability of their systems?

Craig Resnick: The first step that companies can implement to ensure the availability of their systems is to maximize their operators’ effectiveness in the control room, which is essential to minimize the risks of accidents, eliminate unscheduled downtime, and maximize production quality. The global process industry loses $20 billion, or five percent of annual production, due to unscheduled downtime and poor quality. ARC estimates that almost 80 percent of these losses are preventable and 40 percent of those preventable losses are primarily the result of human or operator error. Maximizing operator effectiveness requires automating as many functions as technology will allow, as well as reducing complexity wherever possible. For example, there are still many plants where operators monitor the processes and collect data manually or semi-automatically using chart recorders. This process is both tedious and error prone, and does not provide appropriate process insight or instill a sense of ownership among the control room operators.
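
The loss figures quoted above can be worked through in a few lines. This is only a sketch of the arithmetic: it treats "almost 80 percent" as exactly 80 percent, which is an assumption, and the function name is hypothetical.

```python
def loss_breakdown(total_bn: float, preventable_share: float, human_error_share: float):
    """Return (preventable losses, losses from human error), in the same
    units as total_bn. human_error_share applies to the preventable part only."""
    preventable = total_bn * preventable_share
    human_error = preventable * human_error_share
    return preventable, human_error


if __name__ == "__main__":
    # $20 billion total; 80% preventable (rounded up from "almost 80 percent");
    # 40% of the preventable losses attributed to human or operator error.
    preventable, human_error = loss_breakdown(20.0, 0.80, 0.40)
    print(f"Preventable: about ${preventable:.1f} billion")
    print(f"Human or operator error: about ${human_error:.1f} billion")
```

On those assumptions, roughly $16 billion of the losses are preventable and about $6.4 billion, or around a third of the total, trace back to human or operator error.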

The Abnormal Situation Management Consortium (ASM) points out that most incidents occur from multiple modes of failure. Preventable human error is a contributing factor to these losses, but is hardly the only cause. Preventing abnormal situations requires a multilayered, multi-discipline approach focused on maximizing production throughput, efficiency and quality while minimizing lost production time and preventing damage to assets and endangerment to personnel. This approach requires deploying collaborative production systems designed and implemented to deliver the high levels of availability and fault tolerance expected from any other mission critical industrial system. This typically requires effective data backup mechanisms, redundant controllers for critical applications, plus industrial grade software. Manufacturers are also deploying more fault tolerant server technology to ensure continuous availability of these mission critical applications; the continuous flow of vital products to the market; and the avoidance of the potentially negative financial, social, or environmental impact that operating without high availability fault-tolerant systems might bring.

 

To learn more about preventing downtime in your automation applications, be sure to attend next week's webinar where Craig will provide expert info on steps for reducing the human error that leads to downtime, how to protect your hardware, storage and networks for complete availability coverage, and how to protect against a complete site failure. You can register here.
 

Manufacturing  Downtime  Fault Tolerance  High Availability  Interview  Webcast  Webinar 




Monday, March 8th, 2010 - 11:29 am EST

Best Practices for Creating Disaster Recovery Plans for Your SMB

Posted by: Michelle Liro

Marathon’s Sr. Director of Products, Michael Bilancieri, recently answered some questions for Paul Mah of ITBusinessEdge.com regarding disaster recovery planning for small & medium businesses. A few of Michael’s answers are highlighted below. For the complete Q&A with Paul Mah, see the article here.

Mah: Any tips to help SMBs with constrained budgets get management’s approval to implement a DR program?
Bilancieri: This may be the most important part of the process. Without support from the senior management team, any DR plan will be hard to get off the ground. The key takeaway here is to translate the technical language into business terms.

Since DR is not primarily about the technology (it is about the business value), it is important to clearly express what downtime means in terms of revenue loss. By creating a chart, organized by each application, it is easy to clearly articulate how much revenue is lost across each application for a certain amount of time.
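
One way to build the kind of chart described above is sketched here. The application names, revenue-per-hour figures, and outage durations are entirely hypothetical placeholders; substitute your own numbers. The calculation also assumes revenue is lost at a constant hourly rate, which is a simplification.

```python
def downtime_cost_table(revenue_per_hour: dict, outage_hours: list) -> dict:
    """Map each application to the revenue lost for each outage duration."""
    return {
        app: [rate * hours for hours in outage_hours]
        for app, rate in revenue_per_hour.items()
    }


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    revenue_per_hour = {"Order entry": 12000.0, "Email": 1500.0, "Reporting": 400.0}
    outage_hours = [1, 4, 24]

    table = downtime_cost_table(revenue_per_hour, outage_hours)
    header = "Application".ljust(14) + "".join(f"{h:>4} h lost".rjust(14) for h in outage_hours)
    print(header)
    for app, costs in table.items():
        print(app.ljust(14) + "".join(f"${c:,.0f}".rjust(14) for c in costs))
```

A table like this puts the conversation with management in terms of dollars per application rather than servers and software.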

Mah: What are the best criteria for determining an optimal disaster recovery plan?
Bilancieri: First, you have to identify what it is you need to accomplish. This includes defining the recovery time objective (RTO), which is the amount of time applications can be unavailable, and the recovery point objective (RPO), which is the amount of data that can be lost when a recovery is required.

Keep in mind that these values will likely vary for each of your different applications. Implementing incorrect or incomplete solutions will result in wasted time and resources. Check with your users and clients to determine their requirements and any service level agreements that must be met.
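
As a minimal sketch of how per-application objectives might be recorded and compared against a candidate solution, consider the following. All class names, application names, and numbers are hypothetical, and RPO is expressed here as minutes of lost updates, which is one common convention rather than something stated in the interview.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    app: str
    rto_minutes: float  # longest tolerable outage
    rpo_minutes: float  # most recent updates that may be lost, in minutes


@dataclass
class Solution:
    name: str
    recovery_minutes: float   # how long the solution takes to restore service
    data_loss_minutes: float  # how far behind its copy of the data may be


def meets(objective: Objective, solution: Solution) -> bool:
    """True if the solution satisfies both the RTO and the RPO."""
    return (solution.recovery_minutes <= objective.rto_minutes
            and solution.data_loss_minutes <= objective.rpo_minutes)


if __name__ == "__main__":
    # Hypothetical objectives and a hypothetical nightly-backup solution.
    objectives = [Objective("Order entry", 15, 0), Objective("Reporting", 1440, 1440)]
    nightly_backup = Solution("Nightly backup", recovery_minutes=480, data_loss_minutes=1440)
    for obj in objectives:
        verdict = "meets" if meets(obj, nightly_backup) else "does not meet"
        print(f"{nightly_backup.name} {verdict} the objectives for {obj.app}")
```

The point of the exercise is that the same solution can be adequate for one application and inadequate for another.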

Mah: Once you determine exactly what your needs are, how do you select a plan?
Bilancieri: DO YOUR HOMEWORK! Seriously, there are so many different products that claim to be “DR” solutions, all approaching the problem from different angles, it can be very confusing to determine what actually does the job you are looking for it to perform. As you research different products to implement as part of your DR plan, be sure to ask specifically what their product does (copies just the data, takes data snapshots, captures complete images of the full system, etc.) and don’t be afraid to ask probing questions.

Many vendors make the same claims using the same terms but actually deliver very different results. If you are going to test these solutions in-house, which is recommended, try to do the test under similar conditions as your production environment, with similar system and application loads. Oftentimes, something works well in a test environment [where there is] no real processing happening, [but] fails to function adequately once deployed in the live production environment.

Mah: What would a DR plan look like for a company that may face natural disasters such as hurricanes and flooding?
Bilancieri: Since hurricanes and floods can cause severe damage that can result in long-term outages, it would be wise to implement a solution that protects your systems between locations that could not be affected by the same disaster. Ensure that the backup, or DR, site is planned for a location that can be readily accessible by your users and clients should the primary location be destroyed or otherwise inaccessible.

Marathon has a customer based in Georgia, The Sullivan Group, which implemented a disaster recovery plan for just this reason. The team decided to virtualize its data center with Citrix XenServer and implement Marathon's everRun VM solution to provide redundant virtual machines and synchronized mirroring of the entire system including network, applications and data. The Sullivan Group has a small IT staff but needs to be continuously available for its clients, so it needed a solution that was fully automated and offered simple implementation.

Their first step was to identify what their customers’ needs were - and they decided that they needed continuous protection. Second, the team determined exactly what they could afford, and the ROI they would see from implementing DR software. They already knew that they would constantly face the threat of storms, and that they needed their data to be backed up in a remote location. Finally, they determined exactly what solution their IT staff could support and decided exactly which business applications needed to be fully available.
 

 

Disaster Recovery  High Availability  Interview 




Monday, March 1st, 2010 - 9:22 am EST

Uptime in healthcare: Saving lives is key

Posted by: Michelle Liro

Nick Turnbull, Marathon’s VP of international sales, recently explored in an article published in HES magazine why hospitals and healthcare providers must suitably protect themselves against the possible failure of their IT systems to minimise disruption. An excerpt from the article is below. For more information on this topic, you can also download our recent white paper "Finding a Cure for Downtime: 7 Steps to Reducing Downtime in Healthcare Information Systems."

The management of administrative, financial and clinical aspects of a hospital rely on continuous uptime. Every single minute of downtime can jeopardise compliance, revenue and most importantly the health and wellbeing of patients. Downtime is a risk no hospital or healthcare provider can afford to take.

Hospital equipment and systems are designed for a variety of tasks. Some keep track of the administrative issues of a hospital; others look after clinical information systems that concentrate on patient-related and clinical-state-related data, such as the electronic patient record (EPR) and the monitoring of life support machines, MRI scanners, etc. Ensured access to all IT systems is paramount for the smooth running of a hospital and for quality patient care.

EPRs and the need for centralised patient information have seen regular attention in the media recently. EPR systems have the potential to bring huge benefits to patients and they are being implemented in health systems across the developed world. Storing and sharing health information electronically can help to speed up clinical communication, reduce the number of errors, and assist doctors in diagnosis and treatment. Equally, this kind of electronic data can also have vast potential to improve the quality of healthcare audit and research. However, increasing access to data through EPR systems also brings new risks to the privacy and security of health records, as well as practical aspects that need to be catered for in order to reap the full benefits of such a system without any disadvantages.

The importance of being able to access EPRs becomes apparent when looking at the accident and emergency setting or for example the cancer unit of a specific hospital. It is here that access to a patient’s medical history can become a matter of life and death.

It is clear that nowadays IT sits at the heart of modern hospitals, so it is key to ensure IT systems are available to healthcare professionals 24 hours a day, 7 days a week. This means that hospital management and their IT support need to be sure that the technology they deploy can monitor the entire system around the clock. So, for the last few years, hospitals and healthcare providers have turned to the latest and greatest technologies to support their systems, minimising the risk of disruptions to operations and ensuring availability.

The right solution needs to be 100% effective, which is only possible if it monitors and receives data continually, 24 hours a day, 7 days a week, with absolutely no hiccups. If this is not the case all the time, there could be lives at stake. If any unexpected downtime occurs, the ability to access records and the continued running of life support monitors and MRI scanners are at risk. Clearly, hospitals and healthcare providers need to ensure that their systems are adequately protected against any unexpected IT failures.

While undoubtedly many hospitals and healthcare providers are using various different systems, they should not forget that these systems need to be protected. If the servers behind the systems experience downtime, it could cause havoc with patient-facing devices and the EPR system. IT system availability is no longer an ideal – it is a necessity. That is why hospitals and healthcare providers need to be sure that when adopting the newest, ‘safest’ technologies, they also ensure that they come with rock solid availability. Continuous uptime for the healthcare industry is absolutely essential to ensure the success of the business as well as the safety of patients.
 

Healthcare  High Availability



Thursday, February 11th, 2010 - 3:47 pm EST

NEC Philips and Marathon Announce Partnership

Posted by: Michelle Liro

We’re very excited to announce our partnership with NEC Philips Unified Solutions today. Through this partnership, Marathon’s everRun software will provide high availability and fault tolerant capabilities for several of NEC Philips’ business communication systems, including the SIP@Net server and Business ConneCT and MA4000 applications. These new combined offerings will be available to channel partners and customers throughout EMEA.

From left to right:
Gerard Wubben, General Manager, Raxco Software; Nick Turnbull, VP International Sales, Marathon Technologies; Rafael Costa, VP Worldwide Sales Marathon Technologies; Yoshihiko Katsura, Senior VP Portfolio, Applications & Operations NEC Philips Unified Solutions; Paul Kievit, President NEC Philips Unified Solutions; Jim Welch, President & CEO Marathon Technologies; Benne van der Lugt, Director Enterprise solutions portfolio NEC Philips Unified Solutions; and Marco Koenen, Enterprise Business Manager NEC Philips Unified Solutions
 

Signing the agreement: Paul Kievit, President NEC Philips Unified Solutions and Jim Welch, President & CEO Marathon Technologies

Partners  Announcements  EverRun



Wednesday, February 3rd, 2010 - 4:38 pm EST

Top 5 Tips for Branch Office Application Availability

Posted by: Michelle Liro

Keeping your applications “always-on” for users is no easy task, and can be particularly tricky for branch or remote locations where you probably have little or no IT staff to support your efforts. Forrester Research senior analyst Stephanie Balaouras has been studying this trend and has pulled together the top 5 best practices for supporting application availability at remote and branch locations. She presented these during a webinar last month and we've also summarized them below.


TIP #1 – Don't Overlook Remote Location Availability

While this may seem like an obvious point, it’s actually very common for IT departments to overlook their branch and remote locations when it comes to application availability. You can’t neglect these offices in either your high availability (HA) or disaster recovery (DR) plans; you need a holistic approach to protect all of your business applications, no matter where they are located. This also means that you need to factor in these systems when planning your IT budget.

According to recent Forrester Research data, IT systems at remote and branch office locations account for more than 20% of your total infrastructure. They are critical to your business process and operations. Today, a lot of these locations don’t have HA or DR, and in some cases, they don’t even have basic back-up. Make sure that these offices and locations aren’t forgotten as part of your HA and DR plans.

TIP #2 – Classify Systems by Criticality

When developing your strategy for operational HA and DR, best practices include performing a business impact analysis. This doesn’t have to be a lengthy process: you just need to map the dependent systems for each business process, and then create a rough estimate of the cost of downtime for each. Once you have that information, you can determine availability rates as well as recovery objectives. As part of that process you should also identify the most probable types of downtime. When you put that all together, you can classify systems by criticality, such as mission critical, business critical, business supporting, etc., and you can then determine the availability rates needed for each of those systems.
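As a rough illustration of that mapping, the Python sketch below attaches an estimated hourly cost of downtime to each system and classifies it by criticality. The system names, cost figures and thresholds are all hypothetical; the point is only that a simple table and a few cut-offs are enough to get started.

```python
from dataclasses import dataclass


@dataclass
class SystemImpact:
    name: str
    business_process: str
    downtime_cost_per_hour: float  # rough estimate; the values below are made up


# Hypothetical cut-offs: minimum hourly downtime cost for each class, highest first.
THRESHOLDS = [
    (10_000, "mission critical"),
    (1_000, "business critical"),
    (0, "business supporting"),
]


def classify(system: SystemImpact) -> str:
    """Return the first class whose minimum hourly cost the system meets."""
    for minimum, label in THRESHOLDS:
        if system.downtime_cost_per_hour >= minimum:
            return label
    return THRESHOLDS[-1][1]  # negative estimates fall into the lowest class


# Made-up systems, each mapped to the business process it supports.
systems = [
    SystemImpact("order-db", "online sales", 25_000),
    SystemImpact("branch-file-server", "branch operations", 2_500),
    SystemImpact("intranet-wiki", "internal documentation", 100),
]

for system in systems:
    print(f"{system.name} ({system.business_process}): {classify(system)}")
```

The thresholds are the part to argue about with the business; once they are agreed, classifying a new system takes seconds.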

TIP #3 – Develop Tiers of Service for Availability

Once you understand your range of recovery objectives, it helps to have an IT availability and service continuity catalog. This catalog defines a range of service tiers. Forrester typically sees four levels: mission critical, business critical, business important and business supporting. Each of these tiers has associated recovery objectives, technology pre-requisites and the costs to deliver that service. This catalog helps to simplify your strategy, by allowing you to assign appropriate tier classifications to new systems quickly and easily.

Another benefit of using this method is that it also helps you to limit the number of point products you are using for HA and DR. The more point products you are using, the more you complicate the sequencing and complexity of preventing a failure or recovering from a failure. Keep it simple. Every time you deploy a new application or system, assign a tier from your catalog, put the appropriate protection in place, and then communicate that to the business.
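One way to picture such a catalog is as a small lookup table. In the sketch below, the four tier names come from the tip above, but every recovery objective and protection choice is a placeholder rather than a recommendation, and the `assign_tier` helper is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceTier:
    name: str
    rto_hours: float  # recovery time objective
    rpo_hours: float  # recovery point objective
    protection: str   # technology prerequisite for this tier


# Tier names follow the article; all numbers and protection choices are placeholders.
CATALOG = {
    tier.name: tier
    for tier in (
        ServiceTier("mission critical", 0.0, 0.0, "fault tolerance"),
        ServiceTier("business critical", 2.0, 0.25, "high availability failover"),
        ServiceTier("business important", 24.0, 4.0, "replication to a DR site"),
        ServiceTier("business supporting", 72.0, 24.0, "nightly backup"),
    )
}


def assign_tier(system_name: str, tier_name: str, assignments: dict) -> ServiceTier:
    """Record a tier for a new system; only tiers already in the catalog are allowed."""
    if tier_name not in CATALOG:
        raise ValueError(f"unknown tier: {tier_name!r}")
    assignments[system_name] = CATALOG[tier_name]
    return assignments[system_name]


assignments = {}
tier = assign_tier("branch-pos-server", "business critical", assignments)
print(f"branch-pos-server -> {tier.name}: RTO {tier.rto_hours}h, protection: {tier.protection}")
```

Rejecting tier names that are not in the catalog is what keeps the number of point products down: a new system has to pick from the existing options.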


TIP #4 – Measure Availability from the End-User Perspective

Well-written objectives measure both planned and unplanned downtime and also take into account the timing of downtime. For example, you don’t take your systems down for planned maintenance during peak sales periods or at 1pm on a weekday when your traffic is at its highest level. You select times when users will be least affected. Availability isn’t about the individual IT system, infrastructure or component. Technology uptime is important to track but is not a true measure of availability. True availability has to be measured from the end user perspective. If the application or service is not available for use, even if the individual components are functioning, then that means the service is down. When making decisions about HA and DR strategies, you have to look at availability from a people perspective, not a technology perspective.
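To make the end-user measure concrete, here is a minimal Python sketch that counts every window in which users could not use the service, planned or unplanned, against the reporting period. The function name, the dates and the outage windows are hypothetical, and the sketch assumes the outage windows do not overlap.

```python
from datetime import datetime


def end_user_availability(period_start, period_end, outages):
    """Fraction of the period during which users could actually use the service.

    `outages` lists (start, end) windows in which the service was unusable,
    whether planned or unplanned. Windows are assumed not to overlap.
    """
    total = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting period before counting it.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 1 - down / total


# Hypothetical 30-day month (720 hours) with three hours of user-visible downtime.
month_start = datetime(2010, 4, 1)
month_end = datetime(2010, 5, 1)
outages = [
    (datetime(2010, 4, 11, 2, 0), datetime(2010, 4, 11, 4, 0)),    # planned maintenance
    (datetime(2010, 4, 20, 13, 0), datetime(2010, 4, 20, 14, 0)),  # unplanned failure
]
print(f"{end_user_availability(month_start, month_end, outages):.4%}")
```

Note that the planned window counts against availability just like the failure does; what the sketch does not capture is timing, which is why the 1pm weekday outage hurts far more than the 2am one.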


TIP #5 – Make Availability Part of Every IT Decision

Availability is no longer an optional practice. It’s essential. It’s something you owe to your employees, your customers, your partners and your investors. Application resiliency has to be part of the planning process right from the start—HA and DR should not be an after-thought. Even in remote and branch locations, these applications are critical to the success of the business, so availability of the systems should be included during the planning phases of the project, rather than an add-on after the project is completed.

 

Availability  Disaster Recovery  High Availability



Monday, January 18th, 2010 - 8:52 am EST

Q&A with Forrester Analyst Stephanie Balaouras

Posted by: Michelle Liro

Last Thursday’s webinar “Application Availability for Remote & Branch Locations” with Forrester analyst Stephanie Balaouras was packed with useful tips and best practices for protecting remote and branch offices from application service disruption. Stephanie has conducted extensive research in this area and shared her Top 5 Best Practices during the webinar. A recording of the webinar is now available in case you missed the live event.

The summary of the webinar Q&A with Stephanie and Michael Bilancieri, Sr. Director of Products for Marathon, is below.


Q: I like the idea of integrating HA and DR plans. How often should those plans be updated?
A: Stephanie Balaouras, Forrester: The ideal scenario is to update your high availability and disaster recovery plans continuously as part of your change management and configuration management. They should be integrated into day-to-day operations and your plans should be updated as a part of that. If that’s not feasible, then the plans should be updated at least quarterly. One of the hardest parts of DR is that if you don’t keep the plans updated and you’re not testing regularly, you’ll have major configuration drift between your sites. The moment you have a failure or disaster and have to invoke your DR plan is not the time to find out just how far your configurations have drifted and that you can’t recover. One solution for this is the combination of virtualization and replication, which can reduce complexity because in most cases you’re actually replicating the configuration changes as they happen.
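Configuration drift is easy to check for if you can export each site's settings as key/value pairs. The Python sketch below is one hedged way to do that comparison; the setting names and values are invented, and how you collect them from real servers is outside its scope.

```python
def config_drift(primary: dict, secondary: dict) -> dict:
    """Return {setting: (primary_value, secondary_value)} for every mismatch.

    A value of None means the setting is absent at that site.
    """
    drift = {}
    for key in sorted(primary.keys() | secondary.keys()):
        primary_value = primary.get(key)
        secondary_value = secondary.get(key)
        if primary_value != secondary_value:
            drift[key] = (primary_value, secondary_value)
    return drift


# Hypothetical settings captured from each site.
primary_site = {"os_patch_level": "SP2", "app_version": "4.1", "dns_server": "10.0.0.2"}
dr_site = {"os_patch_level": "SP1", "app_version": "4.1"}

for setting, (ours, theirs) in config_drift(primary_site, dr_site).items():
    print(f"{setting}: primary={ours!r} dr={theirs!r}")
```

Run as part of change management, a check like this turns drift into a routine report instead of a surprise during failover.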

Q: On your disaster recovery continuum slide (slide #14), can I think of that as a disaster recovery maturity model?
A: Stephanie Balaouras, Forrester: Not really. When I evaluate a company for disaster recovery maturity, I look at two dimensions – process and technology.

On the process side, I look at things such as: Have you run a business impact analysis? What about a risk assessment? Are preventative measures in place? Do you have documented plans, and are they up to date? How often do you test them?

On the technology side, I look at things like the RTO and RPO that you have defined: Are they matched up with the appropriate technology solution? If RTO is less than 2 hours and RPO is zero then I would expect that you are replicating data and doing rapid system restart with virtualization. If I find that you are using tape in that situation, then that’s a problem. I think when it comes to maturity you have to look at process and technology together. Not only should you match up with the right technology, but you might actually leverage more than one technology depending on your needs.
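The matching exercise described in that answer can be sketched as a simple comparison between what a technology can deliver and what the objectives demand. The capability figures in this Python example are hypothetical best-case numbers chosen only to reproduce the tape example; real values should come from your own testing.

```python
# Hypothetical best-case recovery figures (hours) for two approaches.
# Real numbers depend on your environment and should come from testing.
CAPABILITIES = {
    "tape backup": {"rto_hours": 24.0, "rpo_hours": 24.0},
    "replication with virtualized restart": {"rto_hours": 1.0, "rpo_hours": 0.0},
}


def meets_objectives(technology: str, rto_hours: float, rpo_hours: float) -> bool:
    """True if the technology recovers fast enough and loses no more data than allowed."""
    capability = CAPABILITIES[technology]
    return capability["rto_hours"] <= rto_hours and capability["rpo_hours"] <= rpo_hours


# The case from the answer above: RTO under 2 hours, RPO of zero.
for technology in CAPABILITIES:
    verdict = "ok" if meets_objectives(technology, rto_hours=2.0, rpo_hours=0.0) else "mismatch"
    print(f"{technology}: {verdict}")
```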

Q: Traditionally, HA & DR at remote locations has not been a priority. Do you see that attitude changing with clients that you talk to?
A: Stephanie Balaouras, Forrester: I do see things changing. I run an annual survey with the Disaster Recovery Journal. One of the questions we ask is: How critical is it to upgrade disaster recovery at your sites? The answer is always either “high” or “extremely critical”. It doesn’t always get addressed the way we want it to, but the recognition is there.

I see three main drivers for this trend. First, availability and disaster recovery are now considered a fiduciary responsibility. It’s no longer an optional practice. It’s essential. It’s something you owe to your employees, your customers, your partners and your investors. Second is the cost of downtime. Companies are much savvier at calculating this cost and more aware of the problems they can avoid by preventing downtime. When you understand those costs, you can make the right technology investment choices. The final driver I see is the changing business environment. A lot of companies, online retailers for example, are operating globally on a near 24x7 basis, and there is no tolerance for downtime anymore. All three of these – fiduciary responsibility, cost of downtime and a 24x7 business environment – are moving the needle quite a bit.

Q: In my environment, our IT staff says they have no way to measure if an application is up or not. They can tell us if a server is up, or if a database is up, but not the application. What solutions have you seen that can tackle that issue?
A: Stephanie Balaouras, Forrester: There are a couple of ways to address that. There are third party application monitoring tools from the large system vendors. They are great for basic monitoring and telling you if your application is up or down, but they don’t tell you about degradation of performance. The other option is that different HA solutions will be able to detect whether the application is up or down.
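For a web application, the gap between "the server is up" and "the application is up" can be illustrated with a simple probe that requests a page the way a user would and checks for content only a working application returns. This Python sketch uses the standard library's `urllib`; the function names, the URL and the expected text are hypothetical, and it is not a substitute for the monitoring tools mentioned above.

```python
import urllib.error
import urllib.request


def fetch_page(url: str, timeout: float = 5.0):
    """Fetch a page and return (HTTP status, body text)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


def application_is_up(url: str, expected_text: str, fetch=fetch_page) -> bool:
    """True only if the page loads AND contains text a healthy application renders.

    A server that answers but serves an error page counts as down.
    """
    try:
        status, body = fetch(url)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
    return status == 200 and expected_text in body


# Example call (hypothetical URL and marker text):
# application_is_up("http://intranet.example.com/orders/login", "Sign in")
```

The `fetch` parameter exists so the check can be exercised without a live server; like the tools described in the answer, a probe of this kind reports up or down and says nothing about degraded performance.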

 

Michael Bilancieri, Sr. Director of Products for Marathon, answered your questions about everRun software.

Q: Does everRun have any kind of alerting capabilities for system problems?
A: Yes, everRun has alerts. You can send notifications back to any location. It will tell you that something has failed – it’s not a downed system because everRun kept it going through redundancy, but it alerts you that it needs attention.

Q: Does everRun require the two servers to be identical?
A: The servers don’t have to be exactly the same; however, the CPUs should be identical as a best practice. For what we call our Level 2 protection (for component level protection of the network and disk), you can use different RAM and spindle speeds on storage. Level 3 protected workloads require the servers to be alike. You can view a complete list of supported processors on our website.

Q: How much CPU and I/O overhead will we incur by running the everRun software?
A: It varies depending on the applications and systems and where the load may be. The general range is from 5-10%. We have specific application performance documentation for Exchange 2007 and XenApp that you can download from our website.

Q: I understand from your presentation that everRun doesn’t require a SAN, but does it work with SAN?
A: everRun can support a SAN in multiple ways. It can support a SAN where you have a single copy of the data and both servers connect to that single copy. everRun also supports a configuration where one of the servers is connected to the SAN and the other server has its own storage, and we mirror between the two. A lot of our customers are using that option to provide data protection and fault tolerance at the data level. We can use different types of storage on either side.
A great benefit of everRun is that it has an agnostic approach to storage. Pretty much any type of storage will work: iSCSI, fiber, direct attached, etc.

Q: Does Marathon have a strategy for SAP environments?
A: Applications are transparent to everRun. We protect many types of SQL, Oracle and SAP applications. There are some best practices around that and we can offer you assistance with those. everRun is invisible to the application, so there are no configuration and design issues. You design your application the way you need to for your business and then everRun protects it without needing changes.

Q: What versions of Windows Server does everRun support?
A: everRun supports Windows Server 2003 SP2 Standard and Enterprise Editions, 32-bit and 64-bit, as well as Windows Server 2008 Standard and Enterprise Editions, 64-bit.

Q: The requirement for redundant systems is obvious, one local and one remote, but I am concerned with the return of the repaired server back to the primary server role. Has that also been automated in your application?

A: Replacing one of the servers in an everRun configuration is quite simple as well. It is required that the everRun software be installed and the server be physically connected to the remaining everRun system. Once connected and configured to see each other as a pair, there is a ‘re-pairing’ process that is initiated via command which starts the process of creating the redundant OS environment on the new system and mirroring all of the storage to the new system. Once the mirroring is complete, the system is once again fully protected.


 

Availability  High Availability  Webcast



Thursday, December 10th, 2009 - 3:15 pm EST

Q&A from the SharePoint HA webinar

Posted by: Michelle Liro

Tom Reed, Marathon’s Senior Systems Engineer and MCSE, hosted our most recent webinar on SharePoint High Availability. We’ve summarized the Q&A portion of the webinar below. A recording of the webinar is also available for on-demand viewing.

Q: Do I have to have identical servers to use everRun?

The servers don’t have to be exactly the same; however, the CPUs should be identical as a best practice. For what we call our Level 2 protection (for component level protection of the network and disk), you can use different RAM and spindle speeds on storage. Level 3 protected workloads require the servers to be alike. You can view a complete list of supported processors on our website. 

Q: What kind of storage do I need to use everRun?

One of the great things about everRun is that it is storage agnostic. It doesn’t matter what type of storage you are running. You can work with SAS drives, iSCSI, local or fibre SAN, pretty much any type of storage, and it doesn’t have to be the same on both sides. Some customers using everRun SplitSite are using SAN at the primary data center and local disk at the secondary data center, which can save storage costs.

Q: Does everRun DR integrate with SRM from VMware and how does this work with VMs as a second server?

VMware SRM, or Site Recovery Manager, is designed to asynchronously replicate the actual virtual machines to a secondary site. It does this by using replication software at the SAN level, so once you purchase SRM you have to purchase SAN replication software as well. If you don’t want to replicate the actual virtual machines, you could use everRun DR instead; the difference is that we do not bring over the current virtual machine. We have a separate VM built, and we have the capability to start and stop service, recover from a single point in time, and recover files by drag and drop on a replicated data drive. If you are looking for an in-depth comparison of VMware SRM vs. everRun DR, you can contact us at 800.884.6425 or via email for more info.

Q: How much overhead does everRun place on the protected server?

General use cases today are 3-10%. We have application performance documentation for Exchange 2007 and XenApp that you can download from our website. We will have a similar document for SharePoint in early 2010.

Q: How does everRun differ from a backup solution?

We have found that there is a lot of confusion in the industry around the difference between backup vs. high availability. Backup solutions are designed to provide a disk-to-disk or disk-to-tape scenario for recovery of data. Backup is a recovery option, not a prevention option. It lets you recover to your last point in time, last snapshot, or last tape. Again, this will not prevent downtime or provide availability for users. It is a means of recovery. everRun DR can provide this type of solution if this is what your business needs. If your goal is to prevent outages and data loss (rather than recover from them), what you really want is a local high availability solution. 

Q: What version of Windows does everRun support?

everRun supports Windows Server 2003 SP2 Standard and Enterprise Editions, 32-bit and 64-bit, as well as Windows Server 2008 Standard and Enterprise Editions, 64-bit.

Q: Does everRun work with SQL 2008?

Yes. everRun supports any Windows application without requiring changes or customization. Because everRun resides below the operating environment, we are protected underneath that. We have a number of ISVs that use our software with their applications and they use us because they don’t have to make any changes to their software. It’s not tied into the application, and doesn’t need to be “cluster aware” or anything similar to that.

Q: Can I use everRun between two VMs? Meaning two VMs instead of two physical servers?

We build out the virtual machines when you install our software, so if you’re using our VMs to build out your machines, then we can do that.

Q: Do you have experience using everRun in education environments?

Yes, a couple of examples of everRun being used in education environments include Michigan State University in the US, and Wellcome Trust Sanger Institute in the UK. We have several additional education customer examples and references that we can provide to you. Give us a call at 800.884.6425 for more information.

Q: How do you determine when to use everRun HA vs. everRun DR solution?

A good method for determining which solution is most appropriate for your situation is to take a closer look at your Recovery Point Objective (RPO) and your Recovery Time Objective (RTO). How long can you be down and how much data can you lose? If you can be down for several days, then you want to look at a DR solution. Just take into consideration that while DR failover is sometimes necessary, it can be a lengthy, complex process and is sometimes invasive to your environment. First you have to fail over to the DR site and then fail back when the primary site is restored, which can be very time consuming.

The majority of failures are not catastrophic. Most are pretty common, like network issues or hardware failures. For this scenario what you should really look at is local high availability protection. For the most complete protection overall, best practices are to have local high availability protection and then DR as a back-up for a major disaster. Then at the DR site you should also have an HA solution, because if you do have that major catastrophe and fail over, you want to make sure that secondary environment is protected while you are re-building the primary site.

Q: What are some large county government examples using everRun software?

everRun has been deployed by many different government agencies. You can read about deployments at the Brookline Police Department, the County of Chester (Pennsylvania), and the City of Santa Rosa, California Utilities Department on our website. We also have many more government customer examples and references that we can provide to you. Give us a call at 800.884.6425 for more information.

Q: When using everRun, can I use the secondary server to backup the data to avoid impact on the primary production server, or will both servers feel the impact during the backup window?

You should run your backup on the active server. On the secondary server, the workloads are in paused mode, so you can’t run a backup agent there. If you run it on the primary server, then it’s cloned over to the secondary server automatically.

Q: Does everRun guarantee no downtime, or 99.999%?

Yes, we provide 99.999% (five nines) protection with our Level 3 system fault tolerant protection.
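For readers wondering what a figure like 99.999% means in practice, the arithmetic is straightforward: multiply the minutes in a year by the fraction of time the system is allowed to be down. The short Python sketch below does that for a few common availability levels; it assumes a 365-day year and says nothing about how any product measures its own uptime.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at a given availability level."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for percent in (99.9, 99.99, 99.999):
    print(f"{percent}% -> {allowed_downtime_minutes(percent):.2f} minutes per year")
```

At five nines the budget works out to a little over five minutes of downtime per year, compared with almost nine hours at 99.9%.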

 

Sharepoint  EverRun  Webinar



Thursday, December 10th, 2009 - 9:35 am EST

Top 5 High Availability Topics of 2009

Posted by: Michelle Liro

It’s always interesting at this time of year to take a look back at what was top of mind for our newsletter readers. It’s also a great opportunity for you to discover a key topic that you might have missed the first time around. Here are our top 5 most downloaded articles and white papers of 2009:

1. Configuring High Availability for Windows Server 2008 Environments
2. Optimizing Exchange High Availability - A New Approach
3. Increasing Reliability and Availability in a Virtualized SQL Server Environment
4. Reduce Downtime by 70% - Without Spending a Dime
5. iX Magazine product review: vSphere 4 FT vs. Citrix XenServer with everRun VM
 

If you would like to stay current with the latest trends, developments and tools in the world of high availability and disaster recovery, be sure to sign up for our monthly newsletter by sending an email to mstec@marathontechnologies.com or by clicking on the Resource Center and looking for the sign-up box in the right column.

High Availability  Citrix  Exchange  Fault Tolerance  Marathon  Windows



Wednesday, December 2nd, 2009 - 9:44 am EST

everRun Goes for the Gold

Posted by: Michelle Liro

The 2010 Winter Olympics may still be a few months away, but everRun has recently added a couple of medals to the trophy case, including the 2009 Windows IT Pro Magazine Editors Gold Award in the “Best High Availability/Disaster Recovery Product” category, and the Bronze award in the "Best Mid-range Software" category from TechAwards Circle.

Marathon has received 16 industry awards in the last two years. Congratulations to the everRun team for producing such an outstanding product worthy of industry recognition 16 times over!

everRun wins 2009 Windows IT Pro Gold Award

Awards  EverRun  Marathon



Monday, November 30th, 2009 - 1:09 pm EST

Protecting SaaS with Automated High Availability

Posted by: Michelle Liro

J. Knipper and Company, Inc., a leader in the healthcare marketing industry, recently implemented Marathon's everRun automated availability software for the continued successful management of two specific high-level downtime risks:

  • the protection of building systems that regulate their refrigerated pharmaceutical samples. FDA regulations require stringent control of these systems.
  • the protection of their MyPharmaSuite™ Software as a Service (SaaS) web applications. Downtime here could result in customer service headaches and frustration.

Founded in 1986, J. Knipper provides healthcare marketing solutions in direct marketing, sampling, compliance, information technology, and sales force productivity. Going beyond its roots in direct marketing and sample distribution, Knipper continues to offer innovative solutions to sales force challenges, including MyPharmaRep.com (online solution for vacant-territory coverage) and MySampleCloset.com (online sample ordering).

The company has invested considerably in its physical and data infrastructure over the years: the company currently has 270,000 square feet of space, with 12,000 of that for refrigerated and 7,500 for controlled substances, along with an advanced Data Center, fully protected for the secure and safe handling of database management and sample supplies. Knipper is one of a handful of companies approved as an AMA DBL (database licensee). With its new infrastructure and SaaS product suite, Knipper needed to find a high availability solution that would deliver the 24x7 reliability required for the safe handling of application hosting, database management, and enterprise services.

“We have several client-facing web applications designed to enable physicians to order products and to provide sales representatives with a means to order samples and access product literature. It’s extremely important that these web applications are available 24x7. Many of the reps and physicians that our products serve are ordering in the middle of the night, early in the morning, at all times of the day,” said Knipper’s web systems engineer Marc Gerardi.

Knipper was using automatic system health checks with third party tools and file sync technology to control these downtime risks. “It was a time consuming manual process of systems recovery requiring dedicated monitoring personnel and rapid group response to mitigate issues when they occurred,” said Tony Quintenz, Knipper’s Director of Network Services. “The overhead associated with mitigating downtime, including the administrative and operating costs, as well as the learning curve and time required to train employees in the process was a significant burden on the entire IT team,” added Mr. Quintenz.

Knipper is now using Marathon’s everRun software, running on standard Dell servers, to guarantee that its critical applications, including the warehouse management system and web applications, are available and operational at all times with lower overhead and higher ROI. everRun is used to protect many of Knipper’s enterprise production environments from downtime, including several Dell PowerEdge R900 servers with Microsoft SQL Server. Knipper purchased, installed and configured the everRun software quickly and easily.

“We chose Marathon because they offered the best package overall. everRun offered real-time synchronization which is key for our 24x7 operations, it’s cost-effective and required minimal training for our employees. Another winning factor was the simple implementation; we were set up within a matter of hours, not days,” said Mr. Quintenz.

Since Knipper finished the everRun implementation, its enterprise environments, including its warehouse management system and client-facing web applications, have maintained the highest level of availability in spite of planned and unplanned events. “It’s a great product, we are now able to maintain our high level of redundancy, reliability and flexibility for our web-based products and enterprise services in the most cost effective and efficient manner,” added Gerardi.

After realizing the disaster recovery and data protection benefits of supporting its critical applications with everRun, Knipper plans to expand its use for additional web applications and other solutions. “As we expand more of our services on the web and offer additional options for customers, we will look to everRun for the continued protection we need.”

 

Case Study  Healthcare



Monday, November 16th, 2009 - 10:43 am EST

High Availability Webinar Q&A

Posted by: Michelle Liro

We had some great questions during last week's webinar, High Availability Doesn't Have to be Expensive. A recap of the Q&A is below, including the questions that we weren't able to get to because of time constraints. Be sure to check out our library of on-demand webinars for this webinar, as well as other topics including SQL availability, Windows Server availability, everRun product demos and more.

Q: How is everRun different from replication solutions?
To understand how everRun is different from replication solutions, you need to take a look at the key differences between disaster recovery and high availability. Availability is about preventing outages instead of just recovering from them; about maintaining the user state with minimal interruptions. With disaster recovery (DR) and replication methods, if there is a failure, you lose connectivity for a period of time and then you have to recover your data and system state. Conversely, availability is about reducing and preventing downtime and keeping users online, even through a failure.

everRun is used for availability, both locally and for short-distance geographic separation. We also have a replication and recovery solution that can be used for disaster recovery over long distances. You should determine what your objectives are: do you have to keep your applications up and running, or do you just need to recover them if something fails? What’s the recovery time objective for each application? It depends on your individual applications and what level of protection you need for each. Oftentimes, availability is a priority as downtime is not desirable, with DR also a requirement on top of that to ensure recovery in the event of a major outage.

Q: What kind of bandwidth requirement is needed for a two-site solution?
As a general rule of thumb, an OC3 connection is required per application workload being protected. Latency is really more critical than bandwidth and this will vary based on the applications and environment.

Q: How does everRun software compare with EMC’s RepliStor and AutoStart applications?
everRun is different from these products because it provides high availability in an automated way with fault tolerant capabilities to prevent user interruptions when hardware fails. This goes back to prevention rather than recovery.

RepliStor is a DR/replication product. While it does provide a failover/restart capability, as do most DR solutions, it is really best used for failover in the event of a major disaster. There’s usually a substantial amount of downtime and a manual failover process to get the systems back online at the secondary site and to failback once the primary site is back online. For DR, you probably want to be able to specify when your systems fail over. But, you will lose some data because this is an asynchronous solution. For minor outages, you really don’t want this. For example, let’s say the power goes out in your primary location for an hour. It can take even longer than that with DR systems to failover to the secondary DR site. You would have been better off just waiting an hour for the power to come back on and restarting the primary systems. RepliStor is more suited for major disaster scenarios, rather than just minor local or regional failures.

AutoStart is more of a clustering type of product designed for availability and application restarts. It’s not designed to prevent downtime due to failures, but rather to recover from them.

Q: Can everRun be used for planned downtime?
Planned downtime for patches, upgrades, etc. can sometimes cost your company as much as or more than unplanned downtime. The answer to this question will depend on the type of updates. Some OS upgrades do require that there be a restart for the changes to take effect. For some types of planned maintenance, everRun can eliminate the need for downtime. For the others, one of the main things everRun can do is to reduce the risk of updating a system and not having it come back online. For example, you’ve just overwritten your production system and it worked in a test environment, but now it won’t come up in production. We can reduce that risk greatly, by getting it back online quickly without the need to rebuild the server.

Q: What is the difference between everRun and vMotion and VMware HA?
These are two different products, so we'll start with VMware HA. The HA product is a failover/restart capability. If you lose a host, the system will try to restart the virtual machines on another host in the pool. There’s no real guarantee here, though. It’s going to try to find resources when a failure happens, but they might not be there. There are some checks in place to warn you when overusing resources will impact the recovery plan, but nothing prevents you from overusing them. When there’s a critical RTO, it’s better to have something more assured, like what everRun provides. everRun uses mirrored systems, so you always know that you have resources available in the event of a failure. everRun also protects the data, and we don’t require a SAN. everRun can mirror data between two systems or two buildings, and it doesn’t have to be the same type of storage on both sides. It can be SAN on one side and NAS on the other. everRun can move the data between locations and keep it tied to the applications to keep your business running, even when there is a failure.
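To make the "resources might not be there" point concrete, here is a minimal Python sketch of a restart-capacity check for a pool of hosts. It is not VMware HA's admission-control algorithm; the host names, memory sizes, and the first-fit placement rule are all assumptions made for illustration.

```python
def can_restart_after_failure(hosts: dict, failed_host: str) -> bool:
    """Check whether the failed host's VMs fit on the surviving hosts.

    hosts maps a host name to {"capacity_gb": int, "vms_gb": [int, ...]}.
    Placement uses first-fit, largest VM first. This is a heuristic, so it
    can report False in rare cases where a cleverer packing would succeed.
    """
    # Free memory on every host except the one that failed.
    free = {
        name: h["capacity_gb"] - sum(h["vms_gb"])
        for name, h in hosts.items()
        if name != failed_host
    }
    for vm_gb in sorted(hosts[failed_host]["vms_gb"], reverse=True):
        target = next((n for n, f in free.items() if f >= vm_gb), None)
        if target is None:
            return False  # no surviving host has room for this VM
        free[target] -= vm_gb
    return True


# Hypothetical three-host pool.
pool = {
    "host-a": {"capacity_gb": 64, "vms_gb": [16, 16]},
    "host-b": {"capacity_gb": 64, "vms_gb": [48]},
    "host-c": {"capacity_gb": 64, "vms_gb": [40]},
}
for name in pool:
    print(name, can_restart_after_failure(pool, name))
```

In this made-up pool, only the loss of host-a can be absorbed. The survivors hold 56 GB free in total when host-b fails, yet no single host can take its 48 GB virtual machine, which is the kind of gap a restart-based approach can hit at failure time.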

As for vMotion, that is primarily used for planned downtime. Motion capabilities in general allow virtual machines to be moved, or “motioned,” from one host to another while they are running. everRun can provide that capability as well; we call it online migration. If you want to take a host offline for planned downtime, for upgrades for example, we can do that. Motioning is really for planned downtime: if something fails unexpectedly, vMotion can’t help you. everRun provides capabilities for both planned and unplanned downtime.

Q: What versions of Windows does everRun support?
everRun supports Windows Server 2003 SP2 Standard and Enterprise Editions, 32-bit and 64-bit, as well as Windows Server 2008 Standard and Enterprise Editions, 64-bit.

Q: In considering using everRun across two sites, is everRun doing real-time synchronization between the sites?
Yes, it is. It’s writing the data on both systems in a synchronous manner, so the data is always complete: system data and applications are secure and exactly mirrored on the secondary server. It protects the entire environment. The operating system, the registry, every setting, and so on are completely cloned. You can turn on the system at the other site and not have to rebuild the server. We maintain exact mirror copies on both servers. That goes back to our message about prevention and computing through failures, rather than downtime and recovery.
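The difference between synchronous mirroring and the asynchronous replication mentioned earlier can be sketched in a few lines of Python. This is a toy model, not everRun's implementation: the class names are hypothetical, and in-memory lists stand in for the disks at each site.

```python
from collections import deque


class SyncMirror:
    """Acknowledges a write only after both copies hold it."""

    def __init__(self):
        self.primary = []
        self.secondary = []

    def write(self, block):
        self.primary.append(block)
        self.secondary.append(block)  # completes before the write returns
        return True  # acknowledgement to the application


class AsyncMirror:
    """Acknowledges after the primary write; the secondary catches up later."""

    def __init__(self):
        self.primary = []
        self.secondary = []
        self.pending = deque()

    def write(self, block):
        self.primary.append(block)
        self.pending.append(block)  # queued for later shipment
        return True

    def replicate(self, max_blocks):
        """Ship up to max_blocks queued blocks to the secondary, in order."""
        for _ in range(min(max_blocks, len(self.pending))):
            self.secondary.append(self.pending.popleft())


def lost_on_primary_failure(mirror):
    """Blocks acknowledged to the application but absent from the secondary."""
    return mirror.primary[len(mirror.secondary):]


sync, lagging = SyncMirror(), AsyncMirror()
for block in ["a", "b", "c"]:
    sync.write(block)
    lagging.write(block)
lagging.replicate(max_blocks=1)  # the secondary has only caught up on "a"

print(lost_on_primary_failure(sync))     # nothing is missing
print(lost_on_primary_failure(lagging))  # "b" and "c" never reached the secondary
```

The cost of the synchronous approach is that every write waits on the second copy, which is why the link between the two systems matters in a two-site deployment.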

Q: Does everRun work with SQL 2008 and SharePoint 2010?
everRun sits below Windows. It’s not in the operating environment. We protect the entire environment, so anything in that environment is automatically protected, whether it’s SQL or Exchange or anything else, even custom applications. There is no customization needed for everRun to protect any Windows applications.

Q: What are the storage requirements for everRun?
everRun offers two storage configuration options: mirrored storage and a shared storage model. When using mirrored storage, everRun will synchronously mirror all storage between paired hosts; this includes the OS, the application, data, etc. This does not require similar storage vendors or types. One host can have SAN-attached storage while the other has local SCSI storage.

In a shared-storage configuration, the everRun paired hosts must be connected to the same storage device with access to the shared LUNs. In this configuration, everRun does not mirror the data or protect against failures within the storage subsystem. Because of this, it is critical that you configure the storage devices properly to protect against failures.

Q: Should everRun be set up on a separate server?
everRun is typically deployed on two new servers. However, an existing server can be used, in which case only one additional server is required.

Q: How is everRun different from the Neverfail product?
Neverfail is an asynchronous DR solution with failover/restart capabilities.
 



Monday, November 9th, 2009 - 4:46 pm EST

Q&A with Jim Welch, Marathon Technologies

Posted by: Michelle Liro

Jim Welch, Marathon's President and CEO, is featured in this week's Worcester Business Journal in the "Shop Talk" column. Here's an excerpt from the interview:

WBJ: How has this crazy economy impacted Marathon Technologies?

Jim Welch: I think, prior to me joining [Marathon], sales cycles were getting longer and people were being more careful about what they were buying. However, the nice part of our core business is, if you need it, you need it. If you’re putting in an application that has to run and be reliable, you don’t have a choice. It’s part of the infrastructure that keeps your business running. So, from that perspective, we’ve weathered the storm fairly well and I think as you look forward, now that general IT spending is starting to ease up a little bit, we’re going to see a lift.

WBJ: Who are your customers?

Jim Welch: Typically, our best customers are ones that need their systems to never go down. But if you look at how things have been changing over the last couple of years, we see a drive by IT shops to reduce costs by consolidating infrastructure. The way they consolidate is by putting more applications on fewer servers, which stacks up their application risk, if you will, so if that one piece of physical hardware fails, not one application fails but now three, five or eight could fail. So, in those environments we’re becoming more important as part of that infrastructure so they can rely on fewer servers.

Be sure to check out the rest of the interview, including the WBJ's interesting photo style, on the Worcester Business Journal website.
