NCD – Self-Managed Navisite Cloud Director Replication Best Practices
Navisite Cloud Director® (NCD) integrates with Zerto Virtual Replication (ZVR) disaster recovery protection, enabling hypervisor (or virtual machine monitor)-based replication between data centers (NCD-to-NCD Replication), or between your on-premise virtual environment (VE) and your NCD environment (Customer-to-NCD Replication). The replication service works with both VMware® vCenter™ and vCloud Director® (VCD) virtual environments.
Zerto Virtual Replication allows you to automatically and continuously replicate application data and virtual machine (VM) images, as well as system configurations and dependencies, in order to facilitate disaster recovery (DR).
This article provides best practice recommendations for replication. Following the practices listed in this document can reduce the risk of replication-related problems.
- NCD-to-NCD ZVR replicates virtual applications (vApps) and their virtual machines (VMs) between your NCD vCloud environments (data centers) for disaster recovery.
- For an overview of the steps necessary to establish NCD-to-NCD replication, see Replicating Your vApps and VMs Between Data Centers.
- For examples and exercises serving as an introduction to Navisite Cloud Director NCD-to-NCD Replication functionality, see the NCD-to-NCD Zerto Replication Tutorial.
- Customer-to-NCD ZVR replicates an on-premise vCenter's VMs contained within defined Virtual Protection Groups (VPGs) to dedicated disaster recovery virtual applications (vApps) within Navisite Cloud Director.
- For an overview of the steps necessary to establish Customer-to-NCD replication, see Replicating your On-premise Virtual Environment to Navisite Cloud Director.
- For examples and exercises serving as an introduction to Navisite Cloud Director Customer-to-NCD Replication functionality, see the Customer-to-NCD Zerto Replication Tutorial.
- For commonly encountered Customer-to-NCD ZVR problems and detailed instructions for diagnosing and correcting them where applicable, see Customer-to-NCD Zerto Replication Troubleshooting.
An NCD Zerto Replication video tutorial is available here.
The full Zerto documentation set for the currently recommended version of Zerto Virtual Replication is available here.
Replication Design Considerations
Protected and Disaster Recovery (DR) Site Zerto Version Compatibility
It is recommended that self-managed NCD customers configuring replication use the latest Navisite-recommended version of Zerto Virtual Replication.
As of March 4, 2020, the current Navisite-recommended version is Zerto 7.03.
For self-managed customers, it is the customer's responsibility to manage and maintain the installed version of Zerto Virtual Replication. Please check with Navisite Support (at 866-729-0928) for the latest recommended version.
Considerations for MPLS Connections
Zerto replication appliances use Linux Path MTU (PMTU) discovery to negotiate the Maximum Transmission Unit (MTU) from the customer location to NCD. In some cases, when encapsulation is involved, PMTU cannot auto-detect and adjust the MTU, which can cause network issues. In such situations, please contact the Navisite Helpdesk (at 877-946-7942) for assistance in working with Zerto Support to change the MTU on the VRAs.
Note: MTU changes do not persist across a Zerto upgrade. If a customer upgrades Zerto, Zerto will have to reapply them.
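The effect of encapsulation on MTU can be illustrated with a minimal Python sketch. It is not a Zerto or Navisite tool: `probe` is a hypothetical callable (for example, a wrapper around a ping sent with the Don't Fragment flag set), and the 1400-byte limit in the demonstration is an assumed value. The 28-byte overhead is the standard IPv4 header (20 bytes) plus ICMP echo header (8 bytes).

```python
from typing import Callable

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def find_path_mtu(probe: Callable[[int], bool], low: int = 576, high: int = 1500) -> int:
    """Binary-search the largest packet size in [low, high] for which probe(size) is True.

    probe is a hypothetical callable that returns True when a packet of the given
    total size crosses the path without fragmentation. Assumes probe(low) succeeds.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Simulated path where encapsulation leaves room for 1400-byte packets (assumed value).
    simulated_limit = 1400
    mtu = find_path_mtu(lambda size: size <= simulated_limit)
    print(mtu, icmp_payload_for_mtu(mtu))
```

A measurement like this only helps describe the symptom when contacting the Helpdesk; the MTU change on the VRAs itself is made with Zerto Support as described above.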
Virtual Protection Group (VPG) Sizing Considerations
Navisite recommends keeping the number of VMs in a VPG as small as possible. VPGs should be sized into small logical groups (e.g., based on application, or client/user group).
Internally, each disk in a VPG is individually managed by Zerto, and each is on its own VRA target appliance and assigned to an ESX host/datastore. Keeping VPGs small reduces resource utilization and the impact any single resource allocation issue may cause. Keeping VPGs small also results in shorter unavailability for each VPG during maintenance.
Smaller VPGs allow for a more granular display of which VMs are using your bandwidth, which can allow faster troubleshooting of any configuration or performance issues, and possibly avoid the expense of extra bandwidth.
See Estimating VPG Bandwidth in Customer-to-NCD Zerto Replication Troubleshooting for more information.
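As an illustration of sizing VPGs into small logical groups, the following Python sketch groups a VM inventory by application and lists the groups that exceed a review threshold. The VM names, application labels, and the threshold are all hypothetical. Navisite does not publish a numeric limit in this article.

```python
from collections import defaultdict

# Hypothetical inventory: (VM name, application the VM belongs to).
vms = [
    ("web01", "storefront"),
    ("web02", "storefront"),
    ("db01", "storefront"),
    ("hr-app01", "hr"),
    ("hr-db01", "hr"),
]


def plan_vpgs(inventory, review_threshold=3):
    """Group VMs by application and report groups larger than review_threshold.

    Returns (groups, oversized): groups maps application -> list of VM names,
    oversized is a sorted list of applications whose group may be worth splitting.
    review_threshold is an assumed value chosen only for illustration.
    """
    by_app = defaultdict(list)
    for vm_name, app in inventory:
        by_app[app].append(vm_name)
    oversized = sorted(app for app, names in by_app.items() if len(names) > review_threshold)
    return dict(by_app), oversized


if __name__ == "__main__":
    groups, oversized = plan_vpgs(vms, review_threshold=2)
    for app, names in groups.items():
        print(app, names)
    print("review:", oversized)
```

The sketch only flags large groups rather than splitting them automatically, since how to divide an application across VPGs is a design decision.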
VM Considerations
Because thin provisioning of the recovery disks has no performance impact, Navisite recommends thin provisioning rather than thick provisioning.
Carefully consider the actual usage and need of your servers, and move high-usage but low-value data to "swap" volumes. Data on these volumes is replicated once, but not afterward. This is ideal for items requiring a directory structure, but not actual data, such as print servers, temp volumes, etc.
Also consider overnight processing usage such as locally copied logs, logs copied from one server to another, nightly data import feeds, etc. If bitmap or delta syncing is observed in the morning, it could indicate that such events may be occurring, and you should consider moving such files to a disk marked as a "swap" volume.
Best practice documentation is available at the Zerto Support Portal.
Note: Navisite recommends that domain controllers not be replicated with Zerto, but that customers have a running domain controller in the cloud. This keeps the domain updated, and protects the domain controller from experiencing failovers.
Configuring VPNs between NCD and On-premise Environments
Customer-to-NCD replication requires VPN connectivity between your NCD environment and on-premise Edge Gateways to provide access to local networks dedicated for Zerto replication use.
See Configuring VPNs Between On-Premise Networks and vDataCenters for details on VPN configuration in NCD.
Firewall rules configured for the VPN must provide the following connectivity:
- Ports 9081-10000 must be open to your NCD Org Network.
- Ports 4007, 4008, and 9081 must be open to your on-premise Zerto Replication Network.
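The port requirements above can be captured in a small Python sketch for spot-checking connectivity. It assumes the ports are TCP, which this article does not state explicitly, and the host address in the usage note is a placeholder. A failed connection means either that the firewall blocks the port or that nothing is listening on it, so results need interpretation.

```python
import socket

# Port requirements taken from the list above.
PORTS_TO_NCD_ORG_NETWORK = range(9081, 10001)       # 9081-10000 inclusive
PORTS_TO_ONPREM_ZERTO_NETWORK = (4007, 4008, 9081)


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def blocked_ports(host: str, ports, timeout: float = 2.0):
    """Return the ports on host that did not accept a TCP connection."""
    return [port for port in ports if not tcp_port_open(host, port, timeout)]
```

For example, `blocked_ports("192.0.2.10", PORTS_TO_ONPREM_ZERTO_NETWORK)` would list which of the three on-premise ports could not be reached from the machine running the check. The address is from the documentation-only range and must be replaced with a real host.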
Zerto Maintenance Best Practices
Replication Status Conditions
A replication status of "bitmap syncing" is not an error. Bitmap syncing could occur due to a burst of write activity which was cached on the source side and is being transmitted. If this occurs often or for long periods of time, it may indicate that more bandwidth is required, or that you have data that could be moved to a "swap" volume rather than requiring replication.
"Delta Sync" is also not a replication status error. It indicates that the changes were not captured and deltas are being computed.
VMs that are powered off will cause syncs to never progress beyond a certain percentage, and will also cause VPGs to not meet SLAs even if other VMs are replicating successfully. If you have a replicated VM that is expected to be in maintenance mode or powered off for a long period of time, Navisite recommends removing it from the VPG.
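One way to judge whether bitmap syncing occurs "often" is to record the VPG status at regular intervals and look at the share of samples spent in that state. The Python sketch below assumes you collect the status strings yourself (for example, by noting them from the Zerto GUI). The status labels and the 25% threshold are illustrative assumptions, not Zerto or Navisite values.

```python
def bitmap_sync_ratio(samples):
    """Fraction of status samples equal to 'bitmap syncing' (0.0 for an empty list)."""
    if not samples:
        return 0.0
    hits = sum(1 for status in samples if status.strip().lower() == "bitmap syncing")
    return hits / len(samples)


def needs_review(samples, threshold=0.25):
    """True when bitmap syncing exceeds the assumed threshold share of samples."""
    return bitmap_sync_ratio(samples) > threshold


if __name__ == "__main__":
    # Hypothetical samples taken at equal intervals for one VPG.
    overnight = ["Meeting SLA"] * 5 + ["Bitmap Syncing"] * 3
    print(bitmap_sync_ratio(overnight), needs_review(overnight))
```

A high ratio is not an error in itself. As noted above, it is a prompt to check bandwidth or to look for data that could move to a "swap" volume.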
ZVM Maintenance Tasks
If you intend to remove replication (for example, to uninstall or delete a ZVM), Navisite recommends that you first export your VPG settings, un-pair (and save preseeds), and then uninstall and reinstall replication.
Failing to do so may require that your Zerto Cloud Connector (ZCC) be deleted and recreated, which can result in delays. In such a case, be sure to provide the Navisite Helpdesk (at 877-946-7942) with details about all steps which have been performed to that point.
vCenter Maintenance Tasks
Navisite recommends that any vCenter upgrades be performed as in-place upgrades. Configuring a new vCenter results in the loss of all Zerto inventory and settings. Refer to Upgrading the Zerto Virtual Replication Environment for more information.
Removing VMs, ESXi hosts, or datastores that Zerto believes are in use can cause issues. Navisite recommends always vMotioning VMs off hosts, and removing VRAs from hosts, rather than simply removing these items.
Keeping your connection active is important for replication and Zerto data consistency. If you wish to avoid using bandwidth (for example, if you expect large amounts of change and simply wish to delta sync when complete) it is better to pause replication, rather than taking down a VPN.
If you do experience disconnection, and replication does not commence once connected, a "Force Sync" on the VPGs may solve the problem.
VM Maintenance Tasks
VMs that are powered off will cause syncs to never progress beyond a certain percentage, and will also cause VPGs to not meet SLAs. If you have a replicated VM that is expected to be in maintenance mode or powered off for a long period of time, Navisite recommends removing it from the VPG.
Failover Tests and Failovers
Although Failover Tests allow you to fail over multiple VPGs at once, Navisite recommends staggering them in the order in which you want services to start.
For example, certain infrastructure services (such as authentication and data services) need to be running before starting application services or web services. In most cases, the sequence should be:
- Infrastructure services
- Data services
- Application services
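The sequence above can be expressed as a simple ordering rule. The following Python sketch sorts VPGs by service tier so they can be failed over one tier at a time. The VPG names and their tier assignments are hypothetical, and the sketch only produces an order; it does not call Zerto.

```python
# Tier order taken from the list above; a lower number fails over first.
TIER_ORDER = {"infrastructure": 0, "data": 1, "application": 2}

# Hypothetical VPG names and tier assignments.
vpgs = [
    ("crm-app", "application"),
    ("sql-cluster", "data"),
    ("auth-services", "infrastructure"),
]


def failover_sequence(vpg_list):
    """Return VPG names sorted by tier; sorted() is stable, so ties keep input order."""
    ordered = sorted(vpg_list, key=lambda item: TIER_ORDER[item[1]])
    return [name for name, _tier in ordered]


if __name__ == "__main__":
    for step, name in enumerate(failover_sequence(vpgs), start=1):
        print(step, name)
```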
During a Failover Test, disk changes to both the "test VM" and the protected VM are stored in the journal. This may cause your journal to fill up, so Failover Tests should be kept brief.
Seeding Zerto Replication Data at Navisite
When you replicate a large number of VMs and a large amount of data from customer premises to NCD, executing the replication "over the wire" via networking may be excessively time- and performance-intensive. For such circumstances, Navisite sometimes recommends seeding the Zerto data at Navisite. This section details when and how such seeding should occur.
Conditions and Circumstances for Seeding
To decide whether seeding is preferable to over-wire transfer, consult Navisite replication experts about the quantity of data and resource conditions under which the replication is to occur. There is no hard and fast rule, threshold, or limit; Navisite considers and advises about preseeding on a case-by-case basis. The three main determining factors are bandwidth, size, and time. If a customer has sufficient bandwidth and can afford the time necessary to sync fully, the seeding procedure and shipping of a drive may not be necessary or preferable.
Procedure and Considerations
Navisite has received the following direction on the seeding process from Zerto:
- Your Navisite Cloud administrator configures a "preseed folder" within the customer's ZORG and on the recovery datastores that are part of that customer's available storage profiles. This is performed by Navisite technical staff who subsequently physically plug in the USB media and have access to the data.
- The copy of the customer's VMDKs will be located on the recovery datastores in the preseed folder.
- The customer must be able to "customize" service profiles so they can configure the preseed disks via the Zerto GUI during initial VPG creation. Alternately, they can create the VPG without customizing, and then edit the VPG to configure the preseed disks (note that sync will start and proceed while they edit the VPG to preseed the disks).
- The customer provides USB attached media, uses that media for seeding, and then sends the media to Navisite. Navisite mounts the drive and presents it so that the customer can import the seeded records.
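To weigh the bandwidth, size, and time factors mentioned under Conditions and Circumstances for Seeding, a rough over-the-wire estimate can be computed before consulting Navisite. The Python sketch below is a back-of-the-envelope calculation only. The `efficiency` parameter is an assumed fraction of the link usable for replication, and the calculation ignores compression and data that changes during the initial sync.

```python
def estimated_sync_hours(data_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_gb gigabytes over a link of bandwidth_mbps megabits per second.

    efficiency is an assumed fraction (0 < efficiency <= 1) of the link available
    for replication. Uses decimal units: 1 GB = 8000 megabits.
    """
    if data_gb < 0 or bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("data_gb must be >= 0, bandwidth_mbps > 0, and 0 < efficiency <= 1")
    megabits = data_gb * 8000
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical case: 10 TB of VM data over a 100 Mbps link at 80% usable bandwidth.
    hours = estimated_sync_hours(10_000, 100, 0.8)
    print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
```

In this hypothetical case the initial sync would take roughly eleven and a half days, which is the kind of figure that makes shipping a seeded drive worth discussing with Navisite.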
Contacting Support
Instructions for contacting Navisite Cloud Director Support can be found here.
When contacting Support, please provide as much detail as possible relating to the issue being experienced, including information on all Navisite-provided troubleshooting procedures attempted prior to contacting Support.
Zerto Acronyms
VPG | Virtual Protection Group | A logical grouping of Virtual Machines (VMs) that is used to organize Zerto replication. |
SLA | Service Level Agreement | Agreement between customer and service provider which defines the service that is to be delivered to the customer – a Zerto SLA is an agreement of how much history will be made available for disaster recovery (DR) within the protected site journal files. |
DR | Disaster Recovery | DR is the process of using a replication service to recover computing resources in the case of a disaster. DR requires human intervention to start, and does not provide instantaneous recovery of lost computing resources. DR is a way of preparing for disaster situations that will allow for "quick" recovery of computing resources. |
ZVM | Zerto Virtual Manager | A ZVM is a service that runs on a Windows VM within a cloud service-provided data center. The ZVM manages all activities related to DR within that data center, including interactions with the vCenter itself. There is one ZVM per data center involved in DR activities. |
VRA | Virtual Replication Appliance | A VRA is a service that runs on each ESX/ESXi host within a data center. The VRA is responsible for replication activities within a given ESX/ESXi host. |