Data availability, integrity, and security are among the most important functions of a modern HCI solution. Nutanix offers several ways to protect and secure data, data-at-rest encryption and data-in-transit encryption to name just two. To protect data availability and integrity in case of hardware failure, Nutanix implements the concept of a data Replication Factor (RF). By default, every Nutanix storage container uses Replication Factor 2 (two). This means that every block of data has an exact copy elsewhere in the cluster. That copy can be placed:
- on a different node (in a multi-node cluster deployment)
- on a different disk (in a single-node cluster deployment)
- on a node in a different block (chassis), called block awareness, if your cluster spans multiple blocks
- on a node in a different data center rack, if you have configured rack awareness for the cluster
NOTE: the system enables block awareness automatically as soon as the cluster configuration meets the block-awareness requirements.
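To make the placement rules above concrete, here is a simplified sketch of how copies can be spread across failure domains. It is an illustration only, not Nutanix's actual placement logic: the `Node` class, the `place_replicas` function, and the node and block names are hypothetical, and the single-node case (copies on different disks) is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    block: str  # hypothetical label for the block (chassis) hosting the node


def place_replicas(nodes, rf):
    """Choose `rf` distinct nodes to hold the copies of one piece of data.

    Simplified illustration: spread copies across different blocks first
    (block awareness) and only reuse a block when there are fewer blocks
    than copies. This is NOT Nutanix's actual placement algorithm.
    """
    if rf < 1 or rf > len(nodes):
        raise ValueError("rf must be between 1 and the number of nodes")
    chosen = []
    used_blocks = set()
    # Pass 1: at most one copy per block.
    for node in nodes:
        if node.block not in used_blocks:
            chosen.append(node)
            used_blocks.add(node.block)
            if len(chosen) == rf:
                return chosen
    # Pass 2: not enough blocks, so fall back to any node not yet used.
    for node in nodes:
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == rf:
                return chosen
    return chosen


nodes = [Node("A1", "A"), Node("A2", "A"), Node("B1", "B"), Node("C1", "C")]
print([n.name for n in place_replicas(nodes, 3)])      # ['A1', 'B1', 'C1']
print([n.name for n in place_replicas(nodes[:3], 3)])  # ['A1', 'B1', 'A2']
```

In the second call only two blocks exist, so the third copy has to share block A with the first one; this is the situation in which block awareness cannot be satisfied.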
With AOS 6.5, Nutanix supports three replication factor configurations:
- RF=3
- RF=2
- RF=1
Why would you change the replication factor to RF=3?
There are multiple reasons why you would change the default replication factor. The most common use cases are described below.
Increase data resiliency in Tier 0 (T0) clusters
Changing from the default RF=2 to RF=3 increases the number of copies of the data from 2 to 3. This means the system can tolerate two simultaneous hardware failures and still serve data to the application.
Customers enable RF=3 on clusters running Tier 0 applications, where data availability is the most important factor.
NOTE#1: Changing the Nutanix container replication factor from RF2 to RF3 comes with a storage "cost", because the system has to store an additional copy of the data. To limit the impact on storage utilization, customers can enable Erasure Coding on the container, which reduces the space consumed by the redundant copies.
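A rough way to put this storage cost in numbers is sketched below. It assumes that without erasure coding every block is stored RF times, and that an erasure-coded strip holds `ec_data_blocks` data blocks plus RF - 1 parity blocks. The function name and the 4-block strip are illustrative assumptions; in practice the strip size depends on the cluster, and only data that qualifies for erasure coding is encoded, so real savings are smaller.

```python
def physical_tib(logical_tib, rf, ec_data_blocks=None):
    """Estimate raw capacity consumed by `logical_tib` of user data.

    Without erasure coding every block is stored `rf` times.
    With erasure coding (assumed here: `ec_data_blocks` data blocks plus
    rf - 1 parity blocks per strip), the overhead is (data + parity) / data.
    """
    if ec_data_blocks is None:
        return logical_tib * rf
    parity = rf - 1
    return logical_tib * (ec_data_blocks + parity) / ec_data_blocks


for rf in (2, 3):
    print(f"RF{rf}: {physical_tib(10, rf)} TiB raw, "
          f"{physical_tib(10, rf, ec_data_blocks=4)} TiB raw with a 4-block EC strip")
```

Under these assumptions, 10 TiB of data costs 30 TiB of raw capacity at RF3 without erasure coding, but closer to 15 TiB if all of it were encoded with a 4-block strip.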
NOTE#2: Changing from RF2 to RF3 increases the number of metadata copies from 3 to 5.
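Why does the metadata go from 3 to 5 copies rather than from 2 to 3? Data stays readable as long as one copy survives, so RF copies tolerate RF - 1 failures. Metadata, by contrast, is assumed here to be kept consistent with a majority quorum (as in Paxos-style systems), and a majority survives f losses only with 2f + 1 replicas. The sketch below checks that arithmetic; the function names are illustrative.

```python
def data_failures_tolerated(rf):
    """Data stays readable while at least one of its rf copies survives."""
    return rf - 1


def metadata_replicas_needed(failures):
    """Replicas a majority quorum needs to survive `failures` losses: 2f + 1."""
    return 2 * failures + 1


def quorum_failures_tolerated(replicas):
    """Losses a majority quorum of `replicas` members can absorb."""
    return (replicas - 1) // 2


for rf in (2, 3):
    f = data_failures_tolerated(rf)
    m = metadata_replicas_needed(f)
    print(f"RF{rf}: tolerates {f} failure(s), metadata replicas = {m}, "
          f"quorum survives {quorum_failures_tolerated(m)} loss(es)")
```

For RF2 this yields 3 metadata replicas and for RF3 it yields 5, matching the numbers in the note.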
Comply with Nutanix best practices
When building a Nutanix cluster, you can keep adding nodes to the system gradually, one after another. Nutanix recommends using Replication Factor 3 on clusters with 24 or more nodes. The reason behind this recommendation is risk mitigation: the more nodes in the cluster, the higher the likelihood that hardware components (in our case, disk drives or nodes) fail at the same time.
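The effect of cluster size can be illustrated with a simple binomial model. It assumes independent failures, a hypothetical per-node failure probability `p` over the window before a rebuild completes, and, as a simplification, that any two failed nodes hold both copies of some data, so RF2 is exposed by 2 overlapping failures and RF3 by 3. The numbers are illustrative, not a reliability prediction for real hardware.

```python
from math import comb


def p_at_least(k, nodes, p_node):
    """Probability that at least `k` of `nodes` fail in the same window.

    Assumes independent failures with identical probability `p_node`
    (a hypothetical input) over the window before a rebuild completes.
    """
    return sum(
        comb(nodes, i) * p_node**i * (1 - p_node) ** (nodes - i)
        for i in range(k, nodes + 1)
    )


p = 0.001  # hypothetical per-node failure probability per rebuild window
for n in (8, 24, 48):
    # Simplification: RF2 is exposed by 2 overlapping failures, RF3 by 3.
    print(f"{n} nodes: RF2 risk {p_at_least(2, n, p):.2e}, "
          f"RF3 risk {p_at_least(3, n, p):.2e}")
```

Under this model the RF2 risk grows roughly with the square of the node count, while RF3 stays far lower for the same cluster size.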
How do I change Nutanix replication factor from RF2 to RF3?
NOTE#1: Changing from RF2 to RF3 changes the number of metadata replicas from 3 to 5. You can change RF3 back to RF2 at the storage container level, but the number of metadata replicas will remain 5.
NOTE#2: Make sure the cluster has enough free storage space for the additional copy of the data.
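A back-of-the-envelope check for this note is sketched below. It assumes that the "Used Space (Logical)" figure reported by ncli excludes replica copies, so moving from RF2 to RF3 adds one more copy of that amount in raw capacity. The function names, the 10 TiB of free physical space, and the 20% headroom are hypothetical; Prism's own capacity and resilience views remain the authoritative check.

```python
def extra_physical_tib(logical_used_tib, current_rf=2, target_rf=3):
    """Raw capacity needed to add (target_rf - current_rf) copies of the data."""
    return logical_used_tib * (target_rf - current_rf)


def fits(logical_used_tib, physical_free_tib, headroom=0.0,
         current_rf=2, target_rf=3):
    """True if the extra copies fit while keeping `headroom` (a fraction of
    the currently free physical space, hypothetical margin) unused."""
    needed = extra_physical_tib(logical_used_tib, current_rf, target_rf)
    return needed <= physical_free_tib * (1 - headroom)


# 2.62 TiB is the container's "Used Space (Logical)" from the ncli output;
# the 10 TiB of free physical space is a made-up example value.
print(extra_physical_tib(2.62))         # 2.62
print(fits(2.62, 10.0, headroom=0.2))   # True
```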
Log in to a CVM over SSH and run the following command, replacing <Storage_Container_Name> with the name of your container:
ncli storage-container edit rf=3 name=<Storage_Container_Name>
Depending on the number of nodes in the cluster, the amount of data, and how busy the cluster is, the operation can take anywhere from 30 minutes to a few hours.
<ncli> storage-container edit rf=3 name=SelfServiceContainer
Id : 0005e404-1657-774a-7cca-3cecef82f0e1::1448
Uuid : d8d77b5e-7693-4d65-9421-cfa3bad00986
Name : SelfServiceContainer
Storage Pool Id : 0005e404-1657-774a-7cca-3cecef82f0e1::18
Storage Pool Uuid : e138ea63-24a9-4880-8de9-17aac8743711
Free Space (Logical) : 48.36 TiB (53,173,256,954,970 bytes)
Used Space (Logical) : 2.62 TiB (2,875,405,090,816 bytes)
Allowed Max Capacity : 50.98 TiB (56,048,662,045,786 bytes)
Used by other Containers : 13.96 GiB (14,985,650,176 bytes)
Explicit Reservation : 0 bytes
Thick Provisioned : 0 bytes
Replication Factor : 3
Oplog Replication Factor : 3
NFS Whitelist Inherited : true
Container NFS Whitelist :
VStore Name(s) : SelfServiceContainer
Random I/O Pri Order : SSD-PCIe, SSD-SATA, DAS-SATA
Sequential I/O Pri Order : SSD-PCIe, SSD-SATA, DAS-SATA
Compression : on
Compression Delay : 0 mins
Fingerprint On Write : off
On-Disk Dedup : off
Erasure Code : off
Software Encryption : off
<ncli>
You can monitor progress in the Prism UI. When the process completes, you can verify the change from the command line or in Prism Element:
Replication Factor : 3
Oplog Replication Factor : 3
<ncli> storage-container list
Id : 0005e404-1657-774a-7cca-3cecef82f0e1::1448
Uuid : d8d77b5e-7693-4d65-9421-cfa3bad00986
Name : SelfServiceContainer
Storage Pool Id : 0005e404-1657-774a-7cca-3cecef82f0e1::18
Storage Pool Uuid : e138ea63-24a9-4880-8de9-17aac8743711
Free Space (Logical) : 48.36 TiB (53,170,239,718,490 bytes)
Used Space (Logical) : 2.62 TiB (2,878,421,329,237 bytes)
Allowed Max Capacity : 50.98 TiB (56,048,661,047,728 bytes)
Used by other Containers : 13.96 GiB (14,986,648,234 bytes)
Explicit Reservation : 0 bytes
Thick Provisioned : 0 bytes
Replication Factor : 3
Oplog Replication Factor : 3
NFS Whitelist Inherited : true
Container NFS Whitelist :
VStore Name(s) : SelfServiceContainer
Random I/O Pri Order : SSD-PCIe, SSD-SATA, DAS-SATA
Sequential I/O Pri Order : SSD-PCIe, SSD-SATA, DAS-SATA
Compression : on
Compression Delay : 0 mins
Fingerprint On Write : off
On-Disk Dedup : off
Erasure Code : off
Software Encryption : off
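If you want to script the verification step, the "Key : Value" output of ncli storage-container list can be parsed and checked. The sketch below assumes the field labels exactly as they appear in the output above (they may differ between AOS versions) and that each container record starts with its "Id" line; the function names are hypothetical. You could, for example, redirect the list output to a file and pass its contents to `parse_containers`.

```python
def parse_containers(text):
    """Split ncli 'Key : Value' output into one dict per storage container.

    Splits on the first ' : ' only, so values such as '...::1448' survive.
    Lines without that separator (e.g. the '<ncli>' prompt) are skipped.
    """
    containers = []
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Id":  # assumption: each container record starts with its Id
            containers.append({})
        if containers:
            containers[-1][key] = value
    return containers


def rf_applied(containers, name, rf=3):
    """True if container `name` reports `rf` for both data and oplog."""
    for c in containers:
        if c.get("Name") == name:
            return (c.get("Replication Factor") == str(rf)
                    and c.get("Oplog Replication Factor") == str(rf))
    return False


sample = """<ncli> storage-container list
Id : 0005e404-1657-774a-7cca-3cecef82f0e1::1448
Name : SelfServiceContainer
Replication Factor : 3
Oplog Replication Factor : 3
"""
print(rf_applied(parse_containers(sample), "SelfServiceContainer"))  # True
```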