The PDL enhancements are useful for both “stretched storage cluster” and non-stretched environments.
Consider this scenario for a stretched storage cluster:
PDL is probably most common in non-uniform stretched solutions such as EMC VPLEX. With VPLEX, site affinity is defined per LUN. If your VM resides in Datacenter-A while the LUN it is stored on has affinity to Datacenter-B, this VM could lose access to the LUN in case of a failure. These new PDL enhancements ensure that the VM is killed and restarted on the other side.
Consider this scenario for a non-stretched cluster:
PDL occurs when, for instance, the storage admin makes a mistake and removes a specific host's access to a LUN.
Note:
Action will only be taken when a PDL sense code is issued. When your storage fails completely, for instance, the PDL condition cannot be reached: the array can no longer communicate with the ESXi host, so the host identifies the state as an All Paths Down (APD) condition. APD is the more common scenario in most environments. If you are testing these enhancements, check the log files to validate which problem has been identified.
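To make the PDL/APD distinction concrete, here is a minimal Python sketch of how a host could classify a device from the responses it receives on each path. It is a simplified model, not ESXi's actual logic: `DeviceState`, `SenseCode` and `classify` are hypothetical names, and only one PDL sense code is assumed (sense key 0x5, ASC 0x25, ASCQ 0x0, “logical unit not supported”); arrays can report PDL with other codes as well.

```python
from enum import Enum
from typing import List, Optional, Tuple

# (sense key, ASC, ASCQ) as returned by the array in a SCSI sense buffer.
SenseCode = Tuple[int, int, int]

# Assumption: only the commonly cited PDL code is modelled here
# (0x5 ILLEGAL REQUEST / 0x25 LOGICAL UNIT NOT SUPPORTED / 0x0).
PDL_SENSE_CODES = {(0x5, 0x25, 0x0)}


class DeviceState(Enum):
    OK = "ok"
    PDL = "permanent device loss"
    APD = "all paths down"


def classify(path_responses: List[Optional[SenseCode]]) -> DeviceState:
    """Classify a device from one response per path.

    None means the path got no answer at all from the array;
    (0, 0, 0) means the I/O completed without error.
    """
    # The array explicitly reported that the device is gone: PDL.
    if any(r in PDL_SENSE_CODES for r in path_responses if r is not None):
        return DeviceState.PDL
    # No path can reach the array, so no sense code can arrive: APD.
    if all(r is None for r in path_responses):
        return DeviceState.APD
    return DeviceState.OK


# A completely failed array never answers, so it shows up as APD, not PDL.
print(classify([None, None]))              # DeviceState.APD
print(classify([(0x5, 0x25, 0x0), None]))  # DeviceState.PDL
```

The key design point is that PDL requires the array to still be talking: a sense code is an answer, while silence on every path can only be interpreted as APD.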
PDL enhancements:
Two advanced settings make this possible. The first setting is configured at the host level: “disk.terminateVMOnPDLDefault=TRUE” should be added to /etc/vmware/settings. This setting ensures that when a datastore enters a PDL state, the corresponding virtual machine is killed. The virtual machine is killed as soon as it initiates disk I/O on a datastore that is in a PDL condition and all of its files reside on this datastore. Note that if a virtual machine does not initiate any I/O, it will not be killed!
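The kill rule described above boils down to three conditions. The following hypothetical Python predicate models them (setting enabled, all VM files on the PDL datastore, VM issuing I/O); it is an illustration of the rule as stated in this post, not VMware code, and the `VM` class and `should_terminate` function are made-up names.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class VM:
    name: str
    datastores: FrozenSet[str]  # every datastore holding one of the VM's files
    issuing_io: bool            # whether the VM is currently initiating disk I/O


def should_terminate(vm: VM, pdl_datastore: str,
                     terminate_vm_on_pdl_default: bool) -> bool:
    """Hypothetical model of the host-level disk.terminateVMOnPDLDefault rule."""
    if not terminate_vm_on_pdl_default:
        return False  # setting not enabled: the host does not kill the VM
    # All of the VM's files must reside on the datastore that is in PDL.
    all_files_on_pdl = vm.datastores == frozenset({pdl_datastore})
    # The VM is only killed once it actually issues I/O to that datastore.
    return all_files_on_pdl and vm.issuing_io


busy = VM("app01", frozenset({"ds-b"}), issuing_io=True)
idle = VM("app02", frozenset({"ds-b"}), issuing_io=False)
print(should_terminate(busy, "ds-b", True))  # True
print(should_terminate(idle, "ds-b", True))  # False: no I/O, so not killed
```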
The second setting is a vSphere HA advanced setting called das.maskCleanShutdownEnabled. Like the first setting, it is not enabled by default in vSphere 5.0 Update 1, but it is enabled by default from vSphere 5.1 onward. This setting allows HA to trigger a restart response for a virtual machine that has been killed automatically due to a PDL condition, because it lets HA differentiate between a virtual machine that was killed due to the PDL state and one that was powered off by an administrator.
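Below is a simplified model of that HA decision, assuming the behavior VMware describes for das.maskCleanShutdownEnabled: when a powered-off VM's clean-shutdown flag cannot be read (for example because its home datastore is in PDL), the flag defaults to false, so HA treats the power-off as a failure and restarts the VM. The `ha_should_restart` function is hypothetical, and real HA weighs many more factors.

```python
from typing import Optional


def ha_should_restart(clean_shutdown_flag: Optional[bool],
                      mask_clean_shutdown_enabled: bool) -> bool:
    """Simplified, hypothetical model of HA's restart decision for a powered-off VM.

    clean_shutdown_flag is None when HA cannot read it, e.g. because the
    VM's home datastore is in a PDL state.
    """
    if clean_shutdown_flag is not None:
        # Flag readable: an administrator power-off is recorded as clean,
        # so HA leaves the VM powered off; otherwise it restarts it.
        return not clean_shutdown_flag
    # Flag unreadable: with the setting enabled it defaults to False
    # ("not a clean shutdown"), so HA restarts the VM.
    return mask_clean_shutdown_enabled


print(ha_should_restart(True, True))   # False: admin power-off
print(ha_should_restart(None, True))   # True: killed due to PDL, restarted
print(ha_should_restart(None, False))  # False: in this model, no restart
```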
As soon as “disaster strikes” and the PDL sense code is sent, you will see the following entries in vmkernel.log indicating the PDL condition and the kill of the VM:
2012-03-14T13:39:25.085Z cpu7:4499)WARNING: VSCSI: 4055: handle 8198(vscsi4:0):opened by wid 4499 (vmm0:fri-iscsi-02) has Permanent Device Loss. Killing world group leader 4491
2012-03-14T13:39:25.085Z cpu7:4499)WARNING: World: vm 4491: 3173: VMMWorld group leader = 4499, members = 1
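If you want to check the logs programmatically while testing, a small Python script can pick out PDL kill entries like the first line above. The regular expression is an assumption built from this single sample and may need adjusting for other ESXi builds; `find_pdl_kills` and `PDLKill` are hypothetical names.

```python
import re
from typing import Iterable, List, NamedTuple


class PDLKill(NamedTuple):
    timestamp: str
    vm_name: str       # the name shown after "vmm0:" in the log entry
    group_leader: int  # world group leader ID that was killed


# Assumed pattern, derived from the sample vmkernel.log entry above.
PDL_KILL_RE = re.compile(
    r"^(?P<ts>\S+) .*\(vmm\d+:(?P<vm>[^)]+)\) has Permanent Device Loss\. "
    r"Killing world group leader (?P<leader>\d+)"
)


def find_pdl_kills(lines: Iterable[str]) -> List[PDLKill]:
    """Return one PDLKill per log line reporting a PDL-triggered VM kill."""
    kills = []
    for line in lines:
        m = PDL_KILL_RE.search(line)
        if m:
            kills.append(PDLKill(m["ts"], m["vm"], int(m["leader"])))
    return kills


sample = ("2012-03-14T13:39:25.085Z cpu7:4499)WARNING: VSCSI: 4055: "
          "handle 8198(vscsi4:0):opened by wid 4499 (vmm0:fri-iscsi-02) "
          "has Permanent Device Loss. Killing world group leader 4491")
print(find_pdl_kills([sample]))
```

Because `find_pdl_kills` accepts any iterable of lines, an open file handle on the host's vmkernel.log can be passed to it directly. If nothing matches while the datastore is clearly unavailable, the host has most likely identified an APD condition instead.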
Thanks to Duncan Epping for his posts