Symptoms

Data Domain Operating System (DDOS) version 6.x and later contains a safeguard that disables the Data Domain File System (DDFS) if the system time configured on the Data Domain Restorer (DDR) jumps backwards by more than 60 seconds. If such a change in system time occurs, the following takes place:
  • DDFS is disabled and does not automatically restart
  • An alert (EVT-ENVIRONMENT-00052) is posted, for example:
Event posted: p0-32 -EVT-ENVIRONMENT-00052: File system is disabled due to a critical condition.EVT-OBJ::Enclosure=1 EVT-INFO::Cause=System Time backward jumped
Once this issue is encountered:  
  • DDFS cannot be restarted manually (it panics during startup)
  • Moving the date/time forwards again (to reverse the backwards jump) does not allow DDFS to start

Cause

This safeguard was implemented because a backwards jump in system time may adversely affect certain backup applications which store data on the DDR. As a result, it is designed so that the administrator of the DDR has to explicitly acknowledge the change in system time before DDFS can be re-enabled.

An example of the issue is shown below:  
  • Initially, DDFS is running as normal:    
# filesys status
The filesystem is enabled and running.
  • The DDR has system date/time of 13:28 on March 7th 2017:    
# date
Tue Mar  7 13:28:24 PST 2017
  • The date is manually set backwards to January 1st 2017 (note that the Network Time Protocol (NTP) may need to be disabled before this change is possible):
# system set date 01012017
  • Logs on the DDR (for example, messages.engineering) indicate that the system date/time was changed backwards and that DDFS is therefore being disabled:
Mar  7 13:28:24 rtp-ddr30 ddsh: NOTICE: MSG-DDSH-00009: (tty=ttyS0, session=15703) root: command "system set date 01012017"
...
Jan 1 20:17:04 rtp-ddr30 ddr_stated: Availability stats: Invalid time interval -5591476. Probably the system clock was changed.
Jan 1 20:17:51 rtp-ddr30 platmon: INFO: Found a system time jump: -5591485
Jan 1 20:17:51 rtp-ddr30 platmon: INFO: Before Jump: system time: Tue Mar 7 13:28:15 2017 , rtc time: Tue Mar 7 13:28:16 2017 , ntp last sync time: Unknown
Jan 1 20:17:51 rtp-ddr30 platmon: INFO: After Jump: system time: Sun Jan 1 20:17:51 2017 , rtc time: Sun Jan 1 20:17:51 2017 , ntp last sync time: Unknown
...
Jan 1 20:17:51 rtp-ddr30 platmon: NOTICE: post_alert: Generating alert EVT-ENVIRONMENT-00052
Jan 1 20:17:52 rtp-ddr30 platmon: INFO: Event posted: p0-32 (11000020:285212704): EVT-ENVIRONMENT-00052: File system is disabled due to a critical condition.EVT-OBJ::Enclosure=1 EVT-INFO::Cause=System Time backward jumped
Jan 1 20:17:52 rtp-ddr30 platmon: NOTICE: evaluate_symbol_node: taking action(s) on error_indict(1)
Jan 1 20:17:52 rtp-ddr30 platmon: INFO: System time jumped, needs service now
Jan 1 20:17:52 rtp-ddr30 platmon: ERROR: Fatal error in platform monitor, DDFS shall be disabled
...
Jan 1 20:17:55 rtp-ddr30 ddr_procmon: ERROR: Critical error is detected by platform monitoring, filesystem is shutdown.
...
Jan 1 20:17:55 rtp-ddr30 ddr_stated: INFO: change_state(): shutdown requested
Jan  1 20:17:55 rtp-ddr30 ddfs[3761]: NOTICE: MSG-DDR-00003: Shutting down ddfs
  • An emergency alert is posted indicating that DDFS has been disabled 'due to a critical condition' (screenshot not reproduced here).
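The platmon entry in the log excerpt above records the size of the jump in seconds. As a minimal sketch, assuming the log lines have been copied off the DDR into a text file or list, the following Python scans for that entry and reports jumps that exceed the 60-second backward limit described in this article. The function and constant names are illustrative and are not part of DDOS.

```python
import re

# Threshold taken from this article: a backward jump of more than 60 seconds.
BACKWARD_LIMIT_SECONDS = 60

# Matches the platmon line format shown in the log excerpt above.
JUMP_RE = re.compile(r"platmon: INFO: Found a system time jump: (-?\d+)")


def find_backward_jumps(log_lines):
    """Return the jump sizes (in seconds) that exceed the backward limit."""
    jumps = []
    for line in log_lines:
        match = JUMP_RE.search(line)
        if match:
            seconds = int(match.group(1))
            # Negative values are backward jumps; small ones are tolerated.
            if seconds < -BACKWARD_LIMIT_SECONDS:
                jumps.append(seconds)
    return jumps


sample = [
    "Jan 1 20:17:51 rtp-ddr30 platmon: INFO: Found a system time jump: -5591485",
    "Jan 1 20:17:52 rtp-ddr30 platmon: INFO: System time jumped, needs service now",
]
print(find_backward_jumps(sample))  # prints [-5591485]
```

A non-empty result suggests that the safeguard described here was triggered by the listed jump.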

Resolution

Note: Once this issue is encountered, DDFS refuses to start, for example:
 
# filesys enable
Please wait...
01/01 20:32:10.217 (tid 0xxxxxxx): INFO: Event posted: m0-28 (2100001c:553648156): EVT-FILESYS-00008: Filesystem has encountered an error and is restarting.
**** There was a problem bringing up the filesystem. Status: The filesystem is aborting due to a problem.
In addition, reversing the backwards jump in system time does not allow DDFS to be re-enabled (the issue will persist).

To re-enable DDFS, the following steps must be taken. If the affected DD is the active node of a DD HA pair, perform the applicable steps on both nodes before enabling the FS process:
  1. Set the system's date/time to the correct value (note that if the time zone is changed, the DDR may prompt that a reboot is required. If so, reboot immediately to ensure that all processes recognize the newly configured time zone)
  2. Clear the emergency alert corresponding to the 'filesystem disabled due to a critical condition' error:    
# alert clear alert-id [alert id]

For example, if this were alert p0-32 (as shown above):

# alert clear alert-id p0-32
  3. Wait a minute for the alert to clear internally and for the system status to be updated. Otherwise, the internal system status may not be fully updated by the time the FS process starts, which could lead to a one-off FS crash and alert
  4. Enable DDFS:
# filesys enable
 
DDFS should now boot/run as normal. If you did not wait long enough after clearing the alert before starting the FS process, the CLI may report that the FS has encountered a problem. However, the FS will continue trying to start and, if the problem was as described in this KB, the FS process will eventually be enabled.
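The recovery sequence above can be sketched as a small helper that extracts the alert ID from the "Event posted" line and lists the actions in order. This is only an illustration: it prints a plan and does not connect to or run anything on a DDR, the function name and the ("run"/"pause") tuple layout are assumptions made for this example, and it assumes the system date/time has already been corrected.

```python
import re

# Captures the alert ID (for example p0-32) from an "Event posted" line
# that refers to EVT-ENVIRONMENT-00052, as shown earlier in this article.
ALERT_RE = re.compile(r"Event posted: (\S+)\s.*?EVT-ENVIRONMENT-00052")


def build_recovery_plan(event_line, wait_seconds=60):
    """Return the ordered actions to take after the clock has been fixed."""
    match = ALERT_RE.search(event_line)
    if match is None:
        raise ValueError("no EVT-ENVIRONMENT-00052 alert found in line")
    alert_id = match.group(1)
    return [
        ("run", f"alert clear alert-id {alert_id}"),
        # Operator pause, not a DDR command: lets the alert clear internally.
        ("pause", wait_seconds),
        ("run", "filesys enable"),
    ]


line = (
    "Event posted: p0-32 (11000020:285212704): EVT-ENVIRONMENT-00052: "
    "File system is disabled due to a critical condition."
)
for action, detail in build_recovery_plan(line):
    print(action, detail)
```

Keeping the pause as an explicit entry in the plan makes it harder to skip the wait between clearing the alert and enabling DDFS.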

For further information on this issue or any of the information contained within this article, contact your contracted support provider.

Additional Information

When the DD is joined to a Windows Active Directory domain, the DC (Domain Controller) is a source of system time for the DD, and the DD periodically syncs its date and time against that of the DC. If the date and time on the Windows DC change, the changes are pushed to the DD through CIFS, and if a backwards change in time of more than 60 seconds occurs, this also triggers the behavior described here.

To determine whether this could be the case, first check whether the DD is configured for CIFS and joined to a particular Active Directory realm:

# cifs show config
Mode Active-Directory
Realm realm.example.com
Domain Controllers *
WINS Server not specified
NB Hostname DD9300
Max Connections Not Available
Max Open Files       Not Available

If this is the case, check the "cifs.log" file for entries such as the ones below:  

# log view debug/cifs/cifs.log
Mar 28 22:03:16 DD9300 lsass: ALWAYS: [24497/1585429396.001947087] [lsass] ADSyncTimeToDC: Attempting to change System Time, from [Sat Mar 28 22:03:16 2020 ] to [Sat Mar 28 22:54:38 2020 ]
Mar 28 23:44:38 DD9300 lsass: ALWAYS: [24497/1585435478.001799190] [lsass] ADSyncTimeToDC: Attempting to change System Time, from [Sat Mar 28 23:44:38 2020 ] to [Sat Mar 28 22:53:15 2020 ]
Mar 29 22:04:38 DD9300 lsass: ALWAYS: [24497/1585512278.002014016] [lsass] ADSyncTimeToDC: Attempting to change System Time, from [Sun Mar 29 22:04:38 2020 ] to [Sun Mar 29 22:55:53 2020 ]
Mar 29 23:25:53 DD9300 lsass: ALWAYS: [24499/1585517153.001946740] [lsass] ADSyncTimeToDC: Attempting to change System Time, from [Sun Mar 29 23:25:53 2020 ] to [Sun Mar 29 22:34:37 2020 ]
Mar 29 23:25:53 DD9300 lsass: ALWAYS: [24497/1585517153.001946645] [lsass] ADSyncTimeToDC: Attempting to change System Time, from [Sun Mar 29 23:25:53 2020 ] to [Sun Mar 29 22:34:37 2020 ]
Mar 30 22:00:53 DD9300 lsass: ALWAYS: [24497/1585598453.002161373] [lsass] ADSyncTimeToDC: Attempting to change System Time, from [Mon Mar 30 22:00:53 2020 ] to [Mon Mar 30 22:52:01 2020 ]
Mar 30 23:12:01 DD9300 lsass: ALWAYS: [24497/1585602721.002275775] [lsass] ADSyncTimeToDC: Attempting to change System Time, from [Mon Mar 30 23:12:01 2020 ] to [Mon Mar 30 22:20:52 2020 ]

There is clearly a problem with the configured DC: its time jumps forwards by approximately 50 minutes at around the same time every day, then jumps back by the same amount shortly afterwards. The backwards jump triggers the alert on the DD and forces DDFS to shut down. In this particular case, the time changes on the DC must be investigated and resolved.
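To spot this pattern without reading each entry by eye, the "from [...] to [...]" timestamps in the cifs.log entries above can be compared programmatically. The following Python is a minimal sketch, assuming the log lines have been copied off the DD and follow exactly the format shown in this article; the function names are illustrative.

```python
import re
from datetime import datetime

# Matches the lsass ADSyncTimeToDC entries shown in the cifs.log excerpt.
SYNC_RE = re.compile(
    r"ADSyncTimeToDC: Attempting to change System Time, "
    r"from \[(.*?)\] to \[(.*?)\]"
)
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def sync_deltas(log_lines):
    """Return the requested time change, in seconds, for each sync entry."""
    deltas = []
    for line in log_lines:
        match = SYNC_RE.search(line)
        if not match:
            continue
        old = datetime.strptime(match.group(1).strip(), TIME_FORMAT)
        new = datetime.strptime(match.group(2).strip(), TIME_FORMAT)
        deltas.append(int((new - old).total_seconds()))
    return deltas


def backward_jumps(log_lines, limit_seconds=60):
    """Return only the changes that move time back by more than the limit."""
    return [d for d in sync_deltas(log_lines) if d < -limit_seconds]


sample = [
    "Mar 28 22:03:16 DD9300 lsass: ALWAYS: [24497/1585429396.001947087] [lsass] "
    "ADSyncTimeToDC: Attempting to change System Time, "
    "from [Sat Mar 28 22:03:16 2020 ] to [Sat Mar 28 22:54:38 2020 ]",
    "Mar 28 23:44:38 DD9300 lsass: ALWAYS: [24497/1585435478.001799190] [lsass] "
    "ADSyncTimeToDC: Attempting to change System Time, "
    "from [Sat Mar 28 23:44:38 2020 ] to [Sat Mar 28 22:53:15 2020 ]",
]
print(sync_deltas(sample))     # prints [3082, -3083]
print(backward_jumps(sample))  # prints [-3083]
```

For the two sample entries, the first sync moves the clock forwards by about 51 minutes and the second moves it back by the same amount, which matches the daily pattern described above.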

When Active Directory is configured, it is recommended that NTP be disabled, as stated in the DDOS Administration Guide (see page 137):
Note:
 Using time synchronization from an Active Directory domain controller might cause excessive time changes on the system if both NTP and the domain controller are modifying the time.
If both AD and NTP time sync are in use, NTP should at the very least be configured to sync to the NTP server provided by the DC, or to the configured DC's upstream time source, for consistency.