How to identify XFS Btree Inode Corruption Anomaly

Version 1

    XFS Btree Inode Corruption Anomaly

     

    We recently identified a critical anomaly in GOS 5.  So far it has only been observed with GOS 5.1; however, according to our Engineering analysis, the potential also exists under GOS 5.0.

     

    This anomaly results from filesystem corruption that is caused by normal usage alone, not by external factors such as a hard shutdown.

     

    This anomaly results in a forced shutdown of the filesystem and, potentially, total data loss. Symptoms include an unmounted volume and syslog messages similar to the following:

     

     

    kernel: attempt to access beyond end of device

    kernel: dm-1: rw=0, want=4121118138320289800, limit=5846859776

    kernel: I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x3931271cb8000000       ("xfs_trans_read_buf") error 5 buf count 4096

    kernel: xfs_force_shutdown(dm-1,0x1) called from line 417 of file fs/xfs/xfs_trans_buf.c. Return address = 0x7822f55c

    kernel: Filesystem "dm-1": I/O Error Detected.  Shutting down filesystem: dm-1

    kernel: Please umount the filesystem, and rectify the problem(s)
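In the excerpt above, the requested offset (want) is many orders of magnitude larger than the device limit, which is consistent with a garbage block pointer being read from corrupted metadata. As a minimal sketch, assuming the syslog line keeps the exact "rw=, want=, limit=" layout shown here, the comparison can be automated as follows. The function names are illustrative and are not part of any GOS tooling.

```python
import re

# Matches the "dm-1: rw=0, want=..., limit=..." line from the excerpt above.
# Assumption: the field order and separators are exactly as shown in this article.
BOUNDS_RE = re.compile(
    r"(?P<dev>[\w-]+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def parse_bounds_line(line):
    """Return (device, want, limit) from a want/limit kernel line, or None."""
    match = BOUNDS_RE.search(line)
    if match is None:
        return None
    return match.group("dev"), int(match.group("want")), int(match.group("limit"))


def is_out_of_bounds(line):
    """True when the line reports a request past the end of the device."""
    parsed = parse_bounds_line(line)
    return parsed is not None and parsed[1] > parsed[2]


sample = "kernel: dm-1: rw=0, want=4121118138320289800, limit=5846859776"
print(parse_bounds_line(sample))
print(is_out_of_bounds(sample))
```

A request this far past the limit points to corrupt metadata rather than a resized or truncated device, but the syslog line alone does not prove the cause.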

     

    A second symptom with the same root cause looks like this:

     

    kernel: Filesystem "dm-0": XFS internal error xfs_iformat_btree at line 705 of file fs/xfs/xfs_inode.c.  Caller 0x78207bf0

    kernel: Filesystem "dm-0": corrupt inode 823956991 (btree).  Unmount and run xfs_repair.

     

     

    Common identifiers:

     

    attempt to access beyond end of device

    ("xfs_trans_read_buf") error 5 buf count 4096

    xfs_force_shutdown(####) called from line 417 of file fs/xfs/xfs_trans_buf.c.

     

    XFS internal error xfs_iformat_btree at line 705 of file fs/xfs/xfs_inode.c.

    corrupt inode ######### (btree).  Unmount and run xfs_repair.
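The identifiers listed above can be checked mechanically. The following is a minimal sketch, assuming the syslog text matches the wording quoted in this article exactly (including the source line numbers 417 and 705, which are specific to the affected kernel). The signature names and the match_signatures function are hypothetical and exist only for illustration.

```python
import re

# One pattern per identifier listed above. The #### placeholders in the list
# correspond to the variable parts (device/flags, inode number) matched here.
SIGNATURES = {
    "beyond_end_of_device": re.compile(
        r"attempt to access beyond end of device", re.IGNORECASE
    ),
    "trans_read_buf_error": re.compile(
        r'\("xfs_trans_read_buf"\) error 5 buf count 4096'
    ),
    "force_shutdown_line_417": re.compile(
        r"xfs_force_shutdown\(.*?\) called from line 417 of file fs/xfs/xfs_trans_buf\.c"
    ),
    "iformat_btree_error": re.compile(
        r"XFS internal error xfs_iformat_btree at line 705 of file fs/xfs/xfs_inode\.c"
    ),
    "corrupt_btree_inode": re.compile(r"corrupt inode \d+ \(btree\)"),
}


def match_signatures(lines):
    """Return the set of signature names found anywhere in the given log lines."""
    found = set()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                found.add(name)
    return found


log_lines = [
    "kernel: attempt to access beyond end of device",
    'kernel: I/O error in filesystem ("dm-1") meta-data dev dm-1 block '
    '0x3931271cb8000000       ("xfs_trans_read_buf") error 5 buf count 4096',
    "kernel: xfs_force_shutdown(dm-1,0x1) called from line 417 of file "
    "fs/xfs/xfs_trans_buf.c. Return address = 0x7822f55c",
    'kernel: Filesystem "dm-0": corrupt inode 823956991 (btree).  '
    "Unmount and run xfs_repair.",
]
print(sorted(match_signatures(log_lines)))
```

An xfs_force_shutdown message that reports a different call site matches none of these patterns, which keeps unrelated shutdowns from being classified as this anomaly.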

     

    Please note that not all xfs_force_shutdown events are the same.  This anomaly is specific to a type of corruption that has not been previously observed.  Other xfs_force_shutdown events still require the same steps to get a customer up and running (i.e., xfs_repair), but they are caused by external factors such as ungraceful shutdowns, disk failure, or other hardware problems.

     

    This problem will be addressed in an upgraded version of GOS that will include a new kernel and XFS filesystem, as well as a new version of xfs_repair, which together will permanently resolve this anomaly.  We do not have a current ETA, but the update is considered critical and is being expedited, so we should see it very soon.

     

    In the meantime, our only recourse is to run xfs_repair on the affected filesystem. While this will not prevent the anomaly from recurring, it will allow the customer to continue using the system until we have a permanent fix.

     

    Once the fixed GSU is available, we will provide directions for its use.

     

    Additionally, we have created an answerbook article, 2134SD, to send to customers experiencing the above anomaly.  PLEASE BE CAREFUL when using the answerbook response: it indicates that we are working on fixing 'this problem' but it is nonspecific.  The fix will NOT address all possible xfs_force_shutdown scenarios; it resolves only this specific problem.

     

    RESOLUTION: Upgrade to GOS 5.1.046 or greater and run xfs_repair on all volumes, including root.  This issue is NOT resolved by simply upgrading; an xfs_repair MUST be run to eliminate the potential for the problem.  In other words, a system that has been upgraded to GOS 5.1.046 or later from a previous version can still run into this condition until xfs_repair has been performed on all of its volumes.
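The two-part condition in the resolution (fixed version AND a repair run) can be written as a small check. This is a sketch under stated assumptions: the GOS version is available as a dotted numeric string such as "5.1.046", and the caller already knows whether xfs_repair has been run on every volume since the upgrade. How to obtain either fact from a live system is not covered by this article, and the function names are hypothetical.

```python
# First GOS release containing the fix, per the resolution above (5.1.046).
FIXED_VERSION = (5, 1, 46)


def parse_version(text):
    """Convert a dotted numeric version such as '5.1.046' to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def remediation_status(version_text, repaired_after_upgrade):
    """Summarize what is still outstanding for one system.

    repaired_after_upgrade: True only if xfs_repair has been run on all
    volumes, including root, since the upgrade (caller-supplied assumption).
    """
    if parse_version(version_text) < FIXED_VERSION:
        return "upgrade required, then xfs_repair"
    if not repaired_after_upgrade:
        return "xfs_repair still required"
    return "resolved"


print(remediation_status("5.1.046", repaired_after_upgrade=False))
```

The check treats the upgrade and the repair as separate requirements because, as stated above, upgrading alone does not remove existing corruption.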