% Autopsy of a minor disk failure, fsck, lost+found
% Ian! D. Allen -- [www.idallen.com]
% Winter 2016 - January to April 2016 - Updated 2017-02-22 10:11 EST

- [Course Home Page]
- [Course Outline]
- [All Weeks]
- [Plain Text]

Disk attributes of damaged disk
===============================

A 1TB disk is slowly failing:

    # smartctl -x /dev/sdd
    [...]
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate     POSR--   118   089   006    -    191177437
      3 Spin_Up_Time            PO----   093   092   000    -    0
      4 Start_Stop_Count        -O--CK   100   100   020    -    54
      5 Reallocated_Sector_Ct   PO--CK   099   099   036    -    73
      7 Seek_Error_Rate         POSR--   087   060   030    -    507706352
      9 Power_On_Hours          -O--CK   070   070   000    -    26402
     10 Spin_Retry_Count        PO--C-   100   100   097    -    1
     12 Power_Cycle_Count       -O--CK   100   100   020    -    32
    184 End-to-End_Error        -O--CK   100   100   099    -    0
    187 Reported_Uncorrect      -O--CK   001   001   000    -    5143
    188 Command_Timeout         -O--CK   100   088   000    -    393568
    189 High_Fly_Writes         -O-RCK   001   001   000    -    1781
    190 Airflow_Temperature_Cel -O---K   062   052   045    -    38 (Min/Max 29/48)
    194 Temperature_Celsius     -O---K   038   048   000    -    38 (0 16 0 0 0)
    195 Hardware_ECC_Recovered  -O-RC-   050   020   000    -    191177437
    197 Current_Pending_Sector  -O--C-   100   100   000    -    28
    198 Offline_Uncorrectable   ----C-   100   100   000    -    28
    199 UDMA_CRC_Error_Count    -OSRCK   200   199   000    -    39
                                ||||||_ K auto-keep
                                |||||__ C event count
                                ||||___ R error rate
                                |||____ S speed/performance
                                ||_____ O updated online
                                |______ P prefailure warning

I/O error during routine backup
===============================

Today, a routine backup had an I/O error:

    # ./dobackup.sh
    dobackup.sh: Doing /mnt/ubuntu10.04c/. -> /mnt/1tbB/ubuntu10.04c max size 20G
    Tue Feb 21 10:57:42 EST 2017
    rsync: readlink_stat("/mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js") failed: Structure needs cleaning (117)
    rsync: readlink_stat("/mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/java") failed: Structure needs cleaning (117)
    IO error encountered -- skipping file deletion

Checking one of the affected paths with `ls`:

    # ls -lF /mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js
    ls: cannot access '/mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js': Structure needs cleaning

Listing the parent directory shows the damage. The `java/` and `js/`
entries are still there, but their inodes cannot be read:

    # ls -lF /mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl
    ls: cannot access '/mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js': Structure needs cleaning
    ls: cannot access '/mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/java': Structure needs cleaning
    total 16
    dr-xr-xr-x 5 idallen idallen 4096 May 15  2006 ./
    dr-xr-xr-x 3 idallen idallen 4096 May 15  2006 ../
    dr-xr-xr-x 6 idallen idallen 4096 May 15  2006 common/
    d????????? ? ?       ?          ?            ? java/
    d????????? ? ?       ?          ?            ? js/
    -r-xr-xr-x 1 idallen idallen  885 May 15  2006 version.htm

Find the inode number of this directory:

    # ls -lidF /mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl
    20711024 dr-xr-xr-x 5 idallen idallen 4096 May 15  2006 ./

Running a file system check: `fsck`
===================================

Confirm that `/mnt/1tbB` is a partition on the questionable `/dev/sdd` disk:

    # df /mnt/1tbB
    Filesystem     1K-blocks      Used Available Use% Mounted on
    /dev/sdd1      961302560 751972852 199545712  80% /mnt/1tbB

Unmount the file system and do a file system check and repair on it:

    # umount /dev/sdd1
    # fsck -v -C -f /dev/sdd1
    Pass 1: Checking inodes, blocks, and sizes
    Inodes that were part of a corrupted orphan linked list found.  Fix? yes
    Inode 20711106 was part of the orphaned inode list.  FIXED.
    Inode 20711106, i_blocks is 19792652383982, should be 0.  Fix? yes
    Inode 20711107 was part of the orphaned inode list.  FIXED.
    Inode 20711107, i_blocks is 19796947351278, should be 0.  Fix? yes
    Inode 20711108 was part of the orphaned inode list.  FIXED.
    Inode 20711108, i_blocks is 19801242318574, should be 0.  Fix? yes
    Inode 20711109 was part of the orphaned inode list.  FIXED.
    Inode 20711109, i_blocks is 19805537285870, should be 0.  Fix? yes
    Inode 20711110 was part of the orphaned inode list.  FIXED.
    Inode 20711110, i_blocks is 19809832253166, should be 0.  Fix? yes
    Inode 20711111 was part of the orphaned inode list.  FIXED.
    Inode 20711111, i_blocks is 19814127220462, should be 0.  Fix? yes
    Inode 20711112 was part of the orphaned inode list.  FIXED.
    Inode 20711112, i_blocks is 19818422187758, should be 0.  Fix? yes
    Inode 20711113 was part of the orphaned inode list.  FIXED.
    Inode 20711113, i_blocks is 19822717155054, should be 0.  Fix? yes
    Inode 20711114 was part of the orphaned inode list.  FIXED.
    Inode 20711114, i_blocks is 19827012122350, should be 0.  Fix? yes
    Inode 20711115 was part of the orphaned inode list.  FIXED.
    Inode 20711115, i_blocks is 19831307089646, should be 0.  Fix? yes
    Inode 20711116 was part of the orphaned inode list.  FIXED.
    Inode 20711116, i_blocks is 19835602056942, should be 0.  Fix? yes
    Inode 20711117 was part of the orphaned inode list.  FIXED.
    Inode 20711117, i_blocks is 19839897024238, should be 0.  Fix? yes
    Inode 20711118 was part of the orphaned inode list.  FIXED.
    Inode 20711118, i_blocks is 19844191991534, should be 0.  Fix? yes
    Inode 20711119 was part of the orphaned inode list.  FIXED.
    Inode 20711119, i_blocks is 19848486958830, should be 0.  Fix? yes
    Inode 20711120 was part of the orphaned inode list.  FIXED.
    Inode 20711120, i_blocks is 19852781926126, should be 0.  Fix? yes
    Pass 2: Checking directory structure
    Inode 20711108 (/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/java) has invalid mode (00).  Clear? yes
    Inode 20711115 (/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js) has invalid mode (00).  Clear? yes
    Inode 20711106 (/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/common/scripts/strutils.js) has invalid mode (00).  Clear? yes
    Inode 20711107 (/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/common/scripts/switch.js) has invalid mode (00).  Clear? yes
    Entry '..' in <20711115>/<20711123> (20711123) has deleted/unused inode 20711115.  Clear? yes
    Entry '..' in <20711115>/<20711141> (20711141) has deleted/unused inode 20711115.  Clear? yes
    Entry '..' in <20711115>/<20711144> (20711144) has deleted/unused inode 20711115.  Clear? yes
    Pass 3: Checking directory connectivity
    Unconnected directory inode 20711123 (...)  Connect to /lost+found? yes
    Unconnected directory inode 20711141 (...)  Connect to /lost+found? yes
    Unconnected directory inode 20711144 (...)  Connect to /lost+found? yes
    Pass 4: Checking reference counts
    Inode 20711024 ref count is 5, should be 3.  Fix? yes
    Inode 20711109 (...) has invalid mode (00).  Clear? yes
    Inode 20711110 (...) has invalid mode (00).  Clear? yes
    Inode 20711111 (...) has invalid mode (00).  Clear? yes
    Inode 20711112 (...) has invalid mode (00).  Clear? yes
    Inode 20711113 (...) has invalid mode (00).  Clear? yes
    Inode 20711114 (...) has invalid mode (00).  Clear? yes
    Inode 20711116 (...) has invalid mode (00).  Clear? yes
    Inode 20711117 (...) has invalid mode (00).  Clear? yes
    Inode 20711118 (...) has invalid mode (00).  Clear? yes
    Inode 20711119 (...) has invalid mode (00).  Clear? yes
    Inode 20711120 (...) has invalid mode (00).  Clear? yes
    Unattached inode 20711121  Connect to /lost+found? yes
    Inode 20711121 ref count is 2, should be 1.  Fix? yes
    Unattached inode 20711122  Connect to /lost+found? yes
    Inode 20711122 ref count is 2, should be 1.  Fix? yes
    Inode 20711123 ref count is 3, should be 2.  Fix? yes
    Inode 20711141 ref count is 3, should be 2.  Fix? yes
    Inode 20711144 ref count is 3, should be 2.  Fix? yes
    Pass 5: Checking group summary information
    Block bitmap differences:  -(82845914--82845917) -(82872976--82872995)  Fix? yes
    Free blocks count wrong for group #2528 (15, counted=19).  Fix? yes
    Free blocks count wrong for group #2529 (73, counted=93).  Fix? yes
    Free blocks count wrong (52332403, counted=52332427).  Fix? yes
    Directories count wrong for group #2528 (893, counted=889).  Fix? yes

    1tbB: ***** FILE SYSTEM WAS MODIFIED *****

         2623987 inodes used (4.30%, out of 61054976)
            5225 non-contiguous files (0.2%)
             247 non-contiguous directories (0.0%)
                 # of inodes with ind/dind/tind blocks: 0/0/0
                 Extent depth histogram: 2579521/463/2
       191857963 blocks used (78.57%, out of 244190390)
              16 bad blocks
              25 large files

         2308517 regular files
          269852 directories
             790 character device files
              26 block device files
              69 fifos
          223385 links
           44649 symbolic links (43033 fast symbolic links)
              75 sockets
    ------------
         2847361 files

Re-mount the file system and examine lost+found
===============================================

Re-mount the file system and look at the same directory. The damage is
gone, but so are the two damaged directories, `java/` and `js/`:

    # mount /mnt/1tbB
    # ls -liF /mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl
    total 16
    20711024 dr-xr-xr-x 3 idallen idallen 4096 May 15  2006 ./
    20711023 dr-xr-xr-x 3 idallen idallen 4096 May 15  2006 ../
    20711025 dr-xr-xr-x 6 idallen idallen 4096 May 15  2006 common/
    20711160 -r-xr-xr-x 1 idallen idallen  885 May 15  2006 version.htm

Check the `lost+found` directory of this file system to see what was
salvaged and ended up there:

    # ls -lF /mnt/1tbB/lost+found/
    total 44
    -r-xr-xr-x 1 idallen idallen  1115 May 15  2006 #20711121
    -r-xr-xr-x 1 idallen idallen  5217 May 15  2006 #20711122
    dr-xr-xr-x 2 idallen idallen  4096 May 15  2006 #20711123/
    dr-xr-xr-x 2 idallen idallen  4096 May 15  2006 #20711141/
    dr-xr-xr-x 2 idallen idallen  4096 May 15  2006 #20711144/
    drwx------ 5 root    root    16384 Aug  3  2015 ./
    drwxr-xr-x 8 idallen idallen  4096 Feb  3 08:56 ../

    # find /mnt/1tbB/lost+found/
    /mnt/1tbB/lost+found/
    /mnt/1tbB/lost+found/#20711123
    /mnt/1tbB/lost+found/#20711123/navanim1.gif
    /mnt/1tbB/lost+found/#20711123/searchbutton_it.gif
    /mnt/1tbB/lost+found/#20711123/nfocbg.gif
    /mnt/1tbB/lost+found/#20711123/patt_right.gif
    /mnt/1tbB/lost+found/#20711123/tabspacer.gif
    /mnt/1tbB/lost+found/#20711123/pdf.gif
    /mnt/1tbB/lost+found/#20711123/searchbutton_pt.gif
    /mnt/1tbB/lost+found/#20711123/tabsbg.gif
    /mnt/1tbB/lost+found/#20711123/navanim2.gif
    /mnt/1tbB/lost+found/#20711123/searchbutton_en.gif
    /mnt/1tbB/lost+found/#20711123/searchbutton_es.gif
    /mnt/1tbB/lost+found/#20711123/searchbutton_sv.gif
    /mnt/1tbB/lost+found/#20711123/searchbutton_nl.gif
    /mnt/1tbB/lost+found/#20711123/navanim1_enfocsite.gif
    /mnt/1tbB/lost+found/#20711123/tabsbg_bkup.gif
    /mnt/1tbB/lost+found/#20711123/searchbutton_fr.gif
    /mnt/1tbB/lost+found/#20711123/searchbutton_de.gif
    /mnt/1tbB/lost+found/#20711141
    /mnt/1tbB/lost+found/#20711141/options.js
    /mnt/1tbB/lost+found/#20711141/locale.js
    /mnt/1tbB/lost+found/#20711144
    /mnt/1tbB/lost+found/#20711144/search.js
    /mnt/1tbB/lost+found/#20711144/outlin1s.js
    /mnt/1tbB/lost+found/#20711144/javascpt.js
    /mnt/1tbB/lost+found/#20711144/outline.js
    /mnt/1tbB/lost+found/#20711144/index.js
    /mnt/1tbB/lost+found/#20711144/outlfast.js
    /mnt/1tbB/lost+found/#20711144/search4s.js
    /mnt/1tbB/lost+found/#20711144/panels.js
    /mnt/1tbB/lost+found/#20711144/handler.js
    /mnt/1tbB/lost+found/#20711144/tabs.js
    /mnt/1tbB/lost+found/#20711144/search3s.js
    /mnt/1tbB/lost+found/#20711144/search1s.js
    /mnt/1tbB/lost+found/#20711144/outlsafe.js
    /mnt/1tbB/lost+found/#20711144/search2s.js
    /mnt/1tbB/lost+found/#20711144/index1s.js
    /mnt/1tbB/lost+found/#20711121
    /mnt/1tbB/lost+found/#20711122

Since this file system is a backup copy, I can discover the original
names of the three directories above by searching the original tree for
the file names they contain:

    # find /idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl | grep navanim1.gif
    /idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js/images/navanim1.gif

The above shows that `lost+found/#20711123` should be named
`/mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js/images`.
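
The search above can be repeated for every salvaged directory with a small loop. What follows is only a sketch, not the procedure actually used here: `find_orphan_homes` is a hypothetical helper name, it assumes the original tree is still readable, and it assumes the file names contain no newlines or shell glob characters. It takes the first file name in each `lost+found` directory and prints every directory in the original tree that holds a file of that name:

```shell
# Hypothetical helper (not a standard tool): for each directory salvaged
# into lost+found, list the directories in the original tree that contain
# a file with the same name as the salvaged directory's first entry.
find_orphan_homes() {
    lost_found=$1   # the lost+found directory of the repaired file system
    original=$2     # the matching directory in the original (source) tree

    for dir in "$lost_found"/*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        dir=${dir%/}
        # Use the first entry of the salvaged directory as the search key.
        key=$(ls -A "$dir" | head -n 1)
        [ -n "$key" ] || continue      # empty directory: nothing to search for
        echo "$dir (searching for $key):"
        # Print the parent directory of every file with that name.
        find "$original" -name "$key" -exec dirname {} \;
    done
}
```

Run as root (because `lost+found` is mode `drwx------`), it could be called as `find_orphan_homes /mnt/1tbB/lost+found /idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl`. A file name that exists in more than one original directory produces more than one candidate, so the result still needs a human to pick the right one.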
A similar search identifies the names of the other two directory inodes
under `lost+found`. Re-create the two missing parent directories, then
move the three salvaged directories back where they belong:

    # mkdir /mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js
    # mkdir /mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/java
    # mv /mnt/1tbB/lost+found/#20711123 /mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js/images
    # mv /mnt/1tbB/lost+found/#20711141 /mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js/private
    # mv /mnt/1tbB/lost+found/#20711144 /mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/js/scripts

A comparison of the source and the backup copy shows that a few files
were actually lost in this I/O error. The I/O error damaged some inodes
and their data is gone:

    # rsync -avxHs -n /idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/. /mnt/1tbB/ubuntu10.04c/idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/
    sending incremental file list
    ./
    common/scripts/strutils.js
    common/scripts/switch.js
    java/
    java/private/
    java/private/books.xml
    java/private/locale.js
    java/private/locale.xml
    java/private/options.js
    java/private/options.xml
    js/
    js/html/
    js/html/indexsel.htm
    js/html/navigate.htm
    js/html/panel.htm
    js/html/panelini.htm
    js/html/tabs.htm
    js/html/wwhelp.htm

    sent 2,445 bytes  received 82 bytes  5,054.00 bytes/sec
    total size is 394,537  speedup is 156.13 (DRY RUN)

Checksumming the two remaining files in `lost+found` and comparing the
sums with those of files in the above list identifies their names:

    # cd /idallen/archive/SpeedTouch516/Documentation/HTML/ST706_es/wwhelp/wwhimpl/
    # sum js/html/tabs.htm js/html/wwhelp.htm /mnt/1tbB/lost+found/#2071112*
    27775     2 js/html/tabs.htm
    01742     6 js/html/wwhelp.htm
    27775     2 /mnt/1tbB/lost+found/#20711121
    01742     6 /mnt/1tbB/lost+found/#20711122

The other files in the `html/` directory and in the other directories
were damaged by the I/O error and are gone, but we can simply re-do the
backup to recreate them. I should throw out this old disk!

--
| Ian! D. Allen, BA, MMath  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/

[Plain Text] - plain text version of this page in [Pandoc Markdown] format

[www.idallen.com]: http://www.idallen.com/
[Course Home Page]: ..
[Course Outline]: course_outline.pdf
[All Weeks]: indexcgi.cgi
[Plain Text]: 456_disk_error_autopsy.txt
[Pandoc Markdown]: http://johnmacfarlane.net/pandoc/