PCEVA, PC Absolute Domain, in search of real computer knowledge

76,000 hours, weak sectors, repaired and back in service

nighttob posted on 2019-10-22 10:02
Views: 4850 | Replies: 10
Last edited by nighttob on 2019-10-23 20:25

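This may turn out to be the next case where I give up on treatment.

Recently qBit has occasionally been reporting I/O errors, and I/O latency has been fluctuating quite a bit, so I checked the drive's status.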
    [root@NIGHTTOB-Server:/opt/lsi/storcli] ./storcli /c0/e62/s4 show all
    CLI Version = 007.1017.0000.0000 May 10, 2019
    Operating system = VMkernel 6.7.0
    Controller = 0
    Status = Success
    Description = Show Drive Information Succeeded.

    Drive /c0/e62/s4 :
    ================

    ------------------------------------------------------------------------------
    EID:Slt DID State DG     Size Intf Med SED PI SeSz Model              Sp Type
    ------------------------------------------------------------------------------
    62:4     12 JBOD  -  1.819 TB SATA HDD N   N  512B ST2000DL003-9VT166 U  -
    ------------------------------------------------------------------------------

    EID=Enclosure Device ID|Slt=Slot No.|DID=Device ID|DG=DriveGroup
    DHS=Dedicated Hot Spare|UGood=Unconfigured Good|GHS=Global Hotspare
    UBad=Unconfigured Bad|Onln=Online|Offln=Offline|Intf=Interface
    Med=Media Type|SED=Self Encryptive Drive|PI=Protection Info
    SeSz=Sector Size|Sp=Spun|U=Up|D=Down|T=Transition|F=Foreign
    UGUnsp=Unsupported|UGShld=UnConfigured shielded|HSPShld=Hotspare shielded
    CFShld=Configured shielded|Cpybck=CopyBack|CBShld=Copyback Shielded
    UBUnsp=UBad Unsupported

    Drive /c0/e62/s4 - Detailed Information :
    =======================================

    Drive /c0/e62/s4 State :
    ======================
    Shield Counter = 0
    Media Error Count = 740
    Other Error Count = 11
    Drive Temperature =  31C (87.80 F)
    Predictive Failure Count = 0
    S.M.A.R.T alert flagged by drive = No

    Drive /c0/e62/s4 Device attributes :
    ==================================
    SN = 6YD0KDHW
    Manufacturer Id = ATA
    Model Number = ST2000DL003-9VT166
    NAND Vendor = NA
    WWN = 5000C50037240706
    Firmware Revision = CC32
    Raw size = 1.819 TB [0xe8e088b0 Sectors]
    Coerced size = 1.818 TB [0xe8d00000 Sectors]
    Non Coerced size = 1.818 TB [0xe8d088b0 Sectors]
    Device Speed = 6.0Gb/s
    Link Speed = 6.0Gb/s
    NCQ setting = Enabled
    Write Cache = N/A
    Logical Sector Size = 512B
    Physical Sector Size = 512B
    Connector Name = Port 4 - 7 x1

    Drive /c0/e62/s4 Policies/Settings :
    ==================================
    Enclosure position = 1
    Connected Port Number = 1(path0)
    Sequence Number = 2
    Commissioned Spare = No
    Emergency Spare = No
    Last Predictive Failure Event Sequence Number = 0
    Successful diagnostics completion on = N/A
    FDE Type = None
    SED Capable = No
    SED Enabled = No
    Secured = No
    Cryptographic Erase Capable = No
    Sanitize Support = Not supported
    Locked = No
    Needs EKM Attention = No
    PI Eligible = No
    Certified = No
    Wide Port Capable = No

    Port Information :
    ================

    -----------------------------------------
    Port Status Linkspeed SAS address
    -----------------------------------------
       0 Active 6.0Gb/s   0x4433221106000000
    -----------------------------------------

    Inquiry Data =
    5a 0c ff 3f 37 c8 10 00 00 00 00 00 3f 00 00 00
    00 00 00 00 20 20 20 20 20 20 20 20 20 20 20 20
    59 36 30 44 44 4b 57 48 00 00 00 00 04 00 43 43
    32 33 20 20 20 20 54 53 30 32 30 30 4c 44 30 30
    2d 33 56 39 31 54 36 36 20 20 20 20 20 20 20 20
    20 20 20 20 20 20 20 20 20 20 20 20 20 20 10 80
    00 40 00 2f 00 40 00 02 00 02 07 00 ff 3f 10 00
    3f 00 10 fc fb 00 10 00 ff ff ff 0f 00 00 07 00

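If this were a machine running production workloads, seeing any Media Error would settle it: no need to think, just replace the drive.
But since this is my own machine, I can afford to analyze it a little.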
    [root@NIGHTTOB-Server:/opt/lsi/storcli] ./storcli /c0/e62/s4 show smart
    CLI Version = 007.1017.0000.0000 May 10, 2019
    Operating system = VMkernel 6.7.0
    Controller = 0
    Status = Success
    Description = Show Drive Smart Info Succeeded.

    Smart Data Info /c0/e62/s4 =
    0a 00 01 0f 00 76 57 20 47 22 0b 00 00 00 03 03
    00 54 54 00 00 00 00 00 00 00 04 32 00 63 63 d0
    07 00 00 00 00 00 05 33 00 64 64 00 00 00 00 00
    00 00 07 0f 00 55 3c 04 86 5b 16 00 00 00 09 32
    00 0d 0d fb 2b 01 00 00 00 00 0a 13 00 64 64 00
    00 00 00 00 00 00 0c 32 00 64 64 79 03 00 00 00
    00 00 b7 32 00 64 64 00 00 00 00 00 00 00 b8 32
    00 64 64 00 00 00 00 00 00 00 bb 32 00 01 01 dc
    05 00 00 00 00 00 bc 32 00 64 01 69 01 03 00 03
    00 00 bd 3a 00 5e 5e 06 00 00 00 00 00 00 be 22
    00 44 2d 20 00 1d 32 00 00 00 bf 32 00 64 64 00
    00 00 00 00 00 00 c0 32 00 64 64 95 03 00 00 00
    00 00 c1 32 00 63 63 d9 07 00 00 00 00 00 c2 22
    00 20 37 20 00 00 00 11 00 00 c3 1a 00 23 03 20
    47 22 0b 00 00 00 c5 12 00 63 63 88 00 00 00 00
    00 00 c6 10 00 63 63 88 00 00 00 00 00 00 c7 3e
    00 c8 c8 01 00 00 00 00 00 00 f0 00 00 64 fd 2b
    ea 00 00 ba 39 2c f1 00 00 64 fd 2f 87 af c1 00
    00 00 f2 00 00 64 fd 5a 2f 4f f0 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 82 00 64 02 00 7b
    03 00 01 00 01 ff 02 56 01 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 08 0f 08 08 08 1d 1e
    1d 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 8a 50 e7 6e 7a fb 00 00
    00 00 00 00 01 00 95 ff 2f 87 af c1 84 48 01 00
    5a 2f 4f f0 c8 bf 1e 00 00 00 00 00 2e bd 30 0c
    00 00 00 04 00 00 00 00 20 12 00 00 51 00 0b 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 13
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c5

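Reading the SMART data (raw values):
05h (reallocated sectors) = 0
09h (power-on hours) = 0x12bfb = 76795
B8h (end-to-end error, E2E) = 0
BBh (reported uncorrectable, UNC) = 0x5dc = 1500
C5h (current pending sectors) = 0x88 = 136
C6h (offline uncorrectable) = 0x88 = 136
So: 76,795 power-on hours and 1,500 UNCs; E2E is 0 and 05h is also 0, but the pending count is non-zero.
It looks like there is a chance this can be "repaired".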
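For anyone who wants to check those numbers against the hex dump, here is a minimal Python sketch. It assumes the commonly used ATA SMART attribute table layout (a 2-byte revision, then up to 30 entries of 12 bytes each: ID, 2 flag bytes, current value, worst value, 6 raw bytes little-endian, 1 reserved byte); that layout is my assumption, not something storcli prints. The names `SMART_DUMP`, `SmartAttr` and `parse_smart` are made up for illustration, and the dump is only the first 16 rows of the output above.

```python
from typing import Dict, NamedTuple

# First 16 rows (256 bytes) of the "Smart Data Info" dump from the post.
SMART_DUMP = """
0a 00 01 0f 00 76 57 20 47 22 0b 00 00 00 03 03
00 54 54 00 00 00 00 00 00 00 04 32 00 63 63 d0
07 00 00 00 00 00 05 33 00 64 64 00 00 00 00 00
00 00 07 0f 00 55 3c 04 86 5b 16 00 00 00 09 32
00 0d 0d fb 2b 01 00 00 00 00 0a 13 00 64 64 00
00 00 00 00 00 00 0c 32 00 64 64 79 03 00 00 00
00 00 b7 32 00 64 64 00 00 00 00 00 00 00 b8 32
00 64 64 00 00 00 00 00 00 00 bb 32 00 01 01 dc
05 00 00 00 00 00 bc 32 00 64 01 69 01 03 00 03
00 00 bd 3a 00 5e 5e 06 00 00 00 00 00 00 be 22
00 44 2d 20 00 1d 32 00 00 00 bf 32 00 64 64 00
00 00 00 00 00 00 c0 32 00 64 64 95 03 00 00 00
00 00 c1 32 00 63 63 d9 07 00 00 00 00 00 c2 22
00 20 37 20 00 00 00 11 00 00 c3 1a 00 23 03 20
47 22 0b 00 00 00 c5 12 00 63 63 88 00 00 00 00
00 00 c6 10 00 63 63 88 00 00 00 00 00 00 c7 3e
"""


class SmartAttr(NamedTuple):
    value: int  # normalized current value
    worst: int  # normalized worst value
    raw: int    # 48-bit raw value, decoded little-endian


def parse_smart(dump: str) -> Dict[int, SmartAttr]:
    """Decode the attribute table from a space-separated hex dump."""
    data = bytes.fromhex("".join(dump.split()))
    attrs: Dict[int, SmartAttr] = {}
    for i in range(30):                 # assumed maximum of 30 entries
        off = 2 + 12 * i                # skip the 2-byte revision field
        if off + 12 > len(data):        # stop at a truncated entry
            break
        entry = data[off:off + 12]
        if entry[0] == 0:               # ID 0 marks an unused slot
            continue
        raw = int.from_bytes(entry[5:11], "little")
        attrs[entry[0]] = SmartAttr(entry[3], entry[4], raw)
    return attrs


if __name__ == "__main__":
    attrs = parse_smart(SMART_DUMP)
    for attr_id in (0x05, 0x09, 0xB8, 0xBB, 0xC5, 0xC6):
        print(f"{attr_id:02X}h raw = {attrs[attr_id].raw}")
```

Note that some attributes (BCh and BEh here, for example) pack several fields into the raw bytes, so the single integer this sketch produces is only meaningful for the simple counters listed above.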

When I get home tonight I'll first check the RAID log for any ASC codes, then pull the drive and do a full erase.

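nighttob (OP) posted on 2019-10-22 22:17
The RAID card log shows the problem started on October 19, reporting ASC 3/11/0 (sense key 3h, ASC 11h, ASCQ 00h), i.e. UNRECOVERED READ ERROR.
Looks like I caught this fairly early.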
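As a rough illustration of how that triple reads, here is a small Python sketch that splits a "key/ASC/ASCQ" string and looks the parts up. It assumes the three fields in the log are hexadecimal, and the tables are deliberately tiny excerpts of the standard SCSI assignments, not a full list; `decode_sense`, `SENSE_KEYS` and `ADDITIONAL_SENSE` are names invented for this example.

```python
from typing import Tuple

# Excerpt of the standard SCSI sense keys (not exhaustive).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# Only the one ASC/ASCQ pair that appears in this thread.
ADDITIONAL_SENSE = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
}


def decode_sense(triple: str) -> Tuple[str, str]:
    """Turn 'key/asc/ascq' (fields assumed to be hex) into two descriptions."""
    key, asc, ascq = (int(part, 16) for part in triple.split("/"))
    key_text = SENSE_KEYS.get(key, f"unknown sense key {key:X}h")
    asc_text = ADDITIONAL_SENSE.get(
        (asc, ascq), f"ASC {asc:02X}h / ASCQ {ascq:02X}h (not in this excerpt)"
    )
    return key_text, asc_text


if __name__ == "__main__":
    print(decode_sense("3/11/0"))
```

For the value in the log, the sense key comes out as MEDIUM ERROR and the ASC/ASCQ pair as UNRECOVERED READ ERROR, which matches a drive that could not read a sector even after its internal retries.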

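I'm now running a full-disk scan with HD Sentinel; it's almost at 50% and nothing abnormal has turned up yet.
nighttob (OP) posted on 2019-10-23 06:59
The results show the problem sits around 1698.6 GB, which is quite near the end of the disk.
After the read test finished, 05h, C5h and C6h had not changed; only BBh had increased. Next I'm running the reinitialize test.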


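nighttob (OP) posted on 2019-10-23 20:24
After the scan finished the drive has more or less "come back to life", though it is left with 1737 UNCs on record.
Plugged it back into the server and carrying on with it.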



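tulei posted on 2019-10-23 22:17
nighttob posted on 2019-10-23 06:59
The results show the problem sits around 1698.6 GB, which is quite near the end of the disk.
After the read test finished, 05h, C5h and C6h had not changed; only BBh ...

May I ask, does Hard Disk Sentinel also have a disk repair function?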
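nighttob (OP) posted on 2019-10-23 22:40
tulei posted on 2019-10-23 22:17
May I ask, does Hard Disk Sentinel also have a disk repair function?

Yes.
I've already repaired several drives with bad sectors of this severity.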
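eterfinity posted on 2019-10-24 13:51
Great post. That power-on time is seriously tough.
I've decided to dig out the 16-bay Taiwanese-brand storage box gathering dust in my warehouse and run every drive through this same process. They happen to be 12-year-old drives too.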
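tulei posted on 2019-10-24 16:31
nighttob posted on 2019-10-23 22:40
Yes.
I've already repaired several drives with bad sectors of this severity.

Could you roughly describe the process of repairing a drive with Hard Disk Sentinel? Thanks in advance.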
nighttob (OP) posted on 2019-10-24 17:02
tulei posted on 2019-10-24 16:31
Could you roughly describe the process of repairing a drive with Hard Disk Sentinel? Thanks in advance.

http://bbs.pceva.com.cn/thread-56702-1-1.html
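tiancai2nd posted on 2019-10-25 23:09
Does Sentinel's "bad sector" repair work on the same principle as DiskGenius repairing sectors whose response time exceeds XX milliseconds? What are the pros and cons of each?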
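nighttob (OP) posted on 2019-10-25 23:40
tiancai2nd posted on 2019-10-25 23:09
Does Sentinel's "bad sector" repair work on the same principle as DiskGenius repairing sectors whose response time exceeds XX milliseconds? What are the pros and cons of each? ...

I don't know.
I only use DG for partition imaging and cloning partitions.
I don't even use its recovery features, because I back up every day.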