Hard disk problems

Kpd

I have a rented server running FreeBSD 7.3 with three hard disks: two in RAID 1 and a third, standalone one for backups. One not-so-fine day something went wrong while the backups were being created and the server froze solid; it came back only after a hard reset. The log shows these errors for the standalone disk:

kernel: ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
kernel: ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
kernel: ad8: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
kernel: ad8: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
kernel: ad8: WARNING - SET_MULTI taskqueue timeout - completing request directly
kernel: ad8: FAILURE - READ_DMA48 timed out LBA=509838943
kernel: g_vfs_done():ad8s1d[READ(offset=261037506560, length=32768)]error = 5

Google says these are symptoms of a dying disk and that it will soon get much worse. The hosting support says they checked everything and found no problems. The question is: what should I do?

esetnod
#1

To start, check SMART; you can also try reading block 509838943 by hand (as long as that doesn't reliably cause a panic).
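A minimal sketch of what reading that block by hand could look like. It first relates the two kernel messages: the byte offset 261037506560 in the g_vfs_done() line divided by 512 gives sector 509838880, which is 63 sectors below the logged LBA 509838943. That gap would be consistent with a partition starting at sector 63, but this is an assumption, as are the 512-byte sector size and the helper names offset_to_lba and probe_cmd. The script only prints the dd command; it does not touch the disk.

```shell
#!/bin/sh
# Sketch: relate the two kernel messages and build a read-only probe command.
SECTOR_SIZE=512   # assumption: 512-byte logical sectors

# Convert a byte offset inside a partition to an absolute LBA on the disk.
# $1 = byte offset within the partition
# $2 = first sector of the partition on the disk (63 here is an assumption)
offset_to_lba() {
    echo $(( $1 / $SECTOR_SIZE + $2 ))
}

# Print (do not run) a dd command that reads exactly one sector at an LBA.
# $1 = raw disk device, $2 = LBA to read
probe_cmd() {
    echo "dd if=$1 of=/dev/null bs=$SECTOR_SIZE skip=$2 count=1"
}

offset_to_lba 261037506560 63     # prints 509838943
probe_cmd /dev/ad8 509838943      # prints the dd command to try
```

Running the printed dd command as root reads a single sector and discards it. If the sector is unreadable, dd should report an I/O error and the kernel log should show another READ_DMA48 failure for the same LBA.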

Kpd
#2
esetnod:
To start, check SMART

How do I do that on FreeBSD?

[umka]
#3
Kpd:
How do I do that on FreeBSD?

/usr/ports/sysutils/smartmontools

http://sourceforge.net/apps/trac/smartmontools/wiki

Logs to the rescue!
Kpd
#4

Here is what smartctl -a /dev/ad8 says:

smartctl 5.40 2010-10-16 r3189 [FreeBSD 7.3-RELEASE amd64] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11 family
Device Model: ST31500341AS
Serial Number: 9VS46HJK
Firmware Version: CC1H
User Capacity: 1,500,301,910,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Fri Feb 25 11:20:59 2011 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 609) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 112 099 006 Pre-fail Always - 173325211
3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 6
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 7
7 Seek_Error_Rate 0x000f 070 060 030 Pre-fail Always - 11585888
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 2077
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 6
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 091 091 000 Old_age Always - 9
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 098 098 000 Old_age Always - 2
190 Airflow_Temperature_Cel 0x0022 073 061 045 Old_age Always - 27 (Min/Max 25/28)
194 Temperature_Celsius 0x0022 027 040 000 Old_age Always - 27 (0 22 0 0)
195 Hardware_ECC_Recovered 0x001a 048 023 000 Old_age Always - 173325211
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 144581484087325
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 2750909728
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 3116739818

SMART Error Log Version: 1
ATA Error Count: 7 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 7 occurred at disk power-on lifetime: 2054 hours (85 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 04:43:02.112 READ DMA EXT
25 00 00 ff ff ff 4f 00 04:43:02.104 READ DMA EXT
25 00 00 ff ff ff 4f 00 04:43:01.569 READ DMA EXT
25 00 00 ff ff ff 4f 00 04:43:01.561 READ DMA EXT
25 00 00 ff ff ff 4f 00 04:42:57.111 READ DMA EXT

Error 6 occurred at disk power-on lifetime: 1568 hours (65 days + 8 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 38d+04:40:45.399 READ DMA EXT
25 00 40 ff ff ff 4f 00 38d+04:40:45.345 READ DMA EXT
25 00 00 ff ff ff 4f 00 38d+04:40:35.879 READ DMA EXT
25 00 00 ff ff ff 4f 00 38d+04:40:35.865 READ DMA EXT
25 00 00 ff ff ff 4f 00 38d+04:40:35.853 READ DMA EXT

Error 5 occurred at disk power-on lifetime: 1568 hours (65 days + 8 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 38d+04:40:35.879 READ DMA EXT
25 00 00 ff ff ff 4f 00 38d+04:40:35.865 READ DMA EXT
25 00 00 ff ff ff 4f 00 38d+04:40:35.853 READ DMA EXT
25 00 00 ff ff ff 4f 00 38d+04:40:35.840 READ DMA EXT
25 00 00 ff ff ff 4f 00 38d+04:40:35.829 READ DMA EXT

Error 4 occurred at disk power-on lifetime: 1447 hours (60 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 20 ff ff ff 4f 00 33d+03:57:04.936 READ DMA EXT
25 00 00 ff ff ff 4f 00 33d+03:57:00.555 READ DMA EXT
25 00 20 ff ff ff 4f 00 33d+03:57:00.480 READ DMA EXT
25 00 00 ff ff ff 4f 00 33d+03:56:56.524 READ DMA EXT
25 00 20 ff ff ff 4f 00 33d+03:56:56.441 READ DMA EXT

Error 3 occurred at disk power-on lifetime: 1447 hours (60 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 33d+03:57:00.555 READ DMA EXT
25 00 20 ff ff ff 4f 00 33d+03:57:00.480 READ DMA EXT
25 00 00 ff ff ff 4f 00 33d+03:56:56.524 READ DMA EXT
25 00 20 ff ff ff 4f 00 33d+03:56:56.441 READ DMA EXT
25 00 00 ff ff ff 4f 00 33d+03:56:52.184 READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

There are some errors. I can't tell whether they are critical or not.
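The output above notes that no self-tests have been logged, so one possible next step is to queue an extended self-test and read its log afterwards. A rough sketch follows: the device name is the one from the thread, while the run helper and the DRY_RUN switch are hypothetical conveniences and not part of smartmontools. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: queue an extended SMART self-test and read the result later.
DEV="${DEV:-/dev/ad8}"        # device name taken from the thread
DRY_RUN="${DRY_RUN:-1}"       # hypothetical switch: 1 = only print commands

# Either print the command or execute it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run smartctl -t long "$DEV"       # start the extended (long) self-test
run smartctl -l selftest "$DEV"   # once it has finished: show the self-test log
```

With DRY_RUN=0 the commands are executed for real. The drive itself estimates 255 minutes for the extended test, so the log is only worth reading after that time has passed.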

rtyug
#5

Maybe a cable has come loose; power the machine down and reseat the power cable...

Then run fsck; if it happens again, move on to /usr/ports/sysutils/ffs2recov

ffs2recov -a

P.S. Or just run ffs2recov -a right away, but expect it to spend 3-5 days fixing a 100 GB partition.

Andreyka
#6

5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 7

7 Seek_Error_Rate 0x000f 070 060 030 Pre-fail Always - 11585888

Off to the morgue with it.
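For anyone who wants to pull such rows out of smartctl output automatically, here is a small sketch. It reads an attribute table on standard input and prints the watched attributes whose raw value is above zero. The list of watched IDs (5, 187, 197, 198) and the "raw > 0" rule are assumptions made for illustration, not a vendor rule, and the function name flag_attrs is hypothetical. Attribute 7 is left out on purpose, because its raw value is, as far as I know, vendor-encoded on some drives and not a plain counter.

```shell
#!/bin/sh
# Sketch: list SMART attributes whose raw value deserves a closer look.
# Input: the attribute table as printed by smartctl -A <device>.
flag_attrs() {
    awk '
        # Attribute rows start with a numeric ID; column 10 is RAW_VALUE.
        $1 ~ /^[0-9]+$/ && ($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198) {
            if ($10 + 0 > 0) printf "%s %s raw=%s\n", $1, $2, $10
        }
    '
}

# Example with two rows copied from the smartctl output in this thread.
flag_attrs <<'EOF'
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 7
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
EOF
```

On a live system the input would come from a pipe, for example smartctl -A /dev/ad8 | flag_attrs.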

Entities should not be multiplied without necessity.
