ddrescue

Last updated: 2022-10-27

Introduction

HomePage: http://www.gnu.org/software/ddrescue/

If you use the logfile feature of ddrescue, the data is rescued very efficiently (only the needed blocks are read).

You can also interrupt the rescue at any time and resume it later at the same point.

ddrescue does not write zeros to the output when it finds bad sectors in the input,

(so if you decide to ignore the bad sectors, zero those positions in the output once yourself)

and does not truncate the output file if not asked to.

So, every time you run it on the same output file, it tries to fill in the gaps without wiping out the data already rescued.

ddrescue efficiently manages the status of the rescue in progress and tries to rescue the good parts first,

scheduling reads inside bad (or slow) areas for later.

This maximizes the amount of data that can ultimately be recovered from a failing drive.

* If the damaged drive is not listed in /dev, then you cannot rescue it, at least not with ddrescue.

* ddrescue is not a derivative of dd, nor is it related to dd.

* It tries to minimize head movement to limit further damage to the drive.

 


Recovery process (Algorithm)

 

The recovery process has 3 phases in total; at the end only Good ("+") and Bad ("-") blocks remain.

Flow: non-tried -> non-trimmed -> non-scraped -> bad-sector

  • non-tried         # Size of the part of the rescue domain pending to be tried.
  • non-trimmed    # Size of the part of the rescue domain pending to be trimmed.
  • non-scraped     # Size of the part of the rescue domain pending to be scraped.

First phase - Copying

Copying is done in up to 5 passes.

The first pass

It reads the non-tried parts of the input file, marking the failed blocks as non-trimmed and skipping beyond them.

The second pass

It delimits the blocks skipped by the first pass.

The third and fourth passes

They read the blocks skipped due to slow areas (if any) by the first two passes, in the same direction that each block was skipped.

For each block, passes 2 to 4 skip the rest of the block after finding the first error in the block.

The last pass

It is a sweeping pass, with skipping disabled.

The copying direction is reversed after each pass until all the rescue domain is tried.

Only non-tried areas are read in large blocks. Trimming, scraping and retrying are done sector by sector.

Each sector is tried at most two times; the first in this phase as part of a large block read,

the second in one of the phases below as a single sector read.

The purpose of the multiple passes is to delimit large bad areas fast,

recover the most promising areas first, keep the mapfile small,

and produce good starting points for trimming.

Second phase - Trimming

Trimming is done in one pass.

For each non-trimmed block, read forwards one sector at a time from the leading edge of the block until a bad sector is found.

Then read backwards one sector at a time from the trailing edge of the block until a bad sector is found.

Then mark the bad sectors found (if any) as bad-sector, and mark the rest of the block as non-scraped without trying to read it.

Third phase - Scraping

Scrape together the data not recovered by the copying or trimming phases. Scraping is done in one pass.

Each non-scraped block is read forwards, one sector at a time. Any bad sectors found are marked as bad-sector.
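Because ddrescue resumes from the logfile on every run, the three phases can also be run as separate invocations. The following is a minimal sketch, assuming a ddrescue version that supports both '-n' (--no-scrape) and '-N' (--no-trim). The helper names run and staged_rescue and the DRY_RUN variable are hypothetical, not part of ddrescue. By default it only prints the commands.

```shell
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# staged_rescue INFILE OUTFILE LOGFILE  (hypothetical helper)
staged_rescue() {
    run ddrescue -n -N "$1" "$2" "$3"   # phase 1 only: copying
    run ddrescue -n    "$1" "$2" "$3"   # adds phase 2: trimming
    run ddrescue       "$1" "$2" "$3"   # adds phase 3: scraping
}
```

Running "staged_rescue /dev/sdX sdX.img sdX.log" prints the three commands; setting DRY_RUN=0 executes them instead. If the output is a device, '-f' would have to be added to each command.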

 

Contents

  • Disk Read Timeout
  • Install

 


Disk Read Timeout

 

"dd" uses the system's timeout and eh_timeout settings (ddrescue reads through the same kernel I/O path, so they apply to it as well)

timeout

# To control the command timer

/sys/block/<deviceName>/device/timeout                # SCSI default: 30

eh_timeout

# The timeout value for "TEST UNIT READY" and "REQUEST SENSE" commands used by the SCSI error handling code.

/sys/block/<deviceName>/device/eh_timeout           # Default: 10

Modify

e.g.

echo 3 > /sys/block/sdf/device/timeout

echo 3 > /sys/block/sdf/device/eh_timeout
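Since these values stay in effect until they are changed again or the device is re-attached, it can help to note the old values before lowering them. A minimal sketch, assuming the device directory contains the two files described above; the function name set_disk_timeouts is hypothetical.

```shell
# set_disk_timeouts DEVICE_DIR SECONDS  (hypothetical helper)
# Prints the previous values, then writes SECONDS to timeout and eh_timeout.
set_disk_timeouts() {
    dev_dir=$1
    seconds=$2
    for name in timeout eh_timeout; do
        # Show the old value so it can be restored by hand later.
        printf 'old %s=%s\n' "$name" "$(cat "$dev_dir/$name")"
        echo "$seconds" > "$dev_dir/$name"
    done
}
```

Usage (as root): set_disk_timeouts /sys/block/sdf/device 3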

 


Install

 

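apt-get install gddrescue        # on Debian/Ubuntu GNU ddrescue is packaged as gddrescue; the package named ddrescue is the unrelated dd_rescue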

ddrescue -V

GNU ddrescue 1.16

 


Usage

 

ddrescue [options] infile outfile [logfile]

-i <bytes>                   # starting position in input file (Default 0)

-s <bytes>                  # maximum size of input data to be copied

-b <bytes>                  # sector size of input device [Default 512]

-f, --force                     # required when the target (outfile) is a device

-A, --try-again             # mark non-trimmed and non-scraped blocks as non-tried
                                  # Try this if the drive stops responding and ddrescue immediately starts scraping failed blocks when restarted.

-M, --retrim                 # Mark all failed blocks inside the rescue domain as non-trimmed before beginning the rescue.

-p, --preallocate          # preallocate space on disc for output file

Unit

s = sectors, k = 1000, Ki = 1024, M = 10^6, Mi = 2^20
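To check what a size argument works out to in bytes, the multipliers above can be reproduced with shell arithmetic. This is only an illustrative sketch: the function name to_bytes is hypothetical, the G/Gi cases are assumed to follow the same pattern as M/Mi, and a trailing "B" (as in 50MiB) is treated as optional.

```shell
# to_bytes SIZE [SECTOR_SIZE]  (hypothetical helper) -> prints SIZE in bytes
to_bytes() {
    value=${1%B}              # drop an optional trailing "B" (e.g. 50MiB)
    sector_size=${2:-512}     # only used for the "s" suffix
    case "$value" in
        *Ki) echo $(( ${value%Ki} * 1024 )) ;;
        *Mi) echo $(( ${value%Mi} * 1024 * 1024 )) ;;
        *Gi) echo $(( ${value%Gi} * 1024 * 1024 * 1024 )) ;;
        *k)  echo $(( ${value%k} * 1000 )) ;;
        *M)  echo $(( ${value%M} * 1000000 )) ;;
        *G)  echo $(( ${value%G} * 1000000000 )) ;;
        *s)  echo $(( ${value%s} * sector_size )) ;;
        *)   echo "$value" ;;   # plain number of bytes
    esac
}
```

For example, "to_bytes 50MiB" prints 52428800 and "to_bytes 8s 2048" prints 16384.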

 


Example

 

<0> Turn sda3 into an image

ddrescue /dev/sda3 /media/backup/sda3.img ddrescue.log

<1> Disk to Disk Clone

# sda -> sdz

# '-f'         Force overwrite of outfile. Needed when outfile is not a regular file, but a device or partition.
# '-N'        skip the trimming phase
# '-n'        skip the scraping phase

ddrescue -f -n -N /dev/sda /dev/sdz /root/ddrescue.log

# '-d'        use direct disc access for input file

ddrescue -f -d -r 3 /dev/sda /dev/sdz /root/ddrescue.log

# Overwrite the problem blocks (marked '-' in the logfile) with zeros, using fill mode

ddrescue -f --fill=- /dev/zero /dev/sdz /root/ddrescue.log

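<2> Compress it / transfer to remote

# These two examples use dd_rescue (Kurt Garloff's tool, a different program). GNU ddrescue needs a seekable outfile and cannot write to a pipe.

dd_rescue /dev/sda1 - | bzip2 > /dir/file.img.bz2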

dd_rescue /dev/sda1 - | ssh <user>@<host> "cat - > /remote/destination/file.img"

<3> Recovery in two parts

a) Quick initial recovery

     ddrescue -i0 -s50MiB /dev/hdc hdimage logfile

b) Now rescue the rest (does not recopy what is already done).

     ddrescue /dev/hdc hdimage logfile

     ddrescue -d -r3 /dev/hdc hdimage logfile

ipos

    Input position. The position in the input file where data is currently being read from.

tried

    Size of the part of the rescue domain already tried but not yet rescued.
    This is the sum of the sizes of all the non-trimmed, non-scraped, and bad-sector blocks.

 


Rescue some key disc areas

 

Opts

'-i bytes' / '--input-position=bytes'

Starting position of the rescue domain in infile, in bytes. Defaults to 0.  

'-s bytes' / '--size=bytes'

Maximum size of the rescue domain, in bytes. It limits the amount of input data to be copied.

e.g.

# copy 200G starting at byte position 0

ddrescue -d -i 0 -s 200G -r 2 /dev/sde sde.img ddrescue.log

# copy 10GiB starting at position 30GiB

ddrescue -i30GiB -s10GiB /dev/hdc hdimage logfile

 


Direct disc access (-d)

 

If you notice that the positions and sizes in the logfile are ALWAYS multiples of the sector size,

maybe your kernel is caching the disc accesses and grouping them.

In this case you may want to use direct disc access to bypass the kernel cache and rescue more of your data.

NOTE! Sector size must be correctly set with the '--sector-size' option for this to work.

# fast reading first
ddrescue -f -n /dev/hdb1 /dev/hdc1 logfile

# slower than normal cached reading
ddrescue -f -d -r3 /dev/hdb1 /dev/hdc1 logfile

# fix & mount
e2fsck -v -f /dev/hdc1
mount -t ext2 -o ro /dev/hdc1 /mnt

'-r n' '--retry-passes=n'       # Exit after given number of retry passes. Defaults to 0 (-1=infinity)

After switching to direct access (-d), the kernel log changes from

[696330.055065] sd 7:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[696330.055069] sd 7:0:0:0: [sde] tag#0 Sense Key : Medium Error [current]
[696330.055072] sd 7:0:0:0: [sde] tag#0 Add. Sense: Unrecovered read error
[696330.055076] sd 7:0:0:0: [sde] tag#0 CDB: Read(10) 28 00 20 88 22 30 00 00 08 00
[696330.055078] print_req_error: critical medium error, dev sde, sector 545792560
[696330.055089] Buffer I/O error on dev sde, logical block 68224070, async page read
[696336.077706] sd 7:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[696336.077711] sd 7:0:0:0: [sde] tag#0 Sense Key : Medium Error [current]
[696336.077714] sd 7:0:0:0: [sde] tag#0 Add. Sense: Unrecovered read error
[696336.077717] sd 7:0:0:0: [sde] tag#0 CDB: Read(10) 28 00 20 88 22 30 00 00 08 00
[696336.077720] print_req_error: critical medium error, dev sde, sector 545792560
[696336.077731] Buffer I/O error on dev sde, logical block 68224070, async page read

to this (the "Buffer I/O error" line and the duplicated attempt are gone)

[696419.260665] sd 7:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[696419.260669] sd 7:0:0:0: [sde] tag#0 Sense Key : Medium Error [current]
[696419.260672] sd 7:0:0:0: [sde] tag#0 Add. Sense: Unrecovered read error
[696419.260675] sd 7:0:0:0: [sde] tag#0 CDB: Read(10) 28 00 20 8a 97 00 00 00 80 00
[696419.260678] print_req_error: critical medium error, dev sde, sector 545953536
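The sector numbers in these kernel messages can be turned into byte positions, for example to pick a value for '-i'. A minimal sketch, assuming the kernel reports sectors in 512-byte units and that the messages contain "critical medium error, dev ..., sector N" as in the log above; the function name bad_sector_offsets is hypothetical.

```shell
# bad_sector_offsets  (hypothetical helper)
# Reads kernel log lines on stdin and prints each distinct failing sector
# together with its byte offset on the device named in the log.
bad_sector_offsets() {
    sed -n 's/.*critical medium error, dev [^,]*, sector \([0-9][0-9]*\).*/\1/p' |
    sort -un |
    while read -r sector; do
        echo "sector=$sector byte_offset=$((sector * 512))"
    done
}
```

Usage: dmesg | bad_sector_offsets

The offset is relative to the device named in the message (sde above), not to a partition on it.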

 

 


Re-try & Skip

 

ddrescue -A --retrim -f -i0 -s300GiB  /dev/sdc /dev/sdd /home/logfile

'-A'   ('--try-again')

Mark all non-trimmed and non-scraped blocks inside the rescue domain as non-tried before beginning the rescue.

Try this if the drive stops responding and ddrescue immediately starts scraping failed blocks when restarted.

If '--retrim' is also specified, mark all failed blocks inside the rescue domain as non-tried.

'--retrim'

Mark all failed blocks inside the rescue domain as non-trimmed before beginning the rescue.

Skip

-n, --no-scrape                                # skip the scraping phase

-N, --no-trim                                   # skip the trimming phase

 

 


Logfile structure

 

Example:

# Mapfile. Created by GNU ddrescue version 1.23
# Command line: ddrescue -A -n -d /dev/sdf3 dsk2.img dsk2.log
# Start time:   2023-11-03 17:09:35
# Current time: 2023-11-03 17:14:41
# Copying non-tried blocks... Pass 1 (forwards)
# current_pos  current_status  current_pass
0x368A14BA00     ?               1
#      pos        size  status
0x00000000  0x00200000  +
0x00200000  0x01CA0000  -

The first non-comment line is the status line.
(The status line allows ddrescue to resume the copying phase instead of restarting it from pass 1)

  • The first integer is the position being tried in the input file.
  • Status character
  • Current pass in the current phase.

Status character

Character     Meaning

  • '?'     copying non-tried blocks
  • '*'     trimming non-trimmed blocks
  • '/'     scraping non-scraped blocks
  • '-'     retrying bad sectors
  • 'F'     filling specified blocks
  • 'G'     generating approximate logfile
  • '+'     finished

The blocks in the list of data blocks

Character     Meaning

  • '?'     # non-tried block
  • '*'     # failed block non-trimmed
  • '/'     # failed block non-scraped
  • '-'     # failed block bad-sectors (bad blocks)
  • '+'    # finished blocks (good sectors)
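Given this structure, the totals per status can be summed straight from the logfile. The following is a rough sketch, not a replacement for ddrescue's own status display; the function name mapfile_summary is hypothetical, and it assumes the layout shown above (comment lines start with '#', the first non-comment line is the status line, every later line is "pos size status").

```shell
# mapfile_summary LOGFILE  (hypothetical helper)
# Sums block sizes by status: '+' rescued, '-' bad-sector, anything else pending.
mapfile_summary() {
    good=0; bad=0; pending=0
    seen_status_line=0
    while read -r pos size st rest; do
        case "$pos" in
            ''|'#'*) continue ;;            # skip blank and comment lines
        esac
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1              # first non-comment line is the status line
            continue
        fi
        case "$st" in
            '+') good=$((good + $size)) ;;
            '-') bad=$((bad + $size)) ;;
            *)   pending=$((pending + $size)) ;;   # '?', '*' or '/'
        esac
    done < "$1"
    echo "rescued=$good bad-sector=$bad pending=$pending"
}
```

For the example logfile above this prints: rescued=2097152 bad-sector=30015488 pending=0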

 


Example: Recovery CD

 

# recovery

ddrescue -n -b2048 /dev/sr0 cd.iso mapfile

ddrescue -d -r 3 -b2048 /dev/sr0 cd.iso mapfile

remark

-b # sector size of input device [default 512]

For a CD, always use -b2048

If errsize is 0, the CD has been fully rescued.

# Identifying Discs (iso image)

apt-get install genisoimage

isoinfo -d -i /dev/sr0

  • -d                     # Print information from the primary volume descriptor (PVD)  of  the  iso9660  image.
  • -i iso_image

# recover the content from the CD image

mount -o ro,loop -t iso9660 image.iso /mnt/mountpoint

 


Fill Mode (--fill)

 

If you use the "--fill" option, ddrescue does not rescue anything.

Instead it fills the blocks of the given types (?*/-+l) with data read from infile.

Note that in fill mode the input file is always read from position 0.

In fill mode the input file may have any size.

If it is too small, the data will be duplicated as many times as necessary to fill the input buffer.

If it is too big, only the needed data will be read.

e.g.

# fill all areas marked as ‘-’ (bad hardware blocks) with copies of the string stored in tmpfile ("BaDB1K~!")

echo -n "BaDB1K~!" > tmpfile

ddrescue --fill='-' tmpfile sde.img ddrescue.log
ddrescue --fill='*' tmpfile sde.img ddrescue.log
ddrescue --fill='/' tmpfile sde.img ddrescue.log
ddrescue --fill='?' tmpfile sde.img ddrescue.log

Find the corrupt files

1) Copy the damaged drive with ddrescue until finished.

    Do not use sparse writes. This yields a logfile with only finished (‘+’) and bad (‘-’) areas.

    -S, --sparse               # use sparse writes for output file

2) Mount the copied drive (or the image file, via loopback device).

3) Compute a md5sum or other checksum for every file.

    Build a list of all the files and their checksums.

4) Fill the bad areas of the copied drive or image file with a byte value different from zero.

5) Verify the checksums. Those files which have different checksums this time reside (at least partially) in damaged disk areas.
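Steps 3 and 5 can be sketched as two small helpers: one builds the checksum list, the other compares the list taken before filling with the one taken after. This is a minimal sketch, assuming md5sum, find and comm are available; the function names checksum_list and changed_files are hypothetical.

```shell
# checksum_list DIR  (hypothetical helper)
# Prints one "md5  ./path" line for every file under DIR, in sorted order.
checksum_list() {
    ( cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort )
}

# changed_files BEFORE AFTER  (hypothetical helper)
# Prints the paths whose checksum line in AFTER does not appear in BEFORE.
changed_files() {
    LC_ALL=C comm -13 "$1" "$2" | sed 's/^[0-9a-f]*  //'
}
```

Usage: run "checksum_list /mnt > before.md5" in step 3, fill the bad areas in step 4, run "checksum_list /mnt > after.md5", then "changed_files before.md5 after.md5" lists the files that sit (at least partially) in damaged areas.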

 

 
