Reconstructing heavily damaged hard drives

[EDIT: Hey guys, thanks for the feedback! Someone over at virtuallyhyper.com has an awesome write-up that deals with SD cards specifically (but is highly relevant to hard drives too), with a set of much improved and updated scripts. I’d strongly recommend taking a look: Recover files from an SD card using Linux utilities]

Recover data even when an NTFS (or other) filesystem won’t mount due to partial hard drive failure.

This was born when someone brought me a near dead hard drive (a serious number of read errors, so bad that nothing could mount or fix the filesystem), asking if I could recover any data.

Now obviously (as almost any geek would know), the answer is a very likely yes. There are many ways of recovering data. One such way (which I tried first) is using Foremost to find files based on their headers and structures. While this technique works quite well, it misses a lot of files, splits others into fragments, leaves bits out and generally does not retrieve any metadata (such as filenames).

This makes Matt mad. No filenames == days of renaming files.

So I booted up Helix, created a quick image of the drive to a 500GB external drive, and tried running Autopsy (the GUI of Sleuthkit). This is where things got interesting.
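The imaging step can be done with dd. The following is only a minimal sketch, assuming GNU dd; the function name and the example paths in the comments are placeholders, not the exact command used here. The "noerror" conversion tells dd to carry on after read errors, and "sync" pads failed or short reads with zeros so that later data stays at the right offset.

```shell
#!/bin/sh
# Sketch: copy a failing device (or any file) to an image with dd,
# continuing past read errors. image_drive is a hypothetical helper name.
image_drive() {
    src=$1   # placeholder, e.g. the damaged partition
    dst=$2   # placeholder, e.g. an image file on an external drive
    # conv=noerror : do not abort when a read fails
    # conv=sync    : pad failed or short reads with zeros to keep offsets aligned
    # bs=4096      : a small block size limits how much is zero-filled per bad read
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}
```

Note that with "sync" the image is padded up to a multiple of the block size, so it can be slightly larger than the source. Once the image exists, all further recovery attempts can work on the copy rather than the dying drive.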

I say interesting, because Sleuthkit couldn’t read the filesystem. But it could retrieve the inodes, and the metadata along with them. And it could accordingly retrieve the data content of (some) files.

Observing this, I realized there was a high probability that I could somehow use Sleuthkit’s command line tools to retrieve the files which were not on bad clusters and recover the filenames from the inode. As it turns out, this wasn’t such a bad idea!

There are 3 tools which proved useful:

  • ils
  • ffind
  • icat

ils “lists inode information” from the image, ffind “finds the name of the file or directory using the given inode” and icat “outputs the content of the file based on its inode number”. Using these three tools and a bit of shell scripting, we can grab a list of inodes and then, for each one, get the filename from the metadata, create the directory structure beneath it, extract the file content and move on to the next.

So for this task I knocked up the following (really ugly, potentially unsafe) script:

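#!/bin/sh
# Location of the Sleuthkit tools on the Helix live CD, the source
# device (or image) and the directory that receives recovered files.
TSK=/KNOPPIX/usr/local/sleuthkit-2.09/bin
IMAGE=/dev/hda1
OUT=/mnt/out

# /tmp/inodes holds one inode number per line.
while read -r inode ; do
	# Look the name up once; skip inodes ffind cannot resolve.
	INODEDIR=$("$TSK/ffind" "$IMAGE" "$inode") || continue
	echo "INODE: $inode"

	REALDIR="$OUT$(dirname "$INODEDIR")"
	FILENAME="$OUT$INODEDIR"
	mkdir -p "$REALDIR"

	echo "FILENAME: $FILENAME"
	"$TSK/icat" "$IMAGE" "$inode" > "$FILENAME"

	# Crude guess: if du reports exactly 1 block, treat it as a directory.
	# "=" rather than "==" keeps the test valid in a plain POSIX sh.
	if [ "$(du "$FILENAME" | awk '{print $1}')" = 1 ]
	then
		rm "$FILENAME"
		mkdir -p "$FILENAME"
	fi
	echo ""
done < /tmp/inodes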

Really, I do warn you, take serious care running this!

It needs a lot of work, but enough is there for it to function. It reads a file of inode numbers (one per line) and uses ffind to get the filename. We extract the path, attempt to create it, output the file content and (this is important) take a wild guess at whether the inode was a directory. Please note this is wildly inaccurate and needs serious rethinking! Currently we look at the size reported by du (which counts disk blocks, not bytes) and assume that only directories use exactly 1 block.
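One possible way to rethink the directory guess is to look at the inode's mode bits instead of the size of the extracted file. The sketch below only shows the bit test itself. It assumes you can get a Unix-style octal mode for each inode that includes the file-type bits (for example from a mode column in the ils output, if your version prints one; check the header line first). The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: decide whether a mode value describes a directory.
# Assumption: $1 is an octal st_mode string that includes the file-type
# bits, e.g. 40755 for a directory or 100644 for a regular file.
is_dir_mode() {
    mode=$1
    # Keep only the file-type bits (S_IFMT = 0170000) and compare them
    # with the directory type (S_IFDIR = 0040000). The leading 0 makes
    # the shell read the value as octal.
    [ $(( 0$mode & 0170000 )) -eq $(( 0040000 )) ]
}
```

With something like this, the script could create a directory when the test succeeds and only run icat for everything else, rather than extracting first and guessing afterwards.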

We can populate a file with inode numbers like so:

ils -a /dev/hda1 | awk -F '|' '{print $1}' > /tmp/inodes

(Users of Helix will need to use the full pathname to ils as in the above script).
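Depending on the version, ils may print a few header lines before the inode records, and those would end up in the inode list too. A hedged variant of the awk step keeps only lines whose first field is a plain number. The helper name and the sample header lines used to try it out are assumptions for illustration, not captured output.

```shell
#!/bin/sh
# Sketch: read ils-style '|'-separated records on stdin and print only
# the first field of lines where that field is purely numeric, so that
# header lines are dropped. extract_inodes is a hypothetical helper name.
extract_inodes() {
    awk -F '|' '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Possible usage (paths as in the article):
#   ils -a /dev/hda1 | extract_inodes > /tmp/inodes
```

This does not change which inodes are listed; it only stops non-numeric lines from being passed to ffind and icat later.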

At some point (no guarantees when) I’ll tidy up the script and make it more bulletproof. In the meantime, I hope this saves some data!

Remember: no matter how much data you have, it’s always better to have two mirrored hard drives of half the size than one large, expensive drive. Drives will die unexpectedly! When you next buy a bigger hard drive, consider this: a single 500GB drive that dies will lose you 500GB of data, whereas a mirrored pair of 250GB drives will, with 99.9% probability, lose you nothing. So if you’re on a tight budget, buy two smaller drives. If you’ve a lot of money, buy two big ones.

Oh, and always make regular backups. Cheap USB drives are good for this!

20 thoughts on “Reconstructing heavily damaged hard drives”

  1. Hey!

    I’ve really hosed myself, and I’m hoping to use your little script. I don’t suppose you’ve improved it in 2 years?

    Thanks!

  2. I’m afraid I haven’t.

    However, if you have partial success and need some help, I’m happy to give you any advice or assistance I can.

    What’s the situation? Filesystem, deleted files or physical failure? Reformatted?

  3. Hi. I have a friend who I am trying to help out. He has a Windows Vista PC that was getting BSODs, so he took it to one of those computer geeky companies. Well, I have used UBCD4WIN to look at his directory structure and it is gone! No Users folder, no Documents and Settings files, no Windows folder. It has (from memory, since I am at work and the PC is at home) a Win-PE folder. On his new computer, they have a folder called RAW Files. Inside are many folders like Word Documents, Excel Documents, Power Point Documents, JPEG Files, etc. Inside these folders you see file1.jpg, file2.jpg, file3.jpg, etc. You get my drift and you can probably imagine what they did. Of course, his QuickBooks files are not there, since there is no QIF Files folder or Quickbook Files folder. What is your recommendation to start with? The hard drive seems to be in good shape and I ran a memory scan which turned up no problems.

    Thanks,

    Fil

    1. Sorry to hear that mate, sounds like they really stuffed you up. If you’ve not got access to a Linux workstation, I’d very much recommend starting off by getting one. A large USB thumb drive or external HD is an easy solution; I’d recommend Ubuntu. Next, make sure you get a byte-level copy of your friend’s drive, using the command “dd”. I’m not at a computer right now, I’ll fill you in with the details tonight. From there you can try using some of the techniques outlined in the blog post, without risking making things worse!

  4. Hi, Wally!

    I must tell you that, in 10 days of intense Google searching, this is the best article I have found about how to recover files (with their real names) in Linux.

    My only doubt is this: how for God’s sake did you manage to retrieve all the inodes (I mean: a list of inodes) from a disk image with Sleuthkit?

    Your help will be most appreciated!

    1. Well, thanks!

      It’s been a while – well, several years, since I last used the software.

      I believe the way I retrieved the inodes is using the last command in the post:

      ils -a /dev/hda1 | awk -F '|' '{print $1}' > /tmp/inodes

      YMMV, as Sleuthkit has almost certainly been updated since I used it …

      The “ils” manual is available online – I quote from that document:

      -a
      List only allocated inodes: these belong to files with at least one directory entry in the file system, and to removed files that are still open or executing.

      I’m using awk just to grab the first column of the ils output (the inode number).

  5. Wally, sorry to bother with something so simple. In the code above, “/dev/hda1” is the mounted image you’ve cloned from your damaged partition, isn’t it? And “/mnt/out” is the folder you’ve created to place the data from the image, is that correct?

  6. Do you know anything else I can do?

    – – –

    While trying to identify the inodes, I got this:
    “Cannot determine file system type”

    Well the filesystem in the case is ext4. So I put “-i ext4” to the mix and got this:
    “Unsupported image type: ext4”

    A little searching finished breaking my heart:
    Ext4 is the modern replacement for Ext3, which is beginning to appear as the default install option for many Linux distributions. Currently, the Sleuth Kit does not have proper Ext4 support, but some tasks do work. For example, running ils against a specific inode will still return expected results, but fls will exhibit inconsistent behavior. This is because the metadata structures have remained consistent with those found in an Ext2/Ext3 file system, but the data unit layer has changed quite dramatically.

  7. In the code above, “/dev/hda1” is the mounted image you’ve cloned from your damaged partition, isn’t it?

    Yes … although that’s altered from what I actually did. I used dd to copy the disk (/dev/xxx) byte by byte to an image on my local drive. dd provides some nice options like “noerror”, which forces copying to continue even on I/O problems (cf. the dd manual page). That way I didn’t risk damaging the disk further by forcing it to spin etc!

    And “/mnt/out” is the folder you’ve created to place the data from the image, is that correct?

    Yes.

    Well the filesystem in the case is ext4. So I put “-i ext4” to the mix and got this:
    “Unsupported image type: ext4”

    Mate, I’m really sorry. Sounds like ext4 (which if you’re not aware, is a relatively recent FS) isn’t supported – and probably won’t be for a while. It’s partially for this reason I’m sticking with ext3 for the time being …

    Do you know anything else I can do?

    Yes. Don’t give up 😉 It took me 3 days to get anything off the hard-drive I looked at!

    Keep an eye out for new Sleuthkit releases … it’s as always only a matter of time …

    Other people also suggest this:
    http://www.cgsecurity.org/wiki/TestDisk

    As always, YMMV 🙂

    Let me know if you have any success, I’d be delighted to know!

  8. Wally, great news! I wrote to the list and just take a look at the reply I received:

    – – –
    Hey Rogério,

    A few months ago, I wrote up a set of patches that bring basic Ext4 support to TSK. Specifically, it updates the inode structures (creation timestamps, yay!) and adds support for extents. You (and all others) are welcomed to give it a shot. You can download it here: http://www.williballenthin.com/ext4/TSK-Ext4.patch (MD5: 3ba1d239bdc6048f0aeeba0447315346).

    You’ll need to patch TSK version 3.2.1 source, and then build it. However, with this done, the standard metadata and content layer tools worked against my test Ext4 images. I will admit, however, that my testing was not very thorough, so your mileage may vary.

    I have submitted the patches to Brian; however, I suspect that since I have not fully implemented all the new features of Ext4 (huge files, journaling changes, …), it has not been included in the most recent update.

    Please let me know if you have any questions. I’ll try to provide support as best I can, and as time allows 😉

    Willi Ballenthin
    – – –

    I’ll give it a try and post the results, okay?

  9. By the way, he says: “I know the latest version is newer than what I patched against. I haven’t had time to update the patch since 3.2.2. I think the only bug fixes were against non-Ext file systems, so the minor update shouldn’t add anything relevant to your situation. You have to use 3.2.1”.

  10. If this helps anybody:

    Sleuthkit 3.2.1 on Ubuntu 11.04

    sudo apt-get install libewf1 libewf-dev zlib1g-dev build-essential libexpat1-dev libfuse2 libfuse-dev fuse-utils gvfs-fuse libncurses5-dev libreadline-dev uuid-dev libssl-dev

    Download and extract afflib from http://afflib.org/ and run:
    ./configure
    make
    sudo make install

    Download and extract Sleuthkit from http://www.sleuthkit.org/sleuthkit/download.php

    Download and extract http://www.williballenthin.com/ext4/TSK-Ext4.patch

    Apply the patch (sleuthkit-3.2.1 and TSK-Ext4.patch are in the same folder)

    cd sleuthkit-3.2.1
    patch -p1 < ../TSK-Ext4.patch

    After that run:
    ./configure
    make
    sudo make install

    1. Mighty fine. That’s typically awesome open-source software for you!

      (Imagine trying to get that level of help for say Norton Ghost!)

      Let me know if you need any help building with the patch etc … doesn’t sound like it should be too much pain.

  11. For broken hard disks (and for non-broken ones too), also take a look at GNU ddrescue: http://www.gnu.org/s/ddrescue/ddrescue.html , which can first copy everything that is still readable in one go, and then try harder on the problematic regions. It was still able to rescue about half of my already dying disk.

    Don’t confuse it with the similarly named dd_rescue, a separate tool which offers far fewer capabilities.

    Also, when using ddrescue, do *not* write the image to an NTFS-formatted disk using the ntfs-3g driver: it’ll go as slow as hell, and probably wear out your disk even faster. This is not the fault of ddrescue, just the fact that it seeks in the output file and writes small amounts at a time.
