August 29, 2011

Saving an academic career or "How I secured data from a failed external HDD"

Two weeks ago, on a Friday afternoon, I received an emergency call from a friend. Her sister had stored all of her university work on an external hard drive. The rest of the story is predictable from here: of course the data was stored nowhere else (the drive SHOULD have been only a backup drive but had grown into the main working device), and suddenly, from one second to the next, the drive was no longer accessible. On top of that, it emitted a clicking sound every few seconds. Several attempts to get the drive running again had failed. My friend had already checked the most obvious causes, like dirty contacts or loose connections, without success, so they called me, already fearing a head crash.

When the drive arrived, the situation was as follows: an external, almost brand-new, flat-lying 1TB hard disk with USB and eSATA connectors. To limit further damage while it was running, I immediately fixed it in a vertical position using a stand from another external disk I had lying around. My reasoning was that if the platter surfaces had indeed been damaged, standing the drive upright would reduce the bouncing of loose particles somewhat (because some of them would collect on the lower side of the enclosure) and slow down the degradation to some degree. A very quick test from within Windows showed that the drive tried to register with the system but failed. So, no more tinkering there; instead I quickly started up a Linux system for the recovery.

I booted SystemRescueCd, which I keep installed on a USB stick (using SARDU) for situations like this. Connecting the drive via eSATA failed because the drive didn't show up in /dev/, so I had to fall back to the slower USB connection for all following steps. Over USB it took some time (about 30-60 s) until the device showed up in /dev, but then it was reasonably accessible. The first thing I checked was the SMART information using smartctl -a /dev/sdd, which made it pretty obvious that the drive was badly damaged: about 100 reallocated sectors and a handful of pending reallocations. Very strong signs of a head crash indeed, so there was no time to waste: I had to get as much data off the disk as possible.
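I read the SMART values by eye. The following is a hedged sketch of doing the same check in Python: a small helper that pulls the two counters that matter here (Reallocated_Sector_Ct and Current_Pending_Sector) out of the attribute table printed by smartctl -A. It assumes the usual ATA attribute table layout with RAW_VALUE as the tenth column. The sample text is made up to roughly mirror the numbers above.

```python
from typing import Dict

# Assumption: standard ATA attribute table layout from `smartctl -A`.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_raw_values(smartctl_text: str) -> Dict[str, int]:
    """Return {attribute_name: raw_value} for the watched attributes."""
    values: Dict[str, int] = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            values[fields[1]] = int(fields[9])  # RAW_VALUE column
    return values


# Made-up sample output, not taken from the actual drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       100
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

if __name__ == "__main__":
    print(parse_raw_values(SAMPLE))
```

You could feed this the output of smartctl -A /dev/sdd in a loop (for example via subprocess.run). That would be one way to automate the periodic checks I later did by hand in a second terminal.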

Mounting the disk failed, so I could not simply copy the files off. Instead, I had to make a complete image first and work with that later, without the failing drive. At this point another problem struck: I had nowhere to store a 1TB image file. At most I could free up 600GiB on a Linux drive.

Another phone call confirmed that the disk should contain an NTFS filesystem holding about 200GiB of data. The drive was relatively new and had not seen much activity beyond storing and occasionally updating files. So I hoped for large areas that had never been written and were still zeroed, which would take up almost no space in the image. A quick check with hexedit /dev/sdd confirmed my guess: there were large zeroed-out areas towards the end of the disk. This check took a while because I apparently hit erroneous areas right at the beginning of the disk, where the tool stalled until the read-error timeout kicked in.
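I did that check by scrolling through hexedit. Here is a minimal Python sketch of the same idea, assuming 64 KiB blocks are fine-grained enough: it estimates how much of a device or file consists of all-zero blocks. It deliberately does not handle read errors. On the failing drive, a bad sector would raise OSError here, which is one reason the actual copy was left to ddrescue.

```python
import io
from typing import BinaryIO

BLOCK_SIZE = 64 * 1024  # assumption: 64 KiB blocks are fine-grained enough


def zero_fraction(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> float:
    """Return the fraction of bytes that sit in blocks containing only zeros."""
    zero_bytes = total_bytes = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        total_bytes += len(block)
        if block == bytes(len(block)):  # compare against an all-zero block
            zero_bytes += len(block)
    return zero_bytes / total_bytes if total_bytes else 0.0


if __name__ == "__main__":
    # Synthetic "disk": 1 MiB of data followed by 3 MiB of zeros.
    fake_disk = io.BytesIO(b"\xab" * (1 << 20) + bytes(3 << 20))
    print(f"{zero_fraction(fake_disk):.0%} zeroed")  # 75% zeroed
```

On a real device you would open /dev/sdd in binary mode (as root). You would probably also sample only part of it, since reading a full 1TB takes hours.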

The Linux filesystem ext3 supports sparse files, in which zeroed regions (holes) are not actually allocated on disk. So I hoped that the 1TB image file would still fit into my 600GiB of free space.
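To see the effect directly, the following Python sketch creates a file with a large hole and compares its apparent size with the space actually allocated. It assumes Linux, a filesystem with sparse-file support such as ext3, and st_blocks being reported in 512-byte units. The file name and the 64 MiB size are arbitrary.

```python
import os
import tempfile

APPARENT_SIZE = 64 * 1024 * 1024  # 64 MiB apparent size, almost all of it a hole


def make_sparse_file(path: str, size: int) -> None:
    """Write a small header and footer, leaving an unwritten hole in between."""
    with open(path, "wb") as f:
        f.write(b"header")
        f.seek(size - len(b"footer"))  # seeking past the end creates a hole
        f.write(b"footer")


def allocated_bytes(path: str) -> int:
    """Bytes actually allocated on disk (st_blocks counts 512-byte units)."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.img")
        make_sparse_file(path, APPARENT_SIZE)
        print("apparent size:", os.path.getsize(path))
        print("allocated:    ", allocated_bytes(path))
```

Reading the hole returns zeros, so the file's contents are indistinguishable from a fully written one. Only the allocated size differs.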

A simple copy of /dev/sdd (with cp or dd) would fail because of the errors on the disk. Luckily, there are tools that rescue the readable areas first and then try to recover the failing ones. I chose ddrescue for this job because it has a built-in switch for creating sparse target images, which saved me from creating one manually. I loosely followed the instructions from the Forensics Wiki. I made a first pass over the disk without retrying failing sectors, to save as much of the intact data as possible (-d uses direct disc access, -S writes a sparse output file, -n skips splitting the failed areas):
ddrescue -d -S -n /dev/sdd disksddsparse logfile
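To illustrate what such a first pass does, here is a simplified Python sketch. It copies every block that reads cleanly, skips unreadable blocks without retrying, leaves all-zero blocks unwritten so the output becomes sparse, and records the failed ranges for later passes. This is not ddrescue's actual algorithm. read_block is a hypothetical callback standing in for reads from the failing device and is assumed to raise OSError on a bad block. The fixed 4 KiB block size is also an assumption.

```python
import tempfile
from typing import BinaryIO, Callable, List, Tuple

BLOCK = 4096  # assumption: fixed 4 KiB blocks for the sketch


def first_pass(read_block: Callable[[int, int], bytes], disk_size: int,
               out: BinaryIO, block: int = BLOCK) -> List[Tuple[int, int]]:
    """Copy readable blocks into `out`, skip unreadable ones without retrying,
    and return the (offset, length) pairs of the blocks that failed."""
    bad: List[Tuple[int, int]] = []
    for pos in range(0, disk_size, block):
        length = min(block, disk_size - pos)
        try:
            data = read_block(pos, length)
        except OSError:
            bad.append((pos, length))  # remember it for a later pass, no retry now
            continue
        if data != bytes(length):
            out.seek(pos)
            out.write(data)
        # All-zero blocks are not written at all, which leaves a hole (sparse file).
    out.truncate(disk_size)  # extend the image to the full disk size
    return bad


if __name__ == "__main__":
    # Fake 16 KiB "disk": block 1 is unreadable, block 2 is all zeros.
    disk = bytes(range(256)) * 64
    disk = disk[:8192] + bytes(4096) + disk[12288:]

    def fake_read(pos: int, length: int) -> bytes:
        if pos == 4096:
            raise OSError("simulated read error")
        return disk[pos:pos + length]

    with tempfile.TemporaryFile() as image:
        print(first_pass(fake_read, len(disk), image))  # [(4096, 4096)]
```

The list of bad ranges plays roughly the role of ddrescue's logfile. It is what allows a later pass to return to exactly those spots.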

This first run took many hours. It meant transferring 1TB over USB at 30MB/s at best (roughly 9 hours even at full speed), and almost nothing when hitting defective sectors. Thanks to the logfile (the last parameter), I was able to interrupt the process overnight, as I didn't want to let it run unattended for too long. During the copy I checked the SMART information from time to time in a second terminal. It showed that either the disk was degrading by the minute or its firmware was simply registering previously undetected errors as they were hit. But the further the initial rescue progressed, the larger the intervals between errors became, which raised my hopes. In the end, the first run finished with the full 1TB image stored on my disk (taking only ~250GiB thanks to the sparse option), with about 130MiB of errors scattered across ~1100 locations. Not bad, but there was surely more to gain, so on to the second run.
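The error figures above come from ddrescue's own status display. As a rough cross-check, you can also summarise the logfile itself. This Python sketch assumes the GNU ddrescue logfile layout: comment lines starting with #, one current-position line, then "pos size status" lines in hex. It treats blocks marked * (non-trimmed), / (non-split) or - (bad sector) as failed. Its count may differ slightly from ddrescue's own figure, and the sample logfile is invented.

```python
from typing import Tuple

FAILED = {"*", "/", "-"}  # non-trimmed, non-split, bad sector(s)


def summarize(logfile_text: str) -> Tuple[int, int]:
    """Return (number_of_failed_blocks, total_failed_bytes)."""
    rows = [line.split() for line in logfile_text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    count = failed_bytes = 0
    # rows[0] is the "current_pos current_status" line; block lines follow.
    for fields in rows[1:]:
        size, status = int(fields[1], 0), fields[2]
        if status in FAILED:
            count += 1
            failed_bytes += size
    return count, failed_bytes


# Invented sample logfile, not the one from this rescue.
SAMPLE = """\
# Rescue Logfile. Created by GNU ddrescue version 1.14
# current_pos  current_status
0x00020000     +
#      pos        size  status
0x00000000  0x00010000  +
0x00010000  0x00000200  -
0x00010200  0x00000E00  +
0x00011000  0x00001000  *
0x00012000  0x0000E000  +
"""

if __name__ == "__main__":
    count, failed = summarize(SAMPLE)
    print(f"{count} failed blocks, {failed} bytes")  # 2 failed blocks, 4608 bytes
```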

In the second run, I let ddrescue look more closely at the erroneous spots and narrow down the exact location of the defects within each failed area. The goal was to recover every byte that is not actually affected. These steps are called splitting and trimming of the failed areas:
ddrescue -d -S /dev/sdd disksddsparse logfile
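The trimming idea can be illustrated with a small Python sketch. Starting from a failed area, it reads sector by sector forward from the leading edge and backward from the trailing edge until a bad sector is hit. Only the core in between is left for further work. This is a simplified illustration, not ddrescue's implementation. read_sector is a hypothetical callback that reports whether a single sector can be read.

```python
from typing import Callable, List, Optional, Tuple


def trim(first: int, last: int, read_sector: Callable[[int], bool]
         ) -> Tuple[List[int], Optional[Tuple[int, int]]]:
    """Trim the failed sector range [first, last] from both edges.

    Returns the sectors rescued and the remaining (first_bad, last_bad) core,
    or None if every sector turned out to be readable.
    """
    rescued: List[int] = []
    lo, hi = first, last
    # Leading edge: read forward until the first unreadable sector.
    while lo <= hi and read_sector(lo):
        rescued.append(lo)
        lo += 1
    # Trailing edge: read backward until an unreadable sector. `lo` is already
    # known to be bad here (or the range is exhausted), so stop just above it.
    while hi > lo and read_sector(hi):
        rescued.append(hi)
        hi -= 1
    core = (lo, hi) if lo <= hi else None
    return sorted(rescued), core


if __name__ == "__main__":
    bad = {103, 106}  # made-up bad sectors inside the failed area 100..109
    print(trim(100, 109, lambda s: s not in bad))
    # ([100, 101, 102, 107, 108, 109], (103, 106))
```

In this example, sectors 104 and 105 stay inside the core even though they are readable. In ddrescue's terms, they would be left for the later splitting step.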

This run finished faster because it only revisited the failed areas, but it still took several hours. It was quite successful: it reduced the number of error locations to ~904 and the affected data to 512KiB. Wow. I wondered whether there was more to squeeze out, so I had ddrescue retrim the failed areas and retry them automatically with no retry limit:
ddrescue -d -S --retrim --max-retries=-1 /dev/sdd disksddsparse logfile

Again I let this run for some hours. Once it seemed to make only minimal progress (somewhere around the 5th automatic retry), it was down to 859 errors totalling 490KiB. So, in the end, the outcome of the rescue operation looked quite promising. For the curious: the smartctl statistics were completely off the charts, with about 900 reallocated sectors and 1300 pending ones, plus big fat letters telling me "FAILING NOW"...

The last step was to mount the partition inside the disk image. I found the partition offset by comparing the output of the following two hexedit calls and locating the start of the second one within the first (luckily, Linux could still detect the partition itself):
hexedit /dev/sdd
hexedit /dev/sdd1
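The same comparison can be sketched in Python: search the whole-disk image for the partition's first sector at sector-aligned positions. This assumes 512-byte sectors and an exact match of the partition's first sector, which an NTFS boot sector normally allows. The synthetic image below is made up for illustration.

```python
import io
from typing import BinaryIO, Optional

SECTOR = 512  # assumption: 512-byte sectors


def find_partition_offset(disk: BinaryIO, partition_first_sector: bytes,
                          max_sectors: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the first sector-aligned exact match, or None."""
    for lba in range(max_sectors):
        disk.seek(lba * SECTOR)
        sector = disk.read(SECTOR)
        if len(sector) < SECTOR:
            return None  # ran off the end of the image
        if sector == partition_first_sector:
            return lba * SECTOR
    return None


if __name__ == "__main__":
    # Synthetic image: 63 empty sectors, then a fake NTFS boot sector
    # (jump instruction EB 52 90 followed by the OEM ID "NTFS    ").
    boot = b"\xeb\x52\x90NTFS    " + bytes(SECTOR - 11)
    image = io.BytesIO(bytes(63 * SECTOR) + boot + bytes(SECTOR))
    print(hex(find_partition_offset(image, boot)))  # 0x7e00
```

In practice, partition_first_sector would be the first 512 bytes read from /dev/sdd1, and disk would be the image file opened in binary mode.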

If that hadn't been possible, I would have calculated the partition offset using one of the guides on the internet here (German) or here. After that, I could mount the partition (0x7e00 = 32256 bytes, i.e. sector 63 at 512 bytes per sector) using...
mount disksddsparse /mnt/image -o ro,loop,offset=0x7e00

... and began copying the files out of the partition. There were some filename encoding issues and warnings during the copy. I finally resolved them by mounting with an explicitly specified character set:
mount disksddsparse /mnt/image -o ro,loop,offset=0x7e00,iocharset=utf8
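Had Linux not detected the partition, the offset could have been calculated from the MBR partition table, as the guides mentioned above describe. Below is a hedged Python sketch, assuming a classic MBR disk (not GPT) and 512-byte sectors. Each of the four 16-byte entries starting at byte 446 holds the partition type at byte 4 and the starting LBA as a little-endian 32-bit value at byte 8. The byte offset is the starting LBA times 512. The demo table is synthetic, with one NTFS partition (type 0x07) at sector 63 and an arbitrary sector count.

```python
import struct
from typing import List

SECTOR = 512  # assumption: 512-byte logical sectors
TABLE_START = 446  # the four 16-byte partition entries start here


def mbr_partition_offsets(mbr: bytes) -> List[int]:
    """Return the byte offsets of all used primary partitions in an MBR."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55 0xAA) found")
    offsets = []
    for i in range(4):
        entry = mbr[TABLE_START + 16 * i: TABLE_START + 16 * (i + 1)]
        part_type = entry[4]  # 0x00 marks an unused entry
        (start_lba,) = struct.unpack_from("<I", entry, 8)  # little-endian uint32
        if part_type != 0:
            offsets.append(start_lba * SECTOR)
    return offsets


def demo_mbr() -> bytes:
    """Build a synthetic MBR with one NTFS (type 0x07) partition at sector 63."""
    mbr = bytearray(512)
    entry = bytearray(16)
    entry[4] = 0x07
    struct.pack_into("<II", entry, 8, 63, 2048)  # start LBA, sector count (arbitrary)
    mbr[TABLE_START:TABLE_START + 16] = entry
    mbr[510:512] = b"\x55\xaa"
    return bytes(mbr)


if __name__ == "__main__":
    print([hex(off) for off in mbr_partition_offsets(demo_mbr())])  # ['0x7e00']
```

Against the real image, you would pass the first 512 bytes of disksddsparse and use the resulting number as the offset= value for mount.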

Well, that's the story of a saved academic career (or at least of a gigantic pile of work). I hope my experience helps someone else rescue data from a failing disk. Now I just have to decide which gift to accept in exchange for this rescue operation... ;)

August 28, 2011

Still here

This is the 6th post this year. My output so far is well below what I expected of myself. I blame stress and lack of time up to the beginning of July, but there should have been some time for an update since then. So the only thing preventing posts has been my own laziness... Ah well...

Okay, what has changed or is noteworthy from the last few months? Maybe the most important thing for me personally is that I've bought a new (used) car. I had been passively looking at potential new cars for quite some time, although I was very pleased with my car so far. But this time the costs for the required servicing and repairs approached almost 2k euros, with no guarantee that the costs would be lower at the next service intervals. So I decided to bite the bullet, say goodbye to my old, loyal and reliable companion, and pick up something with lower running costs. Welcome my new car, a blue Fiat Grande Punto 1.3 JTD Emotion with 90 HP. It's in excellent shape, and I hope it will be as good a companion as my previous Fiat, which carried me over 180k km in almost exactly the past 6 years.

The next piece of news is that this year I'm a contestant in the Zwölfkampf, which is organized by some friends. It is a series of twelve games, with an overall winner calculated over tracks of 4, 8 or all 12 games. The main event on Sep. 24/25 is preceded by some "side event" games, in which I made a not-too-bad impression, but they don't count for the finale ;)

Communication is the next topic. Back in May I bought a new smartphone, the LG Optimus Speed (also known as P990, LGOS or LG2X). It worked without a hitch for some weeks, but one day it began to show a very peculiar defect: I could not place or receive calls while registered in my provider's 3G network. I borrowed SIM cards from other providers, but the problem only occurs with my provider's network. My provider refuses to service the phone (it doesn't carry this model in its portfolio), so I have to handle everything myself. Currently (yes, it's August now!) the phone is on its 4th trip to the LG service center, and I hope it will eventually come back as a complete replacement. The previous three times, the device was only "serviced". On one occasion, the mainboard (without the baseband module) and the camera module (wtf?) were replaced. Of course, without effect. At the moment my seller and I are just hoping that LG relents and sends me a new phone instead of uselessly servicing the defective one.

On to the university stuff. Not much to report here: since my last exam and last hand-in, both in July, I've pretty much kept everything on standby. But I'm already warming up as the next key dates approach. There is still a bit of work to do, exams to prepare for and documents to hand in.

Some holiday-related stuff now. I took two weeks of summer holiday at the end of July. Originally I planned to use that time to sort out the car and to finish building some concrete walls for a terrace behind the house. Sadly, the car business took longer than expected, as my first seller let me down and sold the car to somebody else. As you've already read above, I got my hands on another one. In retrospect, this was a lucky coincidence, as my first candidate car was in worse shape and more expensive. The walls could not be finished in those two weeks either, because the bricks we ordered took longer to deliver than planned, and it also rained on all but two days.

And finally, the work news. Project work at the company is running as usual, with maybe some changes for me personally in the near future. But as that's not completely settled yet, this is all I'd like to say for now. After my holiday I returned to a reshaped team with three new people, and it's still changing. Soon one of our colleagues will move to another team, and tomorrow another new team member will join us. Yeah, a bit of change ;)

Well, that's it for now. I hope I can post more frequently again in the future, but I wouldn't bet on it, as that didn't work out in the past six months...

P.S.: Oh yes, I'm on Google+ too by now, this time under my real name. Those who know me are of course welcome to add me at any time ;)