I mentioned in a previous story that I’ve set out on the horrid task of digitizing about 5,000 to 6,000 of my family’s 35mm slides, dating from the early 1960s to the present day.

I want to free all our old pictures from the prison of the slide carousel, where they’ve sat for – literally – decades, so that our family members who are scattered around the world can easily browse them.

I’m no archivist, but I know that many of the folks who read this blog are, so I thought I’d write up how I’m doing this and solicit feedback/advice from those with more experience than I have.

The first constraint we had for this project was budget; Meredith insisted that I try to keep it cheap.

The main costs of a project like this are hardware & time.

Even though my time is valuable, I’m going to write it off as “free” because for me this is a hobby, and it’s certainly more productive than bowling or playing Half-Life. (The experience of seeing people, places and pets that have long since departed is quite special, if sometimes a bit sad.)

That narrows the real cash outlay down to two kinds of items:

1) A method to scan the slides & film into the PC.
2) A method to -reliably- store and share the scans.

The Scanner

For the scanner, I picked the Konica/Minolta DiMAGE Scan Dual IV. It was reasonably priced (about $250 at Amazon) and each magazine/slide holder can take 4 slides at a time. The output is fantastic, and it has built-in correction filters to help restore slides that are past their prime (like most in my collection…)

I decided to not be cheap (sorry, Meredith) and paid for an external piece of scanner software called Vuescan (about $80 for the professional version, with free lifetime upgrades.)

Vuescan is notable because it really does make higher-quality scans than the software that came bundled with the scanner and it makes automating the scanning process much easier.

The automation bit is critical, because with a collection of this size, you cannot scan slides individually, name them individually, apply filters individually, etc. By the time you finished scanning, anyone who cared about the slides would be dead.

The Process

With Vuescan, once you set up the parameters the way you want them, scanning becomes as simple as loading up the magazine with 4 slides, shoving it in the scanner, and waiting for it to pop out. The PC gives an audible “beep” when it pops the magazine out.

Then you shove in another 4, rinse, and repeat well over 1,000 times (5,000 slides at 4 per magazine is 1,250 loads). You don’t even need to touch the computer at all when doing this. So you can do other things throughout the day, and when you walk by the scanner & see the magazine sticking out, you can just load up 4 more slides, shove it in & walk away. Each slide takes about 1 minute to complete.

Even with this automation, the scanning process is a big job, but it goes from something that would require complete attention to something that you can just do casually throughout the day (this is a hobby, remember).
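To get a feel for the size of the job, here is a minimal Python sketch of the workload arithmetic, using the figures above (4 slides per magazine, about 1 minute per slide). The function name and its defaults are made up for illustration; nothing here comes from Vuescan itself.

```python
import math


def scanning_workload(slides, slides_per_magazine=4, minutes_per_slide=1.0):
    """Return (magazine_loads, scanner_hours) for a collection of slides."""
    # A partly filled last magazine still counts as one load.
    loads = math.ceil(slides / slides_per_magazine)
    # Time the scanner spends working, not time spent at the desk.
    hours = slides * minutes_per_slide / 60
    return loads, hours


for count in (5000, 6000):
    loads, hours = scanning_workload(count)
    print(f"{count} slides: {loads} magazine loads, about {hours:.0f} scanner hours")
```

For 5,000 slides that works out to 1,250 loads and roughly 83 hours of scanner time, which is why it matters that those hours do not need anyone sitting at the PC.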

The Storage Problem

The biggest hurdle in this project is finding a reasonable way to safely store the output of the scans. Hard drives are notoriously unreliable, and I cringe when I see people in CompUSA buying those 200GB external drives, knowing that if they are lucky, the drive will last 3 years before it fails catastrophically, taking their data with it.

Compounding the problem is that the raw output from the scanner is enormous. Each slide takes about 20MB (that’s on Vuescan’s “archive” setting, a lossless, 24-bit file.) When you convert the scan into something more reasonable, like a JPG, it gets crunched down to about 1/15th of that size. But when you’re doing an archival project of this size, you want to preserve the high-quality, raw scans.

Doing the math, 5,000 slides * 20MB = 100,000MB, or about 100GB. Add in the small JPG that we’ll want for each slide too (about 7GB in total at 1/15th the size), and it’s about 110GB of data, give or take.

A good deal of data, to be sure, but well within the capacity of consumer hardware in 2005.
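For anyone who wants to rerun that estimate with their own numbers, here is a rough Python sketch. The function name is hypothetical, the 20MB and 1/15th figures are the ones quoted above, and 1GB is treated as 1,000MB to match the back-of-the-envelope math.

```python
def estimate_storage_gb(slides, raw_mb=20.0, jpg_ratio=1 / 15):
    """Return (raw_gb, jpg_gb, total_gb), treating 1GB as 1,000MB."""
    raw_gb = slides * raw_mb / 1000
    # Each JPG is assumed to be a fixed fraction of its raw scan.
    jpg_gb = raw_gb * jpg_ratio
    return raw_gb, jpg_gb, raw_gb + jpg_gb


raw_gb, jpg_gb, total_gb = estimate_storage_gb(5000)
print(f"raw: {raw_gb:.0f}GB, jpg: {jpg_gb:.1f}GB, total: {total_gb:.0f}GB")
# prints "raw: 100GB, jpg: 6.7GB, total: 107GB"
```

If the collection turns out to be closer to 6,000 slides, the same function gives about 128GB, so it is worth leaving some headroom.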

There is a temptation to go into CompUSA and buy one of those big external firewire drives to store all this data. They are cheap and easy to set up. But I had to fight my “cheap gene” with all my strength in order not to buy one, because they are simply not reliable enough.

Storage Solution

My solution to this problem was two-pronged: RAID storage for the scans + Optical storage for off-site archival.

To handle the RAID storage, I decided to buy an Antec Aria case (small, Meredith-approved), and throw in an old VIA C3 motherboard that I had lying around. (The C3 is famous because it uses very little electricity & doesn’t need a fan. Perfect for a file server that’s going to be kept around the house.) Total cost for both was $130.00.

I set up a RAID5 array by going to CompUSA and purchasing 3 Seagate 160GB drives (CompUSA was having a sale, each drive was $59.00 after rebates.) Total cost for storage was about $180.00.

RAID5 is nice in that it gives you reliability: one drive can fail & you’re OK. Simply replace it with a new one and the array rebuilds itself. With three drives it also gives you 2/3rds of the capacity of the whole array (one drive’s worth of space goes to parity), so the effective storage space on my system was about 300GB. More than enough for the project.
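The capacity rule generalizes: a RAID5 array of n equal drives gives you n - 1 drives’ worth of usable space. A small Python sketch of that rule follows; the function names are made up for illustration, and the conversion to binary gigabytes is only one plausible reason the usable figure lands nearer 300GB than 320GB (filesystem overhead plays a part too).

```python
def raid5_usable_gb(drive_count, drive_gb):
    """Usable capacity of a RAID5 array built from equal-sized drives."""
    if drive_count < 3:
        raise ValueError("RAID5 needs at least three drives")
    # One drive's worth of space is spent on parity, spread across all drives.
    return (drive_count - 1) * drive_gb


def decimal_gb_to_gib(gb):
    """Convert marketing gigabytes (10**9 bytes) to binary gigabytes (2**30 bytes)."""
    return gb * 1e9 / 2**30


usable = raid5_usable_gb(3, 160)
print(usable)  # prints 320
print(f"{decimal_gb_to_gib(usable):.0f}")  # prints 298
```

Adding a fourth 160GB drive would raise the usable space to 480GB while still tolerating only a single drive failure.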

Finally, I had an old Sony internal DVD burner lying around, and I knew it to be reliable, so that saved us about $70.00.

Each carousel comes out to about 2GB of data, so, conveniently, 2 carousels fit on each 4.7GB DVD. The DVDs can then be taken off-site for safe storage.
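Not every carousel will come out to exactly 2GB, so it can help to plan the discs before burning. The following Python sketch uses a simple greedy first-fit approach, which is a convenience and not necessarily the tightest possible packing. The carousel names and sizes are invented for the example, and the 4,700MB default is the nominal capacity of a single-layer DVD; real burning software may leave slightly less room.

```python
def plan_discs(carousel_mb, disc_mb=4700):
    """Group carousels onto discs, largest first, using first-fit.

    carousel_mb maps a carousel name to its size in MB.
    Returns a list of discs, each a list of carousel names.
    """
    discs = []  # each entry is [free_mb, [carousel names]]
    by_size = sorted(carousel_mb.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        if size > disc_mb:
            raise ValueError(f"{name} ({size}MB) does not fit on one disc")
        for disc in discs:
            if disc[0] >= size:
                disc[0] -= size
                disc[1].append(name)
                break
        else:
            # No existing disc had room, so start a new one.
            discs.append([disc_mb - size, [name]])
    return [names for _free, names in discs]


sizes = {"carousel_01": 2000, "carousel_02": 2100, "carousel_03": 1900}
for number, names in enumerate(plan_discs(sizes), start=1):
    print(f"DVD {number}: {', '.join(names)}")
```

With these made-up sizes, carousel_02 and carousel_01 share the first DVD and carousel_03 goes on a second one.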

Since I had the install CDs already, I chose Mandrake Linux 10.1 to be the OS for the server. Though in reality, any version of Linux will do. Mandrake makes it easy to set up the RAID array. Fedora Core also makes it a snap. I haven’t tried it with any other distros, YMMV, but I doubt the procedure is too far different.

The server is just that: a headless machine running ssh and samba and almost nothing else. It’s used as a central repository for all the scans.

Sharing & Meta Data

At this point, we’ve got a reliable and (relatively) fast method to scan the slides and a safe way to store them.

The next big issue is what to -do- with all the data. Many of these slides were taken before I was born, many of them are of people that I do not know, and are in places I’ve never been. It is going to be up to my family members to take a crack at each slide, providing descriptive metadata for each picture as best they can.

Since my family is spread around the world, just giving them access to the master archive box on the LAN wouldn’t work. Furthermore, no one wants to download 20MB pictures, even locally.

I decided to use a package called Gallery to publish the scans to the web. I already had a public, colocated webserver running, so it was just a matter of downloading & installing Gallery. Gallery is fast, reliable, and has very granular access controls, so access can be restricted to certain users for certain “albums”. Each family member gets their own username and password, which makes unwanted or incorrect metadata much less likely. (Gallery allows a variety of metadata to be added, as well as free-form searchable comments about each image.)

20MB pictures would destroy my server in short order, so it’s fortunate that Vuescan has an option at scan time to also create a JPG of each raw image, with the same filename. The JPGs are very high quality, but a fraction of the size, so they are easy to upload.
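Because each JPG shares its filename with the raw scan, it is easy to check that nothing was missed before uploading. Here is a minimal Python sketch of such a check. It assumes the raw scans end in .tif and the web copies in .jpg, all sitting in one folder per carousel; those extensions and the folder layout are assumptions for illustration, so adjust them to whatever your scanner software actually writes.

```python
from pathlib import Path


def find_missing_jpgs(scan_dir, raw_ext=".tif", jpg_ext=".jpg"):
    """Return the names of raw scans that have no JPG with the same base name."""
    files = list(Path(scan_dir).iterdir())
    # Collect base names (filename without extension) of every JPG present.
    jpg_stems = {f.stem for f in files if f.suffix.lower() == jpg_ext}
    return sorted(
        f.name
        for f in files
        if f.suffix.lower() == raw_ext and f.stem not in jpg_stems
    )
```

Calling find_missing_jpgs on a carousel folder returns an empty list when every raw scan has its JPG, and otherwise lists the raw files that need to be rescanned or converted.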

Conclusion

This is about as robust a solution as I could find for archiving these slides at a reasonable price. Has anyone else out in blog-land undertaken such a project? Do you have any tips?