Archiving Digital Images from Film: My Approach

I’d mentioned in a previous story that I’ve set out on the horrid task of digitizing about 5,000 to 6,000 of my family’s 35mm slides, dating from the early 1960s until the present day.

I want to free all our old pictures from the prison of the slide carousel, where they’ve sat for – literally – decades, so that our family members who are scattered around the world can easily browse them.

I’m no archivist, but I know that many of the folks who read this blog are, so I thought I’d write up how I’m doing this and solicit feedback and advice from those with more experience than I have.

The first constraint we had for this project was budget; Meredith insisted that I try to keep it cheap.

The main costs of a project like this are hardware & time.

Even though my time is valuable, I’m going to write it off as “free” because this is a hobby for me, and it’s certainly more productive than bowling or playing Half-Life. (The experience of seeing people, places and pets that have long since departed is quite special, if sometimes a bit sad.)

That limits the real cash outlay to two kinds of items:

1) A method to scan the slides & film into the PC.
2) A method to -reliably- store and share the scans.

The Scanner

For the scanner, I picked the Konica Minolta DiMAGE Scan Dual IV. It was reasonably priced (about $250 at Amazon) and each slide holder takes 4 slides at a time. The output is fantastic, and it has built-in correction filters to help restore slides that are past their prime (like most in my collection).

I decided to not be cheap (sorry, Meredith) and paid for a third-party scanning program called VueScan (about $80 for the professional version, with free lifetime upgrades).

VueScan is notable because it really does make higher-quality scans than the software that came bundled with the scanner, and it makes automating the scanning process much easier.

The automation bit is critical, because with a collection of this size, you cannot scan slides in individually, name them individually, apply filters individually, and so on. By the time you finished scanning, anyone who cared about the slides would be dead.

The Process

With VueScan, once you set up the parameters the way you want them, scanning becomes as simple as loading up the magazine with 4 slides, shoving it in the scanner, and waiting for it to pop out. The PC gives an audible “beep” when the scanner pops the magazine out.

Then you shove in another 4, rinse, repeat 1000 times. You don’t even need to touch the computer at all when doing this. So you can do other things throughout the day, and when you walk by the scanner & see the magazine sticking out, you can just load up 4 more slides, shove it in & walk away. Each slide takes about 1 minute to complete.

Even with this automation, the scanning process is a big job, but it goes from something that would require complete attention to something that you can just do casually throughout the day (this is a hobby, remember).
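To get a feel for how big the job is, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above (4 slides per magazine, about 1 minute per slide); the function name `scan_workload` is made up for illustration, and real scan times will vary with resolution and filter settings.

```python
import math

SLIDES_PER_MAGAZINE = 4   # capacity of one slide holder, per the post
MINUTES_PER_SLIDE = 1.0   # rough figure quoted in the post

def scan_workload(total_slides):
    """Return (magazine_loads, scanner_hours) for a collection of slides."""
    loads = math.ceil(total_slides / SLIDES_PER_MAGAZINE)
    hours = total_slides * MINUTES_PER_SLIDE / 60
    return loads, hours

# The collection is estimated at 5,000 to 6,000 slides.
for n in (5000, 6000):
    loads, hours = scan_workload(n)
    print(f"{n} slides: {loads} magazine loads, about {hours:.0f} scanner-hours")
```

Under those assumptions the collection needs somewhere between 1,250 and 1,500 magazine loads and roughly 83 to 100 hours of scanner time, which is why not having to babysit the computer matters so much.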

The Storage Problem

The biggest hurdle to this project is finding a reasonable way to safely store the output of the scans. Hard drives are notoriously unreliable, and I cringe when I see people in CompUSA buying those 200GB external drives, knowing that if they are lucky, the drive will last 3 years before it catastrophically fails, taking their data with it.

Compounding the problem is that the raw output from the scanner is enormous. Each slide takes about 20MB (that’s on VueScan’s “archive” setting, a lossless, 24-bit file). When you convert the scan into something more reasonable, like a JPG, it gets crunched down to about 1/15th of that size. But when you’re doing an archival project of this size, you want to preserve the high-quality, raw scans.

Doing the math, 5,000 slides × 20MB = 100,000MB, or about 100GB. Add in the small JPG output that we’ll want for each slide too (roughly 1/15th of the raw size, so another 7GB or so), and it’s about 110GB of data, give or take.

A good deal of data, to be sure, but well within the capacity of consumer hardware in 2005.
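The estimate above can be reproduced with a few lines of Python. This is only a sketch: the 20MB-per-slide and 1/15th figures come straight from the post, while the function name `archive_size_gb` and the choice of 1,000MB per GB (the way drive makers count) are my assumptions.

```python
RAW_MB_PER_SLIDE = 20     # VueScan "archive" output per slide, per the post
JPG_FRACTION = 1 / 15     # a JPG is roughly 1/15th of the raw size

def archive_size_gb(slides, mb_per_gb=1000):
    """Return (raw_gb, jpg_gb) for the given number of slides."""
    raw_mb = slides * RAW_MB_PER_SLIDE
    jpg_mb = raw_mb * JPG_FRACTION
    return raw_mb / mb_per_gb, jpg_mb / mb_per_gb

for slides in (5000, 6000):
    raw_gb, jpg_gb = archive_size_gb(slides)
    total = raw_gb + jpg_gb
    print(f"{slides} slides: raw {raw_gb:.0f} GB + JPG {jpg_gb:.1f} GB = {total:.0f} GB")
```

That works out to roughly 107GB for 5,000 slides and 128GB for 6,000, in line with the "about 110GB, give or take" figure.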

There is a temptation to go into CompUSA and buy one of those big external FireWire drives to store all this data. They are cheap and easy to set up. But I had to fight my “cheap gene” with all my strength in order not to buy one, because they are simply not reliable enough.

Storage Solution

My solution to this problem was two-pronged: RAID storage for the scans + Optical storage for off-site archival.

To handle the RAID storage, I decided to buy an Antec Aria case (small, Meredith-approved) and throw in an old VIA C3 motherboard that I had lying around. (The C3 is famous because it uses very little electricity and doesn’t need a fan, which makes it perfect for a file server that’s going to be kept around the house.) Total cost for both was $130.00.

I set up a RAID5 array by going to CompUSA and purchasing 3 Seagate 160GB drives (CompUSA was having a sale, each drive was $59.00 after rebates.) Total cost for storage was about $180.00.

RAID5 is nice in that it gives you reliability: one drive can fail and you’re OK. Simply replace it with a new one and the array rebuilds itself. With three drives it also gives you 2/3rds of the raw capacity of the whole array, so the effective storage space on my system was about 300GB (2 × 160GB = 320GB nominal). More than enough for the project.
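For anyone sizing their own array, here is a small Python sketch of the RAID5 capacity arithmetic. RAID5 spends one drive’s worth of space on parity, so usable space is (number of drives − 1) × drive size. The function names are invented for illustration, and the GiB conversion is included only because operating systems often report sizes in binary units, which is one reason 320GB can show up as "about 300GB".

```python
def raid5_usable_gb(drive_count, drive_gb):
    """Usable capacity of a RAID5 array of identical drives, in decimal GB."""
    if drive_count < 3:
        raise ValueError("RAID5 needs at least three drives")
    # One drive's worth of space goes to parity.
    return (drive_count - 1) * drive_gb

def gb_to_gib(gb):
    """Convert decimal gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes)."""
    return gb * 1000**3 / 1024**3

usable = raid5_usable_gb(3, 160)
print(f"usable: {usable} GB nominal, about {gb_to_gib(usable):.0f} GiB")
```

With three 160GB drives this gives 320GB nominal, or about 298GiB.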

Finally, I had an old Sony internal DVD burner lying around, and I knew it to be reliable, so that saved us about $70.00.

Each carousel comes out to about 2GB of data, so, conveniently, 2 carousels fit on each single-layer (4.7GB) DVD. The DVDs can then be taken off-site for safe storage.

Since I had the install CDs already, I chose Mandrake Linux 10.1 to be the OS for the server, though in reality any version of Linux will do. Mandrake makes it easy to set up the RAID array. Fedora Core also makes it a snap. I haven’t tried it with any other distros, so YMMV, but I doubt the procedure is too different.
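As a rough check on the DVD count, here is a Python sketch. It assumes single-layer 4.7GB discs and the roughly 2GB-per-carousel figure above, and it keeps each carousel whole on one disc rather than splitting it. The names `carousels_per_disc` and `discs_needed` are hypothetical.

```python
import math

DVD_GB = 4.7        # assumed single-layer DVD capacity
CAROUSEL_GB = 2.0   # approximate raw scan output per carousel, per the post

def carousels_per_disc(disc_gb=DVD_GB, carousel_gb=CAROUSEL_GB):
    """How many whole carousels fit on one disc."""
    return int(disc_gb // carousel_gb)

def discs_needed(total_gb, disc_gb=DVD_GB, carousel_gb=CAROUSEL_GB):
    """Discs needed when carousels are never split across discs."""
    per_disc = carousels_per_disc(disc_gb, carousel_gb)
    if per_disc == 0:
        raise ValueError("a single carousel does not fit on one disc")
    carousels = math.ceil(total_gb / carousel_gb)
    return math.ceil(carousels / per_disc)

print(carousels_per_disc())   # carousels per DVD
print(discs_needed(100))      # discs for about 100GB of raw scans
```

For about 100GB of raw scans this comes to 50 carousels on 25 discs, for each off-site copy.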

The server is just that — a headless machine running ssh and samba and almost nothing else. It’s used as a central repository for all the scans.

Sharing & Metadata

At this point, we’ve got a reliable and (relatively) fast method to scan the slides and a safe way to store them.

The next big issue is what to -do- with all the data. Many of these slides were taken before I was born, and many of them are of people I do not know, in places I’ve never been. It is going to be up to my family members to take a crack at each slide, providing descriptive metadata for each picture as best they can.

Since my family is spread around the world, just giving them access to the master archive box on the LAN wouldn’t work. Furthermore, no one wants to download 20MB pictures, even locally.

I decided to use a package called Gallery to publish the scans to the web. I already had a public, colocated webserver running, so it was just a matter of downloading and installing Gallery. Gallery is fast, reliable, and has very granular access controls, so access can be restricted to certain users for certain “albums”. Each family member gets their own username and password, which makes it much less likely that unwanted or incorrect metadata gets added. (Gallery allows a variety of metadata to be added, as well as free-form searchable comments about each image.)

20MB pictures would destroy my server in short order, so it’s fortunate that VueScan has an option at scan time to also create a JPG of each raw image, with the same filename. The JPGs are very high quality, but a fraction of the size, so they are easy to upload.
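Because each JPG shares its filename with the raw scan, it is easy to check that nothing got skipped before uploading. Here is a minimal Python sketch of that check. It assumes the raw scans and JPGs sit in the same folder and that the raw files end in `.tif`; both are my assumptions, so adjust the extensions to whatever your scanner software actually writes. The function name is made up.

```python
from pathlib import Path

def scans_missing_jpg(scan_dir, raw_ext=".tif", jpg_ext=".jpg"):
    """Return names of raw scans in scan_dir that have no same-named JPG."""
    files = list(Path(scan_dir).iterdir())
    # Collect the base names (without extension) of every JPG present.
    jpg_stems = {p.stem for p in files if p.suffix.lower() == jpg_ext}
    return sorted(
        p.name
        for p in files
        if p.suffix.lower() == raw_ext and p.stem not in jpg_stems
    )
```

Calling `scans_missing_jpg` on a carousel’s folder returns a sorted list of the raw files that still need a web-sized copy; an empty list means every scan has its JPG.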

Conclusion

This is about as robust a solution as I could find for archiving these slides at a reasonable price. Has anyone else out in blog-land undertaken such a project? Do you have any tips?

Angie says:

4/22/2005 at 3:59 pm

I just started scanning about 1000 of the “family” slides using the Minolta Dimage DS IV. I was following your process until you got to the “storage” and RAID5. I just started this project in reality on Monday this week and have spent 4 days standing at the computer, labeling, reminiscing, sorting, and escaping to dust or vacuum when it gets to me! My slides are from the 50s to 70s and are in remarkably good shape. I don’t notice any discoloration or fading, however, I don’t think the scanner actually picks up the true colors and find that I am adjusting each slide with contrast and saturation especially. I am using the software that came with the scanner as I didn’t know that there was any other out there. Do you know how I can solve the problem of not getting the true color/brightness of the original slide w/o adjusting? Thanks for any help you can give me. Angie

docwolf says:

4/24/2005 at 9:54 am

Hi Angie,

Hope you’re having fun with the slide scanner!

In terms of the problems that you’re having with lighting & exposure, the best solution that I found was to purchase the third-party software package VueScan by Hamrick Software.

http://www.hamrick.com/vsm.html

It worked with the scanner right out of the box, and Hamrick’s tech support is really good (apparently all they do is make scanner software, so I’m not surprised their package is top-notch).

You can download & test it for free before buying it.

Using VueScan really sped up my ability to scan the slides in, as you can set up the filters precisely the way you want them and save them into their own profile (e.g., if you have a batch of Ektachrome slides you can create & use one filter, Kodachrome another, etc.).

The way VueScan works, it detects when you’ve loaded a magazine in, and automatically does the scans without you having to touch the computer. So you can literally just load the magazines while doing other stuff, instead of fussing with the software. When you see the magazine has popped out, load in 4 more slides, pop it in, and so on. It makes doing mass scanning 1000x easier.

Hope it helps!

John Meyer says:

8/14/2005 at 1:38 pm

I have scanned well over 50,000 slides, negatives and prints, mostly from family albums. This has taken almost five years.

Vuescan is mandatory for doing this volume of work. For slides, I use a Nikon Coolscan 4000 slide scanner with the SF-200 slide feeder. I purchased this on eBay for about $300 and plan to sell it back when I’m done (if it is still working). This slide feeder is mandatory for getting a job of this magnitude done before departing this earth.

I archive on specially purchased CD-ROMs from Mitsui that cost several dollars each. I spent several days searching the web for these. They are called “Professional Grade CD Recordable Disc”. The only identifier on the packaging is 6509. These are stored in a dark, temperature controlled room at 65 degrees, with fairly constant 35% humidity.

You may be surprised, but I archive to JPEG, using a very high quality compression. For those few pictures that are really important, I use TIFF. Why JPEG and TIFF? Well, I have actually run a fairly well-known photo editing software company, so I am very familiar with all the artifacts created by various storage techniques. Quite frankly, for 99.9% of all uses, a good JPEG is more than anyone will ever need. There are far more issues with cleaning techniques used by your scanner and the scanner software (if you use them), and also the dozens of settings you can change when scanning your image (e.g., white point, black point, analog gain, ROC and GEM — if your scanner supports it — and many more). But, I’m talking about family pictures here, not works of art. If I was a pro, and these were award winning pictures, carefully lit and composed, I’d definitely store the RAW files.

I also archive on Maxell 2x DVD-R, which makes it easier to access large numbers of photos. It is very difficult to get any useful information on accelerated aging tests for DVD, but from what little information I’ve found, these appear to be the best for archiving.

Finally, I do not share the author’s worry about hard disks. I have run three software companies and have only had two hard disks fail in over twenty years, and those failures were after years of continuous operation. For a disk drive that is going to spend its life on a shelf, the chance of failure is virtually zero, and the convenience of having the ENTIRE set of pictures on one media is enormous. Most people’s experience with hard disk “failure” is actually caused by cockpit error (e.g., accidentally formatting or partitioning the disk), or rogue software (bug, virus, adware). If you write protect all files, this will minimize this risk.

An external Firewire drive backup is virtually mandatory for a project of this size. If you are worried about failure, buy two drives and give one to a family member or friend. Cost per gigabyte is not much more than DVD or CD (this cost comparison assumes that you buy top-quality CD or DVD media which costs considerably more than $0.10 per CD or $0.50 per DVD).

Archiving Digital Images from Film: My Approach by docwolf
