{"id":112,"date":"2005-01-11T15:02:40","date_gmt":"2005-01-11T20:02:40","guid":{"rendered":"\/?p=112"},"modified":"2005-01-11T15:02:40","modified_gmt":"2005-01-11T20:02:40","slug":"archiving-digital-images-from-film-my-approach","status":"publish","type":"post","link":"https:\/\/meredith.wolfwater.com\/wordpress\/2005\/01\/11\/archiving-digital-images-from-film-my-approach\/","title":{"rendered":"Archiving Digital Images from Film: My Approach"},"content":{"rendered":"<p>I&#8217;d mentioned in a previous story that I&#8217;ve set out on the horrid task of digitizing about 5,000 &#8211; 6,000 of my family&#8217;s 35mm slides, dating from the early 1960s until the present day.<\/p>\n<p>I want to free all our old pictures from the prison of the slide carousel, where they&#8217;ve sat for &#8211; literally &#8211; decades, so that our family members who are scattered around the world can easily browse them.<\/p>\n<p>I&#8217;m no archivist, but I know that many of the folks who read this blog are, so I thought I&#8217;d write up how I&#8217;m doing this, and solicit feedback\/advice from those with more experience than I have.<\/p>\n<p>The first constraint we had for this project was budget: Meredith insisted that I try to keep it cheap. 
<\/p>\n<p>The main costs of a project like this are hardware &#038; time.<\/p>\n<p>Even though my time is valuable, I&#8217;m going to write it off as &#8220;free&#8221; because this is a hobby for me, and it&#8217;s certainly more productive than bowling or playing Half-Life (the experience of seeing people, places and pets that have long since departed is quite special, if sometimes a bit sad.)<\/p>\n<p>That leaves two kinds of items as the real cash outlay:<\/p>\n<p>1) A method to scan the slides &#038; film into the PC.<br \/>\n2) A method to <em>reliably<\/em> store and share the scans.<\/p>\n<p><strong>The Scanner<\/strong><\/p>\n<p>For the scanner, I picked the <a href=\"http:\/\/www.amazon.com\/exec\/obidos\/tg\/detail\/-\/B0001BG1SI\/qid=1105472974\/sr=8-1\/ref=sr_8_xs_ap_i1_xgl147\/102-7452614-1308100?v=glance&#038;s=pc&#038;n=507846\">Konica\/Minolta DiMAGE Scan Dual IV<\/a>.  It was reasonably priced (about $250 at Amazon) and each magazine\/slide holder can take 4 slides at a time.  The output is fantastic, and it has built-in correction filters to help restore slides that are past their prime (like most in my collection&#8230;)<\/p>\n<p>I decided to not be cheap (sorry, Meredith) and paid for a third-party piece of scanner software called <a href=\"http:\/\/www.hamrick.com\/vsm.html\">Vuescan<\/a> (about $80 for the professional version, with free lifetime upgrades.)<\/p>\n<p>Vuescan is notable because it really does make higher-quality scans than the software that came bundled with the scanner, and it makes automating the scanning process much easier.  
<\/p>\n<p>The automation bit is critical, because with a collection of this size, you cannot scan slides individually, name them individually, apply filters individually, etc&#8230; By the time you&#8217;d finish scanning, anyone who cared about the slides would be dead.<\/p>\n<p><strong>The Process<\/strong><\/p>\n<p>With Vuescan, once you set up the parameters the way you want them, scanning becomes as simple as loading up the magazine with 4 slides, shoving it in the scanner, and waiting for it to pop out.  The PC gives an audible &#8220;beep&#8221; when it pops the magazine out.  <\/p>\n<p>Then you shove in another 4, rinse, repeat 1000 times. You don&#8217;t even need to touch the computer at all when doing this.  So you can do other things throughout the day, and when you walk by the scanner &#038; see the magazine sticking out, you can just load up 4 more slides, shove it in &#038; walk away.  Each slide takes about 1 minute to complete, so 5,000 slides works out to roughly 83 hours of scanner time.<\/p>\n<p>Even with this automation, the scanning process is a big job, but it goes from something that would require complete attention to something that you can just do casually throughout the day (this is a hobby, remember&#8230;)<\/p>\n<p><strong>The Storage Problem<\/strong><\/p>\n<p>The biggest hurdle to this project is finding a reasonable way to <em>safely<\/em> store the output of the scans.  Hard drives are notoriously unreliable, and I cringe when I see people in CompUSA buying those 200GB external drives, knowing that if they are lucky, the drive will last 3 years before it catastrophically fails, taking their data with it.<\/p>\n<p>Compounding the problem is that the raw output from the scanner is enormous.  Each slide takes about 20MB (that&#8217;s on Vuescan&#8217;s &#8220;archive&#8221; setting, a lossless, 24-bit file.)  When you transform the slide into something more reasonable, like a JPG, it gets crunched down to about 1\/15th of that size.  
But when you&#8217;re doing an archival project of this size, you want to preserve the high-quality, raw scans.<\/p>\n<p>Doing the math, 5,000 slides * 20MB ~ 100,000MB ~ 100GB.  Add in the small JPG output that we&#8217;ll want for each slide too, and it&#8217;s about 110GB of data, give or take.<\/p>\n<p>A good deal of data, to be sure, but well within the capacity of consumer hardware in 2005.<\/p>\n<p>There is a temptation to go into CompUSA and buy one of those big external FireWire drives to store all this data.  They are cheap and easy to set up.  But I had to fight my &#8220;cheap gene&#8221; with all my strength in order not to buy one, because they are simply not reliable enough.<\/p>\n<p><strong>Storage Solution<\/strong><\/p>\n<p>My solution to this problem was two-pronged: RAID storage for the scans + optical storage for off-site archival.<\/p>\n<p>To handle the RAID storage, I decided to buy an <a href=\"http:\/\/www.antec.com\/us\/productDetails.php?ProdID=15130\">Antec Aria<\/a> case (small, Meredith-approved), and throw in an old VIA C3 motherboard that I had lying around (The C3 is famous because it uses very little electricity &#038; doesn&#8217;t need a fan. Perfect for a file server that&#8217;s going to be kept around the house.)  Total cost for both was $130.00.  <\/p>\n<p>I set up a RAID5 array by going to CompUSA and purchasing 3 Seagate 160GB drives (CompUSA was having a sale, each drive was $59.00 after rebates.)  Total cost for storage was about $180.00.<\/p>\n<p>RAID5 is nice in that it gives you reliability (one drive can fail &#038; you&#8217;re OK: simply replace it with a new one and the array rebuilds itself.)  With three drives, it also gives you 2\/3rds of the raw capacity of the whole array (3 * 160GB = 480GB raw, with one drive&#8217;s worth going to parity), so the effective storage space on my system was about 300GB.  More than enough for the project.<\/p>\n<p>Finally, I had an old Sony internal DVD burner lying around, and I knew it to be reliable, so that saved us about $70.00. 
<\/p>\n<p>Each carousel comes out to about 2GB of data, so conveniently 2 carousels fit per DVD burned (a single-layer disc holds 4.7GB).  The DVDs can then be taken off-site for safe storage.<\/p>\n<p>Since I had the install CDs already, I chose <a href=\"http:\/\/www.mandrakelinux.com\/en-us\/\">Mandrake Linux 10.1<\/a> to be the OS for the server, though in reality, any version of Linux will do.  Mandrake makes it easy to set up the RAID array.  <a href=\"http:\/\/fedora.redhat.com\/\">Fedora Core<\/a> also makes it a snap.  I haven&#8217;t tried it with any other distros, so YMMV, but I doubt the procedure is much different.<\/p>\n<p>The server is just that: a headless machine running SSH and Samba and almost nothing else.  It&#8217;s used as a central repository for all the scans.<\/p>\n<p><strong>Sharing &#038; Metadata<\/strong><\/p>\n<p>At this point, we&#8217;ve got a reliable and (relatively) fast method to scan the slides and a safe way to store them.<\/p>\n<p>The next big issue is what to <em>do<\/em> with all the data.  Many of these slides were taken before I was born, and many of them are of people that I do not know, in places I&#8217;ve never been.  It is going to be up to my family members to take a crack at each slide, providing descriptive metadata for each picture as best they can.<\/p>\n<p>Since my family is spread around the world, just giving them access to the master archive box on the LAN wouldn&#8217;t work.  Furthermore, no one wants to download 20MB pictures, even locally.<\/p>\n<p>I decided to use a package called <a href=\"http:\/\/gallery.menalto.com\/\">Gallery<\/a> to publish the scans to the web. I already had a public, colocated webserver running, so it was just a matter of downloading &#038; installing Gallery.  Gallery is fast, reliable, and has very granular permissions, so access can be restricted to certain users for certain &#8220;albums&#8221;. 
Each family member gets their own username and password, which makes unwanted or incorrect metadata much less likely. (Gallery allows a variety of metadata to be added, as well as free-form searchable comments about each image.)<\/p>\n<p>20MB pictures would destroy my server in short order, so it&#8217;s fortunate that Vuescan has an option at scan time to also create a JPG of each raw image, with the same filename. The JPGs are very high quality, but a fraction of the size, so they are easy to upload.<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>This is about as robust a solution as I could find for archiving these slides at a reasonable price.  Has anyone else out in blog-land undertaken such a project?  Do you have any tips?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;d mentioned in a previous story that I&#8217;ve set out on the horrid task of digitizing about 5,000 &#8211; 6,000 of my family&#8217;s 35mm slides, dating from the early 1960s&hellip;<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-112","post","type-post","status-publish","format-standard","hentry","category-general"],"_links":{"self":[{"href":"https:\/\/meredith.wolfwater.com\/wordpress\/wp-json\/wp\/v2\/posts\/112"}],"collection":[{"href":"https:\/\/meredith.wolfwater.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/meredith.wolfwater.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/meredith.wolfwater.com\/wordpress\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/meredith.wolfwater.com\/wordpress\/wp-json\/wp\/v2\/comments?post=112"}],"version-history":[{"count":0,"href":"https:\/\/meredith.wolfwater.com\/wordpress\/wp-json\/wp\/v2\/posts\/112\/revisions"}],"wp:attachment":[{"href":"https
:\/\/meredith.wolfwater.com\/wordpress\/wp-json\/wp\/v2\/media?parent=112"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/meredith.wolfwater.com\/wordpress\/wp-json\/wp\/v2\/categories?post=112"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/meredith.wolfwater.com\/wordpress\/wp-json\/wp\/v2\/tags?post=112"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}