Friday, October 26, 2012

Archiving Important Events

My friends are having a baby today. They asked me to buy a copy of today's paper so that they could have it to remember what was going on in the world on that date. I did.

Then I thought: why settle for just one local paper as a record of what was going on? You see, the awesome Newseum has a site where they share all of today's front pages from over 800 newspapers worldwide. Note that the Newseum indicates the following:
Anyone seeking permission to use a front page must credit the Newseum and contact the newspaper directly for permission. U.S. copyright laws apply.
I think saving these images for yourself in an electronic baby book would fall under fair use, but they shouldn't be shared or put up on a web page.

So, here's what I did. I used the Firefox browser with the FoxySpider plugin, and a regex-capable text editor (in my case, TextWrangler).


  1. Install Firefox and the FoxySpider plugin if you haven't already. Go into the FoxySpider preferences and uncheck the "Limit gallery to X thumbnails" box.
  2. Go to the Newseum front pages site, and click the link to show All front pages.
  3. Right-click the page and choose FoxySpider - Advanced Filters.
    • In the "Crawl pages within this URL" field, I changed the URL slightly so that it captures only the linked pages with images. Currently this is http://www.newseum.org/todaysfrontpages/hr.asp*
    • Click Start!
    • In the generated page of thumbnails, choose "Select all files" from the drop down and click the Download Files button. Choose a folder to save all the files to.
    • Close the FoxySpider tab after it's done.
  4. Right-click the page again, choose Save Page As..., and save it as a .html file in the same folder where you saved the images. For the type, choose "Web page, complete" so that it keeps the thumbnails.
  5. Open the .html file in your text editor and perform the following replacement (as of today's date, at least):
    • Find: href="http://www\.newseum\.org/todaysfrontpages/hr\.asp\?fpVname=([A-Z]*_[A-Z]*)&ref_pge=gal&b_pge=1"
    • Replace: href="\1.jpg"
    • Matching: (check the Grep checkbox)
  6. Save the file. Now you have a local copy of the Newseum's Today's Front Pages page linked to the local copy of each image you've saved using FoxySpider.
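
If you'd rather not do the replacement in step 5 by hand, here is a minimal Python sketch of the same find-and-replace. It reuses the pattern from the Find field above. The function names, the sample fpVname value (CA_LAT), and the tolerance for "&amp;" in place of "&" are my own assumptions for illustration; they are not something the Newseum site or FoxySpider guarantees.

```python
import re
from pathlib import Path

# Same pattern as the Find field in step 5. Accepting "&amp;" as well as "&"
# is an assumption, in case the browser escapes ampersands when saving.
LINK_PATTERN = re.compile(
    r'href="http://www\.newseum\.org/todaysfrontpages/hr\.asp'
    r'\?fpVname=([A-Z]*_[A-Z]*)&(?:amp;)?ref_pge=gal&(?:amp;)?b_pge=1"'
)


def relink_front_pages(html: str) -> str:
    """Point each front-page link at the local .jpg with the same name."""
    return LINK_PATTERN.sub(r'href="\1.jpg"', html)


def relink_file(path: str) -> None:
    """Rewrite the saved gallery page in place (hypothetical helper)."""
    page = Path(path)
    page.write_text(relink_front_pages(page.read_text(encoding="utf-8")),
                    encoding="utf-8")


# Made-up sample link, used only to show the rewrite.
sample = ('<a href="http://www.newseum.org/todaysfrontpages/hr.asp'
          '?fpVname=CA_LAT&ref_pge=gal&b_pge=1">')
print(relink_front_pages(sample))
```

To use it on the real page, call relink_file with the path of the .html file you saved in step 4. Links that don't match the pattern are left untouched, so check a few thumbnails afterward to make sure they open the local images.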
Again, to be clear, this should be for personal, non-commercial use only. I think it would have been cool to see every front page from the day I was born without having to go to the library and check out the microfiche.