Gavin Baker

Preservation for scholarly blogs

I’ve wondered about preservation for new modes of scholarly communication and ephemera, e.g. scholarly blogs, mailing lists, etc. Others have suggested it recently as well. A cursory Googling finds a few others mulling the question, but not (at first glance) anybody actually doing it.

I don’t do much with preservation, so I plead ignorance. Is anybody preserving scholarly blogs (or other “grey” online scholarly resources), other than general-purpose national archiving and the Internet Archive? (Is it in vogue to do Web archiving for certain domains or types of content, as opposed to general-purpose archiving?)

Also, a suggestion: What about an opt-in archiving service for scholarly blogs? (Not opt-in for an individual page, as with WebCite, but for an individual blog/site, to be harvested on a recurring basis.) I don’t suggest it as the end of a solution, but as the beginning of one. An interested library or archive (or consortium) could, I assume, provide such a service fairly cheaply (someone could plausibly build it in their basement over a weekend). The opt-in could be a simple Web form, asking for the URL of the site and some metadata, maybe also getting a license to provide open access to the preserved copies if the original goes dark. (P.S. Preservation/copyright experts: Is such a license needed?) This might provide a higher level of service than general-purpose Web archiving, e.g. the ability to categorize sites by scientific domain or topic, more frequent/robust archiving than is accorded to pictures of cats with captions, etc.
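To make the idea a little more concrete, here is a rough sketch of the bookkeeping behind such an opt-in form: a record per registered blog (URL, some metadata, whether the open access license was granted) and a check for which sites are due for their next harvest. Everything here is an assumption for illustration, including the names `Registration` and `Registry`, the `discipline` field, and the weekly default interval; it does no actual crawling or storage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional
from urllib.parse import urlparse


@dataclass
class Registration:
    """One opted-in blog, as it might be captured by the Web form (hypothetical fields)."""
    url: str
    title: str
    discipline: str                 # used to categorize sites by domain or topic
    open_access_license: bool       # may preserved copies be served if the original goes dark?
    harvest_interval: timedelta = timedelta(days=7)  # assumed default, not a recommendation
    last_harvested: Optional[datetime] = None


class Registry:
    """In-memory list of opted-in sites; a real service would persist this."""

    def __init__(self) -> None:
        self._sites: Dict[str, Registration] = {}

    def opt_in(self, reg: Registration) -> None:
        # Accept only absolute http(s) URLs, since the harvester needs a fetchable address.
        parts = urlparse(reg.url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {reg.url!r}")
        self._sites[reg.url] = reg

    def due(self, now: datetime) -> List[Registration]:
        # A site is due if it was never harvested or its interval has elapsed.
        return [
            r for r in self._sites.values()
            if r.last_harvested is None or now - r.last_harvested >= r.harvest_interval
        ]

    def mark_harvested(self, url: str, when: datetime) -> None:
        self._sites[url].last_harvested = when

    def by_discipline(self, discipline: str) -> List[Registration]:
        return [r for r in self._sites.values() if r.discipline == discipline]
```

A scheduled job could call `due(datetime.now())` once a day, harvest each returned site, and then call `mark_harvested`. Keeping the license flag on the record means the decision about serving preserved copies can be made per site later.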

I imagine I’m glossing over an entire body of literature on the topic. Hopefully someone will let me know where I’m wrong!

3 Comments

  1. Yes, such a license would be necessary.

    There’s another rights wrinkle as well: authors own their blog’s words, but they do not necessarily own the blog DESIGN, especially on third-party hosted services such as Blogger, TypePad, and WordPress.com.

    It would be enormously helpful if a delegation of concerned bloggers were to approach these services to ask for at least ONE rights-unencumbered design on their services, for purposes of archival.

  2. Gavin, I’m on the Blue-Ribbon Task Force on Sustainable Digital Preservation and Access; right now we have a bunch of economists looking at aspects of the problem, and a bunch of other folk like me thinking of use cases. Preserving the blogosphere (closely followed by the whole other social network space) is a major use case that we have identified that covers some of these issues of ownership and control.

    I personally think such an opt-in solution would be great, if we could get someone to step up to it. I made sure both the content and comments on my blog are covered by a CC licence, so it’s doable. CavLec is right that the design may be an issue; on the other hand I rarely see any blog’s design, since I read through NetNewsWire, so I’m inclined to think blogs represent an area where the content is primary and design secondary.

    Chris Rusbridge

  3. Blake Stacey says:

    I don’t think blog design and rights thereto are that big an issue. Content scraping would probably be done with RSS feeds, anyway, which makes most design customizations irrelevant. You’d have a bit of a headache from blogs which use things like embedded images to display mathematical equations — a common hack on physicists’ blogs, for example — but nothing a bit of clever programming couldn’t handle. Rights to the text and image content itself would be the biggest obstacle, I suspect. A decent opt-in scheme would probably handle the text part, but the way some people include images clipped from who-knows-where, an archivist might find themselves in a legal wrangle.

    (Incidentally, if one self-promoting link per comment weren’t enough, I also had a few remarks on the Opening Up Education review you posted at Open Access News yesterday.)
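
As a rough illustration of the feed-based scraping described in the last comment, the sketch below pulls each post's title, link, and body out of an RSS 2.0 feed and lists the images embedded in the body (for example, equations rendered as pictures) so an archiver could fetch or flag them separately. It assumes the post body arrives as HTML in the `description` element; many blogs use Atom or `content:encoded` instead, which this does not handle. The function name `parse_feed` is made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Dict, List


class _ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def parse_feed(xml_text: str) -> List[Dict[str, object]]:
    """Return one dict per <item> in an RSS 2.0 feed (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        body = item.findtext("description", default="")
        collector = _ImageCollector()
        collector.feed(body)
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "body_html": body,
            # Embedded images are listed separately: they need their own
            # download, and their rights status may differ from the text.
            "images": collector.sources,
        })
    return posts
```

Because the feed carries only the content, the blog's design never enters the archive; the image list is where both the equation-rendering headache and the rights question would show up.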
