Wednesday, September 18, 2013

26-Sep Hodge, Gail M. Best practices for digital archiving: an information life cycle approach. D-Lib Magazine, Volume 6 Number 1, January 2000.

37 comments:

  1. 1. The author states in “4.2.1.3 Archiving Links” that the most common practice is to archive only the links, without archiving the content they point to. But links can change and break, and an archive loses its point (or most of it) if it holds only links instead of the actual content. So I suggest that archivists, when archiving content, also archive the links by which that content is referred to. If all archives followed this rule, they could build a repository containing all the links (or all the versions of the links) to the content. Then even if a link fails, there is still a way to point to the content.

    2. As for preservation and hardware and software migration, shouldn’t the hardware and software themselves also be archived/stored as the carrier of the information? Furthermore, shouldn’t the carrier and the information be regarded as one entity? Even under emulation the files can still be read (the usability level), but the experience is already quite different from reading the file on the original hardware and software (the user-experience level).

    3. For “Transformation vs. Native Formats”: if an archive decides to transform its whole collection into one standard format for ease of access and management (for example, so it can use one tool instead of different tools for different formats), I wonder whether the archive would also keep the original native formats?
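    Not from the article, just a toy Python sketch of the link-versioning idea in point 1 (all names here are invented for illustration): each capture of a link stores the content alongside a checksum and timestamp, so the archive can still resolve the content even after the live link breaks.

```python
import hashlib
from datetime import datetime, timezone

class LinkArchive:
    """Toy archive that stores link content alongside the link itself,
    keeping every captured version so a broken URL can still resolve."""

    def __init__(self):
        self.versions = {}  # url -> list of {captured, sha256, content}

    def capture(self, url, content):
        record = {
            "captured": datetime.now(timezone.utc).isoformat(),
            "sha256": hashlib.sha256(content).hexdigest(),
            "content": content,
        }
        self.versions.setdefault(url, []).append(record)
        return record

    def latest(self, url):
        # Even if the live link fails, the archived content remains.
        return self.versions[url][-1]["content"]

archive = LinkArchive()
archive.capture("http://example.org/page", b"original text")
archive.capture("http://example.org/page", b"revised text")
print(archive.latest("http://example.org/page"))
```

    A real system would add crawling, deduplication, and access controls, but the core point stands: a link repository that keeps content versions can survive link rot.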

  2. 1 - Internet/Web archiving is insane. While the Internet Archive does a decent job of following and crawling links, it doesn't do so comprehensively, fails to gather most external links and doesn't account for things like MySQL injections into URLs that could pull up websites, nor does it archive things like pull-down menus that retrieve or display database/search results. There are also ongoing issues of crawling etiquette, such as only crawling materials you have IP rights to (which the IA doesn't do for obvious reasons). How can smaller archives make use of internet/webpage archiving in a realistic way? Is it practical for archives to make use of web-crawling, or should we just let the WayBack Machine handle it?

    2 - With all of these standardized formats, I'm wondering about "migration on demand," which is becoming a more popular policy for archives dealing with obsolete digital media. If we migrate on demand, do we run the risk of never being able to access an object, or are we doing a better job of preserving its authenticity instead of generating copy after copy after copy? How do all of these proprietary formats affect the future of preserving born-digital materials?

    Replies
    1. 1. That's the problem! Not archiving external links doesn't seem to truly capture the extent of an archived item. But where do the links end? How far should we go with it? Crawling etiquette is a frickin' minefield.

  3. 1. In section 4.1 of Hodge’s article, she talks about the creation of objects meant for digital archives and the metadata associated with them. She provides an interesting idea regarding the collection of an object’s metadata when she points out that, “Many project managers acknowledged that the best practice would be to create the metadata at the object creation stage”. This is a different solution than the one discussed in Evans’ previous article, “Archives of the People, By the People, and For the People,” in which he suggested a Wikipedia-esque approach to managing archives. In the case Hodge presents, the actual creator would be providing (at least some of) the metadata and then passing his/her work off to the archive. Would this method stop a backlog before it started? Is it possible to catalog this way, or would catalogers spend just as much time checking a creator’s metadata work as they would doing the cataloging themselves?


    2. Hodge also discusses two different gathering approaches when selecting material for an archive – a “hand-selected” approach and an automatic one (section 4.2.2). As she explains, a “hand-selected” approach involves a person actually reading over electronic material to decide if it’s a good fit for the archive, while an automatic approach involves a robot or program that scans and collects websites for the archive. Hodge specifically points out that intellectual property rights can be a problem when using the automatic gatherer, as, while a person hand-selecting material for an archive “seeks permission from the copyright owner before copying the resource for the archive”, “automated system[s] . . . do not contact the owners” (4.2.3). Are intellectual property rights the only concerns when using an automatic system? Is it better to use a person to select archive materials? Or does that waste too much time?


    3. In section 4.6.2 of her article, Hodge discusses access rights in regard to digital archives – “What rights does the archive have? What rights do various user groups have? What rights has the owner retained?”. She points out that, whatever access is granted to the users and the public, “there is concern among image archivists that images can be tampered with without the tampering being detected”. This reminds me again of Evans’ article, “Archives of the People, By the People, and For the People”. If we grant the public access to digital archives, especially access as broad as Evans suggests, what’s stopping someone from tampering with or stealing an image or text? How will the archive even know that it has happened? What protects the objects? Hodge spoke about “encryption, watermarks, and digital signatures”, but will those be available for every object?
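    One common answer to the tamper-detection worry in point 3 is a fixity check: record a cryptographic checksum when an object enters the archive, then recompute and compare it during later audits. A minimal Python sketch (my own illustration, not the article's method):

```python
import hashlib

def fixity(data: bytes) -> str:
    """Checksum recorded when the object enters the archive."""
    return hashlib.sha256(data).hexdigest()

original = b"archived image bytes"
recorded = fixity(original)  # stored in the object's metadata at ingest

# Later audit: recompute and compare to detect tampering.
tampered = b"archived image bytes (modified)"
print(fixity(original) == recorded)   # True: unchanged object passes
print(fixity(tampered) == recorded)   # False: altered object fails
```

    A checksum detects tampering but cannot prove who made the original; digital signatures add that attribution, at the cost of key management for every object.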

  4. 1. The author talks about ways to make the archiving process more efficient. She points out that attention should be paid to issues of consistency, format, standardization and metadata description at the very beginning of the information life cycle. At the stage where information is created, how is it possible to format or standardize it? My understanding is that these processes can only be done at a later stage of the information life cycle, when the information is delivered. How can the processing take place during creation?

    2. The author says that most organizations archive the links but not the content of the linked object. Is it possible that the NLA’s decision to archive the contents of a link based on some selection guidelines has drawbacks? There can always be important objects behind links that will be lost when those links don’t meet the selection guidelines.

    3. Questions concerning privacy and “stolen information” have become ubiquitous, and in this case rights management can go as far as restricting access as appropriate and making security-level changes. Is this wholly sufficient to protect important archives? And how accurate and efficient can the metadata for managing encryption, watermarks, etc. be?

  5. 1. I know that digital archiving is seen as a growing priority since "the time between manufacture and preservation is shrinking", as Hodge writes on page 1, but my overarching question is "Why?" I realize that is incredibly unspecific, but don't we have to draw the line somewhere? All of these different institutions and people keep thinking and proposing that we save this, that, and the other thing. Have we become information hoarders? I suppose I just need a dumbed-down list - like a top 10 reasons - why we need to be performing digital archiving. I understand the basic principles and the basic needs for doing this, but in the grand scheme, how is it relevant?

    2. On page two of the article, Hodge briefly addresses the issue that many best-practice standards for digital archiving are outdated or misunderstood. It seems that she gave a general overview of how different institutions are archiving digitally across the globe, but provided no real set of best-practice standards. Will organizations continue to archive whenever and however they want, or is it truly possible to create a widely accepted set of standards? Then cross-institutional and even cross-cultural interactions would be easier and more feasible.

    3. I don't quite understand the reasoning behind having automated gathering software collect the material that will be digitally archived. Is this just the notion of quantity vs. quality? For certain projects, such as Brewster Kahle's Internet Archive (p. 6), I understand the reasoning for having all content, but for other projects I just don't see the use of collecting everything. Doesn't that run counter to the entire idea of having selection standards, or are they just selection standards of a different nature? Maybe my views on archives are too antiquated, but what is the relevance of archiving this website at this particular time? Who is using this information?

    Replies
    1. 1 - The difficulty with the "why?" is that I could easily argue that paper is a waste of time and space (server space is much cheaper) so why not just scan all of the paper we have and toss it?
      Slightly flip statements aside, I view saving born digital materials as important as saving other materials. Salman Rushdie's most recent novels are not available as paper manuscripts - they were all saved on his hard drive. Most administrative and government records are now born-digital as well. These entire systems (e.g. NARA's government email archive) are both legally relevant and historically relevant. Someone's email account will give me a much deeper insight into how they lived and what they were like than a few impersonal birthday cards from family members. If most of our daily interactions, work, and subjective experiences are in one way or another documented online, and one of the facets of archiving (in my opinion) is to document the human experience, then why would we just say "oh, well, digital born stuff is hard to preserve" and leave it at that?
      There's also the ongoing creation of born-digital art and media, which I would like to see preserved. Ben Fino-Radin, the archivist for Rhizome, a NY based art collective, has an excellent paper out detailing what goes into preserving digital works of art: http://rhizome.org/editorial/2011/aug/5/keeping-it-online/

      And, one of my favorite "why save digital materials?" kerfuffles has got to be the Library of Congress Twitter archive. I think this blog post does an excellent job of explaining why we should care, although there is a lot of swearing, so if you'd rather not, I'm leaving a list of Library of Congress blog links with some very interesting comment sections down below. http://ascii.textfiles.com/archives/2538

      http://blogs.loc.gov/loc/2010/04/the-library-and-twitter-an-faq
      http://blogs.loc.gov/loc/2013/01/update-on-the-twitter-archive-at-the-library-of-congress/


  6. 1. In this paper, one thing seems to have been put aside: the iterative nature of the information life cycle. For example, the author emphasized that formatted metadata should be created at the beginning of the information life cycle. However, how could anyone design a complete metadata format that needs no changes afterwards? From my point of view, the information life cycle does not only contain a top-down process, but has back-to-front activities as well.

    2. Hodge wrote this paper around 1999, and she herself acknowledged in the paper the speed of technological advances. Given that, it is doubtful whether the framework Hodge created 14 years ago can be applied to universal digital archiving nowadays. In 1999 there was not much information online and most records were stored offline. However, information on the web exploded after 2000, and even Google's spiders cannot crawl every link on the Internet now. In this case, is it possible to build automated archiving software to collect the data on webpages, for now and for the future?

    Replies
    1. 3. The author raises a concept, "migration on demand", as a means of handling outdated media. However, I'm not sure that today's "demand" is enough to cover future needs, or to ensure that current major needs are taken into account. Who is the creator of the "demand"? And how can we determine that the demand is right both for now and for the next decade?

  7. 1. Many of the guidelines for selecting - the extent of the source that should be preserved, whether linked content should be included, how often the archive should be updated and whether all updates should be kept - are very subjective. In terms of issues of culture – whose culture? Accepted, popular culture? What about things outside of that - pornography or possibly offensive language? How does an organization with even a pair of employees come to the same agreement on all items?

    2. Since URLs can change (especially full URLs), I would think it would be required to also use an identification number. This could also prove helpful with versions of the same URL.

    3. Some of the project managers noted that they keep the original digital format as well as the transformed digital format. This is going to get very expensive and may not be possible to maintain. Do we have to keep an "original" in that original data format, even though we can no longer use that format? Or is it satisfactory to just note each digital format as part of the metadata and note when the transition occurred?
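    Points 2 and 3 above can be combined into one small design: give each object a stable identifier that is decoupled from its URL, and record URL changes and format migrations as metadata history. A hypothetical Python sketch (the registry, field names, and example formats are all my own, not from the article):

```python
import uuid

# Each object gets a stable identifier; URLs and format migrations
# are recorded as metadata rather than used as the key.
registry = {}

def register(url, fmt):
    oid = str(uuid.uuid4())
    registry[oid] = {"urls": [url], "formats": [fmt]}
    return oid

def note_url_change(oid, new_url):
    # The identifier survives even if every URL eventually breaks.
    registry[oid]["urls"].append(new_url)

def note_migration(oid, new_fmt):
    # Keep the full format history alongside (or instead of) the original file.
    registry[oid]["formats"].append(new_fmt)

oid = register("http://example.org/report.wpd", "WordPerfect 5.1")
note_url_change(oid, "http://example.org/archive/report.wpd")
note_migration(oid, "PDF/A-1b")
print(registry[oid]["formats"])  # full migration trail, oldest first
```

    Whether the original bitstream must also be retained is a policy question; this only shows that the transition trail itself is cheap to record.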

  8. 1. This paper suggests several good practice points for digital archiving, but limits its examination almost entirely to scientific digital archives. These may have been predominant in 2000, prior to the Wayback Machine, Wikipedia, Project Gutenberg, and several other efforts to archive large sections of the web. Are the general suggestions of semi-automated metadata and formatting updates with emulation applicable to these later projects?

    2. Format emulation is one technical hurdle mentioned for ongoing archiving efforts. Has open software successfully filled some of these gaps, through efforts such as OpenOffice and DOSBox to permit access to older file formats?

    3. The question of DRM is not much addressed in this paper, and was not the same common issue that it is now. Given current US law, it can be illegal for an archivist to try and circumvent deliberate limitations on access to a given item for the purpose of preservation. How are these problems being addressed in modern digital archives?

  9. 1) I thought it was interesting that Hodge, writing in 2000, expressed skepticism that the PDF format would ever be widely adopted by digital archives “because of its proprietary nature” (11). In my experience retrieving information from digital archives, the PDF format has become pervasive, and the fact that Adobe owns the format is mitigated by the fact that PDFs can be read fairly easily in programs that are not produced by Adobe. Is the PDF as pervasive within archives as it is outside of them, or are digital archivists still skeptical of the format’s longevity?

    2) The article discussed the problem of maintaining an archive of digital materials whose file formats or programs of origin will go obsolete every few years. How do digital archivists deal with a similar problem: bringing a document into the archive for the first time when its file format has long since gone obsolete? Are there techniques for retrieving the information from such documents, short of obtaining the obsolete programs and equipment? I’ve read elsewhere that this is a real problem and wonder how archivists have begun to address it.

    3) Hodge anticipates some of the problems raised by the massive proliferation of digital and internet-based information, but since the article was written in 2000, the consequences of the internet revolution had not yet arrived in full force. Are the digital archiving practices she outlines still viable now that the internet has grown so exponentially? How have digital archivists since 2000 dealt with the proliferation of digital information, especially in terms of screening it for quality?

  10. 1) The paragraph describing the NLA's PANDORA Project says "Other items are archived on a selective basis 'to provide a broad cultural snapshot of how Australians are using the Internet to disseminate information, express opinions, lobby, and publish their creative work.'" My question stems from the admission that the library is selective in its choices -- why not have a formal method through the Internet for Australian citizens to submit those kinds of items of their own volition, instead of scouring the web? I think a project like that would be something even irregular users of anything the NLA offers could participate in, and by having the public plug some of the holes in the cultural record, the NLA would be free to select other items to fill the remaining gaps.

    2) Section 4.2.3 says that the Swedish and Finnish national library projects have an automated system for their data capture, and do not contact owners of intellectual property they archive. Has this activity ever adversely affected them? Has anyone tried to take them to court to wrestle complete (or as complete as can be) control over their online intellectual property to force these libraries to delete them from their records?

    3) In Section 4.5.1 two sections about technology struck me. The first is "...the American Chemical Society and the U.S. Environmental Protection Agency purchased Oracle not only for its data management capabilities but for the company's longevity and ability to impact standards development." The second was "... there is no policy that requires the manufacturers to deposit the emulation information. The best practice for the foreseeable future will be migration to new hardware and software platforms; emulation will begin to be used if and when the hardware and software industries begin to endorse it." Is this perhaps an area the archives should be pushing back more on? Shouldn't they be the ones having an impact on these companies, instead of taking what is offered and hoping newer iterations will continue to suit their purposes rather than the software evolving into a different animal?

  11. 1. The study identified over 30 projects and selected 18 of them as the most “cutting edge”. I wonder whether the author had any standard for identifying these projects. The author mentioned that “primary attention was given to operational and prototype projects involving scientific and technical information at an international level”; however, I am not convinced that opinions from project managers alone are reliable enough. Would it also be practicable to survey the experts who identified these projects, in order to identify emerging models and best practices for digital archiving?

    2. In part 4.0, the author discussed digital archiving in the framework of information life-cycle management and summarized the life cycle in six parts: creation, acquisition, cataloging/identification, storage, preservation and access. However, I think the author forgot an important part of the information life cycle: disposition. Although some digital archives never lose their value, the value of most archives tends to decline over time until they have no further value to anyone for any purpose. As importance and usage decline, an archive's life cycle transitions to a semi-active and finally to an inactive state. What’s more, speaking of archiving links, archivists also face the problem of “link rot”.

    3. The author discussed the preservation of “look and feel”, saying it is challenging in the text environment and even more challenging in the multimedia environment. I wonder in what way archivists preserving multimedia could maintain its “look and feel”. Does it mean preserving a whole video rather than only a description? Since people create a large amount of multimedia every day, could those models for object-based archiving be put into practical use to store multimedia?

  12. 1. In the metadata section, Hodge writes that in the future libraries may receive metadata directly from the publisher, which will save cataloging time and costs. This seems like it would be very helpful, so I am wondering whether this practice has come into play yet. If not, what barriers are preventing it?

    2. Why are hardware and software companies not supporting emulation and not supplying the valuable information that would allow archivists to recreate proprietary software? Is this still a problem today? Are there financial or legal reasons why companies are not participating?

    3. The author describes the time and frequency with which digital objects must be migrated to new technologies - every 2-10 years. What is the long-term time and cost comparison for saving digital files versus paper files? Does digital archiving take more time than traditional paper-based materials? If so, should archivists have higher restrictions on what digital files they will accept in an archive, or is the cost of space and upkeep not a concern at this point?

    Replies
    1. RE: 2. If the statement is still true that companies generally do not support hardware and software emulation, I would start my guessing on the money-side of things. What revenue or value for the company and its customers can come of such support? If the archivist could create sizable revenue and/or value for such companies, then the tide may quickly change.

  13. 1. "Groups and individuals who did not previously consider themselves to be archivists are now being drawn into the role, either because of the infrastructure and intellectual property issues involved or because user groups are demanding it." Similarly, the author of "Information Architecture" published by O'Reilly makes the argument that anyone who works with folders, files, and documents in today's computer operating system environments has become a librarian by virtue of having to organize large amounts of information and make it usable as a resource.

    2. Regarding "4.6.2 Rights Management and Security Requirements", how is metadata such as frequency of views, viewer identity, or viewer authority (think PageRank for users) used to help indicate whether or not rights or security requirements have been followed?

  14. 1. On page 3, Hodge suggests that the creator of an information product "may be involved in assessing the long-term value of information." Would it be helpful for the creators of internet content to judge the importance of their own work in such a fashion? Can we develop a system where individuals assess the cultural relevance of their own blog posts, twitter feeds, etc.? If so, the system may resemble the copyright phenomenon Creative Commons where individuals consciously delimit how they will allow their own work to be used by others. Should we be thinking about the perennial nature of our internet contributions in such a way?

    2. Since 2000 when this paper was published, have we agreed upon the limits or boundaries of a particular digital work? If, as Hodge suggests, we do not save the links a page or a website connects to are we fulfilling our role as archivists to preserve "original order"?

  15. 1. This article starts out by referencing a study published by the International Council for Scientific and Technical Information (ICSTI). My patience for acronyms is limited. How many different, relevant organizations are there? Do information professionals even know about, much less stay abreast of, the pontifications of these various groups? How do we prioritize them? Is there an acronym/organization cheat sheet? Should I make that the focus of my literature review? Ha!

    2. In their discussion of archival policies relating to Australian Internet publishing, the authors state, “Other items are archived on a selective basis ‘to provide a broad cultural snapshot of how Australians are using the Internet to disseminate information, express opinions, lobby, and publish their creative work.’” They do not attempt to address the question of who is in a position to determine value or prioritize items for archival. Is this question important? If so, based on our readings and discussions up to this point, how might we go about approaching the inherent questions regarding values, users, domains, and objectivity?

    3. I found section 4.5.3 especially interesting: “According to Canadian Copyright Law, an author’s rights are infringed if the original work is ‘distorted, mutilated or otherwise modified.’ After much discussion, the NLC decided that converting an electronic publication to a standard format to preserve the quality of the original and to ensure long-term access does not infringe on the author's right of integrity. However, this assumption has not been tested in court.” What is the state of this legal quandary in the United States? What solutions have been found in Canada or elsewhere since publication of this article in 2000?

  16. 1. "Scholarly publications of national significance and those of current and long term research value are archived comprehensively. Other items are archived on a selective basis..." The first part of this quote should not be difficult to implement. It should be a fairly easy task either to get the journals to follow some kind of archival guidelines or to monitor the journals and automatically archive the publications. The last part of this quote, however, is completely different. It may be possible to automatically archive most selected items, but the idea that a person or people have to decide what to archive and what not to is impossible, even when the Internet is narrowed down to only Australian-specific artifacts.
    2. Overall, Hodge seems to be very optimistic about the ability to archive digital objects. Hodge wrote this article in 2000, and since then the number of hosted websites has grown tenfold. I would be very interested to hear what Hodge thinks about the ability to archive all these sites and whether or not she is still as optimistic as she seems in this article.
    3. With the advent of The Cloud and systems like Amazon Glacier and other archival cloud-based storage solutions, I would think that the concern over migrating to newer/better storage media has been greatly reduced. The outsourcing of data storage to dedicated data centers is something that all archives of digital objects should consider. Not only is it less expensive than local solutions, the data is also replicated for durability: in some cases 3-5 copies of the archived data exist at any given time, often in different locations. These attributes make storing digital archives in The Cloud worthy of consideration.

  17. 1. The author identified the best practices for the framework of the information life cycle – creation, acquisition, cataloging/identification, storage, preservation and access. I found that this framework can also be applied to traditional archiving, so where do the discrepancies occur? Why did the author say digital storage media have shorter life spans? Which phase has been shortened?
    2. When talking about Acquisition and Collection Development, the author mentioned that there are a lot of different policies at different institutions. Is it necessary to find an approach that mediates the discrepancies to make them consistent? If yes, how?
    3. In the whole information life cycle, which phase is the most important (or has the strongest influence on the whole life cycle)?

  18. 1. This article raised some very interesting questions for me concerning the use and archiving of links, including URLs and other identifiers. I have often thought about the increasing trend of digital eBooks including URLs embedded within the text as a jumping-off point for readers, and how that might affect future readers when those URLs become broken. For everyone in archives, URLs must be a looming problem, and I am curious how the field thinks about URLs and how it might deal with this problem when archiving texts.

    2. I was very keen on the author's insistence on collecting metadata at the source of file creation, and I wonder whether there are any guidelines for doing this. In the case of camera owners, I know that certain types of metadata are saved with the file, including lens, aperture, shutter speed, etc., but I am curious how other types of metadata are collected upon file creation and whether anyone has attempted to unify this process to make it easier for archivists.

    3. I am curious about how archives plan to deal with digital artifacts from creators who use a swath of unexpected, unsupported formats in their practice. Recently David Hockney has started to paint with his iPad and many writers use complex software programs to help them organize and edit manuscripts on their computers. How does an archive plan to deal with these format concerns without degrading the original format as intended by the author?
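    As a small illustration of point 2, metadata capture at file creation can be as simple as recording technical facts the operating system already knows the moment the file lands on disk, analogous to the EXIF data a camera embeds automatically. A minimal Python sketch (the field names are my own, not a standard schema):

```python
import hashlib
import mimetypes
import os
import tempfile
from datetime import datetime, timezone

def creation_metadata(path):
    """Capture basic technical metadata at the moment a file is created."""
    stat = os.stat(path)
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "mime_type": mimetypes.guess_type(path)[0],
        "sha256": digest,
        "captured": datetime.now(timezone.utc).isoformat(),
    }

# Demo: create a file and capture its metadata immediately.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "note.txt")
    with open(path, "w") as fh:
        fh.write("hello archive")
    meta = creation_metadata(path)
    print(meta["mime_type"], meta["size_bytes"])
```

    Unifying this across creators would mean agreeing on which fields are mandatory and embedding the capture step into the creation tools themselves, which is exactly where the guidelines question bites.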

  19. 1. The scope of the NLA’s PANDORA Project is only to preserve Australian Internet publishing. This statement seems reasonable, but we don’t know how it could be accomplished. As we know, the Internet is borderless.

    2. It is mentioned in the article that most organizations archive the links but not the content of the linked objects. The Internet is dynamic, so what can we do if the content has changed? That would render the former archive useless.

    3. “However, work remains to identify the specific metadata elements needed for long-term preservation as opposed to discovery, particularly for non-textual data types like images, video and multimedia.” My question is: how can we generate metadata for images, video and multimedia nowadays?

  20. 1. In the study-methodology part, the author states that ""digital archiving" was defined as the long-term storage, preservation and access to information that is "born digital" (created and disseminated primarily in electronic form) or for which the digital version is considered to be the primary archive". With what study purpose was this definition created? Why did she exclude the digitization of material from another medium?

    2. The libraries must make their own decisions to protect the intellectual property, if the law in their countries has not caught up with the digital environment. That is what the author believes. But, what should a good law, which offers enough protection to intellectual property, be like? What aspects should be considered?

    3. The author raises her concerns about the difficulty of extending legal deposit to network publishing, given that any individual with access to the Internet can be a publisher, and she introduces two general approaches to gathering relevant Internet-based information: hand-selected and automatic. My third question is: do archivists have a responsibility to identify the truth and value of materials when selecting the relevant ones?

  21. 1. Hodge writes, in the beginning of the article, that librarians and archivists “must now look to information managers from the computer science tradition to support the development of a system of stewardship in the new digital environment”. This article, though, was written in 2000. It seems, at this point, that the emphasis on archivists and librarians learning at least some of these skills is much greater. Is this true, especially in the context of the iSchool? Are we more self-sufficient as a profession when it comes to digital archiving?

    2. In discussing the creation stage of the lifecycle and its importance to digital archiving, Hodge articulates that one way to avoid issues in digital preservation is by making sure the creator is aware of formatting, standardization, metadata description, etc., as well as the long-term value of their documents. Does this seem like a lot to expect from a creator, especially since creators vary from case to case? A corporation or other business might be much more aware of these concepts than, say, an author who has his own means of creating digital content.

    3. Hodge writes towards the beginning, in referring to the study’s participants, that “technology was of secondary interest to the understanding of policy and practice”. This is illustrated when talking about how to approach the archiving of links within digital text. It more or less seemed like it was up to the repository to determine. Do you think that personal policy still reigns supreme or as technology has progressed since 2000, are we focusing more on standardization and interoperability in digital archiving?

    ReplyDelete
  22. 1 On page 6, regarding gathering approaches, there are two ways to gather relevant Internet-based information: hand-selected and automatic. The difference is clear: the hand-selected method relies on selection guidelines and human judgment, while the automatic approach relies on computer algorithms. Although the hand-selected method may be more subjective, when facing the variety of websites designed by humans, especially foreign sites requiring translation, would the automatic approach work well enough to satisfy our needs?

    2 On page 10, the decision between a transformed standard format and the native format should depend on why we archive the documents in the first place. In my opinion, the answer is that we need to use the information we have archived, so we should choose whatever is most convenient for retrieval. Transformation seems to have the advantage here. As for the copyright issues of native formats, I think we should find a way to resolve them, perhaps in a manner more like the references we use in papers.

    3 On page 11, the Internet offers us a varied world that is at the same time difficult to archive because of its large number of formats and media. What we have done is narrow them down to a few formats and media. The problem the article raises is that there is less consistency in the modeling, simulation, and special-purpose software areas; much of this software continues to be project-specific. Would it be possible to archive only the important data for retrieval, with links for access to the specific software, preserved on a cloud server, for when we need the details?

    ReplyDelete
  23. Section 4.1 mentions that creators are not only human beings but also sensing equipment, and then mentions creators estimating the value of the information they have created. How would a piece of equipment judge the value of a specific piece of information it has sensed? I would assume a human element would need to be involved in the process; otherwise the straight binary assessments of value would present problems moving forward. Or would that kind of process actually be beneficial for creating an archive?

    In dealing with archiving links, the article mentions archiving only the links themselves and not the content associated with them. For websites that use links to outside sources as part of citation or notation, such as Wikipedia in its “Notes” section, how might not archiving the linked content affect the archiving of a single web page?

    Given the nature of storage mediums becoming smaller and larger and the easily corruptible nature of the data present on these mediums, the article mentions Oak Ridge as migrating to new storage every 4-6 years. With the budget available to places such as Oak Ridge it seems relatively easy to give the necessary time and attention to these moves. In contrast how would smaller archives deal with migrating data on a smaller scale than Oak Ridge and with a much smaller budget?

    ReplyDelete
  24. 1. With so many “born digital” records in the world, how safe and everlasting are these documents really? Because honestly, the only thing that keeps these digital documents accessible to mankind is electricity. Is it really a good idea to vastly increase the size of our digital repositories and throw away the paper records? I guess it varies, but how do institutions avoid hoarding paper documents just in case they experience system failures or worst case scenario data loss and are left with nothing?

    2. Under section 4.2.3. Intellectual Property Concerns, Hodge says that “in many countries, the law has not caught up with the digital environment and libraries must make their own policies.” What happens when a law passes and a library’s policies do not agree with the new rule? Do they have to change their existing policies to agree with the law or are they gradually grandfathered in?

    3. Under section 4.5 Preservation, Hodge says that “the study showed there is no common agreement on the definition of long-term preservation; the time frame can be thought of as long enough to be concerned about changes in technology and changes in the user community.” I feel that much of archival institutions’ progress can be marred by the idea that “someday this will be someone else’s problem.” How can this attitude be avoided, considering that the flow of information will never stop and there will probably always be a constant backlog of information?

    ReplyDelete
  26. 1. I think a good way to create metadata is metadata auto-extraction technology. It makes the process easier and can solve the problem that ‘much of the metadata continues to be created by hand and after-the-fact’ (section 4.1). But there are also limitations, the most important being that we must preprocess the data sources and weed out those with defects in format or content. How can this problem be solved? Better software? Or unifying the formats of the data sources?

    2. The author mentions the PANDORA Project of the NLA in section 4.2.1.1 and says ‘the NLA has formulated guidelines for the Selection of Online Australian Publications Intended for Preservation by the National Library of Australia’. I wonder how to decide which items should be archived comprehensively and which should not. Are there any standards?

    3. I really wonder which format should be used in preservation, transformation or native format (section 4.5.3). I guess the best way is to use both of them. I think I know the reasons why we should use transformation. But I cannot understand why ‘the projects reviewed favored storage in native formats on the whole’.
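    To make the auto-extraction idea in point 1 concrete, here is a minimal sketch using only Python's standard library (the function and field names are my own invention; real systems use much richer extractors, and this is exactly the step where defective sources would need preprocessing first):

```python
import mimetypes
import os
from datetime import datetime, timezone

def extract_metadata(path):
    """Sketch of metadata auto-extraction: pull basic technical
    metadata straight from the file itself, with no hand entry."""
    stat = os.stat(path)
    mime, _ = mimetypes.guess_type(path)  # guess format from the extension
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "format": mime or "application/octet-stream",  # fallback when unknown
    }
```

    Even a sketch like this shows the limitation raised above: a file with a wrong extension or corrupted content yields wrong or empty metadata, so the sources have to be cleaned up before extraction can be trusted.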

    ReplyDelete
  27. 1 - In section 4.1, the authors state "All project managers acknowledged that creation is where long-term archiving and preservation must start." I agree, and think this is an important step. That said, this charge places a different kind of responsibility for the digital information on the creator, one that I think could (and perhaps does) go widely overlooked or unassumed.
    2 - Section 4.2.1.4: is it best practice to archive something while it is ongoing or live? If something new is created, or some modification has been made, shouldn't that piece of information in its own right be archived?
    3- 4.2.3: Examples of real life intellectual property concerns would be helpful. I realize intellectual property is a hot topic, but I'm not really sure I fully understand it in this context.

    ReplyDelete
  28. 1. In this article the author discusses the topic of metadata generation in the creation of a document. She states that the metadata generated at the creation of a document is usually done by hand but that recently companies are including metadata generation in the architecture of the program that creates the document. Is this a good idea? What are some of the problems that could arise by doing this automatically at the creation of the document?
    2. In discussing the methods used by several agencies to gather data for archiving, the author mentions that some, like Sweden's national library, use bots to scour the web for information. Should one trust a program such as a bot to do this type of work? Is it possible that a bot would miss data that should be archived, and if it does, then how could you tell?
    3. In this article the author mentions the idea of hardware and software migration and emulation. Hardware and software migration is the idea that documents should be moved from older systems and formats to newer ones to keep up with technology and that to do so changes must be made to the document. Emulation on the other hand is the creation of a “shell” of code around a document that translates it so that it can be used on new systems and formats. Each of these two methods has their own benefits and drawbacks. Which of these two approaches is the easiest to implement or is it situational?
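    The bot question in point 2 gets clearer with a sketch of how a harvesting crawler actually works. This is a minimal illustration using only Python's standard library (class and function names are my own), and its blind spots are precisely what a real bot can silently miss: links generated by JavaScript, form submissions, and database-driven pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags: the core of any harvesting bot."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))

def crawl(start_url, max_pages=10):
    """Breadth-first harvest of pages reachable by plain hyperlinks.
    Anything not expressed as a static <a href> is invisible to it."""
    seen, queue, archive = set(), [start_url], {}
    while queue and len(archive) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = urlopen(url).read().decode("utf-8", errors="replace")
        archive[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        queue.extend(parser.links)
    return archive
```

    Answering "how could you tell what it missed?" is hard for exactly this reason: the crawler only knows about URLs it has already seen in static markup, so its coverage gaps are invisible from inside the crawl.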

    ReplyDelete
  29. 1. I found the section on rights management and security requirements especially interesting. This summer at the annual conference of the American Library Association I heard a presentation on social media sites stripping embedded metadata about copyright from photographs. At the time I thought the concept of embedded metadata within a digital object seemed like a great way to keep related information together, but if it can be stripped and edited how can this information be properly saved and preserved? It reminded me of the concept in the article about how it was hard to determine if a photograph had been altered or not.

    2. In the section on creation I was intrigued by the idea of preservation indicators and having the creator/researcher provide an indication of the long-term value. While the paper stated that this would not take the place of a normal retention schedule, I wonder what impact a preservation indicator by the creator would have on the future use of the document or information? We often create things with no concept of how they will be used in the future; if preservation indicators become a way of filtering information, can creators have an adverse effect on the information they create if they don’t see its potential future value?

    3. Having a strong interest in cataloging, I saw the concept of creating metadata at the point of creation as a helpful and interesting prospect. It reminded me of last week’s article about creating short records for archives to make them accessible. I would strongly support the use of creator-derived metadata to produce these short records, because they would be meaningful to the creator and to those seeking the information, and could be used for creation of a fuller cataloged record. I have cataloged many items with creator-derived metadata, and while it never meets the most-used cataloging standards, the information is extremely helpful in providing a controlled vocabulary for subject searching and a sense of how the information can be used and what aspects need to be emphasized. I think it would be useful for all information organizations to consider asking for creator-supplied metadata, so this information could help thwart the ever-present backlogs in libraries and archives.

    ReplyDelete
  30. 1. Hodge's article assumes the information life-cycle as: creation, acquisition, cataloging/identification, storage, preservation and access, yet other authors did not use this particular version. Has this life-cycle become the standard?

    2. The author states that in the creation process, the creator should be "involved with assessing the long-term value of the information." If this were the case, wouldn't it be possible that there might be a degree of bias on the part of the creator?

    3. Emulation was mentioned by Hodge as an alternative to migration. This is an intriguing concept. Describe in-depth how emulation works and also give reasons why it has yet to be employed.

    ReplyDelete
    Replies
    1. In response to #2, I had the same thought. In a traditional library, the value an item adds to a collection is normally assessed by a particular unit or committee, with little deference to the creator of the work. Is it reasonable to give this much deference to an author of digital content?

      Delete
  31. 1. The idea of archiving only a hyperlink to a document, rather than the text of the document itself, baffles me. I understand the constraints of storage space and resources, but if one's intent is to keep an archive of digital resources, shouldn't the resource itself be saved, rather than a link that could later be "broken?"

    2. The author states that "creation is where long-term archiving and preservation must start." With the ease of online publication and the very limited knowledge (or care) of most users of proper practices or thoughts toward long-term preservation, how realistic is such a claim?
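    On point 1 above: archiving the resource behind a link, rather than only the link, can be as simple as fetching the bytes and recording a fixity checksum alongside the URL. A minimal sketch with Python's standard library (the record fields and function names are my own invention, not anything the article prescribes):

```python
import hashlib
from urllib.request import urlopen

def make_record(url, body):
    """Build an archival record for linked content: the bytes themselves
    plus a fixity checksum, not just the (fragile) URL."""
    return {
        "url": url,
        "content": body,
        "sha256": hashlib.sha256(body).hexdigest(),  # detect later corruption
        "size_bytes": len(body),
    }

def archive_link(url):
    """Fetch a link's target and record it (network access assumed)."""
    return make_record(url, urlopen(url).read())
```

    If the link later breaks, the archive still holds the content, and the checksum lets future custodians verify that the stored copy has not silently changed.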

    ReplyDelete
  32. 1. How do you archive a website?

    2. “Because of the speed of technological advances, the time frame in which we must consider archiving becomes much shorter. The time between manufacture and preservation is shrinking.” I feel like this makes perfect sense and yet I don’t exactly follow. Why does advancing technology mean that we have to preserve things more quickly now?

    3. I was intrigued by the “rigorous media migration practice” in place at the Atmospheric Radiation Monitoring System mentioned in the storage section of the article (section 4.4). I was impressed that they fully admit that migrating to new technologies every 4 to 5 years will eventually mean that the effort will become “nearly continuous.” Doesn’t that sound a little absurd? If an archivist is still migrating objects to a new technology when the next technology comes out, what is she to do? Finish migrating the objects to the current technology and then turn around and migrate them again? Or would she stop halfway through and switch to the newer technology? And then wouldn’t this process start to pile up on itself after a while? I don’t know. Maybe archiving is already just a race to keep up with updated preservation practices…

    ReplyDelete