As the Future Unfolds, What about the Past?

By Mary Waltham, SPI Seminar, March 7th 2002

For further information contact me by e-mail at:-


STM Publishing  (Slide 2)

·         Market size estimated to be $2.5 billion per year and growing (source: Salomon Brothers UK equity Research Report, February 1997, 40)

·         Some 2,000 research libraries account for a large proportion of business

·         In 1998, less than 5% of revenues came from online publications; this is predicted to rise to 70% over the following 5 to 10 years.

·         Growth of scholarly information is greatest across the sciences (+13%/year for STM; source: NSF).


What is a journal back-file/archive? (Slide 4)

Used here to mean the online archive of materials published, predominantly in journals, before the online journal was launched: say, the mid-‘90s.

Does not include all journal correspondence – can also be referred to as archival.

Why is a back-file central to researching the literature?

Access to previously published material is central to the research process

Why is it more useful than print?

·         Searchability

·         Linking to other journals and other sources such as databases

·         3D images becoming more important across a number of fields


What do customers want?

Two types of primary customer

·         Librarians – often key budget holders; includes academic, corporate and government-funded libraries

·         Researchers – users

So there is a unique disconnect here between users and payers for STM information

Librarians want… (Slide 5)   Source: Several but look at

Convenience, quality, a single integrated interface

·         Permanent access – what does this mean?

·         Librarians understand the cost of creating and maintaining an online archive – but expect the pricing to be reasonable.

·         Responsibility lies with the publishers – libraries not equipped to build their own archive – unlike print collections. Want publishers to be flexible too – and allow libraries to make online collections of their journals ….potential for conflict here.

·         “Seamless access to journals for faculty, students and staff…” – publishers expected to be able to provide this.

·         Site licenses must address archive policy squarely. (How many publishers at the seminar are selling site licenses?)


©Mary Waltham, 2002




Researchers want… (Slide 8)

As readers

·         Rapid access to an open system with links from every reference to full text – and no barriers that they have to contend with.

As authors

·          Research widely disseminated, cited and accurately attributed


These two groups have been, and still are, converging and articulating their wishes and demands. (Slide 9) Examples:

Declaring Independence


Create Change


The market is taking the lead through such initiatives as the Public Library of Science and, most recently, the Budapest Open Access Initiative.


The market environment

Outline of archiving initiatives



JSTOR

Funded by the Mellon Foundation, but now an independent entity.

Journals in 15 disciplines, 850,000 articles, and 5 million pages across arts and sciences.

·         Aims to increase/improve access to journals literature, ease storage costs and address conservation and preservation issues

·         Explicitly does not want to compete with publishers, hence the 3- or 5-year “moving wall” for the back-file

·         Fees charged are based on Carnegie classification

·         Regular surveys of librarians ask whether improved online access is altering their behavior with respect to bound volumes and storage of older literature, and so whether money is being saved. Net conclusion is “Yes”: fewer bound volumes, more discarding of issues once no longer in the stacks.

·         Look at usage of older articles (Slide 13):

Electronic access seems to have increased usage of older materials. Citation data alone are not a good predictor of online usage: the most accessed articles are not always those that push forward hot research, but may be other ‘popular’ articles used for teaching large classes. The graph shows the increase in access to JSTOR articles. NOTE: 100 colleges have had access to JSTOR since 1997.










Other digital preservation projects (Slide 14)

Yale, Stanford, Indiana and Cornell

Mostly 12-month funded projects, which should report formally this year (2002)

Libraries are set up to provide access and to build collections based on patrons’ needs, which will change through the life of an institution. They are not set up to establish and maintain digital archives.

Library of Congress

·         American Physical Society has arrangement with LoC – in exchange for current and ongoing access to APS online journals, LoC maintains the journals as a safe storage site.

LOCKSS (Slide 15)

Also funded in part by Mellon, based at Stanford University.

Librarian perspective

Provides tools which use local, library-controlled computers to ‘safeguard’ access to web-based journals. Intended to demonstrate to librarians that it is safe to subscribe to the online edition and to cancel the paper edition. Acts as a selective web cache, fetching and storing new content. It never deletes content it has pre-loaded, and the content is continually revalidated.

Publisher perspective

All publisher access control mechanisms are enforced – content is not ‘leaked’ to unauthorized users. But note:

·         The only power that LOCKSS removes from the publisher is their ability to revoke the rights to back content


Government-funded initiatives (Slide 16)


Free and open access to all information on the site OR on the journals’ site, so PubMed Central (PMC) will ‘point’ to a journal site. Currently 13 journals and 55 BioMed Central journals, which are online only.


Vision is of “abstracts covering a range of disciplines, linked to the full text of articles hosted on the publishers’ site” – European biological archives and data.


Department of Energy, Office of Scientific and Technical Information (OSTI). Essentially abstracts and links to publishers’ sites. Came under threat from a congressional committee in June 2001.










Not-for-profit publishers (Slide 17)



What is available | State of archive | Business model | Current fee
Full text to 1985 | To 1975 ‘by end of 2002’ | Annual fee: per title | Mem: $30, Non-mem: $50
Full text all issues | Archive complete: PROLA | Annual fee OR included with sub: full archive | Mem: $100, Non-mem: $355-$425 (Carnegie)
Full text all issues | Archive complete: (Escrow fund) | Subscription: per title | Past 5 years AMS, JSTOR earlier
Full text to 1988 | Archive not complete | Site license: all of current archive | IEL for institutions



HighWire Press (Slide 18)












For-profit publishers (Slide 19)



Publisher | What is available | State of archive | Business model | Current fee
Elsevier Science | By subject to first issues | Not complete - yet | Site license | Add-on to site license for ScienceDirect
 | Full text to 1996 | Not complete | Subscription/site license: annual fee | Included with sub/site
 | Full text to 1996 | Not complete | Subscription/site license: annual fee | Included with sub/site



Trends (Slide 20)

Archives are now being extended back to first publication in many instances.

·       Societies and associations have been leaders


·       Subscribers currently buy access to the existing archive as part of their subscription

·        Although this is changing as deeper archives are being developed

·        EITHER a monetary value is placed on this information

·        OR access is opened up to the older content free of charge. There is a divergence…





More Trends (Slide 21)

·Researchers as readers are:

·              Reading more articles

·              Reading more online

·              Reading more articles well after they are published:


·         “Over one third of the articles read are more than one year old. The newer articles tend to be found by browsing for the purpose of keeping current and the older ones are read for research and teaching purposes. The vast majority of older articles are obtained from libraries.” (source: Tenopir and King, 2000)


Strategic Issues (Slide 22)

Visibility  (Slide 23)

Slide shows:-

·         Article downloads have grown rapidly within OhioLINK as more journals are available to more institutions.

(Note: EJC on the slide stands for the OhioLINK Electronic Journal Center)


·         Tom Sanville, Director of OhioLINK, concludes that users go to a much broader array of journals than an individual library can provide.

How much are articles accessed? (Slide 24)

·         The example here is the EMBO Journal, which is available full text free after 12 months. Note: after an initial peak during the month immediately after publication, usage declines to about 6% of the initial figure and continues thereafter. Even with free access, usage continues at about the same rate as when it was restricted to those who paid for it. Over a 2-year period, about 40% of usage occurs after the first 6 months of publication.


Mission issues (Slide 25)

·         Why choose publisher X?

·         For societies… from “Declaring Independence”:

·         “Societies … assume an obligation to the larger community of scientists in their discipline. Many society mission statements cite the broadest possible dissemination of scientific information as a prime directive for their programs. By evaluating a society's scope, size, and publishing success, scientists exploring publishing options can determine whether there is a good fit with a society publisher.”

·         For all publishers

·         Free and/or discounted access to online health science journals for developing countries. Society and commercial publishers are involved; see “Access to research”.




Changes in the way science is done (Slide 26)

·         Steep growth in interaction between disciplines

·         “The most obvious example of this trend (towards interrelated specialization) is the increasing interest among physicists in biological phenomena. Those physicists who are pursuing this route need easy access to a large body of scientific literature in biology, physics, instrumentation, and the like, and it's impossible to predict today where the relevant information will turn up tomorrow.” (source: James Langer, President of APS, 2000)

·         Opportunity for publishers to gain access to new readers, authors and reviewers


Integration of the online literature (Slide 27)

·         Steady progress towards full integration of the research literature online

·         Users want it, across subject disciplines and across publishing media

·         It is possible to achieve… eventually

·         “…There is little difference between retrieving an article from an on-line journal and downloading an entry from a large database…”


How much to archive? (Slide 28)

Do you have a priority list for what you should be archiving, or is all of your back-file of equal value? This area is likely to be very community- and journal-dependent. Should you include advertising pages? Again, this is down to the individual publisher. How many here are not archiving advertising pages?

·         Citation half life according to ISI:-

“The cited half-life is the number of publication years from the current year which account for 50% of current citations received. This figure helps you evaluate the age of the majority of cited articles published in a journal. Only those journals cited 100 or more times have a cited half-life.

A higher or lower cited half-life does not imply any particular value for a journal. For instance, a primary research journal might have a longer cited half-life than a journal that provides rapid communication of current information. Dramatic changes in Cited Half-Life over time may indicate a change in a journal’s format. Studying the half-life data of the journals in a comparative study may indicate differences in format and publication history.”

Journal of Biological Chemistry: 5.6 years

JCI: 6.6 years

Journal of Lipid Research: 7.9 years



·         ISI only tracks 10 years

·         There is often/usually a very long tail from the citation half life point

·         Varies by topic/community/journal
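
As a rough illustration of the ISI definition quoted above, the cited half-life can be computed from a journal's citations in the current year, broken down by the age of the cited articles. This is only a sketch: the function name, the input layout and the linear interpolation within a year are assumptions made for illustration, not ISI's published procedure, and the citation counts are invented.

```python
def cited_half_life(citations_by_age):
    """Return the cited half-life in years.

    citations_by_age[0] is the number of citations received this year by
    items published this year, [1] by items published last year, and so on.
    (Hypothetical input layout, chosen for this example.)
    """
    total = sum(citations_by_age)
    if total <= 0:
        raise ValueError("no citations: cited half-life is undefined")
    half = total / 2
    cumulative = 0
    for whole_years, count in enumerate(citations_by_age):
        if cumulative + count >= half:
            # Assumption: citations are spread evenly within a publication
            # year, so interpolate linearly to get a fractional value.
            return whole_years + (half - cumulative) / count
        cumulative += count


# Invented counts: 10 citations to this year's articles, 20 to last year's, ...
example_counts = [10, 20, 30, 40]
print(round(cited_half_life(example_counts), 1))  # 2.7
```

A journal whose citations are concentrated in older articles gives a larger figure, which is one simple signal of how deep a back-file its readers are likely to use.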






Who is responsible for the digital archive? (Slide 29)

It looks like the publisher, although note the number of experiments going on between publishers and other potential providers, looking at cost-effective and convenient ways of achieving the goal. Should there always be a third party, neither publisher nor library, in charge of the archive? Should there be a third party to say clearly “this archive exists and is functional”?


Financial (Slide 30)

Creating an online archive costs money: consider how those costs will be recovered, both for initial production of the archive and for ongoing maintenance. Can you find a way of covering the costs over, say, a 5-year time-frame? Consider this project somewhat like a new journal launch: rapid up-front cost recovery will not enable wide market penetration. The expectation is for the cost of the back-file to be low compared with the current subscription.


Technology (Slide 31)

Frank Stumpf will address in detail – but I hope the clear message here has been to choose common tools and standards when embarking on an online archive – so that your publications can be used across a number of different platforms over the years – and that you and your users achieve the greatest return and utility from an effective archive.


Thank you






