Presentation 22 September 2000
Newspapers on conventional and electronic media
By STEEN BILLE LARSEN
Projects on acquisition, preservation and access in Denmark, Norway and Sweden. Paper presented at 5th International Symposium, Bibliotheca Baltica, Szczecin. September 22, 2000
The electronic world and the Internet have not made the libraries' tasks any easier as far as newspapers are concerned. On the contrary, we might well be facing twice as many challenges as previously, because now we have to take care of newspapers both in their traditional form on paper - what we might term the conventional medium - and in electronic form. Here the new aspect is first and foremost publications on the Internet. Add to this the quite obvious possibilities for digitisation of older newspapers.
In this brief paper I am going to concentrate solely on some of the main issues facing the national libraries as regards acquisition, preservation and accessibility, taking as my point of reference some projects running in the three Scandinavian countries.
Publishers of the daily newspapers very quickly caught on to the advantages of publishing newspapers on the Internet. After an initial period of charging a subscription fee for publication on the net, the net papers have now gone on to publication based on advertising funding. Newspapers on the net are blessed with marvellous possibilities for swift updating of news that neither television nor a paper edition can compete with. It seems we have now acquired a new kind of publication, which should be preserved for posterity.
In 1998 a new act on legal deposit was passed in Denmark, which also encompasses electronic publications on the Internet. Since this act came into force, copies of over 5,000 works have been downloaded and are now stored electronically. The Internet is, of course, manifold, but legal deposit in Denmark applies at the moment only to those net publications that are available as a complete and finished text - the so-called static documents.
In contrast to this type of online publication stands the category of what we call dynamic documents, the contents of which are continually changing, and for which it is not possible to determine an exact time for the final version of the work. Practically all newspapers on the Internet are dynamic. The weather forecast in a newspaper is, of course, connected to a database outside the site of the paper, so that the forecast is being continually updated. The same applies to, for example, sports news. It is not possible to identify a particular point at which you may talk about today's newspaper. The electronic newspaper as it stands at midnight tonight will have changed by midnight tomorrow.
The net edition of the English paper "The Daily Telegraph" is an example of the net newspaper being a completely new product. By the end of the day the articles are transferred into a database in order to make them searchable by subject. That day's paper no longer exists in its original form. On the other hand, you are able to search by subject, for example in the book review section for the past five years. An excellent service to the readers, but as yet not very manageable for the libraries. As far as Denmark is concerned the consequence is that we have decided that neither dynamic works nor databases are subject to legal deposit - until a technical solution has been worked out. This kind of information is not preserved for posterity by the libraries - for the time being.
Government publications represent an area of very advanced Internet based information. A case in point is the Danish official government newspaper "Statstidende", published on weekdays and containing all announcements that by law must be made available to the public. Statstidende is published on the Internet as well. The daily edition of Statstidende is a complete unit and therefore a static work. At the moment, Statstidende is the only Danish newspaper that we download according to the act on legal deposit. It is done in files month by month, which are then placed in the electronic stacks. We are planning to begin downloading another newspaper as well.
We are also very keen to start a pilot project with one of the dynamic newspapers, and newspaper publishers have shown some interest in this idea as well. The problem is to find a newspaper of the right size that is published in a form that makes it technically feasible to download it to our electronic stacks.
Sweden has chosen a different strategy for gaining experience in downloading publications from the Internet. The project is called Kulturarw3 (Cultural heritage on the web). The Royal Library in Stockholm is searching the Internet at regular intervals. A robot downloads all net-based information concerning Sweden at fixed intervals, and experience has shown that periodicals and newspapers in particular have been difficult to handle. For the time being the robot's results are stored on a special server. It is not yet clear how access may be gained to the Internet publications that have been downloaded under the Kulturarw3 project.
The problems involved in electronic preservation are the same for newspapers as for any other electronic work. If I may return once again to my own library - the present status is that the legal deposit electronic works are stored on hard discs with backup on tapes. Preservation of electronic editions of newspapers should be included in the solution found for the preservation of other electronic works.
But what about the paper editions? Couldn't we now replace the bothersome microfilms with a digital alternative?
The national libraries in the three Scandinavian countries are all in agreement that microfilming continues to be the safest method of preservation of the information, and in Denmark, Sweden and Norway microfilming is still the most important way of preserving newspaper information. Practical procedures may vary a bit from country to country, but the main principle is that new newspapers are sent directly from the printers to the libraries for the purpose of filming. A number of libraries in each country purchase the microfilms of recent newspapers in order that they may be offered to the users shortly afterwards.
The National Library section in Rana (by the Arctic Circle), which is responsible for the Norwegian filming, was built only about 10 years ago and is therefore likely to have the most up-to-date equipment. All editions of all Norwegian newspapers are microfilmed here, to the tune of about 1.2 million newspaper pages a year.
In Denmark, the State and University Library in Århus is responsible for the microfilming of newspapers. Quite a different solution has been chosen here. The filming is being done by a private firm followed by a quality control by the State and University Library.
In all countries there are large quantities of newspapers that have not been microfilmed. Let me take an example: A couple of years ago a calculation was made in Finland which indicated that with a retrospective microfilming of 170 titles a year it would take 50 years before every newspaper was done. I don't know of any similar calculations in the three Scandinavian countries, but it does illustrate the enormity of the task. One might fear that the newspapers will have turned into dust before the microfilming task is completed. In Denmark, only a selection of recent newspapers is being filmed, which means that unfortunately the backlog is growing year by year.
The electronic world offers tremendous possibilities for access to newspapers - no doubt about that at all. Both paper editions and microfilms are cumbersome as opposed to the ease with which one can capture an article on the screen and print it out on the spot.
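The Finnish figures can be checked with a few lines of arithmetic. This is only a rough sketch, assuming a constant filming rate and no new titles being added. The backlog of 8,500 titles is merely what the two quoted numbers imply, not a figure taken from the Finnish calculation itself, and the function name is a hypothetical one chosen for illustration.

```python
def years_to_clear(backlog_titles: int, titles_per_year: int) -> int:
    """Whole years needed to film a backlog at a constant yearly rate."""
    if titles_per_year <= 0:
        raise ValueError("titles_per_year must be positive")
    # Integer ceiling division: a partly used final year still counts as a year.
    return -(-backlog_titles // titles_per_year)


# Assumption: the backlog implied by "170 titles a year for 50 years".
implied_backlog = 170 * 50

print(implied_backlog)                       # 8500
print(years_to_clear(implied_backlog, 170))  # 50
print(years_to_clear(implied_backlog, 340))  # 25 (doubling the rate halves the time)
```

Even a doubling of the filming rate would, on these assumptions, leave a quarter of a century of work.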
An initial problem in connection with giving access to newspapers in electronic form is the question of copyright. An author or an illustrator must have been dead for 70 years before the copyright to the work in question has expired. It is, therefore, necessary to obtain permission from everyone who has contributed to a particular edition of a newspaper before being able to store it electronically on the net. An apprentice journalist who wrote an article at the age of 19 in 1899 might not have died until 1960. That means that the copyright won't expire until 2030. In order to avoid the copyright problems, many digitisation projects have chosen to stop in 1850, 1880 or 1900, according to how certain one wants to be that digitising works and placing them on the net is not in contravention of the law on copyright.
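The arithmetic behind the journalist example can be sketched as follows. This is a minimal illustration, assuming a flat rule of 70 years after the author's death. Real copyright law has further cases (anonymous works, several contributors, terms that run to the end of a calendar year), and the function names are hypothetical.

```python
COPYRIGHT_TERM_YEARS = 70  # years after the author's death, as stated in the text


def copyright_expiry_year(death_year: int) -> int:
    """Year in which copyright expires under a flat 'death + 70 years' rule."""
    return death_year + COPYRIGHT_TERM_YEARS


def is_free_to_digitise(death_year: int, current_year: int) -> bool:
    """True once the term has run out (simplified, ignores end-of-year rules)."""
    return current_year >= copyright_expiry_year(death_year)


# The apprentice journalist from the text: wrote in 1899, died in 1960.
print(copyright_expiry_year(1960))      # 2030
print(is_free_to_digitise(1960, 2000))  # False
```

Because the date of death of every contributor is rarely known, a project cannot normally run this check per article, which is why a fixed cut-off year is the practical choice.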
The largest project is "Tiden-projektet" or in English "NORDICA - Inter Nordic Newspaper Project", which was launched in the spring of 1998 by Helsinki University Library, The Royal Library in Stockholm, the State and University Library in Århus and The National Library in Rana as a three-year Nordic cooperative project for digitisation of old newspapers from microfilm. Since then The National Library of Iceland has also joined the project.
The intention is to present a picture of the original text to the user. The text is not going to be converted 100 percent. The OCR treatment is incorporated, however, to experiment with new ways of searching in the contents of the text.
The project will also develop various graphic search interfaces to make the material more accessible to different user groups. It will for instance be possible to click on maps and graphic time axes in order to choose certain localities or periods.
Denmark has been responsible for clarifying the question of copyright in connection with the "Tiden/NORDICA" project, the preliminary conclusion being that one may use material prior to 1880-1900.
A prototype of a digital newspaper service will be a concrete result of the project. To begin with there will be a simulation of a paper version of the newspapers, but gradually new search methods are going to be developed that allow searching through the text, indexing articles and creating graphic search facilities for different user groups.
The "Tiden" prototype is also part of a major development project for the National Library in Rana with the establishment of a digital library, based on a digital mass store. Here sound, text, image and animation will be stored in digital form and be regarded as digital objects.
In the "Tiden" prototype the individual newspaper page is the smallest accessible digital object. This means that the user can get any page presented on the screen.
Scanning a microfilm produces a number of digital images of single pages, each with a URN but without any relation to its origin. These images must be organised in such a way that the user is given direct access to the individual newspaper page. The project terms this digital binding, and software has been developed for the process. The software contains its own database in which the individual pages are registered, so that the database controls which page number, issue, volume and title each individual URN belongs to.
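The idea of digital binding can be illustrated with a small sketch. It is not the project's software, whose design is not described here. It merely assumes a registry that maps each URN to a title, volume, issue and page number. The class names, the URN strings and the newspaper title are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PageRecord:
    """Where one scanned page image belongs in the printed newspaper."""
    title: str
    volume: int
    issue: int
    page: int


class BindingRegistry:
    """Hypothetical stand-in for the binding database: URN -> page record."""

    def __init__(self) -> None:
        self._by_urn: Dict[str, PageRecord] = {}

    def register(self, urn: str, record: PageRecord) -> None:
        self._by_urn[urn] = record

    def describe(self, urn: str) -> Optional[PageRecord]:
        """Look up the origin of a page image, or None if it is unregistered."""
        return self._by_urn.get(urn)

    def pages_of_issue(self, title: str, volume: int, issue: int) -> List[str]:
        """URNs of one issue in page order, whatever the scanning order was."""
        matches = [
            (rec.page, urn)
            for urn, rec in self._by_urn.items()
            if (rec.title, rec.volume, rec.issue) == (title, volume, issue)
        ]
        return [urn for _, urn in sorted(matches)]


# Made-up URNs and title; the pages are registered out of order on purpose.
registry = BindingRegistry()
registry.register("URN:EXAMPLE:0002", PageRecord("Example Gazette", 1, 12, 2))
registry.register("URN:EXAMPLE:0001", PageRecord("Example Gazette", 1, 12, 1))

print(registry.pages_of_issue("Example Gazette", 1, 12))
```

The point of such a registry is that the scanner may deliver the images in any order, while the user still receives the pages of an issue in the order in which they were printed.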
In order to make the digital material optimally accessible to the users, it has been decided to give each digitised newspaper its own static homepage as the gateway to the material. This html-page will be identified and indexed by the search robots on the Internet and in this way the newspaper material will be available in digitised form on the net.
The project has provided useful experience with digitisation of old microfilm with a view to OCR treatment. At the same time, one has had to realise that changes must occur in the production of new microfilm to obtain a quality of film which can be read on analogue microfilm readers and yet be suitable for scanning. It has been necessary to adjust the light setting and take due note of the degree of reduction. It has also been necessary, when photographing, to place "pointers" on the film, which the scanner can use as points of reference for automatic positioning during digitisation.
The National Library in Rana has as an experiment also digitised some old microfilms for Finland, where good results have been obtained with OCR treatment of the newspaper texts. It shows it is possible, but the digitisation is not yet sufficiently efficient. Individual adjustments and "tunings" of each image have been imperative - and that is certainly not an asset when talking about mass production.
Production of the microfilm is the most resource-demanding aspect of the process. The material has to be handled manually and each individual page must be photographed. More intensive quality checks are necessary to make sure that all the pages of the newspaper are actually on the film and that they are arranged in sequential order. This prepares the ground for the optimal automatic treatment later in the production sequence. One might ask whether one should instead start with the digitisation and afterwards produce the microfilm from the digital copy.
Today the National Library in Rana has two production sequences, one for new and one for old material. The major problem with old newspaper material is that the paper has often yellowed, which makes the contrast between print and background negligible. It is important here to get the right light setting when photographing the originals. Newer material usually has a suitable contrast between print and background, but problems still arise when you have a colour print and this is photographed in black and white.
The first version of the "Tiden" prototype is supposed to be ready this year and the plan is to carry out the tests together with the users during the autumn and winter of 2000-2001. This prototype is expected to provide remarkable access to newspaper material and the hope is that this will attract many enthusiastic users from schools, libraries and museums.
The State and University Library in Århus is expected to give access this autumn on its homepage to test files from four Danish newspapers from the years 1863 - 1865. Inspired by the Tiden/NORDICA project, the State and University Library is also participating in the VESTNORD project in cooperation with The Faroe Islands, Greenland and Iceland to scan newspapers connected to the North Atlantic Area, among these the entire Grønlandsposten (The Greenland Post) 1861 - 1999.
The Royal Library in Stockholm is in the process of digitising older newspapers to make their contents searchable as free text. The project, called Excalibur, should result in a digitised version of a number of Swedish newspapers from the period 1645-1721. They will be available on the web and searchable in free text via the programme RetrievalWare from the firm Excalibur Technologies.
The Excalibur project will be extended to include Swedish newspapers up to 1850, thereby providing a large historical database covering the period 1645-1850. Due to the problem of copyright, the library is not moving beyond 1850 at this stage. Scanning is also a major problem for this project, as is the OCR reading. Producing a scanned and searchable text of the old papers' Gothic type has proved impossible. Instead the searchable texts have been keyed in and are presented together with scanned facsimiles of the original pages. Maybe the project can build on experiences from the projects in German-speaking countries to scan Gothic type. The material has so far not reached the web, but that should, hopefully, happen before long.
This brief presentation shows that IT has provided new prospects for making newspapers available, but the volume of information that has to be organised is enormous and, consequently, so are the challenges facing the organiser. The old saying that an unorganised book collection is a useless one is certainly most relevant here. Unorganised information is useless information.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.