Visiting Scholar, Department of Linguistics
University of California at Berkeley
Berkeley, California 94720
Ancient Greek and Latin are two members of the large group of ancient Indo-European languages, whose geographical spread stretched from Europe to Asia in antiquity. Indo-European studies encompass the study of this group of related ancient tongues, their linguistic prehistory (and later developments), their cultures, and the archaeological remains that might be associated with Indo-European-speaking peoples.
Because the field of Indo-European studies spans so many different languages and cultures, research requires access to a wide variety of primary and secondary materials, including many small European publications. However, the current "crisis in the libraries" means that escalating prices for books and journals have restricted libraries' ability to keep up with serial and monograph purchases. How, then, can one do adequate research when libraries on fixed budgets cannot purchase all the needed materials, other than by relying on interlibrary loan? The problem is hardly restricted to Indo-European studies: Classicists and those in other fields of the humanities (and even in the sciences) face it as well. For those in poorer countries, the problem can be even more dire.
If the Internet can make materials accessible for free or at a reduced cost to a wider audience, then it could be an effective means of scholarly communication. This depends, of course, on users having access to a computer hooked up to the Internet, with the appropriate software and hardware installed.
As a means of making Indo-European materials more accessible to scholars in the US and abroad, I have devised a project in conjunction with the Electronic Text Unit of the UC Berkeley Library to test Web publication for Indo-European studies. Initial seed funding was provided by Bryn Mawr Reviews. The project is intended to examine whether electronic journal publication is a viable alternative to print for Indo-European studies and other small fields, especially where publications are produced by small societies or university departments. The single most challenging aspect of the project is how to display (and print) a wide variety of scripts.
The publication being put online is the Indo-European Studies Bulletin, a small journal affiliated with the UCLA Indo-European Studies Program. The publication now has a small but ever-growing base of subscribers. It is formally affiliated with a support group at UCLA, so any money remaining after the publishing and mailing costs are met goes to support an annual IE conference and to bring speakers to UCLA. Those scholars in Eastern Europe, mainland China, and the former Soviet Union can receive the Bulletin gratis, in the belief that promoting scholarly communication to scholars in these areas is of primary importance and the expense to the group is justified.
The project is intended to look at the following questions:
Examining how other multilingual electronic journals have been set up can be instructive, since they have also had to deal with the pesky problem of getting the various languages to appear and print correctly. A review of other electronic journals currently shows a dearth of titles in Indo-European studies, particularly in Indo-European linguistics. Classics, on the other hand, has a more sizable number of electronic journals. (Classics is defined here in the broad sense of classical literature, linguistics, history, and archaeology.) Of the titles found, a growing number are free publications, often put out by a university department or society (e.g., Classics Ireland, affiliated with the Classical Association of Ireland and put out by University College Dublin). Of the free titles listed on various e-journal lists (e.g., http://gort.ucsd.edu/newjour/), some ceased publication after a few issues (e.g., Arachnion).
Perhaps the most rapidly growing sector of electronic journals is electronic versions of established print titles. While a few large publishers (such as Johns Hopkins) are producing the electronic versions themselves, many others are using "service providers" to produce and distribute them.
Three formats are predominantly used in electronic journal publication.
A number of important developments have taken place on the Web, most notably the creation of XML (Extensible Markup Language), a new standard intended to replace HTML. Because XML provides the ability to include markup for content (and is less complex than its parent, SGML), allows the tagset to be extended (hence "extensible"), and is now supported by the latest versions of the popular browsers (Internet Explorer 5+, Netscape 6/Mozilla), we decided to use XML for the IES Bulletin. A second motivating factor was to see what kinds of problems would arise when working with Indo-European material in XML. For guidelines on markup (and a DTD), we used the Text Encoding Initiative's TEI-Lite. XML also takes Unicode, the universal character encoding standard, as its character set. In Unicode, every character receives its own unique number, irrespective of computer system, software program, etc. Such a standard is a necessity for working in multiple languages, for in the past the variety of encoding schemes made it difficult to transmit multilingual documents across the Web without problems. As an example, compare what happens when converting a final sigma from GreekKeys to WinGreek (drawn from the example by Sean Redmond, "Greek Font to Unicode Converter", at http://www.jiffycomp.com/smr/unicode/convert.php3): final sigma in GreekKeys becomes an omega in WinGreek. This occurs because the two fonts encode the characters differently. In Unicode, no matter what the computer platform or software program, the final sigma should appear as such when transmitted to others (as long as the fonts on the sending and receiving ends are Unicode-compliant and both have Unicode-enabled operating systems and browsers).
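The font-encoding problem described above can be illustrated with a short sketch. The two "legacy font" tables below are toy mappings invented for illustration; they are not the actual GreekKeys or WinGreek layouts. The point is only that a font-based encoding stores a byte whose meaning depends on the font, whereas Unicode stores the letter itself.

```python
import unicodedata

# Toy legacy tables: the same byte means a different letter in each font.
# These mappings are illustrative assumptions, not the real font layouts.
LEGACY_FONT_A = {0x77: "\u03c2"}  # byte 0x77 displayed as final sigma
LEGACY_FONT_B = {0x77: "\u03c9"}  # byte 0x77 displayed as omega


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    raw = 0x77
    print("font A reads the byte as:", describe(LEGACY_FONT_A[raw]))
    print("font B reads the byte as:", describe(LEGACY_FONT_B[raw]))
    # In Unicode the letter has its own number, so no font table is consulted.
    final_sigma = "\u03c2"
    print("Unicode:", describe(final_sigma))
    print("UTF-8 bytes:", final_sigma.encode("utf-8").hex())
```

Whether the character then displays correctly still depends on the reader having a suitable font, as noted above; the sketch covers only the encoding side.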
The following description provides a synopsis of the methodology being used for creating the print and online versions. First, we convert all articles from authors into Microsoft Word (if not already in this format) and then do the editing. The text is then imported into PageMaker for formatting and final proofing before the final print version is created and sent to the printer. The online version is created from a plain text version of the PageMaker file, which is inserted into the XML skeleton and edited with Emacs. After the file is parsed against the TEI-Lite Document Type Definition, or DTD, we then turn to a product which Berkeley currently owns, DynaWeb, to "make book," which allows one to create HTML on the fly. It also offers a number of nice features, such as searching. However, at $75,000, it is unrealistic to see DynaWeb as a product that any but a few large institutions can afford.
We have at present put a number of the Indo-European Studies Bulletin issues online. Some significant problems have turned up. Since the goal of the project was to see what kinds of problems arose, and not necessarily to bypass them in an effort to publish quickly, it was deemed important to examine them.
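The step of inserting plain text into an XML skeleton might look roughly like the sketch below. It is not the project's actual procedure (which used Emacs and a DTD-aware parser); the function names are hypothetical, the skeleton is a heavily reduced stand-in for a TEI-Lite document, and the check performed is only XML well-formedness, since Python's standard library does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def wrap_in_skeleton(title: str, plain_text: str) -> str:
    """Wrap blank-line-separated plain text in a minimal TEI-style skeleton.

    The element names follow TEI-Lite conventions, but the header is
    deliberately incomplete and would not pass DTD validation as is.
    """
    root = ET.Element("TEI.2")
    header = ET.SubElement(root, "teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    body = ET.SubElement(ET.SubElement(root, "text"), "body")
    for block in plain_text.split("\n\n"):
        block = " ".join(block.split())  # collapse line breaks inside a paragraph
        if block:
            ET.SubElement(body, "p").text = block
    # Serialize as a Unicode string; special characters such as & are escaped.
    return ET.tostring(root, encoding="unicode")


def is_well_formed(xml_text: str) -> bool:
    """Check well-formedness only; this is weaker than parsing against a DTD."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

A call such as `wrap_in_skeleton("Sample issue", text)` returns a string that can be saved and then validated against the TEI-Lite DTD with a separate validating parser.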
Although I have outlined the short-term solutions for displaying missing characters, I would like to add that these do not solve the long-range problem, namely, that such scripts should be included in Unicode, since Unicode ultimately aims to cover all the dead (and living) languages of the world. Unfortunately, I have discovered that there has been a lack of participation from scholars in reviewing Unicode proposals, which is regrettable, since it is scholars who are most likely to use and benefit from the inclusion of various dead languages in the Unicode Standard. Also, the Technical Director of the Unicode Technical Committee has stated (e-mail communication, 10/28/00) that "[t]here are powerful, countervailing forces within the [computer] industry and among standardization circles that would prefer that the Unicode Standard stop changing and expanding, since adaptation to such change is expensive and unsettling." In other words, the time may soon come when no additional historic scripts are accepted because of resistance among Unicode Consortium members. Hence it is important to pursue the inclusion of these scripts now, and to provide feedback on any errors or missing characters in the repertoire. (In this regard, I have been directly involved in getting an Old Italic proposal into Unicode, as well as those for Linear B and cuneiform.) Since ancient scripts are ultimately needed for Indo-European online publication and research, working on Unicode proposals has become a new focus of the project.
However, in order to get Unicode to work, a number of requirements must be met:
Because our project aims at providing access to a wide audience, requiring these three components of all users seems like a tall order. Since the main browsers (IE, Netscape) are available for free download, the browser requirement is not too troublesome. The font problem could be resolved if the needed fonts could be created and downloaded for free (or made available for a nominal amount). Requiring a user to have a new or recent operating system is more problematic, particularly for those scholars with restricted funding, especially since operating systems are adding more and more support for Unicode with each new version. A PDF version should probably be offered in the interim as an option, since this allows easy access and printing for those with older machines.
A number of issues have yet to be examined:
Future work will entail converting the remaining back issues of the IES Bulletin to an online format, noting any further problems that arise with XML and Unicode. Eventually, finding an inexpensive alternative to DynaWeb will be necessary. Continued discussion of how to handle character encoding and text markup issues (e.g., the underdot for an unsure reading) and the creation of a "Best Practices" guide would be a desideratum, as would a guidebook on how to produce an online publication for small departments and societies in the humanities.
Will our efforts in online journal publishing ultimately help in making more IE materials accessible? I hope so. Our work up to this point has concentrated on making the Internet infrastructure more amenable to handling various ancient languages. The standards and open-source programs do provide a means for smaller departments and societies to publish online, although we cannot yet recommend how these should be paid for. Still, we are poised to improve scholarly access and communication greatly with the Internet, and I think this opportunity should be taken up. However, without input from scholars (especially regarding standards), the ability to do research and publish over the Internet could be seriously compromised.
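The underdot question can be made concrete with a small sketch. One possible encoding, assumed here for illustration and not a settled practice of the project, is to attach U+0323 COMBINING DOT BELOW to the uncertain letter. The helper names are hypothetical. The sketch shows why the issue needs discussion: some letters have a precomposed dotted form and others do not, so the same editorial mark can yield strings of different lengths.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW


def mark_unsure(letter: str) -> str:
    """Attach an underdot to one letter and return the NFC-normalized result."""
    return unicodedata.normalize("NFC", letter + DOT_BELOW)


def strip_underdots(text: str) -> str:
    """Remove underdots, e.g. to search a text regardless of editorial marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(DOT_BELOW, ""))


if __name__ == "__main__":
    latin = mark_unsure("t")       # a precomposed form exists (U+1E6D)
    greek = mark_unsure("\u03b1")  # no precomposed form: stays two code points
    print(len(latin), len(greek))
    print(strip_underdots(latin + greek))
```

An alternative is to record the uncertainty in markup rather than in the character data (TEI provides an `unclear` element for this), which is one of the choices a "Best Practices" guide would need to address.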