Digital Daisy Bates

A project in the School of Languages and Linguistics at The University of Melbourne.

Grant type

Faculty of Arts small grant (2013-)

Daisy Bates (1859-1951) was a remarkable ethnographer who spent all of her adult life living in Aboriginal communities around parts of Western Australia and South Australia. Her priceless collection of written records documents a great deal about the language and culture of the many different people she worked with. Her 'Native Tribes of Western Australia' (edited by Isobel White, Canberra: National Library of Australia, 1985) is a detailed collection about Aboriginal people of WA. Significantly, it is an edited version of all of her notes except the section containing thousands of pages dealing with Aboriginal languages.

In collaboration with the National Library of Australia (NLA), this project made accessible an extremely valuable collection of several hundred wordlists of Australian languages, originally recorded by Daisy Bates in the early 1900s. This will enable reuse of the collection by Aboriginal people searching for their own heritage languages and by other researchers. The dataset has been constructed according to the Text Encoding Initiative (TEI) P5 Guidelines, so that it embodies both a facsimile of the original set of manuscripts and a structured dataset that can support complex research questions. Access to these historical records of Australian languages will benefit from the interdisciplinary cooperation of linguists and musicologists with technology experts and with the premier collecting agency, the National Library of Australia.
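As a rough illustration of how a TEI document can carry both a facsimile and a structured wordlist, the sketch below parses a tiny invented sample and pairs each entry with the image of its source page. It is not the project's actual encoding: the element layout (`entry`, `form`, `orth`, `sense`, `def`, with a `facs` pointer to a `surface`) is one plausible use of the TEI P5 dictionary and facsimile modules, and the words, glosses, and image paths are placeholders, not data from the Bates vocabularies.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"tei": TEI_NS}

# Invented sample: two page images and two wordlist entries.
# All words, glosses, and file names are placeholders (assumptions).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="page1"><graphic url="images/page1.jpg"/></surface>
    <surface xml:id="page2"><graphic url="images/page2.jpg"/></surface>
  </facsimile>
  <text><body>
    <entry facs="#page1">
      <form><orth>placeholder-word-1</orth></form>
      <sense><def>placeholder gloss 1</def></sense>
    </entry>
    <entry facs="#page2">
      <form><orth>placeholder-word-2</orth></form>
      <sense><def>placeholder gloss 2</def></sense>
    </entry>
  </body></text>
</TEI>"""


def entries_with_images(tei_xml):
    """Return (word, gloss, image_url) tuples for each entry in the TEI text."""
    root = ET.fromstring(tei_xml)

    # Map each surface's xml:id to the URL of its page image.
    images = {}
    for surface in root.iterfind(".//tei:surface", NS):
        graphic = surface.find("tei:graphic", NS)
        if graphic is not None:
            images[surface.get(f"{{{XML_NS}}}id")] = graphic.get("url")

    # Resolve each entry's facs pointer ("#page1") against that map.
    rows = []
    for entry in root.iterfind(".//tei:entry", NS):
        word = entry.findtext("tei:form/tei:orth", default="", namespaces=NS)
        gloss = entry.findtext("tei:sense/tei:def", default="", namespaces=NS)
        page_id = entry.get("facs", "").lstrip("#")
        rows.append((word, gloss, images.get(page_id)))
    return rows


if __name__ == "__main__":
    for word, gloss, image in entries_with_images(SAMPLE):
        print(f"{word}\t{gloss}\t{image}")
```

Keeping the page images in a separate `facsimile` section and pointing to them from the entries means the same file can drive a page-image view and a searchable wordlist without duplicating either.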

The output of this project has been a web page with the text of all the vocabularies, each linked to the image of the source document. A map of the locations of the vocabularies can be found on the PARADISEC Bates vocabularies web page, and this provides a point of entry to the vocabularies, as can be seen in the images at the top of this page. A search and annotation system is also planned.

Bates typescript

The vocabularies are extraordinarily valuable, as little else was recorded in the same time period and nothing of the same scale has been attempted before or since. However, despite their value, the wordlists, which often include grammatical information in the form of example sentences, remain relatively hard to access because they are held in paper form in only three locations: the Barr Smith Library in Adelaide, the National Library of Australia, and the Battye Library in Perth. By building online accessible content for Indigenous Australians, this project takes historical records out of the archive and into the community, supporting current language initiatives.

There are 4,500 pages of typescript and 8,600 pages of manuscript, representing languages from the Southern SA/WA border up to the Kimberley. At least 123 speakers are named in the vocabularies and it is unclear how many languages they represent.

The 4,500 pages of typescript and 1712 notebooks in the NLA will be imaged and keyboarded, and the text will be encoded using the TEI framework. We will base our approach on work done by Henderson (2008) on part of the Gerhardt Laves manuscripts at AIATSIS. Bates' work represents a resource that has not been made accessible, and not even the most basic work has been done to ascertain what languages are represented (e.g. a best-guess approach was taken in Thieberger 1993; Nash 2002 applies a series of metrics to sample data from south-east WA to show similarities to known languages of the region; and McGregor 1988 has been able to identify some Kimberley languages). At a time when Australian languages are under severe threat from English, it is critical to make the best representation of historical sources to assist speakers in their efforts to relearn and reinforce the use of their languages.

This project will not only do that, but will also document a research method that can then be replicated for other collections.

By March 2014 we had finished keyboarding all the typescripts and prepared a first encoding. See the discussion by Conal Tuohy on his blog.

The Advisory Committee for the project is: Linda Barwick, Claire Bowern, Simon Greenhill, John Henderson, Bill McGregor, David Nash, and Conal Tuohy.


  • Henderson, John. "Capturing Chaos: Rendering Handwritten Language Documents," in LD&C Vol. 2, No. 2, 2008, pp. 212-243
  • McGregor, William. Handbook of Kimberley languages, Volume 1: General information. Pacific Linguistics, C-105, Canberra, 1988
  • Nash, David. "Historical linguistic geography of south-east Western Australia," in Henderson, John and Nash, David (eds.). Language in Native Title. Canberra: AIATSIS Native Title Research Unit, Aboriginal Studies Press, 2002, pp. 205-30
  • Thieberger, Nicholas. Handbook of WA Aboriginal Languages south of the Kimberley Region. Canberra: Pacific Linguistics, 1993 (out of print, but see the online version created in 1994)

Project Manager: Nick Thieberger, Linguistics, School of Languages and Linguistics, The University of Melbourne