Digital Daisy Bates


Moores Studio. 'Daisy Bates' 1936 (detail) State Library of South Australia. Public domain

Daisy Bates (1859-1951) was a remarkable ethnographer who spent much of her adult life living in Aboriginal communities in parts of Western Australia and South Australia. Her priceless collection of written records documents a great deal about the languages and cultures of the many different people she worked with. Her book The Native Tribes of Western Australia (edited by Isobel White, Canberra: National Library of Australia, 1985) is a detailed account of the Aboriginal people of WA. Significantly, it is an edited version of all of her notes except the section containing thousands of pages dealing with Aboriginal languages.

In collaboration with the National Library of Australia (NLA), this project has made accessible this extremely valuable collection of several hundred wordlists of Australian languages, originally recorded by Daisy Bates in the early 1900s. This will enable reuse of the collection by Aboriginal people searching for their own heritage languages and by other researchers. The dataset has been constructed according to the Text Encoding Initiative (TEI) P5 Guidelines, so that it embodies both a facsimile of the original set of manuscripts and a structured dataset that can support complex research questions. Access to these historical records of Australian languages benefits from the interdisciplinary cooperation of linguists and musicologists with technology experts and with the premier collecting agency, the National Library of Australia.

Moores Studio. 'Daisy Bates' 1936. State Library of South Australia. Public domain


The output of this project has been the Digital Daisy Bates website with the text of all the vocabularies, each linked to the image of the source document.
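The link between a transcribed vocabulary entry and the image of its source page can be sketched in TEI terms. The following is a minimal illustration only, not the project's actual encoding: it assumes page images are declared as `surface` elements inside `facsimile`, that page breaks (`pb`) point to them through the `facs` attribute, and that wordlist items are encoded as `entry` elements. The words, glosses, identifiers, and file names are hypothetical placeholders, and the TEI header is omitted for brevity.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"tei": TEI_NS}

# Hypothetical TEI fragment: placeholder words, glosses and file names,
# not taken from the Bates collection.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="page-001">
      <graphic url="images/page-001.jpg"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <pb facs="#page-001"/>
      <entry>
        <form><orth>WORD_A</orth></form>
        <sense><def>gloss A</def></sense>
      </entry>
      <entry>
        <form><orth>WORD_B</orth></form>
        <sense><def>gloss B</def></sense>
      </entry>
    </body>
  </text>
</TEI>"""


def entries_with_images(tei_xml):
    """Return each entry's word and gloss with the image of the page it is on."""
    root = ET.fromstring(tei_xml)

    # Map each surface's xml:id to the URL of its page image.
    images = {}
    for surface in root.iterfind(".//tei:surface", NS):
        graphic = surface.find("tei:graphic", NS)
        if graphic is not None:
            images[surface.get(f"{{{XML_NS}}}id")] = graphic.get("url")

    # Walk the body in document order, remembering the most recent page break.
    results = []
    current_image = None
    body = root.find(".//tei:body", NS)
    for elem in body.iter():
        if elem.tag == f"{{{TEI_NS}}}pb":
            current_image = images.get(elem.get("facs", "").lstrip("#"))
        elif elem.tag == f"{{{TEI_NS}}}entry":
            word = elem.findtext("tei:form/tei:orth", default="", namespaces=NS)
            gloss = elem.findtext("tei:sense/tei:def", default="", namespaces=NS)
            results.append({"word": word, "gloss": gloss, "image": current_image})
    return results


if __name__ == "__main__":
    for row in entries_with_images(SAMPLE):
        print(row)
```

Keeping the page images and the transcription in one file, joined by identifiers, is what would let a single source serve both as a facsimile and as a searchable wordlist; a real encoding of the manuscripts would likely carry more detail, such as speaker and language attributions.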

The vocabularies are extraordinarily valuable, as little else was recorded in the same period and nothing on the same scale has been attempted before or since. Despite their value, the wordlists, which often include grammatical information in the form of example sentences, remain relatively hard to access because they are held in paper form in only three locations: the Barr Smith Library in Adelaide, the National Library of Australia in Canberra, and the Battye Library in Perth. By building accessible online content for Indigenous Australians, this project takes historical records out of the archive and into the community, supporting current language initiatives.

There are 4,500 pages of typescript and 8,600 pages of manuscript, representing languages from the southern SA/WA border up to the Kimberley. At least 123 speakers are named in the vocabularies, but it is unclear how many languages they represent.

The 4,500 pages of typescript and 1712 notebooks in the NLA have been imaged and keyboarded, and the text has been encoded using the TEI framework. We have based our approach on work done by Henderson (2008) on part of the Gerhardt Laves manuscripts at AIATSIS. Bates' work represents a resource that has not previously been made accessible, and not even the most basic work has been done to ascertain which languages are represented. For example, a best-guess approach was taken in Thieberger 1993; Nash 2002 applies a series of metrics to sample data from south-east WA to show similarities to known languages of the region; and McGregor 1988 has been able to identify some Kimberley languages. At a time when Australian languages are under severe threat from English, it is critical to make the best possible representation of historical sources available to assist speakers in their efforts to relearn and reinforce the use of their languages.

This project not only provides such a representation of the Bates vocabularies but also documents a research method that can be replicated for other collections.

Digital Daisy Bates website


  • Henderson, John. "Capturing Chaos: Rendering Handwritten Language Documents," in Language Documentation & Conservation, Vol. 2, No. 2, 2008, pp. 212-243
  • McGregor, William. Handbook of Kimberley Languages, Volume 1: General Information. Canberra: Pacific Linguistics, C-105, 1988
  • Nash, David. "Historical linguistic geography of south-east Western Australia," in Henderson, John and Nash, David (eds.). Language in Native Title. Canberra: AIATSIS Native Title Research Unit, Aboriginal Studies Press, 2002, pp. 205-230
  • Thieberger, Nicholas. Handbook of WA Aboriginal Languages South of the Kimberley Region. Canberra: Pacific Linguistics, 1993 (out of print, but see the online version created in 1994)

Project details


Faculty of Arts small grant (2013 -)

Research partners

National Library of Australia (NLA)

Project team

Project Manager

Associate Professor Nick Thieberger, Linguistics, School of Languages and Linguistics, The University of Melbourne

Advisory Committee

Linda Barwick
Claire Bowern
Simon Greenhill
John Henderson
Bill McGregor
David Nash
Conal Tuohy

Project website

Digital Daisy Bates website

