Doing great things with small languages

Australian Research Council Discovery Project DP0984419 (2009-2014)

An ARC-funded project in the Linguistics program at the University of Melbourne

Language documentation

Linguists routinely record endangered minority languages for which no prior documentation exists. This vitally important work often captures language structures, as well as knowledge of the culture and physical environment, that would otherwise be lost. However, while the interpretation and analysis of these data are typically published, the raw data are rarely made available. The data – tapes, field notes, photographs, and video – are often not properly described, catalogued, or made accessible, especially in the absence of a dedicated repository. As a result, enormous amounts of data, often the only information we have on disappearing languages, remain inaccessible both to the language community itself and to ongoing linguistic research.

The data that we create as part of our normal intellectual endeavour should be reusable, both by ourselves and by others: first, because any claims that we make based on that data must be replicable and testable by others; and second, because the effort of creating a digital representation of the data should not be duplicated later, but should serve as a foundation that others can build on. Scholars in many disciplines face the same issue: how to build on existing knowledge, and how to make the new data created in the course of research projects available so that the broader research community can benefit from it. This is all the more important when a linguist makes the only recordings of an endangered language, one that may no longer be spoken in the near future. Australia and its immediate neighbours are home to a third of the world’s languages, most of which may never be recorded and many of which could include completely novel structures or ways of viewing the world. To improve our capacity to provide good records of these languages, we will need the kind of methodology being developed by this project.

How do we embed theoretical research in responsible fieldwork so that we create good primary data for long-term reuse by the speaker communities we work with and by other researchers? How can we build shared digital infrastructure to support collaborative research, both within Australia and internationally? How can we improve theoretical research by providing access to large amounts of primary data, selected for its relevance to our needs? At a time when many small languages are in danger of being lost, how can we make adequate records of them part of our normal work practices? Adoption of the methods developed in this project will greatly extend the knowledge base of the discipline: first, because we will provide a mechanism for locating existing information via standard metadata and search tools; and second, because we will be training new researchers to create primary data in forms suited to discovery and dissemination.

This project is funded by the Australian Research Council grant DP0984419.
Chief Investigators: Nick Thieberger and Rachel Nordlinger.