LAL Seminar - Corpus annotation for typological research in discourse and grammar: the Multi-CAST initiative

Stefan Schnell, University of Bamberg

In this talk I will outline the main ideas behind Multi-CAST, a multilingual corpus project designed for corpus-based typological research into the interaction between grammatical structure and discourse within and across diverse languages. I will first give a short overview of our current corpus and in-progress developments, and then turn to the corpus annotation schemata GRAID and RefIND, which form the distinctive backbone of the project. I will also discuss specific issues in morpho-syntax and reference (zero anaphors, various types of person forms, the argument-adjunct distinction, referent status) and the related annotation practices. In the second half of the talk I will show how analyses of GRAID and RefIND annotations bear on research questions in discourse structure (referent introduction and tracking, referential choice) and on the interaction of these with argument structure and the semantic properties of arguments. Essentially, our research relates to two major areas: (production-oriented) discourse processing and information management, and language variation and change. I will briefly summarise some of our recent and current studies in these areas.

Multi-CAST is intended as a contribution to open (language) science. The corpus, related documentation, and annotation manuals can be found at https://multicast.aspra.uni-bamberg.de/. All corpus data are freely downloadable under a Creative Commons Attribution 4.0 International licence (CC BY 4.0), and are also available through the multicastR package for R.

Friday 11 September, 4:00pm

https://unimelb.zoom.us/j/97151309216?pwd=QnZtUjdqWFpRUjZseVh3bExwV25LUT09
Password: 227560