Understanding language in its natural habitat

Linguists and psychologists often study sentences in isolation, which may be akin to studying animals in separate cages in a zoo. I have been guilty of this in much of my own work. Our understanding of language will undoubtedly benefit from more focus on language in its natural habitat: conversation (e.g., Du Bois et al., 2003; Hilpert, 2017; Thompson and Hopper, 2001). In fact, the human tendency to cooperate plays a key role in how our complex system of language emerges via cultural evolution (Botha and Knight, 2009; De Boer et al., 2012; Ellis and Larsen-Freeman, 2009; Richerson and Christiansen, 2013; Steels, 2005; Tomasello, 2009). Cooperation is most evident when people use language to communicate with one another in conversations.

Language provides a constrained and discrete system that offers us a window into our even more general, creative, but constrained system of general-purpose knowledge. It allows us to teach and learn, dream and imagine, and reflect and reason in ways that are uniquely human. Moreover, a focus on the functions and distributions of constructions offers important insights about how individual constructions emerge and evolve over time (e.g., Barðdal et al., 2015; Mauri and Sansò, 2011; Traugott, 2015; Traugott and Trousdale, 2013).

There is a growing synergy among linguists, psychologists, anthropologists, and computer scientists, and so it is a very exciting time for research on language. This volume only scratches the surface of what we have already learned, as is evident in the copious list of references.

Goldberg, A. E. (2019: 163). Explain me this: Creativity, competition, and the partial productivity of constructions. Princeton University Press.

Why do conventional formulations become easier to access?

An important role for competition and error-driven learning was detailed in chapter 5, offering a way for learners to overcome overgeneralizations and learn the constraints on words and constructions. As a preferred alternative becomes more familiar through repeated exposure, it will become easier to access than the dispreferred formulation through the process of statistical preemption. That is, words that express closely related meanings or grammatical constructions that have closely related functions in discourse are in competition with one another. When multiple representations are simultaneously activated in a given context to express the same (aspects of) the intended message, the representation with the greatest strength wins, and any other representations become progressively dissociated with features of that context. This accounts for why conventional formulations become easier to access. The reason we judge novel formulations to be “wrong,” “inappropriate,” or non-native-like when there exists a more appropriate way to express the particular meaning or function is that we want to speak like “our people.” That is, shared language conventions signal that one is a member of the group. Using ball to refer to a button or saying ?Explain me this is considered “wrong” by native speakers because this is simply not how other native speakers speak.

Goldberg, A. E. (2019: 157). Explain me this: Creativity, competition, and the partial productivity of constructions. Princeton University Press.
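
The competition dynamic described in the passage above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Goldberg's model: the update rule, the learning rate, the starting strengths, and the names compete and preferred are all invented for illustration.

```python
def compete(strengths, witnessed, rate=0.2):
    """One exposure: strengthen the witnessed form, weaken its competitors.

    strengths: dict mapping each competing formulation to a strength in [0, 1]
    witnessed: the formulation actually heard in this context
    rate: hypothetical learning rate (an assumption, not from the source)
    """
    updated = {}
    for form, strength in strengths.items():
        if form == witnessed:
            # The witnessed form moves toward full strength.
            updated[form] = strength + rate * (1.0 - strength)
        else:
            # Competitors are progressively dissociated from this context.
            updated[form] = strength - rate * strength
    return updated


def preferred(strengths):
    """Return the formulation with the greatest strength (the winner)."""
    return max(strengths, key=strengths.get)


# Two formulations start out equally accessible (assumed starting point).
strengths = {"explain this to me": 0.5, "explain me this": 0.5}

# The learner repeatedly hears the conventional form in this context.
for _ in range(10):
    strengths = compete(strengths, "explain this to me")

print(preferred(strengths))
print(strengths)
```

In this sketch the dispreferred form is never explicitly corrected; it simply loses strength each time the conventional alternative is witnessed in the same context, which is the intuition behind statistical preemption.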

Usage-based approaches in a nutshell (Ellis 2012)

Usage-based approaches: some references

Usage-based theories of language hold that learners acquire constructions in a similar fashion, from the statistical abstraction of patterns of form-meaning correspondence in their usage experience, and that the acquisition of linguistic constructions can be understood in terms of the cognitive science of concept formation following the general associative principles of the induction of categories from experience of the features of their exemplars. In natural language, the Zipfian-type token-frequency distributions of the occupants of each of these construction islands, their prototypicality and generality of function in these roles, and the reliability of mappings between these, together conspire to make language learnable. Phrasal teddy bears, formulaic phrases with routine functional purposes, play a large part in this experience, and the analysis of their components gives rise to abstract linguistic structure and creativity.
Is the notion of language acquisition being seeded by formulaic phrases and yet learner language being formula-light having your cake and eating it too?

Ellis, N. (2012). Formulaic Language and Second Language Acquisition: Zipf and the Phrasal Teddy Bear. Annual Review of Applied Linguistics, 32, 17-44.
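
A Zipfian distribution of slot occupants means that one item accounts for most tokens and frequency falls off roughly in proportion to rank. The sketch below shows one way to tabulate this. The verb counts are invented toy numbers, not corpus data from Ellis (2012), and rank_frequency is a hypothetical helper name.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, item, count) triples, most frequent item first."""
    counts = Counter(tokens)
    # Sort by descending count; break ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [(rank, item, count)
            for rank, (item, count) in enumerate(ranked, start=1)]


# Invented toy sample: verbs filling one slot of a single construction.
tokens = (["put"] * 12 + ["set"] * 6 + ["place"] * 4
          + ["lay"] * 3 + ["stick"] * 2)

table = rank_frequency(tokens)
for rank, verb, count in table:
    # Under an idealized Zipfian distribution, rank * count stays
    # roughly constant across ranks.
    print(rank, verb, count, rank * count)
```

With real corpus counts the product of rank and frequency is only approximately constant, so a log-log plot of frequency against rank is the more usual diagnostic.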

Usage-based approaches and the emergence of L1

The following quotes are from Lieven, E. (2016). Usage-based approaches to language development: Where do we go from here? Language and Cognition, 8(3), 346-368. doi:10.1017/langcog.2016.16

First, young children show differential and restricted competence in comprehension and production early on; second, children’s linguistic productivity is tied closely to their linguistic experience, but this interacts with processing capacity, the developing linguistic system, and children’s communicative goals; and, finally, the development of more abstract grammar is protracted, and differing levels of abstraction will give the ability to do different tasks.

Children are exposed to many meaningful usage events which they can now begin to interpret in the context of this newly developing understanding of shared intentionality. Grammar is learned through a continuous process of abstraction. Constituency and more complex syntax emerge through this process.

In the usage-based approach, linguistic categories such as noun, verb, noun phrase, subject, and object are not pre-given but emerge as the child constructs language by connecting what they already know in terms of the cognitive and intention-reading developments of the first year to the language that they hear. 

The development of word categories is tied to children starting to develop low-scope slot-and-frames patterns based on the frequencies in the input. Examples from English are It’s X-ing, I want a Y, and That’s a Z. The slots in these patterns are the basis of emergent categories, initially of low-semantic scope such as THING or ACTION but showing increasing evidence of abstraction.
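
The idea of a slot-and-frame pattern can be made concrete with a small sketch that looks for word sequences recurring with different fillers in one position. This is an illustration under simplifying assumptions (whole utterances as frames, one slot, whitespace tokenization), not the procedure used in the studies cited. The utterances are invented and slot_frames is a hypothetical helper.

```python
from collections import defaultdict


def slot_frames(utterances, min_fillers=2):
    """Find one-slot frames that recur with several different fillers."""
    fillers = defaultdict(set)
    for utterance in utterances:
        words = utterance.lower().split()
        for i, word in enumerate(words):
            # Replace position i with an open slot and record what filled it.
            frame = tuple(words[:i] + ["_"] + words[i + 1:])
            fillers[frame].add(word)
    # Keep only frames attested with enough distinct fillers.
    return {
        " ".join(frame): sorted(words)
        for frame, words in fillers.items()
        if len(words) >= min_fillers
    }


# Invented toy input.
utterances = [
    "I want a cookie",
    "I want a ball",
    "I want a drink",
    "That's a ball",
    "That's a dog",
]

frames = slot_frames(utterances)
for frame, words in sorted(frames.items()):
    print(frame, "->", words)
```

The fillers collected for each slot (here all THING-like words) are a crude stand-in for the emergent low-scope categories described in the passage.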

Large numbers of studies, not only for English, have found that frequency in the input is closely associated with what children learn.

If something is very frequent in the input, but does not occur in the child’s speech, this suggests that there is something about the form in terms of complexity or meaning that is slowing learning.

An example comes from my study of six children’s learning of English auxiliaries (Lieven, 2008). There was a strong rank order correlation between the frequency of these in the input and the order in which they were found in the children’s speech, but there were a number of exceptions. Frames with could, would, and should were relatively frequent in the input, but in the period studied these emerged either late or not at all in the children’s speech. This is probably because these modals require a subtle semantics which the children did not yet control. Modals are a set of verbs that diverge from simple declarative sentences and questions about factuality, signalling a range of speaker stances towards the information being conveyed. Moreover, they are polysemous (being used to convey both speech acts and logical prediction), and in each usage they signal a slightly different range of speaker stances.
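
A rank order correlation of the kind mentioned above is commonly computed as Spearman's rho. The sketch below shows the calculation for the no-ties case. The numbers are invented placeholders, not data from Lieven (2008), and the item labels are left generic on purpose; ranks and spearman_rho are hypothetical helper names.

```python
def ranks(values, descending=False):
    """Assign ranks 1..n to values (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=descending)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x_ranks, y_ranks):
    """Spearman's rho from two rank lists without ties."""
    n = len(x_ranks)
    d_squared = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Invented placeholder data for six unnamed frames (NOT from the study).
input_frequency = [900, 640, 410, 300, 120, 60]  # tokens in the input
emergence_order = [1, 2, 3, 6, 4, 5]             # order of first use by the child

# Most frequent frame gets rank 1.
freq_ranks = ranks(input_frequency, descending=True)
rho = spearman_rho(freq_ranks, emergence_order)
print(round(rho, 3))
```

In this toy example the fourth frame is fairly frequent yet emerges last, which lowers rho below 1; that is the shape of exception the passage describes, where frequency alone does not predict order of emergence.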

Although children start with rote-learned strings and low-scope schemas and may retain these into adulthood, they clearly also develop the capacity to produce and comprehend at a more abstract level. 

The evidence is that the youngest children can only correctly identify the agents and patients of transitive causatives if they are presented with a prototypical coalition of cues.

From the point of view of a usage-based account, one can see these results arising from two competing processes: the deep entrenchment of SVO word order (initially with low-scope pronoun schemas) which competes with the much less frequently encountered and highly specific pragmatic contexts in which OVS word order (even with case marking) is used. This latter usage requires a coalition of contextualizing cues for its interpretation…

First, there is evidence for the storage of ‘big words’. Bannard and Matthews (2008) showed that children did better on production of 4-word sequences that were frequent in the input than identical sequences in which the last word is changed. Second, there is good evidence for the importance of low-scope, pronoun-based schemas particularly in the early stages of sentence production (Ambridge & Lieven, 2014). We know that children are significantly more likely to correct non-grammatical word orders to canonical word order as they get older (Akhtar, 1999). When presented with novel verbs in non-canonical word order, younger children tend to use the same word order when asked to produce the sentence with different nouns. However, when children do change to the correct canonical order, they are very likely to use schemas based on pronouns (e.g., He’s meeking it; Abbot-Smith, Lieven, & Tomasello, 2001; Matthews, Lieven, Theakston, & Tomasello, 2004, 2007).

On the usage-based assumption that young children learn language in order to communicate, the relationship of form to meaning is obviously a crucial area for research. However, in research on the learning of syntax, there has tended to be more of a focus on structure than on meaning. I think this has been in reaction to the emphasis on abstract structure in generativist theory and the claim that children could not learn this structure from what they hear. Usage-based researchers have been concerned to show how children can indeed abstract a grammar from the language that they hear, and to argue that generativist theories are not able to solve the ‘linking problem’ of how the hypothesized Universal Grammar interacts with the input to produce the grammar of the specific language (Ambridge, Pine, & Lieven, 2014).

A great deal of empirical evidence has shown: (1) the strong relationships between the language that children hear and the course of their language development; and (2) that children’s language builds up from low-scope patterns and heuristics to an increasingly schematic and abstract network of constructions. To build a comprehensive and psychologically realistic account of children’s language development we now need to concentrate on identifying the processing mechanisms that are involved; to seriously address the relationship between meaning and form; to account for individual differences in learning; and to extend our research to languages that provide specific challenges to the present state of our theories.

Lourdes Ortega: ethics, politics & research

Doctoral Summer School, Malta, 14 June 2019

What is a bilingual individual?

Knowledge worth knowing to whom, for what purposes, in whose interest? (Ortega, 2019)

Quantitative (QUAN) research can also adopt an ethical stance.

Train yourself in statistics that allow you to bypass the fixed idea of nativeness/non-nativeness.

Research design can be ethical and political.

Use research discourse for affirmation, not for failure.