WCCFL25 abstract 1 of 2
Stages of OT Phonological Acquisition and Error-Selective Learning
This paper addresses how an OT learning model can be guaranteed to reach its target grammar while also
approximating the observed stages of natural L1 acquisition along the way.
My starting point is an error-driven learner that uses the Recursive Biased Constraint Demotion
(RBCD) algorithm (Tesar and Smolensky 2000, Prince and Tesar 2004). RBCD is an effective learner of OT
grammars because it keeps track of all its errors in the "Support", a table that records, for each error,
which constraints prefer the target form (W) and which prefer the error (L), as in (1); it then reasons from
the Support's data to the necessary constraint rankings. What RBCD does not attempt to do is model the
stages by which children re-rank as they learn their L1 in natural settings. Any intermediate ranking that
the RBCD learner passes through can last only until a single error proves it wrong: RBCD cannot learn
partially from data. However, numerous studies of child acquisition show that learning does involve gradual
progress away from the initial state of Markedness >> Faith (e.g. Fikkert 1994, Gnanadesikan 1999/2004,
Levelt, Schiller and Levelt 1999) even when children have clearly made the errors that would push a
pure RBCD learner all the way to the correct grammar. For example, if a learner adds to its Support an
error on a word with a complex coda, as shown in (2), it will immediately learn that both NoCoda and
NoComplexCoda rank below Faithfulness; it cannot choose merely to demote NoCoda. Yet, as Levelt et
al. point out, children learning Dutch, who encounter and attempt many frequent words with complex
codas, nevertheless go through stages in which their productions are faithful only up to singleton codas,
reflecting an intermediate ranking of NoComplexCoda >> Faith >> NoCoda (see (3)).
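The step from Support to ranking can be illustrated with a short Python sketch. It assumes a simplified version of Biased Constraint Demotion: a constraint is rankable if it prefers no loser in any still-unexplained Support row, markedness constraints are ranked first whenever possible, and when none is rankable, all rankable faithfulness constraints are ranked together (the full algorithm of Prince and Tesar 2004 chooses among faithfulness constraints more carefully). The function name `bcd` and the dictionary encoding of Support rows are illustrative choices, not part of the proposal.

```python
# Sketch: a simplified Biased Constraint Demotion over a Support.
# Assumption: when no markedness constraint is rankable, every rankable
# faithfulness constraint is ranked at once (the full algorithm is finer-grained).

MARKEDNESS = {"NoCoda", "NoComplexCoda"}
FAITHFULNESS = {"Max-Seg"}

# Support rows: constraint -> 'W' (prefers winner), 'L' (prefers loser), 'e' (no preference)
PUS_ROW = {"NoCoda": "L", "NoComplexCoda": "e", "Max-Seg": "W"}   # (1b) /pus/ pus ~ pu
ERST_ROW = {"NoCoda": "L", "NoComplexCoda": "L", "Max-Seg": "W"}  # (2a) /erst/ erst ~ e


def bcd(support, markedness=MARKEDNESS, faithfulness=FAITHFULNESS):
    """Return a stratified ranking: a list of sets, highest stratum first."""
    remaining = set(markedness) | set(faithfulness)
    rows = list(support)  # winner~loser pairs not yet explained by the ranking
    strata = []
    while remaining:
        # Rankable: prefers no loser in any unexplained row.
        rankable = {c for c in remaining if all(r.get(c, "e") != "L" for r in rows)}
        if not rankable:
            raise ValueError("inconsistent Support: no constraint can be ranked")
        # Bias: rank markedness as high as possible; faithfulness only when needed.
        stratum = (rankable & set(markedness)) or (rankable & set(faithfulness))
        strata.append(stratum)
        remaining -= stratum
        # A row is explained once some newly ranked constraint prefers its winner.
        rows = [r for r in rows if all(r.get(c, "e") != "W" for c in stratum)]
    return strata


for support in ([], [PUS_ROW], [ERST_ROW]):
    print(bcd(support))
```

With an empty Support the sketch returns the initial state Markedness >> Faith. The single row from (1b) yields NoComplexCoda >> Max-Seg >> NoCoda, while the row from (2a) places Max-Seg above both NoCoda and NoComplexCoda at once, which is why a pure RBCD learner cannot demote NoCoda alone after such an error.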
The proposal here is that stages of acquisition like (3) should be modeled by retaining the RBCD
algorithm for re-ranking, but combining it with a more selective method of choosing errors to learn from.
I will call this approach Error-Selective Learning. Under this approach, making an error does not
immediately trigger an update of the Support or a cycle of re-ranking; errors are merely stored in an Error
Table like (4a). Learning is triggered only when some constraint in the Error Table exceeds the "violation
threshold", meaning that it has assigned Ls to more than some number of different forms in the Error
Table. Once a constraint has exceeded the violation threshold, learning proceeds in two steps. First, the
learner identifies the row in the Error Table that is the best error to learn from; second, the learner adds
just that error to the Support and then applies RBCD to the Support to find a new ranking. The
Error-Selective learner chooses as its best error the row in the Error Table that:
a) has an L assigned by the constraint exceeding the violation threshold, and among those, the one that
b) has the fewest Ls assigned by other Markedness constraints, and among those, the one that
c) has the most Ws assigned by Faithfulness constraints.
After adding this error to the Support and using RBCD to find a new constraint ranking, the learner
empties its Error Table and starts from scratch, making errors with its new grammar and adding them to
the Error Table until some constraint exceeds the violation threshold and learning begins again.
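The trigger and the best-error criteria (a) to (c) can be sketched as follows, using the Error Table in (4a). The threshold value of 2, the tuple layout of table rows, the function names `over_threshold` and `best_error`, and tie-breaking by table order are all assumptions made for illustration; the proposal itself leaves the threshold as "some number".

```python
# Sketch of the Error-Selective trigger and best-error choice.
# Assumptions: threshold value, row layout, and tie-breaking are illustrative.
from collections import defaultdict

MARKEDNESS = {"NoComplexOnset", "NoCoda", "NoComplexCoda"}
FAITHFULNESS = {"Max-Seg"}

# Error Table (4a): (input, winner ~ loser, constraint -> 'W' / 'L' / 'e')
ERROR_TABLE = [
    ("/pus/", "pus ~ pu",
     {"NoComplexOnset": "e", "NoCoda": "L", "NoComplexCoda": "e", "Max-Seg": "W"}),
    ("/erst/", "erst ~ e",
     {"NoComplexOnset": "e", "NoCoda": "L", "NoComplexCoda": "L", "Max-Seg": "W"}),
    ("/plant/", "plant ~ ba",
     {"NoComplexOnset": "L", "NoCoda": "L", "NoComplexCoda": "L", "Max-Seg": "W"}),
]


def over_threshold(table, threshold):
    """Constraints that assign an L to more than `threshold` different inputs."""
    forms_with_l = defaultdict(set)
    for inp, _pair, marks in table:
        for constraint, mark in marks.items():
            if mark == "L":
                forms_with_l[constraint].add(inp)
    return {c for c, forms in forms_with_l.items() if len(forms) > threshold}


def best_error(table, trigger):
    """Pick the row to add to the Support, following criteria (a)-(c)."""
    # (a) only rows on which the triggering constraint assigns an L
    candidates = [row for row in table if row[2].get(trigger) == "L"]

    def rank_key(row):
        marks = row[2]
        # (b) fewest Ls from other Markedness constraints
        other_ls = sum(marks.get(c) == "L" for c in MARKEDNESS - {trigger})
        # (c) most Ws from Faithfulness constraints (negated so min() prefers more)
        faith_ws = sum(marks.get(c) == "W" for c in FAITHFULNESS)
        return (other_ls, -faith_ws)

    # min() keeps the earliest row on a full tie; the abstract leaves ties open.
    return min(candidates, key=rank_key)


print(over_threshold(ERROR_TABLE, threshold=2))  # {'NoCoda'}
print(best_error(ERROR_TABLE, "NoCoda")[0])      # /pus/
```

Here only NoCoda assigns Ls to more than two forms, and /pus/ wins because no other Markedness constraint assigns it an L. After that row is added to the Support and RBCD re-ranks, the Error Table would be emptied (in this sketch, `ERROR_TABLE.clear()`).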
In the paper, I demonstrate how slowly building up the Support in this way provides RBCD with
just the right errors to produce the kinds of stages observed in natural learning. With respect to the
example above, an Error-Selective learner of Dutch can easily pass through the intermediate grammar in
(3). The criterion in (b) above prefers an error on which, e.g., NoCoda assigns an L while other
Markedness constraints, like NoComplexCoda, do not; as (4b) shows, learning from such an error results in
a partially faithful ranking that protects singleton but not complex codas.
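The contrast between the fully faithful grammar in (2b) and the partially faithful grammar in (3b) and (4b) can be checked with a small strict-domination evaluator. This sketch assumes a hand-listed candidate set with violation counts read off those tableaux (GEN is not modeled), and `eval_ot` is a hypothetical name; comparing violation tuples lexicographically is one straightforward way to implement strict domination.

```python
# Sketch: strict-domination evaluation over a hand-listed candidate set.
# Violation counts are read off tableaux (2b), (3b) and (4b); GEN is not modeled.

ERST = {
    "erst": {"Max-Seg": 0, "NoCoda": 1, "NoComplexCoda": 1},
    "it":   {"Max-Seg": 2, "NoCoda": 1, "NoComplexCoda": 0},  # vowel quality ignored
    "e":    {"Max-Seg": 3, "NoCoda": 0, "NoComplexCoda": 0},
}
PUS = {
    "pus": {"Max-Seg": 0, "NoCoda": 1, "NoComplexCoda": 0},
    "pu":  {"Max-Seg": 1, "NoCoda": 0, "NoComplexCoda": 0},
}

FULLY_FAITHFUL = ["Max-Seg", "NoCoda", "NoComplexCoda"]      # ranking in (2b)
PARTIALLY_FAITHFUL = ["NoComplexCoda", "Max-Seg", "NoCoda"]  # ranking in (3b)/(4b)


def eval_ot(candidates, ranking):
    """Return the optimal candidate under strict domination.

    Comparing violation tuples lexicographically means a single violation of a
    higher-ranked constraint outweighs any number on lower-ranked ones.
    """
    return min(candidates, key=lambda cand: tuple(candidates[cand][c] for c in ranking))


print(eval_ot(ERST, FULLY_FAITHFUL))      # erst
print(eval_ot(ERST, PARTIALLY_FAITHFUL))  # it
print(eval_ot(PUS, PARTIALLY_FAITHFUL))   # pus
```

The same input /erst/ surfaces faithfully under the (2b) ranking but as [it] under the ranking learned from the /pus/ error, while /pus/ itself stays faithful, which is the intermediate stage in (3).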
Error-Selective RBCD learning is an alternative to existing OT learning models, which produce intermediate
stages by encoding probabilistic information directly into the grammar (e.g. the
Gradual Learning Algorithm for learning stochastic OT: Boersma and Hayes 2001, Boersma and Levelt
200X, Curtin and Zuraw 2002; see also the Maximum Entropy learner used by Goldwater and Johnson
2003). The current approach is therefore proposed in part to remove one argument against using the
classic, strict-domination view of OT in modeling realistic learning. In addition, the proposal makes
novel predictions about the ways in which input frequencies can and cannot influence the course of
phonological development.
Data (attested production errors taken from Levelt, Schiller and Levelt, 1999)
(1a) Jarmo at 1;5,2, learning Dutch:
/pus/ [pu] 'cat'
(1b) How Jarmo's production in (1a) would be recorded in the Support (e = no preference):
Input    Winner ~ Loser   NoCoda   NoComplexCoda   Max-Seg
/pus/    pus ~ pu         L        e               W
(2a) An entry in the Support (from a hypothetical error in the production of Dutch [erst] 'first')
Input    Winner ~ Loser   NoCoda   NoComplexCoda   Max-Seg
/erst/   erst ~ e         L        L               W
(2b) The grammar that RBCD learns from the winner~loser pair in (2a): fully faithful
/erst/   Max-Seg   NoCoda   NoComplexCoda
erst               *        *
et       **!       *
e        ***!
(3a) Cato at 1;10,11, learning Dutch:
/erst/ [it] 'first'
(3b) The intermediate grammar that produces [it] in (3a): partially faithful (ignoring vowel quality)
/erst/   NoComplexCoda   Max-Seg   NoCoda
erst     *!                        *
it                       **        *
e                        ***!
(4a) A sample Error Table, when NoCoda has exceeded the violation threshold
             Input     Winner ~ Loser   NoComplexOnset   NoCoda   NoComplexCoda   Max-Seg
best error   /pus/     pus ~ pu         e                L        e               W
             /erst/    erst ~ e         e                L        L               W
             /plant/   plant ~ ba       L                L        L               W
(4b) The grammar that RBCD learns from the 'best' error chosen from (4a):
(i) fully faithful to singleton codas...
/pus/    NoComplexCoda   Max-Seg   NoCoda
pus                                *
pu                       *!
(ii) ... partially faithful to complex codas
/erst/   NoComplexCoda   Max-Seg   NoCoda
erst     *!                        *
it                       **        *
e                        ***!
Selected References
Boersma, P. and B. Hayes, 2001. "Empirical tests of the gradual learning algorithm." Linguistic Inquiry 32.
Curtin, S. and K. Zuraw, 2002. "Explaining Constraint Demotion in a Developing System." Proceedings
of BUCLD26. Somerville, MA: Cascadilla.
Levelt, C., N. Schiller, and W. Levelt, 1999. "A Developmental Grammar for Syllable Structure in the
Production of Child Language." Brain and Language 68.
Prince, A. and B. Tesar, 2004. "Learning Phonotactic Distributions." In Kager, Pater & Zonneveld (eds.)
Fixing Priorities: Constraints in Phonological Acquisition. Cambridge, UK: CUP.
Tesar, B. and P. Smolensky, 2000. Learnability in Optimality Theory. Cambridge, MA: MIT Press.