Spanish diminutive formation without rules or constraints1
Abstract. Spanish diminutive formation is analyzed in terms of analogy, more precisely within the computationally explicit framework of Analogical Modeling of Language (AML; Skousen, 1989, 1992).
Accordingly, all known diminutives are presumed to be stored in the mental lexicon as completely formed units with associative links to their base forms. 2460 diminutive types were identified in various corpora and served as both the analogical database and the test items. When memory is unhampered by noise in the system, the probability that a previously known form will be chosen as the diminutive of its base is 100%. However, a simulation was performed in which all of the 2460 diminutives were treated as if they were previously unknown. The analogical influence of other forms allowed the correct diminutive form to be chosen in 92% of the cases. What is more, roughly half of the errors were found to be actually attested forms, which raises the success rate to 96%. Another simulation was performed which demonstrates that individual and dialectal differences in diminutive formation arise due to differing contents of the mental lexicon across speakers, as well as to the influence of competing gangs of phonologically similar base forms.
1. Introduction. The formation of diminutive variants of nouns, adjectives and certain adverbs is a highly productive process in Spanish. Diminutives express concepts such as familiarity, small size, and disdain (see Zuluaga 1993 for a discussion of the semantics of diminutives).
However, the purpose of the present study is not to investigate their semantic traits, but rather to account for the allomorphy of the diminutive suffix. Several suffixes exist (-ito, -illo, -zuelo, -ico, -uco), but -ito is the most commonly occurring, and the one on which most discussion of the subject has focused. For this reason, only the allomorphy of -ito will be considered. Diminutive formation has been the topic of a number of investigations. For example, Jaeggli (1980) discusses diminutives in Uruguayan Spanish from a classical generative standpoint, while Crowhurst (1992) and Prieto (1992) argue that they are to be dealt with in terms of prosodic constraints. Elordieta and Carreira (1996) provide an analysis within Optimality Theory, while Ambadiang (1996, 1997) makes the case that diminutive formation belongs to the realm of morphology instead of phonology. Harris’ (1994) paper is primarily a critique leveled at Crowhurst’s account, although many of his points are equally applicable to Prieto’s work. He argues that the high degree of lexical idiosyncrasy between competing diminutive forms suggests that they are not predictable on a phonological basis. Instead, he proposes that each word is marked in the lexicon as to which diminutive form(s) it will take (1994:185).
In one regard, the present study follows Harris. That is, if one assumes that a base specifies which diminutive(s) it will take, that is similar to saying that the base is associated with its diminutive form, both of which have individual representation in the mental lexicon. The difficulty with this position is that it cannot account for the productive aspect of diminutive
formation. Since base forms cannot be learned with their diminutive allomorphy prespecified, how does one go about producing a diminutive form s/he has never before heard or read? There must also be some mechanism for production, and phonological factors appear to play an important part in determining the phonological shape a diminutive form will take. The purpose of the present paper, then, is to demonstrate that diminutive formation may be accounted for without recourse to highly abstract underlying representations, rules, or constraints, but by analogy to other fully specified pairs of bases and their corresponding diminutives in the mental lexicon. The dialectal and individual variability which exist in regard to diminutive allomorphy will also be accounted for. The remainder of the paper is structured as follows. Sections 2 and 3 lay out the theoretical background and the framework upon which the present study is based. Sections 4 and 5 describe a database of diminutive forms that was compiled from a search of about 51 million words. The allomorphy displayed by each diminutive is also discussed. The resulting database is an essential part of the analogical simulation of diminutive formation that is described in Section 6. An example of how analogy may account for dialectal differences in diminutive formation appears in Section 7.
2. Theoretical Background. In the traditional generative approach to language, the lexicon is presumed to contain morphemes, and only those aspects of words that are unpredictable. Morphologically complex words are assembled, and predictable features of a word are added by means of rules. This state of affairs is deemed necessary based on the assumption that the amount of storage space available in the brain is limited. Accordingly, language acquisition is a matter of tacitly deriving the rules of a language based on the linguistic input received, in
combination with genetically inherent linguistic abilities. This view results in a very minimal lexicon, and requires a great deal of computation. A difficulty with this theoretical stance is that much of the machinery required for computation, such as abstract underlying representations, the cycle, and underspecification, has been called into question (Burzio 1996; Cole 1995; Cole and Hualde 1998; Steriade 1995).
In addition, psychological correlations to such mechanisms are highly dubious (Eddington 1996b; Lamb 2000).
The opposing view is that the lexicon includes vast amounts of stored information that is redundant and predictable, including detailed phonetic information about individual word tokens (Brown and McNeill 1966; Bybee 1994; Pisoni 1997).
This possibility was suggested at an early period in generative history (Halle 1973; Jackendoff 1975), and several more recent theoretical proposals assume that most known words are stored as wholes in the lexicon (Butterworth 1983; Bybee 1985, 1988, 1998; Stemberger 1994).
The psychological literature also contains empirical evidence to support massive lexical storage (e.g. Alegre and Gordon 1999; Baayen, Dijkstra, and Schreuder 1997; Bybee 1995; Manelis and Tharp 1977; Sereno and Jongman 1997).
In fact, storage may go beyond individual words and encompass recurrent word combinations as well as entire phrases (Bod 1998; Bybee 1998; Pawley and Syder 1983).
It also appears that storage limitations on memory are not as problematic as previously supposed. For instance, Palmeri, Goldinger and Pisoni (1993) and Goldinger (1997) provide evidence which suggests that individual word tokens are stored in long-term memory. This position, nevertheless, is not without difficulties of its own. Language is characterized by its productivity. If all forms are merely listed, how are new and previously unknown forms processed? Generally, those who maintain massive storage suggest an
analogical process of some sort to account for productivity, but the exact nature of the analogical process invoked is, more often than not, left unspecified. The present study incorporates an explicit model of analogy that fills this void. It entails storage of fully specified pairs of bases and their corresponding diminutives, and a precise procedural algorithm for choosing the correct diminutive allomorph when the diminutive form is novel or temporarily inaccessible from memory.
3. Analogical Modeling of Language (AML).
AML is a model designed to predict linguistic behavior on the basis of stored memory tokens (Skousen 1989, 1992, 1995, 1998).
In this regard, it is similar to other exemplar-based models (Aha, Kibler, and Albert 1991; Medin and Schaffer 1978; Riesbeck and Schank 1989; see Shanks 1995 for an overview of exemplar models; see Daelemans, Gillis, and Durieux 1994 for a comparison of AML and Aha et al.).
AML makes its predictions on the basis of a given context. A given context is a set of variables that represents linguistic information about the entity whose behavior is being predicted. These variables may represent a phoneme in a certain position in a word, a part of speech, or a sociolinguistic or morphological variable. The reader is referred to Skousen (1989, 1992) for a detailed treatment of the AML algorithm, but a brief sketch of the model is in order. For the sake of simplicity, let us assume that the given context contains information about a single word whose behavior we want to predict. AML searches the database, (which represents the mental lexicon), for words which share variables with the given context, and creates groups of database items called subcontexts. Of course, words which share more variables with the given context will appear in more subcontexts. Subcontexts are further
combined into more comprehensive groups called supracontexts. Upon inspection, some subcontexts will be homogenous, that is, the members ‘agree’ or exhibit the same behavior. (Behavior in this sense could mean that the members are all of the same syntactic class, take the same suffix, undergo the same phonological process, etc.) Other subcontexts will have ‘disagreements’ in that they contain members with differing behaviors; these subcontexts are said to be heterogenous. By minimizing disagreements and eliminating members of heterogenous subcontexts, database items belonging to the most clear-cut areas of contextual space (homogenous subcontexts) are available to exert their influence on the behavior of the given context. Three important effects result from the application of AML’s algorithm (Skousen 1995: 217).
The gang effect is obtained because when there is a large group of items which are similar to the given context, each member is available as a potential analog. Database items which have a great deal in common with the given context will appear in many different subcontexts, and have a greater chance of affecting the behavior of the given context in comparison to those items which have less in common. This is called proximity. Finally, heterogeneity occurs when an item in the database is eliminated from consideration as an analog because there is another item, with a different behavior, that is closer to the given context. The analogical set is arrived at once all members of heterogenous subcontexts have been eliminated. AML uses the items in the analogical set to calculate the probability that the given context will be assigned one of the behaviors manifest in the items in the database. In general, what AML calculates is that the behavior of the words most similar to the given context predicts the behavior of the given context, although the behavior of less similar words has a small chance
of applying, as long as those words appear in homogenous subcontexts. It is important to note that AML predicts the behavior of one given context on the basis of the behavior of lexical items in the analogical set. All predictions are made locally, and no global generalizations are abstracted from the data. There are two ways in which the analogical set may be used (Skousen 1989:82).
The first, called selection by plurality, is used to determine the ‘winner.’ Accordingly, the most commonly occurring behavior in the analogical set is applied to the given context. This is similar to the way in which a connectionist model overcomes competing influences and settles on a single output. In a nearest neighbor approach which identifies more than one neighbor, the behavior demonstrated by the majority of the nearest neighbors is declared the winner. Of course, not all research questions involve deciding which behavior ultimately beats its competitors. Measuring leakage between behaviors is often of theoretical interest as well. Extreme cases of leakage occur when one word may exhibit two or more behaviors. For example, Wulf (1998) demonstrates how AML is able to predict leakage between alternating plural forms of certain low frequency German nouns. In less extreme cases, leakage indicates the direction in which slips-of-the-tongue and/or language change may progress. The second, called random selection, allows one to determine whether an item is mostly surrounded by other items with the same behavior, or the degree to which there are other items with different behaviors bearing similarities to the item in question. Random selection uses the probabilities, calculated by the algorithm, that a specific behavior will apply to a given item. It essentially involves randomly selecting one of the members of the analogical set, and applying the behavior of that member to the given context.2 Behaviors that are more frequent in the analogical set have a higher probability of applying.
These two types of selection may also apply to nearest neighbor models in a similar fashion. Assume, for example, that five neighbors have been chosen for a given context, one of which has behavior A and four behavior B. Nearest neighbor models most often select by plurality and would declare B to be the winner. However, random selection shows that the given context is not completely surrounded by other items with behavior B. It also demonstrates some leakage toward behavior A. Since behavior is determined in terms of a given context, no global characterization of the data is made, as is the case with rule, constraint, and prototype approaches. This implies that the variables which may be important in determining the behavior of one given context may not be important in determining the behavior of a different one (see Skousen 1995: 223-226 for an example).
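The five-neighbor example can be made concrete with a short sketch. It is only an illustration under simplifying assumptions: the analogical set is represented as a flat list of behavior labels, each member counts equally, and the function names are hypothetical rather than part of any AML implementation.

```python
import random
from collections import Counter

def selection_by_plurality(analogical_set):
    """Return the behavior that occurs most often in the analogical set."""
    return Counter(analogical_set).most_common(1)[0][0]

def behavior_probabilities(analogical_set):
    """Return the share of the analogical set that shows each behavior."""
    counts = Counter(analogical_set)
    total = sum(counts.values())
    return {behavior: n / total for behavior, n in counts.items()}

def random_selection(analogical_set, rng=random):
    """Pick one member at random and return its behavior."""
    return rng.choice(analogical_set)

# One neighbor with behavior A and four with behavior B, as in the text.
neighbors = ["A", "B", "B", "B", "B"]

winner = selection_by_plurality(neighbors)      # plurality picks B
leakage = behavior_probabilities(neighbors)     # A keeps a 1-in-5 share
```

Selection by plurality reports only the winner, whereas the probabilities used in random selection preserve the information that one fifth of the set points toward behavior A.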
Perhaps the most attractive part of an analogical approach is its simplicity. It is based on the fairly uncontroversial idea that words are stored in the mind and retrieved as necessary. That groups of similar words can affect the behavior of other words with similar characteristics is well-attested in the psycholinguistic literature (e.g. Bybee and Slobin 1982; Stemberger and MacWhinney 1988). There is also ample evidence that behavior may be based on stored exemplars (Chandler 1995; Eddington 2000; Hintzman 1986, 1988; Hintzman and Ludlam 1980; Medin and Schaffer 1978; Nosofsky 1988).
4. Selection of the Database. In an analogical approach to language, a database of linguistic forms is needed from which analogs may be selected. For this reason, a database of existing diminutives was compiled. There are, however, other reasons for considering a large number of
instances when attempting to account for a linguistic phenomenon. Basing an analysis on a limited number of examples is always risky since one is often predisposed to find examples which coincide with one’s particular preconceived assumptions, and to overlook others. For example, Morin (1999) demonstrates that the criteria proposed to distinguish between Spanish words which end in word markers, and those that do not, are not supported when a much larger number of examples is considered. In a similar vein, Eddington (1996a) finds that when a large number of instances is considered, the relationship between certain derivational suffixes and diphthongization in Spanish word stems is far from binary, as previous investigation had considered it to be. In the literature on diminutives, it is often unclear from what source the authors derive the diminutives on which they base their analyses; most dictionaries include few citations for diminutive forms, and those that do appear often have a lexicalized meaning apart from that of the diminutive. In Prieto’s study (1992), some diminutive forms were elicited from native Spanish speakers by means of a survey. In the present study, diminutives were extracted from several corpora: the Alameda and Cuetos frequency dictionary (1995; 5 million words), the LEXESP tagged frequency dictionary (Sebastián, Cuetos, and Carreiras, in preparation; 3 million words3), a corpus of spoken peninsular Spanish (Marcos Marín, no date a; 1 million words), and a corpus of Argentine Spanish (Marcos Marín, no date b; 2 million words).
In addition to these sources, Mark Davies of Illinois State University graciously provided me with the diminutive forms from his corpus project totaling 39.8 million words.4 Therefore, the resulting diminutives were gleaned from a pool of 50.8 million words. Both written and spoken registers are included, although spoken sources comprise only about 7% of the sample. Samples from every Spanish-speaking country (with the exception of Equatorial Guinea) are included, but no effort was made to balance each country’s representation in the database. A total of 2466 diminutive types were identified. Type frequency was used in the present study since research suggests that type frequency is more relevant to the analogical extension of a pattern than token frequency (Bybee to appear; Wang and Derwing 1986, 1994).
5. Diminutive Allomorphy in the Database. Each diminutive was grouped by hand according to the allomorphic relationship that it has with its base form. In this way, 13 major allomorphs were identified.5 However, 13 of the database items demonstrate unusual changes in the diminutives which are not found in any of the 13 groups, and which are found in three or fewer items. For example, the proper names Antonio and Antonia have diminutives with a palatalized nasal: Antoñito, Antoñita. The diminutives of caliente6 ‘hot’ and independiente ‘independent’ are odd in that they lose their diphthongs, yielding calentito and independentitas. This is an unusual outcome given the fact that every other word with a diphthong maintains it in the diminutive: prieto ‘tight’ > prietito, cuento ‘story’ > cuentito. Additionally, other diminutives were found which must be considered isolates since they do not fit into any of the 13 major categories described below: fútbol ‘football’ > futbito, pie ‘foot’ > piececito, café ‘coffee’ > cafelito, cafetito, dos ‘two’ > dositos, José ‘Joseph’ > Joselito, azúcar ‘sugar’ > azucarlito, lejos ‘far away’ > lejecitos, and diagnosis ‘diagnosis’ > diagnosito. These items were also categorized and included in the database. In addition, the base form of valsecito ‘waltz’ was included in two different categories since it was impossible to determine whether the base form was vals or valse. Once the database was completed, six items were chosen at random, and deleted, in order
to yield a number of items divisible by ten. The result was a database containing 2460 different diminutives. With the exception of the odd items just discussed, the remaining 99.5% of the database items fall into one of 13 major categories. A circled V or S indicates that that particular element of the base form does not appear in the diminutive form:
(1) -ⓋITO(S): -ito(s) is added to the singular base form, replacing the final vowel: minuto ‘minute’ > minutito, elefante ‘elephant’ > elefantito.
(2) -ⓋITA(S): -ita(s) is added to the singular base form, replacing the final vowel: galleta ‘cookie’ > galletita, Lupe ‘proper name’ > Lupita.
(3) -ⓋECITO(S): -ecito(s) is added to the singular base form, replacing the final vowel: vidrio ‘glass’ > vidriecito, quieto ‘peaceful’ > quietecito.
(4) -ⓋECITA(S): -ecita(s) is added to the singular base form, replacing the final vowel: yerba ‘grass’ > yerbecita, piedra ‘stone’ > piedrecita.
(5) -CITO(S): -cito(s) is added to the singular base form: traje ‘suit’ > trajecito, pastor ‘shepherd’ > pastorcito.
(6) -CITA(S): -cita(s) is added to the singular base form: joven ‘young girl’ > jovencita, llave ‘key’ > llavecita.
(7) -ITO(S): -ito(s) is added to the singular base form: normal ‘normal’ > normalito, Andrés ‘Andrew’ > Andresito.
(8) -ITA(S): -ita(s) is added to the singular base form: nariz ‘nose’ > naricita, Isabel ‘Isabella’ > Isabelita.
(9) -ECITO(S): -ecito(s) is added to the singular base form: pez ‘fish’ > pececito, rey ‘king’ > reyecito.
(10) -ECITA(S): -ecita(s) is added to the singular base form: flor ‘flower’ > florecita, luz ‘light’ > lucecita.
(11) -ⓋⓈITOS7: -itos is added to the singular base form, replacing the vowel and false plural morpheme: lejos ‘far away’ > lejitos, Marcos ‘Mark’ > Marquitos.
(12) -ⓋⓈITAS: -itas is added to the singular base form, replacing the vowel and false plural morpheme: Lucas ‘Luke’ > Luquitas, garrapatas ‘tick’ > garrapatitas.
(13) -ⓋCITA(S): -cita(s) is added to the singular base form, replacing the final vowel: jamona ‘fat woman’ > jamoncita, patrona ‘patron saint’ > patroncita.
Table 1 categorizes the contents of the database in terms of a number of important features. ++Insert Table 1 Here++ It is important to note that in some cases, diminutive formation would appear to produce sequences of [jí] in the rhyme of the penult syllable: [lím.pjo] ‘clean’ > *[lim.pjí.to], [ar.ma.rjo] ‘closet’ > *[ar.ma.rjí.to]. However, [ji] is a non-occurring rhyme in Spanish, which is why the glide does not appear (Elordieta and Carreira 1996:55; Harris 1994:182; Prieto 1992:196).
Instead, the corresponding diminutives are [lim.pí.to] and [ar.ma.rí.to]. Sequences of [i+i] are also attested in one diminutive in the database (tiíta ‘aunt’), but even this is unusual enough that the sequence results in a single high vowel in alternate diminutive forms: tito ‘uncle’, tita ‘aunt’. ++Insert Table 2 here++ An analysis of the resulting database reveals a number of interesting facts. First, it contains quite a few doublets, that is, different diminutive forms of the same base form (Table 2).
This is not unusual given the extensive corpora from which the diminutives were gleaned, along
with the fact that the database cuts across many dialects. As Prieto (1992:170) indicates, one of the major dialectal differences involves how the diminutives of bisyllabic words containing one of the diphthongs /je/ or /we/8 are formed; 44% of the doublets have stems containing diphthongs of this sort. The database also supports the tendency for bisyllabic -e final words to form diminutives with the addition of -cito/a (Categories 5 and 6; Crowhurst 1992; Elordieta and Carreira 1996; Prieto 1992).
Only 13 of the 90 base words of this type have diminutives that run counter to this tendency (e.g. leche ‘milk’ > lechita).
On the other hand, base words with three or more syllables generally take diminutives with the addition of -Ⓥito/a (Categories 1 and 2).
The sole exception found is retoque ‘retouch’ > retoquecito. However, retoquito is also an attested form (see Table 3). In contrast to the evidence from the corpora, Prieto’s (1992:174) 12 informants produced diminutives such as retoquecito as possible variants of 12 of the 13 test words they were presented (e.g. estuche ‘case’ > estuchecito; chocolate ‘chocolate’ > chocolatecito).
Perhaps the oddest of all diminutive forms are those which appear to involve infixation of -it- before a word final -or or -ar: Víctor ‘Victor’ > Victítor, azúcar ‘sugar’ > azuquítar, ámbar ‘amber’ > ambítar. Their unusual status is evident in that rule accounts of these forms require modifications in order to yield the correct outcome (Crowhurst 1992; Prieto 1992; Jaeggli 1980).
Furthermore, it is unclear from the literature in which Spanish dialect such diminutives may be found. The fact that not one instance of this kind of diminutive was found in 51 million words of text suggests that if they exist at all, they are extremely uncommon, or possibly low-prestige forms that would not be found outside the familiar spoken register.
6. The role of the database in analogy. The critical part of this study is determining the extent to which the AML algorithm is able to account for diminutive allomorphy. To this end, information about the base form of each diminutive was converted into a series of variables. The base form is the uninflected noun, adjective, gerund or adverb from which the diminutive is derived. For example, the base form of ratoncito ‘little mouse’ is ratón. The variables were chosen in accordance with the principles of distinguishability and proximity (Skousen 1989:52).
Proximity involves choosing those variables that are closest to the phenomenon that is being predicted. Since diminutive formation occurs word-finally, the most relevant features are those that appear toward the end of the word. Therefore, the variables included the following information about each base form: 1) the existence, and stressed or unstressed status, of the final three syllables; 2) the gender of the word: masculine, feminine or none in the case of adverbs and gerunds; 3) the word’s final phoneme; 4) the phonological content of the antepenult rhyme and final two syllables of the word. The criterion of distinguishability suggests that each word should be represented with enough variables that it is distinct from every other word in the database. One thing that makes it impossible for each database item to have a unique set of variables is the existence of doublets (Table 2).
For example, both ratonito and ratoncito are attested diminutives of the same base form ratón. Therefore, both forms are represented with the same set of variables. Of course, one entry for ratón specifies that its diminutive is of the sort found in category 5 (i.e. ratoncito, see section 5), while the other entry indicates that its diminutive is of the type found in category 7 (i.e. ratonito).
However, category markers are not treated as variables when analogies are made; instead, they specify the kind of relationship that is found between a base and its diminutive.
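One way to picture such an entry is sketched below. The field names, the value labels, and the rough transcription of ratón are assumptions made for illustration; the source lists the four kinds of information encoded but not the exact variable layout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BaseVariables:
    # Status of antepenult, penult and final syllable (labels are assumed):
    # "stressed", "unstressed", or "absent".
    syllable_status: tuple
    gender: str                  # "masculine", "feminine", or "none"
    final_phoneme: str
    antepenult_rhyme: str        # empty string when there is no antepenult
    final_two_syllables: tuple   # content of the penult and final syllables

@dataclass(frozen=True)
class DatabaseEntry:
    base: str
    variables: BaseVariables
    category: int                # outcome label (1-13), not a variable

# Hypothetical encoding of the base form ratón.
raton_variables = BaseVariables(
    syllable_status=("absent", "unstressed", "stressed"),
    gender="masculine",
    final_phoneme="n",
    antepenult_rhyme="",
    final_two_syllables=("ra", "ton"),
)

# The doublet: identical variables, different category markers.
raton_cito = DatabaseEntry("ratón", raton_variables, category=5)  # ratoncito
raton_ito = DatabaseEntry("ratón", raton_variables, category=7)   # ratonito
```

Keeping the category outside the variable set means that similarity between items is computed on the variables alone, while the category records which base-diminutive relationship each stored pair exhibits.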
In section 5, 13 major categories of diminutive types are described. For example, both pueblo and cuerno are specified as members of category 3 in the database. This means that the morphophonemic relationship that holds between pueblo ‘town’ and its diminutive pueblecito is the same one that holds between cuerno ‘horn’ and cuernecito. Therefore, if these two words are chosen as analogs for the word cuervo ‘crow,’ diminutivization by analogy is assumed to take the form of a proportional analogy: pueblo : pueblecito, cuerno : cuernecito :: cuervo : ? Exactly how speakers derive the diminutive (e.g. infixation of -ecit-, or deletion of -o and suffixation of -ecito) is largely unimportant. However, Bybee’s (1988) conception of morphology as networks of links between stored lexical items suggests another way of viewing the analogical process. Consider Figure 1 which represents a very simplified state of affairs. ++Insert Figure 1 Here++ The solid lines conjoining base forms and diminutives indicate phonological similarities between the stored bases and their diminutives which have already formed links due to their semantic similarity. In like manner, the diminutive suffixes of each word are linked to each other, as well as to other diminutives. It is these relationships that are assumed when pueblo and cuerno are marked as taking category 3 diminutives. The dotted lines between cuervo, cuerno, and pueblo represent phonological similarities that are activated when pueblo and cuerno are chosen as analogs for cuervo. The next step builds on Skousen’s (1992) proposition that once constructed, analogical sets may be stored. Or perhaps it is better to assume that the set is not stored per se, but that the
members of a set come to form links with each other based on their shared similarities. Therefore, pueblo and cuerno have presumably cooccurred in other analogical sets in the past, which is why there is a link between the diphthong they have in common. Once cuervecito has been chosen as the diminutive of cuervo, it will form new links with pueblo and cuerno on the basis of their phonological similarities. It is in this way that connections are formed between words that are semantically and phonologically similar.
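The proportional analogy pueblo : pueblecito, cuerno : cuernecito :: cuervo : ? can be worked through mechanically. The sketch below adopts one of the two derivations mentioned above (deletion of the final vowel plus suffixation) purely for convenience, operates on spelling rather than on phonological representations, and uses hypothetical function names.

```python
VOWELS = "aeiou"

def infer_suffix(base, diminutive):
    """Return what the diminutive adds once the base's final vowel is dropped."""
    if base[-1] not in VOWELS:
        raise ValueError(f"{base!r} does not end in a vowel")
    stem = base[:-1]
    if not diminutive.startswith(stem):
        raise ValueError(f"{diminutive!r} is not built on the stem of {base!r}")
    return diminutive[len(stem):]

def solve_proportion(analogs, target):
    """Apply the relationship shared by all analog pairs to the target base."""
    suffixes = {infer_suffix(base, dim) for base, dim in analogs}
    if len(suffixes) != 1:
        raise ValueError("the analogs do not share a single relationship")
    if target[-1] not in VOWELS:
        raise ValueError(f"{target!r} does not end in a vowel")
    return target[:-1] + suffixes.pop()

analogs = [("pueblo", "pueblecito"), ("cuerno", "cuernecito")]
result = solve_proportion(analogs, "cuervo")
```

With these two category 3 analogs the shared relationship is "drop the final vowel, add -ecito", which yields cuervecito for cuervo, the form cited in the text.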
7. The AML simulation. A ten-fold cross validation simulation was performed. This entailed dividing the database into ten groups of equal size. One group was then removed and its members served as the test cases. The members of the remaining nine groups comprised the training set from which analogs were sought. Each group served as the test set only once. If a member of the training set matched a test item exactly, it was not considered as a possible analog. In this way, the influence of one member of a doublet on another was eliminated. Selection by plurality (see Section 3) was assumed since the goal in this simulation was to predict a winner from among the possible outcomes. Under these conditions, the AML algorithm assigned the wrong diminutive suffix to only 198 items, resulting in a success rate of 92%. What this indicates is that there is a great deal of analogical consistency; base forms which take the same diminutive suffix have many features in common, enough that the large majority of them can serve as analogs for each other under conditions of imperfect memory, or if the items are treated as novel. Some errors involved incorrect diminutive allomorphy, but correct gender markers: parte ‘part’ > *partita. Others erred in terms of gender assignment: carnal fem. ‘buddy’ > *carnalito, sofá ‘couch’ > *sofita.
Other errors entailed applying a suffix such as -Ⓥito/a to words without final vowels, such as verdad ‘truth’. Nevertheless, many of the errors appeared to be plausible diminutive forms. The doublets demonstrated this in that errors on one member of the doublet almost always entailed assigning it the diminutive suffix of the other member. In addition, in 65% of the cases of misassignment of a doublet, the second most probable behavior was the correct one. These results occurred in spite of the fact that when tested, both members of a doublet were excluded from the database, and were unable to serve as analogs for each other. In order to determine if other erroneous diminutive forms predicted by AML were actually well-formed diminutives in some dialect of Spanish, the World Wide Web was consulted. All erroneous forms (with the exception of errors of the type found in verdad) were sought on Spanish language pages. Of the 198 errors, attested forms of 104 were found, either as an attested doublet in the database (see Table 2) or on a Spanish language web page (Table 3).
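The cross-validation design described at the start of this section can be sketched as follows. This is a schematic of the data handling only, not of the AML algorithm itself: entries are assumed to be (variables, category) pairs, the toy database is invented filler apart from a stand-in for the ratón doublet, and the function names are hypothetical.

```python
import random

def ten_fold_splits(entries, k=10, seed=0):
    """Yield (training_set, test_set) pairs; each entry is tested exactly once."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test_set = folds[i]
        training_set = [e for j in range(k) if j != i for e in folds[j]]
        yield training_set, test_set

def candidate_analogs(training_set, test_entry):
    """Drop training entries whose variables match the test entry exactly."""
    test_variables, _category = test_entry
    return [e for e in training_set if e[0] != test_variables]

# Toy stand-in for the 2460-item database: (variables, category) pairs.
toy_database = [((f"base{i}",), 1 + i % 13) for i in range(2458)]
toy_database += [(("ratón",), 5), (("ratón",), 7)]   # a doublet

for training_set, test_set in ten_fold_splits(toy_database):
    for test_entry in test_set:
        analogs = candidate_analogs(training_set, test_entry)
        # A predictor would now choose a category from `analogs`.
```

Because filtering is done on the variables rather than on the whole entry, the two members of a doublet can never serve as analogs for one another, even when they fall into different folds.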
++Insert Table 3 about here++
This demonstrates that 52% of the errors made by the model are not true errors, but merely alternative diminutive forms. This is a clear indication that the model has captured the essence of diminutive formation. When only the 94 unattested forms that the model predicts are counted as errors, the overall success rate of the model reaches 96.2%.
It would be desirable to compare the results of the analogical simulation with the success rate of one of the generative approaches already cited. Regrettably, a straightforward and fair comparison of this sort is not possible for a number of reasons. In none of the studies were the rules and constraints designed to account for the full range of data found in the database. Elordieta and Carreiras’ (1996) study is arguably the extreme case in this regard: it only includes diminutives demonstrating eight of the 13 major categories found in the database, it does not include a discussion of bisyllabic bases containing the diphthongs /je/ and /we/, and it makes no mention of the existence of alternative diminutive forms of the same base. Another difficulty with making such a comparison is that some analyses (Crowhurst 1992; Jaeggli 1980) only cover diminutive allomorphy in a specific dialect. The use of abstract formal mechanisms that are not surface apparent is also troublesome. There is no doubt that by means of formalisms such as diacritics, underlying representations, and rule and constraint orderings, any of these analyses could easily be modified to account for all of the diminutives in the database, but it would be of interest to determine what predictive value these analyses have if their abstract aspects are eliminated. Unfortunately, formal mechanisms are such an integral part of these analyses that they cannot be eliminated without severely hampering their predictive power. For instance, Crowhurst distinguishes between word-final -e’s that are epenthetic and those that are terminal elements. Each one is associated with a different type of diminutive. Even Harris (1994), who argues from a generative standpoint, feels that Crowhurst’s use of a number of formal mechanisms in her study is ad hoc. Many of Harris’ criticisms are equally applicable to Prieto’s study. As a result, he maintains that it is not possible to generate a diminutive on the basis of the phonological shape of the base form. Nevertheless, the present study indicates that an analogical approach is able to achieve this goal.
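The success rates reported in this section follow from three counts given in the text: 2460 database items, 198 erroneous predictions, and 104 errors that turned out to be attested forms. The short calculation below simply reproduces that arithmetic; the variable names are illustrative.

```python
TOTAL_ITEMS = 2460      # diminutive types in the database
ERRORS = 198            # items assigned the wrong suffix in cross-validation
ATTESTED_ERRORS = 104   # erroneous forms attested as doublets or on the Web

def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole

# Success rate when every mismatch with the database counts as an error.
strict_success = percent(TOTAL_ITEMS - ERRORS, TOTAL_ITEMS)

# Share of the errors that are attested alternative forms.
attested_share = percent(ATTESTED_ERRORS, ERRORS)

# Success rate when only unattested forms count as errors.
unattested = ERRORS - ATTESTED_ERRORS
lenient_success = percent(TOTAL_ITEMS - unattested, TOTAL_ITEMS)

print(f"{strict_success:.1f} {attested_share:.1f} {lenient_success:.1f}")
# prints "92.0 52.5 96.2"
```

The attested share of 52.5% corresponds to the figure reported as 52% in the text.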
One question that the simulation does not address is which features of the base form are most important in determining the form of the diminutive. AML only makes predictions on the basis of a given context. Therefore, an inspection of the analogical set for a given context allows one to find the most relevant variables for that context alone. That is to say, an overall characterization of the data is not readily obtainable in AML. Nevertheless, it may be computed using a different analogical algorithm called TiMBL (Daelemans et al. 1999; see Eddington, to appear, for a comparison of AML and TiMBL on a similar diminutivization simulation).
Accordingly, the most relevant variables, ordered from most to least relevant, are: 1) the stressed or stressless status of the final syllable, 2) the gender of the base, 3) whether the base is monosyllabic or not and, if not, the stressed/unstressed status of the penult syllable, 4) what phoneme appears word finally, as the nucleus or coda of the final syllable, 5) whether the base has two or fewer syllables and, if not, the stressed/unstressed status of the antepenult syllable, 6) the phoneme(s) in the coda of the penult syllable, if any, 7) the phoneme(s) in the rhyme of the antepenult syllable, if any, 8) the phoneme(s) in the onset of the final syllable, 9) the phoneme(s) in the onset of the penult syllable. This hierarchy coincides to a great degree with the findings of other studies on Spanish diminutives: the most relevant variables in the base form are the number of syllables it contains, its stress pattern and gender, and its final phoneme. However, one must keep in mind that this global characterization does not preclude the possibility that a different hierarchy may hold when predicting the diminutive form of an individual base. For instance, in bisyllabic words containing /je/ or /we/ in the penult nucleus (e.g. cuenta and hierba), the content of the penult nucleus is a much more important factor than it is for words without these diphthongs.
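A global ranking of this kind can be illustrated with information gain, one of the feature-weighting measures available in memory-based learners such as TiMBL. The sketch below rests on assumptions: the text does not state which measure produced the hierarchy above, and the row layout and feature names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of outcome labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Reduction in outcome entropy obtained by knowing one feature's value."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature_index]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_features(rows, labels, names):
    """Return feature names ordered from most to least informative."""
    gains = {name: information_gain(rows, labels, i) for i, name in enumerate(names)}
    return sorted(gains, key=gains.get, reverse=True)
```

Here each row would hold the encoded variables of one base form (for instance final-syllable stress, gender, final phoneme) and each label its diminutive suffix. Because the ranking is computed over the whole database, it says nothing about which variables matter most for an individual base, which is the caveat raised in the text.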
8. Variability between diminutive forms. Most previous studies on Spanish diminutives note that there is some variability in the choice of one diminutive suffix or another (Crowhurst 1992; Harris 1994; Jaeggli 1980;9 Prieto 1992).
Elordieta and Carreiras (1996), on the other hand, make no mention of diminutives that differ from those their analysis accounts for. Nevertheless, variation exists both between dialects, and within individual speakers. Consider the diminutives of words ending in -e, for example. In general, bisyllabic words of this type take diminutives which are formed by adding -cito/a to the base form (e.g. madre ‘mother’ > madrecita).
However, bases with three or more syllables generally belong to the -±ito/a categories (e.g. comadre ‘godmother’ > comadrita).
Nevertheless, there are exceptions to this generalization. Crowhurst (1992:251) cites sangre ‘blood’ > sangrita, mugre ‘filth’ > mugrita, leche ‘milk’ > lechita, and hambre ‘hunger’ > hambrita. Harris (1994:183) cites other exceptions: tigre ‘tiger’ > tigrito, chile ‘chili’ > chilito, nene ‘boy’ > nenito. The database for the present study includes Pepe > Pepito, chocolate > chocolatito/chocolatecito, estuche > estuchito/estuchecito, comadre > comadrita/comadrecita. Prieto (1992) and Crowhurst (1992) employ a number of different generative devices to account for alternating diminutive forms. For example, to account for some of the variation, Crowhurst (1992) suggests that some speakers have a minimal word template composed of two bisyllabic feet, while others do not. This is similar to Prieto’s (1992) position. Crowhurst explains the alternation between diminutives such as dientito ‘tooth’ and dientecito by proposing that in the former, the diphthong of the stem is resyllabified in the course of the derivation in such a way that each of its components belongs to a separate syllable. In the case of dientecito, no such resyllabification occurs.
How is such variation accounted for in an analogical model? According to analogy, it is due to differences in the lexicon. Dialectal differences arise because, in the course of acquiring a language, a person adopts the diminutives that are commonly used in the surrounding speech community, and the form of these diminutives varies from dialect to dialect. Thus far, this is a fairly tautological statement: dialectal variation exists because it does. However, the differing contents of the mental lexicon from dialect to dialect mean that there is a different set of possible analogs on which to base the diminutive form of new and previously unknown diminutives. An example should clarify this position. Prieto (1992:170) notes that one of the major dialectal differences has to do with the treatment of bisyllabic words containing the diphthongs /je/ and /we/ in the stem (e.g. diente > dientito or dientecito).
The present database does not purport to represent any particular dialect; however, it may be employed to simulate dialectal differences. To this end, the database was modified. For Dialect A, all masculine bisyllabic words with /je/ or /we/ in the stem were marked as taking diminutive forms ending in -±ito, and all feminine words with the same characteristics were given diminutive forms ending in -±ita. For example, the database entry for cuento was modified so that its diminutive would be cuentito. The entry for vieja ‘old’ was marked as having the diminutive viejita. For Dialect B, these same words were considered to take -±ecito or -±ecita, depending on their gender (cuentecito, viejecita).
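The database modification just described can be sketched as follows. The record layout (dictionaries with `base`, `gender`, `syllables`, `has_je_we`, and `suffix` fields) and the plain-string suffix labels are hypothetical, chosen only to make the relabeling step concrete.

```python
def simulate_dialect(database, dialect):
    """Return a modified copy of `database` for simulated Dialect 'A' or 'B'.

    Every bisyllabic base with /je/ or /we/ in the stem is relabeled so that
    it takes one uniform allomorph: -ito/-ita in Dialect A, -ecito/-ecita in
    Dialect B, according to the gender of the base. Other entries are kept.
    """
    if dialect not in ("A", "B"):
        raise ValueError("dialect must be 'A' or 'B'")
    suffix_stem = "it" if dialect == "A" else "ecit"
    modified = []
    for entry in database:
        entry = dict(entry)  # copy, so the original database is left intact
        if (entry["syllables"] == 2 and entry["has_je_we"]
                and entry["gender"] in ("m", "f")):
            ending = "o" if entry["gender"] == "m" else "a"
            entry["suffix"] = suffix_stem + ending
        modified.append(entry)
    return modified
```

Running the same analogical predictor over `simulate_dialect(database, "A")` and `simulate_dialect(database, "B")` then gives two lexicons that differ only in the stored outcomes of this one class of words, which is the contrast the simulation is designed to isolate.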
According to AML, if the diminutive forms of all of the items in the database are remembered with 100% accuracy, Dialect A will produce all of these diminutives with -±ito/a, and Dialect B with -±ecito/a. In and of itself, this is hardly an interesting outcome. However, the way each dialect processes novel items is of interest. Table 4 contains the calculated probabilities that the diminutive form of several words (that do not appear in the database) will appear with either -ecito/a or -±ito/a allomorphy in the two simulated dialects.10
++Insert Table 4 here++
Modifying the database for Dialect A entailed eliminating the majority of the words that demonstrate -±ecito and -±ecita allomorphy. It is not surprising, then, that almost no analogical pressure is exerted by words of this type in the Dialect A simulation. The diminutives produced by Dialect B, in contrast, demonstrate more variability. Nine of the novel diminutives are favored to appear with -±ecito/a, with leakage toward -±ito/a, and three favor -±ito/a. The probability of a diminutive with either type of allomorphy is about equal for siervo ‘servant’. The results of this simulation suggest an empirically testable hypothesis: the diminutives formed from bisyllabic bases containing the diphthongs /je/ and /we/ will demonstrate much more variation in Dialect B than in Dialect A. Prieto’s study (1992) suggests that Peninsular Spanish may approximate Dialect B more closely, while Bolivian Spanish may reflect Dialect A. Of course, any attempt to test this hypothesis should focus on the production of diminutives which are most likely to be novel, and less likely to be forms that are already known.
Not only does the phonological shape of the diminutives vary from one dialect to another, but individuals also demonstrate some degree of uncertainty regarding the diminutive form of certain words. From an analogical standpoint, this may be due to two sets of circumstances. In the first, the speaker has heard and/or produced two or more diminutives of the same base word, with different suffixes. In this case, the probability that one of the diminutive forms will be chosen is proportional to the number of times it appears in the lexicon, in comparison with the other form(s).
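The first circumstance amounts to choosing among stored variants with a probability proportional to how often each is stored. A minimal sketch, in which the token counts passed in are hypothetical and the function names are illustrative:

```python
import random

def variant_probabilities(counts):
    """Map each stored diminutive form to its share of the stored tokens."""
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}

def choose_variant(counts, rng=random):
    """Pick one stored form, weighting each form by its stored count."""
    forms = list(counts)
    weights = [counts[form] for form in forms]
    return rng.choices(forms, weights=weights, k=1)[0]
```

For instance, a speaker who had stored three tokens of one variant and one token of another would, on this account, choose the first variant three times out of four.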
Second, if the diminutive form of a base word is completely novel, or if it is temporarily irretrievable from memory, analogy will calculate the base’s similarity to others that exist in the lexicon. If the word is completely surrounded by similar items which all form diminutives in the same manner, only one diminutive form will be produced. However, in some cases gangs of similar items with different behaviors may compete with each other, resulting in variability or uncertainty. Several examples of this may be gleaned from the simulation presented in Section 6.
++Insert Table 5 here++
9. Conclusions. The present study assumes that all known diminutive forms are stored in the mental lexicon as completely formed entities. Under conditions of perfect memory, the probability that a known form will be chosen as the diminutive of its base is 100%. However, base forms which take diminutives with the same allomorphy demonstrate a great deal of phonological similarity. This allows the allomorphy of novel diminutives to be predicted with a high degree of accuracy as well. The AML algorithm is able to correctly predict the shape of most items tested. In addition, about half of the errors it does produce are actually attested forms, which further supports the notion that diminutive formation may be explained as an analogical process. According to this analysis, individual and dialectal differences arise due to differences in the diminutive forms that exist in the mental lexicon, and the influence of competing gangs of phonologically similar items.
References
Aha, David W., Dennis Kibler, and Marc K. Albert. 1991. Instance-based learning algorithms. Machine Learning 6.37-66.
Alameda, José Ramón, and Fernando Cuetos. 1995. Diccionario de frecuencias de las unidades lingüísticas del castellano. Oviedo, Spain: University of Oviedo Press.
Alegre, Maria, and Peter Gordon. 1999. Frequency effects and the representational status of regular inflections. Journal of Memory and Language 40.41-61.
Ambadiang, Théophile. 1996. La formación de diminutivos en español: ¿Fonología o morfología? Lingüística Española Actual 18.175-211.
________. 1997. Las bases morfológicas de la formación de diminutivos en español. Verba 24.99-132.
Baayen, Harald R., Ton Dijkstra, and Robert Schreuder. 1997. Singulars and plurals in Dutch: Evidence for a parallel dual-route model. Journal of Memory and Language 37.94-117.
Bod, Rens. 1998. Beyond grammar. Stanford, CA: CSLI.
Brown, R., and D. McNeill. 1966. The ‘tip of the tongue’ phenomenon. Journal of Verbal Learning and Verbal Behavior 5.325-337.
Burzio, Luigi. 1996. Surface constraints versus underlying representations. Current trends in phonology: Models and methods, ed. by Jacques Durand and Bernard Laks, 97-112. Paris X and University of Salford: University of Salford Publications.
Butterworth, B. 1983. Lexical representation. Language production, vol. 2, ed. by B. Butterworth, 257-294. London: Academic Press.
Bybee, Joan. 1985. Morphology. Amsterdam: John Benjamins.
________. 1988. Morphology as lexical organization. Theoretical approaches to morphology, ed. by Michael Hammond and Michael Noonan, 119-41. San Diego: Academic Press.
________. 1994. A view of phonology from a cognitive and functional perspective. Cognitive Linguistics 5.285-305.
________. 1995. Regular morphology and the lexicon. Language and Cognitive Processes 10.425-55.
________. 1998. The emergent lexicon. Proceedings of the Chicago Linguistic Society, vol. 34, ed. by M. Gruber, C. Derrick Higgins, K. S. Olson, and T. Wysocki, 421-435. Chicago: Chicago Linguistic Society.
________. To appear. Phonology and language use. Stanford, CA: Cambridge University Press.
Bybee, Joan L., and Dan I. Slobin. 1982. Rules and schemas in the development and use of the English past tense. Language 58.265-289.
Chandler, Steve. 1995. Non-declarative linguistics: Some neuropsychological perspectives. Rivista di Linguistica 7.233-247.
Cole, Jennifer. 1995. The cycle in phonology. The handbook of phonological theory, ed. by John A. Goldsmith, 70-113. Cambridge, MA: Blackwell.
Cole, Jennifer S., and José I. Hualde. 1998. The object of lexical acquisition: A UR-free model. Proceedings of the Chicago Linguistic Society, vol. 34, ed. by M. Gruber, C. Derrick Higgins, K. S. Olson, and T. Wysocki, 447-458. Chicago: Chicago Linguistic Society.
Crowhurst, Megan J. 1992. Diminutives and augmentatives in Mexican Spanish: A prosodic analysis. Phonology 9.221-253.
Daelemans, Walter, Steven Gillis, and Gert Durieux. 1994. Skousen’s analogical modeling algorithm: A comparison with lazy learning. Proceedings of the International Conference on New Methods in Language Processing, ed. by D. Jones, 1-7. Manchester: UMIST.
Daelemans, Walter, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 1999. TiMBL: Tilburg memory based learner, version 2.0, reference guide. Induction of Linguistic Knowledge Technical Report. Tilburg, Netherlands: ILK Research Group, Tilburg University.
Eddington, David. 1996a. Diphthongization in Spanish derivational morphology: An empirical investigation. Hispanic Linguistics 8.1-35.
________. 1996b. The psychological status of phonological analyses. Linguistica 36.17-37.
________. 2000. Spanish stress assignment within the analogical modeling of language. Language 76.92-109.
________. To appear. A comparison of two analogical models: Tilburg memory-based learner versus analogical modeling of language. To appear in an introductory text on Analogical Modeling of Language, ed. by Deryle Lonsdale and Royal Skousen.
Elordieta, Gorka, and María M. Carreiras. 1996. An optimality theoretic analysis of Spanish diminutives. Proceedings from the main session of the Chicago Linguistic Society’s 32nd meeting, ed. by Lise M. Dobrin, Kora Singer, and Lisa McNair. Chicago: Chicago Linguistic Society.
Goldinger, Stephen D. 1997. Words and voices: Perception and production in an episodic lexicon. Talker variability in speech processing, ed. by Keith Johnson and John W. Mullennix, 33-65. San Diego: Academic.
Halle, Morris. 1973. Prolegomena to a theory of word formation. Linguistic Inquiry 4.3-16.
Harris, James. 1994. The OCP, prosodic morphology and Sonoran Spanish diminutives: A reply to Crowhurst. Phonology 11.179-190.
Hintzman, Douglas L. 1986. Schema abstraction in a multiple-trace memory model. Psychological Review 93.411-428.
________. 1988. Judgements of frequency and recognition memory in a multiple-trace memory model. Psychological Review 95.528-551.
Hintzman, Douglas L., and Genevieve Ludlam. 1980. Differential forgetting of prototypes and old instances: Simulation by an exemplar-based classification model. Memory and Cognition 8.378-382.
Jackendoff, Ray. 1975. Morphological and semantic regularities in the lexicon. Language 51.639-671.
Jaeggli, Osvaldo A. 1980. Contemporary studies in Romance languages: Eighth annual linguistics symposium on Romance Languages, ed. by Frank Nuessel Jr., 145-158. Bloomington, IN: Indiana University Linguistics Club.
Lamb, Sydney. 2000. Bidirectional processing in language and related cognitive systems. Usage-based models of language, ed. by Michael Barlow and Suzanne Kemmer, 87-119. Stanford, CA: CSLI Publications.
Manelis, Leon, and David A. Tharp. 1977. The processing of affixed words. Memory and Cognition 5.690-695.
Marcos Marín, Francisco (director). No date a. Corpus oral de referencia del español contemporáneo. Textual corpus, Universidad Autónoma de Madrid.
Marcos Marín, Francisco (director). No date b. Corpus lingüístico de referencia de la lengua española en Argentina. Textual corpus, Universidad Autónoma de Madrid. http://www.lllf.uam.es/~fmarcos/informes/corpus/coarginl.html.
Morin, Regina. 1999. Spanish substantives: How many classes? Advances in Hispanic Linguistics, ed. by Javier Gutiérrez-Rexach and Fernando Martínez-Gil, 214-230. Somerville, MA: Cascadilla Press.
Medin, Douglas L., and Marguerite M. Schaffer. 1978. Context theory of classification learning. Psychological Review 85.207-238.
Nosofsky, Robert M. 1988. Exemplar-based accounts of relations between classification, recognition, and typicality. Journal of Experimental Psychology: Learning, Memory, and Cognition 14.700-708.
Palmeri, Thomas J., Stephen D. Goldinger, and David B. Pisoni. 1993. Episodic encoding of voice attributes and recognition memory for spoken words. Journal of Experimental Psychology: Learning, Memory, and Cognition 19.309-28.
Pawley, Andrew, and Frances Hodgetts Syder. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. Language and Communication, ed. by Jack C. Richards and Richard W. Smith, 191-225. London: Longman.
Pisoni, David. 1997. Some thoughts on ‘normalization’ in speech perception. Talker variability in speech processing, ed. by Keith Johnson and John W. Mullennix, 9-32. San Diego: Academic.
Prieto, Pilar. 1992. Morphophonology of the Spanish diminutive formation: A case for prosodic sensitivity. Hispanic Linguistics 5.169-205.
Riesbeck, Chris K., and Roger S. Schank. 1989. Inside case-based reasoning. Hillsdale, N.J.: Erlbaum.
Sebastián, Núria, Fernando Cuetos, and Manuel Carreiras. In preparation. LEXESP: Creación de una base de datos informatizada de español. Report, Universitat de Barcelona.
Sereno, Joan A., and Allard Jongman. 1997. Processing of English inflectional morphology. Memory and Cognition 25.425-37.
Shanks, David R. 1995. The psychology of associative learning. Cambridge: Cambridge UP.
Skousen, Royal. 1989. Analogical modeling of language. Dordrecht: Kluwer Academic.
________. 1992. Analogy and structure. Dordrecht: Kluwer Academic.
________. 1995. Analogy: A non-rule alternative to neural networks. Rivista di Linguistica 7.213-232.
________. 1998. Natural statistics in language modelling. Journal of Quantitative Linguistics 5.246-255.
Stemberger, Joseph Paul. 1994. Rule-less morphology at the phonology-lexicon interface. The reality of linguistic rules, ed. by Susan D. Lima, Roberta L. Corrigan, and Gregory K. Iverson, 147-169. Amsterdam: Benjamins.
Stemberger, Joseph Paul, and Brian MacWhinney. 1988. Are inflected forms stored in the lexicon? Theoretical approaches to morphology, ed. by Michael Hammond and Michael Noonan, 101-16. San Diego: Academic Press.
Steriade, Donca. Underspecification and markedness. The handbook of phonological theory, ed. by John A. Goldsmith, 114-174. Cambridge, MA: Blackwell.
Wang, H. S., and B. L. Derwing. 1986. More on English vowel shift: The back vowel question. Phonology Yearbook 3.99-116.
________. 1994. Some vowel schemas in three English morphological classes: Experimental evidence. In honor of William S-Y. Wang: Interdisciplinary studies on language and language change, 561-575. Taipei: Pyramid Press.
Wulf, Douglas. 1998. An account of German plural formation using an analogical computer model. Manuscript, University of Washington.
Zuluaga O., Alberto. 1993. La función del diminutivo en español. Thesaurus: Boletín del Instituto Caro y Cuervo: Muestra antológica 1945-1985, vol. 1, ed. by Rubén Paez Patino, 305-330. Santafe de Bogotá: Instituto Caro y Cuervo.
1. This study was carried out with the help of a grant from the National Science Foundation (#00821950).
2. Actually, one of the pointers in the analogical set is chosen, but the role of pointers in the algorithm has not been discussed in this summary description.
3. A newer version of LEXESP exists that contains 5.5 million words.
4. Details about these corpora are available at: .
5. Some diminutives, such as clasesitas and tanquesito, were found with the suffix *-sito/a(s). These are obviously due to spelling errors in dialects that do not distinguish /s/ and /θ/, and do not indicate an additional suffix which contrasts with -cito/a(s). In these cases, the spelling was regularized.
6. Calientito is also attested.
7. In some words from groups 11 and 12, s represents what seems to be the plural morpheme, since it appears word finally and follows a stressless vowel. In other words, such as cumpleaños, the word ends in the plural morpheme derivationally speaking (cumple + años ‘complete + years’), but is used to denote both the plural and the singular.
8. Prieto (1992) considers only instances of /je/ and /we/ that alternate with /e/ and /o/ in morphemic relatives (e.g. buen+o ‘good’, bon+dad ‘goodness’). However, the database also contains some words with non-alternating diphthongs such as nieta and hueco.
9. Jaeggli (1980:157, note 5) notes the variation, but dismisses it: ‘In some cases, native speakers may be unsure as to which form is the grammatical one, but saying this is very different from saying that more that one form is ever allowed.’
10. In the case of words ending in -e, either -cito/a or -±ecito/a may be applied to yield diminutives in -ecito/a. For this reason, the probabilities that these suffixes apply are summed together. The fact that the probabilities for some items do not total 100% is due to small amounts of leakage towards other suffix types.
Number Category Y 1 988 979 2a 7b 910 1c 73 4d 46 10e 2 1011 11f 990 10g 1000 11 29 3h 3i 35 35 35 21j 4i 28 28 1k 27 20s 5 201 200 1l 2m 2P 60 43 85 1n 1o 1T 5p 3q 60 6 36 36 17 5 13 1t 1u 17 7 100 99 1v 1x 8 4w 65 13 9y 8 14 14 1z 1A 8 1B 1C 2D 9E 6 5 1F 1G 2H 2I 10E 5 5 1K 1J 3L 11 5 4 1M 5N 12 5 1O 3R 1S 5 13 13 13 13 1Q –
Total Masculine Feminine No Gender -o Final -a Final -e Final -r Final -n Final -l Final -d Final -s Final Other Final /je/ or /we/ in Root Bisyllabic -e Final
3+ Syllables -e Final 63 8 1r a-foto, mano; b-e.g. abajo, adelante, callando; c-papá; d-bikini, güisqui, punqui, Iñaki; e-e.g. chisme, Pepe, tigre; f-e.g. mapa,
problema, sistema; g-e.g. ahora, arriba, encima; h-grande, leche, Maite; i-all are bisyllabic; j-10 of the remaining words end in [jo],
leaving lento, lleno, paso, sayo; k-mano; l-allá; m-carro, río; n-opel; o-David; p-buey, cují, godoy, güisqui, pupú; q-buey, diente, mueble; r-retoque; s-j-4 of the remaining words end in [ja], leaving hecha, lengua, mano, seca; t-pared; u-fuente; v-nomás; w-Adrián, Juan, pompón, ratón; x-papá; y-arroz, bistec, chalet, coñac, copey, desliz, lápiz, maíz, reloj; z-Pilar; A-inversión; B-verdad; C-Gladys; D-cruz, nariz; E-all monosyllabic; F-tren; G-sol; H-flux, valse; I-pez, rey; J-tos; K-flor; L-cruz, luz, voz; M-apenas; N-anteojos, Carlos, cumpleaños, lejos, Marcos; O-Lucas; P-allá, papá; Q-huevona; R-gafas, garrapatas, Mercedes; S-apenas; T-vals.
Table 1. Summary of Database Items by Category.
Base Form   Diminutive A         Diminutive B      Gloss
altar       altarcito            altarito          altar
Antonia     Antonita             Antoñita          Antonia
Antonio     Antonito             Antoñito          Antonio
bote        botecito             botito            jar
café        cafecito, cafetito   cafelito          coffee
caliente    calientito           calentito         hot
carne       carnecita            carnita           meat
carro       carrocito            carrito           car
chófer      chofercito           choferito         chauffeur
cruz        crucecita            crucita           cross
cuello      cuellecito           cuellito          neck
cuenta      cuentecita           cuentita          bill
cuento      cuentecito           cuentito          story
cuerno      cuernecito           cuernito          horn
cuerpo      cuerpecito           cuerpito          body
grande      grandecita           grandita          large
güisqui     güisquicito          güisquito         whisky
hierba      hierbecita           hierbita          grass
hierro      hierrecito           hierrito          iron
hombre      hombrecito           hombrito          man
huevo       huevecito            huevito           egg
indio/a     indiecito/a          indito/a          Indian
Jorge       Jorgecito            Jorgito           George
José        Josecito             Joselito          Joseph
Juan        Juancito             Juanito           John
juego       jueguecito           jueguito          game
lento       lentecito            lentito           slow
lleno       llenecito            llenito           full
mano        manecita             manita, manito    hand
muerto      muertecito           muertito          dead
nieta       nietecita            nietita           granddaughter
papá        papacito             papito, papaíto   daddy
paso        pasecito             pasito            step
piedra      piedrecita           piedrita          stone
pieza       piececita            piecita           piece
pueblo      pueblecito           pueblito          town
puerta      puertecita           puertita          door
quieto/a    quietecito/a         quietito/a        calm
ratona      ratoncita            ratonita          mouse
ratón       ratoncito            ratonito          mouse
río         riecito              riocito           river
rubio       rubiecito            rubito            blond
sueño       sueñecito            sueñito           sleep
tambor      tamborcito           tamborito         drum
tiempo      tiempecito           tiempito          time
tren        trenecito            trencito          train
viejo/a     viejecito/a          viejito/a         old
viento      vientecito           vientito          wind
vuelo       vuelecito            vuelito           flight
vuelta      vueltecita           vueltita          revolution

Table 2. Doublets in the Database.
Word          Gloss
airito        air
alfarcito     pottery
Adriancito    Adrian
barrigonita   big-bellied
bebecito      baby
buchecito     crop of birds
buenito       good
bueyecito     ox
callita       street
chaletcito    chalet
chilito       chili pepper
cuatecito     buddy
cuervecito    crow
cuestita      incline
Davidito      David
dosito        two
dulcito/a     sweet
fuentita      fountain
fuercita      strength
hambrita      hunger
hechita       done
huequito      hollow
huellita      track
jamonita      chunky woman
juerguita     binge
lenguita      tongue
llavita       key
mierdita      shit
muellito      dock
nenecito      child
nietito       grandson
nubita        cloud
nuevito/a     new
patronita     boss
patronita     owner
Pepecito      Pepe
piecito       foot
pomponcito    pompom
prietito      tight
puestito      stand
retoquito     touch-up
reycito       king
ruedita       wheel
señorcito     sir
sequita       dry
suavita       smooth
tiendita      store
valsito       waltz
verdito       green
viajito       trip

Table 3. Erroneous diminutives predicted by AML which are attested on the WWW.
             DIALECT A                             DIALECT B
Base word    Prob. of -±ito   Prob. of -±ecito     Prob. of -±ito   Prob. of -±ecito
riego        100              0                    26.9             72.7
ruego        100              0                    22.3             77.8
siervo       99.98            0                    48.66            50.17
trueno       99.99            0                    0                100

Base word    Prob. of -±ita   Prob. of -±ecita     Prob. of -±ita   Prob. of -±ecita
cuelga       100.00           0                    23.12            76.88
cuerda       99.98            0                    53.71            45.19
fiebre       95.48            3.74                 29.34            60.12
friega       99.95            0.05                 19.58            80.34
niebla       100              0                    60.00            40.00
nieve        87.76            11.39                13.21            79.56
prueba       100              0                    59.15            40.85
suerte       72.28            7.92                 16.97            67.68
sierva       99.97            0                    30.85            67.73

Table 4. Probabilities of Variant Forms in Two Simulated Dialects.
Base Form   Variant A    Prob. of Variant A   Variant B      Prob. of Variant B
yegua       yeguita      67.22                yeguecita      32.78
Jorge       Jorgito      23.33                Jorgecito      74.95
pierna      piernita     40.81                piernecita     59.19
cuervo      cuervito     33.38                cuervecito     66.31
David       Davidito     58.06                Davidcito      36.10
Chevrolé    Chevrolito   54.79                Chevrolecito   40.43
nieta       nietita      67.68                nietecita      31.21
corte       cortito      45.03                cortecito      51.53

Table 5. Examples of Competing Gangs on Selected Base Forms.
Figure 1. Network Representation of How Analogy May Work.