Types of variable
The investigation of social dialects has required the development of an array of techniques quite different from those used in dialect geography. Many of these derive from the pioneering work of Labov, who, along with other sociolinguists, has attempted to describe how language varies in any community and to draw conclusions from that variation not only for linguistic theory but also sometimes for the conduct of everyday life, e.g., suggestions as to how educators should view linguistic variation. As we will see, investigators now pay serious attention to such matters as stating hypotheses, sampling, the statistical treatment of data, drawing conclusions, and relating these conclusions to such matters as the inherent nature of language, the processes of language acquisition and language change, and the social functions of variation.
Possibly the greatest contribution has been in the development of the use of the ‘linguistic variable,’ the basic conceptual tool necessary to do this kind of work (see Wolfram, 1991). As I have just indicated, variation has long been of interest to linguists, but the use of the linguistic variable has added a new dimension to linguistic investigations. Although not all linguists find the concept useful in their work, it has nevertheless compelled most of its severest critics to reconsider just what it is they are theorizing about when they talk of ‘language,’ of a speaker’s ‘knowledge’ of language, and of the relationship between such knowledge and actual ‘use.’
A linguistic variable is a linguistic item which has identifiable variants. For example, words like singing and fishing are sometimes pronounced as singin’ and fishin’. The final sound in these words may be called the linguistic variable (ng) with its two variants [ŋ] in singing and [n] in singin’. Another example of a linguistic variable can be seen in words like farm and far. These words are sometimes given r-less pronunciations; in this case we have the linguistic variable (r) with two variants [r] and Ø (pronounced ‘zero’). Still another example involves the vowel in a word like bend. That vowel is sometimes nasalized and sometimes it is not; sometimes too the amounts of nasalization are noticeably different. In this case we have the linguistic variable (e) and a number of variants, [ɛ], [ɛ̃]¹, . . . , [ɛ̃]ⁿ; here the superscripts 1 to n are used to indicate the degree of nasalization observed to occur. We might, for example, find two or even three distinct quantities of nasalization.
There are at least two basically different kinds of variation. One is of the kind (ng) with its variants [ŋ] or [n], or (th) with its variants [θ], [t], or [f], as in with pronounced as with, wit, or wif. In this first case the concern is with which quite clearly distinct variant is used, with, of course, the possibility of Ø, the zero variant. The other kind of variation is the kind you find above in (e): [ɛ̃]¹, . . . , [ɛ̃]ⁿ, when it is the quantity of nasalization, rather than its presence or absence, which is important. How can you best quantify nasalization when the phenomenon is actually a continuous one? The same issue occurs with quantifying variation in other vowel variables: quantifying their relative frontness or backness, tenseness or laxness, and rounding or unrounding. Moreover, more than one dimension may be involved, e.g., amount of nasalization and frontness or backness. In such cases usually some kind of weighting formula is devised, and when the data are treated it is these weights that are used in any calculations, not just the ones and zeros that we can use in the case of (ng): [ŋ] or [n], where [ŋ] = 1 and [n] = 0.
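The one-and-zero coding for a discrete variable such as (ng) can be sketched in a few lines of Python. This is only an illustration: the token list is invented, and `ng_index` is a hypothetical helper name, not something taken from any of the studies discussed here.

```python
# Hypothetical tokens of the (ng) variable from one speaker;
# "ng" stands for the velar variant, "n" for the alveolar one.
tokens = ["ng", "n", "n", "ng", "n", "ng", "ng", "n", "n", "n"]

# Coding assumed from the text: the velar variant counts 1, the alveolar 0.
CODES = {"ng": 1, "n": 0}


def ng_index(observed):
    """Return the mean code, i.e. the proportion of velar variants."""
    coded = [CODES[t] for t in observed]
    return sum(coded) / len(coded)


print(ng_index(tokens))  # 4 velar tokens out of 10
```

For a continuous variable the dictionary of codes would have to be replaced by whatever weighting formula the investigator devises; the averaging step stays the same.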
Linguists who have studied variation in this way have used a number of linguistic variables. The (ng) variable has been widely used. So has the (r) variable. Others are the (h) variable in words like house and hospital, i.e., (h): [h] or Ø; the (t) variable in bet and better, i.e., (t): [t] or [ʔ]; the (th) and (dh) variables in thin and they, i.e., (th): [θ] or [t] and (dh): [ð] or [d]; the (l) variable in French in il, i.e., (l): [l] or Ø; and consonant variables like the final (t) and (d) in words like test and told, i.e., their presence or absence. Vocalic variables used have included the vowel (e) in words like pen and men; the (o) in dog, caught, and coffee; the (e) in beg; the (a) in back, bag, bad, and half; and the (u) in pull. Studies of variation employing the linguistic variable are not confined solely to phonological matters. Investigators have looked at the (s) of the third-person singular, as in he talks, i.e., its presence or absence; the occurrence or nonoccurrence of be (and of its various inflected forms) in sentences such as He’s happy, He be happy, and He happy; the occurrence (actually, virtual nonoccurrence) of the negative particle ne in French; various aspects of the phenomenon of multiple negation in English, e.g., He don’t mean no harm to nobody; and the beginnings of English relative clauses, as in She is the girl who(m) I praised, She is the girl that I praised, and She is the girl I praised.
To see how individual researchers choose variables, we can look briefly at three studies. In a major part of his work in New York City, Labov (1966) chose five phonological variables: the (th) variable, the initial consonant in words like thin and three; the (dh) variable, the initial consonant in words like there and then; the (r) variable, r-pronunciation in words like farm and far; the (a) variable, the pronunciation of the vowel in words like bad and back; and the (o) variable, the pronunciation of the vowel in words like dog and caught. We should note that some of these have discrete variants, e.g., (r): [r] or Ø, whereas others require the investigator to quantify the variants because the variation is a continuous phenomenon, e.g., the (a) variable, where there can be both raising and retraction of the vowel, i.e., a pronunciation made higher and further back in the mouth, and, of course, in some environments nasalization too.
Trudgill (1974) also chose certain phonological variables in his study of the speech of Norwich: three consonant variables and thirteen vowel variables. The consonant variables were the (h) in happy and home, the (ng) in walking and running, and the (t) in bet and better. In the first two cases only the presence or absence of h-pronunciation and the [ŋ] versus [n] realizations of (ng) were of concern to Trudgill. In the last there were four variants of (t) to consider: an aspirated variant; an unaspirated one; a glottalized one; and a glottal stop. These variants were ordered, with the first two combined and weighted as being least marked as nonstandard, the third as more marked, and the last, the glottal stop, as definitely marked as nonstandard. The thirteen vowel variables were the vowels used in words such as bad, name, path, tell, here, hair, ride, bird, top, know, boat, boot, and tune. Most of these had more than two variants, so weighting, i.e., some imposed quantification, was again required to differentiate the least preferred varieties, i.e., the most nonstandard, from the most preferred variety, i.e., the most standard.
The linguistic variables which sociolinguistics has studied are those where the meaning remains constant but the form varies, though in theory one could study such aspects as the different ways in which past-tense forms are used as a linguistic variable. There are, however, serious problems if we try to use this as a definition of ‘linguistic variable’, since it is hard to be clear about what counts as ‘the same meaning’. For instance, it could be argued that cat and pussy have the same meaning, and therefore might be considered as variants of a linguistic variable, in much the same way as, for example, the alternative pronunciations of house with and without [h]. Against this it could be argued that ‘meaning’ ought to be defined more liberally, to include what is often called ‘social meaning’, in which case cat and pussy would have different meanings and could not be treated as variants of a linguistic variable. Fortunately, the notion ‘linguistic variable’ itself is not meant to be taken as a part of a general theory of language, but rather as an analytical tool in the sociolinguist’s tool chest, so we need not worry unduly about such problems of definition.
Apart from saying that a linguistic variable should not involve a change of meaning, there is little to be said about which aspects of language may have variables. They may be found in the pronunciation of individual words or of whole classes of words (say, all those beginning in one accent with [h], or all those ending in -ing), and in the patterns of syntax.
There are major problems which make pronunciation variables harder to study than might be expected. The current state of disarray in phonological theory, where, for instance, the status of phonemes and the nature of underlying forms of words are still in doubt, gives rise to one such problem. Is one justified, for example, in treating the [r] sound in cart as an instance of the same ‘phoneme’ as that in car? Could one use the difference which Labov found in his New York study as evidence that they are different phonemes (assuming that ‘phoneme’ is a meaningful term)? Is it justifiable to postulate phonemes such as /h/ in the underlying forms of words like house when speakers nearly always leave the sound out in ordinary speech? If not, by what right do we assume that such speakers are illustrating the same variable in choosing between house with and without [h] as other speakers who normally have the [h], but sometimes ‘drop’ it?
Calculating scores for texts
The classical Labovian approach offers an attractively simple method for assigning scores to texts, to show similarities and differences between speakers’ use of linguistic variables, but we shall see that it also has serious weaknesses. A score is calculated for each variable in each text, which allows texts to be compared with respect to one variable at a time, which is the prime aim of the quantitative study of texts. To calculate the text score for a given variable, a score is assigned to each of its variants; the score of any text is then the average of all the individual scores for the variants in that text. To take a simple example, let us say we have a variable with three variants, A, B and C, and we have fixed the scores as 1 for each instance of A, 2 for each B and 3 for each C. Now assume that we have a text containing 12 A’s, 23 B’s and 75 C’s. We calculate the text score by calculating the scores of all the A’s (12 × 1 = 12), all the B’s (23 × 2 = 46), and all the C’s (75 × 3 = 225), then adding them all together (12 + 46 + 225 = 283) and dividing the answer by the total number of variants found (i.e. 12 + 23 + 75 = 110), giving 283 ÷ 110 = 2.57. This is the score for the text concerned for this variable, and it will of course be easy to compare it directly with scores for the other texts for this same variable.
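The calculation just described can be written out as a short Python sketch. The function name `text_score` and the dictionary layout are assumptions made for illustration; only the counts and the variant scores come from the worked example above.

```python
def text_score(counts, weights):
    """Average variant score for one variable in one text.

    counts  maps each variant to the number of times it occurs in the text
    weights maps each variant to the score assigned to it
    """
    total_tokens = sum(counts.values())
    weighted_sum = sum(weights[variant] * n for variant, n in counts.items())
    return weighted_sum / total_tokens


# Figures from the worked example: A scores 1, B scores 2, C scores 3.
weights = {"A": 1, "B": 2, "C": 3}
counts = {"A": 12, "B": 23, "C": 75}

score = text_score(counts, weights)
print(round(score, 2))  # 283 / 110, i.e. 2.57 to two decimal places
```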
This method has two failings, both of which are important. The first has to do with the ranking of variants, on which we touched in 5.3.2. Assigning separate scores to individual variants (1 for an A, 2 for a B, and so on) has to be based on some kind of principle, otherwise the results may be nonsense. Scoring cannot simply be arbitrary, since the apparent relations among texts could be completely changed by using a different scoring system. There is no problem if a variable has only two variants, since it makes no difference which one is scored ‘high’ and which ‘low’. The problem arises where there are three or more variants, since the scoring system reflects a particular ordering of the variants, with two variants picked out as maximally different and the others arranged between them as intermediate values. This means that whenever three or more variants are recognized on a single variable, the analyst has to be able to pick out two of them as the extremes and to arrange the remainder between them. This can be done in many cases on the basis of phonetic relations, since the variants can be arranged on some phonetic dimension such as vowel height. However, we have seen that this is by no means always the case (there may be more than one such dimension involved), so the phonetic facts do not always tell the researcher how to order the variants. Another basis for ordering is the social prestige of the variants, which allows the most standard and the least standard variants to be picked out as the extremes and the others ranked in between according to relative ‘standardness’. The problem with this approach is that it assumes in advance that society is organized in a single hierarchy reflected by the linguistic variable, whereas this often turns out not to be true, so the method biases the research towards incorrect conclusions.
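A small sketch may make the ordering problem concrete. The two texts and both scoring systems below are invented for illustration, and `text_score` is a hypothetical helper implementing the averaging method described earlier; the point is only that swapping the scores of two variants can reverse which text comes out ‘higher’.

```python
def text_score(counts, weights):
    """Average variant score for one variable in one text."""
    total_tokens = sum(counts.values())
    return sum(weights[v] * n for v, n in counts.items()) / total_tokens


# Two hypothetical texts, each with ten tokens of the same variable.
text_x = {"A": 0, "B": 10, "C": 0}
text_y = {"A": 2, "B": 0, "C": 8}

# Two candidate orderings of the three variants.
ordering_1 = {"A": 1, "B": 2, "C": 3}  # B treated as intermediate
ordering_2 = {"A": 1, "B": 3, "C": 2}  # C treated as intermediate

for name, weights in (("ordering 1", ordering_1), ("ordering 2", ordering_2)):
    x = text_score(text_x, weights)
    y = text_score(text_y, weights)
    higher = "X" if x > y else "Y"
    print(f"{name}: X = {x:.1f}, Y = {y:.1f}, higher text = {higher}")
```

Under the first ordering text Y scores higher than text X; under the second the relation is reversed, although the texts themselves have not changed.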
The second weakness of the Labovian scoring system is connected with the distribution of variants, since the final figure for a text gives no idea of the relative contributions made by individual variants. A score of 2 for a text in our hypothetical case could reflect the use of nothing but B (scoring 2 each time it occurs), or of nothing but A and C, in equal numbers, with no instances of B at all. Let us take an actual example, using data from a study of the (r) variable in Edinburgh by Suzanne Romaine (1978). This study is unusual in providing separate figures for individual variants, rather than aggregate scores for the whole variable. The variable (r), like the one which Labov studied in New York, applies to words containing an (r) not followed by a vowel in the same word. However, these particular figures apply only to (r) occurring at the end of a word, and show the influence of the linguistic context: whether the word is followed by a pause or by another word. The variants are not quite the same as those distinguished by Labov, since there are two possible types of consonantal constriction for (r) in Edinburgh: a frictionless continuant [ɹ], as in RP and most American accents, and a flapped [ɾ].
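The hypothetical case of two very different texts that both score 2 can be sketched as follows. The counts are invented, and `text_score` and `variant_shares` are hypothetical helper names; the second function shows the kind of per-variant figures that an aggregate score hides.

```python
def text_score(counts, weights):
    """Average variant score for one variable in one text."""
    total_tokens = sum(counts.values())
    return sum(weights[v] * n for v, n in counts.items()) / total_tokens


def variant_shares(counts):
    """Proportion of tokens realized as each variant."""
    total_tokens = sum(counts.values())
    return {v: n / total_tokens for v, n in counts.items()}


weights = {"A": 1, "B": 2, "C": 3}

# Hypothetical texts: one uses only B, the other A and C in equal numbers.
only_b = {"A": 0, "B": 20, "C": 0}
a_and_c = {"A": 10, "B": 0, "C": 10}

for label, counts in (("only B", only_b), ("A and C", a_and_c)):
    print(label, text_score(counts, weights), variant_shares(counts))
```

Both texts receive the same aggregate score, while the proportions reported by `variant_shares` have nothing in common.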
Calculating scores for individuals and groups
In a sociolinguistic study of texts the investigator has material produced by different individuals, and often more than one text from each, produced in different circumstances. A typical research project might involve the study of 10 variables in the speech of 60 people under 4 types of circumstances, producing 10 × 60 × 4 = 2,400 separate scores for texts, if the classical Labovian method were used. The figure would of course be much larger if the alternative of quoting separate scores for individual variants were adopted. The problem is how to handle such a large amount of data without being swamped. By far the most satisfactory solution is to use a computer with a sophisticated statistical program, which is now widely done where sufficient funds and manpower are available.
However, another solution is to reduce the number of figures by producing averages for individuals or groups of individuals, and this is still common practice among sociolinguists. For example, if we can reduce 60 speakers to 8 groups defined, say, by sex and socioeconomic class, we immediately reduce the total number of figures from the 2,400 given above to 10 × 8 × 4 = 320, which means just 32 figures for each variable taken on its own. Moreover, the number of cases covered by each of the figures is increased, since each score for a variable will represent a whole group of speakers instead of a single one. This has the advantage of increasing the statistical significance of any difference between scores, since this depends not only on the size of the difference but also on the number of cases involved.
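The reduction from individual to group figures can be sketched in Python. The project dimensions are those given above; the speaker labels, scores and group assignments are invented, and `group_averages` is a hypothetical helper name.

```python
from collections import defaultdict

# Project dimensions from the text.
variables, speakers, circumstances, groups = 10, 60, 4, 8

print(variables * speakers * circumstances)  # individual text scores: 2400
print(variables * groups * circumstances)    # group scores: 320
print(groups * circumstances)                # figures per variable: 32


def group_averages(speaker_scores, speaker_group):
    """Average the scores of all speakers assigned to the same group."""
    buckets = defaultdict(list)
    for speaker, score in speaker_scores.items():
        buckets[speaker_group[speaker]].append(score)
    return {group: sum(vals) / len(vals) for group, vals in buckets.items()}


# Hypothetical scores for one variable in one type of circumstance.
scores = {"s1": 1.5, "s2": 2.5, "s3": 3.0, "s4": 2.0}
assignment = {
    "s1": ("female", "working class"),
    "s2": ("female", "working class"),
    "s3": ("male", "middle class"),
    "s4": ("male", "middle class"),
}

print(group_averages(scores, assignment))
```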
There are thus great gains from merging separate figures into averages. All the actual figures quoted so far have been group averages and not scores for individual speakers. This is typical of the literature, where it is in fact rare to find figures for individual speakers.
A reliance on group scores alone conceals the amount of variation within each group. A group score of, say, 2 for some variable ranging from 1 to 3 could be produced either by all the members of the group having scores very close to 2, or by some scoring 1 and others 3. In the former case, the group average of 2 represents a norm around which the speech of the group members clusters, whereas it is completely meaningless or misleading in the second case. The variable is concerned with the assimilation of one vowel to another in the following syllable in words like /bekon/ ‘Do!’, whose first vowel varies between [e] and [o]. Each figure represents the percentage of assimilated vowels in the speech of one speaker, and the speakers are arranged in eight columns, each representing a separate group. The groups are defined on non-linguistic grounds, on the basis of education and sex.
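The point that a group average can hide quite different distributions can be illustrated with a minimal sketch. The two groups of individual scores are invented; the standard deviation is used here simply as one convenient measure of spread, not as the method of any study cited.

```python
from statistics import mean, pstdev

# Hypothetical individual scores on a variable ranging from 1 to 3.
clustered = [2.0, 2.1, 1.9, 2.0]  # everyone close to 2
polarized = [1.0, 1.0, 3.0, 3.0]  # half score 1, half score 3

for label, group in (("clustered", clustered), ("polarized", polarized)):
    # Same average in both cases, very different spread.
    print(f"{label}: mean = {mean(group):.2f}, spread = {pstdev(group):.2f}")
```

Reporting a measure of spread alongside each group average is one way of showing whether the average represents a norm for the group or merely a midpoint between subgroups.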
A study of the pronunciation of sixteen 11-year-old boys from three different schools in Edinburgh is the source of the data. The children wore radio microphones while playing in the playground, and the data collected were thus expected to be close to the kind of speech the children used naturally. The three schools were chosen so that each would cover a different range of social backgrounds, but it can be seen that grouping boys according to their school produced very heterogeneous results from the point of view of the (t) variable, with a great deal of overlap between groups. Reid also gave information about the occupations of the boys’ fathers, but even this supposedly more accurate measure of social status did not produce much more homogeneous groupings. All the boys from school 1 had fathers classified as ‘foremen, skilled manual workers and own account workers other than professional’, with the exception of the two marked with daggers, whose fathers were semi-skilled or unskilled manual workers or personal service workers.
The other problem which arises from group scores is related to the first, and in fact arises out of it. If grouping speakers or texts is simply a matter of convenience for the analyst faced by an otherwise unmanageable mass of data, there is probably no problem. No doubt the grouping will help him to see various broad trends in the data which he might otherwise miss. But there is a danger of moving from this position to a very different one, where one believes that the groupings are socially ‘real’, part of the objective structure of society, and therefore part of the theoretical framework that is referred to in interpreting the results.
This conflicts with the view according to which society is structured at least partly in terms of networks of more or less closely connected people, who are influenced to different degrees by the norms of the various networks. The weakness of the group analysis is that it makes no allowance for people who belong to groups to different extents; and when individual scores have been merged in group averages there is nothing to indicate whether or not this should be taken into account.
To summarize this section, we have criticized the Labovian method of identifying variants and calculating scores because it loses too much information which may be important. Information about the use of individual variants is lost when these are merged into variable scores, and information about the speech of individuals is also lost if these are included in group averages. At each stage the method imposes a structure on the data which may be more rigid than was inherent in the data, and to that extent distorts the results: discrete boundaries are imposed on non-discrete phonetic parameters, artificial orderings are used for variants which are related in more than one way, and speakers are assigned to discrete groups when they relate to each other in terms of networks rather than groups.