#30DayMapChallenge, Day 2. Getting the data, Part 1: Who is fluent in Formosan languages?


Day 2 of the #30DayMapChallenge. I’m entering the preparatory phase: figuring out what data are available to plot the maps. Since my main interest is language, let’s start with the crucial and classic question: What proportion of people in Taiwan are fluent speakers of indigenous languages?

Anyone interested in Formosan languages knows that they are endangered. A quick chat with indigenous people or researchers will tell you that the most fluent speakers are the older ones. The question is:

What is the number of speakers per language?

This question is not easy to answer, if only because it’s hard to define what a ‘fluent speaker’ is. Ideally, we would rely on standardized tests and have everyone take them… but as always, ideal situations face the reality of life!

First question: Who is fluent in Formosan languages?

Prof. Ai-yu Apay Tang (National Dong Hwa University) looked at this issue in Truku, using behavioral methods to provide psycholinguistic assessments (see her book published in 2021, From diagnosis to remedial plan: a psycholinguistic assessment of language shift, intergenerational linguistic proficiency, and language planning in Truku). Participants were divided into groups by age: (a) 10-15 years old, (b) 16-25 years old, (c) 26-40 years old, and (d) 41-65 years old. The results are quite straightforward: the younger the participants, the less fluent they were in Truku. Coupled with the fact that they can generally speak Mandarin Chinese, the results reflect a shift in L1 dominance: while Truku was still the dominant L1 of the oldest group, this was not the case for the three other groups.

As always when dealing with Formosan languages, we should never assume that what we observe in one language generalizes to the others. Do the results found for Truku also hold for other languages?
The Foundation for the Research and Development of Indigenous Languages (in Chinese: 原住民族語言研究發展基金會) has published a large survey it conducted in 2020 (only available in Chinese, here), with responses from 480 people covering 16 indigenous tribes (an even bigger survey is ongoing).
The results for the question “What is your dominant L1?” are on pages 35-36.

Table 1. Proportion of language dominance by age group

                 Dominant language
Age group        Indigenous language   Mandarin Chinese   Other
Below 10          8.1%                 91.9%               0%
11-20             6.2%                 93.8%               0%
21-30             5%                   92.5%               2.5%
31-40            18.3%                 75%                 6.7%
41-50            43.4%                 53%                 3.6%
51-60            64.6%                 26.8%               8.6%
Above 61         81.8%                 11.7%               6.5%

Again, the situation is quite clear: the older you are, the more likely your own indigenous language is also your dominant L1. But unlike in the study by Ai-yu Apay Tang, where the oldest group (41-65 years old) was still Truku-dominant, here the indigenous language is the dominant L1 of a majority only for speakers older than 50!
There are also differences depending on the ethnic group, as shown on page 37. But these data are aggregated over all age groups, so such differences are hard to interpret.

Now that we have an answer to the first question, we can refine the general question:

What is the number of (fluent) speakers of each Formosan language?

This is a simple calculation, which is certainly wrong because it won’t represent the reality in the field: I’m completely ignoring differences between groups, the actual everyday use of the language, where people live (in metropolitan/urban areas or in villages), etc. But at least it gives us a first estimate:

Estimated number of (more or less) fluent speakers = n(51-60 years old) * 64.6% + n(above 61 years old) * 81.8%
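
To make the formula concrete, here is a minimal sketch in Python (my choice of language, since the post itself contains no code). The function name and the example group of 1,000 + 1,000 people are hypothetical; only the two rates come from Table 1.

```python
# Dominance rates for the two oldest age groups (Table 1).
RATE_51_60 = 0.646    # 64.6% of 51-60 year-olds are indigenous-language dominant
RATE_61_PLUS = 0.818  # 81.8% of people above 61

def estimate_fluent_speakers(n_51_60: float, n_61_plus: float) -> float:
    """Hypothetical helper: n(51-60) * 64.6% + n(above 61) * 81.8%."""
    return n_51_60 * RATE_51_60 + n_61_plus * RATE_61_PLUS

# Hypothetical group with 1,000 people aged 51-60 and 1,000 people above 61.
print(round(estimate_fluent_speakers(1_000, 1_000)))  # 1464
```

Everyone younger than 51 is counted as zero here, which is exactly the simplification described above.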

The Council of Indigenous Peoples in Taiwan provides monthly statistics on the indigenous population, broken down by different factors (see here for the most recent one so far, September 2024). The problem is that there is no table giving the population by both ethnic group and age! But we have other data, so let’s see what we can do.

First, we have the total population of each ethnic group, broken down by whether people live in mountain areas (which is super crucial, as you’ll notice great differences between the groups!).

Table 2. Population of each ethnic group (by living area)

Ethnic group     Non-mountain areas   Mountain areas   Total
Total            283 595              324 349          607 944
Amis             225 254                1 341          226 595
Atayal             1 937               97 159           99 096
Paiwan            23 272               86 573          109 845
Bunun                353               63 001           63 354
Rukai              2 806               11 283           14 089
Puyuma            15 641                  122           15 763
Tsou                  16                6 869            6 885
Saisiyat           4 813                2 417            7 230
Yami                  12                4 951            4 963
Thao                 910                    6              916
Kavalan            1 654                    2            1 656
Truku                165               35 117           35 282
Sakizaya           1 118                    2            1 120
Seediq                28               11 480           11 508
Saaroa                 3                  485              488
Kanakanavu             1                  445              446
Not declared       5 612                3 098            8 708
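
To see how large these differences are, one can compute the share of each group living in mountain areas. A minimal Python sketch, assuming only a handful of rows copied from Table 2 (the dictionary and function names are hypothetical):

```python
# A few rows of Table 2: group -> (non-mountain areas, mountain areas).
POPULATION = {
    "Amis": (225_254, 1_341),
    "Atayal": (1_937, 97_159),
    "Paiwan": (23_272, 86_573),
    "Puyuma": (15_641, 122),
    "Truku": (165, 35_117),
}

def mountain_share(non_mountain: int, mountain: int) -> float:
    """Proportion of a group's population living in mountain areas."""
    return mountain / (non_mountain + mountain)

for group, (non_mtn, mtn) in POPULATION.items():
    # The "%" format type multiplies by 100 and appends a percent sign.
    print(f"{group:<8} {mountain_share(non_mtn, mtn):6.1%} in mountain areas")
```

With these rows, Amis and Puyuma live almost entirely outside mountain areas, while Atayal and Truku live almost entirely inside them.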

Second, we have the distribution of the population by age and by whether people live in mountain areas. I only give the numbers of interest here:

Table 3. Population and proportion by age group (all ethnic groups aggregated)

Age group            Non-mountain areas   Mountain areas
Total                283 595 (100%)       324 349 (100%)
0-50 years old       200 135 (70.57%)     243 936 (75.21%)
51-60 years old       36 599 (12.91%)      38 772 (11.95%)
Above 61 years old    46 861 (16.52%)      41 641 (12.84%)
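
The percentages in Table 3 are simply each age group's count divided by the total for that area. A small sketch to recompute them, with the counts copied from Table 3 (the names are hypothetical):

```python
# Counts from Table 3, per living area and age group.
AGE_COUNTS = {
    "non-mountain": {"0-50": 200_135, "51-60": 36_599, "61+": 46_861},
    "mountain": {"0-50": 243_936, "51-60": 38_772, "61+": 41_641},
}

def age_shares(counts: dict[str, int]) -> dict[str, float]:
    """Share of each age group within one living area."""
    total = sum(counts.values())
    return {age: n / total for age, n in counts.items()}

for area, counts in AGE_COUNTS.items():
    shares = age_shares(counts)
    print(area, {age: f"{s:.2%}" for age, s in shares.items()})
```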

What I can do now is simply assume (because I have no other choice so far) that these proportions are the same for every ethnic group, which gives the following table:

Table 4. Estimated population of each ethnic group (above 51 years old)

                 Non-mountain areas      Mountain areas
Ethnic group     51-60     Above 61      51-60     Above 61
Total            36 612    46 850        38 760    41 646
Amis             29 080    37 212           160       172
Atayal              250       320        11 611    12 475
Paiwan            3 004     3 845        10 345    11 116
Bunun                46        58         7 529     8 089
Rukai               362       464         1 348     1 449
Puyuma            2 019     2 584            15        16
Tsou                  2         3           821       882
Saisiyat            621       795           289       310
Yami                  2         2           592       636
Thao                117       150             1         1
Kavalan             214       273             0         0
Truku                21        27         4 196     4 509
Sakizaya            144       185             0         0
Seediq                4         5         1 372     1 474
Saaroa                0         0            58        62
Kanakanavu            0         0            53        57
Not declared        725       927           370       398
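
One way to reproduce Table 4 is to multiply each group's non-mountain and mountain population (Table 2) by the rounded age shares of Table 3, under the same-proportion assumption above, and round to whole people. A minimal sketch, assuming these rounded percentages and keeping only two groups for brevity (the function and variable names are hypothetical). With these inputs it gives the Truku and Saisiyat rows of Table 4:

```python
# Rounded age shares from Table 3, assumed identical for every ethnic group.
SHARES = {
    "non-mountain": {"51-60": 0.1291, "61+": 0.1652},
    "mountain": {"51-60": 0.1195, "61+": 0.1284},
}

# Table 2: group -> (non-mountain areas, mountain areas), two groups only.
POPULATION = {"Truku": (165, 35_117), "Saisiyat": (4_813, 2_417)}

def older_population(non_mountain: int, mountain: int) -> dict[tuple[str, str], int]:
    """Estimated number of people aged 51-60 and above 61 in each area."""
    result = {}
    for area, n in (("non-mountain", non_mountain), ("mountain", mountain)):
        for age, share in SHARES[area].items():
            result[(area, age)] = round(n * share)
    return result

for group, (non_mtn, mtn) in POPULATION.items():
    print(group, older_population(non_mtn, mtn))
```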

This table only shows population numbers by age group, but we know from Table 1 that these people's indigenous language is not necessarily their dominant language. So we need to apply the proportions from Table 1 once more (64.6% for the 51-60 group and 81.8% for the group above 61). This gives us the following estimate:

Table 5. Estimated number of people whose dominant language is their indigenous language, for each ethnic group

                 Non-mountain areas      Mountain areas
Ethnic group     51-60     Above 61      51-60     Above 61      Total
Total            23 651    38 323        25 039    34 067        121 080
Amis             18 786    30 439           104       141         49 470
Atayal              162       262         7 500    10 205         18 128
Paiwan            1 941     3 145         6 683     9 093         20 862
Bunun                29        48         4 863     6 617         11 558
Rukai               234       379           871     1 185          2 669
Puyuma            1 304     2 114             9        13          3 440
Tsou                  1         2           530       721          1 255
Saisiyat            401       650           187       254          1 492
Yami                  1         2           382       520            905
Thao                 76       123             0         1            200
Kavalan             138       224             0         0            362
Truku                14        22         2 711     3 688          6 435
Sakizaya             93       151             0         0            245
Seediq                2         4           886     1 206          2 098
Saaroa                0         0            37        51             89
Kanakanavu            0         0            34        47             81
Not declared        468       758           239       325          1 791
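
Table 5 is then Table 4 multiplied by the Table 1 rates. A sketch for the Truku row, which also divides the row total by the Truku population from Table 2 (the names are hypothetical). It reproduces the Truku row; recomputing other rows from the rounded Table 4 values can give cells that differ by one, since rounding happens at each step.

```python
# Table 1 rates of indigenous-language dominance, by age group.
RATES = {"51-60": 0.646, "61+": 0.818}

# Truku row of Table 4: (area, age group) -> estimated number of people.
TRUKU_OLDER = {
    ("non-mountain", "51-60"): 21,
    ("non-mountain", "61+"): 27,
    ("mountain", "51-60"): 4_196,
    ("mountain", "61+"): 4_509,
}
TRUKU_TOTAL = 35_282  # total Truku population (Table 2)

def dominant_speakers(older: dict[tuple[str, str], int]) -> dict[tuple[str, str], int]:
    """Apply the age-specific dominance rate to each cell and round."""
    return {key: round(n * RATES[key[1]]) for key, n in older.items()}

cells = dominant_speakers(TRUKU_OLDER)
total = sum(cells.values())
print(cells, total, f"{total / TRUKU_TOTAL:.1%}")
```

For Truku, this gives 6,435 people, i.e. about 18.2% of the group.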


Final remarks

We now have an answer to the question of this post. Overall, this means that only about one in five people in each ethnic group (roughly 18% to 22%, depending on the group) is dominant in their indigenous language.
But I need to add many warnings (to myself as well!). This is the most optimistic scenario, as it only corresponds to the upper end of the scale. I wouldn’t be surprised if the actual numbers were only half of what I report here, or even lower for the most endangered languages.

The reason is simply that many factors aren’t taken into account in this estimate:

  • What do ‘mountain’ and ‘non-mountain’ areas mean? In particular, if ‘non-mountain’ areas include highly urban areas, such as Taipei City, New Taipei City, Taoyuan City or Kaohsiung City, then the situation is almost certainly worse! Unfortunately, the reports available online aren’t fine-grained enough to tell.
  • I used the notion of ‘L1 dominance’, but we all agree that it should be seen as a scale: two people can be dominant in the same language without being equally competent in it.
  • Plus, being dominant in one language doesn’t mean that speakers are not competent in a second one. In reality, they are almost certainly bilingual, and interference from the other language (mostly Mandarin Chinese or Taiwanese Southern Min, or even another Formosan language) is unavoidable.
  • It’s also possible that the survey on L1 dominance did not have enough responses to be representative. If that’s the case, I’m looking forward to the results of the ongoing survey!

Overall, I believe these numbers are helpful: we need a starting point. But they fail to reflect the dynamics of these languages, and they erase individual differences and the sociolinguistic phenomena we can observe in the field. So my take-home message is: you can refer to this estimate, but with a lot, a lot, A LOT of caution!