Saturday, October 16, 2010

Choose your transcription well

As discussed in the last two posts, the names for Taishanese (or, alternatively, the Siyi dialects) can be grouped by the geographical terms on which they are based. Most forms are variants of either Taishan 台山 or Siyi 四邑. But far more numerous are the ways these terms are transcribed into English.

The Chinese word 台山 has been transcribed as Hoi Saan, Hoi San, Hoisaan, Hoisan, Hoy Saan, Hoy San, Hoy Shan, Hoysun, T’ai-Shan, Tai Shan, Tai-shan, Taishan, Toi Saan, Toi San, Toi Shan, Toisaan, Toisan, Toishan and Toy Shan. (I even stumbled across a few more variants after writing up this list!) But this is not an endless string of random permutations.

These variants fall into three distinct categories based on the form of Chinese used to transcribe the characters—Mandarin, Cantonese or Taishanese. (Note that these three dialects are basically the only ones used; I would wager that you won’t run across any published English-language article referring to the Taisuan or Đài Sơn dialect, as would be transcribed for Chaozhou and Vietnamese, for example.) Below I’ve broken the variants into the three new groups.

  • Mandarin: T’ai-Shan, Tai Shan, Tai-shan, Taishan
  • Cantonese: Toi Saan, Toi San, Toi Shan, Toisaan, Toisan, Toishan, Toy Shan
  • Taishanese: Hoi Saan, Hoi San, Hoisaan, Hoisan, Hoy Saan, Hoy San, Hoy Shan, Hoysun

Some of the differences listed above are small, involving only spaces or hyphens, so I’ve reanalyzed the list by ignoring punctuation and making all the forms single spaceless words.

  • Mandarin: T’aishan, Taishan
  • Cantonese: Toisaan, Toisan, Toishan, Toyshan
  • Taishanese: Hoisaan, Hoisan, Hoysaan, Hoysan, Hoyshan, Hoysun

With spacing and punctuation set aside, the remaining differences stand out more clearly. Note that the number of transcription variants depends on which form of Chinese you look at: Mandarin–2, Cantonese–4, Taishanese–6.
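
For anyone who wants to repeat this tally on a longer list of spellings, here is a minimal Python sketch of the same normalization. The function name normalize and the exact rules (drop spaces and hyphens, keep the apostrophe, capitalize only the first letter) are my own reading of the step described above, not an established procedure.

```python
# Variants of 台山 as listed in this post, grouped by the form of
# Chinese each transcription is based on.
variants = {
    "Mandarin": ["T’ai-Shan", "Tai Shan", "Tai-shan", "Taishan"],
    "Cantonese": ["Toi Saan", "Toi San", "Toi Shan", "Toisaan",
                  "Toisan", "Toishan", "Toy Shan"],
    "Taishanese": ["Hoi Saan", "Hoi San", "Hoisaan", "Hoisan",
                   "Hoy Saan", "Hoy San", "Hoy Shan", "Hoysun"],
}


def normalize(form):
    """Collapse a transcription into a single spaceless word.

    Assumed rules: spaces and hyphens are dropped, the Wade-Giles
    apostrophe is kept, and only the first letter stays capitalized.
    """
    joined = form.replace(" ", "").replace("-", "")
    return joined.capitalize()


# Deduplicate the normalized forms within each group.
normalized = {
    group: sorted({normalize(form) for form in forms})
    for group, forms in variants.items()
}

# Number of distinct spellings left in each group.
counts = {group: len(forms) for group, forms in normalized.items()}

for group, forms in normalized.items():
    print(f"{group} ({counts[group]}): {', '.join(forms)}")
```

Run on the list above, this should give the same three groups and the same counts of 2, 4 and 6. Keeping the apostrophe is a deliberate choice: it is what separates the Wade-Giles spelling from the Hanyu Pinyin one, so stripping it would merge the two Mandarin forms.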

The two Mandarin forms reflect two romanization standards: Wade-Giles (T’ai-shan) and Hanyu Pinyin (Taishan). Since the 1960s, Hanyu Pinyin has been the dominant form of Mandarin transcription, which gives Taishan.

The Cantonese transcriptions include both standard and non-standard variants. Toisaan follows the LSHK Jyutping (and Sidney Lau) scheme. The Yale romanization, which would be Toihsaan, does not appear in the list. But the Jyutping form is not always the preferred one: for 四邑, you will see Seiyap (Yale/Lau), but rarely if ever Seijap (Jyutping).

The other forms reflect different orthographic tweaks which, at least today, would be considered non-standard. The double-“a” denotes a vowel-length contrast that exists in Cantonese; ignore this contrast, and you get Toisan. And in earlier versions of standard Cantonese, there were in fact two s-like sounds, which were often distinguished as sz and sh. This spelling is evident in the forms Toishan and Toyshan.

When it comes to Taishanese, there is no accepted standard. No surprise then that the highest number of variants turns up when you look at the transcription of Chinese characters into Taishanese. These schemes generally mirror the Cantonese transcriptions, but are adapted to Taishanese pronunciation. Note that Taishanese pronunciations themselves differ, and that’s reflected in some of the different transcriptions. While some write Hoysaan (a as in father), others write Hoysun (u as in sun).

In the previous post, I noted that variation in names depends on the geographical term used. In this post, I show that you can further subdivide the variation by the language used for transcription (Mandarin, Cantonese or Taishanese). I won’t go into the arguments for using one scheme over another, although you can already see that I am leaning towards the Mandarin transcription when I write “Taishanese.” The upcoming post is a discussion of who uses which transcription and why.

