If you aren't aware, there are two main written forms of Chinese: traditional and simplified. I won't get into the history of why simplified Chinese was popularized by the PRC (you're welcome to read more about that here), but broadly speaking, simplified Chinese is used mainly in mainland China (as well as Singapore and Malaysia), whereas traditional Chinese is more commonly used in places such as Hong Kong and Taiwan. Even within the two written forms, different regional phrases are sometimes used for the same object, e.g. a computer mouse is 鼠标 in China, and 滑鼠 in Hong Kong and Taiwan. It's important to get these phrases right so that you're using the correct language for your target audience, just like you'd use 'trash can' for an American audience, and 'rubbish bin' for a British one.

Luckily, we found a great open source project to suit our needs: Open Chinese Convert (OpenCC). OpenCC ships with configurations designed for converting to and from specific regional forms of Chinese, and it handles conversion at the phrase level as well as the character level. As we were building this with a Craft CMS plugin in mind, we also found a great PHP module for it, OpenCC4PHP.

Although the information for OpenCC4PHP can be found on their GitHub page, I'm reproducing it here for those that don't read Chinese.

Supports PHP 5.3 - 7.0.

First, on Linux, you'll need to install OpenCC:

git clone https://github.com/BYVoid/OpenCC.git --depth 1
cd OpenCC
make
sudo make install

Then, you'll need to install the OpenCC4PHP module:

(Note: you may need to apt install php7.0-dev (or php5-dev for PHP 5.x) for phpize to work. Also, if your installation of OpenCC isn't in /usr/ or /usr/local/, pass --with-opencc=[DIR] to the ./configure command below.)

git clone https://github.com/NauxLiu/opencc4php.git --depth 1
cd opencc4php
phpize
./configure
make && sudo make install
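
Before going further, it's worth checking that PHP can actually see the new functions. The snippet below is a minimal sketch: openccStatus() is a hypothetical helper name, and it only checks for the four functions mentioned in this post. If any are reported missing, the module probably still needs to be enabled in php.ini (likely a line such as extension=opencc.so, though that file name is an assumption about how the module is built on your system).

```php
<?php
// Hypothetical helper: lists which OpenCC4PHP functions are NOT callable
// from the current PHP build. An empty array means everything is in place.
function openccStatus()
{
    $functions = array('opencc_open', 'opencc_convert', 'opencc_close', 'opencc_error');
    $missing = array();
    foreach ($functions as $name) {
        if (!function_exists($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

$missing = openccStatus();
echo $missing ? 'Missing: ' . implode(', ', $missing) : 'OpenCC4PHP is available', PHP_EOL;
```

Run it with the same PHP binary (CLI or web server) that will run your plugin, as the two can load different php.ini files.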

To convert:

// Pick the configuration, i.e. which regional variant to convert from and to
$od = opencc_open("s2twp.json");
$text = opencc_convert("我鼠标哪儿去了。", $od);
echo $text;
// Close the handle when you're done to release memory
opencc_close($od);

You can also use opencc_error() to debug any errors; it returns false if there are none.
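
In a plugin you probably don't want a missing module or a bad config to break the page, so the calls above can be wrapped defensively. This is only a sketch: convertChinese() is a hypothetical helper, and it assumes that opencc_open() and opencc_convert() return false on failure, which you should confirm against the module itself.

```php
<?php
// Hypothetical helper (not part of OpenCC4PHP): wraps open/convert/close and
// falls back to the untouched input if anything goes wrong.
function convertChinese($text, $config = 's2twp.json')
{
    // The opencc_* functions only exist once the module is loaded.
    if (!function_exists('opencc_open')) {
        return $text;
    }

    // Assumption: opencc_open() returns false when the config cannot be loaded.
    $od = opencc_open($config);
    if ($od === false) {
        error_log('OpenCC: ' . opencc_error());
        return $text;
    }

    // Assumption: opencc_convert() returns false on failure.
    $converted = opencc_convert($text, $od);
    if ($converted === false) {
        error_log('OpenCC: ' . opencc_error());
    }
    opencc_close($od);

    return is_string($converted) ? $converted : $text;
}

echo convertChinese("我鼠标哪儿去了。"), PHP_EOL;
```

Returning the original text on failure keeps the entry readable, at the cost of hiding the problem from editors, so logging the error matters here.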

The different configs are available on the OpenCC GitHub page, but here they are as well:

| Config | Translation type | Original description |
| --- | --- | --- |
| s2t.json | Simplified Chinese to Traditional Chinese | 簡體到繁體 |
| t2s.json | Traditional Chinese to Simplified Chinese | 繁體到簡體 |
| s2tw.json | Simplified Chinese to Traditional Chinese (Taiwan Standard) | 簡體到臺灣正體 |
| tw2s.json | Traditional Chinese (Taiwan Standard) to Simplified Chinese | 臺灣正體到簡體 |
| s2hk.json | Simplified Chinese to Traditional Chinese (Hong Kong Standard) | 簡體到香港繁體(香港小學學習字詞表標準) |
| hk2s.json | Traditional Chinese (Hong Kong Standard) to Simplified Chinese | 香港繁體(香港小學學習字詞表標準)到簡體 |
| s2twp.json | Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom | 簡體到繁體(臺灣正體標準)並轉換爲臺灣常用詞彙 |
| tw2sp.json | Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom | 繁體(臺灣正體標準)到簡體並轉換爲中國大陸常用詞彙 |
| t2tw.json | Traditional Chinese (OpenCC Standard) to Taiwan Standard | 繁體(OpenCC 標準)到臺灣正體 |
| t2hk.json | Traditional Chinese (OpenCC Standard) to Hong Kong Standard | 繁體(OpenCC 標準)到香港繁體(香港小學學習字詞表標準) |
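
If your site has more than one pair of Chinese locales, it can help to keep the choice of config in one place. The sketch below is an illustration only: openccConfigFor() is a hypothetical helper, the locale codes (zh-CN, zh-TW, zh-HK) are assumptions about how your site names its locales, and the config file names are taken from the configs listed above.

```php
<?php
// Hypothetical lookup: maps a source/target locale pair to an OpenCC config.
// Locale codes are illustrative; config names come from the OpenCC list.
function openccConfigFor($from, $to)
{
    $map = array(
        'zh-CN' => array('zh-TW' => 's2twp.json', 'zh-HK' => 's2hk.json'),
        'zh-TW' => array('zh-CN' => 'tw2sp.json'),
        'zh-HK' => array('zh-CN' => 'hk2s.json'),
    );

    // Return null for pairs we have not mapped, so the caller can skip them.
    return isset($map[$from][$to]) ? $map[$from][$to] : null;
}

var_dump(openccConfigFor('zh-TW', 'zh-CN'));
```

The phrase-level configs (s2twp.json, tw2sp.json) are used for the Taiwan pair here because they also swap regional vocabulary such as 鼠标 and 滑鼠.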

For the purposes of this plugin, we wanted to give our clients the convenience of a one-click translation, but also the flexibility to tweak the text afterwards if they wanted to. With this in mind, we opted for the (undocumented) Craft hook that lets you add to the right-hand pane of the entry edit page in the CP.

craft()->templates->hook('cp.entries.edit.right-pane', function(&$context) {
    // Skip unsaved entries and any entry outside the 'zh' locale
    if (!$context['entry']->id || $context['entry']->locale != 'zh') {
        return;
    }
    // Render the pane that contains the translate button
    return craft()->templates->render('translation/_includes/translate-pane', array(
        'entry' => $context['entry']
    ));
});


[Screenshot: the auto-translate button in Craft CMS. Caption: Convenient one-click button to overwrite the existing simplified Chinese entry.]

For this, we wanted the edit pane to appear only on the traditional Chinese entries, which is where our clients would be entering content. We then looped through each of the entry's fields, translating only the text fields, and saved the result as the simplified Chinese entry.

Since we were testing with Chinese "lorem ipsum", we ran into some characters that weren't translated properly and came out as squares instead. We decided that the best fallback for any "unknown" character was to keep the original traditional Chinese character. Because Chinese characters are multibyte, we had to use mb_strpos() to find the position of the unknown character, and mb_substr() to join the translated text before it, the original character itself, and the rest of the text after it. Note that this relies on the converted text lining up character for character with the original.

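$offset = 0;
// Look up each placeholder once, starting after the previous match
while (($i = mb_strpos($convertedText, '𥐟', $offset)) !== false) {
    // Take the character at the same position in the original text
    $char = mb_substr($textAsString, $i, 1);
    $convertedText = mb_substr($convertedText, 0, $i) . $char . mb_substr($convertedText, $i + 1);
    // Move past this position so the loop always terminates
    $offset = $i + 1;
}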
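
The field loop described earlier (translate only the text fields, leave everything else alone) can be sketched without any Craft code at all. Everything here is an assumption for illustration: translateTextFields() is a hypothetical helper, the $fields array is a plain stand-in rather than a Craft model, the type names 'PlainText' and 'RichText' are simply the ones we'd expect to treat as text, and the converter passed in is a dummy in place of the real OpenCC call.

```php
<?php
// Hypothetical, framework-free sketch: $fields is a plain array of
// handle => array('type' => ..., 'value' => ...), not a Craft model.
function translateTextFields(array $fields, $convert)
{
    $translated = array();
    foreach ($fields as $handle => $field) {
        if ($field['type'] === 'PlainText' || $field['type'] === 'RichText') {
            // Text fields go through the converter
            $translated[$handle] = call_user_func($convert, $field['value']);
        } else {
            // Everything else (assets, numbers, ...) is copied unchanged
            $translated[$handle] = $field['value'];
        }
    }
    return $translated;
}

// Usage with a stand-in converter; a real plugin would call OpenCC here.
$fields = array(
    'heading' => array('type' => 'PlainText', 'value' => '滑鼠'),
    'image' => array('type' => 'Assets', 'value' => array(12)),
);
$result = translateTextFields($fields, function ($text) {
    return strtr($text, array('滑鼠' => '鼠标'));
});
```

Passing the converter in as a callable keeps the loop easy to test without the OpenCC module installed.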

After this, all that was left was to redirect the user to the simplified Chinese entry for final approval and checking!

About the author

Eve Lin

Senior Developer

Eve has 8 years of web development experience, working on a wide range of large-scale projects including websites for the Securities and Futures Commission of Hong Kong, the Shangri-La, and Jardine Matheson.