Google Indic Transliteration

Google's transliteration tool for Hindi showed up for me in Blogger a few weeks ago as a क symbol that toggled Hindi mode on and off. When I tried it out, though, I was disappointed: it was very buggy, and I couldn't compose an entry in Hindi. However, I am deeply impressed by Google's improved Indic Transliteration. It currently supports only Hindi, but I'm sure they are working on Gujarati, Punjabi, Bengali, Marathi and other languages.

Hindi speakers commonly use an ambiguous but oddly standardized form of romanization that depends on the native speaker's ability to guess which word is meant. Here's a random sentence I tried, which the tool transliterated into Devanagari perfectly:
Isse aage hum aur kya kahein jaanam samjha karo
इससे आगे हम और क्या कहें जानम समझा करो
(Roughly: "What more can I say than this, darling? Try to understand.")
Here we see Google, yet again, adopting the way people naturally write romanized Hindi instead of making them learn a new transliteration scheme that is easier for computers to parse. That second approach is what Palm took with Graffiti, a cumbersome character-by-character way of writing text that PDAs could easily decipher.

Note that the Google transliteration engine does not work offline, because it fetches the transliterated results from Google's servers as soon as you hit the space key after typing a word. I don't know how the software is implemented, but I am guessing it uses large volumes of frequency data on which English letter combinations correspond to which Hindi letter combinations, ranked by popularity. That way, Google would not need to devise a complicated parsing algorithm for a transliteration system that is standardized only by the weak force of common consensus. Instead, it can present just the most probable match for the word and show the other matches when you click on it, which, of course, has an uncanny resemblance to "I'm Feeling Lucky" versus "Google Search". Most romanized input systems for Chinese and Japanese work the same way.
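The frequency-data guess can be made concrete with a toy model. This Python sketch is not Google's actual implementation, which I don't know. It assumes a made-up table, `CHUNK_COUNTS`, recording how often each romanized chunk was used for each Devanagari chunk. It splits a typed word into known chunks, scores each possible spelling by multiplying the chunks' relative frequencies, shows the top-scoring spelling, and keeps the rest for a click menu. All names and numbers here are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import prod

# HYPOTHETICAL toy data: for each romanized chunk, how often people used it
# for each Devanagari chunk. These counts are invented for illustration.
CHUNK_COUNTS = {
    "pa": {"पा": 7, "प": 3},
    "ni": {"नी": 6, "नि": 4},
    "jaa": {"जा": 10},
    "nam": {"नम": 8, "नाम": 2},
    "hum": {"हम": 9, "हुम": 1},
}


def segmentations(word):
    """Yield every way to split `word` into chunks found in CHUNK_COUNTS."""
    if not word:
        yield []
        return
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in CHUNK_COUNTS:
            for rest in segmentations(word[end:]):
                yield [head] + rest


def candidates(word):
    """Return (devanagari, score) pairs for `word`, most probable first."""
    scores = defaultdict(float)
    for chunks in segmentations(word.lower()):
        # Turn each chunk's counts into relative frequencies.
        options = []
        for chunk in chunks:
            counts = CHUNK_COUNTS[chunk]
            total = sum(counts.values())
            options.append([(dev, n / total) for dev, n in counts.items()])
        # Score every combination of choices by multiplying its frequencies;
        # different segmentations that spell the same output add up.
        for combo in product(*options):
            text = "".join(dev for dev, _ in combo)
            scores[text] += prod(p for _, p in combo)
    if not scores:
        return [(word, 1.0)]  # no known chunks: leave the word in Roman script
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def transliterate(sentence):
    """Show the top match per word (the "I'm Feeling Lucky" pick) and keep
    the rest as alternatives (the "Google Search" list) for a click menu."""
    output, alternatives = [], {}
    for word in sentence.split():
        ranked = [text for text, _ in candidates(word)]
        output.append(ranked[0])
        alternatives[ranked[0]] = ranked[1:]
    return " ".join(output), alternatives


text, alts = transliterate("hum jaanam pani")
# text == "हम जानम पानी"
# alts["पानी"] == ["पानि", "पनी", "पनि"]
```

Enumerating every segmentation grows quickly with word length, which is fine for a toy; a real engine would more likely prune candidates with dynamic programming or beam search.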

In fact, what would be really awesome is if Google turned this amazing piece of web-based software into input methods for Mac and Windows, so that it could be used in any application on any computer.

Comments (2)

  • Hi, yeah, I tried it too. I liked the concept, but I guess there is another tool on the market that is more powerful. You can check it out at www.quillpad.in. Google transliteration gives you the transliterated word only after you press space. You type 'pani', 'mehnat', etc., and it gives you an undesired first option. Then you have to take the trouble to go back and correct the word. Quillpad, on the other hand, gives you on-the-fly output, which is really cool. Apart from that, it is as good as Google's transliterator, if not better. Also, Quillpad supports 8 languages, which cover 60-70% of the Indian population, and as they promise on their site, supporting many other languages is not a big deal for them. So I guess in this case Google has a long way to go.
  • Thanks a lot for this link. Like Google's service, it seems Quillpad doesn't work on the Macintosh either, but I'll try it out on Windows. It looks like this site also has Gujarati, which a friend of mine was looking for.