Saturday, February 25, 2006

Indian Language Translation

During my undergraduate days, I worked in a state-of-the-art research lab on machine translation from English to Indian languages. One of the main motivations for our work was that we would be bringing the WWW to the 90% of the Indian population who are illiterate in English. Through this experience, I gained a fair knowledge of the key problems in this area and the main technical hurdles to be overcome.

But I believe many of the R&D teams working on this problem are missing the holistic view.

Most people in India are illiterate, but they can speak native languages such as Telugu and Hindi. Further, many of those who are literate are not computer literate. And most computer-literate people are already comfortable with English.

With this in view, how effective would a new Desktop Manager be, with arcane Telugu words for terms such as "Start", "My Computer" and "Control Panel"?

In places such as China and Japan, communication in English is practically impossible. In other places, such as France and Germany, there is strong linguistic pride and awareness. But in India, both of these are absent. Still, I believe there is a target audience in India, albeit of a different kind.

In order to clarify this, let me narrate an experience of mine.

Once during my UG days, I had another strange and unique opportunity. My father, a teacher, invited me to give a few lectures on Modern Physics and Atomic Theory to the outgoing students of his high school. The medium of instruction was Telugu, not English as it had been during my own school days. I had great fun teaching, and the students liked me very much as an instructor. I also had to put some effort into studying the equivalent Telugu words for the scientific terms. But this experience of teaching in my native language was very beneficial for me: it helped me understand the subject much more clearly.

The fact remains that many students in Indian high schools are not literate in, or comfortable with, the English language. This is a major hurdle that keeps them from picking up and reading the vast majority of the available literature. Even if they are fairly conversant in rudimentary English, they are not accustomed to reading large books in English. This group includes, surprisingly, both of my own parents, who are much more comfortable reading books in Telugu.

But of course, the most informative literature is currently available only in English. A sizeable portion of it is on the web, and the rest is available in print. The key factors are the relevance and the importance of the available information.

Good examples of such informative texts are the essays from the Edge Foundation.

The state of the art in machine translation research is still far from producing reasonable output for real texts such as these. But this output can very well be a good starting point for a human translator.

Ideally, such information should be available in different formats (audio, images, video, summarized audio, detailed translated text, ...) to cater to a varied audience. The important thing to note is that the needs of this audience differ widely.


  • Some people might need to know the latest weather forecasts / stock prices / news headlines in summarized Telugu text.
  • Some people might need to know about the latest agricultural techniques in video format.
  • Some people might need to report/publish on websites or blogs using a simple Telugu interface and video.
  • Students in several schools might need to communicate on their class projects using audio and images.
  • Some students (or elders) might need to read books such as "The Selfish Gene" and "The End of Poverty" (two personal favorites of mine) in clear Telugu. Or they might need video and images on an accompanying website.

In my view, these are the issues that are truly pertinent to the emancipation of education in the Indian scenario.

Issues such as these should be tackled individually and with special care. There are no magic bullets, such as Indian-language GUIs, that solve all these problems in one shot.

These problems will be solved both by building good computer systems and by massive human effort. These twin efforts should go hand in hand.

In particular, I want to highlight one issue: the translation of informative books (such as The Selfish Gene and The End of Poverty) written by highly skilled scientists for a general audience. This is the kind of information that needs to be available to high school students. But unfortunately, to my knowledge, neither of these books has been translated into Telugu so far.

This kind of information is currently being published at a rapid pace, both on and off the World Wide Web. To make it available to the general Indian audience, I think universities should encourage UG/PG students to translate this literature into regional languages and should assign course credit for such efforts. (For example, a group of economics students might translate The End of Poverty.)

And translation need not produce a book in Telugu. The output can be a website or a series of lectures in images and audio. From my experience (of teaching modern physics in my dad's school), I can say that this will be truly rewarding for the translator too.

Interested people might even maintain weblogs where they post translated texts from specific domains (sometimes recording the translation as an audio file). Should we call them transblogs?

These will provide a true breakthrough in bridging the information divide in India.

8 comments:

AK-84 said...

Good point raised.. I really appreciate Sunil Mohan, who's been doing something similar for the Prajasakti Telugu daily newspaper.. I believe that you're talking about something along the same lines..

Ray Lightning said...

In fact, I have been criticizing Sunil Mohan's team :) sorry bunny !!

I think these efforts are useless if they do not get to the holistic point of view.

Mischord said...

Your analysis of the problems is almost completely right. But u've missed the current work that could lead to possible solutions.

1. Yeah, most people only "speak" native language. Our solution is to provide a desktop in native lang and make the computer read out the content. This includes menus, documents, web pages. So a person with a little training can practically use the entire desktop. The training is to make him learn to use a computer. Not the lang. And the Telugu desktop is ready for this. Check out http://www.swecha.org/wiki/SpeechAssistance

2. I have my reservations about translating all kinds of content into Telugu. For instance, I believe Shelley can be enjoyed only in English. So it's good if ppl can learn English. But in the case of the kind of imp, thought-provoking books u r talking about, well, they do need to be translated. And to do that, we have to provide ppl with proper/standard tools. And that is exactly what we r doing right now. The content will come up in the later years. And to make it faster, we have to educate ppl to contribute to projects like Wikipedia. Why is it that there is practically no te.wikibooks.org and te.wikinews.org????

My mother is educated enough to translate articles for Wikipedia. So is ur mother. And they r looking for that something that needs their sweat. Think about it.

Ray Lightning said...

Hi Chaitu
You are terribly bogged down by the conventional applications and are refusing to think out of the box. A country like India needs totally different tailor-made applications to suit its population. What works in USA does not work here (read things such as Desktop Manager, Wikipedia etc)

Why are you always playing second fiddle to other developer teams in the US? Why always localization? Don't you think you can build something from scratch?? There are several things that you could do.

There is a saying: "If the only tool you have is a hammer, every problem looks like a nail." This is what you are guilty of doing !!

You are not developing software from the software engineering perspective. You have not done any requirements analysis. You have not done any quality control. You are not testing whether your software would be effective for consumers.

And when did I ask you to translate Shelley and Keats?? I asked you to concentrate on the information that matters. There is plenty out there.

What I want you to notice is that Indian languages are more alive in their spoken form than in their textual form.

So you should concentrate on building tools for making audio blogs and doing voice chat.

Please wake up !!

mOby said...

Me big fan.. of whatever u guys are doing.. at least u guys are doing something..

Bhale Budugu said...

Kiran

I don't know how I landed up here, but FYI - my brother and I had a similar discussion last month, and I was surprised to see almost the same content.

Maybe like minds think alike.

Cheers

Ray Lightning said...

@Bhale Budugu :
Thank you :) We are already friends since u visited my blog !

@Bunny (sunil mohan):
first of all, you are one range !
Second of all, you guys are doing awesome work.

:)) thanks for visiting my blog bunny.

According to MIT's figures, 5% of the Indian population can effectively use English for communication. ... It therefore follows that Indian people need Indian language computing.

But how much English do you need to know to use a Desktop Manager? How many people would like to give presentations (PowerPoint-type) or frame Word documents in Indian languages in the current scenario? A handful. Mostly the Indian bureaucracy in the government sector. These are the issues that you are concentrating upon.

If we have enough bandwidth to do audio/video to rural places, the first thing we should think of is Tele-Medicine.

Website building, audio blogs, translations of "selfish gene", stock prices are phase 10 requirements for Indian language computing.


First, several applications need not have a dedicated internet connection. Providing multimedia-rich educational tools for schools is one such application. This content can be distributed on CDs by a local support group, which downloads the audio and video content from the web and writes it to CDs.

Interested people from the Telugu diaspora from all over the world can contribute to the creation of this content. But they need software to help them in doing this.

Education is the most important issue !! This is a phase-1 issue, not a phase-10 issue.

About the issues that you have mentioned (such as knowing land records, or information about which crops to grow), maybe the World Wide Web is not the right channel for these! They can be delivered quickly and effectively through other channels, such as toll-free call centers.

Telemedicine might be best served through video conferences from village dispensaries over a telephone hot-line (instead of the IP network).

This is why I am requesting you to think out of the box. The most serious issue in India right now is providing high-quality internet-age education at high school and undergraduate level. If we don't do this, the next generation of Indians will be severely handicapped in competing with the world.

gRAVIty said...

Hi Kiran,
I don't know whether you know this or not. There's a site, quillpad.in/telugu, in which you type in English and the script comes out in Telugu.