Will Domain Dot भारत Spur the Growth of Indian Languages on the Internet?
Will Domain Dot भारत Spur the Growth of Indian Languages on the Internet? is a Scroll.in article published on 29 August 2014. The piece examines whether the launch of Devanagari script web addresses would help increase Indian language content online. Technology policy expert Sunil Abraham highlighted the absence of core infrastructure such as comprehensive dictionaries, machine translation capabilities, and optical character recognition for Indian languages.
Article Details
- 📰 Published in: Scroll.in
- 📅 Date: 29 August 2014
- 👤 Author: Rohan Venkataramakrishnan
- 📄 Type: News Report
- 📰 Website Link: Read Online
Full Text
Modi's effort to promote the use of Hindi and e-governance has given hope to those who want to see more vernacular content online, but many challenges have to be overcome.
For most of its short history, the internet has been the English speaker's playground. Though English is the world's third most-spoken language by number of native speakers (after Mandarin and Spanish), it is by far the most commonly used language on the internet. If you wanted to make sense of most of what's on the World Wide Web, you had to be able to read and write English.
This is slowly changing. The launch of Devanagari script web addresses on Sunday, allowing people to use .भारत domain names, was another step in the slow effort to bring about a multilingual Web. So far, Indian languages such as Hindi, one of the most commonly spoken languages on Earth, lag far behind. The move gels well with the new government's effort to promote the use of Hindi and its push to increase the digital services available to all citizens. The next few years could well see a spurt in vernacular content online.
But first, many challenges have to be overcome. "At present, not a single Indian language figures in the top 10 languages prevalent on the Internet, though Chinese, Arabic and Russian feature in the list," said a McKinsey report on the internet's impact on India. "The next wave of internet adoption in India will be dominated by local language speakers, which underscores the need for much more content and applications to be offered in local languages."
Vernacular internet
Early studies of the internet attempted to quantify how much of the web was in English. A 1997 estimate put the figure at 80% of all websites, while a 2003 study by the Online Computer Library Center (OCLC) concluded that 72% of all online content was in English. Today that number is much lower.
W3Techs, which conducts surveys of the internet, now estimates that about 55% of content on the Internet is in English, followed by German, Russian and Japanese. Indian languages don't crack the top 35.
The analysis is by its nature imprecise. The internet is vast and mostly uncharted. Estimates suggest search engines have indexed only 40% of Web content, leaving much off the mainstream radar. Measuring language becomes even harder because, in the early years, when fonts were harder to render, most non-English content on the internet was spelt out in Roman letters.
Indian Wiki
Better support for non-Latin scripts has changed that, and made it easier to evaluate the diversity of the internet. Yet even the best approach relies more on sampling than measurement. There is one section of the Web, however, that does allow for comparisons of absolute numbers.
Relative to other tongues, Indian-language articles still make up a minuscule portion of Wikipedia. The lead of English, Spanish and French is perhaps to be expected, but even languages such as Vietnamese have nearly 10 times as many pages as Hindi. Waray-Waray, the fifth most commonly spoken language in the Philippines, appears to be an outlier because of an automated method that creates pages in that language.
Hindi content on the online encyclopedia has grown from no pages in 2003 to more than one lakh (100,000) in 2011, but it still falls far behind languages with a comparable number of speakers, such as Spanish and Arabic, and even behind many with much smaller reach. Of course, in many countries English is not widely spoken, so internet users need web pages in their own language. In India, because English proficiency is closely tied to social class, the majority of internet users are at least conversant in English.
Obstacle course
The impediments to further growth are all too apparent. For one, internet infrastructure still leaves much to be desired. Though India has the third-largest internet user base in the world, only 10% of the population is actually online. Even by 2015, when internet access is expected to reach 28% of the population, the equivalent rural figure is likely to be just 9%, according to estimates.
"A lot of the core infrastructure that is necessary for language computing is missing," said Sunil Abraham, executive director of the Centre for Internet & Society. "There's no mandate by the government that these languages must be supported, no comprehensive dictionaries, no thesauri, no machine translation capabilities, no optical character recognition capabilities. Because our market is so insignificant for proprietary software makers, they haven't done enough to develop these. Meanwhile, the free software community is too small and mostly English-speaking."
The government has launched some initiatives in this regard, such as a National Translation Mission aimed at machine-translating text from English into Indic languages, as well as banks of fonts that are free to use. But Abraham said that while the government is clear this should be a priority area, it underestimates the scale of the problem.
"We need large scale investment by the government into each language," he said. "We're looking at maybe even Rs 100 crore per language, to bring each of our traditional languages into the internet age."
Context and Background
This article was published shortly after Communications and IT Minister Ravi Shankar Prasad launched the .भारत domain on 21 August 2014, making it available in Devanagari script for eight Indian languages: Hindi, Marathi, Konkani, Bodo, Nepali, Maithili, Dogri and Sindhi. The initiative was part of the newly elected Modi government's broader Digital India programme, which aimed to expand internet access and services to rural areas.
The launch of internationalised domain names represented a technical milestone, following ICANN’s delegation of country-code top-level domains in seven Indian languages to the National Internet Exchange of India. However, the move primarily addressed accessibility at the domain level rather than the more fundamental challenges of creating and consuming content in Indian languages.
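Why this counts as a domain-level fix rather than a content fix becomes clearer when one looks at how an internationalised domain name is carried: hostname conventions in the domain name system allow only ASCII letters, digits and hyphens, so a Unicode label such as भारत is converted into an ASCII-compatible "xn--" form using Punycode under the IDNA standard. The sketch that follows is illustrative only. It assumes Python's standard-library `idna` codec, which implements the older IDNA 2003 rules (registries apply IDNA 2008, so results may differ for some characters), and the label "example" is a hypothetical name chosen for illustration.

```python
# Minimal sketch: converting an internationalised domain name to and from
# its ASCII-compatible (Punycode, "xn--") form.
# Assumption: Python's built-in "idna" codec (IDNA 2003); real registries
# apply IDNA 2008, which can differ for some characters.


def to_ascii(domain: str) -> str:
    """Encode a Unicode domain name into its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """Decode an ASCII-compatible domain name back into Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    # "example" is a hypothetical second-level label used only for illustration.
    for name in ["भारत", "example.भारत"]:
        ace = to_ascii(name)
        print(name, "->", ace, "->", to_unicode(ace))
```

Only the address changes: the page served at such a domain still needs fonts, input methods and content in the language, which is the gap the rest of the article describes.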
At the time, despite Hindi being one of the most widely spoken languages globally, Indian-language content remained negligible on the internet. Wikipedia statistics illustrated this disparity starkly: Hindi had barely crossed one lakh articles, whilst languages with far smaller speaker populations had substantially more content. Because English proficiency in India is closely tied to social class, most internet users were conversant in English, which reduced immediate demand for vernacular content.
The infrastructure gaps Abraham identified, namely the absence of comprehensive dictionaries, machine translation systems and optical character recognition for Indian scripts, would require sustained investment and coordination between government, the private sector and open-source communities to address effectively.
📄 This page was created on 3 January 2026.