IIT Gandhinagar Researchers Unveil an LLM for Hindi Language

IIT Gandhinagar researchers have launched Ganga-1B, an AI language model for Hindi that outperforms existing open-source models supporting Hindi with up to 7 billion parameters.
The research group spent almost 1.5 years developing the Ganga-1B model using open-source data from various websites.
The team is also developing AI models for other Indian languages and exploring applications in e-governance and education.

The Lingo Research Group at the Indian Institute of Technology Gandhinagar (IITGN) has announced the first product of Project Unity, an initiative that aims to celebrate and harness India's rich linguistic diversity by creating a comprehensive resource for the country's major languages.

They have developed an artificial intelligence (AI) model called "Ganga-1B," a pre-trained large language model (LLM) for Hindi.

Releasing the first product of the #Unity project @lingoiitgn , Ganga-1B , a pretrained LLM for Hindi. Created from Scratch using the largest curated Hindi Dataset. Ganga-1B is outperforming all open-source LLMs supporting Hindi with sizes till 7B.
— Mayank (@mayank_iitgn) July 3, 2024

Ganga-1B was trained on a large collection of public-domain Hindi data, including news articles, web documents, books, government publications, educational materials, and selected social media conversations. Native speakers also reviewed the dataset to ensure its quality.

According to the team, Ganga-1B outperforms all existing open-source LLMs that support Hindi, including models with up to 7 billion parameters.

The research group spent almost 1.5 years developing Ganga-1B using open-source data from various websites. The model was downloaded by more than 600 people within 48 hours of its release.

The research team is also building models for other Indian languages, including Tamil, Telugu, Marathi, Gujarati, and Urdu. It is exploring the use of AI in e-governance for regional languages and is developing an education-focused LLM to support school students and teachers.

Edited by Harshajit Sarmah