
CognitiveLab

An Open Source First AI Research Lab Building from India, for the World.

About CognitiveLab

CognitiveLab is an open-source first AI research lab founded in Bangalore in May 2023. Our core mission is to build impactful AI technology from India for the world, with a strong focus on democratizing access and fostering innovation through open collaboration.

We develop cutting-edge models and tools, particularly excelling in multilingual AI for Indic languages (like Ambari and Project Nayana) and creating widely adopted open-source software (like OmniParse and the Indic LLM Leaderboard). We were also selected for the Microsoft for Startups program shortly after our inception.

To sustain our research and open-source contributions, CognitiveLab generates revenue through consulting services, helping startups and established companies build MVPs, production systems, and custom AI solutions.

Our Mission & Approach

CognitiveLab is dedicated to developing state-of-the-art AI models in India that create tangible impact globally. We prioritize open-source development to foster innovation, accelerate progress, and ensure accessibility. We balance our focus between critical multilingual/Indic projects and broadly applicable AI tools like data parsers and educational resources (e.g., AI Engineering Academy).

TL;DR - Our Major Achievements

  • Nayana
    Revolutionary multilingual, multimodal model supporting 22 languages with SoTA OCR capabilities
  • OmniParse
    6,000+ GitHub stars, 10,000+ monthly users, recognized as one of the fastest-growing open-source repositories
  • Ambari
    India's first bilingual Kannada-English LLM, evaluated by NVIDIA and Microsoft, featured by Meta
  • Meta Grant
    Received Llama Impact Grant for advancing multilingual AI capabilities

Why CognitiveLab Exists

The Language Gap

Despite India's linguistic diversity (22 official languages), AI development has historically overlooked many regional languages, creating a digital divide for more than 500 million people who are not fluent in English.

CognitiveLab aims to bridge this gap by building high-quality AI for underserved languages, ensuring technological equity.

The Resource Imbalance & Open Source Need

Access to resources and datasets for Indic AI was limited, often controlled by large entities with less focus on open community involvement.

We champion an open-source approach to prove impactful AI can emerge from India with focused engineering and collaboration, empowering local researchers and developers.

How CognitiveLab Has Evolved

From a bootstrapped initiative in May 2023 to securing international grants, our journey reflects our growing impact. Here are some key milestones:

May 2023

Founding & Microsoft for Startups

CognitiveLab founded as an open-source first research lab in Bangalore. Accepted into the Microsoft for Startups program around the same time.

January 2024

Ambari Launch

Released India's first bilingual Kannada-English LLM (Ambari), achieving SoTA performance with limited resources.

March 2024

Indic LLM Infrastructure

Launched tools and benchmarks like the Indic LLM Leaderboard to support Indic language AI development.

May 2024

OmniParse Launch

Released OmniParse, an open-source data parsing tool that quickly gained traction (6,000+ GitHub stars).

September 2024

Nayana OCR @ NAACL

First paper on Nayana OCR accepted at the prestigious NAACL conference workshop.

April 2025

Meta Llama Impact Grant

Awarded Meta's Llama Impact Grant to advance multilingual AI (Project Nayana). Publicly announced on April 29, 2025.

Present Day

Ongoing Research

Continuing work on Nayana, OmniParse, Indic infrastructure, and exploring new frontiers in open-source AI.

01
Ambari

Ambari, India's first bilingual Kannada-English LLM, set a new benchmark by being SoTA at the time of its launch. Trained on a modest budget of just $1,000 on Azure's infrastructure, it showed that powerful AI can emerge even with limited resources.

SoTA bilingual Kannada-English model at launch time
Featured in Meta's keynote at the Build with AI summit
Evaluated in research papers by NVIDIA and Microsoft
Highlighted in official posts on India.gov.ai

02
OmniParse

An open-source tool designed to ingest and parse any type of data into a structured format. With 6,000+ GitHub stars and 10,000+ developers using it monthly, it's rapidly gaining traction in the AI space.

6,000+ GitHub stars and 10,000+ monthly users
Featured on prominent tech blogs like MarkTechPost
Gained 3,000 GitHub stars in just 2 days after launch
10K+ monthly Docker pulls by developers worldwide
Recognized as one of the fastest-growing open-source repositories of Q3 2024

03
Indic LLM Infrastructure

We've developed several tools to support Indic language AI development, including the Indic LLM Leaderboard, Indic Eval, and Indic Tokenizer.

Standardized benchmarking platform for Indic language models
Entire infrastructure hosted on Azure for scalability and reliability
Part of the Leaderboard Mission at People+AI
Featured by NASSCOM as Tech Maverick innovation

04
Project Nayana

A revolutionary multilingual, multimodal, multitask language model that supports 22 languages across text, audio, and vision.

Supports 22 languages with text, audio, and vision capabilities
Nayana OCR accepted at the prestigious NAACL conference
SoTA OCR model in 10 different Indic languages
Received grant from Meta (Llama Impact Grant)
Available on Hugging Face for easy access and use

05
Llama Impact Grant

In recognition of our work, particularly with Project Nayana and the Indic LLM Leaderboard, CognitiveLab was awarded the prestigious Llama Impact Grant by Meta. This significant support, publicly announced on April 29, 2025, will accelerate our efforts in advancing multilingual and multimodal AI for diverse languages.

How We'll Utilize the Grant

Advancing Project Nayana and Indic language AI

  • Expand language coverage across all 22 official Indian languages
  • Deepen multimodal capabilities connecting text, audio, and visual modalities
  • Develop high-quality training data for low-resource languages
  • Enhance the Indic Tokenizer to better handle morphological richness
  • Create inference optimization techniques for deployment on constrained hardware

Why We Choose Open Source

CognitiveLab's commitment to open-source AI is deeply rooted in our foundational philosophy of democratizing artificial intelligence. We believe that open source has the most direct and widespread impact, benefiting both developers and end-users.

Accessibility for All

Open-source AI fundamentally aligns with our mission to make advanced AI technologies accessible across diverse communities, especially in regions where language barriers have historically limited technological inclusion.

Democratization of Innovation

By embracing open-source models, we're enabling developers, researchers, and organizations throughout India and beyond to build on powerful foundations without prohibitive costs.

Community-Powered Development

The vibrant ecosystem around open-source models has accelerated our progress through collaborative debugging, shared improvements, and collective problem-solving that proprietary approaches simply cannot match.

Indigenous Innovation Focus

Open source allows us to develop locally relevant AI solutions that address uniquely Indian challenges while contributing to the global AI ecosystem.