Adithya S Kolavi

AI Researcher, Building Generative AI solutions at Scale

About

Building AI that actually works! Currently deep into Vision-Language Models and Agentic Systems, with hands-on experience taking AI projects from wild ideas to real products. Love tinkering with model fine-tuning and cloud deployments. Big open-source enthusiast - you'll find me contributing to projects that make AI more accessible to everyone.

Work Experience

Microsoft Research

July 2025 - Present

Research Fellow

On Site

Working on real-time AI systems

Apple

January 2025 - June 2025

ML Intern

On Site

Part of the Unified Intelligence team

CognitiveLab

Building a Research Lab

January 2023 - Present

AI Researcher

On Site

Pioneered one of India's first Kannada bilingual large language models, Ambari-7b. Developed Cognitune, an enterprise-grade LLMOps platform, reducing time to production by 60%.

Featured Open Source Work

Academic Publications

Research papers and academic contributions

ICCV 2025

Nayana: A Foundation for Document-Centric Vision-Language Models via Multi-Task, Multimodal, and Multilingual Data Synthesis

Workshop on Computer Vision for Developing Countries (CV4DC) • 2025

Accepted

A comprehensive approach to generating synthetic datasets for training vision-language models on document understanding tasks across multiple languages.

Dataset Generation

Multimodal AI

Document Understanding

CVPR 2025

ViViD - Vision Language model for Unified Visual Understanding of Documents

Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo 2025) • 2025

Accepted

A vision-language model specifically optimized for document understanding tasks, capable of processing diverse document formats with high accuracy.

Vision-Language Models

Document Understanding

Multimodal AI


NAACL 2025

Nayana OCR: A Scalable Framework for Document OCR in Low-Resource Languages

Language Models for Underserved Communities • 2025

Accepted

Development of a specialized OCR system designed for low-resource Indic languages, addressing unique challenges in character recognition and document processing.

OCR

Low-Resource Languages

Document Processing

Achievements & News

Latest updates, recognitions, and highlights

Omniparse Hits 6500 Stars on GitHub

April 2025

Omniparse, our open-source document parsing library, has reached 6500 stars on GitHub, making it one of the most popular libraries for document processing.

Open Source

GitHub

Milestone


Awarded LLaMA Impact Grant by Meta AI

April 2025

CognitiveLab was selected as one of the recipients of Meta's LLaMA Impact Grant for our work on extending large language models to under-resourced Indic languages.

Award

Grant

Meta AI

Announcement

Latest Blog Posts

Cursor for Image Editing: A Multi-Agent Approach for Visual Content Creation

A novel multi-agent system that redefines how we generate and refine visual content, presented at the Multi-Agent Workshops at AAAI 2025.

2025-01-10 • 15 min read

External

Building Ambari-7b: India's First Kannada-English Bilingual LLM

The technical journey of creating a performant bilingual LLM for low-resource languages with limited training data.

Low-Resource Languages

LLM Training

Bilingual Models

Skills

PyTorch

Transformers

PEFT

Bitsandbytes

Diffusers

Hugging Face Ecosystem

NLTK

Scapy

FastAPI

Flask

Django

OpenCV

BeautifulSoup

Selenium

Pandas

Poetry

Langchain

React.js

Next.js

Express

Node.js

Vue.js

Bootstrap

Tailwind

Azure

Azure Machine Learning

AWS

AWS SageMaker

Docker

Kubernetes

Cloudflare

E2E Cloud

Databricks

Azure Data Factory

Apache Spark

Hadoop

Kafka

MongoDB

PostgreSQL

Firebase

Redis

MySQL

Supabase

Pinecone

FAISS

Qdrant

ChromaDB

HTML

CSS

JavaScript

TypeScript

Python

C/C++

SQL


Research Projects

Indic Eval/Leaderboard

Developed an evaluation framework for Indic Large Language Models, accommodating multiple translated benchmarks and a leaderboard around it for comparison.

spaCy

NLTK

Transformers

SkyPilot

Azure

Ambari-7b

India's first Kannada bilingual LLM, built on the Llama 2/3 base models, fine-tuned across multiple stages with 1 billion Kannada tokens and improving tokenization efficiency by 85%.

PyTorch

Transformers

PEFT

DeepSpeed

Azure ML

YoloGemma

Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detection and segmentation.

LLaVA

PaliGemma

FastGPT

Transformers

VARAG

Vision-Augmented Retrieval and Generation: a system integrating textual and visual information, enhancing RAG by 35% and improving contextual precision by 60%.

LLaVA

Visual RAG

LlamaIndex

Qdrant

Mixture of LoRA Experts

A novel architecture that dynamically serves multiple fine-tuned LLMs by swapping LoRA adapters during inference.

PyTorch

BERT

PEFT

Distributed Training

HPC

ViViD

A state-of-the-art Vision-Language model specialized in converting complex PDFs into markdown with high speed and efficiency.

PyTorch

LLaVA

PEFT

Distributed Training

HPC

Other Projects

Cognitune

All-in-one platform for LLMops, featuring distributed data processing, multi-GPU fine-tuning, dynamic evaluation, and one-click high-throughput API deployment.

Python

FastAPI

Transformers

Containerization

Storyblocks

Generate Story Video from a Prompt: Transformed text prompts into dynamic story videos with script generation, synchronized audio, and consistent visual style.

FastAPI

Next.js

Diffusers

MoviePy

Whisper

Marker API

A production-ready server with 400 GitHub ⭐, easily deployable to convert PDFs, Word documents, and other formats into markdown to aid RAG pipelines.

PyTorch

FastAPI

HPC

Docker

Transformers

PyRaft

Python implementation of the Raft consensus algorithm from scratch using FastAPI, achieving a throughput of 50-250 transactions per second.

Python

FastAPI

Raft Consensus

Tokenizer Arena

A friendly arena to easily compare different tokenizers of various LLMs simultaneously, running completely in the browser.

Transformers.js

React

Tailwind CSS

Topic2Dataset

Create high-quality instruction fine-tuning datasets for LLMs by providing a topic or website, allowing massive synthetic data generation.

FastAPI

Next.js

Langchain

GraphScraper

Docker

Education

PES University

2021 - 2025

Bachelor's Degree in Computer Science
