About
Building AI that actually works! Currently deep into Vision-Language Models and Agentic Systems, with hands-on experience taking AI projects from wild ideas to real products. Love tinkering with model fine-tuning and cloud deployments. Big open-source enthusiast - you'll find me contributing to projects that make AI more accessible to everyone.
Academic Publications
Research papers and academic contributions
Nayana: A Foundation for Document-Centric Vision-Language Models via Multi-Task, Multimodal, and Multilingual Data Synthesis
ViViD - Vision Language model for Unified Visual Understanding of Documents
A vision-language model specifically optimized for document understanding tasks, capable of processing diverse document formats with high accuracy.
Nayana OCR: A Scalable Framework for Document OCR in Low-Resource Languages
Achievements & News
Latest updates, recognitions, and highlights
Omniparse Hits 6500 Stars on GitHub
Omniparse, our open-source document parsing library, has reached 6500 stars on GitHub, making it one of the most popular libraries for document processing.
Awarded LLaMA Impact Grant by Meta AI
Cognitivelab was selected as one of the recipients of Meta's LLaMA Impact Grant for our work on extending large language models to under-resourced Indic languages.
Research Projects
Indic Eval/Leaderboard
Developed an evaluation framework for Indic large language models that supports multiple translated benchmarks, along with a leaderboard for comparing models.
Ambari-7b
India's first bilingual Kannada LLM, built on the Llama 2/3 base models and fine-tuned in multiple stages on 1 billion Kannada tokens, improving tokenization efficiency by 85%.
YoloGemma
Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detection and segmentation.
VARAG
Vision-Augmented Retrieval and Generation: a system that integrates textual and visual information, enhancing RAG by 35% and improving contextual precision by 60%.
Mixture of LoRA Experts
A novel architecture for dynamically serving multiple fine-tuned LLMs by swapping LoRA adapters during inference.
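The adapter-swapping idea can be illustrated with a minimal sketch. This is not the project's code: it assumes the standard LoRA update y = Wx + (alpha/r)·B(Ax), keeps one frozen base weight W shared by every fine-tune, and selects an adapter per request by name. `LoRAAdapter`, `AdapterServer`, and the toy 2×2 weights are hypothetical; a real server would batch requests and run on a GPU.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    # Hypothetical container for one fine-tune's low-rank factors.
    A: Matrix     # r x d_in  (down-projection)
    B: Matrix     # d_out x r (up-projection)
    alpha: float  # LoRA scaling hyperparameter

    @property
    def scale(self) -> float:
        return self.alpha / len(self.A)  # alpha / r


class AdapterServer:
    """Hypothetical server: one frozen base weight, many swappable adapters."""

    def __init__(self, base_weight: Matrix) -> None:
        self.W = base_weight  # never modified, shared by all adapters
        self.adapters: Dict[str, LoRAAdapter] = {}

    def register(self, name: str, adapter: LoRAAdapter) -> None:
        self.adapters[name] = adapter

    def forward(self, x: Vector, adapter_name: Optional[str] = None) -> Vector:
        y = matvec(self.W, x)  # shared base-model path
        if adapter_name is None:
            return y
        ad = self.adapters[adapter_name]
        delta = matvec(ad.B, matvec(ad.A, x))  # low-rank update B(Ax)
        return [yi + ad.scale * di for yi, di in zip(y, delta)]


# Toy example: identity base weight plus two rank-1 adapters.
server = AdapterServer([[1.0, 0.0], [0.0, 1.0]])
server.register("math", LoRAAdapter(A=[[1.0, 0.0]], B=[[0.0], [1.0]], alpha=2.0))
server.register("code", LoRAAdapter(A=[[0.0, 1.0]], B=[[1.0], [0.0]], alpha=1.0))
x = [1.0, 2.0]
print(server.forward(x))          # [1.0, 2.0]  (base model only)
print(server.forward(x, "math"))  # [1.0, 4.0]
print(server.forward(x, "code"))  # [3.0, 2.0]
```

Because the base weight is never modified, switching fine-tunes costs only a dictionary lookup. Merging BA into W instead is cheaper per request, but the merge has to be undone before a different adapter can be served.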
ViViD
A state-of-the-art Vision-Language model specialized in converting complex PDFs into markdown with high speed and efficiency.
Other Projects
Cognitune
All-in-one LLMOps platform featuring distributed data processing, multi-GPU fine-tuning, dynamic evaluation, and one-click, high-throughput API deployment.
Storyblocks
Generate Story Video from a Prompt: transforms text prompts into dynamic story videos with script generation, synchronized audio, and a consistent visual style.
Marker API
A production-ready, easily deployable server (400 GitHub ⭐) that converts PDFs, Word documents, and other formats into Markdown for RAG pipelines.
PyRaft
A from-scratch Python implementation of the Raft consensus algorithm using FastAPI, achieving a throughput of 50-250 transactions per second.
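PyRaft's source isn't reproduced here, but one core piece of Raft can be sketched: the leader's rule for advancing its commit index, as stated in the Raft paper. A leader may commit the highest index N above its current commit index that is stored on a majority of servers and whose entry belongs to the leader's current term. The function and parameter names below are hypothetical, and the FastAPI networking layer is omitted.

```python
from typing import Dict, List


def advance_commit_index(
    current_term: int,
    commit_index: int,
    log_terms: List[int],
    match_index: Dict[str, int],
) -> int:
    """Return the leader's new commit index under Raft's commit rule.

    log_terms[i] is the term of the entry at log index i + 1 (Raft indexes
    from 1). match_index maps each follower to the highest index known to be
    replicated on it. The leader always holds its own entries.
    """
    cluster_size = len(match_index) + 1  # followers plus the leader
    # Scan from the newest entry down so the highest qualifying N wins.
    for n in range(len(log_terms), commit_index, -1):
        if log_terms[n - 1] != current_term:
            continue  # only current-term entries are committed by counting
        replicas = 1 + sum(1 for m in match_index.values() if m >= n)
        if 2 * replicas > cluster_size:  # strict majority
            return n
    return commit_index


# Five-server cluster: the leader plus followers b..e, leader in term 2.
print(advance_commit_index(2, 1, [1, 1, 2, 2, 2], {"b": 5, "c": 4, "d": 2, "e": 1}))  # 4
# Three-server cluster, leader in term 3: older-term entries are not committed.
print(advance_commit_index(3, 0, [1, 1, 2], {"b": 3, "c": 3}))  # 0
```

The current-term check is what stops a new leader from committing an older-term entry just by counting replicas; such entries become committed indirectly once a later entry from the current term commits.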
Tokenizer Arena
A friendly arena to easily compare different tokenizers of various LLMs simultaneously, running completely in the browser.
Topic2Dataset
Creates high-quality instruction fine-tuning datasets for LLMs from a topic or website, enabling large-scale synthetic data generation.