Papers

Research papers and academic contributions. My first paper as first author is coming in 2026.

First Author

HViT: A Hierarchical Vision Transformer Architecture with 3x Better Throughput

Under Review · 2026

A novel hierarchical vision transformer that achieves 3x higher throughput than a standard ViT while maintaining competitive accuracy. It features multi-scale feature extraction, efficient attention mechanisms, and comprehensive ablation studies across ImageNet and domain-specific datasets. It has also been applied to the agricultural domain (HVT-LEAF), with self-supervised pretraining for plant disease detection.

Vision Transformers · Efficient Architectures · Computer Vision · Deep Learning

PDF · View Code

Contributor

VIDUR: Video-based Instruction Dataset for Understanding and Reasoning

Harvard University Research · 2024

Contributed to research on video understanding and reasoning under Prof. Devashree Tripathy at Harvard University.

Video Understanding · Multimodal AI · Reasoning

🔬 More papers coming soon...