Projects
I've divided my projects into a few themes for easy access. Although I'm not fully invested in all of them right now, I'm excited to hear and discuss ideas. For a full list, check my Google Scholar page.
Theme 1 · Probabilistic Modelling and Inference
Probabilistic modelling and inference (PMI) forms the foundation of my understanding of ML, anchoring my intuitions and expanding my approach to complex problems. The CS698X course at IIT Kanpur, taught by Prof. Piyush Rai, was pivotal, and I've documented many of these thoughts in Notes on Research Taste.
- Undergraduate thesis. With Shashank Gupta, I explored bandits for sequential recommendations, leaning on DP-means clustering with stick-breaking priors to avoid fixing the number of clusters. Report · Code
- Mixture of ARMA. Can cohorts of time series be modelled as mixtures of ARMA processes? We tried an EM-based approach. Report · Code
- sslvae. Discrete latent variables in VAEs with Concrete distributions for semi-supervised image labelling, inspired by work from Ben Poole et al. and Yee Whye Teh. Differentiating through discrete latents is a pain, yet I'm thoroughly fascinated by VAEs and the interpretability that discrete latent variables offer. Code
- Exploratory modelling. Experiments in JAX and PyTorch. Link 1 · Link 2
Theme 2 · Multimodal Combinations
Image and text generation tasks are deeply pedagogical and were my entry point into deep learning, though I have mixed thoughts about the field's current incentives.
- SongTrain. My first significant project working with data: it gives real-time feedback on singing accuracy via audio frequency analysis. This project won Microsoft's code.fun.do! Video · The Hindu
- Image-Text Fragments. We developed a tool to generate custom image-text combinations from articles, adapting content and style to meet specific user needs. Presented at IUI 2020. Paper
- Temporal Fragments. Expanding on image-text combinations, what if we incorporate a temporal dimension? This allows us to add audio and video elements to the mix. It doesn't work for all documents, so we focused on procedural texts. Published at WACV 2023. Paper · Video
Theme 3 · Systems for Creative Collaboration
Generating images and videos certainly has its functional applications, but I feel it doesn't yet fully capture the needs of a creative's toolbox. What truly excites me is the prospect of co-creation. Just as a chat interface provides an interactive flow in the textual modality, I envision a similarly iterative, back-and-forth process in the image and video modalities. This interaction doesn't have to be limited to a human-agent dynamic; it could just as well involve human-human interactions, with an agent in the background subtly reconciling differences. See my thoughts on reinforcement learning here.
- Codifying Conflicts in Co-Coloring. We looked at the process of creatives collaborating on a simple line-art coloring task, focusing on the conflicts and pain points that arise. Through user studies, we explored whether participants could effectively collaborate and impart their unique preferences in the final artwork. This project allowed me to work with SVGs, a flexible and often overlooked modality. Work published in CHI LBW 2022. Paper · Poster · Video
- CollabColor. Extending the previous task, we developed CollabColor, an RL-based support system that assists users in finding a shared creative vision during collaboration. Framed as a multi-agent MDP, this project encompassed user studies, behaviour cloning, supervised RL with transformers, extensive evaluations, and more. We found that CollabColor's interventions improved coherence in final colorizations. While the immediate impact is on artistic collaboration, can RL interventions broadly help reconcile local differences toward societal harmony? Perhaps a utopian overreach, but one worth pursuing. Paper Draft
Theme 4 · Compressing Data and Models
Creative tools highlighted how unwieldy large models can be. I'm interested in making them lighter without compromising utility. More thoughts in this section and here.
- Post-OCR Error Detection. OCR outputs suffer from common document issues like poor angles, folds, and creases. For Adobe Scan, I developed a lightweight, mobile-friendly model that flags documents likely to yield OCR errors, enabling targeted post-processing. Code
- User-guided Variable-rate Learned Compression. Neural nets can compress images, but the process is lossy. I explored an approach where we ask users to mark the regions critical to them and compress less aggressively there. This enables differential bit allocation, allowing a single model to operate at multiple bit rates. Published at a CVPR 2022 workshop. Paper
Theme 5 · Tech for Public Service Systems
Alongside ML research, I'm drawn to the sociological, historical, and political landscapes I inhabit. My previous research often catered to functional productivity and efficiency, and I wanted to explore public interest technology that primarily serves human-centered values. Low-resource contexts require participatory action and field research with care and dignity as a necessary ethic. I feel these constraints, whether geographical or economic, can foster uniquely creative solutions. Related notes live in ICTD and Sociology & Caste.
- Avanti's Open Source Tools. I played a major role in developing tools for India's public school systems, reaching over 250,000 students across 5,000 schools. These tools enable assessments, attendance tracking, resource access, reporting, and data analysis for thousands of students and teachers daily. Code · Video
- Memory interventions. I'm leading an LLM-based intervention to improve retention within Avanti's Quiz Engine, emphasizing responsible AI practices in education.