Theme 1: Probabilistic Modelling and Inference
Probabilistic modelling and inference (PMI) forms the foundation of my understanding of ML, anchoring my intuitions and broadening my approach to complex problems. My fascination with PMI began with the CS698X course at IIT Kanpur, taught by Prof. Piyush Rai, which was a pivotal experience. I've documented my thoughts on PMI at length here.
- (a) Undergraduate Thesis: My thesis, in collaboration with Shashank Gupta, explored bandits for sequential recommendations. We focused on two key ideas: the suitability of bandits for sequential interaction, which enables online learning, and the efficiency gained by clustering arms to streamline computation. Using a DP-means approach with stick-breaking priors, we bypassed the need to fix the number of clusters in advance, letting clusters emerge naturally. Report | Code
- (b) Mixture of ARMA: This related project asks whether time series can be modeled as a mixture of ARMA processes, clustering them using the EM algorithm. Report | Code
- (c) sslvae: Recently, I implemented SSLVAE, which uses VAEs for label prediction in images, with discrete latent variables modeled by a Concrete distribution. This approach draws on the Gumbel-Softmax trick of Jang, Gu, and Poole and the Concrete distribution of Maddison, Mnih, and Teh. Differentiating through discrete latent variables is a pain, yet I'm thoroughly fascinated by VAEs and the interpretability that discrete LVs offer. Code
- (d) Misc: Exploratory modelling in JAX and PyTorch: Link 1 | Link 2
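The Concrete (Gumbel-Softmax) relaxation mentioned in (c) can be sketched in a few lines. This is a minimal, pure-Python illustration and not the SSLVAE code itself: the function name `gumbel_softmax_sample`, the example class probabilities, and the temperatures are all assumptions chosen for the sketch.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample from a Concrete (Gumbel-Softmax) distribution.

    Hypothetical helper for illustration; not taken from the SSLVAE repository.
    """
    # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
    # u is clamped away from 0 so both logarithms stay finite.
    gumbels = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    # Perturb each logit with its noise term and scale by the temperature.
    scores = [(logit + g) / temperature for logit, g in zip(logits, gumbels)]
    # Numerically stable softmax over the perturbed scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Assumed class probabilities, used only for this example.
logits = [math.log(p) for p in (0.1, 0.6, 0.3)]
soft = gumbel_softmax_sample(logits, temperature=1.0, rng=rng)
sharp = gumbel_softmax_sample(logits, temperature=0.05, rng=rng)
```

Each sample lies on the probability simplex. Lower temperatures push samples toward one-hot vectors, while higher temperatures push them toward uniform. Because the sample is a smooth function of the logits, gradients can flow through it during training.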
Theme 2: Multimodal Combinations
Image and text generation tasks are highly pedagogical. They have been a great entry point for me into Deep Learning. However, I have mixed thoughts about the subfield in its current shape.
- (a) SongTrain: Though not directly tied to image/text, SongTrain marked my first significant project working with data. We analyzed audio frequencies to track users' singing accuracy compared to an original track, providing real-time feedback and scoring. This project introduced me to the possibilities of multimodal data. Our work earned first place in Microsoft's code.fun.do! Video | The Hindu
- (b) Image-Text Fragments: We developed a tool to generate custom image-text combinations from articles, adapting content and style to meet specific user needs. Work published at IUI 2020. Paper
- (c) Temporal Fragments: Expanding on image-text combinations, what if we incorporate a temporal dimension? This allows us to add audio and video elements to the mix. Of course, it doesn't work for all documents! We focused on procedural texts. Work published in WACV 2023. Paper | Video
Theme 3: Systems for Creative Collaboration
Generating images and videos certainly has its functional applications, but I feel it doesn't yet fully capture the needs of a creative's toolbox. What truly excites me is the prospect of co-creation. Just as a chat interface provides an interactive flow in the textual modality, I envision a similarly iterative, back-and-forth process in the image and video modalities. This interaction doesn't have to be limited to a human-agent dynamic; it could very well involve human-human interactions, with an agent in the background, subtly aligning differences. You can read more about my perspectives on RL here.
- (a) Codifying Conflicts in Co-Coloring: We looked at the process of creatives collaborating on a simple line-art coloring task, focusing on the conflicts and pain points that arise. Through user studies, we explored whether participants could effectively collaborate and impart their unique preferences in the final artwork. This project allowed me to work with SVGs, a flexible and often overlooked modality. Work published in CHI LBW 2022. Paper | Poster | Video
- (b) CollabColor: Extending the previous task, we developed CollabColor, an RL-based support system that assists users in finding a shared creative vision during collaboration. Framed as a multi-agent MDP, this project encompassed user studies, behaviour cloning, supervised RL with transformers, extensive evaluations, and more. We found that CollabColor's interventions improved coherence in final colorizations. While the immediate impact is on artistic collaboration, can RL interventions broadly help reconcile local differences toward societal harmony? Perhaps a utopian overreach, but one worth pursuing. Paper Draft
Theme 4: Compressing Data and Models
My experience working on creative support systems highlighted a recurring issue: users struggle with large, slow models that aren't well-suited for production or mobile use. I've always felt that today's models are bloated, impacting UX unnecessarily. This led me to an interest in compression and efficiency. You can read more here and here.
- (a) Post-OCR Error Detection: OCR outputs suffer from common document issues like poor angles, folds, and creases. For Adobe Scan, I developed a lightweight, mobile-friendly model that flags documents likely to yield OCR errors, enabling targeted post-processing. Code
- (b) User-guided Variable-rate Learned Image Compression: Neural nets can compress images, but the process is lossy. I explored an approach where we ask users to mark the regions they consider critical and compress less aggressively in those regions. This enables differential bit allocation, allowing a single model to operate at multiple bit rates. Work published at a CVPR 2022 Workshop. Paper
Theme 5: Tech for Public Service Systems
Alongside ML research, I've been deeply interested in the sociological, historical, and political landscapes of the communities I belong to. My previous research often catered to functional productivity and efficiency, and I wanted to explore public interest technology that primarily serves human-centered values. Low-resource contexts require participatory action and field research with care and dignity as a necessary ethic. I feel these constraints, whether geographical or economic, can foster uniquely creative solutions. You can read more here and here.
- (a) Avanti's Open Source Tools: I played a major role in developing tools for India's public school systems, reaching over 250,000 students across 5,000 schools. These tools enable assessments, attendance tracking, resource access, reporting, and data analysis for thousands of students and teachers daily. Code | Video
- (b) I lead a project applying LLM-based interventions to enhance student memory retention, emphasizing responsible, ethically aligned AI practices in education. More on this soon!
Are Mobile Phones changing Social Networks?
Paper Review of a longitudinal study of core networks in Kerala by Palackal et al. (2011)