stbv's blog

I am a Software Engineer at the non-profit Avanti Fellows, where I develop open source technology for public school systems alongside Pritam Sukumar. I also mentor contributors in the Code for GovTech program, supporting the creation of Digital Public Goods. I deeply aspire to understand how machine learning can help us address critical needs within public systems and to build responsible technology. Previously, I worked as a Research Associate at Adobe Research under the guidance of Balaji Vasan Srinivasan. Before Adobe, I studied Electrical Engineering and Machine Learning at IIT Kanpur, where my thesis advisor was Prof. Piyush Rai. Feel free to reach out to me at: teja.surya59@gmail.com
I recently shared detailed notes on my research taste. Do check this post and let me know if any of the ideas resonate with you.
To emphasize, I have a strong interest in designing models with the following characteristics:
  • 1) formulated and deployed with care for local communities,
  • 2) built with domain-dependent inductive biases,
  • 3) small scale, both in parameter count and training data size,
  • 4) adequately specified while graciously leaving room for critique,
  • 5) controllable and interpretable; likely through discrete latent variables,
  • 6) inferred via (semi)differentiable approximate Bayesian procedures.
That's a lot! Briefly, I'd like to go against the tide of current ML trends and build adequately-sized models with an unwavering commitment to responsible practices.

Outside of work, I enjoy movies and books, not only for the art but also for dissecting the craft. You can check out my Goodreads reviews here. I plan to write more on other books soon. As for movies, I keep my hot takes offline but am happy to share them if you're interested.

I am also keen on exploring cities and their hidden spaces. Based in Mumbai, I'm endlessly intrigued by its urban histories and politics. If I'm not at home, you might find me loitering in the gullies around Sion or Byculla.

Mangli Kanduri?

A first attempt at tracing the origins of a Masjid's name.

Cities Politics

Production World: Part 2

Touring The Trade-Offs and Joys of Avanti's Tech Deployments

Deployment NGO

Notes on Research Taste

Rough notes on research that excites me. Written to help me discover my taste in research.

Notes ML Sociology

Production World: Part 1

Journey of Avanti's Quiz Engine From Local Dev To Production.

Start Python with Colab | Video

I made a quick intro on using Colab for basic data analysis. Beginners in Python may find this useful. Slides

Video Python

Tutorial on Human Evaluation

Tutorial on a workflow involving AMT and GDrive that caters to the collection of human responses for 1000s of images/documents/videos or other media. Code

Tutorial Evaluation

RAIL Fellowship Symposium 2024

A tech worker's perspective on Responsible AI practices. Slides

Slides Development NGO

Intro to Tech at Avanti Fellows

A presentation on Avanti's tech at the Tech4Dev Sprint organized by Donald Lobo and Glific.

Slides Development NGO

The Evolution of AI: From Rules to Prompts

Discussed the modelling gains we made over the past two decades. Some examples of prompting relevant to a broader NGO community.

Slides ML NGO

Reflections on Foundation Models

Trends, properties, criticisms, and immediate uses of large-scale models.

Slides Modelling

Post-OCR Error Detection and Correction | Video

Through the sociopolitical lens of Sivaji: The Boss (2007). Slides

Video Slides Altppt

Intern General Quiz 2021

Quiz hosted by Vinay Aggarwal and me during Adobe Internship 2021.

Quiz Slides

Reflections on OpenAI's CLIP Model

Comparing CLIP's training and properties with the previous class of supervised models.

Slides Modelling

Intro to Collaborative Content Creation

Does collaboration aid in creativity? An initial view of the academic landscape.

Slides Creativity
Over the years, I've been part of diverse projects and I'm very grateful to all collaborators. I've divided them into a few themes for easy access. Although I'm not fully invested in all the themes right now, I'm excited to hear and discuss ideas. For a full list, check my Google Scholar page.

Theme 1: Probabilistic Modelling and Inference

PMI forms the foundation of my understanding in ML, anchoring my intuitions and expanding my approach to complex problems. My fascination with PMI began with the CS698X course at IIT Kanpur, taught by Prof. Piyush Rai, which was a pivotal experience. I've exhaustively documented my thoughts on PMI here.
  • (a) Undergraduate Thesis: My thesis, in collaboration with Shashank Gupta, explored bandits for sequential recommendations. We focused on two key ideas: the power of bandits in sequential interaction, facilitating online learning, and the efficiency of clustering arms to streamline computations. Using a DP-means approach with stick-breaking priors, we bypassed the need for a fixed cluster size, letting clusters emerge naturally. Report | Code
  • (b) Mixture of ARMA: This related project asks whether time series can be modeled as a mixture of ARMA processes, clustering them using the EM algorithm. Report | Code
  • (c) sslvae: Recently, I implemented SSLVAE, leveraging VAEs for label prediction in images with discrete latent variables modeled by a Concrete distribution. This approach draws from the work of Ben Poole et al. and Yee Whye Teh on the Gumbel-Softmax trick. Differentiating through discrete latent variables is a pain, yet I'm thoroughly fascinated by VAEs and the interpretability offered via discrete LVs. Code
  • (d) Misc: Exploratory modelling in JAX and PyTorch: Link 1 | Link 2
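To make the Gumbel-Softmax trick from item (c) concrete, here is a minimal sketch in plain Python. It is not the code from the sslvae repo; the function names `sample_gumbel` and `gumbel_softmax`, the example logits, and the temperatures are all assumptions chosen for illustration. The idea is to add Gumbel(0, 1) noise to the logits and pass the result through a temperature-scaled softmax, which gives a relaxed (continuous) stand-in for a one-hot sample.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via inverse transform sampling."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample (a point on the probability simplex)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Perturb each logit with independent Gumbel noise, then scale.
    perturbed = [(logit + sample_gumbel(rng)) / temperature for logit in logits]
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(perturbed)
    exps = [math.exp(p - peak) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative (made-up) class probabilities expressed as logits.
rng = random.Random(0)
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
soft = gumbel_softmax(logits, temperature=1.0, rng=rng)   # smoother sample
sharp = gumbel_softmax(logits, temperature=0.1, rng=rng)  # closer to one-hot
print(soft)
print(sharp)
```

Lower temperatures push samples toward one-hot vectors, while higher temperatures smooth them out. In a framework with automatic differentiation, the same computation lets gradients flow through the sample to the logits.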

Theme 2: Multimodal Combinations

Image and text generation tasks are highly pedagogical. They have been a great entry point for me into Deep Learning. However, I have mixed thoughts about the subfield in its current shape.
  • (a) SongTrain: Though not directly tied to image/text, SongTrain marked my first significant project working with data. We analyzed audio frequencies to track users' singing accuracy compared to an original track, providing real-time feedback and scoring. This project introduced me to the possibilities of multimodal data. Our work earned first place in Microsoft's code.fun.do! Video | The Hindu.
  • (b) Image-Text Fragments: We developed a tool to generate custom image-text combinations from articles, adapting content and style to meet specific user needs. Work published at IUI 2020. Paper
  • (c) Temporal Fragments: Expanding on image-text combinations, what if we incorporate a temporal dimension? This allows us to add audio and video elements to the mix. Of course, it doesn't work for all documents! We focused on procedural texts. Work published in WACV 2023. Paper | Video

Theme 3: Systems for Creative Collaboration

Generating images and videos certainly has its functional applications, but I feel it doesn't yet fully capture the needs of a creative's toolbox. What truly excites me is the prospect of co-creation. Just as a chat interface provides an interactive flow in textual modality, I envision a similarly iterative, back-and-forth process in image and video modalities. This interaction doesn't have to be limited to a human-agent dynamic; it could very well involve human-human interactions, with an agent in the background, subtly aligning differences. You can read more about my perspectives on RL here.
  • (a) Codifying Conflicts in Co-Coloring: We looked at the process of creatives collaborating on a simple line-art coloring task, focusing on the conflicts and pain points that arise. Through user studies, we explored whether participants could effectively collaborate and impart their unique preferences in the final artwork. This project allowed me to work with SVGs, a flexible and often overlooked modality. Work published in CHI LBW 2022. Paper | Poster | Video
  • (b) CollabColor: Extending the previous task, we developed CollabColor, an RL-based support system that assists users in finding a shared creative vision during collaboration. Framed as a multi-agent MDP, this project encompassed user studies, behaviour cloning, supervised RL with transformers, extensive evaluations, and more. We found that CollabColor's interventions improved coherence in final colorizations. While the immediate impact is on artistic collaboration, can RL interventions broadly help reconcile local differences toward societal harmony? Perhaps a utopian overreach, but one worth pursuing. Paper Draft

Theme 4: Compressing Data and Models

My experience working on creative support systems highlighted a recurring issue: users struggle with large, slow models that aren't well-suited for production or mobile use. I've always felt that today's models are bloated, impacting UX unnecessarily. This led me to an interest in compression and efficiency. You can read more here and here.
  • (a) Post-OCR Error Detection: OCR outputs suffer from common document issues like poor angles, folds, and creases. For Adobe Scan, I developed a lightweight, mobile-friendly model that flags documents likely to yield OCR errors, enabling targeted post-processing. Code
  • (b) User-guided Variable-rate Learned Image Compression: Neural nets can compress images but this process is lossy. I explored an approach where we ask users for their critical regions and compress less aggressively in those regions. This enables differential bit allocation, allowing a single model to operate at multiple bit rates. Work published in CVPR 2022 Workshop. Paper
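As a toy illustration of the "compress less aggressively in critical regions" idea in item (b), the sketch below uses plain scalar quantization with a per-sample step size. This is not the learned compression model from the paper; the function `quantize_with_mask`, the step sizes, and the sample values are all hypothetical. It only shows how a user-provided mask can steer precision, and therefore bits, toward the regions that matter.

```python
def quantize_with_mask(values, critical_mask, fine_step=0.05, coarse_step=0.5):
    """Quantize each value with a fine step where the mask marks it critical,
    and a coarse step elsewhere. A smaller step means more levels, so more
    bits are spent there and the reconstruction error is lower."""
    if len(values) != len(critical_mask):
        raise ValueError("values and critical_mask must have the same length")
    reconstructed = []
    for value, is_critical in zip(values, critical_mask):
        step = fine_step if is_critical else coarse_step
        # Snap to the nearest multiple of the step; error is at most step / 2.
        reconstructed.append(round(value / step) * step)
    return reconstructed


# Hypothetical pixel intensities and a user-marked critical region.
pixels = [0.12, 0.47, 0.81, 0.33]
mask = [True, False, True, False]
decoded = quantize_with_mask(pixels, mask)
errors = [abs(p - d) for p, d in zip(pixels, decoded)]
print(decoded)
print(errors)
```

Changing the mask or the two step sizes changes the overall bit budget without changing the procedure, which is loosely analogous to a single model operating at multiple bit rates.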

Theme 5: Tech for Public Service Systems

Alongside ML research, I've been deeply interested in the sociological, historical, and political landscapes of the communities I belong to. My previous research often catered to functional productivity and efficiency, and I wanted to explore public interest technology that primarily serves human-centered values. Low-resource contexts require participatory action and field research with care and dignity as a necessary ethic. I feel these constraints, whether geographical or economic, can foster uniquely creative solutions. You can read more here and here.
  • (a) Avanti's Open Source Tools: I played a major role in developing tools for India's public school systems, reaching over 250,000 students across 5,000 schools. These tools enable assessments, attendance tracking, resource access, reporting, and data analysis for thousands of students and teachers daily. Code | Video
  • (b) I lead a project applying LLM-based interventions to enhance student memory retention, emphasizing responsible, ethically-aligned AI practices in education. More on this soon!
I maintain multiple public repositories at Avanti Fellows, focusing on projects such as the Quiz Engine, Portal, Reporting Engine, Session Manager, and College Predictor. These repos have a variety of open issues, many of which are beginner-friendly. Please consider contributing! You can reach out to me on Avanti's Discord. My discord id is suryabulusu.
I've listed some personal projects outside of Avanti here, primarily related to programming and ML:

sslvae: Semi Supervised Learning with discrete VAE

Uses the Concrete distribution and the Gumbel-Softmax trick. Also (unofficially) implemented Variational Contrastive Divergence following Ruiz (2019)'s work.

Advent of Code 2021

I had a lot of fun participating in AOC 2021. My main aim was to learn Python's typing and class structuring. I wrote some takeaways on each question in the README section.

Solutions to CSES Problems

CSES has concise descriptions of common algorithms. My goal wasn't necessarily to excel in competitive programming but rather to deepen my familiarity with Python.

Mixture of ARMA: Time Series clustering

EM Algorithm for soft-clustering of Time Series data. Chooses the best model using BIC.

Code for CS698X: Probabilistic Modelling and Inference

Bayesian Lin. Reg., MCMC, Online EM, etc. Course taught by Prof. Piyush Rai.
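For the Mixture of ARMA project listed above, "chooses the best model using BIC" can be sketched as follows. This is a minimal illustration, not the repo's code: the helper names `bic` and `select_by_bic` are hypothetical, and the log-likelihoods and parameter counts are made-up numbers standing in for fitted mixtures with different numbers of components. It uses the standard definition BIC = k ln(n) - 2 ln(L), where lower is better.

```python
import math


def bic(log_likelihood, num_params, num_points):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    if num_points <= 0:
        raise ValueError("num_points must be positive")
    return num_params * math.log(num_points) - 2.0 * log_likelihood


def select_by_bic(candidates, num_points):
    """Pick the candidate with the lowest BIC.

    candidates maps a model name to (log_likelihood, num_params).
    Returns (best_name, scores) where scores maps each name to its BIC.
    """
    scores = {
        name: bic(log_lik, k, num_points)
        for name, (log_lik, k) in candidates.items()
    }
    best_name = min(scores, key=scores.get)
    return best_name, scores


# Made-up fits: more components raise the likelihood but add parameters.
candidates = {
    "K=1": (-520.0, 3),
    "K=2": (-480.0, 7),
    "K=3": (-478.0, 11),
}
best, scores = select_by_bic(candidates, num_points=200)
print(best)
print(scores)
```

The penalty term k ln(n) grows with the number of parameters, so a larger mixture is only preferred when its gain in likelihood outweighs the extra complexity.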

Last Updated: 5th Nov. 2024 | You should make a now page too!

Reading

  • (a) A History of the Bible, by John Barton. Two chapters in, it's already a masterpiece in elucidation. Barton is extremely compassionate.
  • (b) Recoding America, by Jennifer Pahlka.

Writing - Lots to write!

  • (a) An essay on Hyderabad. Expanding on my review of Prof. Afsar Mohammad's book.
  • (b) An essay on my interactions with Anki over the past five years. Did Anki improve my memory retention? Did Anki make me more creative?
  • (c) An essay on my experiences in the non-profit sector.

Work and Research

  • (a) I've finally updated my website once again after 5 years!
  • (b) Building an LLM-based intervention for memory retention in Avanti's Quiz Engine.