Research Interests

I am broadly interested in machine learning, especially the algorithmic and computational issues that arise when machine learning is used in practice. During my time at Microsoft Research, I have had the chance to apply machine learning to real-world scenarios, identifying research problems from the applications and iterating on this process:

  • Extreme Classification for Bing Ad Recommendations
    In this project, I worked on improving Bing Ad Recommendations using Extreme Classification. We started off by applying the then state-of-the-art algorithm PfastreXML and shipped our work in production. We later improved on this by a large margin using a powerful ensemble of Extreme Classification algorithms: PfastreXML, SwiftXML (accepted at WSDM, 2018) and Parabel (in submission to WWW, 2018).
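The score-combination step of such an ensemble can be sketched as below. This is a minimal illustration under the assumption of a simple weighted sum of score matrices; the actual ensembling used in production is not described here, and all function names are hypothetical.

```python
import numpy as np

def ensemble_scores(score_matrices, weights):
    """Combine per-label score matrices from several extreme classifiers.

    score_matrices: list of (n_points, n_labels) arrays, one per classifier,
    assumed to already be normalised to a comparable range.
    weights: one non-negative weight per classifier.
    """
    combined = np.zeros_like(score_matrices[0], dtype=float)
    for scores, w in zip(score_matrices, weights):
        combined += w * scores
    return combined

def top_k_labels(scores, k):
    """Return indices of the k highest-scoring labels per data point."""
    return np.argsort(-scores, axis=1)[:, :k]
```

In practice each component classifier's scores would need calibration before such a combination, which is exactly the score-range issue discussed below.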

  • Research Problems in Extreme Classification
    While applying Extreme Classification in products, we came across many research problems in this domain.

    • In Extreme Classification, binary label weights have traditionally been used, but these do not yield the desired results in applications; this forced us to explore many labelling strategies before settling on point-wise mutual weights.
    • Extreme Classification algorithms do not output scores in the same range for all labels, so choosing good recommendations via heuristics such as top-k or a global threshold is not optimal. We came up with a novel formulation to determine the number of recommendations to pick per page, as popular pages may have more relevant labels than tail pages.
    • Traditionally, Extreme Classification algorithms have used feature information only from the user side, ignoring the rich information available on the label side. In SwiftXML, we explore the use of label features and build a classifier similar in scale to PfastreXML but better in performance.
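To illustrate why a global top-k or global threshold is suboptimal, the sketch below picks a per-page cutoff relative to that page's own strongest score. This is an illustrative heuristic only, not the formulation developed in the project, and the parameter names are hypothetical.

```python
import numpy as np

def adaptive_cutoff(scores, rel_threshold=0.5, max_k=10):
    """Pick a per-page number of recommendations instead of a global top-k.

    Keep labels whose score is within rel_threshold of the page's own top
    score, so a popular page with many strong labels keeps more
    recommendations than a tail page with only one or two.
    """
    order = np.argsort(-scores)          # labels in descending score order
    top = scores[order[0]]               # this page's best score
    return [i for i in order[:max_k] if scores[i] >= rel_threshold * top]
```

Under this heuristic a page with scores (0.9, 0.8, 0.2, 0.1) keeps two recommendations, while a global top-3 would also surface the weak third label.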

  • Large Scale Optimization
    I am interested in tackling the large-scale optimization problems that appear in many machine learning settings. For example, while working on Extreme Classification, I encountered the following optimization problems:

    • Learning a regressor to choose the best recommendations for a page
    • Learning a per-query linear separator by optimizing an l2-regularized custom loss function
    • Learning a single classifier to separate the data points in a node, based on NDCG loss, while training PfastreXML trees
    • Learning classifiers to separate the data points in a node, based on NDCG loss, while training SwiftXML trees
    • Learning a one-vs-all separator for each query
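As a minimal sketch of the second kind of problem, the code below fits a linear separator by gradient descent on an l2-regularized logistic loss. The logistic loss is a stand-in assumption; the custom loss actually used per query is not specified here.

```python
import numpy as np

def train_l2_linear(X, y, lam=0.1, lr=0.1, epochs=200):
    """Gradient descent on an l2-regularized logistic loss.

    A stand-in for the per-query linear separators mentioned above;
    the project's actual custom loss may differ.
    X: (n, d) feature matrix; y: labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        # gradient of the mean logistic loss plus the l2 penalty
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n + lam * w
        w -= lr * grad
    return w
```

The same gradient-descent skeleton extends to the node-splitting classifiers in tree-based methods, where the loss being optimized is tied to a ranking measure such as NDCG rather than classification accuracy.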