Research Statement
My academic research focuses on practical and provable algorithms dealing with large amounts of data. Topics include vector search, streaming algorithms, randomized linear algebra, machine learning, coresets, clustering, data mining and more. I published more than 70 papers including best-paper awards at SODA (2011), KDD (2013), and PODS (2021), plus a 2022 ACM SIGMOD Research Highlight.
The publication list below is organized by research area. Within each area, papers are listed newest first, and conference and journal versions of the same work are listed together.
Academic Service
I served on academic review committees for conferences and journals including NeurIPS, SIGIR, AISTATS, ICML, KDD, WSDM, WWW, COLT, ESA, SODA, and FOCS.
Highlights
-
Nearly Optimal Attention Coresets
Edo Liberty, Alexandr Andoni, Eldar Kleiner
tl;dr: This new paper nearly resolves a critical step in KV-caches compression, allowing model serving of larger contexts or more sessions on the same infrastructure. It proves any KV cache (K,V) contains a subset (K',V') of size roughly O(sqrt(d)e^(r)/eps)) such that ||Attn(q,K,V) - Attn(q,K',V')|| < eps for all queries ||q|| < r. [bib]
-
Amazon SageMaker Elastic Algorithms
Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado Mangas, Amir Sadoughi, Yury Astashonok, Piali Das, Can Balioglu, Saswata Chakravarty, Madhav Jha, Philip Gautier, Tim Januschowski, Valentin Flunkert, David Arpin, and Alex Smola.
SIGMOD 2020
tl;dr: The culmination of more than two years of work, this paper describes the algorithms and distributed architecture behind Amazon SageMaker's elastic ML algorithms. [bib]
-
Optimal Quantile Approximation in Streams
Zohar Karnin, Kevin Lang, Edo Liberty
FOCS 2016
tl;dr: This paper describes the KLL algorithm. It resolves one of the longest-standing and basic problems in the streaming algorithms literature. Namely, optimally approximating ranks and quantiles in streaming data. It is implemented in many database systems including BigQuery and Apache DataSketches. [slides] [Python code] [bib]
-
Relative Error Streaming Quantiles
Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý
PODS 2021, Best paper award - 2022 ACM SIGMOD Research Highlight Award
tl;dr: This (finally) solves a problem I wanted to solve for years. Namely, how to efficiently sketch quantiles with relative errors. This is critical for large scale performance monitoring, for example. [bib]
-
Simple and Deterministic Matrix Sketches
Edo Liberty
KDD 2013, Best paper award
tl;dr: This paper introduced frequent-directions (FD), an incredibly simple, practically efficient, and theoretically optimal algorithm for approximating the covariance of vector streams. The followup work Even Simpler Deterministic Matrix Sketching later simplified the proof to a single line. Frequent-directions is implemented in many AI platforms, typically for approximating the Hessian. It is also widely taught in CS classes. [slides], [talk], [bib].
See Frequent-Directions in python and experiments repo that Mina Ghashami and I made public.
-
Threading Machine Generated Email
Nir Ailon, Zohar Karnin, Edo Liberty, Yoelle Maarek
Best paper award TechPulse 2012, and WSDM 2013
tl;dr: the paper shows how to use sketches to find causality relations between billions of events using trillions of observations. [bib]
-
An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform
Nir Ailon, Edo Liberty
SODA 2011, Best paper award
tl;dr: The main result of my PhD work on fast dimension reduction. Specifically, matching the optimal target dimension and running time of the Johnson-Lindenstrauss lemma with fast random Hadamard based projection algorithms. Random Hadamard transformations are, by now, a workhorse of vector databases, quantization, and model compression. [bib]
-
Discrepancy, Coresets, and Sketches in Machine Learning
Zohar Karnin and Edo Liberty
COLT 2019
tl;dr: This ML-theory paper shows that many types of machine learning models have much smaller coresets than those previously known. As a special case of the general result, it resolves the open problem regarding the coreset complexity of Gaussian density estimation. [bib]
Vector Search
-
Accurate and Efficient Metadata Filtering in Pinecone's Serverless Vector Database
Amir Ingber, Edo Liberty
1st Workshop on Vector Databases (VecDB) at ICML 2025 [abstract] [bib]
-
Results of the Big ANN: NeurIPS'23 Competition
Harsha Vardhan Simhadri, Martin Aumüller, Amir Ingber, Matthijs Douze, George Williams, Magdalen Dobson Manohar, Dmitry Baranchuk, Edo Liberty, and others
NeurIPS 2023 Competition Track [bib]
-
Bridging Dense and Sparse Maximum Inner Product Search
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
ACM Transactions on Information Systems (TOIS) 2024 [bib]
-
An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
ACM Transactions on Information Systems (TOIS) 2024 [bib]
-
Projective Clustering Product Quantization
Aditya Krishnan, Edo Liberty
arXiv 2021 [bib]
-
Asymmetric Random Projections
Nicholas Ryder, Zohar Karnin, and Edo Liberty
arXiv 2019 [bib]
-
An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform
Nir Ailon, Edo Liberty
Best paper at SODA 2011; ACM Transactions on Algorithms (TALG) 2013 [bib] [journal bib]
-
Dense Fast Random Projections and Lean Walsh Transforms
Edo Liberty, Nir Ailon, Amit Singer
RANDOM 2008; Discrete & Computational Geometry (DCG) 2010 [bib] [journal bib]
-
Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes
Nir Ailon, Edo Liberty
SODA 2008; Discrete & Computational Geometry (DCG) 2009 [bib] [journal bib]
Streaming Data Algorithms
-
Even Simpler Deterministic Matrix Sketching
Edo Liberty
arXiv 2022 [bib]
-
Simple and Deterministic Matrix Sketches
Edo Liberty (introduces frequent-directions; see slides, talk, and the python repo)
Best paper at KDD 2013 [bib]
-
Frequent Directions: Simple and Deterministic Matrix Sketching
Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff
SIAM Journal on Computing 2016 [bib]
-
Relative Error Streaming Quantiles
Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý
Best paper at PODS 2021 (2022 ACM SIGMOD Research Highlight Award); Journal of the ACM (JACM) 2023 [bib] [journal bib]
-
Streaming Quantiles Algorithms with Small Space and Update Time
Nikita Ivkin, Edo Liberty, Kevin Lang, Zohar Karnin, Vladimir Braverman
Sensors 2022 [bib]
-
A High-Performance Algorithm for Identifying Frequent Items in Data Streams
Daniel Anderson, Pryce Bevan, Kevin Lang, Edo Liberty, Lee Rhodes, Justin Thaler
IMC 2017 [bib]
-
Optimal Quantile Approximation in Streams
Zohar Karnin, Kevin Lang, Edo Liberty
-
Space Lower Bounds for Itemset Frequency Sketches
Edo Liberty, Michael Mitzenmacher, Justin Thaler, Jonathan Ullman
PODS 2016 [bib]
Randomized Linear Algebra
-
Efficient Frequent Directions Algorithm for Sparse Matrices
Mina Ghashami, Edo Liberty, Jeff M. Phillips
KDD 2016 [bib]
-
A Short Proof for Gap Independence of Simultaneous Iteration
Edo Liberty
arXiv 2016 [bib]
-
Online PCA with Spectral Bounds
Zohar Karnin, Edo Liberty
-
Online Principal Component Analysis
Christos Boutsidis, Dan Garber, Zohar Karnin, Edo Liberty
SODA 2014 [bib]
-
Near-Optimal Entrywise Sampling for Data Matrices
Dimitris Achlioptas, Zohar Karnin, Edo Liberty
NeurIPS 2013 [bib]
-
The Mailman Algorithm: a Note on Matrix-Vector Multiplication
Edo Liberty, Steven Zucker
Information Processing Letters (IPL) 2009 [bib]
-
A Fast Randomized Algorithm for the Approximation of Matrices
Edo Liberty, Franco Woolfe, Vladimir Rokhlin, and Mark Tygert
Applied and Computational Harmonic Analysis (ACHA) 2008 [bib]
-
Randomized Algorithms for the Low-Rank Approximation of Matrices
Edo Liberty, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert
PNAS 2007 (Proceedings of the National Academy of Sciences) [bib]
Coresets & Clustering
-
Nearly Optimal Attention Coresets
Edo Liberty, Alexandr Andoni, Eldar Kleiner
arXiv 2026 [bib]
-
Discrepancy, Coresets, and Sketches in Machine Learning
Zohar Karnin and Edo Liberty
COLT 2019 [bib]
-
Greedy Minimization of Weakly Supermodular Set Functions
Edo Liberty, Maxim Sviridenko
-
An Algorithm for Online K-Means Clustering
Edo Liberty, Ram Sriharsha, Maxim Sviridenko
ALENEX 2016 [bib]
-
Unsupervised SVMs: On the Complexity of the Furthest Hyperplane Problem
Zohar Karnin, Edo Liberty, Shachar Lovett, Roy Schwartz, and Omri Weinstein
COLT 2012; JMLR 2012 (Journal of Machine Learning Research) [slides] [bib]
-
Improved Approximation Algorithms for Bipartite Correlation Clustering
Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, Anke van Zuylen
ESA 2011; SIAM Journal on Computing 2012 [slides] [bib] [journal bib]
-
Similarity Kernels via Bi-clustering for Conventional Intergovernmental Organizations
Minh Tam Le, John Sweeney, Edo Liberty, Steven W. Zucker
IEEE ISI 2010 [bib]
-
Correlation Clustering Revisited: The "True" Cost of Error Minimization Problems
Nir Ailon, Edo Liberty
ICALP 2009 [bib]
Data Mining
-
From the lab to production: A case study of session-based recommendations in the home-improvement domain
Pigi Kouki, Ilias Fountalis, Nikolaos Vasiloglou, Xiquan Cui, Edo Liberty, Khalifeh Al Jadda
RecSys 2020 [bib]
-
Threading Machine Generated Email
Nir Ailon, Zohar Karnin, Edo Liberty, Yoelle Maarek
Best paper at TechPulse 2012 and WSDM 2013 [bib]
-
Framework and Algorithms for Network Bucket Testing
Liran Katzir, Edo Liberty, and Oren Somekh
WWW 2012 [bib]
-
Automatically Tagging Email by Leveraging Other Users' Folders
Yehuda Koren, Edo Liberty, Yoelle Maarek, and Roman Sandler
KDD 2011 [bib]
-
Estimating Sizes of Social Networks via Biased Sampling
Liran Katzir, Edo Liberty, Oren Somekh, Ioana A. Cosma
WWW 2011; Journal of Internet Mathematics [bib] [journal bib]
ML Systems
-
SIGIR 2025 — LiveRAG Challenge Report
David Carmel, Simone Filice, Guy Horowitz, Yoelle Maarek, Oren Somekh, Ran Tavory, Mehdi Ghissassi, Edo Liberty, Roy Miara
SIGIR 2025 [bib]
-
Amazon SageMaker Elastic Algorithms
Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado Mangas, Amir Sadoughi, Yury Astashonok, Piali Das, Can Balioglu, Saswata Chakravarty, Madhav Jha, Philip Gautier, Tim Januschowski, Valentin Flunkert, David Arpin, and Alex Smola.
SIGMOD 2020 [bib]
-
ProxQuant: Quantized Neural Networks via Proximal Operators
Yu Bai, Yu-Xiang Wang, Edo Liberty
ICLR 2019 [bib]
-
Stratified Sampling meets Machine Learning
Kevin Lang, Edo Liberty, Konstantin Shmakov
-
Inverted Index Compression via Online Document Routing
Gal Lavee, Ronny Lempel, Edo Liberty, and Oren Somekh
WWW 2011 [bib]
Thesis & Other
-
Accelerated Dense Random Projections
PhD Thesis. See also Talk slides
-
Electrons and Phonons on the Square Fibonacci Tiling
Roni Ilan, Edo Liberty, Shahar Even-Dar Mandel, and Ron Lifshitz.
Ferroelectrics 2004.