About
I spent much of my career in software companies as an applied scientist. Most of that work is proprietary: prototyping, benchmarking, data pipelines, model training, and so on. The public artifacts are mainly patents and the occasional open-source project. They are listed below by company (most recent first), with a note on the kind of infrastructure I worked on at each.
Pinecone
Building Pinecone's vector database - low-cost, serverless, multi-tenant, large-scale, low-latency vector search.
Open source and benchmarks
-
Big ANN Benchmarks - a benchmark and competition for billion-scale approximate nearest-neighbor search, pushing the state of the art in vector-search algorithms and systems.
-
vq-bench - a benchmark for vector quantization (coming soon).
AWS
Built the algorithms and distributed systems behind Amazon SageMaker - AWS's platform for training and serving machine learning models at scale.
See the paper Amazon SageMaker Elastic Algorithms (SIGMOD 2020) on the Research page.
Patents
-
System and Method for Experimentation and Deployment of Machine Learning Models on Cloud Based Platforms
Edo Liberty, Stefano Stefani, Alexander Smola, Craig Wiley, Steve Loeppky, Tom Faulhaber, Swami Sivasubramanian, Zohar Karnin. US 11,257,002
-
Method for post-training Hyperparameter Tuning by training Machine Learning States
Edo Liberty, Zohar Karnin. US 12,475,406
-
Autoscaling of Training Machine Learning Jobs on Cloud Infrastructures
Edo Liberty, Stefano Stefani, Swami Sivasubramanian, Zohar Karnin, Tom Faulhaber, Alexander Smola, Craig Wiley, Amir Sadoughi, Dayanand Rangegowda. US 12,277,480
-
A system for autoscaling and hosting of ML Models for production inference
Edo Liberty, Stefano Stefani, Steve Loeppky, Craig Wiley, Tom Faulhaber. US 11,126,927
-
Online training with delayed feedback with applications to bandwidth-efficient communication over networks
Edo Liberty, Madhav Jha. US 10,839,809
-
System Architecture for Container Based Large Scale Machine Learning Platforms
Stefano Stefani, Craig Wiley, Thomas Faulhaber, Alexander Smola, Steven Loeppky, Richard Bice, Edo Liberty, Swaminathan Sivasubramanian, Charles Swan, Taylor Goodhart. US 12,045,693
-
Method and Systems for Optimal Graph Synchronization for Distributed Machine Learning
Mu Li, Edo Liberty, Alexander Smola, Leyuan Wang. US 11,176,489
-
Machine Learning model-assisted real-time enhancement of audio/video over a network call to significantly lower bandwidth requirements
Madhav Jha, Edo Liberty. US 10,482,887
-
Training machine learning models for physical agents and robotic controls with simulations
S. Genc, E. Liberty. US 10,824,913
-
Machine Learning system to remove accent from spoken speech
Edo Liberty, Leo Dirac. US 10,163,451
Yahoo
Built horizontal machine-learning platforms and the streaming-data systems that powered Yahoo's products, from advertising to mail.
Open source
-
Apache DataSketches is a widely used open-source implementation of streaming algorithms for sketching and summarizing data, such as counting distinct items (e.g., HLL), finding frequent items (a.k.a. top-k), and estimating streaming quantiles. It is used by Druid, Spark, Yahoo, AWS, Google, and many others.
Patents
-
Generalized Stratified Sampling
Kevin Lang, Edo Liberty, Konstantin Shmakov
-
On-line content sampling
Kevin Lang, Edo Liberty, Konstantin Shmakov. US 10,685,066
-
Classifying man versus machine generated email
Zohar Karnin, Guy Halawi, David Wajc, Edo Liberty. US 10,778,618
-
A System for Email sequence identification
Edo Liberty, Zohar Karnin, Yoelle Maarek, Natalie Aizenberg
-
Sponsored Apps Marketplace in eMail
Ronny Lempel, Yoelle Maarek, Edward Bortnikov, Edo Liberty. US 9,111,291
-
Mining Global Email Folders For Identifying Auto-folders tags
Vishwanath Ramarao, Andrei Broder, Idan Szpektor, Edo Liberty, Yehuda Koren, Mark Risher, and Yoelle Maarek. US 8,463,827
-
Email sequence identification
Edo Liberty, Zohar Karnin, Yoelle Maarek. US 8,856,249
-
Mailing List Identification and Representation
Zohar Karnin, Michal Aharon, Edo Liberty, Yoelle Maarek. US 9,596,205
-
Identification of subject line templates
Zohar Karnin, Edo Liberty, David Wajc, Guy Halawi. US 10,885,548
-
Computerized system and method for modifying a message to apply security features to the message's content
Edo Liberty, Yoelle Maarek. US 10,862,843
-
Electronic message composition support method and apparatus
Joel Tetreault, Aasish Pappu, Edo Liberty, Liangliang Cao, Meizhu Liu, Ellie Pavlick, Gilad Tsur, Yoelle Maarek. US 11,265,271
-
Mail Lint: Write Better Emails
Joel Tetreault, Aasish Pappu, Edo Liberty, Liangliang Cao, Meizhu Liu, Ellie Tobochnik, Gilad Tzur, Yoelle Maarek. US 10,193,833
-
Contest Generation Methods for Daily Fantasy Sports
Justin Thaler, Maxim Sviridenko, Edo Liberty, Prerit Uppal, Ron Belmarch, Jerry Shen. US 10,722,799
-
Fantasy Sports Data Analysis for Game Structure Development
Justin Thaler, Maxim Sviridenko, Edo Liberty, Prerit Uppal, Ron Belmarch, Jerry Shen. US 12,233,342
Google
Worked as an intern (twice) on Google Analytics and Google Maps.
Patents
-
Method And System For Clustering Data Points
Nir Ailon, Edo Liberty, Hari Khalsa. US 2012/0254183
Inscape
Technical founder. Built automatic content-recognition (ACR) infrastructure - fingerprinting broadcast video in real time to identify what is on screen and target contextually relevant content across millions of connected televisions.
Patents
-
Methods for Displaying Contextually Targeted Content on a Connected Television
Zeev Neumeier, Edo Liberty. US 8,769,584
-
Methods for Identifying Video Segments and Displaying Contextually Targeted Content on Connected Televisions
Zeev Neumeier, Edo Liberty. US 12,238,371
During my PhD
-
Methods for filtering data and filling in missing data using nonlinear inference
Edo Liberty, Steven Zucker, Yosi Keller, Mauro M. Maggioni, Ronald R. Coifman, Frank Geshwind, and in collaboration with Plain Sight Systems. US 2007/0214133
For fun
-
Ezuzah Chrome Extension (a digital art piece) - your browser is your door to the internet, why not hang a Mezuzah?