Heavy hitters via cluster-preserving clustering

Seminar: 
Combinatorics Seminar
Event time: 
Thursday, October 20, 2016 - 12:00pm to 2:00pm
Location: 
215 LOM
Speaker: 
Jelani Nelson
Speaker affiliation: 
John A. Paulson School of Engineering and Applied Sciences at Harvard University
Event description: 

In the heavy hitters or frequent items problem, one must process a
stream of items and report those items that occur frequently. For
example a telecommunications company may wish to find popular
destination IP addresses in a packet stream across one of their links,
or a search engine may wish to report popular query words. A more
general problem is when there are two streams and we must report those
items whose frequencies significantly deviate between them; the former
problem is a special case since we can artificially pretend the first
of the two streams the empty stream. Such a problem naturally arises
in trend detection and anomaly detection.

We give a new algorithm, ExpanderSketch, for the more general problem,
which outperforms other well-known algorithms for the same problem such
as the CountMin sketch and CountSketch. Our main innovation is a novel
reduction to a new graph-clustering problem we formulate, in which finding
most of every cluster guarantees finding all the frequent items. We then solve
this problem by devising a novel spectral clustering algorithm potentially of
independent interest, based on divide-and-conquer and local search.