Affiliations: CUBS, Center for Unified Biometrics and Sensors, State
University of New York at Buffalo, Amherst, NY 14260, USA. E-mail: {hlei,
govind}@cse.buffalo.edu
Note: [] Corresponding author
Abstract: Sequential pattern mining is useful for predicting future
activities, interpreting recurring phenomena, and extracting similarities
in a series of events. For example, in the NASDAQ market, the problem of
finding stocks whose closing prices are always about β_0 higher than, or
β_1 times, those of a given company's stock reduces to linear pattern
retrieval: given a query X, find all sequences Y in the database S such that
Y = β_0 + β_1X with confidence C. In this paper, we introduce a novel approach using the Simple Linear
Regression (SLR) model to match and retrieve sequential patterns. We extend the
one-dimensional R^2 model to ER^2 for
multi-dimensional sequence matching. In addition, we present the SLR + FFT
pruning technique to speed up data retrieval without incurring any false
dismissal. Experimental results on both synthetic and real datasets show that
the pruning ratio of SLR + FFT can be above 99%. Applying the retrieval
technique to real stocks resulted in the discovery of many interesting patterns,
some of which are presented in the paper. Also, using ER^2
as the similarity measure for on-line signature recognition yielded high
accuracy.
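
As a rough illustration of the linear pattern retrieval problem stated above, the following Python sketch fits Y = β_0 + β_1X by ordinary least squares and keeps the database sequences whose coefficient of determination R^2 reaches a threshold. Treating "R^2 is at least C" as the confidence test is an assumption made here for illustration. The function names slr_fit and retrieve are hypothetical, and neither the ER^2 extension nor the SLR + FFT pruning is shown.

```python
import numpy as np


def slr_fit(x, y):
    """Least-squares fit of y = beta0 + beta1 * x for two equal-length sequences.

    Returns (beta0, beta1, r2), where r2 is the one-dimensional
    coefficient of determination R^2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("sequences must have the same length")

    # Center both sequences around their means.
    x_c = x - x.mean()
    y_c = y - y.mean()
    sxx = np.dot(x_c, x_c)
    syy = np.dot(y_c, y_c)
    sxy = np.dot(x_c, y_c)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("constant sequences have no defined R^2")

    beta1 = sxy / sxx
    beta0 = y.mean() - beta1 * x.mean()
    r2 = (sxy * sxy) / (sxx * syy)
    return beta0, beta1, r2


def retrieve(query, database, confidence):
    """Brute-force scan (no pruning): keep sequences with R^2 >= confidence.

    `database` maps a sequence name to its values. The threshold test is
    an illustrative assumption, not the paper's definition of confidence.
    """
    matches = {}
    for name, seq in database.items():
        beta0, beta1, r2 = slr_fit(query, seq)
        if r2 >= confidence:
            matches[name] = (beta0, beta1, r2)
    return matches
```

A sequence that is an exact linear transform of the query yields R^2 = 1 and is retrieved at any threshold, while an unrelated sequence yields a low R^2 and is dropped.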