The need to access, explore and analyze large collections of data series arises in a large number of diverse domains, in both science and industry. Such domains include meteorology (e.g., temperature), chemistry (e.g., mass spectroscopy), finance (e.g., stock quotes), smart cities (e.g., road traffic), marketing (e.g., opinion evolution) and others. In such applications we need to identify patterns, gain insights and detect abnormalities. Similarity search is a fundamental data mining task in this process: given a data series as a query, we find the most similar series in the database.
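As a point of reference, the similarity search task described above can be sketched as a brute-force scan. This is only an illustrative baseline: the use of Euclidean distance and the names `euclidean` and `nearest_series` are assumptions made for this example and are not taken from the text.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length data series (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_series(query, collection):
    """Scan every series and return (position, distance) of the closest one."""
    if not collection:
        raise ValueError("collection is empty")
    best = min(range(len(collection)), key=lambda i: euclidean(query, collection[i]))
    return best, euclidean(query, collection[best])


collection = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
position, distance = nearest_series([1.0, 1.0, 2.0], collection)
```

Such a scan touches every series for every query, which is the cost that indexing aims to avoid on large collections.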
A common approach to optimizing similarity search is indexing. The problem is that building a state-of-the-art data series index may take several days over multi-terabyte datasets. In many cases, however, analysts and scientists need to explore the data without knowing a priori what they are looking for. Waiting for a time-consuming indexing method to load the data can be a show-stopper for applications that either require immediate access to the data or issue too few queries to justify the cost that has to be paid upfront.
We propose an adaptive indexing solution for data series similarity search, called Adaptive Data Series (ADS). During indexing, ADS performs only a few basic steps, mainly creating the skeleton of a tree that contains condensed information on the input data series. Its leaves do not contain any raw data series and remain unmaterialized until relevant queries arrive. The data are gradually loaded into the index as queries are answered. The net effect is that users do not have to wait for extended periods of time before getting access to the data; while state-of-the-art indexing approaches are still in the indexing phase, our approach has already allowed users to answer several hundred thousand queries.
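The idea of leaves that stay unmaterialized until a query needs them can be illustrated with a deliberately simplified sketch. It is not the ADS data structure: the flat dictionary of leaves, the rounded-mean summary, and the names `LazyLeaf` and `AdaptiveIndexSketch` are all assumptions chosen to keep the example short.

```python
import math


class LazyLeaf:
    """A leaf that initially records only where its series live in the raw data."""

    def __init__(self):
        self.positions = []  # positions in the raw data, recorded at build time
        self.series = None   # raw series, loaded only when a query reaches the leaf


class AdaptiveIndexSketch:
    def __init__(self, raw):
        self.raw = raw       # stands in for the raw data file (assumption)
        self.leaves = {}
        self.loads = 0       # number of raw series copied into the index so far
        # Build phase: one pass that stores condensed information only.
        for pos, s in enumerate(raw):
            self.leaves.setdefault(self._summary(s), LazyLeaf()).positions.append(pos)

    @staticmethod
    def _summary(series):
        # Hypothetical condensed representation: the rounded mean of the series.
        return round(sum(series) / len(series))

    def _materialize(self, leaf):
        # Fetch the raw series for this leaf the first time a query touches it.
        if leaf.series is None:
            leaf.series = [self.raw[p] for p in leaf.positions]
            self.loads += len(leaf.positions)
        return leaf.series

    def approximate_search(self, query):
        """Return the closest series within the single leaf matching the query."""
        leaf = self.leaves.get(self._summary(query))
        if leaf is None:
            return None
        return min(self._materialize(leaf), key=lambda s: math.dist(query, s))


raw = [[0.0, 0.2], [0.1, 0.1], [5.0, 5.2], [9.9, 10.1]]
index = AdaptiveIndexSketch(raw)
answer = index.approximate_search([5.1, 5.1])
```

In this sketch the build phase copies no raw series into the index, and each query materializes only the leaf it visits, so the amount of loaded data grows with the query workload rather than being paid upfront.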
To demonstrate the benefits of ADS, we developed a prototype data exploration tool called RINSE, a recursive acronym for RINSE INteractive Series Explorer. It is built around the Adaptive Data Series index (ADS) and allows users to explore large collections of data series. Users can pose queries by drawing them with their mouse (or touch screen) or by selecting them from other data collections. RINSE can execute queries on large multi-gigabyte datasets in seconds, in either an exact or an approximate mode. When queries arrive, ADS fetches data series from the raw data and moves into the index only those data series that are relevant to the query workload.
We recorded a short demonstration video that showcases the RINSE tool. We perform various data exploration actions using ADS+.