Technical Report Details
Authors: | Paolo Ferragina |
Travis Gagie |
Giovanni Manzini |
Scientific Area: | Text Compression and Indexing |
Title: | Space-Conscious Data Indexing and Compression in a Streaming Model |
Published on: | TR-INF-2008-02-01-UNIPMN |
Publisher: | Computer Science Department, UPO |
Year: | 2008 |
URL: | http://www.di.unipmn.it...R-INF-2008-02-01-UNIPMN.pdf |
Abstract: | Compressed full-text indexes, data compressors, and streaming algorithms are three
powerful tools for dealing with massive datasets. Surprisingly, algorithms for building
compressed indexes (or files) are space-inefficient and/or incompatible with a streaming
setting. Space-conscious streaming algorithms are crucial because current disk features
make sequential disk accesses at least two orders of magnitude faster than random
accesses. For the first time in the literature, we investigate string problems in the
streaming model and prove upper and lower bounds on the complexity of building
compressed indexes and compressing data in terms of the classic product: (internal-
memory space) x (number of passes). |