Rapporti Tecnici
Autori: | Paolo Ferragina |
Travis Gagie | |
Giovanni Manzini | |
Area Scientifica: | Text Compression and Indexing |
Titolo: | Space-Conscious Data Indexing and Compression in a Streaming Model |
Apparso su: | TR-INF-2008-02-01-UNIPMN |
Editore: | Computer Science Department, UPO |
Anno: | 2008 |
URL: | http://www.di.unipmn.it...R-INF-2008-02-01-UNIPMN.pdf |
Sommario: | Compressed full-text indexes, data compressors, and streaming algorithms are three powerful tools for dealing with massive datasets. Surprisingly, algorithms for building compressed indexes (or files) are space-inefficient and/or incompatible with a streaming setting. Space-conscious streaming algorithms are crucial because current disk features make sequential disk accesses at least two orders of magnitude faster than random accesses. For the first time in the literature, we investigate string problems in the streaming model and prove upper and lower bounds on the complexity of building compressed indexes and compressing data in terms of the classic product: (internal- memory space) x (number of passes). |