vineri, 16 octombrie 2009

[DessignPatterns][C#] Hashfile/Indexing Technique

HashFile is a technique to insert/delete/find key value pairs in a file in the hard disk. It uses fixed length records to store data and index. There are two files “name.dat” and “name.idx” corresponding to a "name". "name" is the identification for key/value pairs, and is given at the time of initialization.

"name.dat”
This is the data file. The Data file stores fixed length records of key/value pairs. Name of key/value pair, length of key and value should be specified at the time of initialization. Maximum size of the data file could stretch up to 2^32 bytes, which is the maximum value of unsigned integer (uint32).

“name.idx”
This is the index file. It stores fixed length records of record pointer and index pointer. Record pointer and index pointer are unsigned integers.

Indexing Technique

It uses Index Block (IB) of 91 x 2 Matrix to store record and index pointers. When Hashfile is initialized for the first time, an empty IB will be created and every element will be filled with max value of data type uint. Why 91 rows? 91 is the number of characters in the ASCII range 32-122. Each row in the IB represents an ASCII character. Columns of the IB are record pointer and index pointer. Data types of columns are unsigned integers. Record pointer represents the starting byte index number of a data record with respect to the origin, similarly index pointer represents starting byte index number of an Index record with respect to the origin.
Pseudo code for InsertKey method is given below:
  1. Take a byte from the new key
  2. If byte is within range 32-122, then jump to the matching index record within an IB that corresponds to distance from the left most byte, otherwise return error
  3. Read record pointer and index pointer
  4. If index pointer is equal to maximum value of data type uint, then, save record pointer and index pointer, save key and value in the data record and return record pointer
  5. Otherwise read data record for the record pointer
  6. Compare new key and existing key
  7. If new key matches existing key in the data record, then go to step 10
  8. If new key is greater than existing key, then go to step 11
  9. Otherwise step 13
  10. If the status of the data record is deleted then, save new value, change deleted flag to false, return record pointer, otherwise return error
  11. If index pointer points to next IB, take next byte and jump to step 2
  12. Otherwise create new IB, save record pointer and index pointer in the new IB, assign start index pointer of new IB to the old IB, save key and value in the data record, return record pointer
  13. If index pointer points to next IB, assign new record pointer to the index record, swap new and existing keys jump to step 2
  14. Otherwise assign new record pointer to the index record, create new IB, assign existing record pointer, new index pointer to new IB, assign start index pointer of new IB to the old IB, save key and value in the data record, return record pointer
See, also, http://www.codeproject.com/KB/recipes/HashFile.aspx

Niciun comentariu:

Trimiteți un comentariu