CWB
#define DEFAULT_NR_OF_BUCKETS 250000
Defines the default number of buckets in a lexhash.
Referenced by cl_new_lexhash().
#define DEFAULT_PERFORMANCE_LIMIT 10
The default performance limit (the average number of string comparisons per lookup) above which the hash is expanded.
Referenced by cl_lexhash_check_grow().
#define PERFORMANCE_COUNT 1000
The update interval for hash performance estimation.
Referenced by cl_lexhash_check_grow(), cl_lexhash_find_i(), and cl_new_lexhash().
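The following C sketch illustrates how these three constants could interact. It assumes that string comparisons are counted during lookups and that, once every PERFORMANCE_COUNT lookups, their average is compared against DEFAULT_PERFORMANCE_LIMIT. The toy_perf type and the toy_perf_* functions are hypothetical and not part of CWB.

```c
#include <assert.h>
#include <stdbool.h>

#define PERFORMANCE_COUNT 1000
#define DEFAULT_PERFORMANCE_LIMIT 10

/* Hypothetical bookkeeping, loosely modelled on the comparisons and
   performance_counter fields named in the reference entries. */
typedef struct {
  long comparisons;         /* string comparisons in the current window */
  int performance_counter;  /* lookups left until the next check */
  bool auto_grow;           /* growing is allowed only if this is set */
} toy_perf;

void toy_perf_init(toy_perf *p) {
  p->comparisons = 0;
  p->performance_counter = PERFORMANCE_COUNT;
  p->auto_grow = true;
}

/* Record one lookup that needed n comparisons. Returns true when a full
   window of PERFORMANCE_COUNT lookups averaged more than
   DEFAULT_PERFORMANCE_LIMIT comparisons and auto_grow is on. */
bool toy_perf_record(toy_perf *p, int n) {
  assert(n >= 0);
  p->comparisons += n;
  if (--p->performance_counter > 0)
    return false;                        /* window not finished yet */
  double avg = (double)p->comparisons / PERFORMANCE_COUNT;
  p->comparisons = 0;                    /* start a new window */
  p->performance_counter = PERFORMANCE_COUNT;
  return p->auto_grow && avg > DEFAULT_PERFORMANCE_LIMIT;
}
```

Checking only once per window keeps the per-lookup overhead to a counter update.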
typedef void (*cl_lexhash_cleanup_func)(cl_lexhash_entry)
A function pointer type defining functions that can be used as the "cleanup" for a deleted cl_lexhash_entry.
void cl_delete_lexhash(cl_lexhash hash)
Deletes a cl_lexhash object.
This deletes all the entries in all the buckets in the lexhash, plus the cl_lexhash itself.
hash: The cl_lexhash to delete.
References _cl_lexhash::buckets, cl_delete_lexhash_entry(), cl_free, _cl_lexhash_entry::next, and _cl_lexhash::table.
Referenced by main().
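A minimal sketch of what deleting a whole lexhash involves, assuming a simplified entry struct with only key and next fields. The names toy_lexhash, toy_entry, toy_push() and toy_delete_lexhash() are hypothetical, and plain free() stands in for cl_free. The essential detail is reading each entry's next pointer before that entry is freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified versions of the lexhash structures. */
typedef struct toy_entry {
  char *key;
  struct toy_entry *next;
} toy_entry;

typedef struct {
  toy_entry **table;     /* array of bucket chains */
  unsigned int buckets;  /* length of table */
} toy_lexhash;

int toy_entries_freed = 0;  /* demo counter only */

/* Demo helper: prepend a private copy of key to bucket b. */
void toy_push(toy_lexhash *hash, unsigned int b, const char *key) {
  assert(b < hash->buckets);
  toy_entry *e = malloc(sizeof *e);
  assert(e != NULL);
  e->key = malloc(strlen(key) + 1);
  assert(e->key != NULL);
  strcpy(e->key, key);
  e->next = hash->table[b];
  hash->table[b] = e;
}

/* Free every entry in every bucket, then the bucket array and the hash. */
void toy_delete_lexhash(toy_lexhash *hash) {
  if (hash == NULL)
    return;
  for (unsigned int i = 0; i < hash->buckets; i++) {
    toy_entry *entry = hash->table[i];
    while (entry != NULL) {
      toy_entry *next = entry->next;  /* read the link before freeing */
      free(entry->key);
      free(entry);
      toy_entries_freed++;
      entry = next;
    }
  }
  free(hash->table);
  free(hash);
}
```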
void cl_delete_lexhash_entry(cl_lexhash hash, cl_lexhash_entry entry)
Deallocates a cl_lexhash_entry object and its key string.
Before the entry is freed, the hash's cleanup function (if one has been set) is run on it.
Usage: cl_delete_lexhash_entry(lexhash, entry);
This is a non-exported function.
hash: The lexhash this entry belongs to (needed to locate the cleanup function, if any).
entry: The entry to delete.
References cl_free, _cl_lexhash::cleanup_func, and _cl_lexhash_entry::key.
Referenced by cl_delete_lexhash(), and cl_lexhash_del().
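The ordering matters here: the cleanup callback receives the entry, so it has to run before the entry is freed. A minimal sketch with hypothetical toy_* names, assuming the payload is a single void * field (the real entry has a data field with integer, numeric and pointer members):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified entry with one owned payload. */
typedef struct toy_entry {
  char *key;
  void *data;
  struct toy_entry *next;
} toy_entry;

typedef void (*toy_cleanup_func)(toy_entry *);

typedef struct {
  toy_cleanup_func cleanup_func;  /* NULL: no cleanup */
} toy_lexhash;

int toy_cleanups = 0;  /* demo counter only */

/* Example cleanup callback: releases the payload the entry owns. */
void toy_free_data(toy_entry *entry) {
  free(entry->data);
  entry->data = NULL;
  toy_cleanups++;
}

/* Run the cleanup callback (if any) while the entry is still valid,
   then free the key string and the entry itself. */
void toy_delete_lexhash_entry(toy_lexhash *hash, toy_entry *entry) {
  assert(hash != NULL && entry != NULL);
  if (hash->cleanup_func != NULL)
    hash->cleanup_func(entry);
  free(entry->key);
  free(entry);
}
```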
cl_lexhash_entry cl_lexhash_add(cl_lexhash hash, char *token)
Adds a token to a cl_lexhash table.
If the string is already in the hash, its frequency count is increased by 1.
Otherwise, a new entry is created with an automatically assigned ID. The string is duplicated, so the original string passed to this function does not need to be kept in memory.
Returns the (new or existing) entry for the string.
hash: The hash table to add to.
token: The string to add.
References cl_lexhash_find_i(), cl_malloc(), cl_strdup(), _cl_lexhash_entry::data, _cl_lexhash::entries, _cl_lexhash_entry::freq, _cl_lexhash_entry::id, _cl_lexhash_entry::_cl_lexhash_entry_data::integer, _cl_lexhash_entry::key, _cl_lexhash_entry::next, _cl_lexhash::next_id, _cl_lexhash_entry::_cl_lexhash_entry_data::numeric, _cl_lexhash_entry::_cl_lexhash_entry_data::pointer, and _cl_lexhash::table.
Referenced by encode_add_wattr_line(), main(), range_close(), range_declare(), range_open(), and sencode_write_region().
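The add logic can be sketched as follows. This is a simplified illustration, not CWB's code: toy_hash() is a stand-in hash function (djb2), new entries start with frequency 1 and receive IDs counting up from 0 (both assumptions), and new entries are prepended to their bucket chain for brevity. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct toy_entry {
  char *key;
  int freq;
  int id;
  struct toy_entry *next;
} toy_entry;

typedef struct {
  toy_entry **table;
  unsigned int buckets;
  int entries;
  int next_id;
} toy_lexhash;

/* Stand-in hash function (djb2); CWB's hash_string() may differ. */
unsigned int toy_hash(const char *s) {
  unsigned int h = 5381;
  while (*s != '\0')
    h = h * 33 + (unsigned char)*s++;
  return h;
}

toy_lexhash *toy_new(unsigned int buckets) {
  toy_lexhash *hash = malloc(sizeof *hash);
  assert(hash != NULL);
  hash->table = calloc(buckets, sizeof(toy_entry *));
  assert(hash->table != NULL);
  hash->buckets = buckets;
  hash->entries = 0;
  hash->next_id = 0;
  return hash;
}

/* Bump freq if token is present; otherwise insert a copy with the next ID. */
toy_entry *toy_add(toy_lexhash *hash, const char *token) {
  assert(hash != NULL && token != NULL);
  unsigned int offset = toy_hash(token) % hash->buckets;
  toy_entry *entry;
  for (entry = hash->table[offset]; entry != NULL; entry = entry->next)
    if (strcmp(entry->key, token) == 0) {
      entry->freq++;
      return entry;
    }
  entry = malloc(sizeof *entry);
  assert(entry != NULL);
  entry->key = malloc(strlen(token) + 1);
  assert(entry->key != NULL);
  strcpy(entry->key, token);           /* private copy of the caller's string */
  entry->freq = 1;
  entry->id = hash->next_id++;
  entry->next = hash->table[offset];   /* prepend to the bucket chain */
  hash->table[offset] = entry;
  hash->entries++;
  return entry;
}
```

Because the key is copied, the caller may reuse or free its own buffer right after the call.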
void cl_lexhash_auto_grow(cl_lexhash hash, int flag)
Turns a cl_lexhash's ability to grow automatically on or off.
When this setting is switched on, the lexhash grows automatically to avoid performance degradation.
Autogrow is switched on by default.
hash: The hash that will be affected.
flag: New value for the autogrow setting (boolean: true switches it on, false switches it off).
References _cl_lexhash::auto_grow.
int cl_lexhash_check_grow(cl_lexhash hash)
Grows a lexhash table by increasing the number of buckets, if necessary.
This function first updates the performance estimate. If the estimate is above the threshold and auto_grow is enabled, the hash is expanded so that the average fill rate is 1 (i.e. on average, one cl_lexhash_entry, and hence one key string, per bucket). This improves lookup performance and lets the hash absorb more keys. The return value indicates whether the hash was expanded.
Note: this function also implements the hashing algorithm and must be consistent with cl_lexhash_find_i().
Usage: expanded = cl_lexhash_check_grow(hash);
This is a non-exported function.
hash: The lexhash to autogrow.
References _cl_lexhash::auto_grow, _cl_lexhash::buckets, cl_debug, cl_free, cl_new_lexhash(), _cl_lexhash::comparisons, DEFAULT_PERFORMANCE_LIMIT, _cl_lexhash::entries, hash_string(), _cl_lexhash_entry::key, _cl_lexhash::last_performance, _cl_lexhash_entry::next, PERFORMANCE_COUNT, and _cl_lexhash::table.
Referenced by cl_lexhash_find_i().
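The rehashing step might look like the following minimal sketch, which uses hypothetical toy_* types and a stand-in hash function and shows only the regrowth, not the threshold check. CWB's version builds the new table through cl_new_lexhash() (which in turn references find_prime()), so its real bucket count may differ from the plain one-entry-per-bucket target used here.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_entry {
  char *key;
  struct toy_entry *next;
} toy_entry;

typedef struct {
  toy_entry **table;
  unsigned int buckets;
  int entries;
} toy_lexhash;

/* Stand-in hash function (djb2); lookups must use the same one,
   or entries would land in buckets that lookups never search. */
unsigned int toy_hash(const char *s) {
  unsigned int h = 5381;
  while (*s != '\0')
    h = h * 33 + (unsigned char)*s++;
  return h;
}

/* Rehash into max(entries, 1) buckets so the average chain length is
   about 1. Entries are relinked into the new table, not copied. */
void toy_grow(toy_lexhash *hash) {
  unsigned int new_buckets = hash->entries > 0 ? (unsigned int)hash->entries : 1;
  toy_entry **new_table = calloc(new_buckets, sizeof(toy_entry *));
  assert(new_table != NULL);
  for (unsigned int i = 0; i < hash->buckets; i++) {
    toy_entry *entry = hash->table[i];
    while (entry != NULL) {
      toy_entry *next = entry->next;
      unsigned int offset = toy_hash(entry->key) % new_buckets;
      entry->next = new_table[offset];
      new_table[offset] = entry;
      entry = next;
    }
  }
  free(hash->table);
  hash->table = new_table;
  hash->buckets = new_buckets;
}
```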
int cl_lexhash_del(cl_lexhash hash, char *token)
Deletes a string from a hash.
The entry corresponding to the specified string is removed from the lexhash. If the string is not in the lexhash to begin with, no action is taken.
hash: The hash to alter.
token: The string to remove.
References cl_delete_lexhash_entry(), cl_lexhash_find_i(), _cl_lexhash::entries, _cl_lexhash_entry::freq, _cl_lexhash_entry::next, and _cl_lexhash::table.
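Removing an entry from a singly linked bucket chain is simplest with a pointer to the link that points at the current node, which treats the head and the middle of the chain the same way. In this sketch, toy_insert() and toy_del() are hypothetical, and returning the removed entry's frequency (0 if the string is absent) is an assumed convention, not something the reference above documents.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct toy_entry {
  char *key;
  int freq;
  struct toy_entry *next;
} toy_entry;

typedef struct {
  toy_entry **table;
  unsigned int buckets;
  int entries;
} toy_lexhash;

/* Stand-in hash function (djb2), not CWB's hash_string(). */
unsigned int toy_hash(const char *s) {
  unsigned int h = 5381;
  while (*s != '\0')
    h = h * 33 + (unsigned char)*s++;
  return h;
}

/* Demo helper: prepend a new entry with the given frequency. */
void toy_insert(toy_lexhash *hash, const char *token, int freq) {
  unsigned int offset = toy_hash(token) % hash->buckets;
  toy_entry *e = malloc(sizeof *e);
  assert(e != NULL);
  e->key = malloc(strlen(token) + 1);
  assert(e->key != NULL);
  strcpy(e->key, token);
  e->freq = freq;
  e->next = hash->table[offset];
  hash->table[offset] = e;
  hash->entries++;
}

/* Unlink and free the entry for token and return its frequency.
   If token is not in the hash, nothing changes and 0 is returned. */
int toy_del(toy_lexhash *hash, const char *token) {
  unsigned int offset = toy_hash(token) % hash->buckets;
  toy_entry **link = &hash->table[offset];  /* the pointer we may rewrite */
  while (*link != NULL && strcmp((*link)->key, token) != 0)
    link = &(*link)->next;
  if (*link == NULL)
    return 0;
  toy_entry *victim = *link;
  int freq = victim->freq;
  *link = victim->next;                     /* splice out of the chain */
  free(victim->key);
  free(victim);
  hash->entries--;
  return freq;
}
```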
cl_lexhash_entry cl_lexhash_find(cl_lexhash hash, char *token)
Finds the entry corresponding to a particular string within a cl_lexhash.
hash: The hash to search.
token: The key string to look for.
References cl_lexhash_find_i().
Referenced by main(), range_close(), range_open(), range_print_registry_line(), and sencode_write_region().
cl_lexhash_entry cl_lexhash_find_i(cl_lexhash hash, char *token, unsigned int *ret_offset)
Finds the entry corresponding to a particular string in a cl_lexhash.
This function is the same as cl_lexhash_find(), except that *ret_offset is set to the hashtable offset computed for token (i.e. the index of the bucket within the hashtable), unless ret_offset is NULL.
Note that this function hides the hashing algorithm details from the rest of the lexhash implementation.
Usage: entry = cl_lexhash_find_i(hash, token, ret_offset);
This is a non-exported function.
hash: The hash to search.
token: The key string to look for.
ret_offset: Address of an unsigned int that receives the token's hashtable offset; may be NULL.
References _cl_lexhash::buckets, cl_lexhash_check_grow(), _cl_lexhash::comparisons, hash_string(), _cl_lexhash_entry::key, _cl_lexhash_entry::next, PERFORMANCE_COUNT, _cl_lexhash::performance_counter, and _cl_lexhash::table.
Referenced by cl_lexhash_add(), cl_lexhash_del(), cl_lexhash_find(), cl_lexhash_freq(), and cl_lexhash_id().
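A sketch of the lookup, using hypothetical toy_* names and a stand-in hash function. It shows the two side effects described above: the bucket offset is reported through ret_offset when that pointer is non-NULL, and every string comparison is counted, on the assumption that this counter feeds the performance estimate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_entry {
  char *key;
  struct toy_entry *next;
} toy_entry;

typedef struct {
  toy_entry **table;
  unsigned int buckets;
  long comparisons;  /* running count of string comparisons */
} toy_lexhash;

/* Stand-in hash function (djb2), not CWB's hash_string(). */
unsigned int toy_hash(const char *s) {
  unsigned int h = 5381;
  while (*s != '\0')
    h = h * 33 + (unsigned char)*s++;
  return h;
}

/* Return the entry whose key equals token, or NULL if there is none.
   If ret_offset is non-NULL, the bucket index is stored there too. */
toy_entry *toy_find_i(toy_lexhash *hash, const char *token, unsigned int *ret_offset) {
  assert(hash != NULL && hash->table != NULL && hash->buckets > 0);
  unsigned int offset = toy_hash(token) % hash->buckets;
  if (ret_offset != NULL)
    *ret_offset = offset;
  toy_entry *entry = hash->table[offset];
  while (entry != NULL) {
    hash->comparisons++;  /* one strcmp per visited entry */
    if (strcmp(entry->key, token) == 0)
      break;
    entry = entry->next;
  }
  return entry;
}
```

An insert routine can pass a non-NULL ret_offset and, on a miss, reuse the offset instead of hashing the string a second time.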
int cl_lexhash_freq(cl_lexhash hash, char *token)
Gets the frequency of a particular string within a lexhash.
hash: The hash to look in.
token: The string to look for.
References cl_lexhash_find_i(), and _cl_lexhash_entry::freq.
Referenced by main(), and range_open().
int cl_lexhash_id(cl_lexhash hash, char *token)
Gets the ID of a particular string within a lexhash.
Note that this is the integer ID that identifies that particular string, not the hash value of the string, which only identifies the bucket the string is stored in.
hash: The hash to look in.
token: The string to look for.
References cl_lexhash_find_i(), and _cl_lexhash_entry::id.
Referenced by encode_add_wattr_line(), and range_declare().
void cl_lexhash_set_cleanup_function(cl_lexhash hash, cl_lexhash_cleanup_func func)
Sets the cleanup function for a cl_lexhash.
The cleanup function is called with a cl_lexhash_entry argument; it should delete any objects associated with the entry's data field.
The cleanup function is initially set to NULL, i.e. no cleanup is run.
hash: The cl_lexhash to work with.
func: Pointer to the function to use for cleanup.
References _cl_lexhash::cleanup_func, and func.
int cl_lexhash_size(cl_lexhash hash)
Gets the number of different strings stored in a lexhash.
This returns the total number of entries in all the bucket linked-lists in the whole hashtable.
hash: The hash to size up.
References _cl_lexhash::buckets, _cl_lexhash_entry::next, and _cl_lexhash::table.
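Counting entries this way means walking every chain, so the cost grows with the number of buckets plus the number of entries. A minimal sketch with hypothetical toy_* types:

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_entry {
  char *key;
  struct toy_entry *next;
} toy_entry;

typedef struct {
  toy_entry **table;
  unsigned int buckets;
} toy_lexhash;

/* Count entries by walking every bucket chain. */
int toy_size(const toy_lexhash *hash) {
  assert(hash != NULL);
  int count = 0;
  for (unsigned int i = 0; i < hash->buckets; i++)
    for (const toy_entry *e = hash->table[i]; e != NULL; e = e->next)
      count++;
  return count;
}
```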
cl_lexhash cl_new_lexhash(int buckets)
Creates a new cl_lexhash object.
buckets: The number of buckets in the newly created cl_lexhash; set to 0 to use the default number of buckets.
References _cl_lexhash::auto_grow, _cl_lexhash::buckets, cl_calloc(), cl_malloc(), _cl_lexhash::cleanup_func, _cl_lexhash::comparisons, DEFAULT_NR_OF_BUCKETS, _cl_lexhash::entries, find_prime(), _cl_lexhash::last_performance, _cl_lexhash::next_id, PERFORMANCE_COUNT, _cl_lexhash::performance_counter, and _cl_lexhash::table.
Referenced by cl_lexhash_check_grow(), main(), range_declare(), sencode_write_region(), and wattr_declare().
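A sketch of the constructor's defaults, using hypothetical toy_* names. It mirrors what the documentation states: 0 selects DEFAULT_NR_OF_BUCKETS, autogrow starts switched on, and no cleanup function is set. CWB's version also references find_prime(), presumably to round the bucket count up to a prime; this sketch leaves that step out.

```c
#include <assert.h>
#include <stdlib.h>

#define DEFAULT_NR_OF_BUCKETS 250000

typedef struct toy_entry {
  char *key;
  struct toy_entry *next;
} toy_entry;

typedef void (*toy_cleanup_func)(toy_entry *);

typedef struct {
  toy_entry **table;
  unsigned int buckets;
  int entries;
  int next_id;
  int auto_grow;
  toy_cleanup_func cleanup_func;
} toy_lexhash;

/* Create an empty hash; buckets == 0 selects the default size. */
toy_lexhash *toy_new_lexhash(int buckets) {
  assert(buckets >= 0);
  if (buckets == 0)
    buckets = DEFAULT_NR_OF_BUCKETS;
  toy_lexhash *hash = malloc(sizeof *hash);
  assert(hash != NULL);
  hash->buckets = (unsigned int)buckets;
  hash->table = calloc(hash->buckets, sizeof(toy_entry *));  /* all chains empty */
  assert(hash->table != NULL);
  hash->entries = 0;
  hash->next_id = 0;          /* assumption: IDs start at 0 */
  hash->auto_grow = 1;        /* documented default: on */
  hash->cleanup_func = NULL;  /* documented default: no cleanup */
  return hash;
}
```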
int find_prime(int n)
Returns the smallest prime greater than or equal to n.
Referenced by cl_new_lexhash(), main(), make_attribute_hash(), and MakeMacroHash().
unsigned int hash_string(char *string)
Computes a 32-bit hash value for a string.
Referenced by att_hash_lookup(), cl_lexhash_check_grow(), and cl_lexhash_find_i().
int is_prime(int n)
Returns true if and only if n is prime.
Referenced by find_prime().
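find_prime() and is_prime() can be sketched with plain trial division. toy_is_prime() and toy_find_prime() are hypothetical names, and CWB's own implementation may differ (for example in how it treats n < 2).

```c
#include <assert.h>

/* Trial division: n is prime iff n >= 2 and no odd d with d*d <= n
   divides it (even n other than 2 are rejected up front). */
int toy_is_prime(int n) {
  if (n < 2)
    return 0;
  if (n % 2 == 0)
    return n == 2;
  for (int d = 3; d <= n / d; d += 2)  /* d <= n/d avoids overflowing d*d */
    if (n % d == 0)
      return 0;
  return 1;
}

/* Smallest prime >= n, found by searching upwards. With 32-bit int this
   cannot overflow, because INT_MAX (2^31 - 1) is itself prime. */
int toy_find_prime(int n) {
  if (n < 2)
    return 2;
  while (!toy_is_prime(n))
    n++;
  return n;
}
```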