CWB
This file contains the API for the CWB "Corpus Library" (CL).
#include <strings.h>
If you are programming against the CL, you should #include ONLY this header file, and make use of ONLY the functions declared here.
Other functions in the CL should ONLY be used within the CWB itself by CWB developers.
The header file is laid out in such a way as to semi-document the API, i.e. function prototypes are given with brief notes on usage, parameters, and return values. You may also wish to refer to CWB's automatically-generated HTML code documentation (created using the Doxygen system; if you're reading this text in a web browser, then the auto-generated documentation is almost certainly what you're looking at). However, please note that the auto-generated documentation ALSO covers (a) functions internal to the CL which should NOT be used when programming against it; (b) functions from the CWB utilities and from the CQP program - neither of which are part of the CL. There is also no distinction in that more extensive documentation between information that is relevant to programming against the CL API and information that is relevant to developers working on the CL itself. Caveat lector.
Note that many functions have two names -- one that follows the standardised format "cl_do_something()", and another that follows no particular pattern. The former are the "new API" (in v3.0.0 or higher of CWB) and the latter are the "old-style" API (deprecated, but supported for backward compatibility). The old-style function names SHOULD NOT be used in newly-written code. Such double names mostly exist for the core data-access functions (i.e. for the Corpus and (especially) Attribute objects).
In v3.0 and v3.1 of CWB, the new API was implemented as macros to the old API. As of v3.2, the old API is implemented as macros to the new API.
In a very few cases, the parameter list or return behaviour of a function also changed. In these cases, a function with the "old" parameter list is preserved (but deprecated) and has the same name as the new function but with the suffix "_oldstyle". The old names are then re-implemented as macros to the _oldstyle functions. As should be obvious, while these functions and the macros to them will remain in the public API for backwards compatibility, they should not be used in new code, and are most definitely deprecated!
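The aliasing scheme described above can be sketched as follows. This is only an illustration of the pattern: demo_count_oldstyle(), demo_count() and get_demo_count() are hypothetical names invented for this sketch and are not part of the CL.

```c
#include <stddef.h>

/* Hypothetical stand-in for a CL function that kept its "old" parameter
 * list: counts the items, optionally restricted to those that also occur
 * in a restrictor list. */
static int demo_count_oldstyle(const int *items, int n_items,
                               const int *restrictor, int restrictor_size)
{
    int i, j, count = 0;

    for (i = 0; i < n_items; i++) {
        if (restrictor == NULL) {   /* no restriction: count everything */
            count++;
            continue;
        }
        for (j = 0; j < restrictor_size; j++) {
            if (items[i] == restrictor[j]) {
                count++;
                break;
            }
        }
    }
    return count;
}

/* "New" name: shorter parameter list, forwards to the _oldstyle function. */
#define demo_count(items, n) demo_count_oldstyle(items, n, NULL, 0)

/* "Old" name: kept only so that legacy callers still compile. */
#define get_demo_count(items, n, r, rs) demo_count_oldstyle(items, n, r, rs)
```

Both names end up in the same function, so only one implementation has to be maintained; new code would call demo_count() and never the old name.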
The CL header is organised to reflect the conceptual structure of the library. While it is not fully "object-oriented" in style, most of the functions are organised around a small number of data objects that represent real entities in a CWB-encoded corpus. Each object is defined as an opaque type (usually a structure whose members are PRIVATE and should only be accessed via the functions provided in the CL API).
CONTENTS LIST FOR THIS HEADER FILE:
SECTION 1 CL UTILITIES
1.1 ERROR HANDLING
1.2 MEMORY MANAGEMENT
1.3 DATA LIST CLASSES: cl_string_list AND cl_int_list
1.4 INTERNAL RANDOM NUMBER GENERATOR
1.5 SETTING CL CONFIG VARIABLES
1.6 MISCELLANEOUS UTILITIES
SECTION 2 THE CORE CL LIBRARY (DATA ACCESS)
2.1 THE Corpus OBJECT
2.2 THE Attribute OBJECT
2.3 THE PositionStream OBJECT
SECTION 3 SUPPORT CLASSES
3.1 THE CorpusProperty OBJECT
3.2 THE CorpusCharset OBJECT
3.3 THE CL_Regex OBJECT
3.4 THE cl_lexhash OBJECT
3.5 THE CL_BitVec OBJECT
SECTION 4 THE OLD CL API
(If you're looking at the auto-generated HTML documentation, this contents list, which describes the structure of the actual "cl.h" header file, is wrong for you - instead, use the index of links (above) to find the object or function you are interested in.)
We hope you enjoy using the CL!
best regards from
The CWB Development Team
#define ATT_ALIGN (1<<2) |
Alignment attributes, i.e. a set of zones of alignment between a source and target corpus.
Referenced by aid_name(), cl_cpos2alg2cpos_oldstyle(), cl_has_extended_alignment(), decode_print_token_sequence(), describecorpus_show_basic_info(), describecorpus_show_statistics(), do_cqi_cl_alg2cpos(), do_cqi_cl_attribute_size(), do_cqi_cl_cpos2alg(), do_cqi_corpus_attributes(), interpreter(), main(), prepare_AlignmentConstraints(), printAlignedStrings(), update_context_descriptor(), and verify_context_descriptor().
#define ATT_ALL ( ATT_POS | ATT_STRUC | ATT_ALIGN | ATT_DYN ) |
shorthand for "any / all types of attribute"
#define ATT_DYN (1<<6) |
Dynamic attributes, ??
Referenced by aid_name(), cl_delete_attribute(), cl_dynamic_call(), cl_dynamic_numargs(), decode_print_token_sequence(), describe_attribute(), and FunctionCall().
#define ATT_NONE 0 |
No type of attribute.
Referenced by aid_name(), att_hash_lookup(), cl_delete_attribute(), cqi_drop_attribute(), new_tabulation_item(), and print_tabulation().
#define ATT_POS (1<<0) |
Positional attributes, i.e. streams of word tokens, word tags - any "column" that has a value at every corpus position.
Referenced by aid_name(), cl_cpos2id(), cl_cpos2str(), cl_delete_attribute(), cl_id2all(), cl_id2cpos_oldstyle(), cl_id2freq(), cl_id2sort(), cl_id2str(), cl_id2strlen(), cl_idlist2cpos_oldstyle(), cl_idlist2freq(), cl_index_compressed(), cl_max_cpos(), cl_max_id(), cl_new_stream(), cl_regex2id(), cl_sequence_compressed(), cl_sort2id(), cl_str2id(), compose_kwic_line(), compute_grouping(), decode_print_token_sequence(), describecorpus_show_basic_info(), describecorpus_show_statistics(), do_cqi_cl_attribute_size(), do_cqi_cl_cpos2id(), do_cqi_cl_cpos2str(), do_cqi_cl_id2cpos(), do_cqi_cl_id2freq(), do_cqi_cl_id2str(), do_cqi_cl_idlist2cpos(), do_cqi_cl_lexicon_size(), do_cqi_cl_regex2id(), do_cqi_cl_str2id(), do_cqi_corpus_attributes(), do_IDReference(), do_LabelReference(), do_SimpleVariableReference(), do_StringConstraint(), evaluate_target(), get_matched_corpus_positions(), interpreter(), lexdecode_show(), main(), print_tabulation(), read_mapping(), red_factor(), scancorpus_add_key(), Setop(), setup_attribute(), SortSubcorpus(), SystemCorpusSize(), update_context_descriptor(), and VerifyVariable().
#define ATT_REAL ( ATT_POS | ATT_STRUC | ATT_ALIGN ) |
shorthand for "any / all types of attribute except dynamic"
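Because each attribute type occupies its own bit, the shorthand masks can be tested with a bitwise AND. The sketch below repeats the macro values given in this header so that it compiles on its own; att_type_in_mask() is a hypothetical helper, not a CL function.

```c
/* Values copied from the macro definitions in this header. */
#define ATT_NONE  0
#define ATT_POS   (1<<0)
#define ATT_STRUC (1<<1)
#define ATT_ALIGN (1<<2)
#define ATT_DYN   (1<<6)
#define ATT_ALL   ( ATT_POS | ATT_STRUC | ATT_ALIGN | ATT_DYN )
#define ATT_REAL  ( ATT_POS | ATT_STRUC | ATT_ALIGN )

/* Hypothetical helper: does a single attribute type fall within a mask? */
static int att_type_in_mask(int type, int mask)
{
    return (type & mask) != 0;
}
```

For example, att_type_in_mask(ATT_DYN, ATT_REAL) is false, whereas att_type_in_mask(ATT_DYN, ATT_ALL) is true.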
#define ATT_STRUC (1<<1) |
Structural attributes, i.e. a set of SGML/XML-ish "regions" in the corpus delimited by the same SGML/XML tag.
Referenced by aid_name(), cl_cpos2struc2cpos(), cl_cpos2struc_oldstyle(), cl_struc2cpos(), cl_struc2str(), cl_struc_values(), compute_grouping(), ComputePrintStructures(), decode_print_token_sequence(), describecorpus_show_basic_info(), describecorpus_show_statistics(), do_attribute_show(), do_cqi_cl_attribute_size(), do_cqi_cl_cpos2lbound(), do_cqi_cl_cpos2rbound(), do_cqi_cl_cpos2struc(), do_cqi_cl_struc2cpos(), do_cqi_cl_struc2str(), do_cqi_corpus_attributes(), do_cqi_corpus_structural_attribute_has_values(), do_Description(), do_IDReference(), do_LabelReference(), do_StructuralContext(), do_XMLTag(), evaluate_target(), findcorpus(), get_nr_of_strucs(), interpreter(), main(), print_tabulation(), scancorpus_add_key(), setup_attribute(), update_context_descriptor(), and verify_context_descriptor().
#define ATTAT_FLOAT 5 |
Dynamic att argument type: floating point.
Referenced by argid_name(), attat_name(), cl_dynamic_call(), eval_bool(), get_leaf_value(), and makearg().
#define ATTAT_INT 3 |
Dynamic att argument type: integer.
Referenced by argid_name(), attat_name(), call_predefined_function(), cl_dynamic_call(), eval_bool(), get_leaf_value(), and makearg().
#define ATTAT_NONE 0 |
Dynamic att argument type: none.
Referenced by argid_name(), attat_name(), call_predefined_function(), cl_dynamic_call(), eval_bool(), and get_leaf_value().
#define ATTAT_PAREF 6 |
Dynamic att argument type: ??
Referenced by argid_name(), attat_name(), call_predefined_function(), cl_dynamic_call(), eval_bool(), and get_leaf_value().
#define ATTAT_POS 1 |
Dynamic att argument type: ??
Referenced by argid_name(), attat_name(), call_predefined_function(), cl_dynamic_call(), eval_bool(), get_leaf_value(), makearg(), and setup_attribute().
#define ATTAT_STRING 2 |
Dynamic att argument type: string.
Referenced by argid_name(), attat_name(), call_predefined_function(), cl_dynamic_call(), eval_bool(), get_leaf_value(), and makearg().
#define ATTAT_VAR 4 |
Dynamic att argument type: variable number of string arguments (only in arglist)
Referenced by argid_name(), attat_name(), cl_dynamic_call(), cl_dynamic_numargs(), eval_bool(), and makearg().
#define attr_drop_attribute | ( | a | ) | cl_delete_attribute(a) |
#define call_dynamic_attribute | ( | a, | |
dcr, | |||
args, | |||
nr_args | |||
) | cl_dynamic_call(a, dcr, args, nr_args) |
Referenced by get_leaf_value().
#define CDA_EALIGN -9 |
Error code: no alignment at position.
Referenced by cl_cpos2alg(), cl_error_string(), and get_extended_alignment().
#define CDA_EARGS -12 |
Error code: error in arguments for dynamic call.
Referenced by cl_dynamic_call(), and cl_error_string().
#define CDA_EATTTYPE -2 |
Error code: function was called on illegal attribute.
Referenced by cl_error_string(), and send_cl_error().
#define CDA_EBADREGEX -16 |
Error code: bad regular expression.
Referenced by cl_error_string(), cl_new_regex(), cl_regex2id(), and send_cl_error().
#define CDA_EBUFFER -18 |
Error code: buffer overflow (hard-coded internal buffer sizes)
Referenced by cl_error_string(), and cl_set_intersection().
#define CDA_EFSETINV -17 |
Error code: invalid feature set format.
Referenced by cl_error_string(), cl_make_set(), cl_set_intersection(), and cl_set_size().
#define CDA_EIDORNG -3 |
Error code: id out of range.
Referenced by cl_error_string(), cl_id2cpos_oldstyle(), cl_id2str(), cl_id2strlen(), cl_idlist2cpos_oldstyle(), cl_new_stream(), and send_cl_error().
#define CDA_EIDXORNG -5 |
Error code: index out of range.
Referenced by cl_alg2cpos(), cl_error_string(), cl_id2freq(), cl_sort2id(), cl_struc2cpos(), cl_struc2str(), and send_cl_error().
#define CDA_EINTERNAL -19 |
Error code: internal data consistency error (really bad)
Referenced by cl_error_string(), and cl_struc2str().
#define CDA_ENODATA -11 |
Error code: can't load/create necessary data.
Referenced by cl_alg2cpos(), cl_cpos2alg(), cl_cpos2alg2cpos_oldstyle(), cl_cpos2id(), cl_cpos2struc2cpos(), cl_cpos2struc_oldstyle(), cl_error_string(), cl_id2cpos_oldstyle(), cl_id2freq(), cl_id2sort(), cl_id2str(), cl_id2strlen(), cl_idlist2cpos_oldstyle(), cl_idlist2freq(), cl_max_alg(), cl_max_cpos(), cl_max_id(), cl_new_stream(), cl_regex2id(), cl_sort2id(), cl_str2id(), cl_struc2cpos(), cl_struc2str(), get_nr_of_strucs(), and send_cl_error().
#define CDA_ENOMEM -13 |
Error code: memory fault [unused].
Referenced by cl_error_string(), and send_cl_error().
#define CDA_ENOSTRING -6 |
Error code: no such string encoded.
Referenced by cl_error_string(), and cl_str2id().
#define CDA_ENULLATT -1 |
Error code: NULL passed as attribute argument.
Referenced by cl_error_string().
#define CDA_ENYI -15 |
Error code: not yet implemented.
Referenced by cl_error_string(), cl_id2sort(), and send_cl_error().
#define CDA_EOTHER -14 |
Error code: other error.
Referenced by cl_error_string(), cl_id2strlen(), cl_str2id(), and send_cl_error().
#define CDA_EPATTERN -7 |
Error code: illegal pattern.
Referenced by cl_error_string(), and send_cl_error().
#define CDA_EPOSORNG -4 |
Error code: position out of range.
Referenced by cl_cpos2alg(), cl_cpos2alg2cpos_oldstyle(), cl_cpos2id(), cl_error_string(), get_leaf_value(), and send_cl_error().
#define CDA_EREMOTE -10 |
Error code: error in remote access.
Referenced by cl_error_string().
#define CDA_ESTRUC -8 |
Error code: no structure at position.
Referenced by cl_cpos2boundary(), cl_cpos2struc2cpos(), cl_cpos2struc_oldstyle(), and cl_error_string().
#define CDA_OK 0 |
Error code: everything is fine; actual error values are all less than 0.
Referenced by call_predefined_function(), check_alignment_constraints(), cl_alg2cpos(), cl_cpos2alg(), cl_cpos2alg2cpos_oldstyle(), cl_cpos2id(), cl_cpos2str(), cl_cpos2struc2cpos(), cl_cpos2struc_oldstyle(), cl_dynamic_call(), cl_dynamic_numargs(), cl_error_string(), cl_id2all(), cl_id2cpos_oldstyle(), cl_id2freq(), cl_id2sort(), cl_id2str(), cl_id2strlen(), cl_idlist2cpos_oldstyle(), cl_idlist2freq(), cl_make_set(), cl_max_alg(), cl_max_cpos(), cl_max_id(), cl_new_regex(), cl_new_stream(), cl_regex2id(), cl_set_intersection(), cl_set_size(), cl_sort2id(), cl_str2id(), cl_struc2cpos(), cl_struc2str(), cl_struc_values(), compress_reversed_index(), compute_code_lengths(), decode_check_huff(), decode_print_token_sequence(), decompress_check_reversed_index(), do_cqi_cl_regex2id(), eval_bool(), get_corpus_positions(), get_leaf_value(), get_nr_of_strucs(), get_position_values(), lexdecode_print_item_info(), lexdecode_show(), map_token_to_class_number(), meet_mu(), member_of_class_s(), OptimizeStringConstraint(), read_mapping(), send_cl_error(), and Setop().
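Since CDA_OK is 0 and all actual error values are less than 0, a caller can separate success from failure with a single comparison. The sketch below repeats a few of the codes from this header; cda_is_error() and cda_describe() are hypothetical helpers written for illustration (within the CL itself, error messages come from cl_error_string()).

```c
#include <stddef.h>

/* A subset of the error codes, copied from this header. */
#define CDA_OK        0
#define CDA_ENULLATT -1
#define CDA_EATTTYPE -2
#define CDA_EIDORNG  -3
#define CDA_EPOSORNG -4

/* Hypothetical helper: true for any value in the error range. */
static int cda_is_error(int code)
{
    return code < CDA_OK;
}

/* Hypothetical helper: short description for the codes listed above. */
static const char *cda_describe(int code)
{
    switch (code) {
    case CDA_OK:       return "everything is fine";
    case CDA_ENULLATT: return "NULL passed as attribute argument";
    case CDA_EATTTYPE: return "function was called on illegal attribute";
    case CDA_EIDORNG:  return "id out of range";
    case CDA_EPOSORNG: return "position out of range";
    default:           return cda_is_error(code) ? "other CL error"
                                                 : "not an error code";
    }
}
```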
#define cderrno cl_errno |
Referenced by call_predefined_function(), check_alignment_constraints(), compute_code_lengths(), decode_check_huff(), do_cqi_cl_regex2id(), ensure_corpus_size(), eval_bool(), get_leaf_value(), get_position_values(), map_token_to_class_number(), meet_mu(), member_of_class_s(), OptimizeStringConstraint(), and read_mapping().
#define cdperror | ( | message | ) | cl_error(message) |
Referenced by compute_code_lengths().
#define cdperror_string | ( | no | ) | cl_error_string(no) |
Referenced by ensure_corpus_size(), and OptimizeStringConstraint().
#define central_corpus_directory | ( | ) | cl_standard_registry() |
Referenced by main().
#define CHARSET_FOR_IDENTIFIERS ascii |
"Dummy" charset macro for calling cl_string_canonical
We have a problem - CorpusCharsets are attached to corpora. So what charset do we use with cl_string_canonical if we are calling it on a string that does not (yet) have a corpus?
The answer: CHARSET_FOR_IDENTIFIERS. This should only be used as the 2nd argument to cl_string_canonical when the string is an identifier for a corpus, attribute, or whatever.
Note that it is ASCII in v3.2.x+, which breaks backwards compatibility with 2.2.x, where Latin1 was allowed for identifiers.
#define CL_DYN_STRING_SIZE 2048 |
maximum size of 'dynamic' strings
Referenced by call_predefined_function(), and cl_set_intersection().
#define cl_free | ( | p | ) | do { if ((p) != NULL) { free(p); p = NULL; } } while (0) |
Safely frees memory.
p | Pointer to memory to be freed. |
Referenced by add_hosts_in_subnet_to_list(), after_Query(), assign_temp_to_sub(), attach_subcorpus(), cl_delete_attribute(), cl_delete_corpus(), cl_delete_int_list(), cl_delete_lexhash(), cl_delete_lexhash_entry(), cl_delete_regex(), cl_delete_string_list(), cl_free_string_list(), cl_id2cpos_oldstyle(), cl_lexhash_check_grow(), cl_make_set(), cl_new_corpus(), cl_new_regex(), cl_regex2id(), cl_string_canonical(), cl_string_qsort_compare(), comp_drop_component(), creat_rev_corpus(), cwbci_check_line(), delete_interval(), delete_intervals(), DestroyAttributeList(), do_AddSubVariables(), do_cqi_cl_cpos2lbound(), do_cqi_cl_cpos2rbound(), do_cqi_cl_cpos2struc(), do_cqi_cqp_fdist_1(), do_cqi_cqp_fdist_2(), do_flagged_re_variable(), do_IDReference(), do_LabelReference(), do_SearchPattern(), do_StandardQuery(), do_undump(), do_XMLTag(), drop_mapping(), drop_single_mapping(), DropVariable(), encode_add_wattr_line(), encode_generate_registry_file(), encode_scan_directory(), evaltree2searchstr(), evaluate_target(), execute_side_effects(), expand_macro(), free_booltree(), free_environment(), free_group(), free_matchlist(), free_tabulation_list(), FreeIDList(), FreeSortClause(), get_fulllocalpath(), get_matched_corpus_positions(), initialize_cl(), initialize_cqp(), load_macro_file(), MacroHashDelete(), main(), matchfirstpattern(), meet_mu(), open_input_stream(), open_pager(), open_temporary_file(), OptimizeStringConstraint(), print_tabulation(), range_close(), range_declare(), range_open(), RangeSetop(), RangeSort(), RecomputeAL(), RemoveNameFromAL(), sencode_check_set(), sencode_parse_line(), set_context_option_value(), set_corpus_matchlists(), set_target(), Setop(), SL_delete(), SortExternally(), SortSubcorpus(), SortSubcorpusRandomize(), split_subcorpus_spec(), Unchain(), validate_revcorp(), VariableDeleteItems(), VariableSubtractItem(), VerifyList(), and VerifyVariable().
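The effect of the macro can be seen in a small sketch. The macro body is repeated from this header so that the example compiles on its own; free_twice_is_safe() is a hypothetical demonstration function. Note that the argument must be an lvalue, because the macro assigns NULL to it.

```c
#include <stdlib.h>

/* Macro body copied from this header. */
#define cl_free(p) do { if ((p) != NULL) { free(p); p = NULL; } } while (0)

/* Hypothetical demo: returns 1 if the pointer is NULL after two cl_free()
 * calls, 0 if the initial allocation failed. */
static int free_twice_is_safe(void)
{
    char *buf = malloc(16);

    if (buf == NULL)
        return 0;
    cl_free(buf);   /* frees the block and sets buf to NULL */
    cl_free(buf);   /* does nothing: buf is already NULL */
    return buf == NULL;
}
```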
#define cl_id2cpos | ( | a, | |
id, | |||
freq | |||
) | cl_id2cpos_oldstyle(a, id, freq, NULL, 0) |
Gets all the corpus positions where the specified item is found on the given P-attribute.
a | The P-attribute to look on. |
id | The id of the item to look for. |
freq | The frequency of the specified item is written here. This will be 0 in the case of errors. |
Referenced by do_cqi_cl_id2cpos().
#define cl_idlist2cpos | ( | a, | |
idlist, | |||
idlist_size, | |||
sort, | |||
size | |||
) | cl_idlist2cpos_oldstyle(a, idlist, idlist_size, sort, size, NULL, 0) |
Gets a list of corpus positions matching a list of ids.
a | The P-attribute we are looking in |
idlist | A list of item ids (i.e. id codes for items on this attribute). |
idlist_size | The length of this list. |
sort | boolean: return sorted list? |
size | The size of the allocated table will be placed here. |
Referenced by do_cqi_cl_idlist2cpos(), and get_corpus_positions().
#define CL_MAX_FILENAME_LENGTH 1024 |
String buffer size constant (for filenames).
This constant can be used for declaring character arrays that will only contain a filename (or path). It is expected that this will be shorter than CL_MAX_LINE_LENGTH.
Referenced by attach_subcorpus(), check_stamp(), compress_reversed_index(), decompress_check_reversed_index(), ensure_corpus_size(), expand_filename(), get_fulllocalpath(), load_corpusnames(), main(), open_file(), and save_subcorpus().
#define CL_MAX_LINE_LENGTH 4096 |
General string buffer size constant.
This constant is used to determine the maximum length (in bytes) of a line in a CWB input file. It therefore follows that no s-attribute or p-attribute can ever be longer than this. It's also the normal constant to use for (a) a local or global declaration of a character array (b) dynamic memory allocation of a string buffer. The associated function cl_strcpy() will copy this many bytes at most.
Referenced by alignshow_goodbye(), alignshow_print_next_region(), alignshow_skip_next_region(), cl_dynamic_call(), cl_new_regex(), cl_strcpy(), cl_string_qsort_compare(), compute_code_lengths(), ComputeGroupExternally(), corpus_info(), decode_check_huff(), decode_string_escape(), do_undump(), encode_add_wattr_line(), encode_get_input_line(), get_field_separators(), get_next_range(), get_position_values(), get_print_attribute_values(), html_convert_string(), latex_convert_string(), lexdecode_show(), load_corpusnames(), main(), ParsePrintOptions(), process_fd(), push_regchr(), range_close(), range_declare(), read_mapping(), scancorpus_add_key(), sencode_open_files(), SetVariableValue(), sgml_convert_string(), SortExternally(), update_grain_buffer(), and wattr_declare().
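A buffer declared with this constant is typically filled by a bounded copy. The sketch below shows one way of doing that with standard C only; copy_line() is a hypothetical helper and is not claimed to match the implementation of cl_strcpy().

```c
#include <string.h>

#define CL_MAX_LINE_LENGTH 4096   /* value copied from this header */

/* Hypothetical helper: copies src into dst, which must hold at least
 * CL_MAX_LINE_LENGTH bytes. Longer input is truncated so that the result
 * is always NUL-terminated. Returns the number of characters copied. */
static size_t copy_line(char *dst, const char *src)
{
    size_t n = strlen(src);

    if (n >= CL_MAX_LINE_LENGTH)
        n = CL_MAX_LINE_LENGTH - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```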
#define cl_new_attribute | ( | c, | |
name, | |||
type | |||
) | cl_new_attribute_oldstyle(c, name, type, NULL) |
Finds an attribute that matches the specified parameters, if one exists, for the given corpus.
Note that although this is a cl_new_* function, and it is the canonical way that we get an Attribute to call Attribute-functions on, it doesn't actually create any kind of object. The Attribute exists already as one of the dependents of the Corpus object; this function simply locates it and returns a pointer to it.
This "function" is implemented as a macro wrapped round the deprecated function, making the means of calling it more in line with the rest of the CL.
corpus | The corpus in which to search for the attribute. |
attribute_name | The name of the attribute (i.e. the handle it has in the registry file). |
type | Type of attribute to be searched for. |
Referenced by cqi_lookup_attribute(), describecorpus_show_basic_info(), do_XMLTag(), lexdecode_show(), main(), print_tabulation(), scancorpus_add_key(), and setup_attribute().
#define cl_xml_is_name_char | ( | c | ) |
( ( c >= 'A' && c <= 'Z') || \
  ( c >= 'a' && c <= 'z') || \
  ( c >= '0' && c <= '9') || \
  ( (unsigned char) c >= 0x80 \
    && (unsigned char) c <= 0xff \
  ) || \
  ( c == '-') || \
  ( c == '_') \
)
For a given character, say whether it is legal for an XML name.
TODO: Currently, anything in the upper half of the 8-bit range is allowed (in the old Latin1 days this was anything from 0xa0 to 0xff). This will work with any non-ascii character set, but is almost certainly too lax.
c | Character to check. (It is expected to be a char, so is typecast to unsigned char for comparison with upper-128 hex values.) |
Referenced by main(), and range_open().
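A typical use is to apply the test to every character of a candidate name. The sketch below repeats the macro from this header; is_xml_name() is a hypothetical helper, and its rejection of the empty string is an assumption of the sketch rather than documented CL behaviour. Because the macro does not parenthesise its argument, it is passed a plain variable.

```c
/* Macro copied from this header. */
#define cl_xml_is_name_char(c)                \
    ( ( c >= 'A' && c <= 'Z') ||              \
      ( c >= 'a' && c <= 'z') ||              \
      ( c >= '0' && c <= '9') ||              \
      ( (unsigned char) c >= 0x80             \
        && (unsigned char) c <= 0xff          \
      ) ||                                    \
      ( c == '-') ||                          \
      ( c == '_')                             \
    )

/* Hypothetical helper: returns 1 if s is non-empty and every character
 * passes cl_xml_is_name_char(), otherwise 0. */
static int is_xml_name(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        char c = *s;
        if (!cl_xml_is_name_char(c))
            return 0;
    }
    return 1;
}
```

Note that this per-character test accepts names that start with a digit or a hyphen.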
#define ClosePositionStream | ( | ps | ) | cl_delete_stream(ps) |
#define collect_matches | ( | a, | |
idlist, | |||
idlist_size, | |||
sort, | |||
size, | |||
rl, | |||
rls | |||
) | cl_idlist2cpos_oldstyle(a, idlist, idlist_size, sort, size, rl, rls) |
Referenced by calculate_initial_matchlist_1().
#define collect_matching_ids | ( | a, | |
re, | |||
flags, | |||
size | |||
) | cl_regex2id(a, re, flags, size) |
Referenced by OptimizeStringConstraint().
#define cumulative_id_frequency | ( | a, | |
list, | |||
size | |||
) | cl_idlist2freq(a, list, size) |
Referenced by cl_idlist2cpos_oldstyle().
#define drop_corpus | ( | c | ) | cl_delete_corpus(c) |
Referenced by huffcode_usage().
#define find_attribute | ( | c, | |
name, | |||
type, | |||
data | |||
) | cl_new_attribute_oldstyle(c, name, type, data) |
Referenced by compute_grouping(), ComputePrintStructures(), do_attribute_show(), do_Description(), do_IDReference(), do_LabelReference(), do_SimpleVariableReference(), do_StringConstraint(), do_StructuralContext(), evaluate_target(), findcorpus(), FunctionCall(), prepare_AlignmentConstraints(), printAlignedStrings(), read_mapping(), RecomputeAL(), red_factor(), Setop(), SortSubcorpus(), SystemCorpusSize(), update_context_descriptor(), verify_context_descriptor(), and VerifyList().
#define get_alg_attribute | ( | a, | |
p, | |||
start1, | |||
end1, | |||
start2, | |||
end2 | |||
) | cl_cpos2alg2cpos_oldstyle(a, p, start1, end1, start2, end2) |
#define get_attribute_size | ( | a | ) | cl_max_cpos(a) |
Referenced by cl_id2cpos_oldstyle(), cl_new_stream(), compose_kwic_line(), and SystemCorpusSize().
#define get_bounds_of_nth_struc | ( | a, | |
struc, | |||
start, | |||
end | |||
) | cl_struc2cpos(a, struc, start, end) |
Referenced by calculate_ranges(), and feature_match().
#define get_id_at_position | ( | a, | |
cpos | |||
) | cl_cpos2id(a, cpos) |
Referenced by eval_bool(), feature_match(), get_leaf_value(), and get_position_values().
#define get_id_frequency | ( | a, | |
id | |||
) | cl_id2freq(a, id) |
Referenced by call_predefined_function(), cl_id2all(), cl_id2cpos_oldstyle(), cl_new_stream(), and compute_code_lengths().
#define get_id_from_sortidx | ( | a, | |
sid | |||
) | cl_sort2id(a, sid) |
#define get_id_info | ( | a, | |
sid, | |||
freq, | |||
len | |||
) | cl_id2all(a, sid, freq, len) |
#define get_id_of_string | ( | a, | |
str | |||
) | cl_str2id(a, str) |
#define get_id_range | ( | a | ) | cl_max_id(a) |
Referenced by cl_id2cpos_oldstyle(), cl_new_stream(), and OptimizeStringConstraint().
#define get_id_string_len | ( | a, | |
id | |||
) | cl_id2strlen(a, id) |
Referenced by cl_id2all().
#define get_nr_of_strucs | ( | a, | |
nr | |||
) | cl_max_struc_oldstyle(a, nr) |
#define get_num_of_struc | ( | a, | |
p, | |||
num | |||
) | cl_cpos2struc_oldstyle(a, p, num) |
Referenced by calculate_ranges(), and structure_value_at_position().
#define get_path_component cl_path_get_component |
Referenced by load_corpusnames().
#define get_positions | ( | a, | |
id, | |||
freq, | |||
rl, | |||
rls | |||
) | cl_id2cpos_oldstyle(a, id, freq, rl, rls) |
Referenced by calculate_initial_matchlist_1(), and cl_idlist2cpos_oldstyle().
#define get_sortidxpos_of_id | ( | a, | |
id | |||
) | cl_id2sort(a, id) |
#define get_string_at_position | ( | a, | |
cpos | |||
) | cl_cpos2str(a, cpos) |
Referenced by alignshow_print_next_region(), get_leaf_value(), and get_position_values().
#define get_string_of_id | ( | a, | |
id | |||
) | cl_id2str(a, id) |
Referenced by call_predefined_function(), cl_id2all(), cl_id2strlen(), compute_code_lengths(), eval_bool(), and print_mapping().
#define get_struc_attribute | ( | a, | |
cpos, | |||
start, | |||
end | |||
) | cl_cpos2struc2cpos(a, cpos, start, end) |
Referenced by calculate_ranges(), eval_bool(), get_leaf_value(), meet_mu(), and simulate().
#define IGNORE_CASE 1 |
Flag ignore-case in regular expression engine.
Referenced by cl_new_regex(), cl_string_canonical(), cl_string_maptable(), main(), print_pattern(), and scancorpus_add_key().
#define IGNORE_DIAC 2 |
Flag ignore-diacritics in regular expression engine.
Referenced by cl_new_regex(), cl_string_canonical(), cl_string_maptable(), main(), print_pattern(), and scancorpus_add_key().
#define IGNORE_REGEX 4 |
Flag for: don't use regular expression engine - match as a literal string.
Referenced by do_flagged_re_variable(), do_flagged_string(), do_mval_string(), do_XMLTag(), and print_pattern().
#define inverted_file_is_compressed | ( | a | ) | cl_index_compressed(a) |
Referenced by cl_id2cpos_oldstyle().
#define item_sequence_is_compressed | ( | a | ) | cl_sequence_compressed(a) |
Referenced by cl_cpos2id(), cl_max_cpos(), and load_component().
#define nr_of_arguments | ( | a | ) | cl_dynamic_numargs(a) |
#define OpenPositionStream | ( | a, | |
id | |||
) | cl_new_stream(a, id) |
#define setup_corpus | ( | reg, | |
name | |||
) | cl_new_corpus(reg, name) |
Referenced by GetSystemCorpus(), and printAlignedStrings().
#define STRUC_INSIDE 1 |
cl_cpos2boundary() return flag: specified position is WITHIN a region of this s-attribute
Referenced by cl_cpos2boundary().
#define STRUC_LBOUND 2 |
cl_cpos2boundary() return flag: specified position is AT THE START BOUNDARY OF a region of this s-attribute
Referenced by cl_cpos2boundary().
#define STRUC_RBOUND 4 |
cl_cpos2boundary() return flag: specified position is AT THE END BOUNDARY OF a region of this s-attribute
Referenced by cl_cpos2boundary().
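The three STRUC_ values are distinct bits, which suggests that a return value may carry several of them at once. On that assumption (it is not stated explicitly here), the flags can be tested as in the sketch below; both helper functions are hypothetical.

```c
/* Values copied from the macro definitions in this header. */
#define STRUC_INSIDE 1
#define STRUC_LBOUND 2
#define STRUC_RBOUND 4

/* Hypothetical helper: is the position at the start of a region? */
static int is_region_start(int flags)
{
    return (flags & STRUC_LBOUND) != 0;
}

/* Hypothetical helper: is the position both the first and the last token
 * of a region, i.e. a region of length one? */
static int is_single_token_region(int flags)
{
    return (flags & STRUC_LBOUND) && (flags & STRUC_RBOUND);
}
```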
#define structure_has_values | ( | a | ) | cl_struc_values(a) |
Referenced by ComputePrintStructures(), do_LabelReference(), and update_context_descriptor().
#define structure_value | ( | a, | |
struc | |||
) | cl_struc2str(a, struc) |
Referenced by structure_value_at_position().
#define structure_value_at_position | ( | a, | |
cpos | |||
) | cl_cpos2struc2str(a, cpos) |
typedef union _Attribute Attribute |
The Attribute object: an entire segment of a corpus, such as an annotation field, an XML structure, or a set.
The attribute can be of any flavour (s, p etc); this information is specified internally.
Note that each Attribute object is associated with a particular corpus. They aren't abstract, i.e. every corpus has a "word" p-attribute but any Attribute object for a "word" refers to the "word" of a specific corpus, not to "word" attributes in general.
typedef struct _CL_BitVec* CL_BitVec |
The CL_BitVec object: doesn't seem to exist {???-- AH}.
typedef struct _cl_int_list* cl_int_list |
Automatically growing list of integers (just what you always need ...)
typedef struct _cl_lexhash* cl_lexhash |
The cl_lexhash class (lexicon hashes, with IDs and frequency counts)
A "lexicon hash" links strings to integers. Each cl_lexhash object represents an entire table of such things; individual string-to-int links are represented by cl_lexhash_entry objects.
Within the cl_lexhash, the entries are grouped into buckets. A bucket is the term for a "slot" on the hash table. The linked list in a given bucket represents all the different string-keys that map to one particular index value.
Each entry contains the key itself (for search-and-retrieval), the frequency of that type (incremented when a token is added that is already in the lexhash), an ID integer, plus a bundle of "data" associated with that string.
These lexicon hashes are used, notably, in the encoding of corpora to CWB-index-format.
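The bucket-and-entry layout described above can be illustrated with a toy version. This is a sketch of the general technique only: every name beginning with demo_ is hypothetical, the table size is fixed, the hash function is an arbitrary choice, and the per-entry "data" bundle is omitted. It is not the CL implementation.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_BUCKETS 17             /* hypothetical fixed table size */

typedef struct demo_entry {
    char *key;                      /* the string itself */
    int freq;                       /* number of tokens added for this key */
    int id;                         /* assigned in order of first occurrence */
    struct demo_entry *next;        /* next entry in the same bucket */
} demo_entry;

typedef struct {
    demo_entry *bucket[DEMO_BUCKETS];
    int next_id;
} demo_lexhash;

static unsigned demo_hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h % DEMO_BUCKETS;
}

/* Looks a key up without modifying the table; NULL if it is absent. */
static demo_entry *demo_find(const demo_lexhash *lh, const char *key)
{
    demo_entry *e;
    for (e = lh->bucket[demo_hash(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e;
    return NULL;
}

/* Adds one token: increments freq if the key is already present, otherwise
 * creates an entry with the next free id. Returns NULL if out of memory. */
static demo_entry *demo_add(demo_lexhash *lh, const char *key)
{
    unsigned b = demo_hash(key);
    demo_entry *e = demo_find(lh, key);

    if (e != NULL) {
        e->freq++;
        return e;
    }
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->key = malloc(strlen(key) + 1);
    if (e->key == NULL) {
        free(e);
        return NULL;
    }
    strcpy(e->key, key);
    e->freq = 1;
    e->id = lh->next_id++;
    e->next = lh->bucket[b];        /* push onto the bucket's linked list */
    lh->bucket[b] = e;
    return e;
}

/* Frees all entries and resets the table to its empty state. */
static void demo_clear(demo_lexhash *lh)
{
    int b;
    for (b = 0; b < DEMO_BUCKETS; b++) {
        while (lh->bucket[b] != NULL) {
            demo_entry *e = lh->bucket[b];
            lh->bucket[b] = e->next;
            free(e->key);
            free(e);
        }
    }
    lh->next_id = 0;
}
```

Adding the same string twice yields a single entry with a frequency of 2, which is the behaviour needed when building a lexicon while encoding a corpus.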
typedef struct _cl_lexhash_entry * cl_lexhash_entry |
Underlying structure for the cl_lexhash_entry class.
Unlike most underlying structures, this is public in the CL API.
The CL_Regex object: an optimised regular expression.
The CL regex engine wraps around another regex library (v3.1.x: POSIX, will be PCRE in v3.2.0+) to implement CL semantics. These are: (a) the engine always matches the entire string; (b) there is support for case-/diacritic-insensitive matching; (c) certain optimisations are implemented.
Associated with the CL regular expression engine are macros for three flags: IGNORE_CASE, IGNORE_DIAC and IGNORE_REGEX. All three are used by the related cl_regex2id(), but only the first two are used by the CL_Regex object.
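Since the three flags are distinct bits, they can be combined with a bitwise OR and separated again with a mask. The sketch below repeats the flag values from this header; the two helper functions are hypothetical and simply reflect the division of labour described above.

```c
/* Values copied from the macro definitions in this header. */
#define IGNORE_CASE  1
#define IGNORE_DIAC  2
#define IGNORE_REGEX 4

/* Hypothetical helper: keeps only the flags that the CL_Regex object uses. */
static int regex_object_flags(int flags)
{
    return flags & (IGNORE_CASE | IGNORE_DIAC);
}

/* Hypothetical helper: was literal (non-regex) matching requested? */
static int wants_literal_match(int flags)
{
    return (flags & IGNORE_REGEX) != 0;
}
```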
typedef struct _cl_string_list* cl_string_list |
Automatically growing list of strings (just what you always need ...)
The Corpus object: contains information on a loaded corpus, including all its attributes.
typedef enum ECorpusCharset CorpusCharset |
The CorpusCharset object: an identifier for one of the character sets supported by CWB.
(Note on adding new character sets: add them immediately before unknown_charset. Do not change the order of existing charsets. Remember to update the special-chars module if you do so.)
typedef struct TCorpusProperty * CorpusProperty |
The CorpusProperty object.
The underlying structure takes the form of a linked-list entry.
Note that unlike most CL objects, the underlying structure is exposed in the public API.
Each Corpus object has, as one of its members, the head entry on a list of CorpusProperties.
typedef struct _DCR DynCallResult |
The DynCallResult object (needed to allocate space for dynamic function arguments)
typedef struct _position_stream_rec_* PositionStream |
The PositionStream object: gives stream-like reading of an Attribute.
enum ECorpusCharset |
The CorpusCharset object: an identifier for one of the character sets supported by CWB.
(Note on adding new character sets: add them immediately before unknown_charset. Do not change the order of existing charsets. Remember to update the special-chars module if you do so.)
int cl_alg2cpos | ( | Attribute * | attribute, |
int | alg, | ||
int * | source_region_start, | ||
int * | source_region_end, | ||
int * | target_region_start, | ||
int * | target_region_end | ||
) |
Gets the corpus positions of an alignment on the given align-attribute.
Note that four corpus positions are retrieved, into the addresses given as parameters.
attribute | The align-attribute to look on. |
alg | The ID of the alignment whose positions are wanted. |
source_region_start | Location to put source corpus start position. |
source_region_end | Location to put source corpus end position. |
target_region_start | Location to put target corpus start position. |
target_region_end | Location to put target corpus end position. |
References CDA_EIDXORNG, CDA_ENODATA, CDA_OK, cl_errno, cl_has_extended_alignment(), CompAlignData, CompXAlignData, TMblob::data, TComponent::data, ensure_component(), and TComponent::size.
Referenced by check_alignment_constraints(), compose_kwic_line(), decode_print_token_sequence(), do_cqi_cl_alg2cpos(), and printAlignedStrings().
void* cl_calloc | ( | size_t | nr_of_elements, |
size_t | element_size | ||
) |
Safely allocates memory calloc-style.
nr_of_elements | Number of elements to allocate |
element_size | Size of each element |
Referenced by alloc_mblob(), cl_new_int_list(), cl_new_lexhash(), cl_new_string_list(), cl_regex2id(), compute_code_lengths(), evaluate_target(), main(), range_declare(), and validate_revcorp().
CorpusCharset cl_charset_from_name | ( | char * | name | ) |
Gets a CorpusCharset enumeration with the id code for the given string.
References _charset_spec::name, and unknown_charset.
Referenced by add_corpus_property(), cwbci_parse_options(), and main().
char* cl_charset_name | ( | CorpusCharset | id | ) |
Gets a string containing the name of the specified CorpusCharset character set object.
Note that the returned string must not be modified. TODO: it should probably be a const char *.
References _charset_spec::name.
Referenced by corpus_info().
char* cl_charset_name_canonical | ( | char * | name_to_check | ) |
Checks whether a string represents a valid charset, and returns a pointer to the name in canonical form (i.e. lacking any non-standard case there may be in the input string).
Note that the returned string cannot be modified.
name_to_check | String containing the character set name to be checked |
References _charset_spec::name.
Referenced by cwbci_parse_options(), and encode_parse_options().
CorpusCharset cl_corpus_charset | ( | Corpus * | corpus | ) |
Retrieves the special 'charset' property from a Corpus object.
corpus | The corpus object from which to retrieve the charset |
References TCorpus::charset.
Referenced by decode_print_xml_declaration(), and scancorpus_add_key().
cl_string_list cl_corpus_list_attributes | ( | Corpus * | corpus, |
int | attribute_type | ||
) |
Gets a list of the named attributes that this corpus possesses.
This function creates a list of strings containing the names of all and only those Attributes in this corpus whose type matches that specified in the second parameter.
corpus | The corpus whose attributes are to be listed. |
attribute_type | The type of attributes to be listed. This must be one of the attribute type macros: ATT_POS, ATT_STRUC etc. For all attributes, specify ATT_ALL (naturally). |
References _Attribute::any, TCorpus::attributes, cl_new_string_list(), cl_strdup(), and cl_string_list_append().
char* cl_corpus_property | ( | Corpus * | corpus, |
char * | property | ||
) |
Gets the value of the specified corpus property.
corpus | Pointer to the Corpus object. |
property | Name of the property to retrieve. |
References cl_first_corpus_property(), cl_next_corpus_property(), TCorpusProperty::property, and TCorpusProperty::value.
Referenced by add_corpus_property(), and corpus_info().
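The reference list above suggests that the lookup walks the corpus's property list from its first entry to the next until the name matches. A minimal sketch of that idea follows. The demo_property struct and demo_property_lookup() are hypothetical stand-ins, not the CL's own types or functions, and returning NULL for a missing property is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a corpus property list
 * (NOT the real CL data structure). */
typedef struct demo_property {
  const char *name;
  const char *value;
  struct demo_property *next;
} demo_property;

/* Walk the list from its first entry and return the value of the first
 * property whose name matches; return NULL if there is no such property
 * (the NULL result is an assumption of this sketch). */
const char *
demo_property_lookup(const demo_property *first, const char *name)
{
  const demo_property *p;

  assert(name != NULL);
  for (p = first; p != NULL; p = p->next)
    if (strcmp(p->name, name) == 0)
      return p->value;
  return NULL;
}
```

A caller would normally test the result against NULL before using the string.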
int cl_cpos2alg | ( | Attribute * | attribute, |
int | cpos | ||
) |
Gets the id number of the alignment at the specified corpus position.
attribute | The align-attribute to look on. |
cpos | The corpus position to look at. |
References CDA_EALIGN, CDA_ENODATA, CDA_EPOSORNG, CDA_OK, cl_errno, cl_has_extended_alignment(), CompAlignData, CompXAlignData, TMblob::data, TComponent::data, ensure_component(), get_alignment(), get_extended_alignment(), and TComponent::size.
Referenced by check_alignment_constraints(), compose_kwic_line(), decode_print_token_sequence(), do_cqi_cl_cpos2alg(), and printAlignedStrings().
int cl_cpos2alg2cpos_oldstyle | ( | Attribute * | attribute, |
int | position, | ||
int * | source_corpus_start, | ||
int * | source_corpus_end, | ||
int * | aligned_corpus_start, | ||
int * | aligned_corpus_end | ||
) |
Gets the corpus positions of an alignment on the given align-attribute.
This is for old-style alignments only: it doesn't (can't) deal with extended alignments. Deprecated: use cl_alg2cpos() instead (but note that its parameters are not identical).
attribute | The align-attribute to look on. |
position | The corpus position {??} of the alignment whose positions are wanted. |
source_corpus_start | Location to put source corpus start position. |
source_corpus_end | Location to put source corpus end position. |
aligned_corpus_start | Location to put target corpus start position. |
aligned_corpus_end | Location to put target corpus end position. |
References ATT_ALIGN, CDA_ENODATA, CDA_EPOSORNG, CDA_OK, check_arg, cl_errno, CompAlignData, TMblob::data, TComponent::data, ensure_component(), get_alignment(), and TComponent::size.
int cl_cpos2boundary | ( | Attribute * | a, |
int | cpos | ||
) |
Compares the location of a corpus position to the regions of an s-attribute.
This determines whether the specified corpus position is within a region (i.e. a structure, an instance of that s-attribute) on the given s-attribute; and/or on a boundary; or outside a region.
a | The s-attribute on which to search. |
cpos | The corpus position to look for. |
References CDA_ESTRUC, cl_cpos2struc2cpos(), cl_errno, STRUC_INSIDE, STRUC_LBOUND, and STRUC_RBOUND.
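The inside/boundary/outside classification described above can be sketched for a single region whose start and end are already known. The DEMO_* flag names and values are illustrative only (the real STRUC_INSIDE, STRUC_LBOUND and STRUC_RBOUND macros live in cl.h and their values are not given here). Combining the flags with bitwise OR and returning 0 for "outside" are both assumptions.

```c
#include <assert.h>

/* Illustrative flags; the names and values are assumptions of this sketch. */
enum { DEMO_INSIDE = 1, DEMO_LBOUND = 2, DEMO_RBOUND = 4 };

/* Classify cpos against one region [start, end], both ends inclusive.
 * Returns 0 if cpos lies outside the region, otherwise a bitwise OR of
 * DEMO_INSIDE with DEMO_LBOUND and/or DEMO_RBOUND where applicable. */
int
demo_boundary_flags(int start, int end, int cpos)
{
  int flags = 0;

  assert(start <= end);
  if (cpos < start || cpos > end)
    return 0;
  flags |= DEMO_INSIDE;
  if (cpos == start)
    flags |= DEMO_LBOUND;
  if (cpos == end)
    flags |= DEMO_RBOUND;
  return flags;
}
```

A region of length one is on both boundaries at once, which is why the flags are kept as separate bits in this sketch.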
int cl_cpos2id | ( | Attribute * | attribute, |
int | position | ||
) |
Gets the integer ID of the item at the specified position on the given p-attribute.
attribute | The P-attribute to look on. |
position | The corpus position to look at. |
References _Attribute::any, ATT_POS, BSclose(), BSopen(), BSread(), BSseek(), CDA_ENODATA, CDA_EPOSORNG, CDA_OK, check_arg, cl_errno, CompCorpus, CompHuffCodes, CompHuffSeq, CompHuffSync, COMPRESS_DEBUG, corpus, TMblob::data, TComponent::data, ensure_component(), POS_Attribute::hc, item_sequence_is_compressed, _huffman_code_descriptor::length, _huffman_code_descriptor::min_code, _Attribute::pos, _huffman_code_descriptor::symbols, _huffman_code_descriptor::symindex, SYNCHRONIZATION, POS_Attribute::this_block, and POS_Attribute::this_block_nr.
Referenced by cl_cpos2str(), compute_code_lengths(), creat_rev_corpus(), decode_check_huff(), do_cqi_cl_cpos2id(), get_group_id(), i2compare(), main(), SortSubcorpus(), and validate_revcorp().
char* cl_cpos2str | ( | Attribute * | attribute, |
int | position | ||
) |
Gets the string of the item at the specified position on the given p-attribute.
attribute | The P-attribute to look on. |
position | The corpus position to look at. |
References ATT_POS, CDA_OK, check_arg, cl_cpos2id(), cl_errno, and cl_id2str().
Referenced by decode_print_token_sequence(), do_cqi_cl_cpos2str(), print_tabulation(), SortExternally(), and SortSubcorpus().
int cl_cpos2struc | ( | Attribute * | a, |
int | cpos | ||
) |
Gets the ID number of a structure (instance of an s-attribute) that is found at the given corpus position.
This is a wrapper of the "old" function get_num_of_struc() that normalises it to standard return value behaviour.
a | The s-attribute on which to search. |
cpos | The corpus position to look for. |
References cl_cpos2struc_oldstyle(), and cl_errno.
Referenced by compose_kwic_line(), decode_print_surrounding_s_att_values(), decode_print_token_sequence(), do_cqi_cl_cpos2lbound(), do_cqi_cl_cpos2rbound(), do_cqi_cl_cpos2struc(), eval_constraint(), get_position_values(), and main().
int cl_cpos2struc2cpos | ( | Attribute * | attribute, |
int | position, | ||
int * | struc_start, | ||
int * | struc_end | ||
) |
Gets the start and end positions of the instance of the given S-attribute found at the specified corpus position.
This function finds one particular instance of the S-attribute, and assigns its start and end points to the locations given as arguments.
attribute | The s-attribute to search. |
position | The corpus position to search for. |
struc_start | Location for the start position of the instance. |
struc_end | Location for the end position of the instance. |
References ATT_STRUC, CDA_ENODATA, CDA_ESTRUC, CDA_OK, check_arg, cl_errno, CompStrucData, TMblob::data, TComponent::data, ensure_component(), get_previous_mark(), and TComponent::size.
Referenced by cl_cpos2boundary(), and decode_print_token_sequence().
char* cl_cpos2struc2str | ( | Attribute * | attribute, |
int | position | ||
) |
Referenced by get_group_id(), and print_tabulation().
int cl_cpos2struc_oldstyle | ( | Attribute * | attribute, |
int | position, | ||
int * | struc_num | ||
) |
Gets the ID number of a structure (instance of an s-attribute) that is found at the given corpus position.
Deprecated function: use cl_cpos2struc() instead.
attribute | The s-attribute on which to search. |
position | The corpus position to look for. |
struc_num | Location where the number of the structure that is found will be put. |
References ATT_STRUC, CDA_ENODATA, CDA_ESTRUC, CDA_OK, check_arg, cl_errno, CompStrucData, TMblob::data, TComponent::data, ensure_component(), get_previous_mark(), and TComponent::size.
Referenced by cl_cpos2struc().
int cl_delete_attribute | ( | Attribute * | attribute | ) |
Deletes the specified Attribute object.
The function also appropriately amends the Corpus object of which this Attribute is a dependent. This means you can call it repeatedly on the first element of a Corpus's Attribute list (as the linked list is automatically adjusted).
References _Attribute::any, Dynamic_Attribute::arglist, ATT_DYN, ATT_NONE, ATT_POS, TCorpus::attributes, Dynamic_Attribute::call, cl_free, comp_drop_component(), CompDirectory, CompLast, corpus, _Attribute::dyn, POS_Attribute::hc, _DynArg::next, _Attribute::pos, and _Attribute::type.
Referenced by cl_delete_corpus(), cqi_drop_attribute(), and drop_attribute().
int cl_delete_corpus | ( | Corpus * | corpus | ) |
Deletes a Corpus object from memory.
A Corpus object keeps track of how many times it has been requested via cl_new_corpus(). When cl_delete_corpus() is called, the object is only actually deleted when there is just one outstanding request. Otherwise, the variable tracking the number of requests is decremented.
corpus | The Corpus to delete. |
References TCorpus::admin, TCorpus::attributes, cl_delete_attribute(), cl_free, FreeIDList(), TCorpus::groupAccessList, TCorpus::hostAccessList, TCorpus::id, TCorpus::info_file, loaded_corpora, TCorpus::name, TCorpus::next, TCorpus::nr_of_loads, TCorpus::path, TCorpus::registry_dir, TCorpus::registry_name, and TCorpus::userAccessList.
Referenced by cl_new_corpus(), compressrdx_cleanup(), decode_cleanup(), and main().
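The request-counting behaviour described above is ordinary reference counting. The following is a minimal sketch of the pattern only. The demo_corpus type and the demo_* functions are hypothetical and say nothing about how the real Corpus object is laid out beyond the existence of a load counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object that counts outstanding requests for itself. */
typedef struct {
  int nr_of_loads;
} demo_corpus;

/* First request: allocate the object with a count of one. */
demo_corpus *
demo_corpus_new(void)
{
  demo_corpus *c = malloc(sizeof *c);
  if (c != NULL)
    c->nr_of_loads = 1;
  return c;
}

/* Later requests for the same object only bump the counter. */
void
demo_corpus_request(demo_corpus *c)
{
  assert(c != NULL);
  c->nr_of_loads++;
}

/* Release one request. Returns 1 if the object was really freed
 * (this was the last outstanding request), 0 if only the counter
 * was decremented. */
int
demo_corpus_release(demo_corpus *c)
{
  assert(c != NULL && c->nr_of_loads > 0);
  if (c->nr_of_loads > 1) {
    c->nr_of_loads--;
    return 0;
  }
  free(c);
  return 1;
}
```

The practical consequence for callers is that every successful request should be paired with exactly one release.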
void cl_delete_int_list | ( | cl_int_list | l | ) |
Deletes a cl_int_list object.
References cl_free, and _cl_int_list::data.
void cl_delete_lexhash | ( | cl_lexhash | hash | ) |
Deletes a cl_lexhash object.
This deletes all the entries in all the buckets in the lexhash, plus the cl_lexhash itself.
hash | The cl_lexhash to delete. |
References _cl_lexhash::buckets, cl_delete_lexhash_entry(), cl_free, _cl_lexhash_entry::next, and _cl_lexhash::table.
Referenced by main().
void cl_delete_regex | ( | CL_Regex | rx | ) |
Deletes a CL_Regex object.
Note that we use cl_free to deallocate the internal PCRE buffers, not pcre_free, for the simple reason that pcre_free is just a function pointer that will normally contain free, and thus we miss out on the checking that cl_free provides.
rx | The CL_Regex to delete. |
References cl_free, _CL_Regex::extra, _CL_Regex::grain, _CL_Regex::grains, _CL_Regex::haystack_buf, and _CL_Regex::needle.
Referenced by cl_regex2id(), free_booltree(), and free_environment().
int cl_delete_stream | ( | PositionStream * | ps | ) |
Deletes a PositionStream object.
References BSclose().
Referenced by compress_reversed_index(), and decompress_check_reversed_index().
void cl_delete_string_list | ( | cl_string_list | l | ) |
Deletes a cl_string_list object.
References cl_free, and _cl_string_list::data.
Referenced by cl_make_set(), encode_parse_options(), and main().
int cl_dynamic_call | ( | Attribute * | attribute, |
DynCallResult * | dcr, | ||
DynCallResult * | args, | ||
int | nr_args | ||
) |
Calls a dynamic attribute.
This is the attribute access function for dynamic attributes.
attribute | The (dynamic) attribute in question. |
dcr | Location for the result (*int or *char). |
args | Location of the parameters (of *int or *char). |
nr_args | Number of parameters. |
References Dynamic_Attribute::arglist, ATT_DYN, ATTAT_FLOAT, ATTAT_INT, ATTAT_NONE, ATTAT_PAREF, ATTAT_POS, ATTAT_STRING, ATTAT_VAR, Dynamic_Attribute::call, CDA_EARGS, CDA_OK, _DCR::charres, check_arg, cl_errno, CL_MAX_LINE_LENGTH, cl_strdup(), _Attribute::dyn, _DCR::floatres, _DCR::intres, _DynArg::next, Dynamic_Attribute::res_type, _DCR::type, _DynArg::type, and _DCR::value.
int cl_dynamic_numargs | ( | Attribute * | attribute | ) |
Count the number of arguments on a dynamic attribute's argument list.
attribute | pointer to the Attribute object to analyse; it must be a dynamic attribute. |
References Dynamic_Attribute::arglist, ATT_DYN, ATTAT_VAR, CDA_OK, check_arg, cl_errno, _Attribute::dyn, _DynArg::next, and _DynArg::type.
void cl_error | ( | char * | message | ) |
Prints an error message, together with a string identifying the current error number.
References cl_errno, and cl_error_string().
Referenced by compress_reversed_index(), decode_print_token_sequence(), decompress_check_reversed_index(), lexdecode_print_item_info(), lexdecode_show(), and main().
char* cl_error_string | ( | int | error_num | ) |
Gets a string describing the error identified by an error number.
error_num | Error number integer (a CDA_* constant as defined in cl.h) |
References CDA_EALIGN, CDA_EARGS, CDA_EATTTYPE, CDA_EBADREGEX, CDA_EBUFFER, CDA_EFSETINV, CDA_EIDORNG, CDA_EIDXORNG, CDA_EINTERNAL, CDA_ENODATA, CDA_ENOMEM, CDA_ENOSTRING, CDA_ENULLATT, CDA_ENYI, CDA_EOTHER, CDA_EPATTERN, CDA_EPOSORNG, CDA_EREMOTE, CDA_ESTRUC, and CDA_OK.
Referenced by cl_error().
CorpusProperty cl_first_corpus_property | ( | Corpus * | corpus | ) |
Gets the first entry in this corpus's list of properties.
(The corpus properties iterator / property datatype is public.)
corpus | Pointer to the Corpus object. |
References TCorpus::properties.
Referenced by cl_corpus_property(), and corpus_info().
void cl_free_string_list | ( | cl_string_list | l | ) |
Frees all the strings in the cl_string_list object.
References cl_free, _cl_string_list::data, and _cl_string_list::size.
Referenced by main().
void cl_get_rng_state | ( | unsigned int * | i1, |
unsigned int * | i2 | ||
) |
int cl_has_extended_alignment | ( | Attribute * | attribute | ) |
Checks whether an attribute's XALIGN component exists, that is, whether or not it has extended alignment.
attribute | An align-attribute. |
References ATT_ALIGN, check_arg, cl_errno, component_state(), ComponentLoaded, ComponentUnloaded, and CompXAlignData.
Referenced by cl_alg2cpos(), cl_cpos2alg(), cl_max_alg(), and describecorpus_show_statistics().
char* cl_id2all | ( | Attribute * | attribute, |
int | index, | ||
int * | freq, | ||
int * | slen | ||
) |
Gets the string of the item with the specified ID on the given p-attribute.
As well as returning the string, other information about the item is inserted into locations specified by other parameters.
attribute | The P-attribute to look on. |
index | The ID of the item to look at. |
freq | Will be set to the frequency of the item. |
slen | Will be set to the string-length of the item. |
References ATT_POS, CDA_OK, check_arg, cl_errno, get_id_frequency, get_id_string_len, and get_string_of_id.
Referenced by lexdecode_print_item_info().
int* cl_id2cpos_oldstyle | ( | Attribute * | attribute, |
int | id, | ||
int * | freq, | ||
int * | restrictor_list, | ||
int | restrictor_list_size | ||
) |
Gets all the corpus positions where the specified item is found on the given P-attribute.
The restrictor list is a set of ranges in which instances of the item MUST occur to be collected by this function. If no restrictor list is specified (i.e. restrictor_list is NULL), then ALL corpus positions where the item occurs are returned.
This restrictor list has the form of a list of ranges {start,end} of size restrictor_list_size; that is, the number of ints in this area is 2 * restrictor_list_size.
This function is "oldstyle" because in the "newstyle" function, there is no restrictor list. (And in fact, the newstyle function is implemented as a macro to this one with the last two arguments NULL and 0.)
attribute | The P-attribute to look on. |
id | The id of the item to look for. |
freq | The frequency of the specified item is written here. This will be 0 in the case of errors. |
restrictor_list | A list of pairs of integers specifying ranges {start,end} in the corpus |
restrictor_list_size | The number of PAIRS of ints in the restrictor list. |
References ATT_POS, BSclose(), BSopen(), BSseek(), CDA_EIDORNG, CDA_ENODATA, CDA_OK, check_arg, cl_errno, cl_free, cl_malloc(), cl_realloc(), CompCompRF, CompCompRFX, CompRevCorpus, CompRevCorpusIdx, compute_ba(), TMblob::data, TComponent::data, ensure_component(), get_attribute_size, get_id_frequency, get_id_range, inverted_file_is_compressed, and read_golomb_code_bs().
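To make the restrictor list layout concrete, here is a small sketch of a membership test over a flat array of {start,end} pairs. demo_position_allowed() is a hypothetical helper, not part of the CL. Treating both ends of each range as inclusive is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* ranges holds nr_pairs consecutive {start,end} pairs, i.e. 2 * nr_pairs
 * ints. Returns 1 if cpos lies in at least one range, or if ranges is
 * NULL (no restriction at all); returns 0 otherwise.
 * Inclusive range ends are an assumption of this sketch. */
int
demo_position_allowed(const int *ranges, int nr_pairs, int cpos)
{
  int i;

  if (ranges == NULL)
    return 1;
  assert(nr_pairs >= 0);
  for (i = 0; i < nr_pairs; i++) {
    int start = ranges[2 * i];
    int end = ranges[2 * i + 1];
    if (cpos >= start && cpos <= end)
      return 1;
  }
  return 0;
}
```

With the array {0, 9, 20, 29} and a size of 2, positions 0 to 9 and 20 to 29 pass the filter and everything else is rejected.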
int cl_id2freq | ( | Attribute * | attribute, |
int | id | ||
) |
Gets the frequency of an item on this attribute.
attribute | The P-attribute to look on |
id | Identifier of an item on this attribute. |
References ATT_POS, CDA_EIDXORNG, CDA_ENODATA, CDA_OK, check_arg, cl_errno, CompCorpusFreqs, TMblob::data, TComponent::data, and ensure_component().
Referenced by cl_idlist2freq(), compress_reversed_index(), creat_rev_corpus(), create_feature_maps(), decompress_check_reversed_index(), do_cqi_cl_id2freq(), and validate_revcorp().
int cl_id2sort | ( | Attribute * | attribute, |
int | id | ||
) |
Gets the position in the Attribute's sorted wordlist index of the item with the specified ID code.
This function is NOT YET IMPLEMENTED.
attribute | The (positional) Attribute whose index is to be searched |
id | Identifier of an item on this attribute. |
References ATT_POS, CDA_ENODATA, CDA_ENYI, CDA_OK, check_arg, cl_errno, CompLexiconSrt, and ensure_component().
char* cl_id2str | ( | Attribute * | attribute, |
int | id | ||
) |
Gets the string that corresponds to the specified item on the given P-attribute.
attribute | The Attribute to look the item up on |
id | Identifier of an item on this attribute. |
References ATT_POS, CDA_EIDORNG, CDA_ENODATA, CDA_OK, check_arg, cl_errno, CompLexicon, CompLexiconIdx, TMblob::data, TComponent::data, ensure_component(), and TComponent::size.
Referenced by cl_cpos2str(), create_feature_maps(), do_cqi_cl_id2str(), Group_id2str(), i2compare(), main(), and scancorpus_add_key().
int cl_id2strlen | ( | Attribute * | attribute, |
int | id | ||
) |
Calculates the length of the string that corresponds to the specified item on the given P-attribute.
attribute | The (positional) Attribute to look up the item on |
id | Identifier of an item on this attribute. |
References ATT_POS, CDA_EIDORNG, CDA_ENODATA, CDA_EOTHER, CDA_OK, check_arg, cl_errno, CompLexiconIdx, TMblob::data, TComponent::data, ensure_component(), get_string_of_id, and TComponent::size.
Referenced by create_feature_maps().
void cl_id_tolower | ( | char * | s | ) |
Converts an uppercase corpus name to an equivalent lowercase form.
String is modified in situ. Only the ASCII characters are changed.
Note, this function doesn't check for what is and is not an allowed CWB-corpus-name character.
Referenced by cl_new_corpus(), encode_generate_registry_file(), and main().
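An in-place, ASCII-only case conversion of the kind described above can be sketched as follows. demo_ascii_tolower() is a hypothetical name. The point of the sketch is that bytes outside A-Z are left untouched, whatever the current locale.

```c
#include <assert.h>
#include <stddef.h>

/* Lowercase the ASCII letters A-Z of s in place. All other bytes,
 * including anything at or above 0x80, are left unchanged, and no
 * check is made that s is a valid identifier. */
void
demo_ascii_tolower(char *s)
{
  assert(s != NULL);
  for (; *s != '\0'; s++)
    if (*s >= 'A' && *s <= 'Z')
      *s = (char)(*s - 'A' + 'a');
}
```

Comparing against the literal range rather than calling the library tolower() keeps the result independent of locale settings.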
void cl_id_toupper | ( | char * | s | ) |
Converts a lowercase corpus name to an equivalent uppercase form.
String is modified in situ. Only the ASCII characters are changed.
Note, this function doesn't check for what is and is not an allowed CWB-corpus-name character.
The old version of this code was a line in cwb-encode that used the library toupper to cope with Latin1 characters. But these are no longer allowed in identifiers, which must be ASCII only.
Referenced by encode_generate_registry_file().
int cl_id_validate | ( | char * | s | ) |
Checks a string to see if it is a valid CWB identifier.
The rules for these are as follows (see also the CQP lexer):
* all characters must be ASCII, ie less than 0x80;
* must be at least 1 character long (of course)
* first character must be an uppercase or lowercase letter or underscore
* second and subsequent characters may also be digits, hyphen or fullstop.
* mixed case is allowed (just-upper and just-lower is imposed elsewhere, where necessary).
TODO: should the CL registry lexer be amended to reflect these restrictions? (ID there is rather laxer than this)
s | The string to check. |
Referenced by cl_new_corpus(), and encode_generate_registry_file().
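The identifier rules listed above translate directly into a short check. The following is a sketch written from those rules alone. demo_id_is_valid() is a hypothetical name, and the 1/0 return convention is an assumption rather than the documented behaviour of the library function.

```c
#include <assert.h>
#include <stddef.h>

static int
demo_is_ascii_letter(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static int
demo_is_ascii_digit(unsigned char c)
{
  return c >= '0' && c <= '9';
}

/* Returns 1 if s satisfies the identifier rules, 0 otherwise.
 * Every accepted character is below 0x80, so the ASCII-only rule
 * follows from the character classes tested here. */
int
demo_id_is_valid(const char *s)
{
  const unsigned char *p = (const unsigned char *)s;

  assert(s != NULL);
  if (*p == '\0')
    return 0;                                  /* at least one character */
  if (!demo_is_ascii_letter(*p) && *p != '_')
    return 0;                                  /* first: letter or underscore */
  for (p++; *p != '\0'; p++)
    if (!demo_is_ascii_letter(*p) && *p != '_' &&
        !demo_is_ascii_digit(*p) && *p != '-' && *p != '.')
      return 0;                                /* rest: also digit, '-', '.' */
  return 1;
}
```

Mixed case passes this check, in line with the last rule. Any upper-only or lower-only requirement would be applied separately.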
int* cl_idlist2cpos_oldstyle | ( | Attribute * | attribute, |
int * | word_ids, | ||
int | number_of_words, | ||
int | sort, | ||
int * | size_of_table, | ||
int * | restrictor_list, | ||
int | restrictor_list_size | ||
) |
Gets a list of corpus positions matching a list of ids.
This function returns an (ordered) list of all corpus positions which match one of the ids given in the list of ids. The table is allocated with malloc, so free it when you no longer need it.
The list itself is returned; its size is placed in size_of_table. This size is, of course, the same as the cumulative id frequency of the ids (because each corpus position matching one of the ids is added into the list).
BEWARE: when the id list is rather big or there are highly-frequent ids in the id list (for example, after a call to collect_matching_ids with the pattern ".*") this will give a copy of the corpus, for which you probably don't have enough memory! It is therefore a good idea to call cumulative_id_frequency beforehand and to introduce some kind of bias.
This function is DEPRECATED in favour of cl_idlist2cpos().
This function is "oldstyle" because it has the "restrictor list" parameters, which are not available through the "newstyle" function cl_idlist2cpos() (which is currently just a macro to this).
A note on the last two parameters, which are currently unused: restrictor_list is a list of integer pairs [a,b] which means that the returned value only contains positions which fall within at least one of these intervals. The list must be sorted by the start positions, and secondarily by b. restrictor_list_size is the number of integers in this list, NOT THE NUMBER OF PAIRS. WARNING: CURRENTLY UNIMPLEMENTED. {NB: this description of restrictor_list_size DOESN'T MATCH the one for get_positions(), which this function calls.}
REMEMBER: this monster returns a list of corpus indices, not a list of ids.
attribute | The P-attribute we are looking in |
word_ids | A list of item ids (i.e. id codes for items on this attribute). |
number_of_words | The length of this list. |
sort | boolean: return sorted list? |
size_of_table | The size of the allocated table will be placed here. |
restrictor_list | See function description. |
restrictor_list_size | See function description. |
References ATT_POS, CDA_EIDORNG, CDA_ENODATA, CDA_OK, check_arg, cl_errno, cl_malloc(), CompLexiconIdx, cumulative_id_frequency, ensure_component(), get_positions, intcompare(), and TComponent::size.
Referenced by get_matched_corpus_positions().
int cl_idlist2freq | ( | Attribute * | attribute, |
int * | word_ids, | ||
int | number_of_words | ||
) |
Calculates the total frequency of all items on a list of item IDs.
This function returns the sum of the frequencies of the items in word_ids, which is an array of item IDs with length number_of_words.
The result is therefore the number of corpus positions which match one of the words.
attribute | P-attribute on which these items are found. |
word_ids | An array of item IDs. |
number_of_words | Length of the word_ids array. |
References ATT_POS, CDA_ENODATA, CDA_OK, check_arg, cl_errno, and cl_id2freq().
Referenced by OptimizeStringConstraint().
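The summation described above can be sketched over a plain frequency table. Here freq_table is a hypothetical stand-in for the per-item frequency lookups, demo_idlist_total_freq() is not a CL function, and returning -1 for an out-of-range ID is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* freq_table[id] is the frequency of item id, for 0 <= id < table_size.
 * Returns the sum of the frequencies of the nr_ids entries of ids, or -1
 * if any ID is out of range (this error convention is an assumption). */
long
demo_idlist_total_freq(const int *freq_table, int table_size,
                       const int *ids, int nr_ids)
{
  long total = 0;
  int i;

  assert(freq_table != NULL && ids != NULL);
  for (i = 0; i < nr_ids; i++) {
    if (ids[i] < 0 || ids[i] >= table_size)
      return -1;
    total += freq_table[ids[i]];
  }
  return total;
}
```

The total equals the number of matching corpus positions only if the ID list contains no duplicates. In this sketch a repeated ID is simply counted twice.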
int cl_index_compressed | ( | Attribute * | attribute | ) |
Check whether the index (inverted file) of the given P-attribute is compressed.
See comments in body of function for what counts as "compressed".
References ATT_POS, check_arg, cl_errno, CompCompRF, CompCompRFX, component_state(), ComponentLoaded, ComponentUnloaded, CompRevCorpus, and CompRevCorpusIdx.
Referenced by cl_new_stream().
void cl_int_list_append | ( | cl_int_list | l, |
int | val | ||
) |
Appends an integer to the end of a cl_int_list object.
References cl_int_list_set(), and _cl_int_list::size.
int cl_int_list_get | ( | cl_int_list | l, |
int | n | ||
) |
Retrieves an element from a cl_int_list object.
l | The list to search. |
n | The element to retrieve. |
References _cl_int_list::data, and _cl_int_list::size.
void cl_int_list_lumpsize | ( | cl_int_list | l, |
int | s | ||
) |
Sets the lumpsize of a cl_int_list object.
l | The cl_int_list. |
s | The new lumpsize. |
References _cl_int_list::lumpsize, and LUMPSIZE.
void cl_int_list_qsort | ( | cl_int_list | l | ) |
Sorts a cl_int_list object.
The list of integers is sorted into ascending order.
References cl_int_list_intcmp(), _cl_int_list::data, and _cl_int_list::size.
void cl_int_list_set | ( | cl_int_list | l, |
int | n, | ||
int | val | ||
) |
Sets an integer on a cl_int_list object.
The n'th element on the list is set to val, and the list is auto-extended if necessary.
References _cl_int_list::allocated, cl_realloc(), _cl_int_list::data, _cl_int_list::lumpsize, and _cl_int_list::size.
Referenced by cl_int_list_append().
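The auto-extension behaviour can be illustrated with a small growable array that allocates in multiples of a lump size. This is a sketch of the general technique, not the CL implementation. The demo_int_list type, the zero-filling of skipped elements and the 0/-1 return codes are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of ints. */
typedef struct {
  int *data;
  int size;       /* number of elements in use */
  int allocated;  /* number of elements allocated */
  int lumpsize;   /* allocation grows in multiples of this */
} demo_int_list;

void
demo_int_list_init(demo_int_list *l, int lumpsize)
{
  assert(l != NULL && lumpsize > 0);
  l->data = NULL;
  l->size = 0;
  l->allocated = 0;
  l->lumpsize = lumpsize;
}

/* Set element n to val, extending the list if n is beyond its end.
 * Elements skipped over by the extension are set to 0.
 * Returns 0 on success, -1 on a negative index or allocation failure. */
int
demo_int_list_set(demo_int_list *l, int n, int val)
{
  int i;

  assert(l != NULL && l->lumpsize > 0);
  if (n < 0)
    return -1;
  if (n >= l->allocated) {
    /* round the required length up to the next multiple of lumpsize */
    int new_alloc = (n / l->lumpsize + 1) * l->lumpsize;
    int *p = realloc(l->data, (size_t)new_alloc * sizeof *p);
    if (p == NULL)
      return -1;
    l->data = p;
    l->allocated = new_alloc;
  }
  for (i = l->size; i < n; i++)
    l->data[i] = 0;
  l->data[n] = val;
  if (n >= l->size)
    l->size = n + 1;
  return 0;
}
```

Appending is then just a set at index size, which matches the way the append function is listed as depending on this one.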
int cl_int_list_size | ( | cl_int_list | l | ) |
Gets the current size of a cl_int_list object (number of elements on the list).
References _cl_int_list::size.
cl_lexhash_entry cl_lexhash_add | ( | cl_lexhash | hash, |
char * | token | ||
) |
Adds a token to a cl_lexhash table.
If the string is already in the hash, its frequency count is increased by 1.
Otherwise, a new entry is created, with an auto-assigned ID; note that the string is duplicated, so the original string that is passed to this function does not need to be kept in memory.
hash | The hash table to add to. |
token | The string to add. |
References cl_lexhash_find_i(), cl_malloc(), cl_strdup(), _cl_lexhash_entry::data, _cl_lexhash::entries, _cl_lexhash_entry::freq, _cl_lexhash_entry::id, _cl_lexhash_entry::_cl_lexhash_entry_data::integer, _cl_lexhash_entry::key, _cl_lexhash_entry::next, _cl_lexhash::next_id, _cl_lexhash_entry::_cl_lexhash_entry_data::numeric, _cl_lexhash_entry::_cl_lexhash_entry_data::pointer, and _cl_lexhash::table.
Referenced by encode_add_wattr_line(), main(), range_close(), range_declare(), range_open(), and sencode_write_region().
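The add-or-increment behaviour can be sketched with a tiny chained hash table. Everything here is hypothetical (type names, the fixed bucket count, the hash function). Starting IDs at 0 and starting the frequency of a new entry at 1 are assumptions, not documented facts about cl_lexhash.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUCKETS 64

typedef struct demo_entry {
  char *key;                 /* private copy of the token */
  int id;                    /* assigned when the entry is created */
  int freq;                  /* number of times the token was added */
  struct demo_entry *next;   /* next entry in the same bucket */
} demo_entry;

typedef struct {
  demo_entry *table[DEMO_BUCKETS];
  int next_id;
} demo_lexhash;

static unsigned
demo_hash(const char *s)
{
  unsigned h = 5381;
  while (*s != '\0')
    h = h * 33u + (unsigned char)*s++;
  return h % DEMO_BUCKETS;   /* identifies the bucket, not the string */
}

void
demo_lexhash_init(demo_lexhash *h)
{
  int i;
  assert(h != NULL);
  for (i = 0; i < DEMO_BUCKETS; i++)
    h->table[i] = NULL;
  h->next_id = 0;
}

/* Add token: bump the count if it is already present, otherwise create
 * an entry holding a duplicate of the string. Returns the entry, or
 * NULL if memory runs out. */
demo_entry *
demo_lexhash_add(demo_lexhash *h, const char *token)
{
  unsigned b;
  size_t len;
  demo_entry *e;

  assert(h != NULL && token != NULL);
  b = demo_hash(token);
  for (e = h->table[b]; e != NULL; e = e->next)
    if (strcmp(e->key, token) == 0) {
      e->freq++;
      return e;
    }
  e = malloc(sizeof *e);
  if (e == NULL)
    return NULL;
  len = strlen(token) + 1;
  e->key = malloc(len);
  if (e->key == NULL) {
    free(e);
    return NULL;
  }
  memcpy(e->key, token, len);
  e->id = h->next_id++;
  e->freq = 1;
  e->next = h->table[b];
  h->table[b] = e;
  return e;
}
```

Because the key is copied, the caller is free to reuse or overwrite its own buffer immediately after the call.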
void cl_lexhash_auto_grow | ( | cl_lexhash | hash, |
int | flag | ||
) |
Turns a cl_lexhash's ability to autogrow on or off.
When this setting is switched on, the lexhash will grow automatically to avoid performance degradation.
Note the default value for this setting is SWITCHED ON.
hash | The hash that will be affected. |
flag | New value for autogrow setting: boolean where true is on and false is off. |
References _cl_lexhash::auto_grow.
int cl_lexhash_del | ( | cl_lexhash | hash, |
char * | token | ||
) |
Deletes a string from a hash.
The entry corresponding to the specified string is removed from the lexhash. If the string is not in the lexhash to begin with, no action is taken.
hash | The hash to alter. |
token | The string to remove. |
References cl_delete_lexhash_entry(), cl_lexhash_find_i(), _cl_lexhash::entries, _cl_lexhash_entry::freq, _cl_lexhash_entry::next, and _cl_lexhash::table.
cl_lexhash_entry cl_lexhash_find | ( | cl_lexhash | hash, |
char * | token | ||
) |
Finds the entry corresponding to a particular string within a cl_lexhash.
hash | The hash to search. |
token | The key-string to look for. |
References cl_lexhash_find_i().
Referenced by main(), range_close(), range_open(), range_print_registry_line(), and sencode_write_region().
int cl_lexhash_freq | ( | cl_lexhash | hash, |
char * | token | ||
) |
Gets the frequency of a particular string within a lexhash.
hash | The hash to look in. |
token | The string to look for. |
References cl_lexhash_find_i(), and _cl_lexhash_entry::freq.
Referenced by main(), and range_open().
int cl_lexhash_id | ( | cl_lexhash | hash, |
char * | token | ||
) |
Gets the ID of a particular string within a lexhash.
Note this is the ID integer that identifies THAT PARTICULAR STRING, not the hash value of that string - which only identifies the bucket the string is found in!
hash | The hash to look in. |
token | The string to look for. |
References cl_lexhash_find_i(), and _cl_lexhash_entry::id.
Referenced by encode_add_wattr_line(), and range_declare().
void cl_lexhash_set_cleanup_function | ( | cl_lexhash | lh, |
void(*)(cl_lexhash_entry) | func | ||
) |
int cl_lexhash_size | ( | cl_lexhash | hash | ) |
Gets the number of different strings stored in a lexhash.
This returns the total number of entries in all the bucket linked-lists in the whole hashtable.
hash | The hash to size up. |
References _cl_lexhash::buckets, _cl_lexhash_entry::next, and _cl_lexhash::table.
char* cl_make_set | ( | char * | s, |
int | split | ||
) |
Generates a set attribute value.
s | The input string. |
split | Boolean; if True, s is split on whitespace. If False, the function expects input in '|'-delimited format. |
References CDA_EFSETINV, CDA_OK, cl_delete_string_list(), cl_errno, cl_free, cl_malloc(), cl_new_string_list(), cl_strdup(), cl_string_list_append(), cl_string_list_get(), cl_string_list_qsort(), and cl_string_list_size().
Referenced by encode_add_wattr_line(), range_open(), and sencode_check_set().
void* cl_malloc | ( | size_t | bytes | ) |
Safely allocates memory malloc-style.
This function allocates a block of memory of the requested size, and does a test for malloc() failure which aborts the program and prints an error message if the system is out of memory. So the return value of this function can be used without further testing for malloc() failure.
bytes | Number of bytes to allocate |
Referenced by accessible(), add_corpus_property(), add_grant_to_last_user(), add_host_to_list(), add_hosts_in_subnet_to_list(), add_tabular_pattern(), add_to_string(), add_user_to_list(), AddNameToAL(), alloc_mblob(), Allocate(), attach_subcorpus(), binsert_g(), check_alignment_constraints(), cl_id2cpos_oldstyle(), cl_idlist2cpos_oldstyle(), cl_lexhash_add(), cl_make_set(), cl_new_int_list(), cl_new_lexhash(), cl_new_regex(), cl_new_string_list(), cl_path_registry_quote(), cl_regex2id(), cl_string_latex2iso(), cl_string_qsort_compare(), combine_subcorpus_spec(), compute_code_lengths(), compute_grouping(), ComputeGroupExternally(), CopyS(), cqi_read_bool_list(), cqi_read_byte_list(), cqi_read_int_list(), cqi_read_string(), cqi_read_string_list(), cqp_run_mu_query(), cqp_run_tab_query(), creat_rev_corpus(), creat_rev_corpus_idx(), create_bitfield(), define_macro(), do_cqi_cqp_query(), do_flagged_re_variable(), do_MeetStatement(), do_mval_string(), do_undump(), do_UnionStatement(), do_XMLTag(), duplicate_corpus(), encode_generate_registry_file(), encode_scan_directory(), evaltree2searchstr(), find_corpus_registry(), FormState(), get_leaf_value(), get_matched_corpus_positions(), GetVariableItems(), GetVariableStrings(), hash_add(), initialize_cqp(), labellookup(), list_macros(), LookUp(), macro_iterator_next_prototype(), MacroAddSegment(), MacroHashAdd(), main(), make_attribute_hash(), make_first_tabular_pattern(), make_temp_corpus(), MakeExp(), MakeMacroHash(), mallocfile(), matchfirstpattern(), meet_mu(), mval_string_conversion(), new_reftab(), new_symbol_table(), new_tabulation_item(), NewAttributeList(), NewContextDescriptor(), NewVariable(), open_input_stream(), OptimizeStringConstraint(), parse_macro_name(), PushInputBuffer(), RangeSetop(), RangeSort(), read_mapping(), ReadHCD(), regex2dfa(), set_corpus_matchlists(), set_target(), Setop(), show_corpora_files1(), simulate_dfa(), SL_insert_after_point(), SortExternally(), SortSubcorpus(), SortSubcorpusRandomize(), 
Store(), strdupto(), try_optimization(), and VariableAddItem().
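The "allocate or abort" pattern described for cl_malloc() can be sketched as follows. demo_checked_malloc() and demo_make_counts() are hypothetical. The wording of the message and the use of exit() rather than abort() are assumptions, since the text only says that the program is aborted with an error message.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate bytes or terminate the program: callers never see NULL
 * for a non-zero request. */
void *
demo_checked_malloc(size_t bytes)
{
  void *p = malloc(bytes);
  if (p == NULL && bytes > 0) {
    fprintf(stderr, "demo_checked_malloc: out of memory (%lu bytes requested)\n",
            (unsigned long)bytes);
    exit(EXIT_FAILURE);
  }
  return p;
}

/* Example caller: the result is used straight away, with no NULL test. */
int *
demo_make_counts(size_t n)
{
  size_t i;
  int *counts;

  assert(n > 0 && n <= (size_t)-1 / sizeof(int));  /* no size overflow */
  counts = demo_checked_malloc(n * sizeof *counts);
  for (i = 0; i < n; i++)
    counts[i] = 0;
  return counts;
}
```

The trade-off is that the caller gives up any chance of recovering from memory exhaustion in exchange for much simpler call sites.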
int cl_max_alg | ( | Attribute * | attribute | ) |
Gets the id number of alignments on this align-attribute.
This is equal to the maximum alignment on this attribute.
attribute | An align-attribute. |
References CDA_ENODATA, CDA_OK, cl_errno, cl_has_extended_alignment(), CompAlignData, CompXAlignData, ensure_component(), and TComponent::size.
Referenced by describecorpus_show_statistics(), and do_cqi_cl_attribute_size().
int cl_max_cpos | ( | Attribute * | attribute | ) |
Gets the maximum position on this P-attribute (i.e. the size of the attribute).
The result of this function is equal to the number of tokens in the attribute.
If the attribute's item sequence is compressed, this is read from the attribute's Huffman code descriptor block.
Otherwise, it is read from the size member of the Attribute's CompCorpus component.
References ATT_POS, CDA_ENODATA, CDA_OK, check_arg, cl_errno, CompCorpus, CompHuffCodes, corpus, ensure_component(), POS_Attribute::hc, item_sequence_is_compressed, _huffman_code_descriptor::length, _Attribute::pos, and TComponent::size.
Referenced by compress_reversed_index(), compute_code_lengths(), creat_rev_corpus(), decode_check_huff(), decompress_check_reversed_index(), describecorpus_show_basic_info(), describecorpus_show_statistics(), do_cqi_cl_attribute_size(), get_matched_corpus_positions(), lexdecode_show(), main(), OptimizeStringConstraint(), Setop(), SortSubcorpus(), and validate_revcorp().
int cl_max_id | ( | Attribute * | attribute | ) |
Gets the maximum id on this P-attribute (ie the range of the attribute's ID codes).
The result of this function is equal to the number of types in this attribute.
References ATT_POS, CDA_ENODATA, CDA_OK, check_arg, cl_errno, CompLexiconIdx, ensure_component(), and TComponent::size.
Referenced by compress_reversed_index(), compute_code_lengths(), creat_rev_corpus(), create_feature_maps(), decompress_check_reversed_index(), describecorpus_show_statistics(), do_cqi_cl_lexicon_size(), get_matched_corpus_positions(), lexdecode_show(), main(), and validate_revcorp().
int cl_max_struc | ( | Attribute * | a | ) |
Gets the maximum for this S-attribute (ie the size of the S-attribute).
The result of this function is equal to the number of instances of this s-attribute in the corpus.
This function works as a wrapper round cl_max_struc_oldstyle that normalises it to standard return value behaviour.
a | The s-attribute to evaluate. |
References cl_errno, and get_nr_of_strucs().
Referenced by compose_kwic_line(), describecorpus_show_statistics(), do_cqi_cl_attribute_size(), main(), matchfirstpattern(), and scancorpus_add_key().
int cl_max_struc_oldstyle | ( | Attribute * | attribute, |
int * | nr_strucs | ||
) |
Attribute* cl_new_attribute_oldstyle | ( | Corpus * | corpus, |
char * | attribute_name, | ||
int | type, | ||
char * | data | ||
) |
Finds an attribute that matches the specified parameters, if one exists, for the given corpus.
Note that although this is a cl_new_* function, and it is the canonical way that we get an Attribute to call Attribute-functions on, it doesn't actually create any kind of object. The Attribute exists already as one of the dependents of the Corpus object; this function simply locates it and returns a pointer to it.
This function is DEPRECATED. Use cl_new_attribute() instead (which is actually a macro to this function, but with a different parameter list).
corpus | The corpus in which to search for the attribute. |
attribute_name | The name of the attribute (i.e. the handle it has in the registry file). |
type | Type of attribute to be searched for. |
data | NOT USED. |
References _Attribute::any, TCorpus::attributes, STREQ, and _Attribute::type.
Referenced by drop_attribute(), get_matched_corpus_positions(), and main().
Corpus* cl_new_corpus | ( | char * | registry_dir, |
char * | registry_name | ||
) |
Creates a Corpus object to represent a given indexed corpus, located in a given directory accessible to the program.
registry_dir | Path to the CWB registry directory from which the corpus is to be loaded. This may be NULL, in which case the default registry directory is used. |
registry_name | The CWB-name of the indexed corpus to load (in the all-lowercase form) |
References check_access_conditions(), cl_delete_corpus(), cl_free, cl_id_tolower(), cl_id_validate(), cl_standard_registry(), cl_strdup(), corpus, cregin, cregin_name, cregin_path, cregparse(), cregrestart(), find_corpus(), find_corpus_registry(), TCorpus::id, loaded_corpora, TCorpus::next, TCorpus::nr_of_loads, TCorpus::registry_dir, and TCorpus::registry_name.
Referenced by main(), and sencode_parse_options().
cl_int_list cl_new_int_list | ( | void | ) |
Creates a new cl_int_list object.
References _cl_int_list::allocated, cl_calloc(), cl_malloc(), _cl_int_list::data, _cl_int_list::lumpsize, LUMPSIZE, and _cl_int_list::size.
cl_lexhash cl_new_lexhash | ( | int | buckets | ) |
Creates a new cl_lexhash object.
buckets | The number of buckets in the newly-created cl_lexhash; set to 0 to use the default number of buckets. |
References _cl_lexhash::auto_grow, _cl_lexhash::buckets, cl_calloc(), cl_malloc(), _cl_lexhash::cleanup_func, _cl_lexhash::comparisons, DEFAULT_NR_OF_BUCKETS, _cl_lexhash::entries, find_prime(), _cl_lexhash::last_performance, _cl_lexhash::next_id, PERFORMANCE_COUNT, _cl_lexhash::performance_counter, and _cl_lexhash::table.
Referenced by cl_lexhash_check_grow(), main(), range_declare(), sencode_write_region(), and wattr_declare().
CL_Regex cl_new_regex | ( | char * | regex, |
int | flags, | ||
CorpusCharset | charset | ||
) |
Creates a new CL_Regex object (i.e. a regular expression buffer).
The regular expression is preprocessed according to the flags, and anchored to the start and end of the string. (That is, ^ is added to the start, $ to the end.)
Then the resulting regex is compiled (using PCRE) and optimised.
regex | String containing the regular expression |
flags | IGNORE_CASE, or IGNORE_DIAC, or both, or 0. |
charset | The character set of the regex. |
References CDA_EBADREGEX, CDA_OK, charset, _CL_Regex::charset, cl_debug, cl_errno, cl_free, cl_malloc(), CL_MAX_LINE_LENGTH, cl_regex_error, cl_regopt_analyse(), cl_string_canonical(), cl_string_latex2iso(), _CL_Regex::extra, _CL_Regex::flags, _CL_Regex::grains, _CL_Regex::haystack_buf, IGNORE_CASE, IGNORE_DIAC, _CL_Regex::needle, regopt_data_copy_to_regex_object(), and utf8.
Referenced by cl_regex2id(), do_flagged_string(), do_XMLTag(), main(), and scancorpus_add_key().
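The anchoring step described for cl_new_regex() can be illustrated with a minimal sketch. It uses the POSIX regex.h functions rather than PCRE, so it is not the CL's implementation; the function name anchored_match_sketch and the grouping parentheses around the pattern are assumptions made here so that alternations such as cat|dog are anchored as a whole.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch (POSIX regex, not PCRE, and not the CL source).
 * Returns 1 if the WHOLE of str matches pattern, 0 if it does not,
 * and -1 if memory runs out or the pattern fails to compile. */
int anchored_match_sketch(const char *pattern, const char *str)
{
    /* room for "^(" + pattern + ")$" + terminating NUL */
    size_t len = strlen(pattern) + 5;
    char *anchored = malloc(len);
    if (anchored == NULL)
        return -1;
    snprintf(anchored, len, "^(%s)$", pattern);

    regex_t rx;
    int rc = regcomp(&rx, anchored, REG_EXTENDED | REG_NOSUB);
    free(anchored);
    if (rc != 0)
        return -1;

    rc = regexec(&rx, str, 0, NULL, 0);
    regfree(&rx);
    return rc == 0 ? 1 : 0;
}
```

Because of the anchors, a pattern such as ab+ accepts "abbb" but rejects "xabbb"; this is why a caller who wants a substring match has to add .* explicitly.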
PositionStream cl_new_stream | ( | Attribute * | attribute, |
int | id | ||
) |
Creates a new PositionStream object.
attribute | The P-attribute to open the position stream on |
id | The id that the new PositionStream will have. This is the id of an item on the specified attribute. |
References ATT_POS, _position_stream_rec_::attribute, _position_stream_rec_::b, _position_stream_rec_::base, _position_stream_rec_::bs, BSopen(), BSseek(), CDA_EIDORNG, CDA_ENODATA, CDA_OK, check_arg, cl_errno, cl_index_compressed(), CompCompRF, CompCompRFX, CompRevCorpus, CompRevCorpusIdx, compute_ba(), TMblob::data, TComponent::data, ensure_component(), get_attribute_size, get_id_frequency, get_id_range, _position_stream_rec_::id, _position_stream_rec_::id_freq, _position_stream_rec_::is_compressed, _position_stream_rec_::last_pos, and _position_stream_rec_::nr_items.
Referenced by compress_reversed_index(), and decompress_check_reversed_index().
cl_string_list cl_new_string_list | ( | void | ) |
Creates a new cl_string_list object.
References _cl_string_list::allocated, cl_calloc(), cl_malloc(), _cl_string_list::data, _cl_string_list::lumpsize, LUMPSIZE, and _cl_string_list::size.
Referenced by cl_corpus_list_attributes(), cl_make_set(), cwbci_parse_options(), encode_scan_directory(), main(), and range_declare().
CorpusProperty cl_next_corpus_property | ( | CorpusProperty | prop | ) |
Gets the next corpus property on the list of properties.
(The corpus properties iterator / property datatype is public.)
prop | The current property. |
References TCorpusProperty::next.
Referenced by cl_corpus_property(), and corpus_info().
void cl_path_adjust_independent | ( | char * | path | ) |
Standardises subdirectory-dividers in a string that represents a path into Unix-like form (ie with forward-slash), regardless of what OS we are in.
Or, to put it another way, changes backslashes into forward slashes under Windows.
This may be useful because of the need to move corpora between systems.
Note that the path is modified in place.
path | The path to modify (must be Ascii-compatible) |
References SUBDIR_SEPARATOR.
void cl_path_adjust_os | ( | char * | path | ) |
Standardises subdirectory-dividers in a string that represents a path, in an OS-sensitive way.
If the CL was compiled for Unix, backslash is changed to forwardslash. If the CL was compiled for Windows, forwardslash is changed to backslash.
Note that the path is modified in place.
path | The path to modify (must be Ascii-compatible) |
References SUBDIR_SEPARATOR.
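The two path-adjusting functions above both amount to an in-place character substitution. The following is a minimal sketch of that idea, not the CL source; the function names are hypothetical, and the direction is chosen by calling a different helper rather than at compile time.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: rewrite every `from` divider as `to`, in place. */
void path_replace_divider(char *path, char from, char to)
{
    assert(path != NULL);
    for (char *p = path; *p != '\0'; p++) {
        if (*p == from)
            *p = to;
    }
}

/* Unix-like form regardless of host OS (cf. cl_path_adjust_independent). */
void path_to_unix_form(char *path)
{
    path_replace_divider(path, '\\', '/');
}

/* Windows form (what cl_path_adjust_os is described as doing on Windows). */
void path_to_windows_form(char *path)
{
    path_replace_divider(path, '/', '\\');
}
```

Since only single ASCII bytes are swapped, the string length never changes, which is what makes the in-place modification safe.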
char* cl_path_get_component | ( | char * | s | ) |
Tokenises a string into components split by ':' (or ';' under Win32).
s | The string to tokenise; or, NULL if tokenisation has already been initialised. |
References last, and PATH_SEPARATOR.
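The calling convention of cl_path_get_component() (string on the first call, NULL afterwards) resembles strtok(). A minimal sketch of such a tokeniser is given here; it is not the CL source. The name path_next_component, the macro SKETCH_PATH_SEPARATOR, and the handling of empty components are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PATH_SEPARATOR ':'   /* assumption: Unix build; ';' under Win32 */

/* Hypothetical stand-in for the tokeniser described above.
 * Pass the string on the first call and NULL on later calls.
 * Returns NULL once no components are left.
 * The input string is modified in place (separators become NUL). */
char *path_next_component(char *s)
{
    static char *last = NULL;   /* where the next component starts */

    if (s != NULL)
        last = s;
    if (last == NULL || *last == '\0')
        return NULL;

    char *start = last;
    char *sep = strchr(start, SKETCH_PATH_SEPARATOR);
    if (sep != NULL) {
        *sep = '\0';            /* terminate this component */
        last = sep + 1;
    } else {
        last = NULL;            /* that was the final component */
    }
    return start;
}
```

A typical use would be to loop over a registry path list: call once with the string, then repeatedly with NULL until NULL comes back. Like strtok(), the static state makes this sketch non-reentrant.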
char* cl_path_registry_quote | ( | char * | path | ) |
Add quotes and escape slashes to a file path if necessary.
This is for the HOME and INFO fields of the registry file.
If either field contains any characters that can't be treated as an "ID" token by the registry parser, then we make sure it is treated as a string (quoted) instead, and make all appropriate substitutions.
For consistency, this function always returns a newly allocated string, regardless of whether changes have been made.
Note that the way the registry parser works, it is quite happy with either "C:\dir\subdir" or "C:\\dir\\subdir" as a path for HOME or INFO.
path | String containing the path to quotify. |
References cl_malloc(), and cl_strdup().
Referenced by encode_generate_registry_file().
unsigned int cl_random | ( | void | ) |
Gets a random number.
Part of the CL-internal random number generator.
References RNG_I1, and RNG_I2.
Referenced by cl_runif(), and SortSubcorpusRandomize().
void cl_randomize | ( | void | ) |
Initialises the CL-internal random number generator from the current system time.
References cl_set_seed().
Referenced by initialize_cqp(), and main().
int cl_read_stream | ( | PositionStream | ps, |
int * | buffer, | ||
int | buffer_size | ||
) |
Reads corpus positions from a position stream to a buffer.
ps | The position stream to read. |
buffer | Location to put the resulting item positions. |
buffer_size | Maximum number of item positions to read. (Fewer will be read if fewer are available). |
References _position_stream_rec_::b, _position_stream_rec_::base, _position_stream_rec_::bs, _position_stream_rec_::id_freq, _position_stream_rec_::is_compressed, _position_stream_rec_::last_pos, _position_stream_rec_::nr_items, and read_golomb_code_bs().
Referenced by compress_reversed_index(), and decompress_check_reversed_index().
void* cl_realloc | ( | void * | block, |
size_t | bytes | ||
) |
Safely reallocates memory.
block | Pointer to the block to be reallocated |
bytes | Number of bytes to allocate to the resized memory block |
Returns a pointer to the block of reallocated memory.
Referenced by add_to_string(), AddBuf(), AddEquiv(), AddState(), binsert_g(), cl_id2cpos_oldstyle(), cl_int_list_set(), cl_string_list_set(), ComputeGroupExternally(), ComputeGroupInternally(), load_macro_file(), MakeExp(), meet_mu(), NewVariable(), PushQ(), RangeSetop(), read_mapping(), Reallocate(), Setop(), and VariableAddItem().
int* cl_regex2id | ( | Attribute * | attribute, |
char * | pattern, | ||
int | flags, | ||
int * | number_of_matches | ||
) |
Gets a list of the ids of those items on a given Attribute that match a particular regular-expression pattern.
The pattern is interpreted internally with the CL regex engine, q.v.
The function returns a pointer to a sequence of ints of size number_of_matches. The list is allocated with malloc(), so do a cl_free() when you don't need it any more.
attribute | The p-attribute to look on. |
pattern | String containing the pattern against which to match each item on the attribute. Note: this pattern is a regular expression, but it is passed as a string, not a CL_Regex object. The CL_Regex object is created internally. |
flags | Flags for the regular expression system via cl_new_regex. |
number_of_matches | This is set to the number of item ids found, i.e. the size of the returned buffer. |
References ATT_POS, CDA_EBADREGEX, CDA_ENODATA, CDA_OK, check_arg, cl_calloc(), cl_debug, cl_delete_regex(), cl_errno, cl_free, cl_malloc(), cl_new_regex(), cl_regex_error, cl_regex_match(), cl_regex_optimised(), cl_regopt_count_get(), cl_regopt_count_reset(), CompLexicon, CompLexiconIdx, TMblob::data, TComponent::data, ensure_component(), _Attribute::pos, TComponent::size, and word.
Referenced by do_cqi_cl_regex2id(), get_matched_corpus_positions(), lexdecode_show(), and scancorpus_add_key().
int cl_regex_match | ( | CL_Regex | rx, |
char * | str | ||
) |
Matches a regular expression against a string.
The regular expression contained in the CL_Regex is compared to the string. No settings or flags are passed to this function; rather, the settings that rx was created with are used.
rx | The regular expression to match. |
str | The string to compare the regex to. |
References _CL_Regex::anchor_end, _CL_Regex::anchor_start, _CL_Regex::charset, cl_debug, cl_regopt_successes, cl_string_canonical(), _CL_Regex::extra, _CL_Regex::flags, _CL_Regex::grain, _CL_Regex::grain_len, _CL_Regex::grains, _CL_Regex::haystack_buf, _CL_Regex::jumptable, and _CL_Regex::needle.
Referenced by cl_regex2id(), eval_bool(), eval_constraint(), is_regular(), main(), and matchfirstpattern().
int cl_regex_optimised | ( | CL_Regex | rx | ) |
Finds the level of optimisation of a CL_Regex.
This function returns the approximate level of optimisation, computed from the ratio of grain length to number of grains (0 = no grains, ergo not optimised at all).
rx | The CL_Regex to check. |
References _CL_Regex::grain_len, and _CL_Regex::grains.
Referenced by cl_regex2id().
int cl_regopt_count_get | ( | void | ) |
Get a reading from the "success counter" for optimised regexes.
The counter is incremented by 1 every time the "grain" system is used successfully to avoid calling PCRE. That is, it is incremented every time a string is scrutinised and found to contain none of the grains.
Usage:
for (i = 0, hits = 0; i < n; i++) if (cl_regex_match(rx, haystacks[i])) hits++;
fprintf(stderr, "Found %d matches; avoided regex matching %d times out of %d trials", hits, cl_regopt_count_get(), n );
References cl_regopt_successes.
Referenced by cl_regex2id().
void cl_regopt_count_reset | ( | void | ) |
Reset the "success counter" for optimised regexes.
References cl_regopt_successes.
Referenced by cl_regex2id().
double cl_runif | ( | void | ) |
Gets a random number in the range [0,1] with uniform distribution.
Part of the CL-internal random number generator.
References cl_random().
Referenced by do_cqi_cqp_query(), and do_reduce().
int cl_sequence_compressed | ( | Attribute * | attribute | ) |
Checks whether the item sequence of the given P-attribute is compressed.
See comments in body of function for what counts as "compressed".
References ATT_POS, check_arg, cl_errno, CompCorpus, CompHuffCodes, CompHuffSeq, CompHuffSync, component_state(), ComponentLoaded, ComponentUnloaded, POS_Attribute::hc, and _Attribute::pos.
void cl_set_debug_level | ( | int | level | ) |
Sets the debug level configuration variable.
References cl_debug.
Referenced by execute_side_effects(), main(), parse_options(), and set_default_option_values().
int cl_set_intersection | ( | char * | result, |
const char * | s1, | ||
const char * | s2 | ||
) |
Computes the intersection of two set attribute values.
Both values must be in standard syntax (i.e. sorted and '|'-delimited); memory for the result string must be allocated by the caller.
References CDA_EBUFFER, CDA_EFSETINV, CDA_OK, CL_DYN_STRING_SIZE, cl_errno, cl_strcmp(), s1, and s2.
Referenced by call_predefined_function().
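Because both inputs are sorted, the intersection can be computed in a single merge-style pass. The sketch below illustrates this under stated assumptions and is not the CL source: sets are assumed to be written as "|a|b|c|" (with "|" as the empty set), items are compared bytewise with memcmp(), and the name set_intersection_sketch and its 1/0 return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: intersect two sorted sets written as "|a|b|c|".
 * result must have room for at least strlen(s1) + 1 bytes.
 * Returns 1 on success, 0 if either input does not start with '|'. */
int set_intersection_sketch(char *result, const char *s1, const char *s2)
{
    assert(result != NULL && s1 != NULL && s2 != NULL);
    if (*s1 != '|' || *s2 != '|')
        return 0;
    s1++;                       /* skip the leading delimiters */
    s2++;

    char *out = result;
    *out++ = '|';
    while (*s1 != '\0' && *s2 != '\0') {
        size_t n1 = strcspn(s1, "|");       /* length of current item in s1 */
        size_t n2 = strcspn(s2, "|");       /* length of current item in s2 */
        size_t n = n1 < n2 ? n1 : n2;
        int cmp = memcmp(s1, s2, n);
        if (cmp == 0)
            cmp = (n1 > n2) - (n1 < n2);    /* a shorter item sorts first */
        if (cmp == 0) {                     /* item is in both sets: emit it */
            memcpy(out, s1, n1);
            out += n1;
            *out++ = '|';
        }
        if (cmp <= 0)
            s1 += n1 + (s1[n1] == '|');     /* advance the smaller side */
        if (cmp >= 0)
            s2 += n2 + (s2[n2] == '|');
    }
    *out = '\0';
    return 1;
}
```

The merge only works if both inputs were sorted with the same ordering that the comparison uses; for pure ASCII items the bytewise order used here is unambiguous.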
void cl_set_memory_limit | ( | int | megabytes | ) |
Sets the memory limit respected by some CL functions.
References cl_memory_limit.
Referenced by main().
void cl_set_optimize | ( | int | state | ) |
Turns optimization on or off.
state | Boolean (true turns it on, false turns it off). |
References cl_optimize.
Referenced by execute_side_effects(), main(), and set_default_option_values().
void cl_set_rng_state | ( | unsigned int | i1, |
unsigned int | i2 | ||
) |
Restores the state of the CL-internal random number generator.
i1 | The value to set the first RNG integer to (if zero, resets it to 1) |
i2 | The value to set the second RNG integer to (if zero, resets it to 1) |
References RNG_I1, and RNG_I2.
Referenced by cl_set_seed(), and SortSubcorpusRandomize().
void cl_set_seed | ( | unsigned int | seed | ) |
Initialises the CL-internal random number generator.
seed | A single 32bit number to use as the seed |
References cl_set_rng_state().
Referenced by cl_randomize().
int cl_set_size | ( | char * | s | ) |
Counts the number of elements in a set attribute value.
This function counts the number of elements in a set attribute value (using '|'-delimited standard syntax).
References CDA_EFSETINV, CDA_OK, and cl_errno.
Referenced by call_predefined_function().
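Counting the elements of a set value comes down to counting delimited items. A minimal sketch follows, assuming the "|a|b|c|" form with "|" as the empty set; the name set_size_sketch and the use of -1 to signal a malformed value are assumptions, not documented CL behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: count the items in a set written as "|a|b|c|".
 * Returns the number of items, or -1 if the value is not well-formed
 * (no leading '|', or a final item without a closing '|'). */
int set_size_sketch(const char *s)
{
    if (s == NULL || *s != '|')
        return -1;
    s++;                                /* skip the leading delimiter */

    int count = 0;
    while (*s != '\0') {
        size_t n = strcspn(s, "|");     /* length of the current item */
        if (s[n] != '|')
            return -1;                  /* last item is not terminated */
        count++;
        s += n + 1;                     /* move past the item and its '|' */
    }
    return count;
}
```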
int cl_sort2id | ( | Attribute * | attribute, |
int | sort_index_position | ||
) |
Gets the ID code of the item at the specified position in the Attribute's sorted wordlist index.
That is, given a sort-order position, the actual ID of the corresponding item is generated.
attribute | The (positional) Attribute whose index is to be searched. |
sort_index_position | The offset in the index where the ID code is to be found. |
References ATT_POS, CDA_EIDXORNG, CDA_ENODATA, CDA_OK, check_arg, cl_errno, CompLexiconSrt, TMblob::data, TComponent::data, and ensure_component().
Referenced by lexdecode_show().
char* cl_standard_registry | ( | ) |
Gets a string containing the path of the default registry directory.
References regdir, REGISTRY_DEFAULT_PATH, and REGISTRY_ENVVAR.
Referenced by cl_new_corpus(), find_corpus(), load_corpusnames(), and main().
int cl_str2id | ( | Attribute * | attribute, |
char * | id_string | ||
) |
Gets the ID code that corresponds to the specified string on the given P-attribute.
attribute | The (positional) Attribute to look the string up on |
id_string | The string of an item on this attribute |
References ATT_POS, CDA_ENODATA, CDA_ENOSTRING, CDA_EOTHER, CDA_OK, check_arg, cl_errno, cl_strcmp(), CompLexicon, CompLexiconIdx, CompLexiconSrt, TMblob::data, TComponent::data, ensure_component(), and TComponent::size.
Referenced by create_feature_maps(), do_cqi_cl_str2id(), get_corpus_positions(), and lexdecode_show().
int cl_strcmp | ( | char * | s1, |
char * | s2 | ||
) |
CL internal string comparison (uses signed char on all platforms).
Referenced by cl_set_intersection(), cl_str2id(), cl_string_list_strcmp(), and scompare().
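The point of comparing as signed char is that the signedness of plain char varies between platforms, and the standard strcmp() compares bytes as unsigned char. The following sketch shows what a comparison on signed char values looks like; it is an illustration with a hypothetical name, not the CL source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: compare two strings byte by byte, treating
 * every byte as a signed char irrespective of the platform default.
 * Returns a negative, zero, or positive value, like strcmp(). */
int signed_strcmp_sketch(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    const signed char *a = (const signed char *)s1;
    const signed char *b = (const signed char *)s2;

    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}
```

With this ordering, bytes above 0x7F (such as Latin-1 accented letters) sort before ASCII letters, whereas strcmp() places them after.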
char* cl_strcpy | ( | char * | buf, |
const char * | src | ||
) |
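Replacement for strcpy that won't copy more than CL_MAX_LINE_LENGTH characters.
This is intended to make it easier to avoid buffer overflows. But it doesn't protect against the opposite danger of losing important data from the end of a truncated string.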
Note, buffer overflow is still possible if buf is a pointer to the middle of a buffer.
So this function is not a panacea, it's just a bit of a help.
It's also implemented in a way that is safe for down-strcpying, that is, for erasing a section from the start or middle of a string (cl_strcpy(string, string+3); for instance). The POSIX standard states that the normal strcpy has undefined behaviour if the objects overlap. That's not the case here.
buf | A string buffer to copy to. |
src | The string pointer to copy from. |
References buf, and CL_MAX_LINE_LENGTH.
Referenced by encode_get_input_line(), ParsePrintOptions(), and range_declare().
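The two properties described for cl_strcpy() (a length cap and tolerance of overlapping source and destination) can be sketched as follows. This is not the CL source: the name bounded_strcpy, the value of SKETCH_MAX_LINE_LENGTH, and the choice to count the terminating NUL within the limit are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_LINE_LENGTH 1024  /* assumption: stands in for CL_MAX_LINE_LENGTH */

/* Hypothetical sketch: copy src to buf, truncating so that at most
 * SKETCH_MAX_LINE_LENGTH bytes (including the NUL) are written. */
char *bounded_strcpy(char *buf, const char *src)
{
    assert(buf != NULL && src != NULL);

    /* measure the source first, stopping at the cap */
    size_t n = 0;
    while (n < SKETCH_MAX_LINE_LENGTH - 1 && src[n] != '\0')
        n++;

    memmove(buf, src, n);   /* memmove tolerates overlapping regions */
    buf[n] = '\0';
    return buf;
}
```

Measuring before copying matters for the overlapping case: in bounded_strcpy(string, string + 3) the terminator is written only after all bytes have been moved down.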
char* cl_strdup | ( | char * | string | ) |
Safely duplicates a string.
string | Pointer to the original string |
Referenced by add_grant_to_last_user(), add_user_to_list(), AddNameToAL(), after_Query(), assign_temp_to_sub(), att_hash_lookup(), attach_subcorpus(), changecase_string(), cl_corpus_list_attributes(), cl_dynamic_call(), cl_lexhash_add(), cl_make_set(), cl_new_corpus(), cl_path_registry_quote(), cl_string_latex2iso(), cl_string_reverse(), combine_subcorpus_spec(), component_full_name(), compose_kwic_line(), cwbci_check_line(), cwbci_parse_options(), do_XMLTag(), duplicate_corpus(), encode_add_wattr_line(), encode_generate_registry_file(), evaltree2searchstr(), execute_side_effects(), expand_filename(), expand_macro(), get_fulllocalpath(), GetSystemCorpus(), labellookup(), load_corpusnames(), load_macro_file(), LookUp(), MacroHashAdd(), main(), make_temp_corpus(), NewVariable(), open_pager(), open_stream(), parse_options(), print_tabulation(), printAlignedStrings(), range_close(), range_declare(), range_open(), read_mapping(), regopt_data_copy_to_regex_object(), sencode_declare_new_satt(), sencode_parse_line(), set_context_option_value(), set_default_option_values(), SL_insert_after_point(), SortSubcorpus(), split_attribute_spec(), split_subcorpus_spec(), VariableAddItem(), VerifyVariable(), and wattr_declare().
void cl_string_canonical | ( | char * | s, |
CorpusCharset | charset, | ||
int | flags | ||
) |
Converts a string to canonical form.
The "canonical form" of a string is for use in comparisons where case-insensitivity and/or diacritic insensitivity is desired.
Note that the string s is modified in place. This means it must have enough memory to cope with any expansions made in Unicode case folding. Ideally, allocate double the length of the string (since case-folding doesn't include any one -> more-than-two mappings so far as I know).
Note also that the arguments of this function were changed in v3.2.1. Now, a CorpusCharset is needed. This is because string canonicalising works differently in UTF8. In UTF8, the "composed" status of ALL strings is standardised (this is not dependent on flags; so this function should always be called on all strings that are going to be inserted into, or searched for within, an indexed corpus; then we know we are always dealing with maximally-precomposed strings). Then case folding / accent folding is done by calling Unicode-aware functions. This is in contrast to the process for Latin1, which just uses a straightforward mapping table for both sorts of folding.
s | The string (currently: must be Ascii, Latin-1, or UTF8, but this is not checked for you!) |
charset | The character set to use in standardising. If this is utf8, complex accent and/or case folding will be done, as per the unicode standard. If it is anything else, the Latin1 mapping tables will be used (currently no other ISO mapping tables are built in and activated in the CL). |
flags | The flags that specify which conversions are required. Can be IGNORE_CASE and/or IGNORE_DIAC. |
References cl_free, cl_string_maptable(), IGNORE_CASE, IGNORE_DIAC, and utf8.
Referenced by cl_new_regex(), cl_regex_match(), cl_string_qsort_compare(), encode_get_input_line(), print_tabulation(), SortExternally(), and SortSubcorpus().
char* cl_string_latex2iso | ( | char * | str, |
char * | result, | ||
int | target_len | ||
) |
Converts ASCII strings with latex-style backslash escapes for accented characters to ISO-8859-1 (Latin-1).
Syntax:
\"[AaOoUus..] --> corresponding ISO 8859-1 character
\{octal} --> ISO 8859-1 character
Note that if cl_allow_latex2iso is FALSE, this function will simply copy the input to the output. So it is always safe to call this function.
str | The string to convert. |
result | The location to put the altered string (which should be shorter, or at least no longer than, the input string). If this parameter is NULL, space is automatically allocated for the output. result is allowed to be the same as str. |
target_len | The maximum length of the target string. If result is NULL, then this is deduced automatically. |
References cl_allow_latex2iso, cl_malloc(), cl_strdup(), popc, and pushc.
Referenced by cl_new_regex(), do_flagged_string(), do_SetVariableValue(), and do_XMLTag().
void cl_string_list_append | ( | cl_string_list | l, |
char * | val | ||
) |
Appends a string pointer to the end of a cl_string_list object.
References cl_string_list_set(), and _cl_string_list::size.
Referenced by cl_corpus_list_attributes(), cl_make_set(), cwbci_check_line(), encode_parse_options(), encode_scan_directory(), and range_declare().
char* cl_string_list_get | ( | cl_string_list | l, |
int | n | ||
) |
Retrieves an element from a cl_string_list object.
l | The list to search. |
n | The element to retrieve. |
References _cl_string_list::data, and _cl_string_list::size.
Referenced by cl_make_set(), cwbci_check_line(), encode_get_input_line(), encode_parse_options(), main(), range_close(), range_open(), and range_print_registry_line().
void cl_string_list_lumpsize | ( | cl_string_list | l, |
int | s | ||
) |
Sets the lumpsize of a cl_string_list object.
l | The cl_string_list. |
s | The new lumpsize. |
References _cl_string_list::lumpsize, and LUMPSIZE.
void cl_string_list_qsort | ( | cl_string_list | l | ) |
Sorts a cl_string_list object.
The list of strings is sorted using cl_strcmp().
References cl_string_list_strcmp(), _cl_string_list::data, and _cl_string_list::size.
Referenced by cl_make_set(), and encode_scan_directory().
void cl_string_list_set | ( | cl_string_list | l, |
int | n, | ||
char * | val | ||
) |
Sets a string pointer on a cl_string_list object.
The n'th element on the list is set to val, and the list is auto-extended if necessary.
References _cl_string_list::allocated, cl_realloc(), _cl_string_list::data, _cl_string_list::lumpsize, and _cl_string_list::size.
Referenced by cl_string_list_append().
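The auto-extension described for cl_string_list_set() can be pictured as growing the backing array in lumps. The sketch below borrows the member names listed above (data, size, allocated, lumpsize) but is otherwise hypothetical: the struct, the function names, and the 1/0 return value are assumptions, and plain realloc() is used in place of cl_realloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list type modelled on the member names listed above. */
typedef struct {
    char **data;       /* the string pointers */
    int size;          /* number of elements in use */
    int allocated;     /* number of slots currently allocated */
    int lumpsize;      /* how many slots to add when growing */
} sketch_string_list;

/* Set element n to val, growing the array in lumps if needed.
 * Returns 1 on success, 0 if memory could not be obtained. */
int sketch_list_set(sketch_string_list *l, int n, char *val)
{
    assert(l != NULL && n >= 0 && l->lumpsize > 0);
    if (n >= l->allocated) {
        int new_alloc = l->allocated;
        while (new_alloc <= n)
            new_alloc += l->lumpsize;
        char **grown = realloc(l->data, (size_t)new_alloc * sizeof *grown);
        if (grown == NULL)
            return 0;
        for (int i = l->allocated; i < new_alloc; i++)
            grown[i] = NULL;            /* new slots start out empty */
        l->data = grown;
        l->allocated = new_alloc;
    }
    l->data[n] = val;
    if (n >= l->size)
        l->size = n + 1;
    return 1;
}

/* Appending is just setting the element one past the current end. */
int sketch_list_append(sketch_string_list *l, char *val)
{
    return sketch_list_set(l, l->size, val);
}
```

Only the pointer is stored; as with the functions documented above, the list does not copy the string itself.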
int cl_string_list_size | ( | cl_string_list | l | ) |
Gets the current size of a cl_string_list object (number of elements on the list).
References _cl_string_list::size.
Referenced by cl_make_set(), cwbci_check_line(), encode_parse_options(), main(), range_close(), range_open(), and range_print_registry_line().
int cl_string_qsort_compare | ( | const char * | s1, |
const char * | s2, | ||
CorpusCharset | charset, | ||
int | flags, | ||
int | reverse | ||
) |
Compares two strings in a qsort-stylie!
This function is designed to be suitable for use as a callback with qsort(). As such, its return values are negative if s1 is "less than" s2; zero if the two strings are the same; and positive if s1 is "greater than" s2. But of course you can also use it on its own.
You cannot use it directly with qsort as its parameters are wrong. It needs to be wrapped in another function that (at least) provides the charset, flags and reverse arguments (e.g. from global variables or by calling other functions).
The two strings must be in the same character set. Both will be made canonical in accordance with the flags argument if it is set. Also, the comparison can be done on reverse-order strings.
Note that if either flags or reverse is non-zero, then memory allocation will be necessary. If you are calling this function in a loop, that could quickly get costly. To avoid this, a pair of one-time-allocated buffers are used - but this doesn't dispense with all need for allocation. [Another option would be to allow a buffer to be optionally supplied....]
s1 | First string to compare. |
s2 | Second string to compare. |
charset | Character set of the two strings. |
flags | IGNORE_CASE, IGNORE_DIAC, both, or neither. |
reverse | Boolean: if true, strings are compared from end to beginning, rather than beginning to end. |
References cl_free, cl_malloc(), CL_MAX_LINE_LENGTH, cl_string_canonical(), cl_string_reverse(), MIN, s1, s2, and utf8.
Referenced by i2compare().
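The wrapping that qsort() requires can be sketched as below. To keep the example self-contained it does not call cl_string_qsort_compare(); instead a hypothetical comparison function with one extra argument (compare_with_options) stands in for it, and the wrapper supplies that argument from a file-level variable. All names here are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a comparison function that needs more
 * arguments than qsort() can pass (here: a case-folding switch). */
int compare_with_options(const char *s1, const char *s2, int ignore_case)
{
    for (;; s1++, s2++) {
        int c1 = (unsigned char)*s1;
        int c2 = (unsigned char)*s2;
        if (ignore_case) {
            c1 = tolower(c1);
            c2 = tolower(c2);
        }
        if (c1 != c2)
            return c1 - c2;
        if (c1 == '\0')
            return 0;
    }
}

/* Setting read by the wrapper; set it before calling qsort(). */
static int sort_ignore_case = 0;

/* qsort() passes pointers to the array elements, i.e. to char pointers. */
int qsort_wrapper(const void *a, const void *b)
{
    const char *s1 = *(const char *const *)a;
    const char *s2 = *(const char *const *)b;
    return compare_with_options(s1, s2, sort_ignore_case);
}

/* Convenience function: store the setting, then sort. */
void sort_strings(const char **items, size_t n, int ignore_case)
{
    sort_ignore_case = ignore_case;
    qsort(items, n, sizeof items[0], qsort_wrapper);
}
```

With the real function, the wrapper would pass the charset, flags and reverse arguments in the same way. Keeping the settings in a file-level variable is simple but not thread-safe.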
char* cl_string_reverse | ( | const char * | s, |
CorpusCharset | charset | ||
) |
Creates a "backwards" version of the specified string.
The memory for the reversed string is newly allocated. (This is potentially wasteful, but it occurs in the depths of GLib, so short of reinventing the wheel we have to live with it.)
s | String to reverse. |
charset | The character set of the string. |
References cl_strdup(), and utf8.
Referenced by cl_string_qsort_compare(), SortExternally(), and SortSubcorpus().
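The reason the charset is needed is that a UTF8 string must be reversed character by character, not byte by byte. The following minimal sketch shows one way of doing that without GLib, so it is not how the CL does it; the name reverse_utf8_sketch is hypothetical, the input is assumed to be valid UTF8, and combining sequences are not kept together.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: return a newly allocated copy of s with its
 * UTF8 characters in reverse order. The caller frees the result.
 * Returns NULL if memory could not be obtained. */
char *reverse_utf8_sketch(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    size_t end = len;   /* one past the last byte of the current character */
    size_t w = 0;       /* write position in out */
    while (end > 0) {
        size_t start = end - 1;
        /* step back over continuation bytes (10xxxxxx) to the lead byte */
        while (start > 0 && ((unsigned char)s[start] & 0xC0) == 0x80)
            start--;
        memcpy(out + w, s + start, end - start);
        w += end - start;
        end = start;
    }
    out[w] = '\0';
    return out;
}
```

For a pure ASCII string this reduces to an ordinary byte reversal, since no byte has the continuation-byte bit pattern.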
int cl_string_validate_encoding | ( | char * | s, |
CorpusCharset | charset, | ||
int | repair | ||
) |
Checks the encoding of a string.
This function looks for bad bytes (or byte sequences in the case of UTF8); if any are present, it judges the string invalid. For ISO8859-* encodings, the string can optionally be "repaired" in-place by replacing bad bytes with '?' characters. If the "repair" is successful, the function returns True.
What counts as "bad" is of course relative to the character set that the string is encoded in - so this must be specified.
s | Null-terminated string to check. |
charset | CorpusCharset of the string's encoding. |
repair | if True, replace invalid 8-bit characters by '?' |
References arabic, ascii, cyrillic, greek, hebrew, latin1, latin2, latin3, latin4, latin5, latin6, latin7, latin8, latin9, and utf8.
Referenced by encode_get_input_line(), and prepare_Query().
int cl_string_zap_controls | ( | char * | s, |
CorpusCharset | charset, | ||
char | replace, | ||
int | zap_tabs, | ||
int | zap_newlines | ||
) |
Replaces any invalid control characters in a string.
"Invalid" control characters are any below 0x20.
The string is modified in situ. A typical "replace" to use would be '?' to match the action of cl_string_validate_encoding.
s | The string to modify. |
charset | The character set of the string. |
replace | The replacement character to use. If this is 0, the character is deleted rather than replaced. |
zap_tabs | Whether or not tabs should be zapped (boolean). |
zap_newlines | Whether or not \n and \r should be zapped (boolean). |
Referenced by encode_get_input_line().
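The behaviour described for cl_string_zap_controls() can be sketched as a single in-place pass over the string. This is an illustration under assumptions, not the CL source: the name zap_controls_sketch is hypothetical, the charset parameter is omitted (bytes below 0x20 are control characters in ASCII, the ISO-8859 sets and UTF8 alike), and returning the number of characters zapped is a guess, since the return value is not documented above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: replace (or, if replace is 0, delete) control
 * characters below 0x20. Tabs and newlines are only touched if the
 * corresponding switch is true. Returns the number of characters zapped. */
int zap_controls_sketch(char *s, char replace, int zap_tabs, int zap_newlines)
{
    assert(s != NULL);
    int zapped = 0;
    char *out = s;

    for (const char *in = s; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        int zap = (c < 0x20);
        if (c == '\t' && !zap_tabs)
            zap = 0;
        if ((c == '\n' || c == '\r') && !zap_newlines)
            zap = 0;

        if (!zap) {
            *out++ = *in;       /* keep the character */
            continue;
        }
        zapped++;
        if (replace != '\0')
            *out++ = replace;   /* otherwise the character is dropped */
    }
    *out = '\0';
    return zapped;
}
```

The write position never overtakes the read position, so the string can be modified in situ, and it can only stay the same length or shrink.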
int cl_struc2cpos | ( | Attribute * | attribute, |
int | struc_num, | ||
int * | struc_start, | ||
int * | struc_end | ||
) |
Retrieves the start-and-end corpus positions of a specified structure of the given s-attribute type.
attribute | An s-attribute. |
struc_num | The instance of that s-attribute to retrieve (i.e. the struc_num'th instance of this s-attribute in the corpus). |
struc_start | Location to put the starting corpus position. |
struc_end | Location to put the ending corpus position. |
References ATT_STRUC, CDA_EIDXORNG, CDA_ENODATA, CDA_OK, check_arg, cl_errno, CompStrucData, TMblob::data, TComponent::data, ensure_component(), and TComponent::size.
Referenced by align_print_line(), compose_kwic_line(), decode_print_token_sequence(), do_cqi_cl_cpos2lbound(), do_cqi_cl_cpos2rbound(), do_cqi_cl_struc2cpos(), eval_constraint(), get_position_values(), main(), and matchfirstpattern().
char* cl_struc2str | ( | Attribute * | attribute, |
int | struc_num | ||
) |
Gets the value that is associated with the specified instance of the given s-attribute.
attribute | An S-attribute. |
struc_num | ID of the structure whose value is wanted (ie, function gets value of struc_num'th instance of this s-attribute) |
References ATT_STRUC, CDA_EIDXORNG, CDA_EINTERNAL, CDA_ENODATA, CDA_OK, check_arg, cl_errno, cl_struc_values(), CompStrucAVS, CompStrucAVX, TMblob::data, TComponent::data, ensure_component(), s_v_comp(), and TComponent::size.
Referenced by compute_grouping(), decode_print_surrounding_s_att_values(), decode_print_token_sequence(), do_cqi_cl_struc2str(), eval_constraint(), get_position_values(), main(), matchfirstpattern(), and scancorpus_add_key().
int cl_struc_values | ( | Attribute * | attribute | ) |
Checks whether this s-attribute has attribute values.
References ATT_STRUC, CDA_OK, check_arg, cl_errno, component_state(), ComponentLoaded, ComponentUnloaded, CompStrucAVS, CompStrucAVX, Struc_Attribute::has_attribute_values, and _Attribute::struc.
Referenced by cl_struc2str(), compute_grouping(), decode_print_token_sequence(), describecorpus_show_statistics(), do_cqi_corpus_structural_attribute_has_values(), do_XMLTag(), get_position_values(), main(), print_tabulation(), PrintAttributes(), PrintAttributesSimple(), and scancorpus_add_key().
char* cl_xml_entity_decode | ( | char * | s | ) |
Decode XML entities in a string.
This function decodes pre-defined XML entities in string s. It overwrites the input string s and also returns s for convenience.
(The entities are &lt; &gt; &amp; &quot; &apos;).
TODO -- numeric entities?
If passed NULL, it will not fall over - it will just pass NULL back!
This function is safe for strings in any encoding. The returned string will be at the same memory location and will always be the same length or shorter after the decoding of entities.
s | A string to decode. |
Referenced by encode_add_wattr_line(), and range_open().
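Decoding in place works because every entity is longer than the character it stands for, so the write position can never overtake the read position. A minimal sketch of such a decoder follows; it is not the CL source, and the name xml_entity_decode_sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: decode the five pre-defined XML entities in
 * place. Returns s (or NULL if s is NULL). Anything that is not one
 * of the five entities is copied through unchanged. */
char *xml_entity_decode_sketch(char *s)
{
    static const struct { const char *name; char ch; } entities[] = {
        { "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    const size_t n_entities = sizeof entities / sizeof entities[0];

    if (s == NULL)
        return NULL;

    char *out = s;
    const char *in = s;
    while (*in != '\0') {
        size_t i;
        for (i = 0; i < n_entities; i++) {
            size_t len = strlen(entities[i].name);
            if (strncmp(in, entities[i].name, len) == 0) {
                *out++ = entities[i].ch;    /* emit the decoded character */
                in += len;                  /* skip the whole entity */
                break;
            }
        }
        if (i == n_entities)
            *out++ = *in++;                 /* no entity here: copy the byte */
    }
    *out = '\0';
    return s;
}
```

The pass is single-shot, so "&amp;lt;" decodes to "&lt;" rather than to "<".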
int cl_allow_latex2iso |
Boolean switch enabling/disabling latex-style escapes.
By default, it is false; if programs wish to allow these escapes they need to offer some means of changing this variable.
Note that enabling this variable may cause scrambling of the string for LatinX strings where X is not 1; and may cause undefined errors for UTF8 strings. In short, you should only activate it when you are working with a corpus whose charset is Latin1.
Referenced by cl_string_latex2iso().
int cl_errno |
Error number for CL: is set after access to any of various corpus-data-access functions.
Referenced by cl_alg2cpos(), cl_cpos2alg(), cl_cpos2alg2cpos_oldstyle(), cl_cpos2boundary(), cl_cpos2id(), cl_cpos2str(), cl_cpos2struc(), cl_cpos2struc2cpos(), cl_cpos2struc_oldstyle(), cl_dynamic_call(), cl_dynamic_numargs(), cl_error(), cl_has_extended_alignment(), cl_id2all(), cl_id2cpos_oldstyle(), cl_id2freq(), cl_id2sort(), cl_id2str(), cl_id2strlen(), cl_idlist2cpos_oldstyle(), cl_idlist2freq(), cl_index_compressed(), cl_make_set(), cl_max_alg(), cl_max_cpos(), cl_max_id(), cl_max_struc(), cl_new_regex(), cl_new_stream(), cl_regex2id(), cl_sequence_compressed(), cl_set_intersection(), cl_set_size(), cl_sort2id(), cl_str2id(), cl_struc2cpos(), cl_struc2str(), cl_struc_values(), compress_reversed_index(), decode_print_token_sequence(), decompress_check_reversed_index(), get_corpus_positions(), get_nr_of_strucs(), lexdecode_print_item_info(), lexdecode_show(), send_cl_error(), and Setop().
char cl_regex_error[] |
The error message from (PCRE) regex compilation is placed in this buffer if cl_new_regex() fails.
This global variable is part of the CL_Regex object's API.
Referenced by cl_new_regex(), and cl_regex2id().