CWB
#include "../cl/corpus.h"
#include "../cl/bitfields.h"
#include "cqp.h"
#include "context_descriptor.h"
The FieldType object represents the fields (or 'anchor points') of a subcorpus (including a query result).
Underlying enumeration for CorpusType.
An enumeration for lowercase vs.
typedef struct cl CorpusList |
The CorpusList object records information on a corpus that CQP recognises.
This might be an actual corpus in the registry, or a subcorpus, or a query result.
Note that CorpusList is a bit of a misnomer. It IS a linked-list entry object, and CQP keeps information on the "corpora" it currently knows about on such a list. However, this object is also often used for passing around information about individual corpora, queries, etc.
typedef enum corpus_type CorpusType |
The CorpusType object.
typedef enum _field_type FieldType |
The FieldType object represents the fields (or 'anchor points') of a subcorpus (including a query result).
NoField is always the last field, so it can be used to determine the range of field type codes. The mnemonic is NoField = "no field" = "number of fields".
The Range object represents a range of corpus positions - for instance, the range enclosed by an instance of an s-attribute.
enum _field_type |
The FieldType object represents the fields (or 'anchor points') of a subcorpus (including a query result).
NoField is always the last field, so it can be used to determine the range of field type codes. The mnemonic is NoField = "no field" = "number of fields".
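The sentinel convention described above can be sketched as follows. This is a minimal illustration, not the CWB declaration: the type name `SketchFieldType` and both helper functions are hypothetical, and the order of the enumerators is an assumption apart from NoField coming last.

```c
/* Hypothetical stand-in for the FieldType enumeration. The enumerator
 * names appear in this reference; their order is ASSUMED, except that
 * NoField comes last. */
typedef enum sketch_field_type {
  MatchField = 0,
  MatchEndField,
  TargetField,
  KeywordField,
  NoField              /* sentinel: its value equals the number of real fields */
} SketchFieldType;

/* Hypothetical helper: is `code` one of the real field type codes? */
int sketch_is_real_field(int code)
{
  return code >= 0 && code < NoField;
}

/* Hypothetical helper: count the real fields by walking every code
 * below the sentinel. */
int sketch_count_fields(void)
{
  int n = 0;
  int ft;

  for (ft = 0; ft < NoField; ft++)
    n++;
  return n;
}
```

Because the sentinel is last, code that loops over field types or sizes a per-field array does not need to change when a new field is added before NoField.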
enum case_mode |
enum corpus_type |
Boolean access_corpus | ( | CorpusList * | cl | ) |
Assesses whether a specified corpus can be accessed.
That is, it makes sure that the data for the corpus in "cl" is loaded and accessible.
cl | A CorpusList specifying the corpus to check. |
References attach_subcorpus(), False, cl::loaded, cl::range, cl::saved, cl::size, SUB, SYSTEM, TEMP, True, and cl::type.
Referenced by catalog_corpus(), change_corpus(), CorpusLoad(), cqi_find_corpus(), cqi_lookup_attribute(), do_cqi_corpus_attributes(), do_cqi_corpus_full_name(), findcorpus(), prepare_AlignmentConstraints(), prepare_Query(), red_factor(), Setop(), SortSubcorpus(), and SortSubcorpusRandomize().
CorpusList* assign_temp_to_sub | ( | CorpusList * | tmp, |
char * | subname | ||
) |
Convert a temporary corpus to a real subcorpus.
assign_temp_to_sub assigns the temporary corpus in *tmp to a "real" subcorpus with name "subname". If such a subcorpus already exists, it is overwritten. The temporary corpus is deleted afterwards. The return value is the new subcorpus (which may be equal to tmp, but not necessarily).
tmp | Temporary corpus to convert. |
subname | Name to use for new subcorpus. |
References cl::abs_fn, auto_save, cl_free, cl_strdup(), cl::corpus, dropcorpus(), False, findcorpus(), initialize_cl(), cl::keywords, cl::loaded, cl::mother_name, cl::mother_size, cl::name, cl::needs_update, cl::query_corpus, cl::query_text, cl::range, cl::registry, save_subcorpus(), cl::saved, cl::size, cl::sortidx, SUB, cl::targets, TEMP, True, cl::type, and UNDEF.
Referenced by CorpusChangeTMPtoSUB(), do_undump(), and in_UnnamedCorpusCommand().
Make a corpus accessible for searching as the "current" corpus.
change_corpus sets the current corpus to the corpus with name "name", first searching SUB corpora, then searching SYSTEM corpora.
When a corpus is "made accessible", its name is checked for validity and availability; if all is OK, set_current_corpus is called on it.
name | A string indicating the name of a corpus. |
silent | Boolean. Ignored. |
References access_corpus(), False, cl::name, search_corpus(), set_current_corpus(), and True.
void check_available_corpora | ( | enum corpus_type | ct | ) |
References load_corpusnames(), LOCAL_CORP_PATH, set_current_corpus(), SUB, SYSTEM, TEMP, and UNDEF.
Referenced by CorpusLoadDescriptors(), execute_side_effects(), and initialize_cqp().
void drop_temp_corpora | ( | void | ) |
Delete temporary corpora.
drop_temp_corpora removes all temporary corpora from the list of corpora.
References corpuslist, dropcorpus(), initialize_cl(), cl::next, TEMP, True, and cl::type.
Referenced by CorpusDiscardTMPCorpora(), do_undump(), in_UnnamedCorpusCommand(), and load_corpusnames().
void dropcorpus | ( | CorpusList * | cl | ) |
Remove a corpus from the global list of corpora.
cl | The corpus to drop. |
References corpuslist, current_corpus, initialize_cl(), cl::next, set_current_corpus(), and True.
Referenced by assign_temp_to_sub(), attach_subcorpus(), copy_intervals(), CorpusDiscard(), do_cqi_cqp_drop_subcorpus(), drop_temp_corpora(), ensure_corpus_size(), and main().
CorpusList* duplicate_corpus | ( | CorpusList * | cl, |
char * | new_name, | ||
Boolean | force_overwrite | ||
) |
Duplicate a corpus via its CorpusList object.
duplicate_corpus creates a copy of an existing corpus and casts its type to SUB. The new corpus is given the name "new_name". If a subcorpus of that name is already present, NULL is returned if force_overwrite is False. If force_overwrite is True, the old corpus is discarded.
cl | The corpus to duplicate |
new_name | Name for the duplicated corpus. |
force_overwrite | Boolean: whether or not to force an overwrite if the subcorpus you are attempting to create already exists. |
References cl::abs_fn, auto_save, cl_malloc(), cl_strdup(), cl::corpus, corpuslist, cqpmessage(), False, initialize_cl(), cl::keywords, cl::loaded, LoadedCorpus(), cl::mother_name, cl::mother_size, cl::name, cl::needs_update, NewCL(), cl::next, cl::query_corpus, cl::query_text, cl::range, cl::registry, save_subcorpus(), cl::saved, cl::size, cl::sortidx, SUB, SYSTEM, cl::targets, True, cl::type, and Warning.
Referenced by copy_intervals(), CorpusDuplicate(), findcorpus(), and in_CorpusCommand().
FieldType field_name_to_type | ( | char * | name | ) |
Returns a FieldType enumeration corresponding to the field name indicated by its string argument.
References KeywordField, MatchEndField, MatchField, NoField, and TargetField.
Referenced by do_cqi_cqp_fdist_1(), do_cqi_cqp_fdist_2(), and labellookup().
char* field_type_to_name | ( | FieldType | field | ) |
Returns a pointer to an internal constant string that labels the FieldType argument.
References cqpmessage(), Error, KeywordField, MatchEndField, MatchField, NoField, and TargetField.
Referenced by do_AnchorPoint(), and prepare_do_subset().
CorpusList* findcorpus | ( | char * | s, |
enum corpus_type | type, | ||
int | try_recursive_search | ||
) |
CorpusList* FirstCorpusFromList | ( | ) |
Gets the CorpusList pointer for the first corpus on the currently-loaded list.
Function for iterating through the list of currently-loaded corpora.
References corpuslist.
Referenced by do_cqi_corpus_list_corpora(), do_cqi_cqp_list_subcorpora(), and main().
void free_corpuslist | ( | void | ) |
Frees the global list of currently-loaded corpora.
This function sets the corpus list to NULL and frees all members of the list.
References corpuslist, initialize_cl(), cl::next, set_current_corpus(), and True.
Referenced by CorpusListFree().
void init_corpuslist | ( | void | ) |
Initialises the global corpus list (sets it to NULL, no matter what its value was).
References set_current_corpus().
Referenced by CorpusListInit().
Boolean is_qualified | ( | char * | corpusname | ) |
Checks whether corpusname is fully qualified (with name of mother corpus); does not imply syntactic validity.
References COLON.
Referenced by CorpusNameQualified(), do_undump(), and in_CorpusCommand().
CorpusList* make_temp_corpus | ( | CorpusList * | cl, |
char * | new_name | ||
) |
Copy a corpus as type TEMP.
make_temp_corpus makes a copy of the corpus in *cl into a corpus of type "TEMP" with name "new_name". If a temporary corpus with that name already exists, it is overwritten.
cl | The corpus to copy. |
new_name | Name for the temporary copy. |
References cl::abs_fn, cl_malloc(), cl_strdup(), cl::corpus, corpuslist, False, findcorpus(), initialize_cl(), cl::keywords, cl::loaded, cl::mother_name, cl::mother_size, cl::name, cl::needs_update, NewCL(), cl::next, cl::query_corpus, cl::query_text, cl::range, cl::registry, cl::saved, cl::size, cl::sortidx, cl::targets, TEMP, True, and cl::type.
Referenced by CorpusDuplicateIntoTMP(), do_setop(), do_undump(), in_UnnamedCorpusCommand(), prepare_do_subset(), and prepare_Query().
CorpusList* NextCorpusFromList | ( | CorpusList * | cl | ) |
Gets the CorpusList pointer for the next corpus on the currently-loaded list.
Function for iterating through the list of currently-loaded corpora.
cl | The current corpus on the list. |
References cl::next.
Referenced by do_cqi_corpus_list_corpora(), do_cqi_cqp_list_subcorpora(), and main().
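The two iterator functions above are meant to be used together in a loop. The sketch below shows that pattern on a simplified stand-in list; `SketchCorpusList`, `sketch_corpuslist`, and the `Sketch...` functions are hypothetical and only mirror the documented behaviour (first entry from the global head, next entry via the `next` member). The NULL check in the "next" function is an assumption.

```c
#include <stddef.h>

/* Simplified stand-in for a CorpusList entry: only the members this
 * sketch needs. The real object has many more fields. */
typedef struct sketch_cl {
  const char *name;
  struct sketch_cl *next;
} SketchCorpusList;

/* Stand-in for the global head-of-list pointer. */
SketchCorpusList *sketch_corpuslist = NULL;

/* Mirrors FirstCorpusFromList(): return the head of the list. */
SketchCorpusList *SketchFirstCorpusFromList(void)
{
  return sketch_corpuslist;
}

/* Mirrors NextCorpusFromList(): follow the next pointer.
 * Returning NULL for a NULL argument is an assumption. */
SketchCorpusList *SketchNextCorpusFromList(SketchCorpusList *cl)
{
  return cl ? cl->next : NULL;
}

/* Typical iteration pattern: visit every entry until NULL is reached. */
int sketch_count_corpora(void)
{
  int n = 0;
  SketchCorpusList *cl;

  for (cl = SketchFirstCorpusFromList(); cl != NULL; cl = SketchNextCorpusFromList(cl))
    n++;
  return n;
}
```

Callers that only need to walk the list can use this pair of functions and never touch the global head pointer directly.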
int NrFieldValues | ( | CorpusList * | cl, |
FieldType | ft | ||
) |
References KeywordField, cl::keywords, MatchField, NoField, cl::size, TargetField, and cl::targets.
Boolean save_subcorpus | ( | CorpusList * | cl, |
char * | fname | ||
) |
References cl::abs_fn, CL_MAX_FILENAME_LENGTH, cqpmessage(), False, cl::keywords, cl::loaded, LOCAL_CORP_PATH, cl::mother_name, cl::name, cl::needs_update, open_file(), cl::range, cl::registry, cl::saved, cl::size, cl::sortidx, SUB, SUBCORPMAGIC, SUBDIR_SEPARATOR, cl::targets, True, cl::type, and Warning.
Referenced by after_CorpusCommand(), assign_temp_to_sub(), copy_intervals(), CorpusSave(), delete_intervals(), do_save(), duplicate_corpus(), and save_unsaved_subcorpora().
void save_unsaved_subcorpora | ( | ) |
References cqpmessage(), False, LOCAL_CORP_PATH, cl::next, save_subcorpus(), cl::saved, SUB, cl::type, and Warning.
Referenced by CorpusSaveAll(), and cqp_parse_file().
int set_current_corpus | ( | CorpusList * | cp, |
int | force | ||
) |
Sets the current corpus (by pointer to the corpus).
Also, executes Xkwic side effects, if necessary.
cp | Pointer to the corpus to set as current. cp may be NULL, which is legal. |
force | If true, the current corpus is set to the specified corpus, even if it is ALREADY set to that corpus. |
References _context_description_block::attributes, CD, cl::corpus, current_corpus, DEFAULT_ATT_NAME, DestroyAttributeList(), FindInAL(), _attlist::list, _attrbuf::next, _attrbuf::status, _context_description_block::strucAttributes, and update_context_descriptor().
Referenced by after_CorpusCommand(), change_corpus(), check_available_corpora(), CorpusSetCurrent(), cqi_activate_corpus(), dropcorpus(), free_corpuslist(), init_corpuslist(), and set_current_corpus_name().
int set_current_corpus_name | ( | char * | name, |
int | force | ||
) |
Sets the current corpus (by name).
Also, executes Xkwic side effects, if necessary.
name | Name of the corpus to set as current. |
force | If true, the current corpus is set to the specified corpus, even if it is ALREADY set to that corpus. |
References findcorpus(), set_current_corpus(), and UNDEF.
Referenced by CorpusSetCurrentByname(), and initialize_cqp().
void show_corpora_files | ( | enum corpus_type | ct | ) |
A function to print out a list of corpora currently available.
"files" is a misnomer; it actually looks on the global list of currently loaded corpora, and prints their names.
Either system corpora (SYSTEM) or subcorpora (SUB) can be shown, depending on ct. If ct is UNDEF, both are shown.
For subcorpora, a bundle of other information is shown too.
ct | Type of corpus to show (SUB, SYSTEM or UNDEF). |
References show_corpora_files1(), SUB, SYSTEM, and UNDEF.
Referenced by CorpusShowNames().
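The selection rule for ct described above (SYSTEM or SUB on request, both for UNDEF) might look like the following sketch. All names here (`ShowType`, `ShowEntry`, the `SH_` constants, and both functions) are hypothetical, the numeric order of the type codes is an assumption, and the sketch counts entries instead of printing them.

```c
#include <stddef.h>

/* Stand-in for enum corpus_type; the numeric order is ASSUMED. */
typedef enum show_type { SH_UNDEF, SH_SYSTEM, SH_SUB, SH_TEMP } ShowType;

/* Minimal stand-in for a list entry. */
typedef struct show_entry {
  const char *name;
  ShowType type;
  struct show_entry *next;
} ShowEntry;

/* Should an entry of type `have` be listed when the caller asked for `ct`?
 * SH_UNDEF selects both system corpora and subcorpora. */
int sketch_should_show(ShowType have, ShowType ct)
{
  if (ct == SH_UNDEF)
    return have == SH_SYSTEM || have == SH_SUB;
  return (ct == SH_SYSTEM || ct == SH_SUB) && have == ct;
}

/* Count the entries that would be listed for a given request. */
int sketch_count_shown(const ShowEntry *head, ShowType ct)
{
  int n = 0;
  const ShowEntry *e;

  for (e = head; e != NULL; e = e->next)
    if (sketch_should_show(e->type, ct))
      n++;
  return n;
}
```

In this sketch temporary corpora are never listed, which follows from the description naming only SUB, SYSTEM and UNDEF as valid values of ct.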
char* split_subcorpus_name | ( | char * | corpusname, |
char * | mother_name | ||
) |
Splits a query result corpus-name into qualifier and local name.
This function splits the query result name {corpusname} into a qualifier (the name of the mother corpus) and a local name. It returns a pointer to the local name part, or NULL if {corpusname} is not syntactically valid. If mother_name is not NULL, it must point to a buffer of suitable length (CL_MAX_LINE_LENGTH is sufficient) where the qualifier will be stored. For an unqualified corpus name, the qualifier is the empty string and the return value == {corpusname}.
References COLON.
Referenced by do_undump(), and valid_subcorpus_name().
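The splitting contract described above can be illustrated with a small stand-alone function. This is a sketch only: `sketch_split_name` and `SKETCH_COLON` are hypothetical, the separator is assumed to be a colon, and the validity rule used here (non-empty parts, at most one separator) is a simplifying assumption, not the check CQP performs.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_COLON ':'   /* ASSUMED separator between mother corpus and local name */

/* Split "MOTHER:Local" into qualifier and local name.
 * Returns a pointer to the local name inside corpusname, or NULL if the
 * name fails the (assumed) validity rule. If mother_name is not NULL,
 * the qualifier is copied into it; it is the empty string for an
 * unqualified name, in which case the return value is corpusname. */
char *sketch_split_name(char *corpusname, char *mother_name)
{
  char *colon;

  if (corpusname == NULL || *corpusname == '\0')
    return NULL;                      /* assumed: empty names are invalid */

  colon = strchr(corpusname, SKETCH_COLON);
  if (colon == NULL) {
    if (mother_name != NULL)
      mother_name[0] = '\0';          /* unqualified: empty qualifier */
    return corpusname;
  }

  /* assumed rule: non-empty qualifier, non-empty local name, one separator */
  if (colon == corpusname || colon[1] == '\0' ||
      strchr(colon + 1, SKETCH_COLON) != NULL)
    return NULL;

  if (mother_name != NULL) {
    size_t len = (size_t)(colon - corpusname);
    memcpy(mother_name, corpusname, len);
    mother_name[len] = '\0';
  }
  return colon + 1;
}
```

Under these assumptions a name such as "DICKENS:Last" yields the qualifier "DICKENS" and the local name "Last", while plain "Last" is returned unchanged with an empty qualifier. The caller owns the mother_name buffer and must make it large enough, as the description requires.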
int touch_corpus | ( | CorpusList * | cp | ) |
Touches a corpus, i.e., marks it as changed.
cp | The corpus to touch. This must be of type SUB. |
References cl::needs_update, cl::saved, SUB, and cl::type.
Referenced by CorpusTouch(), delete_intervals(), do_cut(), evaluate_target(), findcorpus(), RangeSetop(), set_target(), SortSubcorpus(), and SortSubcorpusRandomize().
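Judging from the members referenced above (needs_update, saved, type), "touching" plausibly amounts to flipping two flags on a SUB corpus. The sketch below shows that idea on a stand-in structure; `TouchType`, `TouchCorpus`, and `sketch_touch_corpus` are hypothetical, and the exact flag values and return codes are assumptions, not taken from the CWB source.

```c
#include <stddef.h>

/* Stand-in for enum corpus_type; the numeric order is ASSUMED. */
typedef enum touch_type { TK_UNDEF, TK_SYSTEM, TK_SUB, TK_TEMP } TouchType;

/* Stand-in for the CorpusList members involved in touching. */
typedef struct touch_corpus {
  TouchType type;
  int saved;         /* nonzero: the copy on disk is current */
  int needs_update;  /* nonzero: the corpus changed since it was last saved */
} TouchCorpus;

/* Mark a subcorpus as changed.
 * ASSUMED behaviour: only SUB corpora can be touched; returns 1 on
 * success and 0 if cp is NULL or not of type SUB. */
int sketch_touch_corpus(TouchCorpus *cp)
{
  if (cp == NULL || cp->type != TK_SUB)
    return 0;
  cp->saved = 0;
  cp->needs_update = 1;
  return 1;
}
```

Keeping "saved" and "needs_update" as separate flags lets a later save routine decide which subcorpora still have to be written out.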
Boolean valid_subcorpus_name | ( | char * | corpusname | ) |
Checks whether corpusname is syntactically valid as a query result name.
References False, split_subcorpus_name(), and True.
Referenced by do_undump().
Global pointer to the head of CQP's linked list of corpora.
Global pointer to the "current" corpus.
Referenced by cqp_parse_file(), do_attribute_show(), do_MUQuery(), do_StandardQuery(), do_undump(), dropcorpus(), execute_side_effects(), in_CorpusCommand(), LoadedCorpus(), prepare_Query(), and set_current_corpus().