CWB

corpmanag.h File Reference

#include "../cl/corpus.h"
#include "../cl/bitfields.h"
#include "cqp.h"
#include "context_descriptor.h"


Typedef Documentation

typedef struct cl CorpusList

The CorpusList object records information on a corpus that CQP recognises.

This might be an actual corpus in the registry, or a subcorpus, or a query result.

Note that CorpusList is a bit of a misnomer. It IS a linked-list entry object, and CQP keeps information on the "corpora" it currently knows about on such a list; however, this object is also often used for passing around information about individual corpora, queries etc.

typedef enum corpus_type CorpusType

The CorpusType object: an identifier for each "type" of corpus CQP can deal with (see enum corpus_type).

typedef enum _field_type FieldType

The FieldType object represents the fields (or "anchor points") of a subcorpus (including a query result).

NoField is always the last field, so it can be used to determine the range of field type codes. The mnemonic is NoField = "no field" = "number of fields".

typedef struct _Range Range

The Range object represents a range of corpus positions - for instance, the range enclosed by an instance of an s-attribute.
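
As a rough illustration of what such a range might look like in code, here is a minimal sketch. The field names start and end, the use of int for corpus positions, and the inclusive end bound are all assumptions of this sketch; the actual members of _Range are not listed in this section. The helpers range_length and range_contains are hypothetical.

```c
/* Assumed layout: two corpus positions. The field names "start" and
 * "end" and the inclusive end bound are assumptions of this sketch. */
typedef struct _Range {
  int start;
  int end;
} Range;

/* Number of corpus positions covered, assuming both ends are inclusive. */
int range_length(Range r)
{
  return r.end - r.start + 1;
}

/* Nonzero if corpus position cpos lies inside r (inclusive bounds). */
int range_contains(Range r, int cpos)
{
  return cpos >= r.start && cpos <= r.end;
}
```

Under these assumptions, a sentence region spanning positions 10 to 14 would have length 5.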


Enumeration Type Documentation

enum _field_type

The FieldType object represents the fields (or "anchor points") of a subcorpus (including a query result).

NoField is always the last field, so it can be used to determine the range of field type codes. The mnemonic is NoField = "no field" = "number of fields".

Enumerator:
MatchField 
MatchEndField 
TargetField 
KeywordField 
NoField 
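
The "number of fields" mnemonic can be sketched as follows. The enum mirrors the enumerator order listed above; count_set_fields is a hypothetical helper that is not part of this header, shown only to illustrate NoField sizing an array and bounding a loop.

```c
/* Mirrors the enumerator order listed above. Plain C enums start at 0
 * and increase by 1, so NoField equals the number of real fields. */
typedef enum _field_type {
  MatchField, MatchEndField, TargetField, KeywordField, NoField
} FieldType;

/* Hypothetical helper: count how many fields are flagged as set in a
 * per-field flag array. NoField sizes the array and bounds the loop. */
int count_set_fields(const int is_set[NoField])
{
  int n = 0;
  for (int f = MatchField; f < NoField; f++)
    if (is_set[f])
      n++;
  return n;
}
```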

enum case_mode

An enumeration for lowercase vs. uppercase mode.

Enumerator:
LOWER 
UPPER 

enum corpus_type

Underlying enumeration for CorpusType.

Gives an identifier for each "type" of corpus CQP can deal with.

Enumerator:
UNDEF 

undefined status

SYSTEM 

system corpus, i.e. registered corpus

SUB 

subcorpus which was generated by a query

TEMP 

temporary subcorpus, deleted after query

ALL 

Function Documentation

Boolean access_corpus ( CorpusList *  cl)

Assesses whether a specified corpus can be accessed.

That is, it makes sure that the data for the corpus in "cl" is loaded and accessible.

Parameters:
cl  A CorpusList specifying the corpus to check.
Returns:
A boolean - true if cl can be accessed.

References attach_subcorpus(), False, cl::loaded, cl::range, cl::saved, cl::size, SUB, SYSTEM, TEMP, True, and cl::type.

Referenced by catalog_corpus(), change_corpus(), CorpusLoad(), cqi_find_corpus(), cqi_lookup_attribute(), do_cqi_corpus_attributes(), do_cqi_corpus_full_name(), findcorpus(), prepare_AlignmentConstraints(), prepare_Query(), red_factor(), Setop(), SortSubcorpus(), and SortSubcorpusRandomize().

CorpusList* assign_temp_to_sub ( CorpusList *  tmp,
char *  subname 
)

Convert a temporary corpus to a real subcorpus.

assign_temp_to_sub assigns the temporary corpus in *tmp to a "real" subcorpus with name "subname". If such a subcorpus already exists, it is overwritten. The temporary corpus is deleted afterwards. The return value is the new subcorpus (which may be equal to tmp, but not necessarily).

Parameters:
tmp      Temporary corpus to convert.
subname  Name to use for new subcorpus.
Returns:
Pointer to new subcorpus.

References cl::abs_fn, auto_save, cl_free, cl_strdup(), cl::corpus, dropcorpus(), False, findcorpus(), initialize_cl(), cl::keywords, cl::loaded, cl::mother_name, cl::mother_size, cl::name, cl::needs_update, cl::query_corpus, cl::query_text, cl::range, cl::registry, save_subcorpus(), cl::saved, cl::size, cl::sortidx, SUB, cl::targets, TEMP, True, cl::type, and UNDEF.

Referenced by CorpusChangeTMPtoSUB(), do_undump(), and in_UnnamedCorpusCommand().

Boolean change_corpus ( char *  name,
Boolean  silent 
)

Make a corpus accessible for searching as the "current" corpus.

change_corpus sets the current corpus to the corpus with name "name", first searching SUB corpora, then searching SYSTEM corpora.

When a corpus is "made accessible", its name is checked for validity and availability; if all is OK, set_current_corpus is called on it.

Parameters:
name    A string indicating the name of a corpus.
silent  Boolean. Ignored.
Returns:
Boolean. True if the corpus was set successfully, otherwise false.

References access_corpus(), False, cl::name, search_corpus(), set_current_corpus(), and True.
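
The lookup order described above (SUB corpora first, then SYSTEM corpora) can be sketched on a simplified list. The struct below is a hypothetical, cut-down stand-in for CorpusList; find_by_type and lookup_sub_then_system are illustrative names, not functions from this header, and the case-sensitive name comparison is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Same enumerator order as the corpus_type enumeration in this file. */
typedef enum corpus_type { UNDEF, SYSTEM, SUB, TEMP, ALL } CorpusType;

/* Hypothetical, cut-down stand-in for the real CorpusList. */
typedef struct cl {
  const char *name;
  CorpusType type;
  struct cl *next;
} CorpusList;

/* Return the first entry with the given name and type, or NULL. */
static CorpusList *find_by_type(CorpusList *head, const char *name, CorpusType type)
{
  for (CorpusList *cl = head; cl != NULL; cl = cl->next)
    if (cl->type == type && strcmp(cl->name, name) == 0)
      return cl;
  return NULL;
}

/* Look among SUB corpora first, then fall back to SYSTEM corpora. */
CorpusList *lookup_sub_then_system(CorpusList *head, const char *name)
{
  CorpusList *cl = find_by_type(head, name, SUB);
  return cl != NULL ? cl : find_by_type(head, name, SYSTEM);
}
```

With this order, a subcorpus that shares its name with a system corpus is the one that gets selected.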

void check_available_corpora ( enum corpus_type  ct)

void drop_temp_corpora ( void  )

Delete temporary corpora.

drop_temp_corpora removes all temporary corpora from the list of corpora.

References corpuslist, dropcorpus(), initialize_cl(), cl::next, TEMP, True, and cl::type.

Referenced by CorpusDiscardTMPCorpora(), do_undump(), in_UnnamedCorpusCommand(), and load_corpusnames().
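
Removing every TEMP entry from a singly linked list could look roughly like the sketch below. It is not the CQP implementation (which, per the references above, goes through dropcorpus()); the struct is a hypothetical cut-down CorpusList, and drop_temp_sketch takes the list head as a parameter instead of using the global list.

```c
#include <stdlib.h>

/* Same enumerator order as the corpus_type enumeration in this file. */
typedef enum corpus_type { UNDEF, SYSTEM, SUB, TEMP, ALL } CorpusType;

/* Hypothetical, cut-down stand-in for the real CorpusList. */
typedef struct cl {
  CorpusType type;
  struct cl *next;
} CorpusList;

/* Unlink and free every TEMP entry; returns the number removed.
 * The head is passed by address so the first entry can be dropped too.
 * Entries are assumed to have been allocated with malloc(). */
int drop_temp_sketch(CorpusList **head)
{
  int dropped = 0;
  CorpusList **link = head;
  while (*link != NULL) {
    if ((*link)->type == TEMP) {
      CorpusList *victim = *link;
      *link = victim->next;   /* bypass the entry before freeing it */
      free(victim);
      dropped++;
    }
    else
      link = &(*link)->next;
  }
  return dropped;
}
```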

void dropcorpus ( CorpusList *  cl)

Remove a corpus from the global list of corpora.

See also:
corpuslist
Parameters:
cl  The corpus to drop.

References corpuslist, current_corpus, initialize_cl(), cl::next, set_current_corpus(), and True.

Referenced by assign_temp_to_sub(), attach_subcorpus(), copy_intervals(), CorpusDiscard(), do_cqi_cqp_drop_subcorpus(), drop_temp_corpora(), ensure_corpus_size(), and main().

CorpusList* duplicate_corpus ( CorpusList *  cl,
char *  new_name,
Boolean  force_overwrite 
)

Duplicate a corpus via its CorpusList object.

duplicate_corpus creates a copy of an existing corpus and sets its type to SUB. The new corpus is given the name "new_name". If a subcorpus of that name is already present, NULL is returned if force_overwrite is False. If force_overwrite is True, the old corpus is discarded.

Parameters:
cl               The corpus to duplicate.
new_name         Name for the duplicated corpus.
force_overwrite  Boolean: whether or not to force an overwrite if the subcorpus you are attempting to create already exists.
Returns:
NULL if you attempted to overwrite with force_overwrite == false. Otherwise, a CorpusList pointer to the new corpus.

References cl::abs_fn, auto_save, cl_malloc(), cl_strdup(), cl::corpus, corpuslist, cqpmessage(), False, initialize_cl(), cl::keywords, cl::loaded, LoadedCorpus(), cl::mother_name, cl::mother_size, cl::name, cl::needs_update, NewCL(), cl::next, cl::query_corpus, cl::query_text, cl::range, cl::registry, save_subcorpus(), cl::saved, cl::size, cl::sortidx, SUB, SYSTEM, cl::targets, True, cl::type, and Warning.

Referenced by copy_intervals(), CorpusDuplicate(), findcorpus(), and in_CorpusCommand().

FieldType field_name_to_type ( char *  name)

Returns a FieldType enumeration corresponding to the field name indicated by its string argument.

References KeywordField, MatchEndField, MatchField, NoField, and TargetField.

Referenced by do_cqi_cqp_fdist_1(), do_cqi_cqp_fdist_2(), and labellookup().

char* field_type_to_name ( FieldType  field)

Returns a pointer to an internal constant string that labels the FieldType argument.

References cqpmessage(), Error, KeywordField, MatchEndField, MatchField, NoField, and TargetField.

Referenced by do_AnchorPoint(), and prepare_do_subset().
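
The two conversions can be pictured as a table lookup in both directions. In the sketch below, the label strings ("match", "matchend", "target", "keyword"), the case-sensitive comparison, returning NoField for an unknown name, and returning NULL for NoField are all assumptions of this sketch; none of them is stated in this section. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum _field_type {
  MatchField, MatchEndField, TargetField, KeywordField, NoField
} FieldType;

/* Assumed labels, indexed by field type code. */
static const char *const field_labels[NoField] = {
  "match", "matchend", "target", "keyword"
};

/* Name to type; an unknown name yields NoField (assumed convention). */
FieldType name_to_field_sketch(const char *name)
{
  for (int f = MatchField; f < NoField; f++)
    if (strcmp(name, field_labels[f]) == 0)
      return (FieldType)f;
  return NoField;
}

/* Type to constant label; NULL for NoField or out-of-range codes. */
const char *field_to_name_sketch(FieldType field)
{
  int f = (int)field;
  return (f >= 0 && f < NoField) ? field_labels[f] : NULL;
}
```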

CorpusList* findcorpus ( char *  s,
enum corpus_type  type,
int  try_recursive_search 
)

CorpusList* FirstCorpusFromList ( )

Gets the CorpusList pointer for the first corpus on the currently-loaded list.

Function for iterating through the list of currently-loaded corpora.

Returns:
The requested CorpusList pointer.

References corpuslist.

Referenced by do_cqi_corpus_list_corpora(), do_cqi_cqp_list_subcorpora(), and main().

void free_corpuslist ( void  )

Frees the global list of currently-loaded corpora.

This function sets the corpus list to NULL and frees all members of the list.

References corpuslist, initialize_cl(), cl::next, set_current_corpus(), and True.

Referenced by CorpusListFree().

void init_corpuslist ( void  )

Initialises the global corpus list (sets it to NULL, no matter what its value was).

References set_current_corpus().

Referenced by CorpusListInit().

Boolean is_qualified ( char *  corpusname)

Checks whether corpusname is fully qualified (with name of mother corpus); does not imply syntactic validity.

References COLON.

Referenced by CorpusNameQualified(), do_undump(), and in_CorpusCommand().
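
A qualification check of this kind can be as small as a search for the separator. The sketch assumes that the COLON referenced above is the character ':' and that a qualified name has the form "MOTHER:local"; is_qualified_sketch is a hypothetical name.

```c
#include <string.h>

/* Nonzero if the name contains the assumed separator ':'.
 * No further validation is done, so malformed names such as "A:B:C"
 * still count as qualified. */
int is_qualified_sketch(const char *corpusname)
{
  return strchr(corpusname, ':') != NULL;
}
```

This mirrors the caveat in the description: being qualified says nothing about whether the name is otherwise well formed.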

CorpusList* make_temp_corpus ( CorpusList *  cl,
char *  new_name 
)

Copy a corpus as type TEMP.

make_temp_corpus makes a copy of the corpus in *cl into a corpus of type "TEMP" with name "new_name". If a temporary corpus with that name already exists, it is overwritten.

Parameters:
cl        The corpus to copy.
new_name  Name for the temporary copy.
Returns:
NULL for error. Otherwise, a CorpusList pointer to the new corpus.

References cl::abs_fn, cl_malloc(), cl_strdup(), cl::corpus, corpuslist, False, findcorpus(), initialize_cl(), cl::keywords, cl::loaded, cl::mother_name, cl::mother_size, cl::name, cl::needs_update, NewCL(), cl::next, cl::query_corpus, cl::query_text, cl::range, cl::registry, cl::saved, cl::size, cl::sortidx, cl::targets, TEMP, True, and cl::type.

Referenced by CorpusDuplicateIntoTMP(), do_setop(), do_undump(), in_UnnamedCorpusCommand(), prepare_do_subset(), and prepare_Query().

CorpusList* NextCorpusFromList ( CorpusList *  cl)

Gets the CorpusList pointer for the next corpus on the currently-loaded list.

Function for iterating through the list of currently-loaded corpora.

Parameters:
cl  The current corpus on the list.
Returns:
The requested CorpusList pointer.

References cl::next.

Referenced by do_cqi_corpus_list_corpora(), do_cqi_cqp_list_subcorpora(), and main().
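
Taken together, FirstCorpusFromList and NextCorpusFromList support the usual linked-list loop. The sketch below uses hypothetical stand-ins so that it compiles on its own: a cut-down CorpusList, a local corpuslist head, and minimal versions of the two functions. Returning NULL when NextCorpusFromList is given NULL is an assumption, and count_corpora is an illustrative helper.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the real CorpusList; only the
 * fields this sketch needs are modelled. */
typedef struct cl {
  const char *name;
  struct cl *next;
} CorpusList;

/* Stand-in for the global list head. */
CorpusList *corpuslist = NULL;

/* Stand-ins following the documented behaviour: list head, then next. */
CorpusList *FirstCorpusFromList(void) { return corpuslist; }
CorpusList *NextCorpusFromList(CorpusList *cl) { return cl ? cl->next : NULL; }

/* Walk the whole list and count its entries. */
int count_corpora(void)
{
  int n = 0;
  for (CorpusList *cl = FirstCorpusFromList(); cl != NULL; cl = NextCorpusFromList(cl))
    n++;
  return n;
}
```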

int NrFieldValues ( CorpusList *  cl,
FieldType  ft 
)

Boolean save_subcorpus ( CorpusList *  cl,
char *  fname 
)

void save_unsaved_subcorpora ( )

int set_current_corpus ( CorpusList *  cp,
int  force 
)

Sets the current corpus (by pointer to the corpus).

Also, executes Xkwic side effects, if necessary.

Parameters:
cp     Pointer to the corpus to set as current. cp may be NULL, which is legal.
force  If true, the current corpus is set to the specified corpus, even if it is ALREADY set to that corpus.
Returns:
Always 1.

References _context_description_block::attributes, CD, cl::corpus, current_corpus, DEFAULT_ATT_NAME, DestroyAttributeList(), FindInAL(), _attlist::list, _attrbuf::next, _attrbuf::status, _context_description_block::strucAttributes, and update_context_descriptor().

Referenced by after_CorpusCommand(), change_corpus(), check_available_corpora(), CorpusSetCurrent(), cqi_activate_corpus(), dropcorpus(), free_corpuslist(), init_corpuslist(), and set_current_corpus_name().

int set_current_corpus_name ( char *  name,
int  force 
)

Sets the current corpus (by name).

Also, executes Xkwic side effects, if necessary.

Parameters:
name   Name of the corpus to set as current.
force  If true, the current corpus is set to the specified corpus, even if it is ALREADY set to that corpus.
Returns:
True if the corpus was found and set; false if the corpus could not be found.

References findcorpus(), set_current_corpus(), and UNDEF.

Referenced by CorpusSetCurrentByname(), and initialize_cqp().

void show_corpora_files ( enum corpus_type  ct)

A function to print out a list of corpora currently available.

"files" is a misnomer; it actually looks on the global list of currently loaded corpora, and prints their names.

Either system corpora (SYSTEM) or subcorpora (SUB) can be shown, depending on ct. If ct is UNDEF, both are shown.

For subcorpora, a bundle of other information is shown too.

Parameters:
ct  Type of corpus to show (SUB, SYSTEM or UNDEF).

References show_corpora_files1(), SUB, SYSTEM, and UNDEF.

Referenced by CorpusShowNames().

char* split_subcorpus_name ( char *  corpusname,
char *  mother_name 
)

Splits a query result corpus-name into qualifier and local name.

This function splits the query result name corpusname into a qualifier (the name of the mother corpus) and a local name. It returns a pointer to the local name part, or NULL if corpusname is not syntactically valid. If mother_name is not NULL, it must point to a buffer of suitable length (CL_MAX_LINE_LENGTH is sufficient) where the qualifier will be stored. For an unqualified corpus name, the qualifier is the empty string and the return value is equal to corpusname.

References COLON.

Referenced by do_undump(), and valid_subcorpus_name().
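
The contract described above can be illustrated with a simplified sketch. It assumes the separator (COLON) is ':' and uses a deliberately loose notion of validity (at most one separator, non-empty parts), because the real validity rules are not given in this section. split_name_sketch is a hypothetical name, and the caller is assumed to supply a sufficiently large buffer.

```c
#include <stddef.h>
#include <string.h>

/* Simplified sketch: "valid" here only means at most one ':' and a
 * non-empty part on each side of it. Returns the local name part, or
 * NULL if the name is rejected. */
const char *split_name_sketch(const char *corpusname, char *mother_name)
{
  const char *colon = strchr(corpusname, ':');

  if (colon == NULL) {                  /* unqualified name */
    if (*corpusname == '\0')
      return NULL;
    if (mother_name != NULL)
      mother_name[0] = '\0';            /* empty qualifier */
    return corpusname;                  /* same pointer as the input */
  }
  /* qualified name: reject empty parts and a second ':' */
  if (colon == corpusname || colon[1] == '\0' || strchr(colon + 1, ':') != NULL)
    return NULL;
  if (mother_name != NULL) {
    size_t len = (size_t)(colon - corpusname);
    memcpy(mother_name, corpusname, len);
    mother_name[len] = '\0';
  }
  return colon + 1;
}
```

Passing NULL for mother_name skips the copy, which makes the same routine usable as a pure validity check.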

int touch_corpus ( CorpusList *  cp)

Touches a corpus, i.e. marks it as changed.

Parameters:
cp  The corpus to touch. This must be of type SUB.
Returns:
Boolean: true if the touch worked, otherwise false.

References cl::needs_update, cl::saved, SUB, and cl::type.

Referenced by CorpusTouch(), delete_intervals(), do_cut(), evaluate_target(), findcorpus(), RangeSetop(), set_target(), SortSubcorpus(), and SortSubcorpusRandomize().
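
Judging from the members referenced above, "touching" plausibly amounts to flipping two flags on a SUB corpus. The sketch below makes that assumption explicit: clearing saved and setting needs_update is a guess at the behaviour, the struct is a hypothetical cut-down CorpusList, and touch_sketch is an illustrative name.

```c
#include <stddef.h>

/* Same enumerator order as the corpus_type enumeration in this file. */
typedef enum corpus_type { UNDEF, SYSTEM, SUB, TEMP, ALL } CorpusType;

/* Hypothetical, cut-down stand-in for the real CorpusList. */
typedef struct cl {
  CorpusType type;
  int saved;          /* nonzero if the on-disk copy is current */
  int needs_update;   /* nonzero if the corpus has changed */
} CorpusList;

/* Mark a SUB corpus as changed; returns 1 on success, 0 otherwise. */
int touch_sketch(CorpusList *cp)
{
  if (cp == NULL || cp->type != SUB)
    return 0;
  cp->saved = 0;          /* assumed: no longer matches the saved copy */
  cp->needs_update = 1;   /* assumed: flagged for a later save */
  return 1;
}
```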

Boolean valid_subcorpus_name ( char *  corpusname)

Checks whether corpusname is syntactically valid as a query result name.

References False, split_subcorpus_name(), and True.

Referenced by do_undump().


Variable Documentation

CorpusList* corpuslist

Global pointer to the head of CQP's linked list of corpora.