CWB

attributes.h File Reference

#include "globals.h"
#include "storage.h"
#include "corpus.h"

Define Documentation

#define ATTS_LOCAL   1
#define ATTS_NONE   0
#define COMMON_ATTR_FIELDS
Value:
int type;                          \
char *name;                        \
union _Attribute *next;            \
int attr_number;                   \
char *path;                        \
                                   \
struct TCorpus *mother;            \
Component *components[CompLast]

Members found in ALL the different types of Attribute object.
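COMMON_ATTR_FIELDS gives every attribute struct the same leading members, so a union of those structs (this page's references to _Attribute::any, _Attribute::pos and _Attribute::struc point to such a union) can be read through a generic view whatever the concrete type is. The sketch below illustrates that pattern with simplified types. Every `demo_*`/`DEMO_*` name, the member selection and the type codes are hypothetical, not the real CWB definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute type codes (not the real CWB values). */
#define DEMO_ATT_POS   1
#define DEMO_ATT_STRUC 2

union demo_attribute;   /* forward declaration for the list link */

/* Hypothetical, shortened stand-in for COMMON_ATTR_FIELDS: like the
   original, the value has no trailing semicolon; users add it. */
#define DEMO_COMMON_FIELDS \
  int type;                \
  char *name;              \
  union demo_attribute *next

typedef struct { DEMO_COMMON_FIELDS; } demo_any_attr;

typedef struct {
  DEMO_COMMON_FIELDS;
  int hc_loaded;              /* hypothetical positional-only member */
} demo_pos_attr;

typedef struct {
  DEMO_COMMON_FIELDS;
  int has_attribute_values;   /* hypothetical structural-only member */
} demo_struc_attr;

/* All variants share a common initial sequence, so the 'any' view
   may read type/name/next whichever variant was stored. */
typedef union demo_attribute {
  demo_any_attr   any;
  demo_pos_attr   pos;
  demo_struc_attr struc;
} demo_attribute;

/* Reads the shared 'name' member without knowing the concrete type. */
const char *demo_attr_name(const demo_attribute *a)
{
  assert(a != NULL);
  return a->any.name;
}
```

Code that only needs the shared members (type, name, list link) can walk a mixed list of attributes through `any` without first switching on the type; C permits this for structs in a union that share a common initial sequence.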

#define DEFAULT_ATT_NAME   "word"
#define MAXCODELEN   32

The maximum length of a single code, which is also the number of possible code lengths.

Referenced by compute_code_lengths(), load_component(), ReadHCD(), and WriteHCD().

#define SYNCHRONIZATION   128

Typedef Documentation

typedef struct TComponent Component

The Component object.

A "component" is one of the data-chunks on disk that make up a CWB corpus. Each corpus attribute (of whatever kind) consists of an array (vector) of components, along with some other fields dependent on what type of attribute it is.

See also:
ComponentID
Attribute
_Attribute

ComponentID: index for the array of components in each Attribute object.

ComponentState: possible states for an attribute component.

typedef struct _DynArg DynArg

The DynArg object contains an argument for a dynamic attribute.

typedef struct _huffman_code_descriptor HCD

A Huffman Code Descriptor block (HCD) for Huffman compressed sequences.
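The HCD fields referenced elsewhere on this page (min_codelen, max_codelen, lcount, min_code, symindex, symbols) match the usual layout of a canonical Huffman decoder. The sketch below shows that general technique. It is an assumption that CWB's HCD uses exactly these semantics; the `demo_hcd` struct and its reading of the fields (`lcount[l]` = number of codes of length `l`, `min_code[l]` = smallest code value of that length) are hypothetical.

```c
#include <assert.h>

#define DEMO_MAXCODELEN 32   /* mirrors MAXCODELEN */

/* Hypothetical, simplified code descriptor (not the real HCD layout). */
typedef struct {
  int max_codelen;                 /* longest code length in use      */
  int lcount[DEMO_MAXCODELEN];     /* number of codes of length l     */
  int min_code[DEMO_MAXCODELEN];   /* smallest code value of length l */
  int symindex[DEMO_MAXCODELEN];   /* index of first symbol of len l  */
  const int *symbols;              /* symbols grouped by code length  */
} demo_hcd;

/* Fills min_code[] and symindex[] from lcount[] using the canonical
   assignment in which longer codes get numerically smaller values. */
void demo_hcd_setup(demo_hcd *h)
{
  int l, next_index = 0;
  assert(h->max_codelen > 0 && h->max_codelen < DEMO_MAXCODELEN);
  h->min_code[h->max_codelen] = 0;
  for (l = h->max_codelen - 1; l >= 1; l--)
    /* round up so no code of length l is a prefix of a longer code */
    h->min_code[l] = (h->min_code[l + 1] + h->lcount[l + 1] + 1) / 2;
  for (l = 1; l <= h->max_codelen; l++) {
    h->symindex[l] = next_index;
    next_index += h->lcount[l];
  }
}

/* Decodes one symbol from bits[] (values 0/1), starting at *pos and
   advancing it. Assumes well-formed input. */
int demo_hcd_decode(const demo_hcd *h, const int *bits, int *pos)
{
  int l = 1;
  int v = bits[(*pos)++];
  while (v < h->min_code[l]) {   /* code is longer than l bits */
    v = 2 * v + bits[(*pos)++];
    l++;
  }
  return h->symbols[h->symindex[l] + v - h->min_code[l]];
}
```

No code tree is needed: each step compares the bits read so far with one threshold per length, and a bound such as MAXCODELEN keeps the per-length arrays small.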


Enumeration Type Documentation

ComponentState: possible states for an attribute component.

Enumerator:
ComponentLoaded 

valid and loaded

ComponentUnloaded 

valid and on disk

ComponentDefined 

valid but not yet created

ComponentUndefined 

invalid
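The four states can be read as answers to two questions: is the component's data in memory, and does its file exist on disk? The sketch below shows one plausible way to derive them. It assumes a hypothetical `demo_component` with only `data` and `path` members and uses POSIX `stat()` to test for the file; it illustrates the state semantics and is not CWB's comp_component_state().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

typedef enum {
  DemoLoaded,     /* valid and loaded          */
  DemoUnloaded,   /* valid and on disk         */
  DemoDefined,    /* valid but not yet created */
  DemoUndefined   /* invalid                   */
} demo_state;

/* Hypothetical component: only the members this sketch needs. */
typedef struct {
  void *data;        /* non-NULL once the file has been read in */
  const char *path;  /* location of the data file, if declared  */
} demo_component;

demo_state demo_component_state(const demo_component *comp)
{
  struct stat st;
  if (comp == NULL || comp->path == NULL)
    return DemoUndefined;   /* nothing known about it        */
  if (comp->data != NULL)
    return DemoLoaded;      /* already in memory             */
  if (stat(comp->path, &st) == 0)
    return DemoUnloaded;    /* file exists but is not loaded */
  return DemoDefined;       /* declared, file not created    */
}
```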

ComponentID: index for the array of components in each Attribute object.

Enumerator:
CompDirectory 

the directory where an attribute is stored

CompCorpus 

the sequence of word IDs

CompRevCorpus 

reversed file of corpus

CompRevCorpusIdx 

index to reversed file

CompCorpusFreqs 

absolute frequencies of corpus

CompLexicon 

wordlist

CompLexiconIdx 

index to wordlist

CompLexiconSrt 

sorted index to wordlist

CompAlignData 

alignment data

CompXAlignData 

extended alignment attribute

CompStrucData 

structure data

CompStrucAVS 

structure attribute values

CompStrucAVX 

structure attribute value index

CompHuffSeq 

Huffman compressed item sequence.

CompHuffCodes 

Code descriptor data for CompHuffSeq.

CompHuffSync 

Synchronisation data for the Huffman compressed item sequence (CompHuffSeq).

CompCompRF 

compressed reversed file (CompRevCorpus)

CompCompRFX 

index for CompCompRF (substitutes for CompRevCorpusIdx)

CompLast 

MUST BE THE LAST ELEMENT OF THIS ENUM -- it is used for limiting loops on component arrays.
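Because CompLast is the final enumerator, its value equals the number of real component IDs (assuming the enum starts at 0 with no gaps), so it can both size the components array, as in COMMON_ATTR_FIELDS, and bound loops over it. A minimal sketch with a shortened, hypothetical enum and `demo_*` types:

```c
#include <assert.h>
#include <stddef.h>

/* Shortened, hypothetical version of the ComponentID enum. */
typedef enum {
  DemoCompDirectory,
  DemoCompCorpus,
  DemoCompLexicon,
  DemoCompLexiconIdx,
  DemoCompLast        /* must stay last: equals the number of IDs */
} demo_cid;

typedef struct { int loaded; } demo_component;

typedef struct {
  /* sized by the sentinel, like components[CompLast] */
  demo_component *components[DemoCompLast];
} demo_attribute;

/* Counts declared components that are loaded. The loop is bounded by
   the sentinel, so adding a new ID before it needs no other change. */
int demo_count_loaded(const demo_attribute *a)
{
  int cid, n = 0;
  for (cid = 0; cid < DemoCompLast; cid++)
    if (a->components[cid] != NULL && a->components[cid]->loaded)
      n++;
  return n;
}
```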


Function Documentation

char* cid_name ( ComponentID  cid)

Gets a string containing the name of the attribute component with the specified ID-code.

References find_cid_id(), and component_field_spec::name.

Referenced by create_component(), declare_component(), describe_component(), ensure_component(), load_component(), makeall_do_attribute(), makeall_make_component(), and validate_revcorp().

ComponentState comp_component_state ( Component *  comp)
int comp_drop_component ( Component *  comp)

Delete a Component object.

The specified component object, and all memory associated with it, is freed.

Returns:
Always 1.

References _Attribute::any, TComponent::attribute, cl_free, CompHuffCodes, CompLast, TComponent::corpus, TComponent::data, POS_Attribute::hc, TComponent::id, mfree(), TComponent::path, and _Attribute::pos.

Referenced by cl_delete_attribute(), and drop_component().

char* component_full_name ( Attribute *  attribute,
ComponentID  cid,
char *  path 
)

Initializes the path of an attribute Component.

This function starts with the path it is passed, and then evaluates variables in the form $UPPERCASE. The resulting path is assigned to the specified entry in the component array for the given Attribute.

Note that if it is called for a Component that does not yet exist, this function creates the component by calling declare_component().

See also:
declare_component
Component_Field_Specs
Parameters:
attribute: The Attribute object to work with.
cid: The identifier of the Component to which the path is to be added.
path: The path to assign to the component. Can be NULL, in which case the default path from Component_Field_Specs is used.
Returns:
A pointer to this function's static buffer containing the newly built path (NB: not to the path stored in the component itself, which is a copy; the buffer is overwritten by the next call). If the component already has a path, a pointer to that existing path is returned instead. NULL in case of an error in Component_Field_Specs.

References _Attribute::any, buf, cl_strdup(), component_full_name(), declare_component(), component_field_spec::default_path, find_cid_id(), find_cid_name(), component_field_spec::id, MAX_LINE_LENGTH, TComponent::path, and STREQ.

Referenced by component_full_name(), compress_reversed_index(), compute_code_lengths(), creat_freqs(), declare_component(), decode_check_huff(), decompress_check_reversed_index(), and main().
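component_full_name() expands variables of the form $UPPERCASE in a path template. The sketch below shows one way such an expansion can work: it copies the template into a fixed-size buffer and substitutes each $NAME through a lookup function. The variable names (APATH, ANAME), the `demo_*` functions and the error handling are hypothetical; the real function's variable set may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of variable `name` (len chars), or NULL if unknown. */
typedef const char *(*demo_lookup_fn)(const char *name, size_t len);

/* Hypothetical lookup table used for illustration only. */
const char *demo_lookup(const char *name, size_t len)
{
  if (len == 5 && strncmp(name, "APATH", 5) == 0) return "/corpora/demo";
  if (len == 5 && strncmp(name, "ANAME", 5) == 0) return "word";
  return NULL;
}

/* Copies tmpl into buf (bufsize bytes), replacing each $NAME, where NAME
   is one or more uppercase letters, with lookup(NAME). Returns buf, or
   NULL on an unknown variable or if the result would not fit. */
char *demo_expand_path(const char *tmpl, demo_lookup_fn lookup,
                       char *buf, size_t bufsize)
{
  size_t out = 0;
  const char *p = tmpl;
  assert(bufsize > 0);
  while (*p != '\0') {
    const char *piece = p;   /* text to append: one literal char... */
    size_t len = 1;
    if (*p == '$' && isupper((unsigned char)p[1])) {
      const char *end = p + 1;
      while (isupper((unsigned char)*end))
        end++;
      piece = lookup(p + 1, (size_t)(end - (p + 1)));  /* ...or a value */
      if (piece == NULL)
        return NULL;         /* unknown variable */
      len = strlen(piece);
      p = end;
    } else {
      p++;
    }
    if (out + len >= bufsize)
      return NULL;           /* no room for text plus terminator */
    memcpy(buf + out, piece, len);
    out += len;
  }
  buf[out] = '\0';
  return buf;
}
```

Unlike the documented function, which returns a pointer to a static buffer, this sketch writes into a caller-supplied buffer, so a later call cannot overwrite an earlier result.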

ComponentID component_id ( char *  name)

Gets the identifier of the attribute component with the specified name.

References CompLast, find_cid_name(), and component_field_spec::id.

Referenced by main().
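cid_name() and component_id() translate between component IDs and their names using the Component_Field_Specs table (whose entries are referenced here as component_field_spec::name and component_field_spec::id). A minimal table-driven sketch follows. The table is shortened and hypothetical, the name strings are illustrative rather than CWB's actual entries, and returning the CompLast sentinel for an unknown name is suggested by component_id()'s reference list but is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { DemoCompCorpus, DemoCompLexicon, DemoCompLexiconIdx,
               DemoCompLast } demo_cid;

/* Hypothetical shortened spec table; the real one also carries
   default_path and using_atts for every component. */
static const struct { demo_cid id; const char *name; } demo_specs[] = {
  { DemoCompCorpus,     "CORPUS"  },
  { DemoCompLexicon,    "LEXICON" },
  { DemoCompLexiconIdx, "LEXIDX"  },
};
#define DEMO_NSPECS (sizeof demo_specs / sizeof demo_specs[0])

/* ID -> name; NULL if the ID has no entry. */
const char *demo_cid_name(demo_cid cid)
{
  size_t i;
  for (i = 0; i < DEMO_NSPECS; i++)
    if (demo_specs[i].id == cid)
      return demo_specs[i].name;
  return NULL;
}

/* Name -> ID; the sentinel DemoCompLast signals "not found". */
demo_cid demo_component_id(const char *name)
{
  size_t i;
  for (i = 0; i < DEMO_NSPECS; i++)
    if (strcmp(demo_specs[i].name, name) == 0)
      return demo_specs[i].id;
  return DemoCompLast;
}
```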

ComponentState component_state ( Attribute *  attribute,
ComponentID  cid 
)

Gets the state of a specified component on the given attribute.

Parameters:
attribute: The attribute to look at.
cid: The component whose state to get.
Returns:
ComponentUndefined if the component is not found; otherwise, the component's current ComponentState.

References _Attribute::any, comp_component_state(), CompLast, and ComponentUndefined.

Referenced by cl_has_extended_alignment(), cl_index_compressed(), cl_sequence_compressed(), cl_struc_values(), component_ok(), create_component(), and makeall_make_component().

Component* create_component ( Attribute *  attribute,
ComponentID  cid 
)

Creates the specified component for the given Attribute.

This function only works for the following components: CompRevCorpus, CompRevCorpusIdx, CompLexiconSrt, CompCorpusFreqs. Also, it only works if the state of the component is ComponentDefined.

"Create" here means creating the CWB data files on disk. This is done by calling one of the creat_* functions (one for each of the four supported component types), which are defined in makecomps.c.

Each of these functions reads in the data it needs, processes it, and then writes a new file.

Parameters:
attribute: The Attribute object to work with.
cid: The identifier of the Component to create.
Returns:
Pointer to the component created, or NULL in case of error (e.g. if an invalid component was requested).

References aid_name(), _Attribute::any, cid_name(), cl_debug, CompAlignData, CompCompRF, CompCompRFX, CompCorpus, CompCorpusFreqs, CompDirectory, CompHuffCodes, CompHuffSeq, CompHuffSync, CompLast, CompLexicon, CompLexiconIdx, CompLexiconSrt, component_state(), ComponentDefined, CompRevCorpus, CompRevCorpusIdx, CompStrucAVS, CompStrucAVX, CompStrucData, CompXAlignData, creat_freqs(), creat_rev_corpus(), creat_rev_corpus_idx(), creat_sort_lexicon(), TMblob::data, TComponent::data, TComponent::path, and _Attribute::type.

Referenced by ensure_component(), and makeall_make_component().
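The restriction described above (four creatable component types, and only in state ComponentDefined) amounts to a guard followed by a dispatch. A minimal sketch, assuming hypothetical stub builders in place of the creat_* functions, an int result instead of a Component pointer, and a one-to-one mapping of IDs to builders guessed from the function names:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DemoCompCorpus, DemoCompRevCorpus, DemoCompRevCorpusIdx,
               DemoCompLexiconSrt, DemoCompCorpusFreqs,
               DemoCompLast } demo_cid;
typedef enum { DemoLoaded, DemoUnloaded, DemoDefined,
               DemoUndefined } demo_state;

/* Hypothetical stubs standing in for the creat_* builders; the real
   ones read their input data and write a new file. */
static int demo_creat_rev_corpus(void)     { return 1; }
static int demo_creat_rev_corpus_idx(void) { return 1; }
static int demo_creat_sort_lexicon(void)   { return 1; }
static int demo_creat_freqs(void)          { return 1; }

/* Returns 1 if a builder ran and succeeded, 0 otherwise. */
int demo_create_component(demo_cid cid, demo_state state)
{
  if (state != DemoDefined)
    return 0;                  /* only declared-but-missing components */
  switch (cid) {
  case DemoCompRevCorpus:    return demo_creat_rev_corpus();
  case DemoCompRevCorpusIdx: return demo_creat_rev_corpus_idx();
  case DemoCompLexiconSrt:   return demo_creat_sort_lexicon();
  case DemoCompCorpusFreqs:  return demo_creat_freqs();
  default:                   return 0;  /* not creatable this way */
  }
}
```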

Component* declare_component ( Attribute *  attribute,
ComponentID  cid,
char *  path 
)

Sets up a component for the given attribute.

If the component of the specified ComponentID does not already exist, a new Component object is created, set up, and assigned to the attribute's component array. Finally, the component path is initialised using the path argument.

See also:
component_full_name
Parameters:
attribute: The Attribute for which to create this component.
cid: The ID of the component to create.
path: Path to be passed to component_full_name(). Can be NULL.
Returns:
The new component if all is OK. If a component with the specified ID already exists, that component is returned, no new component is created, and a warning is printed to STDERR. If the attribute is NULL, NULL is returned (and a warning is printed).

References _Attribute::any, TComponent::attribute, cid_name(), component_full_name(), TComponent::corpus, TComponent::data, TComponent::id, init_mblob(), and TComponent::path.

Referenced by component_full_name(), and declare_default_components().

void declare_default_components ( Attribute *  attribute)

Sets up a default set of components on the given attribute.

Note that each component is declared by calling declare_component() with a NULL path, so the default path from Component_Field_Specs is used.

See also:
declare_component

References _Attribute::any, CompDirectory, CompLast, declare_component(), _Attribute::type, and component_field_spec::using_atts.

void describe_attribute ( Attribute *  attribute)
void describe_component ( Component *  component)
int drop_attribute ( Corpus *  corpus,
char *  attribute_name,
int  type,
char *  data 
)

Drops an attribute for the given corpus.

The attribute to be dropped is specified by its name and its type (i.e. no Attribute pointer is needed; compare cl_delete_attribute()).

After this call, the corpus no longer has the attribute: it is as if the attribute had never been defined.

This is an internal function; the function exposed in the API for this purpose is cl_delete_attribute().

Todo:
This function can probably be deleted. It does not seem to be used anywhere, and is much more complex than cl_delete_attribute().

See also:
cl_delete_attribute
Returns:
Boolean: true if all is OK, false if there was a problem.

References cl_delete_attribute(), and cl_new_attribute_oldstyle().

int drop_component ( Attribute *  attribute,
ComponentID  cid 
)

Drops the specified component for the given Attribute.

See also:
comp_drop_component
Parameters:
attribute: The Attribute object to work with.
cid: The identifier of the Component to drop.
Returns:
Always 1.

References _Attribute::any, and comp_drop_component().

Referenced by main().

Component* ensure_component ( Attribute *  attribute,
ComponentID  cid,
int  try_creation 
)

Ensures that a component is loaded and ready.

The state of the component specified should be ComponentLoaded once this function has run (assuming all is well). If the component is unloaded, the function will try to load it. If the component is defined, the function MAY try to create it. If the component is undefined, nothing will be done.

There are flags in attributes.c that control the behaviour of this function (e.g. whether a failure to ensure the component causes the program to abort).

See also:
KEEP_SILENT
ENSURE_COMPONENT_EXITS
ALLOW_COMPONENT_CREATION
Parameters:
attribute: The Attribute object to work with.
cid: The identifier of the Component to "ensure".
try_creation: Boolean. True = attempt to create a component that does not exist yet; false = don't. This only applies when ALLOW_COMPONENT_CREATION is defined; otherwise, component creation is never attempted.
Returns:
A pointer to the specified component (or NULL if the component cannot be "ensured").

References _Attribute::any, cid_name(), comp_component_state(), ComponentDefined, ComponentLoaded, ComponentUndefined, ComponentUnloaded, create_component(), and load_component().

Referenced by cl_alg2cpos(), cl_cpos2alg(), cl_cpos2alg2cpos_oldstyle(), cl_cpos2id(), cl_cpos2struc2cpos(), cl_cpos2struc_oldstyle(), cl_id2cpos_oldstyle(), cl_id2freq(), cl_id2sort(), cl_id2str(), cl_id2strlen(), cl_idlist2cpos_oldstyle(), cl_max_alg(), cl_max_cpos(), cl_max_id(), cl_new_stream(), cl_regex2id(), cl_sort2id(), cl_str2id(), cl_struc2cpos(), cl_struc2str(), compress_reversed_index(), compute_code_lengths(), creat_freqs(), creat_rev_corpus(), creat_rev_corpus_idx(), creat_sort_lexicon(), get_nr_of_strucs(), and validate_revcorp().
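The behaviour described above can be summarised as a switch on the component's state. This sketch assumes hypothetical stubs standing in for load_component() and create_component(), a runtime `allow_creation` flag in place of the ALLOW_COMPONENT_CREATION compile-time switch, and that a freshly created component is then loaded; it omits the KEEP_SILENT and ENSURE_COMPONENT_EXITS behaviours.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DemoLoaded, DemoUnloaded, DemoDefined,
               DemoUndefined } demo_state;

typedef struct { demo_state state; } demo_component;

/* Hypothetical stand-ins for load_component()/create_component(). */
static int demo_load(demo_component *c)
{
  c->state = DemoLoaded;     /* pretend the file was read in */
  return 1;
}

static int demo_create(demo_component *c)
{
  c->state = DemoUnloaded;   /* pretend the data file was written */
  return 1;
}

/* Returns c if it ends up loaded, NULL if it cannot be "ensured". */
demo_component *demo_ensure(demo_component *c, int allow_creation,
                            int try_creation)
{
  if (c == NULL)
    return NULL;
  switch (c->state) {
  case DemoLoaded:
    return c;                             /* nothing to do */
  case DemoUnloaded:
    return demo_load(c) ? c : NULL;       /* read it from disk */
  case DemoDefined:
    if (!(allow_creation && try_creation))
      return NULL;                        /* creation not permitted */
    if (!demo_create(c))
      return NULL;
    return demo_load(c) ? c : NULL;       /* create, then load */
  case DemoUndefined:
  default:
    return NULL;                          /* invalid: do nothing */
  }
}
```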

Component* find_component ( Attribute *  attribute,
ComponentID  component 
)

Gets a pointer to the specified component for the given Attribute.

References _Attribute::any.

Referenced by creat_freqs().

Attribute* first_corpus_attribute ( Corpus *  corpus)

Gets a pointer to the head entry in the specified corpus's list of attributes, and starts an iteration that can be continued with next_corpus_attribute().

Returns:
NULL if the corpus parameter is NULL; otherwise a pointer to Attribute.

References TCorpus::attributes, and loop_ptr.

Referenced by send_cqi_corpus_attributes().

Component* load_component ( Attribute *  attribute,
ComponentID  cid 
)

Loads the specified component for this attribute.

"Loading" means that the file specified by the component's "path" member is read into the "data" member.

If the component is CompHuffCodes, the data is also copied to the attribute's pos.hc member.

Note that the action of this function is dependent on the component's state. If the component's state is ComponentUnloaded, the component is loaded. If the component's state is ComponentDefined, the size is set to 0 and nothing else is done.

Parameters:
attribute: The Attribute object to work with.
cid: The identifier of the Component to load.
Returns:
Pointer to the component. This will be NULL if the component has not been declared (i.e. no Component object has been created for it, e.g. by declare_component()).

References aid_name(), _Attribute::any, cid_name(), comp_component_state(), CompDirectory, CompHuffCodes, CompLast, ComponentDefined, ComponentLoaded, ComponentUnloaded, TMblob::data, TComponent::data, POS_Attribute::hc, item_sequence_is_compressed, _huffman_code_descriptor::lcount, _huffman_code_descriptor::length, _huffman_code_descriptor::max_codelen, MAXCODELEN, _huffman_code_descriptor::min_code, _huffman_code_descriptor::min_codelen, MMAPPED, TMblob::nr_items, TComponent::path, _Attribute::pos, read_file_into_blob(), TComponent::size, _huffman_code_descriptor::size, _huffman_code_descriptor::symbols, _huffman_code_descriptor::symindex, and _Attribute::type.

Referenced by ensure_component().

DynArg* makearg ( char *  type_id)

Creates a DynArg object.

The object created is a dynamic argument of the type specified by the argument type_id, with its "next" pointer set to NULL.

See also:
DynArg
Parameters:
type_id: String specifying the type of argument required; one of STRING, POS, INT, VARARG, FLOAT.
Returns:
Pointer to the new DynArg object, or NULL in case of an invalid type_id.

References ATTAT_FLOAT, ATTAT_INT, ATTAT_POS, ATTAT_STRING, ATTAT_VAR, _DynArg::next, and _DynArg::type.
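makearg() maps a type name onto one of the ATTAT_* codes and returns NULL for anything else. A minimal sketch of that mapping, assuming a hypothetical `demo_dynarg` struct (only `type` and `next`) and made-up numeric codes, since the real ATTAT_* values and the full _DynArg layout are not given on this page:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical codes standing in for ATTAT_STRING, ATTAT_POS, ... */
enum { DEMO_STRING, DEMO_POS, DEMO_INT, DEMO_VAR, DEMO_FLOAT };

typedef struct demo_dynarg {
  int type;
  struct demo_dynarg *next;
} demo_dynarg;

/* Returns a new argument with next == NULL, or NULL for an invalid
   type_id (or if allocation fails). The caller frees it with free(). */
demo_dynarg *demo_makearg(const char *type_id)
{
  static const struct { const char *name; int type; } map[] = {
    { "STRING", DEMO_STRING }, { "POS", DEMO_POS }, { "INT", DEMO_INT },
    { "VARARG", DEMO_VAR },    { "FLOAT", DEMO_FLOAT },
  };
  size_t i;
  for (i = 0; i < sizeof map / sizeof map[0]; i++) {
    if (strcmp(type_id, map[i].name) == 0) {
      demo_dynarg *arg = malloc(sizeof *arg);
      if (arg != NULL) {
        arg->type = map[i].type;
        arg->next = NULL;
      }
      return arg;
    }
  }
  return NULL;
}
```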

int MayHaveComponent ( int  attr_type,
ComponentID  cid 
)

Checks whether a particular Attribute type can possess the specified component field.

Returns:
True or false.

References CompLast, find_cid_id(), component_field_spec::id, and component_field_spec::using_atts.
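The reference to component_field_spec::using_atts suggests that each component's spec records, as a set of flags, which attribute types may own it. A minimal sketch of such a check, assuming hypothetical bit values for the attribute types and a shortened, hypothetical table of allowed types:

```c
#include <assert.h>

/* Hypothetical attribute-type bits (not the real ATT_* values). */
#define DEMO_ATT_POS   (1 << 0)
#define DEMO_ATT_STRUC (1 << 1)
#define DEMO_ATT_ALIGN (1 << 2)

typedef enum { DemoCompLexicon, DemoCompStrucData, DemoCompAlignData,
               DemoCompLast } demo_cid;

/* using_atts[cid]: bitwise OR of the attribute types allowed to have
   component cid (hypothetical assignments). */
static const int demo_using_atts[DemoCompLast] = {
  DEMO_ATT_POS,     /* DemoCompLexicon   */
  DEMO_ATT_STRUC,   /* DemoCompStrucData */
  DEMO_ATT_ALIGN,   /* DemoCompAlignData */
};

/* True if attribute type attr_type may possess component cid. */
int demo_may_have_component(int attr_type, demo_cid cid)
{
  if ((int)cid < 0 || cid >= DemoCompLast)
    return 0;
  return (demo_using_atts[cid] & attr_type) != 0;
}
```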

Attribute* next_corpus_attribute ( )

Gets a pointer to the next attribute in the list currently being iterated over (the iteration is started by first_corpus_attribute()).

References _Attribute::any, and loop_ptr.

Referenced by send_cqi_corpus_attributes().
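first_corpus_attribute() and next_corpus_attribute() together form an iterator over a corpus's linked list of attributes. Since next_corpus_attribute() takes no arguments, the cursor (loop_ptr) must live in a file-level variable. A minimal sketch of that pattern, with hypothetical `demo_*` types and names:

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_attr {
  const char *name;
  struct demo_attr *next;     /* next attribute of the same corpus */
} demo_attr;

typedef struct { demo_attr *attributes; } demo_corpus;

static demo_attr *demo_loop_ptr = NULL;   /* shared iteration cursor */

/* Starts an iteration; returns the head of the list, or NULL. */
demo_attr *demo_first_attribute(const demo_corpus *corpus)
{
  demo_loop_ptr = (corpus != NULL) ? corpus->attributes : NULL;
  return demo_loop_ptr;
}

/* Advances the shared cursor; returns NULL at the end of the list. */
demo_attr *demo_next_attribute(void)
{
  if (demo_loop_ptr != NULL)
    demo_loop_ptr = demo_loop_ptr->next;
  return demo_loop_ptr;
}
```

Because the cursor is shared, only one such iteration can be in progress at a time and the pair is not thread-safe; the trade-off is a very simple calling convention.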

Attribute* setup_attribute ( Corpus *  corpus,
char *  attribute_name,
int  type,
char *  data 
)

Sets up a corpus attribute.

NEVER CALL THIS! It is only used while parsing a registry entry.

Parameters:
corpus: The corpus this attribute belongs to.
attribute_name: The name of the attribute (i.e. the handle it has in the registry file).
type: Type of attribute to be created.
data: Unused. It can just be NULL.

References aid_name(), _Attribute::any, ATT_POS, ATT_STRUC, ATTAT_POS, TCorpus::attributes, cl_new_attribute, CompDirectory, CompLast, corpus, DEFAULT_ATT_NAME, Struc_Attribute::has_attribute_values, POS_Attribute::hc, TCorpus::id, _Attribute::pos, _Attribute::struc, POS_Attribute::this_block_nr, and _Attribute::type.