OCDM_Document Class Reference

The class OCDM_Document incorporates every function related to the manipulation of the documents of a collection. Bear in mind that an Ellogon Document does not consist merely of text. It consists of textual data as well as linguistic information about the textual data.

#include <OCDM.h>


Public Member Functions

 OCDM_Document ()
 This is a null constructor of an OCDM_Document Object. The purpose of this function is to create a null document, just to be used for programming reasons.
 OCDM_Document (const char *TclCmdName)
 This function serves as the main constructor of an OCDM_Document object.
 OCDM_Document (const class OCDM_Document &obj)
 This is the default copy constructor.
 OCDM_Document (const Tcl_Obj *TclCmdName)
 This function serves as the main constructor of an OCDM_Document object.
 OCDM_Document (const CDM_Document)
 A constructor that maps a CDM_Document to an OCDM_Document object.
 ~OCDM_Document ()
 This is the destructor of an OCDM_Document object.
void Close (void)
 This method will remove a Document from memory.
OCDM_Document & operator= (const class OCDM_Document &obj)
 This is the default Assignment operator.
void storeObject (const class OCDM_Object *objPtr) const
void storeObject (const class OCDM_Collection *objPtr) const
OCDM_Object * getStoredObject (void) const
void releaseStoredObject (void) const
long AddAnnotation (const OCDM_Annotation &Ann)
 OCDM_REF (OCDM_AnnotationSet) AnnotationsAt(const long position) const
 OCDM_REF (OCDM_AnnotationSet) AnnotationsContaining(const long position) const
 OCDM_REF (OCDM_AnnotationSet) AnnotationsContaining(const long position1
 OCDM_REF (OCDM_AnnotationSet) AnnotationsInRange(const long Start
 OCDM_REF (OCDM_AnnotationSet) AnnotationsInRange(const OCDM_Annotation &Ann) const
 OCDM_REF (OCDM_AnnotationSet) AnnotationsMatchingRange(const long Start
 OCDM_REF (OCDM_AnnotationSet) AnnotationsMatchingRange(const OCDM_Annotation &Ann) const
OCDM_BOOL AttributeExists (const char *name) const
 This function will return true if an Attribute with the specified name exists in the Document object.
int DisplaceAnnotations (long offset, long displacement)
 OCDM_REF (OCDM_Annotation) FirstAnnotationContaining(const long Position) const
 OCDM_REF (OCDM_Annotation) FirstAnnotationContaining(const long Position1
 OCDM_REF (OCDM_RawDataSet) GetAnnotatedTextRanges(const OCDM_Annotation &Ann) const
 OCDM_REF (OCDM_Annotation) GetAnnotation(const long Id) const
 OCDM_REF (OCDM_Attribute) GetAttribute(const char *name) const
 OCDM_REF (OCDM_AttributeSet) GetAttributes(void) const
 OCDM_REF (OCDM_ByteSequence) GetFirstAnnotatedTextRange(const OCDM_Annotation &Ann) const
 OCDM_REF (OCDM_AnnotationSet) NextAnnotations(const long Position) const
int PutAttribute (const OCDM_Attribute &Attr)
 This function will add a given Attribute to the specified Document object.
int RemoveAnnotation (const long Id)
 As the name implies this function will remove the Annotation object that has as Id the value specified by the "Id" parameter.
const char * GetId (void) const
 This function will return the Id of the Document object in use.
int RemoveAttribute (const char *name)
 This function will remove the Attribute whose name exactly matches the "name" parameter.
 OCDM_REF (OCDM_AnnotationSet) SelectAnnotations(const char *Type) const
 OCDM_REF (OCDM_AnnotationSet) SelectAnnotations(const char *Type
 OCDM_REF (OCDM_AnnotationSet) SelectAnnotationsSorted(const char *Type) const
 OCDM_REF (OCDM_AnnotationSet) SelectAnnotationsSorted(const char *Type
 OCDM_REF (OCDM_ByteSequence) ByteSequenceInsertString(const long pos
 OCDM_REF (OCDM_ByteSequence) ByteSequenceReplace(const long first
 OCDM_REF (OCDM_ByteSequence) ByteSequenceReplaceCharacters(const long first
int DeleteAnnotations (const char *Type)
int DeleteAnnotations (const char *Type, const char *Constraints)
long FindMaxUsedAnnotationId (void) const
 OCDM_REF (OCDM_AnnotationSet) GetAnnotations(void) const
 OCDM_REF (OCDM_ByteSequence) GetByteSequence(void) const
const char * GetByteSequence (const char *encoding) const
const char * GetEncoding (void) const
const char * GetExternalId (void) const
 OCDM_REF (OCDM_Annotation) GetFirstAnnotation(const char *Type) const
 OCDM_REF (OCDM_Annotation) GetNextAnnotation(void) const
 OCDM_REF (OCDM_Collection) GetParent(void) const
 OCDM_REF (OCDM_RawData) GetRawData(void) const
const char * GetRawData (const char *encoding) const
int SetByteSequence (const OCDM_ByteSequence &Text)
 This function will change the text of a Document object to the text specified by the "Text" parameter.
void SetProcessStatus (const int value) const
 Updates the Percent Bar displayed when an Annotator runs...
const char * SetEncoding (const char *encoding)
const char * SetExternalId (const char *ExternalId)
 This function will modify the external Id (XID) of the Document object in use.
 OCDM_REF (OCDM_ByteSequence) Status(void) const
int Sync (void) const
 This function will save the Document object to disk.
 OCDM_REF (OCDM_RawData) RawDataReplace(const long first
 OCDM_REF (OCDM_RawData) RawDataInsertString(const long pos
 OCDM_REF (OCDM_RawData) RawDataReplaceCharacters(const long first
void Log (const char *str,...) const
 This method logs information. It is equivalent to OCDM_Utilities.Log().
OCDM_BOOL Valid (void) const
 As the name implies this function checks for a valid Document.
const char * toString (void) const
 Return object as a formatted string.
const char * objectType (void) const

Public Attributes

const long position2 const
const long End const
const long Position2 const
const char *Constraints const
const char *Constraints const
const char * string
const long last
const long const char * newstring
const long const char * string
const long const char * newstring
const char * string
const int characters
const int const char * string


Constructor & Destructor Documentation

OCDM_Document (  ) 

OCDM_Document ( const char *  TclCmdName  ) 

Arguments:
  • TclCmdName: The parameter TclCmdName represents the name of the document to be created.

OCDM_Document ( const class OCDM_Document &  obj  ) 

OCDM_Document ( const Tcl_Obj *  TclCmdName  ) 

OCDM_Document ( const   CDM_Document  ) 

~OCDM_Document (  ) 

Description:
The goal of this function is to close the given (opened) Document object. All the objects that the Document to be closed owns, as well as the Document object itself, will be deleted from memory and will be unregistered from the current Tcl interpreter (CDM_Interp). Note that this function will not save the Document object in use. In order for the Document to be saved, the caller must first call the function Sync of the class OCDM_Document.


Member Function Documentation

void Close ( void   ) 

Description:
This method can be used to force a Document object to release the memory held by the Document structure, in order to save memory. The Document object itself will not be released, but after this method is called it will contain an invalid internal representation: any method (except the destructor) called after Close has been called will throw an exception. Note that this function will not save the Document object in use. In order for the Document to be saved, the caller must first call the function Sync of the class OCDM_Document.
Usually, the user does not have to call this method. A Document is closed when the object representing it is deleted. However, there are situations where such a method is needed: in Java, for example, the user has no control over when an object will be deleted. This method allows the programmer to release the memory occupied internally by the object and let the object be garbage-collected at a later time.
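
The contract described above for Close (save with Sync first, then release memory, after which further calls throw) can be sketched with a small stand-in class. SketchDocument, its members and the use of std::logic_error are assumptions made purely for illustration; this is not the OCDM implementation, and the real class reports errors through OCDM_Exception.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for OCDM_Document. It models only the documented
// contract of Close(): memory is released, nothing is saved, and later
// calls that touch the document throw.
class SketchDocument {
public:
    explicit SketchDocument(const std::string &text) : text_(text) {}

    // Stand-in for Sync(): records the current text as saved.
    void Sync() { requireOpen(); saved_ = text_; }

    // Stand-in for Close(): drops the in-memory data without saving it.
    void Close() { text_.clear(); closed_ = true; }

    void SetText(const std::string &text) { requireOpen(); text_ = text; }

    // Sketch-only inspection helper: what the last Sync() stored.
    const std::string &Saved() const { return saved_; }

private:
    void requireOpen() const {
        if (closed_) throw std::logic_error("document has been closed");
    }
    std::string text_;
    std::string saved_;
    bool closed_ = false;
};

// Save first, then release memory. Edits made after the last Sync() and
// before Close() would be lost.
inline std::string saveAndClose(SketchDocument &doc) {
    doc.Sync();
    doc.Close();
    return doc.Saved();
}
```

In this sketch, calling SetText on a document after saveAndClose has run raises the exception, which mirrors the rule that a closed Document can only be destroyed.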

class OCDM_Document & operator= ( const class OCDM_Document &  obj  ) 

void storeObject ( const class OCDM_Object *  objPtr  )  const

void storeObject ( const class OCDM_Collection *  objPtr  )  const

class OCDM_Object * getStoredObject ( void   )  const

void releaseStoredObject ( void   )  const

long AddAnnotation ( const OCDM_Annotation &  Ann  ) 

Arguments
  • Ann: The Annotation object to be added.
Note:
This function returns the Id of the Annotation that was added to the Document object in use. Whether or not the Annotation already had an Id makes no difference to the returned value.
After this function is used, CDM keeps a reference to the given Annotation. This transforms the given Annotation into a shared object, which can no longer be manipulated by the caller. Always remember that an attempt to modify a shared object will abort the execution of the whole platform. If the caller wants to modify a shared object, the following steps must be followed:
  • Check whether the object is shared by using the Tcl function Tcl_IsShared.
  • If the object is not shared (i.e. Tcl_IsShared returns 0, meaning that its reference count is less than 2), the object can be safely modified.
  • If the object is shared, create a new copy of it by using the Tcl function Tcl_DuplicateObj. The new object is an exact copy of the original one, but it is not shared, so it can be safely modified.
In case of an error the return value will be -1 and an error message describing the error will be left in the current Tcl interpreter (CDM_Interp).
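
The check-then-duplicate steps in the note above can be sketched without Tcl by using a stand-in reference-counted struct. SketchObj, isShared, duplicate and modifySafely are hypothetical names; in real code the corresponding calls are Tcl_IsShared and Tcl_DuplicateObj on a Tcl_Obj.

```cpp
#include <string>

// Hypothetical stand-in for a reference-counted Tcl object.
struct SketchObj {
    int refCount;
    std::string value;
};

// Mirrors the idea behind Tcl_IsShared: more than one holder means "shared".
inline bool isShared(const SketchObj &obj) { return obj.refCount > 1; }

// Mirrors the idea behind Tcl_DuplicateObj: same value, no holders yet.
inline SketchObj duplicate(const SketchObj &obj) {
    return SketchObj{0, obj.value};
}

// Applies newValue following the documented steps. Returns the object that
// was actually modified: obj itself when it is unshared, otherwise the
// private duplicate stored in copy.
inline SketchObj &modifySafely(SketchObj &obj, SketchObj &copy,
                               const std::string &newValue) {
    if (!isShared(obj)) {
        obj.value = newValue;  // sole holder: modify in place
        return obj;
    }
    copy = duplicate(obj);     // shared: never touch the original
    copy.value = newValue;
    return copy;
}
```

The point of the pattern is that an Annotation already handed to AddAnnotation is never written to; only the private duplicate is changed.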

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_BOOL AttributeExists ( const char *  name  )  const

Arguments:
  • name: The Attribute name to be found.
Note:
In case of an Error an Exception of type OCDM_Exception will be thrown.

int DisplaceAnnotations ( long  offset,
long  displacement 
)

Description:
This function displaces or "moves" the Annotations of the Document object in use by "displacement" characters. This function will iterate over all Annotations contained in the Document. For each Annotation, all spans contained in the span set will be examined: for each offset in the span (either start or end), if it is equal to or greater than the value provided by the "offset" parameter, then the value of the "displacement" parameter will be added to it.
Note:
In case of illegal manipulation, an exception of type OCDM_Exception will be thrown.
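
The displacement rule described for DisplaceAnnotations can be sketched on plain data. SketchSpan, SketchAnnotation and displaceAnnotations are hypothetical stand-ins for the OCDM types, and the returned count is an assumption, since this reference does not state what the real function returns.

```cpp
#include <vector>

// Hypothetical plain-data stand-ins for an Annotation and its span set.
struct SketchSpan { long start; long end; };
struct SketchAnnotation { long id; std::vector<SketchSpan> spans; };

// Adds displacement to every span offset (start or end) that is equal to or
// greater than offset. Returns how many offsets were changed (assumed
// return value, for illustration only).
inline int displaceAnnotations(std::vector<SketchAnnotation> &annotations,
                               long offset, long displacement) {
    int changed = 0;
    for (SketchAnnotation &ann : annotations) {
        for (SketchSpan &span : ann.spans) {
            if (span.start >= offset) { span.start += displacement; ++changed; }
            if (span.end >= offset)   { span.end += displacement;   ++changed; }
        }
    }
    return changed;
}
```

For instance, after inserting 4 characters at offset 6, a span [6, 10] would become [10, 14], a span [3, 8] would become [3, 12], and a span [0, 5] would stay as it is.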

OCDM_REF ( OCDM_Annotation   )  const

OCDM_REF ( OCDM_Annotation   )  const

OCDM_REF ( OCDM_RawDataSet   )  const

OCDM_REF ( OCDM_Annotation   )  const

OCDM_REF ( OCDM_Attribute   )  const

OCDM_REF ( OCDM_AttributeSet   )  const

OCDM_REF ( OCDM_ByteSequence   )  const

OCDM_REF ( OCDM_AnnotationSet   )  const

int PutAttribute ( const OCDM_Attribute &  Attr  ) 

Description
This function will add a given Attribute to the specified Document object. If an Attribute with the same name already exists in the Document, then it will be overwritten by the new Attribute. Else, the new Attribute will be appended to the existing Attribute set of the specified Document.
Arguments
  • Attr: The attribute to be added to the Document in use.
Note:
In case of illegal manipulation, an exception of type OCDM_Exception will be thrown.
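
The overwrite-or-append behaviour described for PutAttribute can be sketched as follows. SketchAttribute and putAttribute are hypothetical; a name and value pair stands in for OCDM_Attribute and a vector stands in for the Document's Attribute set.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Attribute: (name, value).
using SketchAttribute = std::pair<std::string, std::string>;

// Overwrites the Attribute that has the same name, or appends a new one
// to the end of the set when no such Attribute exists.
inline void putAttribute(std::vector<SketchAttribute> &attrs,
                         const SketchAttribute &attr) {
    for (SketchAttribute &existing : attrs) {
        if (existing.first == attr.first) {
            existing.second = attr.second;  // same name: overwrite
            return;
        }
    }
    attrs.push_back(attr);                  // new name: append
}
```

Putting an Attribute named "lang" twice therefore leaves a single "lang" entry that holds the most recent value.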

int RemoveAnnotation ( const long  Id  ) 

Description
This function removes the Annotation object that has as Id the value specified by the "Id" parameter.
Arguments
  • Id: The Id value of the Annotation to be removed.
Note:
If the requested Annotation does not exist, an exception of type OCDM_Exception will be thrown.

const char * GetId ( void   )  const

Description
This function returns the Id of the Document object in use. The Id is the file name of the initial file that contained the text which was used in order to create the Document object.
Note:
The returned value will be encoded using the UTF-8 encoding (thus enabling the existence of non-Latin characters in the value).

int RemoveAttribute ( const char *  name  ) 

Arguments
  • name: The name of the attribute to be removed from the document.
Note:
If the requested Attribute does not exist, an exception of type OCDM_Exception will be thrown.

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_ByteSequence   )  const

OCDM_REF ( OCDM_ByteSequence   )  const

OCDM_REF ( OCDM_ByteSequence   )  const

int DeleteAnnotations ( const char *  Type  ) 

Description
This function accepts an argument of type const char *: a UTF-8 string that defines the Annotations to be deleted. According to the value specified by the "Type" parameter, this function will delete Annotations or Annotation Attributes from the specified Document.
If the value of the "Type" parameter is an empty string or NULL, then ALL the Annotations that the specified Document currently has will be deleted from this Document object. The used Annotation Ids will also be reset. This means that the first Annotation that is added to this Document afterwards will take the value "0" as Annotation Id, if it does not already have an Id. If the deletion is done successfully, the total number of the deleted Annotations will be returned.

If Annotations having as type the value of the "Type" parameter exist, then all these Annotations will be deleted from the Document. If the deletion is done successfully, the total number of the deleted Annotations will be returned.

If all Annotations have been searched and no Annotation having as type the value of the "Type" parameter was found, the value of the "Type" parameter is examined to see whether it can be split into two words. If this is the case (the value of the "Type" parameter contains at least one space character), then the value is split into two parts at the first space character that is found. The first word will be used as an Annotation type and the rest of the value (second word) will be used as an Attribute name. All Annotations will be searched. If Annotations having as type the first word are found, they are examined to see whether they contain an Attribute named as the second word. If such an Attribute is found in an Annotation, it will be deleted from the Annotation. The total number of deleted Attributes will be returned as the return value of this function.

Note:
If this function fails to delete any Annotation or any Annotation Attribute, then the value "0" will be returned. In case of an error, an exception of type OCDM_Exception will be thrown.
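
The fallback step described above, where "Type" is split at its first space into an Annotation type and an Attribute name, can be sketched as a standalone helper. splitTypeAndAttribute is a hypothetical name; it models only the splitting, not the search that precedes it or the deletion that follows.

```cpp
#include <string>
#include <utility>

// Splits the "Type" value at its first space character into
// (annotation type, attribute name). The attribute name is empty when the
// value contains no space, i.e. when it cannot be split.
inline std::pair<std::string, std::string>
splitTypeAndAttribute(const std::string &type) {
    const std::string::size_type space = type.find(' ');
    if (space == std::string::npos) {
        return {type, std::string()};
    }
    // Everything after the first space is kept as the attribute name.
    return {type.substr(0, space), type.substr(space + 1)};
}
```

Under this reading, a call such as DeleteAnnotations("token pos") would first look for Annotations of type "token pos" and, finding none, would remove the "pos" Attribute from every "token" Annotation.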

int DeleteAnnotations ( const char *  Type,
const char *  Constraints 
)

Description
This function is an overloaded version of the one above. It accepts two arguments: a UTF-8 string that defines the Annotations to be deleted and a Boolean expression. This expression must be true in order for an Annotation to be deleted. According to the value specified by the "Type" parameter, this function will delete Annotations or Annotation Attributes from the specified Document.
If all Annotations have been searched and no Annotation having as type the value of the "Type" parameter was found, the value of the "Type" parameter is examined to see whether it can be split into two words. If this is the case (the value of the "Type" parameter contains at least one space character), then the value is split into two parts at the first space character that is found. The first word will be used as an Annotation type and the rest of the value (second word) will be used as an Attribute name.

All Annotations will be searched. If Annotations having as type the first word are found, they are examined to see whether they contain an Attribute named as the second word. If such an Attribute is found in an Annotation and the Boolean expression is true for this Annotation, it will be deleted from the Annotation. The total number of deleted Attributes will be returned as the return value of this function.

The Boolean expression that can be specified through the "Constraints" parameter will be evaluated before an Annotation or an Annotation Attribute is deleted. If the expression is true, then the deletion will be done. Otherwise, this Annotation or Annotation Attribute will not be deleted. This expression can be any valid Tcl Boolean expression (i.e. a Boolean expression that will be accepted by the "expr" Tcl command), with the following exception: the Boolean expression can contain references to the values of Attributes of the Annotation, by using the notation "ann::<Attribute Name>". The only limitation is that "<Attribute Name>" cannot exceed 120 characters. For example, the following code will delete all Annotations from the specified Document whose type is token, that have a "type" Attribute with the value "EFW" and that also have a "pos" Attribute with a value other than "NN":

DeleteAnnotations("token", "ann::type == \"EFW\" && ann::pos != \"NN\"");

If the value of the "Type" parameter is an empty string or NULL, then all Annotations will be searched for deletion, regardless of their type. If the value of the "Constraints" parameter is an empty string or NULL, then the expression is ignored and the deletions will be based only on the value of the "Type" parameter. Note that if all Annotations are deleted from the Document object in use then the set of used Ids will also be reset. As a result, the first Annotation that is added to this Document after the deletion will take the value "0" as Annotation Id, if it does not already have an Id. If any deletions are done successfully, the total number of the deleted (or modified) Annotations will be returned. If this function fails to delete any Annotation or any Annotation Attribute, then the value "0" will be returned. In case of an error, an exception of type OCDM_Exception will be thrown.
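
Writing the "Constraints" string by hand requires escaping every quote, so a small builder can help. The sketch below is only a convenience around the "ann::<Attribute Name>" notation and the 120-character limit described above. annConstraint is a hypothetical helper, not part of OCDM. It counts bytes rather than characters, and it does not escape quotes inside the value.

```cpp
#include <stdexcept>
#include <string>

// Builds one comparison such as:  ann::type == "EFW"
// op is expected to be an operator accepted by the Tcl "expr" command,
// for example "==" or "!=".
inline std::string annConstraint(const std::string &attribute,
                                 const std::string &op,
                                 const std::string &value) {
    // Documented limit on "<Attribute Name>"; measured in bytes here.
    if (attribute.size() > 120) {
        throw std::length_error("attribute name exceeds 120 characters");
    }
    return "ann::" + attribute + " " + op + " \"" + value + "\"";
}
```

Joining annConstraint("type", "==", "EFW") and annConstraint("pos", "!=", "NN") with " && " reproduces the expression used in the DeleteAnnotations example above.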

long FindMaxUsedAnnotationId ( void   )  const

Description
This function will examine the Annotations stored in the Document object in use to find the maximum used Annotation Id. It will also make the proper arrangements so that the next Annotation added to this Document receives the Id that follows the identified maximum Id.
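
The Id bookkeeping described for FindMaxUsedAnnotationId can be sketched on a plain list of Ids. findMaxUsedId and nextId are hypothetical names, and both the "maximum plus one" rule and the value -1 for a Document without Annotations are assumptions chosen so that the first Id handed out is 0, as described for DeleteAnnotations.

```cpp
#include <algorithm>
#include <vector>

// Returns the largest Id in ids, or -1 when there are no Annotations
// (assumed convention, see the lead-in).
inline long findMaxUsedId(const std::vector<long> &ids) {
    return ids.empty() ? -1 : *std::max_element(ids.begin(), ids.end());
}

// The Id the next Annotation would receive under the "maximum + 1" reading.
inline long nextId(const std::vector<long> &ids) {
    return findMaxUsedId(ids) + 1;
}
```

With Ids 0, 7 and 3 in use, the next Annotation would receive Id 8 under these assumptions, even though Ids 1 and 2 are free.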

OCDM_REF ( OCDM_AnnotationSet   )  const

OCDM_REF ( OCDM_ByteSequence   )  const

const char * GetByteSequence ( const char *  encoding  )  const

Description
This function will return the text of the Document object in use. It will return a C pointer (of type char*) that will contain the text of the Document.
Arguments
  • encoding: The text will be stored in a newly allocated memory space, using the encoding requested by the value of the "encoding" parameter. The pointer to that new memory segment will be returned to the caller.

const char * GetEncoding ( void   )  const

Description
This function will return the encoding of the Document object in use. The return value will be of type char* and will be owned by the new version of CDM. Its value will be a standard Tcl encoding value (like iso8859-7 or cp1253). For all available Tcl encodings please refer to the Tcl manuals.

const char * GetExternalId ( void   )  const

Description
This function will return the external Id (XID) of the Document object in use. The external Id is the (absolute) path of the initial file that contained the text which was used in order to create the Document object in use. The returned value will be encoded using the UTF-8 encoding (thus enabling the existence of non-Latin characters in the value). The return value will be of type char* and will be owned by the CDM.

OCDM_REF ( OCDM_Annotation   )  const

OCDM_REF ( OCDM_Annotation   )  const

OCDM_REF ( OCDM_Collection   )  const

OCDM_REF ( OCDM_RawData   )  const

const char * GetRawData ( const char *  encoding  )  const

Description
This function will return a C pointer (of type char*) that will contain the text (or byte sequence or raw data) of the Document object in use.
Arguments
  • encoding: The parameter specifying the requested encoding.

int SetByteSequence ( const OCDM_ByteSequence &  Text  ) 

Description
This function will change the text of a Document object to the text specified by the "Text" parameter. The "Text" parameter must be an object of type OCDM_ByteSequence. Such an object can be created either by the use of Tcl_NewStringObj or by creating an OCDM_ByteSequence object using the default constructor. The text that this function accepts is assumed to be encoded in UTF-8. If the text is stored using a different encoding, the caller must use CDM_ExternalToUtf in order to convert the text to UTF-8. CDM will internally keep a reference to this object.

void SetProcessStatus ( const int  value  )  const

const char * SetEncoding ( const char *  encoding  ) 

Description
This function will change the encoding of a Document object to the encoding specified by the "encoding" parameter.

const char * SetExternalId ( const char *  ExternalId  ) 

Description
This function will modify the external Id (XID) of the Document object in use. The external Id is the (absolute) path of the initial file that contained the text which was used in order to create the Document. The value is expected to be encoded using the UTF-8 encoding (thus enabling the existence of non-Latin characters in the value). The caller can use the function CDM_ExternalToUtf in order to convert a string from any supported encoding to UTF-8.

OCDM_REF ( OCDM_ByteSequence   )  const

int Sync ( void   )  const

Description
This function will save the Document object to disk. If the Document object has never been saved before, then this function will create a file according to the Id of the Document in use. A representation of the Document will be saved in this file. If the file already exists it will be overwritten. If the file cannot be created, an error condition will be returned.

OCDM_REF ( OCDM_RawData   )  const

OCDM_REF ( OCDM_RawData   )  const

OCDM_REF ( OCDM_RawData   )  const

void Log ( const char *  str,
  ... 
) const

OCDM_BOOL Valid ( void   )  const

Description
As the name implies, this function checks for a valid Document. In other words, if a Document exists it returns true. Otherwise it returns false.

const char * toString ( void   )  const

const char* objectType ( void   )  const


Member Data Documentation

const long End const

const long End const

const long Position2 const

const char* Constraints const

const char* Constraints const

const char* string

const long last

const long const char* newstring

const long const char* string

const long const char* newstring

const char* string

const int characters

const int const char* string


Generated on Tue Jun 26 17:40:44 2007 for OCDM by  doxygen 1.5.2