public class RegExAnnotator extends CasAnnotator_ImplBase
There are two ways to specify the regular expressions: via configuration parameters or via an external resource file.
This annotator takes the following optional configuration parameters:

Patterns - array of Strings indicating regular expressions to match. The pattern language is described at http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

TypeNames - array of Strings indicating names of Types to be created from the patterns.

ContainingAnnotationTypes - an array of input annotation types. This annotator will only produce new annotations that are contained within existing annotations of these types. (This is optional.)

AnnotateEntireContainedAnnotation - when the ContainingAnnotationTypes parameter is specified, a value of true for this parameter will cause the entire containing annotation to be used as the span of the new annotation, rather than just the span of the regular expression match. This can be used to "classify" previously created annotations according to whether or not they contain text matching a regular expression.
The indices of the Patterns and TypeNames arrays correspond, so that a substring that matches Patterns[i] will result in an annotation of type TypeNames[i].
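The index correspondence between Patterns and TypeNames can be illustrated with plain java.util.regex, outside UIMA. This is a minimal sketch, not the annotator's implementation: the class PatternTypeDemo, its Match holder, and the type names example.PhoneNumber and example.CapitalizedWord are hypothetical. A real annotator would create annotations in the CAS rather than collect them in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTypeDemo {

    // Hypothetical stand-in for a UIMA annotation: a type name and a character span.
    static final class Match {
        final String typeName;
        final int begin;
        final int end;

        Match(String typeName, int begin, int end) {
            this.typeName = typeName;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return typeName + "[" + begin + "," + end + ")";
        }
    }

    // Each substring matching patterns[i] produces a Match of type typeNames[i].
    static List<Match> annotate(String text, String[] patterns, String[] typeNames) {
        if (patterns.length != typeNames.length) {
            throw new IllegalArgumentException("Patterns and TypeNames must have the same length");
        }
        List<Match> result = new ArrayList<>();
        for (int i = 0; i < patterns.length; i++) {
            Matcher m = Pattern.compile(patterns[i]).matcher(text);
            while (m.find()) {
                result.add(new Match(typeNames[i], m.start(), m.end()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] patterns = {"\\d{3}-\\d{4}", "[A-Z][a-z]+"};
        String[] typeNames = {"example.PhoneNumber", "example.CapitalizedWord"};
        for (Match m : annotate("Call Bob at 555-1234", patterns, typeNames)) {
            System.out.println(m);
        }
    }
}
```

Running main prints example.PhoneNumber[12,20), then example.CapitalizedWord[0,4) and example.CapitalizedWord[5,8). In this sketch the results are grouped by pattern index rather than by document position.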
It is also possible to provide an external resource file that declares the annotation type names and the regular expressions to match. The annotator will look for this file under the resource key "PatternFile". The file format is as follows:
[…] Patterns configuration parameter.

Modifier and Type | Field and Description |
---|---|
static java.lang.String | MESSAGE_DIGEST |

Constructor and Description |
---|
RegExAnnotator() |
Modifier and Type | Method and Description |
---|---|
protected int[] | getRangesToAnnotate(CAS aCAS): Utility method that determines which subranges of the document text should be annotated by this annotator. |
void | initialize(UimaContext aContext): Performs any startup tasks required by this annotator. |
void | process(CAS aCAS): Invokes this annotator's analysis logic. |
void | typeSystemInit(TypeSystem aTypeSystem): Acquires references to CAS Type and Feature objects that are later used during the process(CAS) method. |
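Methods inherited from class CasAnnotator_ImplBase: getRequiredCasInterface, process
Methods inherited from class Annotator_ImplBase: getCasInstancesRequired, hasNext, next
Methods inherited from class AnalysisComponent_ImplBase: batchProcessComplete, collectionProcessComplete, destroy, getContext, getResultSpecification, reconfigure, setResultSpecification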
public static final java.lang.String MESSAGE_DIGEST
public void initialize(UimaContext aContext) throws ResourceInitializationException
Performs any startup tasks required by this annotator.
Specified by: initialize in interface AnalysisComponent
Overrides: initialize in class AnalysisComponent_ImplBase
Parameters:
aContext - Provides access to services and resources managed by the framework. This includes configuration parameters, logging, and access to external resources.
Throws:
ResourceInitializationException - if this AnalysisComponent cannot initialize successfully.
See Also:
BaseAnnotator.initialize(AnnotatorContext)
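Startup work for an annotator like this one plausibly includes checking that Patterns and TypeNames line up and compiling each expression once, so that processing does not recompile them per document. The sketch below shows that idea with java.util.regex only. The class InitializeSketch and the method compilePatterns are hypothetical, IllegalArgumentException stands in for ResourceInitializationException, and treating an absent parameter as an empty array is an assumption.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class InitializeSketch {

    // Validates and compiles the Patterns parameter once, up front.
    // IllegalArgumentException stands in for UIMA's ResourceInitializationException.
    static Pattern[] compilePatterns(String[] patterns, String[] typeNames) {
        // Both parameters are optional; treat a missing array as empty (assumption).
        String[] p = patterns == null ? new String[0] : patterns;
        String[] t = typeNames == null ? new String[0] : typeNames;
        if (p.length != t.length) {
            throw new IllegalArgumentException(
                "Patterns has " + p.length + " entries but TypeNames has " + t.length);
        }
        Pattern[] compiled = new Pattern[p.length];
        for (int i = 0; i < p.length; i++) {
            try {
                compiled[i] = Pattern.compile(p[i]);
            } catch (PatternSyntaxException e) {
                // Report which type the bad expression belongs to.
                throw new IllegalArgumentException(
                    "Invalid regular expression for " + t[i] + ": " + p[i], e);
            }
        }
        return compiled;
    }

    public static void main(String[] args) {
        Pattern[] ok = compilePatterns(new String[] {"\\d+"}, new String[] {"example.Number"});
        System.out.println(ok.length + " pattern(s) compiled");
        try {
            compilePatterns(new String[] {"(unclosed"}, new String[] {"example.Broken"});
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing at initialization time means a misconfigured descriptor is reported before any document is processed, rather than partway through a collection.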
public void typeSystemInit(TypeSystem aTypeSystem) throws AnalysisEngineProcessException
Acquires references to CAS Type and Feature objects that are later used during the process(CAS) method.
Overrides: typeSystemInit in class CasAnnotator_ImplBase
Throws:
AnalysisEngineProcessException - if the provided type system is missing types or features required by this annotator.
See Also:
BaseAnnotator.typeSystemInit(TypeSystem)
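The point of typeSystemInit is to resolve type names to type handles once and to fail early if a configured type does not exist. The following sketch models that with a Map standing in for the TypeSystem (an assumption, chosen so the example runs without UIMA). It mirrors the convention that TypeSystem.getType(String) returns null for an unknown name. The class TypeResolutionSketch and the method resolveTypes are hypothetical, and IllegalStateException stands in for AnalysisEngineProcessException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TypeResolutionSketch {

    // Looks up every configured type name once, before any document is processed.
    // The Map is a hypothetical stand-in for UIMA's TypeSystem; an unknown name yields null.
    static <T> List<T> resolveTypes(Map<String, T> typeSystem, String[] typeNames) {
        List<T> resolved = new ArrayList<>();
        for (String name : typeNames) {
            T type = typeSystem.get(name);
            if (type == null) {
                // IllegalStateException stands in for AnalysisEngineProcessException.
                throw new IllegalStateException("Type not found in type system: " + name);
            }
            resolved.add(type);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> typeSystem = new HashMap<>();
        typeSystem.put("example.PhoneNumber", "Type<example.PhoneNumber>");
        System.out.println(resolveTypes(typeSystem, new String[] {"example.PhoneNumber"}));
        try {
            resolveTypes(typeSystem, new String[] {"example.Missing"});
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The resolved list keeps the same index order as TypeNames, so index i still pairs with Patterns[i] when annotations are later created.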
public void process(CAS aCAS) throws AnalysisEngineProcessException
Invokes this annotator's analysis logic.
Specified by: process in class CasAnnotator_ImplBase
Parameters:
aCAS - the CAS to process
aResultSpec - A list of outputs that this annotator should produce.
Throws:
AnnotatorProcessException - if a failure occurs during processing.
AnalysisEngineProcessException - if a problem occurs during processing.
See Also:
CasAnnotator_ImplBase.process(CAS)
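The core matching step can be sketched as follows. Each eligible range is searched with every pattern. Depending on the AnnotateEntireContainedAnnotation setting, each match produces a span of its own, or the whole containing range is "classified" once if the pattern occurs anywhere inside it. This is an illustration under stated assumptions, not the annotator's code. ProcessSketch and Span are hypothetical. The ranges are assumed to be flattened begin/end pairs, and the caller, not the sketch, enforces that the flag is only meaningful when ContainingAnnotationTypes is set. Matcher.region confines find() to a range while start() and end() still report document offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProcessSketch {

    // Hypothetical stand-in for an annotation that process() would add to the CAS.
    static final class Span {
        final String typeName;
        final int begin;
        final int end;

        Span(String typeName, int begin, int end) {
            this.typeName = typeName;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return typeName + "[" + begin + "," + end + ")";
        }
    }

    // ranges is assumed to hold flattened begin/end pairs {b0, e0, b1, e1, ...}.
    static List<Span> process(String text, int[] ranges, Pattern[] patterns,
                              String[] typeNames, boolean annotateEntireContainedAnnotation) {
        List<Span> out = new ArrayList<>();
        for (int r = 0; r + 1 < ranges.length; r += 2) {
            int begin = ranges[r];
            int end = ranges[r + 1];
            for (int i = 0; i < patterns.length; i++) {
                // region() restricts find() to text[begin, end); offsets stay document-relative.
                Matcher m = patterns[i].matcher(text).region(begin, end);
                if (annotateEntireContainedAnnotation) {
                    // Classify the whole range once if the pattern occurs anywhere inside it.
                    if (m.find()) {
                        out.add(new Span(typeNames[i], begin, end));
                    }
                } else {
                    while (m.find()) {
                        out.add(new Span(typeNames[i], m.start(), m.end()));
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Alice: 555-1234. Bob: none.";
        int[] ranges = {0, 16, 17, 27}; // e.g. two existing sentence annotations
        Pattern[] patterns = {Pattern.compile("\\d{3}-\\d{4}")};
        String[] typeNames = {"example.PhoneNumber"};
        System.out.println(process(text, ranges, patterns, typeNames, false));
        System.out.println(process(text, ranges, patterns, typeNames, true));
    }
}
```

With the flag off, main prints [example.PhoneNumber[7,15)], which is the match itself. With the flag on, it prints [example.PhoneNumber[0,16)], which is the entire first range. The second range contains no match and yields nothing either way.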
protected int[] getRangesToAnnotate(CAS aCAS)
Utility method that determines which subranges of the document text should be annotated by this annotator. If mContainingAnnotationTypes is null, the entire document is eligible for annotation. If mContainingAnnotationTypes is not null, each of its elements is expected to be an Annotation Type name. The CAS is queried for existing annotations of any of these Types, and only the subranges of the document contained within such annotations are eligible for annotation.
Parameters:
aCAS - CAS currently being processed

Copyright © 2013. All Rights Reserved.
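The two cases in getRangesToAnnotate (no containing types configured, or a list of containing types) can be sketched without UIMA as follows. The class RangesSketch and the method rangesToAnnotate are hypothetical. The Map from type name to spans is a stand-in for querying the CAS annotation index. The flattened {b0, e0, b1, e1, ...} layout of the int[] result is an assumption, since the layout of the returned array is not described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RangesSketch {

    // containingTypes models mContainingAnnotationTypes; annotationIndex is a
    // hypothetical stand-in for querying the CAS for annotations of each type.
    // Result layout {b0, e0, b1, e1, ...} is an assumption.
    static int[] rangesToAnnotate(int docLength, String[] containingTypes,
                                  Map<String, List<int[]>> annotationIndex) {
        if (containingTypes == null) {
            // No containing types configured: the entire document is eligible.
            return new int[] {0, docLength};
        }
        List<Integer> flat = new ArrayList<>();
        for (String type : containingTypes) {
            List<int[]> spans = annotationIndex.getOrDefault(type, Collections.emptyList());
            for (int[] span : spans) {
                flat.add(span[0]); // begin of an existing containing annotation
                flat.add(span[1]); // end of that annotation
            }
        }
        int[] result = new int[flat.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = flat.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<int[]>> index = new HashMap<>();
        index.put("example.Sentence", Arrays.asList(new int[] {0, 16}, new int[] {17, 27}));
        System.out.println(Arrays.toString(rangesToAnnotate(27, null, index)));
        System.out.println(Arrays.toString(rangesToAnnotate(27, new String[] {"example.Sentence"}, index)));
    }
}
```

main prints [0, 27] for the whole-document case and [0, 16, 17, 27] when two example.Sentence annotations bound the eligible text. A configured type with no annotations in the document contributes no ranges, so nothing in it would be annotated.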