public class Constants extends Object
Modifier and Type | Class and Description |
---|---|
static class |
Constants.CheckDuplicityLevel
Enum of the possible levels of the duplicity checking algorithm.
|
Modifier and Type | Field and Description |
---|---|
static String |
BITTOKEN
Tag used for bitmaps stored in the index.
|
static boolean |
CHECK_DUPLICITY
Flag indicating whether the duplicity checking algorithm is used by default.
|
static String |
CHECK_DUPLICITY_DIR
The name of the directory for the duplicity checking algorithm files.
|
static String |
CHECK_DUPLICITY_INDEX_DIR
The name of the directory for the duplicity checking algorithm reports.
|
static Constants.CheckDuplicityLevel |
CHECK_DUPLICITY_LEVEL
Current level of the duplicity checking algorithm.
|
static boolean |
CHECK_DUPLICITY_ON_NGRAMS
Flag indicating whether the duplicity checking algorithm works with word N-grams.
|
static int |
CHECK_DUPLICITY_PERM_CHUNK_BITS
Number of bits of the permutation chunks used in the duplicity checking algorithm.
|
static int |
CHECK_DUPLICITY_PERM_CHUNKS
Number of permutation chunks that together form one logical permutation
used in the duplicity checking algorithm.
|
static int |
CHECK_DUPLICITY_PERM_NUM
Number of permutations used in the duplicity checking algorithm.
|
static String |
CHECK_DUPLICITY_REPORT_DIR
The name of the directory for the duplicity checking algorithm reports.
|
static String |
CHECK_DUPLICITY_TEMP_DIR
The name of the directory for the duplicity checking algorithm temporary files.
|
static boolean |
CHECK_PARAGRAPHS
Flag indicating whether
ParagraphPunctFilter should be used. |
static String |
CONST_FILE_BEGINNING_POSTFIX
Postfix of const files that specify the time of the request for constancy of the
index.
|
static String |
CONST_FILE_PREFIX |
static String |
DEADBARRELS_FILENAME
Name of the file where directory numbers of dead barrels are saved.
|
static int |
DEFAULTMODEL
What model is used for querying.
|
static int |
DOCINVSIZE
How many terms we assume in a document.
|
static int |
DOCSCACHE
How many documents are cached in each barrel during querying phase?
|
static double |
DUPLICATE_TRESHOLD
The threshold for the mean value of the Jaccard coefficient
(divided by the number of permutations, see
CHECK_DUPLICITY_PERM_NUM)
for all textual units of the document. |
static int |
FIRSTPARAGRAPH
Number of the first paragraph in a document.
|
static int |
FIRSTSENTENCE
Number of the first sentence in a document.
|
static long |
FIRSTUID
Number of the first document in a collection (barrel/tanker).
|
static String |
FS
File separator.
|
static int |
IOSIZE
Size of the I/O buffers.
|
static int |
ITEM_LENGTH_IN_TRANSACTION_LISTENER
Length of a sorted item in transaction listener log.
|
static String |
JACCARD_COEFICIENTS_FILE_NAME
The filename for the
JaccardCoeficientsFile . |
static String |
LOCAL_TANKER_COMMIT_TO_GLOBAL_LOG_FILENAME
Prefix of the state file that signals that the local tanker is in the commit phase.
|
static String |
LOCAL_TANKER_DIRECTORY_PREFIX
Prefix of all local tankers.
|
static long |
LOCK_RESERVATION_REFRESH_PERIOD
Refresh time of lock reservation, time after which the reservation can expire.
|
static String |
LOCK_SERVER_DEFAULT_CONFIG_FILENAME
Full filename to the lock server configuration file.
|
static String |
LS
Line separator.
|
static double |
MINIDF
Minimal value of an inverse document frequency.
|
static double |
MINVALIDIDF
All terms with idf that is lower are excluded automatically.
|
static String |
MODIFIER_STATE_FILENAME_PREFIX
Prefix of all modify active state filenames.
|
static long |
MODIFY_STATE_REFRESH_PERIOD
Period of time after which modify state file of a modifier is refreshed.
|
static long |
NO_RESERVATION_ID
Id of the lock that is returned from the lock server when no reservation was
created.
|
static int |
NORMFACTOR
Normalization of vectors to this...
|
static int |
OCCURENCIESTOSCAN
Maximum number of positions which are scanned during phrase queries in each of
the acting term occurrences.
|
static char |
PARAGRAPH_SEPARATOR
Special character which determines paragraph separator.
|
static int |
PARAGRAPH_SEPARATOR_WEIGHT
Special weight which determines paragraph separator.
|
static String |
PERMUTATED_MINS_FILE_PREFIX
The prefix of the filename for the
PermutatedMinsFile . |
static int |
PRECOMPCACHE
How many values are precomputed for an inverted list during the search phase.
|
static String |
READ_LOCK_FILENAME_PREFIX
Prefix of all read lock filenames.
|
static long |
READ_LOCK_PERIOD
Default read lock refresh period.
|
static boolean |
REQUIREDMODEBYDEF
Required mode in queries? (true=act as g00gle)
|
static int |
SECOND
Period of time - 1 second.
|
static String |
SEPFILESEXT
What extension is used in
ThickBarrel for separated inverted lists. |
static String |
SEPTOKEN
Tag(s) used for separated inverted lists - defines the prefix.
|
static String |
SIMILAR_UNIT_PAIRS_FILE_PREFIX
The prefix of the filename for the
SimilarUnitPairsFile . |
static double |
SIMILARITY_RELEVANT_TRESHOLD
The threshold for the Jaccard coefficient
(divided by the number of permutations, see
CHECK_DUPLICITY_PERM_NUM)
for a textual unit of the document to appear as a suspect in the duplicity checking report. |
static int |
SKIPFACTOR
Skip factor preferably used.
|
static boolean |
SUPPORTHTDIG
Support HTdig in the HTML parser.
|
static int |
TANKERCACHE
Size of a cache in the TankerImpl.
|
static String |
TEMPDIR
Temporary directory.
|
static int |
TERMSCACHE
How many words are cached in each barrel during querying phase?
|
static int |
TITLELEN
Title length.
|
static String |
TRANSACTION_LISTENER_FILENAME_PREFIX
Prefix of all transaction listeners' filenames.
|
static String |
UNKNOWNCONTENTTYPE
Unknown content type (used by robot or indexers when they cannot obtain a valid content-type).
|
static int |
WORDNGRAMS_LENGHT
The length of N-grams produced by the
WordNGrammer filter. |
static String |
WRITE_LOCK_FILENAME_PREFIX
Prefix of all write lock filenames.
|
static long |
WRITE_LOCK_PERIOD
Default write lock refresh period.
|
Constructor and Description |
---|
Constants() |
public static final int NORMFACTOR
public static final int PRECOMPCACHE
public static final int DEFAULTMODEL
public static final int DOCINVSIZE
public static final long FIRSTUID
public static final int FIRSTSENTENCE
public static final int FIRSTPARAGRAPH
public static final int TERMSCACHE
See Also: Terms, Constant Field Values
public static final int DOCSCACHE
See Also: Documents, Constant Field Values
public static final String SEPFILESEXT
What extension is used in ThickBarrel for separated inverted lists. Nonfunctional.
public static final String SEPTOKEN
public static final String BITTOKEN
public static final int OCCURENCIESTOSCAN
See Also: PhraseScan, Constant Field Values
public static final double MINIDF
See Also: CWI, Constant Field Values
public static final String FS
public static final String LS
public static final boolean SUPPORTHTDIG
See Also: HTMLParser, Constant Field Values
public static final int TITLELEN
See Also: HTMLParser, Constant Field Values
public static final int IOSIZE
See Also: FileLocal, Constant Field Values
public static final boolean REQUIREDMODEBYDEF
public static final double MINVALIDIDF
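MINIDF and MINVALIDIDF bound the inverse document frequency of a term: the first is a minimal idf value, the second a cut-off below which terms are excluded. The following is a minimal sketch of how such bounds might be applied. The idf formula (natural logarithm of N/df), the numeric values, and the names `IdfFilterSketch`, `MIN_IDF`, `MIN_VALID_IDF`, `idf` and `validTerms` are illustrative assumptions and are not taken from Egothor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: the formula and the values below are assumptions, not Egothor's. */
final class IdfFilterSketch {
    // Hypothetical stand-ins for Constants.MINIDF and Constants.MINVALIDIDF.
    static final double MIN_IDF = 0.0;
    static final double MIN_VALID_IDF = 0.1;

    /** Textbook idf = ln(N / df), clamped from below at MIN_IDF. */
    static double idf(long totalDocs, long docFreq) {
        double raw = Math.log((double) totalDocs / docFreq);
        return Math.max(raw, MIN_IDF);
    }

    /** Keeps only the terms whose idf is not lower than MIN_VALID_IDF. */
    static Map<String, Double> validTerms(Map<String, Long> docFreqs, long totalDocs) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : docFreqs.entrySet()) {
            double value = idf(totalDocs, e.getValue());
            if (value >= MIN_VALID_IDF) {
                kept.put(e.getKey(), value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 1000L);   // occurs in every document, idf = 0
        docFreqs.put("barrel", 10L);  // rare term, idf = ln(100)
        System.out.println(validTerms(docFreqs, 1000L).keySet());
    }
}
```

With these assumed values, a term that occurs in every document is dropped while a rare term is kept.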
public static final int SKIPFACTOR
See Also: IListMetadataWrite, Constant Field Values
public static final int TANKERCACHE
See Also: TankerImpl, Constant Field Values
public static final String UNKNOWNCONTENTTYPE
public static final String CONST_FILE_PREFIX
public static final String CONST_FILE_BEGINNING_POSTFIX
public static final String LOCAL_TANKER_DIRECTORY_PREFIX
public static final String DEADBARRELS_FILENAME
public static final String READ_LOCK_FILENAME_PREFIX
public static final String WRITE_LOCK_FILENAME_PREFIX
public static final String TRANSACTION_LISTENER_FILENAME_PREFIX
public static final String MODIFIER_STATE_FILENAME_PREFIX
public static final String LOCAL_TANKER_COMMIT_TO_GLOBAL_LOG_FILENAME
public static final int ITEM_LENGTH_IN_TRANSACTION_LISTENER
public static final String LOCK_SERVER_DEFAULT_CONFIG_FILENAME
public static final long READ_LOCK_PERIOD
public static final long WRITE_LOCK_PERIOD
public static final long NO_RESERVATION_ID
public static final int SECOND
public static final long MODIFY_STATE_REFRESH_PERIOD
public static final long LOCK_RESERVATION_REFRESH_PERIOD
public static final char PARAGRAPH_SEPARATOR
public static final int PARAGRAPH_SEPARATOR_WEIGHT
public static final int WORDNGRAMS_LENGHT
The length of N-grams produced by the WordNGrammer filter.
public static final String PERMUTATED_MINS_FILE_PREFIX
The prefix of the filename for the PermutatedMinsFile.
public static final String SIMILAR_UNIT_PAIRS_FILE_PREFIX
The prefix of the filename for the SimilarUnitPairsFile.
public static final String JACCARD_COEFICIENTS_FILE_NAME
The filename for the JaccardCoeficientsFile.
public static final String CHECK_DUPLICITY_DIR
public static final String CHECK_DUPLICITY_TEMP_DIR
public static final String CHECK_DUPLICITY_REPORT_DIR
public static final String CHECK_DUPLICITY_INDEX_DIR
public static final boolean CHECK_DUPLICITY_ON_NGRAMS
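CHECK_DUPLICITY_ON_NGRAMS and WORDNGRAMS_LENGHT both refer to N-grams of words. As a rough illustration of what a word N-gram is, the sketch below slides a window of n words over a token list. It is not the WordNGrammer filter itself; the class name `WordNGramSketch`, the method `wordNGrams`, and the space-joined output format are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: illustrates word N-grams, not Egothor's WordNGrammer. */
final class WordNGramSketch {
    /** Returns every run of n consecutive words, joined by single spaces. */
    static List<String> wordNGrams(List<String> words, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        // The window [i, i + n) must fit entirely inside the word list.
        for (int i = 0; i + n <= words.size(); i++) {
            grams.add(String.join(" ", words.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d");
        System.out.println(wordNGrams(words, 3)); // prints [a b c, b c d]
    }
}
```

A document shorter than n words yields no N-grams at all in this sketch; how the real filter treats that case is not stated here.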
public static Constants.CheckDuplicityLevel CHECK_DUPLICITY_LEVEL
public static final int CHECK_DUPLICITY_PERM_CHUNK_BITS
public static final int CHECK_DUPLICITY_PERM_CHUNKS
public static int CHECK_DUPLICITY_PERM_NUM
public static final boolean CHECK_DUPLICITY
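The thresholds in this class are described as a Jaccard coefficient "divided by number of permutations" (CHECK_DUPLICITY_PERM_NUM), which reads like a min-wise hashing (MinHash) estimate. The sketch below shows that general technique under this assumption; it is not the Egothor implementation. The seeded hash mix standing in for permutations, the values 64 and 0.9, and all names (`MinHashSketch`, `signature`, `estimateJaccard`, `isDuplicate`) are hypothetical, and the sketch compares a single pair of textual units rather than a mean over all units.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch only: generic MinHash, assumed to resemble the duplicity check. */
final class MinHashSketch {
    // Hypothetical stand-ins for CHECK_DUPLICITY_PERM_NUM and DUPLICATE_TRESHOLD.
    static final int PERM_NUM = 64;
    static final double DUPLICATE_THRESHOLD = 0.9;

    /** SplitMix64-style bit mixer; a bijection on 64-bit values. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** One minimum per simulated permutation; each permutation is a seeded mix. */
    static long[] signature(Set<String> shingles) {
        long[] mins = new long[PERM_NUM];
        Arrays.fill(mins, Long.MAX_VALUE);
        for (String s : shingles) {
            for (int p = 0; p < PERM_NUM; p++) {
                long h = mix(s.hashCode() + 0x9E3779B97F4A7C15L * (p + 1));
                if (h < mins[p]) {
                    mins[p] = h;
                }
            }
        }
        return mins;
    }

    /** Number of matching minima divided by the number of permutations. */
    static double estimateJaccard(long[] a, long[] b) {
        int matches = 0;
        for (int p = 0; p < PERM_NUM; p++) {
            if (a[p] == b[p]) {
                matches++;
            }
        }
        return (double) matches / PERM_NUM;
    }

    /** Above the threshold the unit is treated as a duplicate. */
    static boolean isDuplicate(double estimate) {
        return estimate > DUPLICATE_THRESHOLD;
    }

    public static void main(String[] args) {
        Set<String> x = new HashSet<>(Arrays.asList("a b", "b c", "c d", "d e"));
        Set<String> y = new HashSet<>(Arrays.asList("a b", "b c", "c d", "x y"));
        double estimate = estimateJaccard(signature(x), signature(y));
        System.out.println("estimate = " + estimate + ", duplicate = " + isDuplicate(estimate));
    }
}
```

More permutations give a less noisy estimate at the cost of more stored minima per unit, which is the usual trade-off behind a setting like CHECK_DUPLICITY_PERM_NUM.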
public static final boolean CHECK_PARAGRAPHS
Flag indicating whether ParagraphPunctFilter should be used.
public static double DUPLICATE_TRESHOLD
The threshold for the mean value of the Jaccard coefficient (divided by the number of permutations, see CHECK_DUPLICITY_PERM_NUM) for all textual units of the document. Above this value the document is considered a duplicate.
public static final double SIMILARITY_RELEVANT_TRESHOLD
The threshold for the Jaccard coefficient (divided by the number of permutations, see CHECK_DUPLICITY_PERM_NUM) for a textual unit of the document to appear as a suspect in the duplicity checking report. Above this value, the diff algorithm is run for this textual unit against the most similar document from the index.
Copyright © 2016 Egothor. All Rights Reserved.