public class Config extends Object
Modifier and Type | Class and Description |
---|---|
static class | Config.TypeURIRepository |
Modifier and Type | Field and Description |
---|---|
static int | BACKUPRESPONSES: Back up all documents specified in the content table into a gzipped stream? |
static int | BUCKETSCACHE: Number of cached Table objects. |
static int | BUCKETSIZE: Size of buckets (in bytes). |
static int | CAPACITY: How many servers will be kept in memory for processing (implies up to 4 IO handles for each). |
static boolean | DEBUG: Print more details about the robot's work to the logs. |
static int | DNSRETRIES: How many times do we restart a DNS query? |
static String | DNSSERVER: IP address of our DNS server. |
static int | DNSTIMEOUT: How long we wait for a DNS response. |
static long | DOCDELAY: How often may we ask for the same URI? [msec] |
static long | DOCDELAYONFAIL: How often may we ask for the same URI if the request failed? [msec] |
static long | DOCINITDELAY: How often may we ask for the same URI on start (the first two fetches)? [msec] |
static int | FETCHQSIZE: Number of ready servers for gathering (number of pages gathered concurrently). |
static int | IMGBUCKETSCACHE: Number of cached Table objects when storing IMG-SRC URIs. |
static int | IMGNODESCACHE: An internal cache which is used in Directory when storing IMG-SRC URIs. |
static int | IMGSCHUNKLEN: Chunk length of IMGSLISTCHUNKFILENAME and IMGSSTRUCTCHUNKFILENAME reporters. |
static String | IMGSLISTCHUNKFILENAME: Filename of the log file with all newly accepted IMG URIs. |
static String | IMGSSTRUCTCHUNKFILENAME: Filename of the log file with the occurrences of images on pages. |
static int | IMGURISCACHE: An internal cache which is used in Table when storing IMG-SRC URIs. |
static int | INDEXCHUNK: How large the chunks created by the indexer are. |
static int | IP_PAUSE: Delay between two connections to the same IP (msec). |
static int | IP_VIRTUAL: Number of requests to one IP concurrently (in case of virtual hosts). |
static int | MAXCONNECTIONSINSEQ: One server may stay in the processing queue for up to this number of connections; then it must give a chance to others. |
static int | MAXFAILURESINBATCH: How many times do we allow DNS or HTTP requests to be restarted during a batch (sequence)? |
static int | MAXIMUSTOTALUS: What's the maximum size of a document we accept? |
static int | MAXITEMSINQUEUE: If a queue holds more than this number of URIs, new URIs are not planned. |
static long | MAXPAGES: Initial number of URIs we will collect. |
static long | MAXSERVERS: Initial number of servers the robot will scan. |
static int | MAXURILEN: Max length of a URI. |
static int | MAXURISFORBLINDHASH: Size of a hash table (bitmap) for all URIs in the system. |
static int | NEWURISCACHE: An internal cache which is used in LinksCollector. |
static int | NODESCACHE: An internal cache which is used in Directory. |
static int | POSTQUEUESIZE: How many Response objects are in post-process queues. |
static int | PREFSORTED: Number of items sorted preferably. |
static int | RESOLVEQSIZE: Number of ready servers for resolving (number of concurrent requests our DNS server can handle). |
static int | RESPONSESIZE: What's the maximum size of a defragmented response we accept? |
static String | ROBOTID: Identification string of our robot. |
static long | ROBOTSTXT: How old can the robots.txt specification be (in milliseconds)? |
static int | SAVEINTERVAL: New URIs are saved at these intervals (msec) when possible. |
static boolean | SCRUB: Scrub all URIs to a standard format? |
static long | SRVEMPTY: How long is a server parked when its queue is empty? |
static long | SRVIPLOCKED: How long is a server parked when its IP is currently locked? |
static long | SRVRECYCLE: How long is a server parked when it is exhausted? |
static long | SRVTCPERROR: How long is a server parked when it is unreachable? |
static long | SRVUNRESOLVED: How long is a server parked when it cannot be resolved? |
static String | STATCHUNKFILENAME: Filename of the log file with statistics values of pages processed successfully. |
static int | STATCHUNKLEN: Chunk length of STATCHUNKFILENAME reporter. |
static int | TRANSMITTERPORT: UDP SAX events transmitter port. |
static int | TURNUPWHEEL: How many requests are parsed in synchronous mode. |
static Config.TypeURIRepository | URIREPOSITORY: Which URI repository is used inside T0 to assign ids to URIs. |
static int | URISCACHE: An internal cache which is used in Table or BijectInt2StringAppender. |
static String | URISCHUNKFILENAME: Filename of the log file with all newly accepted URIs. |
static int | URISCHUNKLEN: Chunk length of URISCHUNKFILENAME reporter. |
static String | USERAGENT: User agent string of our robot. |
static String | VR: Version number of the robot. |
static int | WWWRETRIES: How many times do we restart an HTTP connection? |
static int | WWWTIMEOUT: How long we wait for data. |
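The three DOC* delays in the field summary read like a per-URI refetch policy. The following is a minimal sketch of one possible interpretation, not the robot's actual logic: the class `RefetchDelaySketch`, the methods `delayFor` and `mayFetch`, the numeric values, and the rule that a failure takes precedence over the initial delay are all assumptions made for illustration. Only the three field names come from this page.

```java
// Hypothetical stand-in for the DOC* settings; it does not use the real Config class.
final class RefetchDelaySketch {
    static long DOCDELAY = 86_400_000L;       // assumed value: one day, in msec
    static long DOCDELAYONFAIL = 3_600_000L;  // assumed value: one hour, in msec
    static long DOCINITDELAY = 600_000L;      // assumed value: ten minutes, in msec

    /**
     * Picks the minimum wait before the same URI may be requested again.
     *
     * @param completedFetches how many fetches of this URI have completed so far
     * @param lastFailed       whether the most recent request failed
     */
    static long delayFor(int completedFetches, boolean lastFailed) {
        if (lastFailed) {
            return DOCDELAYONFAIL;            // assumption: failure wins over the other rules
        }
        if (completedFetches < 2) {
            return DOCINITDELAY;              // "on start (the first two fetches)"
        }
        return DOCDELAY;
    }

    /** Returns true when enough time has passed since the last request for this URI. */
    static boolean mayFetch(long lastFetchMillis, long nowMillis,
                            int completedFetches, boolean lastFailed) {
        return nowMillis - lastFetchMillis >= delayFor(completedFetches, lastFailed);
    }

    public static void main(String[] args) {
        System.out.println("start-up delay:  " + delayFor(0, false));
        System.out.println("steady delay:    " + delayFor(5, false));
        System.out.println("delay on fail:   " + delayFor(5, true));
    }
}
```

Keeping the decision in one pure function makes the policy easy to test without a clock; the caller supplies the timestamps.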
Constructor and Description |
---|
Config(): This is an empty constructor, used by test scripts only. |
Modifier and Type | Method and Description |
---|---|
static Escape | acceptToDownload(String contentType, String contentLength): Tests whether we are interested in downloading a document of a given content-type and suggested content-length (both are HTTP headers sent to us). |
static boolean | allowedPage(String name) |
static boolean | allowedServer(String name) |
static boolean | allowedToBackup(String contentType) |
static String | explain(URI uri) |
void | exportTo(DataOutputStream dos): Save this configuration into an output stream. |
boolean | hasSomeBackupConditions(): Return true if this configuration defines some rules for backup. |
ArrayList<String> | initialize(String filename): Load a configuration from a file. |
URI | normalize(URI uri): This function first normalizes the URI and then tests whether such a URI is valid under the configuration specified by this class. |
void | shutdown(): Close any resources this object may hold. |
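To illustrate the kind of check that `acceptToDownload(String contentType, String contentLength)` describes, here is a self-contained sketch. It is not the library's implementation: the class `DownloadFilterSketch`, the `boolean` return type (the real method returns `Escape`, whose shape is not documented on this page), the "text/ only" rule, and the size limit are assumptions. The field name `MAXIMUSTOTALUS` is borrowed from this page only to show where a size limit could plug in.

```java
import java.util.Locale;

// Hypothetical filter over the two HTTP headers; it does not use the real Config class.
final class DownloadFilterSketch {
    static int MAXIMUSTOTALUS = 1_000_000; // assumed limit, in bytes

    /**
     * Returns true when a document with the given headers looks worth downloading.
     * Both arguments are raw header values and may be null when the header is absent.
     */
    static boolean accept(String contentType, String contentLength) {
        if (contentType == null) {
            return false;                       // assumption: unknown types are rejected
        }
        String type = contentType.toLowerCase(Locale.ROOT).trim();
        int semicolon = type.indexOf(';');      // drop parameters such as "; charset=utf-8"
        if (semicolon >= 0) {
            type = type.substring(0, semicolon).trim();
        }
        if (!type.startsWith("text/")) {
            return false;                       // assumption: only textual media types are wanted
        }
        if (contentLength == null) {
            return true;                        // the length header is optional
        }
        try {
            return Long.parseLong(contentLength.trim()) <= MAXIMUSTOTALUS;
        } catch (NumberFormatException e) {
            return false;                       // malformed length: treat as not acceptable
        }
    }

    public static void main(String[] args) {
        System.out.println(accept("text/html; charset=utf-8", "1024"));
        System.out.println(accept("image/png", "1024"));
    }
}
```

The length is only "suggested" by the server, so a real fetcher would presumably still need to enforce the limit while reading the body.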
public static boolean DEBUG
public static String USERAGENT
public static String ROBOTID
public static final String VR
public static String DNSSERVER
public static final int RESPONSESIZE
public static int MAXIMUSTOTALUS
public static int IP_PAUSE
public static int IP_VIRTUAL
public static int MAXURILEN
public static int CAPACITY
public static int RESOLVEQSIZE
public static int FETCHQSIZE
public static int DNSTIMEOUT
public static int WWWTIMEOUT
public static int DNSRETRIES
public static int WWWRETRIES
public static int MAXFAILURESINBATCH
See Also: MAXCONNECTIONSINSEQ
public static long ROBOTSTXT
public static long DOCDELAY
public static long DOCINITDELAY
public static long DOCDELAYONFAIL
public static long SRVEMPTY
public static long SRVUNRESOLVED
public static long SRVIPLOCKED
public static long SRVTCPERROR
public static long SRVRECYCLE
See Also: MAXCONNECTIONSINSEQ
public static int NEWURISCACHE
An internal cache which is used in LinksCollector.
public static int BUCKETSCACHE
Number of cached Table objects.
public static int URISCACHE
An internal cache which is used in Table or BijectInt2StringAppender.
public static int NODESCACHE
An internal cache which is used in Directory.
public static int IMGBUCKETSCACHE
Number of cached Table objects when storing IMG-SRC URIs.
See Also: ImageLinksExtractor
public static int IMGURISCACHE
An internal cache which is used in Table when storing IMG-SRC URIs.
See Also: ImageLinksExtractor
public static int IMGNODESCACHE
An internal cache which is used in Directory when storing IMG-SRC URIs.
See Also: ImageLinksExtractor
public static int SAVEINTERVAL
public static String URISCHUNKFILENAME
public static int URISCHUNKLEN
Chunk length of URISCHUNKFILENAME reporter. Valid values are:
See Also: T0
public static String STATCHUNKFILENAME
public static int STATCHUNKLEN
Chunk length of STATCHUNKFILENAME reporter. Valid values are:
See Also: T5
public static String IMGSLISTCHUNKFILENAME
See Also: ImageLinksExtractor, Reporter
public static int IMGSCHUNKLEN
Chunk length of IMGSLISTCHUNKFILENAME and IMGSSTRUCTCHUNKFILENAME reporters. Valid values are:
See Also: ImageLinksExtractor, Response
public static String IMGSSTRUCTCHUNKFILENAME
See Also: ImageLinksExtractor, Reporter
public static long MAXSERVERS
public static long MAXPAGES
public static int MAXCONNECTIONSINSEQ
public static int BUCKETSIZE
See Also: Bucket
public static int INDEXCHUNK
public static int POSTQUEUESIZE
How many Response objects are in post-process queues.
public static int TURNUPWHEEL
See Also: T5
public static int BACKUPRESPONSES
See Also: StorageEscape
public static int MAXURISFORBLINDHASH
See Also: FastBlindAppender, T0
public static Config.TypeURIRepository URIREPOSITORY
What URI repository is used inside T0 to assign ids to URIs.
public static boolean SCRUB
See Also: Scrub
public static int PREFSORTED
See Also: SequentialQueue
public static int MAXITEMSINQUEUE
See Also: PageSequentialQueue
public static int TRANSMITTERPORT
See Also: Transmitter
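Among the fields above, IP_PAUSE (delay between two connections to the same IP) and IP_VIRTUAL (number of concurrent requests to one IP) together describe a per-IP throttle. The sketch below shows one way such a gate could be built. It is an illustration under assumptions, not the robot's scheduler: the class `IpGateSketch`, its methods `tryAcquire` and `release`, and the numeric values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-IP gate; it does not use the real Config class.
final class IpGateSketch {
    static int IP_PAUSE = 1000;  // assumed value: msec between two connections to one IP
    static int IP_VIRTUAL = 2;   // assumed value: concurrent requests allowed per IP

    /** Bookkeeping for a single IP address. */
    private static final class State {
        int active;            // connections currently open
        long lastStart;        // time the most recent connection was started
        boolean everStarted;   // false until the first connection is granted
    }

    private final Map<String, State> states = new HashMap<>();

    /** Tries to start a connection to the given IP at the given time; true when allowed. */
    synchronized boolean tryAcquire(String ip, long nowMillis) {
        State s = states.computeIfAbsent(ip, k -> new State());
        if (s.active >= IP_VIRTUAL) {
            return false;                                   // concurrency limit reached
        }
        if (s.everStarted && nowMillis - s.lastStart < IP_PAUSE) {
            return false;                                   // previous connection too recent
        }
        s.active++;
        s.lastStart = nowMillis;
        s.everStarted = true;
        return true;
    }

    /** Marks one connection to the given IP as finished. */
    synchronized void release(String ip) {
        State s = states.get(ip);
        if (s != null && s.active > 0) {
            s.active--;
        }
    }

    public static void main(String[] args) {
        IpGateSketch gate = new IpGateSketch();
        System.out.println(gate.tryAcquire("192.0.2.1", 0L));
        System.out.println(gate.tryAcquire("192.0.2.1", 500L));
    }
}
```

Passing the current time in as a parameter, rather than reading the system clock inside the gate, keeps the sketch deterministic.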
public void exportTo(DataOutputStream dos) throws IOException
Parameters:
dos - the output stream
Throws:
IOException - when the data cannot be written
public ArrayList<String> initialize(String filename) throws IOException
Parameters:
filename - the filename
Throws:
IOException - on I/O error
See Also: normalize(java.net.URI)
public URI normalize(URI uri)
Parameters:
uri - the entry URI
See Also: valid(java.net.URI)
public static Escape acceptToDownload(String contentType, String contentLength)
Parameters:
contentType -
contentLength -
public static boolean allowedServer(String name)
Parameters:
name -
public static boolean allowedPage(String name)
Parameters:
name -
public static boolean allowedToBackup(String contentType)
Parameters:
contentType -
public boolean hasSomeBackupConditions()
public void shutdown()
Copyright © 2016 Egothor. All Rights Reserved.