Settings can be specified in a Python file, in environment variables, or both. Settings specified in environment variables override those stored in a file. The default settings are intended to make it easy to get Disco running on a single node. Running make install creates a settings file better suited to a cluster environment and puts it in /etc/disco/settings.py.
Disco looks in the following places for a settings file:
- The settings file specified with the --settings option of the command line utility.
- ~/.disco
- /etc/disco/settings.py
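The lookup and override rules above can be sketched roughly as follows. This is a minimal illustration, not Disco's actual implementation: the function names find_settings_file and resolve are hypothetical, and the sketch assumes the list above is in priority order and that the first existing file wins.

```python
import os

def find_settings_file(cli_path=None, exists=os.path.exists):
    """Return the first candidate settings file that exists, or None.

    Assumption: candidates are tried in the order listed in the text.
    """
    candidates = [
        cli_path,                        # value of --settings, if given
        os.path.expanduser('~/.disco'),  # per-user settings file
        '/etc/disco/settings.py',        # system-wide settings file
    ]
    for path in candidates:
        if path and exists(path):
            return path
    return None

def resolve(name, file_settings, environ=os.environ):
    """Look up one setting; an environment variable overrides the file value."""
    if name in environ:
        return environ[name]
    return file_settings.get(name)
```

Note that a value taken from the environment is always a string, whereas a value from the Python settings file keeps its Python type.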
Possible settings for Disco are as follows:
- DISCO_DATA
- Directory to use for writing data. Default is obtained using os.path.join(DISCO_ROOT, 'data').
- DISCO_DEBUG
- Sets the debugging level for Disco. Default is 1.
- DISCO_ERLANG
- Command used to launch Erlang on all nodes in the cluster. Default is usually erl, but depends on the OS.
- DISCO_EVENTS
- If set, events are logged to stdout. If set to json, events are written as JSON strings. If set to nocolor, ANSI color escape sequences are not used, even if the terminal supports them. Default is unset (the empty string).
- DISCO_FLAGS
- Default is the empty string.
- DISCO_HOME
- The directory which Disco runs out of. If you run Disco out of the source directory, you shouldn’t need to change this. If you use make install to install Disco, it will be set properly for you in /etc/disco/settings.py.
- DISCO_HTTPD
- Command used to launch lighttpd. Default is lighttpd.
- DISCO_MASTER_HOME
- Directory containing the Disco master directory. Default is obtained using os.path.join(DISCO_HOME, 'master').
- DISCO_MASTER_HOST
- The hostname of the master. Default obtained using socket.gethostname().
- DISCO_MASTER_ROOT
- Directory to use for writing master data. Default obtained using os.path.join(DISCO_DATA, '_%s' % DISCO_NAME).
- DISCO_MASTER_CONFIG
- Directory to use for writing cluster configuration. Default obtained using os.path.join(DISCO_ROOT, '%s.config' % DISCO_NAME).
- DISCO_NAME
- A unique name for the Disco cluster. Default obtained using 'disco_%s' % DISCO_PORT.
- DISCO_LOG_DIR
- Directory where log-files are created. The same path is used for all nodes in the cluster. Default is obtained using os.path.join(DISCO_HOME, 'log').
- DISCO_PID_DIR
- Directory where pid-files are created. The same path is used for all nodes in the cluster. Default is obtained using os.path.join(DISCO_HOME, 'run').
- DISCO_PORT
- The port the workers use for HTTP communication. Default is 8989.
- DISCO_ROOT
- Root directory for Disco-written data and metadata. Default is obtained using os.path.join(DISCO_HOME, 'root').
- DISCO_USER
- The user Disco should run as. Default is obtained using os.getenv('LOGNAME').
- DISCO_WORKER
- Executable which launches the Disco worker process. Default is obtained using os.path.join(DISCO_HOME, 'node', 'disco-worker').
- DISCO_WWW_ROOT
- Directory that is the document root for the master HTTP server. Default is obtained using os.path.join(DISCO_MASTER_HOME, 'www').
- DISCO_GC_AFTER
- How long to wait before garbage collecting data. Only results explicitly saved to DDFS are exempt from garbage collection. Default is 100 * 365 * 24 * 60 * 60 (100 years, expressed in seconds).
- DISCO_SORT_BUFFER_SIZE
- How much memory can be used by external sort. Passed as the ‘-S’ parameter to the Unix sort command (see man sort). Default is 10%, i.e. 10% of the total available memory.
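Several of the defaults above are derived from other settings, so changing DISCO_HOME or DISCO_PORT changes a chain of dependent paths. The sketch below reproduces only the default expressions quoted in the list. The function default_settings is a hypothetical helper written for illustration and is not part of Disco.

```python
import os
import socket

def default_settings(disco_home, disco_port=8989):
    """Compute the derived defaults documented above from DISCO_HOME and DISCO_PORT."""
    s = {'DISCO_HOME': disco_home, 'DISCO_PORT': disco_port}
    s['DISCO_NAME'] = 'disco_%s' % s['DISCO_PORT']
    s['DISCO_ROOT'] = os.path.join(s['DISCO_HOME'], 'root')
    s['DISCO_DATA'] = os.path.join(s['DISCO_ROOT'], 'data')
    s['DISCO_MASTER_HOME'] = os.path.join(s['DISCO_HOME'], 'master')
    s['DISCO_MASTER_HOST'] = socket.gethostname()
    s['DISCO_MASTER_ROOT'] = os.path.join(s['DISCO_DATA'], '_%s' % s['DISCO_NAME'])
    s['DISCO_MASTER_CONFIG'] = os.path.join(s['DISCO_ROOT'], '%s.config' % s['DISCO_NAME'])
    s['DISCO_LOG_DIR'] = os.path.join(s['DISCO_HOME'], 'log')
    s['DISCO_PID_DIR'] = os.path.join(s['DISCO_HOME'], 'run')
    s['DISCO_WWW_ROOT'] = os.path.join(s['DISCO_MASTER_HOME'], 'www')
    s['DISCO_GC_AFTER'] = 100 * 365 * 24 * 60 * 60  # 100 years in seconds
    return s

if __name__ == '__main__':
    # '/srv/disco' is an arbitrary example path, not a Disco default.
    for key, value in sorted(default_settings('/srv/disco').items()):
        print('%s = %r' % (key, value))
```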
Settings to control the proxying behavior:
- DISCO_PROXY_ENABLED
- If set, enable proxying through the master. Default is ''.
- DISCO_PROXY
- The address of the proxy to use on the client side. Default is ''.
- DISCO_PROXY_PORT
- The port the master proxy should run on. Default is 8999.
Settings to control the scheduler behavior:
- DISCO_SCHEDULER
- The type of scheduler that Disco should use. The only options are fair and fifo. Default is fair.
- DISCO_SCHEDULER_ALPHA
- Parameter controlling how much the fair scheduler punishes long-running jobs vs. short ones. Default is .001 and should usually not need to be changed.
Settings used by the testing environment:
- DISCO_TEST_HOST
- The hostname that the test data server should bind to. Default is DISCO_MASTER_HOST.
- DISCO_TEST_PORT
- The port that the test data server should bind to. Default is 9444.
Settings used by DDFS:
- DDFS_ROOT
- The root data directory for DDFS. Default is obtained using os.path.join(DISCO_ROOT, 'ddfs').
- DDFS_PUT_PORT
- The port to use for writing to DDFS nodes. Must be open to the Disco client unless proxying is used. Default is 8990.
- DDFS_PUT_MAX
- The default maximum number of retries for a PUT operation. Default is 3.
- DDFS_GET_MAX
- The default maximum number of retries for a GET operation. Default is 3.
The following settings are used by DDFS to determine the number of replicas for data/metadata to keep (it is not recommended to use the provided defaults in a multinode cluster):
- DDFS_TAG_MIN_REPLICAS
- The minimum number of replicas for a tag operation to succeed. Default is 1.
- DDFS_TAG_REPLICAS
- The number of replicas of tags that DDFS should aspire to keep. Default is 1.
- DDFS_BLOB_REPLICAS
- The number of replicas of blobs that DDFS should aspire to keep. Default is 1.
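Because the settings file is plain Python, the replica counts can be raised for a multinode cluster with simple assignments. The fragment below is a hypothetical example of such a file. The specific values are illustrative assumptions, not recommendations taken from the Disco documentation, and should be chosen to suit the number of nodes available.

```python
# Hypothetical fragment of /etc/disco/settings.py for a small multinode cluster.
# The values below are assumptions chosen for illustration only.
DDFS_TAG_MIN_REPLICAS = 2  # a tag operation succeeds once two replicas are written
DDFS_TAG_REPLICAS = 3      # number of tag replicas DDFS should aspire to keep
DDFS_BLOB_REPLICAS = 3     # number of blob replicas DDFS should aspire to keep
```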