Algorithm Configuration

An algorithm is added to a Project via a JSON configuration.

Algorithm configuration

The JSON configuration has the following properties.

Properties

Property | Type | Required
Name | string | Required
Version | string | Required
Owner | string | Optional
DataFormat | enum | Required
TrainWordEmbeddings | boolean | Optional
WordEmbeddingsCache | boolean | Optional
WordEmbeddingsOutput | string | Optional
WordEmbeddingsName | string | Optional
RunOn | enum | Required
Details | object | Required
AlgoFrom | string | Required
AlgoDetails | object | Required
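
As a rough illustration, the properties marked Required in the table above can be checked on the client side before a configuration is submitted. The helper name `missing_required` and the sample values are hypothetical and not part of the product.

```python
# Top-level properties marked Required in the table above.
REQUIRED_TOP_LEVEL = (
    "Name", "Version", "DataFormat", "RunOn",
    "Details", "AlgoFrom", "AlgoDetails",
)


def missing_required(config: dict) -> list:
    """Return the required top-level properties absent from config."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in config]


# A deliberately incomplete configuration, for illustration only.
partial = {"Name": "my-ner-algo", "Version": "1.0.2", "RunOn": "Docker"}
print(missing_required(partial))
```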

Name

The algorithm name

  • is required

  • Type: string

Constraints

minimum length: the minimum number of characters for this string is: 3

Version

Version associated with the current configuration, for example: 1.0.2

  • is required

  • Type: string

Constraints

pattern: the string must match the following regular expression:

^([0-9]+(.[0-9]+)*)$
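
A minimal sketch of checking a version string against this pattern in Python; the helper name `is_valid_version` is hypothetical. The pattern is copied verbatim from above. Note that its `.` is unescaped, so as written it matches any single character rather than only a literal dot, which means a value such as 1a2 also passes.

```python
import re

# Pattern copied verbatim from the Version constraint above.
VERSION_PATTERN = re.compile(r"^([0-9]+(.[0-9]+)*)$")


def is_valid_version(version: str) -> bool:
    """Return True when the whole string matches the documented pattern."""
    return VERSION_PATTERN.fullmatch(version) is not None


for candidate in ("1.0.2", "10", "v1", "1."):
    print(candidate, is_valid_version(candidate))
```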

Owner

Owner of the algorithm

  • is optional

  • Type: string

DataFormat

Format of the data accepted by the algorithm.

  • is required

  • Type: enum

Constraints

enum: the value of this property must be equal to one of the following values:

Value | Explanation
"CoNLL2003" | CoNLL2003 format
"IOB" | IOB format
"IOB2" | IOB2 format
"IOBES" | IOBES format
"BILOU" | BILOU format
"TSV" | TSV format
"JSONL" | JSONL format
"Acharya" | Acharya format

TrainWordEmbeddings

If set, word-embedding training, as configured in config.yaml, runs before the actual training. Word-embedding training trains or updates the vocabulary based on the data. It can also be used to update a pre-trained model such as BERT, which is useful when domain-specific vocabulary training is desired.

  • is optional

  • Type: boolean

WordEmbeddingsCache

Info: Recommended if TrainWordEmbeddings is set.

If set, the output of the word-embedding training is cached, and word-embedding training is re-run only when the data content changes. This reduces overall training time.

  • is optional

  • Type: boolean

WordEmbeddingsOutput

Info: Recommended if TrainWordEmbeddings is set.

Path where the word-embedding output is saved. The contents of this path will be cached and reused for future training until the input data changes.

  • is optional

  • Type: string

WordEmbeddingsName

A name to be associated with the word embedding training

  • is optional

  • Type: string
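
The four word-embedding properties are typically set together. The sketch below builds that fragment of the configuration in Python and prints it as JSON; the output path and the embedding name are placeholder values, and only the property names come from this page.

```python
import json

# Placeholder values; only the property names are taken from this page.
embedding_options = {
    "TrainWordEmbeddings": True,
    "WordEmbeddingsCache": True,  # recommended when TrainWordEmbeddings is set
    "WordEmbeddingsOutput": "/path/to/embeddings/output",
    "WordEmbeddingsName": "domain-embeddings",
}

print(json.dumps(embedding_options, indent=2))
```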

RunOn

Runs the training on the specified infrastructure, such as bare metal, a Docker container, or a virtual machine. Currently supported values: Docker

  • is required

  • Type: enum

Constraints

enum: the value of this property must be equal to one of the following values:

Value | Explanation
"Docker" | Specifies that the algorithm should be trained in the configured Docker container.

Default Value

The default value is:

"Docker"

Details

Detailed configuration of the infrastructure to run the training

  • is required

  • Type: object

More information on each property is given under Details Properties.

Default Value

The default value is:

{
  "Image": false,
  "Port": "7707/tcp",
  "HostIP": "0.0.0.0",
  "Debug": false,
  "DockerHost": "localhost",
  "DockerHostPort": 2375
}
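
This page does not state whether omitted Details properties are filled in from the default automatically. A client-side sketch, under that uncertainty, is to start from the documented default and layer overrides on top. The helper `with_details_defaults` and the host name `gpu-box.internal` are hypothetical.

```python
# Default Details object, copied from this page.
DETAILS_DEFAULTS = {
    "Image": False,
    "Port": "7707/tcp",
    "HostIP": "0.0.0.0",
    "Debug": False,
    "DockerHost": "localhost",
    "DockerHostPort": 2375,
}


def with_details_defaults(overrides: dict) -> dict:
    """Return a new Details dict: documented defaults first, overrides on top."""
    return {**DETAILS_DEFAULTS, **overrides}


# Hypothetical remote Docker host with debug mode enabled.
details = with_details_defaults({"DockerHost": "gpu-box.internal", "Debug": True})
print(details)
```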

AlgoFrom

The source from where the algorithm should be fetched

  • is required

  • Type: string

Constraints

enum: the value of this property must be equal to one of the following values:

Value | Explanation
"Git" | The algorithm is obtained from a Git repository

Default Value

The default value is:

"Git"

AlgoDetails

  • is required

  • Type: object

More information on each property is given under AlgoDetails Properties.

Default Value

The default value is:

{
  "Path": "",
  "Branch": "master",
  "Auth": "None",
  "Debug": false,
  "DockerfilePath": "Dockerfile",
  "ConfigPath": "config.yaml",
  "AlgoOutput": [
    "/path/to/model/output",
    "any/other/path"
  ],
  "Logs": [
    "path/to/model/logs",
    "any/other/logs"
  ]
}
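
Putting the top-level properties together, a complete configuration might look like the sketch below, built in Python and serialized to JSON. The algorithm name, owner, and repository URL are placeholders rather than values from this page; the Details object and the remaining AlgoDetails fields reuse the documented defaults.

```python
import json

config = {
    "Name": "my-ner-algo",            # placeholder, at least 3 characters
    "Version": "1.0.2",
    "Owner": "data-science-team",     # placeholder, optional
    "DataFormat": "IOB2",
    "RunOn": "Docker",
    "Details": {
        "Image": False,
        "Port": "7707/tcp",
        "HostIP": "0.0.0.0",
        "Debug": False,
        "DockerHost": "localhost",
        "DockerHostPort": 2375,
    },
    "AlgoFrom": "Git",
    "AlgoDetails": {
        "Path": "https://example.com/org/my-ner-algo.git",  # placeholder URL
        "Branch": "master",
        "Auth": "None",
        "ConfigPath": "config.yaml",
        "AlgoOutput": ["/path/to/model/output"],
        "Logs": ["path/to/model/logs"],
    },
}

serialized = json.dumps(config, indent=2)
print(serialized)
```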

Details Properties

Property | Type | Required
Image | boolean | Required
ImageName | string | Optional
Port | string | Required
HostIP | string | Optional
Debug | boolean | Optional
Basepath | string | Optional
DockerHost | string | Required
DockerHostPort | number | Required
Runtime | string | Optional

Image

Specifies whether the container is run from an existing Docker image.

  • is required

  • Type: boolean

ImageName

If Image is true, the name of the image to run.

  • is optional

  • Type: string

Port

Port to be used for the training service. Default: 7707/tcp

  • is required

  • Type: string

Constraints

pattern: the string must match the following regular expression:

^[0-9]+/(tcp|udp)$
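
A small sketch of validating a Port value and splitting it into port number and protocol. The helper name `parse_port` is hypothetical, and the pattern is the one above with a capturing group added around the digits so the number can be extracted.

```python
import re

# Same pattern as above, with a capture group added around the port number.
PORT_PATTERN = re.compile(r"^([0-9]+)/(tcp|udp)$")


def parse_port(value: str) -> tuple:
    """Split a value such as "7707/tcp" into (port_number, protocol)."""
    match = PORT_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"Port must look like '<number>/tcp' or '<number>/udp': {value!r}")
    return int(match.group(1)), match.group(2)


print(parse_port("7707/tcp"))
```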

Default Value

The default value is:

"7707/tcp"

HostIP

IP address on the host machine on which the container should listen.

  • is optional

  • Type: string

Constraints

hostname: the string must be a hostname, according to RFC 1123, section 2.1

Default Value

The default value is:

"0.0.0.0"

Debug

Set true to enable debug mode

  • is optional

  • Type: boolean

Basepath

Set the basepath to the path where the scripts are to be copied and run

  • is optional

  • Type: string

DockerHost

The IP address or hostname of the Docker server to connect to. The Docker server can also be remote, for example over a VPN, provided that network connectivity is available.

  • is required

  • Type: string

Constraints

hostname: the string must be a hostname, according to RFC 1123, section 2.1

Default Value

The default value is:

"localhost"

DockerHostPort

The port number of the Docker server to be connected.

  • is required

  • Type: number

Constraints

maximum: the value of this number must be smaller than or equal to: 65535

minimum: the value of this number must be greater than or equal to: 0

Default Value

The default value is:

2375
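
The range constraint above can be mirrored in a short client-side check. This is only a sketch; the helper name `check_docker_host_port` is hypothetical. Because the property type is number rather than integer, the sketch accepts both ints and floats.

```python
def check_docker_host_port(port):
    """Return port unchanged if it is a number within 0..65535, else raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, (int, float)):
        raise TypeError(f"DockerHostPort must be a number, got {type(port).__name__}")
    if not 0 <= port <= 65535:
        raise ValueError(f"DockerHostPort must be between 0 and 65535, got {port}")
    return port


print(check_docker_host_port(2375))
```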

Runtime

Configures the Docker runtime. For example, use nvidia to run Docker with CUDA-compatible GPU cards.

  • is optional

  • Type: string

Constraints

enum: the value of this property must be equal to one of the following values:

Value | Explanation
"nvidia" | Use the nvidia runtime for CUDA-compatible GPU cards

AlgoDetails Properties

Property | Type | Required
Path | string | Required
Branch | string | Required
Auth | string | Required
DockerFilePath | string | Optional
ConfigPath | string | Required
Username | string | Optional
Credential | string | Optional
AlgoOutput | array | Required
RestoreOutput | boolean | Optional
Logs | array | Optional

Path

The git url or path from where the algorithm can be fetched.

  • is required

  • Type: string

Branch

The name of the branch to be used to fetch the algorithm.

  • is required

  • Type: string

Default Value

The default value is:

"master"

Auth

The Authentication mechanism to use.

  • is required

  • Type: string

Constraints

enum: the value of this property must be equal to one of the following values:

Value | Explanation
"None" | No authentication
"http" | Authentication over HTTP
"ssh" | Authentication over SSH

Default Value

The default value is:

"None"

DockerFilePath

The path to Dockerfile inside the repository.

  • is optional

  • Type: string

Default Value

The default value is:

"Dockerfile"

ConfigPath

The path to config.yaml inside the repository.

  • is required

  • Type: string

Default Value

The default value is:

"config.yaml"

Username

The username to authenticate with when Auth is not "None".

  • is optional

  • Type: string

Credential

The password or auth key used to authenticate.

  • is optional

  • Type: string
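
Username and Credential are both marked optional, but their descriptions suggest they are needed whenever Auth is not "None". The sketch below encodes that reading as a client-side consistency check. Treating both fields as expected for "http" and "ssh" is an assumption, not a rule stated on this page, and the helper name `check_auth` is hypothetical.

```python
# Allowed Auth values, copied from the Auth constraint on this page.
AUTH_VALUES = ("None", "http", "ssh")


def check_auth(algo_details: dict) -> list:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    auth = algo_details.get("Auth", "None")
    if auth not in AUTH_VALUES:
        problems.append(f"Auth must be one of {AUTH_VALUES}, got {auth!r}")
    elif auth != "None":
        # Assumption: both fields are needed once authentication is enabled.
        for key in ("Username", "Credential"):
            if not algo_details.get(key):
                problems.append(f"{key} is expected when Auth is {auth!r}")
    return problems


print(check_auth({"Auth": "http", "Username": "ci-bot"}))
```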

AlgoOutput

All paths that need to be saved as the output of the trained model.

  • is required

  • Type: array

RestoreOutput

If set, previously trained models are restored in new training runs. This helps with incremental training.

  • is optional

  • Type: boolean

Default Value

The default value is:

true

Logs

All paths that contain log details about the training or evaluation.

  • is optional

  • Type: array