Job configuration reference
Fields you set when you create a GPU Container Job or a Managed Inference Job.
Set these fields in the web interface or with cosmicac jobs create. The job type determines which fields apply. In non-interactive mode, pass each field using the flag shown in the CLI flag column. For the full create flow, see Create a GPU Container Job and Create a Managed Inference Job.
Common fields
These fields apply to every job type.
| Field | Required | CLI flag | Description |
|---|---|---|---|
| Job type | Yes | --type | The kind of job to create: GPU Container or Managed Inference. |
| Job name | Yes | --name | A name to identify the job. |
| Tags | Yes | --tags | One or more labels for the job. The CLI accepts a comma-separated list. |
| Location | Yes | --location | Where the job runs, for example IN. The CLI lists the locations your racks report. |
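
As a rough illustration of how the common fields map onto flags, the following Python sketch assembles the argument list for a non-interactive cosmicac jobs create call. The helper name common_args is hypothetical. This reference does not list the literal values that --type accepts, so the sketch passes the caller's value through unchanged.

```python
def common_args(job_type, name, tags, location):
    """Build the common flags for `cosmicac jobs create` (hypothetical helper)."""
    if not tags:
        # Tags are marked as required in the table above.
        raise ValueError("at least one tag is required")
    return [
        "cosmicac", "jobs", "create",
        "--type", job_type,        # accepted literals are not listed in this reference
        "--name", name,
        "--tags", ",".join(tags),  # the CLI accepts a comma-separated list
        "--location", location,    # for example "IN"
    ]
```

The returned list can be handed to subprocess.run once the remaining job-specific flags have been appended.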
GPU configuration
These fields select the job's hardware.
| Field | Required | CLI flag | Description |
|---|---|---|---|
| GPU | Yes | --gpu-type | The GPU to use, for example GH100_H100_SXM5_80GB. The CLI lists the GPU types your racks report. |
| GPU count | Yes | --gpu-count | Number of GPUs. |
| CUDA / driver | Yes | --driver | GPU driver version. Only CUDA 12.9 is supported. |
To set the GPU type and count with a single flag, use --gpu TYPE=COUNT, for example --gpu H100=2. This flag takes the place of --gpu-type and --gpu-count.
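
The TYPE=COUNT form can be produced and checked with a few lines of Python. This is a minimal sketch: the function names gpu_flag and parse_gpu_flag are hypothetical, and the rule that the count must be a positive integer is an assumption, not something this reference states.

```python
def gpu_flag(gpu_type, count):
    """Format the combined --gpu flag (hypothetical helper)."""
    if count < 1:
        # Assumption: a job needs at least one GPU.
        raise ValueError("GPU count must be at least 1")
    return ["--gpu", f"{gpu_type}={count}"]


def parse_gpu_flag(value):
    """Split a TYPE=COUNT string into (type, count)."""
    gpu_type, sep, count = value.rpartition("=")
    if not sep or not gpu_type or not count.isdigit() or int(count) < 1:
        raise ValueError(f"expected TYPE=COUNT, got {value!r}")
    return gpu_type, int(count)
```

For example, gpu_flag("H100", 2) yields the two arguments --gpu and H100=2, and parse_gpu_flag("H100=2") recovers the type and count.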
GPU Container parameters
These fields apply to a GPU Container Job.
| Field | Required | CLI flag | Description |
|---|---|---|---|
| Base OS image | Yes | --base-image | Base OS image for the container. Only Ubuntu22.04/CUDA12.9 is supported. |
| Disk (GB) | Yes | --root-disk-size-gb | Root disk size in GB. One of 250, 500, or 1000. |
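
Because the disk size is restricted to three values, a script can reject a bad value before it calls the CLI. The sketch below assumes that the literal Ubuntu22.04/CUDA12.9 is what --base-image expects. The names container_args and ALLOWED_ROOT_DISK_GB are hypothetical.

```python
ALLOWED_ROOT_DISK_GB = (250, 500, 1000)  # sizes listed in the table above


def container_args(root_disk_size_gb, base_image="Ubuntu22.04/CUDA12.9"):
    """Build the GPU Container flags (hypothetical helper)."""
    if root_disk_size_gb not in ALLOWED_ROOT_DISK_GB:
        raise ValueError(
            f"root disk size must be one of {ALLOWED_ROOT_DISK_GB}, "
            f"got {root_disk_size_gb}"
        )
    return [
        "--base-image", base_image,  # assumed to match the CLI's expected literal
        "--root-disk-size-gb", str(root_disk_size_gb),
    ]
```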
Managed Inference (vLLM) parameters
These fields apply to a vLLM Managed Inference Job.
| Field | Required | CLI flag | Description |
|---|---|---|---|
| Model | Yes | --model | Hugging Face model ID to serve, for example Qwen/Qwen3-32B. |
| Runtime image (CUDA) | Yes | --runtime-image | Serving runtime image, for example vllm-openai-0.8.5. |
| Data type | Yes | --data-type | Numeric precision the model runs at, such as BF16 or Auto. |
| Quantisation | Yes | --quantisation | Quantization scheme, such as FP8 or INT8. |
| Tensor parallel | Yes | --tensor-parallel | Number of GPUs to split the model across. |
| GPU memory utilization | Yes | --gpu-memory-utilization | Fraction of GPU memory to use, between 0 and 1. |
| Max concurrent sequences | Yes | --max-concurrent-sequences | Maximum requests handled at once. |
| Max model length | Yes | --max-model-length | Maximum model context length. |
| Reasoning parser | Yes | --reasoning-parser | Parser for the model's reasoning output. |
| Video & image input | Yes | --multimodal | Whether the model accepts multimodal input. true or false. |
| Root disk size | Yes | --root-disk-size-gb | VM root disk size in GB. One of 250, 500, or 1000. |
| Environment variables | No | --env | Environment variables passed to the inference service. |
| Endpoint name | Yes | --endpoint-name | Name of the inference endpoint. Must be unique across active inference jobs. |
| Replicas | Yes | --replica | Number of endpoint replicas. |
| Require Authorization header | Yes | --require-auth-header / --no-auth-header | Whether callers must send an authorization header. true or false. |
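
A few of the vLLM fields have constraints that can be checked before submission. The following sketch covers only a subset of the required flags and makes two assumptions: "between 0 and 1" is read as greater than 0 and at most 1, and the paired --require-auth-header / --no-auth-header flags are treated as on/off switches. The helper name vllm_args is hypothetical.

```python
def vllm_args(model, endpoint_name, gpu_memory_utilization, require_auth=True):
    """Build a subset of the vLLM Managed Inference flags (hypothetical helper)."""
    # Assumption: the valid range is 0 < value <= 1.
    if not 0 < gpu_memory_utilization <= 1:
        raise ValueError("GPU memory utilization must be between 0 and 1")
    return [
        "--model", model,
        "--endpoint-name", endpoint_name,
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        # Assumption: these two flags are switches that take no value.
        "--require-auth-header" if require_auth else "--no-auth-header",
    ]
```

The remaining required fields in the table, such as --runtime-image and --tensor-parallel, would be appended in the same way.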
Managed Inference (Parakeet) parameters
These fields apply to a Parakeet Managed Inference Job.
| Field | Required | CLI flag | Description |
|---|---|---|---|
| Model | Yes | --model | Parakeet model to serve, nvidia/parakeet-tdt-0.6b-v3. |
| Endpoint name | Yes | --endpoint-name | Name of the transcription endpoint. |
| Chunk duration | Yes | --chunk-duration | Audio chunk length in seconds (60). |
| Chunk overlap | Yes | --chunk-overlap | Overlap between chunks in seconds (10). |
| Max file size (MB) | Yes | --max-file-size-mb | Maximum upload size in MB (2048). |
| Require Authorization header | Yes | --require-auth-header / --no-auth-header | Whether callers must send an authorization header. true or false. |
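
To see how chunk duration and chunk overlap interact, the sketch below computes the time windows for a recording of a given length. It assumes each new chunk starts one stride (duration minus overlap) after the previous one. This is a common reading of these two parameters, not a confirmed description of how the Parakeet service splits audio. The function name chunk_windows is hypothetical.

```python
def chunk_windows(total_seconds, chunk_duration=60, chunk_overlap=10):
    """Return (start, end) windows in seconds under an assumed fixed-stride scheme."""
    if chunk_duration <= 0 or not 0 <= chunk_overlap < chunk_duration:
        raise ValueError("overlap must be non-negative and smaller than the duration")
    stride = chunk_duration - chunk_overlap  # distance between chunk starts
    windows = []
    start = 0
    while start < total_seconds:
        end = min(start + chunk_duration, total_seconds)
        windows.append((start, end))
        if end == total_seconds:
            break  # the last window reaches the end of the audio
        start += stride
    return windows
```

With a duration of 60 and an overlap of 10, a 130-second recording yields the windows (0, 60), (50, 110), and (100, 130) under this assumption.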