# VRAM Settings

> Configure VRAM requirements and manage GPU memory for LlaSMol models

## Overview

ChemAgent provides a `LOW_VRAM` configuration flag to control whether the LlaSMol model is loaded. The LlaSMol model requires **at least 15GB of VRAM** to run properly.

## Configuration File

The VRAM setting is controlled in:

```python plan_execute_agent/config.py theme={null}
# Flag to avoid running LLaSmol with <15GB VRAM (MIN REQUIREMENT)
LOW_VRAM = True
```

<Warning>
  The LlaSMol model requires a minimum of 15GB VRAM. Systems with less VRAM should keep `LOW_VRAM = True`.
</Warning>
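
One way to choose the flag value is to compare the GPU's total memory against the 15GB minimum. The sketch below is not part of ChemAgent: `recommend_low_vram`, `detect_total_vram_bytes`, and `MIN_VRAM_GB` are hypothetical names, and it assumes the decimal gigabyte convention (1e9 bytes).

```python
# Hypothetical helpers, not part of ChemAgent: suggest a value for LOW_VRAM.
MIN_VRAM_GB = 15  # minimum stated on this page


def recommend_low_vram(total_vram_bytes):
    """Return True (keep LOW_VRAM enabled) when VRAM is unknown or below the minimum."""
    if total_vram_bytes is None:  # no CUDA device detected
        return True
    return total_vram_bytes / 1e9 < MIN_VRAM_GB


def detect_total_vram_bytes():
    """Return the total memory of GPU 0 in bytes, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    print(f"Suggested LOW_VRAM = {recommend_low_vram(detect_total_vram_bytes())}")
```

The result is only a suggestion: total memory says nothing about how much is already in use by other processes.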

## VRAM Modes

### Low VRAM Mode (Default)

**Configuration:**

```python theme={null}
LOW_VRAM = True
```

**Behavior:**

* LlaSMol model is not loaded (`chem_tools.py:115-119`)
* `answer_chemistry_query` tool will raise a RuntimeError if called
* Agent relies entirely on OpenAI GPT-4o for chemistry queries
* Suitable for systems with less than 15GB VRAM

**Error Handling:**

When `LOW_VRAM = True`, attempting to use the chemistry query tool will produce:

```python theme={null}
RuntimeError: answer_chemistry_query tool cannot be used with LOW_VRAM enabled.
```

Before the error is raised, the model response is set to:

```
"LlaSmol model unused. Low VRAM enabled."
```
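
Code that calls the tool may want to catch this error instead of letting it propagate. The following is a minimal sketch of that pattern, not the agent's actual control flow: `answer_chemistry_query_stub`, `query_with_fallback`, and `describe_fallback` are hypothetical stand-ins so that the example runs on its own.

```python
# Stand-in for the real tool (assumption: the real tool raises RuntimeError
# with this message when LOW_VRAM is enabled, as described above).
def answer_chemistry_query_stub(query):
    raise RuntimeError(
        "answer_chemistry_query tool cannot be used with LOW_VRAM enabled."
    )


def query_with_fallback(query, tool, fallback):
    """Try the LlaSMol-backed tool; call `fallback` if it raises RuntimeError."""
    try:
        return tool(query)
    except RuntimeError as err:
        return fallback(query, reason=str(err))


def describe_fallback(query, reason):
    # Hypothetical fallback: a real agent would route the query elsewhere here.
    return f"[fallback] {query} (tool unavailable: {reason})"


result = query_with_fallback(
    "What is the SMILES of ethanol?",
    answer_chemistry_query_stub,
    describe_fallback,
)
print(result)
```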

### High VRAM Mode (Cluster/GPU)

**Configuration:**

```python theme={null}
LOW_VRAM = False
```

**Requirements:**

* Minimum 15GB VRAM
* CUDA-enabled GPU
* PyTorch with CUDA support

**Behavior:**

* LlaSMol model is loaded into GPU memory
* `answer_chemistry_query` tool becomes available
* Model uses `bfloat16` precision for memory efficiency
* Automatic device mapping with `device_map="auto"`
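
As a rough illustration of why `bfloat16` matters, weight memory is approximately the parameter count times the bytes per parameter. The sketch below assumes roughly 7 billion parameters, inferred from the `7B` in the model name rather than measured, and it ignores activations, the KV cache, and framework overhead.

```python
def estimate_weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for model weights only, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


ASSUMED_PARAMS = 7e9  # assumption based on the "7B" in the model name

bf16_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, 2)  # bfloat16: 2 bytes each
fp32_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, 4)  # float32: 4 bytes each
print(f"bfloat16 weights: ~{bf16_gb:.0f} GB, float32 weights: ~{fp32_gb:.0f} GB")
```

Under these assumptions the `bfloat16` weights alone come to about 14 GB, which is in line with the 15GB minimum stated above, while `float32` weights would need roughly twice that.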

## Implementation Details

### Conditional Loading

The VRAM flag controls model initialization in `plan_execute_agent/chem_tools.py`:

```python chem_tools.py:115-119 theme={null}
# Tool to use LlaSmol to answer prompts related to Chemistry
# Won't initialize with low VRAM
if not LOW_VRAM:
    from LLM4Chem.generation import LlaSMolGeneration
    generator = LlaSMolGeneration("osunlp/LlaSMol-Mistral-7B", device="cuda")
else:
    generator = None
```

### Runtime Checks

The `answer_chemistry_query` tool validates VRAM mode:

```python chem_tools.py:148-151 theme={null}
if LOW_VRAM:
    llasmol_response.model_response = "LlaSmol model unused. Low VRAM enabled."
    raise RuntimeError(
        "answer_chemistry_query tool cannot be used with LOW_VRAM enabled."
    )
```

## Model Memory Usage

When `LOW_VRAM = False`, the model loader applies the following steps while placing LlaSMol on the GPU.

### Memory Optimizations

1. **bfloat16 Precision** (model.py:38, 45)
   ```python theme={null}
   model = AutoModelForCausalLM.from_pretrained(
       base_model,
       torch_dtype=torch.bfloat16,
       device_map="auto",
   )
   ```

2. **PEFT/LoRA Loading** (model.py:42-46)
   ```python theme={null}
   model = PeftModelForCausalLM.from_pretrained(
       model,
       model_name,
       torch_dtype=torch.bfloat16,
   )
   ```

3. **Model Merging** (model.py:50)
   ```python theme={null}
   model = model.merge_and_unload()
   ```

4. **Torch Compilation** (model.py:58-59)
   ```python theme={null}
   if torch.__version__ >= "2" and sys.platform != "win32":
       model = torch.compile(model)
   ```
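
The version check in step 4 compares strings, and string comparison is lexicographic, so a hypothetical major version such as `10.0.0` would compare as less than `"2"`. A minimal sketch of a numeric check follows; `is_torch2_or_newer` is a hypothetical helper and is not taken from `model.py`.

```python
def is_torch2_or_newer(version):
    """Compare the major version numerically, e.g. '2.1.0+cu118' -> 2."""
    major = int(version.split(".")[0])
    return major >= 2


# Print the numeric result next to the plain string comparison for contrast.
for v in ("1.13.1", "2.1.0+cu118", "10.0.0"):
    print(v, is_torch2_or_newer(v), v >= "2")
```

For the versions that exist today the two checks agree; they differ only for the illustrative `10.0.0` case.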

## Device Selection

The model loader selects `cuda` when a CUDA device is available and otherwise falls back to `cpu`:

```python model.py:10-16 theme={null}
def get_device():
    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
    return device
```

<Note>
  Currently, CPU-only inference is not implemented. The model loader raises `NotImplementedError` for CPU devices (model.py:48).
</Note>
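
Because the CPU path is not implemented, it can help to fail early with a clear message before any model download starts. The sketch below is one way to do that under stated assumptions: `check_device_for_llasmol` is a hypothetical helper, and it takes the device string and flag as arguments so that it runs without PyTorch installed.

```python
def check_device_for_llasmol(device, low_vram):
    """Raise early if LlaSMol would be loaded on a machine without CUDA."""
    if low_vram:
        return "skipped"  # the model is never loaded, so any device is fine
    if device != "cuda":
        raise NotImplementedError(
            "LOW_VRAM is False but no CUDA device was found; "
            "CPU-only inference is not implemented."
        )
    return "ok"


print(check_device_for_llasmol("cpu", low_vram=True))
print(check_device_for_llasmol("cuda", low_vram=False))
```

In practice the `device` argument could come from `get_device()` and `low_vram` from `plan_execute_agent/config.py`.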

## Configuration for Different Environments

### Local Development (Low VRAM)

```python plan_execute_agent/config.py theme={null}
LOW_VRAM = True
```

**Suitable for:**

* Laptops with consumer GPUs (less than 15GB VRAM)
* Development machines with limited GPU memory
* Testing agent logic without model inference

### Cluster/Production (High VRAM)

```python plan_execute_agent/config.py theme={null}
LOW_VRAM = False
```

**Suitable for:**

* NVIDIA A100 (40GB/80GB)
* NVIDIA V100 (16GB/32GB)
* NVIDIA RTX 3090 (24GB)
* Cloud GPU instances with ≥15GB VRAM

## Troubleshooting

### Out of Memory Errors

If you encounter CUDA OOM errors:

1. **Verify VRAM availability:**
   ```bash theme={null}
   nvidia-smi
   ```

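2. **Check total and free memory:**
   ```python theme={null}
   import torch
   # mem_get_info() returns (free, total) in bytes for the given device
   free_bytes, total_bytes = torch.cuda.mem_get_info(0)
   print(f"Total VRAM: {total_bytes / 1e9:.2f} GB, free: {free_bytes / 1e9:.2f} GB")
   ```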

3. **Set LOW\_VRAM = True** if VRAM \< 15GB
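
On multi-GPU machines, the checks above can be scripted by parsing the output of `nvidia-smi --query-gpu=index,memory.total,memory.used --format=csv,noheader,nounits`, which reports memory in MiB. The sketch below parses such output; `gpus_with_enough_free_memory` is a hypothetical helper, and `SAMPLE_OUTPUT` is made-up data, not output from a real machine.

```python
MIN_VRAM_GB = 15  # minimum stated on this page


def gpus_with_enough_free_memory(csv_text, min_gb=MIN_VRAM_GB):
    """Return indices of GPUs whose free memory meets the minimum.

    Expects one line per GPU in the form 'index, total_mib, used_mib'.
    """
    suitable = []
    for line in csv_text.strip().splitlines():
        index, total_mib, used_mib = (field.strip() for field in line.split(","))
        free_bytes = (int(total_mib) - int(used_mib)) * 1024 * 1024
        if free_bytes / 1e9 >= min_gb:
            suitable.append(int(index))
    return suitable


# Made-up sample: GPU 0 is mostly in use, GPU 1 is nearly idle, GPU 2 is small.
SAMPLE_OUTPUT = """\
0, 24576, 20000
1, 24576, 512
2, 8192, 0
"""
print(gpus_with_enough_free_memory(SAMPLE_OUTPUT))
```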

### Model Not Loading

If the model fails to load:

```python theme={null}
# Check CUDA availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
```

## Related Configuration

The `LOW_VRAM` flag is defined in two files:

* `plan_execute_agent/config.py:2` (active flag)
* `LLM4Chem/config.py:2` (legacy, not actively used)

<Warning>
  Only modify the flag in `plan_execute_agent/config.py`. The flag in `LLM4Chem/config.py` is not referenced by the agent.
</Warning>

## Next Steps

<CardGroup cols={2}>
  <Card title="Model Selection" icon="robot" href="/configuration/model-selection">
    Choose which LlaSMol model to use
  </Card>

  <Card title="Environment Setup" icon="key" href="/configuration/environment">
    Configure API keys and environment variables
  </Card>
</CardGroup>
