# VRAM Settings

> Configure VRAM requirements and manage GPU memory for LlaSMol models

## Overview

ChemAgent provides a `LOW_VRAM` configuration flag to control whether the LlaSMol model is loaded. The LlaSMol model requires **at least 15GB of VRAM** to run properly.

## Configuration File

The VRAM setting is controlled in:

```python plan_execute_agent/config.py
# Flag to avoid running LLaSmol with <15GB VRAM (MIN REQUIREMENT)
LOW_VRAM = True
```

The LlaSMol model requires a minimum of 15GB VRAM. Systems with less VRAM should keep `LOW_VRAM = True`.

## VRAM Modes

### Low VRAM Mode (Default)

**Configuration:**

```python
LOW_VRAM = True
```

**Behavior:**

* LlaSMol model is NOT loaded (chem\_tools.py:115-119)
* `answer_chemistry_query` tool will raise a RuntimeError if called
* Agent relies entirely on OpenAI GPT-4o for chemistry queries
* Suitable for systems with less than 15GB VRAM

**Error Handling:**

When `LOW_VRAM = True`, attempting to use the chemistry query tool will produce:

```python
RuntimeError: answer_chemistry_query tool cannot be used with LOW_VRAM enabled.
```

The model response is set to:

```
"LlaSmol model unused. Low VRAM enabled."
```

### High VRAM Mode (Cluster/GPU)

**Configuration:**

```python
LOW_VRAM = False
```

**Requirements:**

* Minimum 15GB VRAM
* CUDA-enabled GPU
* PyTorch with CUDA support

**Behavior:**

* LlaSMol model is loaded into GPU memory
* `answer_chemistry_query` tool becomes available
* Model uses `bfloat16` precision for memory efficiency
* Automatic device mapping with `device_map="auto"`

## Implementation Details

### Conditional Loading

The VRAM flag controls model initialization in `plan_execute_agent/chem_tools.py`:

```python chem_tools.py:115-119
# Tool to use LlaSmol to answer prompts related to Chemistry
# Won't initialize with low VRAM
if not LOW_VRAM:
    from LLM4Chem.generation import LlaSMolGeneration
    generator = LlaSMolGeneration("osunlp/LlaSMol-Mistral-7B", device="cuda")
else:
    generator = None
```

### Runtime Checks

The `answer_chemistry_query` tool validates the VRAM mode before running. It records a placeholder response and then raises:

```python chem_tools.py:148-151
if LOW_VRAM:
    llasmol_response.model_response = "LlaSmol model unused. Low VRAM enabled."
    raise RuntimeError(
        "answer_chemistry_query tool cannot be used with LOW_VRAM enabled."
    )
```

## Model Memory Usage

When loaded, the LlaSMol model uses the following optimizations:

### Memory Optimizations

1. **bfloat16 Precision** (model.py:38, 45)

   ```python
   model = AutoModelForCausalLM.from_pretrained(
       base_model,
       torch_dtype=torch.bfloat16,
       device_map="auto",
   )
   ```

2. **PEFT/LoRA Loading** (model.py:42-46)

   ```python
   model = PeftModelForCausalLM.from_pretrained(
       model,
       model_name,
       torch_dtype=torch.bfloat16,
   )
   ```

3. **Model Merging** (model.py:50)

   ```python
   model = model.merge_and_unload()
   ```

4. **Torch Compilation** (model.py:58-59)

   ```python
   if torch.__version__ >= "2" and sys.platform != "win32":
       model = torch.compile(model)
   ```

## Device Selection

The model loader detects the available device automatically:

```python model.py:10-16
def get_device():
    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
    return device
```

Currently, CPU-only inference is not implemented. The model loader raises `NotImplementedError` for CPU devices (model.py:48).

## Configuration for Different Environments

### Local Development (Low VRAM)

```python plan_execute_agent/config.py
LOW_VRAM = True
```

**Suitable for:**

* Laptops with consumer GPUs (less than 15GB VRAM)
* Development machines with limited GPU memory
* Testing agent logic without model inference

### Cluster/Production (High VRAM)

```python plan_execute_agent/config.py
LOW_VRAM = False
```

**Suitable for:**

* NVIDIA A100 (40GB/80GB)
* NVIDIA V100 (16GB/32GB)
* NVIDIA RTX 3090 (24GB)
* Cloud GPU instances with ≥15GB VRAM

## Troubleshooting

### Out of Memory Errors

If you encounter CUDA OOM errors:

1. **Verify VRAM availability:**

   ```bash
   nvidia-smi
   ```

2. **Check total and free memory:**

   ```python
   import torch
   free_bytes, total_bytes = torch.cuda.mem_get_info(0)  # (free, total) in bytes
   print(f"Free VRAM: {free_bytes / 1e9:.2f} GB of {total_bytes / 1e9:.2f} GB total")
   ```

3. **Set LOW\_VRAM = True** if VRAM \< 15GB

### Model Not Loading

If the model fails to load:

```python
# Check CUDA availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
```

## Related Configuration

The same `LOW_VRAM` flag exists in:

* `plan_execute_agent/config.py:2` (active flag)
* `LLM4Chem/config.py:2` (legacy, not actively used)

Only modify the flag in `plan_execute_agent/config.py`. The flag in `LLM4Chem/config.py` is not referenced by the agent.

## Next Steps

* Choose which LlaSMol model to use
* Configure API keys and environment variables
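
## Example: Suggesting a `LOW_VRAM` Value

The troubleshooting checks and the 15GB threshold described in this guide can be combined into a small helper that suggests a value for `LOW_VRAM`. The following is a minimal sketch, not part of ChemAgent: the names `detect_total_vram_bytes`, `recommend_low_vram`, and `MIN_VRAM_GB` are hypothetical, and it assumes that the total memory of GPU 0 is a reasonable proxy for what the model can use. Memory already held by other processes is not accounted for.

```python
from typing import Optional

# Threshold taken from this guide; the constant name itself is hypothetical.
MIN_VRAM_GB = 15.0


def detect_total_vram_bytes() -> Optional[int]:
    """Return the total memory of GPU 0 in bytes, or None if no CUDA GPU is usable."""
    try:
        import torch  # imported lazily so the helper also runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


def recommend_low_vram(total_vram_bytes: Optional[int],
                       min_gb: float = MIN_VRAM_GB) -> bool:
    """Return True when LOW_VRAM should stay enabled for the given GPU memory."""
    if total_vram_bytes is None:
        # No GPU detected: the model cannot be loaded, so keep the flag on.
        return True
    return total_vram_bytes / 1e9 < min_gb


if __name__ == "__main__":
    total = detect_total_vram_bytes()
    print(f"Suggested setting: LOW_VRAM = {recommend_low_vram(total)}")
```

The helper only prints a suggestion; the flag itself still has to be edited by hand in `plan_execute_agent/config.py`.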