
Overview

This guide covers the full ChemAgent installation: environment setup, dependency installation, and GPU/CUDA configuration.
Installation time: ~15-30 minutes, depending on your internet connection and hardware.

System Requirements

Minimum Requirements

  • Python: 3.8 or higher
  • OS: Linux, macOS, or Windows (WSL recommended for Windows)
  • RAM: 8GB minimum
  • Storage: ~10GB for dependencies and models

For LOW_VRAM Mode

  • VRAM: Any GPU or CPU-only
  • CUDA: Not required
  • Models: GPT-4o only (via API)
  • Best for: Quick prototyping, limited hardware

For Full Model Support

  • VRAM: ≥15GB recommended
  • CUDA: 11.0 or higher
  • Models: GPT-4o + LlaSMol-Mistral-7B
  • Best for: Production, offline inference
The LlaSMol-Mistral-7B model requires at least 15GB VRAM. If you have less, keep LOW_VRAM=True in the configuration.
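The Python and storage requirements above can be checked before installing anything. The following is a minimal sketch using only the standard library. The function names are hypothetical (not part of ChemAgent), and checking free space in the current directory is an assumption about where dependencies and models will be stored. RAM is not checked because the standard library has no portable way to read it.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)   # "Python: 3.8 or higher" from the list above
MIN_FREE_GB = 10      # "~10GB for dependencies and models"


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


def free_disk_gb(path="."):
    """Free space at `path` in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def preflight(path="."):
    """Collect human-readable problems; an empty list means both checks passed."""
    problems = []
    if not python_ok():
        problems.append("Python %d.%d or higher is required" % MIN_PYTHON)
    if free_disk_gb(path) < MIN_FREE_GB:
        problems.append("less than %d GB free at %s" % (MIN_FREE_GB, path))
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```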

Installation Steps

1. Clone the repository

git clone <repo_url>
cd <repo_folder>
Replace <repo_url> and <repo_folder> with your actual repository URL and folder name.
2. Set up environment variables

Create your .env file from the example template:
cp .env.example .env
Then edit .env and add your OpenAI API key:
.env
OPENAI_API_KEY=sk-your-api-key-here
Never commit your .env file to version control! It contains sensitive API keys.
To get an OpenAI API key:
  1. Go to OpenAI Platform
  2. Sign in or create an account
  3. Navigate to API Keys section
  4. Create a new secret key
  5. Copy and paste it into your .env file
Make sure you have credits available in your OpenAI account.
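A malformed key line is easy to miss, so it can help to check the .env file before running the agent. The sketch below is an illustration only: `read_env_value` and `key_problems` are hypothetical helpers, the parser handles only simple `NAME=value` lines, and the `sk-` prefix check is an assumption taken from the example key shown above.

```python
from pathlib import Path


def read_env_value(text, name):
    """Return the raw value assigned to `name` in .env-style text, or None."""
    for line in text.splitlines():
        if line.lstrip().startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value
    return None


def key_problems(value):
    """List formatting problems for an OPENAI_API_KEY value."""
    if value is None:
        return ["OPENAI_API_KEY is not set"]
    problems = []
    if value != value.strip():
        problems.append("spaces around the value")
    core = value.strip()
    if not core:
        problems.append("value is empty")
    elif core[0] in "\"'" or core[-1] in "\"'":
        problems.append("quotes around the value")
    elif not core.startswith("sk-"):  # assumption based on the example key
        problems.append("value does not start with 'sk-'")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    for problem in key_problems(read_env_value(text, "OPENAI_API_KEY")):
        print("WARNING:", problem)
```

This only checks formatting. It does not contact OpenAI, so it cannot tell whether the key is active or has credits.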
3. Install agent dependencies

Install the core agent dependencies:
pip install -r agent_requirements.txt
This installs:
  • python-dotenv - Environment variable management
  • partialsmiles - SMILES parsing utilities
  • langgraph==0.2.55 - Agent orchestration framework
  • langchain-community - LangChain community integrations
  • langchain-openai - OpenAI integration for LangChain
  • rouge-score - Evaluation metrics
  • spacy==3.8.2 - NLP for chemistry term extraction
This step is required for all installations, regardless of VRAM settings.
4. Install combined requirements

Install additional chemistry and ML dependencies:
pip install -r comb_requirements.txt
This is a comprehensive requirements file (~120 packages) including:
  • RDKit - Core chemistry library for molecular operations
  • PyTorch - Deep learning framework
  • Transformers - Hugging Face transformers for LLMs
  • LangChain - LLM application framework
  • OpenAI SDK - API client for GPT models
  • PubChemPy - PubChem API wrapper for RAG
  • Gradio - Optional UI components
  • PEFT - Parameter-efficient fine-tuning
  • And many more scientific computing libraries
This installation may take 10-20 minutes depending on your connection speed.
5. Download spaCy language model

The agent uses spaCy for NLP tasks. Download the English language model:
python -m spacy download en_core_web_sm
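To confirm the download worked, note that spaCy models installed this way are regular Python packages, so the import system can find them. A minimal sketch follows; `is_installed` is a hypothetical helper, and it only checks that the package can be located, not that it loads correctly.

```python
import importlib.util


def is_installed(module_name):
    """True if the import system can locate `module_name` (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("spacy", "en_core_web_sm"):
        print(name, "OK" if is_installed(name) else "MISSING")
```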
6. Configure VRAM settings

Edit plan_execute_agent/config.py to match your hardware:
plan_execute_agent/config.py
# Flag to avoid running LlaSMol with <15GB VRAM (minimum requirement)
# Keep True for systems with <15GB VRAM or CPU-only; set to False if you have ≥15GB VRAM
LOW_VRAM = True
If you set LOW_VRAM=False without sufficient VRAM, you may encounter CUDA out-of-memory errors.
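If you are unsure which value to use, the total memory of the first CUDA device can be read through PyTorch and compared with the 15GB threshold. The following is a sketch under stated assumptions: the helper names are hypothetical, memory is reported in decimal GB (so a card close to the threshold deserves a manual check), and only device 0 is inspected.

```python
VRAM_THRESHOLD_GB = 15  # threshold stated in this guide for LlaSMol-Mistral-7B


def recommend_low_vram(total_vram_gb, threshold_gb=VRAM_THRESHOLD_GB):
    """Suggested LOW_VRAM value; None means no usable GPU was detected."""
    return total_vram_gb is None or total_vram_gb < threshold_gb


def detect_vram_gb():
    """Total memory of CUDA device 0 in GB, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


if __name__ == "__main__":
    vram = detect_vram_gb()
    print("Detected VRAM (GB):", vram)
    print("Suggested setting: LOW_VRAM =", recommend_low_vram(vram))
```

The script only prints a suggestion. Editing plan_execute_agent/config.py is still a manual step.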
7. Verify installation

Test your installation with a simple query:
python -m plan_execute_agent.rdkit_agent --query "Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?"
You should see output similar to:
Running the main function
Structuring tool call!
Input: Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?
Output: Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?
Result: [SMILES output]
Completed: True
Attempts: 1
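For a scripted smoke test, the "Completed" and "Attempts" lines can be read from captured output. The sketch below assumes the command prints `Key: value` lines in the form shown above; `parse_run_summary` is a hypothetical helper, and the real output format may differ between versions.

```python
def parse_run_summary(output):
    """Extract the 'Completed' and 'Attempts' fields from captured output."""
    summary = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # line has no "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Completed":
            summary["completed"] = (value == "True")
        elif key == "Attempts":
            summary["attempts"] = int(value)
    return summary


sample = """Running the main function
Structuring tool call!
Result: [SMILES output]
Completed: True
Attempts: 1
"""
print(parse_run_summary(sample))
```

The text to parse could be captured with `subprocess.run(..., capture_output=True, text=True).stdout` around the verification command.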

GPU and CUDA Setup

Checking Your GPU

Verify your GPU is available:
# Check NVIDIA GPU
nvidia-smi

# Check PyTorch CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"

Installing CUDA (if needed)

# For Ubuntu/Debian
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda

# Add to PATH
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

PyTorch with CUDA

If PyTorch doesn’t detect CUDA, reinstall with CUDA support:
# For CUDA 11.8
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.x drivers: torch 2.0.0 has no cu121 wheels (official builds target CUDA 11.7 and 11.8).
# Use the cu118 command above; the wheel ships its own CUDA runtime and runs on newer drivers.
comb_requirements.txt pins torch==2.0.0, so choose a wheel index that provides that version.

VRAM Configuration Details

LOW_VRAM = True (Default)

from plan_execute_agent.config import LOW_VRAM

if not LOW_VRAM:
    from LLM4Chem.generation import LlaSMolGeneration
    generator = LlaSMolGeneration("osunlp/LlaSMol-Mistral-7B", device="cuda")
else:
    generator = None  # Model not loaded
Behavior:
  • LlaSMol model is not loaded into memory
  • All queries processed using GPT-4o via OpenAI API
  • Minimal VRAM usage (~2-4GB for basic operations)
  • Faster startup time
  • Requires OpenAI API credits

LOW_VRAM = False (Cluster/High-VRAM)

Behavior:
  • LlaSMol-Mistral-7B model loaded from Hugging Face Hub
  • First run downloads ~14GB model weights
  • Model loaded onto CUDA device
  • Local inference for chemistry tasks
  • Requires ≥15GB VRAM
  • No additional API costs for chemistry queries
On first run with LOW_VRAM=False, the model will download from Hugging Face. This may take 10-30 minutes depending on your connection.
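The ~14GB download and the ≥15GB VRAM requirement are consistent with a back-of-the-envelope estimate of the weight size. The calculation below is a rough sketch, not a measurement: it assumes about 7 billion parameters stored as 16-bit floats, which this guide does not state explicitly, and it ignores activations and other runtime overhead.

```python
def weights_size_gb(num_params, bytes_per_param):
    """Approximate size of model weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumptions: ~7 billion parameters; 2 bytes each (16-bit) or 4 bytes each (32-bit).
fp16_gb = weights_size_gb(7e9, 2)
fp32_gb = weights_size_gb(7e9, 4)
print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"32-bit weights: ~{fp32_gb:.0f} GB")
```

Under these assumptions the 16-bit weights alone take about 14 GB, which leaves little headroom on a 15GB card.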

Dependency Files Explained

agent_requirements.txt

Purpose: Core agent dependencies for LangGraph and LangChain orchestration.
python-dotenv==0.19.1    # Environment variables
partialsmiles==2.0       # SMILES parsing
langgraph==0.2.55        # Agent framework (specific version required)
langchain-community==0.2.19
langchain-openai==0.1.25 # OpenAI integration
rouge-score==0.1.2       # Evaluation metrics
spacy==3.8.2             # NLP for term extraction
Important: LangGraph version 0.2.55 is required. Newer versions may have breaking changes.
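Because exact versions matter here, it can be useful to compare the pins against what is actually installed. The following is a minimal sketch using `importlib.metadata` from the standard library. `parse_pins` and `find_mismatches` are hypothetical helpers, only `name==version` lines are understood, and versions are compared as plain strings.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for line in requirements_text.splitlines():
        spec = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in spec:
            name, version = spec.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, get_version=metadata.version):
    """Return {name: (wanted, installed_or_None)} for unsatisfied pins."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


sample = """\
langgraph==0.2.55        # Agent framework (specific version required)
spacy==3.8.2             # NLP for term extraction
"""
for name, (wanted, installed) in find_mismatches(parse_pins(sample)).items():
    print(f"{name}: want {wanted}, found {installed}")
```

To check the whole file, pass the contents of agent_requirements.txt to `parse_pins` instead of the inline sample. Because the comparison is textual, a pin such as `2.0` would be reported as a mismatch against an installed `2.0.0`.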

comb_requirements.txt

Purpose: Comprehensive chemistry, ML, and scientific computing dependencies. Key packages:
  • rdkit==2024.3.6 - Chemistry library for molecular operations
  • torch==2.0.0 - Deep learning framework
  • transformers==4.34.1 - Hugging Face transformers
  • PubChemPy==1.0.4 - PubChem API for RAG
  • openai - GPT-4o API client
  • gradio==3.47.1 - Optional UI framework
  • peft==0.7.0 - Parameter-efficient fine-tuning
  • accelerate==0.24.1 - Distributed training utilities

Using venv

# Create virtual environment
python -m venv chemenv

# Activate (Linux/macOS)
source chemenv/bin/activate

# Activate (Windows)
chemenv\Scripts\activate

# Install dependencies
pip install -r agent_requirements.txt
pip install -r comb_requirements.txt

Using conda

# Create conda environment
conda create -n chemenv python=3.10

# Activate environment
conda activate chemenv

# Install dependencies
pip install -r agent_requirements.txt
pip install -r comb_requirements.txt

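# Optional: install a CUDA-enabled PyTorch build from conda instead of the pip wheel
# (pinned to match torch==2.0.0 in comb_requirements.txt)
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia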
Using a virtual environment prevents dependency conflicts with other Python projects.
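Before running the pip commands, it is worth confirming that the environment is actually active, since installing into the system Python is a common mistake. This sketch relies on two standard signals: conda sets the `CONDA_DEFAULT_ENV` environment variable on activation, and inside a venv `sys.prefix` differs from `sys.base_prefix`. The function name is hypothetical.

```python
import os
import sys


def active_environment():
    """Return a short label for the active environment, or None if none is detected."""
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return "conda:" + conda_env
    if sys.prefix != sys.base_prefix:  # true inside a venv
        return "venv:" + sys.prefix
    return None


print(active_environment() or "no virtual environment detected")
```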

Troubleshooting Installation

Problem: python-dotenv not installed.
Solution:
pip install python-dotenv
Problem: Insufficient VRAM for LlaSMol model.
Solution: Set LOW_VRAM=True in plan_execute_agent/config.py:
LOW_VRAM = True
Problem: RDKit installation failed or corrupted.
Solution:
# Uninstall and reinstall
pip uninstall rdkit
pip install rdkit==2024.3.6

# Or use conda (recommended for RDKit)
conda install -c conda-forge rdkit==2024.3.6
Problem: Incorrect LangGraph version installed.
Solution: LangGraph 0.2.55 is specifically required:
pip install langgraph==0.2.55 --force-reinstall
Problem: Missing or invalid OPENAI_API_KEY.
Solution:
  1. Verify that the .env file exists in the project root
  2. Check that the key is correctly formatted:
    OPENAI_API_KEY=sk-...
  3. Ensure no extra spaces or quotes around the key
  4. Verify your API key is active at platform.openai.com
Problem: English language model not downloaded.
Solution:
python -m spacy download en_core_web_sm
Problem: CPU-only PyTorch installed instead of CUDA version.
Solution: Reinstall PyTorch with CUDA support:
pip uninstall torch torchvision torchaudio
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Problem: Slow connection or firewall blocking Hugging Face Hub.
Solution:
  1. Use a VPN or faster connection
  2. Download manually:
    from transformers import AutoModelForCausalLM
    model = AutoModelForCausalLM.from_pretrained("osunlp/LlaSMol-Mistral-7B")
  3. Or use LOW_VRAM=True to skip model download

Testing Your Installation

Quick Test Suite

import asyncio
from plan_execute_agent.rdkit_agent import process_input

async def test_installation():
    """Test basic functionality"""
    
    # Test 1: Simple IUPAC to SMILES
    print("Test 1: IUPAC to SMILES...")
    result, completed, attempts, _, _, _ = await process_input(
        "Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?"
    )
    assert completed, "Test 1 failed: Query did not complete"
    print(f"✓ Test 1 passed: {result}")
    
    # Test 2: SMILES to IUPAC
    print("\nTest 2: SMILES to IUPAC...")
    result, completed, attempts, _, _, _ = await process_input(
        "What is the IUPAC name of <SMILES> C1CCOC1 </SMILES>?"
    )
    assert completed, "Test 2 failed: Query did not complete"
    print(f"✓ Test 2 passed: {result}")
    
    print("\n✓ All tests passed! Installation successful.")

if __name__ == "__main__":
    asyncio.run(test_installation())

Save the script above as test_installation.py, then run the test suite:

python test_installation.py

Next Steps

  • Quickstart Guide - Learn how to run your first chemistry query
  • Configuration - Customize ChemAgent for your use case
  • API Reference - Explore all available functions and tools
  • Guides - See more examples and use cases

Additional Resources