Installation - ChemAgent

Overview

This guide will walk you through the complete installation process for ChemAgent, including environment setup, dependency installation, and GPU/CUDA configuration.

Installation time: ~15-30 minutes depending on your internet connection and hardware, not counting the optional LlaSMol model download (see VRAM Configuration Details).

System Requirements

Minimum Requirements

Python: 3.8 or higher
OS: Linux, macOS, or Windows (WSL recommended for Windows)
RAM: 8GB minimum
Storage: ~10GB for dependencies and models, plus ~14GB for the LlaSMol-Mistral-7B weights if you run with LOW_VRAM=False

GPU Requirements (Optional but Recommended)

For LOW_VRAM Mode

VRAM: Any GPU or CPU-only
CUDA: Not required
Models: GPT-4o only (via API)
Best for: Quick prototyping, limited hardware

For Full Model Support

VRAM: ≥15GB (required to load LlaSMol-Mistral-7B)
CUDA: 11.0 or higher
Models: GPT-4o + LlaSMol-Mistral-7B
Best for: Production, offline inference

The LlaSMol-Mistral-7B model requires at least 15GB VRAM. If you have less, keep LOW_VRAM=True in the configuration.

Installation Steps

Clone the repository

git clone <repo_url>
cd <repo_folder>

Replace <repo_url> and <repo_folder> with your actual repository URL and folder name.

Set up environment variables

Create your .env file from the example template:

cp .env.example .env

Then edit .env and add your OpenAI API key:

.env

OPENAI_API_KEY=sk-your-api-key-here

Never commit your .env file to version control! It contains sensitive API keys.

Where to get your OpenAI API key

Go to OpenAI Platform
Sign in or create an account
Navigate to API Keys section
Create a new secret key
Copy and paste it into your .env file

Make sure you have credits available in your OpenAI account.
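Before moving on, it can help to confirm that the key is visible to Python and looks well formed. The following is a minimal sketch, not part of ChemAgent: `key_problems` is a hypothetical helper, and the `sk-` prefix check simply mirrors the example value above. It skips python-dotenv if that package is not installed yet and falls back to the process environment.

```python
import os


def key_problems(value):
    """Return a list of formatting problems found in an API key string."""
    if not value:
        return ["OPENAI_API_KEY is missing or empty"]
    problems = []
    if value != value.strip():
        problems.append("key has leading or trailing whitespace")
    if not value.strip().startswith("sk-"):
        problems.append("key does not start with 'sk-'")
    return problems


if __name__ == "__main__":
    try:
        # python-dotenv is installed in the next step; skip it if absent.
        from dotenv import load_dotenv
        load_dotenv()  # loads variables from a .env file if one is found
    except ImportError:
        pass
    issues = key_problems(os.environ.get("OPENAI_API_KEY"))
    print("OK" if not issues else "\n".join(issues))
```

This only checks the shape of the key. Whether the key is active and has credits can only be confirmed by an actual API call.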

Install agent dependencies

Install the core agent dependencies:

pip install -r agent_requirements.txt

This installs:

python-dotenv - Environment variable management
partialsmiles - SMILES parsing utilities
langgraph==0.2.55 - Agent orchestration framework
langchain-community - LangChain community integrations
langchain-openai - OpenAI integration for LangChain
rouge-score - Evaluation metrics
spacy==3.8.2 - NLP for chemistry term extraction

This step is required for all installations, regardless of VRAM settings.

Install combined requirements

Install additional chemistry and ML dependencies:

pip install -r comb_requirements.txt

This is a comprehensive requirements file (~120 packages) including:

RDKit - Core chemistry library for molecular operations
PyTorch - Deep learning framework
Transformers - Hugging Face transformers for LLMs
LangChain - LLM application framework
OpenAI SDK - API client for GPT models
PubChemPy - PubChem API wrapper for RAG
Gradio - Optional UI components
PEFT - Parameter-efficient fine-tuning
And many more scientific computing libraries

This installation may take 10-20 minutes depending on your connection speed.

Download spaCy language model

The agent uses spaCy for NLP tasks. Download the English language model:

python -m spacy download en_core_web_sm
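To confirm the download worked without starting the agent, you can check whether the model package is importable. This is a small standard-library sketch; `is_installed` is a hypothetical helper name. It relies on the fact that models fetched with `spacy download` are installed as regular Python packages.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    for name in ("spacy", "en_core_web_sm"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```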

Configure VRAM settings

Edit plan_execute_agent/config.py to match your hardware:

plan_execute_agent/config.py

# Flag to avoid loading LlaSMol on systems with <15GB VRAM (minimum requirement)
#   True:  <15GB VRAM or CPU-only (GPT-4o via API only)
#   False: ≥15GB VRAM (also loads LlaSMol-Mistral-7B)
LOW_VRAM = True

If you set LOW_VRAM=False without sufficient VRAM, you may encounter CUDA out-of-memory errors.

Verify installation

Test your installation with a simple query:

python -m plan_execute_agent.rdkit_agent --query "Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?"

You should see output similar to:

Running the main function
Structuring tool call!
Input: Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?
Output: Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?
Result: [SMILES output]
Completed: True
Attempts: 1

GPU and CUDA Setup

Checking Your GPU

Verify your GPU is available:

# Check NVIDIA GPU
nvidia-smi

# Check PyTorch CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"
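The commands above report whether CUDA is visible, but not how much memory the GPU has. The sketch below reads the total memory of GPU 0 through PyTorch and compares it with the 15GB threshold this guide quotes. `suggest_low_vram` and `detect_vram_gb` are hypothetical helper names, not part of ChemAgent, and the result is only a suggestion for the LOW_VRAM setting.

```python
LLASMOL_MIN_VRAM_GB = 15  # threshold quoted in this guide


def suggest_low_vram(total_vram_gb, minimum_gb=LLASMOL_MIN_VRAM_GB):
    """Return the LOW_VRAM value this guide recommends for a given VRAM size."""
    return total_vram_gb < minimum_gb


def detect_vram_gb():
    """Total memory of GPU 0 in GiB, or 0.0 if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    # total_memory is reported in bytes
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    vram = detect_vram_gb()
    print(f"Detected VRAM: {vram:.1f} GB")
    print(f"Suggested setting: LOW_VRAM = {suggest_low_vram(vram)}")
```

On a CPU-only machine or on macOS the script reports 0.0 GB and suggests LOW_VRAM = True.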

Installing CUDA (if needed)

Linux

# For Ubuntu 22.04 (the repo path below is release-specific; adjust it for other Ubuntu/Debian versions)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda

# Add to PATH
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Windows

Download CUDA Toolkit from the NVIDIA website
Run the installer
Follow the installation wizard
Verify installation: nvcc --version

macOS

macOS does not support CUDA. Use CPU-only mode:

LOW_VRAM = True

PyTorch with CUDA

If PyTorch doesn’t detect CUDA, reinstall with CUDA support:

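# For CUDA 11.8
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 11.7
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

The version in comb_requirements.txt is torch==2.0.0, which has prebuilt wheels for CUDA 11.7 and 11.8 only; cu121 wheels start with PyTorch 2.1. The wheels bundle their own CUDA runtime, so the cu118 build also runs on a machine whose NVIDIA driver reports a newer CUDA version (for example 12.x in nvidia-smi).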

VRAM Configuration Details

LOW_VRAM = True (Default)

from plan_execute_agent.config import LOW_VRAM

if not LOW_VRAM:
    from LLM4Chem.generation import LlaSMolGeneration
    generator = LlaSMolGeneration("osunlp/LlaSMol-Mistral-7B", device="cuda")
else:
    generator = None  # Model not loaded

Behavior:

LlaSMol model is not loaded into memory
All queries processed using GPT-4o via OpenAI API
Minimal VRAM usage (~2-4GB for basic operations)
Faster startup time
Requires OpenAI API credits

LOW_VRAM = False (Cluster/High-VRAM)

Behavior:

LlaSMol-Mistral-7B model loaded from Hugging Face Hub
First run downloads ~14GB model weights
Model loaded onto CUDA device
Local inference for chemistry tasks
Requires ≥15GB VRAM
No additional API costs for chemistry queries

On first run with LOW_VRAM=False, the model will download from Hugging Face. This may take 10-30 minutes depending on your connection.
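Because the download is large, it is worth checking free disk space before switching to LOW_VRAM=False. This is a minimal standard-library sketch; `has_room_for_model` is a hypothetical helper, and the 14GB figure is the approximate size quoted above. It assumes the weights go to the default Hugging Face cache under your home directory, so adjust the path if you have relocated the cache.

```python
import os
import shutil

MODEL_SIZE_GB = 14  # approximate download size quoted in this guide


def has_room_for_model(path=".", required_gb=MODEL_SIZE_GB):
    """Return True if the filesystem containing `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb


if __name__ == "__main__":
    # Assumption: the cache lives on the same filesystem as the home directory.
    home = os.path.expanduser("~")
    print("Enough space for LlaSMol weights:", has_room_for_model(home))
```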

Dependency Files Explained

agent_requirements.txt

Purpose: Core agent dependencies for LangGraph and LangChain orchestration.

python-dotenv==0.19.1    # Environment variables
partialsmiles==2.0       # SMILES parsing
langgraph==0.2.55        # Agent framework (specific version required)
langchain-community==0.2.19
langchain-openai==0.1.25 # OpenAI integration
rouge-score==0.1.2       # Evaluation metrics
spacy==3.8.2             # NLP for term extraction

Important: LangGraph version 0.2.55 is required. Newer versions may have breaking changes.
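To see whether your environment matches these pins, you can compare them with the installed package metadata. The sketch below uses `importlib.metadata` from the standard library; `find_mismatches` is a hypothetical helper, and the pin table is copied from the listing above rather than read from the file.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from agent_requirements.txt as listed above.
AGENT_PINS = {
    "python-dotenv": "0.19.1",
    "partialsmiles": "2.0",
    "langgraph": "0.2.55",
    "langchain-community": "0.2.19",
    "langchain-openai": "0.1.25",
    "rouge-score": "0.1.2",
    "spacy": "3.8.2",
}


def find_mismatches(pins, get_version=version):
    """Map each package whose installed version differs from its pin
    to a (pinned, installed) tuple; installed is None if it is missing."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(AGENT_PINS)
    for name, (pinned, installed) in problems.items():
        print(f"{name}: expected {pinned}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned agent dependencies match.")
```

The `get_version` parameter exists only so the comparison can be exercised without touching the real environment.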

comb_requirements.txt

Purpose: Comprehensive chemistry, ML, and scientific computing dependencies. Key packages:

rdkit==2024.3.6 - Chemistry library for molecular operations
torch==2.0.0 - Deep learning framework
transformers==4.34.1 - Hugging Face transformers
PubChemPy==1.0.4 - PubChem API for RAG
openai - GPT-4o API client
gradio==3.47.1 - Optional UI framework
peft==0.7.0 - Parameter-efficient fine-tuning
accelerate==0.24.1 - Distributed training utilities

Virtual Environment Setup (Recommended)

Using venv

# Create virtual environment
python -m venv chemenv

# Activate (Linux/macOS)
source chemenv/bin/activate

# Activate (Windows)
chemenv\Scripts\activate

# Install dependencies
pip install -r agent_requirements.txt
pip install -r comb_requirements.txt

Using conda

# Create conda environment
conda create -n chemenv python=3.10

# Activate environment
conda activate chemenv

# Install dependencies
pip install -r agent_requirements.txt
pip install -r comb_requirements.txt

# Optional: install CUDA-enabled PyTorch from conda, pinned to match torch==2.0.0 in comb_requirements.txt
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia

Using a virtual environment prevents dependency conflicts with other Python projects.
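If you are unsure whether an environment is active before running pip, a quick check from Python can tell you. This is a sketch under stated assumptions: `in_virtual_env` is a hypothetical helper, it detects venv by comparing `sys.prefix` with `sys.base_prefix`, and it detects conda through the `CONDA_DEFAULT_ENV` variable, treating the conda `base` environment as not isolated.

```python
import os
import sys


def in_virtual_env(prefix=None, base_prefix=None, environ=None):
    """Return True when running inside a venv or a non-base conda env."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ
    in_venv = prefix != base_prefix  # a venv changes sys.prefix
    conda_env = environ.get("CONDA_DEFAULT_ENV")  # set by `conda activate`
    in_conda = conda_env not in (None, "", "base")
    return in_venv or in_conda


if __name__ == "__main__":
    print("Isolated environment active:", in_virtual_env())
```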

Troubleshooting Installation

Import Error: No module named 'dotenv'

Problem: python-dotenv not installed.

Solution:

pip install python-dotenv

CUDA out of memory error

Problem: Insufficient VRAM for LlaSMol model.

Solution: Set LOW_VRAM=True in plan_execute_agent/config.py:

LOW_VRAM = True

RDKit import error

Problem: RDKit installation failed or corrupted.

Solution:

# Uninstall and reinstall
pip uninstall rdkit
pip install rdkit==2024.3.6

# Or use conda (recommended for RDKit)
conda install -c conda-forge rdkit==2024.3.6

LangGraph version error

Problem: Incorrect LangGraph version installed.

Solution: LangGraph 0.2.55 is specifically required:

pip install langgraph==0.2.55 --force-reinstall

OpenAI API authentication error

Problem: Missing or invalid OPENAI_API_KEY.

Solution:

Verify .env file exists in the project root
Check that the key is correctly formatted:
```
OPENAI_API_KEY=sk-...
```
Ensure no extra spaces or quotes around the key
Verify your API key is active at platform.openai.com

spaCy model not found

Problem: English language model not downloaded.

Solution:

python -m spacy download en_core_web_sm

Torch not compiled with CUDA

Problem: CPU-only PyTorch installed instead of CUDA version.

Solution: Reinstall PyTorch with CUDA support:

pip uninstall torch torchvision torchaudio
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Hugging Face model download timeout

Problem: Slow connection or firewall blocking Hugging Face Hub.

Solution:

Use a VPN or faster connection

Download manually:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("osunlp/LlaSMol-Mistral-7B")

Or use LOW_VRAM=True to skip model download
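On an unreliable connection, another option is to wrap the download call in a retry loop with exponential backoff. The helper below is a generic sketch, not part of ChemAgent or transformers: `retry` is a hypothetical name, and it assumes download failures surface as `OSError`, which is what transformers raises when it cannot fetch files.

```python
import time


def retry(download, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call `download()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except OSError as error:  # assumption: network failures derive from OSError
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.0f}s")
            sleep(delay)


# Example usage with the manual download shown above:
# from transformers import AutoModelForCausalLM
# model = retry(lambda: AutoModelForCausalLM.from_pretrained("osunlp/LlaSMol-Mistral-7B"))
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.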

Testing Your Installation

Quick Test Suite

import asyncio
from plan_execute_agent.rdkit_agent import process_input

async def test_installation():
    """Test basic functionality"""
    
    # Test 1: Simple IUPAC to SMILES
    print("Test 1: IUPAC to SMILES...")
    result, completed, attempts, _, _, _ = await process_input(
        "Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?"
    )
    assert completed, "Test 1 failed: Query did not complete"
    print(f"✓ Test 1 passed: {result}")
    
    # Test 2: SMILES to IUPAC
    print("\nTest 2: SMILES to IUPAC...")
    result, completed, attempts, _, _, _ = await process_input(
        "What is the IUPAC name of <SMILES> C1CCOC1 </SMILES>?"
    )
    assert completed, "Test 2 failed: Query did not complete"
    print(f"✓ Test 2 passed: {result}")
    
    print("\n✓ All tests passed! Installation successful.")

if __name__ == "__main__":
    asyncio.run(test_installation())

Save the script above as test_installation.py in the project root, then run it:

python test_installation.py

Next Steps

Quickstart Guide - Learn how to run your first chemistry query
Configuration - Customize ChemAgent for your use case
API Reference - Explore all available functions and tools
Guides - See more examples and use cases

Additional Resources

RDKit Documentation: rdkit.org/docs
LangGraph Guide: langchain-ai.github.io/langgraph
OpenAI API Docs: platform.openai.com/docs
PubChem API: pubchem.ncbi.nlm.nih.gov/docs/pug-rest