Overview
This guide walks you through the complete installation process for ChemAgent, including environment setup, dependency installation, and GPU/CUDA configuration.
Installation time: ~15-30 minutes depending on your internet connection and hardware.
System Requirements
Minimum Requirements
- Python: 3.8 or higher
- OS: Linux, macOS, or Windows (WSL recommended for Windows)
- RAM: 8GB minimum
- Storage: ~10GB for dependencies and models
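The minimums above can be checked from Python before installing anything. The following is a minimal sketch, not part of ChemAgent: the function name `check_minimum_requirements` is hypothetical, and it covers only the Python version and free disk space, since a RAM check would need platform-specific code.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)   # "Python: 3.8 or higher"
MIN_FREE_GB = 10      # "~10GB for dependencies and models"


def check_minimum_requirements(python_version, free_bytes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} is older than 3.8"
        )
    free_gb = free_bytes / 1e9
    if free_gb < MIN_FREE_GB:
        problems.append(f"Only {free_gb:.1f} GB free; about 10 GB is needed")
    return problems


# Check the interpreter running this script and the current directory's disk.
issues = check_minimum_requirements(sys.version_info, shutil.disk_usage(".").free)
print("\n".join(issues) if issues else "Minimum requirements look satisfied")
```

Run it from the directory where you plan to install ChemAgent so that the disk check looks at the right volume.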
GPU Requirements (Optional but Recommended)
For LOW_VRAM Mode
- VRAM: Any GPU or CPU-only
- CUDA: Not required
- Models: GPT-4o only (via API)
- Best for: Quick prototyping, limited hardware
For Full Model Support
- VRAM: ≥15GB recommended
- CUDA: 11.0 or higher
- Models: GPT-4o + LlaSMol-Mistral-7B
- Best for: Production, offline inference
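The choice between the two modes comes down to available VRAM. As a rough sketch of that decision (the function `recommend_low_vram` is hypothetical; only the 15GB threshold comes from this guide):

```python
from typing import Optional

FULL_MODEL_MIN_VRAM_GB = 15.0  # "VRAM: >=15GB recommended" for full model support


def recommend_low_vram(vram_gb: Optional[float]) -> bool:
    """Return True when LOW_VRAM mode is the safer choice.

    vram_gb is the GPU memory in GB, or None when no CUDA GPU is available.
    """
    return vram_gb is None or vram_gb < FULL_MODEL_MIN_VRAM_GB


print(recommend_low_vram(None))   # CPU-only machine
print(recommend_low_vram(8.0))    # small GPU
print(recommend_low_vram(24.0))   # large GPU
```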
Installation Steps
Set up environment variables
Create your .env file from the example template, then edit .env and add your OpenAI API key.
Where to get your OpenAI API key
- Go to OpenAI Platform
- Sign in or create an account
- Navigate to API Keys section
- Create a new secret key
- Copy and paste it into your .env file
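Once the key is in place, a quick way to confirm it can be read is to load the file with python-dotenv, which is one of the agent dependencies. This is a minimal sketch rather than ChemAgent's own loading code: `load_openai_key` is a hypothetical helper, and it falls back to the plain process environment when python-dotenv is not installed yet.

```python
import os


def load_openai_key(env_path=".env"):
    """Load variables from env_path (if python-dotenv is installed) and
    return the value of OPENAI_API_KEY, or None if it is not set."""
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
    except ImportError:
        load_dotenv = None
    if load_dotenv is not None:
        # Existing environment variables are not overridden by default.
        load_dotenv(env_path)
    return os.getenv("OPENAI_API_KEY")


key = load_openai_key()
print("OPENAI_API_KEY is set" if key else "OPENAI_API_KEY is missing")
```

The script deliberately prints only whether the key is present, so the secret itself never ends up in a terminal log.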
Install agent dependencies
Install the core agent dependencies from agent_requirements.txt. This installs:
- python-dotenv - Environment variable management
- partialsmiles - SMILES parsing utilities
- langgraph==0.2.55 - Agent orchestration framework
- langchain-community - LangChain community integrations
- langchain-openai - OpenAI integration for LangChain
- rouge-score - Evaluation metrics
- spacy==3.8.2 - NLP for chemistry term extraction
This step is required for all installations, regardless of VRAM settings.
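Because two of these packages are pinned to exact versions, it can be worth confirming what actually got installed. The sketch below uses the standard library's `importlib.metadata`; the helper `find_pin_mismatches` and the `AGENT_PINS` dictionary are illustrative names, not part of ChemAgent.

```python
from importlib import metadata

# Pins listed in this guide for the core agent dependencies.
AGENT_PINS = {"langgraph": "0.2.55", "spacy": "3.8.2"}


def find_pin_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed)} for every pin that is not met.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


for name, (want, have) in find_pin_mismatches(AGENT_PINS).items():
    print(f"{name}: expected {want}, found {have}")
```

No output means both pins are satisfied in the active environment.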
Install combined requirements
Install additional chemistry and ML dependencies from comb_requirements.txt. This is a comprehensive requirements file (~120 packages) including:
- RDKit - Core chemistry library for molecular operations
- PyTorch - Deep learning framework
- Transformers - Hugging Face transformers for LLMs
- LangChain - LLM application framework
- OpenAI SDK - API client for GPT models
- PubChemPy - PubChem API wrapper for RAG
- Gradio - Optional UI components
- PEFT - Parameter-efficient fine-tuning
- And many more scientific computing libraries
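The two dependency steps can also be scripted. The sketch below assumes both requirements files sit in the current working directory and are installed with `pip install -r`; the helper names `pip_install_command` and `install_all` are hypothetical. It builds each command from `sys.executable` so that pip always targets the interpreter you are running.

```python
import subprocess
import sys

# Requirements files named in this guide, in installation order.
REQUIREMENTS_FILES = ["agent_requirements.txt", "comb_requirements.txt"]


def pip_install_command(requirements_file):
    """Build a pip command that uses the current interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements_file]


def install_all(files=REQUIREMENTS_FILES, run=subprocess.run):
    """Install each requirements file in order, stopping on the first failure."""
    for requirements_file in files:
        run(pip_install_command(requirements_file), check=True)


# Show the commands without running them.
for requirements_file in REQUIREMENTS_FILES:
    print(" ".join(pip_install_command(requirements_file)))
```

Calling `install_all()` from the project root would run both installs; the script above only prints the commands so you can review them first.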
Download spaCy language model
The agent uses spaCy for NLP tasks. Download the English language model:
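After the download, one way to confirm that spaCy can see a model is to list the installed model packages. This is a small sketch using spaCy's `get_installed_models` utility; the wrapper `installed_spacy_models` is a hypothetical name, and it returns an empty list when spaCy itself is missing.

```python
def installed_spacy_models():
    """Return the names of installed spaCy model packages (empty if none,
    or if spaCy itself is not installed)."""
    try:
        from spacy.util import get_installed_models
    except ImportError:
        return []
    return list(get_installed_models())


print(installed_spacy_models())
```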
Configure VRAM settings
Edit plan_execute_agent/config.py to match your hardware.
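As a rough illustration, the relevant setting might look like the following. Only the name `LOW_VRAM` and its default come from this guide; the helper `active_models` is hypothetical, and the real config.py may contain other settings.

```python
# Sketch of the VRAM switch described in this guide (not the full config.py).

# True  -> GPT-4o only, via the OpenAI API (default)
# False -> GPT-4o plus a locally loaded LlaSMol-Mistral-7B (needs >=15GB VRAM)
LOW_VRAM = True


def active_models(low_vram):
    """Hypothetical helper: list the models each mode relies on."""
    models = ["GPT-4o"]
    if not low_vram:
        models.append("LlaSMol-Mistral-7B")
    return models


print(active_models(LOW_VRAM))
```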
GPU and CUDA Setup
Checking Your GPU
Verify your GPU is available.
Installing CUDA (if needed)
- Linux
- Windows
- macOS
PyTorch with CUDA
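To see whether the installed PyTorch build can use CUDA at all, a check along these lines can help. It is a sketch, not a ChemAgent utility: `cuda_report` is a hypothetical name, and it only inspects the first GPU.

```python
def cuda_report():
    """Describe whether the installed PyTorch build can use CUDA."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.version.cuda is None:
        return f"PyTorch {torch.__version__} was not built with CUDA support"
    if not torch.cuda.is_available():
        return (
            f"PyTorch {torch.__version__} was built for CUDA "
            f"{torch.version.cuda}, but no usable GPU or driver was found"
        )
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    return f"{props.name}: {vram_gb:.1f} GiB VRAM, CUDA {torch.version.cuda}"


print(cuda_report())
```

The first two messages point to a missing or CPU-only PyTorch install, while the third usually indicates a driver or hardware problem rather than a PyTorch one.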
If PyTorch doesn’t detect CUDA, reinstall with CUDA support. The version in comb_requirements.txt is torch==2.0.0. Match your CUDA version accordingly.
VRAM Configuration Details
LOW_VRAM = True (Default)
- LlaSMol model is not loaded into memory
- All queries processed using GPT-4o via OpenAI API
- Minimal VRAM usage (~2-4GB for basic operations)
- Faster startup time
- Requires OpenAI API credits
LOW_VRAM = False (Cluster/High-VRAM)
Behavior:
- LlaSMol-Mistral-7B model loaded from Hugging Face Hub
- First run downloads ~14GB model weights
- Model loaded onto CUDA device
- Local inference for chemistry tasks
- Requires ≥15GB VRAM
- No additional API costs for chemistry queries
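The ~14GB download and ≥15GB VRAM figures are consistent with a model of roughly 7 billion parameters stored at 16 bits each. The arithmetic below is a back-of-the-envelope sketch; the parameter count and precision are assumptions, not values stated in this guide.

```python
def weights_size_gb(n_params, bytes_per_param=2):
    """Approximate size of model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


print(weights_size_gb(7e9))      # 16-bit weights
print(weights_size_gb(7e9, 4))   # 32-bit weights, for comparison
```

Inference also needs memory for activations and the attention cache, which is one reason a VRAM recommendation tends to sit above the raw weight size.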
Dependency Files Explained
agent_requirements.txt
Purpose: Core agent dependencies for LangGraph and LangChain orchestration.
comb_requirements.txt
Purpose: Comprehensive chemistry, ML, and scientific computing dependencies. Key packages:
- rdkit==2024.3.6 - Chemistry library for molecular operations
- torch==2.0.0 - Deep learning framework
- transformers==4.34.1 - Hugging Face transformers
- PubChemPy==1.0.4 - PubChem API for RAG
- openai - GPT-4o API client
- gradio==3.47.1 - Optional UI framework
- peft==0.7.0 - Parameter-efficient fine-tuning
- accelerate==0.24.1 - Distributed training utilities
Virtual Environment Setup (Recommended)
Using venv
Using conda
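Whichever tool you use, it helps to confirm that the environment is actually active before installing packages. The sketch below relies on two common signals: conda sets the `CONDA_DEFAULT_ENV` environment variable on activation, and inside a venv `sys.prefix` differs from `sys.base_prefix`. The function name `active_environment` is hypothetical.

```python
import os
import sys


def active_environment():
    """Return a short description of the active isolated environment, or None."""
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda:{conda_env}"
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base interpreter.
    if sys.prefix != sys.base_prefix:
        return f"venv:{sys.prefix}"
    return None


print(active_environment() or "No virtual environment detected")
```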
Troubleshooting Installation
Import Error: No module named 'dotenv'
Problem: python-dotenv not installed.
Solution: Install the core agent dependencies (agent_requirements.txt), which include python-dotenv.
CUDA out of memory error
Problem: Insufficient VRAM for LlaSMol model.
Solution: Set LOW_VRAM=True in plan_execute_agent/config.py.
RDKit import error
Problem: RDKit installation failed or corrupted.
Solution: Reinstall RDKit at the version pinned in comb_requirements.txt (rdkit==2024.3.6).
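Before reinstalling anything, it can help to confirm which modules Python actually fails to locate. The following sketch uses the standard library's `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, and `dotenv` and `rdkit` are the import names of python-dotenv and RDKit.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Import names for python-dotenv and RDKit.
print(missing_modules(["dotenv", "rdkit"]))
```

A module that is found here can still fail at import time if the installation is corrupted, so an empty list rules out a missing package but not a broken one.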
LangGraph version error
Problem: Incorrect LangGraph version installed.
Solution: LangGraph 0.2.55 is specifically required.
OpenAI API authentication error
Problem: Missing or invalid OPENAI_API_KEY.
Solution:
- Verify the .env file exists in the project root
- Check that the key is correctly formatted
- Ensure no extra spaces or quotes around the key
- Verify your API key is active at platform.openai.com
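The formatting checks in this list can be automated for a single line of the .env file. This is a minimal sketch following the checklist above (no extra spaces, no quotes); `check_env_key_line` is a hypothetical helper and does not contact the OpenAI API, so it cannot tell whether a key is active.

```python
def check_env_key_line(line, name="OPENAI_API_KEY"):
    """Return a list of formatting problems for one line of a .env file."""
    if "=" not in line:
        return [f"expected {name}=<value>"]
    key, value = line.rstrip("\n").split("=", 1)
    problems = []
    if key.strip() != name:
        problems.append(f"unexpected variable name {key.strip()!r}")
    if key != key.strip() or value != value.strip():
        problems.append("extra spaces around the key or value")
    stripped = value.strip()
    if not stripped:
        problems.append("value is empty")
    elif stripped[0] in "'\"" or stripped[-1] in "'\"":
        problems.append("quotes around the value")
    return problems


print(check_env_key_line("OPENAI_API_KEY=placeholder"))
print(check_env_key_line('OPENAI_API_KEY= "placeholder"'))
```

An empty list means the line passes the formatting checks; each string in a non-empty list names one thing to fix.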
spaCy model not found
Problem: English language model not downloaded.
Solution: Download the English language model as described in "Download spaCy language model" above.
Torch not compiled with CUDA
Problem: CPU-only PyTorch installed instead of CUDA version.
Solution: Reinstall PyTorch with CUDA support, matching your installed CUDA version.
Hugging Face model download timeout
Problem: Slow connection or firewall blocking Hugging Face Hub.
Solution:
- Use a VPN or faster connection
- Download manually:
- Or use LOW_VRAM=True to skip model download
Testing Your Installation
Quick Test Suite
Run the test suite:
Next Steps
Quickstart Guide
Learn how to run your first chemistry query
Configuration
Customize ChemAgent for your use case
API Reference
Explore all available functions and tools
Guides
See more examples and use cases
Additional Resources
- RDKit Documentation: rdkit.org/docs
- LangGraph Guide: langchain-ai.github.io/langgraph
- OpenAI API Docs: platform.openai.com/docs
- PubChem API: pubchem.ncbi.nlm.nih.gov/docs/pug-rest