Welcome to ChemAgent

ChemAgent is an AI-powered chemistry assistant that combines large language models (LLMs), RDKit cheminformatics, and retrieval-augmented generation (RAG) to solve molecular tasks. It is built on the LlaSMol fine-tuned models and a plan-and-execute agent architecture, and it checks generated SMILES with RDKit before returning an answer.

Quick Start

Get up and running with ChemAgent in minutes

Installation

Set up your environment and install dependencies

Architecture

Understand how ChemAgent works under the hood

API Reference

Explore the complete API documentation

Key Capabilities

Convert between chemical names and representations:
  • IUPAC ↔ SMILES
  • SMILES ↔ Molecular Formula
  • IUPAC ↔ Molecular Formula
ChemAgent automatically canonicalizes SMILES strings and validates chemical structures.
Predict molecular properties using fine-tuned chemistry models:
  • Solubility (ESOL)
  • Lipophilicity (LIPO)
  • Blood-brain barrier permeability (BBBP)
  • Toxicity (ClinTox)
  • HIV inhibition
  • Side effects (SIDER)
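The property tasks above mix two kinds of output: ESOL and LIPO are numeric (regression) targets, while BBBP, ClinTox, HIV, and SIDER are yes/no (classification) targets. The sketch below shows one way a caller might normalize raw model text into a float or a bool. It assumes the model wraps answers in `<NUMBER>` and `<BOOLEAN>` tags in the SMolInstruct style; that tag format, `TASK_KINDS`, and `parse_prediction` are assumptions for illustration, not ChemAgent's documented API.

```python
import re
from typing import Optional, Union

# Output kind per property task named in the list above.
TASK_KINDS = {
    "ESOL": "regression",
    "LIPO": "regression",
    "BBBP": "classification",
    "ClinTox": "classification",
    "HIV": "classification",
    "SIDER": "classification",
}

# Assumed answer tags; adjust if the deployed model uses a different format.
_NUMBER = re.compile(r"<NUMBER>\s*(-?\d+(?:\.\d+)?)\s*</NUMBER>")
_BOOLEAN = re.compile(r"<BOOLEAN>\s*(Yes|No)\s*</BOOLEAN>", re.IGNORECASE)


def parse_prediction(task: str, raw: str) -> Optional[Union[float, bool]]:
    """Extract a float (regression) or bool (classification) from raw text.

    Returns None when the expected tag is missing, so the caller can retry.
    """
    kind = TASK_KINDS[task]
    if kind == "regression":
        match = _NUMBER.search(raw)
        return float(match.group(1)) if match else None
    match = _BOOLEAN.search(raw)
    return (match.group(1).lower() == "yes") if match else None
```

Returning `None` instead of raising keeps a missing or malformed answer distinguishable from a legitimate `False` or `0.0`.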
Generate and analyze molecular structures:
  • Molecule captioning and description
  • Structure generation from text descriptions
  • SMILES validation with detailed error reporting
  • Chemistry parser with validity vectors
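To make "validation with error reporting" and "validity vectors" concrete, the sketch below runs a shallow syntactic pre-check over a list of SMILES strings and returns one boolean per input, which is one plausible reading of a validity vector. This is not ChemAgent's parser and not a replacement for RDKit: it only checks balanced parentheses, balanced square brackets, and paired single-digit ring closures, and it knows nothing about valence. Both function names are hypothetical.

```python
from typing import List, Optional


def smiles_syntax_error(smiles: str) -> Optional[str]:
    """Return a short error message, or None if no syntax problem is found.

    Assumption: shallow pre-check only. Two-digit ring closures written
    as %nn are treated digit by digit rather than parsed as one label.
    """
    if not smiles:
        return "empty string"
    depth = 0            # current parenthesis nesting depth
    in_bracket = False   # True while inside a [...] atom specification
    open_rings = set()   # ring-closure digits opened but not yet closed
    for pos, ch in enumerate(smiles):
        if in_bracket:
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return f"nested '[' at position {pos}"
            continue  # digits inside brackets are isotopes, charges, H counts
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            return f"unmatched ']' at position {pos}"
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return f"unmatched ')' at position {pos}"
        elif ch.isdigit():
            # First occurrence opens the ring, second occurrence closes it.
            open_rings.symmetric_difference_update({ch})
    if in_bracket:
        return "unclosed '['"
    if depth > 0:
        return "unclosed '('"
    if open_rings:
        return "unclosed ring closure(s): " + ", ".join(sorted(open_rings))
    return None


def validity_vector(smiles_list: List[str]) -> List[bool]:
    """One boolean per input: True when the shallow check finds no error."""
    return [smiles_syntax_error(s) is None for s in smiles_list]
```

A pre-check like this can reject obviously malformed strings cheaply; anything that passes would still need a full RDKit parse before it is treated as a valid structure.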
Plan and analyze chemical reactions:
  • Forward synthesis prediction
  • Retrosynthesis pathway planning
  • Reaction validation
Extract chemical information from images using GPT-4o:
  • Molecular structure recognition
  • Chemical formula extraction
  • Integration with chemistry queries
Enhance queries with contextual information:
  • Automatic PubChem knowledge retrieval
  • Chemistry term identification
  • Context-aware analysis
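As a rough sketch of PubChem-backed retrieval, the code below builds a PubChem PUG REST property URL for a compound name and prepends the returned facts to a question. The PUG REST endpoint shape is PubChem's public API; how ChemAgent itself queries PubChem is not described here, so `pubchem_property_url`, `fetch_pubchem_properties`, and `augment_query` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name: str, properties=("MolecularFormula", "IUPACName")) -> str:
    """Build a PUG REST URL that looks up properties by compound name."""
    quoted = urllib.parse.quote(name, safe="")  # encode spaces, slashes, etc.
    return f"{PUG_REST_BASE}/compound/name/{quoted}/property/{','.join(properties)}/JSON"


def fetch_pubchem_properties(name: str, timeout: float = 10.0) -> dict:
    """Fetch the first property record for a name (performs a network call)."""
    with urllib.request.urlopen(pubchem_property_url(name), timeout=timeout) as resp:
        data = json.load(resp)
    return data["PropertyTable"]["Properties"][0]


def augment_query(question: str, facts: dict) -> str:
    """Prepend retrieved facts to the user question as plain-text context."""
    lines = [f"- {key}: {value}" for key, value in facts.items()]
    return "Context (PubChem):\n" + "\n".join(lines) + "\n\nQuestion: " + question
```

In use, the term identified in the question (for example `"aspirin"`) would be passed to `fetch_pubchem_properties`, and the result handed to `augment_query` before the question reaches the model.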

How It Works

ChemAgent uses a plan-and-execute architecture built with LangGraph:
1. Query Planning

The agent analyzes your chemistry question and creates a step-by-step execution plan using GPT-4o.
2. Tool Execution

Each step is executed using specialized chemistry tools:
  • structure_chem_prompt for tagging IUPAC/SMILES
  • answer_chemistry_query for LlaSMol inference
  • validate_smiles_rdkit for RDKit validation
3. Replanning

The agent evaluates results and replans if needed, adapting to validation errors or incomplete information.
4. Final Response

Returns validated chemistry answers with detailed error reporting.
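The four steps above can be sketched as a small loop in plain Python. This is a stand-in for illustration, not LangGraph code and not ChemAgent's implementation: the planner and replanner are hard-coded functions rather than GPT-4o calls, the model tool returns canned answers, and the validation tool only counts parentheses instead of calling RDKit. Only the three tool names come from the list above; `AgentState`, `run_agent`, and `demo` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentState:
    query: str
    plan: List[str] = field(default_factory=list)                     # remaining tool names
    past_steps: List[Tuple[str, str]] = field(default_factory=list)   # (tool, result)
    response: str = ""


def run_agent(
    query: str,
    planner: Callable[[str], List[str]],
    replanner: Callable[[AgentState], List[str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> AgentState:
    """Plan once, then execute one step at a time, replanning after each."""
    state = AgentState(query=query, plan=planner(query))
    for _ in range(max_steps):  # hard cap so a bad replanner cannot loop forever
        if not state.plan:
            break
        tool_name = state.plan.pop(0)
        # Each tool receives the previous result (or the query for the first step).
        tool_input = state.past_steps[-1][1] if state.past_steps else query
        state.past_steps.append((tool_name, tools[tool_name](tool_input)))
        state.plan = replanner(state)
    state.response = state.past_steps[-1][1] if state.past_steps else ""
    return state


def demo() -> AgentState:
    canned = iter(["CC(O", "CCO"])  # first answer is deliberately malformed

    def structure_chem_prompt(text: str) -> str:
        return f"<IUPAC> {text} </IUPAC>"  # stub: tag the input as an IUPAC name

    def answer_chemistry_query(prompt: str) -> str:
        return next(canned)  # stub: stands in for LlaSMol inference

    def validate_smiles_rdkit(smiles: str) -> str:
        ok = smiles.count("(") == smiles.count(")")  # stub: no RDKit call here
        return smiles if ok else "ERROR: " + smiles

    def planner(query: str) -> List[str]:
        return ["structure_chem_prompt", "answer_chemistry_query", "validate_smiles_rdkit"]

    def replanner(state: AgentState) -> List[str]:
        last_tool, last_result = state.past_steps[-1]
        if last_tool == "validate_smiles_rdkit" and last_result.startswith("ERROR"):
            return ["answer_chemistry_query", "validate_smiles_rdkit"]  # retry
        return state.plan  # otherwise keep the remaining plan

    tools = {
        "structure_chem_prompt": structure_chem_prompt,
        "answer_chemistry_query": answer_chemistry_query,
        "validate_smiles_rdkit": validate_smiles_rdkit,
    }
    return run_agent("ethanol", planner, replanner, tools)
```

In the demo, the first model answer fails validation, the replanner schedules a second inference and validation pass, and the loop ends once validation succeeds and the plan is empty.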

Supported Models

ChemAgent leverages the LlaSMol family of fine-tuned chemistry models:
  • LlaSMol-Mistral-7B (default): Best overall performance
  • LlaSMol-Llama2-7B: Alternative base model
  • LlaSMol-CodeLlama-7B: Code-optimized variant
  • LlaSMol-Galactica-6.7B: Compact model option
All models are trained on SMolInstruct, an instruction-tuning dataset that covers 14 chemistry tasks.
LlaSMol models require at least 15 GB of VRAM. On systems with less VRAM, set LOW_VRAM=True in the configuration to skip local model loading and rely on external API calls only.
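The VRAM rule above amounts to a small decision at startup. The sketch below is one way to express it, assuming the configuration exposes a boolean `LOW_VRAM` and a model name. How ChemAgent actually reads its configuration is not shown here, so `Settings`, `choose_backend`, and the returned backend strings are hypothetical.

```python
from dataclasses import dataclass

MIN_VRAM_GB = 15.0  # minimum stated for LlaSMol models


@dataclass(frozen=True)
class Settings:
    LOW_VRAM: bool = False
    model_name: str = "LlaSMol-Mistral-7B"  # the documented default model


def choose_backend(settings: Settings, available_vram_gb: float) -> str:
    """Decide between local LlaSMol inference and external API calls only."""
    if settings.LOW_VRAM:
        # Local model loading is disabled regardless of detected VRAM.
        return "external_api"
    if available_vram_gb < MIN_VRAM_GB:
        raise RuntimeError(
            f"{available_vram_gb:.1f} GB VRAM is below the {MIN_VRAM_GB:.0f} GB "
            "minimum; set LOW_VRAM=True to use external API calls only."
        )
    return "local:" + settings.model_name
```

Failing loudly when VRAM is short and `LOW_VRAM` is unset surfaces the problem at startup rather than as an out-of-memory error during the first query.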

Architecture Highlights

Plan-and-Execute Agent

LangGraph-based orchestration with GPT-4o planning and replanning capabilities

LlaSMol Models

Fine-tuned models of roughly 7B parameters (6.7B to 7B), specialized for chemistry tasks

RDKit Integration

Robust SMILES validation and molecular structure verification

Optional RAG

PubChem knowledge retrieval for enhanced context

Next Steps

Quickstart

Run your first chemistry query

Core Concepts

Learn the fundamentals

Guides

Explore use cases