Overview

The answer_chemistry_query tool uses the LlaSMol-Mistral-7B model to answer chemistry-related queries: name conversions, property predictions, molecule descriptions, and reaction predictions (forward synthesis and retrosynthesis). Note: This tool expects input whose chemical identifiers have been tagged by structure_chem_prompt, and it cannot be used when LOW_VRAM mode is enabled.

Function Signature

@tool
def answer_chemistry_query(query: str) -> str:
    """
    Answer a chemistry-related query, handling requests for name conversions,
    property predictions, molecule descriptions, and more.

    The ONLY supported queries are:
    - Name Conversion: IUPAC to Molecular Formula, IUPAC to SMILES, SMILES to IUPAC, SMILES to Molecular Formula.
    - Property Prediction: Solubility, LIPO, BBBP (blood-brain barrier permeability), Clintox (toxic), HIV, Side Effects.
    - Molecule Captioning.
    - Molecule Generation.
    - Chemical Reaction Forward Synthesis.
    - Chemical Reaction Retrosynthesis
    """

Parameters

query (str, required)
The chemistry-related query string, with chemical identifiers wrapped in <SMILES> and <IUPAC> tags.
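As a rough illustration of what "properly tagged" means, the sketch below pulls tagged identifiers out of a query string. The helper names find_tagged_identifiers and has_tagged_identifier are hypothetical and not part of the tool; the tag syntax is inferred from the examples on this page.

```python
import re

# Matches <SMILES> ... </SMILES> or <IUPAC> ... </IUPAC>. The backreference \1
# forces the closing tag to match the opening one.
TAG_PATTERN = re.compile(r"<(SMILES|IUPAC)>\s*(.+?)\s*</\1>")


def find_tagged_identifiers(query: str) -> list[tuple[str, str]]:
    """Return (tag, identifier) pairs found in a tagged query (hypothetical helper)."""
    return TAG_PATTERN.findall(query)


def has_tagged_identifier(query: str) -> bool:
    """True if the query carries at least one well-formed tag pair."""
    return bool(find_tagged_identifiers(query))


print(find_tagged_identifiers(
    "Can you tell me the IUPAC name of <SMILES> C1CCOC1 </SMILES>?"
))
```

A check like this only makes sense for tasks that take an identifier as input; a molecule generation query is free text and carries no tags.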

Response

response (str)
The generated response containing the requested chemical information, property prediction, or synthesis plan.
On error, returns a string: "Error generating response: [error details]"
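Because a generation failure is reported as an ordinary string rather than an exception, callers may want to separate the two cases themselves. A minimal sketch, assuming the prefix is exactly as documented above; split_response is a hypothetical helper, not part of the tool.

```python
# Prefix documented for failed generations (assumed to be stable).
ERROR_PREFIX = "Error generating response:"


def split_response(response: str) -> tuple[bool, str]:
    """Return (ok, text): the answer on success, the error details on failure."""
    if response.startswith(ERROR_PREFIX):
        return False, response[len(ERROR_PREFIX):].strip()
    return True, response


ok, text = split_response("Error generating response: CUDA out of memory")
print(ok, text)
```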

Supported Query Types

LlaSMol supports 14 distinct chemistry task types, organized into four categories:

1. Name Conversion (4 tasks)

IUPAC to Molecular Formula (conversion)
Convert an IUPAC name to its molecular formula.
Example: "What is the molecular formula of <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC>?"
IUPAC to SMILES (conversion)
Convert an IUPAC name to SMILES notation.
Example: "Please provide the SMILES representation for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>."
SMILES to IUPAC (conversion)
Convert SMILES notation to an IUPAC name.
Example: "Can you tell me the IUPAC name of <SMILES> C1CCOC1 </SMILES>?"
SMILES to Molecular Formula (conversion)
Convert SMILES notation to a molecular formula.
Example: "What is the molecular formula for <SMILES> S=P1(N(CCCl)CCCl)NCCCO1 </SMILES>?"

2. Property Prediction (6 tasks)

Solubility (ESOL) (property)
Predict the aqueous solubility of a molecule.
Example: "How soluble is <SMILES> CC(C)Cl </SMILES>?"
Lipophilicity (LIPO) (property)
Predict the lipophilicity (octanol/water distribution coefficient, logD) of a molecule.
Example: "What is the lipophilicity of <SMILES> CCO </SMILES>?"
Blood-Brain Barrier Permeability (BBBP) (property)
Predict whether a molecule can penetrate the blood-brain barrier.
Example: "Can <SMILES> CC(C)Cc1ccc(cc1)C(C)C(=O)O </SMILES> cross the blood-brain barrier?"
Clinical Toxicity (Clintox) (property)
Predict the clinical toxicity of a molecule.
Example: "Is <SMILES> CN1C=NC2=C1C(=O)N(C(=O)N2C)C </SMILES> toxic?"
HIV Inhibition (property)
Predict whether a molecule inhibits HIV replication.
Example: "Does <SMILES> CC(=O)Nc1ccc(O)cc1 </SMILES> inhibit HIV?"
Side Effects (property)
Predict potential side effects of a molecule.
Example: "What are the side effects of <SMILES> CC(C)NCC(COc1ccccc1)O </SMILES>?"

3. Molecule Description (2 tasks)

Molecule Captioning (description)
Generate a natural language description of a molecule’s structure and properties.
Example: "Describe this molecule: <SMILES> CCOC(=O)C1=CN=CN1[C@H](C)C1=CC=CC=C1 </SMILES>"
Molecule Generation (description)
Generate SMILES for a molecule matching a natural language description.
Example: "Generate a molecule that is a beta-blocker with moderate lipophilicity."

4. Reaction Prediction (2 tasks)

Forward Synthesis (reaction)
Predict reaction products from given reactants.
Example: "What are the products of the reaction between <SMILES> CCO </SMILES> and <SMILES> CC(=O)O </SMILES>?"
Retrosynthesis (reaction)
Predict the reactants needed to synthesize a target molecule.
Example: "What reactants are needed to synthesize <SMILES> CC(=O)OCC </SMILES>?"
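
The four categories and 14 task types listed above can be captured in a small lookup table, for example to reject unsupported requests before they reach the model. This is only a sketch: SUPPORTED_TASKS and category_of are hypothetical names, and the task labels are copied from this page rather than from the tool's source.

```python
from typing import Optional

# Task catalogue transcribed from the lists above (illustrative only).
SUPPORTED_TASKS = {
    "Name Conversion": [
        "IUPAC to Molecular Formula",
        "IUPAC to SMILES",
        "SMILES to IUPAC",
        "SMILES to Molecular Formula",
    ],
    "Property Prediction": [
        "Solubility (ESOL)",
        "Lipophilicity (LIPO)",
        "Blood-Brain Barrier Permeability (BBBP)",
        "Clinical Toxicity (Clintox)",
        "HIV Inhibition",
        "Side Effects",
    ],
    "Molecule Description": ["Molecule Captioning", "Molecule Generation"],
    "Reaction Prediction": ["Forward Synthesis", "Retrosynthesis"],
}


def category_of(task: str) -> Optional[str]:
    """Return the category a task belongs to, or None if it is not supported."""
    for category, tasks in SUPPORTED_TASKS.items():
        if task in tasks:
            return category
    return None


print(sum(len(tasks) for tasks in SUPPORTED_TASKS.values()))  # total task count
```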

LlaSMol Integration

The tool uses the LLM4Chem package with the following configuration:
from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration("osunlp/LlaSMol-Mistral-7B", device="cuda")

Model Details

  • Model: osunlp/LlaSMol-Mistral-7B
  • Device: CUDA (GPU required)
  • Framework: LLM4Chem generation pipeline

Usage Examples

Example 1: SMILES to IUPAC Conversion

Query:
Can you tell me the IUPAC name of <SMILES> C1CCOC1 </SMILES>?
Response:
oxolane

Example 2: Molecule Description

Query:
Describe this molecule: <SMILES> CCOC(=O)C1=CN=CN1[C@H](C)C1=CC=CC=C1 </SMILES>
Response:
This molecule is an ethyl ester derivative containing an imidazole ring. It features a chiral center with S-configuration and a phenyl substituent. The compound exhibits both aromatic and ester functional groups.

Example 3: Solubility Prediction

Query:
How soluble is <SMILES> CC(C)Cl </SMILES>?
Response:
The predicted solubility (ESOL) is -1.23 log(mol/L), indicating moderate solubility in water.

System Requirements

GPU and VRAM

This tool requires:
  • CUDA-capable GPU
  • Sufficient VRAM to load LlaSMol-Mistral-7B (~14 GB recommended)
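
The 14 GB figure is consistent with a back-of-the-envelope estimate: roughly 7 billion parameters at 2 bytes each (16-bit weights) is about 14 GB, before activations and cache. The sketch below shows that arithmetic plus a pre-flight check using PyTorch. Both function names are hypothetical, the parameter count and precision are assumptions, and the check simply returns False when PyTorch or CUDA is missing.

```python
def estimate_weight_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def gpu_has_vram(required_gb: float, device_index: int = 0) -> bool:
    """True if a CUDA device with at least required_gb of total memory is present."""
    try:
        import torch  # imported lazily so the check degrades gracefully
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    total_bytes = torch.cuda.get_device_properties(device_index).total_memory
    return total_bytes / 1e9 >= required_gb


needed = estimate_weight_gb(7e9)  # assumes ~7 billion parameters in 16-bit precision
print(f"Estimated weights: {needed:.1f} GB, GPU sufficient: {gpu_has_vram(needed)}")
```

Total memory is an upper bound; memory already in use by other processes is not accounted for here.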

LOW_VRAM Mode

When LOW_VRAM=True in the configuration:
if LOW_VRAM:
    raise RuntimeError(
        "answer_chemistry_query tool cannot be used with LOW_VRAM enabled."
    )
The tool will raise a runtime error and return:
LlaSmol model unused. Low VRAM enabled.

Response Tracking

The tool stores responses in the llasmol_response module:
llasmol_response.formatted_input  # Stores the tagged input query
llasmol_response.model_response   # Stores the full model response
This allows other components to access the query history and model outputs.
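
As a sketch of how another component might read these fields, the helper below copies them into a plain dict. It assumes only the two attribute names listed above; snapshot_last_exchange is hypothetical, and a SimpleNamespace stands in for the real llasmol_response module so the example runs without the tool installed.

```python
from types import SimpleNamespace
from typing import Any, Dict


def snapshot_last_exchange(tracker: Any) -> Dict[str, Any]:
    """Copy the tracked fields into a plain dict; missing fields become None."""
    return {
        "formatted_input": getattr(tracker, "formatted_input", None),
        "model_response": getattr(tracker, "model_response", None),
    }


# Stand-in for the llasmol_response module (illustration only).
fake_tracker = SimpleNamespace(
    formatted_input="Can you tell me the IUPAC name of <SMILES> C1CCOC1 </SMILES>?",
    model_response="oxolane",
)
print(snapshot_last_exchange(fake_tracker))
```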

Error Handling

The tool handles several error conditions:
  1. LOW_VRAM enabled: Raises RuntimeError
  2. Generation failure: Returns "Error generating response: [error details]"
  3. Invalid query format: May produce unexpected results if tags are missing

Best Practices

  1. Always preprocess queries: Use structure_chem_prompt first to tag chemical identifiers
  2. Validate SMILES: Consider using validate_smiles_rdkit to check SMILES validity before querying
  3. Check VRAM: Ensure sufficient GPU memory is available
  4. Use specific queries: LlaSMol performs best with clear, specific questions matching the supported task types

Workflow Example

# Step 1: Structure the input
structured = structure_chem_prompt(
    "What is the molecular formula of 2,5-diphenyl-1,3-oxazole?"
)

# Step 2: Answer the chemistry query
if not LOW_VRAM:
    result = answer_chemistry_query(structured["new_prompt"])
    print(result)  # Output: C15H11NO