> ## Documentation Index
> Fetch the complete documentation index at: https://mintlify.com/pranavkrishnasuresh/chemAgent/llms.txt
> Use this file to discover all available pages before exploring further.

# structure_chem_prompt

> Structure and tag chemical information in queries for LlaSMol preprocessing

## Overview

The `structure_chem_prompt` tool formats chemistry queries by adding appropriate XML tags around SMILES and IUPAC chemical identifiers. This preprocessing step ensures that the `answer_chemistry_query` tool can correctly parse and process chemical information.

**IMPORTANT**: Always pass the output of this tool directly to the `answer_chemistry_query` tool.

## Function Signature

```python theme={null}
@tool
def structure_chem_prompt(original_prompt: str) -> dict:
    """Structure and tag IUPAC or SMILES chemical information for preprocessing the input query.
    PASS THE OUTPUT OF THIS TOOL DIRECTLY TO THE 'answer_chemistry_query' Tool!!
    """
```

## Parameters

<ParamField path="original_prompt" type="str" required>
  The unstructured chemistry query containing SMILES representations or IUPAC names that need to be tagged.
</ParamField>

## Response

<ResponseField name="new_prompt" type="str">
  The formatted query with chemical information wrapped in `<SMILES>` and `<IUPAC>` tags.
</ResponseField>

<ResponseField name="error" type="str">
  Error message if the structuring process fails. Only present when an error occurs.
</ResponseField>

## How It Works

The tool uses GPT-4o with structured outputs to identify and tag chemical information based on the following system prompt:

```python theme={null}
SYSTEM_TAG_PROMPT = """
You are an EXPERT chemical information tagger. I will give you an INPUT QUERY and your task is to format it based on the information below.
You MUST return ONLY the formatted input query!!
When processing chemical information, use only two tags in the input query: <SMILES> for SMILES representations and <IUPAC> for IUPAC names. The input should include only <SMILES> and <IUPAC> tags ONLY when needed to mark chemical information.

Tag Definitions:
SMILES: <SMILES> ... </SMILES> for chemical structure in SMILES notation.
IUPAC: <IUPAC> ... </IUPAC> for the IUPAC name of the compound.

Instructions:
1. In the input query, use only the <SMILES> and <IUPAC> tags to wrap the SMILES representation or IUPAC name.
2. Ensure no extra characters or spaces are present within the tags.
"""
```

## Tag Definitions

### SMILES Tag

```xml theme={null}
<SMILES> ... </SMILES>
```

Wraps chemical structures in SMILES notation. Ensures no extra characters or spaces within the tags.

### IUPAC Tag

```xml theme={null}
<IUPAC> ... </IUPAC>
```

Wraps IUPAC names of compounds. Maintains exact naming without modifications.

## Examples

### Example 1: IUPAC to Molecular Formula

**Input:**

```
What is the molecular formula of 2,5-diphenyl-1,3-oxazole?
```

**Output:**

```json theme={null}
{
  "new_prompt": "What is the molecular formula of <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC>?"
}
```

### Example 2: IUPAC to SMILES

**Input:**

```
Please provide the SMILES representation for 4-ethyl-4-methyloxolan-2-one.
```

**Output:**

```json theme={null}
{
  "new_prompt": "Please provide the SMILES representation for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>."
}
```

### Example 3: SMILES to Molecular Formula

**Input:**

```
What is the molecular formula for S=P1(N(CCCl)CCCl)NCCCO1?
```

**Output:**

```json theme={null}
{
  "new_prompt": "What is the molecular formula for <SMILES> S=P1(N(CCCl)CCCl)NCCCO1 </SMILES>?"
}
```

### Example 4: SMILES to IUPAC

**Input:**

```
Translate CCC(C)C1CNCCCNC1 to its IUPAC name.
```

**Output:**

```json theme={null}
{
  "new_prompt": "Translate <SMILES> CCC(C)C1CNCCCNC1 </SMILES> to its IUPAC name."
}
```

### Example 5: Property Prediction (ESOL Solubility)

**Input:**

```
How soluble is CC(C)Cl?
```

**Output:**

```json theme={null}
{
  "new_prompt": "How soluble is <SMILES> CC(C)Cl </SMILES>?"
}
```

### Example 6: Multiple Chemical Identifiers

**Input:**

```
What is the molecular formula of the compound with this IUPAC name 2,5-diphenyl-1,3-oxazole and what is the name of C1CCOC1?
```

**Output:**

```json theme={null}
{
  "new_prompt": "What is the molecular formula of the compound with this IUPAC name <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC> and what is the name of <SMILES> C1CCOC1 </SMILES>?"
}
```

## Implementation Details

* **Model**: GPT-4o with structured outputs
* **Temperature**: 0 (deterministic)
* **Max Tokens**: 1024
* **Response Format**: Uses Pydantic `StructuredPrompt` model

## Error Handling

If an error occurs during the structuring process, the tool returns:

```json theme={null}
{
  "error": "Error generating structured prompt: [error details]"
}
```

## Workflow Integration

Typical workflow:

1. User provides unstructured chemistry query
2. `structure_chem_prompt` tags chemical identifiers
3. Tagged output passed to `answer_chemistry_query`
4. LlaSMol processes the structured query

```python theme={null}
# Step 1: Structure the prompt
structured = structure_chem_prompt("What is the IUPAC name of C1CCOC1?")

# Step 2: Pass to chemistry query tool
result = answer_chemistry_query(structured["new_prompt"])
```
