Chemistry Compare
POST /science/v1/chemistry/compare
What it does
Send two molecules in any supported notation — SMILES, InChI, or MDL Molfile — and find out how they relate. Powered by RDKit Morgan fingerprints and substructure matching.
Comparison modes
| Mode | Tier | What it tells you |
|---|---|---|
| identity | Free | Are these the same molecule? Compares canonical SMILES — handles atom reordering, aromatic/Kekulé differences, and notation variants. |
| similarity | Free | How similar are they? Tanimoto coefficient (0–1) on Morgan circular fingerprints, with human-readable interpretation. |
| substructure | Premium | Is molecule A inside molecule B (or vice versa)? Graph-based substructure matching for scaffold/fragment detection. |
| full | Premium | All of the above in a single request. |
Invalid molecules are not errors
If one or both molecules fail to parse, the API still returns HTTP 200 with data.parseErrors describing what went wrong — consistent with how Chemistry Validate handles invalid inputs. No comparison is performed, but the response is structured and safe for pipelines.
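A minimal sketch of how a pipeline might branch on this behaviour. It assumes the parsed JSON body is a plain dict and that each entry in data.parseErrors carries index, hint, and hintCode keys; those names are taken from the tips on this page, but the exact entry shape is an assumption, and the sample body below is illustrative rather than a captured response.

```python
from typing import Any


def parse_failures(body: dict[str, Any]) -> list[str]:
    """Return readable parse failures, or an empty list if both molecules parsed."""
    data = body.get("data") or {}
    errors = data.get("parseErrors") or []
    messages = []
    for err in errors:
        # "index", "hint" and "hintCode" are assumed key names for each entry.
        index = err.get("index", "?")
        hint = err.get("hint") or err.get("hintCode") or "no hint provided"
        messages.append(f"molecule {index}: {hint}")
    return messages


# Hypothetical body for one unparseable input; the hint text and code are made up.
sample = {
    "success": True,
    "data": {
        "parseErrors": [
            {"index": 1, "hintCode": "UNCLOSED_RING", "hint": "Ring bond 1 is never closed"}
        ]
    },
}

for line in parse_failures(sample):
    print(line)
```

Because the status code stays 200, the check has to look at the body rather than rely on an HTTP exception.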
Why use it?
- Dedup compound databases. Identity mode tells you if two differently-written SMILES are actually the same molecule. Use it as a batch dedup step before expensive downstream processing.
- Rank drug candidates by similarity. Generate variants → compare each to a reference molecule → sort by Tanimoto score. The interpretation label ("very similar", "similar", "moderately similar", "dissimilar", etc.) is ready for reports without manual threshold logic.
- Scaffold matching. Does this compound contain a benzene ring? A sulfonamide group? Substructure mode answers fragment-in-molecule questions that regular string matching can't.
- Cross-format comparison. Compare an InChI string against a SMILES string without pre-conversion. The API normalizes both inputs before comparing.
- No RDKit installation required. Molecular fingerprinting and substructure search without C++ builds, Python bindings, or WASM setup. One HTTP call.
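The ranking workflow from the list above could be sketched roughly as follows. The helper name rank_by_similarity and the candidate names are hypothetical, the scores are invented for illustration, and the code assumes each body follows the data.similarity.tanimoto layout shown in the similarity example on this page.

```python
from typing import Any


def rank_by_similarity(results: dict[str, dict[str, Any]]) -> list[tuple[str, float]]:
    """Sort candidate names by Tanimoto score, highest first.

    `results` maps a candidate name to the parsed JSON body returned for
    (reference, candidate) in similarity mode. Bodies without a similarity
    block (for example, when an input failed to parse) are skipped.
    """
    scored = []
    for name, body in results.items():
        similarity = (body.get("data") or {}).get("similarity")
        if similarity is not None:
            scored.append((name, similarity["tanimoto"]))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative bodies only; the scores below are made up for the example.
responses = {
    "candidate-a": {"success": True, "data": {"similarity": {"tanimoto": 0.41}}},
    "candidate-b": {"success": True, "data": {"similarity": {"tanimoto": 0.88}}},
    "candidate-c": {"success": True, "data": {"parseErrors": [{"index": 1}]}},
}
print(rank_by_similarity(responses))
```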
Examples
Identity check — same molecule, different notation
curl -X POST https://api.creatornode.io/science/v1/chemistry/compare \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "molecules": [
      { "input": "CC(=O)OC1=CC=CC=C1C(=O)O" },
      { "input": "CC(=O)Oc1ccccc1C(=O)O" }
    ],
    "mode": "identity"
  }'
Response: identical
{
  "success": true,
  "data": {
    "molecules": [
      { "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O", "format": "smiles" },
      { "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O", "format": "smiles" }
    ],
    "identity": {
      "identical": true
    }
  },
  "meta": {
    "requestId": "abc-123",
    "processingTimeMs": 8,
    "rdkitVersion": "2025.3.4"
  }
}
Similarity: Aspirin vs Ibuprofen
{
  "molecules": [
    { "input": "CC(=O)Oc1ccccc1C(=O)O" },
    { "input": "CC(C)Cc1ccc(cc1)C(C)C(=O)O" }
  ],
  "mode": "similarity"
}
Response: similarity score
{
  "success": true,
  "data": {
    "molecules": [
      { "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O", "format": "smiles" },
      { "canonicalSmiles": "CC(C)Cc1ccc(C(C)C(=O)O)cc1", "format": "smiles" }
    ],
    "similarity": {
      "tanimoto": 0.29,
      "fingerprintRadius": 2,
      "fingerprintBits": 2048,
      "interpretation": "unrelated"
    }
  },
  "meta": {
    "requestId": "def-456",
    "processingTimeMs": 10,
    "rdkitVersion": "2025.3.4"
  }
}
Full mode: everything at once (Premium)
{
  "molecules": [
    { "input": "c1ccccc1" },
    { "input": "c1ccc(cc1)O" }
  ],
  "mode": "full"
}
Returns identity, similarity, and substructure blocks in a single response. Requires a Premium API key.
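For callers who prefer Python to curl, the same request could be sketched with the standard library alone. The URL, headers, and body fields mirror the curl example above; the function names build_compare_request and compare are hypothetical, and the 10-second timeout is an arbitrary choice for this sketch.

```python
import json
import urllib.request

API_URL = "https://api.creatornode.io/science/v1/chemistry/compare"


def build_compare_request(mol_a: str, mol_b: str, mode: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "molecules": [{"input": mol_a}, {"input": mol_b}],
        "mode": mode,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def compare(mol_a: str, mol_b: str, mode: str, api_key: str) -> dict:
    """Send the request and return the parsed JSON body."""
    request = build_compare_request(mol_a, mol_b, mode, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Building the request does not touch the network; call compare() to send it.
request = build_compare_request("c1ccccc1", "c1ccc(cc1)O", "full", "YOUR_KEY")
print(request.get_method(), request.full_url)
```

Note that urllib.request.urlopen raises urllib.error.HTTPError on non-2xx statuses, so authentication or tier problems would surface as exceptions, while unparseable molecules still arrive as a normal 200 body.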
Comparison modes in detail
| Mode | Algorithm | Output |
|---|---|---|
| identity | Canonical SMILES string equality | identical: true/false |
| similarity | Morgan fingerprint → Tanimoto coefficient | tanimoto: 0.0–1.0 + interpretation label |
| substructure | RDKit get_substruct_match() graph search | aInB, bInA, relationship |
| full | All three combined | All output blocks in one response |
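To make the similarity row concrete: the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below illustrates only that formula on toy bit positions; it does not generate Morgan fingerprints and is not the service's implementation.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(bits_a | bits_b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(bits_a & bits_b) / union


# Toy on-bit positions, not real Morgan fingerprints.
fp_a = {3, 17, 42, 128, 900}
fp_b = {3, 42, 128, 511, 1300, 2047}

# 3 shared bits out of 8 distinct bits in total.
print(tanimoto(fp_a, fp_b))
```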
Similarity interpretation thresholds
| Tanimoto range | Interpretation | Meaning |
|---|---|---|
| ≥ 0.95 | identical | Essentially the same structure |
| ≥ 0.85 and < 0.95 | very similar | Close analogs, minor substituent differences |
| ≥ 0.70 and < 0.85 | similar | Same scaffold family, moderate variation |
| ≥ 0.50 and < 0.70 | moderately similar | Shared substructure elements |
| ≥ 0.30 and < 0.50 | dissimilar | Limited structural overlap |
| < 0.30 | unrelated | Different molecular classes |
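If a client ever needs to reproduce the labels locally (for instance on cached scores), the table translates into a short lookup. This is a sketch of the thresholds as documented here; how the API rounds a score before labelling it is not stated on this page, so edge cases near a boundary may differ.

```python
# (lower bound, label) pairs copied from the thresholds table, highest first.
THRESHOLDS = [
    (0.95, "identical"),
    (0.85, "very similar"),
    (0.70, "similar"),
    (0.50, "moderately similar"),
    (0.30, "dissimilar"),
]


def interpret(tanimoto: float) -> str:
    """Map a Tanimoto score in [0, 1] to the documented interpretation label."""
    if not 0.0 <= tanimoto <= 1.0:
        raise ValueError(f"Tanimoto score out of range: {tanimoto}")
    for lower, label in THRESHOLDS:
        if tanimoto >= lower:
            return label
    return "unrelated"


for score in (0.97, 0.72, 0.30, 0.12):
    print(score, interpret(score))
```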
Fingerprint options
| Option | Type | Default | What it does |
|---|---|---|---|
| fingerprintRadius | number (1–4) | 2 | Morgan fingerprint radius. Higher = more extended features captured. |
| fingerprintBits | 1024 or 2048 | 2048 | Bit vector length. 2048 is standard; 1024 is faster but slightly less precise. |
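A client-side check of these two options might look like the sketch below. The helper name fingerprint_options is hypothetical, and treating the radius as an integer is an assumption (the table only says "number"). Where the options sit in the request body is not shown on this page, so the sketch only validates the values and leaves placement to the API Reference.

```python
def fingerprint_options(radius: int = 2, bits: int = 2048) -> dict[str, int]:
    """Validate the documented ranges and return the two options as a dict."""
    # Assumption: radius is an integer; bool is rejected because it subclasses int.
    if isinstance(radius, bool) or not isinstance(radius, int) or not 1 <= radius <= 4:
        raise ValueError("fingerprintRadius must be an integer from 1 to 4")
    if bits not in (1024, 2048):
        raise ValueError("fingerprintBits must be 1024 or 2048")
    return {"fingerprintRadius": radius, "fingerprintBits": bits}


print(fingerprint_options())
print(fingerprint_options(radius=3, bits=1024))
```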
💡 Auto-detection: You don't need to specify format on each molecule. The API detects it from the input: InChI strings start with InChI=, Molfiles contain V2000/V3000, and everything else is treated as SMILES.
Tips & tricks
💡 Tip: For the full request/response schema, OpenAPI spec, and the interactive demo endpoint, see the API Reference.
- Let format auto-detection work for you. Both molecules default to format: "auto". The API detects SMILES, InChI, and Molfile automatically, so you can even compare an InChI against a SMILES.
- Tanimoto interpretation guide. The interpretation field saves you from hardcoding thresholds: ≥ 0.95 is identical, ≥ 0.85 very similar, ≥ 0.7 similar, ≥ 0.5 moderately similar, ≥ 0.3 dissimilar, and < 0.3 unrelated.
- Fingerprint radius matters. The default radius of 2 is standard for drug-likeness comparisons. Increase to 3 or 4 to capture more extended molecular features, which is useful for distinguishing closely related analogs.
- Substructure for fragment screening. Use "mode": "substructure" to check if a pharmacophore fragment is present in a larger molecule. The relationship field tells you the direction: "a_in_b", "b_in_a", "mutual", or "none".
- Parse errors don't crash your pipeline. If one molecule is invalid, you get a parseErrors array with the index plus a hint/hintCode to fix the input: no HTTP error, no exception, just structured data.
- Input limits are in bytes, not characters. Sizes are counted on the UTF-8 encoded input, so any non-ASCII character (for example in a Molfile title line) takes more than 1 byte. Free tier: 200 bytes per molecule, Premium: 4,000 bytes.
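The auto-detection rule and the byte-based limit described on this page can be approximated client-side, which may help when deciding what to send. This is a sketch of the stated rule only, not the service's detector. The format names "inchi" and "molfile" are assumed by analogy with the "smiles" value seen in the example responses.

```python
def detect_format(text: str) -> str:
    """Approximate the documented rule: InChI prefix, Molfile marker, else SMILES."""
    if text.startswith("InChI="):
        return "inchi"  # assumed label
    if "V2000" in text or "V3000" in text:
        return "molfile"  # assumed label
    return "smiles"


def byte_size(text: str) -> int:
    """Size of the input as UTF-8 bytes, which is what the limits are measured in."""
    return len(text.encode("utf-8"))


print(detect_format("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"))
print(detect_format("CC(=O)Oc1ccccc1C(=O)O"))
# Four characters, but five bytes, because the accented letter needs two.
print(len("caf\u00e9"), byte_size("caf\u00e9"))
```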
Cost & Limits
| Feature | Detail |
|---|---|
| Credit cost | 3 credits per request |
| Input formats | SMILES, InChI, MDL Molfile (V2000/V3000) |
| Free modes | identity, similarity |
| Premium modes | substructure, full |
| Engine | RDKit |
Tier Limits
| Limit | Free | Premium |
|---|---|---|
| Max input size (per molecule) | 200 bytes | 4,000 bytes |
| Identity mode | Yes | Yes |
| Similarity mode | Yes | Yes |
| Substructure mode | No | Yes |
| Full mode | No | Yes |
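A pre-flight check built from this table could catch tier problems before a request is sent. The sketch below copies the limits and modes from the table; the names TIERS and preflight are hypothetical, and it assumes the size limit is inclusive (an input of exactly 200 bytes passes on the Free tier), which this page does not state.

```python
TIERS = {
    "free": {"max_bytes": 200, "modes": {"identity", "similarity"}},
    "premium": {
        "max_bytes": 4000,
        "modes": {"identity", "similarity", "substructure", "full"},
    },
}


def preflight(tier: str, mode: str, inputs: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the request looks sendable."""
    rules = TIERS[tier]
    problems = []
    if mode not in rules["modes"]:
        problems.append(f"mode '{mode}' is not available on the {tier} tier")
    for i, text in enumerate(inputs):
        size = len(text.encode("utf-8"))  # limits are in bytes, not characters
        if size > rules["max_bytes"]:
            problems.append(f"molecule {i} is {size} bytes, limit is {rules['max_bytes']}")
    return problems


print(preflight("free", "full", ["c1ccccc1", "c1ccc(cc1)O"]))
print(preflight("premium", "full", ["c1ccccc1", "c1ccc(cc1)O"]))
```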