ImmunoVerse-Chat | Yarmarkovich Lab

How it works

From question to validated answer in three steps.

Ask in plain English

"Top targets for neuroblastoma," "verify QYNPIRTTF in IEDB," "fetch 8EK5 and analyze binding." No SQL, no JSON, no manual joins.

Agent dispatches the right tools

An LLM orchestrates 60+ tools — atlas filtering, CEDAR & IEDB epitope evidence, Human Protein Atlas expression/safety, ClinicalTrials.gov, RCSB PDB search/fetch, 3D interaction scoring, TesOrAI cross-ref.

Every number is validated

Numbers are pre-computed in Python, not invented by the LLM. A post-response checker cross-references every figure against tool output and regenerates if anything drifts.

A real exchange

This is what an answer looks like.

No prose hallucinations. Every score, every sample count, every safety flag traces back to deterministic Python.

User

Show me highly ranked peptides for neuroblastoma that are tumor-restricted and have published evidence.

ImmunoVerse-Chat

Searching ImmunoVerse atlas → cross-checking IEDB for published evidence → validating safety against HLA Ligand Atlas…

Top tumor-specific candidates (NBL):
1. QYNPIRTTF (PHOX2B)  Score 15.30  TSI 6.09  4/16 samples
2. SLLQHLIGL (MYCN)    Score 11.42  TSI 4.85  6/16 samples
3. AALLGLLAL (GPR133)  Score  9.18  TSI 3.72  3/16 samples

PHOX2B · HLA-A*24:02 · QYNPIRTTF · IEDB ID 1267996

QYNPIRTTF has 37 IEDB assays, 5 published papers, and two PDB structures (7MJA, 8EK5). Want me to fetch 8EK5 and run a binding analysis?

→ Fetch 8EK5 and analyze binding
→ Clinical trials for PHOX2B
→ Export to CSV

Try it yourself

Natural-language discovery

Move from complex SQL queries and CSV manipulation to conversational data exploration without losing technical rigor.

Anti-hallucination by design

Numbers are computed deterministically in Python and passed to the LLM as structured facts. Every response is cross-checked before the user sees it.

Bench-to-structure

Bridge identified targets to RCSB structures, 3D binding-interaction scoring, IEDB epitope evidence, and ClinicalTrials.gov — in a single conversation.

Capabilities

Built for end-to-end target discovery.

From atlas filtering all the way to a validated 3D binding analysis — without leaving the chat.

top_targets filter_peptides composite_score

Therapeutic ranking

Composite scoring across tumor specificity (TSI), sample prevalence, MS evidence, surface abundance, normal-tissue safety, essentiality, and clinical-stage bonuses. Every score is interpretable down to its components.

QYNPIRTTF (PHOX2B, NBL) · Score 15.30

TSI 6.09 · Prevalence 4/16 · MS confidence 0.71 · Safety: caution
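The component weights are described in the preprint, not on this page. As a hedged illustration of a score that stays "interpretable down to its components", the sketch below sums weighted terms and returns each term next to the total. The `Peptide` fields, every weight, the safety penalties, and the clinical bonus are hypothetical placeholders, not ImmunoVerse's values.

```python
from dataclasses import dataclass

# Hypothetical weights: placeholders only, NOT the ImmunoVerse values (see the preprint).
WEIGHTS = {"tsi": 1.0, "prevalence": 5.0, "ms_confidence": 2.0, "surface": 1.0, "essentiality": 0.5}
SAFETY_PENALTY = {"safe": 0.0, "caution": 1.0, "unsafe": 5.0}  # hypothetical
CLINICAL_BONUS = 1.0  # hypothetical bonus for clinical-stage targets


@dataclass
class Peptide:
    sequence: str
    tsi: float            # tumor-specificity index
    n_pos: int            # samples where the peptide was detected
    n_total: int          # samples in the cohort
    ms_confidence: float  # 0..1
    surface: float        # 0..1, surface-abundance proxy (hypothetical scale)
    essentiality: float   # 0..1 (hypothetical scale)
    safety: str           # "safe" | "caution" | "unsafe"
    clinical_stage: bool = False


def composite_score(p: Peptide) -> tuple[float, dict[str, float]]:
    """Return (total, per-component contributions) so every rank can be explained."""
    parts = {
        "tsi": WEIGHTS["tsi"] * p.tsi,
        "prevalence": WEIGHTS["prevalence"] * p.n_pos / p.n_total,
        "ms_confidence": WEIGHTS["ms_confidence"] * p.ms_confidence,
        "surface": WEIGHTS["surface"] * p.surface,
        "essentiality": WEIGHTS["essentiality"] * p.essentiality,
        "safety": -SAFETY_PENALTY[p.safety],  # normal-tissue safety enters as a penalty
        "clinical": CLINICAL_BONUS if p.clinical_stage else 0.0,
    }
    return sum(parts.values()), parts
```

With illustrative inputs for QYNPIRTTF (TSI 6.09, 4/16 samples, MS confidence 0.71, safety "caution", plus made-up surface 0.5 and essentiality 0.2), this returns about 8.36 rather than 15.30, which is expected given the invented weights. Returning the `parts` dict is the design point: the chat can show which terms drove the rank.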

Tumor-specific search

search_tumor_specific

Dual HLA ranking

population + affinity

CEDAR & IEDB epitopes

Cancer-focused CEDAR plus broader IEDB — neoantigen/germline flags, curated mutations, T-cell & elution assays, PubMed refs.

lookup_cedar_epitope lookup_epitope_evidence lookup_iedb_epitope

Clinical trials

search_clinical_trials

Human Protein Atlas · target safety

Normal-tissue expression for off-tumor toxicity, TCGA cancer prognostics, subcellular localization (cell-surface vs intracellular), and immune-cell expression — layered on the NeoVerse profile.

hpa_normal_tissue_expression hpa_cancer_prognostics hpa_subcellular_location hpa_gene_summary

Subtype-restricted targets

Surface "good but diluted" targets that cohort-wide scoring buries (e.g. ASCL1 in neuroendocrine subtypes), then confirm them with per-sample RNA.

find_focal_targets subtype_restricted_targets
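One way to surface a diluted target is to compare a peptide's prevalence inside each subtype with its cohort-wide prevalence. The sketch below is an assumption about how such a filter could work, not the `find_focal_targets` implementation; the thresholds and input shapes are hypothetical, and the per-sample RNA confirmation step is left out.

```python
from collections import defaultdict


def focal_targets(detections, subtype_of, min_subtype_prev=0.5, min_enrichment=2.0):
    """Flag peptides whose prevalence in one subtype far exceeds cohort-wide prevalence.

    detections: dict of peptide -> set of sample IDs where it was detected
    subtype_of: dict of sample ID -> subtype label (must cover every detected sample)
    Thresholds are hypothetical defaults.
    """
    subtype_sizes = defaultdict(int)
    for label in subtype_of.values():
        subtype_sizes[label] += 1
    n_total = len(subtype_of)

    hits = []
    for peptide, samples in detections.items():
        cohort_prev = len(samples) / n_total
        per_subtype = defaultdict(int)
        for sample in samples:
            per_subtype[subtype_of[sample]] += 1
        for label, count in per_subtype.items():
            sub_prev = count / subtype_sizes[label]
            # A "diluted" target: common within its subtype, rare across the cohort.
            if sub_prev >= min_subtype_prev and sub_prev / cohort_prev >= min_enrichment:
                hits.append((peptide, label, sub_prev, cohort_prev))
    return hits
```

A peptide seen in all 3 samples of a 3-sample subtype within a 10-sample cohort has cohort prevalence 0.3 but subtype prevalence 1.0, so it is flagged even though a cohort-wide ranking would place it low.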

RCSB PDB & 3D analysis

Search and fetch crystal structures from RCSB. Score salt bridges, H-bonds, pi-pi, cation-pi, hydrophobic, and disulfide contacts. Render in 3Dmol.js with one-letter labels and distance dashes.

search_rcsb_structures analyze_pdb_comprehensive batch_analyze_pdbs
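A minimal distance-cutoff sketch of interface contact typing, assuming atoms are already parsed into a hypothetical `Atom` record (PDB parsing is out of scope). The cutoffs are common rules of thumb, not AIbinder's parameters; there are no hydrogen or angle checks, and pi-pi and cation-pi terms are omitted because they need ring centroids and normals.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):  # hypothetical record for one parsed ATOM line
    chain: str
    resname: str
    resnum: int
    name: str
    element: str
    x: float
    y: float
    z: float


ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}


def dist(a: Atom, b: Atom) -> float:
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def classify(a: Atom, b: Atom):
    """Return a contact type or None; the first matching rule wins."""
    d = dist(a, b)
    ka, kb = (a.resname, a.name), (b.resname, b.name)
    if ka == kb == ("CYS", "SG") and d <= 2.5:
        return "disulfide"
    if ((ka in ACIDIC and kb in BASIC) or (ka in BASIC and kb in ACIDIC)) and d <= 4.0:
        return "salt_bridge"
    if a.element in {"N", "O"} and b.element in {"N", "O"} and d <= 3.5:
        return "h_bond"  # heavy-atom donor/acceptor distance only
    if (a.element == b.element == "C" and a.resname in HYDROPHOBIC
            and b.resname in HYDROPHOBIC and d <= 4.0):
        return "hydrophobic"  # crude: any carbon of a hydrophobic residue
    return None


def interface_contacts(atoms, chain_a: str, chain_b: str):
    """All typed contacts between two chains as (kind, residue, residue, distance)."""
    side_a = [at for at in atoms if at.chain == chain_a]
    side_b = [at for at in atoms if at.chain == chain_b]
    contacts = []
    for a in side_a:
        for b in side_b:
            kind = classify(a, b)
            if kind:
                contacts.append((kind, f"{a.resname}{a.resnum}", f"{b.resname}{b.resnum}", round(dist(a, b), 2)))
    return contacts
```

`interface_contacts(atoms, "C", "A")` lists peptide-to-MHC contacts. Because the first matching rule wins, an Asp-Lys pair at 3 Å is reported once as a salt bridge rather than also as an H-bond.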

TesOrAI cross-ref

crossref_neoverse

Architecture

An agent loop that doesn't make things up.

An LLM orchestrates 60+ tools across pre-computed atlases and live external APIs. The model never invents numbers — it composes natural prose around facts produced by deterministic Python.

LLM layer · multi-model fallback

LiteLLM routes through gpt-4o-mini → gpt-4o → gemini-2.0-flash → gemini-2.5-flash → gpt-4-turbo. Dynamic max_tokens (2048/3072/4096) scales with result size.
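The fallback chain amounts to "try the next model on failure". The sketch below assumes `call` is a hypothetical wrapper around whatever client is in use (for example `litellm.completion`); the model order is taken from this page, while the row-count thresholds behind the 2048/3072/4096 tiers are invented for illustration.

```python
from typing import Callable

# Fallback order as listed on this page.
MODEL_CHAIN = ["gpt-4o-mini", "gpt-4o", "gemini-2.0-flash", "gemini-2.5-flash", "gpt-4-turbo"]


def pick_max_tokens(n_result_rows: int) -> int:
    # Hypothetical thresholds; only the three tiers are stated on this page.
    if n_result_rows <= 10:
        return 2048
    if n_result_rows <= 50:
        return 3072
    return 4096


def complete_with_fallback(call: Callable[[str, list, int], str], messages: list,
                           n_result_rows: int) -> tuple[str, str]:
    """Try each model in order; return (model, reply) from the first that succeeds."""
    max_tokens = pick_max_tokens(n_result_rows)
    errors = []
    for model in MODEL_CHAIN:
        try:
            return model, call(model, messages, max_tokens)
        except Exception as exc:  # rate limits, timeouts, provider outages
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Sizing `max_tokens` before the loop keeps the budget identical across fallbacks, so a retry on a different model answers under the same constraints.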

Agent loop · anti-hallucination

generate_response() → tool calls → dispatch → results → post-response validator cross-checks every number against tool output, regenerates if mismatches are found. Trace logger writes JSONL events with code hashing for reproducibility.
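The trace format is not documented here. Assuming one JSON object per line and a SHA-256 of each tool's source as the "code hash", a minimal logger could look like this; the field names and the 12-character truncation are hypothetical.

```python
import hashlib
import inspect
import json
import time


def code_hash(fn) -> str:
    """Short SHA-256 of a function's source, so a trace records which code ran."""
    try:
        src = inspect.getsource(fn).encode("utf-8")
    except (OSError, TypeError):
        src = fn.__code__.co_code  # fallback when source is unavailable
    return hashlib.sha256(src).hexdigest()[:12]


def make_event(event: str, tool=None, **payload) -> dict:
    """Build one trace record; attaches the tool name and code hash when given."""
    record = {"ts": time.time(), "event": event, **payload}
    if tool is not None:
        record["tool"] = tool.__name__
        record["code_hash"] = code_hash(tool)
    return record


def append_jsonl(path: str, record: dict) -> None:
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Each tool call would then be logged with something like `append_jsonl("trace.jsonl", make_event("tool_call", tool=some_tool, args=args))`, where `some_tool` is a placeholder. When source is unavailable (for example, code built with `exec`), the sketch hashes bytecode instead, which is stable within one Python version but not across versions.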

NeoVerse router

Peptide atlas, HLA frequencies, interpretation hints, sample coverage.

neoverse_router.py

TesOrAI router

User uploads, batch merge, cross-ref against NeoVerse + tumor-specific datasets.

tesorai_router.py

AIbinder · PDB

RCSB search/fetch, interaction scoring, PyMOL script generation.

aibinder_tools.py

HLA typing · SLURM

OptiType + PDX disambiguation pipelines on HPC clusters.

hla_typing_workflows.py

CEDAR

Cancer Epitope Database — neoantigen/germline flags, curated mutations, cancer-specific T-cell & elution assays, PubMed refs.

IEDB

Live epitope evidence — T-cell assays, MHC alleles, qualitative results, PubMed refs.

Human Protein Atlas

Normal-tissue & cancer RNA expression, TCGA prognostics, subcellular location, immune-cell expression — target-safety reference.

ClinicalTrials.gov

Gene-level trial data. The agent explicitly distinguishes gene-level from peptide-level evidence.
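The public ClinicalTrials.gov API v2 serves study records from its `/api/v2/studies` endpoint. Below is a sketch of a gene-symbol query that tags every hit as gene-level evidence, assuming a free-text `query.term` search is adequate; the `evidence_level` schema is hypothetical and this is not the agent's actual tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CTGOV_STUDIES = "https://clinicaltrials.gov/api/v2/studies"


def trials_query_url(gene: str, page_size: int = 20) -> str:
    # Free-text search on the gene symbol using public API v2 parameters.
    return CTGOV_STUDIES + "?" + urlencode({"query.term": gene, "pageSize": page_size})


def fetch_nct_ids(gene: str) -> list[str]:
    # Network call; returns NCT IDs from the first page of results only.
    with urlopen(trials_query_url(gene), timeout=30) as resp:
        data = json.load(resp)
    return [s["protocolSection"]["identificationModule"]["nctId"] for s in data.get("studies", [])]


def label_evidence(nct_ids: list[str], gene: str) -> list[dict]:
    # A trial matched on a gene symbol says nothing about a specific peptide,
    # so every hit is tagged gene-level (hypothetical schema).
    return [{"nct_id": nct, "gene": gene, "evidence_level": "gene"} for nct in nct_ids]
```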

RCSB PDB

Sequence-motif search for short peptides, metadata enrichment, on-demand downloads.
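RCSB's public Search API (v2) offers a `seqmotif` service that matches a short sequence inside deposited chains. The sketch below builds and posts such a query with the standard library. It illustrates the public API rather than AIbinder's `search_rcsb_structures`, and which entries come back (for example 7MJA or 8EK5) depends on the live index.

```python
import json
from urllib.request import Request, urlopen

RCSB_SEARCH = "https://search.rcsb.org/rcsbsearch/v2/query"


def seqmotif_query(peptide: str, rows: int = 100) -> dict:
    """Build an RCSB Search API query for entries whose protein chains contain `peptide`."""
    return {
        "query": {
            "type": "terminal",
            "service": "seqmotif",
            "parameters": {
                "value": peptide,
                "pattern_type": "simple",  # literal motif, no regex/PROSITE syntax
                "sequence_type": "protein",
            },
        },
        "return_type": "entry",
        "request_options": {"paginate": {"start": 0, "rows": rows}},
    }


def search_entries(peptide: str) -> list[str]:
    """POST the query and return matching PDB entry IDs (network call)."""
    body = json.dumps(seqmotif_query(peptide)).encode("utf-8")
    req = Request(RCSB_SEARCH, data=body, headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=30) as resp:
        if resp.status == 204:  # no matching entries
            return []
        data = json.load(resp)
    return [hit["identifier"] for hit in data.get("result_set", [])]
```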

Examples

Four scenarios, one chat.

"Show me highly ranked peptides for neuroblastoma that are shared with Ewing Sarcoma but not found in normal tissues."

Searching ImmunoVerse atlas… Found 3 shared candidates. Validating against GTEx and HLA Ligand Atlas…

1. PHOX2B (QYNPIRTTF) — Score 15.30, TSI 6.09
2. MYCN (SLLQHLIGL)   — Score 11.42, TSI 4.85
3. GPR133 (AALLGLLAL) — Score  9.18, TSI 3.72

PHOX2B is the strongest candidate. Want to check MHC-I binding for HLA-A*24:02 or search RCSB for a crystal structure?

Team

Built by the Yarmarkovich Lab.

Aman Sharma

Lead Developer

Guangyuan (Frank) Li, PhD

Postdoctoral Researcher

Xinya Liu

PhD Student

Jason Tang

Postdoctoral Researcher

Michele Palamenghi

Postdoctoral Researcher

Mark Yarmarkovich

Principal Investigator

Yarmarkovich Lab

Advancing precision immunotherapy at NYU Langone Health.

yarmarkovichlab.com

Citation

Cite our work.

BibTeX · bioRxiv

@article{Li2025ImmunoVerse,
  title  = {ImmunoVerse: A pan-cancer atlas of constitutive and
            induced antigen presentation refines immunotherapeutic
            target discovery},
  author = {Li, G. and Guzm{\'a}n-Bringas, L. and Sharma, A. and others},
  journal= {bioRxiv},
  year   = {2025},
  doi    = {10.1101/2025.01.22.634237},
  url    = {https://www.biorxiv.org/content/10.1101/2025.01.22.634237v2.full}
}

Read on bioRxiv

FAQ

Common questions.

What is ImmunoVerse-Chat?

An agentic LLM interface for the ImmunoVerse pan-cancer immunopeptidome atlas. It exposes 60+ tools — peptide ranking, HLA filtering, TesOrAI cross-referencing, CEDAR & IEDB epitope lookup, Human Protein Atlas expression/safety, subtype-restricted target discovery, ClinicalTrials.gov search, RCSB PDB analysis — through natural language conversation.

How does it prevent hallucinations?

Two layers. First, all numerical interpretations are computed deterministically in Python (scores, ratios, sample coverage, safety classifications) and passed to the LLM as structured facts. Second, a post-response validator extracts every number from the model's reply and cross-checks it against tool output before the user sees the response — mismatches trigger automatic regeneration.
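A minimal sketch of the second layer, assuming tool outputs are JSON-like dicts and lists and that "matches" means equal after rounding to two decimals. Numbers embedded in identifiers such as `8EK5` are skipped by the regex, but a real checker needs extra rules (list numbering, percentages, derived sums). `generate` is a hypothetical callable that produces one model reply.

```python
import re

# Standalone numbers only: not glued to letters, so "8EK5" or "gpt-4o" are ignored.
NUM = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?!\w)")


def collect_tool_numbers(obj, out=None) -> set:
    """Recursively gather numeric values (and numbers inside strings) from tool output."""
    if out is None:
        out = set()
    if isinstance(obj, bool):
        pass  # bools are ints in Python; they are not figures
    elif isinstance(obj, (int, float)):
        out.add(round(float(obj), 2))
    elif isinstance(obj, str):
        out.update(round(float(n), 2) for n in NUM.findall(obj))
    elif isinstance(obj, dict):
        for value in obj.values():
            collect_tool_numbers(value, out)
    elif isinstance(obj, (list, tuple)):
        for value in obj:
            collect_tool_numbers(value, out)
    return out


def unsupported_numbers(reply: str, tool_outputs: list) -> list:
    """Numbers in the reply that appear in no tool output (after rounding)."""
    allowed = set()
    for output in tool_outputs:
        collect_tool_numbers(output, allowed)
    return [n for n in NUM.findall(reply) if round(float(n), 2) not in allowed]


def answer_with_validation(generate, tool_outputs: list, max_attempts: int = 3) -> str:
    """Regenerate until every number in the reply is backed by tool output."""
    for _ in range(max_attempts):
        reply = generate()
        if not unsupported_numbers(reply, tool_outputs):
            return reply
    raise RuntimeError("reply still contains unsupported numbers")
```

For a tool result of `{"score": 15.3, "tsi": 6.09, "n_pos": 4, "n_total": 16}`, the reply "Score 15.30, TSI 6.09, 4/16 samples." passes, while "Score 15.80" is flagged and triggers another attempt.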

Is the data experimentally validated?

The NeoVerse atlas comprises antigens identified through high-throughput mass spectrometry and literature mining, each assigned a technical confidence score. The agent additionally pulls live evidence from CEDAR, IEDB, the Human Protein Atlas, and ClinicalTrials.gov.

What does the therapeutic score weigh?

A composite signal combining tumor-specificity, sample prevalence, MS evidence, surface abundance, essentiality, and a normal-tissue safety penalty, with bonuses for cross-cancer breadth and clinical-stage validation. Each component is visible in the chat so you can see which factors drove a given peptide's rank. Methodology details are described in the preprint.

Can I upload my own data?

Yes — TesOrAI mass-spec output (TSV/CSV/XLSX), PDB structures, and PyMOL scripts via the paperclip button. Up to 30 files per batch with auto-merge. Files can also be loaded from Dropbox, Google Drive, or any HTTP URL.
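A sketch of the batch auto-merge for delimited text files, assuming uploads are already read into strings, share a `peptide` column, and that the first file to report a peptide wins for its other columns. The 30-file limit comes from this page; XLSX parsing is omitted and the column name is hypothetical.

```python
import csv
import io

MAX_FILES = 30  # per-batch limit stated on this page


def merge_batch(files: dict[str, str], key: str = "peptide") -> list[dict]:
    """Merge CSV/TSV texts into one row list, de-duplicated on `key`.

    files: mapping of filename -> file text (already uploaded or downloaded)
    Raises KeyError if a file lacks the `key` column.
    """
    if len(files) > MAX_FILES:
        raise ValueError(f"batch has {len(files)} files; limit is {MAX_FILES}")
    merged: dict[str, dict] = {}
    for name, text in files.items():
        delimiter = "\t" if name.lower().endswith(".tsv") else ","
        for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
            k = row[key].strip().upper()
            if k in merged:
                merged[k]["sources"].append(name)  # keep first file's values, record provenance
            else:
                merged[k] = {**row, key: k, "sources": [name]}
    return list(merged.values())
```

Recording `sources` per peptide keeps provenance after the merge, so a downstream cross-reference can still say which runs detected a given peptide.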

Ask. Discover. Validate.

Start your first target query.