Yarmarkovich Lab · NYU Langone Health

Ask. Discover. Validate.

Conversational agentic-AI for pan-cancer immunotherapy target discovery.

Query the ImmunoVerse atlas of 28,446 antigens across 21 tumor types in plain English. Every claim cross-checked against IEDB, ClinicalTrials.gov, and RCSB PDB — with anti-hallucination validation built into every response.

chat Top 5 therapeutic targets for neuroblastoma arrow_forward
click to ask in chat →
28,446
Antigens
21
Tumor Types
11
Event Classes
55+
Agentic Tools
How it works

From question to validated answer in three steps.

chat
01

Ask in plain English

"Top targets for neuroblastoma," "verify QYNPIRTTF in IEDB," "fetch 8EK5 and analyze binding." No SQL, no JSON, no manual joins.

manufacturing
02

Agent dispatches the right tools

An LLM orchestrates 55+ tools — atlas filtering, IEDB API, ClinicalTrials.gov, RCSB PDB search/fetch, 3D interaction scoring, TesOrAI cross-ref.

verified
03

Every number is validated

Numbers are pre-computed in Python, not invented by the LLM. A post-response checker cross-references every figure against tool output and regenerates if anything drifts.

A real exchange

This is what an answer looks like.

No prose hallucinations. Every score, every sample count, every safety flag traces back to deterministic Python.

person
Show me highly ranked peptides for neuroblastoma that are tumor-restricted and have published evidence.
smart_toy
Searching ImmunoVerse atlas → cross-checking IEDB for published evidence → validating safety against HLA Ligand Atlas…
Top tumor-specific candidates (NBL):
1. QYNPIRTTF (PHOX2B)  Score 15.30  TSI 6.09  4/16 samples
2. SLLQHLIGL (MYCN)    Score 11.42  TSI 4.85  6/16 samples
3. AALLGLLAL (GPR133)  Score  9.18  TSI 3.72  3/16 samples
PHOX2B HLA-A*24:02 QYNPIRTTF IEDB ID 1267996
QYNPIRTTF has 37 IEDB assays, 5 published papers, and PDB structures 7MJA, 8EK5 available. Want me to fetch 8EK5 and run a binding analysis?
neurology

Natural-language discovery

Move from complex SQL queries and CSV manipulation to conversational data exploration without losing technical rigor.

verified_user

Anti-hallucination by design

Numbers are computed deterministically in Python and passed to the LLM as structured facts. Every response is cross-checked before the user sees it.

biotech

Bench-to-structure

Bridge identified targets to RCSB structures, 3D binding-interaction scoring, IEDB epitope evidence, and ClinicalTrials.gov — in a single conversation.

Capabilities

Built for end-to-end target discovery.

From atlas filtering all the way to a validated 3D binding analysis — without leaving the chat.

top_targets filter_peptides composite_score

Therapeutic ranking

Composite scoring across tumor specificity (TSI), sample prevalence, MS evidence, surface abundance, normal-tissue safety, essentiality, and clinical-stage bonuses. Every score is interpretable down to its components.

QYNPIRTTF (PHOX2B, NBL) Score 15.30
TSI
6.09
Prevalence
4/16
MS confidence
0.71
Safety
caution
search

Tumor-specific search

search_tumor_specific
stacks

Dual HLA ranking

population + affinity
link

IEDB epitope lookup

lookup_iedb_epitope
clinical_notes

Clinical trials

search_clinical_trials
view_in_ar

RCSB PDB & 3D analysis

Search and fetch crystal structures from RCSB. Score salt bridges, H-bonds, pi-pi, cation-pi, hydrophobic, and disulfide contacts. Render in 3Dmol.js with one-letter labels and distance dashes.

search_rcsb_structures analyze_pdb_comprehensive batch_analyze_pdbs
compare_arrows

TesOrAI cross-ref

crossref_neoverse
Architecture

An agent loop that doesn't make things up.

An LLM orchestrates 55+ tools across pre-computed atlases and live external APIs. The model never invents numbers — it composes natural prose around facts produced by deterministic Python.

model_training

LLM layer · multi-model fallback

LiteLLM routes through gpt-4o-mini → gpt-4o → gemini-2.0-flash → gemini-2.5-flash → gpt-4-turbo. Dynamic max_tokens (2048/3072/4096) scales with result size.

refresh

Agent loop · anti-hallucination

generate_response() → tool calls → dispatch → results → post-response validator cross-checks every number against tool output, regenerates if mismatches are found. Trace logger writes JSONL events with code hashing for reproducibility.

databaseNeoVerse router

Peptide atlas, HLA frequencies, interpretation hints, sample coverage.

neoverse_router.py
upload_fileTesOrAI router

User uploads, batch merge, cross-ref against NeoVerse + tumor-specific datasets.

tesorai_router.py
view_in_arAIbinder · PDB

RCSB search/fetch, interaction scoring, PyMOL script generation.

aibinder_tools.py
clusterHLA typing · SLURM

OptiType + PDX disambiguation pipelines on HPC clusters.

hla_typing_workflows.py
apiIEDB

Live epitope evidence — T-cell assays, MHC alleles, qualitative results, PubMed refs.

apiClinicalTrials.gov

Gene-level trial data. Agent explicitly distinguishes gene-level from peptide-level evidence.

apiRCSB PDB

Sequence-motif search for short peptides, metadata enrichment, on-demand downloads.

Examples

Four scenarios, one chat.

"Show me highly ranked peptides for neuroblastoma that are shared with Ewing Sarcoma but not found in normal tissues."

Searching ImmunoVerse atlas… Found 3 shared candidates. Validating against GTEx and HLA Ligand Atlas…

1. PHOX2B (QYNPIRTTF) — Score 15.30, TSI 6.09
2. MYCN (SLLQHLIGL)   — Score 11.42, TSI 4.85
3. GPR133 (AALLGLLAL) — Score  9.18, TSI 3.72

PHOX2B is the strongest candidate. Want to check MHC-I binding for HLA-A*24:02 or search RCSB for a crystal structure?

Team

Built by the Yarmarkovich Lab.

AS
Aman Sharma
Lead Developer
FL
Guangyuan (Frank) Li, PhD
Postdoctoral Researcher
XL
Xinya Liu
PhD Student
JT
Jason Tang
Postdoctoral Researcher
MP
Michele Palamenghi
Postdoctoral Researcher
MY
Mark Yarmarkovich
Principal Investigator
school
Yarmarkovich Lab
Advancing precision immunotherapy at NYU Langone Health.
Citation

Cite our work.

BibTeX · bioRxiv
@article{Li2025ImmunoVerse,
  title  = {ImmunoVerse: A pan-cancer atlas of constitutive and
            induced antigen presentation refines immunotherapeutic
            target discovery},
  author = {Li, G. and Guzm{\'a}n-Bringas, L. and Sharma, A. and others},
  journal= {bioRxiv},
  year   = {2025},
  doi    = {10.1101/2025.01.22.634237},
  url    = {https://www.biorxiv.org/content/10.1101/2025.01.22.634237v2.full}
}
Read on bioRxiv north_east
FAQ

Common questions.

What is ImmunoVerse-Chat? expand_more
An agentic LLM interface for the ImmunoVerse pan-cancer immunopeptidome atlas. It exposes 55+ tools — peptide ranking, HLA filtering, TesOrAI cross-referencing, IEDB lookup, ClinicalTrials.gov search, RCSB PDB analysis — through natural language conversation.
How does it prevent hallucinations? expand_more
Two layers. First, all numerical interpretations are computed deterministically in Python (scores, ratios, sample coverage, safety classifications) and passed to the LLM as structured facts. Second, a post-response validator extracts every number from the model's reply and cross-checks it against tool output before the user sees the response — mismatches trigger automatic regeneration.
Is the data experimentally validated? expand_more
The NeoVerse atlas comprises antigens identified through high-throughput mass spectrometry and literature mining, each assigned a technical confidence score. The agent additionally pulls live evidence from IEDB and ClinicalTrials.gov.
What does the therapeutic score weigh? expand_more
A composite signal combining tumor-specificity, sample prevalence, MS evidence, surface abundance, essentiality, and a normal-tissue safety penalty, with bonuses for cross-cancer breadth and clinical-stage validation. Each component is visible in the chat so you can see which factors drove a given peptide's rank. Methodology details are described in the preprint.
Can I upload my own data? expand_more
Yes — TesOrAI mass-spec output (TSV/CSV/XLSX), PDB structures, and PyMOL scripts via the paperclip button. Up to 30 files per batch with auto-merge. Files can also be loaded from Dropbox, Google Drive, or any HTTP URL.

Start your first target query.

Anti-hallucinating conversational AI for immunotherapy target discovery.

Launch ImmunoVerse-Chat arrow_forward