NOTEBOOK 03

Clonal Expansion & Dominance Analysis

TCR Repertoire Analysis — Notebook 03: Top expanded clonotypes, cumulative dominance, and shared clone tracking

Joshua Luthy · R + immunarch + edgeR · Synthetic Data · 2026

Contents

Analysis Overview
Top Expanded Clonotypes
Clonal Dominance Patterns
Shared Clonotype Analysis
Conclusions & Next Steps

01 Analysis Overview

This notebook examines clonal expansion patterns in the manufactured Product, identifying the most dominant clonotypes and characterizing the degree of oligoclonality. We assess whether specific clones dominate the Product and how the top clone fraction varies with clinical response.

# Identify top expanded clonotypes per sample
library(dplyr)

top_clones <- tcr_data %>%
  filter(sample_type == "Product") %>%
  group_by(patient_id) %>%
  arrange(desc(clone_fraction), .by_group = TRUE) %>%
  mutate(rank = row_number()) %>%   # rank 1 = largest clone within each patient
  filter(rank <= 20) %>%
  ungroup()
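
To make the expected input concrete, here is a minimal sketch of the ranking step on toy data. The patient IDs, CDR3 strings, and the `clone_count` column are hypothetical, and we assume `clone_fraction` is each clone's read count divided by the total reads in its sample; the real `tcr_data` may be derived differently. `slice_max()` is used here as a compact alternative to sorting and numbering rows.

```r
library(dplyr)

# Hypothetical toy input; column names mirror those used in this notebook,
# and clone_count is an assumed raw read-count column.
toy_tcr <- tibble(
  patient_id  = rep(c("PT-A", "PT-B"), each = 3),
  sample_type = "Product",
  cdr3_aa     = c("CASSA", "CASSB", "CASSC", "CASSD", "CASSE", "CASSF"),
  clone_count = c(60, 30, 10, 50, 30, 20)
) %>%
  group_by(patient_id, sample_type) %>%
  mutate(clone_fraction = clone_count / sum(clone_count)) %>%  # assumed definition
  ungroup()

# Keep the two largest clones per patient and number them by size
toy_top <- toy_tcr %>%
  filter(sample_type == "Product") %>%
  group_by(patient_id) %>%
  slice_max(clone_fraction, n = 2, with_ties = FALSE) %>%
  mutate(rank = row_number()) %>%
  ungroup()

toy_top
```

With fractions defined this way, they sum to 1 within each sample, so a rank-1 fraction can be read directly as the top clone's share of that sample's reads.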
    

02 Top Expanded Clonotypes

Below are the top 10 expanded clonotypes in the Product for patient PT-001 (CR). Note the extreme dominance of the top clone, which alone accounts for about 42% of all reads, more than five times the share of the second-ranked clone.

Rank	CDR3 Sequence	V Gene	J Gene	Count	Fraction
1	`CDWVSYQFTKRRF`	TRBV7-9	TRBJ1-1	30,790	42.46%
2	`CMGDVHRMPPGLMF`	TRBV29-1	TRBJ2-1	5,447	7.51%
3	`CYSAQCWFMKLMYEF`	TRBV29-1	TRBJ2-1	1,988	2.74%
4	`CLICPRQLPLMVNKLF`	TRBV10-3	TRBJ1-2	980	1.35%
5	`CETHNLRSMQAPQSIQVF`	TRBV12-4	TRBJ1-6	565	0.78%
6	`CVIYIPQNADGMSHIGF`	TRBV6-5	TRBJ2-5	368	0.51%
7	`CINEITGPPDMNGIVF`	TRBV18	TRBJ1-5	239	0.33%
8	`CWMWNDKGQWESRWEWIF`	TRBV15	TRBJ2-4	185	0.26%
9	`CEWFQHPQDVWDRIAF`	TRBV30	TRBJ2-3	128	0.18%
10	`CLGIVHPSSGGAHPVVF`	TRBV27	TRBJ1-1	107	0.15%
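
As a quick consistency check on the table, the Count and Fraction columns can be compared directly. The sketch below assumes Fraction is Count divided by the total reads in the sample; the total itself is not reported here, so it is back-calculated from the top clone and should be treated as an inference.

```r
# Values transcribed from the PT-001 table above
clone_count  <- c(30790, 5447, 1988, 980, 565, 368, 239, 185, 128, 107)
fraction_pct <- c(42.46, 7.51, 2.74, 1.35, 0.78, 0.51, 0.33, 0.26, 0.18, 0.15)

# Total read count implied by the top clone (an inference, not a reported value)
implied_total <- round(clone_count[1] / (fraction_pct[1] / 100))

# Recompute every percentage from its count and the implied total
recomputed_pct <- round(100 * clone_count / implied_total, 2)

data.frame(rank = 1:10, clone_count, fraction_pct, recomputed_pct)
```

Under this assumption the implied total is roughly 72,500 reads, and the recomputed percentages match the reported Fraction column row by row.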

03 Clonal Dominance Patterns

We quantify clonal dominance by examining the fraction of total reads held by the top 1, top 5, and top 10 clonotypes in each Product sample.

Figure 1. Cumulative read fraction captured by the top N clonotypes in Product samples, stratified by clinical response. CR patients show the steepest accumulation curves.

# Compute cumulative clone fraction for top clonotypes
library(dplyr)
library(ggplot2)

cumulative <- top_clones %>%
  group_by(patient_id) %>%
  arrange(rank, .by_group = TRUE) %>%
  mutate(cum_fraction = cumsum(clone_fraction)) %>%  # running total within each patient
  ungroup()

# Plot cumulative curves (one line per patient, colored by clinical response)
ggplot(cumulative, aes(x = rank, y = cum_fraction,
                        color = clinical_response, group = patient_id)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  scale_color_manual(values = c("CR" = "#00ff9d", "PR" = "#00d4ff", "PD" = "#ff6b6b")) +
  labs(x = "Clonotype Rank", y = "Cumulative Fraction") +
  theme_minimal()
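
The top 1, top 5, and top 10 fractions described in this section can also be tabulated per patient. The following is a minimal sketch on invented data: the patient IDs and fractions are hypothetical, and the table is assumed to already carry a within-patient `rank` column like the one in `top_clones`.

```r
library(dplyr)

# Hypothetical ranked clone table (values invented for illustration only)
toy_ranked <- tibble(
  patient_id = rep(c("PT-A", "PT-B"), each = 12),
  rank       = rep(1:12, times = 2),
  clone_fraction = c(
    0.30, 0.06, 0.03, 0.02, 0.01, rep(0.005, 7),  # concentrated repertoire
    0.04, 0.03, 0.03, 0.02, 0.02, rep(0.01, 7)    # diffuse repertoire
  )
)

# Fraction of reads held by the top 1, top 5, and top 10 clonotypes
dominance <- toy_ranked %>%
  group_by(patient_id) %>%
  summarise(
    top1  = sum(clone_fraction[rank <= 1]),
    top5  = sum(clone_fraction[rank <= 5]),
    top10 = sum(clone_fraction[rank <= 10]),
    .groups = "drop"
  )

dominance
```

A one-row-per-patient summary like this is easier to compare across response groups than the full curves, and it can be joined back to clinical metadata by `patient_id`.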
    

Finding

In CR patients, the top 10 clonotypes account for 30–45% of all Product reads, indicating highly focused expansion. PD patients show a flatter accumulation curve, where the top 10 clones capture only 15–25% of reads — consistent with less efficient clonal selection during manufacturing.

04 Shared Clonotype Analysis

We examine the overlap between Apheresis and Product repertoires to determine what fraction of the Product's dominant clones were detectable in the starting material.

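# Identify shared clonotypes between Apheresis and Product
library(dplyr)
library(tidyr)

shared <- tcr_data %>%
  filter(sample_type %in% c("Apheresis", "Product")) %>%
  select(patient_id, sample_type, cdr3_aa, clone_fraction) %>%
  # Sum fractions when one CDR3 amino-acid sequence appears on several rows
  pivot_wider(names_from = sample_type, values_from = clone_fraction,
              values_fn = sum, values_fill = 0) %>%
  # Keep clonotypes detected in both samples of the same patient
  filter(Apheresis > 0, Product > 0)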
    

Figure 2. Proportion of Product clonotypes that were also detected in the matched Apheresis sample. The majority of expanded Product clones originate from the starting material.
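
One way the proportion summarized in Figure 2 might be computed is sketched below on invented data. The patient ID, CDR3 strings, and fractions are hypothetical, and matching clonotypes on `cdr3_aa` alone (ignoring V and J genes) is an assumption. The sketch reports both the share of distinct Product clonotypes found in Apheresis and the share of Product reads they carry, since the two can differ substantially.

```r
library(dplyr)

# Hypothetical toy data for one patient (sequences and fractions are invented)
toy_pair <- tibble(
  patient_id     = "PT-A",
  sample_type    = rep(c("Apheresis", "Product"), each = 4),
  cdr3_aa        = c("CASSA", "CASSB", "CASSC", "CASSD",
                     "CASSA", "CASSB", "CASSE", "CASSF"),
  clone_fraction = c(0.40, 0.30, 0.20, 0.10,
                     0.60, 0.20, 0.15, 0.05)
)

# Clonotypes seen in each patient's Apheresis sample
apheresis_clones <- toy_pair %>%
  filter(sample_type == "Apheresis") %>%
  distinct(patient_id, cdr3_aa) %>%
  mutate(in_apheresis = TRUE)

# Flag each Product clonotype by whether it was detected in matched Apheresis
overlap <- toy_pair %>%
  filter(sample_type == "Product") %>%
  left_join(apheresis_clones, by = c("patient_id", "cdr3_aa")) %>%
  mutate(in_apheresis = coalesce(in_apheresis, FALSE)) %>%
  group_by(patient_id) %>%
  summarise(
    prop_clonotypes_shared = mean(in_apheresis),
    prop_reads_shared      = sum(clone_fraction[in_apheresis]),
    .groups = "drop"
  )

overlap
```

In this toy case half of the Product clonotypes are shared, but they carry most of the Product reads, which is the pattern one would expect if dominant clones originate from the starting material.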

05 Conclusions & Next Steps

Summary of Findings

1. Manufacturing drives extreme oligoclonal expansion: the top 10 Product clones capture 15–45% of reads, depending on the patient.

2. CR patients show the most concentrated Product repertoires, suggesting that efficient clonal selection may be therapeutically beneficial.

3. The majority of dominant Product clones are traceable back to the Apheresis starting material.

4. PD patients exhibit more diffuse Product repertoires with less dominant clonal expansion.

Next Steps

Apply the edgeR differential abundance framework to formally identify significantly expanding clonotypes (FDR-controlled).
Compute AUC-based clonal kinetics if longitudinal timepoints are available.
Investigate V gene segment bias in expanded vs. non-expanded clonotypes.
Correlate clonal expansion metrics with additional clinical endpoints.
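
As one possible starting point for the V gene question listed above, a single V gene can be tested for enrichment among expanded clonotypes with a 2 x 2 contingency table. Everything in this sketch is an assumption: the counts are invented, the `expanded` flag stands in for whatever definition is eventually chosen (for example, membership in the top 20), and Fisher's exact test is an illustrative choice rather than a method this notebook specifies.

```r
# Hypothetical clone-level table (invented for illustration):
# 10 TRBV29-1 clones, 6 of them expanded; 90 other clones, 14 expanded
toy_v <- data.frame(
  v_gene   = c(rep("TRBV29-1", 10), rep("other", 90)),
  expanded = c(rep(TRUE, 6), rep(FALSE, 4), rep(TRUE, 14), rep(FALSE, 76))
)

# 2 x 2 contingency table: V gene membership vs. expansion status
v_tab <- table(
  is_trbv29_1 = toy_v$v_gene == "TRBV29-1",
  expanded    = toy_v$expanded
)

# Fisher's exact test for association between V gene and expansion
v_test <- fisher.test(v_tab)
v_test$estimate  # conditional maximum-likelihood odds ratio
v_test$p.value
```

If every V gene is tested this way, the resulting p-values would need multiple-testing correction, for instance with `p.adjust(p, method = "BH")`.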