NOTEBOOK 03

Clonal Expansion & Dominance Analysis

TCR Repertoire Analysis — Notebook 03: Top expanded clonotypes, cumulative dominance, and shared clone tracking
Joshua Luthy · R + immunarch + edgeR · Synthetic Data · 2026
Contents
  1. Analysis Overview
  2. Top Expanded Clonotypes
  3. Clonal Dominance Patterns
  4. Shared Clonotype Analysis
  5. Conclusions & Next Steps

01 Analysis Overview

This notebook examines clonal expansion patterns in the manufactured Product, identifying the most dominant clonotypes and characterizing the degree of oligoclonality. We assess whether specific clones dominate the Product and how the top clone fraction varies with clinical response.

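```r
library(dplyr)

# Identify the top 20 expanded clonotypes per patient in the Product
top_clones <- tcr_data %>%
  filter(sample_type == "Product") %>%
  group_by(patient_id) %>%
  # Sort within each patient so that rank 1 is the largest clone
  arrange(desc(clone_fraction), .by_group = TRUE) %>%
  mutate(rank = row_number()) %>%
  filter(rank <= 20) %>%
  ungroup()
```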

02 Top Expanded Clonotypes

Below are the top 10 expanded clonotypes in the Product for patient PT-001 (CR). Note the extreme dominance of the top clone, which alone accounts for 42.46% of all reads, more than five times the share of the second-ranked clone (7.51%).

| Rank | CDR3 Sequence | V Gene | J Gene | Count | Fraction |
|---|---|---|---|---|---|
| 1 | CDWVSYQFTKRRF | TRBV7-9 | TRBJ1-1 | 30,790 | 42.46% |
| 2 | CMGDVHRMPPGLMF | TRBV29-1 | TRBJ2-1 | 5,447 | 7.51% |
| 3 | CYSAQCWFMKLMYEF | TRBV29-1 | TRBJ2-1 | 1,988 | 2.74% |
| 4 | CLICPRQLPLMVNKLF | TRBV10-3 | TRBJ1-2 | 980 | 1.35% |
| 5 | CETHNLRSMQAPQSIQVF | TRBV12-4 | TRBJ1-6 | 565 | 0.78% |
| 6 | CVIYIPQNADGMSHIGF | TRBV6-5 | TRBJ2-5 | 368 | 0.51% |
| 7 | CINEITGPPDMNGIVF | TRBV18 | TRBJ1-5 | 239 | 0.33% |
| 8 | CWMWNDKGQWESRWEWIF | TRBV15 | TRBJ2-4 | 185 | 0.26% |
| 9 | CEWFQHPQDVWDRIAF | TRBV30 | TRBJ2-3 | 128 | 0.18% |
| 10 | CLGIVHPSSGGAHPVVF | TRBV27 | TRBJ1-1 | 107 | 0.15% |
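In the table, Fraction is each clonotype's read count divided by all reads in PT-001's Product sample, so it is relative to the whole repertoire rather than to the ten clones shown (the ten fractions sum to well under 100%). A minimal base-R sketch of that calculation and the per-sample ranking, assuming a long-format data frame with hypothetical `sample_id`, `cdr3_aa`, and `count` columns; `rank_clones` is an illustrative helper and the toy values are invented, not study data:

```r
# Compute per-sample clone fractions and ranks from raw read counts.
# Assumes a long-format data frame with columns sample_id, cdr3_aa, count.
rank_clones <- function(df) {
  # Total reads per sample, repeated on every row of that sample
  total <- ave(df$count, df$sample_id, FUN = sum)
  df$clone_fraction <- df$count / total
  # Rank 1 = most abundant clonotype within each sample (ties broken by row order)
  df$rank <- ave(-df$count, df$sample_id,
                 FUN = function(x) rank(x, ties.method = "first"))
  df[order(df$sample_id, df$rank), ]
}

# Toy input (hypothetical values, not study data)
toy <- data.frame(
  sample_id = c("S1", "S1", "S1", "S2", "S2"),
  cdr3_aa   = c("CASSA", "CASSB", "CASSC", "CASSD", "CASSE"),
  count     = c(10, 70, 20, 5, 15)
)
ranked <- rank_clones(toy)
print(ranked)
```

Keeping only rows with `rank <= 10` gives a table shaped like the one above, with fractions still expressed relative to each sample's total reads.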

03 Clonal Dominance Patterns

We quantify clonal dominance by examining the fraction of total reads held by the top 1, top 5, and top 10 clonotypes in each Product sample.
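As a worked example, the share held by the top N clonotypes is a cumulative sum of clone fractions sorted in descending order. The base-R helper below (`top_n_share` is a hypothetical name, not an immunarch function) applies this to the ten PT-001 Product fractions from the table in Section 02; since only ten clonotypes are listed there, shares beyond N = 10 cannot be computed from the table.

```r
# Share of all reads held by the top-N clonotypes of one sample.
# `fractions` are clone fractions for a single sample (here in percent).
top_n_share <- function(fractions, n = c(1, 5, 10)) {
  sorted <- sort(fractions, decreasing = TRUE)
  if (max(n) > length(sorted)) stop("n exceeds the number of clonotypes supplied")
  cum <- cumsum(sorted)
  setNames(cum[n], paste0("top", n))
}

# PT-001 Product fractions (%) from the top-10 table
pt001 <- c(42.46, 7.51, 2.74, 1.35, 0.78, 0.51, 0.33, 0.26, 0.18, 0.15)
dominance <- top_n_share(pt001)
print(round(dominance, 2))
```

For PT-001 this gives 42.46% (top 1), 54.84% (top 5), and 56.27% (top 10).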

Figure 1. Cumulative read fraction captured by the top N clonotypes in Product samples, stratified by clinical response. CR patients show the steepest accumulation curves.
```r
library(dplyr)
library(ggplot2)

# Compute cumulative clone fraction for top clonotypes
# (assumes tcr_data, and hence top_clones, carries a clinical_response column)
cumulative <- top_clones %>%
  group_by(patient_id) %>%
  arrange(rank, .by_group = TRUE) %>%
  mutate(cum_fraction = cumsum(clone_fraction)) %>%
  ungroup()

# Plot cumulative curves: one line per patient, colored by clinical response
ggplot(cumulative, aes(x = rank, y = cum_fraction,
                       color = clinical_response, group = patient_id)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  scale_color_manual(values = c("CR" = "#00ff9d", "PR" = "#00d4ff", "PD" = "#ff6b6b")) +
  labs(x = "Clonotype Rank", y = "Cumulative Fraction") +
  theme_minimal()
```
Finding

In CR patients, the top 10 clonotypes account for 30–45% of all Product reads, indicating highly focused expansion. PD patients show a flatter accumulation curve, where the top 10 clones capture only 15–25% of reads — consistent with less efficient clonal selection during manufacturing.
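The CR versus PD comparison reduces to summarizing each patient's top-10 read share within each response group. A minimal base-R sketch, assuming a hypothetical per-patient table with `patient_id`, `clinical_response`, and `top10_share` columns; `summarise_by_response` is an illustrative name and the numbers are placeholders, not results from this analysis:

```r
# Summarize the per-patient top-10 read share (percent) by clinical response.
# Input: one row per patient with columns patient_id, clinical_response, top10_share.
summarise_by_response <- function(per_patient) {
  groups <- split(per_patient$top10_share, per_patient$clinical_response)
  data.frame(
    clinical_response = names(groups),
    n_patients   = vapply(groups, length, integer(1)),
    min_share    = vapply(groups, min, numeric(1)),
    median_share = vapply(groups, median, numeric(1)),
    max_share    = vapply(groups, max, numeric(1)),
    row.names = NULL
  )
}

# Placeholder input (hypothetical values for illustration only)
per_patient <- data.frame(
  patient_id        = c("P1", "P2", "P3", "P4", "P5", "P6"),
  clinical_response = c("CR", "CR", "PR", "PD", "PD", "PR"),
  top10_share       = c(40, 35, 28, 18, 22, 30)
)
summary_tbl <- summarise_by_response(per_patient)
print(summary_tbl)
```

Reporting min and max per group corresponds to the ranges quoted in the finding; the median is added because a single outlying patient can stretch a range.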

04 Shared Clonotype Analysis

We examine the overlap between Apheresis and Product repertoires to determine what fraction of the Product's dominant clones were detectable in the starting material.

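```r
library(dplyr)
library(tidyr)
library(purrr)

# Identify shared clonotypes between Apheresis and Product.
# values_fn = list keeps every entry for a CDR3 seen more than once in a sample.
shared <- tcr_data %>%
  select(patient_id, sample_type, cdr3_aa, clone_fraction) %>%
  pivot_wider(names_from = sample_type, values_from = clone_fraction,
              values_fn = list) %>%
  # Keep CDR3s with at least one entry (non-NULL) in both samples of a patient
  filter(map_lgl(Apheresis, ~ !is.null(.x)),
         map_lgl(Product, ~ !is.null(.x)))
```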
Figure 2. Proportion of Product clonotypes that were also detected in the matched Apheresis sample. The majority of expanded Product clones originate from the starting material.
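The Figure 2 proportion can be computed per patient as the share of distinct Product CDR3s that also appear in the matched Apheresis sample. A minimal base-R sketch, assuming the same `patient_id`, `sample_type`, `cdr3_aa`, and `clone_fraction` columns used above and that every patient has a Product sample; `product_overlap` is a hypothetical helper, the read-weighted column is an added variant (dominant clones carry most reads), and the toy rows are invented:

```r
# Per-patient overlap between Product and matched Apheresis repertoires.
# Clonotypes are matched on CDR3 amino-acid sequence (cdr3_aa).
product_overlap <- function(tcr) {
  per_patient <- lapply(split(tcr, tcr$patient_id), function(d) {
    prod <- d[d$sample_type == "Product", ]
    aph_cdr3 <- unique(d$cdr3_aa[d$sample_type == "Apheresis"])
    in_aph <- prod$cdr3_aa %in% aph_cdr3
    data.frame(
      patient_id = d$patient_id[1],
      # Share of distinct Product clonotypes also seen in Apheresis
      clonotype_share = length(unique(prod$cdr3_aa[in_aph])) /
        length(unique(prod$cdr3_aa)),
      # Share of Product reads carried by those clonotypes (normalized to the Product total)
      read_share = sum(prod$clone_fraction[in_aph]) / sum(prod$clone_fraction)
    )
  })
  do.call(rbind, per_patient)
}

# Toy input (hypothetical values, not study data)
tcr_toy <- data.frame(
  patient_id     = c("P1", "P1", "P1", "P1", "P1", "P2", "P2", "P2"),
  sample_type    = c("Apheresis", "Apheresis", "Product", "Product", "Product",
                     "Apheresis", "Product", "Product"),
  cdr3_aa        = c("CASSA", "CASSB", "CASSA", "CASSB", "CASSX",
                     "CASSC", "CASSC", "CASSY"),
  clone_fraction = c(0.5, 0.5, 0.6, 0.3, 0.1, 1.0, 0.2, 0.8)
)
overlap <- product_overlap(tcr_toy)
print(overlap)
```

Comparing `clonotype_share` with `read_share` shows whether the shared clones are mainly the dominant ones or a scattering of small ones.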

05 Conclusions & Next Steps

Summary of Findings

1. Manufacturing drives extreme oligoclonal expansion — top 10 Product clones capture 15–45% of reads depending on patient.

2. CR patients show the most concentrated Product repertoires, suggesting that efficient clonal selection during manufacturing may be associated with therapeutic benefit.

3. The majority of dominant Product clones are traceable back to the Apheresis starting material.

4. PD patients exhibit more diffuse Product repertoires with less dominant clonal expansion.

Next Steps