comp-011: Cassette Compatibility — C. utilis Uricase + Lactoferrin in A. oryzae
Question: Does the C. utilis uricase (P78609) + lactoferrin (P02788) payload pair have any cassette-design-specific issues — codon collisions, KEX2 site geometry problems, or secretion-pathway burden — that the Ward 1995 glucoamylase-KEX2 architecture will not handle out of the box in A. oryzae?
Verdict: MODERATE overall cassette-design risk — not a fundamental incompatibility; three manageable design requirements vs. comp-010 (A. flavus, LOW)
Three material differences from comp-010 (A. flavus):

1. Codon burden HEAVY (C. utilis is AT-biased, GC ~42% vs. A. oryzae ~54%): full gene synthesis optimization required (same as lactoferrin).
2. 4 free cysteines (vs. 0 in A. flavus): ER aggregation risk during secretion; monitor by non-reducing SDS-PAGE.
3. 2 internal KR sites (positions 130 and 138, both HIGH) vs. 1 in A. flavus: non-load-bearing in the direct-secretion design.
Follow-on to: comp-010 (A. flavus uricase Q00511 + lactoferrin, LOW risk)
Informs: validation-experiments.md §1.9 — Ward 1995 dual-cassette feasibility test
Interpretive wiki page: wiki/c-utilis-uricase-cassette-compatibility-computational.md
Accession note: P15296 (originally cited as likely C. utilis uricase) is a reassigned accession returning a Drosophila protein. Correct canonical entry: P78609 (URIC_CYBJA, Cyberlindnera jadinii) — verified by taxon search.
How to reproduce
No external packages required (stdlib only: json, math, pathlib). Outputs land in outputs/.
File index

    comp-011-c-utilis-uricase-cassette-compatibility/
        analyze.py                       ← analysis script (run this)
        inputs/
            P78609.fasta                 ← C. utilis uricase P78609 (UniProt, 303 aa; no signal peptide)
            P02788.fasta                 ← human lactoferrin (UniProt, 710 aa; signal peptide aa 1-19)
            glucoamylase_carrier.fasta   ← A. awamori glucoamylase P69327 (Ward 1995 / Huynh 2020 carrier)
            a_oryzae_codon_usage.json    ← A. oryzae RIB40 codon usage (RSCU + freq/1000; Kazusa database)
            kex2_site_specs.json         ← KEX2 cleavage rules (KR|X, P1' preferences, KRGGG linker)
            provenance.md                ← sources, fetch dates, citations, accession note
        outputs/                         ← generated by analyze.py; committed as artifacts
            cassette_analysis.json       ← machine-readable full analysis
            summary.md                   ← human-readable; cited in the wiki
        README.md                        ← this file
Seven analyses
| # | Analysis | C. utilis uricase verdict | Lactoferrin verdict | comp-010 delta |
|---|---|---|---|---|
| 1 | Codon usage (CAI proxy + hotspot scan) | HEAVY (CAI proxy 0.65; AT-biased yeast vs. GC-biased host) | LOW (proxy; full codon-opt required in practice) | Material: A. flavus LOW (CAI ~1.51) |
| 2 | KEX2 site geometry | HIGH (2 sites: pos 130 P1'=I, pos 138 P1'=S) — not load-bearing (direct-secretion cassette) | MODERATE (1 moderate-risk site pos 579, P1'=K; 1 abolished pos 38, P1'=D) | Delta: A. flavus 1 site (128 HIGH) |
| 3 | Secretion-targeting signals | MODERATE (C-terminal TKL — UniProt-annotated microbody signal; weak PTS1 variant) | LOW (no routing issues) | Comparable (A. flavus SKL — same risk level) |
| 4 | Disulfide load | VERY LOW (0 disulfides) + 4 free Cys — new aggregation risk | MODERATE (17 disulfides, 1.06× Huynh 2020 baseline) | Material: A. flavus 0 Cys (no free-Cys risk) |
| 5 | N-glycosylation sites | 1 predicted (NSS at pos 54; not UniProt-annotated) | 3 predicted, all UniProt-annotated (N137, N478, N623) | Comparable |
| 6 | Combined secretion burden | MODERATE | — | comp-010 LOW |
| 7 | Comparison to comp-010 + Huynh 2020 | HARDER on codon + free-Cys axes; ALLN-346 strategic advantage | Identical to comp-010 | — |
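
The sequence-level checks in rows 2 to 5 are, at their core, simple motif scans. The following is a minimal sketch of that idea, not the code in analyze.py: the function names and the toy sequence are hypothetical. It assumes that a KEX2 candidate is any Lys-Arg dipeptide with P1' being the residue after the Arg, that an N-glycosylation sequon is N-X-S/T with X not Pro, and that positions are 1-based and refer to the first residue of the motif. The risk classification by P1' residue (HIGH, MODERATE, abolished) lives in kex2_site_specs.json and is not reproduced here.

```python
import re

def kr_sites(seq):
    """Return (1-based position of the K, P1' residue) for each KR dipeptide."""
    sites = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "KR":
            # P1' is the residue immediately after the Arg; None at the C-terminus.
            p1_prime = seq[i + 2] if i + 2 < len(seq) else None
            sites.append((i + 1, p1_prime))
    return sites

def nglyc_sequons(seq):
    """Return (1-based position of the N, tripeptide) for each N-X-S/T sequon, X != P."""
    # A lookahead pattern is used so that overlapping sequons are all reported.
    return [(m.start() + 1, seq[m.start():m.start() + 3])
            for m in re.finditer(r"(?=N[^P][ST])", seq)]

def cys_count(seq):
    """Total cysteine count; disulfide pairing must come from annotation."""
    return seq.count("C")

def c_terminal_tripeptide(seq):
    """Last three residues, where a PTS1-like signal (e.g. SKL, TKL) would sit."""
    return seq[-3:]

# Toy sequence for illustration only (not P78609 or P02788).
toy = "MKRSANSSCPKRIGNPTCTKL"
print(kr_sites(toy))               # KR dipeptides with their P1' residues
print(nglyc_sequons(toy))          # NPT is skipped because X is Pro
print(cys_count(toy), c_terminal_tripeptide(toy))
```

A raw cysteine count cannot distinguish free cysteines from disulfide-bonded ones, so the "4 free Cys" and "17 disulfides" figures in the table depend on the annotation used by the analysis, not on a scan like this one.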
Key results
| Scope | Verdict | Primary driver |
|---|---|---|
| Overall cassette-design risk | MODERATE | Codon burden (HEAVY) + free-Cys aggregation risk |
| C. utilis uricase cassette | MODERATE | AT-biased codon preference; 4 free Cys |
| Lactoferrin cassette | MODERATE (identical to comp-010) | 1 moderate-risk internal KR site; 17 disulfides within Huynh precedent |
| comp-010 vs. comp-011 delta | comp-010 LOW → comp-011 MODERATE | Not a fundamental incompatibility; manageable with gene synthesis + SDS-PAGE QC |
| §1.9 cassette ordering recommendation | Head-to-head: order BOTH A. flavus (comp-010 verified) AND C. utilis (codon-opt + ALLN-346 mutations) | Empirical comparison resolves platform decision at $0 additional fermentation cost |
Disagreement protocol
If you reproduce the outputs and disagree with the methods or numbers, file a GitHub issue referencing this folder (comp-011-c-utilis-uricase-cassette-compatibility). Primary candidates for revision:

1. C. utilis codon preferences. The RSCU values used are Kazusa genome-wide approximations for related Candida/Cyberlindnera species; a species-specific table would improve precision.
2. KEX2 P1' classification. It is derived from S. cerevisiae Kex2p; A. oryzae kexB specificity is unpublished.
3. The CAI proxy burden threshold of 0.90. It was introduced in comp-011 to capture the AT/GC mismatch that the RSCU < 0.4 hotspot threshold misses.
4. Free-Cys aggregation risk. Its magnitude is unknown without wet-lab data.
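
To make the two codon thresholds concrete, here is a hedged sketch of how a hotspot scan and a sequence-wide proxy can disagree. This README does not state the exact proxy formula used by analyze.py, so the sketch assumes one possible definition: the geometric mean of host RSCU values over the codons of the coding sequence. Unlike the standard CAI, which normalizes each RSCU by the maximum for its amino acid, this variant is not bounded by 1. The function names and the RSCU numbers are illustrative and are not taken from a_oryzae_codon_usage.json.

```python
import math

def codons(cds):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def gc_fraction(cds):
    """Fraction of G and C bases in the sequence."""
    cds = cds.upper()
    return (cds.count("G") + cds.count("C")) / len(cds)

def cai_proxy(cds, rscu):
    """Assumed proxy: geometric mean of host RSCU over codons found in the table."""
    values = [rscu[c] for c in codons(cds) if c in rscu]
    return math.exp(sum(math.log(v) for v in values) / len(values))

def hotspots(cds, rscu, cutoff=0.4):
    """1-based codon indices whose host RSCU falls below the cutoff."""
    return [i + 1 for i, c in enumerate(codons(cds)) if rscu.get(c, 1.0) < cutoff]

# Toy RSCU values for illustration, NOT the Kazusa A. oryzae table.
TOY_RSCU = {"GCC": 1.6, "GCT": 1.0, "GCA": 0.8, "GCG": 0.6,
            "AAG": 1.5, "AAA": 0.5, "CTC": 1.6, "TTA": 0.3}

cds = "GCAAAATTAGCC"  # codons: GCA AAA TTA GCC
proxy = cai_proxy(cds, TOY_RSCU)
print(round(proxy, 3), proxy < 0.90)   # sequence-wide burden flag
print(hotspots(cds, TOY_RSCU))         # only the TTA codon is below 0.4
print(round(gc_fraction(cds), 3))
```

In this toy case only one codon is a hotspot, yet the proxy falls well below 0.90 because several codons are mildly disfavored. That is the kind of diffuse mismatch a per-codon cutoff alone would not report.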