C. utilis Uricase Cassette Compatibility: Computational Analysis (comp-011)
1. Question
Does the C. utilis uricase (P78609) + lactoferrin (P02788) payload pair have any cassette-design-specific issues — codon collisions, KEX2 site geometry problems, or secretion-pathway burden — that the Ward 1995 glucoamylase-KEX2 fusion architecture will not handle out of the box in A. oryzae?
This is the follow-on to cassette-compatibility-computational.md (comp-010), which verified LOW cassette-design risk for A. flavus uricase (Q00511) + lactoferrin. The motivation: uricase-variant-selection.md documents that the industry's revealed preference for oral uricase programs is C. utilis, not A. flavus — three independent programs (ALLN-346, SEL-212 pegadricase, SSS11) all chose C. utilis. If the project adopts C. utilis as the §1.9 payload, the comp-010 LOW verdict does not transfer automatically. C. utilis uricase has a different sequence, different codon-usage profile (AT-biased yeast vs. GC-biased filamentous fungus), 4 free cysteines (vs. 0 in A. flavus), and 2 internal KR sites vs. 1.
comp-011 resolves this in silico using the same seven-analysis pipeline as comp-010.
2. Verdict
Overall cassette-design risk: MODERATE (Mechanistic Extrapolation; in silico only)
No blocking cassette-design issues identified. The C. utilis uricase (P78609) + lactoferrin (P02788) pair is architecturally compatible with the Ward 1995 / Huynh 2020 design, but carries three design requirements that are absent from the comp-010 (A. flavus) baseline:
- Full codon optimization required for C. utilis uricase. The AT-biased yeast codon preference (GC ~42%) diverges substantially from the A. oryzae host (GC ~54%): CAI proxy = 0.65 vs. 1.51 for A. flavus. Both cassettes therefore require gene synthesis optimization (vs. only lactoferrin in the A. flavus design). Not a blocker: standard gene synthesis resolves it.
- 4 free cysteines (positions 39, 168, 250, 293) vs. 0 in A. flavus uricase. No disulfide bonds are annotated, but free thiols in the oxidizing A. oryzae ER lumen during secretion can form aberrant intermolecular disulfides. Risk: aggregation bands in secreted fraction. Mitigation: non-reducing SDS-PAGE QC.
- 2 internal KR sites (positions 130 and 138, both HIGH risk in KEX2 scoring) vs. 1 in A. flavus (position 128). Non-load-bearing in the direct-secretion cassette design (KEX2 never encounters the payload), but both require mutation if C. utilis uricase is ever moved to a fusion architecture.
comp-010 comparison: comp-010 (A. flavus) = LOW. comp-011 (C. utilis) = MODERATE. The MODERATE verdict reflects more design work required, not a fundamental incompatibility. The differences are manageable with standard gene synthesis and SDS-PAGE QC.
3. Per-Analysis Summary
| Analysis | C. utilis uricase verdict | Lactoferrin verdict | comp-010 delta (A. flavus) |
|---|---|---|---|
| 1. Codon usage | HEAVY (full optimization required; CAI proxy 0.65) | LOW (proxy; full opt required in practice) | Delta: A. flavus LOW (CAI proxy 1.51) |
| 2. KEX2 geometry | HIGH (2 sites, positions 130 and 138, both HIGH) — not load-bearing (direct-secretion) | MODERATE (1 site, position 579, P1'=K; 1 abolished at 38, P1'=D) | Delta: A. flavus 1 site (128 HIGH) |
| 3. Routing/secretion | MODERATE (C-terminal TKL, UniProt-annotated microbody signal) | LOW (no routing issues) | Comparable (A. flavus SKL — same risk level) |
| 4. Disulfide load | VERY LOW (0 disulfides) + 4 free Cys new risk | MODERATE (17 disulfides, 1.06× Huynh baseline) | Delta: A. flavus 0 Cys (no free-Cys risk) |
| 5. N-glycosylation | 1 predicted (NSS at pos 54; not UniProt-annotated) | 3 predicted, all UniProt-annotated (N137, N478, N623) | Comparable (A. flavus 1 predicted at pos 191) |
| 6. Dual-cassette burden | MODERATE | — | Delta: comp-010 LOW (A. flavus codon burden LOW) |
| 7. Comparison | Harder than comp-010 on codon + free-Cys axes; strategic advantage from ALLN-346 prior art | Identical to comp-010 | — |
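
The N-glycosylation row counts predicted N-X-S/T sequons. A minimal sketch of such a scan is shown here, assuming the conventional rule that X may be any residue except proline; the function name and the toy fragment are illustrative and are not taken from the comp-011 code or from P78609 / P02788.

```python
import re

# N-X-S/T sequon with X != Pro. The lookahead lets overlapping sequons all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(protein: str, first_position: int = 1) -> list[tuple[int, str]]:
    """Return (position of the Asn, tripeptide) for every N-X-S/T sequon."""
    return [
        (m.start() + first_position, m.group(1))
        for m in SEQUON.finditer(protein.upper())
    ]

# Toy fragment for illustration only (hypothetical, not a real uricase or lactoferrin sequence).
toy_fragment = "MANSSGNPTQNGT"
print(find_sequons(toy_fragment))  # NPT is skipped because X is proline
```

A sequon hit is only a prediction; as the table notes, the C. utilis site at position 54 is not UniProt-annotated.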
4. Material Differences vs. comp-010 (A. flavus)
The platform decision (which uricase to use for §1.9) hinges on this comparison. Three differences are material:
4.1 Codon burden: HEAVY vs. LOW
C. utilis (Cyberlindnera jadinii) is a yeast with AT-biased codon usage (GC~42%). A. oryzae is a GC-biased filamentous ascomycete (~54% GC). The preferred codons for C. utilis (e.g., GCT for Ala, AGA for Arg, GAA for Glu) are systematically in the 0.40–0.80 RSCU range in A. oryzae — disfavored but not individually rare by the RSCU < 0.4 threshold. The aggregate effect is a CAI proxy of 0.65 (meaning the geometric mean of the RSCU values for C. utilis preferred codons in A. oryzae is 0.65 — substantially below 1.0 = neutral). In contrast, A. flavus has near-identical codon preferences to A. oryzae (both GC-biased, ~54% GC), giving a CAI proxy of ~1.51.
Practical consequence: full gene synthesis with A. oryzae codon optimization is mandatory for C. utilis uricase — the same requirement as for human lactoferrin. In the A. flavus design (comp-010), only lactoferrin required full optimization; the uricase cassette was optimization-free. With C. utilis, BOTH cassettes require gene synthesis optimization. Cost delta: ~$200–400 for one additional codon-optimized gene. Not a blocker. (Mechanistic Extrapolation)
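The CAI proxy described above (geometric mean of host RSCU values over the origin organism's preferred codons) can be sketched as follows. This is an illustration under stated assumptions, not the comp-011 implementation: the function name is hypothetical, residues without a table entry are simply skipped, and the RSCU numbers are placeholders picked inside the 0.40–0.80 band quoted in the text rather than measured A. oryzae values.

```python
import math

def cai_proxy(protein, origin_preferred, host_rscu):
    """Geometric mean of host RSCU over the origin organism's preferred codons.

    protein          -- amino-acid sequence (one-letter codes)
    origin_preferred -- amino acid -> codon preferred by the origin organism
    host_rscu        -- codon -> RSCU in the expression host
    Residues missing from origin_preferred are skipped (an assumption of this sketch).
    """
    logs = [
        math.log(host_rscu[origin_preferred[aa]])
        for aa in protein.upper()
        if aa in origin_preferred
    ]
    if not logs:
        raise ValueError("no scorable residues in sequence")
    return math.exp(sum(logs) / len(logs))

# Preferred codons are the three named in the text; RSCU values are PLACEHOLDERS.
origin_preferred = {"A": "GCT", "R": "AGA", "E": "GAA"}
host_rscu = {"GCT": 0.70, "AGA": 0.50, "GAA": 0.60}

print(round(cai_proxy("AREA", origin_preferred, host_rscu), 2))
```

A value below 1.0 means the back-translated sequence leans on codons the host disfavors; a value above 1.0, as reported for A. flavus, means it leans on codons the host prefers.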
4.2 Free cysteines: 4 vs. 0
C. utilis uricase (P78609) has 4 cysteine residues (positions 39, 168, 250, 293). None are annotated as disulfide-bonded in UniProt — consistent with uricase family biochemistry (Cu-independent, O2-dependent active site, no disulfide requirement). A. flavus uricase (Q00511) has 0 cysteines.
When C. utilis uricase is expressed via the secretory pathway in A. oryzae and transits the oxidizing ER lumen, these 4 free thiols are exposed to PDI/ERO1. The risk is aberrant intermolecular disulfide formation producing aggregated species. This is a new risk factor absent in the A. flavus design.
Mitigation strategy: (1) run non-reducing SDS-PAGE on secreted fractions to detect high-MW aggregation bands; (2) if aggregation is observed, use AlphaFold2 to identify solvent-exposed Cys residues and engineer Cys→Ser substitutions at surface positions. Buried cysteines are less accessible to PDI and lower risk. (Mechanistic Extrapolation; no quantitative model available for magnitude)
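As a bookkeeping aid for step (2), the sketch below lists cysteine positions in a sequence and turns a chosen subset into Cys→Ser mutation labels. Which positions are actually solvent-exposed is not known from this analysis; the subset passed in the example is purely hypothetical, and both function names are illustrative.

```python
def cysteine_positions(protein: str) -> list[int]:
    """1-based positions of every cysteine in the sequence."""
    return [i for i, aa in enumerate(protein.upper(), start=1) if aa == "C"]

def cys_to_ser_labels(positions: list[int]) -> list[str]:
    """Format Cys->Ser substitutions in the usual <wt><position><new> notation."""
    return [f"C{p}S" for p in positions]

# The four positions reported for P78609 in this section.
reported = [39, 168, 250, 293]

# HYPOTHETICAL choice of surface-exposed residues; a structure model would decide this.
assumed_surface = [p for p in reported if p in (39, 293)]
print(cys_to_ser_labels(assumed_surface))
```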
4.3 KEX2 internal sites: 2 vs. 1
C. utilis uricase has 2 internal KR dipeptides (positions 130–131: KRIT, P1'=I; positions 138–139: KRSG, P1'=S). Both I and S are in the "preferred" P1' category for KEX2 cleavage — both are scored HIGH by the canonical Rockwell 2002 / Brenner 1992 rules. The two sites are 8 residues apart in context GGEKRITDLYYKRSGD.
A. flavus has 1 internal KR site (position 128, P1'=N, HIGH).
In the proposed direct-secretion cassette design (PTEF1 or PamyB – amyB SP – uricase – TgpdA), KEX2 is not involved in uricase processing. Both sites are non-load-bearing. If C. utilis uricase is ever moved to a glucoamylase-KEX2 fusion architecture, a double KR→KQ mutation at positions 130 and 138 would be required (vs. single mutation at 128 for A. flavus). (Mechanistic Extrapolation; P1' rules from S. cerevisiae Kex2p homology)
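The KR-site scan can be illustrated on the 16-residue context quoted above. This is a sketch, not the comp-011 scoring code: the P1' lookup contains only the residues this page explicitly scores (I, S, N as HIGH; K as MODERATE; D as abolished), everything else is reported as unscored, and `first_position=127` is inferred from the page's numbering (K130 is the fourth residue of the fragment).

```python
# P1' classes limited to the residues this page explicitly scores.
P1_PRIME_RISK = {"I": "HIGH", "S": "HIGH", "N": "HIGH", "K": "MODERATE", "D": "ABOLISHED"}

def scan_kr_sites(protein, first_position=1):
    """Return (position of K, P1' residue, risk class) for every internal KR dipeptide."""
    seq = protein.upper()
    sites = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "KR":
            # P1' is the residue immediately after the KR dipeptide, if any.
            p1_prime = seq[i + 2] if i + 2 < len(seq) else None
            risk = P1_PRIME_RISK.get(p1_prime, "UNSCORED")
            sites.append((first_position + i, p1_prime, risk))
    return sites

context = "GGEKRITDLYYKRSGD"  # fragment quoted in the text
print(scan_kr_sites(context, first_position=127))
# -> [(130, 'I', 'HIGH'), (138, 'S', 'HIGH')]
```

Because the rules derive from S. cerevisiae Kex2p, the risk classes should be read as a screening flag rather than a cleavage prediction for A. oryzae kexB.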
4.4 Chaperone-load implications (cross-reference to chaperone-orthogonal stacking framework)
The 4 free cysteines flagged in §4.2 are not just a cassette-design QC item — they have downstream implications under the chaperone-orthogonal stacking framework. The framework's central claim is that two cassettes competing for the same ER chaperone subsystem (especially PDI/ERO1) show steep yield collapse, while orthogonal cassettes (loading different subsystems) preserve dual-cassette synergy.
The framework's §4 cassette scoring table currently lists uricase as "BiP-transit only — chaperone-light," scored against A. flavus (Q00511, 0 disulfides, 0 free Cys). That score does not transfer to C. utilis: 4 free thiols in the oxidizing ER lumen will engage PDI/ERO1 (the same subsystem lactoferrin's 17 disulfides load heavily), whether or not aberrant intermolecular bonds form.
Direction-only prediction: C. utilis uricase + lactoferrin in a dual cassette is predicted to show some synergy penalty relative to A. flavus + lactoferrin, because the C. utilis variant is no longer fully PDI-orthogonal. The A. flavus baseline (chaperone framework §5: synergy 0.8–1.0) is the upper bound; the C. utilis synergy is predicted lower but the magnitude is not computable from the framework as written.
Why no number: §4.2 explicitly notes "no quantitative model available for magnitude" of free-Cys ER aggregation risk, and the chaperone framework's §3 substrate categorization treats disulfide count as bulk PDI load without a per-free-Cys coefficient. Both the comp-011 risk frame and the chaperone framework's calibration are silent on the same number. Any threshold-style claim ("synergy drops below 0.7") would be invented, not derived.
Practical consequence for §1.9: the parallel head-to-head recommended in §6.3 below is the right empirical resolution — the synergy penalty is exactly the kind of bounded-unknown that a $200–400 second-cassette gene synthesis cost can resolve directly via Lf titer comparison across the two strains. If the A. flavus + Lf strain hits >500 mg/L Lf and the C. utilis + Lf strain stalls below ~150 mg/L (other variables held), that's the synergy penalty made visible. If both strains achieve comparable Lf titers, the framework's chaperone-competition prediction is partially falsified for free-Cys-mediated PDI load (a useful result for future cassette decisions).
This bidirectional cross-reference is mirrored in chaperone-orthogonal-stacking.md §4 — the cassette scoring table now carries a C. utilis variant note flagging this load shift, so the implicit "uricase = A. flavus" assumption does not propagate silently.
5. Strategic Advantage of C. utilis (Non-Cassette Axis)
The comp-010 vs. comp-011 comparison is purely a cassette-design comparison. There is one strategic advantage of C. utilis that is not captured by the cassette-design analysis but is directly relevant to the oral/gut-lumen track:
ALLN-346 ProteinGPS prior art (US10815461B2): Allena Pharmaceuticals disclosed seven point mutations on a C. utilis uricase backbone (I180V, V190G, Y165F, E51K, Q244K, I132R, A87G) that improve protease resistance and thermostability in the gut-lumen context. These mutations were identified by directed-evolution (ProteinGPS platform) and validated in Phase 1 / Phase 2a clinical studies (ALLN-346). They are publicly disclosed — freedom-to-operate for research use.
A. flavus uricase has no equivalent publicly disclosed protease-resistance mutation panel. The ALLN-346 mutations convert C. utilis from "less-characterized alternative" to "alternative with millions of dollars of directed-evolution engineering already available to adopt."
This strategic asymmetry is noted here but does not change the cassette-design risk verdict. See uricase-variant-selection.md for the full analysis.
6. Design Recommendations for §1.9
- If adopting C. utilis uricase (industry-preferred oral track):
    - Order codon-optimized C. utilis uricase CDS from a gene synthesis vendor (A. oryzae codon table). This is mandatory: unlike A. flavus, the native C. utilis codon preferences will suppress expression without optimization.
    - Add ALLN-346 ProteinGPS mutations (US10815461B2: I180V, V190G, Y165F, E51K, Q244K, I132R, A87G) into the codon-optimized synthesis. Order as a single variant construct.
    - Cassette architecture: direct-secretion (PTEF1 or PamyB – amyB SP – codon-optimized C. utilis uricase – TgpdA). Do NOT use glucoamylase-KEX2 fusion for uricase; direct secretion avoids the 2 internal KR site issue entirely.
    - Add non-reducing SDS-PAGE to the secreted fraction QC panel to detect free-Cys aggregation.
- If keeping A. flavus uricase (comp-010-verified LOW risk track):
    - comp-010 verdict stands: LOW risk, no additional design requirements. Codon optimization optional but low-priority.
    - Recommended if speed-to-first-clone is the priority or rasburicase-derivative IP strategy is preferred.
- Recommended approach for §1.9 (empirical head-to-head):
    - Order both A. flavus (Q00511, codon-optimized) and C. utilis (P78609, codon-optimized + ALLN-346 mutations) as direct-secretion cassettes. The §1.9 fermentation experiment resolves the platform decision empirically at $0 additional fermentation cost (same experiment, two strains). Total gene synthesis cost delta: ~$200–400 for the second codon-optimized gene.
    - This recommendation is the canonical instance of the platform's Principle 6: Variant-Agnostic Empirical Head-to-Head (codified 2026-05-06 from this comp-011 pattern). All three preconditions are satisfied: shared §1.9 fermentation infrastructure, bounded marginal input cost ($200–400), genuine literature split (industry prefers C. utilis, academic precedent favors A. flavus).
- Lactoferrin cassette (unchanged from comp-010): PamyB – glucoamylase – KRGGG – hLf (codon-optimized) – TamyB. Monitor internal KR site at position 579 (P1'=K, MODERATE risk) by SDS-PAGE.
- Host strain: NSlD-ΔP10 (10-protease-deletion A. oryzae). Required, same as comp-010.
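Before sending a variant construct for synthesis, it helps to confirm that each mutation label matches the residue actually present in the parent sequence, particularly since the exact parent sequence behind the ALLN-346 mutations is not disclosed. The sketch below is one way to do that; the function name is hypothetical and the example uses a toy sequence, not P78609.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. I180V

def apply_mutations(protein, mutations):
    """Apply point mutations, raising ValueError if a wild-type residue does not match."""
    seq = list(protein.upper())
    for label in mutations:
        m = MUTATION.match(label)
        if m is None:
            raise ValueError(f"unparseable mutation label: {label!r}")
        wild_type, position, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= position <= len(seq):
            raise ValueError(f"{label}: position outside sequence (length {len(seq)})")
        if seq[position - 1] != wild_type:
            raise ValueError(
                f"{label}: expected {wild_type} at {position}, found {seq[position - 1]}"
            )
        seq[position - 1] = new
    return "".join(seq)

# Toy example only; the real call would pass the P78609 protein sequence and the
# seven labels listed above, and a ValueError would signal a numbering mismatch.
print(apply_mutations("MAKE", ["A2G", "E4K"]))  # -> MGKK
```

A mismatch error here would be an early warning that the patent numbering and the UniProt numbering differ, which is cheaper to discover before synthesis than after.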
7. Limitations
- CAI proxy, not CDS analysis. Codon burden analysis uses protein sequence + origin-organism preferred codons as a proxy. The HEAVY verdict for C. utilis reflects the CAI proxy (0.65) capturing the systematic AT/GC mismatch. A gene synthesis vendor's codon optimizer resolves this; the burden analysis informs whether optimization is required, not its magnitude after optimization.
- C. utilis codon table is a genome-wide estimate. A C. utilis-specific highly-expressed secreted-gene codon table is not publicly available. The AT-bias is well-established; specific RSCU values carry ±15–20% error.
- Free-Cys ER aggregation risk: no quantitative model. The magnitude of aggregation from 4 free cysteines in the A. oryzae ER is empirically unknown. Flagged as a design consideration requiring SDS-PAGE monitoring, not a predicted outcome.
- P78609 is PE=3 (Inferred from homology). The C. utilis uricase entry has not been directly characterized biochemically under this UniProt accession. The ALLN-346 work provides indirect evidence of biochemical activity on this backbone, but the exact parent sequence Allena used is not disclosed.
- KEX2 P1' rules from S. cerevisiae Kex2p homology. A. oryzae kexB specificity has not been published with a full P1' preference matrix. This is the same caveat as comp-010.
- Solid-state format not modeled. Solid-state vs. submerged differences (water activity, O2 gradient, secretion dynamics) are not captured computationally (Sun 2024 caveat).
8. Accession Note
UniProt P15296 was cited in early project planning as the likely C. utilis uricase accession. On retrieval (2026-05-05), P15296 returns sp|P16320|NOF_DROME (a Drosophila transposable element protein) — indicating this accession has been reassigned or merged. The correct reviewed Swiss-Prot entry for Candida utilis / Cyberlindnera jadinii uricase is P78609 (URIC_CYBJA), confirmed by taxon search (OX=4903). comp-011 uses P78609.
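The mismatch described above is the kind of error a small guard on retrieved FASTA headers can catch. The sketch below assumes the standard UniProt header layout (`db|accession|entry_name` followed by key=value fields such as `OX=`); the function names are hypothetical and no network call is made.

```python
import re

def parse_uniprot_header(header):
    """Split 'db|accession|entry_name ...' from a UniProt FASTA header line."""
    token = header.lstrip(">").split()[0]
    db, accession, entry_name = token.split("|")
    return db, accession, entry_name

def taxon_id(header):
    """Return the OX= taxon identifier as an int, or None if the field is absent."""
    match = re.search(r"\bOX=(\d+)", header)
    return int(match.group(1)) if match else None

def check_retrieval(requested_accession, header, expected_taxon=None):
    """True only if the record matches the requested accession (and taxon, if given)."""
    _, accession, _ = parse_uniprot_header(header)
    if accession != requested_accession:
        return False
    return expected_taxon is None or taxon_id(header) == expected_taxon

# The case described in this note: the returned record carries a different accession.
print(check_retrieval("P15296", "sp|P16320|NOF_DROME"))  # -> False
```

Passing `expected_taxon=4903` adds the taxon check used to confirm P78609.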
9. Cross-References
- experiments/comp-011-c-utilis-uricase-cassette-compatibility/ — Analysis code, inputs, outputs (machine-readable JSON + human-readable summary)
- cassette-compatibility-computational.md — comp-010 page (A. flavus baseline, LOW risk)
- uricase-variant-selection.md — Industry-revealed preference analysis; comp-011 verdict and platform implications noted there
- hypotheses/H01-ward-dual-cassette.md — Falsification Card; comp-011 is design support for §1.9
- validation-experiments.md — §1.9 wet-lab experiment this analysis informs
- koji-endgame-strain.md — Protocol sketch for §1.9
- experiments/comp-001-uricase-shio-koji-protease-stability/ — A. flavus uricase protease stability (comp-001, LOW)
- experiments/comp-005-lactoferrin-shio-koji-protease-stability/ — Lactoferrin protease stability (comp-005, MODERATE)
Evidence level for all findings: Mechanistic Extrapolation — in silico sequence analysis only. Wet-lab confirmation required for all design recommendations.