DAF/CD55 SCR1-4 Truncated Construct: Shio-Koji Protease Stability (comp-012)

Status: Complete — 2026-05-05
Experiment folder: experiments/comp-012-daf-cd55-scr14-truncated/
Evidence level: Mechanistic Extrapolation — AlphaFold pLDDT-based structural inference; no wet-lab confirmation.
Predecessor: comp-006 — full ectodomain (aa 35–353): HIGH
Companion analyses: comp-001 (uricase, LOW), comp-005 (lactoferrin, HIGH/MODERATE)

Summary

Scope	Verdict	Max risk score	Worst protease
comp-006: Full ectodomain (aa 35–353, SCR1-4 + Ser/Thr stalk)	HIGH	0.388	NPr
comp-012: SCR1-4 only (aa 35–285, stalk removed)	LOW	0.039	NPr

The SCR1-4-only construct (aa 35–285) is protease-stable under shio-koji conditions: LOW verdict, max risk score 0.039. This is a 10-fold drop from the comp-006 full ectodomain result (HIGH, 0.388), driven entirely by removing the Ser/Thr-rich stalk (aa 286–353, pLDDT 30–52). After truncation, zero exposed sites remain across all three A. oryzae proteases. All 242 recognition sites in the SCR1-4 construct are buried (pLDDT ≥ 80).

Platform implication: This closes the computational feasibility gate for the koji-CD55 engineering thesis. The SCR1-4-only construct is the first complement regulator candidate to pass the computational protease-stability screen for shio-koji fermentation. Whether it provides meaningful CP0 coverage (C5a reduction in a gut-lumen assay) remains an open wet-lab question, but the protease-stability objection is resolved at the computational level.

Cross-track coordination per comp-018 (2026-05-08): The upstream complement modulator sweep surfaced recombinant C1-INH expression as a near-twin engineering thesis to comp-012/H05. C1-INH is a soluble human serpin (FDA-approved as Berinert/Cinryze/Ruconest for hereditary angioedema) that acts at the C1r/C1s and AP convertase regulator nodes, a different complement layer from DAF SCR1-4 (which acts by C3 convertase decay acceleration). Both share the same chassis logic (express a soluble human complement regulator in koji or LBP) and would provide complementary, not redundant, coverage at upstream-CP0. Phase 2 follow-up: open a parallel comp-NNN for koji-expressed C1-INH protease stability (mirroring comp-006/comp-012). If that verdict is LOW or MODERATE, OE has two engineering candidates at upstream-CP0 acting at different cascade nodes.

comp-018 also surfaced a TIER-1 dietary-tier modulator, rosmarinic acid (RMA; C3 convertase IC50 5-10 µM), which is mechanistically additive to engineered DAF because it acts at a different geometric scale: engineered DAF saturates the MSU surface, whereas dietary RMA saturates fluid-phase and gut-luminal C3 convertase. (source: wiki/upstream-complement-modulator-sweep-computational.md §7.1, §7.4)

§1.5 Correction note (2026-05-06): disulfide count anchor

An earlier draft of this page (the 2026-05-05 Sonnet subagent output) asserted "3 conserved disulfide bonds per SCR domain → 12 total for SCR1-4." This was an unverified hallucination: the comp-012 pipeline (AlphaFold pLDDT-based protease stability scoring) does not model disulfide bonds at all — the Limitations section below explicitly states "Disulfide bonds not modelled." The number 3 per domain was never derived from a primary source.

Primary source verification against UniProt P08174 (human DAF/CD55, canonical isoform, SV=4, accessed 2026-05-06) confirms exactly 8 DISULFID feature annotations in the SCR1-4 region: 2 per SCR domain (Cys36-Cys81, Cys65-Cys94 [SCR1]; Cys98-Cys145, Cys129-Cys158 [SCR2]; Cys163-Cys204, Cys190-Cys220 [SCR3]; Cys225-Cys267, Cys253-Cys283 [SCR4]). This is the canonical sushi/CCP fold (Cys1-Cys3 + Cys2-Cys4 motif), standard complement biology. The corrected count (8 total) is now anchored to UniProt feature annotations throughout this page and in H05.
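As a quick consistency check on the corrected count, the UniProt pairs listed above can be tested against the sushi/CCP Cys1-Cys3 + Cys2-Cys4 topology. The sketch below is a hypothetical helper, not part of the comp-012 pipeline; it assumes the SCR domain boundaries used in the Structural context section of this page (SCR1 35–96, SCR2 97–160, SCR3 161–222, SCR4 223–285).

```python
# Hypothetical consistency check; not part of experiments/lib/protease_stability.py.
# Disulfide pairs as listed on this page (UniProt P08174 DISULFID features).
DISULFIDES = [
    (36, 81), (65, 94),      # SCR1
    (98, 145), (129, 158),   # SCR2
    (163, 204), (190, 220),  # SCR3
    (225, 267), (253, 283),  # SCR4
]

# Domain boundaries assumed from this page's Structural context section.
SCR_BOUNDS = {
    "SCR1": (35, 96),
    "SCR2": (97, 160),
    "SCR3": (161, 222),
    "SCR4": (223, 285),
}


def domain_of(pos: int) -> str:
    """Return the SCR domain containing a full-sequence position."""
    for name, (start, end) in SCR_BOUNDS.items():
        if start <= pos <= end:
            return name
    raise ValueError(f"position {pos} lies outside SCR1-4")


def check_ccp_pattern(pairs):
    """Map each domain to (bond count, matches Cys1-Cys3 + Cys2-Cys4 pattern)."""
    by_domain = {}
    for a, b in pairs:
        dom = domain_of(a)
        if domain_of(b) != dom:
            raise ValueError(f"bond {a}-{b} crosses a domain boundary")
        by_domain.setdefault(dom, []).append((min(a, b), max(a, b)))
    report = {}
    for dom, bonds in by_domain.items():
        # Sort the domain's cysteines so Cys1 < Cys2 < Cys3 < Cys4.
        cys = sorted(c for bond in bonds for c in bond)
        ok = len(cys) == 4 and set(bonds) == {(cys[0], cys[2]), (cys[1], cys[3])}
        report[dom] = (len(bonds), ok)
    return report


report = check_ccp_pattern(DISULFIDES)
print(report)
print("total disulfides:", sum(n for n, _ in report.values()))
```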

This class of error, a plausible-sounding number invented without primary-source verification, is exactly what wiki/manual-literature-mining.md exists to prevent. It is surfaced here so the correction is traceable.

Context: this experiment completes the comp-006 decision tree

comp-006 found the full DAF/CD55 ectodomain (aa 35–353) is HIGH protease risk. The driver was unambiguous: the Ser/Thr-rich stalk (aa 286–353, pLDDT 30–52) contributed 9 NPr-exposed sites, 48 ALP-exposed sites, and 1 acid_protease exposed site — all within 68 disordered residues. The SCR1–4 domains (aa 35–285, pLDDT 85–98) contributed zero exposed sites. comp-006's own Limitations section identified the stalk-truncated construct as the logical follow-up; the 2026-05-05 sweep formalized this into a wet-lab proposal. comp-012 runs the in silico verdict before any wet-lab resources are committed.

The platform context: complement-c5a-gout.md establishes the CP0 chokepoint — complement priming via C5a is the only node in the gout-modulation framework with zero fermentable coverage. Avacopan (a small-molecule C5aR1 antagonist) is the current default, which is awkward for the "your microbe makes the medicine" thesis. DAF/CD55 expressed via koji is the leading candidate for closing this gap, but only if the ectodomain survives shio-koji fermentation.

Method

Uses the same shared library as comp-001, comp-005, and comp-006 (experiments/lib/protease_stability.py). Risk score per site:

risk_score = accessibility_weight × salt_residual_activity × ph_activity_factor

Accessibility weights: buried (pLDDT ≥ 80) = 0.1; partially exposed (pLDDT 65–80) = 0.4; exposed (pLDDT < 65) = 1.0.
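A minimal sketch of the per-site scoring as described above, assuming the band edges are inclusive at the lower end (a residue at exactly pLDDT 80 counts as buried, one at 65 as partially exposed). The function names are hypothetical and this is not the code in experiments/lib/protease_stability.py; the NPr effective activity of 0.388 used in the example is the value quoted later on this page.

```python
def accessibility_weight(plddt: float) -> float:
    """Map a residue's pLDDT to the accessibility weight used in the risk score."""
    if plddt >= 80:
        return 0.1  # buried
    if plddt >= 65:
        return 0.4  # partially exposed (65-80)
    return 1.0      # exposed (< 65)


def risk_score(plddt: float, salt_residual_activity: float,
               ph_activity_factor: float) -> float:
    """risk_score = accessibility_weight x salt_residual_activity x ph_activity_factor."""
    return accessibility_weight(plddt) * salt_residual_activity * ph_activity_factor


# Example: an NPr site (salt residual 0.388, pH factor 1.0) at three pLDDT levels.
for plddt in (95.0, 72.0, 40.0):
    print(plddt, round(risk_score(plddt, 0.388, 1.0), 3))
```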

Truncation: The full P08174 sequence (381 aa) and pLDDT dict (381 entries) are loaded, then subset to aa 35–285 (251 residues, 251 pLDDT values) in Python before passing to library functions. The library sees only the 251-residue SCR1-4 window. Position indices in the analysis are 1-indexed relative to the subsetted sequence; the summary table back-maps each site to its full-sequence coordinate for readability.
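The subsetting and back-mapping step can be sketched as follows. This is a hypothetical helper under stated assumptions (1-indexed, inclusive window; pLDDT keyed by full-sequence position), not the pipeline's own code, and the inputs are placeholders rather than the real P08174 sequence or AlphaFold values.

```python
from typing import Dict


def subset_window(sequence: str, plddt: Dict[int, float], start: int, end: int):
    """Cut the 1-indexed, inclusive window [start, end] out of a full sequence.

    Returns (window_sequence, window_plddt, to_full), where window_plddt is
    re-keyed 1..len(window) and to_full maps a window position back to its
    full-sequence coordinate.
    """
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("window outside sequence")
    window_seq = sequence[start - 1:end]
    window_plddt = {pos - start + 1: plddt[pos] for pos in range(start, end + 1)}
    offset = start - 1

    def to_full(window_pos: int) -> int:
        return window_pos + offset

    return window_seq, window_plddt, to_full


# Placeholder inputs only: NOT the real P08174 sequence or AlphaFold pLDDT values.
full_seq = "M" + "A" * 380                      # 381 residues
full_plddt = {i: 90.0 for i in range(1, 382)}   # 381 entries

scr_seq, scr_plddt, to_full = subset_window(full_seq, full_plddt, 35, 285)
print(len(scr_seq), len(scr_plddt), to_full(1), to_full(len(scr_seq)))
```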

Single verdict: comp-006 computed three scopes (full sequence, mature protein, soluble ectodomain). comp-012 computes one scope — the SCR1-4 construct — which is the only engineering-relevant variant being tested.

Structural context

SCR1 (aa 35–96): pLDDT 85–98 — well-folded. Two conserved disulfide bonds (4 Cys: Cys36-Cys81, Cys65-Cys94 per UniProt P08174). Inter-SCR junction at aa 83–96 remains well-folded (pLDDT 89–96).

SCR2 (aa 97–160): pLDDT 90–98 — well-folded. Minor dip at aa 99–102 (~91) at the interdomain connection.

SCR3 (aa 161–222): pLDDT 97–98 — the most confidently modelled region. All recognition sites buried.

SCR4 (aa 223–285): pLDDT 91–98 — well-folded, with a minor reduction at aa 270–272 (~93–94). Terminates at aa 285; the disordered stalk begins at aa 286 (pLDDT drops to 74 at aa 286, 63 at aa 287, 50 at aa 288).

Stalk (aa 286–353), excluded: Largely disordered (pLDDT mostly 30–52; aa 286–287 are transitional at 74 and 63). This region accounted for all of the protease risk in comp-006. Not present in this construct.
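The truncation point can be located mechanically: scan forward through the per-residue pLDDT and stop at the first residue that drops below the buried threshold. This is a hypothetical helper, not necessarily how aa 285 was chosen; the example mixes the boundary values quoted above (aa 286–288) with illustrative SCR4 values for aa 283–285.

```python
def last_confident_residue(plddt, start, threshold=80.0):
    """Scan forward from `start` (1-indexed) and return the last position whose
    pLDDT is >= threshold before the first drop below it.

    Returns None if the residue at `start` is already below threshold.
    """
    last = None
    for pos in sorted(p for p in plddt if p >= start):
        if plddt[pos] < threshold:
            break
        last = pos
    return last


# aa 286-288 use the values quoted above; aa 283-285 are illustrative SCR4 values.
boundary = {283: 95.0, 284: 95.0, 285: 95.0, 286: 74.0, 287: 63.0, 288: 50.0}
print(last_confident_residue(boundary, 283))                   # 285 (buried threshold 80)
print(last_confident_residue(boundary, 283, threshold=65.0))   # 286
```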

Summary statistics (SCR1-4 only, 251 aa): mean pLDDT 96.7, min pLDDT 85.6, 100% residues pLDDT > 80, 98.8% pLDDT > 90.
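The summary line above can be reproduced from a per-residue pLDDT list with a few lines of standard-library Python. The helper name and toy values are illustrative, not the AlphaFold output; with 251 residues, 98.8% above 90 corresponds to 248 residues (248/251), leaving 3 at or below 90.

```python
from statistics import mean


def plddt_summary(values):
    """Summary statistics of the kind reported above for a list of pLDDT values."""
    n = len(values)
    return {
        "mean": mean(values),
        "min": min(values),
        "pct_gt_80": 100.0 * sum(v > 80 for v in values) / n,
        "pct_gt_90": 100.0 * sum(v > 90 for v in values) / n,
    }


# Toy input: 251 residues, 3 of them at or below 90 (illustrative values only).
toy = [95.0] * 248 + [88.0] * 3
stats = plddt_summary(toy)
print({k: round(v, 1) for k, v in stats.items()})
```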

Per-protease results

Protease	Sites in SCR1-4	Buried	Max risk score
ALP (alkaline subtilisin)	157	157	0.019
NPr (neutral metalloprotease)	60	60	0.039
acid_protease (aspergillopepsin)	25	25	0.020

Zero exposed or partially exposed sites across all three proteases. The max risk score is therefore set purely by buried-site risk: accessibility_weight (0.1) × effective protease activity (salt_residual_activity × ph_activity_factor). NPr drives the max at 0.039 because it has the highest salt tolerance (~39% residual activity at 17.5% NaCl, i.e. 0.388) versus ALP (19%) and acid_protease (19.5%).

0.039 is therefore the lowest max risk score this model can assign to any construct that contains NPr recognition sites: once every site is buried, the accessibility weight is already at its minimum (0.1), so no further structural improvement can lower it.
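Because every site is buried, each protease's max risk score collapses to a single product. A minimal sketch using the salt-residual activities quoted above and the conservative pH factor of 1.0 for ALP and NPr noted under Key limitations; the acid_protease pH factor of 1.0 is an assumption here (consistent with the table's 0.020), and the names are hypothetical.

```python
# Values quoted in this section; the acid_protease pH factor is an assumption (1.0).
SALT_RESIDUAL = {"ALP": 0.19, "NPr": 0.388, "acid_protease": 0.195}
PH_FACTOR = {"ALP": 1.0, "NPr": 1.0, "acid_protease": 1.0}
BURIED_WEIGHT = 0.1  # accessibility weight for pLDDT >= 80


def buried_ceiling(protease: str) -> float:
    """Highest risk score any buried site can reach for one protease."""
    return BURIED_WEIGHT * SALT_RESIDUAL[protease] * PH_FACTOR[protease]


ceilings = {p: buried_ceiling(p) for p in SALT_RESIDUAL}
worst = max(ceilings, key=ceilings.get)
print(worst, round(ceilings[worst], 3))
```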

Verdict interpretation

The LOW verdict is not surprising — it was structurally predicted by comp-006.

comp-006 established that the SCR1–4 domains contribute zero exposed sites. comp-012 confirms that a construct containing only those domains scores LOW. The 10-fold risk-score drop (0.388 → 0.039) maps directly to the stalk removal: 9 NPr exposed sites → 0, 48 ALP exposed sites → 0, 1 acid_protease exposed site → 0. The factor of 10 is exactly the ratio of the exposed and buried accessibility weights (1.0 / 0.1) applied to the same NPr effective activity (0.388).
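The arithmetic behind the drop, as a short worked example: if the worst comp-006 site is an exposed NPr site and the worst comp-012 site is a buried NPr site, the protease activity cancels and the fold change is just the weight ratio. That comp-006's 0.388 came from an exposed NPr site is an inference here, consistent with its worst-protease column.

```python
NPR_EFFECTIVE_ACTIVITY = 0.388  # salt residual x pH factor for NPr, as quoted on this page
WEIGHT = {"exposed": 1.0, "partial": 0.4, "buried": 0.1}

comp006_max = WEIGHT["exposed"] * NPR_EFFECTIVE_ACTIVITY  # assumed worst site: exposed stalk NPr site
comp012_max = WEIGHT["buried"] * NPR_EFFECTIVE_ACTIVITY   # worst site: buried SCR NPr site
fold_drop = comp006_max / comp012_max                     # activity cancels: 1.0 / 0.1

print(round(comp006_max, 3), round(comp012_max, 3), round(fold_drop, 6))
```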

Structural comparison to uricase (comp-001), the benchmark LOW:

Feature	Uricase (comp-001)	CD55 SCR1-4 (comp-012)
Construct length	301 aa	251 aa
Mean pLDDT	97.1	96.7
Min pLDDT	80.5	85.6
% pLDDT > 80	100%	100%
Exposed sites	0	0
Verdict	LOW (0.039 max)	LOW (0.039 max)

The SCR1-4 construct is comparable to uricase on every structural metric (slightly lower mean pLDDT, higher minimum pLDDT). The max risk scores are identical (both 0.039) because both constructs have only buried sites, and the NPr effective activity (0.388) sets the same buried-site ceiling under identical conditions.

Important caveat on the disulfide contribution: The SCR domains contain 8 conserved disulfide bonds (2 per domain, the canonical sushi/CCP fold per UniProt P08174 DISULFID annotations: Cys36-Cys81, Cys65-Cys94 [SCR1]; Cys98-Cys145, Cys129-Cys158 [SCR2]; Cys163-Cys204, Cys190-Cys220 [SCR3]; Cys225-Cys267, Cys253-Cys283 [SCR4]). This analysis ignores these cross-links and scores the chain as an uncrosslinked polypeptide, although disulfide cross-linking further reduces backbone flexibility and proteolytic accessibility beyond what pLDDT alone captures. The true risk in the SCR domains is therefore likely lower than 0.039 for any correctly folded construct, and the LOW verdict likely underestimates the protease resistance of a disulfide-intact SCR1-4 fragment.

The CP0 gap question

The computational feasibility gate is resolved in favor of the SCR1-4 construct. The remaining open questions before this becomes a real wet-lab proposal:

1. Does SCR1-4 without the stalk retain complement-regulatory activity?
Native DAF/CD55 requires SCR2–4 for C3b/C4b binding and C3 convertase decay-accelerating activity; SCR1 is not required for decay acceleration, and DAF has no factor I cofactor activity (that function belongs to CD46/MCP and CR1). The stalk is a GPI-anchor linker with no known enzymatic or binding activity. Soluble forms of related regulators (e.g., sCR1, the recombinant CR1 ectodomain without its transmembrane and cytoplasmic regions) retain complement-inhibitory function, suggesting SCR domains can function without a membrane tether or stalk. However, this is a mechanistic extrapolation: the specific CD55 SCR1-4 soluble fragment has not been validated functionally in a gut-lumen complement assay (Mechanistic Extrapolation).

2. Can A. oryzae fold 8 intrachain disulfide bonds in a secreted protein?
A. oryzae has protein disulfide isomerase (PDI) activity in its ER, and the organism secretes disulfide-containing proteins (e.g., glucoamylase, which has 2 disulfide bonds). Whether it can fold 8 intrachain disulfides in an SCR repeat array (2 per SCR × 4 SCRs) is unknown, although 8 falls between glucoamylase's 2 and lactoferrin's 17, a less demanding load than earlier framings implied. The question matters because misfolded SCR domains would be non-functional even if protease-stable. Small-scale expression in A. oryzae, with SDS-PAGE under reducing vs. non-reducing conditions, would answer it.

3. Is gut-lumen complement relevant to gout flare suppression?
The CP0 hypothesis (complement priming via C5a) is established in synovial fluid and joint tissue. Gut-lumen complement activation, and whether mucosal-expressed CD55 can suppress it meaningfully, is less characterized. Complement activity in the gut lumen is documented (particularly in inflammatory states), but whether gut-lumen DAF expression can modulate systemic C5a-mediated joint inflammation requires direct evidence. This is the fundamental CP0 mucosal-coverage question.

4. Expression titers and activity per gram of koji:
Even a correctly folded, protease-stable, functionally active SCR1-4 fragment needs to be expressed at levels sufficient for gut-lumen therapeutic effect. The ACS Syn Bio 2025 engineered lipase benchmark (365 μmol/h/OD in S. boulardii) sets a reference point for heterologous enzyme expression, but complement regulators are not enzymes — their functional unit is dose-dependent suppression, not catalytic turnover. Dosing calculations require knowing both expression titer and in vivo effective dose.

Key limitations

Disulfide bonds not modelled. The 8 conserved disulfide bonds in SCR1-4 (2 per domain, per UniProt P08174) are expected to reduce backbone flexibility and proteolytic susceptibility beyond what pLDDT captures. This analysis therefore likely overestimates the already-low risk in the SCR domains; the true risk for a correctly folded construct is probably lower than 0.039.
pLDDT ≠ solvent accessibility. All 242 SCR-domain sites are classified as buried (pLDDT ≥ 80), but SASA calculation on the AlphaFold structure would identify which surface loops are actually solvent-exposed at domain interfaces. Some "buried" sites may be accessible at the protein surface. A structure-based SASA analysis would refine this.
P1/P1' rules only. Extended subsite specificity (P2–P4) not modelled. May over-count recognition sites; extended context in the compact SCR fold may disfavor cleavage at many nominal sites.
ALP and NPr pH factors conservatively set to 1.0. ALP is outside its active range (pH 6–12) at shio-koji pH 4.5–5.0; NPr is at the lower edge. True activity is lower — risk is conservatively overstated. If corrected, max risk score would fall below 0.039.
CCP-regulatory activity not assessable in silico. Whether the stalk-truncated SCR1-4 construct inhibits C3b deposition, accelerates C3 convertase decay, or reduces C5a generation in a gut-lumen assay is entirely a wet-lab question. Structural protease stability is necessary but not sufficient for function.
Glycosylation not modelled. Native SCR domains carry N-linked glycosylation, and the stalk O-glycans are absent by design. A. oryzae glycosylation differs from human; the net effect on stability and function is unknown.

Next step: wet-lab proposal

The computational feasibility gate is cleared. The logical next step is a minimal wet-lab expression test:

Expression screen: Clone DAF/CD55 SCR1-4 (aa 35–285) with an A. oryzae α-amylase signal peptide (Ward et al. 1995) into an A. oryzae expression vector (e.g., a pANe-based system). Express under standard koji fermentation conditions. Assess: (a) secretion (SDS-PAGE of culture supernatant), (b) correct folding (mobility shift between non-reducing and reducing SDS-PAGE, indicating intrachain disulfide formation), (c) protease stability in shio-koji-conditioned medium over 7–14 days.

Complexity: Medium. Standard A. oryzae transformation protocols are well established; the mature-region coding insert is ~753 bp (251 codons, excluding signal peptide and stop codon) with no unusual features beyond the 8 disulfide bonds (2 per SCR × 4 SCRs, per UniProt P08174). The disulfide folding question is the primary unknown.
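The insert size follows from codon arithmetic. A minimal sketch; the function name is hypothetical, the signal-peptide length is left as a parameter because this page does not state it, and cloning-site and vector sequence are not counted.

```python
def coding_length_bp(mature_residues: int, signal_peptide_residues: int = 0,
                     include_stop: bool = True) -> int:
    """Nucleotide length of an ORF: 3 bp per codon, optional signal peptide and stop codon."""
    codons = mature_residues + signal_peptide_residues + (1 if include_stop else 0)
    return 3 * codons


scr14 = 285 - 35 + 1  # 251 residues, aa 35-285 inclusive
print(coding_length_bp(scr14, include_stop=False))  # 753 (mature region only)
print(coding_length_bp(scr14))                      # 756 (plus stop codon)
```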

Collaboration context: comp-006 cross-references Role 1 (in-vivo / immunology) and Role 2 (pharma-translation / epithelial biology) collaborator pathways for functional complement assays. Expression screen could be a Phase 0 co-experiment if collaboration is established.

Formalized as validation-experiments.md §1.25 (2026-05-06). The full wet-lab protocol covers construct design; host strain selection (RIB40 first; NSlD-ΔP10 is unlikely to be needed for a single cassette, per the Item 10 chaperone framework refinement); readouts (reducing and non-reducing SDS-PAGE, anti-DAF Western, anti-DAF ELISA, mass-spec disulfide pairing, zymosan C5a inhibition, native metabolite carryover); success criteria (≥50 mg/L titer, ≥40% native folded form, ≥30% C5a inhibition); cost ($2.5-4K); timeline (6-8 weeks); and Accept/Iterate/Reject thresholds. Routing is explicitly single-cassette because the triple-cassette prediction in chaperone-orthogonal-stacking.md §5.5 lands below the 0.6 decision gate (architecture refined 2026-05-06).
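The three success criteria can be encoded as a simple pass check. A sketch under stated assumptions: the class and field names are hypothetical, percentages are expressed as fractions, and the Accept/Iterate/Reject thresholds from validation-experiments.md §1.25 are not reproduced here, so this checks only the success criteria.

```python
from dataclasses import dataclass


@dataclass
class ExpressionReadout:
    """Hypothetical container for the three success-criterion readouts."""
    titer_mg_per_l: float          # secreted titer
    native_folded_fraction: float  # 0-1, share of protein in the native folded form
    c5a_inhibition: float          # 0-1, fractional C5a inhibition (zymosan assay)


# Success criteria as stated on this page.
SUCCESS_CRITERIA = {
    "titer_mg_per_l": 50.0,
    "native_folded_fraction": 0.40,
    "c5a_inhibition": 0.30,
}


def unmet_criteria(readout: ExpressionReadout) -> list:
    """Return the names of success criteria the readout falls short of."""
    return [name for name, minimum in SUCCESS_CRITERIA.items()
            if getattr(readout, name) < minimum]


# Illustrative numbers only, not experimental data.
trial = ExpressionReadout(titer_mg_per_l=62.0, native_folded_fraction=0.35, c5a_inhibition=0.41)
print(unmet_criteria(trial) or "all success criteria met")
```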

Open follow-up — RFdiffusion + ProteinMPNN tool-stack integration (added 2026-05-15)

The DAF SCR1-4 truncated construct is a redesign problem: it modifies the full ectodomain by dropping the Ser/Thr-rich stalk (aa 286–353) and the C-terminal GPI-anchor signal while preserving native 8-disulfide folding in aa 35–285. Per the BioDesignBench-validated tool stack (Kim & Romero 2026; see bio-ai-tools.md §BioDesignBench), the canonical computational tools for this class of work are RFdiffusion (diffusion-based backbone generation and motif scaffolding) and ProteinMPNN (structure-conditioned sequence design, here to optimize the truncation interface). Neither is currently in OE's computational stack; both are available via the protein-design-mcp package (github.com/jasonkim8652/protein-design-mcp, MIT licence, Docker images for reproducible deployment).

Fires when: comp-006 / §1.25 wet-lab work surfaces a folding-yield problem that the current ESM2 + Boltz-2 verdict can't resolve, OR a v2 of this comp-NNN is needed to refine the construct design before the wet-lab gate. Until then, dormant: the current computational verdict (LOW protease risk, i.e. high predicted protease stability) gates wet-lab work directly without further design iteration.

Phase 2 integration tasks (if triggered):

Deploy protein-design-mcp as a sibling MCP server alongside abent-somm / pubmed / paperclip.
Re-run the DAF SCR1-4 truncated construct through RFdiffusion (backbone scaffold) + ProteinMPNN (sequence optimization) + AlphaFold3 (re-verification).
Cross-check against the current ESM2 + Boltz-2 verdict; if multi-method concordance holds, increase confidence in the construct; if discordant, surface as a decision item before wet-lab spend.
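Concordance in the last task is not defined further on this page. One minimal sketch, assuming each method yields a per-residue, pLDDT-like score on a 0–100 scale (an assumption; not every tool in the stack emits such a score directly), is to band each recognition site with the thresholds from the Method section and list the sites whose band differs; an empty list would count as site-level concordance. Function names and toy scores are hypothetical.

```python
def accessibility_band(score: float) -> str:
    """Band a pLDDT-like score with the thresholds from the Method section."""
    if score >= 80:
        return "buried"
    if score >= 65:
        return "partial"
    return "exposed"


def discordant_sites(sites, scores_a, scores_b):
    """Recognition-site positions whose accessibility band differs between two predictions."""
    return [pos for pos in sites
            if accessibility_band(scores_a[pos]) != accessibility_band(scores_b[pos])]


# Toy per-residue scores for two hypothetical predictions (not real model output).
method_a = {10: 92.0, 20: 81.0, 30: 70.0}
method_b = {10: 90.0, 20: 78.0, 30: 66.0}
print(discordant_sites([10, 20, 30], method_a, method_b))
```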

Cross-references

daf-cd55-protease-stability-computational.md — comp-006; the full ectodomain HIGH result that motivated this analysis
modality-chokepoint-matrix.md — "Engineered soluble complement regulators" row; CP0 platform gap
complement-c5a-gout.md — CP0 chokepoint; why complement regulation matters for gout
uricase-protease-stability-computational.md — comp-001; the LOW-verdict structural benchmark
lactoferrin-protease-stability-computational.md — comp-005; the HIGH/MODERATE comparison
computational-experiments.md — tracking index
engineered-koji-protocol.md — A. oryzae expression context
validation-experiments.md §1.25 — formalized wet-lab gate experiment (2026-05-06)
chaperone-orthogonal-stacking.md §5.5 — triple-cassette prediction (architecture-refined 2026-05-06) that motivated single-cassette routing for §1.25
hypotheses/H05-daf-scr14-cp0-thesis.md — falsification card; §1.25 readouts directly address H05's three named wet-lab unknowns
daf-cd55-scr14-cassette-ranking-computational.md — comp-030 (2026-05-15); exhaustive cassette ranking (43,200 candidates); top cluster = PamyB + SPamyB + cai_max + direct-secretion; α-coefficient CORROBORATED (pLDDT mean 88.8, 100% above 80)