comp-037: C1-INH (SERPING1) Protease Stability + Glycosylation Feasibility in EcN LBP
Question: Would human C1-INH (SERPING1, UniProt P05155) survive the colonic-luminal protease environment if expressed as a secreted luminal payload of a live biotherapeutic product (LBP) built on engineered E. coli Nissle 1917 (EcN)? And does the heavy native glycosylation (~26 kDa glycan on ~52 kDa polypeptide), which EcN cannot reproduce, preclude functional inhibitor activity?
Verdict: MODERATE (kinetic-competition gated)
Three independent verdicts, by scope and axis:
| Scope / axis | Verdict | Driver |
|---|---|---|
| Serpin core, strictly-degradative (non-RCL, aa 123-451 + 468-500) | LOW (0.1) | Once mucin domain (aa 23-119) truncated, serpin body is protease-resistant; only buried OmpT di-basic sites remain |
| Serpin core, including RCL (aa 123-500) | RED (0.8) | DegP and elastase have 9 + 11 exposed sites in the RCL (aa 452-467), but RCL exposure is by design (serpin suicide-substrate mechanism) |
| Glycosylation feasibility, serpin core | GREEN | EcN cannot N-glycosylate; for luminal-secreted topology, plasma half-life concern is moot; serpin catalytic mechanism is encoded in polypeptide, not glycan |
The headline finding: strictly-degradative protease risk on the serpin body is LOW after mucin truncation. The remaining risk is kinetic competition at the RCL — does C1s / C1r / MASP-2 productively engage the reactive bond before DegP / elastase cleaves it unproductively? That is a wet-lab assay question (compare k_C1s_inhibition vs k_DegP_RCL_cleavage on the recombinant construct), not something the pLDDT-based pipeline can resolve.
The combined verdict is MODERATE, not RED, because:
- The strictly-degradative scope is LOW — the serpin body folds and persists.
- The glycosylation issue is GREEN for luminal topology — N-glycans not required for the inhibitor mechanism.
- The remaining RCL kinetic-competition question is gated by wet-lab data, not by additional in silico modeling.
A naive read of "RED at 0.8" would attribute the risk to the wrong axis (degradation) when the actual concern overlaps with the inhibitor mechanism itself (RCL engagement vs. RCL degradation).
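The kinetic-competition question can be framed as a partition between two parallel routes that consume the intact RCL. The sketch below is illustrative only: it assumes both proteases are in excess over the serpin (so each route is pseudo-first-order) and that both routes are irreversible. The function name is hypothetical, and the rate constants and concentrations are placeholders chosen to exercise the code, not measurements.

```python
def productive_fraction(k_inhibit, target_conc, k_cleave, offtarget_conc):
    """Fraction of serpin consumed via the productive (trapping) route.

    Assumes two parallel, irreversible, pseudo-first-order routes:
      productive:   rate = k_inhibit * [target protease]
      unproductive: rate = k_cleave  * [off-target protease]
    Second-order rate constants in M^-1 s^-1, concentrations in M.
    """
    productive = k_inhibit * target_conc
    unproductive = k_cleave * offtarget_conc
    total = productive + unproductive
    if total <= 0:
        raise ValueError("at least one route must have a positive rate")
    return productive / total


# Placeholder values only (NOT measured constants for C1s or DegP).
frac = productive_fraction(k_inhibit=1e4, target_conc=1e-8,
                           k_cleave=1e3, offtarget_conc=1e-7)
print(f"productive fraction: {frac:.2f}")
```

With real values of k_C1s_inhibition and k_DegP_RCL_cleavage from the wet-lab assay, the same ratio would indicate which side of the competition dominates at a given protease mix.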
Informs:
- wiki/c1-inh-protease-stability-ecn-computational.md: interpretive wiki page
- wiki/complement-c5a-gout.md §9.8: two-chassis CP0 architecture (C1-INH on EcN + DAF SCR1-4 on koji)
- wiki/engineered-lbp-chassis.md: LBP chassis peer track
Sister experiments:
- comp-006: DAF/CD55 full ectodomain in shio-koji (HIGH, stalk-driven)
- comp-012: DAF/CD55 SCR1-4 truncated in shio-koji (LOW; stalk truncation rescued the verdict)
- comp-024: complestatin BGC on EcN/Bacteroides (RED for LBP track); the same study returned GREEN-provisional 0.774 for the C1-INH-on-EcN comparator, which this comp-037 protease-stability analysis substantiates
How to reproduce
Run analyze.py. No external packages are required (stdlib only: json, pathlib, random). Outputs land in outputs/.
The core algorithm lives in experiments/lib/protease_stability.py, shared with comp-001 (uricase), comp-005 (lactoferrin), comp-006 (DAF/CD55 full), and comp-012 (DAF/CD55 SCR1-4). analyze.py is the orchestrator; the library exports functions only. The run is deterministic (seed 20260517).
File index
comp-037-c1-inh-protease-stability-ecn/
    analyze.py ← analysis script (run this)
    inputs/
        P05155.fasta ← human C1-INH sequence (UniProt P05155, 500 aa)
        alphafold_P05155_plddt.json ← per-residue pLDDT scores (AF-P05155-F1-v6)
        protease_specificities.json ← EcN-luminal protease panel (trypsin / chymotrypsin / elastase / OmpT / DegP) + colonic-lumen conditions
        provenance.md ← sources, fetch dates, citations for every input, all features grep-verified against UniProt P05155 SV=2
    outputs/ ← generated by analyze.py; committed as artifacts
        cleavage_sites.json ← machine-readable full results
        summary.md ← human-readable; cited in the wiki
    README.md ← this file
Key results
| Protease | Total sites | Exposed sites (mature) | Exposed sites (serpin core, non-RCL) | Exposed sites (RCL only) | Effective activity (pH × salt) |
|---|---|---|---|---|---|
| trypsin (S01.151) | 43 | 6 | 0 | 0 | 60% |
| chymotrypsin (S01.001) | 112 | 17 | 0 | 0 | 60% |
| elastase (S01.153) | 267 | 86 | 0 | 11 | 50% |
| OmpT (A26.001) | 3 | 0 | 0 | 0 | 100% |
| DegP / HtrA (S01.273) | 178 | 39 | 0 | 9 | 80% |
After mucin truncation (start at aa 123 = first disulfide Cys), the serpin core has zero exposed sites outside the RCL across all five proteases. Inside the RCL: DegP 9 exposed, elastase 11 exposed — these are the residues that present as substrate for the suicide-mechanism in vivo, and the residues that off-target proteases can cleave unproductively in the EcN luminal environment.
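The per-region counts in the table can be pictured as a scan over P1 residues, followed by an exposure filter. The following is a minimal sketch under stated assumptions, not the code in experiments/lib/protease_stability.py: it treats low pLDDT as the only proxy for exposure, uses an assumed cutoff of 70, and reduces each protease to a bare P1 residue set. Only the DegP set comes from this README; the trypsin set is the textbook K/R preference. The sequence and pLDDT values are a toy example, not C1-INH.

```python
# Simplified P1 preference sets (assumption: P1-only matching, no subsite rules).
P1_SETS = {
    "trypsin": set("KR"),
    "DegP": set("VILFYA"),
}
EXPOSURE_PLDDT_CUTOFF = 70.0  # assumed cutoff, not the pipeline's actual value


def count_sites(seq, plddt, p1_set, start, end):
    """Return (total, exposed) P1 sites in residues start..end (1-indexed, inclusive)."""
    total = exposed = 0
    for pos in range(start, end + 1):
        if seq[pos - 1] in p1_set:
            total += 1
            # A site counts as exposed when the local model confidence is low.
            if plddt[pos - 1] < EXPOSURE_PLDDT_CUTOFF:
                exposed += 1
    return total, exposed


# Toy 12-residue example (NOT the C1-INH sequence or its pLDDT profile).
toy_seq = "MKVAGRLLKSAV"
toy_plddt = [35, 40, 45, 50, 88, 90, 92, 91, 89, 60, 55, 50]
for name, p1 in P1_SETS.items():
    print(name, count_sites(toy_seq, toy_plddt, p1, 1, len(toy_seq)))
```

Calling `count_sites` with different `start` and `end` bounds (mature chain, serpin core outside the RCL, RCL only) would give one column of the table per call.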
Structural interpretation
C1-INH has three structurally distinct zones:
- Mucin-like N-terminal domain (aa 23-119): pLDDT 30-55, fully disordered. UniProt annotates two MobiDB-lite REGION entries here (aa 20-43, aa 65-118) and 7 × 4-aa [QE]-P-T-[TQ] tandem repeats at aa 85-119. In vivo the domain is heavily O-glycosylated (8 confirmed GalNAc sites at T47/T48/S64/T71/T83/T88/T92/T96 per UniProt CARBOHYD features). In EcN it is completely unglycosylated, which makes it a high-accessibility protease target. The recommended construct truncates this entire region.
- Serpin core (aa ~123-450): canonical serpin β-sheet + α-helix fold. AlphaFold pLDDT is 80-95 across most of the region. Two disulfide bonds (C123-C428 and C130-C205) anchor the fold; the construct boundary is set at C123 to preserve this anchor. The body carries three canonical N-glycan sites (N238, N253, N352) plus N272, which is present in variant TA only. In EcN all are unglycosylated, but the body fold does not strictly require N-glycans (a folding-quality-control concern only, addressable with EcN periplasmic DsbA/DsbC).
- Reactive-center loop (RCL, aa ~452-467): exposed and flexible, ending at the P1-P1' reactive bond R466-T467 (UniProt SITE "Reactive bond" + "Cleavage; by C1S"). By design it must be cleavable by the target proteases (C1r, C1s, MASP-2). The serpin suicide mechanism: the target protease attacks the R466-T467 scissile bond (P1 is Arg); its active-site Ser forms a covalent acyl-enzyme intermediate with the RCL; the serpin then distorts and traps the protease. The 9 DegP and 11 elastase exposed RCL sites can instead be cleaved unproductively, which is the kinetic-competition question.
Disagreement protocol
If you reproduce the outputs and disagree with the methods or numbers, file a GitHub issue referencing this folder (comp-037-c1-inh-protease-stability-ecn). Primary candidates for revision:
- Serpin-core boundary. Set at aa 123 (first disulfide Cys). aa 23 (first residue after the signal peptide, aa 1-22) or aa 119 (end of UniProt-annotated disordered REGION) are defensible alternatives. The aa 122-123 transition is supported by pLDDT trajectory (43.3 → 57.9 → 77.6 → 84.9) and by disulfide-anchor logic. Truncation choice is a load-bearing engineering parameter.
- Protease panel choice. Trypsin / chymotrypsin / elastase / OmpT / DegP are the dominant identifiable risks; commensal-bacterial proteases (Bacteroides, Clostridia) add diffuse risk not modeled here. If the EcN colonization niche shifts the relevant protease mix (e.g., mucus-layer-residence vs. lumen-residence), the panel should be expanded.
- pH and salt factors. Colonic pH 6-7 is well-established. Bile-acid exposure not modeled as a protease-activity modifier.
- RCL kinetic-competition treatment. This pipeline cannot model k_C1s_inhibition vs. k_DegP_RCL_cleavage. The wet-lab gate is a co-incubation assay: recombinant serpin-core construct + C1s + DegP at colonic-relevant ratios, measuring the productive trapped C1s-C1-INH complex against unproductive RCL-cleaved free C1-INH fragments.
- UniProt feature numbers. All disulfide, glycosylation, and reactive-bond positions were grep-verified against UniProt P05155 SV=2 text-format flatfile, fetched 2026-05-17. If UniProt updates feature annotations, the inputs should be refreshed and analyze.py re-run.
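For anyone re-examining the serpin-core boundary, the pLDDT argument can be checked mechanically by looking for the first sustained run of confident residues. This is a sketch under assumptions: the cutoff of 70 and the window of 3 are illustrative choices, the function name is hypothetical, and the toy list reuses the four trajectory values quoted above with made-up flanking values, so its indices do not correspond to C1-INH residue numbers.

```python
def first_ordered_residue(plddt, cutoff=70.0, window=3):
    """Return the 1-indexed position that starts the first run of `window`
    consecutive residues with pLDDT >= cutoff, or None if there is no such run."""
    run = 0
    for i, score in enumerate(plddt):
        run = run + 1 if score >= cutoff else 0
        if run == window:
            # i is the 0-indexed end of the run; convert its start to 1-indexed.
            return i - window + 2
    return None


# Toy trajectory: middle four values from the README, flanks invented.
toy_plddt = [38.0, 43.3, 57.9, 77.6, 84.9, 86.1]
print(first_ordered_residue(toy_plddt))
```

Requiring a short run rather than a single residue keeps one isolated high-confidence residue inside a disordered stretch from being mistaken for the fold boundary.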
Provenance & verification
All load-bearing numbers grep-verified against primary source before commit (per Open Enzyme CLAUDE.md Rule 4):
| Claim | Primary source | Verification |
|---|---|---|
| Sequence length 500 aa | UniProt P05155 SV=2 | Asserted in analyze.py main() |
| Signal peptide aa 1-22 | UniProt FT SIGNAL 1..22 | Direct grep against UniProt text format |
| Mature chain aa 23-500 | UniProt FT CHAIN 23..500 | Direct grep |
| Reactive bond R466-T467 | UniProt FT SITE 465..466 (chymotrypsin reactive bond), SITE 466..467 (C1S cleavage) | Direct grep + asserted in analyze.py main() (seq[465]='R', seq[466]='T') |
| Disulfide C123-C428 | UniProt FT DISULFID 123..428 | Direct grep + asserted in analyze.py main() (seq[122]='C', seq[427]='C') |
| Disulfide C130-C205 | UniProt FT DISULFID 130..205 | Direct grep + asserted in analyze.py main() |
| 2 disulfide bonds total (not the 6-12 sometimes quoted in casual lit) | UniProt FT DISULFID (exactly two entries) | grep -c verified |
| N-glycan sites N25, N69, N81, N238, N253, N272-variant, N352 | UniProt FT CARBOHYD N-linked | Direct grep; sequon N-X-(S/T) verified by sequence inspection (analyze.py provenance.md) |
| O-glycan sites T47, T48, S64, T71, T83, T88, T92, T96 | UniProt FT CARBOHYD O-linked | Direct grep |
| C1-INH ~26 kDa glycan on ~52 kDa polypeptide | Bos 1998 PMID 9799502; Stavenhagen 2018 PMID 29381136 | Cited in summary.md and provenance.md (PMID-anchored) |
| DegP P1 preference V/I/L/F/Y/A | Krojer 2008 PMID 18261546 | Cited in protease_specificities.json + provenance.md |
| OmpT di-basic P1-P1' (K/R↓K/R, cleavage between two adjacent basic residues) | Dekker 2001 PMID 11226160; Hwang 2007 PMID 17263510 | Cited in protease_specificities.json + provenance.md |
The DAF SCR1-4 disulfide-count incident (2026-05-06) is the canonical case this verification gate exists to prevent. In that incident a subagent confidently asserted "12 disulfides per SCR1-4 → 12 total" when UniProt P08174 has exactly 8 DISULFID features. Here every disulfide and glycosylation position is grep-pinned to UniProt before being written into the analysis. C1-INH has exactly 2 disulfides (verified), not 6 or 12.
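The position checks listed in the table can be sketched as plain assertions on the parsed sequence. This is an illustration of the idea, not the contents of analyze.py: the function names are hypothetical, the sequon pattern adds the usual "X is not Pro" restriction (an assumption beyond the N-X-(S/T) wording above), and the demo runs on a synthetic poly-Ala sequence with only the checked residues filled in.

```python
import re


def read_fasta(text):
    """Return the concatenated sequence from single-record FASTA text."""
    return "".join(line.strip() for line in text.splitlines()
                   if line and not line.startswith(">"))


def check_features(seq):
    """Raise AssertionError if seq disagrees with the positions in the table."""
    assert len(seq) == 500, f"expected 500 aa, got {len(seq)}"
    # 0-indexed: seq[465] is residue 466, seq[466] is residue 467.
    assert seq[465] == "R" and seq[466] == "T", "reactive bond R466-T467"
    for a, b in [(123, 428), (130, 205)]:
        assert seq[a - 1] == "C" and seq[b - 1] == "C", f"disulfide C{a}-C{b}"


def sequon_positions(seq):
    """1-indexed Asn positions matching N-X-(S/T) with X != Pro (overlaps allowed)."""
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]


# Synthetic stand-in (NOT the real P05155 sequence): poly-Ala plus checked residues.
toy = ["A"] * 500
for pos, aa in {123: "C", 130: "C", 205: "C", 428: "C", 466: "R", 467: "T"}.items():
    toy[pos - 1] = aa
toy_seq = read_fasta(">toy\n" + "".join(toy))
check_features(toy_seq)
print(sequon_positions("ANGSANPTNAT"))
```

Against the real input, the same checks would be applied to the text of inputs/P05155.fasta, and the sequon positions compared with the N-linked CARBOHYD positions in the table.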