
comp-037: C1-INH (SERPING1) Protease Stability + Glycosylation Feasibility in EcN LBP

Question: Would human C1-INH (SERPING1, UniProt P05155) survive the colonic-luminal protease environment if expressed as a secreted LBP-luminal payload in engineered E. coli Nissle 1917 (EcN)? And does the heavy native glycosylation (~26 kDa glycan on ~52 kDa polypeptide) — which EcN cannot reproduce — preclude functional inhibitor activity?

Verdict: MODERATE (kinetic-competition gated)

Three independent verdicts, by scope and axis:

| Scope / axis | Verdict | Driver |
|---|---|---|
| Serpin core, strictly degradative (non-RCL, aa 123-451 + 468-500) | LOW (0.1) | Once the mucin domain (aa 23-119) is truncated, the serpin body is protease-resistant; only buried OmpT di-basic sites remain |
| Serpin core, including RCL (aa 123-500) | RED (0.8) | DegP and elastase have 9 and 11 exposed sites, respectively, in the RCL (aa 452-467); RCL exposure is by design (serpin suicide-substrate mechanism) |
| Glycosylation feasibility, serpin core | GREEN | EcN cannot N-glycosylate; for luminal-secreted topology the plasma half-life concern is moot; the serpin inhibitory mechanism is encoded in the polypeptide, not the glycan |

The headline finding: strictly-degradative protease risk on the serpin body is LOW after mucin truncation. The remaining risk is kinetic competition at the RCL — does C1s / C1r / MASP-2 productively engage the reactive bond before DegP / elastase cleaves it unproductively? That is a wet-lab assay question (compare k_C1s_inhibition vs k_DegP_RCL_cleavage on the recombinant construct), not something the pLDDT-based pipeline can resolve.
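One rough way to frame the wet-lab question is to treat the RCL competition as two parallel pseudo-first-order branches acting on the same intact loop. The sketch below illustrates that framing only and is not part of analyze.py. It rests on three assumptions:

- proteases are in excess over C1-INH;
- any off-target cut in the RCL inactivates the inhibitor;
- each branch's observed rate is k_app × [protease].

The function name `productive_fraction` and all numbers are hypothetical placeholders, not measured constants.

```python
def productive_fraction(target, off_target):
    """Share of RCL-intact C1-INH captured by target proteases.

    target, off_target: dicts mapping a protease name to (k_app, concentration).
    Assumes pseudo-first-order kinetics in C1-INH (proteases in excess), so each
    branch's share of the outcome is proportional to its observed rate k_app*[E].
    Assumes every off-target RCL cleavage is unproductive (inactivating).
    """
    rate_target = sum(k * conc for k, conc in target.values())
    rate_off = sum(k * conc for k, conc in off_target.values())
    total = rate_target + rate_off
    if total <= 0:
        raise ValueError("at least one branch needs a positive rate")
    return rate_target / total


# Hypothetical placeholder values in arbitrary but consistent units.
share = productive_fraction(
    target={"C1s": (3.0, 1.0)},
    off_target={"DegP": (0.5, 1.0), "elastase": (0.5, 1.0)},
)
print(f"productive share: {share:.2f}")  # 3.0 / (3.0 + 1.0) = 0.75
```

In this framing, the co-incubation assay measures the partition directly, so measured values would replace the placeholders.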

The combined verdict is MODERATE, not RED, because:

  1. The strictly-degradative scope is LOW — the serpin body folds and persists.
  2. The glycosylation issue is GREEN for luminal topology — N-glycans not required for the inhibitor mechanism.
  3. The remaining RCL kinetic-competition question is gated by wet-lab data, not by additional in silico modeling.

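A naive read of "RED at 0.8" would misattribute the risk to the wrong axis (degradation of the inhibitor body). The actual concern is a competition between two processes at the same loop: productive RCL engagement by the target protease vs. unproductive RCL cleavage by off-target proteases.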

Informs:

  • wiki/c1-inh-protease-stability-ecn-computational.md: interpretive wiki page
  • wiki/complement-c5a-gout.md §9.8: two-chassis CP0 architecture (C1-INH on EcN + DAF SCR1-4 on koji)
  • wiki/engineered-lbp-chassis.md: LBP chassis peer track

Sister experiments:

  • comp-006: DAF/CD55 full ectodomain in shio-koji (HIGH, stalk-driven)
  • comp-012: DAF/CD55 SCR1-4 truncated in shio-koji (LOW; stalk truncation rescued the verdict)
  • comp-024: complestatin BGC on EcN/Bacteroides (RED for the LBP track). The same study returned GREEN-provisional 0.774 for the C1-INH-on-EcN comparator, which this comp-037 protease-stability analysis substantiates.


How to reproduce

```bash
cd experiments/comp-037-c1-inh-protease-stability-ecn
python3 analyze.py
```

No external packages required (stdlib only: json, pathlib, random). Outputs land in outputs/.

The core algorithm lives in experiments/lib/protease_stability.py — shared with comp-001 (uricase), comp-005 (lactoferrin), comp-006 (DAF/CD55 full), comp-012 (DAF/CD55 SCR1-4). This script is the orchestrator; the library exports functions only. Deterministic seed (20260517).
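The library's actual API is not reproduced here. As a hypothetical illustration of the kind of P1-rule site enumeration such a pipeline performs, the sketch below scans a sequence for candidate scissile bonds. The rule table is an assumption made for illustration:

- trypsin uses the common "K/R, not before Pro" simplification;
- DegP uses the V/I/L/F/Y/A P1 preference cited in the provenance table.

The real rules live in inputs/protease_specificities.json and may encode more context.

```python
# Hypothetical P1-only rules: (allowed P1 residues, skip if P1' is Pro).
P1_RULES = {
    "trypsin": ("KR", True),
    "DegP": ("VILFYA", False),
}


def cleavage_sites(seq, p1_residues, block_pro=False):
    """Return 1-based P1 positions of candidate scissile bonds (P1-P1')."""
    sites = []
    for i in range(len(seq) - 1):  # the last residue has no P1' partner
        if seq[i] not in p1_residues:
            continue
        if block_pro and seq[i + 1] == "P":
            continue
        sites.append(i + 1)
    return sites


demo = "AKPRGV"
for name, (p1, block_pro) in P1_RULES.items():
    print(name, cleavage_sites(demo, p1, block_pro))
# trypsin [4]   (K2 is skipped because its P1' is Pro)
# DegP [1]      (V6 is C-terminal, so no bond follows it)
```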


File index

```
comp-037-c1-inh-protease-stability-ecn/
  analyze.py                          ← analysis script (run this)
  inputs/
    P05155.fasta                      ← human C1-INH sequence (UniProt P05155, 500 aa)
    alphafold_P05155_plddt.json       ← per-residue pLDDT scores (AF-P05155-F1-v6)
    protease_specificities.json       ← EcN-luminal protease panel (trypsin / chymotrypsin / elastase / OmpT / DegP) + colonic-lumen conditions
    provenance.md                     ← sources, fetch dates, citations for every input, all features grep-verified against UniProt P05155 SV=2
  outputs/                            ← generated by analyze.py; committed as artifacts
    cleavage_sites.json               ← machine-readable full results
    summary.md                        ← human-readable; cited in the wiki
  README.md                           ← this file
```

Key results

| Protease | Total sites | Exposed sites (mature) | Exposed sites (serpin core, non-RCL) | Exposed sites (RCL only) | Effective activity (pH × salt) |
|---|---|---|---|---|---|
| trypsin (S01.151) | 43 | 6 | 0 | 0 | 60% |
| chymotrypsin (S01.001) | 112 | 17 | 0 | 0 | 60% |
| elastase (S01.153) | 267 | 86 | 0 | 11 | 50% |
| OmpT (A26.001) | 3 | 0 | 0 | 0 | 100% |
| DegP / HtrA (S01.273) | 178 | 39 | 0 | 9 | 80% |

After mucin truncation (start at aa 123, the first disulfide Cys), the serpin core has zero exposed sites outside the RCL across all five proteases. Inside the RCL, DegP has 9 exposed sites and elastase has 11. These sites sit in the loop that must stay exposed for the suicide mechanism to work in vivo. They are also exactly where off-target proteases can cleave unproductively in the EcN luminal environment.
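The scope split above can be sketched as a position-to-zone lookup followed by a tally of exposed sites. This is a minimal illustration, not the pipeline's code. The zone boundaries come from this README (mucin aa 23-119, serpin core from aa 123, RCL aa 452-467). Two choices are assumptions made here for illustration: calling a site "exposed" when its P1 pLDDT falls below a cutoff, and setting that cutoff at 70.

```python
MUCIN = range(23, 120)       # aa 23-119 (mucin-like domain, per this README)
RCL = range(452, 468)        # aa 452-467 (reactive-center loop)
CORE_START = 123             # construct boundary: first disulfide Cys
PLDDT_CUTOFF = 70.0          # hypothetical exposure cutoff, not from the pipeline


def zone_of(pos):
    """Map a 1-based residue position to a scope used in the results table."""
    if pos in RCL:
        return "rcl"
    if pos >= CORE_START:
        return "core_non_rcl"
    if pos in MUCIN:
        return "mucin"
    return "other"  # signal peptide (1-22) or linker (120-122)


def tally_exposed(sites, plddt, cutoff=PLDDT_CUTOFF):
    """Count sites whose P1 residue has pLDDT below `cutoff`, per zone.

    sites: 1-based P1 positions; plddt[i] is the score of residue i + 1.
    """
    counts = {"mucin": 0, "core_non_rcl": 0, "rcl": 0, "other": 0}
    for pos in sites:
        if plddt[pos - 1] < cutoff:
            counts[zone_of(pos)] += 1
    return counts


# Toy input: uniform high confidence except three low-pLDDT positions.
toy_plddt = [90.0] * 500
for low in (30, 460, 121):
    toy_plddt[low - 1] = 40.0
print(tally_exposed([30, 200, 460, 121], toy_plddt))
# {'mucin': 1, 'core_non_rcl': 0, 'rcl': 1, 'other': 1}
```

For the truncated construct, only the `core_non_rcl` and `rcl` counts are relevant.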


Structural interpretation

C1-INH has three structurally distinct zones:

  1. Mucin-like N-terminal domain (aa 23-119): pLDDT 30-55, fully disordered. UniProt annotates two MobiDB-lite REGION entries here (aa 20-43, aa 65-118) and 7 × 4-aa [QE]-P-T-[TQ] tandem repeats at aa 85-119. In vivo, heavily O-glycosylated (8 confirmed GalNAc sites at T47/T48/S64/T71/T83/T88/T92/T96 per UniProt CARBOHYD features). In EcN, completely unglycosylated → high-accessibility protease target. Recommended construct truncates this entire region.

  2. Serpin core (aa ~123-450): canonical serpin β-sheet + α-helix fold. AlphaFold pLDDT is 80-95 across most of the region. Two disulfide bonds (C123-C428 and C130-C205) anchor the fold; the construct boundary is set at C123 to preserve this anchor. The body carries three N-glycan sites (N238, N253, N352 canonical, plus N272 in variant TA only); in EcN all are unglycosylated, but the body fold does not strictly require N-glycans. The residual concern is folding quality control, which the EcN periplasmic oxidoreductases DsbA/DsbC can address for the two disulfides.

  3. Reactive-center loop (RCL, aa ~452-467): exposed and flexible, ending at the P1-P1' reactive bond R466-T467 (UniProt SITE "Reactive bond" + "Cleavage; by C1S"). By design, the loop must be cleavable by the target proteases (C1r, C1s, MASP-2). In the serpin suicide mechanism, the target protease's active-site Ser attacks the R466-T467 scissile bond and forms a covalent acyl-enzyme intermediate with the RCL. The cleaved RCL then inserts into the serpin's central β-sheet, which distorts the protease and traps it in the complex. The 9 DegP / 11 elastase exposed RCL sites can cleave the loop unproductively, hence the kinetic-competition question.
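A sequon scan can cross-check the N-glycan positions listed in zone 2. The minimal sketch below assumes the standard N-X-S/T motif with X not Pro. It reports candidate sites only: a sequon is necessary but not sufficient for occupancy in human cells, and in EcN none of these sites is glycosylated at all. The function name is hypothetical.

```python
def n_glyc_sequons(seq):
    """Return 1-based Asn positions in N-X-[ST] sequons where X is not Pro."""
    hits = []
    for i in range(len(seq) - 2):
        if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST":
            hits.append(i + 1)
    return hits


print(n_glyc_sequons("ANGSNPTNAT"))  # [2, 8]; N5 is skipped because X = Pro
```

Run on the sequence in inputs/P05155.fasta, the output could be intersected with the UniProt CARBOHYD N-linked positions.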


Disagreement protocol

If you reproduce the outputs and disagree with the methods or numbers, file a GitHub issue referencing this folder (comp-037-c1-inh-protease-stability-ecn). Primary candidates for revision:

  • Serpin-core boundary. Set at aa 123 (first disulfide Cys). aa 23 (first residue after the signal peptide, aa 1-22) or aa 119 (end of the UniProt-annotated disordered REGION) are defensible alternatives. The aa 122-123 transition is supported by the pLDDT trajectory (43.3 → 57.9 → 77.6 → 84.9) and by disulfide-anchor logic. The truncation choice is a load-bearing engineering parameter.
  • Protease panel choice. Trypsin / chymotrypsin / elastase / OmpT / DegP are the dominant identifiable risks; commensal-bacterial proteases (Bacteroides, Clostridia) add diffuse risk not modeled here. If the EcN colonization niche shifts the relevant protease mix (e.g., mucus-layer-residence vs. lumen-residence), the panel should be expanded.
  • pH and salt factors. Colonic pH 6-7 is well established. Bile-acid exposure is not modeled as a protease-activity modifier.
  • RCL kinetic-competition treatment. This pipeline cannot model k_C1s_engagement vs. k_DegP_RCL_cleavage. The wet-lab gate is a co-incubation assay: recombinant serpin-core construct + C1s + DegP at colonic-relevant ratios, measure productive trapped C1s-C1INH complex vs. unproductive RCL-cleaved free C1-INH fragments.
  • UniProt feature numbers. All disulfide, glycosylation, and reactive-bond positions were grep-verified against UniProt P05155 SV=2 text-format flatfile, fetched 2026-05-17. If UniProt updates feature annotations, the inputs should be refreshed and analyze.py re-run.
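The boundary logic in the first bullet (pLDDT trajectory plus disulfide anchor) can be written as a small rule. First, find the first window of consecutive residues above a confidence cutoff. Then move forward to the nearest Cys so the disulfide anchor is retained. This is a hypothetical sketch: the cutoff, window size, and search distance are illustrative assumptions, not parameters taken from analyze.py.

```python
def pick_core_start(seq, plddt, cutoff=70.0, window=5, max_shift=10):
    """Suggest a 1-based construct start from pLDDT and Cys positions.

    1. Find the first position where `window` consecutive residues all have
       pLDDT >= cutoff.
    2. Snap forward to the first Cys within `max_shift` residues, so the
       disulfide anchor is the first retained residue; otherwise keep step 1.
    """
    for start in range(len(plddt) - window + 1):
        if all(score >= cutoff for score in plddt[start:start + window]):
            break
    else:
        return None  # no confident window anywhere
    for offset in range(max_shift + 1):
        idx = start + offset
        if idx < len(seq) and seq[idx] == "C":
            return idx + 1
    return start + 1


toy_seq = "A" * 10 + "C" + "A" * 9
toy_plddt = [40.0] * 9 + [75.0] * 11
print(pick_core_start(toy_seq, toy_plddt))  # 11: window starts at 10, Cys at 11
```

The outcome of a rule like this is sensitive to the cutoff and window. That sensitivity is one reason the boundary is listed above as a revisable parameter.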

Provenance & verification

All load-bearing numbers grep-verified against primary source before commit (per Open Enzyme CLAUDE.md Rule 4):

| Claim | Primary source | Verification |
|---|---|---|
| Sequence length 500 aa | UniProt P05155 SV=2 | Asserted in analyze.py main() |
| Signal peptide aa 1-22 | UniProt FT SIGNAL 1..22 | Direct grep against UniProt text format |
| Mature chain aa 23-500 | UniProt FT CHAIN 23..500 | Direct grep |
| Reactive bond R466-T467 | UniProt FT SITE 465..466 (chymotrypsin reactive bond), SITE 466..467 (C1S cleavage) | Direct grep + asserted in analyze.py main() (seq[465]='R', seq[466]='T') |
| Disulfide C123-C428 | UniProt FT DISULFID 123..428 | Direct grep + asserted in analyze.py main() (seq[122]='C', seq[427]='C') |
| Disulfide C130-C205 | UniProt FT DISULFID 130..205 | Direct grep + asserted in analyze.py main() |
| 2 disulfide bonds total (not the 6-12 sometimes quoted in casual lit) | UniProt FT DISULFID (exactly two entries) | grep -c verified |
| N-glycan sites N25, N69, N81, N238, N253, N272-variant, N352 | UniProt FT CARBOHYD N-linked | Direct grep; sequon N-X-(S/T) verified by sequence inspection (analyze.py; provenance.md) |
| O-glycan sites T47, T48, S64, T71, T83, T88, T92, T96 | UniProt FT CARBOHYD O-linked | Direct grep |
| C1-INH ~26 kDa glycan on ~52 kDa polypeptide | Bos 1998 PMID 9799502; Stavenhagen 2018 PMID 29381136 | Cited in summary.md and provenance.md (PMID-anchored) |
| DegP P1 preference V/I/L/F/Y/A | Krojer 2008 PMID 18261546 | Cited in protease_specificities.json + provenance.md |
| OmpT di-basic P1-P1' (K/R-X-X-K/R) | Dekker 2001 PMID 11226160; Hwang 2007 PMID 17263510 | Cited in protease_specificities.json + provenance.md |
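The "asserted in analyze.py main()" checks can be pictured as a small feature gate that runs before any analysis. The sketch below is hypothetical; its function names are not taken from analyze.py. It uses the positions from the table and converts 1-based residue numbers to 0-based string indices, so seq[122] is residue C123.

```python
# Expected residues at UniProt P05155 feature positions (1-based), per the table.
EXPECTED_LENGTH = 500
EXPECTED_RESIDUES = {
    123: "C", 428: "C",   # disulfide C123-C428
    130: "C", 205: "C",   # disulfide C130-C205
    466: "R", 467: "T",   # reactive bond R466-T467 (P1-P1')
}


def parse_fasta(text):
    """Concatenate sequence lines of a single-record FASTA string."""
    return "".join(
        line.strip() for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def feature_errors(seq, length=EXPECTED_LENGTH, residues=EXPECTED_RESIDUES):
    """Return a list of mismatches; an empty list means the gate passes."""
    errors = []
    if len(seq) != length:
        errors.append(f"length {len(seq)} != {length}")
    for pos, aa in sorted(residues.items()):
        if pos > len(seq):
            errors.append(f"position {pos} beyond sequence end")
        elif seq[pos - 1] != aa:  # 1-based position -> 0-based index
            errors.append(f"residue {pos}: expected {aa}, found {seq[pos - 1]}")
    return errors


# Usage (path from the file index):
#   from pathlib import Path
#   seq = parse_fasta(Path("inputs/P05155.fasta").read_text())
#   assert not feature_errors(seq), feature_errors(seq)
```

Returning a list of mismatches, instead of failing on the first one, reports every stale position at once if UniProt renumbers features.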

The DAF SCR1-4 disulfide-count incident (2026-05-06) is the canonical case this verification gate exists to prevent. That incident had a subagent confidently asserting "12 disulfides per SCR1-4 → 12 total" when UniProt P08174 has exactly 8 DISULFID features. Here every disulfide and glycosylation position is grep-pinned to UniProt before being written into the analysis. C1-INH has exactly 2 disulfides — verified — not 6 or 12.