How to Identify Interaction Domains in Protein–Protein Interactions

From domain annotation to docking strategy


A common question in protein–protein interaction studies is: can we just take two full-length proteins, run docking, and see which pose is best? You can, but it is usually not the most reliable first step. Many proteins are not simple single units. They are composed of multiple structural domains, functional regions, flexible linkers, intrinsically disordered regions, and repeat segments.

In practice, the interaction is often not “whole protein vs whole protein,” but is mediated by a specific domain, a functional segment, or even a short peptide motif.

So the core question is not “Can we compute a complex structure?” but “How do we progressively narrow down the most biologically plausible binding region based on the research goal?”

I. The first step is NOT docking, but defining domain boundaries
The most critical mistake is arbitrarily splitting proteins by length.
For example, for a 900-aa protein, it is incorrect to simply divide it into:
• 1–300 aa
• 301–600 aa
• 601–900 aa

This looks neat, but it often cuts through real structural domains.
A more reasonable approach is to first consult databases to determine annotated structural and functional regions.

Common databases include:
• UniProt: basic domains, functional regions, sites, post-translational modifications  
• InterPro: integrated domain/family annotation from multiple databases  
• Pfam: canonical domain families  
• NCBI CDD: conserved domains and functional sites  
• SMART: signaling domains, extracellular regions, repeat architectures  
• PROSITE: functional motifs and conserved sites  
• PDB: experimentally solved structures  
• AlphaFold: predicted structures, pLDDT confidence, domain organization

The purpose is not to list database annotations in a paper, but to answer a practical question:
Which segments are biologically meaningful structural units, rather than arbitrary length-based fragments?

II. What different databases actually provide
1. UniProt: the first stop
UniProt provides a comprehensive overview of protein function, sequence features, domains, sites, and modifications. 

Key features to check:
• Domain
• Region
• Repeat
• Coiled-coil
• Disordered region
• Motif
• Binding site
• Active site
• Modified residue

These help determine:
• Which regions are structured domains
• Which are functional regions
• Which are repetitive segments
• Which are disordered regions
• Where binding or catalytic sites exist

However, UniProt is only a starting point, not the final answer.

For example:
• If a segment is annotated as a domain → candidate structural unit
• If annotated as disordered → not suitable as a rigid docking unit
• If modified (phosphorylation, ubiquitination, cleavage) → may regulate interaction
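
These rules can be sketched as a small triage step over a feature list. The feature type names follow the UniProt labels listed above; the `Feature` tuple, the handling labels, and the residue ranges are hypothetical and only for illustration.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    kind: str   # UniProt-style feature type, e.g. "Domain"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# Handling labels are illustrative, not a standard vocabulary.
RULES = {
    "Domain": "candidate structural unit",
    "Disordered region": "not a rigid docking unit",
    "Modified residue": "possible regulatory site",
}


def triage(features):
    """Pair each feature with a handling label; unknown types are flagged for review."""
    return [(f, RULES.get(f.kind, "review manually")) for f in features]


# Hypothetical annotations for an unnamed protein.
features = [
    Feature("Domain", 20, 140),
    Feature("Disordered region", 141, 190),
    Feature("Modified residue", 165, 165),
    Feature("Coiled coil", 200, 260),
]

for feature, handling in triage(features):
    print(f"{feature.kind} {feature.start}-{feature.end}: {handling}")
```

Anything the rule table does not cover, such as the coiled coil here, falls through to manual review rather than being silently docked.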

2. InterPro: cross-validation of domain boundaries
InterPro integrates multiple databases including Pfam, CDD, PROSITE, SMART, CATH-Gene3D, SUPERFAMILY, etc. 

Its value lies in cross-validation.
If UniProt and InterPro both annotate the same region as a domain, confidence increases.

For example:
• UniProt: 150–408 aa kinase domain
• InterPro: protein kinase-like domain
• Pfam/CDD: kinase domain also detected

→ This strongly supports 150–408 aa as a valid structural domain.
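
One way to make this cross-check concrete is to compare the boundaries each source reports and keep the region they all agree on. In the sketch below, only the UniProt range comes from the example above; the InterPro and Pfam ranges are hypothetical placeholders, and the 10-residue tolerance is an arbitrary assumption.

```python
def consensus(ranges):
    """Intersection of inclusive (start, end) ranges, or None if they do not all overlap."""
    start = max(s for s, _ in ranges)
    end = min(e for _, e in ranges)
    return (start, end) if start <= end else None


def boundaries_agree(ranges, tolerance=10):
    """True if all starts, and all ends, lie within `tolerance` residues of each other."""
    starts = [s for s, _ in ranges]
    ends = [e for _, e in ranges]
    return (max(starts) - min(starts) <= tolerance
            and max(ends) - min(ends) <= tolerance)


calls = {
    "UniProt": (150, 408),   # from the example above
    "InterPro": (148, 410),  # hypothetical
    "Pfam": (152, 405),      # hypothetical
}
ranges = list(calls.values())
core = consensus(ranges)
print(core, boundaries_agree(ranges))
```

If the sources disagree by more than the tolerance, that is a cue to inspect the 3D structure before fixing a boundary.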

3. Pfam / CDD: conserved domain consistency
Pfam classifies protein families and domains using profile hidden Markov models; CDD identifies conserved domains using position-specific scoring matrices.

Pfam and CDD help confirm:
• Whether a region corresponds to a canonical domain
• Approximate domain boundaries
• Presence of conserved functional residues
• Evolutionary conservation

4. SMART / PROSITE: motifs, repeats, functional sites
SMART is useful for:
• signaling proteins
• extracellular domains
• repeat architectures 

PROSITE focuses on:
• conserved motifs
• functional residues

5. PDB / AlphaFold: structural validation
Databases provide domain predictions, but docking requires 3D validation. 

Key questions:
• Does the fragment form a stable fold?
• Do the domain boundaries cut through helices or β-strands?
• Is the region exposed or buried?
• Does AlphaFold pLDDT support stability?
• Does PAE indicate reliable inter-domain orientation?
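
The PAE question can be turned into a number by averaging the predicted aligned error over the off-diagonal blocks that connect two domains. This is a minimal sketch that assumes the PAE matrix is already loaded as a square list of lists in ångströms; the toy matrix and domain ranges are made up.

```python
def mean_interdomain_pae(pae, dom_a, dom_b):
    """Mean PAE over both off-diagonal blocks between two domains.

    pae: square matrix (list of lists), indexed from 0, values in Å.
    dom_a, dom_b: inclusive 1-based (start, end) residue ranges.
    """
    idx_a = range(dom_a[0] - 1, dom_a[1])
    idx_b = range(dom_b[0] - 1, dom_b[1])
    values = [pae[i][j] for i in idx_a for j in idx_b]
    values += [pae[j][i] for i in idx_a for j in idx_b]
    return sum(values) / len(values)


# Toy 4-residue example: residues 1-2 form one domain, 3-4 another.
toy_pae = [
    [0.5, 1.0, 20.0, 22.0],
    [1.0, 0.5, 21.0, 23.0],
    [19.0, 20.0, 0.5, 1.0],
    [21.0, 22.0, 1.0, 0.5],
]
inter = mean_interdomain_pae(toy_pae, (1, 2), (3, 4))
print(f"mean inter-domain PAE: {inter:.1f} Å")
```

A high inter-domain value suggests the relative orientation of the two domains is not well defined in the prediction, so docking them as separate units is the safer choice.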

AlphaFold predictions must be interpreted carefully.
Low pLDDT (commonly taken as below about 50) → likely disordered or flexible region
→ not suitable for rigid docking
→ better treated as a peptide/motif region
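
As a rough illustration, contiguous low-confidence stretches can be pulled out of a per-residue pLDDT list. The threshold of 50 and the minimum run length of 5 are assumptions to tune, and the scores below are synthetic.

```python
def low_confidence_segments(plddt, threshold=50.0, min_len=5):
    """Return inclusive 1-based (start, end) runs where pLDDT stays below threshold."""
    segments = []
    run_start = None
    for i, score in enumerate(plddt, start=1):
        if score < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The run ended at residue i - 1; keep it only if long enough.
            if i - run_start >= min_len:
                segments.append((run_start, i - 1))
            run_start = None
    # Handle a run that extends to the C-terminus.
    if run_start is not None and len(plddt) - run_start + 1 >= min_len:
        segments.append((run_start, len(plddt)))
    return segments


# Synthetic scores: folded, low-confidence stretch, folded, short low tail.
scores = [92.0] * 10 + [35.0] * 8 + [88.0] * 10 + [40.0] * 3
print(low_confidence_segments(scores))
```

Segments returned here are candidates for peptide or motif treatment rather than rigid docking.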

Summary of Section II:
Database analysis defines domain boundaries; structural analysis determines whether they are suitable for docking.

III. Example: AKT1 domain interpretation (P31749)
AKT1 should NOT be divided arbitrarily.
Correct domain structure:
• PH domain: 5–108 aa
• Linker: 109–149 aa
• Kinase domain: 150–408 aa
• C-terminal regulatory region: 409–480 aa

Incorrect split example:
• 1–160
• 161–320
• 321–480

This breaks real domain organization.
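
The problem is easy to check programmatically. The sketch below uses the AKT1 boundaries listed above and reports every domain that ends up spread over more than one fragment; the function name is illustrative.

```python
AKT1_DOMAINS = {
    "PH domain": (5, 108),
    "Kinase domain": (150, 408),
    "C-terminal regulatory region": (409, 480),
}


def domains_cut_by(fragments, domains):
    """Map each domain that spans more than one fragment to the fragments it touches."""
    broken = {}
    for name, (d_start, d_end) in domains.items():
        hits = [f for f in fragments if f[0] <= d_end and f[1] >= d_start]
        if len(hits) > 1:
            broken[name] = hits
    return broken


naive_split = [(1, 160), (161, 320), (321, 480)]
domain_split = [(1, 108), (109, 149), (150, 408), (409, 480)]

print(domains_cut_by(naive_split, AKT1_DOMAINS))
print(domains_cut_by(domain_split, AKT1_DOMAINS))
```

With the length-based split, the kinase domain is scattered across all three fragments, whereas the domain-based split leaves every domain intact.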

Functional sites:
• 14–19, 23–25, 53, 86 → phosphoinositide binding (PH domain)
• 156–164, 179 → ATP binding (kinase domain)
• 274 → active site
• 462 → caspase-3 cleavage site

Interpretation:
• PH domain → membrane localization / lipid binding
• Kinase domain → catalytic activity
• C-terminal region → regulation / cleavage

Therefore, interaction analysis must ask:
• Does protein X affect AKT1 membrane localization? → PH domain
• Does X affect kinase activity? → kinase domain
• Does X affect regulation or cleavage? → C-terminal region


IV. Combine domain data with experimental evidence
Databases alone are insufficient.
You should integrate:
• literature reports
• Co-IP / pull-down truncation studies
• deletion mutants
• point mutations
• PTM effects
• mutation clustering

Example:
If deleting domain D2 abolishes the interaction → D2 is the priority region, regardless of its docking score.

V. Converting domain information into docking strategy
Case 1: multiple domains in A, full-length B
Run:
• A-D1 vs B
• A-D2 vs B
• A-D3 vs B

If needed:
• A-D1 vs B-D1
• A-D1 vs B-D2
• A-D2 vs B-D1
• A-D2 vs B-D2

But do NOT exhaustively enumerate all combinations blindly. Prioritize biologically supported pairs.
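
A simple way to keep the enumeration under control is to build the full pair list once and then order it by supporting evidence. In this sketch the evidence counts are hypothetical, for example the number of truncation or mutation experiments implicating a pair.

```python
from itertools import product


def prioritized_pairs(domains_a, domains_b, evidence):
    """All A x B domain pairs, with better-supported pairs first.

    evidence: dict mapping (a_domain, b_domain) to a count of supporting observations.
    Pairs without evidence keep their enumeration order at the end of the queue.
    """
    pairs = list(product(domains_a, domains_b))
    return sorted(pairs, key=lambda pair: evidence.get(pair, 0), reverse=True)


# Hypothetical evidence counts.
evidence = {("A-D2", "B-D1"): 2, ("A-D1", "B-D1"): 1}
queue = prioritized_pairs(["A-D1", "A-D2"], ["B-D1", "B-D2"], evidence)
for pair in queue:
    print(pair, evidence.get(pair, 0))
```

Docking can then proceed down the queue and stop once the supported pairs are covered.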

Case 2: repeat regions
Repeat units (ankyrin, LRR, TPR, etc.) usually do NOT fold or function independently.
Instead of isolating single repeats:
• treat the entire repeat block as one unit

Case 3: coiled-coil regions
Check:
• helix orientation
• hydrophobic stripe alignment
• electrostatic complementarity
• continuous interface along helix
• feasibility in full-length context

Beware of false-positive docking where helices merely “touch”.

Case 4: disordered regions and motifs
Disordered regions often contain functional motifs. Instead of rigid docking:
• identify short linear motifs
• look for conserved segments
• map modification sites
• use peptide docking approaches
• optionally validate with molecular dynamics

VI. Docking score is NOT sufficient
Many people wrongly conclude:
lowest docking score = true binding domain

This is unreliable because:
• domains differ in size and surface area
• electrostatics vary
• scoring functions are not directly comparable

Better criteria:
1. Consistency of top-ranked models
Check whether the top 10–50 models converge on the same region.
If many cluster on D2 → D2 is the likely interface.
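
Convergence can be tallied by assigning each model to the domain that contributes most of its interface residues. This is a minimal sketch: the domain ranges and interface residue lists are invented, and in practice the lists would come from the docking output.

```python
from collections import Counter


def domain_of(residue, domains):
    """Name of the domain containing the residue, or None if it lies outside all domains."""
    for name, (start, end) in domains.items():
        if start <= residue <= end:
            return name
    return None


def dominant_domains(models, domains):
    """Count, across models, which domain contributes the most interface residues."""
    tally = Counter()
    for interface in models:
        counts = Counter(domain_of(r, domains) for r in interface)
        counts.pop(None, None)  # ignore residues outside annotated domains
        if counts:
            tally[counts.most_common(1)[0][0]] += 1
    return tally


domains = {"D1": (1, 100), "D2": (101, 250), "D3": (251, 400)}  # hypothetical
models = [  # interface residues on protein A for five top-ranked models
    [120, 125, 130],
    [118, 122, 300],
    [40, 45, 50],
    [140, 141, 142],
    [260, 121, 123],
]
tally = dominant_domains(models, domains)
print(tally)
```

Here four of five models place the interface mainly on D2, which is the kind of agreement worth following up.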

2. Interface residue clustering
A reliable interface should show:
• continuous contact patch
• hydrophobic complementarity
• salt bridges / hydrogen bonds
• not scattered residues
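
Whether the contacts form one patch or are scattered can be estimated with a simple connectivity check on the interface residues. The sketch below links residues whose Cα atoms lie within a cutoff and reports the fraction in the largest connected group; the 8 Å cutoff and the coordinates are assumptions.

```python
import math


def largest_patch_fraction(coords, cutoff=8.0):
    """Fraction of interface residues that belong to the largest connected patch.

    coords: dict mapping residue number to (x, y, z) Cα coordinates in Å.
    Two residues are linked if their Cα atoms are within `cutoff` Å.
    """
    if not coords:
        return 0.0
    unvisited = set(coords)
    largest = 0
    while unvisited:
        stack = [unvisited.pop()]
        size = 0
        while stack:
            current = stack.pop()
            size += 1
            near = {r for r in unvisited
                    if math.dist(coords[current], coords[r]) <= cutoff}
            unvisited -= near
            stack.extend(near)
        largest = max(largest, size)
    return largest / len(coords)


# Made-up coordinates: four residues form a patch, one lies far away.
interface = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),
    14: (6.0, 4.0, 0.0),
    57: (10.0, 5.0, 0.0),
    200: (40.0, 40.0, 40.0),
}
fraction = largest_patch_fraction(interface)
print(f"{fraction:.0%} of interface residues are in one patch")
```

A value near 1 is consistent with a continuous contact patch; a low value points to scattered contacts.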

3. Consistency with experiments
Best docking models should explain:
• truncation effects
• mutation effects
• PTM regulation
• binding loss upon deletion

4. Full-length structural feasibility
Check:
• is interface accessible?
• steric clashes in full protein?
• linker length sufficient?
• orientation realistic in full context?
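
The linker question has a quick upper-bound check: a linker cannot span more than its fully extended length. The sketch assumes about 3.8 Å between consecutive Cα atoms, which is a generous bound because real linkers are rarely fully extended; the gap distances are hypothetical.

```python
CA_CA_SPACING = 3.8  # Å between consecutive Cα atoms, used as an upper bound per step


def linker_can_span(n_linker_residues, gap_distance):
    """True if a fully extended linker could bridge `gap_distance` Å.

    gap_distance is measured between the last Cα of one domain and the first Cα
    of the next; n linker residues between them give n + 1 Cα-Cα steps.
    """
    max_span = (n_linker_residues + 1) * CA_CA_SPACING
    return gap_distance <= max_span


# A 41-residue linker (such as AKT1 109-149) against a hypothetical 60 Å gap.
print(linker_can_span(41, 60.0))
# A 5-residue linker against a hypothetical 40 Å gap.
print(linker_can_span(5, 40.0))
```

Passing this check is necessary but not sufficient: a docked pose that fails it can be discarded, while one that passes still needs inspection for clashes.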

Docking predicts possibility, not final truth.

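VII. When is this level of analysis sufficient?
For most studies, the analysis is sufficient once it covers:
• domain annotation
• experimental integration
• docking screening
• interface inspection
• full-length validation

At that point, a hedged statement is justified, for example:
“This region is suggested to be a potential interaction interface…”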

VIII. Summary
Determining which domains of two proteins are responsible for their interaction is NOT a matter of:
• arbitrarily cutting protein sequences
• relying solely on docking scores
• depending on a single database annotation

A proper analytical strategy involves:
• integrating multiple databases for domain annotation
• validating domain boundaries using 3D structural information
• incorporating existing experimental evidence as constraints
• filtering docking results based on consistency across models
• evaluating full-length protein context

