Citing INstruct

A paper describing INstruct has been published in Bioinformatics. [view article] [view supplement]

Curation Details

For each species, to construct a protein interactome network with structural resolution, we first compiled a list of binary interactions from high-quality, high-throughput yeast two-hybrid (Y2H) assays and from eight major protein-protein interaction databases: BioGrid, DIP, HPRD, IntAct, iRefWeb, MINT, MIPS, and VisANT. The literature-curated dataset was then filtered to retain only high-quality interactions that meet the following criteria: (a) the interaction has at least two separate supporting publications and (b) each of these publications has a binary evidence code; that is, the experiments used to detect the interaction must be capable, in principle, of identifying direct, binary protein-protein interactions. We keep only high-quality binary interactions because the concept of an interaction interface does not apply when two proteins do not bind each other directly.
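The two filtering criteria above can be illustrated with a minimal Python sketch. This is not the INstruct pipeline itself: the record layout, the function name `filter_high_quality`, and the evidence codes in `EXAMPLE_BINARY_CODES` are all assumptions made for illustration, and the sketch reads the criteria as "at least two distinct publications that each report a binary evidence code".

```python
from collections import defaultdict

# Placeholder evidence codes assumed capable of detecting direct binary
# interactions; the actual list used by INstruct is not given here.
EXAMPLE_BINARY_CODES = {"two_hybrid", "co_crystal"}


def filter_high_quality(records, binary_codes=EXAMPLE_BINARY_CODES, min_pubs=2):
    """Return the protein pairs supported by enough binary-evidence publications.

    records: iterable of (protein_a, protein_b, publication_id, evidence_code).
    """
    binary_pubs = defaultdict(set)
    for protein_a, protein_b, publication, code in records:
        if code in binary_codes:
            # frozenset makes the pair order-independent (A-B equals B-A).
            binary_pubs[frozenset((protein_a, protein_b))].add(publication)
    return {pair for pair, pubs in binary_pubs.items() if len(pubs) >= min_pubs}


# Hypothetical records: P1-P2 has two binary-evidence publications,
# P1-P3 has only one, so only P1-P2 is retained.
records = [
    ("P1", "P2", "pub1", "two_hybrid"),
    ("P2", "P1", "pub2", "co_crystal"),
    ("P1", "P3", "pub3", "two_hybrid"),
    ("P1", "P3", "pub4", "co_complex"),  # not a binary evidence code
]
high_quality = filter_high_quality(records)
```

Counting distinct publication identifiers, rather than records, prevents a single paper that reports the same interaction with several methods from satisfying criterion (a) on its own.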

We then used a homology-based interaction interface inference approach to identify protein-protein interaction interfaces based on domain-domain interactions from known co-crystal structures in the Protein Data Bank (PDB). We obtained “Pfam A” domains that are both “significant” and “in-full” for each protein based on Pfam release 24. For each high-quality binary interaction, we determined whether the two interactors contain a Pfam domain pair that has been observed to interact in at least one protein structure in either 3did (release of December 2011) or iPfam (release 21). The set of Pfam domains on protein A that have corresponding interacting domains on protein B is then considered the interaction interface of protein A for protein B. This homology-based approach has been shown to be effective and accurate in inferring protein-protein interaction interfaces (Wang, X. et al.).
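The interface inference step described above can be sketched as a lookup of domain pairs. This is an illustrative sketch only: the function name `infer_interfaces`, the data layout, and the domain labels (`DOM1`, `DOM2`, and so on) are hypothetical placeholders, not real Pfam accessions or INstruct code.

```python
def infer_interfaces(domains_a, domains_b, interacting_pairs):
    """Infer the interaction interfaces of two directly interacting proteins.

    domains_a, domains_b: sets of domain identifiers found on each protein.
    interacting_pairs: set of frozensets, each holding a domain pair seen to
        interact in at least one co-crystal structure.
    Returns (interface of A for B, interface of B for A).
    """
    interface_a = set()
    interface_b = set()
    for dom_a in domains_a:
        for dom_b in domains_b:
            # frozenset lookup is order-independent and also covers
            # a domain interacting with another copy of itself.
            if frozenset((dom_a, dom_b)) in interacting_pairs:
                interface_a.add(dom_a)
                interface_b.add(dom_b)
    return interface_a, interface_b


# Hypothetical domain-domain interactions from structural data.
known_pairs = {frozenset(("DOM1", "DOM2")), frozenset(("DOM3", "DOM3"))}

# Protein A carries DOM1 and DOM4; protein B carries DOM2 and DOM5.
interface_a, interface_b = infer_interfaces(
    {"DOM1", "DOM4"}, {"DOM2", "DOM5"}, known_pairs
)
```

In this example only `DOM1` on protein A and `DOM2` on protein B are reported as interface domains; `DOM4` and `DOM5` have no known interacting partner on the other protein and are left out.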

Gene and Protein Naming Convention

Each gene/protein is annotated with three identifiers, all of which are searchable.

Organism          Convention
H. sapiens        UniProt, Entrez Gene ID, Official Gene Symbol
A. thaliana       UniProt, TAIR, Official Gene Symbol
C. elegans        UniProt, WormBase ID, Official Gene Symbol
D. melanogaster   UniProt, FlyBase ID, Official Gene Symbol
M. musculus       UniProt, Mouse Genome Informatics ID, Official Gene Symbol
S. cerevisiae     UniProt, ORF Name, Official Gene Symbol
S. pombe          UniProt, ORF Name, Official Gene Symbol

To search using a different gene or protein naming convention, first convert your identifier to a UniProt ID using the "ID Mapping" tab on the UniProt site.

Protein Interaction Figures

The figure below shows an example interaction diagram for two hypothetical proteins.


- The common name of the protein. Click to look up this protein.
- Domain in Protein A called 'domain 1'. Shown in green because domain 1 facilitates the interaction between Protein A and Protein B.
- Grey domains do not facilitate the given protein-protein interaction.
- Grey lines indicate a domain-domain interaction that is inferred using our homology-based inference method.
- Red lines indicate a domain-domain interaction that is supported by direct co-crystal evidence. PDB structures shown in red in the table below the figure are the source of this evidence.
- Unannotated region of the protein.
- Length of the protein in amino acids; e.g., Protein A is 850 amino acids long.

Contact

This project is maintained by the Yu lab at Cornell University. For more information about our lab and to find current contact information, please visit the lab's website.