A representation of the 3D structure of the protein myoglobin showing turquoise α-helices. This protein was the first to have its structure solved by X-ray crystallography. Towards the right-center among the coils is a prosthetic group called a heme group (shown in gray) with a bound oxygen molecule (red).
Proteins are large biomolecules, or macromolecules, consisting of one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalyzing metabolic reactions, DNA replication, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in protein folding into a specific three-dimensional structure that determines its activity.
A linear chain of amino acid residues is called a polypeptide. A protein contains at least one long polypeptide. Short polypeptides, containing fewer than 20–30 residues, are rarely considered to be proteins and are commonly called peptides, or sometimes oligopeptides. The individual amino acid residues are bonded to adjacent residues by peptide bonds. The sequence of amino acid residues in a protein is defined by the sequence of a gene, which is encoded in the genetic code. In general, the genetic code specifies 20 standard amino acids; however, in certain organisms the genetic code can include selenocysteine and, in certain archaea, pyrrolysine. Shortly after or even during synthesis, the residues in a protein are often chemically modified by post-translational modification, which alters the physical and chemical properties, folding, stability, activity, and ultimately, the function of the proteins. Sometimes proteins have non-peptide groups attached, which can be called prosthetic groups or cofactors. Proteins can also work together to achieve a particular function, and they often associate to form stable protein complexes.
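As a rough illustration of how a gene sequence determines an amino acid sequence, the sketch below translates a short DNA coding sequence one codon at a time. The codon table is deliberately a small subset of the standard genetic code, and the function name `translate` and the example sequence are invented for illustration only.

```python
# Subset of the standard genetic code (DNA codons -> one-letter amino acids).
# Only a handful of the 64 codons are listed; "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "GCT": "A",  # alanine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "GGC": "G",  # glycine
    "TTT": "F",  # phenylalanine
    "TAA": "*",  # stop
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame, stopping at the first stop codon."""
    residues = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        amino_acid = CODON_TABLE[codon]  # KeyError if the codon is not in this subset
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


# Hypothetical 15-nucleotide coding sequence: ATG GCT AAA TGG TAA
print(translate("ATGGCTAAATGGTAA"))  # MAKW
```

A complete implementation would carry all 64 codons and would need organism-specific handling for cases such as selenocysteine and pyrrolysine, which this sketch ignores.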
Once formed, proteins only exist for a certain period and are then degraded and recycled by the cell’s machinery through the process of protein turnover. A protein’s lifespan is measured in terms of its half-life and covers a wide range. Proteins can exist for minutes or years, with an average lifespan of 1–2 days in mammalian cells. Abnormal or misfolded proteins are degraded more rapidly, either because they are targeted for destruction or because they are unstable.
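To make the notion of a half-life concrete, the following sketch computes how much of an initial protein pool would remain after a given time. It assumes simple first-order (exponential) decay with no new synthesis, which is a simplification; the function name and the 24-hour half-life are illustrative choices, not measured values.

```python
def fraction_remaining(elapsed_hours: float, half_life_hours: float) -> float:
    """Fraction of an initial protein pool left after `elapsed_hours`.

    Assumes first-order decay: the pool halves every `half_life_hours`.
    """
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    return 0.5 ** (elapsed_hours / half_life_hours)


# Illustrative half-life of 24 hours (an assumption, not a measured value).
for hours in (0, 24, 48, 72):
    print(hours, fraction_remaining(hours, half_life_hours=24.0))
```

Under this model, a protein with a half-life of minutes is essentially gone within an hour, while one with a half-life of years is nearly unchanged over the same period.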
Like other biological macromolecules such as polysaccharides and nucleic acids, proteins are essential parts of organisms and participate in virtually every process within cells. Many proteins are enzymes that catalyze biochemical reactions and are vital to metabolism. Proteins also have structural or mechanical functions, such as actin and myosin in muscle and the proteins in the cytoskeleton, which form a system of scaffolding that maintains cell shape. Other proteins are important in cell signaling, immune responses, cell adhesion, and the cell cycle. In animals, proteins are needed in the diet to provide the essential amino acids that cannot be synthesized. Digestion breaks the proteins down for use in the metabolism.
Proteins may be purified from other cellular components using a variety of techniques such as ultracentrifugation, precipitation, electrophoresis, and chromatography; the advent of genetic engineering has made possible a number of methods to facilitate purification. Methods commonly used to study protein structure and function include immunohistochemistry, site-directed mutagenesis, X-ray crystallography, nuclear magnetic resonance and mass spectrometry.
The crystal structure of the chaperonin, a huge protein complex. A single protein subunit is highlighted. Chaperonins assist protein folding.
Three possible representations of the three-dimensional structure of the protein triose phosphate isomerase.
All-atom representation colored by atom type.
Simplified representation illustrating the backbone conformation, colored by secondary structure.
Solvent-accessible surface representation colored by residue type (acidic residues red, basic residues blue, polar residues green, nonpolar residues white).
Most proteins fold into unique 3-dimensional structures. The shape into which a protein naturally folds is known as its native conformation. Although many proteins can fold unassisted, simply through the chemical properties of their amino acids, others require the aid of molecular chaperones to fold into their native states. Biochemists often refer to four distinct aspects of a protein’s structure:
Primary structure: the amino acid sequence. A protein is a polyamide.
Secondary structure: regularly repeating local structures stabilized by hydrogen bonds. The most common examples are the α-helix, β-sheet and turns. Because secondary structures are local, many regions of different secondary structure can be present in the same protein molecule.
Tertiary structure: the overall shape of a single protein molecule; the spatial relationship of the secondary structures to one another. Tertiary structure is generally stabilized by nonlocal interactions, most commonly the formation of a hydrophobic core, but also through salt bridges, hydrogen bonds, disulfide bonds, and even post-translational modifications. The term “tertiary structure” is often used as synonymous with the term fold. The tertiary structure is what controls the basic function of the protein.
Quaternary structure: the structure formed by several protein molecules (polypeptide chains), usually called protein subunits in this context, which function as a single protein complex.
Proteins are not entirely rigid molecules. In addition to these levels of structure, proteins may shift between several related structures while they perform their functions. In the context of these functional rearrangements, these tertiary or quaternary structures are usually referred to as “conformations”, and transitions between them are called conformational changes. Such changes are often induced by the binding of a substrate molecule to an enzyme’s active site, or the physical region of the protein that participates in chemical catalysis. In solution, proteins also undergo variation in structure through thermal vibration and collisions with other molecules.
Molecular surface of several proteins showing their comparative sizes. From left to right are: immunoglobulin G (IgG, an antibody), hemoglobin, insulin (a hormone), adenylate kinase (an enzyme), and glutamine synthetase (an enzyme).
Proteins can be informally divided into three main classes, which correlate with typical tertiary structures:
• Globular proteins
• Fibrous proteins
• Membrane proteins
Almost all globular proteins are soluble and many are enzymes. Fibrous proteins are often structural, such as collagen, the major component of connective tissue, or keratin, the protein component of hair and nails. Membrane proteins often serve as receptors or provide channels for polar or charged molecules to pass through the membrane. A special case of intramolecular hydrogen bonds within proteins, poorly shielded from water attack and hence promoting their own dehydration, are called dehydrons.
Many proteins are composed of several protein domains, i.e. segments of a protein that fold into distinct structural units. Domains usually also have specific functions, such as enzymatic activities (e.g. kinase), or they serve as binding modules (e.g. the SH3 domain binds to proline-rich sequences in other proteins).
Short amino acid sequences within proteins often act as recognition sites for other proteins. For instance, SH3 domains typically bind to short PxxP motifs (i.e. 2 prolines [P], separated by 2 unspecified amino acids [x], although the surrounding amino acids may determine the exact binding specificity). A large number of such motifs have been collected in the Eukaryotic Linear Motif (ELM) database.
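A short linear motif of the proline, any residue, any residue, proline kind can be located in a one-letter protein sequence with a regular expression. The sketch below is a minimal illustration; the function name `find_pxxp` and the example sequence are hypothetical, and a pattern match alone says nothing about whether a site is actually bound in vivo.

```python
import re

# "P..P": proline, any two residues, proline. Wrapping the pattern in a
# lookahead lets overlapping occurrences be reported as separate matches.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start index, matched residues) for each PxxP occurrence."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence.upper())]


# Hypothetical sequence used only to exercise the function.
print(find_pxxp("MAAPLKPSTPPPAPG"))
# [(3, 'PLKP'), (6, 'PSTP'), (10, 'PPAP')]
```

The lookahead is a design choice: a plain `P..P` pattern would consume each match and miss motifs that share a proline with the previous one.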
Proteins are the chief actors within the cell, said to be carrying out the duties specified by the information encoded in genes. With the exception of certain types of RNA, most other biological molecules are relatively inert elements upon which proteins act. Proteins make up half the dry weight of an Escherichia coli cell, whereas other macromolecules such as DNA and RNA make up only 3% and 20%, respectively. The set of proteins expressed in a particular cell or cell type is known as its proteome.
The enzyme hexokinase is shown as a conventional ball-and-stick molecular model. To scale in the top right-hand corner are two of its substrates,
The chief characteristic of proteins that allows their diverse set of functions is their ability to bind other molecules specifically and tightly. The region of the protein responsible for binding another molecule is known as the binding site and is often a depression or “pocket” on the molecular surface. This binding ability is mediated by the tertiary structure of the protein, which defines the binding site pocket, and by the chemical properties of the surrounding amino acids’ side chains. Protein binding can be extraordinarily tight and specific; for example, the ribonuclease inhibitor protein binds to human angiogenin with a sub-femtomolar dissociation constant (<10^-15 M). Extremely minor chemical changes such as the addition of a single methyl group to a binding partner can sometimes suffice to nearly eliminate binding; for example, the aminoacyl tRNA synthetase specific to the amino acid valine discriminates against the very similar side chain of the amino acid isoleucine.
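To give a feel for what a dissociation constant means, the sketch below computes the equilibrium fraction of protein occupied by ligand for simple one-to-one binding. It assumes the ligand is in excess so that free and total ligand concentrations are approximately equal; the function name and the concentrations are illustrative assumptions.

```python
def fraction_bound(ligand_conc_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of protein bound, for simple 1:1 binding.

    Assumes free ligand concentration is approximately the total ligand
    concentration (ligand in excess over protein).
    """
    if ligand_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_molar / (ligand_conc_molar + kd_molar)


# Compare a femtomolar, a nanomolar and a micromolar binder at 1 nM ligand.
for kd in (1e-15, 1e-9, 1e-6):
    print(f"Kd = {kd:.0e} M -> fraction bound = {fraction_bound(1e-9, kd):.6f}")
```

When the ligand concentration equals the dissociation constant, half of the protein is bound; a femtomolar-range binder is therefore essentially saturated at concentrations where a micromolar binder is almost entirely free.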
Proteins can bind to other proteins as well as to small-molecule substrates. When proteins bind specifically to other copies of the same molecule, they can oligomerize to form fibrils; this process occurs often in structural proteins that consist of globular monomers that self-associate to form rigid fibers. Protein–protein interactions also regulate enzymatic activity, control progression through the cell cycle, and allow the assembly of large protein complexes that carry out many closely related reactions with a common biological function. Proteins can also bind to, or even be integrated into, cell membranes. The ability of binding partners to induce conformational changes in proteins allows the construction of enormously complex signaling networks. Because interactions between proteins are reversible and depend heavily on the availability of different groups of partner proteins to form aggregates capable of carrying out discrete sets of functions, the study of interactions between specific proteins is key to understanding important aspects of cellular function and, ultimately, the properties that distinguish particular cell types.
The best-known role of proteins in the cell is as enzymes, which catalyze chemical reactions. Enzymes are usually highly specific and accelerate only one or a few chemical reactions. Enzymes carry out most of the reactions involved in metabolism, as well as manipulating DNA in processes such as DNA replication, DNA repair, and transcription. Some enzymes act on other proteins to add or remove chemical groups in a process known as post-translational modification. About 4,000 reactions are known to be catalyzed by enzymes. The rate acceleration conferred by enzymatic catalysis is often enormous: as much as a 10^17-fold increase in rate over the uncatalyzed reaction in the case of orotate decarboxylase (78 million years without the enzyme, 18 milliseconds with the enzyme).
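The size of this rate acceleration can be checked directly from the two times quoted above. The short calculation below treats both figures as comparable characteristic reaction times, which is an assumption made only for this back-of-the-envelope estimate, and uses a 365.25-day year.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

uncatalyzed_s = 78e6 * SECONDS_PER_YEAR  # 78 million years, in seconds
catalyzed_s = 18e-3                      # 18 milliseconds, in seconds

rate_enhancement = uncatalyzed_s / catalyzed_s
print(f"{rate_enhancement:.1e}")  # 1.4e+17
```

The ratio comes out at roughly 10^17, which is the order of magnitude implied by the quoted timescales.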
The molecules bound and acted upon by enzymes are called substrates. Although enzymes can consist of hundreds of amino acids, it is usually only a small fraction of the residues that come in contact with the substrate, and an even smaller fraction—three to four residues on average—that are directly involved in catalysis. The region of the enzyme that binds the substrate and contains the catalytic residues is known as the active site.
Dirigent proteins are members of a class of proteins that dictate the stereochemistry of a compound synthesized by other enzymes.
Cell signaling and ligand binding:
Ribbon diagram of a mouse antibody against cholera that binds a carbohydrate antigen.
Many proteins are involved in the process of cell signaling and signal transduction. Some proteins, such as insulin, are extracellular proteins that transmit a signal from the cell in which they were synthesized to other cells in distant tissues. Others are membrane proteins that act as receptors whose main function is to bind a signaling molecule and induce a biochemical response in the cell. Many receptors have a binding site exposed on the cell surface and an effector domain within the cell, which may have enzymatic activity or may undergo a conformational change detected by other proteins within the cell.
Antibodies are protein components of an adaptive immune system whose main function is to bind antigens, or foreign substances in the body, and target them for destruction. Antibodies can be secreted into the extracellular environment or anchored in the membranes of specialized B cells known as plasma cells. Whereas enzymes are limited in their binding affinity for their substrates by the necessity of conducting their reaction, antibodies have no such constraints. An antibody’s binding affinity to its target is extraordinarily high.
Many ligand transport proteins bind particular small biomolecules and transport them to other locations in the body of a multicellular organism. These proteins must have a high binding affinity when their ligand is present in high concentrations, but must also release the ligand when it is present at low concentrations in the target tissues. The canonical example of a ligand-binding protein is hemoglobin, which transports oxygen from the lungs to other organs and tissues in all vertebrates and has close homologs in every biological kingdom. Lectins are sugar-binding proteins which are highly specific for their sugar moieties. Lectins typically play a role in biological recognition phenomena involving cells and proteins. Receptors and hormones are highly specific binding proteins.
Transmembrane proteins can also serve as ligand transport proteins that alter the permeability of the cell membrane to small molecules and ions. The membrane alone has a hydrophobic core through which polar or charged molecules cannot diffuse. Membrane proteins contain internal channels that allow such molecules to enter and exit the cell. Many ion channel proteins are specialized to select for only a particular ion; for example, potassium and sodium channels often discriminate for only one of the two ions.
Structural proteins confer stiffness and rigidity to otherwise-fluid biological components. Most structural proteins are fibrous proteins; for example, collagen and elastin are critical components of connective tissue such as cartilage, and keratin is found in hard or filamentous structures such as hair, nails, feathers, hooves, and some animal shells. Some globular proteins can also play structural roles; for example, actin and tubulin are globular and soluble as monomers, but polymerize to form long, stiff fibers that make up the cytoskeleton, which allows the cell to maintain its shape and size.
Other proteins that serve structural functions are motor proteins such as myosin, kinesin, and dynein, which are capable of generating mechanical forces. These proteins are crucial for cellular motility of single-celled organisms and the sperm of many multicellular organisms which reproduce sexually. They also generate the forces exerted by contracting muscles and play essential roles in intracellular transport.
Cro protein complex with DNA
Interaction of DNA (orange) with histones (blue). These proteins’ basic amino acids bind to the acidic phosphate groups on DNA.
The lambda repressor helix-turn-helix transcription factor bound to its DNA target.
The restriction enzyme EcoRV (green) in a complex with its substrate DNA.
DNA-binding proteins are proteins that have DNA-binding domains and thus have a specific or general affinity for single- or double-stranded DNA. Sequence-specific DNA-binding proteins generally interact with the major groove of B-DNA, because it exposes more functional groups that identify a base pair. However, there are some known minor groove DNA-binding ligands such as netropsin, distamycin, Hoechst 33258, pentamidine, DAPI and others.
DNA-binding proteins include transcription factors, which modulate the process of transcription; various polymerases; nucleases, which cleave DNA molecules; and histones, which are involved in chromosome packaging and transcription in the cell nucleus. DNA-binding proteins can incorporate such domains as the zinc finger, the helix-turn-helix, and the leucine zipper (among many others) that facilitate binding to nucleic acid. There are also more unusual examples such as transcription activator-like effectors.
Non-specific DNA-protein interactions:
Structural proteins that bind DNA are well-understood examples of non-specific DNA-protein interactions. Within chromosomes, DNA is held in complexes with structural proteins. These proteins organize the DNA into a compact structure called chromatin. In eukaryotes, this structure involves DNA binding to a complex of small basic proteins called histones. In prokaryotes, multiple types of proteins are involved. The histones form a disk-shaped complex called a nucleosome, which contains two complete turns of double-stranded DNA wrapped around its surface. These non-specific interactions are formed through basic residues in the histones making bonds to the acidic sugar-phosphate backbone of the DNA, and are therefore largely independent of the base sequence. Chemical modifications of these basic amino acid residues include methylation, phosphorylation and acetylation. These chemical changes alter the strength of the interaction between the DNA and the histones, making the DNA more or less accessible to transcription factors and changing the rate of transcription. Other non-specific DNA-binding proteins in chromatin include the high-mobility group (HMG) proteins, which bind to bent or distorted DNA. Biophysical studies show that these architectural HMG proteins bind, bend and loop DNA to perform its biological functions. These proteins are important in bending arrays of nucleosomes and arranging them into the larger structures that form chromosomes.
Proteins that specifically bind single-stranded DNA:
A distinct group of DNA-binding proteins are the DNA-binding proteins that specifically bind single-stranded DNA. In humans, replication protein A is the best-understood member of this family and is used in processes where the double helix is separated, including DNA replication, recombination and DNA repair. These binding proteins seem to stabilize single-stranded DNA and protect it from forming stem-loops or being degraded by nucleases.
Binding to specific DNA sequences:
In contrast, other proteins have evolved to bind to specific DNA sequences. The most intensively studied of these are the various transcription factors, which are proteins that regulate transcription. Each transcription factor binds to one specific set of DNA sequences and activates or inhibits the transcription of genes that have these sequences near their promoters. The transcription factors do this in two ways. Firstly, they can bind the RNA polymerase responsible for transcription, either directly or through other mediator proteins; this locates the polymerase at the promoter and allows it to begin transcription. Alternatively, transcription factors can bind enzymes that modify the histones at the promoter. This alters the accessibility of the DNA template to the polymerase.
These DNA targets can occur throughout an organism’s genome, so changes in the activity of one type of transcription factor can affect thousands of genes. For this reason, these proteins are often the targets of the signal transduction processes that control responses to environmental changes or cellular differentiation and development. The specificity of these transcription factors’ interactions with DNA comes from the proteins making multiple contacts to the edges of the DNA bases, allowing them to read the DNA sequence. Most of these base interactions are made in the major groove, where the bases are most accessible. Mathematical descriptions of protein-DNA binding that take into account sequence specificity and the competitive and cooperative binding of proteins of different types are usually formulated with the help of lattice models. Computational methods to identify DNA-binding sequence specificity have been proposed to make good use of the abundant sequence data in the post-genomic era.
Protein–DNA interactions occur when a protein binds a molecule of DNA, often to regulate the biological function of DNA, usually the expression of a gene. Among the proteins that bind to DNA are transcription factors, which activate or repress gene expression by binding to DNA motifs, and histones, which form part of the structure of DNA and bind to it less specifically. Proteins that repair DNA, such as uracil-DNA glycosylase, also interact closely with it.
In general, proteins bind to DNA in the major groove; however, there are exceptions. Protein–DNA interactions are mainly of two types: specific and non-specific. Recent single-molecule experiments showed that DNA-binding proteins undergo rapid rebinding in order to bind in the correct orientation for recognizing the target site.
Designing DNA-binding proteins that have a specified DNA-binding site has been an important goal for biotechnology. Zinc finger proteins have been designed to bind to specific DNA sequences and this is the basis of zinc finger nucleases. Recently transcription activator-like effector nucleases (TALENs) have been created which are based on natural proteins secreted by Xanthomonas bacteria via their type III secretion system when they infect various plant species.
There are many in vitro and in vivo techniques that are useful for detecting DNA–protein interactions. The following list gives some methods currently in use:
• Electrophoretic mobility shift assay is a widespread technique to identify protein–DNA interactions.
• DNase footprinting assay can be used to identify the specific site of binding of a protein to DNA.
• Chromatin immunoprecipitation is used to identify the sequence of the DNA fragments which bind to a known transcription factor. This technique, when combined with high-throughput sequencing, is known as ChIP-seq, and when combined with microarrays it is known as ChIP-chip.
• Yeast one-hybrid system (Y1H) is used to identify which protein binds to a particular DNA fragment.
• Bacterial one-hybrid system (B1H) is used to identify which protein binds to a particular DNA fragment.
• Structure determination using X-ray crystallography has been used to give a highly detailed atomic view of protein–DNA interactions.
Manipulating the interactions:
Protein–DNA interactions can be modulated using stimuli such as the ionic strength of the buffer, macromolecular crowding, temperature, pH and electric field. This can lead to reversible dissociation or association of the protein–DNA complex.