An Introduction to Proteins in 30 Minutes

An introduction to the composition and structure of proteins, the relationship with activity and function, and how size, molecular weight and zeta potential measurements can be used for their characterization by light scattering.

Introduction

In the biotechnology industry, the majority of research is directed towards characterizing the structure and function of proteins. New methods are continually being developed to accomplish this. Light scattering technologies are relatively new to the field of protein characterization, but their potential is great because they do not consume the sample. This technical note provides an introduction to the field of proteins, their composition and structure, and how size, molecular weight and ζ-potential measurements can be applied.

Proteins

Amino Acids

Amino acids are small molecules with a common structure. They have a central carbon attached to an amino and a carboxyl group, a hydrogen atom, and a fourth functional group (R). This functional group is variable and it is this that changes in each of the 20 or so amino acids the body uses to build proteins. This basic structure of amino acids is shown in figure 1 in both its charged and uncharged states. A couple of examples of amino acids are shown in figure 2.

Figure 1: Amino acid structure

Figure 2: Some example amino acids

The different functional groups confer different properties on the amino acids. For example, the amine group on lysine will gain or lose its charge depending on the local environment. The sulphur groups on cysteine residues can bond together covalently to form di-sulphide bridges when two of these amino acids are brought close together.
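
The dependence of the lysine charge on its environment can be illustrated with the Henderson-Hasselbalch relationship. The following is a minimal sketch only: the side-chain pKa of 10.5 is an assumed, approximate value for free lysine, and the function name is chosen here for illustration. In a folded protein the effective pKa can shift considerably.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a titratable group in its protonated form
    (Henderson-Hasselbalch relationship)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed, approximate side-chain pKa for free lysine; the real value
# depends on the local environment inside a protein.
LYSINE_SIDE_CHAIN_PKA = 10.5

for ph in (4.0, 7.4, 10.5, 12.0):
    # The protonated amine carries a charge of +1, so the mean charge
    # equals the protonated fraction.
    charge = fraction_protonated(ph, LYSINE_SIDE_CHAIN_PKA)
    print(f"pH {ph:>4}: mean charge on lysine side chain = {charge:+.3f}")
```

Under these assumptions the side chain is almost fully charged at physiological pH and loses most of its charge only well above pH 10.5.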

Amino acids join to one another via bonds known as peptide bonds between the carboxyl carbon and the amino nitrogen. When the bond forms, a water molecule is released (figure 3).

Figure 3: Peptide bond formation

Amino acids can join together using these peptide bonds in chains of almost any sequence, which are subsequently known as polypeptides. A short sequence of up to a dozen or twenty amino acids is generally known simply as a peptide. When a polypeptide is of an appropriate sequence, size and structure, it is functionally a protein.
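
Because one water molecule is released for every peptide bond formed, the mass of a peptide is slightly less than the sum of the masses of its free amino acids. The sketch below illustrates this bookkeeping for a handful of amino acids. The mass table is deliberately incomplete and the function name is illustrative only.

```python
WATER_MASS = 18.015  # g/mol, one water molecule is released per peptide bond

# Average molecular masses (g/mol) of a few free amino acids,
# keyed by one-letter code. Only a small subset is listed here.
FREE_AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "K": 146.19,  # lysine
    "C": 121.16,  # cysteine
}


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: sum of the free amino acid masses
    minus one water for each peptide bond."""
    if not sequence:
        return 0.0
    total = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence)
    peptide_bonds = len(sequence) - 1
    return total - peptide_bonds * WATER_MASS


print(f"Gly-Ala dipeptide: {peptide_mass('GA'):.2f} g/mol")
print(f"Gly-Ala-Lys-Cys:   {peptide_mass('GAKC'):.2f} g/mol")
```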

Protein Structure

The function of a protein is determined by its structure rather than its sequence of amino acids. However, the sequence of the amino acids is a key factor in determining the final structure of the protein. A protein with no fixed structure is said to be in a random coil formation. This has very little regular structure and no activity. Functional proteins have a very tightly regulated structure held together by hydrogen bonds and Van der Waals forces between nearby amino acids, di-sulphide bridges between cysteine residues, and hydrophobic interactions. The structure of a protein has four levels of complexity.

Primary structure simply describes the sequence of the amino acids in the polypeptide chain.

Secondary structure describes the large regular sub-structures that form as the protein folds. There are two major sub-structures that form as secondary structure. These are the α-helix and the β-sheet. Certain amino acids in sequence are known as helix formers (including e.g. methionine, alanine and leucine). They form tight coils known as α-helices, which can be joined by loops (short amino acid sequences with loose structure). Hydrogen bonds between the double-bonded oxygen of one amino acid and the amino hydrogen four amino acids along the helix hold the structure together. The structure of an α-helix is shown in figure 4.

Figure 4: α-helical structures
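
The hydrogen-bonding pattern of the α-helix, in which each residue is bonded to the residue four positions further along the chain, can be sketched as a simple enumeration. The sequence used below is a hypothetical example and the function name is illustrative.

```python
def helix_hydrogen_bonds(sequence: str) -> list[tuple[str, str]]:
    """Pair the carbonyl oxygen of residue i with the amino hydrogen
    of residue i + 4, using 1-based residue numbering in the labels."""
    pairs = []
    for i in range(len(sequence) - 4):
        acceptor = f"{sequence[i]}{i + 1}"   # residue providing C=O
        donor = f"{sequence[i + 4]}{i + 5}"  # residue providing N-H
        pairs.append((acceptor, donor))
    return pairs


# Hypothetical eight-residue stretch rich in helix formers.
for acceptor, donor in helix_hydrogen_bonds("MALEKLAA"):
    print(f"C=O of {acceptor} ... H-N of {donor}")
```

A stretch of fewer than five residues yields no such bonds, which is one reason why very short peptides rarely form stable helices on their own.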

α-helices form in many proteins and are frequently found in membrane-spanning proteins, which are proteins that sit in the cell membrane. These are used to transmit signals, ions or molecules across the membrane, either into or out of the cell. These proteins may have multiple trans-membrane domains. As an example of this, G-protein coupled receptors transmit external signals into the cell. These have a regular structure including 7 membrane-spanning helices. A representation of this is shown in figure 5.

Figure 5: The 7 trans-membrane helices in β2-adrenergic receptor, a G-protein coupled receptor.

β-sheets are formed when straight chains of amino acids in the polypeptide run past each other in opposite directions (anti-parallel) as the protein chain folds. Hydrogen bonds form between the carboxyl oxygen and the amino hydrogen of opposing amino acids giving the structure rigidity. These chains are joined at either end by loops or turns. This flat linear structure is called a β-pleated sheet or β-sheet. Figure 6A shows the hydrogen bonding between amino acids and figure 6B shows 'green fluorescent protein' in which β-sheets, shown in yellow, form a barrel structure.

Figure 6: A: Hydrogen bonding between opposing strands in a β-sheet. B: Green fluorescent protein with β-sheets forming a barrel structure.

Tertiary structure is the final structure that forms from the secondary structures. It is the final 3-dimensional structure of the protein. It is held in place by hydrogen bonds and Van der Waals forces, hydrophobic interactions, and disulphide bridges.

Some proteins only function when two or more polypeptide chains come together to form dimers, trimers etc., collectively known as oligomers. These can be made from identical sub-units (component proteins) and be called homomers, or different sub-units and be called heteromers. The arrangement made by the formation of oligomers is known as quaternary structure. For example, hemoglobin is a heterotetramer made up of four sub-units: two α sub-units and two β sub-units (figure 7).

Figure 7: Ribbon diagram of the structure of Hemoglobin

Post-translational modifications

The process of manufacturing a protein within a biological cell is called translation. It is at this stage that amino acids are joined together in sequence and the protein folds. This process is performed by ribosomes, themselves partially protein, which translate the sequence of a strand of RNA (similar to DNA) into a sequence of amino acids for a protein. Subsequent to this process, a number of modifications can take place that affect the activity of the protein. These can include glycosylation, where chains of sugars are bound to the surface of the protein. This is frequently used to target proteins to particular locations within the cell, e.g. the cell membrane. Phosphorylation, the addition of a phosphate group, can be used to modify the activity of a protein. Multiple glycosylation and phosphorylation sites may be present on a single protein.

Denaturation

As temperature rises, the internal forces that hold the structure of the protein together are overcome and the protein unfolds. Changes in pH will affect di-sulphide bridge formation and the ionization state of the numerous functional groups on the amino acids that may be involved in internal bonds. In most proteins, once this process begins, it tends to continue until the protein structure is completely lost. The activity of the protein will also be completely lost. The protein is then said to be denatured. The point at which this occurs is called the protein's melting point. A protein's melting point can be used as a predictor of how quickly it will degrade in given conditions.

Aggregation

As proteins begin to denature, charged ionic groups, hydrophobic regions and dipoles will be exposed to the surrounding medium. When partially denatured proteins come together, they bind easily to each other via these exposed regions, and the same forces that hold the protein structure together will hold proteins to each other. These bonds are strong, can hold many proteins together in groups and are often impossible to dissociate without totally denaturing the proteins. This process is called aggregation. Not only does aggregation result in the loss of activity of any protein involved in an aggregate, but pharmaceutical preparations that contain aggregates are known to cause strong immune responses in patients treated with them. These include inflammation or, more seriously, anaphylactic shock, so aggregation must be avoided during the production or storage of a therapeutic protein.

Protein Activity

The functions of proteins within the body are highly varied. They regulate almost all the chemical reactions that occur within living organisms; they act as transporters for ions or other molecules; they signal within and between cells locally and throughout an organism; they build and break down other proteins; they build and break down DNA; and they stimulate and regulate cell growth and division.

When a protein drives a chemical reaction by reacting substrates to form products, it is known as an enzyme and typically has the suffix -ase. There are many families of enzymes both within organisms and between them. They can be very similar, performing the same function in different parts of an organism, or at different rates. As an example of this, the enzyme Nitric Oxide Synthase has three isoforms (family members) called endothelial, neuronal and inducible. All three of these use the substrates arginine (an amino acid) and oxygen and produce citrulline (an amino acid) and nitric oxide. The three isoforms exist in different parts of the body and have different kinetics, meaning that they perform the same reaction at different speeds.
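
One common way to express what it means for isoforms to perform the same reaction at different speeds is the Michaelis-Menten rate equation. The sketch below is illustrative only: the two isoforms and their parameters (Vmax and Km) are hypothetical numbers chosen for the example, not measured values for any Nitric Oxide Synthase isoform.

```python
def reaction_rate(substrate: float, vmax: float, km: float) -> float:
    """Michaelis-Menten rate for a given substrate concentration.
    vmax is the maximum rate and km the substrate concentration
    at which the rate is half of vmax."""
    return vmax * substrate / (km + substrate)


# Hypothetical kinetic parameters (arbitrary units) for two isoforms
# that catalyse the same reaction.
isoforms = {
    "isoform A": {"vmax": 1.0, "km": 2.0},
    "isoform B": {"vmax": 0.3, "km": 10.0},
}

for substrate in (1.0, 10.0, 100.0):
    rates = ", ".join(
        f"{name}: {reaction_rate(substrate, **params):.3f}"
        for name, params in isoforms.items()
    )
    print(f"[S] = {substrate:>5}: {rates}")
```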

Other proteins transport molecules within the body. To return to a previous example, hemoglobin carries oxygen in the blood. An oxygen molecule binds to hemoglobin, causing the structure of the protein to change. This is called a conformational change and is a common process in protein activity. This change in structure makes it easier for the next oxygen molecule to bind, inducing another change that makes it easier for the next, and so on. The same is true for the opposite process as the oxygen dissociates.
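
This cooperative behavior is often approximated with the Hill equation. The following sketch compares cooperative and non-cooperative binding. The half-saturation pressure (about 26 mmHg) and the Hill coefficient (about 2.8) are approximate textbook figures used here as assumptions, and the function name is illustrative.

```python
def fractional_saturation(p_o2: float, p50: float = 26.0, hill_n: float = 2.8) -> float:
    """Hill equation: fraction of oxygen binding sites occupied at a
    given oxygen partial pressure (mmHg). p50 and hill_n are assumed,
    approximate values for hemoglobin."""
    return p_o2 ** hill_n / (p50 ** hill_n + p_o2 ** hill_n)


for p_o2 in (20.0, 26.0, 40.0, 100.0):
    cooperative = fractional_saturation(p_o2)
    independent = fractional_saturation(p_o2, hill_n=1.0)  # no cooperativity
    print(f"pO2 {p_o2:>5.0f} mmHg: cooperative {cooperative:.2f}, "
          f"non-cooperative {independent:.2f}")
```

With cooperativity the curve is steeper: the protein loads more fully at high oxygen pressure and releases more readily at low pressure than a non-cooperative binder with the same half-saturation point.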

G-protein coupled receptors, discussed earlier, transmit signals across cell membranes. A signaling molecule binds to the external portion of the receptor. This is believed to cause a conformational change in the shape of the receptor that exposes a functional region on the inside of the cell. This allows another protein, the G-protein, to bind to the interior region of the receptor, beginning a signaling cascade within the cell that leads to the desired changes.

Insulin is a hormone, an example of a signaling protein. It is produced by the pancreas in response to rising glucose levels in the blood and it travels in the blood to the liver, which it signals to take up the glucose and store it. Diabetes is caused by the loss of the body's ability to produce sufficient insulin or to respond to it. By measuring their own blood glucose levels and injecting insulin, a person can regulate their own blood sugar levels artificially.

Asthma is a disease caused by chronic inflammation of the lungs. Proteins such as interleukins 6 and 8 regulate inflammation, and these two in particular stimulate the process. Others, such as interleukin 10, reduce inflammation. Interleukins are cytokines, secreted proteins that cause other cells to move and divide. Much work has gone into trying to artificially regulate these and many other proteins involved in the inflammatory processes.

These are just a few examples to demonstrate the many ways in which proteins function. Protein activity is linked in many series and pathways, and changes in the activity of any protein have knock-on effects on these pathways. The goal of many researchers is to modulate the activity of these pathways artificially and beneficially. As the above examples show, diseases are often caused by the loss of a protein or by a failure to regulate these pathways correctly, and their artificial regulation can be used to correct these failures and treat diseases.

Antibodies

Antibodies (also known as immunoglobulins) are large proteins of approximately 150 kDa involved in the immune response. They have a defined structure shown in figure 8. The base of the molecule, the Fc fragment, always has the same primary, secondary and tertiary structure within a given organism. The other part of the molecule, the Fab fragment, maintains a similar shape but has a large amount of variation in structure such that there are millions of different antibodies in any organism. Antibodies are made to bind to foreign objects or antigens, such as molecules, bacteria and viruses, within the body and have many variants in order to identify as many foreign molecules as possible. Specific binding of a molecule to an antibody, through various inter-molecular bonds, identifies that molecule as foreign. This triggers the immune response, leading the body to destroy the foreign antigen.

Figure 8: Antibody structure

Antibodies are widely used in biotechnology laboratories. Monoclonal antibodies are raised artificially to be all identical. They can then be labeled with groups like rhodamine, a fluorescent molecule, or biotin, a small molecule that binds very tightly to avidin or streptavidin and so allows the antibody to be detected. When they are raised against a particular target protein, they can be used to target that particular protein in cultured cells. If an antibody is labeled, the researcher can quantify the target protein according to the amount of bound antibody. Alternatively, the location of the target protein within the cell can be identified. This is a common tool in biochemistry used in many different applications.

Other protein terminology:

Active site - this is where a substrate binds an enzyme and the reaction takes place.

Cofactor - this is a molecule that is involved in the enzyme reaction in an alternative way, perhaps donating electrons to the reaction.

Conjugate - a conjugate is a molecule bound to a protein such as a labeling group (e.g. a fluorescent dye - rhodamine).

Inhibitor - an inhibitor binds to a protein/enzyme to inhibit its function either by altering the kinetics of the reaction or blocking the substrate from binding.

Ion channel - an ion channel is a trans-membrane protein that forms a pore allowing particular ions to traverse the cell membrane.

Ligand - a ligand is an extra group or molecule that binds to a protein.

Lock & key - this refers to the fit of the substrate in the active site. Intermolecular bonds hold the substrate in a close fit similarly to a key in a keyhole.

Motif - a motif is a substructure within a protein such as a group of α-helices.

Mutant - a protein whose structure is altered due to the deletion of an amino acid or its replacement with an alternative, either by experimental design or chance as occurs in some diseases.

Native - a protein in its native state is correctly folded and has a functional structure.

Recombinant - a recombinant protein is one produced artificially in the lab.

Vector - a piece of DNA constructed in the lab coding for a protein to be artificially produced.

Virus - viruses are frequently used to infect cultured cells with vectors in order to make them manufacture a particular protein.

Wild type - the naturally occurring un-mutated form of a protein.

Protein Measurements

Batch Dynamic Light Scattering (DLS)

The primary measurement of proteins that can be performed with batch DLS is a size measurement. Since proteins have a very consistent composition and fold into tight structures, the hydrodynamic size relates predictably to molecular weight. The Zetasizer software has a model to predict the molecular weight of a protein from its hydrodynamic size by DLS. The activity and function of a protein are closely related to correct folding and structure. As such, a change in the size of the protein indicates a change in folding, so size can be used as a predictor of activity. Quaternary structure of the protein can also be studied. When proteins oligomerize, their size and molecular weight will increase in discrete increments corresponding to the addition of separate proteins. By measuring the protein under different conditions, the oligomeric state of the protein can be assessed. Many proteins rely on correct quaternary structure in order to function, so again, hydrodynamic size can be used as a predictor of activity.
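
The link between diffusion, hydrodynamic size and molecular weight can be sketched as follows. DLS measures the diffusion coefficient, which the Stokes-Einstein equation converts into a hydrodynamic radius. The second function estimates the radius of a perfectly compact, dry sphere of a given mass. This is a geometric lower bound under an assumed partial specific volume of 0.73 cm3/g and is not the model used by the Zetasizer software. The diffusion coefficient and molecular weights used are hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
AVOGADRO = 6.02214076e23  # 1/mol


def hydrodynamic_radius_nm(diffusion_m2_s: float,
                           temperature_k: float = 298.15,
                           viscosity_pa_s: float = 0.00089) -> float:
    """Stokes-Einstein: radius of the sphere that diffuses at the measured
    rate. Defaults assume water at 25 degrees C."""
    radius_m = K_B * temperature_k / (6 * math.pi * viscosity_pa_s * diffusion_m2_s)
    return radius_m * 1e9


def compact_sphere_radius_nm(mass_da: float,
                             partial_specific_volume_cm3_g: float = 0.73) -> float:
    """Radius of a solid sphere with the mass of the protein. A lower bound:
    real proteins are hydrated and not perfectly spherical."""
    volume_nm3 = partial_specific_volume_cm3_g * 1e21 / AVOGADRO * mass_da
    return (3 * volume_nm3 / (4 * math.pi)) ** (1 / 3)


# Hypothetical diffusion coefficient from a DLS measurement.
print(f"Rh = {hydrodynamic_radius_nm(6e-11):.2f} nm")

# Hypothetical 50 kDa monomer and its oligomers: size grows in discrete
# steps, roughly with the cube root of the mass.
for name, copies in (("monomer", 1), ("dimer", 2), ("tetramer", 4)):
    radius = compact_sphere_radius_nm(50_000 * copies)
    print(f"{name:>8}: minimum radius {radius:.2f} nm")
```

Because the radius scales only with the cube root of the mass, a doubling of molecular weight gives a size increase of about 26%, which illustrates why closely related oligomers are hard to resolve by size alone.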

In adverse conditions such as extremes of temperature and pH, a protein will become denatured. By controlling these conditions, and measuring the hydrodynamic radius, the melting point of the protein can be established. This is related to stability of the protein and can be used as a predictor of shelf life.
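
A simple way to estimate the melting point from such a temperature trend is to look for the first temperature at which the measured size rises clearly above its low-temperature baseline. The sketch below uses hypothetical data and an arbitrary threshold of 1.5 times the baseline. Both the data and the threshold are assumptions for illustration rather than a recommended protocol.

```python
def melting_onset(temperatures_c, sizes_nm, baseline_points=3, threshold=1.5):
    """Return the first temperature at which the size exceeds
    threshold x the mean of the first baseline_points readings,
    or None if the size never rises that far."""
    baseline = sum(sizes_nm[:baseline_points]) / baseline_points
    for temperature, size in zip(temperatures_c, sizes_nm):
        if size > threshold * baseline:
            return temperature
    return None


# Hypothetical temperature trend: the size is stable until the protein
# unfolds and starts to aggregate.
temperatures = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
sizes = [3.8, 3.8, 3.9, 3.8, 3.9, 4.0, 4.1, 9.5, 40.0, 120.0]

print(f"Estimated melting onset: {melting_onset(temperatures, sizes)} C")
```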

Crystallization of proteins is a necessary step for elucidating their detailed 3-dimensional structure. Crystallization is a difficult process that requires a highly purified protein kept in ideal conditions. In DLS measurements, polydispersity is a measure of the purity of a sample. A protein sample with a very low polydispersity indicates that it is highly purified, that all the protein is in one particular oligomeric conformation and that its structure is very well controlled under these conditions, all of which are required for crystallization. By identifying a protein sample with the lowest polydispersity, a researcher can find the most suitable conditions for crystallization.
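
Screening for the condition with the lowest polydispersity amounts to ranking the measured conditions by the relative width of their size distributions. The sketch below uses hypothetical buffer conditions and hypothetical (mean, width) values, and expresses polydispersity simply as width divided by mean. Instrument software may report a differently defined polydispersity index.

```python
def relative_polydispersity(mean_nm: float, width_nm: float) -> float:
    """Relative width of the size distribution (standard deviation / mean)."""
    return width_nm / mean_nm


# Hypothetical screening results: condition -> (mean size, distribution width), in nm.
conditions = {
    "pH 5.0, 50 mM NaCl": (4.1, 1.6),
    "pH 7.0, 150 mM NaCl": (3.9, 0.4),
    "pH 8.5, 150 mM NaCl": (4.0, 0.9),
}

best = min(conditions, key=lambda name: relative_polydispersity(*conditions[name]))

for name, (mean_nm, width_nm) in conditions.items():
    marker = "  <- lowest" if name == best else ""
    print(f"{name}: {100 * relative_polydispersity(mean_nm, width_nm):.0f}%{marker}")
```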

Light scattering techniques are particularly sensitive to larger molecules in preparations of smaller molecules. Any increase in the size of a protein will most likely be the result of aggregate formation. The sensitivity of the DLS measurement to larger proteins means that the earliest stages of denaturation, leading to the formation of a few aggregates, will result in changes in the mean hydrodynamic size. As such, DLS is among the most sensitive techniques for detecting small quantities of aggregates in preparations.
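
The reason for this sensitivity is that, for particles much smaller than the wavelength of the light, the scattered intensity rises with roughly the sixth power of the diameter (the Rayleigh approximation). The sketch below shows how a trace population of aggregates dominates an intensity-weighted mean. The sizes and number fractions are hypothetical, and the simple mean computed here is not identical to the z-average reported by a cumulants analysis.

```python
def intensity_weighted_mean(diameters_nm, number_fractions):
    """Mean diameter with each population weighted by its scattered
    intensity, taken as number x diameter**6 (Rayleigh approximation)."""
    weights = [n * d ** 6 for d, n in zip(diameters_nm, number_fractions)]
    return sum(w * d for w, d in zip(weights, diameters_nm)) / sum(weights)


# Hypothetical sample: 5 nm monomer with 1 aggregate of 50 nm per 10,000 particles.
diameters = [5.0, 50.0]
fractions = [0.9999, 0.0001]

number_mean = sum(d * n for d, n in zip(diameters, fractions))
print(f"Number-weighted mean:    {number_mean:.2f} nm")
print(f"Intensity-weighted mean: {intensity_weighted_mean(diameters, fractions):.2f} nm")
```

Under these assumptions the number-weighted mean barely moves from 5 nm, while the intensity-weighted mean is pulled almost all the way to the aggregate size.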

Static Light Scattering (SLS)

Following on from DLS measurements, SLS measurements can also be made of proteins. Often highly purified, many protein samples should be applicable for batch measurements of molecular weight using SLS, as long as the concentrations are accurately known. By measuring the amount of light scattered at different concentrations of sample, the molecular weight, which is proportional to the amount of light scattered, can be calculated by creating a Debye plot.

The 2nd virial coefficient is a measure of molecular interaction within a solution. A strongly positive value indicates good solubility while a strongly negative value indicates a propensity to aggregate. For ideal crystallization conditions, a protein will be aggregating at a very slow rate. This will allow regular structures, and thus crystals, to form. Tailoring the 2nd virial coefficient to be small while remaining negative should in theory lead to ideal conditions for crystallization. The slope of the line in the Debye plot is twice the 2nd virial coefficient, so this technique can also be useful for studying crystallization conditions.
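
The Debye plot analysis can be sketched as a straight-line fit of KC/R against concentration C, where K is an optical constant and R is the excess scattering of the sample: the intercept is 1/M and the slope is twice the 2nd virial coefficient. In the example below the data are synthetic, generated from an assumed molecular weight and virial coefficient so that the fit can be checked, and all names are illustrative.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def debye_analysis(concentrations_g_ml, kc_over_r):
    """Return (molecular weight in g/mol, 2nd virial coefficient in
    mol mL / g^2) from the intercept and slope of the Debye plot."""
    slope, intercept = linear_fit(concentrations_g_ml, kc_over_r)
    return 1.0 / intercept, slope / 2.0


# Synthetic data from assumed values, for illustration only.
TRUE_MW = 66_000.0  # g/mol
TRUE_A2 = 1.0e-4    # mol mL / g^2
concentrations = [0.001, 0.002, 0.003, 0.004, 0.005]  # g/mL
kc_over_r = [1.0 / TRUE_MW + 2.0 * TRUE_A2 * c for c in concentrations]

mw, a2 = debye_analysis(concentrations, kc_over_r)
print(f"Molecular weight: {mw:.0f} g/mol, A2: {a2:.2e} mol mL/g^2")
```

Because the molecular weight comes from the intercept, any error in the stated concentrations feeds directly into the result, which is why accurately known concentrations are required.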

Charge and ζ-Potential

Zeta-potential measurements of proteins are also possible, using a suitably sensitive instrument such as the Zetasizer Nano ZSP, and an appropriate method such as the patented diffusion barrier technique. A significant number of the functional groups on amino acids can be charged and any combination of these may be in their charged or uncharged states in the protein. This will change depending on the conditions in the local environment and it is important to note that zeta-potential can be different from the calculated net charge based on the likely state of the charged residues in the molecule. Charge is of particular interest to protein chemists and ζ-potential should be able to compete with iso-electric focusing, currently one of the primary methods for determining protein charge, as it allows the protein to be kept in conditions far nearer to its native state. It should be noted, however, that proteins are subject to being denatured by the applied electric field, which can make zeta-potential measurements difficult.

Overall, ζ-potential is a measure of the strength of the repulsive forces between molecules in solution. Conventionally, this has been used as a primary indicator of the stability of a sample preparation. With high ζ-potential, and consequently, high inter-molecular repulsive force, a drug or protein preparation can be expected to be stable for longer periods than a similar preparation with low ζ-potential.
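
In practice, ζ-potential is usually derived from the electrophoretic mobility of the molecule in the applied field, using Henry's equation. The sketch below shows that conversion under stated assumptions: water at 25 degrees C (relative permittivity about 78.5, viscosity 0.00089 Pa s) and a hypothetical mobility value. The Henry function f(ka) is taken as 1.5 (Smoluchowski limit) or 1.0 (Hückel limit). Which is appropriate depends on the size of the molecule and the ionic strength.

```python
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F/m


def zeta_potential_mv(mobility_m2_vs: float,
                      henry_function: float = 1.5,
                      relative_permittivity: float = 78.5,
                      viscosity_pa_s: float = 0.00089) -> float:
    """Henry's equation rearranged for zeta potential:
    zeta = 3 * eta * mobility / (2 * epsilon * f(ka)).
    Defaults assume water at 25 degrees C; the result is in millivolts."""
    permittivity = relative_permittivity * VACUUM_PERMITTIVITY
    zeta_v = 3 * viscosity_pa_s * mobility_m2_vs / (2 * permittivity * henry_function)
    return zeta_v * 1000.0


# Hypothetical measured mobility of a negatively charged protein.
mobility = -1.0e-8  # m^2 / (V s)

print(f"Smoluchowski (f = 1.5): {zeta_potential_mv(mobility):.1f} mV")
print(f"Hueckel (f = 1.0):      {zeta_potential_mv(mobility, henry_function=1.0):.1f} mV")
```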

Gel-Permeation Chromatography (GPC) (or size-exclusion chromatography (SEC))

Adding SEC capabilities to a light scattering detector is a way to greatly improve its resolution. While DLS can be used to characterize the oligomeric state of a protein, it is unable to resolve a mixture of oligomers. SEC separates molecules based on their size making it an excellent partner for light scattering. By separating the molecules before measuring them, using DLS or SLS, this technique can be used to identify the different components in a mixture.

It is more common to add SLS to SEC. At known concentrations, measured with a refractive index or a UV detector, molecular weight can be related directly to the amount of light scattered by a molecule. This can be combined with data from a viscometer, which measures viscosity, allowing size and some structural aspects to be determined. Thus, a large amount of information can be obtained for a single protein sample using this method.

SEC also adds another dimension to the detection of aggregates. By separating them from the primary sample, it is possible to further characterize and quantify them. Manufacturers of protein solutions routinely use SEC as the final step in purification. SEC is used to polish samples in order to remove any aggregates formed in the sample preparation. The same is true when purifying a single protein from biological samples.

Research Areas - who will be measuring proteins?

Researchers studying proteins fall into two groups: academia and industry. Biochemists, biophysicists and molecular biologists generally study the structure of proteins and are interested in oligomerization as well as the chemical reactions associated with a protein. Pharmacologists are interested in manipulating the activity of proteins. They use other proteins or drugs (usually small molecules, e.g. aspirin) to artificially shift the balance of regulation to correct the changes that result in diseases.

Research into protein activity and function is generally done by those who study proteomics, which involves the identification, characterization and regulation of proteins.

Instrument selection

The most versatile light scattering instrument available is the Zetasizer Nano. The Zetasizer Nano ZSP has been specifically designed to provide the sensitivity required for the size and zeta potential measurement of poorly scattering materials such as proteins.

In industry, much of the work performed is screening for possible drug candidates or appropriate crystallization conditions. For such high-throughput situations, the Zetasizer APS, with DLS plate sampling technology, is the most time-efficient and sensitive DLS instrument on the market.

In early protein analysis studies, proteins are often purified and are only available in very small quantities. In such situations, the Zetasizer μV provides the industry leading sensitivity of the Zetasizer range using volumes as low as 2 μl.

For SEC applications the Viscotek range of instruments and detectors provides detailed and accurate information on molecular weight, size and structure of proteins.

Figure 9: A: Zetasizer Nano B: Zetasizer APS C: Zetasizer μV D: GPCmax