Monday, 16 October 2017

Separation and divorce in Biochemistry!

One of the most fundamental strategies in experimental Biochemistry is the separation of cells or molecules in order to clarify their function. The development of microscopes first revealed the phenomenon of the cell as the "atomic" equivalent of chemistry in biology. This year's award of the Nobel Prize in Chemistry to those pioneering single particle cryo-electron microscopy shows how important this technology remains today. Later, the fractionation of cells and the development of centrifugation, solvent extraction and chromatography formed (and largely still form) the basis of biochemical separation methods. My title is meant to be a light-hearted comment on an important aspect of contemporary biochemistry: can we go too far in separating cells and molecules? Hence I am going to consider separation as the reversible disruption of interacting molecules, versus divorce, in which the interaction is technically over and irreversible. Of course I do realise that separation can sometimes lead to divorce and that divorce can be followed by a re-uniting of partners (see image, top LHS)! And then there is the issue of promiscuity and multiple partners. I can see this developing into a controversial post. Oh well, here goes!


Sir Archibald Garrod's "one gene: one enzyme hypothesis" has had a major influence on the logic of experimental biochemistry over the last 100 years. If you combine this with the classical chemistry and physics approach of reducing a system to its components in order to determine the properties of the constituent parts, then it is no surprise that mainstream biochemists during the second half of the twentieth century spent much of their time separating, enriching and purifying cells and molecules. However, there have always been a small number of scientists for whom this reductionist approach misses the point. Some of the leading scientists during this "Warburg" phase, including Sir Frederick Gowland Hopkins, Conrad Waddington, and later the code-breaker Alan Turing, were already discussing data in a "Systems aware" manner. You may wish to look at the work of Denis Noble in this context: his new book, Dance to the Tune of Life: Biological Relativity, provides both an excellent and an accessible starting point.

Any engineer will tell you that a machine, however simple, can be deconstructed and its mechanism of action determined. A trivial example would be a corkscrew. The one shown on the left comprises two parts: a wooden "handle" and a metal rod, half spiral with a blunt and a pointed tip. As I am sure you are aware, this device is pushed into the soft cork in a bottle of wine and turned clockwise until it is fully "engaged" by the cork; the handle is then used to pull out the offending item separating you from your glass of wine.


It has been suggested that similar devices were first used to remove blockages from the barrel of a rifle. The corkscrew is very much a sum of its parts: the rotation of the handle, the traction of the screw in the cork and finally the effort in pulling, which is provided by the user. Many variations on this simple design exist. One popular "evolution" of the original corkscrew replaces the wooden handle with a "key" and two hinged side arms, which act as levers to reduce the effort of the pulling action and reduce the risk of splitting the cork. The addition of a collar prevents the screw from being driven down too deeply, and was an early modification to the original patent over 200 years ago.

What can we learn from the invention of the corkscrew (and the later modification), that is relevant to Biological systems? Let's look at the materials first. The handle is typically made of wood; an abundant and simple material which needs to be hard enough to tolerate a firm grip (brittle or very soft woods are incompatible). The metal rod must also be strong enough to tolerate the forces: very soft metals would be incompatible, but the metal must be sufficiently malleable to be forged into a spiral and sharpened to a point. The end of the rod that is inserted into the handle must either be flattened or made asymmetric in cross-section, in order to ensure it doesn't twist during use. Typically the rod is fixed into the handle with some form of glue, but this isn't essential. The materials used in construction of any machine or structural device are a critical part of engineering design: so too in Biology, the molecular components are critical. Double stranded polynucleotides have evolved as information stores in most living organisms and polypeptides (or occasionally RNA) are used for enzyme fabrication. And you may see some similarities with the corkscrew in the molecular structure of the bacteriophage T4 base-plate, shown top left and the subject of an earlier Molecule of the Month post.

Perhaps the most important parallel between simple devices in the Newtonian world and the sub-cellular world of Biology is the harnessing and distribution of energy and mechanical forces which is central to the origin of life. In order to convert simple foods such as proteins and carbohydrates to energy and more of us, cells must operate within strict "windows" of temperature, pressure and pH. It is the relationship between form and function which continues to fascinate many biochemists and why the reductionist approach has been so popular. 

Getting back to the topic of this post, I hope you can see that by deconstructing the components of a "machine" or structure, we can learn a great deal about its mechanism of action. I shall return to the limitations of this approach later, but for now let's consider how biological molecules can be isolated.

Consider a banana growing in a bunch as shown on the left. At a very simple level the plant comprises the fruit, the leaves, the stem and the root. It would only take a few minutes to separate each component, and of course the banana skin can be separated readily from the edible fruit in a few seconds. In fact each component can be "purified" or separated mechanically with very little "contamination" from the other components. If we now take the leaf tissue, for example, and look at it under a high quality light microscope, the heterogeneity of the tissue is plain to see (RHS). However, such microscopic structures are much more difficult to isolate than the larger structures. Even more challenging is the separation of the 25 000 or so proteins encoded by the banana genome from each other and from lipids, carbohydrates and the many low molecular weight constituents in the cell. The same problem would be faced by a microbial or mammalian biochemist.

Differential solvent extraction and salting out

These methods are derived from analytical and preparative chemistry. At some point you will have surely observed the formation of a precipitate. The addition of water to Pernod or Ouzo turns the drink cloudy, and the mixing of silver nitrate and sodium chloride solutions leads to the appearance of a precipitate of silver chloride. The same outcome can be seen when salts like ammonium sulphate (AS) are added to protein solutions, or when organic solvents like ethanol or acetone are slowly added. By making careful step-wise additions and centrifuging or filtering the mixtures, the pioneers of protein chemistry were able to purify abundant proteins such as trypsin or histones. Those interested in the original landmark papers in this area (or anyone considering a PhD in this area) should read the first volume of Methods in Enzymology and look at the work of J.B. Sumner and J.H. Northrop. Here you will find a link to an (open access) classic publication outlining the purification (and crystallization) of bovine glutamate dehydrogenase from this period. This was a paper I personally referred to throughout my entire PhD, and still do. However, bringing methodology more up to date, here is the purification of a mammalian transcription factor from the laboratory of Robert Tjian in the 1980s: I think you will appreciate the power of SDS gel electrophoresis in helping Tjian's lab follow the protein! Finally, I have chosen a paper from the laboratory of Bertrand Seraphin, who combined the power of molecular genetics with affinity chromatography, opening the doors for many to characterise multi-protein complexes involved in key biological processes. This method is colloquially referred to as TAP-tagging, and whilst a number of modifications have been made since the original work was patented and published, it remains, in my view, a tour de force in the field of protein chemistry. Let's now consider the methods described in these later papers.

Liquid chromatography

The separation and purification of proteins was transformed by the introduction of "biocompatible resins", enabling biochemists to access much less abundant protein species from complex mixtures using the principles applied by Tsvet in the early 1900s for the separation of plant pigments. In principle, column chromatography, as it is often called, is simply a modification of paper chromatography. Components of a mixture are separated by their differential partitioning between a mobile phase (sometimes an organic solvent or a buffered salt solution) and the stationary phase (the paper or the resin). Advances in polymer chemistry, and in particular in the control of bead size, shape, stability, inertness and porosity, were incredibly important in enabling biochemists to rapidly capture labile proteins. When Cuatrecasas published his first affinity chromatography paper in 1970, the final addition to the portfolio of liquid chromatography methods for purifying proteins was made.

In short there are 6 mainstream "flavours" of column chromatography used routinely in biochemistry labs.

1. Gel permeation chromatography (also called size exclusion or gel filtration chromatography or molecular sieving). Here, large proteins (eg MW in excess of 500 000) can be separated from small proteins (eg MW of 20 000), but samples can become significantly diluted.

2. Ion exchange chromatography (in which the resin may be either an anion exchanger [attracting negatively charged species] or a cation exchanger [attracting positively charged species]). This is perhaps the most effective, popular and economical way of enriching proteins with a high degree of resolution: proteins often elute as a concentrated solution.

3. Adsorption chromatography (usually based on hydroxyapatite slurries). An early column method, which has some niche applications, but is largely obsolete.

4. Hydrophobic chromatography. Here the resin beads are coated in a hydrophobic layer, which can concentrate some classes of proteins. Elution is often achieved via a "reverse" salt gradient.

5. Reversed phase chromatography. This method offers considerable control over the adsorption and elution of proteins, in particular small polypeptides or peptides generated by proteases. It is uncommon to use RP chromatography if activity is to be preserved: it is more often used to analyse small proteins and to feed mass spectrometers in proteomic analysis, since the resolution afforded by the combination of silica based stationary phases and organic solvents enhances the identification of proteins in complex mixtures.

6. Affinity chromatography. Perhaps the Biologist's favourite form of chromatography. A ligand, such as an antibody, a cofactor or a metal ion is coupled to the resin and those proteins with a biospecific affinity for the ligand are selectively captured and eluted with an excess of the free ligand (in most cases).

There are a number of textbooks describing the generic principles of liquid chromatography and the associated instrumentation. I will therefore not spend much time on this here, but if you want to dig deeper, an excellent starting point is the GE Sciences series of booklets, which can be downloaded here. GE (formerly Pharmacia) have played a major role in the development and spread of liquid chromatography in the Life Sciences over the last 50 years. In all of the above methods a mixture of proteins is applied to the top of a vertical column of resin, which may be less than one ml in total volume, or several litres, depending on the scale of the separation required. Since the GE literature is so good, I shall leave a detailed description of the technology to the masters! However, the preparation of the sample is also of paramount importance and is often overlooked, so I shall devote some time to a discussion of this topic.

Garbage in: garbage out

The successful isolation of a pure, fully functional protein is not trivial. Success begins with the careful handling of the Biological material at the beginning of the experiment. In my view this is not given the attention it deserves, and in my experience can make or break a purification experiment more than any other factor. The phrase: "garbage in: garbage out" may be more familiar to computer scientists, dating from the 1950s. For me, it sums up a failure to show the appropriate level of "respect" for the organism, cells or tissue from which you are extracting your protein of interest. [The same would be true whatever the molecule you are trying to purify: DNA, RNA, a particular metabolite etc.] A more worrying re-wording of the phrase is "garbage in: gospel out" in which too much trust is placed on technology to overcome poor data entry or coding. Again, I fear the same can be said in some biochemistry circles!

The source of biological material used by Biochemists over the last hundred years or so ranges from bacteria such as Escherichia coli and Bacillus subtilis, yeast (motivated by the brewing and baking industries), insects (Drosophila in particular, owing to the geneticists' predilection for the fruit fly!), plants (spinach and more recently Arabidopsis), low value abattoir (or in the US, slaughterhouse) tissues including bovine and porcine organs such as the liver, the brain etc. and of course the classic model rodents: rats, guinea pigs, mice etc. The first thing you will realise is that all of these tissues and species present quite different challenges with respect to extraction. The methods and instruments used routinely for cell extraction include the following:
  • mortar and pestle
  • glass homogenisers and mincers (later motorised)
  • freeze-thawing (very low temperatures came in relatively recently)
  • mechanical shearing
  • enzymatic disruption
  • chemical (detergent) disruption
  • sonication
This is not an exhaustive list and some protocols combine two different methods. It is of course also possible to use brute force, such as boiling or adding strong acids to release cell contents, but these usually (but not always) produce biochemically inactive extracts. Each method should be researched and initially followed "to the letter", but I can't stress enough that assays must be incorporated in order to monitor the yield of (for example) total soluble protein and active material, such as an enzyme. It is also important to measure the weight of all samples before addition of buffers and to carefully note the volumes of buffers used at every stage of the extraction. Water typically represents around 70% of the cell mass, and the remainder is made up of mainly proteins, nucleic acids, carbohydrates and lipids as shown in the "voronoi tree" taken from the excellent cell biology by the numbers site. These proportions represent a generalised case, and there will be specialised cells and tissues that show quite different distributions of macromolecules.

Returning to my "garbage in: garbage out" caution, if you do not take care to maximise the yield of extracted protein and to ensure it is "stable", then the investment of effort at the first stage in the purification experiment will be wasted. A good experimental scientist maximises the return on his/her efforts: and after all you may be sacrificing a mouse or a rat unnecessarily! With experience, it becomes easy to make a reasonable assessment of the success or failure of an extraction protocol. Following centrifugation of a small sample, maybe around 0.5ml (as shown above), the deep colour of the supernatant will be revealed: in this case green, for GFP, but usually somewhere between yellow and brown. (The cell will contain a mixture of chromophores bound to certain proteins, with colours such as yellow (flavins), red (haems) and brown (iron-sulfur), as well as free pigments). When you are satisfied that you have extracted close to 100% of the available protein (and remember soluble proteins represent around 70% of the total protein: the remainder are membrane associated), you can proceed to the next step. It is important here to retain samples for activity measurements and for running SDS PAGE to compare the purity of your particular protein at each step of the purification. I strongly recommend following your purification experiment using a table similar to the one shown below. Not only does this tell you which steps are important and effective, it also enables you to compare different batches. As you might expect this is of critical importance if you are purifying a protein that will become formulated and given to a patient!



Step | Fraction volume (ml) | Protein (mg) | Activity (units) | Specific activity (U/mg)
Sonication | 58 | 1800 | 28000 | 15.5
30% ammonium sulphate cut (retain supernatant) | 67 | 1200 | 25000 | 21
Dialysis | 74 | 1150 | 26000 | 23
Ion exchange (fractions eluting between 300-400mM NaCl) | 26 | 58 | 23500 | 405
Dialysis | 28 | 55 | 23000 | 418
Affinity chromatography (fractions eluting in 400mM imidazole) | 4 | 4 | 20000 | 5 000
80% ammonium sulphate precipitation | - | - | - | -
Storage buffer and dialysis (50mM Tris-Cl, pH 7.8, 20%v/v glycerol) | 0.5 | 3.5 | 19500 | 5 570

Let's take a look at the Table above in a little more detail. First, the amount of protein is reduced from 1800 mg at the start to 3.5 mg at the end (a factor of just over 500-fold). At the same time the number of units of activity only drops from 28 000 to 19 500 (about 30%, or just under a third, of the activity has been lost). This immediately suggests a good level of enrichment has been achieved, and importantly activity has not been too heavily sacrificed: this is always a balancing act in protein purification. [From a commercial perspective, a 30% loss may be catastrophic: it really depends on the cost of production and the profit margin. Similarly in research, providing this level of loss does not impact too negatively on the downstream experiments, such a loss may sometimes be acceptable, but it could also be an area to focus on when improving the protocol; buffering, temperature and speed of the process would be initial areas to consider].
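The arithmetic behind the Table is easy to automate, and doing so makes batch-to-batch comparison less error-prone. The sketch below recomputes specific activity from the protein and activity columns of the Table, and adds two derived columns that are not in the Table itself but are commonly reported alongside it: fold purification (specific activity relative to the first step) and yield (activity remaining relative to the first step). The class and function names are my own, chosen only for illustration, and the 80% ammonium sulphate row is left out because no measurements were recorded for it.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One row of a purification table (names here are illustrative)."""
    name: str
    protein_mg: float
    activity_units: float

    @property
    def specific_activity(self) -> float:
        # Units of activity per mg of protein (U/mg).
        return self.activity_units / self.protein_mg


# Values copied from the Table; the row without measurements is omitted.
steps = [
    Step("Sonication", 1800, 28000),
    Step("30% ammonium sulphate cut", 1200, 25000),
    Step("Dialysis", 1150, 26000),
    Step("Ion exchange", 58, 23500),
    Step("Dialysis", 55, 23000),
    Step("Affinity chromatography", 4, 20000),
    Step("Storage buffer and dialysis", 3.5, 19500),
]


def summarise(steps):
    """Return specific activity, fold purification and yield for each step."""
    first = steps[0]
    rows = []
    for s in steps:
        rows.append({
            "step": s.name,
            "specific_activity": s.specific_activity,
            # Enrichment relative to the crude extract.
            "fold_purification": s.specific_activity / first.specific_activity,
            # Percentage of the starting activity that survives to this step.
            "yield_percent": 100 * s.activity_units / first.activity_units,
        })
    return rows


summary = summarise(steps)
for row in summary:
    print(f"{row['step']:<30} {row['specific_activity']:>8.1f} U/mg "
          f"{row['fold_purification']:>7.1f}-fold "
          f"{row['yield_percent']:>6.1f}% yield")
```

For the final row this gives a yield of roughly 70% and a fold purification of roughly 358. Note that the latter is smaller than the 500-fold reduction in total protein, precisely because some activity was lost along the way.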

The most important thing to take away from this Table is the importance of the ratio of units of activity to the amount (in g, moles etc) of protein. This is referred to as specific activity: in my view it is a key, generic concept that will help you evaluate the significance of decision making in many walks of life. For example the wealth of a country per capita, the number of calories consumed per day, the number of miles a car travels per gallon etc. These are all versions of specific activity and they all provide incisive information to help inform decisions. A couple of minor points. The measured quantity of units sometimes increases as an inhibitor is removed during purification, and you should look carefully at whether a purification step makes a significant difference to the specific activity. If it doesn't, it is a waste of time... unless it removes an important contaminant. I once witnessed a colleague add an apparently "pure" DNA binding protein to a pure DNA sample for a structural experiment, after lots of effort to obtain both the DNA and the protein. Unfortunately, the "pure" protein was contaminated by a very small amount (maybe 0.001%) of a nuclease. After around 5 hours, there was no DNA left in the sample! I have never forgotten this! Also, the fractions eluting from a column should be kept separate until the specific activity is determined and ideally an SDS PAGE analysis carried out. Pooling of samples then becomes a judgement depending on the use of the purified protein. If in doubt, store the fractions separately.

Storage of samples is another neglected area of protein purification: the application of the protein or downstream analysis should dictate the conditions under which the protein samples are stored. There are several options. Let's assume the final sample has been dialysed (or desalted using a column) and concentrated to a working/stock concentration. I should stress of course that the optimum storage conditions must be established empirically for any new protein preparation, ensuring activity is maintained close to 100%.

1. Store the solution for immediate use in the fridge (or occasionally at room temperature, if the protein is cold-labile: yeast isocitrate dehydrogenase springs to mind). The risks here are losses owing to proteolysis by a low abundance contaminant, or as a result of microbial contamination. The addition of protease inhibitors or antimicrobial agents such as sodium azide is a well documented precaution. I should point out that PMSF (a commonly used, irreversible inhibitor of serine proteases) requires careful preparation during use, owing to its solubility characteristics. Look here before you use PMSF.

2. Store the solution in a freezer. Some proteins seem to be tolerant to repeated freeze thawing and can be frozen directly after the last step in a method (usually some form of buffer exchange). However, it is more common for the sample to deteriorate in the absence of some form of anti-freeze compound. This is usually 20-50% v/v glycerol, and the solution may also include an additional stabiliser, such as a commercial preparation of Bovine Serum Albumin (BSA), typically at a final concentration of 1-10mg/ml.

3. Precipitate the protein with a saturated solution of ammonium sulphate (AS). Prior to the widespread use of laboratory freezers, many biochemists would store their pure preparations (and intermediate stages of the purification) by precipitation in AS. It is simple and only occasionally is the protein inactivated. Powdered AS can also be added directly to the sample, especially if you are working with tens of mls (or more); in this case, use high quality, finely ground AS, and add it slowly with constant stirring, to avoid localised denaturation of the sample.

4. Currently, one of the most popular methods for storing proteins is lyophilisation, in which a freeze drier is used to maintain the sample in the frozen state, while the aqueous phase is slowly removed under vacuum by sublimation. As with freezing, preservatives may be needed. These typically include low molecular weight sugars and stabilising proteins such as BSA. However, it is critical to EXCLUDE compounds like GLYCEROL, which do not support lyophilisation.
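Options 2 and 3 above both come down to the same piece of mixing arithmetic: how much of a concentrated stock (glycerol, or saturated AS solution) must be added to a sample to reach a target final concentration. The sketch below is a minimal illustration, assuming that volumes are simply additive, which is only approximately true for concentrated salt and glycerol solutions. The function name and its parameters are hypothetical, and concentrations are expressed as fractions (0.2 for 20% v/v glycerol, 1.0 for a saturated AS solution).

```python
def volume_of_stock_to_add(sample_ml, start, target, stock):
    """Volume of stock (ml) needed to bring a sample from `start` to `target`.

    All three concentrations are fractions on the same scale, for example
    v/v glycerol or fractional AS saturation. Assumes additive volumes:
        (sample_ml * start + x * stock) / (sample_ml + x) = target
    which rearranges to
        x = sample_ml * (target - start) / (stock - target)
    """
    if not (start <= target < stock):
        raise ValueError("need start <= target < stock")
    return sample_ml * (target - start) / (stock - target)


# 10 ml of salt-free sample taken to 50% saturation with saturated AS solution.
as_ml = volume_of_stock_to_add(10.0, start=0.0, target=0.5, stock=1.0)

# 1 ml of sample taken to 20% v/v glycerol using neat (100%) glycerol.
glycerol_ml = volume_of_stock_to_add(1.0, start=0.0, target=0.2, stock=1.0)

print(f"Add {as_ml:.2f} ml saturated AS; add {glycerol_ml:.2f} ml glycerol")
```

The rearranged formula also shows why high saturations are awkward to reach with a saturated solution: the volume required grows rapidly as the target approaches the stock concentration, which is one reason powdered AS is added directly when working at larger scale.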

Critical comments


My title refers not only to separation, but also to "divorce". So let me finish with some comments on the pros and cons of separation. First of all, a reductionist view, nicely captured by the late, great Biochemist Efraim Racker ("Don't waste clean thinking on dirty enzymes"), emphasises that in order to understand how a molecule "works", it must be free from "contaminants". I think this is a completely logical position. However, how do we know whether a co-purifying polypeptide (or co-factor) is required for the complete function of the target protein? The problem is exacerbated when the target is a multi-component assembly. The image above is of an SDS PAGE gel of the yeast Arp2/3 protein complex purified by Dr. Qaiser Sheikh in my lab a few years ago. If you didn't know anything about this particular protein, you might (quite rightly) challenge me over my use of the word "purified". Lanes F1-8 contain a series of 6 pretty clear polypeptides (bands). Lanes F2 and F3 seem to also contain a few higher molecular weight species (indicated by the arrows). Let's look at the data with a critical eye.

Let's first assume the protein sample is "pure" (or homogeneous). Are all of the polypeptide species present in equimolar proportions? A strong indication of a contaminant is a lower (or higher) intensity in respect of staining by Coomassie Blue. We first suspected the arrowed bands were contaminants. Secondly, as we descend the gel image, the bands in the middle (45-65kDa) appear darker than those at the foot of the gel (15-20kDa). Is this a consequence of the reduced stoichiometry of dye uptake by the polypeptides (see the comments above about specific activity), or are they less than equimolar? This is a difficult one, since the intensity of the dye stain is usually proportional to the length of the polypeptide chain, but there are some proteins that show anomalous interactions with Coomassie Blue. You can read a detailed analysis of Coomassie Blue staining of polypeptides here. This would give me some cause for caution, but on balance I would initially assume that the intensity of the dye drops with polypeptide chain length: therefore the 6 bands are likely to be approximately equimolar. 
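The reasoning in the paragraph above can be made concrete with a small calculation. If we assume, as a first approximation, that Coomassie Blue staining intensity is proportional to the mass of polypeptide in a band, then dividing each band's intensity by its molecular weight gives a number proportional to moles, and equimolar subunits should give roughly equal values. The sketch below uses entirely hypothetical band intensities and molecular weights (they are not measurements from the gel shown), and the function name is my own.

```python
def relative_molar_ratios(bands):
    """Estimate relative molar amounts from stained band intensities.

    `bands` maps a band label to (intensity, molecular_weight_kDa).
    Assumes staining intensity is proportional to polypeptide mass, so
    intensity / MW is proportional to moles. Results are normalised to
    the most abundant band, which therefore scores 1.0.
    """
    per_mole = {name: intensity / mw for name, (intensity, mw) in bands.items()}
    reference = max(per_mole.values())
    return {name: value / reference for name, value in per_mole.items()}


# Hypothetical densitometry readings (arbitrary units) and sizes in kDa.
bands = {
    "band A": (500.0, 50.0),
    "band B": (200.0, 20.0),
    "band C": (100.0, 40.0),
}

ratios = relative_molar_ratios(bands)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

In this made-up example bands A and B come out equimolar even though A stains two and a half times darker, while band C is present at about a quarter of the molar amount and would be a candidate contaminant or sub-stoichiometric partner. Proteins that bind the dye anomalously will of course break the underlying assumption.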

How do I know that this isn't a simple collection of polypeptides that have co-purified? Why do I think they form a "complex" (by which I mean a non-covalently associated, stable, oligomeric assembly of polypeptides, divisible typically under denaturing conditions)? Here I need to consider the method of purification. A similar result following purification by fractional saturation with AS or by ion exchange chromatography would not necessarily imply that the polypeptides are a complex. However, these samples have just eluted from the final step in a TAP-tagging protocol, in which two affinity chromatography steps have been used to purify the Arp2/3. One of the polypeptides is "tagged" with two short protein sequences which interact sequentially with two affinity columns (in this case, the second column is actually a capillary, which you can read about here if you are interested). This implies that there is a physical interaction between the tagged polypeptide and the co-purifying subunits. Of course the functional significance of such interactions is not implied simply by co-purification. This requires a completely different set of experiments. However, the conclusion that the 6 prominent bands exist in an equimolar complex is completely consistent with the observed data.

But, I hear you say, what about those two "minor" high molecular weight bands? These "bands" were removed from the gel and analysed by Mass Spectrometry by Professor Mark Dickman. The analysis confirmed that they were both a single polypeptide named Arc40: a known "collaborator" of Arp2/3 (from independent work). So not only had the bands behaved anomalously with respect to staining intensity; they were one species, not two! The co-purification of Arc40 clearly requires further investigation in order to draw any robust conclusions about its stoichiometry in the complex, but it also reveals the limitations of Coomassie Blue and SDS PAGE in analysing the composition of polypeptide mixtures. In the vast majority of cases, analysis is straightforward, but ALWAYS remain sceptical, especially of your own data! And reach out for independent means of corroboration, whenever you can.

What about Systems Biology?


I have made the occasional reference to the fact that the logic of reductionism is sound, and it has certainly been the bedrock of contemporary Molecular Biology. However, the development of high throughput "omics" technologies has enabled Molecular Biologists to run a lot faster, and in doing so the inter-connected nature of the cell has emerged as something that at the moment appears to be greater than the sum of its parts. Of course, this can never be the case: it is just that we do not yet have a good enough understanding of the role(s) of all the parts. This is the view of the "Systems Biologists", and I agree that it is inappropriate to over-simplify the "code" that gives rise to the phenotypic properties of cells, tissues and organisms under the banner of the "one-gene-one-enzyme" mantra. I wrote a short post some time ago entitled "Moonlighting Molecules" to draw attention to the limitations in our ability to confidently ascribe function(s) to sequences (or primary structures). The "Holy Grail" of Genomics is to provide a comprehensive description of the complete set of genes in an organism and their associated functions. Synthetic Biology is predicated on our ability to achieve this: understanding the fundamental basis of experimental biology, and its limitations, will help us reach this goal.