Mol-OT-M


I shall post the most recent entry (at the top of the page) and a defining post on this page for my Molecule of the Month series (Mol-OT-M), written for students at the Life Sciences UTC in Liverpool and in the Department of Molecular Biology and Biotechnology at the University of Sheffield. The archive can be found here (Sheffield) or here (UTC), and both will remain available indefinitely.

Crambin September 2017


This month I have picked a protein I first heard about from a crystallographer visiting Sheffield around 1981. What interested me was that the structure had been refined to a very high resolution (less than one angstrom, in the early 1980s!). What also interested me was that this was a protein with no known function. So what was the point of all that effort? The molecule itself is pretty unremarkable (see the image left). It contains two alpha helices oriented into a Y-shape, linked by a short, constrained loop. The N- and C-termini seem to have considerable freedom, with a small amount of beta sheet, but, perhaps surprisingly, the protein crystallises very easily.


The molecular envelope (shown right) reveals crambin to be a globular molecule with a well-defined shape. Without any knowledge of its function, we can only note that the surface of the long helix is presented for interaction, and that the N- and C-termini could re-fold around a small ligand or another macromolecule. But there is no evidence of a metal ion or of any significant space for a small ligand or substrate. Moreover, it contains fewer than 50 amino acids (46), which makes it an ideal candidate for NMR spectroscopy.

The protein was initially isolated from Abyssinian cabbage (or kale) and is now known to belong to the family of toxins called thionins. [The source of proteins used by biochemists would make a nice Blog post for the future!] The key to the stability of the terminal segments is (as the name thionin suggests) the disulphide bond. The sequence of crambin is shown below; its six Cys residues sit at positions 3, 4, 16, 26, 32 and 40. If you look at the representation shown left, you can see the yellow sulphurs and the small network of disulphide bonds that contribute to the stability of the structure. This is a feature of many extracellular proteins, including immunoglobulins. The analysis of structures at such high resolution provides a molecular framework for defining the precise geometry of such bonding phenomena, and provides a nice experimental opportunity to address the role of disulphides in the stabilisation and folding pathways of proteins in general.

TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN

Try mapping the bonds from the structure onto the sequence. You will immediately appreciate that primary structures must be considered in three dimensions in order to fully appreciate the significance of sequence conservation!
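If you want to check your mapping, here is a small sketch that finds the Cys residues in the sequence and reports how far apart in the chain each bonded pair lies. The pairings used (3-40, 4-32, 16-26) are an assumption taken from the deposited crambin crystal structure (PDB entry 1CRN), so do check them against the model you are viewing. The function names are just illustrative.

```python
# Crambin sequence as given above (46 residues).
CRAMBIN = "TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN"

# Assumed disulphide pairings (1-based residue numbers), as in PDB entry 1CRN.
DISULPHIDES = [(3, 40), (4, 32), (16, 26)]


def cysteine_positions(seq):
    """Return the 1-based positions of every Cys (C) residue."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "C"]


def sequence_separation(pairs, seq):
    """Check both partners of each pair are Cys; return the gap in residues."""
    gaps = {}
    for a, b in pairs:
        if seq[a - 1] != "C" or seq[b - 1] != "C":
            raise ValueError(f"pair {a}-{b} does not join two cysteines")
        gaps[(a, b)] = b - a
    return gaps


print(len(CRAMBIN))                    # 46
print(cysteine_positions(CRAMBIN))     # [3, 4, 16, 26, 32, 40]
print(sequence_separation(DISULPHIDES, CRAMBIN))
# {(3, 40): 37, (4, 32): 28, (16, 26): 10}
```

A bond between residues 37 positions apart in the sequence is exactly the kind of long-range contact that only becomes obvious in three dimensions.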


The possible application of this plant product in cancer treatment is being investigated, but remains at an early stage. One other point I would like to draw your attention to is a comparison between methods of structure determination. X-ray crystallography "prefers" proteins of several hundred or more amino acids in each polypeptide, whereas Nuclear Magnetic Resonance (NMR) spectroscopy "prefers" proteins with molecular weights below 20 000 (fewer than about 200 amino acids). These rules aren't hard and fast, but they do significantly improve the probability of obtaining a high resolution data set (required to fix the positions of side-chain atoms). The structural representation on the right was obtained by NMR. NMR structure determination generates an "ensemble" of structures that are all consistent with the spectral data (see here for an introduction to protein NMR). The first thing you notice is that some parts of the protein are better defined than others. In X-ray crystallography, any significant "flexibility" in a protein structure usually prevents the assignment of electron density in that region, and this may mean that this section of the protein is not included in the deposited structural file (this is usually pointed out in the publication). In the case of crambin, the NMR structure suggests that the N- and C-termini are pretty rigid. This is a consequence of the disulphide bonding, which explains why the protein is so compact and probably why it is such an amenable molecule for obtaining high resolution atomic data.
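The size rule of thumb can be checked with a little arithmetic. This sketch sums commonly quoted average residue masses for the crambin sequence. It adds one water for the free termini and subtracts two hydrogens per disulphide. It also assumes a mean of about 110 Da per residue to convert the 20 000 NMR guideline into a residue count. Treat the results as estimates, not as reference values.

```python
# Commonly quoted average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02    # one water for the free N- and C-termini
H_ATOM = 1.008   # each disulphide bond removes two hydrogen atoms

CRAMBIN = "TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN"


def average_mass(seq, n_disulphides=0):
    """Estimate a polypeptide's average molecular mass in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER - 2 * H_ATOM * n_disulphides


def residues_for_mass(mass_da, mean_residue=110.0):
    """Rough residue count for a given mass, assuming ~110 Da per residue."""
    return round(mass_da / mean_residue)


print(f"crambin: about {average_mass(CRAMBIN, n_disulphides=3):.0f} Da")
print("20 000 Da is roughly", residues_for_mass(20_000), "residues")
```

The estimate for crambin, roughly 4.7 kDa, sits far below the NMR guideline of about 180 residues.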

So finally, why has so much effort been invested by structural biologists in a molecule of such poorly defined function? This is an important issue in Science in general: what should we (as taxpayers, as opposed to, say, drug companies) spend our money on? Molecules like crambin can help establish the fundamental principles of protein structure. Some medically important molecules may be difficult to purify, may be unstable or may yield poor diffraction (in the case of X-ray crystallography), or may be difficult to solubilise and show poor spectral resolution (for NMR). The insight we gain from "well-behaved" proteins can help us fill in the gaps for molecules that we can easily recognise as being of societal value. Moreover, workhorses like crambin can help us push the envelope of techniques like X-ray crystallography and NMR, which may then make it more likely that we can interpret the data from proteins that are less well-behaved! One final point is that structure determination alone can rarely determine the function of a molecule. It may be that we need to solve the structures of the entire human proteome before we are ready to comprehensively link structure and function in Biology!

RNA Polymerase August 2017

This month is a little later than I had hoped, mainly due to my choice of molecule. I decided I wanted to get more familiar with the (expanding number of) regulatory proteins involved in the control of gene transcription in bacteria, partly out of a research interest, and partly because it allows me to combine my interest in language (or more specifically alphabets) with Science. The down-side is that I will have to replace all of my a's with α's and my b's with β's etc., which is always a little clunky on my free Blogging software! The molecule I am focusing on is RNA Polymerase (Pol), which I have covered earlier. However, this time I am going to take a look at the transcription factors, or "known associates", of this multi-subunit protein complex that bridges the gap between information and function. Most prokaryotic genomes contain a set of between 2 000 and 5 000 protein coding genes, together with a few hundred genes that encode functional RNAs. This is all information; but in order to "translate" from nucleic acid speak (Nu-speak: sorry George!) to the language of amino acids and proteins (Pep-speak?), the ribosome is required. However, a limited number of RNA species combine an information mode with function, such as the hammerhead ribozyme, which can catalyse specific RNA cleavage in the absence of any proteins (a good future molecular candidate perhaps?). The nucleotide sequence of a ribozyme is no different from that in the genome (apart from an additional oxygen atom per sugar, and uracil in place of thymine), and it also determines its three-dimensional fold, and therefore its biological function.

You may be interested to know the source of the images used in this post. I have chosen, where possible, to include the beautiful models created and exhibited at the Pingry Biomolecular Modelling Project web site, which is just one of many impressive initiatives at Pingry School: more information on this ground-breaking collaboration between staff at the Milwaukee School of Engineering (MSOE University) and the students and teachers at Pingry School can be found here. On the right is an image of the components of RNA Pol in the early stages of transcriptional initiation. I hope you will agree with me that these models capture both structure and function in a beautiful and informative way.

The Basics RNA Pols, in their simplest forms (let's leave bacteriophage enzymes to one side for now), comprise two α subunits, a β and a variant of β called β-prime (written β'). (There are some enzymes in which the β-type subunits are fused, but these are only occasional exceptions.) This hetero-tetrameric "apo-enzyme" then associates with a number of "regulators" to form the "holo-enzyme", the most important being the σ subunit, which determines the DNA sequence specificity and hence which promoter is chosen for transcription. The image below the reaction scheme shows the promoter sequences recognised by the RNA Pol holoenzyme, with the -10 and -35 elements (recognised by the σ factor) highlighted. As we shall see below, the σ subunit comes in a number of different "flavours". The reaction catalysed by RNA Pols is shown below: it is important to remember that while I am discussing sequence-specific DNA binding, RNA Pols are catalysts; the nucleoside triphosphates (NTPs) are the substrates, the DNA acts as the template, and RNA is the product.
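For reference, here is the standard textbook form of the overall reaction. It assumes Mg²⁺ as the catalytic metal ion. Each step adds one nucleoside monophosphate (NMP) to the 3' end of the growing RNA chain and releases pyrophosphate (PPi), with the base chosen by pairing to the DNA template.

```latex
(\mathrm{NMP})_n + \mathrm{NTP}
  \xrightarrow[\ \mathrm{Mg^{2+}}\ ]{\ \text{DNA template}\ }
  (\mathrm{NMP})_{n+1} + \mathrm{PP_i}
```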


The prefixes apo and holo are derived from the Greek, meaning "away from" and "whole" (complete) respectively. Biochemists use them frequently to contrast a protein lacking (apo) a key component, such as a co-factor, with the fully functional molecule (holo): apo-haemoglobin lacks the haem, for example. Which brings me to the inevitable glossary: an essential set of definitions of terms, symbols and concepts needed to understand gene transcription. For those of you who are unfamiliar with the idiosyncrasies of the Greek alphabet, I have also included my suggested (phonetic) pronunciations: remember, when discussing Science, it really helps if you feel confident about the pronunciation of some of the rather ludicrous terms!

The Greek alphabet and my advice on pronunciation! [A "hard" consonant, e.g. the first and last G in gang, is written gg, while the soft G in German is written as a single j. Where there is no ambiguity, e.g. the letter D, it is shown as a single d. If the vowel is drawn out, like the two Es in meet, it is again doubled.]

α (alff-a)
β (bee-ta (UK), bayta (USA))
γ (ggamm-a)
δ (delt-a)
ε (ep-ssee-lon)
ζ (zee-ta)
η (ee-ta (UK), ay-ta (USA))
θ (thee-ta (UK), sometimes tha-yta (USA))
ι (eye-oh-ta)
κ (kapp-a)
λ (lamm-da)
μ (mew)
ν (new)
ξ (k-ss-eye)
ο (oh-mee-kron)
π (p-eye, or for English readers pie!)
ρ (row)
σ (ssigg-ma)
τ (torr)
υ (up-ssee-lon)
φ (ff-eye)
χ (kai, or k-eye [not kee])
ψ (p-ss-eye, as in psychology)
ω (oh-mee-ga (UK) or oh-may-ga (USA))

A short glossary

A

Apoenzyme: an incomplete enzyme, which usually requires a coenzyme (such as FAD), an additional protein (such as σ) or an RNA molecule for full function.
H
Holoenzyme: a complete enzyme, usually incorporating an essential coenzyme (such as FAD), an additional protein (such as σ) or an RNA molecule, and expressing full biological function.
O

Operator: a DNA sequence to which a repressor (or an activator) binds. It usually overlaps or flanks a promoter, so the regulatory region extends beyond the promoter in either direction (or possibly both).

P
Promoter: a stretch of double-stranded DNA sequence to which an RNA Pol binds and which, through a series of orchestrated molecular interactions, marks the initiation point for the transcription of a particular gene or group of genes. In bacteria, the key DNA sequence comes in two sections: the -10 "box" (about 6 base pairs; consensus TATAAT in E. coli) and the -35 "box" (consensus TTGACA). Both are recognised by a σ factor (which is itself associated with the apo-enzyme form of RNA Pol), while the α subunits can make additional contacts further upstream. The numbers give the position of each box relative to the transcription start site (+1), the nucleotide that forms the 5' end of the transcript; the negative sign indicates that the boxes lie upstream of it. The diagram below should help explain these concepts.
R
Ribosomes: a multi-component molecular machine comprising rRNA and polypeptides in the form of two "subunits", referred to by their sedimentation properties in an analytical ultracentrifuge. The 30S (small) and 50S (large) subunits co-assemble during the initiation of protein synthesis in the presence of initiation factors, aminoacylated tRNAs, mRNA and an energy supply. You can read more here.
T
Transcription: the catalytic, template-mediated synthesis of RNA from double-stranded DNA. The products are a range of RNAs, including messenger and transfer RNAs, and the enzyme may be a single species, such as bacterial RNA Polymerase, or a dedicated one, such as RNA Pol II in eukaryotes, which catalyses mRNA biosynthesis.
Translation: the biosynthesis of polypeptide chains from mRNA templates by the ribosome. Each ribosome can accommodate virtually any mRNA, and several ribosomes can translate the same mRNA at once, forming an aggregate called a polysome.
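The promoter numbering described in the Promoter entry can be sketched in Python. This is a minimal illustration, assuming the textbook E. coli σ70 consensus hexamers (TTGACA at -35 and TATAAT at -10) separated by a 15 to 19 bp spacer. The function names and the short test sequence are hypothetical, made up for this example, and real promoters rarely match the consensus perfectly. Note that promoter numbering has no position 0: the base immediately upstream of +1 is -1.

```python
# E. coli sigma70 consensus hexamers (textbook values).
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def promoter_coordinate(index, start_index):
    """Convert a 0-based string index into promoter numbering.

    start_index is the string index of the +1 nucleotide (the 5' end of the
    transcript). Upstream bases are negative and there is no position 0.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def best_boxes(seq, spacers=range(15, 20)):
    """Return (total_mismatches, i35, i10): 0-based start indices of the
    best-matching -35/-10 hexamer pair for any allowed spacer length."""
    best = None
    for i35 in range(len(seq) - 5):
        for spacer in spacers:
            i10 = i35 + 6 + spacer
            if i10 + 6 > len(seq):
                continue  # the -10 hexamer would run off the end
            score = (mismatches(seq[i35:i35 + 6], MINUS35)
                     + mismatches(seq[i10:i10 + 6], MINUS10))
            if best is None or score < best[0]:
                best = (score, i35, i10)
    return best


# Hypothetical test sequence: consensus boxes, a 17 bp spacer, +1 at index 40.
seq = "CCCCC" + "TTGACA" + "GCGCGCGCGCGCGCGCG" + "TATAAT" + "GCGCGC" + "A" + "GCGC"
start = 40
score, i35, i10 = best_boxes(seq)
print(score, promoter_coordinate(i35, start), promoter_coordinate(i10, start))
# prints: 0 -35 -12
```

In this example the -10 hexamer occupies positions -12 to -7, so the name of the box refers to its approximate location rather than an exact coordinate.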

Sigma factors One of the many returns on our collective investment in genome sequencing has been the insight gained into which genes are essential for cell growth and reproduction. Not surprisingly, the genes encoding the polypeptides that make up RNA Pols are essential for cell viability. However, while all prokaryotes possess the genes encoding α (rpoA), β and β' (rpoB and rpoC) and the major σ factor, σ70 (rpoD or sigA), there are some other regulatory factors that seem to confer advantages in regulating gene expression, and are likely to add to the physiological versatility of the organisms in which they are expressed. In the well-studied prokaryote E. coli, in addition to σ70, we find the following σ factors:

σ19 (fecI) - regulates the fec genes for iron (ferric citrate) transport
σ24 (rpoE) - the extreme heat stress factor
σ28 (rpoF) - the flagellar factor
σ32 (rpoH) - the heat shock factor, which is turned on when the bacteria are exposed to heat. Some of the enzymes expressed upon activation of σ32 are chaperones, proteases and DNA-repair enzymes.
σ38 (rpoS) - the starvation/stationary phase sigma factor
σ54 (rpoN) - the nitrogen-limitation factor.
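The idea that one core enzyme is retargeted by swapping σ factors can be captured as a simple lookup table built from the list above. This is a deliberately simplified sketch: the cue labels and the function name are hypothetical, and in a real cell several σ factors compete for the core enzyme at the same time rather than switching cleanly one at a time.

```python
from typing import NamedTuple


class Sigma(NamedTuple):
    name: str
    gene: str
    role: str


# The house-keeping factor used when no special cue applies.
HOUSEKEEPING = Sigma("σ70", "rpoD", "house-keeping genes")

# Alternative E. coli sigma factors from the list above, keyed by a
# hypothetical label for the environmental cue.
ALTERNATIVES = {
    "iron": Sigma("σ19", "fecI", "fec genes for iron transport"),
    "extreme heat": Sigma("σ24", "rpoE", "extreme heat stress"),
    "flagella": Sigma("σ28", "rpoF", "flagellar genes"),
    "heat shock": Sigma("σ32", "rpoH", "chaperones, proteases, DNA repair"),
    "starvation": Sigma("σ38", "rpoS", "starvation/stationary phase"),
    "nitrogen limitation": Sigma("σ54", "rpoN", "nitrogen limitation"),
}


def sigma_for(cue=None):
    """Return the sigma factor that would target the core enzyme for a cue,
    falling back to the house-keeping factor when no cue matches."""
    return ALTERNATIVES.get(cue, HOUSEKEEPING)


for cue in (None, "heat shock", "nitrogen limitation"):
    s = sigma_for(cue)
    print(cue, "->", s.name, f"({s.gene}):", s.role)
```

The "core plus interchangeable targeting subunit" design is what makes the table so short: only the targeting component changes, never the catalytic machinery.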


Before (L) and After (R)

σ factors interact with the RNA Pol apoenzyme to generate the holoenzyme and, in doing so, provide the enzyme with the capacity to recognise the -10 and -35 elements of a promoter (see figure and scheme above). The "before and after" images (LHS) show the location of the (orange) σ factor in the complex, and how its elongated shape facilitates recognition of the -10 and -35 elements (the promoter is the blue and pale green duplex above the RNA Pol). The initiation of transcription of all constitutive genes only requires the RNA Pol holoenzyme, as in the "before" image. As soon as the transcriptional start site is exposed and a supply of NTPs is made available, the σ factor dissociates (the "after" image) and the elongation phase of transcription gets underway. The role of the σ factor is primarily to "target" the catalytic apparatus: by replacing the house-keeping σ factor with any of the above σ variants, selective sets of genes can be expressed in response to one or more environmental cues. Pretty straightforward, I think you'll agree. This principle of combining a core function, in this case RNA synthesis, with a variety of targeting polypeptides (in this case σ subunits) is a common strategy in Biology, with antibodies being a well-known example.

Anti-sigma factors The potency of σ factors has led to the evolution of antagonistic molecules, called anti-sigma factors. In some organisms, σ factors need to be attenuated [slowed] (or even abrogated [stopped]): this can be achieved by the expression of anti-sigmas. Again, the logic is pretty simple. A σ factor can be held in complex with an anti-sigma until an environmental cue is detected. Through an induced conformational switch, triggered for example by a pH transition or by the binding of a small molecule to the anti-sigma component, the two components (see the image of the T4 phage anti-sigma-σ complex, RHS) are able to dissociate, and the σ factor is free to promote targeted transcription.
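The sequestration logic can be made semi-quantitative. This sketch assumes simple 1:1 binding of σ to its anti-sigma with a single dissociation constant Kd, and uses hypothetical concentrations. It shows how a switch that weakens binding (raising Kd) releases free σ. Real anti-sigma systems can involve extra partners and regulated proteolysis, which are not modelled here.

```python
import math


def free_sigma(sigma_total, anti_total, kd):
    """Free sigma at equilibrium for sigma + anti-sigma <-> complex (1:1).

    Solves the exact quadratic for the complex concentration, so neither
    partner needs to be in excess. All arguments share the same units.
    """
    b = sigma_total + anti_total + kd
    complex_conc = (b - math.sqrt(b * b - 4 * sigma_total * anti_total)) / 2
    return sigma_total - complex_conc


# Hypothetical values: 1 µM sigma held by 2 µM anti-sigma.
tight = free_sigma(1.0, 2.0, kd=0.001)   # tight complex: sigma sequestered
loose = free_sigma(1.0, 2.0, kd=100.0)   # after a switch weakens binding
print(f"free sigma before switch: {tight:.4f} µM, after: {loose:.3f} µM")
```

With these made-up numbers the free σ rises from roughly 0.001 µM to about 0.98 µM, illustrating how a single conformational switch can turn a silent σ into an active one.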

Repressors These molecules have a special place in the history of Molecular Genetics. The work of Jacob and Monod (see an earlier post on RNA Pol) in the early 1960s laid the foundations for our understanding of gene regulation in prokaryotes and higher organisms. At the centre of their logic was the concept of the repressor, which was later defined in molecular terms as a protein molecule (although it can also be an RNA molecule) that interferes with transcription. The mode of action of repressors can be simply described as creating a road block in the path of a promoter-bound RNA Pol. Since this simple concept was proposed, however, genetic, structural and kinetic studies have shown that repressors can inhibit RNA Pol progress by a variety of mechanisms, which do not always involve simply blocking the path of the RNA Pol or competing for a specific sequence at or around the promoter. In fact, some repressors (including the lambda repressor shown left) are able to act as both repressors and activators of RNA Pol mediated transcription, and this forms the basis of the "plot" of the remarkable work from Mark Ptashne's laboratory, whose short book on this topic (A Genetic Switch) is a "must read" for all Molecular Biology students. Since most repressors do not form stable interactions with RNA Pols (although this is not meant to be a dogmatic statement), I will not discuss them further in this post.

Termination factor ρ, which is shown on the right, is responsible for terminating RNA Pol mediated transcription, but once again ρ acts like a classical repressor: it recognises a specific RNA termination sequence of around 70 nucleotides, signalling the end of the road for RNA Pol, and it does not form a stable complex with RNA Pol. Bacteria like E. coli invest significant energy in synthesising this hexameric, homo-polymeric protein, and it is essential for viability in many prokaryotes. In fact, transcription of about half of the genes in E. coli is terminated via ρ, while the remainder are said to be ρ-independent, or alternatively utilise the proteins τ or NusA.
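ρ utilisation (rut) sites on the RNA are generally described as C-rich and G-poor. As a rough illustration, this sketch slides a 70-nucleotide window (matching the length mentioned above) along an RNA sequence and flags C-rich, G-poor windows. The thresholds (`min_c`, `max_g`) and the test sequence are hypothetical choices for illustration. This is a crude screen, not a validated termination predictor.

```python
def c_g_profile(rna, window=70):
    """Slide a window along an RNA; return (start, C fraction, G fraction)."""
    rna = rna.upper()
    profile = []
    for start in range(len(rna) - window + 1):
        segment = rna[start:start + window]
        profile.append((start, segment.count("C") / window,
                        segment.count("G") / window))
    return profile


def rut_like_windows(rna, window=70, min_c=0.3, max_g=0.15):
    """Start indices of windows that are C-rich and G-poor.

    min_c and max_g are hypothetical thresholds chosen for illustration.
    """
    return [start for start, c, g in c_g_profile(rna, window)
            if c >= min_c and g <= max_g]


# Hypothetical RNA: a G-rich 120 nt stretch followed by a C-rich 70 nt stretch.
rna = "GGGAGG" * 20 + "CAUCCAUCAC" * 7
hits = rut_like_windows(rna)
print(len(rna), hits[:3], hits[-1])
```

A sliding window is used because rut sites are extended, loosely defined regions rather than a fixed motif, so a composition profile is a more natural first look than exact string matching.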

The ω and δ factors. These are both bona fide components of the RNA Pol holoenzyme. ω seems to be involved in chaperoning and stabilising the interactions of the β' subunit. Unlike ρ, it is dispensable, in that ω knockouts survive; but it does seem to improve the net efficiency of transcription: I expect growth rates in ω knockouts are lower than in wild-type strains. δ is also formally enshrined in the RNA Pol holoenzyme and, like ω, the gene encoding this factor is not essential, but its removal from a genome does give rise to some strange morphological changes in growing cells (abnormal elongation in particular). A complete understanding of the roles of these two factors in transcription remains to be achieved, but both primary structures are highly conserved amongst prokaryotes, and a number of groups are currently looking at the functions of these accessory factors during infections by pathogenic bacteria.

I want to close with a mention of a growing number of transcriptional regulators that bind to DNA directly, or act indirectly via σ factors, and thereby modulate RNA Pol transcription in response to a redox signal sensed by an iron-sulphur (Fe-S) cluster, buried in the heart of the protein or sometimes in a flexible subdomain. One such regulator is SoxR (containing a [2Fe-2S] cluster, shown in red-yellow on the LHS). I think you can see how SoxR might distort the DNA, and how this could modulate transcription initiation. Environmental signals such as reactive oxygen species and NO trigger gene expression events that ultimately lead to the elaboration of processes that defend the cell against this metabolic challenge. The Wbl proteins are a class of gene regulators originally identified in Streptomyces strains, but which are also found amongst the Mycobacteria (think TB). The main reason for including them, however, is that they have become a major research interest of one of my colleagues at Sheffield, Professor Jeff Green, and because I really like the story emerging from his lab (see a review here) that connects redox sensing and the control of gene expression, which may have wider implications for a number of prokaryotes and may possibly modulate the mode of action of some antibiotics.

In summary, RNA Pol is the extraordinary focal point for gene regulation in bacteria, and we are learning every day about the plethora of polypeptides and RNAs that influence its activity. I hope this has given you a flavour of the structure and function of this area of Molecular Biology. At some point, when I am brave enough, I'll look at the eukaryotic RNA Pols!

Luciferase July 2017

The summer brings out the best in most of us (based purely on the evidence of the greater number of smiling people on the platform when I catch the train!). So, I thought about choosing a molecule that reflected this mood. I could have looked amongst the proteins that are the targets of psychoactive drugs, or I could have gone for the sunlight-capturing molecules involved in photosynthesis. In January 2016, I discussed a number of photo-activated proteins after a thrilling seminar from the Biochemist Tomas Carrel. See here. This month I have chosen the enzyme luciferase, a key element in the generation of light in insects such as the glow worm and the firefly (Photinus pyralis). I hope you agree that these creatures (notwithstanding the general unpopularity of most insects) make most people smile!

The name luciferase has its origins in the Latin for "bringer of light" (think lucid or elucidate). You might be familiar with the Biblical archangel Lucifer, who defied God and went on to establish an alternative post-mortem retreat for those of a slightly unorthodox disposition. As Mark Twain (or J.M. Barrie?) famously commented: I'd choose Heaven for the climate and Hell for the company! I assume Lucifer lit up the general conversation in Hades? Alternatively, if you have read any Charles Dickens or Arthur Conan Doyle, you will know that the "nickname" for a match was a "lucifer". Let's now have a look at how luciferase generates light, and how the properties of the enzyme have been incorporated into a biological detection technology that is used in both discovery and diagnostic modes. The reaction catalysed by firefly luciferase is as follows.


luciferin + ATP → luciferyl adenylate + PPi

luciferyl adenylate + O2 → oxyluciferin + AMP + CO2 + light
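Because ATP is a co-substrate in the first step, light output can be used to measure ATP. When luciferin, oxygen and enzyme are in excess, light output is roughly proportional to ATP concentration over a working range, which is the basis of luciferase ATP assays. The sketch below fits a straight-line standard curve by least squares and reads an unknown off it. The standards and light readings are invented for illustration; the linear range of a real assay has to be established experimentally.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def atp_from_light(rlu, slope, intercept):
    """Invert the standard curve: light reading (RLU) -> ATP concentration."""
    return (rlu - intercept) / slope


# Hypothetical standards: ATP (nM) against relative light units (RLU).
atp_nm = [0, 10, 20, 50, 100]
rlu = [120, 1120, 2120, 5120, 10120]

slope, intercept = fit_line(atp_nm, rlu)
print(round(atp_from_light(3620, slope, intercept), 1))   # 35.0
```

The intercept absorbs any background signal in the blank, which is why it is fitted rather than forced through zero.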



Light is produced because the reaction forms oxyluciferin in an electronically excited state, and a photon of light is released as oxyluciferin returns to the ground state. (Here the ground state is the "quantum mechanical" state of a system having the lowest possible energy. The expression is also used in Biochemistry to define the lowest free energy state of the substrate(s) in an enzyme catalysed reaction, usually with respect to the transition state and the products of the reaction.) Firefly luciferase generates light from luciferin in a multistep process. First, D-luciferin is adenylated by ATP to form luciferyl adenylate and pyrophosphate. Following this "activation" by ATP, luciferyl adenylate is oxidized by molecular oxygen to form a dioxetanone ring. A decarboxylation reaction then yields the excited state of oxyluciferin, which can tautomerize between the keto and enol forms (at a given pH and temperature, many carbonyl compounds tend to interconvert between these two forms: you can read more here). Finally, light is emitted as oxyluciferin returns to the ground state. [I shall return to the important topic of the "excitation" of molecules and its importance in Biological systems in a separate post.]


The protein molecule (shown left, from a dinoflagellate) comprises two major structural units. The blue, mainly beta barrel sits beneath the alpha-helical arrangement, with the adenylate and the chromophore positioned at the junction of the two domains. On binding the reactants, the domains come together to exclude water, which increases the half-life of the "excited" state of the oxyluciferin. The details vary a little from species to species, and this leads to a variation in the wavelength of the emitted light. One mechanism proposes that the colour of the emitted light depends on whether the product is in the keto or enol form: red light would be emitted from the keto form of oxyluciferin, and green light from the enol form. This is not proven, but the logic relates to the well-established connection between resonance structures and the energetics of absorption of light in the visible and UV spectrum. There are some other ideas, and although a consensus hasn't yet been reached, all mechanisms will probably connect the local (molecular) environment with the stabilisation of the excited state (see below RHS).
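The link between emission colour and energy can be made concrete with E = hc/λ. The sketch below converts a wavelength into photon energy, both per photon (eV) and per mole of photons (kJ/mol). The two wavelengths are illustrative assumptions (roughly yellow-green and red), not measured values for any particular luciferase.

```python
# Exact SI values of the defining constants.
PLANCK = 6.62607015e-34      # J s
LIGHT_SPEED = 2.99792458e8   # m/s
AVOGADRO = 6.02214076e23     # 1/mol
J_PER_EV = 1.602176634e-19   # J per electronvolt


def photon_energy(wavelength_nm):
    """Return (eV per photon, kJ per mole of photons) for a wavelength in nm."""
    joules = PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9)
    return joules / J_PER_EV, joules * AVOGADRO / 1000


for label, nm in (("yellow-green", 560), ("red", 620)):
    ev, kj = photon_energy(nm)
    print(f"{label:12s} {nm} nm: {ev:.2f} eV, {kj:.0f} kJ/mol")
```

On these assumptions, the shift from about 560 nm to 620 nm corresponds to only about 20 kJ/mol (roughly 214 versus 193 kJ/mol). This gives a feel for how modest changes in the local environment of the excited state could alter the colour of the light.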

You may wish to compare the properties of luciferases with naturally fluorescent proteins such as the Green Fluorescent Protein (GFP for short). Can you think of the biological advantages for an organism emitting light? Maybe a useful exercise is to compare and contrast the applications of these enzymes in contemporary experimental molecular cell biology? Can you find glow worms and fireflies in the UK? Take a look at the survey.

