MicroProteins are small, single-domain proteins found in plants that regulate processes such as the shade response, abiotic stress responses and nutrient uptake. Crucially, they are evolutionarily related to their intracellular targets (called ancestors). They arise either through gene duplication or through alternative RNA processing events such as splicing, polyadenylation or the use of alternative transcription start sites. The ancestors contain a dimerisation domain that the microProteins have retained, while losing most or all of the rest of the protein during evolution. MicroProteins act by forming heterodimers with their ancestors, thereby disrupting the function of the ancestors' homodimers. The microProtein-ancestor heterodimer can alter the function of the normal homodimer in many ways: by forming an (in)active complex, by changing the cellular localisation of the heterodimer, or by recruiting additional proteins that activate or repress downstream pathways.
My work initially focussed on the purification and biochemical characterisation of a microProtein involved in shade avoidance. In particular, I was studying how the microProtein interacts with its target ancestor, and whether its binding affinity for the target can be modified to control the strength of the shade avoidance response. Midway through this work, however, the COVID-19 pandemic began, and I could not do laboratory experiments for a while. Instead, I started a new project to identify novel microProteins from bioinformatic data, including the rapidly growing number of sequenced plant genomes. I rewrote the existing Python pipeline in C++ and updated its processing steps so that it detects potential microProteins faster and more accurately.
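To make the search idea concrete, here is a toy Python sketch of how candidate microProtein-ancestor pairs could be found from the definition above: a short, single-domain protein paired with a longer protein that carries the same domain. It is not the C++ pipeline described here. It assumes precomputed domain annotations (real pipelines typically also rely on sequence similarity searches), and the length cut-offs, the `Protein` type and `find_candidate_pairs` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds (in amino acids), for illustration only.
MAX_MICROPROTEIN_LEN = 140
MIN_ANCESTOR_LEN = 250


@dataclass(frozen=True)
class Protein:
    pid: str                 # protein identifier
    length: int              # length in amino acids
    domains: frozenset       # annotated domain labels (assumed precomputed)


def find_candidate_pairs(proteins):
    """Return (microProtein, ancestor) ID pairs under simple size/domain rules."""
    # Candidate microProteins: short proteins with exactly one annotated domain.
    small = [p for p in proteins
             if p.length <= MAX_MICROPROTEIN_LEN and len(p.domains) == 1]
    # Candidate ancestors: clearly longer proteins.
    large = [p for p in proteins if p.length >= MIN_ANCESTOR_LEN]
    pairs = []
    for mp in small:
        (domain,) = mp.domains  # the single retained (e.g. dimerisation) domain
        for anc in large:
            # The ancestor must also contain the domain the microProtein retained.
            if domain in anc.domains:
                pairs.append((mp.pid, anc.pid))
    return pairs
```

For example, a 95-residue protein annotated only with a helix-loop-helix domain would be paired with a 480-residue protein carrying that domain plus others, but not with an unrelated kinase.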
The new pipeline detected known microProteins with very high accuracy and also predicted many small proteins that, based on their biological and biochemical annotations, could function as microProteins. A cross-species comparison of these candidates revealed many highly conserved small proteins linked to biological processes known to be under microProtein control. In addition, we examined the RNA coexpression of the predicted microProtein-ancestor pairs and found around 50 pairs that showed significant up- or downregulation, as seen for all known microProteins. We also found a predicted microProtein that was homologous to one in A. thaliana, but whose ancestor had acquired a mutation that made it constitutively active. This indicated that this conserved, as yet uncharacterised microProtein had acquired a new function: inactivating the constitutively active ancestor homodimers.
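One simple way to screen microProtein-ancestor pairs for coexpression is to correlate their expression profiles across the same set of RNA samples. The Python sketch below does this with a Pearson correlation and keeps pairs whose absolute correlation passes a cut-off, so both co-upregulated and anti-correlated pairs are retained. This is an illustrative assumption, not the statistical test used in this work: the `min_abs_r` threshold and the function names are hypothetical, and a real analysis would add significance testing with multiple-testing correction.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / (sx * sy)


def coexpressed_pairs(pairs, expression, min_abs_r=0.8):
    """Keep (microProtein, ancestor, r) for pairs that are strongly (anti-)correlated.

    `expression` maps a gene ID to its expression values over the same samples.
    `min_abs_r` is a hypothetical cut-off used only for illustration.
    """
    kept = []
    for mp, anc in pairs:
        r = pearson(expression[mp], expression[anc])
        if abs(r) >= min_abs_r:
            kept.append((mp, anc, r))
    return kept
```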
I also integrated RNA-seq data into the analysis to improve detection, to filter out matches not supported by transcriptomic data, and to detect microProteins arising from post-transcriptional events, which are difficult to detect from proteomic data. Building on this, a group of Bachelor's students doing their thesis projects in our lab identified a novel alternative transcript that could function as a microProtein.
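Filtering out candidates that lack transcriptomic support can be as simple as requiring a minimum expression level in a minimum number of samples. The Python sketch below illustrates that idea under stated assumptions: expression is given as TPM values per sample, and the `min_tpm` and `min_samples` thresholds and the function name are hypothetical, not the criteria used in this work.

```python
def expression_supported(candidates, tpm, min_tpm=1.0, min_samples=2):
    """Keep candidate IDs expressed at >= min_tpm in at least min_samples samples.

    `tpm` maps a candidate ID to its per-sample TPM values; candidates with no
    RNA-seq entry are treated as unsupported. Thresholds are illustrative only.
    """
    supported = []
    for cand in candidates:
        n_expressed = sum(1 for value in tpm.get(cand, []) if value >= min_tpm)
        if n_expressed >= min_samples:
            supported.append(cand)
    return supported
```

Requiring expression in several samples, rather than one, is a common way to avoid keeping candidates supported by a single noisy measurement.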
If you wish to know more about this work, you can read more here, here, here and here.