Genome-wide identification, characterization, and expression analysis of the small auxin-up RNA gene family during zygotic and somatic embryo maturation of the cacao tree (Theobroma cacao)

Small auxin-up RNA (SAUR) proteins form a large family that is thought to participate in various biological processes in higher plants. However, the SAUR family has not yet been explored in cacao (Theobroma cacao L.), one of the most important industrial trees. In this in silico study, we characterized the structure, phylogeny, and expression of the TcSAUR gene family in cacao. A total of 90 TcSAUR genes were identified and annotated in the cacao genome. Physicochemical analysis showed that the TcSAUR proteins share relatively similar characteristics. Phylogenetic analysis classified the TcSAUR proteins into seven distinct groups comprising 10 sub-groups. Our results suggest that tandem, segmental, and whole genome duplication events may have contributed to the expansion of the TcSAUR gene family in cacao. By re-analyzing publicly available transcriptome data, we found that a number of TcSAUR genes were specifically expressed during zygotic and somatic embryogenesis. Taken together, our study provides a basis for further functional characterization of candidate TcSAUR genes for the genetic engineering of cacao.


Introduction
Cacao (Theobroma cacao L.), a member of the family Malvaceae, is one of the most important industrial crops worldwide. Native to Central and South America [1,2], cacao is now cultivated in at least fifty countries in the humid tropics. Cacao beans are an excellent source of essential nutrients, minerals, and antioxidants and are used in the production of chocolate, confectionery, and cosmetics [3,4]. However, climate change, together with biotic and abiotic stresses, could threaten cocoa production [5,6].
The aim of the present study was to systematically identify, annotate, and characterize the SAUR family in cacao. First, all putative members of the SAUR family were screened and validated in the cacao genome assembly. The general features of the encoded proteins and genes were then explored using various web-based tools. We next constructed an unrooted phylogenetic tree of the SAUR proteins and predicted duplication events in the SAUR gene family. Finally, we re-analyzed a previously published transcriptome dataset to investigate the expression levels of the SAUR genes in various tissues during zygotic and somatic embryogenesis.

Identification of the SAUR genes in cacao
To identify SAUR family members in cacao, the genome and proteome data of T. cacao cultivar "B97-61/B2" (NCBI RefSeq assembly: GCF_000208745.1, released Jul 9, 2016) were downloaded from NCBI [29]. The hidden Markov model (HMM) profile of the conserved SAUR domain (Pfam accession: PF02519) [11] was obtained from the Pfam database [30]. This profile was then used to search the cacao proteome [29] for potential members of the SAUR gene family. The full-length protein sequence, genomic DNA sequence, and coding DNA sequence of each cacao SAUR family member were retrieved for subsequent analyses.
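The methods do not name the program used for the profile search. HMMER's `hmmsearch` is a common choice for Pfam profiles, so the sketch below assumes the search was run with its `--tblout` option (for example `hmmsearch --tblout saur_hits.tbl PF02519.hmm cacao_proteome.faa`, with hypothetical file names) and shows one way candidate proteins could be collected from the per-sequence table. The E-value cutoff of 1e-5 is an illustrative assumption; the study does not report one.

```python
"""Collect candidate SAUR proteins from an HMMER3 --tblout table (illustrative sketch)."""

# Hypothetical cutoff: the study does not report the E-value threshold it used.
EVALUE_CUTOFF = 1e-5


def parse_tblout(lines, cutoff=EVALUE_CUTOFF):
    """Return {protein_id: full-sequence E-value} for hits at or below `cutoff`.

    In --tblout output the first 18 columns are whitespace-delimited and the
    19th (target description) may itself contain spaces.
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HMMER comment/header lines
        fields = line.split(maxsplit=18)
        target, evalue = fields[0], float(fields[4])  # column 5: full-sequence E-value
        if evalue <= cutoff:
            # keep the best (smallest) E-value if a target is listed more than once
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


if __name__ == "__main__":
    # Two made-up rows in tblout layout; the protein IDs are hypothetical.
    example = [
        "# target name  accession  query name  accession  E-value ...",
        "XP_hypo1.1 - SAUR_profile PF02519 3.2e-30 105.1 0.5 3.6e-30 104.9 0.5"
        " 1.0 1 0 0 1 1 1 1 auxin-responsive protein",
        "XP_hypo2.1 - SAUR_profile PF02519 0.02 10.1 0.5 0.03 9.9 0.5"
        " 1.0 1 0 0 1 1 1 1 weak hit",
    ]
    print(parse_tblout(example))
```

Because a RefSeq proteome may list several isoforms for one gene, hits collected this way would still need to be collapsed to genes before family members are counted.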

Prediction of the SAUR protein characteristics in cacao
The full-length amino acid sequences of the cacao SAUR proteins were submitted to ExPASy ProtParam [31,32] as previously described [9,33–36]. Specifically, the protein length, isoelectric point (pI), molecular weight (mW), aliphatic index (AI), and grand average of hydropathicity (GRAVY) of each SAUR protein were estimated.
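ProtParam's definitions of GRAVY and the aliphatic index are simple enough to reproduce directly. The sketch below computes GRAVY (the mean Kyte-Doolittle hydropathy per residue), the aliphatic index (Ikai's formula, AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent), and an average-mass molecular weight. It is an illustrative re-implementation, not the ExPASy code; the residue-mass table uses standard average masses, which may differ from ProtParam's in the last decimals.

```python
"""ProtParam-style descriptors for a protein sequence (illustrative re-implementation)."""

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Standard average residue masses in Da (residue = amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # added once for the free N- and C-termini


def _clean(seq):
    """Upper-case the sequence and reject empty input or non-standard residues."""
    seq = seq.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or non-standard residues: {sorted(unknown)}")
    return seq


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = _clean(seq)
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai (1980): X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = _clean(seq)
    pct = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


def molecular_weight_kda(seq):
    """Average molecular weight in kDa: sum of residue masses plus one water."""
    seq = _clean(seq)
    return (sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS) / 1000.0
```

A negative `gravy(...)` value corresponds to the "hydrophilic" call used in the results, and a positive one to the "hydrophobic" call.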

Construction of the phylogenetic tree of the SAUR proteins in cacao
The full-length amino acid sequences of the cacao SAUR proteins were used to generate an unrooted phylogenetic tree as previously described [9,33–36]. First, ClustalW [37,38] was used for multiple sequence alignment of the cacao SAUR proteins. In addition, all SAUR family members from Arabidopsis thaliana [39,40] and coffee [19] were downloaded for comparative phylogenetic analyses. The alignments were then imported into the Molecular Evolutionary Genetics Analysis (MEGA) software [41] to construct an unrooted phylogenetic tree using the maximum likelihood method with default settings. Finally, the resulting tree was edited and visualized in Adobe Illustrator.

Prediction of gene duplication of the SAUR genes in cacao
Duplication events in the cacao SAUR gene family were predicted from the MEGA-based phylogenetic tree as previously described [9,33–36]. Specifically, SAUR members located in the same clade with high bootstrap support were considered candidate duplicated pairs, and pairs sharing more than 70% sequence identity were defined as duplicated gene pairs. A duplicated pair was classified as a tandem duplication if the two genes were adjacent on the same chromosome within a 100-kb distance, whereas segmental duplication referred to duplicated DNA segments of 1 to 200 kb located on the same chromosome [42]. Duplicated pairs whose members were located on different chromosomes were attributed to whole genome duplication events [42].
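The classification rules above can be expressed compactly. The sketch below is a hypothetical illustration, not the authors' procedure: it assumes candidate pairs have already been picked from well-supported clades, computes identity over aligned columns where neither sequence has a gap (one of several common conventions), and judges adjacency only among the genes supplied for each chromosome. Because the segmental criterion in the text is stated in terms of segment size rather than gene position, the sketch simply treats any non-tandem pair on the same chromosome as segmental.

```python
"""Classify duplicated gene pairs by identity and location (hypothetical sketch)."""
from dataclasses import dataclass

IDENTITY_THRESHOLD = 70.0      # pairs must share MORE than 70% identity
TANDEM_MAX_DISTANCE = 100_000  # 100 kb


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # 1-based start coordinate
    end: int    # 1-based inclusive end coordinate


def percent_identity(aln_a, aln_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def classify_pair(a, b, genes_by_chrom):
    """Return 'tandem', 'segmental' or 'WGD' for a pair already judged duplicated."""
    if a.chrom != b.chrom:
        return "WGD"  # members on different chromosomes
    order = [g.name for g in sorted(genes_by_chrom[a.chrom], key=lambda g: g.start)]
    adjacent = abs(order.index(a.name) - order.index(b.name)) == 1
    left, right = sorted((a, b), key=lambda g: g.start)
    gap = max(0, right.start - left.end)  # intergenic distance in bp
    if adjacent and gap <= TANDEM_MAX_DISTANCE:
        return "tandem"
    return "segmental"


def call_duplication(a, b, aln_a, aln_b, genes_by_chrom):
    """Return the duplication type, or None if identity does not exceed 70%."""
    if percent_identity(aln_a, aln_b) <= IDENTITY_THRESHOLD:
        return None
    return classify_pair(a, b, genes_by_chrom)
```

Keeping the identity test separate from the location test mirrors the two-step description in the text: first decide whether a pair counts as duplicated, then decide which kind of duplication it represents.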

Exon/intron structural analysis of the SAUR genes in cacao
The exon-intron structures of the genes encoding cacao SAUR proteins were analyzed as previously described [9,33–36]. Specifically, the genomic DNA sequence and coding DNA sequence of each SAUR gene were aligned using the Gene Structure Display Server [43], and the gene structures were displayed in the order of the proteins in the phylogenetic tree.
We then used the Adobe Illustrator software to edit the figure.
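The Gene Structure Display Server draws gene structures from paired genomic and coding sequences. An equivalent summary can also be derived from exon coordinates, for example those in a genome annotation file; the sketch below takes that alternative route, which is an assumption rather than the method used here, and reports exon and intron counts and lengths for a single gene model.

```python
"""Summarize a gene model from exon coordinates (illustrative alternative to GSDS)."""


def gene_structure(exons):
    """Summarize a gene from 1-based, inclusive, non-overlapping (start, end) exons."""
    if not exons:
        raise ValueError("a gene model needs at least one exon")
    exons = sorted(exons)
    # Each intron spans the bases between the end of one exon and the start of the next.
    introns = [(prev_end + 1, next_start - 1)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    return {
        "exon_count": len(exons),
        "intron_count": len(introns),
        "intronless": not introns,
        "genomic_length": exons[-1][1] - exons[0][0] + 1,
        "exonic_length": sum(end - start + 1 for start, end in exons),
        "intron_lengths": [end - start + 1 for start, end in introns],
    }


if __name__ == "__main__":
    # Hypothetical coordinates: one intronless gene and one two-exon gene.
    print(gene_structure([(1, 183)]))
    print(gene_structure([(1001, 1400), (1, 300)]))
```

Under this representation an intronless gene is simply one with a single exon, which is how the counts reported in the results could be tallied.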

Transcriptome analysis of the SAUR genes in cacao
The expression profiles of the SAUR genes were analyzed using published transcriptome data available in the NCBI Gene Expression Omnibus [44] as previously described [9,33–36]. The GSE55476 dataset was used to assess SAUR expression in six samples representing different tissues and stages of embryogenesis [45]. Specifically, zygotic embryo tissues were collected at 14 (T-ZE), 16 (EF-ZE), 18 (LF-ZE), and 20 (M-ZE) weeks after pollination, whole somatic embryos were harvested at the late torpedo stage (LT-SE), and cotyledon tissues were collected from mature somatic embryos (M-SE); these samples were used for RNA extraction and library preparation [45]. The expression of the SAUR genes was visualized using an R script [46], with expression levels represented by a color scale ranging from green to red.
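The text does not say how expression values were scaled before plotting. A common choice for heatmaps of this kind is to z-score each gene across samples so that colors reflect relative rather than absolute expression; the sketch below assumes that approach, takes already log-transformed values as input, and uses a simple linear green-to-red mapping. Both choices are illustrative assumptions, not the settings of the original R script.

```python
"""Row-wise z-scores and a green-to-red color map for an expression heatmap (sketch)."""
import statistics

# Sample order as described for GSE55476 in the text.
SAMPLES = ["T-ZE", "EF-ZE", "LF-ZE", "M-ZE", "LT-SE", "M-SE"]


def row_zscores(matrix):
    """Scale each gene's values to mean 0 and (population) SD 1 across samples."""
    scaled = {}
    for gene, values in matrix.items():
        if len(values) != len(SAMPLES):
            raise ValueError(f"{gene}: expected {len(SAMPLES)} values, got {len(values)}")
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)
        # A gene with constant expression has no contrast to show: map it to 0.
        scaled[gene] = [0.0 if sd == 0 else (v - mean) / sd for v in values]
    return scaled


def green_to_red(z, limit=2.0):
    """Map a z-score to an (R, G, B) tuple: -limit -> green, +limit -> red."""
    z = max(-limit, min(limit, z))   # clip extreme values
    t = (z + limit) / (2 * limit)    # 0.0 (green) .. 1.0 (red)
    return (round(255 * t), round(255 * (1 - t)), 0)
```

Row scaling makes genes with very different absolute signal comparable within one figure, at the cost of hiding which genes are expressed at high absolute levels.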

Identification and annotation of TcSAUR genes in cacao
To identify all putative SAUR genes in cacao, the SAUR domain profile [11] was used to search the cacao proteome [29]. In total, 90 SAUR genes were identified and annotated in the cacao genome (Table 1). The putative SAUR family members were named TcSAUR01 to TcSAUR90 according to their order in the cacao genome, where "Tc" abbreviates the scientific name of cacao (Theobroma cacao) and "SAUR" denotes the protein name (small auxin-up RNA) (Table 1, Fig. 1). The TcSAUR genes were unevenly distributed across the cacao genome (Fig. 1). Similarly uneven chromosomal distributions of SAUR genes have been reported in other higher plants, such as coffee [19], melon [21], and wax gourd [23]. The SAUR family has previously been characterized in various higher plant species, including potato [12], tomato [12], watermelon [13], cotton [14], moso bamboo [15], poplar [16], grape [17], apple [18], coffee [19], Chinese white pear [20], melon [21], loquat [22], wax gourd [23], peanut [24], pineapple [25], foxtail millet [26], cucumber [27], and longan [28]. For example, 31 and 38 SAUR family members have been reported in coffee and moso bamboo, respectively [15,19]. Previous studies also identified 52, 60, 62, and 68 SAUR proteins in pineapple, grape, cucumber, and wax gourd, respectively [17,23,25,27]. Meanwhile, 98, 105, and 116 putative SAUR proteins were found in apple, poplar, and Chinese white pear, respectively [16,20,47]. These comparisons suggest that SAUR families in higher plants are large and vary greatly in size.
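Assigning names by genomic order amounts to sorting genes by chromosome and start coordinate and numbering them with zero padding. The sketch below is illustrative: the chromosome labels (`Chr1`, `Chr2`, ...) and gene IDs are hypothetical, chromosomes are ordered by their trailing number, and unplaced scaffolds without a trailing number are simply placed last.

```python
"""Name genes TcSAUR01, TcSAUR02, ... by genomic order (illustrative sketch)."""
import re


def chrom_key(label):
    """Order 'Chr2' before 'Chr10'; labels without a trailing number sort last."""
    match = re.search(r"(\d+)$", label)
    return (0, int(match.group(1)), label) if match else (1, 0, label)


def assign_names(genes, prefix="TcSAUR"):
    """genes: iterable of (gene_id, chrom, start). Returns {gene_id: new_name}."""
    ordered = sorted(genes, key=lambda g: (chrom_key(g[1]), g[2]))
    width = max(2, len(str(len(ordered))))  # 90 genes -> two digits (01..90)
    return {gene_id: f"{prefix}{i:0{width}d}"
            for i, (gene_id, _, _) in enumerate(ordered, start=1)}
```

Using a numeric chromosome key avoids the lexicographic trap in which `Chr10` would sort before `Chr2`.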

Analysis of the general features of TcSAUR proteins in cacao
To better understand the TcSAUR proteins, the physicochemical parameters of each TcSAUR family member, including protein length, pI, mW, AI, and GRAVY, were analyzed as previously described [9,33–36] and are provided in Table 1. The TcSAUR proteins varied in length from 60 (TcSAUR36) to 180 residues (TcSAUR54) (Table 1). The estimated mW ranged from 6.56 kDa (TcSAUR36) to 20.35 kDa (TcSAUR54) (Table 1). The predicted pI values ranged from 4.04 (TcSAUR36) to 10.60 (TcSAUR11), and most TcSAUR proteins (68 of 90) had pI values greater than 7.00 (Table 1). The AI values ranged from 65.75 (TcSAUR03) to 107.97 (TcSAUR66) (Table 1). Finally, 80 of the 90 TcSAUR proteins were predicted to be hydrophilic because their GRAVY scores were negative, ranging from −0.81 (TcSAUR02 and TcSAUR55) to −0.01 (TcSAUR37 and TcSAUR41) (Table 1). The remaining ten TcSAUR proteins (TcSAUR18, TcSAUR26, TcSAUR27, TcSAUR34, TcSAUR38, TcSAUR40, TcSAUR44, TcSAUR49, TcSAUR67, and TcSAUR89) had positive GRAVY scores (Table 1), suggesting that they are hydrophobic.
The general features of SAUR proteins have also been described in other higher plant species [20]. For example, the pI values of the SAUR proteins in Chinese white pear ranged from 5.10 to 10.28, and 63 of the 116 proteins had pI values greater than 7.00 [20]. The mW values of the Chinese white pear SAUR proteins varied greatly, from 7.47 to 122.22 kDa [20]. Similarly, their sizes ranged from 67 to 1090 residues, and all of them were hydrophilic (negative GRAVY scores) [20]. In foxtail millet, the SAUR proteins ranged from 8.21 to 39.49 kDa in mass [26]. Most of them were basic (pI greater than 7.00), whereas only 17 members were acidic (pI less than 7.00) [26]. The AI values of the foxtail millet SAUR proteins ranged from 53.19 to 104.15 [26]. In cucumber, the SAUR proteins ranged in mW from 9.47 to 86.25 kDa and in pI from 4.77 to 10.38 [27]. Their sizes were reported to be between 84 and 746 residues, with GRAVY scores ranging from −0.96 to 0.05 [27].
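The basic/acidic and hydrophilic/hydrophobic calls in this section follow simple thresholds (pI above 7.00; GRAVY below zero). The sketch below shows one way to estimate pI by bisection on the net charge and to tally those calls. It is a swapped-in simplification, not ProtParam's method: ProtParam uses the Bjellqvist pK values, whereas the pKa set here is an illustrative textbook-style assumption, so individual pI values will differ somewhat from those in Table 1.

```python
"""Isoelectric point by bisection and threshold-based summaries (illustrative sketch)."""

# Illustrative pKa values (assumption); ProtParam uses the Bjellqvist set instead.
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    seq = seq.upper()
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # free carboxyl terminus
    for aa, pka in PKA_POSITIVE.items():
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    for aa, pka in PKA_NEGATIVE.items():
        charge -= seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq, low=0.0, high=14.0, tol=1e-4):
    """Find the pH where the net charge crosses zero; charge falls as pH rises."""
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


def summarize(properties):
    """properties: {name: (pI, GRAVY)} -> counts of basic and hydrophilic proteins."""
    basic = sum(1 for pi, _ in properties.values() if pi > 7.0)
    hydrophilic = sum(1 for _, g in properties.values() if g < 0)
    return {"total": len(properties), "basic": basic, "hydrophilic": hydrophilic}
```

Applied to the pI and GRAVY columns of Table 1, `summarize` would be expected to reproduce the counts of basic and hydrophilic proteins reported above; `isoelectric_point` itself is only an approximation of ProtParam's values.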

Analysis of gene structures and phylogenetic tree of TcSAUR proteins in cacao
To gain insight into the structures of the TcSAUR genes in cacao, we analyzed the exon/intron organization of all members. Most TcSAUR genes (85 of 90) were intronless (Fig. 2). The remaining five genes, TcSAUR22, TcSAUR24, TcSAUR62, TcSAUR67, and TcSAUR87, contained two exons (Fig. 2). In addition, the lengths of the TcSAUR genes ranged from 183 bp (TcSAUR36) to 2209 bp (TcSAUR22) (Fig. 2). The predominance of intronless genes in the cacao TcSAUR family is consistent with reports in other plant species. For example, most SAUR genes in pineapple lack introns [25], and 85 of the 95 SAUR genes in loquat contain no intron [22]. In Chinese white pear, most SAUR genes are intronless, and only five contain at least one intron [20]. Similarly, 94 of the 105 SAUR genes in poplar contain no introns [16]. Together, these findings suggest that most SAUR genes in cacao, and perhaps in plants more generally, lack introns.

Analysis of the TcSAUR genes expression profiles during the zygotic and somatic embryo maturation of cacao
We next investigated the expression patterns of the TcSAUR genes during zygotic and somatic embryogenesis by re-analyzing previously published microarray data [45]. The 90 TcSAUR genes were arranged into 10 sub-groups, and their expression profiles are shown in Fig. 4. All TcSAUR genes were differentially expressed across the six samples from zygotic and somatic embryogenesis. In sub-group 1, TcSAUR85 was strongly expressed in all six samples, whereas TcSAUR83 tended to be highly expressed in T-ZE (Fig. 4A). In sub-group 2, only TcSAUR51 was strongly expressed during zygotic embryogenesis (Fig. 4B). In sub-group 3, at least two genes, TcSAUR20 and TcSAUR21, were strongly expressed during both zygotic and somatic embryogenesis, whereas TcSAUR35 was highly expressed in LT-SE and M-SE (Fig. 4C). Two other genes in this sub-group, TcSAUR22 and TcSAUR23, were specifically expressed in T-ZE (Fig. 4C). Interestingly, most members of sub-group 4 (four of five), namely TcSAUR04, TcSAUR05, TcSAUR55, and TcSAUR57, were strongly expressed in all six samples, whereas TcSAUR02 was highly expressed in LT-SE and M-SE (Fig. 4D). The TcSAUR genes in sub-group 5 tended to be moderately expressed in all samples during zygotic and somatic embryogenesis (Fig. 4E). In addition, two genes in sub-group 6 (TcSAUR52 and TcSAUR80), one in sub-group 7 (TcSAUR79), and four in sub-group 8 (TcSAUR56, TcSAUR59, TcSAUR75, and TcSAUR84) were strongly expressed in all samples (Fig. 4F–H). In sub-group 9, TcSAUR90 was highly expressed in LT-SE and M-SE, whereas TcSAUR62 and TcSAUR63 were highly expressed in M-ZE (Fig. 4I). Finally, two genes in sub-group 10, TcSAUR53 and TcSAUR54, were specifically expressed in M-SE (Fig. 4J).
The functions of SAUR genes have been investigated in several higher plant species. For example, a recent study found that the tomato SAUR gene SlSAUR69 increased fruit sensitivity to ethylene by suppressing polar auxin transport, thereby altering the transition from unripe to ripening fruit [12]. Roles of SAUR genes during embryogenesis have also been reported. Specifically, several coffee SAUR genes showed higher expression in at least one developing embryo stage or in plantlets [19]. Among them, expression of the coffee SAUR12 gene increased in non-embryogenic calli and during the developing embryo stages [19]. In coconut, SAUR gene expression at the embryogenic callus stage was reported to be significantly higher than that in the initial culture and somatic embryo stages [48]. More recently, a number of longan SAUR genes were found to be strongly expressed in globular embryos, suggesting that they might play important roles in early somatic embryogenesis in longan [28]. In the future, mutational analyses, such as point mutation studies, should be performed to confirm the biochemical functions of TcSAUR proteins in cacao.

Conclusion
In summary, this study provides new insight into the identification, annotation, characterization, and expression of the TcSAUR gene family in cacao. Our results indicate that members of the TcSAUR family are relatively conserved in structure and phylogeny. They also suggest that tandem, segmental, and whole genome duplication events may have contributed to the evolution of this important gene family. Re-analysis of previously published microarray data showed that many TcSAUR genes were highly expressed in various tissues during zygotic and somatic embryogenesis. Taken together, our study provides fundamental information on TcSAUR genes potentially involved in cacao embryogenesis. Manipulating TcSAUR expression may help facilitate and accelerate zygotic and somatic embryogenesis during cacao tissue culture.

Fig. 1 Physical distribution of the SAUR gene family in the cacao genome

Fig. 2 Exon/intron organization of the SAUR gene family in cacao

Table 1 Physical and chemical properties of the SAUR family in cacao