Cotton (Gossypium hirsutum L.) is the most important natural textile fiber crop worldwide and the world's sixth largest source of vegetable oil. China, India, the USA and Pakistan are the largest cotton-producing countries in the world [1]. Cottonseed is also an important source of food [2]. Cotton is known as a cash crop because of its high economic value, and it is considered the lifeline of Pakistan's economy. Cotton accounts for 0.8% of Pakistan's GDP [3], and it serves as the backbone of the economy by supplying lint to the textile industry [4].
Cotton belongs to the genus Gossypium in the family Malvaceae, which contains 90 genera. The genus Gossypium includes shrubs and small trees [5–6]. It comprises 45 diploid and 5 allotetraploid species that occur in arid and semiarid regions of Africa, Central and South America, the Galapagos, the Indian subcontinent, Australia, Arabia and Hawaii [7]. Only four species are cultivated: two allotetraploids, G. hirsutum L. and G. barbadense L., and two diploids, G. arboreum L. and G. herbaceum L. The allotetraploids combine two diploid genomes, one similar to the Old World A-genome diploids and the other similar to the New World D-genome diploids; they arose through hybridization between an African A genome and an American D genome about 1–2 million years ago [8,9]. Diploid species are divided into eight genome groups, A–G and K [10]. Among the diploids, spinnable fiber is produced by A-genome species [11]. World cotton production is based largely on G. hirsutum and G. barbadense [12]. Gossypium hirsutum, also known as Upland or Mexican cotton, accounts for about 90% of global fiber production and produces long, strong and fine fiber [13]. Gossypium barbadense, also known as Pima cotton, accounts for about 8% of global cotton production and is valued for its superior fiber quality [14]. Gossypium arboreum (tree cotton) and Gossypium herbaceum (Levant cotton) together contribute less than 2% of global cotton production [15].
Global agricultural production is adversely affected by viral diseases and unfavorable climatic conditions [16]. Cotton production is threatened by many biotic and abiotic stresses. Among biotic stresses, cotton leaf curl disease (CLCuD) is the most damaging, and it is considered the major threat to cotton production in Pakistan [17]. Depending on disease severity, CLCuD reduces cotton production in Pakistan by 20%–30% annually [18,19]. Millions of cotton bales have been lost to this disease, causing economic hardship for the country; Pakistan's economy is estimated to have lost 75 million rupees over the last ten years due to CLCuD [20]. CLCuD is transmitted by the whitefly Bemisia tabaci [21] and is caused by a begomovirus belonging to the family Geminiviridae. Geminiviruses infect both monocotyledonous and dicotyledonous plants. Cotton leaf curl virus has a wide range of alternative hosts, including okra, tomato, potato, Chinese rose, tobacco, lehli and dhatura [22].
Geminiviruses are circular single-stranded DNA viruses with genomes of 2,500–5,200 nucleotides. The family Geminiviridae is divided into nine genera on the basis of genome organization and insect vector. Members of the family are transmitted by several kinds of insects, including aphids, leafhoppers, treehoppers and whiteflies; begomoviruses are transmitted by whiteflies [23]. Begomovirus is the largest genus of the family, with more than 320 species, and begomoviruses infect dicotyledonous plants [21]. They cause large-scale losses in many crops, including cotton, in temperate, tropical and subtropical regions. All genera of the family Geminiviridae have monopartite genomes except Begomovirus [23], whose members may be either monopartite or bipartite. Most begomoviruses are bipartite: their genome consists of two components, DNA-A and DNA-B, of similar size (about 2,600 nucleotides each). Monopartite begomoviruses have only DNA-A [21], and CLCuD is caused by monopartite begomoviruses [24,25]. The DNA-A of a bipartite begomovirus contains six genes: V1 and V2 on the virion-sense strand, and C1, C2, C3 and C4 on the complementary-sense (antisense) strand. C1 encodes the replication-initiation protein (Rep), C2 the transcriptional activator protein, and C3 the replication enhancer protein. C4 encodes a protein required for virus movement that also determines disease severity and host range [26].
Two types of satellites, alphasatellites and betasatellites, are associated with begomoviruses [24]. Alphasatellites are about 1.4 kb in size and do not contribute to disease symptoms [27]; instead, they suppress the level of viral DNA in infected plants [28]. Betasatellites are small, circular ssDNA satellites of about 1,350 nucleotides that are associated with most monopartite begomoviruses. They enhance replication of the helper virus and encode the βC1 protein, which determines disease symptoms; betasatellites therefore determine disease severity [24]. Disease severity is positively correlated with the amount of viral and satellite DNA: plants carrying more virus/satellite DNA show more severe disease [17].
CLCuD was first reported in Nigeria in 1916. In Pakistan, it was first observed on cotton in 1967 at Tiba Sultan Pur near Multan, and its first epidemic occurred in 1992–93 [29]. Characteristic symptoms of CLCuD include stunted growth, thickened veins, darkening of leaves, and one or more enations on the underside of the leaf. An enation is a small, leaf-like outgrowth of plant tissue caused by virus infection [24]. CLCuD causes two types of vein thickening in cotton: (i) thickening of the major veins and (ii) thickening of the minor veins. Leaf margins thicken at the initial stage of the disease [30]. Severely infected plants have shortened internodes, which results in stunted growth [24,31]. CLCuD adversely affects fiber quality. It also reduces seed weight and the numbers of monopodial and sympodial branches [32]. Infected susceptible plants are darker than healthy plants because of the proliferation of chlorophyll-containing tissues [33].
Several strategies are used to manage CLCuD, but the most effective is the use of resistant cultivars. Molecular mapping can help identify genomic regions of cotton associated with CLCuD resistance. Association mapping (AM), also known as linkage disequilibrium (LD)-based mapping, is a relatively recent mapping approach with high resolution that identifies genomic regions associated with phenotypic variation [34]. It detects marker-trait associations using genes and markers distributed across the genome, can be used to study the genetic architecture of a trait, and is applicable to both synthetic and natural populations [35]. AM analyzes large numbers of polymorphisms to estimate QTL effects. Because it does not require developing a segregating population with a large number of progeny, it is often preferred over linkage analysis (LA). Since AM exploits recombination accumulated over many generations in diverse germplasm, rather than the few recombination events of a single biparental cross, it can increase mapping resolution; it also shortens research time and surveys a greater number of alleles [36]. In the present study, an association mapping approach was used to identify simple sequence repeat (SSR) markers associated with CLCuD resistance in cotton.
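The core statistical step in AM, testing each marker for association with the phenotype, can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each SSR marker is scored as 1/0 for the presence or absence of an allele and that CLCuD response is recorded as a numeric disease index per genotype; the function name `single_marker_test` and the example data are illustrative and do not come from this study. The sketch fits a simple linear regression of disease index on marker score and reports the marker effect, the proportion of phenotypic variance explained (R²) and a t statistic. Unlike the models used in practice, it does not account for population structure or kinship, which can produce spurious marker-trait associations in diverse germplasm.

```python
import math
from statistics import fmean


def single_marker_test(marker, phenotype):
    """Naive single-marker association test (simple linear regression).

    marker    -- SSR score per genotype (hypothetical coding: 1 = allele
                 present, 0 = absent)
    phenotype -- CLCuD disease index per genotype (hypothetical scale)

    Returns the marker effect (for 0/1 coding, the difference in mean
    phenotype between carriers and non-carriers), R^2 (share of phenotypic
    variance explained) and the t statistic for H0: effect = 0 with n - 2
    degrees of freedom.
    NOTE: population structure and kinship are NOT modelled here.
    """
    n = len(marker)
    if n != len(phenotype) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mx, my = fmean(marker), fmean(phenotype)
    sxx = sum((x - mx) ** 2 for x in marker)
    syy = sum((y - my) ** 2 for y in phenotype)
    sxy = sum((x - mx) * (y - my) for x, y in zip(marker, phenotype))
    if sxx == 0 or syy == 0:
        raise ValueError("marker or phenotype has no variation")

    effect = sxy / sxx                 # regression slope
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
    r2 = r * r
    if r2 >= 1.0:
        t = math.copysign(math.inf, r)  # perfect fit: unbounded t
    else:
        t = r * math.sqrt((n - 2) / (1 - r2))
    return {"effect": effect, "r2": r2, "t": t}


if __name__ == "__main__":
    # Hypothetical scores for eight accessions (not data from this study)
    marker = [1, 1, 1, 1, 0, 0, 0, 0]
    disease = [1, 0, 1, 2, 3, 4, 3, 2]
    res = single_marker_test(marker, disease)
    # prints: effect=-2.00, R2=0.67, t=-3.46
    print(f"effect={res['effect']:.2f}, R2={res['r2']:.2f}, t={res['t']:.2f}")
```

A negative effect here means carriers of the allele score lower on the disease index, i.e. the allele would be associated with resistance. In a genome-wide scan the same test would be repeated for every SSR marker, with p-values taken from Student's t distribution with n − 2 degrees of freedom and adjusted for multiple testing.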