# Annual 30m Great Lakes Maps of the Tibetan Plateau from 1991 to 2018

### Study zone

TP is located in Central Asia (25°59′ N–39°49′ N; 73°29′ E–104°40′ E) and is the largest and highest plateau in the world, with an average altitude more than 4000 m above sea level (asl) and an area of ​​2.5 × 106 kilometers2.29. There were ~1400 lakes >1 km2 in size on the plateau in 2018, and 453 of them were over 10 km2.17. The changing hydrology of TP has had enormous impacts on ecosystems and human society, as it is the original water source for many vital rivers in Asia, supplying water to approximately 22% of the Earth’s population for agricultural, industrial and domestic purposes.30. The climate across TP varies greatly due to its heterogeneous geography with an annual average temperature of -4.1°C in the inner basin to 1.7°C in the Brahmaputra basin31. With its high elevation and wide surface area, the plateau also acts as a barrier to westerly winds and atmospheric monsoon circulation. The plateau is located in the arid part of the monsoon region and experiences a decreasing rainfall gradient from east to west of 335 to 430 mm/year32. The monsoon climatic zone (Brahmaputra and Salouen basins outlined in purple in Fig. 1) experiences summer rains and winter droughts, while the zone controlled by westerly winds (Tarim and Indus outlined in blue in Fig. 1) is extremely arid in winter with relatively more precipitation in summer. The other basins receive influences from both the Indian Summer Monsoon (ISM) and westerly winds to some extent33as shown in figure 1.

### Source data

We obtained Landsat Collection 1 Tier 1 surface reflectance satellite imagery for Landsat 5 Thematic Mapper I, Landsat 7 ETM+, and Landsat 8 Operational Land Imager (OLI)/TIRS from the United States Geological Survey (USGS). All data in the collection have been atmospherically and geometrically corrected, and cross-calibration between different sensors has been applied34. The lakes were mapped using all available imagery acquired during the period 1991-2018. It was not possible to map the lakes at high temporal resolution before 1991 because coverage is incomplete for the study area2.35. A summary of the source imagery used is shown in Table 1. Modern glacier polygons derived from Glacial Area Mapping for Asian Mountain Discharges (GAMDAM)36 were also used to avoid the possibility of glaciers being incorrectly mapped as lakes.

The study area is divided into ten basins as shown in Fig. 11.37, due to the vast expanse of the TP and its spatial heterogeneity. The boundaries of the TP were taken from the Boundary and Tibetan Plateau Area (DBATP) dataset, which is the result of long-term fieldwork and was published and revised in 201429.

### Annual generation of the lake map

An approach based on water frequency – which is the ratio of the number of water observations at each location to the total number of images used in a year38 – has proven effective in obtaining reliable representations of annual water masses27. This is particularly important for the TP, where significant seasonal variations in lake boundaries are observed.13.14. The frequency map approach can also help reduce the impact of cloud shadows and other cloud artifacts to improve mapping accuracy. Additionally, this method can avoid the drawbacks of manual visual interpretation and mapping of lakes based on their appearance on remote sensing imagery, which can be time consuming over large areas and somewhat subjective. The mapping approach used involves several steps, including image pre-processing, generation of the water body map, and yearly production of the lake map based on water frequency, as shown in Fig. figure 2.

#### Generation of water body maps

Water mass maps were then generated from the pre-processed Landsat imagery using the Modified Difference Normalized Water Index (MNDWI), Difference Normalized Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI), as reported in previous studies.25.38:

$$MNDWI=frac{{{rm{rho }}}_{{rm{Green}}}-{{rm{rho }}}_{{rm{SWIR}}1}} {{{rm{rho }}}_{{rm{Green}}}+{{rm{rho }}}_{{rm{SWIR}}1}}$$

(1)

$$NDVI=frac{{{rm{rho }}}_{{rm{NIR}}}-{{rm{rho }}}_{{rm{Red}}}}{ {{rm{rho }}}_{{rm{NIR}}}+{{rm{rho }}}_{{rm{Red}}}}$$

(2)

$$EVI=frac{{rm{rho }}}_{{rm{NIR}}}-{{rm{rho }}}_{{rm{Red}}}}{ 1 ,0+{{rm{rho }}}_{{rm{NIR}}}+6,0{{rm{rho }}}_{{rm{Red}}}+7, 5{{rm {rho }}}_{{rm{Blue}}}}$$

(3)

or ρBlue, ρGreen, ρRed, ρNIR and ρSWIR1 are the surface reflectance values ​​for the blue, green, red, near infrared (NIR), and shortwave infrared-1 (SWIR1) bands of Landsat sensors40,41,42.

MNDWI enhances water features in imagery and has been widely applied to identify the presence of water bodies in various regions40,41,42. It also has the added benefit of eliminating any residual effects associated with mountain shadows and light cloud cover.40,41,42. Since low vegetation near wet surfaces is one of the main causes of commission error in mapping surface water bodies, in this study we combined MNDWI and vegetation indices (NDVI and EVI) using logical operators to improve the discrimination capabilities of water body mapping algorithms25.38:

$$Water,body=left(MNDWI > EVIorMNDWI > NDVIright)andleft(EVI (4) or ‘MNDWI > EVI or MNDWI > NDVI‘ identifies pixels that have a stronger water signal than the vegetation signal25.38while ‘ENI ‘ ensures that all vegetation pixels or mixed water-vegetation pixels can be removed39. Only pixels meeting the criteria were classified as corresponding to water bodies and all other pixels were classified as non-aquatic pixels25.38. Criteria are effective in distinguishing water bodies from non-water pixels in Landsat imagery27and so it was used to generate water body maps for each of the 101,768 individual Landsat images. #### Generation of water frequency and annual lake maps Individual water mass maps for each year were collated and then used to calculate annual water frequency maps (28 in total). The water frequency for each pixel in a water frequency map was calculated according to Zou et al.25 as:$$Fleft(yright)=frac{1}{{N}_{y}}mathop{sum }limits_{i=1}^{{N}_{y}}{W }_{y,i}ast 100{rm{ % }}

(5)

or F is the water frequency,there is the year, NOTthere is the total number of Landsat observations for that pixel that year, while Oy, i indicates whether a pixel in a water body map is classified as water (represented by a value of 1) or not water (a value of 0).

A permanent water surface is under water all year round2which must correspond to an annual water frequency (F) of 100% in a frequency map. However, the water frequency of pixels corresponding to water bodies that persist throughout the year may be less than 100% due to the obscuring effects of cloud cover and shadow in satellite imagery. For example, certain clouds (i.e. optically thin clouds that were not obscured) that obscure the presence of a true permanent body of water below have a chance of being classified as non-water in the maps of water bodies, which would reduce the amount of water frequency of the corresponding pixels in the annual water frequency map. To overcome this problem, we set a threshold of F≥ 75% to classify pixels in frequency maps as permanent water pixels. The choice of the threshold value of 75% is in line with previous studies25,27,38 and this was confirmed to be the most appropriate threshold value for identifying permanent water bodies in the TP area. After applying the threshold to the water frequency maps, areas of permanent water were extracted using Arcpy’s reclassification tool as pixels comprising water during the vast majority of the year.38. Only these permanent water bodies were selected for inclusion in the interannual dataset, as temporary water bodies could be associated with seasonal flooding and irrigation activities.2,41,43.

Then the permanent extents of water bodies were converted from raster to vector format to produce annual lake maps. These water body polygons were then intersected with the modern glacier polygons to remove any glaciers incorrectly mapped as lakes. Additionally, the presence of other water bodies, such as rivers, was manually detected and removed to further improve the quality of the lake map dataset. The remaining permanent water bodies were then considered to consist of lakes only. Finally, after filtering the dataset to retain lakes with areas greater than 10 km2the area and perimeter of each large lake have been calculated to help determine the inter-annual changes of the lake.