没有大气校正和实测数据怎么搞水色

2022-06-14 8 min

看到标题是不是觉得很奇怪，没错，我也觉得很奇怪，但没办法，这是客户的要求。

开局吐槽

国产卫星两大麻烦的地方，一个是没有准确的辐射校正与大气校正，然后还不公开完整的数据与标准处理方式，这样的话我们想搞准确的辐射校正和大气校正，或者说需要某个波段处理到某一个级别的未公开的数据，就只能自己从头开始搞，并且想参考标准的处理流程的时候，人家不给你代码，问就是保密要求，咱也不知道国内哪来那么多保密级别；另一个是没有公开详细的ATBD (Algorithm Theoretical Basis Document)(首先是公开都不公开，我们想知道标准产品怎么算的只能靠猜，详细的定义可以看看MODIS的ATBD，可以直接当教科书用)，别的国家啥情况我不知道，日本和韩国在卫星数据还没公开的时候就把最初版本的ATBD放出来了，并且在获取了足够的卫星数据之后很快就发布标准产品的实地检验结果，国产那俩水色卫星发射这么久，国家海洋卫星应用中心的官网连个P都不放一个。

当然还有一个麻烦的地方就是下载麻烦，批量下载还需要写机构申请或者拿U盘，真的2202年了，世界第二大国能不能有点世界大国的心胸，官网没有英语页面也就罢了，下载的方法能不能和国际接轨一下。日本水色人才都空心了，SGLI卫星的各个指标都搞得比国内好的不知道哪里去了，我们交了这么多税你们就搁着干这个呢？

客户要求

客户给我的是一个新发射的国产的高光谱卫星，可见光及红外波段有76个波段，SWIR有90个波段，分辨率没细看估计挺高的，为了给客户保密，我就不放结果的图了。问题在下面几个。

没有Spectral Response Function
没有 extraterrestrial solar irradiance
辐射定标只定标到Radiance
只有一景的卫星图片

可能是因为刚发射没多久所以这些该有的都没有吧

稍微学过一点水色的都知道，前两个是做大气校正或者大气部分校正的必要条件，只要有前两个，就可以拿py6s做瑞利校正。但是这连前两个都没有，摊手。

而最后一个又限制了神经网络的使用(当然我也懒得给他匹配那么多数据做神经网络)，那还能咋办呢？

解决思路

那就直接上统计的办法吧。

我在处理高光谱卫星的时候的一个强烈体会是，高光谱卫星需要的计算资源比我想象的要多的多，我16g内存如果不对高光谱数据做任何降维和并行处理的话，在我自己的电脑上是完全跑不了的, kernel always die，那就，就这这个下台阶。大概的思路如下。

对VNIR波段数据做PCA降维
对降维后的数据做K-means cluster，提取水体mask
将PCs mask之后 upscale到OLCI的分辨率，跑个简单的线性回归找个R2最高的
把线性回归用在upscale之前的数据上

客户给的钱也不多不然我就把线性回归换成RF+error downscale了

部分参考代码

高光谱卫星预处理

# %% [markdown]
# # Read File

# %%
import os
import glob
from os.path import isfile, join
folderpath=''

# %%
for path, subdirs, files in os.walk(folderpath):
    for name in files:
        print(os.path.join(path, name))

# %%
#import xarray as xr
from osgeo import gdal
gdal.UseExceptions()

VN=gdal.Open('/ZY1F_AHSI_.tif',
             gdal.GA_ReadOnly)
VNband1=VN.GetRasterBand(1)
VNim1=VNband1.ReadAsArray()

# %%
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.imshow(VNim1)
plt.colorbar()

# %%
VN.RasterCount

# %%
VN.RasterXSize

# %%
VN.RasterYSize

# %%
SW=gdal.Open('ZY1F_AHSI.tif'
            ,gdal.GA_ReadOnly)
SWband1=SW.GetRasterBand(90)
SWim1=SWband1.ReadAsArray()

# %%
plt.imshow(SWim1)

# %%
SW.RasterCount

# %%
SW.RasterXSize

# %%
SW.RasterYSize

# %%
OGP=gdal.Open('OGP.tif',
             gdal.GA_ReadOnly)

OGPband1=OGP.GetRasterBand(1)
OGPim1=OGPband1.ReadAsArray()

# %%
plt.imshow(OGPim1)
plt.colorbar()

# %%
OGPband2=OGP.GetRasterBand(2)
OGPim2=OGPband2.ReadAsArray()
plt.imshow(OGPim2)
plt.colorbar()

# %%
import geopandas as gpd
shapefile = gpd.read_file("ZY1F_AHSI.shp")

# %%
shapefile.keys()

# %%
print(shapefile)

# %%
shapefile['geometry']

# %% [markdown]
# # RadCal

# %%
#GAIN from .raw file open with txt editor
VNGain=[]

# %%
np.shape(VNGain)

# %%
SWGain=[ ]

# %%
np.shape(SWGain)

# %%
VNarr=np.ones((2051,2000,76))
for i in range(76):
    VNarr[:,:,i]=VN.GetRasterBand(i+1).ReadAsArray()*VNGain[i]

# %%
SWarr=np.ones((2051,2000,90))
for i in range(90):
    SWarr[:,:,i]=SW.GetRasterBand(i+1).ReadAsArray()*SWGain[i]

# %%
plt.imshow(VNarr[:,:,0])
plt.colorbar()

# %%
plt.imshow(VNarr[:,:,75])
plt.colorbar()

# %% [markdown]
# # PCA

# %%
import spectral as spy


# %%

pc = spy.principal_components(VNarr)
pc_99 = pc.reduce(fraction=0.999)  # 保留99.9%的特征值

# %%
spy.imshow(pc.cov)

# %%
pc.eigenvalues


# %%
pc_99.mean

# %%
print(len(pc_99.eigenvalues))  # 剩下的特征值数量


# %%
img_pc = pc_99.transform(VNarr)
v = spy.imshow(img_pc[:,:,:3], stretch_all=True)

# %%
plt.imshow(img_pc[:,:,1])
plt.colorbar()

# %%
plt.imshow(img_pc[:,:,2])
plt.colorbar()


# %%
plt.imshow(img_pc[:,:,3])
plt.colorbar()

# %% [markdown]
# # water mask

# %%
(m,c)=spy.kmeans(img_pc,nclusters=3)

# %%
plt.imshow(m)
plt.colorbar()

# %%
Mask=np.ma.masked_less(m,2)
plt.imshow(Mask)
plt.colorbar()

# %%
Mask
np.save("VNarrData.npy",VNarr)
# %%
VNarr[Mask.mask,:]=np.nan


# %%
plt.imshow(VNarr[:,:,1])

# %%
np.save("VNarrData.npy",VNarr)

# %% [markdown]
# # Load Data
# 

# %%
import spectral as spy
poly = [120.85938, 36.95460, 121.52334, 36.822677]
import numpy as np
import matplotlib.pyplot as plt
PCs=np.load("VNPCs.npy",allow_pickle=True)
Mask = np.load("mask.npy")


# %%
PCs[Mask,:]=np.nan



# %%
import pyresample
def swath_resampling(src_data: np.ma.array, src_lon: np.array, src_lat: np.array,
                     trg_lon: np.array, trg_lat: np.array, search_radius: float):
    if len(trg_lon.shape) == 1:
        grid_def = pyresample.geometry.SwathDefinition(
            *np.meshgrid(trg_lon, trg_lat))
    else:
        grid_def = pyresample.geometry.SwathDefinition(
            lons=trg_lon, lats=trg_lat)

    # source grid with original swath data
    swath_def = pyresample.geometry.SwathDefinition(lons=src_lon, lats=src_lat)

    # resample (here we use nearest. Bilinear, gaussian and custom defined methods are available)
    # for more, visit https://pyresample.readthedocs.io/en/latest/
    result = pyresample.kd_tree.resample_nearest(swath_def, src_data, grid_def, epsilon=0.5,
                                                 fill_value=None, radius_of_influence=search_radius)
    return result, grid_def


# %% [markdown]
# # Read S3

# %%
S3path="/mnt/f/S3B_OL_2_WFR____20220404T014531_20220404T014831_20220405T105330_0180_064_231_2340_MAR_O_NT_003/S3B_OL_2_WFR____20220404T014531_20220404T014831_20220405T105330_0180_064_231_2340_MAR_O_NT_003.SEN3"
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
import os
import warnings
import cartopy.crs as ccrs
import cartopy.feature as cfeature
file_name_chl = 'chl_nn.nc'
file_name_tsm = 'tsm_nn.nc'
file_name_geo = 'geo_coordinates.nc'
GEO_file = xr.open_dataset(os.path.join(S3path, file_name_geo))
LAT = GEO_file.variables['latitude'][:]
LON = GEO_file.variables['longitude'][:]
GEO_file.close()


# %%
def flag_data_fast(flags_we_want, flag_names, flag_values, flag_data, flag_type='WQSF'):
    flag_bits = np.uint64()
    if flag_type == 'SST':
        flag_bits = np.uint8()
    elif flag_type == 'WQSF_lsb':
        flag_bits = np.uint32()

    for flag in flags_we_want:
        try:
            flag_bits = flag_bits | flag_values[flag_names.index(flag)]
        except:
            print(flag + " not present")

    return (flag_data & flag_bits) > 0
flags_we_want = ['CLOUD']

file_name_flags = 'wqsf.nc'
FLAG_file = xr.open_dataset(os.path.join(
    S3path, file_name_flags))
# get all the flag names
flag_names = FLAG_file['WQSF'].flag_meanings.split(' ')
# get all the flag bit values
flag_vals = FLAG_file['WQSF'].flag_masks
# get the flag field itself
FLAGS = FLAG_file.variables['WQSF'].data
FLAG_file.close()

# make the flag mask using the function we defined above "flag_data_fast"
flag_mask = flag_data_fast(flags_we_want, flag_names,
                           flag_vals, FLAGS, flag_type='WQSF')
flag_mask = flag_mask.astype(float)
flag_mask[flag_mask == 0.0] = np.nan


# %%


#load CHL_NN
OLCI_file_CHL = xr.open_dataset(os.path.join(
    S3path, file_name_chl))
CHL_NN = OLCI_file_CHL.variables['CHL_NN'][:].data
CHL_NN[np.isfinite(flag_mask)] = np.nan
OLCI_file_CHL.close()

#load TSM_NN
OLCI_file_TSM = xr.open_dataset(os.path.join(
    S3path , file_name_tsm))
TSM_NN = OLCI_file_TSM.variables['TSM_NN'][:].data
TSM_NN[np.isfinite(flag_mask)] = np.nan
OLCI_file_TSM.close()

# %% [markdown]
# # Resample

# %%
print(np.shape(PCs))
print(poly)

# %%
#def swath_resampling(src_data: np.ma.array, src_lon: np.array, src_lat: np.array,
#                     trg_lon: np.array, trg_lat: np.array, search_radius: float):
ZYlon=np.linspace(poly[0],poly[2],num=np.shape(PCs)[0])
ZYlat=np.linspace(poly[1],poly[3],num=np.shape(PCs)[1])

# %%
np.shape(ZYlat)

# %%
S3lon=np.arange(poly[0],poly[2],step=0.003)
S3lat=np.arange(poly[1],poly[3],step=-0.003)

# %%
S3_chl,grid=swath_resampling(CHL_NN,LON,LAT,S3lon,S3lat,search_radius=10000)
plt.imshow(S3_chl)

# %%
S3_tsm,grid=swath_resampling(TSM_NN,LON,LAT,S3lon,S3lat,search_radius=10000)
plt.imshow(S3_tsm)

# %%
np.shape(S3_tsm)

# %%
import cv2


# %%
PCsUp=np.ones((44,222,5))
import cv2
for i in range(5):
    PCsUp[:,:,i]=cv2.resize(PCs[:,:,i],(222,44),interpolation=cv2.INTER_CUBIC)

plt.imshow(PCsUp[:,:,0])

# %%
S3_chl=np.ma.masked_invalid(S3_chl)
S3_tsm=np.ma.masked_invalid(S3_tsm)

# %%
PCsUp[S3_chl.mask,:]=np.nan
PCsUpMasked=np.ma.masked_invalid(PCsUp)

# %%
#from pylr2 import regress2
for i in range(5):
    print(i+1)
    r=np.ma.corrcoef(PCsUpMasked[:,:,i].flatten(),S3_chl.flatten())
    print(r)

# %%
for i in range(5):
    print(i+1)
    r = np.ma.corrcoef(PCsUpMasked[:, :, i].flatten(), S3_tsm.flatten())
    print(r)


# %%
import statsmodels.api as sm

model=sm.OLS(S3_chl.flatten(),PCsUpMasked[:,:,1].flatten(),missing='drop',hasconst=True)
chlre=model.fit()
print(chlre.summary())

# %%

model = sm.OLS(S3_tsm.flatten(),
               PCsUpMasked[:, :, 0].flatten(), missing='drop', hasconst=True)
tsmre = model.fit()
print(tsmre.summary())


# %%
plt.imshow(PCs[:, :, 1]*-0.0220)
plt.colorbar()
plt.title('Chl')


# %%
plt.imshow(PCs[:, :, 0]*0.0158)
plt.colorbar()
plt.title('tsm')

准确度可以说是不能用的地步，懒得改了，就这样吧

Ocean Color Hyperspectral