# 1 Estimating Ocean Surface Currents from Satellite Observable

## Transcript Of 1 Estimating Ocean Surface Currents from Satellite Observable

This is a non-peer reviewed preprint. This work was submitted to the Frontiers in Marine Science Ocean Observation on Feb 25, 2021

1 Estimating Ocean Surface Currents from Satellite Observable Quantities

2

with Machine Learning

3

Anirban Sinha∗

4

California Institute of Technology, Pasadena, CA

5

Ryan Abernathey

6

Lamont Doherty Earth Observatory of Columbia University, Palisades, NY

7 ∗Corresponding author address: Anirban Sinha, California Institute of Technology, Pasadena, CA 8 91125. 9 E-mail: [email protected]

Generated using v4.3.2 of the AMS LATEX template

1

ABSTRACT 2

10 Global surface currents are usually inferred from directly observed quanti11 ties like sea-surface height, wind stress by applying diagnostic balance rela12 tions (like geostrophy and Ekman ﬂow), which provide a good approximation 13 of the dynamics of currents at large scales and low Rossby numbers. However, 14 newer generation satellite altimeters (like the upcoming SWOT mission) will 15 capture the high wavenumber variability associated with unbalanced compo16 nents, and applying these balances directly may lead to an incorrect estimate 17 of the surface ﬂow. In this study, we explore Machine Learning (ML) as an al18 ternate route to infer surface currents from satellite observable quantities using 19 SSH, SST and wind stress from available ocean GCM simulation outputs as 20 inputs to make predictions of surface currents (u,v), which are then compared 21 against the true GCM output. We demonstrate that a linear regression model 22 is ineffective at predicting velocities accurately beyond localized regions. In 23 comparison, a relatively simple neural network (NN) can predict surface cur24 rents accurately over most of the global ocean, with lower mean errors than 25 geostrophy+Ekman. Using a local stencil of neighboring grid points as ad26 ditional input features, we can train the deep learning models to effectively 27 “learn” spatial gradients and the physics of surface currents. By passing the 28 stenciled variables through convolutional ﬁlters we can help the model learn 29 spatial gradients much faster. Various training strategies are explored using 30 systematic feature hold out, to understand the effect of each input feature on 31 the NN’s ability to accurately represent surface ﬂow. Sensitivity analysis of a 32 reference NN reveals that besides SSH, geographic information is an essential 33 ingredient required for making accurate predictions of surface currents with 34 deep learning.

3

35 1. Introduction

36 The most reliable spatially continuous estimates of global surface currents in the ocean come 37 from geostrophic balance applied to the sea surface height (SSH) ﬁeld observed by satellite al38 timeters. For the most part, the dynamics of slow, large-scale currents (up to the mesoscale) are 39 well approximated by geostrophic balance, leading to a direct relationship between gradients of 40 SSH and near-surface currents. However, current meter observations for the past few decades 41 and some of the newer generation ultra-high-resolution numerical model simulations indicate the 42 presence of an energized submesoscale as well as high-frequency waves / tides at smaller spatial 43 and temporal scales (Rocha et al. 2016). These high-frequency unbalanced motions are likely to 44 complicate the estimation of surface currents from from SSH in the upcoming Surface Water and 45 Ocean Topography (SWOT) mission (Morrow et al. 2018). That is, the high-wavenumber SSH 46 variability may represent a different, ageostrophic regime, where geostrophy might not be the best 47 route to infer velocities. Motivated by this problem, in this study we explore statistical models 48 based on machine learning (ML) algorithms for inferring surface currents from satellite observ49 able quantities like SSH, wind and temperature. These algorithms offer a potential alternative to 50 the traditional physics-based models. 51 The traditional method of calculating surface currents from sea surface height relies on the 52 following physical principles. Assuming 2D ﬂow and shallow water pressure, the momentum 53 equation at the ocean surface can be written as:

∂ u + u · ∇u + f × u = −g∇η + F

(1)

∂t

54 where F is the frictional term due to wind stress. For a sufﬁciently low Rossby number (accel55 eration terms small; a questionable assumption for the submesoscale regime), the leading-order 56 balances are geostrophy and Ekman ﬂow. The surface ﬂow can be split into a geostrophic and an

4

57 ageostrophic, Ekman component (u = ug + ue), and this leading-order force balance can be written 58 as

f × ug = −g∇η

(2)

f × ue = F

(3)

59 Satellite altimetery products typically provide the sea surface height relative to the geoid (SSH, η), 60 with tidally driven SSH signals removed (LeTraon and Morrow 2001). The geostrophic velocities 61 associated with the SSH anomalies are given by:

f vg = g ∂ η

(4)

∂x

f ug = −g ∂ η

(5)

∂y

62 Since geostrophic balance does not hold at the equator ( f ≈ 0), typically (Ducet et al. 2000), 63 a higher order “equatorial geostrophic” treatment is used to compute velocities near the equa64 tor (Lagerloef et al. 1999), which is matched to the geostrophic regime away from the equator. 65 Usually, the data-assimilative processing algorithms used to map along-track SSH observations to 66 gridded maps (e.g. AVISO Ducet et al. 2000) also involve some form of temporal smoothing. The 67 process of combining measurements from multiple satellites and ﬁltering can also lead to spurious 68 physical signals (Arbic et al. 2012) leading to exaggerated forward-cascades of energy. 69 In addition to the geostrophic velocities, some products like OSCAR (Ocean Surface Current 70 Analysis Real Time, Bonjean and Lagerloef 2002), or GEKCO (Geostrophic and Ekman Current 71 Observatory, Sudre and Morrow 2008; Sudre et al. 2013) provide an additional ageostrophic com72 ponent due to Ekman ﬂow. The Ekman velocity is related to friction, which in the upper layer of

5

73 the ocean is provided by wind stress and can be derived from the following equations:

f ve + ∂ τx = 0

(6)

∂z

f ue − ∂ τy = 0

(7)

∂z

τx = ρAz ∂ u

(8)

∂z

τy = ρAz ∂ v

(9)

∂z

74 Since the Coriolis parameter f changes sign at the equator, the functional relationship between

75 velocity and wind stress is different between the two hemispheres. In the Northern Hemisphere

76 we derive:

ue = 1 (τx + τy)

(10)

ρ 2Az| f |

ve = 1 (−τx + τy)

(11)

ρ 2Az| f |

77 And in the Southern Hemisphere:

ue = 1 (τx − τy)

(12)

ρ 2Az| f |

ve = 1 (τx + τy)

(13)

ρ 2Az| f |

78 where Az is the linear drag coefﬁcient representing vertical eddy viscosity. Alternatively we can

79 write these equations in terms of the Ekman layer depth hEk which is related to the eddy viscosity

80 Az as:

hEk = 2Az

(14)

f

81 Both of these quantities (Az,hEk) are largely unknown for the global ocean and are estimated based

82 on empirical multiple linear regression from Lagrangian surface drifters (Lagerloef et al. 1999;

83 Sudre et al. 2013). Typical values of Ekman depth hEk in the ocean range from 10 to 40 meters .

84 So geostrophy + Ekman is the essential underlying physical/dynamical “model” currently used

85 for calculating surface currents from satellite observations. This procedure, combining observa-

6

86 tions with physical principles, represents a top-down approach A more bottom-up approach would 87 be a data driven regression model that extracts information about empirical relationships from 88 data. Recently, machine learning (ML) methods have grown in popularity and have been proposed 89 for a wide range of problems in ﬂuid dynamics: Reynolds-averaged turbulence models (Ling 90 et al. 2016), detecting eddies from altimetric SSH ﬁelds (Lguensat et al. 2017), reconstructing 91 subsurface ﬂow-ﬁelds in the ocean from surface ﬁelds (Chapman and Charantonis 2017; Bolton 92 and Zanna 2018), sub-gridscale modeling of PDEs (Bar-Sinai et al. 2018), predicting the evolu93 tion of large spatio-temporally chaotic dynamical systems (Pathak et al. 2018), parameterizing 94 unresolved processes, like convective systems in climate models (Gentine et al. 2018), or eddy 95 momentum ﬂuxes in ocean models (Bolton and Zanna 2018), to name just a few examples. 96 In this study we aim to tackle a simpler problem than those cited above: training a ML model 97 to “learn” the empirical relationships between the different observable quantities (sea surface 98 height, wind stress etc.) and surface currents (u, v). The hypothesis to be tested is the follow99 ing: Can we use machine learning to provide surface current estimates that resolve small scale 100 (balanced/unbalanced) turbulent processes better than geostrophy+Ekman? The motivation for 101 doing this exercise is two-fold:

102 1. It will help us understand how machine learning can be applied in the context of traditional

103

physics-based theories. ML is often criticised as a ”black box.” But can we use ML to com-

104

plement our physical understanding? This present problem serves as a good test-bed since

105

the corresponding physical model is straightforward and well understood.

106 2. It may be of practical value when SWOT mission launches.

107 We see this work as a stepping stone to more complex applications of ML to ocean remote sensing 108 of ocean surface currents.

7

109 This paper is organized as follows. In section 2, we introduce the dataset that was used, the 110 framework of the problem and identify the key variables that are required for training a statistical 111 model to predict surface currents. In section 3 we describe numerical evaluation procedure for 112 baseline physics-based model that we are hoping to match/beat. In sections 4 and 5 we discuss the 113 statistical models that we used. We start with the simplest statistical model - linear regression in 114 Section 4 before moving on to more advanced methods like neural networks in Section 5. In sec115 tion 6 we summarize some the ﬁndings from the present study, discuss some of the shortcomings 116 of the present approach, propose some solutions as well as outline some of the future goals for this 117 project.

118 2. Dataset and Input Features 119 To focus on the physical problem of relating currents to surface quantities, rather than the ob120 servational problems of spatio-temporal sampling and instrument noise, we choose to analyze a 121 high-resolution global general circulation model (GCM), which provides a fully sampled, noise122 free realization of the ocean state. The dataset used for this present study is the surface ﬁelds from 123 the ocean component of the Community Earth System Model (CESM), called the Parallel Ocean 124 Program (POP) simulation (Smith et al. 2010) which has a ≈ 0.1◦ horizontal resolution, with 125 daily-averaged outputs available for the surface ﬁelds. The model employs a B-grid (scalars at 126 cell centers, vectors at cell corners) for the horizontal discretization and a three-time-level second127 order-accurate modiﬁed leap-frog scheme for stepping forward in time. The model solves the 128 primitive equations of motion, which, for the surface ﬂow, are essentially (1). Further details 129 about the model physics and simulations can be found in Small et al. (2014); Uchida et al. (2017).

8

130 We selected this study because of the long time record of available data (approx. 40 years), 131 although, in retrospect, we found that all our models can be trained completely with just a few 132 days of output! 133 A key choice in any ML application is the choice of features, or inputs, to the model. In this 134 paper, we experiment with a range of different feature combinations; seeing which features are 135 most useful for estimating currents is indeed one of our aims. The features we choose are all 136 quantities that are observable from satellites: SSH, surface wind stress (τx and τy), sea-surface 137 temperature (SST, θ ) and sea-surface Salinity (SSS). Our choice of features is also motivated by 138 the traditional physics-based model: the same information that goes into the physics-based model 139 should also prove useful to the ML model Just like the physics-based model, all the ML models 140 we consider are pointwise, local models: the goal is to predict the 2D velocity vector u, v at each 141 point, using data from at or around that point. 142 Beyond these observable physical quantities, we also need to provide the models with geo143 graphic information about the location and spacing between the neighboring points. In the physics144 based model, geography enters in two places: 1) in the Coriolis parameter f , and 2) in the grid 145 spacing dx and dx, which varies over the model domain. Geographic information can be provided 146 to the statistical models in a few different ways. The ﬁrst method involves providing the same 147 kind of spatial information that is provided to the physical models, i.e. f and local grid spacings 148 - dx and dy. We can also encode geographic information (lat, lon) in our input features, using a 149 coordinate transformation of the form:

X

sin(lat)

Y

=

sin(lon) · cos(lat)

(15)

Z

−cos(lon) · cos(lat)

9

150 to transform the spherical polar lat-lon coordinate into a homogeneous three dimensional coordi151 nate (Gregor et al. 2017). This transformation gives the 3D position of each point in Euclidean 152 space, rather than the geometrically warped lat / lon space (which has a singularity at the poles and 153 a discontinuity at the dateline). Note that one of the coordinates -X, that comes out of this kind 154 of coordinate transformation, is functionally the same as the coriolis parameter ( f ) normalized by 155 2Ω (Ω = Earth’s rotation). Therefore we will use X as proxy for f for all the statistical models 156 throughout this study. We also explored another approach where the only geographic information 157 provided to the models is X (= 2fΩ ). 158 Since geostrophic balance involves spatial derivatives, it is not sufﬁcient to simply provide SSH 159 and the local coordinates pointwise. In order to compute derivatives, we also need the SSH of 160 the surrounding grid points as a local stencil around each grid point. The approach we used 161 for providing this local stencil is motivated by the horizontal discretization of the POP model. 162 Horizontal derivatives of scalars (like SSH) on the B grid requires 4 cell centers. At every timestep, 163 each variable of the The 1◦ POP model ouput has 3600 × 2400 data points (minus the land mask). 164 We can simply rearrange each variable as a 1800 × 1200 × 2 × 2 dataset or split it into 4 variables 165 each with 1800 × 1200 data points, corresponding to the 4 grid cells required for taking spatial 166 derivatives. The variables that require spatial a spatial stencil for physical models, we will refer to 167 as the stencil inputs. For the variables for which we do not need spatial derivatives for (like wind 168 stress), we can simply use every alternate grid point resulting in a dataset of size 1800 × 200. We 169 will refer to these variables as point inputs. For the purpose of the statistical models the inputs 170 need to be ﬂattened and have all the land points removed. This means that each input variable 171 has a shape of either N × 2 × 2 or N depending on whether or not a spatial stencil is used ( where 172 N = 1800 × 1200− the points that fall over land). Alternatively we can think of the stencilled

10

1 Estimating Ocean Surface Currents from Satellite Observable Quantities

2

with Machine Learning

3

Anirban Sinha∗

4

California Institute of Technology, Pasadena, CA

5

Ryan Abernathey

6

Lamont Doherty Earth Observatory of Columbia University, Palisades, NY

7 ∗Corresponding author address: Anirban Sinha, California Institute of Technology, Pasadena, CA 8 91125. 9 E-mail: [email protected]

Generated using v4.3.2 of the AMS LATEX template

1

ABSTRACT 2

10 Global surface currents are usually inferred from directly observed quanti11 ties like sea-surface height, wind stress by applying diagnostic balance rela12 tions (like geostrophy and Ekman ﬂow), which provide a good approximation 13 of the dynamics of currents at large scales and low Rossby numbers. However, 14 newer generation satellite altimeters (like the upcoming SWOT mission) will 15 capture the high wavenumber variability associated with unbalanced compo16 nents, and applying these balances directly may lead to an incorrect estimate 17 of the surface ﬂow. In this study, we explore Machine Learning (ML) as an al18 ternate route to infer surface currents from satellite observable quantities using 19 SSH, SST and wind stress from available ocean GCM simulation outputs as 20 inputs to make predictions of surface currents (u,v), which are then compared 21 against the true GCM output. We demonstrate that a linear regression model 22 is ineffective at predicting velocities accurately beyond localized regions. In 23 comparison, a relatively simple neural network (NN) can predict surface cur24 rents accurately over most of the global ocean, with lower mean errors than 25 geostrophy+Ekman. Using a local stencil of neighboring grid points as ad26 ditional input features, we can train the deep learning models to effectively 27 “learn” spatial gradients and the physics of surface currents. By passing the 28 stenciled variables through convolutional ﬁlters we can help the model learn 29 spatial gradients much faster. Various training strategies are explored using 30 systematic feature hold out, to understand the effect of each input feature on 31 the NN’s ability to accurately represent surface ﬂow. Sensitivity analysis of a 32 reference NN reveals that besides SSH, geographic information is an essential 33 ingredient required for making accurate predictions of surface currents with 34 deep learning.

3

35 1. Introduction

36 The most reliable spatially continuous estimates of global surface currents in the ocean come 37 from geostrophic balance applied to the sea surface height (SSH) ﬁeld observed by satellite al38 timeters. For the most part, the dynamics of slow, large-scale currents (up to the mesoscale) are 39 well approximated by geostrophic balance, leading to a direct relationship between gradients of 40 SSH and near-surface currents. However, current meter observations for the past few decades 41 and some of the newer generation ultra-high-resolution numerical model simulations indicate the 42 presence of an energized submesoscale as well as high-frequency waves / tides at smaller spatial 43 and temporal scales (Rocha et al. 2016). These high-frequency unbalanced motions are likely to 44 complicate the estimation of surface currents from from SSH in the upcoming Surface Water and 45 Ocean Topography (SWOT) mission (Morrow et al. 2018). That is, the high-wavenumber SSH 46 variability may represent a different, ageostrophic regime, where geostrophy might not be the best 47 route to infer velocities. Motivated by this problem, in this study we explore statistical models 48 based on machine learning (ML) algorithms for inferring surface currents from satellite observ49 able quantities like SSH, wind and temperature. These algorithms offer a potential alternative to 50 the traditional physics-based models. 51 The traditional method of calculating surface currents from sea surface height relies on the 52 following physical principles. Assuming 2D ﬂow and shallow water pressure, the momentum 53 equation at the ocean surface can be written as:

∂ u + u · ∇u + f × u = −g∇η + F

(1)

∂t

54 where F is the frictional term due to wind stress. For a sufﬁciently low Rossby number (accel55 eration terms small; a questionable assumption for the submesoscale regime), the leading-order 56 balances are geostrophy and Ekman ﬂow. The surface ﬂow can be split into a geostrophic and an

4

57 ageostrophic, Ekman component (u = ug + ue), and this leading-order force balance can be written 58 as

f × ug = −g∇η

(2)

f × ue = F

(3)

59 Satellite altimetery products typically provide the sea surface height relative to the geoid (SSH, η), 60 with tidally driven SSH signals removed (LeTraon and Morrow 2001). The geostrophic velocities 61 associated with the SSH anomalies are given by:

f vg = g ∂ η

(4)

∂x

f ug = −g ∂ η

(5)

∂y

62 Since geostrophic balance does not hold at the equator ( f ≈ 0), typically (Ducet et al. 2000), 63 a higher order “equatorial geostrophic” treatment is used to compute velocities near the equa64 tor (Lagerloef et al. 1999), which is matched to the geostrophic regime away from the equator. 65 Usually, the data-assimilative processing algorithms used to map along-track SSH observations to 66 gridded maps (e.g. AVISO Ducet et al. 2000) also involve some form of temporal smoothing. The 67 process of combining measurements from multiple satellites and ﬁltering can also lead to spurious 68 physical signals (Arbic et al. 2012) leading to exaggerated forward-cascades of energy. 69 In addition to the geostrophic velocities, some products like OSCAR (Ocean Surface Current 70 Analysis Real Time, Bonjean and Lagerloef 2002), or GEKCO (Geostrophic and Ekman Current 71 Observatory, Sudre and Morrow 2008; Sudre et al. 2013) provide an additional ageostrophic com72 ponent due to Ekman ﬂow. The Ekman velocity is related to friction, which in the upper layer of

5

73 the ocean is provided by wind stress and can be derived from the following equations:

f ve + ∂ τx = 0

(6)

∂z

f ue − ∂ τy = 0

(7)

∂z

τx = ρAz ∂ u

(8)

∂z

τy = ρAz ∂ v

(9)

∂z

74 Since the Coriolis parameter f changes sign at the equator, the functional relationship between

75 velocity and wind stress is different between the two hemispheres. In the Northern Hemisphere

76 we derive:

ue = 1 (τx + τy)

(10)

ρ 2Az| f |

ve = 1 (−τx + τy)

(11)

ρ 2Az| f |

77 And in the Southern Hemisphere:

ue = 1 (τx − τy)

(12)

ρ 2Az| f |

ve = 1 (τx + τy)

(13)

ρ 2Az| f |

78 where Az is the linear drag coefﬁcient representing vertical eddy viscosity. Alternatively we can

79 write these equations in terms of the Ekman layer depth hEk which is related to the eddy viscosity

80 Az as:

hEk = 2Az

(14)

f

81 Both of these quantities (Az,hEk) are largely unknown for the global ocean and are estimated based

82 on empirical multiple linear regression from Lagrangian surface drifters (Lagerloef et al. 1999;

83 Sudre et al. 2013). Typical values of Ekman depth hEk in the ocean range from 10 to 40 meters .

84 So geostrophy + Ekman is the essential underlying physical/dynamical “model” currently used

85 for calculating surface currents from satellite observations. This procedure, combining observa-

6

86 tions with physical principles, represents a top-down approach A more bottom-up approach would 87 be a data driven regression model that extracts information about empirical relationships from 88 data. Recently, machine learning (ML) methods have grown in popularity and have been proposed 89 for a wide range of problems in ﬂuid dynamics: Reynolds-averaged turbulence models (Ling 90 et al. 2016), detecting eddies from altimetric SSH ﬁelds (Lguensat et al. 2017), reconstructing 91 subsurface ﬂow-ﬁelds in the ocean from surface ﬁelds (Chapman and Charantonis 2017; Bolton 92 and Zanna 2018), sub-gridscale modeling of PDEs (Bar-Sinai et al. 2018), predicting the evolu93 tion of large spatio-temporally chaotic dynamical systems (Pathak et al. 2018), parameterizing 94 unresolved processes, like convective systems in climate models (Gentine et al. 2018), or eddy 95 momentum ﬂuxes in ocean models (Bolton and Zanna 2018), to name just a few examples. 96 In this study we aim to tackle a simpler problem than those cited above: training a ML model 97 to “learn” the empirical relationships between the different observable quantities (sea surface 98 height, wind stress etc.) and surface currents (u, v). The hypothesis to be tested is the follow99 ing: Can we use machine learning to provide surface current estimates that resolve small scale 100 (balanced/unbalanced) turbulent processes better than geostrophy+Ekman? The motivation for 101 doing this exercise is two-fold:

102 1. It will help us understand how machine learning can be applied in the context of traditional

103

physics-based theories. ML is often criticised as a ”black box.” But can we use ML to com-

104

plement our physical understanding? This present problem serves as a good test-bed since

105

the corresponding physical model is straightforward and well understood.

106 2. It may be of practical value when SWOT mission launches.

107 We see this work as a stepping stone to more complex applications of ML to ocean remote sensing 108 of ocean surface currents.

7

109 This paper is organized as follows. In section 2, we introduce the dataset that was used, the 110 framework of the problem and identify the key variables that are required for training a statistical 111 model to predict surface currents. In section 3 we describe numerical evaluation procedure for 112 baseline physics-based model that we are hoping to match/beat. In sections 4 and 5 we discuss the 113 statistical models that we used. We start with the simplest statistical model - linear regression in 114 Section 4 before moving on to more advanced methods like neural networks in Section 5. In sec115 tion 6 we summarize some the ﬁndings from the present study, discuss some of the shortcomings 116 of the present approach, propose some solutions as well as outline some of the future goals for this 117 project.

118 2. Dataset and Input Features 119 To focus on the physical problem of relating currents to surface quantities, rather than the ob120 servational problems of spatio-temporal sampling and instrument noise, we choose to analyze a 121 high-resolution global general circulation model (GCM), which provides a fully sampled, noise122 free realization of the ocean state. The dataset used for this present study is the surface ﬁelds from 123 the ocean component of the Community Earth System Model (CESM), called the Parallel Ocean 124 Program (POP) simulation (Smith et al. 2010) which has a ≈ 0.1◦ horizontal resolution, with 125 daily-averaged outputs available for the surface ﬁelds. The model employs a B-grid (scalars at 126 cell centers, vectors at cell corners) for the horizontal discretization and a three-time-level second127 order-accurate modiﬁed leap-frog scheme for stepping forward in time. The model solves the 128 primitive equations of motion, which, for the surface ﬂow, are essentially (1). Further details 129 about the model physics and simulations can be found in Small et al. (2014); Uchida et al. (2017).

8

130 We selected this study because of the long time record of available data (approx. 40 years), 131 although, in retrospect, we found that all our models can be trained completely with just a few 132 days of output! 133 A key choice in any ML application is the choice of features, or inputs, to the model. In this 134 paper, we experiment with a range of different feature combinations; seeing which features are 135 most useful for estimating currents is indeed one of our aims. The features we choose are all 136 quantities that are observable from satellites: SSH, surface wind stress (τx and τy), sea-surface 137 temperature (SST, θ ) and sea-surface Salinity (SSS). Our choice of features is also motivated by 138 the traditional physics-based model: the same information that goes into the physics-based model 139 should also prove useful to the ML model Just like the physics-based model, all the ML models 140 we consider are pointwise, local models: the goal is to predict the 2D velocity vector u, v at each 141 point, using data from at or around that point. 142 Beyond these observable physical quantities, we also need to provide the models with geo143 graphic information about the location and spacing between the neighboring points. In the physics144 based model, geography enters in two places: 1) in the Coriolis parameter f , and 2) in the grid 145 spacing dx and dx, which varies over the model domain. Geographic information can be provided 146 to the statistical models in a few different ways. The ﬁrst method involves providing the same 147 kind of spatial information that is provided to the physical models, i.e. f and local grid spacings 148 - dx and dy. We can also encode geographic information (lat, lon) in our input features, using a 149 coordinate transformation of the form:

X

sin(lat)

Y

=

sin(lon) · cos(lat)

(15)

Z

−cos(lon) · cos(lat)

9

150 to transform the spherical polar lat-lon coordinate into a homogeneous three dimensional coordi151 nate (Gregor et al. 2017). This transformation gives the 3D position of each point in Euclidean 152 space, rather than the geometrically warped lat / lon space (which has a singularity at the poles and 153 a discontinuity at the dateline). Note that one of the coordinates -X, that comes out of this kind 154 of coordinate transformation, is functionally the same as the coriolis parameter ( f ) normalized by 155 2Ω (Ω = Earth’s rotation). Therefore we will use X as proxy for f for all the statistical models 156 throughout this study. We also explored another approach where the only geographic information 157 provided to the models is X (= 2fΩ ). 158 Since geostrophic balance involves spatial derivatives, it is not sufﬁcient to simply provide SSH 159 and the local coordinates pointwise. In order to compute derivatives, we also need the SSH of 160 the surrounding grid points as a local stencil around each grid point. The approach we used 161 for providing this local stencil is motivated by the horizontal discretization of the POP model. 162 Horizontal derivatives of scalars (like SSH) on the B grid requires 4 cell centers. At every timestep, 163 each variable of the The 1◦ POP model ouput has 3600 × 2400 data points (minus the land mask). 164 We can simply rearrange each variable as a 1800 × 1200 × 2 × 2 dataset or split it into 4 variables 165 each with 1800 × 1200 data points, corresponding to the 4 grid cells required for taking spatial 166 derivatives. The variables that require spatial a spatial stencil for physical models, we will refer to 167 as the stencil inputs. For the variables for which we do not need spatial derivatives for (like wind 168 stress), we can simply use every alternate grid point resulting in a dataset of size 1800 × 200. We 169 will refer to these variables as point inputs. For the purpose of the statistical models the inputs 170 need to be ﬂattened and have all the land points removed. This means that each input variable 171 has a shape of either N × 2 × 2 or N depending on whether or not a spatial stencil is used ( where 172 N = 1800 × 1200− the points that fall over land). Alternatively we can think of the stencilled

10