Refractive Geometry¶

This page explains the mathematical foundations of refractive ray tracing for underwater multi-view stereo. The key challenge is that cameras are in air while targets are underwater, with a flat water surface between them. Light rays refract at the air-water interface according to Snell’s law, requiring a modified projection model.

Coordinate System¶

AquaMVS inherits the coordinate system convention from AquaCal:

World Frame

The origin is at the optical center of the reference camera (whichever camera was designated as reference during AquaCal calibration). Camera names are hardware identifiers (e.g., e3v82e0), not sequential labels.

+X: Right (when facing the scene)
+Y: Forward (optical axis of reference camera)
+Z: Down (into the water)
Units: Meters throughout

Camera Frame

OpenCV convention for individual cameras:

+X: Right
+Y: Down
+Z: Forward (optical axis)

Pixel Coordinates

Image coordinates use (u, v) where:

u: Column (horizontal pixel index)
v: Row (vertical pixel index)
Origin at top-left corner

Extrinsics Convention

Transformation from world to camera frame:

\[\mathbf{p}_{\text{cam}} = \mathbf{R} \mathbf{p}_{\text{world}} + \mathbf{t}\]

where \(\mathbf{R}\) is the rotation matrix and \(\mathbf{t}\) is the translation vector.

Typical Geometry

In a standard AquaMVS setup:

Cameras are positioned near \(Z \approx 0\)
Water surface is at \(Z = z_{\text{water}} > 0\) (e.g., 0.978 m)
Underwater targets are at \(Z > z_{\text{water}}\)
Interface normal is \(\mathbf{n} = [0, 0, -1]\) (points from water toward air)

Camera Model¶

AquaMVS uses a standard pinhole camera model with radial and tangential distortion. All images are undistorted in a preprocessing step, so the projection model operates on rectified images.

Intrinsic Matrix

The intrinsic matrix \(\mathbf{K}\) maps 3D camera coordinates to pixel coordinates:

\[\begin{split}\mathbf{K} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}\end{split}\]

\(f_x, f_y\): Focal lengths in pixels
\(c_x, c_y\): Principal point (optical center) in pixels

Standard Projection (no refraction)

For a 3D point \(\mathbf{p}_{\text{cam}} = [X, Y, Z]^T\) in camera frame:

\[\begin{split}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{Z} \mathbf{K} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}\end{split}\]

This standard model is not valid for underwater targets due to refraction.

Refractive Ray Casting¶

Ray casting is the inverse operation of projection: given a pixel coordinate, compute the 3D ray that passes through it. For refractive scenarios, this process has four steps.

Step 1: Pinhole Back-Projection

Cast a ray from the camera optical center through the pixel. In camera frame, the normalized ray direction is:

\[\begin{split}\mathbf{d}_{\text{cam}} = \mathbf{K}^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}\end{split}\]

Normalize to unit length: \(\mathbf{d}_{\text{cam}} \leftarrow \mathbf{d}_{\text{cam}} / \|\mathbf{d}_{\text{cam}}\|\)

Step 2: Transform to World Frame

Camera optical center in world frame:

\[\mathbf{C} = -\mathbf{R}^T \mathbf{t}\]

Ray direction in world frame:

\[\mathbf{d}_{\text{air}} = \mathbf{R}^T \mathbf{d}_{\text{cam}}\]

Step 3: Intersect Water Surface

Find the point where the ray crosses the plane \(Z = z_{\text{water}}\). Parametric ray equation:

\[\mathbf{p}(t) = \mathbf{C} + t \mathbf{d}_{\text{air}}\]

At the water surface:

\[C_z + t \, d_{z,\text{air}} = z_{\text{water}}\]

Solving for \(t\):

\[t = \frac{z_{\text{water}} - C_z}{d_{z,\text{air}}}\]

The intersection point is:

\[\mathbf{P} = \mathbf{C} + t \mathbf{d}_{\text{air}}\]

Step 4: Apply Snell’s Law

Refract the ray direction at the interface. See the next section for the vector form of Snell’s law.

        graph TD
  A["Camera in Air<br/>(optical center C)"] -->|"incident ray d_air"| B["Water Surface<br/>z = z_water"]
  B -->|"refracted ray d_water<br/>(Snell's law)"| C["Underwater Target<br/>(z > z_water)"]
  style B fill:#4fc3f7,stroke:#0288d1,stroke-width:2px

Snell’s Law: Vector Form¶

Snell’s law relates the incident and refracted ray directions at an interface between two media with different refractive indices.

Scalar Form

The traditional scalar formulation:

\[n_1 \sin \theta_i = n_2 \sin \theta_t\]

where:

\(n_1 = n_{\text{air}} \approx 1.0\): Refractive index of air
\(n_2 = n_{\text{water}} \approx 1.333\): Refractive index of water
\(\theta_i\): Angle of incidence (ray to normal)
\(\theta_t\): Angle of refraction (transmitted ray to normal)

Vector Form Derivation

For computational implementation, we need a vector formulation. Given:

\(\mathbf{d}_i\): Incident ray direction (unit vector)
\(\mathbf{n}\): Interface normal (unit vector, pointing from medium 2 to medium 1)
\(\eta = n_1 / n_2\): Ratio of refractive indices

The angle of incidence:

\[\cos \theta_i = -\mathbf{d}_i \cdot \mathbf{n}\]

(Note: The negative sign accounts for \(\mathbf{d}_i\) pointing toward the interface while \(\mathbf{n}\) points away from it.)

From Snell’s law:

\[\sin^2 \theta_t = \eta^2 (1 - \cos^2 \theta_i) = \eta^2 \sin^2 \theta_i\]

The refracted ray direction is:

\[\mathbf{d}_t = \eta \mathbf{d}_i + \left( \eta \cos \theta_i - \sqrt{1 - \sin^2 \theta_t} \right) \mathbf{n}\]

Total Internal Reflection

If \(\sin^2 \theta_t > 1\), total internal reflection occurs and there is no refracted ray. This happens when:

\[\eta^2 (1 - \cos^2 \theta_i) > 1\]

In practice, this is rare for air-to-water transitions at typical viewing angles but must be checked. AquaMVS returns a validity mask to flag these cases.

Implementation in AquaMVS

The vector form is implemented in aquamvs.projection.RefractiveProjectionModel. For air-to-water refraction:

\(\mathbf{d}_i = \mathbf{d}_{\text{air}}\) (incident ray from Step 3)
\(\mathbf{n} = [0, 0, -1]\) (points from water toward air)
\(\eta = n_{\text{air}} / n_{\text{water}} \approx 0.75\)
\(\mathbf{d}_t = \mathbf{d}_{\text{water}}\) (refracted ray into water)

The refracted ray originates at the water surface intersection point \(\mathbf{P}\) and points into the water.

Depth Parameterization¶

Depth maps in AquaMVS use ray depth (distance along the refracted ray) rather than world Z-coordinate. This parameterization is natural for plane sweep stereo and simplifies depth estimation.

Ray Depth Definition

For a pixel at \((u, v)\), ray casting returns:

\(\mathbf{O}\): Ray origin (water surface intersection point)
\(\mathbf{d}\): Ray direction (unit vector pointing into water)

A 3D point at ray depth \(d\) is:

\[\mathbf{p}(d) = \mathbf{O} + d \, \mathbf{d}\]

World Z Conversion

To convert ray depth to world Z-coordinate:

\[Z = O_z + d \, d_z\]

where \(O_z\) is the Z-component of the ray origin and \(d_z\) is the Z-component of the ray direction.

Why Ray Depth?

Ray depth is preferred over world Z for several reasons:

Plane Sweep: Depth hypotheses naturally correspond to distances along the ray. Sweeping in world Z would require non-uniform sampling.
Depth Priors: Sparse triangulation produces 3D points. Projecting onto the ray gives the depth directly without Z-to-depth conversion.
Numerical Stability: Ray depth is well-conditioned even for near-vertical rays, while Z-based parameterization can be ill-conditioned.

Depth Range Specification

Configuration files specify depth ranges in ray depth units (meters):

reconstruction:
  depth_min: 0.05  # meters along ray
  depth_max: 2.0   # meters along ray

These are independent of the water surface Z-coordinate or ray direction.

Connection to Code¶

The mathematical concepts described here are implemented in:

aquamvs.projection.RefractiveProjectionModel: Ray casting (cast_ray) and refractive projection (project).
Calibration: Calibration data structures providing camera parameters (intrinsics, extrinsics, water surface position, refractive indices).

For the underlying NumPy reference implementation (used during AquaCal calibration), see aquacal.core.refractive_geometry in the AquaCal library.

Next Steps¶

Now that we understand how rays are cast through the refractive interface, we can use them for depth estimation. The next section covers Dense Stereo Matching, which evaluates photometric similarity at discrete depth hypotheses to build dense depth maps.