# CLI Reconstruction Guide

AquaMVS provides command-line tools for the complete reconstruction workflow. This guide walks through reconstruction using the `aquamvs` command-line interface, from data preparation to mesh export.

## Overview

The CLI workflow consists of:

1. **Data preparation**: Organize videos/images and calibration
2. **Configuration generation**: Auto-generate pipeline config from data
3. **(Optional) ROI masking**: Export reference images for drawing region-of-interest masks
4. **Reconstruction**: Run the full pipeline
5. **Mesh export**: Convert meshes to different formats

## Prerequisites

Ensure you have installed AquaMVS and its dependencies:

```bash
# Install PyTorch (CPU or CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install AquaMVS
pip install aquamvs

# Install optional dependencies
pip install git+https://github.com/cvg/LightGlue.git@edb2b83
pip install git+https://github.com/tlancaster6/RoMaV2.git
```

See the [Installation Guide](installation.rst) for detailed instructions.

## Step 1: Prepare Data

Organize your data with the following structure:

```
project_data/
├── videos/               # Video files OR image directories
│   ├── e3v82e0-cam1.mp4
│   ├── e3v82e1-cam2.mp4
│   └── ...
└── calibration.json      # AquaCal calibration file
```

**Video files**: Synchronized multi-camera videos (supported formats: `.mp4`, `.avi`, `.mkv`, `.mov`)

**Image directories**: Alternatively, use synchronized image sequences (one directory per camera with matching frame counts)

**Calibration**: AquaCal calibration JSON containing camera intrinsics, extrinsics, and refractive parameters

### Download Example Dataset

An example dataset is available on Zenodo: [https://doi.org/10.5281/zenodo.18702024](https://doi.org/10.5281/zenodo.18702024)

## Step 2: Generate Configuration

Use `aquamvs init` to auto-generate a pipeline configuration from your data:

```bash
aquamvs init \
  --input-dir project_data/videos \
  --pattern "^([a-z0-9]+)-" \
  --calibration project_data/calibration.json \
  --output-dir ./output
```

**Parameters:**

- `--input-dir`: Directory containing video files or camera subdirectories with images
- `--pattern`: Regex pattern to extract camera name from filename. The first capture group is used as the camera name. Example: `"^([a-z0-9]+)-"` extracts `e3v82e0` from `e3v82e0-cam1.mp4`
- `--calibration`: Path to AquaCal calibration JSON
- `--output-dir`: Output directory for reconstruction results
- `--preset`: (Optional) Quality preset (`fast`, `balanced`, `quality`). Bakes optimized parameters into the generated config. Default: `balanced`.
- `--config`: (Optional) Output path for generated config YAML (default: `config.yaml`)

**Output:**

The command prints a summary of matched cameras and saves `config.yaml`:

```
======================================================================
Configuration Initialization Summary
======================================================================

[OK] Matched 12 camera(s):
  e3v82e0        -> e3v82e0-cam1.mp4
  e3v82e1        -> e3v82e1-cam2.mp4
  ...

[OK] Configuration saved to: config.yaml
======================================================================
```

The generated config contains default parameters for all pipeline stages. You can edit `config.yaml` to customize reconstruction settings.

## Step 3: (Optional) Create ROI Masks

To limit reconstruction to a specific region of interest (e.g., exclude pool edges, instruments), you can create mask images:

```bash
aquamvs export-refs config.yaml --frame 0
```

**What this does:**

1. Reads frame 0 from all cameras
2. Applies undistortion (removes lens distortion)
3. Saves undistorted reference images to `output/reference_images/{camera}.png`

**Next steps:**

1. Open the exported images in an image editor (GIMP, Photoshop, Paint.NET)
2. Draw a mask where white = region to reconstruct, black = region to ignore
3. Save masks as PNG files in a `masks/` directory with the same camera names
4. Update your config:

```yaml
mask_dir: ./masks
```

Re-run the pipeline — only masked regions will be reconstructed.

## Step 4: Run Reconstruction

Execute the full reconstruction pipeline:

```bash
aquamvs run config.yaml
```

**What happens:**

1. **Undistortion**: Apply camera calibration to remove lens distortion
2. **Feature matching**: Extract and match features across camera pairs (LightGlue or RoMa)
3. **Triangulation**: Compute 3D points from feature correspondences (sparse mode) or depth ranges (full mode)
4. **Plane sweep stereo**: Dense depth estimation via photometric cost volume (full mode only)
5. **Depth fusion**: Merge multi-view depth maps into a single point cloud
6. **Surface reconstruction**: Generate a triangle mesh from the point cloud

**Progress:** A progress bar shows frame processing status. Logs indicate stage completion.

**Optional flags:**

- `-v` / `--verbose`: Enable verbose (DEBUG) logging
- `--device cuda`: Override device (use GPU if available)
- `-q` / `--quiet`: Suppress progress bars (useful for non-interactive use, CI)

**Example with GPU:**

```bash
aquamvs run config.yaml --device cuda
```

## Step 5: Examine Results

The output directory contains frame-wise results. Which files are present depends on the matcher type (`roma` vs `lightglue`) and pipeline mode (`full` vs `sparse`).

```
output/
├── config.yaml                          # Copy of the config used for this run
├── frame_000000/
│   ├── sparse/                          # LightGlue and RoMa sparse only
│   │   └── sparse_cloud.pt
│   ├── features/                        # Optional (runtime.save_features)
│   │   ├── {camera}.pt                  #   LightGlue only: per-camera keypoints
│   │   └── {ref}_{src}.pt              #   Per-pair matches
│   ├── depth_maps/                      # Full mode only (always saved)
│   │   └── {camera}.npz
│   ├── consistency_maps/                # Full mode, LightGlue only (runtime.save_consistency_maps)
│   │   ├── {camera}.npz
│   │   └── {camera}.png
│   ├── point_cloud/
│   │   ├── fused.ply                    #   Full mode (always saved)
│   │   └── sparse.ply                   #   Sparse mode (always saved)
│   ├── mesh/                            # Always saved
│   │   └── surface.ply
│   └── viz/                             # runtime.viz_enabled + viz_stages
│       ├── depth_{camera}.png
│       ├── confidence_{camera}.png
│       ├── sparse_{camera}.png          #   LightGlue only (viz_stages: features)
│       ├── matches_{ref}_{src}.png      #   LightGlue only (viz_stages: features)
│       ├── fused_top.png                #   viz_stages: scene
│       ├── fused_oblique.png
│       ├── fused_side.png
│       ├── mesh_top.png
│       ├── mesh_oblique.png
│       ├── mesh_side.png
│       └── rig.png                      #   viz_stages: rig
├── frame_000010/
│   └── ...
└── summary/                             # viz_stages: summary (multi-frame runs)
    └── timeseries_gallery.png
```

### Data outputs

**`sparse/sparse_cloud.pt`** -- Triangulated 3D points from feature correspondences, stored as PyTorch tensors with `points_3d` and `scores` keys. Used internally to derive depth ranges for plane sweep stereo. *LightGlue and RoMa sparse mode only* (RoMa full mode skips triangulation).

**`features/{camera}.pt`** and **`features/{ref}_{src}.pt`** -- Per-camera keypoints/descriptors and per-pair match correspondences. Useful for debugging feature quality. LightGlue saves both per-camera and per-pair files; RoMa sparse saves per-pair only; RoMa full does not save features. *Config: `runtime.save_features` (default: off).*

**`depth_maps/{camera}.npz`** -- Per-camera depth and confidence maps (ray depth in meters, float32). Each `.npz` contains `depth` (H x W) and `confidence` (H x W) arrays. *Full mode only. Always saved. Set `runtime.keep_intermediates: false` to delete after fusion.*

**`consistency_maps/{camera}.npz`** and **`{camera}.png`** -- Number of source cameras that agree on the depth at each pixel, saved as both raw counts (`.npz`) and colormapped visualization (`.png`). Only produced by the geometric consistency filter. *Full mode, LightGlue only* (RoMa full mode skips consistency filtering). *Config: `runtime.save_consistency_maps` (default: off).*

**`point_cloud/fused.ply`** -- Fused point cloud merged from all cameras, with vertex colors. Produced in full mode after depth map fusion and outlier removal. *Always saved.*

**`point_cloud/sparse.ply`** -- Colored sparse point cloud downsampled from triangulated correspondences. Produced in sparse mode. *Always saved.*

**`mesh/surface.ply`** -- Reconstructed triangle mesh with vertex colors (Poisson reconstruction by default). *Always saved.*

### Visualizations

All visualizations require `runtime.viz_enabled: true` (default: off). The `runtime.viz_stages` list controls which groups are generated; leave it empty to enable all stages.

**`viz/depth_{camera}.png`** and **`viz/confidence_{camera}.png`** -- Colormapped depth and confidence maps for each ring camera. *Stage: `depth`. Full mode only.*

**`viz/sparse_{camera}.png`** and **`viz/matches_{ref}_{src}.png`** -- Keypoint overlays on undistorted images and side-by-side match visualizations. *Stage: `features`. LightGlue only.*

**`viz/fused_top.png`**, **`fused_oblique.png`**, **`fused_side.png`**, **`mesh_top.png`**, **`mesh_oblique.png`**, **`mesh_side.png`** -- Point cloud and mesh rendered from three canonical viewpoints (top-down, 45-degree oblique, side). *Stage: `scene`.*

**`viz/rig.png`** -- Camera rig diagram showing camera frustums and the water plane, optionally overlaid with the fused point cloud. *Stage: `rig`.*

**`summary/timeseries_gallery.png`** -- Grid gallery of height maps across all processed frames. Only generated for multi-frame runs. *Stage: `summary`.*

### View Results

**Point cloud:**

```python
import open3d as o3d
pcd = o3d.io.read_point_cloud("output/frame_000000/fused.ply")
o3d.visualization.draw_geometries([pcd])
```

**Mesh:**

```python
mesh = o3d.io.read_triangle_mesh("output/frame_000000/surface.ply")
o3d.visualization.draw_geometries([mesh])
```

## Step 6: Export Mesh

Convert the reconstructed mesh to other formats:

### Single File Export

```bash
# Export to OBJ (widely supported, preserves colors)
aquamvs export-mesh output/frame_000000/surface.ply --format obj

# Export to STL with simplification (for 3D printing)
aquamvs export-mesh output/frame_000000/surface.ply --format stl --simplify 10000

# Export to GLB (compact, web-ready)
aquamvs export-mesh output/frame_000000/surface.ply --format glb
```

### Batch Export

Convert all meshes in a directory:

```bash
aquamvs export-mesh --input-dir output/ --format obj --output-dir meshes/
```

**Parameters:**

- `--format`: Output format (`obj`, `stl`, `gltf`, `glb`)
- `--simplify`: (Optional) Target face count for mesh simplification
- `--input-dir`: Batch mode — convert all `.ply` files in directory
- `--output-dir`: (Batch mode) Output directory (defaults to `--input-dir`)

## Configuration Tips

Edit `config.yaml` to customize reconstruction:

### Switch Matcher Type

```yaml
sparse_matching:
  matcher_type: lightglue  # Fast, sparse features (default)
  # matcher_type: roma      # Slower, dense correspondences, higher accuracy
```

### Adjust Depth Range

Focus reconstruction on a specific depth range (in meters, ray depth):

```yaml
reconstruction:
  depth_min: 0.5
  depth_max: 2.0
```

**How to determine range:** Run once with defaults, inspect depth map statistics, then narrow range for second pass.

### GPU vs. CPU

```yaml
runtime:
  device: cuda  # Use GPU (requires CUDA-capable GPU)
  # device: cpu  # CPU-only (slower, but works everywhere)
```

### Quality vs. Speed

Use a quality preset when generating your configuration:

```bash
aquamvs init --input-dir ... --preset fast      # Fewer depth planes, larger batches — quickest results
aquamvs init --input-dir ... --preset balanced   # Default tradeoffs (default)
aquamvs init --input-dir ... --preset quality    # Maximum depth planes, best accuracy
```

The preset bakes optimized values directly into the generated `config.yaml`. To regenerate with a different preset, re-run `aquamvs init` with the desired `--preset` value.

For fine-grained control, manually adjust depth hypotheses in the generated config:

```yaml
reconstruction:
  num_depth_hypotheses: 128  # Default: 64, higher = better quality, longer runtime
  depth_batch_size: 8        # Process depth planes in batches (GPU speedup)
```

### Pipeline Mode

```yaml
reconstruction:
  pipeline_mode: full    # Dense stereo (default, best quality)
  # pipeline_mode: sparse  # Sparse reconstruction (faster, lower quality)
```

**Sparse mode:** Uses only feature matches (no plane sweep stereo). Faster but produces sparser point clouds.

**Full mode:** Dense depth estimation for each camera. Slower but produces complete surface reconstructions.

### Multi-Frame Processing

Process a frame range:

```yaml
preprocessing:
  frame_start: 0
  frame_stop: 100   # Process frames 0-99
  frame_step: 10    # Every 10th frame
```

Leave `frame_stop: null` to process all frames.

### Output Control

Depth maps, point clouds, and meshes are always saved. Control other outputs:

```yaml
runtime:
  save_features: false          # Per-camera features and matches (default: off)
  save_consistency_maps: false  # Consistency maps (default: off)
  keep_intermediates: true      # Keep depth maps after fusion (default: on)
```

## Benchmarking

Compare LightGlue and RoMa reconstruction pathways on a single frame:

```bash
aquamvs benchmark config.yaml --frame 0
```

This runs four pathway variants (LightGlue sparse, LightGlue full, RoMa sparse, RoMa full) and prints a comparison table with timing, point count, and cloud density for each.

**Optional flags:**

- `--extractors lightglue roma`: Run only specific extractors (default: all)
- `--with-clahe`: Enable CLAHE preprocessing for all pathways

Results are printed to the console as a formatted table. For sample output and pathway comparisons, see the [Sample Output](benchmarks) page.

## Temporal Filtering

Apply temporal median filtering to remove fish/debris from underwater video:

```bash
aquamvs temporal-filter input_video.mp4 --output-dir filtered/ --window 30
```

**Parameters:**

- `input`: Video file or directory of videos
- `--output-dir`: Output directory
- `--window`: Median window size in frames (default: 30)
- `--framestep`: Output every Nth frame (default: 1)
- `--output-fps`: (Optional) Target output frame rate. Overrides automatic FPS detection (useful when source video metadata is unreliable).
- `--format`: Output format (`png` or `mp4`, default: `png`)

The median filter removes moving objects (fish, particles) while preserving static structure (water surface).

## See Also

- [Python API Tutorial](tutorial/index.rst): Programmatic workflow using the `Pipeline` class
- [Theory](theory/index.rst): Understand the refractive geometry and algorithms
- [API Reference](api/index.rst): Detailed documentation of all modules and functions

## Troubleshooting

See the {doc}`Troubleshooting Guide <troubleshooting>` for solutions to common issues with installation, pipeline errors, notebook usage, and configuration.