CLI Reconstruction Guide¶
AquaMVS provides command-line tools for the complete reconstruction workflow. This guide walks through reconstruction using the aquamvs command-line interface, from data preparation to mesh export.
Overview¶
The CLI workflow consists of:
Data preparation: Organize videos/images and calibration
Configuration generation: Auto-generate pipeline config from data
(Optional) ROI masking: Export reference images for drawing region-of-interest masks
Reconstruction: Run the full pipeline
Mesh export: Convert meshes to different formats
Prerequisites¶
Ensure you have installed AquaMVS and its dependencies:
# Install PyTorch (CPU or CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Install AquaMVS
pip install aquamvs
# Install optional dependencies
pip install git+https://github.com/cvg/LightGlue.git@edb2b83
pip install git+https://github.com/tlancaster6/RoMaV2.git
See the Installation Guide for detailed instructions.
Step 1: Prepare Data¶
Organize your data with the following structure:
project_data/
├── videos/ # Video files OR image directories
│ ├── e3v82e0-cam1.mp4
│ ├── e3v82e1-cam2.mp4
│ └── ...
└── calibration.json # AquaCal calibration file
Video files: Synchronized multi-camera videos (supported formats: .mp4, .avi, .mkv, .mov)
Image directories: Alternatively, use synchronized image sequences (one directory per camera with matching frame counts)
Calibration: AquaCal calibration JSON containing camera intrinsics, extrinsics, and refractive parameters
Download Example Dataset¶
An example dataset is available on Zenodo: https://doi.org/10.5281/zenodo.18702024
Step 2: Generate Configuration¶
Use aquamvs init to auto-generate a pipeline configuration from your data:
aquamvs init \
--input-dir project_data/videos \
--pattern "^([a-z0-9]+)-" \
--calibration project_data/calibration.json \
--output-dir ./output
Parameters:
--input-dir: Directory containing video files or camera subdirectories with images--pattern: Regex pattern to extract camera name from filename. The first capture group is used as the camera name. Example:"^([a-z0-9]+)-"extractse3v82e0frome3v82e0-cam1.mp4--calibration: Path to AquaCal calibration JSON--output-dir: Output directory for reconstruction results--preset: (Optional) Quality preset (fast,balanced,quality). Bakes optimized parameters into the generated config. Default:balanced.--config: (Optional) Output path for generated config YAML (default:config.yaml)
Output:
The command prints a summary of matched cameras and saves config.yaml:
======================================================================
Configuration Initialization Summary
======================================================================
[OK] Matched 12 camera(s):
e3v82e0 -> e3v82e0-cam1.mp4
e3v82e1 -> e3v82e1-cam2.mp4
...
[OK] Configuration saved to: config.yaml
======================================================================
The generated config contains default parameters for all pipeline stages. You can edit config.yaml to customize reconstruction settings.
Step 3: (Optional) Create ROI Masks¶
To limit reconstruction to a specific region of interest (e.g., exclude pool edges, instruments), you can create mask images:
aquamvs export-refs config.yaml --frame 0
What this does:
Reads frame 0 from all cameras
Applies undistortion (removes lens distortion)
Saves undistorted reference images to
output/reference_images/{camera}.png
Next steps:
Open the exported images in an image editor (GIMP, Photoshop, Paint.NET)
Draw a mask where white = region to reconstruct, black = region to ignore
Save masks as PNG files in a
masks/directory with the same camera namesUpdate your config:
mask_dir: ./masks
Re-run the pipeline — only masked regions will be reconstructed.
Step 4: Run Reconstruction¶
Execute the full reconstruction pipeline:
aquamvs run config.yaml
What happens:
Undistortion: Apply camera calibration to remove lens distortion
Feature matching: Extract and match features across camera pairs (LightGlue or RoMa)
Triangulation: Compute 3D points from feature correspondences (sparse mode) or depth ranges (full mode)
Plane sweep stereo: Dense depth estimation via photometric cost volume (full mode only)
Depth fusion: Merge multi-view depth maps into a single point cloud
Surface reconstruction: Generate a triangle mesh from the point cloud
Progress: A progress bar shows frame processing status. Logs indicate stage completion.
Optional flags:
-v/--verbose: Enable verbose (DEBUG) logging--device cuda: Override device (use GPU if available)-q/--quiet: Suppress progress bars (useful for non-interactive use, CI)
Example with GPU:
aquamvs run config.yaml --device cuda
Step 5: Examine Results¶
The output directory contains frame-wise results. Which files are present depends on the matcher type (roma vs lightglue) and pipeline mode (full vs sparse).
output/
├── config.yaml # Copy of the config used for this run
├── frame_000000/
│ ├── sparse/ # LightGlue and RoMa sparse only
│ │ └── sparse_cloud.pt
│ ├── features/ # Optional (runtime.save_features)
│ │ ├── {camera}.pt # LightGlue only: per-camera keypoints
│ │ └── {ref}_{src}.pt # Per-pair matches
│ ├── depth_maps/ # Full mode only (always saved)
│ │ └── {camera}.npz
│ ├── consistency_maps/ # Full mode, LightGlue only (runtime.save_consistency_maps)
│ │ ├── {camera}.npz
│ │ └── {camera}.png
│ ├── point_cloud/
│ │ ├── fused.ply # Full mode (always saved)
│ │ └── sparse.ply # Sparse mode (always saved)
│ ├── mesh/ # Always saved
│ │ └── surface.ply
│ └── viz/ # runtime.viz_enabled + viz_stages
│ ├── depth_{camera}.png
│ ├── confidence_{camera}.png
│ ├── sparse_{camera}.png # LightGlue only (viz_stages: features)
│ ├── matches_{ref}_{src}.png # LightGlue only (viz_stages: features)
│ ├── fused_top.png # viz_stages: scene
│ ├── fused_oblique.png
│ ├── fused_side.png
│ ├── mesh_top.png
│ ├── mesh_oblique.png
│ ├── mesh_side.png
│ └── rig.png # viz_stages: rig
├── frame_000010/
│ └── ...
└── summary/ # viz_stages: summary (multi-frame runs)
└── timeseries_gallery.png
Data outputs¶
sparse/sparse_cloud.pt – Triangulated 3D points from feature correspondences, stored as PyTorch tensors with points_3d and scores keys. Used internally to derive depth ranges for plane sweep stereo. LightGlue and RoMa sparse mode only (RoMa full mode skips triangulation).
features/{camera}.pt and features/{ref}_{src}.pt – Per-camera keypoints/descriptors and per-pair match correspondences. Useful for debugging feature quality. LightGlue saves both per-camera and per-pair files; RoMa sparse saves per-pair only; RoMa full does not save features. Config: runtime.save_features (default: off).
depth_maps/{camera}.npz – Per-camera depth and confidence maps (ray depth in meters, float32). Each .npz contains depth (H x W) and confidence (H x W) arrays. Full mode only. Always saved. Set runtime.keep_intermediates: false to delete after fusion.
consistency_maps/{camera}.npz and {camera}.png – Number of source cameras that agree on the depth at each pixel, saved as both raw counts (.npz) and colormapped visualization (.png). Only produced by the geometric consistency filter. Full mode, LightGlue only (RoMa full mode skips consistency filtering). Config: runtime.save_consistency_maps (default: off).
point_cloud/fused.ply – Fused point cloud merged from all cameras, with vertex colors. Produced in full mode after depth map fusion and outlier removal. Always saved.
point_cloud/sparse.ply – Colored sparse point cloud downsampled from triangulated correspondences. Produced in sparse mode. Always saved.
mesh/surface.ply – Reconstructed triangle mesh with vertex colors (Poisson reconstruction by default). Always saved.
Visualizations¶
All visualizations require runtime.viz_enabled: true (default: off). The runtime.viz_stages list controls which groups are generated; leave it empty to enable all stages.
viz/depth_{camera}.png and viz/confidence_{camera}.png – Colormapped depth and confidence maps for each ring camera. Stage: depth. Full mode only.
viz/sparse_{camera}.png and viz/matches_{ref}_{src}.png – Keypoint overlays on undistorted images and side-by-side match visualizations. Stage: features. LightGlue only.
viz/fused_top.png, fused_oblique.png, fused_side.png, mesh_top.png, mesh_oblique.png, mesh_side.png – Point cloud and mesh rendered from three canonical viewpoints (top-down, 45-degree oblique, side). Stage: scene.
viz/rig.png – Camera rig diagram showing camera frustums and the water plane, optionally overlaid with the fused point cloud. Stage: rig.
summary/timeseries_gallery.png – Grid gallery of height maps across all processed frames. Only generated for multi-frame runs. Stage: summary.
View Results¶
Point cloud:
import open3d as o3d
pcd = o3d.io.read_point_cloud("output/frame_000000/fused.ply")
o3d.visualization.draw_geometries([pcd])
Mesh:
mesh = o3d.io.read_triangle_mesh("output/frame_000000/surface.ply")
o3d.visualization.draw_geometries([mesh])
Step 6: Export Mesh¶
Convert the reconstructed mesh to other formats:
Single File Export¶
# Export to OBJ (widely supported, preserves colors)
aquamvs export-mesh output/frame_000000/surface.ply --format obj
# Export to STL with simplification (for 3D printing)
aquamvs export-mesh output/frame_000000/surface.ply --format stl --simplify 10000
# Export to GLB (compact, web-ready)
aquamvs export-mesh output/frame_000000/surface.ply --format glb
Batch Export¶
Convert all meshes in a directory:
aquamvs export-mesh --input-dir output/ --format obj --output-dir meshes/
Parameters:
--format: Output format (obj,stl,gltf,glb)--simplify: (Optional) Target face count for mesh simplification--input-dir: Batch mode — convert all.plyfiles in directory--output-dir: (Batch mode) Output directory (defaults to--input-dir)
Configuration Tips¶
Edit config.yaml to customize reconstruction:
Switch Matcher Type¶
sparse_matching:
matcher_type: lightglue # Fast, sparse features (default)
# matcher_type: roma # Slower, dense correspondences, higher accuracy
Adjust Depth Range¶
Focus reconstruction on a specific depth range (in meters, ray depth):
reconstruction:
depth_min: 0.5
depth_max: 2.0
How to determine range: Run once with defaults, inspect depth map statistics, then narrow range for second pass.
GPU vs. CPU¶
runtime:
device: cuda # Use GPU (requires CUDA-capable GPU)
# device: cpu # CPU-only (slower, but works everywhere)
Quality vs. Speed¶
Use a quality preset when generating your configuration:
aquamvs init --input-dir ... --preset fast # Fewer depth planes, larger batches — quickest results
aquamvs init --input-dir ... --preset balanced # Default tradeoffs (default)
aquamvs init --input-dir ... --preset quality # Maximum depth planes, best accuracy
The preset bakes optimized values directly into the generated config.yaml. To regenerate with a different preset, re-run aquamvs init with the desired --preset value.
For fine-grained control, manually adjust depth hypotheses in the generated config:
reconstruction:
num_depth_hypotheses: 128 # Default: 64, higher = better quality, longer runtime
depth_batch_size: 8 # Process depth planes in batches (GPU speedup)
Pipeline Mode¶
reconstruction:
pipeline_mode: full # Dense stereo (default, best quality)
# pipeline_mode: sparse # Sparse reconstruction (faster, lower quality)
Sparse mode: Uses only feature matches (no plane sweep stereo). Faster but produces sparser point clouds.
Full mode: Dense depth estimation for each camera. Slower but produces complete surface reconstructions.
Multi-Frame Processing¶
Process a frame range:
preprocessing:
frame_start: 0
frame_stop: 100 # Process frames 0-99
frame_step: 10 # Every 10th frame
Leave frame_stop: null to process all frames.
Output Control¶
Depth maps, point clouds, and meshes are always saved. Control other outputs:
runtime:
save_features: false # Per-camera features and matches (default: off)
save_consistency_maps: false # Consistency maps (default: off)
keep_intermediates: true # Keep depth maps after fusion (default: on)
Benchmarking¶
Compare LightGlue and RoMa reconstruction pathways on a single frame:
aquamvs benchmark config.yaml --frame 0
This runs four pathway variants (LightGlue sparse, LightGlue full, RoMa sparse, RoMa full) and prints a comparison table with timing, point count, and cloud density for each.
Optional flags:
--extractors lightglue roma: Run only specific extractors (default: all)--with-clahe: Enable CLAHE preprocessing for all pathways
Results are printed to the console as a formatted table. For sample output and pathway comparisons, see the Sample Output page.
Temporal Filtering¶
Apply temporal median filtering to remove fish/debris from underwater video:
aquamvs temporal-filter input_video.mp4 --output-dir filtered/ --window 30
Parameters:
input: Video file or directory of videos--output-dir: Output directory--window: Median window size in frames (default: 30)--framestep: Output every Nth frame (default: 1)--output-fps: (Optional) Target output frame rate. Overrides automatic FPS detection (useful when source video metadata is unreliable).--format: Output format (pngormp4, default:png)
The median filter removes moving objects (fish, particles) while preserving static structure (water surface).
See Also¶
Python API Tutorial: Programmatic workflow using the
PipelineclassTheory: Understand the refractive geometry and algorithms
API Reference: Detailed documentation of all modules and functions
Troubleshooting¶
See the Troubleshooting Guide for solutions to common issues with installation, pipeline errors, notebook usage, and configuration.