InstantSplat

https://instantsplat.github.io/

https://arxiv.org/pdf/2403.20309v3

Goal: infer the camera poses directly, then jointly optimize the Gaussian ellipsoids G and the camera poses P.

Recovering Camera Parameters

  • A simple optimization problem based on the Weiszfeld algorithm computes the per-camera focal length:
    • \[f^{*} = \arg\min_{f} \sum_{i=0}^{W} \sum_{j=0}^{H} O^{i,j} \left\| (i', j') - f\, \frac{\left(P^{i,j,0},\, P^{i,j,1}\right)}{P^{i,j,2}} \right\|\] where \(i' = i - \tfrac{W}{2}\), \(j' = j - \tfrac{H}{2}\) are centered pixel coordinates and \(O^{i,j}\) is the per-pixel confidence.
    • \(\bar{f} = \frac{1}{N} \sum_{i=1}^{N} f^{*}_{i}\): the estimate is stabilized by averaging the per-view focal lengths across all N training views; this averaged focal length is used in all subsequent steps.
  • Camera Transformation
    • T = [R | t]
    • Can be computed by running PnP inside a RANSAC loop for each image pair (see the sketch after this list).
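
The two steps above can be illustrated with a short sketch. This is a minimal illustration of the idea, not the authors' code: the helper names (`estimate_focal`, `estimate_pose`) and the IRLS form of the Weiszfeld step are my own, and OpenCV's `solvePnPRansac` is used for the RANSAC + PnP step.

    # Minimal sketch (not the authors' code): per-view focal via a Weiszfeld-style
    # IRLS fit on a pointmap, then a camera pose via RANSAC + PnP with OpenCV.
    import numpy as np
    import cv2

    def estimate_focal(pointmap, conf, n_iters=10):
        """pointmap: (H, W, 3) camera-frame points P, conf: (H, W) confidence O."""
        H, W, _ = pointmap.shape
        u, v = np.meshgrid(np.arange(W) - W / 2, np.arange(H) - H / 2)  # (i', j')
        pix = np.stack([u, v], axis=-1).reshape(-1, 2)
        xy_over_z = (pointmap[..., :2] / pointmap[..., 2:3]).reshape(-1, 2)
        w = conf.reshape(-1)
        dot_px = (xy_over_z * pix).sum(-1)
        dot_xx = (xy_over_z ** 2).sum(-1)
        f = (w * dot_px).sum() / (w * dot_xx).sum()       # weighted least-squares init
        for _ in range(n_iters):                          # IRLS iterations (Weiszfeld)
            r = np.linalg.norm(pix - f * xy_over_z, axis=-1).clip(min=1e-8)
            f = (w / r * dot_px).sum() / (w / r * dot_xx).sum()
        return f

    def estimate_pose(pts3d, pts2d, f, H, W):
        """T = [R | t] from 3D-2D correspondences via RANSAC + PnP."""
        K = np.array([[f, 0, W / 2], [0, f, H / 2], [0, 0, 1]], dtype=np.float64)
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            pts3d.astype(np.float64), pts2d.astype(np.float64), K, None)
        R, _ = cv2.Rodrigues(rvec)
        return np.hstack([R, tvec])                       # 3x4 [R | t]

Averaging `estimate_focal` over all training views then gives \(\bar{f}\).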

Scaling Pair-wise Predictions to Globally Aligned Poses

Using DUSt3R

  • First construct a complete connectivity graph \(G = (V, \mathcal{E})\), where each vertex in \(V\) corresponds to the pointmap of a single image and every image pair forms an edge \(e = (n, m) \in \mathcal{E}\) (see the sketch after this list).
  • \(\{ P_i \in \mathbb{R}^{H \times W \times 3} \}_{i=1}^{N}\): the initial per-view pointmaps
  • \(\{ \tilde{P}_i \in \mathbb{R}^{H \times W \times 3} \}_{i=1}^{N}\): the globally aligned pointmaps
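
As a tiny illustration of the graph construction (my own sketch; the real pipeline then runs DUSt3R on every edge to obtain the two pair-wise pointmaps):

    from itertools import combinations

    def build_complete_graph(n_views):
        vertices = list(range(n_views))             # one vertex per image / pointmap
        edges = list(combinations(vertices, 2))     # every image pair e = (n, m)
        return vertices, edges

    # e.g. 4 views -> edges [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]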

Global Alignment

  • Jointly optimize the globally aligned pointmaps \(\tilde{P}\), the per-edge rigid transformations \(T_e\), and the per-edge scale factors \(\sigma_e\):
  • \[\tilde{P}^{*} = \arg\min_{\tilde{P},\,T,\,\sigma} \sum_{e \in \mathcal{E}} \sum_{v \in e} \sum_{i=1}^{HW} O^{i}_{v,e} \, \big\| \tilde{P}^{i}_{v} - \sigma_{e} T_{e} P^{i}_{v,e} \big\|\]
    • For a given pair \(e = (n, m)\), \(T_e\) should align both pointmaps \(P_{n,e}\) and \(P_{m,e}\) with the world-coordinate pointmaps \(\tilde{P}_{n}\) and \(\tilde{P}_{m}\); to avoid the degenerate solution \(\sigma_e = 0\), DUSt3R additionally enforces \(\prod_{e} \sigma_e = 1\) (see the sketch after this list).
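
A minimal sketch of this global alignment, written by me under several assumptions (pair-wise pointmaps stored as `P[k][v]` with `k` the edge index and `v` in `{n, m}`, confidences as `O[k][v]`, a 6D rotation parameterization, naive random initialization, and plain Adam); it is not the DUSt3R/InstantSplat implementation:

    import torch

    def global_align(P, O, edges, n_views, HW, n_iters=300, lr=1e-2):
        P_tilde = torch.nn.Parameter(torch.randn(n_views, HW, 3) * 0.01)   # world pointmaps
        log_sigma = torch.nn.Parameter(torch.zeros(len(edges)))            # per-edge log scale
        rot6d = torch.nn.Parameter(torch.eye(3)[:2].flatten().repeat(len(edges), 1))
        trans = torch.nn.Parameter(torch.zeros(len(edges), 3))
        opt = torch.optim.Adam([P_tilde, log_sigma, rot6d, trans], lr=lr)

        def rotmat(r6):  # 6D rotation parameterization -> (E, 3, 3) rotation matrices
            a1, a2 = r6[:, :3], r6[:, 3:]
            b1 = torch.nn.functional.normalize(a1, dim=-1)
            b2 = torch.nn.functional.normalize(a2 - (b1 * a2).sum(-1, True) * b1, dim=-1)
            return torch.stack([b1, b2, torch.cross(b1, b2, dim=-1)], dim=-2)

        for _ in range(n_iters):
            opt.zero_grad()
            sigma = log_sigma.exp()
            sigma = sigma / sigma.prod().pow(1.0 / len(edges))  # enforce prod(sigma) = 1
            R = rotmat(rot6d)
            loss = 0.0
            for k, (n, m) in enumerate(edges):
                for v in (n, m):                                # both views of edge e
                    aligned = sigma[k] * (P[k][v] @ R[k].T + trans[k])
                    loss = loss + (O[k][v] * (P_tilde[v] - aligned).norm(dim=-1)).sum()
            loss.backward()
            opt.step()
        return P_tilde.detach()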

Global Point Initialization

Essentially this is confidence-based (my own reading): redundant or duplicate points are removed, or rather masked out.

However, different versions of the paper give two different formulas for this step (one in arXiv v3, another in arXiv v6).
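
Whatever the exact formula, the effect is a confidence-based selection of the globally aligned points before they are used to initialize the Gaussians. A minimal sketch of my own (the threshold value and interface are assumptions, not the paper's exact criterion):

    import torch

    def init_global_points(P_tilde, O, colors, conf_thresh=1.5):
        """P_tilde: (N, H*W, 3) aligned pointmaps, O: (N, H*W) confidences,
        colors: (N, H*W, 3) per-pixel RGB. Keep only confident points."""
        mask = O > conf_thresh                  # drop low-confidence / redundant pixels
        return P_tilde[mask], colors[mask]      # (M, 3) points and their colors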

Joint Optimization for Alignment

\[G^{*}, T^{*} = \arg\min_{G,\,T} \sum_{v \in \mathcal{N}} \sum_{i=1}^{HW} \big\| \tilde{C}^{i}_{v}(G, T_{v}) - C^{i}_{v} \big\|\]

where \(\tilde{C}^{i}_{v}(G, T_{v})\) is the color rendered from the Gaussians \(G\) under pose \(T_{v}\) and \(C^{i}_{v}\) is the observed pixel color of training view \(v\).
        ###########################################################
        # InstantSplatting
        # means3D = pc.get_xyz
        rel_w2c = torch.eye(4, device=self._xyz.device)
        quaternion = viewpoint_camera.quaternion
        rel_w2c[:3, :3] = quaternion_to_matrix(normalize_quaternion(quaternion.unsqueeze(0))).squeeze(0)
        rel_w2c[:3, 3] = viewpoint_camera.T
        # Transform mean and rot of Gaussians to camera frame
        gaussians_xyz = self.get_xyz.clone()
        gaussians_rot = self.get_rotation.clone()
        # Bind the camera pose into the Gaussian parameters so the pose stays in the
        # autograd graph and can be jointly optimized with the Gaussians.
        # rel_w2c.detach().inverse() @ rel_w2c below is numerically the identity
        # (and so is conj(q.detach()) * q for the rotations), but it lets gradients
        # flow back to the camera pose.
        xyz_ones = torch.ones(gaussians_xyz.shape[0], 1, dtype=gaussians_xyz.dtype, device=gaussians_xyz.device)
        xyz_homo = torch.cat((gaussians_xyz, xyz_ones), dim=1)
        gaussians_xyz_trans = (rel_w2c.detach().inverse() @ rel_w2c @ xyz_homo.T).T[:, :3]
        gaussians_rot_trans = quaternion_raw_multiply(quaternion.detach() * torch.tensor([1, -1, -1, -1], device=quaternion.device), quaternion_raw_multiply(quaternion, gaussians_rot))
        ###########################################################
        means3D = gaussians_xyz_trans
        rotations = gaussians_rot_trans  # pc.get_rotation
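
For context, here is a minimal sketch (my own, not the repository's training script) of the loop that drives this joint optimization: each training view's quaternion and translation become learnable parameters, and both the pose optimizer and the Gaussian optimizer step on the same photometric loss. `render`, `gaussians`, `cameras`, `gt_images`, the learning rates, and the iteration count are all placeholders.

    import torch

    # Assumed placeholders: `cameras` with .quaternion/.T/.uid, a `gaussians` module,
    # a differentiable `render(camera, gaussians)`, and ground-truth `gt_images`.
    pose_params = []
    for cam in cameras:
        cam.quaternion = torch.nn.Parameter(cam.quaternion)    # learnable rotation (wxyz)
        cam.T = torch.nn.Parameter(cam.T)                      # learnable translation
        pose_params += [cam.quaternion, cam.T]

    pose_opt = torch.optim.Adam(pose_params, lr=1e-4)          # small lr for poses
    gauss_opt = torch.optim.Adam(gaussians.parameters(), lr=1e-3)

    for step in range(1000):
        cam = cameras[step % len(cameras)]
        rendered = render(cam, gaussians)                       # C~_v(G, T_v)
        loss = (rendered - gt_images[cam.uid]).abs().mean()     # photometric L1
        loss.backward()
        gauss_opt.step(); pose_opt.step()
        gauss_opt.zero_grad(); pose_opt.zero_grad()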


