Thank you for this great work. I was impressed by the design even before you posted it on arXiv (I noticed it on OpenReview; I am not a reviewer).
Is it possible to directly employ the monocular depth results? In your design, only the features are used and the DPT head is dropped. But we know that Depth Anything V2 can produce high-quality depth, so it would be a pity if such prior information were lost. Have you run any related experiments?
Hi, thank you for your insightful question. Indeed, we initially considered directly using the monocular depth predictions from Depth Anything. However, a monodepth model predicts relative depth with unknown scale and shift parameters. For our application in Gaussian splatting, we require multi-view consistent depths that can be fused into a coherent global 3D representation, and we found it challenging to convert relative depths into scale-consistent ones. This issue becomes even more pronounced as the number of views increases.
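To make the scale/shift issue concrete, here is a minimal NumPy sketch (purely illustrative, not code from the paper): a relative depth map can be mapped to metric depth with a per-view least-squares fit of scale `s` and shift `t`, but since each view gets its own `(s, t)`, nothing forces the aligned maps to agree across views.

```python
import numpy as np

def align_scale_shift(d_rel, d_ref):
    """Solve min_{s,t} ||s * d_rel + t - d_ref||^2 in closed form (least squares)."""
    A = np.stack([d_rel.ravel(), np.ones(d_rel.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d_ref.ravel(), rcond=None)
    return s * d_rel + t

# Toy example: a "relative" depth that differs from metric depth by an
# unknown affine transform. The fit recovers it for this single view, but
# a second view would yield a different (s, t), breaking cross-view consistency.
rng = np.random.default_rng(0)
d_metric = rng.uniform(1.0, 5.0, size=(4, 4))   # ground-truth metric depth
d_rel = 0.5 * d_metric - 0.2                    # relative depth: unknown scale/shift
d_aligned = align_scale_shift(d_rel, d_metric)
```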
We instead explored an alternative approach, feature-level fusion, which worked surprisingly well. It is also very simple and avoids the complications of aligning relative depth scales, so we opted for this design over relying on direct depth predictions.
It's also worth noting a related observation: when fine-tuning a pre-trained relative depth model for metric depth prediction, a common strategy is to retain only the pre-trained encoder and attach a new decoder that predicts metric depth. Our design shares similarities with this approach.
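The encoder-reuse strategy can be sketched as follows in PyTorch. This is a hypothetical illustration, not the paper's actual architecture: the pre-trained encoder is kept, the original relative-depth head is discarded, and a freshly initialized head is trained to predict metric depth.

```python
import torch
import torch.nn as nn

class MetricDepthModel(nn.Module):
    """Reuse a pre-trained encoder; replace the relative-depth head with a new one."""
    def __init__(self, encoder: nn.Module, feat_dim: int):
        super().__init__()
        self.encoder = encoder            # pre-trained (e.g. from a relative-depth model)
        self.head = nn.Sequential(        # new decoder, trained from scratch
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, 1),       # one metric-depth value per feature vector
        )

    def forward(self, x):
        return self.head(self.encoder(x))

# Toy stand-in encoder to keep the sketch self-contained.
encoder = nn.Linear(8, 16)
model = MetricDepthModel(encoder, feat_dim=16)
out = model(torch.randn(2, 8))            # shape: (2, 1)
```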
I hope this helps and we’re happy to continue the discussion if you have further questions or insights.