Is it possible to step further by using off-the-shelf mono-depth instead of the features only? #1

Open
JUGGHM opened this issue Oct 18, 2024 · 2 comments


@JUGGHM

JUGGHM commented Oct 18, 2024

Thank you for this great work. I was impressed by the design even before you posted it on arXiv (I noticed it on OpenReview; I am not a reviewer).

Is it possible to use the monocular depth predictions directly? In your design, only the features are used and the DPT head is dropped. But Depth Anything V2 is known to produce high-quality depth, and it would be a pity to lose that prior information. Have you tried any experiments along these lines?

@haofeixu
Member

Hi, thank you for your insightful question. Indeed, we initially considered directly using monocular depth predictions from Depth Anything. However, the monodepth model predicts relative depth values with unknown scale and shift parameters. For our application in Gaussian splatting, we require multi-view consistent depths, which can be combined into a coherent global 3D representation. We found it challenging to convert the relative depth to scale-consistent depths. This issue becomes even more pronounced as the number of views increases.

On the other hand, we explored an alternative approach, feature-level fusion, which we found worked surprisingly well. The method is also very simple and avoids the complications of aligning relative depth scales. As a result, we opted for this design over relying on direct depth predictions.
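To make the alignment problem concrete, here is a minimal sketch of the per-view scale-and-shift fit that direct use of relative depth would require. It is not code from this repository. The function name `align_scale_shift`, the toy values, and the availability of reference depths to fit against are all assumptions made for illustration. Each view needs its own fit, which is part of why the problem grows with the number of views.

```python
from typing import Sequence, Tuple


def align_scale_shift(rel: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit of (scale, shift) so that scale * rel + shift ~= ref."""
    n = len(rel)
    if n != len(ref) or n < 2:
        raise ValueError("need at least two paired samples")
    sum_d = sum(rel)
    sum_g = sum(ref)
    sum_dd = sum(d * d for d in rel)
    sum_dg = sum(d * g for d, g in zip(rel, ref))
    denom = n * sum_dd - sum_d * sum_d
    if abs(denom) < 1e-12:
        raise ValueError("relative depths are constant; scale is not identifiable")
    scale = (n * sum_dg - sum_d * sum_g) / denom
    shift = (sum_g - scale * sum_d) / n
    return scale, shift


# Hypothetical toy data: two views of the same points, each predicted with
# its own unknown scale and shift relative to the reference depths.
ref_depth = [1.0, 2.0, 3.0, 4.0]
view_a = [0.1, 0.3, 0.5, 0.7]  # constructed so that ref = 5 * a + 0.5
view_b = [2.0, 2.5, 3.0, 3.5]  # constructed so that ref = 2 * b - 3

scale_a, shift_a = align_scale_shift(view_a, ref_depth)
scale_b, shift_b = align_scale_shift(view_b, ref_depth)

# Only after per-view alignment do the two views agree on depth.
aligned_a = [scale_a * d + shift_a for d in view_a]
aligned_b = [scale_b * d + shift_b for d in view_b]
```

In this toy case the fit is exact because the data is noise-free and reference depths are given. In a feed-forward setting without such references, the per-view parameters would have to be estimated some other way, which is the difficulty described above.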

It's also worth noting a related observation: when fine-tuning a pre-trained relative depth model for metric depth predictions, a common strategy is to retain only the pre-trained encoder and introduce a new decoder to predict metric depth. Our design shares similarities with this approach.

I hope this helps and we’re happy to continue the discussion if you have further questions or insights.

@JUGGHM
Author

JUGGHM commented Oct 20, 2024

Thank you for your detailed and insightful answer!
