Export schemas #4260

Draft · wants to merge 19 commits into develop
Conversation

@Anish9901 (Member) commented Feb 13, 2025

Fixes #4250

Lets you export your schemas along with their table contents.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the develop branch of the repository.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@seancolsen seancolsen added this to the v0.2.1 milestone Feb 13, 2025
@Anish9901 Anish9901 marked this pull request as ready for review February 17, 2025 13:42
@Anish9901 Anish9901 changed the title from "Export schemas[WIP]" to "Export schemas" Feb 17, 2025
@Anish9901 Anish9901 added the pr-status: review (A PR awaiting review) label Feb 17, 2025
@mathemancer (Contributor) left a comment

Overall, pretty good. I have some changes to request for the actual pg_dump command.

'-U', conn.info.user,
'-d', conn.info.dbname,
'-n', sql.Identifier(schema_name).as_string(),
'-O' # Don't include owner info in the dump

I think you should add the --inserts flag here. We should prioritize compatibility over performance, unless the performance really becomes a problem for someone.

I'd also like you to look into whether there's a flag to add that will get rid of the SET commands at the start of the file. Those can also cause problems when loading, and aren't usually needed.
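For illustration, a minimal sketch of how the flags under discussion could fit together in a subprocess-based pg_dump invocation. The helper name and surrounding call structure here are assumptions, not the PR's actual code; only the pg_dump flags themselves (-U, -d, -n, -O, --inserts) are real options.

# Illustrative sketch only; the helper and its structure are hypothetical.
import subprocess

from psycopg import sql

def dump_schema_sql(conn, schema_name):
    command = [
        'pg_dump',
        '-U', conn.info.user,
        '-d', conn.info.dbname,
        '-n', sql.Identifier(schema_name).as_string(),
        '-O',          # don't include owner info in the dump
        '--inserts',   # emit INSERT statements instead of COPY, favoring compatibility
    ]
    # Authentication (e.g. PGPASSWORD or a .pgpass file) and host/port handling
    # are omitted here for brevity.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout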

@Anish9901 (Member, Author) replied Feb 17, 2025

> I'd also like you to look into whether there's a flag to add that will get rid of the SET commands at the start of the file.

There is no such flag, unfortunately; users would have to take those statements out manually. Apparently, Postgres includes those statements to ensure that, upon restore, the server behaves exactly as it did when the dump was created.

Given this, and the fact that we use GENERATED BY DEFAULT AS IDENTITY for id columns, would it make sense for us to prioritize compatibility over performance? For someone dumping their local database and restoring it on a remote server, performance would be terrible if there's a lot of data, and I'd assume that's a fairly common use case for our product.

@Anish9901 (Member, Author)

@mathemancer As discussed in our 1:1, the dumps will now include the mathesar_types schema if any columns in the schema depend on our custom types.
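As a rough illustration of what such a dependency check could look like (a hypothetical helper and catalog query, not the PR's actual implementation), one could ask pg_catalog whether any live column in the schema uses a type from the mathesar_types schema, and add a second -n mathesar_types argument to pg_dump when it does (pg_dump accepts multiple -n switches):

# Hypothetical sketch: detect whether a schema's columns depend on mathesar_types.
def schema_uses_mathesar_types(conn, schema_name):
    query = """
        SELECT EXISTS (
            SELECT 1
            FROM pg_attribute a
            JOIN pg_class c ON a.attrelid = c.oid
            JOIN pg_namespace cn ON c.relnamespace = cn.oid
            JOIN pg_type t ON a.atttypid = t.oid
            JOIN pg_namespace tn ON t.typnamespace = tn.oid
            WHERE cn.nspname = %s
              AND tn.nspname = 'mathesar_types'
              AND a.attnum > 0
              AND NOT a.attisdropped
        )
    """
    with conn.cursor() as cur:
        cur.execute(query, (schema_name,))
        return cur.fetchone()[0]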

@mathemancer (Contributor) left a comment

LGTM.

For context, for anyone watching, we're going to assume that we'll have postgres-client installed on the service layer for the foreseeable future.

@mathemancer (Contributor) commented Feb 20, 2025

@pavish we'll wait to merge this till you review the front end changes.

@pavish (Member) left a comment

@Anish9901

  • The export fails whenever the schema has at least one table that the user does not have SELECT privileges on. For this scenario, I'd expect the "Export" button to either be disabled, show a clear error, or export only the tables that the user has access to.
    • I do think there might be additional cases where export would fail due to permission-related issues, and we need to test them all.
  • I'd like the Export button to be placed within a dropdown menu, e.g., similar to the dropdown on the Database page that contains the Disconnect Database option.
    • The chance that a user would export the entire schema as an SQL file is low enough that it doesn't warrant placing the Export button in such a prominent location.

@pavish pavish assigned Anish9901 and unassigned mathemancer and pavish Feb 20, 2025
@pavish pavish added the pr-status: revision (A PR awaiting follow-up work from its author after review) label and removed the pr-status: review (A PR awaiting review) label Feb 20, 2025
@Anish9901 (Member, Author)

@pavish I agree with your critique.

"Export" button to not be enabled, show a valid error, or export only the tables that the user has access to.

As far as export behavior is concerned, out of these three, I'd prefer the first option. I don't think this feature would be of much use to the user if we only dump a subset of tables from their schema. They could do that anyway with per table CSV export.

> I'd like the Export button to be placed within a dropdown menu, e.g., similar to the dropdown on the Database page that contains the Disconnect Database option.

I agree.

That said, it would take me a while to figure out these changes on the frontend, and given that we want to cut our release tomorrow, would you be able to take over the required frontend changes so that we can get this in before the feature freeze?

@zackkrida (Contributor)

@Anish9901 I do think @pavish's feedback should block this PR from inclusion in the current release if it can't be resolved by end of tomorrow. If it can, great!

I also think the triple-dot menu that triggers a dropdown, like the one on the DB page, is a good choice for the design.

Lastly, when I think about use cases for functionality like this, I can imagine there are users who would want to export only the structure of a schema and not the data, depending on the scenario. It would be nice to offer both options in the future. There's also an incongruity in the current approach: we offer CSV data export at the table level but SQL data export at the schema level, and both are labeled with the same "Export" button. I could see that being confusing, but I think it can be refined in the future.

@pavish (Member) commented Feb 21, 2025

@Anish9901 The UI changes are not my primary concern. I'm not entirely sure about our product strategy for exporting schemas. We've jumped right into implementation on this rather than having a product-level discussion on the requirements.

Having the export fail when even one table is owned by another user is not a good experience. I'd rather have the functionality work for all users and export only the data that the user has access to. Until we have that figured out, the other reasonable option is to limit this functionality to database superusers only.

I'd rather wait a couple of days to resolve this (and get it into the next release) than rush the PR just to get it into this one.

cc: @zackkrida

@pavish (Member) commented Feb 21, 2025

@Anish9901 @zackkrida @mathemancer

  1. I've made the frontend changes in this PR to move the export button into a dropdown: efbae44.
    • The option is also called 'Export SQL' instead of 'Export' to better differentiate it from the CSV export functionality we have for tables.
  2. I've made changes to display this dropdown and the export option only for database superusers in 1d01f70.
    • This ensures that the export does not end in failure.
    • This should suffice for users only working with a single role (including anyone creating databases from within Mathesar), and for other users working with a database superuser role.
    • Since the export is a SQL file, I think this is a reasonable restriction to have.
    • Note: This check is only done on the frontend. I think the backend is fine as-is since the request will fail if the user does not have access to a table in the schema.

With these changes, I think this PR could go into 0.2.1. I'll leave it to @zackkrida and @Anish9901 to make the call.

However, I also think we should discuss this feature holistically, along with its actual use case and user requirements, soon after; I'm not entirely convinced of its usefulness with the current implementation.

@zackkrida (Contributor)

@Anish9901 @pavish @mathemancer:

I also think the PR could be merged in its current state. I really appreciate the improvements you've made, @pavish. However, I think it is best to draft this PR and save it for the next release for the following reasons:

It isn't clear to me yet who this functionality is for and what goal it's intended to serve. When we do include it in a release, I'd like to be able to give a clear impression of the value it offers to our users. Furthermore, if some discussion from the team and a refinement of this PR could make it significantly more useful, I'd like to take the extra time to do so before we include it in a release.

I'll draft this now just so it's clear that it shouldn't be merged in the short term.

@zackkrida zackkrida marked this pull request as draft February 21, 2025 13:03
@zackkrida zackkrida modified the milestones: v0.2.1, v0.2.2 Feb 21, 2025
@Anish9901 (Member, Author)

@pavish @zackkrida Here are some of my thoughts:

The problem with exporting a subset of tables as SQL:

  • When dumping a subset of tables using the -t flag, information about the schema structure is lost.
    • This means that not only is the schema definition lost, but so are any non-table objects stored within the schema (functions, custom types, etc.) that the table might depend upon. Without these objects, the restore might fail and the dump would be useless.
  • If someone has SELECT only on a referring table but not on the referenced table(s), the constraint information is still included in the dump, and the restore would eventually fail if the referenced table(s) are not present on the target database.
  • To dump a table successfully, you not only need SELECT on that table, but also SELECT on the sequences used by that table.
  • Given that a lot of our users want read-only setups, and that we don't even support granting permissions on sequences through Mathesar, this feature would be useless for someone with only SELECT on a table.
  • Exporting CSV, on the other hand, requires only the SELECT privilege on the table, and is more portable and accessible for someone wanting to use the data in other applications (e.g. Excel, Airtable).
  • All this to say: if someone with limited privileges on the database wants their data out of Postgres, SQL is probably not a suitable format for them. Also, having "schema exports" without the schema structure definition makes no sense to me.

Who is this feature useful for?

  • Anyone trying out Mathesar for the first time, or a long-time user with data in one of our internal dockerized databases, who wants to move their data to a more robust Postgres platform.
  • Admins wanting regular, easily restorable backups of their data. Given that we are still in beta and prone to mistakes, being able to take backups easily would give users confidence.

How can we extend this in the future?

  • Allowing these dumps to be restored easily through Mathesar (this would require design work and probably more than a couple of release cycles).
  • Allowing all accessible tables to be exported as CSVs within a zip/tar file in one click from the schema page (see the sketch after this list).
  • [Brent's idea] Include metadata that would allow us to restore Mathesar-specific metadata, e.g. Explorations, along with the dump.
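A rough sketch of how the one-click CSV export idea above could work, using psycopg's COPY support; the helper and all names here are hypothetical, not an agreed design.

# Illustrative sketch only: export each accessible table in a schema as a CSV
# inside a single zip archive.
import io
import zipfile

from psycopg import sql

def export_schema_tables_as_zip(conn, schema_name, table_names):
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as archive:
        for table in table_names:
            copy_stmt = sql.SQL(
                "COPY {}.{} TO STDOUT WITH (FORMAT csv, HEADER true)"
            ).format(sql.Identifier(schema_name), sql.Identifier(table))
            out = io.BytesIO()
            with conn.cursor() as cur, cur.copy(copy_stmt) as copy:
                for chunk in copy:
                    out.write(chunk)
            archive.writestr(f"{table}.csv", out.getvalue())
    return buffer.getvalue()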

Labels: pr-status: revision (A PR awaiting follow-up work from its author after review)
Projects: None yet
Development: Successfully merging this pull request may close the issue "Allow exporting schemas".
5 participants