Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow copying with ANSI escape code control sequences #17059

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

FuegoFro
Copy link

@FuegoFro FuegoFro commented Apr 13, 2024

Summary of the Pull Request

This extends the copy command to be able to include control sequences, for use in tools that subsequently know how to parse and display that.

References and Relevant Issues

#15703

Detailed Description of the Pull Request / Additional comments

At a high level, this:

  • Expands the CopyTextArgs to have a withControlSequences bool.
  • Plumbs that bool down through many layers to where we actuall get data out of the text buffer.
  • Modifies the existing TextBuffer::Serialize to be more generic and renames it to TextBuffer::ChunkedSerialize.
  • Uses the new ChunkedSerialize to generate the data for the copy request.

Validation Steps Performed

To test this I've manually:

  • Generated some styled terminal contents, copied it with the control sequences, pasted it into a file, cated the file and seen that it looks the same.
  • Set "firstWindowPreference": "persistedWindowLayout" and validated that the contents of windows are saved and restored with styling intact.

I also checked that Invoke-OpenConsoleTests passed.

PR Checklist

@microsoft-github-policy-service microsoft-github-policy-service bot added Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Area-Interaction Interacting with the vintage console window (as opposed to driving via API or hooks) Product-Terminal The new Windows Terminal. labels Apr 13, 2024

This comment has been minimized.

@FuegoFro
Copy link
Author

@microsoft-github-policy-service agree

Copy link
Author

@FuegoFro FuegoFro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if I should be looking to add automated tests with this PR; if so, I'd love a pointer on where would be best to do that (eg if there's an analogous test I could use as reference).

I've also left a few questions inline:

Comment on lines 209 to 214
if (WithControlSequences())
{
ss << L", withControlSequences: true";
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if I should have actual, translated special case names pulled from the resources when this is set (eg Copy with control sequences and Copy single line with control sequences). Happy to make that change if it's preferred!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meh, IMO this is good enough for now. We can always iterate on the naming in post

src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
@@ -2102,6 +2102,32 @@ std::wstring TextBuffer::GetPlainText(const CopyRequest& req) const
return selectedText;
}

// Routine Description:
// - Retrieves the text data from the buffer *with* ANSI escape code control sequences and presents it in a clipboard-ready format.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, any guidance on comment max length? I didn't wrap any of the doc comments I added, they get a bit long heh 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to stick to around 120 characters and 140 at most, because then you can still comfortably read it on smaller ~14" laptop screens. In this instance I'd probably split it into 2 lines and make them even length.

FYI you don't need to write "Routine Description:" on new functions. This is only a hold over from our old code base.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I'll go ahead and reflow the comments I've added (and remove that header haha) 👍

Copy link
Contributor

@tusharsnx tusharsnx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for doing this. I'm just a contributor like you 😄

I've added some helpful suggestions that you can think through while you wait for the core maintainers' reviews.

@@ -548,7 +548,7 @@ namespace winrt::TerminalApp::implementation
{
if (const auto& realArgs = args.ActionArgs().try_as<CopyTextArgs>())
{
const auto handled = _CopyText(realArgs.DismissSelection(), realArgs.SingleLine(), realArgs.CopyFormatting());
const auto handled = _CopyText(realArgs.DismissSelection(), realArgs.SingleLine(), realArgs.WithControlSequences(), realArgs.CopyFormatting());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can add a new CopyFormat, CopyFormat::VT, that would avoid the need for plumbing WithControlSequences down to ControlCore + introducing a new setting. It will just be carried forward with other copy formats.

If the user has "vt" in the copyFormatting list:

"copyFormatting": ["vt"],

we copy the text in the new VT format (including in any other formats set by the user).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add the new format VT = 0x04 here:

enum CopyFormat
{
HTML = 0x1,
RTF = 0x2,
All = 0xffffffff
};

In ControlCore::CopySelectionToClipboard, you can check if we need to copy in VT format like so:

const auto copyVt = WI_IsFlagSet(copyFormats, CopyFormat::VT);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I had the same thought when implementing this but wasn't sure if that's actually what we want here. In particular, currently the formats represent a wholly different clipboard format. That being said singleLine does affect all the formats so it's a bit odd to pass this boolean through to just affect the "plain text" (CF_UNICODETEXT) format, so I'm certainly open to having a "pseudo format" here that modifies the plain text rather than opting into another one! Not totally sure how to make this call haha.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely this isn't plain text, so it should actually be a separate clipboard format? Personally I would expect it to paste as ANSI only if the application can handle that, and the plain text format should still be available for applications that just accept text. Typically it should be the app's decision as to what format they want.

That said, I don't know if there's a generally agreed upon clipboard registration for ANSI text that will work automatically, but worst case I think apps should be able to offer users a choice of the available formats, and let them decide want they want at the time of the paste.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I don't know if there's a generally agreed upon clipboard registration for ANSI text that will work automatically, but worst case I think apps should be able to offer users a choice of the available formats, and let them decide want they want at the time of the paste.

As far as I'm aware there's no such format. It likely wouldn't work with a custom format either. My understanding is that the goal is that you can paste the ANSI text into markdown documents. For instance Discord supports this:
image

This will only work if the ANSI text is in the CF_UNICODETEXT format.

We've briefly talked about this as a team today and we felt like this new format should not be exposed in the settings UI here:
image

Precisely because of the interop issue that James mentioned. Instead it could be triggered with an action via the command palette. That is, we could for instance introduce a new action type for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an ideal world, the apps that wanted functionality like this would have got together and decided on a custom clipboard format name, say "ANSIX3.64", which they'd all agree to support. Then when someone like Discord saw that on the clipboard, they could have automatically pasted it surrounded by their ansi markdown wrapper. Having to do that manually is sad. But I guess we have to live with what we've got.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahaha indeed, I'm the person who hacked in the ansi "syntax highlighting" in Discord 😅

It's interesting (and, to be clear, I'm not sure what the "right" answer is here or if there is one), but in the context of using this in markdown (aka in Discord or in some other markdown as mentioned in #15703), what you're authoring is plaintext. Like, if I copied code from a website and it populated both the unicode and the HTML formats in the clipboard, when I pasted it into my markdown editor I'd personally expect it to paste the plaintext and not try to... paste formatted/wysiwyg contents, or to auto-insert a code block for me. And (again, in my opinion/use cases) similarly here. I would expect this to act less like a seamless parallel copy and more like a different behavior.

introduce a new action type for this

Especially given the above that makes sense! Would that mean something like what's done here, or not piggy backing on the copy action at all and introducing a new one with it's own arguments (if any)?

I really appreciate the discussion here, as well as you taking the time to talk this over with your team! 🙏

Copy link
Collaborator

@j4james j4james Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I pasted it into my markdown editor I'd personally expect it to paste the plaintext and not try to... paste formatted/wysiwyg contents

This is what the "Paste Special" menu option is for. Have a look at WordPad if you want to see an example of how it's supposed to work. Just type in some text with some formatting and copy it to the clipboard, then press Ctrl+Alt+V. You should see a popup dialog offering you a bunch of choices for how you want the content to paste. If you paste as "Rich Text" it'll keep the formatting. If you paste as "Unformatted Text", you just get the plain text.

If you're responsible for the Discord clipboard code, you should be able to implement the same sort of thing. So by default you can paste as plain text, but still give users the option to paste as ANSI via a "Paste Special" context menu, or shortcut key.

Edit: I should add that I don't feel strongly about this if everyone else disagrees with me, or it's too much effort. It's just that this is the way I would have expected it to work in a perfect world.

src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
Comment on lines 2568 to 2574
ChunkedSerialize(writeThreshold, L"\uFEFF", nullptr, [&](std::wstring& buffer, bool /* isDone */) {
const auto fileSize = gsl::narrow<DWORD>(buffer.size() * sizeof(wchar_t));
DWORD bytesWritten = 0;
THROW_IF_WIN32_BOOL_FALSE(WriteFile(file.get(), buffer.data(), fileSize, &bytesWritten, nullptr));
THROW_WIN32_IF_MSG(ERROR_WRITE_FAULT, bytesWritten != fileSize, "failed to write");
buffer.clear();
});
Copy link
Contributor

@tusharsnx tusharsnx Apr 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible refactoring:
I think we can avoid std::function usage here, by making ChunkedSerialize chunk buffer in row-by-row fashion instead of bytes-sized data. Then, both SerializeToPath and GetWithControlSequences can decide to read more or flush based on their specific logic.

This might be more complicated, so it's okay to not get into this right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if you decide to do this, then rename ChunkedSerialize to SerializeRow to be explicit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is a fantastic idea, it really simplifies things! ChunkedSerialize felt like a clunky interface with the callback and it knew too much about how it might be used (changing behavior based on whether or not copyReq was passed). This refactor is much cleaner, I think, and completely invalidates a number of the questions I had below (no longer have the inline lambda for the rowInfo, no longer have the question about how to pass a maybe-CopyReq) 🙏

Copy link
Author

@FuegoFro FuegoFro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh thank you so much for the review @tusharsnx! I really appreciate it 😊

Comment on lines 2568 to 2574
ChunkedSerialize(writeThreshold, L"\uFEFF", nullptr, [&](std::wstring& buffer, bool /* isDone */) {
const auto fileSize = gsl::narrow<DWORD>(buffer.size() * sizeof(wchar_t));
DWORD bytesWritten = 0;
THROW_IF_WIN32_BOOL_FALSE(WriteFile(file.get(), buffer.data(), fileSize, &bytesWritten, nullptr));
THROW_WIN32_IF_MSG(ERROR_WRITE_FAULT, bytesWritten != fileSize, "failed to write");
buffer.clear();
});
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is a fantastic idea, it really simplifies things! ChunkedSerialize felt like a clunky interface with the callback and it knew too much about how it might be used (changing behavior based on whether or not copyReq was passed). This refactor is much cleaner, I think, and completely invalidates a number of the questions I had below (no longer have the inline lambda for the rowInfo, no longer have the question about how to pass a maybe-CopyReq) 🙏

@@ -548,7 +548,7 @@ namespace winrt::TerminalApp::implementation
{
if (const auto& realArgs = args.ActionArgs().try_as<CopyTextArgs>())
{
const auto handled = _CopyText(realArgs.DismissSelection(), realArgs.SingleLine(), realArgs.CopyFormatting());
const auto handled = _CopyText(realArgs.DismissSelection(), realArgs.SingleLine(), realArgs.WithControlSequences(), realArgs.CopyFormatting());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I had the same thought when implementing this but wasn't sure if that's actually what we want here. In particular, currently the formats represent a wholly different clipboard format. That being said singleLine does affect all the formats so it's a bit odd to pass this boolean through to just affect the "plain text" (CF_UNICODETEXT) format, so I'm certainly open to having a "pseudo format" here that modifies the plain text rather than opting into another one! Not totally sure how to make this call haha.

src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
// - addLineBreak - Whether to add a line break at the end of the serialized row.
// - buffer - A string to write the serialized row into.
// - previousTextAttr - Used for tracking state across multiple calls to `SerializeRow` for sequential rows. The value will be mutated by the call. The initial call should contain `nullopt`, and subsequent calls should pass the value that was written by the previous call.
void TextBuffer::SerializeRow(const ROW& row, const til::CoordType startX, const til::CoordType endX, const bool addLineBreak, std::wstring& buffer, std::optional<TextAttribute>& previousTextAttr) const
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn't actually use anything on TextBuffer, should it still live on this class (as opposed to, eg, a freestanding function or maybe even on ROW)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROW is technically an internal implementation detail of our TextBuffer. It's not intentional that it has leaked out into tons of places over the years. Ideally it should stay internal 1, and if it can't, then it should at least be a member of TextBuffer.

Footnotes

  1. The only reason we have a ROW at all is because our current text buffer implementation happens to store text in a list of "rows" (often called a line buffer). But that's not the only way to implement text buffers. In fact, a line buffer is usually considered the worst kind of text buffer, as it's neither trivial to implement nor fast. Alternative algorithms like the rope often don't have "rows" at all. They do know the current cursor position in rows and columns of course, but navigation between rows/columns happens by walking through the underlying text - you cannot simply "jump" to the preceding/next row. This shows why it's paramount that we don't leak algorithm-specific internals out of TextBuffer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes a lot of sense! I've moved this to be a private method since it's only used internally by TextBuffer, so we're not adding a usage of ROW to its external interface. Is that roughly what you had in mind? Again, there's still not anything in the method that actually needs it to be a TextBuffer member, but I wasn't sure what else "stay internal" might mean here haha.

Comment on lines 2619 to 2620
// - buffer - A string to write the serialized row into.
// - previousTextAttr - Used for tracking state across multiple calls to `SerializeRow` for sequential rows. The value will be mutated by the call. The initial call should contain `nullopt`, and subsequent calls should pass the value that was written by the previous call.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular way to notate out params (like buffer) and in/out params (like previousTextAttr)?

Also for previousTextAttr, another option is to have it take a copy of the value and return a TextAttribute indicating the final state, which callers would then need to pass to subsequent calls; is that better? (maybe less spooky action at a distance? Mutating input params gives me the heebie jeebies 👻)

@lhecker
Copy link
Member

lhecker commented Apr 15, 2024

I'll try to review your PR sometime later this week and hopefully help you with designing the new TextBuffer functions.
I'm currently trying to finish up a massive, rather complex PR myself unfortunately, which is why I can't help at the moment. 😣

Copy link
Author

@FuegoFro FuegoFro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the preliminary feedback @j4james and @lhecker and the clear communication! And no worries about taking some time to get to this, I'm in no rush haha 😊

@@ -2102,6 +2102,32 @@ std::wstring TextBuffer::GetPlainText(const CopyRequest& req) const
return selectedText;
}

// Routine Description:
// - Retrieves the text data from the buffer *with* ANSI escape code control sequences and presents it in a clipboard-ready format.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I'll go ahead and reflow the comments I've added (and remove that header haha) 👍

// - addLineBreak - Whether to add a line break at the end of the serialized row.
// - buffer - A string to write the serialized row into.
// - previousTextAttr - Used for tracking state across multiple calls to `SerializeRow` for sequential rows. The value will be mutated by the call. The initial call should contain `nullopt`, and subsequent calls should pass the value that was written by the previous call.
void TextBuffer::SerializeRow(const ROW& row, const til::CoordType startX, const til::CoordType endX, const bool addLineBreak, std::wstring& buffer, std::optional<TextAttribute>& previousTextAttr) const
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes a lot of sense! I've moved this to be a private method since it's only used internally by TextBuffer, so we're not adding a usage of ROW to its external interface. Is that roughly what you had in mind? Again, there's still not anything in the method that actually needs it to be a TextBuffer member, but I wasn't sure what else "stay internal" might mean here haha.

@@ -548,7 +548,7 @@ namespace winrt::TerminalApp::implementation
{
if (const auto& realArgs = args.ActionArgs().try_as<CopyTextArgs>())
{
const auto handled = _CopyText(realArgs.DismissSelection(), realArgs.SingleLine(), realArgs.CopyFormatting());
const auto handled = _CopyText(realArgs.DismissSelection(), realArgs.SingleLine(), realArgs.WithControlSequences(), realArgs.CopyFormatting());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahaha indeed, I'm the person who hacked in the ansi "syntax highlighting" in Discord 😅

It's interesting (and, to be clear, I'm not sure what the "right" answer is here or if there is one), but in the context of using this in markdown (aka in Discord or in some other markdown as mentioned in #15703), what you're authoring is plaintext. Like, if I copied code from a website and it populated both the unicode and the HTML formats in the clipboard, when I pasted it into my markdown editor I'd personally expect it to paste the plaintext and not try to... paste formatted/wysiwyg contents, or to auto-insert a code block for me. And (again, in my opinion/use cases) similarly here. I would expect this to act less like a seamless parallel copy and more like a different behavior.

introduce a new action type for this

Especially given the above that makes sense! Would that mean something like what's done here, or not piggy backing on the copy action at all and introducing a new one with it's own arguments (if any)?

I really appreciate the discussion here, as well as you taking the time to talk this over with your team! 🙏

@lhecker
Copy link
Member

lhecker commented Apr 24, 2024

I've reviewed the code now and I think it's ready to merge. Ideally we would replace all those boolean parameters with something "more compact", an enum, struct, or something else, but I think that's outside the scope of the PR.

However, I'd personally like to wait merging this PR until we've released 1.21 to Preview (in a month or so?) and only merge it into version 1.22. That way we'll have enough time to test this and noodle about any followup improvements we may want to do.

@FuegoFro
Copy link
Author

Thank you for taking the time to review it! That's great to hear and makes sense regarding waiting for the release to merge 👍

I'm happy to make some refactors regarding the "more compact" thing you were talking about, but I'd need a little more about what you've got in mind. More compact in terms of syntax or bytes taken (or something else)? Would this effectively be combining singleLine, withControlSequences, formats, and maybe dismissSelection into a single struct that we pass through?

Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super fun. Let's do it.

Comment on lines 209 to 214
if (WithControlSequences())
{
ss << L", withControlSequences: true";
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meh, IMO this is good enough for now. We can always iterate on the naming in post

@zadjii-msft
Copy link
Member

The merge between this and main has gotten a little gnarly - I'm gonna take it out of the bugbash queue for today. Pretty sure it's not an impossible merge, but longer than I have time ATM

This extends the copy command to be able to include control sequences, for use in tools that subsequently know how to parse and display that.

At a high level, this:
- Expands the `CopyTextArgs` to have a `withControlSequences` bool.
- Plumbs that bool down through many layers to where we actuall get data
  out of the text buffer.
- Pulls out the core logic of `TextBuffer::Serialize` to a private `TextBuffer::_SerializeRow`.
- Uses the new `_SerializeRow` to generate the data for the copy request.

To test this I've manually:
- Generated some styled terminal contents, copied it with the control sequences, pasted it into a file, `cat`ed the file and seen that it looks the same.
- Set `"firstWindowPreference": "persistedWindowLayout"` and validated that the contents of windows are saved and restored with styling intact.
@lhecker
Copy link
Member

lhecker commented Oct 17, 2024

Uh oh. I forgot about this PR. 😥 I'll try to review & get this merged soon.

@lhecker lhecker self-assigned this Oct 17, 2024
@FuegoFro
Copy link
Author

Naw, it was my bad for taking so long to do the rebase, sorry about that! I've got it on top of latest main and retested everything 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Interaction Interacting with the vintage console window (as opposed to driving via API or hooks) Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Copy with Control Sequences
5 participants