Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OCamldoc/Odoc highlighting #106

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

n-osborne
Copy link

This PR proposes to add syntax highlighting for OCamldoc and Odoc documentation.

This adds syntax highlighting for both *.mld files and documentation comments in OCaml files.

OCamldoc/Odoc syntax needs to load OCaml syntax in order to highlight code extracts. This PR uses the following strategy to avoid an infinite loop: documentation inside code extract inside documentation is highlighted as simple comments.

This means that we can have the following load chains:

  1. syntax/ocaml.vimsyntax/odoc.vimsyntax/ocaml.vim → STOP
  2. syntax/odoc.vimsyntax/ocaml.vim → STOP

This is joint work with @shym.

Copy link
Contributor

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been running with an older version of this patch for many weeks and it's awesome !

I've noticed a bug due to types in code blocks, in a .mld page:

{[
type t = int
]}

This is not recognized as text, int, [foo]

It seems related to type decls as it doesn't happen if followed by something else:

{[
type t = int
val f : int
]}

Also happens with invalid syntax so it might be related to ]} not being a end token:

{[
module
]}

@n-osborne
Copy link
Author

Thank you :-)

We can't reproduce the first example.

@shym
Copy link
Contributor

shym commented Jan 29, 2024

Besides that, the OCaml syntax is heavily using “regions”, for instance for module.
I’m not sure I really understand why it is written in this way and I didn’t find any way to escape, so that ]} would break those regions without modifying the OCaml syntax for that purpose (which is maybe not worth it, it is complex enough as it is).

Copy link
Contributor

@Maelan Maelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I just took a superficial look at the diff, without trying the code myself.

The bug found by @Julow seems very relevant. I’m a bit surprised that you cannot reproduce it. @Julow can you reproduce it with the latest version of the patch? If so, it’s indeed very likely because of ]} not being an ending token for the type region. In that case, ]} should be added to ending tokens of relevant regions. It is unfortunate complexity, but things are what they are.

(Unless, perhaps, there is a way to arrange the syntax-matching instructions in some order that ultimately does The Right Thing, with respect to endings of nested regions; I doubt it, though I admit at the moment I don’t have in mind the details of how nested regions work).

doc/ocaml.txt Outdated Show resolved Hide resolved
syntax/odoc.vim Outdated
syn region odocTargetSpecific matchgroup=odocMarker start="{%html:" end="%}" contains=@Spell
endif

syn region odocDyckWord contained start="{" end="}" contains=odocDyckWord
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny but obscure naming, why not something obvious like odocBracket or odocGoodBracket or odocBalancedBracket ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that’s how they’re called? Especially since we are in the context of syntax, it seemed fitting.
https://en.wikipedia.org/wiki/Dyck_language

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This region should have a transparent attribute (see :help syn-transparent).

Regarding naming: “Dyck word” is jargon known only to people who are knowledgeable in formal languages, and it equally applies to balanced square brackets in code spans (and to any recursive region really, including odocBold and so on). odocBalancedBrace is hopefully obvious to anyone reading that file, and it is more specific.

syntax/odoc.vim Show resolved Hide resolved
syntax/ocaml.vim Outdated Show resolved Hide resolved
@Julow
Copy link
Contributor

Julow commented Jan 30, 2024

I checked that I use the last version of the patch, merged on top of #81. I reproduce the bug both in a .mld file and in doc comment.

@shym
Copy link
Contributor

shym commented Jan 31, 2024

I can reproduce it, I don’t really what happened last time 🤔
Adding \]} to possible end tokens for ocamlTypeDef and ocamlTypeDefAnd fixes a few type examples for me. Now I wonder if we are missing other cases. I’m not sure what behaviour we should aim for in the module case.

@Julow
Copy link
Contributor

Julow commented Feb 1, 2024

This also interferes with regular comments:

(* type below is highlighted as a comment *)

type t

@n-osborne
Copy link
Author

This also interferes with regular comments:

(* type below is highlighted as a comment *)

type t

I have this behaviour only with neovim.

@shym
Copy link
Contributor

shym commented Feb 29, 2024

We’ve pushed an update that fixes the bug in which a single (* opens two ocamlComment scopes.
We’ve also updated the syntax so that documentation is accepted where comments are.
With this updated version, the mld

{[ type t = int ]}
This is as text, [code]

is correctly parsed with vim 9.1.16 and neovim 0.9.4 for us.
Is it broken for you? On which version of vim?

@Julow
Copy link
Contributor

Julow commented Mar 1, 2024

I confirm this now works fine :)

Though, the doc comments in ml and mli files are no longer rendered in the same color as regular comments, which is confusing. Also, code spans are not highlighted differently: (** foo [bar] *).

@shym
Copy link
Contributor

shym commented Mar 1, 2024

I confirm this now works fine :)

We’re making progress, great!
Unfortunately, we already found a bug :-( In the OCaml code:

module Make (X: sig end) =
  struct
    type t = int
  end

the body of the module is not highlighted correctly. This is similar to the bug we solved for comments, where an opening parenthesis was opening two syntax scopes. The parenthesis for the functor parameter is opening two ocamlModParam.
Maybe trying to inline the documentation highlighting would end up more reliable?

Though, the doc comments in ml and mli files are no longer rendered in the same color as regular comments, which is confusing.

You think there is more than getting accustomed to this new highlighting? We could indeed highlight the text body as comments. Our rationale is that comments are usually dimmed while we want documentation to stand out.

Also, code spans are not highlighted differently: (** foo [bar] *).

Indeed. I think odoc makes very little highlighting for inline code blocks, so we kept it very sober. Any suggestion?

@n-osborne
Copy link
Author

n-osborne commented Mar 22, 2024

We've removed ocamlModParam from ocamlModParam's contains. This fixes the last bug and doesn't seem to change things regarding #3 (even on today's master branch).

I believe all the known bugs are fixed.

@Maelan
Copy link
Contributor

Maelan commented Apr 2, 2024

Hi, sorry for the delay! At last I found time to have a closer look at the PR. I found a couple of bugs, I will send a review soon.

Copy link
Contributor

@Maelan Maelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At last here is my review.

I used the manual of odoc and that of ocamldoc to check for missing syntax features.

Things I noted to be missing, but which would be overkill for this PR:

  • crossrefs can contain quotes, e.g. {! section:"section-label-with-dashes" }
  • keywords (like @param or @see) each have their special syntax,
    e.g. @param name_of_arg Paragraph of text about the argument
    or @see <url_within_angle_brackets> Text for the link.
    To be honest, I’d like this to be highlighted (as the special syntax for @param and @raise always tricks me), but this can go in a followup PR.

The bugs mentioned in earlier messages seem resolved…

Except for the one about {[ module ]}. This also happens for things like {[ object ]} or even {[ ( ]}. Is that something we want to fix, or are we happy with linting becoming messy when a code block contains bits of syntax which, alone, are invalid? Or at least do we tolerate it because it’s too hard to fix?

On the other hand, and to my pleased surprise, things like {[ type ]} or {[ 42 : int ] work just fine, despite the type linter’s heavy reliance on regions (see e.g. ocamlTypeDef and ocamlTypeAnnot), because among the ending tokens for all these regions, there is ], which subsumes ]}. :-)

Regarding doc-comments being highlighted differently from regular comments:
I too find it a bit confusing, but it makes sense that it stands out; plus, trying to colorize doc-comments works badly with {b, {i, {e (see my reviewing comment). I think we can let the users give a custom style to doc-comments if they wish so (by using :hi ... for ocamlDocumentation, odocBold, odocItalic, odocEmphasis).

Regarding code spans ([BLA] and {v BLA v} and also {@LANGUAGE[ BLA ]}) not being highlighted:
I would say that it’s okay as long as the delimiters are highlighted and there is a match group the user can use for custom styling (in these cases, it’s respectively odocCode, odocVerbatim, odocCodeBlock). However, personally I would highlight the text in [BLA], because the fact that we use brackets, instead of Markdown-style backticks, always tricks me, in particular I keep visually confusing them with OCaml lists’ delimiters.

FWTW, I tested the PR with Vim 9.1 in a color terminal, and I used the following .ml file for testing:

  (* normal comment (* nested normal comment *)
   * speling eror
   * let *)
  let test : int = 42*3
  (** oDoc comment inside [.ml] file
   * @author me
   * @since always
   * @unknown or non-existing tag @.
   * {1 this is a heading}
   * {1this is a wonky heading}
   * {2:lbl this is a heading with a label}
   * {b this is bold}
   * {i this is italic}
   * {e this is emphasized {b very much}}
   * {^ superscript}
   * {_ subscript}
   * {% target-specific content, default target %}
   * {%mytarget: target-specific content %}
   * {%html: HTML content %}
   * {%latex: LaTeX content %}
   * {math LaTeX content %}
   * {m LaTeX content %}
   * {v verbatim   text [{!@  v}
   * {! extension:Module.Sub.crossref}
   * {! extension-decl:Module.Sub.crossref}
   * {{!val:Module.Sub.crossref} text for the cross-reference}
   * {{: ocaml.org/ } text for the web link}
   * {nonsensical} curly braces } are syntax errors {
   *
   * - unordered item
   * - unordered item
   * - unordered item
   *
   * + ordered item 1
   * + ordered item 2
   * + ordered item 3
   *
   * escaped characters: \@normal text \[ normal text \] \{ normal text \} \\
   * (wrong escape: \a)
   * some decorative asterisks
   * speling eror
   *
   * (* nested normal comment *)
   * (** nested doc-comment *)
   * (**)
   * let
   *)
  type t = (int * int (* comment in doc *))
  type t = (int * int (** doc-comment in doc *))
  (* type below should not be highlighted as a comment: *)
  type t
  (* empty/ruler comments: *)
  (**)
  (***)

  (* stop comment: *)
  (**/**)

  module Make (X: sig end) =
    struct
      type t = int
    end
  (*after*)

As well as the following .mld file:

this is an oDoc file, the file extension is [.mld]
@author me
@since always
@unknown or non-existing tag @.
{1 this is a heading}
{1this is a wonky heading}
{2:lbl this is a heading with a label}
{b this is bold}
{i this is italic}
{e this is emphasized {b very much}}
{^ superscript}
{_ subscript}
{% target-specific content, default target %}
{%mytarget: target-specific content %}
{%html: HTML content %}
{%latex: LaTeX content %}
{math LaTeX content %}
{m LaTeX content %}
{v verbatim   text [{!@  v}
{! extension:Module.Sub.crossref}
{! extension-decl:Module.Sub.crossref}
{{!val:Module.Sub.crossref} text for the cross-reference}
{{: ocaml.org/ } text for the web link}
{nonsensical} curly braces } are syntax errors {

- unordered item
- unordered item
- unordered item

+ ordered item 1
+ ordered item 2
+ ordered item 3

escaped characters: \@normal text \[ normal text \] \{ normal text \} \\
(wrong escape: \a)
(**)
** some decorative asterisks
speling eror

(* is this is a comment? *)
(** is this is a doc-comment? *)

below is a code block:
{@python[
    def f(x):
        return x + 1
]}

below are OCaml code blocks:
{@ocaml[
  (* normal comment (* nested normal comment *)
   * speling eror
   * let *)
  let test : int = 42*3
  (** oDoc comment inside [.ml] file
   ** {1 this is a heading}
   ** (* nested normal comment *)
   ** speling eror
   ** let
   **)
  type t = (int * int (* comment in doc *))
  type t = (int * int (** doc-comment in doc *))
  (* type below should not be highlighted as a comment: *)
  type t
  (* empty/ruler comments: *)
  (**)
  (***)
]}

{[
type t = int ->
]}

Is this recognized as oDoc text (int, [foo]) ?

{[
let f : int ->
]}

Is this recognized as oDoc text (int, [foo]) ?

{[
module
]}

(* this is still OCaml syntax! *)
Foo = Stdlib.Map.Make (Stdlib.Int)

]}

This is text again!

syntax/ocaml.vim Outdated Show resolved Hide resolved
syntax/ocaml.vim Outdated Show resolved Hide resolved
syntax/ocaml.vim Outdated Show resolved Hide resolved
syntax/ocaml.vim Outdated Show resolved Hide resolved
syntax/ocaml.vim Show resolved Hide resolved
syntax/odoc.vim Outdated Show resolved Hide resolved
syntax/odoc.vim Outdated Show resolved Hide resolved
syntax/odoc.vim Show resolved Hide resolved
syntax/odoc.vim Outdated
Comment on lines 62 to 74
" Shamelessly borrowed from HTML syntax
hi def odocBold term=bold cterm=bold gui=bold
hi def odocEmphasis term=bold,underline cterm=bold,underline gui=bold,underline
hi def odocItalic term=italic cterm=italic gui=italic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE:
No action needed, but something to note.
This resets any style set by a surrounding region,
so the matched text (is bolded/italicized but)
falls back to default colors.
For instance, even if we are inside a doc-comment in an OCaml file,
and the user chose to highlight the doc-comments somehow
(e.g. :hi link ocamlDocumentation ocamlComment
or :hi ocamlDocumentation ctermfg=red),
the bolded text will appear as white.
Unfortunately this is a limitation of Vim,
there is no way of inheriting surrounding style
(see e.g. here or there).
More modestly, with some amount of programming
(I found this technique here,
and someone even wrote a plugin),
we can copy the style given to a specific match group,
in our case ocamlDocumentation;
something like:

exec 'hi odocBold term=bold cterm=bold gui=bold'
  \. ' ctermfg=' . synIDattr(synIDtrans(hlID('ocamlDocumentation')), 'fg', 'cterm')
  \. ' ctermbg=' . synIDattr(synIDtrans(hlID('ocamlDocumentation')), 'bg', 'cterm')
  \. ' guifg='   . synIDattr(synIDtrans(hlID('ocamlDocumentation')), 'fg', 'gui')
  \. ' guibg='   . synIDattr(synIDtrans(hlID('ocamlDocumentation')), 'bg', 'gui')
" ditto for odocItalic, odocEmphasis

… but this would have to be done after style is given to ocamlDocumentation,
and I found no way of detecting this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, and then bolded text would be colored even in .mld files, which is something we don’t want, I guess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the context. I rather agree that it’s better, at least for now, to keep it simple and eventually fix it if needs be.

syntax/odoc.vim Outdated Show resolved Hide resolved
syntax/odoc.vim Outdated Show resolved Hide resolved
syntax/odoc.vim Outdated Show resolved Hide resolved
@Maelan
Copy link
Contributor

Maelan commented Apr 16, 2024

Now that I think of it, I think many of the regions in odoc.vim should be made transparent (:help syn-transparent), so that what’s inside them is styled just like what’s containing them (while still highlighting the delimiters). This makes it easier to change the styling of documentation text (because otherwise, regions that are not transparent don’t inherit style from their containers, they reset it). The relevant regions to be made transparent are:

odocLinkText ?
odocMiscInline
odocTable (both rules)
odocTableRow
odocTableEntry
odocList
odocListItem
odocDyckWord (already mentioned)
odocBalancedBracket (from another reviewing comment)

We don’t want to give the transparent attribute to regions that we might want to style specially (such as odocCode), because, apparently, transparent regions can’t be given proper style later.

syntax/odoc.vim Outdated Show resolved Hide resolved
@Maelan
Copy link
Contributor

Maelan commented Apr 16, 2024

Sorry for all these comments! I have finally a higher-level remark: when I tried the PR with a real-life .mli file, I found the bilingual ocaml/odoc highlighting to be too noisy for my eyes, and I was a bit lost. For instance, odoc tags are highlighted like Keyword, but the OCaml syntax also uses Keyword a lot, and (with my colorscheme at least) it stands out a lot. On the other hand, PreProc would almost make sense here, OCaml dosn’t use this group, and with my colorscheme it looks dimmer than Keyword, closer to the default styling.

So I did these customizations on my part:

hi link ocamlDocEncl PreProc
" ^ defined by PR review
hi link odocMarker PreProc
hi link odocCrossrefMarker odocMarker
" ^ defined by PR review
hi link odocTag PreProc
hi link odocCrossrefKw Structure
" ^ or PreProc
hi link odocLinkText odocItalic
hi link odocCode Constant
hi link odocCodeBlock odocCode
hi link odocVerbatim odocCodeBlock

… and I also disabled OCaml syntax linting within code blocks within odoc text. And now it looks better to me. These are just suggestions, in case you like it too.

n-osborne and others added 3 commits June 28, 2024 18:01
Co-authored-by: Samuel Hym <[email protected]>
Co-authored-by: Maëlan <[email protected]>
Co-authored-by: Samuel Hym <[email protected]>
Co-authored-by: Maëlan <[email protected]>
Disallow nested ocamlModParam. For some reason the ocamlModParam was
opened twice, but closed only once.

Co-authored-by: Samuel Hym <[email protected]>
Maelan added a commit to Maelan/vim-ocaml that referenced this pull request Jul 1, 2024
Maelan added a commit to Maelan/vim-ocaml that referenced this pull request Jul 1, 2024
- support escaped characters (`odocEscaped, odocEscapedError`)
- support escaped brackets and well-balanced brackets within `odocCode`
- report erroneous @tags (`odocTagError`)
- fix the regex for @tags so that it matches exact words, not prefixes
- define the `odocCrossrefMarker` match group
@Maelan
Copy link
Contributor

Maelan commented Jul 1, 2024

In case that helps, in the branch odoc-syntax-patched of my own fork I have commited most of the changes I suggested in this review, now rebased on top of your latest update. It is split in several commits for easier reviewing, but feel free to squash everything, or copy-paste the parts you want.

- support escaped characters (`odocEscaped, odocEscapedError`)
- support escaped brackets and well-balanced brackets within `odocCode`
- report erroneous @tags (`odocTagError`)
- fix the regex for @tags so that it matches exact words, not prefixes
- define the `odocCrossrefMarker` match group
@shym
Copy link
Contributor

shym commented Sep 27, 2024

First of all, thanks a lot for your very thorough review!
I hope we didn’t miss anything, as we took so much time to address it.

I have finally a higher-level remark: when I tried the PR with a real-life .mli file, I found the bilingual ocaml/odoc highlighting to be too noisy for my eyes, and I was a bit lost.
[...]

I’m not really good at styling, especially in a case that is so dependent on the independently-chosen color scheme.
I wonder now whether we should:

  • make a poll to pick the default color,
  • create a variable to disable highlighting inside .ml/.mli files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants