Shaky attempt to add the HTML5 entities #738

Closed
wants to merge 1 commit into from


StoneCypher
Contributor

Hello. I don't speak Ruby, and I don't know how to run the tests on this. Use stink-eye heavily please.

I wrote this patch to attempt to address #736, which reports that all HTML5 entities are missing. I want ⅓.

Two assumptions in this patch concern me, above and beyond whether I should be allowed anywhere near a keyboard.

  1. Extended combined entities may be represented in their long form. This is tracked separately in #737 ("It is not clear how to write a multipoint entity in your entity list"). I currently represent ⫅̸ as 10949, which is acceptable, but it should be 10949 824 (U+2AC5 U+0338) instead.
  2. Integer placements may be repeated. Some entities have multiple names, such as 10878, which can be called `&nges;` or `&ngeqslant;` (both render as ⩾̸). I'm assuming I can just list them both, but I don't ruby, so I'm not 100% sure I'm reading that hash table's uniqueness criterion correctly.
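
As a quick check of the numbers in assumption 1, the decimal code points of a combined entity can be read straight off the string in a browser console. This is only a minimal sketch: the helper name `codePoints` is made up for illustration, and the sample string is assumed to be U+2AC5 followed by U+0338.

```javascript
// Hypothetical helper: list the decimal code points that make up a string.
// Spreading a string iterates by code point, so astral characters stay whole.
const codePoints = (text) => [...text].map((ch) => ch.codePointAt(0));

// Assumed sample: U+2AC5 followed by the combining long solidus overlay U+0338.
const combined = '\u2AC5\u0338';

console.log(codePoints(combined)); // [ 10949, 824 ]

// The same values in U+XXXX notation, for comparison with the table.
console.log(
  codePoints(combined).map((n) => 'U+' + n.toString(16).toUpperCase().padStart(4, '0'))
); // [ 'U+2AC5', 'U+0338' ]
```

The second code point is 0338 in hexadecimal, which is 824 in decimal.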

Thanks kindly.

@StoneCypher
Contributor Author

If you're curious, the method for producing this patch was:

  1. Using Firefox, control-select to pull the appropriate columns out of this table, after sorting on standard
  2. Postprocess with the below script in a browser console, because lazy
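```js
// Rows pasted from the Firefox table selection: name(s), glyph, code point(s), standard.
const chars = `Lang  ⟪   U+27EA (10218)  HTML 5.0
Rang  ⟫   U+27EB (10219)  HTML 5.0
... rest of Firefox paste here ...
varsupsetneqq, vsupnE   ⫌︀  U+2ACC (10956), U+FE00 (65024)  HTML 5.0
nparsl  ⫽⃥  U+2AFD (11005), U+20E5 (8421)   HTML 5.0`
  .split('\n')
  // Collapse the first triple-space gap to a double space so every column splits the same way.
  .map(row => row.replace('   ', '  ').split('  '))
  // Skip anything that is not a full table row (such as the placeholder line above).
  .filter(cols => cols.length >= 3);

// Emit one `[codepoint, 'name'],` line per entity name; comma-separated names share a code point.
const makeRow = (name, num) =>
  name.split(', ').map(nm => `        [${num}, '${nm}'],`).join('\n');

console.log(
  chars
    // Only the first decimal code point in parentheses is used, so multi-code-point
    // entities are truncated to their first code point.
    .map(row => makeRow(row[0], parseInt(row[2].split('(')[1].split(')')[0], 10)))
    .join('\n')
);
```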
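
The script above keeps only the first code point of each row. For the multi-code-point case deferred to #737, one way to pull every decimal code point out of a row's third column might look like the sketch below. The function name `allCodePoints` and the choice to return an array are assumptions for illustration only; nothing in this thread says what format the entity list would accept.

```javascript
// Hypothetical helper: extract every decimal code point from a column such as
// "U+2ACC (10956), U+FE00 (65024)". Each decimal value sits inside parentheses.
const allCodePoints = (column) =>
  [...column.matchAll(/\((\d+)\)/g)].map((match) => parseInt(match[1], 10));

console.log(allCodePoints('U+27EA (10218)'));                 // [ 10218 ]
console.log(allCodePoints('U+2ACC (10956), U+FE00 (65024)')); // [ 10956, 65024 ]
```

Matching on the parenthesised digits avoids depending on how many `U+` values a row contains.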

@StoneCypher
Contributor Author

@gettalong

@gettalong
Owner

@StoneCypher I will include this in the next release, though the expanded version that also handles multi-codepoint entities will have to wait.

@StoneCypher
Contributor Author

OK :)

@gettalong self-assigned this Mar 15, 2023
@gettalong
Owner

@StoneCypher Regarding your assumption 2: If multiple entries with the same code point exist, the mapping from code point to string representation uses the last entry. However, all string to code point representations are available.
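
The behaviour described for assumption 2 can be illustrated with a small JavaScript sketch (the project itself is Ruby, so this is not its code). It assumes the entity list is an ordered array of `[codePoint, name]` pairs, like the rows generated by the script earlier in the thread, and builds two lookup tables: one by code point, where a later entry overwrites an earlier one, and one by name, where every entry survives. The variable names and the sample entries are chosen for illustration.

```javascript
// Assumed shape: ordered [codePoint, name] pairs, with two names sharing one code point.
const entries = [
  [10878, 'ges'],
  [10878, 'geqslant'],
  [10218, 'Lang'],
];

// Code point -> name: Map#set overwrites, so the last entry for a code point wins.
const byCodePoint = new Map();
// Name -> code point: names are unique, so every entry is kept.
const byName = new Map();

for (const [codePoint, name] of entries) {
  byCodePoint.set(codePoint, name);
  byName.set(name, codePoint);
}

console.log(byCodePoint.get(10878)); // geqslant
console.log(byName.get('ges'));      // 10878
console.log(byName.get('geqslant')); // 10878
```

Under this model, listing both names is harmless for parsing, and only the order of the duplicates decides which name is produced when going from code point back to text.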

I have merged your changes and they will be in the next release (see master...devel), this time really! 😄

@gettalong closed this Mar 15, 2023
@StoneCypher
Contributor Author

Thank you
