Organizations

@html-extract @dosbox-staging @next-LI

brandonrobertz/README.md

Um Yes Hello

I'm Brandon Roberts. I'm an independent data journalist specializing in open source and bringing computational techniques to journalism projects. You can read more on my site: bxroberts.org

Pinned

  1. SparseLSH (Public)

    A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets.

    Python, 143 stars, 27 forks

  2. propublica/django-collaborative (Public)

    ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

    Python, 99 stars, 18 forks

  3. autoscrape-py (Public)

    An automated, programming-free web scraper for interactive sites

    HTML, 107 stars, 17 forks

  4. chatgpt-document-extraction (Public archive)

    A proof of concept tool for using ChatGPT to transform messy text documents into structured JSON

    Python, 121 stars, 12 forks

  5. html-extract/hext.js (Public)

    Use Hext in a browser or with node. Hext is a domain-specific language for extracting structured data from HTML documents.

    C++, 5 stars, 1 fork

  6. llm-document-extraction (Public)

    A proof of concept tool for using local LLMs to transform messy text documents into structured JSON

    Python, 17 stars, 1 fork