Extract rich metadata from URLs.
npm install @borderless/unfurl --save
Unfurl attempts to parse and extract rich structured metadata from URLs.
import { scraper, urlScraper } from "@borderless/unfurl";
import * as plugins from "@borderless/unfurl/dist/plugins";
scraper accepts a request function and a list of plugins to use. The request function is expected to return a "page" object, which has the same shape as the input to scrape(page).
const scrape = scraper({
  request,
  plugins: [plugins.htmlmetaparser, plugins.exifdata],
});
// E.g. `fetch` from `popsicle`; `asObject()` and `stream()` below are not part of the built-in `fetch`.
const res = await fetch("http://example.com");
await scrape({
  url: res.url,
  status: res.status,
  headers: res.headers.asObject(),
  body: res.stream(), // Must stream the response instead of buffering it, to support large responses.
});
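To try the scraper without network access, a stubbed request function can return canned pages. The following is a minimal sketch, not part of the library: the `Page` interface is an assumption inferred from the fields passed to scrape above (with the body assumed to be a Node.js readable stream), and `Fixture` and `stubRequest` are hypothetical names.

```typescript
import { Readable } from "node:stream";

// Assumed page shape, inferred from the fields passed to scrape() above.
interface Page {
  url: string;
  status: number;
  headers: Record<string, string | string[]>;
  body: Readable;
}

// Hypothetical canned response used by the stub.
interface Fixture {
  status: number;
  headers: Record<string, string | string[]>;
  body: string;
}

// Hypothetical helper: returns a request(url) function backed by in-memory fixtures.
function stubRequest(fixtures: Record<string, Fixture>) {
  return async (url: string): Promise<Page> => {
    // Unknown URLs fall back to an empty 404 page.
    const fixture = fixtures[url] ?? { status: 404, headers: {}, body: "" };
    return {
      url,
      status: fixture.status,
      headers: fixture.headers,
      // Wrap the text in a stream so the body is consumed chunk by chunk.
      body: Readable.from([Buffer.from(fixture.body)]),
    };
  };
}

const request = stubRequest({
  "http://example.com": {
    status: 200,
    headers: { "content-type": "text/html" },
    body: "<title>Example</title>",
  },
});
```

Assuming the page shape matches what the library expects, the resulting request could be passed as the request option in place of a real HTTP client.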
urlScraper is a simpler wrapper around scraper that automatically makes a request(url) for the page.
const scrape = urlScraper({ request });
await scrape("http://example.com");
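Conceptually, a wrapper like this composes the two steps: request the page for a URL, then scrape it. The sketch below illustrates that idea only and is not the library's actual implementation; `RequestFn`, `ScrapeFn`, and `composeUrlScraper` are hypothetical names, and the usage relies on stand-in functions rather than real requests.

```typescript
// Hypothetical types: P is the page type, R is the scrape result.
type RequestFn<P> = (url: string) => Promise<P>;
type ScrapeFn<P, R> = (page: P) => Promise<R>;

// Illustrative only: fetch the page for a URL, then hand it to scrape.
function composeUrlScraper<P, R>(
  request: RequestFn<P>,
  scrape: ScrapeFn<P, R>,
) {
  return async (url: string): Promise<R> => scrape(await request(url));
}

// Usage with stand-in functions (no network involved).
const scrapeUrl = composeUrlScraper(
  async (url: string) => ({ url, status: 200 }),
  async (page: { url: string; status: number }) => `${page.status} ${page.url}`,
);
```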
Apache 2.0