To help automate discovery of new projects, I'd like to experiment with https://github.com/pgvector/pgvector and embeddings from a large language model to cluster projects together.
My plan is:
- generate embeddings from the README of each reviewed project
- add the pgvector extension to PostgreSQL
- query the database for the other projects closest to a given project's embedding
- compare the "nearest" projects to their categories and topics
- produce an average vector for each category of projects
- produce an average of the vectors for all the reviewed projects
- provide an interface (private for now due to API costs) that, for a newly proposed project, reports:
  - the closest existing projects
  - the closest categories
  - distance from each category average
  - distance from the total average
- experiment with a selection of open source repositories (both climate-related and totally unrelated) to find good distances to use as cut-off thresholds
- experiment with including repo name, topics, description and other metadata when generating embeddings
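As a rough illustration of the averaging and distance steps in the plan above, here is a minimal Python sketch. It makes several assumptions that are not part of the plan: embeddings are plain lists of floats already in memory, the distance is cosine distance (the metric pgvector exposes as its `<=>` operator), and the project names, category names, 3-dimensional vectors, and the helper names `category_centroids` and `report` are all made up for the example. In practice the nearest-project lookup would likely be a pgvector query rather than an in-memory sort.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Return 1 - cosine similarity (the metric pgvector exposes as <=>)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def category_centroids(projects):
    """Average embedding per category (hypothetical helper)."""
    grouped = defaultdict(list)
    for project in projects:
        grouped[project["category"]].append(project["embedding"])
    return {cat: mean_vector(vecs) for cat, vecs in grouped.items()}


def report(candidate, projects, k=3):
    """Summarise how a candidate embedding relates to the reviewed projects."""
    nearest = sorted(
        projects, key=lambda p: cosine_distance(candidate, p["embedding"])
    )[:k]
    centroids = category_centroids(projects)
    cat_dist = {cat: cosine_distance(candidate, c) for cat, c in centroids.items()}
    overall = mean_vector([p["embedding"] for p in projects])
    return {
        "closest_projects": [p["name"] for p in nearest],
        "closest_categories": sorted(cat_dist, key=cat_dist.get),
        "category_distances": cat_dist,
        "overall_distance": cosine_distance(candidate, overall),
    }


# Toy data: names, categories and vectors are invented for illustration only.
PROJECTS = [
    {"name": "solar-sim", "category": "energy", "embedding": [0.9, 0.1, 0.0]},
    {"name": "grid-model", "category": "energy", "embedding": [0.8, 0.2, 0.1]},
    {"name": "forest-watch", "category": "biosphere", "embedding": [0.1, 0.9, 0.2]},
    {"name": "ocean-temps", "category": "biosphere", "embedding": [0.2, 0.8, 0.3]},
]

result = report([0.85, 0.15, 0.05], PROJECTS, k=2)
```

Cut-off thresholds would then be comparisons against `result["overall_distance"]` or the per-category distances, with the actual values still to be found by the experiments listed in the plan.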
I'm also working on a related item in ecosyste-ms/awesome#3, which should help with classification of projects. In theory, you'll be able to see which lists a project is most similar to.