How to prevent single character matches? #136

Open
punkpeye opened this issue Sep 17, 2024 · 7 comments

@punkpeye

punkpeye commented Sep 17, 2024

This is being matched for query 'shoe'

[Screenshot 2024-09-17 at 8:42:56 AM]

What's the best way to prevent such matches?

I've experimented with different threshold values, like threshold: 0.2, but that didn't get me far, as it quickly started removing valid matches while still keeping these.

@farzher
Owner

farzher commented Sep 17, 2024

what "valid matches" have a score of 0.2?

there's not a way to prevent these matches. but you can just write your own filter to remove them based on result.indexes

@farzher
Owner

farzher commented Sep 17, 2024

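// returns false if any matched character stands alone, i.e. is not
// adjacent to another matched character
function valid_match(result) {
  const indexes = result.indexes
  let sequenceStart = 0
  // length of the current run of consecutive matched positions
  let sequenceLength = 1

  for (let i = 1; i < indexes.length; i++) {
    if (indexes[i] === indexes[i-1] + 1) {
      sequenceLength++
    } else {
      // the previous run just ended; reject if it was a lone character
      if (sequenceLength === 1) return false
      sequenceStart = i
      sequenceLength = 1
    }
  }

  // the final run must also be longer than one character
  return sequenceLength !== 1
}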

fuzzysort.go('shoe', ['i am designing an app that allows to chat with multiple LLMs at once.']).filter(valid_match)

@punkpeye
Author

I think there might be a mistake in the code? I see you have sequenceStart, but that's not referenced anywhere.

Anyway, I get the idea – what does indexes actually contain? The position of a match? How do I tell the length of the match then?
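
For reference, a minimal sketch of how match lengths could be derived, assuming result.indexes holds the positions in result.target of the characters that matched the query (the snippets in this thread treat it that way). The helper name groupRuns is hypothetical and not part of fuzzysort; it groups consecutive positions into runs, so each run's length is the length of one contiguous match.

```javascript
// Hypothetical helper (not part of fuzzysort): groups matched character
// positions into runs of consecutive indexes.
function groupRuns(indexes) {
  // Copy and sort so the input is not mutated and ascending order is guaranteed.
  const sorted = Array.from(indexes).sort((a, b) => a - b)
  const runs = []
  for (const index of sorted) {
    const last = runs[runs.length - 1]
    if (last && index === last.start + last.length) {
      last.length++ // index directly follows the current run, so extend it
    } else {
      runs.push({ start: index, length: 1 }) // otherwise start a new run
    }
  }
  return runs
}

// Example with made-up indexes: positions 5, 6, 7 form one run, 12 stands alone.
const runs = groupRuns([5, 6, 7, 12])
// runs is [{ start: 5, length: 3 }, { start: 12, length: 1 }]
```

A run with length 1 is a lone single-character match; target.slice(run.start, run.start + run.length) gives the matched text for a run.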

@punkpeye
Author

Okay, I figured it out:

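// Returns the contiguous substrings of `input` covered by `indexes`.
const extractMatches = (input: string, indexes: number[]): string[] => {
  const result: string[] = [];
  // A Set avoids rescanning the whole array for every character.
  const matched = new Set(indexes);
  let currentSubstring = '';

  for (const [i, element] of input.split('').entries()) {
    if (matched.has(i)) {
      // Matched character: extend the current substring.
      currentSubstring += element;
    } else if (currentSubstring) {
      // Unmatched character ends the current substring.
      result.push(currentSubstring);
      currentSubstring = '';
    }
  }

  // Flush a substring that runs to the end of the input.
  if (currentSubstring) {
    result.push(currentSubstring);
  }

  return result;
};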
.filter((result) => {
  const matches = extractMatches(result.target, result.indexes.slice());

  return matches.some((match) => match.length > 1);
})

Thank you!

punkpeye reopened this Sep 18, 2024
@punkpeye
Author

punkpeye commented Sep 18, 2024

This mostly fixed the issue... but you can still see the issue in highlight() logic. What would be the way to filter out those single-character highlights without re-implementing the entire highlight logic?

@farzher
Owner

farzher commented Sep 19, 2024

> I think there might be a mistake in the code? I see you have sequenceStart, but that's not referenced anywhere.

lol oops. it's AI generated

> This mostly fixed the issue... but you can still see the issue in highlight() logic. What would be the way to filter out those single-character highlights without re-implementing the entire highlight logic?

what's the issue with highlight? if you filtered out single-character matches, you shouldn't be trying to highlight them

@punkpeye
Author

There might be a legit result where some highlighted snippets are multiple characters and others are lone single characters. I would want to filter out those that are lone single characters.
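
One possible approach, sketched under assumptions rather than taken from the library: skip fuzzysort's highlight() for this case and build the highlighted string directly from result.target and result.indexes, wrapping only runs of two or more consecutive matched characters. The function name highlightRuns and its parameters (open, close, minLength) are hypothetical. The sketch returns a plain string and does not escape HTML in the target.

```javascript
// Hypothetical helper (not part of fuzzysort): wraps only runs of at least
// `minLength` consecutive matched characters, leaving lone matches plain.
function highlightRuns(target, indexes, open = '<b>', close = '</b>', minLength = 2) {
  // Copy and sort so the input is not mutated and ascending order is guaranteed.
  const sorted = Array.from(indexes).sort((a, b) => a - b)
  let out = ''
  let cursor = 0 // position in target up to which text has been emitted
  let i = 0
  while (i < sorted.length) {
    // Find the last index of the consecutive run that starts at sorted[i].
    let j = i
    while (j + 1 < sorted.length && sorted[j + 1] === sorted[j] + 1) j++
    const start = sorted[i]
    const end = sorted[j] + 1 // exclusive end of the run
    if (end - start >= minLength) {
      // Emit the plain text before the run, then the wrapped run itself.
      out += target.slice(cursor, start) + open + target.slice(start, end) + close
      cursor = end
    }
    i = j + 1
  }
  // Emit whatever plain text remains after the last wrapped run.
  return out + target.slice(cursor)
}

// Example with made-up indexes: 'a' at position 0 is a lone match and is
// skipped, while 'shoe' at positions 2-5 is wrapped.
const highlighted = highlightRuns('a shoe shop', [0, 2, 3, 4, 5])
// highlighted is 'a <b>shoe</b> shop'
```

With a real result this would be called as highlightRuns(result.target, result.indexes), which keeps a result whose matches are mixed but drops the single-character highlights from the output.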
