[doc] support ngram_search function (#899)
Showing 9 changed files with 403 additions and 0 deletions.
67 changes: 67 additions & 0 deletions
docs/sql-manual/sql-functions/string-functions/ngram-search.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
--- | ||
{ | ||
"title": "NGRAM_SEARCH", | ||
"language": "en" | ||
} | ||
--- | ||
|
||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
## Description | ||
|
||
Calculates the N-gram similarity between `text` and `pattern`. The result ranges from 0 to 1; a higher value means the two strings are more similar. | ||
|
||
Both `pattern` and `gram_num` must be constants. If the length of either `text` or `pattern` is less than `gram_num`, the function returns 0. | ||
|
||
N-gram similarity is a text-similarity measure based on N-grams. An N-gram is a sequence of N consecutive characters or words taken from a text string. For example, for the string "text" with N=2, the bigrams are: {"te", "ex", "xt"}. | ||
|
||
The N-gram similarity is calculated as: | ||
|
||
2 * |Intersection| / (|text set| + |pattern set|) | ||
|
||
where |text set| and |pattern set| are the sizes of the N-gram sets of `text` and `pattern`, and |Intersection| is the size of the intersection of the two sets. | ||
|
||
Note that, by definition, a similarity of 1 does not necessarily mean the two strings are identical. | ||
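The formula above can be sketched in a few lines of Python. This is only an illustration of the set-based definition given on this page, not the engine's implementation: the helper names `ngram_set` and `ngram_similarity` are hypothetical, each character is treated as one ASCII character, and rejecting a non-positive `gram_num` is an assumption, since the page does not describe that case.

```python
def ngram_set(s: str, n: int) -> set:
    """Return the set of distinct substrings of length n in s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(text: str, pattern: str, gram_num: int) -> float:
    """Compute 2 * |Intersection| / (|text set| + |pattern set|) over distinct N-grams."""
    if gram_num <= 0:
        # Assumption of this sketch: the page does not say how non-positive N is handled.
        raise ValueError("gram_num must be positive")
    if len(text) < gram_num or len(pattern) < gram_num:
        # Documented rule: an input shorter than gram_num yields 0.
        return 0.0
    text_set = ngram_set(text, gram_num)
    pattern_set = ngram_set(pattern, gram_num)
    return 2 * len(text_set & pattern_set) / (len(text_set) + len(pattern_set))


if __name__ == "__main__":
    print(ngram_similarity("123456789", "12345", 3))    # 0.6
    print(ngram_similarity("abababab", "babababa", 2))  # 1.0
```

The second call shows why a similarity of 1 does not imply equal strings: "abababab" and "babababa" differ, but both reduce to the same bigram set {"ab", "ba"}.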
|
||
Only ASCII-encoded strings are supported. | ||
|
||
## Syntax | ||
|
||
`DOUBLE ngram_search(VARCHAR text,VARCHAR pattern,INT gram_num)` | ||
|
||
## Example | ||
|
||
```sql | ||
mysql> select ngram_search('123456789' , '12345' , 3); | ||
+---------------------------------------+ | ||
| ngram_search('123456789', '12345', 3) | | ||
+---------------------------------------+ | ||
| 0.6 | | ||
+---------------------------------------+ | ||
|
||
mysql> select ngram_search("abababab","babababa",2); | ||
+-----------------------------------------+ | ||
| ngram_search('abababab', 'babababa', 2) | | ||
+-----------------------------------------+ | ||
| 1 | | ||
+-----------------------------------------+ | ||
``` | ||
## keywords | ||
NGRAM_SEARCH,NGRAM,SEARCH |
67 changes: 67 additions & 0 deletions
...-content-docs/current/sql-manual/sql-functions/string-functions/ngram-search.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
--- | ||
{ | ||
"title": "NGRAM_SEARCH", | ||
"language": "zh-CN" | ||
} | ||
--- | ||
|
||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
## Description | ||
|
||
`DOUBLE ngram_search(VARCHAR text,VARCHAR pattern,INT gram_num)` | ||
|
||
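Calculates the N-gram similarity between `text` and `pattern`. The result ranges from 0 to 1; a higher value means the two strings are more similar. | ||
Both `pattern` and `gram_num` must be constants. | ||
If the length of `text` or `pattern` is less than `gram_num`, the function returns 0. | ||
|
||
N-gram similarity is a text-similarity measure based on N-grams. An N-gram is a sequence of N consecutive characters or words taken from a text string. For example, for the string "text" with N=2, the bigrams are: {"te", "ex", "xt"}. | ||
|
||
The N-gram similarity is calculated as 2 * |Intersection| / (|text set| + |pattern set|) | ||
|
||
where |text set| and |pattern set| are the sizes of the N-gram sets of `text` and `pattern`, and |Intersection| is the size of the intersection of the two sets. | ||
|
||
Note that, by definition, a similarity of 1 does not mean the two strings are identical. | ||
|
||
Only ASCII-encoded strings are supported. | ||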
|
||
## Syntax | ||
|
||
`DOUBLE ngram_search(VARCHAR text,VARCHAR pattern,INT gram_num)` | ||
|
||
## Example | ||
|
||
```sql | ||
mysql> select ngram_search('123456789' , '12345' , 3); | ||
+---------------------------------------+ | ||
| ngram_search('123456789', '12345', 3) | | ||
+---------------------------------------+ | ||
| 0.6 | | ||
+---------------------------------------+ | ||
|
||
mysql> select ngram_search("abababab","babababa",2); | ||
+-----------------------------------------+ | ||
| ngram_search('abababab', 'babababa', 2) | | ||
+-----------------------------------------+ | ||
| 1 | | ||
+-----------------------------------------+ | ||
``` | ||
## keywords | ||
NGRAM_SEARCH,NGRAM,SEARCH |
65 changes: 65 additions & 0 deletions
...tent-docs/version-2.1/sql-manual/sql-functions/string-functions/ngram-search.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
--- | ||
{ | ||
"title": "NGRAM_SEARCH", | ||
"language": "zh-CN" | ||
} | ||
--- | ||
|
||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
## Description | ||
|
||
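Calculates the N-gram similarity between `text` and `pattern`. The result ranges from 0 to 1; a higher value means the two strings are more similar. | ||
Both `pattern` and `gram_num` must be constants. | ||
If the length of `text` or `pattern` is less than `gram_num`, the function returns 0. | ||
|
||
N-gram similarity is a text-similarity measure based on N-grams. An N-gram is a sequence of N consecutive characters or words taken from a text string. For example, for the string "text" with N=2, the bigrams are: {"te", "ex", "xt"}. | ||
|
||
The N-gram similarity is calculated as 2 * |Intersection| / (|text set| + |pattern set|) | ||
|
||
where |text set| and |pattern set| are the sizes of the N-gram sets of `text` and `pattern`, and |Intersection| is the size of the intersection of the two sets. | ||
|
||
Note that, by definition, a similarity of 1 does not mean the two strings are identical. | ||
|
||
Only ASCII-encoded strings are supported. | ||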
|
||
## Syntax | ||
|
||
`DOUBLE ngram_search(VARCHAR text,VARCHAR pattern,INT gram_num)` | ||
|
||
## Example | ||
|
||
```sql | ||
mysql> select ngram_search('123456789' , '12345' , 3); | ||
+---------------------------------------+ | ||
| ngram_search('123456789', '12345', 3) | | ||
+---------------------------------------+ | ||
| 0.6 | | ||
+---------------------------------------+ | ||
|
||
mysql> select ngram_search("abababab","babababa",2); | ||
+-----------------------------------------+ | ||
| ngram_search('abababab', 'babababa', 2) | | ||
+-----------------------------------------+ | ||
| 1 | | ||
+-----------------------------------------+ | ||
``` | ||
## keywords | ||
NGRAM_SEARCH,NGRAM,SEARCH |
65 changes: 65 additions & 0 deletions
...tent-docs/version-3.0/sql-manual/sql-functions/string-functions/ngram-search.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
--- | ||
{ | ||
"title": "NGRAM_SEARCH", | ||
"language": "zh-CN" | ||
} | ||
--- | ||
|
||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
## Description | ||
|
||
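Calculates the N-gram similarity between `text` and `pattern`. The result ranges from 0 to 1; a higher value means the two strings are more similar. | ||
Both `pattern` and `gram_num` must be constants. | ||
If the length of `text` or `pattern` is less than `gram_num`, the function returns 0. | ||
|
||
N-gram similarity is a text-similarity measure based on N-grams. An N-gram is a sequence of N consecutive characters or words taken from a text string. For example, for the string "text" with N=2, the bigrams are: {"te", "ex", "xt"}. | ||
|
||
The N-gram similarity is calculated as 2 * |Intersection| / (|text set| + |pattern set|) | ||
|
||
where |text set| and |pattern set| are the sizes of the N-gram sets of `text` and `pattern`, and |Intersection| is the size of the intersection of the two sets. | ||
|
||
Note that, by definition, a similarity of 1 does not mean the two strings are identical. | ||
|
||
Only ASCII-encoded strings are supported. | ||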
|
||
## Syntax | ||
|
||
`DOUBLE ngram_search(VARCHAR text,VARCHAR pattern,INT gram_num)` | ||
|
||
## Example | ||
|
||
```sql | ||
mysql> select ngram_search('123456789' , '12345' , 3); | ||
+---------------------------------------+ | ||
| ngram_search('123456789', '12345', 3) | | ||
+---------------------------------------+ | ||
| 0.6 | | ||
+---------------------------------------+ | ||
|
||
mysql> select ngram_search("abababab","babababa",2); | ||
+-----------------------------------------+ | ||
| ngram_search('abababab', 'babababa', 2) | | ||
+-----------------------------------------+ | ||
| 1 | | ||
+-----------------------------------------+ | ||
``` | ||
## keywords | ||
NGRAM_SEARCH,NGRAM,SEARCH |
69 changes: 69 additions & 0 deletions
...oned_docs/version-2.1/sql-manual/sql-functions/string-functions/ngram-search.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
--- | ||
{ | ||
"title": "NGRAM_SEARCH", | ||
"language": "en" | ||
} | ||
--- | ||
|
||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
## Description | ||
|
||
`DOUBLE ngram_search(VARCHAR text,VARCHAR pattern,INT gram_num)` | ||
|
||
Calculates the N-gram similarity between `text` and `pattern`. The result ranges from 0 to 1; a higher value means the two strings are more similar. | ||
|
||
Both `pattern` and `gram_num` must be constants. If the length of either `text` or `pattern` is less than `gram_num`, the function returns 0. | ||
|
||
N-gram similarity is a text-similarity measure based on N-grams. An N-gram is a sequence of N consecutive characters or words taken from a text string. For example, for the string "text" with N=2, the bigrams are: {"te", "ex", "xt"}. | ||
|
||
The N-gram similarity is calculated as: | ||
|
||
2 * |Intersection| / (|text set| + |pattern set|) | ||
|
||
where |text set| and |pattern set| are the sizes of the N-gram sets of `text` and `pattern`, and |Intersection| is the size of the intersection of the two sets. | ||
|
||
Note that, by definition, a similarity of 1 does not necessarily mean the two strings are identical. | ||
|
||
Only ASCII-encoded strings are supported. | ||
|
||
## Syntax | ||
|
||
`DOUBLE ngram_search(VARCHAR text,VARCHAR pattern,INT gram_num)` | ||
|
||
## Example | ||
|
||
```sql | ||
mysql> select ngram_search('123456789' , '12345' , 3); | ||
+---------------------------------------+ | ||
| ngram_search('123456789', '12345', 3) | | ||
+---------------------------------------+ | ||
| 0.6 | | ||
+---------------------------------------+ | ||
|
||
mysql> select ngram_search("abababab","babababa",2); | ||
+-----------------------------------------+ | ||
| ngram_search('abababab', 'babababa', 2) | | ||
+-----------------------------------------+ | ||
| 1 | | ||
+-----------------------------------------+ | ||
``` | ||
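Both results above can be checked by hand against the formula in the Description, counting distinct N-grams. The notation $G_n(s)$ for the set of N-grams of a string $s$ is introduced here for illustration only and does not appear in the documentation itself.

```latex
\begin{aligned}
G_3(\texttt{123456789}) &= \{\texttt{123}, \texttt{234}, \texttt{345}, \texttt{456}, \texttt{567}, \texttt{678}, \texttt{789}\}
  && \Rightarrow\ |\text{text set}| = 7 \\
G_3(\texttt{12345}) &= \{\texttt{123}, \texttt{234}, \texttt{345}\}
  && \Rightarrow\ |\text{pattern set}| = 3 \\
\text{similarity} &= \frac{2 \cdot 3}{7 + 3} = 0.6 \\[6pt]
G_2(\texttt{abababab}) &= G_2(\texttt{babababa}) = \{\texttt{ab}, \texttt{ba}\}
  && \Rightarrow\ |\text{Intersection}| = 2 \\
\text{similarity} &= \frac{2 \cdot 2}{2 + 2} = 1
\end{aligned}
```

In the second case the two strings differ, yet they share exactly the same set of bigrams, which is why the result is 1.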
## keywords | ||
NGRAM_SEARCH,NGRAM,SEARCH |