
The gt.txt generated by PPOCRLabel table annotation keeps raising an index-out-of-bounds error during training #14211

Open
3 tasks done
HUPg-95 opened this issue Nov 13, 2024 · 1 comment


HUPg-95 commented Nov 13, 2024

🔎 Search before asking

  • I have searched the PaddleOCR Docs and found no similar bug report.
  • I have searched the PaddleOCR Issues and found no similar bug report.
  • I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (problem description)

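The gt.txt file generated by annotating tables with PPOCRLabel keeps failing during training with an index-out-of-bounds error. Inspecting the generated gt file, I found that the generated HTML contains merged cells, but tags are still emitted at positions that should have been absorbed by the merge.
For example, for a 5-row × 3-column cell, the first row is td>, so one extra cell is displayed. Shouldn't the correct result be `?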
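One rough way to confirm the symptom described above is to expand structure.tokens into a logical grid, honoring rowspan and colspan, and check that every row covers the same number of columns. This is a minimal sketch, assuming the token layout visible in the gt.txt fragment in this issue (a bare "<td>" token, or "<td" followed by attribute tokens such as ' colspan="6"' and a closing ">"); row_widths is a hypothetical helper, not part of PaddleOCR or PPOCRLabel.

```python
import re

# Matches attribute tokens such as ' colspan="6"' or ' rowspan="2"'.
SPAN_RE = re.compile(r'(rowspan|colspan)="(\d+)"')


def row_widths(tokens):
    """Hypothetical helper: return the grid width covered by each <tr>,
    including columns still occupied by rowspans opened in earlier rows."""
    widths = []
    carried = []    # [rows_still_covered, colspan] for rowspans from earlier rows
    new_spans = []  # rowspans opened in the current row
    width = 0
    cell = None     # span attributes of a "<td" cell whose ">" is not yet seen
    for tok in tokens:
        if tok == "<tr>":
            # Start the row with the columns blocked by rowspans from above.
            width = sum(cols for _, cols in carried)
            new_spans = []
        elif tok == "</tr>":
            widths.append(width)
            # Age carried rowspans by one row, drop those that ended here,
            # then add the rowspans opened in this row.
            carried = [[rows - 1, cols] for rows, cols in carried if rows > 1]
            carried += new_spans
        elif tok == "<td>":
            width += 1
        elif tok == "<td":
            cell = {"rowspan": 1, "colspan": 1}
        elif cell is not None and tok == ">":
            width += cell["colspan"]
            if cell["rowspan"] > 1:
                new_spans.append([cell["rowspan"] - 1, cell["colspan"]])
            cell = None
        elif cell is not None:
            for name, value in SPAN_RE.findall(tok):
                cell[name] = int(value)
    return widths


# Row 1: a 2-column merged cell plus a cell spanning 2 rows; row 2 has 2 cells.
good = ["<tr>", "<td", ' colspan="2"', ">", "</td>", "<td", ' rowspan="2"', ">", "</td>", "</tr>",
        "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>"]
# Same table with one extra <td> in row 2, where the rowspan already covers a column.
bad = good[:-1] + ["<td>", "</td>", "</tr>"]

print(row_widths(good))  # [3, 3]
print(row_widths(bad))   # [3, 4]
```

In a consistent label every entry in the result is the same; a row wider than the others points at a <td> emitted where a merged cell already occupies the position.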

🏃‍♂️ Environment (runtime environment)

Windows 11

🌰 Minimal Reproducible Example (minimal demo that reproduces the problem)

In the html result, structure.tokens and cells do not correspond to each other, which is why training fails with an index-out-of-bounds error:
"html": {"structure": {"tokens": ["<tr>", "<td", " colspan=\"6\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"5\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", 
"<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>"]}, "cells": [{"tokens": ["青", "耸"], "bbox": [[[16, 16], [978, 16], [978, 84], [16, 84]]]}, {"tokens": ["式", "显", "示", "。"], "bbox": [[[980, 16], [1170, 16], [1170, 152], [980, 152]]]}, {"tokens": ["e"], "bbox": [[[16, 84], [206, 84], [206, 152], [16, 152]]]}, {"tokens": ["洞", "大"], "bbox": [[[208, 84], [354, 84], [354, 152], [208, 152]]]}, {"tokens": ["全", "保"], "bbox": [[[352, 84], [486, 84], [486, 152], [352, 152]]]}, {"tokens": ["孝", "义"], "bbox": [[[486, 84], [632, 84], [632, 152], [486, 152]]]}, {"tokens": ["望", "的", "国", "家"], "bbox": [[[632, 84], [822, 84], [822, 152], [632, 152]]]}, {"tokens": ["。", "渔", "民"], "bbox": [[[822, 84], [980, 84], [980, 152], [822, 152]]]}, {"tokens": ["N"], "bbox": [[[16, 152], [206, 152], [206, 220], [16, 220]]]}, {"tokens": ["$", "2", "4", ".", "7", "7"], "bbox": [[[208, 152], [354, 152], [354, 220], [208, 220]]]}, {"tokens": ["懂", "弄", "通", "以"], "bbox": [[[352, 152], [486, 152], [486, 220], [352, 220]]]}, {"tokens": ["6", "3"], "bbox": [[[486, 152], [632, 152], [632, 220], [486, 220]]]}, {"tokens": ["1", ".", "2", "5"], "bbox": [[[632, 152], [822, 152], [822, 220], [632, 220]]]}, {"tokens": ["9", "1", "5", ".", "2", "8"], "bbox": [[[822, 152], [980, 152], [980, 220], [822, 220]]]}, {"tokens": ["$", "8", ".", "8", "1"], "bbox": [[[980, 152], [1170, 152], [1170, 220], [980, 220]]]}, 
{"tokens": ["A", "t"], "bbox": [[[16, 220], [206, 220], [206, 288], [16, 288]]]}, {"tokens": ["$", "8", "0", "7", "2", ".", "0", "5"], "bbox": [[[208, 220], [354, 220], [354, 288], [208, 288]]]}, {"tokens": ["面"], "bbox": [[[352, 220], [486, 220], [486, 288], [352, 288]]]}, {"tokens": ["0", ".", "0", "4"], "bbox": [[[486, 220], [632, 220], [632, 288], [486, 288]]]}, {"tokens": ["1", ".", "2", "3"], "bbox": [[[632, 220], [822, 220], [822, 288], [632, 288]]]}, {"tokens": ["9", "6", "7", ".", "6", "9"], "bbox": [[[822, 220], [980, 220], [980, 288], [822, 288]]]}, {"tokens": ["$", "4"], "bbox": [[[980, 220], [1170, 220], [1170, 288], [980, 288]]]}, {"tokens": ["O"], "bbox": [[[16, 288], [206, 288], [206, 356], [16, 356]]]}, {"tokens": ["$", "3", "4", ".", "9", "9"], "bbox": [[[208, 288], [354, 288], [354, 356], [208, 356]]]}, {"tokens": ["是"], "bbox": [[[352, 288], [486, 288], [486, 356], [352, 356]]]}, {"tokens": ["1", "7", "3"], "bbox": [[[486, 288], [632, 288], [632, 356], [486, 356]]]}, {"tokens": ["7"], "bbox": [[[632, 288], [822, 288], [822, 356], [632, 356]]]}, {"tokens": ["8", "8", "9", "6", "."], "bbox": [[[822, 288], [980, 288], [980, 356], [822, 356]]]}, {"tokens": ["$", "1", "2", "4"], "bbox": [[[980, 288], [1170, 288], [1170, 356], [980, 356]]]}, {"tokens": ["对"], "bbox": [[[16, 356], [206, 356], [206, 424], [16, 424]]]}, {"tokens": ["$", "4", "0", "1", "5", ".", "3", "0"], "bbox": [[[208, 356], [354, 356], [354, 424], [208, 424]]]}, {"tokens": ["主", "权", "新"], "bbox": [[[352, 356], [486, 356], [486, 424], [352, 424]]]}, {"tokens": ["9", "3", "4"], "bbox": [[[486, 356], [632, 356], [632, 424], [486, 424]]]}, {"tokens": ["7", "7", "8", "6"], "bbox": [[[632, 356], [822, 356], [822, 424], [632, 424]]]}, {"tokens": ["6", "6", ".", "7", "7"], "bbox": [[[822, 356], [980, 356], [980, 424], [822, 424]]]}, {"tokens": ["$", "5", ".", "3", "9"], "bbox": [[[980, 356], [1170, 356], [1170, 424], [980, 424]]]}, {"tokens": ["”", "。", "无"], "bbox": [[[16, 424], [206, 424], 
[206, 492], [16, 492]]]}, {"tokens": ["$", "9", "5", ".", "7", "9"], "bbox": [[[208, 424], [354, 424], [354, 492], [208, 492]]]}, {"tokens": [",", "听", "取", "民"], "bbox": [[[352, 424], [486, 424], [486, 492], [352, 492]]]}, {"tokens": ["2", "2", "9", ".", "9", "7"], "bbox": [[[486, 424], [632, 424], [632, 492], [486, 492]]]}, {"tokens": ["4"], "bbox": [[[632, 424], [822, 424], [822, 492], [632, 492]]]}, {"tokens": ["6", "4", "4"], "bbox": [[[822, 424], [980, 424], [980, 492], [822, 492]]]}, {"tokens": ["$", "4", ".", "6", "6"], "bbox": [[[980, 424], [1170, 424], [1170, 492], [980, 492]]]}, {"tokens": ["R", "e", "d", "i"], "bbox": [[[16, 492], [206, 492], [206, 560], [16, 560]]]}, {"tokens": ["$", "6"], "bbox": [[[208, 492], [354, 492], [354, 560], [208, 560]]]}, {"tokens": ["之", "一", "、", "2"], "bbox": [[[352, 492], [486, 492], [486, 560], [352, 560]]]}, {"tokens": ["9", "6", "4", "6"], "bbox": [[[486, 492], [632, 492], [632, 560], [486, 560]]]}, {"tokens": ["3", "3", ".", "7", "2", "6"], "bbox": [[[632, 492], [822, 492], [822, 560], [632, 560]]]}, {"tokens": ["4", "0", "3", "0"], "bbox": [[[822, 492], [980, 492], [980, 560], [822, 560]]]}, {"tokens": ["$", "3", "4", "."], "bbox": [[[980, 492], [1170, 492], [1170, 560], [980, 560]]]}, {"tokens": ["L"], "bbox": [[[16, 560], [206, 560], [206, 628], [16, 628]]]}, {"tokens": ["$", "5", "5", "0", ".", "9", "2"], "bbox": [[[208, 560], [354, 560], [354, 628], [208, 628]]]}, {"tokens": [",", "历", "任", "教"], "bbox": [[[352, 560], [486, 560], [486, 628], [352, 628]]]}, {"tokens": ["2", "8", "1", "6", ".", "4", "8", "9"], "bbox": [[[486, 560], [632, 560], [632, 628], [486, 628]]]}, {"tokens": ["6", "7"], "bbox": [[[632, 560], [822, 560], [822, 628], [632, 628]]]}, {"tokens": ["6", "6", "4", "6", ".", "2"], "bbox": [[[822, 560], [980, 560], [980, 628], [822, 628]]]}, {"tokens": ["$", "8", ".", "4", "4"], "bbox": [[[980, 560], [1170, 560], [1170, 628], [980, 628]]]}, {"tokens": ["“", "奇", "点", "”"], "bbox": [[[16, 628], [206, 
628], [206, 696], [16, 696]]]}, {"tokens": ["$", "6", "0", "9", ".", "2", "3"], "bbox": [[[208, 628], [354, 628], [354, 696], [208, 696]]]}, {"tokens": ["言", "服"], "bbox": [[[352, 628], [486, 628], [486, 696], [352, 696]]]}, {"tokens": ["8", "7"], "bbox": [[[486, 628], [632, 628], [632, 696], [486, 696]]]}, {"tokens": ["9", "0", "3", ".", "3", "1"], "bbox": [[[632, 628], [822, 628], [822, 696], [632, 696]]]}, {"tokens": ["6", ".", "7", "2", "3"], "bbox": [[[822, 628], [980, 628], [980, 696], [822, 696]]]}, {"tokens": ["$", "1"], "bbox": [[[980, 628], [1170, 628], [1170, 696], [980, 696]]]}, {"tokens": ["C"], "bbox": [[[16, 696], [206, 696], [206, 764], [16, 764]]]}, {"tokens": ["$", "9", ".", "5", "7"], "bbox": [[[208, 696], [354, 696], [354, 764], [208, 764]]]}, {"tokens": ["年", ",", "是", "上"], "bbox": [[[352, 696], [486, 696], [486, 764], [352, 764]]]}, {"tokens": ["5", "8", "5", "6"], "bbox": [[[486, 696], [632, 696], [632, 764], [486, 764]]]}, {"tokens": ["4", "3"], "bbox": [[[632, 696], [822, 696], [822, 764], [632, 764]]]}, {"tokens": ["5", "0", "0", "8", ".", "7", "4"], "bbox": [[[822, 696], [980, 696], [980, 764], [822, 764]]]}, {"tokens": ["$", "2", "4", ".", "9", "4", "7", "9"], "bbox": [[[980, 696], [1170, 696], [1170, 764], [980, 764]]]}, {"tokens": ["a", "x", "a", "t"], "bbox": [[[16, 764], [206, 764], [206, 832], [16, 832]]]}, {"tokens": ["$", "1", "7", "7", "1", ".", "2", "3"], "bbox": [[[208, 764], [354, 764], [354, 832], [208, 832]]]}, {"tokens": ["生", "命", "教"], "bbox": [[[352, 764], [486, 764], [486, 832], [352, 832]]]}, {"tokens": ["6", "6", "6"], "bbox": [[[486, 764], [632, 764], [632, 832], [486, 832]]]}, {"tokens": ["6", "5"], "bbox": [[[632, 764], [822, 764], [822, 832], [632, 832]]]}, {"tokens": ["2", "2", ".", "5", "3"], "bbox": [[[822, 764], [980, 764], [980, 832], [822, 832]]]}, {"tokens": ["$", "4", "5", ".", "9", "9"], "bbox": [[[980, 764], [1170, 764], [1170, 832], [980, 832]]]}, {"tokens": ["转"], "bbox": [[[16, 832], [206, 832], 
[206, 900], [16, 900]]]}, {"tokens": ["$", "5", "3", ".", "5", "0"], "bbox": [[[208, 832], [354, 832], [354, 900], [208, 900]]]}, {"tokens": ["集", "团"], "bbox": [[[352, 832], [486, 832], [486, 900], [352, 900]]]}, {"tokens": ["1", "6", "1", "3"], "bbox": [[[486, 832], [632, 832], [632, 900], [486, 900]]]}, {"tokens": ["6", "8"], "bbox": [[[632, 832], [822, 832], [822, 900], [632, 900]]]}, {"tokens": ["1", ".", "8", "5"], "bbox": [[[822, 832], [980, 832], [980, 900], [822, 900]]]}, {"tokens": ["$", "8", "5", "9", "5", ".", "1", "6"], "bbox": [[[980, 832], [1170, 832], [1170, 900], [980, 900]]]}, {"tokens": ["i", "l", "l"], "bbox": [[[16, 900], [206, 900], [206, 968], [16, 968]]]}, {"tokens": ["$", "1", "0"], "bbox": [[[208, 900], [354, 900], [354, 968], [208, 968]]]}, {"tokens": ["固", "废", "案", "件"], "bbox": [[[352, 900], [486, 900], [486, 968], [352, 968]]]}, {"tokens": ["3", ".", "3", "2"], "bbox": [[[486, 900], [632, 900], [632, 968], [486, 968]]]}, {"tokens": ["6", "0", "6", "."], "bbox": [[[632, 900], [822, 900], [822, 968], [632, 968]]]}, {"tokens": ["1", "1", "4", "6", "."], "bbox": [[[822, 900], [980, 900], [980, 968], [822, 968]]]}, {"tokens": ["$", "5", "9", "1", ".", "0", "5"], "bbox": [[[980, 900], [1170, 900], [1170, 968], [980, 968]]]}, {"tokens": ["R", "s"], "bbox": [[[16, 968], [206, 968], [206, 1208], [16, 1208]]]}, {"tokens": ["$", "4", ".", "2", "7"], "bbox": [[[208, 968], [354, 968], [354, 1016], [208, 1016]]]}, {"tokens": ["认", "真", "总"], "bbox": [[[352, 968], [486, 968], [486, 1016], [352, 1016]]]}, {"tokens": ["3", "6", ".", "6"], "bbox": [[[486, 968], [632, 968], [632, 1016], [486, 1016]]]}, {"tokens": ["4"], "bbox": [[[632, 968], [822, 968], [822, 1016], [632, 1016]]]}, {"tokens": ["4", ".", "5", "4"], "bbox": [[[822, 968], [980, 968], [980, 1016], [822, 1016]]]}, {"tokens": ["$", "8", ".", "3", "4"], "bbox": [[[980, 968], [1170, 968], [1170, 1016], [980, 1016]]]}, {"tokens": ["$", "4", "1", "0"], "bbox": [[[208, 1016], [354, 1016], [354, 
1064], [208, 1064]]]}, {"tokens": ["视"], "bbox": [[[352, 1016], [486, 1016], [486, 1064], [352, 1064]]]}, {"tokens": ["7", "7", "8", "0"], "bbox": [[[486, 1016], [632, 1016], [632, 1064], [486, 1064]]]}, {"tokens": ["6", ".", "4", "1"], "bbox": [[[632, 1016], [822, 1016], [822, 1064], [632, 1064]]]}, {"tokens": ["8", "5", "5", ".", "3", "2"], "bbox": [[[822, 1016], [980, 1016], [980, 1064], [822, 1064]]]}, {"tokens": ["$", "0", ".", "4"], "bbox": [[[980, 1016], [1170, 1016], [1170, 1064], [980, 1064]]]}, {"tokens": ["$", "9"], "bbox": [[[208, 1064], [354, 1064], [354, 1112], [208, 1112]]]}, {"tokens": ["方", "面", "的", "原"], "bbox": [[[352, 1064], [486, 1064], [486, 1112], [352, 1112]]]}, {"tokens": ["4", "4", "4", "6", ".", "3", "4", "1"], "bbox": [[[486, 1064], [632, 1064], [632, 1112], [486, 1112]]]}, {"tokens": ["2", "9", "1", "3"], "bbox": [[[632, 1064], [822, 1064], [822, 1112], [632, 1112]]]}, {"tokens": [], "bbox": [[[822, 1064], [980, 1064], [980, 1112], [822, 1112]]]}, {"tokens": ["$", "5", "2", ".", "8", "6"], "bbox": [[[980, 1064], [1170, 1064], [1170, 1112], [980, 1112]]]}, {"tokens": ["$", "4", "7", "2", ".", "9", "3"], "bbox": [[[208, 1112], [354, 1112], [354, 1160], [208, 1160]]]}, {"tokens": ["涂", "干", ",", "清"], "bbox": [[[352, 1112], [486, 1112], [486, 1160], [352, 1160]]]}, {"tokens": ["1", "5"], "bbox": [[[486, 1112], [632, 1112], [632, 1160], [486, 1160]]]}, {"tokens": ["9", "6", "7", ".", "4", "2"], "bbox": [[[632, 1112], [822, 1112], [822, 1160], [632, 1160]]]}, {"tokens": ["3"], "bbox": [[[822, 1112], [980, 1112], [980, 1160], [822, 1160]]]}, {"tokens": ["$", "3", "5", ".", "7", "2"], "bbox": [[[980, 1112], [1170, 1112], [1170, 1160], [980, 1160]]]}, {"tokens": ["$", "8", "2", "6"], "bbox": [[[208, 1160], [354, 1160], [354, 1208], [208, 1208]]]}, {"tokens": ["的", "生", "活", ","], "bbox": [[[352, 1160], [486, 1160], [486, 1208], [352, 1208]]]}, {"tokens": ["8", ".", "6", "1"], "bbox": [[[486, 1160], [632, 1160], [632, 1208], [486, 1208]]]}, 
{"tokens": ["6", "8"], "bbox": [[[632, 1160], [822, 1160], [822, 1208], [632, 1208]]]}, {"tokens": ["8", ".", "7"], "bbox": [[[822, 1160], [980, 1160], [980, 1208], [822, 1208]]]}, {"tokens": ["$", "6", "4", "0", ".", "1", "0"], "bbox": [[[980, 1160], [1170, 1160], [1170, 1208], [980, 1208]]]}, {"tokens": ["F", "o", "m"], "bbox": [[[16, 1208], [206, 1208], [206, 1276], [16, 1276]]]}, {"tokens": ["$", "6", "0", "8", "0", ".", "7", "2"], "bbox": [[[208, 1208], [354, 1208], [354, 1276], [208, 1276]]]}, {"tokens": ["数", "字"], "bbox": [[[352, 1208], [486, 1208], [486, 1276], [352, 1276]]]}, {"tokens": ["6", "2", ".", "1", "4"], "bbox": [[[486, 1208], [632, 1208], [632, 1276], [486, 1276]]]}, {"tokens": ["1", "1", ".", "8", "6"], "bbox": [[[632, 1208], [822, 1208], [822, 1276], [632, 1276]]]}, {"tokens": ["1", "7", "1", "3"], "bbox": [[[822, 1208], [980, 1208], [980, 1276], [822, 1276]]]}, {"tokens": ["$", "7", "8", "9"], "bbox": [[[980, 1208], [1170, 1208], [1170, 1276], [980, 1276]]]}]}
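Before training, this kind of mismatch can be caught with a simple count: the number of cell-opening tokens ("<td>" or "<td") in structure.tokens versus the number of entries in cells. This is a sketch, assuming each line of gt.txt is a standalone JSON object with an html key shaped like the fragment above, and that cells are matched to cell-opening tokens one-to-one and in order (the PubTabNet-style convention). If the training-time label processing indexes cells that way, more tokens than cells would produce exactly this kind of out-of-range error. check_gt_line and check_gt_file are hypothetical helpers.

```python
import json


def count_cell_tokens(tokens):
    """Count cells opened in structure tokens: a bare '<td>' or a '<td' with attributes."""
    return sum(1 for tok in tokens if tok in ("<td>", "<td"))


def check_gt_line(line):
    """Return (cell_tokens, cell_entries) for one JSON line of gt.txt."""
    html = json.loads(line)["html"]
    return count_cell_tokens(html["structure"]["tokens"]), len(html["cells"])


def check_gt_file(path):
    """Return (line_number, cell_tokens, cell_entries) for every mismatched line."""
    mismatches = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            n_tokens, n_cells = check_gt_line(line)
            if n_tokens != n_cells:
                mismatches.append((lineno, n_tokens, n_cells))
    return mismatches


# Illustrative line: two cells in the structure but only one entry in cells.
sample = json.dumps({
    "html": {
        "structure": {"tokens": ["<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>"]},
        "cells": [{"tokens": ["a"], "bbox": [[[0, 0], [10, 0], [10, 10], [0, 10]]]}],
    },
})
print(check_gt_line(sample))  # (2, 1)
```

Running check_gt_file on the exported gt.txt lists the lines to fix or re-annotate before training; combined with a per-row width check, it helps separate a wrong cell count from a misplaced extra <td>.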


TrioTea commented Nov 14, 2024

Try rolling back to an earlier version of PPOCRLabel; the latest version has this problem.
