[Enhancement] introduce show data distribution command to display data distribution #55588
base: main
Signed-off-by: MatthewH00 <[email protected]>
Signed-off-by: hmx <[email protected]>
@kevincai Hi, could you help review the PR?
Maybe change it to
I think the two syntaxes serve two different roles.
I also know about the error in the UT and will fix it.
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report]✅ pass : 99 / 101 (98.02%)
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%)
@kevincai Hi, could you please help review the PR when you have free time?
Why I'm doing:
When data skew occurs, it can lead to cluster instability, but currently there are no good methods to detect it.
What I'm doing:
Introduce the SHOW DATA DISTRIBUTION command to display data distribution at the bucket level.
Note: the command is aimed at ordinary users (such as data analysts). When data skew occurs, they can use this command to detect it and then adjust the table schema (the bucket key) to fix it.
Syntax:
SHOW DATA DISTRIBUTION FROM [db_name.]tbl_name [PARTITION (p1, ...)];
Output columns:
PartitionName: the partition name for a partitioned table; the table name for an unpartitioned table
BucketId: bucket ID
RowCount: number of rows in the bucket
RowCount%: the bucket's row count as a percentage of the total row count of its partition
DataSize: data size of the bucket
DataSize%: the bucket's data size as a percentage of the total data size of its partition
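The RowCount% and DataSize% columns are per-partition shares: each bucket's value divided by the partition total. A minimal Java sketch of that arithmetic follows. It assumes a hypothetical `BucketStat` record in place of whatever tablet statistics the FE actually reads. The `skewRatio` helper is an illustrative heuristic for reading the output and is not part of this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-bucket statistics for one partition; not the StarRocks FE API.
record BucketStat(long bucketId, long rowCount, long dataSize) {}

public class BucketShare {
    /** part as a percentage of total; 0.0 when total is 0 (e.g. an empty partition). */
    static double percent(long part, long total) {
        return total == 0 ? 0.0 : part * 100.0 / total;
    }

    /** Returns {RowCount%, DataSize%} for every bucket of a single partition. */
    static List<double[]> shares(List<BucketStat> buckets) {
        long totalRows = 0;
        long totalSize = 0;
        for (BucketStat b : buckets) {
            totalRows += b.rowCount();
            totalSize += b.dataSize();
        }
        List<double[]> result = new ArrayList<>();
        for (BucketStat b : buckets) {
            result.add(new double[] {percent(b.rowCount(), totalRows), percent(b.dataSize(), totalSize)});
        }
        return result;
    }

    /**
     * Illustrative heuristic (assumption, not from the PR): the largest bucket's row count
     * divided by the even-split row count. 1.0 means perfectly even; larger means more skew.
     */
    static double skewRatio(List<BucketStat> buckets) {
        long total = 0;
        long max = 0;
        for (BucketStat b : buckets) {
            total += b.rowCount();
            max = Math.max(max, b.rowCount());
        }
        if (total == 0) {
            return 1.0; // empty partition: nothing to compare
        }
        return (double) max * buckets.size() / total;
    }

    public static void main(String[] args) {
        List<BucketStat> p1 = List.of(
                new BucketStat(0, 900, 9_000),
                new BucketStat(1, 50, 500),
                new BucketStat(2, 50, 500));
        List<double[]> s = shares(p1);
        for (int i = 0; i < p1.size(); i++) {
            System.out.printf("bucket %d: RowCount%% = %.2f, DataSize%% = %.2f%n",
                    p1.get(i).bucketId(), s.get(i)[0], s.get(i)[1]);
        }
        System.out.printf("skew ratio = %.2f%n", skewRatio(p1));
    }
}
```

In this example, bucket 0 holds 90% of the partition's rows, so the ratio is 2.7. With an even split, the ratio would be 1.0. That gap is the kind of signal that suggests revisiting the bucket key.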
1. Partitioned table:
![1_N](https://private-user-images.githubusercontent.com/29924327/410292067-a9104d0c-b6e8-4749-99ba-12da44ee7baf.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzNTYyNjEsIm5iZiI6MTczOTM1NTk2MSwicGF0aCI6Ii8yOTkyNDMyNy80MTAyOTIwNjctYTkxMDRkMGMtYjZlOC00NzQ5LTk5YmEtMTJkYTQ0ZWU3YmFmLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDEwMjYwMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ3ZTFkM2E0ZmQ5NjhiYWM5ZDNjYmRmMGE2ZjVmMDEwNWE5MGVlZGNiZWRlMWM0MDhmYjY2M2IzNWE3ZTIyMDcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.q_n72Oh5N0bxSek6os1nmVmRPJ_yyJTe45yzbv1IZp0)
1) Entire table
e.g. show data distribution from partition_tbl_name
2) Single partition
e.g. show data distribution from partition_tbl_name partition(p1)
3) Multiple partitions
e.g. show data distribution from partition_tbl_name partition(p1,p2)
2. Unpartitioned table
![2_N](https://private-user-images.githubusercontent.com/29924327/410292104-977d676b-33ab-4454-9447-e69c45ed3ae8.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzNTYyNjEsIm5iZiI6MTczOTM1NTk2MSwicGF0aCI6Ii8yOTkyNDMyNy80MTAyOTIxMDQtOTc3ZDY3NmItMzNhYi00NDU0LTk0NDctZTY5YzQ1ZWQzYWU4LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDEwMjYwMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTMxZDRiY2YyZDkwODA3Y2Y1ODdlYjYxNDAzNzFjMDIyN2FlZmE0YTMzNTZmMjNiYzg2OTViMzU4ZDMwNmI5MmUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.uj_GfyWP-EORS-d9CXc6nO7khIpLAhbMTXjUqwyLDqI)
1) Entire table (either statement works; the single partition of an unpartitioned table carries the table name)
e.g. show data distribution from unpartition_tbl_name
e.g. show data distribution from unpartition_tbl_name partition(unpartition_tbl_name)
3. Special cases
1) Database does not exist
Returns an error such as: Getting analyzing error. Detail message: Database db_name does not exist.
2) Table does not exist
Returns an error such as: Getting analyzing error. Detail message: Table does not exist or is not native table: table_name.
3) Partition does not exist
Returns an error such as: Getting analyzing error. Detail message: Partition does not exist: partition_name.
4) No privilege on the table
Returns an error such as: Access denied; you need (at least one of) the ANY privilege(s) on TABLE table_name for this operation.
5) Invalid SQL
Returns an error such as: Getting syntax error at line 1, column 15. Detail message: Unexpected input 'table_name', the most similar input is {'FROM'}.
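The Java sketch below shows one way the optional PARTITION clause could select the partitions to report, including the "Partition does not exist" case listed above. This is a hypothetical illustration and not the FE code in this PR. `PartitionResolver.resolve` is an invented name. Partition names are assumed to match exactly (case-sensitive). An unpartitioned table is assumed to expose a single partition named after the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PartitionResolver {
    /**
     * Hypothetical helper: picks the partitions to report.
     * tablePartitions: all partition names of the table (assumed: one partition named
     *                  after the table when the table is unpartitioned).
     * requested:       names from PARTITION (...), or an empty list when the clause is omitted.
     */
    static List<String> resolve(List<String> tablePartitions, List<String> requested) {
        if (requested.isEmpty()) {
            return List.copyOf(tablePartitions); // no PARTITION clause: report the entire table
        }
        Set<String> known = Set.copyOf(tablePartitions);
        List<String> result = new ArrayList<>();
        for (String name : requested) {
            if (!known.contains(name)) {
                // Mirrors the documented detail message "Partition does not exist: partition_name."
                throw new IllegalArgumentException("Partition does not exist: " + name + ".");
            }
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> partitioned = List.of("p1", "p2", "p3");
        System.out.println(resolve(partitioned, List.of()));           // [p1, p2, p3]
        System.out.println(resolve(partitioned, List.of("p1", "p2"))); // [p1, p2]
        // Unpartitioned table: its single partition is assumed to carry the table name.
        System.out.println(resolve(List.of("unpartition_tbl_name"), List.of("unpartition_tbl_name")));
        try {
            resolve(partitioned, List.of("p9"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Partition does not exist: p9.
        }
    }
}
```

The real command also checks the database, table existence, and privileges before this step. The sketch covers only partition selection.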
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: