Fix np array int labels #2325
Reviewer's Guide by Sourcery
This pull request refactors the label encoding logic to correctly handle numpy arrays and preserve the original data types of labels. It introduces a LabelManager class to manage label encoding and decoding, replacing the previous LabelEncoder implementation within the DataProcessor class. This change ensures that labels, whether numerical or string-based, are processed without losing their original type or data structure.
Sequence diagram for label processing flow
sequenceDiagram
participant DP as DataProcessor
participant LM as LabelManager
participant LE as LabelEncoder
DP->>LM: process_field(data_name, label_field, data)
activate LM
LM->>LM: _analyze_input(data)
alt is numerical data
LM->>LM: Convert to float32 array
else non-numerical data
LM->>LE: Create new encoder
LM->>LE: fit_transform(data)
end
LM-->>DP: Return encoded data
deactivate LM
Note over DP,LM: Later when restoring...
DP->>LM: restore_field(data_name, label_field, encoded_data)
activate LM
alt is numerical data
LM->>LM: Restore original dtype
else non-numerical data
LM->>LE: inverse_transform(encoded_data)
end
LM->>LM: _restore_type(decoded_data)
LM-->>DP: Return restored data
deactivate LM
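The flow in the sequence diagram can be illustrated with a minimal sketch. This is not the code from the PR: SketchLabelManager and its LabelMetadata are hypothetical stand-ins, and np.unique is used in place of the project's LabelEncoder purely to keep the example self-contained. It assumes one-dimensional label fields.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

import numpy as np


@dataclass
class LabelMetadata:
    """What must be remembered to undo the encoding (hypothetical layout)."""
    input_type: type
    is_numerical: bool
    dtype: np.dtype
    classes: Optional[np.ndarray] = None  # stand-in for a fitted encoder


class SketchLabelManager:
    """Illustrative only; mirrors the process/restore steps in the diagram."""

    def __init__(self) -> None:
        self.metadata: Dict[Tuple[str, str], LabelMetadata] = {}

    def process_field(self, data_name: str, label_field: str, field_data: Any) -> np.ndarray:
        arr = np.asarray(field_data)
        is_numerical = arr.dtype.kind in "iuf"  # signed/unsigned int or float
        meta = LabelMetadata(type(field_data), is_numerical, arr.dtype)
        if is_numerical:
            # Numerical labels are only cast; no encoder is needed.
            encoded = arr.astype(np.float32)
        else:
            # Non-numerical labels are mapped to integer codes.
            meta.classes, codes = np.unique(arr, return_inverse=True)
            encoded = codes.astype(np.float32)
        self.metadata[(data_name, label_field)] = meta
        return encoded

    def restore_field(self, data_name: str, label_field: str, encoded_data: np.ndarray) -> Any:
        meta = self.metadata[(data_name, label_field)]
        encoded = np.asarray(encoded_data)
        if meta.is_numerical:
            decoded = encoded.astype(meta.dtype)  # restore the original dtype
        else:
            decoded = meta.classes[encoded.astype(np.int64)]
        # Restore the container type: arrays stay arrays, other inputs become lists.
        return decoded if meta.input_type is np.ndarray else decoded.tolist()
```

With this sketch, an int64 array passed to process_field comes back from restore_field as an int64 array, and a list of strings comes back as a list of strings, which is the round-trip property the PR description is concerned with.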
Class diagram showing the new label management structure
classDiagram
class LabelManager {
-metadata: dict
+process_field(data_name: str, label_field: str, field_data: Any): np.ndarray
+restore_field(data_name: str, label_field: str, encoded_data: np.ndarray): Any
-_analyze_input(field_data: Any): LabelMetadata
-_encode_data(field_data: Any, metadata: LabelMetadata): np.ndarray
-_decode_data(encoded_data: np.ndarray, metadata: LabelMetadata): np.ndarray
-_restore_type(decoded_data: np.ndarray, metadata: LabelMetadata): Any
+handle_empty_data(): list
}
class LabelMetadata {
+input_type: type
+is_numerical: bool
+dtype: np.dtype
+encoder: LabelEncoder
}
class LabelEncoder {
-classes_: dict
-inverse_classes_: dict
-num_classes: int
-is_numerical: bool
+fit(y: Sequence|np.ndarray): LabelEncoder
+transform(y: Sequence|np.ndarray): np.ndarray
+fit_transform(y: Sequence|np.ndarray): np.ndarray
+inverse_transform(y: Sequence|np.ndarray): np.ndarray
}
class DataProcessor {
-label_manager: LabelManager
}
DataProcessor --> LabelManager
LabelManager --> LabelMetadata
LabelMetadata --> LabelEncoder
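As a rough illustration of the LabelEncoder interface in the class diagram, the sketch below keeps a forward and an inverse dictionary and passes numerical input through unchanged. The class name SketchLabelEncoder and every implementation detail (first-seen ordering of classes, the numerical check) are assumptions made for the example, not the behavior of the encoder in this PR.

```python
from typing import Any, Dict, List, Sequence, Union

import numpy as np


def _to_list(y: Union[Sequence[Any], np.ndarray]) -> List[Any]:
    """Convert arrays and sequences to a plain list of Python scalars."""
    return y.tolist() if isinstance(y, np.ndarray) else list(y)


class SketchLabelEncoder:
    """Hypothetical encoder with the attributes named in the class diagram."""

    def __init__(self) -> None:
        self.classes_: Dict[Any, int] = {}
        self.inverse_classes_: Dict[int, Any] = {}
        self.num_classes = 0
        self.is_numerical = True

    def fit(self, y: Union[Sequence[Any], np.ndarray]) -> "SketchLabelEncoder":
        values = _to_list(y)
        self.is_numerical = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if self.is_numerical:
            return self  # nothing to learn for numerical labels
        # Assign codes in first-seen order (an assumption of this sketch).
        unique = list(dict.fromkeys(values))
        self.classes_ = {label: code for code, label in enumerate(unique)}
        self.inverse_classes_ = {code: label for label, code in self.classes_.items()}
        self.num_classes = len(unique)
        return self

    def transform(self, y: Union[Sequence[Any], np.ndarray]) -> np.ndarray:
        values = _to_list(y)
        if self.is_numerical:
            return np.asarray(values)
        return np.array([self.classes_[v] for v in values])

    def fit_transform(self, y: Union[Sequence[Any], np.ndarray]) -> np.ndarray:
        return self.fit(y).transform(y)

    def inverse_transform(self, y: Union[Sequence[Any], np.ndarray]) -> np.ndarray:
        values = _to_list(y)
        if self.is_numerical:
            return np.asarray(values)
        return np.array([self.inverse_classes_[int(v)] for v in values])
```

Keeping both dictionaries makes encoding and decoding a single lookup each, at the cost of storing the mapping twice.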
File-Level Changes
Hey @ternaus - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟡 Testing: 1 issue found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Fixes: #2324
Summary by Sourcery
Bug Fixes: