Reweighing dataset workflow and blog #357

Conversation
In the [previous blog post]({{< ref "/blog/2023/2023-08-23-fairness-dataset-bias.md" >}}), we introduced the Orange fairness addon along with the Dataset Bias and As Fairness widgets. We also demonstrated how to use them to detect bias in a dataset and visualized the results for a better understanding. In this blog post, we will introduce the Reweighing widget, which we can use to mitigate bias in a dataset so that machine learning models trained on it are fairer.
Don't link to .md. This link should probably point to "/blog/2023/2023-08-23-fairness-dataset-bias/". Then the build would not fail.
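Reweighing in this sense usually refers to the preprocessing algorithm of Kamiran and Calders, where each (group, class) combination receives the weight P(group) · P(class) / P(group, class): the frequency expected if group and class were independent, divided by the observed frequency. A minimal pandas sketch of that computation (an illustration, not the addon's actual code; the column names are placeholders):

```python
import pandas as pd

def reweighing_weights(df: pd.DataFrame, group_col: str, class_col: str) -> pd.Series:
    """One weight per instance: P(group) * P(class) / P(group, class)."""
    n = len(df)
    p_group = df[group_col].value_counts(normalize=True)
    p_class = df[class_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, class_col]).size() / n
    return df.apply(
        lambda row: p_group[row[group_col]] * p_class[row[class_col]]
        / p_joint[(row[group_col], row[class_col])],
        axis=1,
    )
```

Combinations that are rarer than independence would predict get weights above 1, while over-represented combinations get weights below 1.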
{{< window-screenshot src="/blog_img/2023/2023-08-24-fairness-reweighing-dataset-box-plot.png" >}}
The box plot widget reveals that lower weights were given to instances of unprivileged groups with unfavorable class values and privileged groups with favorable class values. This behavior aligns perfectly with the expected outcomes of the reweighing algorithm. The opposite is true for the higher weights. The results show that the reweighing algorithm successfully assigned weights to the instances in a way that will encourage the model to prioritize learning from underrepresented groups while de-emphasizing overrepresented groups.
Where can we see weights in the box plot? The only thing I see in the box plot is that every race other than Caucasian is predicted "Yes".
Also comment on what this means in the context of the data set and refer to actual feature values. Your current description is too generic.
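To make the direction of the weights concrete, here is a toy worked example with made-up counts (NOT the real COMPAS numbers), treating "No" recidivism as the favorable class and Caucasian as the privileged group:

```python
import pandas as pd

# Toy counts, made up purely to illustrate the direction of the weights.
df = pd.DataFrame({
    "race":  ["Caucasian"] * 10 + ["Other"] * 10,
    "recid": ["No"] * 7 + ["Yes"] * 3 + ["No"] * 3 + ["Yes"] * 7,
})

n = len(df)
p_group = df["race"].value_counts(normalize=True)
p_class = df["recid"].value_counts(normalize=True)
p_joint = df.groupby(["race", "recid"]).size() / n

for (g, c), p_gc in p_joint.items():
    print(f"{g:>9} / {c:<3} weight = {p_group[g] * p_class[c] / p_gc:.3f}")
# Caucasian / No  weight = 0.714   privileged + favorable     -> down-weighted
# Caucasian / Yes weight = 1.667   privileged + unfavorable   -> up-weighted
#     Other / No  weight = 1.667   unprivileged + favorable   -> up-weighted
#     Other / Yes weight = 0.714   unprivileged + unfavorable -> down-weighted
```

With these counts, the over-represented combinations (Caucasian/No and Other/Yes) are down-weighted to about 0.71, while the under-represented ones are up-weighted to about 1.67, which is exactly the pattern the paragraph above describes.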
## Orange use case
Now that we have grasped how the Reweighing widget functions and its applications, let us explore a real-world example of using it to assign weights to data. For this illustration, we will use the [Compas dataset](https://github.com/propublica/compas-analysis). Unlike in the example from the previous blog post, we will not use the As Fairness widget to select fairness attributes, because datasets with a fairness tag come with default fairness attributes. Specifically, for the Compas dataset, "race" is the protected attribute, with "Caucasian" as the privileged value of that attribute.
You should briefly describe what the data set is about and what we are predicting. Otherwise users cannot understand the interpretation.
I know you have a link, but each individual blog post should be mostly self-contained.
Another way to see the effects of using the Reweighing widget on a dataset is to use a Data Table widget, where we can see that a new meta attribute called "weights" has been added to the dataset. This attribute contains the weights assigned to each instance.
Again, what does this tell us? Interpret some of the weights for us.
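Since reweighing assigns one weight per (group, class) combination, interpreting the weights column only requires looking at four numbers for a binary class. A hypothetical sketch, assuming the reweighed data has been exported to a pandas DataFrame with race, two_year_recid, and weights columns:

```python
import pandas as pd

def summarize_weights(reweighed: pd.DataFrame) -> pd.DataFrame:
    """One row per (race, class) combination; column names are assumed.

    Every instance in the same (race, class) group carries the same
    weight, so the first value per group summarizes the whole table.
    """
    return (reweighed
            .groupby(["race", "two_year_recid"])["weights"]
            .first()
            .reset_index())
```

Weights above 1 mark combinations the model should pay more attention to; weights below 1 mark over-represented ones.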
Detecting bias is only the first step in ensuring fair machine learning. The next step is to mitigate the bias. This workflow illustrates removing bias at the dataset level using the Reweighing widget on the data. Initially, we load the dataset and split it into training and validation subsets. We then check for bias in the validation set before reweighing. Using the training set, we train the reweighing algorithm and apply it to the validation set. Finally, we check for bias in the reweighed validation set. We can also visualize the effect of the reweighing using a Box Plot.
"on the data" - remove
"we load the dataset and split it" -> "we split the data" (it is obvious that we had to load it somehow
Force-pushed from 7e4a99a to 8e97c82.
Second of the fairness workflows and blogs.