This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

Reweighing dataset workflow and blog #357

Merged

merged 2 commits into biolab:master on Sep 19, 2023

Conversation

ZanMervic
Contributor

The second of the fairness workflows and blog posts.


+++

In the [previous blog post]({{< ref "/blog/2023/2023-08-23-fairness-dataset-bias.md" >}}), we introduced the Orange fairness add-on along with the Dataset Bias and As Fairness widgets. We also demonstrated how to use them to detect bias in a dataset and visualize the results for a better understanding. In this blog post, we introduce the Reweighing widget, which we can use to mitigate bias in a dataset so that machine learning models trained on it are fairer.
Member

Don't link to .md. This link should probably point to "/blog/2023/2023-08-23-fairness-dataset-bias/". Then the build would not fail.


{{< window-screenshot src="/blog_img/2023/2023-08-24-fairness-reweighing-dataset-box-plot.png" >}}

The box plot widget reveals that lower weights were assigned to instances from unprivileged groups with unfavorable class values and from privileged groups with favorable class values, while the opposite combinations received higher weights. This aligns with the expected outcome of the reweighing algorithm: it assigns weights so that the model prioritizes learning from underrepresented groups while de-emphasizing overrepresented ones.
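
To make that pattern concrete, here is a minimal plain-Python sketch of the standard reweighing formula, weight(s, y) = P(s) * P(y) / P(s, y), as described by Kamiran and Calders; that the widget uses exactly this implementation is an assumption here, and the `priv`/`unpriv` groups and `fav`/`unfav` class values are made-up placeholders, not actual dataset columns:

```python
from collections import Counter

def reweighing_weights(protected, labels):
    """weight(s, y) = P(s) * P(y) / P(s, y), the standard reweighing formula."""
    n = len(labels)
    p_s = Counter(protected)                 # counts per protected value
    p_y = Counter(labels)                    # counts per class value
    p_sy = Counter(zip(protected, labels))   # joint counts
    return [(p_s[s] / n) * (p_y[y] / n) / (p_sy[s, y] / n)
            for s, y in zip(protected, labels)]

# Made-up biased sample: the privileged group mostly gets the favorable class.
protected = ["priv"] * 6 + ["unpriv"] * 4
labels = ["fav"] * 5 + ["unfav"] + ["fav"] + ["unfav"] * 3
for s, y, w in zip(protected, labels, reweighing_weights(protected, labels)):
    print(s, y, round(w, 2))
```

The overrepresented combinations (privileged with favorable, unprivileged with unfavorable) come out below 1, and the underrepresented combinations come out above 1, matching the pattern described above.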
Member

Where can we see weights in the box plot? The only thing I see in it is that every race other than Caucasian is predicted "Yes".

Member

Also comment on what this means in the context of the data set and refer to actual feature values. Your current description is too generic.


## Orange use case

Now that we have grasped how the Reweighing widget works and what it is used for, let us explore a real-world example of using it to assign weights to data. For this illustration, we will use the [Compas dataset](https://github.com/propublica/compas-analysis). Unlike in the previous blog post, we will not use the As Fairness widget to select fairness attributes, because datasets with a fairness tag come with default fairness attributes. Specifically, for the Compas dataset, "race" is the protected attribute, and "Caucasian" is its privileged value.
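
As context for the interpretation below: the Compas data concerns criminal defendants, and the class value records whether a defendant reoffended within two years. A quick way to peek at the raw data is sketched here; the file name and the `race` and `two_year_recid` columns are assumptions based on the linked repository, not taken from this post:

```python
import pandas as pd

# Load the two-year recidivism table from the propublica/compas-analysis
# repository (assumed file name and column names).
url = ("https://raw.githubusercontent.com/propublica/"
       "compas-analysis/master/compas-scores-two-years.csv")
df = pd.read_csv(url)

# Share of defendants who reoffended within two years, per race.
print(df.groupby("race")["two_year_recid"].mean().round(2))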
Member

You should shortly describe what the data set is about and what we are predicting. Otherwise users cannot understand the interpretation.

I know you have a link, but each individual blog post should be mostly self-contained.


Another way to see the effect of the Reweighing widget on a dataset is the Data Table widget, where we can see that a new meta attribute called weights has been added to the dataset, containing the weight assigned to each instance.
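
Outside of Orange, the same view can be mimicked by appending the weights as an extra column. This sketch reuses the hypothetical `reweighing_weights` helper and the toy `protected`/`labels` data from the earlier snippet:

```python
import pandas as pd

# One row per instance, plus a "weights" column analogous to the meta
# attribute the Reweighing widget adds (toy values, not the Compas data).
df = pd.DataFrame({"group": protected, "outcome": labels})
df["weights"] = reweighing_weights(protected, labels)
print(df)
```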
Member

Again, what does this tell us? Interpret some of the weights for us.

weight = 1001
+++

Detecting bias is only the first step in ensuring fair machine learning. The next step is to mitigate the bias. This workflow illustrates removing bias at the dataset level using the Reweighing widget on the data. Initially, we load the dataset and split it into training and validation subsets. We then check for bias in the validation set before reweighing. Using the training set, we train the reweighing algorithm and apply it to the validation set. Finally, we check for bias in the reweighed validation set. We can also visualize the effect of the reweighing using a Box Plot.
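
The essential step of the workflow, fitting the reweighing statistics on the training set and then applying them to the validation set, can be sketched in plain Python. This is a hedged illustration of the idea, not the widget's actual implementation:

```python
from collections import Counter

def fit_reweighing(protected, labels):
    """Estimate weight(s, y) = P(s) * P(y) / P(s, y) on the training data."""
    n = len(labels)
    p_s, p_y = Counter(protected), Counter(labels)
    p_sy = Counter(zip(protected, labels))
    return {(s, y): (p_s[s] / n) * (p_y[y] / n) / (p_sy[s, y] / n)
            for (s, y) in p_sy}

def apply_reweighing(table, protected, labels, default=1.0):
    # Unseen (group, class) combinations fall back to a neutral weight of 1.
    return [table.get((s, y), default) for s, y in zip(protected, labels)]

# Toy training set, biased toward favorable outcomes for the privileged group.
train_s = ["priv", "priv", "priv", "unpriv", "unpriv", "unpriv"]
train_y = ["fav", "fav", "unfav", "fav", "unfav", "unfav"]
table = fit_reweighing(train_s, train_y)

# Weights for two validation instances, from training statistics only.
print(apply_reweighing(table, ["unpriv", "priv"], ["fav", "fav"]))  # [1.5, 0.75]
```

Fitting only on the training split means the validation set's own statistics never influence its weights, so the subsequent bias check on the reweighed validation set remains an honest estimate.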
Member

"on the data" - remove
"we load the dataset and split it" -> "we split the data" (it is obvious that we had to load it somehow

@markotoplak merged commit 59ad240 into biolab:master on Sep 19, 2023
1 of 4 checks passed