This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

Reweighing dataset workflow and blog #357

Merged

merged 2 commits into biolab:master on Sep 19, 2023

Conversation

ZanMervic
Contributor

The second of the fairness workflows and blog posts.


+++

In the [previous blog post]({{< ref "/blog/2023/2023-08-23-fairness-dataset-bias.md" >}}), we introduced the Orange fairness add-on along with the Dataset Bias and As Fairness widgets. We also demonstrated how to use them to detect bias in a dataset and visualize the results for a better understanding. In this blog post, we introduce the Reweighing widget, which we can use to mitigate bias in a dataset so that machine learning models trained on it are fairer.
Member

Don't link to .md. This link should probably point to "/blog/2023/2023-08-23-fairness-dataset-bias/". Then the build would not fail.


{{< window-screenshot src="/blog_img/2023/2023-08-24-fairness-reweighing-dataset-box-plot.png" >}}

The box plot widget reveals that lower weights were assigned to instances from unprivileged groups with unfavorable class values and from privileged groups with favorable class values, while the opposite combinations received higher weights. This aligns with the expected outcome of the reweighing algorithm: it assigns weights so that the model prioritizes learning from underrepresented groups while de-emphasizing overrepresented ones.
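
To make that pattern concrete, here is a minimal plain-Python sketch of the standard reweighing formula, weight(s, y) = P(s) * P(y) / P(s, y), as described by Kamiran and Calders; that the widget uses exactly this implementation is an assumption here, and the `priv`/`unpriv` groups and `fav`/`unfav` class values are made-up placeholders, not actual dataset columns:

```python
from collections import Counter

def reweighing_weights(protected, labels):
    """weight(s, y) = P(s) * P(y) / P(s, y), the standard reweighing formula."""
    n = len(labels)
    p_s = Counter(protected)                 # counts per protected value
    p_y = Counter(labels)                    # counts per class value
    p_sy = Counter(zip(protected, labels))   # joint counts
    return [(p_s[s] / n) * (p_y[y] / n) / (p_sy[s, y] / n)
            for s, y in zip(protected, labels)]

# Made-up biased sample: the privileged group mostly gets the favorable class.
protected = ["priv"] * 6 + ["unpriv"] * 4
labels = ["fav"] * 5 + ["unfav"] + ["fav"] + ["unfav"] * 3
for s, y, w in zip(protected, labels, reweighing_weights(protected, labels)):
    print(s, y, round(w, 2))
```

The overrepresented combinations (privileged with favorable, unprivileged with unfavorable) come out below 1, and the underrepresented combinations come out above 1, matching the pattern described above.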
Member

Where can we see weights in the box plot? The only thing I see in it is that every race other than Caucasian is predicted "Yes".

Member

Also comment on what this means in the context of the data set and refer to actual feature values. Your current description is too generic.


## Orange use case

Now that we have grasped how the Reweighing widget works and what it is used for, let us explore a real-world example of using it to assign weights to data. For this illustration, we will use the [Compas dataset](https://github.com/propublica/compas-analysis). Unlike in the previous blog post, we will not use the As Fairness widget to select fairness attributes, because datasets with a fairness tag come with default fairness attributes. Specifically, for the Compas dataset, "race" is the protected attribute, and "Caucasian" is its privileged value.
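
As context for the interpretation below: the Compas data concerns criminal defendants, and the class value records whether a defendant reoffended within two years. A quick way to peek at the raw data is sketched here; the file name and the `race` and `two_year_recid` columns are assumptions based on the linked repository, not taken from this post:

```python
import pandas as pd

# Load the two-year recidivism table from the propublica/compas-analysis
# repository (assumed file name and column names).
url = ("https://raw.githubusercontent.com/propublica/"
       "compas-analysis/master/compas-scores-two-years.csv")
df = pd.read_csv(url)

# Share of defendants who reoffended within two years, per race.
print(df.groupby("race")["two_year_recid"].mean().round(2))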
Member

You should shortly describe what the data set is about and what we are predicting. Otherwise users cannot understand the interpretation.

I know you have a link, but each individual blog post should be mostly self-contained.


Another way to see the effect of the Reweighing widget on a dataset is the Data Table widget, where we can see that a new meta attribute called weights has been added to the dataset, containing the weight assigned to each instance.
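
Outside of Orange, the same view can be mimicked by appending the weights as an extra column. This sketch reuses the hypothetical `reweighing_weights` helper and the toy `protected`/`labels` data from the earlier snippet:

```python
import pandas as pd

# One row per instance, plus a "weights" column analogous to the meta
# attribute the Reweighing widget adds (toy values, not the Compas data).
df = pd.DataFrame({"group": protected, "outcome": labels})
df["weights"] = reweighing_weights(protected, labels)
print(df)
```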
Member

Again, what does this tell us? Interpret some of the weights for us.

weight = 1001
+++

Detecting bias is only the first step in ensuring fair machine learning. The next step is to mitigate the bias. This workflow illustrates removing bias at the dataset level using the Reweighing widget on the data. Initially, we load the dataset and split it into training and validation subsets. We then check for bias in the validation set before reweighing. Using the training set, we train the reweighing algorithm and apply it to the validation set. Finally, we check for bias in the reweighed validation set. We can also visualize the effect of the reweighing using a Box Plot.
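
The essential step of the workflow, fitting the reweighing statistics on the training set and then applying them to the validation set, can be sketched in plain Python. This is a hedged illustration of the idea, not the widget's actual implementation:

```python
from collections import Counter

def fit_reweighing(protected, labels):
    """Estimate weight(s, y) = P(s) * P(y) / P(s, y) on the training data."""
    n = len(labels)
    p_s, p_y = Counter(protected), Counter(labels)
    p_sy = Counter(zip(protected, labels))
    return {(s, y): (p_s[s] / n) * (p_y[y] / n) / (p_sy[s, y] / n)
            for (s, y) in p_sy}

def apply_reweighing(table, protected, labels, default=1.0):
    # Unseen (group, class) combinations fall back to a neutral weight of 1.
    return [table.get((s, y), default) for s, y in zip(protected, labels)]

# Toy training set, biased toward favorable outcomes for the privileged group.
train_s = ["priv", "priv", "priv", "unpriv", "unpriv", "unpriv"]
train_y = ["fav", "fav", "unfav", "fav", "unfav", "unfav"]
table = fit_reweighing(train_s, train_y)

# Weights for two validation instances, from training statistics only.
print(apply_reweighing(table, ["unpriv", "priv"], ["fav", "fav"]))  # [1.5, 0.75]
```

Fitting only on the training split means the validation set's own statistics never influence its weights, so the subsequent bias check on the reweighed validation set remains an honest estimate.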
Member

"on the data" - remove
"we load the dataset and split it" -> "we split the data" (it is obvious that we had to load it somehow

@markotoplak merged commit 59ad240 into biolab:master on Sep 19, 2023
1 of 4 checks passed