How to design negative samples for Florence-2 model training? #52

David-19940718 · 2024-09-18T02:48:31Z

Search before asking

I have searched the Multimodal Maestro issues and found no similar feature requests.

Question

We currently have a good understanding of how to create positive samples for the Florence-2 model, using a format like this:

{
  "image": "IMG_20220316_144445_jpg.rf.a79f523e54855af2323f0cfdb9a4dedc.jpg",
  "prefix": "<OD>",
  "suffix": "5 of hearts<loc_54><loc_213><loc_291><loc_598>6 of hearts<loc_205><loc_251><loc_471><loc_670>7 of hearts<loc_363><loc_309><loc_688><loc_797>8 of hearts<loc_598><loc_395><loc_973><loc_974>"
}
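
For readers unfamiliar with this format, the suffix is a flat sequence of "label followed by four location tokens". The sketch below splits it into (label, box) pairs. It assumes the four `<loc_*>` tokens are in x1, y1, x2, y2 order; `parse_od_suffix` is a hypothetical helper name, not something provided by the library.

```python
import re
from typing import List, Tuple

Box = Tuple[int, int, int, int]

# One detection: a label followed by exactly four <loc_N> tokens.
# Assumption: the token order is x1, y1, x2, y2.
_DETECTION = re.compile(r"([^<]+)<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>")


def parse_od_suffix(suffix: str) -> List[Tuple[str, Box]]:
    """Split an <OD> suffix into (label, (x1, y1, x2, y2)) pairs."""
    detections = []
    for match in _DETECTION.finditer(suffix):
        label = match.group(1).strip()
        x1, y1, x2, y2 = (int(v) for v in match.groups()[1:])
        detections.append((label, (x1, y1, x2, y2)))
    return detections


if __name__ == "__main__":
    example = (
        "5 of hearts<loc_54><loc_213><loc_291><loc_598>"
        "6 of hearts<loc_205><loc_251><loc_471><loc_670>"
    )
    for label, box in parse_od_suffix(example):
        print(label, box)
```

An empty suffix parses to an empty list, which is convenient when inspecting samples that contain no objects.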

However, I'm unclear on how to properly design negative samples for training. Negative samples are crucial for improving the model's ability to discriminate and reduce false positives. Some questions I have:

Should negative samples use the same image but with incorrect object descriptions?
Do we need to use completely unrelated images and descriptions?
How do we handle the location tags for negative samples?
What's the recommended ratio of positive to negative samples in the training set?

Any guidance or best practices for creating effective negative samples would be greatly appreciated. This will help ensure we're training the Florence-2 model optimally for object detection tasks.

Additional

If there are any existing resources, documentation, or examples specifically for Florence-2 negative sample creation, please point me in that direction. Also, if there are any tools or scripts the team recommends for generating or augmenting negative samples, that information would be very helpful.

David-19940718 · 2024-09-18T05:51:01Z

We're currently experiencing a situation where our model's mAP (mean Average Precision) metrics are degrading while the loss values suggest overfitting. Our current saving strategy is based solely on validation loss, as shown in the following code snippet:

    def save_best(self, processor: AutoProcessor, model: AutoModelForCausalLM, val_loss: float):
        """Saves the best model checkpoint if the validation loss improves.

        Args:
            processor (AutoProcessor): The processor to save.
            model (AutoModelForCausalLM): The model to save.
            val_loss (float): The current validation loss.
        """
        if val_loss < self.best_val_loss:
            self.best_val_loss = val_loss
            save_model(self.best_checkpoint_dir, processor, model)
            print(f"New best model saved with validation loss: {self.best_val_loss}")

I've been looking at our model saving strategy, and I'm curious about your thoughts on its effectiveness. While we're using validation loss as the primary metric for saving the best model, it seems that our mAP scores are not reflecting the improvements we see in the loss. Do you think relying solely on validation loss is the best approach for designing our model saving criteria?

Would it be more beneficial to consider a combination of metrics, such as both validation loss and mAP, to ensure we're not just minimizing loss but also improving the model's precision? Or are there other metrics or strategies you believe would be more suitable for our current situation?

Looking forward to your insights on this matter.

SkalskiP · 2024-09-18T11:39:31Z

Hi @David-19940718 👋🏻 First of all, I'm thrilled to have users like you who are eager to experiment early on and push the library forward.

Regarding negative samples, I don't think there are any established best practices at the moment, but I'll ask a few people involved in VLM training about it.

One idea that should be simple to implement is to use the COCO dataset as a source of negative samples, for example by splitting training into two phases. In the first phase, you fine-tune only on your dataset; in the second, on a mix of your dataset and COCO. This way the model quickly learns your classes in the first phase and becomes more resistant to overfitting in the second.
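
A minimal sketch of the two-phase idea above, not a tested recipe. It assumes a negative sample is an annotation entry with an empty suffix (an image containing none of the target classes); nothing in this thread confirms that convention. `make_negative_entry`, `build_phase_entries`, and `negative_ratio` are hypothetical names.

```python
import random
from typing import Dict, List

Entry = Dict[str, str]


def make_negative_entry(image_name: str, prefix: str = "<OD>") -> Entry:
    # Assumption: an empty suffix means "no objects of interest in this image".
    return {"image": image_name, "prefix": prefix, "suffix": ""}


def build_phase_entries(
    positives: List[Entry],
    negative_images: List[str],
    negative_ratio: float,
    seed: int = 0,
) -> List[Entry]:
    """Return a shuffled list with about negative_ratio negatives per positive.

    negative_ratio=0.0 reproduces phase one (own dataset only);
    a value above zero gives the mixed phase two.
    """
    rng = random.Random(seed)
    wanted = min(len(negative_images), int(len(positives) * negative_ratio))
    negatives = [make_negative_entry(name) for name in rng.sample(negative_images, wanted)]
    entries = list(positives) + negatives
    rng.shuffle(entries)
    return entries
```

Writing each entry with `json.dumps` on its own line yields a JSONL file in the same shape as the positive example in the question. The right ratio is an open question here, so `negative_ratio` is best treated as a hyperparameter to tune against validation mAP.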

As for your second question, the ability to use any metric as the condition for saving a checkpoint sounds very reasonable. I'll try to open a GitHub issue to track adding such support.
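
A metric-agnostic saving condition like the one discussed above could look roughly like this. It is only a sketch: `BestMetricTracker` is a hypothetical name, not an existing class in the library, and the caller is expected to do the actual saving.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestMetricTracker:
    """Tracks the best value of one metric and reports when it improves."""

    name: str                     # label only, e.g. "val_loss" or "mAP"
    mode: str = "min"             # "min" for losses, "max" for scores such as mAP
    best: Optional[float] = None  # best value seen so far, None before the first update

    def __post_init__(self) -> None:
        if self.mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")

    def update(self, value: float) -> bool:
        """Return True (and record value) if value beats the best seen so far."""
        improved = (
            self.best is None
            or (self.mode == "min" and value < self.best)
            or (self.mode == "max" and value > self.best)
        )
        if improved:
            self.best = value
        return improved
```

In a training loop this would replace the hard-coded `val_loss < self.best_val_loss` check: create `BestMetricTracker(name="mAP", mode="max")` once, then call the existing `save_model(...)` only when `tracker.update(current_map)` returns `True`.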

David-19940718 · 2024-09-19T08:20:17Z

Thank you for your detailed and encouraging response. 😄

David-19940718 · 2024-09-23T03:16:48Z

Hi @SkalskiP,

After introducing some data augmentation, I've observed a significant reduction in overfitting. Moreover, under the same experimental conditions, mAP has improved by several percentage points.

It might be worth considering adding this feature in a future release.

SkalskiP · 2024-09-24T13:37:30Z

Hi @David-19940718 👋🏻 That looks fantastic! Could you tell me exactly what strategies you employed?

David-19940718 · 2024-09-25T09:21:52Z

Sure! The main strategies I employed are:

Random horizontal flipping (50% chance)
Color jittering (adjusting brightness, contrast, saturation, and hue)

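from torch.utils.data import Dataset
from torchvision import transforms


class DetectionDataset(Dataset):
    def __init__(self, jsonl_file_path: str, image_directory_path: str, split_name: str):
        # JSONLDataset is assumed to be defined or imported elsewhere (not shown in this snippet).
        self.dataset = JSONLDataset(jsonl_file_path, image_directory_path)
        self.mode = split_name
        # Only the training split is augmented; other splits keep images unchanged.
        self.transform = None
        if split_name == "train":
            self.transform = transforms.Compose([
                transforms.RandomHorizontalFlip(p=0.5),
                transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)
            ])

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, data = self.dataset[idx]
        prefix = data["prefix"]
        suffix = data["suffix"]
        # Apply data augmentation to the image only.
        # Note: the flip does not update the <loc_*> tokens in the suffix.
        if self.transform is not None:
            image = self.transform(image)

        return prefix, suffix, image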

SkalskiP · 2024-09-25T10:29:21Z

Hi @David-19940718 👋🏻 Oh, so you ended up using fairly traditional data augmentation techniques?

From what I see, you applied flipping. I understand that you also had to augment the object detection suffix in the process.
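
The suffix adjustment mentioned above could be sketched as follows. This is an illustration under stated assumptions, not the code used in this thread: it assumes each box is four tokens in x1, y1, x2, y2 order and that coordinates are quantized into 1000 bins (0 to 999). `hflip_od_suffix` and `num_bins` are hypothetical names.

```python
import re

# Assumptions: each box is four tokens in x1, y1, x2, y2 order, and
# coordinates are quantized into num_bins bins (0 .. num_bins - 1).
_BOX = re.compile(r"<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>")


def hflip_od_suffix(suffix: str, num_bins: int = 1000) -> str:
    """Mirror every box in an <OD> suffix around the vertical image axis."""
    last = num_bins - 1

    def flip(match) -> str:
        x1, y1, x2, y2 = (int(v) for v in match.groups())
        # Mirroring swaps the roles of the left and right edges,
        # so the new left edge comes from the old right edge.
        new_x1, new_x2 = last - x2, last - x1
        return f"<loc_{new_x1}><loc_{y1}><loc_{new_x2}><loc_{y2}>"

    return _BOX.sub(flip, suffix)


if __name__ == "__main__":
    print(hflip_od_suffix("5 of hearts<loc_54><loc_213><loc_291><loc_598>"))
```

To keep image and suffix in sync, the flip decision has to be made once per sample, for example by drawing `random.random() < 0.5` and then applying both `torchvision.transforms.functional.hflip` to the image and this function to the suffix. `RandomHorizontalFlip` inside a `Compose` does not expose whether it flipped.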

David-19940718 · 2024-09-25T16:22:59Z

Yes, I just did a simple initial validation. I applied some basic data augmentation techniques to get started and test things out. 😄

SkalskiP · 2024-09-25T19:23:30Z

@David-19940718 would you perhaps have a moment to draft a PR introducing basic data augmentation?

David-19940718 added the question (Further information is requested) label on Sep 18, 2024
