Different ranges in the results graphs #991

Open
Josephts1 opened this issue Jan 22, 2025 · 2 comments

Labels
detect (Object Detection issues, PR's), question (Further information is requested)

Comments

@Josephts1

Question

I'm looking for the best way to train a YOLO model for mandarin detection with my own dataset.
First, I tried a pre-trained model (model=YOLO('yolo11s.pt')). I got fairly good results, but the curves showed a lot of peaks and troughs (see image 1) and training took too long.
Second, I tried the same pre-trained model but with the first 24 layers of the YOLO model frozen. I got better results (higher mAP50-95 and better convergence), but there were still peaks and troughs (see image 2).
Finally, I trained a model from scratch (model=YOLO('yolo11s.yaml')) and got the best result so far (see image 3).

My question is: if you look at the vertical axes of the train/box_loss, val/box_loss, train/cls_loss, val/cls_loss, train/dfl_loss and val/dfl_loss graphs, they differ a lot between the pre-trained model and the model trained from scratch. Does anyone know why these ranges are so different? One has values between 0 and 1, while the other has values between 0 and 5.

(Image 1: results for the pre-trained model)
(Image 2: results with the first 24 layers frozen)
(Image 3: results for the model trained from scratch)
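
For reference, the three setups above correspond roughly to the following sketch (the dataset file mandarins.yaml and the hyperparameter values are placeholders, not the exact configuration used):

```python
from ultralytics import YOLO

# 1) Fine-tune the pre-trained checkpoint
model = YOLO("yolo11s.pt")
model.train(data="mandarins.yaml", epochs=100, imgsz=640)

# 2) Fine-tune the pre-trained checkpoint with the first 24 layers frozen
model = YOLO("yolo11s.pt")
model.train(data="mandarins.yaml", epochs=100, imgsz=640, freeze=24)

# 3) Train from scratch using only the architecture definition (random weights)
model = YOLO("yolo11s.yaml")
model.train(data="mandarins.yaml", epochs=100, imgsz=640)
```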

Additional

No response

Josephts1 added the question (Further information is requested) label on Jan 22, 2025
UltralyticsAssistant added the detect (Object Detection issues, PR's) label on Jan 22, 2025
@UltralyticsAssistant
Member

👋 Hello @Josephts1, thank you for raising this issue about Ultralytics HUB 🚀! It looks like you’re working on training YOLO models for mandarin detection, and we appreciate you sharing your training details along with the results graphs 📊.

To help address your question, please take a look at our HUB Docs for guidance on best practices and potential insights:

  • Quickstart. Begin training and managing YOLO models with HUB in just a few steps.
  • Models: Training and Exporting. Learn how to effectively train YOLO models and export them for deployment.
  • Metrics. Understand loss functions (e.g., box_loss, cls_loss, dfl_loss) and performance metrics to interpret your training and evaluation results.

If you're observing varying loss graph ranges, this could depend on the model's architecture, training dynamics, or specific dataset characteristics (scale, variability, annotations). Please share additional context, such as:

  1. The dataset details: How it's labeled and structured, along with any preprocessing steps.
  2. The exact training configuration: Hyperparameters, optimizer settings, or any custom modifications made.
  3. The commands or code snippets used: Sharing these could help us investigate further.

If this is a potential 🐛 Bug Report, please also include a minimum reproducible example (MRE) to assist us in reproducing the behavior.

Our engineering team will look into this further and get back to you soon. Your patience and detailed input are much appreciated, as they help us continue improving the HUB platform! 🚀😊

@pderrenger
Member

@Josephts1 thank you for your detailed explanation and the accompanying graphs! It’s great to see your experimentation with different training strategies for mandarin detection. The variation in loss ranges you observed is a common occurrence and can be explained by the following factors:

  1. Pre-trained Model vs. Training from Scratch:

    • Pre-trained models, like yolo11s.pt, are initialized with weights optimized on large datasets (e.g., COCO). These weights are already well-tuned, resulting in smaller initial losses. The loss values typically start closer to the optimal range (e.g., between 0 and 1) when fine-tuning such models.
    • On the other hand, when training from scratch (using yolo11s.yaml), the model starts with randomly initialized weights. The loss values are initially much higher (e.g., 5 or above) because the model has no prior knowledge and must learn everything from the ground up.
  2. Freezing Layers:

    • Freezing layers (e.g., the first 24 layers in your experiment) reduces the number of parameters being updated, often leading to smoother convergence. However, even with frozen layers, the pre-trained weights still influence the initial loss ranges, keeping them relatively low compared to training from scratch.
  3. Dataset-Specific Characteristics:

    • The variability in loss ranges can also depend on your dataset. Factors like class imbalance, annotation quality, and dataset size can lead to differences in how the model optimizes during training. For instance, if your dataset has very different characteristics from COCO, training from scratch might better align the model to your specific task, as seen in your results.
  4. Loss Function Dynamics:

    • The different loss components (e.g., box_loss, cls_loss, dfl_loss) have their own scales and dynamics based on the training strategy and the model's initialization. Pre-trained models might start closer to their optimal point, whereas training from scratch involves a broader exploration of the parameter space, leading to higher initial loss values.
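
One way to see this concretely is to compare the per-epoch loss values each run writes to its results.csv file. A minimal sketch, assuming the default runs/detect/trainN output layout (the run directory names are placeholders):

```python
import pandas as pd

runs = {
    "pretrained": "runs/detect/train/results.csv",     # placeholder run directories
    "from_scratch": "runs/detect/train3/results.csv",
}

for name, path in runs.items():
    df = pd.read_csv(path)
    df.columns = df.columns.str.strip()  # some versions pad column names with spaces
    first, last = df["train/box_loss"].iloc[0], df["train/box_loss"].iloc[-1]
    print(f"{name}: train/box_loss starts at {first:.2f} and ends at {last:.2f}")
```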

Recommendations:

  • Choose the Best Strategy: Based on your observations, training from scratch seems to yield the best results for your dataset. This makes sense if your dataset (mandarin detection) is very different from the COCO dataset used to pre-train the model.
  • Monitor Class Distribution: Ensure your dataset has a balanced class distribution to avoid potential biases in loss calculations.
  • Use Validation Metrics: Focus on validation metrics like mAP50-95 to evaluate model performance, rather than the absolute loss values, as these are more indicative of real-world performance.
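
For example, after training you can run a standalone validation pass and read mAP50-95 directly instead of comparing raw loss magnitudes across runs (the weights path and dataset file below are placeholders):

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # placeholder path to trained weights
metrics = model.val(data="mandarins.yaml")         # placeholder dataset file

print(f"mAP50-95: {metrics.box.map:.3f}")
print(f"mAP50:    {metrics.box.map50:.3f}")
```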

If you'd like to further analyze or adjust training behavior, consider visualizing additional metrics or leveraging tools provided in the Ultralytics HUB. The HUB allows for streamlined dataset management, training, and monitoring of results.

Let me know if you have further questions or need clarification! 😊
