Training YOLOv8 with Synthetic Data from Falcon
February 12, 2025 · Written by Rebekah Bogdanoff and Mish Sukharev
Comparing YOLOv8 performance in detecting Cheerios boxes in synthetic images (bottom row) and real images (top row)

In this blog, we’ll give you a high-level overview of training a YOLOv8 model using Falcon, our digital twin simulation platform. We cover how to create scenarios, generate data from digital twins, and train and test your model, and we introduce best practices for getting high-performance results.

This process is also covered in the video (below) by Duality's Community Manager, Rebekah Bogdanoff, making it easy to follow along with the entire exercise.

Note: This blog offers a glimpse into the content of Falcon EDU Exercises 1 and 2. Anyone can try these exercises for themselves and access the more detailed documentation that walks through the entire process. Simply sign up for the free EDU tier of Falcon and start today: Create Account - FalconCloud by Duality AI.

Why YOLOv8?

For these exercises, we’ll use YOLOv8 (You Only Look Once, version 8), a powerful model optimized for lightweight hardware and high-speed performance. While it may not be as advanced as some newer models, its excellent compute-cost-to-performance ratio makes it versatile across various setups. Pre-trained YOLO models save time and boost accuracy, making them a great starting point. While we focus on object detection here, YOLOv8 also supports tasks like pose estimation and instance segmentation, which makes it useful for broader AI explorations.
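To give a sense of how little code it takes to get started, here is a minimal sketch using the open-source ultralytics package (the sample image URL is Ultralytics' own demo photo):

```python
from ultralytics import YOLO

# Load a pre-trained YOLOv8 detection model; the "n" (nano) variant is the
# smallest and fastest, well suited to lightweight hardware.
model = YOLO("yolov8n.pt")  # weights download automatically on first use

# Run inference on a sample image and display the detections.
results = model("https://ultralytics.com/images/bus.jpg")
results[0].show()
```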

Why Object Detection?

Object detection is the foundation for a wide variety of perception modules common in today's robots and AI-powered systems, enabling vital tasks that include:

  • Navigation for autonomous vehicles
  • Obstacle avoidance for drones
  • Registerless checkout
  • And much, much more!

While object detection is a well-established task, it is constantly being improved upon by new models and new techniques. These exercises allow us to introduce a very powerful and common model to individuals who may be newer to the field, and to teach a wide array of vital concepts in an easily accessible format.

Why Use Synthetic Data?

Synthetic data is generated from simulated real-world environments to create labeled datasets for training AI models. Its benefits include:

  • Efficiency: Eliminate costly and time-consuming data collection and labeling efforts.
  • Customizability: Tailor datasets for specific tasks that may require detecting rare edge cases or operating in varying lighting conditions.
  • Scalability: Generate diverse datasets on demand.

For these exercises, we trained YOLOv8 to detect two objects, a cereal box and a soup can, in an indoor setting. Using Falcon, we replicated real-world conditions while introducing randomness for more robust results. This includes object pose variations, visual occlusions, and diverse lighting conditions.

Step 1: Creating a Dataset in FalconEditor

For generating synthetic data, we use FalconEditor, our integrated development environment that lets us create any scenario of interest, deploy virtual sensors, run simulations, and generate data, all in one place.

Our documentation (1. Binary Install - Documentation - FalconCloud by Duality AI) walks you through the installation and configuration of FalconEditor. Just create a free EDU account to get started.

Simulation Setup

The scenarios contain four main components:

  • Digital Twin of an Object for Detection: A Cheerios box for Exercise 1 and a soup can (or twin of your choosing) for Exercise 2.
  • Environment: A small room simulating indoor lighting conditions. It’s crucial to align the virtual environment with real-world testing scenarios.
  • Digital Twins of Other Objects: Various other objects that add variety and realism to the training data and produce a more robust model.
  • Sensor: A camera for capturing the images.

Note: All of the digital twins and assets needed to run this scenario are provided for free in FalconCloud, which you can access with the EDU account.

Setting up the scenario with the hero object, an additional clutter object, and a virtual camera sensor.

Creating Intentional Data

While we could place the main object anywhere in the room and, similarly, place the camera in any desired location, our goal is to create an intentional, information-rich dataset that is indistinguishable from one that could exist in the real world (read more about how to make this type of robust dataset here). To achieve this, we need to consider some parameters.

A robust dataset includes:

  • Diverse views: A cereal box looks different from different angles, so we need to make sure all of the object's sides are represented.
  • Controlled randomness: Randomness is introduced by dropping objects to create new object poses, occlusions, groupings, and more. This approach, supported by Falcon’s physics system, captures diverse object poses and interactions, enhancing dataset richness. Camera angles are randomized for the same reason.
  • Comprehensive conditions: Controlling parameters like lighting, distance from the camera, prevalence of partial object occlusions, and more is vital for adequately training the model and preventing misalignment. Misaligned data in AI training is data that does not accurately represent the intended task or objective; it produces biased or unreliable model behavior, often manifesting as false positives or false negatives.

Adjusting Simulation Parameters with Python

Falcon makes it fast to take full control of simulation parameters. While Exercises 1 and 2 provide fully pre-configured parameters, the instructions still walk through the key variables and how to adjust them. These include:

Camera Parameters

It's a good idea to match the parameters of the simulated camera to those of the real camera used to take the testing images.

Post Processing Parameters

Post-processing covers all alterations made to an image after it has been generated in the scenario. These alterations are designed to mimic the artifacts found in photographs taken by real cameras with real optics. They include:

  • Focal distance: the distance at which an object will be in focus
  • Depth of field: the size of the area in focus and the amount of blurring applied to out-of-focus areas of the image
  • Vignetting: real-world lenses often produce a degree of light fall-off around the edges. This can look different for every lens and every sensor, but can be quickly replicated in FalconEditor.

Twin Parameters

These parameters define the starting conditions of the digital twins in the scenario – in this case the cereal box or soup can. For Exercises 1 and 2, main parameters include:

  • The digital twin of the item that the model is learning to detect
  • Max height from which the items are dropped into the scene
  • The area in which the twin will drop into the scene
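Falcon's actual Python interface is documented step by step in the exercises; the sketch below is purely illustrative, with every name invented for this example rather than taken from Falcon's real API. It simply shows the kinds of values these scripts expose:

```python
# Illustrative only: names are placeholders, not Falcon's real API.
camera_params = {
    "resolution": (1920, 1080),   # match the real camera used for testing images
    "focal_distance_m": 1.2,      # distance at which objects are in focus
    "depth_of_field": 0.4,        # blur applied to out-of-focus regions
    "vignette_intensity": 0.15,   # lens light fall-off toward the edges
}

twin_params = {
    "twin": "cheerios_box",       # the object the model learns to detect
    "max_drop_height_m": 0.75,    # max height items are dropped from
    "drop_area_m": (1.0, 1.0),    # area within which the twin lands
}
```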

Just Press Play! — Running the Simulation

Falcon can be customized with various modules designed for different synthetic data workflows. Here we’re using FalconVision, a module specifically designed to streamline data generation for training vision models. Once the scenario is set up, running it is literally as easy as pressing “Play”.

Generating a comprehensive synthetic dataset with a variety of angles, placements, and lighting.

The scenario's Python script guides both the positioning of objects and the image-capture process. The script first instructs Falcon to drop each clutter twin at a random location within the designated volume, then drops the chosen object twin, and finally positions the camera and captures an image. It repeats this process, capturing the twin in different locations, positions, and angles within the environment until it has created a full dataset. As the images are generated, they automatically save into output folders that will later be used to train the AI model. The captured images, along with the simultaneously generated YOLO annotations, form our synthetic dataset.
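In outline, the loop the script performs looks something like the sketch below (the function and attribute names are placeholders we've invented, not Falcon's actual API):

```python
import random

def generate_dataset(num_images, hero_twin, clutter_twins, camera, out_dir):
    """Outline of the capture loop; all calls here are placeholders."""
    for _ in range(num_images):
        # Drop clutter twins at random spots within the designated volume.
        for twin in clutter_twins:
            twin.drop(x=random.uniform(0, 1), y=random.uniform(0, 1))
        # Drop the hero object, then move the camera and capture.
        hero_twin.drop(x=random.uniform(0, 1), y=random.uniform(0, 1))
        camera.set_random_pose()
        # Images and YOLO annotations save together; labels come from the
        # scenario's ground truth, so they are pixel-accurate.
        camera.capture(out_dir=out_dir, annotations="yolo")
```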

This process takes about 30-40 minutes.

Note: How do we know the annotations are accurate? With digital twins, all of the ground truth information is contained within the scenario, and Falcon knows exactly which pixel belongs to what object, making annotation labels 100% accurate.

Step 2: Training YOLOv8

Understanding the Synthetic Data and How it’s Used in Training

The generated images and annotations are automatically separated into two sets: training data and validation data. The model uses the synthetic training data to adjust its parameters and “learn” to detect the chosen object. It then uses the separate synthetic validation data to evaluate how well it has learned to detect the object in images it did not train on.

Automatically separating the data into "train" and "validation" sets, as well as populating the "train" and "predict" scripts needed for the training process
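For reference, YOLO-style training expects a small dataset config that points at the train and validation folders. A minimal, illustrative data.yaml (the paths and class name are examples, not the exercises' exact layout) can be written like this:

```python
from pathlib import Path

# Typical YOLO dataset layout (illustrative):
#   dataset/images/train/*.png   dataset/labels/train/*.txt
#   dataset/images/val/*.png     dataset/labels/val/*.txt
data_yaml = """\
path: dataset
train: images/train
val: images/val
names:
  0: cereal_box
"""
Path("dataset").mkdir(exist_ok=True)
Path("dataset/data.yaml").write_text(data_yaml)
```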

Simplify with Google Colab

Training can be carried out on your local machine, but learners and casual users can use Google Colab, a free hosted notebook service, to train and test their model. This avoids potential installation conflicts or memory concerns. With Google Colab, you can:

  • Access GPUs free of charge.
  • Write and execute Python in your browser — a vital component, since the training process is controlled by a few Python scripts (these are covered in detail in the EDU exercises).
  • Train models without cumbersome configuration of your own machine.

Anyone with a Google account already has access to Colab, and you can learn more about it here: https://colab.research.google.com/. We also have instructions in our documentation.

Start Training

Exercise 1 and Exercise 2 automatically provide training and testing scripts that you can simply run, either on your machine or on Google Colab. You can adjust some parameters, such as the locations of the training and testing data or the number of epochs.
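Under the hood, the core training call with the ultralytics package is short. A minimal sketch (the dataset path and epoch count are illustrative):

```python
from ultralytics import YOLO

# Fine-tune a pre-trained checkpoint on the synthetic dataset.
model = YOLO("yolov8n.pt")
model.train(
    data="dataset/data.yaml",  # dataset config with train/val paths
    epochs=100,                # one of the knobs the exercises let you adjust
    imgsz=640,                 # training image size
)
```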

As the model trains, the script outputs epoch progress, loss functions, and mAP50 metrics per epoch. When training finishes, it plots these metrics on a graph that you can analyze for potential training issues. The exercises outline some of the more common problems, such as overfitting, underfitting, or diverging loss. See our documentation for more.

The training script graphs metrics such as loss, mAP50, precision, and recall for each epoch to track progress and identify potential issues
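If you want to inspect these metrics yourself, the ultralytics trainer writes them per epoch to a results.csv file inside the run folder. A quick sketch for plotting them (the run path and column names follow recent ultralytics conventions and may vary slightly between versions):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Per-epoch metrics are saved in the run folder; adjust the path to your run.
df = pd.read_csv("runs/detect/train/results.csv")
df.columns = df.columns.str.strip()  # some versions pad the column names

plt.plot(df["epoch"], df["metrics/mAP50(B)"], label="mAP50 (validation)")
plt.plot(df["epoch"], df["train/box_loss"], label="box loss")
plt.xlabel("epoch")
plt.legend()
plt.show()
```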

While this process does output an mAP50 score, it’s important to note that at this point in the process the score is based only on synthetic data. To know whether the model truly works in the physical world, we have to test it using real-world images.

Step 3: Testing YOLOv8

As with the training above, the testing scripts for Exercises 1 and 2 are already set up: the user just has to run the testing script, and it will test the model using annotated real-world images that we provide. The extension for Exercise 2 outlines how to take and annotate your own real-world images, so that more advanced users can create their own testing set and tweak their simulation to provide aligned synthetic images.
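A minimal sketch of this evaluation step with the ultralytics API (the checkpoint and dataset paths are illustrative):

```python
from ultralytics import YOLO

# Load the best checkpoint from training and evaluate on real photos.
model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val(data="real_test/data.yaml")
print(f"mAP50 on real-world images: {metrics.box.map50:.3f}")

# Save visualized predictions for each test image.
model.predict("real_test/images", save=True, conf=0.5)
```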

Once tested, the script outputs the following:

  • mAP50 score: Our exercises achieve mAP50 scores greater than 0.9
  • Precision and Recall graphs: Plots of precision and recall at each confidence threshold. These can help the user pinpoint the model’s general capabilities.
  • Predictions folder: All of the model’s predictions for where the object is in the testing images. These can help the user understand specific situations where the model fails to perform well.
Visualized predicted bounding boxes and confidence scores for each testing image

Analyzing Results

After the first round of training, you may want to push your mAP50 even higher; for example, manufacturers might want an mAP50 at or above 0.99. A strong advantage of synthetic training data over real-world training data is that we can easily go back and create new training data for a more robust model, tweaking parameters and extending the dataset to give the model the information it needs to learn and perform well.

So what kind of changes can we introduce in our synthetic data to improve model performance? And how does simulation make this a breeze?

  • Lighting: Adjust brightness, dimness, and color temperature for variety. Unlike real-world lighting, simulation lighting is easy to adjust, and we can program lighting variation throughout the dataset so the model learns to perform under various conditions (see the sketch below).
  • Other Objects: Include realistic quantities and placements of objects to teach the model what not to detect. FalconCloud provides a library of assets you can use, and our documentation outlines how to create your own twins. Your simulation can also introduce variety, such as randomizing the spawn locations of digital twins.
  • Background: Use varied or realistic backgrounds to enhance the simulation’s alignment with real-world spaces. Intentionally crafted environments let users train for specific conditions, as well as easily train across a range of environments.
  • Posing: Ensure hero objects mimic realistic poses found in the testing data. Simulation allows us to closely control this.
  • Occlusion: Include partially visible or obstructed hero objects to match real-world scenarios. The digital twin library provides users with a variety of assets.

We don’t necessarily need to adjust ALL of these parameters for successful training. This is another area where synthetic data makes it easy to quickly try out variations to find what produces better results — a much more difficult task with real-world data.
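As a small illustration of how simulation makes these sweeps cheap, you could generate a spread of lighting configurations up front and apply one to each batch of captures. The parameter names here are invented for the sketch, not Falcon's API:

```python
import random

# Illustrative lighting sweep; keys are placeholders, not Falcon's API.
lighting_variants = [
    {
        "intensity": random.uniform(0.3, 1.5),                     # dim to bright
        "color_temperature_k": random.choice([2700, 4000, 6500]),  # warm to cool
    }
    for _ in range(20)
]
```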

Beyond the Exercises

Exercises 1 and 2 are designed to equip users with key AI training knowledge, baseline simulation skills, and the basic scripts and functions needed to begin creating their own projects. Each exercise ends with a challenge for users to either continue improving the model or edit the simulation for their own novel twins.

For learners of all levels, Duality regularly offers live courses that teach more niche and intensive skills, including digital twin creation, blueprint breakdowns, simulation setup, and more. All of these resources are designed to lower the barrier for anyone looking to take advantage of the vast possibilities offered by synthetic data to build smarter, more versatile AI models.

Ready to get started? Create your FREE Falcon EDU account to try these exercises for yourself. And then start your own synthetic data projects for any application you can think of!