This repository is meant to provide a simple way to access the OOD Detection Benchmark: a comprehensive benchmark suite designed to evaluate machine learning models performing Out-Of-Distribution Detection, with a specific focus on its semantic aspect.
First, download the required datasets and make sure they're stored with their default layouts, as reported below:
- DomainNet (downloads: clipart, infograph, painting, quickdraw, real, sketch):
  The root directories of the 6 domains are expected to be located under the same parent folder, which has to be specified either in ~/.ooddb/config.json or at run-time to the Dataset class (see below):
  - For Quickdraw, the expected image file format is:
    <root_dir>/quickdraw/<class_name>/<img_id>.png
    where <class_name> refers to the natural name of the category (e.g., book).
  - For all the other domains, the expected format is:
    <root_dir>/<domain>/<class_name>/<domain>_<class_id>_<img_id>.jpg
    where <domain> is the lowercase domain name and <class_id> is the original class index (a 3-digit integer).
- DTD (download):
  <root_dir>/images/<class_name>/<class_name>_<img_id>.jpg
- PatternNet (download):
  <root_dir>/images/<class_name>/<short_class_name><img_id>.jpg
- Stanford Cars (download):
  <root_dir>/cars_<split>/<img_id>.jpg
  where <split> is either train or test
- SUN (download):
  <root_dir>/<class_name_initial>/<class_name>/sun_<img_id>.jpg
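Before running anything, it can help to check that files on disk follow the DomainNet layout described above. The following is a minimal sketch that parses a path relative to <root_dir>; `parse_domainnet_path` and both regular expressions are hypothetical helpers written for this illustration and are not part of the OODDB package.

```python
import re

# Quickdraw layout: quickdraw/<class_name>/<img_id>.png
_QUICKDRAW_RE = re.compile(
    r"^quickdraw/(?P<class_name>[^/]+)/(?P<img_id>[^/]+)\.png$"
)

# Other domains: <domain>/<class_name>/<domain>_<class_id>_<img_id>.jpg
# The backreference (?P=domain) forces the file prefix to repeat the folder name.
_OTHER_DOMAIN_RE = re.compile(
    r"^(?!quickdraw/)(?P<domain>[a-z]+)/(?P<class_name>[^/]+)/"
    r"(?P=domain)_(?P<class_id>\d{3})_(?P<img_id>[^/]+)\.jpg$"
)


def parse_domainnet_path(rel_path):
    """Return the layout fields of a path relative to <root_dir>, or None."""
    match = _QUICKDRAW_RE.match(rel_path) or _OTHER_DOMAIN_RE.match(rel_path)
    if match is None:
        return None
    fields = match.groupdict()
    # The Quickdraw pattern has no named domain group, so fill it in.
    fields.setdefault("domain", "quickdraw")
    return fields
```

A `None` result means the path does not match either expected format, which usually points to a renamed or re-encoded download.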
You can easily access the benchmark splits by installing the OODDB package via pip:
pip install git+https://github.com/ooddb/OODDB.git
By default, the on-disk dataset root folders are read from the ~/.ooddb/config.json file, which is automatically created and populated with the following default values if it doesn't exist yet:
{
    "domainnet": "~/data/DomainNet",
    "dtd": "~/data/DTD",
    "patternnet": "~/data/PatternNet",
    "stanford_cars": "~/data/Stanford_Cars",
    "sun": "~/data/SUN397"
}
You can set any path you prefer in this file, or override a dataset location by providing it at run-time (see below).
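Editing the file by hand works, but the same change can also be scripted. The sketch below assumes the config file is plain JSON with one key per dataset, as in the defaults listed above; `set_dataset_root` is a hypothetical helper written for this illustration, not a function provided by the OODDB package.

```python
import json
from pathlib import Path


def set_dataset_root(name, new_root, config_path="~/.ooddb/config.json"):
    """Update one dataset entry in the config file and return the full mapping."""
    path = Path(config_path).expanduser()
    # Start from the existing entries so the other datasets are preserved.
    config = json.loads(path.read_text()) if path.exists() else {}
    config[name] = str(new_root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return config
```

For example, `set_dataset_root("sun", "~/data/my_folder")` would rewrite only the `sun` entry. If the file does not exist yet, this sketch writes a file containing just that entry, so it is safer to let the package create the defaults first.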
The OODDB package exposes a single Dataset class, which accepts different dataset and split names to access the desired data.
Usage example:
from torchvision import transforms
from OODDB import Dataset
dataset = Dataset(
    dataset_name="sun",
    split="train",
    order=0,
    root_dir="~/data/my_folder",
    transform=transforms.ToTensor()
)
Specifically, the Dataset class accepts the following parameters:
- dataset_name: the name of the desired dataset. The supported values are:
  - domainnet for DomainNet
  - dtd for DTD
  - patternnet for PatternNet
  - stanford_cars for Stanford Cars
  - sun for SUN
- split: the name of the desired split.
  - DomainNet accepts <domain>_train, <domain>_test or no_<domain>_train, where <domain> must be one among clipart, infograph, painting, quickdraw, real and sketch.
  - All the other datasets only accept either train or test.
- order: one of the three data orders provided for the selected dataset. Must be a value between 0 and 2 (inclusive). (WARNING: the data orders are not compatible with each other, as the class ids differ between them. Use the same order value for both the train and test splits.)
- root_dir: the dataset root location on disk. If not specified, the value from ~/.ooddb/config.json will be used.
- transform: a function/transform to be applied to a PIL Image.
For more details, see dataset.py.
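The rules for dataset_name, split and order can be checked up front, before any data is touched. The sketch below encodes only the constraints listed above; `valid_splits` and `check_arguments` are hypothetical helpers for illustration, and the Dataset class may perform its own validation differently.

```python
DOMAINS = ("clipart", "infograph", "painting", "quickdraw", "real", "sketch")
OTHER_DATASETS = ("dtd", "patternnet", "stanford_cars", "sun")


def valid_splits(dataset_name):
    """List the split names accepted for a given dataset name."""
    if dataset_name == "domainnet":
        splits = []
        for domain in DOMAINS:
            splits += [f"{domain}_train", f"{domain}_test", f"no_{domain}_train"]
        return splits
    if dataset_name in OTHER_DATASETS:
        return ["train", "test"]
    raise ValueError(f"unknown dataset name: {dataset_name!r}")


def check_arguments(dataset_name, split, order):
    """Raise ValueError if the combination violates the documented rules."""
    if split not in valid_splits(dataset_name):
        raise ValueError(f"split {split!r} is not valid for {dataset_name!r}")
    if order not in (0, 1, 2):
        raise ValueError("order must be 0, 1 or 2")
```

Calling `check_arguments("dtd", "painting_train", 0)` raises a ValueError, because only DomainNet accepts domain-prefixed split names.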
If you prefer to use your own Dataset class instead, you can use the OODDB.utils.get_dataset_split_info function to retrieve the necessary information. Example:
from OODDB.utils import get_dataset_split_info
file_names, labels, class_idx_to_name = get_dataset_split_info(
    dataset="sun",
    split="train",
    data_order=0
)
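A custom class built on these values could look like the sketch below. It is a sketch under several assumptions that the text above does not confirm: that file_names and labels are aligned sequences of equal length, and that each file name is a path that can be joined to the dataset root. `SplitDataset` and `_pil_loader` are hypothetical names introduced here and are not part of the OODDB package.

```python
import os


def _pil_loader(path):
    # Pillow is imported lazily so the class itself has no hard dependency.
    from PIL import Image
    with Image.open(path) as img:
        return img.convert("RGB")


class SplitDataset:
    """Minimal map-style dataset over (file name, label) pairs."""

    def __init__(self, file_names, labels, root_dir="", loader=_pil_loader,
                 transform=None):
        if len(file_names) != len(labels):
            raise ValueError("file_names and labels must have the same length")
        self.file_names = list(file_names)
        self.labels = list(labels)
        self.root_dir = os.path.expanduser(root_dir)
        self.loader = loader
        self.transform = transform

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, index):
        # Assumption: file names are relative to root_dir.
        path = os.path.join(self.root_dir, self.file_names[index])
        sample = self.loader(path)
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, self.labels[index]
```

Because the class implements `__len__` and `__getitem__`, it follows the map-style protocol and can be passed to a `torch.utils.data.DataLoader`. The `loader` argument is injectable so the image decoding step can be swapped or mocked.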
The benchmark supports the two following tracks:
In the first track, the train and test samples are drawn from the same visual data distribution. All the datasets except DomainNet support only this setting. For DomainNet, you can run this track by selecting the same domain for both the train and test data.
Example (DTD):
from OODDB import Dataset
DATASET="dtd"
DATA_ORDER=0
train_dataset = Dataset(DATASET, split="train", order=DATA_ORDER)
test_dataset = Dataset(DATASET, split="test", order=DATA_ORDER)
Example (DomainNet Painting):
from OODDB import Dataset
DATASET="domainnet"
DATA_ORDER=0
train_dataset = Dataset(DATASET, split="painting_train", order=DATA_ORDER)
test_dataset = Dataset(DATASET, split="painting_test", order=DATA_ORDER)
In the second track, the train and test samples are drawn from different visual data distributions. This track is only supported by DomainNet and can be further divided into single-source (all the train samples belong to a single domain) and multi-source (the train samples are drawn from several domains, all different from the test one). In both cases the test samples belong to a single domain (i.e., both settings are single-target).
In order to execute this track, do the following:
- single-source: select two different domains for the train and test splits.
  Example (Clipart → Sketch):
  from OODDB import Dataset
  DATASET="domainnet"
  DATA_ORDER=0
  train_dataset = Dataset(DATASET, split="clipart_train", order=DATA_ORDER)
  test_dataset = Dataset(DATASET, split="sketch_test", order=DATA_ORDER)
- multi-source: select no_<domain>_train for the train split and <domain>_test for the test one.
  Example (Quickdraw):
  from OODDB import Dataset
  DATASET="domainnet"
  DATA_ORDER=0
  train_dataset = Dataset(DATASET, split="no_quickdraw_train", order=DATA_ORDER)
  test_dataset = Dataset(DATASET, split="quickdraw_test", order=DATA_ORDER)
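The DomainNet split names for both tracks follow a regular pattern, so they can be generated rather than typed by hand. The following is a minimal sketch of that mapping; `domainnet_splits` is a hypothetical helper for illustration and is not part of the OODDB package.

```python
DOMAINS = ("clipart", "infograph", "painting", "quickdraw", "real", "sketch")


def domainnet_splits(target, source=None, multi_source=False):
    """Return (train_split, test_split) names for a DomainNet experiment.

    - same-domain setting: leave source unset (train and test share the domain)
    - single-source:       pass a source domain different from the target
    - multi-source:        set multi_source=True (train on all other domains)
    """
    if target not in DOMAINS:
        raise ValueError(f"unknown target domain: {target!r}")
    if multi_source:
        if source is not None:
            raise ValueError("source must not be set in the multi-source setting")
        return f"no_{target}_train", f"{target}_test"
    source = target if source is None else source
    if source not in DOMAINS:
        raise ValueError(f"unknown source domain: {source!r}")
    return f"{source}_train", f"{target}_test"
```

The two returned names can then be passed as the split argument of the train and test datasets, using the same order value for both.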