DataIterator

class dragon.vision.DataIterator(**kwargs)

Iterator that returns batches of data for image classification.

Typically, the serialized data is first packed into a KPLRecord:

writer = dragon.io.KPLRecordWriter(
    path,
    protocol={
        'data': 'bytes',  # Content of image
        'encoded': 'int64',  # Image is encoded?
        'shape': ['int64'],  # (H, W, C)
        'label': ['int64'],  # Label index
    }
)
for example in examples:
    writer.write(example)
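
The snippet above does not show what a single example looks like. As a minimal sketch, assuming the writer accepts a dict whose keys mirror the protocol and that 'encoded' is 0 for raw pixel bytes and 1 for compressed bytes, an example could be built as follows. The helper name make_example is hypothetical and is not part of the library:

```python
def make_example(pixels, height, width, channels, label):
    """Build a record dict whose keys mirror the protocol above.

    Hypothetical helper for illustration; not part of dragon.
    """
    data = bytes(pixels)
    if len(data) != height * width * channels:
        raise ValueError('pixel buffer does not match (H, W, C)')
    return {
        'data': data,  # Raw, un-encoded pixel bytes
        'encoded': 0,  # Assumed: 0 = raw pixels, 1 = encoded bytes
        'shape': [height, width, channels],  # (H, W, C)
        'label': [label],  # Label index, wrapped in a list
    }


# A 2x2 RGB image with every value set to 127.
example = make_example([127] * 12, 2, 2, 3, label=5)
```

Each dict produced this way would then be passed to writer.write(example) in the loop above.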

Creating an iterator starts the prefetch processes:

iterator = dragon.vision.DataIterator(
    dataset=dragon.io.KPLRecordDataset,
    source=path,
    batch_size=32,
    shuffle=True,
    phase='TRAIN',  # Flag to determine some methods
)

Then, fetch a batch of data with iterator.next():

images, labels = iterator.next()

__init__

DataIterator.__init__(**kwargs)

Create a DataIterator.

Parameters:
  • dataset (class) – The dataset class to load examples.
  • source (str) – The path of data source.
  • shuffle (bool, optional, default=False) – Whether to shuffle the data.
  • initial_fill (int, optional, default=1024) – The length of sampling sequence for shuffle.
  • resize (int, optional, default=0) – The size for the shortest edge.
  • padding (int, optional, default=0) – The size for the zero-padding on two sides.
  • fill_value (Union[int, Sequence], optional, default=127) – The value(s) to fill for padding or cutout.
  • crop_size (int, optional, default=0) – The size for random-or-center cropping.
  • random_crop_size (int, optional, default=0) – The size for sampling-based random cropping.
  • cutout_size (int, optional, default=0) – The square size for the cutout algorithm.
  • mirror (bool, optional, default=False) – Whether to mirror the image (flip horizontally).
  • random_scales (Sequence[float], optional, default=(0.08, 1.)) – The range of scales to sample a crop randomly.
  • random_aspect_ratios (Sequence[float], optional, default=(0.75, 1.33)) – The range of aspect ratios to sample a crop randomly.
  • distort_color (bool, optional, default=False) – Whether to apply color distortion.
  • inverse_color (bool, optional, default=False) – Whether to invert the channels of color images.
  • phase ({'TRAIN', 'TEST'}, optional) – The optional running phase.
  • batch_size (int, optional, default=128) – The size of a mini-batch.
  • prefetch (int, optional, default=4) – The prefetch count.
  • num_transformers (int, optional, default=-1) – The number of transformers to process images.
  • seed (int, optional) – The random seed to use instead.
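
To make the roles of random_scales and random_aspect_ratios more concrete, here is a minimal sketch of the widely used scale-and-aspect-ratio crop sampling. It is an illustration under assumptions, not a confirmed description of how DataIterator samples crops internally; the function sample_crop and its max_tries argument are hypothetical:

```python
import math
import random


def sample_crop(height, width, scales=(0.08, 1.0),
                ratios=(0.75, 1.33), rng=None, max_tries=10):
    """Return (top, left, crop_h, crop_w) for a random crop.

    Hypothetical helper for illustration; not part of dragon.
    """
    rng = rng or random.Random()
    area = height * width
    for _ in range(max_tries):
        # Sample the crop area as a fraction of the image area.
        target_area = area * rng.uniform(scales[0], scales[1])
        # Sample the aspect ratio (width / height) log-uniformly.
        log_ratio = rng.uniform(math.log(ratios[0]), math.log(ratios[1]))
        aspect = math.exp(log_ratio)
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        # Accept the candidate only if it fits inside the image.
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # Fall back to the largest centered square.
    side = min(height, width)
    return (height - side) // 2, (width - side) // 2, side, side


# Sample one crop window for a 224x320 image with a fixed seed.
top, left, crop_h, crop_w = sample_crop(224, 320, rng=random.Random(0))
```

Under this scheme, the default random_scales=(0.08, 1.) would let a crop cover between 8% and 100% of the image area, and the sampled region would typically be resized to the target size afterwards.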

Methods