DataReader

class dragon.io.DataReader(**kwargs)

Read examples from a dataset.

The dataset class and data source are required to create a reader:

# Here we use dragon.io.KPLRecordDataset
dataset = dragon.io.KPLRecordDataset
simple_reader = DataReader(dataset=dataset, source=path)

Shuffling is supported by randomly sampling from a sequence buffer:

shuffle_reader = DataReader(
    dataset=dataset,
    source=path,
    shuffle=True,
    # It is recommended to set a buffer size larger than
    # the batch size so that batches on a single node are more diverse.
    # The default value of 1024 is sufficient for most cases.
    initial_fill=1024,
)
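
To illustrate the idea of sampling from a fixed-size buffer, here is a minimal pure-Python sketch. The function name `buffered_shuffle` and its parameters are hypothetical and are not part of `dragon.io`; how `DataReader` fills and samples its buffer internally may differ.

```python
import random


def buffered_shuffle(examples, buffer_size=1024, seed=None):
    """Yield examples in randomized order using a fixed-size buffer.

    Hypothetical helper for illustration only; not Dragon's implementation.
    """
    rng = random.Random(seed)
    buffer = []
    for example in examples:
        # Fill the buffer before emitting anything.
        if len(buffer) < buffer_size:
            buffer.append(example)
            continue
        # Emit a random buffered example and put the new one in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = example
    # Drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(100), buffer_size=16, seed=0))
```

In this sketch an example can only move as far as the buffer allows, which is why a buffer larger than the batch size tends to give more varied batches.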

Partitions are available across distributed nodes:

distributed_reader = DataReader(
    dataset=dataset,
    source=path,
    part_idx=rank,
    num_parts=world_size,
)
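
One way to picture how `part_idx` and `num_parts` could divide a dataset is a contiguous, near-even split. The helper `partition_range` below is hypothetical and assumes contiguous ranges; the splitting rule that `DataReader` actually applies is not stated here and may differ.

```python
def partition_range(num_examples, part_idx, num_parts):
    """Return the [start, end) index range for one partition.

    Hypothetical helper for illustration only; assumes a contiguous split
    in which the first partitions absorb any remainder.
    """
    if not 0 <= part_idx < num_parts:
        raise ValueError("part_idx must be in [0, num_parts)")
    base, extra = divmod(num_examples, num_parts)
    start = part_idx * base + min(part_idx, extra)
    end = start + base + (1 if part_idx < extra else 0)
    return start, end


# Ten examples split across three nodes.
ranges = [partition_range(10, rank, 3) for rank in range(3)]
```

Under this assumption every example belongs to exactly one partition, so nodes with distinct `part_idx` values read disjoint data.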

__init__

DataReader.__init__(**kwargs)

Create a DataReader.

Parameters:
  • dataset (class) – The dataset class used to load examples.
  • source (str) – The path to the data source.
  • part_idx (int, optional, default=0) – The index of the partition to read.
  • num_parts (int, optional, default=1) – The total number of partitions over the dataset.
  • shuffle (bool, optional, default=False) – Whether to shuffle the data.
  • initial_fill (int, optional, default=1024) – The length of the sampling sequence used for shuffling.
  • seed (int, optional) – The random seed to use instead of the default.

Methods

before_first

DataReader.before_first()

Move the cursor to before the first example.

next_example

DataReader.next_example()

Return the next example.
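
The cursor semantics of `before_first` and `next_example` can be sketched with a toy in-memory stand-in. `ToyReader` is hypothetical and not part of `dragon.io`; in particular, raising `IndexError` at the end of the data is an assumption of this sketch, not documented `DataReader` behavior.

```python
class ToyReader:
    """Hypothetical in-memory reader that mimics the cursor methods."""

    def __init__(self, examples):
        self._examples = list(examples)
        self._cursor = 0

    def before_first(self):
        # Rewind so that the next read returns the first example.
        self._cursor = 0

    def next_example(self):
        # Assumption of this sketch: fail loudly when the data is exhausted.
        if self._cursor >= len(self._examples):
            raise IndexError("no more examples")
        example = self._examples[self._cursor]
        self._cursor += 1
        return example


reader = ToyReader(["a", "b", "c"])
first_pass = [reader.next_example() for _ in range(3)]
reader.before_first()
after_rewind = reader.next_example()
```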

reset

DataReader.reset(stick_to_part=False)

Reset the environment of the dataset.

run

DataReader.run()

Start the process.