mlreco.iotools.datasets module

class mlreco.iotools.datasets.LArCVDataset(data_schema, data_keys, limit_num_files=0, limit_num_samples=0, event_list=None, skip_event_list=None)

Bases: Generic[torch.utils.data.dataset.T_co]

A generic interface for LArCV data files.

This Dataset is designed to produce a batch containing an arbitrary number of data chunks (e.g. input data matrix, segmentation label, point proposal target, clustering labels). Each data chunk is processed by parser functions defined in the iotools.parsers module. A LArCVDataset object can be configured with an arbitrary number of parser functions, each of which can take an arbitrary number of LArCV event data objects. The assumption is that each data chunk respects the LArCV event boundary.
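The idea of named data chunks, each produced by its own parser, can be illustrated with a toy sketch. This is not the LArCVDataset implementation: ToyChunkDataset, parse_energy and parse_label are hypothetical names, and plain dictionaries stand in for LArCV event data objects.

```python
from typing import Any, Callable, Dict, List


# Hypothetical stand-ins for parser functions. The real parsers are defined
# in the iotools.parsers module and operate on LArCV event data objects.
def parse_energy(event: Dict[str, Any]) -> float:
    return event["energy"]


def parse_label(event: Dict[str, Any]) -> int:
    return event["label"]


class ToyChunkDataset:
    """Toy dataset: one sample per event, one data chunk per named parser."""

    def __init__(self,
                 events: List[Dict[str, Any]],
                 parsers: Dict[str, Callable[[Dict[str, Any]], Any]]) -> None:
        self._events = events
        self._parsers = parsers

    def __len__(self) -> int:
        return len(self._events)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        # Every chunk is computed from the same event, so chunks never mix
        # data across event boundaries.
        event = self._events[idx]
        return {name: parser(event) for name, parser in self._parsers.items()}


events = [{"energy": 1.5, "label": 0}, {"energy": 2.5, "label": 1}]
toy = ToyChunkDataset(events, {"input_data": parse_energy,
                               "segment_label": parse_label})
sample = toy[1]
```

Here each sample is a dictionary keyed by chunk name; the return type of the real __getitem__ is not stated in this page, so treat that layout as an assumption of the sketch.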

__init__(data_schema, data_keys, limit_num_files=0, limit_num_samples=0, event_list=None, skip_event_list=None)

Instantiates the LArCVDataset.

Parameters
  • data_schema (dict) –

    A dictionary of (string, dictionary) pairs. The key is a unique name of a data chunk in a batch and the associated dictionary must include:

    • parser: name of the parser

    • args: (key, value) pairs that correspond to parser argument names and their values

    The nested dictionaries can be replaced by lists, in which case the list entries are treated as parser argument values, in order.

  • data_keys (list) – a list of strings that must be present in the file paths

  • limit_num_files (int) – an integer limiting the number of files to be taken per data directory

  • limit_num_samples (int) – an integer limiting the number of samples to be taken per data

  • event_list (list) – a list of integers specifying which events (TTree index) to process

  • skip_event_list (list) – a list of integers specifying which events (TTree index) to skip
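As a rough illustration of the dictionary form of data_schema described above, the following sketch builds a schema and checks its shape. The parser names, argument names and values are placeholders rather than confirmed entries of iotools.parsers, and check_schema is a hypothetical helper that is not part of mlreco. The list form is left out because its exact layout is not spelled out here.

```python
from typing import Any, Dict

# Each key names one data chunk in the batch; each value gives the parser
# and its arguments. All parser/argument names below are placeholders.
data_schema: Dict[str, Dict[str, Any]] = {
    "input_data": {
        "parser": "parse_sparse3d",                  # placeholder parser name
        "args": {"sparse_event": "sparse3d_data"},   # placeholder argument
    },
    "segment_label": {
        "parser": "parse_sparse3d",                  # placeholder parser name
        "args": {"sparse_event": "sparse3d_label"},  # placeholder argument
    },
}


def check_schema(schema: Dict[str, Any]) -> None:
    """Hypothetical helper: verify dict-style entries have the required keys."""
    for name, entry in schema.items():
        if not isinstance(name, str):
            raise TypeError(f"chunk name must be a string, got {name!r}")
        if isinstance(entry, dict):
            missing = {"parser", "args"} - set(entry)
            if missing:
                raise KeyError(f"{name}: missing keys {sorted(missing)}")


check_schema(data_schema)

# With mlreco installed, the schema would then be passed to the constructor,
# using the signature documented above (the path string is a placeholder):
# dataset = LArCVDataset(data_schema, data_keys=["/path/to/data"],
#                        limit_num_files=1)
```

The constructor call is left commented out so the sketch runs without mlreco or any LArCV files.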

__module__ = 'mlreco.iotools.datasets'
__parameters__ = ()
static list_data(f)
static get_event_list(cfg, key)
static create(cfg)
data_keys()
__len__()
__getitem__(idx)