mirar.data package
Module to specify the input data classes for :module:`mirar.processors`
Submodules
mirar.data.base_data module
This contains the base data classes for the :module:`mirar.processors`.
The smallest unit is a DataBlock object, corresponding to a single image. These DataBlock objects are grouped into DataBatch objects. Each BaseProcessor will operate on an individual DataBatch object.
The DataBatch objects are stored within a larger Dataset object. A BaseProcessor will iterate over each DataBatch in a Dataset.
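The nesting described above can be pictured with a minimal sketch. The classes `Block`, `Batch`, and `Collection` and the function `apply_processor` are hypothetical stand-ins written for illustration; they are not the mirar classes and do not reproduce their behaviour beyond the block/batch/dataset nesting.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for a DataBlock: one image, identified by name."""

    name: str


@dataclass
class Batch:
    """Hypothetical stand-in for a DataBatch: a group of blocks."""

    blocks: list[Block] = field(default_factory=list)


@dataclass
class Collection:
    """Hypothetical stand-in for a Dataset: a group of batches."""

    batches: list[Batch] = field(default_factory=list)


def apply_processor(dataset: Collection) -> list[list[str]]:
    """Visit each batch in turn, as a processor would, and tag every block."""
    results = []
    for batch in dataset.batches:
        results.append([f"processed:{block.name}" for block in batch.blocks])
    return results


dataset = Collection(
    batches=[
        Batch([Block("img_001"), Block("img_002")]),
        Batch([Block("img_003")]),
    ]
)
processed = apply_processor(dataset)
```

Here the processor stand-in handles one batch at a time, so the output keeps the same grouping as the input.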
- class mirar.data.base_data.DataBatch(batch: list[DataBlock] | DataBlock | None = None)[source]
Bases: PseudoList
Base class for a collection of individual DataBlock objects. Each DataBatch will be operated on by a BaseProcessor.
- property data_type: Type[DataBlock]
Each list should take one specific data type. This is where that type is defined.
- class mirar.data.base_data.DataBlock[source]
Bases: object
Base unit for processing, corresponding to a single image.
- get_name() str[source]
Function to retrieve the :variable:`mirar.paths.BASE_NAME_KEY` of the parent image
- Returns:
Base name of parent image
- class mirar.data.base_data.Dataset(batches: list[DataBatch] | DataBatch | None = None)[source]
Bases: PseudoList
Base class for a collection of individual DataBatch objects. A BaseProcessor will iterate over these.
- class mirar.data.base_data.PseudoList(data_list=None)[source]
Bases: object
Base class for a list-like object which contains a list of data. Other classes inherit from this object.
The basic idea is that this class holds all the functions for safely creating an object with a specified data type.
This class also contains the relevant magic functions so that len(x), x[i] = N, and for y in x work as intended.
- append(item)[source]
Function to append, list-style, new objects.
- Parameters:
item – Object to be added
- Returns:
None
- property data_type
Each list should take one specific data type. This is where that type is defined.
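As a rough illustration of the list-like behaviour described for PseudoList, the sketch below shows one way a type-checked container could support `len(x)`, `x[i] = N`, and `for y in x`. `TypedList` and `IntList` are hypothetical names, and the checks shown are assumptions rather than the actual mirar implementation.

```python
class TypedList:
    """Hypothetical stand-in for a PseudoList-style container."""

    # Subclasses override this to fix the element type.
    data_type = object

    def __init__(self, data_list=None):
        self._items = []
        if data_list is None:
            data_list = []
        elif not isinstance(data_list, list):
            # Accept a single item as well as a list of items.
            data_list = [data_list]
        for item in data_list:
            self.append(item)

    def _check(self, item):
        if not isinstance(item, self.data_type):
            raise TypeError(
                f"Expected {self.data_type.__name__}, got {type(item).__name__}"
            )

    def append(self, item):
        """Append a new item, list-style, after checking its type."""
        self._check(item)
        self._items.append(item)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._check(value)
        self._items[index] = value

    def __iter__(self):
        return iter(self._items)


class IntList(TypedList):
    """Example subclass that only accepts integers."""

    data_type = int


numbers = IntList([1, 2])
numbers.append(3)
numbers[0] = 10
```

Appending a value of the wrong type, for example `numbers.append("x")`, raises a `TypeError` in this sketch.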
mirar.data.cache module
Central module for handling the cache, currently used only for storing image data.
mirar.data.image_data module
Module to specify the input data classes for mirar.processors.base_processor.ImageHandler
The basic idea of the code is to pass DataBlock objects through a series of BaseProcessor objects.
Since a given image can easily be ~10-100 MB, and there may be several hundred raw images from a typical survey in a given night, the total data volume for these processors could be tens of GB or more. Storing all of this in RAM would be very inefficient/slow for a typical laptop or many larger processing machines.
To mitigate this, the code can be operated in cache mode. In that case, after raw images are loaded, only the header data is stored in memory. The actual image data is stored temporarily as a .npy file in a dedicated cache directory, and only loaded into memory when needed. When the data is updated, the .npy file is updated as well. The path of the file is a unique hash that includes the read time of the file, so multiple copies of an image can be read and modified independently.
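The idea of a cache path built from a hash of the image name and its read time can be sketched as follows. The function `make_cache_path`, the choice of SHA-256, the `cache` directory, and the file name `img_001.fits` are all assumptions made for illustration; the actual hashing scheme used by the code may differ.

```python
import hashlib
import time
from pathlib import Path


def make_cache_path(cache_dir: Path, base_name: str, read_time_ns: int) -> Path:
    """Build a cache file path from a hash of the image name and its read time."""
    key = f"{base_name}_{read_time_ns}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return cache_dir / f"{digest}.npy"


cache_dir = Path("cache")  # hypothetical cache directory
read_time = time.time_ns()

# The same image read at two different times maps to two different cache files.
first = make_cache_path(cache_dir, "img_001.fits", read_time)
second = make_cache_path(cache_dir, "img_001.fits", read_time + 1)
```

Because the read time is part of the hashed key, two copies of the same image get separate cache files and can be modified independently.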
In cache mode, all of the image data is temporarily stored in a cache, and this cache can therefore reach tens of GB in size. The cache is located in the configurable output data directory. Left alone, it would grow linearly with successive code executions. To mitigate that, and to avoid cleaning the cache by hand, the code tries to automatically delete cache files as needed.
Python provides a default __del__() method for handling clean up when an object is deleted. Images automatically delete their cache in this method. However, Python has a somewhat complicated method of ‘garbage collection’ (see the official description for more info), and it is not guaranteed that Image objects will clean up after themselves.
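A minimal sketch of cleanup in __del__() is given here, assuming a hypothetical `CachedBlob` class that owns a single cache file. It is not the Image class, and it writes raw bytes rather than a .npy file to stay within the standard library.

```python
import tempfile
from pathlib import Path


class CachedBlob:
    """Hypothetical object that owns one cache file and removes it on deletion."""

    def __init__(self, cache_dir: Path, name: str, payload: bytes):
        self.cache_path = cache_dir / f"{name}.bin"
        self.cache_path.write_bytes(payload)

    def __del__(self):
        # Best-effort cleanup: Python does not promise when (or whether) this runs.
        self.cache_path.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp_name:
    blob = CachedBlob(Path(tmp_name), "img_001", b"pixels")
    cache_path = blob.cache_path
    existed_before = cache_path.exists()
    # On CPython, dropping the last reference usually triggers __del__ at once.
    del blob
    exists_after = cache_path.exists()
```

The immediate cleanup seen here relies on CPython reference counting. Objects caught in reference cycles, or still alive at interpreter exit, may be finalized late or not at all, which is why a second cleanup mechanism is useful.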
As a fallback, when you run the code from the command line (and therefore call __main__), we use the standard Python tempfile library <https://docs.python.org/3/library/tempfile.html> to create a temporary directory, and set this as the cache. We create the directory using a with context manager, ensuring that cleanup runs automatically before exiting, even if the code crashes/raises errors. We also use tempfile and careful cleaning for the unit tests, as provided by the base test class. If you try to interact with the code in any other way, please be mindful of this behaviour, and ensure that you clean your cache in a responsible way!
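The context-manager pattern described above can be sketched with `tempfile.TemporaryDirectory`. The function `run_pipeline` and the list `seen` are hypothetical and only demonstrate that the temporary directory is removed whether or not the body raises.

```python
import tempfile
from pathlib import Path

seen = []  # paths of the temporary cache directories that were created


def run_pipeline(fail: bool) -> None:
    """Create a throwaway cache directory, write to it, and optionally crash."""
    with tempfile.TemporaryDirectory() as tmp_name:
        cache_dir = Path(tmp_name)
        seen.append(cache_dir)
        (cache_dir / "image.npy").write_bytes(b"placeholder")
        if fail:
            raise RuntimeError("simulated crash during processing")


run_pipeline(fail=False)
try:
    run_pipeline(fail=True)
except RuntimeError:
    pass

# Both directories are gone, including the one from the failed run.
leftover = [path for path in seen if path.exists()]
```

If you drive the code from your own script or notebook, wrapping the work in a similar `with` block is one way to make sure the cache does not outlive the session.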
If you don’t like this feature, you don’t need to use it. Cache mode is entirely optional, and can be disabled by setting the USE_WINTER_CACHE environment variable to false:
export USE_WINTER_CACHE=false
See Usage for more information about selecting cache mode, and setting the output data directory.
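For illustration, a flag like this could be read from Python along the following lines. Only the variable name USE_WINTER_CACHE comes from the text above; the function `cache_enabled`, the default of `True`, and the set of values treated as false are assumptions, and the code itself may parse the variable differently.

```python
import os


def cache_enabled(environ=None, default: bool = True) -> bool:
    """Interpret USE_WINTER_CACHE as a boolean flag (assumed parsing rules)."""
    environ = os.environ if environ is None else environ
    raw = environ.get("USE_WINTER_CACHE")
    if raw is None:
        # Assumed default when the variable is not set.
        return default
    return raw.strip().lower() not in {"false", "0", "no"}


# Passing a plain dict makes the behaviour easy to check without touching os.environ.
disabled = cache_enabled({"USE_WINTER_CACHE": "false"})
enabled = cache_enabled({"USE_WINTER_CACHE": "true"})
unset = cache_enabled({})
```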
- class mirar.data.image_data.Image(data: ndarray, header: Header)[source]
Bases: DataBlock
A subclass of DataBlock, containing an image and header.
This class serves as input for BaseImageProcessor and BaseCandidateGenerator processors.
- cache_files = []
- get_cache_path() Path[source]
Get a unique cache path for the image (.npy file). This is a hash, using name and time, so it should be unique even when rerunning on the same image.
- Returns:
unique cache file path
- get_mask() ndarray[source]
Get the mask data for an image. 0 is masked, 1 is unmasked.
- Returns:
mask data (numpy array)
- set_cache_data(data: ndarray)[source]
Set the data with cache
- Parameters:
data – Updated image data
- Returns:
None
- set_data(data: ndarray)[source]
Set the data with cache
- Parameters:
data – Updated image data
- Returns:
None
- class mirar.data.image_data.ImageBatch(batch: list[Image] | Image | None = None)[source]
Bases: DataBatch
A subclass of DataBatch, which contains Image objects.
To batch, de-batch, and select objects within batches, see ImageBatcher, ImageDebatcher, and ImageSelector.
mirar.data.source_data module
Module for SourceTable objects, and their corresponding SourceBatches
- class mirar.data.source_data.SourceBatch(batch: list[SourceTable] | SourceTable | None = None)[source]
Bases: DataBatch
DataBatch class for holding SourceTables
- append(item: SourceTable)[source]
Function to append, list-style, new objects.
- Parameters:
item – Object to be added
- Returns:
None
- data_type
alias of SourceTable
mirar.data.utils module
Utils for data