mirar.data package

Module to specify the input data classes for mirar.processors

Submodules

mirar.data.base_data module

This module contains the base data classes used by mirar.processors.

The smallest unit is a DataBlock object, corresponding to a single image. These DataBlock objects are grouped into DataBatch objects. Each BaseProcessor operates on an individual DataBatch object.

The DataBatch objects are stored within a larger Dataset object. A BaseProcessor iterates over each DataBatch in a Dataset.
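The nesting of blocks, batches and datasets can be illustrated with a small sketch. The classes ToyBlock, ToyBatch and ToyDataset and the function run_processor below are hypothetical stand-ins, not the mirar classes; they only show that a processor loops over batches and sees each batch as a whole.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for DataBlock, DataBatch and Dataset; they only
# illustrate how the three levels nest, not the real mirar behaviour.
@dataclass
class ToyBlock:
    name: str  # one block corresponds to a single image


@dataclass
class ToyBatch:
    blocks: list[ToyBlock] = field(default_factory=list)


@dataclass
class ToyDataset:
    batches: list[ToyBatch] = field(default_factory=list)


def run_processor(dataset: ToyDataset) -> list[int]:
    """Hypothetical processor loop: handle each batch independently."""
    results = []
    for batch in dataset.batches:
        # The processor sees a whole batch at a time, not single blocks
        results.append(len(batch.blocks))
    return results


dataset = ToyDataset([
    ToyBatch([ToyBlock("img_1"), ToyBlock("img_2")]),
    ToyBatch([ToyBlock("img_3")]),
])
print(run_processor(dataset))  # [2, 1]
```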

class mirar.data.base_data.DataBatch(batch: list[DataBlock] | DataBlock | None = None)

Bases: PseudoList

Base class for a collection of individual DataBlock objects. Each DataBatch is operated on by a BaseProcessor.

property data_type: Type[DataBlock]: Each list should take one specific data type. This is where that type is defined.

get_batch() → list[DataBlock]

Returns the DataBlock items within the batch

Returns:: list of DataBlock objects

get_raw_image_names() → list[Path]

Returns the name of each parent raw image

Returns:: list of raw image names

class mirar.data.base_data.DataBlock

Bases: object

Base unit for processing, corresponding to a single image.

get_name() → str

Function to retrieve the base name of the parent image, stored under the mirar.paths.BASE_NAME_KEY key

Returns:: Base name of parent image

get_raw_img_list() → list[Path]

Function to retrieve the paths of all raw images from which this object is derived. Because of stacking, this list may include multiple entries.

Returns:: List of raw image paths

class mirar.data.base_data.Dataset(batches: list[DataBatch] | DataBatch | None = None)

Bases: PseudoList

Base class for a collection of individual DataBatch objects. A BaseProcessor will iterate over these.

append(item: DataBatch)

Append a new object, as with list.append().

Parameters:: item – Object to be added
Returns:: None

data_type: alias of DataBatch

get_batches()

Returns the DataBatch items within the dataset

Returns:: list of DataBatch objects

class mirar.data.base_data.PseudoList(data_list=None)

Bases: object

Base class for a list-like object which contains a list of data. Other classes inherit from this class.

The basic idea is that this class holds all the functions for safely creating an object with a specified data type.

This class also contains the relevant magic functions so that len(x), x[i] = N, and for y in x work as intended.
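A minimal sketch of the list-like behaviour described above follows. TypedList and IntList are hypothetical names, and raising TypeError on a wrong type is an assumption made for illustration; the real PseudoList may report errors differently.

```python
class TypedList:
    """Hypothetical PseudoList-style container (not the mirar implementation)."""

    # Subclasses override this with the single type they accept
    data_type: type = object

    def __init__(self, data_list=None):
        self._data_list = []
        if data_list is None:
            data_list = []
        elif not isinstance(data_list, list):
            # A single item is wrapped in a list, mirroring the
            # list[DataBlock] | DataBlock | None signature of DataBatch
            data_list = [data_list]
        for item in data_list:
            self.append(item)

    def _check(self, item) -> None:
        # Assumption: an item of the wrong type raises TypeError
        if not isinstance(item, self.data_type):
            raise TypeError(
                f"Expected {self.data_type.__name__}, got {type(item).__name__}"
            )

    def append(self, item) -> None:
        self._check(item)
        self._data_list.append(item)

    def get_data_list(self) -> list:
        return self._data_list

    def __len__(self) -> int:  # len(x)
        return len(self._data_list)

    def __getitem__(self, i):  # x[i]
        return self._data_list[i]

    def __setitem__(self, i, item) -> None:  # x[i] = N
        self._check(item)
        self._data_list[i] = item

    def __iter__(self):  # for y in x
        return iter(self._data_list)


class IntList(TypedList):
    data_type = int


numbers = IntList([1, 2])
numbers.append(3)
numbers[0] = 10
print(len(numbers), list(numbers))  # 3 [10, 2, 3]
```

Keeping the type check in one helper means append() and item assignment enforce the same rule, so a container can never hold a mix of data types.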

append(item)

Append a new object, as with list.append().

Parameters:: item – Object to be added
Returns:: None

property data_type: Each list should take one specific data type. This is where that type is defined.

get_data_list()

Retrieve the data list

Returns:: The saved list of objects

mirar.data.cache module

Central module for handling the cache, currently used only for storing image data.

class mirar.data.cache.Cache

Bases: object

A cache object for storing temporary data

cache_dir: Path | None = None

get_cache_dir() → Path

Returns the current cache dir

Returns:: Cache dir

set_cache_dir(cache_dir: Path | str)

Function to set the cache directory

Parameters:: cache_dir – Cache dir to set
Returns:: None

exception mirar.data.cache.CacheError

Bases: Exception

Error relating to the cache
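The Cache interface above can be sketched as a small object holding an optional directory. ToyCache is a hypothetical name, and raising the error when no directory has been set is an assumption about how an unset cache might be reported, not a description of mirar's actual behaviour.

```python
from pathlib import Path
from typing import Optional, Union


class CacheError(Exception):
    """Error relating to the cache."""


class ToyCache:
    """Hypothetical sketch mirroring the Cache interface described above."""

    cache_dir: Optional[Path] = None

    def set_cache_dir(self, cache_dir: Union[Path, str]) -> None:
        # Accept either a str or a Path, as in the documented signature
        self.cache_dir = Path(cache_dir)

    def get_cache_dir(self) -> Path:
        if self.cache_dir is None:
            # Assumption: asking for an unset cache dir is treated as an error
            raise CacheError("No cache directory has been set")
        return self.cache_dir


cache = ToyCache()
cache.set_cache_dir("/tmp/mirar_cache")
print(cache.get_cache_dir())
```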

mirar.data.image_data module

Module to specify the input data classes for mirar.processors.base_processor.ImageHandler

The basic idea of the code is to pass DataBlock objects through a series of BaseProcessor objects. Since a given image can easily be ~10-100 MB, and a typical survey may produce several hundred raw images in a given night, the total data volume for these processors could be tens of GB or more. Storing all of this in RAM would be very inefficient or slow on a typical laptop, and on many larger processing machines.

To mitigate this, the code can be operated in cache mode. In that case, after raw images are loaded, only the header data is stored in memory. The image data itself is stored temporarily as a .npy file in a dedicated cache directory, and is only loaded into memory when needed. When the data is updated, the .npy file is rewritten. The file path is a unique hash that incorporates the read time of the file, so multiple copies of an image can be read and modified independently.
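The core of cache mode (small header in memory, pixel data in a .npy file loaded on demand) can be sketched with numpy's np.save and np.load. CachedImage is a hypothetical class, and a plain dict stands in for the astropy Header; this is an illustration of the idea, not mirar's Image implementation.

```python
import tempfile
from pathlib import Path

import numpy as np


class CachedImage:
    """Hypothetical sketch of cache mode: header in RAM, pixels on disk."""

    def __init__(self, data: np.ndarray, header: dict, cache_path: Path):
        self.header = header          # small, kept in memory
        self.cache_path = cache_path  # where the pixel data lives
        self.set_data(data)

    def set_data(self, data: np.ndarray) -> None:
        # Updating the data rewrites the .npy file; no pixel array is kept on the object
        np.save(self.cache_path, data)

    def get_data(self) -> np.ndarray:
        # Pixels are only read back into memory when actually requested
        return np.load(self.cache_path)


with tempfile.TemporaryDirectory() as tmp:
    img = CachedImage(np.zeros((4, 4)), {"FILTER": "J"}, Path(tmp) / "img.npy")
    img.set_data(img.get_data() + 1.0)
    print(img.get_data().sum())  # 16.0
```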

In cache mode, all of the image data is temporarily stored in the cache, which can therefore reach tens of GB. The cache is located in the configurable output data directory, and without cleanup it would grow linearly with successive code executions. To mitigate that, and to avoid cleaning the cache by hand, the code tries to delete cache files automatically as needed.

Python allows a class to define a __del__() method that handles cleanup when an object is deleted, and Image objects delete their cache files in this method. However, Python has a somewhat complicated approach to garbage collection (see the official description for more info), so it is not guaranteed that Image objects will clean up after themselves.

As a fallback, when you run the code from the command line (and therefore call __main__), we use the standard python tempfile library <https://docs.python.org/3/library/tempfile.html> to create a temporary directory, and set this as the cache. We open the directory using a with context manager, ensuring that cleanup runs automatically before exiting, even if the code crashes or raises errors. We also use tempfile and careful cleaning for the unit tests, as provided by the base test class. If you interact with the code in any other way, please be mindful of this behaviour, and make sure you clean your cache responsibly!
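The cleanup guarantee of the with block can be seen in a short sketch using tempfile.TemporaryDirectory. The function run_pipeline is a hypothetical stand-in for a pipeline run, and the cache files it writes contain dummy bytes rather than real image data.

```python
import tempfile
from pathlib import Path


def run_pipeline(cache_dir: Path) -> int:
    """Hypothetical stand-in for a pipeline run that writes cache files."""
    for i in range(3):
        (cache_dir / f"image_{i}.npy").write_bytes(b"\x00" * 16)  # dummy content
    return len(list(cache_dir.glob("*.npy")))


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    n_files = run_pipeline(cache_dir)
    print(n_files)  # 3

# Leaving the with block (normally or via an exception) deletes the directory
print(cache_dir.exists())  # False
```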

If you don’t like this feature, you don’t need to use it. Cache mode is entirely optional, and can be disabled by setting the USE_WINTER_CACHE environment variable to false:

export USE_WINTER_CACHE=false
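One way a program could read this switch is sketched below. The function name cache_mode_enabled, the default of "true", and the set of values treated as false are all assumptions for illustration; check the Usage documentation for the values mirar actually accepts.

```python
import os


def cache_mode_enabled() -> bool:
    """Hypothetical parser for USE_WINTER_CACHE.

    Assumption: cache mode is on unless the variable holds a false-like value.
    """
    value = os.environ.get("USE_WINTER_CACHE", "true")
    return value.strip().lower() not in {"false", "0", "no"}


# In-process equivalent of running `export USE_WINTER_CACHE=false` in the shell
os.environ["USE_WINTER_CACHE"] = "false"
print(cache_mode_enabled())  # False
```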

See Usage for more information about selecting cache mode, and setting the output data directory.

class mirar.data.image_data.Image(data: ndarray, header: Header)

Bases: DataBlock

A subclass of DataBlock, containing an image and header.

This class serves as input for BaseImageProcessor and BaseCandidateGenerator processors.

cache_files = []

get_cache_data() → ndarray

Get the image data from cache

Returns:: image data (numpy array)

get_cache_path() → Path

Get a unique cache path for the image (.npy file). The path is a hash of the image name and read time, so it should be unique even when rerunning on the same image.

Returns:: unique cache file path
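A name-plus-time hash of this kind can be sketched with hashlib. The function make_cache_path is hypothetical, and the choice of SHA-1 and of the key format are assumptions for illustration only; the docs above do not specify which hash mirar uses.

```python
import hashlib
from pathlib import Path


def make_cache_path(cache_dir: Path, image_name: str, read_time: float) -> Path:
    """Hypothetical sketch: hash the image name and read time into a .npy path."""
    key = f"{image_name}_{read_time}".encode()
    # Assumption: SHA-1 is used here purely for illustration
    digest = hashlib.sha1(key).hexdigest()
    return cache_dir / f"{digest}.npy"


# In practice read_time could come from time.time() when the image is loaded
first = make_cache_path(Path("cache"), "raw_001.fits", 1700000000.0)
second = make_cache_path(Path("cache"), "raw_001.fits", 1700000000.5)
print(first != second)  # True: same image, different read times
```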

get_data() → ndarray

Get the image data, from the cache in cache mode or from RAM otherwise

Returns:: image data (numpy array)

get_header() → Header

Get the image header

Returns:: astropy Header

get_mask() → ndarray

Get the mask data for an image. 0 is masked, 1 is unmasked.

Returns:: mask data (numpy array)
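Given a mask with this convention (0 masked, 1 unmasked), the unmasked pixels can be selected with numpy boolean indexing. The arrays below are made-up examples standing in for get_data() and get_mask() output.

```python
import numpy as np

data = np.array([[1.0, 2.0], [3.0, 4.0]])
# Convention from get_mask(): 0 = masked, 1 = unmasked
mask = np.array([[1, 0], [1, 1]])

unmasked_pixels = data[mask.astype(bool)]  # flattened, row-major order
print(unmasked_pixels)  # [1. 3. 4.]

# Alternatively, blank masked pixels with NaN and use NaN-aware statistics
blanked = np.where(mask == 0, np.nan, data)
print(np.nanmedian(blanked))  # 3.0
```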

get_ram_data() → ndarray

Get the image data from RAM

Returns:: image data (numpy array)

keys()

Get the header keys

Returns:: Keys of header

set_cache_data(data: ndarray)

Store the image data in the cache

Parameters:: data – Updated image data
Returns:: None

set_data(data: ndarray)

Set the image data, in the cache in cache mode or in RAM otherwise

Parameters:: data – Updated image data
Returns:: None

set_header(header: Header)

Update the header

Parameters:: header – updated header
Returns:: None

set_ram_data(data: ndarray)

Set the data in RAM

Parameters:: data – Updated image data
Returns:: None

class mirar.data.image_data.ImageBatch(batch: list[Image] | Image | None = None)

Bases: DataBatch

A subclass of DataBatch, which contains Image objects

To batch, de-batch, and select objects within batches, see ImageBatcher, ImageDebatcher, and ImageSelector.
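As a rough illustration of what batching means, the sketch below groups images into one batch per value of a header key. The function batch_by_key is hypothetical and only in the spirit of ImageBatcher; plain dicts stand in for Image headers, and this is not the ImageBatcher API.

```python
from collections import defaultdict


def batch_by_key(headers: list[dict], key: str) -> list[list[dict]]:
    """Hypothetical grouping: one batch per distinct value of a header key."""
    groups = defaultdict(list)
    for header in headers:
        groups[header[key]].append(header)
    # Batches appear in the order each key value is first seen
    return list(groups.values())


headers = [
    {"NAME": "a", "FILTER": "J"},
    {"NAME": "b", "FILTER": "H"},
    {"NAME": "c", "FILTER": "J"},
]
batches = batch_by_key(headers, "FILTER")
print([[h["NAME"] for h in b] for b in batches])  # [['a', 'c'], ['b']]
```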

append(item: Image)

Append a new object, as with list.append().

Parameters:: item – Object to be added
Returns:: None

data_type: alias of Image

get_batch() → list[Image]

Returns the Image items within the batch

Returns:: list of Image objects

mirar.data.source_data module

Module for SourceTable objects, and their corresponding SourceBatches

class mirar.data.source_data.SourceBatch(batch: list[SourceTable] | SourceTable | None = None)

Bases: DataBatch

DataBatch class for holding SourceTables

append(item: SourceTable)

Append a new object, as with list.append().

Parameters:: item – Object to be added
Returns:: None

data_type: alias of SourceTable

get_batch() → list[SourceTable]

Returns the SourceTable items within the batch

Returns:: list of SourceTable objects

class mirar.data.source_data.SourceTable(source_list: DataFrame, metadata: dict)

Bases: DataBlock

Data class for SourceTables, a type of data block based around sources detected in an image

get_data() → DataFrame

Get the table of sources

Returns:: source dataframe

get_metadata() → dict

Get the metadata associated with the source table

Returns:: metadata

keys()

Return the metadata keys

Returns:: keys

set_data(source_list: DataFrame)

Set the table of sources

Parameters:: source_list – new source list
Returns:: None
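The SourceTable methods documented above amount to a pandas DataFrame of sources paired with a metadata dict. The sketch below uses a hypothetical ToySourceTable class with made-up column names (ra, dec, mag) to show a typical get/filter/set round trip; it is not the mirar implementation.

```python
import pandas as pd


class ToySourceTable:
    """Hypothetical sketch of a SourceTable: a DataFrame of sources plus metadata."""

    def __init__(self, source_list: pd.DataFrame, metadata: dict):
        self.source_list = source_list
        self.metadata = metadata

    def get_data(self) -> pd.DataFrame:
        return self.source_list

    def set_data(self, source_list: pd.DataFrame) -> None:
        self.source_list = source_list

    def get_metadata(self) -> dict:
        return self.metadata

    def keys(self):
        # Keys come from the metadata, not from the DataFrame columns
        return self.metadata.keys()


sources = pd.DataFrame({"ra": [10.0, 10.1], "dec": [-5.0, -5.2], "mag": [17.2, 19.8]})
table = ToySourceTable(sources, {"FILTER": "J", "EXPTIME": 30.0})

# Example update: keep only sources brighter than magnitude 18
data = table.get_data()
table.set_data(data[data["mag"] < 18])
print(len(table.get_data()))  # 1
print(list(table.keys()))     # ['FILTER', 'EXPTIME']
```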

mirar.data.utils module

Utils for data