Home
Biggus is a pure-Python library for handling very large (i.e. too large for system memory) n-dimensional arrays.
It has two main components:
- Representation, lazy indexing, and conversion to persistent files and NumPy arrays.
- Lazy calculation.
At the core of Biggus is the Array class, which provides a simple, consistent, NumPy-esque interface to n-dimensional data and avoids reading any data until user code explicitly requests it. Commonly, these Array objects are created by wrapping "concrete" data sources such as HDF5 variables, netCDF4 variables, or even just NumPy arrays.
Once created, Array objects can be concatenated and stacked to form new Array objects, which can themselves be concatenated and stacked as required. In this way it is possible to construct virtual arrays of arbitrary size, spanning multiple data sources.
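To make the idea of a virtual array spanning several sources more concrete, here is a minimal sketch. It is not the Biggus API: `VirtualConcat` is a hypothetical class invented for this illustration, it only joins along the first axis, and it only supports fetching a single row by a non-negative integer index.

```python
import numpy as np


class VirtualConcat:
    """Hypothetical stand-in (not Biggus): joins sources along axis 0 without copying."""

    def __init__(self, sources):
        self.sources = list(sources)
        # Start offset of each source along axis 0, plus the total length.
        lengths = [s.shape[0] for s in self.sources]
        self.offsets = np.concatenate(([0], np.cumsum(lengths)))
        self.shape = (int(self.offsets[-1]),) + tuple(self.sources[0].shape[1:])

    def __getitem__(self, i):
        """Fetch one row, touching only the source that holds it."""
        if not 0 <= i < self.shape[0]:
            raise IndexError(i)
        # Locate the source whose row range contains i.
        k = int(np.searchsorted(self.offsets, i, side="right")) - 1
        return np.asarray(self.sources[k][i - self.offsets[k]])


a = np.arange(6).reshape(3, 2)       # rows [0, 1], [2, 3], [4, 5]
b = np.arange(6, 14).reshape(4, 2)   # rows [6, 7] ... [12, 13]

joined = VirtualConcat([a, b])       # 7 virtual rows, nothing copied
nested = VirtualConcat([joined, a])  # a virtual array used as a source again
```

Here `nested[4]` is resolved through `joined` down to the second row of `b`, which mirrors how concatenated arrays can themselves be concatenated.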
In addition, all Array objects can be indexed to extract subsets. As with the concatenation and stacking operations, this does not cause any data to be read.
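The following sketch shows one way indexing can be deferred so that no data is read. It is an illustration under stated assumptions, not how Biggus is implemented: `CountingSource` and `LazyRows` are hypothetical names, and only slices along the first axis are supported.

```python
import numpy as np


class CountingSource:
    """Hypothetical data source that records how often it is actually read."""

    def __init__(self, data):
        self._data = data
        self.shape = data.shape
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return self._data[key]


class LazyRows:
    """Hypothetical lazy view: remembers a row selection instead of applying it."""

    def __init__(self, source, rows=None):
        self.source = source
        self.rows = range(source.shape[0]) if rows is None else rows

    @property
    def shape(self):
        return (len(self.rows),) + tuple(self.source.shape[1:])

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        # Slicing a range gives another range, so the source is not touched.
        return LazyRows(self.source, self.rows[key])

    def ndarray(self):
        """Read the selected rows from the source; the only place data is loaded."""
        index = np.fromiter(self.rows, dtype=np.intp)
        return np.asarray(self.source[index])


source = CountingSource(np.arange(20).reshape(10, 2))
view = LazyRows(source)[2:8][::2]   # selects rows 2, 4 and 6, lazily
reads_before = source.reads         # still 0: indexing read nothing
result = view.ndarray()             # one read, for three rows only
```

The shape of `view` is known without reading, because it follows from the recorded selection alone.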
User code may request that any Array object be saved to a "concrete" data form (e.g. HDF5, as above). The size of this operation is not limited by system memory. Alternatively, user code may explicitly request that any Array object provide the corresponding NumPy array in memory. It is the responsibility of the user code to determine whether this is an appropriate action.
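As a rough illustration of why saving need not be limited by memory, the sketch below copies a source to disk one block of rows at a time. It uses plain NumPy rather than Biggus; `save_in_chunks` and `rows_per_chunk` are hypothetical names, and a `.npy` file stands in for formats such as HDF5.

```python
import os
import tempfile

import numpy as np


def save_in_chunks(source, path, rows_per_chunk):
    """Copy `source` to a .npy file, handling at most `rows_per_chunk` rows at once."""
    # open_memmap creates the file on disk with the right header, dtype and shape.
    target = np.lib.format.open_memmap(
        path, mode="w+", dtype=source.dtype, shape=source.shape
    )
    for start in range(0, source.shape[0], rows_per_chunk):
        stop = min(start + rows_per_chunk, source.shape[0])
        target[start:stop] = source[start:stop]
    target.flush()
    del target  # release the memory map so the file is closed


# A small in-memory array stands in for a source too large to load whole.
source = np.arange(1000.0).reshape(100, 10)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "saved.npy")
    save_in_chunks(source, path, rows_per_chunk=16)
    roundtrip = np.load(path)
```

Loading the whole file back with `np.load` is done here only to check the result; for genuinely large data that step is exactly the one the user code has to judge.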
As of August 2013, Biggus is still at the proof-of-concept stage, but early results indicate that simple Python code can perform large, out-of-core calculations at a rate that meets or even exceeds that of other commonly used tools.
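For readers unfamiliar with the term, an out-of-core calculation processes the data in pieces that each fit in memory. A minimal sketch in plain NumPy follows; `chunked_mean` is a hypothetical helper for illustration, not a Biggus function, and it says nothing about the performance mentioned above.

```python
import numpy as np


def chunked_mean(source, rows_per_chunk):
    """Mean over axis 0, reading `source` one block of rows at a time."""
    total = np.zeros(source.shape[1:], dtype=np.float64)
    for start in range(0, source.shape[0], rows_per_chunk):
        # Only this block needs to be in memory; a slice past the end is truncated.
        block = source[start:start + rows_per_chunk]
        total += np.sum(block, axis=0, dtype=np.float64)
    return total / source.shape[0]


data = np.arange(12).reshape(6, 2)   # stands in for a much larger source
column_means = chunked_mean(data, rows_per_chunk=4)
```

Any object that supports slicing along its first axis and exposes a `shape` could be passed as `source`, for example a memory-mapped file.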
To get more ideas of what Biggus can do, please browse the examples.
If you have any questions or feedback, please feel free to post to the discussion group or raise an issue on the issue tracker.