Releases: tensorflow/datasets
v4.1.0
- When generating a dataset, if download fails for any reason, it is now possible to manually download the data. See doc.
- Simplification of the dataset creation API:
  - We've made it easier to create datasets outside the TFDS repository (see our updated dataset creation guide).
  - `_split_generators` should now return `{'split_name': self._generate_examples(), ...}` (but current datasets are backward compatible).
  - All datasets inherit from `tfds.core.GeneratorBasedBuilder`. Converting a dataset to Beam now only requires changing `_generate_examples` (see example and doc, and the sketch after this list).
  - `tfds.core.SplitGenerator` and `tfds.core.BeamBasedBuilder` are deprecated and will be removed in a future version.
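To illustrate, here is a minimal sketch of a new-style builder; the dataset name, URL, features, and labels are all hypothetical placeholders:

```python
import tensorflow_datasets as tfds

class MyDataset(tfds.core.GeneratorBasedBuilder):
  """Hypothetical dataset, for illustration only."""

  VERSION = tfds.core.Version('1.0.0')

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        features=tfds.features.FeaturesDict({
            'image': tfds.features.Image(),
            'label': tfds.features.ClassLabel(names=['no', 'yes']),
        }),
    )

  def _split_generators(self, dl_manager):
    # New style: return a dict mapping split names to example generators.
    # `tfds.core.as_path` (added in this release) gives a pathlib-like object.
    path = tfds.core.as_path(
        dl_manager.download_and_extract('https://example.org/data.zip'))
    return {
        'train': self._generate_examples(path / 'train'),
        'test': self._generate_examples(path / 'test'),
    }

  def _generate_examples(self, path):
    # Yield (key, example) pairs; `path` is pathlib-like.
    for img_path in path.iterdir():
      yield img_path.name, {'image': img_path, 'label': 'yes'}
```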
- Better `pathlib.Path`, `os.PathLike` compatibility:
  - `dl_manager.manual_dir` now returns a pathlib-like object. Example: `text = (dl_manager.manual_dir / 'downloaded-text.txt').read_text()`
  - Note: other `dl_manager.download`, `.extract`, ... calls will return pathlib-like objects in future versions.
  - `FeatureConnector`, ... and most functions should accept `PathLike` objects. Let us know if some functions you need are missing.
  - Add `tfds.core.as_path` to create pathlib.Path-like objects compatible with GCS (e.g. `tfds.core.as_path('gs://my-bucket/labels.csv').read_text()`).
- Other bug fixes and improvements, e.g.:
  - Add a `verify_ssl=` option to `tfds.download.DownloadConfig` to disable SSL certificate validation during download (see the sketch after this list).
  - `BuilderConfig`s are now compatible with Beam datasets (#2348).
  - `--record_checksums` now assumes the new dataset-as-folder model.
  - `tfds.features.Image` can accept encoded `bytes` images directly (useful when used with `img_name, img_bytes = dl_manager.iter_archive('images.zip')`).
  - The API docs now show deprecated methods, and abstract methods to overwrite are now documented.
  - You can generate `imagenet2012` with only a single split (e.g. only the validation data). Other splits will be skipped if not present.
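For instance, disabling SSL certificate validation might look like this (a sketch; `my_dataset` is a placeholder name):

```python
import tensorflow_datasets as tfds

# Skip SSL certificate validation for hosts with broken certificates.
download_config = tfds.download.DownloadConfig(verify_ssl=False)

builder = tfds.builder('my_dataset')  # placeholder dataset name
builder.download_and_prepare(download_config=download_config)
```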
And of course, new datasets.
Thank you to all our contributors for improving TFDS!
v4.0.1
- Fix `tfds.load` when generation code isn't present.
- Improve GCS compatibility.
Thanks @carlthome for reporting and fixing the issue.
v4.0.0
API changes, new features:
- Dataset-as-folder: datasets can now be self-contained modules in a folder with checksums, dummy data, ... This simplifies implementing datasets outside the TFDS repository.
- `tfds.load` can now load datasets without using the generation class. So `tfds.load('my_dataset:1.0.0')` can work even if `MyDataset.VERSION == '2.0.0'` (see #2493).
- Add a new TFDS CLI (see https://www.tensorflow.org/datasets/cli for details).
- `tfds.testing.mock_data` does not require metadata files anymore (see the sketch after this list)!
- Add `tfds.as_dataframe(ds, ds_info)` with custom visualisation (example).
- Add `tfds.even_splits` to generate subsplits (e.g. `tfds.even_splits('train', n=3) == ['train[0%:33%]', 'train[33%:67%]', ...]`).
- Add a new `DatasetBuilder.RELEASE_NOTES` property.
- `tfds.features.Image` now supports PNG with 4 channels.
- `tfds.ImageFolder` now supports custom shape and dtype.
- Downloaded URLs are available through `MyDataset.url_infos`.
- Add a `skip_prefetch` option to `tfds.ReadConfig`.
- `as_supervised=True` support for `tfds.show_examples` and `tfds.as_dataframe`.
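A short sketch combining some of these additions (the dataset and counts are illustrative):

```python
import tensorflow_datasets as tfds

# Mock data for tests: no metadata files are required anymore.
with tfds.testing.mock_data(num_examples=4):
  ds, ds_info = tfds.load('mnist', split='train', with_info=True)

# Generate three roughly equal subsplits of 'train'.
splits = tfds.even_splits('train', n=3)
# -> ['train[0%:33%]', 'train[33%:67%]', 'train[67%:100%]']

# Rich notebook display of a few examples.
df = tfds.as_dataframe(ds.take(4), ds_info)
```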
Breaking compatibility changes:
- `tfds.as_numpy()` now returns an iterable which can be iterated over multiple times. To migrate, replace `next(ds)` with `next(iter(ds))`.
- Rename `tfds.features.text.Xyz` -> `tfds.deprecated.text.Xyz`.
- Remove the `DatasetBuilder.IN_DEVELOPMENT` property.
- Remove `tfds.core.disallow_positional_args` (should use Py3 `*,` instead).
- `tfds.features` can now be saved/loaded; you may have to overwrite `FeatureConnector.from_json_content` and `FeatureConnector.to_json_content` to support this feature.
- Stop testing against TF 1.15. Requires Python 3.6.8+.
Other bug fixes:
- Better archive extension detection for `dl_manager.download_and_extract`.
- Fix `tfds.__version__` in TFDS nightly to be PEP 440 compliant.
- Fix crash when GCS is not available.
- Script to detect dead URLs.
- Improved open-source workflow, contributor guide, and documentation.
- Many other internal cleanups, bug fixes, dead code removal, py2->py3 cleanup, pytype annotations, ...
And of course, new datasets and dataset updates.
A gigantic thanks to our community, which has helped us debug issues and implement many features, especially @vijayphoenix for being a major contributor.
v3.2.1
- Fix an issue with GCS on Windows.
v3.2.0
Future breaking change:
- The `tfds.features.text` encoding API is deprecated. Please use tensorflow_text instead.
New features
API:
- Add `tfds.ImageFolder` and `tfds.TranslateFolder` to easily create custom datasets from your own data (see the sketch after this list).
- Add `tfds.ReadConfig(input_context=)` to shard datasets, for better multi-worker compatibility (#1426).
- The default `data_dir` can be controlled by the `TFDS_DATA_DIR` environment variable.
- Better usability when developing datasets outside TFDS:
  - Downloads are always cached.
  - Checksums are optional.
- Add `tfds.show_statistics(ds_info)` to display a FACETS OVERVIEW. Note: this requires the dataset to have been generated with the statistics.
- Open-source various scripts to help with deployment/documentation (generate catalog documentation, export all metadata files, ...).
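As an illustration, `tfds.ImageFolder` builds a dataset from a directory of labeled images (the path below is a placeholder):

```python
import tensorflow_datasets as tfds

# Expected layout (split/label/image):
#   /path/to/images/train/cat/img1.jpg
#   /path/to/images/train/dog/img2.jpg
#   /path/to/images/test/cat/img3.jpg
builder = tfds.ImageFolder('/path/to/images/')
print(builder.info)  # Inferred labels, splits, number of examples, ...
ds = builder.as_dataset(split='train', shuffle_files=True)
```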
Documentation:
- The catalog displays images (example).
- The catalog shows which datasets have been recently added and are only available in `tfds-nightly`.
Breaking compatibility changes:
- Fix deterministic example order on Windows when a path was used as the key (this only impacts a few datasets). Example order should now be the same on all platforms.
- Remove `tfds.load('image_label_folder')` in favor of the more user-friendly `tfds.ImageFolder`.
Other:
- Various performance improvements for both generation and reading (e.g. use `__slots__`, fix a parallelisation bug in `tf.data.TFRecordReader`, ...).
- Various fixes (typos, type annotations, better error messages, dead links, better Windows compatibility, ...).
Thanks to all our contributors who help improve the state of datasets for the entire research community!
v3.1.0
Breaking compatibility changes:
- Rename `tfds.core.NamedSplit`, `tfds.core.SplitBase` -> `tfds.Split`. Now `tfds.Split.TRAIN`, ... are instances of `tfds.Split`.
- Remove the deprecated `num_shards` argument from `tfds.core.SplitGenerator`. This argument was ignored, as shards are automatically computed.
Future breaking compatibility changes:
- Rename `interleave_parallel_reads` -> `interleave_cycle_length` in `tfds.ReadConfig`.
- Invert the ds, ds_info argument order for `tfds.show_examples`.
- The `tfds.features.text` encoding API is deprecated. Please use `tensorflow_text` instead.
Other changes:
- Testing: add support for custom decoders in `tfds.testing.mock_data`.
- Documentation: show which datasets are only present in `tfds-nightly`.
- Documentation: display images for supported datasets.
- API: add `tfds.builder_cls(name)` to access a DatasetBuilder class by name (see the sketch after this list).
- API: add `info.splits['train'].filenames` for access to the tf-record files.
- API: add `tfds.core.add_data_dir` to register an additional data dir.
- Remove most `ds.with_options` calls which were applied by TFDS. Now uses the tf.data defaults.
- Other bug fixes and improvements (better error messages, Windows compatibility, ...).
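A small sketch of the new API entry points (the paths below are placeholders):

```python
import tensorflow_datasets as tfds

# Access the DatasetBuilder class (not an instance) by name.
builder_cls = tfds.builder_cls('mnist')
builder = builder_cls(data_dir='/path/to/data')  # placeholder path

# List the tf-record files backing a generated split.
print(builder.info.splits['train'].filenames)

# Register an additional directory in which TFDS looks for datasets.
tfds.core.add_data_dir('/path/to/extra_data')  # placeholder path
```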
Thank you all for your contributions, and helping us make TFDS better for everyone!
v3.0.0
Breaking changes:
- Legacy mode `tfds.experiment.S3` has been removed.
- New `image_classification` section. Some datasets have been moved there from `images`.
- The `in_memory` argument has been removed from `as_dataset`/`tfds.load` (small datasets are now auto-cached).
- `DownloadConfig` does not append the dataset name anymore (manual data should be in `<manual_dir>/` instead of `<manual_dir>/<dataset_name>/`).
- Tests now check that all `dl_manager.download` urls have registered checksums. To opt out, add `SKIP_CHECKSUMS = True` to your `DatasetBuilderTestCase`.
- `tfds.load` now always returns `tf.compat.v2.Dataset`. If you're still using `tf.compat.v1`:
  - Use `tf.compat.v1.data.make_one_shot_iterator(ds)` rather than `ds.make_one_shot_iterator()`.
  - Use `isinstance(ds, tf.compat.v2.Dataset)` instead of `isinstance(ds, tf.data.Dataset)`.
- `tfds.Split.ALL` has been removed from the API.
Future breaking changes:
- The `tfds.features.text` encoding API is deprecated. Please use tensorflow_text instead.
- The `num_shards` argument of `tfds.core.SplitGenerator` is currently ignored and will be removed in the next version.
Features:
- `DownloadManager` is now picklable (so it can be used inside Beam pipelines).
- `tfds.features.Audio` (see the sketch after this list):
  - Supports floats as returned values.
  - Exposes the sample rate through `info.features['audio'].sample_rate`.
  - Supports encoding audio features from file objects.
- Various bug fixes, better error messages, documentation improvements.
- More datasets.
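For illustration, a sketch of declaring an audio feature and reading back its sample rate (assuming the `sample_rate` constructor argument; the rate is a placeholder):

```python
import tensorflow_datasets as tfds

# Inside a builder's _info(), declare an audio feature with its rate.
features = tfds.features.FeaturesDict({
    'audio': tfds.features.Audio(sample_rate=16000),
})

# After generation, the rate is recoverable from the dataset info:
# info.features['audio'].sample_rate  # -> 16000
```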
Thank you to all our contributors for helping us make TFDS better for everyone!
v2.1.0
New features:
- Datasets expose `info.dataset_size` and `info.download_size`. Datasets generated with 2.1.0 cannot be loaded with previous versions (previous datasets can still be read with 2.1.0, however).
- Auto-caching of small datasets. The `in_memory` argument is deprecated and will be removed in a future version.
- Datasets expose their cardinality: `num_examples = tf.data.experimental.cardinality(ds)` (requires tf-nightly or TF >= 2.2.0).
- Get the number of examples in a sub-split with `info.splits['train[70%:]'].num_examples` (see the sketch below).
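Putting these together, a minimal sketch (using `mnist` as an example):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

ds, info = tfds.load('mnist', split='train', with_info=True)

print(info.download_size)  # Size of the original downloaded files.
print(info.dataset_size)   # Size of the generated tf-record files.

# Cardinality (requires tf-nightly or TF >= 2.2.0).
print(tf.data.experimental.cardinality(ds))

# Number of examples in a sub-split.
print(info.splits['train[70%:]'].num_examples)
```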
v2.0.0
- This is the last version of TFDS that will support Python 2. Going forward, we'll only support and test against Python 3.
- The default versions of all datasets are now using the S3 slicing API. See the guide for details.
- The previous split API is still available, but is deprecated. If you wrote `DatasetBuilder`s outside the TFDS repository, please make sure they do not use `experiments={tfds.core.Experiment.S3: False}`. This will be removed in the next version, as well as the `num_shards` kwarg from `SplitGenerator`.
- Several new datasets. Thanks to all the contributors!
- API changes and new features:
  - `shuffle_files` defaults to False so that dataset iteration is deterministic by default. You can customize the reading pipeline, including shuffling and interleaving, through the new `read_config` parameter in `tfds.load` (see the sketch after this list).
  - `urls` kwarg renamed to `homepage` in `DatasetInfo`.
  - Support for nested `tfds.features.Sequence` and `tf.RaggedTensor`.
  - Custom `FeatureConnector`s can override the `decode_batch_example` method for efficient decoding when wrapped inside a `tfds.features.Sequence(my_connector)`.
  - Declaring a dataset in Colab won't register it, which allows re-running the cell without having to change the name.
  - Beam datasets can use a `tfds.core.BeamMetadataDict` to store additional metadata computed as part of the Beam pipeline.
  - Beam datasets' `_split_generators` accepts an additional `pipeline` kwarg to define a pipeline shared between all splits.
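A sketch of opting back in to file shuffling through the new `read_config` parameter (assuming the `shuffle_seed` option; the seed and dataset are illustrative):

```python
import tensorflow_datasets as tfds

# Iteration is now deterministic by default (shuffle_files=False).
# Opt back in to file shuffling and control it with a seed:
read_config = tfds.ReadConfig(shuffle_seed=42)
ds = tfds.load(
    'mnist',
    split='train',
    shuffle_files=True,
    read_config=read_config,
)
```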
- Various other bug fixes and performance improvements. Thank you for all the reports and fixes!
v1.3.0
Bug fixes and performance improvements.