
v4.1.0

@Conchylicultor Conchylicultor released this 04 Nov 12:02
  • When generating a dataset, if the download fails for any reason, you can now download the data manually. See doc.

  • Simplification of the dataset creation API.

    • We've made it easier to create datasets outside the TFDS repository (see our updated dataset creation guide).
    • _split_generators should now return {'split_name': self._generate_examples(), ...} (but current datasets are backward compatible).
    • All datasets now inherit from tfds.core.GeneratorBasedBuilder. Converting a dataset to Beam now only requires changing _generate_examples (see example and doc).
    • tfds.core.SplitGenerator and tfds.core.BeamBasedBuilder are deprecated and will be removed in a future version.
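The new return shape of _split_generators can be sketched as follows. This is a minimal illustration, not the TFDS implementation: ToyBuilder is a hypothetical plain-Python stand-in for a tfds.core.GeneratorBasedBuilder subclass, and the hard-coded records replace whatever a real builder would read from downloaded files.

```python
from typing import Dict, Iterator, List, Tuple


class ToyBuilder:
    """Hypothetical stand-in for a tfds.core.GeneratorBasedBuilder subclass."""

    def _split_generators(self, dl_manager=None) -> Dict[str, Iterator[Tuple[str, dict]]]:
        # New style: map each split name directly to its example generator,
        # instead of returning a list of tfds.core.SplitGenerator objects.
        return {
            "train": self._generate_examples(["a", "b", "c"]),
            "test": self._generate_examples(["d"]),
        }

    def _generate_examples(self, records: List[str]) -> Iterator[Tuple[str, dict]]:
        # Yield (key, example) pairs; keys must be unique within a split.
        for index, text in enumerate(records):
            yield str(index), {"text": text}


splits = ToyBuilder()._split_generators()
train_examples = list(splits["train"])
```

In a real builder the class would subclass tfds.core.GeneratorBasedBuilder and also define _info; only the dictionary shape is the point here.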
  • Better pathlib.Path, os.PathLike compatibility:

    • dl_manager.manual_dir now returns a pathlib-like object. Example:
    text = (dl_manager.manual_dir / 'downloaded-text.txt').read_text()
    • Note: the other methods (dl_manager.download, .extract, ...) will return pathlib-like objects in future versions.
    • FeatureConnector, ... and most functions should now accept PathLike objects. Let us know if some functions you need are missing.
    • Add a tfds.core.as_path to create pathlib.Path-like objects compatible with GCS (e.g. tfds.core.as_path('gs://my-bucket/labels.csv').read_text()).
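The pathlib-style usage above can be tried without TFDS installed. In this sketch a temporary directory wrapped in pathlib.Path is an assumed stand-in for dl_manager.manual_dir, and the file contents are made up for illustration; it only shows the "/" join, read_text(), and os.PathLike conversion that a pathlib-like object supports.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for dl_manager.manual_dir (assumption: it behaves like pathlib.Path).
    manual_dir = pathlib.Path(tmp_dir)

    # Simulate a file the user placed in the manual directory.
    target = manual_dir / "downloaded-text.txt"
    target.write_text("hello tfds")

    # Same access pattern as in the release note example.
    text = (manual_dir / "downloaded-text.txt").read_text()

    # os.PathLike objects convert to plain strings for APIs that need them.
    as_string = os.fspath(target)
```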
  • Other bug fixes and improvements. E.g.

    • Add verify_ssl= option to tfds.download.DownloadConfig to disable SSL certificate verification during download.
    • BuilderConfig is now compatible with Beam datasets #2348
    • --record_checksums now assumes the new dataset-as-folder model
    • tfds.features.Image can accept encoded image bytes directly (useful when used with for img_name, img_bytes in dl_manager.iter_archive('images.zip')).
    • The API docs now show deprecated methods, and abstract methods to override are now documented.
    • You can generate imagenet2012 with only a single split (e.g. only the validation data). Other splits will be skipped if not present.
  • And of course new datasets

Thank you to all our contributors for improving TFDS!