v4.1.0
- When generating a dataset, if the download fails for any reason, it is now possible to download the data manually. See the documentation.
- Simplification of the dataset creation API:
  - We've made it easier to create datasets outside the TFDS repository (see our updated dataset creation guide).
  - _split_generators should now return {'split_name': self._generate_examples(), ...} (existing datasets remain backward compatible).
  - All datasets inherit from tfds.core.GeneratorBasedBuilder. Converting a dataset to Beam now only requires changing _generate_examples (see the example and documentation).
  - tfds.core.SplitGenerator and tfds.core.BeamBasedBuilder are deprecated and will be removed in a future version.
- Better pathlib.Path and os.PathLike compatibility:
  - dl_manager.manual_dir now returns a pathlib-like object. Example:
        text = (dl_manager.manual_dir / 'downloaded-text.txt').read_text()
  - Note: other dl_manager methods (dl_manager.download, .extract, ...) will return pathlib-like objects in future versions.
  - FeatureConnector, ... and most functions should accept PathLike objects. Let us know if some functions you need are missing.
  - Add tfds.core.as_path to create pathlib.Path-like objects compatible with GCS (e.g. tfds.core.as_path('gs://my-bucket/labels.csv').read_text()).
- Other bug fixes and improvements, e.g.:
  - Add a verify_ssl= option to tfds.download.DownloadConfig to disable SSL certificate verification during download.
  - BuilderConfig is now compatible with Beam datasets (#2348).
  - --record_checksums now assumes the new dataset-as-folder model.
  - tfds.features.Image can accept encoded images as bytes directly (useful when used with img_name, img_bytes = dl_manager.iter_archive('images.zip')).
  - The API docs now show deprecated methods, and abstract methods to override are now documented.
  - You can generate imagenet2012 with only a single split (e.g. only the validation data). Other splits will be skipped if not present.
  - Add
- And of course, new datasets.
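As a rough illustration of the new _split_generators return shape described above, the sketch below uses a plain Python class (the hypothetical ToyBuilder, which does not inherit from tfds.core.GeneratorBasedBuilder and needs no TFDS install) to show split names mapped directly to _generate_examples() generators, each yielding (key, example) pairs. The in-memory data argument is an assumption standing in for files a real builder would fetch through dl_manager.

```python
from typing import Dict, Iterator, List, Tuple

Example = Tuple[str, dict]


class ToyBuilder:
    """Hypothetical builder mirroring the v4.1.0 method shapes (no tfds import)."""

    def __init__(self, data: Dict[str, List[str]]):
        # Stand-in for files that dl_manager would normally download.
        self._data = data

    def _split_generators(self, dl_manager=None) -> Dict[str, Iterator[Example]]:
        # New style: return {'split_name': generator, ...} directly,
        # with no tfds.core.SplitGenerator wrappers. dl_manager is unused here.
        return {
            'train': self._generate_examples(self._data['train']),
            'test': self._generate_examples(self._data['test']),
        }

    def _generate_examples(self, rows: List[str]) -> Iterator[Example]:
        # Yield (unique_key, feature_dict) pairs, one per example.
        for i, text in enumerate(rows):
            yield f'ex_{i}', {'text': text}


builder = ToyBuilder({'train': ['a', 'b'], 'test': ['c']})
for split_name, examples in builder._split_generators().items():
    print(split_name, list(examples))
```

Because the generators are returned unevaluated, nothing is read until a split is actually iterated.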
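The path-like behaviour described above follows standard pathlib semantics. This sketch uses a temporary directory as a stand-in for dl_manager.manual_dir (an assumption; no TFDS objects are involved) and a hypothetical read_labels helper that accepts either a str or an os.PathLike, the way the notes say most TFDS functions should. Plain pathlib does not handle gs:// paths; per the notes, that is what tfds.core.as_path is for.

```python
import os
import pathlib
import tempfile
from typing import List, Union

PathLike = Union[str, os.PathLike]


def read_labels(path: PathLike) -> List[str]:
    # Hypothetical helper: os.fspath turns a str or os.PathLike into a str path.
    with open(os.fspath(path)) as f:
        return [line.strip() for line in f if line.strip()]


# A temporary directory stands in for dl_manager.manual_dir (assumption).
with tempfile.TemporaryDirectory() as tmp:
    manual_dir = pathlib.Path(tmp)
    (manual_dir / 'downloaded-text.txt').write_text('cat\ndog\n')
    # Same `/` + read_text() pattern as the dl_manager.manual_dir example.
    text = (manual_dir / 'downloaded-text.txt').read_text()
    labels_from_path = read_labels(manual_dir / 'downloaded-text.txt')
    labels_from_str = read_labels(str(manual_dir / 'downloaded-text.txt'))

print(labels_from_path, labels_from_str)
```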
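To illustrate passing encoded image bytes straight through, as the tfds.features.Image note above describes, the sketch below iterates over an in-memory zip archive and yields each member's raw bytes, undecoded, inside an example dict. iter_zip is a hypothetical stand-in written with the standard zipfile module, not dl_manager.iter_archive, and the 'image' key and fake PNG payloads are assumptions; in a real builder those bytes would be handed to the image feature.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_zip(archive: bytes) -> Iterator[Tuple[str, bytes]]:
    """Hypothetical helper (not dl_manager.iter_archive): yield (name, raw bytes)."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                yield info.filename, zf.read(info)


def generate_examples(archive: bytes):
    # Pass the still-encoded bytes through; no decoding happens here.
    for img_name, img_bytes in iter_zip(archive):
        yield img_name, {'image': img_bytes}


# Build a small in-memory archive with fake (not real) PNG payloads.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('images/a.png', b'\x89PNG fake-a')
    zf.writestr('images/b.png', b'\x89PNG fake-b')

examples = dict(generate_examples(buf.getvalue()))
print(sorted(examples))
```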
Thank you to all our contributors for improving TFDS!