fugue.dataset#
fugue.dataset.api#
- fugue.dataset.api.show(data, n=10, with_count=False, title=None)[source]#
Display the Dataset
- Parameters
data (AnyDataset) – the dataset that can be recognized by Fugue
n (int) – number of rows to print, defaults to 10
with_count (bool) – whether to show dataset count, defaults to False
title (Optional[str]) – title of the dataset, defaults to None
- Return type
None
Note
When with_count is True, it can trigger an expensive computation for a distributed dataframe. So if you call this function directly, you may need to persist the dataset first with fugue.execution.execution_engine.ExecutionEngine.persist().
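The cost described in the note can be illustrated without Fugue itself. The following is a minimal sketch, assuming a hypothetical lazy source (`lazy_rows`) that stands in for a distributed dataframe. `show_sketch` and `stats` are illustrative names and not part of Fugue. Printing the first n rows pulls only n rows, while counting forces a second, full pass over the source.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, Optional, Tuple

# Tracks how many rows the hypothetical source has had to compute.
stats: Dict[str, int] = {"computed": 0}


def lazy_rows(total: int) -> Iterator[Tuple[int, int]]:
    """Hypothetical lazy source: each row is computed only when requested."""
    for i in range(total):
        stats["computed"] += 1
        yield (i, i * i)


def show_sketch(
    make_rows: Callable[[], Iterator[Tuple[int, int]]],
    n: int = 10,
    with_count: bool = False,
    title: Optional[str] = None,
) -> None:
    """Illustrative only: mirrors the documented parameters, not Fugue's code."""
    if title is not None:
        print(title)
    # Only the first n rows are pulled from the source.
    for row in islice(make_rows(), n):
        print(row)
    if with_count:
        # Counting re-evaluates the whole source from scratch.
        count = sum(1 for _ in make_rows())
        print(f"Total count: {count}")


show_sketch(lambda: lazy_rows(1000), n=3)
rows_without_count = stats["computed"]

stats["computed"] = 0
show_sketch(lambda: lazy_rows(1000), n=3, with_count=True, title="with count")
rows_with_count = stats["computed"]
```

In this sketch the second call computes every row in addition to the three that are printed. Persisting a real distributed dataset before the call avoids recomputing it for the count.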
fugue.dataset.dataset#
- class fugue.dataset.dataset.Dataset[source]#
Bases: ABC
The base class of Fugue DataFrame and Bag.
Note
This is for internal use only.
- assert_not_empty()[source]#
Assert this dataframe is not empty
- Raises
FugueDatasetEmptyError – if it is empty
- Return type
None
- abstract property empty: bool#
Whether this dataframe is empty
- property has_metadata: bool#
Whether this dataframe contains any metadata
- abstract property is_bounded: bool#
Whether this dataframe is bounded
- abstract property is_local: bool#
Whether this dataframe is a local Dataset
- abstract property native: Any#
The native object this Dataset class wraps
- abstract property num_partitions: int#
Number of physical partitions of this dataframe. Please read the Partition Tutorial
- show(n=10, with_count=False, title=None)[source]#
Display the Dataset
- Parameters
n (int) – number of rows to print, defaults to 10
with_count (bool) – whether to show dataset count, defaults to False
title (Optional[str]) – title of the dataset, defaults to None
- Return type
None
Note
When with_count is True, it can trigger an expensive computation for a distributed dataframe. So if you call this function directly, you may need to persist the dataset first with fugue.execution.execution_engine.ExecutionEngine.persist().
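The shape of this class (abstract properties plus a concrete assert_not_empty built on top of them) can be sketched with the standard library alone. The names `DatasetSketch`, `ListDataset`, and `EmptyDatasetError` below are hypothetical stand-ins for illustration. This is not Fugue's implementation, and since the class is documented as internal, user code would not normally subclass it.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class EmptyDatasetError(Exception):
    """Hypothetical stand-in for FugueDatasetEmptyError."""


class DatasetSketch(ABC):
    """Illustrative base class with the abstract properties listed above."""

    @property
    @abstractmethod
    def native(self) -> Any:
        """The native object this class wraps."""

    @property
    @abstractmethod
    def empty(self) -> bool:
        """Whether the dataset has no rows."""

    @property
    @abstractmethod
    def is_local(self) -> bool:
        """Whether the data lives in the current process."""

    @property
    @abstractmethod
    def is_bounded(self) -> bool:
        """Whether the dataset has a finite size."""

    @property
    @abstractmethod
    def num_partitions(self) -> int:
        """Number of physical partitions."""

    def assert_not_empty(self) -> None:
        # Concrete method that relies only on the abstract `empty` property.
        if self.empty:
            raise EmptyDatasetError("dataset is empty")


class ListDataset(DatasetSketch):
    """Hypothetical local, bounded, single-partition dataset over a list."""

    def __init__(self, rows: List[Any]):
        self._rows = list(rows)

    @property
    def native(self) -> List[Any]:
        return self._rows

    @property
    def empty(self) -> bool:
        return len(self._rows) == 0

    @property
    def is_local(self) -> bool:
        return True

    @property
    def is_bounded(self) -> bool:
        return True

    @property
    def num_partitions(self) -> int:
        return 1


ds = ListDataset([(0, "a"), (1, "b")])
ds.assert_not_empty()  # passes silently for a non-empty dataset

try:
    ListDataset([]).assert_not_empty()
    raised = False
except EmptyDatasetError:
    raised = True
```

Because every property is abstract, the base class cannot be instantiated directly; only a subclass that implements all of them can.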
- class fugue.dataset.dataset.DatasetDisplay(ds)[source]#
Bases: ABC
The base class for display handlers of Dataset
- Parameters
ds (Dataset) – the Dataset
- repr()[source]#
The string representation of the Dataset
- Returns
the string representation
- Return type
str
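A display handler of this kind wraps a dataset and produces its string form. The following is a minimal sketch of that pattern, assuming hypothetical classes `DisplaySketch` and `RowCountDisplay`; it only mirrors the documented constructor argument `ds` and the `repr()` method, and does not reproduce Fugue's actual handler.

```python
from abc import ABC
from typing import Any


class DisplaySketch(ABC):
    """Illustrative display handler that holds the dataset it describes."""

    def __init__(self, ds: Any):
        self._ds = ds

    def repr(self) -> str:
        # Default representation: the type name of the wrapped object.
        return type(self._ds).__name__


class RowCountDisplay(DisplaySketch):
    """Hypothetical handler that also reports the number of rows."""

    def repr(self) -> str:
        return f"{super().repr()}({len(self._ds)} rows)"


text = RowCountDisplay([(1, "a"), (2, "b")]).repr()
print(text)
```

Keeping the representation logic in a separate handler lets the display be changed without touching the dataset class itself.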