fugue.dataset
fugue.dataset.api
- fugue.dataset.api.show(data, n=10, with_count=False, title=None)
Display the Dataset
- Parameters:
data (AnyDataset) – the dataset that can be recognized by Fugue
n (int) – number of rows to print, defaults to 10
with_count (bool) – whether to show dataset count, defaults to False
title (str | None) – title of the dataset, defaults to None
- Return type:
None
Note
When with_count is True, it can trigger an expensive calculation for a distributed dataframe. So if you call this function directly, you may need to fugue.execution.execution_engine.ExecutionEngine.persist() the dataset first.
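The cost described in the note can be illustrated without Fugue. The sketch below is a standalone toy, not Fugue's implementation: `lazy_rows`, `preview`, and the `stats` counter are hypothetical names. It shows that previewing n rows reads only the head of a lazy source, that counting forces a full pass, and that materializing the data once (the role persisting plays) avoids recomputing it.

```python
from itertools import islice

stats = {"scanned": 0}  # how many rows the lazy source has computed


def lazy_rows(total):
    """Hypothetical lazy source: each row is computed only when requested."""
    for i in range(total):
        stats["scanned"] += 1
        yield {"id": i}


def preview(make_rows, n=10, with_count=False):
    """Return the first n rows, plus the total count if requested."""
    head = list(islice(make_rows(), n))
    # Counting needs a second, complete pass over the source.
    count = sum(1 for _ in make_rows()) if with_count else None
    return head, count


# Without a count, only the head is computed.
head, count = preview(lambda: lazy_rows(1000), n=3)
cheap = stats["scanned"]

# With a count, the whole source is computed in addition to the head.
stats["scanned"] = 0
head, count = preview(lambda: lazy_rows(1000), n=3, with_count=True)
expensive = stats["scanned"]

# Materializing once plays the role of persisting: later passes reuse it.
stats["scanned"] = 0
cached = list(lazy_rows(1000))
head, count = preview(lambda: iter(cached), n=3, with_count=True)
persisted = stats["scanned"]
```

In this toy, the counted preview computes every row on top of the head, while the materialized version computes each row exactly once no matter how many times it is previewed or counted afterwards.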
fugue.dataset.dataset
- class fugue.dataset.dataset.Dataset
Bases: ABC
The base class of Fugue DataFrame and Bag.
Note
This is for internal use only.
- assert_not_empty()
Assert this dataframe is not empty
- Raises:
FugueDatasetEmptyError – if it is empty
- Return type:
None
- abstract property empty: bool
Whether this dataframe is empty
- property has_metadata: bool
Whether this dataframe contains any metadata
- abstract property is_bounded: bool
Whether this dataframe is bounded
- abstract property is_local: bool
Whether this dataframe is a local Dataset
- abstract property native: Any
The native object this Dataset class wraps
- abstract property num_partitions: int
Number of physical partitions of this dataframe. Please read the Partition Tutorial
- show(n=10, with_count=False, title=None)
Display the Dataset
- Parameters:
n (int) – number of rows to print, defaults to 10
with_count (bool) – whether to show dataset count, defaults to False
title (str | None) – title of the dataset, defaults to None
- Return type:
None
Note
When with_count is True, it can trigger an expensive calculation for a distributed dataframe. So if you call this function directly, you may need to fugue.execution.execution_engine.ExecutionEngine.persist() the dataset first.
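To make the shape of this interface concrete, here is a standalone sketch that mirrors a few of the members listed above. It does not import Fugue: `SketchDataset`, `ListDataset`, and `EmptyError` are hypothetical stand-ins for `Dataset`, a concrete subclass, and `FugueDatasetEmptyError`, and only a subset of the documented abstract properties is reproduced.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class EmptyError(Exception):
    """Hypothetical stand-in for FugueDatasetEmptyError."""


class SketchDataset(ABC):
    """Toy base class mirroring part of the documented interface."""

    @property
    @abstractmethod
    def native(self) -> Any:
        """The object this dataset wraps."""

    @property
    @abstractmethod
    def empty(self) -> bool:
        """Whether the dataset has no rows."""

    @property
    @abstractmethod
    def is_local(self) -> bool:
        """Whether the data lives in the current process."""

    def assert_not_empty(self) -> None:
        # Concrete method built on the abstract `empty` property.
        if self.empty:
            raise EmptyError("dataset is empty")


class ListDataset(SketchDataset):
    """Concrete toy subclass backed by a Python list."""

    def __init__(self, rows: List[Any]):
        self._rows = rows

    @property
    def native(self) -> Any:
        return self._rows

    @property
    def empty(self) -> bool:
        return len(self._rows) == 0

    @property
    def is_local(self) -> bool:
        return True


ListDataset([1, 2]).assert_not_empty()  # passes silently

try:
    ListDataset([]).assert_not_empty()
    raised = False
except EmptyError:
    raised = True
```

Because the properties are abstract, the toy base class cannot be instantiated directly; a subclass has to supply every one of them before `assert_not_empty()` becomes usable.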
- class fugue.dataset.dataset.DatasetDisplay(ds)
Bases: ABC
The base class for display handlers of Dataset
- Parameters:
ds (Dataset) – the Dataset
- repr()
The string representation of the Dataset
- Returns:
the string representation
- Return type:
str
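A display handler of this shape wraps a dataset and produces its string form. The sketch below is illustrative only and does not use Fugue: `SketchDisplay` and `ListDisplay` are hypothetical names, and declaring `repr()` abstract is an assumption of the sketch, not a statement about how `DatasetDisplay` defines it.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class SketchDisplay(ABC):
    """Toy display handler: holds a dataset and renders it as text."""

    def __init__(self, ds: Any):
        self._ds = ds

    @abstractmethod
    def repr(self) -> str:
        """Return the string representation of the wrapped dataset."""


class ListDisplay(SketchDisplay):
    """Renders a list-backed dataset, one row per line."""

    def repr(self) -> str:
        rows: List[Any] = list(self._ds)
        header = f"ListDataset({len(rows)} rows)"
        return "\n".join([header] + [str(r) for r in rows])


text = ListDisplay([{"a": 1}, {"a": 2}]).repr()
```

Keeping rendering in a separate handler, as this toy does, lets the same dataset be displayed differently in different environments without changing the dataset class itself.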