fugue.dataset

fugue.dataset.api

fugue.dataset.api.show(data, n=10, with_count=False, title=None)

Display the Dataset

Parameters
  • data (AnyDataset) – the dataset that can be recognized by Fugue

  • n (int) – number of rows to print, defaults to 10

  • with_count (bool) – whether to show dataset count, defaults to False

  • title (Optional[str]) – title of the dataset, defaults to None

Return type

None

Note

When with_count is True, computing the count can trigger an expensive calculation on a distributed dataframe. If you call this function directly, you may want to persist the dataset first with fugue.execution.execution_engine.ExecutionEngine.persist().
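The cost described in the note can be illustrated with a toy, stdlib-only model. `LazyDataset` below is hypothetical and not part of Fugue: every pass over its rows re-runs the underlying computation unless the dataset has been persisted. This loosely mirrors why `with_count=True` can be expensive on an unpersisted distributed dataframe.

```python
from typing import Callable, List, Optional


class LazyDataset:
    """Hypothetical toy dataset (not part of Fugue) whose rows come from a computation."""

    def __init__(self, compute: Callable[[], List[dict]]):
        self._compute = compute
        self._cache: Optional[List[dict]] = None
        self.compute_calls = 0  # how many times the computation actually ran

    def _rows(self) -> List[dict]:
        if self._cache is not None:
            return self._cache  # persisted: reuse the materialized rows
        self.compute_calls += 1
        return self._compute()  # not persisted: recompute from scratch

    def persist(self) -> "LazyDataset":
        # Materialize once and keep the result; loosely analogous to
        # ExecutionEngine.persist() for a distributed dataframe.
        self._cache = self._rows()
        return self

    def count(self) -> int:
        return len(self._rows())

    def show(self, n: int = 10, with_count: bool = False, title: Optional[str] = None) -> None:
        if title is not None:
            print(title)
        for row in self._rows()[:n]:
            print(row)
        if with_count:
            # Counting needs another full pass unless the data was persisted.
            print(f"Total count: {self.count()}")


raw = LazyDataset(lambda: [{"a": i} for i in range(100)])
raw.show(n=3, with_count=True)
print(raw.compute_calls)  # 2: one pass for the rows, one for the count

cached = LazyDataset(lambda: [{"a": i} for i in range(100)]).persist()
cached.show(n=3, with_count=True)
print(cached.compute_calls)  # 1: only the persist() call computed
```

In a real engine, fetching the top n rows is usually cheaper than a full count. The toy model simplifies this by recomputing everything on each pass.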

fugue.dataset.dataset

class fugue.dataset.dataset.Dataset

Bases: ABC

The base class of Fugue DataFrame and Bag.

Note

This is for internal use only.

assert_not_empty()

Assert that this dataset is not empty

Raises

FugueDatasetEmptyError – if it is empty

Return type

None
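The following is a minimal sketch of the assert_not_empty contract, assuming the check is driven by the `empty` property. `CheckedDataset` is hypothetical. The local `FugueDatasetEmptyError` class is a stand-in defined here so the block runs without Fugue installed; it is not imported from Fugue.

```python
from typing import List


class FugueDatasetEmptyError(Exception):
    """Local stand-in for Fugue's exception of the same name (assumption)."""


class CheckedDataset:
    """Hypothetical dataset backed by a Python list."""

    def __init__(self, rows: List[list]):
        self._rows = rows

    @property
    def empty(self) -> bool:
        return len(self._rows) == 0

    def assert_not_empty(self) -> None:
        # Raise instead of returning a flag, so callers fail fast on empty input.
        if self.empty:
            raise FugueDatasetEmptyError("dataset is empty")


CheckedDataset([[1], [2]]).assert_not_empty()  # passes silently
try:
    CheckedDataset([]).assert_not_empty()
except FugueDatasetEmptyError as e:
    print(f"caught: {e}")  # caught: dataset is empty
```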

abstract count()

Get the number of rows in this dataset

Return type

int

abstract property empty: bool

Whether this dataset is empty

property has_metadata: bool

Whether this dataset contains any metadata

abstract property is_bounded: bool

Whether this dataset is bounded

abstract property is_local: bool

Whether this is a local dataset

property metadata: ParamDict

Metadata of the dataset

abstract property native: Any

The native object this Dataset class wraps

abstract property num_partitions: int

Number of physical partitions of this dataset. For details, see the Partition Tutorial.
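To make the abstract interface concrete, here is a stdlib-only sketch. `MiniDataset` is a hypothetical stand-in that re-declares only the abstract members listed above. It is not Fugue's `Dataset`, which a real implementation would subclass instead. `ListDataset` implements those members for an in-memory list, so it reports itself as local, bounded, and single-partition.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class MiniDataset(ABC):
    """Hypothetical stand-in mirroring Dataset's abstract members."""

    @abstractmethod
    def count(self) -> int: ...

    @property
    @abstractmethod
    def empty(self) -> bool: ...

    @property
    @abstractmethod
    def is_bounded(self) -> bool: ...

    @property
    @abstractmethod
    def is_local(self) -> bool: ...

    @property
    @abstractmethod
    def native(self) -> Any: ...

    @property
    @abstractmethod
    def num_partitions(self) -> int: ...


class ListDataset(MiniDataset):
    """In-memory dataset: local, bounded, and held in a single partition."""

    def __init__(self, rows: List[list]):
        self._rows = rows

    def count(self) -> int:
        return len(self._rows)

    @property
    def empty(self) -> bool:
        return self.count() == 0

    @property
    def is_bounded(self) -> bool:
        return True  # a finite list, not a stream

    @property
    def is_local(self) -> bool:
        return True  # the data lives in this process

    @property
    def native(self) -> Any:
        return self._rows  # the wrapped Python object

    @property
    def num_partitions(self) -> int:
        return 1  # local data treated as one partition (assumption)


ds = ListDataset([[1, "a"], [2, "b"]])
print(ds.count(), ds.empty, ds.is_local, ds.num_partitions)  # 2 False True 1
```

Because every member is abstract, instantiating `MiniDataset` directly raises `TypeError`. This is the same mechanism that forces concrete subclasses to implement the whole interface.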

reset_metadata(metadata)

Reset the metadata of this dataset

Parameters

metadata (Any) – the new metadata

Return type

None
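The three metadata members could relate to each other as in the sketch below. It assumes three behaviors: metadata is created lazily on first access, has_metadata is True only when metadata is non-empty, and reset_metadata replaces the metadata rather than merging into it (None clears it). `MetadataMixin` is hypothetical, and a plain `dict` stands in for Fugue's `ParamDict`.

```python
import copy
from typing import Any, Dict, Optional


class MetadataMixin:
    """Hypothetical sketch of metadata handling; a dict stands in for ParamDict."""

    def __init__(self) -> None:
        self._metadata: Optional[Dict[str, Any]] = None

    @property
    def metadata(self) -> Dict[str, Any]:
        # Create an empty container on first access.
        if self._metadata is None:
            self._metadata = {}
        return self._metadata

    @property
    def has_metadata(self) -> bool:
        # Merely accessing .metadata does not count as having metadata.
        return self._metadata is not None and len(self._metadata) > 0

    def reset_metadata(self, metadata: Any) -> None:
        # Replace the whole container with a deep copy; None clears it.
        self._metadata = None if metadata is None else dict(copy.deepcopy(metadata))


m = MetadataMixin()
print(m.has_metadata)  # False
m.metadata["name"] = "x"
print(m.has_metadata)  # True
m.reset_metadata({"k": 1})
print(m.metadata)  # {'k': 1}
m.reset_metadata(None)
print(m.has_metadata)  # False
```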

show(n=10, with_count=False, title=None)

Display the Dataset

Parameters
  • n (int) – number of rows to print, defaults to 10

  • with_count (bool) – whether to show dataset count, defaults to False

  • title (Optional[str]) – title of the dataset, defaults to None

Return type

None

Note

When with_count is True, computing the count can trigger an expensive calculation on a distributed dataframe. If you call this function directly, you may want to persist the dataset first with fugue.execution.execution_engine.ExecutionEngine.persist().

class fugue.dataset.dataset.DatasetDisplay(ds)

Bases: ABC

The base class for display handlers of Dataset

Parameters

ds (Dataset) – the Dataset

repr()

The string representation of the Dataset

Returns

the string representation

Return type

str

repr_html()

The HTML representation of the Dataset

Returns

the HTML representation

Return type

str
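A display handler could derive its text representation from `show` by capturing standard output. The sketch below assumes that approach. `TextDisplay` is hypothetical, and wrapping the escaped text in a `<pre>` element for the HTML variant is an illustrative choice, not necessarily what Fugue does.

```python
import html
import io
from contextlib import redirect_stdout
from typing import List, Optional


class TextDisplay:
    """Hypothetical display handler that derives repr() from show()."""

    def __init__(self, rows: List[list]):
        self._rows = rows

    def show(self, n: int = 10, with_count: bool = False, title: Optional[str] = None) -> None:
        # Minimal show(): prints the top n rows; title and count are ignored here.
        for row in self._rows[:n]:
            print(row)

    def repr(self) -> str:
        # Capture whatever show() prints with its default arguments.
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            self.show()
        return buffer.getvalue()

    def repr_html(self) -> str:
        # Escape the captured text so characters such as "<" render literally.
        return "<pre>" + html.escape(self.repr()) + "</pre>"


display = TextDisplay([[1, "<a>"], [2, "b"]])
print(display.repr(), end="")
print(display.repr_html())
```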

abstract show(n=10, with_count=False, title=None)

Show the Dataset

Parameters
  • n (int) – top n items to display, defaults to 10

  • with_count (bool) – whether to display the total count, defaults to False

  • title (Optional[str]) – title to display, defaults to None

Return type

None
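The next block sketches a possible `show` implementation for a concrete handler using only the standard library. `ListDisplay` and its `render` helper are hypothetical and do not subclass Fugue's `DatasetDisplay`. The output layout (an optional title line, one item per line, then an optional total count) is an assumption, not Fugue's actual format.

```python
from typing import List, Optional


class ListDisplay:
    """Hypothetical display handler for an in-memory list of rows (not Fugue's class)."""

    def __init__(self, rows: List[list]):
        self._rows = rows

    def render(self, n: int = 10, with_count: bool = False, title: Optional[str] = None) -> str:
        # Hypothetical helper: build the text that show() prints.
        lines = [] if title is None else [title]
        lines.extend(str(row) for row in self._rows[:n])  # top n items only
        if with_count:
            # For a distributed dataset this is the potentially expensive step.
            lines.append(f"Total count: {len(self._rows)}")
        return "\n".join(lines)

    def show(self, n: int = 10, with_count: bool = False, title: Optional[str] = None) -> None:
        print(self.render(n=n, with_count=with_count, title=title))


ListDisplay([[1], [2], [3]]).show(n=2, with_count=True, title="demo")
# demo
# [1]
# [2]
# Total count: 3
```

Separating formatting from printing keeps `show` trivial and makes the output easy to test or reuse, for example from a `repr` method.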