fugue.dataset

fugue.dataset.api

fugue.dataset.api.show(data, n=10, with_count=False, title=None)

Display the Dataset

Parameters:
  • data (AnyDataset) – a dataset that Fugue can recognize

  • n (int) – number of rows to print, defaults to 10

  • with_count (bool) – whether to show dataset count, defaults to False

  • title (str | None) – title of the dataset, defaults to None

Return type:

None

Note

When with_count is True, counting can trigger an expensive computation on a distributed dataframe. If you call this function directly, consider persisting the dataset first with fugue.execution.execution_engine.ExecutionEngine.persist().
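The cost described in the note can be illustrated with a toy model. The sketch below does not use Fugue: LazyRows, toy_show, and expensive_source are hypothetical names, and the model simply assumes that an unpersisted dataset recomputes its rows on every scan, so that printing the head and counting the rows each cost one full pass.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[int, int]


class LazyRows:
    """Hypothetical stand-in for a distributed dataset (not a Fugue class).

    Every scan re-runs the producer, the way an unpersisted distributed
    dataframe may recompute its lineage for each action.
    """

    def __init__(self, producer: Callable[[], Iterator[Row]]):
        self._producer = producer

    def scan(self) -> Iterator[Row]:
        return self._producer()

    def persist(self) -> "LazyRows":
        # Materialize once; later scans read the cached rows instead.
        cached = list(self.scan())
        return LazyRows(lambda: iter(cached))


def toy_show(data: LazyRows, n: int = 10, with_count: bool = False,
             title: Optional[str] = None) -> None:
    """Toy analogue of show(): one scan for the head, one more for the count."""
    if title is not None:
        print(title)
    for i, row in enumerate(data.scan()):
        if i >= n:
            break
        print(row)
    if with_count:
        print("Total count:", sum(1 for _ in data.scan()))


runs: List[int] = []  # records each time the expensive source is recomputed


def expensive_source() -> Iterator[Row]:
    runs.append(1)
    return iter([(i, i * i) for i in range(1000)])


raw = LazyRows(expensive_source)
toy_show(raw, n=2, with_count=True, title="unpersisted")
runs_unpersisted = len(runs)  # head scan + count scan

runs.clear()
cached = raw.persist()  # the only recomputation from here on
toy_show(cached, n=2, with_count=True, title="persisted")
toy_show(cached, n=2, with_count=True, title="persisted again")
runs_persisted = len(runs)
```

In this model the unpersisted call recomputes the source twice, while any number of calls on the persisted copy reuse a single computation. How a real engine behaves depends on its own caching rules.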

fugue.dataset.dataset

class fugue.dataset.dataset.Dataset

Bases: ABC

The base class of Fugue DataFrame and Bag.

Note

This is for internal use only.

assert_not_empty()

Assert that this dataset is not empty

Raises:

FugueDatasetEmptyError – if it is empty

Return type:

None

abstract count()

Get the number of rows of this dataset

Return type:

int

abstract property empty: bool

Whether this dataset is empty

property has_metadata: bool

Whether this dataset contains any metadata

abstract property is_bounded: bool

Whether this dataset is bounded

abstract property is_local: bool

Whether this dataset is local

property metadata: ParamDict

Metadata of the dataset

abstract property native: Any

The native object this Dataset class wraps

abstract property num_partitions: int

Number of physical partitions of this dataset. See the Partition Tutorial for details.
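The abstract members listed so far (count, empty, is_bounded, is_local, native, num_partitions) form the contract a concrete dataset has to satisfy. The following is a minimal sketch of that contract using stand-in classes rather than Fugue itself: DatasetSketch, ListDataset, and DatasetEmptyError are hypothetical names, and the one-line descriptions in the docstrings are assumptions about intent, not Fugue's wording.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class DatasetEmptyError(Exception):
    """Hypothetical stand-in for FugueDatasetEmptyError."""


class DatasetSketch(ABC):
    """Stand-in that mirrors the abstract members documented above.

    This is not Fugue's Dataset class; it only illustrates the contract.
    """

    @property
    @abstractmethod
    def native(self) -> Any:
        """The wrapped object."""

    @property
    @abstractmethod
    def empty(self) -> bool:
        """Whether there are no rows."""

    @property
    @abstractmethod
    def is_bounded(self) -> bool:
        """Whether the dataset is bounded."""

    @property
    @abstractmethod
    def is_local(self) -> bool:
        """Whether the dataset is local."""

    @property
    @abstractmethod
    def num_partitions(self) -> int:
        """Number of physical partitions."""

    @abstractmethod
    def count(self) -> int:
        """Number of rows."""

    def assert_not_empty(self) -> None:
        # Concrete helper built only on the abstract `empty` property.
        if self.empty:
            raise DatasetEmptyError("dataset is empty")


class ListDataset(DatasetSketch):
    """Hypothetical local, bounded dataset backed by a Python list."""

    def __init__(self, rows: List[Any]):
        self._rows = rows

    @property
    def native(self) -> List[Any]:
        return self._rows

    @property
    def empty(self) -> bool:
        return len(self._rows) == 0

    @property
    def is_bounded(self) -> bool:
        return True

    @property
    def is_local(self) -> bool:
        return True

    @property
    def num_partitions(self) -> int:
        return 1  # a single in-memory list

    def count(self) -> int:
        return len(self._rows)


ds = ListDataset([("a", 1), ("b", 2)])
ds.assert_not_empty()  # passes silently because the list has rows
print(ds.count(), ds.is_local, ds.num_partitions)
```

Because every abstract member is declared on the base class, instantiating DatasetSketch directly raises TypeError, and a subclass that omits any member fails the same way.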

reset_metadata(metadata)

Reset metadata

Parameters:

metadata (Any)

Return type:

None
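How metadata, has_metadata, and reset_metadata relate can be sketched with a small stand-in. This is not Fugue's code: MetadataHolder is a hypothetical class, a plain dict takes the place of ParamDict, and the behavior shown (lazy creation on first read, wholesale replacement on reset, None clearing the metadata) is an assumption made for illustration.

```python
from typing import Any, Dict, Optional


class MetadataHolder:
    """Hypothetical stand-in for the metadata members (not Fugue's Dataset).

    A plain dict replaces ParamDict, and the reset semantics are an assumption.
    """

    def __init__(self) -> None:
        self._metadata: Optional[Dict[str, Any]] = None

    @property
    def metadata(self) -> Dict[str, Any]:
        # Create the mapping lazily so reading it never fails.
        if self._metadata is None:
            self._metadata = {}
        return self._metadata

    @property
    def has_metadata(self) -> bool:
        return self._metadata is not None and len(self._metadata) > 0

    def reset_metadata(self, metadata: Any) -> None:
        # Assumed: the old metadata is replaced wholesale; None clears it.
        self._metadata = dict(metadata) if metadata is not None else None


holder = MetadataHolder()
before = holder.has_metadata
holder.reset_metadata({"source": "sensor-7"})
after_set = (holder.has_metadata, holder.metadata["source"])
holder.reset_metadata(None)
after_clear = holder.has_metadata
```

Checking has_metadata before reading metadata avoids allocating an empty mapping for datasets that never carry any.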

show(n=10, with_count=False, title=None)

Display the Dataset

Parameters:
  • n (int) – number of rows to print, defaults to 10

  • with_count (bool) – whether to show dataset count, defaults to False

  • title (str | None) – title of the dataset, defaults to None

Return type:

None

Note

When with_count is True, counting can trigger an expensive computation on a distributed dataframe. If you call this method directly, consider persisting the dataset first with fugue.execution.execution_engine.ExecutionEngine.persist().

class fugue.dataset.dataset.DatasetDisplay(ds)

Bases: ABC

The base class for display handlers of Dataset

Parameters:

ds (Dataset) – the Dataset to display

repr()

The string representation of the Dataset

Returns:

the string representation

Return type:

str

repr_html()

The HTML representation of the Dataset

Returns:

the HTML representation

Return type:

str

abstract show(n=10, with_count=False, title=None)

Show the Dataset

Parameters:
  • n (int) – top n items to display, defaults to 10

  • with_count (bool) – whether to display the total count, defaults to False

  • title (str | None) – title to display, defaults to None

Return type:

None
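A display handler pairs a dataset with a rendering strategy: the base class supplies the representations, and subclasses implement show. The sketch below illustrates that split with stand-in classes rather than Fugue's own. DisplaySketch, ListDisplay, and the render helper are hypothetical, and the default bodies of repr and repr_html are assumptions.

```python
import html
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class DisplaySketch(ABC):
    """Stand-in for the display-handler base class (not Fugue's DatasetDisplay)."""

    def __init__(self, ds: Any):
        self._ds = ds

    def repr(self) -> str:
        # Assumed default: identify the dataset by its type name.
        return type(self._ds).__name__

    def repr_html(self) -> str:
        # Assumed default: the plain representation, escaped for HTML.
        return html.escape(self.repr())

    @abstractmethod
    def show(self, n: int = 10, with_count: bool = False,
             title: Optional[str] = None) -> None:
        """Subclasses decide how the dataset is rendered."""


class ListDisplay(DisplaySketch):
    """Hypothetical handler for a dataset that is a plain Python list."""

    def repr(self) -> str:
        return f"<list dataset, {len(self._ds)} rows>"

    def render(self, n: int, with_count: bool, title: Optional[str]) -> List[str]:
        # Build the output lines separately so they can be inspected in tests.
        lines = [] if title is None else [title]
        lines += [str(row) for row in self._ds[:n]]
        if with_count:
            lines.append(f"Total count: {len(self._ds)}")
        return lines

    def show(self, n: int = 10, with_count: bool = False,
             title: Optional[str] = None) -> None:
        print("\n".join(self.render(n, with_count, title)))


display = ListDisplay([("a", 1), ("b", 2), ("c", 3)])
display.show(n=2, with_count=True, title="letters")
```

Keeping the line-building logic in render and leaving only the print call in show is a design choice of this sketch; it makes the output testable without capturing stdout.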