fugue_ibis

fugue_ibis.execution
- fugue_ibis.execution.ibis_engine
- fugue_ibis.execution.pandas_backend

fugue_ibis.dataframe

class fugue_ibis.dataframe.IbisDataFrame(table, schema=None)[source]

Bases: DataFrame

DataFrame that wraps Ibis Table.

Parameters:

table (Table) – Ibis Table object to wrap
schema (Any) – Schema like object, defaults to None

alter_columns(columns)[source]

Change column types

Parameters: columns (Any) – Schema like object, all columns should be contained by the dataframe schema
Returns: a new dataframe with altered columns, the order of the original schema will not change
Return type: DataFrame
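
As a rough illustration of the contract described above (named columns change type, every named column must already exist, column order is preserved), here is a plain-Python sketch. It is not the fugue_ibis implementation: `alter_columns_sketch`, the list-of-tuples schema and the use of Python types as column types are all assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for a schema: ordered (column name, Python type) pairs.
Schema = List[Tuple[str, type]]


def alter_columns_sketch(
    schema: Schema, rows: List[List[Any]], new_types: Dict[str, type]
) -> Tuple[Schema, List[List[Any]]]:
    """Cast the named columns to new types, keeping the original column order."""
    names = [name for name, _ in schema]
    unknown = set(new_types) - set(names)
    if unknown:
        # Mirrors the documented rule that altered columns must be in the schema.
        raise KeyError(f"columns not in schema: {sorted(unknown)}")
    # Columns that are not mentioned keep their existing type.
    new_schema = [(name, new_types.get(name, tp)) for name, tp in schema]
    new_rows = [
        [None if v is None else tp(v) for v, (_, tp) in zip(row, new_schema)]
        for row in rows
    ]
    return new_schema, new_rows


schema = [("a", int), ("b", str)]
rows = [[1, "2"], [3, None]]
altered_schema, altered_rows = alter_columns_sketch(schema, rows, {"b": int})
```

Only column b changes type here; column a and the column order are left as they were, and nulls pass through untouched.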

as_array(columns=None, type_safe=False)[source]

Convert to 2-dimensional native python array

Parameters:

columns (List[str] | None) – columns to extract, defaults to None
type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns:

2-dimensional native python array

Return type:

List[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_array_iterable(columns=None, type_safe=False)[source]

Convert to iterable of native python arrays

Parameters:

columns (List[str] | None) – columns to extract, defaults to None
type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns:

iterable of native python arrays

Return type:

Iterable[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_arrow(type_safe=False)[source]

Convert to a PyArrow Table

Parameters: type_safe (bool) – whether to ensure output conforms with its schema, defaults to False
Return type: Table

as_dict_iterable(columns=None)[source]

Convert to iterable of python dicts

Parameters: columns (List[str] | None) – columns to extract, defaults to None
Returns: iterable of python dicts
Return type: Iterable[Dict[str, Any]]

Note

The default implementation enforces type_safe=True

as_dicts(columns=None)[source]

Convert to a list of python dicts

Parameters: columns (List[str] | None) – columns to extract, defaults to None
Returns: a list of python dicts
Return type: List[Dict[str, Any]]

Note

The default implementation enforces type_safe=True
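
The array and dict conversions differ only in how each row is represented: a list of values in schema order versus a mapping from column name to value. The following plain-Python sketch shows the two shapes and the effect of the columns argument. `to_dicts_sketch` is a hypothetical helper written for this illustration and does not call fugue_ibis.

```python
from typing import Any, Dict, List, Optional

all_columns = ["a", "b"]
# The 2-dimensional shape that as_array is documented to return.
array_rows = [[1, "x"], [2, "y"]]


def to_dicts_sketch(
    names: List[str], data: List[List[Any]], columns: Optional[List[str]] = None
) -> List[Dict[str, Any]]:
    """Turn array-form rows into dict-form rows, optionally selecting columns."""
    selected = names if columns is None else columns
    positions = [names.index(c) for c in selected]
    return [{names[i]: row[i] for i in positions} for row in data]


all_dicts = to_dicts_sketch(all_columns, array_rows)
only_b = to_dicts_sketch(all_columns, array_rows, columns=["b"])
```

With columns left as None every column is kept; passing a list restricts each dict to those keys.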

as_local_bounded()[source]

Convert this dataframe to a LocalBoundedDataFrame

Return type: LocalBoundedDataFrame

as_pandas()[source]

Convert to pandas DataFrame

Return type: DataFrame

property columns: List[str]: The column names of the dataframe

count()[source]

Get number of rows of this dataframe

Return type: int

property empty: bool: Whether this dataframe is empty

head(n, columns=None)[source]

Get first n rows of the dataframe as a new local bounded dataframe

Parameters:

n (int) – number of rows
columns (List[str] | None) – selected columns, defaults to None (all columns)

Returns:

a local bounded dataframe

Return type:

LocalBoundedDataFrame

property is_bounded: bool: Whether this dataframe is bounded

property is_local: bool: Whether this dataframe is a local Dataset

property native: Table: Ibis Table object

native_as_df()[source]

The dataframe form of the native object this Dataset class wraps. Dataframe form means the object contains schema information. For example, the native object of an ArrayDataFrame is a python array, which doesn’t contain schema information, so its native_as_df should be either a pandas dataframe or an arrow dataframe.

Return type: Table

property num_partitions: int: Number of physical partitions of this dataframe. Please read the Partition Tutorial

peek_array()[source]

Peek the first row of the dataframe as array

Raises: FugueDatasetEmptyError – if it is empty
Return type: List[Any]

rename(columns)[source]

Rename the dataframe using a mapping dict

Parameters: columns (Dict[str, str]) – key: the original column name, value: the new name
Returns: a new dataframe with the new names
Return type: DataFrame
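
The mapping dict only needs to list the columns that change; a minimal plain-Python sketch of that behaviour on a list of column names follows. `rename_columns_sketch` is a hypothetical helper, and raising KeyError for an unknown original name is an assumption of this sketch rather than documented behaviour.

```python
from typing import Dict, List


def rename_columns_sketch(columns: List[str], mapping: Dict[str, str]) -> List[str]:
    """Apply an {original name: new name} mapping, leaving other columns as-is."""
    missing = [c for c in mapping if c not in columns]
    if missing:
        # Assumption: renaming a column that does not exist is treated as an error.
        raise KeyError(f"columns not found: {missing}")
    return [mapping.get(c, c) for c in columns]


renamed = rename_columns_sketch(["a", "b", "c"], {"a": "x", "c": "z"})
```

Column b is not in the mapping, so it keeps its name and its position.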

abstract to_sql()[source]

Compile the Ibis Table to SQL

Return type: str

fugue_ibis.execution_engine

class fugue_ibis.execution_engine.IbisExecutionEngine(conf)[source]

Bases: ExecutionEngine

The base execution engine using Ibis. Please read the ExecutionEngine Tutorial to understand this important Fugue concept

Parameters: conf (Any) – Parameters like object, read the Fugue Configuration Tutorial to learn Fugue specific options

broadcast(df)[source]

Broadcast the dataframe to all workers for a distributed computing framework

Parameters: df (DataFrame) – the input dataframe
Returns: the broadcasted dataframe
Return type: DataFrame

create_default_map_engine()[source]

Default MapEngine if user doesn’t specify

Return type: MapEngine

abstract create_non_ibis_execution_engine()[source]

Create the execution engine that handles operations beyond SQL

Return type: ExecutionEngine

distinct(df)[source]

Equivalent to SELECT DISTINCT * FROM df

Parameters: df (DataFrame) – dataframe
Returns: the dataframe with duplicate rows removed
Return type: DataFrame

dropna(df, how='any', thresh=None, subset=None)[source]

Drop NA records from dataframe

Parameters:

df (DataFrame) – DataFrame
how (str) – ‘any’ or ‘all’. ‘any’ drops rows that contain any nulls. ‘all’ drops rows that contain all nulls.
thresh (int | None) – int, drops rows that have fewer than thresh non-null values
subset (List[str] | None) – list of columns to operate on

Returns:

DataFrame with NA records dropped

Return type:

DataFrame
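
The interaction of how, thresh and subset can be sketched in plain Python over rows stored as dicts. This is an illustration of the documented rules, not the engine's code: `dropna_sketch` is a hypothetical helper, only None is treated as null (NaN and NaT are not handled), and letting thresh take precedence over how is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def dropna_sketch(
    rows: List[Row],
    how: str = "any",
    thresh: Optional[int] = None,
    subset: Optional[List[str]] = None,
) -> List[Row]:
    """Keep the rows that survive the documented how/thresh/subset rules."""
    kept = []
    for row in rows:
        cols = list(row) if subset is None else subset
        non_null = sum(row[c] is not None for c in cols)
        if thresh is not None:
            # Assumption: when thresh is given it decides, regardless of how.
            keep = non_null >= thresh
        elif how == "any":
            keep = non_null == len(cols)  # drop if any inspected value is null
        elif how == "all":
            keep = non_null > 0  # drop only if every inspected value is null
        else:
            raise ValueError("how must be 'any' or 'all'")
        if keep:
            kept.append(row)
    return kept


rows = [
    {"a": 1, "b": 2},
    {"a": None, "b": 3},
    {"a": None, "b": None},
]
any_result = dropna_sketch(rows)
all_result = dropna_sketch(rows, how="all")
thresh_result = dropna_sketch(rows, thresh=1)
subset_result = dropna_sketch(rows, subset=["b"])
```

With how='any' only the fully populated row survives, while how='all', thresh=1 and subset=['b'] each keep the first two rows for this input.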

fillna(df, value, subset=None)[source]

Fill NULL, NAN, NAT values in a dataframe

Parameters:

df (DataFrame) – DataFrame
value (Any) – if scalar, fills all columns with same value. if dictionary, fills NA using the keys as column names and the values as the replacement values.
subset (List[str] | None) – list of columns to operate on. ignored if value is a dictionary

Returns:

DataFrame with NA records filled

Return type:

DataFrame
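
The scalar and dictionary forms of value, and the rule that subset is ignored for a dictionary, can be sketched in plain Python as follows. `fillna_sketch` is a hypothetical helper for illustration; it treats only None as null and is not the engine's implementation.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def fillna_sketch(
    rows: List[Row], value: Any, subset: Optional[List[str]] = None
) -> List[Row]:
    """Fill nulls with a scalar (optionally per subset) or a per-column dict."""
    filled = []
    for row in rows:
        if isinstance(value, dict):
            # Dictionary form: keys are column names; subset is ignored.
            replacements = value
        else:
            cols = list(row) if subset is None else subset
            replacements = {c: value for c in cols}
        filled.append(
            {
                c: replacements[c] if v is None and c in replacements else v
                for c, v in row.items()
            }
        )
    return filled


rows = [{"a": None, "b": None}, {"a": 1, "b": None}]
scalar_result = fillna_sketch(rows, 0)
subset_result = fillna_sketch(rows, 0, subset=["a"])
dict_result = fillna_sketch(rows, {"b": -1}, subset=["a"])
```

In the last call the subset argument has no effect: only column b is filled, because the dictionary names it.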

get_current_parallelism()[source]

Get the current degree of parallelism of this engine

Return type: int

property ibis_sql_engine: IbisSQLEngine

intersect(df1, df2, distinct=True)[source]

Intersect df1 and df2

Parameters:

df1 (DataFrame) – the first dataframe
df2 (DataFrame) – the second dataframe
distinct (bool) – true for INTERSECT (== INTERSECT DISTINCT), false for INTERSECT ALL

Returns:

the intersected dataframe

Return type:

DataFrame
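
The difference between the two values of distinct in intersect follows standard SQL set semantics: INTERSECT DISTINCT returns each common row once, while INTERSECT ALL keeps a common row as many times as it appears in both inputs. A plain-Python sketch using multisets is shown below. `intersect_sketch` is a hypothetical helper, rows are modelled as tuples, and the result is sorted only to make the output deterministic, since SQL does not define a row order here.

```python
from collections import Counter
from typing import Any, List, Tuple

Row = Tuple[Any, ...]


def intersect_sketch(
    rows1: List[Row], rows2: List[Row], distinct: bool = True
) -> List[Row]:
    """INTERSECT (distinct=True) or INTERSECT ALL (distinct=False) over tuples."""
    # Counter & Counter keeps each row with the minimum of its two counts.
    common = Counter(rows1) & Counter(rows2)
    if distinct:
        return sorted(common)  # each common row exactly once
    return sorted(common.elements())  # each common row min(count1, count2) times


left = [(1, "a"), (1, "a"), (2, "b"), (3, "c")]
right = [(1, "a"), (1, "a"), (1, "a"), (2, "b")]
distinct_rows = intersect_sketch(left, right, distinct=True)
all_rows = intersect_sketch(left, right, distinct=False)
```

The row (1, "a") occurs twice on the left and three times on the right, so INTERSECT ALL keeps it twice and INTERSECT DISTINCT keeps it once.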