fugue.dataframe

fugue.dataframe.array_dataframe

class fugue.dataframe.array_dataframe.ArrayDataFrame(df=None, schema=None, metadata=None)[source]

Bases: fugue.dataframe.dataframe.LocalBoundedDataFrame

DataFrame that wraps native python 2-dimensional arrays. Please read the DataFrame Tutorial to understand the concept

Parameters
  • df (Any) – 2-dimensional array, iterable of arrays, or DataFrame

  • schema (Any) – Schema like object

  • metadata (Any) – dict-like object with string keys, default None

Examples

>>> a = ArrayDataFrame([[0,'a'],[1,'b']],"a:int,b:str")
>>> b = ArrayDataFrame(a)

alter_columns(columns)[source]

Change column types

Parameters

columns (Any) – Schema like object, all columns should be contained by the dataframe schema

Returns

a new dataframe with altered columns, the order of the original schema will not change

Return type

fugue.dataframe.dataframe.DataFrame

as_array(columns=None, type_safe=False)[source]

Convert to 2-dimensional native python array

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

2-dimensional native python array

Return type

List[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_array_iterable(columns=None, type_safe=False)[source]

Convert to iterable of native python arrays

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

iterable of native python arrays

Return type

Iterable[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

count()[source]

Get number of rows of this dataframe

Return type

int

property empty: bool

Whether this dataframe is empty

property native: List[Any]

2-dimensional native python array

peek_array()[source]

Peek the first row of the dataframe as array

Raises

FugueDataFrameEmptyError – if it is empty

Return type

Any

rename(columns)[source]

Rename the dataframe using a mapping dict

Parameters

columns (Dict[str, str]) – key: the original column name, value: the new name

Returns

a new dataframe with the new names

Return type

fugue.dataframe.dataframe.DataFrame

fugue.dataframe.arrow_dataframe

class fugue.dataframe.arrow_dataframe.ArrowDataFrame(df=None, schema=None, metadata=None, pandas_df_wrapper=False)[source]

Bases: fugue.dataframe.dataframe.LocalBoundedDataFrame

DataFrame that wraps pyarrow.Table. Please also read the DataFrame Tutorial to understand this Fugue concept

Parameters
  • df (Any) – 2-dimensional array, iterable of arrays, pyarrow.Table or pandas DataFrame

  • schema (Any) – Schema like object

  • metadata (Any) – dict-like object with string keys, default None

  • pandas_df_wrapper (bool) –

Examples

>>> ArrowDataFrame([[0,'a'],[1,'b']],"a:int,b:str")
>>> ArrowDataFrame(schema = "a:int,b:int")  # empty dataframe
>>> ArrowDataFrame(pd.DataFrame([[0]],columns=["a"]))
>>> ArrowDataFrame(ArrayDataFrame([[0]],"a:int").as_arrow())

alter_columns(columns)[source]

Change column types

Parameters

columns (Any) – Schema like object, all columns should be contained by the dataframe schema

Returns

a new dataframe with altered columns, the order of the original schema will not change

Return type

fugue.dataframe.dataframe.DataFrame

as_array(columns=None, type_safe=False)[source]

Convert to 2-dimensional native python array

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

2-dimensional native python array

Return type

List[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_array_iterable(columns=None, type_safe=False)[source]

Convert to iterable of native python arrays

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

iterable of native python arrays

Return type

Iterable[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_arrow(type_safe=False)[source]

Convert to a pyarrow Table

Parameters

type_safe (bool) –

Return type

pyarrow.lib.Table

as_pandas()[source]

Convert to pandas DataFrame

Return type

pandas.core.frame.DataFrame

count()[source]

Get number of rows of this dataframe

Return type

int

property empty: bool

Whether this dataframe is empty

property native: pyarrow.lib.Table

pyarrow.Table

peek_array()[source]

Peek the first row of the dataframe as array

Raises

FugueDataFrameEmptyError – if it is empty

Return type

Any

rename(columns)[source]

Rename the dataframe using a mapping dict

Parameters

columns (Dict[str, str]) – key: the original column name, value: the new name

Returns

a new dataframe with the new names

Return type

fugue.dataframe.dataframe.DataFrame

fugue.dataframe.dataframe

class fugue.dataframe.dataframe.DataFrame(schema=None, metadata=None)[source]

Bases: abc.ABC

Base class of Fugue DataFrame. Please read the DataFrame Tutorial to understand the concept

Parameters
  • schema (Any) – Schema like object

  • metadata (Any) – dict-like object with string keys, default None

Note

This is an abstract class, and normally you don’t construct it by yourself unless you are implementing a new ExecutionEngine

abstract alter_columns(columns)[source]

Change column types

Parameters

columns (Any) – Schema like object, all columns should be contained by the dataframe schema

Returns

a new dataframe with altered columns, the order of the original schema will not change

Return type

fugue.dataframe.dataframe.DataFrame
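The ordering rule described for alter_columns can be illustrated with a small pure-Python sketch. It does not use Fugue: the schema is modeled as a list of (name, type) pairs, and `alter_schema` is a hypothetical helper introduced only for this illustration, not part of the library.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a schema: ordered (column name, type name) pairs.
Schema = List[Tuple[str, str]]


def alter_schema(schema: Schema, changes: Dict[str, str]) -> Schema:
    """Return a new schema with some column types replaced, keeping column order."""
    names = [name for name, _ in schema]
    unknown = [name for name in changes if name not in names]
    if unknown:
        # Every altered column must already exist in the original schema.
        raise ValueError(f"columns not in schema: {unknown}")
    # Iterate over the ORIGINAL schema, so the order of `changes` is irrelevant.
    return [(name, changes.get(name, tp)) for name, tp in schema]


original = [("a", "int"), ("b", "str"), ("c", "double")]
altered = alter_schema(original, {"c": "str", "a": "long"})
print(altered)
```

Because the result is built by walking the original schema, the columns keep their original positions even though the changes were listed in a different order, and the original schema is left untouched.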

abstract as_array(columns=None, type_safe=False)[source]

Convert to 2-dimensional native python array

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

2-dimensional native python array

Return type

List[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

abstract as_array_iterable(columns=None, type_safe=False)[source]

Convert to iterable of native python arrays

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

iterable of native python arrays

Return type

Iterable[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_arrow(type_safe=False)[source]

Convert to a pyarrow Table

Parameters

type_safe (bool) –

Return type

pyarrow.lib.Table

as_dict_iterable(columns=None)[source]

Convert to iterable of native python dicts

Parameters

columns (Optional[List[str]]) – columns to extract, defaults to None

Returns

iterable of native python dicts

Return type

Iterable[Dict[str, Any]]

Note

The default implementation enforces type_safe True

abstract as_local()[source]

Convert this dataframe to a LocalDataFrame

Return type

fugue.dataframe.dataframe.LocalDataFrame

as_pandas()[source]

Convert to pandas DataFrame

Return type

pandas.core.frame.DataFrame

assert_not_empty()[source]

Assert this dataframe is not empty

Raises

FugueDataFrameEmptyError – if it is empty

Return type

None

abstract count()[source]

Get number of rows of this dataframe

Return type

int

drop(columns)[source]

Drop certain columns and return a new dataframe

Parameters

columns (List[str]) – columns to drop

Raises

FugueDataFrameOperationError – if columns are not strictly contained by this dataframe, or if they are all of the dataframe’s columns

Returns

a new dataframe removing the columns

Return type

fugue.dataframe.dataframe.DataFrame
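The two error conditions of drop can be sketched in plain Python. This is only an illustration of the documented rule, working on column names rather than a real dataframe; `drop_columns` is a hypothetical helper, and ValueError stands in for FugueDataFrameOperationError.

```python
from typing import List


def drop_columns(names: List[str], to_drop: List[str]) -> List[str]:
    """Return the remaining column names after validating the drop request."""
    missing = [c for c in to_drop if c not in names]
    if missing:
        # Columns to drop must all exist in the dataframe.
        raise ValueError(f"columns not in dataframe: {missing}")
    if set(to_drop) == set(names):
        # Dropping every column would leave a dataframe without a schema.
        raise ValueError("can't drop all columns")
    return [c for c in names if c not in to_drop]


remaining = drop_columns(["a", "b", "c"], ["b"])
print(remaining)
```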

abstract property empty: bool

Whether this dataframe is empty

get_info_str()[source]

Get dataframe information (schema, type, metadata) as json string

Returns

json string

Return type

str

head(n, columns=None)[source]

Get first n rows of the dataframe as 2-dimensional array

Parameters
  • n (int) – number of rows

  • columns (Optional[List[str]]) – selected columns, defaults to None (all columns)

Returns

2-dimensional array

Return type

List[Any]

abstract property is_bounded: bool

Whether this dataframe is bounded

property is_local: bool

Whether this dataframe is a LocalDataFrame

property metadata: triad.collections.dict.ParamDict

Metadata of the dataframe

abstract property num_partitions: int

Number of physical partitions of this dataframe. Please read the Partition Tutorial

abstract peek_array()[source]

Peek the first row of the dataframe as array

Raises

FugueDataFrameEmptyError – if it is empty

Return type

Any

peek_dict()[source]

Peek the first row of the dataframe as dict

Raises

FugueDataFrameEmptyError – if it is empty

Return type

Dict[str, Any]

abstract rename(columns)[source]

Rename the dataframe using a mapping dict

Parameters

columns (Dict[str, str]) – key: the original column name, value: the new name

Returns

a new dataframe with the new names

Return type

fugue.dataframe.dataframe.DataFrame

property schema: triad.collections.schema.Schema

Schema of the dataframe

show(rows=10, show_count=False, title=None, best_width=100)[source]

Print the dataframe to console

Parameters
  • rows (int) – number of rows to print, defaults to 10

  • show_count (bool) – whether to show dataframe count, defaults to False

  • title (Optional[str]) – title of the dataframe, defaults to None

  • best_width (int) – max width of the output table, defaults to 100

Return type

None

Note

When show_count is True, it can trigger an expensive calculation for a distributed dataframe. So if you call this function directly, you may need to persist the dataframe first with fugue.execution.execution_engine.ExecutionEngine.persist().

class fugue.dataframe.dataframe.LocalBoundedDataFrame(schema=None, metadata=None)[source]

Bases: fugue.dataframe.dataframe.LocalDataFrame

Base class of all local bounded dataframes. Please read the DataFrame Tutorial to understand the concept

Parameters
  • schema (Any) – Schema like object

  • metadata (Any) – dict-like object with string keys, default None

Note

This is an abstract class, and normally you don’t construct it by yourself unless you are implementing a new ExecutionEngine

property is_bounded: bool

Always True because it’s a bounded dataframe

class fugue.dataframe.dataframe.LocalDataFrame(schema=None, metadata=None)[source]

Bases: fugue.dataframe.dataframe.DataFrame

Base class of all local dataframes. Please read the DataFrame Tutorial to understand the concept

Parameters
  • schema (Any) – a schema-like object

  • metadata (Any) – dict-like object with string keys, default None

Note

This is an abstract class, and normally you don’t construct it by yourself unless you are implementing a new ExecutionEngine

as_local()[source]

Always return self, because it’s a LocalDataFrame

Return type

fugue.dataframe.dataframe.LocalDataFrame

property is_local: bool

Always True because it’s a LocalDataFrame

property num_partitions: int

Always 1 because it’s a LocalDataFrame

class fugue.dataframe.dataframe.LocalUnboundedDataFrame(schema=None, metadata=None)[source]

Bases: fugue.dataframe.dataframe.LocalDataFrame

Base class of all local unbounded dataframes. Please read the DataFrame Tutorial (https://fugue-tutorials.readthedocs.io/en/latest/tutorials/advanced/schema_dataframes.html#DataFrame) to understand the concept

Parameters
  • schema (Any) – Schema like object

  • metadata (Any) – dict-like object with string keys, default None

Note

This is an abstract class, and normally you don’t construct it by yourself unless you are implementing a new ExecutionEngine

count()[source]

Raises

InvalidOperationError – You can’t count an unbounded dataframe

Return type

int

property is_bounded

Always False because it’s an unbounded dataframe

class fugue.dataframe.dataframe.YieldedDataFrame(yid)[source]

Bases: fugue.collections.yielded.Yielded

Yielded dataframe from FugueWorkflow. Users shouldn’t create this object directly.

Parameters

yid (str) – unique id for determinism

property is_set: bool

Whether the value is set. It can be false if the parent workflow has not been executed.

property result: fugue.dataframe.dataframe.DataFrame

The yielded dataframe, it will be set after the parent workflow is computed

set_value(df)[source]

Set the yielded dataframe after compute. Users should not call it.

Parameters
Return type

None

fugue.dataframe.dataframe_iterable_dataframe

class fugue.dataframe.dataframe_iterable_dataframe.LocalDataFrameIterableDataFrame(df=None, schema=None, metadata=None)[source]

Bases: fugue.dataframe.dataframe.LocalUnboundedDataFrame

DataFrame that wraps an iterable of local dataframes

Parameters
  • df (Any) – an iterable of DataFrame. If any is not local, they will be converted to LocalDataFrame by as_local()

  • schema (Any) – Schema like object, if it is provided, it must match the schema of the dataframes

  • metadata (Any) – dict-like object with string keys, default None

Examples

def get_dfs():
    yield IterableDataFrame([], "a:int,b:int")
    yield IterableDataFrame([[1, 10]], "a:int,b:int")
    yield ArrayDataFrame([], "a:int,b:str")

df = LocalDataFrameIterableDataFrame(get_dfs())
for subdf in df.native:
    subdf.show()

Note

It’s ok to peek the dataframe, it will not affect the iteration, but it’s invalid to count.

schema can be used when the iterable contains no dataframe. But if there is any dataframe, schema must match the schema of the dataframes.

For the iterable of dataframes, any empty dataframes will be skipped and their schemas will not matter. However, if all dataframes in the iterable are empty, then the last empty dataframe will be used to set the schema.
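The schema rule in this note can be illustrated without Fugue. In the sketch below each dataframe is modeled as a (schema expression, rows) tuple, and `resolve_schema` is a hypothetical helper that follows one reading of the note: the first non-empty dataframe decides the schema, and if every dataframe is empty the last one does.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical stand-in for a local dataframe: (schema expression, rows).
Frame = Tuple[str, List[list]]


def resolve_schema(frames: Iterable[Frame]) -> Optional[str]:
    """Pick the schema for an iterable of frames as described in the note."""
    last_empty: Optional[str] = None
    for schema, rows in frames:
        if rows:
            return schema  # a non-empty frame decides the schema
        last_empty = schema  # empty frames are skipped, but remembered
    # All frames were empty: the last one wins (None if there were no frames).
    return last_empty


mixed = [("a:int,b:int", []), ("a:int,b:int", [[1, 10]]), ("a:int,b:str", [])]
all_empty = [("a:int,b:int", []), ("a:int,b:str", [])]
print(resolve_schema(mixed))
print(resolve_schema(all_empty))
```

When the iterable contains no dataframe at all, the sketch returns None, which corresponds to the case where an explicit schema argument is needed.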

alter_columns(columns)[source]

Change column types

Parameters

columns (Any) – Schema like object, all columns should be contained by the dataframe schema

Returns

a new dataframe with altered columns, the order of the original schema will not change

Return type

fugue.dataframe.dataframe.DataFrame

as_array(columns=None, type_safe=False)[source]

Convert to 2-dimensional native python array

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

2-dimensional native python array

Return type

List[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_array_iterable(columns=None, type_safe=False)[source]

Convert to iterable of native python arrays

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

iterable of native python arrays

Return type

Iterable[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_arrow(type_safe=False)[source]

Convert to a pyarrow Table

Parameters

type_safe (bool) –

Return type

pyarrow.lib.Table

as_pandas()[source]

Convert to pandas DataFrame

Return type

pandas.core.frame.DataFrame

property empty: bool

Whether this dataframe is empty

property native: triad.utils.iter.EmptyAwareIterable[fugue.dataframe.dataframe.LocalDataFrame]

Iterable of dataframes

peek_array()[source]

Peek the first row of the dataframe as array

Raises

FugueDataFrameEmptyError – if it is empty

Return type

Any

rename(columns)[source]

Rename the dataframe using a mapping dict

Parameters

columns (Dict[str, str]) – key: the original column name, value: the new name

Returns

a new dataframe with the new names

Return type

fugue.dataframe.dataframe.DataFrame

fugue.dataframe.dataframes

class fugue.dataframe.dataframes.DataFrames(*args, **kwargs)[source]

Bases: triad.collections.dict.IndexedOrderedDict[str, fugue.dataframe.dataframe.DataFrame]

Ordered dictionary of DataFrames. There are two modes: with keys and without keys. Without keys, _<n> is used as the key for each dataframe, and the collection is treated as an array in the Fugue framework.

It’s a subclass of dict, so it supports all dict operations. It’s also ordered, so you can trust the order of keys and values.

The initialization is flexible

>>> df1 = ArrayDataFrame([[0]],"a:int")
>>> df2 = ArrayDataFrame([[1]],"a:int")
>>> dfs = DataFrames(df1,df2)  # init as [df1, df2]
>>> assert not dfs.has_key
>>> assert df1 is dfs[0] and df2 is dfs[1]
>>> dfs_array = list(dfs.values())
>>> dfs = DataFrames(a=df1,b=df2)  # init as {a:df1, b:df2}
>>> assert dfs.has_key
>>> assert df1 is dfs[0] and df2 is dfs[1]  # order is guaranteed
>>> df3 = ArrayDataFrame([[1]],"b:int")
>>> dfs2 = DataFrames(dfs, c=df3)  # {a:df1, b:df2, c:df3}
>>> dfs2 = DataFrames(dfs, df3)  # invalid, because dfs has key, df3 doesn't
>>> dfs2 = DataFrames(dict(a=df1,b=df2))  # init as {a:df1, b:df2}
>>> dfs2 = DataFrames([df1,df2],df3)  # init as [df1, df2, df3]

Parameters
  • args (Any) –

  • kwargs (Any) –

convert(func)[source]

Create another DataFrames with the same structure, but all converted by func

Returns

the new DataFrames

Parameters

func (Callable[[fugue.dataframe.dataframe.DataFrame], fugue.dataframe.dataframe.DataFrame]) –

Return type

fugue.dataframe.dataframes.DataFrames

Examples

>>> dfs2 = dfs.convert(lambda df: df.as_local()) # convert all to local

property has_key

If this collection has key (dict-like) or not (list-like)

fugue.dataframe.iterable_dataframe

class fugue.dataframe.iterable_dataframe.IterableDataFrame(df=None, schema=None, metadata=None)[source]

Bases: fugue.dataframe.dataframe.LocalUnboundedDataFrame

DataFrame that wraps native python iterable of arrays. Please read the DataFrame Tutorial to understand the concept

Parameters
  • df (Any) – 2-dimensional array, iterable of arrays, or DataFrame

  • schema (Any) – Schema like object

  • metadata (Any) – dict-like object with string keys, default None

Examples

>>> a = IterableDataFrame([[0,'a'],[1,'b']],"a:int,b:str")
>>> b = IterableDataFrame(a)

Note

It’s ok to peek the dataframe, it will not affect the iteration, but counting is an invalid operation
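Why peeking is safe while counting is not can be seen in a minimal stdlib sketch of a peekable iterable. `PeekableIterable` is a hypothetical class written for this illustration; it is not Fugue's or triad's implementation.

```python
from typing import Any, Iterable, Iterator


class PeekableIterable:
    """Hypothetical wrapper that exposes the first item without consuming it."""

    _END = object()  # sentinel marking the end of the stream

    def __init__(self, source: Iterable[Any]):
        self._it = iter(source)
        # Pre-fetch one item so `empty` and `peek` never advance the stream
        # that the caller later iterates over.
        self._head = next(self._it, self._END)

    @property
    def empty(self) -> bool:
        return self._head is self._END

    def peek(self) -> Any:
        if self.empty:
            raise ValueError("iterable is empty")
        return self._head

    def __iter__(self) -> Iterator[Any]:
        while self._head is not self._END:
            item = self._head
            self._head = next(self._it, self._END)
            yield item


rows = PeekableIterable(iter([[0, "a"], [1, "b"]]))
first = rows.peek()      # look at the first row
collected = list(rows)   # iteration still starts from the first row
print(first, collected)
```

Counting, by contrast, would require walking the whole stream, after which the rows are gone. That is the reason an unbounded dataframe treats it as an invalid operation.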

alter_columns(columns)[source]

Change column types

Parameters

columns (Any) – Schema like object, all columns should be contained by the dataframe schema

Returns

a new dataframe with altered columns, the order of the original schema will not change

Return type

fugue.dataframe.dataframe.DataFrame

as_array(columns=None, type_safe=False)[source]

Convert to 2-dimensional native python array

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

2-dimensional native python array

Return type

List[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_array_iterable(columns=None, type_safe=False)[source]

Convert to iterable of native python arrays

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

iterable of native python arrays

Return type

Iterable[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

property empty: bool

Whether this dataframe is empty

property native: triad.utils.iter.EmptyAwareIterable[Any]

Iterable of native python arrays

peek_array()[source]

Peek the first row of the dataframe as array

Raises

FugueDataFrameEmptyError – if it is empty

Return type

Any

rename(columns)[source]

Rename the dataframe using a mapping dict

Parameters

columns (Dict[str, str]) – key: the original column name, value: the new name

Returns

a new dataframe with the new names

Return type

fugue.dataframe.dataframe.DataFrame

fugue.dataframe.pandas_dataframe

class fugue.dataframe.pandas_dataframe.PandasDataFrame(df=None, schema=None, metadata=None, pandas_df_wrapper=False)[source]

Bases: fugue.dataframe.dataframe.LocalBoundedDataFrame

DataFrame that wraps pandas DataFrame. Please also read the DataFrame Tutorial to understand this Fugue concept

Parameters
  • df (Any) – 2-dimensional array, iterable of arrays or pandas DataFrame

  • schema (Any) – Schema like object

  • metadata (Any) – dict-like object with string keys, default None

  • pandas_df_wrapper (bool) – if this is a simple wrapper, default False

Examples

>>> PandasDataFrame([[0,'a'],[1,'b']],"a:int,b:str")
>>> PandasDataFrame(schema = "a:int,b:int")  # empty dataframe
>>> PandasDataFrame(pd.DataFrame([[0]],columns=["a"]))
>>> PandasDataFrame(ArrayDataFrame([[0]],"a:int").as_pandas())

Note

If pandas_df_wrapper is True, then the constructor will not do any type check; otherwise, it will enforce types according to the input schema after the construction

alter_columns(columns)[source]

Change column types

Parameters

columns (Any) – Schema like object, all columns should be contained by the dataframe schema

Returns

a new dataframe with altered columns, the order of the original schema will not change

Return type

fugue.dataframe.dataframe.DataFrame

as_array(columns=None, type_safe=False)[source]

Convert to 2-dimensional native python array

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

2-dimensional native python array

Return type

List[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_array_iterable(columns=None, type_safe=False)[source]

Convert to iterable of native python arrays

Parameters
  • columns (Optional[List[str]]) – columns to extract, defaults to None

  • type_safe (bool) – whether to ensure output conforms with its schema, defaults to False

Returns

iterable of native python arrays

Return type

Iterable[Any]

Note

If type_safe is False, then the returned values are ‘raw’ values.

as_pandas()[source]

Convert to pandas DataFrame

Return type

pandas.core.frame.DataFrame

count()[source]

Get number of rows of this dataframe

Return type

int

property empty: bool

Whether this dataframe is empty

head(n, columns=None)[source]

Get first n rows of the dataframe as 2-dimensional array

Parameters
  • n (int) – number of rows

  • columns (Optional[List[str]]) – selected columns, defaults to None (all columns)

Returns

2-dimensional array

Return type

List[Any]

property native: pandas.core.frame.DataFrame

Pandas DataFrame

peek_array()[source]

Peek the first row of the dataframe as array

Raises

FugueDataFrameEmptyError – if it is empty

Return type

Any

rename(columns)[source]

Rename the dataframe using a mapping dict

Parameters

columns (Dict[str, str]) – key: the original column name, value: the new name

Returns

a new dataframe with the new names

Return type

fugue.dataframe.dataframe.DataFrame

fugue.dataframe.utils

fugue.dataframe.utils.deserialize_df(json_str, fs=None)[source]

Deserialize json string to LocalBoundedDataFrame

Parameters
Raises

ValueError – if the json string is invalid, not generated from serialize_df()

Returns

LocalBoundedDataFrame if json_str contains a dataframe, or None if it is valid but contains no data

Return type

Optional[fugue.dataframe.dataframe.LocalBoundedDataFrame]

fugue.dataframe.utils.get_join_schemas(df1, df2, how, on)[source]

Get Schema object after joining df1 and df2. If on is not empty, it’s mainly for validation purpose.

Parameters
  • df1 (fugue.dataframe.dataframe.DataFrame) – first dataframe

  • df2 (fugue.dataframe.dataframe.DataFrame) – second dataframe

  • how (str) – can accept semi, left_semi, anti, left_anti, inner, left_outer, right_outer, full_outer, cross

  • on (Iterable[str]) – it can always be inferred, but if you provide it, it will be validated against the inferred keys.

Returns

the pair key schema and schema after join

Return type

Tuple[triad.collections.schema.Schema, triad.collections.schema.Schema]

Note

In Fugue, the joined schema can always be inferred because a join always uses the input dataframes’ common columns as the join keys. So you must rename() the input dataframes so that they follow this rule.
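The common-column rule can be sketched in plain Python. The helper below works on lists of column names only; `infer_join_keys` is a hypothetical name, the sketch ignores cross joins (which have no keys), and it is not Fugue's implementation.

```python
from typing import Iterable, List, Optional


def infer_join_keys(
    cols1: List[str], cols2: List[str], on: Optional[Iterable[str]] = None
) -> List[str]:
    """Infer join keys as the columns common to both inputs, in cols1 order."""
    common = [c for c in cols1 if c in cols2]
    if not common:
        raise ValueError("no common columns to join on")
    on_list = list(on) if on is not None else []
    # A non-empty `on` is only used to validate the inferred keys.
    if on_list and set(on_list) != set(common):
        raise ValueError(f"on={on_list} does not match inferred keys {common}")
    return common


left = ["user_id", "name"]
right = ["id", "score"]
# Rename the right side so that both inputs share the key column name.
mapping = {"id": "user_id"}
renamed_right = [mapping.get(c, c) for c in right]
keys = infer_join_keys(left, renamed_right, on=["user_id"])
print(keys)
```

Without the rename step the two inputs share no column, so no key could be inferred. This is why the inputs must be renamed before joining.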

fugue.dataframe.utils.pickle_df(df)[source]

Pickle a dataframe to a bytes array. It first converts the dataframe using to_local_bounded_df(), and then serializes the underlying data.

Parameters

df (fugue.dataframe.dataframe.DataFrame) – input DataFrame

Returns

pickled binary data

Return type

bytes

Note

Be careful when using this on large dataframes or on non-local, un-materialized dataframes, because it can be slow. You should always use unpickle_df() to deserialize.

fugue.dataframe.utils.serialize_df(df, threshold=-1, file_path=None, fs=None)[source]

Serialize input dataframe to base64 string or to file if it’s larger than threshold

Parameters
Raises

InvalidOperationError – if file is large but file_path is not provided

Returns

a json string either containing the base64 data or the file path

Return type

str

Note

If fs is not provided but it needs to write to disk, then it will use open_fs() to try to open the file to write.
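The inline-or-file decision can be illustrated with a stdlib-only sketch. Everything here is an assumption made for illustration: the function names, the JSON field names ("data" and "path"), the use of ValueError in place of InvalidOperationError, and the reading that a negative threshold means "always inline". It is not the format that serialize_df() actually produces.

```python
import base64
import json
import os
import pickle
import tempfile
from typing import Any, Optional


def serialize_payload(obj: Any, threshold: int = -1, file_path: Optional[str] = None) -> str:
    """Return JSON holding either base64 data or the path of a written file."""
    data = pickle.dumps(obj)
    # Assumption: a negative threshold disables the size check (always inline).
    if threshold < 0 or len(data) <= threshold:
        return json.dumps({"data": base64.b64encode(data).decode("ascii")})
    if file_path is None:
        raise ValueError("payload exceeds threshold but file_path is not provided")
    with open(file_path, "wb") as f:
        f.write(data)
    return json.dumps({"path": file_path})


def deserialize_payload(json_str: str) -> Any:
    """Inverse of serialize_payload; an empty JSON object yields None."""
    d = json.loads(json_str)
    if not d:
        return None  # valid JSON, but it carries no data
    if "data" in d:
        return pickle.loads(base64.b64decode(d["data"]))
    if "path" in d:
        with open(d["path"], "rb") as f:
            return pickle.loads(f.read())
    raise ValueError("invalid json string")


rows = [[0, "a"], [1, "b"]]
inline = serialize_payload(rows)  # small enough: embedded as base64
with tempfile.TemporaryDirectory() as tmp:
    # threshold=0 forces the file branch for any non-empty payload
    on_disk = serialize_payload(rows, threshold=0, file_path=os.path.join(tmp, "rows.bin"))
    restored = deserialize_payload(on_disk)
print(deserialize_payload(inline) == restored)
```

As with any pickle-based format, such a payload should only be deserialized when it comes from a trusted source.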

fugue.dataframe.utils.to_local_bounded_df(df, schema=None, metadata=None)[source]

Convert a data structure to LocalBoundedDataFrame

Parameters
  • df (Any) – DataFrame, pandas DataFrame, or list or iterable of arrays

  • schema (Optional[Any]) – Schema like object, defaults to None, it should not be set for DataFrame type

  • metadata (Optional[Any]) – dict-like object with string keys, defaults to None

Raises
  • ValueError – if df is DataFrame but you set schema or metadata

  • TypeError – if df is not compatible

Returns

the dataframe itself if it’s LocalBoundedDataFrame else a converted one

Return type

fugue.dataframe.dataframe.LocalBoundedDataFrame

Examples

>>> a = IterableDataFrame([[0,'a'],[1,'b']],"a:int,b:str")
>>> assert isinstance(to_local_bounded_df(a), LocalBoundedDataFrame)
>>> to_local_bounded_df(SparkDataFrame([[0,'a'],[1,'b']],"a:int,b:str"))

Note

Compared to to_local_df(), this function makes sure the dataframe is also bounded, so IterableDataFrame will be converted although it’s local.

fugue.dataframe.utils.to_local_df(df, schema=None, metadata=None)[source]

Convert a data structure to LocalDataFrame

Parameters
  • df (Any) – DataFrame, pandas DataFrame, or list or iterable of arrays

  • schema (Optional[Any]) – Schema like object, defaults to None, it should not be set for DataFrame type

  • metadata (Optional[Any]) – dict-like object with string keys, defaults to None

Raises
  • ValueError – if df is DataFrame but you set schema or metadata

  • TypeError – if df is not compatible

Returns

the dataframe itself if it’s LocalDataFrame else a converted one

Return type

fugue.dataframe.dataframe.LocalDataFrame

Examples

>>> a = to_local_df([[0,'a'],[1,'b']],"a:int,b:str")
>>> assert to_local_df(a) is a
>>> to_local_df(SparkDataFrame([[0,'a'],[1,'b']],"a:int,b:str"))

fugue.dataframe.utils.unpickle_df(stream)[source]

Unpickles a dataframe from bytes array.

Parameters

stream (bytes) – binary data

Returns

unpickled dataframe

Return type

fugue.dataframe.dataframe.LocalBoundedDataFrame

Note

The data must have been serialized by pickle_df().