fugue.extensions.creator#

fugue.extensions.creator.convert#

fugue.extensions.creator.convert.creator(schema=None)[source]#

Decorator for creators

Please read Creator Tutorial

Parameters

schema (Optional[Any]) –

Return type

Callable[[Any], _FuncAsCreator]

fugue.extensions.creator.convert.register_creator(alias, obj, on_dup=0)[source]#

Register creator with an alias. This is a simplified version of parse_creator()

Parameters
Return type

None

Tip

Registering an extension with an alias is particularly useful for projects such as libraries. This is because by using alias, users don’t have to import the specific extension, or provide the full path of the extension. It can make user’s code less dependent and easy to understand.

New Since

0.6.0

See also

Please read Creator Tutorial

Examples

Here is an example how you setup your project so your users can benefit from this feature. Assume your project name is pn

The creator implementation in file pn/pn/creators.py

import pandas import pd

def my_creator() -> pd.DataFrame:
    return pd.DataFrame()

Then in pn/pn/__init__.py

from .creators import my_creator
from fugue import register_creator

def register_extensions():
    register_creator("mc", my_creator)
    # ... register more extensions

register_extensions()

In users code:

import pn  # register_extensions will be called
from fugue import FugueWorkflow

dag = FugueWorkflow()
dag.create("mc").show()  # use my_creator by alias
dag.run()

fugue.extensions.creator.creator#

class fugue.extensions.creator.creator.Creator[source]#

Bases: ExtensionContext, ABC

The interface is to generate single DataFrame from params. For example reading data from file should be a type of Creator. Creator is task level extension, running on driver, and execution engine aware.

To implement this class, you should not have __init__, please directly implement the interface functions.

Note

Before implementing this class, do you really need to implement this interface? Do you know the interfaceless feature of Fugue? Implementing Creator is commonly unnecessary. You can choose the interfaceless approach which may decouple your code from Fugue.

See also

Please read Creator Tutorial

abstract create()[source]#

Create DataFrame on driver side

Note

  • It runs on driver side

  • The output dataframe is not necessarily local, for example a SparkDataFrame

  • It is engine aware, you can put platform dependent code in it (for example native pyspark code) but by doing so your code may not be portable. If you only use the functions of the general ExecutionEngine interface, it’s still portable.

Returns

result dataframe

Return type

DataFrame