fugue.extensions.creator
fugue.extensions.creator.convert
- fugue.extensions.creator.convert.creator(schema=None)
Decorator for creators
Please read Creator Tutorial
- Parameters
schema (Optional[Any]) –
- Return type
Callable[[Any], _FuncAsCreator]
- fugue.extensions.creator.convert.register_creator(alias, obj, on_dup=0)
Register creator with an alias. This is a simplified version of
parse_creator()
- Parameters
alias (str) – alias of the creator
obj (Any) – the object that can be converted to Creator
on_dup (int) – see triad.collections.dict.ParamDict.update(), defaults to ParamDict.OVERWRITE
- Return type
None
Tip
Registering an extension with an alias is particularly useful for projects such as libraries. With an alias, users don't have to import the specific extension or provide its full path, which makes their code less tightly coupled to your package and easier to understand.
New Since
0.6.0
See also
Please read Creator Tutorial
Examples
Here is an example of how to set up your project so that your users can benefit from this feature. Assume your project name is pn.
The creator implementation in file pn/pn/creators.py:
    import pandas as pd

    def my_creator() -> pd.DataFrame:
        return pd.DataFrame()
Then in pn/pn/__init__.py:
    from .creators import my_creator
    from fugue import register_creator

    def register_extensions():
        register_creator("mc", my_creator)
        # ... register more extensions

    register_extensions()
In the user's code:
    import pn  # register_extensions will be called
    from fugue import FugueWorkflow

    dag = FugueWorkflow()
    dag.create("mc").show()  # use my_creator by alias
    dag.run()
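The effect of on_dup can be illustrated with a toy alias registry. This is a minimal sketch in plain Python, not Fugue's or triad's implementation: the names register, resolve, _REGISTRY, THROW and IGNORE are hypothetical, and only OVERWRITE = 0 follows from the documented default (on_dup=0); the other numeric values are assumptions made for the sketch.

```python
from typing import Any, Callable, Dict

# Hypothetical duplicate-handling modes. Only OVERWRITE = 0 is implied by the
# documented default (on_dup=0); THROW and IGNORE values are assumed here.
OVERWRITE = 0
THROW = 1
IGNORE = 2

# Hypothetical module-level registry mapping alias -> creator callable.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(alias: str, obj: Callable[..., Any], on_dup: int = OVERWRITE) -> None:
    """Store obj under alias, resolving an existing alias according to on_dup."""
    if on_dup not in (OVERWRITE, THROW, IGNORE):
        raise ValueError(f"unknown on_dup mode: {on_dup}")
    if alias in _REGISTRY:
        if on_dup == THROW:
            raise KeyError(f"alias {alias!r} is already registered")
        if on_dup == IGNORE:
            return  # keep the entry that is already there
    _REGISTRY[alias] = obj


def resolve(alias: str) -> Callable[..., Any]:
    """Look up the callable registered under alias."""
    return _REGISTRY[alias]


def first_creator():
    return [[0]]


def second_creator():
    return [[1]]


register("mc", first_creator)
register("mc", second_creator, on_dup=IGNORE)  # ignored: "mc" keeps first_creator
register("mc", second_creator)                 # default mode replaces the entry
```

With the default mode a later registration silently replaces an earlier one, so a library that wants to protect its aliases from accidental redefinition would pick a stricter mode.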
fugue.extensions.creator.creator
- class fugue.extensions.creator.creator.Creator
Bases: ExtensionContext, ABC
This interface generates a single DataFrame from params. For example, reading data from a file should be a type of Creator. Creator is a task-level extension that runs on the driver and is execution-engine aware.
To implement this class, do not define __init__; implement the interface functions directly.
Note
Before implementing this class, ask whether you really need this interface. Are you familiar with the interfaceless feature of Fugue? Implementing Creator is often unnecessary. You can choose the interfaceless approach, which may decouple your code from Fugue.
See also
Please read Creator Tutorial
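One way to picture the no-__init__ rule is a framework that builds the extension with no arguments and attaches its context afterwards. The sketch below is a toy model in plain Python under that assumption; ToyCreator, RangeCreator and run_creator are hypothetical names, and nothing here is Fugue's actual class or wiring.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type


class ToyCreator(ABC):
    """Hypothetical stand-in for a creator interface; not Fugue's Creator."""

    # Attached by the toy framework after construction, never via __init__.
    params: Dict[str, Any]

    @abstractmethod
    def create(self) -> List[List[Any]]:
        """Produce the rows of a single table from self.params."""


class RangeCreator(ToyCreator):
    # No __init__ here: only the interface function is implemented.
    def create(self) -> List[List[Any]]:
        return [[i] for i in range(self.params.get("n", 3))]


def run_creator(cls: Type[ToyCreator], params: Dict[str, Any]) -> List[List[Any]]:
    """Toy driver: construct with no arguments, inject context, then call create()."""
    instance = cls()
    instance.params = dict(params)
    return instance.create()


rows = run_creator(RangeCreator, {"n": 2})
```

Because the driver calls cls() with no arguments, a subclass whose __init__ required parameters would fail at construction, which is one reason to keep configuration in params in a design like this.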
- abstract create()
Create a DataFrame on the driver side.
Note
It runs on the driver side.
The output dataframe is not necessarily local; for example, it can be a SparkDataFrame.
It is engine aware: you can put platform-dependent code in it (for example native pyspark code), but doing so may make your code non-portable. If you only use the functions of the general ExecutionEngine interface, it remains portable.
- Returns
result dataframe
- Return type