Python & Spark Helpers

Most teams are happy using the Builder UI and the YAML files it produces. If you want to automate things in Python or inside a Spark notebook, Polymo exposes a small helper module. The code is short and friendly—you can copy/paste it as-is.

Loading a connector in Spark

from pyspark.sql import SparkSession
from polymo import ApiReader

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)

df = (
    spark.read.format("polymo")
    .option("config_path", "./config.yml")  # YAML file you saved earlier
    .option("token", "YOUR_TOKEN")          # Only if the API needs one
    .options(owner="dan1elt0m", repo="polymo")  # Extra values used in templates
    .load()
)

df.show()
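
The result is a plain Spark DataFrame, so everything downstream is ordinary Spark. A quick sketch of inspecting the schema and persisting the data (the Parquet path is just an example):

df.printSchema()

(
    df.write
    .mode("overwrite")
    .parquet("/tmp/polymo-output")  # example output path, adjust to your environment
)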

Structured Streaming works out of the box:

stream_df = (
    spark.readStream.format("polymo")
    .option("config_path", "./config.yml")
    .option("stream_batch_size", 100)
    .option("stream_progress_path", "/tmp/polymo-progress.json")
    .load()
)

query = (
    stream_df.writeStream
    .format("memory")
    .outputMode("append")
    .queryName("polymo")
    .start()  # returns a StreamingQuery handle you can monitor or stop
)
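
Because the memory sink registers an in-memory table named after queryName, you can query the results with plain SQL while the stream runs. A small usage sketch (the 30-second wait is arbitrary):

import time

time.sleep(30)  # give a few micro-batches time to arrive
spark.sql("SELECT COUNT(*) AS row_count FROM polymo").show()
query.stop()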

Key ideas:

- config_path points to the YAML file.
- Pass sensitive values (tokens, keys) with .option(...) so they never touch the config file (see the sketch below for one way to do this with environment variables).
- You can add as many .option("name", "value") calls as you need. They show up inside templates as {{ options.name }}.
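
One way to keep secrets out of both the config file and the notebook is to read the token from an environment variable at runtime. A minimal sketch, assuming a GITHUB_TOKEN variable is set on the driver (the variable name is just an example):

import os

from pyspark.sql import SparkSession
from polymo import ApiReader

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)

df = (
    spark.read.format("polymo")
    .option("config_path", "./config.yml")
    .option("token", os.environ["GITHUB_TOKEN"])  # example env var, never hardcoded
    .options(owner="dan1elt0m", repo="polymo")    # available as {{ options.owner }} etc.
    .load()
)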

The ApiReader class is what tells Spark how to interpret the config. Register it once per Spark session and you are good to go.
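
If several notebooks or jobs share the same pattern, a tiny wrapper keeps the registration and the common options in one place. This is only a sketch; the read_api helper is made up, not part of Polymo:

from pyspark.sql import DataFrame, SparkSession
from polymo import ApiReader

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)  # once per session

def read_api(config_path: str, **options) -> DataFrame:
    # Hypothetical helper: load a Polymo connector with extra template options.
    return (
        spark.read.format("polymo")
        .option("config_path", config_path)
        .options(**options)
        .load()
    )

df = read_api("./config.yml", owner="dan1elt0m", repo="polymo")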