Ingesting files

To make common operations such as ingesting files from your project’s source directory a little easier, palgen comes with a “magic” Pipeline helper. This helper allows palgen to automatically parallelize tasks if possible.

A Pipeline such as

pipeline = Sources() >> foo >> bar >> baz

when invoked with pipeline(iterable), behaves roughly like the nested call baz(bar(foo(iterable))): each step consumes the output of the step before it.
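To make the chaining concrete, here is a minimal sketch of how a `>>`-composed pipeline can reduce to nested calls. `MiniPipeline` is a hypothetical stand-in written for this illustration, not palgen's actual implementation, and it deliberately ignores parallelization; `foo`, `bar` and `baz` are placeholder steps.

```python
class MiniPipeline:
    """Hypothetical stand-in for a pipeline; not palgen's implementation."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def __rshift__(self, step):
        # `pipeline >> step` returns a new pipeline with the step appended
        return MiniPipeline((*self.steps, step))

    def __call__(self, data):
        # Feed the output of each step into the next one
        for step in self.steps:
            data = step(data)
        return data


# Placeholder steps: each one takes an iterable and yields results
def foo(data):
    for item in data:
        yield item + 1


def bar(data):
    for item in data:
        yield item * 2


def baz(data):
    for item in data:
        yield str(item)


pipeline = MiniPipeline() >> foo >> bar >> baz

chained = list(pipeline([1, 2, 3]))
nested = list(baz(bar(foo([1, 2, 3]))))
print(chained == nested)
```

Because every step is a generator, items flow through the chain lazily, one at a time, which is also what makes it plausible to distribute independent items across several jobs.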

You can use the @max_jobs(...) decorator from palgen.ext to limit the number of jobs a step may run in. Additionally, annotating the data parameter with list is equivalent to decorating the step with @max_jobs(1).
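The equivalence between the decorator and the list annotation can be illustrated with a small sketch. Both `max_jobs` and `job_limit` below are hypothetical stand-ins so that the example runs without palgen installed; how palgen actually stores and resolves the limit is an assumption here and may differ.

```python
import inspect
from typing import Callable, Iterable, Optional, get_origin


def max_jobs(amount: int):
    """Hypothetical stand-in: records the job limit on the step itself."""
    def decorator(step: Callable) -> Callable:
        step.max_jobs = amount
        return step
    return decorator


def job_limit(step: Callable) -> Optional[int]:
    """Return the job limit of a step, or None if it is unrestricted."""
    explicit = getattr(step, "max_jobs", None)
    if explicit is not None:
        return explicit

    # A `data: list` annotation signals that the step needs every item at once
    parameter = inspect.signature(step).parameters.get("data")
    if parameter is not None:
        annotation = parameter.annotation
        if annotation is list or get_origin(annotation) is list:
            return 1
    return None


@max_jobs(1)
def collect_decorated(data: Iterable[int]):
    yield sum(data)


def collect_annotated(data: list):
    yield sum(data)


def per_item(data: Iterable[int]):
    for item in data:
        yield item * 2


print(job_limit(collect_decorated), job_limit(collect_annotated), job_limit(per_item))
```

Steps that aggregate over all inputs, such as the two `collect_*` functions, are the typical case for a limit of one job; a step like `per_item` handles each item independently and can stay unrestricted.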

Every step must be one of the following: a function or function object that returns an iterable, a generator, a class implementing the iterator protocol, or an instance of such a class. The palgen.machinery.pipelines.Pipeline class is aliased as palgen.ext.Sources for convenience. The default run(...) method calls the pipeline class attribute of your extension with a pre-filtered list of source files.

Filters and loaders

For convenience, some commonly used path filters and file loaders are implemented in palgen.ingest. All of the built-in filter and file loader steps expect an iterable of Path objects. Use only one loader, and place it as the last step, because a loader yields tuples of the Path to the file and the file’s loaded content rather than plain paths.
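The reason a loader has to come last becomes visible in a small sketch. The helpers `suffix_filter`, `name_filter` and `text_loader` are hypothetical stand-ins written for this example and are not the steps shipped in palgen.ingest; they only mimic the described contract of paths in, paths out for filters and paths in, (path, content) tuples out for loaders.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Iterable, Iterator


def suffix_filter(suffix: str):
    """Hypothetical filter factory: keep paths with the given suffix."""
    def step(paths: Iterable[Path]) -> Iterator[Path]:
        for path in paths:
            if path.suffix == suffix:
                yield path
    return step


def name_filter(name: str):
    """Hypothetical filter factory: keep paths whose stem equals `name`."""
    def step(paths: Iterable[Path]) -> Iterator[Path]:
        for path in paths:
            if path.stem == name:
                yield path
    return step


def text_loader(paths: Iterable[Path]) -> Iterator[tuple]:
    """Hypothetical loader: yields (path, content), so no filter can follow it."""
    for path in paths:
        yield path, path.read_text(encoding="utf-8")


with TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "error.toml").write_text("code = 1\n", encoding="utf-8")
    (root / "error.txt").write_text("ignored\n", encoding="utf-8")
    (root / "other.toml").write_text("code = 2\n", encoding="utf-8")

    paths = sorted(root.iterdir())
    # Filters first, the single loader last
    selected = name_filter("error")(suffix_filter(".toml")(paths))
    loaded = [(path.name, content) for path, content in text_loader(selected)]

print(loaded)
```

Placing a filter after `text_loader` would hand it tuples instead of Path objects, so attribute access such as `path.suffix` would fail.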

Default pipelines

By default, palgen searches for TOML files named after the extension. For example, an extension called error would ingest all files called error.toml from the source directories. The default ingest pipeline is equivalent to:

ingest = (Sources() >> Suffix('toml')
                    >> Name(<extension name>)
                    >> Toml)

This behavior can be overridden by defining the ingest class variable of your extension. Possible values are None to disable ingest entirely, a pipeline or a dictionary of pipelines with string keys.

The overall pipeline of an extension can be overridden by setting the pipeline class variable of your extension. By default palgen will execute the ingest pipeline and the transform, validate, render and write methods of your extension in order. This is equivalent to setting something like

pipeline = (Sources() >> <extension>.ingest
                      >> <extension>.transform
                      >> <extension>.validate
                      >> <extension>.render
                      >> <extension>.write)

Full example

from hashlib import md5
from pathlib import Path
from typing import Iterable

from palgen import Extension, Sources, max_jobs
from palgen.ingest import Suffixes, Relative, Raw
from palgen.template.string import Template

# extension class must inherit from palgen.Extension


class Embed(Extension):
    ''' Converts arbitrary files to C arrays '''

    ingest = (Sources()                     # Paths to all files
              >> Suffixes('.embed', position=0)  # Select files with .embed as first extension (e.g. foo.embed.bar)
              >> Relative                       # Convert paths to relative paths
              >> Raw)                           # Read in files as raw bytes

    def transform(self, data: Iterable[tuple[Path, bytes]]):
        for file, content in data:
            # remove the `.embed` suffix
            path = file.with_stem(file.name[: -len(''.join(file.suffixes))])

            # add `.hpp` suffix
            outpath = path.with_suffix(''.join([*file.suffixes[1:], '.hpp']))

            # hash the input path without the `.embed` suffix
            filehash = md5(bytes(str(path), encoding="utf-8")).hexdigest()

            # pass this along to the next step
            yield outpath, filehash, Template("resource.tpl.hpp")(hash = filehash,
                                                                  data = ','.join([hex(value) for value in content]),
                                                                  path = path)

    # this collects all previous results, so limit this to 1 job
    @max_jobs(1)
    def render(self, data: Iterable[tuple[Path, str, str]]):
        includes = []
        pairs = []
        for file, filehash, content in data:
            includes.append(f"#include <{file}>")
            pairs.append(f"Embed::r{filehash}")
            yield file, content

        indent = " " * 16
        yield Path('src') / 'embed.hpp', Template("embed.tpl.hpp")(amount   = len(pairs),
                                                                   pairs    = f',\n{indent}'.join(pairs),
                                                                   includes = '\n'.join(includes))