Ingesting files

To make common operations such as ingesting files from your project’s source directory a little easier, palgen comes with a “magic” Pipeline helper. This helper allows palgen to automatically parallelize tasks if possible.

A Pipeline such as

pipeline = Sources() >> foo >> bar >> baz

when invoked with pipeline(iterable), feeds iterable into foo and hands the output of each step to the next one, roughly as if the steps were nested calls.
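As a rough illustration of that chaining, here is a toy stand-in written in plain Python. ToyPipeline is a hypothetical class made up for this sketch and is not palgen's Pipeline; it ignores parallelization entirely and only shows how >> can collect steps that are later applied in order.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator


class ToyPipeline:
    """Hypothetical stand-in for illustration; not palgen's implementation."""

    def __init__(self, steps: tuple[Callable, ...] = ()):
        self.steps = steps

    def __rshift__(self, step: Callable) -> ToyPipeline:
        # `pipeline >> step` returns a new pipeline with the step appended
        return ToyPipeline((*self.steps, step))

    def __call__(self, data: Iterable) -> Iterator:
        # feed the output of every step into the next one
        for step in self.steps:
            data = step(data)
        yield from data


def foo(data):
    for item in data:
        yield item + 1


def bar(data):
    for item in data:
        yield item * 2


def baz(data):
    for item in data:
        yield str(item)


pipeline = ToyPipeline() >> foo >> bar >> baz

chained = list(pipeline([1, 2, 3]))
nested = list(baz(bar(foo([1, 2, 3]))))
print(chained == nested)  # → True
```

In this sketch the pipeline call and the nested call produce the same items. What palgen adds on top, according to the text above, is the option to run steps in parallel.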

You can use the @max_jobs(...) decorator from palgen.ext to control the maximum number of jobs a step may run with. Additionally, annotating the data parameter with list is equivalent to decorating the step with @max_jobs(1).
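To make the two ways of limiting jobs more concrete, here is a minimal sketch of how a decorator and a list annotation could both end up as the same limit. toy_max_jobs and job_limit are hypothetical names invented for this example; how palgen stores and reads the limit internally is not described here and may differ.

```python
import inspect
from typing import Callable, Iterable, Optional, get_origin


def toy_max_jobs(limit: int) -> Callable:
    """Hypothetical decorator: records a job limit on the step."""
    def decorator(step: Callable) -> Callable:
        step.max_jobs = limit
        return step
    return decorator


def job_limit(step: Callable) -> Optional[int]:
    """Return the job limit for a step, or None if it is unrestricted."""
    explicit = getattr(step, "max_jobs", None)
    if explicit is not None:
        return explicit

    # assumption: a `data: list` annotation means the step wants all items at once
    data = inspect.signature(step).parameters.get("data")
    if data is not None:
        annotation = get_origin(data.annotation) or data.annotation
        if annotation is list:
            return 1
    return None


@toy_max_jobs(4)
def hash_items(data: Iterable[str]):
    for item in data:
        yield len(item)


def summarize(data: list):
    yield len(data)


def passthrough(data: Iterable[str]):
    yield from data


limits = [job_limit(step) for step in (hash_items, summarize, passthrough)]
print(limits)  # → [4, 1, None]
```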

All steps must be one of the following: a function or function object returning an iterable, a generator, a class implementing the iterator protocol, or an instance of such a class. The palgen.machinery.pipelines.Pipeline class is aliased as palgen.ext.Sources for convenience. The default run(...) method calls the pipeline class attribute of your extension with a pre-filtered list of source files.
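The following sketch shows what most of these step kinds look like in plain Python. All names are made up for illustration, and the steps are chained by hand rather than through palgen. The "instance of such a class" case is left out, since how palgen hands data to it is not described here.

```python
from typing import Iterable, Iterator


def doubled(data: Iterable[int]) -> list:
    # plain function returning an iterable (a list)
    return [item * 2 for item in data]


def above_two(data: Iterable[int]) -> Iterator[int]:
    # generator function: yields items one at a time
    for item in data:
        if item > 2:
            yield item


class Offset:
    # function object: instances are callable and return an iterable
    def __init__(self, amount: int) -> None:
        self.amount = amount

    def __call__(self, data: Iterable[int]) -> Iterator[int]:
        return (item + self.amount for item in data)


class Labelled:
    # class implementing the iterator protocol; calling the class with
    # the incoming data constructs the iterator
    def __init__(self, data: Iterable[int]) -> None:
        self._items = iter(data)

    def __iter__(self) -> "Labelled":
        return self

    def __next__(self) -> str:
        # StopIteration from the inner iterator ends this iterator too
        return f"item-{next(self._items)}"


result = list(Labelled(Offset(10)(above_two(doubled([1, 2, 3])))))
print(result)  # → ['item-14', 'item-16']
```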

Filters and loaders

For convenience, some commonly used path filters and file loaders are implemented in palgen.ingest. All of the built-in filter and file loader steps expect an iterable of Path objects. A loader, however, yields tuples of the Path to the file and the file’s loaded content, so use only one loader and place it as the last step.
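The reason a loader has to come last becomes clearer with a small stand-in. The functions below are hypothetical and only imitate the shape of such steps using the standard library: the filters take paths and yield paths, while the loader takes paths and yields (path, content) tuples, which a further filter could no longer consume.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def only_toml(paths: Iterable[Path]) -> Iterator[Path]:
    """Hypothetical filter: Path in, Path out."""
    for path in paths:
        if path.suffix == ".toml":
            yield path


def named_error(paths: Iterable[Path]) -> Iterator[Path]:
    """Hypothetical filter: keeps files whose stem is 'error'."""
    for path in paths:
        if path.stem == "error":
            yield path


def read_text(paths: Iterable[Path]) -> Iterator[tuple]:
    """Hypothetical loader: Path in, (Path, content) out."""
    for path in paths:
        yield path, path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "error.toml").write_text("code = 1\n", encoding="utf-8")
    (root / "other.toml").write_text("code = 2\n", encoding="utf-8")
    (root / "error.txt").write_text("not toml\n", encoding="utf-8")

    paths = sorted(root.iterdir())
    # filters first, loader last; consume the result while the files still exist
    loaded = list(read_text(named_error(only_toml(paths))))

for path, content in loaded:
    print(path.name, repr(content))
```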

Default pipelines

By default, palgen searches for TOML files named after the extension. For example, an extension called error would ingest all files called error.toml from the source directories.

ingest = (Sources() >> Suffix('toml')
                    >> Name(<extension name>)
                    >> Toml)

This behavior can be overridden by defining the ingest class variable of your extension. Possible values are None (to disable ingest entirely), a pipeline, or a dictionary of pipelines with string keys.

The overall pipeline of an extension can be overridden by setting the pipeline class variable of your extension. By default palgen will execute the ingest pipeline and the transform, validate, render and write methods of your extension in order. This is equivalent to setting something like

pipeline = (Sources() >> <extension>.ingest
                      >> <extension>.transform
                      >> <extension>.validate
                      >> <extension>.render
                      >> <extension>.write)
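To show the order in which these methods see the data, here is a small sketch that chains them by hand. ToyExtension is a hypothetical class that does not derive from palgen.Extension, the ingest step is replaced by a ready-made list, and the shapes of the values passed between steps are assumptions chosen for this example only.

```python
from pathlib import Path


class ToyExtension:
    """Hypothetical extension-like class; not derived from palgen.Extension."""

    def __init__(self) -> None:
        self.written = {}

    def transform(self, data):
        for name, settings in data:
            yield name.upper(), settings

    def validate(self, data):
        for name, settings in data:
            if "code" not in settings:
                raise ValueError(f"{name}: missing 'code'")
            yield name, settings

    def render(self, data):
        for name, settings in data:
            yield Path(f"{name.lower()}.txt"), f"{name} = {settings['code']}"

    def write(self, data):
        for path, content in data:
            self.written[path] = content  # stand-in for writing to disk
            yield path

    def run(self, ingested):
        # same order as the default pipeline: transform, validate, render, write
        data = ingested
        for step in (self.transform, self.validate, self.render, self.write):
            data = step(data)
        return list(data)


extension = ToyExtension()
outputs = extension.run([("error", {"code": 1})])
print(outputs, extension.written)
```

Because every step is a generator, nothing runs until the final list(...) pulls items through the whole chain.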

Full example

from hashlib import md5
from pathlib import Path
from typing import Iterable

from palgen import Extension, Sources, max_jobs
from palgen.ingest import Suffixes, Relative, Raw
from palgen.template.string import Template


# extension class must inherit from palgen.Extension
class Embed(Extension):
    ''' Converts arbitrary files to C arrays '''

    ingest = (Sources                            # Paths to all files
              >> Suffixes('.embed', position=0)  # Select files with .embed as first extension (i.e. foo.embed.bar)
              >> Relative                        # Convert paths to relative paths
              >> Raw)                            # Read in files as raw bytes

    def transform(self, data: Iterable[tuple[Path, bytes]]):
        for file, content in data:
            # remove the `.embed` suffix
            path = file.with_stem(file.name[: -len(''.join(file.suffixes))])

            # append `.hpp` to the remaining suffixes
            outpath = path.with_suffix(''.join([*file.suffixes[1:], '.hpp']))

            # hash the input path without the `.embed` suffix
            filehash = md5(bytes(str(path), encoding="utf-8")).hexdigest()

            # pass this along to the next step
            yield outpath, filehash, Template("resource.tpl.hpp")(
                hash=filehash,
                data=','.join(hex(value) for value in content),
                path=path)

    # this collects all previous results, so limit this to 1 job
    @max_jobs(1)
    def render(self, data: Iterable[tuple[Path, str, str]]):
        includes = []
        pairs = []
        for file, filehash, content in data:
            # one #include line and one resource entry per embedded file
            includes.append(f"#include <{file}>")
            pairs.append(f"Embed::r{filehash}")
            yield file, content

        indent = " " * 16
        yield Path('src') / 'embed.hpp', Template("embed.tpl.hpp")(
            amount=len(pairs),
            pairs=f',\n{indent}'.join(pairs),
            includes='\n'.join(includes))
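The path handling in transform is the least obvious part of the example, so the following standalone snippet walks through it using only the standard library. The file name assets/logo.embed.png and the three sample bytes are made up for illustration; with_stem requires Python 3.9 or newer.

```python
from hashlib import md5
from pathlib import PurePosixPath

file = PurePosixPath("assets/logo.embed.png")

# file.suffixes is ['.embed', '.png']; cutting both off the name leaves 'logo'
stripped_name = file.name[: -len("".join(file.suffixes))]

# with_stem keeps only the last suffix, so the `.embed` part disappears
path = file.with_stem(stripped_name)

# re-attach the remaining suffixes and append `.hpp`
outpath = path.with_suffix("".join([*file.suffixes[1:], ".hpp"]))

# the hash is taken over the path without `.embed`
filehash = md5(bytes(str(path), encoding="utf-8")).hexdigest()

# each byte of the content becomes one hexadecimal literal
data = ",".join(hex(value) for value in b"\x00\x7f\xff")

print(path)     # → assets/logo.png
print(outpath)  # → assets/logo.png.hpp
print(data)     # → 0x0,0x7f,0xff
```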