palgen.ingest.filter

Module Contents

class palgen.ingest.filter.Filter(*needles, regex=False, unix=False)

Generic filter

Parameters:
  • *needles (str | re.Pattern[str]) – list of strings or regex patterns

  • regex (bool) – if True, all needles will be interpreted as regex patterns

  • unix (bool) – if True, all needles will be interpreted as unix patterns

Important

Needles that start with ^ or end with $ are always converted to regex patterns.
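The interpretation rules above can be sketched roughly as follows. This is a minimal illustration, not palgen's implementation: make_matcher and match_any are hypothetical helpers, and the choices of exact equality for plain needles, re.search for regex needles, and fnmatch for unix patterns are assumptions.

```python
import fnmatch
import re
from typing import Callable, Iterable, Union

Needle = Union[str, re.Pattern]


def make_matcher(needle: Needle, regex: bool = False, unix: bool = False) -> Callable[[str], bool]:
    """Hypothetical helper: build a predicate for a single needle."""
    if isinstance(needle, re.Pattern):
        # Pre-compiled patterns are used as they are.
        return lambda text: needle.search(text) is not None
    if unix:
        # Assumption: unix patterns are shell-style wildcards such as *.py.
        return lambda text: fnmatch.fnmatchcase(text, needle)
    if regex or needle.startswith("^") or needle.endswith("$"):
        # Needles anchored with ^ or $ are treated as regex even if regex=False.
        pattern = re.compile(needle)
        return lambda text: pattern.search(text) is not None
    # Assumption: plain needles must equal the checked string exactly.
    return lambda text: text == needle


def match_any(needles: Iterable[Needle], other: str, **flags: bool) -> bool:
    """True if other matches any of the needles."""
    return any(make_matcher(needle, **flags)(other) for needle in needles)
```

In this sketch, "foo" only matches the string foo, while "^foo" also matches foobar because the leading ^ switches it to regex matching.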

__slots__ = ('needles',)
match_str(other)
Parameters:

other (str) – string to check against

Returns:

True if other matches any of the needles

match_files(files, attribute=None)
Parameters:
  • files (Iterable[pathlib.Path]) – input Iterable of files to filter

  • attribute – attribute of the pathlib.Path object to check against

Yields:

Path – for every file that matches any of the needles
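One way to picture the attribute parameter is as an attribute name looked up on each path before matching. The sketch below is a free-standing stand-in, not the method itself: the extra matches argument, and the fallback to the full path string when attribute is None, are assumptions.

```python
from pathlib import PurePosixPath
from typing import Callable, Iterable, Iterator, Optional


def match_files(
    files: Iterable[PurePosixPath],
    matches: Callable[[str], bool],
    attribute: Optional[str] = None,
) -> Iterator[PurePosixPath]:
    """Hypothetical stand-in: yield every path whose selected attribute matches."""
    for path in files:
        # Assumption: without an attribute, the whole path string is checked.
        value = str(path) if attribute is None else str(getattr(path, attribute))
        if matches(value):
            yield path


files = [PurePosixPath("src/main.py"), PurePosixPath("docs/index.rst")]
by_name = list(match_files(files, lambda s: s == "main.py", attribute="name"))
by_suffix = list(match_files(files, lambda s: s == ".rst", attribute="suffix"))
```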

filter(files)
Parameters:

files (Iterable[pathlib.Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]

__call__(file_cache)
Parameters:

file_cache (Iterable[pathlib.Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]
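Because filters define __call__, an instance can be applied to an iterable like a function. The sketch below uses a stand-in class, ExactName, that is not part of palgen; that __call__ simply forwards to filter is an assumption based on the matching descriptions of the two methods.

```python
from pathlib import PurePosixPath
from typing import Iterable, Iterator


class ExactName:
    """Stand-in filter (not a palgen class): keeps paths whose name equals a needle."""

    def __init__(self, *needles: str) -> None:
        self.needles = needles

    def filter(self, files: Iterable[PurePosixPath]) -> Iterator[PurePosixPath]:
        for path in files:
            if path.name in self.needles:
                yield path

    def __call__(self, file_cache: Iterable[PurePosixPath]) -> Iterator[PurePosixPath]:
        # Assumption: calling the object forwards to filter().
        yield from self.filter(file_cache)


cache = [
    PurePosixPath("docs/conf.py"),
    PurePosixPath("docs/conf.txt"),
    PurePosixPath("src/app.py"),
]
keep = ExactName("conf.py", "setup.py")
# The filter consumes any iterable, including another generator.
result = list(keep(p for p in cache if p.suffix == ".py"))
```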

class palgen.ingest.filter.Pattern(*patterns, unix=False)
Bases: palgen.ingest.filter.Filter

Filter by regex or unix pattern. Equivalent to Filter(…, regex=True).

Parameters:
  • *patterns (str | re.Pattern[str]) – strings or regex patterns

  • unix (bool) – if True, all patterns will be interpreted as unix patterns

__slots__ = ()
match_str(other)
Parameters:

other (str) – string to check against

Returns:

True if other matches any of the needles

match_files(files, attribute=None)
Parameters:
  • files (Iterable[pathlib.Path]) – input Iterable of files to filter

  • attribute – attribute of the pathlib.Path object to check against

Yields:

Path – for every file that matches any of the needles

filter(files)
Parameters:

files (Iterable[pathlib.Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]

__call__(file_cache)
Parameters:

file_cache (Iterable[pathlib.Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]

class palgen.ingest.filter.Folder(*needles, regex=False, unix=False)
Bases: palgen.ingest.filter.Filter

Generic filter

Parameters:
  • *needles (str | re.Pattern[str]) – list of strings or regex patterns

  • regex (bool) – if True, all needles will be interpreted as regex patterns

  • unix (bool) – if True, all needles will be interpreted as unix patterns

Important

Needles that start with ^ or end with $ are always converted to regex patterns.

__slots__ = ()
filter(files)

Filters by folder name. This will match folder names at any level

Parameters:

files (Iterable[Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]
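Matching folder names "at any level" can be illustrated with pathlib alone. The helper in_folder below is hypothetical and only handles a plain-string needle; how palgen actually walks the path is not stated here.

```python
from pathlib import PurePosixPath


def in_folder(path: PurePosixPath, folder: str) -> bool:
    """Hypothetical helper: True if any directory component of path equals folder."""
    # parent.parts holds the directory components only, not the file name.
    return folder in path.parent.parts


nested = PurePosixPath("project/build/out/main.o")
lookalike = PurePosixPath("project/src/build.py")
```

Here nested is accepted for the needle build because one of its parent directories carries that name, while lookalike is rejected because build only appears in the file name.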

match_str(other)
Parameters:

other (str) – string to check against

Returns:

True if other matches any of the needles

match_files(files, attribute=None)
Parameters:
  • files (Iterable[pathlib.Path]) – input Iterable of files to filter

  • attribute – attribute of the pathlib.Path object to check against

Yields:

Path – for every file that matches any of the needles

__call__(file_cache)
Parameters:

file_cache (Iterable[pathlib.Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]

class palgen.ingest.filter.Suffix(*needles, regex=False, unix=False)
Bases: palgen.ingest.filter.Filter

Generic filter

Parameters:
  • *needles (str | re.Pattern[str]) – list of strings or regex patterns

  • regex (bool) – if True, all needles will be interpreted as regex patterns

  • unix (bool) – if True, all needles will be interpreted as unix patterns

Important

Needles that start with ^ or end with $ are always converted to regex patterns.

__slots__ = ()
filter(files)

Filters by suffix, including the leading dot. Unlike Python’s default behavior, this concatenates all suffixes.

For example, pathlib.Path.suffix for foo.tar.gz is only .gz, whereas this filter checks against .tar.gz.

Parameters:

files (Iterable[Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]
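The difference from pathlib's own suffix can be reproduced with the standard library. The helper full_suffix is a hypothetical name for the concatenation described above and is not taken from palgen.

```python
from pathlib import PurePosixPath


def full_suffix(path: PurePosixPath) -> str:
    """Hypothetical helper: join all suffixes, keeping the leading dots."""
    return "".join(path.suffixes)


archive = PurePosixPath("foo.tar.gz")
print(archive.suffix)        # .gz
print(full_suffix(archive))  # .tar.gz
```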

match_str(other)
Parameters:

other (str) – string to check against

Returns:

True if other matches any of the needles

match_files(files, attribute=None)
Parameters:
  • files (Iterable[pathlib.Path]) – input Iterable of files to filter

  • attribute – attribute of the pathlib.Path object to check against

Yields:

Path – for every file that matches any of the needles

__call__(file_cache)
Parameters:

file_cache (Iterable[pathlib.Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]

class palgen.ingest.filter.Suffixes(*needles, regex=False, unix=False, position=None)
Bases: palgen.ingest.filter.Filter

Multiple suffixes filter

Parameters:
  • *needles (str | re.Pattern[str]) – list of strings or regex patterns

  • regex (bool) – if True, all needles will be interpreted as regex patterns

  • unix (bool) – if True, all needles will be interpreted as unix patterns

  • position (Optional[int]) – if set, check only the suffix at the given position. If None, all parts of the suffix are tried.

__slots__ = ('position',)
filter(files)

Filters by suffix

Parameters:

files (Iterable[Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]
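The effect of position might look like the sketch below. It rests on several assumptions that this page does not confirm: that position is a zero-based index into pathlib's suffixes list, that suffixes are compared with their leading dot, and that an out-of-range position simply does not match. The helper suffix_matches is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def suffix_matches(path: PurePosixPath, needle: str, position: Optional[int] = None) -> bool:
    """Hypothetical helper: compare needle against one or all suffix parts."""
    suffixes = path.suffixes  # e.g. ['.tar', '.gz'] for foo.tar.gz
    if position is None:
        # No position given: any part of the suffix may match.
        return needle in suffixes
    try:
        return suffixes[position] == needle
    except IndexError:
        # Assumption: a position beyond the available parts never matches.
        return False


archive = PurePosixPath("foo.tar.gz")
```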

match_str(other)
Parameters:

other (str) – string to check against

Returns:

True if other matches any of the needles

match_files(files, attribute=None)
Parameters:
  • files (Iterable[pathlib.Path]) – input Iterable of files to filter

  • attribute – attribute of the pathlib.Path object to check against

Yields:

Path – for every file that matches any of the needles

__call__(file_cache)
Parameters:

file_cache (Iterable[pathlib.Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]

class palgen.ingest.filter.Stem(*needles, regex=False, unix=False)
Bases: palgen.ingest.filter.Filter

Generic filter

Parameters:
  • *needles (str | re.Pattern[str]) – list of strings or regex patterns

  • regex (bool) – if True, all needles will be interpreted as regex patterns

  • unix (bool) – if True, all needles will be interpreted as unix patterns

Important

Needles that start with ^ or end with $ are always converted to regex patterns.

__slots__ = ()
filter(files)

Filters by stem, i.e. the file’s name without any extension(s). For example, the stem of foobar.tar.gz is foobar.

Parameters:

files (Iterable[Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]
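This notion of stem differs from pathlib.Path.stem, which strips only the last extension. A minimal sketch of the difference, using a hypothetical helper full_stem that is not taken from palgen:

```python
from pathlib import PurePosixPath


def full_stem(path: PurePosixPath) -> str:
    """Hypothetical helper: file name with every extension removed."""
    extensions = "".join(path.suffixes)
    return path.name[: len(path.name) - len(extensions)]


archive = PurePosixPath("foobar.tar.gz")
print(archive.stem)        # foobar.tar
print(full_stem(archive))  # foobar
```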

match_str(other)
Parameters:

other (str) – string to check against

Returns:

True if other matches any of the needles

match_files(files, attribute=None)
Parameters:
  • files (Iterable[pathlib.Path]) – input Iterable of files to filter

  • attribute – attribute of the pathlib.Path object to check against

Yields:

Path – for every file that matches any of the needles

__call__(file_cache)
Parameters:

file_cache (Iterable[pathlib.Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]

class palgen.ingest.filter.Name(*needles, regex=False, unix=False)
Bases: palgen.ingest.filter.Filter

Generic filter

Parameters:
  • *needles (str | re.Pattern[str]) – list of strings or regex patterns

  • regex (bool) – if True, all needles will be interpreted as regex patterns

  • unix (bool) – if True, all needles will be interpreted as unix patterns

Important

Needles that start with ^ or end with $ are always converted to regex patterns.

__slots__ = ()
filter(files)

Filters by name

Parameters:

files (Iterable[Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]

match_str(other)
Parameters:

other (str) – string to check against

Returns:

True if other matches any of the needles

match_files(files, attribute=None)
Parameters:
  • files (Iterable[pathlib.Path]) – input Iterable of files to filter

  • attribute – attribute of the pathlib.Path object to check against

Yields:

Path – for every file that matches any of the needles

__call__(file_cache)
Parameters:

file_cache (Iterable[pathlib.Path]) – input Iterable of files to filter

Yields:

Path – for every file that matches any of the needles

Return type:

Iterable[pathlib.Path]

palgen.ingest.filter.Passthrough(data)

No-op; yields everything from the input Iterable.

Parameters:

data (Iterable[Any]) – any Iterable

Yields:

Any – every item of data, unchanged

Return type:

Iterable[Any]

palgen.ingest.filter.Nothing(data)

Consumes the input Iterable but does not yield anything.

Parameters:

data (Iterable[Any]) – any Iterable

Yields:

Nothing whatsoever.

Return type:

Iterable[Any]
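The behavior described for Passthrough and Nothing can be approximated with two small generator functions. The lowercase names passthrough and nothing are stand-ins written for this illustration, not palgen's own definitions.

```python
from typing import Any, Iterable, Iterator


def passthrough(data: Iterable[Any]) -> Iterator[Any]:
    """Stand-in for Passthrough: yield every item unchanged."""
    yield from data


def nothing(data: Iterable[Any]) -> Iterator[Any]:
    """Stand-in for Nothing: exhaust the input, yield no items."""
    for _ in data:
        pass
    return
    yield  # unreachable; its presence makes this function a generator


source = iter([1, 2, 3])
dropped = list(nothing(source))
leftover = next(source, None)  # the iterator has already been exhausted
```

The difference from simply ignoring the input is that nothing still drives the upstream iterable to completion, which matters if earlier stages have side effects.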