io¶
ZIP-based serialisation. DataLoader reads datasets from archives;
DataSaver writes them.
Defines helpers for inputs/outputs.
- class fstg_toolkit.io.AreasDescHandler(*args, **kwargs)[source]¶
Bases: DataHandler[DataFrame]
Handler for the areas descriptor CSV file (areas.csv).
- classmethod deserialize(fp: IO, **context: Any) DataFrame[source]¶
Deserialize the item from a file-like object.
- Parameters:
fp (IO) – Readable binary file-like object.
**context (Any) – Optional extra keyword arguments required by the handler.
- Returns:
The deserialized data object.
- Return type:
Any
- classmethod matches(filename: str) bool[source]¶
Check if the handler can handle a file from its filename.
- class fstg_toolkit.io.DataHandler(*args, **kwargs)[source]¶
Protocol defining the interface for data format handlers.
A DataHandler is responsible for serializing and deserializing a specific type of data to and from file-like objects, and for mapping between human-readable names and their on-disk filenames.
- classmethod deserialize(fp: IO, **context: Any) T[source]¶
Deserialize the item from a file-like object.
- Parameters:
fp (IO) – Readable binary file-like object.
**context (Any) – Optional extra keyword arguments required by the handler.
- Returns:
The deserialized data object.
- Return type:
Any
- classmethod matches(filename: str) bool[source]¶
Check if the handler can handle a file from its filename.
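As a rough illustration of the interface described above, the sketch below defines a stand-alone protocol with the same three class methods and a toy handler for UTF-8 text files. `HandlerSketch`, `TextHandler` and the `.txt` rule are hypothetical and not part of fstg_toolkit; the name/filename mapping mentioned in the description is left out.

```python
import io
from typing import IO, Any, Protocol, TypeVar

T = TypeVar("T")


class HandlerSketch(Protocol[T]):
    """Hypothetical stand-in for the handler interface (not the library's class)."""

    @classmethod
    def matches(cls, filename: str) -> bool: ...

    @classmethod
    def serialize(cls, item: T, fp: IO, **context: Any) -> None: ...

    @classmethod
    def deserialize(cls, fp: IO, **context: Any) -> T: ...


class TextHandler:
    """Toy handler (assumption): stores a str as UTF-8 bytes in ``*.txt`` files."""

    @classmethod
    def matches(cls, filename: str) -> bool:
        return filename.endswith(".txt")

    @classmethod
    def serialize(cls, item: str, fp: IO, **context: Any) -> None:
        fp.write(item.encode("utf-8"))

    @classmethod
    def deserialize(cls, fp: IO, **context: Any) -> str:
        return fp.read().decode("utf-8")


# Round trip through an in-memory binary buffer.
buffer = io.BytesIO()
TextHandler.serialize("hello", buffer)
buffer.seek(0)
restored = TextHandler.deserialize(buffer)
```

Using class methods keeps handlers stateless, so a registry can store the class itself rather than an instance.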
- class fstg_toolkit.io.DataLoader(filepath: ~pathlib.Path, _inventory: dict[str, list[str]] = <factory>)[source]¶
Bases: object
Read-only accessor for a ZIP archive produced by DataSaver.
On construction the archive is opened once to build an inventory of all known data files, grouped by handler kind.
Methods are provided to load (lazily or not) the elements of the dataset, such as the correlation matrices, the graphs, the metrics, etc.
- Parameters:
filepath (Path) – Path to an existing ZIP archive.
- Raises:
FileNotFoundError – If filepath does not exist or points to a directory.
- __post_init__()[source]¶
Validate that the provided filepath points to an existing file.
- Raises:
FileNotFoundError – If the path does not exist or is a directory.
- extract_files(output: Path, kinds: str | list[str]) tuple[int, Generator[str, None, None]][source]¶
Extract the files of the given kinds to the given output directory.
This method returns a generator: files are only extracted as the generator is iterated, so the caller must consume it for the extraction to happen.
- Parameters:
- Returns:
The total number of files and a generator that performs the extraction, yielding the name of each file once it has been extracted.
- Return type:
- Raises:
ValueError – If the output path does not exist or is not a directory.
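The "count plus generator" return shape can be sketched with the standard `zipfile` module. This is only an illustration of the pattern, not the method's implementation: `extract_sketch` and its explicit `names` argument are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Generator


def extract_sketch(
    archive: Path, output: Path, names: list[str]
) -> tuple[int, Generator[str, None, None]]:
    """Hypothetical helper: return the file count and a generator doing the work."""
    # Validation happens eagerly, before any generator is created.
    if not output.is_dir():
        raise ValueError(f"{output} does not exist or is not a directory")

    def _run() -> Generator[str, None, None]:
        with zipfile.ZipFile(archive) as zf:
            for name in names:
                zf.extract(name, output)
                yield name  # reported only after the file is on disk

    return len(names), _run()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    archive = root / "data.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("a.csv", "x\n1\n")
        zf.writestr("b.csv", "x\n2\n")
    out = root / "out"
    out.mkdir()

    total, progress = extract_sketch(archive, out, ["a.csv", "b.csv"])
    before = sorted(p.name for p in out.iterdir())  # nothing extracted yet
    extracted = list(progress)                      # iterating triggers the extraction
    after = sorted(p.name for p in out.iterdir())
```

Returning the total up front lets a caller drive a progress bar while consuming the generator.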
- lazy_load_frequent_patterns() list[str][source]¶
Return the list of frequent pattern filenames present in the archive.
- Returns:
Filenames that can be passed to load_frequent_pattern().
- Return type:
- lazy_load_graphs() list[str][source]¶
Return the list of graph filenames present in the archive.
- Returns:
Filenames that can be passed to load_graph().
- Return type:
- lazy_load_matrices() list[str][source]¶
Return the list of matrix filenames present in the archive.
- Returns:
Filenames that can be passed to load_matrix().
- Return type:
- lazy_load_metrics() list[str][source]¶
Return the list of metrics filenames present in the archive.
- Returns:
Filenames that can be passed to load_metric().
- Return type:
- load_areas() DataFrame | None[source]¶
Load the areas descriptor data frame from the archive.
- Returns:
The areas data frame, or None if no areas file is present.
- Return type:
pd.DataFrame or None
- load_frequent_pattern(filename: str) FrequentPatterns | None[source]¶
Load a single frequent pattern dict by its filename from the archive.
- Parameters:
filename (str) – Filename of the pattern entry inside the ZIP archive.
- Returns:
The pattern dict, or None if filename is not in the archive.
- Return type:
FrequentPatterns or None
- load_frequent_patterns() dict[tuple[str, str], FrequentPatterns][source]¶
Load all frequent pattern dicts from the archive.
- Returns:
Mapping from (subject, mode) to the patterns object.
- Return type:
- load_graph(areas_desc: DataFrame, filename: str) SpatioTemporalGraph | None[source]¶
Load a single graph by its filename from the archive.
- Parameters:
areas_desc (pd.DataFrame) – Areas descriptor data frame required for graph deserialization.
filename (str) – Filename of the graph entry inside the ZIP archive.
- Returns:
The deserialized graph, or None if filename is not in the archive.
- Return type:
SpatioTemporalGraph or None
- load_graphs(areas_desc: DataFrame) dict[str, SpatioTemporalGraph][source]¶
Load all graphs from the archive.
- Parameters:
areas_desc (pd.DataFrame) – Areas descriptor data frame required for graph deserialization.
- Returns:
Mapping from graph name to its deserialized object.
- Return type:
- load_matrix(filename: str) ndarray | None[source]¶
Load a single matrix by its filename from the archive.
- Parameters:
filename (str) – Filename of the matrix entry inside the ZIP archive.
- Returns:
The deserialized array, or None if filename is not in the archive.
- Return type:
np.ndarray or None
- load_metric(filename: str) DataFrame | None[source]¶
Load a single metric data frame by its filename from the archive.
- Parameters:
filename (str) – Filename of the metric entry inside the ZIP archive.
- Returns:
The deserialized data frame, or None if filename is not in the archive.
- Return type:
pd.DataFrame or None
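The split between the `lazy_load_*` methods (list filenames) and the single-item `load_*` methods (decode one entry, or return None) might look like the following sketch. It uses plain JSON entries in an in-memory archive; `load_entry` is a hypothetical helper, not a DataLoader method.

```python
import io
import json
import zipfile
from typing import Any, Optional


def load_entry(archive: zipfile.ZipFile, filename: str) -> Optional[Any]:
    """Hypothetical helper: decode one JSON entry, or return None if it is absent."""
    if filename not in archive.namelist():
        return None
    with archive.open(filename) as fp:
        return json.load(fp)


# Build a small in-memory archive to read from.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("sub1.json", json.dumps({"nodes": 3}))
    zf.writestr("sub2.json", json.dumps({"nodes": 5}))

with zipfile.ZipFile(buffer) as zf:
    names = [n for n in zf.namelist() if n.endswith(".json")]  # cheap listing, no decoding
    first = load_entry(zf, names[0])       # decode only the entry that is needed
    missing = load_entry(zf, "nope.json")  # absent entries give None
```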
- class fstg_toolkit.io.DataRegistry[source]¶
Bases: object
Central registry mapping data kinds to their DataHandler instances.
Handlers are registered with register() and looked up dynamically from filenames via resolve().
- classmethod classify(filenames: list[str]) dict[str, list[str]][source]¶
Group filenames by their resolved handler kind.
- classmethod deserialize(filename: str, fp: IO, **context: Any) tuple[str, Any][source]¶
Deserialize an item from fp using the handler matched by filename.
- Parameters:
filename (str) – Used to look up the appropriate handler and derive the item name.
fp (IO) – Readable binary file-like object.
**context (Any) – Extra keyword arguments forwarded to the handler.
- Returns:
A (name, item) pair where name is the human-readable identifier derived from filename.
- Return type:
- Raises:
NoDataHandlerFound – If no handler matches filename.
- classmethod filename2name(filename: str) str[source]¶
Convert a filename to its corresponding name.
- Parameters:
filename (str) – The on-disk filename of the data item.
- Returns:
The corresponding human-readable name of the data item.
- Return type:
- Raises:
NoDataHandlerFound – If no handler is registered for filename.
- classmethod name2filename(name: str, kind: str) str[source]¶
Convert a logical name to its on-disk filename for a given kind.
- Parameters:
- Returns:
The corresponding on-disk filename.
- Return type:
- Raises:
NoDataHandlerFound – If no handler is registered for kind.
- classmethod register(kind: str)[source]¶
Class decorator that registers a handler under the given kind key.
- Parameters:
kind (str) – The logical kind label (e.g. "graphs", "metrics").
- Returns:
A decorator that stores the decorated class in the registry.
- Return type:
Callable
Examples
>>> @DataRegistry.register('my_kind')
... class MyHandler: ...
- classmethod resolve(filename: str) tuple[str, DataHandler] | None[source]¶
Find the handler matching the given filename.
- Parameters:
filename (str) – The filename to match against all registered handlers.
- Returns:
A (kind, handler) pair if a match is found, otherwise None.
- Return type:
tuple[str, DataHandler] or None
- classmethod serialize(filename: str, item: Any, fp: IO, **context) None[source]¶
Serialize item to fp using the handler matched by filename.
- Parameters:
filename (str) – Used to look up the appropriate handler.
item (Any) – Data object to serialize.
fp (IO) – Writable binary file-like object.
**context (Any) – Extra keyword arguments forwarded to the handler.
- Raises:
NoDataHandlerFound – If no handler matches filename.
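The decorator-based registration and filename-based lookup can be illustrated with a minimal stand-alone registry. Everything here is hypothetical (`RegistrySketch`, `NpyFiles`, `MetricFiles`); in particular, silently skipping unmatched filenames in `classify` is an assumption of this sketch, not documented behaviour.

```python
from typing import Callable, Optional


class RegistrySketch:
    """Hypothetical mini-registry; names and rules are illustrative only."""

    _handlers: dict[str, type] = {}

    @classmethod
    def register(cls, kind: str) -> Callable[[type], type]:
        def decorator(handler: type) -> type:
            cls._handlers[kind] = handler  # store the class under its kind label
            return handler
        return decorator

    @classmethod
    def resolve(cls, filename: str) -> Optional[tuple[str, type]]:
        # First handler whose matches() accepts the filename wins.
        for kind, handler in cls._handlers.items():
            if handler.matches(filename):
                return kind, handler
        return None

    @classmethod
    def classify(cls, filenames: list[str]) -> dict[str, list[str]]:
        groups: dict[str, list[str]] = {}
        for filename in filenames:
            match = cls.resolve(filename)
            if match is not None:  # assumption: unmatched files are ignored
                groups.setdefault(match[0], []).append(filename)
        return groups


@RegistrySketch.register("matrices")
class NpyFiles:
    @classmethod
    def matches(cls, filename: str) -> bool:
        return filename.endswith(".npy")


@RegistrySketch.register("metrics")
class MetricFiles:
    @classmethod
    def matches(cls, filename: str) -> bool:
        return filename.startswith("metrics_") and filename.endswith(".csv")


groups = RegistrySketch.classify(["a.npy", "metrics_local.csv", "readme.txt", "b.npy"])
```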
- class fstg_toolkit.io.DataSaver(_inventory: dict[str, list[tuple[str, ~typing.Any]]] = <factory>)[source]¶
Bases: object
Accumulates data items in memory and writes them to a ZIP archive.
Items are staged via add_* methods and flushed to disk by calling save(). If the target archive already exists, only files whose names overlap with the new data are replaced; all other existing entries are preserved.
- add_areas(areas: DataFrame) None[source]¶
Stage an areas descriptor data frame for saving.
- Parameters:
areas (pd.DataFrame) – Areas descriptor data frame.
- add_frequent_patterns(patterns: dict[tuple[str, str], FrequentPatterns]) None[source]¶
Stage frequent pattern dicts for saving.
- Parameters:
patterns (dict[tuple[str, str], FrequentPatterns]) – Mapping from name to frequent patterns.
- add_graphs(graphs: dict[str, SpatioTemporalGraph]) None[source]¶
Stage a collection of graphs for saving.
- Parameters:
graphs (dict[str, SpatioTemporalGraph]) – Mapping from graph name to graph object.
- add_matrices(matrices: dict[str, ndarray]) None[source]¶
Stage a collection of NumPy matrices for saving.
- add_metrics(metrics: dict[str, DataFrame]) None[source]¶
Stage a collection of metric data frames for saving.
- save(filepath: Path) tuple[int, Generator[str, None, None]][source]¶
Flush all staged items to a ZIP archive at filepath.
If the archive already exists and some staged filenames collide with existing entries, __transfer_save() is used to replace only those entries while preserving the rest. Otherwise, new entries are simply appended.
This method returns a generator: items are only written as the generator is iterated, so the caller must consume it for the dataset to be saved.
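Writing a name that already exists with the standard `zipfile` module adds a second entry rather than overwriting the first, so replacing entries is commonly done by copying the archive. The sketch below shows that general technique on in-memory bytes; `replace_entries` is a hypothetical function and is not claimed to mirror `__transfer_save()`.

```python
import io
import zipfile


def replace_entries(archive_bytes: bytes, new_entries: dict[str, bytes]) -> bytes:
    """Return a new archive where colliding entries are replaced and others kept."""
    source = zipfile.ZipFile(io.BytesIO(archive_bytes))
    target_buffer = io.BytesIO()
    with source, zipfile.ZipFile(target_buffer, "w") as target:
        for info in source.infolist():
            if info.filename not in new_entries:   # keep untouched entries
                target.writestr(info, source.read(info.filename))
        for name, payload in new_entries.items():  # write replacements and new files
            target.writestr(name, payload)
    return target_buffer.getvalue()


original = io.BytesIO()
with zipfile.ZipFile(original, "w") as zf:
    zf.writestr("areas.csv", b"old areas")
    zf.writestr("metrics_local.csv", b"kept as is")

updated = replace_entries(
    original.getvalue(), {"areas.csv": b"new areas", "sub1.json": b"{}"}
)
with zipfile.ZipFile(io.BytesIO(updated)) as zf:
    names = sorted(zf.namelist())
    areas = zf.read("areas.csv")
    metrics = zf.read("metrics_local.csv")
```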
- class fstg_toolkit.io.FrequentPatternsHandler(*args, **kwargs)[source]¶
Bases: DataHandler[FrequentPatterns]
Handler for frequent patterns from SPMiner.
The filenames are <subject>/motifs_enriched_<mode>.json, where mode is s, t or st.
- classmethod deserialize(fp: IO, **context: Any) FrequentPatterns[source]¶
Deserialize the item from a file-like object.
- Parameters:
fp (IO) – Readable binary file-like object.
**context (Any) – Optional extra keyword arguments required by the handler.
- Returns:
The deserialized data object.
- Return type:
Any
- classmethod matches(filename: str) bool[source]¶
Check if the handler can handle a file from its filename.
- classmethod serialize(item: FrequentPatterns, fp: IO, **context: Any) None[source]¶
Serialize the item to a file-like object.
- Parameters:
item (Any) – The data object to serialize.
fp (IO) – Writable binary file-like object.
**context (Any) – Optional extra keyword arguments passed through to the handler.
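The filename convention `<subject>/motifs_enriched_<mode>.json` can be parsed with a regular expression along these lines. The pattern below is an assumption written from the description above; the handler's actual regex may differ, and `parse_pattern_filename` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed pattern for "<subject>/motifs_enriched_<mode>.json" (not the library's regex).
PATTERN = re.compile(r"^(?P<subject>.+)/motifs_enriched_(?P<mode>st|s|t)\.json$")


def parse_pattern_filename(filename: str) -> Optional[tuple[str, str]]:
    """Return (subject, mode) for a frequent-pattern filename, else None."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    return match.group("subject"), match.group("mode")


spatial = parse_pattern_filename("subject_A/motifs_enriched_s.json")
both = parse_pattern_filename("subject_A/motifs_enriched_st.json")
not_a_pattern = parse_pattern_filename("subject_A/graph.json")
```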
- class fstg_toolkit.io.FrequentPatternsIO[source]¶
Bases: object
I/O utilities for loading frequent subgraph patterns from SPMiner output files.
Provides class methods for loading frequent patterns from JSON files generated by the SPMiner service. Patterns are parsed into FrequentPattern objects (networkx DiGraph subclasses) and aggregated into FrequentPatterns collections.
- classmethod from_spminer_file(file: Path) FrequentPatterns[source]¶
Load frequent patterns from a single SPMiner JSON output file.
Parses a JSON file containing frequent subgraph patterns and returns a FrequentPatterns collection with all patterns decoded from their dictionary representation.
- Parameters:
file (Path) – Path to a JSON file containing frequent patterns from SPMiner output. Expected format: dict mapping pattern names to pattern dictionaries with ‘nodes’ and ‘edges’ keys.
- Returns:
A FrequentPatterns dataclass containing the parsed patterns.
- Return type:
Examples
>>> patterns = FrequentPatternsIO.from_spminer_file(Path('motifs_enriched_s.json'))
>>> len(patterns)
5
- classmethod from_spminer_files(output_dir: Path, filenames: Iterable[Path]) dict[tuple[str, str], FrequentPatterns][source]¶
Load frequent patterns from multiple SPMiner JSON output files.
Loads patterns from multiple files and returns a dictionary where keys are derived from relative file paths (directory structure and filename without extension), and values are FrequentPatterns collections.
- Parameters:
- Returns:
Mapping from relative file paths (without extension) to FrequentPatterns objects. For example, a file at output_dir/subject_A/motifs_enriched_s.json would be keyed as 'subject_A/motifs_enriched_s'.
- Return type:
Examples
>>> patterns_dict = FrequentPatternsIO.from_spminer_files(
...     Path('output'), [Path('output/subject_A/motifs_enriched_s.json')]
... )
>>> 'subject_A/motifs_enriched_s' in patterns_dict
True
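A key of the form "relative path without extension" can be derived with `pathlib` alone. This is a sketch of that derivation only; `relative_key` is a hypothetical helper and the method may build its keys differently.

```python
from pathlib import Path


def relative_key(output_dir: Path, file: Path) -> str:
    """Hypothetical helper: path relative to output_dir, extension dropped."""
    # as_posix() keeps forward slashes regardless of the operating system.
    return file.relative_to(output_dir).with_suffix("").as_posix()


key = relative_key(Path("output"), Path("output/subject_A/motifs_enriched_s.json"))
```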
- class fstg_toolkit.io.GraphHandler(*args, **kwargs)[source]¶
Bases: DataHandler[SpatioTemporalGraph]
Handler for SpatioTemporalGraph stored as JSON files.
Filenames must end with .json and must not match the motifs_enriched_*.json pattern (which is reserved for motif data).
- classmethod deserialize(fp: IO, **context: Any) SpatioTemporalGraph[source]¶
Deserialize the item from a file-like object.
- Parameters:
fp (IO) – Readable binary file-like object.
**context (Any) – Optional extra keyword arguments required by the handler.
- Returns:
The deserialized data object.
- Return type:
Any
- classmethod matches(filename: str) bool[source]¶
Check if the handler can handle a file from its filename.
- classmethod serialize(item: SpatioTemporalGraph, fp: IO, **context: Any) None[source]¶
Serialize the item to a file-like object.
- Parameters:
item (Any) – The data object to serialize.
fp (IO) – Writable binary file-like object.
**context (Any) – Optional extra keyword arguments passed through to the handler.
- class fstg_toolkit.io.GraphsDataset(loader: DataLoader, areas_desc: DataFrame, factors: list[set[str]], subjects: DataFrame)[source]¶
Bases: object
Dataset for managing spatio-temporal graphs and associated (meta)data.
The dataset is lazily loaded from a specified file path, which contains graph and matrix files. The dataset includes a description of areas (nodes) in the graphs, factors for grouping subjects, and a table of subjects with their associated graph and matrix files.
Some methods allow for retrieving graphs and matrices by subject IDs, checking for the presence of matrices, and serializing/deserializing the dataset.
- loader¶
Loader object for reading graph and matrix files.
- Type:
- areas_desc¶
A dataframe describing the areas (nodes) in the graphs.
- Type:
- subjects¶
A dataframe containing subject information, indexed by factors and subject ID.
- Type:
- get_graph(ids: Tuple[str, ...]) SpatioTemporalGraph[source]¶
Retrieves the graph associated with the given subject IDs.
- get_matrix(ids: Tuple[str, ...]) np.ndarray[source]¶
Retrieves the matrix associated with the given subject IDs.
- deserialize(data: Dict[str, Any]) 'GraphsDataset'[source]¶
Deserializes a dataset from a dictionary format.
- from_filepath(filepath: Path) 'GraphsDataset'[source]¶
Creates a GraphsDataset instance from a file path, loading the dataset lazily.
- areas_desc: DataFrame¶
- static deserialize(data: dict[str, Any]) GraphsDataset[source]¶
Deserializes a dataset from a dictionary format.
- Parameters:
data (dict mapping property names to their values) – A dictionary containing the serialized dataset properties, including:
- ‘filepath’: The file path of the dataset.
- ‘areas_desc’: A list of dictionaries representing the areas description.
- ‘factors’: A list of sets, each containing factor names.
- ‘subjects’: A list of dictionaries representing the subjects table.
- Returns:
An instance of GraphsDataset created from the provided data.
- Return type:
- static from_filepath(filepath: Path) GraphsDataset[source]¶
Creates a GraphsDataset instance from a file path, loading the dataset lazily.
- Parameters:
filepath (pathlib.Path) – The path to the dataset file, which should contain graph and matrix files.
- Returns:
An instance of GraphsDataset created from the specified file path.
- Return type:
- get_all_frequent_patterns(mode: str) dict[tuple[str, ...], FrequentPatterns][source]¶
Get frequent patterns for all subjects filtered by mining mode.
- Parameters:
mode (str) – The mining mode to filter by ('s', 't', or 'st').
- Returns:
A dictionary keyed by subject index tuples (same as used by get_graph()) with the corresponding FrequentPatterns.
- Return type:
dict mapping subject index tuple to FrequentPatterns
- get_available_frequent_pattern_modes() list[str][source]¶
Return sorted list of available frequent pattern mining modes.
Parses filenames from DataLoader.lazy_load_frequent_patterns() using the FrequentPatternsHandler.pattern regex and collects unique mode groups.
- get_frequent_patterns(ids: tuple[str, ...], mode: str) FrequentPatterns | None[source]¶
Get frequent patterns for a subject.
- Parameters:
ids (tuple[str, ...]) – Subject index values (same as used by get_graph()).
mode (str) – The mining mode to filter by ('s', 't', or 'st').
- Returns:
The objects to manipulate frequent patterns for the specified subject and mode.
- Return type:
Optional[FrequentPatterns]
- get_frequent_patterns_analysis(mode: str, equivalence_strategy: Type[PatternEquivalenceStrategy]) FrequentPatternsPopulationAnalysis[source]¶
Analyze frequent patterns across all subjects with an equivalence strategy.
- Parameters:
mode (str) – The pattern mining mode (e.g., ‘s’, ‘t’ or ‘st’).
equivalence_strategy (Type[PatternEquivalenceStrategy]) – Strategy class to determine if two patterns are equivalent.
- Returns:
Population analysis object with unique patterns and tracking information.
- Return type:
- get_graph(ids: tuple[str, ...]) SpatioTemporalGraph[source]¶
Retrieves the graph associated with the given subject IDs.
- Parameters:
ids (tuple of str) – A tuple of strings representing the subject IDs, which should match the index of the subjects DataFrame.
- Returns:
The spatio-temporal graph corresponding to the specified subject IDs.
- Return type:
- Raises:
KeyError – If the provided IDs do not match any subject in the dataset.
- get_matrix(ids: tuple[str, ...]) ndarray[source]¶
Retrieves the matrix associated with the given subject IDs.
- has_frequent_patterns() bool[source]¶
Check if the dataset contains frequent pattern files.
- Returns:
True if at least one frequent pattern file is present.
- Return type:
- has_matrices() bool[source]¶
Checks if the dataset contains matrices for subjects.
- Returns:
True if the dataset has a ‘Matrix’ column in the subjects DataFrame, False otherwise.
- Return type:
- loader: DataLoader¶
- serialize() dict[str, Any][source]¶
Serializes the dataset into a dictionary format for storage or transmission.
- Returns:
A dictionary mapping dataset attributes to their values, including:
- ‘filepath’: The file path of the dataset.
- ‘areas_desc’: A list of dictionaries representing the areas description.
- ‘factors’: A list of sets, each containing factor names.
- ‘subjects’: A list of dictionaries representing the subjects table.
- subjects: DataFrame¶
- class fstg_toolkit.io.MatrixHandler(*args, **kwargs)[source]¶
Bases: DataHandler[ndarray]
Handler for correlation matrices stored as .npy files.
- classmethod deserialize(fp: IO, **context: Any) ndarray[source]¶
Deserialize the item from a file-like object.
- Parameters:
fp (IO) – Readable binary file-like object.
**context (Any) – Optional extra keyword arguments required by the handler.
- Returns:
The deserialized data object.
- Return type:
Any
- classmethod matches(filename: str) bool[source]¶
Check if the handler can handle a file from its filename.
- class fstg_toolkit.io.MetricsHandler(*args, **kwargs)[source]¶
Bases: DataHandler[DataFrame]
Handler for metric data frames stored as metrics_<name>.csv files.
- classmethod deserialize(fp: IO, **context: Any) DataFrame[source]¶
Deserialize the item from a file-like object.
- Parameters:
fp (IO) – Readable binary file-like object.
**context (Any) – Optional extra keyword arguments required by the handler.
- Returns:
The deserialized data object.
- Return type:
Any
- classmethod matches(filename: str) bool[source]¶
Check if the handler can handle a file from its filename.
- exception fstg_toolkit.io.NoDataHandlerFound(name: str)[source]¶
Bases: TypeError
Raised when no registered handler matches a given filename or name.
- fstg_toolkit.io.load_spatio_temporal_graph(filepath: Path | str) SpatioTemporalGraph[source]¶
Load a spatio-temporal graph from its zip file.
If multiple graphs are in the archive, the first found will be loaded.
- Parameters:
- Returns:
The spatio-temporal graph contained in the zip file.
- Return type:
Example
>>> import tempfile
>>> from pathlib import Path
>>> import networkx as nx
>>> import pandas as pd
>>> G = nx.DiGraph()
>>> G.add_nodes_from({
...     1: {'t': 0, 'areas': {1}, 'region': 'R1', 'internal_strength': 1},
...     2: {'t': 0, 'areas': {2}, 'region': 'R1', 'internal_strength': 1},
...     3: {'t': 0, 'areas': {3}, 'region': 'R2', 'internal_strength': 1},
...     4: {'t': 1, 'areas': {1, 2}, 'region': 'R1', 'internal_strength': 0.52873788},
...     5: {'t': 1, 'areas': {3}, 'region': 'R2', 'internal_strength': 1}}.items())
>>> G.add_edges_from([
...     (1, 3, {'t': 0, 'type': 'spatial', 'correlation': -0.41853318}),
...     (1, 4, {'type': 'temporal', 'transition': RC5.PP}),
...     (2, 3, {'t': 0, 'type': 'spatial', 'correlation': 0.75087697}),
...     (2, 4, {'type': 'temporal', 'transition': RC5.PP}),
...     (3, 1, {'t': 0, 'type': 'spatial', 'correlation': -0.41853318}),
...     (3, 2, {'t': 0, 'type': 'spatial', 'correlation': 0.75087697}),
...     (3, 5, {'type': 'temporal', 'transition': RC5.EQ}),
...     (4, 5, {'t': 1, 'type': 'spatial', 'correlation': 0.75087697}),
...     (5, 4, {'t': 1, 'type': 'spatial', 'correlation': 0.75087697})])
>>> areas_desc = pd.DataFrame({
...     'Id_Area': [1, 2, 3],
...     'Name_Area': ['Area 1', 'Area 2', 'Area 3'],
...     'Name_Region': ['R1', 'R2', 'R3']})
>>> areas_desc.set_index('Id_Area', inplace=True)
>>> graph_path = Path(tempfile.gettempdir()) / 'test_load.zip'
>>> save_spatio_temporal_graph(SpatioTemporalGraph(G, areas_desc), graph_path)
>>> graph_struct = load_spatio_temporal_graph(graph_path)
- Raises:
RuntimeError – If no graph is found in the zip file.
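The "first graph found" behaviour, combined with the GraphHandler filename rule, could be sketched as follows. `is_graph_file` and `first_graph_name` are hypothetical helpers that only approximate the documented rule (ends with `.json`, not a `motifs_enriched_*.json` file); the function's real lookup may differ.

```python
import fnmatch
import io
import zipfile
from pathlib import PurePosixPath


def is_graph_file(filename: str) -> bool:
    """Assumed rule: a .json entry that is not a motifs_enriched_*.json file."""
    name = PurePosixPath(filename).name
    return name.endswith(".json") and not fnmatch.fnmatchcase(name, "motifs_enriched_*.json")


def first_graph_name(archive: zipfile.ZipFile) -> str:
    """Return the first entry that looks like a graph, in archive order."""
    for name in archive.namelist():
        if is_graph_file(name):
            return name
    raise RuntimeError("no graph found in the archive")


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("areas.csv", "Id_Area,Name_Area\n")
    zf.writestr("sub1/motifs_enriched_s.json", "{}")
    zf.writestr("sub1.json", "{}")
    zf.writestr("sub2.json", "{}")

with zipfile.ZipFile(buffer) as zf:
    picked = first_graph_name(zf)
```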
- fstg_toolkit.io.save_spatio_temporal_graph(graph: SpatioTemporalGraph, filepath: Path | str) None[source]¶
Save a spatio-temporal graph to a zip file.
- Parameters:
graph (SpatioTemporalGraph) – The spatio-temporal graph to save.
Example
>>> import tempfile
>>> from pathlib import Path
>>> import networkx as nx
>>> import pandas as pd
>>> G = nx.DiGraph()
>>> G.add_nodes_from({
...     1: {'t': 0, 'areas': {1}, 'region': 'R1', 'internal_strength': 1},
...     2: {'t': 0, 'areas': {2}, 'region': 'R1', 'internal_strength': 1},
...     3: {'t': 0, 'areas': {3}, 'region': 'R2', 'internal_strength': 1},
...     4: {'t': 1, 'areas': {1, 2}, 'region': 'R1', 'internal_strength': 0.52873788},
...     5: {'t': 1, 'areas': {3}, 'region': 'R2', 'internal_strength': 1}}.items())
>>> G.add_edges_from([
...     (1, 3, {'t': 0, 'type': 'spatial', 'correlation': -0.41853318}),
...     (1, 4, {'type': 'temporal', 'transition': RC5.PP}),
...     (2, 3, {'t': 0, 'type': 'spatial', 'correlation': 0.75087697}),
...     (2, 4, {'type': 'temporal', 'transition': RC5.PP}),
...     (3, 1, {'t': 0, 'type': 'spatial', 'correlation': -0.41853318}),
...     (3, 2, {'t': 0, 'type': 'spatial', 'correlation': 0.75087697}),
...     (3, 5, {'type': 'temporal', 'transition': RC5.EQ}),
...     (4, 5, {'t': 1, 'type': 'spatial', 'correlation': 0.75087697}),
...     (5, 4, {'t': 1, 'type': 'spatial', 'correlation': 0.75087697})])
>>> areas_desc = pd.DataFrame({
...     'Name_Area': ['Area 1', 'Area 2', 'Area 3'],
...     'Name_Region': ['R1', 'R2', 'R3']}, index=[1, 2, 3])
>>> graph_path = Path(tempfile.gettempdir()) / 'test_save.zip'
>>> graph_struct = SpatioTemporalGraph(G, areas_desc)
>>> save_spatio_temporal_graph(graph_struct, graph_path)