fresco.validate package

Submodules

fresco.validate.exceptions module

Simple module to catch exception errors and print helpful messages.

exception fresco.validate.exceptions.ParamError(msg: str)[source]

Bases: Exception

Simple class to catch exception errors and print helpful debugging messages.
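A minimal sketch of how an exception with this signature is typically raised and handled. To stay self-contained, the block defines a local stand-in instead of importing from fresco. The helpers require_key and describe, and the keys passed to them, are hypothetical and not part of the package.

```python
class ParamError(Exception):
    """Local stand-in mirroring the documented signature ParamError(msg: str)."""

    def __init__(self, msg: str):
        super().__init__(msg)
        self.msg = msg


def require_key(args: dict, key: str) -> None:
    """Hypothetical helper: raise ParamError when a required key is missing."""
    if key not in args:
        raise ParamError(f"missing required keyword: {key!r}")


def describe(args: dict, key: str) -> str:
    """Hypothetical caller: turn a ParamError into a readable message."""
    try:
        require_key(args, key)
    except ParamError as err:
        return f"invalid parameters: {err.msg}"
    return "ok"
```

Catching the dedicated exception type, rather than a bare Exception, keeps parameter problems separate from unrelated failures.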

fresco.validate.validate_params module

Module for sanity checking parameters for data and model building.

class fresco.validate.validate_params.ValidateClcParams(cli_args, data_source: str = 'pre-generated')[source]

Bases: object

Class to validate model-specific parameters for MOSSAIC models.

Parameters:
  • cli_args – Argparse list of command line args.

  • data_source (str) – Indicates where the data will come from. Should be one of:
      - pre-generated: data_args.yml will indicate the source.

Post-condition: model_args dict loaded and sanity checked.

check_abstain_args()[source]

Verify keywords needed for abstention to work are present and valid.

check_data_files(data_path)[source]

Verify the necessary data files exist.

Parameters:

data_path (str) – From argparser, optional path to dataset.

Note: Setting data_path will override the path set in model_args.yml.

check_data_train_args()[source]

Verify arguments are appropriate for the chosen model options.

Parameters: none

Pre-condition: self.model_args is not None.

Post-condition: self.model_args['train_kwargs']['doc_max_len'] is updated from the data_kwargs.

clc_arg_check()[source]

Check and modify HiSAN-specific args.

Parameters: none

Pre-condition: self.model_args is not None

Post-condition:

  • self.model_args['MTHiSAN_kwargs']['max_lines'] is modified to be the ceiling of doc_max_len / max_words_per_line.

  • self.model_args['train_kwargs']['doc_max_len'] is modified to be max_words_per_line * max_lines.

class fresco.validate.validate_params.ValidateParams(cli_args, data_source: str = 'pre-generated', model_args: dict | None = None)[source]

Bases: object

Class to validate model-specific parameters for MOSSAIC models.

Parameters:
  • cli_args – argparse list of command line args.

  • data_source (str) – Indicates where the data will come from. Should be one of:
      - pre-generated: data_args.yml will indicate the source.

Post-condition: model_args dict loaded and sanity checked.

check_abstain_args()[source]

Verify keywords needed for abstention to work are present and valid.

check_data_files(data_path=None)[source]

Verify the necessary data files exist.

Parameters:

data_path (str) – From argparser, optional path to dataset.

Note: Setting data_path will override the path set in model_args.yml.

check_data_train_args(from_pretrained=False)[source]

Verify arguments are appropriate for the chosen model options.

Parameters:
  • from_pretrained (bool) – Whether the model args being checked come from a pretrained model. Pretrained model args are different: some are copied from data_kwargs to train_kwargs.

Pre-condition: self.model_args is not None.

Post-condition: self.model_args['train_kwargs']['doc_max_len'] is updated from the data_kwargs, and 'max_lines' is added to the HiSAN kwargs.

check_keyword_args()[source]

Validate keyword args.

check_weights()[source]

Validate class weights path exists.
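A path-existence check of this kind could look like the sketch below. It is an illustration only: the function name validate_weights_path, the choice of FileNotFoundError, and the example file name are assumptions, and the package's own method may behave differently.

```python
from pathlib import Path


def validate_weights_path(path: str) -> Path:
    """Hypothetical check: return the path if it exists, otherwise raise."""
    candidate = Path(path)
    if not candidate.exists():
        raise FileNotFoundError(f"class weights path does not exist: {candidate}")
    return candidate


def weights_path_is_valid(path: str) -> bool:
    """Convenience wrapper that reports the result as a boolean."""
    try:
        validate_weights_path(path)
    except FileNotFoundError:
        return False
    return True
```

Failing at validation time, before any model is built, avoids discovering a missing weights file partway through training.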

hisan_arg_check()[source]

Check and modify HiSAN-specific args.

Parameters: none

Pre-condition: self.model_args is not None

Post-condition:

  • self.model_args['MTHiSAN_kwargs']['max_lines'] is modified to be the ceiling of doc_max_len / max_words_per_line.

  • self.model_args['train_kwargs']['doc_max_len'] is modified to be max_words_per_line * max_lines.
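The arithmetic in this post-condition can be illustrated with a short sketch. It assumes that max_words_per_line is stored under MTHiSAN_kwargs, which the text above does not state. The function name adjust_hisan_lengths and the example values are hypothetical.

```python
import math


def adjust_hisan_lengths(model_args: dict) -> dict:
    """Round doc_max_len up to a whole number of lines (illustrative only)."""
    train = model_args["train_kwargs"]
    hisan = model_args["MTHiSAN_kwargs"]
    # Assumption: max_words_per_line lives alongside max_lines.
    words_per_line = hisan["max_words_per_line"]

    # max_lines is the ceiling of doc_max_len / max_words_per_line.
    max_lines = math.ceil(train["doc_max_len"] / words_per_line)
    hisan["max_lines"] = max_lines

    # doc_max_len becomes an exact multiple of max_words_per_line.
    train["doc_max_len"] = words_per_line * max_lines
    return model_args


example = {
    "train_kwargs": {"doc_max_len": 3000},
    "MTHiSAN_kwargs": {"max_words_per_line": 16},
}
adjust_hisan_lengths(example)
# 3000 / 16 = 187.5, so max_lines is 188 and doc_max_len becomes 3008.
```

Rounding up rather than down means the adjusted doc_max_len is never smaller than the requested value, so no words are lost to the reshaping.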

mtcnn_arg_check()[source]

Check that the number of filters matches the number of windows.
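As a rough sketch of such a consistency check, the block below compares the lengths of two lists. The function name, the argument names num_filters and window_sizes, and the use of ValueError are all assumptions made for illustration. The package's actual keyword names and exception type are not given here.

```python
def check_filters_match_windows(num_filters: list, window_sizes: list) -> None:
    """Raise ValueError unless there is one filter count per window size."""
    if len(num_filters) != len(window_sizes):
        raise ValueError(
            f"got {len(num_filters)} filter counts "
            f"for {len(window_sizes)} window sizes"
        )


# One filter count per convolution window: passes silently.
check_filters_match_windows([300, 300, 300], [3, 4, 5])
```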

Module contents