ncsw_data.source.reaction.uspto.utility.formatting

The formatting module of the ncsw_data.source.reaction.uspto.utility package.

Classes

USPTOReactionDatasetFormattingUtility

The United States Patent and Trademark Office (USPTO) chemical reaction dataset formatting utility class.

Module Contents

class ncsw_data.source.reaction.uspto.utility.formatting.USPTOReactionDatasetFormattingUtility

The United States Patent and Trademark Office (USPTO) chemical reaction dataset formatting utility class.

static format_v_1976_to_2013_rsmi_by_20121009_lowe_d_m(input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from the v_1976_to_2013_rsmi_by_20121009_lowe_d_m version of the dataset.

Parameters:
  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.
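
The formatting methods in this class share the same two directory parameters, so a call can be wrapped in a small helper. The following is a minimal sketch, not part of the package: `run_formatter` is a hypothetical name, and the sketch assumes that the formatter does not create the output directory itself, which this reference does not state.

```python
import os
from pathlib import Path
from typing import Callable, Union

PathLike = Union[str, os.PathLike]


def run_formatter(
    formatter: Callable[..., None],
    input_directory_path: PathLike,
    output_directory_path: PathLike,
) -> Path:
    """Hypothetical wrapper: check the input, create the output, then format."""
    input_directory = Path(input_directory_path)
    output_directory = Path(output_directory_path)

    if not input_directory.is_dir():
        raise NotADirectoryError(f"Input directory not found: {input_directory}")

    # Assumption: the formatter expects the output directory to exist already.
    output_directory.mkdir(parents=True, exist_ok=True)

    formatter(
        input_directory_path=input_directory,
        output_directory_path=output_directory,
    )

    return output_directory


# With ncsw_data installed, a documented static method can be passed directly:
#
# from ncsw_data.source.reaction.uspto.utility.formatting import (
#     USPTOReactionDatasetFormattingUtility,
# )
#
# run_formatter(
#     USPTOReactionDatasetFormattingUtility.format_v_1976_to_2013_rsmi_by_20121009_lowe_d_m,
#     "extracted",   # placeholder directory name
#     "formatted",   # placeholder directory name
# )
```

Passing the method as a callable keeps the helper usable with any of the two-argument formatters listed in this class.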

static format_v_50k_by_20141226_schneider_n_et_al(input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from the v_50k_by_20141226_schneider_n_et_al version of the dataset.

Parameters:
  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

static format_v_50k_by_20161122_schneider_n_et_al(input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from the v_50k_by_20161122_schneider_n_et_al version of the dataset.

Parameters:
  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

static format_v_15k_by_20170418_coley_c_w_et_al(input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from the v_15k_by_20170418_coley_c_w_et_al version of the dataset.

Parameters:
  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

static _parse_v_1976_to_2016_cml_by_20121009_lowe_d_m_file(input_file_path: str | os.PathLike[str]) → List[Tuple[int | str | None, ...]]

Parse a file from the v_1976_to_2016_cml_by_20121009_lowe_d_m version of the dataset.

Parameters:

input_file_path – The path to the input file.

Returns:

The parsed input file.
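
The return annotation describes a list of variable-length tuples whose items are each an `int`, a `str`, or `None`. The sketch below only illustrates that shape. The column layout is not specified in this reference, so the rows are made-up placeholders, and `Row` and `is_valid_row` are hypothetical names.

```python
from typing import List, Tuple, Union

# A tuple of any length whose items are int, str, or None.
Row = Tuple[Union[int, str, None], ...]


def is_valid_row(row: object) -> bool:
    """Check that a value is a tuple whose items are all int, str, or None."""
    return isinstance(row, tuple) and all(
        item is None or isinstance(item, (int, str)) for item in row
    )


# Hypothetical rows; the real fields returned by the parser are not documented here.
example_rows: List[Row] = [
    (0, "placeholder_text", None),
    (1, "placeholder_text", "more_placeholder_text"),
]

assert all(is_valid_row(row) for row in example_rows)
```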

static format_v_1976_to_2016_by_20121009_lowe_d_m(version: str, input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str], number_of_processes: int = 1) → None

Format the data from a v_1976_to_2016_*_by_20121009_lowe_d_m version of the dataset.

Parameters:
  • version – The version of the dataset.

  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

  • number_of_processes – The number of processes to use. Defaults to 1.
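
Because this method accepts a family of versions, a caller may want to check the arguments before starting a long formatting run. The following is a sketch under assumptions: the helper names are hypothetical, the reference does not say whether the method validates `version` itself, and clamping the process count to the CPU count is a caller-side choice rather than documented behavior.

```python
import os
from fnmatch import fnmatchcase

# Wildcard pattern taken from the method description above.
LOWE_VERSION_PATTERN = "v_1976_to_2016_*_by_20121009_lowe_d_m"


def matches_lowe_version(version: str) -> bool:
    """Return True if the version string fits the documented wildcard pattern."""
    return fnmatchcase(version, LOWE_VERSION_PATTERN)


def clamp_number_of_processes(requested: int) -> int:
    """Keep the value between 1 and the number of CPUs reported by the OS."""
    available = os.cpu_count() or 1
    return max(1, min(requested, available))
```

For example, `matches_lowe_version("v_1976_to_2016_cml_by_20121009_lowe_d_m")` is true, while the 1976 to 2013 version string handled by a separate method in this class does not match.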

static format_v_50k_by_20170905_liu_b_et_al(input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from the v_50k_by_20170905_liu_b_et_al version of the dataset.

Parameters:
  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

static format_v_50k_by_20171116_coley_c_w_et_al(input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from the v_50k_by_20171116_coley_c_w_et_al version of the dataset.

Parameters:
  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

static format_v_480k_or_mit_by_20171204_jin_w_et_al(input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from the v_480k_or_mit_by_20171204_jin_w_et_al version of the dataset.

Parameters:
  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

static format_v_by_20180622_schwaller_p_et_al(version: str, input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from a v_*_by_20180622_schwaller_p_et_al version of the dataset.

Parameters:
  • version – The version of the dataset.

  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

static format_v_lef_by_20181221_bradshaw_j_et_al(input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from the v_lef_by_20181221_bradshaw_j_et_al version of the dataset.

Parameters:
  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

static format_v_1k_tpl_by_20210128_schwaller_p_et_al(input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from the v_1k_tpl_by_20210128_schwaller_p_et_al version of the dataset.

Parameters:
  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

static format_v_1976_to_2016_remapped_by_20210407_schwaller_p_et_al(input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from the v_1976_to_2016_remapped_by_20210407_schwaller_p_et_al version of the dataset.

Parameters:
  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.

static format_v_chen_s_et_al(version: str, input_directory_path: str | os.PathLike[str], output_directory_path: str | os.PathLike[str]) → None

Format the data from a v_*_chen_s_et_al version of the dataset.

Parameters:
  • version – The version of the dataset.

  • input_directory_path – The path to the input directory where the data is extracted.

  • output_directory_path – The path to the output directory where the data should be formatted.
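
The method names in this class follow the dataset version names, with three methods covering wildcard families that take an extra `version` argument. A caller that receives the version as a string could dispatch on it as sketched below. This is an illustration only: `resolve_formatter_name` and `format_version` are hypothetical helpers, and the `format_` plus version naming rule is inferred from the signatures listed in this module rather than guaranteed by the package.

```python
from fnmatch import fnmatchcase
from typing import Dict, Tuple

# Wildcard version patterns and the documented method that handles each family.
WILDCARD_FORMATTERS: Dict[str, str] = {
    "v_1976_to_2016_*_by_20121009_lowe_d_m": "format_v_1976_to_2016_by_20121009_lowe_d_m",
    "v_*_by_20180622_schwaller_p_et_al": "format_v_by_20180622_schwaller_p_et_al",
    "v_*_chen_s_et_al": "format_v_chen_s_et_al",
}


def resolve_formatter_name(version: str) -> Tuple[str, bool]:
    """Return (method_name, needs_version_argument) for a dataset version string."""
    for pattern, method_name in WILDCARD_FORMATTERS.items():
        if fnmatchcase(version, pattern):
            return method_name, True
    # Assumption: the remaining methods are named "format_" + version.
    return f"format_{version}", False


def format_version(utility, version, input_directory_path, output_directory_path) -> None:
    """Look up the matching static method on `utility` and call it."""
    method_name, needs_version = resolve_formatter_name(version)
    method = getattr(utility, method_name)  # raises AttributeError if unknown
    kwargs = {
        "input_directory_path": input_directory_path,
        "output_directory_path": output_directory_path,
    }
    if needs_version:
        kwargs["version"] = version
    method(**kwargs)
```

Here `utility` would be the `USPTOReactionDatasetFormattingUtility` class. An unknown version string surfaces as an `AttributeError` from `getattr`, which keeps the failure visible instead of silently skipping the dataset.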