fresco.models package

Submodules

fresco.models.clc module

class fresco.models.clc.CaseLevelContext(num_classes, doc_embed_size=400, att_dim_per_head=50, att_heads=8, att_dropout=0.1, forward_mask=True, device='cuda')[source]

Bases: Module

forward(doc_embeds, num_docs)[source]

Case level context forward pass.

Parameters:
  • doc_embeds (torch.tensor) – Float tensor of shape [batch_size x max_seq_length x doc_embed_size]. Document embeddings.

  • num_docs (torch.tensor) – Integer tensor of shape [batch_size]. Number of reports per case.

Returns:

None
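
Example (a minimal usage sketch of the signature above; the num_classes value, tensor sizes, and CPU device are illustrative assumptions, not package defaults):

    import torch
    from fresco.models.clc import CaseLevelContext

    batch_size, max_seq_length, doc_embed_size = 4, 10, 400  # illustrative sizes

    # A list of per-task class counts is assumed here, mirroring the other
    # multitask models in this package.
    model = CaseLevelContext(num_classes=[25, 7],
                             doc_embed_size=doc_embed_size,
                             device='cpu')

    # Padded batch of document embeddings plus the true report count per case.
    doc_embeds = torch.rand(batch_size, max_seq_length, doc_embed_size)
    num_docs = torch.randint(1, max_seq_length + 1, (batch_size,))

    out = model(doc_embeds, num_docs)  # case-level context forward pass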

fresco.models.mtcnn module

class fresco.models.mtcnn.MTCNN(embedding_matrix, num_classes, window_sizes=None, num_filters=None, dropout=0.5, bag_of_embeddings=True, embeddings_scale=20)[source]

Bases: Module

Multitask simple text CNN for classifying cancer pathology reports.

Parameters:
  • embedding_matrix (numpy.array) – Numpy array of word embeddings. Each row should represent a word embedding. NOTE: The word index 0 is masked, so the first row is ignored.

  • num_classes (list[int]) – Number of possible output classes for each task.

  • window_sizes (list[int], default [3, 4, 5]) – Window size (consecutive tokens examined) in parallel convolution layers. Must match the length of num_filters.

  • num_filters (list[int], default [300, 300, 300]) – Number of filters used in parallel convolution layers. Must match the length of window_sizes.

  • dropout (float, default 0.5) – Dropout rate applied to the final document embedding after maxpooling.

  • bag_of_embeddings (bool, default True) – Adds a parallel bag of embeddings layer and concatenates it to the final document embedding.

  • embeddings_scale (float, default 20) – Scaling of word embeddings matrix columns.

Returns:

None

forward(docs: tensor, return_embeds: bool = False) → list[source]

MT-CNN forward pass.

Parameters:

docs (torch.tensor) – Batch of documents to classify. Each document should be a 0-padded row of mapped word indices.

Returns:

List of predicted logits for each task.

Return type:

list[torch.tensor]
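
Example (a minimal usage sketch; the vocabulary size, task counts, and document lengths are illustrative assumptions):

    import numpy as np
    import torch
    from fresco.models.mtcnn import MTCNN

    vocab_size, embed_dim = 1000, 300
    # Row 0 corresponds to the masked padding index and is ignored.
    embedding_matrix = np.random.rand(vocab_size, embed_dim).astype(np.float32)

    model = MTCNN(embedding_matrix=embedding_matrix,
                  num_classes=[25, 7])  # one entry per task (assumed values)

    # Batch of 0-padded rows of mapped word indices, per the forward docs.
    docs = torch.randint(1, vocab_size, (8, 256))
    docs[:, 200:] = 0  # pad the tail of each document

    logits = model(docs)  # list of per-task logit tensors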

fresco.models.mthisan module

class fresco.models.mthisan.MTHiSAN(embedding_matrix, num_classes, max_words_per_line, max_lines, att_dim_per_head=50, att_heads=8, att_dropout=0.1, bag_of_embeddings=False, embeddings_scale=2.5)[source]

Bases: Module

Multitask hierarchical self-attention network for classifying cancer pathology reports.

Parameters:
  • embedding_matrix (numpy.array) – Numpy array of word embeddings. Each row should represent a word embedding. NOTE: The word index 0 is masked, so the first row is ignored.

  • num_classes (list[int]) – Number of possible output classes for each task.

  • max_words_per_line (int) – Number of words per line. Used to split documents into smaller chunks.

  • max_lines (int) – Maximum number of lines per document. Additional lines beyond this limit are ignored.

  • att_dim_per_head (int, default 50) – Dimension size of output from each attention head. Total output dimension is att_dim_per_head * att_heads.

  • att_heads (int, default 8) – Number of attention heads for multihead attention.

  • att_dropout (float, default 0.1) – Dropout rate for attention softmaxes and intermediate embeddings.

  • bag_of_embeddings (bool, default False) – Adds a parallel bag of embeddings layer and concatenates it to the final document embedding.

  • embeddings_scale (float, default 2.5) – Scaling of word embeddings matrix columns.

forward(docs, return_embeds=False)[source]

MT-HiSAN forward pass.

Parameters:

docs (torch.tensor) – Batch of documents to classify. Each document should be a 0-padded row of mapped word indices.

Returns:

List of predicted logits for each task.

Return type:

list[torch.tensor]
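
Example (a minimal usage sketch; the sizes are illustrative, and flattening each document to max_words_per_line * max_lines indices is an assumption based on the line-splitting described in the class parameters):

    import numpy as np
    import torch
    from fresco.models.mthisan import MTHiSAN

    vocab_size, embed_dim = 1000, 300
    max_words_per_line, max_lines = 32, 16  # illustrative limits

    embedding_matrix = np.random.rand(vocab_size, embed_dim).astype(np.float32)

    model = MTHiSAN(embedding_matrix=embedding_matrix,
                    num_classes=[25, 7],  # assumed per-task class counts
                    max_words_per_line=max_words_per_line,
                    max_lines=max_lines)

    # 0-padded word indices; the model splits each document into lines of
    # max_words_per_line tokens, up to max_lines lines.
    docs = torch.randint(1, vocab_size, (8, max_words_per_line * max_lines))
    logits = model(docs)  # list of per-task logit tensors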

Module contents