fresco.models package
Submodules
fresco.models.clc module
- class fresco.models.clc.CaseLevelContext(num_classes, doc_embed_size=400, att_dim_per_head=50, att_heads=8, att_dropout=0.1, forward_mask=True, device='cuda')[source]
Bases: Module
- forward(doc_embeds, num_docs)[source]
Case level context forward pass.
- Parameters:
doc_embeds (torch.tensor) – Float tensor of shape [batch_size x max_seq_length x doc_embed_size]. Document embeddings.
num_docs (torch.tensor) – Integer tensor of shape [batch_size]. Number of reports per case.
- Returns:
None
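For orientation, a minimal usage sketch of CaseLevelContext (the constructor keywords and defaults come from the signature above; the batch size, task counts, and device='cpu' are illustrative assumptions):

```python
import torch

from fresco.models.clc import CaseLevelContext

# Illustrative sizes only; doc_embed_size=400 is the documented default.
batch_size, max_seq_length, doc_embed_size = 4, 10, 400
num_classes = [25, 7]  # hypothetical: two tasks

model = CaseLevelContext(num_classes, doc_embed_size=doc_embed_size, device='cpu')

# Document embeddings for each report in a case, padded to max_seq_length.
doc_embeds = torch.randn(batch_size, max_seq_length, doc_embed_size)
# Actual number of reports per case (at most max_seq_length).
num_docs = torch.tensor([10, 3, 7, 5])

outputs = model(doc_embeds, num_docs)
```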
fresco.models.mtcnn module
- class fresco.models.mtcnn.MTCNN(embedding_matrix, num_classes, window_sizes=None, num_filters=None, dropout=0.5, bag_of_embeddings=True, embeddings_scale=20)[source]
Bases: Module
Multitask simple text CNN for classifying cancer pathology reports.
- Parameters:
embedding_matrix (numpy.array) – Numpy array of word embeddings. Each row should represent a word embedding. NOTE: The word index 0 is masked, so the first row is ignored.
num_classes (list[int]) – Number of possible output classes for each task.
window_sizes (list[int], default: [3, 4, 5]) – Window sizes (number of consecutive tokens examined) in the parallel convolution layers. Must match the length of num_filters.
num_filters (list[int], default: [300, 300, 300]) – Number of filters used in the parallel convolution layers. Must match the length of window_sizes.
dropout (float, default: 0.5) – Dropout rate applied to the final document embedding after maxpooling.
bag_of_embeddings (bool, default: True) – Adds a parallel bag of embeddings layer and concatenates it to the final document embedding.
embeddings_scale (float, default: 20) – Scaling of the word embeddings matrix columns.
- Returns:
None
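A minimal construction sketch (the embedding matrix contents and task counts are toy values; the forward input convention is an assumption, since this page does not document MTCNN.forward):

```python
import numpy as np
import torch

from fresco.models.mtcnn import MTCNN

# Toy embedding matrix: 1000-word vocabulary, 300-dim embeddings.
# Word index 0 is masked, so row 0 is ignored.
embedding_matrix = np.random.rand(1000, 300).astype(np.float32)
num_classes = [25, 7]  # hypothetical: two tasks

model = MTCNN(embedding_matrix, num_classes,
              window_sizes=[3, 4, 5], num_filters=[300, 300, 300])

# Assumed input convention: a batch of documents as padded word indices.
docs = torch.randint(1, 1000, (8, 512))
logits = model(docs)
```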
fresco.models.mthisan module
- class fresco.models.mthisan.MTHiSAN(embedding_matrix, num_classes, max_words_per_line, max_lines, att_dim_per_head=50, att_heads=8, att_dropout=0.1, bag_of_embeddings=False, embeddings_scale=2.5)[source]
Bases: Module
Multitask hierarchical self-attention network for classifying cancer pathology reports.
- Parameters:
embedding_matrix (numpy.array) – Numpy array of word embeddings. Each row should represent a word embedding. NOTE: The word index 0 is masked, so the first row is ignored.
num_classes (list[int]) – Number of possible output classes for each task.
max_words_per_line (int) – Number of words per line. Used to split documents into smaller chunks.
max_lines (int) – Maximum number of lines per document. Additional lines beyond this limit are ignored.
att_dim_per_head (int, default: 50) – Dimension of the output from each attention head. The total output dimension is att_dim_per_head * att_heads.
att_heads (int, default: 8) – Number of attention heads for multihead attention.
att_dropout (float, default: 0.1) – Dropout rate for the attention softmaxes and intermediate embeddings.
bag_of_embeddings (bool, default: False) – Adds a parallel bag of embeddings layer and concatenates it to the final document embedding.
embeddings_scale (float, default: 2.5) – Scaling of the word embeddings matrix columns.
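A minimal construction sketch (vocabulary size, task counts, and chunking values are illustrative assumptions; only the keyword names and defaults come from the signature above):

```python
import numpy as np

from fresco.models.mthisan import MTHiSAN

embedding_matrix = np.random.rand(1000, 300).astype(np.float32)  # toy values
num_classes = [25, 7]  # hypothetical: two tasks

# Documents are split into at most max_lines chunks of max_words_per_line words.
model = MTHiSAN(embedding_matrix, num_classes,
                max_words_per_line=32, max_lines=64,
                att_dim_per_head=50, att_heads=8)

# Total attention output dimension: att_dim_per_head * att_heads = 50 * 8 = 400,
# which matches the doc_embed_size=400 default of CaseLevelContext above.
```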
- forward(docs, return_embeds=False)[source]
MTHiSAN forward pass. Word- and line-level representations are built with a flexible attention operation used for both self and target attention: the query and key tensors (float tensors of shape [batch x heads x seq_len x dim1]) must share their last dimension, while the value tensor (shape [batch x heads x seq_len x dim2]) may differ in its last dimension; attention softmaxes and intermediate embeddings pass through dropout, and boolean masks of shape [batch x seq_len] exclude padded positions in q, k, and v.
- Parameters:
docs (torch.tensor) – Integer tensor of padded word indices. Batch of documents; each is internally split into at most max_lines lines of max_words_per_line words.
return_embeds (bool, default: False) – If True, also return the final document embeddings.
- Returns:
None
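To make the attention operation described above concrete, here is a self-contained sketch of masked scaled dot-product attention with the stated shapes (the function name and the exact masking and scaling details are assumptions, not the library's implementation):

```python
import torch


def masked_attention(q, k, v, drop, mask_q, mask_k):
    """Sketch: q, k are [batch x heads x seq_len x dim1], v is
    [batch x heads x seq_len x dim2]; masks are boolean [batch x seq_len]
    with True marking real (non-padded) positions."""
    scores = torch.matmul(q, k.transpose(-2, -1)) / q.size(-1) ** 0.5
    # Exclude padded key positions from the softmax.
    scores = scores.masked_fill(~mask_k[:, None, None, :], float('-inf'))
    weights = drop(torch.softmax(scores, dim=-1))
    out = torch.matmul(weights, v)  # [batch x heads x seq_len x dim2]
    # Zero out rows that correspond to padded query positions.
    return out * mask_q[:, None, :, None]


batch, heads, seq_len, dim1, dim2 = 2, 8, 16, 50, 64
q = torch.randn(batch, heads, seq_len, dim1)
k = torch.randn(batch, heads, seq_len, dim1)  # q and k share dim1
v = torch.randn(batch, heads, seq_len, dim2)  # v's last dim may differ
mask = torch.ones(batch, seq_len, dtype=torch.bool)
out = masked_attention(q, k, v, torch.nn.Dropout(0.1), mask, mask)
```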