scifact.model package

Submodules

scifact.model.download_pdf module

scifact.model.download_pdf.extract_ref_pdf(text)

Extract portion of the pdf that appears in the References section

Parameters: text (str) – contents of the pdf in str form
Returns: text of all the References found in the pdf
Return type: str
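
The extraction step can be sketched as follows. This is only an illustration, not the package's implementation: the helper name extract_references_section is hypothetical, and it assumes the References section starts at a line that contains just the word "References" (optionally numbered) and runs to the end of the text.

```python
import re


def extract_references_section(text: str) -> str:
    """Return the text after the last 'References' heading, or '' if none is found."""
    # Assumption: the heading sits alone on its own line, optionally numbered.
    heading = re.compile(
        r"^[ \t]*(?:\d+\.?[ \t]*)?references[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    matches = list(heading.finditer(text))
    if not matches:
        return ""
    # Use the last heading so that a table-of-contents entry is skipped.
    return text[matches[-1].end():].strip()


sample = "Intro\nSome body text [1].\nReferences\n[1] A. Author. A paper title. 2020.\n"
references = extract_references_section(sample)
print(references)
```

Real PDFs vary widely in how the heading is rendered, so a heuristic like this would likely need tuning for the documents at hand.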

scifact.model.download_pdf.find_download_pdf(pdf_name, data)

Given the name of a pdf, download the pdf

Parameters

pdf_name (str) – name of the pdf to download, which contains the claim
data (pandas dataframe) – arxiv dataset which contains the details of all pdfs, such as their authors and links

Returns

all the content/text found in the pdf

Return type

str

scifact.model.download_pdf.unzip(path_to_zip_file, dir_path)

Unzip a folder

Parameters

path_to_zip_file (str) – path to the zipped folder
dir_path (str) – path to place unzipped files
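
A minimal sketch of this behavior using the standard library zipfile module is shown below. The helper name unzip_archive and the temporary archive are assumptions made for illustration; the package's own unzip may differ in detail, for example in what it returns.

```python
import os
import tempfile
import zipfile


def unzip_archive(path_to_zip_file: str, dir_path: str) -> list:
    """Extract every member of the archive into dir_path and return the member names."""
    os.makedirs(dir_path, exist_ok=True)
    with zipfile.ZipFile(path_to_zip_file, "r") as archive:
        archive.extractall(dir_path)
        return archive.namelist()


# Build a small archive in a temporary directory so the example is self-contained.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, "papers.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("paper.txt", "hello")

out_dir = os.path.join(work_dir, "unzipped")
extracted = unzip_archive(zip_path, out_dir)
print(extracted)
```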

scifact.model.label module

scifact.model.label.Label_sentences(df)

Use the label_model to label the selected abstracts as Supports/Rejects

Parameters: df (pandas df) – df containing claim and sentences selected by the rationale model
Returns: labels predicted for each sentence selected by the rationale model
Return type: list

scifact.model.label.encode(sentences, claims, tokenizer)

Encode sentences and claims using the labeling model tokenizer

Parameters

sentences (str) – sentences selected by the pretrained rationale selection model that are most relevant to the claim
claims (str) – claim/query entered by the user
tokenizer – tokenizer of the labeling model

Returns

encoded_dict, a dict with the tokenized claim and the sentences that are most relevant to the claim

Return type

dict

scifact.model.pretrained_model module

class scifact.model.pretrained_model.rationale_label_selection

Bases: object

Cosine_Evidence_Selection(top_matches, df)

Given the number of top matches and a df containing the claim and sentences, find the most relevant sentences

Parameters

top_matches (int) – number of top sentences to find
df (pandas dataframe) – dataframe containing the claim/query and all cited document sentences

Returns

list of predicted sentences
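
The method name suggests ranking by cosine similarity. The sketch below illustrates that idea under stated assumptions: sentences are turned into bag-of-words count vectors (the package may well use learned embeddings instead), and the helper names bag_of_words, cosine_similarity and select_top_sentences are hypothetical. It takes a plain claim string and a list of sentences rather than a dataframe.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphanumeric tokens; a stand-in for a real sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_top_sentences(claim: str, sentences: list, top_matches: int) -> list:
    """Return the top_matches sentences most similar to the claim, best first."""
    claim_vec = bag_of_words(claim)
    scored = [(cosine_similarity(claim_vec, bag_of_words(s)), s) for s in sentences]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in ranked[:top_matches]]


sentences = [
    "The virus spreads through airborne droplets.",
    "Stock prices rose sharply last week.",
    "Airborne transmission of the virus is fast.",
]
top = select_top_sentences("The virus spreads fast through air", sentences, 2)
print(top)
```

Word counts only reward exact token overlap, so "air" and "airborne" do not match here; an embedding-based encoder would capture such near-synonyms.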

abstract_selection(doc_query, references2, top_matches, data_copy)

Given a claim/query, the text of all its citations, the number of matches required and the arxiv dataset, print the abstracts

Parameters

doc_query (str) – user entered claim/query
references2 (str) – text of all the citations combined together
top_matches (int) – number of matching abstracts to extract
data_copy (pandas dataframe) – arxiv dataset which contains the details of all pdfs and their authors, links etc

Returns

result of the call to find_extracts_labels

download_all_ref_content(all_ref, references2, data_original)

Given a list of citations, download the cited documents from the internet and combine them. Example: if the provided citation list is [3,6,9], the function searches the References section of the primary pdf, locates the titles of the pdfs corresponding to the 3rd, 6th and 9th citations, downloads them, preprocesses them and combines them into a single str

Parameters

all_ref (list) – list of all the citation numbers
references2 (str) – References section of the primary pdf
data_original (pandas dataframe) – arxiv dataset which contains the details of all pdfs and their authors, links etc

Returns

ref_str, a str of all the sentences from the different cited documents combined, and ref_list, a list of all the sentences from the different cited documents combined

Return type

str, list
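
The lookup step in the example above, going from citation numbers to entries in the References section, might look like the sketch below. It assumes each entry starts with a bracketed number such as "[3]", which is only one of several common reference styles, and the helper name find_cited_entries is hypothetical. Downloading and preprocessing are left out.

```python
import re


def find_cited_entries(all_ref: list, references2: str) -> dict:
    """Map each requested citation number to its entry in a '[n] ...' reference list."""
    entries = {}
    # Assumption: every entry begins with "[n]" and runs until the next "[n]" marker.
    pattern = re.compile(r"\[(\d+)\]\s*(.*?)(?=\[\d+\]|\Z)", re.DOTALL)
    for match in pattern.finditer(references2):
        # Collapse line breaks inside an entry into single spaces.
        entries[int(match.group(1))] = " ".join(match.group(2).split())
    return {number: entries[number] for number in all_ref if number in entries}


references2 = (
    "[1] A. Author. First title. 2019.\n"
    "[2] B. Author. Second title. 2020.\n"
    "[3] C. Author. Third\ntitle. 2021.\n"
)
cited = find_cited_entries([1, 3], references2)
print(cited)
```

Entries that themselves contain a bracketed number would be split incorrectly by this pattern, so it should be read as a starting point only.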

find_extracts_labels(doc_query, all_ref_text, top_matches_entered)

Given a claim/query, text from cited documents and top matches, print the relevant sentences

Parameters

doc_query (str) – user entered claim/query
all_ref_text (list) – list of all the sentences from the different cited documents combined (the docstring names this parameter ref_text_list)
top_matches_entered (int) – number of relevant sentences to return

preprocess_query(doc_query)

Given a claim/query, find the citations within the sentence. Example: if the claim/query is “Covid spread through air[3,6,9] and transmits fast”, the function finds the citation numbers [3,6,9] and returns them as a list

Parameters

doc_query (str) – user entered claim/query

Returns

all_ref which is a list of all the citation numbers

Return type

list
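
The citation parsing described above can be sketched with a regular expression. This is an illustration under the assumption that citations appear as comma-separated numbers in square brackets; the helper name find_citation_numbers is hypothetical and the package's own parsing may differ.

```python
import re


def find_citation_numbers(doc_query: str) -> list:
    """Return every citation number found in bracketed groups such as [3,6,9]."""
    all_ref = []
    # Match bracketed groups that contain only digits, commas and whitespace.
    for group in re.findall(r"\[([\d,\s]+)\]", doc_query):
        all_ref.extend(int(number) for number in re.findall(r"\d+", group))
    return all_ref


all_ref = find_citation_numbers("Covid spread through air[3,6,9] and transmits fast")
print(all_ref)  # [3, 6, 9]
```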

printwd()
