scifact.model package
Submodules
scifact.model.download_pdf module
- scifact.model.download_pdf.extract_ref_pdf(text)
Extract the portion of the PDF text that appears in the References section.
- Parameters
text (str) – contents of the PDF as a str
- Returns
text of all the references found in the PDF
- Return type
str
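The entry above does not say how the References section is located. The following is a minimal sketch. It assumes the section begins at a line containing only the word "References" (in any case) and runs to the end of the text. The heading pattern, the empty-string fallback and the name `extract_references` are hypothetical, not the package's actual behaviour.

```python
import re


def extract_references(text: str) -> str:
    """Hypothetical sketch: return the text after the last 'References' heading."""
    # Assumption: the heading is a line consisting only of "References" (any case).
    headings = list(
        re.finditer(r"^\s*references\s*$", text, flags=re.IGNORECASE | re.MULTILINE)
    )
    if not headings:
        # Assumed fallback when no References heading is found.
        return ""
    # Use the last match so an earlier mention (e.g. in a contents list) is skipped.
    return text[headings[-1].end():].strip()
```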
- scifact.model.download_pdf.find_download_pdf(pdf_name, data)
Given the name of a PDF, download the PDF and return its text.
- Parameters
pdf_name (str) – name of the PDF to download, which contains the claim
data (pandas dataframe) – arXiv dataset containing the details of all PDFs, such as their authors and links
- Returns
all the text found in the PDF
- Return type
str
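One plausible first step for find_download_pdf is to look the title up in the arXiv dataset and build a PDF URL from the matching record. The sketch below makes several assumptions. Each record is assumed to have `id` and `title` fields, and PDFs are assumed to be served at `https://arxiv.org/pdf/<id>`. The helper name `find_pdf_url` is hypothetical. The download itself and the text extraction are not shown.

```python
from typing import Iterable, Mapping, Optional

# Assumed URL pattern for arXiv PDFs.
ARXIV_PDF_URL = "https://arxiv.org/pdf/{}"


def find_pdf_url(pdf_name: str, records: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Hypothetical sketch: map a paper title to an arXiv PDF URL, or None."""
    # Collapse whitespace and lower-case so line breaks inside titles still match.
    wanted = " ".join(pdf_name.split()).lower()
    for record in records:
        # Assumption: each record has "title" and "id" fields.
        if " ".join(record["title"].split()).lower() == wanted:
            return ARXIV_PDF_URL.format(record["id"])
    return None
```

With a pandas DataFrame, `data.to_dict("records")` yields a list of records in this shape.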
- scifact.model.download_pdf.unzip(path_to_zip_file, dir_path)
Extract a zipped folder into a directory.
- Parameters
path_to_zip_file (str) – path to the zipped folder
dir_path (str) – path to place unzipped files
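The unzip step can be written with the standard-library `zipfile` module. The following is a sketch of one way to do it, not necessarily how the package implements it. The name `unzip_archive` is hypothetical.

```python
import zipfile


def unzip_archive(path_to_zip_file: str, dir_path: str) -> None:
    """Hypothetical sketch: extract every member of the archive into dir_path."""
    # extractall creates any missing directories under dir_path as it writes members.
    with zipfile.ZipFile(path_to_zip_file) as archive:
        archive.extractall(dir_path)
```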
scifact.model.label module
- scifact.model.label.Label_sentences(df)
Use the label_model to label the selected sentences as Supports/Rejects.
- Parameters
df (pandas dataframe) – DataFrame containing the claim and the sentences selected by the rationale model
- Returns
labels predicted for each sentence selected by the rationale model
- Return type
list
- scifact.model.label.encode(sentences, claims, tokenizer)
Encode the sentences and claims using the labeling model's tokenizer.
- Parameters
sentences (str) – sentences selected by the pretrained rationale selection model that are most relevant to the claim
claims (str) – claim/query entered by the user
tokenizer – tokenizer of the labeling model
- Returns
dict with the tokenized claim and the sentences that are most relevant to the claim
- Return type
dict
scifact.model.pretrained_model module
- class scifact.model.pretrained_model.rationale_label_selection
Bases: object
- Cosine_Evidence_Selection(top_matches, df)
Given the number of top matches and a DataFrame containing the claim and sentences, find the most relevant sentences.
- Parameters
top_matches (int) – number of top sentences to find
df (pandas dataframe) – DataFrame containing the claim/query and all cited document sentences
- Returns
list of predicted sentences
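Cosine_Evidence_Selection ranks sentences by their cosine similarity to the claim. The sketch below illustrates that ranking with plain bag-of-words count vectors. The vectorisation is an assumption, since the package may use learned embeddings. The sketch also takes the claim and a list of sentences directly instead of the df, because the df's column names are not documented here. `cosine_evidence_selection` is a hypothetical name.

```python
import math
import re
from collections import Counter
from typing import List


def _bag_of_words(text: str) -> Counter:
    # Assumed stand-in for the real sentence representation: lower-cased token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared tokens; Counter returns 0 for missing keys.
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def cosine_evidence_selection(claim: str, sentences: List[str], top_matches: int) -> List[str]:
    """Hypothetical sketch: return the top_matches sentences most similar to the claim."""
    claim_vec = _bag_of_words(claim)
    scored = [(_cosine(claim_vec, _bag_of_words(s)), i) for i, s in enumerate(sentences)]
    # Highest similarity first; ties keep the original sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sentences[i] for _, i in scored[:top_matches]]
```

Sorting by `(-score, index)` keeps the output deterministic when several sentences have the same score.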
- abstract_selection(doc_query, references2, top_matches, data_copy)
Given a claim/query, the text of all its citations, the number of matches required and the arXiv dataset, print the matching abstracts.
- Parameters
doc_query (str) – user-entered claim/query
references2 (str) – text of all the citations combined
top_matches (int) – number of matching abstracts to extract
data_copy (pandas dataframe) – arXiv dataset containing the details of all PDFs, such as their authors and links
- Returns
the result of calling find_extracts_labels
- download_all_ref_content(all_ref, references2, data_original)
Given a list of citation numbers, download the cited documents from the internet and combine them.
For example, if the citation list is [3,6,9], the function searches the References section of the primary PDF, locates the titles corresponding to the 3rd, 6th and 9th citations, downloads those documents, preprocesses them and combines them into a single str.
- Parameters
all_ref (list) – list of all the citation numbers
references2 (str) – References section of the primary PDF
data_original (pandas dataframe) – arXiv dataset containing the details of all PDFs, such as their authors and links
- Returns
ref_str – a str of all the sentences from the different cited documents combined
ref_list – a list of all the sentences from the different cited documents
- Return type
str (ref_str), list (ref_list)
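The step of locating the 3rd, 6th and 9th citations can be sketched by splitting the References section on bracketed numbers. This sketch assumes entries are numbered in the form `[3] Author, Title, ...` and that no entry contains another bracketed number. The helper names `split_references` and `select_references` are hypothetical. Downloading and preprocessing the cited documents are not shown.

```python
import re
from typing import Dict, List


def split_references(references: str) -> Dict[int, str]:
    """Hypothetical sketch: map each citation number to its reference entry."""
    # With a capture group, re.split returns [prefix, num1, entry1, num2, entry2, ...].
    parts = re.split(r"\[(\d+)\]", references)
    # Collapse internal line breaks and extra spaces inside each entry.
    return {int(num): " ".join(entry.split()) for num, entry in zip(parts[1::2], parts[2::2])}


def select_references(all_ref: List[int], references: str) -> List[str]:
    """Return the entries for the requested citation numbers, skipping unknown ones."""
    entries = split_references(references)
    return [entries[n] for n in all_ref if n in entries]
```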
- find_extracts_labels(doc_query, all_ref_text, top_matches_entered)
Given a claim/query, the text from the cited documents and the number of top matches, print the relevant sentences.
- Parameters
doc_query (str) – user-entered claim/query
all_ref_text (list) – list of all the sentences from the different cited documents combined
top_matches_entered (int) – number of relevant sentences to return
- preprocess_query(doc_query)
Given a claim/query, find the citation numbers within the sentence. For example, if the claim/query is “Covid spread through air[3,6,9] and transmits fast”, the function finds the citation numbers [3,6,9] and returns them as a list.
- Parameters
doc_query (str) – user-entered claim/query
- Returns
all_ref which is a list of all the citation numbers
- Return type
list
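The citation extraction that preprocess_query describes can be sketched with a regular expression. This sketch assumes citations are bracketed, comma-separated integers such as `[3,6,9]`. Ranges such as `[3-5]` are not expanded. The name `citation_numbers` is hypothetical.

```python
import re
from typing import List


def citation_numbers(doc_query: str) -> List[int]:
    """Hypothetical sketch: collect bracketed citation numbers, e.g. [3,6,9] -> [3, 6, 9]."""
    all_ref: List[int] = []
    # Match groups like "[3]" or "[3, 6,9]": digits separated by commas and optional spaces.
    for group in re.findall(r"\[(\d+(?:\s*,\s*\d+)*)\]", doc_query):
        all_ref.extend(int(n) for n in re.findall(r"\d+", group))
    return all_ref
```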
- printwd()