scifact.tasks package
Submodules
scifact.tasks.arxiv_preprocess module
- scifact.tasks.arxiv_preprocess.arxiv_clean(df)
Clean the arXiv dataset so that it can be used to search for PDF links.
- Parameters
df (pandas DataFrame) – arXiv dataset containing the details of each PDF, such as its authors and links
- Returns
cleaned arXiv dataset
- Return type
pandas DataFrame
- scifact.tasks.arxiv_preprocess.rem_bracket(line)
- scifact.tasks.arxiv_preprocess.rem_unwanted(line)
scifact.tasks.download module
- class scifact.tasks.download.ContentData(*args, **kwargs)
Bases: luigi.task.ExternalTask
- DATA_ROOT = 's3://advancedpythonmeenu/scifact/'
- client = <luigi.contrib.s3.S3Client object>
- data_name = <luigi.parameter.Parameter object>
- output()
The output that this Task produces.
The output of the Task determines if the Task needs to be run: the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single Target or a list of Target instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
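The completion rule described for output() can be illustrated without importing luigi. The sketch below is a plain-Python model, not luigi's API: `FileTarget` and `is_complete` are hypothetical names, and local files stand in for the S3 targets this package uses.

```python
import os
import tempfile


class FileTarget:
    """Hypothetical stand-in for a luigi Target backed by a local file."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        return os.path.exists(self.path)


def is_complete(outputs):
    """Return True only when every target in ``outputs`` exists.

    Accepts a single target or a list of targets, mirroring the two
    shapes that output() is documented to return.
    """
    if not isinstance(outputs, (list, tuple)):
        outputs = [outputs]
    # In this sketch a task with no outputs is never treated as finished.
    return bool(outputs) and all(target.exists() for target in outputs)


with tempfile.TemporaryDirectory() as tmp:
    first = FileTarget(os.path.join(tmp, "a.csv"))
    second = FileTarget(os.path.join(tmp, "b.csv"))

    before = is_complete([first, second])   # nothing written yet
    open(first.path, "w").close()
    partial = is_complete([first, second])  # only one of the two exists
    open(second.path, "w").close()
    after = is_complete([first, second])    # both outputs exist
```

Only `after` is true: a task with several outputs counts as finished once all of them exist, which is why a partially written result still triggers a rerun.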
- class scifact.tasks.download.DownloadData(*args, **kwargs)
Bases: luigi.task.Task
- LOCAL_ROOT = '/Users/meenu/Desktop/Harvard/AdvancedPython/Assignments/Pset3/2021sp-scifact-lalitanjali-ai/docsrc/data'
- S3_ROOT = 's3://advancedpythonmeenu/scifact/'
- SHARED_RELATIVE_PATH = 'dataset/'
- data = <luigi.parameter.Parameter object>
- output()
The output that this Task produces.
The output of the Task determines if the Task needs to be run: the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single Target or a list of Target instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
- path = '/Users/meenu/Desktop/Harvard/AdvancedPython/Assignments/Pset3/2021sp-scifact-lalitanjali-ai/docsrc/data/dataset/'
- requires()
The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
- run()
The task run method, to be overridden in a subclass.
See Task.run
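How requires() and run() interact can be modelled in a few lines of plain Python. This is a sketch under stated assumptions, not luigi's scheduler: `SketchTask`, `as_list`, and `build` are hypothetical names, and the task names only echo the roles of the classes on this page.

```python
class SketchTask:
    """Hypothetical, minimal stand-in for a luigi Task (not luigi API)."""

    def __init__(self, name, deps=None, done=False):
        self.name = name
        self.deps = deps if deps is not None else []
        self.done = done

    def requires(self):
        # May be a single task, a list of tasks, or a dict of tasks.
        return self.deps

    def complete(self):
        return self.done

    def run(self):
        self.done = True


def as_list(deps):
    """Normalise the three documented shapes of requires() to a list."""
    if isinstance(deps, dict):
        return list(deps.values())
    if isinstance(deps, (list, tuple)):
        return list(deps)
    return [deps]


def build(task, log):
    """Run ``task`` after everything it requires; skip finished tasks."""
    if task.complete():
        return
    for dep in as_list(task.requires()):
        build(dep, log)
    task.run()
    log.append(task.name)


# The remote data already exists, much like an ExternalTask's output.
remote = SketchTask("remote_data", done=True)
download = SketchTask("download", deps=remote)
preprocess = SketchTask("preprocess", deps={"data": download})

order = []
build(preprocess, order)
```

Here `order` ends up as `["download", "preprocess"]`: the already-complete dependency is skipped and each remaining task runs only after the tasks it requires. The sketch does not model what happens when an external dependency is missing.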
- class scifact.tasks.download.DownloadModel(*args, **kwargs)
Bases: luigi.task.Task
- LOCAL_ROOT = '/Users/meenu/Desktop/Harvard/AdvancedPython/Assignments/Pset3/2021sp-scifact-lalitanjali-ai/docsrc/data'
- S3_ROOT = 's3://advancedpythonmeenu/scifact/'
- SHARED_RELATIVE_PATH = 'saved_models/'
- model = <luigi.parameter.Parameter object>
- output()
The output that this Task produces.
The output of the Task determines if the Task needs to be run: the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single Target or a list of Target instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
- path = '/Users/meenu/Desktop/Harvard/AdvancedPython/Assignments/Pset3/2021sp-scifact-lalitanjali-ai/docsrc/data/saved_models/'
- requires()
The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
- run()
The task run method, to be overridden in a subclass.
See Task.run
- class scifact.tasks.download.SavedModel(*args, **kwargs)
Bases: luigi.task.ExternalTask
- MODEL_ROOT = 's3://advancedpythonmeenu/scifact/'
- client = <luigi.contrib.s3.S3Client object>
- model = <luigi.parameter.Parameter object>
- output()
The output that this Task produces.
The output of the Task determines if the Task needs to be run: the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single Target or a list of Target instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
scifact.tasks.tasks module
- class scifact.tasks.tasks.Preprocess_arxiv_data(*args, **kwargs)
Bases: luigi.task.Task
- LOCAL_ROOT = '/Users/meenu/Desktop/Harvard/AdvancedPython/Assignments/Pset3/2021sp-scifact-lalitanjali-ai/docsrc/data'
- SHARED_RELATIVE_PATH = 'dataset/'
- ext = '.csv'
- glob1 = 'preprocessed_arXivData.csv'
- glob_path = <luigi.parameter.Parameter object>
- output()
The output that this Task produces.
The output of the Task determines if the Task needs to be run: the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single Target or a list of Target instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
- path = '/Users/meenu/Desktop/Harvard/AdvancedPython/Assignments/Pset3/2021sp-scifact-lalitanjali-ai/docsrc/data/dataset/'
- requirement = None
- requires() → dict
Returns the requirements of a task.
Assumes the task class has Requirement descriptors, which can clone the appropriate dependencies from the task instance.
- Returns
requirements compatible with task.requires()
- Return type
dict
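One way such Requirement descriptors could work is sketched below. This is an assumption about the general pattern, not this package's actual implementation: the classes `Requirement`, `Requires`, `SketchTask`, `Download`, and `Preprocess` are all defined here for illustration, and `clone` is simplified to copy every parameter instead of only the shared ones.

```python
class Requirement:
    """Descriptor that builds a dependency from the owning task instance."""

    def __init__(self, task_class, **params):
        self.task_class = task_class
        self.params = params

    def __get__(self, task, owner=None):
        if task is None:
            return self  # accessed on the class, not on an instance
        return task.clone(self.task_class, **self.params)


class Requires:
    """Descriptor whose call collects every Requirement into a dict."""

    def __get__(self, task, owner=None):
        if task is None:
            return self
        return lambda: self(task)

    def __call__(self, task):
        # Only Requirement attributes defined directly on the class are found.
        return {
            name: getattr(task, name)
            for name, attr in vars(type(task)).items()
            if isinstance(attr, Requirement)
        }


class SketchTask:
    """Hypothetical stand-in for a task that stores its parameters."""

    def __init__(self, **params):
        self.params = params

    def clone(self, cls, **overrides):
        # Simplification: copy all parameters, then apply the overrides.
        return cls(**{**self.params, **overrides})


class Download(SketchTask):
    pass


class Preprocess(SketchTask):
    requires = Requires()
    data = Requirement(Download, data="arxiv")


deps = Preprocess(glob_path="*.csv").requires()
```

In this sketch `deps` is a dict with the single key `"data"`, whose value is a `Download` instance carrying both the cloned `glob_path` and the overriding `data` parameter, which matches the documented return shape of a dict compatible with requires().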
- run()
The task run method, to be overridden in a subclass.
See Task.run
- class scifact.tasks.tasks.find_display_abstracts(*args, **kwargs)
Bases: luigi.task.Task
- doc_query = <luigi.parameter.Parameter object>
- model_label = 'label_roberta_large_fever_scifact.zip'
- model_rationale = 'rationale_roberta_large_fever.zip'
- pdf_name = <luigi.parameter.Parameter object>
- requires()
The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
- run()
The task run method, to be overridden in a subclass.
See Task.run
- top_matches = <luigi.parameter.IntParameter object>