seesaw Package¶
ArchiveTeam seesaw kit
config
Module¶
Configuration value manipulation.
-
class
seesaw.config.
ConfigValue
(name, title='', description='', default=None, editable=True, advanced=True)[source]¶ Bases:
object
Configuration value validator.
The collection methods are useful for providing user-configurable settings at run time. For example, when a pipeline file is executed by the warrior, the additional config values are presented in the warrior configuration panel.
-
collector
= None¶
-
-
class
seesaw.config.
NumberConfigValue
(*args, **kwargs)[source]¶ Bases:
seesaw.config.ConfigValue
-
class
seesaw.config.
StringConfigValue
(*args, **kwargs)[source]¶ Bases:
seesaw.config.ConfigValue
-
seesaw.config.
realize
(v, item=None)[source]¶ Makes objects contain concrete values from an item.
A silly example:
class AddExpression(object):
    def realize(self, item):
        return item['x'] + item['y']

pipeline = Pipeline(ComputeMath(AddExpression()))
In the example, we want to compute an addition expression. The values are defined in the Item.
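As a rough illustration of the idea, the sketch below shows one way such a function could resolve values against an item. The name `realize_sketch` is hypothetical and this is not presented as seesaw's own implementation; it only assumes that "realizable" objects expose a `realize(item)` method, as in the example above.

```python
def realize_sketch(value, item=None):
    """Recursively replace realizable objects with concrete values (illustrative only)."""
    if isinstance(value, dict):
        # Resolve both keys and values of a mapping.
        return {realize_sketch(k, item): realize_sketch(v, item)
                for k, v in value.items()}
    if isinstance(value, list):
        return [realize_sketch(v, item) for v in value]
    if hasattr(value, "realize"):
        # Objects with a realize() method compute their value from the item.
        return value.realize(item)
    # Anything else is already concrete.
    return value


class AddExpression(object):
    def realize(self, item):
        return item["x"] + item["y"]


item = {"x": 2, "y": 3}
result = realize_sketch(["sum", AddExpression()], item)
print(result)
```

Plain values pass through unchanged, so task arguments can mix literals with expressions that are only evaluated once an item is available.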
event
Module¶
Actor model.
externalprocess
Module¶
Running subprocesses asynchronously.
-
class
seesaw.externalprocess.
AsyncPopen
(*args, **kwargs)[source]¶ Bases:
object
Asynchronous version of subprocess.Popen.
Deprecated.
-
class
seesaw.externalprocess.
AsyncPopen2
(*args, **kwargs)[source]¶ Bases:
object
Adapter for the legacy AsyncPopen
-
stdin
¶
-
-
class
seesaw.externalprocess.
CurlUpload
(target, filename, connect_timeout='60', speed_limit='1', speed_time='900', max_tries=None)[source]¶ Bases:
seesaw.externalprocess.ExternalProcess
Upload with Curl process runner.
-
class
seesaw.externalprocess.
ExternalProcess
(name, args, max_tries=1, retry_delay=30, accept_on_exit_code=None, retry_on_exit_code=None, env=None)[source]¶ Bases:
seesaw.task.Task
External subprocess runner.
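The constructor parameters suggest a retry policy driven by exit codes. The sketch below is a synchronous, simplified guess at how such a policy could behave; `run_with_retries` is a hypothetical helper, and the exact semantics of `accept_on_exit_code` and `retry_on_exit_code` in seesaw are assumptions here (accept only 0 by default, retry on any other code unless a retry list is given).

```python
import subprocess
import sys
import time


def run_with_retries(args, max_tries=1, retry_delay=30,
                     accept_on_exit_code=None, retry_on_exit_code=None):
    """Run a command, retrying on failure. Returns (succeeded, tries_used)."""
    # Assumption: exit code 0 is the only accepted code unless told otherwise.
    accept = [0] if accept_on_exit_code is None else accept_on_exit_code
    tries = 0
    while True:
        tries += 1
        exit_code = subprocess.call(args)
        if exit_code in accept:
            return True, tries
        out_of_tries = max_tries is not None and tries >= max_tries
        # Assumption: with no retry list, every failing code is retryable.
        not_retryable = (retry_on_exit_code is not None
                         and exit_code not in retry_on_exit_code)
        if out_of_tries or not_retryable:
            return False, tries
        time.sleep(retry_delay)


# A command that always exits with code 3.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
ok_strict, tries_strict = run_with_retries(failing, max_tries=2, retry_delay=0)
ok_lenient, tries_lenient = run_with_retries(failing, accept_on_exit_code=[3])
print(ok_strict, tries_strict, ok_lenient, tries_lenient)
```

The real class is a Task and runs the process asynchronously; this sketch only illustrates how the retry-related arguments might interact.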
-
class
seesaw.externalprocess.
RsyncUpload
(target, files, target_source_path='./', bwlimit='0', max_tries=None, extra_args=None)[source]¶ Bases:
seesaw.externalprocess.ExternalProcess
Upload with Rsync process runner.
-
class
seesaw.externalprocess.
WgetDownload
(args, max_tries=1, accept_on_exit_code=None, retry_on_exit_code=None, env=None, stdin_data_function=None)[source]¶ Bases:
seesaw.externalprocess.ExternalProcess
Download with Wget process runner.
item
Module¶
Managing work units.
-
class
seesaw.item.
Item
(pipeline, item_id, item_number, keep_data=False, prepare_data_directory=True, **kwargs)[source]¶ Bases:
seesaw.item.ItemData
A thing, or work unit, that needs to be downloaded.
It has properties that are filled by the Task.
An Item behaves like a mutable mapping.
Note
State belonging to an item should be stored on the actual item itself. That is, do not store variables on a Task unless you know what you are doing.
-
class
ItemState
[source]¶ Bases:
object
State of the item.
-
canceled
= 'canceled'¶
-
completed
= 'completed'¶
-
failed
= 'failed'¶
-
running
= 'running'¶
-
-
class
TaskStatus
[source]¶ Bases:
object
Status of what happened on a task.
-
completed
= 'completed'¶
-
failed
= 'failed'¶
-
running
= 'running'¶
-
-
canceled
¶
-
completed
¶
-
end_time
¶
-
failed
¶
-
finished
¶
-
item_id
¶
-
item_number
¶
-
item_state
¶
-
pipeline
¶
-
start_time
¶
-
task_status
¶
-
class
seesaw.item.
ItemData
(properties=None)[source]¶ Bases:
collections.abc.MutableMapping
Base item data property container.
- Args:
properties (dict): Original dict
on_property (Event): Fired whenever a property changes.
Callback accepts:
- self
- key
- new value
- old value
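To make the callback contract above concrete, here is a minimal sketch of a mutable mapping that notifies listeners with (self, key, new value, old value). `ItemDataSketch` and `add_listener` are hypothetical names; a plain list of callables stands in for the `on_property` Event, whose real interface is not shown here.

```python
from collections.abc import MutableMapping


class ItemDataSketch(MutableMapping):
    """Dict-like container that reports property changes (illustrative only)."""

    def __init__(self, properties=None):
        self._properties = dict(properties or {})
        self._listeners = []  # hypothetical stand-in for the on_property Event

    def add_listener(self, callback):
        self._listeners.append(callback)

    def __getitem__(self, key):
        return self._properties[key]

    def __setitem__(self, key, value):
        old_value = self._properties.get(key)
        self._properties[key] = value
        if old_value != value:
            # Callback arguments: self, key, new value, old value.
            for callback in self._listeners:
                callback(self, key, value, old_value)

    def __delitem__(self, key):
        del self._properties[key]

    def __iter__(self):
        return iter(self._properties)

    def __len__(self):
        return len(self._properties)


changes = []
data = ItemDataSketch({"item_name": "example"})
data.add_listener(lambda source, key, new, old: changes.append((key, new, old)))
data["status"] = "running"
data["status"] = "completed"
print(changes)
```

Subclassing `MutableMapping` supplies `get`, `update`, `items` and similar methods for free once the five abstract methods are defined.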
-
properties
¶
pipeline
Module¶
-
class
seesaw.pipeline.
Pipeline
(*tasks)[source]¶ Bases:
object
The sequence of steps that complete a Task.
Your pipeline will probably be something like this:
- Request an assignment from the tracker.
- Run Wget to download the file.
- Upload the downloaded file with rsync.
- Tell the tracker that the assignment is done.
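The four steps above can be sketched with plain functions standing in for the real tasks. Everything in this block is hypothetical (`PipelineSketch`, the step functions, the item keys); seesaw's actual Pipeline runs Task objects asynchronously, so treat this only as a picture of the ordering.

```python
class PipelineSketch(object):
    """Hypothetical stand-in: runs plain callables in order on one item."""

    def __init__(self, *tasks):
        self.tasks = tasks

    def run(self, item):
        for task in self.tasks:
            task(item)
        return item


def request_assignment(item):
    # Step 1: a tracker would hand out the item name.
    item["item_name"] = "example-item"
    item["log"].append("assigned")


def download(item):
    # Step 2: Wget would produce one or more files.
    item["files"] = [item["item_name"] + ".warc.gz"]
    item["log"].append("downloaded")


def upload(item):
    # Step 3: rsync would send the files to the upload target.
    item["log"].append("uploaded %d file(s)" % len(item["files"]))


def report_done(item):
    # Step 4: the tracker is told the assignment is finished.
    item["log"].append("done")


pipeline = PipelineSketch(request_assignment, download, upload, report_done)
item = pipeline.run({"log": []})
print(item["log"])
```

Each step reads and writes state on the item rather than on itself, which matches the note in the item module about keeping state on the item.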
project
Module¶
Project information.
-
class
seesaw.project.
Project
(title=None, project_html=None, utc_deadline=None)[source]¶ Bases:
object
Briefly describes a project's metadata.
This class defines the title of the project, a short description with an optional project logo and an optional deadline. The information will be shown in the web interface when the project is running.
runner
Module¶
Pipeline execution.
task
Module¶
Managing steps in a work unit.
-
class
seesaw.task.
ConditionalTask
(condition_function, inner_task)[source]¶ Bases:
seesaw.task.Task
Runs a task optionally.
-
class
seesaw.task.
LimitConcurrent
(concurrency, inner_task)[source]¶ Bases:
seesaw.task.Task
Restricts the number of tasks of the same type that can be run at once.
-
class
seesaw.task.
PrintItem
[source]¶ Bases:
seesaw.task.SimpleTask
Output the name of the
Item
.
-
class
seesaw.task.
SetItemKey
(key, value)[source]¶ Bases:
seesaw.task.SimpleTask
Set a value onto an item.
-
class
seesaw.task.
SimpleTask
(name)[source]¶ Bases:
seesaw.task.Task
A subclassable Task that should do one small thing well.
Example:
class MyTask(SimpleTask):
    def process(self, item):
        item['my_message'] = 'hello world!'
tracker
Module¶
Contacting the work unit server.
A Tracker refers to the Universal Tracker (https://github.com/ArchiveTeam/universal-tracker).
-
class
seesaw.tracker.
GetItemFromTracker
(tracker_url, downloader, version=None)[source]¶ Bases:
seesaw.tracker.TrackerRequest
Get information for a single work unit from the Tracker.
-
class
seesaw.tracker.
PrepareStatsForTracker
(defaults=None, file_groups=None, id_function=None)[source]¶ Bases:
seesaw.task.SimpleTask
Apply statistical values on the item.
-
class
seesaw.tracker.
SendDoneToTracker
(tracker_url, stats)[source]¶ Bases:
seesaw.tracker.TrackerRequest
Inform the Tracker that the work unit has been completed.
-
class
seesaw.tracker.
TrackerRequest
(name, tracker_url, tracker_command, may_be_canceled=False)[source]¶ Bases:
seesaw.task.Task
Represents a request to a Tracker.
-
DEFAULT_RETRY_DELAY
= 60¶
-
-
class
seesaw.tracker.
UploadWithTracker
(tracker_url, downloader, files, version=None, rsync_target_source_path='./', rsync_bwlimit='0', rsync_extra_args=[], curl_connect_timeout='60', curl_speed_limit='1', curl_speed_time='900')[source]¶ Bases:
seesaw.tracker.TrackerRequest
Upload work unit results.
One of the inner tasks is used, depending on where the Tracker’s response says to upload:
RsyncUpload
CurlUpload
util
Module¶
Miscellaneous functions.
-
seesaw.util.
find_executable
(name, version, paths, version_arg='-V')[source]¶ Returns the path of a matching executable.
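A function with this signature could plausibly work by running each candidate with the version argument and checking the output. The sketch below does that with a plain substring match; `find_executable_sketch` is a hypothetical name, and how seesaw actually compares versions is an assumption not confirmed by this page.

```python
import subprocess
import sys


def find_executable_sketch(name, version, paths, version_arg="-V"):
    """Return the first path whose version output contains `version`, else None.

    `name` is kept only to mirror the documented signature.
    """
    for path in paths:
        try:
            completed = subprocess.run(
                [path, version_arg],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # some tools print versions to stderr
                universal_newlines=True,
            )
        except OSError:
            # Missing or non-executable candidate: try the next path.
            continue
        if version in completed.stdout:
            return path
    return None


# Usage: probe a path that does not exist, then the running interpreter.
candidates = ["/nonexistent/python-binary", sys.executable]
found = find_executable_sketch("Python", "Python", candidates, version_arg="--version")
missing = find_executable_sketch("Python", "no-such-version-string", candidates,
                                 version_arg="--version")
print(found, missing)
```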
warrior
Module¶
The warrior server.
The warrior phones home to Warrior HQ (https://github.com/ArchiveTeam/warrior-hq).
-
class
seesaw.warrior.
BandwidthMonitor
(device)[source]¶ Bases:
object
Extracts the bandwidth usage from the system stats.
-
devre
= re.compile('^\\s*([a-z0-9]+):(.+)$')¶
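The `devre` pattern captures an interface name before a colon and the remaining counters after it. The sketch below applies it to text in the layout of Linux's `/proc/net/dev`; that this file is the "system stats" source is an assumption, and `parse_counters` and the sample text are made up for illustration.

```python
import re

# Same pattern as the devre attribute above.
devre = re.compile(r"^\s*([a-z0-9]+):(.+)$")

# Made-up sample in the /proc/net/dev layout (two header lines, then devices).
SAMPLE = """\
Inter-|   Receive                            |  Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 123456 100 0 0 0 0 0 0 654321 200 0 0 0 0 0 0
"""


def parse_counters(text):
    """Map device name -> (received bytes, transmitted bytes)."""
    counters = {}
    for line in text.splitlines():
        match = devre.match(line)
        if not match:
            continue  # header lines contain no "name:" prefix
        fields = match.group(2).split()
        # In this layout, field 0 is received bytes and field 8 is transmitted bytes.
        counters[match.group(1)] = (int(fields[0]), int(fields[8]))
    return counters


counters = parse_counters(SAMPLE)
print(counters)
```

Bandwidth usage would then follow from the difference between two such readings divided by the time between them.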
-
-
class
seesaw.warrior.
Warrior
(projects_dir, data_dir, warrior_hq_url, real_shutdown=False, keep_data=False)[source]¶ Bases:
object
The warrior god object.
-
class
Status
[source]¶ Bases:
object
-
INVALID_SETTINGS
= 'INVALID_SETTINGS'¶
-
NO_PROJECT
= 'NO_PROJECT'¶
-
REBOOTING
= 'REBOOTING'¶
-
RESTARTING_PROJECT
= 'RESTARTING_PROJECT'¶
-
RUNNING_PROJECT
= 'RUNNING_PROJECT'¶
-
SHUTTING_DOWN
= 'SHUTTING_DOWN'¶
-
STARTING_PROJECT
= 'STARTING_PROJECT'¶
-
STOPPING_PROJECT
= 'STOPPING_PROJECT'¶
-
SWITCHING_PROJECT
= 'SWITCHING_PROJECT'¶
-
UNINITIALIZED
= 'UNINITIALIZED'¶
-
-
web
Module¶
The warrior web interface.
-
class
seesaw.web.
ApiHandler
(application, request, **kwargs)[source]¶ Bases:
seesaw.web_util.BaseWebAdminHandler
Processes API requests.
-
get_template_path
()[source]¶ Override to customize template path for each handler.
By default, we use the
template_path
application setting. Return None to load templates relative to the calling file.
-
initialize
(warrior=None, runner=None)[source]¶ Hook for subclass initialization. Called for each request.
A dictionary passed as the third argument of a url spec will be supplied as keyword arguments to initialize().
Example:
class ProfileHandler(RequestHandler):
    def initialize(self, database):
        self.database = database

    def get(self, username):
        ...

app = Application([
    (r'/user/(.*)', ProfileHandler, dict(database=database)),
])
-
-
class
seesaw.web.
IndexHandler
(application, request, **kwargs)[source]¶ Bases:
seesaw.web_util.BaseWebAdminHandler
Shows the index.html.
-
class
seesaw.web.
ItemMonitor
(item)[source]¶ Bases:
object
Pushes item states and information to the client.
-
class
seesaw.web.
SeesawConnection
(session)[source]¶ Bases:
sockjs.tornado.conn.SockJSConnection
A WebSocket server that communicates the state of the warrior.
-
classmethod
broadcast
(event, message)[source]¶ Broadcast a message to one or more clients. Use this method if you want to send the same message to many clients, as it contains several optimizations and will be faster than a loop in your own code.
- clients
- Clients iterable
- message
- Message to send.
-
clients
= set()¶
-
instance_id
= '14394-0.434116'¶
-
item_monitors
= {}¶
-
on_open
(info)[source]¶ Default on_open() handler.
Override when you need to do some initialization or request validation. If you return False, the connection will be rejected.
You can also raise a Tornado HTTPError to close the connection.
- request
ConnectionInfo
object which contains caller IP address, query string parameters and cookies associated with this request (if any).
-
project
= None¶
-
runner
= None¶
-
warrior
= None¶
-
seesaw.web.
start_runner_server
(project, runner, bind_address='localhost', port_number=8001, http_username=None, http_password=None)[source]¶ Starts a web interface for a manually run pipeline.
Unlike start_warrior_server(), this UI does not contain a configuration or project management panel.
web_util
Module¶
-
class
seesaw.web_util.
BaseWebAdminHandler
(application, request, **kwargs)[source]¶ Bases:
tornado.web.RequestHandler
-
prepare
()[source]¶ Called at the beginning of a request before get/post/etc.
Override this method to perform common initialization regardless of the request method.
Asynchronous support: Decorate this method with gen.coroutine or return_future to make it asynchronous (the asynchronous decorator cannot be used on prepare). If this method returns a Future, execution will not proceed until the Future is done.
New in version 3.1: Asynchronous support.
-