Building workflows

A workflow chains multiple blocks together: for example, running several tasks in a row, applying conditional logic, or extracting data to a CSV. All of these ideas will be supported within our workflows feature.

All of our workflows are defined in YAML and chain multiple components together to generate a defined output.

Today, we’re building the workflows for most of our customers as we iterate on the specification. This is a cumbersome experience, and we are improving our web application to make this process significantly easier.

Request - Create Workflow (YAML)

POST api.skyvern.com/api/v1/workflows

Use this API to create a workflow. The response contains a workflow_permanent_id, which you can use to run the workflow.

| Parameter | Type | Required? | Sample Value | Description |
| --- | --- | --- | --- | --- |
| title | String | yes | Calculate a product price diff % | A title for a workflow |
| description | String | yes | Compare two products’ price diff % on alibaba vs newlabelwholesale | A description for a workflow |
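
As a rough sketch of what calling this endpoint might look like, the snippet below builds the POST request with Python's standard library without sending it. The https:// scheme, the x-api-key header name, the application/x-yaml content type, and the empty workflow_definition are assumptions made for illustration; this page does not confirm them.

```python
import urllib.request

# Assumptions (not confirmed by this page): https:// scheme, x-api-key header,
# and the application/x-yaml content type.
CREATE_URL = "https://api.skyvern.com/api/v1/workflows"

# Placeholder payload: title and description come from the table above,
# the empty workflow_definition is only there to keep the sketch short.
WORKFLOW_YAML = """\
title: Calculate a product price diff %
description: Compare two products' price diff % on alibaba vs newlabelwholesale
workflow_definition:
  parameters: []
  blocks: []
"""


def build_create_request(workflow_yaml: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a workflow."""
    return urllib.request.Request(
        CREATE_URL,
        data=workflow_yaml.encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/x-yaml"},
        method="POST",
    )


request = build_create_request(WORKFLOW_YAML, api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; the workflow_permanent_id would then be read from the response body.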

Workflow Parameters

Workflow parameters are the inputs you pass into a workflow so that it can execute.

| Parameter | Type | Required? | Sample Value | Description |
| --- | --- | --- | --- | --- |
| key | String | yes | alibaba_url | Unique key corresponding to a specific parameter |
| parameter_type | Enum | yes | workflow | The type of parameter for the workflow. Indicates whether this parameter is passed in via the run workflow endpoint (workflow) or is the output of a different workflow step (output). Can be workflow, context, aws_secret, or output |
| workflow_parameter_type | Enum | no? | string | The actual type of the parameter, used for type safety. Supported types: STRING = "string", INTEGER = "integer", FLOAT = "float", BOOLEAN = "boolean", JSON = "json", FILE_URL = "file_url" |
| description | string | yes | Alibaba product URL for checking the price of the product | Description of the parameter |
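
To make the table concrete, here is a minimal sketch of a local sanity check for one parameter definition. The check_parameter helper is hypothetical and not part of Skyvern; the allowed values are copied from the table, and the example workflow later on this page also uses bitwarden_login_credential, so the parameter_type set is probably not exhaustive.

```python
# Values copied from the table above. The list of parameter types is likely
# incomplete (the example workflow also uses bitwarden_login_credential).
PARAMETER_TYPES = {"workflow", "context", "aws_secret", "output"}
WORKFLOW_PARAMETER_TYPES = {"string", "integer", "float", "boolean", "json", "file_url"}


def check_parameter(param: dict) -> list[str]:
    """Return a list of problems found in one parameter definition."""
    problems = []
    if not param.get("key"):
        problems.append("missing key")
    if param.get("parameter_type") not in PARAMETER_TYPES:
        problems.append(f"unknown parameter_type: {param.get('parameter_type')!r}")
    value_type = param.get("workflow_parameter_type")
    if value_type is not None and value_type not in WORKFLOW_PARAMETER_TYPES:
        problems.append(f"unknown workflow_parameter_type: {value_type!r}")
    return problems


alibaba_url = {
    "key": "alibaba_url",
    "parameter_type": "workflow",
    "workflow_parameter_type": "string",
    "description": "Alibaba product URL for checking the price of the product",
}
print(check_parameter(alibaba_url))  # prints []
```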

Blocks

Blocks are the building blocks (pun intended) of Skyvern’s workflows. Each block is one discrete task you want to occur. Multiple blocks may be chained together, with outputs from one block being fed as inputs to the next block.

| Parameter | Type | Required? | Sample Value | Description |
| --- | --- | --- | --- | --- |
| block_type | Enum | yes | Task | Signifies the type of block this is in the workflow |
| label | String | yes | get_alibaba_price | The unique identifier for this block within this workflow |
| parameter_keys | Array | yes | parameter_keys: [alibaba_price, NLW_price] | The list of parameters this block depends on for execution |
| output_parameter_key | string | yes | output_parameter_key: price_diff_percentage | The optional output of the block, so that it may be used by other blocks |
| {{ block specific parameters }} | ?? | yes | | Other parameters, specific to the block_type specified above. These are covered below |

Building blocks supported today:

  1. TaskBlock: The magic block. Skyvern navigates through the websites to take actions and/or extract information.
  2. ForLoopBlock
  3. CodeBlock
  4. TextPromptBlock
  5. DownloadToS3Block
  6. UploadToS3Block
  7. SendEmailBlock
  8. FileParserBlock

To read about specific blocks, check out the block documentation.

Managing Credentials and Sensitive Information

This is something the Skyvern team will need to set you up with today. If you’re interested, please book a call with Suchintan.

Common concepts

continue_on_failure

The continue_on_failure flag indicates whether subsequent blocks should still run after a block fails.
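
The semantics can be illustrated with a toy runner. This is only a sketch of the behavior described above, not Skyvern's executor; the run_blocks function, the "run" callables, and the default of false when the flag is absent are all assumptions.

```python
def run_blocks(blocks: list[dict]) -> list[str]:
    """Run blocks in order and return the labels that were executed.

    A failed block stops the run unless it sets continue_on_failure.
    Treating a missing flag as False is an assumption of this sketch.
    """
    executed = []
    for block in blocks:
        executed.append(block["label"])
        succeeded = block["run"]()
        if not succeeded and not block.get("continue_on_failure", False):
            break
    return executed


blocks = [
    {"label": "login", "run": lambda: True},
    {"label": "download_invoice", "run": lambda: False, "continue_on_failure": True},
    {"label": "send_email", "run": lambda: True},
]
print(run_blocks(blocks))  # prints ['login', 'download_invoice', 'send_email']
```

Without the flag on download_invoice, the same run would stop there and send_email would never execute.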

error_code_mapping

Maps errors to specific error codes so you can have deterministic outputs.

persist_browser_session

The persist_browser_session flag indicates whether the browser session should be retained between different workflow runs. When enabled, it uses the same user_data_dir for each run and updates it at the end of each run. This is useful for maintaining the browser state, such as login sessions and cookies, across multiple runs of the same workflow, leading to more efficient and seamless execution.

Note: This flag is set at the workflow level, not the block level, meaning it applies to the entire workflow’s session persistence rather than individual blocks.

output_parameter_key (autogenerated)

Specifies the output parameter of a specific block so it can be re-used in a subsequent block.

Its format is always: {label}_output

For example, the output parameter for the block below (which can be referenced in subsequent blocks) would be login_output:

- block_type: task
  label: login
  parameter_keys:
    - credentials
  url: website_url
  navigation_goal: >-
    If you're not on the login page, navigate to login page and login using
    the credentials given. First, take actions on promotional popups or cookie prompts that could prevent taking other action on the web page. If you fail to login to find the login page or can't login after several trials, terminate. If login is
    completed, you're successful.
  data_extraction_goal: >-
    Extract anything for the sake of this demo
  error_code_mapping:
    stuck_with_popups: terminate and return this error if you can't close popups after several tries and can't take the necessary actions on the website because there is a blocking popup on the page
    failed_to_login: terminate and return this error if you fail logging in to the page
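
The naming rule is simple enough to express in one line. The sketch below derives the key from a label and shows how a context parameter might reference it; the helper name and the login_result key are hypothetical and only serve the illustration.

```python
def output_parameter_key(label: str) -> str:
    """Return the autogenerated output parameter key for a block label."""
    return f"{label}_output"


# A context parameter can then point at the block's output through
# source_parameter_key. The key name login_result is made up for this sketch.
context_parameter = {
    "parameter_type": "context",
    "key": "login_result",
    "source_parameter_key": output_parameter_key("login"),
}
print(context_parameter["source_parameter_key"])  # prints login_output
```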

Example workflow

title: Invoice Downloading Demo (Jun 13)
description: >-
  Login to the website, download all the invoices after a date, email the
  invoices
workflow_definition:
  parameters:
    - key: website_url
      parameter_type: workflow
      workflow_parameter_type: string
    - key: credentials
      parameter_type: bitwarden_login_credential
      bitwarden_client_id_aws_secret_key: SECRET
      bitwarden_client_secret_aws_secret_key: SECRET
      bitwarden_master_password_aws_secret_key: SECRET
      bitwarden_collection_id: SECRET
      url_parameter_key: website_url
    - key: invoice_retrieval_start_date
      parameter_type: workflow
      workflow_parameter_type: string
    - key: smtp_host
      parameter_type: aws_secret
      aws_key: SKYVERN_SMTP_HOST_AWS_SES
    - key: smtp_port
      parameter_type: aws_secret
      aws_key: SKYVERN_SMTP_PORT_AWS_SES
    - key: smtp_username
      parameter_type: aws_secret
      aws_key: SKYVERN_SMTP_USERNAME_SES
    - key: smtp_password
      parameter_type: aws_secret
      aws_key: SKYVERN_SMTP_PASSWORD_SES
    - parameter_type: context
      key: order_history_url
      source_parameter_key: get_order_history_page_url_and_qualifying_order_ids_output
    - parameter_type: context
      key: order_ids
      source_parameter_key: get_order_history_page_url_and_qualifying_order_ids_output
    - parameter_type: context
      key: order_id
      source_parameter_key: order_ids
  blocks:
    - block_type: task
      label: login
      parameter_keys:
        - credentials
      url: website_url
      navigation_goal: >-
        If you're not on the login page, navigate to login page and login using the credentials given, and then navigate to the personal account page. First, take actions on promotional popups or cookie prompts that could prevent taking other action on the web page. Then, try to login and navigate to the personal account page. If you fail to login to find the login page or can't login after several trials, terminate. If you're on the personal account page, consider the goal is completed.
      error_code_mapping:
        stuck_with_popups: terminate and return this error if you can't close popups after several tries and can't take the necessary actions on the website because there is a blocking popup on the page
        failed_to_login: terminate and return this error if you fail logging in to the page
    - block_type: task
      label: get_order_history_page_url_and_qualifying_order_ids
      parameter_keys:
        - invoice_retrieval_start_date
      navigation_goal: Find the order history page. If there are no orders after the given start date, terminate.
      data_extraction_goal: >-
        You need to extract the order history page url by looking at the current
        page you're on. You need to extract contact emails you see on the page. You also need to extract the order ids for orders that
        happened on or after invoice_retrieval_start_date. Make sure to filter
        only the orders that happened on or after invoice_retrieval_start_date. You need to compare each order's date with the invoice_retrieval_start_date. You can only include an order in the output if the order's date is after or the same as the invoice_retrieval_start_date.
        While comparing dates, first compare year, then month, then day. invoice_retrieval_start_date
        is in YYYY-MM-DD format. The dates on the websites may be in different formats, compare accordingly and compare year, month, and day.
      error_code_mapping:
        failed_to_find_order_history_page: return this error if you can't find the order history page on the website
        no_orders_found_after_start_date: return this error if there are no orders after the specified invoice_retrieval_start_date
      data_schema:
        type: object
        properties:
          order_history_url:
            type: url
            description: >-
              The exact URL of the order history page. Do not make any
              assumptions. Return the URL that's passed along in this context.
          contact_emails:
            type: array
            items:
              type: string
              description: Contact email for the ecommerce website you're on. If you can't find any return null
          date_comparison_scratchpad:
            type: string
            description: >-
              You are supposed to filter the orders that happened on or after the invoice_retrieval_start_date. Think through how you will approach this task step-by-step here. Consider these before starting the comparison:
              - What format is the order date in? How can you parse it into a structured format?
              - What is the correct way to compare two dates?
              - How will you compare the order dates to the invoice_retrieval_start_date?

              Write out your thought process before filling out the order_ids field below. Remember, the original date may be in any format, so parse it carefully! The invoice_retrieval_start_date will be an exact date you can directly compare against in the format YYYY-MM-DD.
          order_ids:
            type: array
            items:
              type: object
              properties:
                order_date:
                  type: iso-8601-date-string
                order_id:
                  type: string
            description: >-
              Return a list of order id strings. Do not return order ids of
              orders that happened before the specified
              invoice_retrieval_start_date
    - block_type: for_loop
      label: iterate_over_order_ids
      loop_over_parameter_key: order_ids
      continue_on_failure: true
      loop_blocks:
        - block_type: task
          label: download_invoice_for_order
          complete_on_download: true
          continue_on_failure: true
          parameter_keys:
            - order_id
          url: order_history_url
          navigation_goal: Download the invoice of the order with the given order ID. Make sure to download the invoice for the given order id. If the element tree doesn't have a matching order id, check the screenshots. Complete if you have successfully downloaded the invoice according to action history, if you were able to download it, you'll see download_triggered=True for the last step. If you don't see a way to download an invoice, navigate to the order page if possible. If there's no way to download an invoice terminate. If the text suggests printing, you can assume you can download it. Return click action with download=True if you want to trigger a download.
          error_code_mapping:
            not_possible_to_download_invoice: return this error if the website doesn't allow downloading/viewing invoices
            cant_solve_captcha: return this error if captcha isn't solved after multiple retries
    - block_type: upload_to_s3
      label: upload_downloaded_files_to_s3
      path: SKYVERN_DOWNLOAD_DIRECTORY
    - block_type: send_email
      label: send_email
      smtp_host_secret_parameter_key: smtp_host
      smtp_port_secret_parameter_key: smtp_port
      smtp_username_secret_parameter_key: smtp_username
      smtp_password_secret_parameter_key: smtp_password
      sender: hello@skyvern.com
      recipients:
        - founders@skyvern.com
      subject: Skyvern - Downloaded Invoices Demo
      body: website_url
      file_attachments:
        - SKYVERN_DOWNLOAD_DIRECTORY
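
One way to catch wiring mistakes in a workflow like this before submitting it is to check that every key a block uses is declared under parameters. The sketch below does that on a trimmed, hand-written dictionary version of the example (keys and labels only). The undeclared_keys helper is hypothetical, and the rule that every parameter_keys or loop_over_parameter_key entry must be declared is an assumption this page implies but does not state.

```python
def iter_blocks(blocks):
    """Yield every block, descending into the loop_blocks of for_loop blocks."""
    for block in blocks:
        yield block
        yield from iter_blocks(block.get("loop_blocks", []))


def undeclared_keys(definition: dict) -> list[tuple[str, str]]:
    """Return (block label, key) pairs for keys that no parameter declares."""
    declared = {param["key"] for param in definition["parameters"]}
    missing = []
    for block in iter_blocks(definition["blocks"]):
        used = list(block.get("parameter_keys", []))
        if "loop_over_parameter_key" in block:
            used.append(block["loop_over_parameter_key"])
        missing += [(block["label"], key) for key in used if key not in declared]
    return missing


# Trimmed copy of the example workflow: only keys, labels and references.
definition = {
    "parameters": [
        {"key": key}
        for key in [
            "website_url", "credentials", "invoice_retrieval_start_date",
            "smtp_host", "smtp_port", "smtp_username", "smtp_password",
            "order_history_url", "order_ids", "order_id",
        ]
    ],
    "blocks": [
        {"label": "login", "parameter_keys": ["credentials"]},
        {
            "label": "get_order_history_page_url_and_qualifying_order_ids",
            "parameter_keys": ["invoice_retrieval_start_date"],
        },
        {
            "label": "iterate_over_order_ids",
            "loop_over_parameter_key": "order_ids",
            "loop_blocks": [
                {"label": "download_invoice_for_order", "parameter_keys": ["order_id"]},
            ],
        },
        {"label": "upload_downloaded_files_to_s3"},
        {"label": "send_email"},
    ],
}
print(undeclared_keys(definition))  # prints []
```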

Update Workflows (YAML)

PUT api.skyvern.com/api/v1/workflows/{workflow_permanent_id}

Workflows are versioned. Each time you create or update a workflow, you get a new workflow_id, but the workflow_permanent_id remains the same.

The update workflow API payload is exactly the same as the create workflow API payload: a YAML document.
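
A minimal sketch of building the update request follows, again without sending it. As before, the https:// scheme, the x-api-key header, and the application/x-yaml content type are assumptions, and wpid_example is a made-up identifier standing in for a real workflow_permanent_id.

```python
import urllib.request

# Assumptions (not confirmed by this page): https:// scheme, x-api-key header,
# and the application/x-yaml content type.
UPDATE_URL_TEMPLATE = "https://api.skyvern.com/api/v1/workflows/{workflow_permanent_id}"


def build_update_request(
    workflow_permanent_id: str, workflow_yaml: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that updates a workflow."""
    url = UPDATE_URL_TEMPLATE.format(workflow_permanent_id=workflow_permanent_id)
    return urllib.request.Request(
        url,
        data=workflow_yaml.encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/x-yaml"},
        method="PUT",
    )


# wpid_example is a placeholder, not a real workflow_permanent_id.
update_request = build_update_request(
    "wpid_example",
    "title: Invoice Downloading Demo (Jun 13)\n",
    api_key="YOUR_API_KEY",
)
print(update_request.get_method(), update_request.full_url)
```

Because the permanent ID stays fixed across versions, the same URL can be reused for every later update of the workflow.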