- CHOOSE: "Workflow" → "Add new Workflow"
- ADD: Pre-configured modules in the correct order
- SET: startup schedule
🎉 READY give yourself a high-five 🎉
Workflow is the most important feature of WitCloud. Its task is to run previously configured modules. A Workflow specifies the order in which the Collect, Process, and Report functions are executed by combining them into a chain.
# An example of operation
- Every hour we combine the hits that were collected by the Google Analytics collector into sessions.
- We collect advertising data from the Google Ads system.
- We attach the data from Google Ads to the session table.
- We collect advertising data from the Facebook Ads system.
- We attach data from Facebook Ads to the session table.
- We update the data in reports in Data Studio.
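The sequence above can be pictured as an ordered list of steps that run one after another. The following is only a minimal sketch, not WitCloud code: the function names, the `state` dictionary, and the sample values are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, dict]
Step = Tuple[str, Callable[[State], None]]


def build_sessions(state: State) -> None:
    # Hypothetical stand-in for combining Google Analytics hits into sessions.
    state["sessions"] = {"session-1": {}, "session-2": {}}


def collect_google_ads(state: State) -> None:
    # Hypothetical stand-in for the Google Ads collector; the cost is made up.
    state["google_ads"] = {"session-1": 1.25}


def join_google_ads(state: State) -> None:
    # Attaching needs both the session table and the collected ads data.
    for session_id, cost in state["google_ads"].items():
        state["sessions"][session_id]["google_ads_cost"] = cost


def run_workflow(steps: List[Step], state: State) -> List[str]:
    """Run the steps strictly in the given order and return their names."""
    executed = []
    for name, action in steps:
        action(state)
        executed.append(name)
    return executed


HOURLY_WORKFLOW: List[Step] = [
    ("GA sessions", build_sessions),
    ("Google Ads collect", collect_google_ads),
    ("Google Ads join", join_google_ads),
]
```

In this sketch, running the join step before the collect step raises a `KeyError`, which mirrors why the order of steps in a chain matters.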
# Before you start
The Workflow module manages the order in which processes are executed. Before you start building a workflow, prepare its components (Collect, Process, and Report modules) in advance.
# Start creating the module
To create a new workflow, select workflow (1) from the menu on the left, and then Add new Workflow (2).
# Initial settings
First, name the workflow in field (1). Below it is a diagram of the sequence of events we are designing.
Individual modules are added by selecting the "+" sign.
Among the available options, we have:
- Map (Job Collection),
- Pass (Time range).
The operation of individual options is presented in detail later in the document.
The workflow runs in a specific direction, symbolically marked from the "START" field towards the "END" field. The order of adding individual processes is important.
After configuring the order of events, the last step of creating the workflow is to set its execution schedule by enabling the Schedule Workflow (1) option. You must select the time zone (2) in which the tasks will be performed. The workflow can be run at set intervals (3) (every specified number of hours, days, weeks, or months). After completing the configuration, click the "Create" button.
Task is the basic step type in a Workflow. The logic a task executes is a WitCloud module.
A step of the "Task" type is performed once (for one table) in the workflow; for repetitive tasks (loops), the appropriate function is "Map".
Tasks can be added after creating the workflow by clicking the "+" (2) sign on the diagram, and then selecting the “Task” icon.
Next comes the first step of configuring the task, where we name it (1), add a comment (2), and finally select the module we want to use (3a). The drop-down menu allows you to select the function (3b).
A task can be any previously configured module: Collect, Process, or Report.
We confirm the choice by clicking the "next" button. We then get a preview of our Workflow, with the task placed in it.
The map function should be used when we want to process data for many tables, e.g. from a specific time range previously defined with the **Pass function**.
The map function can be added after creating the workflow by clicking the "+" (2) sign on the diagram, and then selecting the “Map” icon.
In the second step of the configuration, we define the name (1), add an optional description (2) and set the method of loading data (3) for which we want to perform a specific operation by clicking the "+" icon (4).
In point (3) we have the option to choose:
- All at once - the option applies to the whole range of variables (e.g. dates) defined in the Pass function. The data is attached "at once", without dividing it into smaller batches, rather than iteratively in portions (e.g. 10 days at a time). Simultaneous processing is faster but has technical limitations: e.g. data read from spreadsheet files has a limited number of queries handled per second, which in some cases may result in the loss of some data.
- Specific number at once - the input variable range is divided into smaller packages, each containing the specified number of queries processed simultaneously. The process may take longer, but it provides safer data integration.
We recommend setting this number to 30 at most.
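The difference between the two loading methods can be illustrated by how a list of dates is split before processing. This is a minimal sketch under the assumption that a batch is simply a consecutive slice of the input range; `split_into_batches` is a hypothetical helper, not part of WitCloud.

```python
from datetime import date, timedelta
from typing import List, Optional


def split_into_batches(items: List[date], batch_size: Optional[int] = None) -> List[List[date]]:
    """Return one batch with everything, or consecutive batches of batch_size items."""
    if batch_size is None:
        # "All at once": the whole range is handled in a single pass.
        return [list(items)]
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # "Specific number at once": slice the range into smaller packages.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Example input: 25 consecutive days (the start date is arbitrary).
days = [date(2024, 1, 1) + timedelta(days=offset) for offset in range(25)]
all_at_once = split_into_batches(days)
in_packages = split_into_batches(days, batch_size=10)
```

With 25 days and a batch size of 10, the second call yields packages of 10, 10, and 5 days, each of which would be processed in turn.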
In the next step (4), we select the element to be executed for each portion of data. We can choose from Task, Choice, and Pass. These elements behave the same way as at the top level of the workflow, which allows you to nest them and build more complex functionality.
The pass function allows you to specify the date range for which the operations will be performed. This lets you analyze historical data and optimize the cost of current analyses.
The date range specified in "Pass" applies only to the next step in the Workflow, which must be a Map.
The date range can be added after creating the workflow by clicking the "+" (2) sign on the diagram, and then selecting the "Pass" icon.
In the second step of the configuration, we specify the name (1), add an optional description (2) and determine the method of specifying the date range (3).
We have the following options:
- Last X Days - this option specifies dates dynamically, in the range from the current date to X days back.
- Dynamic Date Range - this option specifies dates dynamically. It allows you to set a time range relative to the current date, e.g. a range from 14 to 3 days ago.
- Static Date Range - allows you to specify a fixed date range.
After you select the range, all dates that will be processed in the next step of the workflow are displayed.
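The three ways of specifying a range can be expressed with plain date arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that both ends of each range are inclusive and that "Last X Days" includes the current date, which the interface may handle differently.

```python
from datetime import date, timedelta
from typing import List


def date_span(start: date, end: date) -> List[date]:
    """List every date from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def last_x_days(x: int, today: date) -> List[date]:
    # Assumption: the range runs from x days back up to and including today.
    return date_span(today - timedelta(days=x), today)


def dynamic_date_range(from_days_ago: int, to_days_ago: int, today: date) -> List[date]:
    # Both ends are expressed relative to the current date.
    return date_span(today - timedelta(days=from_days_ago),
                     today - timedelta(days=to_days_ago))


def static_date_range(start: date, end: date) -> List[date]:
    # A fixed range that does not move with the current date.
    return date_span(start, end)


today = date(2024, 1, 15)
window = dynamic_date_range(14, 3, today)  # from 14 to 3 days ago
```

For `today = 2024-01-15`, the dynamic range from 14 to 3 days ago covers 2024-01-01 through 2024-01-12 under these assumptions.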
The choice function allows you to enter a condition that automatically splits the further workflow steps into branches. For example, it can be used to attach data that arrives periodically from CRM systems. The condition can be added after the workflow has been created by clicking the "+" (2) sign on the diagram, and then selecting the “Choice” icon.
In the second step of the configuration, we specify the name (1), add an optional description (2) and set the logical selection conditions (3).
The condition can be specified as one of the following:
- Is - meeting certain criteria,
- Not - excluding certain criteria,
- And - meeting all of the criteria,
- Or - meeting at least one of the criteria.
The following criteria operators are available:
- == - exactly matches the value,
- < - less than the value,
- > - greater than the value,
- <= - less than or equal to the value,
- >= - greater than or equal to the value.
After completing the definition of the condition, click "Add". The screen shows the updated diagram of our workflow, forked where the “Choice” was added. In the next steps, we add more elements on the appropriate branches by clicking “+” on the branch taken when the condition is met (1) or on the branch taken when it is not met (2).
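The way condition types and operators combine can be sketched as a small recursive evaluator. This is not WitCloud's implementation: the dictionary layout (`type`, `field`, `op`, `value`) and the `evaluate` function are assumptions chosen only to illustrate how Is, Not, And, and Or could nest.

```python
import operator
from typing import Any, Dict

# The comparison operators listed above, mapped to Python functions.
COMPARATORS = {
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def evaluate(condition: Dict[str, Any], record: Dict[str, Any]) -> bool:
    """Return True when the record satisfies the (possibly nested) condition."""
    kind = condition["type"]
    if kind == "is":
        compare = COMPARATORS[condition["op"]]
        return compare(record[condition["field"]], condition["value"])
    if kind == "not":
        return not evaluate(condition["condition"], record)
    if kind == "and":
        return all(evaluate(c, record) for c in condition["conditions"])
    if kind == "or":
        return any(evaluate(c, record) for c in condition["conditions"])
    raise ValueError(f"unknown condition type: {kind}")


# Hypothetical condition: at least 1 row and fewer than 1000 rows.
has_rows = {"type": "is", "field": "rows", "op": ">=", "value": 1}
is_small = {"type": "is", "field": "rows", "op": "<", "value": 1000}
both = {"type": "and", "conditions": [has_rows, is_small]}

branch = "condition met" if evaluate(both, {"rows": 250}) else "condition not met"
```

The value of `branch` then decides which of the two forks of the workflow continues.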
# Best practice
# Order of modules
The order of tasks performed
The correct order of steps in a Workflow chain is crucial. Think of the order in terms of the data being added to the main session table in "small steps". For example, before attaching advertising and cost data from Google Ads, run:
- Process sessions - Collect Google Analytics module
- Download Google Ads data - Google Ads collect module
Only then should you start the Google Ads process.
The following are examples of incorrectly configured Workflows:
In the first case, the Google Ads data collector _(Google Ads Collect)_ was not selected, so the executed Google Ads - join ads with GA session module had no data to attach to the session table.
In the second case, all the necessary modules are present, but the order in which they were executed was incorrect. Run the Collect modules responsible for collecting data first, and only then the modules that attach the data to the session table or transform it.
Examples of properly prepared Workflows are presented below:
The first example shows a standard solution where session data is first collected in one chain (GA sessions module), then advertising data is collected (Google Ads - Collect module), and in the last step, advertising data is attached to the session table (module Google Ads - join with GA sessions).
In the second example, the Workflow is responsible only for advertising data. In the first step, advertising data is collected (Google Ads - Collect module), and then it is attached to the session table (Google Ads - join with GA sessions module). For such a solution to work properly, a separate Workflow must be created in which the sessions are processed. It must run before the Workflow that attaches advertising data; otherwise the information will not be attached because no session table will exist yet. The execution order of Workflows can be set by selecting their start times. Keep in mind that this solution, although correct, may cause problems if, for example, one of the Workflows takes a long time to process: if it ends after the next Workflow has started, the data will not be joined correctly.
Workflow for beginners
The easiest way to create a Workflow is to include the modules in the following order:
- Collect modules
- Process modules
- Report modules
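The beginner ordering above can be checked mechanically. The sketch below is a hypothetical helper, not a WitCloud feature; it assumes every step is labeled with one of the three stages and only verifies that no stage appears before an earlier one.

```python
from typing import List

# Rank of each stage in the recommended order.
STAGE_ORDER = {"collect": 0, "process": 1, "report": 2}


def is_beginner_order(stages: List[str]) -> bool:
    """Return True when collect steps precede process steps, which precede report steps."""
    ranks = [STAGE_ORDER[stage] for stage in stages]
    return all(earlier <= later for earlier, later in zip(ranks, ranks[1:]))


good_chain = ["collect", "collect", "process", "process", "report"]
bad_chain = ["process", "collect", "report"]
```

Here `is_beginner_order(good_chain)` holds, while `bad_chain` fails the check because a process step runs before the data it needs has been collected.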
# Map function
Flexible date range
It is good practice to build each workflow in the form of Map - Pass. With the tasks placed in one "container", it is easy to quickly define the date range to be processed. This is useful for processing historical data. It also enables quick "patching" of data missing for individual days, e.g. after technical problems.
If the planned activities are broken down into several smaller workflows, remember that a given Collect module should be used only once across all of them. Each time the process runs, the data is overwritten, so all information added later (e.g. data joined from Google Ads and Facebook Ads) is removed from the table when sessions are re-processed for a period that was already processed.