Hitachi Vantara Lumada and Pentaho Documentation

Pentaho Worker Nodes system best practices

This article describes hardware, networking, and operating system recommendations for running the Pentaho Worker Nodes product on one or more instances.

Resource best practices

This section provides basic resource requirements and best practices for running Pentaho Worker Nodes on one or more instances. You can scale your Worker Nodes environment based on your work item load.

This guideline uses the following definitions:

  • Minimum guidelines

    The minimum amounts of RAM, CPU, and available disk space required to run a system instance.

  • Best practice guidelines

    The best practice amounts of RAM, CPU, and available disk space for system instances.

A system in which all instances meet or exceed these best practices can index more documents and can process documents faster than a system in which the instances meet only the minimum amounts.

Resource              Minimum guidelines    Best practice guidelines
RAM                   16 GB                 32 GB
CPU                   4 cores               8 cores
Available disk space  50 GB                 500 GB

Important: Each instance uses all available RAM and CPU resources on the server or virtual machine on which it is installed.
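
As a rough illustration, the thresholds in the table above can be turned into a small pre-installation check. This is a minimal sketch, not part of the product: the function names `classify` and `measure_local_host` are hypothetical, and the measurement helper assumes a Linux host where `os.sysconf` exposes the physical page count.

```python
import os
import shutil

# Thresholds copied from the resource table: (minimum, best practice).
GUIDELINES = {
    "ram_gb": (16, 32),
    "cpu_cores": (4, 8),
    "disk_gb": (50, 500),
}


def classify(ram_gb, cpu_cores, disk_gb):
    """Return 'below minimum', 'minimum', or 'best practice' for one instance."""
    measured = {"ram_gb": ram_gb, "cpu_cores": cpu_cores, "disk_gb": disk_gb}
    # Any single resource under its minimum disqualifies the instance.
    if any(measured[key] < GUIDELINES[key][0] for key in GUIDELINES):
        return "below minimum"
    # Best practice requires every resource to reach the higher threshold.
    if all(measured[key] >= GUIDELINES[key][1] for key in GUIDELINES):
        return "best practice"
    return "minimum"


def measure_local_host(path="/"):
    """Read RAM, CPU count, and free disk space (in GiB) on a Linux host."""
    ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    free_bytes = shutil.disk_usage(path).free
    return ram_bytes / 1024**3, os.cpu_count() or 0, free_bytes / 1024**3
```

Calling `classify(*measure_local_host())` on a candidate server gives a quick first impression. A machine sold as "16 GB" usually reports slightly less usable memory, so treat a borderline "below minimum" result as a prompt to look closer rather than a verdict.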

Single-instance systems versus multi-instance systems

A system can consist of a single instance or of multiple instances (four or more). Each instance must meet the minimum RAM, CPU, and disk space requirements.

Single-instance system

A single-instance system is useful for testing and demonstration purposes. It requires only a single server and can perform all product functionality. However, a single-instance system has the following drawbacks:

  • It has a single point of failure. If the instance hardware fails, you lose access to the system.
  • With no additional instances, you cannot choose where to run services. All services run on that one instance.

Multi-instance system

A multi-instance system is a best practice for use in a production environment because it offers the following advantages:

  • You can control how services are distributed across the multiple instances, providing improved service redundancy, scale-out, and availability.
  • A multi-instance system can survive instance outages. For example, with a four-instance system running the default distribution of services, the system can lose one instance and remain available.
  • Performance is improved since work is performed in parallel across instances.
  • You can add additional instances to the system at any time.

You cannot convert a single-instance system to a production-ready multi-instance system by adding new instances, because the system does not support adding master instances. Master instances are special instances that run a particular set of system services. Single-instance systems have one master instance. Multi-instance systems have a minimum of three master instances.

If you add instances to a single-instance system, the system still has only one master instance, so there is a single point of failure for the essential services that only a master instance can run.

A multi-instance system should have a minimum of three master instances. You can add non-master instances (worker nodes) to a multi-instance system only if it was installed with at least three master instances.

Caution: Determine the IP addresses of the three master instances before you run the setup script. Once the system is installed, any IP change, such as changing single-instance IP values to multi-instance IP values, requires completely removing and reinstalling the system.

For information on adding instances to an existing system, see the Administrator Help, which is available from the Administration App.

Size of cluster

The number of nodes or masters in the cluster corresponds to the amount of fault tolerance you want to build into your system.

In a multi-node cluster, ZooKeeper maintains a quorum of nodes: the minimum number of masters that must be available for the cluster to function. ZooKeeper requires a strict majority, so the quorum is floor(N/2) + 1, where N is the number of masters. Example quorum sizes are as follows:

  • For 2 masters, quorum size is 2
  • For 3 masters, quorum size is 2
  • For 5 masters, quorum size is 3

Of these examples, the best practice is to create a cluster of 5 masters for reliability and easy management. With 5 masters, you can take 1 master out for maintenance and the cluster still keeps its quorum of 3 if one of the remaining masters fails.
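
The quorum sizes listed above are each a strict majority of the masters, which can be computed as floor(N/2) + 1. The following minimal sketch shows that arithmetic and the resulting fault tolerance. The function names are illustrative only and are not part of ZooKeeper or the product.

```python
def quorum_size(masters):
    """Smallest strict majority of `masters` voting members."""
    if masters < 1:
        raise ValueError("a cluster needs at least one master")
    return masters // 2 + 1


def tolerated_failures(masters):
    """How many masters can be unavailable while a quorum still exists."""
    return masters - quorum_size(masters)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(f"{n} masters: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} unavailable")
```

For 2, 3, and 5 masters this yields quorums of 2, 2, and 3, and tolerates 0, 1, and 2 unavailable masters respectively. That is why a 2-master cluster adds no fault tolerance over a single master, while a 5-master cluster can absorb one planned outage plus one failure.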

Docker and operating system requirements

To be a system instance, each server or virtual machine you provide must meet the following requirements:

  • Must have Docker version 1.13.1 or later installed.
  • Must run a 64-bit Linux distribution.

Note: You must install the current Docker version suggested by your operating system, unless that version is earlier than 1.13.1. The system cannot run with Docker versions prior to 1.13.1.

For more information about the Docker versions suggested by various operating systems, refer to the Administrator Help, which is available from the Administration App.
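
One way to check the Docker requirement on a candidate server is to compare the version reported by `docker --version` against 1.13.1. The sketch below assumes the usual output shape of that command (for example, `Docker version 1.13.1, build 092cba3`). The helper names are hypothetical.

```python
import re
import subprocess

MINIMUM_DOCKER = (1, 13, 1)


def parse_docker_version(text):
    """Extract the first major.minor.patch triple from `docker --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM_DOCKER):
    """True if the reported version is at least `minimum` (tuple comparison)."""
    return parse_docker_version(text) >= minimum


def installed_docker_version_text():
    """Run `docker --version`; requires the docker CLI to be on PATH."""
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

On a host with Docker installed, `meets_minimum(installed_docker_version_text())` reports whether the requirement is met. Comparing integer tuples rather than strings avoids ordering mistakes such as treating "1.9.0" as newer than "1.13.1".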

Networking

The following sections describe the network usage and requirements for both system instances and services. Keep the following in mind:

  • You must configure the network settings for each service when you install the system. You cannot change these settings after the system is up and running.
  • If your networking environment changes after you deploy the system, such that the system can no longer function with its current networking configuration, you need to reinstall the system. For more information about networking, see the installation guide included with your installation.

For more information about adding network security, see Enabling secure communication for Pentaho Worker Nodes.

Instance IP address requirements

All instance IP addresses must be static, including both internal and external network IP addresses, if applicable to your system.

If the IP address of any instance changes, see the installation guide included with your installation.

Network types

Each system service can bind to one type of network, either internal or external, for receiving incoming traffic. If your network infrastructure supports having two networks, you may want to isolate the traffic for most system services to a secured internal network that has limited access.

You can use either a single network type for all services or a mix of both types. If you want to use both types, every instance in your system must be addressable by two IP addresses: one on your internal network and one on your external network. If you use only one network type, each instance needs only one IP address.
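
The addressing rule above can be checked against an installation plan before you run the installer. The following is a minimal sketch under stated assumptions: the plan format (a dictionary of instance names mapping to `internal` and `external` addresses) and the function name are invented for illustration and do not correspond to any product configuration file.

```python
import ipaddress


def validate_instance_addresses(instances, uses_internal, uses_external):
    """Return a list of problems; an empty list means the plan is consistent.

    instances: dict of instance name -> {"internal": str or None,
                                         "external": str or None}
    uses_internal / uses_external: whether any service binds to that network type.
    """
    problems = []
    required = (("internal", uses_internal), ("external", uses_external))
    for name, addresses in instances.items():
        for kind, needed in required:
            if not needed:
                continue
            value = addresses.get(kind)
            if not value:
                problems.append(f"{name}: missing {kind} IP address")
                continue
            try:
                ipaddress.ip_address(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not a valid IP address")
    return problems
```

For a plan that mixes both network types, every instance must supply both addresses, so an instance with only an internal address is reported. The check cannot confirm that an address is static; that still has to be verified in your network configuration.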

Allowing access to external resources

Regardless of whether you are using a single network type or a mix of types, you need to configure your network environment to ensure that all instances have outgoing access to the external resources you want to use, including:

  • The data sources where your data is stored.
  • Identity providers for user authentication.
  • Email servers that you want to use for sending email notifications.
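
Outgoing access to the resources listed above can be spot-checked from each instance with a plain TCP connection attempt. This sketch only tests that a connection can be opened, not that authentication or the application protocol works. The host names and ports in `EXAMPLE_RESOURCES` are placeholders, not values from the product.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """True if a TCP connection to host:port can be opened within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Placeholder endpoints; replace with your own data sources,
# identity providers, and email servers.
EXAMPLE_RESOURCES = {
    "data source": ("db.example.com", 5432),
    "identity provider": ("ldap.example.com", 636),
    "email server": ("smtp.example.com", 587),
}


def check_resources(resources, timeout=3.0):
    """Map each resource name to whether it is reachable from this host."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in resources.items()
    }
```

Run `check_resources` on every instance, not just one, because firewall rules often differ per host.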

Ports

Each service binds to a number of ports for receiving incoming traffic.

Before installing the system, you can configure the services to use different ports, or use the default values shown below.

External ports

The following table contains information about the service ports that users use to interact with the system. On every instance in the system, each of these ports must be accessible from:

  • Any network that requires administrative or search access to the system.
  • Every other instance in the system.

Default Port Value  Service                   Purpose
8000                Admin-App                 Access to administrative interfaces: Administration App, Administrative REST API, Administrative CLI
38080               Content Execution Router  Entry point to Worker Node (non-secure setup)
38443               Content Execution Router  Entry point to Worker Node (non-secure setup)

If you are enabling security, you need to indicate a port value for secure communication. See Enabling secure communication for Pentaho Worker Nodes for more information.

Internal ports

Determine which ports each system service should use. You can use the default ports for each service or specify different ones. In either case, these restrictions apply:

  • Every port must be accessible from all instances in the system.
  • Some ports must be accessible from outside the system.
  • All port values must be unique; no two services can share the same port.
  • For information on port usage and requirements for each service, see Ports.
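
The uniqueness restriction above is easy to verify mechanically before installation. The following minimal sketch takes a planned mapping of services to ports and reports any port claimed by more than one service. The mapping format and function name are assumptions for illustration, and `Example-Service` is a made-up entry used only to show a conflict.

```python
from collections import defaultdict


def find_port_conflicts(service_ports):
    """Return {port: [services]} for every port assigned to more than one service.

    service_ports: dict of service name -> iterable of port numbers.
    """
    by_port = defaultdict(set)
    for service, ports in service_ports.items():
        for port in ports:
            by_port[port].add(service)
    return {
        port: sorted(services)
        for port, services in by_port.items()
        if len(services) > 1
    }


planned = {
    "Admin-App": [8000],
    "Content Execution Router": [38080, 38443],
    "Example-Service": [8000],  # hypothetical service that reuses a port
}
conflicts = find_port_conflicts(planned)
```

Here `conflicts` maps port 8000 to both `Admin-App` and `Example-Service`. An empty result means no two services share a port.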

You can find more information on how these ports are used in the documentation for the third-party software underlying each service.