Blockbridge 4.1 is the GA release of the AnyScale architecture. It includes improvements across the entire product. Highlights include:

  • Ethernet-attached backplanes with high-availability support for NVMe & SATA
  • improved data mobility using remote basis clones
  • enhanced monitoring, notification and support capabilities
  • simplified base imaging via ISO and USB
  • integration with Kubernetes via CSI

1. Software Installation

Bare-metal and Virtual Imaging via ISO

Installation of Blockbridge on bare-metal platforms and in virtual environments is simplified through the use of a “Live-style” bootable ISO. Boot the ISO (from USB or virtual CD-ROM, in UEFI or legacy mode), select the disk media to install on, and write an image for the desired installation type. The platform characteristics are auto-detected and the kernel is configured with the appropriate parameters. Upon booting into the image, the system is ready for configuration. Additionally, you can specify the initial network configuration directly from the Live shell.

OpenStack Cloud Images

For installation of Blockbridge in an OpenStack virtual environment, cloud images are available for all installation types. Select the desired installation type and launch a new instance using the appropriate image as its basis. Upon booting the instance, the system is ready for configuration. Coupled with supplied user-data, the installation flow can be completely automated.
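
Coupled with an autosetup configuration (described in the next section), first-boot automation can be expressed as ordinary cloud-init user-data. The snippet below is a hypothetical sketch only: the configuration path and the autosetup invocation are assumptions, not documented syntax.

  #cloud-config
  # Hypothetical sketch: stage a Blockbridge autosetup configuration and run it
  # on first boot. The file path and the autosetup invocation are assumptions.
  write_files:
    - path: /etc/blockbridge/autosetup.yml
      permissions: '0600'
      content: |
        install:
          type: dataplane
  runcmd:
    - autosetup --config /etc/blockbridge/autosetup.yml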

Automated Base Software Configuration

Autosetup is a utility that automates configuration and setup of all installation types. Describe the system through a declarative YAML configuration file, and autosetup will install or upgrade the software, configure high-availability clusters, publish disk media for consumption from the local backplane, set initial user authentication, and perform all other system configuration required to get up and running. The autosetup utility shines by intelligently applying configuration, pulling in platform-specific options, and performing operations through the management control plane and across the cluster.
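
The full configuration schema ships with the product documentation; purely as an illustrative sketch, a minimal autosetup description might cover the installation type, cluster membership, and initial credentials along these lines (all keys and values below are assumptions, not the actual schema):

  # hypothetical autosetup description -- key names are illustrative only
  install:
    type: dataplane
  cluster:
    name: lab-cluster
    members: [node-a, node-b]
  auth:
    admin_user: admin
  network:
    management_interface: team0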

CentOS 7.5

The latest CentOS 7.5 release is now supported and recommended. It is used as the basis for all pre-built bare-metal and cloud images.

2. Customer Support

Remote Support

For customers needing more hands-on support, the Remote Support capability makes it easy to open a secure link to Blockbridge directly from an installed system. Blockbridge support engineers are able to log in and manage the system, diagnose and troubleshoot issues, or work with the customer to implement best practices. Alternatively, diagnostics and platform information may be uploaded securely by the customer to Blockbridge without providing direct system access.

Email / Slack Notifications

Through the system monitoring subsystem, email and Slack notifications can be configured to alert support personnel of actionable conditions on the chassis. These alerts provide detailed information regarding cluster member failures, system software going offline, and more.

3. Chassis

System Monitoring and Notifications

The new probectl command serves as the one-stop shop for displaying and configuring monitoring information and notifications. Blockbridge software comes with a pre-defined set of probes that report on the health and status of various subsystems. When a probe indicates an actionable condition, an alert is raised and any configured email or Slack notifications are triggered.

Enclosure Monitoring

A platform-specific chassis definition file provides enclosure and slot information for attached disk media, with specific support for NVMe and SATA devices. The system monitors the power supplies, fans, and disks in the enclosure and reports any changes in status.
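
The chassis definition format is supplied per platform by Blockbridge; conceptually (as a rough, hypothetical sketch rather than the real schema), it maps physical slots to device types and addresses:

  # hypothetical chassis definition sketch -- field names are assumptions
  enclosure:
    model: example-2u24
    slots:
      - id: 0
        type: nvme
        address: "pci-0000:5e:00.0"
      - id: 1
        type: sata
        address: "ata-3.0"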

4. Platform

Blockbridge 4.14 Kernel

Blockbridge now ships and supports its own kernel based on the latest Long Term Support (LTS) Linux kernel, version 4.14. This kernel provides full NVMe support, the latest network drivers, and the latest bug fixes from upstream Linux. The bare-metal and cloud images include the Blockbridge kernel built-in.

netcfg

The netcfg command simplifies network interface configuration, and is available both during the imaging process and after installation. View current network settings, rename network interfaces for consistency, set up teams, and configure IP addresses, MTU settings, and all other aspects of network interface configuration from the command line.

kernelcfg

Download the latest Blockbridge kernel, specify the kernel boot order, detect platform-specific options, and set kernel parameters through the kernelcfg command. This command abstracts the usual “grub” bootloader tools to reduce complexity and the chance for error, while providing a simple user interface.

Tuning Daemon

Included with the Blockbridge software is a new platform-specific tuning framework, where IRQ affinities, Ethernet coalescing, and process affinities are configured optimally and automatically. These settings provide lower latency, higher performance, and better system behavior across the platform.

5. High-Availability Clustering

Virtual Votes behind NAT

A vote node is critical for operation of the cluster in the event of node failure. While vote nodes could always be installed in a virtual environment, they are now supported when operating behind NAT. For example, this feature allows a vote node to be configured in OpenStack using private addressing and Floating IPs.

Persistent Reservations Updates

In a cluster configured with shared storage, Blockbridge can leverage SCSI persistent reservations to provide data protection guarantees, ensuring only a valid active cluster member is able to perform writes to disk. This mechanism was updated for compatibility with a wider range of media types.

Service Management Improvements

The internal clustering software has been updated to provide better, more consistent service management and monitoring. This means that service outages are caught earlier, allowing faster failover recovery. Additionally, “systemd” is used for service management in the cluster, providing more consistent feedback on service states and further reducing service start times.

6. HEAL / RAID

Managed System Array

The system array is the foundation for storing the Controlplane configuration and Dataplane journal in a reliable and fault-tolerant way. Create the system array using a rich query syntax (in autosetup or manually) to ensure data is stored redundantly across shared or Ethernet-attached storage. The HEAL subsystem automatically performs monitoring and repair operations for the array, utilizing a bitmap to optimize recovery and reduce the time required to resynchronize data.
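
The query syntax itself is documented with the product; as a purely hypothetical illustration of the idea, a redundancy requirement for the system array might be expressed in autosetup roughly as follows (keys and query terms below are assumptions):

  # hypothetical system array selection -- not the actual query syntax
  system_array:
    redundancy: 2
    media_query: "ssd AND failure_domain:distinct"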

Cluster-aware RAID membership

To guarantee data integrity for HEAL volumes and system arrays, an mdvote agent is tightly integrated with the RAID subsystem. This mdvote agent provides a consistent store for the versioned array membership (tracking any changes to the array), which is replicated across all members of the cluster. Wherever an array is running in the cluster, mdvote ensures the array is always assembled with valid members, and old disks that are no longer valid for the array are rejected. This external “vote” is critical during power-loss recovery or when previously offline enclosures come back online: the array validates its membership, and data integrity is maintained.

Ethernet-Attached RAID

HEAL volumes are now constructed with disk media from Ethernet-attached enclosures and disk nodes, if available. Building and repairing arrays with drives selected across failure domains is now a configuration checkbox, automatically providing data redundancy that is sustained through power outages, node replacements, and maintenance windows.

7. Tools

Improved Host Attach

Blockbridge Host Attach functionality, available in our command-line tools (CLI), provides improved mapping information between host devices and backend virtual disks, and can optionally persist the attachment across host reboots. Additionally, detaching a disk from the host is now possible by specifying the host device path, target IQN, virtual disk UUID, or attachment reference, allowing for easier scripted and interactive use.

Disk Create Subcommands

Creating a “clone” of a disk or an “external” disk has been improved by separating these operations out of the base disk “create” command into their own subcommands, allowing for a more straightforward approach.

8. Controlplane

Extended SU Support

SU, or “switch user”, support has been extended to OpenStack users authenticated through Blockbridge. Additionally, permissions for your own user are now retained when switching to a different user. This allows an administrator to perform actions on behalf of other users.

Support for Team Interfaces

Configured Team interfaces are now fully supported as top-level configuration objects (and show proper nesting for lower interfaces in the GUI). Status flows from lower to upper interfaces as appropriate.

Support for Enclosures

Chassis-specific enclosure support displays as a configuration tab, with status and information reporting on fans, power supplies, and disk media present in the enclosure.

Support for Disk Nodes

Ethernet-attached disk nodes appear as a proper disk enclosure, presenting their disk media for consumption by the attached dataplanes.

9. Dataplane

Remote Basis Clones

A remote basis clone is a virtual disk whose backing data source comes from either a) an object store backup of a snapshot, or b) a snapshot of a virtual disk from another dataplane (potentially at a remote site). Remote basis clones support immediate online fault-in: as data is accessed by the client, it is copied from the remote basis on demand. This capability provides a backup-and-restore workflow with minimal downtime, as the client does not have to wait for the entire dataset to be restored before accessing data.

10. Backplane

Disk Node

A Disk Node is a new Blockbridge installation type that publishes local disks for consumption by Blockbridge dataplanes. A Disk Node presents itself as an Ethernet-connected disk enclosure, available to dataplanes in the cluster, containing a set of disk slots with media present, fans, and power supplies. Optionally, a Disk Node can be enabled and converged within a dataplane, rather than as a separate platform installation. This allows a dataplane to also present an enclosure backplane and publish its local disks for consumption by other dataplanes in the cluster.

Dynamic Ethernet-Attached Storage

For a dataplane to consume storage from a Disk Node (separate or converged), the backplane must be configured as a service available to the dataplane client. System arrays and HEAL volumes query the configured backplanes for storage matching the desired characteristics and failure domain, perform an ownership request for the particular disks, and dynamically attach the remote disks to the dataplane. An array is formed out of the matching set of disks, and is dynamically repairable in the case of backplane or disk failure, maintaining allocation constraints across the requested disk properties and failure domains. For low latency, RDMA is supported from dataplane to backplane. Header and data checksums provide additional integrity guarantees.

11. Integrations

CSI Volume Plugin for Kubernetes

Blockbridge continues its long-standing commitment to the container ecosystem by developing and providing a CSI Volume Plugin for Kubernetes for use with Blockbridge storage. Deployed as a StatefulSet/DaemonSet with RBAC, the Blockbridge plugin follows current best practices for CSI plugins. Simply provide an API URL and access token for the Blockbridge Controlplane via a secret, and the plugin dynamically creates Blockbridge volumes, attaches them, and makes them available to Kubernetes pods. The Blockbridge CSI plugin currently tracks CSI version 0.2 for Kubernetes 1.11 but is expected to continue evolving to support Kubernetes 1.12 and beyond as the CSI specification moves out of beta by the end of the year.
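
Once the plugin is deployed, dynamic provisioning follows the standard Kubernetes pattern of a StorageClass plus a PersistentVolumeClaim. The sketch below assumes a provisioner name of “csi.blockbridge.com”; consult the plugin documentation for the actual value and any supported parameters.

  # StorageClass referencing the Blockbridge CSI driver
  # (the provisioner value is an assumption)
  kind: StorageClass
  apiVersion: storage.k8s.io/v1
  metadata:
    name: blockbridge-gp
  provisioner: csi.blockbridge.com
  ---
  # Claim a 10 GiB volume to be dynamically provisioned by the plugin
  kind: PersistentVolumeClaim
  apiVersion: v1
  metadata:
    name: data
  spec:
    accessModes: [ReadWriteOnce]
    storageClassName: blockbridge-gp
    resources:
      requests:
        storage: 10Gi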