The SAKA 2.0 was designed to be a highly available appliance within a cluster, replicating all transactions asynchronously to all other nodes of the cluster. As such, applications can communicate with any node of the SAKA cluster to consume webservices. However, applications must either be designed to deal with clustered nodes or must rely upon infrastructure to provide high availability to them.
This can be achieved in one of the following ways:
- Through a commercial load-balancer that hosts a single FQDN for applications, but directs traffic to all nodes in the cluster;
- Through a free and open-source load-balancer (such as HAProxy – http://haproxy.1wt.eu/) that hosts a single FQDN for applications, but directs traffic to all nodes in the cluster;
- Through round-robin DNS which serves up a different IP address from the cluster for the common FQDN known to the applications;
- Through a configuration file on the application that knows of all nodes in the cluster, and uses an application-specific algorithm to communicate with a different node for webservices – this essentially pulls the load-balancing logic into the application code, giving it greater control, if desired (a minimal sketch of this approach follows this list).
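The fourth option can be as simple as a round-robin rotation with failover inside the application. The following is a minimal sketch in Python; the node hostnames, port, and the invoke callable are hypothetical, and the actual webservice calls are defined by the SAKA WSDL for your release:

    # Minimal sketch of application-side load balancing (option 4 above).
    # Node URLs are hypothetical placeholders for the nodes of a 4-node cluster.
    import itertools

    SAKA_NODES = [
        "https://saka01.pdc.example.com:8181",
        "https://saka02.pdc.example.com:8181",
        "https://saka03.drdc.example.com:8181",
        "https://saka04.drdc.example.com:8181",
    ]

    _rotation = itertools.cycle(SAKA_NODES)

    def next_node() -> str:
        """Return the next SAKA node URL in round-robin order."""
        return next(_rotation)

    def call_with_failover(invoke, attempts: int = len(SAKA_NODES)):
        """Invoke a webservice callable against successive nodes until one succeeds.

        `invoke` is any function that takes a node base-URL and performs the
        actual SOAP/webservice call, raising an exception on failure.
        """
        last_error = None
        for _ in range(attempts):
            node = next_node()
            try:
                return invoke(node)
            except Exception as err:   # broad catch for illustration only
                last_error = err       # try the next node in the rotation
        raise last_error

Keeping this logic in the application also makes the retry strategies described later in this section straightforward to implement, since the application already knows every node in the cluster.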
Another implementation is depicted in the following diagram; this implementation has client applications in both locations accessing SAKA appliances only locally while transactions to the appliances themselves are replicated to all nodes of the EKM cluster. It is expected, of course, that applications have their own replication strategy for application data across locations:
Another consideration when communicating with a clustered configuration is “data-availability-latency”. Consider a configuration where the application is running in the Primary Data Center (PDC) and has encrypted a PAN on Node 1 of the cluster. If, within a few seconds, another thread of the application needs to decrypt the token, but the load-balancing logic sends the request to a remote SAKA node (Node 3 or 4 as shown in Figure 2 or 3), there is the possibility the application will receive an “Invalid Token” error from the remote SAKA node.
This happens for one of two reasons:
- The network between the transacting node (Node 1 or 2 in our example) and the remote nodes (Nodes 3 or 4) is sufficiently slow that replication is not current and the remote nodes do not yet have the just-encrypted/tokenized objects from the origin; or
- The network between the PDC and the Disaster Recovery Data Center (DRDC) is currently down and the cluster is effectively partitioned.
In this case, the application must be designed to take one of the following actions:
- Retry the transaction and, depending on the load-balancing strategy implemented, either hope it will reach a different cluster node that has the encrypted object, or, if the load-balancer supports rule-based load-balancing, send the transaction request to a different URL on the load-balancer that knows to direct the transaction to the source SAKA (in the default configuration, the SAKA with SID #1 returns tokens that begin with the numeral one (1), the SAKA with SID #2 returns tokens that begin with the numeral two (2), and so on); or
- Retry the transaction, but communicate directly with the original SAKA that encrypted the object (the tokenizing SAKA). This assumes the application knows which SAKA tokenized the sensitive data, either by having implemented the load-balancing logic within the application and saved the unique Server ID (SID) of the tokenizing SAKA within the ADB record, or by parsing the token number to determine the SID of the SAKA that tokenized the plaintext (based on the default tokenizing logic mentioned in the previous bullet). A minimal sketch of this second approach follows this list.
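As an illustration of the second action, the following minimal Python sketch parses the token's leading digit to recover the SID of the tokenizing SAKA (per the default tokenizing logic described above) and retries directly against that node. The SID-to-URL map, the load-balanced FQDN, and the decrypt callable are hypothetical and deployment-specific:

    # Minimal sketch of the direct-retry strategy: on an "Invalid Token"
    # error, derive the tokenizing SAKA's SID from the token's first digit
    # and retry against that node directly.

    SID_TO_NODE = {
        1: "https://saka01.pdc.example.com:8181",
        2: "https://saka02.pdc.example.com:8181",
        3: "https://saka03.drdc.example.com:8181",
        4: "https://saka04.drdc.example.com:8181",
    }

    class InvalidTokenError(Exception):
        """Raised when a SAKA node reports it does not (yet) have the token."""

    def sid_from_token(token: str) -> int:
        """With the default tokenizing logic, the first digit of the token
        is the SID of the SAKA that tokenized the plaintext."""
        return int(token[0])

    def decrypt_with_retry(token: str, decrypt):
        """`decrypt(node_url, token)` performs the actual webservice call.
        First try the load-balanced pool; on InvalidTokenError, retry
        directly against the tokenizing SAKA."""
        try:
            # Hypothetical load-balanced FQDN fronting the whole cluster.
            return decrypt("https://saka.example.com:8181", token)
        except InvalidTokenError:
            origin = SID_TO_NODE[sid_from_token(token)]
            return decrypt(origin, token)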
Design Consideration #3
Design the application with high availability and replication latency in mind. Companies vary in their disaster recovery (DR) requirements and the resources available for this purpose; this will be a factor in the design of the application. StrongAuth has tested the open-source HAProxy load-balancer, and also has customers for whom some brands of commercial load-balancers work with the appliance. StrongAuth has also written cluster-aware code samples to deliver HA features to the business unit. In the end, business requirements, cost, and resource constraints will dictate what is optimal for a given site.