Migrate OpenAM to Apache Cassandra without a Single Point of Failure
Original article: https://github.com/OpenIdentityPlatform/OpenAM/wiki/Migrate-OpenAM-to-Apache-Cassandra-without-Single-Point-of-Failure
Initial Data Storage Scheme
Data Type | Storage Method | Fault Tolerance Method | Disadvantages |
---|---|---|---|
OpenAM configuration | OpenDJ (localhost:1389) | Multi-master replication | A configuration update on a single node affects all nodes through replication. |
CTS - Core Token Service (session persistence) | OpenDJ (localhost:1389) | Multi-master replication | The synchronization payload is processed on all nodes. Read performance from a single node is limited by that node's performance. A replication failure can switch other nodes to read-only mode. |
Accounts repository (except AD) | OpenDJ (localhost:1389) | Multi-master replication | |
Data Storage Scheme for the Number of Credentials >5 Million
Data Type | Storage Method | Fault Tolerance Method | Advantages |
---|---|---|---|
OpenAM configuration | OpenDJ (localhost:1389) | Local independent storage as part of the distribution (WAR file) | Updating the configuration on one node does not affect the other nodes (the nodes are completely independent). |
CTS - Core Token Service (session persistence) | Cassandra cluster (tcp:9042) | Cluster without a single point of failure, with geo-distribution and distribution by rack | The synchronization write payload is processed only on the nodes required by the replication level. The read load is not limited by the performance of a single node but is distributed according to the replication level. A node failure does not stop replication, within the limits of the replication level. |
Accounts repository (except AD) | Cassandra cluster (tcp:9042) | Cluster without a single point of failure, with geo-distribution and distribution by rack | |
Migration Plan
- Cluster hardware resources planning
- Deploy the cluster according to the required level of fault tolerance
- Provide network access OpenAM -> Cassandra (tcp:9042); see the connectivity sketch after this list
- Migration stages (can be done independently):
  - Switch "CTS - Core Token Service (sessions)"
  - Switch "Accounts repository (except AD)" with legacy data migration
  - Switch "OpenAM configuration"
Fault Tolerance Level Planning
Datacenter
Defines geo-distributed storage fault tolerance.
- Minimum number of data centers: 1
- The recommended number of data centers:
  - at least two
  - at least as many data centers as are used for the application servers (OpenAM)
- Allowed data center fault tolerance modes:
  - Hot spare: used for data processing by application servers (OpenAM)
  - Cold spare: not used for data processing by application servers (OpenAM)
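In Cassandra terms, this per-data-center layout maps to a keyspace replication setting using `NetworkTopologyStrategy`, which assigns a replica count per data center and spreads replicas across distinct racks where possible; the data center names must match those reported by the cluster's snitch. The sketch below is illustrative only: the keyspace name `openam_cts`, the contact point, and the per-DC replica counts (one per rack in the two-rack production example further down) are assumptions, not fixed values.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;

public class CreateCtsKeyspace {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("dc01-rack01-node01", 9042))
                .withLocalDatacenter("dc01")
                .build()) {
            // Two replicas in each hot data center; NetworkTopologyStrategy places
            // them on distinct racks, so each DC keeps a full copy of the data.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS openam_cts WITH replication = "
              + "{'class': 'NetworkTopologyStrategy', 'dc01': 2, 'dc02': 2}");
        }
    }
}
```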
Rack
The minimum fault tolerance unit for data distribution within a data center. Each rack has:
- Independent disk subsystem array
- Independent virtualization hypervisor (host system)
Calculating the number of racks:
- Minimum number inside a hot spare data center: 1, but not less than the replication level inside that data center
- Minimum number inside a cold spare data center: 1, but not less than the replication level inside that data center
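Once the nodes are up, the planned data center and rack assignment can be verified from the driver's cluster metadata. A minimal sketch, reusing the placeholder contact point from the example above:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.metadata.Node;
import java.net.InetSocketAddress;

public class TopologyCheck {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("dc01-rack01-node01", 9042))
                .withLocalDatacenter("dc01")
                .build()) {
            // Print every node's data center and rack as seen by the driver,
            // to confirm the cluster matches the planned dc/rack layout.
            for (Node node : session.getMetadata().getNodes().values()) {
                System.out.printf("%s dc=%s rack=%s%n",
                        node.getEndPoint(), node.getDatacenter(), node.getRack());
            }
        }
    }
}
```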
Node
An Example of a Recommended Minimum Configuration
Test
DataCenter | Type | Amount of Copies | Rack | data % | Node | data % |
---|---|---|---|---|---|---|
dc01 | hot | 1 | rack01 | 100% | dc01-rack01-node01 | 100% |
Production
DataCenter | Type | Amount of Copies | Rack | data % | Node | data % |
---|---|---|---|---|---|---|
dc01 | hot | 1 | rack01 | 100% | dc01-rack01-node01 | 50% |
 | | | | | dc01-rack01-node02 | 50% |
 | | 1 | rack02 | 100% | dc01-rack02-node01 | 50% |
 | | | | | dc01-rack02-node02 | 50% |
dc02 | hot | 1 | rack01 | 100% | dc02-rack01-node01 | 50% |
 | | | | | dc02-rack01-node02 | 50% |
 | | 1 | rack02 | 100% | dc02-rack02-node01 | 50% |
 | | | | | dc02-rack02-node02 | 50% |
Allowed:
- Increase the number of data centers without service interruption
- Increase the number of racks without service interruption
- Increase the number of nodes without service interruption
- Change the number of data copies inside a data center without service interruption
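For example, the number of data copies inside a data center can be raised online by altering the keyspace replication settings; the additional replicas are then populated by a repair. A hedged sketch, reusing the placeholder keyspace and contact point from the earlier examples:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;

public class ChangeReplicationLevel {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("dc01-rack01-node01", 9042))
                .withLocalDatacenter("dc01")
                .build()) {
            // Raise dc01 from 2 to 3 copies while the cluster keeps serving traffic.
            // Existing data is streamed to the new replicas by a subsequent repair
            // (e.g. running `nodetool repair -full` for the affected keyspace).
            session.execute(
                "ALTER KEYSPACE openam_cts WITH replication = "
              + "{'class': 'NetworkTopologyStrategy', 'dc01': 3, 'dc02': 2}");
        }
    }
}
```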
Hardware Requirements For a Single Node
Environment | CPU (cores) | RAM (GB) | Disk |
---|---|---|---|
Test | >=2 | >=8 | 16 GB HDD |
Production | >=8 | >=32 (memory balloon=off) | 64 GB SSD, RAID |