By Nassyam Basha, Data Intensity | Simon Pane, Editor
“Sharding” is something many of us have heard a lot about in various database engines, including Oracle’s NoSQL Database. Recently, Oracle announced Oracle Database 12c Release 2 which added various new features, one of which is database native “Sharding”. In this three part article series we will cover this new feature in depth. Starting with the Oracle Sharding architecture and its components, then moving on to deployment and management of the sharding configuration.
Roots of Sharding
Before starting with Sharding in the Oracle Database, we should recall the table partitioning concept as Sharding works on a similar logical architecture. Imagine a case where we have a 600GB table with billions of rows, containing historical data over 10 years. In such cases, the optimizer may perform bulk reads consuming significant IO. To avoid this overhead, the table can be partitioned using the interval, range, or hash types of partition schemes. For example, there may be a requirement to create a partition for each year and if the user fetches the data of the current year, then the other years’ data is unnecessary and can be automatically eliminated. With such a strategy, we can achieve a significant performance improvement.
Oracle introduced a very similar concept in the Oracle Database 12.2 with Sharding, where data will be horizontally partitioned across independent database(s). The goal of sharding is easy scalability.
Scaling database infrastructure vertically is fast and easy, but after a while you may reach a threshold due to software limitations or constraints on the underlying hardware. At which time, the only option may be to scale-out horizontally.
The goal is to distribute data optimally across multiple independent (not RAC) database nodes equally, while guaranteeing reliability from any one database being a single point of failure. The foundation of sharding is robust replication, which is synchronous at the transaction level. (This is suggested only between machines in the same Data Center - sharding across data centers is possible but is not recommended). Sharding is especially applicable for critical custom OLTP application domains such as retailers, gaming sites, media companies, auction sites, etc where scalability is important. Oracle provides great flexibility to scale up as much as needed, for example up to 1000 shards.
Background of Sharding
In 12c Release 1, Oracle introduced a new feature called Global Data Service/Global Service Manager. Let’s suppose in a highly complicated environment we have a 50 database mixture of RAC, Oracle GoldenGate (OGG), and Active Data Guard (ADG) databases with 500 local services. It’s not easy to manage them manually. To overcome this situation, Oracle introduced Global Data Services (GDS), which takes care of load balancing, routing, and service failover while allowing DBAs easy monitoring using the utility called GDSCTL.
Global Service Manager (GSM) is somewhat similar to the SCAN listener of a RAC database where end users connect using a SCAN IP or hostname and then the SCAN listener coordinates with local listeners based on the load balance technique and service configuration. Similarly, in GSM all the services will be connected to a regional listener and then GSM will establish the connection with the desired target considering the load balance, database role, routing, etc. However, the major difference is that RAC works within cluster and GSM can involve the entire landscape of customer environments. In short, GSM acts as hub between the end users and the database servers.
In Oracle 12c Release 2 Sharding was introduced and developed on the same framework as GSM mentioned above. In fact the underlying software engine of GSM is now known as “Shard Director”. It also provides real time load balancing advisory and availability of targets, managing global services, and connection routing. GSM/Shard Director helps in deploying and managing shards.
How Sharding Works
With Sharding, the data will be horizontally divided into different and independent databases. Each independent database will have its own server with components like CPU, memory, storage, etc. The servers are not different Oracle RAC nodes, but the Oracle databases are completely independent.
The data is distributed from a single database known as the sharded “catalog database”, which coordinates with the Global Software Manager. The sharded table will be created in the sharded catalog database, which contains only the table structure, or metadata without any rows. The sharded databases all contain the same tables and columns but with different row data.
The illustration gives us an example of a traditional Oracle database without the sharding feature on the left side. Logically if we review the architecture of sharding (shown on the right), we have one sharded catalog database which contains only the table definition (columns) with no data. But the data will be distributed across all the available shards as rows meaning that the same columns will be available on each shard database. Thus we increase scalability and performance as each database only manages and processes a subset of the entire table data set.
Components of Sharding
The entire configuration is considered as single framework, but it consists of various components, products, services, utilities, and logical names.
Global Service Manager / Shard Director: GSM is a central software component to install and configure, allowing all connections through the centralized GSM regional listener. It plays a key role in transferring the connections based on the real-time load balancing advisory and the preference of the targets. GSM can be configured in multiple locations for high availability. GSM also plays a secondary role if GSM is managing the sharded databases: in this case, the GSM will also act as the “Shard Director”.
GDS Catalog / Sharded Catalog (SDB): GSM requires a database where the configuration of the GSM, databases, and services metadata will be stored. This is known as the GDS catalog database. After the introduction of Sharding, this database used in two ways: 1) GDS Catalog database; 2) Sharded catalog (SDB). This database should be a non-container database with OMF configuration though it can be a RAC database for high availability.
Regions: If our databases or servers are hosted across different geographical locations, then we will divide them into regions for easy reference, such as East, West, EMEA etc. These region names will work as a badge for all the targets of the configuration.
Pools: We can create various pools to restrict the usage to specific groups, so that each pool will be assigned to each administrative domain. Consider if we have read-write services and read-only services so that we can create two pools to distinguish the usage clearly and to help in identifying to which domain it belongs to.
Global Services: If we have 5 local services created, then when we connect using these services, then the connection will be established directly with the specific database. Instead though, the connections should be routed to GSM and GSM will manage which shard/database the connection should be routed to, based on the service configuration and the availability of targets. This service is considered as a Global Service, which is possible only with GSM. The connection uses GSM regional listener and the logical cloud hostname.
GDSCTL: This utility is part of the GSM installation. The entire GDS configuration can be managed by GDSCTL, including databases, services, configuration, adding shards etc.
Shards: A “Shard” or “Sharded database” is one in the set of horizontally partitioned Shard (databases). Underneath, these are independent databases, hosted on individual servers.
Shard Group: The shard group is a grouping of one or more shards. This is a good way to logically represent or alias grouping such as a “Primary shard group” or “standby shard group” etc.
Shardspace: The Shardspace consists of one or more Shard groups. Let’s say 2 shards for primary shard group and 2 shards for standby shard group. This whole setup is considered a single shardspace.
Sharding Key: When an application connects to the database, it needs to pass in a sharding key (such as a Student ID, Account Number, etc), so that the connection (and the associated SQL statements) will be passed to the desired shard database. Although sharding is still possible even without a user/application provided sharding key.
Prior to Oracle 12.2, I was wondering why Oracle introduced Global Database Services as an additional and separately licensed product? Possibly because complicated environments would consider this a new feature and see the value in the expense. After the release of Oracle 12.2, and the fact that Oracle Sharding is based on the the GDS framework, the product’s purpose/value (for that expense) is more apparent
Types of Sharding
During the creation of sharded table, there is an option to choose the preferred sharding type.
The various types are:
Prerequisites of Sharding Configuration
In the subsequent examples, we will see the deployment of Oracle sharding with 2 shards in a primary shard group and 2 shards in the standby shard group.
● 5 Virtual Machines – With minimal configuration where we can install the RDBMS and manage the databases
-Linux 64bit with Minimum Kernel version 2.6.39-400.211.1 for Linux 6.x on all the hosts
● In our example we will be using the hosts: ORA-C1, ORA-C2, ORA-C3, ORA-C4 & ORA-CCREP
-ORA-CCREP is the sharded catalog database with GSM installed.
-ORA-C1/2/3/4 are the shards
● Install Oracle 12c Release 2 RDBMS on ORA-C1, ORA-C2, ORA-C3, ORA-C4 and ORA-CCREP
● Install Oracle 12c Release 2 GSM on ORA-CCREP
● Create an Oracle database for the Sharded Catalog with non-container & OMF on ORA-CCREP.
Ensure the GSM and the RDBMS software versions are the same. Different versions will encounter compatibility issues and not be able to continue with the configuration.
This article explains the heart of Oracle Sharding, covering the concepts and architecture that will be referenced throughout this article series. It also describes what are the various components associated with Oracle Sharding including GSM and the sharding catalog. Also important is an understanding of the various types of sharding which we can choose to configure based on the application or business requirements. Finally, we have seen the required configuration to deploy Oracle Sharding.
In next article, we will address how to deploy the Oracle sharding with simple example steps.
Nassyam Basha is a Database Administrator. He has around ten years of experience as a Production Oracle DBA, currently working as Senior Principal Consultant at Data Intensity. He holds a master's degree in Computer Applications from the University of Madras. He is an Oracle 11g Certified master an Oracle ACE Director. He actively participates in Oracle-related forums such as OTN with a status of Super Hero, Oracle support awarded as "Guru" and acting as OTN Moderator and written numerous articles with OTN and on Toad World. He maintains an Oracle technology-related blog, www.oracle-ckpt.com and can be reached at firstname.lastname@example.org.