DMTN-032: Qserv Data Placement

  • Fabrice Jammes

Latest Revision: 2017-02-06

1   Requirements

Provide a sharp requirements list for Data-Placement is critical in order to offer an efficient and well-fitted solution.

Maximize high availability
SQL user queries must continue in real time, if less than 2% worker nodes crashes.
Minimize overall cost
Costs include infrastructure, sofware development, system administration and maintenance.
Minimize data reconstruction time
A node which crashes must be cloned in a minimal amount of time (data and application)
Prevent data loss
At least one instance of each chunk must be available at any time on disk, even in case 5% of storage is definitly lost. Archive should also be managed by the system in order to reduce cost.
Maximize performances
  • Bottleneck must be identified and quantified for each proposed architecture.
  • Chunk placement combinations can be optimized depending on a given query set.
Support Disaster recovery
At first glance, disaster recovery looks like a separate, likely unrelated concern. However, when considering operations for Qserv instead of just the standing up of Qserv, this all become related. A first requirement is that the data recovery loading method must take into account changes in hardware configs between initial load and recovery.

2   Data throughput

Based on

In order to provide a lower bound for data throughput, problem is simplified:

  • low volume queries are ignored.

  • high volume queries are splitted in following groups:

    1. full scan of Object table
    2. joins of the Source tables with the most frequent accessed columns in the Object table
    3. joins of the ForcedSource tables with the most frequent accessed columns in the Object table
    4. scans of the full Object table or joins of it with up to three additional tables other than Source and ForcedSource
Table 1 The estimated data throughput for concurrent high volumes queries
Query type Avg query number at time t Avg query latency (hour) Data read by 1 query (TB)
1 16 1 107
2 12 12 5107
3 12 12 2007
4 16 16 163
Table 2 The estimated data throughput for one node, for a 500 nodes cluster
Query type Data read by 1 query in 1 hour on 1 node (TB)
1 0.214
2 0.8511
3 0.3345
4 0.02025

With optimal shared scans (i.e. each table is only read once for concurrent queries), data throughput on one node is: 394 MB/sec

Without shared scans, data throughput on one node is: 4993 MB/sec