CONCURRENCY: PRACTICE AND EXPERIENCE, VOL. 9(8), 821–836 (AUGUST 1997)
Parallel join for IBGF partitioned relational databases
M. BOZYIGIT, S. A. MOHAMMED∗ AND M. AL-TAYYEB†
Information & Computer Science Department, King Fahd University of Petroleum & Minerals, PO Box 1619, Dhahran 31261, Saudi Arabia
∗ Currently with the University of Melbourne, Australia.
† Currently with Saudi ARAMCO oil company, Dhahran.
SUMMARY
This study is concerned with a parallel join operation where the subject relations are partitioned according to an interpolation based grid file (IBGF) scheme. The partitioned relations and directories are distributed over a set of independently accessible external storage units, together with the partitioning control data. The join algorithms, executed by a mesh type parallel computing system, allow handling of uniformly as well as nonuniformly partitioned relations. Each processor locates and retrieves the data partitions it is to join at each step of the join process, in synchronisation with the other processors.
The approach is found to be feasible, as the speedup and efficiency results found by simulation are consistent with theoretical bounds. The algorithms are tuned to join-key distributions, so that effective load balancing is achieved during the actual join. © 1997 by John Wiley & Sons, Ltd.
1. INTRODUCTION
Join is a frequently used database operation. It is also the most expensive, when compared to other database operations such as selection and projection. Join, unlike other operations, involves more than one relation. Sequential join methods such as nested loops, hybrid hash and join index are widely used, but they can be slow, especially when large databases are involved[1,2].
Emerging parallel and distributed computing systems have, during the last two decades,
created an important incentive for efficient database operations[3,4]. Parallel join algorithms, such as parallel hybrid hash and join index, partition relations just before the join
operation. This makes a real-time join costlier. Also, frequent deletions, additions or updates must not require complete reorganisation of the database. Frequent join operations
will be more efficient if the involved relations, once partitioned, remain unchanged and
ready for spontaneous access.
A parallel join operation is expected to perform well if the involved relations are properly
organised[5,6]. The individual processors should, independently and in parallel, be able to
detect and interpret data allocated to them for local joins and then co-operate with others
to achieve the global join.
This is the main reason for choosing an IBGF based relation organisation. It imposes a structured partitioning on the database, yet allows it to be maintained dynamically[7,8].
The resultant decomposition is also suitable for parallel processing as the distribution
of disjoint data partitions over a parallel system can be formulated. This study is just a beginning, as it excludes implementation of the developed framework on a real distributed-memory parallel system.
Various parallel computer interconnection schemes such as shuffle exchange, hypercube,
ring, pipeline and mesh can be considered for a parallel join[1,9,10]. In this study, the
emphasis is on a mesh type architecture with a multi-unit external storage system. There are
important commercially available mesh type parallel systems, such as DAP, Paragon XP/S,
Cray T3D, J-Machine and Tera Computer[11]. Architectures such as ring and hypercube
can be considered as special forms of the mesh type[11]. They are expected to perform
equally well. Given a multi-unit disk storage system, how partitions are mapped so as to maximise concurrency during a join bears directly on the efficiency[12]. The partition mapping is governed by the size and topology of the storage subsystem and of the parallel computing system, as well as by the relation characteristics.
The remainder of the paper covers the following topics: Section 2 introduces relation
partitioning, according to IBGF. The layout of the parallel mesh architecture used as a
testbed for the simulation is presented in Section 3. The parallel join algorithms for uniform
and nonuniform (skewed) relations are presented in Sections 4 and 5, respectively. Finally,
the results obtained by simulation are introduced in Section 6 followed by conclusions in
Section 7.
2. RELATION PARTITIONING
An IBGF based data organisation is basically an indexing scheme. Details of an IBGF are
not within the scope of this study. However, its main features are discussed to provide the
basis for its use in the parallel join under study. A relation is viewed as a file of n tuples and
d attributes. It is partitioned into buckets, based on d-dimensional hashing. The size of a
bucket (partition) is normally imposed by the physical characteristics of the available disk
storage and memory, and therefore is a matter of granularity. It is possible for a relation to
fit into one bucket, if the relation is relatively small and the physical characteristics of the
external storage and main memory are appropriate.
Basically, the value of each attribute of a tuple is mapped to a real number in the half
open interval [0, 1) by scaling, using the domain range. The decimal point is then ignored,
retaining as many bits as necessary to its right. The number of bits retained, for mapping,
is a function of partition and relation sizes, as in dynamic hashing[13].
For example, consider a 2-attribute tuple t : (100, 25), where the values of attributes
1 and 2 are 100 and 25, respectively (refer to Table 1 for frequently used symbols and
abbreviations). Also, let the domain ranges for the attributes 1 and 2 be 200 and 100, in
that order. First, scale the tuple by mapping it to a floating point binary number:

t : (100, 25) −→ (100/200, 25/100) −→ (0.1, 0.01)

Then, retain one bit to the right of the decimal point, ignoring the decimal point, to provide the co-ordinates of the tuple in binary. Thus, the final mapping, after skipping the intermediate steps, can be represented as

t : (100, 25) −→ (1, 0)
which will be referred to as a scaled-binary (sb) form from now on.

Table 1. List of main symbols used in the paper

A_i    value of attribute i
a_i    axis (attribute) interval label, on axis i, in binary
d      number of attributes of a relation
f      skew factor of a nonuniform relation
IBGF   interpolation based grid file
K      total number of partitions in a uniform relation, 2^k
K_i    total number of axis intervals on axis i, 2^{k_i}
k_i    partition level of axis i
k      sum of partition levels, Σ k_i over all i
LAIL   largest axis interval label
ℓ      partition size
m      number of rows of a mesh
n      number of columns of a mesh
p_ij   processor (i, j) in a mesh
p      partition label
q      partition level at the time of creation of a partition
R_i    relation i
r      number of partitions in a join class
s      number of external storage units
sb     scaled binary
t      tuple
u      relation size, in number of tuples
α_i    minimum value of attribute i
∆_i    maximum domain interval of attribute i
−→     implies a mapping
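The scaling step is a short computation. The sketch below is not from the paper; it assumes numeric attributes with known per-attribute minima α_i and domain ranges ∆_i, and retains k_i bits per axis as described above.

    def axis_intervals(tuple_vals, alphas, deltas, levels):
        """Map a tuple to its axis interval labels, one per attribute:
        scale each value into [0, 1) and keep the first k bits to the
        right of the (binary) point."""
        coords = []
        for value, alpha, delta, k in zip(tuple_vals, alphas, deltas, levels):
            scaled = (value - alpha) / delta       # scaled-binary (sb) value in [0, 1)
            coords.append(int(scaled * (1 << k)))  # retain k bits right of the point
        return coords

    # The worked example from the text: t = (100, 25), domain ranges 200 and
    # 100, one bit retained per axis, gives the co-ordinates (1, 0).
    print(axis_intervals((100, 25), (0, 0), (200, 100), (1, 1)))  # [1, 0]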
The procedure starts with one partition, dubbed partition level 0 (zero), along
all axes. The partitioning is conducted along all the attribute axes in a round-robin fashion.
An insertion in excess of partition capacity, along an attribute axis, causes splitting along
that axis. Each split increments the partition level by one along the subject axis. In the
uniform relation case, all these partitions will eventually be filled to their full capacities
(bucket size). The next partition level, after partition level 0, is 1; the number of partitions is now 2 (= 2^1). The second split results in four partitions (= 2^2). The partitioning procedure ends with a partition level of k_i along attribute axis i, where i ∈ (d − 1). A partition level of k_i implies 2^{k_i} axis (or partition) intervals. Therefore, for a d-attribute uniform relation, there will be a total of 2^k partitions, where k = k_0 + k_1 + . . . + k_{d−1}.
For a nonuniform relation case, some partitions may be empty, contrary to the uniform
case where all the partitions are of equal size. The algorithm developed for nonuniform
relations should avoid bucket allocations for empty partitions. Hence, a directory structure
is required to keep a record of the existing partitions. Such a directory structure can be
viewed as a binary tree where terminal nodes constitute data partitions and internal nodes
constitute directory or subdirectory partitions.
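As a minimal sketch (not from the paper; the field names are hypothetical), such a directory can be modelled as a binary tree whose internal nodes act as (sub)directory partitions and whose leaves hold the data buckets:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class DirNode:
        """One node of an IBGF directory tree: internal nodes act as
        (sub)directory partitions, terminal nodes hold data partitions."""
        label: int                         # partition label of the region
        left: Optional["DirNode"] = None   # child for the lower half of the split axis
        right: Optional["DirNode"] = None  # child for the upper half
        tuples: List[tuple] = field(default_factory=list)  # payload at a leaf

        def is_leaf(self) -> bool:
            return self.left is None and self.right is None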
A partition label can be computed from the corresponding axes intervals. The tuples in
the same bucket produce the same partition label.
To generalise, let t be a tuple represented by t : (A_{t0}, A_{t1}, A_{t2}, . . . , A_{t,d−1}), α_i be min_{j∈(u−1)} (A_{ji}), and the domain interval ∆_i be max_{j∈(u−1)} (A_{ji} − α_i), for i ∈ (d − 1).
A tuple-to-co-ordinate (axis interval) mapping can be written as

t : (A_{t0}, A_{t1}, A_{t2}, . . . , A_{t,d−1}) −→ (a_0, a_1, a_2, . . . , a_{d−1})
where a_i is the axis interval of the partition to which t is mapped, on axis i; it is the sb (scaled-binary) representation computed as [(A_{ti} − α_i)/∆_i]. The maximum number of tuples that can map to a partition is limited to its size, ℓ.

Figure 1. Partitioning by tuple insertion: a new tuple will cause a split if the partition mapped is full (t_i indicates tuple i, p_i indicates partition i, and A_i indicates attribute i). Panels: (a) one partition, 2 tuples in 1 partition; (b) two partitions, 3 tuples in 2 partitions; (c) four partitions, 4 tuples in 3 partitions; (d) sixteen partitions, 12 tuples in six partitions.
Conversely, given the co-ordinates of a partition in a d-dimensional system, there is a unique partition label p, which can be computed as

p = Σ_{i=0}^{d−1} 2^i Σ_{j=0}^{k_i−1} a_{ij} 2^{d(k_i−j−1)}    (1)
where k_i is the partition level and a_ij is the jth bit of the axis interval label on axis i. These partition levels will be used to make decisions about partition-to-processor mapping during the parallel join operation.
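Equations (1) and (2) are inverse bit manipulations: (1) interleaves the axis interval labels into a partition label, and (2) extracts every dth bit to recover one interval label. The sketch below is not from the paper; reading a_ij most-significant-bit first is an assumption made so that the two equations invert each other.

    def partition_label(intervals, levels):
        """Equation (1): interleave the axis interval labels into a
        partition label.  a_ij is the j-th bit of interval label a_i,
        counted from the most significant bit; d is the number of
        attributes."""
        d = len(levels)
        p = 0
        for i, (a, k) in enumerate(zip(intervals, levels)):
            for j in range(k):
                a_ij = (a >> (k - j - 1)) & 1
                p |= a_ij << (i + d * (k - j - 1))
        return p

    def interval_label(p, i, levels):
        """Equation (2): recover the interval class label on axis i from
        the bits b_{i+dj} of the partition label p."""
        d = len(levels)
        return sum(((p >> (i + d * j)) & 1) << j for j in range(levels[i]))

    # Round trip for a 2-attribute relation with k0 = k1 = 2:
    p = partition_label([0b10, 0b01], [2, 2])
    assert [interval_label(p, i, [2, 2]) for i in range(2)] == [0b10, 0b01]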
The upper bound on the total number of partitions is the product of the upper bounds on the number of axis intervals, which is equal to 2^k, where k is the sum of partition levels, i.e. k_0 + k_1 + . . . + k_{d−1}. For the uniform relation case, the number of partitions is exactly equal to 2^k. For the nonuniform case, 2^k is only an upper bound.
As an example, assume that there are four tuples in a 2-attribute relation R. Also, assume
that the bucket (partition) size is 2 tuples. During successive tuple processing, each bucket
overflow causes a split and thus doubles the number of intervals (or increments the partition
level by 1), expanding the binary interval labels by another bit.
Figure 1(a) shows a single partition, after insertion of the first two tuples. Now the
partition level is 0 and the partition interval is 1 (= 2^0), along attribute axes A1 and
A2. Each insertion requires scaling, as discussed before, on both attributes. Figure 1(b)
shows the situation after insertion of a third tuple, which causes a partition-split. The
fourth insertion causes the formation of four partitions, one of which is empty, as shown in
Figure 1(c). Figure 1(d) shows the state after an insertion of 12 tuples. Only the nonempty
partitions are shown. If the relation was uniform, all 16 partitions would have been shown.
Note that the partition labels are computed by equation (1).
This example demonstrates a nonuniform case, which requires the implementation of a
multiple level directory structure to avoid storage allocations for empty partitions[13]. The
directories are treated like data files, except that the number of tuples per bucket will be
larger, as a directory tuple is normally shorter than a data tuple.
3. PARALLEL ARCHITECTURE
The parallel system used in this study is an m × n mesh topology with a multi-unit storage system of s units,
where m and n are some powers of 2, as shown in Figure 2. The basic system architecture
is typical of some commercially available parallel systems. The computation model is
appropriate for an MIMD (multiple instruction multiple data) parallel architecture[11]. Each
processor, P_ij, has a local memory and four buffered communication channels managed by
a communication processor. Apart from modular growth and simple pipelining, there is no
special restriction on the topology used. The mesh interconnection network can be replaced
by a ring, hypercube, or any other parallel system. The crossbar switch, proposed as a
storage-to-processor interconnection network, can also be replaced by other interconnection
schemes.
In the given topology, the boundary processors are P_i1 and P_1j, where i ∈ m and j ∈ n.
They are special processors, which are connected to a multi-unit external storage system by
an interconnection network, as shown in Figure 2. The size of the interconnection network
in terms of the number of switches indicates the degree of I/O concurrency. The processors
share the switches (or the storage units) if the number of switches is less than n or m. Proper
concurrency control protocols are employed so that conflicting I/O requests are minimised
and the load is balanced[14,15]. This is part of the parallel join algorithm which must be
aware of the data distribution on the external storage units and the parallel system topology.
4. PARALLEL JOIN ALGORITHM FOR UNIFORM RELATIONS
The main parameters that need to be known about the physical structure of a given relation,
before actually accessing data, are its partition levels and its uniform/nonuniform status. If
the partition count is equal to 2^k, where k indicates the total partition level, at the completion
of the tuple insertion process, then the relation is uniform. Otherwise, it is nonuniform.
Once the relations are IBGF partitioned, the directory structure will allow any attribute
of any relation to be chosen as a join attribute. In this study, each join is subject to the
overhead imposed by the parallel algorithm only. Otherwise, the directory maintenance
cost incurred by tuple insertions or deletions is excluded. A study concerned with the total
cost would have considered the effect of directory maintenance cost per update as well as
the initial partitioning cost.
For clarity, let R1 and R2 be two equal cardinality relations with attributes of equal
domains and partitioned into an equal number of axis intervals during the partitioning process. For the join of R1 and R2, it is sufficient to match the partitions in the corresponding
intervals on the join axes. For example, the partitions from interval i on a join attribute of R1 are matched with the partitions in interval i on the corresponding join attribute of R2.

Figure 2. A parallel computing system: an m × n mesh of processors (P11 to P44 shown for m = n = 4), whose boundary processors are connected to multiple disk units through a crossbar switch.
Let the set of partitions in a common axis interval form an interval class; then there is
a direct mapping from a partition to its interval class. Given the binary representation of a
partition p as b_{k−1} b_{k−2} . . . b_1 b_0, the interval class label on axis i (I_i) to which the partition belongs is easily computed using the following equation:

I_i = Σ_{j=0}^{k_i−1} 2^j b_{i+dj}    (2)
For a uniform relation, the number of partitions in each interval class on axis i is equal to 2^{k−k_i}. This many partitions map to the same interval class. The reverse mapping, i.e. a
computation of the partition label from the axis interval labels, was given by equation (1).
For the simple two relations case, the join involves matching the corresponding interval
classes. If the number of interval classes in R1 is more than that in R2, then more than
one interval class in R1 must be matched with a single interval class in R2.
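Since an interval label holds the leading bits of the scaled join-attribute value, a finer class finds its coarser partner by dropping its excess low-order bits. The sketch below illustrates this matching; it is an assumption consistent with the scaling scheme, not code from the paper.

    from collections import defaultdict

    def pair_interval_classes(k1, k2):
        """Pair the 2**k1 interval classes of R1 with the 2**k2 classes
        of R2 on the join axis, for k1 >= k2: a finer class matches the
        coarser class given by its high-order k2 bits."""
        pairing = defaultdict(list)
        for fine in range(1 << k1):
            pairing[fine >> (k1 - k2)].append(fine)
        return dict(pairing)

    # R1 split into 8 intervals, R2 into 4: each R2 class absorbs two R1 classes.
    print(pair_interval_classes(3, 2))  # {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}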
Let the term join class indicate compatible interval classes in different relations to be
joined. The tuples in a join class can only match the tuples in the corresponding (or
compatible) join class. In the uniform relation case, while a join class with one interval
class belongs to the smaller relation, a join class with a set of interval classes belongs to
the larger relation. The join class concept is generalised so that it covers uniform as well
as nonuniform relations of any size and domain.
The concept of join classes forms the basis of the parallel join algorithm. The parallel
computing system (Figure 2) used for the join operation can be viewed as an extended
pipeline system where the partitions move along the horizontal and vertical processor
pipelines. A partition, having reached a processor, remains resident until its join with
compatible partitions from the other relation is complete. There are no special provisions regarding the operating system. It is assumed that there is a small kernel at each node which boots up the node, enables each processor to load the algorithms, and handles partition input/output as well as the data routing along the vertical and horizontal pipelines, as commanded by the parallel join algorithm.

PROGRAM UniformJoin; /* executed by processor i */
BEGIN /* Initialise */
  Input Relation Data;
  Input Computing System Data;
  Build Join Classes;
  WHILE (Join is not complete)
  BEGIN
    Compute Current Partitions to Join; /* schedule partitions */
    Input Scheduled Partitions Along Row and Column;
    Pipeline Partitions Along Row and Column;
    Join Scheduled Partitions;
    IF (the processor is on a boundary row)
    /* schedule next partition from R2 */
    BEGIN
      WHILE (Current join class is not complete)
      BEGIN
        Input Next Scheduled Partition;
        Pipeline Partition Along Column;
        Join Scheduled Partition;
      END
    END
    IF (the processor is on a boundary column)
    /* schedule next R1 partition */
    BEGIN
      WHILE (Current join class is not complete)
      BEGIN
        Input Next Scheduled Partition;
        Pipeline Partition Along Row;
        Join Scheduled Partition;
      END
    END
  END
END

Figure 3. Summary of the parallel join algorithm (in Pascal-like notation)
The join classes of R1 are handled by the vertical pipelines and those of R2 are handled
by the horizontal ones. They are balanced over the column and row processors, respectively.
The excess partitions, if any, after equal allocations, are allocated one per processor, until
exhausted. This simple heuristic minimises the completion time of the join, as there cannot
be better load balancing. Note that each processor is able to deduce its own partitions and
its own join classes independently of other processors. This is important for the parallelism.
As previously mentioned, the labeling scheme allows the processors to carry out unique
mapping using bit manipulation of the partition, axis interval and join class labels.
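The balancing heuristic amounts to an equal split with the remainder dealt out one partition per processor. A sketch (not the paper's code; the list-of-lists output is an arbitrary choice):

    def allocate(partitions, n_procs):
        """Balance the partitions of a join class over a pipeline of
        n_procs processors: equal shares first, then the excess
        partitions allocated one per processor until exhausted."""
        base, excess = divmod(len(partitions), n_procs)
        shares, start = [], 0
        for i in range(n_procs):
            size = base + (1 if i < excess else 0)
            shares.append(partitions[start:start + size])
            start += size
        return shares

    # 10 partitions over 4 column processors -> loads of 3, 3, 2 and 2.
    print([len(s) for s in allocate(list(range(10)), 4)])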
A concise form of the algorithm is given in Figure 3. In summary, a processor i is able to
infer the partitions to input, the partitions to join, the partitions to pipeline and the pipeline
(row or column) to choose for send (or receive), from the partitioning information available.
Any attribute can be a join attribute as the partitioning covers all the attributes.
5. NONUNIFORM RELATIONS
For a nonuniform relation, the number of partitions will be less than the theoretical maximum (2^k). In this case, only a subset of the possible data partitions actually exists per relation. Some partitions, which would otherwise exist in the uniform case, are
now treated as being collapsed into one partition. A set of empty adjacent partitions in a
continuous range of an axis interval is treated as being embedded into the smallest label
nonempty partition adjacent to that range. Similarly, some axis intervals may not exist.
They too are embedded in the preceding adjacent ones[7]. For example, the unoccupied
axis intervals i + 1, i + 2, . . . , i + j would be embedded in the occupied axis interval i.
With the implementation of a directory structure, storage is not allocated for the empty
partitions.
Despite its more complicated partitioning process, the join class concept developed
earlier for the uniform relation case is also maintained for the nonuniform relation case.
However, the computation of the compatible join classes and the constituting partitions
is not straightforward. The simplest method is to use equation (2) to find the compatible
join classes from the partition labels, taking embedding into consideration. For example,
in Figure 1(d), partition p0 embeds p4, p8 and p12 on axis A1. It also extends over axis
intervals 0 and 1. The highest axis interval label covered by the last partition in a join class
is taken as the basis for computing the compatible join classes.
Two join classes are compatible if the highest axis interval label in one join class is equal
to that of the other. For example, in Figure 1(d), the highest axis interval label covered by join class 0 (J0) is 1. J0 would have been compatible with J0 of another relation only
if its highest axis interval was also 1. If they are not compatible they are forced to be so.
Therefore, the compatible join classes are computed during each join operation, according
to the criterion just mentioned. After this, the join process continues in the same way as in
the uniform case.
Given the binary representation of a partition label, the largest axis interval label (LAIL) covered can be computed. If the binary representation of a partition is b_{k−1} b_{k−2} . . . b_1 b_0, then the LAIL on axis i (h_i) can be computed as

h_i = Σ_{j=0}^{q−1} 2^j b_{i+dj} + 2^{k_i−q} − 1    (3)

where i is the attribute axis label, b_t is the tth bit of the partition label, and q is the partition level at the time of the partition's creation.
The smallest axis interval label (h′_i) of a partition is equal to that of the uniform case. Thus, the coverage of a partition, in terms of the number of axis intervals, is equal to (h_i − h′_i).
This information allows the common LAIL for partitions in a join class to be computed,
which will, in turn, be used to compute compatible join classes.
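A direct transcription of equation (3) follows (a sketch, not the paper's code; it implements the formula as printed and omits the extension of coverage by embedded empty ranges):

    def lail(p, i, k_i, q, d):
        """Equation (3): the largest axis interval label (LAIL) on axis i
        covered by partition p, where q is the partition level at the
        partition's creation and b_{i+dj} denotes bit (i + d*j) of p."""
        prefix = sum(((p >> (i + d * j)) & 1) << j for j in range(q))
        return prefix + (1 << (k_i - q)) - 1

    # Sanity check: for a fully split partition (q = k_i) the trailing term
    # vanishes and the LAIL reduces to the interval class label of eq. (2).
    assert lail(0b110, i=0, k_i=2, q=2, d=2) == 2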
Figure 4 shows a nonuniform relation with nine existing partitions (out of 32, had the
relation been uniform). Tables 2 and 3 show the partitioning information compiled using
the mapping details discussed above.
The scheme used allows us to treat the join of nonuniform relations as a general case, of
which the join of uniform relations is a special case.
The parameters such as partition levels, number of partitions in each join class, and
the axis intervals covered by each partition are essential for each processor in the system
to perform the correct join. The partitioning information for each relation, having been recorded upon completion of the partitioning, is input by the boundary processors and pipelined or broadcast through the horizontal and vertical processor pipelines during the join operation. In fact, for a wraparound mesh system, the communication complexity of one-to-all broadcasting is logarithmic in the system size[11].

Figure 4. Partitioned state of a nonuniform relation with two attributes (p_i indicates the ith partition; binary numbers indicate axis interval labels)
A parallel partition input protocol is devised to handle input at the pipeline boundaries.
Simply, boundary processors of the vertical pipelines read in the join classes of one relation,
and those of the horizontal ones read in the join classes of the other relation, as in the uniform
case.
The uniform join algorithm, given in Figure 3, can also be used for the nonuniform case,
with a new interpretation of the join classes. In particular, the initialising and mapping
routines need to consider the directory structure that implements the partitioned nonuniform
relations.
Table 2. Partitioning details of the nonuniform relation shown in Figure 4, along attribute A1

Existing partitions, p:           0   1   2   3   4   7   8  15  16
Partition level at creation, q:   3   1   1   2   2   2   2   3   2
Largest axis interval covered:    0   7   3   5   7   7   1   1   7
Table 3. Join class details of the nonuniform relation shown in Figure 4, along attribute A1

Join class label   Partitions covered   Axis intervals covered    LAIL
0                  0, 2, 8              (0), (0,1,2,3), (0,1)     (0)
1                  2, 8, 16             (0,1,2,3), (0,1), (1)     (1)
2                  2, 4                 (0,1,2,3), (2,3)          (2,3)
4                  1, 3                 (4,5,6,7), (4,5)          (4,5)
6                  1, 7, 15             (4,5,6,7), (6,7), (6,7)   (6,7)
The algorithm allows even distribution of partitions. If the number of partitions in a join class (r) is not a multiple of the row and/or column sizes, some of the processors would remain idle. For example, let the number of excess partitions of a join class be µ (= r mod m). The last cycle of the join then involves only µ of the m row processors. If m is a multiple of µ, repeating a copy of the excess partitions on the m/µ row processors provides exact load balancing. The join of each partition of R1 will then involve n_r/(m/µ) partitions from R2, instead of n_r.
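As a worked illustration (the numbers are not from the paper): with r = 10 partitions in a join class and m = 4 rows, µ = 10 mod 4 = 2 excess partitions remain for the last cycle; since m/µ = 2, each excess partition is replicated on two row processors, so all four rows stay busy during the final cycle.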
6. PERFORMANCE
6.1. Simulation
To study the performance of the algorithm, the entire system (operational behaviour of
the parallel system and the algorithms) was simulated. The simulation consists of two
main modules. Module-1 creates a directory structure for IBGF, by mapping the tuples to
partitions. The depth of a directory tree depends on the skewedness of the relation and the
bucket sizes. Following the construction of the directory tree, the partitions are distributed
over the multiple storage units, according to the mapping function. For uniform relations, the binary representation of a partition identifier is sufficient to map it optimally to a storage unit. During the join operation, the partitions can be retrieved
from the storage units using the same mapping function. For the nonuniform relation case,
the directory tree is traversed in post-order form, while assigning the partitions to storage
units evenly. At the same time, an index is created for partition allocation, to be used
during the join operation. In all cases, a partition is assigned to one storage unit only, i.e.
no duplications are allowed.
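A sketch of the two mapping rules (both are assumptions, since the paper does not give the exact functions): for uniform relations the low-order bits of the partition label can pick the unit when the number of units s is a power of two; for nonuniform relations the existing partitions, taken in the post-order of the directory tree, are dealt to the units in turn.

    def storage_unit_uniform(p, s):
        """Uniform case: choose a storage unit from the partition label's
        low-order bits (assumes s is a power of two)."""
        return p & (s - 1)

    def storage_index_nonuniform(postorder_leaves, s):
        """Nonuniform case: deal the existing data partitions, in directory
        post-order, evenly over the s units and record an allocation index."""
        return {p: i % s for i, p in enumerate(postorder_leaves)}

    # The nine existing partitions of Figure 4 (label order standing in for
    # the post-order traversal) over four storage units:
    print(storage_index_nonuniform([0, 1, 2, 3, 4, 7, 8, 15, 16], 4))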
Module-2 of the simulation package models the join operation. For each relation it
handles the partition flow and manages the join on the system. Note that the partitions of a
join class on one relation are joined with the partitions in the compatible join class(es) on
the other.
The simulation assumes 56 byte long data tuples, an 8 ms external storage access time,
a 16 Mbit per second effective data transmission rate, and a 4 µs tuple join time. The size
of a partition is computed so that the time it takes to retrieve two tuples is equal to the time
it takes to join them. This restriction is used to simplify the synchronisation required to
overlap I/O with processing time. In our case, partition size is computed to be 73 tuples.
The processing and data transmission are synchronised using event-based virtual clocks,
where each action is represented by an event.
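The quoted partition size can be approximated with a back-of-the-envelope balance (a sketch under assumptions the paper does not spell out: one access latency per partition and a pairwise nested-loop join of two partitions):

    import math

    ACCESS = 8000.0        # external storage access time, in microseconds
    XFER = 56 * 8 / 16.0   # one 56-byte tuple at 16 Mbit/s = 28 microseconds
    JOIN = 4.0             # time to join one tuple pair, in microseconds

    # Balance retrieving two partitions of l tuples against joining them
    # pairwise:  2*(ACCESS + l*XFER) = JOIN * l**2, solved for l > 0.
    a, b, c = JOIN, -2 * XFER, -2 * ACCESS
    l = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    print(round(l))  # ~71, in the neighbourhood of the quoted 73 tuples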
The uniform relations are formed such that tuple attribute values show uniform frequency
in a particular domain. The nonuniform relations are formed using a skew factor (f). For the sake of simplicity, the attributes of the relations are assumed to be numeric, as the algorithms are targeted at numeric databases.
The main performance metrics used in this study are speedup and efficiency. The speedup
is equal to the sequential join time divided by the parallel join time. The efficiency is
computed as speedup divided by the number of processors[11].
The parameters whose effects on the efficiency and speedup were studied are system size,
relation size, number and size of partitions, number and size of interval classes, number and
size of join classes, number of storage units, number of storage-to-processor (s-p) access
paths, and the degree of data skew for nonuniform relations.
6.2. Results on uniform relations
Let r and s be the number of partitions in join compatible classes of relations R1 and R2,
respectively; let m and n be the number of rows and columns of the parallel system. If
r < m and s < n, then mn − rs processors will remain idle. In an ideal situation, r and
s are multiples of m and n, in which case all the processors will be kept busy, during the
join operation. Otherwise, in the last iteration, mn − r′s′ processors will still remain idle, where r′ = (r mod m) ≠ 0 and s′ = (s mod n) ≠ 0.
Figure 5 shows the effect of the number of partitions on overall efficiency. The smoother
parts of the lower curve correspond to the situations where the number of partitions per
join class remains fixed but the number of join classes increases. In this case, the pipelining
is join class by join class. The results show that the pipelining is more sensitive to the number of partitions than to the number of join classes. However, the flow of the join
classes can be synchronised so that the processors do not remain idle in between. In this
case, the efficiency shows a continuous upward trend, because of the reduced pipelining
effect, as shown by the upper curve. In general, a performance increase with relation size is
to be expected, as parallelism in pipeline processing shows itself to be better for relatively
longer data streams.
Figure 6 shows the effect of the system size on efficiency. The number of partitions in
each relation is 1024. A 16-fold increase in the mesh size produces a 13-fold speedup. However, the efficiency drops from 93 to 77, due to the long pipelining effect, especially
at the beginning of the first join class (head effect) and the end of the last one (tail effect).
The stability of the system is tested by a simultaneous proportional increase in the system
and relation sizes. The ratio of relation size to system size is kept fixed, at 64, but their
respective sizes are increased by factors of 4, 16 and 64. The observed efficiency was 87.5 ± 0.3.
These results show that system or relation size has no adverse effect on the efficiency.
The drop in the efficiency was only 0.6% while the increase in system size (in terms of
number of processors) and relation size (in terms of number of partitions) was 64-fold. This
indicates that the overhead incurred by long pipelining can be outweighed by continuous
input streams of larger join classes, making the head and tail effects of pipelining negligible.
Figure 5. Effect of relation size (number of partitions, 1000–4000) on efficiency (%), for simple and synchronised pipelining

Figure 6. Effect of system size (number of processors, 10–60) on efficiency (%) and speedup

Figure 7. Effect of the number of storage units (disk units, 10–30) on speedup (for 64 processors) and efficiency (%)
The number of storage units is another factor: increasing it is expected to increase the parallelism in input, thus increasing the throughput. An increase in the degree of data
distribution over a large number of storage units reduces contention on individual storage
units and the interconnection network, allowing increased parallelism, reflected in the
speedup and efficiency as shown in Figure 7. The tests are made on the relations of 1024
partitions and a system of 64 processors, under varying numbers of storage units. However,
no performance improvement is expected after some optimum value of the number of
storage units.
Similarly, experiments showed a proportional efficiency increase with higher storage-to-processor (s-p) connectivity, under a fixed number of storage units, which strengthens the scalability of the algorithm.
6.3. Results on nonuniform relations
Contrary to the uniform case, the sizes of interval classes and join classes may show considerable variation in the nonuniform case, with corresponding variation in performance. The degree of nonuniformity
can be represented by a skew in attribute values, as mentioned before. For the same number
of tuples, a highly skewed relation will have more join classes of different sizes than a
moderately skewed one. A high skew implies the presence of a large number of join classes,
many with fewer partitions.
To test the effect of the skewedness, an experiment was carried out with two relations
of varying skewedness, represented by a degree of skew factor, f . The factor f is allowed
to vary from 0.1 to 0.9 where a value of 0.5 indicates a uniform relation and a value under
or above 0.5 indicates a left or right inclination, respectively. Table 4 shows efficiency
as affected by f. The highest efficiency is experienced by the uniform relations, where f1 = f2 = 0.5. The smallest efficiency is experienced when the data skew is at its extreme ends, that is f = 0.1 or f = 0.9. Table 4 is not symmetrical, as R1 and R2 may not give the same size join classes at the same f values.

Table 4. Effect of nonuniformity (f = skew factor) on efficiency (rows: f of R2; columns: f of R1)

f of R2   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
0.1      29.0  40.9  58.6  62.2  65.0  48.0  32.4  26.1  18.5
0.2      41.4  34.8  45.0  53.0  64.4  42.6  34.9  29.3  26.4
0.3      58.7  46.4  40.9  46.2  63.1  43.9  37.0  34.5  36.4
0.4      61.8  53.2  47.6  43.8  62.4  44.1  42.2  40.2  45.5
0.5      67.9  60.9  59.6  60.3  70.2  61.5  61.0  61.5  66.1
0.6      45.5  42.0  40.4  41.1  62.7  40.8  45.9  50.8  62.1
0.7      33.4  35.9  36.9  40.3  63.7  44.6  35.8  40.9  58.1
0.8      26.1  29.5  33.4  41.5  64.8  51.2  42.0  27.8  32.1
0.9      18.7  26.4  33.4  43.7  75.9  62.2  59.9  31.6  22.2
The number of storage units and s-p connections show similar trends to those of the
uniform case. An increase in the size of the interconnection network or number of the storage
units, under fixed application conditions, is not expected to improve the performance after
a turning point, beyond which further parallelism is not possible.
The uniform algorithm determines the available partition labels from the partition levels.
The same algorithm cannot deduce this information for nonuniform relations. The partition
level for the nonuniform relations can only indicate the range of the partition labels. In
highly skewed relations, this range could be very large. For example, a possible range of
partition labels for an eight partition uniform relation with f = 0.5 is 0 to 7, whereas for
an equal size relation with f = 0.9 the same range can be from 0 to 63, resulting in an
eight-fold difference. This is because most of the partitions in the range are either empty or
have low occupancies. The nonuniform algorithm can handle this variance at the expense
of directory maintenance overhead as part of the partitioning process, which is excluded
from the performance computations.
If the parallel join algorithm developed for the uniform case is applied to the nonuniform
case, the processors that take part in the join of one join class may remain idle during the
join of another or vice versa. Thus, the cumulative processor idle time may amount to a level
where parallel processing is not an advantage. For example, the application of the uniform
algorithm to the join of nonuniform relations, with f = 0.3 or f = 0.7, experienced an
efficiency decline 700 times greater than that of the nonuniform algorithm, whereas for
f = 0.5 the ratio was nearly 1. The uniform algorithm reads all the possible partitions,
most of which may be nonexistent, if it is applied to the nonuniform relations.
7. CONCLUSION
This study has addressed a parallel join operation in relational databases where the relations are organised according to an IBGF scheme. The relations, with uniform or nonuniform key distributions, are partitioned and stored over a multi-unit external storage medium, along with various partitioning parameters.
The parallel algorithms for both uniform and nonuniform relations are implemented,
assuming that each processor hosts an instance of the kernel algorithm. From the partition
PARALLEL JOIN FOR IBGF PARTITIONED RELATIONAL DATABASES
835
level and the join class labels, an instance of a kernel algorithm is able to determine the
partitions it should input for the current join. For any attribute, it will know to or from
which pipeline to send or receive a particular partition.
The simulation results have shown that the entire approach is feasible, with good scalability characteristics. An increase in system or relation size is reflected in higher efficiency or speedup. A balanced distribution of data and processing shows a similar trend.
The results also show that, for the best performance, uniform and nonuniform relations
must be handled by their respective parallel algorithms. The nonuniform algorithm balances the inherent imbalance, but it cannot eliminate the effect of the skew factor completely.
The uniform algorithm with uniform relations still shows a better performance compared
to the nonuniform algorithm with nonuniform relations.
The formation of an IBGF directory tree and partition distribution costs are excluded
from the time complexity (cost) of the parallel join algorithms. These costs are considered
negligible for the relations, where transient changes are either infrequent or nonexistent.
Otherwise, especially for nonuniform cases, it may be difficult to ignore the effect of dynamic directory maintenance and relation re-partitioning. The simulation results, however, create a basis for further research on dynamic directory maintenance and on testing the algorithms on real parallel systems.
ACKNOWLEDGEMENTS
In particular, the authors would like to acknowledge the contribution of the author of
[7] on IBGF-based relation partitioning. Acknowledgement is also due to the King Fahd
University of Petroleum & Minerals, Dhahran, Saudi Arabia. We extend our sincere thanks
to the anonymous reviewers for their helpful comments.
REFERENCES
1. D. DeWitt and R. Gerber, 'Multiprocessor hash-based join algorithms', Proceedings of VLDB (Stockholm), 1985.
2. P. Valduriez, 'Join indices', ACM Trans. Database Syst., 12(2), (1987).
3. M. Negri and G. Pelagatti, 'Distributive join: a new algorithm for joining relations', ACM Trans. Database Syst., 16(4), (1991).
4. F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann Publishers, Inc., 1992.
5. M. Bozyigit, S. A. Mohammed and M. Tayyeb, 'Parallel join algorithms for multicomputer systems', Proceedings of 7th ISCIS – International Symposium on Computer & Information Sciences, Antalya, Nov. 1992.
6. M. C. Murphy and D. Rotem, 'Multiprocessor join scheduling', IEEE Trans. Knowl. Data Eng., 5(2), (1993).
7. M. Ouksel, 'The interpolation based grid file: a multidimensional dynamic order-preserving partition scheme', Proc. ACM Symp. on Principles of Database Systems, Portland, Oregon, March 1985.
8. E. A. Ozkarahan and C. H. Bozsahin, 'Join strategies using dataspace partitioning', New Gener. Comput., (June 1988).
9. D. Bitton, H. Boral and D. J. DeWitt, 'Parallel algorithms for the execution of relational database operations', ACM Trans. Database Syst., 8(3), (1983).
10. J. Richardson, H. Lu and K. Mikkilineni, 'Design and evaluation of parallel pipelined join algorithms', SIGMOD Management of Data Conference, ACM, 1987.
11. V. Kumar, A. Grama, A. Gupta and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings Publishing Company, Inc., 1994, p. 597.
12. E. R. Omiecinski and E. T. Lin, 'Hash-based and index-based join algorithms for cube and ring connected multicomputers', IEEE Trans. Knowl. Data Eng., 1(1), (1989).
13. R. J. Enbody and H. C. Du, 'Dynamic hashing schemes', ACM Comput. Surv., 20(2), (1988).
14. K. A. S. Abdel-Ghaffar and A. El Abbadi, 'Optimal disk allocation for partial match queries', ACM Trans. Database Syst., 18(1), (1993).
15. E. Rahm, 'Empirical performance evaluation of concurrency and coherency control protocols for database sharing systems', ACM Trans. Database Syst., 18(2), (1993).