вход по аккаунту


View Maintenance in a Warehousing Environment

код для вставкиСкачать
Data Warehouse
View Maintenance
Presented By:
Katrina Salamon
For CS561
What is a Data Warehouse?
Repository of integrated information
As information becomes available from a
source it is added to the repository
What is a View?
A function from a set of base tables to a
derived table
Can be recreated every time the view is
What’s a Materialized View
A view where the tuples are stored in a
database (or warehouse)
Can create indexes on them
Provides fast data access
Similar to a cache
What’s View Maintenance
View data becomes out of date when base
tables are changed
Updating the view to reflect these changes
is called view maintenance
Sounds Easy Right!
Sounds Easy Right!
Here’s Why. . .
Data sources are typically legacy systems
and do not understand views
Sources can tell the warehouse there is
new data, but they cannot determine if any
additional data is needed
Ideal World – new record is added to base
relation and view is notified and updated
The Real World
– Maintenance Anomaly – trying to update
a view while the underlying data is
Update Anomaly
Deletion Anomaly
The (Possible) Solutions
Recompute the view
Store all relations involved in the
Eager Compensating Algorithm (ECA)
Recompute the View
– Whenever an update occurs
– At a periodic interval
Time and Resource intensive especially in
a distributed environment (transferring of
data from one source to the other)
Storing Base Relations
Keep up-to-date copies of all relations in
the warehouse, queries can be evaluated
locally and no anomalies occur
Takes up extra space in the warehouse,
storing duplicate data
Copied relations still need to be updated
Eager Compensating Algorithm
Most promising solution
– No duplicating base relations or
recomputing overheads
All queries sent have compensating
queries added to them to offset concurrent
updates to the source data
ECA cont. . .
Strongly Consistent
– Upon competition of activity, view is
consistent with base relations
– Every View state has a corresponding
state in the base relations and they are
completed in order
Not complete
– Every source state may not be reflected
in a view state (direct mapping)
How ECA Works - 4 basic events
1. Source executes an update (U) and
notification is sent to the warehouse
2. Warehouse receives update (U) and
creates query (Q) to be evaluated by the
3. Source evaluates query (Q) against base
relations and sends answer (A) to
4. Warehouse receives query result and
updates view
Resolving Anomalies
Two Updates: Query1 is assumed to be
computed before Update2 but is actually
computed after Update2
– ECA knows that is happens and takes
Update2 into account when Updating the view
by using a compensating query for each
query it creates
Resolving Issues
When using compensating queries, we
should not apply the results until after all
related queries have been received
If updates occurred after each query the
view could temporally be in an invalid state
To avoid invalid states ECA collects the
intermediate answers in a relation called
Collect (initialized to empty set)
Three insertions in to three base relations
and its affect on the view that references
Used to streamline the algorithm when a
key from the base relations are available in
the View
The Collect relation is initialized to current
View and becomes a working copy of the
ECA-Key Algorithms
Delete received, no query sent, delete is
directly applied to Collect
Insert received, query sent, no
compensating queries created, answers
are added to Collect and duplicate values
are ignored because of the keys
Once completed the tuples in Collect
replace the tuples for the View
ECA - Local
Combines the compensating queries of
ECA and the local updates of ECA-Key to
create a more streamlined query
Maintaining order of execution of local and
non-local processes is complicated and
will create a greater over head then other
Future work needs to be done to see if
this is a worthwhile approach
Performance Comparison
Total Bytes Transferred
Cardinality of Relation
Total Bytes Transferred
# of Source Updates
Review of ECA
Incremental updating approach, it doesn’t
start from scratch every time
No additional burden placed on sources
(timestamps or locks)
Compensating queries are only used when
more then update is occurring, keeping
computation costs low
Размер файла
1 935 Кб
Пожаловаться на содержимое документа