close

Вход

Забыли?

вход по аккаунту

?

Integration of partially homogeneous databases

код для вставки
Integration of partially homogeneous databases Vinokurov Anatoly Stanislavovich Sholom-Aleichem Priamursky State University Undergraduate Abstract The article considers an experiment on the integration of two partially homogeneous databases - the
УДК 004.42
original article: http://e-postulat.ru/index.php/Postulat/article/view/1679/1713
Integration of partially homogeneous databases
Vinokurov Anatoly Stanislavovich
Sholom-Aleichem Priamursky State University
Undergraduate
Abstract
The article considers an experiment on the integration of two partially
homogeneous databases - the database of the admission Commission with the
database of the Dean of one University in its own software system, as well as
bringing the database to a common denominator by identifying database objects
with the subsequent integration of its tables.
Keywords: information technology, automated system, database integration,
database merging, database consolidation, homogeneous databases, partial
homogeneity, identification of database objects.
To conduct an experiment on the integration of two partially homogeneous
databases, we used an automated system for the integration of partially
homogeneous databases developed within the framework of the master's thesis
"research of questions and development of methods for the integration of partially
homogeneous databases" [1].
Many researchers were involved in the integration of databases. D. A.
Malakhov, V. A. Serebryakov, K. B. Teimurazov, O. N. Shorin [2] integrated the
databases of library resources of the Russian National Library with the database of
the British National Library. We have developed our own software application that
integrates these databases. M. E. Bibikov [3] described the implementation of
integration of databases of the Russian nuclear power plant with the use of the
software procedures in the language of logic programming ABIS. The problems of
integration of data bases were considered and proposed to use the algebra of tuples
as the main method of integration of databases and knowledge of intelligent
systems. The study presented the results of the experiment of such integration. Yu.
G. Kononov, V. M., Patel [5] conducted a study of methods of integration of
databases of multiple energy retail and distribution network companies. A. O.
Erkimbaev, A. B. Zhizhchenko, V. Y. Zitserman, G. A. Kobzev, E. E. Son, A. N.
Centurions [6] analyzed the problem of integrating databases of scientific
electronic resources. As a result of the study, the authors proposed to use a model
based on XML language standards as an integrating data model in such databases.
A. I. Nelyubin [7] considered the development of a corporate information system
and its database, and then integrated it with another information system operating
in the structural unit of the company.
As part of the study, it was decided to conduct an experiment on two
partially homogeneous databases – the database of the selection Committee (Fig.
1) and the base of the Dean's office (Fig. 2).
Figure 1-physical structure of the admissions database
Figure 2-physical structure of the Dean's database
These databases store similar information about students of higher
education. Database of admissions holds personal information about students who
applied for admission in the dial group in a particular specialty, the address of
residence of the applicant, passport data and information about the parents of the
applicant (full name of parent, address, place of work). The database of the Dean's
office of the faculty of mathematics stores personal data about students of the
faculty of higher education, and includes information about the transcripts and
academic performance, as well as the address of residence of the student and
information about his parents (Name, surname, patronymic of the parent, address,
place of work, position).
We have taken only the structure of database tables used in one of the higher
education institutions. For the experiment-current databases have been prepared for
integration:
1. The name of the tables and columns of each database was changed,
the names convenient for visual perception, but the structure of the
database was not affected;
2. In order to comply with the non-disclosure of personal information
about real students, their data (including real name, first name, date of
birth, address and phone number) were replaced by random
information so as to increase the probability of the presence of
information about the same person in both databases.
As a result, the database of the Dean's office was filled with 215 records of
students, and the database of the selection Committee was filled with 2,000 records
of applicants.
Let's take a closer look at the source databases. The admissions database has
a table "Dannye_ob_abiturientah" which stores information about applicants. In
the database of the deanery of that table there, and information about the students
is spread across two tables – "Studenty_ochnogo_otdeleniya" and
"Studenty_zaochnogo_otdeleniya". Table "Dannye_ob_abiturientah" in the
database of the selection Committee for the appointment of the same as two tables
"Studenty_ochnogo_otdeleniya" and "Studenty_zaochnogo_otdeleniya" in the
database of the deanery, and stores of similar type attributes.
In order to merge the database admissions database of the deanery will need
to bring the database of the deanery to a common denominator – to-table
"Uchebnaya_kartochka". To do this, run our program and select "integrate tables of
one database" (Fig. 3).
Figure 3-Step_one Form»
In the next step, you must enter the database connection parameters in the
appropriate fields. To hide/display the password entered by the user in the
password field, the PopupMenu component is used. The Connect button connects
to the database based on the connection parameters entered by the user, and also
monitors connection errors. The status of the database connection is displayed in
the StatusBar in the lower left corner of the form (Fig. 4).
Figure 4-Form " Tables_Step_two»
Next, the user is shown a list of all tables and fields of the selected database
table. The button "Customize table" produces hides the current form and navigate
to a special form "Setting_up_tables", but in this example, this configuration is not
required,
as
the
table
"Studenty_ochnogo_otdeleniya"
and
"Studenty_zaochnogo_otdeleniya" have almost similar structure (Fig. 5).
Figure 5-Form " Tables_Step_three»
The next step is to select the first table and its column, as well as the second
table and its column, which contain similar data, and set the matching coefficient.
This factor is responsible for the weights of the matching criteria and participates
in the similarity formula in the next step. The coefficient can be of the following
types: "Semantic attributes", "time Attributes", "Universal identifier", "foreign key
Attributes", "Facts". Clicking on the "set match" button displays the selected
parameters in the table located in the Central part of the form. Based on the userdefined correspondence, the program will then start to compare the records of the
selected columns of each of the database tables.
Since the tables we have selected contain links to the attributes of other
tables (tables "Adres", "Fakultety", "Specialnosti", "Gruppa", "Zachetka",
"CHto_okonchil") of the Dean's database, the foreign keys of such tables should be
added to the "set the correspondence to the foreign keys of the database tables" at
the bottom of the form. This item is optional and can be configured at the user's
discretion. Clicking on the "Add foreign key tables" button displays the selected
parameters in the table at the bottom of the screen (Fig. 6).
Figure 6-Form " Tables_Step_four»
In the next step, a SQL query is generated, which is represented by an
enumeration of the tables selected in the previous step, as well as the foreign keys
of such tables. This is the order in which the information will be displayed in the
tables at the top of the form.
The "Calculate match" button extracts a record from the n-th column of the
first table and compares it with all records of the n-th column of the second table.
Then the next record of the same column of the first table is extracted and
compared with all records of the column of the second table. The cycle continues
until all the columns in both database tables are finished. At the end of the
calculation, the user will be displayed in the table located in the Central part of the
form, the following information: the name of the compared record of the n-th
column of the first table, opposite it will be displayed the name of the compared
record of the n-th column of the second table, next to it the similarity coefficient;
and also provides output information in the penultimate column of this table,
containing the arithmetic mean value of similarity coefficients and the inscription
"similar" in the last column of the table opposite the compared records. This
inscription is set by the program automatically based on the value selected by the
user in the item "Consider similar records, the coefficients of which", and is
assigned to records whose arithmetic mean value of the coefficients is greater than
this value. Also, the user can change this value or assign it to other records by
simply clicking on the desired cell of the last column of the table (for convenience,
the creation of the ComboBox component was programmatically added when
clicking on the StringGrid cell) (Fig. 7 – 10).
Figure 7 - part of the table form "Tables_Step_five»
Figure 8 – Continued table form "Tables_Step_five»
Figure 9 – Continued table form "Tables_Step_five»
Figure 10 - setting the values to" similar " manually, in the table of the form
"Tables_Step_five»
In the next step, the user is shown all records from the table of the previous
step that have been assigned the "similar" property (Fig.11).
Figure 11-Form " Tables_Step_six»
The next step is to choose what to do with such records. As our goal is the
necessity of a merger (integration) of the table "Studenty_ochnogo_otdeleniya"
and table "Studenty_zaochnogo_otdeleniya" in table "Uchebnaya_kartochka",
choose the option "Create a new table and move it like the record". Specify a name
for the new table and choose which columns and foreign keys (specified when
setting up a match on the form "Tables_Step_four") to use when transferring
information. Also select "Transfer all records and replace similar" to transfer all
data
from
the
tables
"Studenty_ochnogo_otdelenia"
and
"Studenty_zaochnogo_otdelenia" excluding duplicate records. Now the original
tables "Studenty_ochnogo_otdelenia" and "Studenty_zaochnogo_otdelenia" can be
deleted manually, as there is no need for their presence in the database (Fig. 12).
Figure 12-Form " Tables_Step_seven»
In the next step, the program displays a list of all the tables in the Dean's
database. When you select a cell in the first table located in the upper left corner of
the form, a list containing the name of the columns of the selected table is
displayed, as well as a complete list of parameters in the second table located in the
lower left corner of the form. All information stored in the selected table is
displayed to the user in the third table on the right side of the form.
The "Shut down" button closes the program. The button " Go to the main
form (Step1) "closes the current form and goes to the form" Tables_Result " (Fig.
13).
Figure 13-Form " Tables_Result»
As a result of the integration of the database tables of the Dean's office, table
"Studenty_ochnogo_otdeleniya" and table "Studenty_zaochnogo_otdeleniya" was
integrated into the table "Uchebnaya_kartochka" within the database of the
deanery, and all similar (duplicate) records in the source tables have been moved to
the table "Uchebnaya_kartochka" overwrite. After we have brought the database of
the Dean's office to the common denominator – to the table
"Uchebnaya_kartochka", the original tables can be deleted manually, since there is
no need for their presence in the database.
Upon completion of the integration of the database tables of the Dean's
office, the physical structure of the database has acquired the following form (Fig.
14).
Figure 14-physical structure of the Dean's database
Once we have brought the database of the Dean's office to a common
denominator, we can begin to integrate it with the database of the selection
Committee. To do this, run our program again and select "integrate two databases"
(Fig. 15).
Figure 15-Step_one Form»
In the next step, you must enter the connection parameters to the database of
the Dean's office and the database of the selection Committee in the appropriate
fields. Clicking on the "Connect" button connects to the database based on the
connection parameters entered by the user, and also monitors connection errors.
The status of the database connection is displayed in the StatusBar at the bottom of
the form (Fig. 16).
Figure 16 - Form " Database_Step_two»
Next, the user is shown a list of all tables and fields of the selected database
table (Fig. 17).
Figure 17 - Form " Database_Step_three»
The "Configure tables" button hides the current form and goes to the special
form "Setting_up_tables" (Fig. 18).
Figure 18 – Pressing the "Set table DB1" form "Database_Step_three»
Let us consider the work with the form "Setting_up_tables".
The" Dannye_ob_abiturientah "table of the admissions database has a"
FIO_abiturienta " column that stores information about the applicant's surname,
name and patronymic, separated by a space. In the table "Uchebnaya_kartochka"
database of the deanery such a column there, and information about surname, name
and patronymic of the student dispersed in three columns – "Familiya", "Imya" and
"Otchestvo". The column "FIO_abiturienta" table "Dannye_ob_abiturientah" in the
database of the selection Committee for the appointment of the same as the three
column "Familiya", "Imya" and "Otchestvo" table "Uchebnaya_kartochka" in the
database of the deanery, and stores of similar type attributes.
In order to merge the database admissions database of the deanery will need
to bring three columns "Familiya", "Imya" and "Otchestvo" table
"Uchebnaya_kartochka" database of Dean's office to a common denominator –
column "FIO_studenta" (Fig. 19).
Figure 19-Form " Setting_up_tables»
The " Rodstvennik1 "table of the admissions database has a"
FIO_rodstvennika " column that stores information about the applicant's father's
surname, name and patronymic, separated by a space. In the table
"Uchebnaya_kartochka" database of the deanery such a column there, and
information about surname, name and patronymic of the father of a student are
dispersed in three columns – "Familiya_Otca", "Imya_Otca" and
"Otchestvo_Otca". The column "FIO_rodstvennika" table "Rodstvennik1" in the
database of the selection Committee for the appointment of the same as the three
column "Familiya_Otca", "Imya_Otca" and "Otchestvo_Otca" table
"Uchebnaya_kartochka" in the database of the deanery, and stores of similar type
attributes.
In order to merge the database admissions database of the deanery will need
to bring three columns "Familiya_Otca", "Imya_Otca" and "Otchestvo_Otca" table
"Uchebnaya_kartochka" database of Dean's office to a common denominator –
column "FIO_Otca" (Fig. 20).
Figure 20 - Form " Setting_up_tables»
The "Rodstvennik2" table of the admissions database has a
"FIO_rodstvennika" column that stores information about the applicant's mother's
surname, name and patronymic, separated by a space. In the table
"Uchebnaya_kartochka" database of the deanery such a column there, and
information about surname, name and patronymic of the mother of a student are
dispersed in three columns – "Familiya_Materi", "Imya_Materi" and
"Otchestvo_Materi". The column "FIO_rodstvennika" table "Rodstvennik1" in the
database of the selection Committee for the appointment of the same as the three
column "Familiya_Materi", "Imya_Materi" and "Otchestvo_Materi" table
"Uchebnaya_kartochka" in the database of the deanery, and stores of similar type
attributes.
In order to merge the database admissions database of the deanery will need
to bring three columns "Familiya_Materi", "Imya_Materi" and "Otchestvo_Materi"
table "Uchebnaya_kartochka" database of Dean's office to a common denominator
– column "FIO_Materi" (Fig. 21).
Figure 21 - Form " Setting_up_tables»
Upon completion of the setup of the columns of the database table of the
Dean's office, the physical structure of the database took the following form (Fig.
22).
Figure 22-physical structure of the Dean's database
The next step is to select the table and its column in the first database, and
the table and its column in the second database, which contain similar data, as well
as the match factor. Clicking on the "set match" button displays the selected
parameters in the table located in the Central part of the form. Based on the userdefined correspondence, the program will then start to compare the records of the
selected columns of the tables of each of the database.
Since the tables we have selected contain foreign keys of other tables, such
foreign keys must be added in the "match the foreign keys of the tables of each
database" section of the form. Clicking on the "Add foreign key tables" button
displays the selected parameters in the table at the bottom of the screen (Fig. 23).
Figure 23 - Form " Database_Step_four»
In the next step, a SQL query is generated, which is represented by an
enumeration of the table columns of each of the databases selected in the previous
step when matching, as well as the foreign keys of such tables.
The "Calculate match" button extracts the record from the n-th column of the
table of the first database and compares it with all records of the n-th column of the
table of the second database. Then, you extract the next record of the same table
column of the first database and compare it with all the records of the table column
of the second database. The cycle continues until all the columns of the tables in
both databases are finished (Fig. 24 – 27).
Figure 24 – part of the table form "Database_Step_five»
Figure 25 – Continued table form "Database_Step_five»
Figure 26 – Continued table form "Database_Step_five»
Figure 27 - setting the "similar" value manually in the form table
"Database_Step_five»
In the next step, the user is shown all records from the table of the previous
step that have been assigned the "similar" property (Fig. 28).
Figure 28 - Form " Database_Step_six»
As we can see from figure 28, as a result of comparing 215 records of
students of the Dean's database with 2000 records of the admissions database, a
total of about 27 records coincided.
The next step is to choose what to do with such records. Since our goal is the
need to merge (integrate) the database of the Dean's office with the database of the
selection Committee in a new database, choose the option "Create a new database
and transfer such records to it." Enter the name of the new database, its location,
and, if desired, you can add tables and their columns from the source databases. If
you do not add additional tables, a single table "New_table_1" will be created, and
it will create columns "New_column_n" containing data from the matches placed
by us (on the form "Database_Step_four") (Fig. 29). The data type for such
columns is set automatically by the program as follows: if both columns
participating in the correspondence have the same data type, then this type is
assigned to the corresponding column "New_column_n" of the table
"New_table_1", otherwise, the column "New_column_n" of the table
"New_table_1" will have the data type "Varchar", and the user will have the right
to convert from one data type to another using various third-party software
solutions.
Figure 29 - Form " Database_Step_seven»
Let's take a closer look at the database structure created by the program. In
the new database "New_Database_three" was created table "New_table_1".
Consider all the columns in this table (Fig. Thirty):
• Column New_ID contains the ID key;
• Column New_column_1 contains configured in the form of
"Database_Step_four" line: column FIO_studenta table Uchebnaya_kartochka
database of Dean's office corresponds to a column FIO_abiturienta table
Dannye_ob_abiturientah database of admissions;
• Column New_column_2, contains the following line: column Data_rozhdeniya
table Uchebnaya_kartochka database of Dean's office corresponds to a column
Data_rozhdeniya table Dannye_ob_abiturientah database of admissions;
* Column New_column_3, contains the following correspondence: the Pol column
of the Uchebnaya_kartochka table of the Dean's database corresponds to the Pol
column of the Dannye_ob_abiturientah table of the admissions database;
• Column New_column_4, contains the following line: column Okrug table Adres
database of Dean's office corresponds to a column Nazvanie_regionalnoj_oblasti
table Regionalnye_oblasti database of admissions;
• Column New_column_5, contains the following line: column of the table Adres
Gorod
database
of
Dean's
office
corresponds
to
a
column
Naimenovanie_naselennogo_punkta table Naselennyj_punkt database of
admissions;
• Column New_column_6, contains the following line: column of the table Adres
Ulica database of Dean's office corresponds to a column Nazvanie_ulicy table
Adresa database of admissions;
• Column New_column_7, contains the following line: column Dom table Adres
database of Dean's office corresponds to a column Nomer_doma table Adresa
database of admissions;
• Column New_column_8, contains the following line: column Dom table Adres
database of Dean's office corresponds to a column Nomer_doma table Adresa
database of admissions;
* Column New_column_9, contains the following match: the column Korpus of
the Adres table of the Dean's database corresponds to the column Korpus of the
Adresa table of the admissions database;
• Column New_column_10, contains the following line: column Kvartira table
Adres database of Dean's office corresponds to a column Nomer_kvartiry table
Adresa database of admissions;
• Column New_column_11, contains the following line: column Telefon table
Uchebnaya_kartochka database of Dean's office corresponds to a column
Kontaktnyj_telefon table Dannye_ob_abiturientah database of admissions;
• Column New_column_12, contains the following line: column group table group
database of Dean's office corresponds to a column Nazvanie_nabiraemoj_gruppy
table Nabiraemye_gruppy database of admissions;
• Column New_column_13, contains the following line: the column Title of the
table Fakultety database of Dean's office corresponds to a column
Abbreviatura_fakulteta table Fakultety database of admissions;
• Column New_column_14, contains the following line: column
Nazvanie_specialnosti table Specialnosti database of Dean's office corresponds to
a column Nazvanie_specialnosti table Specialnosti database of admissions;
• Column New_column_15, contains the following line: column
Name_CHto_okonchil table CHto_okonchil database of Dean's office corresponds
to
a
column
Nazvanie_SHkoli_kotoruyu_zakonchil_abiturient
table
SHkola_kotoruyu_zakonchil_abiturient database of admissions;
• Column New_column_16, contains the following line: column FIO_Otca table
Uchebnaya_kartochka database of Dean's office corresponds to a column
FIO_rodstvennika table Rodstvennik1 database of admissions;
• Column New_column_17, contains the following line: column
Mesto_Raboty_Otca table Uchebnaya_kartochka database of Dean's office
corresponds to a column Mesto_raboty table Rodstvennik1 database of
admissions;
• Column New_column_18, contains the following line: column FIO_Materi table
Uchebnaya_kartochka database of Dean's office corresponds to a column
FIO_rodstvennika table Rodstvennik2 database of admissions;
• Column New_column_19, contains the following line: column
Mesto_Raboty_Otca table Uchebnaya_kartochka database of Dean's office
corresponds to a column Mesto_Raboty_Materi table Rodstvennik2 database of
admissions.
Figure 30-Physical structure of the database " New_Database_three»
In the next step, the program displays a list of all the tables in the Dean's
database. When you select a cell in the first table located in the upper left corner of
the form, a list containing the name of the columns of the selected database table is
displayed, as well as a complete list of parameters in the second table located in the
lower left corner of the form. All information stored in the selected database table
is displayed to the user in the third table on the right side of the form (Fig. 31).
Figure 31 – Form "Database_Result" program
As a result of the experiment on the integration of the database of the Dean's
office with the database of the selection Committee, a new database
"New_Database_three" was created, and all similar (similar) records stored in the
tables of the original databases were moved to the table "New_table_1" with
overwriting.
Thus, we can make sure that we have developed an automated system for the
integration of partially homogeneous databases is able to automate the process of
bringing the databases to a common denominator, the process of integration of
tables within one database, as well as the integration of two partially homogeneous
databases through automatic calculation of similarity coefficients.
It is also worth noting that in the presence of certain meta-information
reflecting the correspondence between the tables of databases, the automated
system for integration can be applied to solving the problems of identification of
objects in heterogeneous (heterogeneous) databases.
Bibliographic list
1. Винокуров А.С. Разработка автоматизированной системы по интеграции
частично гомогенных баз данных // Постулат. 2018. №6 [Электронный
ресурс]. URL: http://e-postulat.ru/index.php/Postulat/article/view/1665/1699
(дата обращения: 18.06.2018).
2. Малахов Д.А., Серебряков В.А., Теймуразов К.Б., Шорин О.Н. Интеграция
библиографических данных в Linked Open Data. // Труды 16-й
Всероссийской научной конференции «Электронные библиотеки:
перспективные методы и технологии, электронные коллекции», Дубна,
Россия, 2014. С. 35-41.
3. Бывайков М.Е. Интеграция баз данных при разработке систем верхнего
(блочного) уровня АСУ ТП АЭС // XII Всероссийское совещание по
проблемам управления ВСПУ. М.: 2014. С. 4687-4694.
4. Зуенко, А.А. Интеграция баз данных и знаний интеллектуальных систем
на основе алгебраического подхода / А.А. Зуенко, Б.А. Кулик, А.Я.
Фридман // Open Semantic Technologies for Intelligent Systems (OSTIS2011): материалы Междунар. научн.-техн. конф. (Минск, 10-12 февраля
2011г.) / редкол.: В. В. Голенков (отв. ред.) [и др.]. – Минск: БГУИР, 2011.
- С. 59-70.
5. Кононов Ю.Г., Пейзель В.М. Интеграция баз данных энергосбытовых и
распределительных сетевых компаний для мониторинга потерь
электроэнергии // Автоматизированные системы и комплексы. 2007.
№2(14). С. 195-202.
6. Еркимбаев А.О., Жижченко А.Б., Зицерман В.Ю., Кобзев Г.А., Сон Э.Е.,
Сотников А.Н. Интеграция баз данных по свойствам вещества. Подходы и
технологии // -я Международная конференция «НТИ-2012», посвященная
60-летию Винити Ран «актуальные проблемы информационного
обеспечения науки, аналитической и инновационной деятельности». М.:
Винити Ран, 2012. С. 1-36.
7. Нелюбин А.И. Интеграция базы данных АРМ «Технолог доменного цеха»
в информационную структуру ОАО «ММК» / А. И. Нелюбин //
Теплотехника и информатика в образовании, науке и производстве:
сборник докладов IV Всероссийской научно-практической конференции
студентов, аспирантов и молодых ученых «Теплотехника и информатика в
образовании, науке и производстве» (ТИМ’2015) с международным
участием, посвященной 95-летию основания кафедры и университета. —
Екатеринбург: УрФУ, 2015. — С. 333-335.
1/--страниц
Пожаловаться на содержимое документа