ICSTC.2017.8011869

2017 3rd International Conference on Science and Technology - Computer (ICST)
Semantic Search with Rule Reasoning
for Scholarship Information Search
Kartikadyota Kusumaningtyas, Khabib Mustofa
Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences
Universitas Gadjah Mada
Yogyakarta, Indonesia
kartikadyota.k@mail.ugm.ac.id, khabib@ugm.ac.id
Abstract—The tendency to pursue graduate study has become a
positive trend in Indonesia. This is supported by the growing
number of agencies and institutions that organize postgraduate
scholarship programs with a wide range of requirements.
Information on the Internet is generally found through
keyword-based search engines, which may return several
irrelevant results and rely on the user’s ability to select
keywords. This research aims to provide an alternative by
building a semantic search system for scholarship
information that accepts natural language sentences instead of
keywords. It implements rule reasoning to obtain implicit
knowledge by defining rules over basic knowledge that is
explicitly defined. Among the rules used in this system,
data range restriction is used to select data based on certain
thresholds. The experimental results show that the system achieves
a recall rate of 98.51% and a precision rate of 100% on the 90 of
105 input sentences that the system is able to process.
Keywords—Semantic Search, Rule Reasoning, Data Range
Restriction, Scholarship
I. INTRODUCTION
The tendency to pursue graduate study has become a positive
trend in Indonesia. A survey of undergraduate students of
Japanese Language Education at Universitas Pendidikan
Indonesia [1] shows that 56% of 159 people want to continue
their studies to a master's degree. Scholarships can be used as
an alternative funding source for graduate study.
Currently, several organizations and institutions offer
scholarship programs for Indonesian people, both domestically
and abroad, with a wide range of requirements. Thus,
scholarship seekers are able to choose the appropriate
program.
In line with developments in information technology
applications, the Internet and its search engines
help seekers to find information. Most search
engines rely on the user’s ability to select keywords as input,
which may give several irrelevant results [2], [3].
For instance, a user performs a search by entering “beasiswa
apa yang memerlukan skor TOEFL 400”, which means “what
scholarship requires TOEFL 400” in English, as a
keyword. TOEFL is an acronym for Test of English as a Foreign
Language. The expected result is a list of scholarships
requiring a TOEFL score less than or equal to 400. Instead, a
keyword-based search engine retrieves a list of website links
containing tokens of the keyword, sorted by the frequency
of occurrence according to its algorithms. This gives
varied information and may confuse the user.
Furthermore, it takes more time to search and compare
the results [4].
One alternative that can be used to overcome these
problems is semantic search with rule reasoning. Semantic
search is a subset of semantic web technology that allows
search based on the meaning of a natural language sentence. Rule
reasoning is a mechanism for discovering and generating
implicit knowledge based on explicit knowledge [5]. One of
the proposed rules in this research is a rule with data range
restriction. Data range restriction is used to select data
based on specific thresholds, so the system can perform
reasoning to retrieve a list of scholarships requiring a TOEFL
score less than or equal to 400.
The rest of this paper is organized as follows. Section II
discusses related works of semantic web technology in various
models of systems. Section III explains the research methods.
Section IV evaluates the proposed alternative by giving
experimental data, as well as analysis of experimental results.
The conclusion and future works are given in Section V.
II. RELATED WORKS
Semantic web technology has been widely
implemented in various models of systems. A planning system
for tour packages in South Sumatera [6] used Semantic
Web Rule Language (SWRL) to control the consistency of data,
combined with Java Server Pages (JSP) to compute tour costs.
The results of that research indicate that SWRL
rules can provide recommendations on tour packages in South
Sumatera from several input parameters.
In a different domain, knowledge representation and
semantic reasoning for productivity grade were
developed using an ontology and SWRL [7]. The system consists of
a data layer, a model layer, and a reasoning layer. The data layer
includes the original database. The model layer includes the
ontology, SWRL rules, and Java Expert System Shell (JESS) reasoning.
The ontology is used to represent the original database and is
constructed manually using Protégé 3.4. Rules are defined
directly with SWRL data range built-in functions. JESS
reasoning performs semantic reasoning by converting those SWRL
rules into SWRL JESS rules, which are then stored in the ontology.
The reasoning layer includes data, an inference engine, and
the productivity grade. The system is able to make a decision
from user input to obtain the productivity grade.
978-1-5386-1874-5/17/$31.00 ©2017 IEEE
Semantic web technology with rule-based methods is also
used in Clinical Decision Support Systems (CDSS) to analyze
a model of the rehabilitation activity for improving the upper limb
function of patients [8]. The CDSS provides personalization of
therapies. The system is validated with a functional
rehabilitation scenario that includes several types of indicators,
a medical ontology, and time annotations of different
granularities. Based on rules described in SWRL and semantic
annotations from biomedical and time ontologies, the system is
able to do automated reasoning to determine the personalization
of therapies.
This research strives to provide an alternative for scholarship
information search problems using semantic search with a rule
reasoning approach. One of the rules used in this
research corresponds to [7], [8]. The distinction of this
research is that the data range restrictions are constructed
dynamically by generating the parameters from the search
sentence. The developed system is able to
understand natural language search sentences in the
question form of Bahasa (Indonesian language) and to do
reasoning based on rules that are defined in the ontology to
provide the relevant answers.
III. RESEARCH METHODS
Two main processes in the architecture of this semantic search
with rule reasoning system are knowledge base modeling
and the semantic search process (shown in Fig. 1). Each process
consists of a client-side module and a server-side module. The
client-side module accepts user input through a user interface
and also displays the response sent by the server, while the
server-side module processes the user input.
Fig. 1. Architecture of the system
A. Knowledge Base Modeling
Knowledge base modeling is used to populate individuals
and literals into the scholarship ontology. Individuals and literals
are obtained from data extraction on official scholarship
websites. Before starting the process, the first thing to do is
to determine the ontology structure. The scholarship ontology is
modeled in Web Ontology Language (OWL) and consists of
classes, properties, individuals, and literals.
1) Ontology Structure
a) Classes and subclasses: The class hierarchy is defined with
a top-down approach, which starts from the first level and then
breaks down into smaller segments. Classes are divided into
simple classes and complex classes. Individuals of simple classes
are assigned manually, while individuals of complex classes are
determined from the restriction expression. Table I shows the
class hierarchy and restriction expressions in the ontology.
TABLE I. CLASSES AND RESTRICTION EXPRESSION LIST
Classes | Subclasses | Restriction expression
Scholarship | Master_scholarship | Scholarship, scholarshipProvidesDegree (master)
Scholarship | Doctoral_scholarship | Scholarship, scholarshipProvidesDegree (doctor)
Scholarship | Domestic_scholarship | Scholarship, scholarshipForLocation (indonesia)
Scholarship | Overseas_scholarship | Scholarship, scholarshipForLocation (not (indonesia))
Scholarship | Australia_awards | Scholarship, managedBy (aciar)
Scholarship | LPDP_scholarships | Scholarship, managedBy (lpdp)
Scholarship | DAAD_scholarships | Scholarship, managedBy (daad)
Field_of_study | - | -
Months | - | -
Educational_degree | - | -
Cost_components | Educational_costs | -
Cost_components | Support_costs | -
Institution | - | -
Country | - | -
Sponsor | - | -
Area | - | -
Temporary | - | strDataProperty [<= integer NumeraliaObject]
b) Properties: Properties are classified into two
categories, i.e. object properties and datatype properties.
Object properties link individuals to individuals, while
datatype properties link individuals to data values. Table II
and Table III show the object properties and datatype
properties in the ontology.
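To make the two property kinds concrete, the following minimal Python sketch models an individual that carries both object properties (links to other individuals) and datatype properties (literal values), and reads one complex class of Table I as a membership test. It is only an illustration: the Individual class, the helper is_master_scholarship, and the individual example_scholarship are hypothetical names, and a plain function stands in for what the ontology and reasoner do.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """A named ontology individual with both kinds of property."""
    name: str
    # Object properties link this individual to other individuals.
    object_props: dict = field(default_factory=dict)
    # Datatype properties link this individual to literal data values.
    datatype_props: dict = field(default_factory=dict)


def is_master_scholarship(ind: Individual, scholarships: set) -> bool:
    """Membership test in the spirit of the Master_scholarship row of Table I."""
    degrees = ind.object_props.get("scholarshipProvidesDegree", set())
    return ind.name in scholarships and "master" in degrees


# Hypothetical individual, not taken from the paper's ontology.
example = Individual(
    name="example_scholarship",
    object_props={"scholarshipProvidesDegree": {"master"}},
    datatype_props={"toefl": 500},
)
print(is_master_scholarship(example, {"example_scholarship"}))  # True
```

In the actual system this classification is inferred by the reasoner from the restriction expression rather than coded by hand.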
TABLE II. OBJECT PROPERTY LIST
Object properties | Restriction expression
sponsored_by | -
managed_by | -
degree | -
field_of_study | -
location | -
application_date | -
costs_cover | -
sponsor | equivalentTo inverseOf (sponsored_by)
funded_by | equivalentTo sponsored_by
fund | equivalentTo inverseOf (funded_by)
manage | equivalentTo inverseOf (managed_by)
level | equivalentTo degree
study_program | equivalentTo field_of_study
major | equivalentTo field_of_study
department | equivalentTo field_of_study
interest | equivalentTo field_of_study
destination | equivalentTo location
application_period | equivalentTo application_date
component | equivalentTo costsCover
TABLE III. DATATYPE PROPERTY LIST
Datatype properties | Restriction expression
amount | -
ielts | -
gpa | -
office | -
study_period | -
register | -
start_course | -
requirement | -
selection_process | -
toefl | -
age | -
website | -
duration | equivalentTo study_period
sign_up | equivalentTo register
eligibility | equivalentTo requirement
c) Individuals and literals: An individual is a member of a
class, while a literal is a value of a property of an
individual. The individuals and literals in the ontology are
populated from the data extraction results using the OWL API.
2) Scraper
The scraper receives input from the user. This input is used as
a parameter to extract data from several official scholarship
websites. The data extraction is carried out with the JSoup
library. This research uses the Australia Awards, DAAD, LPDP,
and Monbukagakusho official scholarship websites as sample
data. The extracted data are stored as individuals and literals in
the scholarship ontology through the OWL API.
3) Rule Reasoning
Rule reasoning is responsible for ontology reasoning
to obtain implicit knowledge. The reasoning process is carried
out by the HermiT reasoner. SWRL rules defined in the ontology
(shown in Fig. 2) represent the classes’ restriction expressions
shown in Table I.
Fig. 2. SWRL Rules
The restriction expressions of Temporary (shown in Fig. 3)
and Overseas_scholarship (shown in Fig. 4) are defined as
OWLRestriction because the HermiT reasoner has limited support
for reasoning over SWRL built-in functions
and NOT expressions against individuals. In this case,
OWLRestriction and SWRL rules can be used interchangeably.
Fig. 3. OWLRestriction of Temporary class
Fig. 4. OWLRestriction of Overseas_scholarship class
B. Semantic Search Process
The semantic search process is adapted from the natural
language processing model for semantic web applications [9]
that consists of Parser, Triple Generator, Query Generator,
Query Processor, and Response Generator. An HTML Generator
module is added to transform the JSON Object into HTML. This
process is used by the user to perform a semantic search.
1) Parser
The parser analyzes the input search sentence from
the user and transforms it into a parse tree so that the computer
can understand its meaning. This research reuses the parser
developed by [10], with the following stages:
• Tokenization, the first step of natural
language processing, where the sentence is divided into
its constituent words, called tokens.
• Part of speech (POS) tagging, used to determine the word
class of each constituent word of the sentence.
• Phrase formation, which follows Bahasa grammar.
• Analysis of syntactic function, performed to determine the
subject, predicate, and object of the sentence.
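The first two stages can be pictured with a toy Python sketch. This is not the parser of [10]: the lexicon TOY_LEXICON, its tag names, and the two functions are assumptions made only to illustrate tokenization followed by dictionary-based POS tagging, with digits tagged as numbers.

```python
import re

# Hypothetical mini-lexicon; the tag names are illustrative, not those of [10].
TOY_LEXICON = {
    "beasiswa": "NOUN",
    "apa": "QUESTION",
    "yang": "PARTICLE",
    "memerlukan": "VERB",
    "skor": "NOUN",
    "toefl": "NOUN",
}


def tokenize(sentence: str) -> list:
    """Split a sentence into lowercase word tokens, dropping punctuation."""
    return re.findall(r"\w+", sentence.lower())


def pos_tag(tokens: list) -> list:
    """Attach a word class to each token; digits become NUM, unknowns UNK."""
    tagged = []
    for token in tokens:
        if token.isdigit():
            tagged.append((token, "NUM"))
        else:
            tagged.append((token, TOY_LEXICON.get(token, "UNK")))
    return tagged


tokens = tokenize("Beasiswa apa yang memerlukan skor TOEFL 400")
print(pos_tag(tokens)[-1])  # ('400', 'NUM')
```

Phrase formation and syntactic function analysis would then operate on the tagged tokens; they depend on Bahasa grammar rules and are not sketched here.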
2) Triple Generator
The constituent phrases of the parse tree are mapped to
ontology entities by the TripleGenerator class. The
mapping is performed for all constituent phrases except
constituents whose word class is Number or Pronoun. The
output of this process is an array list called ontologyTriple that
contains RDF triples (s, p, o).
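A minimal Python sketch of this mapping step is given below. The dictionary ENTITY_MAP, the variable names ?x and ?v, and the triple shapes are assumptions for illustration; the paper does not specify how the TripleGenerator class builds its triples, only that Number and Pronoun constituents are skipped.

```python
# Hypothetical phrase-to-entity dictionary: phrase -> (entity kind, entity name).
ENTITY_MAP = {
    "beasiswa": ("class", "Scholarship"),
    "skor toefl": ("datatype_property", "toefl"),
}

# Word classes that are not mapped to ontology entities.
SKIPPED_TAGS = {"NUM", "PRON"}


def generate_triples(phrases: list) -> list:
    """Map (phrase, word_class) pairs to a list of (s, p, o) triples."""
    ontology_triple = []
    for text, tag in phrases:
        if tag in SKIPPED_TAGS:
            continue  # numbers and pronouns are excluded from the mapping
        entity = ENTITY_MAP.get(text.lower())
        if entity is None:
            continue  # phrase has no counterpart in this toy dictionary
        kind, name = entity
        if kind == "class":
            ontology_triple.append(("?x", "rdf:type", name))
        else:
            ontology_triple.append(("?x", name, "?v"))
    return ontology_triple


phrases = [("beasiswa", "NOUN"), ("skor toefl", "NOUN"), ("400", "NUM")]
print(generate_triples(phrases))
```

The skipped number is not lost: as described next, the Query Generator uses it as the parameter of the data range restriction.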
3) Query Generator
The Query Generator is responsible for constructing a
SPARQL-DL query from the received ontologyTriple by
examining the patterns it forms. If the pattern contains a
Number word class, its value is used as a parameter
for constructing the data range restriction in the ontology.
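How a number in the search sentence could become a data range restriction is sketched below in Python. The comparison direction per property follows the two data range test cases reported in Table V (TOEFL <= 400, age >= 40); the COMPARATORS table, the function name, and the choice of an OWL Manchester-syntax string as output are assumptions, not the system's actual representation.

```python
# Comparison direction per datatype property, as in the Table V test cases.
COMPARATORS = {"toefl": "<=", "age": ">="}


def build_data_range_restriction(prop: str, value: str) -> str:
    """Return a Manchester-syntax style restriction for the given parameter."""
    op = COMPARATORS[prop]
    # int() rejects non-numeric input instead of embedding it in the expression.
    return f"{prop} some xsd:integer[{op} {int(value)}]"


print(build_data_range_restriction("toefl", "400"))
# toefl some xsd:integer[<= 400]
```

Building the restriction from the sentence at query time is what makes it dynamic, in contrast to a threshold fixed in the ontology beforehand.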
4) Query Processor
The Query Processor interacts with the ontology and rule
reasoning to execute the SPARQL-DL query received from the Query
Generator and returns the query results.
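The effect of executing a query with a data range restriction can be imitated with a plain filter over literal values, as in the Python sketch below. This is a stand-in only: the real system delegates the work to the reasoner, and the data in LITERALS and the function select_individuals are hypothetical.

```python
import operator

# Supported comparison operators for the data range.
OPS = {"<=": operator.le, ">=": operator.ge}

# Hypothetical datatype property values, not taken from the paper's ontology.
LITERALS = {
    "scholarship_a": {"toefl": 400},
    "scholarship_b": {"toefl": 550},
    "scholarship_c": {"age": 35},
}


def select_individuals(literals: dict, prop: str, op: str, threshold: int) -> list:
    """Return names of individuals whose value for prop satisfies the range."""
    compare = OPS[op]
    return sorted(
        name
        for name, props in literals.items()
        if prop in props and compare(props[prop], threshold)
    )


print(select_individuals(LITERALS, "toefl", "<=", 400))  # ['scholarship_a']
```

Individuals that lack the property are simply not returned, which mirrors the case where data is not available in the ontology.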
5) Response Generator
The Response Generator receives the query result and converts
it into JavaScript Object Notation (JSON) format. The resulting
JSON Object is sent to the client side.
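A possible shape of this step is sketched in Python below. The field names question and results are assumptions; the paper does not describe the structure of its JSON Object.

```python
import json


def to_json_response(question: str, results: list) -> str:
    """Serialize a query result into a JSON string for the client side."""
    payload = {"question": question, "results": results}
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps(payload, ensure_ascii=False)


response = to_json_response(
    "beasiswa apa yang memerlukan skor TOEFL 400",
    ["scholarship_a"],  # hypothetical query result
)
print(response)
```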
6) HTML Generator
This module accepts the JSON Object sent by the Response
Generator on the server side and transforms it into
HTML so that it can be displayed to the user.
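The transformation can be illustrated with the short Python sketch below. It is not the system's client-side code: the function name, the results field, and the choice of an unordered list are assumptions. Values are escaped so that extracted free text cannot break the markup.

```python
import html
import json


def json_to_html(payload: str) -> str:
    """Render the 'results' array of a JSON string as an HTML list."""
    data = json.loads(payload)
    items = "".join(
        f"<li>{html.escape(str(result))}</li>"
        for result in data.get("results", [])
    )
    return f"<ul>{items}</ul>"


print(json_to_html('{"results": ["a & b"]}'))  # <ul><li>a &amp; b</li></ul>
```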
IV. RESULTS AND DISCUSSION
The system is tested using black-box testing as well as
recall and precision testing. Black-box testing examines the
functionality of the system. Recall and precision testing
evaluates the search results.
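For reference, the two rates can be computed as in the Python sketch below, assuming the usual information-retrieval definitions: precision is the share of retrieved answers that are relevant, and recall is the share of relevant answers that are retrieved. The function and the sample sets are illustrative and do not reproduce the paper's test data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one set of retrieved answers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # An empty denominator is reported as 0.0 rather than raising an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: two answers retrieved, both relevant, one relevant missed.
precision, recall = precision_recall(["a", "b"], ["a", "b", "c"])
print(f"precision={precision:.2%} recall={recall:.2%}")
```

A precision of 100% with a recall below 100%, as reported for this system, corresponds to this situation: every returned answer is relevant, but a few relevant answers are not returned.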
A. Black-box Testing
The functionality of the system consists of three
categories: (1) the ability to process natural
language sentences, (2) the ability to map phrase constituents
into the ontology, and (3) the ability to perform query and
reasoning. Table IV shows the black-box testing results for the
natural language processing category and the mapping constituent
phrase category.
Based on the results shown in Table IV, the system is
able to process the natural language search sentences. In
Bahasa grammar, the question word can be placed
at the beginning, in the middle, or at the end of the sentence
[11], and this system is able to process and understand all of them.
The system is also able to map phrase constituents consisting
of one, two, or more words into the scholarship ontology.
The black-box testing results for the query and reasoning category
are shown in Table V. The query and reasoning category consists
of several test cases, including same individual as, equivalent
object property, inverse of, transitive property, individual
classification, and data range restriction.
TABLE IV. BLACK-BOX TESTING RESULTS FOR CATEGORY OF NATURAL LANGUAGE PROCESSING AND MAPPING CONSTITUENT PHRASE
Category | Test cases in Bahasa question form (English translation) | Results
Processing the natural language search sentence, P-S | Kapan periode pendaftaran beasiswa lpdp dibuka (When is the registration period of LPDP scholarships opened?) | ✓
Processing the natural language search sentence, S-P | Periode pendaftaran beasiswa LPDP dibuka kapan (Registration period of LPDP scholarship is opened when?) | ✓
Processing the natural language search sentence, S-P-K | Apa saja persyaratan wajib untuk beasiswa monbukagakusho (What are the mandatory requirements for Monbukagakusho scholarship) | ✓
Processing the natural language search sentence, S-P-O-K | Siapa yang membiayai beasiswa lpdp tahun 2017 (Who funds the LPDP scholarship 2017) | ✓
Mapping constituent phrase into scholarship ontology entities | Kapan periode pendaftaran beasiswa LPDP dibuka (When is the registration period of LPDP scholarship opened?) | ✓
TABLE V. BLACK-BOX TESTING RESULTS FOR QUERY AND REASONING CATEGORY
Category | Test cases in Bahasa question form (English translation) | Reasons | Results
Same individual as | Apa saja beasiswa luar negeri untuk jenjang master (What are overseas scholarships for master level) | The master is a same individual as individual S2. | ✓
Equivalent object property | Beasiswa apa yang tersedia untuk jurusan ilmu budaya (What scholarships are available for humanities majors) | The major is an equivalent object property of field_of_study | ✓
Equivalent datatype property | Apa saja kriteria beasiswa Australia Awards (What is the eligibility of Australia Awards scholarships) | The eligibility is an equivalent datatype property of requirement | ✓
Inverse of | Siapa yang membiayai beasiswa lpdp (Who funds LPDP scholarship) | The fund is an inverse property of funded_by, which is equivalent to the sponsored_by object property | ✓
Transitive object property | Apa saja beasiswa yang memiliki tujuan di negara-negara eropa (What scholarships have a destination in European countries) | The destination is a transitive object property for Country and Area classes. | ✓
Individual classification based on SWRL rules | Apa saja program beasiswa DAAD (What are DAAD’s scholarship programs?) | The individuals of DAAD_scholarships class are inferred from SWRL rule. | ✓
Individual classification based on OWL Restriction | Apa saja beasiswa luar negeri untuk jenjang S3 (What overseas scholarships exist for the doctoral degree) | The individuals of Overseas_scholarships class are inferred from OWLRestriction | ✓
Data range restriction | Beasiswa apa yang memerlukan skor TOEFL 400 (What scholarships require TOEFL 400) | The data range restriction is constructed using parameter TOEFL <= 400 | ✓
Data range restriction | Beasiswa apa yang menerima pendaftar dengan usia 40 tahun (What scholarships accept applicants aged 40 years) | Data range restriction is constructed using parameter age >= 40 | ✓
Based on the results in Table V, the system is able to
handle same individual as, equivalent object
property, inverse of, transitive property, and
individual classification. For the data range restriction test cases,
the system is able to construct a dynamic data range restriction
based on the parameter values that are generated from the
input search sentence.
In the equivalent datatype property test case, the system failed
to understand the equivalent datatype property. Further
analysis found that the HermiT reasoner does not support
reasoning over the equivalent datatype property
characteristic. This is evidenced by the different results obtained
when a query is executed using the HermiT and Pellet reasoners
(shown in Fig. 5).
Fig. 5. The different results of a query executed using HermiT and Pellet
When a query is processed using the HermiT reasoner, the
equivalent datatype property characteristic cannot be reasoned,
so it returns an empty set. With the Pellet reasoner, the
equivalent datatype property characteristic can be reasoned, so
it returns the expected query result.
B. Recall and Precision
The recall and precision rates are the evaluation
parameters for the semantic search results. Recall and precision
testing is done using 105 input search sentences collected
from respondents. It shows that the system achieves a recall rate
of 98.51% and a precision rate of 100%.
Fifteen input search sentences could not be processed by the
system. This was caused by several things:
(1) the question
words are not valid according to Bahasa grammar, (2) the
subject of the search sentence is incomplete, (3) the HermiT
reasoner failed to perform reasoning against the equivalent
datatype property, (4) data is not available in the scholarship
ontology, and (5) the search sentence is a compound sentence.
V. CONCLUSION AND FUTURE WORKS
A. Conclusions
Based on the research that has been done, the following can be
concluded:
• The system is able to process natural language
sentences in the question form of Bahasa.
• The system is able to define a dynamic data range rule
based on the parameters derived from the input
sentence.
• The system is able to perform rule reasoning against all
defined rules, except the equivalent datatype property
characteristic. This occurred because the HermiT reasoner
failed to perform reasoning on the equivalent datatype
property.
• The semantic search with rule reasoning system for
scholarship information search is able to retrieve the
relevant answers for 90 input sentences with a recall
rate of 98.51% and a precision rate of 100%.
B. Future Works
This system still has some drawbacks. When a data
range restriction is added, the ontology fails to re-synchronize the
reasoner automatically, so the search process must be run
twice. Moreover, the data extraction processes are sensitive to
changes in the structure or style of the websites. This is because
the extracted data are free text and the selected method
for extracting data is CSS Selector. Thus, further development
is required for the system to perform better in the data extraction
and semantic search processes.
ACKNOWLEDGMENT
Funding for this research is provided by Lembaga
Pengelola Dana Pendidikan (LPDP) under Beasiswa
Pendidikan Indonesia Program Magister dan Doktoral.
REFERENCES
[1] N. Kobari, “Penelitian Dasar Terhadap Motivasi Mahasiswa yang
Memilih Keahlian Pendidikan Bahasa Jepang,” Bhs. Sastra, vol. 14, no.
2, pp. 117–130, 2014.
[2] P. Priya and R. Rajalaxmi, “Ontology Based Semantic Query
Suggestion For Movie Search,” in 2013 International Conference on
Information Communication and Embedded Systems (ICICES), 2013,
pp. 277–282.
[3] D. Zhi-Qiang, H. Jing, Y. Hong-Xia, and H. Jin-Zhu, “The Research of
the Semantic Search Engine Based on the Ontology,” in 2007
International Conference on Wireless Communications, Networking
and Mobile Computing, 2007, pp. 5403–5406.
[4] F. T. Admojo and E. Winarko, “Sistem Pencarian Informasi Berbasis
Ontologi untuk Jalur Pendakian Gunung Menggunakan Query Bahasa
Alami dengan Penyajian Peta Interaktif,” IJCCS (Indonesian J.
Comput. Cybern. Syst.), vol. 10, no. 1, pp. 23–34, 2016.
[5] W3C, “Inference.” [Online]. Available:
https://www.w3.org/standards/semanticweb/inference. [Accessed:
16-Jan-2017].
[6] Yunita, “Pemanfaatan Semantic Web Rule Language (SWRL) dalam
Prototipe Sistem Perencanaan Perjalanan Wisata di Sumatera Selatan,”
Universitas Gadjah Mada, 2011.
[7] L. Ma, H. Yu, Y. Wang, and G. Chen, “The Knowledge Representation
and Semantic Reasoning Realization of Productivity Grade Based on
Ontology and SWRL,” in IFIP Advances in Information and
Communication Technology, 2012, vol. 368 AICT, no. PART 1, pp.
381–389.
[8] L. Subirats, L. Ceccaroni, C. Gómez-Pérez, R. Caballero, R.
Lopez-Blazquez, and F. Miralles, “On Semantic, Rule-Based Reasoning
in the Management of Functional Rehabilitation Processes,” in
Management Intelligent Systems: Second International Symposium, J.
Casillas, F. J. Martínez-López, R. Vicari, and F. la Prieta, Eds.
Heidelberg: Springer International Publishing, 2013, pp. 51–58.
[9] H. Embregts, V. Milea, and F. Frasincar, “Metafrastes: A News
Ontology-Based Information Querying Using Natural Language
Processing,” in The 8th International Conference on Knowledge
Management in Organizations: Social and Big Data Computing for
Knowledge Management, L. Uden, L. S. L. Wang, J. M. Corchado
Rodríguez, H.-C. Yang, and I.-H. Ting, Eds. Dordrecht: Springer
Netherlands, 2014, pp. 313–324.
[10] S. Muttaqin, “Sistem Question Answering Data Kabupaten di Nusa
Tenggara Barat Berbasis Multi-Ontologi,” Universitas Gadjah Mada,
2016.
[11] H. Alwi, S. Dardjowidjojo, H. Lapoliwa, and A. M. Moeliono, Tata
Bahasa Baku Bahasa Indonesia, 3rd Ed. Jakarta: Balai Pustaka, 2010.