pyrdfrules
PyRDFRules is a Python wrapper for the RDFRules tool, providing an interface to interact with RDFRules for rule mining from RDF knowledge graphs.
Features
- Start and stop the RDFRules engine.
- Provision a local instance of RDFRules.
- Create and run tasks.
- Access the results of the tasks.
- Format the results of the tasks.
Quickstart
To get started with PyRDFRules right away, use one of the following two Google Colab notebooks:
- [Template RDFRules Notebook](https://colab.research.google.com/drive/1KCyv7b6RtQgQXk-V-oTjYpiQsC-_mFHp?usp=sharing) - a starting point for your own analysis workloads; it provisions the PyRDFRules library and a local RDFRules instance.
- [Pipeline sample](https://colab.research.google.com/drive/192YaNsbpqoD9-he32OaY2nTi-E_ctXYT?usp=sharing) - a sample pipeline on a local instance of RDFRules, from starting the instance to getting the results.
Installation
1. Install the package using pip:

```bash
pip install pyrdfrules
```

2. Configure the RDFRules instance ahead of time using the `Config` class:

```python
from pyrdfrules.config import Config

config = Config()
```

For the available configuration options, see the `pyrdfrules.config.Config` class in the documentation.

3. Start a local instance of RDFRules:

```python
import pyrdfrules.application

app = pyrdfrules.application.Application()

# Start a local RDFRules instance using the config created in step 2.
rdfrules = app.start_local(
    install_jvm = True,
    install_rdfrules = True,
    config = config
)
```
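Whether `install_jvm = True` is needed depends on the machine. The following is a minimal sketch, assuming the flag asks PyRDFRules to provision its own JVM and that an existing `java` executable on `PATH` would otherwise be usable. Neither assumption is confirmed by the text above, and `find_java` and `needs_jvm_install` are hypothetical helper names, not part of the library.

```python
import shutil
from typing import Optional


def find_java() -> Optional[str]:
    """Return the path of a `java` executable found on PATH, or None."""
    return shutil.which("java")


def needs_jvm_install() -> bool:
    """Return True when no `java` executable can be found on PATH.

    Assumption: a JVM on PATH is sufficient for a local RDFRules instance.
    """
    return find_java() is None


if __name__ == "__main__":
    print("java found at:", find_java())
    print("suggested install_jvm value:", needs_jvm_install())
```

The result could then be passed as `install_jvm = needs_jvm_install()`; check the `start_local` documentation before relying on this.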
Modules
The library is segmented into the following modules:
- `pyrdfrules.api` - internal API classes.
- `pyrdfrules.application` - provides methods to start and stop local or remote instances of RDFRules.
- `pyrdfrules.common` - contains common classes and methods.
- `pyrdfrules.config` - configuration class.
- `pyrdfrules.engine` - contains the engine classes, responsible for the lifetime of the RDFRules instance.
- `pyrdfrules.rdfrules` - contains wrappers around RDFRules objects.
Supported operations and result types
Supported operations and bindings of serialized items for each domain can be found at:
- `pyrdfrules.rdfrules` - pipeline operations,
- `pyrdfrules.common.result` - result types,
- `pyrdfrules.common.result.evaluation` - evaluation results, printing confusion matrix,
- `pyrdfrules.common.result.histogram` - histogram results, printing histograms, top N results,
- `pyrdfrules.common.rule.ruleset` - ruleset, printing individual `pyrdfrules.common.rule.rule` rules in text format.
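To make "histogram results" and "top N results" concrete, here is a standalone sketch of a predicate histogram over a handful of triples, using only the standard library. It does not use or reproduce the `pyrdfrules.common.result.histogram` API; the triples and the function names `predicate_histogram`, `top_n` and `format_histogram` are invented for illustration.

```python
from collections import Counter

# Toy (subject, predicate, object) triples, invented for illustration only.
TRIPLES = [
    ("dbr:Prague", "dbo:country", "dbr:Czech_Republic"),
    ("dbr:Brno", "dbo:country", "dbr:Czech_Republic"),
    ("dbr:Prague", "dbo:isPartOf", "dbr:Bohemia"),
    ("dbr:Vltava", "dbo:city", "dbr:Prague"),
    ("dbr:Ostrava", "dbo:country", "dbr:Czech_Republic"),
]


def predicate_histogram(triples):
    """Count how many triples use each predicate."""
    return Counter(predicate for _, predicate, _ in triples)


def top_n(histogram, n):
    """Return the n most frequent (predicate, count) pairs."""
    return histogram.most_common(n)


def format_histogram(rows):
    """Render (predicate, count) pairs as aligned text with a simple bar."""
    width = max(len(name) for name, _ in rows)
    return "\n".join(f"{name:<{width}}  {count:>3}  {'#' * count}" for name, count in rows)


if __name__ == "__main__":
    print(format_histogram(top_n(predicate_histogram(TRIPLES), 3)))
```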
Sample pipeline
Sample usage:
```python
import pyrdfrules.application
from pyrdfrules.common.task.task import Task
from pyrdfrules.config import Config
from pyrdfrules.rdfrules.commondata import ConfidenceType, Constraint, RuleConsumer, RuleConsumerType, Threshold
from pyrdfrules.rdfrules.jsonformats import PrefixFull
from pyrdfrules.rdfrules.pipeline import ComputeConfidence, GetRules, GraphAwareRules, Index, LoadGraph, MergeDatasets, AddPrefixes, Mine, Pipeline, SortRuleset

# Assumption: the original sample uses `Url` without importing it;
# it is taken here to be the URL type from pydantic_core.
from pydantic_core import Url

# Create an instance of the application.
app = pyrdfrules.application.Application()

# Connect to an existing instance of RDFRules.
rdfrules = app.start_remote(
    url = Url("http://example.com/api/"),
    config=Config(
        task_update_interval_ms=1000
    )
)

# Create a pipeline, a sequence of steps to be executed.
# You do not have to use fully qualified names for the classes, as they are imported in the example.
pipeline = Pipeline(
    tasks=[
        LoadGraph(
            graphName = "<dbpedia>",
            path = "/dbpedia_yago/mappingbased_objects_sample.ttl"
        ),
        LoadGraph(
            graphName = "<yago>",
            path = "/dbpedia_yago/yagoFacts.tsv",
            settings = "tsvParsedUris"
        ),
        LoadGraph(
            graphName = "<dbpedia>",
            path = "/dbpedia_yago/yagoDBpediaInstances.tsv",
            settings = "tsvParsedUris"
        ),
        MergeDatasets(),
        AddPrefixes(
            prefixes=[
                PrefixFull(prefix="dbo", nameSpace="http://dbpedia.org/ontology/"),
                PrefixFull(prefix="dbr", nameSpace="http://dbpedia.org/resource/")
            ]
        ),
        Index(train=[], test=[]),
        Mine(
            thresholds=[
                Threshold(name="MinHeadSize", value=100),
                Threshold(name="MaxRuleLength", value=3),
                Threshold(name="Timeout", value=5),
                Threshold(name="MinHeadCoverage", value=0.01),
            ],
            ruleConsumers=[
                RuleConsumer(
                    name=RuleConsumerType.TOP_K,
                    k=1000,
                    allowOverflow=False
                )
            ],
            patterns=[],
            constraints=[
                Constraint(name="WithoutConstants")
            ],
            parallelism=0
        ),
        ComputeConfidence(confidenceType=ConfidenceType.PCA_CONFIDENCE, min=0.5, topk=50),
        SortRuleset(by=[]),
        GraphAwareRules(),
        GetRules()
    ]
)

# Create a task, which represents the execution of the pipeline,
# by submitting the pipeline to the RDFRules engine.
task: Task = rdfrules.task.create_task(pipeline)

# Run the task step by step.
for step in rdfrules.task.run_task(task):
    print(step)
    # You can access the result of the task using the task object, read the logs, or interrupt the task here.

# Access the result of the task.
print(task.result)

# Access the rules from the result.
for rule in task.result.get_ruleset().get_rules():
    print(rule.as_text())  # Print the rule in text format.
```
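The pipeline above filters rules by `MinHeadCoverage` and by PCA confidence. The sketch below works through those measures on a toy dataset, assuming the usual AMIE-style definitions for a rule of the form `(?a bornIn ?b) => (?a livesIn ?b)`. Whether RDFRules computes them in exactly this way is an assumption; the data and the function name `rule_metrics` are invented for illustration.

```python
# Toy facts as (subject, object) pairs per predicate; invented for illustration.
BORN_IN = {("alice", "prague"), ("bob", "brno"), ("carol", "prague"), ("dave", "ostrava")}
LIVES_IN = {("alice", "prague"), ("bob", "vienna"), ("carol", "prague"), ("erin", "brno")}


def rule_metrics(body, head):
    """Measures for the rule body(?a, ?b) => head(?a, ?b).

    Assumed (AMIE-style) definitions:
      support         = pairs for which both body and head hold
      head coverage   = support / number of head facts
      std. confidence = support / number of body pairs
      PCA confidence  = support / body pairs whose subject has at least one
                        head fact (subjects without any head fact are treated
                        as unknown, not as counterexamples)
    """
    support = len(body & head)
    subjects_with_head = {subject for subject, _ in head}
    pca_body = {(s, o) for s, o in body if s in subjects_with_head}
    return {
        "support": support,
        "head_coverage": support / len(head),
        "std_confidence": support / len(body),
        "pca_confidence": support / len(pca_body) if pca_body else 0.0,
    }


if __name__ == "__main__":
    for name, value in rule_metrics(BORN_IN, LIVES_IN).items():
        print(f"{name}: {value:.3f}")
```

Here `dave` has no `livesIn` fact at all, so he counts against standard confidence (2/4) but not against PCA confidence (2/3). Under these assumed definitions, this toy rule would pass both the `MinHeadCoverage` threshold of 0.01 and the `min=0.5` confidence filter used in the sample pipeline.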
SPDX-FileCopyrightText: 2023-present Karel Douda <kareldouda1@gmail.com>
SPDX-License-Identifier: MIT