pyrdfrules

PyRDFRules is a Python wrapper for the RDFRules tool, providing an interface to interact with RDFRules for rule mining from RDF knowledge graphs.

Features

Start and stop the RDFRules engine.
Provision a local instance of RDFRules.
Create and run tasks.
Access the results of the tasks.
Format the results of the tasks.

Quickstart

If you want to get started with PyRDFRules right away, use one of the following two Google Colab notebooks:

Template RDFRules Notebook (https://colab.research.google.com/drive/1KCyv7b6RtQgQXk-V-oTjYpiQsC-_mFHp?usp=sharing) - a starting point for your own analysis workloads; it provisions the PyRDFRules library and a local RDFRules instance.
Pipeline sample (https://colab.research.google.com/drive/192YaNsbpqoD9-he32OaY2nTi-E_ctXYT?usp=sharing) - a sample pipeline on a local instance of RDFRules, from starting the instance to getting the results.

Installation

Install the package using pip:

pip install pyrdfrules

Configure the RDFRules instance ahead of time using the Config class:

from pyrdfrules.config import Config

config = Config()

For the available configuration options, see the pyrdfrules.config.Config class in the documentation.

Start a local instance of RDFRules:

import pyrdfrules.application

app = pyrdfrules.application.Application()

rdfrules = app.start_local(
    install_jvm = True,
    install_rdfrules = True,
    config = config
)
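
A local instance runs as a separate process, so it is worth making sure it is shut down even when an analysis step raises an exception. The following is a minimal sketch of that pattern using `contextlib`. The `stop()` method name and the `FakeApplication` class are assumptions made for illustration: the stand-in replaces `pyrdfrules.application.Application` so the snippet runs without RDFRules installed. Check the pyrdfrules.application documentation for the actual shutdown method.

```python
from contextlib import contextmanager


class FakeApplication:
    """Hypothetical stand-in for pyrdfrules.application.Application."""

    def __init__(self):
        self.running = False

    def start_local(self, **kwargs):
        # The real method provisions and starts RDFRules; here we only flip a flag.
        self.running = True
        return "rdfrules-handle"

    def stop(self):
        # Assumed method name; the real library may name it differently.
        self.running = False


@contextmanager
def local_rdfrules(app, **start_kwargs):
    """Start a local instance and always stop it when the block exits."""
    handle = app.start_local(**start_kwargs)
    try:
        yield handle
    finally:
        app.stop()


app = FakeApplication()
with local_rdfrules(app, install_jvm=True, install_rdfrules=True) as rdfrules:
    was_running = app.running  # The instance is up inside the block.
still_running = app.running    # The instance has been stopped after the block.
```

With the real library, the same wrapper could be given an `Application` object in place of the stand-in, provided the shutdown call is adjusted to whatever the library exposes.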

Modules

The library is segmented into the following modules:

pyrdfrules.api - internal API classes.
pyrdfrules.application - provides methods to start and stop local or remote instances of RDFRules.
pyrdfrules.common - contains common classes and methods.
pyrdfrules.config - configuration class.
pyrdfrules.engine - contains the engine classes, responsible for the lifetime of the RDFRules instance.
pyrdfrules.rdfrules - contains wrappers around RDFRules objects.

Supported operations and result types

Supported operations and the bindings of serialized items for each domain are documented in the following modules:

pyrdfrules.rdfrules - pipeline operations,
pyrdfrules.common.result - result types,
pyrdfrules.common.result.evaluation - evaluation results, printing confusion matrix,
pyrdfrules.common.result.histogram - histogram results, printing histograms, top N results,
pyrdfrules.common.rule.ruleset - ruleset, printing individual pyrdfrules.common.rule.rule rules in text format.
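
To give a rough idea of what "serialized items" means here, the sketch below pictures a pipeline as plain JSON. It assumes a task is sent to RDFRules as a JSON array of objects with `name` and `parameters` keys; that wire format is an assumption, not something this page specifies, and the `operation` helper is hypothetical and not part of PyRDFRules. In practice the classes in pyrdfrules.rdfrules build this structure for you.

```python
import json


def operation(name, **parameters):
    """Build one pipeline step as a plain dict (hypothetical helper)."""
    return {"name": name, "parameters": parameters}


# A shortened pipeline: load one graph, merge, and fetch the rules.
task = [
    operation(
        "LoadGraph",
        graphName="<dbpedia>",
        path="/dbpedia_yago/mappingbased_objects_sample.ttl",
    ),
    operation("MergeDatasets"),
    operation("GetRules"),
]

# Serialize the steps in order; the order of the array is the order of execution.
payload = json.dumps(task, indent=2)
print(payload)
```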

Sample pipeline

Sample usage:

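import pyrdfrules.application
from pydantic_core import Url  # Assumption: Url is pydantic's URL type; the original sample omits this import.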
from pyrdfrules.common.task.task import Task
from pyrdfrules.config import Config
from pyrdfrules.rdfrules.commondata import ConfidenceType, Constraint, RuleConsumer, RuleConsumerType, Threshold
from pyrdfrules.rdfrules.jsonformats import PrefixFull
from pyrdfrules.rdfrules.pipeline import ComputeConfidence, GetRules, GraphAwareRules, Index, LoadGraph, MergeDatasets, AddPrefixes, Mine, Pipeline, SortRuleset

# Create an instance of the application.
app = pyrdfrules.application.Application()

# Connect to an existing instance of RDFRules.
rdfrules = app.start_remote(
    url = Url("http://example.com/api/"),
    config=Config(
        task_update_interval_ms=1000
    )
)

# Create a pipeline, a sequence of steps to be executed.
# You do not have to use fully qualified names for the classes, as they are imported in the example.
pipeline = Pipeline(
    tasks=[
        LoadGraph(
            graphName = "<dbpedia>",
            path = "/dbpedia_yago/mappingbased_objects_sample.ttl"
        ),
        LoadGraph(
            graphName = "<yago>",
            path = "/dbpedia_yago/yagoFacts.tsv",
            settings = "tsvParsedUris"
        ),
        LoadGraph(
            graphName = "<dbpedia>",
            path = "/dbpedia_yago/yagoDBpediaInstances.tsv",
            settings = "tsvParsedUris"
        ),
        MergeDatasets(),
        AddPrefixes(
            prefixes=[
                PrefixFull(prefix="dbo", nameSpace="http://dbpedia.org/ontology/"),
                PrefixFull(prefix="dbr", nameSpace="http://dbpedia.org/resource/")
            ]
        ),
        Index(train=[], test=[]),
        Mine(
            thresholds=[
                Threshold(name="MinHeadSize", value=100),
                Threshold(name="MaxRuleLength", value=3),
                Threshold(name="Timeout", value=5),
                Threshold(name="MinHeadCoverage", value=0.01),
            ],
            ruleConsumers=[
                RuleConsumer(
                    name=RuleConsumerType.TOP_K,
                    k=1000,
                    allowOverflow=False
                )
            ],
            patterns=[],
            constraints=[
                Constraint(name="WithoutConstants")
            ],
            parallelism=0
        ),
        ComputeConfidence(confidenceType=ConfidenceType.PCA_CONFIDENCE, min=0.5, topk=50),
        SortRuleset(by=[]),
        GraphAwareRules(),
        GetRules()
    ]
)

# Submit the pipeline to the RDFRules engine.
# The returned task represents the execution of the pipeline.
task: Task = rdfrules.task.create_task(pipeline)

# Run the task step by step.
for step in rdfrules.task.run_task(task):
    print(step)
    # You can access the result of the task using the task object, read the logs, or interrupt the task here.

# Access the result of the task.
print(task.result)

# Access the rules from the result.
for rule in task.result.get_ruleset().get_rules():
    print(rule.as_text()) # Print the rule in text format.
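
The thresholds and confidence settings in the sample (MinHeadSize, MinHeadCoverage, PCA_CONFIDENCE with min=0.5) refer to rule-quality measures from the AMIE family of algorithms. As a rough illustration of what they measure, the sketch below computes them by hand for a single rule on a toy graph. The triples, the rule livesIn(x, y) => bornIn(x, y), and all helper names are made up for this example, and the code follows the textbook AMIE definitions rather than RDFRules' actual implementation.

```python
# Toy knowledge graph as (subject, predicate, object) triples (made-up data).
triples = {
    ("alice", "livesIn", "paris"),
    ("bob", "livesIn", "rome"),
    ("carol", "livesIn", "oslo"),
    ("dave", "livesIn", "rome"),
    ("alice", "bornIn", "paris"),
    ("bob", "bornIn", "milan"),
    ("erin", "bornIn", "oslo"),
}


def pairs(predicate):
    """Return all (subject, object) pairs stated for a predicate."""
    return {(s, o) for s, p, o in triples if p == predicate}


# Rule under test: livesIn(x, y) => bornIn(x, y)
body = pairs("livesIn")
head = pairs("bornIn")

# Support: pairs for which both the body and the head hold.
support = len(body & head)

# Head size: number of facts with the head predicate (cf. MinHeadSize).
head_size = len(head)

# Head coverage: share of head facts the rule explains (cf. MinHeadCoverage).
head_coverage = support / head_size

# Standard confidence: every body match without a head fact counts as wrong.
std_confidence = support / len(body)

# PCA confidence: a body match counts as wrong only if the subject
# already has some known value for the head predicate.
subjects_with_head = {s for s, _ in head}
pca_body = {(s, o) for s, o in body if s in subjects_with_head}
pca_confidence = support / len(pca_body)

print(support, head_size, head_coverage, std_confidence, pca_confidence)
```

On this toy data the rule reaches a PCA confidence of 0.5 but a standard confidence of only 0.25, so it would pass a `min=0.5` filter only under PCA confidence. The difference comes from carol and dave, whose birthplaces are unknown and are therefore not counted as counterexamples.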

SPDX-FileCopyrightText: 2023-present Karel Douda <kareldouda1@gmail.com>
SPDX-License-Identifier: MIT