Comment by westurner 4 days ago

Property graphs don't specify schema.

Is it Shape.color or Shape.couleur, feet or meters?

RDF has URIs for predicates (attributes). RDFS specifies :Classes with :Properties, both identified by URIs.
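
For example, a minimal sketch with rdflib (assuming a recent rdflib; the ex: vocabulary URIs here are made up):

```python
# Declare a Shape class and a color property with URIs, so "color" is
# unambiguous across applications.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/schema#")  # hypothetical vocabulary

g = Graph()
g.bind("ex", EX)

# Schema: ex:Shape is a class; ex:color is a property whose domain is ex:Shape
g.add((EX.Shape, RDF.type, RDFS.Class))
g.add((EX.color, RDF.type, RDF.Property))
g.add((EX.color, RDFS.domain, EX.Shape))
g.add((EX.color, RDFS.label, Literal("color")))

# Data: the predicate is a URI, not a guessed column name
g.add((EX.square1, RDF.type, EX.Shape))
g.add((EX.square1, EX.color, Literal("red")))

print(g.serialize(format="turtle"))
```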

E.g. Wikidata has a schema and forms with validation. DBpedia is Wikipedia infoboxes regularly extracted to RDF.

Google acquired Metaweb (Freebase) years ago, launched its Knowledge Graph product, and these days supports Structured Data search cards in microdata, RDFa, and JSON-LD.
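
A Structured Data card is just a small JSON-LD document embedded in the page; a sketch with made-up values using schema.org terms:

```python
import json

# Hypothetical JSON-LD for a structured-data search card; this is the payload
# that would sit inside a <script type="application/ld+json"> element.
card = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Property graphs and schema",
    "author": {"@type": "Person", "name": "Example Author"},
    "datePublished": "2024-01-01",
}
print(json.dumps(card, indent=2))
```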

[LLM] NN topology is sort of a schema.

Linked Data standards for data validation include RDFS and SHACL. JSON Schema is far more widely implemented.
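
A sketch of SHACL validation with pyshacl (assuming rdflib and pyshacl are installed; the ex: shapes and data are made up):

```python
from rdflib import Graph
from pyshacl import validate

shapes_ttl = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:ShapeShape a sh:NodeShape ;
    sh:targetClass ex:Shape ;
    sh:property [
        sh:path ex:color ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
    ] .
"""

data_ttl = """
@prefix ex: <http://example.org/schema#> .
ex:square1 a ex:Shape ;
    ex:couleur "red" .  # wrong predicate name: fails sh:minCount on ex:color
"""

shapes = Graph().parse(data=shapes_ttl, format="turtle")
data = Graph().parse(data=data_ttl, format="turtle")

conforms, report_graph, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # False
print(report_text)
```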

RDFa is "RDF in HTML attributes".

How much more schema does the application need beyond [WikiWord] auto-linkified edges? What about typed edges with attributes other than href and anchor text?
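
For comparison, a sketch of typed edges with attributes using networkx (the node names, edge type, and attributes are made up):

```python
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge(
    "PageA", "PageB",
    key="cites",                       # edge type
    href="https://example.org/PageB",  # hypothetical URL
    anchor_text="see PageB",
    created="2024-01-01",
    weight=0.8,
)

for u, v, k, attrs in g.edges(keys=True, data=True):
    print(u, v, k, attrs)
```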

AtomSpace is an in-memory hypergraph with schema to support graph rewriting specifically for reasoning and inference.

There are ORMs for graph databases. Just like with SQL, how much of the query and report can be done by the server without processing every SELECTed row?

Query languages for graphs: SQL, SPARQL, SPARQL-star (SPARQL*), GraphQL, Cypher, Gremlin.
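
E.g. a SPARQL query over the made-up ex: data via rdflib:

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/schema#")
g = Graph()
g.add((EX.square1, RDF.type, EX.Shape))
g.add((EX.square1, EX.color, Literal("red")))

q = """
PREFIX ex: <http://example.org/schema#>
SELECT ?shape ?color
WHERE { ?shape a ex:Shape ; ex:color ?color . }
"""
for shape, color in g.query(q):
    print(shape, color)
```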

Object-attribute level permissions are for the application to implement and enforce. Per-cell keys and visibility are native db features of e.g. Accumulo, but to implement the same with e.g. Postgres, every application that is a database client is on scout's honor to also enforce object-attribute access control lists.
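
A hypothetical sketch of what that "scout's honor" enforcement looks like in application code (the object types, attributes, and roles are made up):

```python
# Every database client has to remember to call this; nothing in the
# database itself enforces it, unlike Accumulo's per-cell visibility labels.
ATTRIBUTE_ACL = {
    ("Shape", "color"): {"analyst", "admin"},  # roles allowed to read the cell
    ("Shape", "owner"): {"admin"},
}

def filter_attributes(obj_type: str, row: dict, user_roles: set) -> dict:
    """Drop attributes the user's roles are not cleared to see."""
    return {
        attr: value
        for attr, value in row.items()
        if user_roles & ATTRIBUTE_ACL.get((obj_type, attr), set())
    }

row = {"color": "red", "owner": "alice"}
print(filter_attributes("Shape", row, {"analyst"}))  # {'color': 'red'}
```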

And then identity: which user, with which (sovereign or granted) cryptographic key, can add dated named graphs that mutate which data in the database?

So, property graphs eventually need schema and data validation.

markmap.js.org is a simple app to visualize a markdown document with headings and/or list items as a mindmap; but unlike Freemind, there's no way to add edges that make the tree a cyclic graph.

Cyclic graphs require different traversal algorithms. For example, a recursive traversal in Python will raise RecursionError when it encounters a graph cycle without a visited-node set, and a stack-based traversal of a cyclic graph will not halt either without e.g. a visited-node set to detect cycles; though a valid graph path may contain cycles (and there is feedback in so many general systems).
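
A minimal iterative depth-first traversal in Python that halts on cyclic graphs by keeping a visited set:

```python
def dfs(graph: dict, start):
    """Stack-based DFS over an adjacency-dict graph; safe on cycles."""
    visited, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already seen: a cycle (or diamond), so skip it
        visited.add(node)
        order.append(node)
        stack.extend(graph.get(node, []))
    return order

cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}  # a -> b -> c -> a
print(dfs(cyclic, "a"))  # ['a', 'b', 'c']
```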

YAML-LD is JSON-LD in YAML.
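
A quick sketch (assuming PyYAML is installed): the same document written as YAML-LD parses to the same structure as its JSON-LD form:

```python
import json
import yaml

yaml_ld = """
"@context": https://schema.org
"@type": Person
name: Example Person
"""

doc = yaml.safe_load(yaml_ld)
print(json.dumps(doc, sort_keys=True))
# {"@context": "https://schema.org", "@type": "Person", "name": "Example Person"}
```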

JSON-LD as templated output is easier than writing a (relatively slow) native RDF application and re-solving what SQL ORM web frameworks already solve.
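
A hypothetical sketch of that templated approach: map an ORM-style row onto JSON-LD in the view layer (the URIs and field names are made up):

```python
import json

def shape_to_jsonld(row: dict) -> dict:
    """Serialize a relational row as JSON-LD, with a @context naming the columns."""
    return {
        "@context": {"color": "http://example.org/schema#color"},
        "@id": f"http://example.org/shapes/{row['id']}",
        "@type": "http://example.org/schema#Shape",
        "color": row["color"],
    }

print(json.dumps(shape_to_jsonld({"id": 1, "color": "red"}), indent=2))
```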

There are specs for cryptographically signing RDF such that the signature matches regardless of the graph representation.
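
A rough sketch of the idea with rdflib (a recent rdflib is assumed; the actual W3C RDF Dataset Canonicalization spec is more involved, especially around blank nodes):

```python
import hashlib
from rdflib import Graph
from rdflib.compare import to_canonical_graph

def graph_digest(g: Graph) -> str:
    """Hash a canonicalized, sorted N-Triples rendering of the graph."""
    canonical = to_canonical_graph(g)
    lines = sorted(
        line for line in canonical.serialize(format="nt").splitlines() if line
    )
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

turtle = '@prefix ex: <http://example.org/> . ex:a ex:b "c" .'
ntriples = '<http://example.org/a> <http://example.org/b> "c" .'

g1 = Graph().parse(data=turtle, format="turtle")
g2 = Graph().parse(data=ntriples, format="nt")
print(graph_digest(g1) == graph_digest(g2))  # True: same triples, different syntax
```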

westurner 4 days ago

There are processes and business processes around knowledge graphs like there are for any other dataset.

OTOH: ETL; data validation; publishing and hosting of the dataset and/or servicing arbitrary queries and/or cost-estimable parametric [windowed] reports; recall and retraction traceability.

DVC.org and the UC BIDS Computational Inference notebook book probably have a better enumeration of processes for data quality in data science.

...

With RDF - though it's a question of database approach and not data representation - should an application create a named graph per database transaction changeset, or should all of that data provenance metadata be relegated to a database journal that can't be read from or written to by the app?

How much transaction authentication metadata should an app be trusted to write?
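
A sketch with rdflib (assumed installed) of one named graph per transaction changeset, with PROV-style provenance about the changeset; the URIs and the who/when values are made up:

```python
from rdflib import Dataset, Literal, Namespace, URIRef, RDF, XSD
from rdflib.namespace import PROV

EX = Namespace("http://example.org/schema#")

ds = Dataset()
changeset = URIRef("urn:changeset:2024-01-01T00:00:00Z:alice")

# The changed triples go into their own named graph...
g = ds.graph(changeset)
g.add((EX.square1, EX.color, Literal("red")))

# ...and the default graph records who made the change and when.
ds.add((changeset, RDF.type, PROV.Activity))
ds.add((changeset, PROV.wasAssociatedWith, URIRef("did:example:alice")))
ds.add((changeset, PROV.endedAtTime,
        Literal("2024-01-01T00:00:00Z", datatype=XSD.dateTime)))

print(ds.serialize(format="nquads"))
```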

A typical SQL webapp has one database user which can read or write to any column of any table.

Blockchains and e.g. Accumulo require each user to "connect to" the database with a unique key.

It is far harder for users to impersonate other users in database systems that require a cryptographic key per user than it is to just write in a different username and date using the one db cred granted to all application instances.

W3C DIDs are cryptographic identifiers (as RDF with schema) that can be generated by users locally or generated centrally; similar to e.g. Bitcoin account addresses (double hashes of public keys).

Users can cryptographically sign JSON-LD, YAML-LD, RDFa, and any other RDF format with W3C DIDs; in order to assure data integrity.
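
A sketch of just the signing primitive with the cryptography package (assumed installed); real W3C Data Integrity proofs canonicalize the RDF and reference a DID verification method rather than naively sorting JSON keys:

```python
import json
from cryptography.hazmat.primitives.asymmetric import ed25519

doc = {"@context": "https://schema.org", "@type": "Person", "name": "Example"}
# Naive canonicalization stand-in: sorted keys, no extra whitespace.
payload = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")

private_key = ed25519.Ed25519PrivateKey.generate()
signature = private_key.sign(payload)

# verify() raises InvalidSignature if the document or signature was altered.
private_key.public_key().verify(signature, payload)
print("signature verified")
```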

How do data integrity and data provenance affect the costs, utility, and risks of knowledge graphs?

Compared to GPG-signing git commits to markdown+YAML-LD flat files in a git repo, and paying e.g. GitHub to enforce CODEOWNERS permissions on files and directories in the repo by preventing unsigned and unauthorized commits, what are the risks of trusting all of the data from all of the users that could ever write to a knowledge graph?

Which initial graph schemas support inference and reasoning; graph rewriting?