Extraction

Introduction

The simplest way to "teach" AWMT is to use the model composition API directly, inserting knowledge that is already structured and reliable. However, the system can also learn from unstructured data, such as text or images.

Extraction updates one or more existing abstractions with new information provided in an unstructured format.

Human language is ambiguous and often incomplete, so the system can ask for clarification or confirmation when needed.

Extraction can also be seen as the ability to "learn new knowledge" about one specific object. The ability to generalize that knowledge to all sibling objects is induction.

Extracting from text

request.gql
mutation {
	create_model(label: "Football Club") {
		man_u: instantiate(label: "Manchester United") {
			model {
				label
			}
		}
		chelsea: instantiate(label: "Chelsea") {
			model {
				label
			}
		}
		arsenal: instantiate(label: "Arsenal") {
			model {
				label
			}
		}
		man_city: instantiate(label: "Manchester City") {
			model {
				label
			}
		}
	}
}
 
query {
	extract(text: "erling haaland plays for manchester") @stream {
		status
		kind
		label
		start {
			path
			label
		}
		end {
			path
			label
		}
		ambiguity {
			path
			options {
				path
				label
			}
		}
	}
}

response.json
{
	"data": {
		"extract": [
			{
				"status": "FOUND",
				"kind": "HAS_PROPERTY",
				"label": "plays for",
				"start": {
					"path": "football_player",
					"label": "Football player"
				},
				"end": {
					"path": "football_club",
					"label": "Football club"
				}
			},
			{
				"status": "CREATED",
				"kind": "INSTANCE_OF",
				"start": {
					"path": "erling_haaland",
					"label": "Erling Haaland"
				},
				"end": {
					"path": "football_player",
					"label": "Football player"
				}
			},
			{
				"status": "AMBIGUOUS",
				"kind": "REFERENCE",
				"ambiguity": {
					"path": "erling_haaland:plays_for",
					"options": [
						{
							"path": "manchester_united",
							"label": "Manchester United"
						},
						{
							"path": "manchester_city",
							"label": "Manchester City"
						}
					]
				}
			}
		]
	}
}

In this example, the information extracted from the text has been matched against the existing models.

The system knows what a football player is, what a football club is, that a football player may play for a club, and it knows a few clubs by their names.

It was able to extract that there is a new football player called Erling Haaland and that he plays for a club, but it could not determine which one, because several known clubs match the text. It therefore asks for clarification between Manchester United and Manchester City, leaving it to the user to "force" the disambiguation.
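
On the client side, the ambiguous results can be picked out of the response so that the user is prompted to choose. The following is a minimal Python sketch, assuming the response has the shape shown above; the helper name `pending_ambiguities` is hypothetical and not part of AWMT, and the embedded response is a trimmed copy keeping only the fields the sketch reads.

```python
import json

# Trimmed copy of the response shown above (only the fields used here).
RESPONSE = json.loads("""
{
  "data": {
    "extract": [
      {"status": "FOUND", "kind": "HAS_PROPERTY", "label": "plays for"},
      {"status": "CREATED", "kind": "INSTANCE_OF"},
      {
        "status": "AMBIGUOUS",
        "kind": "REFERENCE",
        "ambiguity": {
          "path": "erling_haaland:plays_for",
          "options": [
            {"path": "manchester_united", "label": "Manchester United"},
            {"path": "manchester_city", "label": "Manchester City"}
          ]
        }
      }
    ]
  }
}
""")


def pending_ambiguities(response):
    """Return (path, [option labels]) for every AMBIGUOUS result.

    Hypothetical helper: assumes the response shape shown in this section.
    """
    pending = []
    for item in response["data"]["extract"]:
        if item.get("status") != "AMBIGUOUS":
            continue  # FOUND and CREATED results need no user input
        ambiguity = item["ambiguity"]
        labels = [option["label"] for option in ambiguity["options"]]
        pending.append((ambiguity["path"], labels))
    return pending


for path, labels in pending_ambiguities(RESPONSE):
    print(f"{path}: choose between {', '.join(labels)}")
```

Results with status `FOUND` or `CREATED` are skipped because they need no user input; only the `erling_haaland:plays_for` reference is reported as pending, along with its two candidate clubs.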

Extracting from images