Barry Robson

Corresponding author: Barry Robson barry.robson@quantalsemantics.com

**Author Affiliations :**

Quantal Semantics Inc, North Carolina, US; St. Matthew's University School of Medicine, Grand Cayman; Department of Mathematics and Computer Science, University of Wisconsin-Stout, US; The Dirac Foundation, UK.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Many workers involved in drug discovery will have some early familiarity with the principles of quantum mechanics as applied in chemistry; certainly those involved in computational chemistry, and particularly molecular modeling, will. This familiarity will, however, be with the algebraic systems based on the imaginary number *i* that is required for wave mechanics and hence for the study of molecular properties. This does not exhaust the scope of quantum mechanics and, following an argument by Dirac, the larger picture, including more flavors of imaginary number, should be applicable to all aspects of human thought where numbers are involved. This includes probabilistic semantics and the construction of probabilistic networks that not only capture knowledge, but perform inference of a more general value to the pharmaceutical industry.

**Escalation of Pharmaceutical and Biomedical Data.** Large reservoirs of chemical or biomedical data, such as all US patents now accessible on the Internet, contain a huge amount of information for chemical business intelligence and drug discovery [1]. Large archives of structured biomedical data, such as patient records, contain a great deal of information about interrelations between clinical factors such as drug prescriptions and outcomes [2]. Collections of protein and nucleic acid sequences and structures have long been available and continue to grow for exploitation by the biotechnology and pharmaceutical industries [3]. Their impact on healthcare has been recently reviewed in historical and futurological context [4]. The ability to data mine such very large sources provides us with more reliable and objective probabilities associated with *rules* or *statements* about things in the world and the relationships between them (e.g. Refs. [5, 6, 7]).

**Purpose of this Report and Status of the Project.** The true value of the above resides not simply in sifting hundreds and thousands of rules arbitrarily in order to find interesting new ones, because the true importance of a single isolated rule lies in the context of other related rules, collectively providing a more complete weight of evidence. Rather, one wishes to use many rules collectively for automated inference and decision support. The present report describes the theoretical and algorithmic basis for such an inference and knowledge discovery approach, the detailed results of which we hope to report in further papers. The basic idea has already been applied in prototype applications to inference networks comprising probabilistic rules from large collections of medical data, including drug outcomes [2]. To be of value to drug discovery, this is now being extended by the generation of appropriate rules from 6.7 million proto-rules that link molecular formulae to patent text, derived by automatic reading of all US patents [1]. The size and complexity of these data sets, and of many of the rules derived from them, motivates on-going theoretical and algorithmic development, and so the ultimate value is yet to be demonstrated. Nonetheless, we have already shown [1] that we can evolve rules about what appear to be novel drug candidates by combining the 6.7 million proto-rules with screening *in silico*, meaning here the simulation of candidate drug binding to appropriate protein targets. What we also want our methodology to include are *probabilistically qualified statements* about the particular relationships of types of molecule to various protein targets, the relevant disease states, possible drug side effects, synthetic methods, shelf-life stability, and so forth. So far, pharmaceutical companies have for the most part used natural language, e.g. English, to express such knowledge and, acting as “rules” exchanged between humans, these have served fairly well.
However, as rules for use in automated reasoning they are at best qualitative, and not in an efficient canonical form. The basis of a *probabilistic semantics* is required that can capture rules from both data and human experts.

**Lack of Universal Best Practice in Inference.** Unlike the simplicity of examples conveyed in most standard statistics textbooks, many rules from many data sources can interact in a complicated way as a probabilistic-logical network to determine the final prediction to support decisions. Largely that is because classical textbook approaches focus on hypothesis tests to establish what is in effect a *single* rule such as “A associates with B”, “The value of A correlates negatively with the value of B”, or “The value of A cannot exceed x by chance”, or simply test that potential rules are implied somewhere in the data (e.g. the χ^{2} test). Data mining requires no prior hypotheses or hunches, and seeks to find all interesting rules subject to the computing power available, so raising the issue of how to combine them. Apparently tidy textbook treatments of probability and decision theory have fallen short of what is needed in practice, and in consequence the early innovative MYCIN Expert System approach [8] set the stage for a subsequent great diversity of approaches. This diversity raises the question of *best practice* [9]. A familiar approach that is theoretically well founded on the classical view is the Bayes Net [10], but this is traditionally confined to networks that are acyclic directed graphs, involving only AND logic as implied by the multiplication of conditional probabilities of general form P(A | B & C &…) = P(A & B & C &…) / P(B & C &…) as the rules. It is equivalent to a fully connected graph in which rules that would result in cyclic paths through the network are assigned probability 1, and hence need not be expressly included. This is consistent with information theory (I = −log P, 0 = −log 1) and the thesis of Popper [11], but it is an assumption that is bad in many instances, even when the available raw data could correct it.
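As a purely illustrative sketch of the Bayes Net reference case described above (not the software of this project), the following Python fragment multiplies conditional probabilities along an AND-only chain. The rule names and probability values are invented for the example; the key point, per the text, is that rules absent from the network behave as if present with probability 1 (I = −log P, and −log 1 = 0).

```python
# Illustrative sketch: a tiny Bayes-Net-style AND chain. Each stored
# rule carries a conditional probability P(A | B, C, ...); rules not
# stored behave as if present with probability 1.

def chain_probability(rules):
    """Multiply conditional probabilities along the net (AND logic)."""
    p = 1.0
    for conditional_p in rules.values():
        p *= conditional_p
    return p  # an empty net returns 1.0, the "null net" case

# Hypothetical rules, each of the form P(outcome | conditions):
rules = {
    ("relief", "drug_X"): 0.8,       # P(relief | drug_X)
    ("drug_X", "diagnosis_Y"): 0.5,  # P(drug_X | diagnosis_Y)
}
print(chain_probability(rules))  # 0.8 * 0.5 = 0.4
```

Note that this reproduces the acyclic, AND-only limitation discussed above: there is no relator, and no way to recover the reverse conditioning from the stored values alone.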
In consequence, non-traditional Bayes Nets that can have cyclic paths have been developed, but are seen as requiring iteration [12]. For a large network expressing knowledge, this is time consuming.

**The Semantic Web.** All this uncertainty about best approach now becomes pressing with the emerging worldwide semantic web (SW), where universal best practice for harnessing *probabilistic* statements in semantic format is recognized as desirable but still problematic [13]. A truly probabilistic SW, i.e. one that employs probabilistic semantics throughout with certainty simply as a limiting case of probability 1, would imply an enormous reservoir of rules or statements obtained by data mining or expert opinion that can be used in methods of inference. These rules either recognize uncertainty about a statement, or the fact that it is only observed as true in a fraction of observed cases; either way, they need to be associated with a probability. Importantly, compared with Bayes Nets as our reference case, they introduce the further feature of not simply a conditional probability relating, in the simplest instance, two states, events, observations, or measurements A and B, but also a relationship description or *relator*, with the linguistic force of a verb or preposition (or verb or prepositional phrase), relating them. Comprising three things, nouns or noun phrases A and B, and the relator, they are said to constitute *triples*. This is still a simplification compared with the more complex human sentence, but the approach below does provide a basis for a richer treatment to be described elsewhere.

**Limitations of Conditional Probabilities.** Only in simple categorical cases that are typically matters of documented history, such as P(“carbenoxolone is a synthetic derivative of glycyrrhetinic acid”), or of definition, can we interpret such statements as conditional probabilities with probability one. If such a statement were less certain, and without loss of generality, it can be expressed in conditional probability form P(“a synthetic derivative of glycyrrhetinic acid” | carbenoxolone). Here the very specific nature of the first argument A in P(A|B) alerts us that we would need a vast number of arguments A, B, C,… to represent knowledge that way. It also raises the question of how more profitably to write, evaluate, and use probabilistic triples for relators such as verbs of action that are *not* purely categorical, such as P(carbenoxolone | inhibits | 11Beta-hydroxysteroid-dehydrogenase). Unfortunately, the format P(A| relator |B) has no classical probability counterpart, except that we might break it down into its three components where P(A| is a row vector, the relator is a matrix, and P|B) is a column vector. This idea has a strong *relation* to the mathematical system and notation developed by Dirac [14] for quantum mechanics (QM), which is also a probabilistic inference system. In that notation one writes <$A| relator |$B>. It conveniently looks like an extension of XML to handle semantic relationships, but in fact, as discussed below, it is what Dirac’s <$A| relator |$B> *more precisely* implies algebraically that is interesting, and it shows that it is not simply P(A| relator P|B) that is wanted.
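The decomposition just suggested can be sketched numerically. In the following Python fragment (illustrative only, and not this project's software), the bra is a row vector, the relator is a matrix, and the ket is a column vector, so the whole bra-relator-ket evaluates to a scalar; the two-state basis and all numerical values are invented for the example.

```python
# Sketch of P(A| as a row vector, the relator as a matrix, and P|B)
# as a column vector, so that <A| relator |B> evaluates to a scalar.

def bra_relator_ket(bra, relator, ket):
    """Row vector x matrix x column vector -> scalar."""
    # matrix-vector product: the relator acting on the ket
    acted = [sum(relator[i][j] * ket[j] for j in range(len(ket)))
             for i in range(len(relator))]
    # inner product with the bra
    return sum(bra[i] * acted[i] for i in range(len(bra)))

# Hypothetical two-state example, basis (inhibited, not inhibited):
bra = [1.0, 0.0]          # <11Beta-hydroxysteroid-dehydrogenase|
inhibits = [[0.7, 0.0],   # a made-up "inhibits" relator matrix
            [0.3, 1.0]]
ket = [1.0, 0.0]          # |carbenoxolone>
print(bra_relator_ket(bra, inhibits, ket))  # 0.7
```

As the text notes, this vector-matrix picture supports the transpose but not yet the complex conjugate, which is why it is only a half-way house to the full Dirac form.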

**Appeal to Quantum Mechanics.** The QM approach is compelling for several reasons. One is that Dirac’s approach in particular provides many tools and ideas to draw from. It is widely used even in traditional (pre-Dirac) quantum mechanics, which is based on complex algebra using the imaginary number *i* (the square root of minus one). Importantly for present purposes, however, a further aspect of Dirac’s mathematics often called the *Clifford-Dirac calculus* is not confined to *i*-complex algebra. This extended algebra forms the basis of modern particle physics, both the so-called *standard model* and the spinor, twistor, and string theories developed from it [15]. A more compelling strategic reason for using QM in general, however, is that it is widely held by physicists to be the required *universal best practice* for representing knowledge, and inference from it, on all scales from the subatomic to the cosmological [15]. The Feynman path integral approach [15] may in particular be seen as performing inference on the particle physics analogue of a knowledge network. But no less compelling is the remarkable relation to semantics. More specifically, Dirac saw his treatment of QM as part of a general language that would certainly encompass probabilistic semantics: “*The methods of theoretical physics should be applicable to all those branches of thought in which the essential features are expressible with numbers*” (Dirac’s Nobel Prize Banquet speech, 1933; that he included human thought *per se* and its communication is clear because, rightly or wrongly, he excluded those with emotive content, poetry and economics). He did not, however, leave clear instruction on how this application was to be done, as if it could more-or-less be applied as-is.

**Semantic Significance of the Adjoint Operation.** At first, features of QM that will be familiar to many theoretical and computational chemists do look promising. Many of QM’s tools relate remarkably well to transformations and symmetries implied in linguistics, at least symbolically, because <$A| relator |$B> encodes rather more grammar information than even P(A| relator P|B) might suggest. One way of looking at this is through the fundamental QM algebraic operation known as taking the *adjoint* †, which relates to potential reversal in time and causality, or more generally conditionality [15]. Potentially all algebraic symbols *s* in an expression *e* are subject to *e*^{†} as the action of the adjoint on the expression, i.e. *s*^{†} ≠ *s*. Even bracket duals, like ( and ) but which are not reflectively symmetrical, notably | and >, can change. However, whether and how any *s* is changed by it depends on its susceptibility to two algebraic operations known as taking the *complex conjugate* * (changing the sign of the imaginary part, if any), and taking the *transpose* ^{T} (interchanging rows and columns, if any) of the algebraic entity that the symbol implies. This is because *s*^{†} = *s*^{*T} = *s*^{T*}.

The syntax of most, and certainly Indo-European, languages is well constructed to reflect a subset of choices of symmetries related to these, through word order and active-passive tenses of verbs, and with highly inflected languages like Latin even more so, but the underlying meanings are general to semantics because the knowledge in a network is a *directed* graph. In the semantic approach based on triples, one may imagine the relator as a label associated with the directed arcs (edges) of the graph, envisaged as an arrow → between two nouns or noun phrases as labels of the nodes (vertices), e.g. A and B. Here A → B is equivalent to B ← A, with ← linguistically seen as the active-passive inversion of → as a verb, e.g. between *chase* and *chased by*, and A ← B is equivalent to B → A, again an active-passive inversion, but both are distinct in meaning from A → B. The fact that <$A| relator |$B> is *some kind of scalar complex value* (on occasion it can be purely real) provides one sufficient criterion for making it susceptible to the action of taking the adjoint, and this is what allows the value of <$A| relator |$B> to encode P(A| relator P|B) *and* P(B| relator P|A). In contrast, classical probability P of any kind is always a *scalar real value*, and so not susceptible to taking the adjoint. We can attempt to talk about the adjoint transformation of a classical conditional probability P(A|B) to P(B|A) as a symbolic manipulation, but quantitatively P(A|B)^{†} = P(A|B), so there is no way to calculate the value of one from the other alone. P(A| relator P|B) does much better, but it is only a half way house to encoding the above-mentioned four relationships involving A, B, →, and ←, and getting them to behave in the required way. As a vector-matrix approach, P(A| relator P|B) is susceptible to the action of T, but not *. <$A| relator |$B> covers both, and in this sense, probabilistic semantics is much more related to QM than it is to classical probability theory.

**Difficulties of Pre-Dirac Quantum Mechanics for Semantics.** In the above, we are in effect asking to replace A, B, C, etc., as normally applied to fundamental particles and their properties, and molecules and their properties, by macroscopic everyday objects or properties of them. It is well known that Schrödinger noted that nothing in the algebra of QM appears to prohibit this, and it led to his famous thought experiment that a cat can be alive and dead at the same time, in a superposition of states. Notoriously, QM gives bizarre predictions such as superposition of states, and non-locality, on the range of scale that is everyday human experience. We here examine how this difficulty is overcome within a QM formalism, and how it may be utilized for probabilistic semantics to describe the everyday world in a quantitative way. It is arguable whether this *is* QM or simply some mathematics borrowed from it. The former may be argued, because standard basic QM calculations can also be performed with the same prototype software.

**General Description.** There is a blurring between the theory and methods of this approach, because to a significant extent theory is represented by what the user writes as a program to be compiled. The prototype system constructed is essentially a compiler and executor for QM expressions on an input file. A great deal of what the executing program finally does is defined by input, where many other approaches to inference might “hard code” the actions. The fixed and brief content of the compiler is compensated by the effort represented in input but this confers great flexibility of use, and those parts of input which are very general in nature can be retained from project to project. The term “user” is often employed below for the person who programs the input in this way, though the program’s actual end user will not usually be the programmer, because programming requires some expertise. Notably, the focus is on complex valued vectors and matrices in Dirac bra-ket notation, along with the operators that act on them. An example bra is <11Beta-hydroxysteroid-dehydrogenase| and an example bra-relator-ket is <11Beta-hydroxysteroid-dehydrogenase| binds |carbenoxolone>. Again, here *binds* is the *operator* in QM terms, or relator (predication) in semantic theory terms. Such entities are used to define a network for inference purposes, called the *Dirac Net*. As discussed below, and as for any algebra, such entities can contain variables such as $A, $B,.. However, their role is different, such that, for introductory purposes, we can think of the role of the input as assigning values to single bra-relator-kets that describe the probabilistic relations between nodes A and B. This is analogous to assembling a set of conditional probabilities P(A | B, C,…) to define a Bayes Net. 
Nonetheless, initial emphasis will be placed on definitions using expressions with variables like $A, $B, etc, because that is a considerable differentiator, and the relatively fixed part, whereas expressions lacking variables are essentially data that may frequently change.

**Program Flow.** In the present prototype, there is no program flow control (‘go to’, loops etc.) in the QM language; what comes earlier is regarded as potentially *definitional* of what comes later. Interacting entities must be defined, so there is no use of algebraic ( ) brackets, and *a = b*(*c+d*) with the parenthetic expression (*c+d*) would be rendered as *e = c + d* defining *e*, followed by *a = be*. That said, an operation can always be represented as a sequence of operators, *a b c d e*, and bras and kets can contain brakets and bra-relator-kets with ‘|’ and ‘>’, which have a similar effect to parenthetic expressions in ‘(’ and ‘)’. These are beyond present scope and will be discussed elsewhere. Unlike normal programming languages, but like expressions in classical logic, a form such as a braket or a bra-relator-ket can be to the left of an assignment. Such are stored as a unit representing a part of the “giant expression” that represents the Dirac Net, and in this case represent an OR “gate” whereas otherwise AND is implied.

**Association Variables.** Bras, kets, brakets, and bra-relator-kets (and ketbras of form |A><B|) are called *association variables*, as they are really stored by the compiler as association (or hash) arrays, the keys of which are string constants such as *ethanol* which variables match (see discussion later below).
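The storage idea can be sketched as follows. The prototype uses Perl hash arrays; this Python dict version is a stand-in for illustration only, with an invented triple and invented complex value, and with the convention (per the evaluation discussion below) that an absent statement behaves as probability 1.

```python
# Sketch of "association variables": each bra-relator-ket is keyed by
# its string constants, and its stored value is the complex quantity
# that the net later evaluates.

network_memory = {}

def store(bra, relator, ket, value):
    network_memory[(bra, relator, ket)] = value

def lookup(bra, relator, ket):
    # Statements absent from the net behave as probability 1.
    return network_memory.get((bra, relator, ket), complex(1.0, 0.0))

store("11Beta-hydroxysteroid-dehydrogenase", "binds", "carbenoxolone",
      complex(0.9, 0.8))
print(lookup("11Beta-hydroxysteroid-dehydrogenase", "binds",
             "carbenoxolone"))  # (0.9+0.8j)
```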

**Defining Basic Format.** For semantic application, there is no constraint requiring that input, and hence output, is in effect English, nor any other human language, except for convenience and standardization. To allow this, a few more-or-less fixed format-defining forms should appear early in input, albeit that the current implementation, as follows, is somewhat a matter of taste, and is not fundamental to the approach save to illustrate flexibility.

<$A|$A>=<$A|$A> #Define the symbol, here '=', for semantic equivalence.

<$A|$A>not<$B|$B> #Define the symbol, here 'not', for non-equivalence.

Note that, as throughout, there is one expression per line, almost always an assignment or symbolic representation of a class of assignments, and the use of # to indicate comment. The ‘=’ so defined is also subsequently an assignment of the value of the expression on its right to that on the left, which is usually a bra-relator-ket. To highlight that, one could choose ‘:=’ rather than ‘=’, called for more general reasons the *metadata operator*. As an example of power in the hands of the programmer, for better or worse, subsequent basic format definitions such as <$A:=$B | $C> = <$B | $A $C> could then define the algebra that the operator implies. Such approaches that are theoretically controversial, or are arbitrary although self-consistent, are left to the user as programmer. Basic format definitions and the kind of expressions that are now to be discussed are in many respects a research bench for the inference system developer.

**Binding Variables.** In practice the above kind of statements are not further used once the job of defining basic symbols is done, but others that define format, and a large number that do not, can have a role that persists, as follows.

<$A|$B>=<$A|$C>and<$C|$B> #Define logical AND, and a basic type of syllogism.

$A etc. are, as before, general symbolic variables. The difference from the basic format definers is that the variables are distributed in a way that will provide meaningful matches in expressions. They are here called *binding variables*, and the whole expression above is a *template*. Note that two binding variables of differing name such as $A and $B cannot normally stand for the same thing, such as the word *ethanol*. Expressions containing binding variables, such as <$A| bind |$B>, will match others and represent *program*, and those remaining (because they do not have any variables of this kind) represent *data*. Recall, however, that the whole set of data even by itself does imply an expression representing a static network that can be evaluated. “Program” and “data” lines may normally be mixed in order, for human readability. The above template example is, incidentally, an example of what is fixed inside the compiler. Once logical *and* is so defined as a “symbol”, whatever it may be (say French “et”), it can be omitted by default, and it will imply multiplication of the two brakets.
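The matching rules just stated can be made concrete with a small sketch (illustrative Python, not the prototype's Perl): distinct variables such as $A and $B must bind distinct constants, a repeated variable must bind the same constant each time, and literal parts must match exactly.

```python
# Sketch of binding-variable matching of a template against a data
# triple, following the matching rules described in the text.

def match(template, triple):
    """Return a bindings dict, or None if the triple does not match."""
    bindings = {}
    for pat, const in zip(template, triple):
        if pat.startswith("$"):
            if pat in bindings:
                if bindings[pat] != const:
                    return None   # same variable must bind same constant
            elif const in bindings.values():
                return None       # distinct variables must bind distinct constants
            else:
                bindings[pat] = const
        elif pat != const:
            return None           # literal parts must match exactly
    return bindings

print(match(("$A", "bind", "$B"), ("ethanol", "bind", "water")))
# {'$A': 'ethanol', '$B': 'water'}
print(match(("$A", "bind", "$B"), ("ethanol", "dissolves", "water")))
# None
```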

**Defining Relators as Algorithms.** In this report, operators as relators, such as verbs and verb phrases (or prepositional and other relationships), that cannot be defined in any basic logical way from what is defined so far can be defined directly either as algorithms or as matrices. In practice, both algorithms and matrices are defined within subroutines which are currently written in Perl 5, and which the user includes in the input file, though in emerging versions constant values of matrices can be applied to variables by an assignment statement. If a new operator is encountered which is not defined by the thread of previous definitions, the compiler searches all the following lines for the subroutine of the same name to define it. The following illustrates how an explicit expression with a new operator can be followed by its general definitional subroutine, here for brevity showing only the start of it.

<6|more than|3>

sub more_than #This is an example of a basic action defined by a subroutine.

{

This defines the evaluation <6|more than|3> which is in this case the scalar value 1 if true, and 0 if false. Functions such as *log* are similarly seen as operators, and are definable by the user in input. Expressions with operators that are not defined in this way may still have meaning by preceding definitions, and if not can still be manipulated by grammars represented in templates, and if not even that, they can still be used as entities having a value like a conditional probability in a Bayes Net. The following is the most common alternative to defining operators as algorithms.
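A Python analogue of this mechanism (the prototype uses Perl 5 subroutines whose names match the operator; this registry version is a hedged stand-in for illustration) can be sketched as:

```python
# Sketch: defining a relator as an algorithm that returns the scalar
# value 1 if true and 0 if false, looked up by the relator's name.

relator_algorithms = {}

def define(name):
    """Register a function under the relator name it implements."""
    def register(fn):
        relator_algorithms[name] = fn
        return fn
    return register

@define("more than")
def more_than(a, b):
    return 1 if a > b else 0

def evaluate(bra, relator, ket):
    # Analogue of the compiler finding the subroutine of the same name.
    return relator_algorithms[relator](bra, ket)

print(evaluate(6, "more than", 3))  # 1
print(evaluate(2, "more than", 3))  # 0
```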

**Defining Relators as Descendants of Defined Relators.** Extensions of definitions that are possible in terms of the ideas of adjoint, transpose, and complex conjugate need not necessarily be defined as algorithms and fall under the scope of the input:

<$A| equal to or less than|$B> = <$B|more than|$A>

This defines the converse such as active-passive inverse, and we may note the following examples. Again, defining relators first time by a subroutine is not essential for all purposes.

<$A| includes |$B> = <$A|$B> #conversion of brakets to categorical bra-relator-kets, an important “seed step”

<$A| include |$B> = <$A| includes |$B> #reduction to canonical form

<$A| be |$B> = <$B| include |$A> #definition of active-passive inversion, and choosing use of ‘be’ as canonical

<$A| be |$B> = <$A| is |$B> #reduction to canonical form

<$A| be |$B> = <$A| are |$B> #reduction to canonical form

<$A| $R |$B> = <$A | be | $B-$Rers> #An example generation of a non-categorical form, e.g.

<$A| pays |$B> = <$A| gives money to |$B>

Relators defined as above are stored with their definitions in a memory space called the *thesaurus*, for inspection of correct threading of multiple definitions dependent on each other. Those that have no such origins are described as *root*, which may mean that a subroutine of that name was used for the definition, or that one was not found. A root relator is still capable of active-passive inversion, negation etc., and association variables such as a bra-relator-ket containing such can still be assigned values. In the above we were not concerned with plurality of nouns and the corresponding verb forms, and so reduce to a categorical form based on the infinitive, but note for example

<$A| are |$B> = <$As | are |$Bs> #example of treatment of plural to canonical form

<$A| is | $B> = <$As | are |$Bs> #example of treatment of plurality conversion canonicalization

<a sheep| is | a $B> = <sheep| are | $Bs> #specific example of irregular noun

<$As| some are |$Bs> = <$As | are |$Bs><$As | are |$Bs>* #specification of one kind of evaluation of the existential case

Treatment of these cases is elaborate and depends on the extent to which the user wishes to go in treating semantics as linguistics, say as good English, and not least as a tool for exploring the underlying generative grammar for correct forms. It depends on whether we want to render all statements into standard form using, say, a canonical form of Ogden Basic English [16], or start from the basic forms and define more complex verbs, as in e.g. “pays” ← “gives money to”, or constantly explore both. One purpose here is to reconcile apparently different forms that are really semantically equivalent and hence mutually redundant. Another is to deduce one rule from two or more, or conversely to decompose one into simpler rules.
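The thesaurus idea described above can be sketched as a chain of definitions followed to a root (an assumed data structure for illustration, not the prototype's actual storage; the entries here mirror the examples in the text).

```python
# Sketch of the "thesaurus": relators defined in terms of other
# relators are reduced to a canonical root by following the chain
# of definitions; None marks a root relator.

thesaurus = {
    "includes": None,       # root (categorical seed)
    "include": "includes",  # reduction to canonical form
    "is": "be",             # reduction to canonical form
    "are": "be",            # reduction to canonical form
    "be": None,             # root, chosen as canonical
}

def canonical(relator):
    """Follow definitions until a root relator is reached."""
    while thesaurus.get(relator) is not None:
        relator = thesaurus[relator]
    return relator   # an unknown relator is returned as its own root

print(canonical("are"))      # be
print(canonical("include"))  # includes
```

An unknown relator simply comes back unchanged, matching the text's remark that a relator with no found definition is still treated as root.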

**Default Definitions and Manipulations.** In contrast to the above, there is a higher order of more fundamental symbolic representation (e.g. which applies to any relator as $R, without specifying it). The following examples may be noted.

<$A| = ($B, $C)

<$A| = |$A>*

|$A> = <$A|*

|$A> = $A|>

<$A| = <|$A*

|$A $B> = $A|$B>

<$A $B| = $A*<$B|

<$A|$B>* = <$B|$A>

<$A|$B> = <$B|$A>*

<$A| $R |$B> = <$B| $R |$A>* #Relators R are Hermitian

<$A| $R |$B> = <$B| $R* |$A> #Relators R are not necessarily trivially Hermitian

<$A| $R* |$B> = <$B| $R* |$A>* #Relators R are not necessarily trivially Hermitian

The difference is that covering all cases of interest for these symmetries would be extensive and computationally inefficient, so the essential features for semantic computation are default in the program, which is indeed concerned with matters of adjoint, complex conjugate, and transpose. The action of the above would be merely to redefine the notation as that which the user wishes to use. But also, while relators are usually *non-trivially Hermitian* as defined by the above examples, which is the default (relator = relator^{†} but relator ≠ relator*), we can have special cases or exceptions to specify.
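The non-trivially Hermitian default can be checked with ordinary complex matrices (an illustrative sketch with an invented 2x2 relator, not the prototype's code): R equals its own adjoint (conjugate transpose) without equalling its plain complex conjugate.

```python
# Sketch: a relator R that is Hermitian (R equals its adjoint) but is
# not equal to its complex conjugate alone, i.e. non-trivially so.

def adjoint(m):
    """Conjugate transpose of a matrix given as nested lists."""
    return [[m[j][i].conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]

def conj(m):
    """Elementwise complex conjugate only (no transpose)."""
    return [[x.conjugate() for x in row] for row in m]

R = [[complex(1, 0),    complex(0.2, 0.5)],
     [complex(0.2, -0.5), complex(2, 0)]]

print(adjoint(R) == R)  # True: R = R-dagger
print(conj(R) == R)     # False: R is not equal to R*
```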

<$A| is |green> =

<$A| marries |$B> = <$B| marries |$A>

**Active Variables.** Expressions such as *$molecule = chloramphenicol* are not templates, and the variable, distinguished by starting in lower case, is not a binding variable. Rather, once defined, such variables are active during the reading and interpretation of input, line by line. They can be seen as Perl variables. When { *executable Perl* } is encountered in an expression, or stands alone on a line, it may return a value, and may return a string that substitutes for the string ‘{ *executable Perl* }’. The returned value may be an empty string, in which case the executable Perl may do other things, such as re-compute what is stored in variable *$molecule*. Note that *$molecule =* { *executable Perl* } is permissible. The idea is evidently extensible to programming languages other than Perl.
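As the last sentence suggests, the mechanism is not tied to Perl. The following is a minimal Python stand-in (illustrative only): text of the form {expr} is replaced by the value of expr evaluated against the current variable environment, and an empty result simply deletes the braced text.

```python
# Sketch of active-variable interpolation: each {expr} in a line is
# replaced by str(eval(expr)) using the current environment.
import re

env = {"molecule": "chloramphenicol"}

def interpolate(line):
    """Replace each {expr} with its evaluated string value."""
    return re.sub(r"\{([^{}]*)\}",
                  lambda m: str(eval(m.group(1), {}, env)),
                  line)

print(interpolate("<patient| is given |{molecule}>"))
# <patient| is given |chloramphenicol>
```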

**Dirac Net Definitional Phase.** All input expressions are “definitions”; the word “definitional” here refers to those expressions that define the net with constant values, and so do not contain $A, $B, $R etc. As the first step, the compiler builds the Dirac Net from these; it represents a large expression in the Dirac algebra. It is important to understand that binding variables and their templates do not play a role in this first pass, and need not be present in input. Then there is only one pass and the net is said to be *static*, like a Bayes Net. Conversely, there “must” be at least one expression in input without binding variables, meaning that if there is not, the null net will return in the subsequent evaluation phase the scalar probability 1. When the expression to the left of the value assignment is a braket of form <A|B> or a bra-relator-ket of form <A| relator |B>, or an expression that implies such, it is stored in the *network memory*; those to the right are not stored. Contrast this with the template, for which the whole binding expression is stored in *template memory*. The job of the expression to the right of ‘=’ is to assign probability values to those stored in network memory, directly as constant values or indirectly through expressions with active variables. These values are stored alongside the entities in network memory, actually as the values of bra-relator-kets etc. as associative variables, thus forming the analogue of a Bayes Net. They are generally but not always algebraic-complex quantities (with real and imaginary parts), as discussed soon below.

**Dirac Net Evaluation Phase.** In the evaluation phase, the collective degree of truth of the Dirac Net as a knowledge network is evaluated as a complex number encoding two probabilities, called the forward probability Pfwd and the backward probability Pbwd, as discussed later below. It is sufficient for the moment to note that (a) given a “network” of only <A|B>, Pfwd = P(A|B) and Pbwd = P(B|A), and (b) given two or more such, the resulting Pfwd and Pbwd are the products of all the Pfwd and of all the Pbwd respectively. More correctly, this holds as long as logical *and* is applied throughout, i.e. multiplication is applied between brakets and bra-relator-kets as for conditional probabilities in a Bayes Net. Similarly, for the AND-only case, *order is immaterial*: the statements comprise a set. Statements about the world that are not included in that set have the same effect as if they were included with probability one, and note that including many irrelevant statements with values lower than one can only lower the overall probabilities. Implicit semantic triple forms <A | B and C and D>, or explicit triple forms with categorical relators such as <B and C and D | are | A>, are still said to be triples despite the joint multiple arguments B and C and D. They provide the counterpart of P(A | B and C and D) in a Bayes Net. In many cases, the implication of non-categorical relators may not have particularly insightful consequences; for example, we cannot compute from multiplying *and* the probability Pbwd that the etiology of a contamination is, say, a lake (see discussion below on bidirectionality). The set of statements so assembled constitutes a *relevancy set*. The interface allows one to write, save, edit, open and run multiple relevancy sets as text files. They can also be joined into one relevancy set. The Pfwd and Pbwd resulting from that will be the products of the Pfwd for the two sets and of the Pbwd for the two sets, though if further steps using binding are implied, that is not generally true, as follows.
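The AND-only evaluation just described can be sketched as follows (illustrative Python; the pairing of each statement with (Pfwd, Pbwd) and all values are assumptions for the example, not output of the prototype).

```python
# Sketch of the evaluation phase under AND-only logic: each statement
# carries a pair (Pfwd, Pbwd), and the net's overall pair is the
# elementwise product over the set; order of statements is immaterial.

def evaluate_net(statements):
    pfwd, pbwd = 1.0, 1.0
    for p_ab, p_ba in statements.values():
        pfwd *= p_ab   # product of all P(A|B)
        pbwd *= p_ba   # product of all P(B|A)
    return pfwd, pbwd

relevancy_set = {
    "<contamination| from |lake>": (0.9, 0.2),          # hypothetical
    "<illness| follows |contamination>": (0.5, 0.7),    # hypothetical
}
print(evaluate_net(relevancy_set))  # approximately (0.45, 0.14)
```

Joining two relevancy sets and evaluating multiplies their respective Pfwd and Pbwd, exactly as the text states for the binding-free case.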

**Dirac Net Evolution Phase with Binding Variables.** A more advanced aspect of flow is that the network is *dynamic*. In this phase the binding variables come into play, if present in input. The dynamic aspect arises in that templates may also be seen as *editing instructions* that can convert one or more data bra-relator-kets to one or more other data bra-relator-kets. A template <$A | are not | $B> = < non $B | are | $A> uses the logical law of the contrapositive to express the desire of the programmer to convert all forms such as < birds | are | non mammals> to a canonical form in which negation is carried by the verb. In a template such as <$A | are | $C> = <$A | are | $B> <$B | are | $C>, the *match part* comprises the bra-relator-kets <$A | are | $B> and <$B | are | $C> in the expression on the right hand side of the assignment. These component bra-relator-kets, as features of the “program”, hunt out those bra-relator-kets as “data” in the network that match them. They insert them into a copy of the expression on the right side of the template, and force evaluation of that expression, in this case simply a product of two bra-relator-kets. Note that there is only a match if different binding variables match different constant parts in the “data” bra-relator-ket, if the same binding variable matches the same constant part, and if all remaining constant parts of the bra-relator-ket match. The *edit part* of the template is the bra-relator-ket <$A | are | $C> on the left side of the assignment. From the relationship between the right hand side of the template and the bra-relator-kets matched, and the bra-relator-ket on the left side, the compiler deduces the specific form of <$A | are | $C> on the left of the template, i.e. with the binding variables replaced by constants. At present, only one bra-relator-ket can replace one or more; this shrinks the network. However, a mode may be applied that reverses the editing process and expands the network.
The order in which templates are applied to edit is arbitrary, and the resulting network and its value are often order independent, but not generally so, hence the following.
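As a rough illustration of the match-and-edit behavior described above, the following hypothetical sketch applies a syllogism-style template: the match part hunts out pairs of data triples, requires consistent bindings (same variable matches the same constant, different variables match different constants), multiplies their values, and replaces the pair with the single edited triple, shrinking the network. All names here are illustrative assumptions, not the actual implementation.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that pattern (with '$' variables) matches triple."""
    b = dict(binding)
    for p, c in zip(pattern, triple):
        if p.startswith("$"):
            if b.get(p, c) != c:
                return None  # same variable must match the same constant
            b[p] = c
        elif p != c:
            return None      # remaining constant parts must match exactly
    return b


def apply_template(edit, match_parts, network):
    """One template application, e.g. <$A|are|$C> = <$A|are|$B> <$B|are|$C>."""
    first, second = match_parts
    for t1, (f1, b1) in list(network.items()):
        m = match(first, t1, {})
        if m is None:
            continue
        for t2, (f2, b2) in list(network.items()):
            if t2 == t1:
                continue
            m2 = match(second, t2, m)
            if m2 is None or len(set(m2.values())) != len(m2):
                continue  # different variables must match different constants
            new = tuple(m2.get(p, p) for p in edit)
            del network[t1], network[t2]       # shrink the network...
            network[new] = (f1 * f2, b1 * b2)  # ...inserting the edited triple
            return network
    return network


net = {("cats", "are", "mammals"): (0.01, 1.0),
       ("mammals", "are", "vertebrates"): (0.1, 1.0)}
apply_template(("$A", "are", "$C"),
               [("$A", "are", "$B"), ("$B", "are", "$C")], net)
```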

**Dirac Net Optimization Phase.** The order of template application can be randomized, and for each random choice the process of net evolution and net evaluation is applied in a step called *local optimization*. This process, repeated many times in the hunt for what is optimal, attempts *global optimization*, and may involve various heuristic algorithms to direct the search in addition to randomizing the order in which templates are applied. For example, recall that a confidence in the resulting probabilistic rule or statement can be attached to a template; templates with more confidence can be applied earlier. These issues otherwise lie beyond present scope, are under ongoing development, and will be described elsewhere, but some interesting general or typical findings are discussed in Results. The overall process is halted when no better optimum is found after a specified number of iterations, which has the appearance of convergence of the evaluation of the network when plotted.
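The optimization phase as described, randomizing the order of template application, performing local optimization, and halting when no better optimum appears within a set number of iterations, might be sketched as follows (a hypothetical outline; the real system also uses heuristics such as applying higher-confidence templates earlier):

```python
import random


def global_optimize(templates, build_and_evaluate, trials=2000, patience=30, seed=0):
    """Hypothetical sketch of the optimization phase.

    Repeatedly shuffles the order of template application (each shuffle plus
    re-evaluation being one local optimization), keeps the best-scoring order,
    and halts when no better optimum appears within `patience` trials."""
    rng = random.Random(seed)
    order = list(templates)
    best_order, best_score, stale = order[:], build_and_evaluate(order), 0
    for _ in range(trials):
        rng.shuffle(order)
        score = build_and_evaluate(order)
        if score > best_score:
            best_order, best_score, stale = order[:], score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # apparent convergence: no improvement for a while
    return best_order, best_score


# toy objective: pretend an ordering starting with "syllogism" is optimal
templates = ["contrapositive", "syllogism", "negation"]
score = lambda order: 1.0 if order[0] == "syllogism" else 0.5
best_order, best_score = global_optimize(templates, score, patience=200)
```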

**Reconciliation.** The above has omitted an important issue, except by brief reference to statements which look different but are semantically equivalent. Network evolution as editing of the network can generate rules for the network that can be detected as having similar semantic content to rules already present. This is facilitated by reducing all statements to a canonical form, say with the verb “to be” and negatives reflected in negation of the verb, but that is a matter of the thread of definitions presented in input. Reconciliation of detected similar forms is by an algorithm that combines the duplicate rules, and their probabilities, into one.

It can be shown to be order independent and not to artificially increase the information content of the system. Actually, even in a static net, this is applied, and is part of every evaluation step. The reason is that different experts could enter the same rule twice or more with the same or different probabilities. We almost always run the Net with an evolution phase as local optimization, because recursive use of suitable templates can then detect that apparently distinct rules are really semantically equivalent when the relationship is not obvious. This generates the canonical forms, defined by the templates in input. The above reconciliation algorithm is not arbitrary and has a deeper significance. It is a kind of hard-wired template rule stating that the probabilities associated with two rules reconciled as one are computed as a *randomly associated OR*; that is, the rules are independent, but can be distinguished as statements about the world that can recur, such that they are countable. The reconciliation mechanism applied repeatedly to remove the same or semantically equivalent rule is actually a counting process, and repeated application in any order implies the binomial theorem and binomial expansion. In that sense, data mining can be done within the system, and is not distinct from the inference process. If we really wish to count in the Bernoulli sampling sense, it is recognized that seeing one such specific relation out of an as yet unknown and potentially large number implies small probabilities associated with duplicate rules. A small arbitrary and constant probability value is assigned and later normalized, as will be discussed elsewhere. However, assignment of probabilities to rules by a preceding separate step of data mining, or by human experts, or both, are the norm and have meaning, as follows.
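Assuming the *randomly associated OR* named above is the standard independent-events combination P = P1 + P2 − P1·P2 (an interpretation on our part, not a formula quoted from the text), reconciliation of duplicate rules can be sketched as:

```python
def reconcile(p1, p2):
    """Randomly-associated OR of two duplicate rules' probabilities.

    Assumed combination rule for independent recurrences: P = P1 + P2 - P1*P2,
    equivalently 1 - (1 - P1)(1 - P2)."""
    return p1 + p2 - p1 * p2


def reconcile_all(ps):
    """Repeated pairwise reconciliation; the result is order independent."""
    total = 0.0
    for p in ps:
        total = reconcile(total, p)
    return total


print(reconcile(0.5, 0.5))  # 0.75
```

Because the combination is commutative and associative, applying it repeatedly in any order gives the same result, consistent with the order independence claimed in the text.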

**Empirical Assignment and Interpretation of Probabilities.** The probability assignment statement for the bra-relator-ket <A | R | B> is the most important, and is also the algebraic way of expressing what the content of any such form is, in probability terms.

<A | *R* | B> = (*Pfwd*, *Pbwd*)   (1)

For example,

<overeating | causes | obesity> = (0.9, 0.7)

The brackets may be omitted for such a scalar quantity, and we may also use 90%, 70%. In this case it implies simply that Pfwd = P(overeating | obesity) = 0.9 and Pbwd = P(obesity | overeating) = 0.7.

We can do much with a system based on this idea alone, but alone it says nothing about many useful things. These include how we involve P(overeating) and P(obesity), operators other than AND, the role of the relator as a matrix and the consequence of relators acting on relators, the role and significance of mutual information, and the emergent properties of networks. We certainly could not show consistency with QM by doing *typical* QM calculations with this idea alone. To expand on and exploit the relationship to QM, we need to see how QM relates. QM usually calculates probabilities *ab initio* from the physics of the system of interest, while we want to derive bras and kets like those above with empirical probabilities data mined from the everyday world of human experience, or from human expertise. So, along with one other important modification relating to interpretation of the complex number, these probabilities somehow replace the normalized statistical weights of QM, as follows.

(*Pfwd*, *Pbwd*) = ½[Pfwd + Pbwd + *j*(Pbwd − Pfwd)] = ½(1 − *j*)Pfwd + ½(1 + *j*)Pbwd   (2)

Note that (*Pfwd*, *Pbwd*) = (*Pbwd*, *Pfwd*)† = (*Pbwd*, *Pfwd*)*. This equation is the usual form used in QM based on the *commutator* to obtain the required symmetry properties such as <A|B> = <B|A>* [15]. Here, for the moment, *j* is some kind of imaginary number with adjoint *j*† = *j**. Above and in the following account, any (Pfwd, Pbwd) can be replaced by a scalar real value. Inspection of Eqn. 2 shows this to be mathematically correct: if Pfwd = Pbwd, as in (0.6, 0.6), then the result of Eqn. 2 is a scalar real value, here 0.6.
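Eqn. 2 and the conjugation rule can be modelled directly. The sketch below (a hypothetical class, not part of the described system) stores an h-complex value as real and hyperbolic parts, multiplies using h·h = +1, and recovers (Pfwd, Pbwd); multiplication then turns out to be component-wise in the two probabilities, as the decomposition into ½(1 − h) and ½(1 + h) parts implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HComplex:
    """h-complex value (Pfwd, Pbwd), stored per Eqn. 2 as x + h*y with
    x = (Pfwd + Pbwd)/2 and y = (Pbwd - Pfwd)/2, where h*h = +1."""
    x: float  # real part
    y: float  # hyperbolic (h) part

    @staticmethod
    def from_probs(pfwd, pbwd):
        return HComplex((pfwd + pbwd) / 2, (pbwd - pfwd) / 2)

    def probs(self):
        return self.x - self.y, self.x + self.y  # (Pfwd, Pbwd)

    def conj(self):
        return HComplex(self.x, -self.y)  # conjugation swaps Pfwd and Pbwd

    def __mul__(self, other):
        # (x1 + h y1)(x2 + h y2) = (x1 x2 + y1 y2) + h (x1 y2 + y1 x2)
        return HComplex(self.x * other.x + self.y * other.y,
                        self.x * other.y + self.y * other.x)
```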

**Semantic Interpretation of Real and Imaginary Parts.** There is a semantic significance to this by the categorical interpretation of conditional probabilities, extended to bra-relator-kets later below. The real part ½(Pfwd + Pbwd) of Eqn. 2 is the *degree of existential qualification*, the extent to which we can interpret <A|B> as “some A are B” ≡ “some B are A”. The imaginary part and commutator ½(Pbwd − Pfwd) is the degree of universal qualification, being −1 for the strongest case of “all A are B” and +1 for the strongest case of “all B are A”. These strongest values are not always achieved on a numerical interpretation of the categorical case. Whilst by definition P(“enzymes are catalysts”) = 1, P(“catalysts are enzymes”) can be considered as the fraction of individual cases observed as catalysts that are more precisely enzymes. Such a fraction is, nonetheless, typically a very small value, even smaller in P(“vertebrates are cats”) than in P(“mammals are cats”), and in cases like P(“inhibitors inhibit enzymes”) and P(“enzymes inhibit inhibitors”) the latter can be considered zero by definition. These considerations are universal, and in QM or any other system have some kind of counterpart for any definition of *j* that allows an adjoint, as long as we are looking at one rule. However, bringing rules together in different circumstances, such as when we form syllogisms, requires a deeper understanding of *j*.

**The Hyperbolic Imaginary Number h.** Theoretical discussion is now required in reference to the nature of the hyperbolic imaginary number **h**, for which **h**² = +1 although **h** ≠ ±1.

This follows the “iota notation” of Ref. [9], by analogy with operators in quantum field theory [15]: ι⁺ = ½(1 + **h**) and its complex conjugate ι⁻ = ½(1 − **h**) [9], so that Eqn. 2 arises as (*Pfwd*, *Pbwd*) = ι⁻Pfwd + ι⁺Pbwd. This “iota notation” is simple to use algebraically, as opposed to constantly addressing **h**. It is readily shown to have the idempotent property ι±ι± = ι±, the annihilation property ι⁺ι⁻ = ι⁻ι⁺ = 0, and the normalization property ι⁺ + ι⁻ = 1. From these alone, example consequences may be deduced, as practice and as illustration of the relative ease of use as well as of its broader significance in underlying adjoint symmetries, for the Riemann zeta function used in one approach to data mining [1, 2, 5, 6, 7]. It will also be needed later.

**Eigenvalues of h.** With practice, iota algebra is simple because ι⁺ has eigenvalues 0 and +1 when ι⁻ has eigenvalues +1 and 0, and **h** = ι⁺ − ι⁻ has eigenvalues +1 and −1.

This later appears in QM as the decomposition of the *Dirac field* into left and right handed projections of the wave function, which we *relate* to our Pfwd and Pbwd; with each term seen as a spinor, the pair is a dual spinor called the *Dirac spinor* [15], an idea that will become important below. The roles of ι⁺ and ι⁻ or their counterparts are nonetheless rather sparser in traditional *i*-complex QM equations than the above would suggest, precisely because what is usually written corresponds to partial or final solutions *after* substituting the eigenvalues +1 and −1.

**The Generalization as j.** The above being so, QM really rests on the more general imaginary number *j*, of which *i* and **h** are special cases.

This indicates that (e^{hiθ})* = e^{−hiθ}, so that *j** = −*j* for *j* = **h***i*.

**Conjugate Symmetry.** The remaining hurdle is that e^{jθ} is fundamental but has a very restrictive symmetry that may be called *conjugate symmetry*: e^{jθ} (e^{jθ})* = 1. The value of one conjugate variable determines the value of the other, so as noted by Chester [18] for e^{iθ} = e^{ixp/ħ} it follows that, suitably normalized, P(x|p) = P(p|x) (his Eqn. 2.18): the *event reversal theorem*. This is very much what is *not* wanted for something like P(A|B) if we want a distinct and useful adjoint P(B|A). In physics, asymmetric examples are interaction with the Higgs particle and other external fields in Quantum Field Theory [15]. The role of the observer and experiment implied in Dirac’s ket normalization is an example of breaking that symmetry, and relevant here. Dirac’s ket normalization is part of his Recipe for obtaining observable probabilities [14], involving (1) normalization with respect to the ket such that in the implied probability P(B|A) = 1, and (2) taking the product P(A|B) = <A|B>′ (<A|B>′)† = <A|B>′ (<A|B>′)*. To obtain P(B|A) one can first form <A|B>† and then proceed as above, or equivalently replace ket normalization by bra normalization, though this replacement is unphysical for conjugate variables A and B, like p and x in QM, which is why ket normalization is the more general recipe. We need to break conjugate symmetry in a more general way. Perhaps the most general statement follows from Eqn. 5: if we have two algebraic expressions and we wish them to be the parts <A|B> and <B|A> such that <A|B> = <B|A>*, then we can form the following linear combination.

**Non-Conjugate Asymmetry and Simple Empirical Assignments.** Eqn. 7 is of immediate practical importance because forms with relators **R** certainly do not in general have conjugate symmetry. We can write the following by definition.

In general, operators of interest here are Hermitian and cast orthodata values to metadata A and B. Here ‘:=’ is again the metadata operator. In QM, what we really mean by x and p is e.g. position(nm):=6.4 and momentum(Kg nm/sec):=2.3, and in medicine we have e.g. Systolic_BP(mmHg):=140. More analogous to the spin states of QM are the active and passive forms of **R**, cast as the active form r and the passive form r* (relative to the active-passive tense of the relator). A mundane example is

Using these in a network, however, would assume the presence of

It is relatively easy for a human expert to assign the conditional probabilities as

This is still bi-directional in conditionality. By Eqn. 9, **R** in

There are certainly trivially Hermitian relators, though, e.g. such that <A| *R* |B> = (<B| *R* |A>)*.

**The Importance of Mutual Information.** This topic will be needed for further development, but it is also relevant to the above empirical treatment.

I(A; B) = log_e( o[A, B] / e[A, B] ) is the *Fano mutual information* [19] between A and B, here with Robson’s treatment [5, 6, 7] for finite data, with observed and expected frequencies o[ ] and e[ ]. The practical importance is that we get I(A; B) by data mining. The theoretical importance is that the limit expressed for indefinitely large data means that we can always write I(A; B) in terms of zeta functions ζ. One practical importance of the association constant K(A; B) is that Pfwd and Pbwd alone do not carry enough knowledge to calculate P(A), P(B) etc. for nodes A, B and use these as prior probabilities if required. Nor can we evaluate probabilities of complementary or negative states such as P(~A), P(~B), P(~A, B) and so on. If we could, then a large variety of scientific measures such as predictive odds, likelihood ratios, odds ratios, and number needed to treat could be determined [20]. All this is possible if we provide the association constant *assoc*, K(A; B) = e^{I(A; B)}, as well as Pfwd and Pbwd. For a simple braket, <A|B> provides Pfwd and Pbwd as P(A|B) and P(B|A), and P(A) = P(A|B)/K(A; B) and P(B) = P(B|A)/K(A; B), from which the above may be calculated. So in provision of these kinds of probabilistic rules and statements over the Internet, something like the following would be appropriate [20].

< overeating Pfwd:=0.9 | causes assoc= 6.8 | obesity Pbwd:=0.7 >
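Given such a rule, the relations just stated, P(A) = P(A|B)/K and P(B) = P(B|A)/K with K = *assoc* = e^{I(A; B)}, can be applied mechanically. The following sketch (a hypothetical helper, not the system's code) recovers the priors and the joint probability and checks Bayes consistency.

```python
from math import log


def derived(pfwd, pbwd, assoc):
    """From <A Pfwd | R assoc | B Pbwd>: recover priors and joint probability.

    P(A) = P(A|B)/K and P(B) = P(B|A)/K, with K = assoc = e^{I(A;B)}."""
    p_a = pfwd / assoc
    p_b = pbwd / assoc
    joint = pfwd * p_b        # P(A, B) = P(A|B) P(B)
    mutual_info = log(assoc)  # Fano mutual information I(A; B)
    return p_a, p_b, joint, mutual_info


# <overeating Pfwd:=0.9 | causes assoc:=6.8 | obesity Pbwd:=0.7>
p_over, p_obese, joint, info = derived(0.9, 0.7, 6.8)
# Bayes consistency holds automatically: P(A|B) P(B) == P(B|A) P(A)
```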

K(A; B) is conveniently written with the relator in such cases because its exponent may be seen as the eigenvalue of a linear Hermitian operator such that *R* |B> = e^{I(A; B)} |B>   (12)

It is possible to show that the required form for a categorical relator like *include* or *if* (see further formal discussion below) is

If we consider a default braket, which would be the case when mutual information is not available such that A and B are held to be randomly associated, then for a linear Hermitian operator **R**, the association constant K(A; B) = e⁰ = 1.

**h-Complex Mutual Information.** When we consider the **h**-complex form e^{hI(A; B)}, the following holds.

It is the situation for conjugate variables, say <A|B> = P(A) e^{hI(A; B)} = P(B) e^{hI(A; B)}, since for conjugate variables P(A) = P(B).

**Some Issues of Mixed i and h Complex Systems.** This is a theoretical aside as a discussion to show issues of consistency with QM, but it will become important as part of the method should it be shown that the *i*-complex algebra is also relevant to inference, as some workers have suggested. One should really think in terms of the “mother system” as comprising complex terms involving both *i* and **h**.

θ = 2π*A*(A; B)/h = −I(A; B)   (17)

where *A*(A; B) is the physical action expressed in terms of A and B, such as p and x or the equivalent energy E and time t, h is Planck’s constant, and I(A; B) implies mutual information given the uncertainty in measurement according to the uncertainty principle. The choice of −I(A; B) rather than +I(A; B) may appear to be for certain consistencies with textbook QM, but depends on the “directional” frame of conditionality reference.

must be correct according to traditional considerations of normalization, because of the eigenvalues **h** = −1 and **h** = +1.

We are now required to extend beyond the specific case of conjugate variables, and some brief observations should be made which really relate to what complex conjugation means when at least two kinds of complex number are present. The above was an arbitrary form, but we can build from the brakets for **conjugate variables** as above, now highlighted in bold to show that meaning. The order of *j*-complex conjugation operations then matters, as follows.

*I* *j* = 2π n_*j*

where n_*j* depends on the choice of *j*.

We can think of the transformations

*j* → *i*: n_*i* = integer × *i* (quantized)

*j* → **h**: n_*h* = any real × **h** (not quantized)

We write these in place of the pure phase θ, and we do not need a deeper understanding of the structure of *I* to use it. However, there is evidence that real solutions exist that shortcut the Dirac recipe for observable probabilities like P(x|y), by analogy with the traditional treatment of the quantized case.

**h Implies Classical Probabilistic Behavior.** For *j* = **h**, the composition of brakets <A|C> = Σ_X <A|X> <X|C> yields the exact calculation by the classical law of composition of probabilities, P(A|C) = Σ_X P(A|X) P(X|C), representing the sum over all possible arguments X. For *j* = *i* it holds instead as the law of composition of probability amplitudes, as in wave mechanics.

**Orthogonal Vectors.** QM makes great use of these as a simple and tractable case, and it may serve to illustrate how, in input, the user can program the required functionalities. The difference here is that for classical and semantic purposes we define them as **h**-complex.

The importance of these is that we can think of all nodes in the network with self probabilities P(A), P(B) etc. as prior probabilities expressible as these vectors, and associate the relators with mutual information from data mining. However, it is often useful to think of some or all as relatively fixed and change the mutual information instead. Typically, leaf nodes in the network are considered priors, though with a bidirectional network the distinction is not so meaningful. We may change any self probabilities. Usually we envisage input and output as special brakets of observation that relate to the vectors, e.g. <**?**|A>, where ? indicates the fact of an observation as occurring with probability one.

Here (1, 0) is the Pfwd and Pbwd setting the value that implies the observation, (0, 0.7) similarly setting the value that defines P(B), and (0, 0.6) similarly for P(A). We do this for every node A, B, C, … in the network. But in Eqn. 27, every node is then *orthogonal* to every other node: <A|B> = 0. Whilst all those brakets and bra-relator-kets that we have not specified imply that they are there with probability one, all those that are expressly defined make, in effect, the assumption that all nodes are *mutually exclusive*, until corrected otherwise by the addition of a relator carrying an **h**-complex value. In many cases that will stand as correct. Non-categorical relators in general have the effect of removing the categorical meaning, and specifying some other probabilistic relationship. We should more specifically, at least in algebra if not input, write

**Operators Acting on Orthogonal Vectors.** Recall that [a, b; c, d] [p, q]^T = [ap+bq, cp+dq]^T is the product of a matrix with a column vector, and [p, q] [a, b; c, d] = [pa+qc, pb+qd] is the product of a row vector with a matrix. The relator matrices are defined in the following non-trivially Hermitian form, and their actions on bra and ket are respectively as follows.
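The two products recalled above can be verified with a few lines of plain illustrative code:

```python
def mat_vec(m, v):
    """[a,b;c,d][p,q]^T = [ap+bq, cp+dq]^T: a matrix acting on a ket (column)."""
    (a, b), (c, d) = m
    p, q = v
    return [a * p + b * q, c * p + d * q]


def vec_mat(v, m):
    """[p,q][a,b;c,d] = [pa+qc, pb+qd]: a bra (row) acting on a matrix."""
    p, q = v
    (a, b), (c, d) = m
    return [p * a + q * c, p * b + q * d]


R = [[1, 2], [3, 4]]
print(mat_vec(R, [5, 6]))  # [17, 39]
print(vec_mat([5, 6], R))  # [23, 34]
```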

We may now consider the notion of “casting a value” into the bra and ket. In the above sense, **R**(A, B) acts as follows.

**R**(A, B)

Here **R**(A, B) ≠ **R**(B, A) in general.

Hence **R** is effectively a projection matrix for the form that really interests us. We just have to always include a relator **R** in <A| **R** |B> to stand for the braket. To understand the process of antisymmetric projection in the absence of bra or ket normalization, we can factorize the actions of **R** as follows.

The system developed is rich in capabilities, and for brevity attention will be paid here, in Results and Discussion, to summary findings and insights of general importance.

To appreciate the significance of the following comments, recall that for a net which is not static, the process of local optimization is repeated many times in the hunt for *global optimization*. The usual idea of “optimal” in this case relates to the notion that knowledge is what results when given data is processed such that maximal knowledge is carried in the least number of bits of information. In practice, it is the *information density* as the average information content of the rules, the total information in the net divided by the number of rules N. Like the network overall, each rule is associated with Pfwd and Pbwd, which relate to the semantic statement and its adjoint. It turns out that an equivalent semantic statement can always be written by use of negation or some qualification that replaces a probability P by 1 − P, at least in principle. This is done if any P < 0.5, so that the information −log₂P lies between 0 and 1 bit. The maximum information that a network of N rules can have is N bits in each direction (in the sense of overall Pfwd and overall Pbwd), or 2N considering both directions. The theoretically achievable upper limit of information density is thus 1 bit in each direction of conditionality. Recall that, by information theory and Popper’s argument [11], it is rules of 100% probability in a direction of conditionality that are not interesting, since the same effect would be obtained by not having them there at all; Popper’s position is that asserted statements mean little unless refuted. If all probabilities are 100% in a given direction of conditionality, the information density by the above definition is 0%. Note also that the closer the two directional probabilities are to being equal, the less directionality matters, and the rules could on average be expressed as the symmetric existential or “some” case.
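The information-density measure described, restating each rule so P ≥ 0.5 and averaging −log₂P over the N rules, can be sketched as follows (an interpretation of the text's definition, using base-2 logarithms so that the upper limit is 1 bit per rule):

```python
from math import log2


def information_density(rules):
    """Average information per rule in bits (one direction of conditionality).

    Any P < 0.5 is first replaced by 1 - P (an equivalent statement via
    negation), so each rule contributes -log2(P) in [0, 1] bits; the density
    is the mean over the N rules."""
    total = 0.0
    for p in rules:
        p = max(p, 1.0 - p)   # restate so that P >= 0.5
        total += -log2(p)
    return total / len(rules)


print(information_density([1.0, 1.0]))  # 0.0 (always-true rules: uninteresting)
print(information_density([0.5, 0.5]))  # 1.0 (the maximal information density)
```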

Many computations to date have been concerned with the ordering of diagnostics and selection of therapy, but they well illustrate the above principles. In one study regarding predictions of tuberculosis in the newborn, based on suspected exposure to tuberculosis by the mother, 46 rules implying multiple cyclic paths were reduced to 23 in 30 local optimizations, and no improvement was obtained in up to 2000 local optimizations. The overall forward probability of the Net was 0.323% and the overall reverse probability was 57.684%, with real and imaginary components of 29.0 and 28.7 on a percentage basis (0.290 and 0.287 in actuality). The 0.323% largely, but not solely, reflects the low probability that tuberculosis is transmitted from mother to the baby in the womb (compared with, say, HIV, which has a high probability). The 57.684% reflects the collective etiologies which *could have* caused tuberculosis in the newborn. The forward information density of 0.55 bits is considerably less than the theoretical upper limit of 1.00 bits, but the backward information density of 0.053 bits reflects much less evidence from refutation in the Popper sense [11].

Whilst comparable studies involving pharmaceutical chemistry are at an early stage, there is a tantalizingly much larger number of available rules from the reading of all US patents [1]. The 6.7 million proto-rules were originally of form <*formula* | is quoted by | *assignee* and *patent number*>, where the assignee is most often a company such as AstraZeneca, and the formula is a compound described in SMILES code [1]. The formula basically reflects the nomenclature advised by the International Union of Pure and Applied Chemistry. The system has to recognize that the same formulae can be written in different ways, which is particularly problematic when relating parts of compounds; this latter is probabilistic in the sense of a degree of similarity between molecules with similar parts [1]. Initially we simply assigned the above formula-assignee rule as 100% true in both directions of conditionality. While containing a huge amount of information, this does not add much to *probabilistic* inference, and the following will also serve to illustrate the use of Pfwd and Pbwd. The probabilistic interpretation is up to the user but, as an example, we recently employed the rule <*formula* | if | *patent number*>, with distinct Pfwd and Pbwd. Given a patent, several compounds may appear in it, and the same compounds can appear in different patents. Pfwd = P(*formula* | *patent number*) = n(*formula*, *patent number*) / n(*patent number*) is often less than 100% because the chance of picking one precise compound from several on a patent at random is less than 100%. Compare P(males | New Yorkers) = n(males, New Yorkers) / n(New Yorkers) ≈ 50% to see the idea of this.
More interesting from an intellectual property perspective is that the same molecule may appear on different patents, so that Pbwd = P(*patent number* | *formula*) = n(*formula*, *patent number*) / n(*formula*) is certainly less than 100%, and more interesting still is that Pbwd = P(*assignee* | *formula*) is also less than 100%. The interpretation could be that, given a molecule, assignees do not really have 100% clear ownership. However, most often it is likely that prior art has been quoted, or a known compound used in a synthetic process, or a new use is being patented for a compound which is not a novel composition of matter. This area is nonetheless contentious, precisely because the method is a good way of detecting contentious issues. It remains, however, a matter of data mining until several rules are combined to make a network to be used in inference. At present the networks for chemical compounds are very small, and therefore more in the nature of queries.

**General Theoretical Findings.** A referee raised the issue of how the indeterminate aspects of QM theory could be applied in this area, noting how this aspect is being extensively utilized in quantum computing, and wondering how this powerful aspect of QM theory might be applied to probabilistic semantics. Quantum indeterminacy is the seemingly necessary incompleteness in the QM description of a physical system. It is also true that, along with the related idea of fundamental uncertainty, this feature of QM has attracted attention in regard to uncertainty, fuzziness, and sometimes unexplained leaps of insight in human language and thought, albeit usually approached symbolically rather than through a system of complex algebra. In large part the issue of indeterminacy relates to what can be characterized by a *probability distribution* on the set of measurement outcomes of an observable. Probability distributions over discrete outcomes can be represented finitely by vectors, while matrices describe the dependencies between them. We have built in vectors and matrices from the outset precisely to accommodate probability distributions in future, not least because they are of course no less important in classical data analytics. We could for example use a density function to express Bayesian degrees of belief for different values of an observed quantity, or indeed of an information value or classical probability, given say a binomial distribution as a likelihood. A distribution can of course imply a scalar value as an expected or average value: expected information in terms of zeta functions arose formally from this idea, but preserving the original distributions allows averaging of other (perhaps as yet unforeseen) measures, and permits other statistical summaries such as maximum likelihood (as opposed to expectation). It seems intuitively obvious that we could introduce appropriate indeterminacy in this way.

Highly relevant to the above is that the *i* → **h** transformation does not get rid of the distributions implied in QM, nor does it get rid of Planck’s constant and the uncertainty principle, and yet the interpretation becomes classical. The distribution however changes, and in a very useful way. Consider the particle on a circular orbit of length L. Proceeding as described in Methods, we obtain the following.

It is a *particle function*, a *Gaussian function* of x spreading with time, although that will only be apparent for everyday objects of large mass m over cosmological time t because of the small value of Planck’s constant. It will be classically interpretable as increasing error with which we can interpret the position of the particle, and increasing entropy along with that increasing lack of knowledge. Planck’s constant merely sets the least possible error on our measurements, which for any realistic apparatus will be much larger. “Measurements” really include perturbing interactions with other objects and fields that do not involve human observers but in some way mimic the interaction implied in true observation. Whatever the meaning of n now, say such that the mass of the object M = nm, we should presumably stick with that value based on the initial state, assuming observations or other interactions do not modify it. *Between* observations on an entity as an object rather than a wave, QM teaches that the description switches to the wave function, but this is in practice indistinguishable from the uncertainty arising from our observations and other interactions. These aspects have been discussed elsewhere (e.g. Ref. [21]), but there are slight differences, and there is no inclusion of n in those references. The appearance of the Gaussian function is a blessing for everyday practical use because we can apply transformations to express uncertainties in data and distributions in populations that follow the ubiquitous normal distribution.

This raises the question of what a distribution can mean for a state A when it is categorical data: say the observation that a patient is male. We can certainly think of an error in making that assessment of being male, but there is a more fundamental analogy with QM. The point is to see it in conditional probability terms. When we write <A|B> to embody P(A|B) and P(B|A), it resembles the QM particle seen as being in *oscillation* between states A and B, at any moment of time being in these with certain weights, P(A) and P(B). An obvious choice for equations like Eqn. 33 is to consider the ground state. This has interesting analogies with setting up the *i*-complex wave equations for a harmonic oscillator [18], in which the particle is seen as oscillating around its mean position. The ground state is then a *Gaussian function*, i.e. a result similar to that obtained using **h**. It appears plausible that, applying this “oscillation interpretation” analogy more generally, including bias to generate skewed Gaussian functions, no transformation from *i* to **h** would be required.

**Cyclic Paths.** As noted above, being algebraic-complex, <A|B> encodes both directions of conditionality, and the Dirac Net is thus *bidirectional*. A consequence of bi-directionality is that the network can be a general graph allowing cyclic paths. The emergent property of a cyclic path such as <A|B> <B|C> <C|A> is that its value is scalar real, as can be shown algebraically. This would not of course be seen if two separate Bayes Nets were used to encode each direction of conditionality. In consequence of the real value, the notion of events ultimately affecting their own cause, which led to the restriction of the Bayes Net to its traditional definition as an acyclic directed graph [10], does not apply. A Dirac Net does not require iteration for solution.
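The claim that a cyclic path has a scalar real value can be checked numerically: with Bayes-consistent conditional probabilities, the forward and backward products around the cycle coincide, so the h (imaginary) part of the product vanishes. The numbers below are an invented but consistent example, derived from priors P(A)=0.2, P(B)=0.4, P(C)=0.1.

```python
def cycle_value(edges):
    """Multiply h-complex edge values (Pfwd, Pbwd) around a cyclic path.

    Returns the (real, h) parts of the product. With Bayes-consistent
    probabilities the forward and backward products coincide, so the h part
    (Pbwd - Pfwd)/2 vanishes: the emergent value of a cycle is scalar real."""
    pfwd, pbwd = 1.0, 1.0
    for f, b in edges:
        pfwd, pbwd = pfwd * f, pbwd * b
    return (pfwd + pbwd) / 2, (pbwd - pfwd) / 2


# Bayes-consistent conditionals for priors P(A)=0.2, P(B)=0.4, P(C)=0.1,
# e.g. P(A|B) P(B) = P(B|A) P(A) for the first edge:
edges = [(0.3, 0.6),    # <A|B>: (P(A|B), P(B|A))
         (0.8, 0.2),    # <B|C>: (P(B|C), P(C|B))
         (0.25, 0.5)]   # <C|A>: (P(C|A), P(A|C))
real, h_part = cycle_value(edges)
```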

**Work in Progress.** An assembly of rules linking common drug names to formulae of compounds is in hand. Also by reading patents, a growing number of rules links the compounds to protein targets, disease targets, species (typically but not always humans) in which results are obtained, and relevant biochemical and methodological details. Of considerable interest is the deduction of related formulae that may have intended or other biological actions, which, until experimentally qualified, are inherently probabilistic. Novel formulae can be automatically generated by evolving under the “natural selection” of not being in the scope of those already in patents, but binding appropriately in simulation to the protein targets. In addition, they may satisfy other requirements such as stability, relative ease of synthesis, and implication of minor toxicity effects by common use of related chemistries [1]. Probabilistic inference from these kinds of rules will be described elsewhere.

Networks based on quantitative QM concepts, and on Dirac algebra and notation in particular, appear very promising as a basis for probabilistic semantics. What constitutes Best Practice in the specific implementation here is inevitably still open to detailed argument, as this is early work in progress. Nonetheless, the overall strategic position, that a correct rendering of QM is Best Practice, is felt to be persuasive. We have not discussed here in depth the important topics of negation, the significance of the definite and indefinite articles, and more complex statements with more than one relator. These and other linguistic aspects, as well as Artificial Intelligence issues relating to the mapping of language onto conceptual spaces of thought, will be discussed elsewhere (see also Refs. [20, 21]).

The author declares no competing interests.

Received: 29-Jan-2012 Accepted: 22-Mar-2012

Published: 27-Mar-2012

**References**

1. Robson B, Dettinger R, Peters A, Boyer SKP: **Drug discovery using very large numbers of patents: general strategy with extensive use of match and edit operations**. *J. Computer Aided Molecular Design* 2011, **25(5)**:427-441.
2. Mullins IM, Siadaty MS, Lyman J, Scully K, Garrett CT, Miller WG, Robson B, Apte C, Weiss S, Rigoutsos I, Platt D, Cohen S, Knaus WA: **Data mining and clinical data repositories: Insights from a 667,000 patient data set**. *Computers in Biology and Medicine* 2006, **36(12)**:1351-1377.
3. Robson B, Garnier J: **Introduction to Proteins and Protein Engineering**. 1984, 1988, Elsevier Press.
4. Robson B, Baek OK: **The Engines of Hippocrates: From the Dawn of Medicine to Medical and Pharmaceutical Informatics**. 2009, Wiley.
5. Robson B: **Clinical and pharmacogenomic data mining: 1. Generalized theory of expected information and application to the development of tools**. *J. Proteome Res.* 2003, **2(3)**:283-302.
6. Robson B: **Clinical and Pharmacogenomic Data Mining: 3. Zeta Theory as a General Tactic for Clinical Bioinformatics**. *J. Proteome Res. (Am. Chem. Soc.)* 2005, **4(2)**:445-455.
7. Robson B: **Clinical and Pharmacogenomic Data Mining: 4. The FANO Program and Command Set as an Example of Tools for Biomedical Discovery and Evidence Based Medicine**. *J. Proteome Res. (Am. Chem. Soc.)*, **7(9)**:3922-3947.
8. Buchanan BG, Shortliffe EH: *Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project*. 1982, Addison-Wesley, Reading, Massachusetts.
9. Robson B: **The New Physician as Unwitting Quantum Mechanic: Is Adapting Dirac's Inference System Best Practice for Personalized Medicine, Genomics and Proteomics?** *J. Proteome Res. (Am. Chem. Soc.)* 2007, **6(8)**:3114-3126.
10. Pearl J, Russell S: **Bayesian Networks**. In *Handbook of Brain Theory and Neural Networks*, M. A. Arbib, Ed. 2003, MIT Press, Cambridge: 157-160.
11. Popper K: **The Logic of Scientific Discovery** (as *Logik der Forschung*, 1934), English translation 1959, Routledge.
12. Klopotek MA: **Cyclic Bayesian Network – Markov Process Approach**. *Studia Informatica* 2006, 1/2(7), Systemy i Technologie Informacyjne.
13. Predoiu L, Stuckenschmidt H: **Probabilistic Models for the Semantic Web – A Survey**. http://ki.informatik.uni-mannheim.de/fileadmin/publication/Predoiu08Survey.pdf 2009 (last accessed 4/29/2010).
14. Dirac PAM: **The Principles of Quantum Mechanics**. 1930, Oxford University Press, Oxford.
15. Penrose R: **The Road to Reality: A Complete Guide to the Laws of the Universe**. 2004, Jonathan Cape, Random House, London.
16. Ogden CK: **Basic English: A General Introduction with Rules and Grammar**. 1930, 1940, Paul Treber & Co., Ltd., London.
17. Cockle J: **A New Imaginary in Algebra**. *London-Edinburgh-Dublin Philosophical Magazine* 1848, **3(33)**:345-349.
18. Chester M: **Primer of Quantum Mechanics**. 1987, John Wiley and Sons, New Jersey (2003, Dover Publications).
19. Fano RM: **Transmission of Information: A Statistical Theory of Communications**. 1961, MIT Press, Cambridge.
20. Robson B, Balis UGJ, Caruso TP: **Considerations for a Universal Exchange Language in Healthcare**. 2011, *e-Health Networking Applications and Services (Healthcom)*, 173-176.
21. Robson B: **Towards intelligent Internet-roaming agents for mining and inference from medical data**. *Stud Health Technol Inform* 2009, **149**:157-177.


Robson B: **Towards Automated Reasoning for Drug Discovery and Pharmaceutical Business Intelligence**. *Journal of Pharmaceutical Technology and Drug Research* 2012, **1**: http://dx.doi.org/10.7243/2050-120X-1-3


Copyright © 2015 Herbert Publications Limited. All rights reserved.
