Medical diagnosis wizard


To make it clear how Bayes theorem works, you will develop an online medical diagnosis wizard using PHP. This wizard could also have been called a calculator, except that it takes four input steps to supply the prerequisite information, followed by a fifth step to review the result.

The wizard works by asking the user to supply the various pieces of information critical to computing the full posterior probability. The user can then examine the posterior distribution to determine which disease hypothesis has the highest probability, based on:

  1. The diagnostic test information
  2. The sample data used to estimate the prior and likelihood distributions

Bayes Wizard: Step 1

Step 1 in using Bayes theorem to make a medical diagnosis involves specifying the number of disease alternatives that you will examine along with the number of symptoms or evidence keys. In the generic example used here, you will evaluate three disease alternatives based on evidence from two diagnostic tests. Each diagnostic test can only produce a positive or negative result. This means that the total number of symptom combinations, or evidence keys, you can observe is four (++, +-, -+, or --).
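The number of evidence keys doubles with each additional binary test, so n tests give 2^n keys. As a minimal sketch in PHP (the function name evidenceKeys is hypothetical and not part of the wizard's code), the keys can be enumerated like this:

```php
<?php
// Enumerate all evidence keys for $numTests binary diagnostic tests.
// Each test is either positive (+) or negative (-), giving 2^n keys.
function evidenceKeys(int $numTests): array
{
    $keys = [''];
    for ($t = 0; $t < $numTests; $t++) {
        $next = [];
        foreach ($keys as $prefix) {
            $next[] = $prefix . '+'; // this test came out positive
            $next[] = $prefix . '-'; // this test came out negative
        }
        $keys = $next;
    }
    return $keys;
}

echo implode(', ', evidenceKeys(2)), "\n"; // ++, +-, -+, --
```

The keys come out in the same order as the symptom labels used in the next step.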

Figure 3. Form to enter disease hypotheses and symptom possibilities

Bayes Wizard: Step 2

Step 2 involves entering the disease and symptom labels. In this case, you are just going to enter d1, d2, and d3 for the disease labels and ++, +-, -+ and -- for the symptom labels. The two symbols used for symptom labels signify whether the results of the two diagnostic tests came out positive or negative.

Figure 4. Form to enter disease and symptom labels

Bayes Wizard: Step 3

Step 3 involves entering the prior probabilities for each disease. You will use the data table below to determine the prior probabilities to enter for Step 3 and the likelihoods to enter for Step 4 (this data table originally appeared in Introduction to Probability). Using this example allows you to confirm that the final result you obtain from the wizard agrees with the results in that book.

Figure 5. Joint frequency of diseases and symptoms

The prior probability of each disease is the number of patients diagnosed with that disease divided by the total number of diagnosed cases in the sample. The resulting prior probabilities are entered in the following form:
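As a sketch of that division, the following PHP assumes disease totals of 3215, 2125, and 4660 patients out of 10,000. Those counts are an assumption taken to match the Figure 5 table and should be checked against it; priorsFromCounts is a hypothetical helper, not part of the wizard.

```php
<?php
// ASSUMPTION: disease totals taken to match the Figure 5 table;
// replace them with the figure's counts if they differ.
$diseaseCounts = ['d1' => 3215, 'd2' => 2125, 'd3' => 4660];

// Prior P(H = h) = patients diagnosed with h / all diagnosed patients.
function priorsFromCounts(array $counts): array
{
    $total = array_sum($counts);
    $priors = [];
    foreach ($counts as $disease => $n) {
        $priors[$disease] = $n / $total;
    }
    return $priors;
}

$priors = priorsFromCounts($diseaseCounts);
foreach ($priors as $disease => $p) {
    printf("P(%s) = %.4f\n", $disease, $p); // 0.3215, 0.2125, 0.4660
}
```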

Figure 6. Form to enter disease priors

You do not have to rely upon a data table such as the previous one to derive the prior probability estimates. In some cases, you can derive prior probabilities by using common-sense reasoning: The prior probability of a fair two-sided coin coming up heads is 0.5. The prior probability of selecting a queen of hearts from a randomized deck of cards is 1/52.

You also commonly run into situations where you initially have no good estimates of what the prior probability of each hypothesis might be. In such cases, it is common to posit noninformative priors. If you have four hypothesis alternatives, then the noninformative prior distribution assigns 1/4, or 0.25, to each hypothesis. Note that Bayesians often criticize the use of a null hypothesis in significance testing because it amounts to assuming noninformative priors in cases where positing informative priors might be more theoretically or empirically justified.
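A uniform prior is simple to construct. This minimal PHP sketch assigns 1/n to each of n hypotheses; uniformPriors and the labels h1 to h4 are illustrative names, not part of the wizard.

```php
<?php
// Noninformative (uniform) prior: each of n hypotheses gets 1/n.
function uniformPriors(array $hypotheses): array
{
    $n = count($hypotheses);
    if ($n === 0) {
        throw new InvalidArgumentException('Need at least one hypothesis');
    }
    // Give every hypothesis label the same prior probability.
    return array_fill_keys($hypotheses, 1 / $n);
}

// Hypothetical labels for four hypothesis alternatives.
$priors = uniformPriors(['h1', 'h2', 'h3', 'h4']);
echo implode(', ', $priors), "\n"; // 0.25, 0.25, 0.25, 0.25
```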

A final way to derive estimates of the prior probability of each hypothesis, P(Hi), is through a subjective estimate of what those probabilities might be, given everything you have learned about the way the world works up to that point: P(H = h | everything you know). You will often find Bayesian inference sharing the same bed with a subjective view of probability, in which the probability of a proposition is equated with one's subjective degree of belief in that proposition.

What is important in this discussion is that Bayesian inference is a flexible technique that allows you to estimate prior probabilities using objective methods, common-sense logical methods, and subjective methods. When using subjective methods, you must still be willing to defend your prior probability estimates. You may use objective data to help set and justify your subjective estimates, which means that Bayesian inference is not necessarily in conflict with more objectively oriented approaches to statistical inference.

Bayes Wizard: Step 4

The data table provides you with information you can use to compute the probability of the symptoms (like test results) given the disease, also known as the likelihood distribution P(E | H).

To see how the likelihood values entered below were computed, you can unpack P(E|H) using the frequency format for computing conditional probabilities:

P(E | H) = {E & H} / {H}

This formula tells you to divide a joint frequency count {E & H} by a marginal frequency count {H} to obtain the likelihood value for each cell in your likelihood matrix. The top-left cell of your likelihood matrix, P(E='++' | H='d1'), can be computed immediately from the joint and marginal frequency counts appearing in the data table:

P(E='++' | H='d1') = 2110 / 3215 ≈ .6563

All the likelihood values entered in Step 4 were computed in this manner.
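The same division can be applied to every cell at once. This PHP sketch builds the whole likelihood matrix from a table of joint counts; the counts are assumed to match Figure 5, and the helper name likelihoods is hypothetical. Each row's sum serves as the marginal count {H}, which holds because every patient falls under exactly one of the four evidence keys.

```php
<?php
// ASSUMPTION: joint counts {E & H} taken to match Figure 5
// (rows = diseases, columns = evidence keys).
$joint = [
    'd1' => ['++' => 2110, '+-' => 301,  '-+' => 704,  '--' => 100],
    'd2' => ['++' => 396,  '+-' => 132,  '-+' => 1187, '--' => 410],
    'd3' => ['++' => 510,  '+-' => 3568, '-+' => 73,   '--' => 509],
];

// Likelihood P(E | H) = {E & H} / {H} for every cell.
function likelihoods(array $joint): array
{
    $lik = [];
    foreach ($joint as $h => $row) {
        // {H}: every patient with disease $h falls under exactly one key.
        $marginal = array_sum($row);
        foreach ($row as $e => $count) {
            $lik[$h][$e] = $count / $marginal;
        }
    }
    return $lik;
}

$lik = likelihoods($joint);
printf("P(E='++' | H='d1') = %.4f\n", $lik['d1']['++']); // 0.6563
```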

Figure 7. Form to enter likelihood of symptoms given the disease

It should be noted that many statisticians use likelihood as a system of inference instead of, or in addition to, Bayesian inference. This is because likelihoods also provide a metric one can use to evaluate the relative degree of support for several hypotheses given the data.

In the previous example, you can see that the probability of a particular evidence key varies for each hypothesis under consideration. The probability of the ++ evidence key is the greatest for the d1 hypothesis. You can assess which hypothesis is best supported by the data by:

  1. Examining the likelihood of the evidence key given each hypothesis key
  2. Selecting the hypothesis that maximizes the likelihood of the evidence key

Doing so would be an example of inference according to the principle of maximum likelihood.
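Selecting the maximum likelihood hypothesis is an argmax over the hypotheses for one evidence key. In this PHP sketch the likelihoods are written as fractions of joint counts assumed to match Figure 5, and maxLikelihoodHypothesis is a hypothetical name:

```php
<?php
// Return the hypothesis h that maximizes P(E = $key | H = h).
function maxLikelihoodHypothesis(array $lik, string $key): ?string
{
    $best = null;
    $bestValue = -INF;
    foreach ($lik as $h => $row) {
        if ($row[$key] > $bestValue) {
            $best = $h;
            $bestValue = $row[$key];
        }
    }
    return $best;
}

// ASSUMPTION: joint counts assumed to match Figure 5,
// divided by each disease total.
$lik = [
    'd1' => ['++' => 2110 / 3215, '+-' => 301 / 3215,  '-+' => 704 / 3215,  '--' => 100 / 3215],
    'd2' => ['++' => 396 / 2125,  '+-' => 132 / 2125,  '-+' => 1187 / 2125, '--' => 410 / 2125],
    'd3' => ['++' => 510 / 4660,  '+-' => 3568 / 4660, '-+' => 73 / 4660,   '--' => 509 / 4660],
];

foreach (['++', '+-', '-+', '--'] as $key) {
    echo "$key: ", maxLikelihoodHypothesis($lik, $key), "\n";
}
```

Under these assumed counts, ++ selects d1, +- selects d3, and both -+ and -- select d2.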

Another interesting point to note is that all the values in the above likelihood distribution sum to a value greater than 1: the four likelihood values for each hypothesis sum to 1 (up to rounding), so the whole table sums to 3, one for each disease. What this means is that the likelihood distribution is not really a probability distribution, because it lacks the defining property that its values sum to 1. Read the other way, as a function of the hypothesis for a fixed evidence key, the values need not sum to 1 either. This summation property is not essential for the purposes of evaluating the relative support for different hypotheses. What is important for this purpose is that the "likelihood supplies a natural order of preference among the possibilities under consideration" (from R.A. Fisher's Statistical Methods and Scientific Inference, p. 68).
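A quick PHP check, using joint counts assumed to match Figure 5, shows how the likelihood table behaves when summed in different directions: each disease's values sum to 1 across the four evidence keys, the whole table sums to 3, and the ++ values across the three diseases fall short of 1.

```php
<?php
// ASSUMPTION: joint counts taken to match Figure 5 (rows = diseases).
$joint = [
    'd1' => ['++' => 2110, '+-' => 301,  '-+' => 704,  '--' => 100],
    'd2' => ['++' => 396,  '+-' => 132,  '-+' => 1187, '--' => 410],
    'd3' => ['++' => 510,  '+-' => 3568, '-+' => 73,   '--' => 509],
];

// Build P(E = e | H = h) by dividing each count by its row total {H}.
$lik = [];
foreach ($joint as $h => $row) {
    $total = array_sum($row);
    foreach ($row as $e => $n) {
        $lik[$h][$e] = $n / $total;
    }
}

// Each disease's row is a distribution over evidence keys: sums to 1.
foreach ($lik as $h => $row) {
    printf("%s row sum: %.4f\n", $h, array_sum($row));
}
// So the whole table sums to the number of diseases, not to 1.
printf("table sum: %.4f\n", array_sum(array_map('array_sum', $lik)));
// For a fixed evidence key, the values across diseases need not sum to 1.
printf("++ column sum: %.4f\n", array_sum(array_column($lik, '++')));
```

With these assumed counts, the ++ values sum to about 0.95.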

You may not fully understand the concept of likelihood from this brief discussion, but I do hope that you appreciate its importance to the overall Bayes theorem calculation and as the foundation for another system of inference. Many statisticians prefer the likelihood system of inference because it does not require you to resort to the dubious practice of estimating the prior probability of each hypothesis.

Maximum likelihood estimators also have many desirable mathematical properties that make them convenient to work with (the properties include transitivity, additivity, asymptotic unbiasedness, and invariance under transformations, among others). For these reasons, it is often a good idea to examine your likelihood distribution closely, in addition to your posterior distribution, when making inferences from your data.

Bayes Wizard: Step 5

The final step of the process involves displaying the posterior distribution of the diseases given the symptoms P(H | E):
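The posterior displayed in this step comes from Bayes theorem. Written in the notation used above, the denominator sums over every disease hypothesis h', which is what makes the posterior values for each evidence key sum to 1:

```latex
P(H = h \mid E = e) = \frac{P(E = e \mid H = h)\,P(H = h)}{\sum_{h'} P(E = e \mid H = h')\,P(H = h')}
```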

Figure 8. Probability of each disease given symptoms

The section of the script that was used to compute and display the posterior distribution looks like this:

Listing 4. Computing and displaying the posterior distribution
<?php
include "Bayes.php"; // defines the Bayes class

// Labels, priors, and likelihoods posted by Steps 2 to 4 of the wizard
$disease_labels = $_POST["disease_labels"];
$symptom_labels = $_POST["symptom_labels"];
$priors         = $_POST["priors"];
$likelihoods    = $_POST["likelihoods"];

$bayes = new Bayes($priors, $likelihoods);
$bayes->getPosterior();                   // compute the posterior P(H | E)
$bayes->setRowLabels($symptom_labels);    // aka evidence labels
$bayes->setColumnLabels($disease_labels); // aka hypothesis labels
$bayes->toHTML();                         // send the posterior to the browser
?>

You begin by passing the priors and likelihoods obtained from the previous wizard steps to the Bayes constructor. Using this information, you compute the posterior with the $bayes->getPosterior() method. To output the posterior distribution to the browser, you first set the row and column labels to display, then call the $bayes->toHTML() method.
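Independently of Bayes.php, the posterior calculation itself can be sketched in a few lines. The standalone PHP function below (posterior is a hypothetical name; it is not the Bayes class or its API) applies Bayes theorem to each evidence key, using priors and likelihoods derived from counts assumed to match Figure 5:

```php
<?php
// Standalone sketch of P(H = h | E = e) for every evidence key e.
// Not the Bayes class from Bayes.php. Assumes P(E = e) > 0 for every key.
function posterior(array $priors, array $lik): array
{
    $post = [];
    $keys = array_keys(reset($lik)); // evidence keys from the first row
    foreach ($keys as $e) {
        // P(E = e): sum over h of P(E = e | H = h) * P(H = h)
        $evidence = 0.0;
        foreach ($priors as $h => $p) {
            $evidence += $lik[$h][$e] * $p;
        }
        // Bayes theorem: likelihood * prior / evidence
        foreach ($priors as $h => $p) {
            $post[$e][$h] = $lik[$h][$e] * $p / $evidence;
        }
    }
    return $post;
}

// ASSUMPTION: priors and likelihoods derived from counts assumed
// to match Figure 5.
$priors = ['d1' => 0.3215, 'd2' => 0.2125, 'd3' => 0.4660];
$lik = [
    'd1' => ['++' => 2110 / 3215, '+-' => 301 / 3215,  '-+' => 704 / 3215,  '--' => 100 / 3215],
    'd2' => ['++' => 396 / 2125,  '+-' => 132 / 2125,  '-+' => 1187 / 2125, '--' => 410 / 2125],
    'd3' => ['++' => 510 / 4660,  '+-' => 3568 / 4660, '-+' => 73 / 4660,   '--' => 509 / 4660],
];

$post = posterior($priors, $lik);
foreach ($post as $e => $row) {
    printf("%s: d1=%.3f d2=%.3f d3=%.3f\n", $e, $row['d1'], $row['d2'], $row['d3']);
}
```

For the ++ key this gives roughly 0.700, 0.131, and 0.169 for d1, d2, and d3. The -- key shows how the likelihood and Bayesian systems of inference can disagree: d2 has the highest likelihood for --, but once the larger prior for d3 is weighed in, the posterior favors d3 (about 0.50 versus 0.40).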




The content shown in this page was first published by IBM developerWorks and is reprinted with permission from Paul Meagher (www.datavore.com)

