Centuries-old techniques developed by Thomas Bayes find modern applications because they are simple and effective.

Bayesian Techniques Assist Automated Decision Tools



Bayesian techniques (1) are used to filter unwanted spam automatically, to support medical diagnoses, to detect viruses, and in several other ways that advance business objectives.

Bayes' Theorem uses conditional probability to calculate P(A|B), the probability of A given B, provided that the probability of B given A, the probability of A, and the probability of B are all known.
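Written as a formula, the statement above takes the following form. This is the standard textbook statement of the theorem, with the added condition that P(B) is nonzero:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```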

For example, suppose an automated program could determine that a particular phrase is present in 70% of spam emails and 50% of non-spam emails, and that any given email is 90% likely to be spam. Let A mean "the email is spam", B mean "the email is not spam", and C mean "the phrase is present".

Then P(A), the probability of A, is 90%, and P(B) is 10%. P(C|A), the probability of C given that A is true, is 70%, and P(C|B) is 50%. Because every email is either spam or not spam, P(C) = P(C|A) * P(A) + P(C|B) * P(B) = (0.70 * 0.90) + (0.50 * 0.10) = 0.68 = 68%.

Two useful probabilities for classifying the email are P(A|C), the probability that the email is spam given that the phrase is present, and P(B|C), the probability that the email is not spam given that the phrase is present. Using Bayes' Theorem, P(A|C) = P(A) * P(C|A) / P(C) = 0.90 * 0.70 / 0.68 ~= 0.92647 ~= 93%, and P(B|C) = P(B) * P(C|B) / P(C) = 0.10 * 0.50 / 0.68 ~= 0.07353 ~= 7%.

Therefore, based on that one data point, the probability that the email is spam is 93%, and the probability that it is not spam is 7%.
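The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior` and its parameter names are chosen here for illustration and do not come from any particular spam filter.

```python
def posterior(prior_a, likelihood_a, likelihood_not_a):
    """Return P(A|C) by Bayes' Theorem when A and "not A" are the only cases."""
    prior_not_a = 1.0 - prior_a
    # Total probability of the evidence: P(C) = P(C|A)P(A) + P(C|B)P(B)
    evidence = likelihood_a * prior_a + likelihood_not_a * prior_not_a
    return likelihood_a * prior_a / evidence


# Figures from the example: P(A) = 0.90, P(C|A) = 0.70, P(C|B) = 0.50
p_spam = posterior(prior_a=0.90, likelihood_a=0.70, likelihood_not_a=0.50)
print(f"P(spam | phrase)     = {p_spam:.5f}")      # 0.92647
print(f"P(not spam | phrase) = {1 - p_spam:.5f}")  # 0.07353
```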

To improve the accuracy of this technique, a computer program could analyze thousands of data points in less than a tenth of a second, which is approximately how long it takes to download an email message. The program could test for phrase D, phrase E, phrase F, and so on, and use the data about each one to adjust its overall confidence that the email is or is not spam. In loose mathematical terms, if C, D, and E are all found to be true, the program can automatically determine P(A|C,D,E), the probability that the email is spam given that C, D, and E are true, using an extended version of Bayes' Theorem applied to a Bayesian network. By using good tokens (i.e. by asking the right questions and using all available information), this technique can be up to 99.5% accurate with 0.3% false positives.
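One common way to combine several phrases is the "naive" Bayes simplification, which assumes the phrases are independent of one another once the class (spam or not spam) is known. The sketch below makes that assumption, which is simpler than a general Bayesian network. The function name `spam_posterior` and the figures for phrases D and E are hypothetical, chosen only to illustrate the calculation.

```python
import math


def spam_posterior(prior_spam, observed):
    """Estimate P(spam | all observed phrases) under the naive assumption
    that phrases are conditionally independent given the class.

    observed: list of (P(phrase | spam), P(phrase | not spam)) pairs,
    one pair for each phrase found in the email.
    """
    # Work in log space so that thousands of small factors do not underflow.
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for p_given_spam, p_given_ham in observed:
        log_spam += math.log(p_given_spam)
        log_ham += math.log(p_given_ham)
    # Normalize the two scores so that they sum to 1.
    top = max(log_spam, log_ham)
    spam = math.exp(log_spam - top)
    ham = math.exp(log_ham - top)
    return spam / (spam + ham)


phrases = [
    (0.70, 0.50),  # phrase C, using the figures from the single-phrase example
    (0.40, 0.05),  # phrase D (hypothetical figures)
    (0.20, 0.01),  # phrase E (hypothetical figures)
]
p = spam_posterior(0.90, phrases)
print(f"P(spam | C, D, E) = {p:.4f}")
```

With a single phrase the function reduces to the ordinary Bayes' Theorem calculation; each additional phrase multiplies in one more pair of likelihoods.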

A related algorithm is state-based. Rather than using a directed acyclic graph of conditional probabilities (a Bayesian network), one could automatically produce a set of known states, along with transition matrices (stochastic matrices) giving the probability of moving from one state to another (the transition probabilities). These stochastic matrices can be estimated directly from statistics gathered over a large enough set of samples. CRM114 uses this technique to improve spam detection accuracy beyond naive Bayesian techniques, and suggests innovative approaches that could potentially bring 99.999% reliability.
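To make the idea of a transition matrix concrete, the sketch below estimates one matrix from spam samples and one from non-spam samples, then asks which matrix makes a new message more likely. It is a toy first-order Markov chain over whole words, not CRM114's actual algorithm; the function names, the tiny vocabulary, and the training sequences are all hypothetical.

```python
from collections import defaultdict
import math


def transition_matrix(sequences, states):
    """Estimate P(next state | current state) from observed sequences."""
    counts = {s: defaultdict(int) for s in states}
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for s in states:
        # Add-one smoothing keeps unseen transitions at a nonzero probability.
        total = sum(counts[s].values()) + len(states)
        matrix[s] = {t: (counts[s][t] + 1) / total for t in states}
    return matrix


def log_likelihood(seq, matrix):
    """Log-probability of the transitions in seq under the given matrix."""
    return sum(math.log(matrix[a][b]) for a, b in zip(seq, seq[1:]))


# Hypothetical vocabulary and training data, for illustration only.
states = ["buy", "now", "meeting", "notes"]
spam_training = [["buy", "now", "buy", "now"], ["now", "buy", "now"]]
ham_training = [["meeting", "notes", "meeting"], ["notes", "meeting", "notes"]]

spam_matrix = transition_matrix(spam_training, states)
ham_matrix = transition_matrix(ham_training, states)

message = ["buy", "now", "buy"]
looks_like_spam = (
    log_likelihood(message, spam_matrix) > log_likelihood(message, ham_matrix)
)
print("classified as spam:", looks_like_spam)
```

Each row of a matrix built this way sums to 1, which is what makes it a stochastic matrix.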

The applications of statistical techniques such as those described above go beyond spam detection. For instance, one could imagine the same or similar techniques being used in medicine to classify DNA, in industry to route calls that come through automated call systems, or in elections to predict how a particular message may affect polls.

To learn more about how statistical methods can be used to improve your business, please contact us.

Works Cited

  1. J J O'Connor and E F Robertson. "Thomas Bayes." From The MacTutor History of Mathematics archive.
  2. William S. Yerazunis. "The Spam Filtering Plateau at 99.9% Accuracy and How to Get Past It." From
  3. CRM114 - the Controllable Regex Mutilator.
  4. Eric W. Weisstein. "Bayes' Theorem." From MathWorld--A Wolfram Web Resource.
  5. Eric W. Weisstein. "Conditional Probability." From MathWorld--A Wolfram Web Resource.
  6. Paul Graham. "Better Bayesian Filtering." From
  7. Paul Graham.

© 2003-2007 Transparen Corp.
