Nabin K. Malakar, Ph.D.

NASA JPL
I am a computational physicist working on societal applications of machine-learning techniques.

Research Links

My research interests span multi-disciplinary fields, involving societal applications of machine learning, decision-theoretic approaches to automated experimental design, Bayesian statistical data analysis, and signal processing.

Linkedin


Interested in the picture? Autonomous experimental design allows us to answer the question of where to take measurements. More about it is here...

Hobbies

In addition to research, I also like to hike, bike, read, and paint with watercolors.

Thanks for visiting. Please feel free to visit my Weblogs.

Welcome to nabinkm.com. Please visit again.


Friday, September 27, 2013

Monty Hall’s #paradox

Consider one urn with 15 blue balls and 10 red balls, and another urn with 10 blue balls and 15 red balls. We select one urn at random (with probability 50% for each urn).
We draw a ball, which turns out to be blue, and we put it back in the urn. Now we draw a (second) ball. What is the probability that this (second) ball is blue?


Solution after the break!
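In the meantime, here is a rough simulation sketch that can be used to check the answer numerically; the urn labels A and B are names assumed here, since the puzzle leaves the urns unnamed.

import random

def simulate(n_trials=1_000_000):
    """Estimate P(second ball is blue | first ball was blue) by simulation."""
    urns = {"A": (15, 10), "B": (10, 15)}  # (blue, red) counts; A and B are assumed labels
    first_blue = 0
    both_blue = 0
    for _ in range(n_trials):
        blue, red = urns[random.choice("AB")]  # pick an urn at random
        p_blue = blue / (blue + red)
        if random.random() < p_blue:           # first draw is blue
            first_blue += 1
            # the ball is put back, so the second draw has the same probability
            if random.random() < p_blue:
                both_blue += 1
    return both_blue / first_blue

# Exact check via Bayes' theorem: P(first urn | blue) = 0.5*0.6/0.5 = 0.6, so
# P(second blue | first blue) = 0.6*0.6 + 0.4*0.4 = 0.52
print(simulate())  # ~0.52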

Thursday, August 5, 2010

Bayes' Theorem in \LaTeX

I am learning LaTeX, using TeXnicCenter and other similar software. I wish Kile were available on Windows...
Anyway, every time I start a new file, I have to search for the bare-bones skeleton that needs to be there before anything can be done.
So, here I am collecting a skeleton for LaTeX files.
I think it is free from any copyright.


\documentclass[a4paper,11pt]{article}
\usepackage{amsmath} % need for subequations
\usepackage{hyperref} % use for hypertext links, including those to external documents and URLs
\title{Your Title Here}
\author{Your Name here \thanks{Email: adf@gmail.com}
\\University}
\begin{document}
\maketitle
\begin{abstract}
Abstract goes here...
\end{abstract}
\tableofcontents{} % comment: just in case... it can be commented
\section{One}
Here we start...
Oh, why not start by writing Bayes' theorem in \LaTeX?
... we use the Bayesian method to infer the model parameters in question. We learn from the available data. The process of arriving at the posterior from the prior in the light of the given data can be accomplished by using Bayes' theorem.
As a general statement, we can write Bayes' theorem as follows\\
\begin{equation}
\label{eq:bayes}
P(\theta|\textbf{D}) = P(\theta ) \frac{P(\textbf{D} |\theta)}{P(\textbf{D})} ~~~~~|| I,
\end{equation}
where we have adopted Skilling--Gull convention of writing $I$ as the generally accepted term in the conditionals. The data are represented by \textbf{D} and parameters are represented by $\theta$.
\end{document}








Thanks to the Blogger platform, which does not convert LaTeX commands into symbols. (That was sarcasm :P )
Note to self: I believe I have seen Gull writing the conditional outside the bracket... where, where???

Monday, May 10, 2010

Nested Sampling Algorithm (John Skilling)

Nested Sampling was developed by John Skilling (http://www.inference.phy.cam.ac.uk/bayesys/box/nested.pdf // http://ba.stat.cmu.edu/journal/2006/vol01/issue04/skilling.pdf).

Nested Sampling is a modified Markov chain Monte Carlo algorithm which can be used to explore the posterior probability for a given model. The power of the Nested Sampling algorithm lies in the fact that it is designed to compute both the posterior parameter estimates and the evidence. The algorithm is initialized by randomly taking samples from the prior. It then contracts the distribution of samples around high-likelihood regions by discarding the sample with the least likelihood, Lworst.
To keep the number of samples constant, another sample is chosen at random and duplicated. The duplicate is then randomized by taking Markov chain Monte Carlo steps subject to a hard constraint: a move is accepted only if the new likelihood is greater than the current threshold, L > Lworst. This ensures that the samples remain uniformly distributed over the prior within the likelihood constraint, and that new samples have likelihoods greater than the current threshold. The process is iterated until convergence. The logarithm of the evidence is given by the area under the sorted log likelihood as a function of prior mass. When the algorithm has converged, one can compute the mean parameter values as well as the log evidence.
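As a rough illustration of the loop described above, here is a minimal Python sketch; the toy problem (a uniform prior on [-5, 5] with a unit Gaussian likelihood), the random-walk step size, and the fixed iteration budget are assumptions made for illustration, not Skilling's reference code.

import math
import random

N = 100            # number of live samples drawn from the prior
MCMC_STEPS = 20    # steps used to randomize the duplicated sample
ITERATIONS = 1000  # a fixed iteration budget stands in for a convergence test

def log_likelihood(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)  # unit Gaussian likelihood

def prior_draw():
    return random.uniform(-5.0, 5.0)  # uniform prior on [-5, 5]

live = [prior_draw() for _ in range(N)]     # initialize samples from the prior
log_L = [log_likelihood(x) for x in live]

log_Z = -math.inf                                # log evidence accumulator
log_width = math.log(1.0 - math.exp(-1.0 / N))   # first shell of prior mass

for i in range(ITERATIONS):
    # discard the sample with the least likelihood, Lworst
    worst = min(range(N), key=lambda k: log_L[k])
    log_Lworst = log_L[worst]

    # accumulate the evidence: Z += Lworst * (shell of prior mass)
    term = log_Lworst + log_width
    log_Z = max(log_Z, term) + math.log1p(math.exp(-abs(log_Z - term)))
    log_width -= 1.0 / N   # the enclosed prior mass shrinks by ~exp(-1/N) per step

    # duplicate a random surviving sample and randomize it by constrained MCMC:
    # a move is accepted only if the new likelihood exceeds Lworst
    x = live[random.choice([k for k in range(N) if k != worst])]
    for _ in range(MCMC_STEPS):
        trial = x + random.gauss(0.0, 0.5)
        if -5.0 <= trial <= 5.0 and log_likelihood(trial) > log_Lworst:
            x = trial
    live[worst], log_L[worst] = x, log_likelihood(x)

print("log evidence ~", log_Z)  # analytic answer for this toy problem: log(1/10) ~ -2.3

In a real application the constrained exploration and the stopping criterion need more care, and the remaining live samples contribute a final correction to the evidence, which this sketch omits.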
For a nice description of Nested Sampling, the book by Sivia and Skilling is highly recommended: Data Analysis: A Bayesian Tutorial.
Code in C, Python, and R, with an example lighthouse problem, is available at:
http://www.inference.phy.cam.ac.uk/bayesys/
The paper is available at:
http://www.inference.phy.cam.ac.uk/bayesys/nest.ps.gz

Wednesday, February 13, 2008

Bayesian Spam filtering

This is from the blog: Time!

http://ajabgajab.blogspot.com/2007/07/bayesian-spam-filtering.html



Imagine a situation: you are receiving more than a hundred emails. You have to read each of them and classify it as good or bad for your BOSS.
Stressed? It seems like a silly question because, nowadays, that is not the real situation, right?
You see that there are two folders, Inbox and Bulk (spam), and one more called Trash.
Life is so easy!
Still, some spam makes its way through.
How is that possible?

If you are receiving little spam, it is the boon of Bayesian spam filtering.
It works by learning.
Exactly as we would if we faced the situation described above.
We would start classifying the mails according to their contents and some keywords provided by the boss. If confused by some new situation, we would feel free to ask the boss. Subconsciously we would be attributing a spam coefficient to each mail, and finally deciding whether the email is spam or not.
Therefore there will be some training data (let's say) to begin with. Each time we classify an email, we become more expert at deciding whether a mail is spam or not. After gaining enough expertise, there are no spams left for your BOSS to read (sounds ambitious). He is also happy that he has to train you less and less to classify the incoming mail, as you gain expertise in the environment.
At the same time, you are smart enough not to mark a mail as spam just because it contains a few keywords used to mark spam. It is the overall mail that affects your decision. Am I right?
Suppose you change offices, say from management to health. The nature of the mail is very different. For example, the term “pills” may not indicate spam anymore, while a very good proposal, “invitation to join business partnership from Africa”, is likely to be spam. If you keep marking mails with the training gained in the previous office, you are in trouble!
It would be better for your boss to read all the emails (including spam) himself than to lose a single (but important) mail.
The advantage of Bayesian spam filtering is that it gets customized to each user: the coefficient of spamness differs from user to user.
Now watch the situation through the eyes of a spammer! You will clearly see the difficulty of getting spam into the mailbox. You would be forced to think!
HOW TO SPAM? Some people just cannot sleep without spamming.
Because even if you are able to get through, the trick you found will work only once; there is no next chance through the same door. Once it is marked as spam (training), that trick will not work the next time.
Learning makes it possible.
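As a toy sketch of that learning process, here is a small naive Bayes word-count filter in Python; the class name, the scoring details (Laplace smoothing), and the tiny training mails are all made up for illustration.

from collections import Counter
import math

class NaiveBayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}  # word counts per class
        self.mails = {"spam": 0, "ham": 0}                   # mails seen per class

    def train(self, text, label):
        """Learn from one mail that the user has marked as 'spam' or 'ham'."""
        self.mails[label] += 1
        self.counts[label].update(text.lower().split())

    def spam_probability(self, text):
        """P(spam | words) by Bayes' theorem, with a naive word-independence assumption."""
        words = text.lower().split()
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total_mails = sum(self.mails.values())
        log_post = {}
        for label in ("spam", "ham"):
            total_words = sum(self.counts[label].values())
            log_post[label] = math.log(self.mails[label] / total_mails)
            for w in words:
                # Laplace smoothing so an unseen word does not zero out the posterior
                p = (self.counts[label][w] + 1) / (total_words + vocab)
                log_post[label] += math.log(p)
        # normalize the two posteriors
        m = max(log_post.values())
        z = sum(math.exp(v - m) for v in log_post.values())
        return math.exp(log_post["spam"] - m) / z

f = NaiveBayesFilter()
f.train("cheap pills buy now", "spam")
f.train("invitation to join business partnership from africa", "spam")
f.train("meeting agenda for monday", "ham")
f.train("pills prescription from the clinic", "ham")  # in a health office, "pills" can be normal
print(f.spam_probability("buy cheap pills now"))       # scored as likely spam
print(f.spam_probability("monday clinic meeting"))     # scored as likely ham

Retraining for a new office, as in the “pills” example above, simply means feeding the filter newly labelled mails; the spam score is then customized to that user.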
Useful readings:
 
I am highly inspired by:
http://www.paulgraham.com/spam.html
and listening to Prof. Kevin Knuth, Prof. Carlos Rodriguez, Adom Giffin and Roger Pink.
Recommended texts:
Data Analysis: A Bayesian Tutorial
Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians (Chapman & Hall/CRC Texts in Statistical Science)
Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica® Support