Nabin K. Malakar, Ph.D.

NASA JPL
I am a computational physicist working on societal applications of machine-learning techniques.

Research Links

My research interests span multidisciplinary fields, including societal applications of machine learning, decision-theoretic approaches to automated experimental design, Bayesian statistical data analysis, and signal processing.

Linkedin


Interested in the picture? Autonomous experimental design allows us to answer the question of where to take the measurements. More about it is here...

Hobbies

In addition to research, I also like to hike, bike, read, and play with watercolor.

Thanks for the visit. Please feel free to visit my Weblogs.

Welcome to nabinkm.com. Please visit again.

Wednesday, February 27, 2008

Gmail Captcha Broken by Spammers!

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a challenge-response system that prevents automatic creation of accounts or automatic posting of messages. It requires a (human) user to correctly identify letters or digits shown in an image. Such tests are designed to ensure that requests are made by a human rather than by an automated program. The technique has been used to defeat automatic sign-ups for email accounts at services including Yahoo! Mail and Gmail, and it has been a nail-biting challenge for hackers.

Recently, I got the news that spammers have broken the system at Gmail. Success in cracking the Windows Live CAPTCHA used by Hotmail was also reported recently. If they keep succeeding, we will see a huge rise in spam. The main worry is that almost no spam blocker will identify and blacklist mail coming from these accounts as “spam”.

Internet security firm Websense reported that bots have been created that can sign up for and create random Gmail accounts for spamming purposes, defeating CAPTCHA-based defences in the process.

Websense considers the latest Gmail CAPTCHA hack to be the most sophisticated one it has seen to date. Whereas breaking the Live Mail CAPTCHA involved just one zombie host doing the entire job, the Gmail process involves two hosts: one to try, and another to monitor the success. The two compromised hosts apply slightly different techniques to analyse the CAPTCHA.

They have reported that only one in every five CAPTCHA-breaking attempts is successful. That seems low, but it adds up to a lot if we consider millions of automated attacks.
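To get a feel for the scale, here is a small back-of-the-envelope sketch in Python. Only the one-in-five success rate comes from the report; the attempt counts and the function name `expected_accounts` are made up for illustration.

```python
from fractions import Fraction

# Success rate quoted in the post: one in every five attempts.
SUCCESS_RATE = Fraction(1, 5)


def expected_accounts(attempts):
    """Expected number of accounts created from a given number of attempts."""
    return int(attempts * SUCCESS_RATE)


# Hypothetical attempt counts, chosen only to show how the numbers grow.
for attempts in (100, 10_000, 1_000_000):
    print(f"{attempts:>9,} attempts -> about {expected_accounts(attempts):,} accounts")
```

At this rate, a bot network that can afford a million attempts would end up with roughly two hundred thousand working accounts.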

Report:

http://www.websense.com/securitylabs/blog/blog.php?BlogID=174

CAPTCHA:

http://www.answers.com/captcha?cat=technology&gwp=13


Links:

http://www.codinghorror.com/blog/archives/001067.html


Wednesday, February 13, 2008

Bayesian Spam filtering

This is from the blog: Time!

http://ajabgajab.blogspot.com/2007/07/bayesian-spam-filtering.html



Imagine a situation: you receive more than a hundred emails. You have to read each of them and classify it as good or bad for your BOSS.
Stressed? It seems like a silly question because, nowadays, this is not the real situation, right?
You see that there are two folders: Inbox and Bulk (spam). And one more called Trash.
Life is so easy!
Even so, a few spam mails still make their way through.
How is that possible?

If you are receiving less spam, it is thanks to Bayesian spam filtering.
It works by learning.
It works exactly as we would if we faced the first situation.
We would start classifying the mails according to their contents and some keywords provided by the boss. If confused by some new situation, we would feel free to ask the boss. Subconsciously, we would attribute a spam coefficient to each mail and finally decide whether the email is spam or not.
So there will be some training data (let's say) to begin with. Each time we classify an email, we become more expert at deciding whether a mail is spam or not. After gaining enough expertise, there is no spam left for your BOSS to read (sounds ambitious). He is also happy that he has to train you less and less to classify the incoming mails, as you gain expertise in the environment.
At the same time, you are smart enough not to mark a mail as spam just because it contains some of the keywords used to mark spam. It is the overall mail that affects your decision. Am I right?
Suppose you change offices, say from management to health. The nature of the mails is very different. For example, the term “Pills” may not indicate spam anymore, while a very attractive proposal, “invitation to join business partnership from africa”, is likely to be spam. If you classify mail using the training gained in the previous office, you are in trouble!
It would be better for your boss to read all the emails (including spam) himself than to lose a single (but important) mail.
The advantage of Bayesian spam filtering is that it gets customized to the user, so the coefficient of spamness differs from user to user.
Now, look at the situation through the eyes of a spammer! You will clearly see how difficult it is to spam the mailbox. You would be forced to think!
HOW TO SPAM? Some people just cannot sleep without spamming.
Because even if you manage to get through, the way you found will work only once; there is no second chance through the same door. Once the mail is marked as spam (training), that trick will not work the next time.
Learning makes it possible.
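For the curious, the idea can be sketched in a few lines of Python. This is only a toy naive Bayes classifier with add-one (Laplace) smoothing, not the filter of any real mail service and not the exact scoring scheme from the essay linked below. The class name `NaiveBayesSpamFilter`, its methods, and the training mails are all invented for illustration.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy per-user spam filter: word counts plus Bayes' rule."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.mail_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        """Learn from one mail that the user (the boss) has labelled."""
        self.mail_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Return P(spam | words), combining the evidence from every word."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_mails = sum(self.mail_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: how often this user gives mail this label (smoothed).
            score = math.log((self.mail_counts[label] + 1) / (total_mails + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing so an unseen word never gives probability 0.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary) + 1))
            log_score[label] = score
        # Turn the two log scores into a normalized probability of spam.
        diff = log_score["ham"] - log_score["spam"]
        diff = max(min(diff, 700.0), -700.0)  # keep math.exp from overflowing
        return 1.0 / (1.0 + math.exp(diff))


# Two offices, two separately trained filters (made-up training mails).
management = NaiveBayesSpamFilter()
management.train("cheap pills buy pills now", "spam")
management.train("meeting agenda for monday", "ham")

health = NaiveBayesSpamFilter()
health.train("patient needs pills refill", "ham")
health.train("join business partnership offer", "spam")

mail = "pills refill"
print("management office:", round(management.spam_probability(mail), 2))
print("health office:    ", round(health.spam_probability(mail), 2))

# A trick that slips through once gets harder after the user marks it as spam.
trick = "partnership offer"
before = management.spam_probability(trick)
management.train(trick, "spam")
after = management.spam_probability(trick)
print("trick before/after training:", round(before, 2), round(after, 2))
```

The same mail about pills scores as likely spam in the management office and as likely good mail in the health office, because each filter has learned from its own user. The last lines show the door closing: once the trick mail is marked as spam, its score goes up the next time.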
Useful readings:
 
I am highly inspired by:
http://www.paulgraham.com/spam.html
and listening to Prof. Kevin Knuth, Prof. Carlos Rodriguez, Adom Giffin and Roger Pink.
Recommended texts:
Data Analysis: A Bayesian Tutorial
Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians (Chapman & Hall/CRC Texts in Statistical Science)
Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica® Support

Sunday, February 3, 2008

Wikipedia: A reliable source of information?

Do you think that Wikipedia is a reliable source of information?

Yes (74%)
No (23%)
Don't care (2%)

Total votes: 47.

This is the result of the poll conducted on my blog: Time! (http://ajabgajab.blogspot.com)

The key question here is: what kind of information do we need? And what do we mean by reliability?
Wikipedia is, of course, not a source of reliable news. Even some of the history articles may be biased.
I read some news about Wikipedia reporting two cases where people's deaths had been posted before they were actually killed. Similarly, when I was looking for news on Indian Idol, people would keep changing the names of the participants who had been voted out; I think they amused themselves by doing that.
When reliability is the key question, I would wait and see.
However, when we search, Wikipedia results show up. As for me, I do peek at the results behind the Wikipedia link.
Answers.com has been a nice place for me when I look for definitive answers. And most of the time I am looking for definitive results. There are, of course, Wikipedia results as well.