SpamSieve (SS) showing more false negatives

Hey,

I’ve begun to see more false positives with SS. Here is one I got today:


Return-Path: <blahblah>
Delivered-To: blahblah
Received: (qmail 11460 invoked by uid 89); 2 May 2007 16:45:42 -0000
Mailing-List: contact blahblah; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Post: <mailto:blahblah>
List-Help: <mailto:blahblah>
List-Unsubscribe: <mailto:blahblah>
List-Subscribe: <mailto:blahblah>
Delivered-To: mailing list blahblah
Received: (qmail 11453 invoked by uid 0); 2 May 2007 16:45:42 -0000
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on yt.23i.net
X-Spam-Level: *
X-Spam-Status: No, score=1.2 required=5.0 tests=BAYES_50,HTML_90_100,
	HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY autolearn=no version=3.1.8
X-Originating-IP: [502.506.16.83] 
X-Originating-Email: [blahblah] 
X-Sender: blahblah
Message-Id: <20070502090131.6682.qmail@poisoningness.furor.volia.net>
To: <blahblah>
From: Natasha <blahblah>
MIME-Version: 1.0
Importance: High
Content-Type: text/html
Subject: [systems] Are you ready for a change in the way you date?

<style>

/Send/Fabulous/neyvxl/north/BHG/Hypertension/attach/customer/recently/1171/Others/telecommunications/Gora/milestone/currently/1408/Readings/fiber<83>..they/public/things/annat/choice/

/4x6/message/oigf3/has/630/PFS/period/Sarah/bombs/bannerwaving/outfallscould/Anhang/1923/Portable/appeals/forecast/4320/CenterSecurity/BHG/Stationary/800F/Fibromyalgia/expedite/

/Bebo/E<83>..04/update/doublefaced/4320/modifier/<83>..junk/range/Meeting/frequent/journalism/gusto/faulty/expectancy/tres/preferences/sells/

/Meeting/Herbs/harp/both/Gingrich/communist/prosperity/<83>..All/2<83>..40/coordinated/5000<83>..9999/car/1075/Andrea/pouvons/inactivate/1980<83>..1985/Jones/Judith/ishow/Booking/rely/clip/GL2/PFS/gather/bike/Bulletins/10<83>..100/Warm/Hayes/

/forward/7000/Vietnamese/outcomes/chair/Newsweek/gear/BHG/paint/Shalom/thus/duplicated/clear/curtain/base/killings/Andrew/kttpjvpzi/Select/horizontal/Cello/Bangor/Purdue/slink/Fish/1/18<83>..4/scarred/

/vfrtyk/information<83>..as/rmxprzkf/Buys/Banks/compact/engines/ratings/Shanghai/Hoop/itin/mzibcwnyhs/portrays/uiteindelijk/nerves/twoyear/

/fishwater/practices/oigf3/Mexico/<83>..junk/voor/running/Math/Note/Members<83>..two/ASAP/Members/343352/Watch/considered/all/Gill/SPS/beurt/close/SANS/fiber<83>..they/1012/1014/165166/raquo/Judy/operating/Convention/Marc../graphics/

/Nix/<83>..delete/bleeding/asked/closer/Oy/155<83>..157/email/seguros/forces/Iranian/mejores/<83>..brain/growing/10<83>..20/Another/penalties/1440/wings/klikken/kicked/students/recycling/storiesit/founding/

/PFS/reat/Phoenix/Cerro/coldstress/invested/All/wallpaper/copies/81<83>..87/27<83>..29/tightened/Jamie/Signup/Pensacola/Pinnacle/1440/favorite/Miriam/delayed/faulty/roles/konnen/sign/betaalde/fishwater/

/hike/tract/May<83>..July/do<83>..on/273/demand/1428/stability/343352/1030/15/confiance/kljmn/andern/update/debated/Stephanie/630/again<83>..not/kljmn/

/hurt/2<83>..10/Virginia/bgcolor/Associated/271279/kljmn/Contacts/12<83>..28/E<83>..03/UPDATING/Booking/thwarted/lowpower/PFS/strategies/founding/digital/1980/pullout/desktop/monetary/lifetime/it<83>..s/argue/on/kunna/Maiden/populations/changes/
</style><a target="_blank"  href="http://xav.ho.com.ua/images/index.htm" ><img src="http://caravela.md/images/1.jpg">
<style>
/9394/Wolf/Grieving/rose/eunzdf/16th/Fibromyalgia/<83>..<83>..live/recycling/MICHELINE/Servicio/Cameroon/chances/mailreleasedate/delayed/Transform/577/realization/Schilling/currently/mailversion/in/Introduction/..ndern/

/Prefs/sie/PFS/masculine/oigf3/dcloapqppokz/opening<83>..in/surrounded/3032/again<83>..not/actions/repository/trading/bulbs/20013/pointed/decisionmaking/11<83>..13/<83>..growing/Font/19<83>..28/newsid/if/besonders/oigf3/explanation/Janet/100<83>..10/countriesfrom/

/Wolf/headersender/bully/savor/pouvez/rum/2<83>..40/furniture/oigf3/beautiful/ASAP/choice/James/candidacy/Sharpe/uses/headersender/information/outofbody/stout/delayed/

/digest/Prefs/treat/programs/oigf3/Stones/1957<83>..2007/whateverwhen/361<83>..372/Technica/PFS/Stephanie/likely/deliverlockattempt/mailowner/trampoline/kommunikation/PFS/begin/cover/From/ahuuonn/calms/portrays/

/Enter/Applied/events/cdpwgemaqzqt/Cold/4314/explaining/favorite/15<83>..19/login/hasn/recently/detailor/realizar/winstonsalem/push/uses/

/indulgences/682/qvwihlxot/author/instant/count/againnot/roles/laqjzfv/oigf3/397<83>..411/15<83>..29/comprendre/Weather/colony/initials/Saint/yrynvwq/HOPE/Organiser/figure/healthy/used/citrus/niveaux/Automakers/Related/yearsin/

/Byrd/3<83>..30/example<83>..or/available/sustainable/planning<83>..has/Bill/joined/glad/Speculative/waarom/AIDS/treat/3438/64<83>..66/1<83>..14/Harris/colon/4041/Pacing/dwellinghouse/harp/Raid/ElizabethHurley/topic/update/Fairmont/plastic/butoddlynot/

/privileged/sunny/Brushed/152<83>..155/kljmn/extending/..ver/Bangor/1950s/ffuyhlp/populationthose/Cameroon/mbrxcftexmf/teenage/1058/speech/frontpage/differentit/delete/BHG/radio/Southey/logo/downloaden/Cookies/beste/Paulo/Grenadines/

/regards/dialectsin/Schilling/Snack/PFS/attempt/during/Losses/bumping/34<83>..38/option/200300/clicking/Scottsdale/skull/landing/regime/penalties/

/acclaimed/Richmond/change/alert/zpxii/performances/catastrophic/vxavgkweve/standing/BHG/oigf3/makes/tricky/weekdays/QUAD/denen/strategically/1923/Source/<83>..American/topical/transform/topnotch/SETTINGS/acclaimed/besonders/93<83>..94/masculine/colon/

/inside/James/RealMedia/pouvons/ALERTS/dayofweek/environmentalists/MayJuly/Jeanette/worstcase/clear/fouryearolds/slim/SETTINGS/participate/stew<83>..at/controversy/

/Did/coercive/Wool/could/10<83>..1<83>..101/semana/instant/2007/stripes/Peter/49<83>..60/flowed/pose/irresistible/baby/coastal/color/eyccoozt/Dich/Santos/classic/hbnbgxige/stripes/refills/secure/amnesty/

/Damages/Simone/fitting/headeraddresstoken/barely/Robert/Strom/agonizing/Signup/1415/line/8000/subsidies/ACC/crit/SANS/greatbut/chart/applies/2009/vauvbh/

/300/Kharkov/antibiotics/12<83>..14/production/vacation/ANDERSSON/mucha/Unsere/front/Dow/Nathan/savor/Members<83>..two/simulator/104/evening/1428/issues/coercive/reimbursement/kljmn/Spanish/bully/food/Top/Cameroon/mysterious/Grieving/

/40<83>..41/Interestingly/Grenadines/yearfemale/experiences/automatically/participate/datevalue/PFS/seating/outside/simulator/candidacy/subscribed/Source/Lemons/Wednesday/wings/expanding/Caradoc/Jobs/wyrzr/PFS/written/oigf3/itmay/bill/peers/aqwtoxowl/color/

/variety/Uzbekistan/E04/Jeanette/BHG/bike/collapsed/80s/Grill/vous/itbut/Brothers/15100/lot/carrots/mailreleasedate/BHG/dyes/

/week/predators/xxraq/slight/seamless/Dirk/Safety/warmwater/tire/currency/bill<83>..how/80s/index03/4x6/delete/<83>..Meeting/1030/magazine<83>..this/till/besonders/Nassau<83>..it/

/break/100<83>../Ph/lastModified/teenage/281<83>..287/Tim/161162/serif/exits/Golden/meeting/kicked/PFS/Toby/PFS/

/classic/10<83>..20/Cardmember/pointed/automatisch/whistler/Brushed/higher/Nassau<83>..it/springhouse/eopeywe/available/Send/175<83>..185/Xeon/link/1415/great/go/surgical/

/4300/rdquo/20013/applying/1027/1969/experiences/HERMAN/New/sie/sonic/gamespot/sonic/protection/uses/it<83>..in/different<83>..it/stripes/limited/surged/systematic/Searching/trading/Wolf/hooks/American/track<83>../separate/

/Watch/Associated/commercials/Choosing/images<83>..all/brothers/alcohol/286140/BHG/closer/widget<83>..from/pouvez/161<83>..162/397411/mapscould/layer/20<83>..100/

/newest/1870/repeatedly/Maidens/dining/1041/landmark/Consider/100<83>..200/millimeterabout/dknjincivsmx/Easton/Home/Huber/Natasha/2<83>..1/Metro/Grieving/kljmn/home/oigf3/Pacing/166167/1075/bulk/10<83>..30/substring/

</style>

Here are my stats:


Filtered Mail
51,540 Good Messages
12,800 Spam Messages (20%)
120 Spam Messages Per Day

SpamSieve Accuracy
91 False Positives
53 False Negatives (37%)
99.8% Correct

Corpus
3,077 Good Messages
16,289 Spam Messages (84%)
424,382 Total Words

Rules
22,432 Blocklist Rules
1,501 Whitelist Rules

Showing Statistics Since
16/01/2007 03:48

(The corpus is so big because I had 16K spam emails ready to feed into it ;))

Not sure what more info I can give you right away, but it would definitely be good to cut out these FPs; they’re beginning to destroy my confidence in SS ;)
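The percentages in the statistics above can be reproduced from the raw counts. A minimal Python sketch, assuming "Correct" means the share of all filtered messages that were not mistakes and that the 37% beside the false negatives is their share of all mistakes (91 + 53); SpamSieve’s exact definitions aren’t stated in the thread, but these assumptions reproduce every percentage shown:

```python
def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Raw counts copied from the statistics window above.
filtered_good, filtered_spam = 51_540, 12_800
false_pos, false_neg = 91, 53
corpus_good, corpus_spam = 3_077, 16_289

total = filtered_good + filtered_spam  # every message SpamSieve has filtered
mistakes = false_pos + false_neg       # assumption: the 37% is relative to this

stats = {
    "spam share of filtered mail": pct(filtered_spam, total),               # 19.9
    "false negatives as share of mistakes": pct(false_neg, mistakes),       # 36.8
    "correct": pct(total - mistakes, total),                                # 99.8
    "spam share of corpus": pct(corpus_spam, corpus_good + corpus_spam),    # 84.1
}
for label, value in stats.items():
    print(f"{label}: {value}%")
```

On these figures the filtered mail runs at about 20% spam while the corpus is about 84% spam, the imbalance raised in Michael’s reply.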

I would be interested to see the “Predicted: Spam” entry for this message from the log, so that I can see why SpamSieve thought it was spam. More generally, having such a large corpus with 84% spam is probably not good for SpamSieve’s accuracy. You could try resetting SpamSieve’s corpus and re-training it with the recommended number and ratio of messages.

Michael,

I meant false negative, sorry: it didn’t get labelled as spam. I appreciate that the corpus is large; however, until now I’ve very rarely had spam in my inbox.

Here’s the relevant part of the log:


Predicted: Good (1)
Subject: [systems] Are you ready for a change in the way you date?
From: Natasha <systems@23i.net>
Identifier: shYIRFej3+mGY9MTNaAGSg==
Reason: ("systems@23i.net") matched rule <From (address) Is Equal to "systems@23i.net"> in SpamSieve whitelist
Date: 2007-05-02 17:01:35 +0100
=====================================================================
Trained: Good (Auto)
Subject: [systems] Are you ready for a change in the way you date?
Identifier: shYIRFej3+mGY9MTNaAGSg==
Actions: added to Good corpus (3077)
Date: 2007-05-02 17:01:35 +0100
=====================================================================
Trained: Spam (Manual)
Subject: [systems] Are you ready for a change in the way you date?
Identifier: shYIRFej3+mGY9MTNaAGSg==
Actions: disabled rule <From (address) Is Equal to "systems@23i.net"> in SpamSieve whitelist, added to Spam corpus (16289), removed from Good corpus (3076)
Date: 2007-05-02 17:02:09 +0100
=====================================================================
Mistake: False Negative
Subject: [systems] Are you ready for a change in the way you date?
Identifier: shYIRFej3+mGY9MTNaAGSg==
Classifier: Whitelist
Score: 1
Date: 2007-05-02 17:02:14 +0100

So I see that problem is resolved (training the message as spam disabled the whitelist rule for the list address). I’ll keep an eye on it, though.
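The log entries above show a whitelist rule on the list address short-circuiting the Bayesian classifier ("Classifier: Whitelist"), and manual spam training disabling that rule rather than deleting it. The Python sketch below is a simplified, hypothetical model of that sequence, inferred only from the log’s "Reason" and "Actions" lines. The class and method names are illustrative, it tracks address rules only (the real log also adds "From (name)" rules), and it is not SpamSieve’s implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str      # the From (address)
    identifier: str  # the log's "Identifier" field


@dataclass
class HypotheticalFilter:
    # Illustrative whitelist: sender address -> is the rule enabled?
    whitelist: dict[str, bool] = field(default_factory=dict)
    good_corpus: set[str] = field(default_factory=set)
    spam_corpus: set[str] = field(default_factory=set)

    def predict(self, msg: Message, bayes_says_spam: bool) -> tuple[str, str]:
        """Return (verdict, classifier). An enabled whitelist rule wins outright."""
        if self.whitelist.get(msg.sender, False):
            return "Good", "Whitelist"
        return ("Spam" if bayes_says_spam else "Good"), "Bayes"

    def train_good_auto(self, msg: Message) -> None:
        """Auto-training as good: add to the good corpus and whitelist the sender."""
        self.good_corpus.add(msg.identifier)
        self.whitelist.setdefault(msg.sender, True)  # an existing rule keeps its state

    def train_spam_manual(self, msg: Message) -> None:
        """Manual spam training: disable (not delete) the rule and move the message."""
        if msg.sender in self.whitelist:
            self.whitelist[msg.sender] = False
        self.good_corpus.discard(msg.identifier)
        self.spam_corpus.add(msg.identifier)


f = HypotheticalFilter(whitelist={"systems@23i.net": True})
msg = Message("systems@23i.net", "shYIRFej3+mGY9MTNaAGSg==")
print(f.predict(msg, bayes_says_spam=True))  # ('Good', 'Whitelist')
f.train_good_auto(msg)
f.train_spam_manual(msg)
print(f.predict(msg, bayes_says_spam=True))  # ('Spam', 'Bayes')
```

Because the list appears to deliver every post From systems@23i.net (this spam arrived as "Natasha <systems@23i.net>"), one whitelist rule on that address lets any spam posted to the list through; with the rule disabled, list mail is scored by the Bayesian classifier again.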

Some further bad false negatives:


Predicted: Good (27)
Subject: Important News!!! 'Rates-below 4.7%
From: Christfam@resultcompute.net
Identifier: HcwUKc8cLg+BW1uQz0sVyA==
Reason: P(spam)=0.052[0.498], bias=0.000, R:^212^158(0.001), ^fg-585858(0.998), ^a-width-655(0.998), ^a-width-655(0.998), S:4.7%(0.005), X:URIBL-SBL(0.994), ^comment-2(0.041), X:HTML-IMAGE-ONLY-24(0.957), viewing(0.056), ^a-height-51(0.059), U:^long-G-30(0.073), U:gif(0.126), U:gif(0.126), properly(0.126), I:gif(0.128)
Date: 2007-05-10 17:28:03 +0100
=====================================================================
Trained: Good (Auto)
Subject: Important News!!! 'Rates-below 4.7%
Identifier: HcwUKc8cLg+BW1uQz0sVyA==
Actions: added rule <From (address) Is Equal to "Christfam@resultcompute.net"> to SpamSieve whitelist, added rule <From (name) Is Equal to "Christian_L 0 A N S!"> to SpamSieve whitelist, added to Good corpus (5056)
Date: 2007-05-10 17:28:03 +0100
=====================================================================
Predicted: Good (25)
Subject: Find out more about One of the most exciting projects in the Caribbean
From: DominicanLandSales@solutionchild.com
Identifier: BIhlWLGY0uvOaMzGGvg6pQ==
Reason: P(spam)=0.000[0.462], bias=0.000, F:Land(0.999), F:Sales(0.002), R:^212^24(0.998), U:x57pTRd(0.005), U:Jgy71tU(0.005), I:AweR7gfv(0.005), U:Jgy71tU(0.005), I:Jgy71tU(0.005), I:Jgy71tU(0.005), I:x57pTRd(0.005), U:AweR7gfv(0.005), U:nmdsnmqq(0.005), X:URIBL-SBL(0.993), ^comment-2(0.041), X:HTML-IMAGE-ONLY-24(0.953)
Date: 2007-05-10 17:42:11 +0100
=====================================================================
Trained: Good (Auto)
Subject: Find out more about One of the most exciting projects in the Caribbean
Identifier: BIhlWLGY0uvOaMzGGvg6pQ==
Actions: added rule <From (address) Is Equal to "DominicanLandSales@solutionchild.com"> to SpamSieve whitelist, added rule <From (name) Is Equal to "Dominican Land Sales"> to SpamSieve whitelist, added to Good corpus (5057)
Date: 2007-05-10 17:42:11 +0100
=====================================================================
Predicted: Good (26)
Subject: Schedule Your No Cost LASIK Vision Exam Today
From: LasikPlusVisionCenters@solutionfresh.net
Identifier: SpuHhmUVY0ZDTnKA4H0qxA==
Reason: P(spam)=0.000[0.484], bias=0.000, S:Cost(0.999), R:^212^158(0.001), I:AweR7gfv(0.002), I:x57pTRd(0.002), U:Jgy71tU(0.002), U:Jgy71tU(0.002), U:x57pTRd(0.002), U:nmdsnmqq(0.002), U:AweR7gfv(0.002), F:Doctor(0.002), I:Jgy71tU(0.002), I:Jgy71tU(0.002), S:Vision(0.998), X:URIBL-SBL(0.993), X:HTML-SHORT-LINK-IMG-3(0.989)
Date: 2007-05-10 17:42:12 +0100
=====================================================================
Trained: Good (Auto)
Subject: Schedule Your No Cost LASIK Vision Exam Today
Identifier: SpuHhmUVY0ZDTnKA4H0qxA==
Actions: added rule <From (address) Is Equal to "LasikPlusVisionCenters@solutionfresh.net"> to SpamSieve whitelist, added rule <From (name) Is Equal to "Eye Doctor"> to SpamSieve whitelist, added to Good corpus (5058)
Date: 2007-05-10 17:42:12 +0100
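Each "Reason" line lists per-token spam probabilities behind the verdict. SpamSieve’s actual combining formula is not shown in the log, so the Python sketch below substitutes Paul Graham’s rule from "A Plan for Spam", P = ∏p / (∏p + ∏(1 − p)), applied to the 15 tokens of the LASIK entry. It is only a stand-in and ignores the bracketed value and the bias, but it illustrates how the nine tokens taken from the sender’s own image and link paths (Jgy71tU, AweR7gfv, x57pTRd, nmdsnmqq), each scored 0.002, can outweigh strong spam evidence such as X:URIBL-SBL(0.993):

```python
import math

# Token probabilities copied from the "Predicted: Good (26)" LASIK log entry.
LASIK_TOKENS = [
    ("S:Cost", 0.999), ("R:^212^158", 0.001), ("I:AweR7gfv", 0.002),
    ("I:x57pTRd", 0.002), ("U:Jgy71tU", 0.002), ("U:Jgy71tU", 0.002),
    ("U:x57pTRd", 0.002), ("U:nmdsnmqq", 0.002), ("U:AweR7gfv", 0.002),
    ("F:Doctor", 0.002), ("I:Jgy71tU", 0.002), ("I:Jgy71tU", 0.002),
    ("S:Vision", 0.998), ("X:URIBL-SBL", 0.993), ("X:HTML-SHORT-LINK-IMG-3", 0.989),
]

# Path fragments that appear in the spammer's own image and link URLs.
URL_FRAGMENTS = {"Jgy71tU", "AweR7gfv", "x57pTRd", "nmdsnmqq"}


def combine(probs):
    """Graham-style P = prod(p) / (prod(p) + prod(1 - p)), in log space to avoid underflow."""
    d = sum(math.log(p) - math.log(1.0 - p) for p in probs)  # total log odds
    # Numerically stable logistic: P = 1 / (1 + exp(-d))
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def is_url_fragment(token):
    prefix, _, word = token.partition(":")
    return prefix in ("U", "I") and word in URL_FRAGMENTS


all_probs = [p for _, p in LASIK_TOKENS]
kept = [p for t, p in LASIK_TOKENS if not is_url_fragment(t)]
print(f"all 15 tokens:         P(spam) = {combine(all_probs):.3g}")
print(f"without URL fragments: P(spam) = {combine(kept):.5f}")
```

With all 15 tokens the stand-in gives a probability close to zero; with the nine URL fragments removed it rises above 0.99. The fragments also drift further toward "good" as spam from the same sender is auto-trained as good: U:Jgy71tU is 0.005 in the Dominican Land Sales entry and 0.002 in the LASIK entry one second later, which is consistent with that feedback loop.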


Raw source of the LASIK email:


Return-Path: <LasikPlusVisionCenters@solutionfresh.net>
Delivered-To: me
Received: (qmail 6929 invoked by uid 0); 10 May 2007 17:21:03 -0000
Received: by simscan 1.2.0 ppid: 6924, pid: 6925, t: 1.6763s
         scanners: clamav: 0.88.2/m:39/d:1524 spam: 3.0.6
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on yt.23i.net
X-Spam-Level: ***
X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_20,FORGED_RCVD_HELO,
	HTML_90_100,HTML_IMAGE_ONLY_20,HTML_IMAGE_RATIO_02,HTML_MESSAGE,
	HTML_SHORT_LINK_IMG_3,MIME_HTML_ONLY,URIBL_SBL autolearn=no version=3.1.8
Received: from unknown (HELO solutionfresh.net) (212.158.162.172)
  by mail.23i.net with SMTP; 10 May 2007 17:21:01 -0000
Return-path: <LasikPlusVisionCenters@solutionfresh.net>
Received: by solutionfresh.net (Postfix from userid 17071)
    id CD8F446AD4084; 10 May 2007 12:35:36 -0400
Errors-to: LasikPlusVisionCenters@solutionfresh.net
Message-Id: <20070510123536.CD8F446AD4084@solutionfresh.net>
Date: Thu, 10 May 2007 12:35:36 -0400
From: "Eye Doctor" <LasikPlusVisionCenters@solutionfresh.net>
To: <james@imajes.info>
Precedence: normal
Subject: Schedule Your No Cost LASIK Vision Exam Today
Mime-Version: 1.0
Content-Type: text/html; charset="iso-8859-1"

<html>
<head>
<meta HTTP-EQUIV="Content-Type" content="text/html; charset=iso-8859-1">
<title>Email Message</title>
</head>
<body leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
<IMG SRC="http://solutionfresh.net/Jgy71tU/AweR7gfv.gif" WIDTH="655" HEIGHT="24" BORDER="0" ALT="">
<FONT face="Verdana, Arial, Helvetica, sans-serif" size=1 color="585858">
<p>
Having trouble viewing this email properly, <a href="http://solutionfresh.net/nfospal?BJdUpdbRCFQQNc1QMy4QWH5BKOGUzkpBu30VlHCEilJB"target="_blank">please click here.</a>
</p>

<center>
<TABLE cellSpacing=0 cellPadding=0 align=center border=0>
   <TR>
    <TD><A href="http://solutionfresh.net/nfospal?BJdUpdbRCFQQNc1QMy4QWH5BKOGUzkpBu30VlHCEilJB"><IMG  src="http://solutionfresh.net/ghbvIERMDbdf357dfaj_df7UVD/hfKJGSD354zsdf_ksad1KSDFH.gif"   border=0></A></TD>
   </TR>
  <TR>
    <TD><A href="http://solutionfresh.net/nfospal?BJdUpdbRCFQQNc1QMy4QWH5BKOGUzkpBu30VlHCEilJB"><IMG  src="http://solutionfresh.net/ghbvIERMDbdf357dfaj_df7UVD/hfKJGSD354zsdf_ksad2KSDFH.gif"   border=0></A></TD>
   </TR>
  <TR>
    <TD><A href="http://solutionfresh.net/nfospal?LMtUcrvQmjRAP08E30WAUNAQmMyApRCB5tgFaqaEmZnA"><IMG  src="http://solutionfresh.net/ghbvIERMDbdf357dfaj_df7UVD/hfKJGSD354zsdf_ksad3KSDFH.gif"   border=0></A></TD>
   </TR>
  </TABLE>
</center>
<br>

<ul style="list-style-image:url(http://solutionfresh.net/mfsdmab?BJdUpdbRCFQQNc1QMy4QWH5BKOGUzkpBu30VlHCEilJB);color:white"><li></ul>
<LINK href="http://solutionfresh.net/mesapsd?BJdUpdbRCFQQNc1QMy4QWH5BKOGUzkpBu30VlHCEilJB" type=text/css rel=STYLESHEET>
<P>
<A href="http://solutionfresh.net/nmdsnmqq?BJdUpdbRCFQQNc1QMy4QWH5BKOGUzkpBu30VlHCEilJB"><IMG SRC="http://solutionfresh.net/Jgy71tU/x57pTRd.gif" WIDTH="655" HEIGHT="51" BORDER="0" ALT=""></a>
</p>
<!--James-->
</body>
</html>
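The U: and I: tokens in the LASIK log entry seem to come from the path segments of the URLs in this HTML: every Jgy71tU, AweR7gfv, x57pTRd and nmdsnmqq token matches a path component here, and the I: tokens only name image paths. The Python sketch below, built on the standard library’s `html.parser`, is a guess at that tokenization for illustration only; the U:/I: meaning is inferred from the log, not documented, and the class name is hypothetical. It emits U: tokens for every href/src path segment and additional I: tokens for `<img src>`:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class UrlPathTokenizer(HTMLParser):
    """Hypothetical tokenizer: U: for every href/src path segment, I: for <img src>."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so IMG SRC arrives as img/src.
        for name, value in attrs:
            if name not in ("href", "src") or not value:
                continue
            # Split the URL path (query string excluded) on "/" and "." into segments.
            segments = [s for s in re.split(r"[/.]", urlparse(value).path) if s]
            self.tokens.extend("U:" + s for s in segments)
            if tag == "img" and name == "src":
                self.tokens.extend("I:" + s for s in segments)


# The two full-width images and the final link from the message above.
SNIPPET = (
    '<IMG SRC="http://solutionfresh.net/Jgy71tU/AweR7gfv.gif" WIDTH="655" HEIGHT="24">'
    '<A href="http://solutionfresh.net/nmdsnmqq?BJdUpdbRCFQQNc1QMy4QWH5BKOGUzkpBu30VlHCEilJB">'
    '<IMG SRC="http://solutionfresh.net/Jgy71tU/x57pTRd.gif" WIDTH="655" HEIGHT="51"></a>'
)

tokenizer = UrlPathTokenizer()
tokenizer.feed(SNIPPET)
tokenizer.close()
print(tokenizer.tokens)
```

On this snippet it yields I:Jgy71tU and U:Jgy71tU twice each, plus U:nmdsnmqq with no I:nmdsnmqq, the same pattern as the LASIK log entry (and U:gif/I:gif, as in the Christfam entry). Since the Dominican Land Sales message from solutionchild.com carried the same fragments, each such message trained as good makes the next one look good too.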

I appreciate that this is pretty hard to catch other than with a blacklist entry, but still…

Thoughts?

Please try resetting SpamSieve’s corpus and re-training it. If you have further problems, please e-mail me the log and false negatives files.