Since I first saw mention of it by Jose Solorzano (in his prospect#180, now closed at https://www.kaggle.com/c/cir-prospect, I guess?), it has been clear that “against” contributions would be a critical feature to capture. The FEC’s descriptions at http://www.fec.gov/finance/disclosure/metadata/DataDictionaryTransactionTypeCodes.shtml list 24A and 24N as explicitly “against.”
The fact that these seem to be in constant flux (cf. http://www.fec.gov/blog/disclosure/entry/four_4_new_transaction_type, 20 Aug 12) doesn’t inspire confidence, either!
Brandon, I see you’ve included 22Y as well; why is that?
Another thing I noticed is that many of the amount fields are negative?! These seem to be concentrated in particular transaction types:
PType   Total $Amt
24E     $269209
24A     $27756
24C     $416764
24K     $6468806
24Z     $3595
(See lines 540-543 in pyfec.py for details.) So, for now, my code focuses on transaction types 24A and 24N, and also treats negative amounts as “against.” Can we do better?
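To make the rule concrete, here is a minimal sketch of it in Python. This is not the actual code from pyfec.py; the names AGAINST_TYPES, is_against and negative_totals are made up for illustration, and it assumes each record is just a (transaction type, amount) pair.

```python
from collections import defaultdict

# Transaction type codes the FEC data dictionary lists as explicitly "against".
AGAINST_TYPES = {"24A", "24N"}


def is_against(trans_type, amount):
    """Return True if a record counts as "against" under the rule in the post.

    Assumed rule: the type is 24A or 24N, or the amount is negative.
    """
    return trans_type in AGAINST_TYPES or amount < 0


def negative_totals(records):
    """Sum only the negative amounts, grouped by transaction type.

    `records` is an iterable of (trans_type, amount) pairs (hypothetical layout).
    """
    totals = defaultdict(float)
    for trans_type, amount in records:
        if amount < 0:
            totals[trans_type] += amount
    return dict(totals)
```

With something like negative_totals, a table of negative amounts per transaction type can be rebuilt from the raw records, which would make it easier to check whether the negative-amount heuristic is pulling in types (refunds, for example) that are not really “against.”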
Some of the candidates died, dropped out, or for some other reason refunded contributions (22Y, “CONTRIBUTION REFUND TO INDIVIDUAL,” and 22Z, “CONTRIBUTION REFUND TO CANDIDATE/COMMITTEE”). I was only interested in seeing who backed a candidate and didn’t really care about the money flow after the initial donation. The loans were weird territory that I probably should have eliminated, but there are so few of them that they really sink to the bottom of the data. The majority of contributions are type 15. There are a lot of earmarked donations, too, which could be interesting to look at.