Wednesday, April 09, 2008

Daniel Day-Lewis, the top performance of 2007

The top 100+ performances on the big screen in 2007:

  1. Daniel Day-Lewis in "There Will Be Blood (2007/08)"
  2. Casey Affleck in "The Assassination of Jesse James by the Coward Robert Ford (2007)"
  3. Marion Cotillard in "La Vie en Rose (2007)"
  4. Carice van Houten in "Black Book (2006/07)"
  5. Anamaria Marinca in "4 Months, 3 Weeks, 2 Days (2007)"
  6. Brad Pitt in "The Assassination of Jesse James by the Coward Robert Ford (2007)"
  7. Javier Bardem in "No Country for Old Men (2007)"
  8. Ulrich Mühe in "The Lives of Others (2006/07)"
  9. Tang Wei in "Lust, Caution (2007)"
  10. Amy Ryan in "Gone Baby Gone (2007)"
Complete list...

Friday, April 04, 2008

4-state state machine for CSV parsing

Parsing a CSV file is easy: it's nothing but splitting a string on the comma delimiter, which can easily be done in Java... That was the first thing that came to my mind when I was about to parse a CSV file in Java. The reality is that the following examples are all possibly valid lines in a CSV file:
  • 1,Bender
  • 2,"Bender"
  • 3,"Bender, Bending"
  • 4,"Ben""d""er"
  • 5, Ben"der
  • 6, Ben""der
The last line might be arguable, but anyway, the two basic rules are:
  • If there's a comma in a field, wrap the field in double quotes; otherwise the double-quote wrapper isn't required.
  • Inside double quotes, a double quote is escaped with another double quote.
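
To see why plain splitting falls short, here is a minimal sketch that applies `String.split` to two of the example lines. The class and method names (`NaiveSplit`, `naiveSplit`) are made up for illustration only.

```java
import java.util.Arrays;
import java.util.List;

class NaiveSplit {
    // Splits on every comma and ignores quoting entirely.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        // Plain line: two fields, as expected.
        System.out.println(naiveSplit("1,Bender").size());               // prints 2
        // Quoted field with a comma inside: wrongly cut into three pieces.
        System.out.println(naiveSplit("3,\"Bender, Bending\"").size());  // prints 3
    }
}
```

The quoted line should have two fields, but the comma inside the quotes is treated as a delimiter.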
Suddenly the problem becomes more complicated than string splitting; however, it can be reduced to a finite state machine with 4 states.

States:
  • 1. Ready for new field (initial state)
  • 2. Field without double quotes
  • 3. Field with double quotes
  • 4. Escaping or end of double quotes
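
The four states, combined with the transitions listed in this post, can be sketched in Java roughly as follows. This is only a sketch under a few assumptions of mine: the names `CsvLineParser`, `State` and `parse` are invented, a second double quote in state 4 is taken to be an escaped quote that returns to state 3, any other character after a closing quote is kept leniently as unquoted text, and the end of the line closes the last field.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineParser {
    // The four states; the enum names are my own labels for states 1 to 4.
    enum State { READY, UNQUOTED, QUOTED, QUOTE_SEEN }

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        State state = State.READY;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            switch (state) {
                case READY:
                    if (c == ',') {                 // 1->1: empty field
                        fields.add(buffer.toString());
                        buffer.setLength(0);
                    } else if (c == '"') {          // 1->3: opening quote, nothing to store
                        state = State.QUOTED;
                    } else {                        // 1->2: first character of a plain field
                        buffer.append(c);
                        state = State.UNQUOTED;
                    }
                    break;
                case UNQUOTED:
                    if (c == ',') {                 // 2->1: field complete
                        fields.add(buffer.toString());
                        buffer.setLength(0);
                        state = State.READY;
                    } else {                        // 2->2: quotes are kept literally here
                        buffer.append(c);
                    }
                    break;
                case QUOTED:
                    if (c == '"') {                 // 3->4: escape or closing quote
                        state = State.QUOTE_SEEN;
                    } else {                        // 3->3: commas are plain text here
                        buffer.append(c);
                    }
                    break;
                case QUOTE_SEEN:
                    if (c == '"') {                 // assumed 4->3: "" is one literal quote
                        buffer.append('"');
                        state = State.QUOTED;
                    } else if (c == ',') {          // 4->1: the quote closed the field
                        fields.add(buffer.toString());
                        buffer.setLength(0);
                        state = State.READY;
                    } else {                        // assumption: keep stray text leniently
                        buffer.append(c);
                        state = State.UNQUOTED;
                    }
                    break;
            }
        }
        fields.add(buffer.toString());              // end of line closes the last field
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("4,\"Ben\"\"d\"\"er\""));  // prints [4, Ben"d"er]
    }
}
```

A stricter parser could reject a stray character after a closing quote instead of keeping it; that choice is not covered by the two basic rules, so treat it as a design decision.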
Transitions:

*Direction*|*Condition*|*Action*
1->2 |not (" or ,) |Append character to buffer
1->3 |" |Nothing
2->2 |not , |Append character to buffer
1, 2, 4->1 |, |Output complete field and create buffer for next field
3->3 |not " |Append character to buffer
3->4 |" |Nothing