Skip to main content

4-states state machine for CSV parsing

Parsing CSV file is easy, it's nothing but splitting string with comma delimiter, which can be easily done in Java... The first thing came to my mind when I'm about to parse CSV file in Java is just like that. Now, reality is that following examples are all possible valid lines in a CSV file
  • 1,Bender
  • 2,"Bender"
  • 3,"Bender, Bending"
  • 4,"Ben""d""er"
  • 5, Ben"der
  • 6, Ben""der
Line 7 might be arguable but anyway, two basic rules are
  • If there's comma in field, use double quot to wrap field, otherwise double quot wrapper isn't required.
  • Inside double quot, double quot is used to escape double quot.
Suddenly the problem is complicated to something more than string splitting, however it can be simplified into a finite state machine with 4 states.

States:
  • 1. Ready for new field (initial state)
  • 2. Field without double quot
  • 3. Field with double quot
  • 4. Escaping or end of double quot
Transitions

*Direction*|*Condition*|*Action*
1->2 |not(" or ,)|Append character to buffer
1->3 |" |Nothing
2->2 |not , |append character to field
1|2|4->1 |, |Output complete field and create buffer for next field
3->3 |not " |Append character to buffer
3->4 |" |Nothing

Comments

Popular posts from this blog

Spring, Angular and other reasons I like and hate Bazel at the same time

For several weeks I've been trying to put together an Angular application served Java Spring MVC web server in Bazel. I've seen the Java, Angular combination works well in Google, and given the popularity of Java, I want get it to work with open source. How hard can it be to run arguably the best JS framework on a server in probably the most popular server-side language with  the mono-repo of planet-scale ? The rest of this post walks through the headaches and nightmares I had to get things to work but if you are just here to look for a working example, github/jiaqi/angular-on-java is all you need. https://github.com/jiaqi/angular-on-java Java web application with Appengine rule Surprisingly there isn't an official way of building Java web application in Bazel, the closest thing is the Appengine rule  and Spring MVC seems to work well with it. 3 Java classes, a JSP and an appengine.xml was all I need. At this point, the server starts well but I got "No ...

Customize IdGenerator in JPA, gap between Hibernate and JPA annotations

JPA annotation is like a subset of Hibernate annotation, this means people will find something available in Hibernate missing in JPA. One of the important missing features in JPA is customized ID generator. JPA doesn't provide an approach for developer to plug in their own IdGenerator. For example, if you want the primary key of a table to be BigInteger coming from sequence, JPA will be out of solution. Assume you don't mind the mixture of Hibernate and JPA Annotation and your JPA provider is Hibernate, which is mostly the case, a solution before JPA starts introducing new Annotation is, to replace JPA @SequenceGenerator with Hibernate @GenericGenerator. Now, let the code talk. /** * Ordinary JPA sequence. * If the Long is changed into BigInteger, * there will be runtime error complaining about the type of primary key */ @Id @Column(name = "id", precision = 12) @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "XyzIdGenerator") @SequenceGe...

Project Euler problem 220 - Heighway Dragon

This document goes through a Java solution for Project Euler problem 220 . If you want to achieve the pleasure of solving the unfamiliarity and you don't have a solution yet, PLEASE STOP READING UNTIL YOU FIND A SOLUTION. Problem 220 is to tell the coordinate after a given large number of steps in a Dragon Curve . The first thing came to my mind, is to DFS traverse a 50 level tree by 10^12 steps, during which it keeps track of a direction and a coordinate. Roughly estimate, this solution takes a 50 level recursion, which isn't horrible, and 10^12 switch/case calls. Written by a lazy and irresponsible Java engineer, this solution vaguely looks like: Traveler traveler = new Traveler(new Coordinate(0, 0), Direction.UP); void main() { try { traverse("Fa", 0); } catch (TerminationSignal signal) { print signal; } } void traverse(String plan, int level) { foreach(char c:plan) { switch(c) { case 'F': traveler.stepForward(); break; ca...