Sunday, February 16, 2020

Favorite quote

I spoke frankly, and I expected those around me to speak frankly. I fought for what I thought was best, and I wanted them to do so as well. When I thought someone did something stupid, I said so and I expected them to tell me when I did something stupid. Each of us would be better for it. To me, that was what strong and productive relationships looked like. Operating any other way would be unproductive and unethical.
- Ray Dalio, Principles

These words are so true, and yet conflicting. People keep telling me otherwise, and books and studies teach the secret of tricking emotion. I can't say the principle works everywhere; in fact, it probably doesn't work in most places. But it certainly helps me stay true to myself.

One of the 14 Amazon leadership principles is to "disagree and commit". It's uncomfortable but, as Ray Dalio said, each of us is better for it. I've seen it at Amazon, and I'm sure Bridgewater is the same. Like it or not, it works.

Tuesday, December 17, 2019

History repeats itself because nobody was listening the first time

I like how Rick Houlihan starts with the statement, "History repeats itself because nobody was listening the first time". However, I disagree, because nobody ever listens or learns the lesson.

Monday, February 11, 2019

Wednesday, January 30, 2019

Introducing Flixport, a tool that exports photos from Flickr


If, like me, you have tens of thousands of photos in Flickr, which is about to end its free 1TB storage service, moving those photos out is a very challenging task.

Flixport is an open source command line tool that bulk-copies photos from Flickr to Amazon S3, Google Storage or your local computer.

Flickr is terminating the free 1TB storage service in February 2019. Unless you convert to a premium user, Flickr will only keep 1000 photos and purge the rest at some point in February. To back up my own photos, I evaluated the existing solutions, and none worked for me. Luckily, being a software engineer, I could write code to work around the difficulties.

Please visit the Flixport page to find out more.

Why S3 and Google Cloud Storage?

AWS S3, as of today, is the de facto online storage. People may hear about Dropbox or Box more often, but many of these consumer storage products are backed by another cloud storage service, of which S3 is the clear leader. Since coding is not a problem for me, I got rid of the middleman and went straight to S3.

Because I work for Google, I also felt comfortable implementing a connection to Google Cloud Storage. Sorry Azure, no Microsoft support for now.

Why not Dropbox, Google Drive or Google Photos?

These popular consumer products are obviously good next steps. In general, for a command line tool to work with them, some OAuth-based authentication needs to happen, and users have to copy a long, obscure token from the browser and paste it into the command line. To offer the best user experience, it's better to run the tool from a website than from a command line.

Therefore, if I ever have the bandwidth to support them, it won't be as part of the command line tool, but more likely as a web-based service that integrates with the user's Google or Dropbox account.

Wednesday, September 26, 2018

TensorFlow Hub web experience

TensorFlow released a more dynamic and discoverable web experience for TensorFlow Hub modules. Check it out.

TensorFlow Hub is a platform for sharing reusable pieces of ML, and our vision is to provide a convenient way for researchers and developers to share their work with the broader community.

Can't believe I haven't posted anything in the last 3 years. Something either extremely bad or extremely good must have happened.

Friday, January 23, 2015

Private Maven repository in Amazon S3

To put together a private Maven repository, Amazon S3 might be the only solution that requires no installation of software. Compared to hosting a Nexus server, an S3-backed solution is incredibly cheap. In fact, most cases don't exceed the free tier. Besides, it comes with all the benefits of S3 as storage: eleven nines of durability, high availability, IAM-based authentication, easy integration with a CDN, etc.

Unfortunately, Maven doesn't come with a native S3 wagon. This post explains how to set up a private Maven repository on S3 using CyclopsGroup open source utilities.

Define repositories in base pom

The following example defines a SNAPSHOT repository for a project to pull dependencies from, in bucket mycompany-bucket-name with prefix /maven/snapshot-repository. A release repository can be set up in a similar way.

            <name>Snapshot repository of my organization</name>
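
A fuller sketch of such a definition might look like the fragment below. The property name mycompany.s3.bucket, the repository ID mycompany.snapshot, the s3:// URL scheme, and the wagon coordinates and version placeholder are assumptions for illustration, not values confirmed by this post; check the awss3-maven-wagon documentation for the exact ones.

```xml
<!-- Fragment of a base pom; only the parts relevant to the S3 repository are shown -->
<project>
  <properties>
    <!-- Hypothetical property that holds the bucket name -->
    <mycompany.s3.bucket>mycompany-bucket-name</mycompany.s3.bucket>
  </properties>

  <repositories>
    <repository>
      <!-- Hypothetical ID; it has to match a server ID in settings.xml -->
      <id>mycompany.snapshot</id>
      <name>Snapshot repository of my organization</name>
      <!-- Assumes the wagon handles URLs of the form s3://bucket/prefix -->
      <url>s3://${mycompany.s3.bucket}/maven/snapshot-repository</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>

  <build>
    <extensions>
      <!-- Registers the wagon so Maven can resolve the repository URL above -->
      <extension>
        <!-- Assumed coordinates; replace VERSION with a released version -->
        <groupId>org.cyclopsgroup</groupId>
        <artifactId>awss3-maven-wagon</artifactId>
        <version>VERSION</version>
      </extension>
    </extensions>
  </build>
</project>
```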

To provide AWS credentials for Maven to access S3, add a corresponding server definition to each developer's local settings.xml.
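
A server definition of this kind might look roughly like the following. The server ID is hypothetical and only needs to match the repository ID in the base pom. That the wagon reads the access key ID from username and the secret access key from password is an assumption here.

```xml
<!-- ~/.m2/settings.xml -->
<settings>
  <servers>
    <server>
      <!-- Hypothetical ID; must equal the repository ID in the base pom -->
      <id>mycompany.snapshot</id>
      <!-- Assumption: the AWS access key ID goes in username -->
      <username>YOUR_AWS_ACCESS_KEY_ID</username>
      <!-- Assumption: the AWS secret access key goes in password -->
      <password>YOUR_AWS_SECRET_ACCESS_KEY</password>
    </server>
  </servers>
</settings>
```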


Define distribution management to upload artifact

Assume engineers are allowed to upload their locally built artifacts to the SNAPSHOT repository defined earlier. If we decide to use the same AWS credentials for both download and upload of artifacts, a similar repository definition should be added to the distribution management section of the base pom.

            <name>My company's private snapshot repository</name>

Since the repository ID is the same as the one in the repositories section, the same credentials apply.
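
As a sketch, a distribution management section that reuses the repository ID could look like this. The ID mycompany.snapshot, the property mycompany.s3.bucket and the s3:// URL scheme are illustrative assumptions.

```xml
<!-- Fragment of the base pom -->
<distributionManagement>
  <snapshotRepository>
    <!-- Same hypothetical ID as in the repositories section, so the same server entry applies -->
    <id>mycompany.snapshot</id>
    <name>My company's private snapshot repository</name>
    <!-- Assumes the wagon handles URLs of the form s3://bucket/prefix -->
    <url>s3://${mycompany.s3.bucket}/maven/snapshot-repository</url>
  </snapshotRepository>
</distributionManagement>
```

With a section like this in place, mvn deploy on a SNAPSHOT version uploads to the same location that dependencies are pulled from.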

Note that S3 access is implemented by awss3-maven-wagon. In the very near future, the next version of awss3-maven-wagon will support a username with the reserved keyword "INSTANCE_PROFILE", which tells the wagon to get AWS credentials from the instance profile on an EC2 instance. This feature can be used where we disallow engineers from uploading locally built artifacts and only want a few builder EC2 instances to write to the repository.
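
If the keyword ships as described, the server entry on a builder instance might look roughly like this. The server ID is hypothetical, and whether a password element is still required is not known here, so it is left out.

```xml
<!-- settings.xml on a builder EC2 instance; a sketch of the planned feature -->
<settings>
  <servers>
    <server>
      <!-- Hypothetical ID; must equal the repository ID in the base pom -->
      <id>mycompany.snapshot</id>
      <!-- Reserved keyword described above; credentials would come from the instance profile -->
      <username>INSTANCE_PROFILE</username>
    </server>
  </servers>
</settings>
```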

Site distribution

Maven site distribution can be easily supported using a similar approach.
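
A site definition along these lines is one possibility. The site ID mycompany.site, the property mycompany.s3.bucket, the /maven/site prefix and the s3:// URL scheme are all assumptions made for illustration.

```xml
<!-- Fragment of the base pom -->
<distributionManagement>
  <site>
    <!-- Hypothetical ID; needs a matching server entry in settings.xml -->
    <id>mycompany.site</id>
    <name>My company's project sites</name>
    <!-- A separate prefix keeps site files apart from artifacts -->
    <url>s3://${mycompany.s3.bucket}/maven/site/${project.artifactId}</url>
  </site>
</distributionManagement>
```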


Here the site will be uploaded to an S3 bucket, which may not be accessible from a browser. S3 allows access control rules to be defined by object path prefix, so the sites can be easily opened up while artifacts are still kept secret.

Best practices

I'm sure there are many ways of using awss3-maven-wagon. I can't say the ways I use or suggest are the best practices, but over the years I have found them important, helpful and convenient across many projects.

Externalize bucket name to a property

In all examples above, the bucket name is not written directly in each section of the pom, but defined via a property. This not only keeps the value consistent across several places; more importantly, it allows an engineer to temporarily change the bucket name locally without having to release a new version of the base pom. The engineer can do so by adding a property to settings.xml that overrides the value in the base pom.
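
One way to do this is an active profile in settings.xml, sketched below. The property name mycompany.s3.bucket, the profile ID and the bucket value are hypothetical; the property name has to match whatever the base pom actually uses.

```xml
<!-- ~/.m2/settings.xml -->
<settings>
  <profiles>
    <profile>
      <!-- Hypothetical profile ID -->
      <id>local-bucket-override</id>
      <properties>
        <!-- Takes the place of the value defined in the base pom while the profile is active -->
        <mycompany.s3.bucket>my-temporary-bucket</mycompany.s3.bucket>
      </properties>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>local-bucket-override</activeProfile>
  </activeProfiles>
</settings>
```

Removing the activeProfile entry restores the bucket name from the base pom.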

Use different locations for release and snapshot repositories

This is not merely to keep files clean and pretty. When there's a big hierarchy of artifacts, people often want to make sure that, at a certain point, an artifact depends only on released dependencies.

Disallow engineers from writing to the repository

It sure is convenient if an engineer can build an artifact and upload it to the repository immediately. However, it's only a matter of time before someone builds something from a special local environment, or without fully checking in their change, or without checking out the latest code from the server. The best way to deal with this is to have one (or more) standalone build servers, a Jenkins server perhaps, with a sanitized environment, and to give only that server write access to the repository. Engineers get read-only IAM credentials for development.

Publish base pom to Maven central repository

I'm not sure everyone will agree with me, but I find it very useful. At the end of the day, I want an engineer to be able to check out a project and build it immediately, as long as the IAM credentials are in settings.xml. When the base pom is publicly available, the user experience becomes very smooth. What people often do instead is add the repository to settings.xml rather than pom.xml, and avoid publishing a private artifact to a public repository. This achieves the same result, except when an engineer works for more than one private organization and has to switch repositories based on the project at hand. When the repository is defined in the base pom, the switch happens automatically. If not, the engineer needs to modify settings.xml frequently when moving between projects.
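
With the base pom published, a project only needs to reference it as its parent, as in this sketch. The coordinates com.mycompany:base-pom:1.0.0 and the project's own artifactId and version are hypothetical.

```xml
<!-- pom.xml of an individual project -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical base pom, resolved from Maven central without extra repository settings -->
  <parent>
    <groupId>com.mycompany</groupId>
    <artifactId>base-pom</artifactId>
    <version>1.0.0</version>
  </parent>

  <!-- groupId is inherited from the parent -->
  <artifactId>my-service</artifactId>
  <version>0.1.0-SNAPSHOT</version>
</project>
```

The private repositories and the wagon extension are then inherited from the parent, so nothing repository-specific has to live in the project's own pom.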

I hope this article helps your work. If you have questions, please share them in the comments below. Thank you.