Private Maven repository in Amazon S3

For a private Maven repository, Amazon S3 may be the only option that requires no server software to install. Compared with hosting a Nexus server, an S3-backed repository is very cheap; in many cases usage stays within the free tier. It also comes with the usual benefits of S3 as storage: eleven nines (99.999999999%) of durability, high availability, IAM-based authentication, easy CDN integration, and so on.

Unfortunately, Maven doesn't come with a native S3 wagon. This post explains how to set up a private Maven repository on S3 using the CyclopsGroup open source utilities.

Define repositories in base pom


The following example defines a SNAPSHOT repository from which a project pulls dependencies, in bucket mycompany-bucket-name under the prefix /maven/snapshot-repository. A release repository can be set up in a similar way.

    <properties>
        <aspectj.version>1.6.11</aspectj.version>
    </properties>
    ...
    <dependency>
        <groupId>org.aspectj</groupId>
        <artifactId>aspectjweaver</artifactId>
        <version>${aspectj.version}</version>
        <scope>runtime</scope>
    </dependency>
    ...
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
            <argLine>-javaagent:${settings.localRepository}/org/aspectj/aspectjweaver/${aspectj.version}/aspectjweaver-${aspectj.version}.jar</argLine>
        </configuration>
    </plugin>
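
For reference, a repository definition matching the description above (bucket mycompany-bucket-name, prefix /maven/snapshot-repository) might look like the following sketch. The repository ID, name and URL pattern are taken from the distribution management example later in this post. The extension coordinates (org.cyclopsgroup:awss3-maven-wagon) and the version placeholder are assumptions, so check them against the wagon's own documentation before use.

```xml
<!-- Sketch of a base pom fragment. Extension coordinates and version are assumptions. -->
<properties>
    <!-- Bucket name kept in one property so every section stays consistent -->
    <dist.bucketName>mycompany-bucket-name</dist.bucketName>
</properties>

<repositories>
    <repository>
        <!-- Must match the <server> id in settings.xml so credentials are found -->
        <id>mycompany.repository</id>
        <name>My company's private snapshot repository</name>
        <url>s3://${dist.bucketName}/maven/snapshot-repository</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

<build>
    <extensions>
        <!-- Registers a wagon that handles s3:// URLs -->
        <extension>
            <groupId>org.cyclopsgroup</groupId>
            <artifactId>awss3-maven-wagon</artifactId>
            <!-- Placeholder: replace with an actual released version -->
            <version>REPLACE_WITH_RELEASED_VERSION</version>
        </extension>
    </extensions>
</build>
```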

To provide AWS credentials for Maven to access S3, add a corresponding server definition to each developer's local settings.xml.

    <settings>
        ...
        <servers>
            <!-- The id must match the repository id used in the pom -->
            <server>
                <id>mycompany.repository</id>
                <username>AWS_ACCESS_KEY_ID</username>
                <password>AWS_SECRET_KEY</password>
            </server>
        </servers>
        ...
    </settings>

Define distribution management to upload artifact


Assume engineers are allowed to upload locally built artifacts to the SNAPSHOT repository defined earlier. If the same AWS credentials are used for both downloading and uploading artifacts, add a similar repository definition to the distributionManagement section of the base pom.

    <distributionManagement>
        <snapshotRepository>
            <id>mycompany.repository</id>
            <name>My company's private snapshot repository</name>
            <url>s3://${dist.bucketName}/maven/snapshot-repository</url>
        </snapshotRepository>
    </distributionManagement>

Because the repository ID is the same as the one in the repositories section, the same credentials apply.

Note that S3 access is implemented by awss3-maven-wagon. In the very near future, the next version of awss3-maven-wagon will support a username with the reserved keyword "INSTANCE_PROFILE", which tells the wagon to get AWS credentials from the instance profile of an EC2 instance. This feature is useful when engineers are not allowed to upload locally built artifacts and only a few builder EC2 instances should write to the repository.
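
On such a builder instance, the settings.xml entry might then look like the sketch below. This only illustrates the keyword described above. The feature is announced for a future wagon version, and whether a password element is still expected is not stated here, so treat the exact shape as an assumption.

```xml
<!-- Sketch only: relies on the planned INSTANCE_PROFILE keyword described above -->
<settings>
    <servers>
        <server>
            <id>mycompany.repository</id>
            <!-- Reserved keyword instead of an access key id -->
            <username>INSTANCE_PROFILE</username>
            <!-- No password element shown: whether one is required is an open assumption -->
        </server>
    </servers>
</settings>
```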

Site distribution


Maven site distribution can be supported with a similar approach:

    <distributionManagement>
        <site>
            <id>mycompany.repository</id>
            <url>s3://${dist.bucketName}/sites/myproject</url>
        </site>
    </distributionManagement>

Here the site is uploaded to the S3 bucket, which may not be accessible from a browser. S3 lets you define access control rules based on object key prefix, so the site pages can be opened up while the artifacts stay private.

Best practices


I'm sure there are many ways of using awss3-maven-wagon. I can't claim that the ways I use or suggest are the best practices, but over the years I have found them important, helpful and convenient across many projects.

Externalize bucket name to a property


In all the examples above, the bucket name is not written directly in each section of the pom but defined via a property. This not only keeps the value consistent across several places; more importantly, it allows an engineer to temporarily change the bucket name locally without releasing a new version of the base pom. The engineer can do so by adding the property to settings.xml, which overrides the value in the base pom.
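
As a sketch of this setup, the base pom declares the default value of dist.bucketName (the property name used in the examples above):

```xml
<!-- Base pom: default bucket name referenced by all repository URLs -->
<properties>
    <dist.bucketName>mycompany-bucket-name</dist.bucketName>
</properties>
```

A local override in settings.xml could then go through an active profile. The profile id and the replacement bucket name below are hypothetical and only illustrate the idea.

```xml
<settings>
    <profiles>
        <profile>
            <!-- Hypothetical profile id -->
            <id>local-bucket-override</id>
            <properties>
                <!-- Hypothetical bucket used for a local experiment -->
                <dist.bucketName>my-experimental-bucket</dist.bucketName>
            </properties>
        </profile>
    </profiles>
    <activeProfiles>
        <activeProfile>local-bucket-override</activeProfile>
    </activeProfiles>
</settings>
```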

Use different locations for release and snapshot repositories


This is not merely about keeping files tidy. In a large hierarchy of artifacts, people often want to make sure that, at a certain point, an artifact depends only on released dependencies.
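
A minimal sketch of such a split is shown below. The release prefix /maven/release-repository and the ID mycompany.release.repository are hypothetical names chosen for illustration. A second repository ID would also need its own server entry in settings.xml.

```xml
<repositories>
    <!-- Release repository: snapshots are never resolved from here -->
    <repository>
        <!-- Hypothetical id and prefix -->
        <id>mycompany.release.repository</id>
        <url>s3://${dist.bucketName}/maven/release-repository</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <!-- Snapshot repository: removing this entry restricts the build to released artifacts -->
    <repository>
        <id>mycompany.repository</id>
        <url>s3://${dist.bucketName}/maven/snapshot-repository</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>
```

With the two locations separated, a project that must depend only on released artifacts can simply leave out the snapshot entry.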

Disallow engineer from writing to repository

It sure is convenient if an engineer can build an artifact and upload it to the repository immediately. However, it's only a matter of time before someone builds from a special local environment, without fully checking in their changes, or without checking out the latest code from the server. The best way to deal with this is to have one (or more) standalone build servers, perhaps a Jenkins server, with a sanitized environment, and to give only that server write access to the repository. Engineers get read-only IAM credentials for development.

Publish base pom to Maven central repository


I'm not sure everyone will agree with me, but I find this very useful. At the end of the day, I want an engineer to be able to check out a project and build it immediately, as long as IAM credentials are in settings.xml. When the base pom is publicly available, the experience becomes very smooth. What people often do instead is add the repository to settings.xml rather than pom.xml, which avoids publishing a private artifact to a public repository. This achieves the same result, except when an engineer works for more than one private organization and has to switch repositories depending on the project. When the repository is defined in the base pom, the switch happens automatically. If not, the engineer needs to modify settings.xml every time they move between projects.

I hope this article helps your work. If you have questions, please share them in the comments below. Thank you.
