Sander van der Burg's blog

Friday, October 19, 2012

My post-PhD career, a.k.a. leaving academia

It's been quiet here on my blog for a while now. Meanwhile, I'm more or less finished writing my PhD thesis. I still have to defend my thesis -- currently, it's being reviewed by my supervisors and my reading committee. As I have decided not to write a thesis in which (apart from the introduction and conclusion) the chapters are composed of my papers (as most people do in our research group), this will take a bit longer than usual.

Furthermore, my employment contract with the university has expired (in the Netherlands, PhD students are typically employed by a university). This month, I started my new job at an industrial company. So essentially, I left academia.

On leaving academia

I think that some people that know me and haven't seen me in a while may be surprised (or perhaps shocked :-) ) by this announcement, so I think it would be a good idea to give an explanation why I have decided not to pursue a research career.

First, I'd like to stress that the last four years were great, although there were a few periods that could have been better. I have no regrets about becoming a PhD student (as opposed to working for an industrial company) -- I did many interesting things, I have been to many interesting places around the world, presented my research and software at a number of international software engineering conferences, learned interesting/useful stuff and I have met quite a number of interesting people. I'm really thankful to my supervisors, who gave me this opportunity.

Apart from research, one of the other things I'm really proud of is the practical impact of the work I'm involved with. For example, I gave a presentation at FOSDEM about NixOS -- a non-academic event -- which was quite successful. Apart from FOSDEM, I also gave a number of talks and demonstrations at industrial companies that were very well received. Finally, I'm also happy about the impact of my blog.

After listing all these positive aspects, it almost sounds a bit odd that I'm not continuing on the same path, something I often hear from outsiders. I have the following reasons for this:

My working style does not completely map to the job description of a researcher. Apart from doing the work to get things published, I have also invested a significant amount of time and effort in the tools and their practical usage.

As I have also presented frequently to people outside the academic community, I typically use a lot of practical examples and I often have a number of demos showing good practical usage. These aspects are nice, but are (in principle) not part of my job description. Quite often, outsiders become happy after seeing these aspects and think that my research is about doing that kind of stuff, which is not true.

It's about publishing. As I have explained earlier in a number of blog posts, researchers have to publish and often it's used as the primary means to measure their productivity.

I know some people that have left academia because they hate writing papers and the pressure that it sometimes gives. I don't really hate writing papers, although I've had my frustrating moments sometimes -- actually, if I have a good idea I'm very eager to write it down.

What bothers me is that my research subject -- software deployment -- is not a very popular topic in the software engineering research world. If I were to continue my career in research, I would have to produce more publications, preferably at highly ranked conferences and journals.

It's difficult to achieve this goal with my current research topic and working style. As a consequence, I'm more or less forced to pick subjects that are easier to publish, which basically forces me to check the interests of the program committee members and do stuff that I'm not always interested in. I'm not somebody who likes to work that way.

Some research topics are hyper-specialized, which is not always a good thing. I have noticed that even my research (which already looks quite practical to most people in the research community) is difficult to sell conceptually to non-academic people, unless I do it the right way. For a number of presentations that I have seen at conferences, I'm pretty sure that people from industry would lose their interest in a matter of minutes and have no clue what the speaker is talking about.

The "real world" does not care about many of these papers, and as a consequence they tend to become forgotten knowledge. I'm not somebody who likes to do stuff that nobody cares about and that does not bring any value to the software engineering community as a whole.

Furthermore, if I spent more of my time on publishing, there would be less time to do engineering work and to produce/improve tools that can actually do something.

I've picked my current research project because I like solving problems that I really believe are relevant and I want to add value. However, the "deliverables" of my style unfortunately don't always map to publishable units at academic events.

I want to add more practical value. I have always been a person that likes to build stuff. If I have a nice idea in mind, I'm always very eager to try it out and to make a piece of software that does something, although it is not always very relevant. For the last 6 months, I had no time for doing stuff like this. Nearly all my time was consumed by finishing publications, writing my thesis and preparing/visiting a conference.

Looking for a new job

As my time was completely consumed by finishing publications and writing my thesis the last 6 months, I didn't really have the time to think about a new career perspective. At the end of August, I decided to put my CV online and I used the entire month of September to visit and talk to companies.

I was approached by many companies and I have visited all of them, in an area spanning Rotterdam, Amsterdam, Eindhoven and Utrecht. So I have traveled quite a lot :-).

In the beginning, I felt a bit worried, because it had been a long time since I officially applied for a job and I knew that there are a number of prejudices that industry people have about academics like me. This is a bit funny, because when I was still a bachelor's student I was not worried about application procedures or getting a job at all :-)

At first it looked like my view on industry turned out to be true -- some potential employers did not care about my research career, just asked me: "can you program in Java?" and offered me a salary that was way lower than my current salary. Some of them also did not know what a PhD is and treated me the same as an undergrad student.

Furthermore, at one company I had to take a number of aptitude tests to check whether I was smart enough. I had to do tests such as verbal analogies, number sequences, and abstract reasoning. Fortunately, I did the tests well enough. There were a few applicants that immediately had to leave because their score was too low and the company was not even interested in talking to them. I made it to the last round and they gave me a job offer, but I refused it. :-)

Fortunately, I did not unlearn my application skills (nearly all of them went quite successfully) and not all companies that I talked to were disappointing -- in the last 2 weeks of September I had "collected" 4 interesting job offers, with good salaries and interesting job aspects. I have met a number of interesting people at these companies and it was very hard for me to turn 3 of them down. A few of them were very disappointed that I rejected them. :-)

In total, I received 6 job offers, had 1 application that turned out to be a mismatch of interests from both sides, and got 1 rejection after the second round. Apparently, that company was looking for somebody that just wants to write code and not somebody who's critical like me :-).

My new job: Software Architect @ Conference Compass

I have joined Conference Compass, a Delft-based startup company developing applications (mostly apps for mobile devices) to improve the experience of conference participants.

I will fulfill the role of Software Architect -- I'm involved with many technical aspects, ranging from design, implementation and documentation to (interestingly enough) deployment.

The main reasons for joining this company are that the company is small, there are a lot of interesting challenges that I can work on, there is a lot of room in things that I can do/improve, and the atmosphere -- 3 employees are PhDs (once I graduate they will have 4 of them) and are -- apart from technical aspects -- very much interested in applying new concepts.

Even more interesting is that there is room in which I can use the experience from my research background and that they are very much interested in participating in free and open-source software projects.

Another interesting aspect is that they are located in the YES!Delft incubation centre, which is supported by various external parties including Delft University of Technology. In fact, I'm not that far away from my previous working spot (only 100 meters or so). I can see my former employer's building from the window here. :-)

In my past career, I have worked for medium sized companies as well as big ones. Joining a small group is a new experience to me.

The future of Disnix and other components of the Nix project

After reading this, you may wonder what the future is of the software that I wrote during my PhD career, in particular Disnix and other components that are part of the Nix project.

As I may have already partially revealed on this blog, my new employer is also interested in the deployment technologies of my previous job. One of their biggest interests is having this technology supported in the mobile space and in the cloud.

Furthermore, my former employer is still using our Nix-based Hydra buildfarm and my two former colleagues (who are now also employed elsewhere) still maintain it and use the Nix technology at their company as well.

So in short: The Nix project is far from dead and is still actively being used. Now that we're not required to publish papers anymore, we even have more time to improve the tools and to make them even more applicable.

Also, my homepage and this blog will continue to exist. It will still be tech-related, covering deployment and other technical aspects that I may run into in the near future. My new job is not only about software deployment, so I will also be working on other aspects.

The future of research in software deployment

As the Mancoosi project has come to an end and because my former colleagues working on Nix (including the Nix author) and I have left the university, deployment research is pretty much dead now, which is a shame, because I think it's very important in practice.

However, I have agreed with my supervisors (and still have the intention!) to publish one journal paper that contains some unpublished stuff from my PhD thesis, in my spare time. Maybe (you never know) I will publish more stuff when I can find the time and space for it, but I can't give you any time window. :-)

Switching to industry does not mean that I'm no longer interested in research or that I don't want to read or write a paper. I'd still like to be involved in a healthy/doable way, but it's no longer my primary task.

In fact, if academia and industry invest some effort, it can result in fruitful collaborations that give both parties benefits. I wrote a very large blog post about this some time ago, titled 'Software engineering fractions, collaborations and "The System"' that I'd like to encourage both academic and industry people to read.

So to all my academic friends: I'd like to say that if you have a new idea, let me know and make sure that you show that it's relevant to read or even better -- something that I can actually use/try.

Concluding remarks

In this blog post, I have announced that I have moved from academia to industry. The title contains "leaving academia" which sounds very dramatic and a bit like a farewell message. Although I'm no longer employed by the university, I'm actually still located quite close to it, and I have visited my former colleagues twice this week. Furthermore, the technology that we have "invented" is still actively being used and maintained.

I also have the intention to retain the relationship with people from the research world, although I still have to see how I can properly integrate that into my new job description.

Finally, I still have to defend my PhD thesis. Once I have more details about this, I'll announce it here and at several other places of course :-).

Sunday, September 2, 2012

A review of conferences in 2011-2012

It's time for the final episode in the "visited conference series". Fortunately, 2011 and 2012 were not as busy as 2010, so I can put the remaining ones in a single blog post.

FOSDEM: Free and Open Source Software Developers' European Meeting (2011)

2011 started with FOSDEM, just like 2010. As with all the previous editions, the FOSDEM experience was the same as usual - the same peculiar building etc. etc. (see my previous blog posts).

In this edition, I also gave a talk about NixOS titled: "Using NixOS for declarative deployment and testing". I didn't really know what to expect, but it turned out that it was received quite well. After my talk, I saw some #fosdem tweets on Twitter, in which one of the attendees described my talk as one of the most exciting ones next to systemd and the Google Go language (personally, I also found these two other talks quite good).

SEAMS: Software Engineering for Adaptive and Self-Managing Systems (2011)

The next event was another academic one: SEAMS, which is about self-adaptive and self-managing systems. SEAMS was co-located with ICSE and a number of other events, such as MSR, and held in Honolulu, Hawaii.

SEAMS looked like an interesting event to me, but I had some difficulties finding attendees that were looking into the same issues I did. Most of the presentations were about embedded software and covered issues I did not know about or understand. Although service-oriented systems were listed in their call for papers, I was the only one with a paper in that domain.

I missed some interaction between the participants in general. Perhaps it's due to the fact that SEAMS was originally a workshop and was upgraded to a symposium. Furthermore, this edition had a record number of participants and submissions. Also, the panel discussion was not very interesting and apart from the panel members, nobody took part.

Another disappointing thing is that after the panel discussion a number of attendees left and I had to present my paper in the final slot, with only half of the people left.

The social event was at a Chinese restaurant in Honolulu and was much better. Here, I finally had the opportunity to meet people and to have discussions about research. One attendee suggested a partial solution to a problem that I had been struggling with for a while.

In general, SEAMS was not a bad event although several aspects could be improved and the domain does not entirely map to the kind of research that I'm doing.

ICSE: International Conference on Software Engineering (2011)

Apart from SEAMS, I also attended ICSE, which was held at the same place (Honolulu, Hawaii).

The picture above shows how awesome the environment is. This picture has nothing to do with the conference of course, but I like to show it to you anyway :-)

The conference experience was good and actually a bit better than the 2009 edition, which I attended previously. A colleague as well as a former colleague had to present papers there, about spreadsheets and testing cross-browser compatibility. Apart from my colleagues' presentations, there was another good presentation titled: "An Empirical Study of Build Maintenance Effort", which contains a collection of empirical results relevant to our research.

The social event was a dinner, which included an amazing show with lots of typical Hawaiian cultural aspects.

FOSDEM: Free and Open Source Software Developers' European Meeting (2012)

I started 2012 with FOSDEM, as with the previous two years. This year was a bit special compared to the others, as I had a lot of difficulties getting there due to the weather conditions. In 2012, there was a lot of snowfall. As a result, I arrived in Brussels somewhere in the afternoon, although it gave me some pretty pictures, such as the one above, which I took from a small train station en route, because we had to wait until a defect was fixed.

You probably expect me to say that the FOSDEM experience was the same as usual. Although it was held at the ULB campus like all the other years, several more buildings were used for the talks. These buildings were also of much better quality compared to the regular ones. As a result, the hallways were less crowded (which is a good thing). Normally, I was always struggling to get from one devroom to another.

The picture above shows a room in a newer building in which the Wayland talk was held.

HotSWUp: Workshop on Hot Topics in Software Upgrades (2012)

In 2012, I attended HotSWUp for the second time. HotSWUp was the first academic event that I attended, four years earlier. This time it was co-located with ICSE and held at the Irchel Campus of the University of Zürich, Switzerland.

I haven't attended any academic events for a while, and a day before the workshop I didn't expect it to be that exciting. The first HotSWUp was already great, and this one was even a bit better.

There were many interesting presentations, covering upgrading aspects in various domains ranging from programming language level, to component level and system level. As with the first HotSWUp, I liked the cross disciplinary aspect very much.

I was quite satisfied with my own presentation, which raised a number of interesting questions, such as how to version components. Moreover, I received an interesting suggestion that I could use to optimise the approach described in my paper.

The social event was a dinner in a vegetarian restaurant. Normally, I wouldn't pick such a location, but it was interesting anyway. The fake meat tasted like real meat. After the dinner, several other participants and I had a couple of beers somewhere else.

ICSE: International Conference on Software Engineering (2012)

In addition to HotSWUp, I also attended ICSE - the host conference, held in the Kongresshaus Zürich. It was my third time attending ICSE.

This year's ICSE was nice and had some interesting talks (two of them were by my colleagues, one about code smells in spreadsheets and the other about test confessions of developers), but I'm a bit critical about this year's edition compared to my previous experiences.

The quality of the first two keynote presentations was a bit disappointing and I really had to force myself to listen, which (of course) failed after a few minutes. The third keynote was much better, although the speaker could have included more recent examples of research in software architectures.

Another thing was the conference venue itself. The rooms for the research talks were much too small, overly crowded and without any air conditioning (so the temperatures were really high). As a result, I sometimes had difficulties listening to a talk and it was almost impossible to ask questions.

Furthermore, it was difficult to talk to new people in the hallways, partially due to the weird structure of the conference venue. Fortunately, this did not entirely prevent me from meeting new people, but it could have been better.

We had two social events at ICSE. The last one was a dinner, which included a show with two singers and a keyboard player, as shown above.

Although I'm critical about this year's ICSE, it was not a bad conference, but some things could have been better.

Conclusions

In this blog post I have described the events I attended in 2011 and 2012, which were less busy than 2010. The only new event I attended was SEAMS.

These were all the conferences I have visited in my period as a PhD student. In a next blog post, I'll try to draw some general conclusions from these blog posts.

Friday, August 24, 2012

A review of conferences in 2010

Previously, I wrote a review of the conferences I have attended in 2008-2009. In this blog post, I will cover those I have visited in 2010. 2010 was a very busy conference year for me and in order to keep things short, I have decided to dedicate a full blog post to this year.

FOSDEM: Free and Open Source Software Developers' European Meeting (2010)

The first conference that I attended in 2010 was FOSDEM, which I visited for the second time. As with my visit the year before, the experience was nearly identical - the same building with the same "peculiar atmosphere" and the same kind of enthusiastic people. I also slept in the same hotel in Brussels (whose staff is unable to speak Dutch, for some reason).

During this edition I met quite a number of interesting people. Two of them were Nix developers I had frequently spoken with on IRC, but never met in person.

Nicolas Pierron gave a talk about NixOS and the NixOS configuration system, which he had implemented in his spare time.

Apart from a number of Nix developers, we have also seen some members of the competing Mancoosi project. Eelco and I met the RPM5 maintainer (also a member of Mancoosi). He had some interesting ideas about package management and he proposed to merge our approaches (i.e. the purely functional model), although he did not clearly explain how. Although some attempts have been made by the Mancoosi members, it did not result in anything useful, unfortunately.

SEAA: EUROMICRO Conference on Software Engineering and Advanced Applications (2010)

The next conference of 2010 was another academic one - SEAA, which was a conference I did not know about. I ended up visiting this conference, because my CBSE paper submission was rejected.

The rejection was actually not a surprise. I really did a poor job. Parts of the paper were incomplete, I tried to put too many technical details in it and I did not study related work too well. Most of these flaws were due to time pressure.

Although it was disappointing to have a paper rejected, the good thing about it was that the reviewers listed all my flaws, suggested how certain sections of the paper could be improved and they gave me a number of references that I should study. A few days later I received an e-mail in which they encouraged me to submit my revised paper to SEAA, which I did.

As a sidenote: I think this is something that more reviewers of conference papers should do. Nearly all my other rejections came with only a few vague comments, which were not useful to me at all and gave me no insight into what to revise.

The SEAA conference was located in France, in a city called Lille (which apparently has a Dutch translation: Rijsel) and held at the Polytech' Lille. I had no idea what to expect from this conference, but my experience was quite positive. The conference had very friendly and enthusiastic organizers. Some of them apparently have a long organizing reputation and knew many participants quite well.

Moreover, the conference was co-located with the EUROMICRO DSD (Digital System Design) conference and the keynote sessions of SEAA and DSD were shared. During these keynotes, I also learned a thing or two about embedded systems and VHDL. A few keynotes were given by industry people.

The social event was a dinner at Chateau de Bourgogne in Estaimbourg and the wine tasted good :-).

Unfortunately, I had to present my paper the next day as the second presenter, in the first presentation slot (somewhere around 9:00 in the morning). I really had a hard time getting up and getting myself motivated, but it turned out that I was well prepared - nobody was sleeping during my presentation and I received a number of interesting questions.

WASDeTT: Workshop on Academic Software Development Tools and Techniques (2010)

The next event that I attended was WASDeTT, which was co-located with ASE and held at the University of Antwerp.

As explained earlier, my CBSE submission was rejected partially because I included too many implementation details. Some time later I "discovered" this workshop, which is focused on the development of tools within academia and development aspects in general. For that reason, I decided to write a tool-oriented paper for Disnix, covering the latest version, implementation details and extensions. Because the page limit was 25 pages, I could write down all the stuff that did not fit in the SEAA paper.

The workshop was not about the tools themselves, but rather about their development aspects. So instead of presenting the concepts of Disnix again, I have explained all its related tool development aspects and development choices. We have had some interesting discussions, most notably - how to make research tools available to other researchers, which (of course) is a deployment issue.

One of the ideas was making a tool available as a web application. I also suggested Nix, as it provides complete dependencies and reproducible deployment. The suggestion, however, was slammed by a fellow workshop participant stating that Nix is nice, but a "conquer-the-world approach".

ASE: International Conference on Automated Software Engineering (2010)

I have also attended ASE, the host conference of the WASDeTT workshop. ASE is a conference that was known as KBSA (Knowledge Based Software Assistant) in the past, as its purpose was to use artificial intelligence techniques to automate all phases of a software engineering process. Nowadays, the domain has been broadened to software engineering in general and ASE is more or less a general conference on nearly the same level as ICSE, with a comparable paper acceptance rate.

Before I left, I considered Antwerp a very "boring" place compared to all the previous conference locations (as I have visited Antwerp many times and I can speak to its residents in Dutch, my native language), but the entire experience was much much better than I expected.

One of the good things about ASE compared to ICSE, is that it is less crowded and it has fewer parallel sessions (at ASE there were only 2 parallel sessions, while at ICSE there are 6 or even more, I'm not sure :-) ). At ASE, it was easier for me to meet new people and to have discussions.

Two papers were very interesting at ASE. One of them was from the Mancoosi project (our "competitors"), in which they have used SAT-solvers for solving dependency resolution problems. The other one was about a tool called Ninka, which can be used to derive licenses from source files using sentence matching techniques. We have used Ninka in our research as well.
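To give an impression of what "dependency resolution as a satisfiability problem" means, here is a minimal sketch in Python. It is not the approach from the Mancoosi paper: the package names and metadata are hypothetical, and a brute-force enumeration of all install sets stands in for a real SAT solver. Each package is a boolean variable (installed or not), a dependency is a clause that needs at least one of its alternatives to be true, and a conflict forbids two variables from being true together.

```python
from itertools import product

# Hypothetical package universe (made up for illustration only).
# "depends" is a list of clauses; each clause lists alternatives, one of
# which must be installed. "conflicts" lists packages that may not be
# installed together with this one.
PACKAGES = {
    "app-1": {"depends": [["libfoo-1", "libfoo-2"], ["libbar-1"]], "conflicts": []},
    "libfoo-1": {"depends": [], "conflicts": ["libbar-1"]},
    "libfoo-2": {"depends": [], "conflicts": []},
    "libbar-1": {"depends": [], "conflicts": []},
}


def is_consistent(installed):
    """Check that every dependency clause holds and no conflict is violated."""
    for name in installed:
        meta = PACKAGES[name]
        for alternatives in meta["depends"]:
            if not any(alt in installed for alt in alternatives):
                return False
        if any(other in installed for other in meta["conflicts"]):
            return False
    return True


def resolve(wanted):
    """Return the smallest consistent install set containing `wanted`, or None.

    Tries every true/false assignment of the package variables, which is
    exponential; a SAT solver searches the same space far more cleverly.
    """
    names = sorted(PACKAGES)
    best = None
    for bits in product([False, True], repeat=len(names)):
        installed = {name for name, bit in zip(names, bits) if bit}
        if wanted in installed and is_consistent(installed):
            if best is None or len(installed) < len(best):
                best = installed
    return best


if __name__ == "__main__":
    print(sorted(resolve("app-1")))
```

In this toy universe, libfoo-1 cannot be combined with libbar-1, so the only way to install app-1 is together with libfoo-2 and libbar-1.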

ASE included two social events - a beer tasting event in the city hall (which was interesting, because I have seen the city hall many times, but I have never seen it from the inside) and a banquet at the Antwerp Zoo. After both events, I noticed that several conference attendees were unable to walk back to their hotels in a straight line. I have to admit that it took me a bit of effort too :-)

ISSRE: International Symposium on Software Reliability Engineering (2010)

ISSRE was the third academic conference that I attended in 2010. ISSRE is a conference about reliability engineering and related subjects, such as testing. We ended up there because our ICSE submission was rejected (although it came very close). We turned the testing part of this rejected paper into a paper for ISSRE.

ISSRE was also an unknown conference to me and held in a suburb of San Jose, California, which was basically composed of a large number of Cisco buildings. The conference itself was also held at Cisco.

I've noticed that the conference also had very friendly and enthusiastic organizers and a loyal group of visitors that knew each other quite well. Furthermore, the organizers and session chairs were eager to meet all the new/unknown participants including me and Eelco.

A lot of the papers covered topics that were completely unknown to me. For example, I still remember a paper covering a software aging analysis of the Linux kernel. Before attending the presentation, I had no clue what software aging was. It turned out that it was about "progressive performance degradation or a sudden hang/crash of a software system due to exhaustion of operating system resources, fragmentation and accumulation of errors".

There were many more interesting papers. Another paper I liked was about fixing certain undefined features in the C programming language and the "political process" of the C standards committee.

There was also another presentation in which (apparently) something went wrong, and the speaker cried some kind of F-word out loud :-)

Another good aspect of this conference was the high degree of industry participation, of which the organizers were very proud. I've talked to one of them about other conferences that I've attended, such as ICSE, and he said to me that these conferences are "so highly academic", which, unfortunately, turned out to be true (which does not imply that they are useless, of course).

The social event was held at the Tech Museum in San Jose, which had some exciting attractions, such as an earthquake simulator and an interesting exercise in which you had to construct a road by using a number of random objects, so that a ball could safely roll from the entrance to the exit. As you may see in the picture above, Eelco is a smart guy and knew how to do it.

LAC: Landelijk Architectuur Congres (translation: 'National Architecture Congress') (2010)

The LAC (Dutch National Architecture Congress) was the last conference I have attended in 2010, which was basically an industry oriented conference with a bit of research participation. Furthermore, nearly all the presentations were in Dutch.

I ended up visiting this conference, because our funding agency: NWO/Jacquard had its own session and I was asked to give a presentation about PDS, the research project I'm participating in.

Apart from giving a presentation about my project, I also took some time to explore the conference venue. There were a lot of companies there presenting themselves (probably to attract new potential employees), showing all kinds of pictures representing software architectures.

One of those pictures captured my interest, because it looked like a painting and it did not make sense to me.

I asked one of the presenters what the picture is supposed to mean and I heard a story that I did not understand. After a while, he started talking about his projects, such as the traffic light systems, and suddenly the conversation became much more interesting.

Another funny thing is that the conference was dominated by certain buzzwords, such as Agile and TOGAF (which I had never heard of before). It turned out that TOGAF is a framework developed by The Open Group providing a comprehensive approach for designing, planning, implementing, and governing an enterprise information architecture.

One of the major differences I have noticed between this conference and an academic conference covering software architectures, is that industrial conferences are dominated by tools and standards, whereas academic conferences are mostly about concepts.

Conclusion

In this blog post I have described all conferences that I have attended in 2010. Stay tuned, as there are two more years to come...

Monday, July 30, 2012

A review of conferences in 2008-2009

As some of you may know (and some others probably don't :-) ), I have visited several conferences, which I'm required to attend as part of my job as a PhD student. Most of the conferences are academic conferences where I have to present papers that I publish. In contrast to many other research fields, in software engineering we typically publish at conferences instead of in journals.

Apart from academic conferences, I have visited a number of non-academic conferences as well, because apart from being a researcher, I'm also a person interested in technology and development.

In this blog post, I'm going to list the conferences I have visited in the past few years and I'm going to share some experiences. The reason that I have written this blog post is basically because of what I have written earlier about software engineering fractions. It's not meant as a rant about these conferences, but as a way to share my personal experiences about the topics, audience, peculiarities and other traits, because I think it's important to know how to reach several types of audiences instead of living on an isolated island.

The conference experiences here are in chronological order. In order to keep blog posts short, I start with the first two years (2008-2009):

DSM: Workshop on Domain-Specific Modeling (2008)

This is the first academic event I have ever attended. DSM is a workshop about Domain-Specific Modeling and is typically co-located with OOPSLA (nowadays known as SPLASH). I have attended the 2008 edition in Nashville, Tennessee, USA, which was also my first trip to the USA.

I didn't have to present anything here. The main reason for attending was that students were allowed to attend any co-located event at OOPSLA for free and one of my colleagues had to present a paper there.

The workshop covers short papers about both theoretical and practical aspects of domain-specific modeling. I have seen many kinds of submissions, ranging from simple ideas to a number of tools that have a significant practical use. I have even seen some submissions that my colleagues and I still laugh about today (including their paper).

One of the nice aspects of this workshop is that there is a lot of room for discussion with the attendees and a relatively low barrier to getting your idea published (this is both a positive and a negative thing).

The attendees of this workshop are mostly academic people, but there are also quite a number of industry people showing industrial applications. Below I have included a picture taken of the attendees (people who know me will surely recognise me :-) )

HotSWUp: Workshop on Hot Topics in Software Upgrades (2008)

HotSWUp is the second academic event I have attended. HotSWUp is a cross-disciplinary workshop about software upgrades in any sub field, such as programming languages, operating systems, distributed systems and so on. This event was also co-located with OOPSLA and held in Nashville.

In this workshop, I had to present my first research paper, covering distributed atomic upgrades, an important aspect of my master's thesis.

HotSWUp was a great workshop and a great experience for me. There were many nice papers covering great ideas, such as distributed upgrades, package management upgrades and how to safely mirror software components with untrusted parties. Even though many of these papers are not directly applicable to my research, they still have a fair amount of practical use, which I liked very much.

I received interesting questions, suggestions and feedback. Furthermore, there was also a lot of room for discussion that gave me many interesting ideas and several new research directions.

I have also met a number of interesting people close to my research. I met two "competitors" from the Mancoosi project (one of them is now the current Debian project leader), who also investigate package management related issues. We had a very nice opportunity to get acquainted and share ideas about package management. I also met two other people who did a similar kind of research with distributed systems.

OOPSLA: Object-Oriented Programming, Systems, Languages & Applications (2008)

OOPSLA is the first academic conference I have attended (which was basically the host conference of the two previous workshops). Actually, OOPSLA was the only conference I knew about before I started my PhD and I very much liked the fact that I was able to attend it. Moreover, another colleague of mine had to present a full paper there and several others had a poster submission about WebDSL.

The papers of OOPSLA were mostly about practical aspects of object-oriented programming languages or related issues, such as garbage collection. There were quite a number of interesting papers showing good practical usage, such as the fact that using inverse lookups is a better way of refactoring code. I saw an interesting paper using Robocode as a case study, to which I was addicted when I started learning programming in Java.

There were also some presentations that were a bit hard to follow. For example, I still remember a presentation showing ovals with curly lines attached to them. When I saw those figures they reminded me of something totally different (which was probably not something that the presenter had intended :-) )

Another funny thing I noticed during the presentations was some guy in the audience who was always sleeping and could not keep his eyes open.

Apart from attending presentations, I also attended two tutorials, which were also free for students. I learned a thing or two about the Scala programming language, which combines object-oriented programming, functional programming and several other concepts in a new programming language that builds on top of the Java virtual machine. I also did a Smalltalk tutorial involving the Squeak implementation, which can also be used to develop web applications in Smalltalk.

Nowadays, OOPSLA no longer focuses specifically on object-oriented programming languages. It's not even focused on programming languages, but on software engineering in general. To reflect the broadened scope, the new name of the conference is SPLASH: Systems, Programming, Languages and Applications: Software for Humanity.

FOSDEM: Free and Open Source Software Developers' European Meeting (2009)

FOSDEM was the next event I visited, and a non-academic one. I was not required to visit it, nor did I have to present anything. The main reason for attending was that I have always been involved with free and open source software.

FOSDEM is the biggest free and open-source event in Europe in which virtually every well known free and open-source project is represented, such as the X Window System, the Linux kernel, KDE, Mozilla, GNOME etc. FOSDEM is always held in Brussels at the campus of the Université libre de Bruxelles and can be visited by anyone interested for free (gratis).

When I arrived there for the first time, I was actually a bit "surprised" about the conference venue (the quality of the buildings was a bit "different" than I expected, after the impression I got when looking at their website :-) ).

I also noticed that there was an overwhelming number of attendees. At the keynote presentation, the entire Janson auditorium was completely filled with free and open source enthusiasts, as shown in the picture above. Some of them were really, really enthusiastic about the stuff they are working on, as you may see in the picture when looking at the headgear of some of the attendees.

Apart from a filled Janson auditorium room, it was also very difficult to move from one room to another or to look for somebody you know, because the hallways were always full of people.

The quality of the talks varies, as with many conferences. I liked the talks about GEM/KMS and Ext4. There were also a number of talks that were hard to follow and not well prepared. In contrast to academic conferences, where the session chair typically has a question for you (when nobody else has one), here at FOSDEM the session chair has no mercy and immediately announces the next speaker.

ICSE: International Conference on Software Engineering (2009)

ICSE was the second top general software engineering conference I have attended. The main reason for attending it was that I had to present a paper at the co-located cloud computing workshop.

This conference was held in Vancouver, British Columbia, Canada and my visit was actually a bit special, as it was the first time for me to travel by plane alone and to travel such a long distance. :-)

The ICSE conference is a very broad conference covering topics in the entire software engineering spectrum, ranging from requirements engineering and testing to model checking and collaboration. Although software deployment is also listed in their call for papers, I unfortunately haven't encountered any papers covering this subject in the last few years.

This year's ICSE edition had a very good atmosphere and I met quite a number of interesting people there, as well as some familiar ones from OOPSLA and HotSWUp. One of the papers presented there resulted in a collaboration many years later.

I also had to cheer for my former colleague Ali Mesbah, who presented his paper 'Invariant-Based Automatic Testing of Ajax User Interfaces', for which he won the ACM SIGSOFT Distinguished Paper award.

The social event was held at the Vancouver Aquarium and included a nice show with dolphins. Apart from the conference, I also took a few days off to explore the surroundings and I went to many interesting places, such as Stanley Park, the Dr. Sun Yat-Sen Garden and Granville Island.

I also ended up in an infamous street called 'East Hastings', which I was warned not to visit a few days later (so you have been warned now!!). I have included some interesting pictures below:

ICSE-Cloud: ICSE Workshop on Software Engineering Challenges in Cloud Computing (2009)

The ICSE Cloud workshop was the last event I attended in 2009 in which I had to present a paper. The picture above was taken by Jan S. Rellermeyer, who also attended this workshop and HotSWUp, which I have described earlier.

Apart from a very good ICSE experience, the ICSE Cloud workshop was also a very nice workshop, with lots of discussions. Another important aspect was the high degree of industry participation, especially the first keynote, given by somebody from Amazon, the company that basically pioneered the 'cloud computing' term.

Although I have to admit that the paper I had to present there was not one of my strongest contributions, it was actually quite well received there. Furthermore, for some unknown reason, it is also my most downloaded and cited paper.

Conclusion

This blog post reports on my conference experiences from the early period of my career as a PhD researcher. In a future blog post I'll report on the next years, so stay tuned...

Friday, June 22, 2012

IFF file format experiments

I haven't written any blog post for a while, but don't worry... I'm still alive, though very busy with writing my PhD thesis. There is still a fun project that showed some interesting results a while ago, but so far I never allowed myself to take the time to report about it.

A while ago, I wrote a blog post about my second computer, the Commodore Amiga. I also mentioned that one of my favourite Amiga programs was Deluxe Paint, which stores images in a so-called "IFF file format", which was ubiquitous on the AmigaOS and supported by many Amiga programs.

Nowadays, this file format is rarely used and also poorly supported in modern applications. Most common viewers cannot open it, although a number of more advanced programs can. However, the quality of their implementations typically differs, as do the features that they support.

What is the IFF file format?

IFF is an abbreviation for Interchange File Format. Quite often, people think that it is just a format to store images, as the most common IFF application format is the InterLeaved BitMap (ILBM) format used by Deluxe Paint and many other programs.

In fact, IFF is a general-purpose container format for structuring data. Apart from pictures, it is also used to store 8-bit audio samples (8SVX), musical scores (SMUS), animations (ANIM) and several other formats. The IFF file format as well as a number of application file formats were designed by Electronic Arts, nowadays a well-known game publishing company, and described in several public domain specifications, which everybody was allowed to implement.

The confusion with the IFF file format is similar to that around the OGG file format, whose files are quite often mistakenly identified as Vorbis audio files, as Vorbis is the most common OGG application file format. In fact, OGG is a container format for bitstreams, while Vorbis is an application format to provide lossy audio compression and decompression. There are many other OGG application formats, such as Theora (for video) and Speex (for speech).

IFF concepts

Conceptually, IFF files have a very simple structure. Every IFF file is divided into chunks. Each chunk consists of a 4-character identifier followed by a signed 32-bit integer describing the chunk size, followed by the given number of bytes representing data:

The picture above shows a very simple example chunk with identifier: 'BODY' which contains 24000 bytes of data. The data in the chunk body represents pixel data.

Although the concept of IFF files using chunks is ridiculously simple, it immediately offers a number of useful features for handling file formats. By looking at a chunk identifier, a program can determine whether it contains useful information to present to end users or whether a chunk is irrelevant and can be skipped. Furthermore, the chunk sizes indicate how much data has to be read or how many bytes must be skipped to reach the next chunk. Using these attributes makes it possible to implement a robust parser capable of properly retrieving the data that we want to present.
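To make the chunk layout a bit more concrete, here is a minimal Python sketch that reads a single chunk from a byte buffer. It is only an illustration and not how my libraries are implemented: the function name read_chunk and the example buffer are made up, sizes are read as big-endian integers, and padding is ignored for now.

```python
import struct


def read_chunk(buf, offset=0):
    """Read one chunk at `offset`; return (chunk_id, data, next_offset).

    Hypothetical helper for illustration only. Padding of odd-sized
    chunks is deliberately not handled here.
    """
    # 4-character identifier followed by a big-endian signed 32-bit size
    chunk_id, size = struct.unpack_from(">4si", buf, offset)
    if size < 0:
        raise ValueError("negative chunk size")
    start = offset + 8
    data = buf[start:start + size]
    if len(data) != size:
        raise ValueError("truncated chunk")
    return chunk_id.decode("ascii"), data, start + size


# Made-up example: a BODY chunk holding 24000 bytes of (empty) pixel data
example = b"BODY" + struct.pack(">i", 24000) + bytes(24000)
chunk_id, data, next_offset = read_chunk(example)
```

A program that is not interested in a chunk can simply continue reading at next_offset without looking at the data.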

In principle, every chunk captures custom data. Apart from data chunks, the IFF standard defines a number of special group chunks, which can be used to structure data in a meaningful way. The IFF standard defines three types of group chunks:

The FORM chunk contains an arbitrary collection of data chunks or other group chunks, as shown in the picture above. Our example defines a FORM which has the type ILBM. In principle, every application file format is essentially a form in which the form type refers to the application file format identifier. In the body of the FORM several data chunks can be found:
- The BMHD defines the bitmap header containing various general settings, such as the width, height and the number of colors used.
- The CMAP defines the red, green and blue color channel values of each color in the palette.
- The BODY chunk contains pixel data.
The CAT chunk may only contain a collection of group chunks, that is, only FORM, CAT or LIST chunks.
The LIST chunk is an extended CAT chunk that also contains a number of PROP chunks. PROP chunks are group chunks which may only reside in a LIST and contain a collection of data chunks. These data chunks are shared properties of all group chunks inside the LIST. For example, a LIST containing ILBM FORM chunks may use a PROP chunk containing a CMAP chunk, whose purpose is to share the same palette over a number of bitmap images.

Application file formats can define their own application-specific chunks and their attributes. For example, the ILBM file format defines the BMHD (BitMap Header) chunk, containing important attributes of an image, such as the width, height and the number of colors used, and the BODY chunk, which stores the actual graphics data.

Apart from these basic concepts, IFF has a number of other small requirements:

If a chunk size is odd, then the chunk data must be padded with an extra 0 byte, so that the next chunk is always stored on an even address in memory (as shown in the example form). This requirement was introduced, because the 68000 processor (which the Amiga uses) processes integers much faster on even addresses in memory. In our example form shown earlier, the CMAP chunk is padded.
Also, application file format attributes of word and long word sizes must be word aligned (stored on even addresses in memory).
All integers must be big-endian, because the Amiga was a big-endian system. This means that on little-endian systems, such as PCs, the byte order of integers has to be reversed.
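The padding and byte order rules can be sketched in a few lines of Python. This is just an illustration under my own assumptions: the helper names (pad, make_chunk, iter_form) and the tiny example FORM are made up, and only a single FORM without nested groups is handled.

```python
import struct


def pad(size):
    """Round an odd chunk size up to the next even number."""
    return size + (size & 1)


def make_chunk(chunk_id, data):
    """Serialise a chunk: id, big-endian size, data and an optional pad byte."""
    padding = b"\0" * (pad(len(data)) - len(data))
    # The stored size excludes the pad byte
    return chunk_id + struct.pack(">i", len(data)) + data + padding


def iter_form(buf):
    """Yield (chunk_id, data) for every chunk inside a single FORM."""
    group_id, form_size, form_type = struct.unpack_from(">4si4s", buf, 0)
    if group_id != b"FORM":
        raise ValueError("not a FORM")
    offset, end = 12, 8 + form_size
    while offset < end:
        chunk_id, size = struct.unpack_from(">4si", buf, offset)
        yield chunk_id, buf[offset + 8:offset + 8 + size]
        # Skip the data plus the pad byte, if the size is odd
        offset += 8 + pad(size)


# Made-up example: a one-color CMAP (3 bytes, so it gets padded) and a BODY
cmap = make_chunk(b"CMAP", bytes([255, 0, 0]))
body = make_chunk(b"BODY", bytes(4))
payload = b"ILBM" + cmap + body
form = b"FORM" + struct.pack(">i", len(payload)) + payload
chunks = list(iter_form(form))
```

The ">" in the format strings tells Python's struct module to use big-endian byte order, so the same code gives the same result on little-endian PCs.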

IFF file format support

The IFF file format is simple yet powerful, and it served its purpose really well when the Amiga was still alive. For a very large and cool experiment (which I will keep secret for a while) I wanted to open ILBM images in an SDL application (SDL is a cross-platform library frequently used to develop games and multimedia applications), as well as to modify ILBM files and save them. I ran into several issues:

Support for most IFF application formats is not present in many common viewers and players. However, some more advanced programs support it. For example, Paint Shop Pro and the SDL_image library have support for viewing ILBM images.
These applications all have their own implementation of a specific IFF application format. Some implementations are good, others lack certain features. For example, I have seen several viewers not supporting the special Amiga screen modes, such as Extra HalfBrite (EHB) and Hold-and-Modify (HAM) or the color range cycle chunks.
Applications can open simple IFF files that consist of a single FORM, but do not know how to deal with IFF scrap files, i.e. CATs/LISTs containing multiple FORMs of various types, possibly with shared options.
Most applications can view IFF application formats, but cannot write them or check for their validity, which may result in crashes if invalid IFF files are opened.
A number of open file formats have generic parser libraries, e.g. PNG (libpng), JPEG (libjpeg), GIF (giflib), Ogg (libogg), Vorbis (libvorbis) etc. that applications use to open, parse and save files. There is no equivalent for ILBM and other IFF application formats.

IFF libraries experiment

So after I ran into these issues I've decided to take a look at the IFF specification to see how hard it could be to implement the stuff I needed. After reading the standard, I started appreciating the IFF file format more and more, because of the simplicity and the practical purpose.

Furthermore, instead of implementing yet another crappy parser that only supports a subset, I have decided to do it right and to develop a set of general, good quality, reusable and portable libraries for this purpose, with similar goals to the other file format libraries, so that application programs can support IFF application file formats as easily as the common file formats that we use nowadays.

I also think it's good to have file formats that used to be widely used properly supported on modern platforms. Finally, it looks like fun, so why not do it? I did a few experiments that resulted in a number of interesting software packages.

Implementing an SDL ILBM viewer

First, I have decided to implement support for my primary use case: proper ILBM image support in SDL applications. I have implemented an SDL-based viewer program, having the following architecture:

In the picture above, several components are shown:

libiff. This library implements the properties defined in the IFF specification, such as parsing data chunks and group chunks. Furthermore, it also supports writing IFF files as well as conformance checking.
libilbm. This library implements the application chunks as well as the byte run compression algorithm defined in the ILBM specification. Furthermore, it supports several extension chunks and the file format used by the PC version of Deluxe Paint (which has several minor differences compared to the Amiga version). Application chunks can be parsed by defining a table with function pointers to the ILBM functions that handle these and passing the table to the IFF library functions.
libamivideo. This library acts as a conversion library for Amiga graphics data. As explained earlier, the Amiga uses bitplanes to organise graphics and has several special screen modes (Extra-Halfbrite (EHB) and Hold-and-Modify (HAM)) to display more colors out of the predefined color registers. In the SDL viewer we use the libamivideo library to convert Amiga graphics data to chunky or RGB graphics and to emulate the special screen modes.

Images saved by the PC version of Deluxe Paint however (which have the PBM form type instead of ILBM), do not use bitplanes but chunky graphics, and thus conversion is not necessary.
SDL_ILBM. This package contains a high level SDL library as well as the ilbmviewer command-line tool, directly generating SDL surfaces from IFF files containing ILBM images as well as performing the required conversions automatically.
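To give an impression of the byte run compression mentioned above, here is a small Python sketch of a decoder for the ByteRun1 scheme from the ILBM specification. It is my own illustration and not the code from libilbm; the function name and the example data are made up.

```python
def unpack_byterun1(src, expected_size):
    """Decode ByteRun1-compressed bytes until `expected_size` bytes exist."""
    out = bytearray()
    i = 0
    while len(out) < expected_size and i < len(src):
        n = src[i]
        i += 1
        if n < 128:
            # 0..127: copy the next n + 1 bytes literally
            count = n + 1
            out += src[i:i + count]
            i += count
        elif n > 128:
            # -1..-127 as a signed byte: repeat the next byte (-n + 1) times,
            # which is 257 - n for the unsigned value read here
            out += bytes([src[i]]) * (257 - n)
            i += 1
        # n == 128 (-128 as a signed byte) is a no-op
    return bytes(out)


# Made-up example: 3 literal bytes, a run of four 7s, a no-op, 1 literal byte
packed = bytes([2, 1, 2, 3, 0xFD, 7, 0x80, 0, 9])
unpacked = unpack_byterun1(packed, 8)
```

In a real ILBM image every row of every bitplane is compressed separately, so a viewer would call such a function with the row size in bytes as the expected size.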

Usage of the SDL ILBM viewer is straightforward:

$ ilbmviewer picture.IFF

The viewer can also be used to view IFF scrap files. For example, it is possible to combine several ILBM images as well as other formats (such as an 8SVX file) into a single IFF file. By passing the combined file to the viewer, you can switch between images using the 'Page Up' and 'Page Down' keys. For example:

$ iffjoin Champagne Venus.lores Sample.8SVX > join.IFF 
$ ilbmviewer join.IFF

Below I have included some screenshots of the SDL ILBM viewer. The picture on the top left is an image included in Graphicraft, which defines a color range cycle to animate the bird and the bunny. By pressing the 'TAB' key, the viewer cycles the color range to show you the animation. The other screenshots are images included with Deluxe Paint V. As you can see, the viewer also knows how to view HAM images (the Aquarium) and AGA images (the desk).

Implementing an SDL 8SVX player

To see how well my IFF library implementation is designed, I have decided to implement a second IFF application format, namely the 8SVX format used to store 8-bit audio samples. The architecture of the SDL 8SVX player is quite similar to the SDL ILBM viewer, with the following differences:

lib8svx. This library implements the application chunks as well as the Fibonacci-delta compression method defined in the 8SVX specification. As with libilbm, it also passes a table with function pointers handling application-specific chunks to the IFF parser.
libresample. This library is used to convert sample rates. 8SVX samples have variable sample rates, while on the PC hardware samples are typically passed to audio buffers with a fixed sample rate. Therefore, we have to convert them.
SDL_8SVX. This package contains a library as well as the 8svxplayer command-line tool. Sample rate conversion is automatically done by the SDL library.
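The Fibonacci-delta method can be sketched in Python as follows. This is my own reading of the scheme in the 8SVX specification and not the code from lib8svx: I assume that the compressed data starts with a pad byte and an initial sample value, and that every following byte holds two 4-bit codes (high nibble first) that index a table of deltas. The names and example data are made up.

```python
# Delta table as I understand it from the 8SVX specification
CODE_TO_DELTA = [-34, -21, -13, -8, -5, -3, -2, -1, 0, 1, 2, 3, 5, 8, 13, 21]


def unpack_fibonacci_delta(src):
    """Decode Fibonacci-delta data into raw 8-bit samples."""
    x = src[1]  # src[0] is a pad byte, src[1] the initial sample value
    out = bytearray()
    for byte in src[2:]:
        # Each byte stores two 4-bit codes: high nibble first, then low
        for code in (byte >> 4, byte & 0x0F):
            x = (x + CODE_TO_DELTA[code]) & 0xFF  # wrap like a signed byte
            out.append(x)
    return bytes(out)


def to_signed(sample):
    """Interpret a raw byte as a signed 8-bit sample."""
    return sample - 256 if sample > 127 else sample


# Made-up example: initial value 10, then the codes 9, 10, 8 and 0
samples = unpack_fibonacci_delta(bytes([0, 10, 0x9A, 0x80]))
signed_samples = [to_signed(s) for s in samples]
```

Because every sample is stored as a 4-bit code, the compressed data is roughly half the size of the original, at the cost of some precision.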

As with the SDL ILBM viewer, the SDL 8SVX player can also play samples from scrap IFF files:

$ iffjoin Picture.ILBM Sample.8SVX > join.IFF
$ 8svxplayer join.IFF

Backporting the ILBM viewer to AmigaOS

The third experiment I did was a really crazy one. I have backported the libraries and tools to the AmigaOS. People probably wonder why I want to do something like this, but hey: it is fun, so why shouldn't I do it? The reasons are the same why people want to backport WINE to Windows or AROS back to the Motorola 68000 platform.

Another reason is that I wanted to know how well these libraries perform on the original platform for which these file formats were designed. The Nix AmigaOS build function I have developed previously helped me a lot in achieving this goal. Apart from a few small fixes, mainly because getopt_long() is not supported, I could easily port the codebase in a straightforward manner without implementing any workarounds.

The architecture shown above is nearly identical to the SDL ILBM viewer. The only difference is the role of the libamivideo library. In the AmigaOS viewer application, it serves the opposite goal compared to the SDL version; it converts images saved by the PC version of Deluxe Paint in chunky graphics format to bitplanes.
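The conversion between chunky graphics and bitplanes can be illustrated with a small Python sketch. It handles a single row of pixels and is not the code from libamivideo; the function names are made up, and I assume that bitplane rows are padded to a multiple of 16 pixels, that the leftmost pixel is the most significant bit and that plane 0 holds the least significant bit of each palette index.

```python
def chunky_to_bitplanes(pixels, width, depth):
    """Convert one row of chunky pixels (one palette index per byte)
    into `depth` bitplane rows."""
    row_bytes = ((width + 15) // 16) * 2  # rows are word aligned
    planes = [bytearray(row_bytes) for _ in range(depth)]
    for x, index in enumerate(pixels[:width]):
        for p in range(depth):
            if index & (1 << p):
                # Leftmost pixel is the most significant bit of a byte
                planes[p][x // 8] |= 0x80 >> (x % 8)
    return [bytes(plane) for plane in planes]


def bitplanes_to_chunky(planes, width):
    """Convert bitplane rows back into one row of chunky pixels."""
    pixels = bytearray(width)
    for p, plane in enumerate(planes):
        for x in range(width):
            if plane[x // 8] & (0x80 >> (x % 8)):
                pixels[x] |= 1 << p
    return bytes(pixels)


# Made-up example: 8 pixels using palette indices 0..7 (3 bitplanes)
row = bytes([0, 1, 2, 3, 4, 5, 6, 7])
planes = chunky_to_bitplanes(row, 8, 3)
roundtrip = bitplanes_to_chunky(planes, 8)
```

The SDL viewer needs the second direction (bitplanes to chunky), whereas the AmigaOS viewer needs the first one for images saved by the PC version of Deluxe Paint.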

It was also nice to write an Intuition GUI application for AmigaOS. Back in the old days, I had never programmed in C and I never wrote a GUI application (apart from a few small experiments in Amiga BASIC), simply because I did not have the knowledge and tools available back then. The AmigaOS libraries were not very difficult to understand and to use.

Below I have included some screenshots of the UAE emulator running the viewer using my own libraries. As you can see, the GUI application has implemented Intuition menus allowing you to open other IFF files using a file picker and to navigate through IFF scrap files:

Conclusion

In this blog post I have described several software packages that resulted from my IFF file format experiments, because I could not find any IFF libraries that have all the features that I want. The purpose of these packages is to provide a set of high quality, complete, portable libraries to display, parse, write and check several IFF application formats.

All the software packages can be obtained from the IFF file format experiments subpage of my homepage and used under various free and non-copylefted software licenses, such as the MIT license and the zlib license.

I haven't made any official releases yet, nor have I defined a roadmap, so don't consider these libraries production ready. Also, the API may still evolve. Probably, at some time in the future I will make it more stable.

I have also found two other projects implementing the IFF standard:

The IFF project on Sourceforge is a C++ library using uSTL, which deviates from the standard in some aspects. For example, it stores integers in little-endian format. Furthermore, I haven't seen any application file formats using this library.
I also found a project named libiff on Google Code. It seems to have no releases and very little documentation. I have no clue about its capabilities and features.

It is also interesting to point out that I have more stuff on my hard drive, such as libraries supporting several other file formats, which utilise several packages described here. When I can find the time, I'll make these available as well.

Thursday, April 26, 2012

Dynamic analysis of build processes to discover license constraints

As I have explained earlier in the blog post about software deployment complexity, software is rarely self-contained nowadays, but typically uses many off-the-shelf components. Reuse has advantages, such as the fact that productivity increases and products can be finished more quickly. One of the disadvantages is an increasingly complicated deployment process.

Apart from productivity and deployment aspects, the usage of components under Free and Open Source licenses is very popular. This is probably due to the fact that the source code is available, can be adapted and most software packages are available for free through the internet (free in price, a.k.a. gratis).

What a lot of vendors don't realize, is that most Free and Open-Source components are not in the public domain. They are in fact copyrighted and distributed under licenses ranging from simple permissive ones (allowing you to do almost everything you want including keeping modifications secret) to more complicated copyleft licenses imposing requirements on derived works. The GNU General Public License (GPL) is the most famous copyleft license around.

Because of the obligations that these licenses impose on users, licenses have become a very important non-functional requirement of software systems. Not obeying these licenses could result in costly lawsuits by copyright holders. Busybox is a well-known example of a software package which has been defended successfully in court several times.

Some clarification

Before you read on, I first want to give some clarification to readers unfamiliar with Free and Open Source software. I have written an earlier blog post about Free and Open-Source software explaining what it is and what it is not. In this blog post I also try to clarify a number of common misconceptions.

Most outsiders think that these lawsuits are about money, which is not true. These lawsuits are not held because people include FOSS components in their commercial products and ask money for these products. As I have explained earlier, selling free and open-source software is fine.

These lawsuits are held because some copyleft licenses require that derived products, or parts thereof, are made available under the conditions of the same license, which includes access to the source code. Typically, many vendors refrain from publishing source code and do not obey the obligations that these licenses specify. In many cases, vendors are unaware of this.

Background

The research I have done about this subject has a bit of history, which I'd like to explain here :-)

A couple of years ago, when I was in my first year as a PhD student, I attended ICSE 2009 and there was one talk that I found very inspiring and that gave me a lot of ideas. The paper was titled 'License Integration Patterns: Addressing License Mismatches in Component-Based Development' and was presented by Ahmed E. Hassan. It covered a large number of FOSS licenses and described patterns for combining components governed by various licenses in a proper way.

Although their paper covers a great number of license issues, they were still looking into automation of their patterns, for example to automatically verify derived works. This process turns out to be quite challenging, because automating such processes requires you to have powerful deployment tools and a complete notion of all dependencies involved in producing an artifact, such as a binary. Fortunately, deployment research is our expertise and our tools are designed for such purposes. For a while, I had several ideas about a possible solution in mind, but I never implemented anything.

Some time later at ASE 2010, which I also attended, there was another talk related to this subject titled: 'A sentence-matching method for automatic license identification of source code files' and given by Yuki Manabe. In this paper a tool was developed, called Ninka, which can be used to analyse sentences in comments of the source code, to determine under which license a source file is governed. I asked Yuki whether the tool was available somewhere, but unfortunately the idea quickly appeared on the bottom of my todo list and I forgot about this (which is a shame).

A while later Daniel German, who is involved in all the publications I have mentioned, was invited by Eelco Dolstra to visit our group in Delft. That visit resulted in an eventual collaboration between me, Eelco Dolstra (from our group), Daniel German, Julius Davies (from University of Victoria) and Armijn Hemel (who is from the gpl-violations.org project as well as owner of Tjaldur, a company specialised in software governance and license compliance engineering).

Motivation

In order to say something about the rights and obligations of software systems, you must know the following things:

How are source files combined into a final executable (e.g. static linking, dynamic linking)?
What licenses govern the (re)use of source files that were used to build the binary?
How can we derive the license of the resulting binary from the build graph containing the source files?

Together, we wrote a paper to provide an answer to the first question.

Approach

To provide a good answer for that question, we have crafted a method which traces system calls of build processes (essentially the involved processes and what files go in and out) and we produce build graphs out of these traces. Furthermore, the traces of each package are stored in a central database, so that inter package-dependencies can be determined.

We have used the Nix package manager to manage all the build processes. Nix is a very convenient instrument, as it has a number of good features. Builds are pure (so no undeclared dependencies can affect the reliability of our results) and Nix guarantees dependency completeness (so that we are certain that no crucial dependencies affecting the license of the resulting binary are missed). Moreover, because Nix stores all packages in isolation in separate directories, we can easily identify inter-package relationships by looking at absolute file names. Furthermore, the Nix expression language allows us to modify the standard builder environment, without changing any package build specifications.
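To illustrate how absolute file names reveal inter-package relationships, here is a small Python sketch that maps a file name to the Nix store directory it belongs to. It is not part of our tooling: the function name is made up, the hash in the example is a placeholder and not a real one, and I assume the usual /nix/store/<hash>-<name> layout with a 32-character hash.

```python
import re

# Assumed layout: /nix/store/<32-character hash>-<name>/<rest of the path>
STORE_PATH = re.compile(r"^/nix/store/([0-9a-z]{32})-([^/]+)(/.*)?$")


def store_component(path):
    """Return the Nix store directory a file belongs to, or None if the
    file does not reside in the Nix store."""
    match = STORE_PATH.match(path)
    if match is None:
        return None
    return "/nix/store/{}-{}".format(match.group(1), match.group(2))


# Placeholder hash, only for illustration
fake_hash = "a" * 32
component = store_component(
    "/nix/store/" + fake_hash + "-cups-1.5.0/sbin/cupsd")
outside = store_component("/tmp/build/main.c")
```

Two files that yield different store directories belong to different packages, so an edge between them in a build graph is an inter-package dependency.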

Tracing system calls

We trace the following system calls:

File related system calls, e.g. open(), execve()
Process related system calls, e.g. fork(), clone(), vfork()

Apart from capturing traces, there were a number of issues we had to deal with:

We have to translate all relative paths to absolute paths
In Linux, pids wrap around by default if they exceed 32767, so we have to use a different attribute to distinguish between processes
Cycles appear in the graph if files are read/written multiple times, and we have to remove them
There are coarse-grained processes, such as the install process, which installs multiple files in one run. In the resulting graph it looks like the resulting artifact is dependent on all other artifacts installed by the same process, which is not true. We have to identify these processes and rewrite them.
We don't want to know anything about the dependencies of the build tools themselves, because these are not considered derived work.
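Two of these issues, relative paths and pid reuse, can be illustrated with a small Python sketch. This is a hedged example under assumptions, not the approach from the paper: the `ProcessTable` class and its methods are hypothetical, it assumes that working directory changes are visible in the trace, and it uses a simple counter as the "different attribute" that distinguishes processes.

```python
import itertools
import posixpath


class ProcessTable:
    """Tracks traced processes under identifiers that survive pid reuse (hypothetical)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._live = {}  # pid -> unique id of the process currently using that pid
        self._cwd = {}   # unique id -> current working directory

    def start(self, pid, cwd):
        """Register a new process; a reused pid simply gets a fresh unique id."""
        uid = next(self._counter)
        self._live[pid] = uid
        self._cwd[uid] = cwd
        return uid

    def fork(self, parent_pid, child_pid):
        """A child process inherits the working directory of its parent."""
        return self.start(child_pid, self._cwd[self._live[parent_pid]])

    def chdir(self, pid, path):
        """Record a working directory change of a process."""
        self._cwd[self._live[pid]] = self.resolve(pid, path)

    def resolve(self, pid, path):
        """Translate a path used by a process into a normalised absolute path."""
        cwd = self._cwd[self._live[pid]]
        # Absolute paths pass through join() unchanged; symlinks are not resolved.
        return posixpath.normpath(posixpath.join(cwd, path))


table = ProcessTable()
table.start(100, "/build")
child = table.fork(100, 101)
table.chdir(101, "src")
header = table.resolve(101, "../include/config.h")

# Later on, pid 101 is handed out again to an unrelated process.
reused = table.start(101, "/tmp")
print(header, child, reused, table.resolve(101, "out.o"))
```

Keying everything on the unique id instead of the pid keeps the events of the two processes that happened to share pid 101 apart.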

I have intentionally kept the mechanics brief, because I don't want to explain them all over again in this blog post. The exact details can be found in the paper.

Build trace graphs

Below I have included a graph of cupsd, an executable belonging to the CUPS package, which we have derived with our tool:

The SVG pictures of this graph as well as several other graphs, can be found here.

By using a graph such as the one of cupsd and by using Ninka to analyse the source files in this graph, one can say something about the license that covers the resulting binary. In the paper, we have found an interesting problem with a well known free/open-source package, which I'm not going to reveal in this blog post :-)
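As a rough sketch of how such a graph could be used: walk from the binary to the files that nothing in the graph produces (the sources) and look up their licenses. The function names, the file names and the license table below are all made up for this example; in particular, the table merely stands in for the output of a license scanner and does not reflect how Ninka is actually invoked.

```python
def source_files(graph, target):
    """Return the leaf inputs (files that nothing in the graph produces) reachable from target."""
    seen, stack, leaves = set(), [target], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        inputs = graph.get(node, set())
        if not inputs and node != target:
            leaves.add(node)
        stack.extend(inputs)
    return leaves


def licenses_of(graph, target, license_table):
    """Collect the licenses of all sources of target; unscanned files count as UNKNOWN."""
    return {license_table.get(path, "UNKNOWN") for path in source_files(graph, target)}


# Made-up build graph: target -> inputs.
graph = {
    "/build/app": {"/build/main.o", "/build/libfoo.a"},
    "/build/main.o": {"/build/main.c", "/build/config.h"},
    "/build/libfoo.a": {"/build/foo.o"},
    "/build/foo.o": {"/build/foo.c"},
}

# Hypothetical scanner results.
license_table = {
    "/build/main.c": "GPL-2.0",
    "/build/config.h": "GPL-2.0",
    "/build/foo.c": "BSD-3-Clause",
}

print(sorted(source_files(graph, "/build/app")))
print(sorted(licenses_of(graph, "/build/app", license_table)))
```

Note that this only lists the licenses involved. Deciding what they mean for the license of the binary is a separate problem.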

Discussion

A reader with Nix experience may wonder why we have implemented an additional tracing approach next to the Nix package manager. The answer is that Nix works on package level, but licenses do not always cover complete packages. There are packages in which individual files are covered by different licenses. Therefore, a more fine-grained tracing approach is required.

Unfortunately, the paper was rejected from ICSE 2012, which I was a bit disappointed about a while ago (although I'm still there anyway because I have to present another paper at HotSWUp). What bothers me is not really the fact that a paper is not "good enough", but that it is a bit unclear whether this contribution is useful or useless, and that the solution is seen as 'too simple' (which is NOT a bad thing IMHO).

Perhaps it is indeed too simple for a top general conference, but I also have no idea which other type of conference or journal I could send this to. And if this solution is too "practical", would it then perhaps be useful for a 'Software Engineering in Practice Track / Experience Track' at some conference? Although I have heard somebody talking about an "engineering perspective", none of the reviewers have suggested submitting to another type of track.

The only thing that becomes clear to me from the reviews I have received, is that they are not really critical about the contents (although certain details can be strengthened of course), but rather about the significance of the contribution.

I have also noticed that the goal of this paper is generally misunderstood. People think that we are actually solving the complete licensing problem, but instead we provide an important ingredient, which is not there yet. Realising these build graphs, which cover complete build processes, is already complicated enough, although the idea of using system call tracing for various purposes is not new. Nobody, however, has used system call tracing for this purpose yet, and doing so required us to solve several problems as well. Furthermore, because we're using Nix, experimenting with builds suddenly becomes much simpler, whereas it would take significantly more effort with conventional solutions.

If you look at the three questions I have given earlier, the paper is about the first question. The ASE 2010 paper provides a solution for the second. The third question is still future work, for an eventual license calculus. But in order to develop such a license calculus, the ingredient of complete build trace graphs is required. I'm pretty sure that if I talked to software deployment people about this, the story would be appreciated. Unfortunately, as I have explained before, software deployment is a very cold research subject, without a real community.

I'm still thinking about what to do with this paper, but I have no idea yet. Furthermore, the amount of time that I have left is pretty limited. I have decided to put it online and announce it through this blog anyway. Normally, I only report about papers after they have been accepted, but not everything in research can be a 'success story'. Of course, I'm always open to suggestions.

References

The paper is titled: 'Discovering Software License Constraints: Identifying a Binary's Sources by Tracing Build Processes'. As always, papers can be obtained from my publications page.

The techniques described in this blog post are becoming part of the service portfolio of Tjaldur, a company specialised in software governance. Furthermore, I expect that one day this tool will also be integrated into the Nix project.

UPDATE: Never give up! In the meantime, an updated version of this paper, titled "Tracing software build processes to uncover license compliance inconsistencies", has been accepted for ASE 2014! I owe a big thanks to Shane McIntosh, who put major effort into improving the paper and showed me that there are always new possibilities. Sometimes, it's good to be wrong about something! :-)