Sunday, July 2, 2017

Some reflections on my experiences with web technology

It has been a while since I wrote my last blog post. In the last couple of months, I have been working on many kinds of things, such as resurrecting the most important components of my personal web framework (that I have developed many years ago) and making them publicly available on GitHub.

There are a variety of reasons for me to temporarily switch back to this technology area for a brief period of time -- foremost, there are a couple of web sites still using pieces of my custom framework, such as those related to my voluntary work. I recently had to make changes, mostly maintenance-related, to these systems.

The funny thing is that most people do not consider me a web development (or front-end) person and this has an interesting history -- many years ago (before I started doing research) I always used to refer to "web technology" as one of my main technical interests. Gradually, my interest started to fade, up until the point that I stopped mentioning it.

I started this blog somewhere in the middle of my research, mainly to provide additional practical information. Furthermore, I have been using my blog to report on everything I do open-source related.

If I would have started this blog several years earlier, then many articles would have been related to web technology. Back then, I have spent considerable amounts of time investigating techniques, problems and solutions. Retrospectively, I regret that it took me so long to make writing a recurring habit as part of my work -- many interesting discoveries were never documented and have become forgotten knowledge.

In this blog post, I will reflect over my web programming experiences, describe some of the challenges I used to face and what solutions I implemented.

In the beginning: everything looked great


I vividly remember the early days in which I was just introduced to the internet, somewhere in the mid 90s. Around that time Netscape Navigator 2.0 was still the dominant and most advanced web browser.

Furthermore, the things I could do on the internet were very limited -- today we are connected to the internet almost 24 hours a day (mainly because many of us have smart phones allowing us to do so), but back then I only had a very slow 33K6 dial-up modem and internet access for only one hour a week.

Aside from the fact that it was quite an amazing experience to be connected to the world despite these limitations, I was also impressed by the underlying technology to construct web sites. It did not take long for me to experiment with these technologies myself, in particular HTML.

Quite quickly I was able to construct a personal web page whose purpose was simply to display some basic information about myself. Roughly, what I did was something like this:

<html>
  <head>
    <title>My homepage</title>
  </head>

  <body bgcolor="#ff0000" text="#000000">
    <h1>Hello world!</h1>

    <p>
      Hello, this is my homepage.
    </p>
    <p>
      <img src="image.jpg" alt="Image">
    </p>
  </body>
</html>

It was simply a web page with a red colored background displaying some text, hyperlinks and images:


You may probably wonder what is so special about building a web page with a dreadful background color, but before I was introduced to web technology, my programming experience was limited to various flavours of BASIC (such as Commodore 64, AMOS, GW and Quick BASIC), Visual Basic, 6502 assembly and Turbo Pascal.

Building user interfaces with these kind of technologies was quite tedious and somewhat impractical compared to using web technology -- for example, you had to programmatically define your user interface elements, size them, position them, define style properties for each individual element and programming event handlers to respond to user events, such as mouse clicks.

With web technology this suddenly became quite easy and convenient -- I could now concisely express what I wanted and the browser took care of the rendering parts.

Unknowingly, I was introduced to a discipline called declarative programming -- I could describe what I wanted as opposed to specifying how to do something. Writing applications declaratively had all kinds of advantages beyond the ability to express things concisely.

For example, because HTML code is high-level (not entirely, but is supposed to be), it does not really matter much what browser application you use (or underlying platform, such as the operating system) making your application quite portable. Rendering a paragraph, a link, image or button can be done on many kinds of different platforms from the same specification.

Another powerful property is that your application can degrade gracefully. For example, when using a text-oriented browser, your web application should still be usable without the ability to display graphics. The alternate text (alt) attribute of the image element should ensure that the image description is still visible. Even when no visualization is possible (e.g. for visually impaired people), you could, for example, use a Text to Speech system to interpret your pages' content.

Moreover, the introduction of Cascading Style Sheets (CSS) made it possible to separate the style concern from the page structure and contents making the code of your web page much more concise. Before CSS, extensive use of presentational tags could still make your code quite messy. Separation of the style concern also made it possible to replace the stylesheet without modifying the HTML code to easily give your page a different appearance.

To deal with visual discrepancies between browser implementations, HTML was standardized and standards-mode rendering was introduced when an HTML doctype was added to a HTML file.

What went wrong?


I have described a number of appealing traits of web technology in the previous section -- programming applications declaratively from a high level perspective, separation of concerns, portability because of high-level abstractions and standardization, the ability to degrade gracefully and conveniently making your application available to the world.

What could possibly be the reason to lose passion while this technology has so many compelling properties?

I have a long list of anecdotes, but most of my reasons can be categorized as follows:

There are many complex additional concerns


Most web technologies (e.g. HTML, CSS, JavaScript) provide solutions for the front-end, mainly to serve and render pages, but many web applications are much more than simply a collection of pages -- they are in fact complex information systems.

To build information systems, we have to deal with many additional concerns, such as:

  • Data management. User provided data must be validated, stored, transformed in something that can be visually represented, properly escaped so that they can be inserted into a database (preventing SQL injections).
  • Security. User permissions must be validated on all kinds of levels, such as page level, or section level. User roles must be defined. Secure connections must be established by using the SSL protocol.
  • Scalability. When your system has many users, it will no longer be possible to serve your web application from a single web server because it lacks sufficient system resources. Your system must be decomposed and optimized (e.g. by using caching).

In my experience, the tech companies I used to work for understood these issues, but I also ran into many kinds of situations with non-technical people not understanding that all these things were complicated and necessary.

One time, I even ran into somebody saying: "Well, you can export Microsoft Word documents to HTML pages, right? Why should things be so difficult?".

There is a lack of abstraction facilities in HTML


As explained earlier, programming with HTML and CSS could be considered declarative programming. At the same time, declarative programming is a spectrum -- it is difficult to draw a hard line between what and how -- it all depends on the context.

The same thing applies to HTML -- from one perspective, HTML code can be considered a "what specification" since you do not have specify how to render a paragraph, image or a button.

In other cases, you may want to do things that cannot be directly expressed in HTML, such as embedding a photo gallery on your web page -- there is no HTML facility allowing you to concisely express that. Instead, you must provide the corresponding HTML elements that implement the gallery, such as the divisions, paragraphs, forms and images. Furthermore, HTML does not provide you any facilities to define such abstractions yourself.

If there are many recurring high level concepts to implement, you may end up copying and pasting large portions of HTML code between pages making it much more difficult to modify and maintain the application.

A consequence of not being able to define custom abstractions in HTML is that it has become very common to generate pages server side. Although this suffices to get most jobs done, generating dynamic content is many times more expensive than serving static pages, which is quite silly if you think too much about it.

A very common sever side abstraction I used to implement (in addition to an embedded gallery) is a layout manager allowing you to manage static common sections, a menu structure and dynamic content sections. I ended up inventing such a component because I was used to frames, that became deprecated. Moving away from them required me to reimplement common sections of a page over and over again.

In addition to generated code, using JavaScript also has become quite common by dynamically injecting code into the DOM or transforming elements. As a result, quite a few pages will not function properly when JavaScript has been disabled or when JavaScript in unsupported.

Moreover, many pages embed a substantial amount of JavaScript significantly increasing their sizes. A study reveals that the total size of a quite a few modern web pages are equal to the Doom video game.

There is a conceptual mismatch between 'pages' and 'screens'


HTML is a language designed for constructing pages, not screens. However, information systems typically require a screen-based workflow -- users need to modify data, send their change requests to the server and update their views so that their modifications become visible.

In HTML, there are only two ways to propagate parameters to the server and getting feedback -- with hyperlinks (containing GET parameters) or forms. In both cases, a user gets redirected to another page that should display the result of the action.

For pages displaying tabular data or other complicated data structures, this is quite inconvenient -- we have to rerender the entire page each time we change something and scroll the user back to the location where the change was made (e.g. by defining anchors).

Again, with JavaScript this problem can be solved in a more proper and efficient way -- by programming an event handler (such as a handler for the click event), using the XMLHttpRequest object to send a change message to the server and updating the appropriate DOM elements, we can rerender only the affected parts.

Unfortunately, this again breaks the declarative nature of web technologies and the ability to degrade gracefully -- in a browser that lacks JavaScript support (e.g. text-oriented browsers) this solution will not work.

Also, efficient state management is a complicated problem, that you may want to solve by integrating third party JavaScript libraries, such as MobX or Redux.

Layouts are extremely difficult


In addition to the absence of abstractions in HTML (motivating me to develop a layout manager), implementing layouts in general is also something I consider to be notoriously difficult. Moreover, the layout concern is not well separated -- some aspects need to be done in your page structure (HTML code) and other aspects need to be done in stylesheets.

Although changing most visual properties of page elements in CSS is straight forward (e.g. adjusting the color of the background, text or borders), dealing with layout related aspects (e.g. sizing and positioning page elements) is not. In many cases I had to rely on clever tricks and hacks.

One of the weirdest recurring layout-tricks I used to implement is a hack to make the height of two adjacent floating divs equal. This is something I commonly used to put a menu panel next to a content panel displaying text and images. You do not know in advance the height of both panels.

I ended up solving this problems as follows. I wrapped the divs in a container div:

<div id="container">
  <div id="left-column">
  </div>
  <div id="right-column">
  </div>
</div>

and I provided the following CSS stylesheet:

#container
{
    overflow: hidden;
}

#left-column
{
    padding-bottom: 3000px;
    margin-bottom: -3000px;
}

#right-column
{
    padding-bottom: 3000px;
    margin-bottom: -3000px;
}

In the container div, I abuse the overflow property (disabling scroll bars if the height exceeds the screen size). For the panels themselves, I use a large padding value value and a equivalent negative margin. The latter hack causes the panels to stretch in such a way that their heights become equal:


(As a sidenote: the above problem can now be solved in a better way using a flexbox layout, but a couple of years ago you could not use this newer CSS feature).

The example shown above is not an exception. Another notable trick is the clear hack (e.g. <div style="clear: both;"></div>) to ensure that the height of the surrounding div grows automatically with the height of your inner divs.

As usual, JavaScript can be used to solved to abstract these oddities away, but breaks declarativity. Furthermore, when JavaScript is used for an essential part of your layout, your page will look weird if JavaScript has been disabled.

Interoperability problems


Many web technologies have been standardized by the World Wide Web Consortium (W3C) with the purpose to ensure interoperability among browsers. The W3C also provides online validator services (e.g. for HTML and CSS) that you can use to upload your code and check for its validity.

As an outsider, you may probably expect that if your uploaded code passes validation that your web application front-end is interoperable and will work properly in all browsers... wrong!

For quite some time, Internet Explorer 6 was the most dominant web browser. Around the time that it was released (2001) it completely crushed its biggest competitor (Netscape) and gained 95% market share. Unfortunately, it did not support modern web standards well (e.g. CSS 2 and newer). After winning the browser wars, Microsoft pretty much stopped its development.

Other browsers kept progressing and started to become much more advanced (most notably Mozilla Firefox). They also followed the web standards more faithfully. Although this was a good development, the sad thing was that in 2007 (6 years later) Internet Explorer 6 was still the most dominant browser with its lacking support of standards and many conformance bugs -- as a result, you were forced to implement painful IE-specific workarounds to make your web page work properly.

What I typically used to do is that I implemented a web page for "decent browsers" first (e.g. Firefox, Chrome, Safari), and then added Internet Explorer-specific workarounds (such as additional stylesheets) on top. By using conditional comments, an Internet Explorer-specific feature that treated certain comments as code, I could ensure that the hacks were not used by any non-IE browser. An example usage case is:

<!--[if lt IE 7]><link rel="stylesheet" type="text/css" href="ie-hacks.css"><![endif]-->

The above conditional comment states that if an Internet Explorer version lower than 7 is used, then the provided ie-hacks.css stylesheet should be used. Otherwise, it is treated as a comment and will be ignored.

Fortunately, Google Chrome overtook the role as the most dominant web browser and is developed more progressively eliminating most standardization problems. Interoperability today is still not a strong guarantee, in particular for new technologies, but considerably better around the time that Internet Explorer still dominated the browser market.

Stakeholder difficulties


Another major challenge are the stakeholders with their different and somewhat conflicting interests.

The most important thing that matters to the end-user is that a web application provides the information they need, that it can be conveniently found (e.g. through a search engine), and that they are not distracted too much. Visual appearance of your web site also matters in some extent (e.g. a dreadful appearance your web site will affect an end user's ability to find what they need and your credibility), but is typically not as important as most people think.

Most clients (the group of people requesting your services) are mostly concerned with visual effects and features, and not so much with the information they need to provide to their audience.

I still vividly remember a web application that I developed whose contents could be fully managed by the end user with a simple HTML editor. I deliberately kept the editor's functionality simple -- for example, it was not possible in the editor to adjust the style (e.g. the color of the text or background), because I believed that users should simply follow the stylesheet.

Moreover, I have spent substantial amounts of time explaining clients how to write for the web -- they need to structure/organize their information properly write concisely and structure/format their text.

Despite all my efforts in bridging the gap between end-users and clients, I still remember that one particular client ran away dissatisfied because of the lack of customization. He moved to a more powerful/flexible CMS, and his new homepage looked quite horrible -- ugly background images, dreadful text colors, sentences capitalized, e.g.: "WELCOME TO MY HOMEPAGE!!!!!!".

I also had been in a situation once in which I had to deal with two rivaling factions in a organization -- one being very supportive with my ideas and another being completely against them. They did not communicate with each other much and I basically had to serve as a proxy between them.

Also, I have worked with designers giving me mixed experiences. A substantial group of designers basically assumed that a web page design is the same thing as a paper design, e.g. a booklet.

With a small number of them I had quite a few difficulties explaining that web page designs need to be flexible and working towards a solution meeting these criteria -- people use different kinds of screen sizes, resolutions. People tend to resize their windows, adjust their font sizes, and so on. Making a design that is too static will affect how many users you will attract.

Technology fashions


Web technology is quite sensitive to technological fashions -- every day new frameworks, libraries and tools appear, sometimes for relatively new and uncommon programming languages.

While new technology typically provides added value, you can also quite easily shoot yourself in the foot. Being forced to work around a system's broken foundation is not particularly a fun job.

Two of my favorite examples of questionable technology adoptions are:

  • No SQL databases. At some point, probably because of success stories from Google, a lot of people consider traditional relational databases (using SQL) not to be "web scale" and massively shifted to so-called "No SQL databases". Some well known NoSQL databases (e.g. MongoDB) sacrifice properties in service of speed -- such as consistency guarantees.

    In my own experience, many applications that I developed, the relational model made perfect sense. Also, consistency guarantees were way more important than speed benefits. As a matter of fact, most of my applications were fast enough. (As a sidenote: This does not disqualify NoSQL databases -- they have legitimate use cases, but in many cases they are simply not needed).
  • Single threaded event loop server applications (e.g. applications built on Node.js).

    The single thread event loop model has certain kinds of benefits over the traditional thread/process per connection approach -- little overhead in memory and multitasking making it a much better fit to handle large numbers of connections.

    Unfortunately, most people do not realize that there are two sides of the coin -- in a single threaded event loop, the programmer has the responsibility to make sure that it never blocks so that that your application remains responsive (I have seen quite a few programmers who simply lack the understanding and discipline to do that).

    Furthermore, whenever something unexpected goes wrong you end up with an application that crashes completely making it much more sensitive to disruptions. Also, this model is not a very good fit for computationally intensive applications.

    (Again: I am not trying to disqualify Node.js or the single threaded event loop concept -- they have legitimate use cases and benefits, but it is not always a good fit for all kinds of applications).

My own web framework


I have been extensively developing a custom PHP-based web framework between 2002 and 2009. Most of its ideas were born while I was developing a web-based information system to manage a documentation library for a side job. I noticed that there were quite a few complexities that I had to overcome to make it work properly, such as database integration, user authentication, and data validation.

After completing the documentation library, I had been developing a number of similarly looking information systems with similar functionality. Over the years, I learned more techniques, recognized common patterns, captured abstractions, and kept improving my personal framework. Moreover I had been using my framework for a variety of additional use cases including web sites for small companies.

In the end, it evolved into a framework providing the following high level components (the boxes in the diagram denote packages, while the arrows denote dependency relationships):


  • php-sbdata. This package can be used to validate data fields and present data fields. It can also manage collections of data fields as forms and tables. Originally this package was only used for presentation, but based on my experience with WebDSL I have also integrated validation.
  • php-sbeditor. This package provides an HTML editor implementation that can be embedded into a web page. It can also optionally integrate with the data framework to expose a field as an HTML editor. When JavaScript is unsupported or disabled, it will fall back to a text area in which the user can directly edit HTML.
  • php-sblayout. This package provides a layout manager that can be used to manage common sections, the menu structure and dynamic sections of a page. A couple of years ago, I wrote a blog post explaining how it came about. In addition to a PHP package, I also created a Java Servlet/JSP implementation of the same concepts.
  • php-sbcrud is my partial solution to the page-screen mismatch problem and combines the concepts of the data management and layout management packages.

    Using the CRUD manager, every data element and data collection has its own URL, such as http://localhost/index.php/books to display a collection of books and http://localhost/index.php/books/1 to display an individual book. By default, data is displayed in view mode. Modifications can be made by appending GET parameters to the URL, such as: http://localhost/index.php/books/1?__operation=remove_book.
  • php-sbgallery provides an embeddable gallery sub application that can be embedded in various ways -- directly in a page, as a collection of sub pages via the layout manager, in an HTML editor, and as a page manager allowing you to expose albums of the gallery as sub pages in a web application.
  • php-sbpagemanager extends the layout manager with the ability to dynamically manage pages. The page manager can be used to allow end-users to manage the page structure and contents of a web application. It also embeds a picture gallery so that users can manage the images to be displayed.
  • php-sbbiblio is a library I created to display my bibliography on my personal homepage while I was doing my PhD research.

For nowadays' standards the features provided by the above package are considered to be old fashioned. Still, I am particularly proud of the following quality properties (some people may consider them anti-features these days):

  • Being as declarative as possible. This means that the usage of JavaScript is minimized and non essential. Although it may not efficiently deal with the page-screen mismatch because of this deliberate choice, it does provide other benefits. For example, the system is still usable when JavaScript has been disabled and even works in text-oriented browsers.
  • Small page sizes. The layout manager allows you to conveniently separate common aspects from page-specific aspects, including external stylesheets and scripts. As a result, the size of the rendered pages are relatively small (in particular compared to many modern web sites), making page loading times fast.
  • Very thin data layer. The data manager basically works with primitive values, associative arrays and PDO. It has no strong ties to any database model or an Object-Relational-Mapper (ORM). Although this may be inconvenient from a productivity point of view, the little overhead ensures that your application is fast.

Conclusion


In this blog post, I have explained where my initial enthusiasm for the web came from, my experiences (including the negative ones), and my own framework.

The fact that I am not as passionate about the web anymore did not make me leave that domain -- web-based systems these days are ubiquitous. Today, much of the work I do is system configuration, back-end and architecture related. I am not so active on the front-end side anymore, but I still look at front-end related issues from time to time.

Moreover, I have used PHP for a very long time (in 2002 there were basically not that many appealing alternatives), but I have also used many other technologies such a Java Servlets/JSP, Node.js, and Django. Moreover, I have also used many client-side frameworks, such as Angular and React.

I was also briefly involved with the development of WebDSL, an ongoing project in my former research group, but my contributions were mostly system configuration management related.

Although these technologies all offer nice features and in some way impress me, it has been a very long time that I really felt enthusiastic about anything web related.

Availability


Three years ago, I have already published the data manager, layout manager and bibliography packages on my GitHub page. I have now also published the remaining components. They can be used under the terms and conditions of the Apache Software License version 2.0.

In addition to the framework components, I published a repository with a number of example applications that have comparable features to the information systems I used to implement. The example applications share the same authentication system and can be combined together through the portal application. The example applications are GPLv3 licensed.

You may probably wonder why I published these packages after such a long time? The are variety of reasons -- I always had the intention to make it open, but when I was younger I focused myself mostly on code, not additional concerns such as documenting how the API should be used or providing example cases.

Moreover, in 2002 platforms such as GitHub did not exist yet (there was Sourceforge, but it worked on a project-level and was not as convenient) so it was very hard to publish something properly.

Finally, there are always things to improve and I always run various kinds of experiments. Typically, I use my own projects as test subjects for other projects. I also have a couple of open ideas where I can use pieces of my web framework for. More about this later.