Wednesday, October 07, 2015

Value of ASF Projects

Matt Asay wrote an interesting piece last week, that took a rough stab at the "worth" of Open Source code under the care of the Linux Foundation. All the right caveats are there, of course: this isn't really the "worth" of the code, but an approximate cost in developer-years to produce that many lines of code. Fair enough, but when the number that pops out is $5 billion, that says something awesome. No matter how you may want to fiddle with the methodology, there are very few companies on the planet that can or have produced that much code.

Then he threw out the question: does the code under the umbrella of the Apache Software Foundation have that beat? It made me curious ...

I went to OpenHub and got its list of 340 Apache projects. For each project, I fetched the "lines of code" dataset used to produce a project's chart of LOC over time. After some edge case rejects, I had LOC for 332 projects at Apache, that OpenHub knows about. The result?

The ASF represents 177,229,680 lines of code, compared to Linux Foundation's 115 million.

So yes, by this crude measure, the ASF is "worth" something like $7.5 billion.

Talk amongst yourselves...

(obviously, I didn't use Wheeler's COCOMO model, but how far off could the value be on such a large/varied dataset? I think it's also interesting that the ASF provides a space for all this to happen with a budget of only about $1 million a year)

Sunday, September 20, 2015

GPASM object files

As part of the work on my home automation system, I've been doing a lot of assembly programming for the PIC16F688. That is my chosen microcontroller for all the various embedded systems around the house.

One of the particular issues that I've run into, is that I've divided the code into modules (like a good little boy). The gputils toolchain supports separate compilation, relocatable code, and linking. SWEET! But this is assembly code. I can't instantiate the I2C slave or master code for a particular pair of pins on the '688. There are tight timing loops, so the code must directly reference the correct port and pin (as opposed to variably-defined values).

One of my control boards talks to TWO I2C busses, and can operate as both slave and master on both busses. Since I must directly reference the port/pin, this means that I need separate compilations of the assembly code for each bus. And then I run into the problem: symbol conflict.

My solution is to rewrite symbols within the library modules for each bus instantiation. So the "start" function for the I2C master (I2C_M_start in the library's object file) is rewritten to HOUSE_I2C_M_start and LOCAL_I2C_M_start.

This works out really well, though I just ran into a problem where one library refers to another library. Not only do I need to rewrite the entrypoint symbols, but also the external reference symbols.

All of this rewriting is done with some Python code. The object files are COFF files, so I wrote a minimalist library to work with GPASM's object files (rather than generic COFF files). Using that library, I have a support script to add prefixes like HOUSE_ or LOCAL_.

Here are my support scripts:

    If you're dealing with PIC object files, then maybe the above scripts will be helpful.

    As an aside, I find it rather amusing to go back to assembly programming days, yet find myself still enmeshed within libraries, object files, and linkers.

    Saturday, August 22, 2015

    My Google Code projects have moved

    Back in March, Google announced that the project hosting service on Google Code was shutting down. I wrote a post about why/how we started the service. ... But that closure time has arrived.

    There are four projects on Google Code that I work on. Here is the disposition of each one:
    This has become Apache Serf, under the umbrella of the Apache Software Foundation. Justin and I started serf at Apache back in 2003. Two people are not sufficient for an Apache community, so we moved the project out of the ASF. We had a temporary location, but moved it to Google Code's project hosting at the service's launch, where it has resided for almost 10 years. The project now has a good community and is returning to its original home.
    (link to: old project site)

    This is a portability library that I started, as a tighter replacement for APR. Haven't worked on it lately, but will get back to it, as I believe it is an interesting and needed library. I've moved it to GitHub.

    This is a very old, very simple yet capable, and mature templating library that I wrote for Python. It is used in many places due to its simplicity and speed. Also moved to GitHub.

    This is my personal repository for random non-project work. I open source everything, even if it might not be packaged perfectly for use. Somebody might find utility in a block of code, so I keep it all open. The code in this repository isn't part of a team effort, so I'm not interested in the tooling over at GitHub. I just want an svn repository to keep history, and to keep it offsite. For this repository, I've chosen Wildbit's beanstalk, and the repository has been published/opened.
    (link to: old project site)
    I'm sad to see Google Code go away, and I don't consider the above movements ideal. But it's the best I've got right now. Flow with the times...

    Saturday, March 14, 2015

    Sigh. Google Code project hosting closing down

    Google has just let us know that Google Code's project hosting will be shutting down.

    On a story over on Ars Technica, there were a lot of misconceptions about why Google chose to provide project hosting. I posted a long comment there, but want to repeat that here for posterity:

    As the Engineering Manager behind Google's project hosting's launch, I think some clarifications need to be made here. 
    In early 2005, SourceForge was not well-maintained, it was hard to use, and it was the only large hosting site available. Chris and I posed the following question, "what would happen if SourceForge went dark tomorrow". … F/OSS apocalypse. SF would take 10's of thousands of projects down with it. This wasn't too far-fetched, given the funding and team assigned to at the time. Chris and I explored possibilities: provide Google operational support, machines, or just offer to buy it outright. … Our evaluation was: we didn't need to acquire SourceForge. We just needed to provide an alternative. Provide the community with another basket for their eggs. 
    Myself and three highly-talented engineers put together the project hosting from summer 2005, to its launch at OSCON in July 2006. We let SourceForge know late 2005 what we were doing, and they added staff. We couldn't have been happier! … we never set out to kill them. Just to provide safety against a potential catastrophic situation for the F/OSS community. 
    Did GitHub provide a better tool? I think so. But recall: that is their business. Google's interest was caretaking for the F/OSS community (much the same as the Google Summer of Code). The project hosting did that for TEN YEARS. 
    I'm biased, but call that a success. 
    There are many more hosting options today, compared to what the F/OSS ecosystem was dealing with in 2005 and 2006. I'm very sad to see it close down, but I can understand. Google contributes greatly to F/OSS, but what is the incremental value of their project hosting? Fantastic in 2006, but lower today. 
    … I hope the above helps to explain where/how Google Code's project hosting came about.

    Thursday, January 15, 2015


    I've been reading Ars Technica for years. The bulk of what they do: I find awesome.

    A recent article used the phrase "Climate Denial" in its title. To me, in terms of the scientific method, there is no such thing as "denial", but simply "critical" or "questioning" or "not convinced". "Skeptical", if you will. All of these labels are fine, as they acknowledge that the hypothesis in question (AGW) is being tested. But "denial" has been used to shut down conversation, as if critical examination is no longer allowed.

    So I posted my thoughts, in the forum attached to that article, basically repeating the above.

    Ars Technica appears to have disliked my points about questioning. and that falsifiability is no longer applicable to AGW. So they closed my forum post, marking it as "trolling".

    The ridiculous thing is that somebody even replied to my post, pointing out "scientific consensus" on Wikipedia, yet that article specifically discusses that certain theories can never be proven. Only disproven (ref: falsifiability, above). So when you find a hypothesis in this pattern... the approach is to disprove.

    But nope. Ars Technica shut me down.

    I will still read you, Ars. I like your content. But when you shut down discussion? And call it trolling, despite some kind of rational basis, and an attempt at civil discussion?

    No. That is wrong, and I have lost respect for what you do.

    Sunday, October 05, 2014

    Raspberry Pi and (lack of) I2C Repeated Starts

    Just spent several hours digging into a communication bug between my Raspberry Pi and a SparkFun MPR121 breakout board. I found two core problems: the MPR121 requiring Repeated Starts in its I2C communication, and the RPi's BCM2835 not implementing them.

    MPR121 Requires Repeated Starts

    The MPR121 uses I2C to communicate with its host. There are over a hundred registers in the MPR121 that can be read. From a functional standpoint, it would look something like:
    uint8 value = read_register(uint7 addr, uint8 which_reg)
    On the I2C bus, the bits/frames look something like:
    | Start | addr/W | which | R-Start | addr/R | data | Stop |
    The Repeated Start allows the host to hold the bus during this "write which register we want, now read it" transaction. If a normal Stop-then-Start sequence was performed, then you have a time where the bus was released, allowing another I2C master to take control. That race condition could allow the other master to change the register-to-read. The original I2C master would then get bad data.

    The requirement for a Repeated Start is eminently sensible in a multi-master environment, although the MPR121 documentation does not call out this requirement. My own experimentation and a bit of Google action confirms this. Sensible, yes, but poorly documented. (Though I have to say: the MPR121 doc and application notes are overall very outstanding! Usually, I use their I2C timing diagrams rather than the formal I2C specification)

    It is also important to note that most environments are single-master, so a Repeated Start wouldn't be necessary, and the requirement is a potential burden upon the I2C master (compared to a standard write/read pair of operations).

    BCM2835 Lack of Repeated Starts

    My test code was using the Python "smbus" module to speak I2C to the MPR121 breakout. Everything was quite straight forward, and there are several tutorials and pages on the web to show you how to set up I2C on an RPi, and to use Python to control it.

    But when I tried to read the "touched" results for the 10 pads (two bytes), I kept getting the same byte values. The first byte (in register 0x00) has the first eight pads, and the second byte (at 0x01) has the next four. But that second byte was always the same, and changed right along with the first byte as I touched various pads.

    After thorough digging through code, trying alternative approaches, oscilloscope review of the I2C bus, etc, it became apparent that upon receiving a Stop condition, the MPR121 would reset the which-register value to 0x00. Thus, I'd tell the MPR121 "read from 0x01. Stop. give me the byte.", and it would always return the value from register 0x00.

    After further investigation and reading: the BCM2835 does not implement Repeated Starts. There is no way for it to read from arbitrary registers of the MPR121. Some people have attempted gimmicks around the BCM2835's 10-bit addressing feature, but I'll avoid that.

    However, you can read all the sensor data by reading two bytes, starting at register 0x00. I2C supports a block data transfer, so this is quite straightforward. In fact, the host doesn't ever have to say "start reading from 0x00" since that is the default. It can just issue a read for two bytes.

    In order for an RPi to read arbitrary registers, it would be necessary to use "bit-banging" on the GPIO pins to manually run through the I2C protocol. Code out there exists, if this feature is needed.


    In my research, I found there are a number of I2C peripherals that require a Repeated Start. Presumably for transaction purposes (as noted above). Most work just fine with a Stop/Start pair, which will work in the typical single master environment.

    In my home automation scenario, the host will (always?) be a PIC16F688 microcontroller running my own I2C master code. Needless to say, it will incorporate the Repeated Start capability.

    Hopefully, my research will help your own use of an MPR121, or the I2C bus on a Raspberry Pi.

    Capacitive Touch Sensors, revisited

    Last year, I wrote about my plan to use capacitive touch sensors in my house as "wall switches" rather than what you'd find in any sane person's house.

    I got my first batch of sensor pad boards a year ago, but never got around to actually writing a post about them. The rather poor picture to the right shows the pads and the ground-hatch on the back. You can even make out an Apache Subversion revision tag on the silk screen :-)

    It has a few problems, however: sizing is incorrect for our 1-gang electrical boxes, it is missing drill/mounting holes, and there are no cutouts so the backlighting can shine through. Whoops! When the new board is designed and back from production, then I'll get some good pictures posted.

    So why post now? It's been a year!! ... well, I finally got around to pairing this sucker up with SparkFun's MPR121 breakout board. Connected that to a Raspberry Pi, and started Real Testing. I may have to tweak the pads and traces a bit for Rev2, but it is doing very well for the first iteration.

    The largest obstacle to getting this functioning with the MPR121 was failure to use its "Auto Config" feature. Once I let the chip figure out what the hell it is connected to... it worked like a dream.

    (see next post, re: problems talking to an MPR121 from a Raspberry Pi)

    Friday, August 15, 2014

    API Endpoints for Arduinos

    As part of my home automation system, I need to connect "high-level" systems (such as the primary Linux server) down into the underlying hardware systems around the house. The PIC16F688 microcontrollers that run those systems are seriously low-level. Thus, I've chosen to place all of them onto an I2C bus(*) driven by an Arduino. Why? ... the Arduino has enough capability to mount an Ethernet port on the "high" side, and has built-in I2C support for the "low" side. It is a great mapping device between the complex systems, and the hardware systems.

    With the hardware selected, and the wiring selected, it came down to protocol. How can that Linux server talk to the Arduino, as a proxy to the individual microcontroller critters splattered across the household? ... Naturally, HTTP was my first choice, as it allows all kinds of languages, libraries, and scripts to perform the work, and even some basic interaction via a full-on browser.

    Digging into HTTP servers for the Arduino... Ugh. The landscape is very disappointing. Much of the code is poorly engineered, or the code is designed for serving web pages. During my search, I ran into a colleague's TinyWebServer. I'd call it the best out there for web serving (well-designed and well-coded), but is still overpowered for my purpose: mapping a protocol down to simple hardware interactions.

    As a result, I designed a small library to construct a simple API endpoint on the Arduino. The application can register dozens of endpoints to process different types of client requests (62 endpoints are easy, more if you want to seek out the critical edges of allowed-characters in URIs). Each endpoint can accept a number of bytes as parameters, and can return zero, or an unlimited set of bytes to the client.

    I have yet to "package" the Endpoint system, so any requests and changes are most welcome! I've got some documentation to write, along with incorporating feedback from others' and my own usage.

    Would love to get some feedback! ==>

    (*) and yes, I know an I2C bus is designed for 0.5m meter runs, rather than a whole house; bus accelerators, speed compensation, and other approaches "should" manage it. I'll report on my success/failure in a future post.