Nitpicking: Dreadnoughtus wasn’t heavier than a jumbo jet

Warning: Aviation geek nitpicking below.

Last week saw quite a bit of press about the discovery of Dreadnoughtus schrani, a sauropod with the “largest calculable mass of any land animal”. Many reports also included some version of this figure:

Dreadnoughtus sure was heavy.

Dreadnoughtus compared with other sauropods and a 737-900. Image credit: Nature news

That figure, in turn, led to multiple tweets appearing in my timeline claiming a dinosaur larger than a jumbo jet had been found (e.g., this one from Time). Despite an early “promise” not to do so, I eventually felt compelled to tweet a clarification:

To explain: in aviation, the term “jumbo jet” refers to certain large, wide-body aircraft like the Boeing 747 or Airbus A380; it’s generally not used to refer to narrow-body aircraft like the 737-900 shown in the figure above. Relatedly, being specific is important when discussing aircraft weights, as various weights like the operating empty weight (OEW) and the maximum takeoff weight (MTOW) can differ significantly.

For example, for the three aircraft mentioned in this post (ranges arise from variations in aircraft configurations):

Model            OEW (kg)              MTOW (kg)
Boeing 737-900   42,901 [1]            74,389–79,016 [1]
Boeing 747-400   178,756–179,752 [2]   362,874–396,894 [2]
Airbus A380      276,800 [3]           490,000–575,000 [4]

Dreadnoughtus is (well, was) very much an impressive, enormous animal, but at a “mere” 60 tonnes, it’s absolutely dwarfed by jumbo jets. In fact, it’s quite a bit lighter than a loaded 737-900, a fact that’s probably not made very clear in that Nature news figure.
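
To put rough numbers on that comparison, here is a small shell sketch that expresses the 60-tonne estimate as a percentage of each aircraft’s MTOW. It assumes 1 tonne = 1,000 kg and uses the low end of each MTOW range from the table above; the variable names are my own and purely illustrative.

```shell
# Masses in kilograms. Aircraft values are the low ends of the MTOW ranges in the table.
dreadnoughtus_kg=60000      # ~60 tonnes, assuming 1 tonne = 1,000 kg
b737_900_mtow_kg=74389
b747_400_mtow_kg=362874
a380_mtow_kg=490000

for pair in "737-900:$b737_900_mtow_kg" "747-400:$b747_400_mtow_kg" "A380:$a380_mtow_kg"; do
    model=${pair%%:*}       # text before the colon
    mtow=${pair##*:}        # text after the colon
    # Integer percentage: dinosaur mass as a share of the aircraft's MTOW
    pct=$(( dreadnoughtus_kg * 100 / mtow ))
    echo "Dreadnoughtus is about ${pct}% of a ${model} at MTOW"
done
```

With integer division, this works out to roughly 80% of a loaded 737-900, 16% of a 747-400, and 12% of an A380.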

Mind you, none of this is meant to take away from the importance or interestingness of Dreadnoughtus (and you should read Brian Switek’s coverage, by the way). It’s just that I get oddly picky about precision when talking about aviation and science.

[1] “Boeing 737 Airplane Characteristics for Airport Planning: Chapter 2: Airplane Description”. (PDF)
[2] “Boeing 747 Airplane Characteristics for Airport Planning: Chapter 2: Airplane Description”. (PDF)
[3] Wikipedia: Airbus A380: Specifications
[4] “Airbus A380 Airplane Characteristics: Airport and Maintenance Planning”. (PDF)
Posted in Science

Donors Choose Drive for #Ferguson-area schools

Update 1: Mrs. Randoll’s writing materials have been funded!
Update 2: Ms. Peach’s string bass has been funded!
Update 3: Mrs. Baughman’s reading mat and bean bags have been funded! Just Mr. Brown’s 3D printer left to go!

Last week, Drug Monkey organized two successful Donors Choose drives (here and here) to support schools in the Ferguson, Missouri area. If you’re not familiar with it, Donors Choose is an online charity that allows you to make donations to support teacher-selected projects in (generally) economically disadvantaged public schools.

This weekend, the Bill and Melinda Gates Foundation is providing matching funds to nearly every project on DonorsChoose.org. Since today is the last day of this “sale”, I figured it would be a great opportunity to continue what Drug Monkey started and help schools affected by the situation in Ferguson. Here are just four of the many projects I think are worth supporting:

  • Mrs. Randoll at Walnut Grove Elementary School needs writing supplies:

    Many of my students enter kindergarten without even knowing how to write their name. This year I want to make writing fun and meaningful for them by allowing them to publish their own books!

    The resources for this project will help my classroom because students will be able to share their thoughts and feelings daily through their writing. Publishing their writing will help engage them, create excitement for writing, and give them something they can share at home with their families.

    Learning how to write well is fundamental to learning how to communicate well, especially in this era where so much of our communication is text-based. So don’t you want to help this kindergarten class get their proper start?

  • Mrs. Baughman at Johnson Wabash Elementary School needs help with her library:

    I hope that creating a fun space to read and listen to stories will get them excited about all sorts of topics and encourage them to read more. My school is in a low-income area, so not every child has a stack of books waiting for them at home.

    The reading rug will provide a space for my K-2 students to come in and enjoy of fun story. Also, the bean bag chairs will provide some comfortable seating for all of my students to read independently. Currently, my library does not have any comfy seating.

    Browsing for hours at the local library and bringing home a stack of books to read was one of my favorite activities as a kid. Doing so certainly inspired my curiosity and set me down the path to my current science/engineering career. I suspect many of you have similar stories, so let’s give the kids of Johnson Wabash Elementary the same opportunity. After all, who wouldn’t like reading while sitting in a comfy bean bag?

  • Ms. Peach at Lee-Hamilton Elementary School needs a string bass:

    My students walk into my classroom excited about learning how to play a string instrument. Some students are trying out the instruments for the very first time; others have been playing for up to 3 years (since they started violin in 3rd grade), but they all are a talented group of musicians.

    Due to budget cuts, the one thing that’s missing from my orchestra is an upright string bass. I currently teach at 5 elementary schools and have only 1 bass (that is much too big for a majority of my students) available to my students to use.

    Playing the violin in orchestra was one of the great experiences of my middle and high school years. Music is a wonderful outlet for children, but an orchestra without a bass simply isn’t an orchestra. Shouldn’t we help Ms. Peach make her orchestra complete?

  • Mr. Brown at Cross Keys Middle School needs a 3D printer:

    I teach engineering and 3D modeling to middle school students. The students love using the computer software to digitally create 3D objects, but have no way to see them in real life. It would be great to have a 3D printer so that students can “print” their designs and see them come to life.

    [M]y focus is on 21st century skills (robotics, programming, 3D modeling, engineering). The students are excited about the program, and have lots of fun learning through building in my classroom.

    This is my “fun” suggestion and an opportunity I never had as a kid. Think of it as the modern replacement for woodshop—giving students an opportunity to design and build their own creations.

Of course, you are under no obligation to contribute to the projects I’ve listed. You could, for example, contribute to projects from other schools in the Ferguson area or from schools anywhere in the United States. Even if you can’t afford to give, you can help by spreading the word about Donors Choose (in general) and the generous but soon-to-expire offer from the Gates Foundation (specifically).

Thank you, Dear Readers, for all of your help!

Posted in Other

Using pip with an alternate CA bundle

After recently upgrading my OS X pip install, I began having problems using it to install Python packages; for example, attempting to install requests resulted in the following:

$ pip install requests
Downloading/unpacking requests
  Cannot fetch index base URL https://pypi.python.org/simple/
  Could not find any downloads that satisfy the requirement requests
Cleaning up...
No distributions at all found for requests
Storing debug log for failure in $HOME/.pip/pip.log

The debug log reveals that this is an SSL certificate verification problem:

Downloading/unpacking requests
  Getting page https://pypi.python.org/simple/requests/
  Could not fetch URL https://pypi.python.org/simple/requests/: connection error: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
  Will skip URL https://pypi.python.org/simple/requests/ when looking for download links for requests

Now, pip has an internal CA bundle, but for reasons I didn’t bother looking into, that bundle no longer worked for validating the TLS certificate used by pypi.python.org. Fortunately, pip has a “--cert” option for providing an alternate CA bundle.

Connecting to PyPI in a browser allowed me to examine the TLS certificate chain:
CA chain for pypi.python.org

This told me that the pip alternate CA bundle needed to include the DigiCert “High Assurance EV Root CA” and “High Assurance CA-3” certificates, both of which can be obtained from the DigiCert Root Certificates page.

One caveat is that pip expects its (alternate) CA bundle to be in PEM format, which means just downloading the certs isn’t enough. So…sh, wget, and openssl to the rescue:

$ wget -q -O- "http://cacerts.digicert.com/DigiCertHighAssuranceEVRootCA.crt" | \
    openssl x509 -inform DER -outform PEM > ~/.pip/ca-certs.crt
$ wget -q -O- "http://cacerts.digicert.com/DigiCertHighAssuranceCA-3.crt" | \
    openssl x509 -inform DER -outform PEM >> ~/.pip/ca-certs.crt
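
As a quick sanity check, it can help to confirm that both certificates actually landed in the bundle. The sketch below simply counts PEM certificate headers in a file; count_pem_certs is a hypothetical helper name of my own (it is not part of pip or openssl), and you pass it whichever bundle path you wrote to.

```shell
# count_pem_certs FILE: print the number of PEM certificates in FILE.
# Hypothetical helper for illustration; it only counts the BEGIN headers
# and does not validate the certificates themselves.
count_pem_certs() {
    # "--" stops option parsing so the leading dashes in the pattern are safe
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Usage (the path is whatever you redirected the openssl output to):
#   count_pem_certs /path/to/your/bundle.crt
```

For the two-certificate bundle built here, a count of 2 would be the expected result; anything else suggests one of the downloads or conversions failed.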

Once this alternate CA bundle had been created, I could once again use pip to install Python packages; e.g.,

$ pip --cert ~/.pip/ca-certs.crt install requests
Downloading/unpacking requests
  Downloading requests-2.3.0.tar.gz (429kB): 429kB downloaded
  Running setup.py egg_info for package requests

Installing collected packages: requests
  Running setup.py install for requests
    
Successfully installed requests
Cleaning up...

Postscript: To make life easier, I added the following to my ~/.pip/pip.conf to make sure pip always uses the alternate CA bundle:

[global]
cert = /home/chl/.pip/ca-certs.crt
Posted in Technology

Getting back into blogging

I recently realized that I haven’t updated this blog in more than seven months, and even before then, new posts were rather sporadic. While I certainly haven’t been lacking in things to say (see, for example, my activity on Twitter), I’ve avoided blogging due to a combination of (1) having other, higher-priority commitments and (2) being uncertain about the direction(s) I wanted my blogging to go.

However, after talking to others and giving it some thought, I decided I needed to get back into blogging for several reasons. First, while Twitter has helped me improve aspects of my writing (mostly in condensing thoughts and choosing words carefully), I’ve noticed that lack of use has caused my “long form” writing skills to degrade, and blogging seems like a great way to polish those skills again. Second, the longer format of blog posts provides a much better venue for developing and explaining “complex” ideas than a tweet storm. Finally, my startup work has really distracted me from graduate school, and I’m hoping that forcing myself to blog about PhD-related topics will get me back on track.

Going forward, I see two broad areas of focus for this blog. The first—based largely on my experiences at the startup—will be posts on programming and software development, covering topics like how to accomplish X in language Y or comparing various approaches to doing something [1]. The second, as mentioned above, will be PhD-related posts, essentially turning this blog into an open lab notebook; these will include research blogging papers I’m reading, describing methods I’m learning, and explaining my research ideas as I develop them [2]. And of course, I will continue blogging about other things, such as science communication, food, and (to a lesser extent now) politics.

So, to my few loyal readers and hopefully to some new ones as well, welcome (back) to my restarted blog!

[1] Inspired by Sebastian Raschka’s “One Python Benchmark Per Day”
[2] Inspired by various friends’ grad school blogs and “Becoming A Data Scientist”
Posted in Other

Standing with DNLee [updated]

[Updated Oct. 14th: As of this afternoon, Scientific American has restored DNLee’s blog post with an editor’s note on why the post was originally taken down (“…for legal reasons…”).

While I respect the position that SciAm’s editors were in and commend them for doing the right thing by restoring the post, I still think their management of the situation as it developed was misguided. The scicomm world has a fairly well-developed BS detector, and the mix of unresponsiveness and flailing for an explanation (“not about discovering science” and “too personal” before settling on “lawyers”) certainly set it off. However, the community is also reasonably patient, and I think a lot of the outrage could have been avoided had the editors just posted a simple “hey, this may cause us some legal problems, so we’re going to pull the post until we can sort it out” message from the get-go.]

Woke up this morning to find my Twitter feed in an uproar about Scientific American’s decision to take down one of its bloggers’ posts; as explained by SciAm’s editor-in-chief:
SciAm's tweets why DNLee's post was taken down

The short story behind this post so clearly not about “discovering science”? DNLee, the blogger in question, was asked by Ofek, the editor of biology-online.org, to guest blog for them; after asking about the specifics, she politely declined:

Thank you very much for your reply.
But I will have to decline your offer.
Have a great day.

In a brilliant rapport-building move, Ofek responded:

Because we don’t pay for blog entries?
Are you an urban scientist or an urban whore?

Um, yeah; calling someone “an urban whore” really isn’t the way to make friends. An understandably very unhappy DNLee decided to blog about the experience—except that link is dead now because, well, see DiChristina’s tweet above (see update). Needless to say, SciAm’s decision spawned furor in the online scicomm community, as evidenced by the floods of #StandWithDNLee/#StandingWithDNLee tweets in my feed this morning.

Granted, DNLee’s post wasn’t about a headline-grabbing new discovery. But I’d strongly argue that it is (or rather was) about “discovering science”. Effective science communication (i.e., the process of helping the public “discover science”) can’t simply be a stream of “hey look at this cool new thing scientists discovered!” articles; it also has to help people understand the process of how science is done, and that, unfortunately, also means exposing them to the uglier side of things, including the pervasive sexism that women in STEM fields face.

Following Dr. Isis’ lead, I’m now reposting DNLee in her own words. I encourage readers to also repost (with proper attribution!) and also hope that Scientific American blogs quickly corrects and apologizes for their mistake.


wachemshe hao hao kwangu mtapoa

I got this wrap cloth from Tanzania. It’s a khanga. It was the first khanga I purchased while I was in Africa for my nearly 3 month stay for field research last year. Everyone giggled when they saw me wear it and then gave a nod to suggest, “Well, okay”. I later learned that it translates to “Give trouble to others, but not me”. I laughed, thinking how appropriate it was. I was never a trouble-starter as a kid and I’m no fan of drama, but I always took this 21st century ghetto proverb most seriously:

Don’t start none. Won’t be none.

For those not familiar with inner city anthropology – it is simply a variation of the Golden Rule. Be nice and respectful to me and I will do the same. Everyone doesn’t live by the Golden Rule it seems. (Click to embiggen.)

The Blog editor of Biology-Online dot org asked me if I would like to blog for them. I asked the conditions. He explained. I said no. He then called me out of my name.

My initial reaction was not civil, I can assure you. I’m far from rah-rah, but the inner South Memphis in me was spoiling for a fight after this unprovoked insult. I felt like Hollywood Cole, pulling my A-line T-shirt off over my head, walking wide leg from corner to corner yelling, “Aww hell nawl!” In my gut I felt so passionately:”Ofek, don’t let me catch you on these streets, homie!”

This is my official response:

It wasn’t just that he called me a whore – he juxtaposed it against my professional being: Are you urban scientist or an urban whore? Completely dismissing me as a scientist, a science communicator (whom he sought for my particular expertise), and someone who could offer something meaningful to his brand.What? Now, I’m so immoral and wrong to inquire about compensation? Plus, it was obvious me that I was supposed to be honored by the request..

After all, Dr. Important Person does it for free so what’s my problem? Listen, I ain’t him and he ain’t me. Folks have reasons – finances, time, energy, aligned missions, whatever – for doing or not doing things. Seriously, all anger aside…this rationalization of working for free and you’ll get exposure is wrong-headed. This is work. I am a professional. Professionals get paid. End of story. Even if I decide to do it pro bono (because I support your mission or I know you, whatevs) – it is still worth something. I’m simply choosing to waive that fee. But the fact is I told ol’ boy No; and he got all up in his feelings. So, go sit on a soft internet cushion, Ofek, ’cause you are obviously all butt-hurt over my rejection. And take heed of the advice on my khanga.

You don’t want none of this

Thanks to everyone who helped me focus my righteous anger on these less-celebrated equines. I appreciate your support, words of encouragement, and offers to ride down on his *$$.


Posted in Science

Git: copying a subset of commits from another branch

Today’s git challenge was to copy a subset of commits from our current development branch (“dev” in the figures below) into our current release branch (“v2-hotfix” in the figures below). Graphically, we started with a repository looking like this:

... <-- A <-- B <-- C <-- D (v2-hotfix)
        ∧
        |                                   
        +---- Q <-- R <-- S <-- T <-- U <-- V (dev)

and wanted a repository looking like this:

... <-- A <-- B <-- C <-- D <-- R′ <-- S′ <-- T′ (v2-hotfix)
        ∧
        |                                   
        +---- Q <-- R <-- S <-- T <-- U <-- V (dev)

where R′, S′, and T′ are copies of the R, S, and T commits from the dev branch. The “obvious” use case for doing this is incorporating a set of dependent patches for fixing some bugs into our supported release branch (v2-hotfix).

Copying commits the easy way

The “easy” way of copying a sequential subset of commits is to use “git cherry-pick”. First, prepare the target v2-hotfix branch:

$ git checkout v2-hotfix
$ git stash
No local changes to save
$ git status
# On branch v2-hotfix
nothing to commit, working directory clean

Strictly speaking, the “git stash” is not necessary; however, before integrating patches from elsewhere, I like having a clean working directory to reduce the chance of merge conflicts. (In general, I find it easier to fix conflicts from a “git stash pop” than trying to resolve them when performing a merge or rebase; your mileage may vary.)

Next, use “git log” to get the commit ids for the patches of interest:

$ git log --oneline dev
f36a95a Message for commit V
19ece6a Commit message for U
58ee3ac Commit message for T    <- Want this commit...
11ea6f6 Commit message for S    <- and this one...
707ac6a Commit message for R    <- and this one
824e4b8 Commit message for Q
...older commit messages for dev branch...

Finally, use “git cherry-pick <from_id>..<to_id>” and the SHA-1 ids for the commits of interest to incorporate them into the target branch:

$ git cherry-pick 707ac6a^..58ee3ac
[v2-hotfix 1b42c27] Commit message for R
 .. files changed, .. insertions(+), .. deletions(-)
[v2-hotfix 1f69887] Commit message for S
 .. files changed, .. insertions(+), .. deletions(-)
[v2-hotfix 5141a8f] Commit message for T
 .. files changed, .. insertions(+), .. deletions(-)

$ git status
# On branch v2-hotfix
nothing to commit (working directory clean)
$ git log --oneline
5141a8f Commit message for T    <- Now also the message for T′
1f69887 Commit message for S    <- Now also the message for S′
1b42c27 Commit message for R    <- Now also the message for R′
6ffbf6f Commit message for D
...older commit messages for v2-hotfix...

The key here is that because git needs a reference commit from which to generate the diff for R (and thus R′), the <from_id> argument for “git cherry-pick” must be the parent of R, i.e., 824e4b8, or equivalently, 707ac6a^.
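
If you want to try the range syntax without touching a real project, the following sketch builds a small throwaway repository and cherry-picks R through T onto v2-hotfix. It is a simplified stand-in for the repository in the figures, not our actual setup: the demo identity, the commit helper, and the use of tags named after each commit (so that R^..T can be written without looking up SHA-1 ids) are all assumptions made for illustration.

```shell
# Throwaway repository; the identity, helper, and tag names are illustrative only.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name "Demo"
git -C "$repo" config user.email "demo@example.com"
git -C "$repo" checkout -q -b v2-hotfix

# commit NAME: add one file, commit it, and tag the new commit as NAME
commit() {
    echo "$1" > "$repo/$1.txt"
    git -C "$repo" add "$1.txt"
    git -C "$repo" commit -q -m "Commit message for $1"
    git -C "$repo" tag "$1"
}

commit A
git -C "$repo" checkout -q -b dev
for c in Q R S T U; do commit "$c"; done
git -C "$repo" checkout -q v2-hotfix
commit D

# R^..T selects everything reachable from T but not from R's parent: R, S, and T.
git -C "$repo" cherry-pick R^..T > /dev/null

# Newest first: the copies of T, S, and R should now sit on top of D.
git -C "$repo" log --format=%s v2-hotfix
```

Writing the range as R..T instead would leave R out, since the left-hand side of a range is excluded; that is the mistake the parent (R^) guards against.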

Copying commits the harder way

The alternative, slightly harder, method uses “git rebase” and requires an additional command to achieve the same effect. It was, however, a good opportunity to learn how to use the “--onto” argument for “git rebase”, so I’ll discuss the approach here.

The key command is “git rebase --onto <new_root> <old_root> <old_tip>”, where <old_root> corresponds to <from_id> in the previous approach, <old_tip> corresponds to <to_id>, and <new_root> is the branch we want to add the commits to. For the example repository in this post, we would run the following instead of “git cherry-pick”:

$ git rebase --onto v2-hotfix 707ac6a^ 58ee3ac
First, rewinding head to replay your work on top of it...
Applying: Commit message for R
Applying: Commit message for S
Applying: Commit message for T


$ git status
...output for git version > 1.8...
# HEAD detached from 6ffbf6f
nothing to commit, working directory clean
...output for git version < 1.8...
# Not currently on any branch.
nothing to commit, working directory clean

As with the cherry-pick approach, it’s important that the <old_root> argument refer to the parent of R so git can properly generate the commit R′.

Warning: be sure you don’t use a branch name instead of a commit id (or tag name) for the <old_tip>. Using a branch name will cause git to rebase (i.e., move, not copy) the branch and its associated commits as descendants of the <new_root> branch, which is definitely not the behavior we’re looking for here (refer to the git-rebase manpage for more details).

The output from “git status” tells us that we’re working in a “detached HEAD” state. In fact, the state of the repository after running “git rebase --onto” looks like this:

                          (v2-hotfix)         (HEAD)
... <-- A <-- B <-- C <-- D <-- R' <-- S' <-- T'
        ^
        |                                   
        +---- Q <-- R <-- S <-- T <-- U <-- V (dev)

where the HEAD pointer is on commit T′ while the tip of the v2-hotfix branch is still on commit D. To fix this, we use:

$ git checkout -B v2-hotfix
Switched to and reset branch 'v2-hotfix'

The “-B” tells git to reset the v2-hotfix branch to the HEAD commit. We can verify this using status and log commands:

$ git status
# On branch v2-hotfix
nothing to commit (working directory clean)
$ git log --oneline
5141a8f Commit message for T    <- Now also the message for T′
1f69887 Commit message for S    <- Now also the message for S′
1b42c27 Commit message for R    <- Now also the message for R′
6ffbf6f Commit message for D
...older commit messages for v2-hotfix...
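
The whole rebase-based sequence can also be rehearsed in a scratch repository. The sketch below is a simplified stand-in for the repository in the figures: the demo identity, the commit helper, and the tags named after each commit are assumptions for illustration. Using the tag T (rather than the branch name dev) as <old_tip> is what keeps dev untouched.

```shell
# Throwaway repository; the identity, helper, and tag names are illustrative only.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name "Demo"
git -C "$repo" config user.email "demo@example.com"
git -C "$repo" checkout -q -b v2-hotfix

# commit NAME: add one file, commit it, and tag the new commit as NAME
commit() {
    echo "$1" > "$repo/$1.txt"
    git -C "$repo" add "$1.txt"
    git -C "$repo" commit -q -m "Commit message for $1"
    git -C "$repo" tag "$1"
}

commit A
git -C "$repo" checkout -q -b dev
for c in Q R S T U; do commit "$c"; done
git -C "$repo" checkout -q v2-hotfix
commit D

# Replay R, S, and T (old root = R's parent, old tip = tag T) on top of v2-hotfix.
# T is a tag, not a branch, so HEAD ends up detached and dev does not move.
git -C "$repo" rebase -q --onto v2-hotfix R^ T

# Reset v2-hotfix to the detached HEAD and switch back onto the branch.
git -C "$repo" checkout -q -B v2-hotfix

git -C "$repo" log --format=%s v2-hotfix   # copies of T, S, R on top of D
git -C "$repo" log -1 --format=%s dev      # dev's tip is unchanged
```

Running “git status” between the rebase and the final checkout is a good way to see the detached HEAD state for yourself.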

So there you go: two different ways of copying a subset of commits from one branch to another in git.

Posted in Technology

R indexing trap

Even after nearly a decade working with it, one of the things that still catches me off guard is R's handling of zero-length index vectors. Specifically, I use this pattern in my code quite a bit:

# find elements to remove
idx = which(set.of.removal.conditions);

# now actually remove stuff
v = v[-idx];
# ...or...
d = d[,-idx];

and unfortunately, I often forget to wrap the “remove stuff” part in a conditional that verifies idx is not empty. As written here, if the removal conditions aren’t met (i.e., idx is empty), then -idx is empty as well, so R happily makes v and d zero-length and zero-column, respectively. Realizing that this is what happened isn’t always easy because (1) failing to meet any of the removal conditions is a pretty rare occurrence, and (2) d might not be used in a way where being zero-column is a problem until much later in the script, which makes tracing the problem back to this line a lot harder.

Now, the “foolproof” way of dealing with this is to remove the which(...) statement and leave idx as a logical vector. However, I don’t like doing that if I have to reuse idx, which often has a large number of elements of which only a few are TRUE (seems a little “wasteful”).

My fundamental objection to this behavior is from a language design perspective. It makes sense to me that if idx is zero length, then “v[idx]” should also be zero-length. However, since in “normal” circumstances, “v[-idx]” means “give me all the elements of v except those with the indexes in idx”, I feel that for consistency, the same expression evaluated for an empty idx should return all of v, or at the very least, raise an exception (“are you sure you want to negate an empty vector?”).

Unfortunately, this quirk in semantics seems very fundamental to the R language, and a fix isn’t likely to happen. So, I suppose, it’ll just have to be another one of those things you learn to live with.

Posted in Technology