Sources of error, “Q”, and the problematics of DIY statistical analysis.

Posted: September 28th, 2009 | Author:

The CBECS PDFs note that for all values where “Q” appears, “Data [was] withheld because the Relative Standard Error (RSE) was greater than 50 percent, or fewer than 20 buildings were sampled.” Values needed to complete our algorithm, while quantifiable when parsing the public “micro data”, are often missing from the published tables and tagged with “Q” instead. Humbly, this is but one source of error, compounding the others listed here, in our final numbers.
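For anyone parsing the micro data themselves, the “Q” rule quoted above can be sketched in a few lines. This is a minimal illustration, not the EIA's own procedure: the function names are my own, and the RSE is taken in its usual sense of the standard error expressed as a percentage of the estimate.

```python
def relative_standard_error(estimate: float, standard_error: float) -> float:
    """RSE as a percentage: the standard error relative to the estimate."""
    if estimate == 0:
        raise ValueError("RSE is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)


def is_withheld(rse_percent: float, buildings_sampled: int) -> bool:
    """Apply the quoted 'Q' rule: RSE above 50 percent, or fewer than 20 buildings."""
    return rse_percent > 50 or buildings_sampled < 20


# Hypothetical example values, not taken from any CBECS table.
rse = relative_standard_error(estimate=200.0, standard_error=50.0)
print(rse, is_withheld(rse, buildings_sampled=25))
```

A value computed from the micro data that fails this check is exactly the kind of number the published tables replace with “Q”, so it should be treated with matching suspicion.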

1.) Building as an effective whole, or merely the sum of its parts?

As established in earlier posts, there is no ‘public meter’ at the scale of a single building for a building such as this: a mixed-use conglomerate, which I would conclude is difficult to understand as a singular entity, both phenomenologically and logistically. Methodologically, therefore, it is necessary to measure, assess, and calculate a reasonable estimate for each individual establishment, and to total them as parts of a whole. While this is the only methodology accessible to us, it neglects the potential, shall we say, ‘gestalt’ effect: that consolidating the establishments into an entire ‘building’ may mitigate their combined consumption.

2.) How accurate is Google measurement?

As noted in the methodology post, we are having to infer the function and dimensions of establishments from Google “Street View” and “My Maps”. This is combined with information from the building’s CO, and finally the semantics are converted into CBECS’s terminology. I have queried the Google Maps forums concerning a standard deviation for their measurement tools, to no avail (could it be that if such information were public, Google could be held liable to certain standards?). We are relying on these numbers without having a way to verify their credibility. Furthermore, the dimensions of street-level establishments can only be measured along their street frontage. For businesses on the corners this gives us 2 reliable dimensions; the depth of most of the other units, however, must be guessed at. Ultimately, our estimates of this type are capped so that a unit’s depth does not exceed the perpendicular, or depth, dimension of the corner unit. In most cases, that dimension seems too large to be the internal dimension of the smaller establishments. We eventually settled on a roughly 25′ x 25′ square for these, though this is completely arbitrary other than seeming reasonable.


Various Realtor websites boast of a gym, atrium, and other recreational facilities. This issue primarily concerns the 1st through 7th floors of the building, which to varying degrees occupy its entire footprint (floors 8 through 27 are one of 4 towers, and according to the CO, are only occupied by residential units). Because there is no external sign of the aforementioned amenities short of this (I wish that “Bing”, with its monopoly on “bird’s eye view”, allowed the same embedding permissions as Google), which confirms their potential whereabouts and dimensions, it is ultimately impossible to tell which is which, and exactly how much square footage these amenities occupy.

Our reasonable estimate is based upon the positioning of skylights evident in the aerial photography, and on the presumption that residential units on the 6th and 7th floors occupy the outer perimeter of the building. Also, the average square footage of a given unit is ≈ 900 square feet, and the building CO designates 50 “dwellings” to the 6th floor, and 59 to the 7th. The same document also designates the 6th floor as “Apartments”, “Laundry Room” and “Health Club”, while the 7th is designated as only the former 2. The correlation makes sense: more amenities on the 6th floor, fewer apartments. However, if the building’s total footprint for these floors is ≈ 64,000 square feet, and the average dwelling size, including walls, is ≈ 1,000, the 6th floor’s 50 dwellings take up ≈ 50,000 square feet, which leaves about 14,000 square feet for a gym and laundry room. Strikes me as a big gym / laundry room. Unfortunately, few means to verify.

We should also consider the architectural precedent of subtracting 2.5 to 5 percent of a building’s total footprint to account for interior walls. Therefore, total square footage is automatically subject to a margin of error between 2.5 and 5 percent.
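The arithmetic behind that suspicion is simple enough to write down. The sketch below uses only the rough figures quoted above (a ≈ 64,000 square foot footprint, ≈ 1,000 square feet per dwelling including walls, 50 and 59 dwellings on the 6th and 7th floors); the function names are my own, and the numbers are estimates rather than measurements.

```python
FOOTPRINT_SQFT = 64_000          # approximate footprint of floors 6 and 7
AVG_DWELLING_SQFT = 1_000        # approximate dwelling size, walls included
DWELLINGS_PER_FLOOR = {6: 50, 7: 59}  # counts from the building's CO


def leftover_sqft(floor: int) -> int:
    """Square footage not accounted for by dwellings on the given floor."""
    return FOOTPRINT_SQFT - DWELLINGS_PER_FLOOR[floor] * AVG_DWELLING_SQFT


def wall_adjusted_range(total_sqft: float, low: float = 0.025, high: float = 0.05):
    """Usable area after subtracting 2.5 to 5 percent for interior walls."""
    return total_sqft * (1 - high), total_sqft * (1 - low)


for floor in (6, 7):
    print(f"Floor {floor}: {leftover_sqft(floor):,} sq ft unaccounted for")

low_est, high_est = wall_adjusted_range(FOOTPRINT_SQFT)
print(f"Usable footprint: {low_est:,.0f} to {high_est:,.0f} sq ft")
```

By the same estimate the 7th floor, with no health club, still has about 5,000 square feet left over, which suggests a fair share of the remainder on both floors is corridors, shafts and the like rather than gym.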

4.) Bureaucratic Lag

I have a question: if consumption data (i.e. the numbers on the bill every month, which are a considerably detailed analysis thereof) were “public”, wouldn’t the EIA, and therefore individual citizens, have no trouble knowing the information that this project aims to produce? The implication of CBECS, the “S” standing for “Survey”, is that this information must be volunteered by the establishments’ proprietors. The most recent CBECS data published is from 2003. The most recent from RECS: 2005. The current CEQR technical manual tables are based on information from the CSWMP published in 2006. While we plan to account for the differences between these years and the present by way of weather, market status, and terror alert levels, the pace at which the information we are searching for becomes “public” (i.e. is processed and made accessible to the general public by public officials) is grossly behind real-time.

5.) Multi-dimensional consumption per 2-dimensional space

CBECS and RECS are, of course, subject to their own error (remember “Q”?). Additionally, these numbers are generalized averages of consumption per standard unit of architectural space (square feet). While this attribution is convenient, it is also, of course, reductive in that it leaves out a significant set of determinants. One could easily ask: what about an atrium that is 4 floors high? Would not this space consume, by way of static conjecture, 4 times as much as the equivalent square footage in a space that is only one floor high? Incidentally, CBECS differentiates between mall and non-mall retail space. Could it be that malls are mostly atrium space, compounded by the fact that retail and atrium are highly permeable in this case?
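To make the atrium conjecture concrete, here is a minimal sketch of what weighting square footage by height would look like. It is only the naive “static conjecture” posed above, not anything CBECS does, and the function names and the idea of a per-square-foot intensity passed in by the caller are my own assumptions.

```python
def height_weighted_sqft(floor_area_sqft: float, height_in_floors: float) -> float:
    """Treat a tall space as if each floor of height added its own floor area."""
    return floor_area_sqft * height_in_floors


def estimated_consumption(floor_area_sqft: float,
                          intensity_per_sqft: float,
                          height_in_floors: float = 1.0) -> float:
    """Consumption = per-square-foot intensity x height-weighted area.

    intensity_per_sqft is whatever per-square-foot average the caller
    supplies; no real CBECS figure is assumed here.
    """
    return intensity_per_sqft * height_weighted_sqft(floor_area_sqft, height_in_floors)


# A hypothetical 1,000 sq ft atrium, 4 floors high, versus a single-story room.
atrium = height_weighted_sqft(1_000, 4)
room = height_weighted_sqft(1_000, 1)
print(atrium / room)
```

Whether real consumption scales with volume this neatly is exactly the open question; the sketch only shows how much a flat per-square-foot average could be hiding if it did.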

6.) Interface Service Provider (we’re for profit)

Hoovers will sell you, at a minimum of $159, a business profile (an example) for the Best Buy Mobile at 2 Union Square East, occupying the southwest corner of the Zeckendorf Towers. The profile contains detailed information on the business’ viability, number of employees, viability of the area and region, demographics served, etc. Given the information from the CEQR and the CSWMP, were we to know the number of employees per commercial establishment in the towers, we would be able to reasonably estimate the weight of solid waste produced by the whole building. However, there seems to be no readily available source that indicates these numbers. Our best guess is that Hoovers extracts its information from FOIA requests to the IRS, New York State Dept. of Labor, etc., contextualizes this information in its own interface, and sells it. Type “Public Information about Businesses” into a Google search field, and it will return a plethora of such “services”.

In conclusion, there isn’t a convenient source that totals employees for a specific business. It is relatively easy to find out, for example, that Starbucks has about 16,000 franchises worldwide and about 176,000 employees, for an average of ≈ 11 employees per franchise (presumably fewer actually clerking the stores, to account for corporate office workers). But while Starbucks is convenient because of its ubiquity, for smaller operations, whether it be “Au Bon Pain” on the mid scale or “Tower Cleaners” on the independent level, virtually no record exists in any easily accessible form.
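For what it is worth, the Starbucks back-of-envelope figure, and the waste estimate it would feed into if we had a headcount for every establishment, can be sketched as follows. The function names are my own, and the per-employee waste rate is deliberately left as a parameter: it would have to come from the CEQR / CSWMP tables, and no value is assumed here.

```python
def average_employees_per_store(total_employees: int, total_stores: int) -> float:
    """Company-wide average, which overstates in-store staff (it includes corporate)."""
    return total_employees / total_stores


def estimated_waste(employees_by_establishment: dict, waste_per_employee: float) -> float:
    """Total solid waste = sum of employees x a per-employee rate.

    waste_per_employee must be supplied by the caller, e.g. from the
    CEQR technical manual; it is not hard-coded here.
    """
    return sum(employees_by_establishment.values()) * waste_per_employee


# Figures quoted in the post: about 176,000 employees across about 16,000 stores.
starbucks_avg = average_employees_per_store(176_000, 16_000)
print(starbucks_avg)
```

The weak link is not the arithmetic but the dictionary of employees per establishment, which for most tenants of the towers we simply cannot fill in.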

7.) Last but not least, the ambivalence of “DIY”

While the ethic is lauded as a creative micro-revolution amongst young progressives, is it not an analogous impulse, vis-a-vis information and analysis, that compromises certain institutions upon which we depend (say, newspapers), potentially, and quite ironically, empowering certain undesirable elements of the status quo? Regardless, truth has always been up for grabs.

Final Accuracy

Because our calculations rely on the aggregation and correlation of disparate sources of information, each bringing its own unquantified probability of error, there is no way of calculating a standard deviation from the back end. The most statistically rigorous conclusion we can offer is that we will either be really close, in the ballpark, or completely off. Considering that you’re probably not even reading this, I will smugly remind you that after a career’s worth of research on food, the best thing Michael Pollan can tell you to do is “eat food”, and after an hour and a half of highly polished statistics, the best thing Al Gore can tell you to do is “use energy efficient light bulbs.” So there.
