
That’s What She sed: !awk Lessons From Fun[ctional] Programming

Somewhere at the intersection of unexpected genius, linguistic mastery, and femininity there’s a trope of compelling film/fiction that goes something like this: a character (ideally a woman or weakling) speaks a language that no one expects and suddenly reveals a competency or comprehension that strengthens his or her position, provides some comedy, or drops a beat of provocative timing. This kind of surprising exolingual + monolingual situation is common and interesting. I’m thinking of Daenerys Targaryen speaking Valyrian, or Nancy Travis speaking Russian to her catcallers in So I Married an Axe Murderer, or that scene in The Goonies when Corey Feldman, a child, gives the maid surprising instructions in Spanish, or those times on the subway when I can tell who the French women next to me are gossiping about and giggle to myself at the semantic secrets I’m privy to by virtue of closet bilingualism. It’s a common and compelling scene, and not one wholly relegated to spoken tongues; it has its echoes in computational languages too.

Unexpected fluency in a programming language is fascinating. There’s still a fair amount of surprise that accompanies any woman speaking intelligently at a tech conference, or a child-ish programming prodigy who sells his company at 18 and enjoys wild and precocious success. With that in mind, I recently decided to explore a few languages I had little experience with, if only to investigate their utility and build up some surprising and cinematic tech cred of my own.

Ontology Web Language? (http://www.w3.org/2001/sw/wiki/OWL)

Informing this, a recent and short tumble into the land of Game of Thrones led me through the Wikipedian labyrinth to the LCS (Language Creation Society), a linguistic nonprofit of language constructors (conlangers) who build constructed languages (conlangs) in the vein of Klingon in Star Trek and Dothraki in Game of Thrones. My tangent into a trope sparked some curiosity about how we define computer languages, how we use them thereafter, and the authority of the inventors of these languages.

Like other languages, computational tongues are often indexed by stereotypes. But unlike spoken conlangs, which have evolved to express a multiplicity of (in)translatable nuances, CSlangs are often more objectively restricted by a purpose: they are developed not to express all of the things but to accomplish a task. Valyrian is “the only language for poetry,” while Dothraki is harsh and guttural like its speaker population; SQL is a “special-purpose” query language, Objective-C is a “general-purpose” object-oriented language, Visual Basic is the “most-WTF-y” language. But even within these stereotypical distinctions, coders contend about what these adjectives might mean, and who is best suited to speak these languages in such-and-such a situation. And presumptions about personality align to these -types as well: women, being generally lovely and fluffy, are unlikely to speak brutal and ugly bash shell script… they should be front-end programmers, because pretty, and easy. 😦

And despite this, I’ve been investing a bit of time in the prelims of every data project, somewhere between scoring a raw pile of data and shaping it up for a visualization, always accomplished via some language/library. New projects and experiments always make me wonder if there’s a better library, plugin, resource, or language to articulate my objectives and otherwise get me the results I’m after. In this case, those involve a bit of OCR and semantic analysis, batch processing, cleaning, and file pruning, where language all-around is pretty important. For this round of tech adventures, I settled on sed, but I’m sure the operations I’ll be performing in this post could be accomplished just as well in other languages. What he sed. Further notes on my actual adventure can be found here, but as a quick suite of examples, say you have a batch of files whose extensions you need to change:

[Screenshot: the original directory listing]

You can do this with textutil, the text utility that ships with macOS:

[Screenshot: the textutil command]

This converts all .docx files to .txt in a given directory (ignore the bogus .pdf dud).

[Screenshot: the converted .txt files]
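
The commands above survive only as screenshots, so here is a minimal sketch of the same conversion step in plain sh. It assumes macOS, where textutil ships with the system and by default writes report.docx out as report.txt next to the original. The helper names txt_name and convert_dir are illustrative; they are not taken from the screenshot.

```shell
#!/bin/sh
# Sketch: convert every .docx in a directory to plain .txt.
# Assumes macOS, where textutil ships with the system.
# txt_name and convert_dir are illustrative helper names.

# Print the .txt path that corresponds to a .docx path (sed does the swap).
txt_name() {
  printf '%s\n' "$1" | sed 's/\.docx$/.txt/'
}

# Convert each .docx in DIR (default: current directory),
# skipping any file whose .txt counterpart already exists.
convert_dir() {
  dir=${1:-.}
  for f in "$dir"/*.docx; do
    [ -e "$f" ] || continue          # no matches: the glob stays literal
    out=$(txt_name "$f")
    if [ -e "$out" ]; then
      echo "skip: $out already exists"
      continue
    fi
    textutil -convert txt "$f" && echo "converted: $f -> $out"
  done
}
```

After sourcing the file, running convert_dir on a folder (say, a hypothetical ~/clippings) converts everything in it. Files that already have a .txt twin are skipped, so a second run is harmless.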

Then, say you need to restructure file names in a directory so that you can sort them, as I wanted to by date, but your current file format is something like this:

23NY080214.txt, or ##NY + DDMMYY + .txt

You can reorder characters in a set of files by running a sed script like this:

[Screenshot: the sed reorder script]

This tells sed to read the file name character by character (in the pattern, each “.” stands for any single character), capture the pieces in parenthesized groups, and then write those groups back out in the order listed at the end of the line (\4\3\2\1), where 4=YY, 3=MM, 2=DD, and 1=23NY. The * makes that reordering apply to each .txt file in the directory.

[Screenshot: the renamed files]
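
The script itself is also only a screenshot, so here is a minimal reconstruction of the same idea. It is not necessarily the exact command in the screenshot. It assumes every name looks like 23NY080214.txt (two digits, two capital letters, then DDMMYY) and uses sed -E, which both GNU and BSD sed accept. The names reorder_name and rename_all are illustrative.

```shell
#!/bin/sh
# Sketch: rename ##NYDDMMYY.txt files to YYMMDD##NY.txt so that an
# alphabetical listing is also a chronological one.
# Assumed name shape: 2 digits + 2 capital letters + DDMMYY + .txt.

# Print the reordered name. Names that don't fit the shape come back
# unchanged, because the substitution simply doesn't match.
# LC_ALL=C keeps [A-Z] meaning only the ASCII capitals.
reorder_name() {
  printf '%s\n' "$1" |
    LC_ALL=C sed -E 's/^([0-9]{2}[A-Z]{2})([0-9]{2})([0-9]{2})([0-9]{2})\.txt$/\4\3\2\1.txt/'
}

# Apply reorder_name to every .txt file in the current directory.
rename_all() {
  for f in *.txt; do
    [ -e "$f" ] || continue            # no .txt files at all
    new=$(reorder_name "$f")
    if [ "$new" != "$f" ]; then
      mv -n -- "$f" "$new"             # -n: never overwrite an existing file
    fi
  done
}
```

Renamed files start with YYMM, which no longer fits the two-digits-two-letters pattern, so running rename_all twice leaves them alone. YYMMDD sorts correctly only within a single century, which is fine for a few years of clippings.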

None of these applications is really what sed was “made for,” but I found them pretty satisfying uses of the language for my immediate needs. Taken together, all this got me thinking about linguistic development and about the “meta”-languages of programmatic thinking: the classes and cases of computational articulation that lead us toward fluency in one or more languages, preference, and eventual specialty in the operations most suited to that lexicon.

While living on a continent with 3,000+ spoken languages, pidgins, and regional dialects, I also started thinking about how the diversity of computer languages compares to other paroles of parlance, and how our systems for organizing and inventing new tongues might best map to each other for optimal productivity. There are rough guides for this kind of crosswalk: expected hierarchies, rankings, paradigm comparisons, and schemes of which languages are appropriate for the most hardcore hackers (see also the “Real Programmer” fallacy).

But to redirect the conversation to a more critical and less subjective breakdown, it seems appropriate to consider the semantics of not just the language itself but also its classification schemas when assessing its flexibility and purpose. One of the beautiful things about objectively breaking down languages by purpose is that they can be ranked according to their flexibility and utility, their merits, rather than by subjective judgments about their syntax. As with most anything in code, bash, or whatever scripting, part of the learning process is absorbing typical commands, and the rest is playing with how to pair them for more complex operations (roughly: what commands are possible and how to link them). Snooping through Stack Overflow can usually get you pretty far on the first. The second comes later, when repeated compartmentalized operations become exhausting and your frustration has driven you to invest in some serious study or thought on how to most efficiently arrive at your goal.


For this project, I selected sed because I’d read about its utility for my purposes. I’ve got several years’ worth of newspaper and journal data to convert from various file formats to one, and then rename in a batch before diving into the actual contents for cleaning and reformatting. Sed seemed appropriate for this. I could probably do it in Python or bash or JS or somesuch, and maybe someone has already built an online GUI that automates all this… but I was looking for something that worked and something new to learn, a new dialect to surprise myself with. I felt stupidly proud when surprising others with this workflow and earning the ‘hacker’ merit badge du jour at work, but I didn’t choose it to be cool; I chose it because it fit my needs. Sed is simpler than awk and Perl, syntactically and performatively, yet it still provides a variety of text-processing and regex operations and suits most things I would need in combination with other commands. I’m still at the ‘hello world’ stage with some of the magic of stream editors, but sed had some pun promise for the title of this post, so I thought I’d go with that and see how far I could get with the operations I wanted to perform.
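
For the cleaning that comes after the conversion and renaming, a few chained sed expressions cover a lot of ground. This is a minimal sketch. The three rules are assumptions about what OCR'd newspaper text typically needs, not a record of the actual cleanup, and clean_text is an illustrative name.

```shell
#!/bin/sh
# Sketch: tidy OCR output read from stdin, line by line.
#   1. strip trailing whitespace
#   2. squeeze runs of spaces/tabs into a single space
#   3. drop lines that end up empty
# The rules are illustrative assumptions, not a fixed recipe.
clean_text() {
  sed -e 's/[[:space:]]*$//' \
      -e 's/[[:blank:]][[:blank:]]*/ /g' \
      -e '/^$/d'
}
```

It reads stdin and writes stdout, so it slots into a pipe or a redirect, e.g. clean_text < 14020823NY.txt > 14020823NY.clean.txt.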

And this is where I started thinking that perhaps there are other language paradigms to adapt for this purpose. Taking tips from symbolic and declarative languages might be useful, if only conceptually. I’d like to type in my desired output and let the language fumble through the mechanics of its implementation. When I’m in SQL, select-from-where’ing, I’d like to sed-ify that operation for data cleaning: Select *.csv from _ directory where _[date].csv. In researching and polling friends about additional “SQL-ish” (pronounced “squish,” please) languages, I came across a few interesting features that I have yet to test in practice but that seem like pretty cool operations to incorporate in a meta-sed lang.
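
The “Select *.csv from _ directory where _[date].csv” wish can be approximated with a shell glob plus a regex test. Here is a minimal sketch. It assumes a hypothetical helper, select_from DIR GLOB REGEX, whose name and argument order are invented for illustration. The helper prints the paths in DIR that match GLOB and whose file names also match REGEX.

```shell
#!/bin/sh
# Hypothetical "squish" helper (name and signature invented here):
#   select_from DIR GLOB REGEX
#   ~ SELECT GLOB FROM DIR WHERE file name MATCHES REGEX
select_from() {
  dir=$1 glob=$2 where=$3
  for f in "$dir"/$glob; do          # $glob left unquoted so it expands
    [ -e "$f" ] || continue          # nothing matched the glob
    name=${f##*/}                    # file name without its directory
    if printf '%s\n' "$name" | grep -Eq -- "$where"; then
      printf '%s\n' "$f"
    fi
  done
}
```

With the YYMMDD names from earlier, for example, select_from . '*.txt' '^14(0[1-6])' would list the clippings from January through June 2014.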

In the past, and via Wikipedia, I’ve heard “declarative” applied to XSLT. Your blocks of code are statements, declared like: “when you get to {this} w/ property {that}, do {these things}.” You can declare them in any order and they will run in the appropriate sequence. However, is XSLT “declarative” according to all definitions? Diving further down the language-research rabbit hole had me questioning more of what “declarative” means in this context. Despite the overwhelming arguments you can get yourself into when defending the merits of one computer language over another, the terminology used to refer to different programmatic concepts and classification schemas can be vague, misleading, and largely unhelpful if you approach it as a foreigner, with other linguistic fluencies influencing your translations. The term “declarative language,” for example, can mean “non-procedural,” but that label also fits other language styles. The author of the article linked above uses “where you declare…” to define his term “declarative language.” With XSLT, you write blocks of procedural code that are called in reaction to something in the source document and are otherwise unlinked from the calling procedure (“where you declare…”).
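
sed scripts have a similar “when you get to {this}, do {these things}” shape: each rule pairs an address (a line number or /regex/) with a command. Here is a minimal sketch with two invented rules. The first drops lines that are only a page number, and the second marks all-caps headline lines with "## ". Unlike the XSLT description above, sed tries its rules on each line in the order they are written, so here the order does matter.

```shell
#!/bin/sh
# Sketch: two illustrative address -> command rules, tried on every line.
#   /^[0-9][0-9]*$/ d           line is only digits: delete it (and stop,
#                               so later rules never see that line)
#   /^[A-Z][A-Z ]*$/ s/^/## /   line is all caps: prefix it with "## "
mark_lines() {
  LC_ALL=C sed -e '/^[0-9][0-9]*$/d' \
               -e '/^[A-Z][A-Z ]*$/s/^/## /'
}
```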

Lots of front-end and web programming languages pretty much fall into this category: small blocks of code linked to a user interaction or operation (onClick listen –> then run {this}). The author features a bunch of interesting language paradigms, like concatenative languages. There are also other, now (perhaps) obsolete meta-languages that address these concepts with more flourish, and in many cases with the same hiccupy classification semantics that can obscure their utility. What about languages made to describe algorithms: APL-ish tongues with general and placeholder operators, “reduction” operators that apply a function between successive members of a vector, and right-to-left execution sequencing? Or what about REXX, a scripting language that concatenates two terms when they are simply juxtaposed (a blank between them acts as an operator and keeps one space) as well as with an explicit ‘||’ operator? Even the semantics of concatenation have been through debates about the appropriateness of the term: “co-chain” vs. just catenate (“chain”).

Both of these “conlangs” seem to require quite a bit of syntactical adjustment, but they have features I’ve never seen echoed in other languages. And still, the point is that no one remembers these syntactical idiosyncrasies; languages are remembered for what operations they perform and how well. Our memories are operation-oriented, perhaps not solely focused on syntax. Are these lexicons appropriate for high poetry, or are they guttural and direct? What do they evoke, and how do they surprise?

Plus, I’m wondering if I even understand how to appropriately use and manipulate a language when I’m not sure how to best describe it. Taking a page from my spoken fluencies, those languages that I know best and feel most comfortable using in practice are always those whose grammar and constructs I can explain and justify with greatest ease. There’s little mastery in the unwritten blundering I do in Swahili or Creole, though I’ve spent serious time in places where they were spoken; English and French, the product of formal study and informal fumbles, I totally own like whoa.


In programming there are declarative and imperative paradigms; likewise, there’s an imperative mood (expressing commands) in most spoken/written languages. One might read Dothraki or Klingon, a brutal class of conlangs, as particularly “imperative” in their ‘commanding’ manner and unapologetic guttural articulation. But what might declarative mean? Do many people know? The internet suggests not. As per uszhe, everyone has his own definition; disambiguation + citation needed, Wikipedia, hint hint.

So what’s the best language to communicate what we want, when the writing about languages is indecisive and muddled? Probably, and unsurprisingly, the language you speak best. True masters can adapt languages to their purpose, but most still recognize that CS languages are freighted with an intention, and this limits their applicability to all situations. The ambiguity of classifications like “declarative,” and of the other terms that get applied to languages and restrict their adoption, crumbles when you consider languages for their ideal operations rather than their syntax or semantics. What is the purpose of the language? How do you absorb its typical commands, and how do you pair them for more complex operations? Operation-oriented language selection (Ruby is good for… and …) rather than grammaticentric selection (Ruby syntax is “bloated and confusing”) might be the best approach for study. It respects the romantic tropes of surprise and pushes you to build a vocabulary based on the declared objectives of your goal, rather than the pretense of some predefined language hierarchy.

That, appropriately, perhaps unsurprisingly, is what she sed.

Note: I like to alliterate my titles, so if you thought this would be a post about functional programming and are now disappointed, you should check out my friend Jonathon’s post on functional programming, coming out in Smashing Magazine at some point in the soon, or this explanation series, which is fairly brill IMHO.

If you wanted more stream editing and shell scripting, some resources you might enjoy are this one, and for awk reading (the best!), this one.


New Economies of Innovation: Value the Tacit, Trash the Tangible

This is a blog post about economies of technology. It’s long, so let’s start with three concept anec-quotes and work our way to a series of bracketed themes: innovation + enterprise.

# Innovation

In a February 2013 interview with Wired, Larry Page (Google co-founder) commented on Google X and paths to innovation:

When I was growing up, I wanted to be an inventor. Then I realized that there’s a lot of sad stories about inventors like Nikola Tesla, amazing people who didn’t have much impact because they never turned their inventions into businesses.

Feb. 2013, Steven Levy, 7 Massive Ideas that Could Change the World

Let’s ignore that Tesla was slighted in any way as yet another “inventor” who lacked “impact” (WTF), and proceed. This comment led me to ask whether we need to monetize to achieve, and how we create healthy economies for qualities as ill-defined as “innovation” or “integrity.” Maybe innovation alone is an opal, not a diamond: beautiful and valuable to be sure, but unless someone contrives rarity or an economy around it (ahem, De Beers), it won’t be nearly as rad. So, can we build a business on intangibles and “values” that as yet have no monetary equivalent?

# Enterprise

Suketu Gandhi comments on this in The Wall Street Journal’s Deloitte Insight, defining the “postdigital enterprise” as one where innovators can either “take your existing processes and apply these new technologies to them,” or rethink the process that technology enables you to enact. In contemporary (apparently “postdigital”) enterprise, maybe the application of technologies to process gives innovation economic weight. Do we need business process to innovate, and what do we value in a digital world where lots of interactions and transactions lack the physicality of “real” life? Gandhi also cited “the big five disruptive technologies,” three of which struck me as strangely nebulous, not so much ‘technologies’ as vague ‘values’ of interaction: “social,” “mobility,” and “cyber security.” The ability to be social, mobile, and secure seemed to bleed outside the bounds of “technology” as I would typically define it, and to venture into the fuzzy region of human interactions and freedoms in the physical world. How do we monetize these, and should we?

# Monetization

To that end, Ecologies of Knowing blogger Pavel asserted that “much of the ubiquity of computing today is of course driven by opportunities to monetize social interactions and shifts in cultural perception.” As a software architect, I get paid to build things that have no physical product; my work is as intangible as the concepts whose value I’m now interrogating. While part of me is proud that so much of my life is “priceless,” part of me is a bit distressed that I haven’t founded a business on the obscure intangibles and important aspects of my life. How can we re:define an economy to appropriately capture what we value? Can we bank on innovation, social mobility, and security without building an enterprise? Or do ideas lack value when they lack an emphasis on economy?

Taken together, all of these anec-quotes coalesce in the topics at hand for this blogpost: bitcoins, cultural [in]security currency, innovation ecologies/economies, and basically banking on intangibles over bills. Let’s treat each in turn.

## Bitcoin to Begin

A few weeks ago, I hosted a Stereo Semantics radio show about new forms of banking. I’m interested in the development of independent economies, new currencies of exchange appropriate for our internet and IRL environments. Part and parcel of this obsession is my newfound interest in Bitcoin. Given the consistent popularity of Bitcoin in contemporary media, I’ve built a short URList (my new favorite OSStartup) on the topic.


To take it further, and more topically, a recent NY Times article covered Bitcoin’s forays into governmental policy and its progress toward legitimacy via exchange-traded funds.

The Times tempered the topic judiciously with an explanation of Bitcoin, and my URList includes a series of past and real-time updated publications/interactives focused on the topic. IRL, I’ve attended a few meetups on Bitcoin startup philosophy, and I can submit from my cursory exploration that the Bitcoin ecosystem is pretty nascent and warbly in the real world, even now, long after its debut. It’s hard to codify what conditions and cooperation merit my financial “trust,” but I find that most startups built on Bitcoin fall into a specious category, one less traveled by other landscapes of internet innovation.

## [In]security Currency

So, in prefacing with this artificial currency of contemporary fascination, I started thinking about other domains where potential economies could be crafted. I found that defining values like “trustworthiness,” “integrity,” and “security” also meandered in a nebulous and ill-articulated part of my consciousness. A recent MoMA PS1 panel discussion on Privacy and [National] Security further forked this thought to consider a slurry of “rights” billed to US citizens but now in question in a post-PRISM world. What do we value? What are the intangible freedoms that form the substrate of our cultural currency? Services like Highlig.ht and Sitegeist would suggest that we value proximous information over privacy. In promotional material, the former markets itself as a “sixth sense for the world around you, showing your hidden connections, and making your day more fun.” The latter bills itself (ha) as “the app present[ing] solid data in a simple at-a-glance format to help you tap into the pulse of your location.” Sounds exciting: discovering a secret garden of semiotics and site-specific information? How exhilarating! Until a third party starts tracking it, and determines your habits, patterns, behaviors, your prospective memories, your potential to commit thoughtcrime… so how do we balance an interest in information with a right to resist being polled? Right now, we don’t.

A recent app built by OpenDataCity in Germany for a local conference tracks population movements in a time-series visualization hosted here and blogged about here. ODC’s sensors detected passive interactions with mobile devices on the conference floor via each device’s unique MAC address. The visualized animation of conference traffic from sensor point to sensor point is stellar and stunning but also scary. What’s disturbing about this isn’t just the tracking of these data points; more incriminating and valuable metadata is captured daily by our social applications and email clients, later mined by third-party services that sell us products and promotions. What’s disturbing is that, unlike those social apps that we opt into voluntarily, if idiotically, on the daily, these sensors were tracking participants without explicit consent; if you had a device (phone, laptop, tablet), you were traceable, part of someone else’s time-series art project. It’s potentially innocuous, since the MAC addresses were probably anonymized by some hash and are probably difficult to relate to your identity, but what about the other traffic patterns evident on your device? Could tweets, correspondence, and conversations be layered over MAC address traffic to trace aspects of your “private” interactions? :/ The project authors allude to this in their blog post:

One thing is clear: the application displays the duality of such records. On the one hand, it makes clear what data traces you leave, often unconsciously. Therefore, we hope that the application will help raise awareness of protecting one’s own privacy. And perhaps you will think, just once, about why someone offers “Free Wifi” before you log in.

To the re:log website. Realized by OpenDataCity. Supported by picocell and newthinking. The application is licensed under CC-BY 3.0.
But is awareness of this enough? And are we more jazzed by the “Open Data [City]” potential of these apps than by the once-valued privacy we enjoyed in comparative anonymity? Further, how does “freedom” articulate in our ecology of networked intelligence? Is the newfound “freedom” afforded by the “open” arrangement of the internet equivalent to the right to hide, or to the right to expose what’s been hidden? Is it the right to keep secrets or the right to reveal them? Are these even of value? And further, how do we re:define value to suit a digital landscape?

## Innovation Economies

In defense of “open data,” my fascination with Bitcoin follows from a persistent interest in open source and in internet innovations that replicate analog concepts. Not going to lie, I’m totally an open data/knowledge/info fangirl. I’ve enjoyed the transition from encyclopedias to Wikipedias, and from gift economies founded in the likes of Burning Man to online exchange platforms like TimeBanks; I can dig it. There’s an intangible quality to trading and bartering “time” or “security” instead of monetary payment, and perhaps those tacit economies are best expressed in the bit- and byte-built world of the internet. Maybe we need to start thinking about cultural economies, the tacit luxuries that we value for their rarity and not necessarily their potential to facilitate purchase. Intangibles like “freedom,” “privacy,” and “security” are governed by their own economies based on contemporary scarcity. If scarcity and control are the determinants of value and weight, then privacy is the gem in the rough of our current monetary systems.


So what’s new about this? Are bitcoins really that different from current economies? Maybe not, but they’re a provocative start to thinking about tacit economies and the value-making of intangibles. To return to the article that inaugurated this blogpost, I’ll revisit the Larry Page interview, if only to root this endless econ-odyssey in a more agreeable symmetry. In response to what he envisions as successful ideas and company concepts, Page asserted that “[y]ou just need to have the conviction to make a long-term investment and to believe that things could be a lot better.” Will the world be better with investment in a more artificial econ? Will I be more content when currency codifies not as a physical bill but as an ephemeral bit? Will that make me appreciate that money really bears little of the emotional weight that I’ve applied to it, and that intangible and ill-defined values and virtues warrant a more miserly defense than I’ve ever invested in them? Maybe, a bit[coin]…

## Banking on Intangibles

To conclude, I’m not alone in recognizing the impact of bitcoin currency on our potential economic future, nor am I particularly brilliant at applying economic social science to even more subjective qualities of “innovation,” “privacy,” “safety,” and “security,” but it’s comforting to read how new systems of value are developing in tandem with technological innovation. Their access points are becoming increasingly available to a pedestrian public, but new post-digital economies demand an understanding of what we value and how we define the ephemeral. Do we view privacy and innovation as valuable independent of a price point applied post facto? And as we’re building these economies, I’m not sure how we’ll incorporate those ethics and morals into the “monetizable” and “business-driven” soup of innovation.

Throughout Who Owns the Future?, Jaron Lanier comments on this relationship between economy and digital society, and on the cost of “free” information to social and cultural constructs. As citizens of a digitally driven society, how do we resist violations of our intangible values via capitalization on our social, mobile, and [in]secure interactions? Should we embrace a new economy that appreciates exchanges of ideas and information, one that values innovation without insisting on its monetization? Come check out Lanier’s talk at NYPL in October to find out, and in the meantime, let me close with the indubitable paraphrased prescience of one of my favorite poets:

I like to think

(it has to be!)

of a cybernetic ec[onom]y

where we are free of our labors

and joined back to nature,

returned to our mammal brothers and sisters,

and all watched over

by machines of loving grace.
