The irony here is that this is a long article in which I most certainly type too much. This is far more an overview article than anything, and there will be followups detailing the covered sections at a later date.
Note: I’ll try to include books I’ve read on some of the topics below that I’ve found handy. Know of another? Leave a comment!
http://linuxcommand.org/tlcl.php
Chances are high you’ve been repeating a lot of commands on your shell, especially around git and history based items. After a while all of those characters can add up, time to cut them down to size.
Note: I covered some of this in an earlier article: http://baweaver.dev/blog/2013/09/29/getting-cozy-with-the-command-line/
If you notice yourself typing something more than once, or typing a string of commands you tend to forget, it’s time to break out some shell scripting and knock things down to size.
If you haven’t checked it out yet, ZSH is loaded with aliases and extra power from the start. I won’t cover the list of features, but the ones we should be concerned with are (and some exist in BASH):
(note: Oh-My-ZSH has a lot of this built in: https://github.com/robbyrussell/oh-my-zsh)
How often do you find yourself typing git add, git commit -m, or other common commands? You can alias those down to a few characters, saving a lot of typing:
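As a sketch, those aliases might look like this (the short names here are my own choice, not Oh-My-ZSH’s defaults):

```shell
# Hypothetical short aliases for the git commands above
alias ga='git add'
alias gcm='git commit -m'
alias gco='git checkout'
alias gst='git status'
```

Drop these in your ~/.zshrc and `gcm "fix typo"` expands to `git commit -m "fix typo"`.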
Now what’s the difference here between that and Bash? ZSH supports global aliases, which can be used anywhere in a command. Let’s say you want to keep a log of a statement:
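A global-alias sketch (the LOG name and tee target are my own invention; note that `alias -g` is zsh-only and won’t work in Bash):

```shell
# -g makes the alias expand anywhere on the line, not just in
# command position (zsh-specific)
alias -g LOG='2>&1 | tee output.log'

# Any command can now be suffixed with LOG:
#   ./deploy.sh LOG   # expands to: ./deploy.sh 2>&1 | tee output.log
```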
Functions, much like any other language, can be used to combine actions. Sometimes a quick function in your shell is all you need.
How about getting your current branch name for a commit message?
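A function along those lines might look like this (the names are my own invention; it assumes git is on your PATH):

```shell
# Print the currently checked-out branch
current_branch() {
  git rev-parse --abbrev-ref HEAD
}

# Commit with the branch name prefixed onto the message
bcommit() {
  git commit -m "[$(current_branch)] $1"
}
```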
I tend to use this a lot for grep, less, and other common shell functions in my workflow. It’s really handy when it’s some heinous AWK or SED line I don’t want to remember.
While this works in Bash with some extensions, it comes built into ZSH. You get even more with Oh-My-ZSH, which uses Compleat: https://github.com/mbrubeck/compleat
If you’re like me and you prefix your git branches with tags, you can autocomplete against that if you happen to misplace a branch.
Your editor has features designed to save time as well. While autocompletion comes to mind, I personally find it tedious and not nearly as powerful as other features such as snippets and macros.
The first considerations in Sublime should be those mentioned on the front page, such as column selection and multi-select for words. It’s worth reading the documentation there, as several features will be immediately usable.
There’s a sublime plugin for pretty well everything, including that documentation you hate to write by hand and can never remember the syntax for.
Sublime ships with the concept of snippets, letting you define blocks of code with interpolatable tags:
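A minimal snippet file might look like this (the trigger, scope, and body are illustrative; see the Sublime docs for the full format):

```xml
<snippet>
  <content><![CDATA[
def ${1:method_name}
  ${2:# body}
end
]]></content>
  <tabTrigger>defn</tabTrigger>
  <scope>source.ruby</scope>
</snippet>
```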
These can be bound to language specific contexts, preventing overlaps for potentially the same names (jasmine vs rspec snippets anyone?)
http://docs.sublimetext.info/en/latest/extensibility/macros.html
These are going to look very familiar if you’ve been using vim, at least in terms of key commands.
A macro is a series of actions that can be replayed at a later time, even bound to a key combination.
Catch yourself correcting 4 space indentation to 2 space? You can macro that!
An evil left-bracer got a hold of your files? You can macro that!
Someone is writing Java on your team and you want to get rid of it for Scala? You can macro that one too, but a bit more hackery and a priest to contain the evil during the exorcism will be required.
https://pragprog.com/book/dnvim/practical-vim
Many of the same general features in Sublime are available in Vim with some extensions, including substantially more powerful macro and snippet features. Sublime just happens to come pre-baked with simpler sane defaults.
Vim can use the command system to execute whatever you want from the shell, learning to use this will be of extreme benefit. You can even go as far as having your own scripts directory for generating more code on the fly.
https://github.com/SirVer/ultisnips
Remember how Sublime snippets can interpolate values for you? UltiSnips takes it a step further by taking those interpolated values and using them to generate more.
Why is this useful? Think initializers and documentation skeletons. Learn a bit of Python and you can be off with substantially more dynamic snippets.
IDEs are designed to cater to a wide base, and more times than not I find that assumption to make it very annoying to work with. Instead, editors like Vim and Emacs allow me to build things from the ground up, catered specifically to my style of programming.
It should come as little surprise that you tend to remember shortcuts that you yourself make as opposed to memorizing a list of commands and keyboard shortcuts.
Then why do I use Sublime on occasion you may ask? Sublime is far less likely to cause someone to incite physical violence against my person when pair programming than a modded-out instance of Vim with remapped keys everywhere.
That, and Sublime is quite frankly a much better editor for people starting out.
My religious preferences (vim) prevent me from giving credence to such an eVil editor :P. On a serious note, never saw much of a reason to bother with it as I already knew Vim from SysAdmin work.
There are a few frameworks out there that have a concept of generators. Two of them are Yeoman for NodeJS and Generators for Rails.
Creating generators that are catered to the style of your team can greatly reduce the time and mistakes made when implementing a new section of code. Even if that code can only accurately generate up to 70% of your shippable code, that’s 70% you know works and has passed style inspections and the like.
Yeoman, it seems, has a bit of a sense of humor in that they’ve gone and made a generator-generator to help you make more generators. A bit meta, but quite useful in getting started.
Yeoman generators come with options for making prompts much like a wizard, and the nice thing is that they remember your last responses as the new default options.
Say you like a certain generator but need to get more done, just compose it with another generator to get them to run in tandem.
You could even tie them into your editor using custom made adapters. As long as you respond to IO properly, Yeoman can take care of the rest for you.
A lot of people I know in the Rails world complain about the viability of scaffolded code, saying it comes nowhere close to what they intended. Handily enough, you can customize every step of the scaffold process, including the style of the models, controllers, and views it generates.
Say you just want a simple searchable and sortable bootstrap table with CRUD operations, maybe make that an Angular or React view? You can do that with generators with some customization. You can even generate the RSpec and Jasmine for it while you’re at it.
Don’t underestimate Rails generators just because they’re often abused by newer coders; thumbing your nose at them is a serious mistake.
Of course you could make generators for all of your code, define the perfect styles and agree on everything, but what if something else could generate it already? Seems like a waste to ignore.
Fortunately there’s such a concept of generating services code for RESTful APIs, present in a number of frameworks:
In most of these you can treat your API definitions as generators for your client service code. Imagine not having to write services in Angular or other frontend frameworks.
Now if you’re clever, you could even tie this into an inline-api tool like ApiPie to create things in line with your actual API methods giving you both documentation and workable client code at the same time.
The main thing in reducing typing is to never be content running a long string of tasks. Always be looking for areas that can be improved, reduced, or eliminated altogether. Most of this same mentality is already applied to our code, so why not the process leading up to and around our code as well?
]]>Here are a few questions to get you thinking before we start, given a model Foo which has_many Tags (key, value, foo_id):
Suddenly ActiveRecord becomes very annoyingly complicated to use, but not to fear! We can still use SQL for all of this.
So now our application looks something like this:
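Roughly the models in play (a reconstruction of the shape, not the article’s exact code):

```ruby
class Foo < ActiveRecord::Base
  has_many :tags
end

class Tag < ActiveRecord::Base
  belongs_to :foo
  # columns: key, value, foo_id
end
```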
Normally when you start to look into ActiveRecord queries, you’re going to see hash conditions along the lines of Foo.where(name: 'bar').
Or perhaps you’ll see the usual escaped string queries, such as Foo.where('name = ?', 'bar').
…but lurking in the documentation you’ll find another way entirely, one not so often advertised by guides: named placeholders, as in Foo.where('name = :name', name: 'bar'). They give us the same power as string-based conditions in what I’d argue is a much clearer way.
So why not just use the first variant like any sane developer, you might wonder? Put simply: this gives us the full leverage of string conditions with a lot more clarity. Try doing this with the hash syntax:
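Something along these lines — conditions built from SQL functions and comparisons across joined tables — has no hash-syntax equivalent (a reconstruction, not the original snippet):

```ruby
Foo.joins(:tags)
   .where('LOWER(tags.key) = :key AND LENGTH(tags.value) > :min',
          key: 'color', min: 3)
```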
Say you want the count of tags on a foo, how would you go about it?
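Likely something like this (a reconstruction consistent with the discussion that follows):

```ruby
# Count tags per foo by grouping on the parent id
Foo.joins(:tags).group('foos.id').count
# => a hash of foo id to tag count, e.g. { 1 => 3, 2 => 5 }
```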
Now, the conundrum here: why do we need to use group? Take a look at the generated SQL (via to_sql) and you’ll see a GROUP BY foos.id clause driving the count.
In order to run a count on an association, we need to aggregate the records into groups that we’ll run the count against. Handy thing is, this allows us to do a few more… interesting things:
You can run aggregates for different groups, not just the supposedly common case, hence why AR wants you to specify it. In the first case, we’re simply telling it to aggregate the tags based on the id of their parent.
I’m going to put a disclaimer here: reaching for Single Table Inheritance hacks like this will cause you a lot more harm than good. That said, if your data model is not so friendly and forces you into it, it’s something worth remembering.
SQL has a concept for this, but AR currently does not give us this power. Thankfully we have access to find_by_sql. [41-line find_by_sql listing omitted]
Now note that this is most certainly not the best way to go about this, suggestions are quite welcome as to better ways to deal with this one.
What we’re doing here is in essence creating an on-the-fly single table inheritance to query against.
Admittedly a better way to do this currently eludes me, and I would recommend against using this on your own solutions.
One can use subqueries to circumvent this type of issue, but the solution will be similar if not slower.
As you can see, this design of ours quickly devolves into madness when queried against, as the end of the article shows. In the next posts, I’ll be covering methods of database design that avoid these issues as much as possible.
]]>For this we’ll be using a model called Foo with the fields a, b, and c. All of the fields are strings with random words chosen from OS X’s built-in wordlist:
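Roughly how the data might have been seeded (a reconstruction; the wordlist path is the standard OS X one):

```ruby
words = File.readlines('/usr/share/dict/words').map(&:chomp)

10_000.times do
  Foo.create!(a: words.sample, b: words.sample, c: words.sample)
end
```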
How many times have you done something like Foo.all.length to get a count?
The problem with this one is quite simply that it’s retrieving and instantiating every record just to get a count. Seem inefficient? It is.
Instead, use the count method, which issues a single SELECT COUNT(*) and never instantiates a record.
Let’s go ahead and blank out the a field for the first thousand or so records. Note that I don’t use first, as that returns an array:
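Perhaps along these lines (a reconstruction; NULL is used so the rows no longer count as “present” below):

```ruby
# One UPDATE statement, no records loaded into Ruby
Foo.where(id: 1..1000).update_all(a: nil)
```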
Now how would we get the count of records where a is present? count takes a column argument — Foo.count(:a) counts only the rows where a is not NULL.
Would you look at that, it’s over 9000!
Let’s say we want to group our records by their length, to find out how many words there are of a certain length. Ruby has a built-in group_by method:
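Plain Ruby’s version, shown here on toy data of my own (note that called on a model it loads every record into memory first):

```ruby
words = %w[cat dog bird horse ox]
words.group_by(&:length)
# => {3=>["cat", "dog"], 4=>["bird"], 5=>["horse"], 2=>["ox"]}
```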
That all call should be enough of a trigger to start looking for an aggregate method:
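The aggregate version pushes the grouping into the database (a reconstruction):

```ruby
Foo.group('LENGTH(a)').count
# => a hash mapping each length to the number of rows with it,
#    computed entirely in SQL
```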
SQL functions are perfectly valid in this context, and quite helpful as well. Just using a column name in group, we can group by similar values as well.
pluck doesn’t just fetch columns from the database; it can also take SQL functions. Say we want a list of what word lengths we have — something like Foo.pluck('DISTINCT LENGTH(a)') gets us there.
How about the average length of our a column? pluck('AVG(LENGTH(a))') handles that too.
Note that you can use the average(:a) calculation method here as well.
…but what you cannot do with average, min, max, and the other calculation functions is this useful tidbit:
That one, without aggregate functions, is likely to take quite a while indeed.
There are some other common functions that may well come in handy if you only happen to need one value:
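Likely the usual single-value calculation methods (a reconstruction), all of which run in SQL:

```ruby
Foo.minimum('LENGTH(a)')
Foo.maximum('LENGTH(a)')
Foo.sum('LENGTH(a)')
```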
Not an aggregate per se, but a where clause can still use SQL functions. Say you only want records with an a field longer than 10 characters:
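Probably something along these lines (reconstruction):

```ruby
Foo.where('LENGTH(a) > 10').count
```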
Maybe the count isn’t what you’re after — perhaps you want the ids instead? pluck(:id), or simply .ids, has you covered.
Never underestimate the value of being familiar with basic functions in SQL, as they’ll save your database a lot of headaches.
While a strong knowledge of SQL is not always necessary for Rails development, it will most certainly improve your code and your performance. Not everything has to fit into hash arguments for a where clause.
]]>This post will outline the beginnings of the madness that led to Clairvoyant as well as some of the details of how things are planned to be implemented.
The LISPers among you will notice a very common theme throughout this post. That theme is quite simply that I’m taking a ruby file and treating it as data for an entirely different parser.
The DSL is already there, the data set, the question becomes what can we divine from what we have with reasonable certainty?
Along with inspiration from LISP, we’re drawing pretty heavily from logic languages such as Prolog. A logic program is a statement of facts and rules used to derive an answer to a question.
Now I don’t pretend to be an expert in Prolog, or even really particularly any good at it. Given that, it still reminds me of something:
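A spec reads like much the same thing: a pile of assertions about the program (this is an illustrative example of my own, not the article’s original, and needs the rspec gem to run):

```ruby
# A spec is, in effect, a list of facts about the code under test
describe Array do
  it 'returns the last element' do
    expect([1, 2, 3].last).to eq(3)
  end

  it 'knows its size' do
    expect([1, 2, 3].size).to eq(3)
  end
end
```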
So if RSpec already looks like it’s making assertions about the nature of our program, what happens if we treat it like a logic language?
The thing about Ruby is that it’s incredibly flexible. The DSL from RSpec can just as easily be hijacked and run through an entirely different interpreter with this one simple trick (and I swear I won’t clickbait):
Now what have we done? We’ve evaluated the entirety of the loaded file in the context of our class. What happens if we redefine describe inside of there?
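A minimal sketch of the hijack (not Clairvoyant’s actual source — the class and method names here are my own): redefine describe on our own class, then evaluate the spec source in that class’s context so our version gets called instead of RSpec’s.

```ruby
class SpecHijacker
  class << self
    attr_reader :captured

    # Our stand-in for RSpec's describe: record instead of running
    def describe(description, &block)
      (@captured ||= []) << description
      instance_eval(&block) if block
    end

    # Evaluate raw spec source as if it were written inside this class
    def divine(source)
      class_eval(source)
    end
  end
end

SpecHijacker.divine(<<~SPEC)
  describe 'Foo' do
  end
SPEC

SpecHijacker.captured  # => ["Foo"]
```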
We can capture the entirety of the describe blocks — or, for that matter, anything else we want. As long as the spec file isn’t using ::RSpec.describe explicitly, we can hijack whatever we want; if it is, it just becomes mildly more annoying to reason about.
This is effectively the current state of Clairvoyant: you can take a very basic spec file and generate a skeleton from it — a class and stubbed-out methods matching its describe blocks.
Granted that’s not all that impressive at this point, but beyond this stage we’re going to find some very interesting problems. This brings me to my next section.
Generating a skeleton is easy enough, and still very useful in its own right. Actually writing code from expectations and matchers that could be near infinite in number and complexity? That becomes a whole different story very quickly. These are theoretical musings of the future nature of Clairvoyant as I see it.
Past the description, we’re defining what the logic of the program is. From here we can infer a good number of relevant details:
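A hypothetical expectation consistent with the inferences below (the method name and values are my own invention):

```ruby
describe '#take_last' do
  it 'returns the last of the given elements' do
    expect(take_last(1, 2, 3).last).to eq(3)
  end
end
```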
The description string of the method can be used for documentation and a nifty method description in some cases.
The actual expectation call tells us the name of the method, and potentially anything that can be called on its result. In the example above we know that last can be called on the result of our method, and that the arity of the method can be 3. We can also guess that the method is some form of Enumerable, dropping the possible options for output substantially. Given that we only have an integer here, we can make a reasonable statement that the return value of the method is Array[Integer].
We have facts to work with here, and the more it blocks we have inside a describe, the more we can divine about the method. Say another test called keys on our method’s return — now we can reasonably guess it’s a Hash or a close derivative.
That’s well and good, but a lot of Ruby methods tend to be very conditional in nature: they could return different things depending on a context. Luckily we have just such a method we can hijack!
Say we find our method doing strange things depending on what the context is. We can possibly even derive a conditional from a well-laid-out context description:
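For example (a hypothetical spec of my own that fits the discussion below):

```ruby
context 'when the value is even' do
  it 'divides the value by two' do
    expect(halve(4)).to eq(2)
  end
end
```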
We have a potentially bindable name in value, and a proper predicate to test against with is even. Of course, at this point it’s going to be a lot more difficult to glean this information, and it will be heavily dependent on the robustness of a tokenization library and parser.
This becomes substantially more difficult to reason about, because now we’re trying to tell people how to write their tests instead of divining information from what’s already there. That may be fine for new code but can be incredibly tedious to make behave properly.
Matchers provide an even more interesting challenge, especially factoring in user-defined ones. We can make some reasonable assertions based on raise_error to give dynamic methods some error handling, but that again depends on context blocks being clear enough to grok.
Given that we have all of that figured out, the next fun part is proving whether what we generated even works. At this point we can run our generated code back through RSpec, as the core team intended, to see if we made it pass. If we did, great! That’s the easy case, if we’ve already gotten this far. If we haven’t, on the other hand, it opens up a whole different can of worms.
So maybe it didn’t pass. Hey! It gave us data back to use to further refine and polish our solutions. We can use that to (hopefully) get them to pass on the next round! At this point the failed tests would be ported back and we could do a few things at this point:
Fail the method and leave a comment for the user, or attempt to repair the method to make the test pass.
The first would be far more practical if we’ve gotten this far, but hey, we’re in theory land. Let’s push our luck a bit more here.
At this stage we would be throwing code back and forth until something works — a very brute-force approach to hoping we hit the sweet spot. In something that could only be compared to the quandary of monkeys writing Shakespeare, we might squeeze just a bit more code out of there.
Though honestly, halting problem is just a euphemism for being dull, let’s have more fun!
So we can get a hold of a lot of your application code as well, right? Let’s not limit that. Let’s grab as much Ruby code as we can stuff into memory and try to find patterns between its RSpec code and application code. Machine learning and deep analysis can be applied to more accurately divine intended code based on community behaviors (though I will explicitly prune out you maniacs who use globals like candy.)
Throw it in a Spark cluster and let the thing roar. We’re deep into AI land of making some very interesting code generation black magic happen, and probably well beyond anything that’s been attempted up to this point.
I have no qualms saying this is well beyond me, but it sounds like a blast to try anyways.
It wouldn’t be the first time I’ve been told this, and certainly won’t be the last. This is a personal project and a great deal of fun in learning Ruby internals along the way. Maybe one day this will be a fully functional project that can magically make your wildest dreams come true, or maybe not.
Really, when it gets down to it, that’s the fun of it. The potentials here are limitless, the problem hard, and the code plentiful. That’s the best type of problem to poke at. It’d be no fun if I knew entirely what I was doing.
Check out Clairvoyant, leave me a comment, let me know what you think!
]]>The first step to running Spark is to get a standalone instance to play with on our machines.
Go to the Spark homepage: https://spark.apache.org/downloads.html
We’ll be using version 1.4.0. Select that version from releases, and select “Pre-built for Hadoop 2.6 and later” (unless you currently have another Hadoop/HDFS instance at a different version.)
Go ahead and download and unpack that into the directory of your choice, and cd into it.
We’ll be using an English wordlist from SIL for the following exercises. Make sure to save wordsEn.txt somewhere you can load it later.
The last tutorial mentioned the concept of a REPL as a way to play with code interactively. Handily enough, Spark implements its own REPL on top of Scala and Python (and not Java.)
For Scala that would be bin/spark-shell
For Python, it’s bin/pyspark
You should see something like this (snipped for length):
or this:
There will be a considerable number of other debugging and logging statements around that, but the Spark banner and prompt are the things to look for.
In the Spark shell, we’re given a SparkContext as sc to interact with. We can use that to load in our text file:
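In Python that looks something like this (a sketch; sc is provided by bin/pyspark, and the path is wherever you saved the wordlist):

```python
word_list = sc.textFile('wordsEn.txt')
```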
WARNING - Remember last time, when I mentioned Spark was lazy? If you type that path in wrong, it’s not going to tell you anything until you try to run commands on it. The same goes for a lot of functions in Spark: you won’t know it’s broken until you run it.
Now we have our files loaded into memory to do some experimentation with as RDDs (Resilient Distributed Datasets), Spark’s abstraction for distributed data.
Let’s try a basic one to start: how many lines are in the file? (I’m going to trim the output so we don’t fill the page with debugger info.)
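A sketch of the call (count is an action, so it forces Spark to actually run the job):

```python
sc.textFile('wordsEn.txt').count()
```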
With that you’ve just run a Spark job. Simple as that, and not much different than how you’d interact with anything else.
Now, since this is a dictionary, each word is in there once. That makes a wordcount a bit pointless, so instead let’s get a list of what letters they start with:
On occasion we’ll have the niceties of structured data such as JSON, and Spark has just the way to deal with it using Spark SQL.
WARNING - Spark guide has been quoted as saying:
Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail.
…and it will crash if you pass it a conventional, pretty-printed JSON file. (The likely reason: Spark splits input files by line across partitions, so each line has to parse on its own.) If any reader knows more of the reasoning behind this particularly confounding piece of work, I’d love to know.
We’ll be using fake people data: https://gist.githubusercontent.com/baweaver/b6460bb96feff1faeb78/raw/4c9b46be165725d041ff47bdc042c6a4880c1877/people.json (right click to save)
Let’s go ahead and load it up using the sqlContext:
Let’s start with something fairly basic on the SQL, getting the index of people who are inactive with a balance greater than $2000:
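A sketch of the load and query together (the field names here are my guess from the prose; sqlContext comes from the shell):

```python
# JSON-lines file: one self-contained JSON object per line
people = sqlContext.read.json('people.json')
people.registerTempTable('people')

# Triple quotes make larger SQL strings bearable
inactive_rich = sqlContext.sql("""
  SELECT index
  FROM   people
  WHERE  isActive = false
    AND  balance  > 2000
""")
inactive_rich.show()
```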
Triple quotes are a life saver when making larger SQL-like strings.
Like SQL, you can join, count, group, and various other operations all in a big data context. It’s a shame it won’t play nicely with actual JSON, but the features are handy nonetheless.
Spark even comes with its own Machine Learning libraries, but for the sake of brevity we’re only going to look into some of the basic statistical options. Later tutorials will address this in some depth.
We’ll be looking into the column stats of our wordList from earlier:
We’ve taken a cursory look at some of the features and basic operations of Spark. Here’s the question though, what do you as readers want to know more about? Vote on Strawpoll to let me know: http://strawpoll.me/4701594
Think of it as a choose your own adventure of sorts. I’ll be writing about all of the above in more detail, but in the order you want to see it happen.
]]>The goal of this post is to get you up to speed in the very basics of Functional Programming as they’ll later relate to Spark. I’ll be primarily covering Scala with Python alternate versions.
I do not intend to cover Java in this tutorial or any other. MapReduce is a concept based in Functional Programming, and you would be doing yourself a great disservice by trying to shoehorn Java into that role, including Java 8.
You might wonder how bad it could possibly be — perhaps I’m just biased. I’d direct you to look at the Hadoop word count example and see the horrors of allowing Java patterns and card-carrying GoF members to pretend they’re programming functionally:
[133-line Hadoop word count listing omitted]
That’s just a word count example, imagine the headaches of anything remotely complex in that paradigm and you’ll do the same as I did and swear off Hadoop.
Are there workarounds for it? Yes. You can also put a nice coat of paint on an old rusty car. It’ll look nicer but it’s not fooling anyone what’s under the hood.
Remember that word count example? Scala and Spark do it substantially better:
Even Python leaves it in the dust:
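The PySpark version runs along these lines (a sketch; sc comes from bin/pyspark, and the path is an assumption):

```python
counts = (sc.textFile('wordsEn.txt')
            .flatMap(lambda line: line.split(' '))   # lines -> words
            .map(lambda word: (word, 1))             # word -> (word, 1)
            .reduceByKey(lambda a, b: a + b))        # sum counts per word
```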
You don’t have to spend five seconds scrolling to get through that one. The point is that by defining a mapreduce task in terms of functions, we only need to tell Spark what actions it’s taking on the data. We’ll get to what all this means later.
I advocate the use of Scala over Python for big data problems in general. The reasoning: Scala is a statically typed language — surprisingly, more so than even Java (we’ll cover that in a moment) — and Spark itself is written in Scala, meaning its DSL is going to be very familiar if you’re any grade of Scala programmer.
Python is a great language, don’t get me wrong, but it’s not fully functional. You’ll see why that’s a big deal in a moment. That, and I don’t like typing lambda all the time.
So what is Functional Programming, besides the most thrown-around concept of modern times? Quite simply, it’s a program built up from functions instead of objects.
A little bit more into it, Functional Programming embraces a few interesting ideals:

- Purity — if you put A into a function, you’re always getting B back
- Immutability — data is transformed into new values rather than changed in place
- Laziness — nothing is computed until it’s actually needed

We’ll be getting into each of those and why they’re relevant to Spark here in a moment.
You’re going to want Scala or Python installed so we can get at their REPLs. Once they’re installed, drop into your terminal and type either scala or python to get a REPL. Give it a swing real quick.
In terms of Functional Programming, and later Spark, the REPL will quickly become your best friend.
So what’s a function? Let’s give the REPL a whirl:
Fair warning: Python wants you to enter a blank line before it assumes you’re done typing.
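A reconstruction of the add2 function under discussion, in the Python version (the Scala one would carry an explicit type signature):

```python
def add2(x):
    # Python needs an explicit return, unlike Scala
    return x + 2

add2(2)  # => 4
```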
Notice something interesting about Scala there? There’s no need for a return. It’s implied that the last statement in a function is the return value. You’ll also notice that the Scala REPL guessed that we’re going to return an Integer as well, as per the method signature.
Now what’s in a method signature? It’s a contract — a guarantee of a return type. This brings us to our next concept.
Now remember when I said that Scala was more statically typed than Java? Try giving add2 a nil and see what happens in both languages — neither even has one to give.
But that’s not nil! That’s something called None!
So what’s the difference? Put briefly, neither Scala nor Python has a concept of nil, which is a very, very good thing for us.
Unless we explicitly tell Scala it can take a None, it will always throw a type error. So how do we define something that might receive a value or might not? That’s what we have Option for:
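A sketch of Option in action (my own example, not the article’s original listing):

```scala
// Option makes "maybe missing" explicit in the type
def add2(x: Option[Int]): Option[Int] = x.map(_ + 2)

add2(Some(2))  // => Some(4)
add2(None)     // => None
```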
Python does not have this concept; it only replaces nil with None.
One of the most powerful concepts in Functional Programming is the ability to pass functions as arguments. By doing this we’re afforded a great deal of flexibility in defining abstract interfaces for basic operations.
WARNING Python users, normally you’re going to want to use List Comprehensions for this type of thing. Since you’re going to be applying this to Spark, it’s necessary to know these types of functions.
Take map, for instance: a function that applies a function to each element of a list. This will be confusing for first-timers in this territory, so stick with me for a bit here:
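The Python version reads like this (a reconstruction of the lost listing; the Scala equivalent is myList.map(x => x * 2)):

```python
my_list = [1, 2, 3]
doubled = list(map(lambda x: x * 2, my_list))

doubled  # => [2, 4, 6]
my_list  # => [1, 2, 3] - the original is untouched
```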
Now what do you suppose those two do? We’re passing in a function that takes an argument x and returns x * 2, and applying that function to each element in the list, yielding a new list with every element doubled.
So not only did we double each element in the list, but our original list is untouched. Given this, we can transform myList however we want and it’ll never change. If we want that result for something, we can always create a new val to save it.
An aside, Scala is very good about trying to simplify things when it can:
We don’t really need the dot there, and whenever a function takes only one parameter, Scala will happily take an underscore to shorten things up for us — myList map (_ * 2). While this may seem obscure to some, it’s a very common pattern in Scala. Best to understand what it’s doing, because no amount of rudimentary googling will turn it up without some fidgeting; such is the nature of searching for operators and syntactic sugar.
The next function on our list is filter, which takes a predicate function and applies it to each element of a list, keeping the elements where the result is true:
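In Python (a reconstruction; the Scala version would be myList.filter(_ % 2 == 0)):

```python
my_list = [1, 2, 3, 4]
evens = list(filter(lambda x: x % 2 == 0, my_list))

evens  # => [2, 4]
```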
This one is going to be a bit trickier, as it takes a function with two arguments: an accumulator and a value. It reduces a list of elements into one element. Now why would you want such a function? Think of something such as a sum. I’ll be using longhand here as this is one of the harder first functions to really understand:
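The longhand sum in Python looks like this (a reconstruction of the lost listing):

```python
from functools import reduce

my_list = [1, 2, 3, 4]
# acc starts at 0; each element is folded into it in turn
total = reduce(lambda acc, value: acc + value, my_list, 0)

total  # => 10
```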
But how did that work? Step through the logic: the accumulator starts at 0, and each element is folded into it in turn — 0+1, then 1+2, then 3+3, then 6+4 — leaving us with 10.
Naturally there’s a shorthand for this:
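The Scala shorthand was presumably along these lines (reconstruction):

```scala
val myList = List(1, 2, 3, 4)
myList.reduce(_ + _)  // => 10
```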
An astute reader will notice that we used two underscores here. Scala binds arguments in succession to the underscore, making for a bit more confusion in searching.
One of the really nifty things about Functions in languages like Scala is that they capture their local environment when they’re defined. What do I mean by that? Let’s take a look:
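A sketch of the adder under discussion, in Python (the names are my reconstruction):

```python
def make_adder(x):
    # the returned lambda closes over x
    return lambda y: x + y

add3 = make_adder(3)
add3(2)  # => 5
```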
So where did it get 3 from? It remembered the value, or in functional terms it closed over the value when it was defined.
How is this handy? Let’s use map and pass it our function:
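Passing the closed-over function to map (self-contained sketch):

```python
def make_adder(x):
    return lambda y: x + y

added = list(map(make_adder(3), [1, 2, 3]))
added  # => [4, 5, 6]
```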
Let’s take it one step further though and just use adder. After all, we might need a bit more flexibility there:
So now we have a function which returns a function that gets used by map. To put it another way, let’s have at filter:
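The same trick works with filter (names and values assumed):

```scala
def greaterThan(n: Int) = (x: Int) => x > n

List(1, 5, 10).filter(greaterThan(4))  // List(5, 10)
```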
You might wonder when a concept like a lazy value might be handy. Let’s say you need an infinite list, or stream, from which you have no clue how many elements you’ll either need or get from it.
How about an infinite stream of Fibonacci numbers, straight from the Scala source code:
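The definition given in the Scala Stream documentation reads like this:

```scala
// Two seed values, then a lazily evaluated recursive tail:
// the stream zipped with its own tail, each pair summed
val fibs: Stream[BigInt] =
  BigInt(0) #:: BigInt(1) #:: fibs.zip(fibs.tail).map { n => n._1 + n._2 }

fibs.take(8).toList  // List(0, 1, 1, 2, 3, 5, 8, 13)
```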
That’s a lot of code to digest, but it gives us an infinite stream of Fibonacci numbers: two seed values, followed by a recursive definition that lazily zips the stream with its own tail and adds each pair to produce the next element. Nothing past what we ask for is ever computed.
Now that we have all these components, let’s take another look at that wordcount example for Spark:
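The example is almost certainly the canonical Spark word count (the HDFS paths are placeholders, and sc is an existing SparkContext):

```scala
// Read each line of the file as an element of an RDD
val textFile = sc.textFile("hdfs://.../input.txt")

val counts = textFile.flatMap(line => line.split(" "))  // split lines into words
                     .map(word => (word, 1))            // pair each word with a count of one
                     .reduceByKey(_ + _)                // sum the counts per word

counts.saveAsTextFile("hdfs://.../counts")
```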
Some of those look familiar? flatMap splits each line into individual words, map pairs every word with a count of one, and reduceByKey sums those counts per word: the same functions we just covered, only running across a cluster.
Now there’s a lot more to Spark than this, but now you’ve got a grounding by which you can build on. Next we’ll be looking more into Spark specifically.
]]>
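For context, the sort of search controller being criticized tends to look something like this sketch (the Person model and the parameter names are my own):

```ruby
# Every new filter bolts on yet another postfix conditional
class PeopleController < ApplicationController
  def index
    @people = Person.all
    @people = @people.where(age: params[:age]) if params[:age]
    @people = @people.where(admin: params[:admin]) if params[:admin]
    @people = @people.where(name: params[:name]) if params[:name]
  end
end
```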
The horrifying trend will only continue as our demand for searching power grows, which raises the question: how can we tame this mess?
Your first line of defense against this will be using strong params to your advantage. They’re not only for creating objects.
Let’s try something out in the console:
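In a Rails console, permit behaves roughly like this (the attribute names and values are assumptions):

```ruby
params = ActionController::Parameters.new(name: 'Lemur', age: 5, admin: true)

# Only the permitted keys survive; admin is dropped
params.permit(:name, :age)
```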
By using permit on our parameters object, we can filter a hash down to only our permitted values. So what if we did something like this?
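Something along these lines, pushing the permitted hash straight into where:

```ruby
class PeopleController < ApplicationController
  def index
    @people = Person.where(search_params)
  end

  private

  # permit hands back only the simple equality filters
  def search_params
    params.permit(:age, :admin)
  end
end
```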
With that we’ve already cleaned out a lot of the cruft of our controller, but what about that last one?
We can get rid of it as well, using either scoping or class methods to take care of it for us:
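Either form works on the model (with_name is an assumed name):

```ruby
class Person < ActiveRecord::Base
  # As a scope...
  scope :with_name, ->(name) { where(name: name) }

  # ...or as the equivalent class method
  def self.with_name(name)
    where(name: name)
  end
end
```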
Which will let us trim down our controller even a little more here:
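Which leaves the controller at roughly:

```ruby
class PeopleController < ApplicationController
  def index
    @people = Person.where(search_params).with_name(params[:name])
  end

  private

  def search_params
    params.permit(:age, :admin)
  end
end
```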
The astute reader will note that the above method is going to fail gloriously should we forget either of those params. We could always drop it to another variable and mutate people, but that’s generally frowned upon and doesn’t normally produce superheroes.
What we can do, however, is introduce a more conditional scoping method. Class methods are, after all, ruby methods. Let’s use them to their potential a bit more:
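The conditional class method might look like this sketch:

```ruby
class Person < ActiveRecord::Base
  # Fall back to all when no name is given, so the chain never breaks
  def self.with_name(name)
    name ? where(name: name) : all
  end
end
```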
By throwing in an all, we can conditionally chain freely.
The problem is, that name search just isn’t doing it for us. We don’t want to break out solr or trigrams quite yet, but we can use some like queries to make it a bit more flexible:
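Swapping the equality check for a LIKE, a sketch:

```ruby
class Person < ActiveRecord::Base
  # Substring match instead of exact equality
  def self.with_name(name)
    name ? where('name LIKE ?', "%#{name}%") : all
  end
end
```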
Though we spent all that time getting rid of postfix if checks, can we do something about this one as well?
That we can:
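Presumably the last straggler lives in the controller itself, and the same all trick removes it; a plausible sketch:

```ruby
# Before: the controller still guards the call with a suffix if
@people = Person.where(search_params)
@people = @people.with_name(params[:name]) if params[:name]

# After: with_name falls back to all on nil, so the guard disappears
@people = Person.where(search_params).with_name(params[:name])
```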
Like that, we’ve eliminated another suffix if.
Though most of these examples have been fairly straightforward, there will be times when you have to break out some joins and other operations depending on your parameters. Strong params aren’t going to cut it on those, but class methods just might do the trick.
We have a new model to work with, Post, and with it the following controller:
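The Posts controller in question was presumably shaped like this (parameter names are assumptions):

```ruby
class PostsController < ApplicationController
  def index
    @posts = Post.all
    # Only pay for the joins when they're actually requested
    @posts = @posts.includes(:comments) if params[:comments]
    @posts = @posts.includes(:tags) if params[:tags]
    @posts = @posts.where(title: params[:title]) if params[:title]
  end
end
```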
Some of those earlier techniques just aren’t going to cut it, and it’s going to be a lot more difficult to be intention revealing here. Including comments and tags unless we have to could be a big expense, so we need to keep those under conditionals to prevent unnecessary data from being fetched.
We’re going to have to use something new here. Let’s condense those conditionals into a scope:
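Condensed onto the model as a class method (the names here are my own):

```ruby
class Post < ActiveRecord::Base
  has_many :comments
  has_many :tags

  # One entry point; each conditional only extends the relation
  def self.search(params)
    posts = all
    posts = posts.includes(:comments) if params[:comments]
    posts = posts.includes(:tags) if params[:tags]
    posts = posts.where(title: params[:title]) if params[:title]
    posts
  end
end
```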
Noted that keyword arguments would be very unhappy with us using if there, making them a no-go.
Which allows us to write a much clearer controller:
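Leaving the controller with little more than a hand-off:

```ruby
class PostsController < ApplicationController
  def index
    @posts = Post.search(search_params)
  end

  private

  def search_params
    params.permit(:title, :comments, :tags)
  end
end
```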
Through just a few simple scoping mechanisms, we can trim down our controllers while still getting a very useful search out of vanilla Rails.
]]>Google famously came out saying that GPA was worthless. Trick problems and brain teasers were doing little to no good in revealing good engineers.
In an industry where it’s borderline impossible to establish reliable metrics for a programmer’s skill, is it any wonder that interviews backfire? Despite this, we’re trying to use one metric in particular to measure our coders, and it’s doing a great deal of harm to the industry: memorization.
When confronted with the idea of standardized testing, teachers I’ve spoken to have been quick to say that such tests are a poor measure of students. Some kids just don’t learn by memorization or test well that way; potentially brilliant young people are barred because they can’t memorize a fact sheet. Why should they have to?
What bothered me in school was that I was expected to memorize a pile of information that was literally inches from me, either via the internet or the textbook. Sure, I could memorize the quadratic formula, but what would be the point? Memory is faulty; the real value lies in being able to automate something or reference it when needed.
If this is failing so badly in our schools, why are we applying the same principles to coding interviews?
It’s not uncommon for pre-interview filters to ask about man pages, system internals, or other things that would take someone no more than a second to find in a reference. Instead, we expect them to have an instant answer to these questions, when rarely do coders actually have such information memorized.
The number of false negatives here is staggering, especially for cross-discipline interviews such as a developer seeking DevOps jobs or vice-versa. Of course a developer won’t have systems knowledge memorized, and likewise an administrator probably won’t have Skiena’s algorithms book memorized. Does this make them incapable of the job? Hardly.
We’re not studying for a test. We’re trying to show that we have what it takes to innovate, to build. Memory is an abhorrent measure of aptitude. If GPA is already pegged for this, we should be doing away with rote memorization in much the same manner.
An engineer’s true power lies not in memorization. If anything, I would argue that it’s a severe weakness to have an engineer who insists on memorizing large sums of information. It’s inefficient. Much like a computer, only information that is immediately relevant should be cached. That’s why we have references and man pages: we’ve relegated the information to a metaphorical hard drive, to be recovered later when it’s needed.
Here are a few examples:
The list goes on.
An experienced engineer has a lot of knowledge to draw on from past jobs. Chances are they’ve probably forgotten more than a more junior engineer has claimed to have memorized. Does this make them less valuable? Hardly. They recognize that it’s sometimes necessary not to attempt to know everything up front.
Granted, an experienced engineer is going to be far more effective at finding the correct references and information. It makes a world of difference, and it really speaks to a person’s skill level:
the older I get, the more convinced I am that the key qualities of a senior engineer are research skills and leveraging past experience #fb
— Scott Francis ن (@darkuncle) March 27, 2015
Who would you rather have working for you? Someone who memorizes your entire deployment process to a T, or someone who automates the entire thing so anyone can do it with a click? I would argue that the latter is exponentially more valuable to a team.
The weakness in memorization is that when one person becomes the bastion of all knowledge, you’re going to get in trouble quickly. If they take a vacation and the entire team falls apart, there’s a problem. Information is meant to be shared and made easily accessible.
Memorization has the horrid side-effect of blinding your team from bad documentation and process. By building tools, you won’t have to explain to the poor new hire the hundreds of caveats of even starting to develop your application and nonsensical process that was allowed to grow over time.
Yet given this, there’s a perverse obsession with reciting algorithms from the book, quoting man pages, and all forms of memory-backed questions. It’s a double standard. We interview on the metric of memory, yet any sane coder will go into a panic attack given a completely memory based employee without a fierce knack for automation and tooling.
Do away with pre-interviews, all they do is filter out potentially great people with knowledge they may not have in immediate memory. Instead, ask them what they’ve built, what makes them tick, what has them up plugging away. You’ll learn far more from getting someone’s story than asking them five quick questions.
Avoid anything based in reciting man-pages and algorithm books. Instead, seek to either pair with the person or have them demonstrate on a small project. Dig through one of their already built projects, do something practical. Whatever tool they have available to them as a developer should be fair game. If you really want to be bold, have them bring their own laptops in to see how they work.
The point is to learn if this person can contribute to your team, not chant Dijkstra’s Algorithm and write out a Quick Sort. If you’re not going to be working on it daily, it does not belong in an interview.
“Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.” - Albert Einstein
]]>What if the world were, in its entirety, a functional program? Through the discovery of mathematics and pure functions, we can derive the process that led to our world.
This current variant is still in need of refinement; comments are appreciated as I clean up around the edges. I’m working on rewriting the current code segments and writing new ones in Lisp for added effect.
Take a simple recursive function, factorial:
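In Ruby, for instance:

```ruby
def factorial(n)
  # Base case: numbers below two collapse to a constant
  return 1 if n < 2
  n * factorial(n - 1)
end
```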
We have established a base case in which a constant can be derived, numbers less than two will always return one. The flaw of current world views is that we assume creation to have occurred as a constant base case, and seek the answer there.
What really happened was something different entirely. While we quibble over how to find the base constants of our world, we miss a very crucial fact of inception: who called the function that started the divine recursion?
In programming we have the concept of a closure, where a function closes over a value:
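A sketch of such a closure (the names here are my own):

```ruby
def make_counter
  count = 0
  # The lambda closes over count, carrying it between calls
  -> { count += 1 }
end

counter = make_counter
counter.call  # => 1
counter.call  # => 2
```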
The function inside captures the state around itself, enclosing it, or rather creating a closure over it.
Our world is the result of a closure in which something defined our function with a set of constant values outside of our function, but inside of our scope of knowledge. This is how we derive logic, time, and the rules of the world in general. This raises the question, though: how did it get called? A transcendence, much like a programmer who calls a function.
When a recursive problem approaches a function that can go down many paths, its result is not necessarily known by the one who invoked it. Given certain conditionals, a branch may be permanently trimmed off the world tree as it approaches its absolute single-branch return value.
While the invoker may not know the result of the branch, they may be able to make certain guesses about the nature of its execution. With enough insight and knowledge of a program, you can predict the results of a tree. That makes it sound as if there’s no such thing as free will and there’s an inevitable predestination, but here’s the brilliant part: it’s not.
Inside each execution loop of the world tree, a callback is invoked through which an external function can be reached. This can be considered much the equivalent of prayer, sacrifice, meditation, and other spiritual activities. Given that these callbacks do not prevent the execution of malicious code, bad things are able to happen as a result of misusing them (Ouija boards, summonings, the occult, falling from grace).
Through this process of callbacks in the tree, the execution order is now unknown to even the invoker. Even at that, the knowledge of the inner workings of the program will still lend considerably more insight into the path of execution than will be known by the data (or person.)
In a way, it’s a solved game. Much like a supercomputer playing chess, the end result was decided before the game even began. Free will is the result of the game itself being played in the interim around the fixed endpoints.
Given the process of callbacks, the entire world tree is already effectively defined but the functions have not been called as all the data is not present.
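Currying in Ruby behaves like this:

```ruby
add = ->(a, b, c) { a + b + c }

# Nothing runs until the last argument arrives;
# each application returns a new partially applied function
partial = add.curry[1][2]
partial[3]  # => 6
```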
Without all the data being present, a function will not be called. When provided with its last parameter, the value will be returned and a branch can be derived.
By currying our choices and states along the tree, we build up towards the execution of functions that will change the branch we’re currently on. The world tree is lazy in nature, it will not execute branch changes until it has all the data necessary.
The crux to giving the ability for callbacks and laziness in functions is that errors can and will be raised. Ones that are outside the influence of the invoker due to the nature of the function. Does this undermine the omnipotence of the invoker? No, as they had already provided rescue conditions throughout the application to save data from exceptions.
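Rescue conditions, in Ruby terms:

```ruby
def run_branch
  raise 'exception raised within the branch'
rescue => error
  # The data is saved rather than lost with the failed branch
  "rescued: #{error.message}"
end
```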
So then how do we reconcile with a young earth versus an old? We don’t. Both are plausible at the same time with the presence of monadic state, a seed of sorts. By invoking the world tree with a set of predefined knowledge, time can be simulated, elongated, or generally distorted beyond the current rules of our world tree.
Much like a dream to us, we view something as always present, predefined. Perhaps our concept of time is warped by the seed data in such a way that we observe something beyond our functional world tree. As it recurses, it carries with it the state that could have easily been arbitrarily defined along the way.
Evolution is also a result of seed data, but more thoroughly of functional composition:
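Functional composition, sketched in Ruby (the function names are my own):

```ruby
increment = ->(x) { x + 1 }
double    = ->(x) { x * 2 }

# compose builds a new function out of two others:
# apply g first, then feed its result to f
compose = ->(f, g) { ->(x) { f.(g.(x)) } }

compose.(increment, double).(3)  # => 7
```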
Functions are composed upon one another such that the pattern that composes a monkey may well be an earlier variant of a human that has not had all of its functional chain called through. This is what leads to similarities of DNA, a monkey would merely be a human without the remaining functions between them.
We see evolution, but in reality it’s the base functions that have been built from the ground up in order to create us and the creatures around us.
You remember the presence of constants in the system? They were never meant to be the beginning, but the end of the chain. If the end is already known, and the beginning was made from seed data, the process in the middle is left largely to the result of execution.
At the end of our world tree function, and when certain branches are returned, our state is transferred into the closure above us, more commonly known as our heavens and hells. These returns will only happen when a function has called through an entire branch at the end of the tree, known as the apocalypse.
In the interim, we’re stored in a state that’s carried throughout the remainder of the world tree, in what would be called Limbo.
Given that the function was invoked by an outside source, certain code may have been arbitrarily introduced in such a way to allow new state to manifest itself at certain points of the chain in ways that again defy our given rules. This can lead to such things as a virgin birth, resurrection, and even a return as the end condition itself.
The world is a functional program, and we are the data that flows through it.
]]>Neither are the people who frequently play Minecraft. Really, you don’t even have to know how to program to get the benefits of the mod. Programmers are a peculiar breed who love to share their creations publicly, which means you can get some amazing tools and scripts from brilliant people simply by looking through the ComputerCraft Forums.
Say you can’t find what you want but still want to make it. There are tons of tutorials out there for how to get started with ComputerCraft.
If you already use Redstone, you’re already working with much harder material. All those logic gates you use to get basic doors to work? What if you could just put a password on the door instead? Simple:
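A minimal ComputerCraft password door runs along these lines (the password and the redstone side are assumptions of mine):

```lua
-- Loop forever, opening the door for a few seconds on a match
local password = "lemur"

while true do
  term.clear()
  term.setCursorPos(1, 1)
  write("Password: ")

  if read("*") == password then          -- "*" masks the typed input
    redstone.setOutput("back", true)     -- open the door
    sleep(5)
    redstone.setOutput("back", false)    -- close it again
  end
end
```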
No having to reference that long image about logic gates, that’s it. Welcome to the concept of programming abstractions. Redstone was a low level language, and Lua is a lot higher level.
Surely those mining robots are harder to make, though? Not really. Want to make one?
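A basic tunnel digger for a turtle looks roughly like this (the tunnel length is arbitrary):

```lua
-- Dig a 16-block tunnel, re-digging while gravel keeps falling
for i = 1, 16 do
  while turtle.detect() do
    turtle.dig()
  end
  turtle.forward()
end
```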
That’s what the Wiki Pages are for! Tons of information on how turtles work, what commands they can run, and various other handy bits.
I agree, and I don’t bother with it either. I type my code and post it on Pastebin, and then just download it to the turtle like this:
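On the turtle’s terminal, that amounts to:

```shell
pastebin get 73gH7BUL digger
digger
```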
…where 73gH7BUL is the URL hash of the Pastebin paste, and digger is the name we want to save the program as. All we need to do to use it now is type digger into the terminal and it’s off on its merry way.
It really depends on who you ask. To me, the purpose is to build cool things, not spend forever gathering the resources to make it happen. Computercraft allows you to automate a lot of that work, and the nice thing is that most of the scripts for common things like digging tunnels and stairs are already out there for you to use.
If you feel content spending hours on mangling redstone to do what you can do in under 20 lines of Lua in a few minutes, more power to you. Best hope you didn’t make a mistake, or you’ll end up digging the entire thing up again. To me, it enhances the game by allowing you to get more done faster.
How about a swarm of mining turtles controlled by a boss?: https://www.youtube.com/watch?v=g5153BiTNI8
3D Printing from a turtle GUI Paint program?: https://www.youtube.com/watch?v=AuofE9dqiuU
Youtube videos in Minecraft?: https://www.youtube.com/watch?v=tpqOv7SxkHA
Maybe a massive villager shopping mall: https://www.youtube.com/watch?v=Xasa_Jr-lcI
Though a Minecart Station may be your thing: https://www.youtube.com/watch?v=ws4iDwLc0zQ
The point is, if you can imagine it, someone has probably already built it. If not, you can make it. Of course the more advanced you get the harder it’ll be, and programming can get hard past the trivial stuff. It takes time, but you can ask on the forums to get the help you need.
Well, yeah, it is programming in Minecraft. Experienced programmers will have an edge. The good thing is that most programmers love sharing their toys, and love it even more when people use them and thank them for it. The thing to remember is that all of the seriously advanced programs out there take days to weeks to complete, so they’re not getting an easier time necessarily.
There are already tons of scripts online of all types to download that will have you running at about the same level as any mid-range developer, and they’re even documented. Even if there is a veteran on the server, chances are they like to share as well. Just ask some time.
Yeah, me either. If you’re sufficiently advanced you’re going to run into the issue of remaking your programs and having different versions out there. There are a few of us out there crazy enough to try and fix that issue with an entire deployment management system for turtles like Tortuga (WIP) which will take care of a lot of that. Think Opscode Chef for turtles.
]]>Open up Github, or whatever code store you may have, and take a look at the code you’ve written even a few months ago. Chances are high you’re cringing a bit at some of the things you’ve written, patterns you’ve tried, or even lack of testing. If you had the time, you’d likely think of refactoring the entire thing, and doing it right this time.
That urge is one of the most compelling reasons you could ask for to start writing. That experience that transformed the way you think about code is a valuable thing, and worth sharing. It doesn’t matter if the realization was that you shouldn’t use eval in your code or that an abstraction could have saved several hours of time in the future; it’s valuable.
Every programmer will find themselves at a different stage of experience, many looking for someone who went through the same trials they did. By writing, you’ve given that person a resource on which they can build and grow as you did. You’ve given them a map to guide them out of a potential pitfall that you’ve once encountered, and by doing so you’ve helped them move faster than they would have on their own.
Many a new programmer will find themselves terrified by the complexity that most of us take for granted. The hours of hacking away at a terminal just to get your first Rails or Node server running, the perils of deploying your first code, the nightmares of your first testing suite: these experiences are not to be undervalued. Writing a post explaining any of the things you had to fight through just to see that glorious hello world on the screen for the first time may be just the hope someone else needs to keep going.
You’ll find that as you blog, you can learn far more about yourself. You can chart where you’ve been, what you’ve learned over the years, and trace the path to what you’ve become. It’s a warm feeling to be able to point back at your earlier writings and say “I was there too, once, and I made it.”
Write, and share your struggles so that others may be lifted above them on the shoulders of giants.
]]>This industry has a very grave problem in which we delude ourselves into thinking that our achievements and accolades are due solely to our own work. While it’s critically important to work hard and learn, I feel that most miss the point. Behind the story of every towering success, every captain of industry, are people who helped get them there.
No one starts out a legend; that’s for after the story has already been written. They become legend over time with the help of friends and colleagues. Steve Jobs had Wozniak, yet we rarely hear mention of him. Bill Gates had Paul Allen, and again the crickets chirp. Why are we so wrapped up in heralding one person instead of the entire group?
What this has led to are a collection of people who believe they owe nothing to the worlds that raised them. They come to believe in the terrifying notion that people not in their position are not as hard working or not as dedicated. That may be the case in some matters, but often times it’s far from the truth.
I strongly believe that we in the industry have an obligation to pay it forward. All the time people have spent investing in us should be given back to the community, whether that means mentoring, connecting, or even helping to pay someone’s way. Remember, it wasn’t long ago that you may well have been in their same position, dazed and confused.
There have been several people in my life that have contributed to me getting to where I am today, and I thank them for investing so much time and effort. I wouldn’t be here if not for them. From the patience of my High School tech teacher, to the hard-nosed Unix professor in College, and to the man who taught me everything I knew starting out when no one else in the area could understand what I was talking about.
If you know such people in your life, open a new tab and thank them. Remember what they’ve done for you, and realize that there are yet more people coming up that could use you in much the same way.
No one starts out a grizzled veteran or proficient programmer, and it’s time we realize this.
The current trend is not sustainable. We look for Seniority when we fail to invest in bringing people to that level. Colleges pump out fresh new programmers to meet a need that we refuse to fill, instead defaulting to creating artificial scarcity. If you’re in DevOps in San Francisco with a Senior level, take a look at your inbox if you don’t believe it. It’s not unusual for me to see 10+ messages a day at a Mid level.
We set expectations for Junior positions at 3+ years of experience, and fail to mention anything of Entry Level. It’s no wonder there can be such panic at graduation. Meanwhile, some extremely clever people are flying below the radar because your HR department is hard-nosed about time-based experience. By forgoing them, you’re missing out on an extremely passionate demographic of people.
That means being willing to hire a few Juniors instead of insisting on Senior levels. That means being willing to take on College Students to show them the industry, and what to expect.
So what if graduates don’t have your entire stack mastered? Can they learn? Are they willing? Honestly, you should also be asking yourself if you had even half the skills at that stage of life. The answer is most likely no, so why expect it from someone just entering?
By listing so much on a requirement, you may well be scaring off some truly brilliant people with a greater than average amount of modesty, a trait this industry sorely needs more of.
To put this article as succinctly as possible: Invest in the future, or by the time you get there it won’t be worth anything.
]]>…except if you’ve already tried, you noticed quite the disconcerting truth. Angular is Javascript, and like its kin it has as many implementations of REST as there are people who are capable of making one. A few might chuckle to themselves on the fulfillment of the LISP curse, but fret not! There is hope yet, or at very least someone with enough patience to lay out a few options worth looking into so you don’t have to.
We’re going to cover some of the more popular options out there, some of their strengths, and where they’re going to quickly become a thorn in your side. For those unaware of the LISP curse, it’s quite simply that the language is so powerful that everything becomes a social issue. Javascript is very close to LISP in terms of expressiveness, and as a result suffers from a lot of the same effects. Hundreds of half baked implementations of what you want, with very few ever offering a full package beyond the all too common “It solved my problem fine” hack library.
The built-in $http methods, also known as rolling your own service.
This is the low level of making a request. As long as it fits in the scheme of HTTP you can define it here. This gives you a lot of power to take care of those fine little details.
The problem with having that type of power is that very rarely can anything in a RESTful framework be described as special enough to necessitate fine-grained control. If it can, you’re likely doing something very wrong and need to look at your implementation a bit more carefully.
When I say low level, I mean it. You have to handle setting up every method for every type of request. The only way to really overcome this is to set up a base service and define common methods, but by the time you do that, you’re already a great deal of the way to items further down the list.
If you see yourself there, it’s time to take a sober look in the mirror and ask yourself if you really want to invent the next RESTful service handler in Angular. Nothing against that if you have something clever, but chances are you just want to get work done, or at least you’re supposed to be getting it done.
Yes, it’s easy. Yes, you could probably make something pretty spiffy. Yes, it would fit your needs like a glove. No, you probably don’t have the time to maintain every little detail of it if you manage to make a mistake on it. Use something already out there unless you really need that level of granularity. The LISP curse needs no more help propagating itself into the Javascript world.
An abstraction beyond $http allowing you to define a lot of the methods at once.
Unlike $http, this allows you to define all of the resources in one swoop.
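A typical $resource factory looks like this (the URL and the names are placeholders of mine):

```javascript
// One declaration covers get/query/save/delete for the resource
angular.module('app').factory('Person', ['$resource', function($resource) {
  return $resource('/api/people/:id', { id: '@id' }, {
    update: { method: 'PUT' }  // $resource ships with no update action
  });
}]);

// Usage: Person.query(), Person.get({ id: 1 }), person.$save(), person.$update()
```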
The documentation is nigh unreadable, and you’ll spend plenty of time fumbling through blog posts and whatever books you can find to get a solid implementation out of it.
It took a while for them to get to promises.
Touted as solving a lot of the annoyances with $resource
You probably won’t need to bother with making Services, you can just drop in Restangular and use it in your controllers. Everything from that point on is a Restangular object you can call through on, and they all return promises. Very handy to get a lot of code out of repetitive services.
Want to do something out of the usual? Restangular can do it. You get custom methods for sending new types of requests, and you can even define your own methods on it.
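In a controller, that tends to look something like this sketch (the API shape is an assumption):

```javascript
// Everything returns a promise-backed Restangular object
Restangular.all('people').getList().then(function(people) {
  $scope.people = people;
});

Restangular.one('people', 1).get().then(function(person) {
  person.name = 'Lemur';
  person.put();  // PUT back to /people/1
});
```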
Hopefully you like lodash (I do), because it’s a required dependency. This one is debatable, as I’m of the opinion that people should be using it more as is, but I digress.
What about relationships? You’re going to end up with a lot more code there, especially on trying to get many to many relationships to behave in anything that resembles coherence.
The custom methods are nice, but you’re going to very quickly see your controllers start looking like half-baked services. If you’re finding yourself defining a ton of custom methods for unique methods, you’ll find yourself going back towards services very quickly. Granted, that likely means you need to redesign systems on the backend, and of course there’s nothing against abstracting Restangular into base controllers either.
These objects can become heavyweight fast. All the Restangular methods get appended to the objects, meaning you’re passing around a lot more data. If you send a POST or PUT request to create or update something, you had better hope you’re filtering parameters.
You’ll end up getting a mouthful of Restangular chaff on every object you’re trying to send up, and the only way to get around this one is to run a cleaner on it. To me it seems like far too much work being done for far too little extra gain.
Eventually you get fed up with all of this and decide that there has to be a better way to manage all of this. If you’ve noticed a trend so far, it’s that each successive recommendation is an abstraction on the last. Angular Data is the culmination of getting far too pissed off at implementing base services and other nonsense trying to get Angular to behave coherently.
You can define relationships, have resources defined much in the same way as a Rails-like framework, and even bind data to the scope with very little extra code.
That means you get fun like hasMany, hasOne, and belongsTo. No more needing to create extra methods to get at nested data, and no need to repetitively define relationships.
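Defining a resource with its relations runs roughly like this (the resource and field names are assumptions of mine):

```javascript
// angular-data resource with declared relations
DS.defineResource({
  name: 'post',
  relations: {
    hasMany: {
      comment: { localField: 'comments', foreignKey: 'postId' }
    },
    belongsTo: {
      user: { localField: 'author', localKey: 'userId' }
    }
  }
});
```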
The author is extremely responsive to issues and is known to have feedback within the day. Many of the frustrations above were things that he’d cited as reasons for creating this framework.
I’ve yet to have found anything compelling against it to this point, except that you need to be very specific in telling it how your server responds to queries.
Really, it depends on how much horsepower you need to get tasks done, but as of now I would still put Restangular as the go-to for most occasions, with angular-data being a very interesting up-and-coming framework.
]]>“Who needs them!” they say haughtily. “jQuery has sustained us perfectly fine, and our applications are not nearly large enough to warrant the extra overhead! Why would any of us use a frontend framework?”
Yet there are those of us, standing upon the hill, pilgrims from the unholy land of Ajax Callbacks and Asynchronous Updates, looking upon them with something akin to pity.
If you’re looking here for answers on whether you need a framework, chances are very high that you do. If you’ve found your way fumbling about AJAX one too many dark nights whilst imbibing strong drink, it’s time to bite the bullet and make a jump to a better place.
In the modern day web, dynamic never seems to be dynamic enough for some. Data needs to update live on the page, masses of components need to update and render as if in some practiced dance. You find yourself saying “Just one more patch hack and it should work again.” How many nights has it been now? Two? It seems you’ve lost count. Anything akin to structure is a mad snarl of brambles waiting to take you should you misstep even one unit test.
You cry out in anguish as IE8 fails to render yet again. Surely there must be a better way, but the application is not yet large enough to warrant such an expenditure of effort! What you do are only a few AJAX calls to your APIs, the callbacks have only nested five levels by now. You’ll switch when it gets worse, you think to yourself. Only, does anything ever happen in that most special circle of hell known as Technical Debt?
Duplication, everywhere you see. The same basic operations of Create, Read, Update, and Delete, all of it done to the tune of whichever team happened to be working on them that particular week. None quite work the same, and all attempts at consolidation and style guides have long since been laughed off as meaningless. The code is littered with edge cases, special hacks, and one-time things with promises of removal and cleaning.
It’s not that any of the implementations were particularly bad (except for Bob’s; that was a mess, and how is he still working here again?). They all make sense in their own particular ways, and their creators could speak at great length on their strengths with grandiose bravado. You nod vigorously, for a great deal of sense is made here! …but venture further into the tribe of Neckbeard to hear their prophet of the promise speak so eloquently of their path to righteousness. You knuckle your head as you bow out; some poor fool had brought up editors again.
It was quite a quandary: so many made sense, but in such different ways. Which was right? Who was to say? But then you saw it. The new hotness, they had called it on Hacker News, singing its praises for bringing order to the chaos. Skeptically you listened in on the discussion, and rightly so; you seem to recall them declaring Java and Perl dead again last week, for the fifth time this year. Best to take them with a grain of salt. How many had claimed by now to have the one true way? AngularJS stood proud among the rest, EmberJS attracting its crowds as well, while still more frameworks begged for attention.
Which one should I investigate? Quite simply, they all have a point, and which one you pick is not nearly as important as deciding to migrate before Jira comes to swallow your hopes and dreams.
For the sake of this article, I’ll cover things in the terms of Angular. I have a great deal of respect for Ember and the work they’ve done, but I can’t speak nearly as much to its strengths. The purpose of this is far more to show how a front end framework can liberate you from the shackles of the oppressive Raw AJAX.
Organically grown frameworks very seldom work, and more often than not they end up becoming a cluster of micro-frameworks that are incomprehensible to all but the most trained in their ways. Best hope there are no rogue buses to rob you of that knowledge. While at first it seems like a good idea to allow free rein to interpret ideas and build more creatively, you’ll quickly learn that anything that can be considered a social issue among programmers is grounds for a war.
This is why we have style guides and procedures in place, to bring order. No more having to listen to a lengthy discussion on the merit and readability of two spaces versus four, the style guide had set it in stone months ago.
Much the same can be said for a JavaScript frontend framework. While many would say they’re frivolous things, they bring order to the wildness of JavaScript. For that reason alone I would give them serious consideration, but their effectiveness does not end there.
With Angular, the DOM feels almost a distant memory. Cleanly abstracted away, table rows can be updated dynamically and the page manipulated with as little as a simple data binding. A search bar over a data set takes perhaps 50 characters at most. Things that would strike horror into a pure jQuery programmer’s heart are now trivial, abstracted away to build upon to greater heights.
Now things are tied to directives and actions rather than nodes that may change by a simple accident. The page can be reasoned about as a whole rather than as segmented pieces cobbled together with selectors.
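As a taste of that brevity, a live search over a list in Angular 1.x really is about a line of markup. The names here are invented for illustration:

```html
<!-- ng-model binds the input to scope.query; the filter narrows the list live -->
<input type="text" ng-model="query" placeholder="Search...">
<ul>
  <li ng-repeat="item in items | filter:query">{{ item.name }}</li>
</ul>
```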
When someone asks me why a frontend framework, I would quite simply reply:
“Because the sense of unity and structure they provide far outweighs the price of its implementation”
Well, that, and because Angular behaves properly in IE8 as of current versions. That alone is worth its weight in gold.
So how does a many to many relationship work? Via an association table containing IDs of both of the resources to be linked. Both then have access to the other collection. It’s extremely handy for certain problems, and if you’re just using Rails through the view you’ll likely never have a problem with it.
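In Rails terms the setup looks roughly like this, a join model carrying both IDs (all names here are assumptions):

```ruby
# The association table, as a model of its own
class PostCategory < ActiveRecord::Base
  belongs_to :post
  belongs_to :category
end

class Post < ActiveRecord::Base
  has_many :post_categories
  has_many :categories, through: :post_categories
end
```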
But now you’ve heard about this awesome thing called Angular / Ember / New Hot JS Framework that you just have to use. I don’t blame you, a few weeks in Angular and I don’t want to use Rails Views again. You decide to take the high road and segregate the apps, making Rails an API and using your framework (Angular assumed from here on out) to build out the frontend through calls.
It all works great; you even found Restangular to help you out with some of the plumbing. Simple actions are now trivial. Want a list of a user’s comments? Easy:
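The original snippet is reconstructed here; Restangular itself is stubbed with a hypothetical stand-in so the call shape is runnable outside Angular. The real library would issue GET /users/1/comments:

```javascript
// Hypothetical stub standing in for Restangular, just to show the call shape
const Restangular = {
  one(resource, id) {
    return {
      getList(sub) {
        // pretend the server answered with two comments
        return Promise.resolve([
          { id: 1, body: 'First!' },
          { id: 2, body: 'Nice post.' }
        ]);
      }
    };
  }
};

// GET /users/1/comments in real Restangular:
Restangular.one('users', 1).getList('comments').then(comments => {
  console.log(comments.length); // logs 2
});
```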
Restangular already has us covered, and in any other simple case we’re sailing along. Now we want to add categories to our posts, a many to many relationship. How would we script that one? Likely the first thing you try is this:
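Reconstructed from memory of the mistake; note this merely PUTs to a nested resource, and no join row is ever created:

```javascript
// The naive attempt (reconstructed) - PUT /posts/1/categories/2
Restangular.one('posts', 1).one('categories', 2).put();
```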
Checking the DB, you’ll notice the new association isn’t there. Odd. Maybe delete will work?
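Also reconstructed, and this one is actively destructive; it issues DELETE /posts/1/categories/2, which Rails happily routes to destroying the category itself:

```javascript
// The second naive attempt (reconstructed) - DELETE /posts/1/categories/2
Restangular.one('posts', 1).one('categories', 2).remove();
```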
…except now, for some reason, category one is gone everywhere. Thinking it through, it becomes clear what we’ve done: simply requested a nested resource and sent it a delete request.
There’s an association table with your name on it called something like PostCategory. Trying to route through either one of the hosts is likely to give you nightmares.
First let’s take a look at what your controller action should look like to handle the queries:
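Here is the logic as a pure-Ruby sketch (a hedged reconstruction): PostCategory is stubbed with an in-memory table so the filtering is visible outside Rails, and all names are assumptions; the real controller would use an ActiveRecord join model.

```ruby
PostCategory = Struct.new(:post_id, :category_id)

TABLE = [
  PostCategory.new(1, 10),
  PostCategory.new(1, 11),
  PostCategory.new(2, 10)
]

# index: filter by whichever of post_id / category_id the query passed along
def index(params)
  TABLE.select do |row|
    params.all? { |key, value| row.public_send(key) == value }
  end
end

# destroy: remove the association row, not the category itself
def destroy(params)
  TABLE.reject! do |row|
    row.post_id == params[:post_id] && row.category_id == params[:category_id]
  end
end

index(post_id: 1).size        # => 2, post 1 has two categories
destroy(post_id: 1, category_id: 11)
index(post_id: 1).size        # => 1
```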
Note that the index action is a very succinct way of saying: branch on whichever of post_id or category_id showed up in the params and filter by it. Writing that longhand has been known to drive me into very lengthy discussions on mutability, morality, and ethics.
This allows us to search against either posts or categories depending on the params, but it can only work if we cheat a bit around the routes and define a DELETE action on the root resource:
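The route itself is an assumption on my part, but presumably something like a collection-level DELETE so both IDs can ride in the params:

```ruby
# config/routes.rb (assumed)
delete '/post_categories' => 'post_categories#destroy'
```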
Not exactly the most straightforward method, but given the odd alternatives, like adding controller actions to either end of the relation (post#add_category and friends) or adding multiple routes every time, I far prefer this idea. The only real difference is that you end up with a request like this instead:
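The exact shape is reconstructed, but presumably:

```
DELETE /post_categories?post_id=1&category_id=2
```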
Now all we have to do are basic actions like on any other service and we’re golden:
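A hedged sketch of such a service: a tiny wrapper around the two calls. The `http` object here is a hypothetical stand-in for Angular’s $http, stubbed so the shape is runnable outside Angular, and all names are illustrative:

```javascript
// Stub for $http that records what it was asked to do
const http = {
  calls: [],
  post(url, data) { this.calls.push(['POST', url, data]); return Promise.resolve(data); },
  delete(url)     { this.calls.push(['DELETE', url]);     return Promise.resolve(null); }
};

// The service: both IDs always ride along, aimed at the root resource
const PostCategories = {
  add(postId, categoryId) {
    return http.post('/post_categories', { post_id: postId, category_id: categoryId });
  },
  remove(postId, categoryId) {
    return http.delete(`/post_categories?post_id=${postId}&category_id=${categoryId}`);
  }
};

PostCategories.add(1, 2);     // POST /post_categories
PostCategories.remove(1, 2);  // DELETE /post_categories?post_id=1&category_id=2
```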
Wrap it in a service and you’re set to go. Just remember that the association tables are there for a reason; use them. Rely on too much Rails magic and you’ll end up burned, thinking something’s going to work when it won’t.
I welcome any thoughts on how better to address such issues as this in the comments, I’d love to hear your opinions!
In the time I’ve been off of writing on here, I’ve had a bit of a stint of gem creation. We’re going to cover a number of them in the coming week.
Some may say that monkeypatching is inherently evil, but I would tend to disagree. An RPG serves a very tactical purpose when used correctly, but often times it can have rather unfortunate results in the hands of the untrained. Such is monkeypatching, something that should be viewed in a pragmatic sense rather than one of dogmatic vitriol. With that, let’s take a look:
Izzy got popular right after a Ruby Weekly post mentioned it as a method of mitigating long conditionals. I made it for the express purpose of simplifying multiple conditionals on the same object into something more succinct.
Going off of what’s in the README as far as order, we’ll take a look into some of the inspiration and workings of each method.
Matchers are methods that are checked against any of the attributes of an object that includes Izzy. Let’s say we have an instance of me, built from a Person class implementing Izzy. It quickly gets tiresome to validate against that object the usual way, repeating the receiver for every single check of its name, age, and whatever else. It seems repetitive and downright unnecessary to specify the object multiple times. Rails has a tendency to use hashes to create, query, and update objects, so why not add some of that type of magic to validations with a single matches_all? call that takes a hash of attribute names to matchers?
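A reconstruction of the before and after; a minimal Person stands in here so the snippet runs on its own, where the real gem mixes a module into your class instead:

```ruby
# Minimal stand-in for illustration only
Person = Struct.new(:name, :age) do
  def matches_all?(conditions)
    conditions.all? { |attribute, matcher| matcher === public_send(attribute) }
  end
end

brandon = Person.new('brandon', 23)

# The tiresome way: the receiver repeated for every check
brandon.name =~ /^br/ && brandon.age > 18 && brandon.age < 30   # => true

# The Izzy way: one receiver, one hash of matchers
brandon.matches_all?(name: /^br/, age: 19...30)                 # => true
```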
To me that’s far more succinct. So how do we make something like this in Ruby? Let’s take a look at the source:
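A hedged reconstruction of that source; the real gem may differ in detail, but the shape follows the README:

```ruby
module Izzy
  def matches_all?(conditions)
    matcher_check(:all?, conditions)
  end

  def matches_any?(conditions)
    matcher_check(:any?, conditions)
  end

  def matches_none?(conditions)
    matcher_check(:none?, conditions)
  end

  private

  # The shared body: call each attribute, match each value against it with ===
  def matcher_check(type, conditions)
    conditions.public_send(type) do |attribute, matcher|
      matcher === public_send(attribute)
    end
  end
end
```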
The first thing you may notice is that the body of the block is abstracted into a private matcher_check. This is to abstract the logic for reuse on the two other matcher types.
The fun thing is that because it’s in a method, we can send another argument to it. The value is then pulled into the block, or closure if you prefer. In this case, we’re sending in what to check the values against dynamically. Notice that we only use all on the matches_all? method.
Let’s step through a check like brandon.matches_all? name: /^br/ piece by piece: the hash is handed off to matcher_check, each key is called as a method on the object, and each value is matched against the result with ===. With name as the method and /^br/ as the matcher, the whole thing reduces to /^br/ === 'brandon', which is true.
So why does === work there, you might wonder? It’s overridden very frequently for classes, notably for Regexp (matches), Range (includes), and Proc (calls). Most of the time it’s bad practice not to use the longhand versions, but in this case it affords us a great deal of flexibility: we don’t have to worry about how a value is evaluated as long as it matches properly.
This is actually one of the most powerful features of the case statement, which uses === for its when clauses. Notice that Proc#call is effectively the same as Proc#===, meaning you can throw lambda and friends into the mix for even more powerful checks.
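To make that concrete, here is === in action, the very operator the matchers lean on:

```ruby
/^br/ === 'brandon'         # => true  (Regexp#===: match)
(18..30) === 23             # => true  (Range#===: cover)
->(n) { n.even? } === 4     # => true  (Proc#===: call)

# And case uses === under the hood for its when clauses:
label = case 23
        when 0..17  then 'minor'
        when 18..30 then 'young adult'
        else             'other'
        end
label   # => "young adult"
```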
Boolean matchers were the original method, again using the abstracted block:
This one is far simpler: all it does is call a list of methods on an object and check their truthiness. If we had some methods defined on Person to check legal status, or various other simple predicates, this is where it would come in handy.
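A hedged reconstruction of both the source and its use; the method names here are assumptions from memory of the README and may not match the gem exactly, but the mechanism is the same abstracted block:

```ruby
module Izzy
  def all_of?(*methods)
    method_check(:all?, methods)
  end

  def any_of?(*methods)
    method_check(:any?, methods)
  end

  def none_of?(*methods)
    method_check(:none?, methods)
  end

  private

  # Call each named method on the object and check its truthiness
  def method_check(type, methods)
    methods.public_send(type) { |method| public_send(method) }
  end
end

# Assumed usage, given a Person with a few predicate methods:
class Person
  include Izzy

  def adult?
    true
  end

  def employed?
    true
  end

  def felon?
    false
  end
end

Person.new.all_of?(:adult?, :employed?)   # => true
Person.new.none_of?(:felon?)              # => true
```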
Because sometimes it’s nice to have a bit of that Rails feel in regular Ruby. These methods use the matches_all? method in conjunction with select, reject, and find to provide some Rails-like shorthand:
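A sketch of the composition (reconstructed; the stand-in Person and the lambda spelling are mine, not the gem’s): select and find composed with the matcher give you where and find_by in miniature.

```ruby
Person = Struct.new(:name, :age) do
  def matches_all?(conditions)
    conditions.all? { |attribute, matcher| matcher === public_send(attribute) }
  end
end

people = [Person.new('brandon', 23), Person.new('alice', 31)]

# where / find_by in miniature
where   = ->(list, conditions) { list.select { |item| item.matches_all?(conditions) } }
find_by = ->(list, conditions) { list.find   { |item| item.matches_all?(conditions) } }

where.call(people, age: 18..30).map(&:name)   # => ["brandon"]
find_by.call(people, name: /^al/).name        # => "alice"
```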
We’re not always in Rails, and some of my favorite features there are the ActiveRecord where and find methods. Composing the two functions allows us to mimic them quite nicely.
Combining multiple small functions into something larger is a cornerstone of functional programming known as composition, and something well worth looking into. Not every gem has to be a monolithic beast that can tame the world’s problems. Sometimes you only need to do the simple things well and build up from there.
Next up we’ll look into Streamable, Pipeable, @banister’s Funkify Library, and hacking Piping functionality onto Ruby.
You will be making a Debian 7.1 Box with Ruby 2.0, Rails 4.0, and Git. At the time of this writing, these are the most recent versions.
The first thing we’re going to need is VirtualBox. Feel free to set this up as a standalone OS instead; the process will be essentially the same.
In my case, we just need:
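On an apt-based host that is likely just the following (assumed command; adjust for your platform):

```shell
sudo apt-get install virtualbox
```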
Unless otherwise noted, specify the default options on your install. I will note the steps as I go along installing Debian 7.1 i386 on an instance of Virtual Box with a Host OS of Linux Mint 14 x64. The steps should not differ heavily with other Host OS platforms.
Your host and domain names are completely up to you, but if this is just a test I would suggest leaving them as the defaults for the time being. They can be changed later on.
The same will apply to the passwords and the other information used for account setups. At this point on a test box I specify a trivial password and other information, considering I’m installing on a VM that will not see the light of day. I don’t advocate doing such things on a live server, the Ops will hit you or do nasty things to your home directory if you do.
Select Guided for the partitioning method, unless you’re feeling brave or know your way around Unix. This will be explained in detail in a later tutorial, but for now it will be fine to accept the defaults.
As it will tell you, select all on the same partition. There are quite a few benefits to separate partitions, but if this is a test box or a VM it will be irrelevant for now.
Finish the partitioning and write the changes onto the disk.
After this point, the base system will begin to install. Now would be an ideal time for coffee or other niceties you may desire, as it will take about 5-10 minutes to complete.
This is another instance of selecting defaults unless you have compelling reason not to. Chances are low that you will, and HTTP proxies will be rare in most cases considering you’d be routing through your Host’s NIC.
Now would be another great time to catch a break, as it’s going to be downloading a fair amount of packages from the package server. Make sure to watch for the popularity contest prompt. Feel free to select as you wish.
On the packages list, you can deselect using the space bar. ONLY select SSH Server and Standard System Utilities. We want to keep this lightweight for tests. In the case of a server, DO NOT select a Desktop environment. Put simply, you’re doing yourself a disservice as most commercial servers will be running headless as is. Press Enter to continue, and it will continue to retrieve the requested files.
Now that we’re here, it’s time to boot into our new system!
Go ahead and log into our test account. Now the first thing we’re going to want to get a hold of are a few programs:
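The install line is reconstructed, but presumably:

```shell
sudo apt-get install zsh vim git
```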
ZSH and Vim are a matter of preference, but they will save you some headaches later on down the road. Git is by far mandatory for any form of Rails development. Learn version control; it will save you countless hours later on.
Next we’re going to want to get a hold of RVM, Ruby Version Manager, to handle various Ruby installations.
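A sketch of the usual RVM dance, reconstructed from that era’s install instructions (the exact invocation may have changed since):

```shell
\curl -sSL https://get.rvm.io | bash -s stable
source ~/.rvm/scripts/rvm
rvm install 2.0.0
rvm use 2.0.0 --default
```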
Notice the source command; you won’t be getting very far without it. This will take some time, as it’s building Ruby from source. Now to get Rails running for us.
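Presumably, with the documentation flags of the era:

```shell
gem install rails --no-ri --no-rdoc
```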
We’re explicitly leaving off the documentation, as it takes substantially longer to compile. The thought behind this is that you should have a hold of the great Obie Fernandez’s The Rails 4 Way sitting on your desk. No? Purchase it. I’ll wait, and you have plenty of time before Rails installs as well.
Now we’ll get a skeleton app up to demonstrate that we have everything working. Make a directory for tests, and run
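Something like (the app name is arbitrary):

```shell
rails new test_app
```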
You should see a lot of code flash by, and a hang at bundle install. This is retrieving all the extra libraries for Rails to get running.
I will warn you there’s a potentially nasty bug lurking here: a JavaScript runtime will need to be installed. Run the following commands:
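The usual fixes, reconstructed: either install Node.js as the runtime, or lean on therubyracer gem inside the app.

```shell
sudo apt-get install nodejs

# ...or, inside your app, add therubyracer to the Gemfile and rebundle:
# gem 'therubyracer', platforms: :ruby
# bundle install
```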
After this, go ahead and give it a shot and watch it come to life!
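From inside the app directory:

```shell
rails server
# then visit http://localhost:3000
```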
Running into problems? Shoot me a tweet @keystonelemur and I’ll add it to a footer section of problems encountered and we’ll get it all sorted out!
Yet here we are. The great bearded ones hammering away in their prompts, invoking vim wizardry, emacs enigmas, and unix hackery. What makes them so cozy?
When I was getting started into technology, a good friend and mentor of mine recommended I install OpenBSD. I installed it, and the first words out of my mouth were ‘Where’s the GUI!?’
I’d never been closer to throwing a computer out a window than trying to figure that thing out. It was horrible, I was slow, and nothing made sense. I kept projecting my expectations for an OS onto it, proclaiming loudly how worthless it was and why it was so stupid.
After some coaxing, my friend told me to wait it out, read a few books on the subject, and bear through it. If there was a single moment that changed everything I’d ever known on technology, this would have been it.
The best advice I can give to a newbie at the great prompt is that man pages are your friend, Google is an infinite purveyor of knowledge, and Amazon holds within it great archives of literature waiting to be discovered. Read, and find a basic Linux administration guide such as Linux Administration: A Beginner’s Guide.
The commands you are going to want to know inside and out are awk, sed, grep, and find. They will make your experiences with log files and other text files far more enjoyable.
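A hedged taste of all four on a throwaway log; the file and its format are invented for the example:

```shell
# Fabricate a tiny access log to play with
printf 'GET /home 200\nGET /admin 500\nPOST /login 200\n' > /tmp/sample.log

grep ' 500' /tmp/sample.log                # only the error lines
awk '{ print $2 }' /tmp/sample.log         # just the request paths
sed 's/^GET/get/' /tmp/sample.log          # rewrite a stream on the fly
find /tmp -maxdepth 1 -name 'sample.log'   # track the file itself down
```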
The single best thing you can do for your prompt is to install ZSH, and shortly thereafter Oh My ZSH. Any knowledge you have of Bash will be quickly transferable, as it will all already work in ZSH.
ZSH offers quite a few little niceties that will speed up your work flow immensely.
There are many more things I could discuss about ZSH, but these alone are plenty of reason to switch.
As I mentioned in an earlier post, aliases are your friend. Anything I type more than once that’s greater than five characters will get an alias. Combine with ZSH features such as global and suffix and you can get some crazy commands going fast.
Vim was probably the biggest learning curve I had when switching to a command prompt based layout, and also by far the most rewarding when I really got it. Heck, I’m writing this post in Vim.
The biggest advantage is that your hands never have to leave the keyboard. The mouse has become an enemy of productivity for me, and I refuse to touch it when programming. Learning the shortcuts and how to use Vim properly has sped up my programming substantially.
Combined with any number of the Great Master Tim Pope’s Plugins and VIM will be a match for much of any editor out there.
The real question to ask on matters of efficiency is this: When was the last time you watched someone in Sublime or Textmate programming and thought ‘Wow!’ ? Go watch a Vim guru fly, and you’ll swear you just witnessed black magic.
I can’t mention Vim without mentioning Emacs, lest I invoke a holy war. Emacs is a beast all its own, and the only apt description of it would be an operating system pretending to be a text editor. Seriously, IRC and a music player? It undoubtedly has some substantial power, but ultimately it clashed with my desire for a streamlined workflow.
Perhaps I’ll come back to this after I start back into LISP and go into Emacs, but for now it’s not my thing.
tmux is a terminal multiplexer. Put simply, it allows you to have multiple panes open in a single terminal window. Combine that with the ability to save your sessions and make templates for new ones, and it quickly becomes valuable.
Admittedly I have not had as much of a chance as I would have liked to experiment with it, but it is definitely worth a look.
I switched to an almost completely terminal-based workflow for one reason in the end: efficiency. I’m notoriously irritable with repetition of any kind, and anything that allows me to remove repetition is worth the effort.
That, and it is always nice to have a new programmer accuse you of black magic hackery after seeing you do anything.
If you’re like me, you’ve heard this word thrown around more than anything, and never really defined. Everyone sings praises of this great new renaissance of programming, but no one seems to know what it even is.
In its simplest terms, functional programming is programming based on functions. Functions return values, mutation is a naughty word, and the law of the land is no side effects.
Well that sounds all well and good, but how exactly can you program if all variables are in their final state? That seems rather counterintuitive at best, and confoundedly stupid at worst. So why then?
In languages that support a functional programming style, all functions are first-class citizens. This means they can be passed as arguments the same as any other value, because by definition they return a value.
With Ruby, every function returns a value, whether implicitly or explicitly. Let’s see what we mean here:
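A reconstruction of the idea: every Ruby method returns its last expression, so results compose directly, and methods themselves can be passed around as values. The method names are mine for illustration:

```ruby
def square(x)
  x * x   # implicit return, no `return` keyword needed
end

def double(x)
  x + x
end

square(double(2))                 # => 16, results compose directly

# A method can itself be handed around as a value:
[1, 2, 3].map(&method(:square))   # => [1, 4, 9]
```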
We just passed a function as a value! This opens up a lot of interesting possibilities, which brings us to our next point.
Anonymous functions are functions without a name. This may sound strangely foreign, but if you’ve ever touched javascript you might recognize this pattern:
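The familiar pattern in question, naming an anonymous function (reconstructed):

```javascript
var add = function (a, b) {
  return a + b;
};

add(1, 2); // 3
```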
You may notice that we just set a variable equal to a function, or more correctly that we just named an anonymous function. So where did this come from? Let’s take a look at the same thing in Scheme:
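The same naming-of-a-lambda in Scheme (reconstructed):

```scheme
(define add (lambda (a b) (+ a b)))
(add 1 2) ; => 3
```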
This type of pattern is extremely common in LISP like languages, which is why some readers are going to start noticing some striking similarities to Ruby at this point. Let’s give this one more try in Ruby:
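And the Ruby spelling of the same idea:

```ruby
add = lambda { |a, b| a + b }   # or, with the stabby arrow: ->(a, b) { a + b }
add.call(1, 2)                  # => 3
add.(1, 2)                      # => 3, shorthand for call
```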
Blocks are essentially anonymous functions, created on the fly to operate on enumerated values and then discarded. Blocks can also be captured and saved if need be, which brings us to the real question.
What benefits does it really bring? Is it even worth it? In short, the author’s (probably biased) opinion is yes. The key reason is idempotence.
You see, idempotence is a complicated word that essentially means that no matter how many times you run a function, given the same input it will always return the same output. (Strictly speaking, what’s described here is purity, or referential transparency; idempotence is a close cousin.)
The benefit of this is that you don’t have to worry about a mystical black box and ordering scheme, as well as necessary blood sacrifices in order to get a unit test to pass. You know for a fact that a function will return the same every single time. That, my friends, will save you a great many nightmares down the road.
The great thing about idempotence is it translates almost directly into thread safe methods that will not do unusual things to your values if written correctly. Functional languages thrive in multi-threaded and distributed environments, just look at Erlang.
Erlang was a language invented at Ericsson to manage their massive telephone switching systems. They came across a hairy question: how do we update our phone network and ensure no downtime? Enter Erlang, with its hot-swappable modules that could be changed out in production. Functional languages and techniques can give you that kind of power.
So what constitutes good practice and bad practice? Let’s dive into a few examples shall we?
In string manipulation, modifying the original string will yield some very nasty side effects very quickly.
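A reconstructed illustration of the point, mutating the original versus returning a copy:

```ruby
# Bad: gsub! changes the string in place, for everyone holding a reference
name = 'brandon'
name.gsub!(/^b/, 'B')                # name is now "Brandon" everywhere

# Good: gsub returns a new string and leaves the original alone
name = 'brandon'
capitalized = name.gsub(/^b/, 'B')   # => "Brandon"; name is untouched
```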
Bang (!) methods should be used extremely rarely, as they modify the sender. Instead, return the results to a new array.
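Reconstructed: the same rule applied to arrays.

```ruby
# Bad: sort! mutates the receiver
numbers = [3, 1, 2]
numbers.sort!

# Good: sort returns a new, sorted array
numbers = [3, 1, 2]
sorted  = numbers.sort   # => [1, 2, 3]; numbers is still [3, 1, 2]
```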
This would be more amusing if I hadn’t done it before when I started. Read up on the Enumerable module, as it will save you immeasurable amounts of time in the long run.
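Reconstructed: accumulating by hand versus letting Enumerable do it.

```ruby
# Bad: manual accumulation into an outside array
doubled = []
[1, 2, 3].each { |n| doubled << n * 2 }

# Good: map expresses the whole transformation in one step
doubled = [1, 2, 3].map { |n| n * 2 }   # => [2, 4, 6]
```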
Again with the things I wish I had never done. Iterators like this are definitely not needed in a language like Ruby where practically everything is an object. Again, learning the methods will save you a lot.
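Reconstructed: C-style index juggling versus folding the collection directly.

```ruby
# Bad: a hand-rolled loop over indexes
numbers = [1, 2, 3]
sum = 0
for i in 0...numbers.length
  sum += numbers[i]
end

# Good: reduce folds the whole collection in one call
sum = numbers.reduce(:+)   # => 6
```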
The amount of time that you will save by simply reading over the Enumerable module, and learning the commands map, reduce, and select will be astounding. All of which originated from a LISP like language.
So, this is a fairly short writeup on the subject, and I will definitely cover it in more detail later on, but you should have a decent idea of what to look for.
The thing to take away from this is that if used properly, unit tests and making sure things behave as they should becomes exponentially easier. Some of the hardest tasks in programming merely require a different perspective.
Automation seems to be a scary concept for some, a black magic that many try and avoid because they already know all of their commands and appreciate their vanilla editors.
In some cases, yes, you’ll spend far more time automating something than actually getting it done. Then again, how few and far between are those cases, really, that you can justify it all away with just that? XKCD, as always, has our backs on timing it out:
If you’re in a Unix environment, your shell should be your best friend. Know your way around a command prompt well enough and you’ve already made some serious headway in reducing the amount of time it takes to do something!
I would seriously suggest taking a look into ZSH and its extension Oh My ZSH!, as they alone will save you a lot of time at a shell prompt.
Now let’s take a look at a few of the aliases I frequent:
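A flavor of the kind of thing in there; these particular aliases are illustrative, not a verbatim copy of the repository:

```shell
alias ga='git add'
alias gc='git commit -m'
alias gs='git status'
alias gp='git push'
alias be='bundle exec'
alias ll='ls -lah'
```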
These, of course, being pulled from my Special Sauce Repository.
So what rule of thumb do I use when adding new aliases? If it takes more than five keystrokes to do, I alias it. Digging into my .zprofile will show you a most_used command, which I keep around to stay honest about how much I use commands.
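A hedged guess at such a helper: it counts the first word of whatever history you pipe into it, most frequent first. The name matches the prose; the body is my reconstruction:

```shell
most_used() {
  # count the first word of each line on stdin, most frequent first
  awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# In an interactive ZSH:  fc -l 1 | awk '{ $1 = "" } 1' | most_used
printf 'git status\ngit add\nls\n' | most_used 1   # the winner: git, twice
```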
The amount of time I save from just that adds up quickly as I type many of those commands several hundred times a day. Adding an alias, and sourcing my .zprofile takes me all of five seconds to do.
So what? We have a few niceties and aliases around; we may save five minutes a day with a basic set. Perhaps, but the more you alias and the more you chip away at your daily repetition, the more you’ll realize that you’re quickly outpacing your normal speeds.
Really. Be lazy. Hate to repeat yourself so much that adding an alias is a natural twitch. Hate doing things by hand so much that you crack open your editor and start scripting it out!
Learn the keyboard shortcuts, and stop touching that mouse. If you’re really hardcore about it, learn Vim or Emacs and go to town on macros and keybindings.
When in doubt, automate it and document it. Sharing is caring, and many people are quite kind as to post their zprofiles, so take a peek and learn.