Finding Unseen Unicode/ASCII Characters in Ruby

This morning, I thought I was losing my mind. I'm writing a little web app (mostly Angular) that makes API calls. I know the API works, but for some reason, the calls from my app to the API were getting a 500 error in response. I tailed the API logs to see an "ArgumentError: argument out of range". However, the only thing that happened on this line was the date parsing. I open up the Rails console and start debugging. First I type out the date that isn't working. It works. Then I copy and paste from my browser. Failure.

irb(main):027:0> "2017-02-13T13:12:51Z".to_time(:utc)
=> 2017-02-13 13:12:51 UTC
irb(main):028:0> "2017‑02‑13T13:12:51Z".to_time(:utc)
ArgumentError: argument out of range

As you can see above, they look IDENTICAL. One of my coworkers suggested that I check the ASCII value of each character. Lucky for me, Ruby makes this easy.

"2017‑02‑13T13:12:51Z".each_byte do |c|
  puts c
end
==>
50
48
49
55
226
128
145
48
50
226
128
145
49
51
84
49
51
58
49
50
58
53
49
90

If you look at a chart of ASCII characters and values, you can see that 127 is the end of the standard characters. My fifth character starts with 226. I know that the pattern of 226, 128, 145 repeats twice and in the same spot as the dash. Looking at a UTF-8 encoding table, I can see that set of characters represents the non-breaking hyphen, which is definitely breaking my API call. Mystery #1 of the morning? Solved.

Creating Awesome Documentation With Yard

This semester I have been taking a computer architecture class. Overall, it's been pretty fun because I was given three projects and allowed to do them in the language of my choice. I chose Ruby. I'm pretty proud of these projects, so I decided to post them all to Github. If you are interested in the actual code, you can find it here. While I was doing this, I realized I needed to up my average documentation game. I needed the grader, who didn't know Ruby, to be able to easily understand what I was doing and why I was doing it. For the first two projects, I just wrote up documentation in a relatively reasonable way, and they were able to read through the code comments to see how it worked.

That's when I found YARD. YARD uses markup (I used markdown) and tags to create delightful HTML docs. What were just comments in my code turned into this, with almost no extra effort. I'd heard of it before, but I hadn't had a project that was worth massive documentation. YARD made the documentation a delight. You create the necessary documentation by using markdown, so your README functions as the homepage for your docs. Then, within each class, you use tags to explain params, return values, and add notes and examples. Here is an example from my MIPSDisassembler project:

class MipsDisassembler
  # Creates a new instance of MipsDisassembler
  # @param array_of_instructions [Array] the array of string instructions
  # @param starting_address [String] starting address for instructions, should be a string hexadecimal value
  # @param is_hex [Boolean] true if array of instructions is in hex, false if in binary
  # @note Each object in array_of_instructions should be a string representation of either binary or hexadecimal number
  # @return [MipsDisassembler] a new MipsDisassembler object
  # @example Create an object
  #    mips = MipsDisassembler.new(["0x022DA822", "0x8EF30018", "0x12A70004"], "7A060", true)
  def initialize(array_of_instructions, starting_address, is_hex)
    @instructions = array_of_instructions
    @starting_address = starting_address
    @is_hex = is_hex
  end

  # Takes binary/hex instructions and starting address and return an array of MIPs instructions.
  # @note Determines if r-format or i-format and parses accordingly.
  # @return [Array] an array of MIPs instructions, human-readable
  # @example Disassemble instructions
  #    mips.disassemble => ["7A060 sub $21 $17 $13", "7a064 lw $19, 24 ($23)", "7a068 beq $7, $21, address 0x7a07c"]
  def disassemble
    disassemble_instructions(@instructions, @starting_address, @is_hex)
  end

  # write MIPs instructions to file
  # @note File "mips_results.txt" will be created in same directory as code
  # @return [void]
  def output_to_file
    File.open("mips_results.txt", "w") do |f|
      in_file = ""
      disassemble.each { |instruction| in_file << instruction + "\n" }
      f.write(in_file)
    end
  end
 end

You can see the result of this code here as well as the image below. The result is easy to navigate documentation that you can share with anyone. It also gives the ability to see the source code of each method inline, so you don't have to go far to see the actual code behind public methods that you would want to use. I know I'm a bit of a dork, but I seriously loved putting this documentation together and I'm hoping it made my code just a bit more accessible.

Fun with Binary & Hex in Ruby!

While I haven't written a coding post in three months, I swear I do code every day. Recently, I started taking night classes again. This semester, I'm taking Computer Architecture and Data Structures with Java. My first project in Computer Architecture was to build a MIPS disassembler. I decided to use Ruby, which ended up bringing up some unique issues, mostly because Ruby does not have a short variable type within it's Numeric class. In Java, the short type is a 16-bit signed two's complement integer. Ruby does not use primitive types because everything has to be an object. No short object == no short type. Also, while binary and hexadecimal numbers can be converted easily to decimal in Ruby, they are initially strings. What does this mean? It means that in addition to the other parts of the translate, I also had to convert from hex to binary and from binary to signed decimal. I'll probably share all of my code in the future, but for now, here's a walkthrough of those two functions:

Translating to binary

This was pretty simple. I just had to use sprintf and it immediately converted the hexadecimal numbers into binary. Only one hitch! I needed the leading zeros (if there were any), so I had to use rjust to make sure it was a 32 bit binary by padding it to the left with 0s.

# takes an array of hex and returns an array of binary
# (32bit, including leading zeros)
def translate_to_binary(array_of_hex)
  array_of_binary = []
  array_of_hex.each do |num|
    array_of_binary << sprintf("%b", num).rjust(32, '0')
  end
  array_of_binary
end

Converting to a signed integer

Since I couldn't just cast as a short, I had to use two's complement. With two's complement, I knew that if the integer version of the binary was greater than 2^15, then it was actually a negative number. Otherwise, it was correct as is.

def convert_to_signed_binary(binary)
  binary_int = binary.to_i(2)
  if binary_int >= 2**15
    return binary_int - 2**16
  else
    return binary_int
  end
end

Boom! I hope this helps someone else who might've had the same trouble I did at first. I'll go into the program in full after my whole class has actually submitted theirs. 😛

SQL is the best!

I gave a SQL tutorial at PyLadies Boston last night and it was pretty fun. We used sqlite3 (which is definitely my least favorite DBMS, but it does come installed on pretty much every Linux/Unix machine by default and is the default for Django so I decided it was the best tool for this particular job. Giving a tutorial on something I used daily and have used consistently for 7 years was a bit weird because I did forget a few things because it didn't even cross my mind that people wouldn't know. For example: I initially neglected to mention that every statement needs a semicolon at the end and that you can't mix quotes (no " with '). Consider that was the bulk of all the issues, I'm feeling pretty successful right now! Take a look at the full tutorial below and let me know what you think.

Plotting Data From A CSV with Matplotlib

After I got all that data from the logs, my boss wanted it in a nice graph. First of the active user numbers, then the top 15 users. I knew that, despite having never used Matplotlib, it will still take me less time to learn it than any of my other options. I was able to get my script running and plotting correctly in less than two hours, so I felt pretty good about that. However, I had a few nested for loops and I wasn't a big fan. Enter the crowd-sourced code review! My friend Jenny was able to come up with a cool alternative to my solution that I ended up using. She utilized plot_date to sort the dates/data, which really helped (I was doing all sorts of crazy fun things).

So here's an example of what active_users.csv looked like:

system,au1,au30,date
jira,5,20,2016-06-09
confluence,16,23,2016-06-09
jira,8,22,2016-06-10
confluence,18,26,2016-06-10
jira,10,22,2016-06-11
confluence,18,26,2016-06-11
jira,11,23,2016-06-12
confluence,19,27,2016-06-12
jira,13,24,2016-06-13
confluence,19,28,2016-06-13
jira,8,24,2016-06-14
confluence,10,28,2016-06-14
jira,9,26,2016-06-15
confluence,15,30,2016-06-15
jira,15,26,2016-06-16
confluence,20,30,2016-06-16

he biggest problem was determining how to store the data in the program in a way that could be easily plotted. End solution? A dictionary of arrays. Or more precisely, a dictionary of a dictionary of arrays. With each line, we appended each data point to the matching array, which meant that a given date had the same index as it's data. And boom! It works!

import matplotlib.pyplot as plt
import csv
from datetime import datetime

active_users = {
  'jira': {
    'dates': [],
    'au1': [],
    'au30': []
  },
  'confluence': {
    'dates': [],
    'au1': [],
    'au30': []
  }
}

with open('active_users.csv') as csvfile:
  active_users_csv = csv.reader(csvfile)
  for system, au1, au30, date in active_users_csv:
    active_users[system]['dates'].append(datetime.strptime(date, '%Y-%m-%d'))
    active_users[system]['au1'].append(au1)
    active_users[system]['au30'].append(au30)

plt.plot_date(active_users['jira']['dates'], active_users['jira']['au1'], label='jira au1', color='orange', fmt='-')
plt.plot_date(active_users['jira']['dates'], active_users['jira']['au30'], label='jira au30', color='red', fmt='-')
plt.plot_date(active_users['confluence']['dates'], active_users['confluence']['au1'], label='confluence au1', color='green', fmt='-')
plt.plot_date(active_users['confluence']['dates'], active_users['confluence']['au30'], label='confluence au30', color='blue', fmt='-')
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
           ncol=4, borderaxespad=0.)
plt.show()
Resulting graph of AU1 and AU30 numbers

Resulting graph of AU1 and AU30 numbers

Ok, so now that graph #1 is done, I had to graph the top 15 users over the past week and their usage patterns. First off, here's an example of the data I was working with:

User,Date,Request Count
jsmith,2016-06-20,12
kthrace,2016-06-20,1
shastings,2016-06-20,11
sbristow,2016-06-20,3
jmccoy,2016-06-20,3
akoni,2016-06-20,9
gmorrison,2016-06-20,4
pfisher,2016-06-20,18
ndrake,2016-06-20,10
lbriscoe,2016-06-20,7
egreen,2016-06-20,13
crubirosa,2016-06-20,20
avanburen,2016-06-20,2
mlogan,2016-06-20,18
ckincaid,2016-06-20,11
rcurtis,2016-06-20,21
jfontana,2016-06-20,16
clupo,2016-06-20,5
kbernard,2016-06-20,7

Obviously, with our actual prod data, there were thousands of users... so a few more lines to loop through. The first problem was to put the data into a format I could use. Since even a top user might not use the system at all one day (say a Sunday), I couldn't use a simple dictionary; this time I had to utilize defaultdict. Defaultdict enabled me to create a dictionary of users where the value was (by default) an array of 7 zeros (representing usage for the past 7 days). After that, I was able to loop through the file for each day. To get the file names, I had to start with yesterday's date and go backwards. The date still gets appended to the 'dates' array, but the big change is in users: instead of appending the data to an array, I insert it into the index that matches that day.

So now that I have a dictionary of dates and users, I have all that I need to determine the top 15 users of the week. I create another dictionary that has the users as keys and sums up their total requests from the array and sets that as the value. Once I do that, I sort it, end up with a tuple, reverse it, then slice off the top 15. At that point, I just need to loop through my weekly_active_users list and then plot each user's data! Though I did have one, final (much smaller) problem: I had to find 15 matplotlib colors that I could use and distinguish. I created my array of colors and added a counter to each loop so I could add a unique color to each user. Success!

import matplotlib.pyplot as plt
import csv
from datetime import datetime, timedelta, date
from collections import defaultdict
import operator
from itertools import islice
import re

active_users = {
  'jira': {
    'dates': [],
    'users': defaultdict(lambda: [0] * 7)
  },
  'confluence': {
    'dates': [],
    'users': defaultdict(lambda: [0] * 7)
  }
}

active_users_weekly = {
  'jira': {},
  'confluence': {}
}
current_date = date.today()
day = timedelta(days=1)

for i in range(0,7):
  current_date = current_date - day
  current_date_txt = current_date.strftime('%Y-%m-%d')
  for system in ['jira', 'confluence']:
    active_users[system]['dates'].append(current_date)
    with open("active_users_{0}_{1}.csv".format(system,current_date_txt)) as csvfile:
      users = csv.reader(csvfile)
      for user, log_date, request_count in users:
        active_users[system]['users'][user][i] = int(request_count)

sorted_active = {'jira': {}, 'confluence': {}}
for system in ['jira', 'confluence']:
  for user, request_list in active_users[system]['users'].items():
    active_users_weekly[system][user] = sum(request_list)
  sorted_active[system] = sorted(active_users_weekly[system].items(), key=operator.itemgetter(1))
  sorted_active[system].reverse()
  sorted_active[system] = list(islice(sorted_active[system],15))

fig = plt.figure()
jira = fig.add_subplot(211)
jira.set_title('JIRA')
conf = fig.add_subplot(212)
conf.set_title('Confluence')
colors = ['red', 'gold', 'darkorange', 'green', 'turquoise', 'dodgerblue', 'navy', 'darkviolet', 'violet', 'pink', 'darkslategrey', 'silver', 'blue', 'lime', 'orange']

i = 0
for user, request_num in sorted_active['jira']:
  jira.plot_date(active_users['jira']['dates'], active_users['jira']['users'][user], label=user, fmt='-', color=colors[i])
  i += 1

i = 0
for user, request_num in sorted_active['confluence']:
  conf.plot_date(active_users['confluence']['dates'], active_users['confluence']['users'][user], label=user, fmt='-', color=colors[i])
  i += 1

jira.legend(bbox_to_anchor=(0., 1.1, 1., .102), loc='lower center',
           ncol=8, borderaxespad=0.)
conf.legend(loc='upper center', bbox_to_anchor=(0.5,-0.1), ncol=8)
plt.show()
Oooh pretty colors! Also, fake data makes for a bad graph.

Oooh pretty colors! Also, fake data makes for a bad graph.

Getting Started As A Developer

Through my work with PyLadies Boston, I have been asked quite a few times on how to get started with development. I'm going to try to write it all down here.

So you want to become a software developer?

Awesome! It's a pretty fun (albeit sometimes frustrating) gig and the pay is pretty decent too. Just be patient... it's not super easy and sometimes it'll get difficult. It's worth it though, so stick with it.

Step 1: Pick a language

Don't spend too long on this step! I would recommend either Python or Ruby as good beginner languages. The syntax is relatively similar to English, so it's not too hard to read code from early on. Also, these are two languages that are widely used at actual companies! Ruby is a fan favorite of startups and Python has a huge following in the scientific/academic communities. If you want to further progress into web development, I would recommend Ruby because, in my opinion, I think the documentation and tutorials available for Rails are much better (and in some cases easier to understand) than the docs/tutorials for Django.

Either way: don't think too hard about it. You just need to pick one. Once you learn one, you can always, much more easily, learn another.

Step 2: Pick a method

There are a load of resources out there. One I recommend is Zed Shaw's Learn Code the Hard Way (for Ruby, Python, SQL, and C). There's also How To Think Like A Computer Scientist (for Python), along with plenty of others. If you prefer a book, I can recommend both Dietel's How To Program (Java) and Pine's Learn To Program (Ruby, also a web tutorial!). The world of programming books/tutorials is  your oyster! Just pick a learning style that you like and stick with it. If videos are your thing, Codeschool has excellent video tutorials.

What I do not recommend: while Codecademy can be good for trying to decide what language to use, I do not recommend it for learning. Codecademy is software (what you will be building) and software has bugs. What you don't want to be spending time on is trying to figure out if the bug is yours or Codecademy's. If you think that sounds crazy, I have had Python code that I've run locally with no errors that gets a random error on Codecademy. Plus, one of the most difficult parts is installation and setup. You miss that with Codecademy. If this is your tool of choice, you have been warned.

Step 3: Give it some time

Try to dedicate some amount of time every day. 10 minutes when you first get in to work? 30 minutes when you get home? Doesn't matter. The more time you can dedicate, the faster you will progress, but the important thing is to make it a habit so you stick with it. Most of these resources have forums that you can utilize if you run into problems. If they don't, then you can also use StackOverflow. If you google for your error message, you will probably get a result on StackOverflow. Check it out and see if you can fix your bug. Once you get past the basics, give yourself a challenge by trying some exercism.io problems. They have problems for almost all languages and your submissions will actually get code reviewed!

Step 4: Level up!

You have a solid foundation! Time to take it to next level! And by that I mean web development. Is that the only route you can go? Nope! But I'm a web developer, so that's what I actually have experience on. Also, I have the most experience in Python and Ruby, so those are the languages that I'll have the most links for. If anyone has some next level topics for non-web developers, put it in the comments! Or link to your own post. Depending on what you started with, here are some resources:

Ruby:

  • Michael Hartl's Rails Tutorial - This is the best Rails tutorial out there. I'd almost argue that it's the best web dev tutorial out of any language.
  • CodeSchool's Rails For Zombies - If you prefer videos, Rails For Zombies is corny, but pretty great. And the first course is free!
  • Sinatra - a microframework for Ruby. If you really want to dig in and try to learn how things work, using a microframework that doesn't enable all the bells and whistles by default is awesome. 

Python:

  • Tracy Osborn's Hello Web App - Awesome book series made to teach non-programmers web development through Django
  • Getting Started With Django - Short video series. Starts you after the official Django tutorial
  • Django Book - The official Django tutorial. I'm hoping it's been updated since I tried to go through it because it was a bit buggy then.
  • Flask - a microframework for Python. Also see this tutorial
  • Lynn Root's NewCoder.io - Not web dev, but definitely a level up. Lynn has written tutorials on APIs, web scraping, data visualization, GUIs, and networks. These are great if one of these topics is of interest to you.
  • Daniel and Audrey Roy Greenfield's Two Scoops of Django - this is not really a beginner book. More an "after your first app" book. But this is one of the best programming books I have ever read, so I absolutely had to add it to this list.

Java:

  • Play Framework - As far as I can tell, this is the most popular web framework for Java. Their own documentation contains a solid amount of good tutorials to get you up and running fast.

Step 5: Build something!

This is absolutely the hardest step. Why? Because it requires you to actually be a little imaginative and think of something that you want to create. To start, you can create a website (either a personal site or a landing page for your project) on Github Pages. It's free and super easy to get started! As far as picking a project, there are shortcuts if your brain is a bit fried and you can't think of anything. There are lists of coding projects that you can pick from. You can also contribute to open source. Whatever you choose, the important thing is to keep working at it. Even senior developers are still constantly improving their skills, so you will constantly be learning at all stages of your career.

Calculating Age... in Java 8

I'm doing a bit more Java now that I'm taking a Java class. With that is coming a lot of "oh that should be easy... wait, there's not a really simple way to accomplish this???". First example of this: determining someone's age.

import java.time.LocalDate;

public class Age {
  private int birthYear, birthMonth, birthDay, age;

  public Age(int birthYear, int birthMonth, int birthDay){
    this.birthYear = birthYear;
    this.birthMonth = birthMonth;
    this.birthDay = birthDay;
    this.age = getAge(birthYear, birthMonth, birthDay);
  }

  public int getAge(int birthYear, int birthMonth, int birthDay){
    LocalDate fullBirthday = LocalDate.of(birthYear, birthMonth, birthDay);
    LocalDate now = LocalDate.now();
    long daysSinceBirth = now.toEpochDay() - fullBirthday.toEpochDay();
    return (int) (daysSinceBirth/365);
  }
}

LocalDate is new to Java 8. Previously it was part of the Joda-Time API, but the Java folks seem to have added the bulk of the functionality directly into Java. Sweet! What does this allow us to do? LocalDate creates an object that represents a date and has quite a verbose API. Since we're calculating someone's age, we are going to need an object that represents their birthday and an object that represents today (in this case fullBirthday and now). If we convert these both to Epoch Days, which is generally just the number of days from 1970-01-01, we can just compare the number of days and divide by 365 to get the age. Not too hard... but did take a second to come up with it... I was a bit surprised that it seemed like I couldn't actually subtract dates. Ruby has spoiled me...

Update: In the comments below, Ted Vinke alerted me to another, even easier way to calculated it with the Period. This solution still uses LocalDate, but takes less math on our part. Thanks for the improvement, Ted!

import java.time.Period;
import java.time.LocalDate;

public class Age {
  public int getAge(int birthYear, int birthMonth, int birthDay){
    LocalDate fullBirthday = LocalDate.of(birthYear, birthMonth, birthDay);
    LocalDate now = LocalDate.now();
    int period = Period.between(fullBirthday, now).getYears();
    return period;
  }
}