Last September, I attended the O’Reilly Artificial Intelligence conference. Despite my formal background in the field, I found many of the talks unapproachable. Technical presentations commonly fall victim to dense slide decks loaded with obtuse jargon and incomprehensible descriptions. It struck me that “Artificial Intelligence” has become a buzzword drained of clear meaning. For these reasons, this post attempts to do three things:

  1. Clarify what AI is
  2. Illustrate how AI is relevant to my professional interests on the GitHub Semantic Code team
  3. Summarize highlights from the O’Reilly AI conference

What is AI?

Despite drawing industry-wide sensationalism, few can speak meaningfully about artificial intelligence. Earnings calls increasingly reference AI, and companies that name-drop the jargon du jour are more likely to receive funding when they do. Before becoming an overused term sweeping the tech sector, however, pop culture presented AI through science fiction fantasies of sentient droids going rogue and destroying humanity. While that may seem unlikely in the present day, it doesn’t seem far-fetched that we’ll all have self-driving cars, and our children will have personalized robotic tutors. The ideas underlying AI were first proposed by Alan Turing nearly 70 years ago in his paper, Computing Machinery and Intelligence, which introduced the Imitation Game, now popularly known as the Turing Test. He suggested machines could become capable of performing tasks better than humans. The first computer learning program came two years later, built by Arthur Samuel. This was followed by the first neural network, Frank Rosenblatt’s perceptron, designed in 1957. Deep learning did not appear on the scene until much later, when Geoffrey Hinton and his collaborators popularized the term in 2006.


Artificial Intelligence is sufficiently mature that we see a path for it to transform nearly every industry. It is an evolution in computing that is orders of magnitude more massive than mobile. Its applications span from microbial design to risk assessment and cybersecurity. Yet despite all this hype, AI’s principal economic value today is created via supervised learning.

However, the boundaries of what we consider “AI” seem to shift continually. There was a point when things we take for granted, such as spam filters and autocorrect, were regarded as “cutting-edge” AI. In this way, definitions become difficult because Artificial Intelligence always seems like a step in the future. This moving target makes the overall term diluted and meaningless unless it’s backed with more rigor.

Differences between Artificial Intelligence, Machine Learning, and Deep Learning

Since these terms are often conflated, it’s worth building some vocabulary to differentiate them. An easy way to distinguish them is by looking at what motivates each discipline.

  • Artificial intelligence is motivated by wanting a machine to be capable of human problem-solving, such as performing tedious tasks.

  • Machine Learning is a sub-field within AI. Instead of programming the computer to understand what to do, Machine Learning is motivated by the computer’s ability to teach itself how to do something (given decent structure and appropriate examples to learn from).

  • Deep learning is a sub-field of ML, motivated by wanting the computer to create its own structure and interpret the data by itself.


Artificial Intelligence is a tool that makes computers better at doing human things. Machine Learning is a way to build that tool. Deep Learning is a form of Machine Learning capable of accomplishing Artificial Intelligence.
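To make that distinction concrete, here is a minimal, hypothetical sketch in Python contrasting a hand-coded rule with a model that learns its own decision boundary from labeled examples. The spam features, thresholds, and data are invented purely for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Traditional programming: a human encodes the rule explicitly.
def is_spam_rule(num_links: int, num_exclamations: int) -> bool:
    return num_links > 3 and num_exclamations > 5  # hand-picked thresholds

# Machine learning: the computer infers the rule from labeled examples.
# Features: [num_links, num_exclamations]; labels: 1 = spam, 0 = not spam.
X = [[5, 8], [7, 6], [1, 0], [0, 1], [6, 9], [2, 1]]
y = [1, 1, 0, 0, 1, 0]

model = LogisticRegression().fit(X, y)
print(model.predict([[4, 7]]))  # learned decision, no hand-written threshold
```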

What about Neural Nets?

Artificial Neural Networks are a class of trainable machine learning algorithms. They are not necessarily modeled after, but rather inspired by, the structure of the human brain. The computational unit of a neural network is an artificial neuron. The network describes how these units connect. A hierarchical model of neural networks involves layers of neurons that each represent one level in the hierarchy.
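As a rough illustration of that computational unit, here is a minimal artificial neuron in Python: a weighted sum of inputs passed through a nonlinear activation. The weights, bias, and inputs are arbitrary placeholders.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # An artificial neuron: weighted sum of inputs plus a bias,
    # passed through a nonlinear activation function.
    return sigmoid(np.dot(inputs, weights) + bias)

print(neuron(np.array([0.5, 0.3]), np.array([0.8, -0.2]), bias=0.1))
```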


Feed-forward ANNs allow signals to travel one way only: from input to output, without any feedback.

Recurrent ANNs have signals traveling in both directions, as computations derived from the initial input are cycled back into the network.


An excellent place to start reading up on the topic is a typical ANN architecture: the multi-layer perceptron (MLP) neural network with a sigmoid activation function.
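For a sense of what that looks like in code, here is a small, self-contained sketch of an MLP forward pass with sigmoid activations. The layer sizes and random weights are arbitrary, and a real implementation would also include backpropagation to train the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A small MLP: 4 inputs -> 8 hidden units -> 3 outputs.
layer_sizes = [4, 8, 3]
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    # Feed-forward: each layer's output becomes the next layer's input,
    # with no feedback connections.
    activation = x
    for W, b in zip(weights, biases):
        activation = sigmoid(activation @ W + b)
    return activation

print(forward(rng.normal(size=4)))
```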

How do Neural Nets relate to “Deep Learning”?

The quick explanation is that people kept adding layers to neural nets, which resulted in Deep Neural Networks (DNNs). Deep Learning involves constructing machine learning models that learn a hierarchical representation of the data. Adding many layers enabled the construction of the hierarchical representations deep learning requires. Neural networks therefore became the standard way to realize Deep Learning.

If AI is so old, why is it a big deal now?

Although ideas foundational to AI have existed since the ’50s, the recent boom is due to a confluence of factors. These factors include better hardware and GPU acceleration, larger training data sets, algorithmic developments, and the creation of services and infrastructure that encourage adoption amongst developers.

Graphical Processing Units (GPUs) are specialized electronic circuits that have dramatically reduced the time needed to train the neural networks used for deep learning. In addition to fast computers, we now have enough data to train large neural networks. And while neural networks have been around since 1957, algorithmic advancements in deep learning have enabled significant progress. Cloud-based infrastructure (offered by Google, Amazon, IBM, and Microsoft) has also reduced development costs.

Open source, entrepreneurship, and public interest

Greater VC investment has catalyzed the AI adoption curve. Additionally, the growth of open-source initiatives led by Google, Facebook, and OpenAI has been critical to the recent explosion. Not only have these tools driven breakthroughs, but open-sourcing has proven strategically advantageous, particularly in the case of TensorFlow. TensorFlow leverages collective expertise from around the globe instead of confining its talent pool to Googlers alone. By defining standards and encouraging massive adoption, Google has positioned TensorFlow to become the default machine intelligence engine.

How AI is relevant to my work at GitHub

As an engineer on GitHub’s Semantic Code team, my work serves the broader, more audacious goal of making software development easier, more powerful, and more accessible. GitHub has one of the largest and richest software data sets in the world. We can use data about source code, its underlying structures, semantics, comments, diffs, and other abstractions to understand the meaning of code—its intent, capabilities, and quality. Through such analyses, we can build tooling that offers greater insight for developers. Our team employs program analysis techniques and leans heavily on research in the fields of programming language theory (PLT), parsing, and syntax tree diffing.

Our work is in service first and foremost to our users, but it’s also exciting to think of it more broadly as an infinitesimal advancement in computing. While we mostly deal with the worlds of applied programming language theory, program analysis, and formal methods, glimpsing progress in a neighboring sub-discipline of computer science, AI, opens up possibilities. Since many innovations happen at the intersections of fields, it’s inspiring to imagine how artificial intelligence, applied to program analysis and applied PLT, will alter how technology is created and consumed.

Conference highlights

Now that we’ve established some terminology, we’re well-situated to discuss cutting-edge activities in the field. To this end, I’ve provided a brief synopsis of my favorite talks from the O’Reilly AI Conference below.

Cylance: Incident Response and Cyberthreats

AI and ML can help automate the security function in many ways. Cylance uses predictive analysis to automatically distinguish benign files from malicious ones based on mathematical risk factors. This good/bad classification enables machines to be trained to react autonomously to in-progress cyberthreats in real time. They do this by applying K-means clustering to group files into K known groups based on their attributes and features, and by calculating the distances between files.
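I don’t have access to Cylance’s actual pipeline, but a minimal sketch of the general idea, clustering files by numeric feature vectors with K-means and measuring each file’s distance to the cluster centers, might look like this. The feature names and values below are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-file feature vectors, e.g. [entropy, imported-API count,
# packed-section ratio]. Real systems extract hundreds of such features.
files = np.array([
    [7.9, 120, 0.9],   # suspicious-looking
    [7.8, 110, 0.8],
    [4.2,  35, 0.1],   # benign-looking
    [3.9,  40, 0.0],
    [4.1,  30, 0.1],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(files)
print(kmeans.labels_)           # cluster assignment per file
print(kmeans.transform(files))  # distance of each file to each cluster center
```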

GliaLab: Interview with 18-year-old CTO, Abu Qadar

We live in a time when 18-year-old founder/CTOs and self-taught machine learning engineers are possible. Abu recently began his studies at Cornell on a scholarship. Two years ago, he built a system that could detect anomalies within mammograms, which are hard to analyze given a very high false-positive rate. By combining applied pattern recognition and machine learning, he built a network architecture that performed more efficient, accurate, and cheaper analysis than radiologists alone can.

Some problems he ran into included the limited availability of image datasets. Additionally, because his domain was healthcare, accuracy isn’t the sole metric to examine: you also have to look at sensitivity, specificity, and how tight the bounding boxes are. A 1% increase or decrease in sensitivity or specificity is enough to impact thousands of people negatively. He elaborates on such challenges in his TED Talk, “How I searched my way to a cure”.
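For reference, sensitivity and specificity are computed from the confusion matrix rather than from overall accuracy. A quick sketch with invented labels and predictions:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = malignancy present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # true positive rate: cancers actually caught
specificity = tn / (tn + fp)  # true negative rate: healthy scans correctly cleared
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```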

Abu was 15 when he started, admitting that’s “a bit young to understand the mathematics and theory behind this work.” He started tinkering around the same time MOOCs were becoming popular. He took Andrew Ng’s famous ML course, saying that “GitHub and the documentation people were building out was so motivating. Going through other people’s codebases was helpful.”

My transcription of an inspiring portion of his talk:

“I didn’t go into ML/AI because I was interested in the mechanics or how can I learn this, but instead, how can I apply it? How can I use it positively to help a lot of people? With AI, you can branch out and apply it to healthcare and social justice, to prevent terrorism. I know this technology exists, but how can I use this. We need to have a personal connection to the problem we’re solving.

Making sure this technology works, and when we release it and push it out to hospitals, I want to ensure it will work. This involves building partnerships with universities and hospitals. Ensuring we built it, but also making sure it is being used. I think there are significant issues in the justice system we have. I’m motivated to analyze it all from a technological system.”

Cardiogram, Freenome, Cytokinetics: healthcare companies innovating in drug discovery and biology

Biology is a great application domain for AI because of its complexity. Doctors alone can’t look at a genome and determine whether you have cancer. Applying convolutional neural nets to genomics, however, has the potential to enable early cancer detection and the development of technologies that support geneticists and molecular biologists in creating new therapies.
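The talks didn’t share code, but as a hedged sketch of the general approach, a small 1-D convolutional network over one-hot-encoded DNA sequences (A, C, G, T) might look like the following in Keras. The architecture, sequence length, and random labels here are purely illustrative and not drawn from any of the presenters’ systems.

```python
import numpy as np
import tensorflow as tf

# Toy setup: classify one-hot-encoded DNA windows as "signal of interest" or not.
SEQ_LEN, N_BASES = 1000, 4
x = np.random.rand(256, SEQ_LEN, N_BASES).astype("float32")
y = np.random.randint(0, 2, size=(256,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, N_BASES)),
    tf.keras.layers.Conv1D(32, kernel_size=12, activation="relu"),  # motif-like filters
    tf.keras.layers.MaxPooling1D(pool_size=4),
    tf.keras.layers.Conv1D(64, kernel_size=8, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=2, batch_size=32, verbose=0)
```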

Moore’s Law reflects the exponentially declining cost of computing. Yet despite falling compute and storage costs, drug prices are rising exponentially. Drugs are available, and often necessary, for conditions like depression or PTSD; however, one presenter pointed out that several diseases, such as type-2 diabetes, also have a behavioral component. His proposal focused on using ML to scale and improve therapeutics (treatment without drugs).

Data is expected to be the deciding factor in making significant dents in the healthcare space. As mobile phones and watches become miniature doctor’s offices in your pocket, tracking steps, heart rate, and more, all of this data will become cheaper to collect.

Here are some companies in the space:

  • Cardiogram is a company that analyzes data from wearables.
  • Freenome is an example of a health technology company that develops non-invasive screenings to detect cancer early. This is hugely valuable, given that over 80% of breast/ovarian/prostate/lung cancer deaths are preventable if caught early.
  • Cytokinetics is a biopharmaceutical company using computer vision techniques to predictively model how a molecule can change as it reacts to different drugs being developed.

Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US

Timnit Gebru’s research demonstrates a new era of visual computational sociology. She used computer vision techniques on images from Google Street View to estimate the demographic composition of the US. Why Google Street View? Because almost every address in the country is associated with a GSV image.

She applied object recognition techniques to infer race, wealth, how green each state is, which cars correlate with Democratic vs. Republican leanings, and more. She did this by classifying the vehicles in Street View images from 200 American cities. Why cars? Cars are visible, they are not easily deformable, and from a computer vision standpoint they are rigid objects.

So Gebru’s team built an extensive data set of cars; it took one year and $35K to gather. The study examined 88 car attributes across different zip codes. She used simple linear regression with L2 regularization, training on 12% of all US zip codes and testing on the rest (a minimal sketch of this setup follows the list of findings below). Of course, none of this is causal; only correlations can be drawn between demographic data and the features examined (for example, percentage of sedans, number of cars per image, miles per gallon, etc.). I found the following findings most interesting:

  • Which cars are correlated with Obama supporters? Neighborhoods with more sedans tended to vote Democratic.

  • Which cars are correlated with Black neighborhoods? Cadillacs. Incidentally, Cadillac was one of the first companies that gave out loans to African Americans and directly advertised to them.

  • Which cars are correlated with White neighborhoods? Subarus. The presenter did not discuss this finding further.

  • How green is each state? Car efficiency (miles per gallon) is associated with environmental friendliness. Burlington, Vermont was the most efficient city in the analysis, and it happens to source 100% of its electricity from renewables.

  • Which cities are the most segregated? Here, segregation means income segregation, using car price as a proxy for income. The team used Moran’s I statistic, a measure of spatial correlation widely used in segregation research. Chicago was the most segregated; Jacksonville was not, showing no large clusters of expensive cars.

  • Can we predict income? Aggregate everything about the cars in a particular zip code, then map from that to some demographic attribute, in this case income. (“Predict” is not a word sociologists like, because it implies inferring something about the future.)
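As referenced above, here is a minimal sketch of the modeling step: ridge regression, that is, linear regression with L2 regularization, mapping per-zip-code car features to a demographic target. The feature names and synthetic data are invented stand-ins for the study’s real inputs.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-zip-code features: [% sedans, cars per image, avg. MPG].
X = rng.random((500, 3))
income = 30_000 + 40_000 * X[:, 2] + rng.normal(0, 5_000, size=500)  # synthetic target

# Mirror the study's split in spirit: train on a small fraction, test on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    X, income, train_size=0.12, random_state=0)

model = Ridge(alpha=1.0).fit(X_train, y_train)   # L2-regularized linear regression
print(model.coef_, model.score(X_test, y_test))  # coefficients and R^2 on held-out zips
```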

In a similar study on “City Forensics”, computer vision researchers used deep learning models to infer neighborhood safety. The Stanford AI Lab also published work on combining satellite imagery and machine learning to predict poverty.

US census data is often used to predict the winner of the next presidential election (although forecasters got it wrong this time, unfortunately). But instead of using census data, can we do this via computer vision? Can we predict census data before it becomes available? These are the types of questions Timnit Gebru is trying to answer.

Tying it back

Over the past few decades, software development has become ubiquitous. Standards and best practices continue to evolve. There is enormous work left to be done in eliminating ambiguity and applying a more rigorous, empirical, and data-driven understanding of code, as well as the humans that write it and are impacted by it. A deeper understanding could drive the creation of tools that allow every developer to engage with technology at a higher level of sophistication. As auxiliary disciplines in computer science flourish, more avenues for interdisciplinary innovation become available. We have the potential to fashion tools that accelerate development, thereby accelerating the pace at which problems are solved through software.