
Which programming skills do employers want in 2020 for six-figure salaries?

In the middle of the Coronavirus pandemic, I’ve found myself needing to find a job. That can be a pain at the best of times, and although there are definitely a few good companies still growing and hiring, what are they actually looking for? Time to look at the job sites and find out which programming skills employers want.

To answer that question, I put together a quick web scraper to pull data down from a few job sites and work out which tech stacks are most in demand.

The strategy – find which programming skills employers want

First, I broke the problem down into a few separate tasks:

  • Use a web scraper to pull data from a website.
  • Identify the relevant data and discard anything that’s irrelevant.
  • Aggregate the technologies together and print the results.
  • Filter by salaries to identify high-value skill sets.

Simple!
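The last two steps, aggregating technologies and filtering by salary, could be sketched as follows. This is a minimal sketch assuming each scraped posting has already been reduced to a title, an optional salary and a list of detected skills; the `JobPosting` record and `SkillStats.CountSkills` method are hypothetical names, not part of the original scraper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one scraped posting; the real scraper's types are not shown.
public record JobPosting(string Title, decimal? Salary, IReadOnlyList<string> Skills);

public static class SkillStats
{
    // Counts how many postings at or above minSalary mention each skill.
    // Postings with no advertised salary are excluded rather than guessed,
    // and each skill is counted at most once per posting.
    public static Dictionary<string, int> CountSkills(
        IEnumerable<JobPosting> postings, decimal minSalary)
    {
        return postings
            .Where(p => p.Salary.HasValue && p.Salary.Value >= minSalary)
            .SelectMany(p => p.Skills.Distinct(StringComparer.OrdinalIgnoreCase))
            .GroupBy(s => s, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }
}
```

Counting each skill once per posting keeps a single description that repeats “C#” five times from outweighing five separate jobs that each mention it once.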

The scraper

Next, we need a library to handle the HTML parsing so things stay as simple as possible. Luckily, AngleSharp, a .NET library for parsing HTML, makes pulling data out of the DOM very simple. It lets you select the HTML elements that hold the job description, and once we have those, we can analyse the content fully.

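using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using AngleSharp;

// Reuse one HttpClient; creating a new one per request can exhaust sockets.
private static readonly HttpClient httpClient = new HttpClient();

public async Task<AngleSharp.Dom.IDocument> GetDocument(string url, CancellationToken cancellationToken = default)
{
    // Download the page, honouring cancellation and failing fast on HTTP errors.
    HttpResponseMessage response = await httpClient.GetAsync(url, cancellationToken);
    response.EnsureSuccessStatusCode();
    var content = await response.Content.ReadAsStreamAsync();
    // Let AngleSharp parse the HTML stream into a queryable DOM.
    var context = BrowsingContext.New();
    return await context.OpenAsync(req => req.Content(content), cancellationToken);
}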

The content

The code above gives us the DOM for a given URL, which is fantastic… but very bloated. The next step is to strip it down to the parts we actually want.

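// Fetch the page of search results that starts at startingRecord.
var document = await GetDocument(base.Configuration.GetPagedUrl(startingRecord));
// Keep only the elements whose class list includes the job-title class.
var list = document.All.Where(q => q.ClassList.Contains(JOB_TITLE_CLASS));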

This filters the HTML down to the divs carrying the job-title class, which on indeed.com is “jobtitle”. So we’ve successfully gone to a job site, pulled down all the relevant job listings, and have the links to them. The next step is to repeat the scraping for each link to access the individual job descriptions; I’ll skip over that, because it’s effectively more of the same.
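Following those links means turning the `href` values on each results page into absolute URLs, and moving through the results one page at a time. A minimal sketch of both using only `Uri` and `UriBuilder`; the `JobLinks` class is hypothetical, and the `start` query parameter is an assumption about how the job site pages its results, not necessarily what the `GetPagedUrl` call above does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JobLinks
{
    // Builds the URL for one page of results. ASSUMPTION: the site pages its
    // results with a "start" query parameter holding the first record's offset.
    public static string GetPagedUrl(string searchUrl, int startingRecord)
    {
        var builder = new UriBuilder(searchUrl);
        string extra = "start=" + startingRecord;
        // The Query getter includes a leading '?', so strip it before appending.
        builder.Query = string.IsNullOrEmpty(builder.Query)
            ? extra
            : builder.Query.TrimStart('?') + "&" + extra;
        return builder.Uri.ToString();
    }

    // Turns the (often relative) href values found on a results page into
    // absolute, de-duplicated URLs ready to be scraped individually.
    public static List<Uri> ResolveLinks(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        return hrefs
            .Where(h => !string.IsNullOrWhiteSpace(h))
            .Select(h => new Uri(baseUri, h))
            .Distinct()
            .ToList();
    }
}
```

De-duplicating matters because the same job often appears more than once on a page (for example as both a title link and a “sponsored” link), and scraping it twice would inflate the counts.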

Once we’ve pulled out all of the job descriptions, we’re left with a big block of text. Much of that text is useless for our purposes, so how do we identify the data that matters? I considered two methods:

  • Cross-reference against a list/database of programming languages.
  • Remove all words that exist in the dictionary, hopefully leaving us with words that are programming languages.

Both of these approaches are flawed. For the first, it was very hard to track down a reliable, consistent source of language names; the second could end up removing languages that are also real words, such as Go, Rust or Swift… argh!

In the end, I settled on removing dictionary words and hoping for the best!
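Before any dictionary lookup, the job text has to be split into words, and that step quietly matters: splitting on anything that isn’t a letter turns “C#” into “C” and “.NET” into “NET”. A minimal sketch, assuming a plain `HashSet<string>` stands in for the real dictionary; the `Tokenizer` class and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Keeps '#', '+' and '.' inside tokens so "C#", "C++" and ".NET" survive;
    // a naive split on non-letters would reduce them to "C" and "NET".
    private static readonly Regex TokenPattern =
        new Regex(@"\.?[A-Za-z][A-Za-z0-9#+.]*", RegexOptions.Compiled);

    public static IEnumerable<string> Tokenize(string text) =>
        TokenPattern.Matches(text)
            .Cast<Match>()
            // Drop full stops picked up at the end of a sentence.
            .Select(m => m.Value.TrimEnd('.'))
            .Where(t => t.Length > 0);

    // Removes every token the dictionary recognises; whatever is left is a
    // candidate technology name.
    public static List<string> RemoveDictionaryWords(string text, ISet<string> dictionary) =>
        Tokenize(text).Where(t => !dictionary.Contains(t)).ToList();
}
```

Run on “We use C#, C++ and .NET. Go is a plus.” with a dictionary containing “go”, this keeps “C#”, “C++” and “.NET” but drops “Go”, which is exactly the real-word problem described above.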

Dictionary cleansing

To clean up my dataset, I needed a reliable dictionary source, and settled on Hunspell because it’s simple to integrate and quick at checking words. Wrapping it in a simple adapter also makes it easy to swap out at a later date.

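// Adapter around Hunspell, so the rest of the code only depends on IDictionary
// and the dictionary source can be swapped out later.
public class EnglishDictionary : IDictionary
{
    private readonly HunspellDictionary _dictionary;

    public EnglishDictionary()
    {
        // Load the British English word list from disk.
        _dictionary = HunspellDictionary.FromFile(@"en_GB.dic");
    }

    // Returns true if the word is a known English word.
    public bool Check(String word)
    {
        return _dictionary.Check(word);
    }
}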
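Because the adapter hides Hunspell behind a single `Check` method, other implementations can be dropped in without touching the rest of the pipeline. The sketch below uses a hypothetical `IWordDictionary` interface that mirrors that `Check(word)` shape (named differently to avoid clashing with `System.Collections.IDictionary`), an in-memory dictionary suitable for unit tests, and a hypothetical decorator that combines both methods from earlier: an allow-list of technology names is treated as “unknown” even when the dictionary knows the word, so “Go” is no longer thrown away.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface mirroring the Check(word) method of the adapter above.
public interface IWordDictionary
{
    bool Check(string word);
}

// In-memory stand-in for the Hunspell-backed class, handy in unit tests.
public class InMemoryDictionary : IWordDictionary
{
    private readonly HashSet<string> _words;

    public InMemoryDictionary(IEnumerable<string> words) =>
        _words = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);

    public bool Check(string word) => _words.Contains(word);
}

// Decorator: words on the keep-list are reported as unknown even if the inner
// dictionary recognises them, so they survive the dictionary-removal step.
public class TechAwareDictionary : IWordDictionary
{
    private readonly IWordDictionary _inner;
    private readonly HashSet<string> _keep;

    public TechAwareDictionary(IWordDictionary inner, IEnumerable<string> keep)
    {
        _inner = inner;
        _keep = new HashSet<string>(keep, StringComparer.OrdinalIgnoreCase);
    }

    public bool Check(string word) => !_keep.Contains(word) && _inner.Check(word);
}
```

The keep-list only needs the handful of languages that collide with English words, which is far easier to maintain than a complete list of every programming language.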

The final stage involves iterating over the words, removing those that appear in the dictionary, and counting those that don’t.

public class WordAnalyser
{
    private readonly ISpellCheck _spellCheck;
    public WordAnalyser(ISpellCheck spellCheck) => _spellCheck = spellCheck;

    // Collects every word the spell checker doesn't recognise, across all scraped results.
    public List<String> GetUnknownWords(IEnumerable<IScraperResults> scraperResults)
    {
        List<String> totalResults = new List<String>();
        foreach (var res in scraperResults.Where(p => p.Value != null))
        {
            var invalids = _spellCheck.GetInvalidWords(res.Value);
            totalResults.AddRange(invalids);
        }
        return totalResults;
    }

    // Counts each distinct word and orders the counts from most to least frequent.
    public IOrderedEnumerable<WordFrequency> SortWordsByFrequency(IEnumerable<String> wordList) => wordList
        .GroupBy(p => p)
        .Select(g => new WordFrequency() { Keyword = g.Key, Occurrences = g.Count() })
        .OrderByDescending(g => g.Occurrences);
}

The snippet above shows how I accomplished this: the data scraped from each URL is passed into GetUnknownWords, and the resulting words are then grouped and sorted by frequency. This finally leaves us with a set of results… which we’ll discuss in the next post.
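One refinement worth considering: `GroupBy(p => p)` compares words case-sensitively, so “Python”, “python” and “PYTHON” end up as three separate entries. A minimal sketch of a case-insensitive count, with a deterministic tie-break; `KeywordCount` and `FrequencyCounter` are hypothetical names, with `KeywordCount` mirroring the `Keyword`/`Occurrences` properties used by `WordFrequency` above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type mirroring the Keyword/Occurrences pair used above.
public class KeywordCount
{
    public string Keyword { get; set; } = "";
    public int Occurrences { get; set; }
}

public static class FrequencyCounter
{
    // Groups case-insensitively (the first spelling seen becomes the key),
    // orders by count descending, then alphabetically to break ties.
    public static List<KeywordCount> Count(IEnumerable<string> words) => words
        .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
        .Select(g => new KeywordCount { Keyword = g.Key, Occurrences = g.Count() })
        .OrderByDescending(k => k.Occurrences)
        .ThenBy(k => k.Keyword, StringComparer.OrdinalIgnoreCase)
        .ToList();
}
```

For the input `Python, python, SQL, PYTHON, Azure, sql`, this yields Python (3), SQL (2) and Azure (1), rather than splitting Python and SQL across several rows.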

You can see what employers want here
