My attempt at the 7-Segment Display problem from Tom Scott's 'The Basics' series. See the video here.

In [1]:
# Load in modules
from itertools import compress
import matplotlib.pyplot as plt
import numpy as np
import re
In [2]:
# Read the dictionary (from https://github.com/dwyl/english-words)
with open('words_alpha.txt') as f: # Close the file once it has been read
    words = [line.rstrip('\n') for line in f]
In [3]:
# Out of interest, how many words does this dictionary have?
len(words)
Out[3]:
370103
In [4]:
# What is the distribution of word lengths?
lengths = np.array([len(i) for i in words])
In [5]:
plt.hist(lengths, bins=30)
plt.title('Distribution of English Word Length')
plt.xlabel('Length')
plt.ylabel('Frequency')
Out[5]:
Text(0, 0.5, 'Frequency')

7-Segment Display Question

What is/are the longest English word(s) that can be written on a standard 7-segment display?

This largely comes down to deciding which letters can justifiably be rendered on a 7-segment display. The tricky letters are those that contain diagonal lines, such as 'W', 'M' and 'Z'. I think Tom Scott's list of bad letters (gkmqvwxz(io)) is nearly perfect, but I would allow 'G', and I would allow 'I' and 'O' as well. I'll evaluate a few combinations of bad letters and see what happens.
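As a minimal sketch of the idea on a toy word list (the function name `longest_displayable` and the list `toy_words` are made up for illustration and are not part of the notebook), a word counts as displayable if it contains none of the excluded letters, and every word that ties for the longest length is kept:

```python
import re

def longest_displayable(words, excluded):
    """Return all of the longest words that contain none of the excluded letters.

    Assumes `excluded` is a non-empty string of letters, e.g. 'kmqvwxz'.
    """
    # Build a character class such as [kmqvwxz] from the excluded letters
    pattern = re.compile('[{}]'.format(re.escape(excluded)))
    allowed = [w for w in words if not pattern.search(w)]
    if not allowed:
        return []
    longest = max(len(w) for w in allowed)
    # Keep every word that ties for the longest length
    return [w for w in allowed if len(w) == longest]

# Hypothetical toy data, standing in for the real dictionary
toy_words = ['sheep', 'zebra', 'goose', 'llama', 'kiwi']
print(longest_displayable(toy_words, 'kmqvwxz'))   # 'G' allowed
print(longest_displayable(toy_words, 'kmqvwxzg'))  # 'G' excluded
```

With 'G' allowed, 'sheep' and 'goose' tie at five letters; excluding 'G' leaves only 'sheep'.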

In [6]:
v1 = '[kmqvwxz]' # What I think is fine
v2 = '[kmqvwxzg]' # Tom's v1
v3 = '[kmqvwxzgio]' # Tom's v2
v4 = '[kmqvwxzgior]' # Just interested
v5 = '[kmqvwxzgiort]' # Demonstrating that the code can handle multiple results of the same length
In [19]:
for version in [v1, v2, v3, v4, v5]: # Loop through the different versions of excluded letters
    pattern = re.compile(version)

    res = [] # Define a results list to append to
    for word in words: # Loop through all words, appending False if the word contains a bad letter
        if pattern.search(word):
            res.append(False)
        else:
            res.append(True)

    assert len(words) == len(res) # Check the boolean and word list match

    goodWords = np.array(list(compress(words, res))) # Compress the word list by results
    
    goodLengths = np.array([len(i) for i in goodWords]) # Evaluate all lengths

    longest = np.max(goodLengths) # Evaluate the longest length

    ix = np.where(goodLengths == longest) # Take all positions that equal the longest length
    
    # Print the results
    print("Longest Words: {}\n - Excluded Letters: {}\n - Length: {}".format(str(goodWords.take(ix))[2:-2],
                                                                            version[1:-1],
                                                                            str(longest)))
Longest Words: 'dichlorodiphenyltrichloroethane'
 - Excluded Letters: kmqvwxz
 - Length: 31
Longest Words: 'dichlorodiphenyltrichloroethane'
 - Excluded Letters: kmqvwxzg
 - Length: 31
Longest Words: 'supertranscendentness'
 - Excluded Letters: kmqvwxzgio
 - Length: 21
Longest Words: 'phenylacetaldehyde'
 - Excluded Letters: kmqvwxzgior
 - Length: 18
Longest Words: 'unappealableness' 'unappeasableness' 'unascendableness'
  'unassessableness' 'uncalculableness' 'undefendableness'
  'undependableness' 'unsuccessfulness'
 - Excluded Letters: kmqvwxzgiort
 - Length: 16

This was an interesting problem all round. Thanks for sharing, Tom.

Criticism and discussion of the methods I've used to answer this problem are entirely welcome. Also, if you know of any other cool coding problems, feel free to send them my way.

Happy Coding,

Sean

Follow-up

After I finished my attempt, a friend found a dense one-liner that replaces both the 'word in words' for loop and the compression step. Really good code golf skills here!

In [30]:
# Note: {*v5} also contains '[' and ']', which should be harmless as the word list is letters only
goodWords_followup = np.array(list(filter(lambda word: not any({*word} & {*v5}), words)))
In [32]:
np.array_equal(goodWords, goodWords_followup) # Check the results are the same
Out[32]:
True
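
For readers who find the one-liner hard to parse, here is a rough sketch of the same set-based test written out in full. It uses `set.isdisjoint` in place of the `&` intersection, and a plain string of excluded letters rather than the regex string. The names `displayable`, `v5_letters` and `toy_words` are my own illustrations, not part of the notebook above.

```python
def displayable(word, excluded_letters):
    """True if the word shares no letters with the excluded set."""
    return set(excluded_letters).isdisjoint(word)

# Same letters as v5, but without the surrounding regex brackets
v5_letters = 'kmqvwxzgiort'

# Hypothetical toy data, standing in for the real dictionary
toy_words = ['unsuccessfulness', 'sheep', 'goose', 'zebra', 'table']
good = [w for w in toy_words if displayable(w, v5_letters)]
print(good)
```

Only 'unsuccessfulness' and 'sheep' survive here: 'goose' contains 'g' and 'o', 'zebra' contains 'z' and 'r', and 'table' contains 't'.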