My attempt at the 7-Segment Display problem from Tom Scott's 'The Basics' series. See the video here.
# Load in modules
from itertools import compress
import matplotlib.pyplot as plt
import numpy as np
import re
# Read the dictionary (from https://github.com/dwyl/english-words)
with open('words_alpha.txt') as f:  # Use a context manager so the file is closed after reading
    words = [line.rstrip('\n') for line in f]
# Out of interest, how many words does this dictionary have?
len(words)
# What is the distribution of word lengths?
lengths = np.array([len(i) for i in words])
plt.hist(lengths, bins=30)
plt.title('Distribution of English Word Length')
plt.xlabel('Length')
plt.ylabel('Frequency')
What is/are the longest English word(s) that can be written on a standard 7-segment display?
This largely comes down to deciding which letters can justifiably be rendered on a 7-segment display. The tricky letters are those that contain diagonal lines, such as 'W', 'M' and 'Z'. Tom Scott's list of bad letters (gkmqvwxz(io)) is nearly perfect, but I think 'G' is fine and that 'I' and 'O' should be allowed too. I'll evaluate a few combinations of bad letters and see what happens.
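As a quick illustration of the idea, here is a minimal sketch of how a regex character class can flag words containing a bad letter. The word list `samples` and the names `excluded` and `displayable` are made up for this example and are not taken from the dictionary file.

```python
import re

# Hypothetical toy inputs, not taken from words_alpha.txt
excluded = re.compile('[kmqvwxz]')  # letters judged unrenderable on the display
samples = ['hello', 'wizard', 'bottle', 'zebra']

# A word is displayable when no excluded letter appears anywhere in it
displayable = [w for w in samples if excluded.search(w) is None]
print(displayable)  # ['hello', 'bottle']
```

Here 'wizard' and 'zebra' are dropped because they contain 'w' and 'z'.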
v1 = '[kmqvwxz]' # What I think is fine
v2 = '[kmqvwxzg]' # Tom's v1
v3 = '[kmqvwxzgio]' # Tom's v2
v4 = '[kmqvwxzgior]' # Just interested
v5 = '[kmqvwxzgiort]' # Demonstrating that the code can handle multiple results of the same length
for version in [v1, v2, v3, v4, v5]:  # Loop through the different versions of excluded letters
    pattern = re.compile(version)
    res = []  # Boolean mask: True where a word contains no excluded letters
    for word in words:  # Append False if the word contains an excluded letter, True otherwise
        if pattern.search(word):
            res.append(False)
        else:
            res.append(True)
    assert len(words) == len(res)  # Check the boolean and word lists match
    goodWords = np.array(list(compress(words, res)))  # Compress the word list by results
    goodLengths = np.array([len(i) for i in goodWords])  # Evaluate all lengths
    longest = np.max(goodLengths)  # Evaluate the longest length
    ix = np.where(goodLengths == longest)  # Take all positions that equal the longest length
    # Print the results
    print("Longest Words: {}\n - Excluded Letters: {}\n - Length: {}".format(str(goodWords.take(ix))[2:-2],
                                                                              version[1:-1],
                                                                              str(longest)))
This was an interesting problem all round, thanks for sharing Tom.
Criticism and discussion of the methods I've used to answer this problem are entirely welcome. Also, if you know of any other cool coding problems, feel free to send them my way.
Happy Coding,
Sean
After I finished my attempt, a friend found a dense one-liner to replace the 'word in words' for loop and compression step. Really good code golf skills here!
goodWords_followup = np.array(list(filter(lambda word: not any({*word} & {*v5}), words)))
np.array_equal(goodWords, goodWords_followup)  # Check the results are the same (goodWords still holds the v5 result from the last loop iteration)
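For anyone who wants to reuse the idea, the filtering and tie-handling steps above could also be wrapped in a small function. This is only a sketch in plain Python under my own assumptions: the function name `longest_displayable` and the list `toy_words` are hypothetical, and the excluded letters are passed as a bare string without the regex brackets.

```python
def longest_displayable(words, excluded):
    """Return (length, words) for the longest words that use no excluded letter."""
    banned = set(excluded)
    # Keep only words that share no letter with the banned set
    good = [w for w in words if banned.isdisjoint(w)]
    if not good:
        return 0, []  # Nothing survives the filter
    longest = max(len(w) for w in good)
    # Return every word tied for the longest length, in input order
    return longest, [w for w in good if len(w) == longest]


# Hypothetical toy word list, not the real dictionary
toy_words = ['bee', 'shell', 'label', 'maze', 'quiz']
length, winners = longest_displayable(toy_words, 'kmqvwxz')
print(length, winners)  # 5 ['shell', 'label']
```

Returning a list of winners keeps the ties visible, which is the same behaviour the v5 case demonstrates in the loop version.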