Search by word, phrase, character, or language

Check out “The Ultimate Game of Thrones Dataset” if you want to learn about other datasets in this series, and have a look at “32 Game of Thrones Data Visualizations” and “19 More Game of Thrones Data Visualizations” for a bunch of visualizations using those datasets.

Like many people, I’m always on the lookout for new data to work/play with. For the Game of Thrones datasets I’ve been producing (on GitHub), I realized I didn’t yet have a textual dataset of the words characters speak throughout the show. After unsuccessfully looking to see if anyone had already made one, I decided to make my own.

My natural starting point was closed-captioning. .srt files are not always super accurate: there are loads online in all sorts of languages, without a canonical version for a given episode. .srt files don’t include data about who says a particular line unless that character is off-screen, and even that nomenclature isn’t consistent. .srt files also break up the lines spoken by a character into showable chunks for the screen; keeping text in that format would make it difficult to search for extended lines, phrases, context, etc.

The closest thing I could find to scripts in a format I could work with came from one online source, but its data wasn’t complete (they’d had a server malfunction?), included spelling errors (and US/UK issues), and wasn’t completely clean. I took the .srt files I could find along with some scrapes of what was good from that source, reformatted them into JSON, rewatched the entire show (for at least the third or fourth time), and built a new dataset (you know, as one does).

This dataset of the lines spoken throughout Game of Thrones has character attribution, a language property and a translation value if I could find one (thanks to a source that covers Dothraki, Valyrian dialects, and more), timestamps, and more sprinkled in.

I want to share the raw data, I really do, but I’m not quite ready yet. While I was making the dataset, I had a few project ideas that could be revenue-generating, so I decided to hold off on making the dataset public. But like I said, I still want to share it, so I’m announcing a search interface for the entire Game of Thrones script!

First I’ll show the search interface and then I’ll describe how I got there.

Game of Thrones Script Search

Feel free to explore Game of Thrones Script Search. The site has a simple Google-like single-search bar. You can search for text, phrases (in quotes), character names, languages (e.g. Dothraki, “High Valyrian”, etc.), or whatever you’d like.
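To make the .srt problem described in this post a bit more concrete, here is a minimal Python sketch of turning subtitle cues into JSON records and stitching on-screen chunks back into full lines. It is not the code used to build the dataset: the sample text, the field names (`start_ms`, `end_ms`, `text`), the helper names, and the merge rule (join a cue to the previous one when the gap is short and the previous text does not end a sentence) are all assumptions for illustration. Speaker attribution is not recoverable from the captions at all, which is why it had to be added by hand.

```python
import json
import re

# Hypothetical sample in the usual SubRip layout; placeholder text, not real script data.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,000
This is the first half of a line,

2
00:00:03,100 --> 00:00:05,500
and this is the second half.

3
00:00:07,000 --> 00:00:08,500
A separate line.
"""


def to_ms(stamp):
    """Convert an 'HH:MM:SS,mmm' timestamp to milliseconds."""
    h, m, s, ms = map(int, re.match(r"(\d+):(\d+):(\d+),(\d+)", stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(text):
    """Split .srt text into one dict per cue (index line, timing line, text lines)."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = [to_ms(part.strip()) for part in lines[1].split("-->")]
        cues.append({
            "start_ms": start,
            "end_ms": end,
            "text": " ".join(line.strip() for line in lines[2:]),
        })
    return cues


def merge_cues(cues, max_gap_ms=500):
    """Join a cue onto the previous one when it looks like a continuation.

    Assumed heuristic: the gap is at most max_gap_ms and the previous text
    does not end with sentence-final punctuation.
    """
    merged = []
    for cue in cues:
        prev = merged[-1] if merged else None
        if (prev is not None
                and cue["start_ms"] - prev["end_ms"] <= max_gap_ms
                and not prev["text"].rstrip().endswith((".", "!", "?"))):
            prev["text"] += " " + cue["text"]
            prev["end_ms"] = cue["end_ms"]
        else:
            merged.append(dict(cue))
    return merged


if __name__ == "__main__":
    print(json.dumps(merge_cues(parse_srt(SAMPLE_SRT)), indent=2))
```

With the sample above, the first two cues collapse into a single searchable line while the third stays separate. A heuristic like this only gets you part of the way, which is one reason a manual pass over the whole show was still needed.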