Movie Script Shot Lister - SJSU Computer Science Department

Movie Script Shot Lister
David Robert Smith
Spring 2016 CS298 Writing Project
Advisor: Dr. Chris Pollett

Goal:
Create a tool that takes a raw but properly formatted motion picture script and outputs a shot list for the movie.
The final program is called the Lister Tool.
The Lister Tool uses Training Sets and Naive Bayes to calculate the shot list.

Basic Components of This Project:
  The Parser
  The Liner Tool
  The Training Sets
  The Vector Populator Tool
  The Feature Set
  The Lister Tool
  The Comparer
  The Human Judge Tool

A Motion Picture Script

Sample Script Breakdown

Lined Script

The Four Target Features
Our output will be similar to the lined script on the previous slide, except that it is expressed with variables instead of drawn lines.
Each line will have four target features:
  Cut
  ShotType
  CleanType
  Motion
Think of a cut on a line as the beginning and/or end of one of the vertical lines.

Target Feature Legend

Cut
  NoCut
  Cut

ShotType
  ECU: Extreme Close Up
  CU: Close Up
  MCU: Medium Close Up
  MS: Medium Shot
  MWS: Medium Wide Shot
  WS: Wide Shot
  VWS: Very Wide Shot
  EWS: Extreme Wide Shot
  None
  Other

CleanType
  Single: 1 person
  SingleOTS: single over the shoulder
  Two shot: 2 people
  Multi: more than 2 people
  MultiOTS: multiple over the shoulder
  Empty: nobody there

Motion
  Static: camera doesn't move
  Loose: camera moving a little
  Tilt: tilting up and down
  Pan: panning side to side
  Zoom: lens in and out
  Dolly: camera moves
  Steadicam: camera carried
  Crane: camera booms up/down
  Aerial: camera in air
  Handheld: held in hands
  DutchTilt: side rotate
  DollyZoom: zoom and dolly
  Circle: circles around
  Other

The Parser
Reads a raw but properly formatted script and translates it into a data structure usable by the other programs.
Who uses the Parser:
  The Liner Tool
  The Lister Tool
Creates a data structure called AllData which contains:

  Script data structure
  Dataline structure for storing shot list information

The First Program: The Parser
The Parser reads the raw script line by line and identifies each line:
  New Scene: e.g., EXT. ELM STREET NIGHT, INT. BEDROOM - DAY
  Blank Line: a line with nothing other than spaces
  Dialogue: a block that starts with an all-uppercase character name
  Action: anything else
This information is used to create the Script data structure.

The Parser
The Script data structure:
  Contains script objects, scenes, and scene objects.
  Script objects are data structures, such as characters or props, which can appear in more than one scene.
  Scene objects, such as dialogue and action blocks, appear in only one scene. The notable exception is the blank line scene object, which appears everywhere.
  A scene, in the data structure sense, has references to all the script objects and scene objects that appear in that scene, along with other useful information such as the scene number and the scene header info.

The Parser
The Dataline structure:
  Used for storing the shot list information for the whole script.
  Each line of script has its own line data, which has:
    The line of the script as a string
    Space for the four target features:
      A boolean for cut or no cut on that line
      An enum for shot type
      An enum for clean type
      An enum for motion type
    Booleans for whether characters appear in the shot
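The line identification and per-line data described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the names `LineKind`, `LineData`, `classify_lines`, and `parse_lines` are hypothetical, the enums list only a few legend entries, the default values are assumptions, and the rule that any all-uppercase line opens a dialogue block is a simplification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class LineKind(Enum):
    NEW_SCENE = "newScene"
    BLANK = "blank"
    DIALOGUE = "dialogue"
    ACTION = "action"


# Only a few legend entries are listed here to keep the sketch short.
class ShotType(Enum):
    NONE = "None"
    CU = "CU"
    MS = "MS"
    WS = "WS"


class CleanType(Enum):
    EMPTY = "Empty"
    SINGLE = "Single"
    MULTI = "Multi"


class Motion(Enum):
    STATIC = "Static"
    PAN = "Pan"
    TILT = "Tilt"


@dataclass
class LineData:
    """One line of script plus space for the four target features."""
    text: str
    cut: bool = False                        # assumed default
    shot_type: ShotType = ShotType.NONE      # assumed default
    clean_type: CleanType = CleanType.EMPTY  # assumed default
    motion: Motion = Motion.STATIC           # assumed default
    characters_in_shot: Dict[str, bool] = field(default_factory=dict)


def classify_lines(lines: List[str]) -> List[LineKind]:
    """Label each raw line as new scene, blank, dialogue, or action."""
    kinds = []
    in_dialogue = False
    for raw in lines:
        text = raw.strip()
        if not text:
            kind, in_dialogue = LineKind.BLANK, False
        elif text.startswith(("INT.", "EXT.")):
            kind, in_dialogue = LineKind.NEW_SCENE, False
        elif in_dialogue or text.isupper():
            # An all-uppercase name opens a dialogue block that runs
            # until the next blank line or scene header.
            kind, in_dialogue = LineKind.DIALOGUE, True
        else:
            kind = LineKind.ACTION
        kinds.append(kind)
    return kinds


def parse_lines(lines: List[str]) -> List[LineData]:
    """Fill in only the script text; target features keep their defaults."""
    return [LineData(text=raw) for raw in lines]


script = ["INT. BEDROOM - DAY", "", "John walks in.", "", "JOHN", "Hello there.", ""]
kinds = classify_lines(script)
datalines = parse_lines(script)
```

Checking for a scene header before the uppercase test matters in this sketch, because scene headers are themselves all uppercase.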

The Parser only fills in the line-of-script string; the rest are initially set to default values.

The Liner Tool
A GUI
Dual purpose:
  Line a script, i.e., add shot data to a script
  View an already lined script
Uses the Parser to process a script.
Outputs an AllData file in the form of a JSON file or a zip file. (The zip file is a zipped JSON.)

The Liner Tool
The primary use of the Liner Tool is to line scripts, that is, to mark the shots on the script.
Lining is named after the practice of drawing vertical lines on a script to mark shots.
In our program, however, shots are marked by selecting whether there is a cut on each line; the cuts define where each shot begins and ends.
A cut on a line marks the end of one shot and the beginning of a new one.
We want to be able to line scripts in order to create training sets for later use.
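The zipped-JSON form of the AllData file could be written and read back along these lines. This is only a sketch using Python's standard `json` and `zipfile` modules; the member name `alldata.json`, the function names, and the shape of the sample dictionary are assumptions, since the slides do not show the real file layout.

```python
import io
import json
import zipfile


def save_all_data(all_data, target, member="alldata.json"):
    """Write the AllData dictionary as a single JSON member of a zip archive.

    `target` may be a file path or a writable binary file object.
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member, json.dumps(all_data, indent=2))


def load_all_data(source, member="alldata.json"):
    """Read the AllData dictionary back from the zip archive."""
    with zipfile.ZipFile(source) as archive:
        return json.loads(archive.read(member).decode("utf-8"))


# Hypothetical, heavily simplified AllData content.
sample = {
    "script": {"scenes": [{"number": 1, "header": "INT. BEDROOM - DAY"}]},
    "datalines": [{"text": "INT. BEDROOM - DAY", "cut": True, "shotType": "WS"}],
}

buffer = io.BytesIO()  # an in-memory stand-in for a file on disk
save_all_data(sample, buffer)
buffer.seek(0)
restored = load_all_data(buffer)
```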

The Liner Tool
Lining the script involves going line by line and selecting from the options for each line:
  Cut: cut or no cut
  ShotType: Extreme Wide Shot, Wide Shot, Medium Shot, Close Up, etc.
  CleanType: Single, Two shot, Multi, Empty, etc.
  Motion: Static, Pan, Tilt, Dolly, etc.
  Whether each character appears in the current shot
It is worth noting that ShotType, CleanType, etc. can change from line to line regardless of whether a cut took place.
When all of these options have been picked for each line, the script is considered to be lined.

The Liner Tool
Lining a script can take a while, so the Liner Tool was refactored many times to make it more convenient:
  Hot keys were added
  Navigation of the script was made as simple as possible, using either buttons or hot keys
  Multiple lines of data can be changed at the same time
  Choices can be copied from one line to another
  Lines with cuts or changes are highlighted in different colors
  Each character is given a unique color
  A space was added for entering timecode
  Hovering the mouse over objects gives more details
  Progress can be saved and restored at any time

Training Sets
The Lister Tool gets its data from Training Sets.
In our case, a training set is a film script that has been lined with the Liner Tool.
In its pure form, the Lister Tool is interested in shot probabilities.
We chose to line our scripts based on the actual films. In this way, we are collecting the data from the real film.
  For example, if a cut happens on screen, the cut check box is selected next to the line where the cut happened.
  The appropriate shot type, etc. is selected.
Training sets could also be created without viewing the actual film, just by using a human's intuition to mark shots.

Training Sets
Ideally, we would have wanted a hundred or even a thousand different lined scripts.
This would have provided not only more robust data but also more possible selections.
For example, we can customize our output by carefully selecting our input. Say we want our program to line a comedy: we would want to populate our vector with Training Sets of comedies.
With more scripts, we could go even finer-grained than that.

Training Sets
Unfortunately, lining scripts according to actual movies is both difficult and time-consuming.
Lining a single standard script took, in practice, between 16 and 20 hours.
Between my own time and the time others graciously donated to me, I was able to get 11 different scripts lined, plus another lined for testing purposes, which is an adequate number for our experiments.

The Vector Populator Tool
The Vector Populator Tool takes the training sets and pulls out the probability data.
This data is then stored in a vector file for use by the Lister Tool.
The vector file is additive: not only can it be created with any number of Training Sets, but more Training Sets can be added to the same file later.
Different vector files can be created from different training sets to change the kind of output the Lister Tool will deliver.

Before Explaining the Vector Populator Tool
Let's have a refresher on the Naive Bayes algorithm, which is critical to understanding how the vectors become populated.

The Naive Bayes algorithm takes the probabilities from the vector and applies them to the input script.
The basic idea is that, given the data from each line of the unlined script along with the probabilities, we want to find which option has the highest probability. The option with the highest probability is the one that is picked.

Bayes' Rule
The foundation for Naive Bayes is understanding Bayes' rule:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
In English, this says that the probability of A given B is equal to the probability of B given A times the probability of A, divided by the probability of B.

Let's look at an example using Bayes' rule. Say we know the line is a new scene, and we want to use that information to decide whether we should cut or not cut on that line.

Example
$P(A \mid B)$ = probability of a cut on a line given a new scene
A: TargetFeature.Cut = CutOptions.Cut
B: Features.SceneObjectType = SceneObjectType.newScene
$P(B \mid A)$ = number of times new scene given a cut, e.g., 79/800
$P(A)$ = number of times a cut over every line, e.g., 800/8000
$P(B)$ = number of times a new scene over every line, e.g., 80/8000
$$P(A \mid B) = \frac{(79/800)(800/8000)}{80/8000} = \frac{79}{80} = 0.9875$$

Example
$P(A \mid B)$ = probability of no cut on a line given a new scene
A: TargetFeature.Cut = CutOptions.NoCut
B: Features.SceneObjectType = SceneObjectType.newScene
$P(B \mid A)$ = number of times new scene given no cut, e.g., 1/7200
$P(A)$ = number of times no cut over every line, e.g., 7200/8000
$P(B)$ = number of times a new scene over every line, e.g., 80/8000
$$P(A \mid B) = \frac{(1/7200)(7200/8000)}{80/8000} = \frac{1}{80} = 0.0125$$
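The two calculations above can be checked with a few lines of Python. The sketch uses exact fractions so that no rounding creeps in; the function name `posterior` is just an illustrative choice, and the counts are the ones given in the examples.

```python
from fractions import Fraction


def posterior(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


p_new_scene = Fraction(80, 8000)  # P(B): new-scene lines over all lines

# A = cut: 79 of the 800 cut lines are new scenes.
p_cut = posterior(Fraction(79, 800), Fraction(800, 8000), p_new_scene)

# A = no cut: 1 of the 7200 no-cut lines is a new scene.
p_no_cut = posterior(Fraction(1, 7200), Fraction(7200, 8000), p_new_scene)

print(float(p_cut), float(p_no_cut))  # 0.9875 0.0125
```

Because cut and no cut are the only two options, the two posteriors sum to 1.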

Example
Probability of cut given new scene = 0.9875
Probability of no cut given new scene = 0.0125
0.9875 > 0.0125, so we pick the cut option.
This is basically the Naive Bayes algorithm, although instead of applying just one feature, like Features.SceneObjectType, we want to apply many features in the calculation for deciding cut.

Naive Bayes
The complete formula is:
$$\hat{y} = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}} \; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$$
K represents all the options
$p(C_k)$ represents the probability of each option k
$p(x_i \mid C_k)$ represents the probability of each $x_i$ given $C_k$
The right half of this formula is a variation on Bayes' rule, except that you are multiplying all the probabilities together.
Notice we didn't divide by $p(x_i)$ each iteration. It would be the same value for each k, so it wouldn't make a difference.

Naive Bayes
What Naive Bayes does is calculate the probability of each option given all the selections of the features.
The probabilities of all the features are multiplied together to come up with the final probability for each option.
Then the probabilities of all the options are compared, and the one with the max probability is used.

The Vector
The vector stores all of the probabilities. They are extracted from the training set by the Vector Populator Tool.
Each target feature has its own vector, but the four vectors corresponding to the four target features are all stored in one JSON file.
A vector looks like this:
CutVector.feature[f].selection[selected].option[picked].count
Each selection represents a Feature (like SceneObjectType), one of which is selected (like SceneObjectType.NewScene).

The option represents the Target Feature.
Example:
CutVector.feature[Features.sceneObjectType].selection[SceneObjectType.NewScene].option[TargetFeature.Cut].count
would represent the number of times there was a cut when there was a new scene.
That number divided by the total number of cuts would give the probability of a new scene given a cut.

The Vector Populator Again
The Vector Populator goes through a lined script (training set) line by line, uses the picks for the Target Features, extracts the selections for the Features given the content of the script, and increases the corresponding counts within the vector.
Each line of script has many Features which can be extracted and used to create a stronger vector file. One of the goals of this project has been to come up with a well-balanced Feature set.

The Feature Set
Features can be anything that can be extracted from the script. I have come up with two main types of feature:
  pure: a feature that can be calculated regardless of the options picked, e.g., SceneObjectType, LengthOfScene
  non-pure: a feature that is based on options picked on previous lines, e.g., LinesSinceCut, LastShotType
It's fine to use pure features in any calculation of any option, but we must be careful how we use non-pure features.

Features
Pure:
  sceneObjectType
  linesSinceObjectChange
  linesSinceNewScene
  intExt
  scriptObjectsInScene
  actionBlocksInScene
  dialogueBlocksInScene
  sceneLength
Non-Pure:
  cut
  shotType
  cleanType
  Motions
  linesSinceCut
  linesSinceShotTypeChange
  lastShotTypeNoCut
  lastShotTypeWhenCut
  linesSinceCleanChange
  lastCleanType
  linesSinceMotionChange
  lastMotionType
  tiltCount
  zoomCount
  panCount
  dialogueCountInShot
  uniqueDialogueCountInShot
  actionCountInShot
  lineCountInShot

The Lister Tool
Takes a raw script as input
Allows naming of the output file
Requires a vector file
Also allows changing of Feature Settings

Feature Settings
Features can be turned on/off
Features can be weighted
Weighting is done with exponents
Settings can be saved/loaded

The Lister
Takes raw scripts and calls the Parser to convert them to AllData.
Goes through the converted script line by line and applies the Naive Bayes algorithm to each target feature using the input vector file.
The option picked for each target feature is applied to the AllData.
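The counting done by the Vector Populator and the weighted Naive Bayes pick done by the Lister can be sketched together as below. This is a minimal illustration under stated assumptions, not the project's implementation: the nested dictionary only mirrors the `feature[f].selection[selected].option[picked].count` layout, the function names are hypothetical, add-one smoothing is added here so that unseen combinations do not zero out a product, and turning a feature off is modeled as a weight of 0. Multiplying a log-probability by a weight is the same as raising the probability to that exponent.

```python
import math


def new_vector():
    # counts[feature][selection][option] -> count; option_totals[option] -> lines
    return {"counts": {}, "option_totals": {}}


def populate(vector, lined_lines):
    """Add one training set to the vector. Calling it again adds more data."""
    for features, picked in lined_lines:
        totals = vector["option_totals"]
        totals[picked] = totals.get(picked, 0) + 1
        for feature, selection in features.items():
            by_option = vector["counts"].setdefault(feature, {}).setdefault(selection, {})
            by_option[picked] = by_option.get(picked, 0) + 1


def classify(vector, features, weights=None, smoothing=1.0):
    """Return the option with the highest weighted Naive Bayes score."""
    weights = weights or {}
    totals = vector["option_totals"]
    n_lines = sum(totals.values())
    best_option, best_score = None, -math.inf
    for option, total in totals.items():
        score = math.log(total / n_lines)  # log p(C_k)
        for feature, selection in features.items():
            weight = weights.get(feature, 1.0)  # the exponent; 0 turns the feature off
            by_selection = vector["counts"].get(feature, {})
            count = by_selection.get(selection, {}).get(option, 0)
            # Smoothed estimate of p(x_i | C_k); the smoothing is an assumption.
            likelihood = (count + smoothing) / (total + smoothing * (len(by_selection) + 1))
            score += weight * math.log(likelihood)
        if score > best_score:
            best_option, best_score = option, score
    return best_option


# Toy training set invented for illustration: (features, picked cut option).
training_set = (
    [({"sceneObjectType": "newScene"}, "Cut")]
    + [({"sceneObjectType": "action"}, "NoCut")] * 3
    + [({"sceneObjectType": "dialogue"}, "NoCut")] * 2
    + [({"sceneObjectType": "dialogue"}, "Cut")]
)

cut_vector = new_vector()
populate(cut_vector, training_set)

pick_new_scene = classify(cut_vector, {"sceneObjectType": "newScene"})
pick_action = classify(cut_vector, {"sceneObjectType": "action"})
pick_prior_only = classify(
    cut_vector, {"sceneObjectType": "newScene"}, weights={"sceneObjectType": 0.0}
)
```

Scores are summed in log space rather than multiplied directly, which avoids underflow when many features are combined.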

The file is converted to JSON and output to a zip file.
The zip file can be opened in the Liner Tool to display the results.
The output of the Lister Tool has the same file structure as the output of the Liner Tool. Thus, it is structurally the same as a training set.

Comparison
Comparing the output of the Lister Tool to a training set can be done in one of two ways:
The Comparer Tool
  Programmatically compares two files and gives their difference
  Ultimately doesn't provide very good data
The Human Judge Tool
  Outputs samples from two or more files into a convenient-to-read file which humans can judge
  If humans prefer the Lister output as much as the training set, or even more, the output is considered good

Experiments
Basic: Lister output vs. Training Set
Control Sample: Training Set vs. Human lined vs. Lister output
Training Set used as input of same script
Feed Lister output in as input
Many training sets vs. few training sets
Comparing different scripts by same director

Results

Experiments Round 1
13 people participated
Fleiss' Kappa score: 0.2647 (fair agreement)
Human-lined script preferred: 111/156 = 71.15%

Experiments Round 2
9 people participated
Fleiss' Kappa score: 0.1915 (slight agreement)
Human-lined script preferred: 72/108 = 66.67%
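For readers unfamiliar with the agreement score reported above, Fleiss' kappa can be computed as in the following sketch. The function follows the standard definition of the statistic; the small ratings table is invented purely for illustration and is not the experiment's data, and the two-category layout (human-lined preferred, Lister preferred) is an assumption about how the judgments were recorded.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa, where ratings[i][j] is the number of raters who
    assigned item i to category j. Every item needs the same rater count.
    Undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(ratings[0])

    # Share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Extent of agreement among raters on each item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_items      # observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical judgments: 4 samples, 5 judges,
# columns = [human-lined preferred, Lister preferred].
toy_ratings = [[5, 0], [4, 1], [3, 2], [1, 4]]
toy_kappa = fleiss_kappa(toy_ratings)
```

A value of 1 means perfect agreement among the judges, while values near 0 mean agreement no better than chance.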

Some refinements were done between the experiments.

Refinement
The feature settings were tweaked, disabling certain features for certain target features.
Some features were given heavier weight.
The biggest refinement came from separating the target features and doing a complete line-by-line pass of the script for each target feature.
The biggest benefit of this was being able to see where a shot began and ended, which provides more data to the subsequent target features.

Conclusion
The project doesn't produce perfect output, but it does produce reasonable output, providing a good starting place for would-be human shot list creators.
Having more training sets would definitely have provided more customization.
Having many training sets in a vector doesn't necessarily provide better output than a few or just one training set.
The Liner Tool is a great tool for humans to create shot lists.

Time to demo the software!
