Mining Temporal Properties of Data Invariants

Mining Temporal Properties of Data Invariants

G(x XFy) login attempt auth failed login attempt guest login login authorized attempt auth failed login attempt login attempt guest login authorized auth failed login attempt Authorized auth failed login attempt auth failed Texada G(guest login XFauthorized) General LTL Specification Mining Caroline Lemieux, Dennis Park and Ivan Beschastnikh University of British Columbia Department of Computer Science source: https://bitbucket.org/bestchai/texada 1 Program Specifications Formal expectation of how a program should work Specs are useful, but rarely specified by developers May be difficult to write out May fall out of date like documentation program without specs: easier for initial dev class C{ class B{ oo() class A{

ping() ar() foo() pongar() ... bar() ... ... } } } harder for debugging, refactoring, maintenance program with specs: harder for initial dev class C{ class B{ oo() class A{ ping() ar() foo() pongar() ... bar() ... ... } } } foo() always precedes bar() ... easier for debugging, refactoring, maintenance 2 Program Specifications Formal expectation of how a program should work

Specs are useful, but rarely specified by developers May be difficult to write out May fall out of date like documentation program without specs: easier for initial dev class C{ class B{ oo() class A{ ping() ar() foo() pongar() ... bar() ... ... } } } program with specs: harder for initial dev class C{ class B{ oo() class A{ ping() ar() foo() pongar() ... bar() ... ... } } } solution: infer specs harder for debugging, refactoring, maintenance foo()

always precedes bar() ... easier for debugging, refactoring, maintenance 3 Uses of Inferred Specs in Familiar Systems class C{ class B{ oo() class A{ ping() ar() foo() pongar() ... bar() ... ... } } } familiar system foo() always precedes bar() ... inferred specs ? unfamiliar system foo() always

precedes bar() ... inferred specs system comprehension[4] program maintenance[1] confirm expected behavior[2] system modeling[4] reverse bug detection[2] engineering[1] test generation[3] [1] M. P. Robillard, E. Bodden, D. Kawrykow, M. Mezini, and T. Ratchford. Automated API Property Inference Techniques. TSE, 613-637, 2013. [2] M. D. Ernst, J. Cockrell, W. G. Griswold and D. Notkin. Dynamically Discovering Likely Program Invariants to Support program evolution. TSE, 27(2):99123, 2001. 4 [3] V Dallmeier, N. Knopp, C. Mallon, S. Hack and A. Zeller. Generating Test Cases for Specification Mining. ISSTA, 85-96, 2010. [4] I. Beschastnikh, Y. Brun, S. Schneider, M. Sloan and M. D. Ernst .Leveraging existing instrumentation to automatically infer invariant-constrained models. FSE, 267277, 2011. Inferred Specs in Unfamiliar Systems class C{ class B{ oo() class A{ ping() ar() foo() pongar() ... bar() ... ... } } } familiar system foo() always

precedes bar() ... inferred specs ? unfamiliar system foo() always precedes bar() ... inferred specs system comprehension[4] program maintenance[1] confirm expected behavior[2] system modeling[4] reverse bug detection[2] engineering[1] test generation[3] [1] M. P. Robillard, E. Bodden, D. Kawrykow, M. Mezini, and T. Ratchford. Automated API Property Inference Techniques. TSE, 613-637, 2013. [2] M. D. Ernst, J. Cockrell, W. G. Griswold and D. Notkin. Dynamically Discovering Likely Program Invariants to Support program evolution. TSE, 27(2):99123, 2001. 5 [3] V Dallmeier, N. Knopp, C. Mallon, S. Hack and A. Zeller. Generating Test Cases for Specification Mining. ISSTA, 85-96, 2010. [4] I. Beschastnikh, Y. Brun, S. Schneider, M. Sloan and M. D. Ernst .Leveraging existing instrumentation to automatically infer invariant-constrained models. FSE, 267277, 2011. Spec Mining Sources Specs can be mined from various program artifacts. Source code [1] Documentation [2] Revision histories [3] Focus of talk: textual logs (e.g., execution traces) Easy to instrument, extensible

sales_page search sales_anncs search sales_anncs search search sales_anncs sales_anncs -homepage search homepage search sales_anncs sales_anncs homepage search web log 0 is 1 is 2 is 3 is 4 is .. 0 is 1 is 2 is 3 is 4 is .. 0 is 1 is 2 is 3 is 4 is .. THINKING HUNGRY THINKING THINKING THINKING THINKING

EATING THINKING THINKING THINKING dining phil. THINKING THINKING THINKING THINKING THINKING StackAr(int) isFull() isEmpty() top() isEmpty() topAndPop() isEmpty() isFull() isEmpty() top() isEmpty() push(java.lang.Object) isFull() isFull() isEmpty() top() isEmpty() push(java.lang.Object) data struct. this.currentSize == this.front this.currentSize == this.back this.theArray[] elements == null this.theArray[].getClass() elements == null this.currentSize == 0 .. this.back <= size(this.theArray[])-1 .. this.back <= size(this.theArray[])-1 ..

this.back <= size(this.theArray[])-1 .. this.back <= size(this.theArray[])-1 .. this.theArray[] elements == null this.theArray[].getClass() elements == null this.currentSize == 0 this.front one of { 0, 6 } data inv. log [1] R. Alur, P. Cerny, P. Madhusudan, W. Nam. Synthesis of Interface Specifications for Java Classes. In Proceedings of POPL05. [2]L. Tan, D. Yuan, G. Krishna, and Y. Zhou. /*Icomment: Bugs or BadComments?*/. In Proceedings of SOSP07. [3] V. B. Livshits and T. Zimmermann. Dynamine: Finding Common Error Patterns by Mining Software Revision Histories. In Proceedings of ESEC/FSE05. 6 Spec Patterns to Mine In this talk, focus on mining temporal specs open() is always followed by close() (response pattern) Many temporal properties could be mined: strict response pattern + resource allocation [2] variations of response pattern [1] lots of small patterns to combine into big ones [4] response patterns of arbitrary length [3] branching livesequence charts [5] [1] J. Yang, D. Evans, D. Bhardwaj, T. Bhat and M. Das. Perracotta: Mining Temporal API Rules from Imperfect Traces. ICSE06. [2] M. Gabel and Z. Su. Javert: Fully Automatic Mining of General Temporal Properties from Dynamic Traces. FSE08.

[3] D. Lo, S-C. Khoo, and C. Liu. Mining Temporal Rules for Software Maintenance. Journal of Software Maintenance and Evolution: Research and Practice, 20 (4), 2008. [4] G. Reger, H. Barringer, and D. Rydeheard. A Pattern-Based Approach to Parametric Specification Mining. In Proceedings of ASE13. [5] D. Fahland, D. Lo, and S. Maoz. Mining Branching-Time Scenarios. In Proceedings of ASE13. 7 Spec Patterns to Mine In this talk, focus on mining temporal specs open() is always followed by close() (response pattern) Many temporal properties could be mined: strict response lots of small pattern + resource patterns to combine [2] into big ones [4] allocation Which temporal spec mining tool should variations of response pattern [1] response patterns of arbitrary length [3] I use? branching livesequence charts [5] [1] J. Yang, D. Evans, D. Bhardwaj, T. Bhat and M. Das. Perracotta: Mining Temporal API Rules from Imperfect Traces. ICSE06. [2] M. Gabel and Z. Su. Javert: Fully Automatic Mining of General Temporal Properties from Dynamic Traces. FSE08. [3] D. Lo, S-C. Khoo, and C. Liu. Mining Temporal Rules for Software Maintenance. Journal of Software Maintenance and Evolution: Research and Practice, 20 (4), 2008. [4] G. Reger, H. Barringer, and D. Rydeheard. A Pattern-Based Approach to Parametric Specification Mining. In Proceedings of ASE13. [5] D. Fahland, D. Lo, and S. Maoz. Mining Branching-Time Scenarios. In Proceedings of ASE13. 8 Ultimate Temporal Spec Inference mine any general temporal pattern

Texada pattern-based: can output a set of simple patterns, or more general patterns patterns specified in LTL, includes 67 pre-defined templates 9 Contributions Texada: general LTL specification miner a b c e d textual log (x,y) any LTL formula Texada (a,b) (c,e) (e,d) inferred specs Approximate confidence/support measures for LTL Concurrent system analysis Dining Philosophers Sleeping Barber 10 Texada Outline inputs login attempt guest login auth failed authorized -login attempt auth failed login attempt authorized -login attempt

auth failed login attempt auth failed -login attempt auth failed login attempt guest login authorized -- Texada Log Parser parsed log Property Instance Checker output G(guest login XFauthorized) Valid Property Instances property instances Log x is always followed by y G(xXFy) Property Type SPOT[1] LTL Parser events formula tree Property Instance Generator

[1] A. Duret-Lutz and D. Poitrenaud. Spot: an Extensible Model Checking Library using Transition-Based Generalized Buchi automata. In Proceedings of MASCOTS04. 11 Property Type Mining Parse each property type into interpretable format (tree) For each property type, dynamically generate and check property instances on log: G(authorized XFguest login) G(authorized XFlogin attempt) G(authorized XFauth failed) G(guest login XFauthorized) x is always followed by y G(xXFy) G(auth failed XFauthorized) G(auth failed XFguest login) G(auth failed XFauthorized) G(guest login XFlogin attempt) G(guest login XFauth failed) G(login attempt XFauthorized) G(login attempt XFguest login) G(login attempt XFauth failed) 12 Linear Log Parsing Texada parses logs by regexes (specify event line format, trace separator) login attempt guest login auth failed authorized -login attempt auth failed login attempt authorized -login attempt auth failed login attempt auth failed -login attempt auth failed

login attempt guest login authorized -- set of traces in linear format 1. login attempt guest login auth failed login attempt auth failed login attempt authorized 3. login attempt auth failed login attempt auth failed login attempt auth failed login attempt guest login 2. 4. authorized authorized 13

Property Instance Checking (Linear Alg) Check each instance on each trace in log holds on trace holds on first event of trace holds on first event of trace 0 1 guest login G guestqlogin p 2 login attempt 3 auth failed authorized G(p): check if p holds at every time point qr :check if qr X X(s): check if s holds at next time point F(a): check if a holds at some time point Fr s authorized 14 Linear Algorithm Observations Linear checker works but is slow. Notice: most temporal operators rely on relative positions Optimization: use map format login attempt

login attempt guest login auth failed auth failed login attempt authorized auth failed event posns login attempt [0] guest login [1] auth failed [2] authorized [3] event posns login attempt [0,2] auth failed

[1,3] 15 Checking on Map Traces Check on trace in map form also tree-based but also uses the negation of nodes Map form allows algorithm to skip over trace G G(p) holds at 0 if !p never occurs find first occurrence of !p guest login X search for first occurrence of guest login (1) F authorized posns login attempt [0] guest login [1] auth failed [2] authorized [3] first occ 3 &

!p never occurs in trace, G(p)login holds. guest event X G ! first occurs at last occurrence of authorized (3) authorized 16 Memoization (reuse of computation) To check property type, check each instance on log for N unique events, M variables, ~NM instances tree form allows for specialized memoization G G G X F F F X

X guest login G X X F F G X login attempt X X X F F G X F F G X

F G X X F X G F F F X G G G X X G X

G X F G G F G G G F F G X F G X F G X

authorized authorized F F Preliminary memo over 3 instantiations: 7% speedup 17 Support, Confidence for LTL Want to know which instances almost never violated check guest login is always followed by authorized: login attempt guest login auth failed authorized guest login authorized guest login only one guest login not followed by authorized guest login is almost always followed by authorized Can we formalize this? 18 Initial Support, Confidence Concept Proposal: support for G(p) = # number of time points where p holds qqqq qpqq pppp sup G(p)= 0 sup G(p)= 1 sup G(p)= 4

But: support for G(pXFq) pppq pqpp rrrr sup G(pXFq)= 4 sup G(pXFq)= 2 sup G(pXFq)= 4 19 Support, Confidence Heuristic What we do: focus on falsifiability guest login XFauthorized vacuously true on login attempt guest login auth failed authorized guest login authorized guest login Call these vacuously true time points not falsifiable Approximate support, support potential for arbitrary LTL Support potential of : number of falsifiable time points Support of : number of falsifiable time points on which is satisfied Confidence of : support/support potential (or 1 if both are 0) 20 Texada Evaluation Can Texada mine a wide enough variety of temporal properties? Can Texada help comprehend unknown systems? Real estate web log

StackAr Can Texada confirm expected behavior of systems? Dining Philosophers Sleeping Barber Is Texada fast? Texada vs. Synoptic Texada vs. Perracotta Can we use Texadas results to build other tools? Quarry prototype 21 Texada Evaluation Can Texada mine a wide enough variety of temporal properties? Can Texada help comprehend unknown systems? Real estate web log StackAr Can Texada confirm expected behavior of systems? Dining Philosophers Sleeping Barber Is Texada fast? Texada vs. Synoptic Texada vs. Perracotta NEW Can we use Texadas results to build other tools? Quarry prototype 22 Expressiveness of Property Types Texada can express properties from prior work Name Synoptic[1] Regex LTL

Always Followed by G(xXFy) Never Followed by G(xXG!y) OneCause y*(xyy*)* G(xX(!x U y)) CauseFirst (xx*yy*)* (!y W x) & G(xXFy) OneEffect y*(xx*y)* G((xXFy) & (yX(!y W x))) Texada can Always Precedes (!y W x) variety of mine a wide (!y W x) & G((xX(!x U y)) & (y X(!y W properties Alternating (xy)* x))) MultiEffect (xyy*)*concurrent (!y W x) & G(xX(!x U y))properties Texada can mine sys. MultiCause (xx*y)* (!y W x) & G((xXFy) & (yX(!y W x))) Perracotta

Texada has reasonable performance EffectFirst y*(xy)* G((xX(!x U y)) & (y X(!y W x))) [2] Patterns in Property Specifications for Finite-State Verification [1] I. Beschastnikh, Schneider, M. Sloan and M. D. Ernst. Leveraging Existing Instrumentation to Automatically Infer Invariant-Constrained Models. [Dwyer Y.etBrun, al.S.ICSE99] FSE11. 23 [2] Jinlin Yang, David Evans, Deepali Bhardwaj, Thirumalesh Bhat, Manuvir Das. Perracotta: Mining Temporal API Rules from Imperfect Traces. ICSE06. Dining Philosophers Classic concurrency problem: philosophers sit around a table, thinking, hungry, or eating. 0 needs two chopsticks to eat 4 1 but this pair can eat at the same time so this pair cant eat at the same time 3 2 These specs could not be checked with previous temporal spec miners! 24 Multi-Propositional Traces

LTL: multiple atomic propositions may hold at a time Standard log model: one event at each time point Texada supports multi-propositional logs: multiple events can occur at one time point Dining philosophers log: 5 one minute traces, 6.5K lines multiple events at single time point time point separator 0 is 1 is 2 is 3 is 4 is .. 0 is 1 is 2 is 3 is 4 is .. THINKING HUNGRY THINKING THINKING THINKING THINKING EATING THINKING THINKING THINKING ... 25 Dining Phil. Mutex (safety property) Two adjacent philosophers never eat at the same time Property pattern: G(x !y) if x occurs, y does not 0 G(3 is EATING ! 4 is EATING) 1

4 3 2 G(4 is EATING ! 3 is EATING) Texada output for G(x !y) includes G(0 is EATING ! 1 is EATING) G(0 is EATING ! 4 is EATING) G(1 is EATING ! 2 is EATING) G(2 is EATING ! 3 is EATING) together, mean that two adjacent philosophers never eat at the same time G(3 is EATING ! 4 is EATING) 26 Dining Phil. Efficiency (liveness property) Non-adjacent philosophers eventually eat at the same time Property pattern: F(x & y) eventually x and y occur together 0 F(2 is EATING & 4 is EATING) Texada can1 mine a wide variety of 4 properties F(4 is EATING & 2 is EATING) 2 3 Texada can mine concurrent sys. properties Texada Texada reasonable performance output has for F(x & y) includes

F(0 is EATING & 2 is EATING) F(0 is EATING & 3 is EATING) F(1 is EATING & 3 is EATING) F(1 is EATING & 4 is EATING) together, mean that nonadjacent philosophers eventually eat at the same time F(2 is EATING & 4 is EATING) 27 Texada vs. Synoptic Texada performs favourably against Synoptics miner on three property types it is specialized to mine. More results in paper. Texada algs benefit from log-level short-circuiting. 28 Texada vs. Perracotta Perracotta performs favourably against Texada: Unique events (10K events/trace, 20 traces/log) Perracotta Texada (map miner) Texada can wide 120 mine a 0.85 s variety 2.42 of s 0.97 s 4.07 s properties 160 260 mine concurrent 1.42 s 10.21 Texada can sys.s properties

Texada has reasonable performance Perracottas algorithm particularly effective at reducing instantiation effect on runtime. Further memoization work (along with good expiration policies) might help reduce instantiation effect 29 Conclusion Many temporal spec miners, unclear which to use Texada: general LTL spec miner confirms expected behavior, discovers unexpected use patterns prototyped confidence measures (future work to improve this) can examine concurrent system logs Open source and ready to use: https://bitbucket.org/bestchai/texada/ 30

Recently Viewed Presentations

  • Nature of planning and control - Weebly

    Nature of planning and control - Weebly

    Nature of planning and control . ... The nature of decision on planning and control is depending on supply and demand. ... So, the operations that buy - in resources and produce only when demanded by customers are know as...
  • INTERNATIONAL STAFF EMERGING SECURITY CHALLENGES SECRTARIAT INTERNATIONAL DFIS

    INTERNATIONAL STAFF EMERGING SECURITY CHALLENGES SECRTARIAT INTERNATIONAL DFIS

    Being military NATO manages a definition - though it is not much help, it doesn't give ideas of where the difference lies between insurgents, pressure groups, freedom fighters etc or any idea of scale. It doesn't explore the issue of...
  • Faculty Removal, Recruitment Allowance and Reimbursements

    Faculty Removal, Recruitment Allowance and Reimbursements

    Definitions. Senate Faculty. Faculty appointees who are members of the Academic Senate. Appointees are in Asst, Assoc, Full Professor titles, or full-time LPSOE/LSOE/Sr LSOE (Lecturer Security of Employment) positions
  • EngageNY

    EngageNY

    Cohort 1: Preliminary ESSA results show that approximately 25 receivership cohort 1 schools remain designated, (5 of which are PSSs). 2 more cohort 1 receivership schools will be closing in June 2019. Therefore, approximately 23 Cohort 1 receivership schools will...
  • www.acsu.buffalo.edu

    www.acsu.buffalo.edu

    Kant is both analytical and critical. Both critique and analysis are modes of philosophizing. which can be found together in Kant's work. Kant wrote three "critiques"—synt
  • www.chem.msu.ru

    www.chem.msu.ru

    Мембранное материаловедение проф. д.х.н. Ямпольский Ю.П. д.х.н. Алентьев А.Ю. ИНХС РАН ...
  • Drafting &amp; Residential Interior Design Applications

    Drafting & Residential Interior Design Applications

    Students will create a simple, effective floor plan with furniture arrangements. 12.ID.8.c. ... providing support to floors above and the roof. ... To draw a corner, you have to plan ahead. If the room measures 12' on the interior wall,...
  • Using FANBOYS to Combine Sentences

    Using FANBOYS to Combine Sentences

    Simple Sentences. She wanted to play with Jill. She didn't want to play with Jim. can be made into compound sentences using a comma and a FANBOY (or coordinating conjunction). She wanted to play with Jill, but she didn't want...