Similarity Analysis of Legal Judgments and applying Paragraph-link to Find Similar Legal Judgments.


Similarity Analysis of Legal Judgments and applying Paragraph-link to Find Similar Legal Judgments
by Sushanta Kumar
Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science (by Research) in Computer Science and Engineering
Center for Data Engineering
International Institute of Information Technology
Hyderabad, INDIA
December 2012 April 2014

International Institute of Information Technology
Hyderabad, India

CERTIFICATE

It is certified that the work contained in this thesis, titled "Similarity Analysis of Legal Judgments and applying Paragraph-link to Find Similar Legal Judgments" by Sushanta Kumar, has been carried out under my supervision and has not been submitted elsewhere for a degree.

Date                Adviser: Prof. P. Krishna Reddy

Copyright © Sushanta Kumar, All Rights Reserved

Dedicated to my parents, Mrs. Sulochana Devi and Mr. Surya Deo Prasad Singh, and my brother, Mr. Nishanta Kumar, for their everlasting love and support.

Acknowledgments

First and foremost, all praise belongs to God, who gave me all the help, knowledge, and courage to finish my work. This work would not have been possible without the help and support of many individuals. I offer my sincerest gratitude to my advisor, Prof. P. Krishna Reddy, who has supported me throughout my thesis with his patience and knowledge whilst allowing me the room to work in my own way. I attribute the level of my Master's degree to his encouragement and effort; without him this thesis would not have been completed or written. He taught me how to pursue research. He helped to shape the direction of this work, filled in many of the gaps in my knowledge, and helped steer me toward solutions. His constant encouragement and near-miraculous ability to always find time for his students have made working with him a true pleasure.

I want to thank all the people in the IT for Agriculture Lab and the Center for Data Engineering lab for their stimulating company during the past years. My life would not be the same without the many friends I have made. My good friends Abhishek Sainani, Mohit Goyal, Aravindhan, Raviteja, Sumit Maheshwari, Suvra Saurav and Sirish Verma have kept my life both interesting and entertaining during my MS.

Finally, I want to express my gratefulness to my mother, Mrs. Sulochana Devi, my father, Mr. Surya Deo Prasad Singh, and my brother, Mr. Nishanta Kumar, for their endless love, support, encouragement, patience and self-sacrifice. I am also thankful to my parents for teaching me the value of knowledge and education. No words in any natural language would be sufficient to thank my parents for all they have done for me.

Abstract

With technological advancements, more and more content is becoming available in digital form on a regular basis. This overwhelming amount of available data has led to the problem of information overload, and in turn to an increased interest in developing methods that help users effectively navigate, summarize and organize this information. The ultimate goal is to help users find what they are looking for. Significant efforts have been made to counter the challenges posed by information overload in the web domain. These efforts include the exploitation of text content as well as links (present as URLs in web pages). Interestingly, the phenomenon of data explosion is not limited to the web domain but is observed in various other domains as well. Though the nature of the challenges induced by information overload is much the same across domains, the uniqueness of each domain demands a specialized solution. Unsurprisingly, efforts have been made to build information retrieval systems in other domains by extending the popular notions developed for web-domain retrieval systems.

In this thesis, we made an effort to address one of the challenges in the legal domain and investigated the problem of finding similar legal judgments. In the legal domain, information overload has an adverse effect on finding similar judgments, which is a crucial task for a lawyer preparing his arguments. Owing to the enormous number of judgments available, a lawyer needs to browse through hundreds of legal judgments to find a set of judgments similar to a given judgment J, and hence finds this task tiring and time-consuming. To find similar legal judgments, he starts browsing the legal database using his knowledge and experience. Once he finds an older judgment (say judgment J) that adequately satisfies his requirements, he starts looking for more judgments similar to J for a comprehensive analysis of the legal principle applied in those judgments. Since the number of judgments in a legal database is enormous and, in general, each judgment is large, an automated mechanism for finding similar legal judgments turns out to be a non-trivial problem.

Textual information and its accessibility play a particularly important role in the legal domain. The amount of available text data in the legal domain is vast and continuously growing, which makes it challenging to deal with. Apart from the size of the data, the inherent complexity of the legal domain demands better and more sophisticated methods to process legal documents and satisfy the information needs of legal practitioners.

To begin with, we investigated the issue of finding similar legal judgments by exploiting various attributes of legal judgments. We conducted our experiments on a real-world dataset and found that the text content of judgments is not as effective as the links (known as case-citations) available in legal judgments for finding similar legal judgments.

Further investigation showed that case-citation, used as a similarity measure, extracts only a small number of similar judgments from the corpus of judgments. This was observed to be due to the small number of case-citations available in judgments. Therefore, sole dependence on case-citations to find similar judgments is not enough. To improve performance, we proposed the notion of a paragraph-link. We exploited a content-based similarity approach to apply paragraph-links and then applied a link-based measure to find similar judgments. The new approach was found to produce encouraging results and to improve on the performance of the previous approach.

Contents

1 Introduction
    Information Retrieval: Overview
    IR in Legal Domain: Challenges
    Motivation and Problem Description
    Overview of Proposed Approach
    Thesis contribution
    Organization of thesis

2 Related Work
    IR approaches: Overview
    Link-based approaches in IR
    Exploiting links in web-domain:
    Exploiting links in text-documents:
    Automatic generation of links in text-documents:
    IR approaches in legal domain:
    Summary

3 Similarity Analysis of Legal Judgments
    Background
    Types of legal system:
    Existing similarity measures:
    Legal Judgment: An overview
    Finding Similar judgments: Problem Statement
    Features employed to find similar judgments:
    Approaches to find similar judgments:
    Cosine similarity using all-terms
    Cosine similarity using legal-terms
    Bibliographic coupling similarity using out-citations
    Co-citation similarity using in-citations
    Experiments
    Description of dataset
    Experimental setup
    Results
    Analysis by domain experts for sample pairs:
    Conclusion:

4 Finding Similar Legal judgments using Paragraph-Links
    Issue
    Basic Idea
    Proposed Approach
    Method for identifying paragraph in legal judgments
    Method to applying paragraph links
    Method for finding similar legal judgments:
    Experimental Results:
    Preprocessing
    Experimental setup
    Observation results
    Analysis:
    Evaluation study
    Conclusion

5 Conclusion and Future work
    Summary
    Conclusion
    Future Work

Publications
Bibliography

List of Figures

1.1 Typical IR process
An example of links between judgments
A typical judgment from Supreme court of India. Discontinuous lines show missing texts
Citation frequency against Judgment count plotted on linear and logarithmic scales. Plots show that case-citation follow power-law of distribution
A typical headnote from a judgment. Discontinuous lines show missing text from the headnote. Serial number of first two paragraphs and case-citations present at the end of paragraphs are marked by rectangles
Bibliographic coupling similarity method. Continuous links represent case-citations while dis-continuous lines represent paragraph links between judgments

List of Tables

3.1 Statistics of judgments used for experiment
All term similarity score is high while rest similarity score are less
Legal term similarity score is high while rest similarity score are less
Co-citation similarity score is high rest similarity score are less
Bibliographic coupling similarity score is high while rest similarity score are less
Judgment-pairs having high bibliographic coupling score method
Judgment-pairs having high co-citation coupling score method
Algorithm to identify paragraphs of judgments
Algorithm to apply paragraph links
Algorithm to find similar judgments
Statistics of judgments used for experiment
Evaluation of case-citation with domain expert score
Evaluation of Paragraph-link(PLs) with domain expert score
Evaluation of PL+case-citation with domain expert score
Judgment-pairs with score

Chapter 1

Introduction

The availability of affordable storage media has made it feasible to accumulate digital data in huge volumes. This is a new phenomenon, and it demands newer and more intelligent methods for processing data at such scale. The discipline within computer science that deals with the representation, storage, organization of, and access to information is called information retrieval (IR). Although IR is a relatively old and well-established area of research, it has received particular attention during the last decade, when a data explosion took place due to the World Wide Web and related technologies. Apart from the sheer amount of information, new forms of information (e.g., image, video) and semi-structured documents (e.g., XML), as well as new kinds of vast document collections such as enterprise repositories and digital libraries, have drawn major attention back to this field. Unsurprisingly, satisfying an information need with greater accuracy and efficiency has been an active research area in IR.

In general, the user of a retrieval system enters a query and browses the responses to satisfy his information need. The information retrieval system responds to the entered query by matching it against the documents in its repositories. Hence, it is desirable that the query be formulated in such a way that relevant documents can be obtained. The challenge therefore lies in how to represent documents and queries in a form that computers can manipulate with high accuracy. Since there is a consequent need for better techniques to access information, it has become important to provide efficient mechanisms to organize, locate and present information effectively.

One of the domains where textual information and its accessibility play a particularly important role is the legal domain. The amount of available text data in the legal domain is vast and continuously growing, which makes it challenging to deal with.
Apart from the size of the data, the inherent complexity of the legal domain demands better and more sophisticated methods to process legal documents and satisfy the information needs of legal practitioners. Additionally, in a common law system (footnote 1), it is of high importance to have access to as many judgments (older cases) as possible. A lawyer cannot risk missing a relevant case that might be available to the opposing lawyer. This kind of competition makes the use of large legal databases a necessity. A lawyer is expected to study as many judgments as possible that are similar to the task at hand. Hence, after finding a judgment, it is important for a legal practitioner to find more judgments similar to it, so that the applied legal principle can be studied in full detail and used to prepare the defence for the current task. In this way, finding similar judgments (footnote 2) is a non-trivial problem. Besides, the complex nature of judgments and the considerable size of each judgment make finding similar judgments a challenging task.

Footnote 1: One of the legal systems, which gives importance to the previously delivered judgments.

In this chapter we give an overview of information retrieval (IR) and discuss various issues faced by IR and the research efforts made to address them. We then explain the motivation and problem description and give an overview of the approach proposed in the thesis. Finally, we mention the major contributions of the thesis and its organization.

1.1 Information Retrieval: Overview

Information Retrieval (IR) is the discipline that deals with the retrieval of unstructured data in response to a query or topic statement, which may itself be unstructured. The need for effective methods of automated IR has grown in importance because of the tremendous explosion in the amount of unstructured data, both in internal corporate document collections and in the growing number of document sources on the Internet. IR typically seeks to find documents in a given collection that are about a given topic or that satisfy a given information need [17]. The topic or information need is expressed by a query formulated by the user. Documents that satisfy the user's query are said to be relevant; documents that are not about the given topic are said to be non-relevant. An IR engine may use the query to classify the documents in a collection (or in an incoming stream), returning to the user a subset of documents that satisfy some classification criterion. Since the corpus is huge, IR engines generally rank the list of documents returned in response to the entered query.
The higher the rank of a document, the higher its relevance to the entered query. As shown in Figure 1.1, the IR process begins when a user with an information need approaches an IR system. The user enters a query representing his information need into the IR engine. Once the query is entered, the first stage of the IR engine, namely preprocessing, filters the query. While the index is an already stored set of data at the IR engine's disposal, the entered query goes through the query operations, searching and ranking steps. Finally, all relevant documents are returned as the result of the entered query. This interaction is not one-way: the user can reframe the query after looking at the results of the previous one.

Footnote 2: A judgment is a closed legal case, and a case is defined as a dispute between opposing parties resolved by a court, or by some equivalent legal process.

Figure 1.1 Typical IR process.

1.2 IR in Legal Domain: Challenges

Challenges for IR researchers are quite unique when they deal with the legal domain. One important aspect that distinguishes the legal domain from text documents of any other domain is related to the materials themselves and the way they are used by legal practitioners [42]. This makes traditional IR techniques ineffective in the legal domain. The format of judgments, as well as their unique writing style, distinguishes the legal domain from a generic corpus of text data. Consequently, one needs to exploit these unique properties to process them. For example, the statistical characteristics of important words differ from those in other generic text corpora. In law, a relevant term may occur only once amid lengthy argumentation on other points of view. Therefore, the selection of an appropriate vector representation remains tricky. Weighting of terms is challenging because very special words or phrases have to be treated with particular attention that is not statistically evident. Furthermore, content-based classification tends to become somewhat distorted for legal judgments, since they cover different topics and are also large. In such a scenario, documents may be organized not only according to their content but also, to a large degree, by their structure or document type. We thus need to identify ways to provide a better content representation for legal documents.

Such complexity has encouraged IR researchers to investigate the existing set of problems in various ways. Efforts have been made to improve search performance by exploiting the notions of abstraction [28], representation [13], classification [41], and retrieval [6]. Content-based clustering and labeling of European law is attempted in [39]. The large size of judgments has motivated work in the field of summarization [27][36].

1.3 Motivation and Problem Description

Link analysis has been quite an effective approach in the web domain. PageRank [12] and HITS [21] are landmark examples of exploiting the links that exist between web pages. Apart from the web domain, links have also been exploited in scientific research papers.
Interestingly, the significance of links has also encouraged IR researchers to generate links in text documents that originally did not have them. However, such experiments have not been done with legal judgments, wherein links are present in the form of case-citations. In this thesis, taking a cue from the web environment, we explored the effectiveness of links in finding similar legal judgments and then explored a possible method to generate links automatically.

Under the common law system, finding similar judgments is a non-trivial task for legal practitioners. After getting a task, a lawyer typically prepares his arguments as follows. He browses a legal database to find similar judgments using his knowledge and past experience. Once he finds a judgment, say judgment J, which satisfies his requirements adequately, he starts looking for more judgments similar to J in order to analyze the applied legal concept in full detail. Normally, a lawyer's background knowledge is not sufficient to decide the case right away; hence he has to consult a number of legal sources. The research task consists of bridging the gap between the problem at hand and the legal sources in order to construct legal arguments [10]. Generally, a legal database is huge, and browsing it manually consumes a significant amount of time and effort. In this thesis we investigated this problem, aiming to reduce the manual effort and time spent by a legal practitioner, and proposed an approach to find similar legal judgments under the common law system.

Before a reference judgment is found, the lawyer's requirement is abstract. Hence, the only way to proceed is to browse the legal database manually. After finding one judgment that adequately satisfies the lawyer's requirement, however, we have a reference point from which to find similar judgments. Even though there are existing methods to find similar text documents, they cannot be used directly with legal judgments because of the domain-specific nature of legal judgments. Apart from this, conventional methods like the vector space model suffer from problems like polysemy and synonymy.

1.4 Overview of Proposed Approach

In this thesis, we address the problem of finding similar legal judgments by exploiting domain-specific attributes of legal judgments. We identified various features of a judgment and explored their effectiveness in finding similar judgments. We found that a link-based similarity measure is more reliable for finding similar legal judgments. Investigating further, we proposed the notion of a paragraph-link to link two judgments and then used such links to find the list of judgments similar to a given judgment.

Relative effectiveness of features: We explored the format of legal judgments and investigated various attributes. We observed that, apart from text data, a judgment also contains case-citations. The purpose of a case-citation is to strengthen the reasoning behind the legal concept applied in the delivered judgment. We came up with four similarity measures, which capitalize on four different attributes of judgments, namely all-terms, legal-terms, in-citations and out-citations. All these features are explained in detail in the later part of this thesis. Our observations showed that legal-terms are more effective than all-terms, and that out-citations are more effective than in-citations. We found that the bibliographic coupling method is quite effective compared to the other similarity measures investigated.

Enrichment of judgments using paragraph-links: It was observed that, although case-citations are effective in finding similar legal judgments, they are not sufficient to retrieve all the judgments similar to a given judgment. One of the reasons is the significant variation in the number of case-citations from one judgment to another. Therefore, sole dependence on the existing links (case-citations) of judgments fails to produce the desired outcome and provides an opportunity for further research. After further investigation, we propose to enrich each judgment by inducing paragraph-links, which can then be leveraged to find similar judgments. It was observed that a judgment is structured into paragraphs such that each paragraph deals with a separate legal concept; hence, a case-citation does not refer to the whole cited judgment but to a specific legal concept expressed in one paragraph of the cited judgment. Our proposed approach applies text-based similarity to identify judgment pairs that have similar paragraphs (we say such pairs have paragraph-links) and then applies the bibliographic coupling method to find similar judgments.

1.5 Thesis contribution

The major contributions of this thesis are enumerated below:

- Analyzed the problems faced in the legal domain under the common law system and formulated the problem of identifying similar legal judgments.
- Analyzed the format of judgments and identified several features.
- Identified features to find similar judgments and compared their effectiveness.
- Proposed the notion of paragraph-link to improve the performance of finding similar legal judgments.

1.6 Organization of thesis

In this chapter, we have introduced the contributions of this thesis. The rest of the thesis is organized as follows:

- Chapter 2 - Provides an overview of IR approaches, along with related work on link-based approaches in IR and IR approaches in the legal domain.
- Chapter 3 - Identification and analysis of four different features which are used to identify similar judgments.
- Chapter 4 - Proposes the notion of paragraph-links to identify similar judgments.
- Chapter 5 - Conclusion and future work.

Chapter 2

Related Work

In this chapter we review selected publications related to the topics covered in this thesis. The first section outlines work on various IR approaches to finding similar text documents. The second section outlines various IR approaches applied in the web domain, and the third section discusses various IR methods applied in the legal domain. A summary of the chapter is provided in the last section.

2.1 IR approaches: Overview

A number of retrieval models have been devised to abstract the processes underlying information retrieval systems. Models in which formal queries specify precise criteria for retrieved documents are said to be exact-match models, whereas best-match models return a ranked list of documents for a query that describes suitable documents. Exact-match models, such as the Boolean model in which queries are formulated as logic expressions, are more popular in legal and scientific search systems than in Web search engines. The three most prominent best-match models are the vector space model, the probabilistic model and the language model.

Vector space model: In this model, queries and documents are modeled as vectors in a high-dimensional Euclidean space where each axis corresponds to a distinct term and the coordinate along the axis is a weight determined by statistical occurrence data for the term. Once queries and documents are encoded as vectors, similarities between them can be computed using vector arithmetic. Often the inner product of the vectors is used for this purpose. Term weighting schemes are key to performance in these models, since terms carry varying levels of significance depending on context. Typically the weight of a term in a document or a query is determined by a combination of its local profile within the document or query, its global profile within a wider context (the document collection as a whole) and a normalization factor compensating for discrepancies in the length of documents.
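
The three ingredients of term weighting named above (local profile, global profile, length normalization) can be illustrated with a minimal sketch. It assumes raw term frequency as the local weight, a smoothed inverse document frequency as the global weight, and cosine normalization; these are one common choice and are not necessarily the scheme used later in this thesis. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs)
    # Global profile: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # local profile: raw term frequency
        # Assumed weighting: tf * smoothed idf (one common variant).
        vectors.append({
            t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t in tf
        })
    return vectors


def cosine(u, v):
    """Inner product of two sparse vectors, normalized by their lengths."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, already-tokenized documents (illustrative only).
docs = [
    ["court", "appeal", "dismissed"],
    ["court", "appeal", "allowed"],
    ["tax", "assessment", "notice"],
]
vectors = tf_idf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])  # shares "court" and "appeal"
sim_02 = cosine(vectors[0], vectors[2])  # no shared terms
```

Dividing by the vector lengths is what compensates for differing document lengths; without it, longer documents would tend to score higher simply because they contain more terms.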

Probabilistic model: The probabilistic model takes a more conceptually intuitive approach. Instead of being based on relatively abstract vector arithmetic, relevance rankings are based on a probabilistic measure of relevance classifications given a user's query and a document. The measure used is the likelihood ratio for relevant classifications of the query and document, formulated as $P(R \mid Q, D) / P(NR \mid Q, D)$ (that is, the probability of a relevant classification by searchers divided by the probability of a non-relevant classification by searchers). Under the assumption that term occurrences are independent, a little manipulation of this measure involving Bayes' rule reveals that a proportional approximation of it can be derived from estimates of the probability that the document's terms feature in relevant classifications (formulated as $P(t \mid R)$) and in non-relevant classifications (formulated as $P(t \mid NR)$).

Language model: Similar to the probabilistic model is the language model, in which relevance rankings for documents are based on the probability that a user had a particular document in mind when generating the query; this is formulated as $P(D \mid Q)$. Under the assumption that query terms occur independently, and after some manipulation involving Bayes' rule, it follows that the measure can be approximated using estimates of the probability that the query terms feature in the document (formulated as $P(t \mid D)$) along with a prior probability for the document (formulated as $P(D)$). Typically, maximum likelihood estimates taken from document term frequency data are used to estimate the query-term probabilities, whilst document lengths are used to estimate the document prior probabilities.

2.2 Link-based approaches in IR

In many respects, content on the World Wide Web is quite similar to the content in off-line document collections. However, although the content is similar, the treatment, storage and processing of on-line content differ significantly from the way off-line content is dealt with. Hence, the approaches taken in traditional information retrieval are highly applicable to the web, but with subtle modifications.
Since the web is a collection of a huge number of pages (and hence data), the role of links in the web is quite significant.

2.2.1 Exploiting links in web-domain:

Web pages are structured documents with various components that hint at the topic discussed in the page. One of the important features of web pages, which plays a significant role in web IR, is the links present in them. In web terminology such a link is known as a hyperlink. The general observation is as follows: the presence in a given page, P1, of a URL pointing to a second page, P2, implies some association between the two pages. But there are no general uniform rules, let alone enforcement mechanisms, for ensuring that there is some reasonable connection, e.g., by author or topic, between any two pages linked by a URL. Independently of one another, both Brin [12] and Kleinberg [21] have exploited the hyperlink structure of the web. Hyperlinks are a particularly valuable source of information. Because hyperlink authors are often not the authors of the documents their hyperlinks target, potentially impartial judgments on documents can be discerned. In web IR, the importance of hyperlinks is twofold. Firstly, the hypertext associated with hyperlinks enables the representation of target documents to be enriched; secondly, hyperlinks allow linkage between otherwise unconnected documents.

Yet another application of web link analysis is in web clustering and categorization algorithms, which group similar pages together. [14] demonstrates that links and their surrounding anchor text can be used to develop an automatic resource compiler with performance comparable to the manual Web directory Yahoo!. Clustering of web pages also features in web meta search engines such as Vivisimo, which further categorize search results for the convenience of searchers.

An interesting application of web links is introduced by IBM Research [7]. They apply temporal link data to identify significant trends and events in matters pertaining to a query. A temporal link is introduced as a dated in-link, followed by a clear example of how profiling the distribution of dated in-links (by date) can be revealing. The study concludes by demonstrating the utility of dated in-links to web IR: a HITS algorithm in which links are weighted according to their temporal relevance is shown to produce more contemporary results than standard HITS.

2.2.2 Exploiting links in text-documents:

A plain text document is composed of only text data. However, certain text documents contain not only text data but also references to other documents (which can be considered links). Scientific research papers are one such example: a paper refers to related work done in the past in its field. Legal judgments are another example exhibiting such traits: they too are composed of plain text and case-citations.
Efforts have been made to investigate the behavior of links in text documents that already have links, as well as to generate links automatically among text documents. For the sake of better understanding, we divide the literature into these two categories.

Links available in scientific research papers have been explored to find similar documents. Two well-known methods in this regard are [20] and [18]. These methods compare a pair of documents by comparing the links present in those documents. The difference between the two approaches is that [20] compares the out-links of the documents, whereas [18] compares their in-links.

In legal judgments, links are present in the form of case-citations, and efforts have been made to exploit case-citations to extract various kinds of information. In [44], a new tool, namely a semantics-based citation network, is proposed. Using this tool, users can easily navigate the citation network and study how citations are interrelated and how legal issues have evolved in the past. Various forms of natural language processing (NLP) technology are used to build the metadata behind the prototype. The main idea behind this work is to link semantically similar works by exploiting the case-citations readily available in each judgment.
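
The out-link and in-link comparisons described above correspond to what this thesis elsewhere calls bibliographic coupling (shared out-citations) and co-citation (shared in-citations). A minimal sketch of both follows, assuming the citation data is held as a mapping from each document to the set of documents it cites. It returns raw counts; any normalization used in the experiments is not shown here, and the names `out_citations`, `bibliographic_coupling` and `co_citation` as well as the toy judgment identifiers are hypothetical.

```python
def bibliographic_coupling(out_citations, a, b):
    """Count the documents cited by both a and b (shared out-links)."""
    return len(out_citations.get(a, set()) & out_citations.get(b, set()))


def co_citation(out_citations, a, b):
    """Count the documents that cite both a and b (shared in-links)."""
    return sum(
        1 for cited in out_citations.values() if a in cited and b in cited
    )


# Toy citation graph: each key cites the judgments in its set.
out_citations = {
    "J1": {"J3", "J4"},
    "J2": {"J3", "J4", "J5"},
    "J6": {"J1", "J2"},
    "J7": {"J1", "J2"},
}

coupling_j1_j2 = bibliographic_coupling(out_citations, "J1", "J2")  # J3 and J4 are shared
cocited_j1_j2 = co_citation(out_citations, "J1", "J2")  # both J6 and J7 cite the pair
```

One practical difference between the two measures is that out-citations are fixed once a document is written, whereas in-citations accumulate over time, so a recent document may have few or none.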


2.2.3 Automatic generation of links in text-documents:

In general, the topic discussed in a long text document can be divided into various sub-topics spread across the document. Hence, it is quite useful for a reader if each sub-topic is linked directly to the relevant topics (or sub-topics) in a different document or in the same document [32]. The ease of use of hyperlinks in the web environment has encouraged IR researchers to generate hyperlinks automatically within a long text document. Such links help a reader grasp the underlying context more easily, since they give the reader a context in which the information needs to be seen.

A significant amount of effort has been made in generating links among text documents that do not have them. One of the earliest efforts in this direction was made by [32], wherein the notion of hypertext was used for links generated by text-based comparison across whole documents. In this work, a link was placed between related pieces of text in different documents. Using these links, text relation maps were constructed and an improved system was built to access text on related themes in different documents.

Generating links that point to units of smaller granularity than a document, which can be considered a task of passage or focused retrieval, has also been addressed recently. In this task, the system locates the relevant information inside the document instead of only providing a link to the document [22]. According to [22], current approaches for generating links can be divided into three groups:

- Link-based approaches discover new links by exploiting an existing link graph.
- Semi-structured approaches try to discover new links using semi-structured information, such as anchor texts or document titles.
- Purely content-based approaches use only plain text as input. They typically discover related resources by calculating semantic similarity based on document vectors.

Unlike the web environment, text-document data is static, and hence the links applied to it are also static in nature. One popular approach for generating static links is to link semantically related texts: the similarity between all pairs of texts is computed, and links are inserted between those that are most similar. There are many ways of measuring similarity and of determining whether a link should be put in place. Salton described building a set of cross-references for an encyclopedia [34], and links are created using both similarity and spreading activation [24]. Green introduced the use of lexical chains, exploiting the semantic relatedness of individual words, to determine when links should be used [16].

2.3 IR approaches in legal domain

Legal judgments have their own style of diction and hence need more sophisticated methods. Being a complex domain, law poses many challenging issues that must be addressed to arrive at an IR approach that gives the desired results. Efforts have been made to handle the legal domain using ontologies, machine learning techniques, case-based reasoning, etc. It has been observed that technological innovations, advanced retrieval models, structured knowledge representation schemes, and hypertext form the basis of modern legal IR systems [38]. To provide legal practitioners with a truly useful tool, these technologies have to be integrated in the best possible way.

Ontologies have been used to understand the query entered by a user. In [37], an ontology based on a legal domain framework is applied to understand the query terms, and a list of relevant judgments is then obtained in response to the query. The documents considered in that study are civil court judgments from three sub-domains, viz. rent control, income tax and sales tax. The method shows that an ontology ensures efficient retrieval by enabling inferences based on domain knowledge, which is gathered during construction of the knowledge base, and thereby overcomes issues like polysemy and synonymy.

Another major issue inherent in the legal domain is the large size of judgments. In [36], this issue is addressed by summarizing the judgments. The approach constructs suitable feature sets and uses CRF for the segmentation and presentation tasks involved in extracting key sentences from legal judgments. It exploits the format of a judgment and identifies the rhetorical role of each sentence, so that the final summary contains the significant sentences.

Efforts have also been made towards automatic text representation, classification and labeling in European law.
One such effort is [39]. In this approach, the Self-Organizing Map (SOM), a popular unsupervised neural network for clustering documents, is used to detect topical similarity and to structure a document collection accordingly. The Self-Organizing Map is well suited to this problem because it takes into account co-occurrences in a very high-dimensional feature space. Lawyers are highly trained text analyzers and expect a high degree of quality. Therefore, although the presented tool may be very helpful for a legal researcher, further improvements in labeling quality are necessary. As segmentation has been quite successful for improved indexing, using the available XML structure will also improve quality. Numbering the paragraphs of court decisions and the paragraphs or articles of statutes would be especially helpful.

Work has also been done in the legal domain to automate the understanding of legal judgments. A legal practitioner needs to find as many suitable judgments as possible to serve as precedents when preparing arguments. Since under the common law system lawyers argue an undecided case on the basis of decided cases, which are legal precedents, a lawyer needs to analyze various attributes of a judgment to decide whether it can be used as a precedent. One such effort is [43]. To carry out case-based reasoning, the essential first step is to determine which factors hold in reported decisions, which is different from establishing the facts of the case in the first place; doing this manually is cumbersome. In this work, a new semi-automated legal text analysis tool is developed that incorporates lexical semantics and expert legal knowledge to identify legal case factors. Here, case factor analysis is defined as the analysis of which factors hold in a precedent case.

2.4 Summary

In this chapter, we described work that exploits links in various types of text documents (viz. web pages, scientific research papers, etc.). Mainly, it was shown that links are significant attributes and can be exploited to find related documents. Efforts have also been made to generate links when no ready-made links exist between documents. In the next chapter, we describe our efforts to observe the relative significance of various attributes of legal judgments for finding similar judgments.

Chapter 3

Similarity Analysis of Legal Judgments

One of the challenging tasks that any legal practitioner faces under the common law system¹ is to find similar legal judgments. Under the common law system, law is not a static concept but keeps evolving. Newer legal concepts are expressed in the latest legal judgments. Hence, it is crucial for a legal practitioner to find similar legal judgments in order to stay up to date on the latest legal concepts under the given facts, so that he (or she) can prepare his (or her) arguments accordingly. In this thesis, we investigate this issue and analyze the challenges encountered in finding similar legal judgments. Throughout the thesis, a judgment denotes a legal judgment under the common law system.

In this chapter, we first discuss the background of the problem domain. We then give an overview of legal judgments, discussing their structure and the details of their various features. Next, we state the problem of finding similar judgments. Finally, we analyze the methods used to solve the problem and present our findings in the conclusion section.

3.1 Background

Legal sources are typically written documents that form the basis of legal reasoning. They can be divided into three categories: legislation, judicial decisions, and literature. Which of these sources is given higher priority varies across legal systems. The two main legal systems today are the civil law system and the common law system [5].

Types of legal system

Civil law: In civil law, as for instance in Continental Europe, legislation is the primary legal source. The judgments of courts are based on the provisions of legislation, from which solutions for individual cases are derived.

¹ One of the legal systems, which gives importance to previously delivered judgments.

Common law: In common law, the highest priority is given to decisions by courts. When there is no authoritative law that can be applied to a certain case, judges have the authority to create a precedent. The body of precedents is referred to as common law or case law and is binding on future decisions. Countries like India, the US and the UK follow the common law system.

The emphasis on different legal sources in civil law and common law influences the type and amount of material required for the research task. In common law cultures, the availability and accessibility of as much case law as possible is of great importance. In civil law, on the other hand, it is generally sufficient to have access to the applicable legislation and selected landmark cases.

Existing similarity measures

Since we are analyzing the problem of finding similar judgments, in this section we discuss three well-known similarity measures. These three measures are domain independent and are accepted approaches for comparing two documents.

Cosine similarity: One common and popular model for document representation is to represent each textual document as a set of terms. Most commonly, the terms are words extracted automatically from the documents themselves, although they may also be phrases, n-grams, or manually assigned descriptor terms. (Of course, any such term-based representation sacrifices information about the order in which the terms occur in the document, syntactic information, etc.) Often, if the terms are words extracted from the documents, stop-words (i.e., noise words with little discriminatory power) are eliminated, and the remaining words are stemmed so that only one root form (the stem common to all the forms) is used. Applying this process to each document in a given collection generates a set of terms that represents that document. The union of all these sets is the set of terms that represents the entire collection.

This set of terms defines a space in which each distinct term represents one dimension. Since we represent each document as a set of terms, we can view this space as a document space. We can then assign a numeric weight to each term in a given document, representing an estimate of the usefulness of that term for the document. It should be stressed that a given term may receive a different weight in each document in which it occurs; a term may be a better descriptor of one document than of another. A term that is not in a given document receives a weight of zero for that document. The weights assigned to the terms in a given document $d_j$ can then be interpreted as the coordinates of $d_j$ in the document space. Using this vector representation, we can effectively calculate document-document and document-query similarity. Cosine similarity is the most popular function for calculating the similarity between two vectors. For two document vectors $\vec{d_1}$ and $\vec{d_2}$, this measure is defined as:

\[ \mathrm{Sim}(\vec{d_1}, \vec{d_2}) = \mathrm{Cosine}(\vec{d_1}, \vec{d_2}) = \frac{\vec{d_1} \cdot \vec{d_2}}{|\vec{d_1}|\,|\vec{d_2}|} \qquad (3.1) \]

where $\cdot$ indicates the vector dot product and $|\vec{d_1}|$ is the length of document vector $\vec{d_1}$.

Bibliographic coupling: In the literature, methods have been proposed in which the references of two documents, rather than their contents, are compared to check whether the documents are similar. One such method is bibliographic coupling [20], a well-known link-based similarity measure for scientific research papers. Measuring bibliographic coupling is useful in a wide variety of fields, since it helps researchers find related research done in the past, though its exact interpretation may vary from field to field because different fields have different citation practices. According to the bibliographic coupling method, two documents are similar when they cite at least a threshold number of common documents:

\[ \mathrm{bibcoupling}(D_1, D_2) = |OC_{D_1} \cap OC_{D_2}| \qquad (3.2) \]

where $OC_{D_1}$ denotes the out-citations of document $D_1$ and $OC_{D_2}$ denotes the out-citations of document $D_2$. Hence, two documents are similar if $\mathrm{bibcoupling}(D_1, D_2) \geq \delta$, where $\delta$ is the threshold number of common out-citations needed to declare two documents similar. For example, in Figure 3.1, documents A and B are similar because they share two common documents, E and F (assuming threshold value $\delta = 2$).

Co-citation: Apart from bibliographic coupling, another link-based similarity measure found in the literature is co-citation, which was introduced as citation study methods progressed. According to [18], co-citation analysis is a better indicator of subject similarity. According to the co-citation method, two documents are similar when they are cited together at least a threshold number of times by other documents:

\[ \mathrm{cocitation}(D_1, D_2) = |IC_{D_1} \cap IC_{D_2}| \qquad (3.3) \]

where $IC_{D_1}$ denotes the in-citations of document $D_1$ and $IC_{D_2}$ denotes the in-citations of document $D_2$. Hence, two documents are similar if $\mathrm{cocitation}(D_1, D_2) \geq \delta$, where $\delta$ is the threshold value.
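The three measures defined in equations 3.1 to 3.3 can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis: the function names, the dictionary-based sparse vectors and the citation map `OUT_CITES` are assumptions made for illustration. Only the edges from A and B to E and F come from the description of Figure 3.1; the edge from C to G is hypothetical.

```python
import math

def cosine(u, v):
    """Equation 3.1 for sparse vectors given as dicts (term -> weight)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)

def bib_coupling(out_cites, d1, d2):
    """Equation 3.2: number of documents cited by both d1 and d2."""
    return len(out_cites.get(d1, set()) & out_cites.get(d2, set()))

def invert(out_cites):
    """Turn (document -> documents it cites) into (document -> documents citing it)."""
    in_cites = {}
    for citing, cited_docs in out_cites.items():
        for cited in cited_docs:
            in_cites.setdefault(cited, set()).add(citing)
    return in_cites

def co_citation(out_cites, d1, d2):
    """Equation 3.3: number of documents that cite both d1 and d2."""
    in_cites = invert(out_cites)
    return len(in_cites.get(d1, set()) & in_cites.get(d2, set()))

def is_similar(score, delta):
    """Threshold test shared by both link-based measures."""
    return score >= delta

# A and B citing E and F follows the text; C citing G is a hypothetical extra edge.
OUT_CITES = {"A": {"E", "F"}, "B": {"E", "F"}, "C": {"G"}}
DELTA = 2  # threshold value assumed in the Figure 3.1 examples
```

With this map, `bib_coupling(OUT_CITES, "A", "B")` and `co_citation(OUT_CITES, "E", "F")` both reach the threshold, which matches the Figure 3.1 examples in the text. Deriving the in-citation map by inverting the out-citation map means only out-citations need to be extracted from the documents.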
For example, in Figure 3.1, documents E and F are similar because they are cited together twice, by two documents, A and B (assuming threshold value δ = 2).

3.2 Legal Judgment: An overview

A lawsuit or case begins when a plaintiff files a document called a complaint with a court, informing the court of the wrong that the plaintiff has allegedly suffered because of the defendant, and requesting a remedy. Once a case is filed and accepted by the court, then, depending on the provisions of the constitution of the country, the case is argued between two lawyers, each representing either the plaintiff or the defendant. Once the judges have heard all the testimony from both sides, a judgment is delivered. Thus, by definition, a legal judgment is a closed old case. Typically, a legal judgment contains the following attributes:

Figure 3.1 An example of links between judgments (nodes A to G).

- Name of judgment: The name of a judgment is derived from the names of the appellant and the defendant. For example, in Figure 3.2, the name of the judgment is Khandesh spg & wvg mills co. ltd. V. The Rashtriya Girni Kamgar Sangh Jalagaon.
- Names of judges: The names of the judges who delivered the judgment after hearing the case are mentioned in this section. For example, the judgment presented in Figure 3.2 was delivered by a three-judge bench of the Supreme Court of India, and the judges are K Subbarao, P.B Gajendragadkar and K.C. Das Gupta.
- Citation: It contains the unique IDs given to the judgment, by which it will be referred to by other judgments. The format of these IDs varies according to the law reporter. In general, the format contains the title of the reports, volume number, page number and year of publication. For example, in (1988) 2 SCR 809, 1988 is the year of publication, 2 is the volume of the reporter, SCR is the name of the reporter (abbreviation of Supreme Court Reporter) and 809 is the page number of the judgment within the volume. In Figure 3.2, the judgment contains four different IDs. These IDs include 1960 AIR 571 and 1960 SCR(2) 841.
- Act: It categorizes the issue discussed in the judgment from a legal point of view. Since a judgment resolves a dispute between two or more parties, the Act specifies the legal specifications of the matter in dispute. For example, in Figure 3.2: Industrial dispute-bonus-full bench formula-rehabilitation- reserves used as working capital-mode of proof.
- Headnote: The headnote is a summary of the text of a court decision that aids readers. A legal judgment is generally very long, which makes it difficult to read in full. To make a judgment easier to analyze, a summary of it is prepared, which is known as the headnote.
- Case citation: Case citations are embedded in the headnote text of the judgment. They represent older legal judgments that are referred to in pronouncing the current judgment. In Figure 3.2, [1960] 2 S.C.R 32 and [1960] 1 S.C.R 1 are shown.

Figure 3.2 A typical judgment from Supreme court of India. Discontinuous lines show missing texts.

More about case citation

Under common law, one of the prominent features of a legal judgment is its references to older judgments. These references are made to strengthen the presented arguments and are known as case citations. Since a case citation links two legal judgments, it resembles both the URLs of web pages and the references listed at the end of scientific research papers.

Difference between citation and case citation: The citation described in section 3.2 is the unique ID by which the current judgment will be referred to by other judgments, whereas case citations are the older judgments referred to by the current judgment. Generally, the format of a judgment is such that the citation is mentioned before the headnote, while case citations are embedded within the text of the headnote.

Significance of case citations: By the nature of law itself, case citations go through strict scrutiny by legal experts. For instance, if during the argument of a case a lawyer refers to an older judgment that is not relevant to the issue under consideration, the opposing lawyer draws the judge's attention to it, and it is then verified by the judge, who is also a legal expert. Case citations contribute to the argument of a judgment by leveraging the legal concepts applied in the cited judgments. Thus, case citations carry significant human endorsement of the topical similarity of the linked judgments. This property separates legal judgments from web pages, where hyperlinks are added for a wide variety of reasons [21], and from scientific literature, where references may be perfunctory or made out of politeness, policy or piety [19].

In-citation and Out-citation

After examining the properties of case citations, we define two notions.

Out-citation (OC): For a given judgment J, we define out-citations as those case citations that are mentioned in judgment J and refer to other judgments. In short, the out-citations of judgment J are all the case citations mentioned in the headnote of judgment J. The judgment shown in Figure 3.2 has the out-citations [1959] SCR 925, [1960] 2 SCR 32 and [1960] 1 SCR 1.
In-citation (IC): For a given judgment J, we define in-citations as those case citations that refer to judgment J. The judgment shown in Figure 3.2 is an in-citation for all the judgments to which it refers. Hence, the judgment shown (i.e. 1960 AIR 571 or 1960 SCR (2) 841) is an in-citation for [1959] SCR 925, [1960] 2 SCR 32 and [1960] 1 SCR 1.

3.3 Finding Similar judgments: Problem Statement

As explained in the previous sections, under the common law system law is dynamic and evolves with each judgment, so it is critical for a lawyer to prepare his arguments by analyzing the latest legal interpretation in the light of the facts. Typically, after taking up a task, a lawyer prepares his (or her) arguments by following these steps. The lawyer browses a legal database to find similar judgments using his knowledge and past experience. Once he (or she) finds a judgment, say judgment J, that satisfies his (or her) requirements adequately, he (or she) starts looking for more judgments similar to judgment J in order to analyze the applied legal concept in full detail. Normally, a lawyer's background knowledge is not sufficient to decide the case right away; hence a number of legal sources have to be consulted. The research task consists of bridging the gap between the problem at hand and the legal sources in order to construct legal arguments [10]. Generally, a legal database is huge, and browsing it manually consumes a significant amount of time and effort.

A variety of approaches have been proposed to find similar text documents. Traditional approaches calculate a similarity score from document contents, using, for example, the Vector Space Model [35] or n-gram measures [15]. Such content-based methods compare the text of one judgment with that of another, so that a higher overlap of text between two judgments signifies higher similarity. Apart from the well-known drawbacks of traditional content-based methods, such as polysemy and synonymy, another major drawback is their inability to distinguish text that is more significant than other text in a given document. For example, when comparing two legal judgments, traditional approaches treat legal and non-legal terms equally, even though intuition says that legal terms are more significant than non-legal terms. Owing to the verbose nature of legal judgments, the enormous amount of non-legal text dominates the legal terms. Because of this naive treatment, traditional content-based methods do not retrieve similar legal judgments well.

In this thesis, we investigate this problem in order to reduce the manual effort and time spent by a legal practitioner, and we propose an approach to find similar legal judgments under the common law system. Formally, our problem can be stated as follows: A judgment J is given as an input.
The problem of finding similar judgments is to find a set of judgments that are similar to the given judgment J.

Similarity is a subjective phenomenon. It varies depending on the need as well as the context one is talking about. In general, two similar objects look as if they are one and the same with respect to certain properties. The properties on which two objects are declared similar depend on the type of object being compared. Since our dataset consists of domain-specific text documents, the criteria for deciding whether two judgments are similar are set by legal practitioners, as described in the experimental setup.

3.4 Features employed to find similar judgments

We analyzed the format and various attributes of judgments to understand the role they play in the context of the whole judgment. After this analysis, we identified three features of judgments that are employed to compare two judgments:

- All-terms: The all-terms feature of a judgment is defined as the content under the headnote of the judgment. Selecting all-terms as the feature vector of a judgment can be considered a naive approach. The intuition behind the choice is that, since the headnote is the summary of the judgment, the contents of the headnote are representative of the concepts discussed in the judgment. Headnote contents are extracted and filtered before they are used for comparison; the details of these steps are given in the next section. Cosine similarity using all-terms is a conventional method for comparing a pair of text documents to find whether they are similar.
- Legal-terms: The legal-terms feature of a judgment is defined as all the text that is available under the headnote and also appears in a legal dictionary. A judgment is generally large because it discusses the underlying dispute in full detail. Besides making a judgment larger, this also makes extracting and comparing two judgments computationally expensive. On the other hand, since a judgment is a domain-specific document, the vocabulary used in the legal domain is controlled, and hence even a detailed explanation consists of a comparatively small number of legal terms. Hence, it is more convenient to use legal-terms to compare two judgments.
- Case-citations: As mentioned in section 3.2, case-citations are embedded in the headnote and are links between the current judgment and older judgments. They are an important feature because they carry human endorsement of the relatedness of topics between two judgments. Additionally, since work has been done in the literature on finding similarity using existing links, it is interesting to see how case-citations behave in the legal domain.

3.5 Approaches to find similar judgments

After analyzing the features of judgments described in section 3.4 and the similarity measures discussed in section 3.1, we formulated four different similarity measures and used them to compare two judgments. These four similarity measures are described below.
3.5.1 Cosine similarity using all-terms

All-terms are extracted from judgments using the following steps:

- Judgments are named with the year of the judgment and a serial number (e.g , etc.).
- The text of the judgments is converted to lowercase.
- Stop words [2] are removed.
- Non-alphanumeric characters are removed.
- Stemming is done using Porter's algorithm [3].
- The tf-idf value of each term is computed.
- Using equation 3.1, the cosine similarity of each judgment pair is computed.

3.5.2 Cosine similarity using legal-terms

The size of all-terms, as well as the domain-specific nature of judgments, encourages us to see how the conventional comparison method behaves when the content is filtered and only domain-specific terms are employed as the feature vector of a judgment. Legal-terms are extracted from judgments using the following steps:

- Judgments are named with the year of the judgment and a serial number (e.g , etc.).
- The text of the judgments is converted to lowercase.
- Regular expressions are written for the legal terms available at [1], and using those regular expressions the legal terms are extracted from all judgments.
- Each legal term is weighted according to the formula mentioned in equation 1.
- Using equation 3.1, the cosine similarity of each judgment pair is computed.

3.5.3 Bibliographic coupling similarity using out-citations

Unlike the two approaches above, this approach is link-based. Bibliographic coupling is a known method for comparing two text documents by employing their links; hence we applied it in our problem domain. The following steps are used to extract out-citations:

- Judgments are named with the year of the judgment and a serial number (e.g , etc.).
- Regular expressions are written for the case-citation formats of three law reporters, i.e. AIR, SCR and SCC.
- The headnote of each judgment is scanned to extract all the out-citations present in the judgment.
- A judgment can be cited by any of the names given to it by the various law reporters, so before comparing two cited judgments we need to rename all judgments into a single format. Hence, all possible names of a judgment are extracted from the citation section of the judgment (discussed in section 3.2).
- The citation names of each judgment are replaced with the corresponding name given by us. E.g: SCC 338 was replaced by in our corpus.
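The out-citation extraction steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the regular expressions actually written for the experiments: the pattern `CITATION_RE`, the tuple used as a normalized citation key, and the corpus names `J1` and `J2` are all hypothetical. The pattern is modelled on the citation forms quoted in this chapter, such as [1960] 2 S.C.R 32, 1960 AIR 571 and 1960 SCR(2) 841.

```python
import re

# Hypothetical pattern covering the citation forms quoted in this chapter.
CITATION_RE = re.compile(
    r"[\[(]?(?<!\d)(?P<year>\d{4})[\])]?\s*"            # year, optionally bracketed
    r"(?:(?P<vol>\d+)\s*)?"                             # volume before the reporter
    r"(?P<rep>A\.?I\.?R\.?|S\.?C\.?R\.?|S\.?C\.?C\.?)"  # AIR, SCR or SCC
    r"\s*(?:\((?P<vol2>\d+)\)\s*)?"                     # volume after it, e.g. SCR(2)
    r"(?P<page>\d+)"                                    # page number
)

def normalise(match):
    """Reduce one matched citation to a (year, reporter, volume, page) key."""
    reporter = match.group("rep").replace(".", "")
    volume = match.group("vol") or match.group("vol2") or ""
    return (match.group("year"), reporter, volume, match.group("page"))

def extract_citations(text):
    """Return the normalized keys of all citations found in the text."""
    return [normalise(m) for m in CITATION_RE.finditer(text)]

def build_alias_map(citation_sections):
    """Map every reporter name of a judgment to its single corpus name.

    citation_sections: corpus name -> text of that judgment's citation section.
    """
    alias = {}
    for corpus_name, text in citation_sections.items():
        for key in extract_citations(text):
            alias[key] = corpus_name
    return alias

def out_citations(headnote, alias):
    """Corpus names of the judgments cited in a headnote (unknown ones are skipped)."""
    return {alias[key] for key in extract_citations(headnote) if key in alias}

# Hypothetical corpus: J1 and J2 are made-up names, as are J2's reporter IDs.
CITATION_SECTIONS = {
    "J1": "1960 AIR 571, 1960 SCR(2) 841",
    "J2": "1960 AIR 100, 1960 SCR (2) 32",
}
ALIAS = build_alias_map(CITATION_SECTIONS)
```

Normalizing each citation to a key before the lookup is what lets [1960] 2 S.C.R 32 in a headnote and 1960 SCR (2) 32 in a citation section resolve to the same judgment. The resulting sets of corpus names can then be intersected as in equation 3.2.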

3.5.4 Co-citation similarity using in-citations

Apart from bibliographic coupling, another well-accepted method for comparing two text documents using their links is the co-citation technique. This measure is employed to see how it behaves with domain-specific text documents.

- Out-citations are extracted following the steps described in the previous section for bibliographic coupling.
- For each judgment acting as an out-citation, the corresponding citing judgments are collected from the judgment and out-citation pairs, and each pair is reversed to produce a judgment and in-citation pair.

3.6 Experiments

In this section, we describe our dataset and experimental setup and analyze the results obtained.

Description of dataset

Since India follows the common law system, the experiments described in this thesis are conducted on judgments delivered by the Supreme Court of India. Our dataset consists of judgments delivered by the Supreme Court of India, downloaded from [4] in September. The number of case citations in the judgments varies from 1 to 97. For our experiments, we chose only those judgments that have a minimum of 3 and a maximum of 12 case citations. The statistics of the dataset chosen for the experiments are given in Table 3.1.

Table 3.1 Statistics of judgments used for experiment

Statistic | Value
Total no. of judgments in the dataset | 2,430
Minimum size of a judgment | KB
Maximum size of a judgment | 546 KB
Average size of a judgment | KB
Minimum no. of token in a judgment | 185
Maximum no. of token in a judgment | 33,628
Average no. of token in a judgment |
Minimum no. of case citations in a judgment | 3
Maximum no. of case citations in a judgment | 12
Average no. of case citations in a judgment |

Experimental setup

We conducted the experiment in two stages. In the first stage, we investigated the relative effectiveness of the similarity measures. We collected four categories of samples, each consisting of six judgment pairs, such that each category had a high similarity score under only one similarity measure while the scores under the remaining measures were low. This was done to check whether there is a direct correlation between the expert similarity score and the similarity measure that dominates in the sample pair. The sample pairs of judgments were given to legal domain experts (legal practitioners) without disclosing the computed similarity values. Since only one similarity measure dominates in each pair, the similarity scores given by the domain experts indicate which similarity measure is the most crucial for deciding whether judgments are similar. The legal experts assigned similarity scores to judgment pairs based on the following aspects:

- Similarity in the issue discussed in the judgments.
- Similarity in the underlying facts of the judgments.
- Utility to a lawyer researching judgments similar to a given judgment.

In the second stage of the experiment, we verified the applicability of the bibliographic coupling score for finding similar legal judgments. We collected all the judgment pairs with bibliographic coupling score = 3 and gave them to the legal experts once again to obtain similarity scores. It was found that almost all of these judgment pairs satisfy the human notion of similarity. Table 3.6 shows the responses obtained in the second stage of the experiment.

Results

The computed similarity scores are compared with the average similarity values given by the legal domain experts after normalizing them to the range 0 to 1. The judgment pairs in Table 3.2 and Table 3.3 have high values of the all-terms cosine similarity score and the legal-terms cosine similarity score, respectively. Similarly, the judgment pairs in Table 3.4 and Table 3.5 have high values of the co-citation similarity score and the bibliographic coupling similarity score, respectively, with lower values of the cosine similarity scores.
Note that, since the minimum number of case citations in our dataset is 3, the maximum bibliographic coupling and co-citation score possible is 3, while the minimum is 0. The feedback obtained from the legal experts is analyzed below.

Similarity analysis result using all-terms: Table 3.2 shows that the average score given by the legal experts does not agree with the all-terms cosine score. This observation shows that judgments contain a large number of words that do not capture the essence of the judgment; hence, even though a judgment pair has a large amount of text in common, it may not satisfy the human notion of similarity. This observation can be explained as follows. A judgment explains each underlying issue in full detail in order to explain the applied legal concept as clearly as possible. At the same time, there is no specific style for writing such details, and hence every judge is independent in terms of what he explains and how. It is not rare for a judge to include famous anecdotes and popular moral stories in a judgment. This liberal writing style leaves ample room for text that is related to the underlying dispute only in an abstract way, which introduces undesired words into the judgments. Such words cannot be removed by stemming and stop-word removal, and hence they appear in the feature vectors of the judgments. Since they are not directly related to the context of the judgments, they act as noise, and hence comparison using feature vectors that include them does not seem to be effective.

Similarity analysis result using legal-terms: Table 3.3 shows that the average score given by the legal experts agrees with the legal-terms cosine score. This result is largely self-explanatory. Typically, a judge employs terms from the legal domain to express the related concepts, in order to improve communication and understanding. So it is natural that similar judgments will have legal terms in common. As a result, similarity computation based on legal terms gives fair results. Since judgments are domain-specific documents, it is not surprising that using only domain-specific terms to construct the feature vector for comparing two judgments turns out to be more effective than all-terms comparison. The reasoning is as follows: the legal domain has a restricted vocabulary, which encourages a judge to use the same terminology for similar issues. The controlled vocabulary restricts the liberty taken in explaining the underlying dispute, and hence comparing two judgments using only legal terms turns out to be quite an effective measure.

Similarity analysis result using co-citation: Table 3.4 shows the performance of the co-citation similarity score. The domain experts do not agree with the computed similarity values across the range of co-citation similarity scores. To investigate the co-citation property of judgments further, we took all 11 pairs of judgments whose co-citation score is ≥ 3. The similarity scores given by the domain experts are shown in Table 3.7.
We observed that judgments having a co-citation score of 3 are not similar in nature. This is an interesting result compared with the phenomenon in scientific literature [18]. This observation can be explained as follows. Unlike a scientific paper, a judgment does not deal with homogeneous concepts. A judgment is generally composed of several subtopics to cover the diversity of the dispute, and a case-citation is made for those subtopics. Hence, unlike scientific papers, where citing two papers together hints at the relatedness of the topics they deal with, this does not hold when two judgments are cited together. Being cited together hints at relatedness in one specific legal concept of the judgments, and since a judgment is a collection of more than one legal concept, co-citation fails to match the human judgment of similarity.

Similarity analysis result using bibliographic-coupling: Table-3.5 shows the results based on the bibliographic coupling score. The results show that domain experts agree with higher values of the bibliographic similarity score. Since the results for the bibliographic score are quite encouraging, we

collected all the judgment pairs having a bibliographic score of 3; the corresponding similarity scores given by experts are shown in Table-3.6. This observation is consistent with the inherent nature of legal judgments. If two judgments cite the same set of judgments, both agree with the context of the cited judgments. In general, a typical judgment is known for one certain legal concept discussed as one of its subtopics. So, if two judgments cite the same judgment, they also agree on that subtopic in most cases, and hence there is a high probability that the two judgments are similar. Hence, unlike the bibliographic coupling score, which covers similarity in more than one subtopic by comparing the out-citations of the judgments, co-citation is merely able to identify similarity in individual subtopics of judgments, not in the whole topic.

Analysis by domain experts for sample pairs: Here, we present the views of the domain experts on three judgment pairs of Table-3.6 regarding their justification for the given similarity score.

Pair & : This judgment pair exhibits bibliographic score 3, expert score 0.40. In Case , the Court is looking at issues relating to the principle of res judicata. There is a discussion on the application of the principle in cases where petitioners try to challenge the validity of the provisions of the Act on different grounds at different times. In Case , the judgment revolves around issues of pending suits for eviction. The reason for the low domain expert score is basically that there is very little substantive similarity. The facts of the two cases are distinct. The facts, issues and legal principles are not closely connected.
The reason why a search engine may throw up these two cases as being similar is the similar legislation or the use of the same terms (land, rent, etc.) in the two judgments.

Pair & : This judgment pair exhibits bibliographic score 3, expert score 0.50. Case deals with the issue of discrimination in promotion and pay scales on the basis of educational qualification. The central issue here was whether two groups of employees, one comprising degree holders and the other diploma holders, should be treated equally in promotion and payment of salaries. In Case , the Court was dealing with petitions claiming the application of the equal pay for equal work principle for those employees who had not been regularized but had been in service for a long period of time. These cases are similar in that they involve a close discussion on the

legal principle of equal pay for equal work. The Supreme Court in both these cases considers the constitutional scheme of the directive principles of state policy while discussing the principle. However, there is not a lot of similarity in the facts, and the judgments also look at distinct issues: the first case has a discussion on the nature of the directive principles, while the second case looks at issues of taxation. These cases will be fairly useful for a researcher or a lawyer who wishes to argue or cite cases on the principle of equal pay for equal work, but not so much on other issues. Hence, an average similarity score is given.

Pair & : This judgment pair exhibits bibliographic score 3, expert score 0.70. In Case , the Court was addressing a petition by a hearing therapist, who claimed that while he was performing a job the same as or similar to that of senior speech pathologist, senior physiotherapist, senior audiologist, and speech pathologist in the same institution under the same employers, he had been given a lower pay scale in comparison to these posts. The Court adjudicated the reliance on the equal pay for equal work principle. In Case , the Court was dealing with petitions claiming the application of the equal pay for equal work principle for those employees who had not been regularized but had been in service for a long period of time. The Court here agreed with the petitioners but cautiously gave directives to the State Government, keeping in view the economic capacity of the States. For most lawyers and legal researchers, these two cases are going to be useful because the core principle is the same. There is an extensive discussion of the various arguments for and against the application of the principle of equal pay for equal work, and the Court looks at the elements of this principle in depth.
3.7 Conclusion: Our experiment shows that the legal-term score and the bibliographic coupling score with 3 common out-citations are the most significant attributes for identifying a similar judgment pair. Even though it is difficult to draw a definitive conclusion from these studies, this experiment has shown us the way forward towards finding similar judgments. However, the number of case-citations varies across judgments, and sole dependence on case-citations does not yield similar judgments in sufficient numbers. Hence, in the next chapter we explore this issue and enrich each judgment by linking it to appropriate judgments.

Table 3.2 All-term similarity score is high while the remaining similarity scores are low.
Columns: Sl. No.; Judgment pairs; All terms score; Legal term score; Bibliographic coupling score; Co-citation score; Average score by domain expert.
[Row values not preserved.] Average score is 0.45.

Table 3.3 Legal-term similarity score is high while the remaining similarity scores are low.
Columns: Sl. No.; Judgment pairs; All terms score; Legal term score; Bibliographic coupling score; Co-citation score; Average score by domain expert.
[Row values not preserved.] Average score is 0.76.

Table 3.4 Co-citation similarity score is high while the remaining similarity scores are low.
Columns: Sl. No.; Judgment pairs; All terms score; Legal term score; Bibliographic coupling score; Co-citation score; Average score by domain expert.
[Row values not preserved.]

Table 3.5 Bibliographic coupling similarity score is high while the remaining similarity scores are low.
Columns: Sl. No.; Judgment pairs; All terms score; Legal term score; Bibliographic coupling score; Co-citation score; Average score by domain expert.
[Row values not preserved.]

Table 3.6 Judgment pairs having a high bibliographic coupling score.
Columns: Sl. No.; Judgment pairs; Bibliographic coupling score; Average score by domain expert.
[Row values not preserved.] Average score 0.65.

Table 3.7 Judgment pairs having a high co-citation score.
Columns: Sl. No.; Judgment pairs; Co-citation score; Average score by domain expert.
[Row values not preserved.] Average score [value not preserved].

Chapter 4

Finding Similar Legal Judgments using Paragraph-Links

Encouraging results of link-based analysis in the web environment have inspired IR researchers to such an extent that links between text documents are being explored to check their viability for producing the desired results. Text documents which did not originally have links are also enriched by automatically generating links between various passages of the same document. In this chapter, we investigate the behavior of legal judgments when links are generated artificially between a pair of judgments, and then exploit those links to find similar judgments.

4.1 Issue

Previous work has shown that link-based similarity is more effective than text-based comparison [23]. However, on further investigation we found that, though link-based similarity is effective for finding similar legal judgments, links do not exist in every judgment in adequate numbers to be leveraged for this purpose. Hence, the link-based similarity approach explained in the previous chapter is able to fetch only a small set of similar judgments for a given judgment, say J. We observed that there are mainly two reasons why sole dependence on case-citations is unable to fetch a sufficient number of similar judgments. These two reasons are explained using an example of a pair of similar judgments (namely judgment A and judgment B).

Insufficient number of case-citations: The existing approach declares two judgments similar if the number of their common links¹ is higher than the threshold. Hence, the higher the number of links in a judgment, the higher its probability of having the threshold number of common links. On the other hand, if either judgment A or B, or both, have fewer links than the threshold value needed to be declared similar, then A and B do not show up in the result set of similar judgments.
Figure 4.1 shows that case-citations are not available in equal numbers across judgments and follow a power-law distribution.

¹ In a legal judgment, a link is available in the form of a case-citation.

Time-gap between judgments is high: Time-gap refers to the time duration between the delivery of two judgments. Due to the continuously evolving nature of law, it is general practice for legal practitioners to refer to the latest judgments, so as to make sure that the latest legal concept is applied during the argument. Hence, for two similar judgments A and B dealing with similar disputes, if A is delivered in 1970 and B in 1971, it is likely that both would refer to the same set of judgments. On the other hand, if J is delivered in 1970 and K in 1980, it is quite unlikely that the citations of judgment J would be cited by K as well. However, there is a high possibility that the judgments cited by K would themselves have cited judgments which are cited by J. Hence, the commonality between citations is implicit. In our dataset, 85% of linked judgments have time-gap 3 years.

4.2 Basic Idea

In the web environment, hyperlinks have been exploited extensively. Two highly cited works are HITS [21] and PageRank [12]. The problem dealt with in HITS [21] is the abundance problem, described in the author's words as "The number of pages that could be reasonably relevant is far too large for a human to digest." He notes that this problem arises when applying content-only retrieval to broad-topic queries with a large representation on the Web. In this work, concepts like hub and authority are defined. On the other hand, [12] discusses another iterative algorithm, which computes the PageRank of web pages. PageRank results from a mathematical algorithm based on the webgraph, formed by all World Wide Web pages as nodes and hyperlinks as edges. Besides, link analysis in the web domain is also used in web clustering and categorization algorithms which group similar pages together.
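To illustrate the kind of iterative link analysis referred to above, the following is a minimal power-iteration sketch of PageRank on a toy graph. It is a textbook-style illustration rather than anything taken from the thesis or from [12] verbatim: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without out-links, and the four-page graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
```

In this graph, page C receives links from three pages and ends up with the highest rank, while B and D, which receive no links, share the lowest. The same procedure could in principle be run on a case-citation graph, with judgments as nodes and case-citations as edges.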
It has been demonstrated that links and their surrounding anchor text can be used to develop an automatic resource compiler with performance comparable to the manual web directory Yahoo [14]. The intrinsic property which enables links to produce excellent results is that each link is applied meticulously, which means it carries an inherent human endorsement of relatedness between the linked pages. The significance of links is not limited to the web environment: link-based measures like bibliographic coupling [20] and co-citation [18] are found to be effective with text documents that have references. Interestingly, link-based analysis of text documents is not limited to documents which already have links. IR literature also includes works wherein links are generated between various parts (paragraphs, sections, etc.) of text documents which did not originally have links. The idea behind such links was to give a user easy access to various sections of the documents [16], [24], [34].

As an extension of such approaches, we apply paragraph links (PL) between two judgments. In general, a legal judgment is not homogeneous; rather, it contains a myriad of legal concepts related to the existing dispute, separated into various paragraphs. As shown in Figure 4.2, the format of a judgment is such that it is divided into various paragraphs, wherein each paragraph describes one legal concept. Generally, each paragraph also ends with a case-citation, which is mentioned to let the reader

know how the judge reached the conclusion explained in that section. Hence, it can be said that when a judgment refers to another judgment, it does not refer to the whole judgment but to a specific paragraph which describes a particular legal concept of that judgment.

Figure 4.1 Citation frequency against judgment count, plotted on linear and logarithmic scales (plot title: Citation nature in legal judgments). The plots show that case-citations follow a power-law distribution.

The motivation behind paragraph links is the following. Since link-based similarity has been found to be an effective approach with legal judgments [23], we investigated various means by which a judgment could be enriched with links so as to enable the existing link-based approaches to find similar judgments. It is to be noted that IR literature contains work wherein links between paragraphs of documents have been applied using text-based similarity measures. For example, in [32] the notion of paragraph links is exploited to build text relation maps for accessing text on related themes that exist in different documents. In [22], a finer granularity than documents is investigated, so that a user can quickly access a passage in another, possibly long, document related to the discussed topic. The main idea applied here is the use of semantic similarity as a predictor for automatic link generation.

4.3 Proposed Approach

We consider each paragraph of a judgment as an independent entity and apply a text-based similarity method to identify the set of paragraphs which are similar to a given paragraph; using these, we apply paragraph links between judgments. If the number of similar paragraph pairs between two judgments reaches a threshold, those two judgments are said to have a paragraph link.
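The idea described above can be sketched as follows. This is only an illustrative reading of the approach, not the algorithm from the thesis table: term-frequency cosine stands in for the unspecified text-based similarity method, and the names `sim_threshold` and `min_pairs`, their default values, and the sample paragraphs are all hypothetical.

```python
import math
from collections import Counter


def vectorize(paragraph):
    """Raw term-frequency vector of a paragraph (whitespace tokens)."""
    return Counter(paragraph.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def similar_paragraph_pairs(judgment_a, judgment_b, sim_threshold=0.5):
    """Index pairs (i, j) of paragraphs whose similarity reaches sim_threshold."""
    pairs = []
    for i, para_a in enumerate(judgment_a):
        for j, para_b in enumerate(judgment_b):
            if cosine(vectorize(para_a), vectorize(para_b)) >= sim_threshold:
                pairs.append((i, j))
    return pairs


def has_paragraph_link(judgment_a, judgment_b, sim_threshold=0.5, min_pairs=2):
    """Two judgments are linked if enough of their paragraphs are similar."""
    pairs = similar_paragraph_pairs(judgment_a, judgment_b, sim_threshold)
    return len(pairs) >= min_pairs


# Made-up judgments, each given as a list of paragraph texts.
judgment_a = [
    "equal pay for equal work applies to temporary employees",
    "the doctrine of res judicata bars a second petition",
]
judgment_b = [
    "equal pay for equal work applies to daily wage employees",
    "res judicata bars a second petition on the same grounds",
    "the rent control act governs eviction of tenants",
]
judgment_c = ["the rent control act governs eviction of tenants"]

pairs_ab = similar_paragraph_pairs(judgment_a, judgment_b)
```

With these toy inputs, two paragraph pairs of judgment_a and judgment_b clear the similarity threshold, so the two judgments receive a paragraph link, whereas judgment_a and judgment_c do not. Both thresholds would need to be tuned on real data.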
In this way, paragraph links are applied between two judgments by virtue of the properties of their paragraphs. The algorithm for applying paragraph links is explained in Table .

Method for identifying paragraphs in legal judgments

As shown in Figure 4.2, the paragraphs under the headnote begin after the keyword HELD: followed by an integer value enclosed between brackets ( and ). It was observed that each paragraph was sepa-

Figure 4.2 A typical headnote from a judgment. Discontinuous lines show missing text from the headnote. The serial numbers of the first two paragraphs and the case-citations present at the end of paragraphs are marked by rectangles.
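The paragraph-identification rule described above (the keyword HELD: followed by bracketed serial numbers) can be sketched with a regular expression. This is a minimal sketch under assumptions: the function name, the rule of accepting only markers that continue the sequence 1, 2, 3, ..., and the sample headnote (including its placeholder citation) are hypothetical and are not taken from the thesis.

```python
import re

# A bracketed integer such as (1) or (2).
MARKER = re.compile(r"\((\d+)\)")


def split_headnote_paragraphs(headnote):
    """Return [(serial, text), ...] for the numbered paragraphs after 'HELD:'."""
    _, found, body = headnote.partition("HELD:")
    if not found:
        return []
    # Accept a marker only if it is the next expected serial number, so that
    # bracketed years inside case-citations, e.g. (1962), are not split on.
    starts = []
    expected = 1
    for match in MARKER.finditer(body):
        if int(match.group(1)) == expected:
            starts.append((expected, match.start(), match.end()))
            expected += 1
    paragraphs = []
    for idx, (serial, _, text_start) in enumerate(starts):
        text_end = starts[idx + 1][1] if idx + 1 < len(starts) else len(body)
        paragraphs.append((serial, " ".join(body[text_start:text_end].split())))
    return paragraphs


# Made-up headnote; "A v. B" is a placeholder, not a real citation.
headnote = (
    "Appeal dismissed. HELD: (1) The principle of res judicata applies. "
    "[A v. B, (1962) 1 SCR 10, referred to.] "
    "(2) The suit for eviction was maintainable."
)
paragraphs = split_headnote_paragraphs(headnote)
```

The sequential check is a heuristic: a statutory reference such as "section 5(2)" appearing before the real second paragraph would still be mistaken for a paragraph marker, so real headnotes would likely need additional cues such as line breaks.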


More information

Writing Reaction Papers Using the QuALMRI Framework

Writing Reaction Papers Using the QuALMRI Framework Writing Reaction Papers Using the QuALMRI Framework Modified from Organizing Scientific Thinking Using the QuALMRI Framework Written by Kevin Ochsner and modified by others. Based on a scheme devised by

More information

CHAPTER 4 THE QUESTIONNAIRE DESIGN /SOLUTION DESIGN. This chapter contains explanations that become a basic knowledge to create a good

CHAPTER 4 THE QUESTIONNAIRE DESIGN /SOLUTION DESIGN. This chapter contains explanations that become a basic knowledge to create a good CHAPTER 4 THE QUESTIONNAIRE DESIGN /SOLUTION DESIGN This chapter contains explanations that become a basic knowledge to create a good questionnaire which is able to meet its objective. Just like the thesis

More information

FORMAT FOR CORRELATION TO THE GEORGIA PERFORMANCE STANDARDS. Textbook Title: Benchmark Series: Microsoft Office Publisher: EMC Publishing, LLC

FORMAT FOR CORRELATION TO THE GEORGIA PERFORMANCE STANDARDS. Textbook Title: Benchmark Series: Microsoft Office Publisher: EMC Publishing, LLC FORMAT FOR CORRELATION TO THE GEORGIA PERFORMANCE STANDARDS Subject Area: Business and Computer Science State-Funded Course: Computer Applications II Textbook Title: Benchmark Series: Microsoft Office

More information

Writing in an Academic Style Module: Introduction

Writing in an Academic Style Module: Introduction Writing in an Academic Style Module: Introduction What is Academic Style? Writing tasks are different across different academic disciplines and to some extent the language use will be quite different from

More information

The Scientific Method

The Scientific Method The Scientific Method Objectives 1. To understand the central role of hypothesis testing in the modern scientific process. 2. To design and conduct an experiment using the scientific method. 3. To learn

More information

Errol Davis Director of Research and Development Sound Linked Data Inc. Erik Arisholm Lead Engineer Sound Linked Data Inc.

Errol Davis Director of Research and Development Sound Linked Data Inc. Erik Arisholm Lead Engineer Sound Linked Data Inc. An Advanced Pseudo-Random Data Generator that improves data representations and reduces errors in pattern recognition in a Numeric Knowledge Modeling System Errol Davis Director of Research and Development

More information

Assurance Engagements Other than Audits or Review of Historical Financial Statements

Assurance Engagements Other than Audits or Review of Historical Financial Statements Issued December 2007 International Standard on Assurance Engagements Assurance Engagements Other than Audits or Review of Historical Financial Statements The Malaysian Institute Of Certified Public Accountants

More information

Automated Medical Diagnosis using K-Nearest Neighbor Classification

Automated Medical Diagnosis using K-Nearest Neighbor Classification (IMPACT FACTOR 5.96) Automated Medical Diagnosis using K-Nearest Neighbor Classification Zaheerabbas Punjani 1, B.E Student, TCET Mumbai, Maharashtra, India Ankush Deora 2, B.E Student, TCET Mumbai, Maharashtra,

More information

Title: Healthy snacks at the checkout counter: A lab and field study on the impact of shelf arrangement and assortment structure on consumer choices

Title: Healthy snacks at the checkout counter: A lab and field study on the impact of shelf arrangement and assortment structure on consumer choices Author's response to reviews Title: Healthy snacks at the checkout counter: A lab and field study on the impact of shelf arrangement and assortment structure on consumer choices Authors: Ellen van Kleef

More information

CHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL

CHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL 127 CHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL 6.1 INTRODUCTION Analyzing the human behavior in video sequences is an active field of research for the past few years. The vital applications of this field

More information

How preferred are preferred terms?

How preferred are preferred terms? How preferred are preferred terms? Gintare Grigonyte 1, Simon Clematide 2, Fabio Rinaldi 2 1 Computational Linguistics Group, Department of Linguistics, Stockholm University Universitetsvagen 10 C SE-106

More information

Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials

Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials Riccardo Miotto and Chunhua Weng Department of Biomedical Informatics Columbia University,

More information

A Survey on Brain Tumor Detection Technique

A Survey on Brain Tumor Detection Technique (International Journal of Computer Science & Management Studies) Vol. 15, Issue 06 A Survey on Brain Tumor Detection Technique Manju Kadian 1 and Tamanna 2 1 M.Tech. Scholar, CSE Department, SPGOI, Rohtak

More information

Unit 2, Lesson 5: Teacher s Edition 1. Unit 2: Lesson 5 Understanding Vaccine Safety

Unit 2, Lesson 5: Teacher s Edition 1. Unit 2: Lesson 5 Understanding Vaccine Safety Unit 2, Lesson 5: Teacher s Edition 1 Unit 2: Lesson 5 Understanding Vaccine Safety Lesson Questions: o What are the main issues regarding vaccine safety? o What is the scientific basis for issues regarding

More information

Lecturer: Rob van der Willigen 11/9/08

Lecturer: Rob van der Willigen 11/9/08 Auditory Perception - Detection versus Discrimination - Localization versus Discrimination - - Electrophysiological Measurements Psychophysical Measurements Three Approaches to Researching Audition physiology

More information

*A Case of Possible Discrimination (Spotlight Task)

*A Case of Possible Discrimination (Spotlight Task) *A Case of Possible Discrimination (Spotlight Task) This activity is adapted by one of the authors, Christine Franklin, from Navigating through Data Analysis in Grades 9-12, Burrill, Gail, Christine Franklin,

More information

LEN 227: Introduction to Corrections Syllabus 3 lecture hours / 3 credits CATALOG DESCRIPTION

LEN 227: Introduction to Corrections Syllabus 3 lecture hours / 3 credits CATALOG DESCRIPTION 1 LEN 227: Introduction to Corrections Syllabus 3 lecture hours / 3 credits CATALOG DESCRIPTION Prerequisite: Undergraduate level RDG 099 Minimum Grade of P or Undergraduate level RDG 055 Minimum Grade

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

- Conduct effective follow up visits when missing children return home ensuring intelligence is shared with appropriate partners.

- Conduct effective follow up visits when missing children return home ensuring intelligence is shared with appropriate partners. Job title: Grade: Role code: Status: Main responsibilities: Missing and Child Exploitation PCSO Grade D SDV027 Police Staff Main purpose of the role: Conduct enquiries to locate missing children as directed

More information

Artificial Intelligence For Homeopathic Remedy Selection

Artificial Intelligence For Homeopathic Remedy Selection Artificial Intelligence For Homeopathic Remedy Selection A. R. Pawar, amrut.pawar@yahoo.co.in, S. N. Kini, snkini@gmail.com, M. R. More mangeshmore88@gmail.com Department of Computer Science and Engineering,

More information

Disease predictive, best drug: big data implementation of drug query with disease prediction, side effects & feedback analysis

Disease predictive, best drug: big data implementation of drug query with disease prediction, side effects & feedback analysis Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 13, Number 6 (2017), pp. 2579-2587 Research India Publications http://www.ripublication.com Disease predictive, best drug: big data

More information

Lecturer: Rob van der Willigen 11/9/08

Lecturer: Rob van der Willigen 11/9/08 Auditory Perception - Detection versus Discrimination - Localization versus Discrimination - Electrophysiological Measurements - Psychophysical Measurements 1 Three Approaches to Researching Audition physiology

More information

Review of Veterinary Epidemiologic Research by Dohoo, Martin, and Stryhn

Review of Veterinary Epidemiologic Research by Dohoo, Martin, and Stryhn The Stata Journal (2004) 4, Number 1, pp. 89 92 Review of Veterinary Epidemiologic Research by Dohoo, Martin, and Stryhn Laurent Audigé AO Foundation laurent.audige@aofoundation.org Abstract. The new book

More information

Using Your Brain -- for a CHANGE Summary. NLPcourses.com

Using Your Brain -- for a CHANGE Summary. NLPcourses.com Using Your Brain -- for a CHANGE Summary NLPcourses.com Table of Contents Using Your Brain -- for a CHANGE by Richard Bandler Summary... 6 Chapter 1 Who s Driving the Bus?... 6 Chapter 2 Running Your Own

More information

III. WHAT ANSWERS DO YOU EXPECT?

III. WHAT ANSWERS DO YOU EXPECT? III. WHAT ANSWERS DO YOU EXPECT? IN THIS CHAPTER: Theories and Hypotheses: Definitions Similarities and Differences Why Theories Cannot be Verified The Importance of Theories Types of Hypotheses Hypotheses

More information

5.I.1. GENERAL PRACTITIONER ANNOUNCEMENT OF CREDENTIALS IN NON-SPECIALTY INTEREST AREAS

5.I.1. GENERAL PRACTITIONER ANNOUNCEMENT OF CREDENTIALS IN NON-SPECIALTY INTEREST AREAS Report of the Council on Ethics, Bylaws and Judicial Affairs on Advisory Opinion 5.I.1. GENERAL PRACTITIONER ANNOUNCEMENT OF CREDENTIALS IN NON-SPECIALTY INTEREST AREAS Ethical Advertising under ADA Code:

More information

Chapter 3 Software Packages to Install How to Set Up Python Eclipse How to Set Up Eclipse... 42

Chapter 3 Software Packages to Install How to Set Up Python Eclipse How to Set Up Eclipse... 42 Table of Contents Preface..... 21 About the Authors... 23 Acknowledgments... 24 How This Book is Organized... 24 Who Should Buy This Book?... 24 Where to Find Answers to Review Questions and Exercises...

More information

Artificial intelligence and judicial systems: The so-called predictive justice. 20 April

Artificial intelligence and judicial systems: The so-called predictive justice. 20 April Artificial intelligence and judicial systems: The so-called predictive justice 20 April 2018 1 Context The use of so-called artificielle intelligence received renewed interest over the past years.. Stakes

More information

Observational Category Learning as a Path to More Robust Generative Knowledge

Observational Category Learning as a Path to More Robust Generative Knowledge Observational Category Learning as a Path to More Robust Generative Knowledge Kimery R. Levering (kleveri1@binghamton.edu) Kenneth J. Kurtz (kkurtz@binghamton.edu) Department of Psychology, Binghamton

More information

CHAPTER-5. Family Disorganization & Woman Desertion by Socioeconomic Background

CHAPTER-5. Family Disorganization & Woman Desertion by Socioeconomic Background CHAPTER-5 Family Disorganization & Woman Desertion by Socioeconomic Background CHAPTER-5 FAMILY DISORGANIZATION AND WOMAN DESERTION BY SOCIOECONOMIC BACKGROUND This chapter examines the part played by

More information

FINAL REPORT Measuring Semantic Relatedness using a Medical Taxonomy. Siddharth Patwardhan. August 2003

FINAL REPORT Measuring Semantic Relatedness using a Medical Taxonomy. Siddharth Patwardhan. August 2003 FINAL REPORT Measuring Semantic Relatedness using a Medical Taxonomy by Siddharth Patwardhan August 2003 A report describing the research work carried out at the Mayo Clinic in Rochester as part of an

More information

An assistive application identifying emotional state and executing a methodical healing process for depressive individuals.

An assistive application identifying emotional state and executing a methodical healing process for depressive individuals. An assistive application identifying emotional state and executing a methodical healing process for depressive individuals. Bandara G.M.M.B.O bhanukab@gmail.com Godawita B.M.D.T tharu9363@gmail.com Gunathilaka

More information

OECD QSAR Toolbox v.4.2. An example illustrating RAAF scenario 6 and related assessment elements

OECD QSAR Toolbox v.4.2. An example illustrating RAAF scenario 6 and related assessment elements OECD QSAR Toolbox v.4.2 An example illustrating RAAF scenario 6 and related assessment elements Outlook Background Objectives Specific Aims Read Across Assessment Framework (RAAF) The exercise Workflow

More information

Cohesive Writing. Unit 1 Paragraph Structure INDEPENDENT LEARNING RESOURCES. Learning Centre

Cohesive Writing. Unit 1 Paragraph Structure INDEPENDENT LEARNING RESOURCES. Learning Centre Cohesive Writing Unit 1 Paragraph Structure INDEPENDENT LEARNING RESOURCES Learning Centre Unit 1 PARAGRAPH STRUCTURE OBJECTIVES OF THIS UNIT After you have completed this unit, we hope you will be able

More information

Heiner Oberkampf. DISSERTATION for the degree of Doctor of Natural Sciences (Dr. rer. nat.)

Heiner Oberkampf. DISSERTATION for the degree of Doctor of Natural Sciences (Dr. rer. nat.) INTEGRATED REPRESENTATION OF CLINICAL DATA AND MEDICAL KNOWLEDGE AN ONTOLOGY-BASED APPROACH FOR THE RADIOLOGY DOMAIN Heiner Oberkampf DISSERTATION for the degree of Doctor of Natural Sciences (Dr. rer.

More information

Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions?

Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions? Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions? Isa Maks and Piek Vossen Vu University, Faculty of Arts De Boelelaan 1105, 1081 HV Amsterdam e.maks@vu.nl, p.vossen@vu.nl

More information

Prediction of Malignant and Benign Tumor using Machine Learning

Prediction of Malignant and Benign Tumor using Machine Learning Prediction of Malignant and Benign Tumor using Machine Learning Ashish Shah Department of Computer Science and Engineering Manipal Institute of Technology, Manipal University, Manipal, Karnataka, India

More information

Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011

Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011 Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011 I. Purpose Drawing from the profile development of the QIBA-fMRI Technical Committee,

More information

Data Mining in Bioinformatics Day 4: Text Mining

Data Mining in Bioinformatics Day 4: Text Mining Data Mining in Bioinformatics Day 4: Text Mining Karsten Borgwardt February 25 to March 10 Bioinformatics Group MPIs Tübingen Karsten Borgwardt: Data Mining in Bioinformatics, Page 1 What is text mining?

More information

Thinking Like a Researcher

Thinking Like a Researcher 3-1 Thinking Like a Researcher 3-3 Learning Objectives Understand... The terminology used by professional researchers employing scientific thinking. What you need to formulate a solid research hypothesis.

More information

Chapter IR:VIII. VIII. Evaluation. Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing

Chapter IR:VIII. VIII. Evaluation. Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing Chapter IR:VIII VIII. Evaluation Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing IR:VIII-1 Evaluation HAGEN/POTTHAST/STEIN 2018 Retrieval Tasks Ad hoc retrieval:

More information

Perceived similarity and visual descriptions in content-based image retrieval

Perceived similarity and visual descriptions in content-based image retrieval University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2007 Perceived similarity and visual descriptions in content-based image

More information

The Open Access Institutional Repository at Robert Gordon University

The Open Access Institutional Repository at Robert Gordon University OpenAIR@RGU The Open Access Institutional Repository at Robert Gordon University http://openair.rgu.ac.uk This is an author produced version of a paper published in Intelligent Data Engineering and Automated

More information

Artificial Doctors In A Human Era

Artificial Doctors In A Human Era Artificial Doctors In A Human Era The term Artificial Intelligence (AI) is overused today. Unfortunately, this often leads to a misunderstanding of what AI is. Artificial intelligence is an umbrella term

More information

On the Combination of Collaborative and Item-based Filtering

On the Combination of Collaborative and Item-based Filtering On the Combination of Collaborative and Item-based Filtering Manolis Vozalis 1 and Konstantinos G. Margaritis 1 University of Macedonia, Dept. of Applied Informatics Parallel Distributed Processing Laboratory

More information

What Smokers Who Switched to Vapor Products Tell Us About Themselves. Presented by Julie Woessner, J.D. CASAA National Policy Director

What Smokers Who Switched to Vapor Products Tell Us About Themselves. Presented by Julie Woessner, J.D. CASAA National Policy Director What Smokers Who Switched to Vapor Products Tell Us About Themselves Presented by Julie Woessner, J.D. CASAA National Policy Director The CASAA Consumer Testimonials Database Collection began in 2013 through

More information

Predicting Breast Cancer Survivability Rates

Predicting Breast Cancer Survivability Rates Predicting Breast Cancer Survivability Rates For data collected from Saudi Arabia Registries Ghofran Othoum 1 and Wadee Al-Halabi 2 1 Computer Science, Effat University, Jeddah, Saudi Arabia 2 Computer

More information

Coversheet: Medicinal cannabis: 100 day action

Coversheet: Medicinal cannabis: 100 day action Coversheet: Medicinal cannabis: 100 day action Advising agencies Decision sought Proposing Ministers Ministry of Health Introduction of Misuse of Drugs Amendment Bill Hon Dr David Clark, Minister of Health

More information

EECS 433 Statistical Pattern Recognition

EECS 433 Statistical Pattern Recognition EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern

More information