Identifying Variable Length Multi-pair Palindromic Patterns with Errors in a DNA Sequence

Hyoung rae Kim
Department of Computer Sciences, Florida Institute of Technology, Melbourne, FL 32901, USA
hokim@fit.edu

William D. Shoaff
Department of Computer Sciences, Florida Institute of Technology, Melbourne, FL 32901, USA
ds@cs.fit.edu

Abstract

The emphasis in genome projects has moved towards sequence analysis in order to extract biological meaning (e.g., the evolutionary history of particular molecules or their functions) from the sequence. In particular, palindromic or direct repeats that appear in a sequence have biophysical meaning [6]. One problem is recognizing interesting patterns and configurations of words (strings of characters) over complementary alphabets. We propose an algorithm to identify variable length palindromic pairs (longer than a threshold) where we can allow gaps (distance between words). The algorithm is called the palindrome algorithm (PA) and has O(N) time complexity. A palindromic pair forms a hairpin structure. By composing collected palindromic pairs we build n-pair palindromic patterns; this is called the structural representation algorithm (SRA). In addition, we dot some of the longest pairs in a circle to represent the structure of a DNA sequence. We run this algorithm over several selected genomes, and the results for E.coli K12 are presented.

Keywords: complement, palindrome, palindromic pattern, algorithm, string searching, DNA structure, DNA sequence, structural representation, efficient pattern extraction, Escherichia coli K12, Escherichia coli o157, salmonella.

1. Introduction

One of the problems arising in the analysis of biological sequences is the discovery of patterns that appear at different positions in a nucleic acid. Genomic science and structural biology are related through the sequence and the structure of nucleic acids.
It is well known that in a palindrome of nucleic acids a subsequence binds with the complementary subsequence in the opposite direction on its own strand to make a stem-loop. Since in DNA topological entanglements such as knots and catenation are crucial to the function of cells, finding these palindromes and knots is important []. In particular, palindromic or direct repeats that appear in a sequence have biophysical meaning: recognition sites of dimers, formation of stem-loops, and contributions to the global structure of nucleic acids; moreover, the genetic network, transduction pathway, and tissue specificity are also related to these sequences [6]. Our research focuses on finding inverted repeats (palindromes).

When we start to search for palindromic pairs in a DNA sequence, we basically have no information about the words (subsequences) for which we are searching. It is easy to start by searching for fixed length palindromic pairs; however, finding fixed length palindromes has several disadvantages, in particular setting the length. It may be very difficult to know which length has biological meaning. We instead find variable length palindromes longer than a minimum word length (which can be either calculated automatically or assigned by a user). Palindromes may contain some mismatches in the form of gaps and defects of various other natures [4,7], so we also allow errors within a word. We may be able to compose these collected palindromic pairs (each of which forms a hairpin structure) into multiple pair palindromic patterns. Furthermore, we can use this to help understand the structure of a DNA sequence.

We divide a sequence into n-gram tokens, and then merge adjacent tokens into long words. Our algorithm has linear time complexity in the length of a DNA sequence and O(N × LTK) space complexity, where LTK is the n-gram window size. We call this searching paradigm (break a string into pieces and then merge the pieces into real words) the "break and gather" searching technique.
A disadvantage of this algorithm is that it is not complete: it may fail to find a palindromic pair even if one exists. This, however, happens only in very synthetic strings (for example, a string that consists of only one character). Our contributions in this work are to introduce a new palindrome searching algorithm (PA), to introduce a new possibility for the structural representation of a DNA sequence, to extend the palindromic representation from hairpin structures to multi-pair palindromic patterns, and to introduce a new searching paradigm, break and gather, that we have used for finding palindromic pairs. We hope our new structural representation scheme using n-pair palindromic patterns will help biologists in visualizing DNA sequences.

The rest of this paper is organized as follows: Section 2 defines palindrome and palindromic pattern, and presents our palindrome algorithm; Section 3 details our PA and the algorithm to generate patterns; Section 4 describes our experiments; Section 5 analyzes the results from the experiments; Section 6 discusses related work in searching for palindromes; Section 7 summarizes our work and suggests possible future work.

2. Palindromic pair and patterns

Palindromic pairs represent hairpin structures (stem-loop formation) that have biological meaning in DNA and RNA [6]. A palindrome is an inverted repeat of a word (subsequence); a word is a sequence of characters. When searching a DNA sequence for palindromic pairs, we do not know the word length or how many palindromic pairs exist in the sequence. A fixed word length is the obvious first choice, but finding fixed length palindromes has the critical disadvantage of having to set the length, when the length that has biological meaning may not be known. In our approach the length of a palindrome is not fixed a priori: we find variable length palindromes that are longer than a minimum word length.

A palindromic pair can represent only a hairpin structure. With multiple palindromic pairs we extend this research to multi-pair palindromic patterns. We composed multiple palindromic pairs and constructed patterns (details in Sec. 2.2). In the next sections we detail the characteristics of a palindrome and of palindromic patterns; the input and output data are also illustrated.

2.1. Palindromic pair

A palindromic pair consists of a word and its complement, as shown in Figure 1. Even though its structure is not complex, it has some characteristics: gaps, length, overlap, and errors. There can be a gap between the word and its complement.
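The word and complement relation of Figure 1 can be illustrated with a minimal Python sketch. The function names `reverse_complement` and `is_palindromic_pair` are hypothetical and chosen only for this illustration; the sketch assumes lowercase a/c/g/t input and ignores gaps and errors.

```python
# Watson-Crick base pairing used to build the complement of a word.
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def reverse_complement(word):
    """Return the reverse complement of a lowercase DNA word."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def is_palindromic_pair(word, candidate):
    """True if `candidate` is the inverted repeat (reverse complement) of `word`."""
    return candidate == reverse_complement(word)


if __name__ == "__main__":
    print(reverse_complement("aaatg"))  # the word shown in Figure 1
```

The same helper applies unchanged to longer words such as the ones in the input and output examples later in the paper.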
Palindrome recognition differs according to the presence of a gap and differences in length; it is necessary to examine complements with gaps [6]. Long palindromes are more informative; short ones are too common in DNA. For example, all palindromes of length 4 can be found easily, but it is hard to extract information from them.

[Figure 1. Word and complement: the word aaatg and its reverse complement cattt]

There can be overlaps among palindromes, as shown in Figure 2, where two palindromic pairs overlap. There are several ways of collecting pairs in this situation: first, collect only the first pair; second, collect the longer pair; third, collect both pairs; and fourth, collect only the second pair. Since one of our purposes is to collect as many pairs as possible, we chose the third method, which collects both pairs. In some cases three or more pairs can overlap each other. However, since the biological meaning of overlapping is not known yet, we do not collect all possible pairs allowing multiple overlaps (we only allow overlap one time). The details are explained in Section 3.1.3. Once we have the palindromic pairs, including overlapped pairs, the overlapped pairs can easily be removed if necessary.

[Figure 2. Overlapping palindromes]

The algorithm to find a complement token is explained in Section 3.1.3. There can be mismatches in the form of gaps and defects in a sequence itself [4,7]. In PA, tokens match exactly, but we allow errors (mismatches) between tokens when they are combined. An error can therefore exist only after a full token length. Since the length of a token is normally much smaller than that of a word and we allow only a small number of errors, we do not think this limitation will cause big differences in the results. This process is detailed in Section 3.1.4.

2.2. Palindromic pattern

A palindromic pair forms a hairpin structure; once we have multiple palindromic pairs we can easily extend the structure to multiple pair palindromic patterns.
A palindromic pattern is a pattern composed of multiple palindromic pairs. Two words together with their complements form a 2-pair palindromic pattern. There are three different 2-pair patterns, which can be identified by integers or ordinal quadruples. Figure 3 presents the three types, each of which is ordered following a Closest Position First (CPF) rule, which states that a complement is to be inserted in the position closest to the start of the pattern, starting from the last word. Each number inside the brackets represents a word; the repeated number represents its complement.

[Figure 3. 2-pair palindromic patterns: Type 0, Type 1 and Type 2, each written as a quadruple]

For example, one of these types is laid out in a sequence s in Figure 4.

[Figure 4. A pattern type in a string s]

Two-pair palindromic patterns may have the same restrictions as 1-pair palindromic patterns, and further constraints can be applied: a minimum and maximum gap between words (subsequences). Words should not be closer than the minimum word gap and should not be separated by more than the maximum word gap. Capital letters in Figure 5 represent words and their complements; we call the gap between a word and its complement a complement gap and the gap between words a word gap.

[Figure 5. Word and complement gap: s = acac AAAT acac CACT acac ATTT acac AGTG acac, with the complement gaps and word gaps marked]

N-pair patterns generate many different multiple-pair palindromic patterns. In Section 3.2.1 we explain an algorithm to generate all possible n-pair palindromic patterns. We can abbreviate some patterns when they are isomorphic when viewed as a graph (details in Sec. 3.2.2).

There can be many palindromic pairs in a DNA sequence. Once we pick a number n of the longest palindromic pairs, we can represent them as an n-pair palindromic pattern. We call this a structural representation. One can identify certain DNA sequences by using enough palindromic pairs. This structure can be implemented simply, and we call the algorithm the structural representation algorithm (SRA). More details of SRA are explained in Section 3.2.3. Different genomes appear to have different structural representations.

2.3. Input/output data

Our research focuses on collecting palindromic pairs from a DNA sequence. In collecting palindromes there are several options (constraints). We may need to set the token (n-gram) length. Sometimes we may want only long palindromic pairs, so we need to set the minimum length (this can be the same as the length of a token).
We can assign a gap between a word and its complement and a gap between different words (minimum and maximum complement gaps, and minimum and maximum word gaps). These gaps can take any value from 0 through the sequence length. The length of an error means the number of character mismatches allowed when tokens are combined. An example of input data is shown in Figure 6.

A DNA sequence: .. aaattaata .. aataaaaaga .. gataaact ... tattaattt ... tctttttatt agtttatc .
Token length: 8
Minimum length of a palindromic pair: 9
Minimum complement gap: 0 (< sequence size)
Maximum complement gap: the size of a sequence
Minimum word gap: 0 (< sequence size)
Maximum word gap: the size of a sequence
Length of an error: 0
Figure 6. Input data

The output consists of variable length palindromes. An example is shown in Figure 7. Our program continues to build n-pair palindromic patterns and counts the number of these patterns. The numbers of the different types of n-pair palindromic patterns are also output data; in addition, by using a number of collected pairs we represent the structure of a DNA sequence by combining them and by plotting them in a circle.

Word: aaattaata, aataaaaaga
Complement: tattaattt, tctttttatt
Figure 7. Output data (each string is listed with its position)

3. Approach

When the objects that we are searching for are not known and lie in an enormous amount of data, the problem becomes difficult. For this kind of problem we propose a new searching method, called Break and Gather. It breaks the problem of finding long palindromes into finding small, fixed-size chunks, then gathers and combines the chunks that can be combined. Another advantage of this Break and Gather method is

its easy adaptability to other pattern searching problems (e.g., searching for direct repeats). We explain the palindrome algorithm (PA), which detects palindromic pairs with variable length and gaps in a DNA sequence, in Section 3.1 and its subsections. In addition, the collected palindromic pairs can then be used for generating multi-pair palindromic patterns by synthesizing and matching, which is explained in Section 3.2.

3.1. Palindrome algorithm

We use n-gram tokens to search a DNA sequence for words. We briefly explain the process here. First we set the length of a token (LTK). We treat characters as numbers: a = 0, c = 1, g = 2 and t = 3. The encoding of a token is explained in Section 3.1.1. Each token is stored in an open hash table, an array of pointers (details in Section 3.1.2). As we store a token, the existence of its complement is also checked. We do not convert each token to its complement and search the hash table; instead, we pass the complement together with the token. The complement gaps are considered while searching for a complement token. All the token pairs that meet the constraints are retrieved from the DNA sequence. If a valid token pair that meets the complement gaps is found, it is stored in a token table (TT) that holds the token, the complement token, and their positions. This process is explained in Section 3.1.3. The structure of a token table is shown in Figure 8.

[Figure 8. Structure of the token table: each entry holds a token, its complement, and pointers to their positions]

Once the whole DNA sequence is scanned, the token table is sorted by position. When all the pairs are sorted by position, some tokens and complements are adjacent to the next token and complement. It is clear that those pairs can be combined. Therefore, we combine adjacent tokens and store them in a combined token table (CTT). This is illustrated in Section 3.1.4. Out of the combined token table we select only the combined tokens longer than the minimum word length and store them in a word table (WT); those results are the palindromic pairs.
The following subsections explain the details.

3.1.1. Token structure

Even though the goal is to collect long variable length palindromes, as a first step we break a DNA sequence into small tokens and start to search for palindromic token pairs. Since we do not know the maximal size of the palindromes in a DNA sequence, we start from a small size. To reduce the time we treat the characters as numbers. The problem is how to convert a long run of characters into a number, and how to reduce the time complexity of converting a token to a number.

We encode and save a token as a number. The characters are treated as numbers: a = 0, c = 1, g = 2 and t = 3. For instance, a string s = caacgt is calculated like this: $1\cdot4^5 + 0\cdot4^4 + 0\cdot4^3 + 1\cdot4^2 + 2\cdot4^1 + 3\cdot4^0 = 1051$. The number 1051 exactly signifies the string caacgt. Since we are converting characters into a number, there is a limitation on the token size. We use a signed long integer in C++. Of course, other data types could be used, such as unsigned long long, which has a much larger maximum value. The maximum value of a signed long integer is 2,147,483,647. This implies that an integer can only store 15 characters ($4^{15} < 2{,}147{,}483{,}647 < 4^{16}$). However, the length of a token (LTK) could be longer than 15. So we divide a token into two parts: a hash index and list indexes (the first 11 characters are used for the hash index and the remainder is grouped into 15-character chunks, except the last chunk, which may be smaller), as shown in Figure 9. The reason why we assign 11 characters to the hash index, and the structure of the open hash table, are explained in the next section.

[Figure 9. Structure of a token: a hash index followed by list index 1, list index 2, ..., list index n]

For example, when the token length is 50, the first 11 characters are used for the hash index, the next 15 characters for the first list index, the next 15 characters for the second list index, and the last 9 characters for the third list index.
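The encoding and the hash-index/list-index split described above can be sketched in Python as follows. This is an illustration only, not the authors' C++ code: the names `encode`, `split_token` and `MAX_CHUNK` are hypothetical, and Python integers are unbounded, so the 15-character limit is imposed here purely to mirror the 32-bit signed integer constraint discussed in the text.

```python
# Quaternary digit for each base, as given in the text.
CODE = {"a": 0, "c": 1, "g": 2, "t": 3}

# Longest chunk whose code fits in a 32-bit signed integer:
# 4**15 < 2**31 - 1 < 4**16.
MAX_CHUNK = 15


def encode(chunk):
    """Encode a string over a/c/g/t as a base-4 number (first character most significant)."""
    value = 0
    for ch in chunk:
        value = 4 * value + CODE[ch]
    return value


def split_token(token, hash_len):
    """Split a token into a hash-index part and list-index parts of at most MAX_CHUNK characters."""
    parts = [token[:hash_len]]
    for start in range(hash_len, len(token), MAX_CHUNK):
        parts.append(token[start:start + MAX_CHUNK])
    return parts


if __name__ == "__main__":
    print(encode("caacgt"))
    print([len(part) for part in split_token("a" * 50, hash_len=11)])
```

With a hash index of 11 characters, a 50-character token splits into parts of length 11, 15, 15 and 9, matching the example in the text.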
When a fixed length token is encoded into a number, the cost depends on the token length (LTK). Given that the time complexity of converting a token to a number is O(LTK), the total time complexity of converting all tokens in a DNA sequence is O(LD × LTK), where LD stands for the length of the sequence. However, the index of the next word can be computed incrementally, simply by scanning one new character. We can view a string of c consecutive characters as representing a length-c quaternary number. Given a string T[1..d], let $t_s$ denote the quaternary value of the length-c substring T[s+1..s+c], for s = 0, 1, ..., d−c. Certainly, $t_s = p$ if and only if T[s+1..s+c] = P[1..c], where p is the quaternary value of a length-c string P[1..c]. We can compute p in time O(c):

$p = P[c] + 4(P[c-1] + 4(P[c-2] + \cdots + 4(P[2] + 4P[1]) \cdots))$.
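The Horner-style evaluation and the incremental computation mentioned above can be sketched as follows. This is a hedged illustration in the style of a Rabin-Karp rolling value without a modulus; the names `encode`, `roll` and `rolling_codes` are hypothetical, and indices are 0-based here rather than 1-based as in the text.

```python
CODE = {"a": 0, "c": 1, "g": 2, "t": 3}


def encode(chunk):
    """Horner evaluation: O(len(chunk)) time, first character most significant."""
    value = 0
    for ch in chunk:
        value = 4 * value + CODE[ch]
    return value


def roll(old_value, old_char, new_char, index_length):
    """Constant-time update: drop the leading character, shift, append the new one."""
    base = 4 ** (index_length - 1)
    return 4 * (old_value - CODE[old_char] * base) + CODE[new_char]


def rolling_codes(seq, c):
    """Codes of every length-c window of seq, computed incrementally in O(len(seq))."""
    value = encode(seq[:c])
    codes = [value]
    for s in range(len(seq) - c):
        value = roll(value, seq[s], seq[s + c], c)
        codes.append(value)
    return codes


if __name__ == "__main__":
    print(rolling_codes("acggtgatg", 3))
```

Only the first window pays the O(c) cost; every later window is obtained from its predecessor with one subtraction, one multiplication and one addition.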

To compute the remaining values $t_1, t_2, \ldots, t_{d-c}$ in time O(d−c), it suffices to observe that $t_{s+1}$ can be computed from $t_s$ in constant time [4], since

$t_{s+1} = 4(t_s - 4^{c-1}\,T[s+1]) + T[s+c+1]$.

For example, let us say a token (LTK is 8) is acggtgat, the next token is cggtgatg, the hash word length is 3 and the length of each list word is 3. First, acg is encoded and 6 ($0\cdot4^2 + 1\cdot4^1 + 2\cdot4^0 = 6$) is kept as the hash index, gtg is encoded as 46 and kept as list index 1, and at is encoded as 3 for list index 2. The first character a of the hash index in the token becomes the old character for the new hash index, while the g in list index 1 becomes the new character in the hash index. Similar shifts occur for the other indices. This is depicted in Figure 10. The values for the new token are calculated as

new value = 4 × (old value − old char × base) + new char,

where base is $4^{\text{index length}-1}$ and index length is the number of converted characters in the index. For example, the new hash value is 26 ($26 = 4 \times (6 - 0 \times 4^2) + 2$), where 6 is the old value, 0 is the old character (a), $4^2$ is the base, the index length is 3, and 2 is the new character (g).

[Figure 10. Converting character to number: the old and new characters of the hash index and of each list index when moving from the old token acggtgat to the new token cggtgatg]

How to get the old and new characters is explained in Section 3.1.3.

3.1.2. Hash table

While scanning a DNA sequence, each token is stored in an open hash table. Since we are trying to build a token table (a table that stores tokens and their complements with positions), every time a token is scanned the existence of its complement should be checked as well, and if it is there, both should be stored in the token table. To increase the searching speed we use an open hash table. An open hash table is basically just an array of pointers to linked lists that store tokens.
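A minimal sketch of such an open hash table (separate chaining) is given below. It is not the authors' implementation: Python lists stand in for the linked lists, the names `build_hash_table` and `hash_len` are hypothetical, positions are 0-based, and each token is re-encoded from scratch for brevity instead of using the incremental update.

```python
CODE = {"a": 0, "c": 1, "g": 2, "t": 3}


def encode(chunk):
    """Base-4 value of a string over a/c/g/t."""
    value = 0
    for ch in chunk:
        value = 4 * value + CODE[ch]
    return value


def build_hash_table(seq, ltk, hash_len):
    """Store every length-ltk token of seq in a chained hash table.

    The bucket index is the code of the first hash_len characters; each
    chain entry is (token, position array).
    """
    buckets = [[] for _ in range(4 ** hash_len)]
    for pos in range(len(seq) - ltk + 1):
        token = seq[pos:pos + ltk]
        chain = buckets[encode(token[:hash_len])]
        for entry in chain:
            if entry[0] == token:
                entry[1].append(pos)  # token seen before: extend its position array
                break
        else:
            chain.append((token, [pos]))  # first occurrence of this token
    return buckets


if __name__ == "__main__":
    table = build_hash_table("acgacg", ltk=3, hash_len=2)
    print(table[encode("ac")])
```

In this sketch the table size is fixed by `hash_len` alone, which is why the choice of the hash index length matters for both memory and chain length.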
The structure is shown in Figure 11. The structure is very simple, so we focus on explaining the size of the hash table. The size of the hash table is related not to the token size but to the size of the DNA sequence (LD). The program saves all the captured tokens and their positions; therefore, the space complexity is O(LD × LTK). The number of tokens in a DNA sequence is LD − LTK + 1 ≈ LD. Therefore, we take the optimal open hash table size to be LD, and the optimal length of the hash character set is $\log_4 LD$. For instance, the size of E.coli K12 is 4,639,221, and the number of characters used for the hash index is 11 ($11.07 \approx \log_4 4{,}639{,}221$).

[Figure 11. Hash table: an array of pointers to lists of tokens, each token with its position array]

3.1.3. Finding a complement token

Above we explained how a token is inserted into the hash table. It is also necessary to check whether its complement already exists. If we converted the token to its complement's number every time to check the existence of a complement, the time complexity would be O(LD × LTK). Therefore, instead of converting the characters we pass the complement as a value. Here we explain how to compute the complement token efficiently and how to get the old and new characters for each token conversion. Later we explain how to build the token table.

As mentioned above, we do not convert each token to its complement and search the hash table; instead, we pass its complement together with the token. We now explain how to determine the characters newly added to or removed from a token in constant time. We devised a special type of queue, called an open queue, which removes the first element and takes a new element when it is full, and which has a Peek method that reads the character at a given position in addition to the standard Enqueue and Dequeue methods. We made two queues whose size is

token length + 1, both of the open queue type: one is called the queue and the other is called the requeue. While scanning, all the words pass through the queue and all complement tokens through the requeue. One grows from the right and the other from the left, as shown in Figure 12. The time complexity of passing through or peeking into a queue is O(1). By peeking at certain positions, the new characters and old characters for each section of a word are picked.

[Figure 12. Open queue and open requeue: the queue holds a c g g t g a t g c and the requeue holds g c a t c a c c g t; the peeked positions give the old and new characters of the hash index and of each list index]

The complement gaps are considered while searching for a complement token. All the token pairs that meet the constraints are retrieved from the sequence. If a valid token pair is found, it is saved in a token table (TT) that holds the token, the complement token, and their positions. A token table is made from position arrays as follows: each token instance has a position array that stores all positions where the token occurred. When a position is added to this array, the position array of the complement token is checked, and if there is a valid pair (one with enough gap) the position pair is sent to the token table. After this, the array index of the smaller position value is advanced so that this position will not be used for future comparisons in pair searching. For example, in the case of Figure 13, the first position value arrives and is stored in the position array of a token. At this moment no comparison occurs, since the complement token's position array is empty. When the next value, 59, arrives, it forms a pair with the first position, and the first position value will not be used again. This pair is sent to the token table. When position value 750 arrives, it pairs with 59.
And the pair (59, 750) is sent to the token table. Usually the position arrays are not very long; a long array means there are many repeated tokens and palindromic pairs. To collect all possible overlapping pairs, the hashing process and the complement searching process should be separated: searching for the complements of each token has to be done after all the tokens are stored in the hash table. The token table then consists of the Cartesian products of the two sets of positions of words and their complements, where the position of a token appears ahead of its complement. This separation ensures that we find the longest palindromic pairs as well, and the time complexity does not change.

[Figure 13. Finding a token pair: the position array of a token, the position array of its complement token, and the resulting token table (TT) entries]

Long palindromes are rare and considered more informative, so we prefer longer palindromes over shorter ones. Furthermore, if the token size is too small there may be too many token pairs, which leads to space problems in our algorithm. Let us suppose a token length of 2 and four different characters. Then the average distance at which a token's palindrome occurs can be calculated in the uniformly random case: there are 16 ($4^2$) possible tokens and, since the string is random, all 16 cases are distributed evenly. The probability of the next occurrence of the same token is 1/16, but the palindromic token is always within the 16 tokens, so we need to divide the distance by 2. The average distance to its complement is $4^{LTK}/2$, where 4 is the number of different characters (a, c, g, t). Generally, a palindrome that can occur in a random sequence may not be informative; therefore, we set LTK larger than the length that can statistically occur in a random sequence.

3.1.4. Combining adjacent tokens

So far we have explained how to break a huge string into parts and locate token pairs. Now we explain how to combine the parts.
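Before moving on to combination, the Cartesian-product pairing described above can be sketched as follows. This is only an illustration under stated assumptions: the name `token_pairs` is hypothetical, the complement gap is taken to be the distance from the end of the token to the start of its complement token, and the position values in the usage example are made up.

```python
def token_pairs(token_positions, complement_positions, ltk, min_gap=0, max_gap=None):
    """Pair every token position with every later complement position.

    A pair (p, q) is kept when the gap between the end of the token at p
    and the start of the complement at q lies within [min_gap, max_gap].
    max_gap=None means no upper limit.
    """
    pairs = []
    for p in token_positions:
        for q in complement_positions:
            gap = q - (p + ltk)
            if gap < min_gap:
                continue  # complement starts before the token ends, or too close
            if max_gap is not None and gap > max_gap:
                continue  # complement too far away
            pairs.append((p, q))
    return pairs


if __name__ == "__main__":
    # Hypothetical position arrays for one token and its complement token.
    print(token_pairs([13, 300], [59, 750], ltk=8))
```

Because the pairing runs only after the whole sequence has been hashed, a token occurrence can be paired with several complement occurrences, which is what allows overlapping pairs to be collected.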
Once the whole DNA sequence is scanned, the token table is sorted by position. When all the pairs are sorted by position, some tokens and complements are adjacent to the next token and complement. It is clear that those pairs can be combined. Therefore, we combine adjacent tokens and store them in a combined token table (CTT). In the example of Figure 14, the first two tokens and their complement tokens overlap except for one character. We can combine these into one larger word; this procedure is

called combination. The combined token table shows fully connected tokens. Without allowing errors, the position of the next word can be represented as p+1, where p is the position of a word. When we allow E errors (E is the number of error characters), the position of the next word lies between p+1 and p+1+E. We keep combining as long as the next positions of a word and its complement are within this range.

Token table (TT)
Token: aaattaat, aattaata, aataaaaa, ataaaaag, taaaaaga, gataaact
Complement token: attaattt, tattaatt, tttttatt, ctttttat, tcttttta, agtttatc

Combined token table (CTT)
Token: aaattaata, aataaaaaga, gataaact
Complement token: tattaattt, tctttttatt, agtttatc

Figure 14. Combining adjacent tokens (each string is listed with its position)

PA is not a complete algorithm, which means there can exist palindromic pairs that PA cannot find. For example, consider a sequence atatatat...atatat (of size S) that consists of only two complementary characters, a and t. If we do not allow a complement gap, the size of the palindromic pair should be half of the sequence (S/2). But our algorithm may not be able to find it. Suppose the token length is LTK; then the size of the word and of its palindrome will be (S − LTK) and they overlap, so this pair will be removed. To check for this problem we check each token that consists of only two complementary characters. Compensating for this problem is left as future work.

To summarize, we store tokens and complements with positions in a hash table. After that we build a token table that consists of the Cartesian products of the two sets of positions of words and their complements, where a word always comes ahead of its complement. Then we sort the token table and combine tokens and complements.
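The combination step, for the error-free case (E = 0), can be sketched as follows. This is a simplified illustration, not the authors' code: the names `combine_adjacent` and `select_words` are hypothetical, the position values are made up (chosen so that the word lengths match Figure 14), and the sketch only extends the most recently opened chain, so interleaved chains would need extra bookkeeping.

```python
def combine_adjacent(pairs, ltk):
    """Merge adjacent token pairs into longer words (no errors allowed).

    pairs: (token_pos, complement_pos) tuples.
    Returns (word_pos, word_length, complement_pos) tuples.
    When a word grows by one character to the right, its reverse
    complement grows by one character to the left.
    """
    combined = []
    for p, q in sorted(pairs):
        if combined:
            start, length, comp_start = combined[-1]
            last_token_pos = start + length - ltk
            if p == last_token_pos + 1 and q == comp_start - 1:
                combined[-1] = (start, length + 1, q)
                continue
        combined.append((p, ltk, q))
    return combined


def select_words(combined, min_length):
    """Keep combined tokens at least min_length long (assumed inclusive bound)."""
    return [entry for entry in combined if entry[1] >= min_length]


if __name__ == "__main__":
    # Hypothetical token table with token length 8.
    tt = [(13, 259), (14, 258), (150, 378), (151, 377), (152, 376), (200, 500)]
    ctt = combine_adjacent(tt, ltk=8)
    print(ctt)
    print(select_words(ctt, min_length=9))
```

With these made-up positions the three chains have lengths 9, 10 and 8, so a minimum length of 9 drops the last one, in the same way the 8-character token in Figure 14 is not selected.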
PA () {
  (1) Scan the sequence and build TT by using the open hash table
  (2) Sort the TT in the order of position
  (3) Combine adjacent tokens and palindromes
  (4) Select combined tokens longer than the minimum length of a palindromic pair and store the selected ones in WT
  (5) return WT
}
Figure 15. Palindromic algorithm

The combined tokens are not considered words yet. Only the ones whose length reaches the minimum word length are stored in the word table (WT). For instance, if the minimum word length is 9, the 8-character combined token gataaact in Figure 14 is not selected for the word table.

3.2. Palindromic pattern algorithm

Knowing palindromes is useful, but only for predicting potential hairpins. Since structural information is important, we try to find more complicated structures (palindromic patterns). Even though we can generate very complicated patterns using multiple palindromes, we focus on patterns with a small, fixed number of pairs when counting patterns in our experiment. The algorithm that generates all possible n-pair palindromic patterns is explained in Section 3.2.1. After generating all possible palindromic patterns, we count each palindromic pattern to find the dominant patterns. Furthermore, for visualization, we present a simplification of the patterns, explained in Section 3.2.2. It may be possible to create a specific n-pair palindromic pattern that can identify a DNA sequence if we use a sufficient number of palindromic pairs; this algorithm is detailed in Section 3.2.3.

3.2.1. Generating all n-pair palindromic patterns

This section explains an algorithm for generating n-pair palindromic patterns, called the palindromic pattern algorithm (PPA). This algorithm can be applied to generating n-pair patterns for any n. When n words and their complements exist in a string s, the words are numbered so that word i+1 always appears after word i and, by convention, each complement appears after its word. The n words and their complements can make various combinations, called n-pair palindromic patterns: for example, {(1,1)} for n=1 and {(1,1,2,2), (1,2,1,2), (1,2,2,1)} for n=2, where the first occurrence of a number is the word and the second is its complement. When n is small, one can build the possible patterns intuitively. But when n is big there will be many multi-pair palindromic patterns.

As an example, the case of 3-pair palindromic patterns is shown in Figure 16. First, the three words 1, 2 and 3 are assigned to the root node, where word 2 occurs after word 1 and word 3 occurs after word 2. Positioning the complement of word 3 generates one child node, because it can occur only after word 3; positioning the complement of word 2 generates three child nodes, because it has three possible positions: between word 2 and word 3, between word 3 and its complement, and after the complement of word 3. We follow a Closest

Position First (CPF) rule. Finally, by positioning the complement of word 1 in each possible location, fifteen 3-pair palindromic patterns are generated. An integer value is assigned to each pattern from left to right in the leaf nodes in Figure 16, where each number stands for a word.

[Figure 16. 3-pair palindromic patterns: the pattern tree with its fifteen leaf patterns, numbered 0 to 14]

The number of leaf nodes for n = 1, 2, and 3 is 1, 3, and 15 respectively. Let L(n) denote this sequence. Intuitively, we can see that the number of leaf nodes L(n) in a palindromic pattern tree can be calculated by multiplying L(n−1) by 2n−1, where L(1) = 1. Notice that 2n−1 is simply the number of words in the penultimate level of the tree. For example, for 2-pairs the number of child nodes is 3, and there are 5 positions to insert into when building 3-pair patterns.

$L(1) = 1$
$L(n) = (2n-1)\,L(n-1) = (2n-1)(2n-3)\cdots 7\cdot 5\cdot 3\cdot 1$

* String I = the words in order (ex. 1 2 ... n)
* Pattern S = {I} // store set of strings
* Stack = the complements of the words (ex. the complements of 1 ... n)
PPA (Stack, S) {
  If (Stack is empty)
    return S
  End if
  c = Stack.pop ()
  For (each e in S)
    S' = S' ∪ <insert c into all possible places in e, from the place closest to its word>
  End of For
  PPA (Stack, S')
}
Figure 17. Palindromic pattern algorithm

3.2.2. Simplification of palindromic patterns

The 3-pair palindromic patterns comprise 15 different patterns. Higher-dimensional palindromic patterns will generate more patterns. Some of these patterns are isomorphic when viewed as a graph. To abbreviate the pattern types we disregard direction, so that some patterns form the same shape. For example, two of the patterns in Figure 16 become one type by graph similarity. This is illustrated in Figure 18.

[Figure 18. Similar pattern types in 3-dimensions]

3.2.3. Structural representation algorithm

Above we explained an algorithm to generate all possible n-pair palindromic patterns. We can expect that if we have collected enough palindromic pairs, then a palindromic pattern can identify a certain DNA sequence.
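Returning to the palindromic pattern algorithm of Figure 17, the generation step can be sketched iteratively in Python. This is an illustration under assumptions: the name `palindromic_patterns` is hypothetical, patterns are written as integer tuples in which the first occurrence of a number is the word and the second occurrence is its complement, and the order in which patterns are emitted (closest insertion place first) is one reading of the CPF rule.

```python
def palindromic_patterns(n):
    """Generate all n-pair palindromic patterns as integer tuples.

    Start from the words 1..n in order, then insert the complement of each
    word (from the last word back to the first) into every place after
    that word, closest place first.
    """
    patterns = [tuple(range(1, n + 1))]
    for word in range(n, 0, -1):
        next_patterns = []
        for pattern in patterns:
            first = pattern.index(word)  # position of the word itself
            for slot in range(first + 1, len(pattern) + 1):
                next_patterns.append(pattern[:slot] + (word,) + pattern[slot:])
        patterns = next_patterns
    return patterns


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, len(palindromic_patterns(n)))
```

The number of generated patterns follows the recurrence L(n) = (2n−1) L(n−1) given above, because the complement of word 1 is inserted last and has 2n−1 possible places.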
We call this a structural representation, and the algorithm is called the structural representation algorithm (SRA). The SRA takes the sequence of pairs as an input parameter. Suppose (1) the input is two rows of data that store the positions of the palindromic pairs. Then (2) we sort them by the position of the words, and (3) assign the same integer id to each word and its palindrome. (4) We join the two rows into one long row, (5) sort the row by position, and (6) read off the integer ids.

* PW is the positions of words and their complements
SRA (PW) {
  (1) sort them by position of words
  (2) assign an integer tag to each word and its complement
  (3) make the two rows into one long row
  (4) sort the row by position
  (5) return the integer tags
}
Figure 19. Structural representation algorithm
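The steps of Figure 19 can be sketched in a few lines of Python. The name `structural_representation` and the input layout (a list of (word position, complement position) tuples) are assumptions made for this illustration, and the positions in the usage example are made up.

```python
def structural_representation(pairs):
    """Return the sequence of integer tags describing how the pairs nest and interleave.

    pairs: (word_pos, complement_pos) tuples.
    """
    tagged = []
    # Steps 1-2: sort by word position and give each word and its complement the same tag.
    for tag, (word_pos, comp_pos) in enumerate(sorted(pairs), start=1):
        # Step 3: put words and complements into one long row.
        tagged.append((word_pos, tag))
        tagged.append((comp_pos, tag))
    # Steps 4-5: sort the row by position and read off the tags.
    return [tag for _, tag in sorted(tagged)]


if __name__ == "__main__":
    # Hypothetical positions of three palindromic pairs.
    print(structural_representation([(10, 90), (20, 40), (50, 70)]))
```

The result is an n-pair palindromic pattern in the integer notation used earlier, so the same pattern types can be compared across sequences.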

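The insertion procedure of Figure 7 and the leaf count L(n) can be cross-checked by brute-force enumeration. The sketch below assumes the recurrence L(n) = (2n-1) L(n-1) with L(1) = 1, which agrees with the fifteen patterns reported for three pairs. The names `palindromic_patterns` and `leaf_count`, the encoding of a complement as a negative integer, and the order in which complements are taken from the stack are all assumptions made for illustration.

```python
def palindromic_patterns(n):
    """Enumerate every n-pair pattern: word k is k, its complement is -k."""
    patterns = [list(range(1, n + 1))]      # the words in order, no complements yet
    for k in range(n, 0, -1):               # take complements n, n-1, ..., 1
        grown = []
        for pattern in patterns:
            first_slot = pattern.index(k) + 1   # a complement must follow its word
            for slot in range(first_slot, len(pattern) + 1):
                grown.append(pattern[:slot] + [-k] + pattern[slot:])
        patterns = grown
    return patterns


def leaf_count(n):
    """Closed form of the assumed recurrence: 1 * 3 * 5 * ... * (2n - 1)."""
    count = 1
    for m in range(2, n + 1):
        count *= 2 * m - 1
    return count


for n in range(1, 5):
    print(n, len(palindromic_patterns(n)), leaf_count(n))
```

Each complement is inserted only after its own word, so every generated string is a distinct valid pattern and the enumerated count matches the product of odd numbers.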
4. Experiment

The purpose of our research is to identify palindromes globally. Experiments were conducted on the DNA bases of Escherichia coli K (E.coli K) [13], Escherichia coli o57 (E.coli o57) [12], and Salmonella [11], but only the result for E.coli K (ACCESSION U00096, LOCUS ECOLI, 469 bp, DNA circular, BCT 0-SEP-997) is presented in this paper, since the purpose of this conference is more closely related to computation than to biological results. The results for the remaining sequences can be provided upon request. The size of those genomes is around 5 MB. Even though this is not as large as the human genome, it is big enough to test the performance of our algorithm. We will focus on the number of palindromic pairs collected, the number of 3-pair palindromic patterns, and the structural representation of this genome. The computer that we used was a UNIX Sun Ultra 60 Workstation. We used C++ for the implementation.

5. Analysis

Time complexity is one of the main evaluation criteria. To measure the time complexity we list the main processes in Table 1. The time complexity for scanning and building a token table is O(LD), where LD is the size of a DNA sequence; the space complexity is O(LD LTK), where LTK is the length of a token. The space complexity can be reduced by storing pointers to the tokens. Sorting the token table takes O(TT log TT), where TT is the size of the token table, but generally LD is much bigger than TT, so the time complexity remains O(LD). CTT, the size of the combined token table, is also much smaller than LD and TT. We can therefore still claim that the time complexity of this algorithm is O(LD). Tsunoda [16] introduced an algorithm with O(N log N), where N is the size of the DNA sequence. Logically our algorithm has linear time complexity in the length of a DNA sequence, but it is still affected by LTK when LTK is significantly long.

Table 1. Main processes

Processing                               Execution time     Memory space
1. Scanning and building token table     O(LD)              O(LD LTK)
2. Sorting the token table               O(TT log TT)       O(TT)
3. Building adjusted token table         O(CTT)             O(CTT)
4. Collecting palindromic pairs          O(CTT log CTT)

* LD is the size of a DNA sequence
* LTK is a token size
* TT is the size of the token table
* CTT is the size of the combined token table

The following focuses on the efficiency of PA: the number of long palindromic pairs. We also introduced a palindromic pattern algorithm (PPA) and a structural representation algorithm (SRA); we present the results of these algorithms as well. The input sequence was the E.coli K genome. The other options were the token length (50), minimum length of a palindromic pair (50), minimum complement gap (0), maximum complement gap (without limitation), minimum word gap (0), and maximum word gap (without limitation). We chose a token length of 50 because this length produced a token table of reasonable size. To simplify our experiment we did not assign any constraints to the gaps and errors. The base counts were ,4,6 for a, ,79,4 for c, ,76,775 for g, and ,40,877 for t. There were no characters other than the four base characters.

The longest word pair length was ,456, which is unusual in random sequences; in a random sequence with a length of 4,69,, the maximum word pair length is : the maximum word length in the random case is log 4,69,. The number of word pairs with length 986 and longer was 4, the number of word pairs with length between 58 and 985 was 7, and the number of word pairs with length between 50 and 57 was 06. The total number of word pairs longer than 50 was 7. The results from E.coli K- were surprising because there were a lot of very long palindromic pairs. We confirmed that is generally the longest word pair length in a random sequence of length 4,69,. Three random sequences with 4,69, characters were generated and tested; in all of them the longest word length was . Intuitively, in the random case the average distance of repetition of a word of length ,456 is 4 raised to the power ,456, where 4 is the number of different characters.
The probability of the repetition of the word in a sequence of length 4,69, is 469/ Since we are looking for a palindromic pair of arbitrary length within a sequence, we divide the value by ; the probability of having a palindromic pair of length ,456 in a random sequence of length 4,69, is (469 )/ This shows that E.coli K is not a random sequence. E.coli o57 [12] and Salmonella [11] also had much longer palindromic pairs, which showed that these sequences are not random either. The numbers of pattern types 0 to 14 are listed in Table 2. The total pattern number in three dimensions was ,080,76. A bigger number means that there are more multi-pair palindromic patterns in a string that are of interest to a biologist. The actual running time on a UNIX Sun Ultra 60 Workstation was .04, .0, and .0 minutes for three tests (average .0 minutes), of which .7, .5, and .49 minutes were spent collecting palindromic pairs (average .40 minutes).
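The random-sequence argument above can be made concrete with a short calculation. The sketch below assumes that the "log" of the sequence length is taken in base 4 (the alphabet size) and that the chance of a repeated word of length L in a sequence of length N is estimated as N / 4^L; the extra division factor mentioned in the text is not reproduced because its value is not legible here. The function names and the input values are hypothetical and are not the genome's figures.

```python
import math


def expected_longest_match(seq_len, alphabet_size=4):
    """Rough random-case scale for the longest matching word: log_k(N)."""
    return math.log(seq_len) / math.log(alphabet_size)


def log10_repeat_chance(seq_len, word_len, alphabet_size=4):
    """log10 of N / k**L, kept in log space so a large L does not underflow."""
    return math.log10(seq_len) - word_len * math.log10(alphabet_size)


# Hypothetical inputs, used only to show the orders of magnitude involved.
n = 1_000_000
print(round(expected_longest_match(n), 2))
print(round(log10_repeat_chance(n, 1000), 1))
```

Working with the logarithm of the estimate avoids floating-point underflow, which a direct evaluation of N / 4^L would hit for words that are hundreds or thousands of characters long.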

Table 2. Summarized results of E.coli K

Description                               Value
Token length                              50
Minimum length of a palindromic pair      50
Minimum complement gap                    0
Maximum complement gap                    Without limitation
Minimum word gap                          0
Maximum word gap                          Without limitation
Total DNA sequence                        4,69,
a                                         ,4,6
c                                         ,79,4
g                                         ,76,775
t                                         ,40,877
others                                    0
Longest pair length                       ,456
Longer than or equal to 986               4
Between 58 and 985                        7
Between 50 and 57                         06
Total palindromic pairs                   7
Type 0                                    5,488
Type 1                                    46,77
Type 2                                    ,6
Type 3                                    07,06
Type 4                                    0,55
Type 5                                    5,7
Type 6                                    ,547
Type 7                                    95,46
Type 8                                    95,955
Type 9                                    88,0
Type 10                                   0,65
Type 11                                   85,99
Type 12                                   06,084
Type 13                                   67,94
Type 14                                   40,977
Total palindromic pattern                 ,080,76

We put similar patterns together and show the results in Table 3. The pattern similarity was explained in the section on simplification of palindromic patterns.

Table 3. Combined similar pattern types in E.coli K

Description                               Value
Pattern 0                                 5,488
Pattern 3                                 07,06
Pattern 4                                 0,55
Pattern 9                                 88,0
Pattern 12                                06,084
Pattern 14                                40,977
Pattern 1,5 (,)                           7,964
Pattern 2,10 (,)                          45,7
Pattern 6,7,8 (,,)                        0,98
Pattern 11,13 (,)                         5,94
Total Pattern                             ,080,76

We chose the 0 longest palindromic patterns, as shown in Table 4. The number 0 was picked arbitrarily. We identified each pair by index; we present the word length and the start and end positions of each word and its complement, from the left. The end position of a word or its complement can be calculated by adding the start position and the word length. To check for overlap we present the end positions in the table as well.

Table 4. Long word pairs in E.coli
(columns: Index; Word length; Word start position; Word end position; Complement start position; Complement end position)

These 0 pairs can identify different types (=9 7 5; this formula was shown in the discussion of L(n)). We represented the words and complements in the order of their positions. Index 9 appeared first, then index 8, and so on.
Each number represents the index of a word or complement (the first of two identical numbers represents a word and the second one represents its complement). Instead of indexing by the order of palindrome length, we changed the index w.r.t. the position of the words. We call this sequence a 0-pair palindromic pattern structural representation of E.coli K. When we obtained this structural representation for the other two genomes (E.coli o57 and Salmonella) we could observe that they are different from each other. Even though we could not prove theoretically that this value can identify genomes, it appeared that different genomes have different structural representations.

Jensen visualized complete microbial chromosomes in a circle so that repetitive sequences in base composition or DNA structure became visible [10]. We borrowed their idea and presented some of the longest pairs to see the DNA structure. Dotting all the palindromic pairs can be confusing, and we are more interested in the structure of some very long pairs. The five longest word pairs were dotted in a circle in Figure 0. One of the biggest surprises in genetics is a loop in mRNA. It seems that eukaryotic genes contain loads of junk messages, that is, non-coding sequences. The genes themselves are fragmented [5]. E.coli K- is not a eukaryotic genome, but it has similar loops. The 1st, 2nd, and 5th palindromic pairs are close together; therefore, matching them can generate small loops. A genome can be symmetric about the origin of replication, which indicates that matching sequences tend to occur at the same distance from the origin [6]. However, the circle with the five longest word pairs in Figure 0 does not show any symmetric pattern. The only observation we can make is that the 3rd and 4th pairs show a somewhat symmetric pattern. E.coli o57 and Salmonella did not show symmetry either.

Figure 0. Word pairs dotted in a circle (legend: 1st to 5th pair, positions marked from the origin)

6. Related work

Bailey [15] developed a program that can detect palindromes; but, because it only applies a complementary symmetry matrix for each position, it could not detect general palindromic sequences with gaps of arbitrary length. In his algorithm, the time complexity and memory complexity explode when arbitrary gaps and lengths are considered. To solve this problem Tsunoda [16] devised an algorithm that extracts variable-length palindromes while allowing variable-length gaps. They classified repetitive patterns into four cases: (1) direct repeats, (2) trans-strand repeats, (3) backward repeats, (4) inverted repeats (palindromes). They clearly described the necessity of the various types of repetitive patterns. Their algorithm searched for patterns of those four types with O(N log N) time complexity, where N is the size of the sequences; they found exact matches.
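The four repeat classes listed above can be illustrated by the partner string each one looks for. The mapping used in this sketch is an assumption, since the text only names the classes: direct is read as the word itself, trans-strand as the base-wise complement, backward as the reversed word, and inverted as the reverse complement. The helper names are hypothetical.

```python
# Base-wise complement over the lowercase DNA alphabet.
COMPLEMENT = str.maketrans("acgt", "tgca")


def complement(word):
    """Replace each base by its complementary base."""
    return word.translate(COMPLEMENT)


def reverse_complement(word):
    """Complement the word and read it backwards (the palindromic partner)."""
    return complement(word)[::-1]


def repeat_partners(word):
    """Partner string for each repeat class, under the assumed mapping."""
    return {
        "direct": word,
        "trans-strand": complement(word),
        "backward": word[::-1],
        "inverted": reverse_complement(word),
    }


print(repeat_partners("aacg"))
```

Under this reading, a palindromic pair in the sense of this paper corresponds to the "inverted" entry: a word and, somewhere later in the sequence, its reverse complement.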
Porto [2] allowed errors within palindromes; that is, they introduced approximate palindromes. Their algorithm found all approximate complements, with K errors, of a certain word. The difference is that our algorithm finds a complement for each word. There have been parallel palindrome-searching algorithms that find all palindromes of a certain initial word while allowing complement gaps [1,3].

Jensen [10] presented a method for visualizing complete microbial chromosomes so that repetitive sequences (direct repeats, inverted repeats, etc.) or DNA structure became visible. It is difficult to get an overview of a complete genome due to its size. They used an atlas (a colored circle) for representing sequences and structures. In our research we plotted palindromic pairs. We hope this can give an insight into the symmetry of a genome.

Eisen [6] found symmetry around the replication origin and terminus; that is, the distance of a particular conserved feature (DNA or protein) from the replication origin is conserved between closely related pairs of species. They also found statistically significant x-shaped patterns within some genomes, indicating that there is symmetry about the replication origin. However, in our structural representation and plotting of palindromic pairs the results did not show a clear symmetric shape.

7. Concluding remarks

We found variable-length palindromic pairs instead of fixed-length pairs. The method of finding fixed-length palindromic pairs has the disadvantage that the length must be set in advance, while the length that has biological meaning is not known. Even though we can allow gaps, we did not assign any constraints. We also collected overlapped palindromes. Some errors were allowed in a word or a complement because of mismatches in the form of gaps and defects in a sequence itself [4,7]. We propose a new palindrome algorithm (PA) using a Break and Gather mechanism.
Surprisingly, this algorithm can collect palindromic pairs with time complexity O(LD), where LD is the length of a DNA sequence, which is faster than the O(LD log LD) in [16]. PA consists of four major processes: scanning the sequences and building a token table, sorting the token table, building the combined token table, and collecting palindromic pairs. Another advantage of this Break and Gather method is its easy adaptability to other pattern searching problems.

We tested our algorithm against the well-known E.coli K- genome, as well as other sequences. The longest palindromic pair found was much longer than the maximum length of word pairs that can occur in random sequences, and the number of long palindromic pairs was much larger than we expected. The longest word pair length is ,456 (the maximum length of random palindromic pairs for sequences of (their length) is expected to be ). There are 4 pairs with length 986 or longer and 7 pairs with length between 58 and 985. We concluded that the sequences of E.coli K are not random sequences, because the probability of having pairs of size ,456 is significantly low. E.coli o57 and Salmonella were not random either.

By composing multiple palindromic pairs we could generate multiple-pair palindromic patterns. We proposed a palindromic pattern algorithm (PPA) that can generate all possible combinations of n-pair palindromic pairs. PPA can be applied to generate any n-pair patterns. Using this we counted the existing 3-pair palindromic patterns. In addition, we picked the 0 longest palindromic pairs and represented them as 0-pair palindromic patterns. We also presented a structural representation algorithm (SRA) for this. The structural representations from different DNA sequences (E.coli K, E.coli o57, and Salmonella) yielded different structures, so we assume that this method can identify different genomes.

We also dotted the five longest palindromic pairs in a circle that represents a sequence, to see the symmetry of a genome. Some research showed symmetry of a genome [6], but with this approach E.coli K, E.coli o57, and Salmonella did not show this symmetric shape.

Our algorithm is not a complete algorithm; in some very artificial sequences, even though a palindromic pair exists, our algorithm cannot detect it. To compensate for this weakness we briefly proposed another algorithm; implementing this new approach is part of our future work. Mismatches in subsequences are not considered; we focused on extracting completely matched subsequences since, in general, it is difficult to estimate how mismatches affect the formation of the structure of nucleic acids. Our Break and Gather method can also be used for finding direct repeats with O(N) time complexity, where N is the length of a sequence. Our algorithm can be modified to scan RNA and protein sequences by allowing other characters such as U.
We can extend this research to the human genome. Furthermore, we need to examine our structural representation scheme for identifying DNA sequences by studying multiple sequences experimentally.

8. Acknowledgement

We thank Timothy J. Atkinson for his valuable comments.

9. References

[1] A. Apostolico, D. Breslauer, and Z. Galil, Parallel Detection of All Palindromes in a String, Theoretical Computer Science, 4:, 995, 6-7.
[2] A.H.L. Porto and V.C. Barbosa, Finding Approximate Palindromes in Strings, Pattern Recognition 5, 00,
[3] D. Breslauer and Z. Galil, Finding All Periods and Initial Palindromes of a String in Parallel, Algorithmica 4:4, 995,
[4] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, New York, NY, 997.
[5] L. Gonick and M. Wheelis, The Cartoon Guide To Genetics, New York: Barnes & Noble, c
[6] J.A. Eisen, J.F. Heidelberg, O. White, and S.L. Salzberg, Evidence for Symmetric Chromosomal Inversions Around the Replication Origin in Bacteria, Genome Biology (6), 000.
[7] J. Jurka, Origin and Evaluation of Alu Repetitive Elements, in R.J. Maraia (Ed.), The Impact of Short Interspersed Element (SINEs) on the Host Genome, R.G. Landes, New York, NY, 995, pp 5-4.
[8] J.T.L. Wang, Discovering Active Motifs in Sets of Related Protein Sequences and Using Them for Classification, Nucleic Acids Research, (4), 994,
[9] K. Shishido, N. Komiyama, and S. Ikawa, Increased Production of a Knotted Form of Plasmid pBR DNA in Escherichia coli DNA Topoisomerase Mutants, Journal of Molecular Biology, 00, pp
[10] L.J. Jensen, C. Friis, and D.W. Ussery, Three View of Microbial Genomes, Res. Microbiol. 50, 999,
[11] National Center for Biotechnology Information, Complete genome sequence of Salmonella enterica serovar Typhimurium LT, 00, nucleotide&list_uids=67690&dopt=genbank, 00.
[12] National Center for Biotechnology Information, Genome sequence of enterohaemorrhagic Escherichia coli O57, 00, t_uids=6445&dopt=genbank, 00.
[13] National Center for Biotechnology Information, The Complete Genome Sequence of Escherichia coli K, 997, t_uids=67994&dopt=genbank, Apr. 00.
[14] T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, New York: McGraw-Hill,
[15] T.L. Bailey, Discovering Motifs in DNA and Protein Sequences, Univ. of California at San Diego (Ph.D. dissertation), 995.
[16] T. Tsunoda, M. Fukagawa, and T. Takagi, Time and Memory Efficient Algorithm for Extracting Palindromic and Repetitive Subsequences in Nucleic Acid Sequences, Pacific Symposium on Biocomputing 4, 999, pp. 0-.
[17] X. Guan and E.C. Uberbacher, A Fast Look-Up Algorithm for Detecting Repetitive DNA Sequences, Pacific Symposium on Biocomputing, Singapore, 996,

Discover Activity. Think It Over Inferring Do you think height in humans is controlled by a single gene, as it is in peas? Explain your answer.

Discover Activity. Think It Over Inferring Do you think height in humans is controlled by a single gene, as it is in peas? Explain your answer. Section Human Inheritance Reading Previe Key Concepts What are some patterns of inheritance in humans? What are the functions of the sex chromosomes? What is the relationship beteen genes and the environment?

More information

Generalized idea of Palindrome Pattern Matching: Gapped Palindromes

Generalized idea of Palindrome Pattern Matching: Gapped Palindromes I J C T A, 9(17) 2016, pp. 8635-8642 International Science Press Generalized idea of Palindrome Pattern Matching: Gapped Palindromes Shreyi Mittal * and Sunita Yadav ** ABSTRACT The process of palindrome

More information

Comfort, the Intelligent Home System. Comfort Scene Control Switch

Comfort, the Intelligent Home System. Comfort Scene Control Switch Comfort, the Intelligent Home System Comfort Scene Control Sitch Introduction...1 Specifications...3 Equipment...3 Part Numbers...3 Identifying the SCS Version...4 Contents...4 Assembly...4 Settings...5

More information

LOG- LINEAR ANALYSIS OF FERTILITY USING CENSUS AND SURVEY DATA WITH AN EXAMPLE

LOG- LINEAR ANALYSIS OF FERTILITY USING CENSUS AND SURVEY DATA WITH AN EXAMPLE LOG- LIEAR AALYSIS OF FERTILITY USIG CESUS AD SURVEY DATA WITH A EXAMPLE I. Elaine Allen and Roger C. Avery, Cornell University The use of log -linear models is relatively ne to the field of demography,

More information

Genetic Variations. F1 Generation. Mechanisms of Genetics W W. STAAR Biology: Assessment Activities

Genetic Variations. F1 Generation. Mechanisms of Genetics W W. STAAR Biology: Assessment Activities male parent female parent sperm cells egg cells F1 Generation Mechanisms of Genetics 181 182 Mechanisms of Genetics Teacher Pages Purpose The purpose of this activity is to reinforce students understanding

More information

pain" counted as "two," and so forth. The sum of point, the hour point, and so forth. These data are

pain counted as two, and so forth. The sum of point, the hour point, and so forth. These data are FURTHER STUDIES ON THE "PHARMACOLOGY" OF PLACEBO ADMINISTRATION 1 BY LOUIS LASAGNA, VICTOR G. LATIES, AND J. LAWRENCE DOHAN (From the Department of Medicine (Division of Clinical Pharmacology), and the

More information

Review Questions in Introductory Knowledge... 37

Review Questions in Introductory Knowledge... 37 Table of Contents Preface..... 17 About the Authors... 19 How This Book is Organized... 20 Who Should Buy This Book?... 20 Where to Find Answers to Review Questions and Exercises... 20 How to Report Errata...

More information

Searching for unusual clusters of palindromes and close inversions in the SARS virus genome. June 4, 2003

Searching for unusual clusters of palindromes and close inversions in the SARS virus genome. June 4, 2003 1/33 Searching for unusual clusters of palindromes and close inversions in the SARS virus genome June 4, 2003 Kwok-Pui Choi Department of Mathematics, NUS Department of Statistics and Applied Probability

More information

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data Breast cancer Inferring Transcriptional Module from Breast Cancer Profile Data Breast Cancer and Targeted Therapy Microarray Profile Data Inferring Transcriptional Module Methods CSC 177 Data Warehousing

More information

Chapter 3 Software Packages to Install How to Set Up Python Eclipse How to Set Up Eclipse... 42

Chapter 3 Software Packages to Install How to Set Up Python Eclipse How to Set Up Eclipse... 42 Table of Contents Preface..... 21 About the Authors... 23 Acknowledgments... 24 How This Book is Organized... 24 Who Should Buy This Book?... 24 Where to Find Answers to Review Questions and Exercises...

More information

Part I. Boolean modelling exercises

Part I. Boolean modelling exercises Part I. Boolean modelling exercises. Glucose repression of Icl in yeast In yeast Saccharomyces cerevisiae, expression of enzyme Icl (isocitrate lyase-, involved in the gluconeogenesis pathway) is important

More information

Chapter 7: Descriptive Statistics

Chapter 7: Descriptive Statistics Chapter Overview Chapter 7 provides an introduction to basic strategies for describing groups statistically. Statistical concepts around normal distributions are discussed. The statistical procedures of

More information

μ i = chemical potential of species i C i = concentration of species I

μ i = chemical potential of species i C i = concentration of species I BIOE 459/559: Cell Engineering Membrane Permeability eferences: Water Movement Through ipid Bilayers, Pores and Plasma Membranes. Theory and eality, Alan Finkelstein, 1987 Membrane Permeability, 100 Years

More information

J. E. R. STADDON DUKE UNIVERSITY. The relative inability of the usual differential. to ask whether performance under DRL schedules

J. E. R. STADDON DUKE UNIVERSITY. The relative inability of the usual differential. to ask whether performance under DRL schedules JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR 1969, 12, 27-38 NUMBER I (JANUARY) THE EFFECT OF INFORMATIVE FEEDBACK ON TEMPORAL TRACKING IN THE PIGEON' J. E. R. STADDON DUKE UNIVERSITY Pigeons emitted

More information

Tellado, J: Quasi-Minimum-BER Linear Combiner Equalizers 2 The Wiener or Minimum Mean Square Error (MMSE) solution is the most often used, but it is e

Tellado, J: Quasi-Minimum-BER Linear Combiner Equalizers 2 The Wiener or Minimum Mean Square Error (MMSE) solution is the most often used, but it is e Quasi-Minimum-BER Linear Combiner Equalizers Jose Tellado and John M. Cio Information Systems Laboratory, Durand 112, Stanford University, Stanford, CA 94305-9510 Phone: (415) 723-2525 Fax: (415) 723-8473

More information

Exploratory Approach for Modeling Human Category Learning

Exploratory Approach for Modeling Human Category Learning Exploratory Approach for Modeling Human Category Learning Toshihiko Matsuka (matsuka@psychology.rutgers.edu RUMBA, Rutgers University Neark Abstract One problem in evaluating recent computational models

More information

How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection

How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection Esma Nur Cinicioglu * and Gülseren Büyükuğur Istanbul University, School of Business, Quantitative Methods

More information

Principals of Object Perception

Principals of Object Perception Principals of Object Perception Elizabeth S. Spelke COGNITIVE SCIENCE 14, 29-56 (1990) Cornell University Summary Infants perceive object by analyzing tree-dimensional surface arrangements and motions.

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Blast Searcher Formative Evaluation. March 02, Adam Klinger and Josh Gutwill

Blast Searcher Formative Evaluation. March 02, Adam Klinger and Josh Gutwill Blast Searcher Formative Evaluation March 02, 2006 Keywords: Genentech Biology Life Sciences Adam Klinger and Josh Gutwill Traits of Life \ - 1 - BLAST Searcher Formative Evaluation March 2, 2006 Adam

More information

Daily Warm-Up and Fundamental Exercises 2016 VIRGINIA TECH TRUMPET FESTIVAL DR. J. PEYDEN SHELTON

Daily Warm-Up and Fundamental Exercises 2016 VIRGINIA TECH TRUMPET FESTIVAL DR. J. PEYDEN SHELTON c q = 60 On Mouthpiece q = 80 4 2 Daily Warm-p and Fundamental Exercises 2016 VIRGINIA TECH TRMPET FESTIVAL DR J PEYDEN SHELTON Section 1: Mouthpiece Buzzing Buzzing the mouthpiece allos the player to

More information

Pooling Subjective Confidence Intervals

Pooling Subjective Confidence Intervals Spring, 1999 1 Administrative Things Pooling Subjective Confidence Intervals Assignment 7 due Friday You should consider only two indices, the S&P and the Nikkei. Sorry for causing the confusion. Reading

More information

Fatigued? Or fit for work? How to tell if your workers are tired enough to make mistakes and how to prevent this happening

Fatigued? Or fit for work? How to tell if your workers are tired enough to make mistakes and how to prevent this happening Fatigued? Or fit for ork? Ho to tell if your orkers are tired enough to make mistakes and ho to prevent this happening Safetree Fatigued? Or fit for ork? u 1 Tired orkers are more likely to have accidents

More information

Direct memory access using two cues: Finding the intersection of sets in a connectionist model

Direct memory access using two cues: Finding the intersection of sets in a connectionist model Direct memory access using two cues: Finding the intersection of sets in a connectionist model Janet Wiles, Michael S. Humphreys, John D. Bain and Simon Dennis Departments of Psychology and Computer Science

More information

MATCHING LAYER DESIGN OF AN ULTRASONIC TRANS- DUCER FOR WIRELESS POWER TRANSFER SYSTEM

MATCHING LAYER DESIGN OF AN ULTRASONIC TRANS- DUCER FOR WIRELESS POWER TRANSFER SYSTEM MATCHIN LAYE DESIN OF AN ULTASONIC TANS- DUCE FO WIELESS POWE TANSFE SYSTEM unn Hang Electronics and Telecommunications esearch Institute, Multidisciplinary Sensor esearch roup, Daejeon, South Korea email:

More information

Alternative Splicing and Genomic Stability

Alternative Splicing and Genomic Stability Alternative Splicing and Genomic Stability Kevin Cahill cahill@unm.edu http://dna.phys.unm.edu/ Abstract In a cell that uses alternative splicing, the total length of all the exons is far less than in

More information

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA PART 1: Introduction to Factorial ANOVA ingle factor or One - Way Analysis of Variance can be used to test the null hypothesis that k or more treatment or group

More information

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1 Lecture 27: Systems Biology and Bayesian Networks Systems Biology and Regulatory Networks o Definitions o Network motifs o Examples

More information

RNA and Protein Synthesis Guided Notes

RNA and Protein Synthesis Guided Notes RNA and Protein Synthesis Guided Notes is responsible for controlling the production of in the cell, which is essential to life! o DNARNAProteins contain several thousand, each with directions to make

More information

Discrimination and Generalization in Pattern Categorization: A Case for Elemental Associative Learning

Discrimination and Generalization in Pattern Categorization: A Case for Elemental Associative Learning Discrimination and Generalization in Pattern Categorization: A Case for Elemental Associative Learning E. J. Livesey (el253@cam.ac.uk) P. J. C. Broadhurst (pjcb3@cam.ac.uk) I. P. L. McLaren (iplm2@cam.ac.uk)

More information

3D Morphological Tumor Analysis Based on Magnetic Resonance Images

3D Morphological Tumor Analysis Based on Magnetic Resonance Images 3D Morphological Tumor Analysis Based on Magnetic Resonance Images Sirwoo Kim Georgia Institute of Technology The Wallace H. Coulter Department of Biomedical Engineering, Georgia. Abstract In this paper,

More information

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation Ryo Izawa, Naoki Motohashi, and Tomohiro Takagi Department of Computer Science Meiji University 1-1-1 Higashimita,

More information

CONSTELLATIONS AND LVT

CONSTELLATIONS AND LVT CONSTELLATIONS AND LVT Constellations appears in our comparative grid as innovation in natural systems, associated with emergence and self-organisation. It is based on ideas of natural harmony and relates

More information

How to build robots that make friends and influence people

How to build robots that make friends and influence people Ho to build robots that make friends and influence people Cynthia Breazeal Brian Scassellati cynthia@ai.mit.edu scaz@ai.mit.edu MIT Artificial Intelligence Lab 545 Technology Square Cambridge, MA 2139

More information

Tactile perception of sequentially presented spatial patterns1

Tactile perception of sequentially presented spatial patterns1 Tactile perception of sequentially presented spatial patterns JAMESCBLSS, STANFORD RESEARCH NSTTUTE AND STANFORD UNVERSTY HEWTTD CRANE,STEPHENW LNK ANDJAMEST TOWNSEND, STANFORD RESEARCH NSTTUTE Tactile

More information
