# LeetCode 187 Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

return:
["AAAAACCCCC", "CCCCCAAAAA"].
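As a baseline before any bit tricks, the task can be solved by storing every 10-letter substring in a hash set. The sketch below is only an illustration of that straightforward approach, not the bit-manipulation solution discussed in this post; the names BruteForceDna and findRepeated are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BruteForceDna {
    // Hypothetical helper: returns every 10-letter substring that occurs more than once.
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();
        // LinkedHashSet keeps the order in which repeats are first detected
        Set<String> repeated = new LinkedHashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false when the window was already in the set
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Every stored window here costs a 10-character string, which is what motivates packing each window into a single int instead.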

Bit manipulation is genuinely hard; whenever I run into this kind of problem I never have much of an idea how to approach it...

There are only four nucleotides, so each one fits in 2 bits:

0 = 00 (binary) = 'A'
1 = 01 (binary) = 'C'
2 = 10 (binary) = 'G'
3 = 11 (binary) = 'T'

A A C C T C C G G T
00 00 01 01 11 01 01 10 10 11 = 00000101110101101011 (binary) = 23915 (decimal)
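The packing above can be checked with a few lines of Java. This is a minimal sketch under the 2-bit mapping listed earlier; the class DnaEncoding and its methods code and encode are hypothetical names used only for this illustration.

```java
class DnaEncoding {
    // Hypothetical helper: maps a nucleotide to its 2-bit code (A=0, C=1, G=2, T=3).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Not a nucleotide: " + c);
        }
    }

    // Packs a sequence of at most 16 nucleotides into an int, 2 bits per letter.
    static int encode(String seq) {
        int v = 0;
        for (int i = 0; i < seq.length(); i++) {
            // Make room for 2 more bits, then append the next letter's code
            v = (v << 2) | code(seq.charAt(i));
        }
        return v;
    }

    public static void main(String[] args) {
        int v = encode("AACCTCCGGT");
        System.out.println(v);                          // 23915
        System.out.println(Integer.toBinaryString(v));  // 101110101101011 (leading zeros dropped)
    }
}
```

Because 10 letters need only 20 bits, every possible window fits comfortably in a 32-bit int.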

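Two sets of encoded windows do the bookkeeping: seq holds every 20-bit value seen so far, and repeatedSeq holds the values already found to be duplicates, so that each repeated sequence is added to the result only once.

Set<Integer> seq = new HashSet<>();
Set<Integer> repeatedSeq = new HashSet<>();

The window value v is updated in place for each new character: shift left by 2 bits, OR in the new character's code, and mask with 0xfffff to keep only the lowest 20 bits (the last 10 characters).

v = (v<<2 | map.get(s.charAt(i))) & 0xfffff;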
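To see what the mask does, the sketch below runs the same update over a string longer than 10 letters and checks that only the last 10 survive. It assumes the input contains only A, C, G, and T and has at least 10 characters; the class RollingWindow and the helpers lastWindow and decode are hypothetical, and decode is an extra added here for inspection rather than something the solution requires.

```java
class RollingWindow {
    // Index in "ACGT" is exactly the 2-bit code: A=0, C=1, G=2, T=3
    static int code(char c) {
        return "ACGT".indexOf(c);
    }

    // Applies the rolling update to every character and returns the final 20-bit window.
    static int lastWindow(String s) {
        int v = 0;
        for (int i = 0; i < s.length(); i++) {
            // The shift pushes the oldest letter above bit 19; the mask then discards it
            v = ((v << 2) | code(s.charAt(i))) & 0xfffff;
        }
        return v;
    }

    // Hypothetical inverse: turns a 20-bit window back into its 10 letters.
    static String decode(int v) {
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {
            out[i] = "ACGT".charAt(v & 3); // lowest 2 bits are the last letter
            v >>= 2;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        int v = lastWindow("TTAACCTCCGGT");   // the two leading T's fall out of the window
        System.out.println(v);                // 23915
        System.out.println(decode(v));        // AACCTCCGGT
    }
}
```

Each step costs a constant number of bit operations, so the whole string is scanned without building any substrings until a duplicate actually has to be reported.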

public class Solution {
public List<String> findRepeatedDnaSequences(String s) {
List<String> seqs = new ArrayList<>();
Set<Integer> seq = new HashSet<>();
Set<Integer> repeatedSeq = new HashSet<>();
HashMap<Character,Integer> map = new HashMap<Character,Integer>();
map.put('A',0);
map.put('C',1);
map.put('G',2);
map.put('T',3);

int v = 0;
// Use a sliding window to check every 10-character (20-bit) substring
for (int i = 0; i < s.length(); i++) {
// 2 bits/char * 10 chars = 20 bits, so mask with 0xfffff to keep only the last 10 characters
v = (v<<2 | map.get(s.charAt(i))) & 0xfffff;
if (i < 9) continue;
// From index 9 on, v encodes the 10-character substring ending at i
else {
// Add the substring to the list only the first time it shows up as a duplicate