LeetCode-in-Java

3475. DNA Pattern Recognition

Medium

Table: Samples

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| sample_id      | int     |
| dna_sequence   | varchar |
| species        | varchar |
+----------------+---------+
sample_id is the unique key for this table.
Each row contains a DNA sequence represented as a string of characters (A, T, G, C) and the species it was collected from. 

Biologists are studying basic patterns in DNA sequences. Write a solution to identify sample_id with the following patterns:

Return the result table ordered by __sample_id in ascending order.

The result format is in the following example.

Example:

Input:

Samples table:

+-----------+------------------+-----------+
| sample_id | dna_sequence     | species   |
+-----------+------------------+-----------+
| 1         | ATGCTAGCTAGCTAA  | Human     |
| 2         | GGGTCAATCATC     | Human     |
| 3         | ATATATCGTAGCTA   | Human     |
| 4         | ATGGGGTCATCATAA  | Mouse     |
| 5         | TCAGTCAGTCAG     | Mouse     |
| 6         | ATATCGCGCTAG     | Zebrafish |
| 7         | CGTATGCGTCGTA    | Zebrafish |
+-----------+------------------+-----------+ 

Output:

+-----------+------------------+-------------+-------------+------------+------------+------------+
| sample_id | dna_sequence     | species     | has_start   | has_stop   | has_atat   | has_ggg    |
+-----------+------------------+-------------+-------------+------------+------------+------------+
| 1         | ATGCTAGCTAGCTAA  | Human       | 1           | 1          | 0          | 0          |
| 2         | GGGTCAATCATC     | Human       | 0           | 0          | 0          | 1          |
| 3         | ATATATCGTAGCTA   | Human       | 0           | 0          | 1          | 0          |
| 4         | ATGGGGTCATCATAA  | Mouse       | 1           | 1          | 0          | 1          |
| 5         | TCAGTCAGTCAG     | Mouse       | 0           | 0          | 0          | 0          |
| 6         | ATATCGCGCTAG     | Zebrafish   | 0           | 1          | 1          | 0          |
| 7         | CGTATGCGTCGTA    | Zebrafish   | 0           | 0          | 0          | 0          |
+-----------+------------------+-------------+-------------+------------+------------+------------+ 

Explanation:

Note:

Solution

# Write your MySQL query statement below
WITH SampleAnalysisCte AS (
    SELECT sample_id, dna_sequence, species,
        dna_sequence REGEXP '^ATG' AS has_start,
        dna_sequence REGEXP 'TAA$|TAG$|TGA$' AS has_stop,
        dna_sequence REGEXP '.*ATAT.*' AS has_atat,
        dna_sequence REGEXP '.*GGG.*' AS has_ggg
    FROM Samples
)

SELECT sample_id, dna_sequence, species, has_start, has_stop, has_atat, has_ggg
FROM SampleAnalysisCte
ORDER BY sample_id ASC;