DNA Sequence Analysis And Computer Security

Abstract

In the current era of ubiquitous smart devices, malware detection has become an endless battle between ever-evolving malware and anti-virus programs, and the volume of security-related data processed each day keeps growing. Various approaches to detecting malware have been developed over time. One of them borrows from deoxyribonucleic acid (DNA) sequence analysis. This includes comparing sequences to search for similarity, identifying the intrinsic features of a sequence, identifying differences and variations, revealing the evolution and genetic diversity of sequences, and identifying molecular structure from a given sequence. Massive improvements in DNA sequencing have led to a proliferation of bioinformatics tools, and although these tools are now widely used, they have so far encountered little adversarial pressure. This paper explains the primary concepts of DNA sequence analysis, the relationship between computer systems and DNA sequencing, and malware detection techniques that use sequence analysis to guard against possible attacks.

DNA sequence analysis

DNA is basically a way of storing information. Generally, it encodes instructions for making living things, but it can be used for other purposes as well. In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution [1]. With the growing number of high-throughput methods for producing gene and protein sequences, the rate at which new sequences are added to databases is increasing rapidly. Scientists now compare these new sequences with sequences of known function to understand the biology of the organism from which the new sequence came. Thus, sequence analysis can be used to assign function to genes and proteins by studying the similarities between the compared sequences.

Relation between Computer security analysis and DNA sequencing

The cost and time required to sequence and analyze DNA have improved rapidly. In the past decade, the cost of sequencing a human genome has decreased 100,000-fold or more, which was made possible by parallel processing. Today we can sequence hundreds of millions of DNA strands simultaneously, which has opened up applications in domains ranging from human behavior and personalized medicine to the study of microorganisms in our gut.

Computers are used to process, analyze and store these millions of DNA sequences, and with the rapid improvement in technology, new and unexpected interactions between electronic and biological systems have been observed. Once DNA is sequenced, it is usually processed and analyzed by a number of computer programs, together called the DNA data processing pipeline. The study in [2] analyzes the computer security practices of commonly used open-source programs in this pipeline. Scientists have used DNA to store books, recordings, an Amazon gift card and even GIFs. Researchers from the University of Washington have managed to take over a computer by encoding a malicious program in DNA.

Malware detection techniques

Early detection techniques were based on static analysis, which examines binary code and identifies malicious code without executing it. Inspecting binary code has become difficult, however, as obfuscation techniques such as polymorphism, encryption and packing have grown more sophisticated. In addition, static analysis depends on a pre-built signature database, which makes it hard to detect new, unknown malware until the signatures are updated. To reduce these limitations and complement static analysis, dynamic analysis was introduced and is now widely used for effective malware detection. Dynamic analysis executes malware and observes its behavior. Two main approaches are used: control flow analysis and API call analysis. Both detect malware by analyzing the similarity between known samples and new ones. Many current API call techniques quickly reveal the characteristics of malware in the same class, but they fail to capture the sequence of malware behavior and are easy to evade when malware authors insert and execute dummy or redundant API calls. Other researchers extract an API call sequence for each class and develop signatures based on it. But signatures built from the call sequences most frequently found in each class do not allow them to detect malware that departs from its known form.

This created a need for new approaches to API call sequence analysis. The information gathered through the dynamic approach can also be processed using simple statistics such as frequency counting, or with data mining and machine learning [3]. Recent studies focus on the fact that the critical low-level system call sequence does not change unless the main purpose of the malware changes, so they examine the API call sequence for a particular malware function rather than the call sequence for each malware class. Sequence alignment algorithms are used to extract similar subsequences from different sequences. These algorithms have been applied in natural language processing and biometrics with excellent results.
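The idea of extracting a shared subsequence from two API call traces can be sketched with a longest common subsequence (LCS) computation. This is a simplified stand-in for the multiple sequence alignment used in [3], not the authors' method, and the two traces are hypothetical examples made up for illustration.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of two lists of API call names."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover the shared calls in order.
    common = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            common.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return common[::-1]


# Hypothetical traces: the second one contains an inserted dummy call.
trace_a = ["CreateFileW", "Sleep", "WriteFile", "RegSetValueExW", "CloseHandle"]
trace_b = ["CreateFileW", "WriteFile", "GetTickCount", "RegSetValueExW", "CloseHandle"]
shared = longest_common_subsequence(trace_a, trace_b)
print(shared)
```

Because the LCS skips calls that appear in only one trace, inserted dummy calls such as `Sleep` or `GetTickCount` do not hide the shared behavior.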

How can DNA be used to compromise a computer?

The researchers at the University of Washington tried to mimic an adversary: they (1) synthesized a real, biological DNA sequence with a malicious exploit embedded in it, and then experimentally evaluated the impact of that exploit DNA on a victim by having the victim (2) sequence the DNA using standard sequencing methods and (3) post-process the DNA sequence with a realistic program, one that a scientist might use to analyze the resulting DNA sequence [2]. Their results show that while the program they exploited was deliberately made vulnerable to a basic buffer overflow, the security of the overall DNA sequencing pipeline is not much better.

In their experiment they used the FASTQ compression utility fqzcomp, which is designed to compress DNA sequences. They obtained fqzcomp from https://sourceforge.net/projects/fqzcomp/ and inserted a vulnerability into version 4.6 of the source code: a function that processes and compresses DNA reads individually, using a fixed-size buffer to store the compressed data. With this modification, a DNA read that is longer than expected overflows the buffer, which can be used to hijack control flow. A fixed-size buffer is not a far-fetched vulnerability, since fqzcomp already contains more than two dozen static buffers. They modified 54 lines of C++ code and removed 127 lines from fqzcomp. The modified version used a simple 2-bit DNA encoding scheme in which the four nucleotides were encoded as two bits each (A as 00, C as 01, G as 10, and T as 11), with bits packed into bytes starting from the most significant bits. They ran the target program in a simplified computing environment, disabled common security features such as stack canaries and ASLR, and marked the stack as executable.
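The 2-bit encoding described above can be sketched as follows. The nucleotide-to-bit mapping and the most-significant-bits-first packing come from the text; the function names and the zero padding of a final partial byte are assumptions made for illustration and are not taken from fqzcomp.

```python
# Mapping taken from the text: A=00, C=01, G=10, T=11.
NUCLEOTIDE_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_dna(sequence):
    """Pack a DNA string into bytes, 2 bits per base, most significant bits first."""
    packed = bytearray()
    current = 0
    filled = 0
    for base in sequence.upper():
        current = (current << 2) | NUCLEOTIDE_BITS[base]
        filled += 1
        if filled == 4:  # four bases fill one byte
            packed.append(current)
            current = 0
            filled = 0
    if filled:
        # Assumption: pad an incomplete final byte with zero bits on the right.
        packed.append(current << (2 * (4 - filled)))
    return bytes(packed)


def unpack_dna(data, length):
    """Recover the first `length` bases from packed bytes."""
    bases = "ACGT"
    decoded = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            decoded.append(bases[(byte >> shift) & 0b11])
    return "".join(decoded[:length])


print(pack_dna("ACGT").hex())  # 00 01 10 11 -> 0x1b
```

Under this scheme every four bases become one byte, which is why a carefully chosen strand of nucleotides can spell out arbitrary bytes, including machine code, once the sequencer output is decoded.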

Today, any fixed-size buffer would likely be vulnerable, as newer long-read sequencing technologies can produce reads upwards of 60,000 bases long [4]. Their exploit triggered a buffer overflow when the program tried to read the 176 base pairs on their strand; the embedded code then gave the team remote control of the sequencing machine's computer and could later crash the system. Their demonstration serves as a warning about a new kind of attack that could occur someday.

Another research team set up a virtual environment in which to run malicious programs and trace their API call sequences at runtime. They used the Detours hooking library, supported by Microsoft, to trace API calls. Before the target function starts, the detour function logs the target function's name, which allows the API call sequence to be traced. They used VirtualBox, running 32-bit Windows XP Service Pack 3, to execute malware and observe its activity. The default maximum monitoring period for tracing an API call sequence was set to two minutes.

For sequence alignment they used ClustalX, a widely used freeware tool for analyzing DNA, RNA or protein sequences. Their results showed that malware in the same family shares many common call subsequences. On the other hand, malware in different classes can also have common call sequences [3].

Conclusions

In this paper, we have looked at API call sequence analysis and control flow analysis, and at how malware can be delivered through a DNA sequence. Malware detection systems that depend on signatures built from a malware sample's static information, such as file size, processes and artifacts, fail to detect new, unknown malware until the signature database has been updated. We also saw that antivirus vendors' malware labels may not be accurate enough to apply directly in the dynamic analysis of API call sequences.

References

  1. Sequence analysis. https://en.wikipedia.org/wiki/Sequence_analysis
  2. Peter Ney, Karl Koscher, Lee Organick, Luis Ceze, Tadayoshi Kohno (August 2017). Computer Security, Privacy, and DNA Sequencing: Compromising Computers with Synthesized DNA, Privacy Leaks, and More. https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-ney.pdf
  3. Youngjoon Ki, Eunjin Kim, Huy Kang Kim (June 2015). A Novel Approach to Detect Malware Based on API Call Sequence Analysis. https://journals.sagepub.com/doi/full/10.1155/2015/659101
  4. Pacific Biosciences Of California. Smrt sequencing: Read lengths (February 2016). http://www.pacb.com/smrt-science/smrtsequencing/read-lengths/
  5. Researchers Embed Malware Into DNA to Hack DNA-Sequencing Software (August 2017). https://spectrum.ieee.org/the-human-os/computing/software/researchers-embed-malicious-code-into-dna-to-hack-dna-sequencing-software

DNA/Gene Classification Using RNN Sequential Analysis

Abstract

Every living organism has complex molecules in its cells called DNA (deoxyribonucleic acid), which are responsible for all of its biological features. These DNA molecules are organized into larger structures called chromosomes, which together compose the organism's genome. Genes are DNA sequences of varying length that contain the code used to produce proteins.

Even though the entire human genome has been sequenced, reliably identifying gene sequences remains a struggle. Gene classification and prediction are difficult tasks, and numerous variables affect how well they can be done. Developments in machine learning have improved the prediction and classification of DNA sequences.

Deep Learning (DL) can be seen as an evolution of Artificial Neural Network technology. DL models are able to extract significant features from raw data and to use these features for classification tasks. This report presents a deep learning neural network for DNA sequence classification based on a spectral sequence representation. For datasets with a huge number of attributes, DL is well suited to classification and regression tasks. This report studies the prediction and classification of genomic sequences and aims to add to knowledge in the field of gene classification using a DL model, the RNN.

Deep recurrent neural network (RNN) designs can be used to capture the structure in a genetic sequence. By comparing the perplexity attained after training on an actual genome with that attained after training on a random sequence of nucleotides, we can confirm that a character-level RNN captures the non-random parts of DNA.

Introduction

In recent years, researchers have used deep RNNs to tackle numerous machine learning problems in the field of NLP. Most of these applications address problems such as named entity recognition, translation and sentiment analysis. Less work has been done with RNNs on what is possibly the most natural language of all: the genome, a sequence of 4 letters (A, C, G, T). The purpose of this report is to explore how an RNN architecture can be used to learn sequential patterns in genomic sequences. To confirm that an RNN can model the structure in a genomic sequence, we first train a simple character-level RNN to predict one of the 4 possible characters given the preceding string of characters.
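How well such a next-character model predicts can be scored with perplexity, the exponential of the average negative log-likelihood it assigns to the true next characters. The sketch below is illustrative only: the function name is an assumption and the probability lists are placeholders, not results from this project.

```python
import math


def perplexity(true_char_probabilities):
    """Perplexity from the probabilities a model gave to each true next character."""
    count = len(true_char_probabilities)
    mean_nll = -sum(math.log(p) for p in true_char_probabilities) / count
    return math.exp(mean_nll)


# A model that guesses uniformly over A, C, G, T assigns 0.25 every time.
uniform_guess = perplexity([0.25] * 100)

# Placeholder probabilities for a model that has learned some structure.
informed_guess = perplexity([0.4, 0.3, 0.5, 0.35])

print(round(uniform_guess, 3), round(informed_guess, 3))
```

A uniform guesser over four nucleotides has a perplexity of 4, so a model trained on a random sequence should stay close to 4, while a model that has picked up real structure in a genome should score below it.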

If an RNN predicts the next character of a real genome no better than that of a random genome, we need to tune the model more carefully until we detect some signal. This simple task is used to help select an appropriate model architecture. Once we have empirical evidence that an RNN can capture the non-random structure in a genome, we turn to a sequence classification problem. Biological research has shown that subsequences of the human genome are often regulated both by neighboring and by very distant sequences. Biologists have been able to determine whether a specific genomic feature will be observed for a specific sequence. A few examples of these features are:

  • a. DNase I hypersensitive sites: sequences that are sensitive to cleavage by the DNase I enzyme
  • b. Histone marks: chemical changes to histone proteins, biomolecules which regulate certain sequences.
  • c. Transcription factors: proteins that bind to a specific sequence

Overview of RNN

An RNN is a class of artificial neural network in which the connections between units form a directed graph along a sequence. This allows it to exhibit dynamic temporal behavior over a time sequence. RNNs use their internal memory to process sequences of inputs, which makes them suitable for tasks such as unsegmented handwriting recognition and speech recognition.

Recurrent Neural Networks (RNNs) were developed to address a shortcoming of conventional ANNs, which do not make decisions based on prior knowledge. A typical ANN learns to make decisions from the context seen in training, but once it is put to use, each decision is made independently of the others.

A Recurrent Neural Network comes into the picture when a model needs context in order to produce an output from the input provided.
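How context is carried from one input to the next can be sketched with a single recurrent step, h_t = tanh(W_xh x_t + W_hh h_(t-1) + b). This is a minimal pure-Python illustration of a basic (Elman-style) recurrent unit, not the TensorFlow model used in this report, and the weights are arbitrary illustrative values.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, bias):
    """One recurrent step: new hidden state from current input and previous state."""
    h_new = []
    for row_x, row_h, b in zip(w_xh, w_hh, bias):
        total = b
        total += sum(w * v for w, v in zip(row_x, x))       # input contribution
        total += sum(w * v for w, v in zip(row_h, h_prev))  # context contribution
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(inputs, w_xh, w_hh, bias):
    """Feed a sequence through the cell, starting from an all-zero hidden state."""
    h = [0.0] * len(bias)
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, bias)
        states.append(h)
    return states


# Arbitrary weights: 4 inputs (one-hot A, C, G, T) and 2 hidden units.
W_XH = [[0.5, -0.5, 0.1, 0.0], [0.0, 0.3, -0.2, 0.4]]
W_HH = [[0.1, 0.2], [-0.3, 0.1]]
BIAS = [0.0, 0.0]

A = [1, 0, 0, 0]
states = run_rnn([A, A], W_XH, W_HH, BIAS)
print(states)
```

The same input `A` produces a different hidden state at the second step than at the first, because the second step also sees the state left behind by the first. That dependence on history is what a feed-forward network lacks.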

Literature Review

Biological sequences are a natural fit for the processing power of RNNs because of their temporal modeling abilities. Through their recurrent loops, RNNs store information from input sequences. Because they store contextual information in a flexible way, RNNs are a well-suited architecture for sequence labelling tasks. They accept input data in different forms and representations and learn what to store and what to ignore. They can also recognize sequential patterns in the presence of sequential noise (Graves, 2012). The time-window method used by non-sequential networks lacks robustness against sequential distortions and requires the window length to be set manually. It also increases the number of weights in the network. An alternative is to introduce a delay between input processing and output generation. This method is robust to sequential distortions, but the delay must be determined manually, and the network must remember the original inputs during the delay (Graves, 2012). A helpful way to understand RNN architectures is to unfold the cyclical connections into a graph in which each time step forms a node that shares the same weights as the other nodes (Graves, 2012).

RNNs are certainly powerful architectures. Because RNNs can implement the same kind of functions as Turing machines, the author demonstrates their equivalence by relating an RNN built from perceptrons to a program that executes a computable function. The correspondence maps the network's state transitions to the program flow and the network's internal state to the program state.

Problem Statement

Gene classification using an RNN is the problem of categorizing the function of genes using only the sequence data (ATGTGT….). An RNN can address this problem by reading the sequence step by step and extracting meaningful information. RNNs can integrate contextual information from past inputs and have the benefit of being robust to localized distortions of the input sequence along the time axis. In this research an RNN-based network is used for DNA/gene classification. The network uses a more complex computational unit than a basic recurrent network, which leads to improved performance. The model is developed using the TensorFlow Python package. RNNs are mostly used for processing sequences of data that progress along a time axis. Sequences do not have explicit features, and the commonly used representations have the disadvantage of high dimensionality. Machine learning techniques for supervised classification depend on the feature extraction stage, and building a good representation requires identifying and measuring meaningful characteristics of the items. The multi-task learning setting affects the models in terms of both required training time and performance.

Methodology

  • a. Collection of Data and Preparation of Dataset: The genomic sequences in the dataset come from the 16S dataset. Sequences in the dataset were grouped into 5 distinct classes.
  • b. Character Embedding: This is done by character-level one-hot encoding. This representation encodes each character ‘i’ of the alphabet as a vector whose length equals the size of the alphabet, with all entries zero except for a single one at position ‘i’. The method leads to a sparse representation of the input, which in the NLP literature is handled through an embedding layer.
  • c. Training: A deep convolutional neural network is trained to build an image classification model. The CaffeNet architecture is used and adapted to support our 15 classes. Rectified Linear Units (ReLU) are used as a substitute for saturating nonlinearities. This activation function adaptively learns the parameters of the rectifiers and improves accuracy at negligible additional computational cost.
  • d. Testing: In this stage the author uses the test set for prediction of gene sequence class.
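The character-level one-hot encoding from step b can be sketched as follows. The alphabet order A, C, G, T and the function name are assumptions for illustration; real 16S sequences may also contain ambiguity codes such as N, which this minimal version would reject with a `KeyError`.

```python
ALPHABET = "ACGT"  # assumed order; any fixed order works if used consistently


def one_hot_encode(sequence, alphabet=ALPHABET):
    """Encode each character as a vector with a single 1 at its alphabet index."""
    index = {char: position for position, char in enumerate(alphabet)}
    encoded = []
    for char in sequence.upper():
        vector = [0] * len(alphabet)
        vector[index[char]] = 1
        encoded.append(vector)
    return encoded


print(one_hot_encode("ATG"))
# [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
```

Each row contains exactly one non-zero entry, which is the sparsity that an embedding layer is typically used to compress.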

Architecture of DNA/Gene Classification System

Experimental Design

  • a. Dataset: The ‘16S’ dataset used in this research was downloaded from the RDP-II by NCBI. A total of 3000 sequences were selected, which can be further grouped by a subsequent filtering stage into 5 ordered taxonomic ranks: Family, Order, Class, Phylum and Genus.
  • b. Evaluation Measures: The mean and standard deviation (SD) are calculated over 10 validation test folds.
  • c. Hardware & Software Requirements: Python-based deep learning and computer vision libraries will be used for project development and research. Training is performed on GPUs.
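The 10-fold evaluation above can be sketched with the standard library alone. The fold-splitting scheme, the fixed seed, the use of the sample standard deviation, and the function names are all assumptions for illustration; the scores shown are placeholders, not results from this project.

```python
import random
import statistics


def k_fold_indices(sample_count, fold_count, seed=0):
    """Shuffle sample indices and deal them into `fold_count` validation folds."""
    indices = list(range(sample_count))
    random.Random(seed).shuffle(indices)
    return [indices[start::fold_count] for start in range(fold_count)]


def summarize(fold_scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


folds = k_fold_indices(3000, 10)  # 3000 sequences, 10 folds of 300 each

# Placeholder per-fold scores; replace with the measured accuracy of each fold.
placeholder_scores = [0.90, 0.80]
mean_score, sd_score = summarize(placeholder_scores)
print(len(folds), len(folds[0]), round(mean_score, 3), round(sd_score, 3))
```

Each fold serves once as the validation set while the other nine are used for training, and the mean and SD summarize how stable the model is across those ten splits.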

Conclusions

Deep learning models are a powerful complement to traditional ML techniques and other analysis strategies. These approaches are used in various applications in computational biology, including image analysis and regulatory genomics. Research indicates that standard RNNs and their variants perform better on time-series data than other models. They are widely used for processing sequence data, as in video description and text classification, because RNNs can efficiently extract features from time-series data.

The main aim of this project was to evaluate how DL models can be applied to the classification of DNA sequences. Since gene annotation and gene prediction are crucial for understanding how genomes work, I want to contribute a few insights on current technological methods. Any gene classification model used in practice in bioinformatics must handle many features as well as identify gene sequences. This includes the classification of homologous sequences and the identification of promoters, terminators, regulatory regions and other specific binding sites within the genome.

In general, comparative and ab initio methods share approaches that can be combined to improve classification results. With this report I wanted to assess and emphasize the use of DL algorithms in solving genomic problems. One issue that proved hard to overcome was computational limits. DL projects typically deliver state-of-the-art results by processing huge amounts of data over long periods on clusters of dozens, hundreds or even thousands of high-end machines, most of which use GPUs for matrix calculations.

RNNs in particular are popular for sequence classification problems with samples of variable length. Further effort could go into extracting more features from DNA sequences before feeding them to the models, which could make the overall predictions more accurate. The key improvement would be in the working environment. With GPUs for computation and our machine integrated into a cluster of other computers, hyperparameters such as the learning rate could be tuned more thoroughly and the dataset could be enlarged. All of this could lead to better results than those obtained so far.

References

  1. Juergen Schmidhuber. Deep Learning in Neural Networks: An Overview. Neural Networks
  2. Oriol Vinyals, Ilya Sutskever and Quoc V Le. Sequence to sequence learning with neural networks
  3. Alex Graves. Generating Sequences with RNN
  4. Ritambhara Singh and Yanjun Qi. Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks. arXiv
  5. Zachary C. Lipton. A Critical Review of RNN’s for Sequence Learning. arXiv

Techniques In Genomic DNA Extraction From Palm Oil Leaves

Introduction

The oil palm (Elaeis guineensis Jacq.), which belongs to the family Arecaceae [1], is a diploid oil-producing crop with a genome size of 1.8 Gb [2] and one of the most important oil-bearing crops in the world. It is a large feather palm with a solitary columnar stem, short internodes, and short spines on both the leaf bases and within the fruit bunches [3]. It has irregular sets of leaflets on the leaf, which give the palm its characteristic appearance. The palm is monoecious with male or female inflorescences, but hermaphroditic inflorescences sometimes develop in the axils of the leaves [3]. The fruit, which is borne on a large compact bunch, is a drupe [1]. Attempts to distinguish the different types of oil palm have been controversial and unsatisfactory, since in the wild state each palm represents a hybrid with respect to some of its traits [1]. Oil palm is classified based on fruit type and fruit form. It has three fruit forms, Dura, Pisifera and Tenera (the hybrid fruit form), and different fruit types, namely virescens, albescens, nigrescens and poissoni [1]. Oil palm has benefited immensely from conventional breeding in Nigeria. This has been made possible solely through the dedicated breeding program put in place by the Nigerian Institute for Oil Palm Research (NIFOR).

The progress made so far has been very limited for two reasons: (1) the long generation time and (2) the outcrossing nature of the crop. With the emergence of deoxyribonucleic acid (DNA) marker technologies, scientists see the possibility of significant success if markers are extensively applied by oil palm breeders, as exemplified by the cloning of the shell thickness gene [4]. Exploration of DNA marker technologies, combining knowledge from research in molecular genetics and genomics, offers great possibilities for oil palm breeding [5]. With DNA marker technologies, the underlying genetic basis of phenotypic traits can be studied independently of environmental influences. However, critical to the adoption, application and domestication of these technologies is an effective DNA extraction method that is less complex than the methods previously applied to extract DNA from palms [6-10]. The basic principles underlying DNA extraction procedures are not very complicated, but the growing number of DNA extraction procedures indicates that extraction is not always simple and that published protocols are not necessarily reproducible for all species [11,12]. The objective of this study was to develop a method for DNA extraction from oil palm leaves that is cost-effective and adaptable to low-budget laboratories.

Background

A number of commercial genomic DNA extraction kits are available to speed up the extraction process. However, using commercial kits to isolate oil palm DNA is mostly expensive and often does not give satisfactory results compared to the conventional protocol (Ying and Faridah, 2006). The hexadecyltrimethylammonium bromide (CTAB) protocol described by Doyle and Doyle (1987; 1990) is one of the conventional methods commonly used for the isolation of DNA from plant species (Borges et al., 2012), including oil palm. In contrast to commercial kits, this protocol is time-consuming and laborious and, as such, can be a problem if DNA is to be extracted from hundreds of samples. Therefore, there is a need for a rapid and simple extraction procedure that yields genomic DNA of good quality and quantity. Several protocols for rapid preparation of DNA from plant tissues have been reported (Ausubel et al., 2003; Dhakshanamoorthy et al., 2009; Arif et al., 2010) and can be exploited for extracting DNA from oil palm.

The existing modified CTAB-based protocol applied in our laboratory is a combination of methods described by Saghai-Maroof et al. (1984), Rogers and Bendich (1985) and Doyle and Doyle (1987), which was published by Weising et al. (1995). This method has successfully produced DNA of high quality and quantity but only allows a small number of samples to be processed at a time. The current method yields about 200 – 680 µg DNA g⁻¹ leaf tissue (Rahimah et al., 2006). Since oil palm tissue is very fibrous, approximately 1 to 2 hr is spent grinding four to six samples using a mortar and pestle in liquid nitrogen. As such, one laboratory technician can only handle a limited number of samples per day. Additionally, four days are needed to complete the entire extraction protocol. Due to these drawbacks, the initiative was taken to test a published DNA extraction protocol that can be completed within a day and gives the required quality and quantity.

This study explored the DNA extraction protocol described by Arif et al. (2010). The published protocol of Arif et al. (2010) suggests that grinding the tissue in the extraction buffer (with NaCl) and sterile sand provides an acceptable DNA yield suitable for routine molecular biology analysis, including PCR amplification. The protocol omits the use of liquid nitrogen (N2), polyvinylpyrrolidone (PVP) and lithium chloride (LiCl) and reportedly produces on average 70 µg DNA g⁻¹ sample. The protocol as described was tested on oil palm tissues, but it did not produce a sufficient amount of DNA for certain applications and the quality was also slightly below expectation. As such, this study describes minor modifications to the date palm extraction protocol described by Arif et al. (2010) for routine isolation of DNA of acceptable quality and quantity from oil palm tissues.

Problem Statement

The method described by Arif et al. was chosen because date palm leaves are hardy and fibrous, much like oil palm leaves. The protocol is fairly simple, can provide a sufficient DNA yield, and appears convenient for daily extraction of DNA from oil palm samples.

When the same amount of starting material (0.1 g) was processed using the older protocol, DNA quality and yield varied and the data were fairly inconsistent. In addition, the original protocol uses only 0.1 g of starting tissue with 500 µl of buffer in a 1.5 ml Eppendorf tube. The DNA yield obtained from 0.1 g of oil palm tissue was generally less than 50 µg, which is not sufficient for some applications such as restriction fragment length polymorphism (RFLP).

Discussion

DNA Extraction

Initially the method described by Arif et al. (2010) was applied without modification. A total of 10 samples were evaluated. DNA from the same sample was then re-extracted using the modified method described below:

The 2X CTAB lysis buffer (2% cetyltrimethylammonium bromide, 100 mM Tris-HCl pH 8.0, 20 mM EDTA pH 8.0, 1.4 M NaCl, and 2% PVP-40), 7.5 ml, was pre-heated in a glass beaker to 60°C in a water bath. While the buffer was heating, 2 g of frozen leaves was ground using a sterile mortar and pestle with gradual addition of liquid N2. Before grinding, 250 mg of sterile acid sand was added to the mortar. The frozen fine powder was left at room temperature for 5 min.
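The buffer composition above can be turned into bench quantities with the dilution relation C1·V1 = C2·V2. The sketch below rests on several assumptions not stated in the protocol: a 100 ml batch, stock solutions of 1 M Tris-HCl, 0.5 M EDTA and 5 M NaCl, and the percentages read as weight per volume (g per 100 ml).

```python
def stock_volume_ml(stock_molar, final_molar, final_volume_ml):
    """Volume of stock needed, from C1 * V1 = C2 * V2 solved for V1."""
    return final_molar * final_volume_ml / stock_molar


def powder_mass_g(percent_w_v, final_volume_ml):
    """Mass of solid for a weight/volume percentage (g per 100 ml)."""
    return percent_w_v * final_volume_ml / 100.0


BATCH_ML = 100.0  # assumed batch size

recipe = {
    "1 M Tris-HCl pH 8.0 (ml)": stock_volume_ml(1.0, 0.100, BATCH_ML),
    "0.5 M EDTA pH 8.0 (ml)": stock_volume_ml(0.5, 0.020, BATCH_ML),
    "5 M NaCl (ml)": stock_volume_ml(5.0, 1.4, BATCH_ML),
    "CTAB (g)": powder_mass_g(2.0, BATCH_ML),
    "PVP-40 (g)": powder_mass_g(2.0, BATCH_ML),
}

for component, amount in recipe.items():
    print(f"{component}: {amount:.1f}")
```

The remaining volume is made up with water. A 100 ml batch is enough for roughly thirteen extractions at 7.5 ml each.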

Seventy-five µl each of 0.5 M ascorbic acid, 0.4 M DIECA, and 1% β-mercaptoethanol were added to the pre-heated 2X CTAB lysis buffer (this step was carried out in the fume hood). The powdered tissue was further thawed by immersing the mortars in warm water. Following this, 7.5 ml of the lysis buffer was added to the mortar and gently mixed using the pestle. The mixture was then transferred into a 15 ml Falcon tube, vortexed briefly and incubated at 60°C for 30 min in a shaking water bath. The samples were then allowed to cool at room temperature for 30 min. This was followed by centrifugation at 4000 rpm for 30 min at 25°C in a swing-bucket rotor of the Eppendorf Centrifuge 5810 R. Six millilitres of the upper aqueous phase was carefully transferred into a new Falcon tube using a wide-bore pipette. An equal volume (6 ml) of chloroform:isoamyl alcohol (24:1) was added and the tube was vigorously shaken for 5 min to mix the solution thoroughly. The phases were separated by centrifugation at 4000 rpm for 30 min at 25°C. About 5 ml of the supernatant was again transferred to a new Falcon tube and treated with 6 µl of RNase at 37°C or room temperature for 30 min. Then 500 µl of 3.0 M sodium acetate and 1 volume (5 ml) of cold isopropanol were added gently, mixed and kept in a -20°C freezer for at least 30 min. After centrifugation at 4000 rpm and 4°C for 30 min, the supernatant was discarded. The pellet was resuspended in 2 ml of wash buffer (76% ethanol, 10 mM ammonium acetate) and kept at 4°C for at least 30 min. The wash buffer was carefully poured off and the pellet was dried in a speed vacuum for 10-15 min. Two millilitres of cold 70% ethanol was added and kept at room temperature for 15 min. The ethanol was discarded and the tubes were placed in a speed vacuum to allow complete drying of the pellet.
The dried DNA pellet was then carefully transferred into a 1.5 ml Eppendorf tube and dissolved in 300-350 µl of TE buffer, depending on the size of the pellet, followed by incubation at 50°C in a shaking water bath. After the pellet was fully dissolved, a small aliquot of each DNA sample was electrophoresed in a 0.8% agarose gel at 100 V for 1.5 hr. The results were visualised after staining the gels in 0.5 µg ml⁻¹ ethidium bromide (EtBr) solution.

DNA Quantification

The total DNA yield and purity index were determined using a Multiskan GO (Thermo Scientific). Five micrograms of each extracted DNA sample was also digested with a six-base-pair cutter (BglII) and a four-base-pair cutter (HaeIII) restriction enzyme to check its digestibility. The DNA was also tested to evaluate its suitability for simple sequence repeat (SSR) analysis and single nucleotide polymorphism (SNP) genotyping.
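Spectrophotometric yield and purity figures of this kind are commonly derived from absorbance readings. The sketch below assumes the usual conversion of 50 µg/ml of double-stranded DNA per A260 unit at a 1 cm path length, and assumes that the purity index is the A260/A280 ratio; the text does not specify either, and the function names are illustrative.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # standard factor for double-stranded DNA, 1 cm path


def dna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration from absorbance at 260 nm."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_index(a260, a280):
    """A260/A280 ratio, assumed here to be the purity index."""
    return a260 / a280


def yield_ug_per_g(concentration_ug_per_ml, elution_volume_ul, tissue_mass_g):
    """Total DNA recovered per gram of starting tissue."""
    total_ug = concentration_ug_per_ml * elution_volume_ul / 1000.0
    return total_ug / tissue_mass_g


# Worked example with round numbers: 1000 µg/ml in 300 µl of TE from 2 g of leaf.
print(yield_ug_per_g(1000.0, 300.0, 2.0))  # 150.0 µg per g
```

Expressing yield per gram of tissue makes results from the 0.1 g and 2 g protocols directly comparable with the µg DNA g⁻¹ figures quoted in the Background.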

Conclusion

To obtain higher DNA yields, the Eppendorf tube was replaced with a 15 ml Falcon tube so that the starting material could be increased to 2 g, and the lysis buffer was increased to 7.5 ml. Liquid nitrogen may not be necessary in the method of Arif et al. (2010) because the authors used fresh tissue. That is not always possible: in our case the leaf samples were stored frozen at -80°C before extraction, because the sampling site was far from the laboratory and extraction could not be done on the day the samples were collected. The samples therefore had to be cleaned and frozen in liquid nitrogen as soon as they arrived from the sampling site and then kept in the -80°C freezer. The advantage of this procedure over the original method is that it allows a larger amount of tissue to be used as starting material, so a higher yield of genomic DNA can be obtained for various analyses and long-term storage. The method is quite simple and can be completed in two days. A laboratory technician could easily process up to 12 samples a day. Furthermore, the quality and yield of DNA are similar to those of the conventional method. We therefore conclude that the modified method can yield DNA for routine molecular biology studies of oil palm and may also be useful for other plant species.

References

  1. Hartley CWS (1988) The Oil Palm (Elaeis guineensis Jacq.). 3rd edn. Longman Scientific and Technical, Longman Group, United Kingdom. 761 p.
  2. Singh R, Ong-Abdullah M, Low EL, Manaf MAA, Rosli R, et al. (2013) Oil palm genome sequence reveals divergence of interfertile species in Old and New worlds. Nature 500: 335-339. doi: 10.1038/nature12309.
  3. Corley RHV, Tinker PB (2003) The Oil Palm. 4th edn. Blackwell Science Ltd, Oxford, UK. 562 p.
  4. Singh R, Low EL, Ooi LC, Ong-Abdullah M, Ting N, et al. (2013) The oil palm SHELL gene controls oil yield and encodes a homologue of SEEDSTICK. Nature 500: 340-344. doi: 10.1038/nature12356.
  5. Collard BCY, Mackill DJ (2008) Marker-assisted selection: an approach for precision plant breeding in the twenty-first century. Philos Trans R Soc Lond B Biol Sci 363: 557-572. doi: 10.1098/rstb.2007.2170.
  6. Ying ST, Zaman FQ (2006) DNA extraction from mature oil palm leaves. J Oil Palm Res 18: 219-224.
  7. Arif IA, Bakir MA, Khan HA, Ahamed A, Al Farhan AH, et al. (2010) A simple method for DNA extraction from mature date palm leaves: impact of sand grinding and composition of lysis buffer. Int J Mol Sci 11: 3149-3157. doi: 10.3390/ijms11093149.
  8. Ouenzar B, Hartmann C, Rode A, Benslimane A (1998) Date palm DNA mini-preparation without liquid nitrogen. Plant Mol Biol Rep 16: 263-269.
  9. Risterucci AM, Grivet L, N’Goran JAK, Pieretti I, Flament MH, et al. (2000) A high density linkage map of Theobroma cacao L. Theor Appl Genet 101: 948-955.
  10. Billotte N, Marseillac N, Risterucci A, Adon B, Brottier P, et al. (2005) Microsatellite-based high density linkage map in oil palm (Elaeis guineensis Jacq.). Theor Appl Genet 110: 754-765. doi: 10.1007/s00122-004-1901-8.
  11. Porebski S, Bailey LG, Baum BR (1997) Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol Biol Rep 15: 8-15.
  12. Vinod KK (2004) Total genomic DNA extraction, quality check and quantitation. Proceedings of the training programme on “classical and modern plant breeding techniques – a hands on training”. Tamil Nadu Agricultural University, Coimbatore, India, pp 109-121.

Extraction Of DNA From Strawberries

Introduction

Deoxyribonucleic acid, often abbreviated to DNA, is found in the nucleus of the cells of almost all living organisms on earth. DNA contains the genetic instructions for making proteins and for how an organism will develop, live and reproduce, and is often referred to as the building block of life (reference). DNA is arranged in a spiralling double helix shape, similar to a twisted ladder, and contains thousands of repeating nucleotides, which are the structural components of DNA. Each nucleotide is composed of a deoxyribose sugar molecule and a phosphate group, which create the ‘backbone’ of each DNA strand, and a nitrogenous base. Four bases can be found within DNA: adenine, thymine, cytosine and guanine. These bases form base pairs, where each base will only pair with its complementary base: adenine will only pair with thymine, and cytosine will only pair with guanine. These base pairs make up the ‘rungs’ of the ladder in DNA’s twisted helix formation. The order of the bases within each DNA molecule is used as a code to synthesise proteins, which determine the characteristics of every living organism. DNA molecules are bound to and coiled around proteins called histones, which allow a large amount of DNA to be stored within each nucleus. The DNA and histones together form a coiled network named chromatin which, when a cell is about to divide, coils even tighter into structures called chromosomes. This experiment was designed to extract clumped strands of DNA from strawberry cells using a solution of salt, detergent, water and ethanol. The aim of the experiment was to investigate the strands of DNA extracted from the strawberries and analyse the effect that changing the amount of detergent in the solution had on the weight of DNA that was produced from the cells.
Strawberries are particularly useful when investigating DNA because they are octoploid, which means that each cell contains 8 copies of each chromosome, resulting in a large quantity of DNA that is able to be extracted in the experiment (Washington). Different quantities of detergent were added to the solutions whilst the amount of strawberries, salt, ethanol and water were controlled in order to test the effect that changing the concentrations of the detergent would have on the amount of DNA produced. The hypothesis of this experiment was that if more detergent was added to the solution, more DNA would be extracted from the strawberry cells.
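
The base-pairing rule described in the introduction (adenine with thymine, cytosine with guanine) can be sketched in a few lines of Python. This is only an illustration and not part of the experiment; the function names and the example sequence are made up.

```python
# Complementary base pairs: A pairs with T, and C pairs with G.
BASE_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(BASE_PAIRS[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the opposite strand read in its own 5' to 3' direction."""
    return complement(strand)[::-1]


example = "ATCG"  # hypothetical sequence
print(complement(example))          # TAGC
print(reverse_complement(example))  # CGAT
```

The reversed form is included because the two strands of the double helix run in opposite directions.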

Discussion

As demonstrated by graph 1 and table 1, the weight of DNA collected increased linearly as the amount of detergent was increased. This linear increase in extracted DNA can be attributed to the structure and properties of detergent. Detergent is made of molecules that contain charged hydrophilic, or water-loving, heads and non-charged hydrophobic, or water-hating, tails (Detergent structure). This structure allows the detergent molecules to act as an emulsifier for lipids, which means that they can suspend and disperse small droplets of lipids in a water-based solution by trapping them in bubbles called micelles (emulsifier definition). Micelles resemble a cell’s membrane in that both have hydrophilic heads facing out and hydrophobic tails facing in; in a micelle they are arranged in a spherical shape. Because lipids are non-polar and insoluble in water, they are able to be broken down by the non-polar ends of the detergent and then trapped within the micelle. In this experiment, the detergent releases DNA from the cells by causing cell lysis: it breaks down the lipids in the phospholipid bilayer and ruptures the nuclear membrane within which the strands of DNA are stored (Murdoch). As DNA is insoluble in ethanol, the ethanol that is added to the solution causes the DNA to precipitate into a white stringy substance, which can then be extracted from the solution and weighed. As the concentration of detergent in the solution was increased, it was likely able to break down the cellular and nuclear membranes of the strawberry cells more effectively, because the detergent came into contact with a higher number of cells, and therefore released more DNA strands than the solutions with lower concentrations of detergent.

As the amount of detergent reached 4mL, the amount of DNA produced from the strawberries stopped increasing linearly and levelled out at 6g. This lack of increase at higher detergent concentrations could suggest that sufficient detergent had already been added to release all the viable DNA, and that adding more would have no impact because the membranes were already completely broken down. The data produced from this experiment is linear and consistent and is therefore relatively reliable and accurate. The data collection strategy was also reasonably reliable, as the experiment was conducted in the same environment and at the same time, and controlling the amounts of strawberries, water, ethanol and salt meant that the amount of detergent was the only variable being altered. However, a limitation of the data collection strategy was that the scale used only displayed measurements to the nearest tenth of a gram, so the exact weights of DNA produced were not recorded precisely. Another limitation was that, whilst the strawberries were all measured to 10g, the lack of a more precise scale could have resulted in slightly larger amounts of strawberries in one of the solutions, which could have produced more DNA due to the larger quantity of cells. The experiment could be adjusted to provide more accurate and reliable results by using a scale with smaller increments of measurement, by performing multiple trials to get an average for each amount of detergent, and by adding detergent in smaller increments to see more accurately how the DNA produced changes with the concentration of detergent.
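
One way to check how well a set of measurements follows a linear trend is to fit a straight line by least squares and calculate the R squared value. The Python sketch below shows the calculation. The measurements in it are hypothetical placeholders, not the values recorded in table 1, and the function name is made up for this illustration.

```python
def linear_fit(xs, ys):
    """Least-squares line through the points; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical measurements: detergent added (mL) and DNA collected (g)
detergent_ml = [1.0, 2.0, 3.0, 4.0]
dna_g = [1.6, 3.1, 4.4, 6.0]

slope, intercept, r_squared = linear_fit(detergent_ml, dna_g)
print("slope (g per mL):", round(slope, 2))
print("intercept (g):", round(intercept, 2))
print("R squared:", round(r_squared, 3))
```

An R squared value close to 1 indicates that a straight line describes the points well. Fitting only the points before the plateau keeps the levelling-off from lowering the value.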

Conclusion

An experiment was conducted to test the amount of DNA that could be extracted from strawberry cells when changing the concentrations of detergent in the solution that was mixed with the strawberries. The results of the experiment were that the amount of DNA increased as more detergent was added, before levelling off after 3mL of detergent was added. The amount of DNA increased as more detergent was added because the detergent breaks down the lipids in the cellular membranes of cells, allowing them to rupture and release the DNA stored within the nucleus. The levelling of the amount of DNA produced after 3mL of detergent was added could be attributed to the membranes being fully broken down, and could suggest that no more detergent was required to break down and separate the lipids, and that 3mL was the optimum amount of detergent for the amount of strawberries. The experiment produced consistent linear results and therefore can be considered as relatively valid data. The hypothesis that as more detergent is added, a higher quantity of DNA will be produced was supported by the linear trend of the data from this experiment.

REFERENCES

  1. www.wisegeek.com/what-is-dna
  2. (Washington) www.gs.washington.edu/outreach/Dhillon_dnaprocedure.pdf
  3. (R2 value) My Accounting Course. (2019). What is R Squared (R2)? – Definition | Meaning | Example. [online] Available at: https://www.myaccountingcourse.com/accounting-dictionary/r-squared [Accessed 19 Aug. 2019].
  4. (detergent structure) https://www.sigmaaldrich.com/technical-documents/articles/biofiles/detergent-properties.html
  5. (emulsifier definition) https://www.rimpro-india.com/articles1/surfactants-as-detergents-and-emulsifiers.html
  6. https://www.thoughtco.com/how-do-detergents-clean-607866
  7. https://www.murdoch.edu.au/Biotech-out-of-the-box/_document/Kit-Handout-Sheets/DNA-extraction-from-strawberries.pdf
  8. https://www.researchgate.net/post/What_is_the_function_of_detergent_in_DNA_extraction
  9. (image 1) https://cdn.instructables.com/FKR/QF9W/GYN8W52O/FKRQF9WGYN8W52O.LARGE.jpg?auto=webp&&frame=1
  10. (cell lysis and breakdown) http://www.explorecuriocity.org/portals/2/themes/biotechnology/DNA-Extraction-Backgrounder.pdf
  11. (Alcohol) https://info.gbiosciences.com/blog/bid/156468/work-of-salt-isopropanol-and-ethanol-in-dna-extraction
  12. (chromosomes) https://www.nature.com/scitable/topicpage/chromosomes-14121320/

The Significance Of DNA Database

DNA databases play an important role in the world, specifically in the criminal and forensic world. The term DNA database, in this case a forensic DNA database, refers to a collection of DNA samples and any other evidence stored as DNA profiles. A DNA database can be extremely useful during criminal investigations. For example, comparing a DNA sample taken from a crime scene with a suspect’s DNA stored in a database can help determine whether the suspect is guilty or not, and could eventually lead to the main goal of that forensic investigation.

Even though DNA databases are known to be useful and partially accurate, they still raise ethical issues, such as the errors made when DNA is evaluated using the data and the consequences of having a national DNA database. The main focus here is the errors made in matching the correct suspect using a forensic DNA database. In brief, one cause of this complication is the lab experts or forensic scientists themselves, since the result can depend on their skills and on the techniques used to extract DNA from the samples. This is a major issue because it can cause complications during an investigation, for example when a suspect is wrongfully detained because he or she is believed to be guilty when in fact that suspect is innocent. It could affect local residents by leaving them with a false criminal record, which could affect employment. It could also affect the government financially and legally, because wrongfully convicted suspects may sue the government. The focus of this investigation is to examine why errors occur when lab technicians evaluate forensic DNA databases.

To test whether DNA database analysis is accurate, two researchers, Itiel Dror and Greg Hampikian, conducted an experiment. The researchers gathered identical DNA samples and sent them to 17 different lab experts and forensic scientists. It was hypothesised that every analysis would have the same outcome because the samples came from the same material. However, the results proved this to be wrong, because the experts reached markedly different outcomes. This may have a big impact on investigations because the outcomes might affect innocent people or suspects. “This demonstrated that what the forensic scientist knows about the investigation may impact the interpretation of a DNA sample. Perhaps then, it is no surprise that there are now numerous of cases of lab technicians who make mistakes or argue that there was a DNA match when there was none” – Aziza Ahmed. The reliability of this fact is high because it came from an experiment performed by two lab experts.

This graph shows cases in which wrongfully convicted suspects were released, grouped by the following causes: eyewitness misidentification, improper forensics, false confessions and informants. Improper forensics accounts for 47% and is the second highest on the graph. As mentioned previously, this data relates to the issue: the impact of improper forensic techniques is large and could affect innocent people solely because incorrect techniques were used. This graph is not extremely reliable because there is no background information on when, how and by whom the data was collected.

A famous case of an innocent man being convicted suggests that DNA database results can be unsound because of improper forensics. David Camm was the main suspect and was accused of murdering his wife and his two children. According to the prosecutor, Stan Faith, the DNA evidence found on a shirt left near one of the deceased children was not in the database system; therefore the only person who could match that sample was the innocent Camm. After two trials, the DNA evidence was re-evaluated and run through CODIS again, and a match was found with a convicted offender known to attack women, named Charles Boney. “Boney’s DNA was in the system all along.. There must’ve been some misunderstanding on the part of the State Police about what was wanted”. It was concluded that the prosecutor, Faith, had either lied about the DNA being evaluated or the DNA had been evaluated improperly by the forensic lab experts, leading to an innocent suspect being convicted because of improper DNA database evaluation. This fact is reliable because it comes from a police report, which is a government source.

The aim of this investigation was to understand why such errors are made when evaluating DNA in the database. There have been cases where innocent suspects have been imprisoned due to false or incorrect DNA analysis. The evidence reviewed supports this: in the experiment conducted to see whether all results would be the same, every result was different, which means there will be problems when it comes to criminal investigations. The graph is further evidence, showing how many false imprisonments have resulted from improper forensics, and it supports the argument that the fault could lie with the lab technicians conducting the analysis. To reduce or even solve the problem, improvements in technology would help massively. Given the many mistakes made by lab experts or forensic scientists, a new tool such as a robot or a computer specialised in analysing DNA data would lead to fewer errors. This way, people’s lives and reputations would not be ruined by the media in ways that cause problems in, for example, finding jobs. A solution is likely in the future, because technology improves every day and such a development would take far less time now than it would have 50 years ago.


Identifying An Unknown Tissue Sample Via DNA Extraction

Introduction

DNA, or deoxyribonucleic acid, contains vital coding that makes up the entirety of an organism (Lesk, 2005). These long, double helix structures contain four nucleotides which sequentially create nucleic acids, then consequently combine in different ways to form specific proteins that perform various tasks for the organism during its lifetime (Sanderson, 2007). Once the animal reproduces, this genetic coding is passed on to offspring (Lesk, 2005). There are many practical uses for DNA extraction, which range from its use within forensic environments (Brandt & Gonzales, 2005) to identifying the bloodline of an animal and its potential for mutations (Cocciolone & Timms, 1992). As each organism has a different and unique genetic code within its DNA, achieving a ‘match’ for unknown tissue has become a well-valued process for understanding more about animals and their genetic construct.

Students participated in two laboratories over the course of a week to extract DNA from four separate tissue samples from an unknown organism, one of which is known to be the target animal. The sample that matches with the specific target organism is anticipated to be discovered through utilising the processes of PCR, amplification, and gel electrophoresis.

Method

Prior to participating in laboratory one and two, standard hygiene procedures were utilised to minimise the effects of contaminating the samples. These procedures included washing hands thoroughly and using lab coats and safety glasses.

Materials and method for DNA extraction and PCR

Students first labelled a 2mL tube with their group name and the sample number of the tissue they had chosen. This tissue was cut, weighed to 0.25 grams and placed in the tube. 500μL of lysis buffer and 500μL of DI water were then added, and the tube was gently flicked. 1 drop of 1% protease solution was then added using a transfer pipette. The large piece of tissue was broken up using a tissue grinder until the solution was discoloured, and the tube was then vortexed for 10 seconds. The tissue samples were placed into an incubator for 10 minutes, and students then transferred the liquid of this sample (with care to avoid solid tissue) into a centrifuge tube. 1mL of lysis buffer was added to the tube with the aid of a transfer pipette, and the tube was gently flicked. A new pipette was used to transfer 2mL of cold ethanol into the tube at a 45° angle, and the tube was immediately rested for 2 minutes. After 2 minutes, 200μL of DNA precipitate was transferred to a 2mL tube, and 250μL of elution buffer was added. The PCR sample was then prepared by exchanging the DNA sample for one already filtered, purified and diluted. 20μL of DNA and 20μL of master mix were then added to a PCR tube, which was immediately capped and placed on ice.

Materials and method for Gel electrophoresis

The PCR tube was placed into a small tube with its lid cut off, and this in turn into a larger tube so that it would fit the centrifuge with other samples. The centrifuge was pulse-spun for a few seconds. 10μL of Orange G loading dye was loaded into the PCR tube, and the tube was centrifuged once more. The gel electrophoresis chamber was set up and the allele ladder samples were inserted by lab demonstrators to ensure accuracy. One student from each group then placed 20μL of PCR sample into the chamber, and the well number was documented on each paper for identification purposes. The electrophoresis chamber was run for 30 minutes at 100V. Following this process, the UV transilluminator was used with the room in full darkness, and a photo was taken with a digital camera.

Results

These results identify the allele ladders present, followed by the positive controls in each chamber.

The 9 samples that are shown in the electrophoresis chamber are identified via their number, from one to four accordingly. In figure 2, both images also show a negative control.

Discussion

The role of DNA in science

DNA extraction is a process that involves the separation of strands of DNA from different parts of the cell (O’Sullivan, et al., 1999). This process is often used to isolate or purify the DNA prior to PCR, the polymerase chain reaction. PCR is a synthetic process which replicates target sequences of DNA through repeated periods of heating and cooling until sufficient copies of the target DNA are created (Booth, et al., 2010). The target DNA at this stage is usually produced in extremely large quantities to ensure the efficiency of results (Reed, et al., 2007). Gel electrophoresis then follows; in this process an electrical current pulls the DNA through the pores of a gel, so that fragments of different sizes travel different distances. These processes are frequently used in studies due to their diverse range of uses. For example, DNA analysis proves useful in identifying blood or other fluids within a forensic setting (Ciampolini, et al., 2000), in genetic analysis of animals to identify certain lineages or individuals (Cocciolone & Timms, 1992), and in agricultural analysis and genetic alteration of crops (Brandt & Gonzales, 2005). Genetic diseases and cancer can also be diagnosed by applying these practices (Reed, et al., 2007).
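
To give a sense of why PCR produces such large quantities of target DNA, the Python sketch below applies the usual idealised model, in which each heating and cooling cycle multiplies the number of copies by (1 + efficiency). The function name and the cycle counts are chosen for illustration only and are not taken from the laboratory protocol.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealised copy number after a number of PCR cycles.

    An efficiency of 1.0 means every template is duplicated in each cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Starting from a single template molecule with perfect doubling
for cycles in (10, 20, 30):
    print(cycles, "cycles:", int(pcr_copies(1, cycles)), "copies")
```

With perfect doubling, 30 cycles turn one template into just over a billion copies; real reactions fall short of this because efficiency drops as reagents are used up.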

Discussion of results

Upon completion of gel electrophoresis, the tissue sample number three matched with that of the positive control. This is easily identified due to the close match with that of the control sample in the results, as illustrated in figures 1 and 2, and clearly supports the original hypothesis. The tissue samples that were utilised were determined to be that of a sheep’s liver, as the lab demonstrators disclosed this information on the completion of gel electrophoresis. Upon reflection, it was concluded that despite the success of the experiment, there were certain aspects that were well monitored to ensure the attainment of results, making it difficult to specify any errors that could’ve been made if students completed these labs individually. For example, the DNA that students extracted was not used in the final experiment. Instead, lab demonstrators swapped this DNA over to one that was pre-filtered, purified, and diluted to utilise in the PCR process. In addition to this, students were also assisted with the gel electrophoresis process as the chamber, allele ladders and positive controls were already set up. By participating in this laboratory, it is clear that DNA is an incredible structure within any organism’s makeup, and because of this has proven extremely useful in research within scientific studies.

Bibliography

  1. Booth, C. S. et al., 2010. Efficiency of the polymerase chain reaction. Chemical Engineering Science, 65(17), pp. 4996–5006, doi: 10.1016/j.ces.2010.
  2. Brandt, C. G. & Gonzales, R. A., 2005. DNA Testing in Animal Forensics. Journal of Wildlife Management, 69(4), pp. 1454-1462 doi: 10.2193/0022-541X(2005)69[1454:DTIAF]2.0.CO;2.
  3. Ciampolini, R., Leveziel, H., Mazzanti, E., Grohs, C. & Cianci, D., 2000. Genomic identification of the breed of an individual or its tissue. Meat Science, 54(1), pp. 35-40 doi: 10.1016/S0309-1740(99)00061-3.
  4. Cocciolone, R. A. & Timms, P., 1992. DNA Profiling of Queensland Koalas reveals Sufficient Variability for Individual Identification and Parentage Determination. Wildlife Research, 19(3), pp. 279-287.
  5. Lesk, A. M., 2005. Introduction to Bioinformatics. Second ed. Oxford: Oxford University Press.
  6. O’Sullivan, G., Sharman, E. & Short, S., 1999. The molecular biology explosion and Social Context. In: Goodbye Normal Gene. NSW: Pluto Press Australia Limited, pp. 14-16.
  7. Reed, R., Holmes, D., Weyers, J. & Jones, A., 2007. Molecular genetics II – PCR and related applications. In: Practical Skills in Biomolecular Sciences. Essex: Pearson Education Limited, pp. 439-455.
  8. Sanderson, C. J., 2007. DNA: The template. In: Understanding Genes and GMOs. USA: World Scientific Publishing Co, pp. 6-20.

The Extraction Of DNA From Buccal Cells To Obtain DNA Quantification And Purity

The reuptake of dopamine within the brain is carried out by proteins referred to as “Dopamine Transporters” (DAT), found between neurons. DAT act at the nerve endings of presynaptic neurons and allow them to absorb the dopamine neurotransmitter, thus terminating the transmission of a message. The reuptake and regulation of dopamine result in a steady and level-headed mental state.

Dopamine is a monoamine neurotransmitter, a term that refers to its chemical structure and the fact that it derives from an amino acid. It is also a catecholamine, a term that likewise refers to its chemical structure, specifically the catechol nucleus it contains; dopamine acts as both a neurotransmitter and a hormone. Dopamine synthesis occurs when the amino acid tyrosine is converted into L-DOPA, which is then decarboxylated to form dopamine.

There are several areas of the brain where dopamine neurons are concentrated. The largest are the substantia nigra and ventral tegmental area in the midbrain; other areas include the hypothalamus, olfactory bulb and retina.

Several major dopamine pathways carry dopamine from these areas of concentration to other parts of the brain. Among the largest are the nigrostriatal (neostriatal) pathway, which stretches from the substantia nigra to the striatum; the mesolimbic pathway, which stretches from the ventral tegmental area to the nucleus accumbens and other limbic structures; and the mesocortical pathway, which stretches from the ventral tegmental area throughout the cerebral cortex.

Abnormal dopamine levels have been associated most commonly with ADHD (1), bipolar disorder (2) and Parkinson’s disease (3), as well as various other mental conditions. Since dopamine transporters are the pivotal factor in regulating dopamine levels, medications act on DAT to bring dopamine levels back to normal. This is the fundamental basis of the mechanism of action of medications that aim to change dopamine levels within the brain: they either suppress or stimulate these dopamine transporter proteins.

Aim

To obtain DNA from buccal cells and determine its concentration and purity. Amplification of the DAT 3’UTR then allowed the product to be analysed by electrophoresis in an agarose gel.

Extraction of DNA from buccal cells

10ml of sterile 0.9% saline solution is provided in a 50ml centrifuge tube. Use the saline solution to wash around your mouth for approximately 20 seconds (to collect buccal cells), then spit it back into the tube provided, which should be labelled (name, date and solution).

Place the saline-buccal sample into a centrifuge. It is important to balance the centrifuge (using samples of similar weight placed opposite each other). Centrifuge the samples at 2000 RPM for 10 minutes.

Once centrifugation is done, the sample will contain a pellet at the bottom of the tube and a supernatant liquid above it. Using a transfer pipette, very carefully siphon off as much of the supernatant as possible without disturbing the pellet, and discard the liquid into waste beakers that contain bleach (bleach pots). All that should remain is the pellet.

Using a new transfer pipette, pipette 500µl of re-suspended Chelex beads into your tube containing the pellet (Chelex beads are an ion exchange resin which binds cations, thus inhibiting DNases which would otherwise digest the DNA).

Using the same tip, re-suspend the cells by pipetting up and down several times (check that no clumps remain) and, again using the same tip, transfer the cell-Chelex solution into a 1.5ml screw-capped Eppendorf tube (which should be labelled with initials).

Now take the cell-Chelex solution and heat it for 10 minutes in either a water bath or a heating block (set at 95°C so that the cells will lyse). Then place the tube on ice for 5 minutes to cool down.

Now that the cells have been lysed, the debris and proteins must be removed by centrifugation in a microcentrifuge (once again, balance the centrifuge) spun at 13,000 RPM for 3 minutes to form a pellet that contains Chelex and denatured proteins. The DNA is contained within the supernatant, which must therefore be transferred using a pipette into a new, clean Eppendorf tube.

Proceed to label the tube with initials, then place it in an ice bucket until it is stored at -20°C.
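
The centrifugation steps above are specified in RPM, but the force acting on the sample also depends on the radius of the rotor. The Python sketch below uses the standard conversion RCF = 1.118 × 10^-5 × r × RPM^2, with r in centimetres, to show what a given speed corresponds to. The rotor radius used here is a hypothetical value, since the protocol does not state which centrifuge or rotor was used.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in RPM


def rpm_to_rcf(rpm, rotor_radius_cm):
    """Relative centrifugal force (in multiples of g) at a given speed."""
    return RCF_CONSTANT * rotor_radius_cm * rpm ** 2


def rcf_to_rpm(rcf, rotor_radius_cm):
    """Speed in RPM needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * rotor_radius_cm))


# Hypothetical 10 cm rotor radius for the first spin at 2000 RPM
print("RCF at 2000 RPM:", round(rpm_to_rcf(2000, 10.0), 1), "x g")
```

Reporting the force in multiples of g alongside the RPM would make the protocol easier to reproduce on a different centrifuge.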

Discussion

The extraction of DNA from buccal cells was successful in terms of quantification and purity: the purity value obtained was 1.81, which falls within the accepted range of 1.8 to 2.0.

Our third practical, however, produced inconclusive results, as seen in Figure 2: the DNA did not migrate through the gel as predicted, in contrast to the sample seen in Figure 3. There are several possible reasons why the results in Figure 1 were inconclusive.

First, the percentage of agarose in the gel (usually 0.7-3%) determines how well DNA bands of different lengths are separated. On the whole, lower agarose concentrations are better for larger molecules, because they give greater separation between bands that are relatively close in size. The main disadvantage of higher-concentration agarose gels is the potential for longer run times; a potential alternative would be to run either pulsed-field gel electrophoresis (PFGE) or field-inversion gel electrophoresis (FIGE).

Second, the applied voltage: the higher the voltage, the faster the DNA molecules move. However, the applied voltage is limited by the fact that the higher the voltage, the more likely the gel is to melt (the voltage must be set between 1-10V per cm, whereas the gel may melt at 5-12V per cm). During the experiment the voltage was set at 70V but yielded no results after the set time (40 minutes), so it was decided, with the guidance of the lab technicians, that a higher voltage needed to be applied. 100V was then used, but this also yielded no usable results: after a further 40 minutes the DNA had barely moved, which may have been due to an internal issue with the power pack provided. At 120V the DNA did proceed to move down through the gel, but at a potential cost, as the higher voltage may have caused the buffer to evaporate and expose the gel (5).

Furthermore, other causes include the fact that agarose gels do not have a uniform pore size, and that TBE (Tris-Borate-EDTA) buffer (which was to be set 1mm above the gel) may have been overused both in the gel and above it. TBE buffer is a good conductive medium, so it is less prone to overheating and is used for longer runs; it offers high resolution and has a high buffering capacity at greater temperatures. However, TAE (Tris-Acetate-EDTA) buffer may also be used, as it is significantly cheaper to make and its stocks can be made 50x concentrated, so that TAE stock takes up less space than the 10x concentrated TBE stock provided. Also, agarose gel electrophoresis often has to be run multiple times to produce clean results. This was an important factor: the experimenters were running the experiment for the first time and were unable to repeat it to compare results, which limited their understanding, as they all obtained inconclusive results from their first and only attempt.
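
The voltage limits quoted in this discussion are given per centimetre, so comparing them with the voltages tried (70V, 100V and 120V) requires dividing by the distance between the electrodes of the gel tank. The Python sketch below shows that calculation. The electrode distance is a hypothetical value, as the size of the tank used is not recorded, and the function names are made up for this illustration.

```python
def field_strength_v_per_cm(voltage_v, electrode_distance_cm):
    """Electric field across the gel tank in volts per centimetre."""
    return voltage_v / electrode_distance_cm


def within_recommended_range(voltage_v, electrode_distance_cm, low=1.0, high=10.0):
    """Check the field against the 1-10 V per cm range quoted in the text."""
    field = field_strength_v_per_cm(voltage_v, electrode_distance_cm)
    return low <= field <= high


ELECTRODE_DISTANCE_CM = 15.0  # hypothetical tank size

for voltage in (70, 100, 120):
    field = field_strength_v_per_cm(voltage, ELECTRODE_DISTANCE_CM)
    ok = within_recommended_range(voltage, ELECTRODE_DISTANCE_CM)
    print(voltage, "V:", round(field, 2), "V per cm, within range:", ok)
```

Measuring the electrode distance of the actual tank would show whether 120V was still within the quoted range or close to the level at which the gel may melt.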

Conclusion

All things considered, the amplification of the gene DAT 3’UTR VNTR using the laboratory technique polymerase chain reaction (PCR) required a lot of practice and accuracy to obtain clean sample results.

Based on the results of the experiment shown in figures 1-4, I believe changes will need to be made for future replication of this exact technique, but it will be most important to repeat the experiment multiple times to obtain clean sample results and to implement more precise methods of obtaining pure DNA. One change would be to find a consistent way of setting the concentration of the agarose gel across experiments (the lower the better); however, if a higher gel concentration is needed, polyacrylamide gel can be used instead. Another improvement would be to fix the required voltages in advance and to use a new power pack if needed.

Lastly, the implementation of a new buffer could have the greatest effect. If a higher voltage (above 80 V) and longer runs are still required, TBE would be the necessary buffer because of its better buffering capacity; however, TBE buffer decreases electrophoretic mobility in agarose gel by approximately 10% (4). Depending on the circumstances, TAE buffer could be used for shorter runs, as most agarose gels are run for a relatively short period of time and TAE buffer can be reused for up to 4 to 5 electrophoretic separations. Compared with TBE buffer, this in turn may be able to produce purer results.

Posted in DNA

Have Direct Measures Of DNA Variation Now Become Educationally Useful?

In the last few decades there has been increasing interest in how genes affect children’s learning and development. Researchers are trying to find out what exactly contributes to education, which outcomes every educational professional should know about, and how education can be improved. Biological factors are being measured in different studies to understand how genes are expressed in each individual and whether there is any relation between them. We have seen surprising studies such as “500 genes that are linked to intelligence” (Harvard University in the US, University of Edinburgh and University of Southampton in the UK, 2018) or “Why is educational achievement heritable?” (King’s College London, 2014), which show researchers’ interest in finding something useful that can give sound results and new knowledge about how children learn. We now know that genes influence who we are, how we are and who we become. Since DNA was discovered, studies have found that genes are related to personality and to success in life. There is, of course, a strong relation with genes, but environmental factors are also involved. Indeed, some studies have looked at how environment and genes affect identical and non-identical twins. Other studies have examined the relation between parental genotypes and educational achievement, measured by the parents’ level of education. Others have simply examined how educational achievement is related to the number of years of schooling. Researchers are strongly motivated to find what actually affects children’s educational progress, so that policy makers, teachers and other educational professionals can improve education.

Some psychologists and geneticists maintain that knowledge of DNA variation is genuinely useful to educational professionals: precise genetic information about children’s differences would let them work better and create new educational methods. Robert Plomin, for example, a psychologist and geneticist who has devoted his career to searching for behavioural genes and exploring the nature-nurture interface, states: “genetic differences cause most variation in psychological traits – things like personality and cognitive abilities. The way your parents raise you, the schools you attend – they don’t have much effect on those traits. Children are similar to their parents, but that similarity is due to shared environment.” For Plomin, there is a high probability that children’s achievement at school is due to their genes; as he explained in an interview in 2015, “If there are educational opportunities for all the children, that means that the differences that are left are going to be mainly genetic differences”. In contrast, Oliver James, a psychologist registered with the British Psychological Society who also criticises Plomin’s research, believes that “sticking with the genetic story holds out no hope”. Instead he prefers to stick with the environmental story, which is a far richer narrative, full of parental missteps, social maltreatment and educational neglect (The Guardian, 2018). Are environmental factors, then, correlated with natural ones? Is nature the governing force behind our behaviour, or is it nurture? While almost everyone agrees that it seems to be a mixture of both, there has been no end of disagreement about which is the dominant influence. Is DNA variation, then, useful for education?

Each student is biologically different, as everyone has a unique genetic profile and therefore a unique genotype, meaning all the genes they have inherited. Genes are the storage units of genetic information and are made up of DNA sequences. They are essential for generating and managing mental processes; that is, they play a part in mental life and, consequently, in how we learn. They influence the creation of neurons and other brain cells, the chemicals these cells secrete, the way they react to new information, and the way they connect with each other. They therefore play a part in mental faculties and intellectual capacities, but they are not deterministic, because students are also influenced by the environment. The genetic profile contributes to a unique phenotypic profile, which refers to how these genes are actually expressed (including physical and non-physical traits), so the temperament and abilities of each student will differ slightly depending on their genetic variants. In addition, only a small part of a person’s DNA is likely to contribute to the variation in each trait (Kovas, Malykh, Petrill, 2014).

Scientists can look at the influence of genes on behaviour by using a mathematical formula called a heritability estimate. Heritability is the proportion of the variation in a specific characteristic, across the individuals in a population, that is attributable to genetic differences. To study heritability, scientists use information from identical twins, because their genetic material is almost exactly the same, which makes it easier to determine the relative influence of the environment. Heritability therefore gives information about the genetic share of the differences between people; the rest depends on the environment, as genetic effects are not deterministic. For example, the same genes may have completely different effects depending on the environments in which they are expressed: a trait can be highly heritable in one culture and highly environmental in another, depending on access to education (Kovas, Malykh, Petrill, 2014). Thus, genetic information about complex traits is probabilistic: just as knowing something about a person’s home environment may provide only probabilistic information about their educational potential, so knowing their DNA sequence can provide only probabilistic information, too (Kovas, Malykh, Petrill, 2014). In contrast, the effect of the environment on a phenotype depends on genotype. In relation to the acquisition of expertise, genotype–environment interaction refers to the possibility that children respond differently to a training regime on the basis of genetic differences between them (Kendler & Eaves, 1986; Plomin, DeFries, & Loehlin, 1977). To what extent, then, is it important to study the genetics related to education? This biology affects our psychology, which in turn affects how we move through the education system.
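
The essay does not say which formula the cited studies use. One common textbook estimator for twin data is Falconer's formula, which compares how alike identical (MZ) twins are with how alike non-identical (DZ) twins are. The Python sketch below is only an illustration of that estimator; the two correlations are hypothetical numbers, not results from any study mentioned here.

```python
def falconer_estimates(r_mz, r_dz):
    """Split trait variance into three shares using Falconer's formulas.

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between non-identical (dizygotic) twins
    """
    heritability = 2 * (r_mz - r_dz)      # share attributed to genetic differences
    shared_environment = 2 * r_dz - r_mz  # share attributed to the shared environment
    nonshared_environment = 1 - r_mz      # non-shared environment plus measurement error
    return heritability, shared_environment, nonshared_environment


# HYPOTHETICAL twin correlations for some measured trait, for illustration only.
h2, c2, e2 = falconer_estimates(r_mz=0.8, r_dz=0.5)
print(f"heritability={h2:.2f}, shared env={c2:.2f}, non-shared env={e2:.2f}")
```

The three shares always sum to one, which reflects the point made above: whatever is not attributed to genetic differences is attributed to the environment, and the estimate describes a population rather than any single child.
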
The largest body of genetic studies in education consists mostly of twin studies and genome-wide association studies (GWAS), which correspond to the two branches of behavioural genetics: quantitative genetics and molecular genetics. Both study the sources of the relationships between different traits in order to understand environmental influences (Kovas, Malykh, Petrill, 2014). There are thousands of studies looking at the impact of nature and nurture on individuals, using approaches such as twin methodology, evaluation of the teacher/classroom effect, examination of the aetiology of learning disabilities, and the effect of parental genotypes or socioeconomic status on children’s learning and educational attainment. Despite all the motivation to find new information, researchers seem to have difficulty quantifying nurture and its effects on each individual, knowing that nature is different in each person.

At the moment it is thought that we will never find a single gene that can explain a person’s ability, because it seems that the combination of genes and experiences ultimately forms our personality and identity and influences our behaviour (Meta-analysis of genome-wide association studies for personality, PT Costa, DI Boomsma, 2010; Genes, environment and behaviour, Khan Academy) and, therefore, the way we learn. That means that the environment has a strong impact on who we are, too. Some kids pick things up in a flash; others struggle with the basics. This does not mean it is all in their genes, because the child’s environment can play a big role in educational attainment. Of course, kids with supportive, stimulating families and motivated peers have an advantage, while in some extreme cases the effects of malnutrition or trauma can compromise brain development (P. Ball, 2018). Curricula, teacher training, teaching methods, class settings and educationally relevant cultural norms and values are all examples of environmental factors that have a profound effect on what, when and how we learn. “There are multiple definitions of learning, but all of them are related to behaviour changes due to the experience; that is, with acquired changes. By interacting with the environment that surrounds us, we learn; assimilate and store the result of this interaction and use it, voluntarily or involuntarily, in future interactions. We are modifying our behaviour as we learn” (Aprendizaje y Herencia, E. Sánchez). However, behavioural genetic research shows that these and other educational environments interact with people’s unique genetic profiles, which may lead to huge individual differences in motivation, learning, ability and achievement, for example (Kovas, Malykh, Petrill, 2014). What is essential to know, then, is that the genes are there, but if the environment does not allow it, they will not be expressed.

If you had had a whole different set of experiences over your lifetime, your genes might be expressed in different ways, and you might behave differently than you do now (Genes, environment and behaviour, Khan Academy). That is to say, environmental effects are modulated by epigenetic programming of gene expression to shape development (D. Francis, D. Kaufer, 2011). Epigenetics refers to the set of functional elements that regulate the gene expression of a cell without altering the DNA sequence. Cells have the ability to mark which genes should be expressed, to what degree and at what moment. Epigenetic changes are not static and can be modified throughout the life of the cell, meaning they are reversible and can be influenced by environmental factors. How important is it, therefore, to have a deep knowledge of genetic mechanisms? Is it ethical to have the whole picture of genetics within education?

Clearly, genetics has an enormous influence on how a child develops. However, it is important to remember that genetics is just one piece of the intricate puzzle that makes up a child’s life. Environmental variables including parenting, culture, education and social relationships also play a vital role (K. Cherry, 2018). Having a deep knowledge of the genes that influence learning and educational success can be positive for the individual, but it can also lead to negative consequences. We live in a world in which what is “different” seems to be treated as “strange” and is therefore excluded from society. There seem to be too few shared values in society, and everything outside the norm is treated as wrong. Ethically, it seems that these mechanisms do not have to be known and understood in depth, because such knowledge can have a big impact on a society in which everything out of the ordinary carries a negative connotation. Thus, having specific knowledge of the influential genes of each individual enrolled in the educational system may end up resulting in social exclusion through negative labels.

In personalising education, schools should draw out natural ability and build individual education plans for every single child, based on children’s specific abilities and interests rather than on government decisions. The observation and tracking process has to be intensified for children who are falling behind in an area of basic life skills, and these children should receive individualised support in school, so that the abnormal becomes normal (G is for Genes, K. Asbury, R. Plomin). It is believed that individualised plans matched to the needs of each individual can help them keep developing their skills and their learning in the way they need, and it sounds fantastic. For example, the use of genetic markers for dyslexia as a basis for early intervention using established phonological training techniques illustrates the potential benefits (Thomas et al. 2015). Realistically, though, this is very utopian, because the educational system we are immersed in does not give teachers enough tools to handle the new changes they face every day. It is difficult to have individual plans for each student in need. As good as it sounds, in a day-to-day class it is hard to give those children one-to-one opportunities. We all know of educational roles that guide those children and help them to develop, but in some cases those students may already feel set apart from the ordinary. That means that having too much genetic knowledge can also lead to consequences for personal and social development. Being labelled as slower in learning to read, as having less facility for mathematics or as having some kind of deficiency that makes educational development harder in some areas ends up damaging the student personally and socially. Being labelled, and therefore excluded, ends up doing emotional damage that could have been avoided in the first place, without an individualised plan.

But this is not all, because judgements are everywhere. Whether or not children have individual plans, the people around them can see the differences in learning, and this can end up affecting the children’s emotions. So labelling will still harm people, because of the constant competitiveness in society. “Children go to school, they fail, they get diagnosed, they’re given special resources but by then it’s too late… Why not preventative education?” (Asbury, Plomin 2014). According to them, the Learning Chip can make a reliable genetic prediction of heritable differences between children in terms of their cognitive ability and academic achievement. I suspect this can be unethical, as this idea will send a chill down the spines of many parents, who might fear that children will be branded for success or failure from birth. As educators, is this what we want?

Teachers, then, need to be experts in child development, with strong personal and communication skills that allow them to connect with individual pupils, understand their needs and desires, and nurture them in the appropriate way (K. Asbury, R. Plomin). This is one of the ways in which current educational policies and practices need to change, and genetics can suggest changes that might have a positive impact, but we need to keep in mind how knowledge of the whole mechanism can influence an individual’s personal and social development. What is actually needed now is to teach values from a very early age: to understand that each person is different, and that each has their own peculiarities, defects and/or difficulties. I believe that, in addition to knowing how genes can affect anyone’s development, it is very important to know the environment in which those genes operate. Even so, I believe that knowledge of genes is helping to discover new ways in which children with learning difficulties can learn, but it is not enough, since daily experience affects that learning and, therefore, the person. For genes to be useful to education, education itself has a long way to go first in passing on values and respect to promote understanding. So education first has to make a few changes for genes to be useful at all. It is also very important to keep researching in the field, to gather enough evidence about how education can be improved, always trying to meet individual goals while also recognising how much negative impact genetic knowledge can have on children’s social and personal lives.

Purification Of Plasmid pBR322 DNA From E. Coli Cells

Abstract

Plasmid purification is an important method in biology, because a purified plasmid sample is essential for many experiments, including important techniques like DNA sequencing. In this experiment, plasmid pBR322 was purified from E. coli, and gel electrophoresis and a calibration curve were used to quantify the size of the purified plasmid. Samples were examined by gel electrophoresis and compared with the known size of pBR322, which helped determine whether the plasmid was the target plasmid. The results showed that it was the target plasmid but that the purified DNA was more compact in shape, since the band ran much lower, at an apparent 3113 base pairs, than the standard marker for the literature value of 4361 base pairs.

Introduction

The purpose of this experiment was to purify the plasmid pBR322 from the bacterial strain Escherichia coli, an important bacterium found in humans that is usually harmless but can be pathogenic and cause infections. The main function of plasmid purification is to separate the plasmid DNA from the cellular RNA and chromosomal DNA of the bacteria. One way to achieve this is a technique that uses a traditional alkaline lysis method with a phenol/chloroform extraction and a subsequent ethanol precipitation as a clean-up step. The DNA is then analyzed by restriction digestion and run on an electrophoresis gel. Gel electrophoresis is well suited to analyzing the relative sizes of molecules. When an electric current is applied, the negatively charged DNA backbone moves away from the negatively charged end of the gel, where the loading wells are, toward the positive end of the apparatus, to which it is attracted. The electrophoresis is run on a 1% agarose gel, which contains pores through which the molecules have to travel. DNA can take various shapes, including supercoiled. Supercoiled DNA is partly unwound, which compacts the DNA and makes it better at interacting with biomolecules. The shape of the DNA is important when interpreting the distance that bands travel in gel electrophoresis: if the DNA is more compact, it travels further down the gel.

Procedures

First, 10 microliters of the plasmid were transferred with a P-20 micropipette into a new, clean 1.5 mL microfuge tube. Then, 2 microliters of 6X loading dye were added. Each student then transferred their sample into a well of the gel that held the DNA ladder for gel electrophoresis. The gel was run at 100 volts for 60 minutes and stained with protein pre-stain solution for another 50 minutes. 1.5 mL of E. coli cells were transferred and centrifuged for 2-3 minutes at 13,000 rpm. This process was repeated two times. The bacterial cells in the pellet were then resuspended in 250 microliters of buffer P1. Then, 250 microliters of buffer P2 were added to this solution, and the tube was inverted. To this, 350 microliters of buffer N3 were added, and the solution was mixed and centrifuged for 10 minutes at 13,000 rpm. When centrifugation was complete, the supernatant was pipetted into a QIAprep spin column and centrifuged for 1 more minute, and the flow-through was discarded. The QIAprep spin column was then washed with 0.75 mL of buffer PE, which contains 10 mM Tris-HCl and 80% ethanol and causes the plasmid DNA to stay on the column. The flow-through was again discarded, and the column was centrifuged for an additional 2 minutes to remove residual wash buffer. The QIAprep column was then placed into a clean 1.5 mL microfuge tube, and 50 microliters of buffer EB (10 mM Tris-HCl) were added to elute the DNA. The tube was then left in the incubator for five minutes and centrifuged for 2 minutes. The final samples were stored.
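
The loading step above mixes 10 microliters of sample with 2 microliters of 6X dye. As a quick check of that arithmetic, the following Python sketch computes the final dye strength in the well and the dye volume needed to reach 1X for any sample volume. The function names are illustrative and not part of any kit protocol.

```python
def final_dye_strength_x(sample_ul, dye_ul, dye_stock_x):
    """Dye strength (in X) after mixing the concentrated dye into the sample."""
    return dye_stock_x * dye_ul / (sample_ul + dye_ul)


def dye_volume_for_1x_ul(sample_ul, dye_stock_x):
    """Dye volume d that gives exactly 1X: dye_stock_x * d / (sample + d) = 1."""
    return sample_ul / (dye_stock_x - 1)


# Volumes from the procedure: 10 microliters of plasmid plus 2 microliters of 6X dye.
print(final_dye_strength_x(sample_ul=10, dye_ul=2, dye_stock_x=6))
print(dye_volume_for_1x_ul(sample_ul=10, dye_stock_x=6))
```

The same helper can be reused if a different sample volume is loaded, since the required dye volume scales linearly with the sample.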

Discussion

The results showed that the apparent size of the purified plasmid pBR322 was 3113 base pairs, which is not close to the expected 4361 base pairs. From this, one can assume that the plasmid was more compact in shape and, as discussed above, the more compact the shape, the further the DNA travels down the gel. Gel electrophoresis separates molecules according to their size, and the results indicated that the purified plasmid was not linear, because if it had been, its band would have been further up the gel. The technique used to purify plasmid pBR322 was the Qiagen miniprep kit, which contained everything necessary to conduct the experiment. This method is good to use because it is easy and can be done in a few hours. Since the size in base pairs was lower than expected, this suggests that there were errors. One source of error could have been incorrect buffer volumes, which would not give the ideal mixtures. Another possible error could have been a wrongly prepared agarose gel, which might have interfered with or changed the distance traveled.
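
The write-up does not show how the calibration curve was built. A common approach, sketched below in Python under that assumption, is to fit a straight line to log10(fragment size) against migration distance for the ladder bands and then read the unknown band off the line. All ladder sizes and distances here are hypothetical placeholders, not the measurements from this experiment.

```python
import math


def fit_calibration(distances_mm, sizes_bp):
    """Least-squares fit of log10(size) = intercept + slope * distance."""
    logs = [math.log10(size) for size in sizes_bp]
    n = len(distances_mm)
    mean_x = sum(distances_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance_mm, slope, intercept):
    """Apparent size in base pairs of a band that migrated distance_mm."""
    return 10 ** (intercept + slope * distance_mm)


# HYPOTHETICAL ladder readings (distance from the well in mm, fragment size in bp).
ladder_distances_mm = [12.0, 18.0, 25.0, 33.0, 41.0]
ladder_sizes_bp = [10000, 6000, 3000, 1500, 750]

slope, intercept = fit_calibration(ladder_distances_mm, ladder_sizes_bp)
sample_size = estimate_size(24.5, slope, intercept)  # 24.5 mm is also hypothetical
print(f"Apparent size of sample band: {sample_size:.0f} bp")
```

Because the ladder consists of linear fragments, a curve like this reports the size of a linear molecule that would migrate the same distance. A supercoiled plasmid runs further than its linear form, which is consistent with the apparent size being lower than the literature value.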

References

  1. Ninfa, Alexander J. Fundamental Laboratory Approaches for Biochemistry and Biotechnology, 2nd Edition. John Wiley & Sons, Inc, 2017.
  2. Berg, J. M., Tymoczko, J. L., Stryer, L., & Gumport, R. I. (2006). Biochemistry (8th ed.). Basingstoke: W.H. Freeman & Co.
  3. Calibration Curve for Plasmid DNA Purification from E. coli

DNA: The Silver Bullet For Crime Scene Investigation?

Many believe that DNA is the silver bullet in a crime scene investigation, a view with which I strongly disagree. To quote Chris Alpen, ‘DNA can never replace a thoughtful, creative detective with the right resources’, and ‘technology is ultimately a system run by humans where mistakes can and will be made, regardless of how advanced it is.’

DNA is powerful in its ability to identify a person; its discovery can determine the innocence or guilt of a suspect. Trace amounts of DNA are all it takes to identify the perpetrators and victims of a crime. DNA is also used in Disaster Victim Identification and kinship testing, and is important in expediting the identification of people who are decomposed beyond recognition, providing a crucial lead in investigations or closure in some instances. DNA profiling helps to uphold criminal and civil justice, but the potential for its misuse or abuse undermines its credibility, confidentiality, and validity.

Research shows that DNA evidence has sometimes been misleading, the main problems being its relevance, validity, or usefulness in proving a vital point at trial. Moreover, there have been many cases where DNA was contaminated and wrongly used to prove that someone was guilty despite their innocence. DNA can be retrieved from a crime scene to establish a person’s presence, but on its own it is inadequate grounds for a conviction and cannot be used to prove that the person is the criminal. DNA is merely one of the many types of evidence that can be collected from a crime scene; investigators must be careful not to overlook alternative clues while scouring for DNA.

Absence of evidence is never evidence of absence, as a suspect’s DNA may not be found at the crime scene. DNA can be found in hair, nail clippings, skin cells and more. Factors affecting the transfer of epithelial cells include substrate, duration and nature of contact, shedder status, activity, and environmental conditions.

Contamination during DNA analysis can be prevented with precautions and proper technique. DNA is transferred when cells travel among individuals and objects, which inevitably occurs when people touch or speak. Although advanced technology has enabled the recovery of minute traces of DNA, it is difficult to determine their origin, and such samples often contain material from multiple individuals, which is challenging to tease apart. Furthermore, a 1996 study revealed that DNA found on a piece of clothing was transferred to everything else in the washing machine. The major DNA profile on an item does not always correspond to the person who last touched it, as people shed different amounts of DNA.

Undeniably, DNA is valuable in solving crimes, but it is not the silver bullet. Other pieces of the jigsaw puzzle that are equally crucial include the testimony of eyewitnesses, confessions from accused persons, good detective work and expert processing of the crime scene. Although rare, errors and contamination can occur and must be acknowledged. Investigators should not be overly dependent on DNA, as this can bring about a miscarriage of justice.
