Thursday, August 22, 2013

Extract all possible methionine residues to the end from a protein sequence

I am looking to extract all methionine residues to the end from a sequence.

For example, in the sequence below:

    MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG

Original nucleotide sequence for this amino acid sequence:

    atgtttgaaatcgaagaacatatgaaggattcacaggtggaatacataattggccttcataatatcccattattgaatgcaactatttcagtgaagtgcacaggatttcaaagaactatgaatatgcaaggttgtgctaataaatttatgcaaagacattatgagaatcccctgacgggg

I want to extract from the sequence every stretch that runs from an M residue to the end, e.g. obtain the following:

- MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG
- MKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG
- MNMQGCANKFMQRHYENPLTG
- MQGCANKFMQRHYENPLTG
- MQRHYENPLTG

With the data I am working with, there are cases where there are a lot more M residues in the sequence.

The script I currently have is below (it translates the genomic data first and then works with the amino acid sequences). It does the first two extractions but nothing further.

I have tried to repeat the same scan method after the second scan (see the commented part in the script below), but this just gives me an error: "private method scan called for #<Array:0x7f80884c84b0>" (a NoMethodError).

I understand I need to make a loop of some kind and have tried, but all in vain. I have also tried matching but haven't been able to do so. I think that you cannot match overlapping characters with a single match method, but then again I'm only a beginner...

So here is the script I'm using:

    #!/usr/bin/env ruby
    require 'bio'

    def extract_open_reading_frames(input)
      file_output = File.new("./output.aa", "w")
      input.each_entry do |entry|
        i = 1
        entry.naseq.translate(1).scan(/M\w*/i) do |orf1|
          file_output.puts ">#{entry.definition.to_s} 5\'3\' frame 1:#{i}\n#{orf1}"
          i = i + 1
          orf1.scan(/.(M\w*)/i) do |orf2|
            file_output.puts ">#{entry.definition.to_s} 5\'3\' frame 1:#{i}\n#{orf2}"
            i = i + 1
            # orf2.scan(/.(M\w*)/i) do |orf3|
            #   file_output.puts ">#{entry.definition.to_s} 5\'3\' frame 1:#{i}\n#{orf3}"
            #   i = i + 1
            # end
          end
        end
      end
      file_output.close
    end

    biofastafile = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
    extract_open_reading_frames(biofastafile)

The script has to be in Ruby since this is part of a much longer script that is in Ruby...

Many thanks,
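
For illustration, here is a minimal sketch of one way the overlapping extraction described above might be done. It is not taken from the original script. It assumes plain Ruby strings and leaves out BioRuby so that it runs on its own, and the helper name `methionine_suffixes` is hypothetical. The idea is to wrap the pattern in a zero-width lookahead so that `scan` does not consume the text it matches and can therefore find overlapping matches. Note also that when a pattern contains a capture group, `scan` hands back arrays rather than strings, which is probably why calling `scan` again on a block variable failed with a complaint about an Array.

```ruby
# Hypothetical helper (not part of the original script): returns every
# substring that starts at an M and runs to the end of the word.
def methionine_suffixes(sequence)
  # The lookahead (?=...) matches without consuming any characters, so
  # scanning resumes one position later and overlapping matches are found.
  # With a single capture group, scan returns one-element arrays,
  # so flatten turns the result back into a flat list of strings.
  sequence.scan(/(?=(M\w*))/i).flatten
end

# Example sequence taken from the question above.
protein = 'MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG'

# Print each suffix with a running index, loosely mirroring the
# FASTA-style header used in the script above.
methionine_suffixes(protein).each_with_index do |orf, i|
  puts ">frame 1:#{i + 1}"
  puts orf
end
```

For the example sequence this should yield the five sequences listed in the question. Because `\w*` stops at any non-word character, a stop codon written as `*` in a translated sequence would end each match there, which is the same behaviour as the `M\w*` pattern in the script.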
