substring count in string
schnidda
Hi all,
I am facing a small challenge checking strings with Igor.
I tried to build a function to count all occurrences of a substring (motif) within a string (sequence) using stringmatch.
My string is about 8000 characters long, and I have more than one substring (actually hundreds) to find.
Matches may also wrap from the end of the string back to its start, so I append the beginning of the string to its end; that is why I create the "search_string".
One run takes approximately 30 s. Do you have any ideas to speed this up?
function find_motif(sequence, motif)
	string sequence, motif
	// append the start of the sequence so matches can wrap around the end
	string search_string = sequence+sequence[0,strlen(motif)-1]
	variable motif_count = 0
	variable i = 0
	// test every start position, including the last character of the sequence
	for(i=0; i<strlen(sequence); i+=1)
		if(stringmatch(search_string[i,(i+strlen(motif)-1)], motif))
			//add warning if motif in last part
			if(i>=(strlen(sequence)-strlen(motif)+1))
				print "motif found in end_to_start region"
			endif
			motif_count += 1
		endif
	endfor
	return motif_count
end
Thanks for your help!
Best,
Fabian
Not tested, but precalculating the string lengths outside the loop should help.
April 11, 2019 at 02:40 am
StrSearch is probably also much faster than manually stepping through the whole string one character at a time.
If you want to make it complicated you could even multithread several StrSearches with different starting points.
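For illustration, here is a minimal, untested sketch of how the StrSearch suggestion could look. The function name find_motif_strsearch and the local variable names are made up for this example. It assumes that overlapping matches should be counted, as in the original loop, and that the motif contains no stringmatch wildcards such as "*". Option 2 of strsearch ignores case, which mirrors the case-insensitive default of stringmatch.

```igor
// Sketch only: counts overlapping, wrap-around matches using strsearch.
// find_motif_strsearch is a hypothetical name, not from the original post.
Function find_motif_strsearch(sequence, motif)
	String sequence, motif

	Variable seqLen = strlen(sequence)
	Variable motifLen = strlen(motif)
	if (motifLen == 0 || motifLen > seqLen)
		return 0
	endif

	// Append the first motifLen-1 characters so matches can wrap around the end.
	String search_string = sequence
	if (motifLen > 1)
		search_string += sequence[0, motifLen - 2]
	endif

	Variable motif_count = 0
	Variable pos = 0
	do
		// strsearch jumps straight to the next match at or after pos (-1 if none).
		// Option 2 = ignore case, like stringmatch.
		pos = strsearch(search_string, motif, pos, 2)
		if (pos < 0)
			break
		endif
		// Same warning as in the original function.
		if (pos >= seqLen - motifLen + 1)
			print "motif found in end_to_start region"
		endif
		motif_count += 1
		pos += 1	// advance by one so overlapping matches are still counted
	while (pos < seqLen)

	return motif_count
End
```

The loop runs once per match rather than once per character, and no substring is copied on each step, which is the likely source of the speed-up. If overlapping matches should not be counted, advancing pos by motifLen instead of 1 would be the obvious change.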
April 11, 2019 at 04:12 am
Thanks to both of you!
Computing the string lengths outside the loop did not improve the speed.
StrSearch had a drastic impact: a speed-up by a factor of 100 :-)
April 11, 2019 at 09:28 am
If you're searching through DNA/RNA sequences, you might look at the FindSequence operation and the associated demo experiment.
April 11, 2019 at 10:40 am