substring count in string
schnidda
Hi all,
I am facing a small challenge checking strings with Igor.
I tried to build a function to count all occurrences of a substring (motif) within a string (sequence) using stringmatch.
My string is about 8000 characters long, and I have more than one substring (actually hundreds) to find.
I also have to catch matches that wrap around from the end of the string to its beginning, which is why I append the start of the string to its end and create the "search_string".
One run takes approximately 30 s. Do you have any ideas to speed this up?
function find_motif(sequence, motif)
	string sequence, motif

	string search_string = sequence+sequence[0,strlen(motif)-1]
	variable motif_count = 0
	variable i = 0

	for(i=0; i<(strlen(sequence)-1); i+=1)
		if(stringmatch(search_string[i,(i+strlen(motif)-1)], motif))
			//add warning if motif in last part
			if(i>=(strlen(sequence)-strlen(motif)+1))
				print "motif found in end_to_start region"
			endif
			motif_count += 1
		endif
	endfor

	return motif_count
end
Thanks for your help!
Best,
Fabian
Not tested, but precalculating the string lengths outside the loop should help.
April 11, 2019 at 02:40 am - Permalink
StrSearch is probably also much faster than manually stepping through the whole string one character at a time.
If you want to go further, you could even multithread several StrSearch calls with different starting points.
April 11, 2019 at 04:12 am - Permalink
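For readers who want to see what the StrSearch suggestion might look like, here is a minimal, untested sketch. The function name CountMotifStrSearch and the local variable names are made up for illustration and are not from the original post. It assumes overlapping matches should be counted, as in the original loop. Note that strsearch looks for a literal substring and is case-sensitive by default (passing 2 as the optional fourth parameter should make it ignore case), whereas stringmatch does wildcard matching, so the two versions may differ if a motif contains characters such as "*".

```igor
// Sketch only: counts occurrences of motif in sequence, including
// matches that wrap around from the end of the string to the start.
Function CountMotifStrSearch(sequence, motif)
	String sequence, motif

	Variable seqLen = strlen(sequence)
	Variable motifLen = strlen(motif)
	if (motifLen == 0 || motifLen > seqLen)
		return 0
	endif

	// Append the first motifLen-1 characters so that a match may
	// start near the end of the sequence and continue at the start.
	String searchString = sequence
	if (motifLen > 1)
		searchString += sequence[0, motifLen - 2]
	endif
	Variable searchLen = strlen(searchString)

	Variable count = 0
	Variable pos = 0
	do
		// Jump directly to the next occurrence at or after pos.
		pos = strsearch(searchString, motif, pos)
		if (pos < 0)
			break		// no further occurrences
		endif
		count += 1
		pos += 1		// advance by one so overlapping matches are counted
	while (pos < searchLen)

	return count
End
```

The speed-up comes from letting strsearch skip straight to the next hit instead of extracting and comparing a substring at every character position. With hundreds of motifs, the function would simply be called once per motif.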
Thanks to both of you!
Calculating the string lengths outside the loop did not improve the speed.
StrSearch had a drastic impact: a speed-up by a factor of 100 :-)
April 11, 2019 at 09:28 am - Permalink
If you're searching through DNA/RNA sequences, you might look at the FindSequence operation and the associated demo experiment.
April 11, 2019 at 10:40 am - Permalink