Dealing with web addresses that use a delimiter other than ?

I've been removing tracking IDs from website addresses that users post to my message boards and classifieds, which has admittedly gotten WAY more complicated than I intended. But I've recently run across a new one, so I'm curious how you guys and gals would suggest dealing with it.

In this example, the link looked like:

https://example.com/foo/bar|pcrid|391022977133|pkw||pmt||pdv|m|slid||product||pgrid|78378217177|ptaid||&pgrid=78378217177&ptaid=&source=WFP2019-DD-NATL-GD-US-BCON&subsource=78378217177---391022977133&refcode=WFP2019-DD-NATL-GD-US-BCON&refcode2=78378217177---391022977133&utm_source=Google&utm_campaign=WFP2019-DD-NATL-GD-US-BCON&utm_term=-391022977133&utm_medium=Display&gclid=EAIaIQobChMIpMLsjry95QIVQqFRCh3slA_eEAEYASAAEgLLc_D_BwE

I use Perl's URI::Find to find links in the text and convert each one to a ... tag, but it doesn't recognize the | delimiter, so the link stops at the first | and I end up with:

https://example.com/foo/bar|pcrid|391022977133|pkw||pmt||pdv|m|slid||product||pgrid|78378217177|ptaid||&pgrid=78378217177&ptaid=&source=WFP2019-DD-NATL-GD-US-BCON&subsource=78378217177---391022977133&refcode=WFP2019-DD-NATL-GD-US-BCON&refcode2=78378217177---391022977133&utm_source=Google&utm_campaign=WFP2019-DD-NATL-GD-US-BCON&utm_term=-391022977133&utm_medium=Display&gclid=EAIaIQobChMIpMLsjry95QIVQqFRCh3slA_eEAEYASAAEgLLc_D_BwE

And since the rest of that isn't recognized as part of the link, my system doesn't remove any of those parameters, including the parts that are actually delimited by & (and it would usually remove all of them). I'm kind of at a loss on how to handle this one.
I could use a regex to find http, followed by anything that's not a space, until it gets to a |, and then remove everything after and including that |. That's a bit dangerous, though, since someone could realistically use a | in a parameter value that I wouldn't want to remove. I guess that the regex would look something like:

$text =~ s#\b(https?://[^\s])\|[^\s]*\b#$1#i;

What do you all think?

I say: Crikey.

[^\s] == \S unless there's something I am overlooking. Did you mean [^\s]+ (i.e. \S+) ? That was my own question mark, but in fact you'd need to say \S+?\| in order to stop as soon as possible, i.e. before the first | character if there's more than one of them.

Seems like you'd want something like [\w/.,~-]+ for the path part, to constrain it to things that can reasonably occur. If you could be certain that the | is the only no-no that will ever show up, the pattern would be more like [^\s?|]+([?|]blahblah)? where ? and | don't need to be escaped inside square brackets, i.e. in a character class (but it does no harm if you do escape them).

I suppose you have already considered the possibility of telling your code to disregard (don't make links from) URLs that don't follow the rules :( Your forum members really are an unruly bunch, aren't they?

Disclaimer: I am about to disappear for at least 24 hours, possibly longer (thank you very much, PG&E), so if I said something hopelessly misleading you will have to remain misled.

Yikes! 24 hours without lucy24! (get those folks out there to solve the power problems!)
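
For anyone landing here later, the two regex suggestions above can be put side by side in a small Perl sketch. This is only an illustration, not the original poster's code: the sub names strip_pipe_nongreedy and strip_pipe_path are made up, the host pattern [^\s/?|]+ is an assumption, and it is assumed the cleanup runs on the raw text before URI::Find sees it.

```perl
use strict;
use warnings;

# Hypothetical helper: the \S+? variant from the thread. It cuts at the first
# "|" anywhere in the URL, so it also truncates a "|" inside a query value.
sub strip_pipe_nongreedy {
    my ($text) = @_;
    $text =~ s#\b(https?://\S+?)\|\S*#$1#gi;
    return $text;
}

# Hypothetical helper: host (assumed pattern) followed by the constrained
# path class [\w/.,~-] from the thread. A "|" is only treated as junk when it
# directly follows the host or path, so anything after a "?" is left alone.
sub strip_pipe_path {
    my ($text) = @_;
    $text =~ s{\b(https?://[^\s/?|]+[\w/.,~-]*)\|\S*}{$1}gi;
    return $text;
}

my $tracked = 'See https://example.com/foo/bar|pcrid|391022977133|pkw||&utm_source=Google&gclid=abc for details';
my $query   = 'https://example.com/a?x=1|2';

print strip_pipe_nongreedy($tracked), "\n";  # See https://example.com/foo/bar for details
print strip_pipe_path($tracked), "\n";       # See https://example.com/foo/bar for details
print strip_pipe_nongreedy($query), "\n";    # https://example.com/a?x=1
print strip_pipe_path($query), "\n";         # https://example.com/a?x=1|2
```

The last two lines show the trade-off raised in the question: the non-greedy version chops a legitimate | out of a parameter value, while the path-constrained version only fires when the | appears before any ?. Whether dropping the whole tail (including the &-delimited part) is acceptable depends on the board; it matches what the original poster said the system would usually do with those parameters anyway.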