Find and Replace

Postby LeeNukes » Wed Apr 21, 2010 2:49 pm

Hello,

I need to go through two large files: one has 3307339 words and the other has 3998911 words.

Against these two files I need to find and replace occurrences of terms; at the moment the terms are in two columns in a spreadsheet.

So where there is an occurrence of A1, for example, replace it with B1, and so on.

Now I understand that I will likely need to export these, maybe to a CSV or tab-separated file, and then use regular expressions to do the find and replace, like this:

http://www.regular-expressions.info/perl.html

What I'm wondering is how to take each entry as a variable.

The two files I'm checking against are parallel texts: one is in English and one is in Spanish, but they are sentence aligned (this shouldn't matter, it's just background).

The content I need to replace looks like this:

Code:
intermediate   intermedio
Linux   Linux
mount   montaje


So for every occurrence of Linux I want to make sure it is replaced with Linux (probably a bad example), intermediate should be replaced with intermedio, and so on.
LeeNukes
LXF regular
 
Posts: 954
Joined: Sun Jun 21, 2009 8:11 pm
Location: At the bar

Postby nelz » Wed Apr 21, 2010 3:13 pm

Create a file containing all the replacement rules and then use sed to do the replacements.
Code:
sed --file=rules english.csv >spanish.csv

where rules contains
Code:
s/intermediate/intermedio/g
s/friend/amigo/g
s/1sf (troll)/gringo/g
...
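
One caveat with rules like these: a plain s/mount/montaje/g also matches inside longer words. A minimal sketch of the difference, assuming GNU sed (the \b word-boundary escape is a GNU extension, not POSIX) and a made-up sample sentence:

```shell
# GNU sed assumed: \b matches at a word boundary (GNU extension).
sample='mount the mountain'

# Without boundaries the rule also rewrites the start of "mountain".
plain=$(printf '%s\n' "$sample" | sed 's/mount/montaje/g')

# With boundaries only the whole word "mount" is replaced.
bounded=$(printf '%s\n' "$sample" | sed 's/\bmount\b/montaje/g')

echo "$plain"
echo "$bounded"
```

Whether whole-word matching is wanted depends on the glossary, so treat this as an option rather than a requirement.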
"Insanity: doing the same thing over and over again and expecting different results." (Albert Einstein)
nelz
Site admin
 
Posts: 8553
Joined: Mon Apr 04, 2005 11:52 am
Location: Warrington, UK

Postby Bazza » Wed Apr 21, 2010 3:21 pm

Hi nelz...

> "s/1sf (troll)/gringo/g"

ROTFL...

You're on form boyo... ;oD

Apologies for butting in LeeNukes but that caught my humour...
73...

Bazza, G0LCU...

Team AMIGA...
Bazza
LXF regular
 
Posts: 1482
Joined: Sat Mar 21, 2009 11:16 am
Location: Loughborough

Postby LeeNukes » Wed Apr 21, 2010 3:30 pm

There are hundreds of replacements that need doing. Are you suggesting I put s/<original word>/<replacement word> for each occurrence of a word?

Postby nelz » Wed Apr 21, 2010 4:32 pm

How else would it know which words to replace or what to replace them with?

Unless you want to send it through Babelfish :)

Postby LeeNukes » Wed Apr 21, 2010 6:05 pm

It's for a translation system, but it's to guarantee certain translations.

Postby nelz » Wed Apr 21, 2010 6:11 pm

So however you did it, you'd need a translation table of some sort?

Postby LeeNukes » Wed Apr 21, 2010 6:14 pm

I would have thought it would be possible to read in the content between the commas of a CSV and use that for the find and replace.

So, assume I'd set the first entry as variable1 and the last part as variable2 then do:

s/$variable1/$variable2/g

I just don't know how it's done.

Postby nelz » Wed Apr 21, 2010 9:48 pm

You could, but you'd need to define the variable pairs somewhere. If you have them in a file, sed can turn that file into a suitable sed script.
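
As a rough sketch of that idea: assume the spreadsheet is exported as tab-separated text, one pair per line. The file names here (pairs.tsv, rules) are made up for illustration, and the sketch assumes the terms contain nothing special to sed, such as /, &, backslashes or regex metacharacters. Something like this should then work:

```shell
# Hypothetical file names throughout; a temp dir keeps the sketch self-contained.
workdir=$(mktemp -d)

# pairs.tsv stands in for the spreadsheet exported as tab-separated text.
printf 'intermediate\tintermedio\nmount\tmontaje\n' > "$workdir/pairs.tsv"

# Rewrite each "source<TAB>target" line as "s/source/target/g".
# Assumes the terms contain no '/', '&', '\' or regex metacharacters.
tab=$(printf '\t')
sed "s|^\([^$tab]*\)$tab\(.*\)\$|s/\1/\2/g|" "$workdir/pairs.tsv" > "$workdir/rules"

# Apply the generated rules to a sample line of text.
result=$(printf 'mount the intermediate disk\n' | sed -f "$workdir/rules")
echo "$result"
```

sed applies the rules in order, so the output of one rule can be matched by a later one. With hundreds of pairs it is worth checking the list for such chains.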