I've got a question about initcap.
Is it possible to write an INITCAP statement that skips words shorter than 4 characters?
At the moment I have to change the words with fewer than 4 characters back to lowercase after running INITCAP.
So I thought maybe there is a way to create a function/procedure/trigger that just skips those words. The words occur in location names like "Son En Breugel", where the "En" in the middle must become lowercase.
The first word of the string should keep its capital letter; only the short words that follow a space (in the middle of the string) should be lowercase.
I've started writing a procedure, but it needs some fine-tuning:
* All words that should not be capitalized by INITCAP are changed back
* INITCAP uses the xDutch setting (NLS_SORT=xDutch)
-- I still need to find a way to change 'S into 's. I think I've deleted that record with this script?
Can somebody assist me with this?
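create or replace PROCEDURE Location_Name_Routine IS
BEGIN
-- Oracle's LIKE only supports the % and _ wildcards; [^...] is SQL Server
-- syntax, so this pattern matches only the literal text '[^0-9a-zA-Z]'.
DELETE
FROM Location
WHERE Name LIKE '%[^0-9a-zA-Z]%';
UPDATE Location
set Name = nls_initcap(Name, 'NLS_SORT=xDutch');
-- These patterns have no word boundary, so each one also matches the start of
-- a longer word (' De' turns ' Delft' into ' delft'), and ' De' already turns
-- ' Den' into ' den' before the ' Den' statement runs.
UPDATE Location
SET Name = REGEXP_REPLACE(Name,' En',' en');
UPDATE Location
SET Name = REGEXP_REPLACE(Name,' Van',' van');
UPDATE Location
SET Name = REGEXP_REPLACE(Name,' De',' de');
UPDATE Location
SET Name = REGEXP_REPLACE(Name,' Den',' den');
UPDATE Location
SET Name = REGEXP_REPLACE(Name,' Over',' over');
UPDATE Location
SET Name = REGEXP_REPLACE(Name,' Aan',' aan');
UPDATE Location
SET Name = REGEXP_REPLACE(Name,' Bij',' bij');
END;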
There may not be a simple answer to the underlying question. I assume you are trying to capitalize Dutch addresses properly, and that this question is related to this other question from yesterday.
Combining the questions, there are at least three special cases so far:
'S GRAVENHAGE => 's Gravenhage
IJSLAND => IJsland
SON EN BREUGEL => Son en Breugel
INITCAP fails on all three, and even NLS_INITCAP('...', 'NLS_SORT=xDutch'), which treats "IJ" as a single letter, still gets the other two wrong. Before you start coding you should collect all the requirements. Are these the only rules for Dutch capitalization, or are there many more?
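Running both functions side by side over the three examples shows which cases each one gets wrong. A small sketch; sys.odcivarchar2list is only used here to supply literal test values:

```sql
-- Compare plain INITCAP with the Dutch-aware NLS_INITCAP on the three examples.
-- The collection only provides literal input values; no table is needed.
select column_value                                 as raw_name,
       initcap(column_value)                        as plain_initcap,
       nls_initcap(column_value, 'NLS_SORT=xDutch') as dutch_initcap
from   table(sys.odcivarchar2list('''S GRAVENHAGE', 'IJSLAND', 'SON EN BREUGEL'));
```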
The answers posted so far may each solve one specific exception, but chances are you cannot simply combine regular expressions to solve them all. You may want to take a more top-down approach here.
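As one sketch of making a rule explicit instead of chaining regular expressions, the particle rule from the question can be written as a function that works on whole space-separated words, so a particle only matches a complete word and never the start of a longer one. The function name dutch_place_initcap and the particle list are assumptions for illustration, and this is not a complete solution: names like 's-Gravenhage or Bergen (NH.) still need an exception list.

```sql
-- Hypothetical helper, not a built-in: NLS_INITCAP first, then lower-case
-- listed particles when they appear as a whole word after the first word.
create or replace function dutch_place_initcap(p_name in varchar2)
  return varchar2
is
  -- Assumed particle list: the words from the question plus 'op' and 'het'.
  c_particles constant varchar2(200) := '|en|van|de|den|over|aan|bij|op|het|';
  v_in    varchar2(4000) := nls_initcap(p_name, 'NLS_SORT=xDutch');
  v_out   varchar2(4000);
  v_word  varchar2(4000);
  v_pos   pls_integer := 1;
  v_space pls_integer;
begin
  if v_in is null then
    return null;  -- avoids looping forever on a NULL input
  end if;
  loop
    -- Cut out the next space-separated word.
    v_space := instr(v_in, ' ', v_pos);
    if v_space = 0 then
      v_word := substr(v_in, v_pos);
    else
      v_word := substr(v_in, v_pos, v_space - v_pos);
    end if;
    -- v_pos > 1 means this is not the first word, so particles are lowered.
    if v_pos > 1 and instr(c_particles, '|' || lower(v_word) || '|') > 0 then
      v_word := lower(v_word);
    end if;
    v_out := v_out || v_word;
    exit when v_space = 0;
    v_out := v_out || ' ';
    v_pos := v_space + 1;
  end loop;
  return v_out;
end;
/
```

Used as SET Name = dutch_place_initcap(Name), this would replace the chain of REGEXP_REPLACE updates in the question. Matching an explicit particle list rather than every word shorter than 4 characters is a deliberate choice here, since a pure length rule could also lowercase short words that are genuine parts of a name.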
UPDATE
Based on wolφi's idea, it is possible to brute-force the problem by using all existing names. NLS_INITCAP alone works about 95% of the time. Using the 431 names from the spreadsheet at this link, it is possible to build a list of all 25 exceptional cases.
Run this statement once to build a DECODE expression to handle all non-trivial cases:
--Build decode for UPDATE.
select
--Start the decode
'decode(upper(name),'||
--List all the exceptions. Single quotes are a mess, no way around it.
listagg(
--Upper case version to match
''''||upper(replace(column_value, '''', ''''''))||
--Pre-defined init-capped version
''','''||replace(column_value, '''', '''''')||''''
, ','||chr(10)
)
within group (order by column_value)
||
--Default to NLS_INITCAP
',nls_initcap(name, ''NLS_SORT=xDutch''))'
from table(sys.odcivarchar2list('Bellingwedde','Menterwolde','Oldambt','Pekela','Stadskanaal','Veendam','Vlagtwedde','Appingedam','Delfzijl','Loppersum','Bedum','Ten Boer','Eemsmond','Groningen','Grootegast','Haren','Hoogezand-Sappemeer','Leek','De Marne','Marum','Slochteren','Winsum','Zuidhorn','Achtkarspelen','Ameland','het Bildt','Boarnsterhim','Dantumadiel','Dongeradeel','Ferwerderadiel','Franekeradeel','Harlingen','Kollumerland en Nieuwkruisland','Leeuwarden','Leeuwarderadeel','Littenseradiel','Menaldumadeel','Schiermonnikoog','Terschelling','Tytsjerksteradiel','Vlieland','Bolsward','Gaasterlân-Sleat','Lemsterland','Nijefurd','Sneek','Wûnseradiel','Wymbritseradiel','Heerenveen','Ooststellingwerf','Opsterland','Skarsterlân','Smallingerland','Weststellingwerf','Aa en Hunze','Assen','Midden-Drenthe','Noordenveld','Tynaarlo','Borger-Odoorn','Coevorden','Emmen','Hoogeveen','Meppel','Westerveld','De Wolden','Dalfsen','Hardenberg','Kampen','Ommen','Staphorst','Steenwijkerland','Zwartewaterland','Zwolle','Deventer','Olst-Wijhe','Raalte','Almelo','Borne','Dinkelland','Enschede','Haaksbergen','Hellendoorn','Hengelo','Hof van Twente','Losser','Oldenzaal','Rijssen-Holten','Tubbergen','Twenterand','Wierden','Apeldoorn','Barneveld','Ede','Elburg','Epe','Ermelo','Harderwijk','Hattem','Heerde','Nijkerk','Nunspeet','Oldebroek','Putten','Scherpenzeel','Voorst','Wageningen','Buren','Culemborg','Geldermalsen','Lingewaal','Maasdriel','Neder-Betuwe','Neerijnen','Tiel','West Maas en Waal','Zaltbommel','Aalten','Berkelland','Bronckhorst','Brummen','Doetinchem','Lochem','Montferland','Oost Gelre','Oude IJsselstreek','Winterswijk','Zutphen','Arnhem','Beuningen','Doesburg','Druten','Duiven','Groesbeek','Heumen','Lingewaard','Millingen aan de Rijn','Nijmegen','Overbetuwe','Renkum','Rheden','Rijnwaarden','Rozendaal','Ubbergen','Westervoort','Wijchen','Zevenaar','Almere','Dronten','Lelystad','Noordoostpolder','Urk','Zeewolde','Abcoude','Amersfoort','Baarn',
'De Bilt','Breukelen','Bunnik','Bunschoten','Eemnes','Houten','IJsselstein','Leusden','Loenen','Lopik','Maarssen','Montfoort','Nieuwegein','Oudewater','Renswoude','Rhenen','De Ronde Venen','Soest','Utrecht','Utrechtse Heuvelrug','Veenendaal','Vianen','Wijk bij Duurstede','Woerden','Woudenberg','Zeist','Andijk','Anna Paulowna','Drechterland','Enkhuizen','Harenkarspel','Den Helder','Hoorn','Koggenland','Medemblik','Niedorp','Opmeer','Schagen','Stede Broec','Texel','Wervershoof','Wieringen','Wieringermeer','Zijpe','Alkmaar','Bergen (NH.)','Heerhugowaard','Heiloo','Langedijk','Schermer','Beverwijk','Castricum','Heemskerk','Uitgeest','Velsen','Bloemendaal','Haarlem','Haarlemmerliede en Spaarnwoude','Heemstede','Zandvoort','Wormerland','Zaanstad','Aalsmeer','Amstelveen','Amsterdam','Beemster','Diemen','Edam-Volendam','Graft-De Rijp','Haarlemmermeer','Landsmeer','Oostzaan','Ouder-Amstel','Purmerend','Uithoorn','Waterland','Zeevang','Blaricum','Bussum','Hilversum','Huizen','Laren','Muiden','Naarden','Weesp','Wijdemeren','Hillegom','Kaag en Braassem','Katwijk','Leiden','Leiderdorp','Lisse','Noordwijk','Noordwijkerhout','Oegstgeest','Teylingen','Voorschoten','Zoeterwoude','''s-Gravenhage','Leidschendam-Voorburg','Pijnacker-Nootdorp','Rijswijk','Wassenaar','Zoetermeer','Delft','Midden-Delfland','Westland','Alphen aan den Rijn','Bergambacht','Bodegraven','Boskoop','Gouda','Nieuwkoop','Reeuwijk','Rijnwoude','Schoonhoven','Vlist','Waddinxveen','Albrandswaard','Barendrecht','Bernisse','Binnenmaas','Brielle','Capelle aan den IJssel','Cromstrijen','Dirksland','Goedereede','Hellevoetsluis','Korendijk',
'Krimpen aan den IJssel','Lansingerland','Maassluis','Middelharnis','Nederlek','Oostflakkee','Oud-Beijerland','Ouderkerk','Ridderkerk','Rotterdam','Rozenburg','Schiedam','Spijkenisse','Strijen','Vlaardingen','Westvoorne','Zuidplas','Alblasserdam','Dordrecht','Giessenlanden','Gorinchem','Graafstroom','Hardinxveld-Giessendam','Hendrik-Ido-Ambacht','Leerdam','Liesveld','Nieuw-Lekkerland','Papendrecht','Sliedrecht','Zederik','Zwijndrecht','Hulst','Sluis','Terneuzen','Borsele','Goes','Kapelle','Middelburg','Noord-Beveland','Reimerswaal','Schouwen-Duiveland','Tholen','Veere','Vlissingen','Bergen op Zoom','Breda','Drimmelen','Etten-Leur','Geertruidenberg','Halderberge','Moerdijk','Oosterhout','Roosendaal','Rucphen','Steenbergen','Woensdrecht','Zundert','Aalburg','Alphen-Chaam','Baarle-Nassau','Dongen','Gilze en Rijen','Goirle','Hilvarenbeek','Loon op Zand','Oisterwijk','Tilburg','Waalwijk','Werkendam','Woudrichem','Bernheze','Boekel','Boxmeer','Boxtel','Cuijk','Grave','Haaren','''s-Hertogenbosch','Heusden','Landerd','Lith','Maasdonk','Mill en Sint Hubert','Oss','Schijndel','Sint Anthonis','Sint-Michielsgestel','Sint-Oedenrode','Uden','Veghel','Vught','Asten','Bergeijk','Best','Bladel','Cranendonck','Deurne','Eersel','Eindhoven','Geldrop-Mierlo','Gemert-Bakel','Heeze-Leende','Helmond','Laarbeek','Nuenen, Gerwen en Nederwetten','Oirschot','Reusel-De Mierden','Someren','Son en Breugel','Valkenswaard','Veldhoven','Waalre','Beesel','Bergen (L.)','Gennep','Horst aan de Maas','Mook en Middelaar','Peel en Maas','Venlo','Venray','Echt-Susteren','Leudal','Maasgouw','Nederweert','Roerdalen','Roermond','Weert','Beek','Brunssum','Eijsden','Gulpen-Wittem','Heerlen','Kerkrade','Landgraaf','Maastricht','Margraten','Meerssen','Nuth','Onderbanken','Schinnen','Simpelveld','Sittard-Geleen','Stein','Vaals','Valkenburg aan de Geul','Voerendaal'))
where column_value <> nls_initcap(column_value, 'NLS_SORT=xDutch');
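To see what the generator does with the awkward quotes, this sketch applies the same two expressions to a single name. The q'[...]' literal is Oracle's alternative quoting syntax, used here only to keep the input readable; the output still contains doubled quotes because it is meant to be pasted back into SQL.

```sql
-- Apply the generator's escaping to one name that contains a single quote.
-- match_literal is the UPPER() key, result_literal the init-capped value.
select ''''||upper(replace(q'['s-Gravenhage]', '''', ''''''))||'''' as match_literal,
       ''''||replace(q'['s-Gravenhage]', '''', '''''')||''''        as result_literal
from   dual;
```

The two values are the literals that make up the 's-Gravenhage entry of the generated DECODE.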
Use the result from that statement to build an UPDATE like this:
--Update names to properly init-capped name, as defined by:
--http://epp.eurostat.ec.europa.eu/portal/page/portal/nuts_nomenclature/local_administrative_units
update location
set name =
decode(upper(name),'''S-GRAVENHAGE','''s-Gravenhage',
'''S-HERTOGENBOSCH','''s-Hertogenbosch',
'AA EN HUNZE','Aa en Hunze',
'ALPHEN AAN DEN RIJN','Alphen aan den Rijn',
'BERGEN (NH.)','Bergen (NH.)',
'BERGEN OP ZOOM','Bergen op Zoom',
'CAPELLE AAN DEN IJSSEL','Capelle aan den IJssel',
'GILZE EN RIJEN','Gilze en Rijen',
'HAARLEMMERLIEDE EN SPAARNWOUDE','Haarlemmerliede en Spaarnwoude',
'HOF VAN TWENTE','Hof van Twente',
'HORST AAN DE MAAS','Horst aan de Maas',
'KAAG EN BRAASSEM','Kaag en Braassem',
'KOLLUMERLAND EN NIEUWKRUISLAND','Kollumerland en Nieuwkruisland',
'KRIMPEN AAN DEN IJSSEL','Krimpen aan den IJssel',
'LOON OP ZAND','Loon op Zand',
'MILL EN SINT HUBERT','Mill en Sint Hubert',
'MILLINGEN AAN DE RIJN','Millingen aan de Rijn',
'MOOK EN MIDDELAAR','Mook en Middelaar',
'NUENEN, GERWEN EN NEDERWETTEN','Nuenen, Gerwen en Nederwetten',
'PEEL EN MAAS','Peel en Maas',
'SON EN BREUGEL','Son en Breugel',
'VALKENBURG AAN DE GEUL','Valkenburg aan de Geul',
'WEST MAAS EN WAAL','West Maas en Waal',
'WIJK BIJ DUURSTEDE','Wijk bij Duurstede',
'HET BILDT','het Bildt',
nls_initcap(name, 'NLS_SORT=xDutch'));
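Before running the UPDATE it may be worth previewing which rows would change. A minimal sketch, assuming the same LOCATION table and NAME column; only two of the DECODE entries are repeated here for brevity, and the full generated expression would go in their place.

```sql
-- Preview only: list rows whose name would change; nothing is modified.
-- Two DECODE entries stand in for the full generated list.
select name, new_name
from  (select name,
              decode(upper(name),
                     '''S-GRAVENHAGE', '''s-Gravenhage',
                     'SON EN BREUGEL', 'Son en Breugel',
                     nls_initcap(name, 'NLS_SORT=xDutch')) as new_name
       from   location)
where  name <> new_name
order  by name;
```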
If the question is about Dutch place names, would a lookup table be an option? According to Eurostat, there are 418 "Gemeenten" (municipalities) at the NUTS5/LAU2 level. A list is available at http://epp.eurostat.ec.europa.eu/portal/page/portal/nuts_nomenclature/local_administrative_units. If that is not acceptable, you could at least verify your procedure against the official list.
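A sketch of the lookup-table idea, assuming a hypothetical table OFFICIAL_PLACE_NAME loaded from the Eurostat list (only three rows are shown). The update replaces a name with the official spelling when the upper-case forms match and leaves other names unchanged; the last query lists names that still have no exact official match, which is one way to verify a procedure against the list.

```sql
-- Hypothetical table holding the official spellings (load all rows from the list).
create table official_place_name (
  proper_name varchar2(100) not null
);

-- One row per upper-case name, so the correlated subquery returns at most one row.
create unique index official_place_name_uk
  on official_place_name (upper(proper_name));

insert into official_place_name (proper_name) values ('''s-Gravenhage');
insert into official_place_name (proper_name) values ('Son en Breugel');
insert into official_place_name (proper_name) values ('IJsselstein');

-- Replace a name with its official spelling when one exists; otherwise keep it.
update location l
set    l.name = (select o.proper_name
                 from   official_place_name o
                 where  upper(o.proper_name) = upper(l.name))
where  exists (select 1
               from   official_place_name o
               where  upper(o.proper_name) = upper(l.name));

-- Verification: names that do not exactly match any official spelling.
select l.name
from   location l
where  not exists (select 1
                   from   official_place_name o
                   where  o.proper_name = l.name);
```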