Merging matching rows from two tables in Oracle
Does anyone know how to merge two tables that share a column of data into a single table? The shared column is a date column (Cast_Date in table A, Sched_Cast_Date in table B). This is part of a project at work, and no one here quite knows how it works. Any help would be appreciated.
Table A:
Sub  Temp  Weight  Silicon  Cast_Date
108  2675  2731    0.7002   18-jun-11 18:45
101  2691  3268    0.6194   18-jun-11 20:30
107  2701  6749    0.6976   18-jun-11 20:30
113  2713  2112    0.6616   18-jun-11 20:30
116  2733  3142    0.7382   19-jun-11 05:46
121  2745  2611    0.6949   19-jun-11 00:19
125  2726  1995    0.644    19-jun-11 00:19
Table B:
Si      Temperature  Sched_Cast_Date  Treadwell
0.6622  2542         01-APR-11 02:57  114
0.6622  2542         01-APR-11 03:07  116
0.7516  2526         19-jun-11 05:46  116
0.7516  2526         01-APR-11 03:40  107
0.6741  2372         01-APR-11 04:03  107
0.6206  2369         01-APR-11 09:43  114
0.6741  2372         19-jun-11 00:19  125
The results would look like:
Subcar  Temp  Weight  Silicon  Cast_Date        SI      Temperature  Sched_Cast_Date  Treadwell
116     2733  3142    0.7382   19-jun-11 05:46  0.7516  2526         19-jun-11 05:46  116
125     2726  1995    0.644    19-jun-11 00:19  0.6741  2372         19-jun-11 00:19  125
I would like to run a query that returns data only where Sched_Cast_Date and Cast_Date are the same. A new table with the same contents would work just as well.
I hope that this makes more sense.
Are you asking how to join two tables on a common column? For example:
select a.Sub, a.Temp, a.Weight, a.Silicon, a.Cast_Date,
       b.Si, b.Temperature, b.Sched_Cast_Date, b.Treadwell
from a
join b on b.Sched_Cast_Date = a.Cast_Date
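To check what that join returns on the sample rows, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle. Assumptions: the join syntax is the same for this query, and the dates are stored as ISO-8601 text instead of Oracle DATE values. On this data, joining on the date alone returns three rows, because Sub 121 and Sub 125 both have the 19-jun-11 00:19 timestamp. The expected output lists only two rows, and in both of them Sub equals Treadwell, so an extra condition such as b.Treadwell = a.Sub may be what is intended; that condition is a guess from the sample, not something the question states.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO-8601 text (assumption)
# so that equal timestamps compare equal as strings.
TABLE_A = [
    (108, 2675, 2731, 0.7002, "2011-06-18 18:45"),
    (101, 2691, 3268, 0.6194, "2011-06-18 20:30"),
    (107, 2701, 6749, 0.6976, "2011-06-18 20:30"),
    (113, 2713, 2112, 0.6616, "2011-06-18 20:30"),
    (116, 2733, 3142, 0.7382, "2011-06-19 05:46"),
    (121, 2745, 2611, 0.6949, "2011-06-19 00:19"),
    (125, 2726, 1995, 0.644, "2011-06-19 00:19"),
]
TABLE_B = [
    (0.6622, 2542, "2011-04-01 02:57", 114),
    (0.6622, 2542, "2011-04-01 03:07", 116),
    (0.7516, 2526, "2011-06-19 05:46", 116),
    (0.7516, 2526, "2011-04-01 03:40", 107),
    (0.6741, 2372, "2011-04-01 04:03", 107),
    (0.6206, 2369, "2011-04-01 09:43", 114),
    (0.6741, 2372, "2011-06-19 00:19", 125),
]


def run_join(extra_condition=""):
    """Load both tables into an in-memory database and run the date join."""
    conn = sqlite3.connect(":memory:")
    conn.execute('create table a (Sub, "Temp", Weight, Silicon, Cast_Date)')
    conn.execute("create table b (Si, Temperature, Sched_Cast_Date, Treadwell)")
    conn.executemany("insert into a values (?, ?, ?, ?, ?)", TABLE_A)
    conn.executemany("insert into b values (?, ?, ?, ?)", TABLE_B)
    sql = ("select a.Sub, a.Cast_Date, b.Sched_Cast_Date, b.Treadwell "
           "from a join b on b.Sched_Cast_Date = a.Cast_Date "
           + extra_condition + " order by a.Sub")
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


date_only = run_join()                                  # join on the date alone
date_and_sub = run_join("and b.Treadwell = a.Sub")      # hypothetical extra condition
print(date_only)
print(date_and_sub)
```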
Why are all the values the same in ARIMA model predictions?
The data set had 1511 observations. I used the first 1400 values to fit an ARIMA model of order (1,1,9), keeping the rest for predictions. But when I look at the predictions, apart from the first 16 values, all the remaining values are the same. Here's what I tried:

model2 = ARIMA(tstrain, order=(1,1,9))
fitted_model2 = model2.fit()

And for prediction:

start = len(tstrain)
end = len(tstrain) + len(tstest) - 1
predictions = fitted_model2.predict(start, end, typ='levels')

Here tstrain and tstest are the train and test sets.

predictions.head(30)
1400    214.097742
1401    214.689674
1402    214.820804
1403    215.621131
1404    215.244980
1405    215.349230
1406    215.392444
1407    215.022312
1408    215.020736
1409    215.021384
1410    215.021118
1411    215.021227
1412    215.021182
1413    215.021201
1414    215.021193
1415    215.021196
1416    215.021195
1417    215.021195
1418    215.021195
1419    215.021195
1420    215.021195
1421    215.021195
1422    215.021195
1423    215.021195
1424    215.021195
1425    215.021195
1426    215.021195
1427    215.021195
1428    215.021195
1429    215.021195

Please help me out here. What am I missing?
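A likely explanation, offered as a hedged sketch rather than a diagnosis: out-of-sample multi-step forecasts receive no new observations. In an ARIMA(1,1,9), the MA terms can only use residuals that are already known, so they stop contributing after 9 steps; from then on the forecast of the differenced series is driven by the AR(1) term alone and decays geometrically toward its mean, and if that mean is about zero the forecast level flattens out. The printed values are consistent with this: from index 1409 onward each step-to-step change is roughly -0.41 times the previous one, and after a few more steps the changes fall below the 6 printed decimals. The function below writes out the ARIMA(1,1,q) point-forecast recursion by hand; it is not the statsmodels API, and the coefficients and residuals in the usage lines are made-up values for illustration.

```python
def arima_1_1_q_forecast(last_level, last_diff, phi, thetas, residuals, steps, mu=0.0):
    """Point forecasts for an ARIMA(1,1,q) model, written out by hand.

    The differenced series d_t is modelled as
        d_t - mu = phi * (d_{t-1} - mu) + e_t + sum_j thetas[j-1] * e_{t-j}.
    `residuals` holds at least the last q in-sample residuals, oldest first,
    so residuals[-1] is e_t. Future residuals are unknown and replaced by 0.
    """
    q = len(thetas)
    if len(residuals) < q:
        raise ValueError("need at least q residuals")
    level, prev_diff = last_level, last_diff
    levels = []
    for h in range(1, steps + 1):
        # e_{t+h-j} is already observed only when j >= h, so for h > q
        # this sum is empty and the MA part contributes nothing.
        ma = sum(thetas[j - 1] * residuals[-(j - h) - 1] for j in range(h, q + 1))
        diff = mu + phi * (prev_diff - mu) + ma
        level += diff  # undo the differencing to get a forecast level
        levels.append(level)
        prev_diff = diff
    return levels


# Hypothetical parameters: phi close to the ratio seen in the question's output.
thetas = [0.3, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05, 0.02, 0.01]
residuals = [0.5, -0.2, 0.1, 0.3, -0.4, 0.2, 0.1, -0.1, 0.6]
forecast = arima_1_1_q_forecast(214.0, 0.4, phi=-0.4, thetas=thetas,
                                residuals=residuals, steps=30)
for step, value in enumerate(forecast, start=1):
    print(step, round(value, 6))
```

After step 9 every change in the forecast is exactly phi times the previous change, so the levels settle on a constant, just as in the predictions above. Getting forecasts that keep reacting to the data generally means feeding in the newly observed values (for example, rolling one-step-ahead forecasts) rather than forecasting 111 steps from the end of the training set.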
ncurses: init_color() has no effect
Trying to define color pairs, I was getting strange results. All 256 colors are already defined, and attempts to change any color with init_color() have no effect. I'm using PuTTY with 256 colors enabled and TERM=xterm-256color (also putty-256color), with ncurses 6.0 compiled with --enable-widec and --enable-ext-colors.

This shows that all colors are already defined and that init_color() doesn't change anything, even though it succeeds:

initscr();
start_color();
if (has_colors() && COLORS == 256 && can_change_color()) {
    NCURSES_COLOR_T f;
    for (f = 1; f < 256; f++) {
        if (init_pair(f, f, COLOR_BLACK) == ERR)
            break;
        attron(COLOR_PAIR(f));
        printw("(%d)", f);
        attroff(COLOR_PAIR(f));
        refresh();
    }
    getch();
    clear();
    for (f = 1; f < 256; f++) {
        if (init_color(f, 0, 0, f*3) == ERR)
            break;
        if (init_pair(f, f, COLOR_BLACK) == ERR)
            break;
        attron(COLOR_PAIR(f));
        printw("(%d)", f);
        attroff(COLOR_PAIR(f));
        refresh();
    }
    getch();
    clear();
}

I've read that the default colors can't be changed, but that only refers to COLOR_BLACK, etc. (0-7). Where are these 256 default colors defined, and why can't I change them? If they can't be changed, I could make use of the predefined colors, but only if I can rely on them being the same on any 256-color-capable terminal.
Short: PuTTY doesn't do that, and ncurses can't tell whether PuTTY can.

Long: In ncurses, the init_color function checks its parameters (in the example given, those appear okay if your $TERM is "xterm-256color"), as well as checking whether the terminal description has the initc (initialize_color) capability. If that is missing or cancelled, ncurses returns an error.

However, that's only the terminal description. ncurses cannot tell if you have chosen an incorrect or inappropriate terminal description. In a quick check, PuTTY doesn't respond to the control sequence which is used in initc. This is a known limitation, as indicated in the (more appropriate) terminal description putty-256color provided by ncurses:

putty-256color|PuTTY 0.58 with xterm 256-colors,
        use=xterm+256setaf, use=putty,

That xterm+256setaf building block is used for terminals whose palette is hard-coded. PuTTY is not the only terminal which both sets TERM=xterm and lacks the ability to change its palette.

If you happen to be using an old version of the terminal database, you may be misled, since that error was fixed in 2014:

# 2014-03-30
#       * cancel ccc in putty-256color and konsole-256color for consistency
#         with the cancelled initc capability (patch by Sven Zuhlsdorf).
#       * add xterm+256setaf building block for various terminals which only
#         get the 256-color feature half-implemented -TD
#       * updated "st" entry (leaving the 0.1.1 version as "simpleterm") to
#         0.4.1 -TD
#

Like the other terminals whose developers set TERM=xterm (or TERM=xterm-256color), there are differences between those and xterm.

Further reading: Why not just use TERM set to "xterm"?
A couple of things I discovered. First, yes, I was apparently referencing an old putty-256color terminfo entry that had "ccc", allowing can_change_color() to succeed, but then init_color() would fail. But the same PuTTY window using "xterm-256color" would init_color() OK, and color_content() even showed the new values, but nothing changed on the screen. What was really confusing is that sometimes the colors I set would appear, and other times seemingly random colors appeared instead. Here's what I found:

                  putty-256color  xterm-256color  gnome-256color    xterm-256color
                  (PuTTY)         (PuTTY)         (gnome-terminal)  (MobaXterm)
                  --------------  --------------  ----------------  --------------
can_change_color  OK/ERR          OK              OK                OK
init_color        ERR             ERR             OK                OK
color_content     ERR             OK/NOCH         OK                OK
color changed?    NO              NO              YES               YES

So there's basically no way to determine whether colors can be changed or not. But I did find that every terminal had already defined the standard 256 xterm colors, whether they could be changed or not. So now I just define the colors I want to use with the same color numbers as in the xterm palette. That way, the colors I expect will appear whether or not I needed to define them. So, to use "PaleGreen3", I just use:

init_color(77, 372, 843, 372)

If it works, it works, and if not, it's probably already defined.
For reference, I converted all the X Window/xterm colors from GUI hex notation to ncurses (0-1000) values:

  #  Name               Tk       Ncurses
---  -----------------  -------  --------------
 16  Grey0              #000000  0,0,0
 17  NavyBlue           #00005f  0,0,372
 18  DarkBlue           #000087  0,0,529
 19  Blue3              #0000af  0,0,686
 20  Blue3              #0000d7  0,0,843
 21  Blue1              #0000ff  0,0,1000
 22  DarkGreen          #005f00  0,372,0
 23  DeepSkyBlue4       #005f5f  0,372,372
 24  DeepSkyBlue4       #005f87  0,372,529
 25  DeepSkyBlue4       #005faf  0,372,686
 26  DodgerBlue3        #005fd7  0,372,843
 27  DodgerBlue2        #005fff  0,372,1000
 28  Green4             #008700  0,529,0
 29  SpringGreen4       #00875f  0,529,372
 30  Turquoise4         #008787  0,529,529
 31  DeepSkyBlue3       #0087af  0,529,686
 32  DeepSkyBlue3       #0087d7  0,529,843
 33  DodgerBlue1        #0087ff  0,529,1000
 34  Green3             #00af00  0,686,0
 35  SpringGreen3       #00af5f  0,686,372
 36  DarkCyan           #00af87  0,686,529
 37  LightSeaGreen      #00afaf  0,686,686
 38  DeepSkyBlue2       #00afd7  0,686,843
 39  DeepSkyBlue1       #00afff  0,686,1000
 40  Green3             #00d700  0,843,0
 41  SpringGreen3       #00d75f  0,843,372
 42  SpringGreen2       #00d787  0,843,529
 43  Cyan3              #00d7af  0,843,686
 44  DarkTurquoise      #00d7d7  0,843,843
 45  Turquoise2         #00d7ff  0,843,1000
 46  Green1             #00ff00  0,1000,0
 47  SpringGreen2       #00ff5f  0,1000,372
 48  SpringGreen1       #00ff87  0,1000,529
 49  MediumSpringGreen  #00ffaf  0,1000,686
 50  Cyan2              #00ffd7  0,1000,843
 51  Cyan1              #00ffff  0,1000,1000
 52  DarkRed            #5f0000  372,0,0
 53  DeepPink4          #5f005f  372,0,372
 54  Purple4            #5f0087  372,0,529
 55  Purple4            #5f00af  372,0,686
 56  Purple3            #5f00d7  372,0,843
 57  BlueViolet         #5f00ff  372,0,1000
 58  Orange4            #5f5f00  372,372,0
 59  Grey37             #5f5f5f  372,372,372
 60  MediumPurple4      #5f5f87  372,372,529
 61  SlateBlue3         #5f5faf  372,372,686
 62  SlateBlue3         #5f5fd7  372,372,843
 63  RoyalBlue1         #5f5fff  372,372,1000
 64  Chartreuse4        #5f8700  372,529,0
 65  DarkSeaGreen4      #5f875f  372,529,372
 66  PaleTurquoise4     #5f8787  372,529,529
 67  SteelBlue          #5f87af  372,529,686
 68  SteelBlue3         #5f87d7  372,529,843
 69  CornflowerBlue     #5f87ff  372,529,1000
 70  Chartreuse3        #5faf00  372,686,0
 71  DarkSeaGreen4      #5faf5f  372,686,372
 72  CadetBlue          #5faf87  372,686,529
 73  CadetBlue          #5fafaf  372,686,686
 74  SkyBlue3           #5fafd7  372,686,843
 75  SteelBlue1         #5fafff  372,686,1000
 76  Chartreuse3        #5fd700  372,843,0
 77  PaleGreen3         #5fd75f  372,843,372
 78  SeaGreen3          #5fd787  372,843,529
 79  Aquamarine3        #5fd7af  372,843,686
 80  MediumTurquoise    #5fd7d7  372,843,843
 81  SteelBlue1         #5fd7ff  372,843,1000
 82  Chartreuse2        #5fff00  372,1000,0
 83  SeaGreen2          #5fff5f  372,1000,372
 84  SeaGreen1          #5fff87  372,1000,529
 85  SeaGreen1          #5fffaf  372,1000,686
 86  Aquamarine1        #5fffd7  372,1000,843
 87  DarkSlateGray2     #5fffff  372,1000,1000
 88  DarkRed            #870000  529,0,0
 89  DeepPink4          #87005f  529,0,372
 90  DarkMagenta        #870087  529,0,529
 91  DarkMagenta        #8700af  529,0,686
 92  DarkViolet         #8700d7  529,0,843
 93  Purple             #8700ff  529,0,1000
 94  Orange4            #875f00  529,372,0
 95  LightPink4         #875f5f  529,372,372
 96  Plum4              #875f87  529,372,529
 97  MediumPurple3      #875faf  529,372,686
 98  MediumPurple3      #875fd7  529,372,843
 99  SlateBlue1         #875fff  529,372,1000
100  Yellow4            #878700  529,529,0
101  Wheat4             #87875f  529,529,372
102  Grey53             #878787  529,529,529
103  LightSlateGrey     #8787af  529,529,686
104  MediumPurple       #8787d7  529,529,843
105  LightSlateBlue     #8787ff  529,529,1000
106  Yellow4            #87af00  529,686,0
107  DarkOliveGreen3    #87af5f  529,686,372
108  DarkSeaGreen       #87af87  529,686,529
109  LightSkyBlue3      #87afaf  529,686,686
110  LightSkyBlue3      #87afd7  529,686,843
111  SkyBlue2           #87afff  529,686,1000
112  Chartreuse2        #87d700  529,843,0
113  DarkOliveGreen3    #87d75f  529,843,372
114  PaleGreen3         #87d787  529,843,529
115  DarkSeaGreen3      #87d7af  529,843,686
116  DarkSlateGray3     #87d7d7  529,843,843
117  SkyBlue1           #87d7ff  529,843,1000
118  Chartreuse1        #87ff00  529,1000,0
119  LightGreen         #87ff5f  529,1000,372
120  LightGreen         #87ff87  529,1000,529
121  PaleGreen1         #87ffaf  529,1000,686
122  Aquamarine1        #87ffd7  529,1000,843
123  DarkSlateGray1     #87ffff  529,1000,1000
124  Red3               #af0000  686,0,0
125  DeepPink4          #af005f  686,0,372
126  MediumVioletRed    #af0087  686,0,529
127  Magenta3           #af00af  686,0,686
128  DarkViolet         #af00d7  686,0,843
129  Purple             #af00ff  686,0,1000
130  DarkOrange3        #af5f00  686,372,0
131  IndianRed          #af5f5f  686,372,372
132  HotPink3           #af5f87  686,372,529
133  MediumOrchid3      #af5faf  686,372,686
134  MediumOrchid       #af5fd7  686,372,843
135  MediumPurple2      #af5fff  686,372,1000
136  DarkGoldenrod      #af8700  686,529,0
137  LightSalmon3       #af875f  686,529,372
138  RosyBrown          #af8787  686,529,529
139  Grey63             #af87af  686,529,686
140  MediumPurple2      #af87d7  686,529,843
141  MediumPurple1      #af87ff  686,529,1000
142  Gold3              #afaf00  686,686,0
143  DarkKhaki          #afaf5f  686,686,372
144  NavajoWhite3       #afaf87  686,686,529
145  Grey69             #afafaf  686,686,686
146  LightSteelBlue3    #afafd7  686,686,843
147  LightSteelBlue     #afafff  686,686,1000
148  Yellow3            #afd700  686,843,0
149  DarkOliveGreen3    #afd75f  686,843,372
150  DarkSeaGreen3      #afd787  686,843,529
151  DarkSeaGreen2      #afd7af  686,843,686
152  LightCyan3         #afd7d7  686,843,843
153  LightSkyBlue1      #afd7ff  686,843,1000
154  GreenYellow        #afff00  686,1000,0
155  DarkOliveGreen2    #afff5f  686,1000,372
156  PaleGreen1         #afff87  686,1000,529
157  DarkSeaGreen2      #afffaf  686,1000,686
158  DarkSeaGreen1      #afffd7  686,1000,843
159  PaleTurquoise1     #afffff  686,1000,1000
160  Red3               #d70000  843,0,0
161  DeepPink3          #d7005f  843,0,372
162  DeepPink3          #d70087  843,0,529
163  Magenta3           #d700af  843,0,686
164  Magenta3           #d700d7  843,0,843
165  Magenta2           #d700ff  843,0,1000
166  DarkOrange3        #d75f00  843,372,0
167  IndianRed          #d75f5f  843,372,372
168  HotPink3           #d75f87  843,372,529
169  HotPink2           #d75faf  843,372,686
170  Orchid             #d75fd7  843,372,843
171  MediumOrchid1      #d75fff  843,372,1000
172  Orange3            #d78700  843,529,0
173  LightSalmon3       #d7875f  843,529,372
174  LightPink3         #d78787  843,529,529
175  Pink3              #d787af  843,529,686
176  Plum3              #d787d7  843,529,843
177  Violet             #d787ff  843,529,1000
178  Gold3              #d7af00  843,686,0
179  LightGoldenrod3    #d7af5f  843,686,372
180  Tan                #d7af87  843,686,529
181  MistyRose3         #d7afaf  843,686,686
182  Thistle3           #d7afd7  843,686,843
183  Plum2              #d7afff  843,686,1000
184  Yellow3            #d7d700  843,843,0
185  Khaki3             #d7d75f  843,843,372
186  LightGoldenrod2    #d7d787  843,843,529
187  LightYellow3       #d7d7af  843,843,686
188  Grey84             #d7d7d7  843,843,843
189  LightSteelBlue1    #d7d7ff  843,843,1000
190  Yellow2            #d7ff00  843,1000,0
191  DarkOliveGreen1    #d7ff5f  843,1000,372
192  DarkOliveGreen1    #d7ff87  843,1000,529
193  DarkSeaGreen1      #d7ffaf  843,1000,686
194  Honeydew2          #d7ffd7  843,1000,843
195  LightCyan1         #d7ffff  843,1000,1000
196  Red1               #ff0000  1000,0,0
197  DeepPink2          #ff005f  1000,0,372
198  DeepPink1          #ff0087  1000,0,529
199  DeepPink1          #ff00af  1000,0,686
200  Magenta2           #ff00d7  1000,0,843
201  Magenta1           #ff00ff  1000,0,1000
202  OrangeRed1         #ff5f00  1000,372,0
203  IndianRed1         #ff5f5f  1000,372,372
204  IndianRed1         #ff5f87  1000,372,529
205  HotPink            #ff5faf  1000,372,686
206  HotPink            #ff5fd7  1000,372,843
207  MediumOrchid1      #ff5fff  1000,372,1000
208  DarkOrange         #ff8700  1000,529,0
209  Salmon1            #ff875f  1000,529,372
210  LightCoral         #ff8787  1000,529,529
211  PaleVioletRed1     #ff87af  1000,529,686
212  Orchid2            #ff87d7  1000,529,843
213  Orchid1            #ff87ff  1000,529,1000
214  Orange1            #ffaf00  1000,686,0
215  SandyBrown         #ffaf5f  1000,686,372
216  LightSalmon1       #ffaf87  1000,686,529
217  LightPink1         #ffafaf  1000,686,686
218  Pink1              #ffafd7  1000,686,843
219  Plum1              #ffafff  1000,686,1000
220  Gold1              #ffd700  1000,843,0
221  LightGoldenrod2    #ffd75f  1000,843,372
222  LightGoldenrod2    #ffd787  1000,843,529
223  NavajoWhite1       #ffd7af  1000,843,686
224  MistyRose1         #ffd7d7  1000,843,843
225  Thistle1           #ffd7ff  1000,843,1000
226  Yellow1            #ffff00  1000,1000,0
227  LightGoldenrod1    #ffff5f  1000,1000,372
228  Khaki1             #ffff87  1000,1000,529
229  Wheat1             #ffffaf  1000,1000,686
230  Cornsilk1          #ffffd7  1000,1000,843
231  Grey100            #ffffff  1000,1000,1000
232  Grey3              #080808  31,31,31
233  Grey7              #121212  70,70,70
234  Grey11             #1c1c1c  109,109,109
235  Grey15             #262626  149,149,149
236  Grey19             #303030  188,188,188
237  Grey23             #3a3a3a  227,227,227
238  Grey27             #444444  266,266,266
239  Grey30             #4e4e4e  305,305,305
240  Grey35             #585858  345,345,345
241  Grey39             #626262  384,384,384
242  Grey42             #6c6c6c  423,423,423
243  Grey46             #767676  462,462,462
244  Grey50             #808080  501,501,501
245  Grey54             #8a8a8a  541,541,541
246  Grey58             #949494  580,580,580
247  Grey62             #9e9e9e  619,619,619
248  Grey66             #a8a8a8  658,658,658
249  Grey70             #b2b2b2  698,698,698
250  Grey74             #bcbcbc  737,737,737
251  Grey78             #c6c6c6  776,776,776
252  Grey82             #d0d0d0  815,815,815
253  Grey85             #dadada  854,854,854
254  Grey89             #e4e4e4  894,894,894
255  Grey93             #eeeeee  933,933,933
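The table follows a regular pattern, so it can be computed instead of typed in. Entries 16-231 form a 6x6x6 cube whose channel levels are 0x00, 0x5f, 0x87, 0xaf, 0xd7 and 0xff; entries 232-255 are a grey ramp starting at 0x08 in steps of 10. The Ncurses column matches each byte scaled by 1000/255 and truncated (for example 0x12 = 18 gives 70.6, listed as 70). The sketch below is in Python for illustration; the helper names are hypothetical, and in a C program the same arithmetic would produce the arguments for init_color(). Names are not computed, only the numeric columns.

```python
# Channel levels of the xterm 6x6x6 colour cube (indices 16-231).
CUBE_LEVELS = (0x00, 0x5F, 0x87, 0xAF, 0xD7, 0xFF)


def xterm_rgb(index):
    """Return the default xterm (r, g, b) bytes for colour `index` (16-255)."""
    if 16 <= index <= 231:
        i = index - 16
        return (CUBE_LEVELS[i // 36], CUBE_LEVELS[(i // 6) % 6], CUBE_LEVELS[i % 6])
    if 232 <= index <= 255:
        v = 8 + 10 * (index - 232)  # grey ramp from #080808 to #eeeeee
        return (v, v, v)
    raise ValueError("only indices 16-255 follow this formula")


def to_ncurses(rgb):
    """Scale 0-255 bytes to init_color()'s 0-1000 range, truncating like the table."""
    return tuple(c * 1000 // 255 for c in rgb)


def hex_name(rgb):
    """Format (r, g, b) as the '#rrggbb' notation used in the Tk column."""
    return "#%02x%02x%02x" % rgb


for idx in (17, 77, 232, 255):
    rgb = xterm_rgb(idx)
    print(idx, hex_name(rgb), to_ncurses(rgb))
```

For index 77 this gives #5fd75f and (372, 843, 372), the same values as the init_color(77, 372, 843, 372) call used for "PaleGreen3".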
Neo4j very slow for graph import
I'm using Neo4j to load a graph from a CSV file of 11 million rows, and it is taking a long time: two hours have passed and the graph still hasn't finished loading. Is this normal? My laptop has a 2.4 GHz i7 and 8 GB of RAM.

The sample data:

protein1 protein2 combined_score
9615.ENSCAFP00000000001 9615.ENSCAFP00000014827 151
9615.ENSCAFP00000000001 9615.ENSCAFP00000026847 802
9615.ENSCAFP00000000001 9615.ENSCAFP00000015235 900
9615.ENSCAFP00000000001 9615.ENSCAFP00000007210 261
9615.ENSCAFP00000000001 9615.ENSCAFP00000025394 248
9615.ENSCAFP00000000001 9615.ENSCAFP00000038575 900
9615.ENSCAFP00000000001 9615.ENSCAFP00000011457 177
9615.ENSCAFP00000000001 9615.ENSCAFP00000002193 503
9615.ENSCAFP00000000001 9615.ENSCAFP00000042321 900
9615.ENSCAFP00000000001 9615.ENSCAFP00000011541 207
9615.ENSCAFP00000000001 9615.ENSCAFP00000038517 183
9615.ENSCAFP00000000001 9615.ENSCAFP00000003009 151

Query:

CREATE CONSTRAINT ON (n:Node) ASSERT n.NodeID IS UNIQUE;

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///linksdog.csv' AS line
MERGE (n1:Node {NodeID: line.protein1})
MERGE (n2:Node {NodeID: line.protein2})
MERGE (n1)-[:ACTING_WITH {Score: TOFLOAT(line.combined_score)}]->(n2);
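The query MERGEs both endpoint nodes on every one of the 11 million rows, even though, as the sample shows, the same protein ID appears on many rows. One common way to reduce that work (an assumption here, not something the question tried) is to load the distinct node IDs in one pass and the relationships in a second pass. The Python sketch below covers only the preprocessing: it collects the unique IDs and the edges from whitespace-separated lines like the sample. The nodes.csv and rels.csv names are hypothetical, the Cypher for the two passes is not shown, and for the full file you would write the edges out as you go rather than keep them all in a list. The output is comma-separated, which is LOAD CSV's default field terminator.

```python
import csv
import io


def split_edge_list(lines):
    """Split 'protein1 protein2 combined_score' lines (header first,
    whitespace-separated) into sorted unique node ids and (src, dst, score) edges."""
    nodes = set()
    edges = []
    it = iter(lines)
    next(it, None)  # skip the header row
    for line in it:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        src, dst, score = parts
        nodes.update((src, dst))
        edges.append((src, dst, float(score)))
    return sorted(nodes), edges


def to_csv(header, rows):
    """Render rows as comma-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


sample = """protein1 protein2 combined_score
9615.ENSCAFP00000000001 9615.ENSCAFP00000014827 151
9615.ENSCAFP00000000001 9615.ENSCAFP00000026847 802
9615.ENSCAFP00000000001 9615.ENSCAFP00000015235 900
""".splitlines()

nodes, edges = split_edge_list(sample)
# Hypothetical outputs: nodes.csv (one row per protein), rels.csv (one row per edge).
print(to_csv(["NodeID"], [[n] for n in nodes]))
print(to_csv(["protein1", "protein2", "combined_score"], edges))
```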
Pandas performance issue with DataFrame column "rename" and "drop"
Below is the line_profiler record of a function:

Wrote profile results to FM_CORE.py.lprof
Timer unit: 2.79365e-07 s

File: F:\FM_CORE.py
Function: _rpt_join at line 1068
Total time: 1.87766 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1068                                           @profile
  1069                                           def _rpt_join(dfa, dfb, join_type='inner'):
  1070                                               ''' join two dataframe together by ('STK_ID','RPT_Date') multilevel index.
  1071                                               'join_type' can be 'inner' or 'outer'
  1072                                               '''
  1073
  1074         2           56     28.0      0.0      try: # ('STK_ID','RPT_Date') are normal column
  1075         2      2936668 1468334.0     43.7          rst = pd.merge(dfa, dfb, how=join_type, on=['STK_ID','RPT_Date'], left_index=True, right_index=True)
  1076                                               except: # ('STK_ID','RPT_Date') are index
  1077                                                   rst = pd.merge(dfa, dfb, how=join_type, left_index=True, right_index=True)
  1078
  1079
  1080         2           81     40.5      0.0      try: # handle 'STK_Name
  1081         2       426472 213236.0      6.3          name_combine = pd.concat([dfa.STK_Name, dfb.STK_Name])
  1082
  1083
  1084         2       900584 450292.0     13.4          nameseries = name_combine[-Series(name_combine.index.values, name_combine.index).duplicated()]
  1085
  1086         2      1138140 569070.0     16.9          rst.STK_Name_x = nameseries
  1087         2       596768 298384.0      8.9          rst = rst.rename(columns={'STK_Name_x': 'STK_Name'})
  1088         2       722293 361146.5     10.7          rst = rst.drop(['STK_Name_y'], axis=1)
  1089                                               except:
  1090                                                   pass
  1091
  1092         2           94     47.0      0.0      return rst

What surprises me are these two lines:

  1087         2       596768 298384.0      8.9          rst = rst.rename(columns={'STK_Name_x': 'STK_Name'})
  1088         2       722293 361146.5     10.7          rst = rst.drop(['STK_Name_y'], axis=1)

Why does a simple DataFrame column "rename" and "drop" take such a large share of the time (8.9% + 10.7%)? By comparison, the "merge" operation costs only 43.7%, and "rename"/"drop" don't look like calculation-intensive operations. How can I improve this?
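A likely reason, stated as an assumption: in the pandas versions of that era, DataFrame.rename and DataFrame.drop each return a new DataFrame and typically copy the underlying data, so each call costs roughly as much as copying the merged result rst. Two alternatives that modify the existing object instead are deleting the column with del and relabelling by assigning a new list to .columns. Whether this is actually faster depends on the pandas version and the frame's layout, so it is worth re-profiling; the sketch below only demonstrates the operations, and the helper name tidy_name_columns is hypothetical.

```python
import pandas as pd


def tidy_name_columns(rst):
    """Drop 'STK_Name_y' and rename 'STK_Name_x' to 'STK_Name' on the same
    DataFrame object, instead of rst.drop(...) / rst.rename(...), which return new frames."""
    del rst["STK_Name_y"]  # remove the column from this DataFrame in place
    # Assigning to .columns replaces only the labels, not the data.
    rst.columns = ["STK_Name" if col == "STK_Name_x" else col for col in rst.columns]
    return rst


rst = pd.DataFrame({"STK_Name_x": ["A"], "STK_Name_y": ["A"], "value": [1.0]})
print(list(tidy_name_columns(rst).columns))
```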
Pig Join is returning no results
I have been stuck on this problem for over twelve hours now. I have a Pig script that is running on Amazon Web Services. Currently, I am just running my script in interactive mode. I am trying to get averages on a large data set of climate readings from weather stations; however, this data doesn't have country or state information, so it has to be joined with another table that does.

State Table:

719990 99999 LILLOOET CN CA BC WKF +50683 -121933 +02780
719994 99999 SEDCO 710 CN CA CWQJ +46500 -048500 +00000
720000 99999 BOGUS AMERICAN US US -99999 -999999 -99999
720001 99999 PEASON RIDGE/RANGE US US LA K02R +31400 -093283 +01410
720002 99999 HALLOCK(AWS) US US MN K03Y +48783 -096950 +02500
720003 99999 DEER PARK(AWS) US US WA K07S +47967 -117433 +06720
720004 99999 MASON US US MI K09G +42567 -084417 +02800
720005 99999 GASTONIA US US NC K0A6 +35200 -081150 +02440

Climate Table (I realize this doesn't contain anything to satisfy the join condition, but the full data set does):

STN--- WBAN YEARMODA TEMP DEWP SLP STP VISIB WDSP MXSPD GUST MAX MIN PRCP SNDP FRSHTT
010010 99999 20090101 23.3 24 15.6 24 1033.2 24 1032.0 24 13.5 6 9.6 24 17.5 999.9 27.9* 16.7 0.00G 999.9 001000
010010 99999 20090102 27.3 24 20.5 24 1026.1 24 1024.9 24 13.7 5 14.6 24 23.3 999.9 28.9 25.3* 0.00G 999.9 001000
010010 99999 20090103 25.2 24 18.4 24 1028.3 24 1027.1 24 15.5 6 4.2 24 9.7 999.9 26.2* 23.9* 0.00G 999.9 001000
010010 99999 20090104 27.7 24 23.2 24 1019.3 24 1018.1 24 6.7 6 8.6 24 13.6 999.9 29.8 24.8 0.00G 999.9 011000
010010 99999 20090105 19.3 24 13.0 24 1015.5 24 1014.3 24 5.6 6 17.5 24 25.3 999.9 26.2* 10.2* 0.05G 999.9 001000
010010 99999 20090106 12.9 24 2.9 24 1019.6 24 1018.3 24 8.2 6 15.5 24 25.3 999.9 19.0* 8.8 0.02G 999.9 001000
010010 99999 20090107 26.2 23 20.7 23 998.6 23 997.4 23 6.6 6 12.1 22 21.4 999.9 31.5 19.2* 0.00G 999.9 011000
010010 99999 20090108 21.5 24 15.2 24 995.3 24 994.1 24 12.4 5 12.8 24 25.3 999.9 24.6* 19.2* 0.05G 999.9 011000
010010 99999 20090109 27.5 23 24.5 23 982.5 23 981.3 23 7.9 5 20.2 22 33.0 999.9 34.2 20.1* 0.00G 999.9 011000
010010 99999 20090110 22.5 23 16.7 23 977.2 23 976.1 23 11.9 6 15.5 23 35.0 999.9 28.9* 17.2 0.09G 999.9 000000

I load the climate data using TextLoader, apply a regular expression to obtain the fields, and filter out the nulls from the result set. I then do the same with the state data, but I filter it for the country being the US. The bags have the following schemas:

CLIMATE_REMOVE_EMPTY: {station: int,wban: int,year: int,month: int,day: int,temp: double}
STATES_FILTER_US: {station: int,wban: int,name: chararray,wmo: chararray,fips: chararray,state: chararray}

I need to perform a join on (station, wban) so I can get a resulting bag with the station, wban, year, month, and temps. When I perform a dump on the resulting bag, it says that it was successful; however, the dump returns 0 results. This is the output:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.3 0.9.2-amzn hadoop 2013-05-03 00:10:51 2013-05-03 00:12:42 HASH_JOIN,FILTER

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_201305030005_0001 2 1 36 15 25 33 33 33 CLIMATE,CLIMATE_REMOVE_NULL,RAW_CLIMATE,RAW_STATES,STATES,STATES_FILTER_US,STATE_CLIMATE_JOIN HASH_JOIN hdfs://10.204.30.125:9000/tmp/temp-204730737/tmp1776606203,

Input(s):
Successfully read 30587 records from: "hiddenbucket"
Successfully read 21027 records from: "hiddenbucket"

Output(s):
Successfully stored 0 records in: "hdfs://10.204.30.125:9000/tmp/temp-204730737/tmp1776606203"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

I have no idea why this contains 0 results. My data extraction seems correct, and the job is successful. It leads me to believe that the join condition is never satisfied.
I know the input files have some data that should satisfy the join condition, but it returns absolutely nothing. The only thing that looks suspicious is a warning that states:

Encountered Warning ACCESSING_NON_EXISTENT_FIELD 26001 time(s).

I'm not exactly sure where to go from here. Since the job isn't failing, I can't see any errors or anything in debug. I'm not sure if these mean anything, but here are other things that stand out:

When I try to illustrate STATE_CLIMATE_JOIN, I get a NullPointerException: ERROR 2997: Encountered IOException. Exception : null
When I try to illustrate STATES, I get java.lang.IndexOutOfBoundsException: Index: 1, Size: 1

Here is my full code:

--Piggy Bank Functions
register file:/home/hadoop/lib/pig/piggybank.jar
DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();

--Load Climate Data
RAW_CLIMATE = LOAD 'hiddenbucket' USING TextLoader as (line:chararray);
RAW_STATES = LOAD 'hiddenbucket' USING TextLoader as (line:chararray);

CLIMATE = FOREACH RAW_CLIMATE GENERATE
    FLATTEN ((tuple(int,int,int,int,int,double))
        EXTRACT(line,'^(\\d{6})\\s+(\\d{5})\\s+(\\d{4})(\\d{2})(\\d{2})\\s+(\\d{1,3}\\.\\d{1})'))
    AS (station: int, wban: int, year: int, month: int, day: int, temp: double);

STATES = FOREACH RAW_STATES GENERATE
    FLATTEN ((tuple(int,int,chararray,chararray,chararray,chararray))
        EXTRACT(line,'^(\\d{6})\\s+(\\d{5})\\s+(\\S+)\\s+(\\w{2})\\s+(\\w{2})\\s+(\\w{2})'))
    AS (station: int, wban: int, name: chararray, wmo: chararray, fips: chararray, state: chararray);

CLIMATE_REMOVE_NULL = FILTER CLIMATE BY station IS NOT NULL;
STATES_FILTER_US = FILTER STATES BY (fips == 'US');
STATE_CLIMATE_JOIN = JOIN CLIMATE_REMOVE_NULL BY (station), STATES_FILTER_US BY (station);

Thanks in advance. I am at a loss here.

--EDIT--
I finally got it to work! My regular expression for parsing the STATE_DATA was invalid.
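The edit says the STATES regular expression was the problem. One way that pattern fails on the sample rows shown above: the name group (\S+) matches a single word, so a station name containing a space, such as "DEER PARK(AWS)" or "PEASON RIDGE/RANGE", makes the whole match fail and EXTRACT yields no usable fields for that row. The sketch below checks the question's pattern against a few sample rows with Python's re module; it assumes Java's regex engine (used by piggybank's EXTRACT) behaves the same for these constructs, and it assumes the file is whitespace-separated as shown. ALTERNATIVE is a hypothetical pattern, not the fix the author actually used; it also rejects rows with no state code, such as "BOGUS AMERICAN", which is probably acceptable when filtering for US states.

```python
import re

# The STATES pattern from the question as a Python raw string
# (Pig needs the backslashes doubled; Python raw strings do not).
ORIGINAL = re.compile(r"^(\d{6})\s+(\d{5})\s+(\S+)\s+(\w{2})\s+(\w{2})\s+(\w{2})")
# Hypothetical alternative: let the name contain spaces by matching it lazily
# up to the first run of three whitespace-separated 2-character codes.
ALTERNATIVE = re.compile(r"^(\d{6})\s+(\d{5})\s+(.+?)\s+(\w{2})\s+(\w{2})\s+(\w{2})\s")

rows = [
    "720004 99999 MASON US US MI K09G +42567 -084417 +02800",
    "720003 99999 DEER PARK(AWS) US US WA K07S +47967 -117433 +06720",
    "720001 99999 PEASON RIDGE/RANGE US US LA K02R +31400 -093283 +01410",
]

for row in rows:
    for label, pattern in (("original", ORIGINAL), ("alternative", ALTERNATIVE)):
        m = pattern.match(row)
        print(label, m.groups() if m else None)
```

The single-word row parses identically with both patterns, while the two multi-word names only parse with the alternative.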